1. Huqariq will allow users to listen to instructions and to record, play back, and delete audio files, which will then be linked automatically to their respective transcription files.
2. Huqariq requires an account. The identification field will be connected via a web service to the RENIEC database. The connection with RENIEC will allow us to 1) simplify registration, since the DNI number alone will suffice, 2) determine the variety of Quechua spoken by the user, 3) verify the identity of the people who use our application, and 4) reduce cases of deliberate misuse of the tool.
3. Huqariq will convert the audio files to a single (mono) channel sampled at 16 kHz, encode them with 16-bit precision, and save them in WAV format.
4. Huqariq will implement a phrase compensation method so that all phrases are recorded, not just some of them. The method ensures that the least recorded phrases appear first in the list of phrases to be recorded, and that the most frequently recorded phrases do not appear at the beginning. The goal is for all sentences to be recorded the same number of times.
5. When the tool is used in connected mode, the information will be sent through a RESTful service (API) to a server, where it will be processed and stored. In offline mode, the information will be stored in a database on the cell phone itself. All information stored on the cell phone will be sent to the server automatically when the user has an Internet connection.
6. The sending process will include a load verification method in the service: the information will not be sent to an intermediate server with heavy network traffic, but to another server with less traffic. We will use an application server that allows up to 864,000 connections daily, up to 12 requests/s, and 8 queued requests.
7. Information security is important. For this reason, the application will use a 4-layer transmission architecture. The first layer is Flask, a Python web framework in which the back end is written; the second layer is Gunicorn, a Python WSGI HTTP server that runs the Flask application; the third layer is Supervisor, which monitors and controls the Gunicorn process; and the fourth layer is Nginx, a reverse proxy. Together these components form the information transmission architecture and improve system performance. In addition, the data are hashed with MD5 in Python. MD5 is a hash function rather than an encryption algorithm, so it can confirm that information arrives unaltered but does not by itself keep it confidential.
8. The pronunciation checker will not only help the user but will also act as a filter to prevent unwanted recordings. Optionally, to gauge the participant's command of Quechua, a preliminary oral test could be added to detect evidence of the influence of Spanish on the volunteer. Also optionally, the corpus entries could be labeled according to language proficiency level.
9. We will develop a semi-automatic audio quality verifier. The verifier will check that the audio is in the correct language and detect whether it contains only noise or only silence. It will also classify the recordings as good, neutral, or bad according to recording quality. The verifier will help us filter the recordings correctly to create a quality corpus.
10. Huqariq will include a set of tools that allow us to improve the quality of the audio recorded by users. The tools' functions will include removing long silences, background noise, unfinished words, and background music, and improving the volume of the audio.
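
The audio format in item 3 (mono, 16 kHz, 16-bit, WAV) can be illustrated with a minimal Python sketch. The function names, the 48 kHz stereo input, and the naive linear-interpolation resampler are assumptions made for illustration only; the source does not say how Huqariq performs the conversion, and a production resampler would normally apply an anti-aliasing filter first.

```python
import io
import struct
import wave

TARGET_RATE = 16_000  # Hz, as stated in item 3
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit precision


def downmix_to_mono(samples, channels):
    """Average interleaved signed 16-bit samples across channels."""
    frames = len(samples) // channels
    return [
        sum(samples[i * channels:(i + 1) * channels]) // channels
        for i in range(frames)
    ]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampler (assumption: no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for n in range(out_len):
        pos = n * src_rate / dst_rate          # position in the source signal
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(int(round(samples[i] * (1 - frac) + nxt * frac)))
    return out


def write_wav(samples, fileobj):
    """Write mono, 16-bit, little-endian PCM at TARGET_RATE."""
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(TARGET_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


# Hypothetical input: 480 interleaved stereo frames captured at 48 kHz.
stereo = [100, 300] * 480
mono = resample_linear(downmix_to_mono(stereo, channels=2), src_rate=48_000)
buffer = io.BytesIO()
write_wav(mono, buffer)
```

Reopening `buffer` with `wave.open(buffer, "rb")` after `buffer.seek(0)` lets the stored channel count, sample width, and frame rate be checked against the target format.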
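
The phrase compensation method in item 4 amounts to ordering the prompt list by how often each phrase has already been recorded. The sketch below is one possible reading, not the authors' implementation; the function name `order_phrases` and the placeholder phrase identifiers are hypothetical.

```python
def order_phrases(phrases, recording_counts):
    """Return phrases with the least-recorded ones first.

    Phrases missing from recording_counts are treated as never recorded.
    sorted() is stable, so phrases with equal counts keep their list order.
    """
    return sorted(phrases, key=lambda phrase: recording_counts.get(phrase, 0))


# Hypothetical data: phrase identifiers and how many recordings each has.
phrases = ["phrase_a", "phrase_b", "phrase_c"]
recording_counts = {"phrase_a": 5, "phrase_b": 2}

queue = order_phrases(phrases, recording_counts)
print(queue)
```

Because the list is re-sorted from current counts each time it is served, phrases that fall behind move back to the front, which pushes all counts toward the same value over time.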
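
The connected and offline modes in item 5 can be sketched as a local queue that is flushed when a connection becomes available. This is an illustrative Python sketch only: the mobile client is not described as being written in Python, and the class `OfflineQueue`, the `pending` table, and the `send` callable (standing in for the REST call) are all hypothetical.

```python
import sqlite3


class OfflineQueue:
    """Store payloads locally while offline and forward them once online."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def submit(self, payload, send, online):
        """Send immediately when online; otherwise keep the payload locally."""
        if online:
            send(payload)
            return True
        self.db.execute("INSERT INTO pending (payload) VALUES (?)", (payload,))
        self.db.commit()
        return False

    def flush(self, send):
        """Send every stored payload in insertion order; return how many."""
        rows = self.db.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            send(payload)
            # Delete only after a successful send so nothing is lost on error.
            self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self.db.commit()
        return len(rows)

    def pending_count(self):
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]


# Hypothetical usage: `delivered.append` stands in for the REST upload.
delivered = []
q = OfflineQueue()
q.submit("recording-1", delivered.append, online=False)
q.submit("recording-2", delivered.append, online=True)
flushed = q.flush(delivered.append)
```

Deleting each row only after its send succeeds means an interrupted flush leaves the remaining recordings in the local database for the next attempt.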
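
Item 7 mentions MD5 in Python. One plausible use, assumed here rather than stated in the source, is computing a digest on the client so that the server can confirm the payload arrived unaltered. The helper names below are hypothetical; `hashlib.md5` is the standard-library call.

```python
import hashlib


def md5_digest(payload: bytes) -> str:
    """Return the hexadecimal MD5 digest of the payload."""
    return hashlib.md5(payload).hexdigest()


def is_unaltered(payload: bytes, expected_digest: str) -> bool:
    """Check a received payload against the digest sent alongside it."""
    return md5_digest(payload) == expected_digest


# Hypothetical payload; in practice this would be the uploaded audio bytes.
sent = b"hello"
digest = md5_digest(sent)
print(digest)  # 5d41402abc4b2a76b9719d911017c592
```

A digest of this kind detects accidental corruption in transit; confidentiality would have to come from a separate mechanism, such as TLS terminated at the Nginx layer.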