In recent years, voice recognition systems, and speech recognition systems more
generally, have attracted a great deal of interest. By speech recognition we
mean the translation of spoken words into text.
Companies like Google and Microsoft have shown strong interest in this
technology, and this is closely tied to their investments in handheld devices
such as smartphones and tablets. I expect the topic to remain popular in the
coming years, because the trend is towards touch-less interfaces.
In this work I tried to understand how neural networks can be applied to
speech, and in particular to vowel recognition. Speaking is one of the most
important human abilities, and we learn much of it as children. Since neural
networks try to reproduce the behavior of the human brain, I decided to apply
them to this basic (but very complex) human task. My goal is to understand how
well neural networks suit this field, and which strategies should be taken into
account.
In short, the goal of this work is to verify whether a vowel can be recognized
when it is pronounced on its own; if that works, the next step is to try to
recognize vowels within a word.
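One possible way to frame the problem, assuming a spectrum-based approach (the features the application actually uses are not described in this section), is to reduce each frequency-spectrum frame to a short, loudness-independent vector that a network can take as input, and then map the network's outputs back to a vowel. The TypeScript sketch below is hypothetical: `toFeatureVector`, `decodeVowel`, the band count, and the five-vowel label set are illustrative choices, not taken from the project.

```typescript
// Hypothetical sketch, not the project's code. Assumes `spectrum` holds one
// magnitude value per FFT bin, e.g. the 0..255 bytes of a Web Audio AnalyserNode.
const VOWELS = ["a", "e", "i", "o", "u"] as const; // assumed label set
type Vowel = (typeof VOWELS)[number];

// Average consecutive FFT bins into `bands` equal-width bands (leftover bins at
// the top of the spectrum are ignored), then divide by the largest band so the
// vector lies in [0, 1] regardless of how loudly the vowel was spoken.
function toFeatureVector(spectrum: ArrayLike<number>, bands: number): number[] {
  if (bands <= 0 || spectrum.length < bands) {
    throw new Error("need at least one FFT bin per band");
  }
  const width = Math.floor(spectrum.length / bands);
  const features: number[] = [];
  for (let b = 0; b < bands; b++) {
    let sum = 0;
    for (let i = b * width; i < (b + 1) * width; i++) sum += spectrum[i];
    features.push(sum / width);
  }
  const peak = Math.max(...features);
  // A silent frame (all zeros) stays all zeros instead of dividing by zero.
  return peak > 0 ? features.map((v) => v / peak) : features;
}

// Pick the vowel whose output neuron fired most strongly (one output per vowel).
function decodeVowel(outputs: ArrayLike<number>): Vowel {
  if (outputs.length < VOWELS.length) {
    throw new Error("expected one output per vowel");
  }
  let best = 0;
  for (let i = 1; i < VOWELS.length; i++) {
    if (outputs[i] > outputs[best]) best = i;
  }
  return VOWELS[best];
}
```

For example, `toFeatureVector([0, 10, 30, 50], 2)` averages the bins into bands of 5 and 40 and returns `[0.125, 1]`, and `decodeVowel([0.1, 0.05, 0.8, 0.1, 0.02])` returns `"i"`.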
I also tried to make the project as accessible as possible: all you need is a
browser (in particular Chrome or Safari). For audio I used the Web Audio API, an
emerging standard related to HTML5 that offers a wide range of tools for
processing sound. For the neural network side I used brain-js, a library that
worked very well for this purpose.
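As an illustration of how these two pieces can fit together, here is a hedged TypeScript sketch. It reads one frame of byte frequency data through the two `AnalyserNode` members it needs (`frequencyBinCount` and `getByteFrequencyData`) and packages it as a training item in the `{ input, output }` shape that brain-js's `NeuralNetwork.train` accepts, with values scaled to 0..1 as the library expects. `FrequencySource`, `readFrame`, and `captureSample` are hypothetical names, not the project's code; the browser wiring appears only as comments because it needs a microphone.

```typescript
// Hypothetical sketch, not the project's code. FrequencySource lists only the
// two AnalyserNode members used below, so a real AnalyserNode satisfies it
// structurally and a fake object can stand in for it outside the browser.
interface FrequencySource {
  readonly frequencyBinCount: number;
  getByteFrequencyData(array: Uint8Array): void;
}

// Training item in the { input, output } shape accepted by brain-js's
// NeuralNetwork.train; output is a hash keyed by vowel, e.g. { a: 1 }.
interface TrainingSample {
  input: number[];
  output: Record<string, number>;
}

// Read one frame of byte magnitudes (0..255) and scale it to 0..1.
function readFrame(source: FrequencySource): number[] {
  const bins = new Uint8Array(source.frequencyBinCount);
  source.getByteFrequencyData(bins);
  return Array.from(bins, (v) => v / 255);
}

// Label the current frame with the vowel the user is pronouncing.
function captureSample(source: FrequencySource, vowel: string): TrainingSample {
  return { input: readFrame(source), output: { [vowel]: 1 } };
}

// Browser wiring (needs microphone permission, so shown only as comments):
//   const ctx = new AudioContext();
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const analyser = ctx.createAnalyser();
//   analyser.fftSize = 2048; // frequencyBinCount is fftSize / 2 = 1024
//   ctx.createMediaStreamSource(stream).connect(analyser);
//   const samples = [captureSample(analyser, "a") /* , more frames... */];
// Training and recognition with brain-js:
//   const net = new brain.NeuralNetwork();
//   net.train(samples);
//   const scores = net.run(readFrame(analyser)); // hash of per-vowel scores
```

Declaring the small `FrequencySource` interface instead of requiring `AnalyserNode` directly keeps the functions testable outside the browser, since a real `AnalyserNode` already has both members.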
DEMO
To use the application, just click
this link.
If you want to run it locally, you need a local web server (such as Apache, or
WAMP for Windows users). Put the project folder in the server's www folder and
open the index page from localhost.
If you want to test my training set (an Italian-language one), you can load it
from the application.
SPECIFICATION DOCUMENT
To continue reading, download the whole document here.