{"id":3568,"date":"2020-02-29T16:06:15","date_gmt":"2020-02-29T21:06:15","guid":{"rendered":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/?p=3568"},"modified":"2020-02-29T16:11:32","modified_gmt":"2020-02-29T21:11:32","slug":"deepspeech-on-the-jetson-nano","status":"publish","type":"post","link":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/?p=3568","title":{"rendered":"DeepSpeech on the Jetson Nano"},"content":{"rendered":"<p><a href=\"https:\/\/github.com\/mozilla\/DeepSpeech\">DeepSpeech<\/a> is an open source speech recognition engine developed by Mozilla. It uses machine learning to convert speech to text. Since it relies on TensorFlow and Nvidia&#8217;s CUDA it is a natural choice for the Jetson Nano which was designed with a GPU to support this technology. Unfortunately, getting this running is not easy so I thought I would write a helpful bog post with some tips.<\/p>\n<p>First, the hard part of compiling DeepSpeech for the Jetson Nano has already been done for you. Go to <a href=\"https:\/\/github.com\/domcross\/DeepSpeech-for-Jetson-Nano\/releases\/tag\/v0.6.0\">https:\/\/github.com\/domcross\/DeepSpeech-for-Jetson-Nano\/releases\/tag\/v0.6.0<\/a> and download the <strong>deepspeech-0.6.0-cp36-cp36m-linux_aarch64.whl <\/strong> and <strong>libdeepspeech.so<\/strong> files from the GitHub repository. That should be all the instruction you need. Unfortunately it is not that easy.<\/p>\n<p>Second, install the Python wheel from the file. You cannot install DeepSpeech without this downloaded file you provide:<\/p>\n<pre>sudo pip install deepspeech-0.6.0-cp36-cp36m-linux_aarch64.whl<\/pre>\n<p>If you are not familiar with Linux, you may be wondering where to copy the <strong>libdeepspeech.so<\/strong> file. 
Run the following command to determine where to copy the libdeepspeech.so file:<\/p>\n<pre>cat \/etc\/ld.so.conf.d\/*<\/pre>\n<p>This indicates that \/usr\/local\/lib would be a good location, so copy the file there:<\/p>\n<pre>sudo cp libdeepspeech.so \/usr\/local\/lib<\/pre>\n<p>But just copying the file is not enough. You need to run another command so Linux knows about this new shared library:<\/p>\n<pre>sudo ldconfig<\/pre>\n<p>Finally, run the following command to see if DeepSpeech is working:<\/p>\n<pre>rsrobbins@nvidia-ai:~$ deepspeech --version\r\nTensorFlow:\r\nDeepSpeech:\r\nrsrobbins@nvidia-ai:~$<\/pre>\n<p>You are supposed to get version numbers for TensorFlow and DeepSpeech, but both are blank. At least you are not getting any errors. Next, you need to download the pre-trained English models from <a href=\"https:\/\/github.com\/mozilla\/DeepSpeech\">https:\/\/github.com\/mozilla\/DeepSpeech<\/a> and extract them. The deepspeech-0.6.1-models.tar.gz file is 1.14 GB, so you might want to download it using a computer with a decent Internet connection and copy the file to your Jetson Nano.<\/p>\n<p>You can now transcribe an audio file:<\/p>\n<pre><span class=\"style1\">rsrobbins@nvidia-ai<\/span>:~$ cd deepspeech\r\n<span class=\"style1\">rsrobbins@nvidia-ai<\/span>:~\/deepspeech$ deepspeech --model deepspeech-0.6.1-models\/output_graph.pbmm --lm deepspeech-0.6.1-models\/lm.binary --trie deepspeech-0.6.1-models\/trie --audio audio\/2830-3980-0043.wav\r\nLoading model from file deepspeech-0.6.1-models\/output_graph.pbmm\r\nTensorFlow:\r\nDeepSpeech:\r\n2020-02-29 14:46:19.470759: I tensorflow\/stream_executor\/platform\/default\/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1\r\n2020-02-29 14:46:19.479426: I tensorflow\/stream_executor\/cuda\/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero\r\n2020-02-29 14:46:19.479575: I tensorflow\/core\/common_runtime\/gpu\/gpu_device.cc:1640] Found device 0 with 
properties:\r\nname: NVIDIA Tegra X1 major: 5 minor: 3 memoryClockRate(GHz): 0.9216\r\npciBusID: 0000:00:00.0\r\n2020-02-29 14:46:19.479619: I tensorflow\/stream_executor\/platform\/default\/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.\r\n2020-02-29 14:46:19.479744: I tensorflow\/stream_executor\/cuda\/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero\r\n2020-02-29 14:46:19.479900: I tensorflow\/stream_executor\/cuda\/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero\r\n2020-02-29 14:46:19.479978: I tensorflow\/core\/common_runtime\/gpu\/gpu_device.cc:1763] Adding visible gpu devices: 0\r\n2020-02-29 14:46:20.310523: I tensorflow\/core\/common_runtime\/gpu\/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:\r\n2020-02-29 14:46:20.310602: I tensorflow\/core\/common_runtime\/gpu\/gpu_device.cc:1187]\u00a0\u00a0\u00a0\u00a0\u00a0 0\r\n2020-02-29 14:46:20.310635: I tensorflow\/core\/common_runtime\/gpu\/gpu_device.cc:1200] 0:\u00a0\u00a0 N\r\n2020-02-29 14:46:20.310884: I tensorflow\/stream_executor\/cuda\/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero\r\n2020-02-29 14:46:20.311108: I tensorflow\/stream_executor\/cuda\/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero\r\n2020-02-29 14:46:20.311283: I tensorflow\/stream_executor\/cuda\/cuda_gpu_executor.cc:972] ARM64 does not support NUMA - returning NUMA node zero\r\n2020-02-29 14:46:20.311425: I tensorflow\/core\/common_runtime\/gpu\/gpu_device.cc:1326] Created TensorFlow device (\/job:localhost\/replica:0\/task:0\/device:GPU:0 with 704 MB memory) -&gt; physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)\r\nLoaded model in 1.53s.\r\nLoading language model from files deepspeech-0.6.1-models\/lm.binary deepspeech-0.6.1-models\/trie\r\nLoaded language model in 0.0271s.\r\nRunning 
inference.\r\nexperience proof less\r\nInference took 7.315s for 1.975s audio file.\r\n<span class=\"style1\">rsrobbins@nvidia-ai<\/span>:~\/deepspeech$<\/pre>\n<p>You might be wondering where the heck the transcribed text is. This program does not have a very intuitive user interface. The transcribed text is actually in the output directly after &#8220;Running inference&#8221; and reads &#8220;experience proof less&#8221;. The demo WAV file has only three spoken words, and the actual speech in the audio file is &#8220;experience proves this&#8221;.<\/p>\n<p>Although the demo audio files from Mozilla work well enough, you may need to install Sound eXchange (SoX) to support conversion of your own audio files; DeepSpeech expects it to be installed. Naturally, there is no mention of this requirement in the documentation. Run this command to install SoX:<\/p>\n<pre>sudo apt-get install sox<\/pre>\n<p>Two final tips: run DeepSpeech using sudo if you get a permissions error, and simply run it again if the GPU runs out of memory.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>DeepSpeech is an open source speech recognition engine developed by Mozilla. It uses machine learning to convert speech to text. 
Since it relies on TensorFlow and Nvidia&#8217;s CUDA it is a natural choice for the Jetson Nano which was designed &hellip; <a href=\"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/?p=3568\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1606],"tags":[1875,1876,1877],"_links":{"self":[{"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=\/wp\/v2\/posts\/3568"}],"collection":[{"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3568"}],"version-history":[{"count":3,"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=\/wp\/v2\/posts\/3568\/revisions"}],"predecessor-version":[{"id":3571,"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=\/wp\/v2\/posts\/3568\/revisions\/3571"}],"wp:attachment":[{"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3568"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3568"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/williamsportwebdeveloper.com\/cgi\/wp\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3568"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}