Build a transcription application

Speech-to-Text using AssemblyAI and Streamlit

Reda Merzouki
5 min read · Jan 20, 2022
Photo by History in HD on Unsplash

Do you need to convert pre-recorded audio and video files as well as real-time audio streams into text? If so, you should definitely consider using the AssemblyAI APIs.

AssemblyAI is a deep learning company that builds powerful APIs to transcribe and understand audio.

In this article, I will focus on the first category of use cases, i.e., converting pre-recorded audio and video files into text. I will describe, step by step, how to build a speech-to-text transcription application leveraging the AssemblyAI APIs, deploy it to the cloud, and share it with your audience. The application is built with Python, AssemblyAI, and Streamlit, and is deployed on Streamlit Cloud.

As I wrote in one of my previous articles, Streamlit is a great open-source framework that makes it easy to create and share beautiful custom web applications for machine learning and data science.

The Speech-to-Text web application I built was inspired by Chanin Nantasenamat's video. In fact, I started from Chanin's code and made some modifications, mainly to handle uploading audio or video files from your local machine in addition to YouTube URLs. I also describe how to deploy the local Streamlit application to the cloud.

The aim of my approach is, on the one hand, to share my Speech-to-Text application with as many people as possible by deploying it on Streamlit Cloud and, on the other hand, to demonstrate how easy it is to build such an application in a way that people both enjoy and understand.

Those who want to know more about the code, tools, and files behind the deployment of the application can refer to my GitHub repository here.

1- Required tools to create your recipe:

To build this application you need Python, the Streamlit and pytube libraries, the requests library to call the AssemblyAI API, a free AssemblyAI API key, and a GitHub account plus a Streamlit Cloud account for deployment.

2- The code:

The following code is quite straightforward, consisting of three custom functions.

Function #1 -> def get_yt(URL): this function handles a YouTube video through its URL. It retrieves the audio of the YouTube video using the pytube library and saves it into your current working directory.
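A minimal sketch of what this function might look like, based on the description above; the output filename "audio.mp4" is an assumption for illustration, and the actual code in the repository may differ.

# Hypothetical sketch of Function #1; not the exact code from the repository.
from pytube import YouTube

def get_yt(URL):
    """Download the audio track of a YouTube video into the working directory."""
    yt = YouTube(URL)
    # Keep only the audio stream and save it locally as "audio.mp4" (assumed name).
    stream = yt.streams.filter(only_audio=True).first()
    stream.download(output_path=".", filename="audio.mp4")
    return "audio.mp4"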

Function #2 -> def upload_file(uploaded_file): this function handles the second option, i.e., a file uploaded from your local machine. It writes the uploaded audio into your current working directory.
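A sketch under the same assumptions: uploaded_file is the object returned by Streamlit's st.file_uploader, and the local filename is again assumed.

# Hypothetical sketch of Function #2; writes the uploaded bytes to disk.
def upload_file(uploaded_file):
    """Persist a file uploaded via st.file_uploader to the working directory."""
    # getbuffer() exposes the raw bytes of the uploaded file.
    with open("audio.mp4", "wb") as f:
        f.write(uploaded_file.getbuffer())
    return "audio.mp4"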

Function #3 -> def transcribe_audio_file(): this function uploads the mp4 audio file from your current working directory to AssemblyAI, where it is transcribed to text and retrieved. For this purpose, we successively use the following API endpoints: https://api.assemblyai.com/v2/upload and https://api.assemblyai.com/v2/transcript.
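The flow of this function can be sketched with the requests library as follows; only the two endpoints come from the article, while the polling loop, interval, and filename are assumptions made for illustration.

import time
import requests

# In the deployed app the key is read from secrets.toml (see the deployment
# section below); hard-coding it here is only for illustration.
HEADERS = {"authorization": "<your AssemblyAI API key>"}

def transcribe_audio_file(filename="audio.mp4"):
    """Upload a local audio file to AssemblyAI and return the transcript text."""
    # 1) Upload the raw audio bytes; AssemblyAI returns a temporary upload_url.
    with open(filename, "rb") as f:
        upload_res = requests.post(
            "https://api.assemblyai.com/v2/upload", headers=HEADERS, data=f
        )
    audio_url = upload_res.json()["upload_url"]

    # 2) Create a transcription job for the uploaded file.
    job = requests.post(
        "https://api.assemblyai.com/v2/transcript",
        headers=HEADERS,
        json={"audio_url": audio_url},
    ).json()

    # 3) Poll the job until it completes, then return the transcript text.
    while True:
        poll = requests.get(
            f"https://api.assemblyai.com/v2/transcript/{job['id']}",
            headers=HEADERS,
        ).json()
        if poll["status"] == "completed":
            return poll["text"]
        if poll["status"] == "error":
            raise RuntimeError(poll["error"])
        time.sleep(5)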

3- Testing our code locally:

First, you need to install the required libraries:

pip install streamlit pytube requests

Then run the app code:

streamlit run audio_to_text_2.py

We test our application with a YouTube URL and with audio or video mp4 files uploaded from our local machine.

Speech-to-text application running locally
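For reference, here is a minimal sketch of how the two input paths could be wired together in the Streamlit script; the widget choices are assumptions, and the actual audio_to_text_2.py may be organized differently. It relies on the three functions described above.

import streamlit as st

st.title("Speech-to-Text with AssemblyAI")

# Let the user choose between a YouTube URL and a local file (assumed layout).
source = st.radio("Audio source", ["YouTube URL", "Upload a local file"])

if source == "YouTube URL":
    url = st.text_input("Paste a YouTube URL")
    if url:
        get_yt(url)                          # Function #1: fetch the audio locally
        st.write(transcribe_audio_file())    # Function #3: transcribe it
else:
    uploaded = st.file_uploader("Choose an mp4 audio/video file", type=["mp4"])
    if uploaded is not None:
        upload_file(uploaded)                # Function #2: save it locally
        st.write(transcribe_audio_file())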

As soon as we are confident that the application is running properly, we initiate the deployment on Streamlit Cloud. Before that, we make sure to update our GitHub repository with any changes made in our local project directory: on Streamlit Cloud, apps are deployed directly from their GitHub repository.

4- Pushing our code changes to GitHub:

git status
git add .
git commit -m "my message"
git push origin main

5- Deploying the application on Streamlit:

Go back to the application page running locally, open the menu at the top right, and select "Deploy this app".

Deploy the app on Streamlit

You will land on the following page; click on "Advanced settings…".

Copy your API key stored in the secrets.toml file, paste it into the Secrets field below, and hit Save.
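For reference, this is roughly how the secret can be declared and read; the key name api_key is a hypothetical choice, so use whatever name your script actually expects.

import streamlit as st

# Locally, the key lives in .streamlit/secrets.toml; on Streamlit Cloud it goes
# into the "Secrets" box of the Advanced settings, e.g.:
#   api_key = "your-assemblyai-api-key"
# "api_key" is a hypothetical name; match it to what audio_to_text_2.py reads.
assemblyai_key = st.secrets["api_key"]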

Now, hit Deploy!

Notice that when changes are pushed to your GitHub repository, they are detected, the code changes are pulled automatically, and the app is updated.

In the following short video, I show you step by step how to deploy the Speech-to-Text transcription application on Streamlit Cloud and how to use it.

For your information, my web app is deployed on Streamlit Cloud and is accessible at the following URL: https://share.streamlit.io/rmerzouki/speech-to-text/main/audio_to_text_2.py

On average, turnaround time is roughly 15–30% of the audio file’s duration, with a minimum processing time of around 20–30 seconds.

6- Recreate the application and use it:

If you are interested in this application, you may simply follow the steps described in the README.md file, available in my GitHub repository here, to recreate it on your side, use it, and even enhance it.

7- Conclusion:

I hope you enjoyed reading this article, in which I described a simple way to build a speech-to-text transcription application using AssemblyAI and Streamlit, and that it will be useful in your daily work. Do not hesitate to use the code and files in my GitHub repository, and feel free to enrich this work by suggesting changes and/or adding new features.

Should you have any questions or would like to stay in touch, feel free to contact me on LinkedIn: Reda Merzouki

Thank you for reading!


Reda Merzouki

Senior Data Scientist & Solutions Architect passionate about data, digital transformation and mathematics underlying machine learning.