How To Use Whisper AI - The Only Guide You Need
From ChatGPT to DALL-E, and now Whisper, OpenAI has set the stage for the AI revolution with some of
the most valuable and mind-blowing tools you will ever find. Its youngest child, Whisper, is a
transcription tool that beats all the rest in time, cost, and accuracy.
While it has gained much praise for being the best, one concern remains: few people know how to use it. The
fact that you can’t download and install it like ordinary software is a big letdown to would-be users.
In my research, common complaints from respondents included “It's too technical to use!” and “You have to
go through numerous developer notes that are tiresome to read!”
If this is a problem you have encountered, here is an easy step-by-step guide on how to use OpenAI’s
Whisper.
A meeting head who wants to derive the context of a previously recorded Zoom meeting
Our guide is a step-by-step process for installing Whisper on Windows for offline use. To get started, you
need several prerequisites on your computer to ensure a smooth download and install.
1. Python
2. Git
3. Rust
4. NVIDIA CUDA
5. PIP
6. PyTorch
7. FFmpeg
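Once the prerequisites are installed, a quick way to see which of the command-line tools are still missing is to look each one up on the PATH. The sketch below is only an illustration: the executable names are assumptions (for example, "rustc" stands in for Rust), and PyTorch and CUDA are not covered because they are not plain command-line programs.

```python
import shutil

# Executable names are assumptions: "rustc" stands in for Rust, and
# "python"/"pip" may be named differently on some systems.
REQUIRED_TOOLS = ["python", "pip", "git", "rustc", "ffmpeg"]


def missing_tools(tools, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All command-line prerequisites were found.")
```

Any name it reports points to a tool that still needs installing or adding to the Path.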
Python
For this installation, we will use Python version 3.9.9, but its dependencies allow it to work with versions
between 3.7 and 3.11.
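If Python is already on your computer, you can check whether it falls inside that range before installing anything else. This is a minimal sketch that uses the 3.7 to 3.11 range quoted above; the function name is made up for illustration.

```python
import sys

# Range taken from the guide: Python 3.7 through 3.11.
MIN_VERSION = (3, 7)
MAX_VERSION = (3, 11)


def is_supported(version=None):
    """Return True if the (major, minor) version falls in the guide's range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) <= MAX_VERSION


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    verdict = "inside" if is_supported() else "outside"
    print(f"Python {current} is {verdict} the 3.7 to 3.11 range.")
```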
Head to the Python website and pick the release you want to download; releases are listed by date.
For this guide, I chose Python 3.9.9. Click on it and scroll to the section with the installation files.
Click on the file best suited to your system. The download will start immediately.
Once done, install the software on your system. When installing Python for the first time, remember to check
the "Add Python to PATH" box at the bottom of the first page of the installer. This allows you to run Python
from a terminal. Failure to check this box can cause the entire Whisper installation to fail.
Git
Since the OpenAI Whisper files are in a GitHub repository, you need to download, configure, and install Git
on your system to access these files.
Visit Git for Windows and choose an installer that suits your device.
Rust
Installing Rust on your system will help you avoid errors when building the wheels for tokenizers, a
dependency that some Python-based projects must compile during installation.
1. Head to Rust’s official site and choose an installer that best fits your computer system.
2. Open your command interface and run the following command: pip install setuptools-rust
N.B: To open a CMD interface, press ‘Windows+R’ to open the Run dialog, type ‘cmd’, then click ‘OK.’
NVIDIA CUDA
If you have used any AI tool before, you already know that these tools need a lot of computation power.
Therefore, running them on devices that have an NVIDIA GPU with NVIDIA CUDA installed is highly
favorable. CUDA lets software run general-purpose computations on the GPU, which handles this kind of
workload far more efficiently than a CPU alone.
Unfortunately, you can only install CUDA on devices that have NVIDIA GPUs. However, this does not mean
you cannot use Whisper on CPU-only devices. As you will see later, Whisper comes in several model sizes:
tiny, base, small, medium, and large. The larger the model, the more computation power it needs, and vice
versa. Therefore, all users, CPU or GPU, can benefit.
If your device supports NVIDIA CUDA, visit the NVIDIA website and download the latest CUDA version
compatible with PyTorch.
PIP
PIP is a package installer and management tool for Python applications and packages. It’s a necessity if you
want to manage all your PyPI installations using the command line.
Newer versions of Python come with PIP already installed. However, if you are running an older version,
you must install it yourself.
To check whether PIP is installed on your device, open your cmd and run the command:
pip help
If you get an error response, you must install it on your device.
Visit https://pip.pypa.io/en/stable/installation/ for a step-by-step guide on installing PIP on your system.
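The same check can be done from Python itself, which helps when the pip command is not on the Path but the pip module is installed. The sketch below asks the current interpreter to run its own pip module; the function name is made up for illustration.

```python
import subprocess
import sys


def pip_available():
    """Return True if `python -m pip --version` runs successfully."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("pip is installed" if pip_available() else "pip is missing")
```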
PyTorch
PyTorch is a deep-learning library that can run models on both GPUs and CPUs.
Developers prefer it due to its speed and flexibility of implementation.
To install it, go to the PyTorch website and choose your installation preferences based on what you will be
using.
Copy and run the command in your cmd interface to download PyTorch.
N.B: If you use a GPU, select CUDA 11.7 or 11.8. Select the CPU if your device does not have an NVIDIA
graphics card.
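After PyTorch is installed, you can ask it whether it actually sees a CUDA-capable GPU. The sketch below falls back to "cpu" when PyTorch is not installed yet or no NVIDIA GPU is detected; pick_device is a made-up helper name.

```python
def pick_device():
    """Return "cuda" if PyTorch can see an NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch  # only available once PyTorch has been installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Whisper would run on:", pick_device())
```

If this prints "cpu" on a machine with an NVIDIA card, the CPU-only build of PyTorch was probably installed.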
FFmpeg
FFmpeg is one of the most critical tools in this list since it converts audio to a format Whisper can
process. To download it:
Head to the download page of the FFmpeg website. Scroll down to where the Windows icon is and click on it.
Click on one of the two options that appear below it. I have chosen the ‘Windows builds by BtbN.’ This
will open a new page where you will find various FFmpeg assets.
Scroll down and select the one that matches your system. For me, I'll choose the bigger ‘Win64’ gpl build.
Click on it to download the zip folder containing the files.
Extract the files to a folder and open it. In the bin folder, you will find three applications (ffmpeg,
ffplay, and ffprobe) that you must make available on your system.
To do so, head to the local disk C and create a folder. Name it ‘Path.’ Then, copy the three applications
and paste them into the ‘Path’ folder on the local disk.
Click the address bar at the top of the window to copy the folder path ‘C:\Path’
Next, click on the Start button and search for “Edit environment variables.” Open it, edit the Path
variable, and add ‘C:\Path’ as a new entry.
To confirm the installation is successful, open a new cmd prompt window and run ‘ffmpeg.’ The installation
succeeded if the command prints FFmpeg's version and configuration details rather than an error.
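If the ffmpeg command is still not recognized, check whether the folder really made it into the Path variable. The sketch below compares a folder against the Path entries while ignoring letter case and trailing slashes; dir_on_path is a made-up helper, and C:\Path is the folder created in this guide.

```python
import os


def dir_on_path(directory, path_value=None, sep=os.pathsep):
    """Return True if `directory` is one of the entries in the PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def normalize(entry):
        # Windows paths are case-insensitive; ignore trailing slashes too.
        return entry.strip().rstrip("\\/").lower()

    target = normalize(directory)
    return any(normalize(entry) == target for entry in path_value.split(sep))


if __name__ == "__main__":
    # C:\Path is the folder created in this guide.
    print(dir_on_path(r"C:\Path"))
```

Run it from a newly opened command prompt, since windows that were already open keep the old Path value.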
Install Whisper
Since everything is ready, you can now install Whisper. To do so:
1. Open your command console and run the command below:
pip install git+https://github.com/openai/whisper.git
If the command fails with an error about git, the pip command cannot locate git on your device. As a
result, it cannot connect to the Whisper repository. To correct this problem, download Git for Windows,
then run the pip install command again. During the git installation, check the box that adds git to the
Path automatically. This will allow pip to locate git on your device.
2. Once the installation is complete, you only need to run Whisper in a command interface:
Here, you will see all the languages the tool can work with alongside other options that can help you run the
tool, such as the Whisper model and output format. To get more information on the various options you
can run Whisper with, use the command:
whisper -h
N.B: If you encounter an error that says the command “is not recognized as an internal or external
command,” add the Scripts directory of your Python installation to the Path.
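To find out which folder that is on your machine, you can ask Python directly. This is a minimal sketch; if Whisper was installed with pip's --user option, the launcher may sit in a per-user scripts folder instead of the one printed here.

```python
import sysconfig


def scripts_directory():
    """Return the folder where pip places command-line launchers."""
    return sysconfig.get_path("scripts")


if __name__ == "__main__":
    print("Add this folder to Path:", scripts_directory())
```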
3. Click on Audio Setup and set your microphone as the recording device for a crisper take.
4. Click on the Record icon to start recording. Once done, click the Stop button to end the recording.
5. Head to ‘File’ and select ‘Export’ to save your recording as MP3, WAV, or OGG.
When using Notta:
The Chrome Extension can allow you to capture audio from a source.
To use it:
Hit ‘Start Recording’ and Play the audio source. Click ‘Stop’ to complete the recording.
N.B: Notta automatically saves all the recordings in the dashboard. To access and export them, navigate to
your account dashboard and find the recording you want to export. Notta allows you to export the audio as
an MP3.
Save the audio file you want to transcribe in a new folder. I will call my folder ‘Transcribe.’
Open a new command prompt from the new folder. To do this, click on the folder's address bar, type ‘cmd,’
and press Enter.
In the command prompt window, type ‘whisper’ followed by the name of the file you want to transcribe. If the
file name contains spaces, remember to wrap it in double quotation marks.
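If you would rather start the transcription from a script, passing the command as a list of arguments avoids quoting mistakes with file names that contain spaces. In the sketch below, "team meeting.mp3" is a hypothetical file name and build_whisper_command is a made-up helper. The example only prints the command; handing the same list to subprocess.run would actually start Whisper.

```python
import subprocess


def build_whisper_command(audio_file, model=None):
    """Build the argument list for the whisper command-line tool."""
    command = ["whisper", audio_file]
    if model is not None:
        command += ["--model", model]
    return command


if __name__ == "__main__":
    # "team meeting.mp3" is a hypothetical file name containing a space.
    command = build_whisper_command("team meeting.mp3", model="base")
    # list2cmdline shows how the list would be quoted on a Windows command line.
    print(subprocess.list2cmdline(command))
```

The printed line is what you would otherwise type by hand: whisper "team meeting.mp3" --model base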
The transcription process will begin, and the time it takes to complete will depend on
A research paper comparing the word error rate (WER) of Whisper and six other current speech
recognition models reveals that Whisper outperforms the best open-source model (NVIDIA STT) on every
data set.
As you can tell from the table above, Whisper AI takes the crown as the most accurate tool among
the speech recognition models compared.
Still, it's essential to acknowledge that fewer than five languages have a word error rate lower than 5%, and
more than 25 languages have a word error rate of 50% or above. Even so, it manages to make 50% fewer
errors than other speech recognition models.
N.B: AI speech technology is constantly improving, and Whisper AI is far from perfect. Some areas where it
may be lacking include:
It can occasionally leave out some punctuation
While it shines in performance, accuracy is still a concern for all speech recognition models,
Whisper included, especially when dealing with non-English languages.
Whisper can transcribe a total of 99 languages and translate them all into English. According to the
published results, the easiest languages to transcribe are Spanish, Italian, English, and Portuguese. All of
these have a word error rate of less than 5%.
Here is a distribution of how the languages compare in their word error rates:
Number of languages    Word error rate
4                      <5%
9                      5 - 10%
19                     10 - 20%
11                     20 - 30%
4                      30 - 40%
6                      40 - 50%
11                     50 - 90%
18                     90 - 200%
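A word error rate above 100% may look like a mistake, but it follows from the usual definition. WER is the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript, so a transcript with many extra words can have more errors than the reference has words. The sketch below implements that standard definition with a word-level edit distance. It is an illustration only; how the figures above were normalized or scored is not stated here.

```python
def word_error_rate(reference, hypothesis):
    """Return WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the cat sat"))
    print(word_error_rate("the cat sat", "the bat sat down"))
    print(word_error_rate("hello", "well hello there my friend"))
```

In the last call the reference has one word and the transcript adds four, which gives a WER of 4.0, or 400%.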
The most significant benefit of using Whisper is that it is free to use! You can run Whisper
locally without registering or paying any subscription fees.
But there is a catch. It will cost you time and resources to install and use the software. Considering OpenAI
does not provide ongoing support and integration assistance, encountering errors will create operational
setbacks.
At the same time, to get the best out of the tool, you need to use a device with a good GPU. How so?
Whisper provides five model sizes that you can use for transcription. These include
Tiny
Base
Small
Medium
Large
Each model requires a certain amount of GPU memory to operate. For example, tiny and base need about
1 GB of VRAM each, small 2 GB, medium 5 GB, and large 10 GB. The higher the processing power, the
faster the result.
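To turn those numbers into a quick decision, the sketch below picks the largest model whose quoted VRAM need fits your graphics card. The figures are the approximate ones from this guide, and largest_model_for is a made-up helper name.

```python
# Approximate VRAM needs in GB, as quoted in this guide.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Model names ordered from smallest to largest.
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def largest_model_for(vram_gb):
    """Return the largest model whose quoted VRAM need fits in `vram_gb`."""
    fitting = [name for name in MODEL_ORDER if MODEL_VRAM_GB[name] <= vram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for vram in (0.5, 1, 4, 8, 12):
        print(f"{vram} GB -> {largest_model_for(vram)}")
```

For example, a card with 4 GB of VRAM lands on the small model, while 8 GB is enough for medium but not for large.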
Ideally, an NVIDIA GPU (GTX 970 or any newer model) can serve you well.
Do not confuse speed with accuracy. The smaller models finish faster and use fewer GPU resources, but they
are generally less accurate; the larger models take more time and memory in exchange for better accuracy.
As seen above, Whisper AI is a winner in transcription accuracy. Unfortunately, it lags behind due to its
limited features, numerous failure modes, and lack of assistance. Also, it sidelines users with CPU-only
devices, as they cannot get the most out of the tool.
As such, one tool that may interest the average user, and that boasts high accuracy and everything else
Whisper lacks, is Notta.
Notta is a transcription and translation software that can record, transcribe, and translate both audio and
video. It is among the best tools for podcasters, students, and marketing teams. Notta is a web app, Chrome
extension, and mobile app that allows seamless access across devices. Some of its most notable features
include:
1. Highly accurate - Notta delivers an accuracy of 99.98%, making it better than most tools in the
market.
2. AI summary - Notta leverages GPT-4 to derive a highly accurate and concise summary from the
generated transcription to give you an overview of the whole conversation.
3. Extensive language support - It can transcribe 58 languages and translate 42, more than any other
AI tool.
4. Fast turnaround time - The transcription process is very fast. You can transcribe a 2-hour audio file in
just 5 minutes. Moreover, you don't need an expensive GPU to improve the speed!
4. Navigate to the dashboard, click on the transcribed file, and make any necessary edits using the built-in
editor.
5. When ready to export, click the ‘Download’ icon at the top right corner.
6. Choose the format you want to export and save your transcript.
Conclusion
From afar, Whisper AI may seem like a tool only for tech-literate individuals, but it is, in fact, easy to
use. The only challenge you may encounter is during the set-up. While the steps may seem technical, follow
this guide to the letter, and nothing will stand in your way.
Please note that you can only access Whisper AI on the device on which you install it. If you want a tool
that is compatible with various devices but still delivers the same level of accuracy as OpenAI’s Whisper
model, give Notta a try today.