Building a Multilingual Speech Recognition Model for RAG Without Training
About the Company:
TensorGo Technologies is an enterprise-grade, low-code PaaS company for computer vision products.
The platform lets users build complex ML/DL applications more easily by integrating our APIs.
We custom-build state-of-the-art neural networks to solve the most challenging problems in the
world. We are shaping a smarter tomorrow through our deep learning and computer vision-powered
products.
Our fundamental goal is to help companies scale up their businesses, improve their processes, bring
down costs and enhance their customer engagement most efficiently. With our powerful and
enterprise-ready solutions years ahead in the game, we make the future happen at TensorGo.
Gartner Inc. has recognized TensorGo as a Cool Vendor in AI for Computer Vision - 2022. We also
won the accolade for Best Overall Pitch in the prestigious Oracle APAC Startup Idol 2022.
Visit us at https://tensorgo.com for more information. The TensorGo team wishes you good luck!
Objective:
To build a multilingual speech recognition model without training, using a pre-trained multilingual
speech recognition model, such as Multilingual Whisper, to enable RAG to perform tasks in multiple
languages.
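As a starting point, transcription with a pre-trained checkpoint might look like the minimal sketch below. It assumes the open-source openai-whisper Python package and an ffmpeg binary on the PATH, neither of which this brief mandates. The helper names, the extension list, and the "small" checkpoint are illustrative choices only. No training or fine-tuning takes place; the checkpoint is loaded and used as-is.

```python
from pathlib import Path

# Assumption for this sketch: a small allow-list of extensions. ffmpeg, which
# Whisper uses to decode input, can read many more audio/video containers.
MEDIA_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".mp4", ".mkv", ".mov"}


def is_supported_media(path: str) -> bool:
    """Return True if the file extension looks like audio or video."""
    return Path(path).suffix.lower() in MEDIA_EXTENSIONS


def transcribe(path: str, model_name: str = "small") -> dict:
    """Transcribe one audio/video file with a pre-trained Whisper checkpoint."""
    if not is_supported_media(path):
        raise ValueError(f"Unsupported file type: {path}")

    # Imported lazily so the extension check works without the package installed.
    import whisper  # pip install openai-whisper

    # Checkpoint names ending in ".en" are English-only; the others are multilingual.
    model = whisper.load_model(model_name)
    # The spoken language is auto-detected unless a language is passed explicitly.
    result = model.transcribe(path)
    return {"language": result["language"], "text": result["text"].strip()}
```

Calling transcribe("interview.mp4") (a hypothetical file name) would return the detected language code and the transcript text, which can then be passed on to the translation and summarization steps.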
Background:
RAG is a generative model that can be used for a variety of tasks, including speech recognition,
translation, and summarization. However, RAG is currently only trained to perform these tasks in a single
language. By building a multilingual speech recognition model without training, we can enable RAG to
perform these tasks in multiple languages without the need for additional training.
Deliverables:
The following deliverables are expected for the assignment:
● A multilingual speech recognition model without training, using a pre-trained multilingual
speech recognition model, such as Multilingual Whisper.
● A report that describes the evaluation of the model.
● Code and documentation for the project.
● Translation and summarization of the transcribed text using RAG.
● Support for audio and video files as input.
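One possible shape for the retrieval step that feeds translation and summarization is sketched below, using only the Python standard library. Everything in it is an assumption made for illustration: the chunk sizes, the bag-of-words cosine scoring (a real system would more likely use multilingual embeddings), and the prompt layout. The generator model that would consume the prompt is deliberately left out.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> List[str]:
    """Split a transcript into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap  # requires size > overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def _bag(text: str) -> Counter:
    """Lower-cased bag of words; \\w+ is Unicode-aware for Python str."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 3) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = _bag(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, _bag(c)), reverse=True)
    return ranked[:k]


def build_prompt(task: str, query: str, context: List[str]) -> str:
    """Assemble a plain-text prompt for a downstream generator (not shown)."""
    joined = "\n---\n".join(context)
    return f"Task: {task}\nContext:\n{joined}\nRequest: {query}\n"
```

A typical call chain would be chunk_words(transcript), then retrieve(question, chunks), then build_prompt("summarize", question, top_chunks). Note that whitespace-based splitting is a poor fit for languages written without spaces between words, so a language-aware tokenizer may be needed there.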
Evaluation:
The intern's performance will be assessed on the following criteria:
● The quality of the transcribed text generated by the model.
● The performance of RAG on the translation and summarization tasks, using the transcribed
text generated by the model.
● The clarity and completeness of the report.
● The quality of the code and documentation.
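The brief does not name a metric for transcription quality. One common choice is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal standard-library version under that assumption; the function name is illustrative, and text normalization (casing, punctuation) is left to the caller.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For languages without whitespace word boundaries, a character-level variant of the same calculation (character error rate) is often reported instead.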
Duration:
The assignment must be submitted within 3 days. This can be extended to 5 days on request by email.
TensorGo Software Pvt Ltd. CONFIDENTIAL Page 1 of 1