International Journal of Science and Research (IJSR)

ISSN: 2319-7064
SJIF (2022): 7.942

"ZAX" - AI Voice Assistant


Rajat Sharma1, Adweteeya Dwivedi2
1B-Tech, Department of CSE, SRM Institute of Science & Technology, Delhi NCR Campus, U.P., India
Email: rrajat.sharma19[at]gmail.com
2B-Tech, Department of CSE, SRM Institute of Science & Technology, Delhi NCR Campus, U.P., India
Email: adweteeya1999[at]gmail.com

Abstract: ZAX is a virtual embedded voice assistant that uses gTTS and Python-based technology to build a personalized assistant. ZAX integrates the functionality of AIML with Google's industry-leading text-to-speech platform, making the male and female voices of the gTTS library, inspired by the Marvel world, available to the assistant. It adopts the dynamic Python pyttsx base in the adjacent phases of gTTS, which facilitates fine-tuned dialogues between the assistant and its users. ZAX helps end users with daily activities such as general conversation, query search in Google, Bing, or Yahoo, video search, image retrieval, live weather, word meanings, and predicting and reminding users of scheduled events and tasks. The combination of several contributors, such as AIML's usability and its ability to merge dynamically with platforms like Python [pyttsx] and gTTS [Google Text to Speech], gives ZAX a standard structure with general reusability and almost no maintenance.

Keywords: Voice Assistant, NLP, Neural Network, Google Search

1. Introduction

An AI voice assistant, also known as a virtual or digital assistant, is a device that uses voice recognition technology, natural language processing, and Artificial Intelligence (AI) to respond to people. Using these technologies, the device collects user messages, breaks them down, evaluates them, and gives meaningful feedback in return. Artificial intelligence can make real conversations possible. Virtual assistants understand natural language voice commands and perform tasks for users. These tasks, previously performed by a personal assistant or secretary, include taking dictation, reading text or email messages aloud, and scheduling appointments for end users. The AI assistant can also perform other activities, such as sending messages, answering phone calls, and getting directions. It also helps to read news and weather updates, open Google, YouTube, Stack Overflow, etc., answer questions, scrape the web, and play music. Although this definition emphasizes the digital form of a virtual assistant, the term virtual assistant or virtual personal assistant is also commonly used to describe contract workers who work from home and perform administrative tasks typically performed by executive assistants or secretaries. Digital assistants can also be compared with another form of consumer-facing AI programming known as smart advisers. Smart adviser programs are topic-oriented, whereas virtual assistants are task-oriented.

"Virtual assistants are typically cloud-based programs that require internet-connected devices and/or applications to function". The technologies that power virtual assistants require vast amounts of data, which feed the platforms, as well as machine learning, natural language processing, and speech recognition. There are dedicated devices that provide virtual assistance. The most popular on the market come from Amazon, Google, and Microsoft, with Alexa, Google Assistant, and Cortana respectively as their AI voice assistants.

AI voice assistants often perform simple tasks for end users, such as adding tasks to the calendar, providing information that would otherwise be searched for in an Internet browser, or controlling and checking the status of smart devices in the home. They can also send emails, set alarms, get weather reports, report the user's location, perform basic mathematical calculations, check the news, start music, and open websites such as Stack Overflow, YouTube, and Facebook.

2. Related Work

2.1 Generalization

The pie chart below shows the analysis of virtual assistants in the context of education, as well as the purpose of this work, with papers from a total of 13 countries. The highest contribution was made by England with the largest number of papers (3), followed by Russia and Switzerland (2 papers each). The remaining countries, namely Singapore, Pakistan, Canada, India, France, Bulgaria, Saudi Arabia, and Germany, are represented with 1 paper each (Figure 2.1).

Figure 2.1: Pie Chart

The bar graph below shows that the number of research papers has grown continuously since 2000, except in the year 2010 (Figure 2.2). This indicates that this field of research is progressing steadily.

Volume 11 Issue 5, May 2022


www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
Paper ID: SR22503183839 DOI: 10.21275/SR22503183839 386
Figure 2.2: Bar Graph

2.2 Specific Researches

• Although AI technologies appear to be extensively adopted, people do not use them in some cases. Technology adoption has been studied for many years, and there are several general models in the literature describing it. However, more customized models for emerging technologies, based on their features, appear necessary. In this study, the authors develop a conceptual model involving a new system quality construct, i.e., interaction quality, which they believe can better describe the adoption of AI-based technologies. [5]
• Artificial Intelligence programs have now become capable of challenging humans by providing expert systems, Neural Networks (NN), Natural Language Processing (NLP), and Speech Recognition. Artificial intelligence promises a bright future for technical inventions in various fields. This review paper presents the general idea of artificial intelligence and the use of speech recognition, and discusses the impact of artificial intelligence on the present and future world. [4]
• The project aims to develop a personal assistant for computers (PC personal assistant). It provides an easy interface for completing a variety of tasks using certain well-defined commands. Users interact with the assistant through voice commands. As a personal assistant, it assists the end user with regular activities such as general human conversation, searching queries, reading the latest news, translating words, live weather, and sending mail through voice. The software uses the device's microphone to receive voice requests, while the output takes place at the system's speaker. [11]
• "The virtual worlds offer many resources to engage their users (named avatars) like freedom of movement, teleport yourself to other locals, communication with other inhabitants (both text and voice messages), capacity to create, modify and destroy objects and the possibility of programming behaviors to these objects via scripts". The world has a surplus of resources for excelling in different fields, but they require some means of communication. [10]
• This article introduces a virtual embedded voice assistant built with gTTS and advanced Python-based technology for custom assistant development. It integrates features of AIML and Google's industry-leading platform for text-to-speech conversion, so that human voices are included through the gTTS library. It applies the Python pyttsx dynamic base in the adjacent phases of gTTS and AIML, which facilitates fine-tuned dialogues between the assistant and the user. [7]

The survey table below shows some projects with their respective pros and cons (Table 2.1).

Table 2.1: Survey Table

| S. No | Project | Technologies | Result | Issues |
|---|---|---|---|---|
| 1. | Voice Assistant using Python | Voice activation, automatic speech recognition, dialog management | Design and implementation of digital assistance | Absence of additional or multiple features |
| 2. | AI based voice assistant | Python 2.7, Spider, json, machine learning | A modern model with some advanced features established | Similar to a basic prototype and lacks multidimensionality |
| 3. | An interpretation of AIML with integration of gTTS and Python | gTTS (Google Text to Speech), AIML (Artificial Intelligence Markup Language) | Integration of gTTS, AIML | Dependency on a particular platform |
| 4. | Interoperability in virtual world | WWW (World Wide Web) services, HTTP, XML | Virtual world's communication, real world to virtual world (R2V) | Less vulnerable to modern operating systems |
| 5. | Natural language understanding | Artificial Intelligence, Natural language processing | Understanding of natural language processing, syntactic processing | Only develops the understanding of NLP, difficult to implement |
| 6. | Chatbot song recommender system | Python, chatterbot library, list trainer | Developing basic chatbot system | Dedicated to a particular feature only |
| 7. | AI chatbot in Python | Pip, NumPy, tensorflow, random | Automated communication system developed | Limited to certain queries and conversation |

3. System Analysis

3.1 Training Model

• With the help of neural networks (NN) and natural language processing (NLP), create the brain of the model.
• With the help of machine learning and deep learning modules, build emotions into the model, together with a dataset to help train the model.

3.2 Neural Networks

"NN reflects the behavior of the human brain, enabling
computer programs to recognize patterns and solve common
problems in artificial intelligence and other AI applications". An Artificial Neural Network (ANN) consists of layers of nodes: an input layer, one or more hidden layers, and an output layer. Each node is connected to other nodes and has weights and a threshold associated with it. If the output of an individual node is greater than the specified threshold, that node is activated and sends data to the next layer of the network. Otherwise, no data is sent to the next layer of the network. The network relies on training datasets to learn and improve its accuracy over time. (Figure 3.1)

Figure 3.1: Neural Networks

Think of each node as its own linear regression model, consisting of input data, weights, a bias (or threshold), and an output:

$$\sum_{i} z_i r_i + \mathit{th} = z_1 r_1 + z_2 r_2 + z_3 r_3 + \mathit{th}$$

$$\text{output} = g(r) = \begin{cases} 1 & \text{if } \sum_{i} z_i r_i + \mathit{th} \ge 0 \\ 0 & \text{if } \sum_{i} z_i r_i + \mathit{th} < 0 \end{cases}$$

Here the r_i are the inputs, the z_i are the weights, and th is the bias (threshold) term.

Once an input layer is determined, weights are assigned. These weights indicate the importance of a particular variable: larger weights contribute more to the output than other inputs. All inputs are then multiplied by their respective weights and summed, and the result is passed through the function g. When this output exceeds a certain threshold, the node is triggered and data is propagated to the next layer in the network. The output of one node thus becomes the input of the next node. This process of passing data from one layer to the next defines the network as a feed-forward network.

3.3 Natural Language Processing

NLP stands for "Natural Language Processing", a field that combines computer science, human language, and artificial intelligence. It is the technology used by machines to understand, analyze, manipulate, and interpret human language. It helps developers organize knowledge to perform tasks such as translation, book reading, speech recognition, and topic segmentation. (Figure 3.2)
1) NLP helps users ask questions about a topic and get a direct answer within seconds.
2) NLP provides accurate answers to questions. That is, it does not provide unnecessary or unwanted information.
3) NLP helps computers communicate with people in their own language.
4) Most IT industries use natural language processing to improve the efficiency and accuracy of the documentation process and to identify information in large databases.

Figure 3.2: Natural Language Processing

3.4 Speech Recognition System

The speech recognition system is the core of the voice application system. It understands the voice input given by the user, and at the same time operates the applications efficiently and generates voice feedback to the user. This system is an important component for users because it is the gateway that lets them use their voice as input. (Figure 3.3) In short, to recognize the user's speech command clearly and obtain a response from the system, the speech recognition system must cover the whole process by which the application converts the voice signal into text data and extracts the important meanings and forms of speech.

Figure 3.3: Speech Recognition System
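
The node rule and the feed-forward pass described under Neural Networks can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the ZAX training code: the function names (`node_output`, `feed_forward`) and the hand-picked weights in `XOR_LAYERS` are assumptions introduced here, chosen so that a two-layer network of threshold nodes computes XOR.

```python
from typing import List, Sequence, Tuple

# One layer = (one weight row per node, one bias/threshold term per node).
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def node_output(inputs: Sequence[float], weights: Sequence[float], th: float) -> int:
    """Step rule: 1 if sum(z_i * r_i) + th >= 0, otherwise 0."""
    weighted_sum = sum(z * r for z, r in zip(weights, inputs)) + th
    return 1 if weighted_sum >= 0 else 0


def feed_forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[int]:
    """Pass data layer by layer; each layer's outputs become the next layer's inputs."""
    activations = list(inputs)
    for weight_rows, biases in layers:
        activations = [
            node_output(activations, weights, th)
            for weights, th in zip(weight_rows, biases)
        ]
    return activations


# Hypothetical, hand-picked weights (not taken from the paper):
# hidden node 1 acts as OR, hidden node 2 as NAND, the output node as AND.
XOR_LAYERS: List[Layer] = [
    ([[1, 1], [-1, -1]], [-1, 1.5]),  # hidden layer
    ([[1, 1]], [-2]),                 # output layer
]

if __name__ == "__main__":
    for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(pair, feed_forward(pair, XOR_LAYERS))
```

In a trained network the weights and thresholds would be learned from the training dataset rather than set by hand; the sketch only shows how a node fires and how its output feeds the next layer.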

3.5 PyTorch - Machine Learning Library

PyTorch allows developers to train neural network models in a distributed manner. It uses Python's native support for peer-to-peer communication and asynchronous execution of collective operations to provide optimized performance in research and production environments.
• torch.cuda: supports CUDA tensor types that implement the same functions as CPU tensors.
• torch.nn: this package provides classes and modules to implement and train neural networks.
• torch.utils.data: this package is mainly used for creating datasets.

3.6 Linear Regression Concept

This algorithm finds a linear relationship between a dependent variable and an independent variable by minimizing the distance between the fitted line and the observed data points. It is a supervised algorithm. Here, we use a supervised machine learning approach to categorize individual categories. Using this algorithm, we created a voice assistant model that allows users to predict relationships between dependent and independent entities.

$$D = p + qI$$

$$p = \frac{(\sum D)(\sum I^2) - (\sum I)(\sum ID)}{n(\sum I^2) - (\sum I)^2}$$

$$q = \frac{n(\sum ID) - (\sum I)(\sum D)}{n(\sum I^2) - (\sum I)^2}$$

I is the independent variable
D is the dependent variable
p is the intercept and q is the slope of the line
n is the number of observations

4. Proposed System

• The voice assistant initiates voice mode and prompts the user to provide input in voice or text format. The program can also be controlled from a phone with the help of the application 'WO-MIC', which turns any Android phone into a wireless microphone and helps reduce unwanted noise from the environment.
• Wikipedia search: users can ask the assistant a question, and the assistant retrieves the data from Wikipedia over the internet. The results are displayed in the console window and read aloud, up to a limited number of lines.
• Getting current news: the user can get news about their home country, the world, technology, sports, entertainment, and more simply by giving a voice command to open the news. The assistant opens a new browser tab, and it can also fetch the data from the websites, print it to the console, and read it out to the user.
• Weather forecast: through this feature users can see the weather forecast for any location. In addition, the temperature in Kelvin and the humidity are returned.
• Open applications: the assistant opens YouTube, the Google search engine, other websites, and system applications (such as a code editor, Notepad, or Chrome), using the webbrowser Python library for websites and the os module for system applications.
• Close applications: the assistant closes the requested application by issuing the command 'TASKKILL /F /IM file.exe'.
• Automation: the application automates YouTube and any search engine with the help of the keyboard Python library. The user only needs to give the input, and the assistant performs the requested automation task.
• Repeat: the voice assistant can repeat the user's words using the takeCommand and speak functions.
• WhatsApp messages: the assistant takes the receiver's mobile number or name, the message to send, and the time at which to send it as a query. The assistant then sends the message and informs the user. This is done with the help of the pywhatkit Python library, and the message history is saved in the pywhatkit database file.
• Checking internet speed: with the help of the speedtest Python library, the assistant checks the connection speed and prints the result on the console.
• Checking my location: this feature allows users to view their current location or find directions to any location.
• Listening to music: the voice assistant plays the music requested by the user, either from the user's system or through an online search, without the user having to search for it.
• ZAX translator: this feature translates the user's original text input into the desired language. Over forty human languages are stored in the dictionary.
• Audiobooks: the voice assistant opens a book and reads it aloud in the user's preferred language with the help of the pypdf Python library.
• Notes: the assistant creates a note to save the user's important data for future use.
• Sending mails: this feature allows users to send an email to anyone in their contacts who has an email address. The assistant then reports the successful execution of the task back to the user by voice.
• Timetable notification: the voice assistant reminds the user of work according to the user's timetable and issues a notification with the help of the notify Python library.
• The voice assistant can answer queries with the help of the wikihow Python library and WolframAlpha.

• Setting alarms: a basic function of any device, this allows the user to set an alarm for a specific time.
• Chatbot: this feature converses with the user on a case-by-case basis. Whenever the user provides voice input to the assistant, the user receives the chatbot's output as a voice response from the assistant.
• Screenshot: this feature allows the user to take a screenshot of the current window, photo, or file. The assistant asks the user for the name under which to save the file in the required folder on the system for later viewing.
• Calculations: this function performs an arithmetic calculation from the user's voice command and speaks the computed result.

4.1 System Architecture

The flow first checks whether the ZAX voice assistant is active. If it is active, it asks for user input; otherwise ZAX is activated (switched on). The user then provides input as speech or text. If the input is text, it goes directly to the action to be taken, that is, the skill to be executed. If the input is speech, the speech recognition feature first converts it into text, and the flow then proceeds to the action. (Figure 4.1)

If ZAX supports the requested skill, it gives the user a positive spoken response and then executes the commands for the operation, producing both console output and speech. If the requested skill is not supported by ZAX or is inappropriate for it, ZAX gives a negative response and executes no further commands. (Figure 4.1)
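
The control flow described above (activate, accept text or speech, convert speech to text, then run a supported skill or decline) can be sketched as follows. This is a hedged illustration rather than the ZAX source: the `Assistant` class, the `SKILLS` registry, and the stand-in `speech_to_text` function are hypothetical names introduced here, and "speech" is simulated as UTF-8 bytes so that the sketch runs without a microphone.

```python
import operator
from typing import Callable, Dict, Tuple, Union

# Arithmetic operators understood by the hypothetical "calculate" skill.
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def speech_to_text(audio: bytes) -> str:
    """Stand-in recognizer (assumption): treats the 'audio' as UTF-8 text."""
    return audio.decode("utf-8")


def repeat_skill(argument: str) -> str:
    """Echo the user's words back."""
    return argument


def calculate_skill(argument: str) -> str:
    """Evaluate 'a <op> b'; raises ValueError or KeyError on malformed input."""
    left, symbol, right = argument.split()
    return str(OPERATORS[symbol](float(left), float(right)))


# Hypothetical skill registry: first word of the command -> handler.
SKILLS: Dict[str, Callable[[str], str]] = {
    "repeat": repeat_skill,
    "calculate": calculate_skill,
}


class Assistant:
    def __init__(self) -> None:
        self.active = False

    def handle(self, user_input: Union[str, bytes]) -> Tuple[bool, str]:
        """Return (positive_response, output) for one text or speech request."""
        if not self.active:
            self.active = True  # switch the assistant on before taking input
        # Speech is converted to text first; text goes straight to the action.
        text = speech_to_text(user_input) if isinstance(user_input, bytes) else user_input
        keyword, _, argument = text.strip().partition(" ")
        skill = SKILLS.get(keyword.lower())
        if skill is None:
            return False, "Sorry, I cannot do that."  # negative response, nothing executed
        return True, skill(argument)
```

For example, `Assistant().handle("calculate 3 + 4")` returns `(True, "7.0")`, while an unsupported request returns a negative response without executing anything. A real assistant would replace `speech_to_text` with an actual recognizer and speak the output as well as printing it.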

4.2 Sequence Diagram


Figure 4.2: Sequence Diagram

• The user sends a command to the voice assistant ZAX, which forwards it to the interpreter (here, the speech recognition feature). The command is then directed to the task model to perform the specific task. After processing in the task model, ZAX executes the task and gives the response or feedback to the user. (Figure 4.2)
• If some information is missing after processing in the task model, ZAX asks for that information, takes the input again, gathers all the information, and follows the same process as described above. (Figure 4.2)

4.3 Use Case Diagram

Figure 4.3: Use Case Diagram

4.4 Activity Diagram

Figure 4.4: Activity Diagram
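
The follow-up step in the sequence diagram, where ZAX asks for missing information before executing a task, can be illustrated with a small sketch. The names here (`REQUIRED_FIELDS`, `missing_fields`, `run_task`) and the `send_message` task with its three fields are assumptions made for illustration, loosely modelled on the WhatsApp feature (receiver, message, time); the `ask` callback stands in for prompting the user by voice.

```python
from typing import Callable, Dict, List

# Hypothetical task model: the fields each task needs before it can run.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "send_message": ["recipient", "message", "time"],
}


def missing_fields(task: str, info: Dict[str, str]) -> List[str]:
    """List the required fields that the user has not supplied yet."""
    return [field for field in REQUIRED_FIELDS[task] if field not in info]


def run_task(task: str, info: Dict[str, str], ask: Callable[[str], str]) -> str:
    """Ask for every missing field, then 'execute' the task and report back."""
    details = dict(info)  # copy so the caller's dictionary is not modified
    for field in missing_fields(task, details):
        details[field] = ask(field)  # follow-up prompt to the user
    summary = ", ".join(f"{name}={details[name]}" for name in REQUIRED_FIELDS[task])
    return f"{task} executed with {summary}"


if __name__ == "__main__":
    # Scripted answers replace real voice prompts in this demonstration.
    answers = {"time": "18:00"}
    print(run_task("send_message", {"recipient": "Alex", "message": "hello"}, answers.__getitem__))
```

Passing the prompt function in as a parameter keeps the sketch testable: a real assistant could supply a function that speaks the question and listens for the reply.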

5. Experimental Result
On the user's speech command, the voice assistant displays the Google search results for the query and also reads the answer aloud to the user. (Figure 5.1)


Figure 5.1: Google Search

On the user's speech command, the voice assistant opens the news. (Figure 5.2)

Figure 5.2: Open News (TOI)

On the user's speech command, the voice assistant opens 'my location'. (Figure 5.3)

Figure 5.3: Open My Location

6. Conclusion

ZAX, an AI voice assistant system, uses speech recognition, gTTS, and other AI techniques along with Neural Networks and Natural Language Processing to provide a smart system that responds to the given circumstances or conditions. It can reduce the workload of basic daily activities and can replace some human roles, such as personal secretaries employed to schedule a person's daily timetable. Critically, the system is designed to interact with other sub-systems smartly and comprehensively.

The system has the following phases: an input phase, in which data or a query is given in the form of text or speech; interpretation of voice to text; processing and storing of data; and producing output in the form of voice from the refined text to the ZAX console. The information produced at each step can then be used to retrieve patterns and analyze them for later use. This could be the main basis for artificial intelligence machines to learn and recognize patterns for people. Based on a literature study and an analysis of existing systems, we conclude that the proposed system will not only facilitate interaction with systems and modules but also keep users more organized.

7. Future Enhancement

The capacity of the database, or of the training data sets, can be increased to cover more situations that ZAX may encounter. This would improve its effectiveness and widen the range of responses it can produce. More voices can also be added as an additional feature. These limitations can be overcome by enlarging the training data sets.

The interface of the system can be further optimized, that is, made more user-friendly, comprehensive, and easy to use for a larger share of users. ZAX would then become more accessible and interactive.

References
[1] Alotto, F., Scidà, I., and Osello, A. (2020). “Building modeling with artificial intelligence and speech recognition for learning purpose.” Proceedings of EDULEARN20 Conference, Vol. 6. 7th.
[2] Beirl, D., Rogers, Y., and Yuill, N. (2019). “Using voice assistant skills in family life.” Computer-Supported Collaborative Learning Conference, CSCL, Vol. 1, International Society of the Learning Sciences, Inc. 96–103.
[3] Canbek, N. G. and Mutlu, M. E. (2016). “On the track of artificial intelligence: Learning with intelligent personal assistants.” Journal of Human Sciences, 13(1), 592–601.
[4] Malodia, S., Islam, N., Kaur, P., and Dhir, A. (2021). “Why do people use artificial intelligence (AI)-enabled voice assistants?” IEEE Transactions on Engineering Management.
[5] Nasirian, F., Ahmadian, M., and Lee, O.-K. D. (2017). “AI-based voice assistant systems: evaluating from the interaction and trust perspectives.”
[6] RAJA, K. D. P. R. A. (2020). “ZAX ai using python.”
[7] Sangpal, R., Gawand, T., Vaykar, S., and Madhavi, N. (2019). “ZAX: An interpretation of AIML with integration of gtts and python.” 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Vol. 1. 486–489.
[8] Steen, J. and Wilroth, M. (2021). “Adaptive voice control system using ai.”
[9] Terzopoulos, G. and Satratzemi, M. (2019). “Voice assistants and artificial intelligence in education.” Proceedings of the 9th Balkan Conference on Informatics. 1–6.
[10] Tibola, L. R. and Tarouco, L. M. R. (2013). “Interoperability in virtual world.” XVIII Congreso Argentino de Ciencias de la Computación.
[11] Vora, J., Yadav, D., Jain, R., and Gupta, J. (2021). “ZAX: A pc voice assistant.”
