ABSTRACT
Python is a widely used and rapidly growing language, so it is straightforward to write a script for a voice assistant in Python, and the instructions for the assistant can be tailored to the user's requirements. Speech recognition is the process of converting speech into text; it is commonly used in voice assistants such as Alexa and Siri. In Python, the SpeechRecognition library allows speech to be converted into text. Building my own assistant was an interesting task: it became easier to send emails without typing a word, to search Google without opening the browser, and to perform many other daily tasks such as playing music or opening a favourite IDE with a single voice command. In the current scenario, technology has advanced to the point where machines can perform many tasks as effectively as, or more effectively than, we can. Working on this project made me realize how the use of AI in every field is reducing human effort and saving time.
The functionalities of this project include playing songs, reporting the weather, opening websites, giving information, telling jokes, and reading the news.
A natural question is how this qualifies as AI. The virtual assistant that I have created is, admittedly, not a very good example of AI; taken on its own, it is the output of a bundle of statements rather than a learning system. Fundamentally, however, the purpose of an AI machine is to perform human tasks with the same or greater efficiency than humans, and this assistant works towards that goal.
LIST OF ABBREVIATIONS
S. No. Acronym Full Form
1. NLP Natural Language Processing
2. OOP Object-Oriented Programming
3. AI Artificial Intelligence
4. ML Machine Learning
5. IDE Integrated Development Environment
6. GUI Graphical User Interface
TABLE OF CONTENTS
Abstract
List of Symbols & Abbreviations
1. Introduction
1.1 About the Project
1.2 Objectives
2. System Analysis
2.1 Existing System
2.2 Proposed System
2.2.1 Details
2.2.2 Impact on Environment
2.2.3 Safety
2.2.4 Ethics
2.2.5 Cost
2.2.6 Type
2.2.7 Standards
2.3 Scope of the Project
2.4 Modules Description
2.5 System Configuration
3. Literature Overview
4. System Design
4.1 System Architecture
4.2 UML Diagrams
4.3 System Design
5. Sample Code
5.1 Coding
6. Testing
6.1 Testing
6.2 Test Cases
7. Output Screens
8. Conclusion
8.1 Conclusion
8.2 Further Enhancements
9. Bibliography
9.1 Book References
9.2 Website References
9.3 Technical Publication References
10. Appendices
A. Methodologies Used
B. Testing Methods Used
11. Plagiarism Report
1. INTRODUCTION
In today's era almost all tasks are digitalized. We have a smartphone in hand, which is nothing less than having the world at our fingertips. These days we do not even need to use our fingers: we just speak of the task and it is done. There are systems where we can say, "Text Dad, 'I'll be late today'", and the text is sent. That is the task of a virtual assistant.
A virtual assistant also supports specialized tasks, such as booking a flight or finding the cheapest book across various e-commerce sites and then providing an interface to place the order, helping automate search, discovery, and online ordering.
Virtual assistants are software programs that help ease your day-to-day tasks, such as searching the web, playing music, telling the time, giving the weather forecast, managing the calendar, and telling jokes. They can take commands via text (online chatbots) or by voice. Voice-based intelligent assistants need an invoking word, or wake word, to activate the listener, followed by the command. There are already many virtual assistants, such as Apple's Siri, Amazon's Alexa, and Microsoft's Cortana; for this project the wake word chosen is JIA, and the system is designed to be used efficiently on desktops.
Personal assistant software improves user productivity by managing the user's routine tasks and by providing information from online sources. JIA is effortless to use: call the wake word 'JIA' followed by the command, and within seconds it gets executed. Voice searches have come to dominate text searches; web searches conducted via mobile devices have only just overtaken those carried out on a computer, and analysts predicted that 50% of searches would be made by voice by 2020. Virtual assistants are turning out to be smarter than ever: allow your intelligent assistant to make email work for you by detecting intent, picking out important information, automating processes, and delivering personalized responses. This project was started on the premise that there is a sufficient amount of openly available data and information on the web that can be utilized to build a virtual assistant capable of making intelligent decisions for routine user activities.
1.1 About the project
Virtual assistants are programmed applications that recognize human language and commands and then perform tasks in response. They typically perform simple jobs for users, such as adding tasks to a calendar, providing information that would normally be searched for in a web browser, or controlling and checking the status of smart home devices, including lights, cameras, and thermostats.
Virtual assistants are part of the Internet of Things, a network of 'things' that are able to collect and transfer data over a wireless network without human intervention.
The core of these assistants is made of speech recognition programs and an audio engine. Once you have this core, the possibilities are endless and you can add as many features as you like.
1.2 OBJECTIVES
The main objective of building personal assistant software (a virtual assistant) is to use semantic data sources available on the web and user-generated content, and to provide knowledge from knowledge databases. The main purpose of an intelligent virtual assistant is to answer questions that users may have. This may be done in a business environment, for example on a business website with a chat interface. On the mobile platform, the intelligent virtual assistant is available as a call-button-operated service, where a voice asks the user, "Hello sir, I am your desktop assistant. Tell me how may I help you?" and then responds to verbal input.
Virtual assistants can save you a tremendous amount of time. We spend hours on online research and then on writing up the findings in our own terms of understanding; JIA can do that for you. Provide a topic for research and continue with your own tasks while JIA does the research. Another difficult task is remembering test dates, birthdays, or anniversaries; it comes as a surprise when you enter the class and realize there is a class test today. Just tell JIA in advance about your tests and she reminds you well ahead of time so you can prepare. One of the main advantages of voice search is its speed: voice is reputed to be around four times faster than typing, since we can write about 40 words per minute but can speak around 150 in the same period of time. In this respect, the ability of personal assistants to accurately recognize spoken words is a prerequisite for their adoption by consumers.
2. System Analysis
The problem with adding too many skills to a voice assistant is that there is no way for the user to memorize the list of voice commands the AI assistant can and cannot handle. As a result, when an AI assistant can perform many tasks, users expect it to be able to understand and do anything they tell it. We already have multiple virtual assistants such as Siri, Alexa, and Google Assistant, yet a number of people have issues with voice recognition. Usually, the user needs to manually manage multiple applications to complete one task. These assistants are also easier to use on mobile devices than on desktop systems. A virtual assistant device such as Alexa costs roughly Rs. 2,500 to Rs. 22,500 and above, which is a large amount for a virtual assistant, and Siri and Google Assistant work only on mobiles, not on desktops or laptops.
We already have multiple virtual assistants, but we hardly use them. A number of people have issues with voice recognition: these systems can understand English phrases but often fail to recognize them in our accent, since our way of pronunciation is quite distinct from theirs. They are also easier to use on mobile devices than on desktop systems. There is a need for a virtual assistant that can understand English in an Indian accent and work on a desktop system. When a virtual assistant is not able to answer questions accurately, it is because it lacks the proper context or does not understand the intent of the question. Its ability to answer questions relevantly comes only with rigorous optimization, involving both humans and machine learning. Continuously ensuring solid quality-control strategies will also help manage the risk of the virtual assistant learning undesirable behaviours. Such systems require a large amount of information to be fed to them in order to work efficiently.
A virtual assistant should be able to model complex task dependencies and use these models to recommend optimized plans for the user. It needs to be tested for finding optimum paths when a task has multiple sub-tasks and each sub-task can have its own sub-tasks. In such a case there can be multiple candidate paths, and the assistant should be able to consider user preferences, other active tasks, and priorities in order to recommend a particular plan.
2.2.1 Details
Algorithms Used
Natural Language Processing (NLP), a subset of Artificial Intelligence (AI), enables
chatbots to understand language as we humans speak it. It means that the virtual assistant
(VA) doesn’t just read the words, but can understand the intent of a consumer’s question. It
can also understand the context of the conversation. This lets the interaction flow as a
conversation instead of as a question-answer session. This is far harder than it appears and
only the advances in AI have made it possible. Chatbots that use NLP can converse with users
or customers just like they would with a human agent, and get answers similarly.
How do voice assistants work?
Once the voice assistant is activated and receives audio input, the voice recognition software launches a sequence of steps needed to process the data and generate an appropriate response. To put it simply, the stages a typical voice assistant goes through are speech recording coupled with noise reduction (elimination of background sounds), and identification of the speech components that the machine is able to process.
The core of most of these steps lies within natural language processing (NLP), a subfield of computer science, information engineering, and artificial intelligence concerned with the ability of machines to recognize human speech and analyse its meaning. Voice assistants mainly rely on natural language generation (NLG) and natural language understanding (NLU) as parts of NLP: NLU is responsible for accurate speech comprehension, while NLG transforms data into language understandable to the human ear. Based on this, we can distinguish the major challenges any voice assistant has to deal with in order to become a working solution.
Voice assistants have to be able to distinguish between different accents, speech rates, and pitches, as well as abstract away any ambient sounds. After this, audio segmentation steps in to divide the speech into short, consistent pieces that are later converted into text.
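As an illustration of this stage, the following minimal sketch uses the SpeechRecognition library (the same library this project uses) to record audio, sample the ambient noise so it can be filtered out, and send the recording to Google's online recognizer. The specific parameters and error handling are illustrative assumptions rather than details taken from the report.

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    # Sample ambient sound briefly so steady background noise can be filtered out.
    recognizer.adjust_for_ambient_noise(source, duration=1)
    audio = recognizer.listen(source)

try:
    # Send the recorded audio to Google's online recognizer and get text back.
    text = recognizer.recognize_google(audio)
    print("You said:", text)
except sr.UnknownValueError:
    print("Speech was not understood.")
except sr.RequestError:
    print("Could not reach the speech recognition service.")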
Much of this speech-to-text routine takes place on servers in the cloud, where your phone can access the corresponding database and the software that analyses your speech. Since those remote servers are accessed by many people, the more the system is used, the more knowledgeable it gets.
Prior to any recognition process, the system should have an implemented acoustic model (AM) trained from a speech database and a linguistic model (LM) that determines the possible sequences of words and sentences. The data is handed to machine learning, and the server then compares the trained model with the input stream.
“The software breaks your speech down into tiny, recognizable parts called phonemes — there
are only 44 of them in the English language. It’s the order, combination and context of these
phonemes that allows the sophisticated audio analysis software to figure out what exactly
you’re saying … For words that are pronounced the same way, such as eight and ate, the
software analyses the context and syntax of the sentence to figure out the best text match for
the word you spoke. In its database, the software then matches the analysed words with the
text that best matches the words you spoke," Science Line explains.
For the system to transcribe your speech correctly, it should also be able to isolate the actual wording from any background sounds. This can be done with both hardware and software. For instance, Siri uses Apple's hardware; yet, as long as noise reduction is based on algorithms, the process is first done in software. After this stage, it is either moved to the hardware (where possible, as in Apple's case) or an OS pre-processor removes abrupt noises (e.g. a baby's cry, a bark, clicks and similar sudden sounds).
For stationary noise, a random noise whose statistical properties do not change with time (e.g. street, underground, or train noise), the procedure is different. Stationary noise is used to train the AM with the help of machine learning: initially a solid noise-free database is accumulated, and background sounds are later added and included in the AM. In this way, the system is able to recognize human speech with stationary noise in the background without going through a separate noise reduction stage.
Text entity extraction enables the voice system to correlate words with their meanings and relevance within a sentence. It greatly contributes to the degree to which a voice assistant understands how the language works, and to the ability of the system to trace certain sentence patterns to certain meanings.
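One way to perform such entity extraction is with an NLP library such as spaCy; the report does not prescribe a specific tool, so the sketch below is only an assumption of how this step could look. It requires installing spaCy and its small English model with python -m spacy download en_core_web_sm.

import spacy

# Load the small English pipeline; each detected entity carries a label such as
# GPE (a place) or DATE.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Book a flight from Hyderabad to Delhi next Friday")

for ent in doc.ents:
    print(ent.text, "->", ent.label_)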
However, to interpret the exact meaning of the user's speech, it is essential to identify the single intent, or aim, of the message. The challenge is that languages provide us with multiple ways to convey one and the same intent.
Imagine, for example, that someone asks the voice assistant to talk about a certain city: what should the system extract as the real intent? Is it the latest city news, timetables, the weather forecast, or something else pertaining to the name of the city? People rarely say things explicitly and may even omit the keywords. If the voice assistant is not able to recognize the intent, it will not operate efficiently.
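To make the idea of intent recognition concrete, here is a deliberately simple, keyword-based sketch in which several phrasings map to one intent and an ambiguous "tell me about a city" request falls back to a clarification. Real assistants use trained NLU models; the keyword lists below are purely illustrative assumptions.

# Map each intent to a few trigger keywords.
INTENT_KEYWORDS = {
    "weather": ["weather", "temperature", "forecast", "rain"],
    "news": ["news", "headlines", "latest"],
    "timetable": ["timetable", "schedule", "train", "bus"],
}

def detect_intent(utterance):
    words = utterance.lower().split()
    scores = {intent: sum(word in words for word in keywords)
              for intent, keywords in INTENT_KEYWORDS.items()}
    best_intent, best_score = max(scores.items(), key=lambda item: item[1])
    # No keyword matched: the intent is ambiguous, so ask the user to clarify.
    return best_intent if best_score > 0 else "clarify"

print(detect_intent("what is the forecast for Hyderabad"))   # -> weather
print(detect_intent("tell me about Hyderabad"))               # -> clarify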
One example of a successful intent abstraction is IBM DeepQA, a software architecture for
deep content analysis and evidence-based reasoning. Here is how it works:
“First up, DeepQA works out what the question is asking, then works out some possible
answers based on the information it has to hand, creating a thread for each. Every thread
uses hundreds of algorithms to study the evidence, looking at factors including what the
information says, what type of information it is, its reliability, and how likely it is to be
relevant, then creating an individual weighting based on what Watson has previously learned
about how likely they are to be right. It then generates a ranked list of answers, with evidence for each of its options."
Finally, the intent should be translated into an action that responds to the user's request. Voice assistants like Siri have come a long way from answering trivial questions about the weather to controlling cars and smart home devices.
The process of getting the system to perform the desired operation belongs to the OS level and is no different from handling tactile input (a minimal sketch of this mapping follows the list below). The functionalities supported by this project include:
- Play songs
- Report the weather
- Give the location
- Open websites
- Give information
- Tell jokes
- Read the news, etc.
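The sketch below shows one way such a mapping from recognized text to OS-level actions could look, using only the subprocess, webbrowser, and ctypes modules already listed in this report. The specific command phrases and the Windows-only calls are illustrative assumptions.

import subprocess
import webbrowser
import ctypes

def perform_action(command):
    command = command.lower()
    if "open youtube" in command:
        webbrowser.open("https://youtube.com")      # open a website in the default browser
    elif "open notepad" in command:
        subprocess.Popen(["notepad.exe"])            # launch a desktop application (Windows)
    elif "lock window" in command:
        ctypes.windll.user32.LockWorkStation()       # lock the workstation (Windows)
    else:
        print("No OS action mapped for:", command)

perform_action("please open youtube")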
2.2.2 Impact on Environment
The software is programmed in such a way that its impact on the environment is negligible; being a purely software product, it does not affect the environment in any harmful way.
2.2.3 Safety
A virtual assistant collects a wide range of private and personal information that is highly sensitive for individual users. It is important for us to protect our users' data and keep it safe from leaks. We have taken some steps towards this by making it more difficult for third parties to track user behaviour and information.
2.2.4 Ethics
The application follows general software ethics: it does not harm anybody physically or virtually, and it maintains confidentiality.
2.2.5 Cost
There is one key rule for building a minimum viable product (MVP) for any voice initiative: start small and keep it simple. An MVP should include only those critical features that are needed for the product to function, yet demonstrate enough value for its early adopters.
We will use a driver's voice assistant as an example and try to give a rough idea of the costs associated with it. Note that we will not be covering hardware-related issues such as porting to multiple platforms, noise dampening, etc.
For an MVP you will need the voice assistant to react to about 20 of the most popular commands, e.g. "Voice assistant on", "Navigate home", "Find contact/location", etc. If the assistant is designed for disabled people, you might need to prepare additional commands to accommodate their needs.
The first step would be to create suitable acoustic and linguistic models. To do this you should record at least a thousand people speaking the necessary commands; this number should include both men and women of varying ages and education levels. Creating such a model would cost around 5,000 USD.
The second step would require a team including programmers, R&D specialists, QA engineers, and other professionals. They would create, test, and train a deep neural network that processes the commands and acts on them. We would estimate this part at around 20,000 USD.
2.2.7 Standards
The project follows an iterative, time-boxed software process, which can deal with many of the challenges facing Python-related development. These challenges include volatility of requirements, strong user involvement, tight development time, the need for process simplicity, and production of valuable software at low cost. The model divides the project into a series of development cycles, or short time boxes, which are assigned to the professionals on the project team. It is a collaborative approach that allows a rapid response to change.
Voice assistants will continue to offer more individualized experiences as they get better at differentiating between voices. However, it is not just developers who need to address the complexity of developing for voice: brands also need to understand the capabilities of each device and integration, and whether it makes sense for their specific brand. They will also need to focus on maintaining a consistent user experience in the coming years as complexity becomes more of a concern, because the visual interface with voice assistants is missing; users simply cannot see or touch a voice interface.
2.4 Modules Description
A. Subprocess: This module is used to work with system subprocesses, which are needed for commands such as shutdown, sleep, etc. It comes built in with Python.
B. Wolfram Alpha: This library is used to compute expert-level answers to queries using Wolfram's algorithms, knowledge base, and AI technology.
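A short usage sketch of the wolframalpha package is shown below; 'YOUR_APP_ID' is a placeholder, as a real App ID has to be obtained from Wolfram Alpha.

import wolframalpha

client = wolframalpha.Client("YOUR_APP_ID")
res = client.query("integrate x^2")
# Print the plain-text answer from the first result pod.
print(next(res.results).text)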
C. pyttsx3: pyttsx3 is a text-to-speech conversion library in Python. It works offline, is compatible with both Python 2 and Python 3, and runs without an internet connection or delay. The speech output is produced using the text-to-speech voices installed in your operating system. It is a very easy-to-use tool that converts the entered text into speech.
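A minimal pyttsx3 sketch is given below: it initialises the engine, selects an installed voice, and speaks a sentence. Using voice index 0 and a rate of 170 words per minute are assumptions; the available voices differ from one operating system to another.

import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)   # choose the first installed voice
engine.setProperty('rate', 170)             # speaking speed in words per minute
engine.say("Hello, I am your desktop assistant.")
engine.runAndWait()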
D. JSON: - JSON stands for JavaScript Object Notation. JSON is a lightweight format for
storing and transporting data. JSON is often used when data is sent from a server to a web
page. JSON is "self-describing" and easy to understand.
E. Speech Recognition: Speech recognition means that when a human speaks, the machine understands it. In our project we use the Google Speech API through the SpeechRecognition package in Python to make software that runs the machine on voice commands. We also need to install the PyAudio package so that voice commands can be captured from the microphone; it can be installed with pip install pyaudio.
F. Datetime: The datetime package is used for showing the date and time. This module comes built in with Python.
G. Wikipedia: Wikipedia is a great and huge source of knowledge. We have used the Wikipedia module in our project to get information from Wikipedia and to perform Wikipedia searches. To install this module, use pip install wikipedia.
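A one-line usage sketch of the wikipedia module:

import wikipedia

# Fetch a two-sentence summary of an article.
print(wikipedia.summary("Python (programming language)", sentences=2))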
H. webbrowser: Used to perform web searches and open pages in the default browser. This module comes built in with Python.
I. OS: - The OS module in Python provides functions for interacting with the operating
system. OS comes under Python’s standard utility modules. This module provides a portable
way of using operating system dependent functionality.
J. Pyjokes: Pyjokes provides a collection of one-line programming jokes. It is added to our project so that the assistant can tell jokes, which makes the project more interesting.
K. PyAudio: PyAudio is a set of Python bindings for PortAudio, a cross-platform C library for interfacing with audio drivers.
L. ctypes: ctypes is a powerful built-in Python library for creating and manipulating C data types in Python. It enables you not only to call functions in dynamically linked libraries but also to perform low-level memory manipulation, and it allows such libraries to be wrapped in pure Python.
M. Requests: The Requests module allows you to send HTTP requests using Python and is typically used for making GET and POST requests. It abstracts the complexities of making requests behind a beautiful, simple API.
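A small sketch of a GET request with Requests is shown below; the URL points to the public httpbin test service and is used here only as an example endpoint.

import requests

response = requests.get("https://httpbin.org/get", params={"q": "weather"})
if response.status_code == 200:
    data = response.json()      # parse the JSON body into a Python dictionary
    print(data["args"])         # {'q': 'weather'}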
2.5.2 Hardware Requirements
❖ Processor: Any up-to-date processor
❖ Hard Disk: 1 TB
❖ RAM: 4 GB or above
❖ Ethernet connection (LAN) or Wi-Fi
❖ Other Requirements: Web camera
3. Literature Overview
Python
Python is an object-oriented, high-level, interpreted programming language. It is a robust, highly useful language focused on rapid application development (RAD). Python makes it easy to write and execute code, and it can often implement the same logic in as little as one-fifth of the code required by other object-oriented languages.
Python provides a huge list of benefits, and its usage is not limited to a single kind of activity. Its growing popularity has allowed it to enter some of the most popular and complex fields such as Artificial Intelligence (AI), Machine Learning (ML), natural language processing, and data science. Python has a library for every need of this project: for JIA, the libraries used are SpeechRecognition to recognize voice, pyttsx3 for text to speech, selenium for web automation, etc.
Python is also reasonably efficient, and efficiency is usually not a problem for small programs. If your Python code is not efficient enough, a general procedure for improving it is to find out what is taking most of the time and implement just that part more efficiently in a lower-level language. This results in much less programming and often more efficient code (because you have more time to optimize) than writing everything in a low-level language.
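The "find out what is taking most of the time" step can be done with the standard-library cProfile module; the sketch below profiles a stand-in function (slow_task is purely an illustrative placeholder).

import cProfile

def slow_task():
    return sum(i * i for i in range(1_000_000))

# Print per-function call counts and times, sorted by total time spent.
cProfile.run("slow_task()", sort="tottime")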
Visual Studio
Visual Studio is an Integrated Development Environment (IDE) developed by Microsoft for building GUI (Graphical User Interface) applications, console applications, web applications and web apps, mobile apps, cloud services, and web services. With the help of this IDE, you can create managed code as well as native code. It uses the various Microsoft software development platforms such as the Windows Store, Microsoft Silverlight, and the Windows API. It is not a language-specific IDE, as you can use it to write code in C#, C++, VB (Visual Basic), Python, JavaScript, and many more languages; it provides support for 36 different programming languages. It is available for Windows as well as for macOS.
Evolution of Visual Studio: the first version of Visual Studio was released in 1997, named Visual Studio 97, with version number 5.0. Visual Studio 2017, version 15.0, was released on March 7, 2017; it supports .NET Framework versions 3.5 to 4.7. Java was supported in older versions of Visual Studio, but the newer versions no longer provide support for the Java language.
4. System Design
4.2 UML Diagrams
Use Case Diagram
In this project there is only one user. The user issues a command to the system; the system interprets it, fetches the answer, and sends the response back to the user.
Class Diagram
The User class has two attributes: the command it sends as audio and the response it receives, which is also audio. It performs functions to listen to the user's command, interpret it, and then reply or send back a response accordingly. The Question class holds the command in string form, as interpreted by the Interpret class, and sends it to the general, about, or search function based on its identification. The Task class also holds the interpreted command in string format and has various functions such as reminder, note, mimic, research, and reader.
Activity Diagram
Initially, the system is in idle mode. When it receives a wake-up call, it begins execution. The received command is identified as either a question or a task to be performed, and the specific action is taken accordingly. After the question is answered or the task is performed, the system waits for another command. This loop continues until it receives a quit command, at which point it goes back to sleep.
Sequence Diagram
The above sequence diagram shows how an answer to a question asked by the user is fetched from the internet. The audio query is interpreted and sent to the web scraper; the web scraper searches for and finds the answer, which is then sent back to the speaker, where it is spoken to the user.
Component Diagram
The main component here is the Virtual Assistant. It provides two specific services: executing a task or answering a question.
4.3 System Design
The system uses Google's online speech recognition system to convert speech input into text. The speech input from the microphone is temporarily stored in the system and then sent to the Google cloud for speech recognition; the equivalent text is then received and fed to the central processor.
Python Backend:
The Python backend gets the output from the speech recognition module and then identifies whether the command requires an API call or context extraction. The result is then sent back through the Python backend to give the required output to the user.
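As an illustration of this dispatch step, the sketch below routes recognized text either to an API-call branch or to a local context-extraction branch. The helper names and the keyword test are hypothetical and are not taken from the project code.

def handle_command(text):
    text = text.lower()
    if "weather" in text or "news" in text:
        return call_external_api(text)    # API-call branch
    return extract_context(text)          # context-extraction branch

def call_external_api(text):
    # Placeholder: a real implementation would call an external service here.
    return "(API result for: " + text + ")"

def extract_context(text):
    # Placeholder: a real implementation would parse intent and entities locally.
    return "(context extracted from: " + text + ")"

print(handle_command("what is the weather today"))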
API calls
API stands for Application Programming Interface. An API is a software intermediary that
allows two applications to talk to each other. In other words, an API is a messenger that
delivers your request to the provider that you’re requesting it from and then delivers the
response back to you.
Context Extraction
Context extraction (CE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity concerns processing human-language texts using natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as context extraction.
Text-to-speech module
Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS Engine
converts written text to a phonemic representation, then converts the phonemic representation
to waveforms that can be output as sound. TTS engines with different languages, dialects and
specialized vocabularies are available through third-party publishers.
5. Sample Code
5.1 Coding
import subprocess
import wolframalpha
import pyttsx3
import tkinter
import json
import random
import operator
import speech_recognition as sr
import datetime
import wikipedia
import webbrowser
import os
import winshell
import pyjokes
import feedparser
import smtplib
import ctypes
import time
import requests
import shutil
import pywhatkit                      # needed below for playing songs on YouTube
from newsapi import NewsApiClient     # needed below for fetching news headlines
from ecapture import ecapture as ec   # assumed import for the webcam capture used below
listener = sr.Recognizer()
engine = pyttsx3.init()
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)

def talk(text):
    # Speak the given text aloud through the configured TTS voice.
    engine.say(text)
    engine.runAndWait()
def take_command():
    # Listen on the microphone and return the recognized command in lower case.
    command = ''
    try:
        with sr.Microphone() as source:
            print('listening...')
            voice = listener.listen(source)
            command = listener.recognize_google(voice)
            command = command.lower()
            if 'alexa' in command:
                command = command.replace('alexa', '')
            print(command)
    except Exception:
        pass
    return command
def Hello():
    talk("hello sir I am your desktop assistant. Tell me how may I help you")
    print("hello sir I am your desktop assistant. Tell me how may I help you")

def tellDay():
    # weekday() returns 0 for Monday, so add 1 to index the dictionary below.
    day = datetime.datetime.today().weekday() + 1
    Day_dict = {1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 4: 'Thursday',
                5: 'Friday', 6: 'Saturday', 7: 'Sunday'}
    if day in Day_dict.keys():
        day_of_the_week = Day_dict[day]
        print(day_of_the_week)
        talk("The day is " + day_of_the_week)
def weather():
    # Query OpenWeatherMap for the current weather in the configured city.
    city = "hyderabad"
    res = requests.get(
        f"http://api.openweathermap.org/data/2.5/weather?q={city}"
        f"&appid=16f0afad2fd9e18b7aee9582e8ce650b&units=metric").json()
    temp1 = res["weather"][0]["description"]
    temp2 = res["main"]["temp"]
    talk(f"Temperature is {temp2} degree Celsius \nWeather is {temp1}")
    print(f"Temperature is {temp2} degree Celsius \nWeather is {temp1}")
def news():
    # Fetch the top headlines for a topic the user asks about and read them out.
    newsapi = NewsApiClient(api_key='5840b303fbf949c9985f0e1016fc1155')
    talk("What topic you need the news about")
    topic = take_command()
    data = newsapi.get_top_headlines(q=topic, language="en", page_size=5)
    newsData = data["articles"]
    for y in newsData:
        talk(y["description"])
        print(y["description"])
def wishMe():
    # Greet the user according to the current hour.
    hour = datetime.datetime.now().hour
    if hour >= 0 and hour < 12:
        talk("Good Morning")
        print("Good Morning")
    elif hour >= 12 and hour < 18:
        talk("Good Afternoon")
        print("Good Afternoon")
    else:
        talk("Good Evening")
        print("Good Evening")
def run_alexa():
    # Greet the user, take one voice command, and dispatch it to the matching action.
    wishMe()
    Hello()
    command = take_command()
    print(command)
    if 'play' in command:
        song = command.replace('play', '')
        talk('playing ' + song)
        pywhatkit.playonyt(song)
    elif 'time' in command:
        current_time = datetime.datetime.now().strftime('%I:%M %p')
        talk('Current time is ' + current_time)
        print('Current time is ' + current_time)
    elif 'who is' in command:
        person = command.replace('who is', '')
        info = wikipedia.summary(person, 1)
        print(info)
        talk(info)
    elif 'open gmail' in command:
        webbrowser.open_new_tab("gmail.com")
        talk("Google Mail open now")
        time.sleep(5)
    elif 'will you come to date with me' in command:
        talk('sorry, I have a headache')
        print('sorry, I have a headache')
    elif 'are you single' in command:
        talk('I am in a relationship with wifi')
        print('I am in a relationship with wifi')
    elif 'joke' in command:
        joke = pyjokes.get_joke()
        talk(joke)
        print(joke)
    elif "open google" in command:
        talk("Opening Google ")
        webbrowser.open("www.google.com")
    elif "bye" in command:
        talk("Bye. Have a good day.")
        print("Bye. Have a good day.")
        exit()
    elif "tell me your name" in command:
        talk("I am Alexa. Your desktop Assistant")
        print("I am Alexa. Your desktop Assistant")
    elif "which day it is" in command:
        tellDay()
    elif 'open youtube' in command:
        talk("Here you go to Youtube\n")
        webbrowser.open("youtube.com")
    elif 'shutdown system' in command:
        talk("Hold On a Sec ! Your system is on its way to shut down")
        subprocess.call('shutdown /p /f')
    elif 'empty recycle bin' in command:
        winshell.recycle_bin().empty(confirm=False, show_progress=False, sound=True)
        talk("Recycle Bin Recycled")
    elif "show note" in command:
        talk("Showing Notes")
        with open("virtual.txt", "r") as file:
            notes = file.read()
        print(notes)
        talk(notes)
    elif 'change background' in command:
        ctypes.windll.user32.SystemParametersInfoW(20, 0, "Location of wallpaper", 0)
        talk("Background changed successfully")
    elif 'reason for you' in command:
        talk("I was created as for Minor project ")
        print("I was created as for Minor project ")
    elif 'lock window' in command:
        talk("locking the device")
        ctypes.windll.user32.LockWorkStation()
    elif 'news' in command:
        news()
    elif 'date' in command:
        date = datetime.datetime.now().strftime('%d-%m-%Y')
        print(date)
        talk("Today's date is " + date)
    elif 'weather' in command:
        weather()
    elif 'how are you' in command:
        talk("I am fine, Thank you")
        talk("How are you, Sir")
        print("I am fine, Thank you")
        print("How are you, Sir")
    elif 'fine' in command or "good" in command:
        talk("It's good to know that your fine")
    elif "camera" in command or "take a photo" in command:
        ec.capture(0, " Camera ", "img.jpg")
    else:
        talk('Please say the command again.')

while True:
    run_alexa()
6. Testing
6.1 Testing
Voice devices are smart devices designed specifically for getting tasks done through voice conversations; these devices are always awake and listening. Voice assistants and virtual assistants of this kind are already ruling the market. They can be used for playing music, reading the news, showing the time and date, locking the window, telling jokes, searching the web, and even ordering food.
UNIT TESTING
This type of test is targeted at voice-based app developers. They need to do unit testing to ensure the code is working correctly in isolation. As you perform unit testing while coding, you need it to be fast so it does not interrupt your coding pace. Unit testing is focused on making sure your code and logic are correct, so there is no need to hit the cloud (where the backend of most voice apps lives) or call real external services. It is important that the unit-testing tool you choose supports mocks and preferably runs locally.
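As a hedged illustration, the unit test below checks the greeting logic of the sample code with the standard unittest and mock tools, without touching the microphone or any cloud service. It assumes the assistant's functions live in a module named assistant.py, which is a hypothetical file name rather than part of the report.

import datetime
import unittest
from unittest import mock

import assistant  # hypothetical module containing wishMe() and talk()

class TestGreeting(unittest.TestCase):
    def test_wishme_says_good_morning_before_noon(self):
        fake_now = datetime.datetime(2024, 1, 1, 9, 0, 0)
        # Replace the module's clock and speech output with mocks.
        with mock.patch("assistant.datetime") as mock_dt, \
             mock.patch("assistant.talk") as mock_talk:
            mock_dt.datetime.now.return_value = fake_now
            assistant.wishMe()
            mock_talk.assert_called_with("Good Morning")

if __name__ == "__main__":
    unittest.main()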
END-TO-END TESTING
This type of test is targeted at QA teams. It ensures that the entire system is working correctly: the AI (ASR + NLU), the code, and the external services. The tests should be comprehensive and, where possible, easy to write, read, and maintain. We want to imitate, as closely as possible, what real users do in their everyday interactions with voice apps.
A further type of test is targeted at Ops people and ensures that a service, once deployed, works flawlessly. This kind of testing verifies your voice-based app at regular intervals. It should be easy to set up, and the results should be delivered immediately when the voice-based app stops behaving as expected.
Usability Performance Testing makes sure that the AI components of the app are working correctly.
It is targeted to Product Managers and developers. The main goal is to identify issues with the
speech recognition and NLU behaviour of the assistant and your app. It consists of comprehensive
testing of the interaction model, creating a baseline set of results. These results, in turn, are the
basis for making improvements to the interaction model and the code of the skill – once revisions
are completed, additional tests are run to ensure everything is working as expected.
AUTOMATED VOICE TESTING (AVT)
This approach makes use of a Speech Recognition Engine (SRE) for automatic control of voice-first devices, just like a human being would.
Test Case 1
Test Title: Response Time
Test ID: T1
Test Priority: High
Test Objective:
To make sure that the system's response time is acceptable.
Description:
Time is very critical in a voice-based system. As we are not typing inputs but speaking them, the system must reply in a moment; the user must get an instant response to the query made.
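A simple way to measure this during development is shown below, using the standard library; answer_query is a hypothetical stand-in for the assistant's query handler, and the 2-second threshold is an illustrative assumption rather than a figure taken from the report.

import time

def answer_query(query):
    return "It is 25 degrees and sunny."   # placeholder answer

start = time.perf_counter()
answer = answer_query("what is the weather")
elapsed = time.perf_counter() - start
print(f"Answered in {elapsed:.3f} s: {answer}")
assert elapsed < 2.0, "Response took longer than the 2-second target"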
Test Case 2
Test Title: Accuracy
Test ID: T2
Test Priority: High
Test Objective:
To ensure that the answers retrieved by the system are accurate with respect to the gathered data.
Description:
A virtual assistant system is mainly used to get precise answers to any question asked. Getting an answer in a moment is of no use if the answer is not correct; accuracy is of utmost importance in a virtual assistant system.
Test Case 3
Test Title: Approximation
Test ID: T3
Test priority: Moderate
Test Objective:
To check that calculations return suitably approximate answers.
Description:
There are times when a mathematical calculation requires an approximate value. For example, if someone asks for the value of PI, the system must respond with an approximate value, since stating the exact value in such cases is neither possible nor desirable. Note: a few more test cases may be included, and these test cases are also subject to change with the final software development.
7. Output Screens
8. Conclusion
In this paper, "Virtual Assistant Using Python", we discussed the design and implementation of a digital assistant. The project is built using open-source software modules with Visual Studio community backing, so it can accommodate any updates in the near future. The modular nature of this project makes it more flexible and easier to extend with additional features without disturbing current system functionalities.
It not only acts on human commands but also gives responses to the user based on the query asked or the words spoken, such as opening tasks and operations. It greets the user in a way that makes the user feel comfortable and free to interact with the voice assistant. The application also eliminates unnecessary manual work that would otherwise be required in the user's daily tasks, since the entire system works on verbal input rather than typed input.
Virtual personal assistants are a very effective way to organize your schedule. There are now many smart personal digital assistant apps available on the market for various device platforms. These software apps work much better than PDA devices because they provide all their features on your smartphone. VPAs are also more reliable than human personal assistants because they are portable and can be used at any time, and they have access to more information than any human assistant, since they are connected to the internet.
FUTURE SCOPE
The virtual assistants which are currently available are fast and responsive, but we still have a long way to go. The understanding and reliability of the current systems need to be improved considerably, and the assistants available nowadays are still not reliable in critical scenarios. The future of these assistants lies in incorporating Artificial Intelligence, including Machine Learning, neural networks, etc., together with IoT. With the incorporation of these technologies, we will be able to reach new heights; what virtual assistants can achieve is far beyond what has been achieved so far. Most of us have seen Jarvis, the virtual assistant built by Iron Man; although fictional, it has set a new standard for what we can achieve with voice-activated virtual assistants.
9. Bibliography
9.1 Book References
Dan We, How to Build Your Own First Voice Assistant in Python, Packt Publishing, December 2021, ISBN 9781803234274.
9.2 Website References
www.codecademy.com
www.tutorialspoint.com
www.google.com.in
https://www.geeksforgeeks.org/introduction-to-visual-studio/
9.3 Technical Publication References
Purushotham Botla, Designing Personal Assistant Software for Task Management using Semantic Web Technologies and Knowledge Databases.
David L. Poole and Alan K. Mackworth, Python code for Artificial Intelligence: Foundations of Computational Agents.
10. Appendices
A. Methodologies used
The system uses Google's online speech recognition system to convert speech input into text. The speech input from the microphone is temporarily stored in the system and then sent to the Google cloud for speech recognition; the equivalent text is then received and fed to the central processor.
Python Backend:
The Python backend gets the output from the speech recognition module and then identifies whether the command requires an API call or context extraction. The result is then sent back through the Python backend to give the required output to the user.
API calls
API stands for Application Programming Interface. An API is a software intermediary that
allows two applications to talk to each other. In other words, an API is a messenger that
delivers your request to the provider that you’re requesting it from and then delivers the
response back to you.
Context Extraction
Context extraction (CE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases this activity concerns processing human-language texts using natural language processing (NLP). Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as context extraction.
Text-to-speech module
Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS Engine
converts written text to a phonemic representation, then converts the phonemic representation
to waveforms that can be output as sound. TTS engines with different languages, dialects and
specialized vocabularies are available through third-party publishers.
B. Testing Methods Used
UNIT TESTING
This type of test is targeted at voice-based app developers. They need to do unit testing to ensure the code is working correctly in isolation. As you perform unit testing while coding, you need it to be fast so it does not interrupt your coding pace. Unit testing is focused on making sure your code and logic are correct, so there is no need to hit the cloud (where the backend of most voice apps lives) or call real external services. It is important that the unit-testing tool you choose supports mocks and preferably runs locally.
END-TO-END TESTING
This type of test is targeted at QA teams. It ensures that the entire system is working correctly: the AI (ASR + NLU), the code, and the external services. The tests should be comprehensive and, where possible, easy to write, read, and maintain. We want to imitate, as closely as possible, what real users do in their everyday interactions with voice apps.
A further type of test is targeted at Ops people and ensures that a service, once deployed, works flawlessly. This kind of testing verifies your voice-based app at regular intervals. It should be easy to set up, and the results should be delivered immediately when the voice-based app stops behaving as expected.
Usability Performance Testing makes sure that the AI components of the app are working correctly.
It is targeted to Product Managers and developers. The main goal is to identify issues with the
speech recognition and NLU behaviour of the assistant and your app. It consists of comprehensive
testing of the interaction model, creating a baseline set of results. These results, in turn, are the
basis for making improvements to the interaction model and the code of the skill – once revisions
are completed, additional tests are run to ensure everything is working as expected.
AUTOMATED VOICE TESTING (AVT)
This approach makes use of a Speech Recognition Engine (SRE) for automatic control of voice-first devices, just like a human being would.
11. PLAGIARISM REPORT