ABSTRACT
The project aims to develop a personal assistant for Linux-based systems. Jarvis
draws its inspiration from virtual assistants such as Cortana for Windows and Siri for iOS.
It is designed to provide a user-friendly interface for carrying out a variety of
tasks through a set of well-defined commands. Users can interact with the
assistant either through voice commands or through keyboard input.
As a personal assistant, Jarvis assists the end user with day-to-day activities such as
general conversation, searching queries on Google, Bing or Yahoo, searching
for videos, retrieving images, reporting live weather conditions, looking up word meanings,
searching for medicine details, giving health recommendations based on symptoms, and
reminding the user about scheduled events and tasks. User statements and commands are
analysed with the help of Artificial Intelligence to give an optimal solution.
CONTENTS
CHAPTER NO. CHAPTER NAME
1 INTRODUCTION
1.1 General Theory
1.2 Implementation Overview
1.3 Objectives
1.4 Purpose
2 LITERATURE SURVEY
2.1 Scope
2.2 Applicability
2.3 Survey of Technology
3 MATERIALS AND METHODS
3.1 Requirement and Analysis
3.2 Requirement Specification
3.3 Feasibility Study
4 SYSTEM REQUIREMENT AND DESIGN
4.1 Hardware and Software Requirements
4.2 System Design & Block Diagrams
4.3 Data Dictionary
5 RESULTS AND DISCUSSION
5.1 Test Case Design
5.2 Explanation of Each function used in Code
5.3 Recapitulate
5.4 Code as Inscribed
6 CONCLUSION
LIST OF ABBREVIATIONS
ABBREVIATION EXPANSION
IoT Internet of Things
GUI Graphical User Interface
AI Artificial Intelligence
IDE Integrated Development Environment
LIST OF FIGURES
FIGURE NUMBER FIGURE NAME
4.2.1 ER Diagram
4.2.2 Activity Diagram
4.2.3 Class Diagram
4.2.4 Use Case Diagram
4.2.5 Sequence Diagram
4.2.6 Data Flow Diagram
4.2.7 Component Diagram
4.2.8 Deployment Diagram
LIST OF TABLES
TABLE NUMBER TABLE NAME
4.3 Data Dictionary
CHAPTER 1
INTRODUCTION
1.1 General Theory
In today’s era almost all tasks are digitalized. We have smartphones in our hands, and that is
nothing less than having the world at your fingertips. These days we are not even using our
fingers: we just speak the task and it is done. There exist systems where we can say, “Text Dad,
I’ll be late today,” and the text is sent. That is the task of a virtual assistant. Virtual assistants
also support specialized tasks, such as booking a flight, or finding the cheapest book online across
various e-commerce sites and then providing an interface to place the order, which helps automate
search, discovery and online ordering.
Virtual assistants are software programs that help you ease your day-to-day tasks, such as
showing the weather report, creating reminders and making shopping lists. They can take
commands via text (online chatbots) or by voice. Voice-based intelligent assistants need an
invoking word, or wake word, to activate the listener, followed by the command. There are many
virtual assistants, such as Apple’s Siri, Amazon’s Alexa and Microsoft’s Cortana. For this
project, the wake word chosen is JIA.
This system is designed to be used efficiently on desktops. Personal assistant software
improves user productivity by managing the user's routine tasks and by providing information
from online sources. JIA is effortless to use: say the wake word ‘JIA’ followed by the
command, and within seconds the command is executed.
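The wake-word convention described above can be sketched as a small text-level check. This is a minimal illustration, assuming the speech has already been transcribed to text; the function name `extract_command` and the punctuation handling are hypothetical and not taken from the project code.

```python
WAKE_WORD = "jia"  # wake word named in the text; matched case-insensitively here


def extract_command(transcript, wake_word=WAKE_WORD):
    """Return the command that follows the wake word, or None if it is absent."""
    words = transcript.strip().split()
    if not words:
        return None
    # Tolerate punctuation that a recognizer may attach to the first word.
    first = words[0].strip(",.!?").lower()
    if first != wake_word:
        return None
    return " ".join(words[1:])


if __name__ == "__main__":
    print(extract_command("JIA, what is the weather today"))
```

Anything that does not start with the wake word is ignored, which keeps the listener idle until it is explicitly addressed.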
Voice search is gaining ground over text search. Web searches conducted via mobile devices
have only just overtaken those carried out using a computer, and analysts are already
predicting that 50% of searches will be via voice by 2020. Virtual assistants are turning out to be
smarter than ever. An intelligent assistant can make email work for you: it can detect intent,
pick out important information, automate processes, and deliver personalized responses.
This project was started on the premise that there is a sufficient amount of openly available data
and information on the web that can be utilized to build a virtual assistant capable of making
intelligent decisions for routine user activities.
1.2 Implementation Overview
Any IDE can be used, such as Anaconda or Visual Studio. The first step is to create a voice
function that takes the text to be spoken as an argument. We also use a speech API to take voice
as input. There are two default voices on our computer, i.e. male and female, so you can use
either of them. Check the voice function by giving it some text input, which it converts into voice.
Next, create a new function for greeting the user. Use an if-else conditional statement to select
the greeting, e.g. if the time is between 12 and 18 then the system will say “Good Evening”.
Along with this, you can add a welcome message, e.g. “Welcome, what can I do for you?”. After
that, we have to install the speech recognition package and then import it.
Define a new function for taking a command from the user. In it, specify the speech
recognition class and the input type, such as the microphone, and set pause_threshold. Set the
language for voice recognition. We can use the Google engine to convert your voice input to text.
We have to install and import some other packages such as pyttsx3 and Wikipedia. Pyttsx3 helps
you convert text input to speech. If you ask for any information, the assistant displays the
result in textual format. We can easily convert it to voice, as we have already
defined a function for that earlier in our code.
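The voice-output and command-input functions described above might look like the following sketch. It assumes the third-party packages `pyttsx3` and `SpeechRecognition` (plus a microphone backend such as PyAudio) are installed; the function names, the `voice_index` parameter and the default language code `en-in` are illustrative assumptions rather than the project's actual code. The imports are placed inside the functions so the module can be loaded on a machine without audio hardware.

```python
def speak(text, voice_index=0):
    """Convert text to speech with pyttsx3 (third-party package)."""
    import pyttsx3

    engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    # Assumption: at least one voice is installed; which index is male or
    # female depends on the platform.
    engine.setProperty("voice", voices[voice_index].id)
    engine.say(text)
    engine.runAndWait()


def take_command(language="en-in"):
    """Listen on the microphone and return the recognized text, or None."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    recognizer.pause_threshold = 1  # seconds of silence that end a phrase
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    try:
        # Uses Google's web speech engine, so an internet connection is needed.
        return recognizer.recognize_google(audio, language=language)
    except (sr.UnknownValueError, sr.RequestError):
        return None


if __name__ == "__main__":
    speak("Welcome, what can I do for you?")
    print(take_command())
```

Returning `None` on a failed recognition lets the caller simply ask the user to repeat the command.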
If you ask to open YouTube in the query, the assistant goes to the YouTube address automatically.
For that, you have to install a web browser package and import it. In the same way, you can add
queries for many websites such as Google, Instagram and Facebook. The next task is to play
songs, which is done in the same way as before: add a query for “play songs”, and add the
location of the songs folder so that the assistant plays songs from that folder only.
So the main question is: how do the queries work? Here we use conditional statements.
If the computer hears a voice command that contains the word YouTube, it goes to the YouTube
page address; if the command contains the word Google, it goes to the Google search page,
and so on.
You can add as many pages and commands as you like to the desktop assistant.
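The conditional matching described above can be sketched as a keyword lookup. This is an illustrative sketch only: the dictionary `SITE_URLS`, the helper names and the injectable `opener` argument are assumptions, and the standard-library `webbrowser` module stands in for whichever browser package the project actually uses.

```python
import webbrowser

# Keyword -> address table; extend it to support more sites.
SITE_URLS = {
    "youtube": "https://www.youtube.com",
    "google": "https://www.google.com",
    "instagram": "https://www.instagram.com",
    "facebook": "https://www.facebook.com",
}


def resolve_site(query):
    """Return the URL of the first known site named in the query, else None."""
    lowered = query.lower()
    for keyword, url in SITE_URLS.items():
        if keyword in lowered:
            return url
    return None


def handle_query(query, opener=webbrowser.open):
    """Open the matching site; return True if the query was handled."""
    url = resolve_site(query)
    if url is None:
        return False
    opener(url)
    return True


if __name__ == "__main__":
    print(resolve_site("open YouTube please"))
```

A lookup table behaves the same as a chain of if/elif statements, but adding a new site only requires a new dictionary entry.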
Have you ever wondered how cool it would be to have your own A.I. assistant? Imagine
how much easier it would be to send emails without typing a single word, do Wikipedia
searches without opening a web browser, and perform many other daily tasks, like
playing music, with a single voice command.
What can this A.I. assistant do for you?
• It can send emails on your behalf.
• It can play music for you.
• It can do Wikipedia searches for you.
• It is capable of opening websites like Google, YouTube, etc., in a web browser.
• It is capable of opening your code editor or IDE with a single voice command.
1.3 OBJECTIVES
The main objective of building personal assistant software (a virtual assistant) is to use
semantic data sources available on the web and user-generated content, and to provide
knowledge from knowledge databases. The main purpose of an intelligent virtual
assistant is to answer questions that users may have. This may be done in a business
environment, for example on a business website with a chat interface. On a
mobile platform, the intelligent virtual assistant is available as a call-button-operated
service where a voice asks the user “What can I do for you?” and then responds to
verbal input.
Virtual assistants can save you a tremendous amount of time. We spend hours on
online research and then on writing up a report in our own words. JIA can do
that for you. Provide a topic for research and continue with your tasks while JIA does
the research. Another difficult task is remembering test dates, birthdates or
anniversaries. It comes as a surprise when you enter the class and realize there is a class test
today. Just tell JIA in advance about your tests and she will remind you in good time so you can
prepare for them.
One of the main advantages of voice search is its speed. In fact, voice is reputed
to be nearly four times faster than a written search: whereas we can write about 40 words per
minute, we are capable of speaking around 150 in the same period of time. In
this respect, the ability of personal assistants to accurately recognize spoken words is
a prerequisite for them to be adopted by consumers.
1.4 Purpose
The purpose of a virtual assistant is to be capable of voice interaction, music playback,
making to-do lists, setting alarms, streaming podcasts, playing audiobooks, and
providing weather, traffic, sports, and other real-time information, such as news. Virtual
assistants enable users to speak natural-language voice commands in order to operate
the device and its apps.
There is an increased overall awareness and a higher level of comfort demonstrated
specifically by millennial consumers. In this ever-evolving digital world where speed,
efficiency, and convenience are constantly being optimized, it’s clear that we are
moving towards less screen interaction.
CHAPTER 2
LITERATURE SURVEY
2.1 Scope
Voice assistants will continue to offer more individualized experiences as they get better
at differentiating between voices. However, it is not just developers who need to address
the complexity of developing for voice: brands also need to understand the capabilities
of each device and integration, and whether it makes sense for their specific brand. They will
also need to focus on maintaining a consistent user experience in the coming years
as complexity becomes more of a concern. This is because voice assistants lack a visual
interface: users simply cannot see or touch a voice interface.
2.2 Applicability
The mass adoption of artificial intelligence in users’ everyday lives is also fueling the
shift towards voice. The growing number of IoT devices, such as smart thermostats and
speakers, is giving voice assistants more utility in a connected user’s life. Smart
speakers are the number one way we are seeing voice being used. Many industry
experts even predict that nearly every application will integrate voice technology in
some way in the next 5 years.
The use of virtual assistants can also enhance IoT (Internet of Things) systems.
Twenty years from now, Microsoft and its competitors will be offering personal digital
assistants that provide the services of a full-time employee, a luxury usually reserved for the
rich and famous.
2.3 SURVEY OF TECHNOLOGY
2.3.1 Python
Python is an object-oriented, high-level, interpreted
programming language. It is a robust, highly useful language focused on rapid
application development (RAD). Python makes code easy to write and execute.
Python can often implement the same logic with as little as one fifth of the code required in
other object-oriented languages.
Python offers a long list of benefits, and its usage cannot
be limited to only one activity. Its growing popularity has allowed it to enter some of
the most popular and complex fields, such as Artificial Intelligence (AI), Machine Learning
(ML), natural language processing and data science. Python has libraries for every
need of this project. For JIA, the libraries used include SpeechRecognition to recognize voice,
Pyttsx for text to speech, and Selenium for web automation.
Python is reasonably efficient. Efficiency is usually not a problem for small examples. If
your Python code is not efficient enough, a general procedure to improve it is to find out
what is taking most of the time, and implement just that part more efficiently in some
lower-level language. This will result in much less programming and more efficient code
(because you will have more time to optimize) than writing everything in a low-level
language.
2.3.2 DBpedia
Knowledge bases are playing an increasingly important role in enhancing the
intelligence of Web and enterprise search and in supporting information integration.
DBpedia leverages Wikipedia, a gigantic source of knowledge, by extracting structured
information from it and by making this information accessible on the Web. The
DBpedia knowledge base has several advantages over existing knowledge bases: it
covers many domains; it represents real community agreement; it automatically
evolves as Wikipedia changes; and it is truly multilingual.
The DBpedia knowledge base allows you to ask quite surprising queries against
Wikipedia, for instance “Give me all cities in New Jersey with more than 10,000
inhabitants” or “Give me all Italian musicians from the 18th century”.
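As a rough illustration of the first example query, the sketch below builds a SPARQL query string and the corresponding request URL for DBpedia's public endpoint. The choice of ontology terms (`dbo:City`, `dbo:isPartOf`, `dbo:populationTotal`) is an assumption about how the data is modelled, so the result set may be incomplete; the helper names are hypothetical, and no network request is made here.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # public SPARQL endpoint


def cities_query(state_resource, min_population):
    """Build a SPARQL query for cities in a state above a population threshold."""
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {{
  ?city a dbo:City ;
        dbo:isPartOf dbr:{state_resource} ;
        dbo:populationTotal ?population .
  FILTER (?population > {int(min_population)})
}}
ORDER BY DESC(?population)
""".strip()


def request_url(query):
    """Encode the query as a GET request URL that asks for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(cities_query("New_Jersey", 10000))
```

The URL returned by `request_url` could then be fetched with any HTTP client and the JSON bindings read out for the assistant's answer.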
2.3.3 Quepy
Quepy is a Python framework that transforms natural language questions into queries in a
database query language. It can be easily customized to different kinds of questions in
natural language and database queries. So, with little coding you can build your own
system for natural language access to your database.
2.3.4 Pyttsx
Pyttsx stands for Python Text to Speech. It is a cross-platform Python wrapper for
text-to-speech synthesis. It is a Python package supporting common text-to-speech
engines on Mac OS X, Windows, and Linux. It works with both Python 2.x and 3.x
versions. Its main advantage is that it works offline.
2.3.5 Speech Recognition
This is a library for performing speech recognition, with support for several engines and
APIs, online and offline. It supports APIs like Google Cloud Speech API, IBM Speech
to Text, Microsoft Bing Voice Recognition etc.
2.3.6 SQLite
SQLite is a capable library, providing an in-process relational database for efficient
storage of small-to-medium-sized data sets. It supports most of the common features
of SQL with few exceptions. Best of all, most Python users do not need to install
anything to get started working with SQLite, as the standard library in most distributions
ships with the sqlite3 module.
SQLite runs embedded in memory alongside your application, allowing you to easily
extend SQLite with your own Python code. SQLite provides quite a few hooks, a
reasonable subset of which are implemented by the standard library database driver.
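One such hook is registering a Python function so that it can be called from SQL. The sketch below uses `sqlite3.Connection.create_function` from the standard library; the `note` table and the `word_count` function are made up for illustration and are not part of the project's schema.

```python
import sqlite3


def word_count(text):
    """Plain Python function that SQLite will call once per row."""
    return len(text.split()) if text else 0


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
# Register the Python function under the SQL name word_count, taking 1 argument.
conn.create_function("word_count", 1, word_count)

conn.execute("CREATE TABLE note (nid INTEGER PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO note (data) VALUES (?)", ("buy milk and eggs",))

row = conn.execute("SELECT data, word_count(data) FROM note").fetchone()
print(row)
```

Because the database runs in the same process, the call into Python involves no network or inter-process overhead.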
CHAPTER 3
MATERIALS AND METHODS
3.1 REQUIREMENT AND ANALYSIS
System analysis is about gaining a complete understanding of existing systems and finding where
they fail. A solution is then determined to resolve these issues in the proposed
system. Analysis defines the system: the system is divided into smaller modules, and their
functions and interrelations are studied. The complete
analysis is as follows:
Problem definition
Usually, a user needs to manually manage multiple applications to complete one
task. For example, a user trying to make a travel plan needs to check the airport codes
of nearby airports and then check travel sites for tickets between combinations of
airports to reach the destination. There is a need for a system that can manage such tasks
effortlessly.
We already have multiple virtual assistants, but we hardly use them. Many
people have issues with voice recognition. These systems can understand English
phrases, but they fail to recognize them in our accent, because our pronunciation is quite
distinct from theirs. Also, they are easier to use on mobile devices than on desktop systems.
There is a need for a virtual assistant that can understand English in an Indian accent and work
on a desktop system.
When a virtual assistant is not able to answer questions accurately, it is because it lacks
the proper context or does not understand the intent of the question. Its ability to answer
questions relevantly comes only with rigorous optimization, involving both humans
and machine learning. Continuously ensuring solid quality control strategies also
helps manage the risk of the virtual assistant learning undesired behaviors. Virtual assistants
require a large amount of information to be fed in for them to work efficiently.
A virtual assistant should be able to model complex task dependencies and use these
models to recommend optimized plans for the user. It needs to be tested for finding
optimum paths when a task has multiple sub-tasks and each sub-task can have its own
sub-tasks. In such a case there can be multiple candidate paths, and the assistant should be
able to consider user preferences, other active tasks and priorities in order to recommend a
particular plan.
3.2 REQUIREMENT SPECIFICATION
Personal assistant software is required to act as an interface to the digital world by
understanding user requests or commands and then translating them into actions or
recommendations based on the agent’s understanding of the world.
Jarvis focuses on relieving the user of entering text input by using voice as the primary
means of user input. The agent applies voice recognition algorithms to this input and
records it. It then uses this input to call one of the personal information
management applications, such as the task list or calendar, to record a new entry, or to
search for it on search engines such as Google, Bing or Yahoo.
The focus is on capturing the user input through voice, recognizing the input and then
executing the task if the agent understands it. The software takes this input in
natural language, which makes it easier for users to state what they want
done.
3.3 Feasibility Study
A feasibility study helps you determine whether or not you should proceed with your
project. It is essential to evaluate the cost and benefit of the proposed system.
Five types of feasibility are taken into consideration.
3.3.1 Technical feasibility :-
It involves identifying the technologies for the project, both hardware and software. For a
virtual assistant, the user must have a microphone to convey their message and a speaker to
listen when the system speaks. These are very cheap nowadays and almost everyone has
them. In addition, the system needs an internet connection, so while using Jarvis, make sure you
have a steady one. This is also not an issue in an era where almost every
home or office has Wi-Fi.
3.3.2 Operational feasibility :-
It is the ease and simplicity of operation of the proposed system. The system does not require
any special skill set for users to operate it. In fact, it is designed to be used by almost
everyone. Kids who do not yet know how to write can read out problems to the system and get
answers.
3.3.3 Economical feasibility :-
Here, we find the total cost and benefit of the proposed system over the current system. For
this project, the main cost is the documentation cost. Users would also have to pay for a
microphone and speakers, which, again, are cheap and readily available. As far as maintenance
is concerned, JIA will not cost much.
3.3.4 Organizational feasibility :-
This shows the management and organizational structure of the project. This project is
not built by a team. The management tasks are all to be carried out by a single person.
That won’t create any management issues and will increase the feasibility of the project.
3.3.5 Cultural feasibility :-
It deals with the compatibility of the project with the cultural environment. The virtual
assistant is built in accordance with the general culture. The project is named JIA so as to
represent Indian culture without undermining local beliefs.
This project is technically feasible with no external hardware requirements. It is also
simple to operate and incurs no training or repair costs. The overall feasibility study of
the project reveals that the goals of the proposed system are achievable. The decision is
taken to proceed with the project.
CHAPTER 4
SYSTEM REQUIREMENT AND DESIGN
4.1 Hardware and Software Requirements
The software is designed to be lightweight so that it is not a burden on the
machine running it. The system is being built with generally available
hardware and software compatibility in mind. Here are the minimum hardware and software
requirements for the virtual assistant.
Hardware :-
• Pentium Pro processor or later.
• 512 MB RAM or more.
Software :-
• Windows 7 (32-bit) or above.
• Python 2.7 or later.
• Chrome Driver.
4.2 System Design & Block Diagrams
4.2.1 ER DIAGRAM
Fig 4.2.1 ER Diagram
The above diagram shows the entities and their relationships for a virtual assistant system.
A user of the system can have a set of keys and values, which can be used to
store any information about the user. Say, for the key “name” the value can be “Jim”. The user
might like to keep some keys secure; for these, the user can enable a lock and set a password
(voice clip).
A single user can ask multiple questions. Each question is given an ID by which it is
recognized, along with the query and its corresponding answer. A user can also
have any number of tasks. Each task has its own unique ID and a status, i.e. its
current state. A task also has a priority value and a category indicating whether it is a
parent task or a child task of an older task.
4.2.2 ACTIVITY DIAGRAM
Fig 4.2.2 Activity Diagram
Initially, the system is in idle mode. When it receives a wake-up call, it begins execution.
The received command is classified as either a question or a task to be
performed, and the corresponding action is taken. After the question has been answered or
the task has been performed, the system waits for another command. This loop continues
until it receives the quit command, at which point it goes back to sleep.
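The activity flow above amounts to a small two-state loop, sketched below on already-transcribed text. The wake word `jia`, the quit word `quit`, and the rule that a command starting with a question word counts as a question are all simplifying assumptions made for illustration.

```python
QUESTION_WORDS = ("what", "who", "when", "where", "why", "how")


def classify(command):
    """Rough heuristic: commands starting with a question word are questions."""
    return "question" if command.lower().startswith(QUESTION_WORDS) else "task"


def run_assistant(inputs, wake_word="jia", quit_word="quit"):
    """Process a sequence of utterances and return a log of the actions taken."""
    state = "idle"
    log = []
    for heard in inputs:
        text = heard.strip().lower()
        if state == "idle":
            # While idle, everything except the wake word is ignored.
            if text == wake_word:
                state = "listening"
            continue
        if text == quit_word:
            state = "idle"  # go back to sleep
            log.append("sleep")
            continue
        log.append(classify(text))
    return log


if __name__ == "__main__":
    print(run_assistant(["jia", "what is the time", "set a reminder", "quit"]))
```

In a real assistant the `log.append(...)` calls would be replaced by the question-answering and task-execution routines.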
4.2.3 CLASS DIAGRAM
Fig 4.2.3 Class Diagram
The User class has two attributes: the command that it sends as audio and the response it
receives, which is also audio. It provides functions to listen to the user command, interpret
it, and then reply or send back a response accordingly. The Question class holds the
command in string form, as interpreted by the Interpret class, and passes it to the general,
about or search function depending on how it is identified.
The Task class also holds the interpreted command in string format. It has various functions
such as reminder, note, mimic, research and reader.
4.2.4 USE CASE DIAGRAM
Fig 4.2.4 Use Case Diagram
In this project there is only one user. The user issues a command to the system. The system
then interprets it and fetches the answer. The response is sent back to the user.
4.2.5 SEQUENCE DIAGRAM
Sequence diagram for Query-Response
Fig 4.2.5.1 Sequence diagram for Query-Response
The above sequence diagram shows how the answer to a question asked by the user is
fetched from the internet. The audio query is interpreted and sent to the web scraper. The
web scraper searches for and finds the answer. The answer is then sent to the speaker, which
speaks it to the user.
Sequence diagram for Task Execution
Fig 4.2.5.2 Sequence diagram for Task Execution
The user sends a command to the virtual assistant in audio form. The command is passed
to the interpreter, which identifies what the user has asked and directs it to the task executor.
If the task is missing some information, the virtual assistant asks the user for it. The
received information is passed to the task, which is then accomplished. After execution,
feedback is sent back to the user.
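The ask-back step in this sequence can be sketched as follows. The table of required fields, the prompt wording and the `ask_user` callback are hypothetical; in the real system the callback would speak the prompt and listen for the reply.

```python
# Hypothetical table of the details each task type needs before it can run.
REQUIRED_FIELDS = {"reminder": ("what", "when")}


def execute_task(task_type, details, ask_user):
    """Fill in missing details by asking the user, then return feedback text."""
    details = dict(details)  # work on a copy, leave the caller's data untouched
    for field in REQUIRED_FIELDS.get(task_type, ()):
        if not details.get(field):
            prompt = f"Please tell me the {field} for this {task_type}."
            details[field] = ask_user(prompt)
    summary = ", ".join(f"{key}={details[key]}" for key in sorted(details))
    return f"{task_type} saved: {summary}"


if __name__ == "__main__":
    # The built-in input() stands in for the voice round trip.
    print(execute_task("reminder", {"what": "class test"}, input))
```

Passing the question-asking function in as an argument keeps the task logic independent of whether the reply arrives by voice or keyboard.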
4.2.6 DATA FLOW DIAGRAM
DFD Level 0 (Context Level Diagram)
Fig 4.2.6.1 DFD Level 0
DFD Level 1
Fig 4.2.6.2 DFD Level 1
DFD Level 2
Data Flow in Assistance
Managing User Data
Data Flow in Kid Zone
Settings of virtual Assistant
4.2.7 COMPONENT DIAGRAM
Fig 4.2.7 Component Diagram
The main component here is the Virtual Assistant. It provides two specific services:
executing a task or answering your question.
4.2.8 DEPLOYMENT DIAGRAM
Fig 4.2.8 Deployment Diagram
The user interacts with the SQLite database through an SQLite connection in the Python code.
The DBpedia knowledge base must be accessed over an internet connection, which requires
a LAN or WLAN / Ethernet network.
4.3 DATA DICTIONARY
User
  Field          Type      Constraint / Values
  Key            Text
  Value          Text
  Lock           Boolean
  Password       Text

Question
  Field          Type      Constraint / Values
  Qid            Integer   PRIMARY KEY
  Query          Text
  Answer         Text

Task
  Field          Type      Constraint / Values
  Tid            Integer   PRIMARY KEY
  Status         Text      Active / Waiting / Stopped
  Level          Text      Parent / Sub
  Priority       Integer

Reminder
  Field          Type      Constraint / Values
  Rid            Integer   PRIMARY KEY
  Tid            Integer   FOREIGN KEY
  What           Text
  When           Time
  On             Date
  Notify before  Time

Note
  Field          Type      Constraint / Values
  Nid            Integer   PRIMARY KEY
  Tid            Integer   FOREIGN KEY
  Data           Text
  Priority       Integer
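The data dictionary above could be realized in SQLite roughly as follows. This is a sketch under assumptions: the Reminder fields When, On and Notify before are renamed to `remind_time`, `remind_date` and `notify_before` because WHEN and ON are SQL keywords, Boolean and time values are stored as integers and text, and the CHECK constraints and the `create_database` helper are additions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user (
    "key"    TEXT,
    value    TEXT,
    lock     INTEGER DEFAULT 0,   -- Boolean stored as 0 or 1
    password TEXT
);
CREATE TABLE question (
    qid     INTEGER PRIMARY KEY,
    "query" TEXT,
    answer  TEXT
);
CREATE TABLE task (
    tid      INTEGER PRIMARY KEY,
    status   TEXT CHECK (status IN ('Active', 'Waiting', 'Stopped')),
    level    TEXT CHECK (level IN ('Parent', 'Sub')),
    priority INTEGER
);
CREATE TABLE reminder (
    rid           INTEGER PRIMARY KEY,
    tid           INTEGER REFERENCES task(tid),
    what          TEXT,
    remind_time   TEXT,   -- "When" in the data dictionary
    remind_date   TEXT,   -- "On" in the data dictionary
    notify_before TEXT
);
CREATE TABLE note (
    nid      INTEGER PRIMARY KEY,
    tid      INTEGER REFERENCES task(tid),
    data     TEXT,
    priority INTEGER
);
"""


def create_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    db = create_database()
    db.execute("INSERT INTO task (status, level, priority) VALUES ('Active', 'Parent', 1)")
    print(db.execute("SELECT * FROM task").fetchall())
```

With foreign keys enabled, a reminder or note that points at a non-existent task is rejected by the database itself.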
CHAPTER 6
CONCLUSION
Through this voice assistant, we have automated various services using single-line
commands. It eases most of the user's tasks, such as searching the web, retrieving
weather forecast details, vocabulary help and medical queries. We aim to make
this project a complete server assistant, smart enough to act as a
replacement for general server administration. Future plans include integrating
Jarvis with mobile devices using React Native to provide a synchronised experience between
the two connected devices. Further, in the long run, Jarvis is planned to feature automatic
deployment supporting Elastic Beanstalk, file backups, and all the operations that a
general server administrator performs. The functionality would be seamless enough to
replace the server administrator with Jarvis.