
SKD Academy (CBSE)

Session – 2024-2025
Subject – Artificial Intelligence (417)
Important Questions
Chap - NLP
1. _________ is useful when information overload is a real problem and we need to access a specific,
important piece of information from a huge knowledge base.
a) Automatic Summarization
b) Sentiment Analysis
c) Text Classification
d) All of the above
2. The goal of _________ is to identify sentiment among several posts, or even within the
same post where emotion is not always explicitly expressed.
a) Automatic Summarization
b) Sentiment Analysis
c) Text Classification
d) All of the above
3. By dividing up large problems into smaller ones, _________ aims to help you manage them in
a more constructive manner.
a) CDP
b) CBT
c) CSP
d) CLP
4. Cognitive Behavioral Therapy includes _________.
a) Your Thoughts
b) Your Behaviors
c) Your Emotions
d) All of the above
5. Once the textual data has been collected, it needs to be processed and cleaned so that
an easier version can be sent to the machine. This is known as _________.
a) Data Acquisition
b) Data Exploration
c) Data Mining
d) None of the above
6. ________ work around a script which is programmed into them.
a) Script-bot
b) Smart-bot
c) Both a) and b)
d) None of the above
7. ________ work on bigger databases and other resources directly.
a) Script-bot
b) Smart-bot
c) Both a) and b)
d) None of the above
8. __________ helps in cleaning up the textual data in such a way that its complexity
comes down to a level lower than that of the actual data.
a) Speech Normalization
b) Text Normalization
c) Visual Normalization
d) None of the above
9. __________ are the words which occur very frequently in the corpus but do not add any
value to it.
a) Tokens
b) Words
c) Stopwords
d) None of the above
10. Applications of TFIDF are _________.
a) Document Classification
b) Topic Modelling
c) Information Retrieval System and Stop word filtering
d) All of the above
11. _______ is the process in which the affixes of words are removed and the words are
converted to their base form.
a) Stemming
b) Stopwords
c) Case-sensitivity
d) All of the above
12. __________ is a Natural Language Processing model which helps in extracting features
out of the text which can be helpful in machine learning algorithms.
a) Bag of Words
b) Big Words
c) Best Words
d) All of the above
13. Which of the following is not correct about NLP?
a) It is a subfield of AI.
b) It is focused on enabling computers to understand and process human languages.
c) It takes in the data of Natural Languages which humans use in their daily lives.
d) None of the above
14. One of the applications of Natural Language Processing is relevant when used to provide
an overview of a news item or blog post, while avoiding redundancy from multiple
sources and maximizing the diversity of content obtained. Identify the application from
the following:
a) Sentiment Analysis
b) Virtual Assistants
c) Text classification
d) Automatic Summarization
15. The term used for the whole textual data from all the documents altogether is known as
_________.
b) Slab
c) Corpus
d) Cropus

1. What is a Chatbot?
A chatbot is a computer program that's designed to simulate human conversation through
voice commands, text chats or both, e.g. Mitsuku Bot, Jabberwacky, etc.
2. While working with NLP, what is the meaning of:
a. Syntax
b. Semantics
Syntax: Syntax refers to the grammatical structure of a sentence.
Semantics: It refers to the meaning of the sentence.

3. What is the difference between stemming and lemmatization?


Stemming is the process in which the affixes of words are removed and the words are
converted to their base form. It is just like cutting down the branches of a tree to its
stem. For example, the stem of the words eating, eats, eaten is eat.
Lemmatization is the grouping together of different forms of the same word so that they
can be analysed as a single item; unlike a stem, the resulting lemma is always a
meaningful word. In search queries, lemmatization allows end users to query any version
of a base word and get relevant results.
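The difference is easy to see in code. Below is a minimal sketch using NLTK's
PorterStemmer and WordNetLemmatizer (assuming NLTK and its WordNet data are installed);
note that a real stemmer does not always match the textbook example exactly.

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('wordnet', quiet=True)  # the lemmatizer needs the WordNet data

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["eating", "eats", "eaten"]:
    print(word, "->", stemmer.stem(word), "|", lemmatizer.lemmatize(word, pos="v"))
# Stemming: eating -> eat, eats -> eat, but eaten -> eaten (a stem may not be a real word).
# Lemmatization (treating the words as verbs): all three map to the lemma "eat".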

4. What is meant by a dictionary in NLP?


A dictionary in NLP means a list of all the unique words occurring in the corpus. If some
words are repeated in different documents, they are written just once while creating the
dictionary.
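A minimal sketch of building such a dictionary in Python (the two documents here are
hypothetical):

documents = ["we are going to mumbai", "mumbai is a famous place"]

dictionary = []
for doc in documents:
    for word in doc.split():
        if word not in dictionary:   # a repeated word is written just once
            dictionary.append(word)

print(dictionary)
# ['we', 'are', 'going', 'to', 'mumbai', 'is', 'a', 'famous', 'place']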

5. What is term frequency?


Term frequency is the frequency of a word in one document. It can easily be read off the
document vector table, since that table records the frequency of each word of the
vocabulary in each document.

6. Which package is used for Natural Language Processing in Python programming?


Natural Language Toolkit (NLTK). NLTK is one of the leading platforms for building
Python programs that can work with human language data.
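As a minimal sketch, tokenizing a sentence with NLTK looks like this (assuming the
'punkt' tokenizer data has been downloaded; newer NLTK releases may additionally
require the 'punkt_tab' resource):

import nltk
nltk.download('punkt', quiet=True)

from nltk.tokenize import word_tokenize
print(word_tokenize("We are going to Mumbai."))
# ['We', 'are', 'going', 'to', 'Mumbai', '.']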

7. What is a document vector table?


A Document Vector Table is used while implementing the Bag of Words algorithm.
In a document vector table, the header row contains the vocabulary of the corpus and other
rows correspond to different documents.

8. What do you mean by corpus?


In Text Normalization, we go through several steps to normalize the text to a lower
level. That is, we work on text from multiple documents, and the term used for the
whole textual data from all the documents altogether is known as the corpus.

9. Differentiate between a script-bot and a smart-bot.


Script-bot:
a) A scripted chatbot doesn't carry even a glimpse of AI.
b) Script bots are easy to make.
c) Script bot functioning is very limited as they are less powerful.
d) Script bots work around a script which is programmed into them.
e) Limited functionality.

Smart-bot:
a) Smart bots are built on NLP and ML.
b) Smart bots are comparatively difficult to make.
c) Smart bots are flexible and powerful.
d) Smart bots work on bigger databases and other resources directly.
e) Wide functionality.

10. What is inverse document frequency?


To understand inverse document frequency, first we need to understand document
frequency. Document Frequency is the number of documents in which the word occurs
irrespective of how many times it has occurred in those documents.
In the case of inverse document frequency, we put the document frequency in the
denominator and the total number of documents in the numerator.
For example, if the corpus contains 3 documents and the document frequency of the word
"AMAN" is 2, then its inverse document frequency is 3/2.
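The example above can be checked with a few lines of Python (the three documents are
hypothetical):

documents = ["aman went home", "aman is here", "the weather is nice"]
word = "aman"

N = len(documents)                                        # total number of documents
df = sum(1 for doc in documents if word in doc.split())   # document frequency
print(N / df)  # 1.5, i.e. 3/2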
11. What do you mean by document vectors?
Document Vector contains the frequency of each word of the vocabulary in a particular
document.
In a document vector table, the vocabulary is written in the top row. Then, for each word
in the document, if it matches the vocabulary, put a 1 under it. If the same word appears
again, increment the previous value by 1. If a word does not occur in that document,
put a 0 under it.

12. What is TFIDF?


Term frequency–inverse document frequency (TFIDF) is a numerical statistic that is
intended to reflect how important a word is to a document in a collection or corpus.
Its term frequency part is the number of times a word appears in a document divided by
the total number of words in that document; every document has its own term frequency.
This is then weighted by the inverse document frequency.
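Putting the two parts together, one common formulation is
TFIDF(W) = TF(W) × log(N / DF(W)). A minimal sketch, reusing the corpus from question 18
and assuming a base-10 logarithm:

import math

documents = [
    "we are going to mumbai",
    "mumbai is a famous place",
    "we are going to a famous place",
    "i am famous in mumbai",
]

def tfidf(word, doc, documents):
    tf = doc.split().count(word) / len(doc.split())        # term frequency
    df = sum(1 for d in documents if word in d.split())    # document frequency
    return tf * math.log10(len(documents) / df)            # weight by inverse document frequency

print(tfidf("mumbai", documents[0], documents))  # in 3 of 4 documents -> low idf weight
print(tfidf("we", documents[0], documents))      # in 2 of 4 documents -> higher idf weight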

13. Which words in a corpus have the highest values and which ones have the
least?
Stop words like and, this, is, the, etc. occur the most in a corpus, but these words do
not talk about the corpus at all. Hence, they are termed stopwords and are mostly
removed at the pre-processing stage.
Rare or valuable words occur the least but add the most importance to the corpus. Hence,
when we look at the text, we take both frequent and rare words into consideration.

14. Does the vocabulary of a corpus remain the same before and after text
normalization? Why?

No, the vocabulary of a corpus does not remain the same before and after text
normalization. The reasons are:
a) In normalization the text goes through various steps and is reduced to a minimum
vocabulary, since the machine does not require grammatically correct statements but only
the essence of them.
b) In normalization, stop words, special characters and numbers are removed.
c) In stemming, the affixes of words are removed and the words are converted to their
base form. So, after normalization, we get a reduced vocabulary.

15. Explain the concept of Bag of Words.


Bag of Words is a Natural Language Processing model which helps in extracting features out
of the text which can be helpful in machine learning algorithms. In bag of words, we get the
occurrences of each word and construct the vocabulary for the corpus.

16. Explain the relation between occurrence and value of a word.


Plotted as a graph, occurrence and value of a word are inversely proportional. The words
which occur most often (like stop words) have negligible value. As the occurrence of
words drops, the value of such words rises. These words are termed rare or valuable
words: they occur the least but add the most value to the corpus.

17. What are the applications of TFIDF?


TFIDF is commonly used in the Natural Language Processing domain. Some of its applications
are:
a) Document Classification - Helps in classifying the type and genre of a document.
b) Topic Modelling - It helps in predicting the topic for a corpus.
c) Information Retrieval System - To extract the important information out of a corpus.
d) Stop word filtering - Helps in removing the unnecessary words out of a text body.

18. Create a document vector table for the given corpus:


Document 1: We are going to Mumbai
Document 2: Mumbai is a famous place.
Document 3: We are going to a famous place.
Document 4: I am famous in Mumbai.

        We  are  going  to  Mumbai  is  a  famous  place  I  am  in
Doc 1:  1   1    1      1   1       0   0  0       0      0  0   0
Doc 2:  0   0    0      0   1       1   1  1       1      0  0   0
Doc 3:  1   1    1      1   0       0   1  1       1      0  0   0
Doc 4:  0   0    0      0   1       0   0  1       0      1  1   1
19. Write the steps necessary to implement the bag of words algorithm.

Answer – The steps to implement the bag of words algorithm are as follows:
1. Text Normalisation: Collect data and pre-process it.
2. Create Dictionary: Make a list of all the unique words occurring in the corpus.
3. Create document vectors: For each document in the corpus, find out how many times
each word from the unique list of words has occurred.
4. Create document vectors for all the documents.
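A minimal sketch of these four steps in Python, using the corpus from question 18;
the normalisation here is simply lower-casing and removing full stops:

documents = [
    "We are going to Mumbai",
    "Mumbai is a famous place.",
    "We are going to a famous place.",
    "I am famous in Mumbai.",
]

# Step 1: Text Normalisation (lower-case, strip punctuation, split into words)
normalised = [doc.lower().replace(".", "").split() for doc in documents]

# Step 2: Create Dictionary of unique words
dictionary = []
for tokens in normalised:
    for word in tokens:
        if word not in dictionary:
            dictionary.append(word)

# Steps 3 and 4: Create a document vector for every document
for tokens in normalised:
    print([tokens.count(word) for word in dictionary])
# Each printed row is one document vector; the columns follow the dictionary order:
# ['we', 'are', 'going', 'to', 'mumbai', 'is', 'a', 'famous', 'place', 'i', 'am', 'in']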

20. Imagine developing a prediction model based on AI and deploying it to monitor
traffic congestion on the roadways. The model's goal is to foretell whether or not
there will be a traffic jam. We must now determine whether or not the predictions this
model generates are accurate in order to gauge its efficacy. Prediction and Reality are
the two conditions that we need to consider.
Today, traffic jams are a regular occurrence in our lives. Every time you get on the
road in an urban location, you have to deal with traffic. Most pupils choose to take
buses to school. Due to these traffic bottlenecks, the bus frequently runs late, making
it impossible for the pupils to get to school on time.
Create a Confusion Matrix for the aforementioned scenario, taking into account all
potential outcomes.
Answer –
Case 1: Is there a traffic jam? Prediction: Yes | Reality: Yes → True Positive
Case 2: Is there a traffic jam? Prediction: No | Reality: No → True Negative
Case 3: Is there a traffic jam? Prediction: Yes | Reality: No → False Positive
Case 4: Is there a traffic jam? Prediction: No | Reality: Yes → False Negative

As a confusion matrix:

                 Reality: Yes     Reality: No
Prediction: Yes  True Positive    False Positive
Prediction: No   False Negative   True Negative
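Given lists of predictions and actual outcomes, the four counts can be tallied with a
short sketch (the example data below is hypothetical):

predictions = ["Yes", "No", "Yes", "No"]   # what the model foretold
reality     = ["Yes", "No", "No",  "Yes"]  # what actually happened

tp = tn = fp = fn = 0
for p, r in zip(predictions, reality):
    if   p == "Yes" and r == "Yes": tp += 1  # True Positive
    elif p == "No"  and r == "No":  tn += 1  # True Negative
    elif p == "Yes" and r == "No":  fp += 1  # False Positive
    else:                           fn += 1  # False Negative

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)  # TP: 1 TN: 1 FP: 1 FN: 1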

21. Make a 4W Project Canvas for the following scenario:

Risks will become more concentrated in a single network as more and more innovative
technologies are used. In such cases, cybersecurity becomes incredibly complex and is
no longer under the authority of firewalls, which cannot recognise odd behaviour
patterns such as data migration. Consider how AI systems can sift through voluminous
data to find vulnerable user behaviour. To explicitly define the scope, the method of
data collection, the model, and the evaluation criteria, use an AI project cycle.

A possible 4W canvas for this scenario:
Who: Organisations whose networks rely on many innovative technologies, and the
security teams that protect them.
What: Cyber risks concentrated in a single network that firewalls cannot detect, such
as odd behaviour patterns like data migration.
Where: In network security monitoring, across the voluminous user data flowing through
the network.
Why: AI systems can sift through this data to find vulnerable user behaviour that
rule-based firewalls miss.
