Module 3
1. In the context of natural language processing, how can we leverage the concepts of TF-IDF,
training set, validation set, test set, and stop words to improve the accuracy and
effectiveness of machine learning models and algorithms? Additionally, what are some
potential challenges and considerations when working with these concepts, and how can we
address them? CO3 BL3 5 Marks
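A minimal sketch of TF-IDF weighting combined with stop-word removal, two of the ideas in question 1. It assumes raw term counts for tf and the unsmoothed idf = log(N / df); libraries such as scikit-learn use smoothed variants by default, so their numbers differ. The stop-word list and the three documents are made up for illustration, and the train/validation/test split is not shown.

```python
import math
from collections import Counter

# Hypothetical stop-word list and toy corpus, for illustration only.
STOP_WORDS = {"the", "a", "is", "on"}
DOCS = [
    "the cat is on the mat",
    "the dog is on the log",
    "a cat chased a dog",
]

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights

for doc, w in zip(DOCS, tf_idf(DOCS)):
    print(doc, "->", {t: round(v, 3) for t, v in w.items()})
```

In the first document, "mat" (found in one document) outweighs "cat" (found in two), and "the" never appears because it is filtered as a stop word.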
2. Define text classification. CO3 BL3 2 Marks
3. Describe the ways of extracting information from unstructured text. CO3 BL3 5 Marks
4. Explain ad-hoc retrieval problems. CO3 BL3 2 Marks
5. What aspects of ad-hoc retrieval problems are addressed by Information Retrieval research?
CO3 BL3 2 Marks
6. What are the contents of an Information Retrieval model? CO3 BL2 2 Marks
7. What is an inverted index? CO3 BL2 2 Marks
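A minimal sketch of an inverted index, assuming lowercase whitespace tokenization: each term maps to a sorted postings list of document IDs, and a Boolean AND query intersects those lists. The three documents are illustrative.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def and_query(index, *terms):
    """Return IDs of documents that contain every query term (postings intersection)."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["Brutus killed Caesar", "Caesar was ambitious", "Brutus was honourable"]
index = build_inverted_index(docs)
print(index["caesar"])                       # [0, 1]
print(and_query(index, "brutus", "caesar"))  # [0]
```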
8. Describe how hand-coded rules help in performing text classification. CO3 BL3 5 Marks
9. What are the machine learning approaches used for text classification? CO3 BL3 5 Marks
10. What are the drawbacks of the Naive Bayes classifier? CO3 BL3 5 Marks
11. Explain the result of the independence assumptions in Multinomial Naïve Bayes. CO3 BL3
5 Marks
12. Name two NLP applications in which the bag-of-words technique can be used. CO3 BL3
5 Marks
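Two common applications are spam filtering and document-level sentiment classification, where each message or review is turned into a word-count vector before a classifier is trained. A minimal sketch of that vectorisation step, assuming lowercase whitespace tokenization and a toy two-document corpus:

```python
from collections import Counter

def build_vocabulary(docs):
    """Sorted list of all distinct lowercase tokens in the corpus."""
    return sorted({w for d in docs for w in d.lower().split()})

def bag_of_words(doc, vocab):
    """Count vector for doc over vocab; word order is discarded."""
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

docs = ["good movie good acting", "bad movie"]
vocab = build_vocabulary(docs)
print(vocab)                         # ['acting', 'bad', 'good', 'movie']
print(bag_of_words(docs[0], vocab))  # [1, 0, 2, 1]
```

Because order is discarded, "good movie" and "movie good" map to the same vector.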
13. What is the problem with maximum likelihood estimation for the Multinomial Naive Bayes
classifier? How can it be resolved? CO3 BL3 10 Marks
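A minimal sketch of the problem and the usual fix, on hypothetical toy data. The maximum likelihood estimate P(w | c) = count(w, c) / (total tokens in c) is zero for any word never seen with class c, which zeroes the whole product for a document containing that word. Add-one (Laplace) smoothing, (count(w, c) + 1) / (total tokens in c + |V|), keeps every estimate positive.

```python
from collections import Counter

# Hypothetical toy training data: class -> list of training documents.
TRAIN = {
    "pos": ["great fun", "great plot"],
    "neg": ["boring plot", "boring"],
}

vocab = {w for docs in TRAIN.values() for d in docs for w in d.split()}
counts = {c: Counter(w for d in docs for w in d.split()) for c, docs in TRAIN.items()}

def p_mle(word, c):
    """Maximum likelihood: count(w, c) / total tokens in class c (can be 0)."""
    return counts[c][word] / sum(counts[c].values())

def p_laplace(word, c, alpha=1):
    """Add-alpha smoothing: (count(w, c) + alpha) / (total tokens in c + alpha * |V|)."""
    return (counts[c][word] + alpha) / (sum(counts[c].values()) + alpha * len(vocab))

# "fun" never occurs in a negative document, so the MLE estimate is 0
# and any document containing "fun" would get P(doc | neg) = 0.
print(p_mle("fun", "neg"))                # 0.0
print(round(p_laplace("fun", "neg"), 3))  # 0.143, i.e. 1/7
```

Here the negative class has 3 tokens and |V| = 4, so the smoothed estimate is (0 + 1) / (3 + 4) = 1/7, and the smoothed probabilities over the vocabulary still sum to 1.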
14. Explain the confusion matrix that can be generated for a spam detector. CO3 BL3
5 Marks
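A minimal sketch of the four confusion-matrix cells for a spam detector, treating "spam" as the positive class, together with the precision, recall, F1 and accuracy derived from them. The gold labels and predictions for eight emails are hypothetical.

```python
def confusion_matrix(y_true, y_pred, positive="spam"):
    """Count TP, FP, FN, TN treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1   # spam correctly caught
        elif p == positive:
            fp += 1   # legitimate mail flagged as spam
        elif t == positive:
            fn += 1   # spam that slipped through
        else:
            tn += 1   # legitimate mail correctly delivered
    return tp, fp, fn, tn

# Hypothetical gold labels and predictions for 8 emails.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "ham", "spam", "ham", "ham", "spam"]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / len(y_true)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")  # TP=3 FP=1 FN=1 TN=3
print(precision, recall, f1, accuracy)     # 0.75 0.75 0.75 0.75
```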
15. How is k-fold cross-validation used to evaluate a text classifier? CO3 BL2 5 Marks
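A minimal sketch of k-fold cross-validation: the data is split into k folds, each fold serves once as the test set while the rest is used for training, and the k scores are averaged. The labels and the majority-class "classifier" are hypothetical stand-ins for a real text classifier.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds; every index is tested exactly once."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more item
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate(train_and_score, n_samples, k=5):
    """Average the score returned by train_and_score(train_idx, test_idx) over the k folds."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Hypothetical labels, deliberately left sorted by class.
labels = ["pos"] * 7 + ["neg"] * 3

def majority_baseline(train_idx, test_idx):
    """Hypothetical stand-in classifier: always predict the most common training label."""
    guess = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == guess for i in test_idx) / len(test_idx)

print(round(cross_validate(majority_baseline, len(labels), k=5), 3))  # 0.7
```

Because the labels are sorted, the last fold contains only "neg" examples and scores 0; shuffling (or stratifying by class) before splitting avoids such skewed folds.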
16. Explain the practical issues of a text classifier and how to solve them. CO3 BL2 5 Marks
17. What are the types of Text classification techniques? CO3 BL1 5 Marks
18. Give any three evaluation metrics available for text classification. Explain them with
examples. CO3 BL2 10 Marks
19. What evaluation measures are used to judge the performance of a model?
CO3 BL3 2 Marks
20. With a schematic diagram, explain the Word2Vec type of word embedding. CO3 BL2 5 Marks
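Word2Vec's skip-gram variant learns vectors by predicting context words from a centre word. The sketch below shows only the training-pair generation step, with an assumed window size; the shallow network (embedding layer plus softmax or negative sampling) trained on these pairs is not shown, and the CBOW variant reverses the direction (context predicts centre).

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs for the skip-gram variant of Word2Vec."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox jumps".split(), window=1)
print(pairs[:4])
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick')]
```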
21. Explain the working of Doc2Vec (document embedding) with a labelled diagram. CO3 BL2
5 Marks
22. With an example, explain the following word-to-sequence analyses: CO3 BL2 5 Marks
a) vector semantics
b) probabilistic language model
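A minimal sketch of both ideas on toy data. For (a), words are represented as co-occurrence count vectors (the counts and the context words "drink", "pour", "drive" are hypothetical) and compared with cosine similarity. For (b), a bigram language model with maximum likelihood estimates scores a sentence as a product of P(w | previous word).

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# a) Vector semantics: hypothetical co-occurrence counts with the contexts
#    ["drink", "pour", "drive"].
vectors = {"tea": [4, 3, 0], "coffee": [5, 2, 0], "car": [0, 0, 6]}
print(round(cosine(vectors["tea"], vectors["coffee"]), 3))  # 0.966
print(cosine(vectors["tea"], vectors["car"]))               # 0.0

# b) Probabilistic (bigram) language model with MLE estimates.
corpus = "<s> i like tea </s> <s> i like coffee </s>".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w_prev, w):
    """P(w | w_prev) = count(w_prev, w) / count(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

def p_sentence(words):
    """Probability of a sentence as a product of bigram probabilities."""
    tokens = ["<s>"] + words + ["</s>"]
    return math.prod(p_bigram(a, b) for a, b in zip(tokens, tokens[1:]))

print(p_sentence("i like tea".split()))  # 0.5
```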
23. Define opinion mining. CO3 BL3 2 Marks
24. What aspects are taken into account while collecting feedback on brands for sentiment
analysis? CO3 BL3 5 Marks
25. What is intent analysis? CO3 BL3 2 Marks
26. Explain emotion analysis. CO3 BL3 2 Marks
27. How does emotional analytics work? CO3 BL3 5 Marks
28. Naïve Bayes classifier is not so naïve – explain. CO3 BL3 5 Marks
29. With detailed steps, explain the working of Multinomial Naive Bayes learning. CO3 BL3
5 Marks
30. What are micro-averaging and macro-averaging? Explain with an example. CO3 BL3
10 Marks
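A minimal sketch contrasting the two for precision, on hypothetical per-class counts. Micro-averaging pools TP and FP over all classes before computing the metric, so large classes dominate; macro-averaging computes the metric per class and takes an unweighted mean, so every class counts equally.

```python
def micro_macro_precision(per_class):
    """per_class maps class -> (TP, FP). Returns (micro, macro) precision."""
    # Macro: precision of each class, then the plain average.
    macro = sum(tp / (tp + fp) for tp, fp in per_class.values()) / len(per_class)
    # Micro: pool the counts first, then compute precision once.
    total_tp = sum(tp for tp, _ in per_class.values())
    total_fp = sum(fp for _, fp in per_class.values())
    micro = total_tp / (total_tp + total_fp)
    return micro, macro

# Hypothetical counts: a large class handled well, a small class handled badly.
counts = {"sports": (90, 10), "politics": (1, 9)}
micro, macro = micro_macro_precision(counts)
print(round(micro, 3), round(macro, 3))  # 0.827 0.5
```

Micro precision is 91 / 110 ≈ 0.827, while macro precision is (0.9 + 0.1) / 2 = 0.5, exposing the poor performance on the small class.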
31. State 3 opinion mining techniques with proper explanation. CO3 BL3 10 Marks
32. What issue arises for Information Retrieval based on keyword search in the case of a very large
size document? CO3 BL3 5 Marks
33. What are the initial stages of text processing? CO3 BL3 10 Marks
34. What is the goal of an IR system? CO3 BL4 10 Marks
35. What are the different ways to use the bag-of-words representation for text classification?
CO3 BL3 10 Marks
36. State the difference between sentiment analysis, intent analysis and emotion analysis.
CO3 BL3 10 Marks
37. How is sentiment analysis used by different brands to assess the status of the market after
launching a product? CO3 BL3 10 Marks
38. Mention a few practical applications of emotion analysis through emotion recognition.
CO3 BL3 10 Marks
39. Explain step by step how a Naive Bayes classifier can be used for text classification.
CO3 BL3 10 Marks
40. What are the 4 steps of text normalization? CO3 BL3 5 Marks
41. Highlight practical applications of text classification concept. CO3 BL4 10 Marks
42. What is Named Entity Recognition (NER)? CO3 BL2 2 Marks
43. How is Named Entity Recognition useful in NLP applications? CO3 BL2 5 Marks
44. How is k-fold cross-validation used to evaluate a text classifier? CO3 BL3 10 Marks
45. Explain the fundamental concepts of Natural Language Processing (NLP) and discuss its
significance in today's digital era, providing examples of real-world applications and
potential future advancements. CO3 BL3 5 Marks
46. What is Ambiguity? Explain different types of ambiguity in NLP. CO3 BL3 5 Marks
47. What are the benefits of a text classification system? Give an example. CO3 BL3 5 Marks
48. Explain the building blocks of a semantic system. CO3 BL3 5 Marks
49. What is NLTK? How is it different from spaCy? CO3 BL3 5 Marks
50. Explain dependency parsing in NLP. CO3 BL3 10 Marks
51. What are the steps involved in pre-processing data for NLP? CO2 BL3 5 Marks
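An illustrative preprocessing pipeline in pure Python, assuming one common ordering: lowercase, strip non-letters, tokenize on whitespace, remove stop words. The stop-word list is a tiny hypothetical one, and stemming or lemmatization (usually done with NLTK or spaCy) is omitted to keep the example dependency-free; the steps expected in the course may differ.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists (e.g. NLTK's) are much longer.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to"}

def preprocess(text):
    """Lowercase -> replace non-letters with spaces -> tokenize -> remove stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters and whitespace only
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The battery life is GREAT, and the price: $199!"))
# ['battery', 'life', 'great', 'price']
```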
52. What are some common applications of chatbots in various industries? CO3 BL3 10 Marks
53. Compute the minimum edit distance for transforming the word DOG into COW using
Levenshtein distance, i.e., insertion = deletion = 1 and substitution = 2. CO3 BL4 10 Marks
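A minimal dynamic-programming sketch with the costs stated in the question (insertion = deletion = 1, substitution = 2). It returns the whole table so the working can be checked cell by cell; D[i][j] is the cost of turning the first i letters of the source into the first j letters of the target.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Levenshtein-style dynamic programming with configurable costs; returns the full table."""
    n, m = len(source), len(target)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * del_cost  # delete all of source[:i]
    for j in range(1, m + 1):
        d[0][j] = j * ins_cost  # insert all of target[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + del_cost,                        # deletion
                d[i][j - 1] + ins_cost,                        # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),   # match / substitution
            )
    return d

table = min_edit_distance("DOG", "COW")
for row in table:
    print(row)
print("distance:", table[-1][-1])  # distance: 4
```

The rows come out as [0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 2, 3], [3, 4, 3, 4]: O matches O for free, while D→C and G→W each cost 2 (a substitution, or equivalently a deletion plus an insertion), giving a distance of 4.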
54. What are word embeddings in NLP, and how can they be used in various NLP applications?
CO3 BL5 10 Marks
55. Do you believe there are any distinctions between prediction and classification? Illustrate
with an example. CO3 BL3 5 Marks
56. How do lexical resources like WordNet contribute to lexical semantics in NLP? How does
lexical ambiguity impact NLP tasks such as machine translation or sentiment analysis?
CO3 BL3 5 Marks
57. Analyze the purpose of topic modeling in text analysis. CO3 BL5 5 Marks
58. Given the following dataset, classify whether a new email is spam or not using Naïve Bayes.
Email | Contains "Offer" | Contains "Win" | Contains "Money" | Spam (Yes=1, No=0)
1     | Yes              | Yes            | No               | 1
2     | Yes              | No             | Yes              | 1
3     | No               | Yes            | No               | 0
4     | Yes              | No             | No               | 0
5     | No               | Yes            | Yes              | 0
Using Naive bayes, predict whether the email (Offer = Yes, Win = Yes, Money = Yes) is
spam or not. CO3 BL6 10 Marks
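A minimal sketch of the hand computation for question 58, encoding Yes as 1 and No as 0 and using exact fractions. No smoothing is applied (every count needed here is non-zero); with Laplace smoothing the numbers would change. The score for each class is P(class) × P(Offer=Yes | class) × P(Win=Yes | class) × P(Money=Yes | class).

```python
from fractions import Fraction

# Training data from the table: (Offer, Win, Money) -> Spam, with Yes = 1 and No = 0.
DATA = [
    ((1, 1, 0), 1),
    ((1, 0, 1), 1),
    ((0, 1, 0), 0),
    ((1, 0, 0), 0),
    ((0, 1, 1), 0),
]

def naive_bayes_scores(data, query):
    """Unnormalised P(class) * prod_i P(feature_i = query_i | class), without smoothing."""
    scores = {}
    for c in (0, 1):
        rows = [x for x, y in data if y == c]
        score = Fraction(len(rows), len(data))  # prior P(c)
        for i, value in enumerate(query):
            matches = sum(1 for x in rows if x[i] == value)
            score *= Fraction(matches, len(rows))  # likelihood P(x_i | c)
        scores[c] = score
    return scores

scores = naive_bayes_scores(DATA, (1, 1, 1))
print(scores)  # {0: Fraction(2, 45), 1: Fraction(1, 10)}
posterior_spam = scores[1] / (scores[0] + scores[1])
print("spam" if scores[1] > scores[0] else "not spam", posterior_spam)  # spam 9/13
```

By hand: spam = 2/5 × 2/2 × 1/2 × 1/2 = 1/10 and not spam = 3/5 × 1/3 × 2/3 × 1/3 = 2/45, so the email is classified as spam, with a normalised posterior of 9/13 ≈ 0.69.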
59. A company wants to classify customer feedback as "Positive" or "Negative" based on word
occurrences. The training dataset is:
Given a new feedback (Good = Yes, Fast = Yes, Cheap = No), use Naive Bayes to classify
whether the sentiment is positive or negative. CO3 BL6 10 Marks
60. A weather dataset is given for predicting whether a person will play tennis.
Using Naive Bayes, classify whether a person will play tennis if the weather conditions are:
Outlook = Rain
Temperature = Mild
Humidity = High
Wind = Strong
CO3 BL6 10 Marks