
Brown Corpus Analysis and Programming Tasks

The document contains eight questions on analyzing and summarizing NLTK corpora: (1) counting words that start with "wh" in the news category of the Brown Corpus; (2) finding the conditional frequency distribution of modals across Brown Corpus categories; (3) extracting the year from the filenames in the Inaugural Address Corpus; (4) counting occurrences of selected words in the State of the Union addresses over time; (5) comparing the vocabulary of two texts; (6) finding the words that occur at least three times in the Brown Corpus; (7) calculating lexical diversity scores for each Brown Corpus genre; and (8) writing a function that finds the 50 most frequent words in a text.

Uploaded by

naman


Assignment 2(a)

Q1: In the news genre of the Brown Corpus, find the count of the words starting with
"wh", such as what, when, where, who, and why.
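
A minimal sketch for Q1, assuming NLTK is installed and the corpus has been fetched with nltk.download("brown"). The names count_wh_words and brown_news_wh_counts are hypothetical helpers, not part of NLTK. Matching is case-insensitive and limited to alphabetic tokens, which is an assumption about what counts as a "word" here.

```python
from collections import Counter
from typing import Iterable


def count_wh_words(words: Iterable[str]) -> Counter:
    """Count each word starting with "wh" (case-insensitive, alphabetic tokens only)."""
    return Counter(
        w.lower() for w in words
        if w.isalpha() and w.lower().startswith("wh")
    )


def brown_news_wh_counts() -> Counter:
    """Apply count_wh_words to the news category of the Brown Corpus.

    NLTK is imported lazily so that count_wh_words can be used without it.
    """
    from nltk.corpus import brown
    return count_wh_words(brown.words(categories="news"))
```

For example, counts = brown_news_wh_counts() followed by print(sum(counts.values()), counts.most_common(10)) prints the total number of wh-tokens and the ten commonest wh-words.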

Q2: Find the conditional frequency distribution of the modals ['can', 'could', 'may',
'might', 'must', 'will'] across all the categories of the Brown Corpus.
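
One way to build the table for Q2 is sketched below. It assumes NLTK with the Brown data downloaded. modal_cfd, format_table and brown_modal_table are hypothetical names, and the counts are case-insensitive (an assumption). NLTK's own nltk.ConditionalFreqDist, together with its tabulate(conditions=..., samples=...) method, produces the same kind of table. Here a plain dict of Counter objects stands in for it so the core logic can run without NLTK.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence

MODALS = ("can", "could", "may", "might", "must", "will")


def modal_cfd(words_by_category: Dict[str, Iterable[str]],
              modals: Sequence[str] = MODALS) -> Dict[str, Counter]:
    """Map each category to a Counter of modal occurrences (case-insensitive)."""
    wanted = set(modals)
    return {
        category: Counter(w.lower() for w in words if w.lower() in wanted)
        for category, words in words_by_category.items()
    }


def format_table(cfd: Dict[str, Counter], modals: Sequence[str] = MODALS) -> str:
    """Render one row per category and one column per modal."""
    lines = [f"{'':<16}" + "".join(f"{m:>7}" for m in modals)]
    for category, counts in cfd.items():
        # A Counter returns 0 for modals that never occur in this category
        lines.append(f"{category:<16}" + "".join(f"{counts[m]:>7}" for m in modals))
    return "\n".join(lines)


def brown_modal_table() -> str:
    """Build the table for every Brown category."""
    from nltk.corpus import brown  # lazy import keeps the helpers NLTK-free
    words_by_category = {c: brown.words(categories=c) for c in brown.categories()}
    return format_table(modal_cfd(words_by_category))
```

Calling print(brown_modal_table()) prints one row per genre.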

Q3: Extract the year from the filenames in the Inaugural Address Corpus.
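
The following sketch for Q3 relies on the Inaugural Address Corpus naming its files with a leading year, for example "1789-Washington.txt". It needs NLTK and nltk.download("inaugural"). years_from_fileids and inaugural_years are hypothetical helper names. When every name follows that pattern, a plain fileid[:4] slice works as well. The regular expression adds a guard that skips any name not matching it.

```python
import re
from typing import Iterable, List

# Fileids are assumed to look like "1789-Washington.txt": year, hyphen, name
YEAR_PREFIX = re.compile(r"^(\d{4})-")


def years_from_fileids(fileids: Iterable[str]) -> List[int]:
    """Extract the leading year from each fileid, skipping names without one."""
    years = []
    for fileid in fileids:
        match = YEAR_PREFIX.match(fileid)
        if match:
            years.append(int(match.group(1)))
    return years


def inaugural_years() -> List[int]:
    """Years of all inaugural addresses in the corpus."""
    from nltk.corpus import inaugural  # lazy import so the parser runs without NLTK
    return years_from_fileids(inaugural.fileids())
```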

Q4: Read in the texts of the State of the Union addresses, using the state_union
corpus reader. Count the occurrences of men, women, and people in each
document. What has happened to the usage of these words over time?
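
A sketch for Q4 follows. It assumes NLTK with nltk.download("state_union"), and fileids that begin with a four-digit year (for example "1945-Truman.txt"). count_targets and state_union_rows are hypothetical names. Matching is case-insensitive. Raw counts ignore how long each address is. To compare usage across years fairly, divide each count by len(state_union.words(fileid)).

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Sequence, Tuple

TARGETS = ("men", "women", "people")


def count_targets(words: Iterable[str],
                  targets: Sequence[str] = TARGETS) -> Dict[str, int]:
    """Count case-insensitive occurrences of each target word in one document."""
    counts = Counter(w.lower() for w in words)
    return {t: counts[t] for t in targets}


def state_union_rows(
    targets: Sequence[str] = TARGETS,
) -> Iterator[Tuple[int, str, Dict[str, int]]]:
    """Yield (year, fileid, counts) for each address, oldest first."""
    from nltk.corpus import state_union  # lazy import keeps count_targets NLTK-free
    for fileid in sorted(state_union.fileids()):
        yield int(fileid[:4]), fileid, count_targets(state_union.words(fileid), targets)
```

Looping with for year, fileid, counts in state_union_rows(): print(year, counts) shows the trend one address at a time.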

Q5: Pick a pair of texts and study the differences between them in terms of
vocabulary, vocabulary richness, genre, etc.
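
A starting point for Q5 is sketched below. It compares two token streams by size, vocabulary, shared and unique words, and tokens per type (the same ratio Q7 uses, where a higher value means less diversity). compare_texts and compare_gutenberg are hypothetical names. Pairing Austen's Emma with Melville's Moby Dick from NLTK's Gutenberg selection is only an illustrative choice and needs nltk.download("gutenberg"). Tokens are lower-cased and non-alphabetic tokens are dropped, which is an assumption.

```python
from typing import Dict, Iterable


def compare_texts(words_a: Iterable[str], words_b: Iterable[str]) -> Dict[str, object]:
    """Compare two token streams by size, vocabulary, and tokens per type."""
    a = [w.lower() for w in words_a if w.isalpha()]  # drop punctuation and numbers
    b = [w.lower() for w in words_b if w.isalpha()]
    vocab_a, vocab_b = set(a), set(b)

    def tokens_per_type(tokens, vocab):
        # Higher means less lexically diverse; 0.0 for an empty text
        return len(tokens) / len(vocab) if vocab else 0.0

    return {
        "tokens": (len(a), len(b)),
        "types": (len(vocab_a), len(vocab_b)),
        "tokens_per_type": (tokens_per_type(a, vocab_a), tokens_per_type(b, vocab_b)),
        "shared_types": len(vocab_a & vocab_b),
        "only_a": sorted(vocab_a - vocab_b),
        "only_b": sorted(vocab_b - vocab_a),
    }


def compare_gutenberg(fileid_a: str = "austen-emma.txt",
                      fileid_b: str = "melville-moby_dick.txt") -> Dict[str, object]:
    """Illustrative pairing of two Gutenberg texts."""
    from nltk.corpus import gutenberg  # lazy import so compare_texts runs without NLTK
    return compare_texts(gutenberg.words(fileid_a), gutenberg.words(fileid_b))
```

The only_a and only_b lists are a quick way to spot genre-specific vocabulary. Genre itself still needs a qualitative reading.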

Q6: Write a program to find all words that occur at least three times in the Brown
Corpus.
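
A minimal sketch for Q6, assuming NLTK with nltk.download("brown"). words_at_least and brown_words_at_least_three are hypothetical names. Tokens are counted exactly as the corpus gives them, so "The" and "the" are separate entries. To merge them, pass (w.lower() for w in brown.words()) instead.

```python
from collections import Counter
from typing import Iterable, List


def words_at_least(words: Iterable[str], min_count: int = 3) -> List[str]:
    """Return the sorted word types that occur at least min_count times."""
    counts = Counter(words)
    return sorted(w for w, c in counts.items() if c >= min_count)


def brown_words_at_least_three() -> List[str]:
    """Apply words_at_least to the whole Brown Corpus."""
    from nltk.corpus import brown  # lazy import so words_at_least runs without NLTK
    return words_at_least(brown.words(), 3)
```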

Q7: Write a program to generate a table of lexical diversity scores (i.e., token/type
ratios) for each genre. Include the full set of Brown Corpus genres
(nltk.corpus.brown.categories()). Which genre has the lowest diversity, i.e., the greatest number of tokens per type?

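Ans:

from nltk.corpus import brown

# Score = tokens / types, so a HIGHER score means LOWER lexical diversity
lowest_diversity_category = ""
max_ratio = 0.0
for category in brown.categories():
    words = brown.words(categories=[category])
    num_tokens = len(words)
    num_types = len(set(words))  # count distinct words, not len() of an int
    ratio = num_tokens / num_types
    print(f"{category:<16}{ratio:.2f}")  # one table row per genre
    if ratio > max_ratio:
        max_ratio = ratio
        lowest_diversity_category = category
print("Lowest diversity:", lowest_diversity_category, round(max_ratio, 2))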

Q8: Write a function that finds the 50 most frequently occurring words of a text.
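
A minimal sketch for Q8 using collections.Counter. most_frequent_words is a hypothetical name. Lower-casing and skipping non-alphabetic tokens are assumptions about what "words" means here. Without that filtering, nltk.FreqDist(text).most_common(50) gives the raw-token equivalent.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_frequent_words(text: Iterable[str], n: int = 50) -> List[Tuple[str, int]]:
    """Return the n most frequent words as (word, count) pairs, commonest first.

    Tokens are lower-cased and non-alphabetic tokens (punctuation, numbers)
    are skipped, so "The" and "the" count as the same word.
    """
    counts = Counter(w.lower() for w in text if w.isalpha())
    return counts.most_common(n)
```

For example, after from nltk.corpus import brown, most_frequent_words(brown.words(categories="news")) returns the 50 commonest words of the news genre.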

MADE BY: NAMAN MALHOTRA

101512035
SEM2
