Brown Corpus Analysis and Programming Tasks

The document contains 8 questions asking to analyze and summarize various corpora from NLTK including: 1) Counting words starting with "wh" in the Brown news corpus 2) Finding conditional frequency of modals in Brown corpus categories 3) Finding the year from filenames in the Inaugural Address Corpus 4) Counting occurrences of words in State of the Union addresses over time 5) Comparing vocabulary between two texts 6) Finding words that occur at least 3 times in Brown Corpus 7) Calculating lexical diversity scores for each Brown Corpus genre 8) Writing a function to find the 50 most frequent words in a text

Uploaded by

naman

Assignment 2(a)

Q1: In the news genre of the Brown Corpus, count the words that start with
wh, such as what, when, where, who and why.
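
One possible approach to Q1, given only as a sketch: lowercase every token and tally those beginning with "wh". The helper names count_wh_words and brown_news_wh_counts are made up for illustration, and the short sample list stands in for the corpus so the counting logic can be tried without downloading anything.

```python
from collections import Counter

def count_wh_words(words):
    """Count, case-insensitively, every token that starts with 'wh'."""
    lowered = (w.lower() for w in words)
    return Counter(w for w in lowered if w.startswith("wh"))

def brown_news_wh_counts():
    # Imported lazily so count_wh_words() can be tried without NLTK installed.
    from nltk.corpus import brown  # needs nltk.download('brown') once
    return count_wh_words(brown.words(categories="news"))

# Tiny hand-made token list (an assumption, not corpus data).
sample = ["What", "is", "this", "?", "Who", "knows", "what", "when"]
wh_counts = count_wh_words(sample)
print(wh_counts)
```

Note that a plain prefix test also picks up words such as "white" or "whole"; if only the question words are wanted, filter against an explicit list instead.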

Q2: Find the conditional frequency distribution of the modals ['can', 'could', 'may',
'might', 'must', 'will'] across all categories of the Brown Corpus.
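
A sketch of one way to answer Q2. The function modal_counts is a hypothetical plain-Python stand-in that shows what a conditional frequency distribution stores (one counter per category), while brown_modal_cfd builds the same thing with nltk.ConditionalFreqDist and prints it with tabulate. Lowercasing the words is an assumption; drop it if case should be preserved.

```python
from collections import Counter, defaultdict

MODALS = ['can', 'could', 'may', 'might', 'must', 'will']

def modal_counts(pairs, modals=MODALS):
    """pairs: iterable of (category, word). Returns {category: Counter}."""
    wanted = set(modals)
    table = defaultdict(Counter)
    for category, word in pairs:
        word = word.lower()
        if word in wanted:
            table[category][word] += 1
    return table

def brown_modal_cfd():
    # Imported lazily so modal_counts() can be tried without NLTK installed.
    import nltk
    from nltk.corpus import brown  # needs nltk.download('brown') once
    cfd = nltk.ConditionalFreqDist(
        (genre, word.lower())
        for genre in brown.categories()
        for word in brown.words(categories=genre)
    )
    cfd.tabulate(conditions=brown.categories(), samples=MODALS)
    return cfd

# Hand-made (category, word) pairs, not corpus data.
sample_pairs = [("news", "can"), ("news", "Will"), ("news", "the"),
                ("romance", "could"), ("romance", "could")]
table = modal_counts(sample_pairs)
```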

Q3: Extract the year from each filename in the Inaugural Address Corpus.
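
A minimal sketch for Q3, assuming each file ID begins with a four-digit year (as in '1789-Washington.txt'). The helper name year_from_fileid is hypothetical; slicing with fileid[:4] would work equally well when every ID follows that pattern.

```python
import re

def year_from_fileid(fileid):
    """Return the leading four-digit year of a file ID, or None if absent."""
    match = re.match(r"(\d{4})", fileid)
    return int(match.group(1)) if match else None

def inaugural_years():
    # Imported lazily so year_from_fileid() can be tried without NLTK installed.
    from nltk.corpus import inaugural  # needs nltk.download('inaugural') once
    return [year_from_fileid(f) for f in inaugural.fileids()]

# Illustrative file IDs; the last one is a made-up non-matching case.
sample_ids = ["1789-Washington.txt", "1861-Lincoln.txt", "notes.txt"]
years = [year_from_fileid(f) for f in sample_ids]
print(years)
```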

Q4: Read in the texts of the State of the Union addresses, using the state_union
corpus reader. Count occurrences of men, women, and people in each
document. What has happened to the usage of these words over time?
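
For Q4, one way to get per-document counts is sketched here. count_targets and state_union_table are hypothetical names, matching is case-insensitive by assumption, and the code relies on state_union file IDs starting with the year so that sorting them gives chronological order.

```python
from collections import Counter

TARGETS = ("men", "women", "people")

def count_targets(words, targets=TARGETS):
    """Return {target: count} for one document, ignoring case."""
    counts = Counter(w.lower() for w in words)
    return {t: counts[t] for t in targets}

def state_union_table():
    # Imported lazily so count_targets() can be tried without NLTK installed.
    from nltk.corpus import state_union  # needs nltk.download('state_union') once
    return {fileid: count_targets(state_union.words(fileid))
            for fileid in sorted(state_union.fileids())}

# Hand-made sentence, not corpus data.
sample_doc = "The people asked what men and women of the People wanted".split()
sample_counts = count_targets(sample_doc)
print(sample_counts)
```

Reading the resulting table from the earliest year to the latest is what answers the "over time" part of the question.
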
Q5: Pick a pair of texts and study the differences between them, in terms of
vocabulary, vocabulary richness, genre, etc.
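
Q5 is open-ended, so the following is only a starting-point sketch. It compares two token lists by size, by types per token (one common measure of vocabulary richness), and by which words are unique to each. The names vocab_profile and compare_texts are hypothetical, and dropping non-alphabetic tokens is an assumption.

```python
def vocab_profile(words):
    """Summarise a text: token count, type count, and types per token."""
    tokens = [w.lower() for w in words if w.isalpha()]
    types = set(tokens)
    richness = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types),
            "richness": richness, "vocab": types}

def compare_texts(words_a, words_b):
    """Return both profiles plus the shared and text-specific vocabulary."""
    a, b = vocab_profile(words_a), vocab_profile(words_b)
    return {"a": a, "b": b,
            "shared": a["vocab"] & b["vocab"],
            "only_a": a["vocab"] - b["vocab"],
            "only_b": b["vocab"] - a["vocab"]}

# Two toy texts; with NLTK these could be, for example,
# brown.words(categories="news") and brown.words(categories="romance").
text_a = "the cat sat on the mat".split()
text_b = "the dog sat".split()
report = compare_texts(text_a, text_b)
```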

Q6: Write a program to find all words that occur at least three times in the Brown
Corpus.
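
A sketch for Q6: count every token once, then keep those whose count reaches the threshold. words_at_least is a hypothetical helper. Counting is case-sensitive here, so "The" and "the" are separate words; lowercase the tokens first if that is not wanted.

```python
from collections import Counter

def words_at_least(words, min_count=3):
    """Return, sorted, the words that occur at least min_count times."""
    counts = Counter(words)
    return sorted(w for w, c in counts.items() if c >= min_count)

def brown_frequent_words():
    # Imported lazily so words_at_least() can be tried without NLTK installed.
    from nltk.corpus import brown  # needs nltk.download('brown') once
    return words_at_least(brown.words(), 3)

# Toy token list, not corpus data.
sample = "a b a c b a b c d".split()
frequent = words_at_least(sample)
print(frequent)
```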

Q7: Write a program to generate a table of lexical diversity scores (i.e., token/type
ratios) for each genre. Include the full set of Brown Corpus genres
(nltk.corpus.brown.categories()). Which genre has the lowest diversity?

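Ans:

from nltk.corpus import brown  # needs nltk.download('brown') once

lowest_category = ""
highest_ratio = 0.0
print("genre", "tokens", "types", "tokens/type")
for category in brown.categories():
    words = brown.words(categories=category)
    num_tokens = len(words)
    num_types = len(set(words))
    # Token/type ratio: a higher ratio means more repetition,
    # i.e. lower lexical diversity.
    ratio = num_tokens / num_types
    print(category, num_tokens, num_types, round(ratio, 2))
    if ratio > highest_ratio:
        highest_ratio = ratio
        lowest_category = category
# Genre with the lowest diversity (most tokens per type).
print(lowest_category, highest_ratio)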

Q8: Write a function that finds the 50 most frequently occurring words of a text.
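
A possible shape for the Q8 function, as a sketch. most_frequent_words is a hypothetical name, the text is assumed to be a sequence of tokens (for example an nltk.Text or a plain list), and skipping punctuation and lowercasing are assumptions about what counts as a word. Since nltk.FreqDist is a Counter subclass, nltk.FreqDist(text).most_common(50) gives the same kind of result without that filtering.

```python
from collections import Counter

def most_frequent_words(text, n=50):
    """Return the n most common alphabetic words as (word, count) pairs."""
    words = (w.lower() for w in text if w.isalpha())
    return Counter(words).most_common(n)

# Toy token list, not corpus data.
sample_text = "The cat and the dog and the bird .".split()
top2 = most_frequent_words(sample_text, n=2)
print(top2)
```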

MADE BY: NAMAN MALHOTRA


101512035
SEM2
