Word Frequency Count

Original post, published 2020-11-16 21:31:06 · 106 views

Licensed under CC 4.0 BY-SA

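This post reads and processes the file 'walden.txt'. It first reads the file with f.readlines(), joins the lines into a single string, and splits that string on whitespace into a list of words. It then counts word frequencies with Counter from the collections module and prints up to the 12000 most common words.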

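# Read every line; the with block closes the file automatically
with open('C:\\Users\\Administrator\\Desktop\\walden.txt', 'r') as f:
    f_line = f.readlines()
text = " ".join(f_line)   # join the lines into one string separated by " " (avoids shadowing the built-in str)
words = text.split()      # split on any run of whitespace into a list of words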

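from collections import Counter    # Counter tallies how often each item occurs
counter = Counter(words)
dictionary = dict(counter)         # plain dict mapping word -> count
k = 12000
res = counter.most_common(k)       # up to k (word, count) pairs, most frequent first
print(res)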
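
One thing to watch for: split() only separates on whitespace, so "The", "the" and "the," are counted as three different words. Below is a minimal sketch of one possible way to normalize the text before counting. The helper name count_words and the rule that a word is a run of lowercase letters and apostrophes are assumptions made for illustration, not part of the original code.

```python
import re
from collections import Counter

def count_words(text, k=10):
    """Return the k most common words in text, ignoring case and punctuation."""
    # Assumption: a "word" is a run of ASCII letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(k)

sample = "The pond. The woods, the pond!"
top = count_words(sample, k=2)
print(top)   # [('the', 3), ('pond', 2)]
```

To apply this to the file, pass the joined string to count_words instead of calling split() on it.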