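For a given English web page, write a program that counts how many times each word appears on the page. Requirements: (a) take the URL of an English web page as input and download the page's content; (b) extract the text content of the page; (c) count the frequency of each word in the text and sort the results in descending order.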


Time: 2025-02-17 16:22:28 Views: 40

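To complete this task, you can use Python: the `requests` library downloads the page, `BeautifulSoup` parses the HTML and extracts the text, and the `Counter` class from the `collections` module counts word frequencies. The general steps are:

1. **Install the required libraries**: install `requests` and `beautifulsoup4` first if you have not already:

```
pip install requests beautifulsoup4
```

2. **Write the code**:

```python
import re
from collections import Counter

import requests
from bs4 import BeautifulSoup


# Download the page and extract its text content
def extract_text(url):
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        print(f"Could not access {url}, status code: {response.status_code}")
        return None
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract the main body with a CSS selector, assuming the article
    # sits under class="article-content"
    node = soup.select_one('.article-content')
    if node is None:
        # select_one returns None when nothing matches; fall back to the whole page
        node = soup
    return node.get_text(separator=' ').lower()


# Count word frequencies
def word_frequency(text):
    # Match runs of letters (apostrophes allowed inside a word) so that
    # punctuation is not counted as part of a word
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text)
    counter = Counter(words)  # number of occurrences of each word
    sorted_counter = counter.most_common()  # sorted by frequency, descending
    return sorted_counter


# Example usage
url = "https://example.com"  # replace with the URL you want to analyze
text = extract_text(url)
if text is not None:
    top_words = word_frequency(text)
    for index, (word, freq) in enumerate(top_words, start=1):
        print(f"{index}. {word}: {freq}")
```

Note: the `.article-content` selector assumes a known page structure. In practice you will probably need to adjust the CSS selector to match the structure of the target site.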
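The extraction and counting steps (b) and (c) can also be tried offline, without downloading anything. The following is a minimal sketch under a few assumptions: it swaps in the standard library's `html.parser` in place of BeautifulSoup so it runs with no third-party packages, it skips `<script>` and `<style>` contents so code is not counted as words, and the names `TextExtractor`, `html_to_text`, `word_counts` and the `SAMPLE_HTML` page are hypothetical, made up for this illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside script/style elements
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string, with pieces joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def word_counts(text):
    """Return (word, count) pairs sorted by count descending, then alphabetically."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample page, invented for this example
SAMPLE_HTML = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><p>The cat saw the dog.</p><p>The dog didn't see the cat!</p>
<script>var the = 1;</script></body></html>
"""

if __name__ == "__main__":
    for word, freq in word_counts(html_to_text(SAMPLE_HTML)):
        print(f"{word}: {freq}")
```

Sorting on `(-count, word)` rather than relying on `Counter.most_common()` alone makes the order of tied words deterministic, which is convenient when comparing results between runs.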
