Python web scraper: code to scrape and save the comments for a movie on Douban


Time: 2025-06-22 10:39:32

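In Python, we can use the `requests` library to send an HTTP request and fetch a page, then parse the HTML with a library such as `BeautifulSoup` or `lxml` to extract the information we need. Below is a simple example that shows how to scrape the comments for the Douban movie *The Shawshank Redemption* and save them to a CSV file:

```python
import csv

import requests
from bs4 import BeautifulSoup

# Page URL
url = "https://movie.douban.com/subject/1292052/"

# Assumption: the site may reject the default requests User-Agent,
# so a browser-like one is sent instead.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_comments(url):
    # Send the GET request
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # Raise an error if the request failed

    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")

    # Find the container that holds the comments.
    # The class names below are assumptions about the page markup;
    # check them against the live HTML before relying on them.
    comments_container = soup.find("div", {"class": "comment-thread"})
    if comments_container is None:
        return []  # Nothing matched, so there is nothing to extract

    # Extract the comment text and the user ID
    comments_data = []
    for comment in comments_container.find_all("li", class_="comment"):
        short = comment.find("span", {"class": "short"})
        if short is None:
            continue  # Skip entries without comment text
        content = short.text.strip()
        user_id = comment.get("data-userid")  # None if the attribute is missing
        comments_data.append((content, user_id))

    return comments_data


def save_to_csv(comments, filename):
    with open(filename, "w", newline="", encoding="utf-8") as csvfile:
        fieldnames = ["Comment", "User ID"]
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows({"Comment": c[0], "User ID": c[1]} for c in comments)


comments = get_comments(url)
save_to_csv(comments, "douban_movie_reviews.csv")
```

Note that running this code requires the `requests` and `beautifulsoup4` libraries to be installed. In addition, the site may use anti-scraping measures (such as CAPTCHAs or IP rate limits), so frequent scraping can leave you unable to access the site or get you banned. In practice, follow the site's robots.txt rules and consider using a proxy server or adding a reasonable delay between requests.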
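The advice about robots.txt and delays can be sketched with the standard library alone. The following is a minimal illustration, not a definitive implementation: the helper names `build_robots_parser` and `polite_sleep`, the `my-crawler` user agent, and the `SAMPLE_ROBOTS_TXT` rules are all hypothetical, and the sample rules are made up rather than taken from Douban's actual robots.txt.

```python
import random
import time
from urllib.robotparser import RobotFileParser


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_sleep(base_seconds: float = 2.0, jitter_seconds: float = 1.0) -> float:
    """Sleep for base_seconds plus random jitter; return the time slept."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    time.sleep(delay)
    return delay


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = build_robots_parser(SAMPLE_ROBOTS_TXT)

# Ask the parser whether a given user agent may fetch a given URL.
allowed = parser.can_fetch("my-crawler", "https://example.com/subject/1/")
blocked = parser.can_fetch("my-crawler", "https://example.com/private/page")
```

In a real crawl, one would presumably download the site's robots.txt first (for example with `requests.get`), skip any URL for which `can_fetch` returns `False`, and call `polite_sleep()` between consecutive requests. The random jitter is a design choice to avoid hitting the server at a perfectly regular interval.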
