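Python Scrapy: scraping Douban reviews of a movie and analyzing the data
Date: 2025-01-24 16:10:46 Views: 52
Sure. Here is how to use Python's Scrapy framework to scrape the short reviews of a movie on Douban and run a basic analysis on them:
1. Set up the environment:
First install the Scrapy library with pip: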
```
pip install scrapy
```
2. Create the Scrapy project:
```
scrapy startproject douban_comments
cd douban_comments
```
3. Write the spider:
Create a new spider file in the spiders directory, for example douban_spider.py:
```python
import scrapy

from douban_comments.items import DoubanCommentsItem


class DoubanSpider(scrapy.Spider):
    name = "douban"
    allowed_domains = ["movie.douban.com"]
    start_urls = ["https://movie.douban.com/subject/1292052/comments?status=P"]

    def parse(self, response):
        # Select every comment block on the page
        comments = response.xpath('//div[@class="comment"]')
        for comment in comments:
            item = DoubanCommentsItem()
            item['user'] = comment.xpath('.//span[@class="comment-info"]/a/text()').get()
            item['rating'] = comment.xpath('.//span[@class="comment-info"]/span[2]/@title').get()
            item['content'] = comment.xpath('.//span[@class="short"]/text()').get()
            # contains() matches the class even if it carries trailing whitespace
            item['time'] = comment.xpath('.//span[contains(@class, "comment-time")]/@title').get()
            yield item

        # Pagination: follow the "next page" link if there is one
        next_page = response.xpath('//span[@class="next"]/a/@href').get()
        if next_page:
            yield scrapy.Request(url=response.urljoin(next_page), callback=self.parse)
```
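Sites such as Douban may reject requests that arrive with Scrapy's default User-Agent or in rapid succession, so the spider will probably need a few crawl settings before it returns any data. The following is a minimal sketch: the setting names are standard Scrapy settings, but the dictionary name `CUSTOM_SETTINGS` and every value in it (the User-Agent string, the delay) are illustrative assumptions, not values from the original answer.

```python
# Hypothetical polite-crawling settings. The keys are standard Scrapy
# setting names; the values are assumptions chosen for illustration.
CUSTOM_SETTINGS = {
    # Browser-like User-Agent (assumed string; replace with your own)
    "USER_AGENT": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    # Wait roughly two seconds between requests to the same site
    "DOWNLOAD_DELAY": 2,
    # Let Scrapy adapt the delay to the server's response times
    "AUTOTHROTTLE_ENABLED": True,
    # Write Chinese text to the feed as UTF-8 instead of \uXXXX escapes
    "FEED_EXPORT_ENCODING": "utf-8",
}
```

One way to apply it is to set `custom_settings = CUSTOM_SETTINGS` as a class attribute on `DoubanSpider`; the same keys can also be placed as plain variables in the project's settings.py.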
4. Define the Item:
Define the structure of the scraped data in items.py:
```python
import scrapy


class DoubanCommentsItem(scrapy.Item):
    user = scrapy.Field()
    rating = scrapy.Field()
    content = scrapy.Field()
    time = scrapy.Field()
```
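Text pulled out with XPath often carries leading or trailing whitespace, and a field can be missing entirely when a selector finds nothing. As a rough sketch of how the items could be tidied before export, the pipeline below strips string fields and leaves missing values alone. The class name `CleanCommentPipeline` is hypothetical and not part of the original answer; `process_item(self, item, spider)` is the method Scrapy calls on item pipelines.

```python
class CleanCommentPipeline:
    """Hypothetical item pipeline that trims whitespace from string fields."""

    # Field names defined on DoubanCommentsItem
    FIELDS = ("user", "rating", "content", "time")

    def process_item(self, item, spider):
        for key in self.FIELDS:
            value = item.get(key)
            # Only strings are stripped; None (selector found nothing) is kept
            if isinstance(value, str):
                item[key] = value.strip()
        return item
```

If the class lives in the project's pipelines.py, it could be enabled in settings.py with `ITEM_PIPELINES = {"douban_comments.pipelines.CleanCommentPipeline": 300}`.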
5. Run the spider:
```
scrapy crawl douban -o comments.json
```
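Before moving on to the analysis it is worth checking that the crawl actually produced usable data. The sketch below uses only the standard library to count the exported items and the fields that came back empty; the function name `summarize_feed` is hypothetical, and it assumes the feed is a JSON array of objects with the four fields defined in items.py.

```python
import json
from collections import Counter


def summarize_feed(path):
    """Return (item_count, missing_field_counts) for a Scrapy JSON feed."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)

    missing = Counter()
    for item in items:
        for field in ("user", "rating", "content", "time"):
            # Count fields that are absent, None, or an empty string
            if not item.get(field):
                missing[field] += 1
    return len(items), dict(missing)
```

Calling `summarize_feed("comments.json")` returns the number of items and a dictionary of empty-field counts. In recent Scrapy versions `-o` appends to an existing file, which can leave a `.json` feed unparseable after a second run; `-O` overwrites the file instead.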
6. Analyze the data:
Once the crawl finishes, the data can be analyzed with pandas:
```python
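import pandas as pd
import matplotlib.pyplot as plt

# Load the scraped comments
df = pd.read_json('comments.json')

# Clean the data: parse timestamps and extract the numeric part of the rating
df['time'] = pd.to_datetime(df['time'], errors='coerce')
# Raw string avoids an invalid-escape warning; values without digits become NaN
digits = df['rating'].astype(str).str.extract(r'(\d+)', expand=False)
df['rating'] = pd.to_numeric(digits, errors='coerce')
# Drop rows that cannot be analyzed, then make the rating an integer
df = df.dropna(subset=['time', 'rating'])
df['rating'] = df['rating'].astype(int)

# Rating distribution
rating_counts = df['rating'].value_counts().sort_index()
rating_counts.plot(kind='bar')
plt.title('Rating distribution')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()

# Number of comments per day
df.set_index('time').resample('D').size().plot()
plt.title('Comment volume over time')
plt.xlabel('Date')
plt.ylabel('Count')
plt.show()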
```
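One caveat about the rating column: on Douban's comment pages the `title` attribute of the rating span appears to hold a text label (for example 力荐) rather than a number, in which case a digit-extracting regex finds nothing to work with. The sketch below converts such labels to star counts. The five-label mapping and the helper name `rating_to_stars` are assumptions, not part of the original answer, and should be checked against the scraped data.

```python
# Assumed Douban rating labels mapped to star counts (1 to 5)
RATING_LABELS = {"力荐": 5, "推荐": 4, "还行": 3, "较差": 2, "很差": 1}


def rating_to_stars(raw):
    """Convert a scraped rating value to an int star count, or None if unknown."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in RATING_LABELS:
        return RATING_LABELS[text]
    # Also accept values that are already a plain star count
    if text in {"1", "2", "3", "4", "5"}:
        return int(text)
    # Anything else (for example a timestamp picked up from an unrated
    # comment) is treated as "no rating"
    return None
```

With pandas this could be applied as `df['rating'] = df['rating'].map(rating_to_stars)`, followed by `dropna(subset=['rating'])` to discard comments without a recognizable rating.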
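With these steps we can scrape the review data for a movie on Douban and run a simple analysis on it. The results show how the movie's reception changes over time and how audiences rate it overall.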