Scraping Douban Movie Top 250: Collecting All Entries and Saving Them to CSV
Date: 2025-07-05 09:08:39
### Python Crawler Example: Scraping Douban Movie Top 250 Data to CSV
This can be done with the `requests` and `BeautifulSoup` libraries to fetch and parse the pages, and `pandas` to organize the data and write it to a CSV file. Below is a complete Python script:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def fetch_douban_top_250():
    url = 'https://movie.douban.com/top250'
    headers = {
        # A browser-like User-Agent; Douban rejects the default requests UA.
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    movies_list = []
    for start in range(0, 250, 25):  # 10 pages, 25 movies per page
        params = {'start': str(start)}
        response = requests.get(url=url, headers=headers, params=params)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        for item in soup.find_all('div', class_='item'):
            title_tag = item.select_one('.title')        # first title span (Chinese title)
            rating_tag = item.select_one('.rating_num')  # score, e.g. "9.7"
            inq_tag = item.select_one('.inq')            # one-line blurb; absent for some movies
            movies_list.append({
                'Title': title_tag.get_text(strip=True) if title_tag else '',
                'Rating': float(rating_tag.get_text(strip=True)) if rating_tag else None,
                'Description': inq_tag.get_text(strip=True) if inq_tag else '',
            })
    df_movies = pd.DataFrame(movies_list)
    # utf-8-sig keeps Chinese text readable when the CSV is opened in Excel
    df_movies.to_csv('douban_top_250.csv', index=False, encoding='utf-8-sig')


if __name__ == '__main__':
    fetch_douban_top_250()
```
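The field-extraction logic can be sanity-checked offline against a small HTML fragment that imitates Douban's item markup. The fragment below is hypothetical, not copied from the live page:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking one entry in Douban's list markup.
sample_html = """
<div class="item">
  <div class="hd"><a><span class="title">肖申克的救赎</span></a></div>
  <span class="rating_num">9.7</span>
  <span class="inq">希望让人自由。</span>
</div>
"""

item = BeautifulSoup(sample_html, 'html.parser').select_one('.item')
title = item.select_one('.title').get_text(strip=True)
rating = float(item.select_one('.rating_num').get_text(strip=True))
inq_tag = item.select_one('.inq')
description = inq_tag.get_text(strip=True) if inq_tag else ''
print(title, rating, description)
```

Running the selectors against a fixed fragment like this makes it easy to catch markup changes without hitting the site.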
This script visits the Douban Top 250 pages, extracts each movie's title, rating, and one-line summary, and finally writes the results to a CSV file named `douban_top_250.csv`.
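If `pandas` is not available, the standard-library `csv` module can write rows of the same shape. A minimal sketch, using hypothetical sample data and an in-memory buffer in place of a file:

```python
import csv
import io

# Hypothetical rows in the same shape the scraper collects.
movies_list = [
    {'Title': 'The Shawshank Redemption', 'Rating': 9.7, 'Description': 'Hope sets you free.'},
    {'Title': 'Farewell My Concubine', 'Rating': 9.6, 'Description': ''},
]

buf = io.StringIO()  # swap in open('douban_top_250.csv', 'w', newline='') for a real file
writer = csv.DictWriter(buf, fieldnames=['Title', 'Rating', 'Description'])
writer.writeheader()
writer.writerows(movies_list)
print(buf.getvalue())
```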