Scrape the Douban Movie Top250, parse every entry, and save the data to CSV
Date: 2025-07-05 17:08:36
### Scraping Douban Movie Top250 with Python and Saving to a CSV File
To accomplish this, you can use the `requests` and `BeautifulSoup` libraries to fetch the page content, and `pandas` to process and store the data. A complete example follows.
#### Install the required libraries
First install the third-party dependencies:
```bash
pip install requests beautifulsoup4 pandas
```
#### Fetch and parse the pages
Create a file named `douban_spider.py` containing the following code:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def fetch_page(url):
    """Download a single list page, raising on a non-200 response."""
    headers = {
        # A desktop User-Agent; Douban rejects requests without one
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        return response.text
    else:
        raise Exception(f"Failed to load page {url}")


def parse_html(html_content):
    """Extract rank, title, rating and quote from one page of results."""
    soup = BeautifulSoup(html_content, "html.parser")
    movie_list = []
    items = soup.find_all('div', class_='item')
    for item in items:
        rank = item.find('em').get_text()
        title = item.select_one('.title').get_text().strip()
        rating_num = float(item.find('span', class_='rating_num').get_text())
        try:
            quote = item.find('span', class_='inq').get_text(strip=True)
        except AttributeError:
            # Some entries have no one-line quote
            quote = None
        movie_info = {"Rank": int(rank), "Title": str(title),
                      "Rating": rating_num, "Quote": quote}
        movie_list.append(movie_info)
    return movie_list


if __name__ == "__main__":
    base_url = "https://movie.douban.com/top250?start={}&filter="
    all_movies = []
    for i in range(0, 250, 25):  # 10 pages, 25 movies each
        url = base_url.format(i)
        html = fetch_page(url)
        movies_on_this_page = parse_html(html)
        all_movies.extend(movies_on_this_page)

    df = pd.DataFrame(all_movies)
    output_file_path = "./douban_top_250.csv"
    # utf-8-sig writes a BOM so Excel displays Chinese titles correctly
    df.to_csv(output_file_path, index=False, encoding='utf-8-sig')
    print(f"All data has been successfully saved into '{output_file_path}'")
```
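The extraction logic in `parse_html` can be sanity-checked without touching the network by feeding it a handcrafted HTML fragment that mimics one entry of the list page (the fragment below is made up for illustration and is not Douban's actual markup):

```python
from bs4 import BeautifulSoup

# A minimal fragment shaped like one Top250 entry
html = '''
<div class="item">
  <em>1</em>
  <span class="title">肖申克的救赎</span>
  <span class="rating_num">9.7</span>
  <span class="inq">希望让人自由。</span>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
item = soup.find('div', class_='item')

# The same selectors the spider uses
rank = int(item.find('em').get_text())
title = item.select_one('.title').get_text().strip()
rating = float(item.find('span', class_='rating_num').get_text())
quote = item.find('span', class_='inq').get_text(strip=True)

print(rank, title, rating)  # 1 肖申克的救赎 9.7
```

Running the selectors against a fixed fragment like this makes it easy to notice when Douban changes its page structure.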
This program walks through every page of the Douban Movie Top250 list, extracts each film's rank, title, rating, and one-line quote, then assembles the records into a table and writes them to a local CSV file.
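To verify the exported file, it can be read back with Python's standard-library `csv` module. The sketch below writes two sample rows in the same schema the spider produces (the rows here are hardcoded for illustration, not fetched from Douban) and reads them back:

```python
import csv

# Sample rows in the spider's output schema
sample = [
    {"Rank": 1, "Title": "肖申克的救赎", "Rating": 9.7, "Quote": "希望让人自由。"},
    {"Rank": 2, "Title": "霸王别姬", "Rating": 9.6, "Quote": "风华绝代。"},
]

# Same encoding the spider uses, so Excel handles the BOM correctly
with open("douban_top_250.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=["Rank", "Title", "Rating", "Quote"])
    writer.writeheader()
    writer.writerows(sample)

# Read the file back; DictReader yields every field as a string
with open("douban_top_250.csv", newline="", encoding="utf-8-sig") as f:
    rows = list(csv.DictReader(f))

print(len(rows))         # 2
print(rows[0]["Title"])  # 肖申克的救赎
```

Note that `csv.DictReader` returns all fields as strings, so numeric columns like `Rank` and `Rating` need explicit conversion if processed further.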