Scrape all Douban Top 250 data and save it to CSV
Date: 2025-07-05 15:09:34
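### Scraping the Douban Movie Top 250 with Python and saving it to a CSV file
This can be done with the `requests` library for HTTP requests and `BeautifulSoup` for parsing the returned HTML. The steps are as follows:
#### Import the required libraries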
```python
import requests
from bs4 import BeautifulSoup
import csv
```
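#### Set the request headers and URL template
Set a suitable User-Agent so the request looks like it comes from a browser, and define the base URL used to build each page's link.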
```python
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}
base_url = 'https://movie.douban.com/top250?start={}&filter='
```
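Each list page shows 25 movies, so the `start` query parameter advances in steps of 25. As a quick sanity check, the ten page URLs can be generated without sending any request. This is only a sketch: `page_urls` is a hypothetical helper introduced for illustration, and the template is repeated so the snippet runs on its own.

```python
# Template repeated here so the snippet is self-contained.
base_url = 'https://movie.douban.com/top250?start={}&filter='


def page_urls(pages=10, page_size=25):
    """Return the list-page URLs (hypothetical helper, not part of the original code)."""
    return [base_url.format(index * page_size) for index in range(pages)]


urls = page_urls()
print(urls[0])   # first page, start=0
print(urls[-1])  # last page, start=225
```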
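#### Fetch and parse a single page
Define a function that sends an HTTP request for the page at a given index, parses the response with Beautiful Soup, and extracts the useful fields for each movie.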
```python
def get_page_content(page_index):
    """Fetch one list page (25 movies) and return a list of dicts."""
    url = base_url.format(page_index * 25)
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')
    items = []
    for item in soup.find_all('div', class_='item'):
        title = item.select_one('.title').get_text(strip=True)
        rating = float(item.select_one('.rating_num').get_text(strip=True))
        # Some movies have no one-line quote.
        inq_tag = item.select_one('.inq')
        quote = inq_tag.get_text(strip=True) if inq_tag else ''
        # The first <p> is assumed to hold two lines separated by <br>:
        # "director and actors" and "year / country / genres".
        info = item.p.get_text('|', strip=True).split('|')
        if len(info) < 2:
            continue
        director_and_actors = ' '.join(info[0].split())
        meta = [' '.join(part.split()) for part in info[1].split('/')]
        try:
            year = int(meta[0][:4])
        except ValueError:
            continue  # skip entries whose year cannot be parsed
        # Keep the last two fields after the year: country and genres.
        categories = ' / '.join(meta[1:][-2:])
        items.append({
            'Title': title,
            'Rating': rating,
            'Quote': quote,
            'Director and Actors': director_and_actors,
            'Year': year,
            'Country/Categories': categories,
        })
    return items
```
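The trickiest part of the parsing step is the metadata line in each entry's info paragraph, which is assumed here to have the form `year / country / genres`. The sketch below isolates that step with plain string operations so it can be tried without network access. The function name `parse_meta_line` and the sample strings are illustrative assumptions, not real page data.

```python
def parse_meta_line(line):
    """Split 'year / country / genres' into a (year, country, genres) tuple.

    Returns None when the line does not match the assumed layout.
    """
    # str.split() without arguments also collapses non-breaking spaces (\xa0).
    parts = [' '.join(part.split()) for part in line.split('/')]
    if len(parts) < 3:
        return None
    try:
        year = int(parts[0][:4])
    except ValueError:
        return None
    # Use the last two fields so extra release years in the middle are ignored.
    return year, parts[-2], parts[-1]


# Made-up sample lines for illustration.
print(parse_meta_line('1994\xa0/\xa0USA\xa0/\xa0Crime Drama'))
print(parse_meta_line('unknown'))
```

Returning `None` for lines that do not fit lets the caller skip an entry instead of raising in the middle of a page.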
#### Write the data to a CSV file
Create (or overwrite) a file named `douban_top250_movies.csv` and write the information for each movie as one row.
```python
fieldnames = ['Title', 'Rating', 'Quote', 'Director and Actors', 'Year', 'Country/Categories']

# Open the file once; utf-8-sig writes a BOM so Excel detects the encoding.
with open('douban_top250_movies.csv', mode='w', newline='', encoding='utf-8-sig') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    for i in range(10):  # 10 pages x 25 movies = 250 entries
        writer.writerows(get_page_content(i))
```
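To check the CSV layout without touching the network or the disk, the same `csv.DictWriter` setup can be run against an in-memory buffer and read back. This is a minimal sketch: `rows_to_csv_text` is a hypothetical helper name, and the sample row is made-up placeholder data.

```python
import csv
import io

FIELDNAMES = ['Title', 'Rating', 'Quote', 'Director and Actors', 'Year', 'Country/Categories']


def rows_to_csv_text(rows):
    """Serialize a list of movie dicts to CSV text (hypothetical helper)."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row; the quote contains a comma to show that quoting is handled.
sample_rows = [{
    'Title': 'Sample Movie',
    'Rating': 9.7,
    'Quote': 'A sample quote, with a comma',
    'Director and Actors': 'Director: Someone',
    'Year': 1994,
    'Country/Categories': 'USA / Crime Drama',
}]

csv_text = rows_to_csv_text(sample_rows)
parsed = list(csv.DictReader(io.StringIO(csv_text, newline='')))
print(parsed[0]['Title'], parsed[0]['Quote'])
```

Note that values read back through `csv.DictReader` are always strings, so `Rating` and `Year` need converting again if they are used numerically.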
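The code above scrapes the basic information for each movie in the Douban Top 250 list (title, rating, one-line quote, and so on) and stores it as a structured table in a local CSV file[^1].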