Xiaohongshu Comment Crawler Code
Date: 2025-05-01 19:35:32
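Implementing a Xiaohongshu comment crawler involves the following steps:
1. Decide which page URL to crawl and which fields you need: for example, to crawl the comments on a product you need the comment text, the user's nickname, the rating, and so on.
2. Simulate login: Xiaohongshu only shows product comments to logged-in users, so the crawler has to log in first to obtain access.
3. Crawl the comment data: use Python's requests library to send a GET request for the HTML source of the comment page, then parse the HTML with BeautifulSoup to extract the fields you need.
4. Store the data: save the extracted data locally or in a database for later processing or analysis.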
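The fields listed in step 1 can be fixed up front as a small record type, so that the parsing and storage steps agree on one shape. This is a minimal sketch: the `Comment` class and its `from_raw` helper are hypothetical names introduced for illustration, and treating the rating as a number is an assumption about how the page displays it.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Comment:
    """One scraped comment; the fields mirror step 1 (hypothetical type)."""
    content: str
    user_name: str
    rating: Optional[float]  # None when the page shows no usable rating

    @classmethod
    def from_raw(cls, content: str, user_name: str, rating: str) -> "Comment":
        """Build a Comment from raw page text, trimming whitespace."""
        try:
            parsed = float(rating.strip())
        except ValueError:
            parsed = None  # non-numeric rating text such as "" or "N/A"
        return cls(content.strip(), user_name.strip(), parsed)
```

`asdict(Comment.from_raw(...))` turns a record back into a plain dict, which is convenient for step 4.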
Below is a simple Xiaohongshu comment crawler example, for reference:
```
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent shared by both requests
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36')


# Simulate login
def login():
    """Return the login token, or None if login fails."""
    # Build the login request
    login_url = 'https://passport.xiaohongshu.com/api/oauth/token/login'
    headers = {
        'User-Agent': USER_AGENT,
        'Content-Type': 'application/json'
    }
    data = {
        "countryCode": "86",
        "mobile": "your_mobile_number",
        "password": "your_password",
        "deviceId": "your_device_id"
    }
    try:
        response = requests.post(login_url, headers=headers, json=data, timeout=10)
        if response.status_code == 200:
            return response.json()['data']['token']
    except (requests.RequestException, ValueError, KeyError, TypeError):
        pass  # network error or unexpected response body
    return None


def node_text(parent, name, class_name):
    """Return the stripped text of the first matching child, or '' if absent."""
    node = parent.find(name, {'class': class_name}) if parent is not None else None
    return node.text.strip() if node is not None else ''


# Crawl comment data
def crawl_comments(token):
    """Return a list of comment dicts, or None if the request fails."""
    # Build the comments request (replace your_note_id with a real note ID)
    comments_url = 'https://www.xiaohongshu.com/api/sns/v9/note/your_note_id/comments'
    headers = {
        'User-Agent': USER_AGENT,
        'Authorization': token
    }
    try:
        response = requests.get(comments_url, headers=headers, timeout=10)
    except requests.RequestException:
        return None
    if response.status_code != 200:
        return None
    soup = BeautifulSoup(response.text, 'html.parser')
    comments = []
    for comment in soup.find_all('div', {'class': 'comment'}):
        header = comment.find('div', {'class': 'comment__header'})
        link = header.find('a') if header is not None else None
        comments.append({
            'content': node_text(comment, 'div', 'comment__content'),
            'user_name': link.text.strip() if link is not None else '',
            'rating': node_text(header, 'span', 'comment__rating'),
        })
    return comments


if __name__ == '__main__':
    # Simulate login to get the token
    token = login()
    if token is None:
        print('Login failed!')
    else:
        # Crawl comment data
        comments = crawl_comments(token)
        if comments is None:
            print('Failed to crawl comments!')
        else:
            # Print comment data
            for comment in comments:
                print(comment['user_name'], comment['rating'], comment['content'])
            # Store comment data
            # TODO: store the comment data locally or in a database
```
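
The storage step left as a TODO above could be filled in along the following lines. This is a minimal sketch using only the standard library: the helper names `save_comments_csv` and `save_comments_json` and the column order in `FIELDS` are assumptions made for illustration, and the input is assumed to be the list of dicts that `crawl_comments` returns.

```python
import csv
import json

# Column order for the CSV file; keys match the dicts built in crawl_comments
FIELDS = ["user_name", "rating", "content"]


def save_comments_csv(comments, path):
    """Write comments to a CSV file with a header row."""
    # utf-8-sig adds a BOM so spreadsheet tools detect UTF-8 for Chinese text
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(comments)


def save_comments_json(comments, path):
    """Write comments to a JSON file, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(comments, f, ensure_ascii=False, indent=2)
```

For example, `save_comments_csv(comments, 'comments.csv')` could be called right after the loop that prints the comments. CSV suits spreadsheet-style analysis, while JSON keeps the structure intact if more fields are added later.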