Scraping 100 real-time news items with Python
Date: 2025-02-17 19:25:51
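Scraping real-time items from a news site in Python usually involves network requests, HTML parsing, and data processing. You can use Python's requests library to send an HTTP request and fetch the page content, then parse the HTML with a library such as BeautifulSoup or lxml to extract the information you need. Here is a brief outline of the steps:
1. **Install the required libraries**: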
```bash
pip install requests beautifulsoup4 lxml
```
2. **Write the scraper script**:
```python
import requests
from bs4 import BeautifulSoup


def get_news(url):
    """Fetch `url` and return up to 100 news items as {'title', 'summary'} dicts."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        print(f"Failed to fetch data from {url}. Status code: {response.status_code}")
        return []  # empty list, so the caller's loop still works

    soup = BeautifulSoup(response.text, 'lxml')
    # Find the elements that contain the news items (this depends on the target site's structure)
    news_list = soup.select('.news-item') or soup.find_all('div', class_='news-container')
    messages = []
    for news in news_list[:100]:  # take the first 100 items
        heading = news.find('h2')
        paragraph = news.find('p')
        title = heading.get_text(strip=True) if heading else ''
        summary = paragraph.get_text(strip=True) if paragraph else ''
        messages.append({'title': title, 'summary': summary})
    return messages


# Call the function with the news site's URL
url = "https://example.com/news"  # replace with the actual news site URL
news_messages = get_news(url)
for message in news_messages:
    print(message)
```
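The introduction lists data processing as part of the job, but the script stops at printing the raw items. The following is a minimal sketch of what that step might look like, assuming the list of `{'title', 'summary'}` dicts that `get_news` returns. The helper name `clean_messages`, its `limit` parameter, and the sample data are all hypothetical and are not part of the original answer.

```python
import json


def clean_messages(messages, limit=100):
    """Drop items without a title, skip repeated titles, and keep at most `limit` items."""
    seen = set()
    cleaned = []
    for item in messages:
        if len(cleaned) >= limit:
            break  # stop once the cap is reached
        title = item.get('title', '').strip()
        summary = item.get('summary', '').strip()
        if not title or title in seen:
            continue  # skip blank or already-seen titles
        seen.add(title)
        cleaned.append({'title': title, 'summary': summary})
    return cleaned


# Hypothetical sample data standing in for the output of get_news()
sample = [
    {'title': ' Markets open higher ', 'summary': 'Stocks rose in early trading.'},
    {'title': '', 'summary': 'No headline here.'},
    {'title': 'Markets open higher', 'summary': 'Duplicate entry.'},
    {'title': 'Rain expected tomorrow', 'summary': ''},
]
cleaned = clean_messages(sample)

# ensure_ascii=False keeps non-ASCII headlines (for example Chinese) readable
print(json.dumps(cleaned, ensure_ascii=False, indent=2))
```

In practice you would pass `news_messages` instead of `sample`. Deduplicating by title is only one possible rule; a site that repeats headlines with different links may need a different key.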