1. Visit https://ssr1.scrape.center, then use lxml and XPath to locate the relevant tags, and find and print the names and ratings of all the movies on the page.
Time: 2024-10-23 19:02:30
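In Python, you can fetch the page with `requests`, parse it with `lxml`, and extract the movie names and ratings with XPath expressions. XPath support comes from `lxml`; BeautifulSoup does not support XPath queries. The basic steps are as follows: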
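Because the XPath queries come from lxml, the HTML has to be parsed with lxml's HTML parser rather than its XML parser. The minimal sketch below (it needs `lxml` installed and uses a made-up snippet, not the real page) illustrates the difference: `etree.HTML()` tolerates an unclosed `<br>`, while `etree.fromstring()`, a strict XML parser, raises `XMLSyntaxError` on the same markup.

```python
from lxml import etree

# Typical HTML is not well-formed XML: the <br> here is never closed
snippet = "<html><body><p>line one<br>line two</p></body></html>"

# etree.HTML uses lxml's forgiving HTML parser and returns the <html> root
root = etree.HTML(snippet)
texts = root.xpath("//p/text()")  # the text nodes on either side of <br>
print(texts)

# etree.fromstring uses the strict XML parser and rejects the same markup
parse_error = None
try:
    etree.fromstring(snippet)
except etree.XMLSyntaxError as exc:
    parse_error = exc
    print("XML parser rejected the markup:", exc)
```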
First, install the required libraries with pip if you have not already done so:
```bash
pip install requests lxml beautifulsoup4
```
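Before sending requests to the live site, the extraction logic can be checked offline. The following is a minimal sketch that runs the same kind of XPath queries against a hypothetical HTML snippet. The `movie`, `name`, `rating` and `score` class names are assumptions, not confirmed markup of ssr1.scrape.center, and `first_text` and `extract_movies` are illustrative helper names.

```python
from lxml import etree

# Hypothetical sample markup following the ASSUMED structure
# (div.movie > h2.name, span.rating or a span whose class contains "score").
# The real page at ssr1.scrape.center may be structured differently.
SAMPLE_HTML = """
<html><body>
  <div class="movie">
    <h2 class="name"> Movie A </h2>
    <span class="rating"> 9.5 </span>
  </div>
  <div class="movie">
    <h2 class="name">Movie B</h2>
    <span class="score m-t-md">9.0</span>
  </div>
  <div class="movie">
    <h2 class="name">Movie C</h2>
  </div>
</body></html>
"""


def first_text(element, *xpaths):
    """Return the first non-blank, stripped text matched by the XPaths, tried in order, or None."""
    for xp in xpaths:
        texts = [t.strip() for t in element.xpath(xp) if t.strip()]
        if texts:
            return texts[0]
    return None


def extract_movies(html):
    """Extract (name, rating) pairs from the assumed movie blocks."""
    root = etree.HTML(html)
    movies = []
    for block in root.xpath('//div[@class="movie"]'):
        name = first_text(block, './/h2[@class="name"]/text()')
        # Fall back to any span whose class attribute contains "score"
        rating = first_text(
            block,
            './/span[@class="rating"]/text()',
            './/span[contains(@class, "score")]/text()',
        )
        movies.append((name, rating))
    return movies


for name, rating in extract_movies(SAMPLE_HTML):
    print(f"Movie: {name}, rating: {rating}")
```

Returning `None` instead of indexing `[0]` directly keeps one incomplete movie block from raising `IndexError` and stopping the whole loop. Once the selectors match the real page, `extract_movies(response.content)` could be called on the downloaded bytes instead of the sample string.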
Next is a simple example that fetches the page at https://ssr1.scrape.center and parses the data:
```python
import requests
from lxml import etree

# Request the page (timeout and status check added defensively)
url = "https://ssr1.scrape.center"
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse with lxml's HTML parser: BeautifulSoup objects have no .xpath() method,
# and etree.fromstring() expects well-formed XML, which most HTML is not
root = etree.HTML(response.content)

# Find the movie blocks with XPath. The class names below are assumptions:
# inspect the page in your browser's developer tools and adjust them.
movies_block_xpath = '//div[@class="movie"]'  # assume each movie sits in a div with class "movie"
movie_elements = root.xpath(movies_block_xpath)

for movie in movie_elements:
    # The movie name is assumed to be in an h2 element with class "name"
    name_texts = movie.xpath('.//h2[@class="name"]/text()')
    # The rating may be in span.rating, or in a span whose class contains "score"
    rating_texts = (movie.xpath('.//span[@class="rating"]/text()')
                    or movie.xpath('.//span[contains(@class, "score")]/text()'))
    movie_name = name_texts[0].strip() if name_texts else None
    movie_rating = rating_texts[0].strip() if rating_texts else None
    print(f"Movie: {movie_name}, rating: {movie_rating}")
```