Modify the following code so that `li_list` is decoded as UTF-8:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
page_response = response.text
soup = BeautifulSoup(page_response, 'html.parser', from_encoding='utf-8')
li_list = soup.select('#mypagehtmlcontent p')
```
Posted: 2023-06-19 09:06:53 · Views: 248
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
# Decode the raw bytes as UTF-8 explicitly instead of relying on requests' guess
page_response = response.content.decode('utf-8')
soup = BeautifulSoup(page_response, 'html.parser')
li_list = soup.select('#mypagehtmlcontent p')

# Test code below; feel free to ignore it
for li in li_list:
    print(li.text)
```
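The key change above is decoding `response.content` yourself rather than using `response.text`. A minimal sketch of why this matters, using an in-memory byte string to stand in for the response body (no network access; the sample text is made up for illustration):

```python
# requests guesses the encoding of response.text from the HTTP headers,
# which can be wrong for pages that are actually UTF-8. Decoding
# response.content explicitly removes the guesswork.
raw = '工商银行'.encode('utf-8')  # stands in for response.content

wrong = raw.decode('latin-1')    # a bad header-based guess garbles the text
right = raw.decode('utf-8')      # an explicit UTF-8 decode recovers it

print(wrong)
print(right)
```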
Related questions
```python
import requests
from bs4 import BeautifulSoup

url = "https://www.555dy1.com/voddetail/58397.html"
# Send the request and fetch the page content
response = requests.get(url)
# Parse the HTML and extract the play links
soup = BeautifulSoup(response.text, 'html.parser')
links = soup.find_all('a', class_='module-play-list-link')
# print(links)
# Print each link's play URL
for link in links:
    print(link['href'])
    videolist = 'https://www.555dy1.com/' + link['href']
    print(videolist)
```
You can use a set to remove duplicate play URLs, and the string method `startswith()` to check whether a link begins with a particular prefix, filtering the links by player. Here is an example:
```python
import requests
from bs4 import BeautifulSoup
url = "https://www.555dy1.com/voddetail/58397.html"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
links = soup.find_all('a', class_='module-play-list-link')
playlists = set()
for link in links:
    href = link['href']
    if href.startswith('/play/'):
        playlist = 'https://www.555dy1.com' + href
        playlists.add(playlist)
for playlist in playlists:
    print(playlist)
```
In the code above, we first create an empty set `playlists` to hold the play URLs of the different players. While iterating over the links, we use `startswith()` to check whether each link begins with `/play/`; if it does, we build the full play URL and add it to the set. Finally, we iterate over the set and print every URL. Since a set cannot contain duplicate elements, repeated URLs are removed automatically.
```python
import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.umei.net/tags/siwameitui/'
domain = 'https://www.umei.net/'
resp = requests.get(url)
resp.encoding = 'utf-8'
# print(resp.text)
page = BeautifulSoup(resp.text, 'html.parser')
list_li = page.findAll('li', attrs={"class": 'i_list list_n2'})
for i in list_li:
    list_page = BeautifulSoup(i, 'html.parser')
    break
```
Hello! I am ChitGPT, a ChatGPT-like AI developed by CSDN. Your first question is about code that fetches a web page with Python and parses its HTML content.
Here is an example using Python's requests and BeautifulSoup libraries:
```python
import requests
from bs4 import BeautifulSoup
url = 'https://www.umei.net/tags/siwameitui/'
domain = 'https://www.umei.net/'
resp = requests.get(url)
resp.encoding = 'utf-8'
page = BeautifulSoup(resp.text, 'html.parser')
list_li = page.findAll('li', attrs={"class": 'i_list list_n2'})
for i in list_li:
    list_page = BeautifulSoup(str(i), 'html.parser')
    # Process each li element here and extract the information you need
```
The code above first sends a GET request with `requests.get()` to fetch the page, then sets the response encoding to UTF-8. Next, `BeautifulSoup()` parses the page content into an HTML object, and `findAll()` finds all matching li elements so they can be processed one by one.
Note that each li element must be converted to a string (`str(i)`) before it can be passed to BeautifulSoup for parsing; passing the Tag object directly, as in your original code, raises a TypeError.
Hope this helps! Feel free to ask if you have any other questions.
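For comparison, the same "scan the matching li elements" idea can be sketched with only the standard library's `html.parser`, no third-party dependency. The class name `i_list list_n2` comes from the code above; the sample HTML and the collector class are made up for illustration:

```python
from html.parser import HTMLParser

class LiLinkCollector(HTMLParser):
    """Collect the href of every <a> nested inside <li class="i_list list_n2">."""

    def __init__(self):
        super().__init__()
        self.in_target_li = 0  # depth counter for matching li elements
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'li' and attrs.get('class') == 'i_list list_n2':
            self.in_target_li += 1
        elif tag == 'a' and self.in_target_li and 'href' in attrs:
            self.hrefs.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'li' and self.in_target_li:
            self.in_target_li -= 1

# Made-up sample HTML standing in for the fetched page
sample = ('<ul><li class="i_list list_n2"><a href="/a/1.htm">x</a></li>'
          '<li class="other"><a href="/b/2.htm">y</a></li></ul>')
parser = LiLinkCollector()
parser.feed(sample)
print(parser.hrefs)  # only the link inside the matching li
```

This is just a sketch: unlike BeautifulSoup's class matching, it compares the `class` attribute as a literal string, which is fine when the element carries exactly those two classes in that order.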