Python Web Scraping: Downloading Images from a Web Page



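This article shows how to use Python and the BeautifulSoup library to crawl the images on a given website and save them to a local folder. The script first fetches the page with requests, then parses the HTML to find the image links, and finally downloads each image and stores it on disk. It is a good way for beginners to learn the basic workflow of a web crawler.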



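First, find out where the images you want to crawl sit in the page: press F12 in the browser and inspect the elements to locate the tag that wraps all of the images.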

[Screenshot: inspecting the image elements with F12]
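Once the inspector shows which element wraps the images, the same lookup can be tried offline on a small HTML snippet before hitting the live site. The markup below is a hypothetical stand-in that only mimics the `<ul class="thumbnail-group ...">` container the script searches for; the real page may differ. The sketch also shows `select()` with a CSS selector, which matches on individual classes rather than on the exact `class` attribute string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the container found with F12;
# the real page's structure may differ.
SAMPLE_HTML = """
<div>
  <img src="/logo.png">
  <ul class="thumbnail-group thumbnail-group-165 clearfix">
    <li><a href="#"><img src="https://example.com/icons/tree.png"></a></li>
    <li><a href="#"><img src="/icons/leaf.png"></a></li>
  </ul>
</div>
"""


def image_srcs(html):
    soup = BeautifulSoup(html, "html.parser")
    # Same lookup as the script: find the common parent <ul>, then its <img> tags.
    # Passing the full class string matches the class attribute value exactly.
    ul = soup.find("ul", class_="thumbnail-group thumbnail-group-165 clearfix")
    by_find = [img["src"] for img in ul.find_all("img")] if ul else []
    # CSS selector alternative: matches a <ul> that has both classes,
    # regardless of class order or any extra classes.
    by_select = [img["src"] for img in soup.select("ul.thumbnail-group.thumbnail-group-165 img")]
    return by_find, by_select


if __name__ == "__main__":
    by_find, by_select = image_srcs(SAMPLE_HTML)
    print(by_find)    # ['https://example.com/icons/tree.png', '/icons/leaf.png']
    print(by_select)  # same list; the logo outside the <ul> is skipped
```

If the site later adds or reorders classes on the `<ul>`, the `find()` call returns `None` while the selector keeps working, which is why the selector form tends to be more robust.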

Full implementation:

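```python
# -*- coding: utf-8 -*-
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Approach:
#   1. Fetch the page at the given URL
#   2. Extract the image URLs
#   3. Download each image and save it


# Fetch the page
def getUrl(url):
    try:
        read = requests.get(url, timeout=10)    # send the GET request
        read.raise_for_status()                 # raise an error unless the status is 2xx (e.g. 200 OK)
        read.encoding = read.apparent_encoding  # guess the encoding from the response body
        return read.text                        # the page content as a string
    except requests.RequestException:
        print("Connection failed!")
        return None                             # the caller checks for None instead of parsing an error message


# Extract the image URLs, then download and save each image
def getPic(html, base_url):
    soup = BeautifulSoup(html, "html.parser")
    root = "F:/Pic/"                            # directory to save into
    os.makedirs(root, exist_ok=True)            # create it once, before the loop

    # From inspecting the page: all the images share this parent <ul>
    ul = soup.find('ul', class_='thumbnail-group thumbnail-group-165 clearfix')
    if ul is None:
        print("Image list not found; the page layout may have changed.")
        return

    for img in ul.find_all('img'):              # <img> is the image tag
        src = img.get('src')                    # the src attribute of the <img> tag
        if not src:
            continue
        img_url = urljoin(base_url, src)        # make relative src values absolute
        print(img_url)
        path = root + img_url.split('/')[-1]    # use the last URL segment as the file name
        print(path)

        if os.path.exists(path):
            print("File already exists!")
            continue
        try:
            read = requests.get(img_url, timeout=10)
            read.raise_for_status()
            with open(path, "wb") as f:         # the with block closes the file automatically
                f.write(read.content)
            print("File saved!")
        except (requests.RequestException, OSError):
            print("Failed to download the file!")


# Entry point
if __name__ == '__main__':
    url = "https://findicons.com/search/nature"
    html = getUrl(url)
    if html is not None:
        getPic(html, url)
```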
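The script names each file with `img_url.split('/')[-1]`. That keeps any query string in the name (`tree.png?v=2`, and `?` is not allowed in Windows file names) and yields an empty name when a URL ends in `/`. Below is a minimal sketch of a more defensive helper; `local_path` and the fallback name `image` are hypothetical choices for illustration, not part of the original code, and the helper does not try to sanitize every character Windows forbids.

```python
import os
from urllib.parse import urljoin, urlparse, unquote


def local_path(root, page_url, src, fallback="image"):
    """Hypothetical helper: turn an <img> src into an absolute URL and a local file path."""
    img_url = urljoin(page_url, src)      # resolve relative src values against the page URL
    path_part = urlparse(img_url).path    # drop the query string and fragment
    # Decode escapes such as %20, keep only the last segment, fall back if it is empty.
    name = os.path.basename(unquote(path_part)) or fallback
    return img_url, os.path.join(root, name)


if __name__ == "__main__":
    print(local_path("F:/Pic/", "https://findicons.com/search/nature", "/icons/tree.png?v=2"))
    # ('https://findicons.com/icons/tree.png?v=2', 'F:/Pic/tree.png')
```

In the script's download loop, one call to this helper would take the place of the lines that compute `img_url` and `path`; the download itself still uses the full URL, query string included.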

Run output:

[Screenshot: console output]

Crawl results:

[Screenshot: the crawled images]

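Parts of the code were adapted from material found online. If anything here infringes on your rights, please contact me and I will remove it. Thank you.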


Source: https://blog.csdn.net/iprotn/article/details/90069342