Advanced Python: Sending a Random User-Agent in Scrapy via a Downloader Middleware (DOWNLOADER_MIDDLEWARES, random.choice(), request.headers[])

  • Goal:
  • Have Scrapy use a random User-Agent (U-A) when requesting pages
  • Test page: http://httpbin.org/user-agent
  • Implemented via DOWNLOADER_MIDDLEWARES

Creating the Scrapy random-U-A project

scrapy startproject MV
cd MV
scrapy genspider ua httpbin.org

ua.py

import scrapy

class UaSpider(scrapy.Spider):
    name = 'ua'
    allowed_domains = ['httpbin.org']
    start_urls = ['http://httpbin.org/user-agent']

    def parse(self, response):
        # Print the response body so we can see which U-A the server received
        print(response.text)
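The `/user-agent` endpoint simply echoes back, as JSON, the `User-Agent` header it received, so `response.text` can be parsed like this (the U-A string below is illustrative):

```python
import json

# Sample body as returned by http://httpbin.org/user-agent (U-A value is illustrative)
body = '{"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}'

data = json.loads(body)
print(data["user-agent"])  # → Mozilla/5.0 (Windows NT 10.0; Win64; x64)
```

Once the middleware is in place, running the spider repeatedly should print different U-A strings.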

middlewares.py

  • Add the middleware class below the code Scrapy generated
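A minimal sketch of such a middleware. The names `USER_AGENT_LIST` and `RandomUserAgentMiddleware` are my own choices, and the U-A strings are illustrative; substitute whatever browsers you want to mimic:

```python
import random

# Illustrative pool of U-A strings to rotate through; replace with your own.
USER_AGENT_LIST = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
]

class RandomUserAgentMiddleware:
    """Downloader middleware: pick a random U-A for every outgoing request."""

    def process_request(self, request, spider):
        # Scrapy's request.headers supports dict-style item assignment
        request.headers['User-Agent'] = random.choice(USER_AGENT_LIST)
        # Returning None tells Scrapy to continue processing the request normally
        return None
```

The middleware has no effect until it is enabled in `settings.py` via `DOWNLOADER_MIDDLEWARES`.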
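settings.py

To activate the middleware, register it in `settings.py`. The class name below is an assumption; use whatever you defined in `middlewares.py`. The priority 543 is the placeholder value used in the Scrapy docs examples; since it is greater than 500 (the built-in `UserAgentMiddleware` slot), our middleware runs afterwards and overwrites the default header:

```python
# settings.py (relevant lines only)
ROBOTSTXT_OBEY = False  # httpbin.org is a test service; skip robots.txt here

DOWNLOADER_MIDDLEWARES = {
    # Module path assumes the layout created by `scrapy startproject MV`
    'MV.middlewares.RandomUserAgentMiddleware': 543,
}
```

Run `scrapy crawl ua` a few times; the printed user-agent should vary between runs.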