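How to intercept a GET request in pyppeteer, change it to a POST, and pass parameters.
The site being crawled is protected by Cloudflare verification, so I use pyppeteer to simulate the login and then fetch the video stream. Doing the login and the stream fetch in the same run takes a long time, so the login step is split out and its cookies are saved (in Redis, in the code below).
The stream endpoint expects a POST request, so the request to the stream URL is rewritten: the method is changed to POST and the params are added as the request body.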
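# -*- coding: UTF-8 -*-
import asyncio
import json
import random
import re
from urllib.parse import unquote

import requests
from requests.adapters import HTTPAdapter
from redis import StrictRedis  # used by get_redis()
from pyppeteer import launch
from pyppeteer import launcher
from pyppeteer.network_manager import Request  # type of the intercepted request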
# Read the cookies stored in redis
def get_redis(name):
    HOST = '45.91.101.88'
    PORT = 6379
    PASSWORD = 'bet365pass'
    try:
        sr = StrictRedis(host=HOST, port=PORT, password=PASSWORD,
                         decode_responses=True, db=0)
        # Returns the stored string, or None if the key does not exist.
        return sr.get(name + '_guides')
    except Exception as e:
        print(e)
        return None
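# Drive a headless browser to fetch the stream URL
async def GetStream(stream_id):
    # Drop the flag that marks the browser as automated.
    if '--enable-automation' in launcher.DEFAULT_ARGS:
        launcher.DEFAULT_ARGS.remove('--enable-automation')
    browser = await launch(
        {'handleSIGINT': False, 'handleSIGTERM': False, 'handleSIGHUP': False},
        headless=True,
        dumpio=True,
        userDataDir=r'./text',
        args=['--no-sandbox', '--disable-setuid-sandbox',
              '--disable-blink-features=AutomationControlled'])
    try:
        web = await browser.createIncognitoBrowserContext()
        page = await web.newPage()
        await page.setViewport({'width': 1068, 'height': 480})
        await page.setUserAgent('Linux/Deepin')
        # Restore the cookies saved by the separate login step.
        raw = get_redis('cookies')
        cookies = json.loads(raw) if raw else []
        for cookie in cookies:
            await page.setCookie(cookie)
        # Register the interceptor before goto() so the navigation request itself is caught.
        await page.setRequestInterception(True)
        page.on('request',
                lambda req: asyncio.ensure_future(intercept_request(req, stream_id)))
        response = await page.goto('https://www.365sb.com/streamingapi/v2/stream')
        responseBody = await response.text()
        # The stream URL is the value of the third key=value pair in the body.
        url = responseBody.split('&')[2].split('=')[1]
        StreamContent(url)  # defined elsewhere (not shown in this post)
        await page.close()
    finally:
        # Close the browser even if a step above fails.
        await browser.close()

# Request interceptor: decide which requests to rewrite and how
async def intercept_request(interceptedRequest: "Request", stream_id):
    # Leave every request except the stream endpoint untouched.
    if '/streamingapi/v2/stream' not in interceptedRequest.url:
        await interceptedRequest.continue_()
        return
    # Copy the headers instead of mutating the request's own dict.
    headers = dict(interceptedRequest.headers)
    headers['content-type'] = 'application/x-www-form-urlencoded'
    param = 'l=10&m={}&t=446&sl=4'.format(stream_id)
    data = {
        'method': 'POST',
        'postData': param,  # must be a form-encoded string; a malformed value prevents the rewrite
        'url': interceptedRequest.url,
        'headers': headers
    }
    await interceptedRequest.continue_(data)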
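The positional `split('&')[2].split("=")[1]` extraction above cuts the value short if it contains a raw `=` and leaves percent-encoding in place. As a rough sketch of a more tolerant variant, the standard library's `parse_qsl` can do the splitting and decoding. The helper name `nth_value` and the sample body are made up for illustration; the real response format is not shown in this post, so treat this as an assumption about its shape rather than the actual format.

```python
from urllib.parse import parse_qsl


def nth_value(body: str, index: int = 2) -> str:
    """Return the decoded value of the index-th key=value pair in body."""
    # keep_blank_values=True keeps empty fields so positions match split('&').
    pairs = parse_qsl(body, keep_blank_values=True)
    if index >= len(pairs):
        raise ValueError(
            "expected at least {} fields, got {}".format(index + 1, len(pairs)))
    return pairs[index][1]


# Hypothetical response body, for illustration only.
sample = "a=1&b=2&u=https%3A%2F%2Fexample.com%2Flive.m3u8%3Fk%3Dv&z=9"
print(nth_value(sample))
```

Raising `ValueError` on a short body makes a failed login or an error page visible immediately instead of surfacing later as an `IndexError`.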
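On where to register the interceptor: call page.setRequestInterception(True) and page.on('request', ...) before page.goto(). If the interceptor is registered after goto, it only sees the requests and responses issued after the page has loaded, so the navigation request itself is missed.
Once interception is enabled, the handler must always finish with await interceptedRequest.continue_() for every request it receives. Otherwise the intercepted requests stay paused and the remaining resources may never load.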
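A minimal sketch of that rule, assuming only the stream endpoint should be rewritten: the decision is made in a plain function, and the handler calls `continue_()` exactly once for every request, with overrides for the stream URL and with none for everything else. `TARGET`, `build_overrides`, and `FakeRequest` are names invented for this sketch. `FakeRequest` only imitates the `url`, `headers`, and `continue_()` members of pyppeteer's request object so the logic can be run without a browser.

```python
import asyncio
from urllib.parse import urlencode

TARGET = "/streamingapi/v2/stream"  # path of the stream endpoint used in goto()


def build_overrides(url, headers, stream_id):
    """Return continue_() overrides for the stream URL, or {} for anything else."""
    if TARGET not in url:
        return {}
    new_headers = dict(headers)  # copy, so the original headers are not mutated
    new_headers["content-type"] = "application/x-www-form-urlencoded"
    body = urlencode({"l": 10, "m": stream_id, "t": 446, "sl": 4})
    return {"method": "POST", "postData": body, "headers": new_headers}


async def intercept_request(request, stream_id):
    # Every intercepted request is continued, rewritten or not.
    overrides = build_overrides(request.url, request.headers, stream_id)
    await request.continue_(overrides)


class FakeRequest:
    """Hypothetical stand-in for a pyppeteer request, for this demo only."""

    def __init__(self, url, headers=None):
        self.url = url
        self.headers = headers or {}
        self.continued_with = None

    async def continue_(self, overrides=None):
        self.continued_with = overrides or {}


async def demo():
    api = FakeRequest("https://www.365sb.com/streamingapi/v2/stream")
    img = FakeRequest("https://www.365sb.com/logo.png")
    await intercept_request(api, "12345")
    await intercept_request(img, "12345")
    return api.continued_with, img.continued_with


api_overrides, img_overrides = asyncio.run(demo())
print(api_overrides["postData"])
print(img_overrides)
```

With a real page, the same `intercept_request` would be attached via `page.on('request', lambda req: asyncio.ensure_future(intercept_request(req, stream_id)))` after `await page.setRequestInterception(True)`.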