Fixing UnicodeEncodeError: 'ascii' codec can't encode character '\uff1f' in position 22: ordinal not in range(128) in Python
Problem description:
Environment: Python 3.7, run from PyCharm.
Accessing a URL with urllib.request.urlopen raised the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/local/lib/python3.7/urllib/request.py", line 543, in _open
    '_open', req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1345, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/lib/python3.7/urllib/request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/local/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.7/http/client.py", line 1240, in _send_request
    self.putrequest(method, url, **skips)
  File "/usr/local/lib/python3.7/http/client.py", line 1107, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\uff1f' in position 22: ordinal not in range(128)
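Inspecting the URL showed that it contains characters that cannot be represented in ASCII, such as Chinese text or fullwidth punctuation ('\uff1f' in the message is the fullwidth question mark '？'), so the URL has to be percent-encoded before it is requested.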
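The failure can be reproduced without any network access. The snippet below is a minimal sketch; the URL is a made-up example, and the position it reports is relative to that URL string rather than to the request line that http.client builds in the traceback.

```python
# Hypothetical URL containing a fullwidth question mark (U+FF1F) instead of '?'
url = "http://example.com/search\uff1fq=python"

try:
    url.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that cannot be encoded
    print(f"cannot encode {url[exc.start]!r} at position {exc.start}")

# List every non-ASCII character together with its position
non_ascii = [(i, ch) for i, ch in enumerate(url) if ord(ch) > 127]
print(non_ascii)
```

Running a check like this over the input file is a quick way to find which URLs will need encoding.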
Solution:
The urllib.parse module in Python 3 parses URLs and can also be used to rebuild them; its quote function does the re-encoding needed here.
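import ssl
import string
from urllib import parse, request

# Skip SSL certificate verification (acceptable for a quick check, unsafe in production)
context = ssl._create_unverified_context()

with open('./check_url.txt', 'r') as f:
    for line in f:
        # Drop the trailing newline, which quote() would otherwise leave in the URL
        url = line.strip()
        if not url:
            continue
        # Encode the URL; 'safe' lists the characters that are left unencoded
        url = parse.quote(url, safe=string.printable)
        # Open the encoded URL
        response = request.urlopen(url, context=context)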
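To check the encoding step without opening any connection, the quoting logic can be pulled into a small helper. This is only a sketch: encode_urls and the sample lines are hypothetical names and data, not part of the original script.

```python
import string
from urllib import parse


def encode_urls(lines):
    """Yield percent-encoded URLs from an iterable of raw text lines."""
    for line in lines:
        url = line.strip()  # drop the trailing newline and surrounding spaces
        if not url:
            continue  # skip blank lines
        # Characters in string.printable are kept; non-ASCII ones are encoded
        yield parse.quote(url, safe=string.printable)


# Hypothetical sample input standing in for the contents of check_url.txt
sample = ["http://example.com/搜索?q=python\n", "\n", "http://example.com/plain\n"]
encoded = list(encode_urls(sample))
print(encoded)
```

Each encoded string can then be passed to request.urlopen exactly as in the loop above.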
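Notes:
- The error comes from http.client explicitly encoding the request line as ASCII (request.encode('ascii') in the traceback), not from the interpreter's default encoding. Python 3.7 already uses UTF-8 by default, so the Python 2.7 workaround of changing the default encoding (for example with sys.setdefaultencoding) does not apply here.
- parse.quote needs the safe parameter so that the URL is not encoded wholesale. By default only '/' is treated as safe in addition to letters, digits and '_.-~', so characters such as ':', '?', '=' and '&' are turned into '%' escapes and the URL can no longer be opened.
- About string.printable: it contains the printable ASCII characters 33 to 126, where 48 to 57 are the ten digits 0 to 9, 65 to 90 are the 26 uppercase letters, 97 to 122 are the 26 lowercase letters, and the rest are punctuation marks and operators. It also contains the whitespace characters (space, tab, newline, carriage return, vertical tab and form feed), so whitespace is left unencoded as well; strip the trailing newline from each URL before quoting it.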
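The effect of the safe parameter is easy to see by quoting the same URL both ways. The URL below is a made-up example, and the variable names are illustrative.

```python
import string
from urllib import parse

url = "http://example.com/搜索?q=python&lang=中文"  # hypothetical URL

# Default safe='/' also escapes ':', '?', '=' and '&', which breaks the URL
default_quoted = parse.quote(url)
# With safe=string.printable only the non-ASCII characters are escaped
printable_quoted = parse.quote(url, safe=string.printable)

print(default_quoted)
print(printable_quoted)

# string.printable includes whitespace, so spaces and newlines pass through
whitespace_kept = parse.quote("a b\n", safe=string.printable)
print(repr(whitespace_kept))
```

Because '%' is itself in string.printable, escapes that are already present in a URL are not encoded a second time.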
Reference for this solution: https://blog.csdn.net/qq_25406563/article/details/81253347