Python error: 'gbk' codec can't encode character '\xa0' in position 389: illegal multibyte sequence

Reposted; published 2021-05-17 16:58:00

License: CC 4.0 BY-SA

Original link: https://stackoverflow.com/questions/10993612/how-to-remove-xa0-from-string-in-python#:~:text=%5B%26xa0%26%5D%20is%20actually%20non-breaking%20space%20in%20Latin1%20%28ISO,could%20be%20represented%20by%201%20to%204%20bytes.

Tags: #python #unicode

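Fix: '\xa0' is the non-breaking space (U+00A0), which GBK cannot encode. Normalizing the string with unicodedata.normalize("NFKD", unicode_str) replaces it with an ordinary space.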



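import unicodedata
from bs4 import BeautifulSoup

# raw_html is the HTML source string to clean
text_string = BeautifulSoup(raw_html, "lxml").text
# NFKD maps the non-breaking space '\xa0' to an ordinary space
clean_text = unicodedata.normalize("NFKD", text_string)
print(clean_text)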
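To see the effect without BeautifulSoup, here is a minimal sketch that reproduces the error on a small string and then applies the fix. The sample string `text` and the variable names are made up for illustration. The `str.replace` variant is an alternative that is not in the original snippet.

```python
import unicodedata

# Hypothetical sample text containing a non-breaking space (U+00A0).
text = "price:\xa0100"

# Reproduce the error: GBK has no mapping for U+00A0.
try:
    text.encode("gbk")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Option 1 (the approach above): NFKD turns U+00A0 into a plain space.
normalized = unicodedata.normalize("NFKD", text)

# Option 2 (an alternative): replace only the non-breaking space.
replaced = text.replace("\xa0", " ")

print(encode_failed)
print(normalized.encode("gbk"))
print(replaced == normalized)
```

Note that NFKD rewrites more than '\xa0'. For example, it also converts full-width punctuation such as "，" into its ASCII form, so the targeted replace may be the safer choice when the rest of the text has to stay unchanged.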
