Python 2 encoding
                 unicode: unicode  你好  u'\u4f60\u597d'
              |  |                              |  |
encode('utf8')|  |decode('utf8')   encode('gbk')|  |decode('gbk')
              |  |                              |  |
              utf8                              gbk
UTF-8 encoded str '\xe4\xbd\xa0\xe5\xa5\xbd'    GBK encoded str '\xc4\xe3\xba\xc3'
# str: bytes
>>> s = '你好 world'
>>> print repr(s)
'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> print len(s)
12
>>> print type(s)
<type 'str'>
# unicode:unicode
>>> s = u'你好 world'
>>> print repr(s)
u'\u4f60\u597d world'
>>> print len(s)
8
>>> print type(s)
<type 'unicode'>
# unicode: every character, whatever it is, has a corresponding code point in Unicode.
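The mapping between characters and code points can be inspected directly. The following is a minimal sketch in Python 3 syntax (not the Python 2 session above); the variable names are only illustrative.

```python
text = '你好 world'

# Each character maps to exactly one Unicode code point.
code_points = [hex(ord(ch)) for ch in text[:2]]
print(code_points)  # ['0x4f60', '0x597d']

# chr() is the inverse of ord(): it turns a code point back into a character.
rebuilt = chr(0x4f60) + chr(0x597d)
print(rebuilt == text[:2])  # True

# Characters are counted once each, while UTF-8 needs 3 bytes per Chinese character here.
char_count = len(text)
byte_count = len(text.encode('utf8'))
print(char_count, byte_count)  # 8 12
```

This matches the lengths shown above: 8 for the unicode object and 12 for the UTF-8 encoded bytes.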
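Characteristics of Python 2
1. In Python 2, print writes the bytes of a str to stdout and the terminal decodes them for display, which is why print shows 你好 rather than the escaped bytes.
2. Python 2 uses ASCII as its default encoding.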
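One consequence of an ASCII default is that non-ASCII bytes cannot be decoded unless the right codec is named. The sketch below reproduces that failure in Python 3 syntax by naming the 'ascii' codec explicitly; it is an illustration of the idea, not a Python 2 session.

```python
utf8_bytes = '你好'.encode('utf8')

# ASCII only covers byte values 0-127, so these bytes cannot be decoded with it.
try:
    utf8_bytes.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    print('ascii decode failed:', exc.reason)

# Naming the codec that was actually used to encode the text succeeds.
decoded = utf8_bytes.decode('utf8')
print(decoded)  # 你好
```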
[root@localhost ~]# cat python.py
#coding:utf8 # tells the interpreter that this source file is UTF-8 encoded
print '你好'
Python 3 encoding
Python 3 uses UTF-8 as its default encoding.
                 str: unicode  你好  u'\u4f60\u597d'
              |  |                              |  |
encode('utf8')|  |decode('utf8')   encode('gbk')|  |decode('gbk')
              |  |                              |  |
              utf8                              gbk
UTF-8 encoded bytes b'\xe4\xbd\xa0\xe5\xa5\xbd' GBK encoded bytes b'\xc4\xe3\xba\xc3'
>>> import json
>>> s = '你好 world'
>>> print(json.dumps(s))
"\u4f60\u597d world"
>>> print(len(s))
8
>>> print(type(s))
<class 'str'>
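The UTF-8 default can be checked from inside the interpreter. This is a small sketch for Python 3; it also shows that json.dumps escapes non-ASCII characters only because its ensure_ascii parameter defaults to True.

```python
import json
import sys

s = '你好 world'

# Python 3 reports UTF-8 as the default string encoding.
default_codec = sys.getdefaultencoding()
print(default_codec)  # utf-8

# encode() with no argument therefore produces the same bytes as encode('utf8').
same_bytes = s.encode() == s.encode('utf8')
print(same_bytes)  # True

# json.dumps escapes non-ASCII characters by default; ensure_ascii=False keeps them readable.
escaped = json.dumps(s)
readable = json.dumps(s, ensure_ascii=False)
print(escaped)
print(readable)
```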
Encoding/decoding, method 1:
>>> s = '你好 world'
>>> b = s.encode('utf8')
>>> print(b)
b'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> s = b.decode('utf8')
>>> print(s)
你好 world
>>> s = b.decode('gbk')
>>> print(s)
浣犲ソ world
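The last decode above yields garbled text because UTF-8 bytes were read as GBK. As a rough sketch, the damage can sometimes be undone by reversing the wrong step; this only works when the wrong decode happened to succeed without losing bytes, which is an assumption that holds for this particular string.

```python
original = '你好 world'
utf8_bytes = original.encode('utf8')

# Decoding with the wrong codec does not fail here; it silently yields other characters.
garbled = utf8_bytes.decode('gbk')
print(garbled)  # 浣犲ソ world

# Re-encode with the wrong codec to get the original bytes back, then decode correctly.
repaired = garbled.encode('gbk').decode('utf8')
print(repaired)  # 你好 world
```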
Encoding/decoding, method 2:
>>> s = '你好 world'
>>> b = bytes(s,'gbk')
>>> print(b)
b'\xc4\xe3\xba\xc3 world'
>>> s = str(b,'gbk')
>>> print(s)
你好 world
>>> s = '你好 world'
>>> b = bytes(s,'utf8')
>>> print(b)
b'\xe4\xbd\xa0\xe5\xa5\xbd world'
>>> s = str(b,'utf8')
>>> print(s)
你好 world
>>> s = str(b,'gbk')
>>> print(s)
浣犲ソ world
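When the encoding of incoming bytes is unknown, one common approach is to try candidate codecs in order. The helper below, decode_with_fallback, is a hypothetical name introduced only for illustration. It tries UTF-8 first because GBK accepts many UTF-8 byte sequences without error, as the garbled output above shows.

```python
def decode_with_fallback(data, encodings=('utf8', 'gbk')):
    """Return (text, codec_name) for the first codec that decodes data without error."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes, try the next one
    raise ValueError('none of the candidate encodings matched')


gbk_result = decode_with_fallback(bytes('你好 world', 'gbk'))
utf8_result = decode_with_fallback(bytes('你好 world', 'utf8'))
print(gbk_result)   # ('你好 world', 'gbk')
print(utf8_result)  # ('你好 world', 'utf8')
```

A successful decode is not proof that the guess was right, so the result is best treated as a heuristic.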