解码 boss 直聘的薪资
最近(2025.06.16)在爬取 boss 直聘信息的时候发现薪资信息是经过编码的、人类不可读的形式。
将 font-family: kanzhun-mix, kanzhun-Regular
取消后薪资显示成方块儿,能够确定其是由特殊的字体进行解码的。
在 Network 中选中 Font 能看到 3 个字体文件。
分别查看 3 个字体文件的内容
1. 第一个字体是 base64 格式,需要特殊处理
import base64
# 原始数据(去掉前缀)
data = "d09GRgABAAAAABgUAAsAAAAAKyAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAABHU1VCAAABCAAAADMAAABCsP6z7U9TLzIAAAE8AAAAQwAAAFZW7kg4Y21hcAAAAYAAAAHbAAAFVNSkwZBnbHlmAAADXAAAEE0AABxcANDF92hlYWQAABOsAAAALwAAADYPh4nBaGhlYQAAE9wAAAAgAAAAJAfgA8hobXR4AAAT/AAAABUAAAEgH+kAAGxvY2EAABQUAAAAkgAAAJLyMupubWF4cAAAFKgAAAAfAAAAIAFaAHFuYW1lAAAUyAAAAVsAAAKprAB5inBvc3QAABYkAAAB7QAAAzwZuNu3eJxjYGRgYOBikGPQYWB0cfMJYeBgYGGAAJAMY05meiJQDMoDyrGAaQ4gZoOIAgCKIwNPAHicY2BkYWCcwMDKwMHUyXSGgYGhH0IzvmYwYuRgYGBiYGVmwAoC0lxTGBwYKp65MTf8b2CIYW5gaAAKM4LkANhrC7sAeJzF1EdWG0EYxPH/ICGSyDmDTM7gHHGEjY/hQ3A6H6cWXvkGuHqqNz4Bo/eTRvOkUT911QcMAx07sy40f2koxx9fbdrrHcbb611++/2oH0N+fdBAd4+P7Rnaa8/K0bSf+FnPxvzdCfpMMsU0M8wyxzwLvsMiSyyzwiprrLPBJltss8MuA56xxz4HHHLEMSecehXnXHDJFddeT9ervqHHCM95wUte8Zo3vOUd7/nARz5xy2e+8JVvfOcHd9x7OT2e7Gie7qf/P/rlqfOrvvO/wkPlJYrwvqEmvINoKEoO1AnvKupGuauGwzuNeuE9RyPh3Uej4RygsXAi0Hg4G2ginBLUD+cFTUbJnqbCGULT4TShmXCu0Gw4YWgunDU0H04dWgjnDy2Gk4iWwplEy+F0opVwTtFqOLFoLZxdtB5OMdoI5xlthpONtsIZR9vhtKOdcO7RbrgBaBDugpsWbgXaD/cDHUTpsQ7DnUFH4fag43CP0Em4Ueg03C10FmUm6DzKXNBFlHmhy3AH0VVQ9vw6KHt+E24oqtxVVLm1qHJ/UeUmo8qdRpXbjSr3HFVuPKrcfVR5CqDK8wBVngyo8oxAlacFqjw3UOUJgirPElR5qqDK8wVV3P8D3lS0GgB4nI1ZD3AU13l/33v3Ryed7nS3ultJh6TbW90tIN0JdP8CwtIaJDz8baAGbDkG2WBqDHgodSFua1i3zVjC5k9JaibTjH0TXCeYxCWJGbspJls8CXGNJwkdHKCDOeLW4zRD4mCapC736Pd276Q7ilJ0u9/uvn37vfe+7/f9eyJOQm5eYSdYC5HIdDKbDJHPEgKuboj5aDsoWiZFuyGkOENys49pqqa41ViK3QVyzNUc7stlErLL7fKDDzogrfTltBTVIJsZoP3QF24HaI20/WEwPi3IDkB9i9bxBb6YfhVCneo0/0CSL+oZbO6LSnU7vcFgazD4XJ3L6ayj1OH3wVY57HF66l38Jae/LXSicwbtBG+r1rb0/sZoJPjQWObx9rjsATAMkCJR39cGA20BPP6iLSwFW91NjXUtbY1qVzPs/PeGFsnbnviA4B/gWm86gBGSIERJD0A+BZqaReIDd0hF0gFyOqvGEoOQ6+uEMIqB5s5tGmds3+Y6un/z5n2sDk82vuk/Rpob6p9zS+699V5pxLVv05b9+Go/Y/s317H9Wzbtc30j3PReff17TWFCKI5rOAgzSAPKmDgTRMuRfJjILsISMT9oCRRmJ7iFRAdBDg/QXsjnMpRc5JedTlAuXgTF6eSX6/yaf++4LzHNN7bP1y7u/fEINI7v8xlVvS7iV/SMz7dvzDctgT39Wrtv33gjROL+8b1+jdTOJ1Q7HwmFoLkhF8z/n9HXHnx31t8Ngzd+euzW4Zjy0NsDu+fyDx6GWc8Qwqr4h0l37QhxCzVqDFcriQsO6Ao1y+FOSOdQ7LjqC/yyywXKhQuguFz8Mr8uL5iZjLZPCz3X2uUKRlVGmbs50edraK83qvpdwO/eknv0ZFvzXXSxxGgk4qnzTUs9crR7/t8umW/rH8lR+ivSild3qNnlVlOQDWRy+XQHhAJ5S+nN8NV/CTR2NgaQeIPjnlZPuA6cb3sbAwCBRlqwr6Wr4x5P2NNKKnzZF+gngm8c4ZMNuF2xXkhkBiE7AMIU5IAP2MaoV3q2Hvl5PO8ii6jFyRdspC17LF6eMw2NwWBjaaQxaPOkBRQlWqiH5j1Ad/A9sGMm/w4sY4Qb3OiGVbCKlPVpMpPpKG8Zp6OG0qF0FoGcVSUEN+Icce0DKBaLOh7RkyWns3TSolAQLcWSXmlAaulvkp+KHLPIMKSGkGEWqjhqtglZhgMW7yKQKu4nD19wOC4ctigU7NF5tGqo/spbpBU5GlS31+z2AFvFj/Kj3WAAtoqVzxQymJA5LdryAewqAzw2E5bx78yEHYx0l7+7hSegz0Bkl2XXjTz32N9YvuEjXHMLqSNEshkqoW/Dhh5+D/xjD/8K88Eofz0Fh+BQir+u307ulkzclnwkzS275byc1/IarRHJC6kXX0y9YFFWLXW+vdKMlIhZlHn7SYR0ohY0tKReksbZBZSA0IUi9BFSQqAI9WAj6pwJNeEp3XI1GDF0buh4LZlg6OYNw9R1A//MCYJyIRRbsV0Hous3cHSdl5/K5DbYqF51FR6kPAoAfQkKIE+nAsXhqaRx4/tVyIAjNaKp2AY1hQY8lBa4wIfJ8VfWt04t7UgeMEEHXWh8Ags6YkF8Bwo2m1QAy6hgj9g8AQ3TYlkyQCcT35m4ZhuXAkaITolf5VeTIFkEpCS/ChLO46p1W/3Gmg3y+KyFV+QiONh83EATM8vQpjsqd5MIBuMWLL9KvyawHNfwew0O/WcSZBpOvvnzJP85lZOii7Oso7txJD/GdRkRRCQEDEIGQRNQAH0DUwKKFEhbv8IoRFFO0dESatk0TaqXzAJqmyJgDL2kM6KXCDU4AUMIUsjrlvV4qIazsQ64CDvKZvXexF2Vx+qu8V32XA3Ek4d4y7MlEs5T4DmgoJbyoTTgxON4irFRYzgZsKah7+ezLdSidsQMEdsMe7y+E1GrI5JJZZ5RJu4sD4CsgRKIlnCVHP0HN/ELK2YhMa15+HAWBCpTCMRlcCOY86DZgBEjvTHGP38k1u+dA0uTQjqIE462pY/xJ2OvROd4YWmqMjZKD63Za+s8LyNLJy6FFpP8E/5J0tR5gRrwXXyCxqRJXaVPPxUTF3Z2mRVwPl6M0YrwwbbXDaUlRUordtbiB9QjKzsDxPmWlSu30AJarnDEsCiXWwT6xjT9dnqjfpPodDS8ejul21eHSwY1wv3LAZb3rxbiQ+sv+7Oj7BzrxhFjtmVjiJTDPsxHfKDaJt43AJkJn3/upfOOVPJsHQRb6s9mdg/bFrvm4Bo8xl5zOF5j3fg8vDtztr4lCHVnkynH+ZcOXygNrXr+yKF7wfHa2Nhxe61/yTjbhZKXSd4eOdwkUgMrcQCRMYh8U8rkNGzVEnkpl7CW3yznB7C3m7nC8gDq9dmPFg4PL/zo2Qv8CuYQnRDvSCT6E4luCvPqJVcDXe+KdmlrVYD+eqnBSdfXRbU5y9kueGbnk/z6kzufAfwIE4pO/q46T8WDb3TL9cP1Xur0Qn9nygF3zw56PcP1DQ7MTU+3J90wOIkdQhyo6UbbQ3tAEJpVDAv7OjNumAK1CJ4gT8FZ/sub+CE+PcR/CcF1FVs4wY6wVfjCRepJE2kmJC9LGpPzEoYSDzhhtltY8elTxVO/U/5m46k5p/ijyHU2N34H94B5eRv89TaufylF1257eFvpJP3ysdIfLRd4ohN2ZscqFxGytSDlrqgTJciMFn6sRZyW5z1D1y9Zsp5aFMwf8h/X1cGsH9parrQjJdXxihF3ld/pxLHSARVRr6JFCQeUFZEBDcEdwPAA4iyKmAPFmyinm2Rr8ngS/Ejw4L9GwnT7Rbmbq9JOX67cCdk5atanYqS0UFRZplxZIMYnd1XEqm6nBBdWtfyx48xzft268x52fAzNyjYupGhUtmnBIqZjv0mxzMOeuz4eGvp419jxpyb6r9wyUOmP1J6ryEkYmZCVhRjha6B8RX+CB/o5cRFBjFsn1yt3k3lnAQoCgRIKd5Sh6cOoTkg5Tgv/EUYdxG1pBNIhxV45pgwBXHQ2LQVs7QecagfaGY0+VWCjvGhPmBdHWeEpvcgObN16gOFcaFemi3L/t3ZxU7cXhBPa9S3/fAwcOnbBjvimTVUtfSAnNlpeo0Ay6j6OTpUJn+qDWCKT6wsXRawRJ4ZX/mOITI/gYRQK6KxuGMy8oR9oFi0ROzxP+mjbJ0766Cl0CpgTYE5zA8lUKrS8Nzp2vJlaZWiP1thQtNYSJTNIlgwQq16qwlWzS6zqlsnYdfJtG4/NGgQYnGVTe6Gw4TZt827TVpxomDU4WBbc3XfYZtfCZ9mrLIXaIRIDTCcgf3EMy+y73jzEf72EpfhbpStjJ5aA/9CkfQnZuxGvsrBryf5IY2V7ZllcoogS5asumJ0Z4R8n35m3YlMKVjzw6MnHESRCzJjnoenAZRzinSQ0jZzhZ08++gD/h9SmFTSjI4xWbjmlW7kiqfavDL0iFtSWxjG5RmuR+1DqVnDK5Cwfaz5xBtCtFtt62vCYkcsJJ2uYT1zn79L0aFuyL9UKuZW5sr+t4ttE2sW+QBVvUQorDCZGqNZc9ViwMfnky/Rfr/KPOzIdeFSrqXr00ge5z/89TANHR7Y/01GrEKGPItpr1MoORf6DE8liog9FkXLpVrqD1Vr5oSbHm/C0efSz6GLR12KSY1cECggu+NOwSCgWClQvFITBjZbIKCZABIRPwXPUxETYMEqmCVabGAZXRkQWBdZY/4S6F1pXAljb5DV3QLEushKwLvmAYl2+C6qe3Ls3qVeu/FLNFauoi+UOifI1Pmh3sNvtWHWNPc18JCg8F/oKDYWfFklBLq/iOlTAE6+ojzLkEi7mWHPpscf+HL3l1rVr3xx+6pzOm5CsubT5sT8rNz6yvCs568QTjxTiXT2L9XN0WD+3sqrJ8pmvWhhvInPJAtRDXwcNiQ2KLi1B8+X6hrqsPQuRb+TT2UQunwvLYWsLrB2EC70LkcgwSUoIqFAycnj3EP3i7qf5/6wfzaRz778fWsDoigdH1NYwZZFQsMUB0Dq9tdvX5GjJyEFom97aNL9N7UqrKtOHdh8e2XMpn86MrgenYXyRDzb0DwyuaJbikc9MD0UBmps8HbPrHd7GQFZSfaq8NNDSEpjpi/BfQSzdBbFM7JY6FWONs3pPAH1yTXXG51NSXYjBv1lptIWB/2bjzIO+T8SSAUjhimWnoB00l3e67DaXO57T6Ne71i7zZ+8fSfj42Y5lixN16T9Y3AItzvzoSlW574F2/hGro6riGPzcvZEZq2edmNagDq+YF4rENvUtlXseXJMKT98UaauMa7Lzls8ncaekxSPgdEuYiV/K/wKGDx6E4V/k7/8yDH3YCxq/0PshP2H5qjes9XaSpeQ+skHk3tbeXkB1iaLUJcpSsfcXSOewPJVznSB24fDEt6hKLJkRWJ0gKvjwIIgqPjcoiibEYS/kaxKIqnv6x+93h7u8q1e3pLrf7164EImsTj4u8IaTc+cmY48r7Ylm/9CCpvppze3K47HaRk+H1K78dt1uSnevs+nQCKUjQxZlurGkbUC5SRbklhhLGhuRRO5SoPLobdbqGhrq+vX+WPIz3UZ3+6yWGD5U2lJ5bIuk5Vj/sgnm63YPVJgjrcVKoLa6Z24ZC84atDSkjsGaJP9p7U7GAz1vwJoU/2k5Zt/8Hvs+u4dMx5wsZ3FEhnigASMKUQV9sp2C5PohoKYQSxY4nXZkjeM7zFTYaF+68Z58aPxzhjHwcCirN/l9L27f/qLP36TrCzaos+c4fvD88z8ovWTHbKyorPD9YfuGP1kTHNlKdRieiwX9tgOUHthG0Ykue/pPY8+fZuz0CbungSfmmIsqczbZPpRBG2aSPSKDlVVtYsuFoXeW0uVyzF3ZPhaZrFXdUbMQ1pusgmtjGu7no3rBziv0ckpRTH7zm8lRTHCKBbHtwjHvMCqFWbkoC/O5Vidw8U8r86nsuURqLbh6D07MbMq9lgdFpaxPsfkGX5moEWxbwwo4Lmv5XjQMLH2r7O5t7fCVDRuuHNberrbAlzddfkF55x3lhcvTam2xdu7tU+8XMRlk0KacfeKvYEaSHzn2e3aJDo7BjBQ/8kbVnsJkLSLiliKybKfYAyn/xG6YkD4GRGsXwT4xicZvxTaoqRtiz8uWy+tML+/dpAOQBwncCFeMknw8BR4Tdqb4b3XxKexM8t/QD/l4EjylSt5685/Z9/B7UVenpqqsB+hkYV3ebE1jkAMTV9jbI4rs1vqfZHfVFtnHGTsOj5q6aepWMTK8K/uT+lZRa/f0WmIpDcG9h76OxTbWJmPH4UHR0zTvXC8S4jqQnlIvb31p+jf036OUzTv69kBvueZxEFqsqs+s/wfYJf6d1WfXwEspv37tGr9OKXivvfKzGTN+9opNp/CYtBjmZ8LWCRlxzmz40cKFP2qwaHZKN3jr3o0Hc0GsYt0aE3s3RGzV6GYyTUVx/0nSLH1KXWaSN9qxslbfiTvQt+D6/+v5PjDvSMftul7JmeE3lX1aqUqq8Snuq8sRMKZ8+C+86x2kdLDXbr3dPY7+v5auzdAAAAB4nGNgZGBgAOJDAQ2b4vltvjJwszCAwDXjRY8Q9P8GFkbmBiCXg4EJJAoAQlkLIAB4nGNgZGBgbvjfwBDDwsDA8P8/CyMDUAQFeAAAcjYEsHicY2FgYGB+ycDAwjCKsWEApeYCCQAAAAAAAAAAdgCyAPoBKgF2AaIBzAHiAgoCRgJcAnAChAKeAswDGANaA2gDdgOEA5IDtAPWA+oEHARABHAEhASuBMwFBgVCBaIFxgX0BiQGZAa6Bt4G7AcsB1YHlAf8CBQIUgh+CMQI3AkSCUoJhgnyChQKUApqCwgLMAuKC9IMBgwwDGoMkgyyDPwNNA2MDaoN7A4uAAB4nGNgZGBg8GBIZeBgAAEmIOYCQgaG/2A+AwAadwHMAHicfY9LTsMwEIZ/94VIBQsQLLrBYoEEqOlDgkW3ldodSF10wypNnTZVEkeOW6kX4A4cgJNwDrgAl2CSDkipVBKN883n8XgC4AxfENg9FxQ7FjihbMcVHOGauUr+lrlG/MhcRxND5gb5J2YH93hhbuIcr9RB1I4pu8Mbs0ALH8wVnOKTuUr+m7mGlqgz13Eprpgb5B+YHUzFM3MTN+LdGRrlWTWXs60MfZ0EOrGOilSsEtvORTZRi3XkmZIrJVNlslAnsud2S36sEmV+e2ebRd/aQAZGx3JEl6go0jI1eqV86y6tTQedTsDe9XVMow5hoODB0jqHxAxbWkP40EgQFKulOoWIIqbI8/ZfRYYJuQXWtO8VvQ7VHd6ZkjP0DYtcogcX3X/qx4XLz+zPnWFDs/TJWppdUhg6ExON+E/yrhGxRFrsrcj45F0si1MpBujQG+zVu8Xt8Q+LZH1gAHicbVJZe9MwEPQUOXISpy003Fe5T3OU+yxQjvIzHHkT64stGUlO+Pj1+EhMHtCDPd7d2Z0dy9vy2jPw/n+OsYUTYPDRA0eAPgYYIsQI29jBLk7iFPYwxmmcwVmcw3lcwEVcwmVcwVXs4xqu4wZu4hZu4w7u4h7u4wEeIsIjPMYTPMUBnuE5XuAlXuE13uAt3uE9PuAjDvEJn/EFR/iKb/iOHzjGTw+/e2WR6TjxyRhtuC2FIGv5MjZKqlnfauOiRC8Vb1BZDOKqbhllNHVDIY3IKCqy0u5t4EiXLpOKVqU1e9hCI2epC1pcFmwSi3m4IopMW2JJ7Gi8Gel6idiQa8aGLZxo53Tebz+cLoYtakb4DTdMon9ifZGSmPcSysjRaJ1pBSValDkpx5OoaRJSIt16clDrbxyaaZ3YnqXYiJRJNdU8r6yKZ8Tq+iDTInZSK14XV97trgPrTqyaUfq5VKVlE8qyMNcTWXuW6iqpaGmriOlW9pv4qHmuY7yQwpWGdlbvrnXtOy+MVI4MM7Gac0NTQzYNfpVkaxU9Q7lekG/TakVuXWyiSqsl5yqt3V+oTaqCZiEFBVZnST1hu6V2jrTk6XS8yeokOinm5CyrLwz/o3UeScWczIktJC15e90OgiZTcVi9s+f9BXuB96oAAAA="
# 解码 base64 数据
binary_data = base64.b64decode(data)
# 写入 .woff 文件
with open("1.woff", "wb") as f:
f.write(binary_data)
使用在线字体查看器打开文件1.woff ,比如 FontEditor在线工具。发现字体是 element-icons,可以忽略。
2. 3kovsijnt11693967587313.woff2
字体是 kanzhun Regular
,是我们要找到的目标,并且可以看出只有 0 ~ 9
这 10 个数字被编码了。且 Unicode 的映射关系如下:
mapping = {
0xe031: "0",
0xe032: "1",
0xe033: "2",
0xe034: "3",
0xe035: "4",
0xe036: "5",
0xe037: "6",
0xe038: "7",
0xe039: "8",
0xe03a: "9",
}
3. 16a48d9v961651026858984.ttf
字体是 kanzhun
,并不是我们要找的目标。因为薪、元、时等中文是没有被编码的,如下:
根据映射关系进行解码
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
def crawl_boss():
driver = webdriver.Chrome()
try:
# 打开目标网页
url = "https://www.zhipin.com/web/geek/jobs?query=&city=101010100&position=200101"
driver.get(url)
time.sleep(5) # 等待页面加载
# 获取页面内容
job_list = driver.find_elements(By.CSS_SELECTOR, ".card-area")
for job in job_list:
name = job.find_element(By.CSS_SELECTOR, ".job-name").text
company = job.find_element(By.CSS_SELECTOR, ".boss-name").text
salary = job.find_element(By.CSS_SELECTOR, ".job-salary").text
job_obj = {
"name": name,
"company": company,
"salary": salary,
"decoded_salary": decode_salary(salary),
}
print(job_obj, salary)
except Exception as e:
print(f"爬取失败: {e}")
finally:
# 关闭浏览器
driver.quit()
mapping = {
0xe031: "0",
0xe032: "1",
0xe033: "2",
0xe034: "3",
0xe035: "4",
0xe036: "5",
0xe037: "6",
0xe038: "7",
0xe039: "8",
0xe03a: "9",
}
def decode_salary(encoded_salary):
for key, value in mapping.items():
encoded_salary = encoded_salary.replace(chr(key), value)
return encoded_salary
if __name__ == "__main__":
crawl_boss()
更简单粗暴的映射方式
手动从 html 中复制出编码后的 0 ~ 9 进行映射。
mapping = {
"": "0",
"": "1",
"": "2",
"": "3",
"": "4",
"": "5",
"": "6",
"": "7",
"": "8",
"": "9",
}
def decode_salary(encoded_salary):
for key, value in mapping.items():
encoded_salary = encoded_salary.replace(key, value)
return encoded_salary
关于 Unicode Private Use Areas,PUA
查资料发现编码 0xe031~0xe03a,属于 Unicode Private Use Areas,以下摘自维基百科,里面也提到了字体相关的信息:
在Unicode中,私人使用区(英语:Private Use Areas,PUA)指其解释未在Unicode标准中指定,而是由合作用户之间的私人协议决定其用途的一系列码位。[4] 目前定义了三个私人使用区:一个在基本多语言平面(U+E000-U+F8FF)中,另外两个几乎包含了整个第15和第16平面(分别为U+F0000-U+FFFFD,U+100000-U+10FFFD)。[5]
私人使用区字符的分配,可以不由字面意义上的“私人”决定。一些组织已经发布了一些分配计划。但根据其定义,私人使用区相同的代码点可分配为不同的字符,因此使用某种字体的用户看到其显示为一种形态,但使用其它字体的用户看到的字符可能完全不同。