Root cause: transformers 4.45.0 changed the padding interface and now passes an extra padding_side argument down to the tokenizer's _pad() method. ChatGLMTokenizer's _pad() was never updated to accept it, so the arguments passed by the base class no longer match the method's signature when the tokenizer is called to process text.
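For reference, the mismatch typically surfaces as soon as padding is requested. A minimal sketch (the model id is only an example; any checkpoint that ships its own tokenization_chatglm.py behaves the same):

from transformers import AutoTokenizer

# trust_remote_code=True loads the repo's own tokenization_chatglm.py
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)

# With transformers >= 4.45.0, padding internally calls _pad(..., padding_side=...);
# because ChatGLMTokenizer._pad() does not accept that keyword, this raises an error
# along the lines of "_pad() got an unexpected keyword argument 'padding_side'".
batch = tokenizer(["你好", "今天天气不错"], padding=True, return_tensors="pt")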
Two solutions (either one is enough):
1. Downgrade the transformers library
pip uninstall transformers -y
pip install transformers==4.40.2
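After reinstalling, you can confirm which version is actually imported, for example:

import transformers
# Should print 4.40.2 (any version below 4.45.0 avoids the new padding_side argument)
print(transformers.__version__)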
2. Edit tokenization_chatglm.py
Original code:
def _pad(
    self,
    encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
    max_length: Optional[int] = None,
    padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
    pad_to_multiple_of: Optional[int] = None,
    return_attention_mask: Optional[bool] = None,
) -> dict:
After the change:
def _pad(
    self,
    encoded_inputs: Union[Dict[str, EncodedInput], BatchEncoding],
    max_length: Optional[int] = None,
    padding_strategy: PaddingStrategy = PaddingStrategy.DO_NOT_PAD,
    pad_to_multiple_of: Optional[int] = None,
    return_attention_mask: Optional[bool] = None,
    padding_side: Optional[str] = None,
) -> dict:
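Once the extra parameter is accepted, the base class can pass padding_side through without error; ChatGLMTokenizer handles the padding direction itself inside _pad(), so the new argument only needs to be declared, and the rest of the method body stays unchanged. A quick way to confirm the patch took effect (the local path is illustrative; point it at the checkpoint whose tokenization_chatglm.py you edited):

from transformers import AutoTokenizer

# Load the checkpoint containing the edited tokenization_chatglm.py
tokenizer = AutoTokenizer.from_pretrained("./chatglm3-6b", trust_remote_code=True)

# If the patched _pad() is picked up, batched padding works again on transformers 4.45+
batch = tokenizer(["你好", "今天天气不错"], padding=True, return_tensors="pt")
print(batch["input_ids"].shape)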