model : add LFM2-ColBert-350M #18607
convert_hf_to_gguf.py
```diff
@@ -9948,6 +9948,31 @@ def _is_audio_tensor(self, name: str):
         return any(p in name for p in ["audio", "codebook", "conformer", "depth_embedding", "depthformer", "depth_linear"])


+@ModelBase.register("Lfm2Model")
+class LFM2ColBertModel(LFM2Model):
+    model_arch = gguf.MODEL_ARCH.LFM2
+    dense_tensor_name = "dense_2"
+
+    def set_vocab(self):
+        super().set_vocab()
+        self.gguf_writer.add_add_bos_token(False)
+
+    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
+        if not name.startswith(self.dense_tensor_name):
+            name = "model." + name
+
+        return super().modify_tensors(data_torch, name, bid)
+
+    def generate_extra_tensors(self) -> Iterable[tuple[str, Tensor]]:
+        # dense tensor is stored in a separate safetensors file
+        from safetensors.torch import load_file
+        tensors_file = self.dir_model / "1_Dense" / "model.safetensors"
+        assert tensors_file.is_file()
+        tensor = load_file(tensors_file)["linear.weight"]
```
|
Contributor

Not a change request, but I'm wondering if we should introduce an `extra_model_dir()` hook for cases like this. Something like:

```python
class LFM2ColBertModel(LFM2Model):
    def extra_model_dir(self):
        # tensors will be loaded and processed via `modify_tensors()`
        return [self.dir_model / "1_Dense" / "model.safetensors"]
```

Member

Yep, would be handy to deduplicate some code, I guess we'll see more of this as more such models get added. I checked out some other ColBERTv2 models BTW, and they seem to have
```diff
+
+        self.gguf_writer.add_embedding_length_out(tensor.shape[0])
+        yield f"{self.dense_tensor_name}.weight", tensor.clone()
+
+
 @ModelBase.register("Lfm2MoeForCausalLM")
 class LFM2MoeModel(TextModel):
     model_arch = gguf.MODEL_ARCH.LFM2MOE
```
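For orientation, here is a minimal sketch (not code from this PR) of what the converted head does at inference time: the backbone's per-token hidden states are projected by the `dense_2` linear from `n_embd` down to `n_embd_out`, giving one small normalized vector per token instead of a single pooled embedding. The sizes below are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

n_embd, n_embd_out = 1024, 128          # assumed sizes, for illustration only
dense_2 = torch.nn.Linear(n_embd, n_embd_out, bias=False)

hidden_states = torch.randn(7, n_embd)  # backbone output, one row per token
token_embeddings = F.normalize(dense_2(hidden_states), dim=-1)  # (7, n_embd_out)
```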
src/llama-model.cpp
```diff
@@ -507,6 +507,7 @@ void llama_model::load_hparams(llama_model_loader & ml) {
     ml.get_key(LLM_KV_CONTEXT_LENGTH,       hparams.n_ctx_train);
     ml.get_key(LLM_KV_EMBEDDING_LENGTH,     hparams.n_embd);
+    ml.get_key(LLM_KV_EMBEDDING_LENGTH_OUT, hparams.n_embd_out, false);
     ml.get_key(LLM_KV_BLOCK_COUNT,          hparams.n_layer);
     ml.get_key(LLM_KV_EXPERT_COUNT,         hparams.n_expert,      false);
     ml.get_key(LLM_KV_EXPERT_USED_COUNT,    hparams.n_expert_used, false);
```
```diff
@@ -627,6 +628,7 @@ void llama_model::load_hparams(llama_model_loader & ml) {
     ml.get_arr(LLM_KV_CLASSIFIER_OUTPUT_LABELS, classifier_labels, false);
     if (!classifier_labels.empty()) {
         hparams.n_cls_out = classifier_labels.size();
+        hparams.n_embd_out = classifier_labels.size();
     }
```
Comment on lines 629 to 631

Member

This crashes f.ex.

Contributor (Author)

thanks @CISC, rolled back to using
```diff
     // arch-specific KVs
```
```diff
@@ -6446,6 +6448,9 @@ bool llama_model::load_tensors(llama_model_loader & ml) {
                         layer.shortconv.out_proj = create_tensor(tn(LLM_TENSOR_SHORTCONV_OUTPROJ, "weight", i), {n_embd, n_embd}, 0);
                     }
                 }
+
+                // for LFM2-ColBert-350M
+                dense_2_out_layers = create_tensor(tn(LLM_TENSOR_DENSE_2_OUT, "weight"), {n_embd, hparams.get_n_embd_out()}, TENSOR_NOT_REQUIRED);
             } break;
         case LLM_ARCH_SMALLTHINKER:
             {
```
```diff
@@ -7976,6 +7981,10 @@ int32_t llama_model_n_embd_inp(const llama_model * model) {
     return model->hparams.n_embd_inp();
 }

+int32_t llama_model_n_embd_out(const llama_model * model) {
+    return model->hparams.get_n_embd_out();
+}
+
 int32_t llama_model_n_layer(const llama_model * model) {
     return model->hparams.n_layer;
 }
```
Member

Why set this to `False` BTW? It's `true` in the config, and it's used in `TemplateProcessing` both for `single` and `pair`.
Member

So, if the config is correct, both `add_add_bos_token` and `add_add_sep_token` should be `True`, and the sep token should be set to the bos token. Ideally this should be done automatically by `SpecialVocab` from `TemplateProcessing`, but IIRC this pattern isn't accepted (a warning is logged), don't remember exactly why.
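For context, a minimal sketch of the `tokenizers` `TemplateProcessing` post-processor being discussed; the token string `<s>` and id `1` are placeholders, not the model's actual vocab:

```python
from tokenizers.processors import TemplateProcessing

# With add_bos true, the "single" template prepends BOS; the "pair" template is
# where a sep token matters. Here the sep is set to the bos token, as noted above.
post_processor = TemplateProcessing(
    single="<s> $A",
    pair="<s> $A <s> $B",
    special_tokens=[("<s>", 1)],  # (token string, token id), placeholder values
)
```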
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, that's a debug leftover (was dealing with double BOS), will remove.
Member

I think you still need to add sep metadata as mentioned for reranking to work.
Contributor (Author)

ColBERT doesn't need a sep token: it embeds queries and documents separately, and the similarity score is then calculated pairwise using MaxSim; see the script attached here: #18607 (comment). This allows embedding documents once, caching the document embeddings in a database, and then embedding only the query and computing similarity scores against the precomputed document embeddings.
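A minimal sketch of the MaxSim scoring described above (an illustration, not the script referenced in the linked comment): each query token takes its maximum similarity over all document tokens, and those per-token maxima are summed.

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> float:
    # query_emb: (n_query_tokens, dim), doc_emb: (n_doc_tokens, dim).
    # Rows are assumed L2-normalized, so dot products are cosine similarities.
    sim = query_emb @ doc_emb.T                # (n_query_tokens, n_doc_tokens)
    return sim.max(dim=1).values.sum().item()  # best doc token per query token
```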
Contributor (Author)

I was thinking of adding this logic to llama.cpp, but then it would require document database management, so I decided to leave it to the client.
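A rough sketch of the client-side flow this implies, reusing `maxsim_score()` from the sketch above; `embed()` and `corpus` are hypothetical placeholders for the model call and the document store:

```python
# embed() stands in for a call into the model (e.g. a llama.cpp embeddings
# endpoint) that returns L2-normalized per-token embeddings of shape (n_tokens, dim).
doc_cache = {doc_id: embed(text) for doc_id, text in corpus.items()}  # embed once

def search(query_text: str, top_k: int = 5) -> list[tuple[str, float]]:
    q = embed(query_text)  # only the query is embedded at query time
    scores = [(doc_id, maxsim_score(q, d)) for doc_id, d in doc_cache.items()]
    return sorted(scores, key=lambda kv: kv[1], reverse=True)[:top_k]
```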
Member

Ah, so `TemplateProcessing` is not used at all.