Multi-part GGML files: do they still work? And how hard would it be to modify convert.py to create them? #1503
The split files were only used because the original LLaMA models came like that, but you could just split the file by some other means for distribution, using the …
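One such "other means" is coreutils `split` plus `cat` to re-join. A minimal sketch with a small dummy file (filenames and sizes here are illustrative; for a real model you would use something like `-b 40G`):

```shell
# Create a dummy "model" file for illustration.
head -c 300000 /dev/urandom > model.bin

# Split into fixed-size parts (100 KB here, just so this runs quickly).
split -b 100k model.bin model.bin.part-

# The user re-joins the parts; shell globbing sorts the part names
# (part-aa, part-ab, ...) in the correct order.
cat model.bin.part-* > model-rejoined.bin

# Verify the rejoined file matches the original byte-for-byte.
cmp model.bin model-rejoined.bin && echo "files match"
```

On Windows the equivalent join is `copy /b part-aa + part-ab model.bin`, which is why a cross-platform script (or an archive format) might still be preferable.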
We used to support split files, and maybe we still do (not sure either). But I agree with @sw - some alternative common method for splitting a large file independently from …
The loader should still support multi-part files; if you are able to hack convert.py to split the models, it could work. Or just find the old converter somewhere in the repository log, but that will prevent mmap from working.
Actually, since HF doesn't really enforce a model format, uploading it as a multipart .7z or .rar will probably be best. And as a bonus you might get some compression too.
OK, thanks very much everyone for the info. If the loader does still support multi-part files then when I have some time I'll see about hacking convert.py. It'd be nice to do this "properly", but it's certainly not the end of the world if I can't - it's only a nice-to-have. And yeah, if that doesn't work then either I'll do a manual split which the user can re-join themselves, or else a compressed archive. Thanks again.
@TheBloke one risk of using multi-part files is that they get mixed up, since the formats are prone to change. So if you have a 4-part file and somehow mix one part of a q5_0 with q4_1, or worse, q4_1 of ggjtv2 with q4_1 of ggjtv3, it will fail to work and you won't know why.
@LostRuins I guess. However, I always use separate branches when I do version updates, so e.g. right now my GGML repos have only ggjtv2 files in them. I'm a little reluctant to use an archive because I have past experience of downloading a couple of models that used them, and it took forever to uncompress them. Then again, they do have the advantage that it's immediately understandable to all users how to access them. If I split the files I'd have to provide a script to join them, and then check it works on Windows... Yeah, maybe I should just use an archive :)
It's worth noting that 7z (and probably other archivers) supports using no compression when archiving files, referred to as compression level 0 in 7z. In my experience there is essentially no speed penalty when extracting large files that are stored with no compression, so if speed is a big concern then that is the approach I would take.
But in this case the whole reason to compress is to reduce the size of the file 😆 The issue is that HF won't store files larger than 50GB, so any file larger than that needs to be either split or compressed to get under that limit. But yeah, it might be worth experimenting with compression levels to find the optimum one that both reduces the file size below 50GB and is as fast as possible to decompress.
Oh, I was thinking you were just going to use the split-archive feature that is natively supported by 7zip/WinRAR itself. That wouldn't require any external script to split or join the files. But yeah, if you don't want split archives at all then compression is unavoidable, of course. Sorry for the misunderstanding 😄.
Oh! I didn't even think of that! :) That sounds like the best of all worlds: I don't need any script to join the files, it's one simple command for the user to get the file, and there's no compression slowdown. Sorry to you for my misunderstanding - thank you! :)
I could get up to 20% file size reduction with zstd, but that was a while back, so I don't remember the specifics. Please just use normal zip multiparts - no need for rar or lzma. You can use …
OK, thanks for the test! I agree ZIP is easiest.
Yea, just use …
So I finally got around to trying this, for Tim Dettmers' Guanaco 65B in q8_0. And it doesn't work?
EDIT: figured it out.
OK! Update: the ZIP is actually fine. I tested with … And then on Linux I did …
So I guess … Panic over!
Personally I use …
Hi all
Hugging Face has a max file size limit of 50GB, which is a bit annoying. This means it's not possible to upload a q8_0 GGML of a 65B model, or a float16 GGML for a 30B model.
I've had two people ask me to upload q8_0's for my 65B uploads. One of them asked if I could use another file sharing site like Google Drive or something like that. But the other mentioned the possibility of multi-part GGMLs.
I know that llama.cpp used to support multi-part models? It still shows `n_parts 1` in the header output, implying that it might support 2 parts as well? So I'd love to know:
Here's the method convert.py uses to write the GGML file:
Would it just be a case of writing the file header twice, and then putting the first X layers in the first file and the rest in the other?
What about the vocab - would that go in both files, or only in the first?
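To make the question concrete, here is a toy sketch of the idea being asked about. This is NOT the real GGML/ggjt on-disk layout - every field and its order here is invented purely to illustrate "repeat the header (with n_parts) and vocab in each part, divide the tensors between parts":

```python
import struct

def write_part(path, n_parts, vocab, tensors):
    """Write one hypothetical multi-part file: toy header + vocab + tensors.
    The layout is illustrative only, not the actual GGML format."""
    with open(path, "wb") as f:
        f.write(b"ggjt")                        # magic (illustrative)
        f.write(struct.pack("<i", n_parts))     # n_parts field
        f.write(struct.pack("<i", len(vocab)))  # vocab size
        for token in vocab:                     # vocab repeated in every part
            data = token.encode("utf-8")
            f.write(struct.pack("<i", len(data)))
            f.write(data)
        for name, blob in tensors:              # only this part's tensors
            nb = name.encode("utf-8")
            f.write(struct.pack("<i", len(nb)))
            f.write(nb)
            f.write(struct.pack("<i", len(blob)))
            f.write(blob)

# Split a dummy tensor list roughly in half:
# first X layers in part 0, the rest in part 1.
tensors = [(f"layer.{i}.weight", bytes(16)) for i in range(8)]
vocab = ["<s>", "</s>", "hello"]
mid = len(tensors) // 2
write_part("model.bin.0", 2, vocab, tensors[:mid])
write_part("model.bin.1", 2, vocab, tensors[mid:])
```

Whether the real loader expects the vocab in every part, or only the first, is exactly the open question here - the sketch duplicates it in both parts only because that makes each part self-describing.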
Thanks in advance for any info!