iPad App #844

ocrickard · 2023-04-08T04:25:37Z

I've been playing with using llama to help me tell stories to my daughter at night. I wrote a simple native iPad app that uses llama.cpp, and provides some nice model / thread management capabilities on top of it. It runs quite well on my M2 iPad after a few tweaks to the memory allocations in llama.

I'm curious if there's enough interest in something like this for me to continue polishing it and share it somewhere.

trimmed.mp4

edmundronald · 2023-04-08T13:17:07Z

Please do share!

0 replies

auto-stars · 2023-04-08T22:58:02Z

I think it's going to get a lot of attention when it comes out.

0 replies

ocrickard · 2023-04-09T00:14:29Z

Author

Thanks for the interest. I'll keep working on it. The largest remaining item I need to resolve within llama itself is this:

https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp#L60

The allocations for scratch in a 7B param model are pushing just over the limits of available memory, so the only way this runs locally is if I hack these allocations to be dynamic. I'll see if I can clean up that patch and submit it soon.

3 replies

Green-Sky Apr 9, 2023
Collaborator

Make sure to set f16_kv to true; it halves the K+V memory usage. It still defaults to false (https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp#L294), although ./main sets it by default.
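
To make the "halves" claim concrete, here is a rough back-of-the-envelope sketch. The helper `kv_cache_bytes` is hypothetical (it is not a llama.cpp function), and it assumes the cache holds one K and one V tensor of `n_layer * n_ctx * n_embd` elements each. The 7B shape (32 layers, embedding width 4096) is the published LLaMA 7B configuration; the context length of 512 is an assumption chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical helper, not part of llama.cpp: estimated size in bytes of the
// key + value cache for a given model shape and element width.
constexpr std::uint64_t kv_cache_bytes(std::uint64_t n_layer,
                                       std::uint64_t n_ctx,
                                       std::uint64_t n_embd,
                                       std::uint64_t bytes_per_elem) {
    // One K tensor and one V tensor, each n_layer * n_ctx * n_embd elements.
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem;
}

// LLaMA 7B shape; n_ctx = 512 is an assumed context length.
constexpr std::uint64_t kv_f32 = kv_cache_bytes(32, 512, 4096, 4);  // f16_kv = false
constexpr std::uint64_t kv_f16 = kv_cache_bytes(32, 512, 4096, 2);  // f16_kv = true

// Checked at compile time: 512 MiB with 32-bit floats, 256 MiB with 16-bit floats.
static_assert(kv_f32 == 512ull * 1024 * 1024, "f32 cache should be 512 MiB");
static_assert(kv_f16 == 256ull * 1024 * 1024, "f16 cache should be 256 MiB");
```

Under these assumptions the saving is about 256 MiB, which is significant on a device where a 7B model only barely fits.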

guinmoon Jun 15, 2023

Thanks for the tip. With these changes, I was able to run Vicuna 7B on an iPhone.

// Patched scratch-buffer sizes; e_model and MODEL_* come from llama.cpp itself.
static const size_t MB = 1024*1024;
static const size_t MB_small = 256*1024;  // 256 KiB, a quarter of MB

static const std::map<e_model, size_t> & MEM_REQ_SCRATCH0()
{
    static std::map<e_model, size_t> k_sizes = {
        { MODEL_3B,    128ull * MB },
        { MODEL_7B,    512ull * MB_small },  // = 128 MiB
        { MODEL_13B,   512ull * MB_small },  // = 128 MiB
        { MODEL_30B,   512ull * MB_small },  // = 128 MiB
        { MODEL_65B,  1024ull * MB_small },  // = 256 MiB
    };
    return k_sizes;
}

Green-Sky Jun 15, 2023
Collaborator

fyi, in master it now defaults to true :)

edmundronald · 2023-04-09T02:01:52Z

There seems to be a version of llama.cpp with mmap support out there now, which might already do what you need. Look through the issues on GitHub.

1 reply

ocrickard Apr 9, 2023
Author

Yep, those changes were helpful on iPad, but they don't help as much on iOS as they do on Mac, Linux, and Windows, which have more fully-featured memory paging systems. The scratch and non-dynamic memory allocations remain a problem that needs to be resolved.

jquave · 2023-05-03T20:50:48Z

Any success getting this to work @ocrickard and others? Interested in doing the same thing.

0 replies

guinmoon · 2023-06-14T16:32:40Z

Hello. I'm doing something similar. I am writing a SwiftUI application that lets you choose between different models and inference backends. LLaMA and GPT-NeoX are currently supported. OpenLLaMA 3B works well even on an iPhone. Unfortunately, 7B and larger models can only be launched on macOS and the iOS Simulator, because on a device they fail with a bad alloc error caused by insufficient RAM. I only have an iPhone and an Intel Hackintosh, so I have no way to test on other devices; it would be very nice if someone could help with this. In the future I plan to add more inference backends and models.

2 replies

richardr1126 Jun 18, 2023

Has anyone tried to run a 2-bit k-quantized model on an iPhone yet??

guinmoon Jun 29, 2023

I updated to the latest llama.cpp, and now I can run this q2_k model on my iPhone 12.

D4r3E-1v1l · 2024-03-17T22:55:30Z

Hi, I want to write an iPhone app that runs llama.cpp. Can I add you on Discord to ask how to integrate the model with SwiftUI? My Discord username is shiyu_liu. Thank you very much!

0 replies
