This web sample demonstrates how to use the LLM Inference API to run common text-to-text generation tasks, such as information retrieval, email drafting, and document summarization, in the browser.
- A browser with WebGPU support (e.g., Chrome on macOS or Windows).
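If you are unsure whether your browser supports WebGPU, you can run a quick check in the developer console. This is a minimal sketch that relies only on the standard `navigator.gpu` API:

```js
// Minimal WebGPU availability check (paste into the browser console).
if (!navigator.gpu) {
  console.error('WebGPU is not supported in this browser.');
} else {
  // requestAdapter() resolves to null if no suitable GPU is available.
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? 'WebGPU is available.' : 'No WebGPU adapter found.');
}
```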
Follow these instructions to run the sample on your device:
- Make a folder for the task, named `llm_task`, and copy the `index.html` and `index.js` files into your `llm_task` folder.
- Download a pre-converted Gemma model, such as Gemma 2 2B (LiteRT 2b-it-gpu-int8) or the smaller Gemma 3 1B, into the `llm_task` folder. Alternatively, you can convert an external LLM (Phi-2, Falcon, or StableLM) into the `llm_task` folder following the guide (only the GPU backend is currently supported).
- In your `index.js` file, update `modelFileName` with your model file's name (see the sketch after this list).
- Run `python3 -m http.server 8000` under the `llm_task` folder to host the three files (or `python -m SimpleHTTPServer 8000` for older Python versions).
- Open `localhost:8000` in Chrome. The button on the webpage will be enabled once the task is ready (~10 seconds).
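For orientation, the core of `index.js` looks roughly like the sketch below. This is a minimal sketch based on the `@mediapipe/tasks-genai` package and assumes the script is loaded as a module; the model file name and the `submit` element ID are illustrative placeholders, and the actual sample file may differ:

```js
import {FilesetResolver, LlmInference} from
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai';

// Update this to match the model file you placed in the llm_task folder
// (the file name below is a placeholder).
const modelFileName = 'gemma2-2b-it-gpu-int8.bin';

// Load the GenAI WASM fileset, then create the LLM Inference task from
// the locally hosted model file.
const genaiFileset = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');
const llmInference = await LlmInference.createFromOptions(genaiFileset, {
  baseOptions: {modelAssetPath: modelFileName},
});

// Creating the task is what takes ~10 seconds; enable the button once done.
document.getElementById('submit').disabled = false;
```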
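Once the button is enabled, a click handler can send the prompt to the task. Again a hedged sketch: `generateResponse` accepts an optional progress listener for streaming partial results, and the `input` and `output` element IDs here are assumptions:

```js
// Hypothetical element IDs; adjust to match the sample's index.html.
const input = document.getElementById('input');
const output = document.getElementById('output');

document.getElementById('submit').addEventListener('click', () => {
  output.textContent = '';
  // The second argument streams partial results as they are generated.
  llmInference.generateResponse(input.value, (partialResult, done) => {
    output.textContent += partialResult;
  });
});
```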