
"Captions With Attitude" in your browser, generated from your webcam by a Vision Language Model (VLM) in a Go program running entirely on your local machine using llama.cpp!


hybridgroup/captions-with-attitude


Captions With Attitude


"Captions With Attitude" is a Go application that uses a Vision Language Model (VLM) to show live captions of your webcam feed in your browser, all running entirely on your local machine!

It uses yzma to run local inference with llama.cpp, and GoCV for video processing.

Installation

yzma

You must install yzma and llama.cpp to run this program.

See https://github.com/hybridgroup/yzma/blob/main/INSTALL.md

GoCV

You must also install OpenCV and GoCV. Unlike yzma, GoCV requires CGo.

See https://gocv.io/getting-started/

Although yzma itself does not use CGo, it can coexist with CGo-based packages such as GoCV in the same Go application.

Models

You will need a Vision Language Model (VLM). Download both the model file and its projector file, in .gguf format, from Hugging Face.

For example, you can use the Qwen3-VL-2B-Instruct model.

https://huggingface.co/ggml-org/Qwen3-VL-2B-Instruct-GGUF

Building

go build .

Running

Flags

$ ./captions-with-attitude 

Usage:
captions-with-attitude
  -device string
        camera device ID (default "0")
  -host string
        web server host:port (default "localhost:8080")
  -model string
        model file to use
  -p string
        prompt
  -projector string
        projector file to use
  -v    verbose logging
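The usage text above appears to follow the format printed by Go's standard `flag` package. A minimal sketch of declaring equivalent flags is below; it is an assumption about how such a program could be written, not its actual source. `config` and `parseFlags` are hypothetical names, and the check that requires both `-model` and `-projector` is an illustrative addition.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// config mirrors the flags listed in the usage text.
type config struct {
	device    string
	host      string
	model     string
	prompt    string
	projector string
	verbose   bool
}

// parseFlags declares the flags on its own FlagSet so it can be exercised
// with arbitrary argument slices instead of os.Args.
func parseFlags(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("captions-with-attitude", flag.ContinueOnError)
	fs.StringVar(&c.device, "device", "0", "camera device ID")
	fs.StringVar(&c.host, "host", "localhost:8080", "web server host:port")
	fs.StringVar(&c.model, "model", "", "model file to use")
	fs.StringVar(&c.prompt, "p", "", "prompt")
	fs.StringVar(&c.projector, "projector", "", "projector file to use")
	fs.BoolVar(&c.verbose, "v", false, "verbose logging")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Assumption: a VLM needs both its model and projector files.
	if c.model == "" || c.projector == "" {
		return c, errors.New("both -model and -projector are required")
	}
	return c, nil
}

func main() {
	c, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("camera %s, serving on http://%s/\n", c.device, c.host)
}
```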

Example

./captions-with-attitude -model ~/models/Qwen3-VL-2B-Instruct-Q8_0.gguf -projector ~/models/mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf

Then point your web browser to http://localhost:8080/
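The README does not say how captions reach the browser. As one possible design, an assumption rather than the project's code, the server could expose the latest caption over HTTP for the page to poll. The `/caption` route, its JSON shape, and the names `latestCaption`, `captionJSON` and `captionHandler` are all hypothetical; the address matches the default of the `-host` flag.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync"
)

// latestCaption is a hypothetical shared value written by the inference loop
// and read by HTTP handlers.
type latestCaption struct {
	mu   sync.RWMutex
	text string
}

func (l *latestCaption) set(s string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.text = s
}

func (l *latestCaption) get() string {
	l.mu.RLock()
	defer l.mu.RUnlock()
	return l.text
}

// captionJSON encodes the current caption as {"caption":"..."}.
func captionJSON(l *latestCaption) ([]byte, error) {
	return json.Marshal(map[string]string{"caption": l.get()})
}

// captionHandler serves the current caption as JSON.
func captionHandler(l *latestCaption) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		body, err := captionJSON(l)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}
}

func main() {
	caption := &latestCaption{}
	caption.set("waiting for the first frame...")

	mux := http.NewServeMux()
	mux.Handle("/caption", captionHandler(caption))
	log.Fatal(http.ListenAndServe("localhost:8080", mux))
}
```

Polling a small JSON endpoint keeps the sketch simple; a push mechanism such as Server-Sent Events or WebSockets would avoid repeated requests at the cost of more code.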
