We built a guided install mission for llama.cpp inside KubeStellar Console, a standalone Kubernetes dashboard (unrelated to legacy kubestellar/kubestellar, kubeflex, or OCM — zero shared code).
→ Open the llama.cpp install mission
What the mission does
The mission deploys llama-server as a Kubernetes workload with a PVC for the GGUF model cache, and exposes it via a ClusterIP Service on the OpenAI-compatible /v1/chat/completions endpoint. Each step:
- Pre-flight — checks CPU vs CUDA image selection, verifies NVIDIA device plugin when GPU is requested
- Commands — deploys the ghcr.io/ggml-org/llama.cpp:server image with an initContainer that fetches a GGUF model from Hugging Face on first start (sketched below, after this list)
- Validation — waits for rollout and health probe, then curls /v1/chat/completions to confirm inference works end-to-end
- Troubleshooting — on failure, reads pod logs and suggests fixes (initContainer download failures, OOM, readiness probe timing, CPU vs GPU image mismatch)
- Rollback — complete uninstall path for the Deployment, Service, PVC, and namespace
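For a feel of what the Commands and Validation steps amount to, here is a minimal sketch. It is not the mission's exact manifest (that lives in install-llama-cpp.json); the namespace, PVC size, helper image, and model URL are placeholders you would swap for your own.

```bash
# Rough equivalent of what the mission applies; names and sizes are illustrative.
kubectl create namespace llama-cpp

kubectl apply -n llama-cpp -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama-model-cache
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-server
spec:
  replicas: 1
  selector:
    matchLabels: { app: llama-server }
  template:
    metadata:
      labels: { app: llama-server }
    spec:
      initContainers:
        - name: fetch-model        # downloads the GGUF once; the PVC keeps it cached
          image: curlimages/curl:8.8.0
          command: ["sh", "-c"]
          args:
            - >
              [ -f /models/model.gguf ] ||
              curl -fL -o /models/model.gguf
              https://huggingface.co/<org>/<repo>/resolve/main/<file>.gguf
          volumeMounts:
            - { name: model-cache, mountPath: /models }
      containers:
        - name: llama-server
          image: ghcr.io/ggml-org/llama.cpp:server   # CPU image; the mission can swap in the CUDA variant
          args: ["-m", "/models/model.gguf", "--host", "0.0.0.0", "--port", "8080"]
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet: { path: /health, port: 8080 }
            initialDelaySeconds: 10
          volumeMounts:
            - { name: model-cache, mountPath: /models }
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: llama-model-cache
---
apiVersion: v1
kind: Service
metadata:
  name: llama-server
spec:
  type: ClusterIP
  selector:
    app: llama-server
  ports:
    - port: 8080
      targetPort: 8080
EOF

# Validation along the lines of the mission's final check: wait for rollout,
# then hit the OpenAI-compatible endpoint through a port-forward.
kubectl rollout status -n llama-cpp deployment/llama-server
kubectl port-forward -n llama-cpp svc/llama-server 8080:8080 &
sleep 2
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Say hello"}]}'
```

The mission's Pre-flight step handles the CPU-vs-CUDA image choice and the NVIDIA device plugin check that this sketch glosses over.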
The mission also ties llama.cpp into the Console's agent selector: set LLAMACPP_URL to the in-cluster Service URL and llama.cpp appears as a chat-capable provider in the dropdown, so Console chat routes through your local llama-server.
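As a rough illustration of that hookup: LLAMACPP_URL is the variable the Console reads, while the Service DNS name, port, and the Console Deployment/namespace names below are assumptions based on the sketch above.

```bash
# Console running locally: export the in-cluster Service URL before starting it.
export LLAMACPP_URL=http://llama-server.llama-cpp.svc.cluster.local:8080

# Console deployed in-cluster: set the same variable on its Deployment
# (deployment and namespace names here are placeholders for your install).
kubectl set env deployment/console -n kubestellar-console \
  LLAMACPP_URL=http://llama-server.llama-cpp.svc.cluster.local:8080
```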
Why we're reaching out
The Console now ships with six local-LLM runner integrations (Ollama, llama.cpp, LocalAI, vLLM, LM Studio, Red Hat AI Inference Server) so operators in regulated or air-gapped environments can keep LLM traffic inside their cluster. llama.cpp is the "dependency-minimal" option — easiest to audit, lightest to run, and with the broadest hardware support.
Install
Local (connects to your current kubeconfig context):
curl -sSL https://raw.githubusercontent.com/kubestellar/console/main/start.sh | bash
Deploy into a cluster:
curl -sSL https://raw.githubusercontent.com/kubestellar/console/main/deploy.sh | bash
Mission definitions are open source — PRs welcome at install-llama-cpp.json. Feel free to close if not relevant.