Your AI companion and navigator in smart glasses: a multimodal LLM agent with vision, voice, and text capabilities.
The project follows a domain-driven architecture:
```
Smart_Glasses/
├── ui/                      # Streamlit frontend
│   ├── app.py               # Main Streamlit app
│   ├── components/          # UI components
│   └── utils/               # UI utilities
├── agent/                   # LLM agent logic
│   ├── llm.py               # LLM model handling & generation
│   ├── agent_loop.py        # Main agent reasoning loop
│   └── modes.py             # Quick/thinking mode logic
├── tools/                   # MCP tools
│   ├── vision/              # Computer vision tools (YOLO)
│   ├── search/              # Web search tools
│   ├── speech/              # Speech recognition & TTS
│   └── navigation/          # GPS/navigation tools
├── server/                  # MCP server
│   ├── server.py            # FastMCP server definition
│   └── gateway.py           # HTTP gateway for Streamlit
├── models/                  # Data models
│   └── requests.py          # Pydantic models for API
├── config/                  # Configuration
│   ├── settings.py          # App settings
│   └── model_config.py      # LLM model configuration
└── shared/                  # Shared utilities
    └── utils.py             # Common utilities
```
- Multimodal Input: Support for text, voice, and image inputs
- Two Agent Modes:
- Quick Mode: Fast single-pass responses
  - Thinking Mode: Deep reasoning that loops over its own history until the agent is satisfied
- MCP Tools:
  - VisionDetect: Real-time object detection using the camera and a YOLO model
  - search_web: Web search and context retrieval
- Streamlit UI: Interactive web interface with live camera and audio capture
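The two agent modes above might be dispatched roughly as follows. This is a minimal sketch, not the project's actual code: the function names (`run_agent`, `llm_generate`), the `FINAL:` stop marker, and the loop cap are all illustrative assumptions.

```python
def llm_generate(prompt: str) -> str:
    """Stub standing in for the real LLM call in agent/llm.py."""
    return "FINAL: " + prompt.splitlines()[-1][:40]

def run_agent(prompt: str, mode: str = "quick", max_loops: int = 8) -> str:
    """Sketch of quick vs. thinking mode dispatch (hypothetical API)."""
    if mode == "quick":
        # Quick mode: one LLM pass, no reasoning loop.
        return llm_generate(prompt)

    # Thinking mode: keep a running history and loop until the model
    # signals it is satisfied or the iteration cap is hit.
    history = [prompt]
    for _ in range(max_loops):
        reply = llm_generate("\n".join(history))
        if "FINAL:" in reply:  # assumed stop marker
            return reply.split("FINAL:", 1)[1].strip()
        history.append(reply)
    return history[-1]
```

The cap mirrors the `MAX_LOOPS` setting described in the Configuration section, so a runaway thinking loop always terminates.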
Important: You must install dependencies before running the application!
Option A: Using uv (Recommended)
```shell
uv sync
```

Option B: Using pip

```shell
pip install -r requirements.txt
```

Then activate the virtual environment.

Windows:

```shell
.venv\Scripts\activate
```

Linux/Mac:

```shell
source .venv/bin/activate
```

Open a terminal (with the venv activated) and run:
Windows (using batch file):
```shell
start_gateway.bat
```

Linux/Mac (using shell script):

```shell
chmod +x start_gateway.sh
./start_gateway.sh
```

Or manually:

```shell
python start_gateway.py
```

You should see:
```
🚀 Starting gateway server on localhost:8000
INFO: Uvicorn running on http://localhost:8000
```
The gateway will be available at http://localhost:8000
Open a NEW terminal, activate the virtual environment, and run:
Windows (using batch file):
```shell
start_streamlit.bat
```

Linux/Mac (using shell script):

```shell
chmod +x start_streamlit.sh
./start_streamlit.sh
```

Or manually:

```shell
# Activate venv first
streamlit run ui/app.py
```

The app will open in your browser at http://localhost:8501
"ModuleNotFoundError" or "No module named 'fastapi'"
- Solution: Install dependencies first with `uv sync` or `pip install -r requirements.txt`
"Connection refused" or "Gateway Offline"
- Solution: Make sure the gateway server is running (Step 2). Check that you see the startup message in the gateway terminal.
See QUICKSTART.md for more detailed troubleshooting.
- Text Input: Type your question in the text area
- Image Input: Click "Capture Frame" to capture the current camera frame
- Voice Input: Click "Capture Audio" to capture audio from your microphone
- Send Request: Click "Send Request" to process all inputs together
- Mode Selection: Choose between "quick" (fast) or "thinking" (deep reasoning) mode
The agent will:
- Transcribe audio if provided
- Combine text + transcribed audio + image into a unified prompt
- Process through the LLM agent with available MCP tools
- Return a response
Edit config/settings.py or set environment variables:
- `MODEL_ID`: LLM model identifier (default: `"google/gemma-3-4b-it"`)
- `DEVICE`: `"cuda"` or `"cpu"` (default: auto-detected)
- `API_HOST`: Gateway host (default: `"localhost"`)
- `API_PORT`: Gateway port (default: `8000`)
- `MAX_LOOPS`: Maximum agent loop iterations (default: `8`)
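Environment-variable overrides might be read like this. This is a sketch assuming plain `os.environ` lookups; the actual `config/settings.py` may use a different mechanism (e.g. pydantic-settings).

```python
import os

# Read each setting from the environment, falling back to the
# documented defaults. Variable names match the list above; the
# parsing style is an assumption about config/settings.py.
MODEL_ID = os.environ.get("MODEL_ID", "google/gemma-3-4b-it")
DEVICE = os.environ.get("DEVICE", "")  # empty -> auto-detect later
API_HOST = os.environ.get("API_HOST", "localhost")
API_PORT = int(os.environ.get("API_PORT", "8000"))
MAX_LOOPS = int(os.environ.get("MAX_LOOPS", "8"))
```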
The MCP server can be used independently with Claude Desktop or other MCP clients.
On Windows: %APPDATA%/Claude/claude_desktop_config.json
```json
{
  "mcpServers": {
    "smart-glasses": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\0_code\\New_ideas\\1_Coding_Now\\Smart_Glasses",
        "run",
        "fastmcp",
        "run",
        "server.server:mcp",
        "--transport",
        "stdio"
      ]
    }
  }
}
```

To inspect the MCP server:

```shell
npx @modelcontextprotocol/inspector uv --directory D:\0_code\New_ideas\1_Coding_Now\Smart_Glasses run fastmcp run server.server:mcp --transport stdio
```

Entry points:

- `start_gateway.py`: Start the HTTP gateway server
- `server/server.py`: MCP server entry point
- `ui/app.py`: Streamlit application
Test the end-to-end flow:
- Start the gateway
- Start the Streamlit app
- Try different input combinations (text, image, audio)
- Test both quick and thinking modes
- Verify MCP tool calls work (vision detection, web search)
See MIGRATION_NOTES.md for details about the refactoring from the old structure.
Key dependencies:
- `fastmcp`: MCP server framework
- `streamlit`: Web UI framework
- `streamlit-webrtc`: WebRTC for camera/audio
- `transformers`: LLM models
- `torch`: Deep learning framework
- `ultralytics`: YOLO object detection
- `whisper`: Speech recognition
- `edge-tts`: Text-to-speech
- `fastapi`: HTTP API framework
See pyproject.toml for the complete list.