Smart Glasses Project

An AI companion and navigation assistant for smart glasses: a multimodal LLM agent with vision, voice, and text capabilities.

Project Structure

The project follows a domain-driven architecture:

Smart_Glasses/
├── ui/                    # Streamlit frontend
│   ├── app.py            # Main Streamlit app
│   ├── components/       # UI components
│   └── utils/            # UI utilities
├── agent/                # LLM agent logic
│   ├── llm.py           # LLM model handling & generation
│   ├── agent_loop.py    # Main agent reasoning loop
│   └── modes.py         # Quick/thinking mode logic
├── tools/                # MCP tools
│   ├── vision/          # Computer vision tools (YOLO)
│   ├── search/          # Web search tools
│   ├── speech/          # Speech recognition & TTS
│   └── navigation/      # GPS/navigation tools
├── server/               # MCP server
│   ├── server.py        # FastMCP server definition
│   └── gateway.py       # HTTP gateway for Streamlit
├── models/               # Data models
│   └── requests.py      # Pydantic models for API
├── config/               # Configuration
│   ├── settings.py      # App settings
│   └── model_config.py  # LLM model configuration
└── shared/               # Shared utilities
    └── utils.py         # Common utilities

Features

Multimodal Input: Support for text, voice, and image inputs
Two Agent Modes:
- Quick Mode: Fast single-pass responses
- Thinking Mode: Deeper reasoning that loops over the conversation history until the agent is satisfied with its answer (bounded by MAX_LOOPS)
MCP Tools:
- VisionDetect: Real-time object detection using the camera and a YOLO model
- search_web: Web search and context retrieval
Streamlit UI: Interactive web interface with live camera and audio capture
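The difference between the two agent modes can be sketched as below. This is a minimal illustration under stated assumptions, not the code in agent/modes.py: run_agent, generate, and is_satisfied are hypothetical names, and the default cap of 8 iterations simply mirrors the MAX_LOOPS default listed under Configuration.

```python
from typing import Callable, List


def run_agent(
    prompt: str,
    generate: Callable[[List[str]], str],
    is_satisfied: Callable[[str], bool],
    mode: str = "quick",
    max_loops: int = 8,
) -> str:
    """Hypothetical sketch of quick vs. thinking mode (names are assumptions)."""
    history = [prompt]

    if mode == "quick":
        # Quick mode: a single pass, return whatever the model produces.
        return generate(history)

    if mode != "thinking":
        raise ValueError(f"unknown mode: {mode!r}")

    # Thinking mode: feed each draft back into the history and try again
    # until the stop check passes or the loop budget is used up.
    answer = ""
    for _ in range(max_loops):
        answer = generate(history)
        history.append(answer)
        if is_satisfied(answer):
            break
    return answer
```

Here generate stands in for the LLM call and is_satisfied for whatever stopping criterion the real agent uses; both are passed in so the loop can be exercised with stubs.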

Quickstart

1. Install Dependencies

Important: You must install dependencies before running the application!

Option A: Using uv (Recommended)

uv sync

Option B: Using pip

pip install -r requirements.txt

2. Activate Virtual Environment

Windows:

.venv\Scripts\activate

Linux/Mac:

source .venv/bin/activate

3. Start the HTTP Gateway

Open a terminal (with venv activated) and run:

Windows (using batch file):

start_gateway.bat

Linux/Mac (using shell script):

chmod +x start_gateway.sh
./start_gateway.sh

Or manually:

python start_gateway.py

You should see:

🚀 Starting gateway server on localhost:8000
INFO:     Uvicorn running on http://localhost:8000

⚠️ Keep this terminal open! The gateway must be running for the Streamlit app to work.

The gateway will be available at http://localhost:8000
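To confirm from a second terminal that something is listening on the gateway port, a plain TCP connection check is enough. The sketch below uses only the standard library; gateway_is_up is a hypothetical helper, and the host and port defaults are taken from the startup message above. It does not verify that the listener is actually the gateway, only that the port accepts connections.

```python
import socket


def gateway_is_up(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling gateway_is_up() with no arguments checks localhost:8000; pass a different host or port if you changed API_HOST or API_PORT.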

4. Start the Streamlit App

Open a NEW terminal, activate the virtual environment, and run:

Windows (using batch file):

start_streamlit.bat

Linux/Mac (using shell script):

chmod +x start_streamlit.sh
./start_streamlit.sh

Or manually:

# Activate venv first
streamlit run ui/app.py

The app will open in your browser at http://localhost:8501

Troubleshooting

"ModuleNotFoundError" or "No module named 'fastapi'"

Solution: Install dependencies first with uv sync or pip install -r requirements.txt
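If it is unclear which package is missing, a quick check of the import path can narrow it down. The following is a small sketch, not a script shipped with this README's instructions: missing_modules is a hypothetical helper, and the module names in REQUIRED_MODULES are assumed import names for a few of the packages listed under Dependencies.

```python
import importlib.util

# Assumed import names for some of the packages in the Dependencies list.
REQUIRED_MODULES = ["fastapi", "streamlit", "torch", "transformers", "ultralytics"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

Run it inside the activated virtual environment; a module reported as missing there usually means the install step was run in a different environment.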

"Connection refused" or "Gateway Offline"

Solution: Make sure the gateway server is running (Step 3). Check that you see the startup message in the gateway terminal.

See QUICKSTART.md for more detailed troubleshooting.

Usage

Text Input: Type your question in the text area
Image Input: Click "Capture Frame" to capture the current camera frame
Voice Input: Click "Capture Audio" to capture audio from your microphone
Send Request: Click "Send Request" to process all inputs together
Mode Selection: Choose between "quick" (fast) or "thinking" (deep reasoning) mode

The agent will:

Transcribe audio if provided
Combine text + transcribed audio + image into a unified prompt
Process through the LLM agent with available MCP tools
Return a response
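The first two steps above (transcribe, then combine) can be sketched as follows. This is an illustration under assumptions, not the project's actual code: AgentInput and build_prompt are hypothetical names, the transcriber is passed in as a plain callable so the example stays independent of any speech library, and the returned dictionary layout is invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentInput:
    """Hypothetical container for one request from the UI."""
    text: str = ""
    audio: Optional[bytes] = None
    image: Optional[bytes] = None


def build_prompt(inp: AgentInput, transcribe: Callable[[bytes], str]) -> dict:
    """Merge typed text and transcribed audio; pass the image through unchanged."""
    parts = []
    if inp.text.strip():
        parts.append(inp.text.strip())
    if inp.audio is not None:
        # Audio is transcribed only when it was actually provided.
        spoken = transcribe(inp.audio).strip()
        if spoken:
            parts.append(spoken)
    return {"text": "\n".join(parts), "image": inp.image}
```

Keeping the image separate from the text lets the downstream LLM call decide how to attach it, which depends on the model in use.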

Configuration

Edit config/settings.py or set environment variables:

MODEL_ID: LLM model identifier (default: "google/gemma-3-4b-it")
DEVICE: "cuda" or "cpu" (default: auto-detected)
API_HOST: Gateway host (default: "localhost")
API_PORT: Gateway port (default: 8000)
MAX_LOOPS: Maximum agent loop iterations (default: 8)
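One way such settings could be resolved from the environment is sketched below. It is not the contents of config/settings.py: load_settings is a hypothetical function, the defaults are copied from the list above, and the device auto-detection via torch.cuda.is_available() is an assumption about how "auto-detected" might be implemented.

```python
import os


def load_settings(env=None) -> dict:
    """Resolve settings from a mapping (defaults to os.environ) with the documented defaults."""
    env = os.environ if env is None else env

    device = env.get("DEVICE")
    if device is None:
        # Assumed auto-detection: prefer CUDA when torch is installed and sees a GPU.
        try:
            import torch
            device = "cuda" if torch.cuda.is_available() else "cpu"
        except ImportError:
            device = "cpu"

    return {
        "MODEL_ID": env.get("MODEL_ID", "google/gemma-3-4b-it"),
        "DEVICE": device,
        "API_HOST": env.get("API_HOST", "localhost"),
        # Environment variables are strings, so numeric settings are converted.
        "API_PORT": int(env.get("API_PORT", "8000")),
        "MAX_LOOPS": int(env.get("MAX_LOOPS", "8")),
    }
```

Passing a plain dictionary instead of relying on os.environ makes the function easy to test, for example load_settings({"API_PORT": "9000"}).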

MCP Server

The MCP server can be used independently with Claude Desktop or other MCP clients.

Claude Desktop Configuration

On Windows, add the following to %APPDATA%/Claude/claude_desktop_config.json, replacing the --directory path with the location of your own checkout:

{
	"mcpServers": {
		"smart-glasses": {
			"command": "uv",
			"args": [
				"--directory",
				"D:\\0_code\\New_ideas\\1_Coding_Now\\Smart_Glasses",
				"run",
				"fastmcp",
				"run",
				"server.server:mcp",
				"--transport",
				"stdio"
			]
		}
	}
}
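Because the directory in this configuration is machine-specific, it can be convenient to generate the JSON for your own checkout rather than edit the path by hand. The sketch below is a hypothetical helper (claude_desktop_config is not part of the project) that reproduces the same structure and arguments shown above; json.dumps takes care of escaping backslashes in Windows paths.

```python
import json
from pathlib import Path


def claude_desktop_config(project_dir: str) -> dict:
    """Build the mcpServers entry shown above for a given project directory."""
    return {
        "mcpServers": {
            "smart-glasses": {
                "command": "uv",
                "args": [
                    "--directory", project_dir,
                    "run", "fastmcp", "run", "server.server:mcp",
                    "--transport", "stdio",
                ],
            }
        }
    }


if __name__ == "__main__":
    # Assumes the script is run from the root of the project checkout.
    print(json.dumps(claude_desktop_config(str(Path.cwd())), indent=2))
```

The printed output can be pasted into claude_desktop_config.json, merging it with any servers already defined there.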

Debugging with MCP Inspector

npx @modelcontextprotocol/inspector uv --directory D:\0_code\New_ideas\1_Coding_Now\Smart_Glasses run fastmcp run server.server:mcp --transport stdio

Development

Project Scripts

start_gateway.py: Start the HTTP gateway server
server/server.py: MCP server entry point
ui/app.py: Streamlit application

Testing

Test the end-to-end flow:

Start the gateway
Start the Streamlit app
Try different input combinations (text, image, audio)
Test both quick and thinking modes
Verify MCP tool calls work (vision detection, web search)

Migration Notes

See MIGRATION_NOTES.md for details about the refactoring from the old structure.

Dependencies

Key dependencies:

fastmcp: MCP server framework
streamlit: Web UI framework
streamlit-webrtc: WebRTC for camera/audio
transformers: LLM models
torch: Deep learning framework
ultralytics: YOLO object detection
whisper: Speech recognition
edge-tts: Text-to-speech
fastapi: HTTP API framework

See pyproject.toml for the complete list.
