A Proof of Concept (PoC) demonstrating how to equip Large Language Models (LLMs) with persistent, long-term memory using Vector Databases (Pinecone) and RAG (Retrieval-Augmented Generation).
This project implements a memory system that allows an AI assistant to "remember" user preferences, facts, and past interactions across different sessions. It goes beyond simple context window buffering by actively storing, retrieving, and synthesizing memories.
Features:

- Semantic Search: Retrieves relevant past interactions based on the meaning of the user's current query.
- Core Memory Synthesis: Periodically distills raw conversation logs into high-level facts (e.g., "User is allergic to peanuts") using a background synthesis process.
- Hybrid Ranking: Uses a weighted scoring algorithm (Similarity + Recency + Frequency) to prioritize memories for synthesis.
- Real-time Streaming: Chat interface with streamed responses and dynamic scrolling.
Tech stack:

- Framework: Next.js 15 (App Router)
- Language: TypeScript
- Database: Pinecone (Vector DB)
- AI/LLM: OpenAI (GPT-4o-mini for synthesis, text-embedding-3-small for embeddings)
- Styling: Tailwind CSS + Shadcn UI
- State Management: Zustand
Prerequisites:

- Node.js 18+
- pnpm (recommended) or npm
- OpenAI API Key
- Pinecone API Key & Index
Installation:

1. Clone the repository:

   ```bash
   git clone <repo-url>
   cd memories
   ```

2. Install dependencies:

   ```bash
   pnpm install
   ```

3. Configure the environment. Copy the example environment file and fill in your credentials:

   ```bash
   cp .env.example .env
   ```

   Update `.env` with:

   ```
   OPENAI_API_KEY=sk-...
   PINECONE_API_KEY=pc-...
   PINECONE_INDEX_NAME=your-index-name
   ```

4. Run the development server:

   ```bash
   pnpm dev
   ```

   Open http://localhost:3000 in your browser.
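The app expects the Pinecone index named in `PINECONE_INDEX_NAME` to exist already. If you still need to create it, here is a minimal sketch using the official `@pinecone-database/pinecone` SDK; the serverless cloud and region are example values. The dimension matches text-embedding-3-small's default 1536-dimensional output, and the metric matches the cosine similarity the retrieval step relies on.

```typescript
import { Pinecone } from "@pinecone-database/pinecone";

const pinecone = new Pinecone(); // reads PINECONE_API_KEY from the environment

// text-embedding-3-small produces 1536-dimensional vectors by default,
// and retrieval uses cosine similarity, so the index must match both.
await pinecone.createIndex({
  name: "your-index-name",
  dimension: 1536,
  metric: "cosine",
  spec: {
    serverless: { cloud: "aws", region: "us-east-1" }, // example values
  },
});
```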
The memory architecture consists of two main loops: the Interaction Loop and the Synthesis Loop.
The Interaction Loop runs in real time as the user chats:
- User Input: User sends a message (e.g., "What was that pizza recipe I liked?").
- Retrieval: The system embeds the query and searches Pinecone for semantically similar past messages (`/api/retrieve`); see the first sketch after this list.
  - Mechanism: Pure vector similarity (cosine).
  - Threshold: Only matches with a score > 0.35 are kept.
- Augmentation: Retrieved memories are injected into the LLM's system prompt as context.
- Generation: The LLM responds, aware of the past context.
- Embedding: The user's new message is chunked, embedded, and stored in Pinecone asynchronously (`/api/embed`); see the second sketch after this list.
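A minimal sketch of the retrieval step, assuming the official `openai` and `@pinecone-database/pinecone` SDKs. The function name, the `topK` value, and the metadata shape are illustrative; only the embedding model, the cosine metric, and the 0.35 threshold come from the description above.

```typescript
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const pinecone = new Pinecone(); // reads PINECONE_API_KEY from the environment
const index = pinecone.index(process.env.PINECONE_INDEX_NAME!);

const SCORE_THRESHOLD = 0.35; // matches below this are treated as noise

// Embed the user's query and return past messages that clear the threshold.
export async function retrieveMemories(query: string): Promise<string[]> {
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });

  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 10, // illustrative value
    includeMetadata: true,
  });

  return results.matches
    .filter((m) => (m.score ?? 0) > SCORE_THRESHOLD)
    .map((m) => String(m.metadata?.text ?? ""));
}
```

The returned strings can then be joined into the system prompt for the augmentation step.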
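The write path (`/api/embed`) can be sketched similarly, reusing the `openai` client and `index` handle from above. The fixed-size chunking and the metadata fields are assumptions for illustration; the actual chunker may split on sentence boundaries.

```typescript
import { randomUUID } from "node:crypto";

// Naive fixed-size chunking; the real implementation may be smarter.
function chunkText(text: string, size = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

// Embed each chunk and upsert it with its text and a timestamp as metadata,
// so later retrieval and recency scoring have what they need.
export async function storeMessage(message: string): Promise<void> {
  const chunks = chunkText(message);
  const embeddings = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });

  await index.upsert(
    embeddings.data.map((e, i) => ({
      id: randomUUID(),
      values: e.embedding,
      metadata: { text: chunks[i], createdAt: Date.now() },
    }))
  );
}
```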
The Synthesis Loop runs periodically in the background to distill "Core Memories": high-level facts about the user.
- Broad Retrieval: Fetches a large set of candidate memories (`getBroadCandidateMemories`).
- Re-Ranking: Candidates are re-scored using a custom weighted algorithm (see the first sketch after this list):
- Similarity (40%): Relevance to "User preferences and facts".
- Recency (40%): Newer memories are weighted higher to capture current state.
- Frequency (20%): Repeated information is prioritized.
- LLM Synthesis: The top candidates are sent to an LLM (GPT-4o-mini; see the second sketch after this list) with instructions to:
- Deduplicate information.
- Resolve conflicts (favoring recent data).
- Extract distinct facts (e.g., "User prefers dark mode").
- Storage: The synthesized list is stored and displayed in the UI as the user's "Core Identity".
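A sketch of the hybrid re-ranking score. The 40/40/20 weights come from the list above; the half-life constant, the field names, and the frequency normalization are assumptions for illustration.

```typescript
interface CandidateMemory {
  similarity: number; // cosine score from Pinecone, roughly 0..1
  createdAt: number;  // unix timestamp in ms
  hitCount: number;   // how often this memory has been retrieved
}

const RECENCY_HALF_LIFE_MS = 7 * 24 * 60 * 60 * 1000; // assumption: one week

// Weighted score: 40% similarity, 40% recency, 20% frequency.
function hybridScore(m: CandidateMemory, now = Date.now()): number {
  // Exponential decay: a week-old memory scores half of a fresh one.
  const recency = Math.pow(0.5, (now - m.createdAt) / RECENCY_HALF_LIFE_MS);
  // Squash raw hit counts into 0..1 so one hot memory can't dominate.
  const frequency = 1 - 1 / (1 + m.hitCount);
  return 0.4 * m.similarity + 0.4 * recency + 0.2 * frequency;
}

// Re-rank candidates and keep the top N for synthesis.
function rerank(candidates: CandidateMemory[], topN = 20): CandidateMemory[] {
  return [...candidates]
    .sort((a, b) => hybridScore(b) - hybridScore(a))
    .slice(0, topN);
}
```

Exponential decay keeps the recency term bounded in [0, 1], so all three components stay on a comparable scale before weighting.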
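And a sketch of the synthesis call itself, assuming the `openai` SDK. The prompt wording and the one-fact-per-line output format are assumptions; the deduplication, conflict-resolution, and fact-extraction instructions mirror the list above.

```typescript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Distill top-ranked raw memories into a deduplicated list of core facts.
async function synthesizeCoreMemories(memories: string[]): Promise<string[]> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content:
          "You distill raw conversation memories into core facts about the user. " +
          "Deduplicate overlapping information, resolve conflicts in favor of the " +
          "most recent data, and output one distinct fact per line.",
      },
      { role: "user", content: memories.join("\n") },
    ],
  });

  const text = response.choices[0].message.content ?? "";
  return text.split("\n").filter((line) => line.trim().length > 0);
}
```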
Based on current State-of-the-Art (SOTA) research in LLM memory systems (MemGPT, Generative Agents, GraphRAG), here are the identified gaps and planned improvements:
Tiered memory hierarchy (inspired by MemGPT):

- Current: Flat vector storage + simple list of core memories.
- Future: Implement a tiered architecture:
- Working Context: RAM-like immediate context.
- Recall Storage: Episodic memory (current implementation).
- Archival Storage: Deep storage for cold data.
- Goal: Allow the LLM to explicitly "page in" and "page out" memories rather than relying solely on implicit retrieval.
Knowledge graph memory (inspired by GraphRAG):

- Current: Unstructured text chunks.
- Future: Extract entities (People, Places, Concepts) and relationships into a Knowledge Graph.
- Goal: Enable multi-hop reasoning (e.g., "How is my project X related to the meeting I had last week?") which vector search struggles with.
Reflection (inspired by Generative Agents):

- Current: Simple summarization of facts.
- Future: Implement a "Reflection" step where the agent periodically pauses to analyze its own behavior and form higher-level goals or personality adjustments.
- Goal: Create a more agentic feel where the AI evolves its personality based on interactions.
Forgetting and memory editing:

- Current: Passive storage (everything is saved).
- Future: Allow the user (or the agent) to explicitly forget or modify memories.
- Goal: Better privacy and accuracy control.