Vera-AI - Persistent Memory Proxy for Ollama
Vera (Latin): True — True AI Memory
What is Vera-AI?
Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.
Every conversation is remembered.
How It Works
┌─────────────────────────────────────────────────────────────────────────┐
│                              REQUEST FLOW                               │
└─────────────────────────────────────────────────────────────────────────┘

┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
│  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
│  (You)   │         │  Proxy   │         │   LLM    │         │ to User  │
└──────────┘         └────┬─────┘         └──────────┘         └──────────┘
                          │
                          │ (2) Query semantic memory
                          │
                          ▼
                     ┌──────────┐
                     │  Qdrant  │
                     │ Vector DB│
                     └────┬─────┘
                          │
                          │ (4) Store conversation turn
                          │
                          ▼
                     ┌──────────┐
                     │  Memory  │
                     │ Storage  │
                     └──────────┘
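The numbered steps in the diagram can be sketched in Python as below. This is a hypothetical illustration, not Vera-AI's source: ToyMemory, call_ollama, and handle_chat are made-up names, Qdrant is replaced by an in-memory list with keyword matching instead of vector similarity, and the Ollama call is a stub, so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical in-memory stand-in for the Qdrant collection."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def search_memory(self, query: str) -> list[str]:
        # Step 2 (toy version): keyword overlap instead of embedding similarity.
        words = set(query.lower().split())
        return [
            f"Q: {q}\nA: {a}"
            for q, a in self.turns
            if words & set(q.lower().split())
        ]

    def store_turn(self, question: str, answer: str) -> None:
        # Step 4: persist the finished turn so later sessions can recall it.
        self.turns.append((question, answer))


def call_ollama(messages: list[dict]) -> str:
    # Step 3 (stub): a real proxy would POST these messages to Ollama's /api/chat.
    return f"(answer using {len(messages)} message(s))"


def handle_chat(memory: ToyMemory, user_text: str) -> str:
    # Step 1: the client's message arrives at the proxy.
    recalled = memory.search_memory(user_text)
    # Recalled memories are prepended as system context ahead of the user's message.
    messages = [{"role": "system", "content": m} for m in recalled]
    messages.append({"role": "user", "content": user_text})
    answer = call_ollama(messages)          # step 3
    memory.store_turn(user_text, answer)    # step 4
    return answer                           # step 5: reply goes back to the client
```

Calling handle_chat(mem, "my dog is named Rex") and then handle_chat(mem, "what is my dog called?") on the same ToyMemory shows the idea: the second request carries the first turn as extra context.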
Quick Start
Option 1: Docker Run (Single Command)
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=1000 \
  -e APP_GID=1000 \
  -e TZ=America/Chicago \
  -e VERA_DEBUG=false \
  -v "$(pwd)/config/config.toml:/app/config/config.toml:ro" \
  -v "$(pwd)/prompts:/app/prompts:rw" \
  -v "$(pwd)/logs:/app/logs:rw" \
  mdkrushr/vera-ai:latest
Option 2: Docker Compose
Create docker-compose.yml:
services:
  vera-ai:
    image: mdkrushr/vera-ai:latest
    container_name: VeraAI
    restart: unless-stopped
    network_mode: host
    environment:
      - APP_UID=1000
      - APP_GID=1000
      - TZ=America/Chicago
      - VERA_DEBUG=false
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
Run with:
docker compose up -d
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| APP_UID | 999 | Container user ID (match your host UID) |
| APP_GID | 999 | Container group ID (match your host GID) |
| TZ | UTC | Container timezone |
| VERA_DEBUG | false | Enable debug logging |
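The defaults in the table apply only when a variable is unset. The sketch below shows how a startup script might resolve them; read_settings is a hypothetical name, and accepting 1, yes, or on as true for VERA_DEBUG is an assumption (the image may recognize only true and false).

```python
import os
from typing import Mapping


def read_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical resolver applying the documented defaults."""
    return {
        "uid": int(env.get("APP_UID", "999")),
        "gid": int(env.get("APP_GID", "999")),
        "tz": env.get("TZ", "UTC"),
        # Assumption: several common truthy spellings enable debug logging.
        "debug": env.get("VERA_DEBUG", "false").strip().lower() in {"1", "true", "yes", "on"},
    }
```

With no variables set, read_settings({}) yields UID/GID 999, TZ UTC, and debug off, which is why the examples pass APP_UID=1000 explicitly.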
config.toml
Create config/config.toml:
[general]
# Ollama server URL
ollama_host = "http://10.0.0.10:11434"
# Qdrant vector database URL
qdrant_host = "http://10.0.0.22:6333"
# Collection name for memories
qdrant_collection = "memories"
# Embedding model for semantic search
embedding_model = "snowflake-arctic-embed2"
# Enable debug logging (set to true for verbose logs)
debug = false
[layers]
# Token budget for semantic memory layer
semantic_token_budget = 25000
# Token budget for recent context layer
context_token_budget = 22000
# Number of recent turns to include in semantic search
semantic_search_turns = 2
# Minimum similarity score for semantic search (0.0-1.0)
semantic_score_threshold = 0.6
[curator]
# Time for daily curation (HH:MM format)
run_time = "02:00"
# Time for monthly full curation (HH:MM format)
# Day of month for full curation (1-28)
# Model to use for curation
curator_model = "gpt-oss:120b"
prompts/ Directory
Create a prompts/ directory containing two files:
prompts/curator_prompt.md - Prompt for memory curation:
You are a memory curator. Your job is to summarize conversation turns
into concise Q&A pairs that will be stored for future reference.
Extract the key information and create clear, searchable entries.
prompts/systemprompt.md - System context for Vera:
You are Vera, an AI with persistent memory. You remember all previous
conversations with this user and can reference them contextually.
Docker Options Explained
| Option | Description |
|---|---|
| -d | Run detached (in the background) |
| --name VeraAI | Container name |
| --restart unless-stopped | Restart automatically, including after a reboot (if the Docker daemon starts at boot), unless explicitly stopped |
| --network host | Share the host network stack; Vera listens on port 11434 |
| -e APP_UID=1000 | User ID (match your host UID) |
| -e APP_GID=1000 | Group ID (match your host GID) |
| -e TZ=America/Chicago | Timezone used by the curation scheduler |
| -e VERA_DEBUG=false | Disable debug logging |
| -v ...config.toml:ro | Config file (read-only) |
| -v ...prompts:rw | Prompts directory (read-write) |
| -v ...logs:rw | Logs directory (read-write) |
📋 Prerequisites
Required Services
| Service | Version | Description |
|---|---|---|
| Ollama | 0.1.x+ | LLM inference server |
| Qdrant | 1.6.x+ | Vector database |
| Docker | 20.x+ | Container runtime |
System Requirements
| Requirement | Minimum | Recommended |
|---|---|---|
| CPU | 2 cores | 4+ cores |
| RAM | 2 GB | 4+ GB |
| Disk | 1 GB | 5+ GB |
🔧 Installing with Ollama
Option A: All on Same Host (Recommended)
Install all services on a single machine:
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull required models
ollama pull snowflake-arctic-embed2 # Embedding model (required)
ollama pull llama3.1 # Chat model
# 3. Run Qdrant in Docker
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
# 4. Run Vera-AI
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=$(id -u) \
  -e APP_GID=$(id -g) \
  -e TZ=America/Chicago \
  -v "$(pwd)/config/config.toml:/app/config/config.toml:ro" \
  -v "$(pwd)/prompts:/app/prompts:rw" \
  -v "$(pwd)/logs:/app/logs:rw" \
  mdkrushr/vera-ai:latest
Config for a same-host setup (config/config.toml):
[general]
ollama_host = "http://127.0.0.1:11434"
qdrant_host = "http://127.0.0.1:6333"
qdrant_collection = "memories"
embedding_model = "snowflake-arctic-embed2"
Option B: Docker Compose All-in-One
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: [ollama_data:/root/.ollama]
  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
    volumes: [qdrant_data:/qdrant/storage]
  vera-ai:
    image: mdkrushr/vera-ai:latest
    network_mode: host
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
volumes:
  ollama_data:
  qdrant_data:
Option C: Different Port
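If Ollama already listens on port 11434 on the same host, publish Vera on port 8080 instead. Drop --network host, because Docker ignores -p in host network mode, and set ollama_host and qdrant_host in config.toml to the host's LAN IP rather than 127.0.0.1, which would refer to the container itself: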
docker run -d --name VeraAI -p 8080:11434 ...
# Connect client to: http://localhost:8080
✅ Pre-Flight Checklist
- Docker installed (docker --version)
- Ollama running (curl http://localhost:11434/api/tags)
- Qdrant running (curl http://localhost:6333/collections)
- Embedding model pulled (ollama pull snowflake-arctic-embed2)
- Chat model pulled (ollama pull llama3.1)
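The HTTP checks in this list can be scripted with the Python standard library. This sketch assumes the default localhost ports and the two model names used in this guide; preflight and has_model are hypothetical helper names. It reads Ollama's /api/tags response, where each entry in "models" has a "name" such as "llama3.1:latest", so a bare model name matches any tag. The Docker version check stays manual.

```python
import json
import urllib.request

OLLAMA = "http://localhost:11434"  # assumption: default Ollama address
QDRANT = "http://localhost:6333"   # assumption: default Qdrant address


def has_model(tags: dict, name: str) -> bool:
    """True if /api/tags lists `name`, with or without a tag suffix like ':latest'."""
    for model in tags.get("models", []):
        listed = model.get("name", "")
        if listed == name or listed.split(":", 1)[0] == name:
            return True
    return False


def get_json(url: str, timeout: float = 5.0) -> dict:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def preflight() -> bool:
    """Print one line per check and return True only if everything passed."""
    ok = True
    try:
        tags = get_json(f"{OLLAMA}/api/tags")
        for model in ("snowflake-arctic-embed2", "llama3.1"):
            found = has_model(tags, model)
            print(f"{'OK     ' if found else 'MISSING'} model {model}")
            ok = ok and found
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        print(f"FAIL    Ollama unreachable: {exc}")
        ok = False
    try:
        get_json(f"{QDRANT}/collections")
        print("OK      Qdrant reachable")
    except OSError as exc:
        print(f"FAIL    Qdrant unreachable: {exc}")
        ok = False
    return ok
```

Call preflight() from a Python shell; a False result points at the line marked FAIL or MISSING.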
Features
| Feature | Description |
|---|---|
| 🧠 Persistent Memory | Conversations stored in Qdrant, retrieved contextually |
| 📅 Monthly Curation | Daily curation, plus an automatic full curation on the 1st of each month |
| 🔍 4-Layer Context | System + semantic + recent + current messages |
| 👤 Configurable UID/GID | Match container user to host for permissions |
| 🌍 Timezone Support | Scheduler runs in your local timezone |
| 📝 Debug Logging | Optional logs written to configurable directory |
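The four context layers and the [layers] settings interact roughly as sketched below. This is an illustration, not Vera-AI's implementation: it estimates tokens as characters divided by 4 instead of using a real tokenizer, injects recalled memories as extra system messages, and fills the recent-context budget from the newest turn backwards. Hit, build_context, and the other names are hypothetical; the defaults mirror the example config.toml values.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    score: float  # similarity score from the vector search


def approx_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); a real proxy would use a tokenizer.
    return max(1, len(text) // 4)


def take_within_budget(messages: list[dict], budget: int) -> list[dict]:
    """Keep messages in order until the next one would exceed the token budget."""
    kept, used = [], 0
    for msg in messages:
        cost = approx_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return kept


def build_context(system: str, hits: list[Hit], recent: list[dict], current: dict,
                  semantic_budget: int = 25000, context_budget: int = 22000,
                  threshold: float = 0.6) -> list[dict]:
    # Layer 2: memories at or above the score threshold, best match first.
    relevant = sorted((h for h in hits if h.score >= threshold),
                      key=lambda h: h.score, reverse=True)
    memories = take_within_budget(
        [{"role": "system", "content": h.text} for h in relevant], semantic_budget)
    # Layer 3: newest turns first while filling the budget, then back to chronological order.
    recent_kept = take_within_budget(list(reversed(recent)), context_budget)[::-1]
    # Layer 1 (system prompt) + 2 + 3 + layer 4 (the current message).
    return [{"role": "system", "content": system}, *memories, *recent_kept, current]
```

Packing newest-first means that when the budget is tight, the oldest recent turns are the ones dropped, while the final list still reads in conversation order.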
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Health check |
| /api/chat | POST | Chat completion (with memory) |
| /api/tags | GET | List models |
| /curator/run | POST | Trigger curator manually |
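Because Vera answers on Ollama's port with Ollama's paths, an ordinary Ollama-style client can talk to it. A minimal standard-library sketch of a non-streaming /api/chat call; it assumes Vera passes Ollama's response format through unchanged (reply text under message.content), and build_chat_request and chat are hypothetical helper names.

```python
import json
import urllib.request

VERA_URL = "http://localhost:11434"  # Vera listens where clients expect Ollama


def build_chat_request(model: str, prompt: str,
                       base_url: str = VERA_URL) -> urllib.request.Request:
    """Build a non-streaming POST /api/chat request in Ollama's request format."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, base_url: str = VERA_URL,
         timeout: float = 300.0) -> str:
    """Send one prompt and return the assistant's reply text."""
    req = build_chat_request(model, prompt, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    # Assumption: Ollama's non-streaming response shape, reply under message.content.
    return reply["message"]["content"]
```

Usage: print(chat("llama3.1", "What did we talk about yesterday?")). Pointing base_url at Ollama directly instead of Vera is a quick way to compare answers with and without memory.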
Verify Installation
# Health check
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}
# Check container
docker ps
# Expected: VeraAI running with (healthy) status
# Test chat
curl -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"your-model","messages":[{"role":"user","content":"hello"}],"stream":false}'
Troubleshooting
Permission Denied
# Get your UID/GID
id
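# Pass them to the container as docker run flags
# (in docker-compose.yml, put the numeric values under environment:)
-e APP_UID=$(id -u)
-e APP_GID=$(id -g)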
Wrong Timezone
# Pass an IANA timezone name to the container
-e TZ=America/Chicago
Source Code
License
MIT License
Brought to you by SpeedyFoxAi