# Vera-AI - Persistent Memory Proxy for Ollama

**Vera** (Latin): *True* — **True AI Memory**

---

## What is Vera-AI?

Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.

**Every conversation is remembered.**

---

## How It Works

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                  REQUEST FLOW                                   │
└─────────────────────────────────────────────────────────────────────────────────┘

┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
│  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
│  (You)   │         │  Proxy   │         │   LLM    │         │ to User  │
└──────────┘         └────┬─────┘         └──────────┘         └──────────┘
                          │
                          │ (2) Query semantic memory
                          ▼
                     ┌──────────┐
                     │  Qdrant  │
                     │ Vector DB│
                     └──────────┘
                          │
                          │ (4) Store conversation turn
                          ▼
                     ┌──────────┐
                     │  Memory  │
                     │ Storage  │
                     └──────────┘
```

---

## Quick Start

### Option 1: Docker Run (Single Command)

```bash
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=1000 \
  -e APP_GID=1000 \
  -e TZ=America/Chicago \
  -e VERA_DEBUG=false \
  -v ./config/config.toml:/app/config/config.toml:ro \
  -v ./prompts:/app/prompts:rw \
  -v ./logs:/app/logs:rw \
  your-username/vera-ai:latest
```

### Option 2: Docker Compose

Create `docker-compose.yml`:

```yaml
services:
  vera-ai:
    image: your-username/vera-ai:latest
    container_name: VeraAI
    restart: unless-stopped
    network_mode: host
    environment:
      - APP_UID=1000
      - APP_GID=1000
      - TZ=America/Chicago
      - VERA_DEBUG=false
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

Run with:

```bash
docker compose up -d
```

---
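The five-step request flow in the diagram above can be sketched in a few lines of Python. This is an illustrative outline only — the function and field names below (`augment_messages`, `handle_chat`, the callback parameters) are hypothetical and are not Vera-AI's actual internals:

```python
# Sketch of the proxy's request flow. All names here are illustrative,
# not Vera-AI's real API.

def augment_messages(messages, memories):
    """Prepend retrieved memories as a system message (steps 2 -> 3)."""
    if not memories:
        return list(messages)
    context = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    return [{"role": "system", "content": context}] + list(messages)

def handle_chat(messages, search_memory, call_ollama, store_turn):
    """(1) receive -> (2) query memory -> (3) forward -> (4) store -> (5) reply."""
    query = messages[-1]["content"]           # latest user turn
    memories = search_memory(query)           # (2) semantic lookup (Qdrant)
    reply = call_ollama(augment_messages(messages, memories))  # (3) forward
    store_turn(query, reply)                  # (4) persist this turn
    return reply                              # (5) response to user
```

Because the augmentation happens inside the proxy, the client sends a plain Ollama chat request and never sees steps 2 and 4.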
## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `APP_UID` | `999` | Container user ID (match your host UID) |
| `APP_GID` | `999` | Container group ID (match your host GID) |
| `TZ` | `UTC` | Container timezone |
| `VERA_DEBUG` | `false` | Enable debug logging |

### config.toml

Create `config/config.toml`:

```toml
[general]
# Ollama server URL
ollama_host = "http://10.0.0.10:11434"
# Qdrant vector database URL
qdrant_host = "http://10.0.0.22:6333"
# Collection name for memories
qdrant_collection = "memories"
# Embedding model for semantic search
embedding_model = "snowflake-arctic-embed2"
# Enable debug logging (set to true for verbose logs)
debug = false

[layers]
# Token budget for semantic memory layer
semantic_token_budget = 25000
# Token budget for recent context layer
context_token_budget = 22000
# Number of recent turns to include in semantic search
semantic_search_turns = 2
# Minimum similarity score for semantic search (0.0-1.0)
semantic_score_threshold = 0.6

[curator]
# Time for daily curation (HH:MM format)
run_time = "02:00"
# Time for monthly full curation (HH:MM format)
full_run_time = "03:00"
# Day of month for full curation (1-28)
full_run_day = 1
# Model to use for curation
curator_model = "gpt-oss:120b"
```

### prompts/ Directory

Create a `prompts/` directory with:

**`prompts/curator_prompt.md`** - Prompt for memory curation:

```markdown
You are a memory curator. Your job is to summarize conversation turns
into concise Q&A pairs that will be stored for future reference.
Extract the key information and create clear, searchable entries.
```

**`prompts/systemprompt.md`** - System context for Vera:

```markdown
You are Vera, an AI with persistent memory. You remember all previous
conversations with this user and can reference them contextually.
```
---

## Docker Options Explained

| Option | Description |
|--------|-------------|
| `-d` | Run detached (background) |
| `--name VeraAI` | Container name |
| `--restart unless-stopped` | Auto-start on boot, survive reboots |
| `--network host` | Use host network (port 11434) |
| `-e APP_UID=1000` | User ID (match your host UID) |
| `-e APP_GID=1000` | Group ID (match your host GID) |
| `-e TZ=America/Chicago` | Timezone for scheduler |
| `-e VERA_DEBUG=false` | Disable debug logging |
| `-v ...config.toml:ro` | Config file (read-only) |
| `-v ...prompts:rw` | Prompts directory (read-write) |
| `-v ...logs:rw` | Logs directory (read-write) |

---

## 📋 Prerequisites

### Required Services

| Service | Version | Description |
|---------|---------|-------------|
| **Ollama** | 0.1.x+ | LLM inference server |
| **Qdrant** | 1.6.x+ | Vector database |
| **Docker** | 20.x+ | Container runtime |

### System Requirements

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| **CPU** | 2 cores | 4+ cores |
| **RAM** | 2 GB | 4+ GB |
| **Disk** | 1 GB | 5+ GB |

---

## 🔧 Installing with Ollama

### Option A: All on Same Host (Recommended)

Install all services on a single machine:

```bash
# 1. Install Ollama
curl https://ollama.ai/install.sh | sh

# 2. Pull required models
ollama pull snowflake-arctic-embed2   # Embedding model (required)
ollama pull llama3.1                  # Chat model

# 3. Run Qdrant in Docker
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# 4. Run Vera-AI
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=$(id -u) \
  -e APP_GID=$(id -g) \
  -e TZ=America/Chicago \
  -v ./config/config.toml:/app/config/config.toml:ro \
  -v ./prompts:/app/prompts:rw \
  -v ./logs:/app/logs:rw \
  your-username/vera-ai:latest
```

**Config for same-host (config/config.toml):**

```toml
[general]
ollama_host = "http://127.0.0.1:11434"
qdrant_host = "http://127.0.0.1:6333"
qdrant_collection = "memories"
embedding_model = "snowflake-arctic-embed2"
```

### Option B: Docker Compose All-in-One

```yaml
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: [ollama_data:/root/.ollama]
  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
    volumes: [qdrant_data:/qdrant/storage]
  vera-ai:
    image: your-username/vera-ai:latest
    network_mode: host
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw

volumes:
  ollama_data:
  qdrant_data:
```

> **Note:** with `network_mode: host`, Vera-AI binds host port 11434, which collides with the `ollama` service's published `11434:11434`. Publish Ollama on a different host port (e.g. `11435:11434`) and point `ollama_host` at it in `config.toml`.

### Option C: Different Port

If Ollama uses port 11434, run Vera on port 8080:

```bash
docker run -d --name VeraAI -p 8080:11434 ...

# Connect client to: http://localhost:8080
```
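Because Vera-AI exposes the same chat endpoint shape as Ollama, an existing client only needs its base URL changed to whichever port you chose above. The sketch below builds an Ollama-style `/api/chat` request with only the standard library; the helper name `build_chat_request` and the `http://localhost:8080` base URL are illustrative assumptions:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build a non-streaming Ollama-style /api/chat request for the proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With Vera-AI running (Option C port shown here), send it like any
# Ollama request:
# req = build_chat_request("http://localhost:8080", "llama3.1", "hello")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["message"]["content"])
```

Swapping `base_url` back to `http://localhost:11434` bypasses nothing — the proxy is transparent either way; only the port differs between Options A and C.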
---

## ✅ Pre-Flight Checklist

- [ ] Docker installed (`docker --version`)
- [ ] Ollama running (`curl http://localhost:11434/api/tags`)
- [ ] Qdrant running (`curl http://localhost:6333/collections`)
- [ ] Embedding model (`ollama pull snowflake-arctic-embed2`)
- [ ] Chat model (`ollama pull llama3.1`)

---

## Features

| Feature | Description |
|---------|-------------|
| 🧠 **Persistent Memory** | Conversations stored in Qdrant, retrieved contextually |
| 📅 **Scheduled Curation** | Daily + monthly cleanup of raw memories |
| 🔍 **4-Layer Context** | System + semantic + recent + current messages |
| 👤 **Configurable UID/GID** | Match container user to host for permissions |
| 🌍 **Timezone Support** | Scheduler runs in your local timezone |
| 📝 **Debug Logging** | Optional logs written to configurable directory |

---

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | `GET` | Health check |
| `/api/chat` | `POST` | Chat completion (with memory) |
| `/api/tags` | `GET` | List models |
| `/curator/run` | `POST` | Trigger curator manually |

---

## Verify Installation

```bash
# Health check
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}

# Check container
docker ps
# Expected: VeraAI running with (healthy) status

# Test chat
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"your-model","messages":[{"role":"user","content":"hello"}],"stream":false}'
```

---

## Troubleshooting

### Permission Denied

```bash
# Get your UID/GID
id

# Set in environment
APP_UID=$(id -u)
APP_GID=$(id -g)
```

### Wrong Timezone

```bash
# Set correct timezone
TZ=America/Chicago
```

---

## Source Code

- **Gitea**: https://speedyfox.app/SpeedyFoxAi/vera-ai-v2

---

## License

MIT License

---

Brought to you by SpeedyFoxAi