# Vera-AI: Persistent Memory Proxy for Ollama

[![Docker](https://img.shields.io/docker/pulls/vera-ai/latest)](https://hub.docker.com/r/vera-ai/latest)

**Vera-AI** is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.

## Features

- **Persistent Memory**: Conversations are stored in Qdrant and retrieved contextually
- **Scheduled Curation**: Daily and monthly cleanup of raw memories
- **4-Layer Context**: System prompt + semantic memory + recent context + current messages
- **Configurable UID/GID**: Match container user to host user for volume permissions
- **Timezone Support**: Scheduler runs in your local timezone
- **Debug Logging**: Optional debug logs written to a configurable directory

## Prerequisites

- **Ollama**: A running LLM inference server (e.g., `http://10.0.0.10:11434`)
- **Qdrant**: A running vector database (e.g., `http://10.0.0.22:6333`)
- **Docker**: Docker and Docker Compose installed
- **Git**: For cloning the repository

## Quick Start

```bash
# Clone the repository
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2

# Create environment file from template
cp .env.example .env

# Edit .env with your settings
nano .env

# Create required directories
mkdir -p config prompts logs

# Copy default config (or create your own)
cp config.toml config/

# Build and run
docker compose build
docker compose up -d

# Test
curl http://localhost:11434/
```

## Full Setup Instructions

### 1. Clone Repository

```bash
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
```

### 2. Create Environment File

Create a `.env` file (or copy it from `.env.example`):

```bash
# User/Group Configuration (match your host user)
APP_UID=1000
APP_GID=1000

# Timezone Configuration
TZ=America/Chicago

# API Keys (optional)
# OPENROUTER_API_KEY=your_api_key_here
```

**Important:** `APP_UID` and `APP_GID` must match your host user's UID/GID for volume permissions:

```bash
# Get your UID and GID
id -u  # UID
id -g  # GID

# Set in .env
APP_UID=1000  # Replace with your UID
APP_GID=1000  # Replace with your GID
```

### 3. Create Required Directories

```bash
# Create directories
mkdir -p config prompts logs

# Copy default configuration
cp config.toml config/

# Verify prompts exist (should be in the repo)
ls -la prompts/
# Should show: curator_prompt.md, systemprompt.md
```

### 4. Configure Ollama and Qdrant

Edit `config/config.toml`:

```toml
[general]
ollama_host = "http://YOUR_OLLAMA_IP:11434"
qdrant_host = "http://YOUR_QDRANT_IP:6333"
qdrant_collection = "memories"
embedding_model = "snowflake-arctic-embed2"
debug = false

[layers]
semantic_token_budget = 25000
context_token_budget = 22000
semantic_search_turns = 2
semantic_score_threshold = 0.6

[curator]
run_time = "02:00"            # Daily curator time
full_run_time = "03:00"       # Monthly full curator time
full_run_day = 1              # Day of month (1st)
curator_model = "gpt-oss:120b"
```

### 5. Build and Run

```bash
# Build with your UID/GID
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build

# Run with timezone
docker compose up -d

# Check status
docker ps
docker logs vera-ai --tail 20

# Test health endpoint
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}
```

### 6. Verify Installation

```bash
# Check container is healthy
docker ps --format "table {{.Names}}\t{{.Status}}"
# Expected: vera-ai   Up X minutes (healthy)

# Check timezone
docker exec vera-ai date
# Should show your timezone (e.g., CST or CDT for America/Chicago)

# Check user
docker exec vera-ai id
# Expected: uid=1000(appuser) gid=1000(appgroup)

# Check directories
docker exec vera-ai ls -la /app/prompts/
# Should show: curator_prompt.md, systemprompt.md

docker exec vera-ai ls -la /app/logs/
# Should be writable

# Test chat
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"YOUR_MODEL","messages":[{"role":"user","content":"hello"}],"stream":false}'
```

## Configuration

### Environment Variables (.env)

| Variable | Default | Description |
|----------|---------|-------------|
| `APP_UID` | `999` | User ID for container user (match your host UID) |
| `APP_GID` | `999` | Group ID for container group (match your host GID) |
| `TZ` | `UTC` | Timezone for scheduler |
| `OPENROUTER_API_KEY` | - | API key for cloud model routing (optional) |
| `VERA_CONFIG_DIR` | `/app/config` | Configuration directory (optional) |
| `VERA_PROMPTS_DIR` | `/app/prompts` | Prompts directory (optional) |
| `VERA_LOG_DIR` | `/app/logs` | Debug log directory (optional) |

### Volume Mappings

| Host Path | Container Path | Mode | Purpose |
|-----------|---------------|------|---------|
| `./config/config.toml` | `/app/config/config.toml` | `ro` | Configuration file |
| `./prompts/` | `/app/prompts/` | `rw` | Curator and system prompts |
| `./logs/` | `/app/logs/` | `rw` | Debug logs (when debug=true) |

### Directory Structure

```
vera-ai-v2/
├── config/
│   └── config.toml          # Main configuration (mounted read-only)
├── prompts/
│   ├── curator_prompt.md    # Prompt for memory curator
│   └── systemprompt.md      # System context (curator can append)
├── logs/                    # Debug logs (when debug=true)
├── app/
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration loading
│   ├── curator.py           # Memory curation
│   ├── proxy_handler.py     # Chat request handling
│   ├── qdrant_service.py    # Qdrant operations
│   ├── singleton.py         # QdrantService singleton
│   └── utils.py             # Utilities
├── static/                  # Legacy (symlinks to prompts/)
├── .env.example             # Environment template
├── docker-compose.yml       # Docker Compose config
├── Dockerfile               # Container definition
├── requirements.txt         # Python dependencies
└── README.md                # This file
```

## Docker Compose

```yaml
services:
  vera-ai:
    build:
      context: .
      dockerfile: Dockerfile
      args:
        APP_UID: ${APP_UID:-999}
        APP_GID: ${APP_GID:-999}
    image: vera-ai:latest
    container_name: vera-ai
    env_file:
      - .env
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    network_mode: "host"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

## Timezone Configuration

The `TZ` environment variable sets the container timezone, which determines when the scheduler runs:

```bash
# .env file
TZ=America/Chicago

# Scheduler runs at:
# - Daily curator: 02:00 Chicago time
# - Monthly curator: 03:00 Chicago time on 1st
```

Common timezones:

- `UTC` - Coordinated Universal Time
- `America/New_York` - Eastern Time
- `America/Chicago` - Central Time
- `America/Los_Angeles` - Pacific Time
- `Europe/London` - GMT/BST

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Health check |
| `/api/chat` | POST | Chat completion (augmented with memory) |
| `/api/tags` | GET | List models |
| `/api/generate` | POST | Generate completion |
| `/curator/run` | POST | Trigger curator manually |

## Manual Curator Trigger

```bash
# Daily curation (recent 24h)
curl -X POST http://localhost:11434/curator/run

# Full curation (all raw memories)
curl -X POST "http://localhost:11434/curator/run?full=true"
```

## Memory System

### 4-Layer Context

1. **System Prompt**: From `prompts/systemprompt.md`
2. **Semantic Memory**: Curated Q&A pairs retrieved by relevance
3. **Recent Context**: Last N conversation turns
4. **Current Messages**: User/assistant messages from request

### Curation Schedule

| Schedule | Time | Scope | Frequency |
|----------|------|-------|-----------|
| Daily | 02:00 | Recent 24h raw memories | Every day |
| Monthly | 03:00 on 1st | ALL raw memories | 1st of month |

### Memory Types

- **raw**: Unprocessed conversation turns
- **curated**: Cleaned, summarized Q&A pairs
- **test**: Test entries (can be ignored)

## Troubleshooting

### Permission Denied

If you see permission errors on `/app/prompts/` or `/app/logs/`:

```bash
# Check your UID/GID
id

# Rebuild with correct UID/GID
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build --no-cache
docker compose up -d
```

### Timezone Issues

If the curator runs at the wrong time:

```bash
# Check container timezone
docker exec vera-ai date

# Set correct timezone in .env
TZ=America/Chicago
```

### Health Check Failing

```bash
# Check container logs
docker logs vera-ai --tail 50

# Check Ollama connectivity
docker exec vera-ai python -c "import urllib.request; print(urllib.request.urlopen('http://YOUR_OLLAMA_IP:11434/').read())"

# Check Qdrant connectivity
docker exec vera-ai python -c "import urllib.request; print(urllib.request.urlopen('http://YOUR_QDRANT_IP:6333/').read())"
```

### Container Not Starting

```bash
# Check if port is in use
sudo lsof -i :11434

# Check Docker logs
docker compose logs

# Rebuild from scratch
docker compose down
docker compose build --no-cache
docker compose up -d
```

## Development

### Building from Source

```bash
# Clone repository
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2

# Install dependencies locally (optional)
pip install -r requirements.txt

# Build Docker image
docker compose build
```

### Running Tests

```bash
# Test health endpoint
curl http://localhost:11434/

# Test chat endpoint
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"test"}],"stream":false}'

# Test curator
curl -X POST http://localhost:11434/curator/run
```

## License

MIT License - see LICENSE file for details.

## Support

- **Issues**: http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2/issues
- **Repository**: http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2
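
## Example: Calling the Proxy from Python

The curl commands for `/api/chat` and `/curator/run` can also be issued from a script. The following is a minimal sketch using only the Python standard library, not code from this repository. The helper names (`build_chat_request`, `build_curator_request`, `send`) and the `BASE_URL` constant are hypothetical, and the sketch assumes Vera-AI is reachable at `http://localhost:11434` as in the examples in this README.

```python
import json
import urllib.request

# Assumption: Vera-AI listens here, matching the curl examples in this README.
BASE_URL = "http://localhost:11434"


def build_chat_request(model: str, prompt: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a non-streaming POST /api/chat request (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_curator_request(full: bool = False,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /curator/run request; full=True appends ?full=true."""
    url = f"{base_url}/curator/run" + ("?full=true" if full else "")
    return urllib.request.Request(url=url, data=b"", method="POST")


def send(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the container running, `print(send(build_chat_request("YOUR_MODEL", "hello")))` would send the same request as the "Test chat" curl command, and `send(build_curator_request(full=True))` would correspond to the full curation trigger. Building the request separately from sending it keeps the payload easy to inspect without a live server.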