Vera-AI: Persistent Memory Proxy for Ollama
Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.
Features
- Persistent Memory: Conversations are stored in Qdrant and retrieved contextually
- Scheduled Curation: Daily and monthly cleanup of raw memories
- 4-Layer Context: System prompt + semantic memory + recent context + current messages
- Configurable UID/GID: Match container user to host user for volume permissions
- Timezone Support: Scheduler runs in your local timezone
- Debug Logging: Optional debug logs written to configurable directory
Prerequisites
- Ollama: Running LLM inference server (e.g., http://10.0.0.10:11434)
- Qdrant: Running vector database (e.g., http://10.0.0.22:6333)
- Docker: Docker and Docker Compose installed
- Git: For cloning the repository
Quick Start
# Clone the repository
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
# Create environment file from template
cp .env.example .env
# Edit .env with your settings
nano .env
# Create required directories
mkdir -p config prompts logs
# Copy default config (or create your own)
cp config.toml config/
# Build and run
docker compose build
docker compose up -d
# Test
curl http://localhost:11434/
Full Setup Instructions
1. Clone Repository
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
2. Create Environment File
Create .env file (or copy from .env.example):
# User/Group Configuration (match your host user)
APP_UID=1000
APP_GID=1000
# Timezone Configuration
TZ=America/Chicago
# API Keys (optional)
# OPENROUTER_API_KEY=your_api_key_here
Important: APP_UID and APP_GID must match your host user's UID/GID for volume permissions:
# Get your UID and GID
id -u # UID
id -g # GID
# Set in .env
APP_UID=1000 # Replace with your UID
APP_GID=1000 # Replace with your GID
3. Create Required Directories
# Create directories
mkdir -p config prompts logs
# Copy default configuration
cp config.toml config/
# Verify prompts exist (should be in the repo)
ls -la prompts/
# Should show: curator_prompt.md, systemprompt.md
4. Configure Ollama and Qdrant
Edit config/config.toml:
[general]
ollama_host = "http://YOUR_OLLAMA_IP:11434"
qdrant_host = "http://YOUR_QDRANT_IP:6333"
qdrant_collection = "memories"
embedding_model = "snowflake-arctic-embed2"
debug = false
[layers]
semantic_token_budget = 25000
context_token_budget = 22000
semantic_search_turns = 2
semantic_score_threshold = 0.6
[curator]
run_time = "02:00" # Daily curator time
full_run_time = "03:00" # Monthly full curator time
full_run_day = 1 # Day of month (1st)
curator_model = "gpt-oss:120b"
5. Build and Run
# Build with your UID/GID
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build
# Run with timezone
docker compose up -d
# Check status
docker ps
docker logs vera-ai --tail 20
# Test health endpoint
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}
6. Verify Installation
# Check container is healthy
docker ps --format "table {{.Names}}\t{{.Status}}"
# Expected: vera-ai Up X minutes (healthy)
# Check timezone
docker exec vera-ai date
# Should show your timezone (e.g., CDT for America/Chicago)
# Check user
docker exec vera-ai id
# Expected: uid=1000(appuser) gid=1000(appgroup)
# Check directories
docker exec vera-ai ls -la /app/prompts/
# Should show: curator_prompt.md, systemprompt.md
docker exec vera-ai ls -la /app/logs/
# Should be writable
# Test chat
curl -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"YOUR_MODEL","messages":[{"role":"user","content":"hello"}],"stream":false}'
Configuration
Environment Variables (.env)
| Variable | Default | Description |
|---|---|---|
| `APP_UID` | `999` | User ID for container user (match your host UID) |
| `APP_GID` | `999` | Group ID for container group (match your host GID) |
| `TZ` | `UTC` | Timezone for scheduler |
| `OPENROUTER_API_KEY` | - | API key for cloud model routing (optional) |
| `VERA_CONFIG_DIR` | `/app/config` | Configuration directory (optional) |
| `VERA_PROMPTS_DIR` | `/app/prompts` | Prompts directory (optional) |
| `VERA_LOG_DIR` | `/app/logs` | Debug log directory (optional) |
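The optional path variables fall back to the in-container defaults above when unset. A sketch of how such lookups are typically resolved (the project's actual resolution code isn't shown here):

```python
import os

# Resolve each directory from the environment, falling back to the
# documented in-container default when the variable is unset.
def resolve_dirs(env=os.environ):
    return {
        "config": env.get("VERA_CONFIG_DIR", "/app/config"),
        "prompts": env.get("VERA_PROMPTS_DIR", "/app/prompts"),
        "logs": env.get("VERA_LOG_DIR", "/app/logs"),
    }

print(resolve_dirs(env={})["logs"])  # /app/logs (no override set)
```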
Volume Mappings
| Host Path | Container Path | Mode | Purpose |
|---|---|---|---|
| `./config/config.toml` | `/app/config/config.toml` | ro | Configuration file |
| `./prompts/` | `/app/prompts/` | rw | Curator and system prompts |
| `./logs/` | `/app/logs/` | rw | Debug logs (when debug=true) |
Directory Structure
vera-ai-v2/
├── config/
│ └── config.toml # Main configuration (mounted read-only)
├── prompts/
│ ├── curator_prompt.md # Prompt for memory curator
│ └── systemprompt.md # System context (curator can append)
├── logs/ # Debug logs (when debug=true)
├── app/
│ ├── main.py # FastAPI application
│ ├── config.py # Configuration loading
│ ├── curator.py # Memory curation
│ ├── proxy_handler.py # Chat request handling
│ ├── qdrant_service.py # Qdrant operations
│ ├── singleton.py # QdrantService singleton
│ └── utils.py # Utilities
├── static/ # Legacy (symlinks to prompts/)
├── .env.example # Environment template
├── docker-compose.yml # Docker Compose config
├── Dockerfile # Container definition
├── requirements.txt # Python dependencies
└── README.md # This file
Docker Compose
services:
vera-ai:
build:
context: .
dockerfile: Dockerfile
args:
APP_UID: ${APP_UID:-999}
APP_GID: ${APP_GID:-999}
image: vera-ai:latest
container_name: vera-ai
env_file:
- .env
volumes:
- ./config/config.toml:/app/config/config.toml:ro
- ./prompts:/app/prompts:rw
- ./logs:/app/logs:rw
network_mode: "host"
restart: unless-stopped
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
Timezone Configuration
The TZ environment variable sets the container timezone, which affects the scheduler:
# .env file
TZ=America/Chicago
# Scheduler runs at:
# - Daily curator: 02:00 Chicago time
# - Monthly curator: 03:00 Chicago time on 1st
Common timezones:
- `UTC` - Coordinated Universal Time
- `America/New_York` - Eastern Time
- `America/Chicago` - Central Time
- `America/Los_Angeles` - Pacific Time
- `Europe/London` - GMT/BST
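The effect of `TZ` on the scheduler can be checked with stdlib `zoneinfo`: given "now" in the container's timezone, find the next 02:00 local run. This is a hypothetical helper for illustration; the proxy's own scheduler code isn't shown here.

```python
from datetime import datetime, timedelta, time
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

# Next daily-curator run at 02:00 local time, relative to "now".
def next_daily_run(now: datetime, run_at: time = time(2, 0)) -> datetime:
    candidate = now.replace(hour=run_at.hour, minute=run_at.minute,
                            second=0, microsecond=0)
    if candidate <= now:        # today's slot already passed
        candidate += timedelta(days=1)
    return candidate

now = datetime(2024, 6, 1, 14, 30, tzinfo=ZoneInfo("America/Chicago"))
print(next_daily_run(now))  # 2024-06-02 02:00 America/Chicago
```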
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Health check |
| `/api/chat` | POST | Chat completion (augmented with memory) |
| `/api/tags` | GET | List models |
| `/api/generate` | POST | Generate completion |
| `/curator/run` | POST | Trigger curator manually |
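From Python, a `/api/chat` request body matching the curl examples in this README can be built with stdlib `json` (a sketch; the field names follow the Ollama chat API, and actually sending it would use `urllib.request` against the proxy):

```python
import json

# Build the JSON body for POST /api/chat, mirroring the curl examples.
def build_chat_request(model: str, content: str, stream: bool = False) -> str:
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": stream,
    })

print(build_chat_request("YOUR_MODEL", "hello"))
```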
Manual Curator Trigger
# Daily curation (recent 24h)
curl -X POST http://localhost:11434/curator/run
# Full curation (all raw memories)
curl -X POST "http://localhost:11434/curator/run?full=true"
Memory System
4-Layer Context
- System Prompt: From prompts/systemprompt.md
- Semantic Memory: Curated Q&A pairs retrieved by relevance
- Recent Context: Last N conversation turns
- Current Messages: User/assistant messages from request
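Assembling the four layers into one message list might look like the sketch below, with semantic memories filtered by `semantic_score_threshold` and trimmed to `semantic_token_budget`. Token counting here is a crude whitespace split; the real proxy's accounting in `app/proxy_handler.py` may differ.

```python
# Sketch: merge the 4 layers into a single Ollama-style message list.
def assemble_context(system_prompt, semantic_hits, recent_turns,
                     current_messages, semantic_token_budget=25000,
                     score_threshold=0.6):
    messages = [{"role": "system", "content": system_prompt}]
    used = 0
    for hit in semantic_hits:               # assumed sorted by score
        if hit["score"] < score_threshold:
            continue                        # below relevance cutoff
        cost = len(hit["text"].split())     # crude token estimate
        if used + cost > semantic_token_budget:
            break                           # budget exhausted
        used += cost
        messages.append({"role": "system", "content": hit["text"]})
    messages.extend(recent_turns)
    messages.extend(current_messages)
    return messages

ctx = assemble_context(
    "You are Vera.",
    [{"score": 0.8, "text": "User prefers metric units."},
     {"score": 0.4, "text": "low-relevance memory"}],
    [{"role": "user", "content": "earlier question"}],
    [{"role": "user", "content": "hello"}],
)
print(len(ctx))  # 4: system + 1 memory + 1 recent + 1 current
```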
Curation Schedule
| Schedule | Time | What | Frequency |
|---|---|---|---|
| Daily | 02:00 | Recent 24h raw memories | Every day |
| Monthly | 03:00 on 1st | ALL raw memories | 1st of month |
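The schedule above reduces to a simple rule: 03:00 on the 1st triggers the full pass, 02:00 any day triggers the daily pass. A hypothetical helper for illustration (not the project's scheduler code):

```python
from datetime import datetime

# Decide which curator pass (if any) a given local time triggers.
def curator_pass(now: datetime):
    if now.day == 1 and (now.hour, now.minute) == (3, 0):
        return "monthly"   # ALL raw memories
    if (now.hour, now.minute) == (2, 0):
        return "daily"     # recent 24h
    return None

print(curator_pass(datetime(2024, 7, 1, 3, 0)))   # monthly
print(curator_pass(datetime(2024, 7, 15, 2, 0)))  # daily
```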
Memory Types
- raw: Unprocessed conversation turns
- curated: Cleaned, summarized Q&A pairs
- test: Test entries (can be ignored)
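Selecting memories by type would be a payload filter in Qdrant; shown here on plain dicts for illustration only:

```python
# Sketch: separate stored memories by the type tag described above.
memories = [
    {"type": "raw", "text": "unprocessed turn"},
    {"type": "curated", "text": "cleaned Q&A pair"},
    {"type": "test", "text": "ignore me"},
]

def by_type(items, kind):
    return [m for m in items if m["type"] == kind]

print(len(by_type(memories, "raw")))  # 1
```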
Troubleshooting
Permission Denied
If you see permission errors on /app/prompts/ or /app/logs/:
# Check your UID/GID
id
# Rebuild with correct UID/GID
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build --no-cache
docker compose up -d
Timezone Issues
If curator runs at wrong time:
# Check container timezone
docker exec vera-ai date
# Set correct timezone in .env
TZ=America/Chicago
Health Check Failing
# Check container logs
docker logs vera-ai --tail 50
# Check Ollama connectivity
docker exec vera-ai python -c "import urllib.request; print(urllib.request.urlopen('http://YOUR_OLLAMA_IP:11434/').read())"
# Check Qdrant connectivity
docker exec vera-ai python -c "import urllib.request; print(urllib.request.urlopen('http://YOUR_QDRANT_IP:6333/').read())"
Container Not Starting
# Check if port is in use
sudo lsof -i :11434
# Check Docker logs
docker compose logs
# Rebuild from scratch
docker compose down
docker compose build --no-cache
docker compose up -d
Development
Building from Source
# Clone repository
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
# Install dependencies locally (optional)
pip install -r requirements.txt
# Build Docker image
docker compose build
Running Tests
# Test health endpoint
curl http://localhost:11434/
# Test chat endpoint
curl -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"test"}],"stream":false}'
# Test curator
curl -X POST http://localhost:11434/curator/run
License
MIT License - see LICENSE file for details.