# Vera-AI
### *Vera* (Latin): **True** → *True AI*
**Persistent Memory Proxy for Ollama**
*A transparent proxy that gives your AI conversations lasting memory.*
[Docker Hub](https://hub.docker.com/r/vera-ai/latest) · [License](LICENSE) · [Source](http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2)
---
**Vera-AI sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.**
Every conversation is stored in a Qdrant vector database and retrieved contextually, giving your AI **true memory**.
---
## How It Works
```
┌─────────────────────────────────────────────────────────────────────┐
│                            REQUEST FLOW                             │
└─────────────────────────────────────────────────────────────────────┘

┌──────────┐       ┌──────────┐       ┌──────────┐       ┌──────────┐
│  Client  │──(1)─▶│ Vera-AI  │──(3)─▶│  Ollama  │──(5)─▶│ Response │
│  (You)   │       │  Proxy   │       │   LLM    │       │ to User  │
└──────────┘       └────┬─────┘       └──────────┘       └──────────┘
                        │
                        │ (2) Query semantic memory
                        ▼
                   ┌──────────┐
                   │  Qdrant  │
                   │ Vector DB│
                   └────┬─────┘
                        │
                        │ (4) Store conversation turn
                        ▼
                   ┌──────────┐
                   │  Memory  │
                   │ Storage  │
                   └──────────┘
```
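The flow above can be sketched in a few lines of Python. This is purely illustrative: `fake_llm` and the in-memory `store` list are hypothetical stand-ins for Ollama and Qdrant, not the actual implementation.

```python
# Minimal sketch of the proxy loop: retrieve memory, augment, forward, store.
# fake_llm and the in-memory list stand in for Ollama and Qdrant.

store = []  # each entry: {"user": ..., "assistant": ...}

def fake_llm(messages):
    # Placeholder for the real call to Ollama's /api/chat.
    return f"echo: {messages[-1]['content']}"

def handle_chat(user_message):
    # (2) Query memory for context relevant to this message.
    context = [m for m in store if user_message.split()[0] in m["user"]]
    # (1)+(3) Build the augmented message list and forward it to the LLM.
    messages = (
        [{"role": "system", "content": f"Known context: {context}"}]
        + [{"role": "user", "content": user_message}]
    )
    reply = fake_llm(messages)
    # (4) Store the turn so future requests can retrieve it.
    store.append({"user": user_message, "assistant": reply})
    # (5) Return the response to the client.
    return reply
```

The real proxy does the same dance per request, with Qdrant similarity search in step (2) and an embedding write in step (4).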
---
## Features
| Feature | Description |
|---------|-------------|
| **Persistent Memory** | Conversations stored in Qdrant, retrieved contextually |
| **Scheduled Curation** | Daily and monthly cleanup of raw memories |
| **4-Layer Context** | System + semantic + recent + current messages |
| **Configurable UID/GID** | Match the container user to the host for correct file permissions |
| **Timezone Support** | Scheduler runs in your local timezone |
| **Debug Logging** | Optional logs written to a configurable directory |
| **Docker Ready** | One-command build and run |
## Prerequisites
| Requirement | Description |
|-------------|-------------|
| **Ollama** | LLM inference server (e.g., `http://10.0.0.10:11434`) |
| **Qdrant** | Vector database (e.g., `http://10.0.0.22:6333`) |
| **Docker** | Docker and Docker Compose installed |
| **Git** | For cloning the repository |
---
## Docker Deployment
### Option 1: Docker Run (Single Command)
```bash
docker run -d \
--name VeraAI \
--restart unless-stopped \
--network host \
-e APP_UID=1000 \
-e APP_GID=1000 \
-e TZ=America/Chicago \
-e VERA_DEBUG=false \
-v ./config/config.toml:/app/config/config.toml:ro \
-v ./prompts:/app/prompts:rw \
-v ./logs:/app/logs:rw \
your-username/vera-ai:latest
```
### Option 2: Docker Compose
Create `docker-compose.yml`:
```yaml
services:
vera-ai:
image: your-username/vera-ai:latest
container_name: VeraAI
restart: unless-stopped
network_mode: host
environment:
- APP_UID=1000
- APP_GID=1000
- TZ=America/Chicago
- VERA_DEBUG=false
volumes:
- ./config/config.toml:/app/config/config.toml:ro
- ./prompts:/app/prompts:rw
- ./logs:/app/logs:rw
healthcheck:
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
```
Run with:
```bash
docker compose up -d
```
### Docker Options Explained
| Option | Description |
|--------|-------------|
| `-d` | Run detached (background) |
| `--name VeraAI` | Container name |
| `--restart unless-stopped` | Auto-start on boot, survive reboots |
| `--network host` | Use host network (port 11434) |
| `-e APP_UID=1000` | User ID (match your host UID) |
| `-e APP_GID=1000` | Group ID (match your host GID) |
| `-e TZ=America/Chicago` | Timezone for scheduler |
| `-e VERA_DEBUG=false` | Disable debug logging |
| `-v ...:ro` | Config file (read-only) |
| `-v ...:rw` | Prompts and logs (read-write) |
---
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `APP_UID` | `999` | Container user ID (match host) |
| `APP_GID` | `999` | Container group ID (match host) |
| `TZ` | `UTC` | Container timezone |
| `VERA_DEBUG` | `false` | Enable debug logging |
| `OPENROUTER_API_KEY` | - | Cloud model routing key |
| `VERA_CONFIG_DIR` | `/app/config` | Config directory |
| `VERA_PROMPTS_DIR` | `/app/prompts` | Prompts directory |
| `VERA_LOG_DIR` | `/app/logs` | Debug logs directory |
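The variables above can be read in one place with their documented fall-backs. The variable names and defaults come from the table; the loader function itself is an illustrative sketch, not Vera-AI's actual code.

```python
import os

# Defaults mirror the environment-variable table above.
DEFAULTS = {
    "APP_UID": "999",
    "APP_GID": "999",
    "TZ": "UTC",
    "VERA_DEBUG": "false",
    "VERA_CONFIG_DIR": "/app/config",
    "VERA_PROMPTS_DIR": "/app/prompts",
    "VERA_LOG_DIR": "/app/logs",
}

def load_settings(env=os.environ):
    """Read each documented variable, falling back to its default."""
    settings = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    # The debug flag arrives as a string; normalize it to a bool.
    settings["VERA_DEBUG"] = settings["VERA_DEBUG"].lower() == "true"
    return settings
```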
### config.toml
Create `config/config.toml` with all settings:
```toml
[general]
# ───────────────────────────────────────────────
# General Settings
# ───────────────────────────────────────────────
# Ollama server URL
ollama_host = "http://10.0.0.10:11434"
# Qdrant vector database URL
qdrant_host = "http://10.0.0.22:6333"
# Collection name for memories
qdrant_collection = "memories"
# Embedding model for semantic search
embedding_model = "snowflake-arctic-embed2"
# Enable debug logging (set to true for verbose logs)
debug = false
[layers]
# ───────────────────────────────────────────────
# Context Layer Settings
# ───────────────────────────────────────────────
# Token budget for semantic memory layer
# Controls how much curated memory can be included
semantic_token_budget = 25000
# Token budget for recent context layer
# Controls how much recent conversation can be included
context_token_budget = 22000
# Number of recent turns to include in semantic search
# Higher = more context, but slower
semantic_search_turns = 2
# Minimum similarity score for semantic search (0.0-1.0)
# Higher = more relevant results, but fewer matches
semantic_score_threshold = 0.6
[curator]
# ───────────────────────────────────────────────
# Curation Settings
# ───────────────────────────────────────────────
# Time for daily curation (HH:MM format, 24-hour)
# Processes raw memories from last 24h
run_time = "02:00"
# Time for monthly full curation (HH:MM format, 24-hour)
# Processes ALL raw memories
full_run_time = "03:00"
# Day of month for full curation (1-28)
full_run_day = 1
# Model to use for curation
# Should be a capable model for summarization
curator_model = "gpt-oss:120b"
```
### prompts/ Directory
Create `prompts/` directory with:
**`prompts/curator_prompt.md`** - Prompt for memory curation:
```markdown
You are a memory curator. Your job is to summarize conversation turns
into concise Q&A pairs that will be stored for future reference.
Extract the key information and create clear, searchable entries.
Focus on facts, decisions, and important context.
```
**`prompts/systemprompt.md`** - System context for Vera:
```markdown
You are Vera, an AI with persistent memory. You remember all previous
conversations with this user and can reference them contextually.
Use the provided context to give informed, personalized responses.
```
---
## Quick Start (From Source)
```bash
# 1. Clone
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
# 2. Configure
cp .env.example .env
nano .env # Set APP_UID, APP_GID, TZ
# 3. Create directories and config
mkdir -p config prompts logs
cp config.toml config/
nano config/config.toml # Set ollama_host, qdrant_host
# 4. Create prompts
nano prompts/curator_prompt.md
nano prompts/systemprompt.md
# 5. Run
docker compose build
docker compose up -d
# 6. Test
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}
```
---
## Full Setup Guide
### Step 1: Clone Repository
```bash
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
```
### Step 2: Environment Configuration
Create `.env` file:
```bash
# User/Group Configuration
APP_UID=1000 # Run: id -u to get your UID
APP_GID=1000 # Run: id -g to get your GID
# Timezone Configuration
TZ=America/Chicago
# Debug Logging
VERA_DEBUG=false
```
### Step 3: Directory Structure
```bash
# Create required directories
mkdir -p config prompts logs
# Copy default configuration
cp config.toml config/
# Verify prompts exist
ls -la prompts/
# Should show: curator_prompt.md, systemprompt.md
```
### Step 4: Configure Services
Edit `config/config.toml` (see the full example above).
### Step 5: Build and Run
```bash
# Build with your UID/GID
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build
# Start container
docker compose up -d
# Check status
docker ps
docker logs VeraAI --tail 20
```
### Step 6: Verify Installation
```bash
# Health check
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}
# Container status
docker ps --format "table {{.Names}}\t{{.Status}}"
# Expected: VeraAI Up X minutes (healthy)
# Timezone
docker exec VeraAI date
# Should show your timezone (e.g., CDT for America/Chicago)
# User permissions
docker exec VeraAI id
# Expected: uid=1000(appuser) gid=1000(appgroup)
# Directories
docker exec VeraAI ls -la /app/prompts/
# Should show: curator_prompt.md, systemprompt.md
# Test chat
curl -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"hello"}],"stream":false}'
```
---
## Volume Mappings
| Host Path | Container Path | Mode | Purpose |
|-----------|----------------|------|---------|
| `./config/config.toml` | `/app/config/config.toml` | `ro` | Configuration |
| `./prompts/` | `/app/prompts/` | `rw` | Curator prompts |
| `./logs/` | `/app/logs/` | `rw` | Debug logs |
### Directory Structure
```
vera-ai-v2/
├── config/
│   └── config.toml          # Main configuration
├── prompts/
│   ├── curator_prompt.md    # Memory curation prompt
│   └── systemprompt.md      # System context
├── logs/                    # Debug logs (when debug=true)
├── app/
│   ├── main.py              # FastAPI application
│   ├── config.py            # Configuration loader
│   ├── curator.py           # Memory curation
│   ├── proxy_handler.py     # Chat handling
│   ├── qdrant_service.py    # Vector operations
│   ├── singleton.py         # QdrantService singleton
│   └── utils.py             # Utilities
├── static/                  # Legacy symlinks
├── .env.example             # Environment template
├── docker-compose.yml       # Docker Compose
├── Dockerfile               # Container definition
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
---
## Timezone Configuration
The `TZ` variable sets the container timezone for the scheduler:
```bash
# Common timezones
TZ=UTC # Coordinated Universal Time
TZ=America/New_York # Eastern Time
TZ=America/Chicago # Central Time
TZ=America/Los_Angeles # Pacific Time
TZ=Europe/London # GMT/BST
```
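An invalid `TZ` name silently falls back to UTC in many containers, so it can be worth validating the name before putting it in `.env`. A minimal check using the standard library's `zoneinfo` (this helper is an illustrative sketch, not part of Vera-AI):

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def is_valid_tz(name: str) -> bool:
    """Return True if `name` resolves against the system tz database."""
    try:
        ZoneInfo(name)
        return True
    except (ZoneInfoNotFoundError, ValueError):
        return False
```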
**Curation Schedule:**
| Schedule | Time | What | Frequency |
|----------|------|------|-----------|
| Daily | 02:00 | Recent 24h | Every day |
| Monthly | 03:00 on 1st | ALL raw memories | 1st of month |
---
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | `GET` | Health check |
| `/api/chat` | `POST` | Chat completion (with memory) |
| `/api/tags` | `GET` | List available models |
| `/api/generate` | `POST` | Generate completion |
| `/curator/run` | `POST` | Trigger curator manually |
### Manual Curation
```bash
# Daily curation (recent 24h)
curl -X POST http://localhost:11434/curator/run
# Full curation (all raw memories)
curl -X POST "http://localhost:11434/curator/run?full=true"
```
---
## Memory System
### 4-Layer Context Build
```
┌────────────────────────────────────────────────────────────┐
│ Layer 1: System Prompt                                     │
│   • From prompts/systemprompt.md                           │
│   • Preserved unchanged, passed through                    │
├────────────────────────────────────────────────────────────┤
│ Layer 2: Semantic Memory                                   │
│   • Query Qdrant with the user question                    │
│   • Retrieve curated Q&A pairs by relevance                │
│   • Limited by semantic_token_budget                       │
├────────────────────────────────────────────────────────────┤
│ Layer 3: Recent Context                                    │
│   • Last N conversation turns from Qdrant                  │
│   • Chronological order, most recent first                 │
│   • Limited by context_token_budget                        │
├────────────────────────────────────────────────────────────┤
│ Layer 4: Current Messages                                  │
│   • User message from the current request                  │
│   • Passed through unchanged                               │
└────────────────────────────────────────────────────────────┘
```
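The layered build can be sketched as follows. Token counting is approximated by word count here, and the budget defaults echo the `config.toml` example; the real proxy's accounting and message shaping may differ.

```python
# Sketch of the 4-layer context build with per-layer token budgets.
# Token counting is approximated by word count for illustration.

def count_tokens(text: str) -> int:
    return len(text.split())

def trim_to_budget(snippets, budget):
    """Keep snippets, in order, until the token budget is exhausted."""
    kept, used = [], 0
    for s in snippets:
        cost = count_tokens(s)
        if used + cost > budget:
            break
        kept.append(s)
        used += cost
    return kept

def build_context(system_prompt, semantic_hits, recent_turns, user_message,
                  semantic_budget=25000, context_budget=22000):
    messages = [{"role": "system", "content": system_prompt}]       # Layer 1
    for s in trim_to_budget(semantic_hits, semantic_budget):        # Layer 2
        messages.append({"role": "system", "content": f"[memory] {s}"})
    for t in trim_to_budget(recent_turns, context_budget):          # Layer 3
        messages.append({"role": "system", "content": f"[recent] {t}"})
    messages.append({"role": "user", "content": user_message})      # Layer 4
    return messages
```

Budgeting each layer separately means a flood of semantic matches can never crowd out the recent conversation, and vice versa.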
### Memory Types
| Type | Description | Retention |
|------|-------------|-----------|
| `raw` | Unprocessed conversation turns | Until curation |
| `curated` | Cleaned Q&A pairs | Permanent |
| `test` | Test entries | Can be ignored |
### Curation Process
1. **Daily (02:00)**: Processes raw memories from last 24h into curated Q&A pairs
2. **Monthly (03:00 on 1st)**: Processes ALL remaining raw memories for full cleanup
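The two passes differ only in their time window, which can be sketched like this. Here `summarize` is a stub standing in for the configured curator model; the memory schema is illustrative, not Vera-AI's actual Qdrant payload.

```python
from datetime import datetime, timedelta, timezone

# Sketch of curation: raw turns inside the window become curated Q&A
# pairs. `summarize` is a stub for the curator model in config.toml.

def summarize(turn):
    return {"q": turn["user"], "a": turn["assistant"][:80]}

def curate(memories, now=None, full=False):
    """Daily pass covers the last 24h; full=True covers everything."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=24)
    curated, remaining = [], []
    for m in memories:
        in_window = full or m["ts"] >= cutoff
        if m["type"] == "raw" and in_window:
            curated.append({"type": "curated", **summarize(m)})
        else:
            remaining.append(m)
    return curated, remaining
```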
---
## Troubleshooting
### Permission Denied
```bash
# Check your UID/GID
id
# Rebuild with correct values
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build --no-cache
docker compose up -d
```
### Wrong Timezone
```bash
# Check container time
docker exec VeraAI date
# Fix in .env
TZ=America/Chicago
```
### Health Check Failing
```bash
# Check logs
docker logs VeraAI --tail 50
# Test Ollama connectivity
docker exec VeraAI python -c "
import urllib.request
print(urllib.request.urlopen('http://YOUR_OLLAMA_IP:11434/').read())
"
# Test Qdrant connectivity
docker exec VeraAI python -c "
import urllib.request
print(urllib.request.urlopen('http://YOUR_QDRANT_IP:6333/').read())
"
```
### Port Already in Use
```bash
# Check what's using port 11434
sudo lsof -i :11434
# Stop conflicting service or change port in config
```
---
## Development
### Build from Source
```bash
git clone http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
pip install -r requirements.txt
docker compose build
```
### Run Tests
```bash
# Health check
curl http://localhost:11434/
# Non-streaming chat
curl -X POST http://localhost:11434/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"test"}],"stream":false}'
# Trigger curation
curl -X POST http://localhost:11434/curator/run
```
---
## License
MIT License - see [LICENSE](LICENSE) file for details.
---
## Support
| Resource | Link |
|----------|------|
| **Repository** | http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2 |
| **Issues** | http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2/issues |
---
**Vera-AI** – *True AI Memory*
Brought to you by SpeedyFoxAi