# Vera-AI - Persistent Memory Proxy for Ollama

**Vera** (Latin): *True* — **True AI Memory**

---

## What is Vera-AI?

Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.

**Every conversation is remembered.**

---

## How It Works

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                  REQUEST FLOW                                   │
└─────────────────────────────────────────────────────────────────────────────────┘

┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
│  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
│  (You)   │         │  Proxy   │         │   LLM    │         │ to User  │
└──────────┘         └────┬─────┘         └──────────┘         └──────────┘
                          │
                          │ (2) Query semantic memory
                          ▼
                     ┌──────────┐
                     │  Qdrant  │
                     │ Vector DB│
                     └──────────┘
                          │
                          │ (4) Store conversation turn
                          ▼
                     ┌──────────┐
                     │  Memory  │
                     │ Storage  │
                     └──────────┘
```

---

## Quick Start

### Option 1: Docker Run (Single Command)

```bash
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=1000 \
  -e APP_GID=1000 \
  -e TZ=America/Chicago \
  -e VERA_DEBUG=false \
  -v ./config/config.toml:/app/config/config.toml:ro \
  -v ./prompts:/app/prompts:rw \
  -v ./logs:/app/logs:rw \
  your-username/vera-ai:latest
```

### Option 2: Docker Compose

Create `docker-compose.yml`:

```yaml
services:
  vera-ai:
    image: your-username/vera-ai:latest
    container_name: VeraAI
    restart: unless-stopped
    network_mode: host
    environment:
      - APP_UID=1000
      - APP_GID=1000
      - TZ=America/Chicago
      - VERA_DEBUG=false
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

Run with:

```bash
docker compose up -d
```

---
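The five-step request flow in the diagram above can be sketched in a few lines of Python. This is an illustrative outline only — the function and field names below (`augment_messages`, `handle_chat`, the callback parameters) are hypothetical and are not Vera-AI's actual internals:

```python
# Sketch of the proxy's request flow. All names here are illustrative,
# not Vera-AI's real API.

def augment_messages(messages, memories):
    """Prepend retrieved memories as a system message (steps 2 -> 3)."""
    if not memories:
        return list(messages)
    context = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    return [{"role": "system", "content": context}] + list(messages)

def handle_chat(messages, search_memory, call_ollama, store_turn):
    """(1) receive -> (2) query memory -> (3) forward -> (4) store -> (5) reply."""
    query = messages[-1]["content"]           # latest user turn
    memories = search_memory(query)           # (2) semantic lookup (Qdrant)
    reply = call_ollama(augment_messages(messages, memories))  # (3) forward
    store_turn(query, reply)                  # (4) persist this turn
    return reply                              # (5) response to user
```

Because the augmentation happens inside the proxy, the client sends a plain Ollama chat request and never sees steps 2 and 4.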
## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `APP_UID` | `999` | Container user ID (match your host UID) |
| `APP_GID` | `999` | Container group ID (match your host GID) |
| `TZ` | `UTC` | Container timezone |
| `VERA_DEBUG` | `false` | Enable debug logging |

### config.toml

Create `config/config.toml`:

```toml
[general]
# Ollama server URL
ollama_host = "http://10.0.0.10:11434"
# Qdrant vector database URL
qdrant_host = "http://10.0.0.22:6333"
# Collection name for memories
qdrant_collection = "memories"
# Embedding model for semantic search
embedding_model = "snowflake-arctic-embed2"
# Enable debug logging (set to true for verbose logs)
debug = false

[layers]
# Token budget for semantic memory layer
semantic_token_budget = 25000
# Token budget for recent context layer
context_token_budget = 22000
# Number of recent turns to include in semantic search
semantic_search_turns = 2
# Minimum similarity score for semantic search (0.0-1.0)
semantic_score_threshold = 0.6

[curator]
# Time for daily curation (HH:MM format)
run_time = "02:00"
# Time for monthly full curation (HH:MM format)
full_run_time = "03:00"
# Day of month for full curation (1-28)
full_run_day = 1
# Model to use for curation
curator_model = "gpt-oss:120b"
```

### prompts/ Directory

Create a `prompts/` directory with:

**`prompts/curator_prompt.md`** - Prompt for memory curation:

```markdown
You are a memory curator. Your job is to summarize conversation turns
into concise Q&A pairs that will be stored for future reference.
Extract the key information and create clear, searchable entries.
```

**`prompts/systemprompt.md`** - System context for Vera:

```markdown
You are Vera, an AI with persistent memory. You remember all previous
conversations with this user and can reference them contextually.
```
---

## Docker Options Explained

| Option | Description |
|--------|-------------|
| `-d` | Run detached (background) |
| `--name VeraAI` | Container name |
| `--restart unless-stopped` | Auto-start on boot, survive reboots |
| `--network host` | Use host network (port 11434) |
| `-e APP_UID=1000` | User ID (match your host UID) |
| `-e APP_GID=1000` | Group ID (match your host GID) |
| `-e TZ=America/Chicago` | Timezone for scheduler |
| `-e VERA_DEBUG=false` | Disable debug logging |
| `-v ...config.toml:ro` | Config file (read-only) |
| `-v ...prompts:rw` | Prompts directory (read-write) |
| `-v ...logs:rw` | Logs directory (read-write) |

---

## 📋 Prerequisites

### Required Services

| Service | Version | Description |
|---------|---------|-------------|
| **Ollama** | 0.1.x+ | LLM inference server |
| **Qdrant** | 1.6.x+ | Vector database |
| **Docker** | 20.x+ | Container runtime |

### System Requirements

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| **CPU** | 2 cores | 4+ cores |
| **RAM** | 2 GB | 4+ GB |
| **Disk** | 1 GB | 5+ GB |

---

## 🔧 Installing with Ollama

### Option A: All on Same Host (Recommended)

Install all services on a single machine:

```bash
# 1. Install Ollama
curl https://ollama.ai/install.sh | sh

# 2. Pull required models
ollama pull snowflake-arctic-embed2   # Embedding model (required)
ollama pull llama3.1                  # Chat model

# 3. Run Qdrant in Docker
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# 4. Run Vera-AI
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=$(id -u) \
  -e APP_GID=$(id -g) \
  -e TZ=America/Chicago \
  -v ./config/config.toml:/app/config/config.toml:ro \
  -v ./prompts:/app/prompts:rw \
  -v ./logs:/app/logs:rw \
  your-username/vera-ai:latest
```

**Config for same-host (config/config.toml):**

```toml
[general]
ollama_host = "http://127.0.0.1:11434"
qdrant_host = "http://127.0.0.1:6333"
qdrant_collection = "memories"
embedding_model = "snowflake-arctic-embed2"
```

### Option B: Docker Compose All-in-One

```yaml
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: [ollama_data:/root/.ollama]
  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
    volumes: [qdrant_data:/qdrant/storage]
  vera-ai:
    image: your-username/vera-ai:latest
    network_mode: host
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw

volumes:
  ollama_data:
  qdrant_data:
```

> **Note:** with `network_mode: host`, Vera-AI binds host port 11434, which collides with the `ollama` service's published `11434:11434`. Publish Ollama on a different host port (e.g. `11435:11434`) and point `ollama_host` at it in `config.toml`.

### Option C: Different Port

If Ollama uses port 11434, run Vera on port 8080:

```bash
docker run -d --name VeraAI -p 8080:11434 ...

# Connect client to: http://localhost:8080
```
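Because Vera-AI exposes the same chat endpoint shape as Ollama, an existing client only needs its base URL changed to whichever port you chose above. The sketch below builds an Ollama-style `/api/chat` request with only the standard library; the helper name `build_chat_request` and the `http://localhost:8080` base URL are illustrative assumptions:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build a non-streaming Ollama-style /api/chat request for the proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With Vera-AI running (Option C port shown here), send it like any
# Ollama request:
# req = build_chat_request("http://localhost:8080", "llama3.1", "hello")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["message"]["content"])
```

Swapping `base_url` back to `http://localhost:11434` bypasses nothing — the proxy is transparent either way; only the port differs between Options A and C.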
---

## ✅ Pre-Flight Checklist

- [ ] Docker installed (`docker --version`)
- [ ] Ollama running (`curl http://localhost:11434/api/tags`)
- [ ] Qdrant running (`curl http://localhost:6333/collections`)
- [ ] Embedding model (`ollama pull snowflake-arctic-embed2`)
- [ ] Chat model (`ollama pull llama3.1`)

---

## Features

| Feature | Description |
|---------|-------------|
| 🧠 **Persistent Memory** | Conversations stored in Qdrant, retrieved contextually |
| 📅 **Scheduled Curation** | Daily + monthly cleanup of raw memories |
| 🔍 **4-Layer Context** | System + semantic + recent + current messages |
| 👤 **Configurable UID/GID** | Match container user to host for permissions |
| 🌍 **Timezone Support** | Scheduler runs in your local timezone |
| 📝 **Debug Logging** | Optional logs written to configurable directory |

---

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | `GET` | Health check |
| `/api/chat` | `POST` | Chat completion (with memory) |
| `/api/tags` | `GET` | List models |
| `/curator/run` | `POST` | Trigger curator manually |

---

## Verify Installation

```bash
# Health check
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}

# Check container
docker ps
# Expected: VeraAI running with (healthy) status

# Test chat
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"your-model","messages":[{"role":"user","content":"hello"}],"stream":false}'
```

---

## Troubleshooting

### Permission Denied

```bash
# Get your UID/GID
id

# Set in environment
APP_UID=$(id -u)
APP_GID=$(id -g)
```

### Wrong Timezone

```bash
# Set correct timezone
TZ=America/Chicago
```

---

## Source Code

- **Gitea**: https://speedyfox.app/SpeedyFoxAi/vera-ai-v2

---

## License

MIT License

---

Brought to you by SpeedyFoxAi