# Vera-AI - Persistent Memory Proxy for Ollama

**Vera** (Latin): *True* — **True AI Memory**

---

## What is Vera-AI?

Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.

**Every conversation is remembered.**
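
Because the proxy speaks the Ollama API, switching a client over is just a base-URL change. A minimal sketch (host names are illustrative, and the request is only built here, not sent):

```python
import json

# The same Ollama-style chat payload works against either endpoint.
payload = {
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,
}

direct_url = "http://ollama-host:11434/api/chat"  # talking to Ollama directly
via_vera = "http://vera-host:11434/api/chat"      # same request, routed through Vera-AI

body = json.dumps(payload).encode("utf-8")
```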

---

## How It Works

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                  REQUEST FLOW                                    │
└─────────────────────────────────────────────────────────────────────────────────┘

┌──────────┐ ──(1)──▶ ┌──────────┐ ──(3)──▶ ┌──────────┐ ──(5)──▶ ┌──────────┐
│  Client  │          │ Vera-AI  │          │  Ollama  │          │ Response │
│  (You)   │          │  Proxy   │          │   LLM    │          │ to User  │
└──────────┘          └────┬─────┘          └──────────┘          └──────────┘
                           │
                           │ (2) Query semantic memory
                           │
                           ▼
                     ┌──────────┐
                     │  Qdrant  │
                     │ Vector DB│
                     └──────────┘
                           │
                           │ (4) Store conversation turn
                           │
                           ▼
                     ┌──────────┐
                     │  Memory  │
                     │ Storage  │
                     └──────────┘
```
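
The numbered steps above can be sketched as follows. This is an illustrative outline only, not Vera's actual code: `approx_tokens`, the memory tuple format, and the budget handling are assumptions standing in for the real retrieval and token accounting.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); an assumption, not Vera's counter.
    return max(1, len(text) // 4)


def build_context(system_prompt, memories, recent_turns, current_message,
                  semantic_budget=25000, context_budget=22000):
    """Assemble the 4 layers: system + semantic memories + recent turns + current."""
    messages = [{"role": "system", "content": system_prompt}]

    # Layer 2: best-scoring memories first, until the semantic budget is spent.
    spent = 0
    for score, text in sorted(memories, reverse=True):
        cost = approx_tokens(text)
        if spent + cost > semantic_budget:
            break
        messages.append({"role": "system", "content": f"[memory] {text}"})
        spent += cost

    # Layer 3: newest turns kept, trimmed from the oldest end to fit the budget.
    spent, kept = 0, []
    for turn in reversed(recent_turns):
        cost = approx_tokens(turn["content"])
        if spent + cost > context_budget:
            break
        kept.append(turn)
        spent += cost
    messages.extend(reversed(kept))

    # Layer 4: the current user message.
    messages.append({"role": "user", "content": current_message})
    return messages


ctx = build_context(
    system_prompt="You are Vera.",
    memories=[(0.9, "User prefers metric units."), (0.7, "User lives in Chicago.")],
    recent_turns=[{"role": "user", "content": "hi"},
                  {"role": "assistant", "content": "hello"}],
    current_message="What units do I like?",
)
```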

---

## Quick Start

### Option 1: Docker Run (Single Command)

```bash
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=1000 \
  -e APP_GID=1000 \
  -e TZ=America/Chicago \
  -e VERA_DEBUG=false \
  -v ./config/config.toml:/app/config/config.toml:ro \
  -v ./prompts:/app/prompts:rw \
  -v ./logs:/app/logs:rw \
  your-username/vera-ai:latest
```

### Option 2: Docker Compose

Create `docker-compose.yml`:

```yaml
services:
  vera-ai:
    image: your-username/vera-ai:latest
    container_name: VeraAI
    restart: unless-stopped
    network_mode: host
    environment:
      - APP_UID=1000
      - APP_GID=1000
      - TZ=America/Chicago
      - VERA_DEBUG=false
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

Run with:

```bash
docker compose up -d
```

---

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `APP_UID` | `999` | Container user ID (match your host UID) |
| `APP_GID` | `999` | Container group ID (match your host GID) |
| `TZ` | `UTC` | Container timezone |
| `VERA_DEBUG` | `false` | Enable debug logging |

### config.toml

Create `config/config.toml`:

```toml
[general]
# Ollama server URL
ollama_host = "http://10.0.0.10:11434"

# Qdrant vector database URL
qdrant_host = "http://10.0.0.22:6333"

# Collection name for memories
qdrant_collection = "memories"

# Embedding model for semantic search
embedding_model = "snowflake-arctic-embed2"

# Enable debug logging (set to true for verbose logs)
debug = false

[layers]
# Token budget for semantic memory layer
semantic_token_budget = 25000

# Token budget for recent context layer
context_token_budget = 22000

# Number of recent turns to include in semantic search
semantic_search_turns = 2

# Minimum similarity score for semantic search (0.0-1.0)
semantic_score_threshold = 0.6

[curator]
# Time for daily curation (HH:MM format)
run_time = "02:00"

# Time for monthly full curation (HH:MM format)
full_run_time = "03:00"

# Day of month for full curation (1-28)
full_run_day = 1

# Model to use for curation
curator_model = "gpt-oss:120b"
```

### prompts/ Directory

Create a `prompts/` directory with:

**`prompts/curator_prompt.md`** - Prompt for memory curation:

```markdown
You are a memory curator. Your job is to summarize conversation turns
into concise Q&A pairs that will be stored for future reference.

Extract the key information and create clear, searchable entries.
```

**`prompts/systemprompt.md`** - System context for Vera:

```markdown
You are Vera, an AI with persistent memory. You remember all previous
conversations with this user and can reference them contextually.
```

---

## Docker Options Explained

| Option | Description |
|--------|-------------|
| `-d` | Run detached (background) |
| `--name VeraAI` | Container name |
| `--restart unless-stopped` | Auto-start on boot, survive reboots |
| `--network host` | Use host network (port 11434) |
| `-e APP_UID=1000` | User ID (match your host UID) |
| `-e APP_GID=1000` | Group ID (match your host GID) |
| `-e TZ=America/Chicago` | Timezone for scheduler |
| `-e VERA_DEBUG=false` | Disable debug logging |
| `-v ...config.toml:ro` | Config file (read-only) |
| `-v ...prompts:rw` | Prompts directory (read-write) |
| `-v ...logs:rw` | Logs directory (read-write) |

---

## 📋 Prerequisites

### Required Services

| Service | Version | Description |
|---------|---------|-------------|
| **Ollama** | 0.1.x+ | LLM inference server |
| **Qdrant** | 1.6.x+ | Vector database |
| **Docker** | 20.x+ | Container runtime |

### System Requirements

| Requirement | Minimum | Recommended |
|-------------|---------|-------------|
| **CPU** | 2 cores | 4+ cores |
| **RAM** | 2 GB | 4+ GB |
| **Disk** | 1 GB | 5+ GB |


---

## 🔧 Installing with Ollama

### Option A: All on Same Host (Recommended)

Install all services on a single machine:

```bash
# 1. Install Ollama
curl https://ollama.ai/install.sh | sh

# 2. Pull required models
ollama pull snowflake-arctic-embed2   # Embedding model (required)
ollama pull llama3.1                  # Chat model

# 3. Run Qdrant in Docker
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# 4. Run Vera-AI
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=$(id -u) \
  -e APP_GID=$(id -g) \
  -e TZ=America/Chicago \
  -v ./config/config.toml:/app/config/config.toml:ro \
  -v ./prompts:/app/prompts:rw \
  -v ./logs:/app/logs:rw \
  your-username/vera-ai:latest
```

**Config for same-host (config/config.toml):**

```toml
[general]
ollama_host = "http://127.0.0.1:11434"
qdrant_host = "http://127.0.0.1:6333"
qdrant_collection = "memories"
embedding_model = "snowflake-arctic-embed2"
```

### Option B: Docker Compose All-in-One

```yaml
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: [ollama_data:/root/.ollama]

  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
    volumes: [qdrant_data:/qdrant/storage]

  vera-ai:
    image: your-username/vera-ai:latest
    network_mode: host
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw

volumes:
  ollama_data:
  qdrant_data:
```

### Option C: Different Port

If Ollama already occupies port 11434 on the host, publish Vera on another port (e.g. 8080) with a port mapping instead of `--network host`:

```bash
docker run -d --name VeraAI -p 8080:11434 ...
# Connect client to: http://localhost:8080
```

---

## ✅ Pre-Flight Checklist

- [ ] Docker installed (`docker --version`)
- [ ] Ollama running (`curl http://localhost:11434/api/tags`)
- [ ] Qdrant running (`curl http://localhost:6333/collections`)
- [ ] Embedding model pulled (`ollama pull snowflake-arctic-embed2`)
- [ ] Chat model pulled (`ollama pull llama3.1`)

---

## Features

| Feature | Description |
|---------|-------------|
| 🧠 **Persistent Memory** | Conversations stored in Qdrant, retrieved contextually |
| 📅 **Scheduled Curation** | Daily + monthly cleanup of raw memories |
| 🔍 **4-Layer Context** | System + semantic + recent + current messages |
| 👤 **Configurable UID/GID** | Match container user to host for permissions |
| 🌍 **Timezone Support** | Scheduler runs in your local timezone |
| 📝 **Debug Logging** | Optional logs written to configurable directory |

---
## API Endpoints
|
|
|
|
| Endpoint | Method | Description |
|
|
|----------|--------|-------------|
|
|
| `/` | `GET` | Health check |
|
|
| `/api/chat` | `POST` | Chat completion (with memory) |
|
|
| `/api/tags` | `GET` | List models |
|
|
| `/curator/run` | `POST` | Trigger curator manually |
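
For example, the curator endpoint takes a plain POST with no body. The snippet below only builds the request (sending it requires a running instance); the host and port assume the host-networking setup shown earlier:

```python
import urllib.request

# Manual curation trigger; no request body is needed.
req = urllib.request.Request("http://localhost:11434/curator/run", method="POST")

# To actually send it against a running instance:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```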

---

## Verify Installation

```bash
# Health check
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}

# Check container
docker ps
# Expected: VeraAI running with (healthy) status

# Test chat
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"your-model","messages":[{"role":"user","content":"hello"}],"stream":false}'
```

---

## Troubleshooting

### Permission Denied

```bash
# Get your UID/GID
id

# Set in environment
APP_UID=$(id -u)
APP_GID=$(id -g)
```

### Wrong Timezone

```bash
# Set correct timezone
TZ=America/Chicago
```

---

## Source Code

- **Gitea**: https://speedyfox.app/SpeedyFoxAi/vera-ai-v2

---

## License

MIT License

---

Brought to you by SpeedyFoxAi