SpeedyFoxAi/vera-ai-v2

Vera-AI 7db83f096f Update Gitea URL to public internet address

2026-03-26 15:34:38 -05:00

Vera-AI

Vera (Latin): True — True AI

Persistent Memory Proxy for Ollama

A transparent proxy that gives your AI conversations lasting memory.

Vera-AI sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.

Every conversation is stored in a Qdrant vector database and retrieved contextually, giving your AI true memory.

🔄 How It Works

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              REQUEST FLOW                                        │
└─────────────────────────────────────────────────────────────────────────────────┘

    ┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
    │  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
    │  (You)   │         │  Proxy   │         │   LLM    │         │  to User │
    └──────────┘         └────┬─────┘         └──────────┘         └──────────┘
                              │
                              │ (2) Query semantic memory
                              │
                              ▼
                       ┌──────────┐
                       │ Qdrant   │
                       │ Vector DB│
                       └──────────┘
                              │
                              │ (4) Store conversation turn
                              │
                              ▼
                       ┌──────────┐
                       │ Memory   │
                       │ Storage  │
                       └──────────┘
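
The numbered steps in the diagram can be read as a small pipeline. The sketch below is illustrative only and is not taken from the Vera-AI source: InMemoryStore, fake_llm, and handle_chat are hypothetical stand-ins for Qdrant, Ollama, and the proxy handler, and simple keyword overlap replaces real embedding search.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for Qdrant: keeps turns in a plain list."""
    turns: list[str] = field(default_factory=list)

    def search(self, query: str, limit: int = 3) -> list[str]:
        # Keyword overlap instead of vector similarity, for illustration only.
        words = set(query.lower().split())
        hits = [t for t in self.turns if words & set(t.lower().split())]
        return hits[:limit]

    def add(self, text: str) -> None:
        self.turns.append(text)


def fake_llm(messages: list[dict]) -> str:
    """Hypothetical stand-in for Ollama: reports how many messages it saw."""
    return f"answer based on {len(messages)} messages"


def handle_chat(user_text: str, store: InMemoryStore, llm=fake_llm) -> str:
    memories = store.search(user_text)                  # (2) query memory
    messages = [{"role": "system", "content": m} for m in memories]
    messages.append({"role": "user", "content": user_text})
    reply = llm(messages)                               # (3) forward to the LLM
    store.add(f"Q: {user_text} A: {reply}")             # (4) store the turn
    return reply                                        # (5) respond to the client


store = InMemoryStore()
first = handle_chat("my dog is named Rex", store)
second = handle_chat("what is my dog called", store)
```

The second call sees one extra message because the first turn was stored and then matched by the search, which is the behaviour the diagram describes.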

🌟 Features

Feature	Description
🧠 Persistent Memory	Conversations stored in Qdrant, retrieved contextually
📅 Scheduled Curation	Daily and monthly curation of raw memories
🔍 4-Layer Context	System + semantic + recent + current messages
👤 Configurable UID/GID	Match container user to host for permissions
🌍 Timezone Support	Scheduler runs in your local timezone
📝 Debug Logging	Optional logs written to configurable directory
🐳 Docker Ready	One-command build and run

📋 Prerequisites

Required Services

Service	Version	Description
Ollama	0.1.x+	LLM inference server
Qdrant	1.6.x+	Vector database
Docker	20.x+	Container runtime

System Requirements

Requirement	Minimum	Recommended
CPU	2 cores	4+ cores
RAM	2 GB	4+ GB
Disk	1 GB	5+ GB

🔧 Installing with Ollama

Option A: All on Same Host (Recommended)

Install all services on a single machine:

# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull required models
ollama pull snowflake-arctic-embed2  # Embedding model (required)
ollama pull llama3.1                   # Chat model

# 3. Run Qdrant in Docker
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# 4. Run Vera-AI
docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=$(id -u) \
  -e APP_GID=$(id -g) \
  -e TZ=America/Chicago \
  -v "$(pwd)/config/config.toml:/app/config/config.toml:ro" \
  -v "$(pwd)/prompts:/app/prompts:rw" \
  -v "$(pwd)/logs:/app/logs:rw" \
  your-username/vera-ai:latest

Config for same-host (config/config.toml):

[general]
ollama_host = "http://127.0.0.1:11434"
qdrant_host = "http://127.0.0.1:6333"
qdrant_collection = "memories"
embedding_model = "snowflake-arctic-embed2"

Option B: Docker Compose All-in-One

services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]
    volumes: [ollama_data:/root/.ollama]

  qdrant:
    image: qdrant/qdrant
    ports: ["6333:6333"]
    volumes: [qdrant_data:/qdrant/storage]

  vera-ai:
    image: your-username/vera-ai:latest
    network_mode: host
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
volumes:
  ollama_data:
  qdrant_data:

Option C: Different Port

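If Ollama already occupies port 11434 on the host, publish Vera on port 8080 instead. Port publishing (-p) has no effect with --network host, so omit that flag in this setup: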

docker run -d --name VeraAI -p 8080:11434 ...
# Connect client to: http://localhost:8080

✅ Pre-Flight Checklist

Docker installed (docker --version)
Ollama running (curl http://localhost:11434/api/tags)
Qdrant running (curl http://localhost:6333/collections)
Embedding model (ollama pull snowflake-arctic-embed2)
Chat model (ollama pull llama3.1)
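
The two service checks above can also be scripted. This is a minimal sketch, assuming the default localhost URLs from the checklist; http_ok and preflight are hypothetical helper names, not part of Vera-AI.

```python
import http.client
import urllib.error
import urllib.request

# URLs from the checklist above; adjust if your services run elsewhere.
CHECKS = {
    "Ollama": "http://localhost:11434/api/tags",
    "Qdrant": "http://localhost:6333/collections",
}


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def preflight(checks: dict[str, str], probe=http_ok) -> dict[str, bool]:
    """Probe every service and report which ones responded."""
    return {name: probe(url) for name, url in checks.items()}


if __name__ == "__main__":
    for name, ok in preflight(CHECKS).items():
        print(f"{name}: {'ok' if ok else 'NOT reachable'}")
```

The probe function is injectable so the logic can be exercised without live services.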

🐳 Docker Deployment

Option 1: Docker Run (Single Command)

docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=1000 \
  -e APP_GID=1000 \
  -e TZ=America/Chicago \
  -e VERA_DEBUG=false \
  -v "$(pwd)/config/config.toml:/app/config/config.toml:ro" \
  -v "$(pwd)/prompts:/app/prompts:rw" \
  -v "$(pwd)/logs:/app/logs:rw" \
  your-username/vera-ai:latest

Option 2: Docker Compose

Create docker-compose.yml:

services:
  vera-ai:
    image: your-username/vera-ai:latest
    container_name: VeraAI
    restart: unless-stopped
    network_mode: host
    environment:
      - APP_UID=1000
      - APP_GID=1000
      - TZ=America/Chicago
      - VERA_DEBUG=false
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

Run with:

docker compose up -d

Docker Options Explained

Option	Description
`-d`	Run detached (background)
`--name VeraAI`	Container name
`--restart unless-stopped`	Restart automatically (including after a reboot) unless manually stopped
`--network host`	Use host network (port 11434)
`-e APP_UID=1000`	User ID (match your host UID)
`-e APP_GID=1000`	Group ID (match your host GID)
`-e TZ=America/Chicago`	Timezone for scheduler
`-e VERA_DEBUG=false`	Disable debug logging
`-v ...:ro`	Config file (read-only)
`-v ...:rw`	Prompts and logs (read-write)

⚙️ Configuration

Environment Variables

Variable	Default	Description
`APP_UID`	`999`	Container user ID (match host)
`APP_GID`	`999`	Container group ID (match host)
`TZ`	`UTC`	Container timezone
`VERA_DEBUG`	`false`	Enable debug logging
`OPENROUTER_API_KEY`	-	Cloud model routing key
`VERA_CONFIG_DIR`	`/app/config`	Config directory
`VERA_PROMPTS_DIR`	`/app/prompts`	Prompts directory
`VERA_LOG_DIR`	`/app/logs`	Debug logs directory
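
As an illustration of how these variables and their defaults fit together, here is a hedged sketch of reading them in Python. The Settings class and load_settings function are hypothetical, and the set of strings treated as "true" for VERA_DEBUG is an assumption; the actual parser in the project may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    uid: int
    gid: int
    debug: bool
    config_dir: str
    prompts_dir: str
    log_dir: str


def load_settings(env=os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    return Settings(
        uid=int(env.get("APP_UID", "999")),
        gid=int(env.get("APP_GID", "999")),
        # Assumption: "true", "1" and "yes" enable debug logging.
        debug=env.get("VERA_DEBUG", "false").strip().lower() in {"true", "1", "yes"},
        config_dir=env.get("VERA_CONFIG_DIR", "/app/config"),
        prompts_dir=env.get("VERA_PROMPTS_DIR", "/app/prompts"),
        log_dir=env.get("VERA_LOG_DIR", "/app/logs"),
    )


defaults = load_settings({})
custom = load_settings({"APP_UID": "1000", "APP_GID": "1000", "VERA_DEBUG": "True"})
```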

config.toml

Create config/config.toml with all settings:

[general]
# ═══════════════════════════════════════════════════════════════
# General Settings
# ═══════════════════════════════════════════════════════════════

# Ollama server URL
ollama_host = "http://10.0.0.10:11434"

# Qdrant vector database URL
qdrant_host = "http://10.0.0.22:6333"

# Collection name for memories
qdrant_collection = "memories"

# Embedding model for semantic search
embedding_model = "snowflake-arctic-embed2"

# Enable debug logging (set to true for verbose logs)
debug = false

[layers]
# ═══════════════════════════════════════════════════════════════
# Context Layer Settings
# ═══════════════════════════════════════════════════════════════

# Token budget for semantic memory layer
# Controls how much curated memory can be included
semantic_token_budget = 25000

# Token budget for recent context layer
# Controls how much recent conversation can be included
context_token_budget = 22000

# Number of recent turns to include in semantic search
# Higher = more context, but slower
semantic_search_turns = 2

# Minimum similarity score for semantic search (0.0-1.0)
# Higher = more relevant results, but fewer matches
semantic_score_threshold = 0.6

[curator]
# ═══════════════════════════════════════════════════════════════
# Curation Settings
# ═══════════════════════════════════════════════════════════════

# Time for daily curation (HH:MM format, 24-hour)
# Processes raw memories from last 24h
run_time = "02:00"

# Time for monthly full curation (HH:MM format, 24-hour)
# Processes ALL raw memories
full_run_time = "03:00"

# Day of month for full curation (1-28)
full_run_day = 1

# Model to use for curation
# Should be a capable model for summarization
curator_model = "gpt-oss:120b"

prompts/ Directory

Create a prompts/ directory containing:

prompts/curator_prompt.md - Prompt for memory curation:

You are a memory curator. Your job is to summarize conversation turns 
into concise Q&A pairs that will be stored for future reference.

Extract the key information and create clear, searchable entries.
Focus on facts, decisions, and important context.

prompts/systemprompt.md - System context for Vera:

You are Vera, an AI with persistent memory. You remember all previous 
conversations with this user and can reference them contextually.

Use the provided context to give informed, personalized responses.

🚀 Quick Start (From Source)

# 1. Clone
git clone https://speedyfox.app/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2

# 2. Configure
cp .env.example .env
nano .env                    # Set APP_UID, APP_GID, TZ

# 3. Create directories and config
mkdir -p config prompts logs
cp config.toml config/
nano config/config.toml     # Set ollama_host, qdrant_host

# 4. Create prompts
nano prompts/curator_prompt.md
nano prompts/systemprompt.md

# 5. Run
docker compose build
docker compose up -d

# 6. Test
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}

📖 Full Setup Guide

Step 1: Clone Repository

git clone https://speedyfox.app/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2

Step 2: Environment Configuration

Create a .env file:

# User/Group Configuration
# Run `id -u` and `id -g` to find your UID and GID
APP_UID=1000
APP_GID=1000

# Timezone Configuration
TZ=America/Chicago

# Debug Logging
VERA_DEBUG=false

Step 3: Directory Structure

# Create required directories
mkdir -p config prompts logs

# Copy default configuration
cp config.toml config/

# Verify prompts exist
ls -la prompts/
# Should show: curator_prompt.md, systemprompt.md

Step 4: Configure Services

Edit config/config.toml (see full example above)

Step 5: Build and Run

# Build with your UID/GID
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build

# Start container
docker compose up -d

# Check status
docker ps
docker logs VeraAI --tail 20

Step 6: Verify Installation

# Health check
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}

# Container status
docker ps --format "table {{.Names}}\t{{.Status}}"
# Expected: VeraAI   Up X minutes (healthy)

# Timezone
docker exec VeraAI date
# Should show your timezone (e.g., CDT for America/Chicago)

# User permissions
docker exec VeraAI id
# Expected: uid=1000(appuser) gid=1000(appgroup)

# Directories
docker exec VeraAI ls -la /app/prompts/
# Should show: curator_prompt.md, systemprompt.md

# Test chat (use a model available to your Ollama instance, e.g. llama3.1)
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"hello"}],"stream":false}'
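
The same test chat can be issued from Python. This is a minimal sketch using only the standard library; build_chat_request and send are hypothetical helper names, and the model name is an assumption that should match a model you have pulled. The response shape in the commented lines follows Ollama's non-streaming /api/chat format, which a transparent proxy would be expected to pass through.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST to /api/chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_chat_request("http://localhost:11434", "llama3.1", "hello")
# reply = send(req)                      # uncomment once Vera-AI is running
# print(reply["message"]["content"])     # assistant text in Ollama's format
```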

📁 Volume Mappings

Host Path	Container Path	Mode	Purpose
`./config/config.toml`	`/app/config/config.toml`	`ro`	Configuration
`./prompts/`	`/app/prompts/`	`rw`	Curator and system prompts
`./logs/`	`/app/logs/`	`rw`	Debug logs

Directory Structure

vera-ai-v2/
├── config/
│   └── config.toml        # Main configuration
├── prompts/
│   ├── curator_prompt.md  # Memory curation prompt
│   └── systemprompt.md    # System context
├── logs/                  # Debug logs (when debug=true)
├── app/
│   ├── main.py            # FastAPI application
│   ├── config.py          # Configuration loader
│   ├── curator.py         # Memory curation
│   ├── proxy_handler.py   # Chat handling
│   ├── qdrant_service.py  # Vector operations
│   ├── singleton.py       # QdrantService singleton
│   └── utils.py           # Utilities
├── static/                # Legacy symlinks
├── .env.example           # Environment template
├── docker-compose.yml     # Docker Compose
├── Dockerfile             # Container definition
├── requirements.txt       # Python dependencies
└── README.md              # This file

🌍 Timezone Configuration

The TZ variable sets the container timezone for the scheduler:

# Common timezones
TZ=UTC                  # Coordinated Universal Time
TZ=America/New_York     # Eastern Time
TZ=America/Chicago      # Central Time
TZ=America/Los_Angeles  # Pacific Time
TZ=Europe/London        # GMT/BST

Curation Schedule:

Schedule	Time	Scope	Frequency
Daily	02:00	Recent 24h	Every day
Monthly	03:00 on 1st	ALL raw memories	1st of month
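
To see when the next runs would fall for a given clock time, the schedule can be worked out with plain datetime arithmetic. This sketch is not the project's scheduler; next_daily and next_monthly are hypothetical helpers that take the run_time, full_run_time, and full_run_day values from config.toml as arguments.

```python
from datetime import datetime, time, timedelta


def next_daily(now: datetime, run_time: str = "02:00") -> datetime:
    """Next occurrence of the daily HH:MM slot strictly after `now`."""
    hh, mm = map(int, run_time.split(":"))
    candidate = datetime.combine(now.date(), time(hh, mm), tzinfo=now.tzinfo)
    return candidate if candidate > now else candidate + timedelta(days=1)


def next_monthly(now: datetime, run_time: str = "03:00", day: int = 1) -> datetime:
    """Next occurrence of HH:MM on the given day of month (1-28) after `now`."""
    hh, mm = map(int, run_time.split(":"))
    candidate = now.replace(day=day, hour=hh, minute=mm, second=0, microsecond=0)
    if candidate <= now:
        # Roll over to the following month; a day <= 28 exists in every month.
        if now.month == 12:
            year, month = now.year + 1, 1
        else:
            year, month = now.year, now.month + 1
        candidate = candidate.replace(year=year, month=month)
    return candidate


now = datetime(2026, 3, 26, 15, 34)
daily = next_daily(now)      # 2026-03-27 02:00
monthly = next_monthly(now)  # 2026-04-01 03:00
```

Passing a timezone-aware `now` (for example one built with zoneinfo for the TZ value) keeps the results in that timezone.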

🔌 API Endpoints

Endpoint	Method	Description
`/`	`GET`	Health check
`/api/chat`	`POST`	Chat completion (with memory)
`/api/tags`	`GET`	List available models
`/api/generate`	`POST`	Generate completion
`/curator/run`	`POST`	Trigger curator manually

Manual Curation

# Daily curation (recent 24h)
curl -X POST http://localhost:11434/curator/run

# Full curation (all raw memories)
curl -X POST "http://localhost:11434/curator/run?full=true"

🧠 Memory System

4-Layer Context Build

┌─────────────────────────────────────────────────────────────┐
│ Layer 1: System Prompt                                      │
│   • From prompts/systemprompt.md                            │
│   • Preserved unchanged, passed through                     │
├─────────────────────────────────────────────────────────────┤
│ Layer 2: Semantic Memory                                    │
│   • Query Qdrant with user question                         │
│   • Retrieve curated Q&A pairs by relevance                 │
│   • Limited by semantic_token_budget                        │
├─────────────────────────────────────────────────────────────┤
│ Layer 3: Recent Context                                     │
│   • Last N conversation turns from Qdrant                   │
│   • Chronological order, recent memories first              │
│   • Limited by context_token_budget                         │
├─────────────────────────────────────────────────────────────┤
│ Layer 4: Current Messages                                   │
│   • User message from current request                       │
│   • Passed through unchanged                                │
└─────────────────────────────────────────────────────────────┘
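
The layering above can be sketched in a few lines. Everything here is an assumption made for illustration: the Memory class, the whitespace-based count_tokens estimate, the use of the system role for injected memories, and the build_context function itself. The actual logic in proxy_handler.py may differ.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    score: float  # similarity reported by the vector search, 0.0-1.0


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a crude token estimate.
    return len(text.split())


def take_within_budget(texts: list[str], budget: int) -> list[str]:
    """Keep texts in order until the next one would exceed the token budget."""
    kept, used = [], 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept


def build_context(system_prompt, memories, recent_turns, current_messages,
                  semantic_budget=25000, context_budget=22000, threshold=0.6):
    """Assemble the four layers into one message list."""
    relevant = sorted((m for m in memories if m.score >= threshold),
                      key=lambda m: m.score, reverse=True)
    semantic = take_within_budget([m.text for m in relevant], semantic_budget)
    recent = take_within_budget(recent_turns, context_budget)
    messages = [{"role": "system", "content": system_prompt}]         # layer 1
    messages += [{"role": "system", "content": s} for s in semantic]  # layer 2
    messages += [{"role": "system", "content": r} for r in recent]    # layer 3
    messages += current_messages                                      # layer 4
    return messages


context = build_context(
    "You are Vera.",
    [Memory("User likes tea", 0.9), Memory("Unrelated note", 0.3),
     Memory("User lives in Austin", 0.7)],
    ["Q: hi A: hello"],
    [{"role": "user", "content": "What do I drink?"}],
)
```

The memory scored 0.3 falls below the 0.6 threshold and is dropped, which mirrors the role of semantic_score_threshold in config.toml.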

Memory Types

Type	Description	Retention
`raw`	Unprocessed conversation turns	Until curation
`curated`	Cleaned Q&A pairs	Permanent
`test`	Test entries	Can be ignored

Curation Process

Daily (02:00): Processes raw memories from last 24h into curated Q&A pairs
Monthly (03:00 on 1st): Processes ALL remaining raw memories for full cleanup
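
The difference between the two runs comes down to which raw memories are selected. The following sketch shows that selection on plain dictionaries. The field names type and created, and the select_for_curation function, are hypothetical and do not reflect the actual Qdrant payload schema.

```python
from datetime import datetime, timedelta


def select_for_curation(memories, now, full=False):
    """Pick raw memories for a run: last 24h for daily, everything raw for full."""
    cutoff = now - timedelta(hours=24)
    return [
        m for m in memories
        if m["type"] == "raw" and (full or m["created"] >= cutoff)
    ]


now = datetime(2026, 3, 26, 2, 0)
memories = [
    {"id": 1, "type": "raw", "created": datetime(2026, 3, 25, 20, 0)},
    {"id": 2, "type": "raw", "created": datetime(2026, 3, 10, 9, 0)},
    {"id": 3, "type": "curated", "created": datetime(2026, 3, 25, 21, 0)},
]
daily_ids = [m["id"] for m in select_for_curation(memories, now)]
full_ids = [m["id"] for m in select_for_curation(memories, now, full=True)]
```

Here the daily run picks only the recent raw entry, while the full run also picks up the older raw entry that earlier daily runs left behind.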

🔧 Troubleshooting

Permission Denied

# Check your UID/GID
id

# Rebuild with correct values
APP_UID=$(id -u) APP_GID=$(id -g) docker compose build --no-cache
docker compose up -d

Wrong Timezone

# Check container time
docker exec VeraAI date

# Fix in .env
TZ=America/Chicago

Health Check Failing

# Check logs
docker logs VeraAI --tail 50

# Test Ollama connectivity
docker exec VeraAI python -c "
import urllib.request
print(urllib.request.urlopen('http://YOUR_OLLAMA_IP:11434/').read())
"

# Test Qdrant connectivity
docker exec VeraAI python -c "
import urllib.request
print(urllib.request.urlopen('http://YOUR_QDRANT_IP:6333/').read())
"

Port Already in Use

# Check what's using port 11434
sudo lsof -i :11434

# Stop conflicting service or change port in config

🛠️ Development

Build from Source

git clone https://speedyfox.app/SpeedyFoxAi/vera-ai-v2.git
cd vera-ai-v2
pip install -r requirements.txt
docker compose build

Run Tests

# Health check
curl http://localhost:11434/

# Non-streaming chat (use a model available to your Ollama instance, e.g. llama3.1)
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"test"}],"stream":false}'

# Trigger curation
curl -X POST http://localhost:11434/curator/run

📄 License

MIT License - see LICENSE file for details.

🤝 Support

Resource	Link
Repository	https://speedyfox.app/SpeedyFoxAi/vera-ai-v2
Issues	https://speedyfox.app/SpeedyFoxAi/vera-ai-v2/issues

Vera-AI — True AI Memory

Brought to you by SpeedyFoxAi
