SpeedyFoxAi/vera-ai-v2


Vera-AI 8af6188401 Enhance DOCKERHUB Prerequisites with detailed setup

2026-03-26 15:00:35 -05:00


Vera-AI - Persistent Memory Proxy for Ollama

Vera (Latin): True — True AI Memory

What is Vera-AI?

Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.

Every conversation is remembered.

How It Works

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              REQUEST FLOW                                        │
└─────────────────────────────────────────────────────────────────────────────────┘

    ┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
    │  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
    │  (You)   │         │  Proxy   │         │   LLM    │         │  to User │
    └──────────┘         └────┬─────┘         └──────────┘         └──────────┘
                              │
                              │ (2) Query semantic memory
                              │
                              ▼
                       ┌──────────┐
                       │ Qdrant   │
                       │ Vector DB│
                       └──────────┘
                              │
                              │ (4) Store conversation turn
                              │
                              ▼
                       ┌──────────┐
                       │ Memory   │
                       │ Storage  │
                       └──────────┘
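
The numbered steps in the diagram can be sketched as a single function. This is a minimal illustration, not Vera-AI's actual code: `handle_chat` and the three callables (`search_memory`, `call_ollama`, `store_turn`) are hypothetical names standing in for the Qdrant lookup, the Ollama call, and the memory write.

```python
from typing import Callable

Message = dict[str, str]


def handle_chat(
    messages: list[Message],
    search_memory: Callable[[str], list[str]],
    call_ollama: Callable[[list[Message]], str],
    store_turn: Callable[[str, str], None],
) -> str:
    """Run one proxied chat turn following steps (1)-(5) of the diagram."""
    # (1) The client request arrives; the last message is the user's question.
    question = messages[-1]["content"]

    # (2) Query semantic memory for context related to the question.
    memories = search_memory(question)

    # (3) Forward the request to Ollama with any memories prepended.
    augmented = list(messages)
    if memories:
        context = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
        augmented.insert(0, {"role": "system", "content": context})
    answer = call_ollama(augmented)

    # (4) Store the new turn so later sessions can retrieve it.
    store_turn(question, answer)

    # (5) Return the model's answer to the client.
    return answer
```

Passing the three dependencies as callables keeps the flow easy to try out with stand-in functions before any real service is running.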

Quick Start

Option 1: Docker Run (Single Command)

docker run -d \
  --name VeraAI \
  --restart unless-stopped \
  --network host \
  -e APP_UID=1000 \
  -e APP_GID=1000 \
  -e TZ=America/Chicago \
  -e VERA_DEBUG=false \
  -v "$(pwd)/config/config.toml:/app/config/config.toml:ro" \
  -v "$(pwd)/prompts:/app/prompts:rw" \
  -v "$(pwd)/logs:/app/logs:rw" \
  your-username/vera-ai:latest

Option 2: Docker Compose

Create docker-compose.yml:

services:
  vera-ai:
    image: your-username/vera-ai:latest
    container_name: VeraAI
    restart: unless-stopped
    network_mode: host
    environment:
      - APP_UID=1000
      - APP_GID=1000
      - TZ=America/Chicago
      - VERA_DEBUG=false
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s

Run with:

docker compose up -d

Configuration

Environment Variables

Variable	Default	Description
`APP_UID`	`999`	Container user ID (match your host UID)
`APP_GID`	`999`	Container group ID (match your host GID)
`TZ`	`UTC`	Container timezone
`VERA_DEBUG`	`false`	Enable debug logging

config.toml

Create config/config.toml:

[general]
# Ollama server URL
ollama_host = "http://10.0.0.10:11434"

# Qdrant vector database URL
qdrant_host = "http://10.0.0.22:6333"

# Collection name for memories
qdrant_collection = "memories"

# Embedding model for semantic search
embedding_model = "snowflake-arctic-embed2"

# Enable debug logging (set to true for verbose logs)
debug = false

[layers]
# Token budget for semantic memory layer
semantic_token_budget = 25000

# Token budget for recent context layer
context_token_budget = 22000

# Number of recent turns to include in semantic search
semantic_search_turns = 2

# Minimum similarity score for semantic search (0.0-1.0)
semantic_score_threshold = 0.6

[curator]
# Time for daily curation (HH:MM format)
run_time = "02:00"

# Time for monthly full curation (HH:MM format)
full_run_time = "03:00"

# Day of month for full curation (1-28)
full_run_day = 1

# Model to use for curation
curator_model = "gpt-oss:120b"

prompts/ Directory

Create a prompts/ directory containing the following two files:

prompts/curator_prompt.md - Prompt for memory curation:

You are a memory curator. Your job is to summarize conversation turns 
into concise Q&A pairs that will be stored for future reference.

Extract the key information and create clear, searchable entries.

prompts/systemprompt.md - System context for Vera:

You are Vera, an AI with persistent memory. You remember all previous 
conversations with this user and can reference them contextually.

Docker Options Explained

Option	Description
`-d`	Run detached (background)
`--name VeraAI`	Container name
`--restart unless-stopped`	Restart automatically (including after reboots) unless explicitly stopped
`--network host`	Use the host network (Vera-AI is reached on port 11434)
`-e APP_UID=1000`	User ID (match your host UID)
`-e APP_GID=1000`	Group ID (match your host GID)
`-e TZ=America/Chicago`	Timezone for scheduler
`-e VERA_DEBUG=false`	Disable debug logging
`-v ...config.toml:ro`	Config file (read-only)
`-v ...prompts:rw`	Prompts directory (read-write)
`-v ...logs:rw`	Logs directory (read-write)

Prerequisites

Required Services

Service	Version	Description	Install
Ollama	0.1.x+	LLM inference server	`curl https://ollama.ai/install.sh | sh`
Qdrant	1.6.x+	Vector database	`docker run -p 6333:6333 qdrant/qdrant`
Docker	20.x+	Container runtime	`curl -fsSL https://get.docker.com | sh`

System Requirements

Requirement	Minimum	Recommended
CPU	2 cores	4+ cores
RAM	2 GB	4+ GB
Disk	1 GB	5+ GB

Ollama Setup

# Install Ollama
curl https://ollama.ai/install.sh | sh

# Pull embedding model (required)
ollama pull snowflake-arctic-embed2

# Pull chat model
ollama pull llama3.1

# Verify
curl http://localhost:11434/api/tags

Qdrant Setup

# Docker run
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

# Verify
curl http://localhost:6333/collections

Embedding Models

Model	Dimensions	Notes
`snowflake-arctic-embed2`	1024	Recommended
`nomic-embed-text`	768	Alternative
`mxbai-embed-large`	1024	Fast
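
The Dimensions column matters because a Qdrant collection is created with a fixed vector size, so switching to a model with a different dimension needs a new collection. The sketch below builds the request for Qdrant's `PUT /collections/{name}` REST endpoint. It is illustrative only: Vera-AI may create the collection itself, and the `Cosine` distance and the helper name `create_collection_request` are assumptions.

```python
import json
import urllib.request

# Dimensions copied from the table above.
EMBEDDING_DIMENSIONS = {
    "snowflake-arctic-embed2": 1024,
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
}


def create_collection_request(qdrant_host, collection, model, distance="Cosine"):
    """Build (but do not send) a request that creates a Qdrant collection
    sized for the given embedding model."""
    size = EMBEDDING_DIMENSIONS[model]
    body = json.dumps({"vectors": {"size": size, "distance": distance}})
    return urllib.request.Request(
        f"{qdrant_host.rstrip('/')}/collections/{collection}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

To send it, pass the returned object to `urllib.request.urlopen`, for example with `create_collection_request("http://10.0.0.22:6333", "memories", "snowflake-arctic-embed2")`.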

Pre-Flight Checklist

Docker installed (docker --version)
Ollama running (curl http://OLLAMA_IP:11434/api/tags)
Qdrant running (curl http://QDRANT_IP:6333/collections)
Embedding model pulled (ollama pull snowflake-arctic-embed2)
UID/GID noted (id command)
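
The two service checks in the list can also be scripted. The following is a small sketch that requests the same URLs as the curl commands above; `is_reachable` and `preflight` are hypothetical helper names, and the `opener` parameter exists only so the functions can be exercised without a network.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a GET to url answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


def preflight(ollama_host, qdrant_host, opener=urllib.request.urlopen):
    """Check the same endpoints as the curl commands in the checklist."""
    return {
        "ollama": is_reachable(f"{ollama_host}/api/tags", opener=opener),
        "qdrant": is_reachable(f"{qdrant_host}/collections", opener=opener),
    }
```

For example, `preflight("http://10.0.0.10:11434", "http://10.0.0.22:6333")` returns a dictionary with one boolean per service.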

Features

Feature	Description
🧠 Persistent Memory	Conversations stored in Qdrant, retrieved contextually
📅 Scheduled Curation	Daily curation plus a monthly full cleanup of raw memories
🔍 4-Layer Context	System + semantic + recent + current messages
👤 Configurable UID/GID	Match container user to host for permissions
🌍 Timezone Support	Scheduler runs in your local timezone
📝 Debug Logging	Optional logs written to configurable directory
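
To make the 4-layer context concrete, here is a rough sketch of how the layers could be assembled under the token budgets and score threshold from config.toml. It is an assumption-laden illustration rather than Vera-AI's implementation: the four-characters-per-token estimate, the function names, and the exact trimming policy are all hypothetical.

```python
def estimate_tokens(text):
    """Rough token estimate (about 4 characters per token); an assumption."""
    return max(1, len(text) // 4)


def fit_budget(texts, budget):
    """Keep items in order until the next one would exceed the token budget."""
    kept, used = [], 0
    for text in texts:
        cost = estimate_tokens(text)
        if used + cost > budget:
            break
        kept.append(text)
        used += cost
    return kept


def build_context(system_prompt, hits, recent_turns, current,
                  semantic_budget=25000, context_budget=22000, threshold=0.6):
    """Assemble system + semantic + recent + current layers.

    hits is a list of (score, text) pairs from the vector search;
    recent_turns is a list of texts, oldest first.
    """
    # Semantic layer: drop hits below the threshold, highest score first.
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    relevant = [text for score, text in ranked if score >= threshold]
    semantic = fit_budget(relevant, semantic_budget)

    # Recent layer: prefer the newest turns, then restore chronological order.
    recent = fit_budget(list(reversed(recent_turns)), context_budget)[::-1]

    return [system_prompt, *semantic, *recent, current]
```

The default arguments mirror `semantic_token_budget`, `context_token_budget`, and `semantic_score_threshold` from the sample configuration.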

API Endpoints

Endpoint	Method	Description
`/`	`GET`	Health check
`/api/chat`	`POST`	Chat completion (with memory)
`/api/tags`	`GET`	List models
`/curator/run`	`POST`	Trigger curator manually

Verify Installation

# Health check
curl http://localhost:11434/
# Expected: {"status":"ok","ollama":"reachable"}

# Check container
docker ps
# Expected: VeraAI running with (healthy) status

# Test chat
curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"your-model","messages":[{"role":"user","content":"hello"}],"stream":false}'
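
The same chat test can be run from Python using only the standard library. This sketch assumes Vera-AI passes Ollama's non-streaming response through unchanged, so the reply text sits under `message.content`; the function names are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build the same POST /api/chat request as the curl command above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(raw_body):
    """Return the assistant text from a non-streaming chat response body."""
    return json.loads(raw_body)["message"]["content"]


def chat(base_url, model, prompt, timeout=120.0):
    """Send one prompt through the proxy and return the reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())
```

With the container running, `chat("http://localhost:11434", "your-model", "hello")` sends the request shown in the curl example.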

Troubleshooting

Permission Denied

# Get your UID/GID
id

# Set in the container environment (for docker run: -e APP_UID=$(id -u) -e APP_GID=$(id -g))
APP_UID=$(id -u)
APP_GID=$(id -g)
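
If the error persists, it may help to confirm which mounted paths are owned by a different user than the one passed as APP_UID/APP_GID. The helper below is a small sketch for that check on the host; `wrong_owner` is a hypothetical name and the paths in the usage note are the ones from the volume mounts above.

```python
import os


def wrong_owner(paths, uid, gid):
    """Return the paths whose owner or group differs from uid/gid."""
    mismatched = []
    for path in paths:
        info = os.stat(path)
        if info.st_uid != uid or info.st_gid != gid:
            mismatched.append(str(path))
    return mismatched
```

For example, `wrong_owner(["./prompts", "./logs"], os.getuid(), os.getgid())` lists the directories that the container user would not own.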

Wrong Timezone

# Set the container's TZ variable to your IANA timezone name
TZ=America/Chicago
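
The timezone matters because the curator times in config.toml (for example `run_time = "02:00"`) are wall-clock times. The sketch below shows how the same setting maps to different instants depending on TZ. It is an illustration using the standard-library `zoneinfo` module, not Vera-AI's scheduler; `next_daily_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo


def next_daily_run(run_time, tz_name, now):
    """Return the next occurrence of HH:MM in tz_name after `now`.

    `now` must be a timezone-aware datetime.
    """
    hour, minute = map(int, run_time.split(":"))
    local_now = now.astimezone(ZoneInfo(tz_name))
    candidate = local_now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot has already passed, move to the same time tomorrow.
    if candidate <= local_now:
        candidate += timedelta(days=1)
    return candidate
```

With `TZ=UTC` a 02:00 run happens several hours earlier in absolute terms than with `TZ=America/Chicago`, which is the usual reason a curation job appears to run at the wrong time.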

Source Code

Gitea: http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2

License

MIT License

Brought to you by SpeedyFoxAi
