Update DOCKERHUB.md with API flow diagram

2026-03-26 13:11:17 -05:00
parent 5617eabeae
commit f9730eec5b
1 changed files with 265 additions and 0 deletions
--- a/DOCKERHUB.md
+++ b/DOCKERHUB.md
@@ -0,0 +1,265 @@
 # Vera-AI - Persistent Memory Proxy for Ollama
 **Vera** (Latin): *True* — **True AI Memory**
 ---
 ## What is Vera-AI?
 Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.
 **Every conversation is remembered.**
 ---
 ## How It Works
 ```
 ┌─────────────────────────────────────────────────────────────────────────────────┐
 │                              REQUEST FLOW                                        │
 └─────────────────────────────────────────────────────────────────────────────────┘
    ┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
    │  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
    │  (You)   │         │  Proxy   │         │   LLM    │         │  to User │
    └──────────┘         └────┬─────┘         └──────────┘         └──────────┘
                              │
                              │ (2) Query semantic memory
                              │
                              ▼
                       ┌──────────┐
                       │ Qdrant   │
                       │ Vector DB│
                       └──────────┘
                              │
                              │ (4) Store conversation turn
                              │
                              ▼
                       ┌──────────┐
                       │ Memory   │
                       │ Storage  │
                       └──────────┘
 ┌─────────────────────────────────────────────────────────────────────────────────┐
 │                           4-LAYER CONTEXT BUILD                                  │
 └─────────────────────────────────────────────────────────────────────────────────┘
    Incoming Request (POST /api/chat)
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 1: System Prompt                                                      │
    │   • Static context from prompts/systemprompt.md                            │
    │   • Preserved unchanged, passed through                                      │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 2: Semantic Memory                                                    │
    │   • Query Qdrant with user question                                         │
    │   • Retrieve curated Q&A pairs by relevance                                 │
    │   • Limited by semantic_token_budget                                        │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 3: Recent Context                                                     │
    │   • Last N conversation turns from Qdrant                                   │
    │   • Chronological order, recent memories first                              │
    │   • Limited by context_token_budget                                         │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 4: Current Messages                                                    │
    │   • User message from current request                                       │
    │   • Passed through unchanged                                                │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
         [augmented request] ──▶ Ollama LLM ──▶ Response
 ```
 ---
 ## Quick Start
 ```bash
 # Pull the image
 docker pull YOUR_USERNAME/vera-ai:latest
 # Create directories
 mkdir -p config prompts logs
 # Create environment file
 cat > .env << EOF
 APP_UID=$(id -u)
 APP_GID=$(id -g)
 TZ=America/Chicago
 EOF
 # Run
 docker run -d \
  --name vera-ai \
  --env-file .env \
  -v ./config/config.toml:/app/config/config.toml:ro \
  -v ./prompts:/app/prompts:rw \
  -v ./logs:/app/logs:rw \
  --network host \
  YOUR_USERNAME/vera-ai:latest
 # Test
 curl http://localhost:11434/
 ```
 ---
 ## Features
 | Feature | Description |
 |---------|-------------|
 | 🧠 **Persistent Memory** | Conversations stored in Qdrant, retrieved contextually |
 | 📅 **Monthly Curation** | Daily + monthly cleanup of raw memories |
 | 🔍 **4-Layer Context** | System + semantic + recent + current messages |
 | 👤 **Configurable UID/GID** | Match container user to host for permissions |
 | 🌍 **Timezone Support** | Scheduler runs in your local timezone |
 | 📝 **Debug Logging** | Optional logs written to configurable directory |
 ---
 ## Configuration
 ### Environment Variables
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `APP_UID` | `999` | Container user ID (match your host UID) |
 | `APP_GID` | `999` | Container group ID (match your host GID) |
 | `TZ` | `UTC` | Container timezone |
 | `VERA_CONFIG_DIR` | `/app/config` | Config directory |
 | `VERA_PROMPTS_DIR` | `/app/prompts` | Prompts directory |
 | `VERA_LOG_DIR` | `/app/logs` | Debug logs directory |
 ### Required Services
 - **Ollama**: LLM inference server
 - **Qdrant**: Vector database for memory storage
 ### Example config.toml
 ```toml
 [general]
 ollama_host = "http://YOUR_OLLAMA_IP:11434"
 qdrant_host = "http://YOUR_QDRANT_IP:6333"
 qdrant_collection = "memories"
 embedding_model = "snowflake-arctic-embed2"
 debug = false
 [layers]
 semantic_token_budget = 25000
 context_token_budget = 22000
 semantic_search_turns = 2
 semantic_score_threshold = 0.6
 [curator]
 run_time = "02:00"           # Daily curator
 full_run_time = "03:00"      # Monthly curator
 full_run_day = 1             # Day of month (1st)
 curator_model = "gpt-oss:120b"
 ```
 ---
 ## Docker Compose
 ```yaml
 services:
  vera-ai:
    image: YOUR_USERNAME/vera-ai:latest
    container_name: vera-ai
    env_file:
      - .env
    volumes:
      - ./config/config.toml:/app/config/config.toml:ro
      - ./prompts:/app/prompts:rw
      - ./logs:/app/logs:rw
    network_mode: "host"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
 ```
 ---
 ## Memory System
 ### 4-Layer Context
 1. **System Prompt** - From `prompts/systemprompt.md`
 2. **Semantic Memory** - Curated Q&A retrieved by relevance
 3. **Recent Context** - Last N conversation turns
 4. **Current Messages** - User/assistant from request
 ### Curation Schedule
 | Schedule | Time | What |
 |----------|------|------|
 | Daily | 02:00 | Recent 24h raw memories |
 | Monthly | 03:00 on 1st | ALL raw memories |
 ---
 ## API Endpoints
 | Endpoint | Method | Description |
 |----------|--------|-------------|
 | `/` | `GET` | Health check |
 | `/api/chat` | `POST` | Chat completion (with memory) |
 | `/api/tags` | `GET` | List models |
 | `/curator/run` | `POST` | Trigger curator |
 ---
 ## Troubleshooting
 ### Permission Denied
 ```bash
 # Get your UID/GID
 id
 # Set in .env
 APP_UID=1000
 APP_GID=1000
 # Rebuild
 docker compose build --no-cache
 ```
 ### Wrong Timezone
 ```bash
 # Set in .env
 TZ=America/Chicago
 ```
 ---
 ## Source Code
 - **Gitea**: http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2
 ---
 ## License
 MIT License
 ---
 Brought to you by SpeedyFoxAi