Update DOCKERHUB.md with API flow diagram

2026-03-26 13:11:17 -05:00
parent 5617eabeae
commit f9730eec5b
1 changed files with 265 additions and 0 deletions
--- a/DOCKERHUB.md
+++ b/DOCKERHUB.md
@@ -0,0 +1,265 @@
+# Vera-AI - Persistent Memory Proxy for Ollama
+
+**Vera** (Latin): *True* — **True AI Memory**
+
+---
+
+## What is Vera-AI?
+
+Vera-AI is a transparent proxy for Ollama that adds persistent memory using Qdrant vector storage. It sits between your AI client and Ollama, automatically augmenting conversations with relevant context from previous sessions.
+
+**Every conversation is remembered.**
+
+---
+
+## How It Works
+
+```
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                              REQUEST FLOW                                        │
+└─────────────────────────────────────────────────────────────────────────────────┘
+
+    ┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
+    │  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
+    │  (You)   │         │  Proxy   │         │   LLM    │         │  to User │
+    └──────────┘         └────┬─────┘         └──────────┘         └──────────┘
+                              │
+                              │ (2) Query semantic memory
+                              │
+                              ▼
+                       ┌──────────┐
+                       │ Qdrant   │
+                       │ Vector DB│
+                       └──────────┘
+                              │
+                              │ (4) Store conversation turn
+                              │
+                              ▼
+                       ┌──────────┐
+                       │ Memory   │
+                       │ Storage  │
+                       └──────────┘
+
+
+┌─────────────────────────────────────────────────────────────────────────────────┐
+│                           4-LAYER CONTEXT BUILD                                  │
+└─────────────────────────────────────────────────────────────────────────────────┘
+
+    Incoming Request (POST /api/chat)
+              │
+              ▼
+    ┌─────────────────────────────────────────────────────────────────────────────┐
+    │ Layer 1: System Prompt                                                      │
+    │   • Static context from prompts/systemprompt.md                            │
+    │   • Preserved unchanged, passed through                                      │
+    └─────────────────────────────────────────────────────────────────────────────┘
+              │
+              ▼
+    ┌─────────────────────────────────────────────────────────────────────────────┐
+    │ Layer 2: Semantic Memory                                                    │
+    │   • Query Qdrant with user question                                         │
+    │   • Retrieve curated Q&A pairs by relevance                                 │
+    │   • Limited by semantic_token_budget                                        │
+    └─────────────────────────────────────────────────────────────────────────────┘
+              │
+              ▼
+    ┌─────────────────────────────────────────────────────────────────────────────┐
+    │ Layer 3: Recent Context                                                     │
+    │   • Last N conversation turns from Qdrant                                   │
+    │   • Chronological order, recent memories first                              │
+    │   • Limited by context_token_budget                                         │
+    └─────────────────────────────────────────────────────────────────────────────┘
+              │
+              ▼
+    ┌─────────────────────────────────────────────────────────────────────────────┐
+    │ Layer 4: Current Messages                                                    │
+    │   • User message from current request                                       │
+    │   • Passed through unchanged                                                │
+    └─────────────────────────────────────────────────────────────────────────────┘
+              │
+              ▼
+         [augmented request] ──▶ Ollama LLM ──▶ Response
+```
+
+---
+
+## Quick Start
+
+```bash
+# Pull the image
+docker pull YOUR_USERNAME/vera-ai:latest
+
+# Create directories
+mkdir -p config prompts logs
+
+# Create environment file
+cat > .env << EOF
+APP_UID=$(id -u)
+APP_GID=$(id -g)
+TZ=America/Chicago
+EOF
+
+# Run
+docker run -d \
+  --name vera-ai \
+  --env-file .env \
+  -v ./config/config.toml:/app/config/config.toml:ro \
+  -v ./prompts:/app/prompts:rw \
+  -v ./logs:/app/logs:rw \
+  --network host \
+  YOUR_USERNAME/vera-ai:latest
+
+# Test
+curl http://localhost:11434/
+```
+
+---
+
+## Features
+
+| Feature | Description |
+|---------|-------------|
+| 🧠 **Persistent Memory** | Conversations stored in Qdrant, retrieved contextually |
+| 📅 **Monthly Curation** | Daily + monthly cleanup of raw memories |
+| 🔍 **4-Layer Context** | System + semantic + recent + current messages |
+| 👤 **Configurable UID/GID** | Match container user to host for permissions |
+| 🌍 **Timezone Support** | Scheduler runs in your local timezone |
+| 📝 **Debug Logging** | Optional logs written to configurable directory |
+
+---
+
+## Configuration
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `APP_UID` | `999` | Container user ID (match your host UID) |
+| `APP_GID` | `999` | Container group ID (match your host GID) |
+| `TZ` | `UTC` | Container timezone |
+| `VERA_CONFIG_DIR` | `/app/config` | Config directory |
+| `VERA_PROMPTS_DIR` | `/app/prompts` | Prompts directory |
+| `VERA_LOG_DIR` | `/app/logs` | Debug logs directory |
+
+### Required Services
+
+- **Ollama**: LLM inference server
+- **Qdrant**: Vector database for memory storage
+
+### Example config.toml
+
+```toml
+[general]
+ollama_host = "http://YOUR_OLLAMA_IP:11434"
+qdrant_host = "http://YOUR_QDRANT_IP:6333"
+qdrant_collection = "memories"
+embedding_model = "snowflake-arctic-embed2"
+debug = false
+
+[layers]
+semantic_token_budget = 25000
+context_token_budget = 22000
+semantic_search_turns = 2
+semantic_score_threshold = 0.6
+
+[curator]
+run_time = "02:00"           # Daily curator
+full_run_time = "03:00"      # Monthly curator
+full_run_day = 1             # Day of month (1st)
+curator_model = "gpt-oss:120b"
+```
+
+---
+
+## Docker Compose
+
+```yaml
+services:
+  vera-ai:
+    image: YOUR_USERNAME/vera-ai:latest
+    container_name: vera-ai
+    env_file:
+      - .env
+    volumes:
+      - ./config/config.toml:/app/config/config.toml:ro
+      - ./prompts:/app/prompts:rw
+      - ./logs:/app/logs:rw
+    network_mode: "host"
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:11434/')"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 10s
+```
+
+---
+
+## Memory System
+
+### 4-Layer Context
+
+1. **System Prompt** - From `prompts/systemprompt.md`
+2. **Semantic Memory** - Curated Q&A retrieved by relevance
+3. **Recent Context** - Last N conversation turns
+4. **Current Messages** - User/assistant from request
+
+### Curation Schedule
+
+| Schedule | Time | What |
+|----------|------|------|
+| Daily | 02:00 | Recent 24h raw memories |
+| Monthly | 03:00 on 1st | ALL raw memories |
+
+---
+
+## API Endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/` | `GET` | Health check |
+| `/api/chat` | `POST` | Chat completion (with memory) |
+| `/api/tags` | `GET` | List models |
+| `/curator/run` | `POST` | Trigger curator |
+
+---
+
+## Troubleshooting
+
+### Permission Denied
+
+```bash
+# Get your UID/GID
+id
+
+# Set in .env
+APP_UID=1000
+APP_GID=1000
+
+# Rebuild
+docker compose build --no-cache
+```
+
+### Wrong Timezone
+
+```bash
+# Set in .env
+TZ=America/Chicago
+```
+
+---
+
+## Source Code
+
+- **Gitea**: http://10.0.0.61:3000/SpeedyFoxAi/vera-ai-v2
+
+---
+
+## License
+
+MIT License
+
+---
+
+Brought to you by SpeedyFoxAi