Add API flow diagram showing how requests pass through Vera

2026-03-26 13:10:11 -05:00
parent 535265c7d2
commit 5617eabeae
1 changed file with 138 additions and 59 deletions
--- a/README.md
+++ b/README.md
@@ -22,6 +22,108 @@ Every conversation is stored in Qdrant vector database and retrieved contextuall
 ---
 ## 🔄 How It Works
 ```
 ┌─────────────────────────────────────────────────────────────────────────────────┐
 │                              REQUEST FLOW                                        │
 └─────────────────────────────────────────────────────────────────────────────────┘
    ┌──────────┐         ┌──────────┐         ┌──────────┐         ┌──────────┐
    │  Client  │ ──(1)──▶│ Vera-AI  │ ──(3)──▶│  Ollama  │ ──(5)──▶│ Response │
    │  (You)   │         │  Proxy   │         │   LLM    │         │  to User │
    └──────────┘         └────┬─────┘         └──────────┘         └──────────┘
                              │
                              │ (2) Query semantic memory
                              │
                              ▼
                       ┌──────────┐
                       │ Qdrant   │
                       │ Vector DB│
                       └──────────┘
                              │
                              │ (4) Store conversation turn
                              │
                              ▼
                       ┌──────────┐
                       │ Memory   │
                       │ Storage  │
                       └──────────┘
 ┌─────────────────────────────────────────────────────────────────────────────────┐
 │                           4-LAYER CONTEXT BUILD                                  │
 └─────────────────────────────────────────────────────────────────────────────────┘
    Incoming Request (POST /api/chat)
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 1: System Prompt                                                      │
    │   • Static context from prompts/systemprompt.md                            │
    │   • Preserved unchanged, passed through                                      │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 2: Semantic Memory                                                    │
    │   • Query Qdrant with user question                                         │
    │   • Retrieve curated Q&A pairs by relevance                                 │
    │   • Limited by semantic_token_budget                                        │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 3: Recent Context                                                     │
    │   • Last N conversation turns from Qdrant                                   │
    │   • Chronological order, recent memories first                              │
    │   • Limited by context_token_budget                                         │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Layer 4: Current Messages                                                    │
    │   • User message from current request                                       │
    │   • Passed through unchanged                                                │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
         [augmented request] ──▶ Ollama LLM ──▶ Response
 ┌─────────────────────────────────────────────────────────────────────────────────┐
 │                           MEMORY STORAGE FLOW                                    │
 └─────────────────────────────────────────────────────────────────────────────────┘
    User Question + Assistant Response
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Store as "raw" memory in Qdrant                                             │
    │   • User ID, role, content, timestamp                                       │
    │   • Embedded using configured embedding model                               │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Daily Curator (02:00)                                                        │
    │   • Processes raw memories from last 24h                                    │
    │   • Summarizes into curated Q&A pairs                                      │
    │   • Stores as "curated" memories                                            │
    │   • Deletes processed raw memories                                          │
    └─────────────────────────────────────────────────────────────────────────────┘
              │
              ▼
    ┌─────────────────────────────────────────────────────────────────────────────┐
    │ Monthly Curator (03:00 on 1st)                                              │
    │   • Processes ALL remaining raw memories                                    │
    │   • Full database cleanup                                                   │
    │   • Ensures no memories are orphaned                                        │
    └─────────────────────────────────────────────────────────────────────────────┘
 ```
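The 4-layer context build in the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Vera's actual implementation: `build_context`, `take_within_budget`, `estimate_tokens`, the whitespace-based token estimate, and the choice to inject memories as extra system messages are all hypothetical. Only the layer order and the `semantic_token_budget` / `context_token_budget` names come from the diagram.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def take_within_budget(items: List[str], budget: int) -> List[str]:
    """Keep items in order until the next one would exceed the token budget."""
    kept: List[str] = []
    used = 0
    for item in items:
        cost = estimate_tokens(item)
        if used + cost > budget:
            break
        kept.append(item)
        used += cost
    return kept


def build_context(
    system_prompt: str,
    semantic_hits: List[str],        # curated Q&A pairs, most relevant first
    recent_turns: List[str],         # recent turns, in the order they should appear
    current_messages: List[Message],
    semantic_token_budget: int,
    context_token_budget: int,
) -> List[Message]:
    """Assemble the augmented message list in layer order 1 -> 4."""
    # Layer 1: system prompt, passed through unchanged
    messages: List[Message] = [{"role": "system", "content": system_prompt}]

    # Layer 2: semantic memory, trimmed to its budget
    semantic = take_within_budget(semantic_hits, semantic_token_budget)
    if semantic:
        messages.append(
            {"role": "system", "content": "Relevant memories:\n" + "\n".join(semantic)}
        )

    # Layer 3: recent context, trimmed to its budget
    recent = take_within_budget(recent_turns, context_token_budget)
    if recent:
        messages.append(
            {"role": "system", "content": "Recent context:\n" + "\n".join(recent)}
        )

    # Layer 4: current messages, passed through unchanged
    messages.extend(current_messages)
    return messages
```

Keeping the two budgets separate means a large batch of semantic hits cannot crowd out recent turns, which seems to be the intent of having both settings.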
 ---
 ## 🌟 Features
 | Feature | Description |
@@ -81,24 +183,18 @@ cd vera-ai-v2
 Create `.env` file (or copy from `.env.example`):
 ```bash
 # ═══════════════════════════════════════════════════════════════
 # User/Group Configuration
 # ═══════════════════════════════════════════════════════════════
 # IMPORTANT: Match these to your host user for volume permissions
 APP_UID=1000    # Run: id -u  to get your UID
 APP_GID=1000    # Run: id -g  to get your GID
 # ═══════════════════════════════════════════════════════════════
 # Timezone Configuration
 # ═══════════════════════════════════════════════════════════════
 # Affects curator schedule (daily at 02:00, monthly on 1st at 03:00)
 TZ=America/Chicago
 # ═══════════════════════════════════════════════════════════════
 # Optional: Cloud Model Routing
 # ═══════════════════════════════════════════════════════════════
 # OPENROUTER_API_KEY=your_api_key_here
 ```
@@ -169,27 +265,27 @@ docker logs vera-ai --tail 20
 ### Step 6: Verify Installation
 ```bash
-# ✅ Health check
+# Health check
 curl http://localhost:11434/
 # Expected: {"status":"ok","ollama":"reachable"}
-# ✅ Container status
+# Container status
 docker ps --format "table {{.Names}}\t{{.Status}}"
 # Expected: vera-ai   Up X minutes (healthy)
-# ✅ Timezone
+# Timezone
 docker exec vera-ai date
 # Should show your timezone (e.g., CDT for America/Chicago)
-# ✅ User permissions
+# User permissions
 docker exec vera-ai id
 # Expected: uid=1000(appuser) gid=1000(appgroup)
-# ✅ Directories
+# Directories
 docker exec vera-ai ls -la /app/prompts/
 # Should show: curator_prompt.md, systemprompt.md
-# ✅ Test chat
+# Test chat
 curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"hello"}],"stream":false}'
@@ -221,26 +317,26 @@ curl -X POST http://localhost:11434/api/chat \
 ```
 vera-ai-v2/
-├── 📁 config/
+├── config/
-│   └── 📄 config.toml        # Main configuration
+│   └── config.toml        # Main configuration
-├── 📁 prompts/
+├── prompts/
-│   ├── 📄 curator_prompt.md  # Memory curation prompt
+│   ├── curator_prompt.md  # Memory curation prompt
-│   └── 📄 systemprompt.md    # System context
+│   └── systemprompt.md    # System context
-├── 📁 logs/                  # Debug logs (when debug=true)
+├── logs/                  # Debug logs (when debug=true)
-├── 📁 app/
+├── app/
-│   ├── 🐍 main.py            # FastAPI application
+│   ├── main.py            # FastAPI application
-│   ├── 🐍 config.py          # Configuration loader
+│   ├── config.py          # Configuration loader
-│   ├── 🐍 curator.py         # Memory curation
+│   ├── curator.py         # Memory curation
-│   ├── 🐍 proxy_handler.py  # Chat handling
+│   ├── proxy_handler.py   # Chat handling
-│   ├── 🐍 qdrant_service.py # Vector operations
+│   ├── qdrant_service.py  # Vector operations
-│   ├── 🐍 singleton.py      # QdrantService singleton
+│   ├── singleton.py       # QdrantService singleton
-│   └── 🐍 utils.py          # Utilities
+│   └── utils.py           # Utilities
-├── 📁 static/               # Legacy symlinks
+├── static/                # Legacy symlinks
-├── 📄 .env.example          # Environment template
+├── .env.example           # Environment template
-├── 📄 docker-compose.yml    # Docker Compose
+├── docker-compose.yml     # Docker Compose
-├── 📄 Dockerfile            # Container definition
+├── Dockerfile             # Container definition
-├── 📄 requirements.txt      # Python dependencies
+├── requirements.txt       # Python dependencies
-└── 📄 README.md             # This file
+└── README.md              # This file
 ```
 ## 🐳 Docker Compose
@@ -281,11 +377,11 @@ The `TZ` variable sets the container timezone for the scheduler:
 ```bash
 # Common timezones
-TZ=UTC                 # Coordinated Universal Time
+TZ=UTC                  # Coordinated Universal Time
-TZ=America/New_York    # Eastern Time
+TZ=America/New_York     # Eastern Time
-TZ=America/Chicago     # Central Time
+TZ=America/Chicago      # Central Time
-TZ=America/Los_Angeles # Pacific Time
+TZ=America/Los_Angeles  # Pacific Time
-TZ=Europe/London       # GMT/BST
+TZ=Europe/London        # GMT/BST
 ```
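To see how `TZ` shifts the curator schedule, the next run times can be computed in a given zone. The sketch below is illustrative only: `next_daily_run` and `next_monthly_run` are hypothetical helpers, not part of Vera, and they do not handle wall-clock times skipped or repeated by daylight saving transitions. The 02:00 and 03:00 times are taken from the schedule described in this README.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo


def next_daily_run(now: datetime, at: time = time(2, 0)) -> datetime:
    """Next occurrence of the daily run time, in the timezone of `now`."""
    candidate = now.replace(hour=at.hour, minute=at.minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_monthly_run(now: datetime, at: time = time(3, 0)) -> datetime:
    """Next occurrence of the run time on the 1st of a month, in the timezone of `now`."""
    candidate = now.replace(
        day=1, hour=at.hour, minute=at.minute, second=0, microsecond=0
    )
    if candidate <= now:
        # Roll over to the 1st of the following month
        if now.month == 12:
            candidate = candidate.replace(year=now.year + 1, month=1)
        else:
            candidate = candidate.replace(month=now.month + 1)
    return candidate


# Same instant, two different TZ settings
instant = datetime(2026, 3, 26, 18, 10, tzinfo=ZoneInfo("UTC"))
for name in ("UTC", "America/Chicago"):
    local_now = instant.astimezone(ZoneInfo(name))
    print(name, next_daily_run(local_now).isoformat(), next_monthly_run(local_now).isoformat())
```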
 **Curation Schedule:**
@@ -316,28 +412,6 @@ curl -X POST "http://localhost:11434/curator/run?full=true"
 ## 🧠 Memory System
-### 4-Layer Context
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Layer 1: System Prompt                                      │
-│   - From prompts/systemprompt.md                            │
-│   - Static context, curator can append rules                │
-├─────────────────────────────────────────────────────────────┤
-│ Layer 2: Semantic Memory                                    │
-│   - Curated Q&A pairs from Qdrant                           │
-│   - Retrieved by relevance to current message               │
-├─────────────────────────────────────────────────────────────┤
-│ Layer 3: Recent Context                                     │
-│   - Last N conversation turns from Qdrant                   │
-│   - Chronological order                                      │
-├─────────────────────────────────────────────────────────────┤
-│ Layer 4: Current Messages                                   │
-│   - User/assistant messages from current request            │
-│   - Passed through unchanged                                │
-└─────────────────────────────────────────────────────────────┘
-```
 ### Memory Types
 | Type | Description | Retention |
@@ -346,6 +420,11 @@ curl -X POST "http://localhost:11434/curator/run?full=true"
 | `curated` | Cleaned Q&A pairs | Permanent |
 | `test` | Test entries | Can be ignored |
+### Curation Process
+1. **Daily (02:00)**: Processes raw memories from last 24h into curated Q&A pairs
+2. **Monthly (03:00 on 1st)**: Processes ALL remaining raw memories for full cleanup
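The difference between the two curator runs comes down to which raw memories are selected. The following is a minimal sketch under stated assumptions: the `Memory` dataclass and `select_for_curation` are hypothetical names, and the real curator reads from Qdrant rather than an in-memory list. The `full` flag mirrors the `full=true` query parameter on `/curator/run`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Memory:
    type: str            # "raw", "curated", or "test"
    content: str
    timestamp: datetime


def select_for_curation(
    memories: List[Memory], now: datetime, full: bool = False
) -> List[Memory]:
    """Pick the raw memories a curator run would process.

    full=False mirrors the daily run (raw memories from the last 24 hours);
    full=True mirrors the monthly run (all remaining raw memories).
    """
    raw = [m for m in memories if m.type == "raw"]
    if full:
        return raw
    cutoff = now - timedelta(hours=24)
    return [m for m in raw if m.timestamp >= cutoff]


now = datetime(2026, 3, 26, 2, 0, tzinfo=timezone.utc)
store = [
    Memory("raw", "asked about backups", now - timedelta(hours=1)),
    Memory("raw", "older leftover turn", now - timedelta(days=3)),
    Memory("curated", "Q: ... A: ...", now - timedelta(hours=2)),
]
print(len(select_for_curation(store, now)))             # daily selection
print(len(select_for_curation(store, now, full=True)))  # monthly selection
```

A raw memory that the daily run misses (for example, because the container was down at 02:00) stays in the store until a full run picks it up, which is why the monthly pass exists.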
 ## 🔧 Troubleshooting
 ### Permission Denied