Add API flow diagram showing how requests pass through Vera
This commit is contained in:
197
README.md
197
README.md
@@ -22,6 +22,108 @@ Every conversation is stored in Qdrant vector database and retrieved contextuall
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
## 🔄 How It Works
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ REQUEST FLOW │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
|
||||||
|
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
|
||||||
|
│ Client │ ──(1)──▶│ Vera-AI │ ──(3)──▶│ Ollama │ ──(5)──▶│ Response │
|
||||||
|
│ (You) │ │ Proxy │ │ LLM │ │ to User │
|
||||||
|
└──────────┘ └────┬─────┘ └──────────┘ └──────────┘
|
||||||
|
│
|
||||||
|
│ (2) Query semantic memory
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌──────────┐
|
||||||
|
│ Qdrant │
|
||||||
|
│ Vector DB│
|
||||||
|
└──────────┘
|
||||||
|
│
|
||||||
|
│ (4) Store conversation turn
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌──────────┐
|
||||||
|
│ Memory │
|
||||||
|
│ Storage │
|
||||||
|
└──────────┘
|
||||||
|
|
||||||
|
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ 4-LAYER CONTEXT BUILD │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
|
||||||
|
Incoming Request (POST /api/chat)
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Layer 1: System Prompt │
|
||||||
|
│ • Static context from prompts/systemprompt.md │
|
||||||
|
│ • Preserved unchanged, passed through │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Layer 2: Semantic Memory │
|
||||||
|
│ • Query Qdrant with user question │
|
||||||
|
│ • Retrieve curated Q&A pairs by relevance │
|
||||||
|
│ • Limited by semantic_token_budget │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Layer 3: Recent Context │
|
||||||
|
│ • Last N conversation turns from Qdrant │
|
||||||
|
│ • Chronological order, recent memories first │
|
||||||
|
│ • Limited by context_token_budget │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Layer 4: Current Messages │
|
||||||
|
│ • User message from current request │
|
||||||
|
│ • Passed through unchanged │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
[augmented request] ──▶ Ollama LLM ──▶ Response
|
||||||
|
|
||||||
|
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ MEMORY STORAGE FLOW │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
|
||||||
|
User Question + Assistant Response
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Store as "raw" memory in Qdrant │
|
||||||
|
│ • User ID, role, content, timestamp │
|
||||||
|
│ • Embedded using configured embedding model │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Daily Curator (02:00) │
|
||||||
|
│ • Processes raw memories from last 24h │
|
||||||
|
│ • Summarizes into curated Q&A pairs │
|
||||||
|
│ • Stores as "curated" memories │
|
||||||
|
│ • Deletes processed raw memories │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
│
|
||||||
|
▼
|
||||||
|
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||||
|
│ Monthly Curator (03:00 on 1st) │
|
||||||
|
│ • Processes ALL remaining raw memories │
|
||||||
|
│ • Full database cleanup │
|
||||||
|
│ • Ensures no memories are orphaned │
|
||||||
|
└─────────────────────────────────────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## 🌟 Features
|
## 🌟 Features
|
||||||
|
|
||||||
| Feature | Description |
|
| Feature | Description |
|
||||||
@@ -81,24 +183,18 @@ cd vera-ai-v2
|
|||||||
Create `.env` file (or copy from `.env.example`):
|
Create `.env` file (or copy from `.env.example`):
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# ═══════════════════════════════════════════════════════════════
|
|
||||||
# User/Group Configuration
|
# User/Group Configuration
|
||||||
# ═══════════════════════════════════════════════════════════════
|
|
||||||
# IMPORTANT: Match these to your host user for volume permissions
|
# IMPORTANT: Match these to your host user for volume permissions
|
||||||
|
|
||||||
APP_UID=1000 # Run: id -u to get your UID
|
APP_UID=1000 # Run: id -u to get your UID
|
||||||
APP_GID=1000 # Run: id -g to get your GID
|
APP_GID=1000 # Run: id -g to get your GID
|
||||||
|
|
||||||
# ═══════════════════════════════════════════════════════════════
|
|
||||||
# Timezone Configuration
|
# Timezone Configuration
|
||||||
# ═══════════════════════════════════════════════════════════════
|
|
||||||
# Affects curator schedule (daily at 02:00, monthly on 1st at 03:00)
|
# Affects curator schedule (daily at 02:00, monthly on 1st at 03:00)
|
||||||
|
|
||||||
TZ=America/Chicago
|
TZ=America/Chicago
|
||||||
|
|
||||||
# ═══════════════════════════════════════════════════════════════
|
|
||||||
# Optional: Cloud Model Routing
|
# Optional: Cloud Model Routing
|
||||||
# ═══════════════════════════════════════════════════════════════
|
|
||||||
# OPENROUTER_API_KEY=your_api_key_here
|
# OPENROUTER_API_KEY=your_api_key_here
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -169,27 +265,27 @@ docker logs vera-ai --tail 20
|
|||||||
### Step 6: Verify Installation
|
### Step 6: Verify Installation
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# ✅ Health check
|
# Health check
|
||||||
curl http://localhost:11434/
|
curl http://localhost:11434/
|
||||||
# Expected: {"status":"ok","ollama":"reachable"}
|
# Expected: {"status":"ok","ollama":"reachable"}
|
||||||
|
|
||||||
# ✅ Container status
|
# Container status
|
||||||
docker ps --format "table {{.Names}}\t{{.Status}}"
|
docker ps --format "table {{.Names}}\t{{.Status}}"
|
||||||
# Expected: vera-ai Up X minutes (healthy)
|
# Expected: vera-ai Up X minutes (healthy)
|
||||||
|
|
||||||
# ✅ Timezone
|
# Timezone
|
||||||
docker exec vera-ai date
|
docker exec vera-ai date
|
||||||
# Should show your timezone (e.g., CDT for America/Chicago)
|
# Should show your timezone (e.g., CDT for America/Chicago)
|
||||||
|
|
||||||
# ✅ User permissions
|
# User permissions
|
||||||
docker exec vera-ai id
|
docker exec vera-ai id
|
||||||
# Expected: uid=1000(appuser) gid=1000(appgroup)
|
# Expected: uid=1000(appuser) gid=1000(appgroup)
|
||||||
|
|
||||||
# ✅ Directories
|
# Directories
|
||||||
docker exec vera-ai ls -la /app/prompts/
|
docker exec vera-ai ls -la /app/prompts/
|
||||||
# Should show: curator_prompt.md, systemprompt.md
|
# Should show: curator_prompt.md, systemprompt.md
|
||||||
|
|
||||||
# ✅ Test chat
|
# Test chat
|
||||||
curl -X POST http://localhost:11434/api/chat \
|
curl -X POST http://localhost:11434/api/chat \
|
||||||
-H "Content-Type: application/json" \
|
-H "Content-Type: application/json" \
|
||||||
-d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"hello"}],"stream":false}'
|
-d '{"model":"qwen3.5:397b-cloud","messages":[{"role":"user","content":"hello"}],"stream":false}'
|
||||||
@@ -221,26 +317,26 @@ curl -X POST http://localhost:11434/api/chat \
|
|||||||
|
|
||||||
```
|
```
|
||||||
vera-ai-v2/
|
vera-ai-v2/
|
||||||
├── 📁 config/
|
├── config/
|
||||||
│ └── 📄 config.toml # Main configuration
|
│ └── config.toml # Main configuration
|
||||||
├── 📁 prompts/
|
├── prompts/
|
||||||
│ ├── 📄 curator_prompt.md # Memory curation prompt
|
│ ├── curator_prompt.md # Memory curation prompt
|
||||||
│ └── 📄 systemprompt.md # System context
|
│ └── systemprompt.md # System context
|
||||||
├── 📁 logs/ # Debug logs (when debug=true)
|
├── logs/ # Debug logs (when debug=true)
|
||||||
├── 📁 app/
|
├── app/
|
||||||
│ ├── 🐍 main.py # FastAPI application
|
│ ├── main.py # FastAPI application
|
||||||
│ ├── 🐍 config.py # Configuration loader
|
│ ├── config.py # Configuration loader
|
||||||
│ ├── 🐍 curator.py # Memory curation
|
│ ├── curator.py # Memory curation
|
||||||
│ ├── 🐍 proxy_handler.py # Chat handling
|
│ ├── proxy_handler.py # Chat handling
|
||||||
│ ├── 🐍 qdrant_service.py # Vector operations
|
│ ├── qdrant_service.py # Vector operations
|
||||||
│ ├── 🐍 singleton.py # QdrantService singleton
|
│ ├── singleton.py # QdrantService singleton
|
||||||
│ └── 🐍 utils.py # Utilities
|
│ └── utils.py # Utilities
|
||||||
├── 📁 static/ # Legacy symlinks
|
├── static/ # Legacy symlinks
|
||||||
├── 📄 .env.example # Environment template
|
├── .env.example # Environment template
|
||||||
├── 📄 docker-compose.yml # Docker Compose
|
├── docker-compose.yml # Docker Compose
|
||||||
├── 📄 Dockerfile # Container definition
|
├── Dockerfile # Container definition
|
||||||
├── 📄 requirements.txt # Python dependencies
|
├── requirements.txt # Python dependencies
|
||||||
└── 📄 README.md # This file
|
└── README.md # This file
|
||||||
```
|
```
|
||||||
|
|
||||||
## 🐳 Docker Compose
|
## 🐳 Docker Compose
|
||||||
@@ -281,11 +377,11 @@ The `TZ` variable sets the container timezone for the scheduler:
|
|||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Common timezones
|
# Common timezones
|
||||||
TZ=UTC # Coordinated Universal Time
|
TZ=UTC # Coordinated Universal Time
|
||||||
TZ=America/New_York # Eastern Time
|
TZ=America/New_York # Eastern Time
|
||||||
TZ=America/Chicago # Central Time
|
TZ=America/Chicago # Central Time
|
||||||
TZ=America/Los_Angeles # Pacific Time
|
TZ=America/Los_Angeles # Pacific Time
|
||||||
TZ=Europe/London # GMT/BST
|
TZ=Europe/London # GMT/BST
|
||||||
```
|
```
|
||||||
|
|
||||||
**Curation Schedule:**
|
**Curation Schedule:**
|
||||||
@@ -316,28 +412,6 @@ curl -X POST "http://localhost:11434/curator/run?full=true"
|
|||||||
|
|
||||||
## 🧠 Memory System
|
## 🧠 Memory System
|
||||||
|
|
||||||
### 4-Layer Context
|
|
||||||
|
|
||||||
```
|
|
||||||
┌─────────────────────────────────────────────────────────────┐
|
|
||||||
│ Layer 1: System Prompt │
|
|
||||||
│ - From prompts/systemprompt.md │
|
|
||||||
│ - Static context, curator can append rules │
|
|
||||||
├─────────────────────────────────────────────────────────────┤
|
|
||||||
│ Layer 2: Semantic Memory │
|
|
||||||
│ - Curated Q&A pairs from Qdrant │
|
|
||||||
│ - Retrieved by relevance to current message │
|
|
||||||
├─────────────────────────────────────────────────────────────┤
|
|
||||||
│ Layer 3: Recent Context │
|
|
||||||
│ - Last N conversation turns from Qdrant │
|
|
||||||
│ - Chronological order │
|
|
||||||
├─────────────────────────────────────────────────────────────┤
|
|
||||||
│ Layer 4: Current Messages │
|
|
||||||
│ - User/assistant messages from current request │
|
|
||||||
│ - Passed through unchanged │
|
|
||||||
└─────────────────────────────────────────────────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
### Memory Types
|
### Memory Types
|
||||||
|
|
||||||
| Type | Description | Retention |
|
| Type | Description | Retention |
|
||||||
@@ -346,6 +420,11 @@ curl -X POST "http://localhost:11434/curator/run?full=true"
|
|||||||
| `curated` | Cleaned Q&A pairs | Permanent |
|
| `curated` | Cleaned Q&A pairs | Permanent |
|
||||||
| `test` | Test entries | Can be ignored |
|
| `test` | Test entries | Can be ignored |
|
||||||
|
|
||||||
|
### Curation Process
|
||||||
|
|
||||||
|
1. **Daily (02:00)**: Processes raw memories from last 24h into curated Q&A pairs
|
||||||
|
2. **Monthly (03:00 on 1st)**: Processes ALL remaining raw memories for full cleanup
|
||||||
|
|
||||||
## 🔧 Troubleshooting
|
## 🔧 Troubleshooting
|
||||||
|
|
||||||
### Permission Denied
|
### Permission Denied
|
||||||
|
|||||||
Reference in New Issue
Block a user