v2.0.3: Improve error handling, add tests, cleanup

- Fix bare except clauses in curator.py and main.py
- Change embedding model to snowflake-arctic-embed2
- Increase semantic_score_threshold to 0.6
- Add memory context explanation to systemprompt.md
- Add pytest dependencies to requirements.txt
- Remove unused context_handler.py and .env.example
- Add project documentation (CLAUDE.md) and test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 08:47:56 -05:00
parent 34304a79e0
commit abfcc91eb3
12 changed files with 342 additions and 243 deletions
--- a/.claude/skills/ssh/SKILL.md
+++ b/.claude/skills/ssh/SKILL.md
@@ -0,0 +1,69 @@
+---
+name: ssh
+description: SSH into remote servers and execute commands. Use for remote operations, file transfers, and server management.
+allowed-tools: Bash(ssh*), Bash(scp*), Bash(rsync*), Bash(sshpass*), Read, Write
+argument-hint: [host-alias]
+---
+
+## SSH Connections
+
+| Alias | Host | User | Password | Hostname | Purpose |
+|-------|------|------|----------|----------|---------|
+| `deb9` | `10.0.0.48` | `n8n` | `passw0rd` | epyc-deb9 | vera-ai source project |
+| `deb8` | `10.0.0.46` | `n8n` | `passw0rd` | epyc-deb8 | vera-ai Docker runtime |
+
+## Connection Commands
+
+**Interactive SSH:**
+```bash
+sshpass -p 'passw0rd' ssh -o StrictHostKeyChecking=no n8n@10.0.0.48
+sshpass -p 'passw0rd' ssh -o StrictHostKeyChecking=no n8n@10.0.0.46
+```
+
+**Run single command:**
+```bash
+sshpass -p 'passw0rd' ssh -o StrictHostKeyChecking=no n8n@10.0.0.48 "command"
+sshpass -p 'passw0rd' ssh -o StrictHostKeyChecking=no n8n@10.0.0.46 "command"
+```
+
+**Copy file to server:**
+```bash
+sshpass -p 'passw0rd' scp -o StrictHostKeyChecking=no local_file n8n@10.0.0.48:/remote/path
+sshpass -p 'passw0rd' scp -o StrictHostKeyChecking=no local_file n8n@10.0.0.46:/remote/path
+```
+
+**Copy file from server:**
+```bash
+sshpass -p 'passw0rd' scp -o StrictHostKeyChecking=no n8n@10.0.0.48:/remote/path local_file
+sshpass -p 'passw0rd' scp -o StrictHostKeyChecking=no n8n@10.0.0.46:/remote/path local_file
+```
+
+**Sync directory to server:**
+```bash
+sshpass -p 'passw0rd' rsync -avz -e "ssh -o StrictHostKeyChecking=no" local_dir/ n8n@10.0.0.48:/remote/path/
+sshpass -p 'passw0rd' rsync -avz -e "ssh -o StrictHostKeyChecking=no" local_dir/ n8n@10.0.0.46:/remote/path/
+```
+
+**Sync directory from server:**
+```bash
+sshpass -p 'passw0rd' rsync -avz -e "ssh -o StrictHostKeyChecking=no" n8n@10.0.0.48:/remote/path/ local_dir/
+sshpass -p 'passw0rd' rsync -avz -e "ssh -o StrictHostKeyChecking=no" n8n@10.0.0.46:/remote/path/ local_dir/
+```
+
+## Notes
+
+- Uses `sshpass` to handle password authentication non-interactively
+- `-o StrictHostKeyChecking=no` disables host key verification, which avoids prompts in automation but removes protection against man-in-the-middle attacks
+- For frequent connections, consider setting up SSH key authentication instead of password
+
+## SSH Config (Optional)
+
+To simplify connections, add to `~/.ssh/config`:
+
+```
+Host n8n-server
+    HostName 10.0.0.48
+    User n8n
+```
+
+Then connect with just `ssh n8n-server` (still needs password or key).
--- a/.env.example
+++ b/.env.example
@@ -1,31 +0,0 @@
-# Vera-AI Environment Configuration
-# Copy this file to .env and customize for your deployment
-
-# =============================================================================
-# User/Group Configuration
-# =============================================================================
-# UID and GID for the container user (must match host user for volume permissions)
-# Run: id -u and id -g on your host to get these values
-APP_UID=1000
-APP_GID=1000
-
-# =============================================================================
-# Timezone Configuration
-# =============================================================================
-# Timezone for the container (affects scheduler times)
-# Common values: UTC, America/New_York, America/Chicago, America/Los_Angeles, Europe/London
-TZ=America/Chicago
-
-# =============================================================================
-# API Keys (Optional)
-# =============================================================================
-# OpenRouter API key for cloud model routing
-# OPENROUTER_API_KEY=your_api_key_here
-
-# =============================================================================
-# Vera-AI Configuration Paths (Optional)
-# =============================================================================
-# These can be overridden via environment variables
-# VERA_CONFIG_DIR=/app/config
-# VERA_PROMPTS_DIR=/app/prompts
-# VERA_STATIC_DIR=/app/static
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,128 @@
+# Vera-AI Project
+
+**Persistent Memory Proxy for Ollama**
+
+> **Status:** Built and running on deb8. Goal: Validate and improve.
+
+Vera-AI sits between AI clients and Ollama, storing conversations in Qdrant and retrieving context semantically — giving AI **true memory**.
+
+## Architecture
+
+```
+Client → Vera-AI (port 11434) → Ollama
+              ↓
+           Qdrant (vector DB)
+              ↓
+           Memory Storage
+```
+
+## Key Components
+
+| File | Purpose |
+|------|---------|
+| `app/main.py` | FastAPI application entry point |
+| `app/proxy_handler.py` | Chat request handling |
+| `app/qdrant_service.py` | Vector DB operations |
+| `app/curator.py` | Memory curation (daily/monthly) |
+| `app/config.py` | Configuration loader |
+| `config/config.toml` | Main configuration file |
+
+## 4-Layer Context System
+
+1. **System Prompt** — From `prompts/systemprompt.md`
+2. **Semantic Memory** — Curated Q&A from Qdrant (relevance search)
+3. **Recent Context** — Last N conversation turns
+4. **Current Messages** — User's current request
+
+## Configuration
+
+Key settings in `config/config.toml`:
+
+```toml
+[general]
+ollama_host = "http://10.0.0.10:11434"
+qdrant_host = "http://10.0.0.22:6333"
+qdrant_collection = "memories"
+embedding_model = "snowflake-arctic-embed2"
+
+[layers]
+semantic_token_budget = 25000
+context_token_budget = 22000
+semantic_search_turns = 2
+semantic_score_threshold = 0.6
+
+[curator]
+run_time = "02:00"  # Daily curation time
+curator_model = "gpt-oss:120b"
+```
+
+## Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `APP_UID` | `999` | Container user ID |
+| `APP_GID` | `999` | Container group ID |
+| `TZ` | `UTC` | Timezone |
+| `VERA_DEBUG` | `false` | Enable debug logging |
+
+## Running
+
+```bash
+# Build and start
+docker compose build
+docker compose up -d
+
+# Check status
+docker ps
+docker logs VeraAI --tail 20
+
+# Health check
+curl http://localhost:11434/
+```
+
+## API Endpoints
+
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/` | GET | Health check |
+| `/api/chat` | POST | Chat completion (with memory) |
+| `/api/tags` | GET | List models |
+| `/api/generate` | POST | Generate completion |
+| `/curator/run` | POST | Trigger curation manually |
+
+## Development Workflow
+
+This project is synced with **deb9** (10.0.0.48). To sync changes:
+
+```bash
+# Pull from deb9
+sshpass -p 'passw0rd' scp -r -o StrictHostKeyChecking=no n8n@10.0.0.48:/home/n8n/vera-ai/* /home/n8n/vera-ai/
+
+# Push to deb9 (after local changes)
+sshpass -p 'passw0rd' scp -r -o StrictHostKeyChecking=no /home/n8n/vera-ai/* n8n@10.0.0.48:/home/n8n/vera-ai/
+```
+
+## Memory System
+
+- **raw** memories — Unprocessed conversation turns (until curation)
+- **curated** memories — Cleaned Q&A pairs (permanent)
+- **test** memories — Test entries (can be ignored)
+
+Curation runs daily at 02:00 and monthly on the 1st at 03:00.
+
+## Related Infrastructure
+
+| Service | Host | Port / Role |
+|---------|------|------|
+| Qdrant | 10.0.0.22 | 6333 |
+| Ollama | 10.0.0.10 | 11434 |
+| deb9 | 10.0.0.48 | Source project (SSH) |
+| deb8 | 10.0.0.46 | Docker runtime |
+
+## Qdrant Collections
+
+| Collection | Purpose |
+|------------|---------|
+| `python_kb` | Python code patterns reference for this project |
+| `memories` | Conversation memory storage (default) |
+| `vera_memories` | Alternative memory collection |
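Review note on the 4-layer context system described in CLAUDE.md: a minimal sketch of how the four layers could be concatenated into one message list, in the order the document lists them. The function name `assemble_layers` and its parameters are hypothetical and not taken from the repository; Qdrant retrieval and token budgeting are deliberately left out.

```python
from typing import Dict, List

Message = Dict[str, str]


def assemble_layers(
    system_prompt: str,
    semantic_memories: List[Message],
    recent_turns: List[Message],
    current_messages: List[Message],
) -> List[Message]:
    """Concatenate the four layers in the order CLAUDE.md lists them."""
    messages: List[Message] = []
    if system_prompt.strip():
        # Layer 1: system prompt
        messages.append({"role": "system", "content": system_prompt})
    # Layer 2: curated memories found by relevance search
    messages.extend(semantic_memories)
    # Layer 3: the last N conversation turns
    messages.extend(recent_turns)
    # Layer 4: the user's current request
    messages.extend(current_messages)
    return messages
```

Keeping the assembly step free of I/O like this would make the ordering easy to unit test on its own.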
--- a/app/context_handler.py
+++ b/app/context_handler.py
@@ -1,208 +0,0 @@
-"""Context handler - builds 4-layer context for every request."""
-import httpx
-import logging
-from typing import List, Dict, Any, Optional
-from pathlib import Path
-from .config import Config
-from .qdrant_service import QdrantService
-from .utils import count_tokens, truncate_by_tokens
-
-logger = logging.getLogger(__name__)
-
-
-class ContextHandler:
-    def __init__(self, config: Config):
-        self.config = config
-        self.qdrant = QdrantService(
-            host=config.qdrant_host,
-            collection=config.qdrant_collection,
-            embedding_model=config.embedding_model,
-            ollama_host=config.ollama_host
-        )
-        self.system_prompt = self._load_system_prompt()
-    
-    def _load_system_prompt(self) -> str:
-        """Load system prompt from static/systemprompt.md."""
-        try:
-            path = Path(__file__).parent.parent / "static" / "systemprompt.md"
-            return path.read_text().strip()
-        except FileNotFoundError:
-            logger.error("systemprompt.md not found - required file")
-            raise
-    
-    async def process(self, messages: List[Dict], model: str, stream: bool = False) -> Dict:
-        """Process chat request through 4-layer context."""
-        # Get user question (last user message)
-        user_question = ""
-        for msg in reversed(messages):
-            if msg.get("role") == "user":
-                user_question = msg.get("content", "")
-                break
-        
-        # Get messages for semantic search (last N turns)
-        search_messages = []
-        for msg in messages[-self.config.semantic_search_turns:]:
-            if msg.get("role") in ("user", "assistant"):
-                search_messages.append(msg.get("content", ""))
-        
-        # Build the 4-layer context messages
-        context_messages = await self.build_context_messages(
-            incoming_system=next((m for m in messages if m.get("role") == "system"), None),
-            user_question=user_question,
-            search_context=" ".join(search_messages)
-        )
-        
-        # Forward to Ollama
-        async with httpx.AsyncClient(timeout=120.0) as client:
-            response = await client.post(
-                f"{self.config.ollama_host}/api/chat",
-                json={"model": model, "messages": context_messages, "stream": stream}
-            )
-            result = response.json()
-        
-        # Store the Q&A turn in Qdrant
-        assistant_msg = result.get("message", {}).get("content", "")
-        await self.qdrant.store_qa_turn(user_question, assistant_msg)
-        
-        return result
-    
-    def _parse_curated_turn(self, text: str) -> List[Dict]:
-        """Parse a curated turn into alternating user/assistant messages.
-        
-        Input format:
-            User: [question]
-            Assistant: [answer]
-            Timestamp: ISO datetime
-        
-        Returns list of message dicts with role and content.
-        """
-        messages = []
-        lines = text.strip().split("\n")
-        
-        current_role = None
-        current_content = []
-        
-        for line in lines:
-            line = line.strip()
-            if line.startswith("User:"):
-                # Save previous content if exists
-                if current_role and current_content:
-                    messages.append({
-                        "role": current_role,
-                        "content": "\n".join(current_content).strip()
-                    })
-                current_role = "user"
-                current_content = [line[5:].strip()]  # Remove "User:" prefix
-            elif line.startswith("Assistant:"):
-                # Save previous content if exists
-                if current_role and current_content:
-                    messages.append({
-                        "role": current_role,
-                        "content": "\n".join(current_content).strip()
-                    })
-                current_role = "assistant"
-                current_content = [line[10:].strip()]  # Remove "Assistant:" prefix
-            elif line.startswith("Timestamp:"):
-                # Ignore timestamp line
-                continue
-            elif current_role:
-                # Continuation of current message
-                current_content.append(line)
-        
-        # Save last message
-        if current_role and current_content:
-            messages.append({
-                "role": current_role,
-                "content": "\n".join(current_content).strip()
-            })
-        
-        return messages
-    
-    async def build_context_messages(self, incoming_system: Optional[Dict], user_question: str, search_context: str) -> List[Dict]:
-        """Build 4-layer context messages array."""
-        messages = []
-        token_budget = {
-            "semantic": self.config.semantic_token_budget,
-            "context": self.config.context_token_budget
-        }
-        
-        # === LAYER 1: System Prompt (pass through unchanged) ===
-        # DO NOT truncate - preserve system prompt entirely
-        system_content = ""
-        if incoming_system:
-            system_content = incoming_system.get("content", "")
-            logger.info(f"System layer: preserved incoming system {len(system_content)} chars, {count_tokens(system_content)} tokens")
-        
-        # Add Vera context info if present (small, just metadata)
-        if self.system_prompt.strip():
-            system_content += "\n\n" + self.system_prompt
-            logger.info(f"System layer: added vera context {len(self.system_prompt)} chars")
-        
-        messages.append({"role": "system", "content": system_content})
-        
-        # === LAYER 2: Semantic Layer (curated memories) ===
-        # Search for curated blocks only
-        semantic_results = await self.qdrant.semantic_search(
-            query=search_context if search_context else user_question,
-            limit=20,
-            score_threshold=self.config.semantic_score_threshold,
-            entry_type="curated"
-        )
-        
-        # Parse curated turns into alternating user/assistant messages
-        semantic_messages = []
-        semantic_tokens_used = 0
-        
-        for result in semantic_results:
-            payload = result.get("payload", {})
-            text = payload.get("text", "")
-            if text:
-                parsed = self._parse_curated_turn(text)
-                for msg in parsed:
-                    msg_tokens = count_tokens(msg.get("content", ""))
-                    if semantic_tokens_used + msg_tokens <= token_budget["semantic"]:
-                        semantic_messages.append(msg)
-                        semantic_tokens_used += msg_tokens
-                    else:
-                        break
-        
-        # Add parsed messages to context
-        for msg in semantic_messages:
-            messages.append(msg)
-        
-        if semantic_messages:
-            logger.info(f"Semantic layer: {len(semantic_messages)} messages, ~{semantic_tokens_used} tokens")
-        
-        # === LAYER 3: Context Layer (recent turns) ===
-        recent_turns = await self.qdrant.get_recent_turns(limit=50)
-        
-        context_messages_parsed = []
-        context_tokens_used = 0
-        
-        for turn in reversed(recent_turns):  # Oldest first
-            payload = turn.get("payload", {})
-            text = payload.get("text", "")
-            entry_type = payload.get("type", "raw")
-            
-            if text:
-                # Parse turn into messages
-                parsed = self._parse_curated_turn(text)
-                
-                for msg in parsed:
-                    msg_tokens = count_tokens(msg.get("content", ""))
-                    if context_tokens_used + msg_tokens <= token_budget["context"]:
-                        context_messages_parsed.append(msg)
-                        context_tokens_used += msg_tokens
-                    else:
-                        break
-        
-        for msg in context_messages_parsed:
-            messages.append(msg)
-        
-        if context_messages_parsed:
-            logger.info(f"Context layer: {len(context_messages_parsed)} messages, ~{context_tokens_used} tokens")
-        
-        # === LAYER 4: Current Question ===
-        messages.append({"role": "user", "content": user_question})
-        
-        return messages
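Review note on the removed budget loops: in `build_context_messages`, the `break` only leaves the inner `for msg in parsed` loop, so after one message overflows the budget, messages from later results can still be appended, and a question can be kept without its answer. If the same logic lives on elsewhere, a stricter variant might look like the sketch below. `pack_within_budget` and `whitespace_tokens` are hypothetical names; the whitespace counter is only a stand-in for the project's `count_tokens`.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter (assumption): one token per whitespace-separated word."""
    return len(text.split())


def pack_within_budget(
    messages: List[Message],
    budget: int,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> Tuple[List[Message], int]:
    """Keep messages in order until the next one would exceed the budget."""
    packed: List[Message] = []
    used = 0
    for msg in messages:
        cost = count_tokens(msg.get("content", ""))
        if used + cost > budget:
            # Stop completely so nothing is appended after a gap.
            break
        packed.append(msg)
        used += cost
    return packed, used
```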
--- a/app/curator.py
+++ b/app/curator.py
@@ -171,7 +171,8 @@ Remember: Respond with ONLY valid JSON. No markdown, no explanations, just the J
            mem_time = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
            cutoff = datetime.utcnow() - timedelta(hours=hours)
            return mem_time.replace(tzinfo=None) > cutoff
-        except:
+        except (ValueError, TypeError):
+            logger.debug(f"Could not parse timestamp: {timestamp}")
            return True

    def _format_raw_turns(self, turns: List[Dict]) -> str:
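Review note on the narrowed `except`: `timestamp.replace(...)` raises `AttributeError`, not `TypeError`, when `timestamp` is `None` or another non-string, so the new clause would not catch that case. Whether it can occur depends on code outside this hunk. The sketch below is one way to make the check explicit and timezone-aware; `is_recent` and its `now` parameter are hypothetical, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_recent(timestamp: object, hours: int = 24, now: Optional[datetime] = None) -> bool:
    """Return True if timestamp is within the last `hours`, or cannot be parsed."""
    if not isinstance(timestamp, str):
        # Mirrors the diff's fallback: unparseable input counts as recent.
        return True
    try:
        mem_time = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    except ValueError:
        return True
    if mem_time.tzinfo is None:
        # Assumption: naive timestamps were written in UTC.
        mem_time = mem_time.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return mem_time > current - timedelta(hours=hours)
```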
--- a/app/main.py
+++ b/app/main.py
@@ -80,7 +80,8 @@ async def health_check():
            resp = await client.get(f"{config.ollama_host}/api/tags")
            if resp.status_code == 200:
                ollama_status = "reachable"
-    except: pass
+    except Exception:
+        logger.warning(f"Failed to reach Ollama at {config.ollama_host}")
    return {"status": "ok", "ollama": ollama_status}


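Review note on the health check: the new handler logs the host but drops the exception, so a DNS failure and a timeout look identical in the log. A minimal sketch that keeps the detail is shown below. `ollama_status` as a function, the injected `probe` callable, and the `"unreachable"` default are assumptions for illustration; the real handler's default status is not visible in this hunk.

```python
import logging
from typing import Callable

logger = logging.getLogger("vera.health")


def ollama_status(probe: Callable[[], int], host: str) -> str:
    """Return 'reachable' if probe() yields HTTP 200, otherwise 'unreachable'."""
    try:
        if probe() == 200:
            return "reachable"
    except Exception as exc:
        # Include the exception so different failure modes can be told apart.
        logger.warning("Failed to reach Ollama at %s: %s", host, exc)
    return "unreachable"
```

Passing the probe in as a callable keeps the sketch free of HTTP dependencies; in the application it would wrap the existing `client.get(...)` call.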
--- a/config/config.toml
+++ b/config/config.toml
@@ -2,14 +2,14 @@
 ollama_host = "http://10.0.0.10:11434"
 qdrant_host = "http://10.0.0.22:6333"
 qdrant_collection = "memories"
-embedding_model = "mxbai-embed-large"
+embedding_model = "snowflake-arctic-embed2"
 debug = false

 [layers]
 semantic_token_budget = 25000
 context_token_budget = 22000
 semantic_search_turns = 2
-semantic_score_threshold = 0.3
+semantic_score_threshold = 0.6

 [curator]
 run_time = "02:00"
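Review note on `semantic_score_threshold`: raising it from 0.3 to 0.6 means fewer, more similar memories pass the filter. Similarity scores from different embedding models are generally not comparable, so the value may need re-tuning after the model switch in the same hunk. The sketch below shows the effect on a hand-made result list; `filter_by_score` is a hypothetical helper, the scores are invented for illustration, and treating the threshold as inclusive is an assumption.

```python
from typing import Dict, List


def filter_by_score(results: List[Dict], threshold: float) -> List[Dict]:
    """Keep only results whose similarity score is at least the threshold."""
    return [r for r in results if r.get("score", 0.0) >= threshold]


# Invented example scores, not measured output.
sample = [
    {"id": 1, "score": 0.35},
    {"id": 2, "score": 0.60},
    {"id": 3, "score": 0.82},
]
kept_old = filter_by_score(sample, 0.3)  # old threshold keeps all three
kept_new = filter_by_score(sample, 0.6)  # new threshold keeps ids 2 and 3
```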
--- a/prompts/systemprompt.md
+++ b/prompts/systemprompt.md
@@ -1 +1,10 @@
+You have persistent memory across all conversations with this user.

+**Important:** The latter portion of your conversation context contains memories retrieved from a vector database. These are curated summaries of past conversations, not live chat history.
+
+Use these memories to:
+- Reference previous decisions and preferences
+- Draw on relevant past discussions
+- Provide personalized, context-aware responses
+
+If memories seem outdated or conflicting, ask for clarification.
--- a/requirements.txt
+++ b/requirements.txt
@@ -6,3 +6,5 @@ ollama>=0.1.0
 toml>=0.10.2
 tiktoken>=0.5.0
 apscheduler>=3.10.0
+pytest>=7.0.0
+pytest-asyncio>=0.21.0
--- a/tests/__init__.py
+++ b/tests/__init__.py
@@ -0,0 +1 @@
+# Test package
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -0,0 +1,42 @@
+"""Tests for configuration."""
+import pytest
+from pathlib import Path
+from app.config import Config, EMBEDDING_DIMS
+
+
+class TestConfig:
+    """Tests for Config class."""
+
+    def test_default_values(self):
+        """Config should have sensible defaults."""
+        config = Config()
+        assert config.ollama_host == "http://10.0.0.10:11434"
+        assert config.qdrant_host == "http://10.0.0.22:6333"
+        assert config.qdrant_collection == "memories"
+        assert config.embedding_model == "snowflake-arctic-embed2"
+
+    def test_vector_size_property(self):
+        """Vector size should match embedding model."""
+        config = Config(embedding_model="snowflake-arctic-embed2")
+        assert config.vector_size == 1024
+
+    def test_vector_size_fallback(self):
+        """Unknown model should default to 1024."""
+        config = Config(embedding_model="unknown-model")
+        assert config.vector_size == 1024
+
+
+class TestEmbeddingDims:
+    """Tests for embedding dimensions mapping."""
+
+    def test_snowflake_arctic_embed2(self):
+        """snowflake-arctic-embed2 should have 1024 dimensions."""
+        assert EMBEDDING_DIMS["snowflake-arctic-embed2"] == 1024
+
+    def test_nomic_embed_text(self):
+        """nomic-embed-text should have 768 dimensions."""
+        assert EMBEDDING_DIMS["nomic-embed-text"] == 768
+
+    def test_mxbai_embed_large(self):
+        """mxbai-embed-large should have 1024 dimensions."""
+        assert EMBEDDING_DIMS["mxbai-embed-large"] == 1024
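Review note on `tests/test_config.py`: `app/config.py` is not part of this diff, so the sketch below is only a guess at the smallest `Config` that would satisfy the assertions above. The dataclass layout and the `DEFAULT_VECTOR_SIZE` name are assumptions; the defaults and dimensions are copied from the tests.

```python
from dataclasses import dataclass
from typing import Dict

# Dimensions as asserted in TestEmbeddingDims.
EMBEDDING_DIMS: Dict[str, int] = {
    "snowflake-arctic-embed2": 1024,
    "nomic-embed-text": 768,
    "mxbai-embed-large": 1024,
}
DEFAULT_VECTOR_SIZE = 1024  # hypothetical name for the fallback value


@dataclass
class Config:
    ollama_host: str = "http://10.0.0.10:11434"
    qdrant_host: str = "http://10.0.0.22:6333"
    qdrant_collection: str = "memories"
    embedding_model: str = "snowflake-arctic-embed2"

    @property
    def vector_size(self) -> int:
        """Look up the embedding dimension, falling back for unknown models."""
        return EMBEDDING_DIMS.get(self.embedding_model, DEFAULT_VECTOR_SIZE)
```

A silent fallback to 1024 is what `test_vector_size_fallback` expects, but it can hide a mismatch when an unknown model actually produces a different dimension, so logging the fallback may be worth considering.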
--- a/tests/test_utils.py
+++ b/tests/test_utils.py
@@ -0,0 +1,85 @@
+"""Tests for utility functions."""
+import pytest
+from app.utils import count_tokens, truncate_by_tokens, parse_curated_turn
+
+
+class TestCountTokens:
+    """Tests for count_tokens function."""
+
+    def test_empty_string(self):
+        """Empty string should return 0 tokens."""
+        assert count_tokens("") == 0
+
+    def test_simple_text(self):
+        """Non-empty text should yield a positive token count."""
+        text = "Hello, world!"
+        assert count_tokens(text) > 0
+
+    def test_longer_text(self):
+        """Longer text should have more tokens."""
+        short = "Hello"
+        long = "Hello, this is a longer sentence with more words."
+        assert count_tokens(long) > count_tokens(short)
+
+
+class TestTruncateByTokens:
+    """Tests for truncate_by_tokens function."""
+
+    def test_no_truncation_needed(self):
+        """Text shorter than limit should not be truncated."""
+        text = "Short text"
+        result = truncate_by_tokens(text, max_tokens=100)
+        assert result == text
+
+    def test_truncation_applied(self):
+        """Text longer than limit should be truncated."""
+        text = "This is a longer piece of text that will need to be truncated"
+        result = truncate_by_tokens(text, max_tokens=5)
+        assert count_tokens(result) <= 5
+
+    def test_empty_string(self):
+        """Empty string should return empty string."""
+        assert truncate_by_tokens("", max_tokens=10) == ""
+
+
+class TestParseCuratedTurn:
+    """Tests for parse_curated_turn function."""
+
+    def test_empty_string(self):
+        """Empty string should return empty list."""
+        assert parse_curated_turn("") == []
+
+    def test_single_turn(self):
+        """Single Q&A turn should parse correctly."""
+        text = "User: What is Python?\nAssistant: A programming language."
+        result = parse_curated_turn(text)
+        assert len(result) == 2
+        assert result[0]["role"] == "user"
+        assert result[0]["content"] == "What is Python?"
+        assert result[1]["role"] == "assistant"
+        assert result[1]["content"] == "A programming language."
+
+    def test_multiple_turns(self):
+        """Multiple Q&A turns should parse correctly."""
+        text = """User: What is Python?
+Assistant: A programming language.
+User: Is it popular?
+Assistant: Yes, very popular."""
+        result = parse_curated_turn(text)
+        assert len(result) == 4
+
+    def test_timestamp_ignored(self):
+        """Timestamp lines should be ignored."""
+        text = "User: Question?\nAssistant: Answer.\nTimestamp: 2024-01-01T00:00:00Z"
+        result = parse_curated_turn(text)
+        assert len(result) == 2
+        for msg in result:
+            assert "Timestamp" not in msg["content"]
+
+    def test_multiline_content(self):
+        """Multiline content should be preserved."""
+        text = "User: Line 1\nLine 2\nLine 3\nAssistant: Response"
+        result = parse_curated_turn(text)
+        assert "Line 1" in result[0]["content"]
+        assert "Line 2" in result[0]["content"]
+        assert "Line 3" in result[0]["content"]
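Review note on `TestParseCuratedTurn`: `app/utils.py` is not shown in this diff, so the following is a sketch of a standalone `parse_curated_turn` that would pass the tests above, adapted from the removed `ContextHandler._parse_curated_turn`. It is an assumption about the utility's behavior, not a copy of it.

```python
from typing import Dict, List, Optional


def parse_curated_turn(text: str) -> List[Dict[str, str]]:
    """Split 'User:' / 'Assistant:' text into role/content message dicts."""
    messages: List[Dict[str, str]] = []
    role: Optional[str] = None
    lines: List[str] = []

    for raw in text.strip().split("\n"):
        line = raw.strip()
        if line.startswith("Timestamp:"):
            continue  # timestamps are metadata, not message content
        new_role = None
        if line.startswith("User:"):
            new_role, line = "user", line[len("User:"):].strip()
        elif line.startswith("Assistant:"):
            new_role, line = "assistant", line[len("Assistant:"):].strip()

        if new_role:
            # A new speaker starts: store the previous message first.
            if role and lines:
                messages.append({"role": role, "content": "\n".join(lines).strip()})
            role, lines = new_role, [line]
        elif role:
            lines.append(line)  # continuation of the current message

    if role and lines:
        messages.append({"role": role, "content": "\n".join(lines).strip()})
    return messages
```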