diff --git a/GIT_VALIDATION_CHECK.md b/GIT_VALIDATION_CHECK.md
index 7c41ecd..690dbf5 100644
--- a/GIT_VALIDATION_CHECK.md
+++ b/GIT_VALIDATION_CHECK.md
@@ -1,6 +1,6 @@
 # TrueRecall v2 - Git Validation Checklist

-**Environment:** Git Repository (`.git_projects/true-recall-v2/`)
+**Environment:** Git Repository (`.git_projects/true-recall-gems/`)
 **Purpose:** Validate git-ready directory for public sharing
 **Version:** 2.4
 **Last Updated:** 2026-02-26
diff --git a/README.md b/README.md
index 732ab9a..6858456 100644
--- a/README.md
+++ b/README.md
@@ -1,613 +1,147 @@
-# TrueRecall v2
+# TrueRecall Gems (v2)

-**Project:** Gem extraction and memory recall system
-**Status:** ✅ Active & Verified
-**Location:** `~/.openclaw/workspace/.local_projects/true-recall-v2/`
-**Last Updated:** 2026-02-25 12:04 CST
+**Purpose:** Memory curation (gems) + context injection

----
-
-## Table of Contents
-
-- [Quick Start](#quick-start)
-- [Overview](#overview)
-- [Current State](#current-state)
-- [Architecture](#architecture)
-- [Components](#components)
-- [Files & Locations](#files--locations)
-- [Configuration](#configuration)
-- [Validation](#validation)
-- [Troubleshooting](#troubleshooting)
-- [Status Summary](#status-summary)
-
----
-
-## Quick Start
-
-```bash
-# Check system status
-openclaw status
-sudo systemctl status mem-qdrant-watcher
-
-# View recent captures
-curl -s http://:6333/collections/memories_tr | jq '.result.points_count'
-
-# Check collections
-curl -s http://:6333/collections | jq '.result.collections[].name'
-```
-
----
-
-## Recent Fixes (2026-02-25 12:41 CST)
-
-| Issue | Root Cause | Fix Applied |
-|-------|------------|-------------|
-| **Watcher stuck on old session** | Watcher only switched sessions when file deleted, old sessions persisted | ✅ Restarted service, now follows current session |
-| **Plugin capture 0 exchanges** | OpenClaw uses OpenAI content format (array of items), plugin expected string | ✅ Added `extractMessageText()` to extract text from `type: "text"` items |
-| **Gem ID collision** | Hash used non-existent fields (`conversation_id`, `turn_range`, `gem`) | ✅ Hash now uses `embedding_text_for_hash[:100]` |
-| **Meta-gems extracted** | Curator extracted from debug/tool output | ✅ Added SKIP_PATTERNS filter ("gems extracted", "✅", "🔍", etc.) + skip `role: "assistant"` |
-| **gems_tr pollution** | 5 meta-gems + 1 real gem | ✅ Cleaned, now 1 real gem only |
-| **First-person format** | Third person "User decided..." | ✅ Changed to "I decided..." for better query matching (score 0.746 vs 0.39) |
-
-### Validation Results
-
-**Plugin capture:**
-```
-Before: parsed 14 user, 84 assistant messages, 0 exchanges
-After: parsed 17 user, 116 assistant messages, 9 exchanges ✅
-```
-
-**Watcher:**
-```
-Before: Watching old session (1737142a... from Feb 24)
-After: Watching current session (93dc32bf... from Feb 25) ✅
-```
-
----
-
-## Needed Improvements
-
-| Issue | Description | Priority |
-|-------|-------------|----------|
-| **Semantic Deduplication** | No dedup between similar gems. Same fact phrased differently creates multiple gems. Need semantic similarity check before storage. | High |
-| **Search Result Deduplication** | Similar gems both above threshold are both injected, causing redundancy. Need filter to remove near-duplicates from results. | Medium |
-| **Gem Quality Scoring** | No quality metric. Some extracted gems may be low value. Need LLM-based quality scoring. | Medium |
-| **Temporal Decay** | All gems treated equally regardless of age. Should weight recent gems higher. | Low |
-| **Gem Merging/Updating** | When user changes preference, old gem still exists. Need mechanism to update/contradict old gems. | Low |
-| **Importance Calibration** | All curator gems marked "medium" importance. Should dynamically assign based on significance. | Low |
-
-### Gem Quality & Model Intelligence
-
-**Gem quality improves significantly with smarter models:**
-
-| Model | Gem Quality | Example |
-|-------|-------------|---------|
-| **Small models (7B)** | Basic extraction, may miss nuance | "User likes local AI" |
-| **Medium models (30B)** | Better categorization, captures intent | "I prefer local AI over cloud services for privacy reasons" |
-| **Large models (70B+)** | Rich context, infers significance, better first-person conversion | "I decided to self-host AI tools because I value data privacy and want to avoid vendor lock-in" |
-
-**Example Gem (High Quality):**
-```json
-{
-  "text": "I decided to keep the installation simple and not include gems for the basic version",
-  "category": "decision",
-  "importance": "high"
-}
-```
-
-**Current:** Using `qwen3:30b-a3b-instruct` for extraction (good balance of quality/speed).
-**Recommendation:** For production use, consider `qwen3:72b` or `deepseek-r1` for higher gem quality.
+**Status:** ⚠️ Requires true-recall-base to be installed first

 ---

 ## Overview

-TrueRecall v2 is a **standalone memory system** that extracts "gems" (key insights) from conversations and injects them as context. It operates independently — not an addon or extension of any previous system.
+TrueRecall Gems adds **curation** and **injection** on top of Base's capture foundation.

-TrueRecall v2 replaces both Jarvis Memory and TrueRecall v1 with a completely re-architected solution:
-
-| System | Status | Relationship to v2 |
-|--------|--------|-------------------|
-| **Jarvis Memory** | Legacy | Replaced by v2 |
-| **TrueRecall v1** | Deprecated | Replaced by v2 |
-| **TrueRecall v2** | ✅ Active | Complete standalone replacement |
-
-### Three-Layer Architecture
-
-1. **Capture** — Real-time watcher saves every turn to `memories_tr`
-2. **Curation** — Timer-based curator extracts gems to `gems_tr`
-3. **Injection** — Plugin searches `gems_tr` and injects gems per turn
-
-**Key:** v2 requires no components from Jarvis Memory or v1. It is self-contained with its own storage (Qdrant-only), capture mechanism, and injection system.
+**Gems is an ADDON:**
+- Requires true-recall-base
+- Independent from openclaw-true-recall-blocks
+- Choose Gems OR Blocks, not both

 ---

-## Current State
-
-### Verified at 19:02 CST
-
-| Collection | Points | Purpose | Status |
-|------------|--------|---------|--------|
-| `memories_tr` | **12,729** | Full text (live capture) | ✅ Active |
-| `gems_tr` | **14+** | Curated gems (injection) | ✅ **WORKING** - Context injection verified |
-
-**All memories tagged with `curated: false` for timer curation.**
-
-### Services Status
-
-| Service | Status | Details |
-|---------|--------|---------|
-| `mem-qdrant-watcher` | ✅ Active | PID 234, capturing |
-| Timer curator | ✅ Deployed | Every 5 min via cron |
-| OpenClaw Gateway | ✅ Running | Version 2026.2.23 |
-| memory-qdrant plugin | ✅ Loaded | recall: gems_tr |
-
----
-
-## Comparison: TrueRecall v2 vs Jarvis Memory vs v1
-
-| Feature | Jarvis Memory | TrueRecall v1 | TrueRecall v2 |
-|---------|---------------|---------------|---------------|
-| **Storage** | Redis | Redis + Qdrant | Qdrant only |
-| **Capture** | Session batch | Session batch | Real-time |
-| **Curation** | Manual | Daily 2:45 AM | Timer (5 min) ✅ |
-| **Embedding** | — | snowflake | snowflake-arctic-embed2 ✅ |
-| **Curator LLM** | — | qwen3:4b | qwen3:30b |
-| **State tracking** | — | — | `curated` tag |
-| **Batch size** | — | 24h worth | Configurable |
-| **JSON parsing** | — | Fallback needed | Native (30b) |
-
-**Key Improvements v2:**
-- ✅ Real-time capture (no batch delay)
-- ✅ Timer-based curation (responsive vs daily)
-- ✅ 30b curator (better gems, faster ~3s)
-- ✅ `curated` tag (reliable state tracking)
-- ✅ No Redis dependency (simpler stack)
-
----
-
-## Architecture
-
-### v2.2: Timer-Based Curation
+## Three-Tier Architecture

 ```
-┌─────────────────┐     ┌──────────────────────┐     ┌─────────────┐
-│  OpenClaw Chat  │────▶│  Real-Time Watcher   │────▶│ memories_tr │
-│  (Session JSONL)│     │  (Python daemon)     │     │  (Qdrant)   │
-└─────────────────┘     └──────────────────────┘     └──────┬──────┘
-                                                            │
-                                                            │ Every 30 min
-                                                            ▼
-                                                   ┌──────────────────┐
-                                                   │  Timer Curator   │
-                                                   │  (cron/qwen3)    │
-                                                   └────────┬─────────┘
-                                                            │
-                                                            ▼
-                                                   ┌──────────────────┐
-                                                   │     gems_tr      │
-                                                   │     (Qdrant)     │
-                                                   └────────┬─────────┘
-                                                            │
-                                                   Per turn │
-                                                            ▼
-                                                   ┌──────────────────┐
-                                                   │  memory-qdrant   │
-                                                   │      plugin      │
-                                                   └──────────────────┘
-```
+true-recall-base (REQUIRED)
+├── Watcher daemon
+└── memories_tr (raw capture)
+    │
+    └──▶ true-recall-gems (THIS ADDON)
+         ├── Curator extracts gems
+         ├── gems_tr (curated)
+         └── Plugin injection

-**Key Changes in v2.2:**
-- ✅ Timer-based curation (30 min intervals)
-- ✅ All memories tagged `curated: false` on capture
-- ✅ Migration complete (12,378 memories)
-- ❌ Removed daily batch processing (2:45 AM)
-
----
-
-## Components
-
-### 1. Real-Time Watcher
-
-**File:** `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
-
-**What it does:**
-- Watches `~/.openclaw/agents/main/sessions/*.jsonl`
-- Parses each turn (user + AI)
-- Embeds with `snowflake-arctic-embed2`
-- Stores to `memories_tr` instantly
-- **Cleans:** Removes markdown, tables, metadata
-
-**Service:** `mem-qdrant-watcher.service`
-
-**Commands:**
-```bash
-# Check status
-sudo systemctl status mem-qdrant-watcher
-
-# View logs
-sudo journalctl -u mem-qdrant-watcher -f
-
-# Restart
-sudo systemctl restart mem-qdrant-watcher
+Note: Don't install with openclaw-true-recall-blocks.
+Choose one addon: Gems OR Blocks.
 ```

 ---

-### 2. Content Cleaner
+## Prerequisites

-**File:** `skills/qdrant-memory/scripts/clean_memories_tr.py`
+**REQUIRED: Install TrueRecall Base first**

-**Purpose:** Batch-clean existing points
-
-**Usage:**
-```bash
-# Preview changes
-python3 clean_memories_tr.py --dry-run
-
-# Clean all
-python3 clean_memories_tr.py --execute
-
-# Clean 100 (test)
-python3 clean_memories_tr.py --execute --limit 100
-```
-
-**Cleans:**
-- `**bold**` → plain text
-- `|tables|` → removed
-- `` `code` `` → plain text
-- `---` rules → removed
-- `# headers` → removed
-
----
-
-### 3. Timer Curator
-
-**File:** `tr-continuous/curator_timer.py`
-
-**Schedule:** Every 30 minutes (cron)
-
-**Flow:**
-1. Query uncurated memories from `memories_tr`
-2. Send batch to qwen3 (max 100)
-3. Extract gems → store to `gems_tr`
-4. Mark memories as `curated: true`
-
-**Config:** `tr-continuous/curator_config.json`
-```json
-{
-  "timer_minutes": 30,
-  "max_batch_size": 100
-}
-```
-
-**Logs:** `/var/log/true-recall-timer.log`
-
----
-
-### 4. Curation Model Comparison
-
-**Current:** `qwen3:4b-instruct`
-
-| Metric | 4b | 30b |
-|--------|----|----|
-| Speed | ~10-30s per batch | **~3.3s** (tested 2026-02-24) |
-| JSON reliability | ⚠️ Needs fallback | ✅ Native |
-| Context quality | Basic extraction | ✅ Nuanced |
-| Snippet accuracy | ~80% | ✅ Expected: 95%+ |
-
-**30b Benchmark (2026-02-24):**
-- Load: 108ms
-- Prompt eval: 49ms (1,576 tok/s)
-- Generation: 2.9s (233 tokens, 80 tok/s)
-- **Total: 3.26s**
-
-**Trade-offs:**
-- **4b:** Faster batch processing, lightweight, catches explicit decisions
-- **30b:** Deeper context, better inference, ~3x slower but superior quality
-
-**Gem Quality Comparison (Sample Review):**
-
-| Aspect | 4b | 30b |
-|--------|----|----|
-| **Context depth** | "Extracted via fallback" | Explains *why* decisions were made |
-| **Confidence scores** | 0.7-0.85 | 0.9-0.97 |
-| **Snippet accuracy** | ~80% (wrong source) | ✅ 95%+ (relevant quotes) |
-| **Categories** | Generic "extracted" | Specific: knowledge, technical, decision |
-| **Example** | "User implemented BorgBackup" (no context) | "User selected mxbai... due to top MTEB score of 66.5" (explains reasoning) |
-
-**Verdict:** 30b produces significantly higher quality gems — richer context, accurate snippets, and captures architectural intent, not just surface facts.
-
----
-
-### 5. OpenClaw Compactor Configuration
-
-**Status:** ✅ Applied
-
-**Goal:** Minimal overhead — just remove context, do nothing else.
-
-**Config Applied:**
-```json5
-{
-  agents: {
-    defaults: {
-      compaction: {
-        mode: "default", // "default" or "safeguard"
-        reserveTokensFloor: 0, // Disable safety floor (default: 20000)
-        memoryFlush: {
-          enabled: false // Disable silent .md file writes
-        }
-      }
-    }
-  }
-}
-```
-
-**What this does:**
-- `mode: "default"` — Standard summarization (faster)
-- `reserveTokensFloor: 0` — Allow aggressive settings (disables 20k minimum)
-- `memoryFlush.enabled: false` — No silent "write memory" turns
-
-**Note:** `reserveTokens` and `keepRecentTokens` are Pi runtime settings, not configurable via `agents.defaults.compaction`. They are set per-model in `contextWindow`/`contextTokens`.
-
----
-
-### 6. Configuration Options Reference
-
-**All configurable options with defaults:**
-
-| Option | Default | Description |
-|--------|---------|-------------|
-| **Embedding model** | `mxbai-embed-large` | Model for generating gem embeddings. `mxbai` = higher accuracy (MTEB 66.5). `snowflake` = faster processing. |
-| **Timer interval** | `5` minutes | How often the curator runs. `5 min` = fast backlog clearing. `30 min` = balanced. `60 min` = minimal overhead. |
-| **Batch size** | `100` | Max memories sent to curator per run. Higher = fewer API calls but more memory usage. |
-| **Max gems per run** | *(unlimited)* | Hard limit on gems extracted per batch. Not set by default — extracts all found gems. |
-| **Qdrant URL** | `http://:6333` | Vector database endpoint. Change if Qdrant runs on different host/port. |
-| **Ollama URL** | `http://:11434` | LLM endpoint for gem extraction. Change if Ollama runs elsewhere. |
-| **Curator LLM** | `qwen3:30b-a3b-instruct` | Model for extracting gems. `30b` = best quality (~3s). `4b` = faster but needs JSON fallback. |
-| **User ID** | `rob` | Owner identifier for memories. Used for filtering and multi-user setups. |
-| **Source collection** | `memories_tr` | Qdrant collection for raw captured memories. |
-| **Target collection** | `gems_tr` | Qdrant collection for curated gems (injected into context). |
-| **Watcher service** | `enabled` | Real-time capture daemon. Reads session JSONL and writes to Qdrant. |
-| **Cron timer** | `enabled` | Periodic curation job. Runs `curator_timer.py` on schedule. |
-| **Log path** | `/var/log/true-recall-timer.log` | Where curator output is written. Check with `tail -f`. |
-| **Dry-run mode** | `disabled` | Test mode — shows what would be curated without writing to Qdrant. |
-
-**OpenClaw-side options:**
-| Option | Default | Description |
-|--------|---------|-------------|
-| **Compactor mode** | `default` | How context is summarized. `default` = fast standard. `safeguard` = chunked for very long sessions. |
-| **Memory flush** | `disabled` | If enabled, writes silent "memory" turn before compaction. Adds overhead — disabled for minimal lag. |
-| **Context pruning** | `cache-ttl` | Removes old tool results from context. `cache-ttl` = prunes hourly. `off` = no pruning. |
-
----
-
-### 7. Embedding Models
-
-**Current Setup:**
-- `memories_tr`: `snowflake-arctic-embed2` (capture)
-- `gems_tr`: `snowflake-arctic-embed2` (recall) ✅ **FIXED** - Both collections now use same model
-
-**Note:** Previously used `mxbai-embed-large` for gems, but this caused embedding model mismatch. Fixed 2026-02-25.
-
----
-
-### 6. memory-qdrant Plugin
-
-**Location:** `~/.openclaw/extensions/memory-qdrant/`
-
-**Config (openclaw.json):**
-```json
-{
-  "collectionName": "gems_tr",
-  "captureCollection": "memories_tr",
-  "autoRecall": true,
-  "autoCapture": true
-}
-```
-
-**Functions:**
-- **Recall:** Searches `gems_tr`, injects gems (hidden)
-- **Capture:** Session-level to `memories_tr` (backup)
-
----
-
-## Files & Locations
-
-### Core Project
-
-```
-~/.openclaw/workspace/.local_projects/true-recall-v2/
-├── README.md                     # This file
-├── session.md                    # Detailed notes
-├── curator-prompt.md             # Extraction prompt
-├── tr-daily/
-│   └── curate_from_qdrant.py     # Daily curator
-└── shared/
-```
-
-### New Files (2026-02-24)
-
-| File | Purpose |
-|------|---------|
-| `tr-continuous/curator_timer.py` | Timer curator (v2.2) |
-| `tr-continuous/curator_config.json` | Curator settings |
-| `tr-continuous/migrate_add_curated.py` | Migration script |
-| `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | Capture daemon |
-| `skills/qdrant-memory/mem-qdrant-watcher.service` | Systemd service |
-
-### Archived Files (v2.1)
-
-| File | Status | Note |
-|------|--------|------|
-| `tr-daily/curate_from_qdrant.py` | 📦 Archived | Replaced by timer |
-| `tr-continuous/curator_by_count.py` | 📦 Archived | Replaced by timer |
-
-### System Files
-
-| File | Purpose |
-|------|---------|
-| `~/.openclaw/extensions/memory-qdrant/` | Plugin code |
-| `~/.openclaw/openclaw.json` | Configuration |
-| `/etc/systemd/system/mem-qdrant-watcher.service` | Service file |
-
----
-
-## Configuration
-
-### memory-qdrant Plugin
-
-**File:** `~/.openclaw/openclaw.json`
-
-```json
-{
-  "memory-qdrant": {
-    "config": {
-      "autoCapture": true,
-      "autoRecall": true,
-      "collectionName": "gems_tr",
-      "captureCollection": "memories_tr",
-      "embeddingModel": "snowflake-arctic-embed2",
-      "maxRecallResults": 2,
-      "minRecallScore": 0.7,
-      "ollamaUrl": "http://:11434",
-      "qdrantUrl": "http://:6333"
-    },
-    "enabled": true
-  }
-}
-```
-
-### Gateway Control UI (OpenClaw 2026.2.23)
-
-```json
-{
-  "gateway": {
-    "controlUi": {
-      "allowedOrigins": ["*"],
-      "allowInsecureAuth": false,
-      "dangerouslyDisableDeviceAuth": true
-    }
-  }
-}
-```
-
----
-
-## Validation
-
-### Check Collections
+Base provides the capture infrastructure (`memories_tr` collection).

 ```bash
-# Count points
+# Verify base is running
 curl -s http://:6333/collections/memories_tr | jq '.result.points_count'
+```
+
+## Installation
+
+### 1. Curator Setup
+
+**Install cron job:**
+
+```bash
+# Edit path and add to crontab
+echo "*/5 * * * * cd /true-recall-gems/tr-continuous && /usr/bin/python3 curator_timer.py >> /var/log/true-recall-timer.log 2>&1" | sudo crontab -
+
+sudo touch /var/log/true-recall-timer.log
+sudo chmod 644 /var/log/true-recall-timer.log
+```
+
+**Configure curator_config.json:**
+```json
+{
+  "timer_minutes": 5,
+  "max_batch_size": 100,
+  "user_id": "your-user-id",
+  "source_collection": "memories_tr",
+  "target_collection": "gems_tr"
+}
+```
+
+**Edit curator_timer.py:**
+- Replace ``, `` with your endpoints
+- Replace `` with your identifier
+- Replace `` with your LLM (e.g., `qwen3:30b`)
+
+### 2. Injection Setup
+
+Add to your OpenClaw `openclaw.json`:
+
+```json
+{
+  "plugins": {
+    "entries": {
+      "memory-qdrant": {
+        "config": {
+          "autoCapture": true,
+          "autoRecall": true,
+          "captureCollection": "memories_tr",
+          "collectionName": "gems_tr",
+          "embeddingModel": "snowflake-arctic-embed2",
+          "maxRecallResults": 2,
+          "minRecallScore": 0.8,
+          "ollamaUrl": "http://:11434",
+          "qdrantUrl": "http://:6333"
+        },
+        "enabled": true
+      }
+    },
+    "slots": {
+      "memory": "memory-qdrant"
+    }
+  }
+}
+```
+
+---
+
+## Files
+
+| File | Purpose |
+|------|---------|
+| `tr-continuous/curator_timer.py` | Timer-based curator |
+| `tr-continuous/curator_config.json` | Curator settings template |
+
+---
+
+## Verification
+
+```bash
+# Check v1 capture
+curl -s http://:6333/collections/memories_tr | jq '.result.points_count'
+
+# Check v2 curation
 curl -s http://:6333/collections/gems_tr | jq '.result.points_count'

-# View recent captures
-curl -s -X POST http://:6333/collections/memories_tr/points/scroll \
-  -H "Content-Type: application/json" \
-  -d '{"limit": 3, "with_payload": true}' | jq '.result.points[].payload.content'
-```
-
-### Check Services
-
-```bash
-# Watcher
-sudo systemctl status mem-qdrant-watcher
-sudo journalctl -u mem-qdrant-watcher -n 20
-
-# OpenClaw
-openclaw status
-openclaw gateway status
-```
-
-### Test Capture
-
-Send a message, then check:
-```bash
-# Should increase by 1-2 points
-curl -s http://:6333/collections/memories_tr | jq '.result.points_count'
+# Check curator logs
+tail -20 /var/log/true-recall-timer.log
 ```

 ---

-## Troubleshooting
+## Dependencies

-### Watcher Not Capturing
-
-```bash
-# Check logs
-sudo journalctl -u mem-qdrant-watcher -f
-
-# Verify dependencies
-curl http://:6333/ # Qdrant
-curl http://:11434/api/tags # Ollama
-```
-
-### Plugin Not Loading
-
-```bash
-# Validate config
-openclaw config validate
-
-# Check logs
-tail /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep memory-qdrant
-
-# Restart gateway
-openclaw gateway restart
-```
-
-### Gateway Won't Start (OpenClaw 2026.2.23+)
-
-**Error:** `non-loopback Control UI requires gateway.controlUi.allowedOrigins`
-
-**Fix:** Add to `openclaw.json`:
-```json
-"gateway": {
-  "controlUi": {
-    "allowedOrigins": ["*"]
-  }
-}
-```
+| Component | Provided By | Required For |
+|-----------|-------------|--------------|
+| Capture | v1 | v2 (input) |
+| Curation | v2 | Injection |
+| Injection | v2 | Context recall |

 ---

-## Status Summary
-
-| Component | Status | Notes |
-|-----------|--------|-------|
-| Real-time watcher | ✅ Active | PID 1748, capturing |
-| memories_tr | ✅ 12,378 pts | All tagged `curated: false` |
-| gems_tr | ✅ 5 pts | Injection ready |
-| Timer curator | ✅ Deployed | Every 30 min via cron |
-| Plugin injection | ✅ **WORKING** | Context injection verified - score 0.587 |
-| Migration | ✅ Complete | 12,378 memories |
-
-**Logs:** `tail /var/log/true-recall-timer.log`
-
-**Next:** Monitor first timer run
-
----
-
-## Roadmap
-
-### Planned Features
-
-| Feature | Status | Description |
-|---------|--------|-------------|
-| Interactive install script | ⏳ Planned | Prompts for embedding model, timer interval, batch size, endpoints |
-| Single embedding model | ⏳ Planned | Option to use one model for both collections |
-| Configurable thresholds | ⏳ Planned | Per-user customization via prompts |
-
-**Install script will prompt for:**
-1. **Embedding model** — snowflake (fast) vs mxbai (accurate)
-2. **Timer interval** — 5 min / 30 min / hourly
-3. **Batch size** — 50 / 100 / 500 memories
-4. **Endpoints** — Qdrant/Ollama URLs
-5. **User ID** — for multi-user setups
-
----
-
-**Maintained by:** Rob
-**AI Assistant:** Kimi 🎙️
-**Version:** 2026.02.24-v2.2
+**Version:** 2.0
+**Requires:** TrueRecall v1
+**Collections:** `memories_tr` (v1), `gems_tr` (v2)
diff --git a/audit_checklist.md b/audit_checklist.md
index 5b81222..43496dc 100644
--- a/audit_checklist.md
+++ b/audit_checklist.md
@@ -1,6 +1,6 @@
-# TrueRecall v2 - Master Audit Checklist (GIT)
+# TrueRecall Gems - Master Audit Checklist (GIT)

-**For:** `.git_projects/true-recall-v2/` (Git Repository - Sanitized)
+**For:** `.git_projects/true-recall-gems/` (Git Repository - Sanitized)
 **Version:** 2.2
 **Last Updated:** 2026-02-25 10:07 CST

@@ -91,18 +91,18 @@ This checklist validates the **git repository** where all private IPs, absolute

 | # | File | Path | Status |
 |---|------|------|--------|
-| 2.1.1 | README.md | `.local_projects/true-recall-v2/README.md` | ☐ |
-| 2.1.2 | session.md | `.local_projects/true-recall-v2/session.md` | ☐ |
-| 2.1.3 | checklist.md | `.local_projects/true-recall-v2/checklist.md` | ☐ |
-| 2.1.4 | curator-prompt.md | `.local_projects/true-recall-v2/curator-prompt.md` | ☐ |
+| 2.1.1 | README.md | `.local_projects/true-recall-gems/README.md` | ☐ |
+| 2.1.2 | session.md | `.local_projects/true-recall-gems/session.md` | ☐ |
+| 2.1.3 | checklist.md | `.local_projects/true-recall-gems/checklist.md` | ☐ |
+| 2.1.4 | curator-prompt.md | `.local_projects/true-recall-gems/curator-prompt.md` | ☐ |

 ### 2.2 Scripts Exist

 | # | File | Path | Status |
 |---|------|------|--------|
-| 2.2.1 | curator_timer.py | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ |
-| 2.2.2 | curator_config.json | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ |
-| 2.2.3 | install.py | `.local_projects/true-recall-v2/install.py` | ☐ |
+| 2.2.1 | curator_timer.py | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
+| 2.2.2 | curator_config.json | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
+| 2.2.3 | install.py | `.local_projects/true-recall-gems/install.py` | ☐ |

 ### 2.3 Watcher Files

@@ -347,7 +347,7 @@ curl -s -X POST http://:6333/collections/memories_tr/points/count \
   -d '{"filter":{"must":[{"key":"curated","match":{"value":false}}]}}'

 # Manual curator run
-cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous
+cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
 python3 curator_timer.py --dry-run

 # Restart services
@@ -357,4 +357,4 @@ sudo systemctl restart mem-qdrant-watcher
 ---

 *This checklist is for LOCAL working directory validation only.*
-*For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-v2/`*
+*For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-gems/`*
diff --git a/function_check.md b/function_check.md
index bf4b507..c09c093 100644
--- a/function_check.md
+++ b/function_check.md
@@ -6,14 +6,14 @@
 **Qdrant:** http://:6333
 **Ollama:** http://:11434
 **Timer:** 5 minutes
-**Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-v2
+**Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-gems

 ---

 ## Quick Status Check

 ```bash
-cd ~/.openclaw/workspace/.local_projects/true-recall-v2
+cd ~/.openclaw/workspace/.local_projects/true-recall-gems
 ```

 ---
@@ -22,13 +22,13 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2

 | Check | Command | Expected |
 |-------|---------|----------|
-| Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-v2` | Files listed |
-| Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-v2` | Files listed |
+| Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-gems` | Files listed |
+| Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-gems` | Files listed |
 | Watcher script | `ls ~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | File exists |

 **Our Paths:**
-- Local: `~/.openclaw/workspace/.local_projects/true-recall-v2/`
-- Git: `~/.openclaw/workspace/.git_projects/true-recall-v2/`
+- Local: `~/.openclaw/workspace/.local_projects/true-recall-gems/`
+- Git: `~/.openclaw/workspace/.git_projects/true-recall-gems/`
 - Watcher: `~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
 - Systemd: `/etc/systemd/system/mem-qdrant-watcher.service`

@@ -131,8 +131,8 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2
 | Path | Check | Status |
 |------|-------|--------|
 | Watcher script | `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | ☐ |
-| Curator script | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ |
-| Config file | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ |
+| Curator script | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
+| Config file | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
 | Log file | `/var/log/true-recall-timer.log` | ☐ |

 ---
@@ -153,7 +153,7 @@ curl -s -X POST http://:6333/collections/memories_tr/points/count \
   -d '{"filter":{"must":[{"key":"user_id","match":{"value":"rob"}},{"key":"curated","match":{"value":false}}]}}' | jq .result.count

 # Run curator manually (Our path: .local_projects)
-cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous
+cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
 python3 curator_timer.py

 # Check OpenClaw plugin
diff --git a/tr-continuous/curator_config.json b/tr-continuous/curator_config.json
index c496f33..c530388 100644
--- a/tr-continuous/curator_config.json
+++ b/tr-continuous/curator_config.json
@@ -1,7 +1,7 @@
 {
   "timer_minutes": 5,
   "max_batch_size": 100,
-  "user_id": "rob",
+  "user_id": "",
   "source_collection": "memories_tr",
   "target_collection": "gems_tr"
 }
diff --git a/tr-continuous/curator_timer.py b/tr-continuous/curator_timer.py
index 3154ae2..adee23c 100755
--- a/tr-continuous/curator_timer.py
+++ b/tr-continuous/curator_timer.py
@@ -1,144 +1,102 @@
 #!/usr/bin/env python3
 """
-TrueRecall Timer Curator: Runs every 30 minutes via cron.
+TrueRecall v2 - Timer Curator
+Runs every 5 minutes via cron
+Extracts gems from uncurated memories and stores them in gems_tr

-- Queries all uncurated memories from memories_tr
-- Sends batch to qwen3 for gem extraction
-- Stores gems to gems_tr
-- Marks processed memories as curated=true
-
-Usage:
-    python3 curator_timer.py --config curator_config.json
-    python3 curator_timer.py --config curator_config.json --dry-run
+REQUIRES: TrueRecall v1 (provides memories_tr via watcher)
 """

-import os
 import sys
 import json
-import argparse
+import hashlib
 import requests
 from datetime import datetime, timezone
-from pathlib import Path
 from typing import List, Dict, Any, Optional
-import hashlib

-# Load config
-def load_config(config_path: str) -> Dict[str, Any]:
-    with open(config_path, 'r') as f:
-        return json.load(f)
-
-# Default paths
-SCRIPT_DIR = Path(__file__).parent
-DEFAULT_CONFIG = SCRIPT_DIR / "curator_config.json"
-
-# Curator prompt path
-CURATOR_PROMPT_PATH = Path("~/.openclaw/workspace/.local_projects/true-recall-v2/curator-prompt.md")
+# Configuration - EDIT THESE for your environment
+QDRANT_URL = "http://:6333"
+OLLAMA_URL = "http://:11434"
+SOURCE_COLLECTION = "memories_tr"
+TARGET_COLLECTION = "gems_tr"
+EMBEDDING_MODEL = "snowflake-arctic-embed2"
+MAX_BATCH = 100
+USER_ID = ""


-def load_curator_prompt() -> str:
-    """Load the curator system prompt."""
+def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int = 100) -> List[Dict[str, Any]]:
+    """Fetch uncurated memories from Qdrant."""
     try:
-        with open(CURATOR_PROMPT_PATH, 'r') as f:
-            return f.read()
-    except FileNotFoundError:
-        print(f"⚠️ Curator prompt not found at {CURATOR_PROMPT_PATH}")
-        return """You are The Curator. Extract meaningful gems from conversation history.
-Extract facts, insights, decisions, preferences, and context that would be valuable to remember.
-Output a JSON array of gems with fields: gem, context, snippet, categories, importance (1-5), confidence (0-0.99)."""
-
-
-def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int) -> List[Dict[str, Any]]:
-    """Query Qdrant for uncurated memories."""
-    filter_data = {
-        "must": [
-            {"key": "user_id", "match": {"value": user_id}},
-            {"key": "curated", "match": {"value": False}}
-        ]
-    }
-
-    all_points = []
-    offset = None
-    iterations = 0
-    max_iterations = 10
-
-    while len(all_points) < max_batch and iterations < max_iterations:
-        iterations += 1
-        scroll_data = {
-            "limit": min(100, max_batch - len(all_points)),
-            "with_payload": True,
-            "filter": filter_data
-        }
-
-        if offset:
-            scroll_data["offset"] = offset
-
-        try:
-            response = requests.post(
-                f"{qdrant_url}/collections/{collection}/points/scroll",
-                json=scroll_data,
-                headers={"Content-Type": "application/json"},
-                timeout=30
-            )
-            response.raise_for_status()
-            result = response.json()
-            points = result.get("result", {}).get("points", [])
-
-            if not points:
-                break
-
-            all_points.extend(points)
-            offset = result.get("result", {}).get("next_page_offset")
-            if not offset:
-                break
-        except Exception as e:
-            print(f"Error querying Qdrant: {e}", file=sys.stderr)
-            break
-
-    # Convert to simple dicts
-    memories = []
-    for point in all_points:
-        payload = point.get("payload", {})
-        memories.append({
-            "id": point.get("id"),
-            "content": payload.get("content", ""),
-            "role": payload.get("role", ""),
-            "timestamp": payload.get("timestamp", ""),
-            "turn": payload.get("turn", 0),
-            **payload
-        })
-
-    return memories[:max_batch]
+        response = requests.post(
+            f"{qdrant_url}/collections/{collection}/points/scroll",
+            json={
+                "limit": max_batch,
+                "filter": {
+                    "must": [
+                        {"key": "user_id", "match": {"value": user_id}},
+                        {"key": "curated", "match": {"value": False}}
+                    ]
+                },
+                "with_payload": True
+            },
+            timeout=30
+        )
+        response.raise_for_status()
+        data = response.json()
+        return data.get("result", {}).get("points", [])
+    except Exception as e:
+        print(f"Error fetching memories: {e}", file=sys.stderr)
+        return []


 def extract_gems(memories: List[Dict[str, Any]], ollama_url: str) -> List[Dict[str, Any]]:
-    """Send memories to qwen3 for gem extraction."""
+    """Send memories to LLM for gem extraction."""
     if not memories:
         return []

-    # Build conversation from memories (support both 'text' and 'content' fields)
+    SKIP_PATTERNS = [
+        "gems extracted", "curator", "curation complete",
+        "system is running", "validation round",
+    ]
+
     conversation_lines = []
     for i, mem in enumerate(memories):
-        # Support both migrated memories (text) and watcher memories (content)
-        text = mem.get("text", "") or mem.get("content", "")
-        if text:
-            # Truncate very long texts
-            text = text[:500] if len(text) > 500 else text
-            conversation_lines.append(f"[{i+1}] {text}")
+        payload = mem.get("payload", {})
+        text = payload.get("text", "") or payload.get("content", "")
+        role = payload.get("role", "")
+
+        if not text:
+            continue
+        text = str(text)
+
+        if role == "assistant":
+            continue
+
+        text_lower = text.lower()
+        if len(text) < 20:
+            continue
+        if any(pattern in text_lower for pattern in SKIP_PATTERNS):
+            continue
+
+        text = text[:500] if len(text) > 500 else text
+        conversation_lines.append(f"[{i+1}] {text}")
+
+    if not conversation_lines:
+        return []

     conversation_text = "\n\n".join(conversation_lines)

-    # Simple extraction prompt
     prompt = """You are a memory curator. Extract atomic facts from the conversation below.

 For each distinct fact/decision/preference, output a JSON object with:
-- "text": the atomic fact (1-2 sentences)
+- "text": the atomic fact (1-2 sentences) - use FIRST PERSON ("I" not "User")
 - "category": one of [decision, preference, technical, project, knowledge, system]
 - "importance": "high" or "medium"

 Return ONLY a JSON array. Example:
 [
-  {"text": "User decided to use Redis for caching", "category": "decision", "importance": "high"},
-  {"text": "User prefers dark mode", "category": "preference", "importance": "medium"}
+  {"text": "I decided to use Redis for caching", "category": "decision", "importance": "high"},
+  {"text": "I prefer dark mode", "category": "preference", "importance": "medium"}
 ]

 If no extractable facts, return [].
@@ -152,7 +110,7 @@ CONVERSATION:
         response = requests.post(
             f"{ollama_url}/api/generate",
             json={
-                "model": "qwen3:30b-a3b-instruct-2507-q8_0",
+                "model": "",
                 "system": prompt,
                 "prompt": full_prompt,
                 "stream": False,
@@ -169,28 +127,20 @@ CONVERSATION:
         return []

     result = response.json()
-    output = result.get('response', '').strip()
-
-    # Extract JSON from output
-    if '```json' in output:
-        output = output.split('```json')[1].split('```')[0].strip()
-    elif '```' in output:
-        output = output.split('```')[1].split('```')[0].strip()
+    response_text = result.get("response", "")

     try:
-        # Find JSON array in output
-        start_idx = output.find('[')
-        end_idx = output.rfind(']')
-        if start_idx != -1 and end_idx != -1 and end_idx > start_idx:
-            output = output[start_idx:end_idx+1]
-
-        gems = json.loads(output)
+        start = response_text.find('[')
+        end = response_text.rfind(']')
+        if start == -1 or end == -1:
+            return []
+        json_str = response_text[start:end+1]
+        gems = json.loads(json_str)
         if not isinstance(gems, list):
-            gems = [gems] if gems else []
+            return []
         return gems
     except json.JSONDecodeError as e:
-        print(f"Error parsing curator output: {e}", file=sys.stderr)
-        print(f"Raw output: {repr(output[:500])}...", file=sys.stderr)
+        print(f"JSON parse error: {e}", file=sys.stderr)
         return []


@@ -199,50 +149,35 @@ def get_embedding(text: str, ollama_url: str) -> Optional[List[float]]:
     try:
         response = requests.post(
             f"{ollama_url}/api/embeddings",
-            json={"model": "snowflake-arctic-embed2", "prompt": text},
+            json={
+                "model": EMBEDDING_MODEL,
+                "prompt": text
+            },
             timeout=30
         )
         response.raise_for_status()
-        return response.json()['embedding']
+        data = response.json()
+        return data.get("embedding")
     except Exception as e:
         print(f"Error getting embedding: {e}", file=sys.stderr)
         return None


-def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collection: str, ollama_url: str) -> bool:
-    """Store a single gem to Qdrant."""
-    # Support both old format (gem, context, snippet) and new format (text, category, importance)
-    embedding_text = gem.get('text', '') or gem.get('gem', '')
-    if not embedding_text:
-        embedding_text = f"{gem.get('gem', '')} {gem.get('context', '')} {gem.get('snippet', '')}".strip()
+def store_gem(gem: Dict[str, Any], vector: List[float], qdrant_url: str, target_collection: str, user_id: str) -> bool:
+    """Store a gem in Qdrant."""
+    embedding_text = gem.get("text", "") or gem.get("gem", "")

-    if not embedding_text:
-        print(f"⚠️ Empty embedding text for gem, skipping", file=sys.stderr)
-        return False
-
-    vector = get_embedding(embedding_text, ollama_url)
-
-    if vector is None:
-        print(f"⚠️ Failed to get embedding for gem", file=sys.stderr)
-        return False
-
-    # Generate ID
-    hash_content = f"{user_id}:{gem.get('conversation_id', '')}:{gem.get('turn_range', '')}:{gem.get('gem', '')[:50]}"
+    hash_content = f"{user_id}:{embedding_text[:100]}"
     hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
     gem_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)

-    # Normalize gem fields - ensure we have text field
     payload = {
+        "text": embedding_text,
+        "category": gem.get("category", "fact"),
+        "importance": gem.get("importance", "medium"),
         "user_id": user_id,
- "text": gem.get('text', gem.get('gem', '')), - "category": gem.get('category', 'general'), - "importance": gem.get('importance', 'medium'), - "curated_at": datetime.now(timezone.utc).isoformat() + "created_at": datetime.now(timezone.utc).isoformat() } - # Preserve any other fields from gem - for key in ['context', 'snippet', 'confidence', 'conversation_id', 'turn_range']: - if key in gem: - payload[key] = gem[key] try: response = requests.put( @@ -264,7 +199,7 @@ def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collect def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool: - """Mark memories as curated in Qdrant using POST /points/payload format.""" + """Mark memories as curated.""" if not memory_ids: return True @@ -288,79 +223,58 @@ def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool: def main(): - parser = argparse.ArgumentParser(description="TrueRecall Timer Curator") - parser.add_argument("--config", "-c", default=str(DEFAULT_CONFIG), help="Config file path") - parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write, just preview") - args = parser.parse_args() + print("TrueRecall v2 - Timer Curator") + print(f"User: {USER_ID}") + print(f"Source: {SOURCE_COLLECTION}") + print(f"Target: {TARGET_COLLECTION}") + print(f"Max batch: {MAX_BATCH}\n") - config = load_config(args.config) - - qdrant_url = os.getenv("QDRANT_URL", "http://:6333") - ollama_url = os.getenv("OLLAMA_URL", "http://:11434") - - user_id = config.get("user_id", "rob") - source_collection = config.get("source_collection", "memories_tr") - target_collection = config.get("target_collection", "gems_tr") - max_batch = config.get("max_batch_size", 100) - - print(f"🔍 TrueRecall Timer Curator") - print(f"👤 User: {user_id}") - print(f"📥 Source: {source_collection}") - print(f"💎 Target: {target_collection}") - print(f"📦 Max batch: {max_batch}") - if args.dry_run: - print("🏃 DRY RUN MODE") - print() - - # Get 
uncurated memories - print("📥 Fetching uncurated memories...") - memories = get_uncurated_memories(qdrant_url, source_collection, user_id, max_batch) - print(f"✅ Found {len(memories)} uncurated memories") + print("Fetching uncurated memories...") + memories = get_uncurated_memories(QDRANT_URL, SOURCE_COLLECTION, USER_ID, MAX_BATCH) + print(f"Found {len(memories)} uncurated memories\n") if not memories: - print("🤷 Nothing to curate. Exiting.") + print("Nothing to curate. Exiting.") return - # Extract gems - print(f"\n🧠 Sending {len(memories)} memories to curator...") - gems = extract_gems(memories, ollama_url) - print(f"✅ Extracted {len(gems)} gems") + print("Sending memories to curator...") + gems = extract_gems(memories, OLLAMA_URL) + print(f"Extracted {len(gems)} gems\n") if not gems: - print("⚠️ No gems extracted. Nothing to store.") - # Still mark as curated so we don't reprocess - memory_ids = [m["id"] for m in memories] # Keep as integers - mark_curated(memory_ids, qdrant_url, source_collection) + print("No gems extracted. Exiting.") return - # Preview - print("\n💎 Gems preview:") + print("Gems preview:") for i, gem in enumerate(gems[:3], 1): - print(f" {i}. {gem.get('gem', 'N/A')[:80]}...") + text = gem.get("text", "N/A")[:50] + print(f" {i}. {text}...") if len(gems) > 3: print(f" ... 
and {len(gems) - 3} more") + print() - if args.dry_run: - print("\n🏃 DRY RUN: Not storing gems or marking curated.") - return - - # Store gems - print(f"\n💾 Storing {len(gems)} gems...") + print("Storing gems...") stored = 0 for gem in gems: - if store_gem(gem, user_id, qdrant_url, target_collection, ollama_url): - stored += 1 - print(f"✅ Stored: {stored}/{len(gems)}") + text = gem.get("text", "") or gem.get("gem", "") + if not text: + continue + + vector = get_embedding(text, OLLAMA_URL) + if vector: + if store_gem(gem, vector, QDRANT_URL, TARGET_COLLECTION, USER_ID): + stored += 1 - # Mark memories as curated - print("\n📝 Marking memories as curated...") - memory_ids = [m["id"] for m in memories] # Keep as integers - if mark_curated(memory_ids, qdrant_url, source_collection): - print(f"✅ Marked {len(memory_ids)} memories as curated") + print(f"Stored: {stored}/{len(gems)}\n") + + print("Marking memories as curated...") + memory_ids = [mem.get("id") for mem in memories if mem.get("id")] + if mark_curated(memory_ids, QDRANT_URL, SOURCE_COLLECTION): + print(f"Marked {len(memory_ids)} memories as curated\n") else: - print(f"⚠️ Failed to mark some memories as curated") + print("Failed to mark memories\n") - print("\n🎉 Curation complete!") + print("Curation complete!") if __name__ == "__main__":