docs: simplify README, update validation and curator docs

This commit is contained in:
root
2026-03-10 12:08:53 -05:00
parent 08aaddb4d0
commit 62953e9f39
6 changed files with 261 additions and 813 deletions

@@ -1,6 +1,6 @@
# TrueRecall v2 - Git Validation Checklist # TrueRecall v2 - Git Validation Checklist
**Environment:** Git Repository (`.git_projects/true-recall-v2/`) **Environment:** Git Repository (`.git_projects/true-recall-gems/`)
**Purpose:** Validate git-ready directory for public sharing **Purpose:** Validate git-ready directory for public sharing
**Version:** 2.4 **Version:** 2.4
**Last Updated:** 2026-02-26 **Last Updated:** 2026-02-26

612
README.md

@@ -1,492 +1,106 @@
# TrueRecall v2 # TrueRecall Gems (v2)
**Project:** Gem extraction and memory recall system **Purpose:** Memory curation (gems) + context injection
**Status:** ✅ Active & Verified
**Location:** `~/.openclaw/workspace/.local_projects/true-recall-v2/`
**Last Updated:** 2026-02-25 12:04 CST
--- **Status:** ⚠️ Requires true-recall-base to be installed first
## Table of Contents
- [Quick Start](#quick-start)
- [Overview](#overview)
- [Current State](#current-state)
- [Architecture](#architecture)
- [Components](#components)
- [Files & Locations](#files--locations)
- [Configuration](#configuration)
- [Validation](#validation)
- [Troubleshooting](#troubleshooting)
- [Status Summary](#status-summary)
---
## Quick Start
```bash
# Check system status
openclaw status
sudo systemctl status mem-qdrant-watcher
# View recent captures
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# Check collections
curl -s http://<QDRANT_IP>:6333/collections | jq '.result.collections[].name'
```
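The `curl | jq` checks above can also be scripted. The following is a minimal Python sketch, assuming Qdrant's collection-info response carries the count at `result.points_count` (the same path the `jq` filter reads). The base URL and the helper names `parse_points_count` and `fetch_points_count` are illustrative assumptions, not part of the project.

```python
import json
import urllib.request

# Assumption: replace with your own <QDRANT_IP> endpoint.
QDRANT_URL = "http://localhost:6333"


def parse_points_count(body: dict) -> int:
    """Return result.points_count from a collection-info response, or 0 if absent."""
    return int((body.get("result") or {}).get("points_count") or 0)


def fetch_points_count(collection: str, base_url: str = QDRANT_URL) -> int:
    """GET /collections/<name> and extract the point count."""
    url = f"{base_url}/collections/{collection}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_points_count(json.load(resp))

# Usage (needs a reachable Qdrant instance):
#   fetch_points_count("memories_tr")
```

Keeping the parsing separate from the HTTP call makes the check testable without a running Qdrant instance.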
---
## Recent Fixes (2026-02-25 12:41 CST)
| Issue | Root Cause | Fix Applied |
|-------|------------|-------------|
| **Watcher stuck on old session** | Watcher only switched sessions when file deleted, old sessions persisted | ✅ Restarted service, now follows current session |
| **Plugin capture 0 exchanges** | OpenClaw uses OpenAI content format (array of items), plugin expected string | ✅ Added `extractMessageText()` to extract text from `type: "text"` items |
| **Gem ID collision** | Hash used non-existent fields (`conversation_id`, `turn_range`, `gem`) | ✅ Hash now uses `embedding_text_for_hash[:100]` |
| **Meta-gems extracted** | Curator extracted from debug/tool output | ✅ Added SKIP_PATTERNS filter ("gems extracted", "✅", "🔍", etc.) + skip `role: "assistant"` |
| **gems_tr pollution** | 5 meta-gems + 1 real gem | ✅ Cleaned, now 1 real gem only |
| **First-person format** | Third person "User decided..." | ✅ Changed to "I decided..." for better query matching (score 0.746 vs 0.39) |
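The gem ID fix in the table above (hashing `embedding_text_for_hash[:100]`) can be sketched as follows. This is an illustration only: the choice of SHA-256, the UUID formatting, and the function name `gem_point_id` are assumptions, since the commit does not show the curator's actual hashing code.

```python
import hashlib
import uuid


def gem_point_id(embedding_text_for_hash: str) -> str:
    """Derive a deterministic, UUID-formatted ID from the first 100 characters.

    Assumption: SHA-256 is used here for illustration; the real curator
    may use a different hash function or ID format.
    """
    prefix = embedding_text_for_hash[:100]
    digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
    # The first 32 hex digits of the digest form a valid UUID string.
    return str(uuid.UUID(digest[:32]))
```

Because only the first 100 characters are hashed, two gems that share that prefix map to the same ID, so re-extracting the same gem overwrites the existing point instead of creating a duplicate.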
### Validation Results
**Plugin capture:**
```
Before: parsed 14 user, 84 assistant messages, 0 exchanges
After: parsed 17 user, 116 assistant messages, 9 exchanges ✅
```
**Watcher:**
```
Before: Watching old session (1737142a... from Feb 24)
After: Watching current session (93dc32bf... from Feb 25) ✅
```
---
## Needed Improvements
| Issue | Description | Priority |
|-------|-------------|----------|
| **Semantic Deduplication** | No dedup between similar gems. Same fact phrased differently creates multiple gems. Need semantic similarity check before storage. | High |
| **Search Result Deduplication** | Similar gems both above threshold are both injected, causing redundancy. Need filter to remove near-duplicates from results. | Medium |
| **Gem Quality Scoring** | No quality metric. Some extracted gems may be low value. Need LLM-based quality scoring. | Medium |
| **Temporal Decay** | All gems treated equally regardless of age. Should weight recent gems higher. | Low |
| **Gem Merging/Updating** | When user changes preference, old gem still exists. Need mechanism to update/contradict old gems. | Low |
| **Importance Calibration** | All curator gems marked "medium" importance. Should dynamically assign based on significance. | Low |
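As a rough illustration of the semantic deduplication item in the table above, a candidate gem's embedding could be compared against existing gem embeddings before storage. This is a sketch of a proposed improvement, not existing behavior. The `0.92` threshold and both function names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(
    candidate: Sequence[float],
    existing: Iterable[Sequence[float]],
    threshold: float = 0.92,  # hypothetical cut-off; would need tuning per embedding model
) -> bool:
    """True if any existing embedding is at least `threshold` similar to the candidate."""
    return any(cosine_similarity(candidate, vec) >= threshold for vec in existing)
```

In practice the comparison would more likely be delegated to a vector search against `gems_tr` than done in a Python loop. The loop only shows the decision rule.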
### Gem Quality & Model Intelligence
**Gem quality improves significantly with smarter models:**
| Model | Gem Quality | Example |
|-------|-------------|---------|
| **Small models (7B)** | Basic extraction, may miss nuance | "User likes local AI" |
| **Medium models (30B)** | Better categorization, captures intent | "I prefer local AI over cloud services for privacy reasons" |
| **Large models (70B+)** | Rich context, infers significance, better first-person conversion | "I decided to self-host AI tools because I value data privacy and want to avoid vendor lock-in" |
**Example Gem (High Quality):**
```json
{
"text": "I decided to keep the installation simple and not include gems for the basic version",
"category": "decision",
"importance": "high"
}
```
**Current:** Using `qwen3:30b-a3b-instruct` for extraction (good balance of quality/speed).
**Recommendation:** For production use, consider `qwen3:72b` or `deepseek-r1` for higher gem quality.
--- ---
## Overview ## Overview
TrueRecall v2 is a **standalone memory system** that extracts "gems" (key insights) from conversations and injects them as context. It operates independently — not an addon or extension of any previous system. TrueRecall Gems adds **curation** and **injection** on top of Base's capture foundation.
TrueRecall v2 replaces both Jarvis Memory and TrueRecall v1 with a completely re-architected solution: **Gems is an ADDON:**
- Requires true-recall-base
| System | Status | Relationship to v2 | - Independent from openclaw-true-recall-blocks
|--------|--------|-------------------| - Choose Gems OR Blocks, not both
| **Jarvis Memory** | Legacy | Replaced by v2 |
| **TrueRecall v1** | Deprecated | Replaced by v2 |
| **TrueRecall v2** | ✅ Active | Complete standalone replacement |
### Three-Layer Architecture
1. **Capture** — Real-time watcher saves every turn to `memories_tr`
2. **Curation** — Timer-based curator extracts gems to `gems_tr`
3. **Injection** — Plugin searches `gems_tr` and injects gems per turn
**Key:** v2 requires no components from Jarvis Memory or v1. It is self-contained with its own storage (Qdrant-only), capture mechanism, and injection system.
--- ---
## Current State ## Three-Tier Architecture
### Verified at 19:02 CST
| Collection | Points | Purpose | Status |
|------------|--------|---------|--------|
| `memories_tr` | **12,729** | Full text (live capture) | ✅ Active |
| `gems_tr` | **14+** | Curated gems (injection) | ✅ **WORKING** - Context injection verified |
**All memories tagged with `curated: false` for timer curation.**
### Services Status
| Service | Status | Details |
|---------|--------|---------|
| `mem-qdrant-watcher` | ✅ Active | PID 234, capturing |
| Timer curator | ✅ Deployed | Every 5 min via cron |
| OpenClaw Gateway | ✅ Running | Version 2026.2.23 |
| memory-qdrant plugin | ✅ Loaded | recall: gems_tr |
---
## Comparison: TrueRecall v2 vs Jarvis Memory vs v1
| Feature | Jarvis Memory | TrueRecall v1 | TrueRecall v2 |
|---------|---------------|---------------|---------------|
| **Storage** | Redis | Redis + Qdrant | Qdrant only |
| **Capture** | Session batch | Session batch | Real-time |
| **Curation** | Manual | Daily 2:45 AM | Timer (5 min) ✅ |
| **Embedding** | — | snowflake | snowflake-arctic-embed2 ✅ |
| **Curator LLM** | — | qwen3:4b | qwen3:30b |
| **State tracking** | — | — | `curated` tag |
| **Batch size** | — | 24h worth | Configurable |
| **JSON parsing** | — | Fallback needed | Native (30b) |
**Key Improvements v2:**
- ✅ Real-time capture (no batch delay)
- ✅ Timer-based curation (responsive vs daily)
- ✅ 30b curator (better gems, faster ~3s)
- ✅ `curated` tag (reliable state tracking)
- ✅ No Redis dependency (simpler stack)
---
## Architecture
### v2.2: Timer-Based Curation
``` ```
┌─────────────────┐ ┌──────────────────────┐ ┌─────────────┐ true-recall-base (REQUIRED)
│ OpenClaw Chat │────▶│ Real-Time Watcher │────▶│ memories_tr │ ├── Watcher daemon
│ (Session JSONL)│ │ (Python daemon) │ │ (Qdrant) │ └── memories_tr (raw capture)
└─────────────────┘ └──────────────────────┘ └──────┬──────┘
│ Every 30 min └──▶ true-recall-gems (THIS ADDON)
├── Curator extracts gems
┌──────────────────┐ ├── gems_tr (curated)
│ Timer Curator │ └── Plugin injection
│ (cron/qwen3) │
└────────┬─────────┘
┌──────────────────┐
│ gems_tr │
│ (Qdrant) │
└────────┬─────────┘
Per turn │
┌──────────────────┐
│ memory-qdrant │
│ plugin │
└──────────────────┘
```
**Key Changes in v2.2:** Note: Don't install with openclaw-true-recall-blocks.
- ✅ Timer-based curation (30 min intervals) Choose one addon: Gems OR Blocks.
- ✅ All memories tagged `curated: false` on capture ```
- ✅ Migration complete (12,378 memories)
- ❌ Removed daily batch processing (2:45 AM)
--- ---
## Components ## Prerequisites
### 1. Real-Time Watcher **REQUIRED: Install TrueRecall Base first**
**File:** `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` Base provides the capture infrastructure (`memories_tr` collection).
**What it does:**
- Watches `~/.openclaw/agents/main/sessions/*.jsonl`
- Parses each turn (user + AI)
- Embeds with `snowflake-arctic-embed2`
- Stores to `memories_tr` instantly
- **Cleans:** Removes markdown, tables, metadata
**Service:** `mem-qdrant-watcher.service`
**Commands:**
```bash ```bash
# Check status # Verify base is running
sudo systemctl status mem-qdrant-watcher curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# View logs
sudo journalctl -u mem-qdrant-watcher -f
# Restart
sudo systemctl restart mem-qdrant-watcher
``` ```
--- ## Installation
### 2. Content Cleaner ### 1. Curator Setup
**File:** `skills/qdrant-memory/scripts/clean_memories_tr.py` **Install cron job:**
**Purpose:** Batch-clean existing points
**Usage:**
```bash ```bash
# Preview changes # Edit path and add to crontab
python3 clean_memories_tr.py --dry-run echo "*/5 * * * * cd <INSTALL_PATH>/true-recall-gems/tr-continuous && /usr/bin/python3 curator_timer.py >> /var/log/true-recall-timer.log 2>&1" | sudo crontab -
# Clean all sudo touch /var/log/true-recall-timer.log
python3 clean_memories_tr.py --execute sudo chmod 644 /var/log/true-recall-timer.log
# Clean 100 (test)
python3 clean_memories_tr.py --execute --limit 100
``` ```
**Cleans:** **Configure curator_config.json:**
- `**bold**` → plain text
- `|tables|` → removed
- `` `code` `` → plain text
- `---` rules → removed
- `# headers` → removed
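The cleaning rules listed above could look roughly like this. It is a sketch under assumptions, not the contents of `clean_memories_tr.py`: the regular expressions are illustrative, and it assumes that header, table, and rule lines are dropped entirely while bold and inline-code markers are stripped in place.

```python
import re


def clean_markdown(text: str) -> str:
    """Strip the markdown constructs listed in the cleaner's rule set."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):            # table rows
            continue
        if re.fullmatch(r"-{3,}", stripped):    # horizontal rules
            continue
        if re.match(r"#{1,6}\s", stripped):     # headers (assumed: whole line dropped)
            continue
        kept.append(line)
    out = "\n".join(kept)
    out = re.sub(r"\*\*(.+?)\*\*", r"\1", out)  # **bold** -> bold
    out = re.sub(r"`([^`]+)`", r"\1", out)      # `code` -> code
    return out.strip()
```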
---
### 3. Timer Curator
**File:** `tr-continuous/curator_timer.py`
**Schedule:** Every 30 minutes (cron)
**Flow:**
1. Query uncurated memories from `memories_tr`
2. Send batch to qwen3 (max 100)
3. Extract gems → store to `gems_tr`
4. Mark memories as `curated: true`
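Step 4 of the flow above can be sketched with Qdrant's set-payload endpoint (`POST /collections/{name}/points/payload`). The helper names are hypothetical, and the real `curator_timer.py` may mark points differently. The sketch only shows the request shape.

```python
import json
import urllib.request
from typing import Any, Dict, List, Union

PointId = Union[int, str]


def build_mark_curated_body(point_ids: List[PointId]) -> Dict[str, Any]:
    """Request body that sets curated=true on the given points."""
    return {"payload": {"curated": True}, "points": list(point_ids)}


def mark_curated(base_url: str, collection: str, point_ids: List[PointId]) -> None:
    """Send the set-payload request; does nothing for an empty batch."""
    if not point_ids:
        return
    request = urllib.request.Request(
        f"{base_url}/collections/{collection}/points/payload?wait=true",
        data=json.dumps(build_mark_curated_body(point_ids)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()
```

Marking memories only after their gems have been stored means a failed run leaves them tagged `curated: false`, so the next timer run picks them up again.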
**Config:** `tr-continuous/curator_config.json`
```json ```json
{ {
"timer_minutes": 30, "timer_minutes": 5,
"max_batch_size": 100 "max_batch_size": 100,
"user_id": "your-user-id",
"source_collection": "memories_tr",
"target_collection": "gems_tr"
} }
``` ```
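A small loader can guard against running the curator with an unedited template. This is a hedged sketch: the default values mirror the config shown in this commit, but `merge_config`, `load_config`, and the placeholder check are illustrative assumptions rather than code from the repository.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Defaults mirror the values shown in curator_config.json.
DEFAULTS: Dict[str, Any] = {
    "timer_minutes": 5,
    "max_batch_size": 100,
    "source_collection": "memories_tr",
    "target_collection": "gems_tr",
}
# Placeholder values that appear in the template and README.
PLACEHOLDER_USER_IDS = {"<USER_ID>", "your-user-id"}


def merge_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults and reject configs that still contain template placeholders."""
    cfg = {**DEFAULTS, **raw}
    user_id = cfg.get("user_id")
    if not user_id or user_id in PLACEHOLDER_USER_IDS:
        raise ValueError("user_id must be set to a real identifier")
    if int(cfg["max_batch_size"]) <= 0:
        raise ValueError("max_batch_size must be positive")
    return cfg


def load_config(path: str) -> Dict[str, Any]:
    """Read the JSON config file and validate it."""
    return merge_config(json.loads(Path(path).read_text(encoding="utf-8")))
```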
**Logs:** `/var/log/true-recall-timer.log` **Edit curator_timer.py:**
- Replace `<QDRANT_IP>`, `<OLLAMA_IP>` with your endpoints
- Replace `<USER_ID>` with your identifier
- Replace `<CURATOR_MODEL>` with your LLM (e.g., `qwen3:30b`)
--- ### 2. Injection Setup
### 4. Curation Model Comparison Add to your OpenClaw `openclaw.json`:
**Current:** `qwen3:4b-instruct`
| Metric | 4b | 30b |
|--------|----|----|
| Speed | ~10-30s per batch | **~3.3s** (tested 2026-02-24) |
| JSON reliability | ⚠️ Needs fallback | ✅ Native |
| Context quality | Basic extraction | ✅ Nuanced |
| Snippet accuracy | ~80% | ✅ Expected: 95%+ |
**30b Benchmark (2026-02-24):**
- Load: 108ms
- Prompt eval: 49ms (1,576 tok/s)
- Generation: 2.9s (233 tokens, 80 tok/s)
- **Total: 3.26s**
**Trade-offs:**
- **4b:** Faster batch processing, lightweight, catches explicit decisions
- **30b:** Deeper context, better inference, ~3x slower but superior quality
**Gem Quality Comparison (Sample Review):**
| Aspect | 4b | 30b |
|--------|----|----|
| **Context depth** | "Extracted via fallback" | Explains *why* decisions were made |
| **Confidence scores** | 0.7-0.85 | 0.9-0.97 |
| **Snippet accuracy** | ~80% (wrong source) | ✅ 95%+ (relevant quotes) |
| **Categories** | Generic "extracted" | Specific: knowledge, technical, decision |
| **Example** | "User implemented BorgBackup" (no context) | "User selected mxbai... due to top MTEB score of 66.5" (explains reasoning) |
**Verdict:** 30b produces significantly higher quality gems — richer context, accurate snippets, and captures architectural intent, not just surface facts.
---
### 5. OpenClaw Compactor Configuration
**Status:** ✅ Applied
**Goal:** Minimal overhead — just remove context, do nothing else.
**Config Applied:**
```json5
{
agents: {
defaults: {
compaction: {
mode: "default", // "default" or "safeguard"
reserveTokensFloor: 0, // Disable safety floor (default: 20000)
memoryFlush: {
enabled: false // Disable silent .md file writes
}
}
}
}
}
```
**What this does:**
- `mode: "default"` — Standard summarization (faster)
- `reserveTokensFloor: 0` — Allow aggressive settings (disables 20k minimum)
- `memoryFlush.enabled: false` — No silent "write memory" turns
**Note:** `reserveTokens` and `keepRecentTokens` are Pi runtime settings, not configurable via `agents.defaults.compaction`. They are set per-model in `contextWindow`/`contextTokens`.
---
### 6. Configuration Options Reference
**All configurable options with defaults:**
| Option | Default | Description |
|--------|---------|-------------|
| **Embedding model** | `mxbai-embed-large` | Model for generating gem embeddings. `mxbai` = higher accuracy (MTEB 66.5). `snowflake` = faster processing. |
| **Timer interval** | `5` minutes | How often the curator runs. `5 min` = fast backlog clearing. `30 min` = balanced. `60 min` = minimal overhead. |
| **Batch size** | `100` | Max memories sent to curator per run. Higher = fewer API calls but more memory usage. |
| **Max gems per run** | *(unlimited)* | Hard limit on gems extracted per batch. Not set by default — extracts all found gems. |
| **Qdrant URL** | `http://<QDRANT_IP>:6333` | Vector database endpoint. Change if Qdrant runs on different host/port. |
| **Ollama URL** | `http://<OLLAMA_IP>:11434` | LLM endpoint for gem extraction. Change if Ollama runs elsewhere. |
| **Curator LLM** | `qwen3:30b-a3b-instruct` | Model for extracting gems. `30b` = best quality (~3s). `4b` = faster but needs JSON fallback. |
| **User ID** | `rob` | Owner identifier for memories. Used for filtering and multi-user setups. |
| **Source collection** | `memories_tr` | Qdrant collection for raw captured memories. |
| **Target collection** | `gems_tr` | Qdrant collection for curated gems (injected into context). |
| **Watcher service** | `enabled` | Real-time capture daemon. Reads session JSONL and writes to Qdrant. |
| **Cron timer** | `enabled` | Periodic curation job. Runs `curator_timer.py` on schedule. |
| **Log path** | `/var/log/true-recall-timer.log` | Where curator output is written. Check with `tail -f`. |
| **Dry-run mode** | `disabled` | Test mode — shows what would be curated without writing to Qdrant. |
**OpenClaw-side options:**
| Option | Default | Description |
|--------|---------|-------------|
| **Compactor mode** | `default` | How context is summarized. `default` = fast standard. `safeguard` = chunked for very long sessions. |
| **Memory flush** | `disabled` | If enabled, writes silent "memory" turn before compaction. Adds overhead — disabled for minimal lag. |
| **Context pruning** | `cache-ttl` | Removes old tool results from context. `cache-ttl` = prunes hourly. `off` = no pruning. |
---
### 7. Embedding Models
**Current Setup:**
- `memories_tr`: `snowflake-arctic-embed2` (capture)
- `gems_tr`: `snowflake-arctic-embed2` (recall) ✅ **FIXED** - Both collections now use same model
**Note:** Previously used `mxbai-embed-large` for gems, but this caused embedding model mismatch. Fixed 2026-02-25.
---
### 6. memory-qdrant Plugin
**Location:** `~/.openclaw/extensions/memory-qdrant/`
**Config (openclaw.json):**
```json
{
"collectionName": "gems_tr",
"captureCollection": "memories_tr",
"autoRecall": true,
"autoCapture": true
}
```
**Functions:**
- **Recall:** Searches `gems_tr`, injects gems (hidden)
- **Capture:** Session-level to `memories_tr` (backup)
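The recall step can be pictured as a threshold-and-top-k filter over search hits. The plugin's actual logic is not shown in this commit, so the function below is an assumption. Its parameters are named after the `minRecallScore` and `maxRecallResults` settings that appear in the full plugin configuration in this README.

```python
from typing import Any, Dict, List


def select_gems(
    hits: List[Dict[str, Any]],
    min_recall_score: float = 0.7,
    max_recall_results: int = 2,
) -> List[Dict[str, Any]]:
    """Keep hits at or above the score threshold, best first, capped at the limit."""
    kept = [hit for hit in hits if hit.get("score", 0.0) >= min_recall_score]
    kept.sort(key=lambda hit: hit["score"], reverse=True)
    return kept[:max_recall_results]
```

Raising the threshold trades recall for precision: fewer gems are injected, but the ones that are injected match the query more closely.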
---
## Files & Locations
### Core Project
```
~/.openclaw/workspace/.local_projects/true-recall-v2/
├── README.md # This file
├── session.md # Detailed notes
├── curator-prompt.md # Extraction prompt
├── tr-daily/
│ └── curate_from_qdrant.py # Daily curator
└── shared/
```
### New Files (2026-02-24)
| File | Purpose |
|------|---------|
| `tr-continuous/curator_timer.py` | Timer curator (v2.2) |
| `tr-continuous/curator_config.json` | Curator settings |
| `tr-continuous/migrate_add_curated.py` | Migration script |
| `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | Capture daemon |
| `skills/qdrant-memory/mem-qdrant-watcher.service` | Systemd service |
### Archived Files (v2.1)
| File | Status | Note |
|------|--------|------|
| `tr-daily/curate_from_qdrant.py` | 📦 Archived | Replaced by timer |
| `tr-continuous/curator_by_count.py` | 📦 Archived | Replaced by timer |
### System Files
| File | Purpose |
|------|---------|
| `~/.openclaw/extensions/memory-qdrant/` | Plugin code |
| `~/.openclaw/openclaw.json` | Configuration |
| `/etc/systemd/system/mem-qdrant-watcher.service` | Service file |
---
## Configuration
### memory-qdrant Plugin
**File:** `~/.openclaw/openclaw.json`
```json ```json
{ {
"plugins": {
"entries": {
"memory-qdrant": { "memory-qdrant": {
"config": { "config": {
"autoCapture": true, "autoCapture": true,
"autoRecall": true, "autoRecall": true,
"collectionName": "gems_tr",
"captureCollection": "memories_tr", "captureCollection": "memories_tr",
"collectionName": "gems_tr",
"embeddingModel": "snowflake-arctic-embed2", "embeddingModel": "snowflake-arctic-embed2",
"maxRecallResults": 2, "maxRecallResults": 2,
"minRecallScore": 0.7, "minRecallScore": 0.8,
"ollamaUrl": "http://<OLLAMA_IP>:11434", "ollamaUrl": "http://<OLLAMA_IP>:11434",
"qdrantUrl": "http://<QDRANT_IP>:6333" "qdrantUrl": "http://<QDRANT_IP>:6333"
}, },
"enabled": true "enabled": true
} }
} },
``` "slots": {
"memory": "memory-qdrant"
### Gateway Control UI (OpenClaw 2026.2.23)
```json
{
"gateway": {
"controlUi": {
"allowedOrigins": ["*"],
"allowInsecureAuth": false,
"dangerouslyDisableDeviceAuth": true
} }
} }
} }
@@ -494,120 +108,40 @@ python3 clean_memories_tr.py --execute --limit 100
--- ---
## Validation ## Files
### Check Collections | File | Purpose |
|------|---------|
| `tr-continuous/curator_timer.py` | Timer-based curator |
| `tr-continuous/curator_config.json` | Curator settings template |
---
## Verification
```bash ```bash
# Count points # Check v1 capture
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count' curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# Check v2 curation
curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count' curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count'
# View recent captures # Check curator logs
curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll \ tail -20 /var/log/true-recall-timer.log
-H "Content-Type: application/json" \
-d '{"limit": 3, "with_payload": true}' | jq '.result.points[].payload.content'
```
### Check Services
```bash
# Watcher
sudo systemctl status mem-qdrant-watcher
sudo journalctl -u mem-qdrant-watcher -n 20
# OpenClaw
openclaw status
openclaw gateway status
```
### Test Capture
Send a message, then check:
```bash
# Should increase by 1-2 points
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
``` ```
--- ---
## Troubleshooting ## Dependencies
### Watcher Not Capturing | Component | Provided By | Required For |
|-----------|-------------|--------------|
```bash | Capture | v1 | v2 (input) |
# Check logs | Curation | v2 | Injection |
sudo journalctl -u mem-qdrant-watcher -f | Injection | v2 | Context recall |
# Verify dependencies
curl http://<QDRANT_IP>:6333/ # Qdrant
curl http://<OLLAMA_IP>:11434/api/tags # Ollama
```
### Plugin Not Loading
```bash
# Validate config
openclaw config validate
# Check logs
tail /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep memory-qdrant
# Restart gateway
openclaw gateway restart
```
### Gateway Won't Start (OpenClaw 2026.2.23+)
**Error:** `non-loopback Control UI requires gateway.controlUi.allowedOrigins`
**Fix:** Add to `openclaw.json`:
```json
"gateway": {
"controlUi": {
"allowedOrigins": ["*"]
}
}
```
--- ---
## Status Summary **Version:** 2.0
**Requires:** TrueRecall v1
| Component | Status | Notes | **Collections:** `memories_tr` (v1), `gems_tr` (v2)
|-----------|--------|-------|
| Real-time watcher | ✅ Active | PID 1748, capturing |
| memories_tr | ✅ 12,378 pts | All tagged `curated: false` |
| gems_tr | ✅ 5 pts | Injection ready |
| Timer curator | ✅ Deployed | Every 30 min via cron |
| Plugin injection | ✅ **WORKING** | Context injection verified - score 0.587 |
| Migration | ✅ Complete | 12,378 memories |
**Logs:** `tail /var/log/true-recall-timer.log`
**Next:** Monitor first timer run
---
## Roadmap
### Planned Features
| Feature | Status | Description |
|---------|--------|-------------|
| Interactive install script | ⏳ Planned | Prompts for embedding model, timer interval, batch size, endpoints |
| Single embedding model | ⏳ Planned | Option to use one model for both collections |
| Configurable thresholds | ⏳ Planned | Per-user customization via prompts |
**Install script will prompt for:**
1. **Embedding model** — snowflake (fast) vs mxbai (accurate)
2. **Timer interval** — 5 min / 30 min / hourly
3. **Batch size** — 50 / 100 / 500 memories
4. **Endpoints** — Qdrant/Ollama URLs
5. **User ID** — for multi-user setups
---
**Maintained by:** Rob
**AI Assistant:** Kimi 🎙️
**Version:** 2026.02.24-v2.2

@@ -1,6 +1,6 @@
# TrueRecall v2 - Master Audit Checklist (GIT) # TrueRecall Gems - Master Audit Checklist (GIT)
**For:** `.git_projects/true-recall-v2/` (Git Repository - Sanitized) **For:** `.git_projects/true-recall-gems/` (Git Repository - Sanitized)
**Version:** 2.2 **Version:** 2.2
**Last Updated:** 2026-02-25 10:07 CST **Last Updated:** 2026-02-25 10:07 CST
@@ -91,18 +91,18 @@ This checklist validates the **git repository** where all private IPs, absolute
| # | File | Path | Status | | # | File | Path | Status |
|---|------|------|--------| |---|------|------|--------|
| 2.1.1 | README.md | `.local_projects/true-recall-v2/README.md` | ☐ | | 2.1.1 | README.md | `.local_projects/true-recall-gems/README.md` | ☐ |
| 2.1.2 | session.md | `.local_projects/true-recall-v2/session.md` | ☐ | | 2.1.2 | session.md | `.local_projects/true-recall-gems/session.md` | ☐ |
| 2.1.3 | checklist.md | `.local_projects/true-recall-v2/checklist.md` | ☐ | | 2.1.3 | checklist.md | `.local_projects/true-recall-gems/checklist.md` | ☐ |
| 2.1.4 | curator-prompt.md | `.local_projects/true-recall-v2/curator-prompt.md` | ☐ | | 2.1.4 | curator-prompt.md | `.local_projects/true-recall-gems/curator-prompt.md` | ☐ |
### 2.2 Scripts Exist ### 2.2 Scripts Exist
| # | File | Path | Status | | # | File | Path | Status |
|---|------|------|--------| |---|------|------|--------|
| 2.2.1 | curator_timer.py | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ | | 2.2.1 | curator_timer.py | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
| 2.2.2 | curator_config.json | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ | | 2.2.2 | curator_config.json | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
| 2.2.3 | install.py | `.local_projects/true-recall-v2/install.py` | ☐ | | 2.2.3 | install.py | `.local_projects/true-recall-gems/install.py` | ☐ |
### 2.3 Watcher Files ### 2.3 Watcher Files
@@ -347,7 +347,7 @@ curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/count \
-d '{"filter":{"must":[{"key":"curated","match":{"value":false}}]}}' -d '{"filter":{"must":[{"key":"curated","match":{"value":false}}]}}'
# Manual curator run # Manual curator run
cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
python3 curator_timer.py --dry-run python3 curator_timer.py --dry-run
# Restart services # Restart services
@@ -357,4 +357,4 @@ sudo systemctl restart mem-qdrant-watcher
--- ---
*This checklist is for LOCAL working directory validation only.* *This checklist is for LOCAL working directory validation only.*
*For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-v2/`* *For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-gems/`*

@@ -6,14 +6,14 @@
**Qdrant:** http://<QDRANT_IP>:6333 **Qdrant:** http://<QDRANT_IP>:6333
**Ollama:** http://<OLLAMA_IP>:11434 **Ollama:** http://<OLLAMA_IP>:11434
**Timer:** 5 minutes **Timer:** 5 minutes
**Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-v2 **Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-gems
--- ---
## Quick Status Check ## Quick Status Check
```bash ```bash
cd ~/.openclaw/workspace/.local_projects/true-recall-v2 cd ~/.openclaw/workspace/.local_projects/true-recall-gems
``` ```
--- ---
@@ -22,13 +22,13 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2
| Check | Command | Expected | | Check | Command | Expected |
|-------|---------|----------| |-------|---------|----------|
| Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-v2` | Files listed | | Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-gems` | Files listed |
| Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-v2` | Files listed | | Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-gems` | Files listed |
| Watcher script | `ls ~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | File exists | | Watcher script | `ls ~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | File exists |
**Our Paths:** **Our Paths:**
- Local: `~/.openclaw/workspace/.local_projects/true-recall-v2/` - Local: `~/.openclaw/workspace/.local_projects/true-recall-gems/`
- Git: `~/.openclaw/workspace/.git_projects/true-recall-v2/` - Git: `~/.openclaw/workspace/.git_projects/true-recall-gems/`
- Watcher: `~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` - Watcher: `~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
- Systemd: `/etc/systemd/system/mem-qdrant-watcher.service` - Systemd: `/etc/systemd/system/mem-qdrant-watcher.service`
@@ -131,8 +131,8 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2
| Path | Check | Status | | Path | Check | Status |
|------|-------|--------| |------|-------|--------|
| Watcher script | `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | ☐ | | Watcher script | `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | ☐ |
| Curator script | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ | | Curator script | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
| Config file | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ | | Config file | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
| Log file | `/var/log/true-recall-timer.log` | ☐ | | Log file | `/var/log/true-recall-timer.log` | ☐ |
--- ---
@@ -153,7 +153,7 @@ curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/count \
-d '{"filter":{"must":[{"key":"user_id","match":{"value":"rob"}},{"key":"curated","match":{"value":false}}]}}' | jq .result.count -d '{"filter":{"must":[{"key":"user_id","match":{"value":"rob"}},{"key":"curated","match":{"value":false}}]}}' | jq .result.count
# Run curator manually (Our path: .local_projects) # Run curator manually (Our path: .local_projects)
cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
python3 curator_timer.py python3 curator_timer.py
# Check OpenClaw plugin # Check OpenClaw plugin

@@ -1,7 +1,7 @@
{ {
"timer_minutes": 5, "timer_minutes": 5,
"max_batch_size": 100, "max_batch_size": 100,
"user_id": "rob", "user_id": "<USER_ID>",
"source_collection": "memories_tr", "source_collection": "memories_tr",
"target_collection": "gems_tr" "target_collection": "gems_tr"
} }

@@ -1,144 +1,102 @@
#!/usr/bin/env python3 #!/usr/bin/env python3
""" """
TrueRecall Timer Curator: Runs every 30 minutes via cron. TrueRecall v2 - Timer Curator
Runs every 5 minutes via cron
Extracts gems from uncurated memories and stores them in gems_tr
- Queries all uncurated memories from memories_tr REQUIRES: TrueRecall v1 (provides memories_tr via watcher)
- Sends batch to qwen3 for gem extraction
- Stores gems to gems_tr
- Marks processed memories as curated=true
Usage:
python3 curator_timer.py --config curator_config.json
python3 curator_timer.py --config curator_config.json --dry-run
""" """
import os
import sys import sys
import json import json
import argparse import hashlib
import requests import requests
from datetime import datetime, timezone from datetime import datetime, timezone
from pathlib import Path
from typing import List, Dict, Any, Optional from typing import List, Dict, Any, Optional
import hashlib
# Load config # Configuration - EDIT THESE for your environment
def load_config(config_path: str) -> Dict[str, Any]: QDRANT_URL = "http://<QDRANT_IP>:6333"
with open(config_path, 'r') as f: OLLAMA_URL = "http://<OLLAMA_IP>:11434"
return json.load(f) SOURCE_COLLECTION = "memories_tr"
TARGET_COLLECTION = "gems_tr"
# Default paths EMBEDDING_MODEL = "snowflake-arctic-embed2"
SCRIPT_DIR = Path(__file__).parent MAX_BATCH = 100
DEFAULT_CONFIG = SCRIPT_DIR / "curator_config.json" USER_ID = "<USER_ID>"
# Curator prompt path
CURATOR_PROMPT_PATH = Path("~/.openclaw/workspace/.local_projects/true-recall-v2/curator-prompt.md")
def load_curator_prompt() -> str: def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int = 100) -> List[Dict[str, Any]]:
"""Load the curator system prompt.""" """Fetch uncurated memories from Qdrant."""
try: try:
with open(CURATOR_PROMPT_PATH, 'r') as f: response = requests.post(
return f.read() f"{qdrant_url}/collections/{collection}/points/scroll",
except FileNotFoundError: json={
print(f"⚠️ Curator prompt not found at {CURATOR_PROMPT_PATH}") "limit": max_batch,
return """You are The Curator. Extract meaningful gems from conversation history. "filter": {
Extract facts, insights, decisions, preferences, and context that would be valuable to remember.
Output a JSON array of gems with fields: gem, context, snippet, categories, importance (1-5), confidence (0-0.99)."""
def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int) -> List[Dict[str, Any]]:
"""Query Qdrant for uncurated memories."""
filter_data = {
"must": [ "must": [
{"key": "user_id", "match": {"value": user_id}}, {"key": "user_id", "match": {"value": user_id}},
{"key": "curated", "match": {"value": False}} {"key": "curated", "match": {"value": False}}
] ]
} },
"with_payload": True
all_points = [] },
offset = None
iterations = 0
max_iterations = 10
while len(all_points) < max_batch and iterations < max_iterations:
iterations += 1
scroll_data = {
"limit": min(100, max_batch - len(all_points)),
"with_payload": True,
"filter": filter_data
}
if offset:
scroll_data["offset"] = offset
try:
response = requests.post(
f"{qdrant_url}/collections/{collection}/points/scroll",
json=scroll_data,
headers={"Content-Type": "application/json"},
timeout=30 timeout=30
) )
response.raise_for_status() response.raise_for_status()
result = response.json() data = response.json()
points = result.get("result", {}).get("points", []) return data.get("result", {}).get("points", [])
if not points:
break
all_points.extend(points)
offset = result.get("result", {}).get("next_page_offset")
if not offset:
break
except Exception as e: except Exception as e:
print(f"Error querying Qdrant: {e}", file=sys.stderr) print(f"Error fetching memories: {e}", file=sys.stderr)
break return []
# Convert to simple dicts
memories = []
for point in all_points:
payload = point.get("payload", {})
memories.append({
"id": point.get("id"),
"content": payload.get("content", ""),
"role": payload.get("role", ""),
"timestamp": payload.get("timestamp", ""),
"turn": payload.get("turn", 0),
**payload
})
return memories[:max_batch]
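
Review note: the removed `get_uncurated_memories` paged through `/points/scroll` using `next_page_offset`, while the replacement issues a single request capped at `max_batch`. If paging is needed again, it might look roughly like the sketch below. The names `scroll_uncurated`, `post_scroll`, and `page_size` are hypothetical and not part of this commit; `post_scroll` stands in for whatever performs the HTTP call and returns the parsed JSON body.

```python
from typing import Any, Callable, Dict, List

# Hypothetical callable: takes the scroll request body, returns the parsed JSON response.
ScrollFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def scroll_uncurated(post_scroll: ScrollFn, user_id: str,
                     max_batch: int = 100, page_size: int = 100) -> List[Dict[str, Any]]:
    """Collect up to max_batch uncurated points, following next_page_offset."""
    body_filter = {
        "must": [
            {"key": "user_id", "match": {"value": user_id}},
            {"key": "curated", "match": {"value": False}},
        ]
    }
    points: List[Dict[str, Any]] = []
    offset = None
    while len(points) < max_batch:
        body: Dict[str, Any] = {
            "limit": min(page_size, max_batch - len(points)),
            "with_payload": True,
            "filter": body_filter,
        }
        # Compare against None so that an offset of 0 is still forwarded.
        if offset is not None:
            body["offset"] = offset
        result = post_scroll(body).get("result", {})
        page = result.get("points", [])
        if not page:
            break
        points.extend(page)
        offset = result.get("next_page_offset")
        if offset is None:
            break
    return points[:max_batch]
```

In the script itself, `post_scroll` could wrap the existing `requests.post` call to `{qdrant_url}/collections/{collection}/points/scroll`, keeping the timeout and `raise_for_status()` handling shown in the diff.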
def extract_gems(memories: List[Dict[str, Any]], ollama_url: str) -> List[Dict[str, Any]]: def extract_gems(memories: List[Dict[str, Any]], ollama_url: str) -> List[Dict[str, Any]]:
"""Send memories to qwen3 for gem extraction.""" """Send memories to LLM for gem extraction."""
if not memories: if not memories:
return [] return []
# Build conversation from memories (support both 'text' and 'content' fields) SKIP_PATTERNS = [
"gems extracted", "curator", "curation complete",
"system is running", "validation round",
]
conversation_lines = [] conversation_lines = []
for i, mem in enumerate(memories): for i, mem in enumerate(memories):
# Support both migrated memories (text) and watcher memories (content) payload = mem.get("payload", {})
text = mem.get("text", "") or mem.get("content", "") text = payload.get("text", "") or payload.get("content", "")
if text: role = payload.get("role", "")
# Truncate very long texts
if not text:
continue
text = str(text)
if role == "assistant":
continue
text_lower = text.lower()
if len(text) < 20:
continue
if any(pattern in text_lower for pattern in SKIP_PATTERNS):
continue
text = text[:500] if len(text) > 500 else text text = text[:500] if len(text) > 500 else text
conversation_lines.append(f"[{i+1}] {text}") conversation_lines.append(f"[{i+1}] {text}")
if not conversation_lines:
return []
conversation_text = "\n\n".join(conversation_lines) conversation_text = "\n\n".join(conversation_lines)
# Simple extraction prompt
prompt = """You are a memory curator. Extract atomic facts from the conversation below. prompt = """You are a memory curator. Extract atomic facts from the conversation below.
For each distinct fact/decision/preference, output a JSON object with: For each distinct fact/decision/preference, output a JSON object with:
- "text": the atomic fact (1-2 sentences) - "text": the atomic fact (1-2 sentences) - use FIRST PERSON ("I" not "User")
- "category": one of [decision, preference, technical, project, knowledge, system] - "category": one of [decision, preference, technical, project, knowledge, system]
- "importance": "high" or "medium" - "importance": "high" or "medium"
Return ONLY a JSON array. Example: Return ONLY a JSON array. Example:
[ [
{"text": "User decided to use Redis for caching", "category": "decision", "importance": "high"}, {"text": "I decided to use Redis for caching", "category": "decision", "importance": "high"},
{"text": "User prefers dark mode", "category": "preference", "importance": "medium"} {"text": "I prefer dark mode", "category": "preference", "importance": "medium"}
] ]
If no extractable facts, return []. If no extractable facts, return [].
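
Review note: the pre-filter that this hunk adds to `extract_gems` (skip empty text, assistant turns, texts under 20 characters, and texts matching `SKIP_PATTERNS`, then truncate to 500 characters) can be read in isolation as the sketch below. The function name `build_conversation_lines` and the parameters `min_len` and `max_len` are hypothetical, introduced only for illustration. The thresholds and patterns are copied from the diff.

```python
from typing import Any, Dict, List

# Patterns copied from the new side of the diff.
SKIP_PATTERNS = [
    "gems extracted", "curator", "curation complete",
    "system is running", "validation round",
]


def build_conversation_lines(memories: List[Dict[str, Any]],
                             min_len: int = 20, max_len: int = 500) -> List[str]:
    """Return numbered conversation lines for the memories that pass the filter."""
    lines: List[str] = []
    for i, mem in enumerate(memories):
        payload = mem.get("payload", {})
        # Migrated memories use "text"; watcher memories use "content".
        text = payload.get("text", "") or payload.get("content", "")
        if not text:
            continue
        text = str(text)
        if payload.get("role", "") == "assistant":
            continue
        if len(text) < min_len:
            continue
        if any(pattern in text.lower() for pattern in SKIP_PATTERNS):
            continue
        # The index refers to the position in the original batch, not the filtered list.
        lines.append(f"[{i+1}] {text[:max_len]}")
    return lines
```

Note that the bracketed index keeps the original batch position, so filtered-out memories leave gaps in the numbering sent to the model.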
@@ -152,7 +110,7 @@ CONVERSATION:
response = requests.post( response = requests.post(
f"{ollama_url}/api/generate", f"{ollama_url}/api/generate",
json={ json={
"model": "qwen3:30b-a3b-instruct-2507-q8_0", "model": "<CURATOR_MODEL>",
"system": prompt, "system": prompt,
"prompt": full_prompt, "prompt": full_prompt,
"stream": False, "stream": False,
@@ -169,28 +127,20 @@ CONVERSATION:
return [] return []
result = response.json() result = response.json()
output = result.get('response', '').strip() response_text = result.get("response", "")
# Extract JSON from output
if '```json' in output:
output = output.split('```json')[1].split('```')[0].strip()
elif '```' in output:
output = output.split('```')[1].split('```')[0].strip()
try: try:
# Find JSON array in output start = response_text.find('[')
start_idx = output.find('[') end = response_text.rfind(']')
end_idx = output.rfind(']') if start == -1 or end == -1:
if start_idx != -1 and end_idx != -1 and end_idx > start_idx: return []
output = output[start_idx:end_idx+1] json_str = response_text[start:end+1]
gems = json.loads(json_str)
gems = json.loads(output)
if not isinstance(gems, list): if not isinstance(gems, list):
gems = [gems] if gems else [] return []
return gems return gems
except json.JSONDecodeError as e: except json.JSONDecodeError as e:
print(f"Error parsing curator output: {e}", file=sys.stderr) print(f"JSON parse error: {e}", file=sys.stderr)
print(f"Raw output: {repr(output[:500])}...", file=sys.stderr)
return [] return []
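
Review note: the new parsing path drops the explicit code-fence stripping and relies on locating the outermost `[` and `]` in the model response. A minimal standalone sketch of that approach follows. The helper name `parse_gem_array` is hypothetical, and the extra `end < start` guard is an addition for illustration that the diff does not contain.

```python
import json
from typing import Any, List


def parse_gem_array(response_text: str) -> List[Any]:
    """Extract the outermost JSON array from an LLM response, or return []."""
    start = response_text.find("[")
    end = response_text.rfind("]")
    # Guard added for this sketch: a "]" that precedes the first "[" is not an array.
    if start == -1 or end == -1 or end < start:
        return []
    try:
        gems = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    return gems if isinstance(gems, list) else []
```

Because the slice runs from the first `[` to the last `]`, a response wrapped in a Markdown code fence still parses, which is probably why the fence handling could be removed.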
@@ -199,50 +149,35 @@ def get_embedding(text: str, ollama_url: str) -> Optional[List[float]]:
try: try:
response = requests.post( response = requests.post(
f"{ollama_url}/api/embeddings", f"{ollama_url}/api/embeddings",
json={"model": "snowflake-arctic-embed2", "prompt": text}, json={
"model": EMBEDDING_MODEL,
"prompt": text
},
timeout=30 timeout=30
) )
response.raise_for_status() response.raise_for_status()
return response.json()['embedding'] data = response.json()
return data.get("embedding")
except Exception as e: except Exception as e:
print(f"Error getting embedding: {e}", file=sys.stderr) print(f"Error getting embedding: {e}", file=sys.stderr)
return None return None
def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collection: str, ollama_url: str) -> bool: def store_gem(gem: Dict[str, Any], vector: List[float], qdrant_url: str, target_collection: str, user_id: str) -> bool:
"""Store a single gem to Qdrant.""" """Store a gem in Qdrant."""
# Support both old format (gem, context, snippet) and new format (text, category, importance) embedding_text = gem.get("text", "") or gem.get("gem", "")
embedding_text = gem.get('text', '') or gem.get('gem', '')
if not embedding_text:
embedding_text = f"{gem.get('gem', '')} {gem.get('context', '')} {gem.get('snippet', '')}".strip()
if not embedding_text: hash_content = f"{user_id}:{embedding_text[:100]}"
print(f"⚠️ Empty embedding text for gem, skipping", file=sys.stderr)
return False
vector = get_embedding(embedding_text, ollama_url)
if vector is None:
print(f"⚠️ Failed to get embedding for gem", file=sys.stderr)
return False
# Generate ID
hash_content = f"{user_id}:{gem.get('conversation_id', '')}:{gem.get('turn_range', '')}:{gem.get('gem', '')[:50]}"
hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8] hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
gem_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63) gem_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
# Normalize gem fields - ensure we have text field
payload = { payload = {
"text": embedding_text,
"category": gem.get("category", "fact"),
"importance": gem.get("importance", "medium"),
"user_id": user_id, "user_id": user_id,
"text": gem.get('text', gem.get('gem', '')), "created_at": datetime.now(timezone.utc).isoformat()
"category": gem.get('category', 'general'),
"importance": gem.get('importance', 'medium'),
"curated_at": datetime.now(timezone.utc).isoformat()
} }
# Preserve any other fields from gem
for key in ['context', 'snippet', 'confidence', 'conversation_id', 'turn_range']:
if key in gem:
payload[key] = gem[key]
try: try:
response = requests.put( response = requests.put(
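
Review note: the new ID scheme hashes `user_id` plus the first 100 characters of the gem text, replacing the older key built from `conversation_id`, `turn_range`, and the `gem` field. The sketch below isolates that computation; the function name `gem_point_id` is hypothetical.

```python
import hashlib


def gem_point_id(user_id: str, text: str) -> int:
    """Derive a deterministic, non-negative 63-bit point ID from user and text."""
    hash_content = f"{user_id}:{text[:100]}"
    digest = hashlib.sha256(hash_content.encode()).digest()[:8]
    return int.from_bytes(digest, byteorder="big") % (2**63)
```

One consequence worth noting: two gems for the same user whose texts share their first 100 characters map to the same ID, so the later write would overwrite the earlier point. That works as de-duplication for repeated facts, but it could also merge distinct gems that happen to share a long prefix.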
@@ -264,7 +199,7 @@ def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collect
def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool: def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool:
"""Mark memories as curated in Qdrant using POST /points/payload format.""" """Mark memories as curated."""
if not memory_ids: if not memory_ids:
return True return True
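
Review note: the body of `mark_curated` falls outside the hunks shown here. The removed docstring mentions the `POST /points/payload` format, which corresponds to Qdrant's set-payload endpoint. As an assumption about what that request could look like (not a reproduction of the script's code), a minimal sketch follows; the helper name `build_mark_curated_request` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_mark_curated_request(qdrant_url: str, collection: str,
                               memory_ids: List[Any]) -> Tuple[str, Dict[str, Any]]:
    """Build the URL and JSON body for setting curated=True on the given points."""
    url = f"{qdrant_url}/collections/{collection}/points/payload"
    body = {
        "payload": {"curated": True},  # merged into each point's existing payload
        "points": list(memory_ids),    # IDs are passed through unchanged
    }
    return url, body
```

The returned pair would be passed to `requests.post(url, json=body, timeout=30)`, matching the timeout used elsewhere in the diff.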
@@ -288,79 +223,58 @@ def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool:
def main(): def main():
parser = argparse.ArgumentParser(description="TrueRecall Timer Curator") print("TrueRecall v2 - Timer Curator")
parser.add_argument("--config", "-c", default=str(DEFAULT_CONFIG), help="Config file path") print(f"User: {USER_ID}")
parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write, just preview") print(f"Source: {SOURCE_COLLECTION}")
args = parser.parse_args() print(f"Target: {TARGET_COLLECTION}")
print(f"Max batch: {MAX_BATCH}\n")
config = load_config(args.config) print("Fetching uncurated memories...")
memories = get_uncurated_memories(QDRANT_URL, SOURCE_COLLECTION, USER_ID, MAX_BATCH)
qdrant_url = os.getenv("QDRANT_URL", "http://<QDRANT_IP>:6333") print(f"Found {len(memories)} uncurated memories\n")
ollama_url = os.getenv("OLLAMA_URL", "http://<OLLAMA_IP>:11434")
user_id = config.get("user_id", "rob")
source_collection = config.get("source_collection", "memories_tr")
target_collection = config.get("target_collection", "gems_tr")
max_batch = config.get("max_batch_size", 100)
print(f"🔍 TrueRecall Timer Curator")
print(f"👤 User: {user_id}")
print(f"📥 Source: {source_collection}")
print(f"💎 Target: {target_collection}")
print(f"📦 Max batch: {max_batch}")
if args.dry_run:
print("🏃 DRY RUN MODE")
print()
# Get uncurated memories
print("📥 Fetching uncurated memories...")
memories = get_uncurated_memories(qdrant_url, source_collection, user_id, max_batch)
print(f"✅ Found {len(memories)} uncurated memories")
if not memories: if not memories:
print("🤷 Nothing to curate. Exiting.") print("Nothing to curate. Exiting.")
return return
# Extract gems print("Sending memories to curator...")
print(f"\n🧠 Sending {len(memories)} memories to curator...") gems = extract_gems(memories, OLLAMA_URL)
gems = extract_gems(memories, ollama_url) print(f"Extracted {len(gems)} gems\n")
print(f"✅ Extracted {len(gems)} gems")
if not gems: if not gems:
print("⚠️ No gems extracted. Nothing to store.") print("No gems extracted. Exiting.")
# Still mark as curated so we don't reprocess
memory_ids = [m["id"] for m in memories] # Keep as integers
mark_curated(memory_ids, qdrant_url, source_collection)
return return
# Preview print("Gems preview:")
print("\n💎 Gems preview:")
for i, gem in enumerate(gems[:3], 1): for i, gem in enumerate(gems[:3], 1):
print(f" {i}. {gem.get('gem', 'N/A')[:80]}...") text = gem.get("text", "N/A")[:50]
print(f" {i}. {text}...")
if len(gems) > 3: if len(gems) > 3:
print(f" ... and {len(gems) - 3} more") print(f" ... and {len(gems) - 3} more")
print()
if args.dry_run: print("Storing gems...")
print("\n🏃 DRY RUN: Not storing gems or marking curated.")
return
# Store gems
print(f"\n💾 Storing {len(gems)} gems...")
stored = 0 stored = 0
for gem in gems: for gem in gems:
if store_gem(gem, user_id, qdrant_url, target_collection, ollama_url): text = gem.get("text", "") or gem.get("gem", "")
if not text:
continue
vector = get_embedding(text, OLLAMA_URL)
if vector:
if store_gem(gem, vector, QDRANT_URL, TARGET_COLLECTION, USER_ID):
stored += 1 stored += 1
print(f"✅ Stored: {stored}/{len(gems)}")
# Mark memories as curated print(f"Stored: {stored}/{len(gems)}\n")
print("\n📝 Marking memories as curated...")
memory_ids = [m["id"] for m in memories] # Keep as integers print("Marking memories as curated...")
if mark_curated(memory_ids, qdrant_url, source_collection): memory_ids = [mem.get("id") for mem in memories if mem.get("id")]
print(f"✅ Marked {len(memory_ids)} memories as curated") if mark_curated(memory_ids, QDRANT_URL, SOURCE_COLLECTION):
print(f"Marked {len(memory_ids)} memories as curated\n")
else: else:
print(f"⚠️ Failed to mark some memories as curated") print("Failed to mark memories\n")
print("\n🎉 Curation complete!") print("Curation complete!")
if __name__ == "__main__": if __name__ == "__main__":