docs: simplify README, update validation and curator docs

This commit is contained in:
root
2026-03-10 12:08:53 -05:00
parent 08aaddb4d0
commit 62953e9f39
6 changed files with 261 additions and 813 deletions


@@ -1,6 +1,6 @@
# TrueRecall v2 - Git Validation Checklist
**Environment:** Git Repository (`.git_projects/true-recall-v2/`)
**Environment:** Git Repository (`.git_projects/true-recall-gems/`)
**Purpose:** Validate git-ready directory for public sharing
**Version:** 2.4
**Last Updated:** 2026-02-26

README.md

@@ -1,613 +1,147 @@
# TrueRecall v2
# TrueRecall Gems (v2)
**Project:** Gem extraction and memory recall system
**Status:** ✅ Active & Verified
**Location:** `~/.openclaw/workspace/.local_projects/true-recall-v2/`
**Last Updated:** 2026-02-25 12:04 CST
**Purpose:** Memory curation (gems) + context injection
---
## Table of Contents
- [Quick Start](#quick-start)
- [Overview](#overview)
- [Current State](#current-state)
- [Architecture](#architecture)
- [Components](#components)
- [Files & Locations](#files--locations)
- [Configuration](#configuration)
- [Validation](#validation)
- [Troubleshooting](#troubleshooting)
- [Status Summary](#status-summary)
---
## Quick Start
```bash
# Check system status
openclaw status
sudo systemctl status mem-qdrant-watcher
# View recent captures
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# Check collections
curl -s http://<QDRANT_IP>:6333/collections | jq '.result.collections[].name'
```
---
## Recent Fixes (2026-02-25 12:41 CST)
| Issue | Root Cause | Fix Applied |
|-------|------------|-------------|
| **Watcher stuck on old session** | Watcher only switched sessions when file deleted, old sessions persisted | ✅ Restarted service, now follows current session |
| **Plugin capture 0 exchanges** | OpenClaw uses OpenAI content format (array of items), plugin expected string | ✅ Added `extractMessageText()` to extract text from `type: "text"` items |
| **Gem ID collision** | Hash used non-existent fields (`conversation_id`, `turn_range`, `gem`) | ✅ Hash now uses `embedding_text_for_hash[:100]` |
| **Meta-gems extracted** | Curator extracted from debug/tool output | ✅ Added SKIP_PATTERNS filter ("gems extracted", "✅", "🔍", etc.) + skip `role: "assistant"` |
| **gems_tr pollution** | 5 meta-gems + 1 real gem | ✅ Cleaned, now 1 real gem only |
| **First-person format** | Third person "User decided..." | ✅ Changed to "I decided..." for better query matching (score 0.746 vs 0.39) |
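
The plugin capture fix in the table above can be illustrated with a short Python sketch. The plugin's actual `extractMessageText()` is not shown in this diff, so the name `extract_message_text` and the exact item shape here are assumptions based on the table's description (content is either a plain string or an array of items, of which only `type: "text"` items carry text).

```python
from typing import Any, Dict, List, Union

# Content is either a plain string or an OpenAI-style list of content items.
Content = Union[str, List[Dict[str, Any]]]


def extract_message_text(content: Content) -> str:
    """Return the plain text of a message, whichever content format it uses."""
    if isinstance(content, str):
        return content
    parts = []
    for item in content:
        # Keep only text items; tool calls, images, etc. are ignored.
        if isinstance(item, dict) and item.get("type") == "text":
            parts.append(str(item.get("text", "")))
    return "\n".join(parts)
```

A parser that only handles the string case would see no usable text in array-format messages, which is consistent with the "0 exchanges" symptom reported above.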
### Validation Results
**Plugin capture:**
```
Before: parsed 14 user, 84 assistant messages, 0 exchanges
After: parsed 17 user, 116 assistant messages, 9 exchanges ✅
```
**Watcher:**
```
Before: Watching old session (1737142a... from Feb 24)
After: Watching current session (93dc32bf... from Feb 25) ✅
```
---
## Needed Improvements
| Issue | Description | Priority |
|-------|-------------|----------|
| **Semantic Deduplication** | No dedup between similar gems. Same fact phrased differently creates multiple gems. Need semantic similarity check before storage. | High |
| **Search Result Deduplication** | Similar gems both above threshold are both injected, causing redundancy. Need filter to remove near-duplicates from results. | Medium |
| **Gem Quality Scoring** | No quality metric. Some extracted gems may be low value. Need LLM-based quality scoring. | Medium |
| **Temporal Decay** | All gems treated equally regardless of age. Should weight recent gems higher. | Low |
| **Gem Merging/Updating** | When user changes preference, old gem still exists. Need mechanism to update/contradict old gems. | Low |
| **Importance Calibration** | All curator gems marked "medium" importance. Should dynamically assign based on significance. | Low |
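
As a rough illustration of the semantic deduplication item above, the following sketch compares a candidate gem's embedding against existing gem embeddings before storage. It is not part of the project: the function names and the `0.9` threshold are hypothetical, and a real implementation would more likely ask Qdrant for the nearest neighbour than compare vectors in Python.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(
    candidate: Sequence[float],
    existing: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cut-off; would need tuning
) -> bool:
    """True if the candidate is at least `threshold`-similar to any stored vector."""
    return any(cosine_similarity(candidate, vec) >= threshold for vec in existing)
```

The same check could be reused for search result deduplication by comparing each retrieved gem against the gems already selected for injection.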
### Gem Quality & Model Intelligence
**Gem quality improves significantly with smarter models:**
| Model | Gem Quality | Example |
|-------|-------------|---------|
| **Small models (7B)** | Basic extraction, may miss nuance | "User likes local AI" |
| **Medium models (30B)** | Better categorization, captures intent | "I prefer local AI over cloud services for privacy reasons" |
| **Large models (70B+)** | Rich context, infers significance, better first-person conversion | "I decided to self-host AI tools because I value data privacy and want to avoid vendor lock-in" |
**Example Gem (High Quality):**
```json
{
"text": "I decided to keep the installation simple and not include gems for the basic version",
"category": "decision",
"importance": "high"
}
```
**Current:** Using `qwen3:30b-a3b-instruct` for extraction (good balance of quality/speed).
**Recommendation:** For production use, consider `qwen3:72b` or `deepseek-r1` for higher gem quality.
**Status:** ⚠️ Requires true-recall-base to be installed first
---
## Overview
TrueRecall v2 is a **standalone memory system** that extracts "gems" (key insights) from conversations and injects them as context. It operates independently — not an addon or extension of any previous system.
TrueRecall Gems adds **curation** and **injection** on top of Base's capture foundation.
TrueRecall v2 replaces both Jarvis Memory and TrueRecall v1 with a completely re-architected solution:
| System | Status | Relationship to v2 |
|--------|--------|-------------------|
| **Jarvis Memory** | Legacy | Replaced by v2 |
| **TrueRecall v1** | Deprecated | Replaced by v2 |
| **TrueRecall v2** | ✅ Active | Complete standalone replacement |
### Three-Layer Architecture
1. **Capture** — Real-time watcher saves every turn to `memories_tr`
2. **Curation** — Timer-based curator extracts gems to `gems_tr`
3. **Injection** — Plugin searches `gems_tr` and injects gems per turn
**Key:** v2 requires no components from Jarvis Memory or v1. It is self-contained with its own storage (Qdrant-only), capture mechanism, and injection system.
**Gems is an ADDON:**
- Requires true-recall-base
- Independent from openclaw-true-recall-blocks
- Choose Gems OR Blocks, not both
---
## Current State
### Verified at 19:02 CST
| Collection | Points | Purpose | Status |
|------------|--------|---------|--------|
| `memories_tr` | **12,729** | Full text (live capture) | ✅ Active |
| `gems_tr` | **14+** | Curated gems (injection) | ✅ **WORKING** - Context injection verified |
**All memories tagged with `curated: false` for timer curation.**
### Services Status
| Service | Status | Details |
|---------|--------|---------|
| `mem-qdrant-watcher` | ✅ Active | PID 234, capturing |
| Timer curator | ✅ Deployed | Every 5 min via cron |
| OpenClaw Gateway | ✅ Running | Version 2026.2.23 |
| memory-qdrant plugin | ✅ Loaded | recall: gems_tr |
---
## Comparison: TrueRecall v2 vs Jarvis Memory vs v1
| Feature | Jarvis Memory | TrueRecall v1 | TrueRecall v2 |
|---------|---------------|---------------|---------------|
| **Storage** | Redis | Redis + Qdrant | Qdrant only |
| **Capture** | Session batch | Session batch | Real-time |
| **Curation** | Manual | Daily 2:45 AM | Timer (5 min) ✅ |
| **Embedding** | — | snowflake | snowflake-arctic-embed2 ✅ |
| **Curator LLM** | — | qwen3:4b | qwen3:30b |
| **State tracking** | — | — | `curated` tag |
| **Batch size** | — | 24h worth | Configurable |
| **JSON parsing** | — | Fallback needed | Native (30b) |
**Key Improvements v2:**
- ✅ Real-time capture (no batch delay)
- ✅ Timer-based curation (responsive vs daily)
- ✅ 30b curator (better gems, faster ~3s)
- ✅ `curated` tag (reliable state tracking)
- ✅ No Redis dependency (simpler stack)
---
## Architecture
### v2.2: Timer-Based Curation
## Three-Tier Architecture
```
┌─────────────────┐ ┌──────────────────────┐ ┌─────────────┐
│ OpenClaw Chat │────▶│ Real-Time Watcher │────▶│ memories_tr │
│ (Session JSONL)│ │ (Python daemon) │ │ (Qdrant) │
└─────────────────┘ └──────────────────────┘ └──────┬──────┘
│ Every 30 min
┌──────────────────┐
│ Timer Curator │
│ (cron/qwen3) │
└────────┬─────────┘
┌──────────────────┐
│ gems_tr │
│ (Qdrant) │
└────────┬─────────┘
Per turn │
┌──────────────────┐
│ memory-qdrant │
│ plugin │
└──────────────────┘
```
true-recall-base (REQUIRED)
├── Watcher daemon
└── memories_tr (raw capture)
└──▶ true-recall-gems (THIS ADDON)
├── Curator extracts gems
├── gems_tr (curated)
└── Plugin injection
**Key Changes in v2.2:**
- ✅ Timer-based curation (30 min intervals)
- ✅ All memories tagged `curated: false` on capture
- ✅ Migration complete (12,378 memories)
- ❌ Removed daily batch processing (2:45 AM)
---
## Components
### 1. Real-Time Watcher
**File:** `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
**What it does:**
- Watches `~/.openclaw/agents/main/sessions/*.jsonl`
- Parses each turn (user + AI)
- Embeds with `snowflake-arctic-embed2`
- Stores to `memories_tr` instantly
- **Cleans:** Removes markdown, tables, metadata
**Service:** `mem-qdrant-watcher.service`
**Commands:**
```bash
# Check status
sudo systemctl status mem-qdrant-watcher
# View logs
sudo journalctl -u mem-qdrant-watcher -f
# Restart
sudo systemctl restart mem-qdrant-watcher
Note: Don't install with openclaw-true-recall-blocks.
Choose one addon: Gems OR Blocks.
```
---
### 2. Content Cleaner
## Prerequisites
**File:** `skills/qdrant-memory/scripts/clean_memories_tr.py`
**REQUIRED: Install TrueRecall Base first**
**Purpose:** Batch-clean existing points
**Usage:**
```bash
# Preview changes
python3 clean_memories_tr.py --dry-run
# Clean all
python3 clean_memories_tr.py --execute
# Clean 100 (test)
python3 clean_memories_tr.py --execute --limit 100
```
**Cleans:**
- `**bold**` → plain text
- `|tables|` → removed
- `` `code` `` → plain text
- `---` rules → removed
- `# headers` → removed
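
The cleaning rules listed above could be approximated with a few regular expressions. This is a minimal sketch, not the contents of `clean_memories_tr.py` (which this diff does not show); `clean_text` is a hypothetical name, and it assumes that "removed" means dropping the whole table row, rule, or header line.

```python
import re


def clean_text(text: str) -> str:
    """Strip markdown tables, rules, headers, bold and inline-code markers."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):            # table rows are dropped
            continue
        if re.fullmatch(r"-{3,}", stripped):    # horizontal rules are dropped
            continue
        if re.match(r"#{1,6}\s", stripped):     # header lines are dropped (assumption)
            continue
        kept.append(line)
    cleaned = "\n".join(kept)
    cleaned = re.sub(r"\*\*(.+?)\*\*", r"\1", cleaned)  # **bold** -> bold
    cleaned = re.sub(r"`([^`]+)`", r"\1", cleaned)      # `code` -> code
    return cleaned.strip()
```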
---
### 3. Timer Curator
**File:** `tr-continuous/curator_timer.py`
**Schedule:** Every 30 minutes (cron)
**Flow:**
1. Query uncurated memories from `memories_tr`
2. Send batch to qwen3 (max 100)
3. Extract gems → store to `gems_tr`
4. Mark memories as `curated: true`
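
Step 4 of the flow above can be sketched with Qdrant's set-payload REST endpoint (`POST /collections/{collection}/points/payload`). The function names below are hypothetical and the curator's real implementation of this step is not visible in this diff; the sketch uses only the standard library so it can be read in isolation.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple, Union

PointId = Union[int, str]


def build_mark_curated_request(
    qdrant_url: str, collection: str, point_ids: List[PointId]
) -> Tuple[str, Dict[str, Any]]:
    """Build the URL and JSON body that set curated=true on the given points."""
    url = f"{qdrant_url}/collections/{collection}/points/payload"
    body = {"payload": {"curated": True}, "points": list(point_ids)}
    return url, body


def mark_curated(
    qdrant_url: str, collection: str, point_ids: List[PointId], timeout: float = 30.0
) -> None:
    """Send the set-payload request; does nothing for an empty batch."""
    if not point_ids:
        return
    url, body = build_mark_curated_request(qdrant_url, collection, point_ids)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # an HTTP error status raises before this point
```

Marking memories only after their gems have been stored means that a failed run leaves them `curated: false`, so the next timer run picks them up again.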
**Config:** `tr-continuous/curator_config.json`
```json
{
"timer_minutes": 30,
"max_batch_size": 100
}
```
**Logs:** `/var/log/true-recall-timer.log`
---
### 4. Curation Model Comparison
**Current:** `qwen3:4b-instruct`
| Metric | 4b | 30b |
|--------|----|----|
| Speed | ~10-30s per batch | **~3.3s** (tested 2026-02-24) |
| JSON reliability | ⚠️ Needs fallback | ✅ Native |
| Context quality | Basic extraction | ✅ Nuanced |
| Snippet accuracy | ~80% | ✅ Expected: 95%+ |
**30b Benchmark (2026-02-24):**
- Load: 108ms
- Prompt eval: 49ms (1,576 tok/s)
- Generation: 2.9s (233 tokens, 80 tok/s)
- **Total: 3.26s**
**Trade-offs:**
- **4b:** Faster batch processing, lightweight, catches explicit decisions
- **30b:** Deeper context, better inference, ~3x slower but superior quality
**Gem Quality Comparison (Sample Review):**
| Aspect | 4b | 30b |
|--------|----|----|
| **Context depth** | "Extracted via fallback" | Explains *why* decisions were made |
| **Confidence scores** | 0.7-0.85 | 0.9-0.97 |
| **Snippet accuracy** | ~80% (wrong source) | ✅ 95%+ (relevant quotes) |
| **Categories** | Generic "extracted" | Specific: knowledge, technical, decision |
| **Example** | "User implemented BorgBackup" (no context) | "User selected mxbai... due to top MTEB score of 66.5" (explains reasoning) |
**Verdict:** 30b produces significantly higher quality gems — richer context, accurate snippets, and captures architectural intent, not just surface facts.
---
### 5. OpenClaw Compactor Configuration
**Status:** ✅ Applied
**Goal:** Minimal overhead — just remove context, do nothing else.
**Config Applied:**
```json5
{
agents: {
defaults: {
compaction: {
mode: "default", // "default" or "safeguard"
reserveTokensFloor: 0, // Disable safety floor (default: 20000)
memoryFlush: {
enabled: false // Disable silent .md file writes
}
}
}
}
}
```
**What this does:**
- `mode: "default"` — Standard summarization (faster)
- `reserveTokensFloor: 0` — Allow aggressive settings (disables 20k minimum)
- `memoryFlush.enabled: false` — No silent "write memory" turns
**Note:** `reserveTokens` and `keepRecentTokens` are Pi runtime settings, not configurable via `agents.defaults.compaction`. They are set per-model in `contextWindow`/`contextTokens`.
---
### 6. Configuration Options Reference
**All configurable options with defaults:**
| Option | Default | Description |
|--------|---------|-------------|
| **Embedding model** | `mxbai-embed-large` | Model for generating gem embeddings. `mxbai` = higher accuracy (MTEB 66.5). `snowflake` = faster processing. |
| **Timer interval** | `5` minutes | How often the curator runs. `5 min` = fast backlog clearing. `30 min` = balanced. `60 min` = minimal overhead. |
| **Batch size** | `100` | Max memories sent to curator per run. Higher = fewer API calls but more memory usage. |
| **Max gems per run** | *(unlimited)* | Hard limit on gems extracted per batch. Not set by default — extracts all found gems. |
| **Qdrant URL** | `http://<QDRANT_IP>:6333` | Vector database endpoint. Change if Qdrant runs on different host/port. |
| **Ollama URL** | `http://<OLLAMA_IP>:11434` | LLM endpoint for gem extraction. Change if Ollama runs elsewhere. |
| **Curator LLM** | `qwen3:30b-a3b-instruct` | Model for extracting gems. `30b` = best quality (~3s). `4b` = faster but needs JSON fallback. |
| **User ID** | `rob` | Owner identifier for memories. Used for filtering and multi-user setups. |
| **Source collection** | `memories_tr` | Qdrant collection for raw captured memories. |
| **Target collection** | `gems_tr` | Qdrant collection for curated gems (injected into context). |
| **Watcher service** | `enabled` | Real-time capture daemon. Reads session JSONL and writes to Qdrant. |
| **Cron timer** | `enabled` | Periodic curation job. Runs `curator_timer.py` on schedule. |
| **Log path** | `/var/log/true-recall-timer.log` | Where curator output is written. Check with `tail -f`. |
| **Dry-run mode** | `disabled` | Test mode — shows what would be curated without writing to Qdrant. |
**OpenClaw-side options:**
| Option | Default | Description |
|--------|---------|-------------|
| **Compactor mode** | `default` | How context is summarized. `default` = fast standard. `safeguard` = chunked for very long sessions. |
| **Memory flush** | `disabled` | If enabled, writes silent "memory" turn before compaction. Adds overhead — disabled for minimal lag. |
| **Context pruning** | `cache-ttl` | Removes old tool results from context. `cache-ttl` = prunes hourly. `off` = no pruning. |
---
### 7. Embedding Models
**Current Setup:**
- `memories_tr`: `snowflake-arctic-embed2` (capture)
- `gems_tr`: `snowflake-arctic-embed2` (recall) ✅ **FIXED** - Both collections now use same model
**Note:** Previously used `mxbai-embed-large` for gems, but this caused embedding model mismatch. Fixed 2026-02-25.
---
### 8. memory-qdrant Plugin
**Location:** `~/.openclaw/extensions/memory-qdrant/`
**Config (openclaw.json):**
```json
{
"collectionName": "gems_tr",
"captureCollection": "memories_tr",
"autoRecall": true,
"autoCapture": true
}
```
**Functions:**
- **Recall:** Searches `gems_tr`, injects gems (hidden)
- **Capture:** Session-level to `memories_tr` (backup)
---
## Files & Locations
### Core Project
```
~/.openclaw/workspace/.local_projects/true-recall-v2/
├── README.md # This file
├── session.md # Detailed notes
├── curator-prompt.md # Extraction prompt
├── tr-daily/
│ └── curate_from_qdrant.py # Daily curator
└── shared/
```
### New Files (2026-02-24)
| File | Purpose |
|------|---------|
| `tr-continuous/curator_timer.py` | Timer curator (v2.2) |
| `tr-continuous/curator_config.json` | Curator settings |
| `tr-continuous/migrate_add_curated.py` | Migration script |
| `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | Capture daemon |
| `skills/qdrant-memory/mem-qdrant-watcher.service` | Systemd service |
### Archived Files (v2.1)
| File | Status | Note |
|------|--------|------|
| `tr-daily/curate_from_qdrant.py` | 📦 Archived | Replaced by timer |
| `tr-continuous/curator_by_count.py` | 📦 Archived | Replaced by timer |
### System Files
| File | Purpose |
|------|---------|
| `~/.openclaw/extensions/memory-qdrant/` | Plugin code |
| `~/.openclaw/openclaw.json` | Configuration |
| `/etc/systemd/system/mem-qdrant-watcher.service` | Service file |
---
## Configuration
### memory-qdrant Plugin
**File:** `~/.openclaw/openclaw.json`
```json
{
"memory-qdrant": {
"config": {
"autoCapture": true,
"autoRecall": true,
"collectionName": "gems_tr",
"captureCollection": "memories_tr",
"embeddingModel": "snowflake-arctic-embed2",
"maxRecallResults": 2,
"minRecallScore": 0.7,
"ollamaUrl": "http://<OLLAMA_IP>:11434",
"qdrantUrl": "http://<QDRANT_IP>:6333"
},
"enabled": true
}
}
```
### Gateway Control UI (OpenClaw 2026.2.23)
```json
{
"gateway": {
"controlUi": {
"allowedOrigins": ["*"],
"allowInsecureAuth": false,
"dangerouslyDisableDeviceAuth": true
}
}
}
```
---
## Validation
### Check Collections
Base provides the capture infrastructure (`memories_tr` collection).
```bash
# Count points
# Verify base is running
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
```
## Installation
### 1. Curator Setup
**Install cron job:**
```bash
# Edit path and add to crontab
# Note: piping into `crontab -` replaces the entire existing crontab for that user
echo "*/5 * * * * cd <INSTALL_PATH>/true-recall-gems/tr-continuous && /usr/bin/python3 curator_timer.py >> /var/log/true-recall-timer.log 2>&1" | sudo crontab -
sudo touch /var/log/true-recall-timer.log
sudo chmod 644 /var/log/true-recall-timer.log
```
**Configure curator_config.json:**
```json
{
"timer_minutes": 5,
"max_batch_size": 100,
"user_id": "your-user-id",
"source_collection": "memories_tr",
"target_collection": "gems_tr"
}
```
**Edit curator_timer.py:**
- Replace `<QDRANT_IP>`, `<OLLAMA_IP>` with your endpoints
- Replace `<USER_ID>` with your identifier
- Replace `<CURATOR_MODEL>` with your LLM (e.g., `qwen3:30b`)
### 2. Injection Setup
Add to your OpenClaw `openclaw.json`:
```json
{
"plugins": {
"entries": {
"memory-qdrant": {
"config": {
"autoCapture": true,
"autoRecall": true,
"captureCollection": "memories_tr",
"collectionName": "gems_tr",
"embeddingModel": "snowflake-arctic-embed2",
"maxRecallResults": 2,
"minRecallScore": 0.8,
"ollamaUrl": "http://<OLLAMA_IP>:11434",
"qdrantUrl": "http://<QDRANT_IP>:6333"
},
"enabled": true
}
},
"slots": {
"memory": "memory-qdrant"
}
}
}
```
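
To show how `minRecallScore` and `maxRecallResults` would plausibly interact, here is a small sketch. The plugin's recall code is not part of this diff, so `select_gems` and the result shape (`text` and `score` keys) are assumptions for illustration only.

```python
from typing import Any, Dict, List


def select_gems(
    results: List[Dict[str, Any]],
    min_score: float = 0.8,   # mirrors minRecallScore in the config above
    max_results: int = 2,     # mirrors maxRecallResults in the config above
) -> List[Dict[str, Any]]:
    """Keep results at or above min_score, best first, capped at max_results."""
    eligible = [r for r in results if r.get("score", 0.0) >= min_score]
    eligible.sort(key=lambda r: r["score"], reverse=True)
    return eligible[:max_results]
```

Under this reading, raising `minRecallScore` reduces how many gems qualify at all, while `maxRecallResults` only caps how many of the qualifying gems are injected per turn.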
---
## Files
| File | Purpose |
|------|---------|
| `tr-continuous/curator_timer.py` | Timer-based curator |
| `tr-continuous/curator_config.json` | Curator settings template |
---
## Verification
```bash
# Check v1 capture
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# Check v2 curation
curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count'
# View recent captures
curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll \
-H "Content-Type: application/json" \
-d '{"limit": 3, "with_payload": true}' | jq '.result.points[].payload.content'
```
### Check Services
```bash
# Watcher
sudo systemctl status mem-qdrant-watcher
sudo journalctl -u mem-qdrant-watcher -n 20
# OpenClaw
openclaw status
openclaw gateway status
```
### Test Capture
Send a message, then check:
```bash
# Should increase by 1-2 points
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# Check curator logs
tail -20 /var/log/true-recall-timer.log
```
---
## Troubleshooting
## Dependencies
### Watcher Not Capturing
```bash
# Check logs
sudo journalctl -u mem-qdrant-watcher -f
# Verify dependencies
curl http://<QDRANT_IP>:6333/ # Qdrant
curl http://<OLLAMA_IP>:11434/api/tags # Ollama
```
### Plugin Not Loading
```bash
# Validate config
openclaw config validate
# Check logs
tail /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep memory-qdrant
# Restart gateway
openclaw gateway restart
```
### Gateway Won't Start (OpenClaw 2026.2.23+)
**Error:** `non-loopback Control UI requires gateway.controlUi.allowedOrigins`
**Fix:** Add to `openclaw.json`:
```json
"gateway": {
"controlUi": {
"allowedOrigins": ["*"]
}
}
```
| Component | Provided By | Required For |
|-----------|-------------|--------------|
| Capture | v1 | v2 (input) |
| Curation | v2 | Injection |
| Injection | v2 | Context recall |
---
## Status Summary
| Component | Status | Notes |
|-----------|--------|-------|
| Real-time watcher | ✅ Active | PID 1748, capturing |
| memories_tr | ✅ 12,378 pts | All tagged `curated: false` |
| gems_tr | ✅ 5 pts | Injection ready |
| Timer curator | ✅ Deployed | Every 30 min via cron |
| Plugin injection | ✅ **WORKING** | Context injection verified - score 0.587 |
| Migration | ✅ Complete | 12,378 memories |
**Logs:** `tail /var/log/true-recall-timer.log`
**Next:** Monitor first timer run
---
## Roadmap
### Planned Features
| Feature | Status | Description |
|---------|--------|-------------|
| Interactive install script | ⏳ Planned | Prompts for embedding model, timer interval, batch size, endpoints |
| Single embedding model | ⏳ Planned | Option to use one model for both collections |
| Configurable thresholds | ⏳ Planned | Per-user customization via prompts |
**Install script will prompt for:**
1. **Embedding model** — snowflake (fast) vs mxbai (accurate)
2. **Timer interval** — 5 min / 30 min / hourly
3. **Batch size** — 50 / 100 / 500 memories
4. **Endpoints** — Qdrant/Ollama URLs
5. **User ID** — for multi-user setups
---
**Maintained by:** Rob
**AI Assistant:** Kimi 🎙️
**Version:** 2026.02.24-v2.2
**Version:** 2.0
**Requires:** TrueRecall v1
**Collections:** `memories_tr` (v1), `gems_tr` (v2)


@@ -1,6 +1,6 @@
# TrueRecall v2 - Master Audit Checklist (GIT)
# TrueRecall Gems - Master Audit Checklist (GIT)
**For:** `.git_projects/true-recall-v2/` (Git Repository - Sanitized)
**For:** `.git_projects/true-recall-gems/` (Git Repository - Sanitized)
**Version:** 2.2
**Last Updated:** 2026-02-25 10:07 CST
@@ -91,18 +91,18 @@ This checklist validates the **git repository** where all private IPs, absolute
| # | File | Path | Status |
|---|------|------|--------|
| 2.1.1 | README.md | `.local_projects/true-recall-v2/README.md` | ☐ |
| 2.1.2 | session.md | `.local_projects/true-recall-v2/session.md` | ☐ |
| 2.1.3 | checklist.md | `.local_projects/true-recall-v2/checklist.md` | ☐ |
| 2.1.4 | curator-prompt.md | `.local_projects/true-recall-v2/curator-prompt.md` | ☐ |
| 2.1.1 | README.md | `.local_projects/true-recall-gems/README.md` | ☐ |
| 2.1.2 | session.md | `.local_projects/true-recall-gems/session.md` | ☐ |
| 2.1.3 | checklist.md | `.local_projects/true-recall-gems/checklist.md` | ☐ |
| 2.1.4 | curator-prompt.md | `.local_projects/true-recall-gems/curator-prompt.md` | ☐ |
### 2.2 Scripts Exist
| # | File | Path | Status |
|---|------|------|--------|
| 2.2.1 | curator_timer.py | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ |
| 2.2.2 | curator_config.json | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ |
| 2.2.3 | install.py | `.local_projects/true-recall-v2/install.py` | ☐ |
| 2.2.1 | curator_timer.py | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
| 2.2.2 | curator_config.json | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
| 2.2.3 | install.py | `.local_projects/true-recall-gems/install.py` | ☐ |
### 2.3 Watcher Files
@@ -347,7 +347,7 @@ curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/count \
-d '{"filter":{"must":[{"key":"curated","match":{"value":false}}]}}'
# Manual curator run
cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous
cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
python3 curator_timer.py --dry-run
# Restart services
@@ -357,4 +357,4 @@ sudo systemctl restart mem-qdrant-watcher
---
*This checklist is for LOCAL working directory validation only.*
*For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-v2/`*
*For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-gems/`*


@@ -6,14 +6,14 @@
**Qdrant:** http://<QDRANT_IP>:6333
**Ollama:** http://<OLLAMA_IP>:11434
**Timer:** 5 minutes
**Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-v2
**Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-gems
---
## Quick Status Check
```bash
cd ~/.openclaw/workspace/.local_projects/true-recall-v2
cd ~/.openclaw/workspace/.local_projects/true-recall-gems
```
---
@@ -22,13 +22,13 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2
| Check | Command | Expected |
|-------|---------|----------|
| Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-v2` | Files listed |
| Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-v2` | Files listed |
| Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-gems` | Files listed |
| Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-gems` | Files listed |
| Watcher script | `ls ~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | File exists |
**Our Paths:**
- Local: `~/.openclaw/workspace/.local_projects/true-recall-v2/`
- Git: `~/.openclaw/workspace/.git_projects/true-recall-v2/`
- Local: `~/.openclaw/workspace/.local_projects/true-recall-gems/`
- Git: `~/.openclaw/workspace/.git_projects/true-recall-gems/`
- Watcher: `~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
- Systemd: `/etc/systemd/system/mem-qdrant-watcher.service`
@@ -131,8 +131,8 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2
| Path | Check | Status |
|------|-------|--------|
| Watcher script | `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | ☐ |
| Curator script | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ |
| Config file | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ |
| Curator script | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
| Config file | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
| Log file | `/var/log/true-recall-timer.log` | ☐ |
---
@@ -153,7 +153,7 @@ curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/count \
-d '{"filter":{"must":[{"key":"user_id","match":{"value":"rob"}},{"key":"curated","match":{"value":false}}]}}' | jq .result.count
# Run curator manually (Our path: .local_projects)
cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous
cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
python3 curator_timer.py
# Check OpenClaw plugin


@@ -1,7 +1,7 @@
{
"timer_minutes": 5,
"max_batch_size": 100,
"user_id": "rob",
"user_id": "<USER_ID>",
"source_collection": "memories_tr",
"target_collection": "gems_tr"
}


@@ -1,144 +1,102 @@
#!/usr/bin/env python3
"""
TrueRecall Timer Curator: Runs every 30 minutes via cron.
TrueRecall v2 - Timer Curator
Runs every 5 minutes via cron
Extracts gems from uncurated memories and stores them in gems_tr
- Queries all uncurated memories from memories_tr
- Sends batch to qwen3 for gem extraction
- Stores gems to gems_tr
- Marks processed memories as curated=true
Usage:
python3 curator_timer.py --config curator_config.json
python3 curator_timer.py --config curator_config.json --dry-run
REQUIRES: TrueRecall v1 (provides memories_tr via watcher)
"""
import os
import sys
import json
import argparse
import hashlib
import requests
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Dict, Any, Optional
import hashlib
# Load config
def load_config(config_path: str) -> Dict[str, Any]:
with open(config_path, 'r') as f:
return json.load(f)
# Default paths
SCRIPT_DIR = Path(__file__).parent
DEFAULT_CONFIG = SCRIPT_DIR / "curator_config.json"
# Curator prompt path
CURATOR_PROMPT_PATH = Path("~/.openclaw/workspace/.local_projects/true-recall-v2/curator-prompt.md")
# Configuration - EDIT THESE for your environment
QDRANT_URL = "http://<QDRANT_IP>:6333"
OLLAMA_URL = "http://<OLLAMA_IP>:11434"
SOURCE_COLLECTION = "memories_tr"
TARGET_COLLECTION = "gems_tr"
EMBEDDING_MODEL = "snowflake-arctic-embed2"
MAX_BATCH = 100
USER_ID = "<USER_ID>"
def load_curator_prompt() -> str:
"""Load the curator system prompt."""
def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int = 100) -> List[Dict[str, Any]]:
"""Fetch uncurated memories from Qdrant."""
try:
with open(CURATOR_PROMPT_PATH, 'r') as f:
return f.read()
except FileNotFoundError:
print(f"⚠️ Curator prompt not found at {CURATOR_PROMPT_PATH}")
return """You are The Curator. Extract meaningful gems from conversation history.
Extract facts, insights, decisions, preferences, and context that would be valuable to remember.
Output a JSON array of gems with fields: gem, context, snippet, categories, importance (1-5), confidence (0-0.99)."""
def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int) -> List[Dict[str, Any]]:
"""Query Qdrant for uncurated memories."""
filter_data = {
"must": [
{"key": "user_id", "match": {"value": user_id}},
{"key": "curated", "match": {"value": False}}
]
}
all_points = []
offset = None
iterations = 0
max_iterations = 10
while len(all_points) < max_batch and iterations < max_iterations:
iterations += 1
scroll_data = {
"limit": min(100, max_batch - len(all_points)),
"with_payload": True,
"filter": filter_data
}
if offset:
scroll_data["offset"] = offset
try:
response = requests.post(
f"{qdrant_url}/collections/{collection}/points/scroll",
json=scroll_data,
headers={"Content-Type": "application/json"},
timeout=30
)
response.raise_for_status()
result = response.json()
points = result.get("result", {}).get("points", [])
if not points:
break
all_points.extend(points)
offset = result.get("result", {}).get("next_page_offset")
if not offset:
break
except Exception as e:
print(f"Error querying Qdrant: {e}", file=sys.stderr)
break
# Convert to simple dicts
memories = []
for point in all_points:
payload = point.get("payload", {})
memories.append({
"id": point.get("id"),
"content": payload.get("content", ""),
"role": payload.get("role", ""),
"timestamp": payload.get("timestamp", ""),
"turn": payload.get("turn", 0),
**payload
})
return memories[:max_batch]
response = requests.post(
f"{qdrant_url}/collections/{collection}/points/scroll",
json={
"limit": max_batch,
"filter": {
"must": [
{"key": "user_id", "match": {"value": user_id}},
{"key": "curated", "match": {"value": False}}
]
},
"with_payload": True
},
timeout=30
)
response.raise_for_status()
data = response.json()
return data.get("result", {}).get("points", [])
except Exception as e:
print(f"Error fetching memories: {e}", file=sys.stderr)
return []
def extract_gems(memories: List[Dict[str, Any]], ollama_url: str) -> List[Dict[str, Any]]:
"""Send memories to qwen3 for gem extraction."""
"""Send memories to LLM for gem extraction."""
if not memories:
return []
# Build conversation from memories (support both 'text' and 'content' fields)
SKIP_PATTERNS = [
"gems extracted", "curator", "curation complete",
"system is running", "validation round",
]
conversation_lines = []
for i, mem in enumerate(memories):
# Support both migrated memories (text) and watcher memories (content)
text = mem.get("text", "") or mem.get("content", "")
if text:
# Truncate very long texts
text = text[:500] if len(text) > 500 else text
conversation_lines.append(f"[{i+1}] {text}")
payload = mem.get("payload", {})
text = payload.get("text", "") or payload.get("content", "")
role = payload.get("role", "")
if not text:
continue
text = str(text)
if role == "assistant":
continue
text_lower = text.lower()
if len(text) < 20:
continue
if any(pattern in text_lower for pattern in SKIP_PATTERNS):
continue
text = text[:500] if len(text) > 500 else text
conversation_lines.append(f"[{i+1}] {text}")
if not conversation_lines:
return []
conversation_text = "\n\n".join(conversation_lines)
# Simple extraction prompt
prompt = """You are a memory curator. Extract atomic facts from the conversation below.
For each distinct fact/decision/preference, output a JSON object with:
- "text": the atomic fact (1-2 sentences)
- "text": the atomic fact (1-2 sentences) - use FIRST PERSON ("I" not "User")
- "category": one of [decision, preference, technical, project, knowledge, system]
- "importance": "high" or "medium"
Return ONLY a JSON array. Example:
[
{"text": "User decided to use Redis for caching", "category": "decision", "importance": "high"},
{"text": "User prefers dark mode", "category": "preference", "importance": "medium"}
{"text": "I decided to use Redis for caching", "category": "decision", "importance": "high"},
{"text": "I prefer dark mode", "category": "preference", "importance": "medium"}
]
If no extractable facts, return [].
@@ -152,7 +110,7 @@ CONVERSATION:
response = requests.post(
f"{ollama_url}/api/generate",
json={
"model": "qwen3:30b-a3b-instruct-2507-q8_0",
"model": "<CURATOR_MODEL>",
"system": prompt,
"prompt": full_prompt,
"stream": False,
@@ -169,28 +127,20 @@ CONVERSATION:
return []
result = response.json()
output = result.get('response', '').strip()
# Extract JSON from output
if '```json' in output:
output = output.split('```json')[1].split('```')[0].strip()
elif '```' in output:
output = output.split('```')[1].split('```')[0].strip()
response_text = result.get("response", "")
try:
# Find JSON array in output
start_idx = output.find('[')
end_idx = output.rfind(']')
if start_idx != -1 and end_idx != -1 and end_idx > start_idx:
output = output[start_idx:end_idx+1]
gems = json.loads(output)
start = response_text.find('[')
end = response_text.rfind(']')
if start == -1 or end == -1:
return []
json_str = response_text[start:end+1]
gems = json.loads(json_str)
if not isinstance(gems, list):
gems = [gems] if gems else []
return []
return gems
except json.JSONDecodeError as e:
print(f"Error parsing curator output: {e}", file=sys.stderr)
print(f"Raw output: {repr(output[:500])}...", file=sys.stderr)
print(f"JSON parse error: {e}", file=sys.stderr)
return []
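The bracket-search parsing above can be isolated and exercised without calling the model. The sketch below is an illustration under assumptions, not the committed code: `parse_gem_array` is a hypothetical name, and the final filter that drops non-dict items and items without a `text` value is an addition the diff does not make.

```python
import json
from typing import Any, Dict, List


def parse_gem_array(response_text: str) -> List[Dict[str, Any]]:
    """Extract a JSON array of gem objects from raw LLM output.

    Hypothetical helper: takes everything between the first '[' and the
    last ']' so that surrounding prose or code fences are ignored.
    """
    start = response_text.find("[")
    end = response_text.rfind("]")
    if start == -1 or end <= start:
        return []
    try:
        gems = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(gems, list):
        return []
    # Assumption: only dict items with a non-empty "text" are usable gems.
    return [g for g in gems if isinstance(g, dict) and g.get("text")]
```

Because the slice spans from the first `[` to the last `]`, output that contains stray brackets in surrounding prose will fail to parse and yield an empty list rather than a partial result.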
@@ -199,50 +149,35 @@ def get_embedding(text: str, ollama_url: str) -> Optional[List[float]]:
try:
response = requests.post(
f"{ollama_url}/api/embeddings",
json={"model": "snowflake-arctic-embed2", "prompt": text},
json={
"model": EMBEDDING_MODEL,
"prompt": text
},
timeout=30
)
response.raise_for_status()
return response.json()['embedding']
data = response.json()
return data.get("embedding")
except Exception as e:
print(f"Error getting embedding: {e}", file=sys.stderr)
return None
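The `store_gem` hunk that follows derives a point ID from the user ID and the first 100 characters of the gem text. A minimal sketch of that scheme as a standalone function is shown here so its properties can be checked in isolation; the name `gem_id_for` is hypothetical.

```python
import hashlib


def gem_id_for(user_id: str, text: str) -> int:
    """Derive a deterministic 63-bit point ID from user and gem text.

    Hypothetical helper mirroring the hashing in the diff: only the first
    100 characters of the text contribute to the ID.
    """
    hash_content = f"{user_id}:{text[:100]}"
    hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
    # Reduce to a non-negative integer below 2**63.
    return int.from_bytes(hash_bytes, byteorder="big") % (2**63)
```

One consequence of this design is that re-curating the same fact writes to the same point ID instead of creating a duplicate. The trade-off is that two gems whose first 100 characters are identical map to the same ID, so the later one would replace the earlier one.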
def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collection: str, ollama_url: str) -> bool:
"""Store a single gem to Qdrant."""
# Support both old format (gem, context, snippet) and new format (text, category, importance)
embedding_text = gem.get('text', '') or gem.get('gem', '')
if not embedding_text:
embedding_text = f"{gem.get('gem', '')} {gem.get('context', '')} {gem.get('snippet', '')}".strip()
def store_gem(gem: Dict[str, Any], vector: List[float], qdrant_url: str, target_collection: str, user_id: str) -> bool:
"""Store a gem in Qdrant."""
embedding_text = gem.get("text", "") or gem.get("gem", "")
if not embedding_text:
print(f"⚠️ Empty embedding text for gem, skipping", file=sys.stderr)
return False
vector = get_embedding(embedding_text, ollama_url)
if vector is None:
print(f"⚠️ Failed to get embedding for gem", file=sys.stderr)
return False
# Generate ID
hash_content = f"{user_id}:{gem.get('conversation_id', '')}:{gem.get('turn_range', '')}:{gem.get('gem', '')[:50]}"
hash_content = f"{user_id}:{embedding_text[:100]}"
hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
gem_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
# Normalize gem fields - ensure we have text field
payload = {
"text": embedding_text,
"category": gem.get("category", "fact"),
"importance": gem.get("importance", "medium"),
"user_id": user_id,
"text": gem.get('text', gem.get('gem', '')),
"category": gem.get('category', 'general'),
"importance": gem.get('importance', 'medium'),
"curated_at": datetime.now(timezone.utc).isoformat()
"created_at": datetime.now(timezone.utc).isoformat()
}
# Preserve any other fields from gem
for key in ['context', 'snippet', 'confidence', 'conversation_id', 'turn_range']:
if key in gem:
payload[key] = gem[key]
try:
response = requests.put(
@@ -264,7 +199,7 @@ def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collect
def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool:
"""Mark memories as curated in Qdrant using POST /points/payload format."""
"""Mark memories as curated."""
if not memory_ids:
return True
@@ -288,79 +223,58 @@ def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool:
def main():
parser = argparse.ArgumentParser(description="TrueRecall Timer Curator")
parser.add_argument("--config", "-c", default=str(DEFAULT_CONFIG), help="Config file path")
parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write, just preview")
args = parser.parse_args()
print("TrueRecall v2 - Timer Curator")
print(f"User: {USER_ID}")
print(f"Source: {SOURCE_COLLECTION}")
print(f"Target: {TARGET_COLLECTION}")
print(f"Max batch: {MAX_BATCH}\n")
config = load_config(args.config)
qdrant_url = os.getenv("QDRANT_URL", "http://<QDRANT_IP>:6333")
ollama_url = os.getenv("OLLAMA_URL", "http://<OLLAMA_IP>:11434")
user_id = config.get("user_id", "rob")
source_collection = config.get("source_collection", "memories_tr")
target_collection = config.get("target_collection", "gems_tr")
max_batch = config.get("max_batch_size", 100)
print(f"🔍 TrueRecall Timer Curator")
print(f"👤 User: {user_id}")
print(f"📥 Source: {source_collection}")
print(f"💎 Target: {target_collection}")
print(f"📦 Max batch: {max_batch}")
if args.dry_run:
print("🏃 DRY RUN MODE")
print()
# Get uncurated memories
print("📥 Fetching uncurated memories...")
memories = get_uncurated_memories(qdrant_url, source_collection, user_id, max_batch)
print(f"✅ Found {len(memories)} uncurated memories")
print("Fetching uncurated memories...")
memories = get_uncurated_memories(QDRANT_URL, SOURCE_COLLECTION, USER_ID, MAX_BATCH)
print(f"Found {len(memories)} uncurated memories\n")
if not memories:
print("🤷 Nothing to curate. Exiting.")
print("Nothing to curate. Exiting.")
return
# Extract gems
print(f"\n🧠 Sending {len(memories)} memories to curator...")
gems = extract_gems(memories, ollama_url)
print(f"✅ Extracted {len(gems)} gems")
print("Sending memories to curator...")
gems = extract_gems(memories, OLLAMA_URL)
print(f"Extracted {len(gems)} gems\n")
if not gems:
print("⚠️ No gems extracted. Nothing to store.")
# Still mark as curated so we don't reprocess
memory_ids = [m["id"] for m in memories] # Keep as integers
mark_curated(memory_ids, qdrant_url, source_collection)
print("No gems extracted. Exiting.")
return
# Preview
print("\n💎 Gems preview:")
print("Gems preview:")
for i, gem in enumerate(gems[:3], 1):
print(f" {i}. {gem.get('gem', 'N/A')[:80]}...")
text = gem.get("text", "N/A")[:50]
print(f" {i}. {text}...")
if len(gems) > 3:
print(f" ... and {len(gems) - 3} more")
print()
if args.dry_run:
print("\n🏃 DRY RUN: Not storing gems or marking curated.")
return
# Store gems
print(f"\n💾 Storing {len(gems)} gems...")
print("Storing gems...")
stored = 0
for gem in gems:
if store_gem(gem, user_id, qdrant_url, target_collection, ollama_url):
stored += 1
print(f"✅ Stored: {stored}/{len(gems)}")
text = gem.get("text", "") or gem.get("gem", "")
if not text:
continue
vector = get_embedding(text, OLLAMA_URL)
if vector:
if store_gem(gem, vector, QDRANT_URL, TARGET_COLLECTION, USER_ID):
stored += 1
# Mark memories as curated
print("\n📝 Marking memories as curated...")
memory_ids = [m["id"] for m in memories] # Keep as integers
if mark_curated(memory_ids, qdrant_url, source_collection):
print(f"✅ Marked {len(memory_ids)} memories as curated")
print(f"Stored: {stored}/{len(gems)}\n")
print("Marking memories as curated...")
memory_ids = [mem.get("id") for mem in memories if mem.get("id") is not None]
if mark_curated(memory_ids, QDRANT_URL, SOURCE_COLLECTION):
print(f"Marked {len(memory_ids)} memories as curated\n")
else:
print(f"⚠️ Failed to mark some memories as curated")
print("Failed to mark memories\n")
print("\n🎉 Curation complete!")
print("Curation complete!")
if __name__ == "__main__":