# TrueRecall v2
**Project:** Gem extraction and memory recall system
**Status:** ✅ Active & Verified
**Location:** `~/.openclaw/workspace/.projects/true-recall-v2/`
**Last Updated:** 2026-02-24 19:02 CST

---
## Table of Contents
- [Quick Start](#quick-start)
- [Overview](#overview)
- [Current State](#current-state)
- [Architecture](#architecture)
- [Components](#components)
- [Files & Locations](#files--locations)
- [Configuration](#configuration)
- [Validation](#validation)
- [Troubleshooting](#troubleshooting)
- [Status Summary](#status-summary)
---
## Quick Start
```bash
# Check system status
openclaw status
sudo systemctl status mem-qdrant-watcher
# Count captured memories
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# Check collections
curl -s http://<QDRANT_IP>:6333/collections | jq '.result.collections[].name'
```
---
## Overview
TrueRecall v2 extracts "gems" (key insights) from conversations and injects them as context. It consists of three layers:
1. **Capture:** A real-time watcher saves every turn to `memories_tr`
2. **Curation:** A timer-driven curator (every 30 minutes) extracts gems into `gems_tr`
3. **Injection:** The `memory-qdrant` plugin searches `gems_tr` and injects matching gems on each turn
---
## Current State
### Verified at 19:02 CST
| Collection | Points | Purpose | Status |
|------------|--------|---------|--------|
| `memories_tr` | **12,378** | Full text (live capture) | ✅ Active |
| `gems_tr` | **5** | Curated gems (injection) | ✅ Active |

**All memories are tagged `curated: false` on capture so the timer curator can pick them up.**
### Services Status
| Service | Status | Details |
|---------|--------|---------|
| `mem-qdrant-watcher` | ✅ Active | PID 1748, capturing |
| Timer curator | ✅ Deployed | Every 30 min via cron |
| OpenClaw Gateway | ✅ Running | Version 2026.2.23 |
| memory-qdrant plugin | ✅ Loaded | recall: gems_tr |
---
## Comparison: TrueRecall v2 vs Jarvis Memory vs v1
| Feature | Jarvis Memory | TrueRecall v1 | TrueRecall v2 |
|---------|---------------|---------------|---------------|
| **Storage** | Redis | Redis + Qdrant | Qdrant only |
| **Capture** | Session batch | Session batch | Real-time |
| **Curation** | Manual | Daily 2:45 AM | Timer (30 min) |
| **Embedding** | — | snowflake | snowflake + mxbai |
| **Curator LLM** | — | qwen3:4b | qwen3:30b |
| **State tracking** | — | — | `curated` tag |
| **Batch size** | — | 24h worth | Configurable |
| **JSON parsing** | — | Fallback needed | Native (30b) |

**Key Improvements v2:**
- ✅ Real-time capture (no batch delay)
- ✅ Timer-based curation (responsive vs daily)
- ✅ 30b curator (better gems, faster ~3s)
- ✅ `curated` tag (reliable state tracking)
- ✅ No Redis dependency (simpler stack)
---
## Architecture
### v2.2: Timer-Based Curation
```
┌─────────────────┐     ┌──────────────────────┐     ┌─────────────┐
│  OpenClaw Chat  │────▶│  Real-Time Watcher   │────▶│ memories_tr │
│ (Session JSONL) │     │   (Python daemon)    │     │  (Qdrant)   │
└─────────────────┘     └──────────────────────┘     └──────┬──────┘
                                                            │ Every 30 min
                                                   ┌──────────────────┐
                                                   │  Timer Curator   │
                                                   │   (cron/qwen3)   │
                                                   └────────┬─────────┘
                                                   ┌──────────────────┐
                                                   │     gems_tr      │
                                                   │     (Qdrant)     │
                                                   └────────┬─────────┘
                                                   Per turn │
                                                   ┌──────────────────┐
                                                   │  memory-qdrant   │
                                                   │      plugin      │
                                                   └──────────────────┘
```
**Key Changes in v2.2:**
- ✅ Timer-based curation (30 min intervals)
- ✅ All memories tagged `curated: false` on capture
- ✅ Migration complete (12,378 memories)
- ❌ Removed daily batch processing (2:45 AM)
---
## Components
### 1. Real-Time Watcher
**File:** `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
**What it does:**
- Watches `~/.openclaw/agents/main/sessions/*.jsonl`
- Parses each turn (user + AI)
- Embeds with `snowflake-arctic-embed2`
- Stores to `memories_tr` instantly
- **Cleans:** Removes markdown, tables, metadata
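
The "watch and parse" step above can be sketched roughly as follows. This is a minimal illustration and not the code in `realtime_qdrant_watcher.py`: the function name `read_new_records` and the byte-offset bookkeeping are assumptions, and the session record schema is deliberately left opaque.

```python
import json


def read_new_records(path: str, offset: int = 0):
    """Return (records, new_offset) for complete JSONL lines added since `offset`."""
    records = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            # Stop at EOF or at a partially written line (no trailing newline yet)
            if not line or not line.endswith(b"\n"):
                break
            offset = fh.tell()
            if line.strip():
                records.append(json.loads(line))
    return records, offset
```

Calling this in a loop with the previously returned offset yields each turn once, and a half-written line is left for the next pass instead of raising a parse error.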
**Service:** `mem-qdrant-watcher.service`
**Commands:**
```bash
# Check status
sudo systemctl status mem-qdrant-watcher
# View logs
sudo journalctl -u mem-qdrant-watcher -f
# Restart
sudo systemctl restart mem-qdrant-watcher
```
---
### 2. Content Cleaner
**File:** `skills/qdrant-memory/scripts/clean_memories_tr.py`
**Purpose:** Batch-clean existing points
**Usage:**
```bash
# Preview changes
python3 clean_memories_tr.py --dry-run
# Clean all
python3 clean_memories_tr.py --execute
# Clean 100 (test)
python3 clean_memories_tr.py --execute --limit 100
```
**Cleans:**
- `**bold**` → plain text
- `|tables|` → removed
- `` `code` `` → plain text
- `---` rules → removed
- `# headers` → removed
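
The rules above could be approximated with a few regular expressions. This is a sketch under assumptions, not the contents of `clean_memories_tr.py`: `clean_text` is a hypothetical name, and "removed" is read here as dropping the whole line for tables, rules, and headers.

```python
import re


def clean_text(text: str) -> str:
    """Strip common Markdown markup, keeping the readable text."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Drop table rows (|a|b|) and horizontal rules (---)
        if stripped.startswith("|") or re.fullmatch(r"-{3,}", stripped):
            continue
        # Drop header lines (# Title)
        if re.match(r"#{1,6}\s", stripped):
            continue
        kept.append(line)
    out = "\n".join(kept)
    out = re.sub(r"\*\*(.+?)\*\*", r"\1", out)  # **bold** -> bold
    out = re.sub(r"`([^`]*)`", r"\1", out)      # `code` -> code
    return out.strip()
```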
---
### 3. Timer Curator
**File:** `tr-continuous/curator_timer.py`
**Schedule:** Every 30 minutes (cron)
**Flow:**
1. Query uncurated memories from `memories_tr`
2. Send batch to qwen3 (max 100)
3. Extract gems → store to `gems_tr`
4. Mark memories as `curated: true`
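
One pass of this flow might look like the sketch below, written against Qdrant's REST endpoints for scrolling points and setting payload. It is an illustration only, not the code in `curator_timer.py`: `run_once`, `extract_gems`, and `store_gems` are hypothetical names, and the LLM call and the write to `gems_tr` are passed in as callables rather than implemented.

```python
import json
import urllib.request

QDRANT_URL = "http://<QDRANT_IP>:6333"  # placeholder, as elsewhere in this README


def uncurated_scroll_body(limit: int = 100) -> dict:
    """Request body for POST /collections/memories_tr/points/scroll."""
    return {
        "filter": {"must": [{"key": "curated", "match": {"value": False}}]},
        "limit": limit,
        "with_payload": True,
    }


def mark_curated_body(point_ids: list) -> dict:
    """Request body for POST /collections/memories_tr/points/payload."""
    return {"payload": {"curated": True}, "points": point_ids}


def post(path: str, body: dict) -> dict:
    """POST a JSON body to Qdrant and decode the JSON response."""
    req = urllib.request.Request(
        QDRANT_URL + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_once(extract_gems, store_gems, limit: int = 100) -> int:
    """Run one curation pass; return how many memories were marked curated."""
    page = post("/collections/memories_tr/points/scroll", uncurated_scroll_body(limit))
    points = page["result"]["points"]
    if not points:
        return 0
    gems = extract_gems([p["payload"] for p in points])  # steps 2-3: LLM extraction
    store_gems(gems)                                      # step 3: write to gems_tr
    post("/collections/memories_tr/points/payload",
         mark_curated_body([p["id"] for p in points]))    # step 4: flip the tag
    return len(points)
```

Marking memories only after the gems are stored means a failed run leaves them `curated: false`, so the next timer tick retries them.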
**Config:** `tr-continuous/curator_config.json`
```json
{
  "timer_minutes": 30,
  "max_batch_size": 100
}
```
**Logs:** `/var/log/true-recall-timer.log`
---
### 4. Curation Model Comparison
**Current:** `qwen3:30b-a3b-instruct` (previously `qwen3:4b-instruct`)
| Metric | 4b | 30b |
|--------|----|----|
| Speed | ~10-30s per batch | **~3.3s** (tested 2026-02-24) |
| JSON reliability | ⚠️ Needs fallback | ✅ Native |
| Context quality | Basic extraction | ✅ Nuanced |
| Snippet accuracy | ~80% | ✅ Expected: 95%+ |
**30b Benchmark (2026-02-24):**
- Load: 108ms
- Prompt eval: 49ms (1,576 tok/s)
- Generation: 2.9s (233 tokens, 80 tok/s)
- **Total: 3.26s**
**Trade-offs:**
- **4b:** Lightweight and catches explicit decisions, but slower per batch (~10-30s) and needs a JSON fallback parser
- **30b:** Deeper context and better inference with native JSON output; ~3.3s per batch in the 2026-02-24 benchmark
**Gem Quality Comparison (Sample Review):**
| Aspect | 4b | 30b |
|--------|----|----|
| **Context depth** | "Extracted via fallback" | Explains *why* decisions were made |
| **Confidence scores** | 0.7-0.85 | 0.9-0.97 |
| **Snippet accuracy** | ~80% (wrong source) | ✅ 95%+ (relevant quotes) |
| **Categories** | Generic "extracted" | Specific: knowledge, technical, decision |
| **Example** | "User implemented BorgBackup" (no context) | "User selected mxbai... due to top MTEB score of 66.5" (explains reasoning) |

**Verdict:** 30b produces significantly higher-quality gems: richer context, accurate snippets, and architectural intent rather than just surface facts.

---
### 5. Semantic Deduplication (Similarity Checking)
**Why:** Smaller models (4b) often extract duplicate or near-duplicate gems. Without checking, your `gems_tr` collection fills with redundant entries.
**The Problem:**
- "User decided on Redis" and "User selected Redis for caching" are the same gem
- Smaller models lack nuance — they extract surface variations as separate gems
- Over time, 30-50% of gems may be duplicates
**Solution: Semantic Similarity Check**
Before inserting a new gem:
1. Embed the candidate gem text
2. Search `gems_tr` for similar embeddings (past 24h)
3. If similarity > 0.85, SKIP (don't insert)
4. If similarity 0.70-0.85, MERGE (update existing with richer context)
5. If similarity < 0.70, INSERT (new unique gem)
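
The three-way decision in steps 3-5 reduces to a small pure function. A minimal sketch, assuming the score passed in is the best similarity returned by the `gems_tr` search; the name `dedup_action` and its return strings are illustrative.

```python
def dedup_action(best_score: float,
                 skip_above: float = 0.85,
                 merge_from: float = 0.70) -> str:
    """Map the best similarity score to 'skip', 'merge', or 'insert'."""
    if best_score > skip_above:
        return "skip"    # near-identical gem already stored
    if best_score >= merge_from:
        return "merge"   # related gem: enrich the existing entry
    return "insert"      # nothing similar: store as a new gem


print(dedup_action(0.91), dedup_action(0.78), dedup_action(0.42))
```

A score of exactly 0.85 falls into the merge band here, matching "similarity > 0.85" for skipping.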
**Implementation Options:**
#### Option A: Built-in Curator Check (Recommended)
Modify `curator_timer.py` to add pre-insertion similarity check:
```python
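import logging
from datetime import datetime, timedelta, timezone

import requests
from qdrant_client import QdrantClient, models

log = logging.getLogger("curator_timer")
qdrant = QdrantClient(url="http://<QDRANT_IP>:6333")


def is_duplicate(gem_text: str, user_id: str = "rob", threshold: float = 0.85) -> bool:
    """Return True if a similar gem was stored in the past 24h."""
    # Embed the candidate with the same model used for gems_tr
    response = requests.post(
        "http://<OLLAMA_IP>:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": gem_text},
        timeout=30,
    )
    response.raise_for_status()
    embedding = response.json()["embedding"]

    # Qdrant has no "now-24h" shorthand, so compute the cutoff explicitly.
    # Assumes `timestamp` is stored as an RFC 3339 datetime string.
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)

    # Note: newer qdrant-client releases deprecate search() in favor of query_points()
    results = qdrant.search(
        collection_name="gems_tr",
        query_vector=embedding,
        limit=3,
        query_filter=models.Filter(
            must=[
                models.FieldCondition(key="user_id", match=models.MatchValue(value=user_id)),
                models.FieldCondition(key="timestamp", range=models.DatetimeRange(gte=cutoff)),
            ]
        ),
    )
    # Any hit above the threshold counts as a duplicate
    return any(hit.score > threshold for hit in results)


# In the main loop, before inserting (`gems` stands for the curator's extracted list):
for gem in gems:
    if is_duplicate(gem["gem"]):
        log.info(f"Skipping duplicate gem: {gem['gem'][:50]}...")
        continue
    # ... insert gem into gems_tr here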
```
**Pros:** Catches duplicates at source, no extra jobs
**Cons:** Adds ~50-100ms per gem (embedding call)
#### Option B: Periodic AI Review (Subagent Task)
Have a subagent periodically review and merge duplicates:
```bash
# Run weekly via cron
0 3 * * 0 cd <PROJECT_PATH> && python3 dedup_gems.py
```
**dedup_gems.py approach:**
1. Load all gems from past 7 days
2. Group by semantic similarity (clustering)
3. For each cluster > 1 gem:
- Keep highest confidence gem as primary
- Merge context from others into primary
- Delete duplicates
**Pros:** Can use reasoning model for nuanced merging
**Cons:** Batch job, duplicates exist until cleanup runs
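
The grouping and merge steps of Option B could be sketched as below. This is a hypothetical outline, not an existing `dedup_gems.py`: it uses simple greedy clustering on cosine similarity, and the gem fields `id`, `vector`, `confidence`, and `context` are assumed names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_gems(gems, threshold=0.85):
    """Greedy clustering: a gem joins the first cluster whose first member it resembles."""
    clusters = []
    for gem in gems:
        for cluster in clusters:
            if cosine(gem["vector"], cluster[0]["vector"]) > threshold:
                cluster.append(gem)
                break
        else:
            clusters.append([gem])
    return clusters


def merge_cluster(cluster):
    """Keep the highest-confidence gem, append the others' context, list IDs to delete."""
    primary = max(cluster, key=lambda g: g["confidence"])
    others = [g for g in cluster if g is not primary]
    merged = dict(primary)
    merged["context"] = " ".join([primary["context"]] + [g["context"] for g in others])
    return merged, [g["id"] for g in others]
```

Greedy clustering is order-dependent and cheaper than full pairwise clustering; for a weekly job over a few hundred gems either would likely do, and a reasoning model could replace the plain string join in `merge_cluster`.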
#### Option C: Real-time Watcher Hook
Add deduplication to the real-time watcher before memories are even stored:
```python
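# In watcher, before upsert to memories_tr
# (is_similar_to_recent is a hypothetical helper returning a matching point ID or None)
similar_id = is_similar_to_recent(memory_text, window="1h")
if similar_id is not None:
    memory["duplicate_of"] = similar_id  # Tag but still store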
```
**Pros:** Prevents duplicate memories upstream
**Cons:** Memories may differ slightly even if gems would be same
**Recommendation by Model:**
| Model | Recommended Approach | Reason |
|-------|---------------------|--------|
| **4b** | **Option A + B** | Built-in check prevents duplicates; periodic review catches edge cases |
| **30b** | **Option B only** | 30b produces fewer duplicates; weekly review sufficient |
| **Production** | **Option A** | Best balance of prevention and performance |

**Configuration:**
Add to `curator_config.json` (`mode` accepts `"skip"`, `"merge"`, or `"flag"`):
```json
{
  "deduplication": {
    "enabled": true,
    "similarity_threshold": 0.85,
    "lookback_hours": 24,
    "mode": "skip"
  }
}
```
---
### 6. OpenClaw Compactor Configuration
**Status:** ✅ Applied
**Goal:** Minimal overhead — just remove context, do nothing else.
**Config Applied:**
```json5
{
  agents: {
    defaults: {
      compaction: {
        mode: "default",        // "default" or "safeguard"
        reserveTokensFloor: 0,  // Disable safety floor (default: 20000)
        memoryFlush: {
          enabled: false        // Disable silent .md file writes
        }
      }
    }
  }
}
```
**What this does:**
- `mode: "default"` — Standard summarization (faster)
- `reserveTokensFloor: 0` — Allow aggressive settings (disables 20k minimum)
- `memoryFlush.enabled: false` — No silent "write memory" turns
**Known Issue: UI Glitch During Compaction**
When compaction runs, the Control UI may briefly behave unexpectedly:
- Typed text may not appear immediately after hitting Enter
- Messages may render out of order briefly
- UI "catches up" within 1-2 seconds after compaction completes
**Why:** Compaction replaces the full conversation history with a summary. The UI's WebSocket state can get briefly out of sync during this transition.
**Workaround:**
- Wait 2-3 seconds after hitting Enter during compaction
- Or hard refresh (Ctrl+Shift+R) if UI seems stuck
- **Note:** This is an OpenClaw Control UI limitation — cannot be fixed from TrueRecall side at this time.
**Note:** `reserveTokens` and `keepRecentTokens` are Pi runtime settings, not configurable via `agents.defaults.compaction`. They are set per-model in `contextWindow`/`contextTokens`.

---
### 7. Configuration Options Reference
**All configurable options with defaults:**
| Option | Default | Description |
|--------|---------|-------------|
| **Embedding model** | `mxbai-embed-large` | Model for generating gem embeddings. `mxbai` = higher accuracy (MTEB 66.5). `snowflake` = faster processing. |
| **Timer interval** | `30` minutes | How often the curator runs. `5 min` = fast backlog clearing. `30 min` = balanced. `60 min` = minimal overhead. |
| **Batch size** | `100` | Max memories sent to curator per run. Higher = fewer API calls but more memory usage. |
| **Max gems per run** | *(unlimited)* | Hard limit on gems extracted per batch. Not set by default — extracts all found gems. |
| **Qdrant URL** | `http://<QDRANT_IP>:6333` | Vector database endpoint. Change if Qdrant runs on different host/port. |
| **Ollama URL** | `http://<OLLAMA_IP>:11434` | LLM endpoint for gem extraction. Change if Ollama runs elsewhere. |
| **Curator LLM** | `qwen3:30b-a3b-instruct` | Model for extracting gems. `30b` = best quality (~3s). `4b` = lighter weight but needs JSON fallback. |
| **User ID** | `rob` | Owner identifier for memories. Used for filtering and multi-user setups. |
| **Source collection** | `memories_tr` | Qdrant collection for raw captured memories. |
| **Target collection** | `gems_tr` | Qdrant collection for curated gems (injected into context). |
| **Watcher service** | `enabled` | Real-time capture daemon. Reads session JSONL and writes to Qdrant. |
| **Cron timer** | `enabled` | Periodic curation job. Runs `curator_timer.py` on schedule. |
| **Log path** | `/var/log/true-recall-timer.log` | Where curator output is written. Check with `tail -f`. |
| **Dry-run mode** | `disabled` | Test mode — shows what would be curated without writing to Qdrant. |
**OpenClaw-side options:**
| Option | Default | Description |
|--------|---------|-------------|
| **Compactor mode** | `default` | How context is summarized. `default` = fast standard. `safeguard` = chunked for very long sessions. |
| **Memory flush** | `disabled` | If enabled, writes silent "memory" turn before compaction. Adds overhead — disabled for minimal lag. |
| **Context pruning** | `cache-ttl` | Removes old tool results from context. `cache-ttl` = prunes hourly. `off` = no pruning. |
---
### 8. Embedding Models
**Current Setup:**
- `memories_tr`: `snowflake-arctic-embed2` (capture similarity)
- `gems_tr`: `mxbai-embed-large` (recall similarity)
**Rationale:**
- mxbai has higher MTEB score (66.5) for semantic search
- snowflake is faster for high-volume capture
**Note:** A single embedding model could serve both collections. That would reduce complexity and memory overhead at the cost of slightly lower recall quality.

---
### 9. memory-qdrant Plugin
**Location:** `~/.openclaw/extensions/memory-qdrant/`
**Config (openclaw.json):**
```json
{
  "collectionName": "gems_tr",
  "captureCollection": "memories_tr",
  "autoRecall": true,
  "autoCapture": true
}
```
**Functions:**
- **Recall:** Searches `gems_tr`, injects gems (hidden)
- **Capture:** Session-level to `memories_tr` (backup)
---
## Files & Locations
### Core Project
```
~/.openclaw/workspace/.projects/true-recall-v2/
├── README.md # This file
├── session.md # Detailed notes
├── curator-prompt.md # Extraction prompt
├── tr-daily/
│ └── curate_from_qdrant.py # Daily curator
└── shared/
```
### New Files (2026-02-24)
| File | Purpose |
|------|---------|
| `tr-continuous/curator_timer.py` | Timer curator (v2.2) |
| `tr-continuous/curator_config.json` | Curator settings |
| `tr-continuous/migrate_add_curated.py` | Migration script |
| `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | Capture daemon |
| `skills/qdrant-memory/mem-qdrant-watcher.service` | Systemd service |
### Archived Files (v2.1)
| File | Status | Note |
|------|--------|------|
| `tr-daily/curate_from_qdrant.py` | 📦 Archived | Replaced by timer |
| `tr-continuous/curator_by_count.py` | 📦 Archived | Replaced by timer |
### System Files
| File | Purpose |
|------|---------|
| `~/.openclaw/extensions/memory-qdrant/` | Plugin code |
| `~/.openclaw/openclaw.json` | Configuration |
| `/etc/systemd/system/mem-qdrant-watcher.service` | Service file |
---
## Configuration
### memory-qdrant Plugin
**File:** `~/.openclaw/openclaw.json`
```json
{
  "memory-qdrant": {
    "config": {
      "autoCapture": true,
      "autoRecall": true,
      "collectionName": "gems_tr",
      "captureCollection": "memories_tr",
      "embeddingModel": "snowflake-arctic-embed2",
      "maxRecallResults": 2,
      "minRecallScore": 0.7,
      "ollamaUrl": "http://<OLLAMA_IP>:11434",
      "qdrantUrl": "http://<QDRANT_IP>:6333"
    },
    "enabled": true
  }
}
```
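
How `maxRecallResults` and `minRecallScore` presumably interact can be shown with a short sketch. The plugin's real implementation is not part of this README, so the function below is an assumption: `select_gems` is a hypothetical name, hits are modeled as dicts with a `score` key, and the score cutoff is treated as inclusive.

```python
def select_gems(hits, min_score=0.7, max_results=2):
    """Keep the best-scoring hits at or above min_score, capped at max_results."""
    kept = [h for h in hits if h["score"] >= min_score]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:max_results]


hits = [
    {"gem": "a", "score": 0.65},
    {"gem": "b", "score": 0.90},
    {"gem": "c", "score": 0.75},
    {"gem": "d", "score": 0.80},
]
print([h["gem"] for h in select_gems(hits)])  # ['b', 'd']
```

With the values configured above, at most two gems are injected per turn, and a turn with no sufficiently similar gem gets none.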
### Gateway Control UI (OpenClaw 2026.2.23)
```json
{
  "gateway": {
    "controlUi": {
      "allowedOrigins": ["*"],
      "allowInsecureAuth": false,
      "dangerouslyDisableDeviceAuth": true
    }
  }
}
```
---
## Validation
### Check Collections
```bash
# Count points
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count'
# View recent captures
curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll \
-H "Content-Type: application/json" \
-d '{"limit": 3, "with_payload": true}' | jq '.result.points[].payload.content'
```
### Check Services
```bash
# Watcher
sudo systemctl status mem-qdrant-watcher
sudo journalctl -u mem-qdrant-watcher -n 20
# OpenClaw
openclaw status
openclaw gateway status
```
### Test Capture
Send a message, then check:
```bash
# Should increase by 1-2 points
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
```
---
## Troubleshooting
### Watcher Not Capturing
```bash
# Check logs
sudo journalctl -u mem-qdrant-watcher -f
# Verify dependencies
curl http://<QDRANT_IP>:6333/ # Qdrant
curl http://<OLLAMA_IP>:11434/api/tags # Ollama
```
### Plugin Not Loading
```bash
# Validate config
openclaw config validate
# Check logs
tail /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep memory-qdrant
# Restart gateway
openclaw gateway restart
```
### Gateway Won't Start (OpenClaw 2026.2.23+)
**Error:** `non-loopback Control UI requires gateway.controlUi.allowedOrigins`
**Fix:** Add to `openclaw.json`:
```json
"gateway": {
  "controlUi": {
    "allowedOrigins": ["*"]
  }
}
```
---
## Status Summary
| Component | Status | Notes |
|-----------|--------|-------|
| Real-time watcher | ✅ Active | PID 1748, capturing |
| memories_tr | ✅ 12,378 pts | All tagged `curated: false` |
| gems_tr | ✅ 5 pts | Injection ready |
| Timer curator | ✅ Deployed | Every 30 min via cron |
| Plugin injection | ✅ Working | Uses gems_tr |
| Migration | ✅ Complete | 12,378 memories |

**Logs:** `tail /var/log/true-recall-timer.log`
**Next:** Monitor first timer run

---
## Roadmap
### Planned Features
| Feature | Status | Description |
|---------|--------|-------------|
| Interactive install script | ⏳ Planned | Prompts for embedding model, timer interval, batch size, endpoints |
| Single embedding model | ⏳ Planned | Option to use one model for both collections |
| Configurable thresholds | ⏳ Planned | Per-user customization via prompts |
**Install script will prompt for:**
1. **Embedding model** — snowflake (fast) vs mxbai (accurate)
2. **Timer interval** — 5 min / 30 min / hourly
3. **Batch size** — 50 / 100 / 500 memories
4. **Endpoints** — Qdrant/Ollama URLs
5. **User ID** — for multi-user setups
---
**Maintained by:** Rob
**AI Assistant:** Kimi 🎙️
**Version:** 2026.02.24-v2.2