TrueRecall v2
Project: Gem extraction and memory recall system
Status: ✅ Active & Verified
Location: ~/.openclaw/workspace/.projects/true-recall-v2/
Last Updated: 2026-02-24 19:02 CST
Table of Contents
- Quick Start
- Overview
- Current State
- Architecture
- Components
- Files & Locations
- Configuration
- Validation
- Troubleshooting
- Status Summary
Quick Start
# Check system status
openclaw status
sudo systemctl status mem-qdrant-watcher
# View recent captures
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
# Check collections
curl -s http://<QDRANT_IP>:6333/collections | jq '.result.collections[].name'
Overview
TrueRecall v2 extracts "gems" (key insights) from conversations and injects them as context. It consists of three layers:
- Capture: Real-time watcher saves every turn to memories_tr
- Curation: Timer curator extracts gems to gems_tr
- Injection: Plugin searches gems_tr and injects gems per turn
Current State
Verified at 19:02 CST
| Collection | Points | Purpose | Status |
|---|---|---|---|
| memories_tr | 12,378 | Full text (live capture) | ✅ Active |
| gems_tr | 5 | Curated gems (injection) | ✅ Active |
All memories tagged with curated: false for timer curation.
Services Status
| Service | Status | Details |
|---|---|---|
| mem-qdrant-watcher | ✅ Active | PID 1748, capturing |
| Timer curator | ✅ Deployed | Every 30 min via cron |
| OpenClaw Gateway | ✅ Running | Version 2026.2.23 |
| memory-qdrant plugin | ✅ Loaded | recall: gems_tr |
Comparison: TrueRecall v2 vs Jarvis Memory vs v1
| Feature | Jarvis Memory | TrueRecall v1 | TrueRecall v2 |
|---|---|---|---|
| Storage | Redis | Redis + Qdrant | Qdrant only |
| Capture | Session batch | Session batch | Real-time |
| Curation | Manual | Daily 2:45 AM | Timer (30 min) |
| Embedding | — | snowflake | snowflake + mxbai |
| Curator LLM | — | qwen3:4b | qwen3:30b |
| State tracking | — | — | curated tag |
| Batch size | — | 24h worth | Configurable |
| JSON parsing | — | Fallback needed | Native (30b) |
Key Improvements v2:
- ✅ Real-time capture (no batch delay)
- ✅ Timer-based curation (responsive vs daily)
- ✅ 30b curator (better gems, faster ~3s)
- ✅ curated tag (reliable state tracking)
- ✅ No Redis dependency (simpler stack)
Architecture
v2.2: Timer-Based Curation
┌─────────────────┐ ┌──────────────────────┐ ┌─────────────┐
│ OpenClaw Chat │────▶│ Real-Time Watcher │────▶│ memories_tr │
│ (Session JSONL)│ │ (Python daemon) │ │ (Qdrant) │
└─────────────────┘ └──────────────────────┘ └──────┬──────┘
│
│ Every 30 min
▼
┌──────────────────┐
│ Timer Curator │
│ (cron/qwen3) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ gems_tr │
│ (Qdrant) │
└────────┬─────────┘
│
Per turn │
▼
┌──────────────────┐
│ memory-qdrant │
│ plugin │
└──────────────────┘
Key Changes in v2.2:
- ✅ Timer-based curation (30 min intervals)
- ✅ All memories tagged curated: false on capture
- ✅ Migration complete (12,378 memories)
- ❌ Removed daily batch processing (2:45 AM)
Components
1. Real-Time Watcher
File: skills/qdrant-memory/scripts/realtime_qdrant_watcher.py
What it does:
- Watches ~/.openclaw/agents/main/sessions/*.jsonl
- Parses each turn (user + AI)
- Embeds with snowflake-arctic-embed2
- Stores to memories_tr instantly
- Cleans: Removes markdown, tables, metadata
Service: mem-qdrant-watcher.service
Commands:
# Check status
sudo systemctl status mem-qdrant-watcher
# View logs
sudo journalctl -u mem-qdrant-watcher -f
# Restart
sudo systemctl restart mem-qdrant-watcher
2. Content Cleaner
File: skills/qdrant-memory/scripts/clean_memories_tr.py
Purpose: Batch-clean existing points
Usage:
# Preview changes
python3 clean_memories_tr.py --dry-run
# Clean all
python3 clean_memories_tr.py --execute
# Clean 100 (test)
python3 clean_memories_tr.py --execute --limit 100
Cleans:
- **bold** → plain text
- |tables| → removed
- `code` → plain text
- --- rules → removed
- # headers → removed
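The cleaning rules above could be approximated with a few regular expressions. This is a minimal sketch, not the contents of clean_memories_tr.py: the function name `clean_text` and the exact patterns are assumptions, and it drops whole header lines, whereas the real script may keep the header text.

```python
import re


def clean_text(text: str) -> str:
    """Strip common markdown syntax from a captured turn (illustrative only)."""
    kept_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Drop table rows: lines that start and end with a pipe.
        if stripped.startswith("|") and stripped.endswith("|"):
            continue
        # Drop horizontal rules such as --- or -----.
        if re.fullmatch(r"-{3,}", stripped):
            continue
        # Drop header lines such as "# Title" or "## Section".
        if re.match(r"#{1,6}\s", stripped):
            continue
        kept_lines.append(line)
    result = "\n".join(kept_lines)
    # Unwrap **bold** and `code` spans, keeping the inner text.
    result = re.sub(r"\*\*(.+?)\*\*", r"\1", result)
    result = re.sub(r"`([^`]+)`", r"\1", result)
    return result.strip()
```

Calling `clean_text` on a turn that mixes prose with a table and a rule leaves only the prose lines, with bold and inline-code markers removed.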
3. Timer Curator
File: tr-continuous/curator_timer.py
Schedule: Every 30 minutes (cron)
Flow:
- Query uncurated memories from memories_tr
- Send batch to qwen3 (max 100)
- Extract gems → store to gems_tr
- Mark memories as curated: true
Config: tr-continuous/curator_config.json
{
"timer_minutes": 30,
"max_batch_size": 100
}
Logs: /var/log/true-recall-timer.log
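The first and last steps of the curator flow map onto two Qdrant REST calls: a filtered scroll and a payload update. The sketch below only builds the request bodies so it can be read without a running server. The function names are hypothetical, and it assumes the curated flag is stored as a boolean payload field, as described above.

```python
import json


def build_scroll_request(max_batch_size: int = 100) -> dict:
    """Body for POST /collections/memories_tr/points/scroll.

    Selects up to max_batch_size points whose payload has curated == false.
    """
    return {
        "filter": {"must": [{"key": "curated", "match": {"value": False}}]},
        "limit": max_batch_size,
        "with_payload": True,
    }


def build_mark_curated_request(point_ids: list) -> dict:
    """Body for POST /collections/memories_tr/points/payload.

    Sets curated: true on the given points once their gems are stored.
    """
    return {"payload": {"curated": True}, "points": list(point_ids)}


if __name__ == "__main__":
    # Print the bodies as they would be sent over HTTP.
    print(json.dumps(build_scroll_request(100), indent=2))
    print(json.dumps(build_mark_curated_request([1, 2, 3]), indent=2))
```

Marking points only after the gems are written means a failed run leaves the batch uncurated, so the next timer tick picks it up again.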
4. Curation Model Comparison
Current: qwen3:4b-instruct
| Metric | 4b | 30b |
|---|---|---|
| Speed | ~10-30s per batch | ~3.3s (tested 2026-02-24) |
| JSON reliability | ⚠️ Needs fallback | ✅ Native |
| Context quality | Basic extraction | ✅ Nuanced |
| Snippet accuracy | ~80% | ✅ Expected: 95%+ |
30b Benchmark (2026-02-24):
- Load: 108ms
- Prompt eval: 49ms (1,576 tok/s)
- Generation: 2.9s (233 tokens, 80 tok/s)
- Total: 3.26s
Trade-offs:
- 4b: Lightweight, catches explicit decisions, but slower per batch (~10-30s) and needs a JSON fallback
- 30b: Deeper context, better inference, and faster in the benchmark above (~3.3s per batch)
Gem Quality Comparison (Sample Review):
| Aspect | 4b | 30b |
|---|---|---|
| Context depth | "Extracted via fallback" | Explains why decisions were made |
| Confidence scores | 0.7-0.85 | 0.9-0.97 |
| Snippet accuracy | ~80% (wrong source) | ✅ 95%+ (relevant quotes) |
| Categories | Generic "extracted" | Specific: knowledge, technical, decision |
| Example | "User implemented BorgBackup" (no context) | "User selected mxbai... due to top MTEB score of 66.5" (explains reasoning) |
Verdict: 30b produces significantly higher quality gems — richer context, accurate snippets, and captures architectural intent, not just surface facts.
5. Semantic Deduplication (Similarity Checking)
Why: Smaller models (4b) often extract duplicate or near-duplicate gems. Without checking, your gems_tr collection fills with redundant entries.
The Problem:
- "User decided on Redis" and "User selected Redis for caching" are the same gem
- Smaller models lack nuance — they extract surface variations as separate gems
- Over time, 30-50% of gems may be duplicates
Solution: Semantic Similarity Check
Before inserting a new gem:
- Embed the candidate gem text
- Search gems_tr for similar embeddings (past 24h)
- If similarity > 0.85, SKIP (don't insert)
- If similarity 0.70-0.85, MERGE (update existing with richer context)
- If similarity < 0.70, INSERT (new unique gem)
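The three thresholds above reduce to a small decision function. This is a sketch with hypothetical names (`dedup_action`, `decide`). It treats a score of exactly 0.85 as MERGE, which is one reading of the ranges listed.

```python
def dedup_action(score: float, skip_above: float = 0.85, merge_from: float = 0.70) -> str:
    """Map a similarity score to the action taken for a candidate gem."""
    if score > skip_above:
        return "skip"    # near-identical gem already stored
    if score >= merge_from:
        return "merge"   # related gem: enrich the existing entry
    return "insert"      # nothing similar: store as a new gem


def decide(scores: list) -> str:
    """Decide from all search hits; the closest existing gem wins."""
    if not scores:
        return "insert"  # empty collection or no hits in the window
    return dedup_action(max(scores))
```

For example, `decide([0.42, 0.78])` returns `"merge"` because the best match falls in the 0.70-0.85 band.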
Implementation Options:
Option A: Built-in Curator Check (Recommended)
Modify curator_timer.py to add pre-insertion similarity check:
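import logging
from datetime import datetime, timedelta, timezone

import requests
from qdrant_client import QdrantClient, models

log = logging.getLogger("curator_timer")
qdrant = QdrantClient(url="http://<QDRANT_IP>:6333")

def is_duplicate(gem_text: str, user_id: str = "rob", threshold: float = 0.85) -> bool:
    """Return True if a similar gem was stored in the past 24 hours."""
    # Embed the candidate with the model used for gems_tr
    response = requests.post(
        "http://<OLLAMA_IP>:11434/api/embeddings",
        json={"model": "mxbai-embed-large", "prompt": gem_text},
        timeout=30,
    )
    response.raise_for_status()
    embedding = response.json()["embedding"]

    # Qdrant has no "now-24h" shorthand, so compute the cutoff explicitly.
    # Assumes the timestamp payload field holds RFC 3339 datetimes.
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)

    # Search for similar gems (search() is deprecated in newer
    # qdrant-client releases; query_points() is the replacement)
    results = qdrant.search(
        collection_name="gems_tr",
        query_vector=embedding,
        limit=3,
        query_filter=models.Filter(
            must=[
                models.FieldCondition(key="user_id", match=models.MatchValue(value=user_id)),
                models.FieldCondition(key="timestamp", range=models.DatetimeRange(gte=cutoff)),
            ]
        ),
    )

    # Any hit above the threshold counts as a duplicate
    return any(result.score > threshold for result in results)

# In the main loop, before inserting each extracted gem:
for gem in gems:
    if is_duplicate(gem["gem"]):
        log.info(f"Skipping duplicate gem: {gem['gem'][:50]}...")
        continue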
Pros: Catches duplicates at source, no extra jobs
Cons: Adds ~50-100ms per gem (embedding call)
Option B: Periodic AI Review (Subagent Task)
Have a subagent periodically review and merge duplicates:
# Run weekly via cron
0 3 * * 0 cd <PROJECT_PATH> && python3 dedup_gems.py
dedup_gems.py approach:
- Load all gems from past 7 days
- Group by semantic similarity (clustering)
- For each cluster > 1 gem:
  - Keep highest confidence gem as primary
  - Merge context from others into primary
  - Delete duplicates
Pros: Can use reasoning model for nuanced merging
Cons: Batch job, duplicates exist until cleanup runs
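The clustering step could be done with a greedy pass over the gem embeddings. The sketch below is one possible approach, not the actual dedup_gems.py. It assumes each gem is a dict with hypothetical `id`, `vector`, and `confidence` fields, and it reuses the 0.85 threshold from Option A.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_gems(gems: list, threshold: float = 0.85) -> list:
    """Greedy clustering: each gem joins the first cluster whose primary it resembles.

    Gems are visited in descending confidence, so the first member of every
    cluster is its highest-confidence gem (the primary).
    """
    clusters = []
    for gem in sorted(gems, key=lambda g: g["confidence"], reverse=True):
        for cluster in clusters:
            if cosine(gem["vector"], cluster[0]["vector"]) > threshold:
                cluster.append(gem)
                break
        else:
            clusters.append([gem])
    return clusters


def plan_merges(gems: list, threshold: float = 0.85) -> list:
    """Return (primary_id, [duplicate_ids]) for every cluster with more than one gem."""
    return [
        (cluster[0]["id"], [g["id"] for g in cluster[1:]])
        for cluster in cluster_gems(gems, threshold)
        if len(cluster) > 1
    ]
```

The merge plan only lists IDs. Merging context into the primary and deleting the duplicates would be separate Qdrant writes, which is where a reasoning model could be brought in.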
Option C: Real-time Watcher Hook
Add deduplication to the real-time watcher before memories are even stored:
# In watcher, before upsert to memories_tr
# find_similar_recent is a hypothetical helper: returns a point ID or None
similar_id = find_similar_recent(memory_text, window="1h")
if similar_id is not None:
    memory["duplicate_of"] = similar_id  # Tag but still store
Pros: Prevents duplicate memories upstream
Cons: Memories may differ slightly even if gems would be same
Recommendation by Model:
| Model | Recommended Approach | Reason |
|---|---|---|
| 4b | Option A + B | Built-in check prevents duplicates; periodic review catches edge cases |
| 30b | Option B only | 30b produces fewer duplicates; weekly review sufficient |
| Production | Option A | Best balance of prevention and performance |
Configuration:
Add to curator_config.json:
{
  "deduplication": {
    "enabled": true,
    "similarity_threshold": 0.85,
    "lookback_hours": 24,
    "mode": "skip"
  }
}
Valid values for mode: "skip", "merge", or "flag".
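On the curator side, these settings could be read with a small loader that fills in missing keys and rejects unknown modes. This is a sketch: `load_dedup_settings` is a hypothetical name, and the fallback values simply mirror the example configuration rather than documented defaults. Because `json.loads` accepts only strict JSON, the file must not contain `//` comments.

```python
import json

# Fallback values mirror the example configuration (an assumption).
DEDUP_DEFAULTS = {
    "enabled": True,
    "similarity_threshold": 0.85,
    "lookback_hours": 24,
    "mode": "skip",
}
VALID_MODES = {"skip", "merge", "flag"}


def load_dedup_settings(config_text: str) -> dict:
    """Parse curator_config.json text and return validated dedup settings."""
    config = json.loads(config_text)
    settings = {**DEDUP_DEFAULTS, **config.get("deduplication", {})}
    if settings["mode"] not in VALID_MODES:
        raise ValueError(f"unknown deduplication mode: {settings['mode']!r}")
    if not 0.0 < settings["similarity_threshold"] <= 1.0:
        raise ValueError("similarity_threshold must be in (0, 1]")
    return settings
```

A config file without a deduplication section falls back to the values above, so existing deployments would keep working after the option is introduced.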
6. OpenClaw Compactor Configuration
Status: ✅ Applied
Goal: Minimal overhead — just remove context, do nothing else.
Config Applied:
{
agents: {
defaults: {
compaction: {
mode: "default", // "default" or "safeguard"
reserveTokensFloor: 0, // Disable safety floor (default: 20000)
memoryFlush: {
enabled: false // Disable silent .md file writes
}
}
}
}
}
What this does:
- mode: "default" uses standard summarization (faster)
- reserveTokensFloor: 0 allows aggressive settings (disables the 20k minimum)
- memoryFlush.enabled: false prevents silent "write memory" turns
Note: reserveTokens and keepRecentTokens are Pi runtime settings, not configurable via agents.defaults.compaction. They are set per-model in contextWindow/contextTokens.
7. Configuration Options Reference
All configurable options with defaults:
| Option | Default | Description |
|---|---|---|
| Embedding model | mxbai-embed-large | Model for generating gem embeddings. mxbai = higher accuracy (MTEB 66.5). snowflake = faster processing. |
| Timer interval | 5 minutes | How often the curator runs. 5 min = fast backlog clearing. 30 min = balanced. 60 min = minimal overhead. |
| Batch size | 100 | Max memories sent to curator per run. Higher = fewer API calls but more memory usage. |
| Max gems per run | (unlimited) | Hard limit on gems extracted per batch. Not set by default, so all found gems are extracted. |
| Qdrant URL | http://<QDRANT_IP>:6333 | Vector database endpoint. Change if Qdrant runs on different host/port. |
| Ollama URL | http://<OLLAMA_IP>:11434 | LLM endpoint for gem extraction. Change if Ollama runs elsewhere. |
| Curator LLM | qwen3:30b-a3b-instruct | Model for extracting gems. 30b = best quality (~3s). 4b = lighter but needs JSON fallback. |
| User ID | rob | Owner identifier for memories. Used for filtering and multi-user setups. |
| Source collection | memories_tr | Qdrant collection for raw captured memories. |
| Target collection | gems_tr | Qdrant collection for curated gems (injected into context). |
| Watcher service | enabled | Real-time capture daemon. Reads session JSONL and writes to Qdrant. |
| Cron timer | enabled | Periodic curation job. Runs curator_timer.py on schedule. |
| Log path | /var/log/true-recall-timer.log | Where curator output is written. Check with tail -f. |
| Dry-run mode | disabled | Test mode: shows what would be curated without writing to Qdrant. |
OpenClaw-side options:
| Option | Default | Description |
|---|---|---|
| Compactor mode | default | How context is summarized. default = fast standard. safeguard = chunked for very long sessions. |
| Memory flush | disabled | If enabled, writes silent "memory" turn before compaction. Adds overhead; disabled for minimal lag. |
| Context pruning | cache-ttl | Removes old tool results from context. cache-ttl = prunes hourly. off = no pruning. |
8. Embedding Models
Current Setup:
- memories_tr: snowflake-arctic-embed2 (capture similarity)
- gems_tr: mxbai-embed-large (recall similarity)
Rationale:
- mxbai has higher MTEB score (66.5) for semantic search
- snowflake is faster for high-volume capture
Note: For simplicity, a single embedding model could be used for both collections. This would reduce complexity and memory overhead, though with slightly lower recall performance.
9. memory-qdrant Plugin
Location: ~/.openclaw/extensions/memory-qdrant/
Config (openclaw.json):
{
"collectionName": "gems_tr",
"captureCollection": "memories_tr",
"autoRecall": true,
"autoCapture": true
}
Functions:
- Recall: Searches gems_tr, injects gems (hidden)
- Capture: Session-level to memories_tr (backup)
Files & Locations
Core Project
~/.openclaw/workspace/.projects/true-recall-v2/
├── README.md # This file
├── session.md # Detailed notes
├── curator-prompt.md # Extraction prompt
├── tr-daily/
│ └── curate_from_qdrant.py # Daily curator
└── shared/
New Files (2026-02-24)
| File | Purpose |
|---|---|
| tr-continuous/curator_timer.py | Timer curator (v2.2) |
| tr-continuous/curator_config.json | Curator settings |
| tr-continuous/migrate_add_curated.py | Migration script |
| skills/qdrant-memory/scripts/realtime_qdrant_watcher.py | Capture daemon |
| skills/qdrant-memory/mem-qdrant-watcher.service | Systemd service |
Archived Files (v2.1)
| File | Status | Note |
|---|---|---|
| tr-daily/curate_from_qdrant.py | 📦 Archived | Replaced by timer |
| tr-continuous/curator_by_count.py | 📦 Archived | Replaced by timer |
System Files
| File | Purpose |
|---|---|
| ~/.openclaw/extensions/memory-qdrant/ | Plugin code |
| ~/.openclaw/openclaw.json | Configuration |
| /etc/systemd/system/mem-qdrant-watcher.service | Service file |
Configuration
memory-qdrant Plugin
File: ~/.openclaw/openclaw.json
{
"memory-qdrant": {
"config": {
"autoCapture": true,
"autoRecall": true,
"collectionName": "gems_tr",
"captureCollection": "memories_tr",
"embeddingModel": "snowflake-arctic-embed2",
"maxRecallResults": 2,
"minRecallScore": 0.7,
"ollamaUrl": "http://<OLLAMA_IP>:11434",
"qdrantUrl": "http://<QDRANT_IP>:6333"
},
"enabled": true
}
}
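The maxRecallResults and minRecallScore settings together decide which gems reach the prompt. The sketch below shows the likely effect in Python for consistency with the other examples. It is not the plugin's code: the hit shape (dicts with `gem` and `score`) is assumed, as is treating the score threshold as inclusive.

```python
def select_gems(hits: list, max_results: int = 2, min_score: float = 0.7) -> list:
    """Keep the best-scoring hits that clear the minimum recall score."""
    # Discard weak matches first, then rank what remains.
    kept = [hit for hit in hits if hit["score"] >= min_score]
    kept.sort(key=lambda hit: hit["score"], reverse=True)
    return kept[:max_results]
```

With the values configured above, at most two gems are injected per turn, and a turn with no hit scoring 0.7 or higher gets no injected context at all.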
Gateway Control UI (OpenClaw 2026.2.23)
{
"gateway": {
"controlUi": {
"allowedOrigins": ["*"],
"allowInsecureAuth": false,
"dangerouslyDisableDeviceAuth": true
}
}
}
Validation
Check Collections
# Count points
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count'
# View recent captures
curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll \
-H "Content-Type: application/json" \
-d '{"limit": 3, "with_payload": true}' | jq '.result.points[].payload.content'
Check Services
# Watcher
sudo systemctl status mem-qdrant-watcher
sudo journalctl -u mem-qdrant-watcher -n 20
# OpenClaw
openclaw status
openclaw gateway status
Test Capture
Send a message, then check:
# Should increase by 1-2 points
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
Troubleshooting
Watcher Not Capturing
# Check logs
sudo journalctl -u mem-qdrant-watcher -f
# Verify dependencies
curl http://<QDRANT_IP>:6333/ # Qdrant
curl http://<OLLAMA_IP>:11434/api/tags # Ollama
Plugin Not Loading
# Validate config
openclaw config validate
# Check logs
tail /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep memory-qdrant
# Restart gateway
openclaw gateway restart
Gateway Won't Start (OpenClaw 2026.2.23+)
Error: non-loopback Control UI requires gateway.controlUi.allowedOrigins
Fix: Add to openclaw.json:
"gateway": {
"controlUi": {
"allowedOrigins": ["*"]
}
}
Status Summary
| Component | Status | Notes |
|---|---|---|
| Real-time watcher | ✅ Active | PID 1748, capturing |
| memories_tr | ✅ 12,378 pts | All tagged curated: false |
| gems_tr | ✅ 5 pts | Injection ready |
| Timer curator | ✅ Deployed | Every 30 min via cron |
| Plugin injection | ✅ Working | Uses gems_tr |
| Migration | ✅ Complete | 12,378 memories |
Logs: tail /var/log/true-recall-timer.log
Next: Monitor first timer run
Roadmap
Planned Features
| Feature | Status | Description |
|---|---|---|
| Interactive install script | ⏳ Planned | Prompts for embedding model, timer interval, batch size, endpoints |
| Single embedding model | ⏳ Planned | Option to use one model for both collections |
| Configurable thresholds | ⏳ Planned | Per-user customization via prompts |
Install script will prompt for:
- Embedding model — snowflake (fast) vs mxbai (accurate)
- Timer interval — 5 min / 30 min / hourly
- Batch size — 50 / 100 / 500 memories
- Endpoints — Qdrant/Ollama URLs
- User ID — for multi-user setups
Maintained by: Rob
AI Assistant: Kimi 🎙️
Version: 2026.02.24-v2.2