docs: simplify README, update validation and curator docs
@@ -1,6 +1,6 @@
|
|||||||
# TrueRecall v2 - Git Validation Checklist
|
# TrueRecall v2 - Git Validation Checklist
|
||||||
|
|
||||||
**Environment:** Git Repository (`.git_projects/true-recall-v2/`)
|
**Environment:** Git Repository (`.git_projects/true-recall-gems/`)
|
||||||
**Purpose:** Validate git-ready directory for public sharing
|
**Purpose:** Validate git-ready directory for public sharing
|
||||||
**Version:** 2.4
|
**Version:** 2.4
|
||||||
**Last Updated:** 2026-02-26
|
**Last Updated:** 2026-02-26
|
||||||
|
|||||||
694
README.md
@@ -1,613 +1,147 @@
|
|||||||
# TrueRecall v2
|
# TrueRecall Gems (v2)
|
||||||
|
|
||||||
**Project:** Gem extraction and memory recall system
|
**Purpose:** Memory curation (gems) + context injection
|
||||||
**Status:** ✅ Active & Verified
|
|
||||||
**Location:** `~/.openclaw/workspace/.local_projects/true-recall-v2/`
|
|
||||||
**Last Updated:** 2026-02-25 12:04 CST
|
|
||||||
|
|
||||||
---
|
**Status:** ⚠️ Requires true-recall-base to be installed first
|
||||||
|
|
||||||
## Table of Contents
|
|
||||||
|
|
||||||
- [Quick Start](#quick-start)
|
|
||||||
- [Overview](#overview)
|
|
||||||
- [Current State](#current-state)
|
|
||||||
- [Architecture](#architecture)
|
|
||||||
- [Components](#components)
|
|
||||||
- [Files & Locations](#files--locations)
|
|
||||||
- [Configuration](#configuration)
|
|
||||||
- [Validation](#validation)
|
|
||||||
- [Troubleshooting](#troubleshooting)
|
|
||||||
- [Status Summary](#status-summary)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Quick Start
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Check system status
|
|
||||||
openclaw status
|
|
||||||
sudo systemctl status mem-qdrant-watcher
|
|
||||||
|
|
||||||
# View recent captures
|
|
||||||
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
|
|
||||||
|
|
||||||
# Check collections
|
|
||||||
curl -s http://<QDRANT_IP>:6333/collections | jq '.result.collections[].name'
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Recent Fixes (2026-02-25 12:41 CST)
|
|
||||||
|
|
||||||
| Issue | Root Cause | Fix Applied |
|
|
||||||
|-------|------------|-------------|
|
|
||||||
| **Watcher stuck on old session** | Watcher only switched sessions when file deleted, old sessions persisted | ✅ Restarted service, now follows current session |
|
|
||||||
| **Plugin capture 0 exchanges** | OpenClaw uses OpenAI content format (array of items), plugin expected string | ✅ Added `extractMessageText()` to extract text from `type: "text"` items |
|
|
||||||
| **Gem ID collision** | Hash used non-existent fields (`conversation_id`, `turn_range`, `gem`) | ✅ Hash now uses `embedding_text_for_hash[:100]` |
|
|
||||||
| **Meta-gems extracted** | Curator extracted from debug/tool output | ✅ Added SKIP_PATTERNS filter ("gems extracted", "✅", "🔍", etc.) + skip `role: "assistant"` |
|
|
||||||
| **gems_tr pollution** | 5 meta-gems + 1 real gem | ✅ Cleaned, now 1 real gem only |
|
|
||||||
| **First-person format** | Third person "User decided..." | ✅ Changed to "I decided..." for better query matching (score 0.746 vs 0.39) |
|
|
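The three code-level fixes in the table above (content-format handling, the gem ID hash, and the meta-gem filter) can be sketched roughly as follows. This is a minimal Python illustration, not the plugin's or curator's actual code: the real `extractMessageText()` lives in the plugin, the choice of SHA-256 and the UUID formatting are assumptions, and `SKIP_PATTERNS` here lists only the markers named in the table.

```python
import hashlib
import uuid

# Assumed subset of markers; the table names only these before "etc.".
SKIP_PATTERNS = ("gems extracted", "✅", "🔍")


def extract_message_text(content):
    """Return plain text from a string or an OpenAI-style list of content items."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep only items of type "text"; tool calls and other items are ignored.
        parts = [
            item.get("text", "")
            for item in content
            if isinstance(item, dict) and item.get("type") == "text"
        ]
        return "\n".join(part for part in parts if part)
    return ""


def should_skip(role, text):
    """Skip assistant turns and anything that looks like curator or debug output."""
    if role == "assistant":
        return True
    lowered = text.lower()
    return any(pattern.lower() in lowered for pattern in SKIP_PATTERNS)


def gem_id(embedding_text):
    """Derive a stable ID from the first 100 characters of the gem text.

    SHA-256 and the UUID layout are assumptions made for this sketch.
    """
    digest = hashlib.sha256(embedding_text[:100].encode("utf-8")).hexdigest()
    return str(uuid.UUID(digest[:32]))
```

Hashing only the first 100 characters means two gems that share that prefix map to the same ID, so the later one would overwrite the earlier one on upsert.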
||||||
|
|
||||||
### Validation Results
|
|
||||||
|
|
||||||
**Plugin capture:**
|
|
||||||
```
|
|
||||||
Before: parsed 14 user, 84 assistant messages, 0 exchanges
|
|
||||||
After: parsed 17 user, 116 assistant messages, 9 exchanges ✅
|
|
||||||
```
|
|
||||||
|
|
||||||
**Watcher:**
|
|
||||||
```
|
|
||||||
Before: Watching old session (1737142a... from Feb 24)
|
|
||||||
After: Watching current session (93dc32bf... from Feb 25) ✅
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Needed Improvements
|
|
||||||
|
|
||||||
| Issue | Description | Priority |
|
|
||||||
|-------|-------------|----------|
|
|
||||||
| **Semantic Deduplication** | No dedup between similar gems. Same fact phrased differently creates multiple gems. Need semantic similarity check before storage. | High |
|
|
||||||
| **Search Result Deduplication** | Similar gems both above threshold are both injected, causing redundancy. Need filter to remove near-duplicates from results. | Medium |
|
|
||||||
| **Gem Quality Scoring** | No quality metric. Some extracted gems may be low value. Need LLM-based quality scoring. | Medium |
|
|
||||||
| **Temporal Decay** | All gems treated equally regardless of age. Should weight recent gems higher. | Low |
|
|
||||||
| **Gem Merging/Updating** | When user changes preference, old gem still exists. Need mechanism to update/contradict old gems. | Low |
|
|
||||||
| **Importance Calibration** | All curator gems marked "medium" importance. Should dynamically assign based on significance. | Low |
|
|
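The two deduplication items in the table above could share one similarity check. The sketch below is an illustration only: the `0.92` threshold, the function names, and the `(score, vector, text)` tuple layout are assumptions, and nothing like this exists in the curator or plugin yet.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_duplicate(candidate, existing_vectors, threshold=0.92):
    """True if the candidate embedding is too close to any stored embedding."""
    return any(cosine_similarity(candidate, v) >= threshold for v in existing_vectors)


def drop_near_duplicates(results, threshold=0.92):
    """Keep the highest-scoring result from each cluster of near-duplicates.

    `results` is a list of (score, vector, text) tuples (hypothetical layout).
    """
    kept = []
    for score, vector, text in sorted(results, key=lambda r: r[0], reverse=True):
        if not is_duplicate(vector, [v for _, v, _ in kept], threshold):
            kept.append((score, vector, text))
    return kept
```

`is_duplicate` would run before a gem is stored, and `drop_near_duplicates` on search results before injection. The threshold would need tuning against real gem embeddings.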
||||||
|
|
||||||
### Gem Quality & Model Intelligence
|
|
||||||
|
|
||||||
**Gem quality improves significantly with smarter models:**
|
|
||||||
|
|
||||||
| Model | Gem Quality | Example |
|
|
||||||
|-------|-------------|---------|
|
|
||||||
| **Small models (7B)** | Basic extraction, may miss nuance | "User likes local AI" |
|
|
||||||
| **Medium models (30B)** | Better categorization, captures intent | "I prefer local AI over cloud services for privacy reasons" |
|
|
||||||
| **Large models (70B+)** | Rich context, infers significance, better first-person conversion | "I decided to self-host AI tools because I value data privacy and want to avoid vendor lock-in" |
|
|
||||||
|
|
||||||
**Example Gem (High Quality):**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"text": "I decided to keep the installation simple and not include gems for the basic version",
|
|
||||||
"category": "decision",
|
|
||||||
"importance": "high"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Current:** Using `qwen3:30b-a3b-instruct` for extraction (good balance of quality/speed).
|
|
||||||
**Recommendation:** For production use, consider `qwen3:72b` or `deepseek-r1` for higher gem quality.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
TrueRecall v2 is a **standalone memory system** that extracts "gems" (key insights) from conversations and injects them as context. It operates independently — not an addon or extension of any previous system.
|
TrueRecall Gems adds **curation** and **injection** on top of Base's capture foundation.
|
||||||
|
|
||||||
TrueRecall v2 replaces both Jarvis Memory and TrueRecall v1 with a completely re-architected solution:
|
**Gems is an ADDON:**
|
||||||
|
- Requires true-recall-base
|
||||||
| System | Status | Relationship to v2 |
|
- Independent from openclaw-true-recall-blocks
|
||||||
|--------|--------|-------------------|
|
- Choose Gems OR Blocks, not both
|
||||||
| **Jarvis Memory** | Legacy | Replaced by v2 |
|
|
||||||
| **TrueRecall v1** | Deprecated | Replaced by v2 |
|
|
||||||
| **TrueRecall v2** | ✅ Active | Complete standalone replacement |
|
|
||||||
|
|
||||||
### Three-Layer Architecture
|
|
||||||
|
|
||||||
1. **Capture** — Real-time watcher saves every turn to `memories_tr`
|
|
||||||
2. **Curation** — Timer-based curator extracts gems to `gems_tr`
|
|
||||||
3. **Injection** — Plugin searches `gems_tr` and injects gems per turn
|
|
||||||
|
|
||||||
**Key:** v2 requires no components from Jarvis Memory or v1. It is self-contained with its own storage (Qdrant-only), capture mechanism, and injection system.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Current State
|
## Three-Tier Architecture
|
||||||
|
|
||||||
### Verified at 19:02 CST
|
|
||||||
|
|
||||||
| Collection | Points | Purpose | Status |
|
|
||||||
|------------|--------|---------|--------|
|
|
||||||
| `memories_tr` | **12,729** | Full text (live capture) | ✅ Active |
|
|
||||||
| `gems_tr` | **14+** | Curated gems (injection) | ✅ **WORKING** - Context injection verified |
|
|
||||||
|
|
||||||
**All memories tagged with `curated: false` for timer curation.**
|
|
||||||
|
|
||||||
### Services Status
|
|
||||||
|
|
||||||
| Service | Status | Details |
|
|
||||||
|---------|--------|---------|
|
|
||||||
| `mem-qdrant-watcher` | ✅ Active | PID 234, capturing |
|
|
||||||
| Timer curator | ✅ Deployed | Every 5 min via cron |
|
|
||||||
| OpenClaw Gateway | ✅ Running | Version 2026.2.23 |
|
|
||||||
| memory-qdrant plugin | ✅ Loaded | recall: gems_tr |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Comparison: TrueRecall v2 vs Jarvis Memory vs v1
|
|
||||||
|
|
||||||
| Feature | Jarvis Memory | TrueRecall v1 | TrueRecall v2 |
|
|
||||||
|---------|---------------|---------------|---------------|
|
|
||||||
| **Storage** | Redis | Redis + Qdrant | Qdrant only |
|
|
||||||
| **Capture** | Session batch | Session batch | Real-time |
|
|
||||||
| **Curation** | Manual | Daily 2:45 AM | Timer (5 min) ✅ |
|
|
||||||
| **Embedding** | — | snowflake | snowflake-arctic-embed2 ✅ |
|
|
||||||
| **Curator LLM** | — | qwen3:4b | qwen3:30b |
|
|
||||||
| **State tracking** | — | — | `curated` tag |
|
|
||||||
| **Batch size** | — | 24h worth | Configurable |
|
|
||||||
| **JSON parsing** | — | Fallback needed | Native (30b) |
|
|
||||||
|
|
||||||
**Key Improvements v2:**
|
|
||||||
- ✅ Real-time capture (no batch delay)
|
|
||||||
- ✅ Timer-based curation (responsive vs daily)
|
|
||||||
- ✅ 30b curator (better gems, faster ~3s)
|
|
||||||
- ✅ `curated` tag (reliable state tracking)
|
|
||||||
- ✅ No Redis dependency (simpler stack)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
### v2.2: Timer-Based Curation
|
|
||||||
|
|
||||||
```
|
```
|
||||||
┌─────────────────┐ ┌──────────────────────┐ ┌─────────────┐
|
true-recall-base (REQUIRED)
|
||||||
│ OpenClaw Chat │────▶│ Real-Time Watcher │────▶│ memories_tr │
|
├── Watcher daemon
|
||||||
│ (Session JSONL)│ │ (Python daemon) │ │ (Qdrant) │
|
└── memories_tr (raw capture)
|
||||||
└─────────────────┘ └──────────────────────┘ └──────┬──────┘
|
│
|
||||||
│
|
└──▶ true-recall-gems (THIS ADDON)
|
||||||
│ Every 30 min
|
├── Curator extracts gems
|
||||||
▼
|
├── gems_tr (curated)
|
||||||
┌──────────────────┐
|
└── Plugin injection
|
||||||
│ Timer Curator │
|
|
||||||
│ (cron/qwen3) │
|
|
||||||
└────────┬─────────┘
|
|
||||||
│
|
|
||||||
▼
|
|
||||||
┌──────────────────┐
|
|
||||||
│ gems_tr │
|
|
||||||
│ (Qdrant) │
|
|
||||||
└────────┬─────────┘
|
|
||||||
│
|
|
||||||
Per turn │
|
|
||||||
▼
|
|
||||||
┌──────────────────┐
|
|
||||||
│ memory-qdrant │
|
|
||||||
│ plugin │
|
|
||||||
└──────────────────┘
|
|
||||||
```
|
|
||||||
|
|
||||||
**Key Changes in v2.2:**
|
Note: Don't install with openclaw-true-recall-blocks.
|
||||||
- ✅ Timer-based curation (30 min intervals)
|
Choose one addon: Gems OR Blocks.
|
||||||
- ✅ All memories tagged `curated: false` on capture
|
|
||||||
- ✅ Migration complete (12,378 memories)
|
|
||||||
- ❌ Removed daily batch processing (2:45 AM)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Components
|
|
||||||
|
|
||||||
### 1. Real-Time Watcher
|
|
||||||
|
|
||||||
**File:** `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
|
|
||||||
|
|
||||||
**What it does:**
|
|
||||||
- Watches `~/.openclaw/agents/main/sessions/*.jsonl`
|
|
||||||
- Parses each turn (user + AI)
|
|
||||||
- Embeds with `snowflake-arctic-embed2`
|
|
||||||
- Stores to `memories_tr` instantly
|
|
||||||
- **Cleans:** Removes markdown, tables, metadata
|
|
||||||
|
|
||||||
**Service:** `mem-qdrant-watcher.service`
|
|
||||||
|
|
||||||
**Commands:**
|
|
||||||
```bash
|
|
||||||
# Check status
|
|
||||||
sudo systemctl status mem-qdrant-watcher
|
|
||||||
|
|
||||||
# View logs
|
|
||||||
sudo journalctl -u mem-qdrant-watcher -f
|
|
||||||
|
|
||||||
# Restart
|
|
||||||
sudo systemctl restart mem-qdrant-watcher
|
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
### 2. Content Cleaner
|
## Prerequisites
|
||||||
|
|
||||||
**File:** `skills/qdrant-memory/scripts/clean_memories_tr.py`
|
**REQUIRED: Install TrueRecall Base first**
|
||||||
|
|
||||||
**Purpose:** Batch-clean existing points
|
Base provides the capture infrastructure (`memories_tr` collection).
|
||||||
|
|
||||||
**Usage:**
|
|
||||||
```bash
|
|
||||||
# Preview changes
|
|
||||||
python3 clean_memories_tr.py --dry-run
|
|
||||||
|
|
||||||
# Clean all
|
|
||||||
python3 clean_memories_tr.py --execute
|
|
||||||
|
|
||||||
# Clean 100 (test)
|
|
||||||
python3 clean_memories_tr.py --execute --limit 100
|
|
||||||
```
|
|
||||||
|
|
||||||
**Cleans:**
|
|
||||||
- `**bold**` → plain text
|
|
||||||
- `|tables|` → removed
|
|
||||||
- `` `code` `` → plain text
|
|
||||||
- `---` rules → removed
|
|
||||||
- `# headers` → removed
|
|
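The cleaning rules above can be approximated with a few regular expressions. This is a sketch, not the contents of `clean_memories_tr.py`; in particular it assumes that table rows, rules, and header lines are dropped entirely, which the list does not spell out.

```python
import re


def clean_text(text):
    """Strip markdown formatting from captured text, line by line."""
    cleaned_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):            # table rows are removed
            continue
        if re.fullmatch(r"-{3,}", stripped):    # horizontal rules are removed
            continue
        if re.match(r"#{1,6}\s", stripped):     # header lines are removed (assumption)
            continue
        line = re.sub(r"\*\*(.+?)\*\*", r"\1", line)  # **bold** becomes plain text
        line = re.sub(r"`([^`]*)`", r"\1", line)      # `code` becomes plain text
        cleaned_lines.append(line)
    return "\n".join(cleaned_lines)
```

For example, `clean_text("**bold** and `code`")` returns `bold and code`.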
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 3. Timer Curator
|
|
||||||
|
|
||||||
**File:** `tr-continuous/curator_timer.py`
|
|
||||||
|
|
||||||
**Schedule:** Every 30 minutes (cron)
|
|
||||||
|
|
||||||
**Flow:**
|
|
||||||
1. Query uncurated memories from `memories_tr`
|
|
||||||
2. Send batch to qwen3 (max 100)
|
|
||||||
3. Extract gems → store to `gems_tr`
|
|
||||||
4. Mark memories as `curated: true`
|
|
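The four-step flow above might look like the following in outline. The filter body matches the `curated: false` query used elsewhere in this repository's checklists, and the payload body follows the shape expected by Qdrant's set-payload endpoint (`POST /collections/<name>/points/payload`). The function names and the callback-based `curation_cycle` are hypothetical, chosen so the sketch runs without a live Qdrant or Ollama instance.

```python
def uncurated_scroll_body(max_batch_size=100):
    """Request body for scrolling memories that have not been curated yet."""
    return {
        "filter": {"must": [{"key": "curated", "match": {"value": False}}]},
        "limit": max_batch_size,
        "with_payload": True,
    }


def mark_curated_body(point_ids):
    """Request body that sets `curated: true` on the processed points."""
    return {"payload": {"curated": True}, "points": list(point_ids)}


def curation_cycle(fetch_uncurated, extract_gems, store_gems, mark_curated,
                   max_batch_size=100):
    """Run one timer tick: fetch, extract, store, then mark as curated.

    All four callables are placeholders for the real Qdrant and LLM calls.
    Returns the number of gems extracted.
    """
    memories = fetch_uncurated(max_batch_size)
    if not memories:
        return 0
    gems = extract_gems(memories)
    store_gems(gems)
    # Mark only after storing, so a failed run is retried on the next tick.
    mark_curated([memory["id"] for memory in memories])
    return len(gems)
```

Marking memories only after the gems are stored means a crash mid-run leaves them uncurated, at the cost of possibly extracting the same gems twice.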
||||||
|
|
||||||
**Config:** `tr-continuous/curator_config.json`
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"timer_minutes": 30,
|
|
||||||
"max_batch_size": 100
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Logs:** `/var/log/true-recall-timer.log`
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 4. Curation Model Comparison
|
|
||||||
|
|
||||||
**Current:** `qwen3:4b-instruct`
|
|
||||||
|
|
||||||
| Metric | 4b | 30b |
|
|
||||||
|--------|----|----|
|
|
||||||
| Speed | ~10-30s per batch | **~3.3s** (tested 2026-02-24) |
|
|
||||||
| JSON reliability | ⚠️ Needs fallback | ✅ Native |
|
|
||||||
| Context quality | Basic extraction | ✅ Nuanced |
|
|
||||||
| Snippet accuracy | ~80% | ✅ Expected: 95%+ |
|
|
||||||
|
|
||||||
**30b Benchmark (2026-02-24):**
|
|
||||||
- Load: 108ms
|
|
||||||
- Prompt eval: 49ms (1,576 tok/s)
|
|
||||||
- Generation: 2.9s (233 tokens, 80 tok/s)
|
|
||||||
- **Total: 3.26s**
|
|
||||||
|
|
||||||
**Trade-offs:**
|
|
||||||
- **4b:** Faster batch processing, lightweight, catches explicit decisions
|
|
||||||
- **30b:** Deeper context, better inference, ~3x slower but superior quality
|
|
||||||
|
|
||||||
**Gem Quality Comparison (Sample Review):**
|
|
||||||
|
|
||||||
| Aspect | 4b | 30b |
|
|
||||||
|--------|----|----|
|
|
||||||
| **Context depth** | "Extracted via fallback" | Explains *why* decisions were made |
|
|
||||||
| **Confidence scores** | 0.7-0.85 | 0.9-0.97 |
|
|
||||||
| **Snippet accuracy** | ~80% (wrong source) | ✅ 95%+ (relevant quotes) |
|
|
||||||
| **Categories** | Generic "extracted" | Specific: knowledge, technical, decision |
|
|
||||||
| **Example** | "User implemented BorgBackup" (no context) | "User selected mxbai... due to top MTEB score of 66.5" (explains reasoning) |
|
|
||||||
|
|
||||||
**Verdict:** 30b produces significantly higher quality gems — richer context, accurate snippets, and captures architectural intent, not just surface facts.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 5. OpenClaw Compactor Configuration
|
|
||||||
|
|
||||||
**Status:** ✅ Applied
|
|
||||||
|
|
||||||
**Goal:** Minimal overhead — just remove context, do nothing else.
|
|
||||||
|
|
||||||
**Config Applied:**
|
|
||||||
```json5
|
|
||||||
{
|
|
||||||
agents: {
|
|
||||||
defaults: {
|
|
||||||
compaction: {
|
|
||||||
mode: "default", // "default" or "safeguard"
|
|
||||||
reserveTokensFloor: 0, // Disable safety floor (default: 20000)
|
|
||||||
memoryFlush: {
|
|
||||||
enabled: false // Disable silent .md file writes
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**What this does:**
|
|
||||||
- `mode: "default"` — Standard summarization (faster)
|
|
||||||
- `reserveTokensFloor: 0` — Allow aggressive settings (disables 20k minimum)
|
|
||||||
- `memoryFlush.enabled: false` — No silent "write memory" turns
|
|
||||||
|
|
||||||
**Note:** `reserveTokens` and `keepRecentTokens` are Pi runtime settings, not configurable via `agents.defaults.compaction`. They are set per-model in `contextWindow`/`contextTokens`.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 6. Configuration Options Reference
|
|
||||||
|
|
||||||
**All configurable options with defaults:**
|
|
||||||
|
|
||||||
| Option | Default | Description |
|
|
||||||
|--------|---------|-------------|
|
|
||||||
| **Embedding model** | `mxbai-embed-large` | Model for generating gem embeddings. `mxbai` = higher accuracy (MTEB 66.5). `snowflake` = faster processing. |
|
|
||||||
| **Timer interval** | `5` minutes | How often the curator runs. `5 min` = fast backlog clearing. `30 min` = balanced. `60 min` = minimal overhead. |
|
|
||||||
| **Batch size** | `100` | Max memories sent to curator per run. Higher = fewer API calls but more memory usage. |
|
|
||||||
| **Max gems per run** | *(unlimited)* | Hard limit on gems extracted per batch. Not set by default — extracts all found gems. |
|
|
||||||
| **Qdrant URL** | `http://<QDRANT_IP>:6333` | Vector database endpoint. Change if Qdrant runs on different host/port. |
|
|
||||||
| **Ollama URL** | `http://<OLLAMA_IP>:11434` | LLM endpoint for gem extraction. Change if Ollama runs elsewhere. |
|
|
||||||
| **Curator LLM** | `qwen3:30b-a3b-instruct` | Model for extracting gems. `30b` = best quality (~3s). `4b` = faster but needs JSON fallback. |
|
|
||||||
| **User ID** | `rob` | Owner identifier for memories. Used for filtering and multi-user setups. |
|
|
||||||
| **Source collection** | `memories_tr` | Qdrant collection for raw captured memories. |
|
|
||||||
| **Target collection** | `gems_tr` | Qdrant collection for curated gems (injected into context). |
|
|
||||||
| **Watcher service** | `enabled` | Real-time capture daemon. Reads session JSONL and writes to Qdrant. |
|
|
||||||
| **Cron timer** | `enabled` | Periodic curation job. Runs `curator_timer.py` on schedule. |
|
|
||||||
| **Log path** | `/var/log/true-recall-timer.log` | Where curator output is written. Check with `tail -f`. |
|
|
||||||
| **Dry-run mode** | `disabled` | Test mode — shows what would be curated without writing to Qdrant. |
|
|
||||||
|
|
||||||
**OpenClaw-side options:**
|
|
||||||
| Option | Default | Description |
|
|
||||||
|--------|---------|-------------|
|
|
||||||
| **Compactor mode** | `default` | How context is summarized. `default` = fast standard. `safeguard` = chunked for very long sessions. |
|
|
||||||
| **Memory flush** | `disabled` | If enabled, writes silent "memory" turn before compaction. Adds overhead — disabled for minimal lag. |
|
|
||||||
| **Context pruning** | `cache-ttl` | Removes old tool results from context. `cache-ttl` = prunes hourly. `off` = no pruning. |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 7. Embedding Models
|
|
||||||
|
|
||||||
**Current Setup:**
|
|
||||||
- `memories_tr`: `snowflake-arctic-embed2` (capture)
|
|
||||||
- `gems_tr`: `snowflake-arctic-embed2` (recall) ✅ **FIXED** - Both collections now use same model
|
|
||||||
|
|
||||||
**Note:** Previously used `mxbai-embed-large` for gems, but this caused embedding model mismatch. Fixed 2026-02-25.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### 8. memory-qdrant Plugin
|
|
||||||
|
|
||||||
**Location:** `~/.openclaw/extensions/memory-qdrant/`
|
|
||||||
|
|
||||||
**Config (openclaw.json):**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"collectionName": "gems_tr",
|
|
||||||
"captureCollection": "memories_tr",
|
|
||||||
"autoRecall": true,
|
|
||||||
"autoCapture": true
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Functions:**
|
|
||||||
- **Recall:** Searches `gems_tr`, injects gems (hidden)
|
|
||||||
- **Capture:** Session-level to `memories_tr` (backup)
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Files & Locations
|
|
||||||
|
|
||||||
### Core Project
|
|
||||||
|
|
||||||
```
|
|
||||||
~/.openclaw/workspace/.local_projects/true-recall-v2/
|
|
||||||
├── README.md # This file
|
|
||||||
├── session.md # Detailed notes
|
|
||||||
├── curator-prompt.md # Extraction prompt
|
|
||||||
├── tr-daily/
|
|
||||||
│ └── curate_from_qdrant.py # Daily curator
|
|
||||||
└── shared/
|
|
||||||
```
|
|
||||||
|
|
||||||
### New Files (2026-02-24)
|
|
||||||
|
|
||||||
| File | Purpose |
|
|
||||||
|------|---------|
|
|
||||||
| `tr-continuous/curator_timer.py` | Timer curator (v2.2) |
|
|
||||||
| `tr-continuous/curator_config.json` | Curator settings |
|
|
||||||
| `tr-continuous/migrate_add_curated.py` | Migration script |
|
|
||||||
| `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | Capture daemon |
|
|
||||||
| `skills/qdrant-memory/mem-qdrant-watcher.service` | Systemd service |
|
|
||||||
|
|
||||||
### Archived Files (v2.1)
|
|
||||||
|
|
||||||
| File | Status | Note |
|
|
||||||
|------|--------|------|
|
|
||||||
| `tr-daily/curate_from_qdrant.py` | 📦 Archived | Replaced by timer |
|
|
||||||
| `tr-continuous/curator_by_count.py` | 📦 Archived | Replaced by timer |
|
|
||||||
|
|
||||||
### System Files
|
|
||||||
|
|
||||||
| File | Purpose |
|
|
||||||
|------|---------|
|
|
||||||
| `~/.openclaw/extensions/memory-qdrant/` | Plugin code |
|
|
||||||
| `~/.openclaw/openclaw.json` | Configuration |
|
|
||||||
| `/etc/systemd/system/mem-qdrant-watcher.service` | Service file |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Configuration
|
|
||||||
|
|
||||||
### memory-qdrant Plugin
|
|
||||||
|
|
||||||
**File:** `~/.openclaw/openclaw.json`
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"memory-qdrant": {
|
|
||||||
"config": {
|
|
||||||
"autoCapture": true,
|
|
||||||
"autoRecall": true,
|
|
||||||
"collectionName": "gems_tr",
|
|
||||||
"captureCollection": "memories_tr",
|
|
||||||
"embeddingModel": "snowflake-arctic-embed2",
|
|
||||||
"maxRecallResults": 2,
|
|
||||||
"minRecallScore": 0.7,
|
|
||||||
"ollamaUrl": "http://<OLLAMA_IP>:11434",
|
|
||||||
"qdrantUrl": "http://<QDRANT_IP>:6333"
|
|
||||||
},
|
|
||||||
"enabled": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
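As a rough illustration of how `minRecallScore` and `maxRecallResults` interact, the sketch below filters and truncates a list of search hits. It assumes each hit carries a `score` and a `payload` with a `text` field, as in the example gem shown earlier. The plugin's real selection logic is not shown in this README, so treat the function names and behavior as assumptions.

```python
def select_gems(hits, min_recall_score=0.7, max_recall_results=2):
    """Keep hits at or above the score threshold, best first, capped at the limit."""
    eligible = [hit for hit in hits if hit["score"] >= min_recall_score]
    eligible.sort(key=lambda hit: hit["score"], reverse=True)
    return eligible[:max_recall_results]


def format_injection(gems):
    """Render the selected gems as a bullet list for context injection."""
    return "\n".join(f"- {gem['payload']['text']}" for gem in gems)
```

With a threshold of `0.7`, a gem scoring `0.587` would be filtered out, so the threshold should be checked against the scores observed for real queries.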
||||||
|
|
||||||
### Gateway Control UI (OpenClaw 2026.2.23)
|
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"gateway": {
|
|
||||||
"controlUi": {
|
|
||||||
"allowedOrigins": ["*"],
|
|
||||||
"allowInsecureAuth": false,
|
|
||||||
"dangerouslyDisableDeviceAuth": true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Validation
|
|
||||||
|
|
||||||
### Check Collections
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Count points
|
# Verify base is running
|
||||||
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
|
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
### 1. Curator Setup
|
||||||
|
|
||||||
|
**Install cron job:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Edit path and add to crontab
|
||||||
|
(sudo crontab -l 2>/dev/null; echo "*/5 * * * * cd <INSTALL_PATH>/true-recall-gems/tr-continuous && /usr/bin/python3 curator_timer.py >> /var/log/true-recall-timer.log 2>&1") | sudo crontab -
|
||||||
|
|
||||||
|
sudo touch /var/log/true-recall-timer.log
|
||||||
|
sudo chmod 644 /var/log/true-recall-timer.log
|
||||||
|
```
|
||||||
|
|
||||||
|
**Configure curator_config.json:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"timer_minutes": 5,
|
||||||
|
"max_batch_size": 100,
|
||||||
|
"user_id": "your-user-id",
|
||||||
|
"source_collection": "memories_tr",
|
||||||
|
"target_collection": "gems_tr"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Edit curator_timer.py:**
|
||||||
|
- Replace `<QDRANT_IP>`, `<OLLAMA_IP>` with your endpoints
|
||||||
|
- Replace `<USER_ID>` with your identifier
|
||||||
|
- Replace `<CURATOR_MODEL>` with your LLM (e.g., `qwen3:30b`)
|
||||||
|
|
||||||
|
### 2. Injection Setup
|
||||||
|
|
||||||
|
Add to your OpenClaw `openclaw.json`:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"plugins": {
|
||||||
|
"entries": {
|
||||||
|
"memory-qdrant": {
|
||||||
|
"config": {
|
||||||
|
"autoCapture": true,
|
||||||
|
"autoRecall": true,
|
||||||
|
"captureCollection": "memories_tr",
|
||||||
|
"collectionName": "gems_tr",
|
||||||
|
"embeddingModel": "snowflake-arctic-embed2",
|
||||||
|
"maxRecallResults": 2,
|
||||||
|
"minRecallScore": 0.8,
|
||||||
|
"ollamaUrl": "http://<OLLAMA_IP>:11434",
|
||||||
|
"qdrantUrl": "http://<QDRANT_IP>:6333"
|
||||||
|
},
|
||||||
|
"enabled": true
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"slots": {
|
||||||
|
"memory": "memory-qdrant"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Files
|
||||||
|
|
||||||
|
| File | Purpose |
|
||||||
|
|------|---------|
|
||||||
|
| `tr-continuous/curator_timer.py` | Timer-based curator |
|
||||||
|
| `tr-continuous/curator_config.json` | Curator settings template |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Check Base (v1) capture
|
||||||
|
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
|
||||||
|
|
||||||
|
# Check Gems (v2) curation
|
||||||
curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count'
|
curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count'
|
||||||
|
|
||||||
# View recent captures
|
# Check curator logs
|
||||||
curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll \
|
tail -20 /var/log/true-recall-timer.log
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{"limit": 3, "with_payload": true}' | jq '.result.points[].payload.content'
|
|
||||||
```
|
|
||||||
|
|
||||||
### Check Services
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Watcher
|
|
||||||
sudo systemctl status mem-qdrant-watcher
|
|
||||||
sudo journalctl -u mem-qdrant-watcher -n 20
|
|
||||||
|
|
||||||
# OpenClaw
|
|
||||||
openclaw status
|
|
||||||
openclaw gateway status
|
|
||||||
```
|
|
||||||
|
|
||||||
### Test Capture
|
|
||||||
|
|
||||||
Send a message, then check:
|
|
||||||
```bash
|
|
||||||
# Should increase by 1-2 points
|
|
||||||
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
|
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Troubleshooting
|
## Dependencies
|
||||||
|
|
||||||
### Watcher Not Capturing
|
| Component | Provided By | Required For |
|
||||||
|
|-----------|-------------|--------------|
|
||||||
```bash
|
| Capture | v1 | v2 (input) |
|
||||||
# Check logs
|
| Curation | v2 | Injection |
|
||||||
sudo journalctl -u mem-qdrant-watcher -f
|
| Injection | v2 | Context recall |
|
||||||
|
|
||||||
# Verify dependencies
|
|
||||||
curl http://<QDRANT_IP>:6333/ # Qdrant
|
|
||||||
curl http://<OLLAMA_IP>:11434/api/tags # Ollama
|
|
||||||
```
|
|
||||||
|
|
||||||
### Plugin Not Loading
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Validate config
|
|
||||||
openclaw config validate
|
|
||||||
|
|
||||||
# Check logs
|
|
||||||
tail /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log | grep memory-qdrant
|
|
||||||
|
|
||||||
# Restart gateway
|
|
||||||
openclaw gateway restart
|
|
||||||
```
|
|
||||||
|
|
||||||
### Gateway Won't Start (OpenClaw 2026.2.23+)
|
|
||||||
|
|
||||||
**Error:** `non-loopback Control UI requires gateway.controlUi.allowedOrigins`
|
|
||||||
|
|
||||||
**Fix:** Add to `openclaw.json`:
|
|
||||||
```json
|
|
||||||
"gateway": {
|
|
||||||
"controlUi": {
|
|
||||||
"allowedOrigins": ["*"]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Status Summary
|
**Version:** 2.0
|
||||||
|
**Requires:** TrueRecall Base (v1)
|
||||||
| Component | Status | Notes |
|
**Collections:** `memories_tr` (v1), `gems_tr` (v2)
|
||||||
|-----------|--------|-------|
|
|
||||||
| Real-time watcher | ✅ Active | PID 1748, capturing |
|
|
||||||
| memories_tr | ✅ 12,378 pts | All tagged `curated: false` |
|
|
||||||
| gems_tr | ✅ 5 pts | Injection ready |
|
|
||||||
| Timer curator | ✅ Deployed | Every 30 min via cron |
|
|
||||||
| Plugin injection | ✅ **WORKING** | Context injection verified - score 0.587 |
|
|
||||||
| Migration | ✅ Complete | 12,378 memories |
|
|
||||||
|
|
||||||
**Logs:** `tail /var/log/true-recall-timer.log`
|
|
||||||
|
|
||||||
**Next:** Monitor first timer run
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Roadmap
|
|
||||||
|
|
||||||
### Planned Features
|
|
||||||
|
|
||||||
| Feature | Status | Description |
|
|
||||||
|---------|--------|-------------|
|
|
||||||
| Interactive install script | ⏳ Planned | Prompts for embedding model, timer interval, batch size, endpoints |
|
|
||||||
| Single embedding model | ⏳ Planned | Option to use one model for both collections |
|
|
||||||
| Configurable thresholds | ⏳ Planned | Per-user customization via prompts |
|
|
||||||
|
|
||||||
**Install script will prompt for:**
|
|
||||||
1. **Embedding model** — snowflake (fast) vs mxbai (accurate)
|
|
||||||
2. **Timer interval** — 5 min / 30 min / hourly
|
|
||||||
3. **Batch size** — 50 / 100 / 500 memories
|
|
||||||
4. **Endpoints** — Qdrant/Ollama URLs
|
|
||||||
5. **User ID** — for multi-user setups
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Maintained by:** Rob
|
|
||||||
**AI Assistant:** Kimi 🎙️
|
|
||||||
**Version:** 2026.02.24-v2.2
|
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
# TrueRecall v2 - Master Audit Checklist (GIT)
|
# TrueRecall Gems - Master Audit Checklist (GIT)
|
||||||
|
|
||||||
**For:** `.git_projects/true-recall-v2/` (Git Repository - Sanitized)
|
**For:** `.git_projects/true-recall-gems/` (Git Repository - Sanitized)
|
||||||
**Version:** 2.2
|
**Version:** 2.2
|
||||||
**Last Updated:** 2026-02-25 10:07 CST
|
**Last Updated:** 2026-02-25 10:07 CST
|
||||||
|
|
||||||
@@ -91,18 +91,18 @@ This checklist validates the **git repository** where all private IPs, absolute
|
|||||||
|
|
||||||
| # | File | Path | Status |
|
| # | File | Path | Status |
|
||||||
|---|------|------|--------|
|
|---|------|------|--------|
|
||||||
| 2.1.1 | README.md | `.local_projects/true-recall-v2/README.md` | ☐ |
|
| 2.1.1 | README.md | `.local_projects/true-recall-gems/README.md` | ☐ |
|
||||||
| 2.1.2 | session.md | `.local_projects/true-recall-v2/session.md` | ☐ |
|
| 2.1.2 | session.md | `.local_projects/true-recall-gems/session.md` | ☐ |
|
||||||
| 2.1.3 | checklist.md | `.local_projects/true-recall-v2/checklist.md` | ☐ |
|
| 2.1.3 | checklist.md | `.local_projects/true-recall-gems/checklist.md` | ☐ |
|
||||||
| 2.1.4 | curator-prompt.md | `.local_projects/true-recall-v2/curator-prompt.md` | ☐ |
|
| 2.1.4 | curator-prompt.md | `.local_projects/true-recall-gems/curator-prompt.md` | ☐ |
|
||||||
|
|
||||||
### 2.2 Scripts Exist
|
### 2.2 Scripts Exist
|
||||||
|
|
||||||
| # | File | Path | Status |
|
| # | File | Path | Status |
|
||||||
|---|------|------|--------|
|
|---|------|------|--------|
|
||||||
| 2.2.1 | curator_timer.py | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ |
|
| 2.2.1 | curator_timer.py | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
|
||||||
| 2.2.2 | curator_config.json | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ |
|
| 2.2.2 | curator_config.json | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
|
||||||
| 2.2.3 | install.py | `.local_projects/true-recall-v2/install.py` | ☐ |
|
| 2.2.3 | install.py | `.local_projects/true-recall-gems/install.py` | ☐ |
|
||||||
|
|
||||||
### 2.3 Watcher Files
|
### 2.3 Watcher Files
|
||||||
|
|
||||||
@@ -347,7 +347,7 @@ curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/count \
|
|||||||
-d '{"filter":{"must":[{"key":"curated","match":{"value":false}}]}}'
|
-d '{"filter":{"must":[{"key":"curated","match":{"value":false}}]}}'
|
||||||
|
|
||||||
# Manual curator run
|
# Manual curator run
|
||||||
cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous
|
cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
|
||||||
python3 curator_timer.py --dry-run
|
python3 curator_timer.py --dry-run
|
||||||
|
|
||||||
# Restart services
|
# Restart services
|
||||||
@@ -357,4 +357,4 @@ sudo systemctl restart mem-qdrant-watcher
|
|||||||
---
|
---
|
||||||
|
|
||||||
*This checklist is for LOCAL working directory validation only.*
|
*This checklist is for LOCAL working directory validation only.*
|
||||||
*For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-v2/`*
|
*For git/public checks, see `audit_checklist.md` in `.git_projects/true-recall-gems/`*
|
||||||
|
|||||||
@@ -6,14 +6,14 @@
|
|||||||
**Qdrant:** http://<QDRANT_IP>:6333
|
**Qdrant:** http://<QDRANT_IP>:6333
|
||||||
**Ollama:** http://<OLLAMA_IP>:11434
|
**Ollama:** http://<OLLAMA_IP>:11434
|
||||||
**Timer:** 5 minutes
|
**Timer:** 5 minutes
|
||||||
**Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-v2
|
**Working Dir:** ~/.openclaw/workspace/.local_projects/true-recall-gems
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Quick Status Check
|
## Quick Status Check
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd ~/.openclaw/workspace/.local_projects/true-recall-v2
|
cd ~/.openclaw/workspace/.local_projects/true-recall-gems
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -22,13 +22,13 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2
|
|||||||
|
|
||||||
| Check | Command | Expected |
|
| Check | Command | Expected |
|
||||||
|-------|---------|----------|
|
|-------|---------|----------|
|
||||||
| Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-v2` | Files listed |
|
| Local project exists | `ls ~/.openclaw/workspace/.local_projects/true-recall-gems` | Files listed |
|
||||||
| Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-v2` | Files listed |
|
| Git project exists | `ls ~/.openclaw/workspace/.git_projects/true-recall-gems` | Files listed |
|
||||||
| Watcher script | `ls ~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | File exists |
|
| Watcher script | `ls ~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | File exists |
|
||||||
|
|
||||||
**Our Paths:**
|
**Our Paths:**
|
||||||
- Local: `~/.openclaw/workspace/.local_projects/true-recall-v2/`
|
- Local: `~/.openclaw/workspace/.local_projects/true-recall-gems/`
|
||||||
- Git: `~/.openclaw/workspace/.git_projects/true-recall-v2/`
|
- Git: `~/.openclaw/workspace/.git_projects/true-recall-gems/`
|
||||||
- Watcher: `~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
|
- Watcher: `~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
|
||||||
- Systemd: `/etc/systemd/system/mem-qdrant-watcher.service`
|
- Systemd: `/etc/systemd/system/mem-qdrant-watcher.service`
|
||||||
|
|
||||||
@@ -131,8 +131,8 @@ cd ~/.openclaw/workspace/.local_projects/true-recall-v2
|
|||||||
| Path | Check | Status |
|
| Path | Check | Status |
|
||||||
|------|-------|--------|
|
|------|-------|--------|
|
||||||
| Watcher script | `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | ☐ |
|
| Watcher script | `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` | ☐ |
|
||||||
| Curator script | `.local_projects/true-recall-v2/tr-continuous/curator_timer.py` | ☐ |
|
| Curator script | `.local_projects/true-recall-gems/tr-continuous/curator_timer.py` | ☐ |
|
||||||
| Config file | `.local_projects/true-recall-v2/tr-continuous/curator_config.json` | ☐ |
|
| Config file | `.local_projects/true-recall-gems/tr-continuous/curator_config.json` | ☐ |
|
||||||
| Log file | `/var/log/true-recall-timer.log` | ☐ |
|
| Log file | `/var/log/true-recall-timer.log` | ☐ |
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -153,7 +153,7 @@ curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/count \
|
|||||||
-d '{"filter":{"must":[{"key":"user_id","match":{"value":"rob"}},{"key":"curated","match":{"value":false}}]}}' | jq .result.count
|
-d '{"filter":{"must":[{"key":"user_id","match":{"value":"rob"}},{"key":"curated","match":{"value":false}}]}}' | jq .result.count
|
||||||
|
|
||||||
# Run curator manually (Our path: .local_projects)
|
# Run curator manually (Our path: .local_projects)
|
||||||
cd ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous
|
cd ~/.openclaw/workspace/.local_projects/true-recall-gems/tr-continuous
|
||||||
python3 curator_timer.py
|
python3 curator_timer.py
|
||||||
|
|
||||||
# Check OpenClaw plugin
|
# Check OpenClaw plugin
|
||||||
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
{
|
{
|
||||||
"timer_minutes": 5,
|
"timer_minutes": 5,
|
||||||
"max_batch_size": 100,
|
"max_batch_size": 100,
|
||||||
"user_id": "rob",
|
"user_id": "<USER_ID>",
|
||||||
"source_collection": "memories_tr",
|
"source_collection": "memories_tr",
|
||||||
"target_collection": "gems_tr"
|
"target_collection": "gems_tr"
|
||||||
}
|
}
|
||||||
|
|||||||
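The config hunk above swaps the hard-coded `user_id` for a `<USER_ID>` placeholder. As a minimal sketch, assuming the five keys shown in `curator_config.json`, a loader could merge the file over defaults and refuse to run while the placeholder is still in place. The helper name `parse_curator_config` and the `DEFAULTS` table are hypothetical and not part of this commit.

```python
import json
from typing import Any, Dict

# Defaults mirror the values shown in curator_config.json in this diff.
DEFAULTS: Dict[str, Any] = {
    "timer_minutes": 5,
    "max_batch_size": 100,
    "user_id": "<USER_ID>",
    "source_collection": "memories_tr",
    "target_collection": "gems_tr",
}


def parse_curator_config(raw: str) -> Dict[str, Any]:
    """Hypothetical helper: merge a JSON config string over the defaults."""
    config = {**DEFAULTS, **json.loads(raw)}
    # Fail early if the sanitized placeholder was never replaced.
    if config["user_id"] == "<USER_ID>":
        raise ValueError("user_id is still the <USER_ID> placeholder")
    return config
```

Calling `parse_curator_config(Path("curator_config.json").read_text())` at startup would be one way to use it; whether the script should read the file at all after this commit is a design choice left to the author.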
@@ -1,144 +1,102 @@
|
|||||||
#!/usr/bin/env python3
|
#!/usr/bin/env python3
|
||||||
"""
|
"""
|
||||||
TrueRecall Timer Curator: Runs every 30 minutes via cron.
|
TrueRecall v2 - Timer Curator
|
||||||
|
Runs every 5 minutes via cron
|
||||||
|
Extracts gems from uncurated memories and stores them in gems_tr
|
||||||
|
|
||||||
- Queries all uncurated memories from memories_tr
|
REQUIRES: TrueRecall v1 (provides memories_tr via watcher)
|
||||||
- Sends batch to qwen3 for gem extraction
|
|
||||||
- Stores gems to gems_tr
|
|
||||||
- Marks processed memories as curated=true
|
|
||||||
|
|
||||||
Usage:
|
|
||||||
python3 curator_timer.py --config curator_config.json
|
|
||||||
python3 curator_timer.py --config curator_config.json --dry-run
|
|
||||||
"""
|
"""
|
||||||
|
|
||||||
import os
|
|
||||||
import sys
|
import sys
|
||||||
import json
|
import json
|
||||||
import argparse
|
import hashlib
|
||||||
import requests
|
import requests
|
||||||
from datetime import datetime, timezone
|
from datetime import datetime, timezone
|
||||||
from pathlib import Path
|
|
||||||
from typing import List, Dict, Any, Optional
|
from typing import List, Dict, Any, Optional
|
||||||
import hashlib
|
|
||||||
|
|
||||||
# Load config
|
# Configuration - EDIT THESE for your environment
|
||||||
def load_config(config_path: str) -> Dict[str, Any]:
|
QDRANT_URL = "http://<QDRANT_IP>:6333"
|
||||||
with open(config_path, 'r') as f:
|
OLLAMA_URL = "http://<OLLAMA_IP>:11434"
|
||||||
return json.load(f)
|
SOURCE_COLLECTION = "memories_tr"
|
||||||
|
TARGET_COLLECTION = "gems_tr"
|
||||||
# Default paths
|
EMBEDDING_MODEL = "snowflake-arctic-embed2"
|
||||||
SCRIPT_DIR = Path(__file__).parent
|
MAX_BATCH = 100
|
||||||
DEFAULT_CONFIG = SCRIPT_DIR / "curator_config.json"
|
USER_ID = "<USER_ID>"
|
||||||
|
|
||||||
# Curator prompt path
|
|
||||||
CURATOR_PROMPT_PATH = Path("~/.openclaw/workspace/.local_projects/true-recall-v2/curator-prompt.md")
|
|
||||||
|
|
||||||
|
|
||||||
def load_curator_prompt() -> str:
|
def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int = 100) -> List[Dict[str, Any]]:
|
||||||
"""Load the curator system prompt."""
|
"""Fetch uncurated memories from Qdrant."""
|
||||||
try:
|
try:
|
||||||
with open(CURATOR_PROMPT_PATH, 'r') as f:
|
response = requests.post(
|
||||||
return f.read()
|
f"{qdrant_url}/collections/{collection}/points/scroll",
|
||||||
except FileNotFoundError:
|
json={
|
||||||
print(f"⚠️ Curator prompt not found at {CURATOR_PROMPT_PATH}")
|
"limit": max_batch,
|
||||||
return """You are The Curator. Extract meaningful gems from conversation history.
|
"filter": {
|
||||||
Extract facts, insights, decisions, preferences, and context that would be valuable to remember.
|
"must": [
|
||||||
Output a JSON array of gems with fields: gem, context, snippet, categories, importance (1-5), confidence (0-0.99)."""
|
{"key": "user_id", "match": {"value": user_id}},
|
||||||
|
{"key": "curated", "match": {"value": False}}
|
||||||
|
]
|
||||||
def get_uncurated_memories(qdrant_url: str, collection: str, user_id: str, max_batch: int) -> List[Dict[str, Any]]:
|
},
|
||||||
"""Query Qdrant for uncurated memories."""
|
"with_payload": True
|
||||||
filter_data = {
|
},
|
||||||
"must": [
|
timeout=30
|
||||||
{"key": "user_id", "match": {"value": user_id}},
|
)
|
||||||
{"key": "curated", "match": {"value": False}}
|
response.raise_for_status()
|
||||||
]
|
data = response.json()
|
||||||
}
|
return data.get("result", {}).get("points", [])
|
||||||
|
except Exception as e:
|
||||||
all_points = []
|
print(f"Error fetching memories: {e}", file=sys.stderr)
|
||||||
offset = None
|
return []
|
||||||
iterations = 0
|
|
||||||
max_iterations = 10
|
|
||||||
|
|
||||||
while len(all_points) < max_batch and iterations < max_iterations:
|
|
||||||
iterations += 1
|
|
||||||
scroll_data = {
|
|
||||||
"limit": min(100, max_batch - len(all_points)),
|
|
||||||
"with_payload": True,
|
|
||||||
"filter": filter_data
|
|
||||||
}
|
|
||||||
|
|
||||||
if offset:
|
|
||||||
scroll_data["offset"] = offset
|
|
||||||
|
|
||||||
try:
|
|
||||||
response = requests.post(
|
|
||||||
f"{qdrant_url}/collections/{collection}/points/scroll",
|
|
||||||
json=scroll_data,
|
|
||||||
headers={"Content-Type": "application/json"},
|
|
||||||
timeout=30
|
|
||||||
)
|
|
||||||
response.raise_for_status()
|
|
||||||
result = response.json()
|
|
||||||
points = result.get("result", {}).get("points", [])
|
|
||||||
|
|
||||||
if not points:
|
|
||||||
break
|
|
||||||
|
|
||||||
all_points.extend(points)
|
|
||||||
offset = result.get("result", {}).get("next_page_offset")
|
|
||||||
if not offset:
|
|
||||||
break
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Error querying Qdrant: {e}", file=sys.stderr)
|
|
||||||
break
|
|
||||||
|
|
||||||
# Convert to simple dicts
|
|
||||||
memories = []
|
|
||||||
for point in all_points:
|
|
||||||
payload = point.get("payload", {})
|
|
||||||
memories.append({
|
|
||||||
"id": point.get("id"),
|
|
||||||
"content": payload.get("content", ""),
|
|
||||||
"role": payload.get("role", ""),
|
|
||||||
"timestamp": payload.get("timestamp", ""),
|
|
||||||
"turn": payload.get("turn", 0),
|
|
||||||
**payload
|
|
||||||
})
|
|
||||||
|
|
||||||
return memories[:max_batch]
|
|
||||||
|
|
||||||
|
|
||||||
def extract_gems(memories: List[Dict[str, Any]], ollama_url: str) -> List[Dict[str, Any]]:
|
def extract_gems(memories: List[Dict[str, Any]], ollama_url: str) -> List[Dict[str, Any]]:
|
||||||
"""Send memories to qwen3 for gem extraction."""
|
"""Send memories to LLM for gem extraction."""
|
||||||
if not memories:
|
if not memories:
|
||||||
return []
|
return []
|
||||||
|
|
||||||
# Build conversation from memories (support both 'text' and 'content' fields)
|
SKIP_PATTERNS = [
|
||||||
|
"gems extracted", "curator", "curation complete",
|
||||||
|
"system is running", "validation round",
|
||||||
|
]
|
||||||
|
|
||||||
conversation_lines = []
|
conversation_lines = []
|
||||||
for i, mem in enumerate(memories):
|
for i, mem in enumerate(memories):
|
||||||
# Support both migrated memories (text) and watcher memories (content)
|
payload = mem.get("payload", {})
|
||||||
text = mem.get("text", "") or mem.get("content", "")
|
text = payload.get("text", "") or payload.get("content", "")
|
||||||
if text:
|
role = payload.get("role", "")
|
||||||
# Truncate very long texts
|
|
||||||
text = text[:500] if len(text) > 500 else text
|
if not text:
|
||||||
conversation_lines.append(f"[{i+1}] {text}")
|
continue
|
||||||
|
text = str(text)
|
||||||
|
|
||||||
|
if role == "assistant":
|
||||||
|
continue
|
||||||
|
|
||||||
|
text_lower = text.lower()
|
||||||
|
if len(text) < 20:
|
||||||
|
continue
|
||||||
|
if any(pattern in text_lower for pattern in SKIP_PATTERNS):
|
||||||
|
continue
|
||||||
|
|
||||||
|
text = text[:500] if len(text) > 500 else text
|
||||||
|
conversation_lines.append(f"[{i+1}] {text}")
|
||||||
|
|
||||||
|
if not conversation_lines:
|
||||||
|
return []
|
||||||
|
|
||||||
conversation_text = "\n\n".join(conversation_lines)
|
conversation_text = "\n\n".join(conversation_lines)
|
||||||
|
|
||||||
# Simple extraction prompt
|
|
||||||
prompt = """You are a memory curator. Extract atomic facts from the conversation below.
|
prompt = """You are a memory curator. Extract atomic facts from the conversation below.
|
||||||
|
|
||||||
For each distinct fact/decision/preference, output a JSON object with:
|
For each distinct fact/decision/preference, output a JSON object with:
|
||||||
- "text": the atomic fact (1-2 sentences)
|
- "text": the atomic fact (1-2 sentences) - use FIRST PERSON ("I" not "User")
|
||||||
- "category": one of [decision, preference, technical, project, knowledge, system]
|
- "category": one of [decision, preference, technical, project, knowledge, system]
|
||||||
- "importance": "high" or "medium"
|
- "importance": "high" or "medium"
|
||||||
|
|
||||||
Return ONLY a JSON array. Example:
|
Return ONLY a JSON array. Example:
|
||||||
[
|
[
|
||||||
{"text": "User decided to use Redis for caching", "category": "decision", "importance": "high"},
|
{"text": "I decided to use Redis for caching", "category": "decision", "importance": "high"},
|
||||||
{"text": "User prefers dark mode", "category": "preference", "importance": "medium"}
|
{"text": "I prefer dark mode", "category": "preference", "importance": "medium"}
|
||||||
]
|
]
|
||||||
|
|
||||||
If no extractable facts, return [].
|
If no extractable facts, return [].
|
||||||
@@ -152,7 +110,7 @@ CONVERSATION:
|
|||||||
response = requests.post(
|
response = requests.post(
|
||||||
f"{ollama_url}/api/generate",
|
f"{ollama_url}/api/generate",
|
||||||
json={
|
json={
|
||||||
"model": "qwen3:30b-a3b-instruct-2507-q8_0",
|
"model": "<CURATOR_MODEL>",
|
||||||
"system": prompt,
|
"system": prompt,
|
||||||
"prompt": full_prompt,
|
"prompt": full_prompt,
|
||||||
"stream": False,
|
"stream": False,
|
||||||
@@ -169,28 +127,20 @@ CONVERSATION:
|
|||||||
return []
|
return []
|
||||||
|
|
||||||
result = response.json()
|
result = response.json()
|
||||||
output = result.get('response', '').strip()
|
response_text = result.get("response", "")
|
||||||
|
|
||||||
# Extract JSON from output
|
|
||||||
if '```json' in output:
|
|
||||||
output = output.split('```json')[1].split('```')[0].strip()
|
|
||||||
elif '```' in output:
|
|
||||||
output = output.split('```')[1].split('```')[0].strip()
|
|
||||||
|
|
||||||
try:
|
try:
|
||||||
# Find JSON array in output
|
start = response_text.find('[')
|
||||||
start_idx = output.find('[')
|
end = response_text.rfind(']')
|
||||||
end_idx = output.rfind(']')
|
if start == -1 or end == -1:
|
||||||
if start_idx != -1 and end_idx != -1 and end_idx > start_idx:
|
return []
|
||||||
output = output[start_idx:end_idx+1]
|
json_str = response_text[start:end+1]
|
||||||
|
gems = json.loads(json_str)
|
||||||
gems = json.loads(output)
|
|
||||||
if not isinstance(gems, list):
|
if not isinstance(gems, list):
|
||||||
gems = [gems] if gems else []
|
return []
|
||||||
return gems
|
return gems
|
||||||
except json.JSONDecodeError as e:
|
except json.JSONDecodeError as e:
|
||||||
print(f"Error parsing curator output: {e}", file=sys.stderr)
|
print(f"JSON parse error: {e}", file=sys.stderr)
|
||||||
print(f"Raw output: {repr(output[:500])}...", file=sys.stderr)
|
|
||||||
return []
|
return []
|
||||||
|
|
||||||
|
|
||||||
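The hunk above replaces the fenced-block stripping with a plain search for the first `[` and the last `]` in the model response. A standalone sketch of that approach, under the assumption that the response contains at most one JSON array, might look like the following. The function name `parse_gem_array` is hypothetical, and the final filter that drops non-dict entries is an addition that the diff does not include.

```python
import json
from typing import Any, Dict, List


def parse_gem_array(response_text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: pull the outermost JSON array out of raw model output."""
    start = response_text.find("[")
    end = response_text.rfind("]")
    # No brackets, or a closing bracket before the opening one: nothing to parse.
    if start == -1 or end == -1 or end < start:
        return []
    try:
        gems = json.loads(response_text[start:end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(gems, list):
        return []
    # Assumption: callers read gem["text"] etc., so keep only dict entries.
    return [gem for gem in gems if isinstance(gem, dict)]
```

Because the slice runs from the first `[` to the last `]`, surrounding prose or code fences in the response are ignored, but a response holding two separate arrays would fail to parse and yield an empty list.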
@@ -199,50 +149,35 @@ def get_embedding(text: str, ollama_url: str) -> Optional[List[float]]:
|
|||||||
try:
|
try:
|
||||||
response = requests.post(
|
response = requests.post(
|
||||||
f"{ollama_url}/api/embeddings",
|
f"{ollama_url}/api/embeddings",
|
||||||
json={"model": "snowflake-arctic-embed2", "prompt": text},
|
json={
|
||||||
|
"model": EMBEDDING_MODEL,
|
||||||
|
"prompt": text
|
||||||
|
},
|
||||||
timeout=30
|
timeout=30
|
||||||
)
|
)
|
||||||
response.raise_for_status()
|
response.raise_for_status()
|
||||||
return response.json()['embedding']
|
data = response.json()
|
||||||
|
return data.get("embedding")
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(f"Error getting embedding: {e}", file=sys.stderr)
|
print(f"Error getting embedding: {e}", file=sys.stderr)
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collection: str, ollama_url: str) -> bool:
|
def store_gem(gem: Dict[str, Any], vector: List[float], qdrant_url: str, target_collection: str, user_id: str) -> bool:
|
||||||
"""Store a single gem to Qdrant."""
|
"""Store a gem in Qdrant."""
|
||||||
# Support both old format (gem, context, snippet) and new format (text, category, importance)
|
embedding_text = gem.get("text", "") or gem.get("gem", "")
|
||||||
embedding_text = gem.get('text', '') or gem.get('gem', '')
|
|
||||||
if not embedding_text:
|
|
||||||
embedding_text = f"{gem.get('gem', '')} {gem.get('context', '')} {gem.get('snippet', '')}".strip()
|
|
||||||
|
|
||||||
if not embedding_text:
|
hash_content = f"{user_id}:{embedding_text[:100]}"
|
||||||
print(f"⚠️ Empty embedding text for gem, skipping", file=sys.stderr)
|
|
||||||
return False
|
|
||||||
|
|
||||||
vector = get_embedding(embedding_text, ollama_url)
|
|
||||||
|
|
||||||
if vector is None:
|
|
||||||
print(f"⚠️ Failed to get embedding for gem", file=sys.stderr)
|
|
||||||
return False
|
|
||||||
|
|
||||||
# Generate ID
|
|
||||||
hash_content = f"{user_id}:{gem.get('conversation_id', '')}:{gem.get('turn_range', '')}:{gem.get('gem', '')[:50]}"
|
|
||||||
hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
|
hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
|
||||||
gem_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
|
gem_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
|
||||||
|
|
||||||
# Normalize gem fields - ensure we have text field
|
|
||||||
payload = {
|
payload = {
|
||||||
|
"text": embedding_text,
|
||||||
|
"category": gem.get("category", "fact"),
|
||||||
|
"importance": gem.get("importance", "medium"),
|
||||||
"user_id": user_id,
|
"user_id": user_id,
|
||||||
"text": gem.get('text', gem.get('gem', '')),
|
"created_at": datetime.now(timezone.utc).isoformat()
|
||||||
"category": gem.get('category', 'general'),
|
|
||||||
"importance": gem.get('importance', 'medium'),
|
|
||||||
"curated_at": datetime.now(timezone.utc).isoformat()
|
|
||||||
}
|
}
|
||||||
# Preserve any other fields from gem
|
|
||||||
for key in ['context', 'snippet', 'confidence', 'conversation_id', 'turn_range']:
|
|
||||||
if key in gem:
|
|
||||||
payload[key] = gem[key]
|
|
||||||
|
|
||||||
try:
|
try:
|
||||||
response = requests.put(
|
response = requests.put(
|
||||||
@@ -264,7 +199,7 @@ def store_gem(gem: Dict[str, Any], user_id: str, qdrant_url: str, target_collect
|
|||||||
|
|
||||||
|
|
||||||
def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool:
|
def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool:
|
||||||
"""Mark memories as curated in Qdrant using POST /points/payload format."""
|
"""Mark memories as curated."""
|
||||||
if not memory_ids:
|
if not memory_ids:
|
||||||
return True
|
return True
|
||||||
|
|
||||||
@@ -288,79 +223,58 @@ def mark_curated(memory_ids: List, qdrant_url: str, collection: str) -> bool:
|
|||||||
|
|
||||||
|
|
||||||
def main():
|
def main():
|
||||||
parser = argparse.ArgumentParser(description="TrueRecall Timer Curator")
|
print("TrueRecall v2 - Timer Curator")
|
||||||
parser.add_argument("--config", "-c", default=str(DEFAULT_CONFIG), help="Config file path")
|
print(f"User: {USER_ID}")
|
||||||
parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write, just preview")
|
print(f"Source: {SOURCE_COLLECTION}")
|
||||||
args = parser.parse_args()
|
print(f"Target: {TARGET_COLLECTION}")
|
||||||
|
print(f"Max batch: {MAX_BATCH}\n")
|
||||||
|
|
||||||
config = load_config(args.config)
|
print("Fetching uncurated memories...")
|
||||||
|
memories = get_uncurated_memories(QDRANT_URL, SOURCE_COLLECTION, USER_ID, MAX_BATCH)
|
||||||
qdrant_url = os.getenv("QDRANT_URL", "http://<QDRANT_IP>:6333")
|
print(f"Found {len(memories)} uncurated memories\n")
|
||||||
ollama_url = os.getenv("OLLAMA_URL", "http://<OLLAMA_IP>:11434")
|
|
||||||
|
|
||||||
user_id = config.get("user_id", "rob")
|
|
||||||
source_collection = config.get("source_collection", "memories_tr")
|
|
||||||
target_collection = config.get("target_collection", "gems_tr")
|
|
||||||
max_batch = config.get("max_batch_size", 100)
|
|
||||||
|
|
||||||
print(f"🔍 TrueRecall Timer Curator")
|
|
||||||
print(f"👤 User: {user_id}")
|
|
||||||
print(f"📥 Source: {source_collection}")
|
|
||||||
print(f"💎 Target: {target_collection}")
|
|
||||||
print(f"📦 Max batch: {max_batch}")
|
|
||||||
if args.dry_run:
|
|
||||||
print("🏃 DRY RUN MODE")
|
|
||||||
print()
|
|
||||||
|
|
||||||
# Get uncurated memories
|
|
||||||
print("📥 Fetching uncurated memories...")
|
|
||||||
memories = get_uncurated_memories(qdrant_url, source_collection, user_id, max_batch)
|
|
||||||
print(f"✅ Found {len(memories)} uncurated memories")
|
|
||||||
|
|
||||||
if not memories:
|
if not memories:
|
||||||
print("🤷 Nothing to curate. Exiting.")
|
print("Nothing to curate. Exiting.")
|
||||||
return
|
return
|
||||||
|
|
||||||
# Extract gems
|
print("Sending memories to curator...")
|
||||||
print(f"\n🧠 Sending {len(memories)} memories to curator...")
|
gems = extract_gems(memories, OLLAMA_URL)
|
||||||
gems = extract_gems(memories, ollama_url)
|
print(f"Extracted {len(gems)} gems\n")
|
||||||
print(f"✅ Extracted {len(gems)} gems")
|
|
||||||
|
|
||||||
if not gems:
|
if not gems:
|
||||||
print("⚠️ No gems extracted. Nothing to store.")
|
print("No gems extracted. Exiting.")
|
||||||
# Still mark as curated so we don't reprocess
|
|
||||||
memory_ids = [m["id"] for m in memories] # Keep as integers
|
|
||||||
mark_curated(memory_ids, qdrant_url, source_collection)
|
|
||||||
return
|
return
|
||||||
|
|
||||||
# Preview
|
print("Gems preview:")
|
||||||
print("\n💎 Gems preview:")
|
|
||||||
for i, gem in enumerate(gems[:3], 1):
|
for i, gem in enumerate(gems[:3], 1):
|
||||||
print(f" {i}. {gem.get('gem', 'N/A')[:80]}...")
|
text = gem.get("text", "N/A")[:50]
|
||||||
|
print(f" {i}. {text}...")
|
||||||
if len(gems) > 3:
|
if len(gems) > 3:
|
||||||
print(f" ... and {len(gems) - 3} more")
|
print(f" ... and {len(gems) - 3} more")
|
||||||
|
print()
|
||||||
|
|
||||||
if args.dry_run:
|
print("Storing gems...")
|
||||||
print("\n🏃 DRY RUN: Not storing gems or marking curated.")
|
|
||||||
return
|
|
||||||
|
|
||||||
# Store gems
|
|
||||||
print(f"\n💾 Storing {len(gems)} gems...")
|
|
||||||
stored = 0
|
stored = 0
|
||||||
for gem in gems:
|
for gem in gems:
|
||||||
if store_gem(gem, user_id, qdrant_url, target_collection, ollama_url):
|
text = gem.get("text", "") or gem.get("gem", "")
|
||||||
stored += 1
|
if not text:
|
||||||
print(f"✅ Stored: {stored}/{len(gems)}")
|
continue
|
||||||
|
|
||||||
|
vector = get_embedding(text, OLLAMA_URL)
|
||||||
|
if vector:
|
||||||
|
if store_gem(gem, vector, QDRANT_URL, TARGET_COLLECTION, USER_ID):
|
||||||
|
stored += 1
|
||||||
|
|
||||||
# Mark memories as curated
|
print(f"Stored: {stored}/{len(gems)}\n")
|
||||||
print("\n📝 Marking memories as curated...")
|
|
||||||
memory_ids = [m["id"] for m in memories] # Keep as integers
|
print("Marking memories as curated...")
|
||||||
if mark_curated(memory_ids, qdrant_url, source_collection):
|
memory_ids = [mem.get("id") for mem in memories if mem.get("id")]
|
||||||
print(f"✅ Marked {len(memory_ids)} memories as curated")
|
if mark_curated(memory_ids, QDRANT_URL, SOURCE_COLLECTION):
|
||||||
|
print(f"Marked {len(memory_ids)} memories as curated\n")
|
||||||
else:
|
else:
|
||||||
print(f"⚠️ Failed to mark some memories as curated")
|
print("Failed to mark memories\n")
|
||||||
|
|
||||||
print("\n🎉 Curation complete!")
|
print("Curation complete!")
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|||||||
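The new `store_gem` derives the point ID from `user_id` and the first 100 characters of the gem text. The sketch below isolates that derivation so its consequences are easier to see; the function name `gem_id_for` is hypothetical, while the hashing steps are copied from the diff.

```python
import hashlib


def gem_id_for(user_id: str, text: str) -> int:
    """Hypothetical helper reproducing the ID derivation used in store_gem."""
    # Only the first 100 characters of the text take part in the hash.
    hash_content = f"{user_id}:{text[:100]}"
    hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
    # Reduce to a non-negative integer below 2**63.
    return int.from_bytes(hash_bytes, byteorder="big") % (2**63)
```

Since the ID is deterministic, storing the same gem twice for the same user targets the same point rather than creating a duplicate. The flip side is that two gems sharing their first 100 characters map to the same ID, so the later one would replace the earlier one.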