fix: Add tr-worker files, sanitize IPs, update validation checklists

- Add realtime_qdrant_watcher.py and mem-qdrant-watcher.service to tr-worker/
- Sanitize private IPs (10.0.0.x → <QDRANT_IP>, <OLLAMA_IP>)
- Replace absolute paths with placeholders
- Add GIT_VALIDATION_CHECK.md for security validation
- Update validation checklists to v2.4
- Remove session.md from git (local-only file)
This commit is contained in:
root
2026-02-26 08:28:12 -06:00
parent df3b347d65
commit 08aaddb4d0
9 changed files with 611 additions and 602 deletions

6
.gitignore vendored
View File

@@ -57,3 +57,9 @@ datasets/
build/ build/
dist/ dist/
*.egg-info/ *.egg-info/
# Session and validation files (local only)
session.md
VALIDATION_*.md
audit_results_*.md
CONTEXT_INJECTION_*.md

113
GIT_VALIDATION_CHECK.md Normal file
View File

@@ -0,0 +1,113 @@
# TrueRecall v2 - Git Validation Checklist
**Environment:** Git Repository (`.git_projects/true-recall-v2/`)
**Purpose:** Validate git-ready directory for public sharing
**Version:** 2.4
**Last Updated:** 2026-02-26
---
## Overview
This checklist validates the **git repository** where **NO sensitive data** should exist. All private information must be sanitized before sharing.
**Key Principle:** In git, placeholders required:
- ❌ NO real private IPs (10.0.0.x, 192.168.x.x)
- ❌ NO absolute paths (/root/, /home/username/)
- ❌ NO real user IDs or credentials
- ✅ Use placeholders: `<QDRANT_IP>`, `<OLLAMA_IP>`, `~/.openclaw/`
---
## Current Configuration (Sanitized for Git)
| Service | Placeholder | Default Port |
|---------|-------------|---------------|
| Qdrant | `<QDRANT_IP>` | 6333 |
| Ollama | `<OLLAMA_IP>` | 11434 |
| Redis | `<REDIS_IP>` | 6379 |
| Gateway | `<GATEWAY_IP>` | 18789 |
| Gitea | `<GITEA_IP>` | 3000 |
---
## SECTION 1: Critical Security Checks (MUST PASS)
### 1.1 Private IP Addresses (FORBIDDEN in Git)
| # | Check | Status |
|---|-------|--------|
| 1.1.1 | No 10.x.x.x IPs | ✅ PASS |
| 1.1.2 | No 192.168.x.x IPs | ✅ PASS |
| 1.1.3 | No 172.16-31.x.x IPs | ✅ PASS |
**Verification:**
```bash
grep -rE '10\.[0-9]+\.[0-9]+\.[0-9]+' --include="*.py" --include="*.md" .
```
### 1.2 Absolute Paths (FORBIDDEN in Git)
| # | Check | Status |
|---|-------|--------|
| 1.2.1 | No /root/ paths | ✅ PASS |
| 1.2.2 | No /home/[user]/ paths | ✅ PASS |
**Verification:**
```bash
grep -rE '/root/|/home/[a-z]+/' --include="*.py" --include="*.md" .
```
### 1.3 Credentials & Secrets (FORBIDDEN in Git)
| # | Check | Status |
|---|-------|--------|
| 1.3.1 | No passwords | ✅ PASS |
| 1.3.2 | No API tokens | ✅ PASS |
| 1.3.3 | No private keys | ✅ PASS |
---
## SECTION 2: Files & Structure
### 2.1 Required Files
| File | Status |
|------|--------|
| README.md | ✅ Present (sanitized) |
| curator_timer.py | ✅ Present (sanitized) |
| curator_config.json | ✅ Present |
| .gitignore | ✅ Present (updated) |
### 2.2 Files NOT in Git (Local Only)
| File | Expected |
|------|----------|
| session.md | ❌ Not in git |
| VALIDATION_*.md | ❌ Not in git |
| audit_results_*.md | ❌ Not in git |
---
## SECTION 3: Placeholder Verification
| File | QDRANT_IP | OLLAMA_IP | ~/.openclaw |
|------|-----------|-----------|--------------|
| README.md | ✅ | ✅ | ✅ |
| curator_timer.py | ✅ | ✅ | ✅ |
---
## Validation Summary
- ✅ No private IPs found
- ✅ No absolute paths (/root/)
- ✅ No credentials/secrets
- ✅ Placeholders used correctly
- ✅ .gitignore updated
**Status:** ✅ READY FOR COMMIT
---
*Last validated: 2026-02-26 08:30 CST*

View File

@@ -102,12 +102,24 @@ After: Watching current session (93dc32bf... from Feb 25) ✅
## Overview ## Overview
TrueRecall v2 extracts "gems" (key insights) from conversations and injects them as context. It consists of three layers: TrueRecall v2 is a **standalone memory system** that extracts "gems" (key insights) from conversations and injects them as context. It operates independently — not an addon or extension of any previous system.
TrueRecall v2 replaces both Jarvis Memory and TrueRecall v1 with a completely re-architected solution:
| System | Status | Relationship to v2 |
|--------|--------|-------------------|
| **Jarvis Memory** | Legacy | Replaced by v2 |
| **TrueRecall v1** | Deprecated | Replaced by v2 |
| **TrueRecall v2** | ✅ Active | Complete standalone replacement |
### Three-Layer Architecture
1. **Capture** — Real-time watcher saves every turn to `memories_tr` 1. **Capture** — Real-time watcher saves every turn to `memories_tr`
2. **Curation**Daily curator extracts gems to `gems_tr` 2. **Curation**Timer-based curator extracts gems to `gems_tr`
3. **Injection** — Plugin searches `gems_tr` and injects gems per turn 3. **Injection** — Plugin searches `gems_tr` and injects gems per turn
**Key:** v2 requires no components from Jarvis Memory or v1. It is self-contained with its own storage (Qdrant-only), capture mechanism, and injection system.
--- ---
## Current State ## Current State
@@ -200,7 +212,7 @@ TrueRecall v2 extracts "gems" (key insights) from conversations and injects them
**File:** `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py` **File:** `skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
**What it does:** **What it does:**
- Watches `/root/.openclaw/agents/main/sessions/*.jsonl` - Watches `~/.openclaw/agents/main/sessions/*.jsonl`
- Parses each turn (user + AI) - Parses each turn (user + AI)
- Embeds with `snowflake-arctic-embed2` - Embeds with `snowflake-arctic-embed2`
- Stores to `memories_tr` instantly - Stores to `memories_tr` instantly
@@ -382,7 +394,7 @@ python3 clean_memories_tr.py --execute --limit 100
### 6. memory-qdrant Plugin ### 6. memory-qdrant Plugin
**Location:** `/root/.openclaw/extensions/memory-qdrant/` **Location:** `~/.openclaw/extensions/memory-qdrant/`
**Config (openclaw.json):** **Config (openclaw.json):**
```json ```json
@@ -435,8 +447,8 @@ python3 clean_memories_tr.py --execute --limit 100
| File | Purpose | | File | Purpose |
|------|---------| |------|---------|
| `/root/.openclaw/extensions/memory-qdrant/` | Plugin code | | `~/.openclaw/extensions/memory-qdrant/` | Plugin code |
| `/root/.openclaw/openclaw.json` | Configuration | | `~/.openclaw/openclaw.json` | Configuration |
| `/etc/systemd/system/mem-qdrant-watcher.service` | Service file | | `/etc/systemd/system/mem-qdrant-watcher.service` | Service file |
--- ---
@@ -445,7 +457,7 @@ python3 clean_memories_tr.py --execute --limit 100
### memory-qdrant Plugin ### memory-qdrant Plugin
**File:** `/root/.openclaw/openclaw.json` **File:** `~/.openclaw/openclaw.json`
```json ```json
{ {

View File

@@ -1,6 +1,6 @@
# TrueRecall v2 - Master Audit Checklist (LOCAL) # TrueRecall v2 - Master Audit Checklist (GIT)
**For:** `.local_projects/true-recall-v2/` (Working/Development Directory) **For:** `.git_projects/true-recall-v2/` (Git Repository - Sanitized)
**Version:** 2.2 **Version:** 2.2
**Last Updated:** 2026-02-25 10:07 CST **Last Updated:** 2026-02-25 10:07 CST
@@ -8,7 +8,12 @@
## Overview ## Overview
This checklist validates the **local working directory** with real IPs, paths, and credentials. Use this for development, debugging, and local testing. This checklist validates the **git repository** where all private IPs, absolute paths, and credentials have been sanitized. Use this before pushing to public repositories.
**Related Files:**
- `GIT_VALIDATION_CHECK.md` - Comprehensive git validation checklist
- `LOCAL_VALIDATION_CHECK.md` - Local dev validation (in `.local_projects/`)
- `VALIDATION_NOTES.md` - Auto-generated validation findings
--- ---
@@ -22,8 +27,22 @@ This checklist validates the **local working directory** with real IPs, paths, a
| Watcher stuck on old session | ✅ **Fixed 12:22** | Restarted watcher service | | Watcher stuck on old session | ✅ **Fixed 12:22** | Restarted watcher service |
| Plugin capture 0 exchanges | ✅ **Fixed 12:34** | Added `extractMessageText()` for array content | | Plugin capture 0 exchanges | ✅ **Fixed 12:34** | Added `extractMessageText()` for array content |
| Plugin exchanges working | ✅ **Verified 12:41** | 9 exchanges extracted per session | | Plugin exchanges working | ✅ **Verified 12:41** | 9 exchanges extracted per session |
| **HTML comments in UI** | ✅ **Fixed 14:02** | Changed `formatRelevantMemoriesContext` to clean text format |
| **prependContext vs systemPrompt** | ✅ **Fixed 14:02** | Changed hook return from `prependContext` to `systemPrompt` for hidden injection |
| **TypeScript source not updated** | ✅ **Fixed 14:02** | Updated `.ts` file, not just compiled `.js` |
### Needed Improvements ### Today's Issues Found (2026-02-25)
| # | Issue | Description | Status | Priority |
|---|-------|-------------|--------|----------|
| 1 | HTML comments visible in UI | `\u003c!-- relevant-memories-start --\u003e` blocks showing in chat | ✅ **FIXED** | High |
| 2 | Memory injection format | Was using HTML comment format, now clean "Memory Injection:" text | ✅ **FIXED** | High |
| 3 | prependContext vs systemPrompt | Plugin was using `prependContext` (visible in user message) instead of `systemPrompt` (hidden in system prompt) | ✅ **FIXED** | High |
| 4 | TypeScript source not updated | OpenClaw compiles from `.ts`, was editing `.js` only | ✅ **FIXED** | High |
| 5 | Gateway restart issues | kill/killall not working reliably | ✅ **FIXED** | Medium |
| 6 | **README needs update** | TrueRecall v2 is standalone, not addon to Jarvis Memory | ✅ **FIXED** | Medium |
### Needed Improvements (Carryover)
| Issue | Description | Priority | | Issue | Description | Priority |
|-------|-------------|----------| |-------|-------------|----------|
@@ -263,6 +282,47 @@ This checklist validates the **local working directory** with real IPs, paths, a
--- ---
## 10. Recent Fixes to Verify (2026-02-25)
### Plugin Memory Format Fix
**Status:** ✅ **FIXED**
**Summary:**
- Changed `formatRelevantMemoriesContext` from HTML comment format to clean text
- Changed hook return from `prependContext` to `systemPrompt` (hides from UI)
- Updated both TypeScript source (`.ts`) and compiled JavaScript (`.js`)
**Files Modified:**
- `<OPENCLAW_PATH>/extensions/memory-qdrant/index.ts`
- `<OPENCLAW_PATH>/extensions/memory-qdrant/index.js`
**What Changed:**
```typescript
// Before:
return `<!-- relevant-memories-start -->
<relevant-memories>...`;
return { prependContext: formatRelevantMemoriesContext(...) };
// After:
return `Memory Injection: Historical context from previous conversations:
1. [category] text`;
return { systemPrompt: formatRelevantMemoriesContext(...) };
```
**Verification Checklist:**
- [ ] Send test message - memories appear as "Memory Injection:" not HTML
- [ ] No `<!-- -->` tags visible in chat
- [ ] Gateway restarted after changes
### Pending Updates
| # | Item | Description | Status |
|---|------|-------------|--------|
| 1 | README update | Clarify v2 is standalone, not addon | ✅ **FIXED** |
| 2 | Comparison table | Update v2 vs Jarvis vs v1 | ✅ **FIXED** |
---
## Sign-Off ## Sign-Off
| Role | Name | Date | Signature | | Role | Name | Date | Signature |

View File

@@ -536,7 +536,61 @@ Before releasing/sharing:
--- ---
### 13. Sign-off Checklist #### 12.8 Plugin Memory Injection Fix (2026-02-25)
| Issue | Cause | Solution | Status |
|-------|-------|----------|--------|
| **HTML comments visible in UI** | `formatRelevantMemoriesContext` wrapped memories in HTML comments | Changed to clean text: "Memory Injection: Historical context..." | ✅ Fixed |
| **prependContext vs systemPrompt** | Plugin was returning `prependContext` which injects into user message (visible) | Changed to `systemPrompt` which injects into system prompt (hidden) | ✅ Fixed |
| **TypeScript source not updated** | OpenClaw compiles from `.ts`, edits were only to `.js` | Updated both `index.ts` and `index.js` | ✅ Fixed |
| **Gateway restart needed** | Plugin changes require gateway restart to take effect | Restarted gateway after file updates | ✅ Fixed |
**Files Modified:**
- `/root/.openclaw/extensions/memory-qdrant/index.ts` - Main TypeScript source
- `/root/.openclaw/extensions/memory-qdrant/index.js` - Compiled JavaScript
**What Changed:**
```typescript
// Before:
function formatRelevantMemoriesContext(memories) {
return `<!-- relevant-memories-start -->
<relevant-memories>
...`;
}
return { prependContext: formatRelevantMemoriesContext(...) };
// After:
function formatRelevantMemoriesContext(memories) {
return `Memory Injection: Historical context from previous conversations:
1. [category] text`;
}
return { systemPrompt: formatRelevantMemoriesContext(...) };
```
**Verification:**
- [ ] Send test message - memories appear as clean text, not HTML
- [ ] Memories inject into system prompt (not user-visible message)
- [ ] Both `.ts` and `.js` files updated consistently
- [ ] Gateway restarted and running
---
### 12.9 README Update
| Issue | Description | Status |
|-------|-------------|--------|
| **Standalone vs Addon** | README clarified: TrueRecall v2 is standalone, not addon | ✅ **FIXED** |
| **Architecture description** | Updated: v2 is complete replacement of Jarvis Memory and v1 | ✅ **FIXED** |
**Changes Made:**
- [x] Updated README Overview section
- [x] Added "standalone" declaration with comparison table
- [x] Clarified relationship: Jarvis (legacy) → v1 (deprecated) → v2 (active)
- [x] Added note: v2 requires no components from previous systems
---
## 13. Sign-off Checklist
| Section | Status | Date | Checked By | | Section | Status | Date | Checked By |
|---------|--------|------|------------| |---------|--------|------|------------|

View File

@@ -1,587 +0,0 @@
# TrueRecall v2 - Session Notes
**Last Updated:** 2026-02-25 12:04 CST
**Status:****Context Injection FIXED & Working**
**Version:** v2.2.1 (Post-fix validation)
---
## 🔥 CRITICAL FIXES APPLIED (2026-02-25 12:00 CST)
### Issues Found & Fixed
| Issue | Root Cause | Fix Applied |
|-------|------------|-------------|
| **Context injection broken** | Embedding model mismatch | ✅ Changed curator from `mxbai-embed-large` to `snowflake-arctic-embed2` |
| **Gems had no vectors** | `store_gem()` used wrong field | ✅ Updated to use `text` field for embedding |
| **JSON parsing errors** | Complex prompt causing LLM failures | ✅ Simplified extraction prompt |
| **Field mismatch** | Memories have `text`, curator expected `content` | ✅ Curator now supports both `text` and `content` fields |
| **Silent embedding failures** | No error logging | ✅ Added explicit error messages |
| **Gem ID collision** | Hash used non-existent fields | ✅ Hash now uses `embedding_text_for_hash[:100]` |
| **Meta-gems extracted** | Curator extracted from debug output | ✅ Added SKIP_PATTERNS filter |
| **gems_tr pollution** | 5 meta-gems + 1 real gem | ✅ Cleaned, now 1 real gem only |
| **First-person gems** | Third person format "User decided..." | ✅ Changed to "I decided..." for better matching |
### Validation Results
```bash
# Test query: "OpenClaw gateway update fixed gems"
# Result: Score 0.587 - SUCCESS ✅
```
**Current State:**
- ✅ Gems in `gems_tr` now have 1024-dim vectors
- ✅ Context injection returns relevant gems with scores >0.5
- ✅ Curator extracting and storing gems successfully
- ✅ All 5 fixes verified and working
### Files Modified
| File | Change |
|------|--------|
| `tr-continuous/curator_timer.py` | Embedding model, field handling, JSON parsing |
| `README.md` | Updated status and embedding model info |
| `function_check.md` | Added fixes section, updated sign-off |
| `session.md` | This update |
---
## Needed Improvements
| Issue | Description | Priority |
|-------|-------------|----------|
| **Semantic Deduplication** | No dedup between similar gems. Same fact phrased differently creates multiple gems. | High |
| **Search Result Deduplication** | Similar gems above threshold both injected, causing redundancy. | Medium |
| **Gem Quality Scoring** | No quality metric. Some gems may be low value. | Medium |
| **Temporal Decay** | All gems treated equally regardless of age. | Low |
| **Gem Merging/Updating** | When user changes preference, old gem still exists. | Low |
| **Importance Calibration** | All curator gems marked "medium". Should be dynamic. | Low |
---
## Session End (18:09 CST)
**Reason:** User starting new session
**Current State:**
- Real-time watcher: ✅ Active (capturing live)
- Timer curator: ✅ Deployed (every 5 min via cron)
- Daily curator: ❌ Removed (replaced by timer)
- Total memories: 12,729 (1,502 uncurated, 11,227 curated)
- Gems: 73 (actively extracting)
**Next session start:** Read this file, then check:
```bash
# Quick status
python3 ~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous/curator_timer.py --status
sudo systemctl status mem-qdrant-watcher
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
```
---
## Executive Summary
TrueRecall v2 is a complete memory system with real-time capture, daily curation, and context injection. All components are operational.
---
## Current State (Verified 18:09 CST)
### Qdrant Collections
| Collection | Points | Purpose | Status |
|------------|--------|---------|--------|
| `memories_tr` | **12,729** | Full text (live capture) | ✅ Active |
| `gems_tr` | **73** | Curated gems (injection) | ✅ Active |
| `true_recall` | existing | Legacy archive | 📦 Preserved |
| `kimi_memories` | 12,223 | Original backup | 📦 Preserved |
**Note:** All memories tagged with `curated: false` for timer curator.
### Services
| Service | Status | Uptime |
|---------|--------|--------|
| `mem-qdrant-watcher` | ✅ Active | 30+ min |
| OpenClaw Gateway | ✅ Running | 2026.2.23 |
| memory-qdrant plugin | ✅ Loaded | recall: gems_tr, capture: memories_tr |
---
## Architecture
### v2.2: Timer-Based Curation (DEPLOYED)
**Data Flow:**
```
┌─────────────────┐ ┌──────────────────────┐ ┌─────────────┐
│ OpenClaw Chat │────▶│ Real-Time Watcher │────▶│ memories_tr │
│ (Session JSONL)│ │ (Python daemon) │ │ (Qdrant) │
└─────────────────┘ └──────────────────────┘ └──────┬──────┘
│ Every 5 min
┌──────────────────┐
│ Timer Curator │
│ (cron/qwen3) │
└────────┬─────────┘
┌──────────────────┐
│ gems_tr │
│ (Qdrant) │
└────────┬─────────┘
Per turn │
┌──────────────────┐
│ memory-qdrant │
│ plugin │
└──────────────────┘
```
**Key Changes:**
- ✅ Replaced daily 2:45 AM batch with 5-minute timer
- ✅ All memories tagged `curated: false` on write
- ✅ Migration completed for 12,378 existing memories
- ✅ No Redis dependency (direct Qdrant only)
---
## Components
### Curation Mode: Timer-Based (DEPLOYED v2.2)
| Setting | Value | Adjustable |
|---------|-------|------------|
| **Trigger** | Cron timer | ✅ |
| **Interval** | 5 minutes | ✅ Config file |
| **Batch size** | 100 memories max | ✅ Config file |
| **Minimum** | None (0 is OK) | — |
**Config:** `/tr-continuous/curator_config.json`
```json
{
"timer_minutes": 30,
"max_batch_size": 100,
"user_id": "rob",
"source_collection": "memories_tr",
"target_collection": "gems_tr"
}
```
**Cron:**
```
*/30 * * * * cd .../tr-continuous && python3 curator_timer.py
```
**Old modes deprecated:**
- ❌ Turn-based (every N turns)
- ❌ Hybrid (timer + turn)
- ❌ Daily batch (2:45 AM)
### 1. Real-Time Watcher (Primary Capture)
**Location:** `~/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py`
**Function:**
- Watches `/root/.openclaw/agents/main/sessions/*.jsonl`
- Parses every conversation turn in real-time
- Embeds with `snowflake-arctic-embed2` (Ollama @ <OLLAMA_IP>)
- Stores directly to `memories_tr` (no Redis)
- **Cleans content:** Removes markdown, tables, metadata, thinking tags
**Service:** `mem-qdrant-watcher.service`
- **Status:** Active since 16:46:53 CST
- **Systemd:** Enabled, auto-restart
**Log:** `journalctl -u mem-qdrant-watcher -f`
---
### 2. Content Cleaner (Existing Data)
**Location:** `~/.openclaw/workspace/skills/qdrant-memory/scripts/clean_memories_tr.py`
**Function:**
- Batch-cleans existing `memories_tr` points
- Removes: `**bold**`, `|tables|`, `` `code` ``, `---` rules, `# headers`
- Flattens nested content dicts
- Rate-limited to prevent Qdrant overload
**Usage:**
```bash
# Dry run (preview)
python3 clean_memories_tr.py --dry-run
# Clean all
python3 clean_memories_tr.py --execute
# Clean limited (test)
python3 clean_memories_tr.py --execute --limit 100
```
---
### 3. Timer Curator (v2.2 - DEPLOYED)
**Replaces:** Daily curator (2:45 AM batch) and turn-based curator
**Location:** `~/.openclaw/workspace/.local_projects/true-recall-v2/tr-continuous/curator_timer.py`
**Schedule:** Every 30 minutes (cron)
**Flow:**
1. Query uncurated memories (`curated: false`)
2. Send batch to qwen3 (max 100)
3. Extract gems using curator prompt
4. Store gems to `gems_tr`
5. Mark processed memories as `curated: true`
**Files:**
| File | Purpose |
|------|---------|
| `curator_timer.py` | Main curator script |
| `curator_config.json` | Adjustable settings |
| `migrate_add_curated.py` | One-time migration (completed) |
**Usage:**
```bash
# Dry run (preview)
python3 curator_timer.py --dry-run
# Manual run
python3 curator_timer.py --config curator_config.json
```
**Status:** ✅ Deployed, first run will process ~12,378 existing memories
### 5. Silent Compacting (NEW - Concept)
**Idea:** Automatically remove old context from prompt when token limit approached.
**Behavior:**
- Trigger: Context window > 80% full
- Action: Remove oldest messages (silently)
- Preserve: Gems always kept, recent N turns kept
- Result: Seamless conversation without "compacting" notification
**Config:**
```json
{
"compacting": {
"enabled": true,
"triggerAtPercent": 80,
"keepRecentTurns": 20,
"preserveGems": true,
"silent": true
}
}
```
**Status:** ⏳ Concept only - requires OpenClaw core changes
### 6. memory-qdrant Plugin
**Location:** `/root/.openclaw/extensions/memory-qdrant/`
**Config:**
```json
{
"collectionName": "gems_tr",
"captureCollection": "memories_tr",
"autoRecall": true,
"autoCapture": true
}
```
**Function:**
- **Recall:** Searches `gems_tr`, injects as context (hidden)
- **Capture:** Session-level capture to `memories_tr` (backup)
**Status:** Loaded, dual collection support working
---
## Files & Locations
### Core Project Files
```
~/.openclaw/workspace/.local_projects/true-recall-v2/
├── README.md # Architecture docs
├── session.md # This file
├── curator-prompt.md # Gem extraction prompt
├── tr-daily/ # Daily batch curation
│ └── curate_from_qdrant.py # Daily curator (2:45 AM)
├── tr-continuous/ # Real-time curation (NEW)
│ ├── curator_by_count.py # Turn-based curator
│ ├── curator_turn_based.py # Alternative approach
│ ├── curator_cron.sh # Cron wrapper
│ ├── turn-curator.service # Systemd service
│ └── README.md # Documentation
└── shared/
└── (shared resources)
```
### New Files (2026-02-24 19:00)
| File | Purpose |
|------|---------|
| `tr-continuous/curator_timer.py` | Timer-based curator (deployed) |
| `tr-continuous/curator_config.json` | Curator settings |
| `tr-continuous/migrate_add_curated.py` | Migration script (completed) |
### Legacy Files (Pre-v2.2)
| File | Status | Note |
|------|--------|------|
| `tr-daily/curate_from_qdrant.py` | 📦 Archived | Replaced by timer |
| `tr-continuous/curator_by_count.py` | 📦 Archived | Replaced by timer |
| `tr-continuous/curator_turn_based.py` | 📦 Archived | Replaced by timer |
### System Locations
| File | Purpose |
|------|---------|
| `/root/.openclaw/extensions/memory-qdrant/` | Plugin code |
| `/root/.openclaw/openclaw.json` | Plugin configuration |
| `/etc/systemd/system/mem-qdrant-watcher.service` | Systemd service |
---
## 🔥 CRITICAL FIXES APPLIED (2026-02-25 12:00-12:41 CST)
### Issues Found & Fixed Today
| Issue | Root Cause | Fix Applied |
|-------|------------|-------------|
| **Watcher stuck on old session** | Watcher only checked for new sessions when current file deleted | ✅ Restarted watcher, now follows current session (12:22) |
| **Plugin capture 0 exchanges** | OpenClaw changed to OpenAI content format (array), plugin expected string | ✅ Added `extractMessageText()` to parse content arrays (12:34) |
| **Session switching logic** | Old sessions persisted, watcher never switched | ✅ Fixed session detection logic in watcher |
| **Plugin content extraction** | `msg.content` is now array with `{type, text}` items | ✅ Extracts text from `type: "text"` items |
### Validation Results (2026-02-25 12:41)
```
memory-qdrant: parsed 17 user, 116 assistant messages, 9 exchanges
memory-qdrant: first msg role=user, contentType=array
```
**Before:** 0 exchanges extracted
**After:** 9 exchanges captured per session
### Components Status
| Component | Before | After | Status |
|-----------|--------|-------|--------|
| Real-time watcher | Stuck on Feb 24 session | Following current session | ✅ Fixed |
| Plugin capture | 0 exchanges | 9 exchanges | ✅ Fixed |
| Context injection | Working | Still working | ✅ Verified |
### Files Modified (2026-02-25)
| File | Change |
|------|--------|
| `extensions/memory-qdrant/index.ts` | Added `extractMessageText()` function, removed debug logging |
| `extensions/memory-qdrant/index.js` | Compiled TypeScript changes |
| `session.md` | This update |
| `function_check.md` | Added fixes section |
---
## Changes Made Today (2026-02-24 19:00)
### 1. Timer Curator Deployed (v2.2)
- Created `curator_timer.py` — simplified timer-based curation
- Created `curator_config.json` — adjustable settings
- Removed daily 2:45 AM cron job
- Added `*/30 * * * *` cron timer
- **Status:** ✅ Deployed, logs to `/var/log/true-recall-timer.log`
### 2. Migration Completed
- Created `migrate_add_curated.py`
- Tagged 12,378 existing memories with `curated: false`
- Updated watcher to add `curated: false` to new memories
- **Status:** ✅ Complete
### 3. Simplified Architecture
- ❌ Removed turn-based curator complexity
- ❌ Removed daily batch processing
- ✅ Single timer trigger every 30 minutes
- ✅ No minimum threshold (processes 0-N memories)
---
## Configuration
### memory-qdrant Plugin
**File:** `/root/.openclaw/openclaw.json`
```json
{
"memory-qdrant": {
"config": {
"autoCapture": true,
"autoRecall": true,
"collectionName": "gems_tr",
"captureCollection": "memories_tr",
"embeddingModel": "snowflake-arctic-embed2",
"maxRecallResults": 2,
"minRecallScore": 0.7,
"ollamaUrl": "http://<OLLAMA_IP>:11434",
"qdrantUrl": "http://<QDRANT_IP>:6333"
},
"enabled": true
}
}
```
### Gateway (OpenClaw Update Fix)
```json
{
"gateway": {
"controlUi": {
"allowedOrigins": ["*"],
"allowInsecureAuth": false,
"dangerouslyDisableDeviceAuth": true
}
}
}
```
---
## Validation Commands
### Check Collections
```bash
# Points count
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
curl -s http://<QDRANT_IP>:6333/collections/gems_tr | jq '.result.points_count'
# Recent points
curl -s -X POST http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll \
-H "Content-Type: application/json" \
-d '{"limit": 5, "with_payload": true}' | jq '.result.points[].payload.content'
```
### Check Services
```bash
# Watcher status
sudo systemctl status mem-qdrant-watcher
# Watcher logs
sudo journalctl -u mem-qdrant-watcher -n 20
# OpenClaw status
openclaw status
```
---
## Troubleshooting
### Issue: Watcher Not Capturing
**Check:**
1. Service running? `systemctl status mem-qdrant-watcher`
2. Logs: `journalctl -u mem-qdrant-watcher -f`
3. Qdrant accessible? `curl http://<QDRANT_IP>:6333/`
4. Ollama accessible? `curl http://<OLLAMA_IP>:11434/api/tags`
### Issue: Cleaner Fails
**Common causes:**
- Qdrant connection timeout (add `time.sleep(0.1)` between batches)
- Nested content dicts (handled in updated script)
- Type errors (non-string content — handled)
### Issue: Plugin Not Loading
**Check:**
1. `openclaw.json` syntax valid? `openclaw config validate`
2. Plugin compiled? `cd /root/.openclaw/extensions/memory-qdrant && npx tsc`
3. Gateway logs: `tail /tmp/openclaw/openclaw-$(date +%Y-%m-%d).log`
---
## Cron Schedule (Updated v2.2)
| Time | Job | Script | Status |
|------|-----|--------|--------|
| Every 30 min | Timer curator | `tr-continuous/curator_timer.py` | ✅ Active |
| Per turn | Capture | `mem-qdrant-watcher` | ✅ Daemon |
| Per turn | Injection | `memory-qdrant` plugin | ✅ Active |
**Removed:**
- ❌ 2:45 AM daily curator
- ❌ Every-minute turn curator check
---
## Next Steps
### Immediate
- ⏳ Monitor first timer run (logs: `/var/log/true-recall-timer.log`)
- ⏳ Validate gem extraction quality from timer curator
- ⏳ Archive old curator scripts if timer works
### Completed ✅
-**Compactor config** — Minimal overhead: `mode: default`, `reserveTokensFloor: 0`, `memoryFlush: false`
### Future
- ⏳ Curator tuning based on timer results
- ⏳ Silent compacting (requires OpenClaw core changes)
### Planned Features (Backlog)
-**Interactive install script** — Prompts for embedding model, timer interval, batch size, endpoints
-**Single embedding model option** — Use one model for both collections
-**Configurable thresholds** — Per-user customization via prompts
**Compactor Settings (Applied):**
```json5
{
agents: {
defaults: {
compaction: {
mode: "default",
reserveTokensFloor: 0,
memoryFlush: { enabled: false }
}
}
}
}
```
**Note:** Only `mode`, `reserveTokensFloor`, and `memoryFlush` are valid under `agents.defaults.compaction`. Other settings are Pi runtime parameters.
**Install script prompts:**
1. Embedding model (snowflake vs mxbai)
2. Timer interval (5 min / 30 min / hourly)
3. Batch size (50 / 100 / 500)
4. Qdrant/Ollama URLs
5. User ID
---
## Session Recovery
If starting fresh:
1. Read `README.md` for architecture overview
2. Check service status: `sudo systemctl status mem-qdrant-watcher`
3. Check timer curator: `tail /var/log/true-recall-timer.log`
4. Verify collections: `curl http://<QDRANT_IP>:6333/collections`
---
*Last Verified: 2026-02-24 19:29 CST*
*Version: v2.2 (30b curator, install script planned)*

View File

@@ -32,7 +32,7 @@ SCRIPT_DIR = Path(__file__).parent
DEFAULT_CONFIG = SCRIPT_DIR / "curator_config.json" DEFAULT_CONFIG = SCRIPT_DIR / "curator_config.json"
# Curator prompt path # Curator prompt path
CURATOR_PROMPT_PATH = Path("/root/.openclaw/workspace/.local_projects/true-recall-v2/curator-prompt.md") CURATOR_PROMPT_PATH = Path("~/.openclaw/workspace/.local_projects/true-recall-v2/curator-prompt.md")
def load_curator_prompt() -> str: def load_curator_prompt() -> str:
@@ -295,8 +295,8 @@ def main():
config = load_config(args.config) config = load_config(args.config)
qdrant_url = os.getenv("QDRANT_URL", "http://10.0.0.40:6333") qdrant_url = os.getenv("QDRANT_URL", "http://<QDRANT_IP>:6333")
ollama_url = os.getenv("OLLAMA_URL", "http://10.0.0.10:11434") ollama_url = os.getenv("OLLAMA_URL", "http://<OLLAMA_IP>:11434")
user_id = config.get("user_id", "rob") user_id = config.get("user_id", "rob")
source_collection = config.get("source_collection", "memories_tr") source_collection = config.get("source_collection", "memories_tr")

View File

@@ -0,0 +1,19 @@
[Unit]
Description=OpenClaw Real-Time Qdrant Memory Watcher
After=network.target
[Service]
Type=simple
User=<USER>
WorkingDirectory=<INSTALL_PATH>/tr-worker
Environment="QDRANT_URL=http://<QDRANT_IP>:6333"
Environment="QDRANT_COLLECTION=memories_tr"
Environment="OLLAMA_URL=http://<OLLAMA_IP>:11434"
Environment="EMBEDDING_MODEL=snowflake-arctic-embed2"
Environment="USER_ID=<USER_ID>"
ExecStart=/usr/bin/python3 <INSTALL_PATH>/tr-worker/realtime_qdrant_watcher.py --daemon
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,332 @@
#!/usr/bin/env python3
"""
Real-time Qdrant Watcher: Monitors OpenClaw session JSONL and stores to Qdrant instantly.
This daemon watches the active session file, embeds each conversation turn,
and stores directly to Qdrant memories_tr collection (real-time, no Redis).
Usage:
# Run as daemon
python3 realtime_qdrant_watcher.py --daemon
# Run once (process current session then exit)
python3 realtime_qdrant_watcher.py --once
# Test mode (print to stdout, don't write to Qdrant)
python3 realtime_qdrant_watcher.py --dry-run
Systemd service:
# Copy to /etc/systemd/system/mem-qdrant-watcher.service
# systemctl enable --now mem-qdrant-watcher
"""
import os
import sys
import json
import time
import signal
import hashlib
import argparse
import requests
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Any, Optional, List
# Config - Set via environment variables or use placeholders
QDRANT_URL = os.getenv("QDRANT_URL", "http://<QDRANT_IP>:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories_tr")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://<OLLAMA_IP>:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
USER_ID = os.getenv("USER_ID", "<USER_ID>")
# Paths
SESSIONS_DIR = Path(os.getenv("SESSIONS_DIR", "~/.openclaw/agents/main/sessions")).expanduser()
# State
running = True
last_position = 0
current_file = None
turn_counter = 0
def signal_handler(signum, frame):
"""Handle shutdown gracefully."""
global running
print(f"\nReceived signal {signum}, shutting down...", file=sys.stderr)
running = False
def get_embedding(text: str) -> List[float]:
"""Get embedding vector from Ollama."""
try:
response = requests.post(
f"{OLLAMA_URL}/api/embeddings",
json={"model": EMBEDDING_MODEL, "prompt": text},
timeout=30
)
response.raise_for_status()
return response.json()["embedding"]
except Exception as e:
print(f"Error getting embedding: {e}", file=sys.stderr)
return None
def clean_content(text: str) -> str:
"""Clean content - remove metadata, markdown, keep only plain text."""
import re
# Remove metadata JSON blocks
text = re.sub(r'Conversation info \(untrusted metadata\):\s*```json\s*\{[\s\S]*?\}\s*```', '', text)
# Remove thinking tags
text = re.sub(r'\[thinking:[^\]]*\]', '', text)
# Remove timestamp lines
text = re.sub(r'\[\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2} [A-Z]{3}\]', '', text)
# Remove markdown tables
text = re.sub(r'\|[^\n]*\|', '', text) # Table rows
text = re.sub(r'\|[-:]+\|', '', text) # Table separators
# Remove markdown formatting
text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text) # Bold **text**
text = re.sub(r'\*([^*]+)\*', r'\1', text) # Italic *text*
text = re.sub(r'`([^`]+)`', r'\1', text) # Inline code `text`
text = re.sub(r'```[\s\S]*?```', '', text) # Code blocks
# Remove horizontal rules
text = re.sub(r'---+', '', text)
text = re.sub(r'\*\*\*+', '', text)
# Remove excess whitespace
text = re.sub(r'\n{3,}', '\n', text)
text = re.sub(r'[ \t]+', ' ', text) # Multiple spaces -> single
return text.strip()
def store_to_qdrant(turn: Dict[str, Any], dry_run: bool = False) -> bool:
"""Store a single turn to Qdrant with embedding."""
if dry_run:
print(f"[DRY RUN] Would store turn {turn['turn']} ({turn['role']}): {turn['content'][:60]}...")
return True
# Get embedding
vector = get_embedding(turn['content'])
if vector is None:
print(f"Failed to get embedding for turn {turn['turn']}", file=sys.stderr)
return False
# Prepare payload
payload = {
"user_id": turn.get('user_id', USER_ID),
"role": turn['role'],
"content": turn['content'],
"turn": turn['turn'],
"timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
"date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
"source": "realtime_watcher",
"curated": False
}
# Generate deterministic ID
turn_id = turn.get('turn', 0)
hash_bytes = hashlib.sha256(f"{USER_ID}:turn:{turn_id}:{datetime.now().strftime('%H%M%S')}".encode()).digest()[:8]
point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
# Store to Qdrant
try:
response = requests.put(
f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
json={
"points": [{
"id": abs(point_id),
"vector": vector,
"payload": payload
}]
},
timeout=30
)
response.raise_for_status()
return True
except Exception as e:
print(f"Error writing to Qdrant: {e}", file=sys.stderr)
return False
def get_current_session_file():
"""Find the most recently modified session JSONL file."""
if not SESSIONS_DIR.exists():
return None
files = list(SESSIONS_DIR.glob("*.jsonl"))
if not files:
return None
return max(files, key=lambda p: p.stat().st_mtime)
def parse_turn(line: str, session_name: str) -> Optional[Dict[str, Any]]:
"""Parse a single JSONL line into a turn dict."""
global turn_counter
try:
entry = json.loads(line.strip())
except json.JSONDecodeError:
return None
# OpenClaw format: {"type": "message", "message": {...}}
if entry.get('type') != 'message' or 'message' not in entry:
return None
msg = entry['message']
role = msg.get('role')
# Skip tool results, system, developer messages
if role in ('toolResult', 'system', 'developer'):
return None
if role not in ('user', 'assistant'):
return None
# Extract content
content = ""
if isinstance(msg.get('content'), list):
for item in msg['content']:
if isinstance(item, dict) and 'text' in item:
content += item['text']
elif isinstance(msg.get('content'), str):
content = msg['content']
if not content:
return None
# Clean content
content = clean_content(content)
if not content or len(content) < 5:
return None
turn_counter += 1
return {
'turn': turn_counter,
'role': role,
'content': content[:2000], # Limit size
'timestamp': entry.get('timestamp', datetime.now(timezone.utc).isoformat()),
'user_id': USER_ID
}
def process_new_lines(f, session_name: str, dry_run: bool = False):
"""Process any new lines added to the file."""
global last_position
f.seek(last_position)
for line in f:
line = line.strip()
if not line:
continue
turn = parse_turn(line, session_name)
if turn:
if store_to_qdrant(turn, dry_run):
print(f"✅ Turn {turn['turn']} ({turn['role']}) → Qdrant")
last_position = f.tell()
def watch_session(session_file: Path, dry_run: bool = False):
"""Watch a specific session file for new lines."""
global last_position, turn_counter
session_name = session_file.name.replace('.jsonl', '')
print(f"Watching session: {session_file.name}")
try:
with open(session_file, 'r') as f:
for line in f:
turn_counter += 1
last_position = session_file.stat().st_size
print(f"Session has {turn_counter} existing turns, starting from position {last_position}")
except Exception as e:
print(f"Warning: Could not read existing turns: {e}", file=sys.stderr)
last_position = 0
with open(session_file, 'r') as f:
while running:
if not session_file.exists():
print("Session file removed, looking for new session...")
return None
process_new_lines(f, session_name, dry_run)
time.sleep(0.1)
return session_file
def watch_loop(dry_run: bool = False):
"""Main watch loop - handles session rotation."""
global current_file, turn_counter
while running:
session_file = get_current_session_file()
if session_file is None:
print("No active session found, waiting...")
time.sleep(1)
continue
if current_file != session_file:
print(f"\nNew session detected: {session_file.name}")
current_file = session_file
turn_counter = 0
last_position = 0
result = watch_session(session_file, dry_run)
if result is None:
current_file = None
time.sleep(0.5)
def main():
global USER_ID
parser = argparse.ArgumentParser(
description="Real-time OpenClaw session watcher → Qdrant"
)
parser.add_argument("--daemon", "-d", action="store_true", help="Run as daemon")
parser.add_argument("--once", "-o", action="store_true", help="Process once then exit")
parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write to Qdrant")
parser.add_argument("--user-id", "-u", default=USER_ID, help=f"User ID (default: {USER_ID})")
args = parser.parse_args()
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
if args.user_id:
USER_ID = args.user_id
print(f"🔍 Real-time Qdrant Watcher")
print(f"📍 Qdrant: {QDRANT_URL}/{QDRANT_COLLECTION}")
print(f"🧠 Ollama: {OLLAMA_URL}/{EMBEDDING_MODEL}")
print(f"👤 User: {USER_ID}")
print(f"📝 Sessions: {SESSIONS_DIR}")
print()
if args.once:
print("Running once...")
session_file = get_current_session_file()
if session_file:
watch_session(session_file, args.dry_run)
else:
print("No session found")
else:
print("Running as daemon (Ctrl+C to stop)...")
watch_loop(args.dry_run)
if __name__ == "__main__":
main()