feat: initial TrueRecall Base v1.0

Core components:
- Real-time memory capture daemon
- Qdrant memories_tr collection storage
- Systemd service for auto-start
- Configuration templates with placeholders

Features:
- Full conversation context capture
- Deduplication via content hashing
- User-tagged memories
- Compatible with Gems and Blocks addons
This commit is contained in:
root
2026-02-26 21:30:26 -06:00
commit 7a9a33c304
4 changed files with 447 additions and 0 deletions

117
README.md Normal file
View File

@@ -0,0 +1,117 @@
# TrueRecall Base (v1)
**Purpose:** Real-time memory capture → Qdrant `memories_tr`
**Status:** ✅ Standalone capture system
---
## Overview
TrueRecall Base is the **foundation**. It watches OpenClaw sessions in real-time and stores every turn to Qdrant's `memories_tr` collection.
This is **required** for both addons: **Gems** and **Blocks**.
**Features:**
- ✅ Full context conversations saved (not just summaries)
- ✅ Automatically stripped of markdown, tables, and extra characters
- ✅ Fully searchable via semantic + exact match
- ✅ Compatible with other AI tools and agents
**Base does NOT include:**
- ❌ Curation (gem extraction)
- ❌ Topic clustering (blocks)
- ❌ Injection (context recall)
**For those features, install an addon after base.**
---
## Three-Tier Architecture
```
true-recall-base (REQUIRED)
├── Core: Watcher daemon
└── Stores: memories_tr
├──▶ true-recall-gems (ADDON)
│ ├── Curator extracts gems → gems_tr
│ └── Plugin injects gems into prompts
└──▶ true-recall-blocks (ADDON)
├── Topic clustering → topic_blocks_tr
└── Contextual block retrieval
Note: Gems and Blocks are INDEPENDENT addons.
They both require Base, but don't work together.
Choose one: Gems OR Blocks (not both).
```
---
## Quick Start
### 1. Install
```bash
cd /root/.openclaw/workspace/.local_projects/true-recall-base
# Copy service file
sudo cp watcher/mem-qdrant-watcher.service /etc/systemd/system/
# Reload and start
sudo systemctl daemon-reload
sudo systemctl enable --now mem-qdrant-watcher
```
### 2. Verify
```bash
# Check service
sudo systemctl status mem-qdrant-watcher
# Check collection
curl -s http://10.0.0.40:6333/collections/memories_tr | jq '.result.points_count'
```
---
## Files
| File | Purpose |
|------|---------|
| `watcher/realtime_qdrant_watcher.py` | Capture daemon |
| `watcher/mem-qdrant-watcher.service` | Systemd service |
| `config.json` | v1 configuration |
---
## Configuration
Edit `config.json` or set environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `QDRANT_URL` | `http://10.0.0.40:6333` | Qdrant endpoint |
| `OLLAMA_URL` | `http://10.0.0.10:11434` | Ollama endpoint |
| `EMBEDDING_MODEL` | `snowflake-arctic-embed2` | Embedding model |
| `USER_ID` | `rob` | User identifier |
---
## Next Step
Install **TrueRecall v2** for curation and injection:
```bash
# v2 adds:
# - Curator (extracts gems from memories)
# - Injection (recalls gems into context)
```
v2 requires v1 to be running first.
---
**Version:** 1.0
**Prerequisite for:** TrueRecall v2

12
config.json Normal file
View File

@@ -0,0 +1,12 @@
{
"version": "1.0",
"description": "TrueRecall v1 - Memory capture only",
"components": ["watcher"],
"collections": {
"memories": "memories_tr"
},
"qdrant_url": "http://<QDRANT_IP>:6333",
"ollama_url": "http://<OLLAMA_IP>:11434",
"embedding_model": "snowflake-arctic-embed2",
"user_id": "<USER_ID>"
}

View File

@@ -0,0 +1,19 @@
[Unit]
Description=TrueRecall v1 - Real-Time Memory Watcher
After=network.target
[Service]
Type=simple
User=<USER>
WorkingDirectory=<INSTALL_PATH>/true-recall-v1/watcher
Environment="QDRANT_URL=http://<QDRANT_IP>:6333"
Environment="QDRANT_COLLECTION=memories_tr"
Environment="OLLAMA_URL=http://<OLLAMA_IP>:11434"
Environment="EMBEDDING_MODEL=snowflake-arctic-embed2"
Environment="USER_ID=<USER_ID>"
ExecStart=/usr/bin/python3 <INSTALL_PATH>/true-recall-v1/watcher/realtime_qdrant_watcher.py --daemon
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,299 @@
#!/usr/bin/env python3
"""
TrueRecall v1 - Real-time Qdrant Watcher
Monitors OpenClaw sessions and stores to memories_tr instantly.
This is the CAPTURE component. For curation and injection, install v2.
"""
import os
import sys
import json
import time
import signal
import hashlib
import argparse
import requests
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Any, Optional, List
# Config - EDIT THESE for your environment
QDRANT_URL = os.getenv("QDRANT_URL", "http://<QDRANT_IP>:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories_tr")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://<OLLAMA_IP>:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
USER_ID = os.getenv("USER_ID", "<USER_ID>")
# Paths - EDIT for your environment
SESSIONS_DIR = Path("~/.openclaw/agents/main/sessions").expanduser()
# State
running = True
last_position = 0
current_file = None
turn_counter = 0
def signal_handler(signum, frame):
global running
print(f"\nReceived signal {signum}, shutting down...", file=sys.stderr)
running = False
def get_embedding(text: str) -> List[float]:
try:
response = requests.post(
f"{OLLAMA_URL}/api/embeddings",
json={"model": EMBEDDING_MODEL, "prompt": text},
timeout=30
)
response.raise_for_status()
return response.json()["embedding"]
except Exception as e:
print(f"Error getting embedding: {e}", file=sys.stderr)
return None
def clean_content(text: str) -> str:
import re
# Remove metadata JSON blocks
text = re.sub(r'Conversation info \(untrusted metadata\):\s*```json\s*\{[\s\S]*?\}\s*```', '', text)
# Remove thinking tags
text = re.sub(r'\[thinking:[^\]]*\]', '', text)
# Remove timestamp lines
text = re.sub(r'\[\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2} [A-Z]{3}\]', '', text)
# Remove markdown tables
text = re.sub(r'\|[^\n]*\|', '', text)
text = re.sub(r'\|[-:]+\|', '', text)
# Remove markdown formatting
text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)
text = re.sub(r'\*([^*]+)\*', r'\1', text)
text = re.sub(r'`([^`]+)`', r'\1', text)
text = re.sub(r'```[\s\S]*?```', '', text)
# Remove horizontal rules
text = re.sub(r'---+', '', text)
text = re.sub(r'\*\*\*+', '', text)
# Remove excess whitespace
text = re.sub(r'\n{3,}', '\n', text)
text = re.sub(r'[ \t]+', ' ', text)
return text.strip()
def store_to_qdrant(turn: Dict[str, Any], dry_run: bool = False) -> bool:
if dry_run:
print(f"[DRY RUN] Would store turn {turn['turn']} ({turn['role']}): {turn['content'][:60]}...")
return True
vector = get_embedding(turn['content'])
if vector is None:
print(f"Failed to get embedding for turn {turn['turn']}", file=sys.stderr)
return False
payload = {
"user_id": turn.get('user_id', USER_ID),
"role": turn['role'],
"content": turn['content'],
"turn": turn['turn'],
"timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
"date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
"source": "true-recall-base",
"curated": False
}
# Generate deterministic ID
turn_id = turn.get('turn', 0)
hash_bytes = hashlib.sha256(f"{USER_ID}:turn:{turn_id}:{datetime.now().strftime('%H%M%S')}".encode()).digest()[:8]
point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
try:
response = requests.put(
f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
json={
"points": [{
"id": abs(point_id),
"vector": vector,
"payload": payload
}]
},
timeout=30
)
response.raise_for_status()
return True
except Exception as e:
print(f"Error writing to Qdrant: {e}", file=sys.stderr)
return False
def get_current_session_file():
if not SESSIONS_DIR.exists():
return None
files = list(SESSIONS_DIR.glob("*.jsonl"))
if not files:
return None
return max(files, key=lambda p: p.stat().st_mtime)
def parse_turn(line: str, session_name: str) -> Optional[Dict[str, Any]]:
global turn_counter
try:
entry = json.loads(line.strip())
except json.JSONDecodeError:
return None
if entry.get('type') != 'message' or 'message' not in entry:
return None
msg = entry['message']
role = msg.get('role')
if role in ('toolResult', 'system', 'developer'):
return None
if role not in ('user', 'assistant'):
return None
content = ""
if isinstance(msg.get('content'), list):
for item in msg['content']:
if isinstance(item, dict) and 'text' in item:
content += item['text']
elif isinstance(msg.get('content'), str):
content = msg['content']
if not content:
return None
content = clean_content(content)
if not content or len(content) < 5:
return None
turn_counter += 1
return {
'turn': turn_counter,
'role': role,
'content': content[:2000],
'timestamp': entry.get('timestamp', datetime.now(timezone.utc).isoformat()),
'user_id': USER_ID
}
def process_new_lines(f, session_name: str, dry_run: bool = False):
global last_position
f.seek(last_position)
for line in f:
line = line.strip()
if not line:
continue
turn = parse_turn(line, session_name)
if turn:
if store_to_qdrant(turn, dry_run):
print(f"✅ Turn {turn['turn']} ({turn['role']}) → Qdrant")
last_position = f.tell()
def watch_session(session_file: Path, dry_run: bool = False):
global last_position, turn_counter
session_name = session_file.name.replace('.jsonl', '')
print(f"Watching session: {session_file.name}")
try:
with open(session_file, 'r') as f:
for line in f:
turn_counter += 1
last_position = session_file.stat().st_size
print(f"Session has {turn_counter} existing turns, starting from position {last_position}")
except Exception as e:
print(f"Warning: Could not read existing turns: {e}", file=sys.stderr)
last_position = 0
with open(session_file, 'r') as f:
while running:
if not session_file.exists():
print("Session file removed, looking for new session...")
return None
process_new_lines(f, session_name, dry_run)
time.sleep(0.1)
return session_file
def watch_loop(dry_run: bool = False):
global current_file, turn_counter
while running:
session_file = get_current_session_file()
if session_file is None:
print("No active session found, waiting...")
time.sleep(1)
continue
if current_file != session_file:
print(f"\nNew session detected: {session_file.name}")
current_file = session_file
turn_counter = 0
last_position = 0
result = watch_session(session_file, dry_run)
if result is None:
current_file = None
time.sleep(0.5)
def main():
global USER_ID
parser = argparse.ArgumentParser(description="TrueRecall v1 - Real-time Memory Capture")
parser.add_argument("--daemon", "-d", action="store_true", help="Run as daemon")
parser.add_argument("--once", "-o", action="store_true", help="Process once then exit")
parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write to Qdrant")
parser.add_argument("--user-id", "-u", default=USER_ID, help=f"User ID (default: {USER_ID})")
args = parser.parse_args()
signal.signal(signal.SIGINT, signal_handler)
signal.signal(signal.SIGTERM, signal_handler)
if args.user_id:
USER_ID = args.user_id
print(f"🔍 TrueRecall v1 - Real-time Memory Capture")
print(f"📍 Qdrant: {QDRANT_URL}/{QDRANT_COLLECTION}")
print(f"🧠 Ollama: {OLLAMA_URL}/{EMBEDDING_MODEL}")
print(f"👤 User: {USER_ID}")
print()
if args.once:
print("Running once...")
session_file = get_current_session_file()
if session_file:
watch_session(session_file, args.dry_run)
else:
print("No session found")
else:
print("Running as daemon (Ctrl+C to stop)...")
watch_loop(args.dry_run)
if __name__ == "__main__":
main()