5 Commits

5a83ee0c6c (root, 2026-03-13 13:13:40 -05:00)
fix: add PYTHONUNBUFFERED=1 for real-time logging

- Python buffers stdout when running as a systemd service (no TTY)
- This prevented journalctl from showing real-time turn captures
- Added Environment="PYTHONUNBUFFERED=1" to disable buffering
- journalctl -f now shows "✅ Turn N (role) → Qdrant" in real time

Fixes an issue where the watcher captured turns but only logged them on restart.

f21c4db0f8 (root, 2026-03-10 12:14:17 -05:00)
feat: add v1.3 validation script

e2d485b4af (root, 2026-03-10 12:11:50 -05:00)
release v1.3: fix crash loop (FileNotFoundError), add content chunking

ERROR FIXED:
- FileNotFoundError on deleted session files caused 2,551 restarts/24h
- Embedding token overflow for long messages (exceeded 4K token limit)

ROOT CAUSES:
1. File opened before existence check - crash when OpenClaw deletes session files
2. No content truncation for embedding - long messages failed silently

FIXES:
1. Added pre-check for file existence before open()
2. Added try/except for FileNotFoundError
3. Added chunk_text() for long content (6,000 char chunks with 200 char overlap)
4. Each chunk stored separately with metadata (chunk_index, total_chunks)

IMPACT:
- Eliminates crash loop
- No memory loss for long messages
- Graceful recovery on session rotation

b71a16891d (root, 2026-03-10 12:09:42 -05:00)
fix: watcher crash on deleted session files, add chunking for long content

24023279ab (root, 2026-03-10 12:05:58 -05:00)
fix: update references to openclaw-true-recall-blocks, add upgrade docs, add backfill script
8 changed files with 732 additions and 50 deletions

CHANGELOG.md (new file, 127 lines)

@@ -0,0 +1,127 @@
# Changelog - openclaw-true-recall-base
All notable changes to this project will be documented in this file.
## [v1.4] - 2026-03-13
### Fixed
#### Real-Time Logging Not Visible in journalctl
**Error:** Watcher was capturing turns to Qdrant but `journalctl -u mem-qdrant-watcher -f` showed no output between restarts.
**Root Cause:**
- Python buffers stdout when not connected to a TTY (systemd service)
- `print()` statements in the watcher were buffered, not flushed
- Logs only appeared on service restart when buffer was flushed
**Impact:** Made it impossible to monitor capture status in real time and difficult to debug the watcher.
**Fix:**
- Added `Environment="PYTHONUNBUFFERED=1"` to systemd service file
- This disables Python's stdout buffering, forcing immediate flush
**Changed Files:**
- `watcher/mem-qdrant-watcher.service` - Added PYTHONUNBUFFERED environment variable
**Validation:**
```bash
journalctl -u mem-qdrant-watcher -f
# Now shows: ✅ Turn 170 (assistant) → Qdrant (in real-time)
```
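The buffering behavior is easy to reproduce outside systemd. A minimal sketch (not part of the repo) that times how long a child process's first `print()` takes to reach a pipe, with and without `PYTHONUNBUFFERED`:

```python
import os
import subprocess
import sys
import time

# Child writes one line, then sleeps: with block buffering the line
# only reaches the pipe when the child exits.
CHILD = "import time; print('tick'); time.sleep(2)"

def first_line_latency(extra_env):
    """Seconds from child start until its first stdout line arrives."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    env.update(extra_env)
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE, env=env)
    start = time.time()
    proc.stdout.readline()  # blocks until 'tick' is flushed into the pipe
    elapsed = time.time() - start
    proc.kill()
    proc.wait()
    return elapsed

# Pipe (like systemd/journald): stdout is block-buffered, ~2 s delay.
buffered = first_line_latency({})
# PYTHONUNBUFFERED=1 flushes each write immediately.
unbuffered = first_line_latency({"PYTHONUNBUFFERED": "1"})
print(f"buffered: {buffered:.2f}s, unbuffered: {unbuffered:.2f}s")
```

`python3 -u` or `print(..., flush=True)` would have the same effect; the env var was chosen because it needs no code change.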
---
## [v1.3] - 2026-03-10
### Fixed
#### Critical: Crash Loop on Deleted Session Files
**Error:** `FileNotFoundError: [Errno 2] No such file or directory: '/root/.openclaw/agents/main/sessions/daccff90-f889-44fa-ba8b-c8d7397e5241.jsonl'`
**Root Cause:**
- OpenClaw deletes session `.jsonl` files when `/new` or `/reset` is called
- The watcher opened the file before checking existence
- Between file detection and opening, the file was deleted
- This caused unhandled `FileNotFoundError` → crash → systemd restart
**Impact:** 2,551 restarts in 24 hours
**Original Code (v1.2):**
```python
# Track file handle for re-opening
f = open(session_file, 'r')  # CRASH HERE if file deleted
f.seek(last_position)
try:
    while running:
        if not session_file.exists():  # Check happens AFTER crash
            ...
```
**Fix (v1.3):**
```python
# Check file exists before opening (handles deleted sessions)
if not session_file.exists():
    print(f"Session file gone: {session_file.name}, looking for new session...", file=sys.stderr)
    return None
# Track file handle for re-opening
try:
    f = open(session_file, 'r')
    f.seek(last_position)
except FileNotFoundError:
    print(f"Session file removed during open: {session_file.name}", file=sys.stderr)
    return None
```
#### Embedding Token Overflow
**Error:** `Ollama API error 400: {"StatusCode":400,"Status":"400 Bad Request","error":"prompt too long; exceeded max context length by 4 tokens"}`
**Root Cause:**
- The embedding model `snowflake-arctic-embed2` has a 4,096 token limit (~16K chars)
- Long messages were sent to embedding without truncation
- The watcher's `get_embedding()` call passed full `turn['content']`
**Impact:** Failed embedding generation, memory loss for long messages
**Fix:**
- Added `chunk_text()` function to split long content into 6,000 char overlapping chunks
- Each chunk gets its own Qdrant point with `chunk_index` and `total_chunks` metadata
- Overlap (200 chars) ensures search continuity
- No data loss - all content stored
### Changed
- `store_to_qdrant()` now handles multiple chunks per turn
- Each chunk stored with metadata: `chunk_index`, `total_chunks`, `full_content_length`
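The overlap arithmetic behind those chunks can be sketched independently of the watcher. This is a hypothetical helper, not the repo's `chunk_text()` (which also snaps to sentence boundaries):

```python
def chunk_spans(n, max_chars=6000, overlap=200):
    """Return [start, end) index pairs covering n chars with shared overlap."""
    spans, start = [], 0
    while start < n:
        end = min(start + max_chars, n)
        spans.append((start, end))
        if end == n:
            break
        start = end - overlap  # next chunk re-reads the last `overlap` chars
    return spans

print(chunk_spans(13000))  # → [(0, 6000), (5800, 11800), (11600, 13000)]
```

Each consecutive pair shares exactly 200 characters, so a sentence cut at a chunk boundary still appears whole in one of the two chunks.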
---
## [v1.2] - 2026-02-26
### Fixed
- Session rotation bug - added inactivity detection (30s threshold)
- Improved file scoring to properly detect new sessions on `/new` or `/reset`
---
## [v1.1] - 2026-02-25
### Added
- 1-second mtime polling for session rotation
---
## [v1.0] - 2026-02-24
### Added
- Initial release
- Real-time monitoring of OpenClaw sessions
- Automatic embedding via local Ollama (snowflake-arctic-embed2)
- Storage to Qdrant `memories_tr` collection

README.md

@@ -61,7 +61,7 @@ true-recall-base (REQUIRED)
│ ├── Curator extracts gems → gems_tr
│ └── Plugin injects gems into prompts
-└──▶ true-recall-blocks (ADDON)
└──▶ openclaw-true-recall-blocks (ADDON)
├── Topic clustering → topic_blocks_tr
└── Contextual block retrieval
@@ -654,3 +654,108 @@ curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll" \
---
**Prerequisite for:** TrueRecall Gems, TrueRecall Blocks
---
## Upgrading from Older Versions
This section covers full upgrades from older TrueRecall Base installations to the current version.
### Version History
| Version | Key Changes |
|---------|-------------|
| **v1.0** | Initial release - basic watcher |
| **v1.1** | 1-second mtime polling for session rotation |
| **v1.2** | Inactivity detection (30s), improved session scoring, lock file validation, backfill script |
| **v1.3** | Crash-loop fix (FileNotFoundError on deleted sessions), chunking for long content |
| **v1.4** | Current version - real-time journalctl logging via PYTHONUNBUFFERED=1 |
### Upgrade Paths
#### From v1.0/v1.1/v1.2 → v1.4 (Current)
If you have an older installation, follow these steps:
```bash
# Step 1: Backup existing configuration
cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak.$(date +%Y%m%d)
cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/config.json /root/.openclaw/workspace/skills/qdrant-memory/scripts/config.json.bak.$(date +%Y%m%d)
```
```bash
# Step 2: Stop the watcher
pkill -f realtime_qdrant_watcher
# Verify stopped
ps aux | grep realtime_qdrant_watcher
```
```bash
# Step 3: Download latest files (choose one source)
# Option A: From GitLab (recommended)
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py https://gitlab.com/mdkrush/openclaw-true-recall-base/-/raw/master/watcher/realtime_qdrant_watcher.py
# Option B: From Gitea
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py http://10.0.0.61:3000/SpeedyFoxAi/openclaw-true-recall-base/raw/branch/master/watcher/realtime_qdrant_watcher.py
# Option C: From local clone (if you cloned the repo)
cp /path/to/openclaw-true-recall-base/watcher/realtime_qdrant_watcher.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
```
```bash
# Step 4: Start the watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
```
```bash
# Step 5: Verify installation
ps aux | grep realtime_qdrant_watcher
curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll" -H "Content-Type: application/json" -d '{"limit": 3}' | jq '.result.points[0].payload.timestamp'
```
### Upgrading with Git (If You Cloned the Repository)
```bash
# Navigate to your clone
cd /path/to/openclaw-true-recall-base
git pull origin master
# Stop current watcher
pkill -f realtime_qdrant_watcher
# Copy updated files to OpenClaw
cp watcher/realtime_qdrant_watcher.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
cp scripts/backfill_memory.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
# Restart the watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
# Verify
ps aux | grep realtime_qdrant_watcher
```
### Backfilling Historical Memories (Optional)
```bash
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/backfill_memory.py
```
### Verifying Your Upgrade
```bash
# 1. Check watcher is running
ps aux | grep realtime_qdrant_watcher
# 2. Verify source is "true-recall-base"
curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll" -H "Content-Type: application/json" -d '{"limit": 1}' | jq '.result.points[0].payload.source'
# 3. Check date coverage
curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll" -H "Content-Type: application/json" -d '{"limit": 10000}' | jq '[.result.points[].payload.date] | unique | sort'
```
Expected output:
- Source: `"true-recall-base"`
- Dates: Array from oldest to newest memory
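The date check can also be done in Python against a scroll response. The sample below is illustrative data shaped like the watcher's payload, not a live response:

```python
# Sample scroll response (illustrative; field names match the watcher's payload)
sample = {"result": {"points": [
    {"payload": {"date": "2026-03-10", "source": "true-recall-base"}},
    {"payload": {"date": "2026-02-26", "source": "true-recall-base"}},
    {"payload": {"date": "2026-03-10", "source": "true-recall-base"}},
]}}

def unique_dates(scroll):
    """Equivalent of: jq '[.result.points[].payload.date] | unique | sort'"""
    return sorted({p["payload"]["date"] for p in scroll["result"]["points"]})

print(unique_dates(sample))  # → ['2026-02-26', '2026-03-10']
```

Feeding it the JSON from the `curl` call above (via `json.load`) gives the same array the `jq` pipeline prints.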


@@ -96,3 +96,32 @@ echo " curl -s http://$QDRANT_IP/collections/memories_tr | jq '.result.points_c
echo ""
echo "View logs:"
echo " sudo journalctl -u mem-qdrant-watcher -f"
echo ""
echo "=========================================="
echo "UPGRADING FROM OLDER VERSION"
echo "=========================================="
echo ""
echo "If you already have TrueRecall Base installed:"
echo ""
echo "1. Stop the watcher:"
echo " pkill -f realtime_qdrant_watcher"
echo ""
echo "2. Backup current files:"
echo " cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \\"
echo " /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak"
echo ""
echo "3. Copy updated files:"
echo " cp watcher/realtime_qdrant_watcher.py \\"
echo " /root/.openclaw/workspace/skills/qdrant-memory/scripts/"
echo " cp scripts/backfill_memory.py \\"
echo " /root/.openclaw/workspace/skills/qdrant-memory/scripts/"
echo ""
echo "4. Restart watcher:"
echo " python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon"
echo ""
echo "5. Verify:"
echo " ps aux | grep realtime_qdrant_watcher"
echo ""
echo "For full upgrade instructions, see README.md"


@@ -0,0 +1,67 @@
#!/usr/bin/env python3
"""Backfill memory files to Qdrant memories_tr collection."""
import hashlib
import os

import requests

QDRANT_URL = "http://10.0.0.40:6333"
MEMORY_DIR = "/root/.openclaw/workspace/memory"

def get_memory_files():
    """Get all memory files sorted by date."""
    files = []
    for f in os.listdir(MEMORY_DIR):
        if f.startswith("2026-") and f.endswith(".md"):
            date = f.replace(".md", "")
            files.append((date, f))
    return sorted(files, key=lambda x: x[0])

def backfill_file(date, filename):
    """Backfill a single memory file to Qdrant."""
    filepath = os.path.join(MEMORY_DIR, filename)
    with open(filepath, 'r') as f:
        content = f.read()
    # Truncate if too long for payload
    payload = {
        "content": content[:50000],  # Limit size
        "date": date,
        "source": "memory_file",
        "curated": False,
        "role": "system",
        "user_id": "rob"
    }
    # Deterministic ID (built-in hash() is salted per interpreter run)
    point_id = int.from_bytes(
        hashlib.sha256(f"memory_{date}".encode()).digest()[:8], "big"
    ) % 10000000000
    # Upsert to Qdrant
    resp = requests.put(
        f"{QDRANT_URL}/collections/memories_tr/points",
        json={
            "points": [{
                "id": point_id,
                "payload": payload
            }],
            "ids": [point_id]
        }
    )
    return resp.status_code == 200

def main():
    files = get_memory_files()
    print(f"Found {len(files)} memory files to backfill")
    count = 0
    for date, filename in files:
        print(f"Backfilling {filename}...", end=" ")
        if backfill_file(date, filename):
            print("✅")
            count += 1
        else:
            print("❌")
    print(f"\nBackfilled {count}/{len(files)} files")

if __name__ == "__main__":
    main()
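An earlier draft of this script derived point IDs from the built-in `hash()`, which is salted per interpreter process (`PYTHONHASHSEED`), so re-running the backfill would create duplicate points instead of upserting. A small sketch (assumed key format) showing the difference:

```python
import hashlib
import os
import subprocess
import sys

def stable_id(key: str) -> int:
    """Deterministic ID from a string key; identical across runs and hosts."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big") % 10000000000

# Built-in hash() is salted per process, so the same key maps to a
# different value on each interpreter run (with hash randomization on):
env = {k: v for k, v in os.environ.items() if k != "PYTHONHASHSEED"}
snippet = "print(hash('memory_2026-03-10') % 10000000000)"
runs = {subprocess.check_output([sys.executable, "-c", snippet], env=env).strip()
        for _ in range(3)}
print(f"stable_id: {stable_id('memory_2026-03-10')}, hash() gave {len(runs)} distinct values")
```

With a stable ID, re-running the backfill upserts the same points rather than inserting new ones.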


@@ -19,7 +19,7 @@ true-recall-base (REQUIRED FOUNDATION)
│ ├── Curator extracts atomic gems
│ └── Plugin injects gems as context
-└──▶ true-recall-blocks (OPTIONAL ADDON)
└──▶ openclaw-true-recall-blocks (OPTIONAL ADDON)
├── Topic clustering
└── Block-based retrieval
```
@@ -82,4 +82,4 @@ curl -s http://10.0.0.40:6333/collections/memories_tr | jq '.result.points_count
---
-*Next: Install true-recall-gems OR true-recall-blocks (not both)*
*Next: Install true-recall-gems OR openclaw-true-recall-blocks (not both)*

validate_v1.3.sh (new executable file, 247 lines)

@@ -0,0 +1,247 @@
#!/bin/bash
# Validation Script for openclaw-true-recall-base v1.3
# Tests all fixes and changes from v1.2 → v1.3
set -e
echo "╔══════════════════════════════════════════════════════════════════════════╗"
echo "║ TrueRecall Base v1.3 Validation Script ║"
echo "╚══════════════════════════════════════════════════════════════════════════╝"
echo ""
PASS=0
FAIL=0
WARN=0
check_pass() { echo "✅ $1"; PASS=$((PASS+1)); }
check_fail() { echo "❌ $1"; FAIL=$((FAIL+1)); }
check_warn() { echo "⚠️ $1"; WARN=$((WARN+1)); }
# ════════════════════════════════════════════════════════════════════════════
# SECTION 1: File Structure
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "SECTION 1: File Structure"
echo "═════════════════════════════════════════════════════════════════════════"
PROJECT_DIR="$(dirname "$0")"
if [ -f "$PROJECT_DIR/CHANGELOG.md" ]; then
check_pass "CHANGELOG.md exists"
else
check_fail "CHANGELOG.md missing"
fi
if [ -f "$PROJECT_DIR/watcher/realtime_qdrant_watcher.py" ]; then
check_pass "realtime_qdrant_watcher.py exists"
else
check_fail "realtime_qdrant_watcher.py missing"
fi
# Check version in file
VERSION=$(grep -m1 "TrueRecall v" "$PROJECT_DIR/watcher/realtime_qdrant_watcher.py" | grep -oE "v[0-9]+\.[0-9]+" || true)
if [ "$VERSION" = "v1.3" ]; then
check_pass "Version is v1.3"
else
check_fail "Version mismatch: expected v1.3, got $VERSION"
fi
# ════════════════════════════════════════════════════════════════════════════
# SECTION 2: Code Changes (v1.3 Fixes)
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "SECTION 2: Code Changes (v1.3 Fixes)"
echo "═════════════════════════════════════════════════════════════════════════"
# Fix 1: FileNotFoundError check
if grep -q "if not session_file.exists():" "$PROJECT_DIR/watcher/realtime_qdrant_watcher.py"; then
check_pass "FileNotFoundError fix: Pre-check exists before open()"
else
check_fail "FileNotFoundError fix MISSING: No session_file.exists() check"
fi
if grep -q "except FileNotFoundError:" "$PROJECT_DIR/watcher/realtime_qdrant_watcher.py"; then
check_pass "FileNotFoundError fix: Exception handler present"
else
check_fail "FileNotFoundError fix MISSING: No FileNotFoundError exception handler"
fi
# Fix 2: Chunking for long content
if grep -q "def chunk_text" "$PROJECT_DIR/watcher/realtime_qdrant_watcher.py"; then
check_pass "Chunking fix: chunk_text() function defined"
else
check_fail "Chunking fix MISSING: No chunk_text() function"
fi
if grep -q "chunk_text_content" "$PROJECT_DIR/watcher/realtime_qdrant_watcher.py"; then
check_pass "Chunking fix: chunk_text_content used in store_to_qdrant()"
else
check_fail "Chunking fix MISSING: Not using chunked content"
fi
if grep -q "chunk_index" "$PROJECT_DIR/watcher/realtime_qdrant_watcher.py"; then
check_pass "Chunking fix: chunk_index metadata added"
else
check_fail "Chunking fix MISSING: No chunk_index metadata"
fi
# ════════════════════════════════════════════════════════════════════════════
# SECTION 3: Service Status
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "SECTION 3: Service Status"
echo "═════════════════════════════════════════════════════════════════════════"
if systemctl is-active --quiet mem-qdrant-watcher 2>/dev/null; then
check_pass "mem-qdrant-watcher service is running"
else
check_warn "mem-qdrant-watcher service not running (may be running in daemon mode)"
fi
# Check for running watcher process
if pgrep -f "realtime_qdrant_watcher" > /dev/null; then
check_pass "realtime_qdrant_watcher process is running"
else
check_fail "realtime_qdrant_watcher process NOT running"
fi
# ════════════════════════════════════════════════════════════════════════════
# SECTION 4: Connectivity
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "SECTION 4: Connectivity"
echo "═════════════════════════════════════════════════════════════════════════"
# Qdrant
QDRANT_URL="${QDRANT_URL:-http://10.0.0.40:6333}"
if curl -s -o /dev/null -w "%{http_code}" "$QDRANT_URL/collections/memories_tr" | grep -q "200"; then
check_pass "Qdrant memories_tr collection reachable"
else
check_fail "Qdrant memories_tr collection NOT reachable"
fi
# Ollama (local)
if curl -s -o /dev/null -w "%{http_code}" "http://localhost:11434/api/tags" | grep -q "200"; then
check_pass "Ollama (localhost) reachable"
else
check_fail "Ollama (localhost) NOT reachable"
fi
# Check embedding model
if curl -s "http://localhost:11434/api/tags" | grep -q "snowflake-arctic-embed2"; then
check_pass "Embedding model snowflake-arctic-embed2 available"
else
check_fail "Embedding model snowflake-arctic-embed2 NOT available"
fi
# ════════════════════════════════════════════════════════════════════════════
# SECTION 5: Crash Loop Test
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "SECTION 5: Crash Loop Test (Last 1 Hour)"
echo "═════════════════════════════════════════════════════════════════════════"
RESTARTS=$(journalctl -u mem-qdrant-watcher --since "1 hour ago" --no-pager 2>/dev/null | grep -c "Started mem-qdrant-watcher" || true)
if [ "$RESTARTS" -le 2 ]; then
check_pass "Restarts in last hour: $RESTARTS (expected ≤2)"
else
check_fail "Restarts in last hour: $RESTARTS (too many, expected ≤2)"
fi
# Check for FileNotFoundError in logs
ERRORS=$(journalctl -u mem-qdrant-watcher --since "1 hour ago" --no-pager 2>/dev/null | grep -c "FileNotFoundError" || true)
if [ "$ERRORS" -eq 0 ]; then
check_pass "No FileNotFoundError in last hour"
else
check_fail "FileNotFoundError found $ERRORS times in last hour"
fi
# ════════════════════════════════════════════════════════════════════════════
# SECTION 6: Chunking Test
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "SECTION 6: Chunking Test"
echo "═════════════════════════════════════════════════════════════════════════"
# Test chunking with Python
python3 -c "
import sys
sys.path.insert(0, '$PROJECT_DIR/watcher')
# Extract the chunk_text function source (re-attach the 'def chunk_text'
# header that str.split() strips off)
src = open('$PROJECT_DIR/watcher/realtime_qdrant_watcher.py').read()
exec('def chunk_text' + src.split('def chunk_text')[1].split('def store_to_qdrant')[0])
# Test with long content
test_content = 'A' * 10000
chunks = chunk_text(test_content, max_chars=6000, overlap=200)
if len(chunks) > 1:
    print(f'PASS: chunk_text splits 10000 chars into {len(chunks)} chunks')
    sys.exit(0)
else:
    print(f'FAIL: chunk_text returned {len(chunks)} chunks for 10000 chars')
    sys.exit(1)
" 2>/dev/null && check_pass "chunk_text() splits long content correctly" || check_fail "chunk_text() test failed"
# ════════════════════════════════════════════════════════════════════════════
# SECTION 7: Git Status
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "SECTION 7: Git Status"
echo "═════════════════════════════════════════════════════════════════════════"
cd "$PROJECT_DIR"
# Check for v1.3 tag
if git tag -l | grep -q "v1.3"; then
check_pass "Git tag v1.3 exists"
else
check_fail "Git tag v1.3 missing"
fi
# Check CHANGELOG.md committed
if git log --oneline -1 | grep -q "v1.3"; then
check_pass "v1.3 commit in git log"
else
check_fail "v1.3 commit not found in git log"
fi
# Check for uncommitted changes
UNCOMMITTED=$(git status --short 2>/dev/null | wc -l)
if [ "$UNCOMMITTED" -eq 0 ]; then
check_pass "No uncommitted changes"
else
check_warn "$UNCOMMITTED uncommitted files"
fi
# ════════════════════════════════════════════════════════════════════════════
# SUMMARY
# ════════════════════════════════════════════════════════════════════════════
echo ""
echo "═════════════════════════════════════════════════════════════════════════"
echo "VALIDATION SUMMARY"
echo "═════════════════════════════════════════════════════════════════════════"
echo ""
echo "✅ Passed: $PASS"
echo "❌ Failed: $FAIL"
echo "⚠️ Warnings: $WARN"
echo ""
if [ $FAIL -eq 0 ]; then
echo "╔══════════════════════════════════════════════════════════════════════════╗"
echo "║ ✅ ALL VALIDATIONS PASSED - v1.3 READY FOR PRODUCTION ║"
echo "╚══════════════════════════════════════════════════════════════════════════╝"
exit 0
else
echo "╔══════════════════════════════════════════════════════════════════════════╗"
echo "║ ❌ VALIDATION FAILED - $FAIL ISSUE(S) NEED ATTENTION ║"
echo "╚══════════════════════════════════════════════════════════════════════════╝"
exit 1
fi


@@ -11,6 +11,7 @@ Environment="QDRANT_COLLECTION=memories_tr"
Environment="OLLAMA_URL=http://localhost:11434"
Environment="EMBEDDING_MODEL=snowflake-arctic-embed2"
Environment="USER_ID=rob"
Environment="PYTHONUNBUFFERED=1"
ExecStart=/usr/bin/python3 /root/.openclaw/workspace/.local_projects/openclaw-true-recall-base/watcher/realtime_qdrant_watcher.py --daemon
Restart=always
RestartSec=5


@@ -1,11 +1,14 @@
#!/usr/bin/env python3
"""
-TrueRecall v1.2 - Real-time Qdrant Watcher
TrueRecall v1.3 - Real-time Qdrant Watcher
Monitors OpenClaw sessions and stores to memories_tr instantly.
This is the CAPTURE component. For curation and injection, install v2.
Changelog:
- v1.3: Fixed crash loop (2551 restarts/24h) from FileNotFoundError on deleted session files.
Added chunking for long content (6000 char chunks) to prevent embedding token overflow.
Improved error handling for session file lifecycle.
- v1.2: Fixed session rotation bug - added inactivity detection (30s threshold)
and improved file scoring to properly detect new sessions on /new or /reset
- v1.1: Added 1-second mtime polling for session rotation
@@ -27,7 +30,7 @@ from typing import Dict, Any, Optional, List
# Config
QDRANT_URL = os.getenv("QDRANT_URL", "http://10.0.0.40:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories_tr")
-OLLAMA_URL = os.getenv("OLLAMA_URL", "http://10.0.0.10:11434")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
USER_ID = os.getenv("USER_ID", "rob")
@@ -94,49 +97,124 @@ def clean_content(text: str) -> str:
return text.strip()
def chunk_text(text: str, max_chars: int = 6000, overlap: int = 200) -> list:
    """Split text into overlapping chunks for embedding.

    Args:
        text: Text to chunk
        max_chars: Max chars per chunk (6000 = safe for 4K token limit)
        overlap: Chars to overlap between chunks

    Returns:
        List of chunk dicts with 'text' and 'chunk_index'
    """
    if len(text) <= max_chars:
        return [{'text': text, 'chunk_index': 0, 'total_chunks': 1}]
    chunks = []
    start = 0
    chunk_num = 0
    while start < len(text):
        end = start + max_chars
        # Try to break at sentence boundary
        if end < len(text):
            # Look for paragraph break first
            para_break = text.rfind('\n\n', start, end)
            if para_break > start + 500:
                end = para_break
            else:
                # Look for sentence break
                for delim in ['. ', '? ', '! ', '\n']:
                    sent_break = text.rfind(delim, start, end)
                    if sent_break > start + 500:
                        end = sent_break + 1
                        break
        chunk = text[start:end].strip()  # local name avoids shadowing chunk_text
        if len(chunk) > 100:  # Skip tiny chunks
            chunks.append(chunk)
            chunk_num += 1
        start = end - overlap if end < len(text) else len(text)
    # Add metadata to each chunk
    total = len(chunks)
    return [{'text': c, 'chunk_index': i, 'total_chunks': total} for i, c in enumerate(chunks)]
def store_to_qdrant(turn: Dict[str, Any], dry_run: bool = False) -> bool:
    """Store a conversation turn to Qdrant, chunking if needed.

    For long content, splits into multiple chunks (no data loss).
    Each chunk gets its own point with chunk_index metadata.
    """
    if dry_run:
        print(f"[DRY RUN] Would store turn {turn['turn']} ({turn['role']}): {turn['content'][:60]}...")
        return True
-   vector = get_embedding(turn['content'])
-   if vector is None:
-       print(f"Failed to get embedding for turn {turn['turn']}", file=sys.stderr)
-       return False
    content = turn['content']
    chunks = chunk_text(content)
-   payload = {
-       "user_id": turn.get('user_id', USER_ID),
-       "role": turn['role'],
-       "content": turn['content'],
-       "turn": turn['turn'],
-       "timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
-       "date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
-       "source": "openclaw-true-recall-base",
-       "curated": False
-   }
    if len(chunks) > 1:
        print(f" 📦 Chunking turn {turn['turn']}: {len(content)} chars → {len(chunks)} chunks", file=sys.stderr)
    # Generate deterministic ID
    turn_id = turn.get('turn', 0)
-   hash_bytes = hashlib.sha256(f"{USER_ID}:turn:{turn_id}:{datetime.now().strftime('%H%M%S')}".encode()).digest()[:8]
-   point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
    base_time = datetime.now().strftime('%H%M%S')
    all_success = True
-   try:
-       response = requests.put(
-           f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
-           json={
-               "points": [{
-                   "id": abs(point_id),
-                   "vector": vector,
-                   "payload": payload
-               }]
-           },
-           timeout=30
-       )
-       response.raise_for_status()
-       return True
-   except Exception as e:
-       print(f"Error writing to Qdrant: {e}", file=sys.stderr)
-       return False
    for chunk_info in chunks:
        chunk_text_content = chunk_info['text']
        chunk_index = chunk_info['chunk_index']
        total_chunks = chunk_info['total_chunks']
        # Get embedding for this chunk
        vector = get_embedding(chunk_text_content)
        if vector is None:
            print(f"Failed to get embedding for turn {turn['turn']} chunk {chunk_index}", file=sys.stderr)
            all_success = False
            continue
        # Payload includes full content reference, chunk metadata
        payload = {
            "user_id": turn.get('user_id', USER_ID),
            "role": turn['role'],
            "content": chunk_text_content,  # Store chunk content (searchable)
            "full_content_length": len(content),  # Original length
            "turn": turn['turn'],
            "timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
            "date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
            "source": "true-recall-base",
            "curated": False,
            "chunk_index": chunk_index,
            "total_chunks": total_chunks
        }
        # Generate unique ID for each chunk
        hash_bytes = hashlib.sha256(
            f"{USER_ID}:turn:{turn_id}:chunk{chunk_index}:{base_time}".encode()
        ).digest()[:8]
        point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
        try:
            response = requests.put(
                f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
                json={
                    "points": [{
                        "id": abs(point_id),
                        "vector": vector,
                        "payload": payload
                    }]
                },
                timeout=30
            )
            response.raise_for_status()
        except Exception as e:
            print(f"Error writing chunk {chunk_index} to Qdrant: {e}", file=sys.stderr)
            all_success = False
    return all_success
def is_lock_valid(lock_path: Path, max_age_seconds: int = 1800) -> bool:
@@ -332,10 +410,24 @@ def watch_session(session_file: Path, dry_run: bool = False):
    INACTIVITY_THRESHOLD = 30  # seconds - if no data for 30s, check for new session
-   with open(session_file, 'r') as f:
    # Check file exists before opening (handles deleted sessions)
    if not session_file.exists():
        print(f"Session file gone: {session_file.name}, looking for new session...", file=sys.stderr)
        return None
    # Track file handle for re-opening
    try:
        f = open(session_file, 'r')
        f.seek(last_position)
    except FileNotFoundError:
        print(f"Session file removed during open: {session_file.name}", file=sys.stderr)
        return None
    try:
        while running:
            if not session_file.exists():
                print("Session file removed, looking for new session...")
                f.close()
                return None
            current_time = time.time()
@@ -346,6 +438,7 @@
            newest_session = get_current_session_file()
            if newest_session and newest_session != session_file:
                print(f"Newer session detected: {newest_session.name}")
                f.close()
                return newest_session
            # Check if current file is stale (no new data for threshold)
@@ -357,6 +450,7 @@
                newest_session = get_current_session_file()
                if newest_session and newest_session != session_file:
                    print(f"Current session inactive, switching to: {newest_session.name}")
                    f.close()
                    return newest_session
            else:
                # File grew, update tracking
@@ -365,19 +459,31 @@
                except Exception:
                    pass
-           # Process new lines and update activity tracking
-           old_position = last_position
-           process_new_lines(f, session_name, dry_run)
-           # If we processed new data, update activity timestamp
-           if last_position > old_position:
-               last_data_time = current_time
-               try:
-                   last_file_size = session_file.stat().st_size
-               except Exception:
-                   pass
            # Check if file has grown since last read
            try:
                current_size = session_file.stat().st_size
            except Exception:
                current_size = 0
            # Only process if file has grown
            if current_size > last_position:
                old_position = last_position
                process_new_lines(f, session_name, dry_run)
                # If we processed new data, update activity timestamp
                if last_position > old_position:
                    last_data_time = current_time
                    last_file_size = current_size
            else:
                # Re-open file handle to detect new writes
                f.close()
                time.sleep(0.05)  # Brief pause before re-opening
                f = open(session_file, 'r')
                f.seek(last_position)
            time.sleep(0.1)
    finally:
        f.close()
    return session_file
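The pre-check plus try/except pattern at the heart of the v1.3 fix reduces to a small, testable core. A simplified sketch with assumed names, omitting the watcher's rotation and inactivity logic:

```python
import os
import tempfile

def read_new_lines(path, last_position):
    """Tail new lines from `path` starting at `last_position`.

    Survives the file being deleted between polls (the v1.3 crash case):
    existence is pre-checked, and FileNotFoundError is still caught in
    case the file vanishes between the check and the open().
    """
    if not os.path.exists(path):          # pre-check: file may be deleted
        return [], last_position
    try:
        with open(path, "r") as f:
            f.seek(last_position)
            return f.readlines(), f.tell()
    except FileNotFoundError:             # deleted between check and open
        return [], last_position

path = tempfile.mktemp()
with open(path, "w") as f:
    f.write("turn 1\n")
lines, pos = read_new_lines(path, 0)      # first poll reads turn 1
with open(path, "a") as f:
    f.write("turn 2\n")
more, pos = read_new_lines(path, pos)     # second poll reads only turn 2
os.remove(path)
gone, pos = read_new_lines(path, pos)     # deletion yields [] instead of a crash
print(lines, more, gone)
```

The offset (`pos`) is what lets the caller resume without re-processing earlier turns, mirroring `last_position` in the watcher.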