release v1.3: fix crash loop (FileNotFoundError), add content chunking

ERROR FIXED: - FileNotFoundError on deleted session files caused 2,551 restarts/24h - Embedding token overflow for long messages (exceeded 4K token limit) ROOT CAUSES: 1. File opened before existence check - crash when OpenClaw deletes session files 2. No content truncation for embedding - long messages failed silently FIXES: 1. Added pre-check for file existence before open() 2. Added try/catch for FileNotFoundError 3. Added chunk_text() for long content (6000 char chunks with 200 char overlap) 4. Each chunk stored separately with metadata (chunk_index, total_chunks) IMPACT: - Eliminates crash loop - No memory loss for long messages - Graceful recovery on session rotation
fix: watcher crash on deleted session files, add chunking for long content
2026-03-10 12:11:50 -05:00 · 2026-03-10 12:09:42 -05:00 · 2026-03-10 12:05:58 -05:00
6 changed files with 454 additions and 50 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,97 @@
 # Changelog - openclaw-true-recall-base
 All notable changes to this project will be documented in this file.
 ## [v1.3] - 2026-03-10
 ### Fixed
 #### Critical: Crash Loop on Deleted Session Files
 **Error:** `FileNotFoundError: [Errno 2] No such file or directory: '/root/.openclaw/agents/main/sessions/daccff90-f889-44fa-ba8b-c8d7397e5241.jsonl'`
 **Root Cause:**
 - OpenClaw deletes session `.jsonl` files when `/new` or `/reset` is called
 - The watcher opened the file before checking existence
 - Between file detection and opening, the file was deleted
 - This caused unhandled `FileNotFoundError` → crash → systemd restart
 **Impact:** 2,551 restarts in 24 hours
 **Original Code (v1.2):**
 ```python
 # Track file handle for re-opening
 f = open(session_file, 'r')  # CRASH HERE if file deleted
 f.seek(last_position)
 try:
    while running:
        if not session_file.exists():  # Check happens AFTER crash
            ...
 ```
 **Fix (v1.3):**
 ```python
 # Check file exists before opening (handles deleted sessions)
 if not session_file.exists():
    print(f"Session file gone: {session_file.name}, looking for new session...", file=sys.stderr)
    return None
 # Track file handle for re-opening
 try:
    f = open(session_file, 'r')
    f.seek(last_position)
 except FileNotFoundError:
    print(f"Session file removed during open: {session_file.name}", file=sys.stderr)
    return None
 ```
 #### Embedding Token Overflow
 **Error:** `Ollama API error 400: {"StatusCode":400,"Status":"400 Bad Request","error":"prompt too long; exceeded max context length by 4 tokens"}`
 **Root Cause:**
 - The embedding model `snowflake-arctic-embed2` has a 4,096 token limit (~16K chars)
 - Long messages were sent to embedding without truncation
 - The watcher's `get_embedding()` call passed full `turn['content']`
 **Impact:** Failed embedding generation, memory loss for long messages
 **Fix:**
 - Added `chunk_text()` function to split long content into 6,000 char overlapping chunks
 - Each chunk gets its own Qdrant point with `chunk_index` and `total_chunks` metadata
 - Overlap (200 chars) ensures search continuity
 - No data loss - all content stored
 ### Changed
 - `store_to_qdrant()` now handles multiple chunks per turn
 - Each chunk stored with metadata: `chunk_index`, `total_chunks`, `full_content_length`
 ---
 ## [v1.2] - 2026-02-26
 ### Fixed
 - Session rotation bug - added inactivity detection (30s threshold)
 - Improved file scoring to properly detect new sessions on `/new` or `/reset`
 ---
 ## [v1.1] - 2026-02-25
 ### Added
 - 1-second mtime polling for session rotation
 ---
 ## [v1.0] - 2026-02-24
 ### Added
 - Initial release
 - Real-time monitoring of OpenClaw sessions
 - Automatic embedding via local Ollama (snowflake-arctic-embed2)
 - Storage to Qdrant `memories_tr` collection
--- a/README.md
+++ b/README.md
@@ -61,7 +61,7 @@ true-recall-base (REQUIRED)
    │   ├── Curator extracts gems → gems_tr
    │   └── Plugin injects gems into prompts
    │
-    └──▶ true-recall-blocks (ADDON)
+    └──▶ openclaw-true-recall-blocks (ADDON)
        ├── Topic clustering → topic_blocks_tr
        └── Contextual block retrieval
@@ -654,3 +654,108 @@ curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll" \
 ---
 **Prerequisite for:** TrueRecall Gems, TrueRecall Blocks
 ---
 ## Upgrading from Older Versions
 This section covers full upgrades from older TrueRecall Base installations to the current version.
 ### Version History
 | Version | Key Changes |
 |---------|-------------|
 | **v1.0** | Initial release - basic watcher |
 | **v1.1** | Session detection improvements |
 | **v1.2** | Priority-based session detection, lock file validation, backfill script |
 | **v1.3** | Offset persistence (resumes from last position), fixes duplicate processing |
 | **v1.4** | Current version - Memory backfill fix (Qdrant ids field), improved error handling |
 ### Upgrade Paths
 #### From v1.0/v1.1/v1.2 → v1.4 (Current)
 If you have an older installation, follow these steps:
 ```bash
 # Step 1: Backup existing configuration
 cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py    /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak.$(date +%Y%m%d)
 cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/config.json    /root/.openclaw/workspace/skills/qdrant-memory/scripts/config.json.bak.$(date +%Y%m%d)
 ```
 ```bash
 # Step 2: Stop the watcher
 pkill -f realtime_qdrant_watcher
 # Verify stopped
 ps aux | grep realtime_qdrant_watcher
 ```
 ```bash
 # Step 3: Download latest files (choose one source)
 # Option A: From GitLab (recommended)
 curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py   https://gitlab.com/mdkrush/openclaw-true-recall-base/-/raw/master/watcher/realtime_qdrant_watcher.py
 # Option B: From Gitea
 curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py   http://10.0.0.61:3000/SpeedyFoxAi/openclaw-true-recall-base/raw/branch/master/watcher/realtime_qdrant_watcher.py
 # Option C: From local clone (if you cloned the repo)
 cp /path/to/openclaw-true-recall-base/watcher/realtime_qdrant_watcher.py    /root/.openclaw/workspace/skills/qdrant-memory/scripts/
 ```
 ```bash
 # Step 4: Start the watcher
 python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
 ```
 ```bash
 # Step 5: Verify installation
 ps aux | grep realtime_qdrant_watcher
 curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll"   -H "Content-Type: application/json" -d '{"limit": 3}' | jq '.result.points[0].payload.timestamp'
 ```
 ### Upgrading with Git (If You Cloned the Repository)
 ```bash
 # Navigate to your clone
 cd /path/to/openclaw-true-recall-base
 git pull origin master
 # Stop current watcher
 pkill -f realtime_qdrant_watcher
 # Copy updated files to OpenClaw
 cp watcher/realtime_qdrant_watcher.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
 cp scripts/backfill_memory.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
 # Restart the watcher
 python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
 # Verify
 ps aux | grep realtime_qdrant_watcher
 ```
 ### Backfilling Historical Memories (Optional)
 ```bash
 python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/backfill_memory.py
 ```
 ### Verifying Your Upgrade
 ```bash
 # 1. Check watcher is running
 ps aux | grep realtime_qdrant_watcher
 # 2. Verify source is "true-recall-base"
 curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll"   -H "Content-Type: application/json" -d '{"limit": 1}' | jq '.result.points[0].payload.source'
 # 3. Check date coverage
 curl -s "http://10.0.0.40:6333/collections/memories_tr/points/scroll"   -H "Content-Type: application/json" -d '{"limit": 10000}' | jq '[.result.points[].payload.date] | unique | sort'
 ```
 Expected output:
 - Source: `"true-recall-base"`
 - Dates: Array from oldest to newest memory
--- a/install.sh
+++ b/install.sh
@@ -96,3 +96,32 @@ echo "  curl -s http://$QDRANT_IP/collections/memories_tr | jq '.result.points_c
 echo ""
 echo "View logs:"
 echo "  sudo journalctl -u mem-qdrant-watcher -f"
 echo ""
 echo "=========================================="
 echo "UPGRADING FROM OLDER VERSION"
 echo "=========================================="
 echo ""
 echo "If you already have TrueRecall Base installed:"
 echo ""
 echo "1. Stop the watcher:"
 echo "   pkill -f realtime_qdrant_watcher"
 echo ""
 echo "2. Backup current files:"
 echo "   cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \"
 echo "      /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak"
 echo ""
 echo "3. Copy updated files:"
 echo "   cp watcher/realtime_qdrant_watcher.py \"
 echo "      /root/.openclaw/workspace/skills/qdrant-memory/scripts/"
 echo "   cp scripts/backfill_memory.py \"
 echo "      /root/.openclaw/workspace/skills/qdrant-memory/scripts/"
 echo ""
 echo "4. Restart watcher:"
 echo "   python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon"
 echo ""
 echo "5. Verify:"
 echo "   ps aux | grep realtime_qdrant_watcher"
 echo ""
 echo "For full upgrade instructions, see README.md"
--- a/scripts/backfill_memory.py
+++ b/scripts/backfill_memory.py
@@ -0,0 +1,67 @@
 #!/usr/bin/env python3
 """Backfill memory files to Qdrant memories_tr collection."""
 import os
 import json
 from datetime import datetime
 QDRANT_URL = "http://10.0.0.40:6333"
 MEMORY_DIR = "/root/.openclaw/workspace/memory"
 def get_memory_files():
    """Get all memory files sorted by date."""
    files = []
    for f in os.listdir(MEMORY_DIR):
        if f.startswith("2026-") and f.endswith(".md"):
            date = f.replace(".md", "")
            files.append((date, f))
    return sorted(files, key=lambda x: x[0])
 def backfill_file(date, filename):
    """Backfill a single memory file to Qdrant."""
    filepath = os.path.join(MEMORY_DIR, filename)
    with open(filepath, 'r') as f:
        content = f.read()
    # Truncate if too long for payload
    payload = {
        "content": content[:50000],  # Limit size
        "date": date,
        "source": "memory_file",
        "curated": False,
        "role": "system",
        "user_id": "rob"
    }
    # Add to Qdrant
    import requests
    point_id = hash(f"memory_{date}") % 10000000000
    resp = requests.post(
        f"{QDRANT_URL}/collections/memories_tr/points",
        json={
            "points": [{
                "id": point_id,
                "payload": payload
            }],
            "ids": [point_id]
        }
    )
    return resp.status_code == 200
 def main():
    files = get_memory_files()
    print(f"Found {len(files)} memory files to backfill")
    count = 0
    for date, filename in files:
        print(f"Backfilling {filename}...", end=" ")
        if backfill_file(date, filename):
            print("✓")
            count += 1
        else:
            print("✗")
    print(f"\nBackfilled {count}/{len(files)} files")
 if __name__ == "__main__":
    main()
--- a/session.md
+++ b/session.md
@@ -19,7 +19,7 @@ true-recall-base (REQUIRED FOUNDATION)
    │   ├── Curator extracts atomic gems
    │   └── Plugin injects gems as context
    │
-    └──▶ true-recall-blocks (OPTIONAL ADDON)
+    └──▶ openclaw-true-recall-blocks (OPTIONAL ADDON)
        ├── Topic clustering
        └── Block-based retrieval
 ```
@@ -82,4 +82,4 @@ curl -s http://10.0.0.40:6333/collections/memories_tr | jq '.result.points_count
 ---
-*Next: Install true-recall-gems OR true-recall-blocks (not both)*
+*Next: Install true-recall-gems OR openclaw-true-recall-blocks (not both)*
--- a/watcher/realtime_qdrant_watcher.py
+++ b/watcher/realtime_qdrant_watcher.py
@@ -1,11 +1,14 @@
 #!/usr/bin/env python3
 """
-TrueRecall v1.2 - Real-time Qdrant Watcher
+TrueRecall v1.3 - Real-time Qdrant Watcher
 Monitors OpenClaw sessions and stores to memories_tr instantly.
 This is the CAPTURE component. For curation and injection, install v2.
 Changelog:
 - v1.3: Fixed crash loop (2551 restarts/24h) from FileNotFoundError on deleted session files.
        Added chunking for long content (6000 char chunks) to prevent embedding token overflow.
        Improved error handling for session file lifecycle.
 - v1.2: Fixed session rotation bug - added inactivity detection (30s threshold)
        and improved file scoring to properly detect new sessions on /new or /reset
 - v1.1: Added 1-second mtime polling for session rotation
@@ -27,7 +30,7 @@ from typing import Dict, Any, Optional, List
 # Config
 QDRANT_URL = os.getenv("QDRANT_URL", "http://10.0.0.40:6333")
 QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories_tr")
-OLLAMA_URL = os.getenv("OLLAMA_URL", "http://10.0.0.10:11434")
+OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
 EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
 USER_ID = os.getenv("USER_ID", "rob")
@@ -94,49 +97,124 @@ def clean_content(text: str) -> str:
    return text.strip()
 def chunk_text(text: str, max_chars: int = 6000, overlap: int = 200) -> list:
    """Split text into overlapping chunks for embedding.
    Args:
        text: Text to chunk
        max_chars: Max chars per chunk (6000 = safe for 4K token limit)
        overlap: Chars to overlap between chunks
    Returns:
        List of chunk dicts with 'text' and 'chunk_index'
    """
    if len(text) <= max_chars:
        return [{'text': text, 'chunk_index': 0, 'total_chunks': 1}]
    chunks = []
    start = 0
    chunk_num = 0
    while start < len(text):
        end = start + max_chars
        # Try to break at sentence boundary
        if end < len(text):
            # Look for paragraph break first
            para_break = text.rfind('\n\n', start, end)
            if para_break > start + 500:
                end = para_break
            else:
                # Look for sentence break
                for delim in ['. ', '? ', '! ', '\n']:
                    sent_break = text.rfind(delim, start, end)
                    if sent_break > start + 500:
                        end = sent_break + 1
                        break
        chunk_text = text[start:end].strip()
        if len(chunk_text) > 100:  # Skip tiny chunks
            chunks.append(chunk_text)
            chunk_num += 1
        start = end - overlap if end < len(text) else len(text)
    # Add metadata to each chunk
    total = len(chunks)
    return [{'text': c, 'chunk_index': i, 'total_chunks': total} for i, c in enumerate(chunks)]
 def store_to_qdrant(turn: Dict[str, Any], dry_run: bool = False) -> bool:
    """Store a conversation turn to Qdrant, chunking if needed.
    For long content, splits into multiple chunks (no data loss).
    Each chunk gets its own point with chunk_index metadata.
    """
    if dry_run:
        print(f"[DRY RUN] Would store turn {turn['turn']} ({turn['role']}): {turn['content'][:60]}...")
        return True
-    vector = get_embedding(turn['content'])
+    content = turn['content']
-    if vector is None:
+    chunks = chunk_text(content)
        print(f"Failed to get embedding for turn {turn['turn']}", file=sys.stderr)
        return False
-    payload = {
+    if len(chunks) > 1:
-        "user_id": turn.get('user_id', USER_ID),
+        print(f"  📦 Chunking turn {turn['turn']}: {len(content)} chars → {len(chunks)} chunks", file=sys.stderr)
        "role": turn['role'],
        "content": turn['content'],
        "turn": turn['turn'],
        "timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
        "date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
        "source": "openclaw-true-recall-base",
        "curated": False
    }
    # Generate deterministic ID
    turn_id = turn.get('turn', 0)
-    hash_bytes = hashlib.sha256(f"{USER_ID}:turn:{turn_id}:{datetime.now().strftime('%H%M%S')}".encode()).digest()[:8]
+    base_time = datetime.now().strftime('%H%M%S')
-    point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
+    all_success = True
-    try:
+    for chunk_info in chunks:
-        response = requests.put(
+        chunk_text_content = chunk_info['text']
-            f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
+        chunk_index = chunk_info['chunk_index']
-            json={
+        total_chunks = chunk_info['total_chunks']
-                "points": [{
+        
-                    "id": abs(point_id),
+        # Get embedding for this chunk
-                    "vector": vector,
+        vector = get_embedding(chunk_text_content)
-                    "payload": payload
+        if vector is None:
-                }]
+            print(f"Failed to get embedding for turn {turn['turn']} chunk {chunk_index}", file=sys.stderr)
-            },
+            all_success = False
-            timeout=30
+            continue
-        )
+        
-        response.raise_for_status()
+        # Payload includes full content reference, chunk metadata
-        return True
+        payload = {
-    except Exception as e:
+            "user_id": turn.get('user_id', USER_ID),
-        print(f"Error writing to Qdrant: {e}", file=sys.stderr)
+            "role": turn['role'],
-        return False
+            "content": chunk_text_content,  # Store chunk content (searchable)
            "full_content_length": len(content),  # Original length
            "turn": turn['turn'],
            "timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
            "date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
            "source": "true-recall-base",
            "curated": False,
            "chunk_index": chunk_index,
            "total_chunks": total_chunks
        }
        # Generate unique ID for each chunk
        hash_bytes = hashlib.sha256(
            f"{USER_ID}:turn:{turn_id}:chunk{chunk_index}:{base_time}".encode()
        ).digest()[:8]
        point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
        try:
            response = requests.put(
                f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
                json={
                    "points": [{
                        "id": abs(point_id),
                        "vector": vector,
                        "payload": payload
                    }]
                },
                timeout=30
            )
            response.raise_for_status()
        except Exception as e:
            print(f"Error writing chunk {chunk_index} to Qdrant: {e}", file=sys.stderr)
            all_success = False
    return all_success
 def is_lock_valid(lock_path: Path, max_age_seconds: int = 1800) -> bool:
@@ -332,10 +410,24 @@ def watch_session(session_file: Path, dry_run: bool = False):
    INACTIVITY_THRESHOLD = 30  # seconds - if no data for 30s, check for new session
-    with open(session_file, 'r') as f:
+    # Check file exists before opening (handles deleted sessions)
    if not session_file.exists():
        print(f"Session file gone: {session_file.name}, looking for new session...", file=sys.stderr)
        return None
    # Track file handle for re-opening
    try:
        f = open(session_file, 'r')
        f.seek(last_position)
    except FileNotFoundError:
        print(f"Session file removed during open: {session_file.name}", file=sys.stderr)
        return None
    try:
        while running:
            if not session_file.exists():
                print("Session file removed, looking for new session...")
                f.close()
                return None
            current_time = time.time()
@@ -346,6 +438,7 @@ def watch_session(session_file: Path, dry_run: bool = False):
                newest_session = get_current_session_file()
                if newest_session and newest_session != session_file:
                    print(f"Newer session detected: {newest_session.name}")
                    f.close()
                    return newest_session
            # Check if current file is stale (no new data for threshold)
@@ -357,6 +450,7 @@ def watch_session(session_file: Path, dry_run: bool = False):
                        newest_session = get_current_session_file()
                        if newest_session and newest_session != session_file:
                            print(f"Current session inactive, switching to: {newest_session.name}")
                            f.close()
                            return newest_session
                    else:
                        # File grew, update tracking
@@ -365,19 +459,31 @@ def watch_session(session_file: Path, dry_run: bool = False):
                except Exception:
                    pass
-            # Process new lines and update activity tracking
+            # Check if file has grown since last read
-            old_position = last_position
+            try:
-            process_new_lines(f, session_name, dry_run)
+                current_size = session_file.stat().st_size
            except Exception:
                current_size = 0
-            # If we processed new data, update activity timestamp
+            # Only process if file has grown
-            if last_position > old_position:
+            if current_size > last_position:
-                last_data_time = current_time
+                old_position = last_position
-                try:
+                process_new_lines(f, session_name, dry_run)
-                    last_file_size = session_file.stat().st_size
+                
-                except Exception:
+                # If we processed new data, update activity timestamp
-                    pass
+                if last_position > old_position:
                    last_data_time = current_time
                    last_file_size = current_size
            else:
                # Re-open file handle to detect new writes
                f.close()
                time.sleep(0.05)  # Brief pause before re-opening
                f = open(session_file, 'r')
                f.seek(last_position)
            time.sleep(0.1)
    finally:
        f.close()
    return session_file