35 Commits

Author SHA1 Message Date
root
2069949a8d Add comprehensive upgrade instructions
- Added 'Upgrading from Older Versions' section with full step-by-step
- Added version history table (v1.0 - v1.4)
- Added upgrade paths for curl download and git pull
- Added backfill instructions and verification steps
- Added upgrade section to install.sh
- Added backfill_memory.py script
2026-03-04 14:59:37 -06:00
root
caebfb4b25 docs: add v1.2 patching/update instructions to README 2026-03-04 14:59:37 -06:00
root
1bb4b1daaa feat: update watcher with priority-based session file detection 2026-03-04 14:59:37 -06:00
root
87aab95ba8 feat: improve search_q.sh output
- Add result count to summary
- Increase content preview to 250 chars
- Add user_id to result display
- Improve 'no results' messaging
- Better result counting with tee
2026-03-04 14:59:37 -06:00
root
8bf10326db feat: add search_q.sh script with chronological sorting
- Search memories by keyword/phrase
- Automatically sorts results by timestamp (newest first)
- Shows formatted output with date, role, and content
- Supports environment variables for configuration
- Limits results to avoid information overload
- Handles errors gracefully
2026-03-04 14:59:37 -06:00
root
4e1d432b02 docs: add detailed installer documentation
- Explain each step the installer performs
- Show example prompts and outputs
- Document configuration values
- List post-installation verification commands
- Include installer requirements
2026-03-04 14:59:37 -06:00
root
402c03f647 fix: accept full host:port in install script
- Change defaults to include port (localhost:6333, localhost:11434)
- Show full host:port examples with actual IPs
- Use entered value directly without appending port
- Fixes duplicate port issue if user enters full format
2026-03-04 14:59:37 -06:00
root
eb9bddab69 docs: add IP examples and port info to install prompts
- Show example IPs: localhost, 10.0.0.40, 192.168.1.10
- Clarify default ports (6333 for Qdrant, 11434 for Ollama)
- Help users understand expected input format
2026-03-04 14:59:37 -06:00
root
f238812ce1 fix: handle paths with spaces in install script
- Add INSTALL_DIR variable with absolute path resolution
- Handles spaces in directory names correctly
- Uses cd/pwd trick for robust path detection
2026-03-04 14:59:37 -06:00
root
ead210b565 docs: update Quick Start with install script option
- Add Option 1: Quick Install using install.sh
- Add Option 2: Manual Install (original)
- Update verification section
- Make install script the recommended path
2026-03-04 14:59:37 -06:00
root
a194702a8a feat: add simple install script
- Interactive configuration with defaults
- Defaults to localhost for Qdrant and Ollama
- Allows custom values for all settings
- Creates systemd service with user-provided config
- Auto-starts the watcher
2026-03-04 14:59:37 -06:00
root
cdcfe2f51a docs: add requirements section
- Document vector database requirement
- Explain local vs cloud options
- Clarify IP accessibility needed
- List additional requirements (Ollama, OpenClaw, systemd)
2026-03-04 14:59:37 -06:00
root
b954064502 chore: remove development files (audit checklist and validation report) 2026-03-04 14:59:37 -06:00
root
ba4a5fd63d docs: remove v1 from title 2026-03-04 14:59:37 -06:00
root
29a1ade004 docs: add Base is Complete section
- Emphasize that Base is fully functional standalone
- Clarify upgrade is optional
- List what Base provides without addons
- Reduce pressure to upgrade
2026-03-04 14:59:37 -06:00
root
2f2d93ce7f docs: add final validation report
- 2-pass comprehensive validation
- 100% accuracy confirmed
- All systems operational
- Ready for production
2026-03-04 14:59:37 -06:00
root
62251f5566 docs: add memory usage and q command instructions
- Add Using Memories with OpenClaw section
- Document the 'q' command and its meaning
- Include context injection instructions for system prompts
- Add search priority table
- Explain when to search q vs other sources
- Include right/wrong response examples
2026-03-04 14:59:37 -06:00
root
501872d46e docs: add comprehensive How It Works section
- Add architecture diagram
- Detail step-by-step process (5 steps)
- Include code snippets for each phase
- Document session rotation handling
- Add error handling documentation
- Include collection schema details
- Document security notes
- Add performance metrics table
2026-03-04 14:59:37 -06:00
root
6a32cedb5a docs: update README with upgrade paths and coming soon notices
- Remove duplicate Base does NOT include section
- Add detailed upgrade paths for Gems and Blocks
- Add Coming Soon status indicators
- Include workflow diagrams for both addons
- Explain use cases for each upgrade option
2026-03-04 14:59:37 -06:00
root
04953bc38b Update README: Add v1 to title for clarity 2026-03-04 14:59:37 -06:00
root
d943b9d87e docs: sanitize IP addresses in README
- Replace hardcoded IPs with placeholders
- QDRANT_URL: 10.0.0.40 → <QDRANT_IP>
- OLLAMA_URL: 10.0.0.10 → <OLLAMA_IP>
- USER_ID: rob → <USER_ID>
- Update verification example command
2026-03-04 14:59:37 -06:00
root
808c021d15 docs: replace v2 references with Gems/Blocks addons
- Remove v2 from README Next Step section
- Add addon comparison table (Gems vs Blocks)
- Update prerequisite mention
- Update Python docstring to reference addons
2026-03-04 14:59:37 -06:00
root
5ea614b212 refactor: rename v1 references to base
- Remove v1 versioning from project name
- Update all references: TrueRecall v1 → TrueRecall Base
- Update paths: true-recall-v1 → true-recall-base
- Clean up README (remove version number)
- Update config description
- Update service file description and paths
2026-03-04 14:59:37 -06:00
root
e1962887a5 chore: add .gitignore for Python and session files 2026-03-04 14:59:37 -06:00
root
d88ff6cea3 feat: initial TrueRecall Base v1.0
Core components:
- Real-time memory capture daemon
- Qdrant memories_tr collection storage
- Systemd service for auto-start
- Configuration templates with placeholders

Features:
- Full conversation context capture
- Deduplication via content hashing
- User-tagged memories
- Compatible with Gems and Blocks addons
2026-03-04 14:59:37 -06:00
root
c780a24847 Fix Qdrant upsert: add required ids field
- Fixed missing 'ids' field in POST body causing 400 errors
- Backfilled 23 memory files (Feb 4 - Mar 1, 2026)
- Validation: ~20K+ total points, date coverage complete

Resolves Gitea issue #8
2026-03-04 14:37:08 -06:00
root
5c2014cb11 Fix: Proper session rotation detection (v1.2)
Fixes the bug where watcher stayed stuck on old sessions after /new or /reset.

Changes:
- Added file_score() function combining mtime + size for better detection
- Added INACTIVITY_THRESHOLD (30s) - if no new data, check for active session
- Tracks last_data_time and file size to detect stale sessions
- Switches to newer session when current is inactive

The previous v1.1 fix (mtime polling) was incomplete because new sessions
can have older mtime than recently-written old sessions.

Tested: Watcher now properly follows session rotation on /new and /reset
2026-02-28 19:09:38 -06:00
root
a053ec1c3d fix: SESSIONS_DIR env var and config dimension docs
- SESSIONS_DIR now reads from OPENCLAW_SESSIONS_DIR env var with fallback
- Fixes hardcoded path issue reported by community
- config.json: add embedding_dimensions (1024) and notes field
- Update version to 1.1 in config.json

Validated 4x:
1. SESSIONS_DIR line correct
2. config.json syntax valid
3. Both files syntax OK
4. Env var logic tested

Thanks to Rob Whyte @ Fort Myers Brewing for the suggestion.
2026-02-28 17:05:43 -06:00
root
1c24618ad9 docs: Add Gotchas section - embedding dimensions and hardcoded paths
- Document that snowflake-arctic-embed2 outputs 1024 dimensions (not 768)
- Document SESSIONS_DIR hardcoded path and how to patch with env var
- Add Known Limitations section near File Watching docs
- Fixes community feedback from GitLab issue #1

Thanks to Rob Whyte @ Fort Myers Brewing for identifying these issues.
2026-02-28 17:01:06 -06:00
root
70f5aec465 Fix: Add session rotation detection (v1.1)
- Add 1-second mtime polling to detect newer sessions
- Fixes bug where watcher stayed stuck on first session forever
- Prevents data loss when sessions rotate (was losing 2+ days of history)
- Bump version to v1.1
2026-02-28 16:51:31 -06:00
root
97a95bd3af docs: add validation rule - always validate after changes 2026-02-14 07:23:26 -06:00
root
9769839a67 docs: add counter reset root cause - removed old update_count.sh 2026-02-14 07:23:18 -06:00
root
59225f0d1b docs: add note - website complements YouTube channel 2026-02-14 07:21:21 -06:00
root
a8299b6db7 docs: add Feb 14 - SpeedyFoxAI website details, counter fix, nginx path discovery 2026-02-14 07:20:53 -06:00
root
648aa7f016 docs: add git repository section to daily log 2026-02-10 14:40:48 -06:00
16 changed files with 3628 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,33 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/
# Environment
.env
.env.*
.venv/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
# Session notes (local only)
session.md
*.session.md
# Logs
*.log
logs/


@@ -0,0 +1,761 @@
# TrueRecall Base
**Purpose:** Real-time memory capture → Qdrant `memories_tr`
**Status:** ✅ Standalone capture system
---
## Overview
TrueRecall Base is the **foundation**. It watches OpenClaw sessions in real-time and stores every turn to Qdrant's `memories_tr` collection.
This is **required** for both addons: **Gems** and **Blocks**.
**Base does NOT include:**
- ❌ Curation (gem extraction)
- ❌ Topic clustering (blocks)
- ❌ Injection (context recall)
**For those features, install an addon after base.**
---
## Requirements
**Vector Database**
TrueRecall Base requires a vector database to store conversation embeddings. This can be:
- **Local** - Self-hosted Qdrant (recommended for privacy)
- **Cloud** - Managed Qdrant Cloud or similar service
- **Any IP-accessible** Qdrant instance
In this version, we use a **local Qdrant database** (`http://<QDRANT_IP>:6333`). The database must be reachable from the machine running the watcher daemon.
**Additional Requirements:**
- **Ollama** - For generating text embeddings (local or remote)
- **OpenClaw** - The session files to monitor
- **Linux systemd** - For running the watcher as a service
---
## Gotchas & Known Limitations
> ⚠️ **Embedding Dimensions:** `snowflake-arctic-embed2` outputs **1024 dimensions**, not 768. Ensure your Qdrant collection is configured with `"size": 1024`.
> ⚠️ **Hardcoded Sessions Path:** `SESSIONS_DIR` is hardcoded to `/root/.openclaw/agents/main/sessions`. To use a different path, modify `realtime_qdrant_watcher.py` to read from an environment variable:
> ```python
> SESSIONS_DIR = Path(os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions"))
> ```
---
## Three-Tier Architecture
```
true-recall-base (REQUIRED)
├── Core: Watcher daemon
└── Stores: memories_tr
├──▶ true-recall-gems (ADDON)
│ ├── Curator extracts gems → gems_tr
│ └── Plugin injects gems into prompts
└──▶ true-recall-blocks (ADDON)
├── Topic clustering → topic_blocks_tr
└── Contextual block retrieval
Note: Gems and Blocks are INDEPENDENT addons.
They both require Base, but don't work together.
Choose one: Gems OR Blocks (not both).
```
---
## Quick Start
### Option 1: Quick Install (Recommended)
```bash
cd /path/to/true-recall-base
./install.sh
```
#### What the Installer Does (Step-by-Step)
The `install.sh` script automates the entire setup process. Here's exactly what happens:
**Step 1: Interactive Configuration**
```
Configuration (press Enter for defaults):
Examples:
Qdrant: 10.0.0.40:6333 (remote) or localhost:6333 (local)
Ollama: 10.0.0.10:11434 (remote) or localhost:11434 (local)
Qdrant host:port [localhost:6333]: _
Ollama host:port [localhost:11434]: _
User ID [user]: _
```
- Prompts for Qdrant host:port (default: `localhost:6333`)
- Prompts for Ollama host:port (default: `localhost:11434`)
- Prompts for User ID (default: `user`)
- Press Enter to accept defaults, or type custom values
**Step 2: Configuration Confirmation**
```
Configuration:
Qdrant: http://localhost:6333
Ollama: http://localhost:11434
User ID: user
Proceed? [Y/n]: _
```
- Shows the complete configuration
- Asks for confirmation (type `n` to cancel, Enter or `Y` to proceed)
- Exits cleanly if cancelled, no changes made
**Step 3: Systemd Service Generation**
- Creates a temporary service file at `/tmp/mem-qdrant-watcher.service`
- Inserts your configuration values (IPs, ports, user ID)
- Uses absolute path for the script location (handles spaces in paths)
- Sets up automatic restart on failure
**Step 4: Service Installation**
```bash
sudo cp /tmp/mem-qdrant-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
```
- Copies the service file to systemd directory
- Reloads systemd to recognize the new service
**Step 5: Service Activation**
```bash
sudo systemctl enable --now mem-qdrant-watcher
```
- Enables the service to start on boot (`enable`)
- Starts the service immediately (`--now`)
**Step 6: Verification**
```
==========================================
Installation Complete!
==========================================
Status:
● mem-qdrant-watcher.service - TrueRecall Base...
Active: active (running)
```
- Displays the service status
- Shows it's active and running
- Provides commands to verify and monitor
**Post-Installation Commands:**
```bash
# Check service status anytime
sudo systemctl status mem-qdrant-watcher
# View live logs
sudo journalctl -u mem-qdrant-watcher -f
# Verify Qdrant collection
curl -s http://localhost:6333/collections/memories_tr | jq '.result.points_count'
```
#### Installer Requirements
- Must run as root or with sudo (for systemd operations)
- Must have execute permissions (`chmod +x install.sh`)
- Script must be run from the true-recall-base directory
### Option 2: Manual Install
```bash
cd /path/to/true-recall-base
# Copy service file
sudo cp watcher/mem-qdrant-watcher.service /etc/systemd/system/
# Edit the service file to set your IPs and user
sudo nano /etc/systemd/system/mem-qdrant-watcher.service
# Reload and start
sudo systemctl daemon-reload
sudo systemctl enable --now mem-qdrant-watcher
```
### Verify Installation
```bash
# Check service status
sudo systemctl status mem-qdrant-watcher
# Check collection
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
```
---
## Files
| File | Purpose |
|------|---------|
| `watcher/realtime_qdrant_watcher.py` | Capture daemon |
| `watcher/mem-qdrant-watcher.service` | Systemd service |
| `config.json` | Configuration template |
---
## Configuration
Edit `config.json` or set environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `QDRANT_URL` | `http://<QDRANT_IP>:6333` | Qdrant endpoint |
| `OLLAMA_URL` | `http://<OLLAMA_IP>:11434` | Ollama endpoint |
| `EMBEDDING_MODEL` | `snowflake-arctic-embed2` | Embedding model |
| `USER_ID` | `<USER_ID>` | User identifier |
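
The table above can be read as "environment variable, else default". A minimal sketch of that lookup follows. The helper name `load_settings` and the `localhost` / `user` defaults are assumptions borrowed from the installer prompts, not values read from the watcher itself, and any interaction with `config.json` is left out.

```python
import os
from typing import Dict, Mapping, Optional

# Assumed defaults (taken from the installer prompts, not from the watcher).
DEFAULTS: Dict[str, str] = {
    "QDRANT_URL": "http://localhost:6333",
    "OLLAMA_URL": "http://localhost:11434",
    "EMBEDDING_MODEL": "snowflake-arctic-embed2",
    "USER_ID": "user",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve each setting from the environment, falling back to DEFAULTS."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default.
    settings = {key: env.get(key) or default for key, default in DEFAULTS.items()}
    # Drop a trailing slash so f"{url}/collections/..." never yields "//".
    for key in ("QDRANT_URL", "OLLAMA_URL"):
        settings[key] = settings[key].rstrip("/")
    return settings
```

Passing a plain dict instead of `os.environ` makes the lookup easy to check without touching the real environment.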
---
## How It Works
### Architecture Overview
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ OpenClaw Chat │────▶│ Session JSONL │────▶│ Base Watcher │
│ (You talking) │ │ (/sessions/*.jsonl) │ │ (This daemon) │
└─────────────────┘ └──────────────────┘ └────────┬────────┘
┌────────────────────────────────────────────────────────────────────┐
│ PROCESSING PIPELINE │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Watch File │─▶│ Parse Turn │─▶│ Clean Text │─▶│ Embed │ │
│ │ (inotify) │ │ (JSON→dict) │ │ (strip md) │ │ (Ollama) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └─────┬─────┘ │
│ │ │
│ ┌───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Store to │─▶│ Qdrant │ │
│ │ memories_tr │ │ (vector DB) │ │
│ └──────────────┘ └──────────────┘ │
└────────────────────────────────────────────────────────────────────┘
```
### Step-by-Step Process
#### Step 1: File Watching
The watcher monitors OpenClaw session files in real-time:
```python
# From realtime_qdrant_watcher.py
SESSIONS_DIR = Path("/root/.openclaw/agents/main/sessions")
```
> ⚠️ **Known Limitation:** `SESSIONS_DIR` is currently hardcoded. To use a different path, patch the watcher script to read from an environment variable (e.g., `os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions")`).
**What happens:**
- Uses `inotify` or polling to watch the sessions directory
- Automatically detects the most recently modified `.jsonl` file
- Handles session rotation (when OpenClaw starts a new session)
- Maintains position in file to avoid re-processing old lines
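
"Maintains position in file" can be pictured as a byte offset that only ever advances past complete lines. The sketch below is one way to do that. `read_new_lines` is a hypothetical helper written for illustration and may differ from what `realtime_qdrant_watcher.py` actually does.

```python
from pathlib import Path
from typing import List, Tuple


def read_new_lines(path: Path, offset: int) -> Tuple[List[str], int]:
    """Return the complete lines appended after `offset`, plus the new offset."""
    with path.open("rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    # Consume only up to the last newline, so a half-written line is
    # left in place and picked up on the next call.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end
```

Calling it in a loop with the returned offset yields each line exactly once, even when the writer is caught mid-line.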
#### Step 2: Turn Parsing
Each conversation turn is extracted from the JSONL file. An example entry, pretty-printed here (in the file each entry occupies one line):
```json
{
"type": "message",
"message": {
"role": "user",
"content": "Hello, can you help me?",
"timestamp": "2026-02-27T09:30:00Z"
}
}
```
**What happens:**
- Reads new lines appended to the session file
- Parses JSON to extract role (user/assistant/system)
- Extracts content text
- Captures timestamp
- Generates unique turn ID from content hash + timestamp
**Code flow:**
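```python
import hashlib
import json
import os
from typing import Dict, Optional

def parse_turn(line: str) -> Optional[Dict]:
    data = json.loads(line)
    if data.get("type") != "message":
        return None  # Skip non-message entries
    # Field layout follows the example session entry above
    message = data.get("message", {})
    role = message.get("role")
    content = message.get("content", "")
    timestamp = message.get("timestamp", "")
    return {
        "id": hashlib.md5(f"{content}{timestamp}".encode()).hexdigest()[:16],
        "role": role,
        "content": content,
        "timestamp": timestamp,
        "user_id": os.getenv("USER_ID", "default")
    }
```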
#### Step 3: Content Cleaning
Before storage, content is normalized:
**Strips:**
- Markdown tables (`| column | column |`)
- Bold/italic markers (`**text**`, `*text*`)
- Inline code (`` `code` ``)
- Code blocks (```code```)
- Multiple consecutive spaces
- Leading/trailing whitespace
**Example:**
```
Input: "Check this **important** table: | col1 | col2 |"
Output: "Check this important table"
```
**Why:** Clean text improves embedding quality and searchability.
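
The strip list above could be approximated with a handful of regular expressions. This is a rough sketch, not the watcher's actual cleaner. `clean_content` is a hypothetical name, and it assumes that fenced blocks and table rows are dropped entirely while bold, italic, and inline-code markers are removed but their text is kept.

```python
import re


def clean_content(text: str) -> str:
    """Strip common Markdown decoration and collapse whitespace."""
    # Fenced code blocks: drop the whole block.
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    # Table rows: drop everything from the first to the last pipe on a line.
    text = re.sub(r"\|[^\n]*\|", " ", text)
    # Inline code: keep the text, drop the backticks.
    text = re.sub(r"`([^`\n]*)`", r"\1", text)
    # Bold first, then italic, so "**x**" is not half-matched as "*x*".
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"\*(.+?)\*", r"\1", text)
    # Collapse runs of spaces/tabs and trim the ends.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()
```

With these rules, punctuation next to a removed table (such as a trailing colon) is kept, so the output can differ slightly from the example above.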
#### Step 4: Embedding Generation
The cleaned content is converted to a vector embedding:
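```python
from typing import List
import requests

# OLLAMA_URL and EMBEDDING_MODEL come from the configuration described above
def get_embedding(text: str) -> List[float]:
    response = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": EMBEDDING_MODEL, "prompt": text},
        timeout=30,  # Illustrative value; avoids hanging if Ollama stalls
    )
    response.raise_for_status()  # Surface HTTP errors instead of a KeyError
    return response.json()["embedding"]
```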
**What happens:**
- Sends text to the Ollama API at `OLLAMA_URL`
- Uses `snowflake-arctic-embed2` model
- Returns **1024-dimensional vector** (not 768)
- Falls back gracefully if Ollama is unavailable
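
Because a dimension mismatch only shows up when Qdrant rejects the write, it can help to check the vector length before storing. A small sketch, assuming the 1024-dimension figure stated above; `validate_embedding` is a hypothetical helper, not part of the shipped watcher.

```python
from typing import List, Sequence

EXPECTED_DIM = 1024  # snowflake-arctic-embed2 output size, per this README


def validate_embedding(vector: Sequence[float], expected_dim: int = EXPECTED_DIM) -> List[float]:
    """Return the vector as a list of floats, or raise if its size is wrong."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return [float(x) for x in vector]
```

Wrapping the result of `get_embedding` in such a check turns a confusing storage error into an immediate, readable one when a different embedding model is configured.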
#### Step 5: Qdrant Storage
The complete turn data is stored to Qdrant:
```python
payload = {
"user_id": user_id,
"role": turn["role"],
"content": cleaned_content[:2000], # Size limit
"timestamp": turn["timestamp"],
"session_id": session_id,
"source": "true-recall-base"
}
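# Note: Qdrant only accepts unsigned integers or UUIDs as point IDs
response = requests.put(
    f"{QDRANT_URL}/collections/memories_tr/points",
    json={"points": [{"id": turn_id, "vector": embedding, "payload": payload}]},
    timeout=10,  # Illustrative value
)
response.raise_for_status()  # Fail loudly on 4xx/5xx so the turn can be retried
```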
```
**Storage format:**
| Field | Type | Description |
|-------|------|-------------|
| `user_id` | string | User identifier |
| `role` | string | user/assistant/system |
| `content` | string | Cleaned text (max 2000 chars) |
| `timestamp` | string | ISO 8601 timestamp |
| `session_id` | string | Source session file |
| `source` | string | "true-recall-base" |
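
Qdrant accepts point IDs only as unsigned integers or UUIDs, so a content hash has to be put into one of those shapes before it can serve as an ID. The sketch below shows one possible approach: format the full MD5 digest of content + timestamp as a UUID, which also means re-processing the same turn overwrites the existing point instead of adding a duplicate. `turn_point_id` and `build_point` are hypothetical helpers, and the watcher's real ID scheme may differ.

```python
import hashlib
import uuid
from typing import Dict, List


def turn_point_id(content: str, timestamp: str) -> str:
    """Deterministic UUID derived from the turn's content and timestamp."""
    digest = hashlib.md5(f"{content}{timestamp}".encode("utf-8")).hexdigest()
    return str(uuid.UUID(hex=digest))  # 32 hex chars -> canonical UUID string


def build_point(turn: Dict[str, str], vector: List[float],
                user_id: str, session_id: str) -> Dict:
    """Assemble one point using the storage format from the table above."""
    return {
        "id": turn_point_id(turn["content"], turn["timestamp"]),
        "vector": vector,
        "payload": {
            "user_id": user_id,
            "role": turn["role"],
            "content": turn["content"][:2000],  # Same size limit as above
            "timestamp": turn["timestamp"],
            "session_id": session_id,
            "source": "true-recall-base",
        },
    }
```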
### Real-Time Performance
| Metric | Target | Actual |
|--------|--------|--------|
| Latency | < 500ms | ~100-200ms |
| Throughput | > 10 turns/sec | > 50 turns/sec |
| Embedding time | < 300ms | ~50-100ms |
| Qdrant write | < 100ms | ~10-50ms |
### Session Rotation Handling
When OpenClaw starts a new session:
1. New `.jsonl` file created in sessions directory
2. Watcher detects file change via `inotify`
3. Identifies most recently modified file
4. Switches to watching new file
5. Continues from position 0 of new file
6. Old file remains in `memories_tr` (already captured)
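
The six steps above reduce to "pick the newest `.jsonl`, and reset the offset when it changes". A minimal sketch of that rule follows, with hypothetical helper names. It uses modification time only, which is the simple rule described in this list; later versions refine the selection (see "What's New in v1.2").

```python
from pathlib import Path
from typing import Optional, Tuple


def newest_session(sessions_dir: Path) -> Optional[Path]:
    """Return the most recently modified .jsonl file, or None if there is none."""
    candidates = list(sessions_dir.glob("*.jsonl"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def follow_rotation(current: Optional[Path], offset: int,
                    sessions_dir: Path) -> Tuple[Optional[Path], int]:
    """Switch to a newer session file (restarting at offset 0) if one exists."""
    latest = newest_session(sessions_dir)
    if latest is not None and latest != current:
        return latest, 0
    return current, offset
```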
### Error Handling
**Qdrant unavailable:**
- Retries with exponential backoff
- Logs error, continues watching
- Next turn attempts storage again
**Ollama unavailable:**
- Cannot generate embeddings
- Logs error, skips turn
- Continues watching (no data loss in file)
**File access errors:**
- Handles permission issues gracefully
- Retries on temporary failures
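
"Retries with exponential backoff" can be sketched as a small wrapper around any network call. The attempt count and base delay below are illustrative assumptions, and `with_backoff` is a hypothetical helper rather than the watcher's own code.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_backoff(action: Callable[[], T], attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Run `action`, doubling the wait after each failure; None if all fail."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:  # Broad on purpose: log and keep watching
            if attempt == attempts - 1:
                print(f"giving up after {attempts} attempts: {exc}")
                return None
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return None
```

Returning `None` instead of raising matches the "logs error, continues watching" behavior: the caller can skip the turn and move on.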
### Collection Schema
**Qdrant collection: `memories_tr`**
```python
{
"name": "memories_tr",
"vectors": {
"size": 1024, # snowflake-arctic-embed2 dimension (1024, not 768)
"distance": "Cosine" # Similarity metric
},
"payload_schema": {
"user_id": "keyword", # Filterable
"role": "keyword", # Filterable
"timestamp": "datetime", # Range filterable
"content": "text" # Full-text searchable
}
}
```
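
For readers creating the collection by hand, the schema above maps onto two kinds of Qdrant REST calls: one `PUT /collections/{name}` for the vector settings, and one `PUT /collections/{name}/index` per filterable payload field. The sketch below only builds those requests. `collection_requests` is a hypothetical helper, and the shipped setup may create the collection differently.

```python
from typing import Dict, List, Tuple

# Payload fields and index types, as listed in the schema above.
PAYLOAD_INDEXES = [
    ("user_id", "keyword"),
    ("role", "keyword"),
    ("timestamp", "datetime"),
    ("content", "text"),
]


def collection_requests(name: str = "memories_tr",
                        size: int = 1024) -> List[Tuple[str, str, Dict]]:
    """Return (method, path, json_body) triples that set up the collection."""
    calls: List[Tuple[str, str, Dict]] = [
        ("PUT", f"/collections/{name}",
         {"vectors": {"size": size, "distance": "Cosine"}}),
    ]
    for field, schema in PAYLOAD_INDEXES:
        calls.append(("PUT", f"/collections/{name}/index",
                      {"field_name": field, "field_schema": schema}))
    return calls
```

Each triple can then be sent with `requests.request(method, QDRANT_URL + path, json=body)`.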
### Security Notes
- **No credential storage** in code
- All sensitive values via environment variables
- `USER_ID` isolates memories per user
- Content cleaning strips formatting only; it does not remove PII, so review what you store
- HTTPS recommended for production Qdrant/Ollama
---
## Using Memories with OpenClaw
### The "q" Command
**"q"** refers to your Qdrant memory system (`memories_tr` collection).
When interacting with OpenClaw agents, you can search your stored memories using:
- `search q <topic>` - Semantic search for past conversations
- `q <topic>` - Shortcut for the same
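
How `search q` reaches Qdrant is not shown in this README. As a rough illustration, a semantic search can be expressed as an embedding of the query plus a filtered call to `POST /collections/memories_tr/points/search`. The helper names, the per-user filter, the newest-first ordering, and the 250-character preview are all assumptions made for this sketch.

```python
from typing import Dict, List


def build_search_body(query_vector: List[float], user_id: str,
                      limit: int = 5) -> Dict:
    """JSON body for Qdrant's points/search endpoint, restricted to one user."""
    return {
        "vector": query_vector,
        "limit": limit,
        "with_payload": True,
        "filter": {"must": [{"key": "user_id", "match": {"value": user_id}}]},
    }


def format_hits(hits: List[Dict], preview: int = 250) -> List[str]:
    """Render hits newest first as 'timestamp [role] content-preview'."""
    # ISO 8601 timestamps in one timezone sort correctly as plain strings.
    ordered = sorted(hits, key=lambda h: h["payload"]["timestamp"], reverse=True)
    return [
        f'{h["payload"]["timestamp"]} [{h["payload"]["role"]}] '
        f'{h["payload"]["content"][:preview]}'
        for h in ordered
    ]
```

The query vector would come from the same embedding model used at capture time, so that query and stored turns share one vector space.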
### Context Injection Instructions
**For OpenClaw System Prompt:**
Add these lines to your agent's system context to enable memory-aware responses:
```
## Memory System (q)
**"q" = Qdrant collection `memories_tr`** — your conversation history database.
### Memory Retrieval Rules
**Before saying "I don't know" or "I can't do that":**
1. **ALWAYS search q first** using the topic/keywords from the user's request
2. Incorporate findings INTO your response (not as footnotes)
3. Reference specific dates/details: "Based on our Feb 27th discussion..."
**Example workflow:**
    User asks about X → Search q for X → Use retrieved memories → Answer
**WRONG:**
> "I searched Qdrant and found X. [Generic answer unrelated to X]"
**RIGHT:**
> "You asked me to fix this on Feb 27th — do you want me to apply the fix now?"
### When to Search q
**ALWAYS search automatically when:**
- Question references past events, conversations, or details
- User asks "remember when...", "what did we discuss...", "what did I tell you..."
- You're unsure if you have relevant context
- ANY question about configuration, memories, or past interactions
**DO NOT search for:**
- General knowledge questions you can answer directly
- Current time, weather, or factual queries
- Simple requests like "check my email" or "run a command"
- When you already have sufficient context in the conversation
```
### Search Priority
| Order | Source | When to Use |
|-------|--------|-------------|
| 1 | **q (Qdrant)** | First - semantic search of all conversations |
| 2 | `memory/` files | Fallback if q yields no results |
| 3 | Web search | Last resort |
| 4 | "I don't know" | Only after all above |
---
## Next Step
### ✅ Base is Complete
**You don't need to upgrade.** TrueRecall Base is a **fully functional, standalone memory system**. If you're happy with real-time capture and manual search via the `q` command, you can stop here.
Base gives you:
- ✅ Complete conversation history in Qdrant
- ✅ Semantic search via `search q <topic>`
- ✅ Full-text search capabilities
- ✅ Permanent storage of all conversations
**Upgrade only if** you want automatic context injection into prompts.
---
### Optional Addons
Install an **addon** for automatic curation and injection:
| Addon | Purpose | Status |
|-------|---------|--------|
| **Gems** | Extracts atomic gems from memories, injects into context | 🚧 Coming Soon |
| **Blocks** | Topic clustering, contextual block retrieval | 🚧 Coming Soon |
### Upgrade Paths
Once Base is running, you have two upgrade options:
#### Option 1: Gems (Atomic Memory)
**Best for:** Conversational context, quick recall
- **Curator** extracts "gems" (key insights) from `memories_tr`
- Stores curated gems in `gems_tr` collection
- **Injection plugin** recalls relevant gems into prompts automatically
- Optimized for: Chat assistants, help bots, personal memory
**Workflow:**
```
memories_tr → Curator → gems_tr → Injection → Context
```
#### Option 2: Blocks (Topic Clustering)
**Best for:** Document organization, topic-based retrieval
- Clusters conversations by topic automatically
- Creates `topic_blocks_tr` collection
- Retrieves entire contextual blocks on query
- Optimized for: Knowledge bases, document systems
**Workflow:**
```
memories_tr → Topic Engine → topic_blocks_tr → Retrieval → Context
```
**Note:** Gems and Blocks are **independent** addons. They both require Base, but you choose one based on your use case.
---
## Updating / Patching
If you already have TrueRecall Base installed and need to apply a bug fix or update:
### Quick Update (v1.2 Patch)
**Applies to:** Session file detection fix (the watcher picked the wrong file when multiple sessions were active)
```bash
# 1. Back up current watcher
cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak.$(date +%Y%m%d)
# 2. Download latest watcher (choose one source)
# Option A: From GitHub
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \
https://raw.githubusercontent.com/speedyfoxai/openclaw-true-recall-base/master/watcher/realtime_qdrant_watcher.py
# Option B: From GitLab
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \
https://gitlab.com/mdkrush/true-recall-base/-/raw/master/watcher/realtime_qdrant_watcher.py
# Option C: From local git (if cloned)
cp /path/to/true-recall-base/watcher/realtime_qdrant_watcher.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/
# 3. Stop old watcher
pkill -f realtime_qdrant_watcher
# 4. Start new watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
# 5. Verify
ps aux | grep watcher
lsof -p $(pgrep -f realtime_qdrant_watcher) | grep jsonl
```
### Update with Git (If Cloned)
```bash
cd /path/to/true-recall-base
git pull origin master
# Copy updated files
cp watcher/realtime_qdrant_watcher.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/
# Optional: copy the backfill script
cp scripts/backfill_memory_to_q.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/ 2>/dev/null || true
# Restart watcher
sudo systemctl restart mem-qdrant-watcher
# OR manually:
pkill -f realtime_qdrant_watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
```
### Verify Update Applied
```bash
# Check version in file
grep "v1.2" /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py
# Verify watcher is running
ps aux | grep realtime_qdrant_watcher
# Confirm watching main session (not subagent)
lsof -p $(pgrep -f realtime_qdrant_watcher) | grep jsonl
# Check recent captures in Qdrant
curl -s "http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll" \
-H "Content-Type: application/json" \
-d '{"limit": 3, "with_payload": true}' | jq -r '.result.points[].payload.timestamp'
```
### What's New in v1.2
| Feature | Benefit |
|---------|---------|
| **Priority-based session detection** | Always picks `agent:main:main` first |
| **Lock file validation** | Ignores stale/crashed session locks via PID check |
| **Inactive subagent filtering** | Skips sessions with `sessionFile=null` |
| **Backfill script** | Import historical memories from markdown files |
**No config changes required** - existing `config.json` works unchanged.
---
**Prerequisite for:** TrueRecall Gems, TrueRecall Blocks
---
## Upgrading from Older Versions
This section covers full upgrades from older TrueRecall Base installations to the current version.
### Version History
| Version | Key Changes |
|---------|-------------|
| **v1.0** | Initial release - basic watcher |
| **v1.1** | Session detection improvements |
| **v1.2** | Priority-based session detection, lock file validation, backfill script |
| **v1.3** | Offset persistence (resumes from last position), fixes duplicate processing |
| **v1.4** | Current version - Memory backfill fix (Qdrant ids field), improved error handling |
### Upgrade Paths
#### From v1.0/v1.1/v1.2 → v1.4 (Current)
If you have an older installation, follow these steps:
```bash
# Step 1: Back up existing configuration
cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak.$(date +%Y%m%d)
cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/config.json /root/.openclaw/workspace/skills/qdrant-memory/scripts/config.json.bak.$(date +%Y%m%d)
```
```bash
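# Step 2: Stop the watcher
# If it runs as the systemd service created by install.sh, stop it with
# `sudo systemctl stop mem-qdrant-watcher` instead: with Restart=always,
# systemd restarts a killed process after 5 seconds.
pkill -f realtime_qdrant_watcher
# Verify stopped (prints nothing when no watcher is running)
pgrep -af realtime_qdrant_watcher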
```
```bash
# Step 3: Download latest files (choose one source)
# Option A: From GitLab (recommended)
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py https://gitlab.com/mdkrush/openclaw-true-recall-base/-/raw/master/watcher/realtime_qdrant_watcher.py
# Option B: From Gitea
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py http://10.0.0.61:3000/SpeedyFoxAi/openclaw-true-recall-base/raw/branch/master/watcher/realtime_qdrant_watcher.py
# Option C: From local clone (if you cloned the repo)
cp /path/to/openclaw-true-recall-base/watcher/realtime_qdrant_watcher.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
```
```bash
# Step 4: Start the watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
```
```bash
# Step 5: Verify installation
ps aux | grep realtime_qdrant_watcher
curl -s "http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll" -H "Content-Type: application/json" -d '{"limit": 3}' | jq '.result.points[0].payload.timestamp'
```
### Upgrading with Git (If You Cloned the Repository)
```bash
# Navigate to your clone
cd /path/to/openclaw-true-recall-base
git pull origin master
# Stop current watcher
pkill -f realtime_qdrant_watcher
# Copy updated files to OpenClaw
cp watcher/realtime_qdrant_watcher.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
cp scripts/backfill_memory.py /root/.openclaw/workspace/skills/qdrant-memory/scripts/
# Restart the watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
# Verify
ps aux | grep realtime_qdrant_watcher
```
### Backfilling Historical Memories (Optional)
```bash
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/backfill_memory.py
```
### Verifying Your Upgrade
```bash
# 1. Check watcher is running
ps aux | grep realtime_qdrant_watcher
# 2. Verify source is "true-recall-base"
curl -s "http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll" -H "Content-Type: application/json" -d '{"limit": 1}' | jq '.result.points[0].payload.source'
# 3. Check date coverage
curl -s "http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll" -H "Content-Type: application/json" -d '{"limit": 10000}' | jq '[.result.points[].payload.date] | unique | sort'
```
Expected output:
- Source: `"true-recall-base"`
- Dates: Array from oldest to newest memory
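
The date-coverage check can also be done in Python on the JSON returned by the scroll call. The sketch below is illustrative only. `date_coverage` is a hypothetical helper, and it assumes a point may carry either a `date` field (as the `jq` expression above reads) or the `timestamp` field from the storage format table, using whichever is present.

```python
from typing import Dict, Iterable, List


def date_coverage(points: Iterable[Dict]) -> List[str]:
    """Return the sorted, unique YYYY-MM-DD dates found in the points."""
    dates = set()
    for point in points:
        payload = point.get("payload") or {}
        stamp = payload.get("date") or payload.get("timestamp") or ""
        if stamp:
            dates.add(stamp[:10])  # Keep the date part of an ISO 8601 value
    return sorted(dates)
```

Feeding it `response.json()["result"]["points"]` from the scroll endpoint gives the same oldest-to-newest array described above.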


@@ -0,0 +1,127 @@
#!/bin/bash
# TrueRecall Base - Simple Installer
# Usage: ./install.sh
set -e
echo "=========================================="
echo "TrueRecall Base - Installer"
echo "=========================================="
echo ""
# Default values
DEFAULT_QDRANT_IP="localhost:6333"
DEFAULT_OLLAMA_IP="localhost:11434"
DEFAULT_USER_ID="user"
# Get user input with defaults
echo "Configuration (press Enter for defaults):"
echo ""
echo "Examples:"
echo " Qdrant: 10.0.0.40:6333 (remote) or localhost:6333 (local)"
echo " Ollama: 10.0.0.10:11434 (remote) or localhost:11434 (local)"
echo ""
read -r -p "Qdrant host:port [$DEFAULT_QDRANT_IP]: " QDRANT_IP
QDRANT_IP=${QDRANT_IP:-$DEFAULT_QDRANT_IP}
read -r -p "Ollama host:port [$DEFAULT_OLLAMA_IP]: " OLLAMA_IP
OLLAMA_IP=${OLLAMA_IP:-$DEFAULT_OLLAMA_IP}
read -r -p "User ID [$DEFAULT_USER_ID]: " USER_ID
USER_ID=${USER_ID:-$DEFAULT_USER_ID}
echo ""
echo "Configuration:"
echo " Qdrant: http://$QDRANT_IP"
echo " Ollama: http://$OLLAMA_IP"
echo " User ID: $USER_ID"
echo ""
read -p "Proceed? [Y/n]: " CONFIRM
if [[ $CONFIRM =~ ^[Nn]$ ]]; then
echo "Installation cancelled."
exit 0
fi
# Create service file
echo ""
echo "Creating systemd service..."
# Get absolute path (handles spaces)
INSTALL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cat > /tmp/mem-qdrant-watcher.service << EOF
[Unit]
Description=TrueRecall Base - Real-Time Memory Watcher
After=network.target
[Service]
Type=simple
User=$USER
WorkingDirectory=$INSTALL_DIR/watcher
Environment="QDRANT_URL=http://$QDRANT_IP"
Environment="QDRANT_COLLECTION=memories_tr"
Environment="OLLAMA_URL=http://$OLLAMA_IP"
Environment="EMBEDDING_MODEL=snowflake-arctic-embed2"
Environment="USER_ID=$USER_ID"
ExecStart=/usr/bin/python3 $INSTALL_DIR/watcher/realtime_qdrant_watcher.py --daemon
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# Install service
sudo cp /tmp/mem-qdrant-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
echo ""
echo "Starting service..."
sudo systemctl enable --now mem-qdrant-watcher
echo ""
echo "=========================================="
echo "Installation Complete!"
echo "=========================================="
echo ""
echo "Status:"
sudo systemctl status mem-qdrant-watcher --no-pager
echo ""
echo "Verify collection:"
echo " curl -s http://$QDRANT_IP/collections/memories_tr | jq '.result.points_count'"
echo ""
echo "View logs:"
echo " sudo journalctl -u mem-qdrant-watcher -f"
echo ""
echo "=========================================="
echo "UPGRADING FROM OLDER VERSION"
echo "=========================================="
echo ""
echo "If you already have TrueRecall Base installed:"
echo ""
echo "1. Stop the watcher:"
echo " pkill -f realtime_qdrant_watcher"
echo ""
echo "2. Backup current files:"
echo " cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \\"
echo " /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak"
echo ""
echo "3. Copy updated files:"
echo " cp watcher/realtime_qdrant_watcher.py \\"
echo " /root/.openclaw/workspace/skills/qdrant-memory/scripts/"
echo " cp scripts/backfill_memory.py \\"
echo " /root/.openclaw/workspace/skills/qdrant-memory/scripts/"
echo ""
echo "4. Restart watcher:"
echo " python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon"
echo ""
echo "5. Verify:"
echo " ps aux | grep realtime_qdrant_watcher"
echo ""
echo "For full upgrade instructions, see README.md"


@@ -0,0 +1,67 @@
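#!/usr/bin/env python3
"""Backfill memory files to Qdrant memories_tr collection."""
import hashlib
import os
import sys

import requests

# Same environment variables and defaults as realtime_qdrant_watcher.py
QDRANT_URL = os.getenv("QDRANT_URL", "http://10.0.0.40:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories_tr")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
USER_ID = os.getenv("USER_ID", "rob")
MEMORY_DIR = "/root/.openclaw/workspace/memory"


def get_memory_files():
    """Get all memory files sorted by date."""
    files = []
    for f in os.listdir(MEMORY_DIR):
        # Only daily files from 2026 (e.g. 2026-02-27.md) are picked up
        if f.startswith("2026-") and f.endswith(".md"):
            date = f.replace(".md", "")
            files.append((date, f))
    return sorted(files, key=lambda x: x[0])


def get_embedding(text):
    """Embed text via Ollama, as the watcher does. Returns None on failure."""
    try:
        resp = requests.post(
            f"{OLLAMA_URL}/api/embeddings",
            json={"model": EMBEDDING_MODEL, "prompt": text},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["embedding"]
    except Exception as e:
        print(f"Error getting embedding: {e}", file=sys.stderr)
        return None


def backfill_file(date, filename):
    """Backfill a single memory file to Qdrant."""
    filepath = os.path.join(MEMORY_DIR, filename)
    with open(filepath, 'r') as f:
        content = f.read()
    # Truncate if too long for payload
    payload = {
        "content": content[:50000],  # Limit size
        "date": date,
        "source": "memory_file",
        "curated": False,
        "role": "system",
        "user_id": USER_ID,
    }
    # memories_tr stores vectors, so every point needs an embedding
    vector = get_embedding(payload["content"])
    if vector is None:
        return False
    # Stable ID: built-in hash() is salted per process, so re-runs would
    # create duplicates; SHA-256 gives the same ID for the same date
    digest = hashlib.sha256(f"memory_{date}".encode()).digest()[:8]
    point_id = int.from_bytes(digest, byteorder="big") % (2**63)
    # Upsert is PUT; POST on this path with "ids" only retrieves points
    resp = requests.put(
        f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
        json={"points": [{"id": point_id, "vector": vector, "payload": payload}]},
        timeout=30,
    )
    return resp.status_code == 200


def main():
    files = get_memory_files()
    print(f"Found {len(files)} memory files to backfill")
    count = 0
    for date, filename in files:
        print(f"Backfilling {filename}...", end=" ")
        if backfill_file(date, filename):
            print("✅")
            count += 1
        else:
            print("❌")
    print(f"\nBackfilled {count}/{len(files)} files")


if __name__ == "__main__":
    main()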


@@ -0,0 +1,566 @@
# TrueRecall Base
**Purpose:** Real-time memory capture → Qdrant `memories_tr`
**Status:** ✅ Standalone capture system
---
## Overview
TrueRecall Base is the **foundation**. It watches OpenClaw sessions in real-time and stores every turn to Qdrant's `memories_tr` collection.
This is **required** for both addons: **Gems** and **Blocks**.
**Base does NOT include:**
- ❌ Curation (gem extraction)
- ❌ Topic clustering (blocks)
- ❌ Injection (context recall)
**For those features, install an addon after base.**
---
## Requirements
**Vector Database**
TrueRecall Base requires a vector database to store conversation embeddings. This can be:
- **Local** - Self-hosted Qdrant (recommended for privacy)
- **Cloud** - Managed Qdrant Cloud or similar service
- **Any IP-accessible** Qdrant instance
In this version, we use a **local Qdrant database** (`http://<QDRANT_IP>:6333`). The database must be reachable from the machine running the watcher daemon.
**Additional Requirements:**
- **Ollama** - For generating text embeddings (local or remote)
- **OpenClaw** - The session files to monitor
- **Linux systemd** - For running the watcher as a service
---
## Gotchas & Known Limitations
> ⚠️ **Embedding Dimensions:** `snowflake-arctic-embed2` outputs **1024 dimensions**, not 768. Ensure your Qdrant collection is configured with `"size": 1024`.
> ⚠️ **Sessions Path:** `SESSIONS_DIR` defaults to `/root/.openclaw/agents/main/sessions`. The watcher reads it from the `OPENCLAW_SESSIONS_DIR` environment variable, so set that variable (for example in the systemd unit) to use a different path:
> ```python
> SESSIONS_DIR = Path(os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions"))
> ```
---
## Three-Tier Architecture
```
true-recall-base (REQUIRED)
├── Core: Watcher daemon
└── Stores: memories_tr
    ├──▶ true-recall-gems (ADDON)
    │    ├── Curator extracts gems → gems_tr
    │    └── Plugin injects gems into prompts
    └──▶ true-recall-blocks (ADDON)
         ├── Topic clustering → topic_blocks_tr
         └── Contextual block retrieval

Note: Gems and Blocks are INDEPENDENT addons.
They both require Base, but don't work together.
Choose one: Gems OR Blocks (not both).
```
---
## Quick Start
### Option 1: Quick Install (Recommended)
```bash
cd /path/to/true-recall-base
./install.sh
```
#### What the Installer Does (Step-by-Step)
The `install.sh` script automates the entire setup process. Here's exactly what happens:
**Step 1: Interactive Configuration**
```
Configuration (press Enter for defaults):
Examples:
Qdrant: 10.0.0.40:6333 (remote) or localhost:6333 (local)
Ollama: 10.0.0.10:11434 (remote) or localhost:11434 (local)
Qdrant host:port [localhost:6333]: _
Ollama host:port [localhost:11434]: _
User ID [user]: _
```
- Prompts for Qdrant host:port (default: `localhost:6333`)
- Prompts for Ollama host:port (default: `localhost:11434`)
- Prompts for User ID (default: `user`)
- Press Enter to accept defaults, or type custom values
**Step 2: Configuration Confirmation**
```
Configuration:
Qdrant: http://localhost:6333
Ollama: http://localhost:11434
User ID: user
Proceed? [Y/n]: _
```
- Shows the complete configuration
- Asks for confirmation (type `n` to cancel, Enter or `Y` to proceed)
- Exits cleanly if cancelled, no changes made
**Step 3: Systemd Service Generation**
- Creates a temporary service file at `/tmp/mem-qdrant-watcher.service`
- Inserts your configuration values (IPs, ports, user ID)
- Uses absolute path for the script location (handles spaces in paths)
- Sets up automatic restart on failure
**Step 4: Service Installation**
```bash
sudo cp /tmp/mem-qdrant-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
```
- Copies the service file to systemd directory
- Reloads systemd to recognize the new service
**Step 5: Service Activation**
```bash
sudo systemctl enable --now mem-qdrant-watcher
```
- Enables the service to start on boot (`enable`)
- Starts the service immediately (`--now`)
**Step 6: Verification**
```
==========================================
Installation Complete!
==========================================
Status:
● mem-qdrant-watcher.service - TrueRecall Base...
Active: active (running)
```
- Displays the service status
- Shows it's active and running
- Provides commands to verify and monitor
**Post-Installation Commands:**
```bash
# Check service status anytime
sudo systemctl status mem-qdrant-watcher
# View live logs
sudo journalctl -u mem-qdrant-watcher -f
# Verify Qdrant collection
curl -s http://localhost:6333/collections/memories_tr | jq '.result.points_count'
```
#### Installer Requirements
- Must run as root or with sudo (for systemd operations)
- Must have execute permissions (`chmod +x install.sh`)
- Script must be run from the true-recall-base directory
### Option 2: Manual Install
```bash
cd /path/to/true-recall-base
# Copy service file
sudo cp watcher/mem-qdrant-watcher.service /etc/systemd/system/
# Edit the service file to set your IPs and user
sudo nano /etc/systemd/system/mem-qdrant-watcher.service
# Reload and start
sudo systemctl daemon-reload
sudo systemctl enable --now mem-qdrant-watcher
```
### Verify Installation
```bash
# Check service status
sudo systemctl status mem-qdrant-watcher
# Check collection
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
```
---
## Files
| File | Purpose |
|------|---------|
| `watcher/realtime_qdrant_watcher.py` | Capture daemon |
| `watcher/mem-qdrant-watcher.service` | Systemd service |
| `config.json` | Configuration template |
---
## Configuration
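Set these environment variables (the installer writes them into the systemd unit). `config.json` is a template of the same values; the watcher in this commit reads only the environment:
| Variable | Default | Description |
|----------|---------|-------------|
| `QDRANT_URL` | `http://<QDRANT_IP>:6333` | Qdrant endpoint |
| `QDRANT_COLLECTION` | `memories_tr` | Target collection |
| `OLLAMA_URL` | `http://<OLLAMA_IP>:11434` | Ollama endpoint |
| `EMBEDDING_MODEL` | `snowflake-arctic-embed2` | Embedding model |
| `USER_ID` | `<USER_ID>` | User identifier |
| `OPENCLAW_SESSIONS_DIR` | `/root/.openclaw/agents/main/sessions` | Directory of session `.jsonl` files |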
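
The environment-with-defaults pattern behind this table can be sketched as follows. This is an illustration only: `WatcherConfig` and `load_config` are hypothetical names, and the fallback values are the installer defaults (`localhost:6333`, `localhost:11434`, `user`), which are an assumption here rather than the watcher's built-in defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WatcherConfig:
    """Hypothetical container for the settings listed in the table."""
    qdrant_url: str
    ollama_url: str
    embedding_model: str
    user_id: str


def load_config(env: Optional[Mapping[str, str]] = None) -> WatcherConfig:
    """Read settings from the environment, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return WatcherConfig(
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        embedding_model=env.get("EMBEDDING_MODEL", "snowflake-arctic-embed2"),
        user_id=env.get("USER_ID", "user"),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise without touching the real environment.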
---
## How It Works
### Architecture Overview
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ OpenClaw Chat │────▶│ Session JSONL │────▶│ Base Watcher │
│ (You talking) │ │ (/sessions/*.jsonl) │ │ (This daemon) │
└─────────────────┘ └──────────────────┘ └────────┬────────┘
┌────────────────────────────────────────────────────────────────────┐
│ PROCESSING PIPELINE │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Watch File │─▶│ Parse Turn │─▶│ Clean Text │─▶│ Embed │ │
│ │ (polling) │ │ (JSON→dict) │ │ (strip md) │ │ (Ollama) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └─────┬─────┘ │
│ │ │
│ ┌───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Store to │─▶│ Qdrant │ │
│ │ memories_tr │ │ (vector DB) │ │
│ └──────────────┘ └──────────────┘ │
└────────────────────────────────────────────────────────────────────┘
```
### Step-by-Step Process
#### Step 1: File Watching
The watcher monitors OpenClaw session files in real-time:
```python
# From realtime_qdrant_watcher.py
SESSIONS_DIR = Path(os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions"))
```
> ⚠️ **Note:** `SESSIONS_DIR` defaults to the path above. Set `OPENCLAW_SESSIONS_DIR` to watch a different directory.
**What happens:**
- Polls the sessions directory (about every 0.1 s for new lines, every 1 s for a newer session file); `inotify` is not used
- Automatically detects the most recently modified `.jsonl` file
- Handles session rotation (when OpenClaw starts a new session)
- Maintains position in file to avoid re-processing old lines
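
The "maintain position" idea can be sketched as a small polling helper. This is a minimal illustration under stated assumptions, not the watcher's own code: `read_new_lines` is a hypothetical function that works on byte offsets and leaves a half-written trailing line for the next poll.

```python
from typing import List, Tuple


def read_new_lines(path: str, position: int) -> Tuple[List[str], int]:
    """Return complete lines appended after byte offset `position`.

    The second element is the new offset to pass on the next poll.
    """
    with open(path, "rb") as f:
        f.seek(position)
        data = f.read()
    # Consume only up to the last newline, so a line that is still being
    # written is picked up whole on a later poll
    end = data.rfind(b"\n") + 1
    lines = [ln.decode("utf-8") for ln in data[:end].splitlines() if ln.strip()]
    return lines, position + end
```

A caller would loop with a short sleep, feeding each returned offset back in as `position`.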
#### Step 2: Turn Parsing
Each conversation turn is extracted from the JSONL file:
```json
// Example session file entry
{
"type": "message",
"message": {
"role": "user",
"content": "Hello, can you help me?",
"timestamp": "2026-02-27T09:30:00Z"
}
}
```
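**What happens:**
- Reads new lines appended to the session file
- Parses JSON to extract the role; only `user` and `assistant` turns are kept (`system`, `developer`, and `toolResult` entries are skipped)
- Extracts content text
- Captures timestamp
- Generates a turn ID (the simplified code below hashes content + timestamp; the shipped watcher uses a running turn counter)
**Code flow (simplified):**
```python
import hashlib
import json
import os
from typing import Dict, Optional

def parse_turn(line: str) -> Optional[Dict]:
    data = json.loads(line)
    if data.get("type") != "message":
        return None  # Skip non-message entries
    msg = data.get("message", {})
    role = msg.get("role")
    content = msg.get("content", "")
    # Timestamp may sit on the message (as in the example) or on the entry
    timestamp = msg.get("timestamp") or data.get("timestamp", "")
    return {
        "id": hashlib.md5(f"{content}{timestamp}".encode()).hexdigest()[:16],
        "role": role,
        "content": content,
        "timestamp": timestamp,
        "user_id": os.getenv("USER_ID", "default")
    }
```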
#### Step 3: Content Cleaning
Before storage, content is normalized:
**Strips:**
- Markdown tables (`| column | column |`)
- Bold/italic markers (`**text**`, `*text*`)
- Inline code (`` `code` ``)
- Code blocks (```code```)
- Multiple consecutive spaces
- Leading/trailing whitespace
**Example:**
```
Input: "Check this **important** table: | col1 | col2 |"
Output: "Check this important table:"
```
**Why:** Clean text improves embedding quality and searchability.
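
The cleaning rules can be sketched with a handful of regular expressions. This is a reduced illustration, not the full watcher function: `clean_text` is a hypothetical name, it covers only the rules listed above, and it keeps the text inside inline code and bold/italic markers while dropping fenced blocks and table rows entirely.

```python
import re


def clean_text(text: str) -> str:
    """Strip common markdown noise before embedding (reduced sketch)."""
    # Fenced code blocks first, so the inline-code rule cannot eat the fences
    text = re.sub(r"```[\s\S]*?```", "", text)
    # Markdown table rows: everything from the first to the last pipe on a line
    text = re.sub(r"\|[^\n]*\|", "", text)
    # Bold, then italic: keep the inner text
    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)
    text = re.sub(r"\*([^*]+)\*", r"\1", text)
    # Inline code: keep the inner text
    text = re.sub(r"`([^`]+)`", r"\1", text)
    # Collapse runs of spaces and tabs, then trim
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

Ordering matters here: removing fenced blocks before inline code avoids a partial match on the fence backticks.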
#### Step 4: Embedding Generation
The cleaned content is converted to a vector embedding:
```python
def get_embedding(text: str) -> List[float]:
response = requests.post(
f"{OLLAMA_URL}/api/embeddings",
json={"model": EMBEDDING_MODEL, "prompt": text}
)
return response.json()["embedding"]
```
**What happens:**
- Sends text to the Ollama API at `OLLAMA_URL`
- Uses `snowflake-arctic-embed2` model
- Returns **1024-dimensional vector** (not 768)
- If Ollama is unavailable, the watcher logs the error and skips the turn
#### Step 5: Qdrant Storage
The complete turn data is stored to Qdrant:
```python
payload = {
    "user_id": user_id,
    "role": turn["role"],
    "content": cleaned_content[:2000],  # Size limit
    "turn": turn["turn"],
    "timestamp": turn["timestamp"],
    "date": datetime.now(timezone.utc).strftime("%Y-%m-%d"),
    "source": "true-recall-base",
    "curated": False
}
requests.put(
    f"{QDRANT_URL}/collections/memories_tr/points",
    json={"points": [{"id": turn_id, "vector": embedding, "payload": payload}]}
)
```
**Storage format:**
| Field | Type | Description |
|-------|------|-------------|
| `user_id` | string | User identifier |
| `role` | string | user/assistant |
| `content` | string | Cleaned text (max 2000 chars) |
| `turn` | integer | Turn number within the session |
| `timestamp` | string | ISO 8601 timestamp |
| `date` | string | UTC capture date (`YYYY-MM-DD`) |
| `source` | string | "true-recall-base" |
| `curated` | boolean | `false` until an addon curates the turn |
### Real-Time Performance
| Metric | Target | Actual |
|--------|--------|--------|
| Latency | < 500ms | ~100-200ms |
| Throughput | > 10 turns/sec | > 50 turns/sec |
| Embedding time | < 300ms | ~50-100ms |
| Qdrant write | < 100ms | ~10-50ms |
### Session Rotation Handling
When OpenClaw starts a new session:
1. New `.jsonl` file created in sessions directory
2. Watcher polls the directory about once per second and notices the newer file
3. Identifies most recently modified file
4. Switches to watching new file
5. Starts reading at the current end of the new file (lines already present at switch time are counted but not stored)
6. Turns already captured from the old file remain in `memories_tr`
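
Step 3 above, picking the most recently modified session file, can be sketched in a few lines. `newest_session` is a hypothetical name, and the sketch ranks by modification time only; the shipped watcher also adds a small file-size bonus.

```python
from pathlib import Path
from typing import Optional


def newest_session(sessions_dir: Path) -> Optional[Path]:
    """Return the most recently modified .jsonl file, or None if there is none."""
    files = list(sessions_dir.glob("*.jsonl"))
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)
```

Comparing the result against the file currently being watched, once per polling interval, is enough to detect a rotation.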
### Error Handling
**Qdrant unavailable:**
- Logs the error and continues watching
- The failed turn is not retried by the watcher in this commit
- Next turn attempts storage again
**Ollama unavailable:**
- Cannot generate embeddings
- Logs error, skips turn
- Continues watching (the turn stays in the session file but is not re-read automatically)
**File access errors:**
- Handles permission issues gracefully
- Retries on temporary failures
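
A retry with exponential backoff around a storage or embedding call could look like the sketch below. It is an assumption-laden illustration, not code from the repository: `retry_with_backoff`, `attempts`, and `base_delay` are hypothetical names, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import sys
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `op` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries and
    returns None if every attempt raised.
    """
    for attempt in range(attempts):
        try:
            return op()
        except Exception as exc:
            if attempt == attempts - 1:
                print(f"Giving up after {attempts} attempts: {exc}", file=sys.stderr)
                return None
            sleep(base_delay * (2 ** attempt))
    return None
```

Wrapping a write as `retry_with_backoff(lambda: store(turn))` would keep the watch loop running while giving a briefly unavailable service time to recover; `store` stands in for whatever function performs the write.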
### Collection Schema
**Qdrant collection: `memories_tr`**
```python
{
"name": "memories_tr",
"vectors": {
"size": 1024, # snowflake-arctic-embed2 dimension (1024, not 768)
"distance": "Cosine" # Similarity metric
},
"payload_schema": {
"user_id": "keyword", # Filterable
"role": "keyword", # Filterable
"timestamp": "datetime", # Range filterable
"content": "text" # Full-text searchable
}
}
```
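
The schema above is descriptive rather than a literal API body. As a hedged sketch, the same layout can be expressed as the Qdrant REST calls that would create it: one `PUT /collections/{name}` for the vector configuration and one `PUT /collections/{name}/index` per payload field. `collection_requests` is a hypothetical helper that only builds the requests; nothing is sent.

```python
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]


def collection_requests(base_url: str, name: str = "memories_tr", size: int = 1024) -> List[Request]:
    """Build (method, url, json_body) tuples that set up the collection."""
    requests_to_send: List[Request] = [
        # Vector config: size must match the embedding model's output dimension
        ("PUT", f"{base_url}/collections/{name}", {"vectors": {"size": size, "distance": "Cosine"}}),
    ]
    # One payload index per filterable or searchable field
    for field, schema in (
        ("user_id", "keyword"),
        ("role", "keyword"),
        ("timestamp", "datetime"),
        ("content", "text"),
    ):
        requests_to_send.append(
            ("PUT", f"{base_url}/collections/{name}/index", {"field_name": field, "field_schema": schema})
        )
    return requests_to_send
```

Each tuple can be sent with `requests.request(method, url, json=body)` against your own Qdrant endpoint.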
### Security Notes
- **No credential storage** in code
- All sensitive values via environment variables
- `USER_ID` isolates memories per user
- Content cleaning strips markdown and metadata blocks, not personal data, so review what you store
- HTTPS recommended for production Qdrant/Ollama
---
## Using Memories with OpenClaw
### The "q" Command
**"q"** refers to your Qdrant memory system (`memories_tr` collection).
When interacting with OpenClaw agents, you can search your stored memories using:
- `search q <topic>` - Semantic search for past conversations
- `q <topic>` - Shortcut for the same
### Context Injection Instructions
**For OpenClaw System Prompt:**
Add these lines to your agent's system context to enable memory-aware responses:
```
## Memory System (q)
**"q" = Qdrant collection `memories_tr`** — your conversation history database.
### Memory Retrieval Rules
**Before saying "I don't know" or "I can't do that":**
1. **ALWAYS search q first** using the topic/keywords from the user's request
2. Incorporate findings INTO your response (not as footnotes)
3. Reference specific dates/details: "Based on our Feb 27th discussion..."
**Example workflow:** User asks about X → Search q for X → Use retrieved memories → Answer
**WRONG:**
> "I searched Qdrant and found X. [Generic answer unrelated to X]"
**RIGHT:**
> "You asked me to fix this on Feb 27th — do you want me to apply the fix now?"
### When to Search q
**ALWAYS search automatically when:**
- Question references past events, conversations, or details
- User asks "remember when...", "what did we discuss...", "what did I tell you..."
- You're unsure if you have relevant context
- ANY question about configuration, memories, or past interactions
**DO NOT search for:**
- General knowledge questions you can answer directly
- Current time, weather, or factual queries
- Simple requests like "check my email" or "run a command"
- When you already have sufficient context in the conversation
```
### Search Priority
| Order | Source | When to Use |
|-------|--------|-------------|
| 1 | **q (Qdrant)** | First - semantic search of all conversations |
| 2 | `memory/` files | Fallback if q yields no results |
| 3 | Web search | Last resort |
| 4 | "I don't know" | Only after all above |
---
## Next Step
### ✅ Base is Complete
**You don't need to upgrade.** TrueRecall Base is a **fully functional, standalone memory system**. If you're happy with real-time capture and manual search via the `q` command, you can stop here.
Base gives you:
- ✅ Complete conversation history in Qdrant
- ✅ Semantic search via `search q <topic>`
- ✅ Full-text search capabilities
- ✅ Permanent storage of all conversations
**Upgrade only if** you want automatic context injection into prompts.
---
### Optional Addons
Install an **addon** for automatic curation and injection:
| Addon | Purpose | Status |
|-------|---------|--------|
| **Gems** | Extracts atomic gems from memories, injects into context | 🚧 Coming Soon |
| **Blocks** | Topic clustering, contextual block retrieval | 🚧 Coming Soon |
### Upgrade Paths
Once Base is running, you have two upgrade options:
#### Option 1: Gems (Atomic Memory)
**Best for:** Conversational context, quick recall
- **Curator** extracts "gems" (key insights) from `memories_tr`
- Stores curated gems in `gems_tr` collection
- **Injection plugin** recalls relevant gems into prompts automatically
- Optimized for: Chat assistants, help bots, personal memory
**Workflow:**
```
memories_tr → Curator → gems_tr → Injection → Context
```
#### Option 2: Blocks (Topic Clustering)
**Best for:** Document organization, topic-based retrieval
- Clusters conversations by topic automatically
- Creates `topic_blocks_tr` collection
- Retrieves entire contextual blocks on query
- Optimized for: Knowledge bases, document systems
**Workflow:**
```
memories_tr → Topic Engine → topic_blocks_tr → Retrieval → Context
```
**Note:** Gems and Blocks are **independent** addons. They both require Base, but you choose one based on your use case.
---
**Prerequisite for:** TrueRecall Gems, TrueRecall Blocks


@@ -0,0 +1,14 @@
{
"version": "1.1",
"description": "TrueRecall v1.1 - Memory capture with session rotation fix",
"components": ["watcher"],
"collections": {
"memories": "memories_tr"
},
"qdrant_url": "http://10.0.0.40:6333",
"ollama_url": "http://localhost:11434",
"embedding_model": "snowflake-arctic-embed2",
"embedding_dimensions": 1024,
"user_id": "rob",
"notes": "Ensure memories_tr collection is created with size=1024 for snowflake-arctic-embed2"
}


@@ -0,0 +1,367 @@
#!/usr/bin/env python3
"""
TrueRecall v1.2 - Real-time Qdrant Watcher
Monitors OpenClaw sessions and stores to memories_tr instantly.
This is the CAPTURE component. For curation and injection, install v2.
Changelog:
- v1.2: Fixed session rotation bug - added inactivity detection (30s threshold)
and improved file scoring to properly detect new sessions on /new or /reset
- v1.1: Added 1-second mtime polling for session rotation
- v1.0: Initial release
"""
import os
import sys
import json
import time
import signal
import hashlib
import argparse
import requests
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Any, Optional, List
# Config
QDRANT_URL = os.getenv("QDRANT_URL", "http://10.0.0.40:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories_tr")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
USER_ID = os.getenv("USER_ID", "rob")
# Paths
SESSIONS_DIR = Path(os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions"))
# State
running = True
last_position = 0
current_file = None
turn_counter = 0
def signal_handler(signum, frame):
    global running
    print(f"\nReceived signal {signum}, shutting down...", file=sys.stderr)
    running = False
def get_embedding(text: str) -> Optional[List[float]]:
    """Return the embedding for text, or None if the Ollama call fails."""
    try:
        response = requests.post(
            f"{OLLAMA_URL}/api/embeddings",
            json={"model": EMBEDDING_MODEL, "prompt": text},
            timeout=30
        )
        response.raise_for_status()
        return response.json()["embedding"]
    except Exception as e:
        print(f"Error getting embedding: {e}", file=sys.stderr)
        return None
def clean_content(text: str) -> str:
    import re
    # Remove metadata JSON blocks
    text = re.sub(r'Conversation info \(untrusted metadata\):\s*```json\s*\{[\s\S]*?\}\s*```', '', text)
    # Remove thinking tags
    text = re.sub(r'\[thinking:[^\]]*\]', '', text)
    # Remove timestamp lines
    text = re.sub(r'\[\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2} [A-Z]{3}\]', '', text)
    # Remove markdown tables
    text = re.sub(r'\|[^\n]*\|', '', text)
    text = re.sub(r'\|[-:]+\|', '', text)
    # Remove fenced code blocks before inline code; otherwise the inline
    # pattern consumes the fence backticks and the block is never matched
    text = re.sub(r'```[\s\S]*?```', '', text)
    # Remove markdown formatting (keep the inner text)
    text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)
    text = re.sub(r'\*([^*]+)\*', r'\1', text)
    text = re.sub(r'`([^`]+)`', r'\1', text)
    # Remove horizontal rules
    text = re.sub(r'---+', '', text)
    text = re.sub(r'\*\*\*+', '', text)
    # Remove excess whitespace
    text = re.sub(r'\n{3,}', '\n', text)
    text = re.sub(r'[ \t]+', ' ', text)
    return text.strip()
def store_to_qdrant(turn: Dict[str, Any], dry_run: bool = False) -> bool:
    if dry_run:
        print(f"[DRY RUN] Would store turn {turn['turn']} ({turn['role']}): {turn['content'][:60]}...")
        return True
    vector = get_embedding(turn['content'])
    if vector is None:
        print(f"Failed to get embedding for turn {turn['turn']}", file=sys.stderr)
        return False
    payload = {
        "user_id": turn.get('user_id', USER_ID),
        "role": turn['role'],
        "content": turn['content'],
        "turn": turn['turn'],
        "timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
        "date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
        "source": "true-recall-base",
        "curated": False
    }
    # Generate a point ID from user, turn number, and wall-clock time.
    # Because the time is part of the hash, the ID is not reproducible
    # across runs for the same turn.
    turn_id = turn.get('turn', 0)
    hash_bytes = hashlib.sha256(f"{USER_ID}:turn:{turn_id}:{datetime.now().strftime('%H%M%S')}".encode()).digest()[:8]
    point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
    try:
        response = requests.put(
            f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
            json={
                "points": [{
                    "id": abs(point_id),
                    "vector": vector,
                    "payload": payload
                }]
            },
            timeout=30
        )
        response.raise_for_status()
        return True
    except Exception as e:
        print(f"Error writing to Qdrant: {e}", file=sys.stderr)
        return False
def get_current_session_file():
    """Find the most recently active session file.
    Scores files by modification time plus a tiny size bonus to handle
    session rotation when /new or /reset is used.
    """
    if not SESSIONS_DIR.exists():
        return None
    files = list(SESSIONS_DIR.glob("*.jsonl"))
    if not files:
        return None

    # Score files by: recency (mtime) + size activity
    # Files with very recent mtime AND non-zero size are likely active
    def file_score(p: Path) -> float:
        try:
            stat = p.stat()
            mtime = stat.st_mtime
            size = stat.st_size
            # Prefer files with recent mtime and non-zero size
            # Add small bonus for larger files (active sessions grow)
            return mtime + (size / 1e9)  # size bonus is tiny vs mtime
        except Exception:
            return 0

    return max(files, key=file_score)
def parse_turn(line: str, session_name: str) -> Optional[Dict[str, Any]]:
    global turn_counter
    try:
        entry = json.loads(line.strip())
    except json.JSONDecodeError:
        return None
    if entry.get('type') != 'message' or 'message' not in entry:
        return None
    msg = entry['message']
    role = msg.get('role')
    if role in ('toolResult', 'system', 'developer'):
        return None
    if role not in ('user', 'assistant'):
        return None
    content = ""
    if isinstance(msg.get('content'), list):
        for item in msg['content']:
            if isinstance(item, dict) and 'text' in item:
                content += item['text']
    elif isinstance(msg.get('content'), str):
        content = msg['content']
    if not content:
        return None
    content = clean_content(content)
    if not content or len(content) < 5:
        return None
    turn_counter += 1
    return {
        'turn': turn_counter,
        'role': role,
        'content': content[:2000],
        'timestamp': entry.get('timestamp', datetime.now(timezone.utc).isoformat()),
        'user_id': USER_ID
    }
def process_new_lines(f, session_name: str, dry_run: bool = False):
    global last_position
    f.seek(last_position)
    for line in f:
        line = line.strip()
        if not line:
            continue
        turn = parse_turn(line, session_name)
        if turn:
            if store_to_qdrant(turn, dry_run):
                print(f"✅ Turn {turn['turn']} ({turn['role']}) → Qdrant")
    # tell() is only valid once iteration over the file has finished
    last_position = f.tell()
def watch_session(session_file: Path, dry_run: bool = False):
    global last_position, turn_counter
    session_name = session_file.name.replace('.jsonl', '')
    print(f"Watching session: {session_file.name}")
    try:
        with open(session_file, 'r') as f:
            for line in f:
                turn_counter += 1
        last_position = session_file.stat().st_size
        print(f"Session has {turn_counter} existing turns, starting from position {last_position}")
    except Exception as e:
        print(f"Warning: Could not read existing turns: {e}", file=sys.stderr)
        last_position = 0
    last_session_check = time.time()
    last_data_time = time.time()  # Track when we last saw new data
    last_file_size = session_file.stat().st_size if session_file.exists() else 0
    INACTIVITY_THRESHOLD = 30  # seconds - if no data for 30s, check for new session
    with open(session_file, 'r') as f:
        while running:
            if not session_file.exists():
                print("Session file removed, looking for new session...")
                return None
            current_time = time.time()
            # Check for newer session every 1 second
            if current_time - last_session_check > 1.0:
                last_session_check = current_time
                newest_session = get_current_session_file()
                if newest_session and newest_session != session_file:
                    print(f"Newer session detected: {newest_session.name}")
                    return newest_session
            # Check if current file is stale (no new data for threshold)
            if current_time - last_data_time > INACTIVITY_THRESHOLD:
                try:
                    current_size = session_file.stat().st_size
                    # If file hasn't grown, check if another session is active
                    if current_size == last_file_size:
                        newest_session = get_current_session_file()
                        if newest_session and newest_session != session_file:
                            print(f"Current session inactive, switching to: {newest_session.name}")
                            return newest_session
                    else:
                        # File grew, update tracking
                        last_file_size = current_size
                        last_data_time = current_time
                except Exception:
                    pass
            # Process new lines and update activity tracking
            old_position = last_position
            process_new_lines(f, session_name, dry_run)
            # If we processed new data, update activity timestamp
            if last_position > old_position:
                last_data_time = current_time
                try:
                    last_file_size = session_file.stat().st_size
                except Exception:
                    pass
            time.sleep(0.1)
    return session_file
def watch_loop(dry_run: bool = False):
    # last_position must be declared global, otherwise the reset below
    # only creates a local variable
    global current_file, turn_counter, last_position
    while running:
        session_file = get_current_session_file()
        if session_file is None:
            print("No active session found, waiting...")
            time.sleep(1)
            continue
        if current_file != session_file:
            print(f"\nNew session detected: {session_file.name}")
            current_file = session_file
            turn_counter = 0
            last_position = 0
        result = watch_session(session_file, dry_run)
        if result is None:
            current_file = None
        time.sleep(0.5)
def main():
    global USER_ID
    parser = argparse.ArgumentParser(description="TrueRecall v1.2 - Real-time Memory Capture")
    parser.add_argument("--daemon", "-d", action="store_true", help="Run as daemon")
    parser.add_argument("--once", "-o", action="store_true", help="Process once then exit")
    parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write to Qdrant")
    parser.add_argument("--user-id", "-u", default=USER_ID, help=f"User ID (default: {USER_ID})")
    args = parser.parse_args()
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)
    if args.user_id:
        USER_ID = args.user_id
    print("🔍 TrueRecall v1.2 - Real-time Memory Capture")
    print(f"📍 Qdrant: {QDRANT_URL}/{QDRANT_COLLECTION}")
    print(f"🧠 Ollama: {OLLAMA_URL}/{EMBEDDING_MODEL}")
    print(f"👤 User: {USER_ID}")
    print()
    if args.once:
        print("Running once...")
        session_file = get_current_session_file()
        if session_file:
            watch_session(session_file, args.dry_run)
        else:
            print("No session found")
    else:
        print("Running as daemon (Ctrl+C to stop)...")
        watch_loop(args.dry_run)


if __name__ == "__main__":
    main()

README.md Normal file

@@ -0,0 +1,656 @@
# TrueRecall Base
**Purpose:** Real-time memory capture → Qdrant `memories_tr`
**Status:** ✅ Standalone capture system
---
## Overview
TrueRecall Base is the **foundation**. It watches OpenClaw sessions in real-time and stores every turn to Qdrant's `memories_tr` collection.
This is **required** for both addons: **Gems** and **Blocks**.
**Base does NOT include:**
- ❌ Curation (gem extraction)
- ❌ Topic clustering (blocks)
- ❌ Injection (context recall)
**For those features, install an addon after base.**
---
## Requirements
**Vector Database**
TrueRecall Base requires a vector database to store conversation embeddings. This can be:
- **Local** - Self-hosted Qdrant (recommended for privacy)
- **Cloud** - Managed Qdrant Cloud or similar service
- **Any IP-accessible** Qdrant instance
In this version, we use a **local Qdrant database** (`http://<QDRANT_IP>:6333`). The database must be reachable from the machine running the watcher daemon.
**Additional Requirements:**
- **Ollama** - For generating text embeddings (local or remote)
- **OpenClaw** - The session files to monitor
- **Linux systemd** - For running the watcher as a service
---
## Gotchas & Known Limitations
> ⚠️ **Embedding Dimensions:** `snowflake-arctic-embed2` outputs **1024 dimensions**, not 768. Ensure your Qdrant collection is configured with `"size": 1024`.
> ⚠️ **Sessions Path:** `SESSIONS_DIR` defaults to `/root/.openclaw/agents/main/sessions`. The watcher reads it from the `OPENCLAW_SESSIONS_DIR` environment variable, so set that variable (for example in the systemd unit) to use a different path:
> ```python
> SESSIONS_DIR = Path(os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions"))
> ```
---
## Three-Tier Architecture
```
true-recall-base (REQUIRED)
├── Core: Watcher daemon
└── Stores: memories_tr
    ├──▶ true-recall-gems (ADDON)
    │    ├── Curator extracts gems → gems_tr
    │    └── Plugin injects gems into prompts
    └──▶ true-recall-blocks (ADDON)
         ├── Topic clustering → topic_blocks_tr
         └── Contextual block retrieval

Note: Gems and Blocks are INDEPENDENT addons.
They both require Base, but don't work together.
Choose one: Gems OR Blocks (not both).
```
---
## Quick Start
### Option 1: Quick Install (Recommended)
```bash
cd /path/to/true-recall-base
./install.sh
```
#### What the Installer Does (Step-by-Step)
The `install.sh` script automates the entire setup process. Here's exactly what happens:
**Step 1: Interactive Configuration**
```
Configuration (press Enter for defaults):
Examples:
Qdrant: 10.0.0.40:6333 (remote) or localhost:6333 (local)
Ollama: 10.0.0.10:11434 (remote) or localhost:11434 (local)
Qdrant host:port [localhost:6333]: _
Ollama host:port [localhost:11434]: _
User ID [user]: _
```
- Prompts for Qdrant host:port (default: `localhost:6333`)
- Prompts for Ollama host:port (default: `localhost:11434`)
- Prompts for User ID (default: `user`)
- Press Enter to accept defaults, or type custom values
**Step 2: Configuration Confirmation**
```
Configuration:
Qdrant: http://localhost:6333
Ollama: http://localhost:11434
User ID: user
Proceed? [Y/n]: _
```
- Shows the complete configuration
- Asks for confirmation (type `n` to cancel, Enter or `Y` to proceed)
- Exits cleanly if cancelled, no changes made
**Step 3: Systemd Service Generation**
- Creates a temporary service file at `/tmp/mem-qdrant-watcher.service`
- Inserts your configuration values (IPs, ports, user ID)
- Uses absolute path for the script location (handles spaces in paths)
- Sets up automatic restart on failure
**Step 4: Service Installation**
```bash
sudo cp /tmp/mem-qdrant-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
```
- Copies the service file to systemd directory
- Reloads systemd to recognize the new service
**Step 5: Service Activation**
```bash
sudo systemctl enable --now mem-qdrant-watcher
```
- Enables the service to start on boot (`enable`)
- Starts the service immediately (`--now`)
**Step 6: Verification**
```
==========================================
Installation Complete!
==========================================
Status:
● mem-qdrant-watcher.service - TrueRecall Base...
Active: active (running)
```
- Displays the service status
- Shows it's active and running
- Provides commands to verify and monitor
**Post-Installation Commands:**
```bash
# Check service status anytime
sudo systemctl status mem-qdrant-watcher
# View live logs
sudo journalctl -u mem-qdrant-watcher -f
# Verify Qdrant collection
curl -s http://localhost:6333/collections/memories_tr | jq '.result.points_count'
```
#### Installer Requirements
- Must run as root or with sudo (for systemd operations)
- Must have execute permissions (`chmod +x install.sh`)
- Script must be run from the true-recall-base directory
### Option 2: Manual Install
```bash
cd /path/to/true-recall-base
# Copy service file
sudo cp watcher/mem-qdrant-watcher.service /etc/systemd/system/
# Edit the service file to set your IPs and user
sudo nano /etc/systemd/system/mem-qdrant-watcher.service
# Reload and start
sudo systemctl daemon-reload
sudo systemctl enable --now mem-qdrant-watcher
```
### Verify Installation
```bash
# Check service status
sudo systemctl status mem-qdrant-watcher
# Check collection
curl -s http://<QDRANT_IP>:6333/collections/memories_tr | jq '.result.points_count'
```
---
## Files
| File | Purpose |
|------|---------|
| `watcher/realtime_qdrant_watcher.py` | Capture daemon |
| `watcher/mem-qdrant-watcher.service` | Systemd service |
| `config.json` | Configuration template |
---
## Configuration
Edit `config.json` or set environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `QDRANT_URL` | `http://<QDRANT_IP>:6333` | Qdrant endpoint |
| `OLLAMA_URL` | `http://<OLLAMA_IP>:11434` | Ollama endpoint |
| `EMBEDDING_MODEL` | `snowflake-arctic-embed2` | Embedding model |
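| `USER_ID` | `<USER_ID>` | User identifier |
| `QDRANT_COLLECTION` | `memories_tr` | Target Qdrant collection |
| `OPENCLAW_SESSIONS_DIR` | `/root/.openclaw/agents/main/sessions` | Directory containing OpenClaw session `.jsonl` files |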
---
## How It Works
### Architecture Overview
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ OpenClaw Chat │────▶│ Session JSONL │────▶│ Base Watcher │
│ (You talking) │ │ (/sessions/*.jsonl) │ │ (This daemon) │
└─────────────────┘ └──────────────────┘ └────────┬────────┘
┌────────────────────────────────────────────────────────────────────┐
│ PROCESSING PIPELINE │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ Watch File │─▶│ Parse Turn │─▶│ Clean Text │─▶│ Embed │ │
│ │ (inotify) │ │ (JSON→dict) │ │ (strip md) │ │ (Ollama) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └─────┬─────┘ │
│ │ │
│ ┌───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Store to │─▶│ Qdrant │ │
│ │ memories_tr │ │ (vector DB) │ │
│ └──────────────┘ └──────────────┘ │
└────────────────────────────────────────────────────────────────────┘
```
### Step-by-Step Process
#### Step 1: File Watching
The watcher monitors OpenClaw session files in real-time:
```python
# From realtime_qdrant_watcher.py
SESSIONS_DIR = Path(os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions"))
```
> ⚠️ **Sessions Path:** `SESSIONS_DIR` defaults to `/root/.openclaw/agents/main/sessions`. Set the `OPENCLAW_SESSIONS_DIR` environment variable to watch a different directory.
**What happens:**
- Uses `inotify` or polling to watch the sessions directory
- Automatically detects the most recently modified `.jsonl` file
- Handles session rotation (when OpenClaw starts a new session)
- Maintains position in file to avoid re-processing old lines
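
The "maintains position in file" behaviour can be sketched as below. This is a minimal illustration under stated assumptions, not the watcher's actual code: `read_new_lines` is a hypothetical helper, and a byte offset stands in for the saved position. It returns only complete lines, so a turn that is still being written is picked up on the next poll.

```python
from pathlib import Path
from typing import List, Tuple


def read_new_lines(path: Path, offset: int) -> Tuple[List[str], int]:
    """Read complete lines appended after `offset`; return them with the new offset."""
    lines: List[str] = []
    with path.open("rb") as fh:
        fh.seek(offset)
        while True:
            raw = fh.readline()
            if not raw.endswith(b"\n"):
                break  # end of file, or a partially written line: retry next poll
            offset += len(raw)
            text = raw.decode("utf-8", errors="replace").strip()
            if text:
                lines.append(text)
    return lines, offset
```

A polling loop would call `read_new_lines(current_file, last_position)` on each tick and store the returned offset for the next call.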
#### Step 2: Turn Parsing
Each conversation turn is extracted from the JSONL file:
```json
{
"type": "message",
"message": {
"role": "user",
"content": "Hello, can you help me?",
"timestamp": "2026-02-27T09:30:00Z"
}
}
```
**What happens:**
- Reads new lines appended to the session file
- Parses JSON to extract role (user/assistant/system)
- Extracts content text
- Captures timestamp
- Generates unique turn ID from content hash + timestamp
**Code flow:**
```python
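import hashlib
import json
import os
from typing import Dict, Optional

def parse_turn(line: str) -> Optional[Dict]:
    data = json.loads(line)
    if data.get("type") != "message":
        return None  # Skip non-message entries
    # Fields live under the "message" key (see the example entry above)
    message = data.get("message", {})
    role = message.get("role")
    content = message.get("content", "")
    timestamp = message.get("timestamp", "")
    return {
        # Short hash of content + timestamp identifies the turn
        "id": hashlib.md5(f"{content}{timestamp}".encode()).hexdigest()[:16],
        "role": role,
        "content": content,
        "timestamp": timestamp,
        "user_id": os.getenv("USER_ID", "default")
    }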
```
#### Step 3: Content Cleaning
Before storage, content is normalized:
**Strips:**
- Markdown tables (`| column | column |`)
- Bold/italic markers (`**text**`, `*text*`)
- Inline code (`` `code` ``)
- Code blocks (```code```)
- Multiple consecutive spaces
- Leading/trailing whitespace
**Example:**
```
Input: "Check this **important** table: | col1 | col2 |"
Output: "Check this important table"
```
**Why:** Clean text improves embedding quality and searchability.
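
A cleaning pass along these lines can be sketched with regular expressions. This is an illustration rather than the watcher's implementation: `clean_text` is a hypothetical name, the patterns are modelled on `clean_content` in `backfill_memory_to_q.py`, and the table pattern is an assumption. Fenced blocks are dropped entirely, while inline code keeps its text and loses only the backticks. On the example input this sketch keeps the trailing colon, so its output is slightly different from the one shown above.

```python
import re


def clean_text(text: str) -> str:
    """Strip common Markdown markup so only plain prose is embedded."""
    text = re.sub(r"```[\s\S]*?```", "", text)      # fenced code blocks (dropped entirely)
    text = re.sub(r"`([^`]+)`", r"\1", text)        # inline code: keep text, drop backticks
    text = re.sub(r"\*\*([^*]+)\*\*", r"\1", text)  # bold
    text = re.sub(r"\*([^*]+)\*", r"\1", text)      # italic
    text = re.sub(r"\|[^\n]*\|", "", text)          # table row: first pipe to last pipe on a line
    text = re.sub(r"[ \t]{2,}", " ", text)          # runs of spaces or tabs
    return text.strip()
```

Removing fenced blocks before inline code matters: applied the other way round, the inline pattern consumes the fence backticks and the block body is left in place.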
#### Step 4: Embedding Generation
The cleaned content is converted to a vector embedding:
```python
def get_embedding(text: str) -> List[float]:
    response = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": EMBEDDING_MODEL, "prompt": text}
    )
    return response.json()["embedding"]
```
**What happens:**
- Sends text to the Ollama API at `OLLAMA_URL` (for example `http://10.0.0.10:11434`)
- Uses `snowflake-arctic-embed2` model
- Returns **1024-dimensional vector** (not 768)
- If Ollama is unavailable, logs the error and skips the turn
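
A more defensive variant of this step might look like the sketch below. It is not the watcher's code. It uses only the standard library (`urllib`) instead of `requests` so that it stays self-contained, and `post_json`, `EXPECTED_DIM` and the injectable `post` parameter are assumptions made for illustration. It returns `None` on failure and rejects vectors whose size does not match the collection.

```python
import json
import urllib.request
from typing import Callable, List, Optional

EXPECTED_DIM = 1024  # snowflake-arctic-embed2 output size, as stated above


def post_json(url: str, body: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the JSON response (standard library only)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_embedding(
    text: str,
    ollama_url: str,
    model: str = "snowflake-arctic-embed2",
    post: Callable[[str, dict], dict] = post_json,
) -> Optional[List[float]]:
    """Return the embedding, or None when Ollama fails or the size is unexpected."""
    try:
        data = post(f"{ollama_url}/api/embeddings", {"model": model, "prompt": text})
        vector = data["embedding"]
    except (OSError, KeyError, ValueError) as exc:  # network, missing key, bad JSON
        print(f"embedding failed: {exc}")
        return None
    if len(vector) != EXPECTED_DIM:
        print(f"unexpected embedding size {len(vector)}, expected {EXPECTED_DIM}")
        return None
    return vector
```

Passing the transport in as `post` keeps the function testable without a running Ollama instance.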
#### Step 5: Qdrant Storage
The complete turn data is stored to Qdrant:
```python
payload = {
    "user_id": user_id,
    "role": turn["role"],
    "content": cleaned_content[:2000],  # Size limit
    "timestamp": turn["timestamp"],
    "session_id": session_id,
    "source": "true-recall-base"
}
requests.put(
    f"{QDRANT_URL}/collections/memories_tr/points",
    json={"points": [{"id": turn_id, "vector": embedding, "payload": payload}]}
)
```
**Storage format:**
| Field | Type | Description |
|-------|------|-------------|
| `user_id` | string | User identifier |
| `role` | string | user/assistant/system |
| `content` | string | Cleaned text (max 2000 chars) |
| `timestamp` | string | ISO 8601 timestamp |
| `session_id` | string | Source session file |
| `source` | string | "true-recall-base" |
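
Qdrant point IDs must be unsigned integers or UUIDs, so the turn ID has to be mapped to one of those forms before the upsert. The sketch below derives an integer ID from a SHA-256 digest, the same approach `backfill_memory_to_q.py` uses, and assembles a point with the fields from the table above. `make_point_id` and `build_point` are hypothetical names, and the exact fields hashed are an assumption.

```python
import hashlib
from typing import Any, Dict, List

MAX_CONTENT_CHARS = 2000  # size limit from the storage table


def make_point_id(user_id: str, timestamp: str, content: str) -> int:
    """Derive a stable, non-negative 63-bit integer ID from the turn's identity."""
    key = f"{user_id}:{timestamp}:{content[:100]}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], byteorder="big") % (2**63)


def build_point(
    turn: Dict[str, str], vector: List[float], user_id: str, session_id: str
) -> Dict[str, Any]:
    """Assemble one point in the shape sent to PUT /collections/<name>/points."""
    content = turn["content"][:MAX_CONTENT_CHARS]
    return {
        "id": make_point_id(user_id, turn["timestamp"], content),
        "vector": vector,
        "payload": {
            "user_id": user_id,
            "role": turn["role"],
            "content": content,
            "timestamp": turn["timestamp"],
            "session_id": session_id,
            "source": "true-recall-base",
        },
    }
```

Because the ID is deterministic, re-processing the same turn overwrites the existing point instead of creating a duplicate.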
### Real-Time Performance
| Metric | Target | Actual |
|--------|--------|--------|
| Latency | < 500ms | ~100-200ms |
| Throughput | > 10 turns/sec | > 50 turns/sec |
| Embedding time | < 300ms | ~50-100ms |
| Qdrant write | < 100ms | ~10-50ms |
### Session Rotation Handling
When OpenClaw starts a new session:
1. New `.jsonl` file created in sessions directory
2. Watcher detects the change (`inotify` or the 1-second mtime polling added in v1.1)
3. Identifies most recently modified file
4. Switches to watching new file
5. Continues from position 0 of new file
6. Turns from the old file remain in `memories_tr` (already captured)
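
The switch in steps 3 to 5 can be sketched as a small state update. This is the simple "newest mtime wins" rule only; it does not reproduce the priority-based detection introduced in v1.2, and `follow_rotation` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional, Tuple


def follow_rotation(
    sessions_dir: Path, current: Optional[Path], position: int
) -> Tuple[Optional[Path], int]:
    """Return (file, position); position resets to 0 when a newer file takes over."""
    files = list(sessions_dir.glob("*.jsonl"))
    if not files:
        return current, position
    newest = max(files, key=lambda p: p.stat().st_mtime)
    if newest != current:
        return newest, 0  # new session: start reading from the top
    return current, position
```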
### Error Handling
**Qdrant unavailable:**
- Retries with exponential backoff
- Logs error, continues watching
- Next turn attempts storage again
**Ollama unavailable:**
- Cannot generate embeddings
- Logs error, skips turn
- Continues watching (no data loss in file)
**File access errors:**
- Handles permission issues gracefully
- Retries on temporary failures
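
Retrying with exponential backoff can be sketched generically. The attempt count and delays below are illustrative assumptions, not the watcher's actual settings, and `retry_with_backoff` is a hypothetical helper.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `operation`, doubling the wait after each failure; None if all attempts fail."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:  # a real watcher would likely catch narrower errors
            print(f"attempt {attempt + 1}/{attempts} failed: {exc}")
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return None
```

A Qdrant write could then be wrapped as `retry_with_backoff(lambda: store(point))`, where `store` is whatever function performs the upsert.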
### Collection Schema
**Qdrant collection: `memories_tr`**
```python
{
    "name": "memories_tr",
    "vectors": {
        "size": 1024,            # snowflake-arctic-embed2 dimension (1024, not 768)
        "distance": "Cosine"     # Similarity metric
    },
    "payload_schema": {
        "user_id": "keyword",    # Filterable
        "role": "keyword",       # Filterable
        "timestamp": "datetime", # Range filterable
        "content": "text"        # Full-text searchable
    }
}
```
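
In Qdrant's REST API, the collection and its payload indexes are created with separate calls. The sketch below only builds the requests that would correspond to the schema above and sends nothing, so it can be inspected safely. `collection_setup_requests` is a hypothetical helper, and the `datetime` index type assumes a reasonably recent Qdrant release.

```python
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]  # (HTTP method, URL, JSON body)


def collection_setup_requests(
    qdrant_url: str, name: str = "memories_tr", size: int = 1024
) -> List[Request]:
    """List the REST calls that create the collection and its payload indexes."""
    base = f"{qdrant_url.rstrip('/')}/collections/{name}"
    calls: List[Request] = [
        # Create the collection with a single unnamed vector
        ("PUT", base, {"vectors": {"size": size, "distance": "Cosine"}}),
    ]
    for field, schema in [
        ("user_id", "keyword"),
        ("role", "keyword"),
        ("timestamp", "datetime"),
        ("content", "text"),
    ]:
        # One payload index per filterable or searchable field
        calls.append(("PUT", f"{base}/index", {"field_name": field, "field_schema": schema}))
    return calls
```

Each tuple can be sent with any HTTP client, for example `requests.request(method, url, json=body)`.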
### Security Notes
- **No credential storage** in code
- All sensitive values via environment variables
- `USER_ID` isolates memories per user
- Content cleaning strips Markdown formatting only; it does not remove PII, so review your data
- HTTPS recommended for production Qdrant/Ollama
---
## Using Memories with OpenClaw
### The "q" Command
**"q"** refers to your Qdrant memory system (`memories_tr` collection).
When interacting with OpenClaw agents, you can search your stored memories using:
- `search q <topic>` - Semantic search for past conversations
- `q <topic>` - Shortcut for the same
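
How the agent resolves `search q` is not shown here. As a rough sketch, a semantic search against `memories_tr` would embed the topic and post the vector to Qdrant's `/collections/memories_tr/points/search` endpoint. The helpers below build that request body and format the hits newest first with a 250-character preview, mirroring `scripts/search_q.sh`. Both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_search_request(
    vector: List[float], user_id: Optional[str] = None, limit: int = 10
) -> Dict[str, Any]:
    """Body for POST /collections/memories_tr/points/search."""
    body: Dict[str, Any] = {"vector": vector, "limit": limit, "with_payload": True}
    if user_id is not None:
        # Restrict results to one user's memories
        body["filter"] = {"must": [{"key": "user_id", "match": {"value": user_id}}]}
    return body


def format_hits(hits: List[Dict[str, Any]], preview: int = 250) -> List[str]:
    """Render hits newest first, truncating long content."""
    ordered = sorted(hits, key=lambda h: h["payload"].get("timestamp", ""), reverse=True)
    lines = []
    for hit in ordered:
        payload = hit["payload"]
        text = payload.get("content", "")
        if len(text) > preview:
            text = text[:preview] + "..."
        lines.append(f'{payload.get("timestamp", "?")} [{payload.get("role", "?")}] {text}')
    return lines
```

Sorting on the raw timestamp string works because the stored values are ISO 8601, which orders correctly as text when the format is consistent.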
### Context Injection Instructions
**For OpenClaw System Prompt:**
Add these lines to your agent's system context to enable memory-aware responses:
```
## Memory System (q)
**"q" = Qdrant collection `memories_tr`** — your conversation history database.
### Memory Retrieval Rules
**Before saying "I don't know" or "I can't do that":**
1. **ALWAYS search q first** using the topic/keywords from the user's request
2. Incorporate findings INTO your response (not as footnotes)
3. Reference specific dates/details: "Based on our Feb 27th discussion..."
**Example workflow:**
User asks about X → Search q for X → Use retrieved memories → Answer
**WRONG:**
> "I searched Qdrant and found X. [Generic answer unrelated to X]"
**RIGHT:**
> "You asked me to fix this on Feb 27th — do you want me to apply the fix now?"
### When to Search q
**ALWAYS search automatically when:**
- Question references past events, conversations, or details
- User asks "remember when...", "what did we discuss...", "what did I tell you..."
- You're unsure if you have relevant context
- ANY question about configuration, memories, or past interactions
**DO NOT search for:**
- General knowledge questions you can answer directly
- Current time, weather, or factual queries
- Simple requests like "check my email" or "run a command"
- When you already have sufficient context in the conversation
```
### Search Priority
| Order | Source | When to Use |
|-------|--------|-------------|
| 1 | **q (Qdrant)** | First - semantic search of all conversations |
| 2 | `memory/` files | Fallback if q yields no results |
| 3 | Web search | Last resort |
| 4 | "I don't know" | Only after all above |
---
## Next Step
### ✅ Base is Complete
**You don't need to upgrade.** TrueRecall Base is a **fully functional, standalone memory system**. If you're happy with real-time capture and manual search via the `q` command, you can stop here.
Base gives you:
- ✅ Complete conversation history in Qdrant
- ✅ Semantic search via `search q <topic>`
- ✅ Full-text search capabilities
- ✅ Permanent storage of all conversations
**Upgrade only if** you want automatic context injection into prompts.
---
### Optional Addons
Install an **addon** for automatic curation and injection:
| Addon | Purpose | Status |
|-------|---------|--------|
| **Gems** | Extracts atomic gems from memories, injects into context | 🚧 Coming Soon |
| **Blocks** | Topic clustering, contextual block retrieval | 🚧 Coming Soon |
### Upgrade Paths
Once Base is running, you have two upgrade options:
#### Option 1: Gems (Atomic Memory)
**Best for:** Conversational context, quick recall
- **Curator** extracts "gems" (key insights) from `memories_tr`
- Stores curated gems in `gems_tr` collection
- **Injection plugin** recalls relevant gems into prompts automatically
- Optimized for: Chat assistants, help bots, personal memory
**Workflow:**
```
memories_tr → Curator → gems_tr → Injection → Context
```
#### Option 2: Blocks (Topic Clustering)
**Best for:** Document organization, topic-based retrieval
- Clusters conversations by topic automatically
- Creates `topic_blocks_tr` collection
- Retrieves entire contextual blocks on query
- Optimized for: Knowledge bases, document systems
**Workflow:**
```
memories_tr → Topic Engine → topic_blocks_tr → Retrieval → Context
```
**Note:** Gems and Blocks are **independent** addons. They both require Base, but you choose one based on your use case.
---
## Updating / Patching
If you already have TrueRecall Base installed and need to apply a bug fix or update:
### Quick Update (v1.2 Patch)
**Applies to:** Session file detection fix (the watcher picked the wrong file when multiple sessions were active)
```bash
# 1. Backup current watcher
cp /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py.bak.$(date +%Y%m%d)
# 2. Download latest watcher (choose one source)
# Option A: From GitHub
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \
https://raw.githubusercontent.com/speedyfoxai/openclaw-true-recall-base/master/watcher/realtime_qdrant_watcher.py
# Option B: From GitLab
curl -o /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py \
https://gitlab.com/mdkrush/true-recall-base/-/raw/master/watcher/realtime_qdrant_watcher.py
# Option C: From local git (if cloned)
cp /path/to/true-recall-base/watcher/realtime_qdrant_watcher.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/
# 3. Stop old watcher
pkill -f realtime_qdrant_watcher
# 4. Start new watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
# 5. Verify
ps aux | grep watcher
lsof -p $(pgrep -f realtime_qdrant_watcher) | grep jsonl
```
### Update with Git (If Cloned)
```bash
cd /path/to/true-recall-base
git pull origin master
# Copy updated files
cp watcher/realtime_qdrant_watcher.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/
# Optional: copy the backfill script
cp scripts/backfill_memory_to_q.py \
/root/.openclaw/workspace/skills/qdrant-memory/scripts/ 2>/dev/null || true
# Restart watcher
sudo systemctl restart mem-qdrant-watcher
# OR manually:
pkill -f realtime_qdrant_watcher
python3 /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py --daemon
```
### Verify Update Applied
```bash
# Check version in file
grep "v1.2" /root/.openclaw/workspace/skills/qdrant-memory/scripts/realtime_qdrant_watcher.py
# Verify watcher is running
ps aux | grep realtime_qdrant_watcher
# Confirm watching main session (not subagent)
lsof -p $(pgrep -f realtime_qdrant_watcher) | grep jsonl
# Check recent captures in Qdrant
curl -s "http://<QDRANT_IP>:6333/collections/memories_tr/points/scroll" \
-H "Content-Type: application/json" \
-d '{"limit": 3, "with_payload": true}' | jq -r '.result.points[].payload.timestamp'
```
### What's New in v1.2
| Feature | Benefit |
|---------|---------|
| **Priority-based session detection** | Always picks `agent:main:main` first |
| **Lock file validation** | Ignores stale/crashed session locks via PID check |
| **Inactive subagent filtering** | Skips sessions with `sessionFile=null` |
| **Backfill script** | Import historical memories from markdown files |
**No config changes required** - existing `config.json` works unchanged.
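
The lock file validation listed above relies on a PID check. The lock file format is not documented here, so the sketch below shows only the liveness test on POSIX systems; `pid_is_alive` is a hypothetical name, and reading the PID out of the lock file is left to the caller.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 checks existence only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # sends no signal, only performs the permission/existence check
    except ProcessLookupError:
        return False     # no such process: the lock is stale
    except PermissionError:
        return True      # process exists but belongs to another user
    return True
```

A lock whose recorded PID fails this check can be treated as left behind by a crashed session.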
---
**Prerequisite for:** TrueRecall Gems, TrueRecall Blocks

12
config.json Normal file

@@ -0,0 +1,12 @@
{
"version": "1.0",
"description": "TrueRecall Base - Memory capture",
"components": ["watcher"],
"collections": {
"memories": "memories_tr"
},
"qdrant_url": "http://<QDRANT_IP>:6333",
"ollama_url": "http://<OLLAMA_IP>:11434",
"embedding_model": "snowflake-arctic-embed2",
"user_id": "<USER_ID>"
}

98
install.sh Normal file

@@ -0,0 +1,98 @@
#!/bin/bash
# TrueRecall Base - Simple Installer
# Usage: ./install.sh
set -e
echo "=========================================="
echo "TrueRecall Base - Installer"
echo "=========================================="
echo ""
# Default values
DEFAULT_QDRANT_IP="localhost:6333"
DEFAULT_OLLAMA_IP="localhost:11434"
DEFAULT_USER_ID="user"
# Get user input with defaults
echo "Configuration (press Enter for defaults):"
echo ""
echo "Examples:"
echo " Qdrant: 10.0.0.40:6333 (remote) or localhost:6333 (local)"
echo " Ollama: 10.0.0.10:11434 (remote) or localhost:11434 (local)"
echo ""
read -p "Qdrant host:port [$DEFAULT_QDRANT_IP]: " QDRANT_IP
QDRANT_IP=${QDRANT_IP:-$DEFAULT_QDRANT_IP}
read -p "Ollama host:port [$DEFAULT_OLLAMA_IP]: " OLLAMA_IP
OLLAMA_IP=${OLLAMA_IP:-$DEFAULT_OLLAMA_IP}
read -p "User ID [$DEFAULT_USER_ID]: " USER_ID
USER_ID=${USER_ID:-$DEFAULT_USER_ID}
echo ""
echo "Configuration:"
echo " Qdrant: http://$QDRANT_IP"
echo " Ollama: http://$OLLAMA_IP"
echo " User ID: $USER_ID"
echo ""
read -p "Proceed? [Y/n]: " CONFIRM
if [[ $CONFIRM =~ ^[Nn]$ ]]; then
echo "Installation cancelled."
exit 0
fi
# Create service file
echo ""
echo "Creating systemd service..."
# Get absolute path (handles spaces)
INSTALL_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cat > /tmp/mem-qdrant-watcher.service << EOF
[Unit]
Description=TrueRecall Base - Real-Time Memory Watcher
After=network.target
[Service]
Type=simple
User=$USER
WorkingDirectory=$INSTALL_DIR/watcher
Environment="QDRANT_URL=http://$QDRANT_IP"
Environment="QDRANT_COLLECTION=memories_tr"
Environment="OLLAMA_URL=http://$OLLAMA_IP"
Environment="EMBEDDING_MODEL=snowflake-arctic-embed2"
Environment="USER_ID=$USER_ID"
ExecStart=/usr/bin/python3 $INSTALL_DIR/watcher/realtime_qdrant_watcher.py --daemon
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
EOF
# Install service
sudo cp /tmp/mem-qdrant-watcher.service /etc/systemd/system/
sudo systemctl daemon-reload
echo ""
echo "Starting service..."
sudo systemctl enable --now mem-qdrant-watcher
echo ""
echo "=========================================="
echo "Installation Complete!"
echo "=========================================="
echo ""
echo "Status:"
sudo systemctl status mem-qdrant-watcher --no-pager
echo ""
echo "Verify collection:"
echo " curl -s http://$QDRANT_IP/collections/memories_tr | jq '.result.points_count'"
echo ""
echo "View logs:"
echo " sudo journalctl -u mem-qdrant-watcher -f"


@@ -153,5 +153,15 @@ sessions_spawn({
**Status:** Configured and ready
## Git Repository Initialized
**Setup:** Git repo initialized for workspace version control
**Commits:**
- `d1357c5` — Initial commit: 77 files, 10,822 insertions (workspace setup)
- `98d14be` — MEMORY.md updated with sub-agent and git config
**Status:** Clean working tree, tracking active
---
*Stored for long-term memory retention*

168
memory/2026-02-14.md Normal file

@@ -0,0 +1,168 @@
# Website Details - SpeedyFoxAI
**Domain:** speedyfoxai.com
**Hosted on:** deb2 (10.0.0.39)
**Web Root:** /root/html/ (Nginx serves from here, NOT /var/www/html/)
**Created:** February 13, 2026
**Created by:** Kimi (OpenClaw + Ollama) via SSH
## Critical Discovery
Nginx config (`/etc/nginx/sites-enabled/default`) sets `root /root/html;`
This means `/var/www/html/` is NOT the live document root - `/root/html/` is.
## Visitor Counter System
### Current Status
- **Count:** 288 (restored from Feb 13 backup)
- **Location:** `/root/html/count.txt`
- **Persistent storage:** `/root/html/.counter_total`
- **Script:** `/root/html/update_count_persistent.sh`
### Why It Reset
Old script read from nginx access.log which gets rotated by logrotate. When logs rotate, count drops to near zero. Lost ~350 visits (was ~400+, dropped to 46).
### Fix Applied
New persistent counter that:
1. Stores total in `.counter_total` file
2. Tracks last log line counted in `.counter_last_line`
3. Only adds NEW visits since last run
4. Handles log rotation gracefully
## Site Versions
### Current Live Version
- **File:** `/root/html/index.html` (served by nginx)
- **Source:** `/var/www/html/index.html` (edit here, copy to /root/html/)
- **Style:** Simple HTML/CSS (Rob's Tech Lab theme)
- **Features:** Visitor counter, 3 embedded YouTube videos
### Backup Version (Full Featured)
- **Location:** `/root/html_backup/20260213_155707/index.html`
- **Style:** Tailwind CSS, full SpeedyFoxAI branding
- **Features:** Dark mode, FAQ section, navigation, full counter
## YouTube Channel
**Name:** SpeedyFoxAI
**URL:** https://www.youtube.com/@SpeedyFoxAi
**Stats:** 5K+ subscribers, 51+ videos
### Embedded Videos
1. DIY AI Assistant Setup (kz-4l5roK6k)
2. Self-Hosted Tools Deep Dive (9IYNGK44EyM)
3. OpenClaw + Ollama Workflow (8Fncc5Sg2yg)
## File Structure
```
/root/html/ # LIVE site (nginx root)
├── index.html # Main page
├── count.txt # Visitor count (288)
├── .counter_total # Persistent count storage
├── .counter_last_line # Log line tracking
├── update_count_persistent.sh # Counter script
├── websitememory.md # Documentation
├── downloads.html
├── fox720.jpg
└── favicon.png
/root/html_backup/ # Backups with timestamps
├── 20260214_071243/ # Pre-counter-script backup
├── 20260214_070713/ # Before counter fix
├── 20260214_070536/ # Full backup
└── 20260213_155707/ # Full version with counter
/var/www/html/ # Edit source (copy to /root/html/)
└── index.html
```
## Technical Details
### Counter JavaScript
```javascript
fetch("/count.txt?t=" + Date.now())
.then(r => r.text())
.then(n => {
document.getElementById("visit-count").textContent =
parseInt(n || 0).toString().padStart(6, "0");
});
```
### Counter Display
- Location: Footer
- Format: "Visitors: 000288"
- Style: 10px font, opacity 0.5
### Backup Strategy
```bash
DT=$(date +%Y%m%d_%H%M%S)
mkdir -p /root/html_backup/${DT}
cp -r /root/html/* /root/html_backup/${DT}/
cp /var/www/html/* /root/html_backup/${DT}/
```
## SEO & Content
- **Title:** Rob's Tech Lab | Local AI & Self-Hosted Tools
- **Meta:** None (simple version)
- **Schema:** None (simple version)
- **Full version has:** Schema.org FAQPage, structured data
## Social Links
- YouTube: @SpeedyFoxAi
- Discord: mdkrush
- GitHub: mdkrush
- Twitter: mdkrush
## Creator
**Name:** Rob
**Brand:** SpeedyFoxAI
**Focus:** Self-hosting, local AI, automation tools
**Personality:** Comical/structured humor
## Status
- **Counter:** Working (shows 000288)
- **HTML:** Valid structure, no misaligned code
- **Backups:** Multiple timestamps available
- **Documentation:** /root/html/websitememory.md
Stored: February 14, 2026
## Relationship to YouTube
The SpeedyFoxAI.com website complements the YouTube channel @SpeedyFoxAi. It serves as a hub for video content, resources, and contact info, with embedded videos linking directly to the channel. Design and branding are consistent across both platforms.
---
## UPDATE - Feb 14, 07:21
### Counter Reset Issue - ROOT CAUSE FOUND
**Problem:** Count kept resetting to current nginx log lines (86, 89, etc.)
**Root Cause:** Old script `/root/html/update_count.sh` still existed and was running:
```bash
#!/bin/bash
COUNT=$(wc -l < /var/log/nginx/access.log 2>/dev/null || echo 0)
echo "$COUNT" > /root/html/count.txt
```
This script was periodically overwriting count.txt with nginx log line count, overriding the persistent counter.
**Fix Applied:**
- Removed `/root/html/update_count.sh`
- Restored count.txt to 288
- Persistent counter now working correctly
**Lesson:** Check for competing scripts before implementing fixes.
---
## Rule Added - Feb 14, 2026
**Always validate after changes.** No exceptions.
- Test functionality
- Verify file integrity
- Check permissions
- Confirm expected output
Applied retroactively to today's counter fix.


@@ -0,0 +1,198 @@
#!/usr/bin/env python3
"""
Backfill memories_tr collection from memory markdown files.
Processes all .md files in /root/.openclaw/workspace/memory/
and stores them to Qdrant memories_tr collection.
Usage:
python3 backfill_memory_to_q.py [--dry-run]
"""
import argparse
import hashlib
import json
import os
import re
import sys
from pathlib import Path
from datetime import datetime, timezone
from typing import List, Optional, Dict, Any
import requests
# Config
QDRANT_URL = os.getenv("QDRANT_URL", "http://10.0.0.40:6333")
COLLECTION_NAME = "memories_tr"
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://10.0.0.10:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
MEMORY_DIR = Path("/root/.openclaw/workspace/memory")
USER_ID = os.getenv("USER_ID", "rob")  # same default as the watcher, overridable via environment


def get_embedding(text: str) -> Optional[List[float]]:
    """Generate embedding using Ollama"""
    try:
        response = requests.post(
            f"{OLLAMA_URL}/api/embeddings",
            json={"model": EMBEDDING_MODEL, "prompt": text[:4000]},
            timeout=30
        )
        response.raise_for_status()
        return response.json()["embedding"]
    except Exception as e:
        print(f"Error getting embedding: {e}", file=sys.stderr)
        return None


def clean_content(text: str) -> str:
    """Clean markdown content for storage"""
    # Remove fenced code blocks first; otherwise the inline-code pattern
    # consumes the fence backticks and the blocks are never stripped
    text = re.sub(r'```[\s\S]*?```', '', text)
    # Remove markdown formatting
    text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)
    text = re.sub(r'\*([^*]+)\*', r'\1', text)
    text = re.sub(r'`([^`]+)`', r'\1', text)
    # Remove headers
    text = re.sub(r'^#{1,6}\s+', '', text, flags=re.MULTILINE)
    # Remove excess whitespace
    text = re.sub(r'\n{3,}', '\n\n', text)
    return text.strip()


def parse_memory_file(file_path: Path) -> List[Dict[str, Any]]:
    """Parse a memory markdown file into entries"""
    entries = []
    try:
        content = file_path.read_text(encoding='utf-8')
    except Exception as e:
        print(f"Error reading {file_path}: {e}", file=sys.stderr)
        return entries

    # Extract date from filename
    date_match = re.search(r'(\d{4}-\d{2}-\d{2})', file_path.name)
    date_str = date_match.group(1) if date_match else datetime.now().strftime('%Y-%m-%d')

    # Split by session headers (## Session: or ## Update:)
    sessions = re.split(r'\n## ', content)

    for i, session in enumerate(sessions):
        if not session.strip():
            continue

        # Extract session title if present
        title_match = re.match(r'Session:\s*(.+)', session, re.MULTILINE)
        if not title_match:
            title_match = re.match(r'Update:\s*(.+)', session, re.MULTILINE)
        session_title = title_match.group(1).strip() if title_match else f"Session {i}"

        # Extract key events, decisions, and content
        # Look for bullet points and content
        sections = session.split('\n### ')

        for section in sections:
            if not section.strip():
                continue

            # Clean the content
            cleaned = clean_content(section)
            if len(cleaned) < 20:  # Skip very short sections
                continue

            entry = {
                'content': cleaned[:2000],
                'role': 'assistant',  # These are summaries
                'date': date_str,
                'session_title': session_title,
                'file': file_path.name,
                'source': 'memory-backfill'
            }
            entries.append(entry)

    return entries


def store_to_qdrant(entry: Dict[str, Any], dry_run: bool = False) -> bool:
    """Store a memory entry to Qdrant"""
    content = entry['content']

    if dry_run:
        print(f"[DRY RUN] Would store: {content[:60]}...")
        return True

    vector = get_embedding(content)
    if vector is None:
        return False

    # Generate deterministic ID
    hash_content = f"{USER_ID}:{entry['date']}:{content[:100]}"
    hash_bytes = hashlib.sha256(hash_content.encode()).digest()[:8]
    point_id = abs(int.from_bytes(hash_bytes, byteorder='big') % (2**63))

    payload = {
        'user_id': USER_ID,
        'role': entry.get('role', 'assistant'),
        'content': content,
        'date': entry['date'],
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'source': entry.get('source', 'memory-backfill'),
        'file': entry.get('file', ''),
        'session_title': entry.get('session_title', ''),
        'curated': True  # Mark as curated since these are processed
    }

    try:
        response = requests.put(
            f"{QDRANT_URL}/collections/{COLLECTION_NAME}/points",
            json={'points': [{'id': point_id, 'vector': vector, 'payload': payload}]},
            timeout=30
        )
        response.raise_for_status()
        return True
    except Exception as e:
        print(f"Error storing to Qdrant: {e}", file=sys.stderr)
        return False


def main():
    parser = argparse.ArgumentParser(description='Backfill memory files to Qdrant')
    parser.add_argument('--dry-run', '-n', action='store_true', help='Dry run - do not write to Qdrant')
    parser.add_argument('--limit', '-l', type=int, default=None, help='Limit number of files to process')
    args = parser.parse_args()

    if not MEMORY_DIR.exists():
        print(f"Memory directory not found: {MEMORY_DIR}", file=sys.stderr)
        sys.exit(1)

    # Get all markdown files
    md_files = sorted(MEMORY_DIR.glob('*.md'))
    if args.limit:
        md_files = md_files[:args.limit]

    print(f"Found {len(md_files)} memory files to process")
    print(f"Target collection: {COLLECTION_NAME}")
    print(f"Qdrant URL: {QDRANT_URL}")
    print(f"Ollama URL: {OLLAMA_URL}")
    print()

    total_entries = 0
    stored = 0
    failed = 0

    for file_path in md_files:
        print(f"Processing: {file_path.name}")
        entries = parse_memory_file(file_path)
        for entry in entries:
            total_entries += 1
            if store_to_qdrant(entry, args.dry_run):
                stored += 1
                print(f" ✅ Stored entry {stored}")
            else:
                failed += 1
                print(f" ❌ Failed entry {failed}")

    print()
    print(f"Done! Processed {len(md_files)} files")
    print(f"Total entries: {total_entries}")
    print(f"Stored: {stored}")
    print(f"Failed: {failed}")


if __name__ == '__main__':
    main()

87
scripts/search_q.sh Executable file

@@ -0,0 +1,87 @@
#!/bin/bash
# search_q.sh - Search memories with chronological sorting
# Usage: ./search_q.sh "search query"
# Returns: Results sorted by timestamp (newest first)
set -e
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"
COLLECTION="${QDRANT_COLLECTION:-memories_tr}"
LIMIT="${SEARCH_LIMIT:-10}"
if [ -z "$1" ]; then
echo "Usage: ./search_q.sh 'your search query'"
echo ""
echo "Environment variables:"
echo " QDRANT_URL - Qdrant endpoint (default: http://localhost:6333)"
echo " SEARCH_LIMIT - Number of results (default: 10)"
exit 1
fi
QUERY="$1"
echo "=========================================="
echo "Searching: '$QUERY'"
echo "=========================================="
echo ""
# Search with scroll to get all results, then sort by timestamp
# Using scroll API to handle large result sets
SCROLL_ID="null"
ALL_RESULTS="[]"
while true; do
if [ "$SCROLL_ID" = "null" ]; then
RESPONSE=$(curl -s -X POST "$QDRANT_URL/collections/$COLLECTION/points/scroll" \
-H "Content-Type: application/json" \
-d "{
\"limit\": $LIMIT,
\"with_payload\": true,
\"filter\": {
\"must\": [
{
\"key\": \"content\",
\"match\": {
\"text\": \"$QUERY\"
}
}
]
}
}" 2>/dev/null || echo '{"result": {"points": []}}')
else
break # For text search, we get results in first call
fi
# Extract results
POINTS=$(echo "$RESPONSE" | jq -r '.result.points // []')
if [ "$POINTS" = "[]" ] || [ "$POINTS" = "null" ]; then
break
fi
ALL_RESULTS="$POINTS"
break
done
# Sort by timestamp (newest first) and format output
echo "$ALL_RESULTS" | jq -r '
sort_by(.payload.timestamp) | reverse |
.[] |
"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n" +
"📅 " + (.payload.timestamp | split("T") | join(" ")) + "\n" +
"👤 " + .payload.role + " | User: " + .payload.user_id + "\n" +
"📝 " + (.payload.content | if length > 250 then .[0:250] + "..." else . end) + "\n"
' 2>/dev/null | tee /tmp/search_results.txt
# Count results
# grep -c already prints 0 (and exits non-zero) when nothing matches
RESULT_COUNT=$(grep -c "━━━━━━━━" /tmp/search_results.txt 2>/dev/null || true)
RESULT_COUNT=${RESULT_COUNT:-0}
echo ""
echo "=========================================="
if [ "$RESULT_COUNT" -gt 0 ]; then
echo "Found $RESULT_COUNT result(s). Most recent shown first."
else
echo "No results found for '$QUERY'"
fi
echo "=========================================="
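
The request construction and newest-first ordering used by search_q.sh can also be sketched in Python. This is a minimal illustration, not part of the commit: `build_scroll_body` and `newest_first` are hypothetical helper names, and the sort assumes every timestamp is an ISO 8601 string in the same format and timezone, so that string order matches time order.

```python
import json
from typing import Any, Dict, List


def build_scroll_body(query: str, limit: int = 10) -> str:
    """Serialize the scroll request; json.dumps escapes quotes and backslashes in the query."""
    body = {
        "limit": limit,
        "with_payload": True,
        "filter": {"must": [{"key": "content", "match": {"text": query}}]},
    }
    return json.dumps(body)


def newest_first(points: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort points by payload timestamp, newest first; missing timestamps sort last."""
    return sorted(
        points,
        key=lambda p: (p.get("payload") or {}).get("timestamp") or "",
        reverse=True,
    )


if __name__ == "__main__":
    # Sample data for illustration only
    print(build_scroll_body('say "hello"', limit=5))
    sample = [
        {"id": 1, "payload": {"timestamp": "2026-03-01T10:00:00+00:00"}},
        {"id": 2, "payload": {"timestamp": "2026-03-04T09:00:00+00:00"}},
    ]
    print([p["id"] for p in newest_first(sample)])
```

Serializing the body with a JSON library rather than string interpolation means a query containing double quotes still produces a valid request.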


@@ -0,0 +1,19 @@
[Unit]
Description=TrueRecall Base - Real-Time Memory Watcher
After=network.target
[Service]
Type=simple
User=<USER>
WorkingDirectory=<INSTALL_PATH>/true-recall-base/watcher
Environment="QDRANT_URL=http://<QDRANT_IP>:6333"
Environment="QDRANT_COLLECTION=memories_tr"
Environment="OLLAMA_URL=http://<OLLAMA_IP>:11434"
Environment="EMBEDDING_MODEL=snowflake-arctic-embed2"
Environment="USER_ID=<USER_ID>"
ExecStart=/usr/bin/python3 <INSTALL_PATH>/true-recall-base/watcher/realtime_qdrant_watcher.py --daemon
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target


@@ -0,0 +1,445 @@
#!/usr/bin/env python3
"""
TrueRecall v1.2 - Real-time Qdrant Watcher
Monitors OpenClaw sessions and stores to memories_tr instantly.
This is the CAPTURE component. For curation and injection, install v2.
Changelog:
- v1.2: Fixed session rotation bug - added inactivity detection (30s threshold)
and improved file scoring to properly detect new sessions on /new or /reset
- v1.1: Added 1-second mtime polling for session rotation
- v1.0: Initial release
"""
import os
import sys
import json
import time
import signal
import hashlib
import argparse
import requests
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Any, Optional, List
# Config
QDRANT_URL = os.getenv("QDRANT_URL", "http://10.0.0.40:6333")
QDRANT_COLLECTION = os.getenv("QDRANT_COLLECTION", "memories_tr")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://10.0.0.10:11434")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "snowflake-arctic-embed2")
USER_ID = os.getenv("USER_ID", "rob")
# Paths
SESSIONS_DIR = Path(os.getenv("OPENCLAW_SESSIONS_DIR", "/root/.openclaw/agents/main/sessions"))
# State
running = True
last_position = 0
current_file = None
turn_counter = 0
def signal_handler(signum, frame):
    global running
    print(f"\nReceived signal {signum}, shutting down...", file=sys.stderr)
    running = False
def get_embedding(text: str) -> Optional[List[float]]:
    """Return the embedding vector for text, or None if the Ollama request fails."""
    try:
        response = requests.post(
            f"{OLLAMA_URL}/api/embeddings",
            json={"model": EMBEDDING_MODEL, "prompt": text},
            timeout=30
        )
        response.raise_for_status()
        return response.json()["embedding"]
    except Exception as e:
        print(f"Error getting embedding: {e}", file=sys.stderr)
        return None
def clean_content(text: str) -> str:
    import re
    # Remove metadata JSON blocks
    text = re.sub(r'Conversation info \(untrusted metadata\):\s*```json\s*\{[\s\S]*?\}\s*```', '', text)
    # Remove thinking tags
    text = re.sub(r'\[thinking:[^\]]*\]', '', text)
    # Remove timestamp lines
    text = re.sub(r'\[\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2} [A-Z]{3}\]', '', text)
    # Remove markdown tables
    text = re.sub(r'\|[^\n]*\|', '', text)
    text = re.sub(r'\|[-:]+\|', '', text)
    # Remove fenced code blocks first, so the inline-code rule below
    # cannot consume part of a fence and leave the block behind
    text = re.sub(r'```[\s\S]*?```', '', text)
    # Remove markdown formatting
    text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text)
    text = re.sub(r'\*([^*]+)\*', r'\1', text)
    text = re.sub(r'`([^`]+)`', r'\1', text)
    # Remove horizontal rules
    text = re.sub(r'---+', '', text)
    text = re.sub(r'\*\*\*+', '', text)
    # Remove excess whitespace
    text = re.sub(r'\n{3,}', '\n', text)
    text = re.sub(r'[ \t]+', ' ', text)
    return text.strip()
def store_to_qdrant(turn: Dict[str, Any], dry_run: bool = False) -> bool:
    if dry_run:
        print(f"[DRY RUN] Would store turn {turn['turn']} ({turn['role']}): {turn['content'][:60]}...")
        return True
    vector = get_embedding(turn['content'])
    if vector is None:
        print(f"Failed to get embedding for turn {turn['turn']}", file=sys.stderr)
        return False
    payload = {
        "user_id": turn.get('user_id', USER_ID),
        "role": turn['role'],
        "content": turn['content'],
        "turn": turn['turn'],
        "timestamp": turn.get('timestamp', datetime.now(timezone.utc).isoformat()),
        "date": datetime.now(timezone.utc).strftime('%Y-%m-%d'),
        "source": "true-recall-base",
        "curated": False
    }
    # Generate a 63-bit point ID from user, turn number, and current time.
    # Note: the wall-clock component means the ID is NOT reproducible across
    # runs, so re-processing the same turn creates a new point.
    turn_id = turn.get('turn', 0)
    hash_bytes = hashlib.sha256(f"{USER_ID}:turn:{turn_id}:{datetime.now().strftime('%H%M%S')}".encode()).digest()[:8]
    # The modulo keeps the value non-negative, so no abs() is needed
    point_id = int.from_bytes(hash_bytes, byteorder='big') % (2**63)
    try:
        response = requests.put(
            f"{QDRANT_URL}/collections/{QDRANT_COLLECTION}/points",
            json={
                "points": [{
                    "id": point_id,
                    "vector": vector,
                    "payload": payload
                }]
            },
            timeout=30
        )
        response.raise_for_status()
        return True
    except Exception as e:
        print(f"Error writing to Qdrant: {e}", file=sys.stderr)
        return False
def is_lock_valid(lock_path: Path, max_age_seconds: int = 1800) -> bool:
    """Check if lock file is valid (not stale, PID exists)."""
    try:
        with open(lock_path, 'r') as f:
            data = json.load(f)
        # Check lock file age
        created = datetime.fromisoformat(data['createdAt'].replace('Z', '+00:00'))
        if (datetime.now(timezone.utc) - created).total_seconds() > max_age_seconds:
            return False
        # Check PID exists
        pid = data.get('pid')
        if pid and not os.path.exists(f"/proc/{pid}"):
            return False
        return True
    except Exception:
        return False
def get_current_session_file():
    """Find the most recently active session file.

    Priority (per subagent analysis consensus):
    1. Explicit agent:main:main lookup from sessions.json (highest priority)
    2. Lock files with valid PID + recent timestamp
    3. Parse sessions.json for other active sessions
    4. File scoring by mtime + size (fallback)
    """
    if not SESSIONS_DIR.exists():
        return None
    sessions_json = SESSIONS_DIR / "sessions.json"
    # PRIORITY 1: Explicit main session lookup
    if sessions_json.exists():
        try:
            with open(sessions_json, 'r') as f:
                sessions_data = json.load(f)
            # Look up agent:main:main explicitly
            main_session = sessions_data.get("agent:main:main", {})
            main_session_id = main_session.get('sessionId')
            if main_session_id:
                main_file = SESSIONS_DIR / f"{main_session_id}.jsonl"
                if main_file.exists():
                    return main_file
        except Exception as e:
            print(f"Warning: Failed to parse sessions.json for main session: {e}", file=sys.stderr)
    # PRIORITY 2: Lock files with PID validation
    lock_files = list(SESSIONS_DIR.glob("*.jsonl.lock"))
    valid_locks = [lf for lf in lock_files if is_lock_valid(lf)]
    if valid_locks:
        # Get the most recent valid lock file
        newest_lock = max(valid_locks, key=lambda p: p.stat().st_mtime)
        session_file = SESSIONS_DIR / newest_lock.name.replace('.jsonl.lock', '.jsonl')
        if session_file.exists():
            return session_file
    # PRIORITY 3: Parse sessions.json for other sessions with sessionFile
    if sessions_json.exists():
        try:
            with open(sessions_json, 'r') as f:
                sessions_data = json.load(f)
            active_session = None
            active_mtime = 0
            for session_key, session_info in sessions_data.items():
                # Skip if no sessionFile (inactive subagents have null)
                session_file_path = session_info.get('sessionFile')
                if not session_file_path:
                    continue
                session_file = Path(session_file_path)
                if session_file.exists():
                    mtime = session_file.stat().st_mtime
                    if mtime > active_mtime:
                        active_mtime = mtime
                        active_session = session_file
            if active_session:
                return active_session
        except Exception as e:
            print(f"Warning: Failed to parse sessions.json: {e}", file=sys.stderr)
    # PRIORITY 4: Score files by recency (mtime) + size
    files = list(SESSIONS_DIR.glob("*.jsonl"))
    if not files:
        return None

    def file_score(p: Path) -> float:
        try:
            stat = p.stat()
            mtime = stat.st_mtime
            size = stat.st_size
            return mtime + (size / 1e9)
        except Exception:
            return 0

    return max(files, key=file_score)
def parse_turn(line: str, session_name: str) -> Optional[Dict[str, Any]]:
    global turn_counter
    try:
        entry = json.loads(line.strip())
    except json.JSONDecodeError:
        return None
    if entry.get('type') != 'message' or 'message' not in entry:
        return None
    msg = entry['message']
    role = msg.get('role')
    if role in ('toolResult', 'system', 'developer'):
        return None
    if role not in ('user', 'assistant'):
        return None
    content = ""
    if isinstance(msg.get('content'), list):
        for item in msg['content']:
            if isinstance(item, dict) and 'text' in item:
                content += item['text']
    elif isinstance(msg.get('content'), str):
        content = msg['content']
    if not content:
        return None
    content = clean_content(content)
    if not content or len(content) < 5:
        return None
    turn_counter += 1
    return {
        'turn': turn_counter,
        'role': role,
        'content': content[:2000],
        'timestamp': entry.get('timestamp', datetime.now(timezone.utc).isoformat()),
        'user_id': USER_ID
    }
def process_new_lines(f, session_name: str, dry_run: bool = False):
    global last_position
    f.seek(last_position)
    for line in f:
        line = line.strip()
        if not line:
            continue
        turn = parse_turn(line, session_name)
        if turn:
            if store_to_qdrant(turn, dry_run):
                print(f"✅ Turn {turn['turn']} ({turn['role']}) → Qdrant")
    # tell() is only usable once iteration over the file has finished
    last_position = f.tell()
def watch_session(session_file: Path, dry_run: bool = False):
    global last_position, turn_counter
    session_name = session_file.name.replace('.jsonl', '')
    print(f"Watching session: {session_file.name}")
    try:
        with open(session_file, 'r') as f:
            for line in f:
                turn_counter += 1
        last_position = session_file.stat().st_size
        print(f"Session has {turn_counter} existing turns, starting from position {last_position}")
    except Exception as e:
        print(f"Warning: Could not read existing turns: {e}", file=sys.stderr)
        last_position = 0
    last_session_check = time.time()
    last_data_time = time.time()  # Track when we last saw new data
    last_file_size = session_file.stat().st_size if session_file.exists() else 0
    INACTIVITY_THRESHOLD = 30  # seconds - if no data for 30s, check for new session
    with open(session_file, 'r') as f:
        while running:
            if not session_file.exists():
                print("Session file removed, looking for new session...")
                return None
            current_time = time.time()
            # Check for newer session every 1 second
            if current_time - last_session_check > 1.0:
                last_session_check = current_time
                newest_session = get_current_session_file()
                if newest_session and newest_session != session_file:
                    print(f"Newer session detected: {newest_session.name}")
                    return newest_session
            # Check if current file is stale (no new data for threshold)
            if current_time - last_data_time > INACTIVITY_THRESHOLD:
                try:
                    current_size = session_file.stat().st_size
                    # If file hasn't grown, check if another session is active
                    if current_size == last_file_size:
                        newest_session = get_current_session_file()
                        if newest_session and newest_session != session_file:
                            print(f"Current session inactive, switching to: {newest_session.name}")
                            return newest_session
                    else:
                        # File grew, update tracking
                        last_file_size = current_size
                        last_data_time = current_time
                except Exception:
                    pass
            # Process new lines and update activity tracking
            old_position = last_position
            process_new_lines(f, session_name, dry_run)
            # If we processed new data, update activity timestamp
            if last_position > old_position:
                last_data_time = current_time
                try:
                    last_file_size = session_file.stat().st_size
                except Exception:
                    pass
            time.sleep(0.1)
    return session_file
def watch_loop(dry_run: bool = False):
    # last_position must be declared global, otherwise the reset below
    # only assigns a local variable and has no effect
    global current_file, turn_counter, last_position
    while running:
        session_file = get_current_session_file()
        if session_file is None:
            print("No active session found, waiting...")
            time.sleep(1)
            continue
        if current_file != session_file:
            print(f"\nNew session detected: {session_file.name}")
            current_file = session_file
            turn_counter = 0
            last_position = 0
        result = watch_session(session_file, dry_run)
        if result is None:
            current_file = None
            time.sleep(0.5)
def main():
    global USER_ID
    parser = argparse.ArgumentParser(description="TrueRecall v1.2 - Real-time Memory Capture")
    parser.add_argument("--daemon", "-d", action="store_true", help="Run as daemon")
    parser.add_argument("--once", "-o", action="store_true", help="Process once then exit")
    parser.add_argument("--dry-run", "-n", action="store_true", help="Don't write to Qdrant")
    parser.add_argument("--user-id", "-u", default=USER_ID, help=f"User ID (default: {USER_ID})")
    args = parser.parse_args()
    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)
    if args.user_id:
        USER_ID = args.user_id
    print("🔍 TrueRecall v1.2 - Real-time Memory Capture")
    print(f"📍 Qdrant: {QDRANT_URL}/{QDRANT_COLLECTION}")
    print(f"🧠 Ollama: {OLLAMA_URL}/{EMBEDDING_MODEL}")
    print(f"👤 User: {USER_ID}")
    print()
    if args.once:
        print("Running once...")
        session_file = get_current_session_file()
        if session_file:
            watch_session(session_file, args.dry_run)
        else:
            print("No session found")
    else:
        print("Running as daemon (Ctrl+C to stop)...")
        watch_loop(args.dry_run)


if __name__ == "__main__":
    main()
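
The point ID in `store_to_qdrant` mixes the current wall-clock time into the hash, so processing the same turn twice yields two different IDs. Below is a minimal sketch of a reproducible alternative, assuming the session name and the turn's own timestamp are available at the call site. `make_point_id` is a hypothetical helper and is not part of the watcher.

```python
import hashlib


def make_point_id(user_id: str, session_name: str, turn: int, timestamp: str) -> int:
    """Derive a stable 63-bit integer ID from values that do not change between runs."""
    # The session name keeps turn numbers from colliding across sessions;
    # the turn's recorded timestamp replaces the wall-clock component.
    key = f"{user_id}:{session_name}:turn:{turn}:{timestamp}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()[:8]
    return int.from_bytes(digest, byteorder="big") % (2**63)


if __name__ == "__main__":
    # Illustrative values only
    first = make_point_id("rob", "session-abc", 3, "2026-03-04T20:59:37+00:00")
    second = make_point_id("rob", "session-abc", 3, "2026-03-04T20:59:37+00:00")
    print(first == second)
```

With an ID derived this way, re-running the watcher over an already captured session would overwrite the existing points rather than duplicate them, since a Qdrant upsert replaces a point that has the same ID.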