# Session Harvest Instructions

## What is Session Harvesting?

Session harvesting extracts conversation turns from OpenClaw session JSONL files and stores them in Qdrant long-term memory, with embeddings and `user_id` linking.

## When to Use

- **After setting up a new memory system**: harvest existing sessions
- **After discovering missed backups**: recover data from session files
- **Periodically**: catch any data that cron jobs missed

## Scripts

| Script | Purpose | Notes |
|--------|---------|-------|
| `harvest_sessions.py` | Harvest all sessions (auto-sorts by mtime) | Limited by memory, may time out |
| `harvest_newest.py` | Harvest specific sessions by name | Recommended for batch control |

## Location

```
/root/.openclaw/workspace/skills/qdrant-memory/scripts/
├── harvest_sessions.py    # Auto-harvest (use --limit to control)
└── harvest_newest.py      # Manual batch (specify session names)
```

## Usage

### Method 1: Auto-Harvest with Limit

```bash
# Harvest the 10 oldest sessions (default sort)
python3 harvest_sessions.py --user-id rob --limit 10

# Dry run to see what would be stored
python3 harvest_sessions.py --user-id rob --dry-run --limit 5
```

### Method 2: Batch by Session Name (Recommended)

```bash
# Harvest specific sessions (newest first recommended)
python3 harvest_newest.py --user-id rob \
  session-uuid-1.jsonl \
  session-uuid-2.jsonl \
  session-uuid-3.jsonl
```

### Finding Newest Sessions

```bash
# List the 20 newest session files
ls -t /root/.openclaw/agents/main/sessions/*.jsonl | head -20

# Get just the filenames for copy-paste
ls -t /root/.openclaw/agents/main/sessions/*.jsonl | head -20 | xargs -I{} basename {}
```

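If the shell pipeline is awkward to script around, the same listing can be sketched in Python. This is a minimal sketch, not part of the harvest scripts: `newest_sessions` is a hypothetical helper, and it assumes "newest" means most recent modification time, as `ls -t` does.

```python
from pathlib import Path


def newest_sessions(session_dir, limit=20):
    """Return the names of the `limit` most recently modified .jsonl files."""
    files = sorted(
        Path(session_dir).glob("*.jsonl"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first, like `ls -t`
    )
    return [p.name for p in files[:limit]]


if __name__ == "__main__":
    # Session directory taken from the commands above.
    for name in newest_sessions("/root/.openclaw/agents/main/sessions"):
        print(name)
```

The returned names can be passed straight to `harvest_newest.py` as positional arguments.
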
## How It Works

1. **Parse**: Reads the JSONL session file and extracts user/AI turns
2. **Pair**: Matches each user message with the next AI response
3. **Embed**: Generates 3 embeddings (user, AI, summary) via Ollama
4. **Deduplicate**: Checks `content_hash` before storing
5. **Store**: Upserts to Qdrant with `user_id`, `conversation_id`, `turn_number`

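The Parse and Pair steps can be sketched roughly as follows. The record layout here is an assumption: each JSONL line is taken to be an object with a `role` (`user` or `assistant`) and a string `content`. The real OpenClaw session schema may nest these fields differently, and `parse_messages` and `pair_turns` are hypothetical helpers, not the scripts' actual functions.

```python
import json


def parse_messages(lines):
    """Yield (role, content) for each line that parses as a user/assistant message."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the session
        if not isinstance(record, dict):
            continue
        role = record.get("role")        # assumed field name
        content = record.get("content")  # assumed field name
        if role in ("user", "assistant") and isinstance(content, str):
            yield role, content


def pair_turns(messages):
    """Match each user message with the next assistant response."""
    turns = []
    pending_user = None
    for role, content in messages:
        if role == "user":
            pending_user = content  # a later user message replaces an unanswered one
        elif pending_user is not None:
            turns.append({
                "turn_number": len(turns) + 1,
                "user_message": pending_user,
                "ai_response": content,
            })
            pending_user = None
    return turns
```

In this sketch an assistant message with no preceding user message is dropped, and an unanswered user message is overwritten by the next one. Whether the real scripts make the same choices is not stated here.
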
## Deduplication

- Uses an MD5 hash of `user_message::ai_response`
- Checks Qdrant for an existing `user_id + content_hash`
- Skips the turn if it is already stored (returns "duplicate")
- Safe to run multiple times on the same sessions

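The hashing rule above can be illustrated with a short sketch. Two things are assumptions: the strings are encoded as UTF-8 before hashing, and an in-memory set stands in for the Qdrant lookup so the example runs on its own. `store_if_new` and its `"stored"` return value are hypothetical; only the `"duplicate"` result is described above.

```python
import hashlib


def content_hash(user_message, ai_response):
    """MD5 hex digest of `user_message::ai_response` (UTF-8 encoding assumed)."""
    joined = f"{user_message}::{ai_response}"
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def store_if_new(seen, user_id, user_message, ai_response):
    """Return "stored" or "duplicate"; `seen` stands in for the Qdrant lookup."""
    key = (user_id, content_hash(user_message, ai_response))
    if key in seen:
        return "duplicate"
    seen.add(key)
    return "stored"
```

Because the key combines `user_id` with the hash, the same exchange stored under two different users counts as two distinct memories in this sketch.
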
## Output Format

```
[1] session-uuid.jsonl
Stored: 10, Skipped: 6

Total: 44 stored, 6 skipped
```

- **Stored** = New memories added to Qdrant
- **Skipped** = Duplicates (already in Qdrant)

## Troubleshooting

### Timeout / SIGKILL

The embedding process is CPU-intensive. If the process is killed, rerun with fewer sessions per invocation:

```bash
# Use smaller batches
python3 harvest_newest.py --user-id rob session1.jsonl session2.jsonl
```

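For a long list of sessions, the small-batch approach can be automated. The following is a rough sketch, assuming it is run from the scripts directory so that `harvest_newest.py` resolves. `chunked` and `harvest_in_batches` are hypothetical helpers, and the batch size of 2 is only a starting point to tune.

```python
import subprocess


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def harvest_in_batches(session_names, user_id="rob", batch_size=2):
    """Run harvest_newest.py once per batch, stopping at the first failure."""
    for batch in chunked(session_names, batch_size):
        cmd = ["python3", "harvest_newest.py", "--user-id", user_id, *batch]
        # check=True raises CalledProcessError on any non-zero exit,
        # including a process that was killed by a signal.
        subprocess.run(cmd, check=True)
```

Since deduplication makes reruns safe, a failed batch can simply be retried, with a smaller `batch_size` if needed.
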
### Check Qdrant Status

```bash
# Print the number of points stored in the kimi_memories collection
curl -s http://10.0.0.40:6333/collections/kimi_memories | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print(d['result']['points_count'])"
```

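The same check can be written as a small Python function, which is handy for comparing the count before and after a harvest. This is a sketch using only the standard library. The host and collection name come from the command above, while `points_count`, `fetch_points_count`, and the 5-second timeout are illustrative choices.

```python
import json
import urllib.request


def points_count(payload):
    """Extract result.points_count from a Qdrant collection-info response."""
    return payload["result"]["points_count"]


def fetch_points_count(base_url="http://10.0.0.40:6333",
                       collection="kimi_memories", timeout=5):
    """GET /collections/<name> and return its point count."""
    url = f"{base_url}/collections/{collection}"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return points_count(json.load(response))
```

The difference between two calls to `fetch_points_count()` should roughly match the "stored" total printed by the harvest, provided nothing else writes to the collection in between.
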
### Check Session Content

```bash
# Estimate the number of messages in a session (substring match, so approximate)
python3 -c "
from pathlib import Path
f = Path('/root/.openclaw/agents/main/sessions/YOUR-SESSION.jsonl')
with f.open() as fh:
    count = sum(1 for line in fh if 'user' in line or 'assistant' in line)
print(f'~{count} messages')
"
```

## Memory Architecture

```
Session JSONL (raw)
    │
    ▼
harvest_*.py
    │
    ├──► Embeddings (Ollama snowflake-arctic-embed2)
    │
    ▼
Qdrant kimi_memories
    │
    └──► Searchable via user_id: "rob"
```

---

**Created:** February 17, 2026
**Author:** Kimi (audit session)