Files
jarvis-memory/skills/qdrant-memory/HARVEST.md

3.4 KiB

Session Harvest Instructions

What is Session Harvesting?

Session harvesting extracts conversation turns from OpenClaw session JSONL files and stores them to Qdrant long-term memory with proper embeddings and user_id linking.

When to Use

  • After setting up a new memory system — harvest existing sessions
  • After discovering missed backups — recover data from session files
  • Periodically — if cron jobs missed any data

Scripts

Script Purpose Usage
harvest_sessions.py Harvest all sessions (auto-sorts by mtime) Limited by memory, may timeout
harvest_newest.py Harvest specific sessions by name Recommended for batch control

Location

/root/.openclaw/workspace/skills/qdrant-memory/scripts/
├── harvest_sessions.py   # Auto-harvest (use --limit to control)
└── harvest_newest.py     # Manual batch (specify session names)

Usage

Method 1: Auto-Harvest with Limit

# Harvest oldest 10 sessions (default sort)
python3 harvest_sessions.py --user-id rob --limit 10

# Dry run to see what would be stored
python3 harvest_sessions.py --user-id rob --dry-run --limit 5
# Harvest specific sessions (newest first recommended)
python3 harvest_newest.py --user-id rob \
  session-uuid-1.jsonl \
  session-uuid-2.jsonl \
  session-uuid-3.jsonl

Finding Newest Sessions

# List 20 newest session files
ls -t /root/.openclaw/agents/main/sessions/*.jsonl | head -20

# Get just filenames for copy-paste
ls -t /root/.openclaw/agents/main/sessions/*.jsonl | head -20 | xargs -I{} basename {}

How It Works

  1. Parse — Reads JSONL session file, extracts user/AI turns
  2. Pair — Matches user message with next AI response
  3. Embed — Generates 3 embeddings (user, AI, summary) via Ollama
  4. Deduplicate — Checks content_hash before storing
  5. Store — Upserts to Qdrant with user_id, conversation_id, turn_number

Deduplication

  • Uses MD5 hash of user_message::ai_response
  • Checks Qdrant for existing user_id + content_hash
  • Skips if already stored (returns "duplicate")
  • Safe to run multiple times on same sessions

Output Format

[1] session-uuid.jsonl
  Stored: 10, Skipped: 6

Total: 44 stored, 6 skipped
  • Stored = New memories added to Qdrant
  • Skipped = Duplicates (already in Qdrant)

Troubleshooting

Timeout / SIGKILL

The embedding process is CPU-intensive. If killed:

# Use smaller batches
python3 harvest_newest.py --user-id rob session1.jsonl session2.jsonl

Check Qdrant Status

curl -s http://10.0.0.40:6333/collections/kimi_memories | \
  python3 -c "import sys,json; d=json.load(sys.stdin); print(d['result']['points_count'])"

Check Session Content

# Count turns in a session
python3 -c "
import json
from pathlib import Path
f = Path('/root/.openclaw/agents/main/sessions/YOUR-SESSION.jsonl')
count = sum(1 for line in open(f) if 'user' in line or 'assistant' in line)
print(f'~{count} messages')
"

Memory Architecture

Session JSONL (raw)
       │
       ▼
  harvest_*.py
       │
       ├──► Embeddings (Ollama snowflake-arctic-embed2)
       │
       ▼
  Qdrant kimi_memories
       │
       └──► Searchable via user_id: "rob"

Created: February 17, 2026
Author: Kimi (audit session)