Initial commit: TrueRecall v2.2 with 30b curator and timer-based curation

2026-02-24 20:27:44 -06:00
commit 8bb1abaf18
23 changed files with 4112 additions and 0 deletions
--- a/session.md.neuralstream.bak
+++ b/session.md.neuralstream.bak
@@ -0,0 +1,150 @@
+# NeuralStream Session State
+
+**Date:** 2026-02-23  
+**Status:** Architecture v2.2 - Context-aware hybrid triggers  
+**Alias:** ns
+
+---
+
+## Architecture v2.2 (Current)
+
+**Decision:** Three hybrid extraction triggers with full context awareness
+
+| Trigger | When | Purpose |
+|---------|------|---------|
+| `turn_end` (N=5) | Every 5 turns | Normal batch extraction |
+| Timer (15 min idle) | No new turn for 15 min | Catch partial batches |
+| Context (50% threshold) | `ctx.getContextUsage().percent >= threshold` | Proactive pre-compaction |
+
+**Context Awareness:**
+- qwen3 gets **up to 256k tokens** of full conversation history for understanding
+- Extracts only from the **last N turns** (the batch being processed), so the surrounding context is read but not gemmed
+- Uses `ctx.getContextUsage()` native API for token monitoring
+
+**Why Hybrid:**
+- Batch extraction = better quality gems (more context)
+- Timer safety = never lose important turns if user walks away
+- Context trigger = proactive extraction before system forces compaction
+- All gems survive `/new` and `/reset` because they are stored in Qdrant, not in the session context
+
+**Infrastructure:** Reuse existing Redis/Qdrant — NeuralStream is the "middle layer" only
+
+---
+
+## Core Insight
+
+NeuralStream enables **infinite effective context**: the active window stays small, while semantically relevant gems from all past conversations remain queryable and injectable.
+
+---
+
+## Technical Decisions (2026-02-23)
+
+### Triggers (Three-way Hybrid)
+| Trigger | Config | Default |
+|---------|--------|---------|
+| Batch size | `batch_size` | 5 turns |
+| Idle timeout | `idle_timeout` | 15 minutes |
+| Context threshold | `context_threshold` | 50% |
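+
+A minimal sketch of how the three defaults above might combine into a single decision. `TriggerConfig`, `pick_trigger`, and the argument names are hypothetical, the 0-100 scale for the context percentage is an assumption, and the priority order (context, then batch, then timer) is one possible choice rather than a settled design.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass(frozen=True)
+class TriggerConfig:
+    batch_size: int = 5              # turns per normal batch
+    idle_timeout: float = 15 * 60    # seconds without a new turn
+    context_threshold: float = 50.0  # percent of the window (assumed 0-100)
+
+
+def pick_trigger(pending_turns: int, idle_seconds: float,
+                 context_percent: float,
+                 cfg: TriggerConfig = TriggerConfig()) -> Optional[str]:
+    """Return the name of the trigger that should fire, or None."""
+    if pending_turns <= 0:
+        return None  # nothing buffered, so nothing to extract
+    if context_percent >= cfg.context_threshold:
+        return "context"  # extract before compaction is forced
+    if pending_turns >= cfg.batch_size:
+        return "batch"  # normal full batch
+    if idle_seconds >= cfg.idle_timeout:
+        return "timer"  # flush a partial batch after inactivity
+    return None
+```
+
+Checking the context trigger first means a nearly full window is handled even when the batch is still partial.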
+
+### Context Monitoring (Native API)
+- `ctx.getContextUsage()` → `{tokens, contextWindow, percent}`
+- Checked in `turn_end` hook
+- Triggers extraction when `percent >= context_threshold`
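+
+The check described above can be sketched as follows. This is Python for illustration only: `context_usage` is a hypothetical stand-in that mirrors the `{tokens, contextWindow, percent}` shape, `on_turn_end` and its `extract` callback are assumed names, and `percent` is assumed to be on a 0-100 scale.
+
+```python
+from typing import Callable, Dict
+
+
+def context_usage(tokens: int, context_window: int) -> Dict[str, float]:
+    """Stand-in for the native call; mirrors the documented result shape."""
+    return {
+        "tokens": tokens,
+        "contextWindow": context_window,
+        "percent": 100.0 * tokens / context_window,  # assumed 0-100 scale
+    }
+
+
+def on_turn_end(usage: Dict[str, float], context_threshold: float,
+                extract: Callable[[], None]) -> bool:
+    """Run extraction when usage reaches the threshold; report whether it ran."""
+    if usage["percent"] >= context_threshold:
+        extract()
+        return True
+    return False
+```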
+
+### Extraction Context Window
+- **Feed to qwen3:** Up to 256k tokens (full history for understanding)
+- **Extract from:** Last `batch_size` turns only
+- **Benefit:** Rich context awareness without gemming current conversation
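+
+One way to separate "what the model reads" from "what gets gemmed" is sketched below. `split_for_extraction` is a hypothetical helper, and the whitespace-based token counter is a crude placeholder for a real tokenizer.
+
+```python
+from typing import Callable, List, Tuple
+
+
+def split_for_extraction(
+    turns: List[str],
+    batch_size: int = 5,
+    max_tokens: int = 256_000,
+    count_tokens: Callable[[str], int] = lambda t: len(t.split()),
+) -> Tuple[List[str], List[str]]:
+    """Return (context_feed, extract_from).
+
+    extract_from is the last `batch_size` turns. context_feed is the newest
+    run of turns that fits in `max_tokens`, always including the batch.
+    """
+    extract_from = turns[-batch_size:] if batch_size > 0 else []
+    budget = max_tokens
+    context_feed: List[str] = []
+    # Walk backwards so the newest turns are kept when the budget runs out.
+    for turn in reversed(turns):
+        cost = count_tokens(turn)
+        if cost > budget and len(context_feed) >= len(extract_from):
+            break
+        context_feed.append(turn)
+        budget -= cost
+    context_feed.reverse()
+    return context_feed, extract_from
+```
+
+If the batch alone exceeds the budget it is still fed in full, which is a deliberate simplification in this sketch.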
+
+### Storage
+- **Buffer:** Redis (`neuralstream:buffer` key)
+- **Gems:** Qdrant `neuralstream` collection (1024 dims, Cosine)
+- **Existing infra:** Reuse mem-redis-watcher + qdrant-memory
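+
+To make "1024 dims, Cosine" concrete, here is a pure-Python illustration of the score that ranks gems. Qdrant computes this server-side, and no client calls are shown; `GEM_DIMS` and `cosine_similarity` are hypothetical names used only for this sketch.
+
+```python
+import math
+from typing import Sequence
+
+GEM_DIMS = 1024  # vector size stated for the `neuralstream` collection
+
+
+def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
+    """Cosine of the angle between two gem embeddings (1.0 = same direction)."""
+    if len(a) != GEM_DIMS or len(b) != GEM_DIMS:
+        raise ValueError(f"expected {GEM_DIMS}-dimensional vectors")
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    if norm == 0.0:
+        raise ValueError("cosine similarity is undefined for a zero vector")
+    return dot / norm
+```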
+
+### Gem Format (Proposed)
+```json
+{
+  "gem_id": "uuid",
+  "content": "Distilled insight/fact/decision",
+  "summary": "One-line for quick scanning",
+  "topics": ["docker", "redis", "architecture"],
+  "importance": 0.9,
+  "source": {
+    "session_id": "uuid",
+    "date": "2026-02-23",
+    "turn_range": "15-20"
+  },
+  "tags": ["decision", "fact", "preference", "todo", "code"],
+  "created_at": "2026-02-23T15:26:00Z"
+}
+```
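+
+The proposed schema could be mirrored in code roughly as follows. The class names and validation rules are assumptions: this sketch treats `importance` as a value in [0, 1] and treats the five tags in the example as the allowed set, neither of which is fixed by the schema.
+
+```python
+import json
+import uuid
+from dataclasses import asdict, dataclass, field
+from datetime import datetime, timezone
+from typing import List
+
+ALLOWED_TAGS = {"decision", "fact", "preference", "todo", "code"}
+
+
+@dataclass
+class GemSource:
+    session_id: str
+    date: str        # ISO date, e.g. "2026-02-23"
+    turn_range: str  # e.g. "15-20"
+
+
+@dataclass
+class Gem:
+    content: str
+    summary: str
+    topics: List[str]
+    importance: float
+    source: GemSource
+    tags: List[str]
+    gem_id: str = field(default_factory=lambda: str(uuid.uuid4()))
+    created_at: str = field(
+        default_factory=lambda: datetime.now(timezone.utc)
+        .strftime("%Y-%m-%dT%H:%M:%SZ"))
+
+    def __post_init__(self) -> None:
+        # Assumed rules: importance in [0, 1], tags drawn from the example set.
+        if not 0.0 <= self.importance <= 1.0:
+            raise ValueError("importance must be between 0 and 1")
+        unknown = set(self.tags) - ALLOWED_TAGS
+        if unknown:
+            raise ValueError(f"unknown tags: {sorted(unknown)}")
+
+    def to_json(self) -> str:
+        """Serialize to JSON with the same field names as the schema."""
+        return json.dumps(asdict(self))
+```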
+
+### Extraction Model
+- **qwen3** for gem extraction (256k context, cheap)
+- **Dedicated prompt** (to be designed) for extracting high-value items
+
+---
+
+## Architecture Layers
+
+| Layer | Status | Description |
+|-------|--------|-------------|
+| Capture | ✅ Existing | Every turn → Redis (mem-redis-watcher) |
+| **Extract** | ⏳ NeuralStream | Batch → qwen3 → gems → Qdrant |
+| Retrieve | ✅ Existing | Semantic search → inject context |
+
+NeuralStream = Smart extraction layer on top of existing infra.
+
+---
+
+## Open Questions
+
+- Gem extraction prompt design (deferred)
+- Importance scoring: auto vs manual?
+- Injection: `turn_start` hook or modify system prompt?
+- Semantic search threshold tuning
+
+---
+
+## Next Steps
+
+| Task | Status |
+|------|--------|
+| Architecture v2.2 finalized | ✅ |
+| Native context monitoring validated | ✅ |
+| Gem JSON schema | ✅ Proposed |
+| Implement turn_end hook | ⏳ |
+| Implement timer/cron check | ⏳ |
+| Implement context trigger | ⏳ |
+| Create extraction prompt | ⏳ |
+| Test gem extraction with qwen3 | ⏳ |
+| Implement injection mechanism | ⏳ |
+
+---
+
+## Decisions Log
+
+| Date | Decision |
+|------|----------|
+| 2026-02-23 | Switch to turn_end hook (v2) |
+| 2026-02-23 | Hybrid triggers with timer (v2.1) |
+| 2026-02-23 | Context-aware extraction (v2.2) |
+| 2026-02-23 | Native API: ctx.getContextUsage() |
+| 2026-02-23 | Full context feed to qwen3 (256k) |
+| 2026-02-23 | Reuse existing Redis/Qdrant infrastructure |
+| 2026-02-23 | Batch N=5 turns |
+| 2026-02-23 | Context threshold = 50% |
+| 2026-02-23 | Inactivity timer = 15 min |
+| 2026-02-23 | Dedicated qwen3 extraction prompt (deferred) |
+
+---
+
+## Backups
+
+- Local: `/root/.openclaw/workspace/.projects/neuralstream/`
+- Remote: `deb2:/root/.projects/neuralstream/` (build/test only)
+- kimi_kb: Research entries stored
+
+---
+
+**Key Insight:** Session resets wipe context but NOT Qdrant. NeuralStream = "Context insurance policy" for infinite LLM memory.