
# NeuralStream Session State

**Date:** 2026-02-23
**Status:** Architecture v2.2 - Context-aware hybrid triggers
**Alias:** ns

---

## Architecture v2.2 (Current)

**Decision:** Three hybrid extraction triggers with full context awareness

| Trigger | When | Purpose |
|---------|------|---------|
| `turn_end` (N=5) | Every 5 turns | Normal batch extraction |
| Timer (15 min idle) | No new turn for 15 min | Catch partial batches |
| Context (50% threshold) | `ctx.getContextUsage().percent >= threshold` | Proactive extraction before compaction |
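
A minimal sketch of evaluating the three triggers together. It assumes a hypothetical `TriggerState` tracked by the extension and a `percent` value on a 0-100 scale. `pickTrigger` and its priority order (context, then batch, then idle) are illustrative choices, not recorded decisions.

```typescript
// Hypothetical names throughout; real hook wiring (turn_end, timers) is not shown.
type Trigger = "batch" | "idle" | "context";

interface TriggerState {
  pendingTurns: number;   // turns buffered since the last extraction
  lastTurnAtMs: number;   // wall-clock time of the most recent turn
  contextPercent: number; // assumed 0-100, from the context-usage API
}

interface TriggerConfig {
  batchSize: number;        // default 5 turns
  idleTimeoutMs: number;    // default 15 minutes
  contextThreshold: number; // default 50 (percent)
}

const DEFAULTS: TriggerConfig = {
  batchSize: 5,
  idleTimeoutMs: 15 * 60 * 1000,
  contextThreshold: 50,
};

// Returns the trigger that should fire now, or null if none applies.
// Context pressure is checked first (an assumed priority) because it is
// the most time-sensitive of the three.
function pickTrigger(
  s: TriggerState,
  nowMs: number,
  cfg: TriggerConfig = DEFAULTS,
): Trigger | null {
  if (s.pendingTurns === 0) return null; // nothing buffered, nothing to extract
  if (s.contextPercent >= cfg.contextThreshold) return "context";
  if (s.pendingTurns >= cfg.batchSize) return "batch";
  if (nowMs - s.lastTurnAtMs >= cfg.idleTimeoutMs) return "idle";
  return null;
}
```

Calling `pickTrigger` from `turn_end` covers the batch and context triggers; the idle trigger also needs a periodic check (timer or cron), since no turn arrives while the user is away.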

**Context Awareness:**
- qwen3 receives **up to 256k tokens** of conversation history as read-only context
- Gems are extracted only from the **last N turns** (the current batch), so earlier history informs extraction without itself being turned into gems
- Uses `ctx.getContextUsage()` native API for token monitoring

**Why Hybrid:**
- Batch extraction = better quality gems (more context)
- Timer safety = never lose important turns if user walks away
- Context trigger = proactive extraction before system forces compaction
- All gems survive `/new` and `/reset` via Qdrant

**Infrastructure:** Reuse existing Redis/Qdrant — NeuralStream is the "middle layer" only

---

## Core Insight

NeuralStream enables **infinite effective context** — active window stays small, but semantically relevant gems from all past conversations are queryable and injectable.

---

## Technical Decisions 2026-02-23

### Triggers (Three-way Hybrid)
| Trigger | Config | Default |
|---------|--------|---------|
| Batch size | `batch_size` | 5 turns |
| Idle timeout | `idle_timeout` | 15 minutes |
| Context threshold | `context_threshold` | 50% |
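
A sketch of loading these settings with the defaults above. `loadConfig` is hypothetical, and the units are assumptions: `idle_timeout` in minutes, `context_threshold` in percent.

```typescript
// Hypothetical config shape using the keys from the table above.
// Assumed units: idle_timeout in minutes, context_threshold in percent.
interface NeuralStreamConfig {
  batch_size: number;
  idle_timeout: number;
  context_threshold: number;
}

const DEFAULT_CONFIG: NeuralStreamConfig = {
  batch_size: 5,
  idle_timeout: 15,
  context_threshold: 50,
};

// Merges user overrides onto the defaults and rejects out-of-range values
// early, so a typo cannot silently disable a trigger.
function loadConfig(overrides: Partial<NeuralStreamConfig> = {}): NeuralStreamConfig {
  const cfg = { ...DEFAULT_CONFIG, ...overrides };
  if (!Number.isInteger(cfg.batch_size) || cfg.batch_size < 1) {
    throw new Error(`batch_size must be a positive integer, got ${cfg.batch_size}`);
  }
  if (!(cfg.idle_timeout > 0)) {
    throw new Error(`idle_timeout must be > 0 minutes, got ${cfg.idle_timeout}`);
  }
  if (!(cfg.context_threshold > 0 && cfg.context_threshold <= 100)) {
    throw new Error(`context_threshold must be in (0, 100], got ${cfg.context_threshold}`);
  }
  return cfg;
}
```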

### Context Monitoring (Native API)
- `ctx.getContextUsage()` → `{tokens, contextWindow, percent}`
- Checked in `turn_end` hook
- Triggers extraction when `percent >= context_threshold`
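
A sketch of the threshold check, assuming only the return shape listed above and a `percent` on a 0-100 scale. `UsageSource` and `makeContextWatcher` are hypothetical stand-ins for the real hook context. The latch that fires once per crossing is an added assumption, meant to stop every later turn above the threshold from re-triggering extraction.

```typescript
// Hypothetical shapes: only {tokens, contextWindow, percent} comes from the notes.
// percent is assumed to be on a 0-100 scale.
interface ContextUsage {
  tokens: number;
  contextWindow: number;
  percent: number;
}

interface UsageSource {
  getContextUsage(): ContextUsage;
}

// Builds a turn_end-style callback that calls onThreshold once when usage
// crosses the threshold, then stays quiet until usage falls back below it
// (e.g. after compaction).
function makeContextWatcher(
  thresholdPercent: number,
  onThreshold: (usage: ContextUsage) => void,
): (ctx: UsageSource) => void {
  let armed = true;
  return (ctx) => {
    const usage = ctx.getContextUsage();
    if (usage.percent >= thresholdPercent) {
      if (armed) {
        armed = false;
        onThreshold(usage);
      }
    } else {
      armed = true; // usage dropped below the threshold; re-arm for the next crossing
    }
  };
}
```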

### Extraction Context Window
- **Feed to qwen3:** Up to 256k tokens (full history for understanding)
- **Extract from:** Last `batch_size` turns only
- **Benefit:** Full-history awareness during extraction, without turning the whole conversation into gems
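
The split between what qwen3 reads and what it may extract from can be sketched as follows. `Turn`, `buildWindow`, and the per-turn `tokens` count are hypothetical; dropping the oldest history first when the 256k budget is exceeded is an assumed policy.

```typescript
// Hypothetical turn record; tokens is assumed precomputed by some tokenizer.
interface Turn {
  index: number; // turn number within the session
  text: string;
  tokens: number;
}

interface ExtractionWindow {
  context: Turn[]; // read-only history fed to the model
  extract: Turn[]; // the only turns gems may be drawn from
}

// The extraction region is always the last `batchSize` turns. Older turns are
// added newest-first until the next one would exceed `maxTokens`, so the
// oldest history is what gets dropped when the budget runs out.
function buildWindow(history: Turn[], batchSize: number, maxTokens = 256_000): ExtractionWindow {
  const extract = history.slice(Math.max(0, history.length - batchSize));
  let budget = maxTokens - extract.reduce((sum, t) => sum + t.tokens, 0);
  const context: Turn[] = [];
  for (let i = history.length - extract.length - 1; i >= 0; i--) {
    if (history[i].tokens > budget) break; // stop at the first turn that does not fit
    budget -= history[i].tokens;
    context.unshift(history[i]); // keep chronological order
  }
  return { context, extract };
}
```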

### Storage
- **Buffer:** Redis (`neuralstream:buffer` key)
- **Gems:** Qdrant `neuralstream` collection (1024 dims, Cosine)
- **Existing infra:** Reuse mem-redis-watcher + qdrant-memory
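
A sketch of the Qdrant side that builds (but does not send) REST requests for the `neuralstream` collection. The paths and body shapes follow Qdrant's REST API for collection creation and point upsert; the helper names are hypothetical, and the embedding model producing the 1024-dim vectors is left open.

```typescript
// Builds Qdrant REST requests; sending them is left to the caller.
interface QdrantRequest {
  method: "PUT";
  path: string;
  body: unknown;
}

const COLLECTION = "neuralstream";
const DIMS = 1024;

function createCollectionRequest(): QdrantRequest {
  return {
    method: "PUT",
    path: `/collections/${COLLECTION}`,
    body: { vectors: { size: DIMS, distance: "Cosine" } },
  };
}

// One point per gem: the gem_id (a UUID) doubles as the Qdrant point id,
// and the gem fields are stored as the payload.
function upsertGemsRequest(
  gems: { gem_id: string; vector: number[]; payload: Record<string, unknown> }[],
): QdrantRequest {
  for (const g of gems) {
    if (g.vector.length !== DIMS) {
      throw new Error(`gem ${g.gem_id}: expected ${DIMS} dims, got ${g.vector.length}`);
    }
  }
  return {
    method: "PUT",
    path: `/collections/${COLLECTION}/points`,
    body: { points: gems.map((g) => ({ id: g.gem_id, vector: g.vector, payload: g.payload })) },
  };
}
```

Each request could then be sent with `fetch("http://localhost:6333" + req.path, { method: req.method, headers: { "Content-Type": "application/json" }, body: JSON.stringify(req.body) })`, assuming a local Qdrant on its default port.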

### Gem Format (Proposed)
```json
{
  "gem_id": "uuid",
  "content": "Distilled insight/fact/decision",
  "summary": "One-line for quick scanning",
  "topics": ["docker", "redis", "architecture"],
  "importance": 0.9,
  "source": {
    "session_id": "uuid",
    "date": "2026-02-23",
    "turn_range": "15-20"
  },
  "tags": ["decision", "fact", "preference", "todo", "code"],
  "created_at": "2026-02-23T15:26:00Z"
}
```
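
A TypeScript mirror of the proposed schema with a basic validator, useful for checking qwen3 output before it is embedded and stored. Treating `tags` as a fixed vocabulary, `importance` as a 0-1 score, and `turn_range` as `start-end` are assumptions read off the example, not settled decisions.

```typescript
// Assumed tag vocabulary, taken from the example's "tags" array.
const GEM_TAGS = ["decision", "fact", "preference", "todo", "code"] as const;
type GemTag = (typeof GEM_TAGS)[number];

interface Gem {
  gem_id: string;
  content: string;
  summary: string;
  topics: string[];
  importance: number; // assumed 0-1
  source: { session_id: string; date: string; turn_range: string };
  tags: GemTag[];
  created_at: string; // ISO 8601
}

// Returns a list of problems; an empty list means the gem looks well-formed.
// Useful on model output that was parsed from JSON and cast to Gem.
function validateGem(g: Gem): string[] {
  const errors: string[] = [];
  if (!g.content.trim()) errors.push("content is empty");
  if (!(g.importance >= 0 && g.importance <= 1)) errors.push("importance must be in [0, 1]");
  const m = /^(\d+)-(\d+)$/.exec(g.source.turn_range);
  if (!m) errors.push("turn_range must look like '15-20'");
  else if (Number(m[1]) > Number(m[2])) errors.push("turn_range start is after end");
  for (const t of g.tags) {
    if (!(GEM_TAGS as readonly string[]).includes(t)) errors.push(`unknown tag: ${t}`);
  }
  if (Number.isNaN(Date.parse(g.created_at))) errors.push("created_at is not a valid date");
  return errors;
}
```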

### Extraction Model
- **qwen3** for gem extraction (256k context, cheap)
- **Dedicated prompt** (to be designed) for extracting high-value items

---

## Architecture Layers

| Layer | Status | Description |
|-------|--------|-------------|
| Capture | ✅ Existing | Every turn → Redis (mem-redis-watcher) |
| **Extract** | ⏳ NeuralStream | Batch → qwen3 → gems → Qdrant |
| Retrieve | ✅ Existing | Semantic search → inject context |

NeuralStream = Smart extraction layer on top of existing infra.
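
What the Retrieve layer does inside Qdrant can be illustrated locally: rank gems by cosine similarity to a query embedding and keep those above a score threshold. `retrieve`, `StoredGem`, and the 0.75 default are placeholders, since the real threshold is still an open question. Qdrant's search request accepts a `score_threshold` parameter that plays the same role server-side.

```typescript
// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Hypothetical stored-gem record for this illustration.
interface StoredGem {
  id: string;
  vector: number[];
  summary: string;
}

// Scores every gem, drops those under the threshold, returns the best `limit`.
function retrieve(query: number[], gems: StoredGem[], threshold = 0.75, limit = 5) {
  return gems
    .map((g) => ({ gem: g, score: cosine(query, g.vector) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```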

---

## Open Questions

- Gem extraction prompt design (deferred)
- Importance scoring: auto vs manual?
- Injection: `turn_start` hook or modify system prompt?
- Semantic search threshold tuning

---

## Next Steps

| Task | Status |
|------|--------|
| Architecture v2.2 finalized | ✅ |
| Native context monitoring validated | ✅ |
| Gem JSON schema | ✅ Proposed |
| Implement turn_end hook | ⏳ |
| Implement timer/cron check | ⏳ |
| Implement context trigger | ⏳ |
| Create extraction prompt | ⏳ |
| Test gem extraction with qwen3 | ⏳ |
| Implement injection mechanism | ⏳ |

---

## Decisions Log

| Date | Decision |
|------|----------|
| 2026-02-23 | Switch to turn_end hook (v2) |
| 2026-02-23 | Hybrid triggers with timer (v2.1) |
| 2026-02-23 | Context-aware extraction (v2.2) |
| 2026-02-23 | Native API: ctx.getContextUsage() |
| 2026-02-23 | Full context feed to qwen3 (256k) |
| 2026-02-23 | Reuse existing Redis/Qdrant infrastructure |
| 2026-02-23 | Batch N=5 turns |
| 2026-02-23 | Context threshold = 50% |
| 2026-02-23 | Inactivity timer = 15 min |
| 2026-02-23 | Dedicated qwen3 extraction prompt (deferred) |

---

## Backups

- Local: `/root/.openclaw/workspace/.projects/neuralstream/`
- Remote: `deb2:/root/.projects/neuralstream/` (build/test only)
- kimi_kb: Research entries stored

---

**Key Insight:** Session resets wipe context but NOT Qdrant. NeuralStream = "Context insurance policy" for infinite LLM memory.