# NeuralStream
**Neural streaming memory for OpenClaw with gem-based context injection.**
## Overview
NeuralStream extracts high-value insights ("gems") from conversation batches using qwen3, stores them in Qdrant, and injects relevant gems into context on each new turn. This creates **infinite effective context** — the active window stays small, but semantically relevant gems from all past conversations are always retrievable.
## Core Concept
| Traditional Memory | NeuralStream |
|-------------------|--------------|
| Context lost on `/new` | Gems persist in Qdrant |
| Full history or generic summary | Semantic gem retrieval |
| Static context window | Dynamic injection |
| Survives compaction only | Survives session reset |
| **Limited context** | **Infinite effective context** |
## How It Works
### Capture → Extract → Store → Retrieve
1. **Capture:** Every turn buffered to Redis (reuses mem-redis-watcher)
2. **Extract:** Batch of 5 turns → qwen3 (with 256k context) extracts structured gems
3. **Store:** Gems embedded + stored in the Qdrant `neuralstream` collection
4. **Retrieve:** Each new turn → semantic search → inject top-10 gems
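The four stages can be sketched as below. Everything here is illustrative: an in-memory array stands in for the Redis buffer, and `flushBatch`/`retrieveGems` are hypothetical stand-ins for the qwen3 extraction call and Qdrant semantic search.

```typescript
// Sketch only; names and stores are stand-ins, not the real APIs.
type Turn = { index: number; text: string };
type Gem = { content: string; topics: string[] };

const buffer: Turn[] = [];   // stand-in for the Redis turn buffer
const gemStore: Gem[] = [];  // stand-in for the Qdrant collection

// 1. Capture: every turn lands in the buffer
function captureTurn(turn: Turn): void {
  buffer.push(turn);
}

// 2 + 3. Extract + Store: batch of 5 turns -> distilled gem -> store
function flushBatch(batchSize = 5): void {
  if (buffer.length < batchSize) return;
  const batch = buffer.splice(0, batchSize);
  // Stand-in for the qwen3 structured-gem extraction call
  gemStore.push({ content: batch.map((t) => t.text).join(" | "), topics: [] });
}

// 4. Retrieve: naive substring match stands in for semantic search
function retrieveGems(query: string, limit = 10): Gem[] {
  return gemStore.filter((g) => g.content.includes(query)).slice(0, limit);
}
```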
### Hybrid Triggers (Three-way)
| Trigger | Condition | Purpose |
|---------|-----------|---------|
| Batch | Every 5 turns | Normal extraction |
| Context | 50% usage (`ctx.getContextUsage()`) | Proactive pre-compaction |
| Timer | 15 min idle | Safety net |
**Context Awareness:** qwen3 receives up to 256k tokens of history for understanding, but only extracts gems from the last N turns (avoiding current context).
All gems survive `/new`, `/reset`, and compaction via Qdrant persistence.
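The three-way trigger above reduces to a single decision function. This is a hypothetical sketch (the function and `TriggerState` shape are not part of any existing API); the thresholds mirror the table's defaults.

```typescript
// Hypothetical trigger check; thresholds mirror the defaults in the table.
interface TriggerState {
  bufferedTurns: number;   // turns waiting in the Redis buffer
  contextPercent: number;  // from ctx.getContextUsage().percent
  idleMinutes: number;     // minutes since the buffer was last written
}

function shouldExtract(s: TriggerState): "batch" | "context" | "timer" | null {
  if (s.bufferedTurns >= 5) return "batch";      // normal extraction
  if (s.contextPercent >= 50) return "context";  // proactive pre-compaction
  if (s.idleMinutes >= 15) return "timer";       // safety net
  return null;                                   // keep buffering
}
```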
## Architecture
NeuralStream is the **middle layer** — extraction intelligence on top of existing infrastructure:
```
┌─────────────────────────────────────────┐
│ EXISTING: mem-redis-watcher             │
│ Every turn → Redis buffer               │
└────────────────────┬────────────────────┘
          ┌──────────▼──────────┐
          │ NeuralStream        │
          │ - Batch reader      │
          │ - Gem extractor     │
          │ - Qdrant store      │
          └──────────┬──────────┘
          ┌──────────▼──────────┐
          │ EXISTING:           │
          │ qdrant-memory       │
          │ Semantic search     │
          │ Context injection   │
          └─────────────────────┘
```
## Technical Reference
### Native Context Monitoring
```typescript
// In the turn_end hook: ctx exposes native context monitoring
const usage = ctx.getContextUsage();
// usage.tokens, usage.contextWindow, usage.percent

// Trigger proactive extraction once the window fills past the threshold
if (usage.percent >= CONTEXT_THRESHOLD) {
  await extractGems(); // hypothetical helper; implementation TBD
}
```
### Primary Hook: turn_end
```typescript
pi.on("turn_end", async (event, ctx) => {
  const { turnIndex, message, toolResults } = event;
  // Buffer turn to Redis
  // Check ctx.getContextUsage().percent
  // If batch >= 5 OR percent >= 50%: extract
});
```
### Timer Fallback
```bash
# Cron entry, every 10 min (path is illustrative):
# */10 * * * * /path/to/neuralstream-idle-check.sh
IDLE=$(redis-cli OBJECT IDLETIME neuralstream:buffer)  # secs since last touch
if [ "${IDLE:-0}" -gt 900 ]; then
  node src/extract.js --partial  # hypothetical: extract the partial batch
fi
```
### Context-Aware Extraction
- Feed qwen3: Up to 256k tokens (full history for context)
- Extract from: Last `batch_size` turns only
- Benefit: Rich understanding without gemming current context
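Splitting the prompt input this way can be sketched as follows; `buildExtractionInput` is a hypothetical helper, not existing code. The full history is passed for background understanding, but only the trailing `batchSize` turns are marked as the extraction target.

```typescript
// Sketch: separate background context from the turns to be gemmed.
// With fewer than batchSize turns, everything becomes the target.
function buildExtractionInput(history: string[], batchSize = 5) {
  const target = history.slice(-batchSize);      // turns qwen3 should gem
  const context = history.slice(0, -batchSize);  // background only, not gemmed
  return { context, target };
}
```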
## Gem Format
```json
{
  "gem_id": "uuid",
  "content": "Distilled insight/fact/decision",
  "summary": "One-line for quick scanning",
  "topics": ["docker", "redis", "architecture"],
  "importance": 0.9,
  "source": {
    "session_id": "uuid",
    "date": "2026-02-23",
    "turn_range": "15-20"
  },
  "tags": ["decision", "fact", "preference", "todo", "code"],
  "created_at": "2026-02-23T15:26:00Z"
}
```
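For the TypeScript implementation, the payload above maps naturally onto an interface. Field names and types here are read off the JSON sketch, not a finalized schema:

```typescript
// TypeScript mirror of the gem payload; types assumed from the JSON sketch.
interface GemSource {
  session_id: string;
  date: string;        // YYYY-MM-DD
  turn_range: string;  // e.g. "15-20"
}

interface Gem {
  gem_id: string;
  content: string;     // distilled insight/fact/decision
  summary: string;     // one-liner for quick scanning
  topics: string[];
  importance: number;  // 0..1
  source: GemSource;
  tags: string[];      // e.g. "decision", "fact", "preference", "todo", "code"
  created_at: string;  // ISO 8601
}

const example: Gem = {
  gem_id: "00000000-0000-0000-0000-000000000000",
  content: "Distilled insight/fact/decision",
  summary: "One-line for quick scanning",
  topics: ["docker", "redis", "architecture"],
  importance: 0.9,
  source: { session_id: "uuid", date: "2026-02-23", turn_range: "15-20" },
  tags: ["decision"],
  created_at: "2026-02-23T15:26:00Z",
};
```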
## Configuration (All Tunable)
| Setting | Default | Description |
|---------|---------|-------------|
| batch_size | 5 | Turns per extraction |
| context_threshold | 50% | Token % trigger (40-80% range) |
| idle_timeout | 15 min | Timer trigger threshold |
| gem_model | qwen3 | Extraction LLM (256k context) |
| max_gems_injected | 10 | Per-turn limit |
| embedding | snowflake-arctic-embed2 | Same as kimi_memories |
| collection | neuralstream | Qdrant (1024 dims, Cosine) |
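The table's defaults could be collected into a single config object; the object name and key names below are illustrative, not an existing API.

```typescript
// Hypothetical config object mirroring the defaults table.
const neuralstreamConfig = {
  batchSize: 5,                // turns per extraction
  contextThreshold: 0.5,       // trigger at 50% window usage (tune 0.4-0.8)
  idleTimeoutMin: 15,          // timer trigger threshold
  gemModel: "qwen3",           // 256k-context extraction LLM
  maxGemsInjected: 10,         // per-turn injection limit
  embeddingModel: "snowflake-arctic-embed2",
  collection: "neuralstream",  // Qdrant: 1024 dims, Cosine
};
```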
## Qdrant Schema
**Collection:** `neuralstream`
- Vector size: 1024
- Distance: Cosine
- On-disk payload: true
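A collection with this schema can be created via Qdrant's REST API (`PUT /collections/{name}`); the request body below mirrors the settings above, while the base URL and the fact that no API key is needed are assumptions for a local setup.

```typescript
// Request body for PUT /collections/neuralstream (Qdrant REST API).
const createBody = {
  vectors: { size: 1024, distance: "Cosine" },
  on_disk_payload: true,
};

// Sketch: create the collection on a local Qdrant (URL is an assumption).
async function createCollection(baseUrl = "http://localhost:6333") {
  await fetch(`${baseUrl}/collections/neuralstream`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(createBody),
  });
}
```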
## Project Structure
```
.projects/neuralstream/
├── README.md        # This file
├── session.md       # Development log & state
├── prompt.md        # (TBD) qwen3 extraction prompt
└── src/             # (TBD) Implementation
    ├── extract.ts   # Gem extraction logic
    ├── store.ts     # Qdrant storage
    └── inject.ts    # Context injection
```
## Status
- [x] Architecture defined (v2.2 context-aware)
- [x] Native context monitoring validated (ctx.getContextUsage)
- [x] Naming finalized (NeuralStream, alias: ns)
- [x] Hook research completed
- [x] Qdrant collection created (`neuralstream`)
- [x] Gem format proposed
- [x] Infrastructure decision (reuse Redis/Qdrant)
- [ ] Extraction prompt design
- [ ] Implementation
- [ ] Testing
## Backups
- Local: `/root/.openclaw/workspace/.projects/neuralstream/`
- Remote: `deb2:/root/.projects/neuralstream/` (build/test only)
- kimi_kb: Research entries stored
## Related Projects
- **True Recall:** Gem extraction inspiration
- **OpenClaw:** Host platform
- **kimi_memories:** Shared Qdrant infrastructure
- **mem-redis-watcher:** Existing capture layer
---
**Created:** 2026-02-23
**Alias:** ns
**Purpose:** Infinite context for LLMs