knowledge_base Schema

Collection: knowledge_base

Purpose: Personal knowledge repository organized by topic/domain, not by source or project.

Metadata Schema

{
  "domain": "Python",                    // Primary knowledge area (Python, Networking, Android...)
  "path": "Python/AsyncIO/Patterns",     // Hierarchical: domain/subject/specific
  "subjects": ["async", "concurrency"],  // Cross-linking topics

  "category": "reference",               // reference | tutorial | snippet | troubleshooting | concept
  "content_type": "code",                // web_page | code | markdown | pdf | note

  "title": "Async Context Managers",     // Display name
  "checksum": "sha256:...",              // For duplicate detection
  "source_url": "https://...",           // Source attribution (always stored)
  "date_added": "2026-02-05",            // Date first stored
  "date_scraped": "2026-02-05T10:30:00", // Exact timestamp scraped
  "text_preview": "..."                  // First 300 chars of content (auto-generated, for display)
}

Field Descriptions

| Field | Required | Description |
|---|---|---|
| domain | Yes | Primary knowledge domain (e.g., Python, Networking) |
| path | Yes | Hierarchical location: Domain/Subject/Specific |
| subjects | No | Array of related topics for cross-linking |
| category | Yes | Content classification: reference, tutorial, snippet, troubleshooting, concept |
| content_type | Yes | Format: web_page, code, markdown, pdf, note |
| title | Yes | Human-readable title |
| checksum | Auto | SHA256 hash for duplicate detection |
| source_url | Yes | Original source (web pages) or reference |
| date_added | Auto | Date stored (YYYY-MM-DD) |
| date_scraped | Auto | ISO timestamp when content was acquired |
| text_preview | Auto | First 300 chars of content (for display) |
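
The fields marked Auto can be derived from the content itself at storage time. The following is a minimal sketch of that step, not the actual code in kb_store.py: the function name `build_metadata`, its parameters, and the choice to hash the raw UTF-8 text without normalization are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime

PREVIEW_CHARS = 300  # "First 300 chars of content" per the field table


def build_metadata(text, *, domain, path, category, content_type, title,
                   source_url, subjects=None, now=None):
    """Assemble a metadata dict; the Auto fields are derived here.

    Hypothetical helper: hashing the unnormalized UTF-8 text is an assumption.
    """
    now = now or datetime.now()
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {
        "domain": domain,
        "path": path,
        "subjects": list(subjects or []),
        "category": category,
        "content_type": content_type,
        "title": title,
        "checksum": f"sha256:{digest}",                      # Auto
        "source_url": source_url,
        "date_added": now.strftime("%Y-%m-%d"),              # Auto
        "date_scraped": now.strftime("%Y-%m-%dT%H:%M:%S"),   # Auto
        "text_preview": text[:PREVIEW_CHARS],                # Auto
    }
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test.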

Content Categories

| Category | Use For |
|---|---|
| reference | Documentation, specs, cheat sheets |
| tutorial | Step-by-step guides, how-tos |
| snippet | Code snippets, short examples |
| troubleshooting | Error fixes, debugging steps |
| concept | Explanations, theory, patterns |

Examples

| Content | Domain | Path | Category |
|---|---|---|---|
| DNS troubleshooting | Networking | Networking/DNS/Reverse-Lookup | troubleshooting |
| Kotlin coroutines | Android | Android/Kotlin/Coroutines | tutorial |
| Systemd timers | Linux | Linux/Systemd/Timers | reference |
| Python async patterns | Python | Python/AsyncIO/Patterns | reference |
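
A small validator can catch entries that stray from the schema before they are stored. The sketch below is illustrative only: `validate_entry` is a hypothetical name, and requiring exactly three path segments with the first equal to `domain` is an assumption inferred from the Domain/Subject/Specific convention.

```python
CATEGORIES = {"reference", "tutorial", "snippet", "troubleshooting", "concept"}
CONTENT_TYPES = {"web_page", "code", "markdown", "pdf", "note"}


def validate_entry(meta):
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    if meta.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {meta.get('category')!r}")
    if meta.get("content_type") not in CONTENT_TYPES:
        problems.append(f"unknown content_type: {meta.get('content_type')!r}")

    # Assumption: a path has exactly three non-empty segments.
    parts = (meta.get("path") or "").split("/")
    if len(parts) != 3 or not all(parts):
        problems.append("path must look like Domain/Subject/Specific")
    elif parts[0] != meta.get("domain"):
        problems.append("path must start with the domain")
    return problems
```

For example, an entry with `category` set to `code` is reported as a problem, because `code` is a `content_type` value rather than a category.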

Workflow

Smart Search (smart_search.py)

Always follow this pattern:

  1. Search knowledge_base first — vector similarity search
  2. Search web via SearXNG — get fresh results
  3. Synthesize — combine KB + web findings
  4. Store new info — if web has substantial new content
    • Auto-check for duplicates (checksum comparison)
    • Only store if content is unique and substantial (>500 chars)
    • Auto-tag with domain, date_scraped, source_url
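
The four steps above can be sketched as a single function. This is not the code in smart_search.py: the callables `search_kb`, `search_web`, `is_duplicate`, and `store` are hypothetical stand-ins supplied by the caller, and the shape of a web hit (a dict with `url` and `text` keys) is assumed for illustration.

```python
MIN_CHARS = 500  # "substantial" threshold from the workflow


def smart_search(query, search_kb, search_web, is_duplicate, store):
    """KB first, then web, then store substantial new web content.

    All four callables are hypothetical and injected by the caller.
    """
    kb_hits = search_kb(query)    # step 1: vector similarity search
    web_hits = search_web(query)  # step 2: fresh results (e.g. via SearXNG)

    stored = []
    for hit in web_hits:          # step 4: store new info
        text = hit["text"]
        if len(text) > MIN_CHARS and not is_duplicate(text):
            store(hit)
            stored.append(hit["url"])

    # Step 3 (synthesis) is left to the caller, which receives both result sets.
    return {"kb": kb_hits, "web": web_hits, "stored": stored}
```

Injecting the search and storage functions keeps the control flow testable without a running Qdrant or SearXNG instance.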

Storage Policy

Store when:

  • Content is substantial (>500 chars)
  • Not duplicate of existing KB entry
  • Has clear source attribution
  • Belongs to a defined domain

Skip when:

  • Too short (500 chars or fewer)
  • Duplicate/similar content exists
  • No clear source URL
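
The policy reduces to a short series of checks. The sketch below assumes a hypothetical `should_store` helper that receives the duplicate verdict and the set of defined domains from the caller. Treating content of exactly 500 characters as too short is also an assumption.

```python
MIN_CHARS = 500


def should_store(text, source_url, domain, known_domains, is_duplicate):
    """Return (decision, reason) according to the storage policy."""
    if len(text) <= MIN_CHARS:
        return False, "too short"
    if is_duplicate:
        return False, "duplicate"
    if not source_url:
        return False, "no source URL"
    if domain not in known_domains:
        return False, "undefined domain"
    return True, "ok"
```

Returning the reason alongside the decision makes skipped items easy to log and audit.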

Review Schedule

Monthly review (cron: 1st of month at 3 AM):

  • Check entries older than 180 days
  • Fast-moving domains (AI/ML, Python, JavaScript, Docker, OpenClaw, DevOps): 90 days
  • Remove outdated entries or flag for update

Fast-Moving Domains

These domains get shorter freshness thresholds:

  • AI/ML (models change fast)
  • Python (new versions, packages)
  • JavaScript (framework churn)
  • Docker (image updates)
  • OpenClaw (active development)
  • DevOps (tools evolve)
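
The freshness rule can be expressed as a per-domain age limit. This is a sketch under stated assumptions, not the logic of kb_review.py: `max_age_days` and `is_stale` are hypothetical names, and measuring age from `date_added` rather than `date_scraped` is an assumption. In standard cron syntax, a schedule of 3 AM on the 1st of each month is written `0 3 1 * *`.

```python
from datetime import date

DEFAULT_MAX_AGE_DAYS = 180
FAST_MAX_AGE_DAYS = 90
# Taken from the Fast-Moving Domains list.
FAST_MOVING = {"AI/ML", "Python", "JavaScript", "Docker", "OpenClaw", "DevOps"}


def max_age_days(domain):
    """Fast-moving domains get the shorter freshness threshold."""
    return FAST_MAX_AGE_DAYS if domain in FAST_MOVING else DEFAULT_MAX_AGE_DAYS


def is_stale(meta, today):
    """True if the entry is older than its domain's threshold.

    Assumption: age is measured from date_added (YYYY-MM-DD).
    """
    added = date.fromisoformat(meta["date_added"])
    return (today - added).days > max_age_days(meta["domain"])
```

An entry flagged by `is_stale` would then be removed or marked for update, as the review schedule describes.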

Scripts

| Script | Purpose |
|---|---|
| smart_search.py | KB → web → store workflow |
| kb_store.py | Manual content storage |
| kb_review.py | Monthly outdated review |
| scrape_to_kb.py | Direct URL scraping |

Design Decisions

  • Subject-first: Organize by topic/domain, not by source or project
  • Path-based hierarchy: Navigate Domain/Subject/Specific
  • Separate from memories: knowledge_base and openclaw_memories are isolated
  • Duplicate handling: Checksum + content similarity → skip duplicates
  • Auto-freshness: Monthly cleanup of outdated entries
  • Full attribution: Always store source_url and date_scraped
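
The duplicate-handling decision combines an exact check with a fuzzy one. The sketch below illustrates that combination using only the standard library. `difflib.SequenceMatcher` stands in for whatever similarity measure the scripts actually use (in a Qdrant-backed store this would more plausibly be a vector similarity score), and the 0.9 threshold is a hypothetical value.

```python
import hashlib
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # hypothetical cutoff, not taken from the source


def checksum(text):
    """SHA256 checksum in the "sha256:<hex>" form used by the schema."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_duplicate(text, existing_texts, threshold=SIMILARITY_THRESHOLD):
    """Exact match by checksum first, then a fuzzy similarity check."""
    seen = {checksum(t) for t in existing_texts}
    if checksum(text) in seen:
        return True
    # Stand-in similarity measure; quadratic in text length, so only
    # suitable for small inputs.
    return any(
        SequenceMatcher(None, text, t).ratio() >= threshold
        for t in existing_texts
    )
```

Running the cheap checksum comparison first means the costlier similarity pass only happens for content that is not byte-identical to an existing entry.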