Initial commit: workspace setup with skills, memory, config

2026-02-10 14:37:49 -06:00
commit d1357c5463
77 changed files with 10822 additions and 0 deletions
--- a/skills/local-whisper-stt/SKILL.md
+++ b/skills/local-whisper-stt/SKILL.md
@@ -0,0 +1,79 @@
+---
+name: local-whisper-stt
+description: Local speech-to-text transcription using Faster-Whisper. Use when receiving voice messages in Telegram (or other channels) that need to be transcribed to text. Automatically downloads and transcribes audio files using local CPU-based Whisper models. Supports multiple model sizes (tiny, base, small, medium, large) with automatic language detection.
+---
+
+# Local Whisper STT
+
+## Overview
+
+Transcribes voice messages to text using local Faster-Whisper (CPU-based, no GPU required).
+
+## When to Use
+
+- User sends a voice message in Telegram
+- Need to transcribe audio to text locally (free, private)
+- Any audio transcription task where cloud STT is not desired
+
+## Models Available
+
+| Model | Parameters | Speed | Accuracy | Use Case |
+|-------|------------|-------|----------|----------|
+| tiny | 39M | Fastest | Basic | Quick testing, low resources |
+| base | 74M | Fast | Good | Default for most use |
+| small | 244M | Medium | Better | Better accuracy needed |
+| medium | 769M | Slower | Very Good | High accuracy, more RAM |
+| large | 1550M | Slowest | Best | Maximum accuracy |
+
+## Workflow
+
+1. Receive voice message (Telegram provides OGG/Opus)
+2. Download audio file to temp location
+3. Load Faster-Whisper model (cached after first use)
+4. Transcribe audio to text
+5. Return transcription to conversation
+6. Clean up the temp file
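+
+The steps above can be sketched as a small helper. This is a minimal illustration, not the skill's actual script: `transcribe_voice_bytes` and `whisper_transcriber` are hypothetical names, and the transcriber is passed in as a callable so the temp-file handling can be exercised without loading a model.
+
+```python
+import os
+import tempfile
+from typing import Callable
+
+
+def transcribe_voice_bytes(audio: bytes, transcribe: Callable[[str], str]) -> str:
+    """Write audio to a temp file, transcribe it, and always remove the file."""
+    # Step 2: store the downloaded bytes in a temp location.
+    fd, path = tempfile.mkstemp(suffix=".ogg")
+    try:
+        with os.fdopen(fd, "wb") as handle:
+            handle.write(audio)
+        # Steps 3-5: delegate to the injected transcriber and return its text.
+        return transcribe(path).strip()
+    finally:
+        # Step 6: remove the temp file even if transcription raised.
+        os.remove(path)
+
+
+def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
+    """Build a transcriber backed by Faster-Whisper (imported lazily)."""
+    from faster_whisper import WhisperModel
+
+    # The model is loaded once here and reused by every call to run().
+    model = WhisperModel(model_name, device="cpu", compute_type="int8")
+
+    def run(path: str) -> str:
+        segments, _info = model.transcribe(path, beam_size=5)
+        return "".join(segment.text for segment in segments)
+
+    return run
+```
+
+A caller would build the transcriber once, for example `transcribe = whisper_transcriber("base")`, and reuse it for each incoming message so the model is not reloaded every time.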
+
+## Usage
+
+### From Telegram Voice Message
+
+When a voice message arrives, the skill:
+1. Downloads the voice file from Telegram
+2. Transcribes using the configured model
+3. Returns text to the agent context
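+
+The download step might look like the following sketch, which uses the Telegram Bot API `getFile` method and only the standard library. The function names and the 30-second timeout are assumptions for illustration; the bundled `telegram_voice_handler.py` may work differently.
+
+```python
+import json
+import urllib.parse
+import urllib.request
+
+API_ROOT = "https://api.telegram.org"
+
+
+def get_file_url(token: str, file_id: str) -> str:
+    """URL of the getFile call that resolves a file_id to a file_path."""
+    query = urllib.parse.urlencode({"file_id": file_id})
+    return f"{API_ROOT}/bot{token}/getFile?{query}"
+
+
+def download_url(token: str, file_path: str) -> str:
+    """URL from which the file content itself can be fetched."""
+    return f"{API_ROOT}/file/bot{token}/{file_path}"
+
+
+def download_voice(token: str, file_id: str, dest: str) -> str:
+    """Resolve file_id, save the OGG/Opus audio to dest, and return dest."""
+    with urllib.request.urlopen(get_file_url(token, file_id), timeout=30) as resp:
+        payload = json.load(resp)
+    if not payload.get("ok"):
+        raise RuntimeError(f"getFile failed: {payload}")
+    file_path = payload["result"]["file_path"]
+    with urllib.request.urlopen(download_url(token, file_path), timeout=30) as resp:
+        data = resp.read()
+    with open(dest, "wb") as handle:
+        handle.write(data)
+    return dest
+```
+
+The `file_id` comes from the `voice` object of the incoming message; the saved file can then be passed to the transcription step.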
+
+### Manual Transcription
+
+```python
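+# Transcribe a local audio file on CPU with int8 quantization
+from faster_whisper import WhisperModel
+
+model = WhisperModel("base", device="cpu", compute_type="int8")
+# transcribe() returns a lazy generator; decoding runs as segments are iterated
+segments, info = model.transcribe("/path/to/audio.ogg", beam_size=5)
+print(f"Detected language: {info.language}")
+
+for segment in segments:
+    print(segment.text.strip())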
+```
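+
+To hand a single string back to the conversation rather than printing segment by segment, the segments can be joined. The helpers below are a hypothetical sketch; they rely only on the `text`, `language`, and `language_probability` attributes that Faster-Whisper exposes on segments and on the info object, and the demo uses stand-in objects so it runs without a model.
+
+```python
+from types import SimpleNamespace
+from typing import Iterable
+
+
+def collect_text(segments: Iterable) -> str:
+    """Concatenate segment texts and trim the outer whitespace."""
+    # Segment texts usually carry their own leading space, so a plain
+    # concatenation keeps the spacing correct for any language.
+    return "".join(segment.text for segment in segments).strip()
+
+
+def format_result(segments: Iterable, info) -> str:
+    """Prefix the transcript with the detected language and its confidence."""
+    text = collect_text(segments)
+    return f"[{info.language} {info.language_probability:.0%}] {text}"
+
+
+if __name__ == "__main__":
+    # Stand-ins shaped like Faster-Whisper's segment and info objects.
+    demo_segments = [
+        SimpleNamespace(text=" Hello there."),
+        SimpleNamespace(text=" How are you?"),
+    ]
+    demo_info = SimpleNamespace(language="en", language_probability=0.5)
+    print(format_result(demo_segments, demo_info))
+```
+
+Because `segments` is a generator, it can be consumed only once; collect the text before doing anything else with it.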
+
+## Configuration
+
+Default model: `base` (good balance of speed/accuracy on CPU)
+
+To change the model, edit the script or set the `WHISPER_MODEL` environment variable:
+```bash
+export WHISPER_MODEL=small
+```
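+
+On the Python side, the variable could be read along these lines. This is a sketch under assumptions: `resolve_model_name` is a hypothetical helper, and restricting the value to the five sizes listed in this document is a choice made here, not a limit imposed by Faster-Whisper.
+
+```python
+import os
+from typing import Mapping, Optional
+
+VALID_MODELS = ("tiny", "base", "small", "medium", "large")
+DEFAULT_MODEL = "base"
+
+
+def resolve_model_name(env: Optional[Mapping[str, str]] = None) -> str:
+    """Return the model size from WHISPER_MODEL, falling back to the default."""
+    env = os.environ if env is None else env
+    # An unset or empty variable means "use the default".
+    name = env.get("WHISPER_MODEL", "").strip().lower() or DEFAULT_MODEL
+    if name not in VALID_MODELS:
+        raise ValueError(
+            f"Unknown WHISPER_MODEL {name!r}; expected one of {', '.join(VALID_MODELS)}"
+        )
+    return name
+```
+
+Accepting the environment as a parameter keeps the function easy to test; in normal use it is called with no arguments and reads `os.environ`.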
+
+## Requirements
+
+- Python 3.9+ (required by current faster-whisper releases)
+- faster-whisper package
+- Roughly 75 MB to 3 GB of disk space, depending on the model
+- No GPU required (CPU-only)
+
+## Resources
+
+### scripts/
+- `transcribe.py` - Main transcription script
+- `telegram_voice_handler.py` - Telegram-specific voice message handler