root d1357c5463 Initial commit: workspace setup with skills, memory, config

2026-02-10 14:37:49 -06:00

name	description
local-whisper-stt	Local speech-to-text transcription using Faster-Whisper. Use when receiving voice messages in Telegram (or other channels) that need to be transcribed to text. Automatically downloads and transcribes audio files using local CPU-based Whisper models. Supports multiple model sizes (tiny, base, small, medium, large) with automatic language detection.

Local Whisper STT

Overview

Transcribes voice messages to text using local Faster-Whisper (CPU-based, no GPU required).

When to Use

A user sends a voice message in Telegram
Audio needs to be transcribed to text locally (free, private)
Any audio transcription task where cloud STT is not desired

Models Available

Model	Parameters	Speed	Accuracy	Use Case
tiny	39M	Fastest	Basic	Quick testing, low resources
base	74M	Fast	Good	Default for most use
small	244M	Medium	Better	Better accuracy needed
medium	769M	Slower	Very Good	High accuracy, more RAM
large	1550M	Slowest	Best	Maximum accuracy

Workflow

Receive the voice message (Telegram provides OGG/Opus)
Download the audio file to a temp location
Load the Faster-Whisper model (cached after first use)
Transcribe the audio to text
Return the transcription to the conversation
Clean up the temp file
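
The steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual script: the names load_model, whisper_transcribe, and handle_voice_message are hypothetical, and the audio is assumed to have been downloaded into memory as bytes already.

```python
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def load_model(name: str = "base"):
    """Load a Faster-Whisper model once and reuse it on later calls."""
    # Imported lazily so this module loads even before faster-whisper is installed.
    from faster_whisper import WhisperModel

    return WhisperModel(name, device="cpu", compute_type="int8")


def whisper_transcribe(path: str) -> str:
    """Transcribe one audio file with the cached model."""
    segments, _info = load_model().transcribe(path, beam_size=5)
    return " ".join(segment.text.strip() for segment in segments)


def handle_voice_message(audio_bytes: bytes, transcribe=whisper_transcribe) -> str:
    """Write the audio to a temp file, transcribe it, and always clean up."""
    fd, path = tempfile.mkstemp(suffix=".ogg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        return transcribe(path)
    finally:
        # Runs even if transcription raises, so temp files do not pile up.
        os.remove(path)
```

Passing the transcriber as a parameter is a design choice for this sketch only; it lets the temp-file handling be exercised without loading a model.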

Usage

From Telegram Voice Message

When a voice message arrives, the skill:

Downloads the voice file from Telegram
Transcribes using the configured model
Returns text to the agent context
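
The download step is not shown in this document. One way it might look, assuming the skill talks to the Telegram Bot API directly, is the two-request flow below: getFile resolves a file_id to a file_path, and the file is then fetched from the /file/bot<token>/ endpoint. The function names and the opener parameter are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.telegram.org"


def get_file_url(token: str, file_id: str) -> str:
    """URL of the Bot API getFile call that resolves a file_id to a file_path."""
    query = urllib.parse.urlencode({"file_id": file_id})
    return f"{API_ROOT}/bot{token}/getFile?{query}"


def download_url(token: str, file_path: str) -> str:
    """URL from which the file content itself can be fetched."""
    return f"{API_ROOT}/file/bot{token}/{file_path}"


def download_voice(token: str, file_id: str, opener=urllib.request.urlopen) -> bytes:
    """Fetch the raw OGG/Opus bytes for a voice message (assumed flow)."""
    with opener(get_file_url(token, file_id)) as resp:
        file_path = json.load(resp)["result"]["file_path"]
    with opener(download_url(token, file_path)) as resp:
        return resp.read()
```

The returned bytes can then be written to a temp file and passed to the transcription step.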

Manual Transcription

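# Transcribe a local audio file
from faster_whisper import WhisperModel

# int8 quantization keeps memory use low on CPU
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("/path/to/audio.ogg", beam_size=5)

# info carries the automatically detected language
print(f"Detected language: {info.language} ({info.language_probability:.2f})")

# segments is a lazy generator; transcription runs as it is iterated
for segment in segments:
    print(segment.text)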
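
Each segment yielded by transcribe also carries start and end times in seconds, so a timestamped transcript is a small step further. The sketch below uses a hypothetical Segment stand-in with the same three attributes so it can run without a model; format_segments is an illustrative helper, not part of Faster-Whisper.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Stand-in with the attributes used here: start, end, text."""
    start: float
    end: float
    text: str


def format_segments(segments: Iterable[Segment]) -> str:
    """Render one line per segment with its time range in seconds."""
    lines = [
        f"[{s.start:6.2f}s -> {s.end:6.2f}s] {s.text.strip()}"
        for s in segments
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Segment(0.0, 1.5, " Hello there."), Segment(1.5, 3.25, " How are you?")]
    print(format_segments(demo))
```

With a real model, the generator returned by model.transcribe(...) can be passed to format_segments in place of the demo list.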

Configuration

Default model: base (good balance of speed/accuracy on CPU)

To change the model, edit the script or set the WHISPER_MODEL environment variable:

export WHISPER_MODEL=small
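
How the script reads this variable is not shown here. A plausible sketch, assuming it falls back to base and accepts only the model names listed under Models Available, is below; resolve_model_name and VALID_MODELS are hypothetical names.

```python
import os
from typing import Mapping

# Model names taken from the Models Available table.
VALID_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env: Mapping[str, str] = os.environ) -> str:
    """Return the model named by WHISPER_MODEL, defaulting to 'base'."""
    name = env.get("WHISPER_MODEL", "base").strip().lower()
    if name not in VALID_MODELS:
        raise ValueError(
            f"Unknown WHISPER_MODEL {name!r}; expected one of {', '.join(VALID_MODELS)}"
        )
    return name
```

Failing early on an unknown name avoids a slower and less obvious error at model load time.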
