initial commit

.gitignore (vendored, new file, 3 lines)
@@ -0,0 +1,3 @@
venv
*.mp3
*.txt

README.md (new file, 232 lines)
@@ -0,0 +1,232 @@
# Meeting Audio Summarizer

This Python program transcribes meeting audio files with Whisper (locally) and automatically generates a summary with an LLM via an OpenAI-compatible API.

## Features

- 🎤 **Local transcription** with OpenAI Whisper (no cloud required)
- 🤖 **Flexible LLM integration** via OpenAI-compatible APIs
- 📝 **Structured summaries** with main topics, decisions, and action items
- 🔄 **Provider-agnostic**: works with OpenAI, Anthropic, Ollama, LM Studio, etc.
- 💾 **Automatic saving** of transcript and summary

## Installation

### Prerequisites

- Python 3.8 or higher
- ffmpeg (for audio processing)

#### Installing ffmpeg

**Ubuntu/Debian:**
```bash
sudo apt update
sudo apt install ffmpeg
```

**macOS:**
```bash
brew install ffmpeg
```

**Windows:**
Download ffmpeg from https://ffmpeg.org/download.html and add it to your PATH.

### Installing the Python packages

```bash
pip install -r requirements.txt
```

On first run, Whisper takes some time to download its models.
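
To trigger that download ahead of time (for example on a machine with a faster connection), a minimal sketch:

```python
# Pre-download a Whisper model so the first real run doesn't block on it.
# load_model() caches the weights (under ~/.cache/whisper by default).
import whisper

model = whisper.load_model("base")  # downloads on first call, then loads from cache
print("Model ready.")
```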

## Configuration

### Setting the API key

Set your API key as an environment variable:

```bash
export OPENAI_API_KEY="your-api-key"
```

Or pass it directly on invocation with `--api-key`.

### Alternative LLM providers

The program works with any OpenAI-compatible endpoint:

#### Ollama (local)
```bash
python meeting_summarizer.py meeting.mp3 \
    --api-base http://localhost:11434/v1 \
    --api-key ollama \
    --model llama3.2
```

#### LM Studio (local)
```bash
python meeting_summarizer.py meeting.mp3 \
    --api-base http://localhost:1234/v1 \
    --api-key lm-studio \
    --model local-model
```

#### Anthropic Claude (via OpenAI compatibility layer)
```bash
python meeting_summarizer.py meeting.mp3 \
    --api-base https://api.anthropic.com/v1 \
    --api-key $ANTHROPIC_API_KEY \
    --model claude-3-5-sonnet-20241022
```

#### OpenRouter
```bash
python meeting_summarizer.py meeting.mp3 \
    --api-base https://openrouter.ai/api/v1 \
    --api-key $OPENROUTER_API_KEY \
    --model anthropic/claude-3.5-sonnet
```

## Usage

### Basic usage

```bash
python meeting_summarizer.py meeting.mp3
```

This creates:
- `meeting_transcript.txt`: the full transcript
- `meeting_summary.txt`: the summary

### With options

```bash
python meeting_summarizer.py meeting.wav \
    --whisper-model medium \
    --model gpt-4 \
    --output-dir ./summaries \
    --api-base https://api.openai.com/v1
```

### All options

```
Options:
  audio_file              Path to the audio file (mp3, wav, m4a, etc.)

  --whisper-model MODEL   Whisper model size (default: base)
                          Choices: tiny, base, small, medium, large

  --api-base URL          Base URL for the OpenAI-compatible API
                          (default: https://api.openai.com/v1)

  --api-key KEY           API key (uses OPENAI_API_KEY if not given)

  --model MODEL           LLM model name (default: gpt-4)

  --language LANG         Output language for the summary
                          (e.g., english, german, spanish; default: english)

  --output-dir DIR        Output directory for transcript and summary
                          (default: same directory as the audio file)

  --no-transcript         Don't save the full transcript
```

## Whisper models

The choice of Whisper model trades off speed against accuracy:

| Model  | Parameters | Speed     | Accuracy  | Recommendation |
|--------|------------|-----------|-----------|----------------|
| tiny   | 39M        | Very fast | Low       | Quick tests |
| base   | 74M        | Fast      | Good      | **Default** |
| small  | 244M       | Medium    | Very good | Good balance |
| medium | 769M       | Slow      | Excellent | High quality |
| large  | 1550M      | Very slow | Best      | Production use |

**Recommendation for meetings:** `base` or `small` for a good balance of speed and quality.
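
To see how those trade-offs play out on your own hardware, a rough timing sketch (`sample.wav` is a placeholder file name):

```python
# Roughly compare Whisper model sizes on a short local recording.
import time

import whisper

for size in ["tiny", "base", "small"]:
    model = whisper.load_model(size)
    start = time.time()
    model.transcribe("sample.wav")
    print(f"{size}: {time.time() - start:.1f}s")
```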

## Supported audio formats

Any format that ffmpeg supports, including:
- MP3
- WAV
- M4A
- FLAC
- OGG
- WMA
- AAC
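
If a recording arrives in an inconvenient container, a minimal sketch that shells out to ffmpeg to normalize it first (the `to_wav` helper and the file names are illustrative, not part of this repo):

```python
# Convert any ffmpeg-readable recording to 16 kHz mono WAV,
# the sample rate Whisper resamples to internally anyway.
import subprocess


def to_wav(src: str, dst: str) -> None:
    # -ar 16000: resample to 16 kHz; -ac 1: downmix to mono
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst],
        check=True,
    )


to_wav("meeting.wma", "meeting.wav")
```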

## Programmatic use

You can also use the program as a module:

```python
from meeting_summarizer import MeetingSummarizer

# Initialize the summarizer
summarizer = MeetingSummarizer(
    whisper_model="base",
    api_base_url="http://localhost:11434/v1",
    api_key="ollama",
    model_name="llama3.2"
)

# Process a meeting
transcript, summary = summarizer.process_meeting(
    audio_path="meeting.mp3",
    output_dir="./output",
    save_transcript=True
)

print(summary)
```

## Performance tips

### For faster transcription:
- Use smaller Whisper models (`tiny` or `base`)
- Use GPU acceleration (CUDA) if available, as sketched below
- Whisper runs on whatever device your PyTorch installation supports, so a CUDA-enabled PyTorch build is what enables the GPU
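
A minimal sketch of explicit device selection; `whisper.load_model` accepts a `device` argument:

```python
# Load Whisper on the GPU when CUDA is available, otherwise fall back to CPU.
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("base", device=device)
print(f"Whisper is running on: {device}")
```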

### For better quality:
- Use larger Whisper models (`medium` or `large`)
- Make sure the audio quality is good
- For multilingual meetings: the code passes `language=None` to Whisper, which enables automatic language detection; pin a language explicitly only if detection misfires (see the sketch below)
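
For reference, a minimal sketch of both modes (the file name is a placeholder):

```python
# language=None lets Whisper auto-detect; an explicit code pins the language.
import whisper

model = whisper.load_model("small")
result = model.transcribe("meeting.mp3", language=None)  # auto-detect
# result = model.transcribe("meeting.mp3", language="de")  # force German
print(result["language"])     # detected language code
print(result["text"][:200])   # first part of the transcript
```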

## Tips for embedded systems

Some hints for resource-constrained environments:

- **Raspberry Pi:** use the `tiny` or `base` model
- **Real-time processing:** Whisper is not optimized for real-time use; process recordings after the fact
- **Memory:** `base` needs ~140 MB of RAM, `large` ~3 GB
- **Alternative:** use whisper.cpp for C/C++ integration on embedded systems

## Troubleshooting

### "No module named 'whisper'"
```bash
pip install openai-whisper
```

### "ffmpeg not found"
Install ffmpeg (see the installation instructions above).

### "API key not provided"
Set the `OPENAI_API_KEY` environment variable or pass `--api-key`.

### Slow transcription
Use a smaller model or enable GPU acceleration.

## License

Free to use for private and commercial purposes.

## Notes

- Whisper runs entirely locally; no audio data is sent anywhere
- Only the transcribed text is sent to the LLM
- Mind data privacy when meetings contain sensitive content
- The quality of the summary depends on the chosen LLM

meeting_summarizer.py (new file, 233 lines)
@@ -0,0 +1,233 @@
#!/usr/bin/env python3
"""
Meeting Audio Summarizer
Transcribes audio files using local Whisper and summarizes using an OpenAI-compatible API
"""

import argparse
import os
import sys
from pathlib import Path
from typing import Optional, Tuple

import whisper
from openai import OpenAI


class MeetingSummarizer:
    """Handles audio transcription and summarization of meetings"""

    def __init__(
        self,
        whisper_model: str = "base",
        api_base_url: str = "https://api.openai.com/v1",
        api_key: Optional[str] = None,
        model_name: str = "gpt-4",
        output_language: str = "english"
    ):
        """
        Initialize the meeting summarizer

        Args:
            whisper_model: Whisper model size (tiny, base, small, medium, large)
            api_base_url: Base URL for OpenAI-compatible API
            api_key: API key (will use OPENAI_API_KEY env var if not provided)
            model_name: Name of the LLM model to use
            output_language: Language for the summary output (e.g., "english", "german", "spanish")
        """
        print(f"Loading Whisper model '{whisper_model}'...")
        self.whisper_model = whisper.load_model(whisper_model)
        self.output_language = output_language

        self.api_key = api_key or os.getenv("OPENAI_API_KEY")
        if not self.api_key:
            raise ValueError(
                "API key not provided. Set OPENAI_API_KEY environment variable "
                "or pass api_key parameter"
            )

        self.client = OpenAI(
            api_key=self.api_key,
            base_url=api_base_url
        )
        self.model_name = model_name

    def transcribe_audio(self, audio_path: str) -> dict:
        """
        Transcribe audio file using Whisper

        Args:
            audio_path: Path to audio file (mp3, wav, m4a, etc.)

        Returns:
            Dictionary with transcription results including text and segments
        """
        print(f"Transcribing audio file: {audio_path}")

        if not Path(audio_path).exists():
            raise FileNotFoundError(f"Audio file not found: {audio_path}")

        result = self.whisper_model.transcribe(
            audio_path,
            language=None,  # Auto-detect language
            verbose=False
        )

        print(f"Transcription complete. Length: {len(result['text'])} characters")
        return result

    def summarize_text(self, text: str) -> str:
        """
        Summarize transcribed text using LLM

        Args:
            text: Transcribed text to summarize

        Returns:
            Summary text
        """
        print("Generating summary using LLM...")

        system_prompt = f"""You are an assistant that summarizes meeting transcripts.
Create a structured summary in {self.output_language} with the following points:

1. **Main Topics**: The most important topics discussed
2. **Decisions**: Decisions that were made
3. **Action Items**: Tasks and responsibilities
4. **Next Steps**: Planned next steps

Be precise and concrete. Write your entire response in {self.output_language}."""

        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Please summarize this meeting transcript:\n\n{text}"}
            ],
            temperature=0.3,
            max_tokens=2000
        )

        summary = response.choices[0].message.content
        print("Summary generated successfully")
        return summary

    def process_meeting(
        self,
        audio_path: str,
        output_dir: Optional[str] = None,
        save_transcript: bool = True
    ) -> Tuple[str, str]:
        """
        Complete pipeline: transcribe and summarize meeting audio

        Args:
            audio_path: Path to audio file
            output_dir: Directory to save outputs (default: same as audio file)
            save_transcript: Whether to save the full transcript

        Returns:
            Tuple of (transcript, summary)
        """
        # Transcribe
        result = self.transcribe_audio(audio_path)
        transcript = result["text"]

        # Generate summary
        summary = self.summarize_text(transcript)

        # Save outputs if requested
        if output_dir or save_transcript:
            audio_file = Path(audio_path)
            if output_dir:
                output_path = Path(output_dir)
            else:
                output_path = audio_file.parent

            output_path.mkdir(parents=True, exist_ok=True)
            base_name = audio_file.stem

            if save_transcript:
                transcript_file = output_path / f"{base_name}_transcript.txt"
                transcript_file.write_text(transcript, encoding="utf-8")
                print(f"Transcript saved to: {transcript_file}")

            summary_file = output_path / f"{base_name}_summary.txt"
            summary_file.write_text(summary, encoding="utf-8")
            print(f"Summary saved to: {summary_file}")

        return transcript, summary


def main():
    parser = argparse.ArgumentParser(
        description="Transcribe and summarize meeting audio files"
    )
    parser.add_argument(
        "audio_file",
        help="Path to audio file (mp3, wav, m4a, etc.)"
    )
    parser.add_argument(
        "--whisper-model",
        default="base",
        choices=["tiny", "base", "small", "medium", "large"],
        help="Whisper model size (default: base)"
    )
    parser.add_argument(
        "--api-base",
        default="https://api.openai.com/v1",
        help="Base URL for OpenAI-compatible API"
    )
    parser.add_argument(
        "--api-key",
        help="API key (defaults to OPENAI_API_KEY env var)"
    )
    parser.add_argument(
        "--model",
        default="gpt-4",
        help="LLM model name (default: gpt-4)"
    )
    parser.add_argument(
        "--language",
        default="english",
        help="Output language for the summary (e.g., english, german, spanish) (default: english)"
    )
    parser.add_argument(
        "--output-dir",
        help="Output directory for transcript and summary"
    )
    parser.add_argument(
        "--no-transcript",
        action="store_true",
        help="Don't save the full transcript"
    )

    args = parser.parse_args()

    try:
        summarizer = MeetingSummarizer(
            whisper_model=args.whisper_model,
            api_base_url=args.api_base,
            api_key=args.api_key,
            model_name=args.model,
            output_language=args.language
        )

        transcript, summary = summarizer.process_meeting(
            audio_path=args.audio_file,
            output_dir=args.output_dir,
            save_transcript=not args.no_transcript
        )

        print("\n" + "=" * 80)
        print("SUMMARY")
        print("=" * 80)
        print(summary)

    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1

    return 0


if __name__ == "__main__":
    sys.exit(main())