This project demonstrates the use of local AI to transcribe, diarize, and enrich transcripts created from videos
It uses:
You will need to download and install microsoft/Phi-3-medium-4k-instruct-onnx-directml and update the model_path below. If you do not have GPU or are not using Windows, see the Phi-3 docs and set yourself up accordingly.
Pyannote.audio is gated on huggingface and requires an account and access key. See https://huggingface.co/pyannote/speaker-diarization-3.1 for instructions.
Notebooks
| Name | Description |
|---|---|
| DownloadFromYoutube.ipynb | Use pytube to download a video |
| ExtractAudioFromVideo.ipynb | Extract mp3 from mp4 with moviepy |
| TranscribeVideoWithWhisperLarge.ipynb | Create a transcript with whisper-large-v3 |
| DiarizeWithPyannote.ipynb | Diarize with Pyannote |
| Phi3-ONNX.ipynb | Use Phi-3 with ONNX to identify names and finalize transcript |
| phi3test-transformers.ipynb | A test for comparing transformers to ONNX (spoiler ONNX is waaaaay faster) |
| VideoToFullTranscriptWithWhisperPyannoteAndPhi3-medium.ipynb | Complete process in a single notebook |