TranscriptionAndDiarization

This project demonstrates how to use local AI to transcribe, diarize, and enrich transcripts created from videos.

It uses:

You will need to download and install microsoft/Phi-3-medium-4k-instruct-onnx-directml and update the model_path below. If you do not have a GPU or are not using Windows, see the Phi-3 docs and set up the variant that matches your hardware.
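A wrong model_path tends to surface as an obscure loader error, so it can help to check the folder before starting a notebook. The sketch below is a minimal, standard-library check. The helper name `check_model_dir` is hypothetical, and the expectation that the folder contains at least one `.onnx` file plus a `genai_config.json` is an assumption about how ONNX Runtime GenAI model folders are usually laid out, not something this project specifies.

```python
from pathlib import Path


def check_model_dir(model_path: str) -> Path:
    """Return the model folder as a Path, or raise if it looks incomplete.

    Hypothetical helper: the expected file names are an assumption about
    the usual ONNX Runtime GenAI folder layout, not a rule from this repo.
    """
    model_dir = Path(model_path).expanduser()
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model_path is not a directory: {model_dir}")

    missing = []
    # Assumption: the exported model ships as one or more .onnx files.
    if not any(model_dir.glob("*.onnx")):
        missing.append("*.onnx")
    # Assumption: the generation settings live in genai_config.json.
    if not (model_dir / "genai_config.json").is_file():
        missing.append("genai_config.json")

    if missing:
        raise FileNotFoundError(
            f"{model_dir} is missing expected files: {', '.join(missing)}"
        )
    return model_dir
```

Calling `check_model_dir(model_path)` at the top of a notebook fails early with a readable message if the download is incomplete or the path points one folder too high.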

The pyannote.audio diarization model is gated on Hugging Face, so you need an account, accepted access to the model, and an access token. See https://huggingface.co/pyannote/speaker-diarization-3.1 for instructions.
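One way to keep the token out of the notebooks is to read it from an environment variable. The sketch below assumes the token is stored in `HF_TOKEN`; the helper names `get_hf_token` and `load_diarization_pipeline` are hypothetical. The `use_auth_token` keyword matches the pyannote.audio 3.1 model card, but other releases may name this argument differently, so check the version you have installed.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face token from the environment (hypothetical helper)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token that has been "
            "granted access to pyannote/speaker-diarization-3.1."
        )
    return token


def load_diarization_pipeline(token: str):
    """Load the gated diarization pipeline with the given token."""
    # Imported lazily so the token check works without pyannote.audio installed.
    from pyannote.audio import Pipeline

    # Assumption: pyannote.audio 3.1, which accepts the token as use_auth_token.
    return Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=token,
    )
```

Typical use would be `pipeline = load_diarization_pipeline(get_hf_token())`, which raises a clear error before any download is attempted if the token is missing.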

Notebooks

| Name | Description |
| --- | --- |
| DownloadFromYoutube.ipynb | Use pytube to download a video |
| ExtractAudioFromVideo.ipynb | Extract mp3 from mp4 with moviepy |
| TranscribeVideoWithWhisperLarge.ipynb | Create a transcript with whisper-large-v3 |
| DiarizeWithPyannote.ipynb | Diarize with Pyannote |
| Phi3-ONNX.ipynb | Use Phi-3 with ONNX to identify names and finalize transcript |
| phi3test-transformers.ipynb | A test for comparing transformers to ONNX (spoiler: ONNX is waaaaay faster) |
| VideoToFullTranscriptWithWhisperPyannoteAndPhi3-medium.ipynb | Complete process in a single notebook |
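Combining the transcription and diarization steps means deciding which speaker said each transcribed segment. The sketch below shows one common approach, picking the speaker turn with the largest time overlap. It is an illustration, not necessarily what the notebooks do. The tuple layouts `(start, end, text)` and `(start, end, speaker)` and the function names are assumptions made for this example.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with the speaker who overlaps it most.

    segments: list of (start, end, text) from the transcription step
    turns:    list of (start, end, speaker) from the diarization step
    Both layouts are assumptions for this sketch.
    """
    labelled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append((best_speaker, text))
    return labelled


def format_transcript(labelled):
    """Merge consecutive segments from the same speaker into one line each."""
    lines = []
    for speaker, text in labelled:
        if lines and lines[-1][0] == speaker:
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return "\n".join(f"{speaker}: {text}" for speaker, text in lines)


if __name__ == "__main__":
    segments = [
        (0.0, 2.5, "Hello and welcome."),
        (2.5, 4.0, "Thanks for having me."),
    ]
    turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 4.2, "SPEAKER_01")]
    print(format_transcript(assign_speakers(segments, turns)))
```

Segments that overlap no speaker turn keep the `UNKNOWN` label, which makes gaps in the diarization easy to spot before the transcript is passed to Phi-3 for name identification.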