Trainspodder & Whisper Transcribes Radio w/ Good Proper Noun Spelling, Inferring Blanked Words, Lyrics

2 years ago

Trainspodder OpenAI Whisper VoiceActivityDetection Qt QtProject MobileApplication C OpenGL Android

Showing Trainspodder's updated GUI for displaying Whisper transcriptions ( https://github.com/openai/whisper/ ) as an additional analysis source. This demonstrates some of the features, both good and bad, of using Whisper for free-form transcription of podcasts and broadcasts within Trainspodder. Trainspodder's own display of segmentation, speech/music segments, beat events, BPMs, etc. also provides good, consistent timing markers against the transcribed text, potentially including speaker detection for multi-party conversations, especially if the voices are tonally different.

Good and/or Interesting to Note:
* Excellent accuracy transcribing the artists' names mentioned, band names, song names, etc.
* No problem with Patois accents, Reggae lyrics.
* Hallucinating the word "Hell" in a contextually correct place where it was blanked out for radio broadcast (while significantly worse "rap" lyrics weren't)
* Either phonetic or actual transcription of Ghanaian language lyrics.
* Attempting to transcribe rap lyrics from a very unclear portion and perhaps hallucinating the N-word, but starring out most of its letters.

Bad:
* One instance where the timing of a song lyric segment was off by seconds.
* Saxophone solos trigger repeated short-phrase hallucinations. (Noticed multiple times over tens of transcriptions, so perhaps Whisper has emergent Beatnik consciousness and will eventually go "cool cat" "that's hep" or "don't be a drag" in response to jazzy groovy music.)
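
As a rough illustration of why repeated short-phrase hallucinations are detectable at all: Whisper's --compression_ratio_threshold option compares each decoded segment's text against a zlib compression ratio, and highly repetitive text compresses far better than ordinary speech. The sketch below recomputes that ratio outside Whisper. The sample strings are invented for illustration, and the 1.8 cut-off is simply the value used in the whisper call quoted below, not a recommended setting.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; higher means more repetitive."""
    raw = text.encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


# Hypothetical segment texts, made up for this example.
looping = "that's hep " * 20
ordinary = "Gilles Peterson joins the musical dots with Maya Delilah and Conor Albert."

THRESHOLD = 1.8  # the value passed to --compression_ratio_threshold in this post

for label, text in (("looping", looping), ("ordinary", ordinary)):
    ratio = compression_ratio(text)
    verdict = "looks repetitive" if ratio > THRESHOLD else "looks fine"
    print(f"{label}: ratio={ratio:.2f} -> {verdict}")
```

A lower threshold flags repetition sooner, at the cost of occasionally rejecting legitimately repetitive material such as song choruses.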

FYI, I'm using the following whisper call to process these media files:
'whisper --language English --model medium --condition_on_previous_text False --compression_ratio_threshold 1.8 $*'
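
For anyone scripting this rather than typing it, here is a minimal sketch that assembles the same command line in Python. The function name build_whisper_command is hypothetical, and the sketch only prints the command as a dry run; it assumes the whisper CLI is on the PATH if you choose to execute it.

```python
import shlex
import sys


def build_whisper_command(media_files):
    """Return the argument list for the whisper call quoted above (hypothetical helper)."""
    return [
        "whisper",
        "--language", "English",
        "--model", "medium",
        "--condition_on_previous_text", "False",
        "--compression_ratio_threshold", "1.8",
        *media_files,  # each file stays one argument, even if its name has spaces
    ]


if __name__ == "__main__":
    cmd = build_whisper_command(sys.argv[1:])
    # Dry run: show the shell-quoted command. To execute it, pass `cmd`
    # to subprocess.run(cmd, check=True) with the whisper CLI installed.
    print(shlex.join(cmd))
```

Passing the files as a list keeps names with spaces intact, which an unquoted $* in a shell wrapper would split into separate arguments.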

The original broadcast being played is at
https://www.bbc.co.uk/programmes/m001cpzt
(Gilles Peterson -- Joining the musical dots: Maya Delilah & Conor Albert)

Please note that any copyrighted media being played back is under "fair use" for purposes of demonstration of this app, and research into AI Audio auto-transcription and deep-searching of podcast and broadcast media, aka https://www.trainspodder.com
