ASR state-of-the-art: IndicWav2Vec
Automatic speech recognition (ASR) is a technique for automatically decoding and transcribing spoken language. A typical ASR system extracts features from audio recordings or streams and then uses one or more algorithms to map those features to text.
IndicWav2Vec is a multilingual speech model pretrained on 40 Indian languages. Among multilingual speech models, it covers the widest variety of Indian languages.
The aim of this blog is to walk you through converting speech to text with IndicWav2Vec, using the librosa library and the ai4bharat IndicWav2Vec models.
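Before running the script below, the dependencies need to be installed. This setup line is an assumption inferred from the imports in the walkthrough (torch is needed as the backend for the transformers pipelines); package names on your system may vary:

```shell
# Assumed package set, inferred from the script's imports
pip install transformers librosa gradio numpy torch
```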
import time

import gradio as gr
import librosa
import numpy as np
from transformers import pipeline

# Load one ASR pipeline per supported language.
transcriber_hindi = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec-hindi")
transcriber_bengali = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec_v1_bengali")
transcriber_odia = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec-odia")
transcriber_gujarati = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec_v1_gujarati")
# transcriber_telugu = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec_v1_telugu")
# transcriber_sinhala = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec_v1_sinhala")
# transcriber_tamil = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec_v1_tamil")
# transcriber_nepali = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec_v1_nepali")
# transcriber_marathi = pipeline("automatic-speech-recognition", model="ai4bharat/indicwav2vec_v1_marathi")

# Dispatch table: safer and clearer than looking pipelines up with eval().
transcribers = {
    "hindi": transcriber_hindi,
    "bengali": transcriber_bengali,
    "odia": transcriber_odia,
    "gujarati": transcriber_gujarati,
}

def resample_to_16k(audio, orig_sr):
    # IndicWav2Vec models expect 16 kHz input.
    return librosa.resample(y=audio, orig_sr=orig_sr, target_sr=16000)

def transcribe(audio, lang="hindi"):
    sr, y = audio
    y = y.astype(np.float32)
    peak = np.max(np.abs(y))
    if peak > 0:  # avoid dividing by zero on silent input
        y /= peak
    y_resampled = resample_to_16k(y, sr)
    if lang not in transcribers:
        return "No model yet", "Stay tuned!"
    pipe = transcribers[lang]
    start_time = time.time()
    trans = pipe(y_resampled)
    end_time = time.time()
    return trans["text"], end_time - start_time

demo = gr.Interface(
    transcribe,
    inputs=["microphone", gr.Radio(["hindi", "bengali", "odia", "gujarati"], value="hindi")],
    # inputs=["microphone", gr.Radio(["hindi", "bengali", "odia", "gujarati", "telugu", "sinhala", "tamil", "nepali", "marathi"], value="hindi")],
    outputs=["text", "text"],
    examples=[
        ["./Samples/Hindi_1.mp3", "hindi"],
        ["./Samples/Hindi_2.mp3", "hindi"],
        ["./Samples/Hindi_3.mp3", "hindi"],
        ["./Samples/Hindi_4.mp3", "hindi"],
        ["./Samples/Hindi_5.mp3", "hindi"],
        ["./Samples/Tamil_2.mp3", "hindi"],
        ["./Samples/climate ex short.wav", "hindi"],
        ["./Samples/Gujarati_1.wav", "gujarati"],
        ["./Samples/Gujarati_2.wav", "gujarati"],
        ["./Samples/Bengali_1.wav", "bengali"],
        ["./Samples/Bengali_2.wav", "bengali"],
    ],
)
# examples=[["./Samples/Hindi_1.mp3", "hindi"], ["./Samples/Hindi_2.mp3", "hindi"], ["./Samples/Tamil_1.mp3", "tamil"], ["./Samples/Tamil_2.mp3", "hindi"], ["./Samples/Nepal_1.mp3", "nepali"], ["./Samples/Nepal_2.mp3", "nepali"], ["./Samples/Marathi_1.mp3", "marathi"], ["./Samples/Marathi_2.mp3", "marathi"], ["./Samples/climate ex short.wav", "hindi"]]
demo.launch()
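Before the model runs, transcribe normalizes the waveform to the [-1, 1] range and resamples it to 16 kHz. The following is a minimal, self-contained sketch of that preprocessing step; it uses np.interp as a lightweight stand-in for librosa.resample (the real resampler applies proper filtering), and the preprocess helper name is introduced here for illustration:

```python
import numpy as np

def preprocess(y, orig_sr, target_sr=16000):
    """Normalize to [-1, 1] and resample to target_sr via linear interpolation."""
    y = y.astype(np.float32)
    peak = np.max(np.abs(y))
    if peak > 0:  # skip normalization for silent clips to avoid dividing by zero
        y /= peak
    n_out = int(round(len(y) * target_sr / orig_sr))
    t_in = np.linspace(0.0, 1.0, num=len(y), endpoint=False)
    t_out = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, y).astype(np.float32)

# One second of a 440 Hz tone sampled at 8 kHz becomes 16000 samples at 16 kHz.
tone = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
resampled = preprocess(tone, orig_sr=8000)
print(len(resampled), resampled.dtype)  # 16000 float32
```

Normalizing before resampling keeps microphone clips at different recording levels on a comparable scale, which tends to make the downstream acoustic model's behavior more consistent.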
You can try this model in the Ai4bharat Indicwav2vec Models Hugging Face Space by ashokrawat2023. For demonstration purposes, results are shown for speech-to-text conversion in Hindi, Bengali, Gujarati, and Odia.
Cheers!! Happy reading!! Keep learning!!
Please upvote if you liked this! Thanks!!
You can connect with me on Jyoti Dabass, Ph.D | LinkedIn and jyotidabass (Jyoti Dabass, Ph.D) (github.com) for more related content. Thanks!!
Thank you for being a part of the In Plain English community!