TL;DR: The “Hello Afrika!” project introduces the first speech command model for Kinyarwanda, addressing the lack of voice-controlled functionalities for African languages. It details data collection from diverse sources, model training using advanced techniques like LSTM, and successful deployment on various devices. This initiative aims to enhance accessibility and digital inclusion, particularly for persons with disabilities, by enabling voice interaction in their native language.
Voice commands have become an integral part of our daily lives, enabling hands-free control of smart devices and activating larger AI systems. However, a significant gap exists for speakers of African languages, who often lack access to these functionalities due to a scarcity of relevant speech command models and datasets.
The “Hello Afrika!” project emerges as a pioneering initiative to bridge this critical gap. Its initial phase focuses on the Kinyarwanda language, spoken in Rwanda, a country that has shown a strong interest in developing speech recognition technologies. This project aims to create a robust speech command model that allows native Kinyarwanda speakers to interact with devices in their own language, fostering greater accessibility and inclusivity, particularly for persons with disabilities.
The researchers behind “Hello Afrika!” developed their model using a custom speech command corpus. This corpus includes general directives like “Start” and “Stop,” numbers from 0 to 9, and a unique wake word, “Muraho Afrika” (Hello Afrika). The development process involved gathering data from multiple sources, including the Multilingual Spoken Word Corpus (MSWC) and Google Speech Commands (GSC). Recognizing the need for more specific data, the team also undertook local data collection from over 140 native Kinyarwanda speakers, ensuring a diverse and comprehensive dataset.
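The paper does not publish the exact layout of the collected corpus, but a keyword-spotting dataset of this kind is typically organized as one folder of clips per command. The sketch below is a hypothetical illustration of indexing and splitting such a corpus; the root path and folder names (English placeholders for the commands, plus the wake word) are assumptions, not details taken from the project.

```python
from pathlib import Path
import random

# Assumed layout (placeholder names, not the project's actual structure):
#   data/kinyarwanda_commands/start/clip_001.wav
#   data/kinyarwanda_commands/zero/clip_017.wav
#   data/kinyarwanda_commands/muraho_afrika/clip_042.wav   (wake word)
DATA_ROOT = Path("data/kinyarwanda_commands")  # hypothetical path

def index_corpus(root: Path, val_fraction: float = 0.1, seed: int = 0):
    """Collect (wav_path, label) pairs and split them into train/validation sets."""
    samples = [(wav, wav.parent.name) for wav in sorted(root.glob("*/*.wav"))]
    random.Random(seed).shuffle(samples)
    n_val = int(len(samples) * val_fraction)
    return samples[n_val:], samples[:n_val]

train_set, val_set = index_corpus(DATA_ROOT)
labels = sorted({label for _, label in train_set})
print(f"{len(train_set)} training clips, {len(val_set)} validation clips, {len(labels)} commands")
```

Keeping the label in the folder name makes it easy to merge clips from MSWC, GSC, and locally collected recordings into one index, as long as they share the same per-command directory convention.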
The project explored different machine learning models for training. Initially, a two-layer Convolutional Neural Network (CNN) was used, but its performance on larger datasets was limited. The team then transitioned to a more effective LSTM-based model, which showed significant improvement, especially when trained on Mel-Frequency Cepstral Coefficients (MFCCs), compact spectral features well suited to representing the human voice. Data augmentation techniques were also applied to enlarge the dataset and further improve model performance.
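The paper names MFCC features and an LSTM-based classifier but does not publish the exact architecture or hyper-parameters. As a minimal sketch of that combination, the example below extracts MFCCs with librosa and feeds them to a small Keras LSTM; the 13 coefficients, the one-second 16 kHz framing, the layer sizes, and the number of command classes are illustrative assumptions, not the authors' settings.

```python
import numpy as np
import librosa
import tensorflow as tf

SAMPLE_RATE = 16_000   # assumed: commands treated as roughly one-second clips
N_MFCC = 13            # illustrative choice, not taken from the paper
NUM_COMMANDS = 13      # e.g. start, stop, digits 0-9, and the wake word

def extract_mfcc(wav_path: str) -> np.ndarray:
    """Load a clip, pad or trim it to one second, and return MFCCs as (frames, coefficients)."""
    signal, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    signal = librosa.util.fix_length(signal, size=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=signal, sr=SAMPLE_RATE, n_mfcc=N_MFCC)
    return mfcc.T  # the LSTM consumes a (time steps, features) sequence

def build_model(time_steps: int) -> tf.keras.Model:
    """Small LSTM classifier over MFCC frames; layer sizes are placeholders."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(time_steps, N_MFCC)),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_COMMANDS, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Simple time-domain augmentations, such as adding low-level background noise or randomly shifting a clip before MFCC extraction, are common ways to enlarge a keyword dataset; the paper reports using augmentation but does not specify the exact transforms.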
Upon evaluation, the LSTM model demonstrated promising accuracy, particularly with the MSWC dataset. The final model was successfully deployed on various devices, including Linux PCs, mobile phones, and edge devices like the Wio Terminal, leveraging tools like Edge Impulse for simplified deployment. This multi-platform deployment highlights the practical applicability of the Kinyarwanda speech command system.
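The authors deploy through Edge Impulse, which generates firmware for targets such as the Wio Terminal. Purely as an illustration of the general on-device path outside that toolchain, a trained Keras model can also be converted to TensorFlow Lite for mobile or embedded inference; the snippet below is a generic conversion sketch with a hypothetical file name, not the project's actual deployment pipeline.

```python
import tensorflow as tf

# Load a previously trained Keras model (hypothetical file name).
model = tf.keras.models.load_model("kinyarwanda_commands.keras")

# Convert to TensorFlow Lite with default optimizations (weight quantization)
# to shrink the model for phones and microcontroller-class devices.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

with open("kinyarwanda_commands.tflite", "wb") as f:
    f.write(tflite_bytes)
```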
Looking ahead, the “Hello Afrika!” team plans to further refine the model by enhancing data cleaning processes and collecting more diverse audio samples, including adversarial words to improve robustness against false alarms. Future work also includes exploring personalization features, allowing the model to calibrate to individual user voices for a more tailored experience. This foundational work paves the way for broader adoption of voice-controlled technologies across African languages, promoting digital inclusion and empowering communities.
Also Read:
- Comparing ASR Models for Bangla: Wav2Vec-BERT Outperforms Whisper in Low-Resource Settings
- Beyond Data: Designing Culturally Attuned AI for Global Health in Latin America
For more detailed information, you can refer to the original research paper: “Hello Afrika!”: Speech Commands in Kinyarwanda.


