RjochtWurd: Empowering Frisian Through Speech-to-Text Improvement

We recently participated in the Llama Impact Hackathon, organized by lablab.ai on November 9-10, a weekend filled with late nights, breakthroughs, and more energy drinks and popcorn than I want to admit. Our team (Josh, Saker, Efe, and I) came together for the first time to develop RjochtWurd, a tool aimed at supporting the Frisian language by improving speech-to-text accuracy. Our mission was to help preserve and promote Frisian, an under-resourced language, by making modern technology more accessible to its speakers.

The Problem

Frisian is a minority language spoken by approximately 400,000 people, but it lacks the technological resources available to major languages. Existing speech-to-text solutions are either inaccurate or non-existent, which makes it hard for Frisian speakers to use these technologies for activities like auto-captioning movies, transcribing radio shows, and taking personal notes. This digital gap excludes Frisian speakers from technological advancements that are readily available to users of larger languages. Bridging it is essential if Frisian and other minority languages are to thrive in the modern age.

Our Solution: RjochtWurd

RjochtWurd is designed to improve the accuracy of Frisian speech-to-text by correcting errors in automatic transcriptions. We used the Mozilla Common Voice dataset to gather labeled voice recordings, then ran these recordings through a Frisian ASR model (https://huggingface.co/wietsedv/wav2vec2-large-xlsr-53-frisian) to generate initial transcriptions. As is typical for Frisian speech-to-text output, these transcriptions were often imperfect. We fine-tuned Llama 3.2 to map the erroneous transcriptions to their correct forms, so that Llama acts as an error correction layer on top of the ASR model. The process is language-agnostic: given labeled recordings and an existing ASR model, it could be adapted to improve speech-to-text for other low-resource languages.
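
To make the data preparation step concrete, here is a minimal sketch of how pairs of ASR output and reference text could be turned into fine-tuning records. The prompt wording, the field names, and the helper names build_examples and to_jsonl are assumptions made for illustration, not the format we actually used, and the sample ASR errors are invented.

```python
import json

# Hypothetical prompt template; the real prompt wording is not given in this post.
PROMPT_TEMPLATE = (
    "Correct the following Frisian speech-to-text transcription.\n"
    "Transcription: {hypothesis}\n"
    "Correction:"
)


def build_examples(pairs):
    """Turn (asr_hypothesis, reference) pairs into prompt/completion records."""
    examples = []
    for hypothesis, reference in pairs:
        hypothesis = hypothesis.strip()
        reference = reference.strip()
        if not hypothesis or not reference:
            continue  # skip clips with an empty transcription or label
        examples.append({
            "prompt": PROMPT_TEMPLATE.format(hypothesis=hypothesis),
            "completion": " " + reference,
        })
    return examples


def to_jsonl(examples):
    """Serialize records as JSON Lines, keeping non-ASCII characters readable."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


# Invented sample pairs: (ASR output, Common Voice reference sentence).
sample_pairs = [
    ("goeie morn", "goeie moarn"),
    ("tiche tank", "tige tank"),
]
print(to_jsonl(build_examples(sample_pairs)))
```

Each record pairs one imperfect transcription with its reference sentence, which is the mapping the fine-tuned model is asked to learn.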

Challenges and Development Process

The hackathon presented several challenges, particularly due to the tight timeframe. Initially, we intended to train our model using multiple Frisian ASR sources, but time constraints forced us to simplify our approach. Additionally, computational limitations meant that we couldn’t use larger versions of the Llama model, so we started with the 1B parameter version for faster iteration and later moved to the 3B version for improved accuracy.

Throughout development, our primary struggle was overfitting: the limited dataset made it hard for Llama to generalize across different types of inputs. Moreover, deploying our model on a CPU-only server led to slow inference times, which delayed our progress and left us with no time to create a polished presentation for submission.

Despite these setbacks, we managed to quantize the model to make it feasible to run on limited hardware. The late-stage adjustments highlighted our need for adaptability and taught us valuable lessons about scoping projects realistically in high-pressure environments.
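
For readers unfamiliar with quantization, the sketch below shows the basic idea using symmetric 8-bit quantization of a short list of weights. It illustrates the general technique only; it is not the scheme or tool we used, and the function names and sample weights are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights, standing in for one row of a model's weight matrix.
weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized, scale)
print(restored)
```

Storing one small integer per weight plus a single scale cuts memory use considerably compared with 32-bit floats, at the cost of a rounding error of at most half the scale per weight.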

Results and Impact

The evaluation of RjochtWurd showed mixed results. For simple sentences, the model performed admirably, correcting most errors effectively. However, for more complex inputs, the model sometimes hallucinated, producing incoherent or irrelevant outputs. This inconsistency was largely due to the small dataset and the inherent challenge of error correction in low-resource languages.
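
One common way to put numbers on results like these is word error rate (WER), computed once for the raw ASR output and once for the corrected output. The sketch below is a plain implementation of WER given as an assumption about how such a comparison could be run, not a description of our evaluation; the example strings are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the processed reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented strings: reference text, raw ASR output, and corrected output.
reference = "goeie moarn"
raw_asr = "goeie morn"
corrected = "goeie moarn"
print(word_error_rate(reference, raw_asr))
print(word_error_rate(reference, corrected))
```

A lower WER after correction than before would indicate that the correction layer helps. A hallucinated output can push WER above the raw ASR score, which is why the comparison is worth running per sentence and not only on average.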

Despite the challenges, RjochtWurd demonstrated that a fine-tuned Llama model can improve Frisian speech-to-text output, at least for simple sentences. The insights we gained from this hackathon provide a solid foundation for future work on speech technology for other minority languages.

Future Directions

Moving forward, we plan to explore the use of other models, like T5, which might be better suited for text-to-text tasks such as transcription error correction. We also envision extending our work to support additional low-resource languages, which could benefit from similar advancements in ASR technology. Another promising direction is using this approach for OCR corrections, making it applicable to noisy text extraction from scanned documents.

Reflections on the Hackathon Experience

This hackathon was a first-time collaboration for all of us, and we spent much of the first day getting to know each other’s strengths and establishing clear workflows. The fast-paced setting pushed us to improve our communication and teamwork skills rapidly. We went from struggling to make simple decisions (like which pizza place to order from) to working effectively as a cohesive unit. By the end, not only had we developed a valuable language resource for Frisian, but we had also become a team of friends, going from strangers arguing over frozen pizzas to a unit that could decide on Subway with lightning speed.

Join Us in Supporting Minority Languages

We invite you to explore RjochtWurd and join us in our mission to support minority languages through innovative AI. We are open to collaboration, feedback, and any form of support to help expand our work and bring it to a larger audience.

Together, we can make technology more inclusive for everyone, regardless of the language they speak.