A real-time tool that enhances communication for non-English-speaking individuals, accounting for cultural idiosyncrasies and colloquialisms, to facilitate their integration and independence. This project seeks to bridge the cultural gap for elderly immigrants in Australia by using Large Language Models to offer real-time translations that account for local slang and nuances. Based on 2022-2023 Department of Home Affairs data, we selected the most common languages among immigrants: Mandarin, Japanese, and Punjabi.
We use Diart in combination with real-time Whisper. Diart identifies the current speaker, while Whisper handles speech transcription. We fine-tuned the Whisper model to better handle Australian slang. The transcribed text is then translated by the DeepSeek LLM, with an Australian-slang-to-standard-English dictionary integrated into the process. The translation pipeline operates over WebSocket, while the logic for database interactions is defined in Django. The backend is deployed on Google Cloud.
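As a rough illustration of the dictionary step described above, the sketch below normalizes slang in a transcribed utterance before it is handed to the translation model. It is not the project's code: the dictionary entries, the function names `normalize_slang` and `build_prompt`, and the prompt wording are all assumptions made for the example, and the actual call to the LLM is left out.

```python
import re

# Hypothetical sample entries; the project's real dictionary is not public.
AU_SLANG = {
    "arvo": "afternoon",
    "servo": "petrol station",
    "brekkie": "breakfast",
    "no worries": "no problem",
}


def normalize_slang(text: str, slang: dict[str, str] = AU_SLANG) -> str:
    """Replace known slang terms with plain English, matching whole words only."""
    if not slang:
        return text
    # Try longer terms first so multi-word phrases win over shorter overlaps.
    terms = sorted(slang, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Look up the lowercased match so "Arvo" and "arvo" map to the same entry.
    return pattern.sub(lambda m: slang[m.group(0).lower()], text)


def build_prompt(speaker: str, text: str, target_language: str) -> str:
    """Assemble the text that would be sent to the translation model (assumed wording)."""
    return (
        f"Translate into {target_language}. "
        f"Speaker {speaker} said: {normalize_slang(text)}"
    )
```

Here the speaker label stands in for the output of the diarization step and the text for the transcription output. Normalizing before translation keeps the prompt short; an alternative design would pass the dictionary entries to the model as context and let it resolve the slang itself.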
The user interface follows a minimalist design, with clear fonts and buttons tailored for our target audience. The entire translation process requires no user interaction with the screen. The translated text for both speakers is automatically displayed on-screen and stored in the database for future reference. This project was developed as part of the Capstone Course at the University of Queensland, in collaboration with Kintaro Kawai, Siddhant Arora, Jordan Singh, Milo Hunter, and Arpon Sarker.
* The project's code is stored in a private GitHub repository to protect the model API.