Blaise Cruz
Mabuhay! 👋
I’m a PhD student at MBZUAI, supervised by Dr. Alham Fikri Aji, specializing in problems at the intersection of Multilinguality and Low-resource Languages.
In particular, I am interested in understanding how models behave when constrained to low-resource multilingual settings. I’ve collaborated with many talented colleagues on various topics under this umbrella, including:
- Code-Switching – Multilingual speakers naturally switch between two or more languages when speaking with peers, but multilingual models still struggle to understand and produce code-switched language.
- Resources & Evaluation – More data is often the best remedy for “very little data”. In addition to working on 🇵🇭 Filipino resources, I have also worked on resources for Southeast Asian languages and beyond.
- Applications in Low-resource Settings – Employing creative techniques to improve performance on tasks such as Multilingual Translation, Question Generation, Fake News Detection, and more, all under low-resource constraints.
Prior to my PhD, I was Lead Research Engineer at Samsung Research in the Philippines where I worked on low-resource machine translation and dialogue generation. I have also been previously affiliated with the University of the Philippines, De La Salle University, and Senti AI.
If you’re interested in collaborating or want to chat about low-resource languages, feel free to get in touch! You can reach me by email at me (at) blaisecruz (dot) com.
News
Oct 18, 2024 | We released World Cuisines, a massive multilingual and multicultural VQA benchmark dataset. The preprint can be accessed here. |
---|---|
Jun 17, 2024 | The preprint for our paper SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages is out! |
Jun 12, 2024 | The preprint for our paper CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark is out! |
May 15, 2024 | I’ll be joining the Mohammed bin Zayed University of Artificial Intelligence as a PhD student in Fall 2024! |
Mar 06, 2024 | The SEACrowd Data Catalogue – the main consolidated repository for all datasets collected by the SEACrowd Project – is now live! |
Latest Posts
Jun 12, 2024 | Welcome! |
---|