Blaise Cruz


Mabuhay! 👋

I’m a PhD student at MBZUAI, supervised by Dr. Alham Fikri Aji, specializing in problems at the intersection of multilinguality and low-resource languages.

In particular, I am interested in understanding how models behave in low-resource multilingual settings. I’ve collaborated with many talented colleagues on various topics under this umbrella.

Prior to my PhD, I was Lead Research Engineer at Samsung Research in the Philippines where I worked on low-resource machine translation and dialogue generation. I have also been previously affiliated with the University of the Philippines, De La Salle University, and Senti AI.

If you’re interested in collaborating or want to chat about low-resource languages, feel free to get in touch! You can reach me by email at me (at) blaisecruz (dot) com.


News

Oct 18, 2024 We release World Cuisines, a massive multilingual and multicultural VQA benchmark dataset. Preprint can be accessed here.
Jun 17, 2024 The preprint for our paper SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages is out!
Jun 12, 2024 The preprint for our paper CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark is out!
May 15, 2024 I’ll be joining the Mohammed bin Zayed University of Artificial Intelligence as a PhD student this Fall 2024!
Mar 06, 2024 The SEACrowd Data Catalogue – the main consolidated repository for all datasets collected by the SEACrowd Project – is now live!

Latest Posts

Jun 12, 2024 Welcome!

Selected Publications

  1. EMNLP
    SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
    Holy Lovenia†, Rahmad Mahendra†, Salsabil Maulana Akbar†, Lester James V. Miranda†, Jennifer Santoso†, Elyanah Aco†, Akhdan Fadhilah†, Jonibek Mansurov†, Joseph Marvin Imperial†, Onno P. Kampman†, Joel Ruben Antony Moniz†, and 50 more authors
    To appear in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
  2. EMNLP
    Oral Presentation
    Multilingual Large Language Models Are Not (Yet) Code-Switchers
    Ruochen Zhang*, Samuel Cahyawijaya*, Jan Christian Blaise Cruz*, Genta Indra Winata*, and Alham Fikri Aji*
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
  3. LREC
    Improving Large-scale Language Models and Resources for Filipino
    Jan Christian Blaise Cruz and Charibeth Cheng
    In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022