Blaise Cruz

Mabuhay! 👋

I’m a PhD student at MBZUAI, supervised by Dr. Alham Fikri Aji, working on novel approaches to Modeling Multilinguality. In particular, my work centers on methods that reframe multilinguality in an efficient and linguistically motivated manner, beyond simply cramming hundreds of languages into the same billion-ish parameters.

Besides this, I’ve worked on various other topics under the multilinguality and low-resource umbrella, including:

Code Switching – Multilingual speakers naturally code-switch between two or more languages when speaking to peers, but multilingual models still struggle to understand and produce code-switched text.
Resources & Evaluation – More data is often the best remedy for “very little data”. Beyond Filipino resources and benchmarks for my home region, I have also collaborated on Southeast Asian tools and resources as a co-founder of SEACrowd and ACL SIGSEA.
Beyond Southeast Asia – Beyond my home region, I’ve also worked on global efforts for cultural QA, cultural MT, and large-scale cultural benchmarking. More recently, I’ve dabbled in topics such as theory of mind and competitive programming, leveraging my past as an ICPC contestant.
Applications in Low-resource Settings – Lastly, I’ve worked on improving performance on tasks such as Multilingual Translation, Question Generation, Fake News Detection, and more, all under low-resource constraints.

Prior to my PhD, I was Lead Research Engineer at Samsung Research where I worked on low-resource machine translation and dialogue generation. I’ve also previously been affiliated with Mila - Quebec AI Institute and McGill University, the University of the Philippines, De La Salle University, and Senti AI.

If you’re interested in collaborating or want to chat about low-resource languages, feel free to get in touch! You can reach me by email at me (at) blaisecruz (dot) com.

News

Jan 20, 2026	Our new paper on algorithm-focused benchmarking for competitive programming, Idea First, Code Later, is finally out!
Jan 16, 2026	Proud to release my new work, Multilinguality as Sense Adaptation! Many thanks to Mila - Quebec AI Institute and McGill NLP for hosting me in Montréal and making the work possible.
Aug 21, 2025	Three papers accepted for EMNLP 2025!
Aug 12, 2025	We’re proud to release FilBench, the first Open LLM Evaluation Suite and Leaderboard for Filipino!
Jul 09, 2025	We’re excited to announce MoMentS, a new comprehensive multimodal benchmark for theory of mind in large language models!

Latest posts

Jun 12, 2024	Welcome!

Selected Publications

arXiv

Multilinguality as Sense Adaptation

Jan Christian Blaise Cruz, David Ifeoluwa Adelani, and Alham Fikri Aji

2026

arXiv Code

EMNLP

FilBench: Can LLMs Understand and Generate Filipino?

Lester James V Miranda^*, Elyanah Aco^*, Conner Manuel^*, Jan Christian Blaise Cruz^†, and Joseph Marvin Imperial^†

In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025

Resources arXiv Code Leaderboard Blog

EMNLP
Oral Presentation

Multilingual Large Language Models Are Not (Yet) Code-Switchers

Ruochen Zhang^*, Samuel Cahyawijaya^*, Jan Christian Blaise Cruz^*, Genta Indra Winata^*, and Alham Fikri Aji^*

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

arXiv PDF