An updated list of my publications can also be found on my Google Scholar profile.
2024
Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense
Samuel Cahyawijaya*, Ruochen Zhang*, Jan Christian Blaise Cruz*, Holy Lovenia*, Hiroki Nomoto*, and Alham Fikri Aji*
Under review at ACL Rolling Review, 2024
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata*†, Frederikus Hudi*†, Patrick Amadeus Irawan*†, David Anugraha*†, Rifki Afina Putri*†, Yutong Wang†, Adam Nohejl†, Ubaidillah Ariq Prathama†, Nedjma Ousidhoum†, Afifa Amriani, Anar Rzayev, and 40 more authors
Under review at ACL Rolling Review, 2024
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
Holy Lovenia†, Rahmad Mahendra†, Salsabil Maulana Akbar†, Lester James V. Miranda†, Jennifer Santoso†, Elyanah Aco†, Akhdan Fadhilah†, Jonibek Mansurov†, Joseph Marvin Imperial†, Onno P. Kampman†, Joel Ruben Antony Moniz†, and 50 more authors
To Appear in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
CVQA: Culturally-diverse Multilingual Question Answering Benchmark
David Romero*†, Chenyang Lyu*†, Haryo Akbarianto Wibowo†, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, and 64 more authors
To Appear in Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task
Dan John Velasco*, Manuel Antonio Rufino*, and Jan Christian Blaise Cruz§
To Appear in Proceedings of the Ninth Conference on Machine Translation (WMT), 2024
Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task
Matthew Theodore Roque*, Carlos Rafael Catalan*, Dan John Velasco, Manuel Antonio Rufino, and Jan Christian Blaise Cruz§
To Appear in Proceedings of the Ninth Conference on Machine Translation (WMT), 2024
2023
Multilingual Large Language Models Are Not (Yet) Code-Switchers
Ruochen Zhang*, Samuel Cahyawijaya*, Jan Christian Blaise Cruz*, Genta Indra Winata*, and Alham Fikri Aji*
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Samsung R&D Institute Philippines at WMT 2023
Jan Christian Blaise Cruz
In Proceedings of the Eighth Conference on Machine Translation (WMT), 2023
Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Samuel Cahyawijaya, Holy Lovenia, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, and 3 more authors
In Proceedings of the Sixth Computational Approaches to Linguistic Code Switching Workshop (CALCS), 2023
Current Status of NLP in South East Asia with Insights from Multilingualism and Linguistic Diversity
Alham Fikri Aji, Jessica Zosa Forde, Alyssa Marie Loo, Lintang Sutawika, Skyler Wang, Genta Indra Winata, Zheng-Xin Yong, Ruochen Zhang, A Seza Dogruöz, Yin Lin Tan, and Jan Christian Blaise Cruz
In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract, 2023
Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings
Dan John Velasco, Axel Alba, Trisha Gail Pelagio, Bryce Anthony Ramirez, Unisse Chua, Briane Paul Samson, Jan Christian Blaise Cruz§, and Charibeth Cheng§
In Proceedings of the First Workshop in Southeast Asian Language Processing (SEALP), 2023
2022
Samsung Research Philippines - Datasaur AI’s Submission for the WMT22 Large Scale Multilingual Translation Task
Jan Christian Blaise Cruz*, and Lintang Sutawika*
In Proceedings of the Seventh Conference on Machine Translation (WMT), 2022
Using Synthetic Data for Conversational Response Generation in Low-resource Settings
Gabriel Louis Tan, Adrian Paule Ty, Schuyler Ng, Denzel Adrian Co, Jan Christian Blaise Cruz§, and Charibeth Cheng§
In Proceedings of the 2022 International Conference on Asian Language Processing (IALP), 2022
Improving Large-scale Language Models and Resources for Filipino
Jan Christian Blaise Cruz, and Charibeth Cheng
In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022
2021
Data Processing Matters: SRPH-Konvergen AI’s Machine Translation System for WMT’21
Lintang Sutawika*, and Jan Christian Blaise Cruz*
In Proceedings of the Sixth Conference on Machine Translation (WMT), 2021
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets
Jan Christian Blaise Cruz, Jose Kristian Resabal, James Lin, Dan John Velasco, and Charibeth Cheng
In Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2021
Simplifying Paragraph-level Question Generation via Transformer Language Models
Luis Enrico Lopez*, Diane Kathryn Cruz*, Jan Christian Blaise Cruz*, and Charibeth Cheng
In Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2021
2020
Localization of Fake News Detection via Multitask Transfer Learning
Jan Christian Blaise Cruz, Julianne Agatha Tan, and Charibeth Cheng
In Proceedings of The 12th Language Resources and Evaluation Conference (LREC), 2020
Establishing Baselines for Text Classification in Low-Resource Languages
Jan Christian Blaise Cruz, and Charibeth Cheng
2020
Evaluating Language Model Finetuning Techniques for Low-resource Languages
Jan Christian Blaise Cruz, and Charibeth Cheng
Philippine Computing Journal (PCJ), 2020
2019
Localization of Fake News Detection via Multitask Transfer Learning
Jan Christian Blaise Cruz, Julianne Agatha Tan, and Charibeth Cheng
De La Salle University-Manila, 2019
Undergraduate Thesis
2018
Building Guitar Strum Models for an Interactive Air Guitar Prototype
John Edel Tamani*, Jan Christian Blaise Cruz*, Jolene Valenzuela*, Joshua Cruzada*, Kevin Chan, and Jordan Deja
In 4th International Conference on Human-Computer Interaction and User Experience in Indonesia, 2018