Publications

Conference, journal, and preprint publications of my work

An updated list of my publications can also be found on my Google Scholar profile.

2024

  1. ARR
    Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense
    Samuel Cahyawijaya*, Ruochen Zhang*, Jan Christian Blaise Cruz*, Holy Lovenia*, Hiroki Nomoto*, and Alham Fikri Aji*
    Under review at ACL Rolling Review, 2024
  2. ARR
    WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
    Genta Indra Winata*†, Frederikus Hudi*†, Patrick Amadeus Irawan*†, David Anugraha*†, Rifki Afina Putri*†, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, Afifa Amriani, Anar Rzayev, and 40 more authors
    Under review at ACL Rolling Review, 2024
  3. EMNLP
    SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
    Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V. Miranda, Jennifer Santoso, Elyanah Aco, Akhdan Fadhilah, Jonibek Mansurov, Joseph Marvin Imperial, Onno P. Kampman, Joel Ruben Antony Moniz, and 50 more authors
    To appear in Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
  4. NeurIPS
    Oral Presentation
    CVQA: Culturally-diverse Multilingual Question Answering Benchmark
    David Romero*†, Chenyang Lyu*†, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, and 64 more authors
    To appear in Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024
  5. WMT
    Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task
    Dan John Velasco*, Manuel Antonio Rufino*, and Jan Christian Blaise Cruz§
    To appear in Proceedings of the Ninth Conference on Machine Translation (WMT), 2024
  6. WMT
    Samsung R&D Institute Philippines @ WMT 2024 Indic MT Task
    Matthew Theodore Roque*, Carlos Rafael Catalan*, Dan John Velasco, Manuel Antonio Rufino, and Jan Christian Blaise Cruz§
    To appear in Proceedings of the Ninth Conference on Machine Translation (WMT), 2024

2023

  1. EMNLP
    Oral Presentation
    Multilingual Large Language Models Are Not (Yet) Code-Switchers
    Ruochen Zhang*, Samuel Cahyawijaya*, Jan Christian Blaise Cruz*, Genta Indra Winata*, and Alham Fikri Aji*
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
  2. WMT
    Samsung R&D Institute Philippines at WMT 2023
    Jan Christian Blaise Cruz
    In Proceedings of the Eighth Conference on Machine Translation (WMT), 2023
  3. CALCS
    Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
    Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Samuel Cahyawijaya, Holy Lovenia, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, and 3 more authors
    In Proceedings of the Sixth Computational Approaches to Linguistic Code Switching Workshop (CALCS), 2023
  4. AACL
    Tutorials
    Current Status of NLP in South East Asia with Insights from Multilingualism and Linguistic Diversity
    Alham Fikri Aji, Jessica Zosa Forde, Alyssa Marie Loo, Lintang Sutawika, Skyler Wang, Genta Indra Winata, Zheng-Xin Yong, Ruochen Zhang, A. Seza Doğruöz, Yin Lin Tan, and Jan Christian Blaise Cruz
    In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: Tutorial Abstract, 2023
  5. SEALP
    Oral Presentation
    Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings
    Dan John Velasco, Axel Alba, Trisha Gail Pelagio, Bryce Anthony Ramirez, Unisse Chua, Briane Paul Samson, Jan Christian Blaise Cruz§, and Charibeth Cheng§
    In Proceedings of the First Workshop in Southeast Asian Language Processing (SEALP), 2023

2022

  1. WMT
    Samsung Research Philippines - Datasaur AI’s Submission for the WMT22 Large Scale Multilingual Translation Task
    Jan Christian Blaise Cruz* and Lintang Sutawika*
    In Proceedings of the Seventh Conference on Machine Translation (WMT), 2022
  2. IALP
    Best Paper Award
    Using Synthetic Data for Conversational Response Generation in Low-resource Settings
    Gabriel Louis Tan, Adrian Paule Ty, Schuyler Ng, Denzel Adrian Co, Jan Christian Blaise Cruz§, and Charibeth Cheng§
    In Proceedings of the 2022 International Conference on Asian Language Processing (IALP), 2022
  3. LREC
    Improving Large-scale Language Models and Resources for Filipino
    Jan Christian Blaise Cruz and Charibeth Cheng
    In Proceedings of the 13th Language Resources and Evaluation Conference (LREC), 2022

2021

  1. WMT
    Data Processing Matters: SRPH-Konvergen AI’s Machine Translation System for WMT’21
    Lintang Sutawika* and Jan Christian Blaise Cruz*
    In Proceedings of the Sixth Conference on Machine Translation (WMT), 2021
  2. PRICAI
    Oral Presentation
    Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets
    Jan Christian Blaise Cruz, Jose Kristian Resabal, James Lin, Dan John Velasco, and Charibeth Cheng
    In Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2021
  3. PRICAI
    Oral Presentation
    Simplifying Paragraph-level Question Generation via Transformer Language Models
    Luis Enrico Lopez*, Diane Kathryn Cruz*, Jan Christian Blaise Cruz*, and Charibeth Cheng
    In Pacific Rim International Conference on Artificial Intelligence (PRICAI), 2021

2020

  1. LREC
    Localization of Fake News Detection via Multitask Transfer Learning
    Jan Christian Blaise Cruz, Julianne Agatha Tan, and Charibeth Cheng
    In Proceedings of the 12th Language Resources and Evaluation Conference (LREC), 2020
  2. arXiv
    Establishing Baselines for Text Classification in Low-Resource Languages
    Jan Christian Blaise Cruz and Charibeth Cheng
    2020
  3. PCJ
    Evaluating Language Model Finetuning Techniques for Low-resource Languages
    Jan Christian Blaise Cruz and Charibeth Cheng
    Philippine Computing Journal (PCJ), 2020

2019

  1. B.Sc. Thesis
    Localization of Fake News Detection via Multitask Transfer Learning
    Jan Christian Blaise Cruz, Julianne Agatha Tan, and Charibeth Cheng
    De La Salle University-Manila, 2019
    Undergraduate Thesis

2018

  1. CHIUXID
    Oral Presentation
    Building Guitar Strum Models for an Interactive Air Guitar Prototype
    John Edel Tamani*, Jan Christian Blaise Cruz*, Jolene Valenzuela*, Joshua Cruzada*, Kevin Chan, and Jordan Deja
    In 4th International Conference on Human-Computer Interaction and User Experience in Indonesia, 2018