Publications

Asterisks (*) denote equal contribution. The order of authors was determined randomly.

An up-to-date list can also be found on my Google Scholar profile.

2023

  1. EMNLP
    Multilingual Large Language Models Are Not (Yet) Code-Switchers
    Zhang, Ruochen*, Cahyawijaya, Samuel*, Cruz, Jan Christian Blaise*, Winata, Genta Indra*, and Aji, Alham Fikri*
    To Appear In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023
  2. WMT
    Samsung R&D Institute Philippines at WMT 2023
    Cruz, Jan Christian Blaise
    To Appear In Proceedings of the Eighth Conference on Machine Translation 2023
  3. CALCS
    Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages
    Yong, Zheng-Xin, Zhang, Ruochen, Forde, Jessica Zosa, Wang, Skyler, Cahyawijaya, Samuel, Lovenia, Holy, Winata, Genta Indra, Sutawika, Lintang, Cruz, Jan Christian Blaise, Tan, Yin Lin, Phan, Long, Garcia, Rowena, Solorio, Thamar, and Aji, Alham Fikri
    To Appear In Proceedings of the Sixth Computational Approaches to Linguistic Code Switching Workshop (CALCS) 2023
  4. SEALP
    Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings
    Velasco, Dan John, Alba, Axel, Pelagio, Trisha Gail, Ramirez, Bryce Anthony, Chua, Unisse, Samson, Briane Paul, Cruz, Jan Christian Blaise, and Cheng, Charibeth
    To Appear In Proceedings of the First Workshop in Southeast Asian Language Processing (SEALP) 2023

2022

  1. WMT
    Samsung Research Philippines - Datasaur AI’s Submission for the WMT22 Large Scale Multilingual Translation Task
    Cruz, Jan Christian Blaise, and Sutawika, Lintang
    In Proceedings of the Seventh Conference on Machine Translation 2022
  2. IALP
    Using Synthetic Data for Conversational Response Generation in Low-resource Settings
    Tan, Gabriel Louis, Ty, Adrian Paule, Ng, Schuyler, Co, Denzel Adrian, Cruz, Jan Christian Blaise, and Cheng, Charibeth
    In Proceedings of the 2022 International Conference on Asian Language Processing 2022
  3. LREC
    Improving Large-scale Language Models and Resources for Filipino
    Cruz, Jan Christian Blaise, and Cheng, Charibeth
    In Proceedings of the 13th Language Resources and Evaluation Conference 2022

2021

  1. WMT
    Data Processing Matters: SRPH-Konvergen AI’s Machine Translation System for WMT’21
    Sutawika, Lintang*, and Cruz, Jan Christian Blaise*
    In Proceedings of the Sixth Conference on Machine Translation 2021
  2. PRICAI
    Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets
    Cruz, Jan Christian Blaise, Resabal, Jose Kristian, Lin, James, Velasco, Dan John, and Cheng, Charibeth
    In Pacific Rim International Conference on Artificial Intelligence 2021
  3. PRICAI
    Simplifying Paragraph-level Question Generation via Transformer Language Models
    Lopez, Luis Enrico*, Cruz, Diane Kathryn*, Cruz, Jan Christian Blaise*, and Cheng, Charibeth
    In Pacific Rim International Conference on Artificial Intelligence 2021

2020

  1. LREC
    Localization of Fake News Detection via Multitask Transfer Learning
    Cruz, Jan Christian Blaise, Tan, Julianne Agatha, and Cheng, Charibeth
    In Proceedings of The 12th Language Resources and Evaluation Conference 2020
  2. arXiv
    Establishing Baselines for Text Classification in Low-Resource Languages
    Cruz, Jan Christian Blaise, and Cheng, Charibeth
    2020
  3. PCJ
    Evaluating Language Model Finetuning Techniques for Low-resource Languages
    Cruz, Jan Christian Blaise, and Cheng, Charibeth
    Philippine Computing Journal 2020

2019

  1. B.Sc. Thesis
    Localization of Fake News Detection via Multitask Transfer Learning
    Cruz, Jan Christian Blaise, Tan, Julianne Agatha, and Cheng, Charibeth
    De La Salle University-Manila 2019

2018

  1. CHIUXID
    Building Guitar Strum Models for an Interactive Air Guitar Prototype
    Tamani, John Edel*, Cruz, Jan Christian Blaise*, Valenzuela, Jolene*, Cruzada, Joshua*, Chan, Kevin, and Deja, Jordan
    In 4th International Conference on Human-Computer Interaction and User Experience in Indonesia 2018