Publications

Group highlights

(For a full list see below or go to Google Scholar)

On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs

We propose a framework, PPBench, for evaluating diverse psychological aspects of LLMs, including personality traits, interpersonal relationships, motivational tests, and emotional abilities.

Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho LAM, Shujie Ren, Youliang Yuan, Wenxiang Jiao**, Zhaopeng Tu and Michael Lyu

ICLR 2024 Oral (1.2%) Code

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

We propose a novel framework, CipherChat, to systematically examine the generalizability of safety alignment to non-natural languages – ciphers. GPT-4 understands ciphers such that it tend to generate unsafe outputs with CipherChat.

Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi and Zhaopeng Tu

ICLR 2024 Code

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

We propose and define the Degeneration-of-Thought (DoT) problem in self-reflection, and address it by proposing the Multi-Agent Debate (MAD) framework to explore divergent chain-of-thoughts.

Tian Liang *, Zhiwei He *, Wenxiang Jiao *, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu and Shuming Shi

EMNLP 2024 Code

Exploring Human-Like Translation Strategy with Large Language Models

We propose the MAPS framework to enable LLMs (e.g., ChatGPT, Alpaca) to mimic the human translation process by multi-aspect prompting and selection.

Zhiwei He *, Tian Liang *, Wenxiang Jiao, Zhuosheng Zhang, Yujiu Yang, Rui Wang, Zhaopeng Tu, Shuming Shi and Xing Wang

TACL 2024 Code Media Report

ParroT: Translating During Chat Using Large Language Models

We propose the ParroT framework to enhance and regulate the translation abilities during chat based on open-sourced LLMs~(i.e., LLaMA-7b) and human written translation and evaluation data. Specifically, ParroT reformulates translation data into the instruction-following style, and introduces a “Hint” field for incorporating extra requirements to regulate the translation process.

Wenxiang Jiao, Jen-tse Huang, Wenxuan Wang, Xing Wang, Shuming Shi and Zhaopeng Tu

EMNLP 2023 (Findings) Code

Is ChatGPT A Good Translator? A Preliminary Study/Yes With GPT-4 As The Engine

We find that ChatGPT performs competitively with commercial translation products (e.g., Google Translate) on high-resource European languages and also well on spoken language. GPT-4 further bridges the gap of translation performance for even low-resource or distant languages.

Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Shuming Shi and Zhaopeng Tu

Preprint Code Media Report

Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages

We adopt data augmentation, distributionally robust optimization, and language family grouping, respectively, to develop our multilingual neural machine translation (MNMT) models for African languages.

Wenxiang Jiao, Zhaopeng Tu, Jiarui Li, Wenxuan Wang, Jen-tse Huang and Shuming Shi

WMT 2022 / 1st Place in the Competition Code

Adapters for Enhanced Modeling of Multilingual Knowledge and Text

We enhance multilingual LMs with knowledge from multilingual knowledge graphs to tackle language and knowledge graph tasks across many languages.

Yifan Hou, Wenxiang Jiao, Meizhen Liu, Carl Allen, Zhaopeng Tu and Mrinmaya Sachan

EMNLP 2022 (Findings) / Best Paper Award at MRL Workshop Code

Full List

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models
Youliang Yuan, Wenxiang Jiao, Yuejin Xie, Chihao Shen, Menghan Tian, Wenxuan Wang, Jen-tse Huang, Pinjia He
NeurIPS 2025 Datasets & Benchmarks

Learning to Ask: When LLM Agents Meet Unclear Instruction
Wenxuan Wang, Juluan Shi, Zixuan Ling, Yuk-Kit Chan, Chaozheng Wang, Cheryl Lee, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao**, Michael R Lyu
EMNLP 2025

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization
Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu
ACL 2025

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu
ACL 2025

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step
Wenxuan Wang, Kuiyi Gao, Zihan Jia, Youliang Yuan, Jen-tse Huang, Qiuzhi Liu, Shuai Wang, Wenxiang Jiao**, Zhaopeng Tu
ACL 2025 (Findings)

How Far Are We on the Decision-Making of LLMs? Evaluating LLMs’ Gaming Ability in Multi-Agent Environments
Jen-tse Huang, Eric John Li, Man Ho Lam, Tian Liang, Wenxuan Wang, Youliang Yuan, Wenxiang Jiao**, Xing Wang, Zhaopeng Tu, Michael R Lyu
ICLR 2025

Emotionally Numb or Empathetic? Evaluating How LLMs Feel using EmotionBench
Jen-tse Huang, Man Ho Lam, Eric John Li, Shujie Ren, Wenxuan Wang, Wenxiang Jiao**, Zhaopeng Tu and Michael R. Lyu
NeurIPS 2024

Improving Gloss-free Sign Language Translation by Reducing Representation Density
Jinhui Ye, Xing Wang, Wenxiang Jiao**, Junwei Liang, Hui Xiong
NeurIPS 2024

NewTerm: Benchmarking Real-Time New Terms for Large Language Models with Annual Updates
Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Min Zhang, Zhaopeng Tu
NeurIPS 2024 Datasets & Benchmarks

On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs
Jen-tse Huang, Wenxuan Wang, Eric John Li, Man Ho LAM, Shujie Ren, Youliang Yuan, Wenxiang Jiao**, Zhaopeng Tu and Michael Lyu
ICLR 2024 Oral (1.2%)

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Pinjia He, Shuming Shi and Zhaopeng Tu
ICLR 2024

ChatGPT an ENFJ, Bard an ISTJ: Empirical Study on Personalities of Large Language Models
Jen-tse Huang, Wenxuan Wang, Man Ho Lam, Eric John Li, Wenxiang Jiao** and Michael R. Lyu
Preprint

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Tian Liang *, Zhiwei He *, Wenxiang Jiao *, Xing Wang, Yan Wang, Rui Wang, Yujiu Yang, Zhaopeng Tu and Shuming Shi
EMNLP 2024

Exploring Human-Like Translation Strategy with Large Language Models
Zhiwei He *, Tian Liang *, Wenxiang Jiao, Zhuosheng Zhang, Yujiu Yang, Rui Wang, Zhaopeng Tu, Shuming Shi and Xing Wang
TACL 2024

ParroT: Translating During Chat Using Large Language Models
Wenxiang Jiao, Jen-tse Huang, Wenxuan Wang, Xing Wang, Shuming Shi and Zhaopeng Tu
EMNLP 2023 (Findings)

ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark
Haoran Wu, Wenxuan Wang, Yuxuan Wan, Wenxiang Jiao and Michael R. Lyu
Preprint

Is ChatGPT A Good Translator? A Preliminary Study/Yes With GPT-4 As The Engine
Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Shuming Shi and Zhaopeng Tu
Preprint

kNN-TL: k-Nearest-Neighbor Transfer Learning for Low-Resource Neural Machine Translation
Shudong Liu, Xuebo Liu, Derek F. Wong, Zhaocong Li, Wenxiang Jiao, Lidia S. Chao and Min Zhang
ACL 2023

Cross-modality Data Augmentation for End-to-End Sign Language Translation
Jinhui Ye, Wenxiang Jiao, Xing Wang, Zhaopeng Tu and Hui Xiong
EMNLP 2023 (Findings)

Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation
Jinhui Ye *, Wenxiang Jiao *, Xing Wang and Zhaopeng Tu
EACL 2023

Tencent’s Multilingual Machine Translation System for WMT22 Large-Scale African Languages
Wenxiang Jiao, Zhaopeng Tu, Jiarui Li, Wenxuan Wang, Jen-tse Huang and Shuming Shi
WMT 2022 / 1st Place in the Competition

Adapters for Enhanced Modeling of Multilingual Knowledge and Text
Yifan Hou, Wenxiang Jiao, Meizhen Liu, Carl Allen, Zhaopeng Tu and Mrinmaya Sachan
EMNLP 2022 (Findings) / Best Paper Award at MRL Workshop

Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation
Wenxuan Wang, Wenxiang Jiao, Yongchang Hao, Xing Wang, Shuming Shi, Zhaopeng Tu and Michael R. Lyu
ACL 2022

Exploiting Inactive Examples for Natural Language Generation with Data Rejuvenation
Wenxiang Jiao, Xing Wang, Shilin He, Zhaopeng Tu, Irwin King and Michael R. Lyu
IEEE/ACM TASLP 2022

Self-training Sampling with Monolingual Data Uncertainty for Neural Machine Translation
Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Shuming Shi, Michael R. Lyu and Irwin King
ACL 2021

Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation
Wenxiang Jiao, Xing Wang, Shilin He, Irwin King, Michael R. Lyu and Zhaopeng Tu
EMNLP 2020

Exploiting Unsupervised Data for Emotion Recognition in Conversations
Wenxiang Jiao, Michael R. Lyu and Irwin King
EMNLP 2020 (Findings)

Real-Time Emotion Recognition via Attention Gated Hierarchical Memory Network
Wenxiang Jiao, Michael R. Lyu and Irwin King
AAAI 2020

HiGRU - Hierarchical Gated Recurrent Units for Utterance-Level Emotion Recognition
Wenxiang Jiao, Haiqin Yang, Irwin King and Michael R. Lyu
NAACL 2019