Wenxiang Jiao (焦文祥)

My research studies how large language models reason, act as agents, refuse unsafe requests, and portray psychological traits. I develop benchmarks, open-source systems, and training/alignment methods to diagnose model failures and build more reliable AI applications.

General Agents

Multimodal, tool-using, and collaborative agents for complex, long-horizon tasks.

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Abstracts interaction data into atomic decision experiences via hindsight reasoning, then retrieves them at inference through policy-driven wide- and deep-search strategies, improving fine-grained visual perception and multimodal reasoning over trajectory-level retrieval baselines.

ArXiv 2026 Paper Code

MMSkills: Towards Multimodal Skills for General Visual Agents

Represents reusable multimodal procedures as compact, state-conditioned skill packages — a textual procedure paired with runtime state cards and multi-view keyframes — generated from public trajectories and consulted by a branch-loaded skill agent at runtime.

ArXiv 2026 Paper Code

DeepAgent: A General Reasoning Agent with Scalable Toolsets

A reasoning agent capable of tackling general tasks by searching for and using the appropriate tools from over 16,000 RapidAPIs in an end-to-end agentic reasoning process.

WWW 2026 Paper Code

OmniGAIA: Towards Native Omni-Modal AI Agents

A benchmark and foundation agent for omni-modal AI assistants. OmniGAIA synthesizes multi-hop queries across video, audio, and image via an omni-modal event graph; the accompanying OmniAtlas agent uses active omni-modal perception trained with hindsight-guided tree exploration and OmniDPO.

ArXiv 2026 Paper Code

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

The Multi-Agent Debate (MAD) framework addresses the Degeneration-of-Thought problem and explores divergent chains of thought through structured agent interaction.

EMNLP 2024 Paper Code
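
As a rough illustration of the structured interaction described above, here is a minimal debate loop. It is a sketch under assumptions, not the released MAD code: the two-debater-plus-judge layout, the `debate` function, and the toy agents are hypothetical stand-ins for real LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces: an agent sees the question and the transcript so far.
Agent = Callable[[str, List[str]], str]
# A judge returns a final answer once it can settle the debate, else None.
Judge = Callable[[str, List[str]], Optional[str]]


def debate(question: str, first: Agent, second: Agent, judge: Judge,
           max_rounds: int = 3) -> Tuple[Optional[str], List[str]]:
    """Alternate two agents until the judge settles or rounds run out."""
    transcript: List[str] = []
    for _ in range(max_rounds):
        transcript.append(first(question, transcript))
        transcript.append(second(question, transcript))
        verdict = judge(question, transcript)
        if verdict is not None:
            return verdict, transcript
    return None, transcript  # the judge never settled


# Toy agents (assumptions for the demo; real agents would query an LLM).
def persuadable(question: str, transcript: List[str]) -> str:
    # Opens with "10", then adopts the opponent's latest answer.
    return "10" if not transcript else transcript[-1]


def fixed(question: str, transcript: List[str]) -> str:
    return "5"


def agreement_judge(question: str, transcript: List[str]) -> Optional[str]:
    # Settles as soon as the two most recent replies agree.
    if len(transcript) >= 2 and transcript[-1] == transcript[-2]:
        return transcript[-1]
    return None


answer, transcript = debate("toy question", persuadable, fixed, agreement_judge)
print(answer)  # prints: 5
```

The point of the toy run is that the first agent's initial answer is revised only because a second agent contradicts it, which is the behavior a single self-reflecting model can fail to show.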

LLM Reasoning

Mathematical, reflective, and efficient long-chain reasoning.

REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Large Reasoning Models

Tackles overthinking in large reasoning models with a small reflection model that enables parallel sampling and sequential revision during online RL, plus a reflection reward that preserves reflection ability — reducing inference cost by ~35% without sacrificing accuracy.

ICLR 2026 Paper Code

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

A dual-reward RL framework that classifies problems as Simple or Hard in real time and adaptively shortens or extends Chain-of-Thought, improving both accuracy and token efficiency on challenging math benchmarks.

ICLR 2026 Paper Code
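
One way to picture a dual reward of this kind is sketched below. Everything specific here is an assumption for illustration and not the paper's formulation: the difficulty test (group pass rate versus a batch-level pass rate), the relative-length term, and the names `Sample`, `dual_rewards`, and `length_weight`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    correct: bool     # did this sampled response reach the right answer?
    num_tokens: int   # length of its chain of thought (assumed positive)


def dual_rewards(samples: List[Sample], batch_pass_rate: float,
                 length_weight: float = 0.2) -> List[float]:
    """Outcome reward plus a length term whose sign depends on difficulty."""
    if not samples:
        raise ValueError("need at least one sample")
    pass_rate = sum(s.correct for s in samples) / len(samples)
    mean_len = sum(s.num_tokens for s in samples) / len(samples)
    # Assumed rule: a problem is Simple if its group does at least as well
    # as the batch; otherwise it is Hard.
    is_simple = pass_rate >= batch_pass_rate
    sign = -1.0 if is_simple else 1.0  # Simple: favor short; Hard: favor long
    rewards = []
    for s in samples:
        relative_len = (s.num_tokens - mean_len) / mean_len
        outcome = 1.0 if s.correct else 0.0
        rewards.append(outcome + length_weight * sign * relative_len)
    return rewards


simple_group = [Sample(True, 100), Sample(True, 300)]
hard_group = [Sample(True, 100), Sample(True, 300),
              Sample(False, 200), Sample(False, 200)]
print(dual_rewards(simple_group, batch_pass_rate=0.5))
print(dual_rewards(hard_group, batch_pass_rate=0.8))
```

In the Simple group the shorter correct response scores higher, while in the Hard group the longer correct response does, so the same length signal pushes toward compression or exploration depending on the difficulty estimate.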

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Defines the Degeneration-of-Thought problem in self-reflection and addresses it with multi-agent debate over divergent chains of thought.

EMNLP 2024 Paper Code

LLM Safety

Risk awareness, jailbreak robustness, multilingual safety, and refusal behavior.

Towards Evaluating Proactive Risk Awareness of Multimodal Language Models

PaSBench evaluates proactive safety across 416 multimodal scenarios in five safety-critical domains. Top models such as Gemini-2.5-pro reach 64–71% accuracy but miss 45–55% of risks in repeated trials; failure analysis traces this to unstable proactive reasoning rather than missing knowledge.

NeurIPS 2025 D&B Paper

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training (DeRTa)

Identifies a refusal position bias in safety-tuning data and proposes Decoupled Refusal Training: MLE with a harmful response prefix plus Reinforced Transition Optimization, letting LLaMA-3 and Mistral models refuse at any position throughout a harmful response without hurting performance.

ACL 2025 Paper Code

Chain-of-Jailbreak Attack for Image Generation Models via Editing Step by Step

Decomposes a malicious image-generation query into innocuous sub-queries and iteratively edits the output; bypasses safeguards on GPT-4V, GPT-4o, Gemini 1.5, and Gemini 1.5 Pro in over 60% of cases. A companion Think Twice Prompting defense blocks more than 95% of these attacks.

ACL 2025 (Findings) Paper Code

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

CipherChat examines whether safety alignment generalizes to non-natural languages such as ciphers; GPT-4 understands ciphers well enough to produce unsafe outputs.

ICLR 2024 Paper Project
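
To make "non-natural languages such as ciphers" concrete, the sketch below encodes and decodes text with a Caesar shift, one classic cipher chosen here purely as an example. The `caesar` function and the shift of 3 are assumptions for illustration; this is not the CipherChat code.

```python
def caesar(text: str, shift: int = 3) -> str:
    """Shift ASCII letters by `shift` positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


encoded = caesar("Hello, world!", 3)
print(encoded)              # prints: Khoor, zruog!
print(caesar(encoded, -3))  # prints: Hello, world!
```

A model that can follow text in this form is reading input that looks nothing like the natural-language data used for safety alignment, which is the generalization gap the paper examines.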

LLM Personality

Emotion, personality, and psychological portrayals in conversational AI.

On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs

PPBench evaluates diverse psychological aspects of LLMs, including personality traits, interpersonal relationships, motivational tests, and emotional abilities.

ICLR 2024 Oral Paper Code

Apathetic or Empathetic? Evaluating LLMs' Emotional Alignments with Humans

Uses emotion appraisal theory to test how LLMs' feelings shift across 400+ situations grouped into 36 factors, benchmarked against responses from 1,200+ human subjects. Models including GPT-4, Mixtral-8x22B, and LLaMA-3.1 respond appropriately in some cases but fail to align with human emotional behavior or connect similar situations.

NeurIPS 2024 Paper Code

On the Reliability of Psychological Scales on Large Language Models

Across 2,500 settings per model on GPT-3.5/4, Gemini-Pro, and LLaMA-3.1, shows that LLMs respond consistently to the Big Five Inventory. Further demonstrates GPT-3.5 can emulate diverse personalities and represent specific population groups when given targeted prompts.

EMNLP 2024 Paper

For the chronological list of all papers, see my Google Scholar page.
