LLMs · Agents · Reasoning · Safety

Wenxiang Jiao 焦文祥

I study where large language models break — in reasoning, agency, safety, and psychological behavior — and turn those findings into benchmarks, agents, and alignment methods for reliable AI systems.

Current: Xiaohongshu Inc., LLM Algorithm Expert, 2025 to present
Previously: Tencent AI Lab, Senior Researcher, 2021 to 2025
Education: Ph.D., The Chinese University of Hong Kong, 2021
Service: Reviewer for Nature, Nature Machine Intelligence, NeurIPS, ICML, and ACL

Three papers accepted to ACL 2026 (two in the main conference, one in Findings). Congratulations!

Two papers (REA-RL, DeepCompress) accepted to ICLR 2026. Congratulations!

DeepAgent accepted to WWW 2026. Congratulations!

One paper accepted to the NeurIPS 2025 Datasets and Benchmarks Track. Congratulations!

arXiv 2026

OmniGAIA: Towards Native Omni-Modal AI Agents

A benchmark and foundation agent for omni-modal AI assistants. The benchmark poses multi-hop queries across video, audio, and image inputs, and the agent, OmniAtlas, is trained via hindsight-guided tree exploration and OmniDPO.

arXiv 2026

MMSkills: Towards Multimodal Skills for General Visual Agents

Reusable multimodal procedures encoded as state-conditioned skill packages, generated from public trajectories and consulted by a branch-loaded skill agent at runtime.

WWW 2026

DeepAgent: A General Reasoning Agent with Scalable Toolsets

A reasoning agent that tackles general tasks by searching for and using appropriate tools from over 16,000 RapidAPIs in an end-to-end agentic reasoning process.

EMNLP 2024

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

The Multi-Agent Debate (MAD) framework addresses the Degeneration-of-Thought problem in self-reflection and elicits divergent chains of thought through structured interaction between agents.

ICLR 2024

GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher

Examines whether safety alignment generalizes to non-natural languages such as ciphers, and finds that GPT-4 understands ciphers well enough to be led into producing unsafe outputs.

ICLR 2024 Oral

On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs

Evaluates diverse psychological aspects of LLMs, including personality traits, interpersonal relationships, motivational tests, and emotional abilities.

arXiv 2023

Is ChatGPT A Good Translator? A Preliminary Study

ChatGPT performs competitively with commercial translation systems on high-resource European languages; GPT-4 further bridges the gap for low-resource and distant languages.

General Agents

Multimodal, tool-using, and collaborative agents for complex tasks.

DeepAgent · OmniGAIA · MAD

LLM Reasoning

Mathematical, reflective, and efficient long-chain reasoning.

DeepCompress · REA-RL

LLM Safety

Risk awareness, jailbreak robustness, multilingual safety, and refusal.

CipherChat · DeRTa

LLM Personality

Emotion, personality, and psychological portrayals in conversational AI.

PsychoBench · EmotionBench · Fints
