Three papers accepted to ACL 2026 (2 main conference + 1 Findings). Congratulations!
LLMs · Agents · Reasoning · Safety
Wenxiang Jiao 焦文祥
I study where large language models break — in reasoning, agency, safety, and psychological behavior — and turn those findings into benchmarks, agents, and alignment methods for reliable AI systems.
About
Updates
Two papers (REA-RL, DeepCompress) accepted to ICLR 2026. Congratulations!
Gave an invited talk at Agentic AI Summit 2026 titled Scalable and Personalizable AI Agents.
DeepAgent accepted to WWW 2026. Congratulations!
One paper accepted to NeurIPS 2025 D&B. Congratulations!
Selected Work
OmniGAIA: Towards Native Omni-Modal AI Agents
Benchmark and foundation agent for omni-modal AI assistants — multi-hop queries across video, audio, and image, with OmniAtlas trained via hindsight-guided tree exploration and OmniDPO.
MMSkills: Towards Multimodal Skills for General Visual Agents
Reusable multimodal procedures encoded as state-conditioned skill packages, generated from public trajectories and consulted by a branch-loaded skill agent at runtime.
DeepAgent: A General Reasoning Agent with Scalable Toolsets
A reasoning agent that tackles general tasks by searching for and using appropriate tools from over 16,000 RapidAPIs in an end-to-end agentic reasoning process.
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
The MAD framework addresses the Degeneration-of-Thought problem in self-reflection and explores divergent chains of thought through structured agent interaction.
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Examines whether safety alignment generalizes to non-natural languages such as ciphers; GPT-4 understands ciphers well enough to produce unsafe outputs.
On the Humanity of Conversational AI: Evaluating the Psychological Portrayal of LLMs
Evaluates diverse psychological aspects of LLMs, including personality traits, interpersonal relationships, motivational tests, and emotional abilities.
Is ChatGPT A Good Translator? A Preliminary Study
ChatGPT performs competitively with commercial translation systems on high-resource European languages; GPT-4 further bridges the gap for low-resource and distant languages.
Research
General Agents
Multimodal, tool-using, and collaborative agents for complex tasks.
DeepAgent · OmniGAIA · MAD
LLM Reasoning
Mathematical, reflective, and efficient long-chain reasoning.
DeepCompress · REA-RL
LLM Safety
Risk awareness, jailbreak robustness, multilingual safety, and refusal.
CipherChat · DeRTa
LLM Personality
Emotion, personality, and psychological portrayals in conversational AI.
PsychoBench · EmotionBench · Fints