Three papers accepted to ACL 2026 (2 main conference + 1 Findings). Congratulations!
LLMs · Agents · Reasoning · Safety
Wenxiang Jiao 焦文祥
I build benchmarks, agents, and alignment methods that expose where large language models break — in reasoning, safety, and psychological behavior — and turn the findings into systems that ship.
About
Updates
Two papers (REA-RL, DeepCompress) accepted to ICLR 2026. Congratulations!
Gave an invited talk at Agentic AI Summit 2026 titled "Scalable and Personalizable AI Agents".
DeepAgent accepted to WWW 2026. Congratulations!
One paper accepted to NeurIPS 2025 D&B. Congratulations!
Selected Work
A benchmark for evaluating omni-modal general AI assistants.
A general reasoning agent with scalable toolsets for complex multi-step tasks.
A framework for encouraging divergent thinking in LLMs through structured debate.
A safety evaluation framework for stealthy chat with LLMs via cipher encoding.
A benchmark for evaluating psychological portrayals in large language models.
A systematic evaluation of ChatGPT translation across languages and domains.
Research
General Agents
Multimodal, tool-using, and collaborative agents for complex tasks.
DeepAgent · OmniGAIA · MAD
LLM Reasoning
Mathematical, reflective, and efficient long-chain reasoning.
DeepCompress · REA-RL
LLM Safety
Risk awareness, jailbreak robustness, multilingual safety, and refusal.
CipherChat · DeRTa
LLM Personality
Emotion, personality, and psychological portrayals in conversational AI.
PsychoBench · EmotionBench · Fints