Hengguang Zhou PhD Student at UCLA

About me

I am currently a PhD student at UCLA advised by Prof. Cho-Jui Hsieh, and a project lead at TurningPointAI, advised by Ruochen Wang, Prof. Minhao Cheng, and Prof. Tianyi Zhou. TurningPointAI is a collaborative initiative dedicated to advancing the field of Multimodal Language Agents. Learn more about our work at TurningPointAI and stay updated by following us on Twitter. Previously, I was fortunate to work with Prof. Yue Gao on 3D Computer Vision at Tsinghua University.

My research is funded by the Amazon Trainium Fellowship. For further information, please see my CV (last updated: Nov 23, 2024).

Research Interests: My research centers on advancing MLLM post-training, with a focus on two threads: Reasoning/Agent and Multimodal.

  • Reasoning/Agent: Eliciting the reasoning and agentic capabilities of (M)LLMs via RL (VisualThinker).
  • Multimodal: Exploring how models comprehend and interpret the physical world across diverse sensory modalities (VisualThinker, MOSSBench).

I developed VisualThinker, one of the first open-source repositories to replicate the "aha moment" of DeepSeek-R1 on a small non-SFT multimodal model (600+ GitHub stars).

Before the era of LLMs, I worked on 3D Computer Vision, Human-Computer Interaction (HCI), and visually rich document understanding.


Publications (First Author)

R1-Zero’s “Aha Moment” in Visual Reasoning on a 2B Non-SFT Model

Hengguang Zhou*, Xirui Li*, Ruochen Wang, Minhao Cheng, Tianyi Zhou, Cho-Jui Hsieh

MOSSBench: Is Your Multimodal Large Language Model Oversensitive to Safe Queries?

Xirui Li*, Hengguang Zhou*, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh
ICLR, 2025.