I am currently a first-year Master’s student at UCLA, and a project lead at TurningPointAI, advised by Ruochen Wang, Prof. Minhao Cheng, Prof. Tianyi Zhou, and Prof. Cho-Jui Hsieh. We are a collaborative initiative dedicated to advancing the field of Multimodal Language Agents. Learn more about our work at TurningPointAI and stay updated by following us on Twitter.
Previously, I was fortunate to work with Prof. Yue Gao at Tsinghua University.
My research focuses primarily on reasoning in Multimodal Large Language Models. Prior to the LLM era, I worked on 3D Computer Vision, Human-Computer Interaction (HCI), and visually-rich document understanding. I hold a B.S. degree in Computer Science from the University of Toronto.
We released the first successful replication of DeepSeek-R1's 'aha moment' in a multimodal task using only a 2B non-SFT model! - February 26, 2025
Our submission on multimodal oversensitivity was accepted at ICLR! - January 22, 2025
We presented the first test suite assessing whether current MLLMs overreact to benign queries! - June 22, 2024
R1-Zero’s “Aha Moment” in Visual Reasoning on a 2B Non-SFT Model