Welcome! I’m Haosen.
I am a Master’s student in Computer Science at Northwestern University, advised by Prof. Manling Li at the MLL Group in collaboration with the Stanford Vision and Learning Lab. Previously, I was a research intern at the Shanghai AI Lab. I received my Bachelor’s degree in Data Science and Technology from the Hong Kong University of Science and Technology, where I was advised by Prof. Chi-Keung Tang and Prof. Yu-Wing Tai.
My research focuses on Foundation Models, Multimodal Generative Models, 3D Vision, and Embodied Intelligence, with an emphasis on safety, efficiency, and interpretability. I aim to enable machines to understand both structured data (text, images, video) and unstructured 3D data, contributing to human-centered and physically grounded general AI.
🔥 News
- 2026.01: One paper accepted by ICLR 2026!
- 2026.01: New work coming to arXiv, ProgressLM: Towards Progress Reasoning in Vision-Language Models.
📝 Publications
*Equal contribution. †Corresponding author/Co-advisor. ‡Project leader.
ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment
Haosen Sun*, Hongjue Zhao*, Jiangtao Kong, Xiaochang Li, Qineng Wang, Liwei Jiang, Qi Zhu, Tarek F. Abdelzaher, Yejin Choi, Manling Li†, Huajie Shao†
International Conference on Learning Representations (ICLR), 2026
- A unified ODE-based framework for multi-step and adaptive activation steering guided by barrier functions.
- Consistent gains on TruthfulQA (+5.7%), RealToxicityPrompts (+2.4%), UltraFeedback (+2.5%).
ProgressLM: Towards Progress Reasoning in Vision-Language Models
Jianshu Zhang*, Chengxuan Qian*, Haosen Sun, Haoran Lu, Dingcheng Wang, Letian Xue, Han Liu
Preprint, 2026; ICLR 2026 Workshop on World Models
- PROGRESS-BENCH: a benchmark for long-horizon progress reasoning in VLMs, with controlled modality, viewpoint, and answerability.
- Shows that progress reasoning is unstable in current VLMs, and becomes more robust with explicitly trained coarse-to-fine models (ProgressLM-3B).

T*: Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Fei-Fei Li, Jiajun Wu, Manling Li
Conference on Computer Vision and Pattern Recognition (CVPR), 2025; Oral @ ICCV 2025 LongVid-Foundations, Featured by Stanford AI Blog
- We introduce LongVideoHaystack (LV-Haystack), a 480-hour dataset for keyframe search in long videos, with 15,092 human-annotated instances, on which state-of-the-art methods score 2.1% Temporal F1.
- Our framework T* reframes temporal search as spatial search with adaptive zooming, boosting GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.

Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
Haosen Sun, Lujun Li†, Peijie Dong, Zimian Wei, Shitong Shao
European Conference on Computer Vision (ECCV), 2024
- We present Auto-DAS, an automatic proxy discovery framework using an Evolutionary Algorithm (EA) for training-free Distillation-aware Architecture Search (DAS).
- Auto-DAS generalizes well to various architectures and search spaces (e.g. ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy.

Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu†, Yike Guo†
European Conference on Computer Vision (ECCV), 2024
- We introduce Auto-GAS, the first training-free Generative Architecture Search (GAS) framework enabled by an auto-discovered proxy, which achieves competitive scores with 110× faster search than GAN Compression.

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models
Han Jiang*, Haosen Sun*, Ruoxuan Li*, Yu-Wing Tai, Chi-Keung Tang
arXiv, Dec 2023
- Inpaint4DNeRF can generate prompt-based objects guided by the seed images and their 3D proxies while preserving multiview consistency. Our generative baseline framework is general and can be readily extended to 4D dynamic NeRFs.
🎖 Honors and Awards
- Dean’s List Award
- Silver Medal in CVPR’24 Workshop (Image Matching Challenge 2024 - Hexathlon)
- Silver Medal in Kaggle Competition (Google - Fast or Slow? Predict AI Model Runtime)
- Nomination for the Mr. Armin and Mrs. Lillian Kitchell Undergraduate Research Award
- Kerry Holdings Limited Scholarship (HKUST Admissions Scholarship, HK$280,000)
- Bronze Medal and First Prize in the 36th Chinese Physics Olympiad (CPHO), Top 0.1%
📖 Education
- 2024.09 - 2026.06, M.S. in Computer Science, Northwestern University, Evanston, IL
- 2020.09 - 2024.07, BSc in Data Science and Technology, Hong Kong University of Science and Technology (HKUST), Hong Kong
💬 Academic Services
- Conference Reviewer: ICLR, NeurIPS, ACM Multimedia
💻 Internships
- 07/2024 – 09/2024, Shanghai Artificial Intelligence Laboratory, China. Research Intern, working closely with Dr. Peng Ye.
- 10/2023 – 05/2024, Hong Kong Generative AI Research and Development Center (HKGAI), Hong Kong. Research Intern, working closely with Dr. Lujun Li.
