Welcome! I’m Haosen.

I am a Master’s student in Computer Science at Northwestern University, advised by Prof. Manling Li at the MLL Group in collaboration with the Stanford Vision and Learning Lab. Previously, I was a research intern at the Shanghai AI Lab. I received my Bachelor’s degree in Data Science and Technology from the Hong Kong University of Science and Technology, where I was advised by Prof. Chi-Keung Tang and Prof. Yu-Wing Tai.

My research focuses on Foundation Models, Multimodal Generative Models, 3D Vision, and Embodied Intelligence, with an emphasis on safety, efficiency, and interpretability. I aim to enable machines to understand both structured data (text, images, video) and unstructured 3D data, contributing to human-centered and physically grounded general AI.

🔥 News

📝 Publications

*Equal contribution. Corresponding author/Co-advisor. Project leader.

ICLR 2026

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment

Haosen Sun*, Hongjue Zhao*, Jiangtao Kong, Xiaochang Li, Qineng Wang, Liwei Jiang, Qi Zhu, Tarek F. Abdelzaher, Yejin Choi, Manling Li, Huajie Shao

International Conference on Learning Representations (ICLR), 2026

[Project Page] [Paper] [Code]

  • A unified ODE-based framework for multi-step and adaptive activation steering guided by barrier functions.
  • Consistent gains on TruthfulQA (+5.7%), RealToxicityPrompts (+2.4%), UltraFeedback (+2.5%).
Preprint

ProgressLM: Towards Progress Reasoning in Vision-Language Models

Jianshu Zhang*, Chengxuan Qian*, Haosen Sun, Haoran Lu, Dingcheng Wang, Letian Xue, Han Liu

Preprint, 2026; ICLR 2026 Workshop on World Models

[Project Page] [Paper] [Code]

  • PROGRESS-BENCH: a benchmark for long-horizon progress reasoning in VLMs, with controlled modality, viewpoint, and answerability.
  • Shows that progress reasoning is unstable in current VLMs but becomes more robust with explicitly trained coarse-to-fine models (ProgressLM-3B).
CVPR 2025

T*: Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Fei-Fei Li, Jiajun Wu, Manling Li

Conference on Computer Vision and Pattern Recognition (CVPR), 2025; Oral @ ICCV 2025 LongVid-Foundations, Featured by Stanford AI Blog

[Project Page] [Paper] [Code]

  • We introduce LongVideoHaystack (LV-Haystack), a 480-hour dataset for keyframe search in long videos with 15,092 human-annotated instances, on which the current state of the art reaches 2.1% Temporal F1.
  • Our framework T* reframes temporal search as spatial search with adaptive zooming, boosting GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.
ECCV 2024

Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search

Haosen Sun, Lujun Li, Peijie Dong, Zimian Wei, Shitong Shao

European Conference on Computer Vision (ECCV), 2024

[Paper] [Code]

  • We present Auto-DAS, an automatic proxy discovery framework using an Evolutionary Algorithm (EA) for training-free Distillation-aware Architecture Search (DAS).
  • Auto-DAS generalizes well to various architectures and search spaces (e.g. ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy.
ECCV 2024

Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search

Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo

European Conference on Computer Vision (ECCV), 2024

[Paper] [Code]

  • We introduce Auto-GAS, the first training-free Generative Architecture Search (GAS) framework enabled by an auto-discovered proxy, which achieves competitive scores with a 110× faster search than GAN Compression.
arXiv 2023

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models

Han Jiang*, Haosen Sun*, Ruoxuan Li*, Yu-Wing Tai, Chi-Keung Tang

arXiv, Dec 2023

[Project Page] [Paper] [Code]

  • Inpaint4DNeRF can generate prompt-based objects guided by the seed images and their 3D proxies while preserving multiview consistency. Our generative baseline framework is general and can be readily extended to 4D dynamic NeRFs.

🎖 Honors and Awards

📖 Education

  • 2024.09 - 2026.06, M.S. in Computer Science, Northwestern University, Evanston, IL
  • 2020.09 - 2024.07, BSc in Data Science and Technology, Hong Kong University of Science and Technology (HKUST), Hong Kong

💬 Academic Services

  • Conference Reviewer: ICLR, NeurIPS, ACM Multimedia

💻 Internships

  • 07/2024 – 09/2024, Shanghai Artificial Intelligence Laboratory, China.

    Research Intern, working closely with Dr. Peng Ye.

  • 10/2023 – 05/2024, Hong Kong Generative AI Research and Development Center (HKGAI), Hong Kong.

    Research Intern, working closely with Dr. Lujun Li.