Welcome! I’m Haosen.

I am an incoming Ph.D. student in Computer and Information Science at the University of Pennsylvania, affiliated with the GRASP Lab and IDEAS Center, advised by Prof. René Vidal. Previously, I was a Master’s student at Northwestern University, advised by Prof. Manling Li at the MLL Group in collaboration with the Stanford Vision and Learning Lab, and a research intern at the Shanghai AI Lab. I received my BSc in Data Science and Technology from the Hong Kong University of Science and Technology, advised by Prof. Chi-Keung Tang and Prof. Yu-Wing Tai.

My research spans vision, language, and robotics, focusing on Foundation Models, Multimodal Reasoning, and Generative World Modeling. I aim to build efficient, controllable, and interpretable models that can reason about and interact with the physical world.

Happy to chat about ideas, new directions, collaborations, or opportunities!

📢 News

📚 Publications

*Equal contribution. Corresponding author/Co-advisor. Project leader.

ICLR 2026

ODESteer: A Unified ODE-Based Steering Framework for LLM Alignment

Haosen Sun*, Hongjue Zhao*, Jiangtao Kong, Xiaochang Li, Qineng Wang, Liwei Jiang, Qi Zhu, Tarek F. Abdelzaher, Yejin Choi, Manling Li, Huajie Shao

ICLR 2026

Project Page Paper Code

  • A unified ODE-based framework for multi-step and adaptive activation steering guided by barrier functions.
  • Consistent gains on TruthfulQA (+5.7%), RealToxicityPrompts (+2.4%), and UltraFeedback (+2.5%).
ACL 2026 (Oral)

ProgressLM: Towards Progress Reasoning in Vision-Language Models

Jianshu Zhang*, Chengxuan Qian*, Haosen Sun, Haoran Lu, Dingcheng Wang, Letian Xue, Han Liu

Oral @ ACL 2026
ICLR 2026 Workshop on World Models

Project Page Paper Code

  • PROGRESS-BENCH: a benchmark for long-horizon progress reasoning in VLMs, with controlled modality, viewpoint, and answerability.
  • Reveals that vanilla VLMs struggle to estimate task progress from a single observation; ProgressLM-3B (SFT + RL) addresses this via episodic retrieval and mental simulation.
CVPR 2025

T*: Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Fei-Fei Li, Jiajun Wu, Manling Li

CVPR 2025
Oral @ ICCV 2025 Workshop on LongVid-Foundations
Featured by Stanford AI Blog

Project Page Paper Code

  • We introduce LongVideoHaystack (LV-Haystack), a 480-hour dataset for keyframe search in long videos with 15,092 human-annotated instances, on which state-of-the-art methods reach only 2.1% Temporal F1.
  • Our framework T* reframes temporal search as spatial search with adaptive zooming, boosting GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.
ECCV 2024

Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search

Haosen Sun, Lujun Li, Peijie Dong, Zimian Wei, Shitong Shao

ECCV 2024

Paper Code

  • We present Auto-DAS, an automatic proxy discovery framework using an Evolutionary Algorithm (EA) for training-free Distillation-aware Architecture Search (DAS).
  • Auto-DAS generalizes well to various architectures and search spaces (e.g., ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy.
ECCV 2024

Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search

Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo

ECCV 2024

Paper Code

  • We introduce Auto-GAS, the first training-free Generative Architecture Search (GAS) framework enabled by an auto-discovered proxy, which achieves competitive scores while searching 110× faster than GAN Compression.
arXiv 2023

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models

Han Jiang*, Haosen Sun*, Ruoxuan Li*, Yu-Wing Tai, Chi-Keung Tang

arXiv Preprint 2023

Project Page Paper Code

  • Inpaint4DNeRF generates prompt-specified objects, guided by seed images and their 3D proxies, while preserving multiview consistency. Our generative baseline framework is general and can be readily extended to 4D dynamic NeRFs.

🏆 Honors and Awards

🎓 Education

2026.08 – 2030.06 (expected)

Ph.D. in Computer and Information Science

University of Pennsylvania, Philadelphia, PA

2024.09 – 2026.06

M.S. in Computer Science

Northwestern University, Evanston, IL

2020.09 – 2024.06

BSc in Data Science and Technology

Hong Kong University of Science and Technology (HKUST), Hong Kong

🔍 Academic Service

Conference Reviewer: ICLR, NeurIPS, ACM Multimedia

🏢 Internships

2024.07 – 2024.09

Research Intern

Shanghai Artificial Intelligence Laboratory, China

2023.10 – 2024.05

Research Intern

Hong Kong Generative AI Research and Development Center (HKGAI), Hong Kong