Welcome! I’m Haosen.

I am a Master’s student in Computer Science at Northwestern University, advised by Prof. Manling Li at the NU-MLL Group and collaborating with the Stanford Vision and Learning Lab. Previously, I was a research intern at the Shanghai AI Lab. I received my Bachelor’s degree in Data Science and Technology from the Hong Kong University of Science and Technology, where I was advised by Prof. Chi-Keung Tang and Prof. Yu-Wing Tai.

My research focuses on Foundation Models, Multimodal Generative Models, Spatial Reasoning, and Embodied Intelligence, with an emphasis on safety, efficiency, and interpretability. I aim to enable machines to understand both structured data (text, images, video) and unstructured 3D data, contributing to human-centered and physically grounded general AI.

I am seeking PhD opportunities starting in Fall 2026. If our research interests overlap, I would love to connect!

🔥 News

📝 Publications

* indicates equal contribution

ICLR 2026 Submission

Activation Steering for LLM Alignment via a Unified ODE-Based Framework

Hongjue Zhao*, Haosen Sun*, Jiangtao Kong, Xiaochang Li, Qineng Wang, Liwei Jiang, Qi Zhu, Tarek F. Abdelzaher, Yejin Choi, Manling Li, Huajie Shao

International Conference on Learning Representations (ICLR), 2026 (Under Review)

[Paper] [Code]

  • We propose BODES (Barrier function-guided ODE Steering), a unified ODE-based framework for multi-step and adaptive activation steering using control barrier functions.
  • BODES bridges theoretical and empirical advances in LLM alignment, achieving consistent gains on TruthfulQA (+7%), RealToxicityPrompts (+2%), and UltraFeedback (+2%).
CVPR 2025

T*: Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li

Conference on Computer Vision and Pattern Recognition (CVPR), 2025

[Project Page] [Paper] [Code]

  • We introduce LongVideoHaystack (LV-Haystack), a 480-hour dataset for keyframe search in long videos, with 15,092 human-annotated instances, on which state-of-the-art methods score only 2.1% Temporal F1.
  • Our framework T* reframes temporal search as spatial search with adaptive zooming, boosting GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.
ECCV 2024

Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search

Haosen Sun, Lujun Li, Peijie Dong, Zimian Wei, Shitong Shao

European Conference on Computer Vision (ECCV), 2024

[Paper] [Code]

  • We present Auto-DAS, an automatic proxy discovery framework using an Evolutionary Algorithm (EA) for training-free Distillation-aware Architecture Search (DAS).
  • Auto-DAS generalizes well to various architectures and search spaces (e.g. ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy.
ECCV 2024

Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search

Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Qifeng Liu, Wei Xue, Yike Guo

European Conference on Computer Vision (ECCV), 2024

[Paper] [Code]

  • We introduce Auto-GAS, the first training-free Generative Architecture Search (GAS) framework enabled by an automatically discovered proxy, which achieves competitive scores with a 110× faster search than GAN Compression.
arXiv 2023

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models

Han Jiang*, Haosen Sun*, Ruoxuan Li*, Yu-Wing Tai, Chi-Keung Tang

arXiv, Dec 2023

[Project Page] [Paper] [Code]

  • Inpaint4DNeRF can generate prompt-based objects guided by seed images and their 3D proxies while preserving multiview consistency. Our generative baseline framework is general and can be readily extended to 4D dynamic NeRFs.
arXiv 2023

Registering Neural Radiance Fields as 3D Density Images

Han Jiang*, Ruoxuan Li*, Haosen Sun, Yu-Wing Tai, Chi-Keung Tang

arXiv, May 2023

[Paper]

  • We propose a method to align and merge pre-trained NeRF models of partially overlapping 3D scenes using a generalized registration pipeline that incorporates key point detection, point set registration, and universal pre-trained descriptor networks with a contrastive learning strategy.

🎖 Honors and Awards

📖 Education

  • 2024.09 - 2026.06 (now), M.S. in Computer Science, Northwestern University, Evanston, IL
  • 2020.09 - 2024.07, BSc in Data Science and Technology, Hong Kong University of Science and Technology (HKUST), Hong Kong

💬 Academic Services

  • Conference Reviewer: ICLR, ACM Multimedia

💻 Internships

  • 07/2024 – 09/2024, Shanghai Artificial Intelligence Laboratory, China.

    Research Intern, working closely with Dr. Peng Ye.

  • 10/2023 – 05/2024, Hong Kong Generative AI Research and Development Center (HKGAI), Hong Kong.

    Research Intern, working closely with Dr. Lujun Li.