Welcome! I’m Haosen.

I am a first-year Master’s student in Computer Science at Northwestern University. Currently, I am working with Prof. Manling Li at NU-MLL-Group, collaborating with the Stanford Vision and Learning Lab. Previously, I was a research intern at the Shanghai AI Lab. I earned my bachelor’s degree in Data Science and Technology from the Hong Kong University of Science and Technology, where I worked under the guidance of Prof. Chi-Keung Tang and Prof. Yu-Wing Tai.

My research interests span Multimodal Generative AI, 3D Vision, Embodied AI, and Efficient AI. My goal is to enable machines to extract meaningful patterns and relationships from both structured data (e.g., text, images, and video) and unstructured 3D geometric data. I also hope to improve the interpretability and explainability of these models, advancing toward human-centered, physically grounded general artificial intelligence.

I am actively seeking a PhD position beginning in Fall 2026. If our research interests align, please feel free to connect!

🔥 News

2025.02: One paper accepted by CVPR 2025!
2024.07: Will join Shanghai Artificial Intelligence Laboratory as a research intern.
2024.07: Two papers accepted by ECCV 2024!
2024.06: Awarded “Kaggle Competitions Expert”.
2024.06: Received the Dean's List Award for Spring 2023-24.
2024.06: Received a Silver medal 🥈 in “Image Matching Challenge 2024 - Hexathlon” (CVPR’24 Workshop), ranked 28th/929. Our solution was released.
2023.11: Received a Silver medal 🥈 in “Google - Fast or Slow? Predict AI Model Runtime”, ranked 40th/616. Our solution was released.

📝 Publications

* indicates equal contribution

CVPR 2025

Re-thinking Temporal Search for Long-Form Video Understanding

Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li

Conference on Computer Vision and Pattern Recognition (CVPR), 2025

[Project Page] [Paper] [Project Code]

We introduce LongVideoHaystack (LV-Haystack), a 480-hour dataset for keyframe search in long videos with 15,092 human-annotated instances, on which state-of-the-art methods score 2.1% Temporal F₁.
Our framework T* reframes temporal search as spatial search with adaptive zooming, boosting GPT-4o from 50.5% to 53.1% and LLaVA-OV from 56.5% to 62.4% on LongVideoBench XL.

ECCV 2024

Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search

Haosen Sun, Peijie Dong, Zimian Wei, Shitong Shao, Lujun Li

European Conference on Computer Vision (ECCV), 2024

[Project Code] [Paper]

We present Auto-DAS, an automatic proxy discovery framework using an Evolutionary Algorithm (EA) for training-free Distillation-aware Architecture Search (DAS).
Auto-DAS generalizes well to various architectures and search spaces (e.g. ResNet, ViT, NAS-Bench-101, and NAS-Bench-201), achieving state-of-the-art results in both ranking correlation and final searched accuracy.

ECCV 2024

Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search

Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Qifeng Liu, Wei Xue, Yike Guo

European Conference on Computer Vision (ECCV), 2024

[Project Code] [Paper]

We introduce Auto-GAS, the first training-free Generative Architecture Search (GAS) framework enabled by an auto-discovered proxy. It achieves competitive scores while searching 110× faster than GAN Compression.

arXiv 2023

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models

Han Jiang*, Haosen Sun*, Ruoxuan Li*, Yu-Wing Tai, Chi-Keung Tang

arXiv preprint (submitted to CVPR’24), Dec 2023

[Project Page] [Paper] [Project Code]

Inpaint4DNeRF generates prompt-based objects guided by seed images and their 3D proxies while preserving multiview consistency. Our generative baseline framework is general and can be readily extended to 4D dynamic NeRFs.

arXiv 2023

Registering Neural Radiance Fields as 3D Density Images

Han Jiang*, Ruoxuan Li*, Haosen Sun, Yu-Wing Tai, Chi-Keung Tang

arXiv preprint, May 2023

[Paper]

We propose a method to align and merge pre-trained NeRF models of partially overlapping 3D scenes using a generalized registration pipeline that incorporates key point detection, point set registration, and universal pre-trained descriptor networks trained with a contrastive learning strategy.

Additional Publications

Measuring road safety achievement based on EWM-GRA-SVD: A decision-making support system for APEC countries. Faan Chen*, Lin Shi*, Yaxin Li, Qilin Wang, Haosen Sun, Xinyu Tang, Jiacheng Zu, Zhenwei Sun. Knowledge-Based Systems

🎖 Honors and Awards

2020.09 - 2024.07 HKUST Admissions Scholarship (Kerry Holdings Limited Scholarship, HK$280,000)
2024.06 Dean's List Award, Top 10%
2024.06 Silver Medal in CVPR’24 Workshop (Image Matching Challenge 2024 - Hexathlon), ranked 28th/929
2023.11 Silver Medal in Kaggle Competition (Google - Fast or Slow? Predict AI Model Runtime), ranked 40th/616
2022.08 Nomination for the Mr. Armin and Mrs. Lillian Kitchell Undergraduate Research Award
2019.10 Bronze Medal and First Prize in the 36th Chinese Physics Olympiad (CPHO), Top 0.1%
2019.07 Third Prize in the 28th China National Biology Olympiad (CNBO), Top 5%

📖 Education

2024.09 - 2026.06 (now), M.S. in Computer Science, Northwestern University, USA
2020.09 - 2024.07, BSc in Data Science and Technology, Hong Kong University of Science and Technology (HKUST), Hong Kong

💬 Academic Services

Conference Reviewer: MM (2025), ICLR (2025)

💻 Internships

07/2024 – 09/2024, Shanghai Artificial Intelligence Laboratory, China.

Research Intern, working closely with Dr. Peng Ye.

10/2023 – 05/2024, Hong Kong Generative AI Research and Development Center (HKGAI), Hong Kong.

Research Intern, working closely with Dr. Lujun Li.