Publications

See Google Scholar for the latest publications.

2026

  1. CVPR 2026
    SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
    Infusing explicit spatial representations into vision-language models for robust autonomous driving with 3D spatial reasoning.
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  2. T-ASE 2026
    FAM-HRI: Foundation-Model Assisted Multimodal Human-Robot Interaction Combining Gaze and Speech
    Foundation-model assisted multimodal HRI that fuses gaze and speech via LLMs for intuitive robot manipulation.
    Yuzhi Lai, Shenghai Yuan, Peizheng Li, Boya Zhang, Benjamin Kiefer, Tianchen Deng, and Andreas Zell
    IEEE Transactions on Automation Science and Engineering, 2026
  3. arXiv 2026
    Sticky-Glance: Robust Intent Recognition for Human Robot Collaboration via Single-Glance
    Robust single-glance intent recognition for natural human-robot collaboration in shared workspaces.
    Yuzhi Lai, Shenghai Yuan, Peizheng Li, and Andreas Zell
    arXiv preprint arXiv:2603.06121, 2026
  4. ICML 2026 Highlight
    Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding
    Clinically grounded multimodal benchmark and models for fine-grained seizure semiology understanding.
    Lina Zhang, Tonmoy Monsoor, Peizheng Li, Jiarui Cui, Xinyi Peng, Chong Han, Prateik Sinha, Siyuan Dai, Jessica Nichole Pasqua, Colin M McCrimmon, Weiting Liu, Hailey Marie Miranda, Bing Hu, Xiangting Wu, Tengyou Xu, Chunhan Li, Jiaye Tian, Jiarui Tang, Detao Ma, Lingye Kong, Junnan Lyu, Jungang Li, Yan Zan, Junhua Huang, Rajarshi Mazumder, and Vwani Roychowdhury
    International Conference on Machine Learning (ICML), 2026
  5. EMBC 2026
    Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure Semiology
    Pilot study evaluating multimodal large language models for interpretable pathological movement recognition in seizure videos.
    Lina Zhang, Tonmoy Monsoor, Mehmet Efe Lorasdagi, Prateik Sinha, Chong Han, Peizheng Li, Yuan Wang, Jessica Pasqua, Colin McCrimmon, Rajarshi Mazumder, and Vwani Roychowdhury
    Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2026

2025

  1. ICCV 2025
    AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
    Adaptive grounding framework that bridges 2D vision-language features to open-world 3D occupancy prediction without a manually defined vocabulary.
    International Conference on Computer Vision (ICCV), 2025
  2. arXiv 2025
    TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking
    Temporal query denoising approach for robust 3D multi-object tracking in autonomous driving scenarios.
    Shuxiao Ding, Yutong Yang, Julian Wiederer, Markus Braun, Peizheng Li, Juergen Gall, and Bin Yang
    arXiv preprint arXiv:2504.03258, 2025
  3. arXiv 2025
    SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality
    Semantic egocentric environment reasoning for vehicle augmented reality with spatial understanding.
    Yuzhi Lai, Shenghai Yuan, Peizheng Li, Jun Lou, and Andreas Zell
    arXiv preprint arXiv:2508.17255, 2025

2024

  1. ECCV 2024
    SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
    Self-supervised scene flow estimation from point clouds, eliminating the need for expensive human annotations.
    European Conference on Computer Vision (ECCV), 2024

2023

  1. IJCAI 2023
    PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View
    Lightweight yet powerful BEV framework for joint instance segmentation and future motion prediction.
    International Joint Conference on Artificial Intelligence (IJCAI), 2023