Publications | Peizheng Li

2026

CVPR 2026

SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving

Infusing explicit spatial representations into vision-language models for robust autonomous driving with 3D spatial reasoning.

Peizheng Li, Zhenghao Zhang, David Holtz, Hang Yu, Yutong Yang, Yuzhi Lai, Rui Song, Andreas Geiger, and Andreas Zell

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

ICML 2026 (Highlight)

Seizure-Semiology-Suite (S3): A Clinically Multimodal Dataset, Benchmark, and Models for Seizure Semiology Understanding

Clinically grounded multimodal benchmark and models for fine-grained seizure semiology understanding.

Lina Zhang, Tonmoy Monsoor, Peizheng Li, Jiarui Cui, Xinyi Peng, Chong Han, Prateik Sinha, Siyuan Dai, Jessica Nichole Pasqua, Colin M McCrimmon, Weiting Liu, Hailey Marie Miranda, Bing Hu, Xiangting Wu, Tengyou Xu, Chunhan Li, Jiaye Tian, Jiarui Tang, Detao Ma, Lingye Kong, Junnan Lyu, Jungang Li, Yan Zan, Junhua Huang, Rajarshi Mazumder, and Vwani Roychowdhury

International Conference on Machine Learning (ICML), 2026

IROS 2026

G2DP: Diffusion Planning with Spatio-Temporal Grid Guidance

Grid-guided diffusion planner injecting dense spatio-temporal cost gradients into denoising for safe, route-adherent closed-loop driving.

Hang Yu, Ye Jin, Alessandro Canevaro, Julian Schmidt, Julian Jordan, Peizheng Li, Marc Kaufeld, Silvan Lindner, Johannes Betz, and Wilhelm Stork

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2026

IROS 2026

Shift & Drift: A Zero-Shot Benchmark for Generalizable and Robust Autonomous Driving Motion Planning

Zero-shot dual-track benchmark stress-testing motion planners under semantic shift and state-distribution drift in closed-loop driving.

Alessandro Canevaro, Hang Yu, Julian Schmidt, Peizheng Li, Silvan Lindner, Wilhelm Stork, Georg Martius, and Julian Jordan

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2026

T-ASE 2026

FAM-HRI: Foundation-Model Assisted Multimodal Human-Robot Interaction Combining Gaze and Speech

Foundation-model assisted multimodal HRI that fuses gaze and speech via LLMs for intuitive robot manipulation.

Yuzhi Lai, Shenghai Yuan, Peizheng Li, Boya Zhang, Benjamin Kiefer, Tianchen Deng, and Andreas Zell

IEEE Transactions on Automation Science and Engineering (T-ASE), 2026

EMBC 2026

Can Multimodal Large Language Models Understand Pathologic Movements? A Pilot Study on Seizure Semiology

Pilot study evaluating multimodal large language models for interpretable pathological movement recognition in seizure videos.

Lina Zhang, Tonmoy Monsoor, Mehmet Efe Lorasdagi, Prateik Sinha, Chong Han, Peizheng Li, Yuan Wang, Jessica Pasqua, Colin McCrimmon, Rajarshi Mazumder, and Vwani Roychowdhury

Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2026

arXiv 2026

Glance-Say: Multimodal Human-Robot Collaboration and Intent Recognition via Sticky Glance

Gaze-speech multimodal interaction fusing sticky-glance intent stabilization and shared control for assistive robotic manipulation.

Yuzhi Lai, Shenghai Yuan, Peizheng Li, Benjamin Kiefer, and Andreas Zell

arXiv preprint arXiv:2603.06121, 2026


2025

ICCV 2025

AGO: Adaptive Grounding for Open World 3D Occupancy Prediction

Adaptive grounding framework that bridges 2D vision-language features to open-world 3D occupancy prediction without manual vocabulary.

Peizheng Li, Shuxiao Ding, You Zhou, Qingwen Zhang, Onat Inak, Larissa Triess, Niklas Hanselmann, Marius Cordts, and Andreas Zell

International Conference on Computer Vision (ICCV), 2025

arXiv 2025

TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking

Temporal query denoising approach for robust 3D multi-object tracking in autonomous driving scenarios.

Shuxiao Ding, Yutong Yang, Julian Wiederer, Markus Braun, Peizheng Li, Juergen Gall, and Bin Yang

arXiv preprint arXiv:2504.03258, 2025

arXiv 2025

SEER-VAR: Semantic Egocentric Environment Reasoner for Vehicle Augmented Reality

Semantic egocentric environment reasoning for vehicle augmented reality with spatial understanding.

Yuzhi Lai, Shenghai Yuan, Peizheng Li, Jun Lou, and Andreas Zell

arXiv preprint arXiv:2508.17255, 2025


2024

ECCV 2024

SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving

Self-supervised scene flow estimation from point clouds, eliminating the need for expensive human annotations.

Qingwen Zhang, Yi Yang, Peizheng Li, Olov Andersson, and Patric Jensfelt

European Conference on Computer Vision (ECCV), 2024


2023

IJCAI 2023

PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View

Lightweight yet powerful BEV framework for joint instance segmentation and future motion prediction.

Peizheng Li, Shuxiao Ding, Xieyuanli Chen, Niklas Hanselmann, Marius Cordts, and Juergen Gall

International Joint Conference on Artificial Intelligence (IJCAI), 2023
