Peizheng Li

PhD Researcher | Spatial Intelligence · Embodied AI · Foundation Models for Physical World


My research focuses on building autonomous systems that can perceive, model, and reliably act in the 3D physical world, spanning 3D/4D dynamic scene understanding, open-world modeling, spatial intelligence, and embodied systems.

I am a PhD researcher at the University of Tübingen and Mercedes-Benz AG R&D, advised by Prof. Andreas Geiger and Prof. Andreas Zell. My work bridges academic research and industrial-scale engineering, from real-world vehicle constraints to large-scale data pipelines, with first- or core-author publications at CVPR, ICCV, ECCV, and IJCAI across four consecutive years.

News

Feb 26, 2026 Our SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving paper is accepted by CVPR 2026. 🎉 Ranked 1st on the nuScenes benchmark and 2nd in closed-loop performance on the Bench2Drive leaderboard!
Jun 25, 2025 Our AGO: Adaptive Grounding for Open World 3D Occupancy Prediction paper is accepted by ICCV 2025. 🎉
Jul 01, 2024 Our SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving paper is accepted by ECCV 2024. 🎉 Ranked 1st on the Argoverse 2 self-supervised scene flow leaderboard!
Apr 19, 2023 PowerBEV, our paper on camera-based end-to-end instance prediction in bird’s-eye view, has been accepted by IJCAI 2023. 🎉

Publication Highlights

  1. CVPR 2026
    SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
    Infusing explicit spatial representations into vision-language models for robust autonomous driving with 3D spatial reasoning.
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  2. ICCV 2025
    AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
    Adaptive grounding framework that bridges 2D vision-language features to open-world 3D occupancy prediction without manual vocabulary.
    In International Conference on Computer Vision (ICCV), 2025
  3. ECCV 2024
    SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
    Self-supervised scene flow estimation from point clouds, eliminating the need for expensive human annotations.
    In European Conference on Computer Vision (ECCV), 2024
  4. IJCAI 2023
    PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View
    Lightweight yet powerful BEV framework for joint instance segmentation and future motion prediction.
    In International Joint Conference on Artificial Intelligence (IJCAI), Aug 2023
  5. arXiv 2025
    TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking
    Temporal query denoising approach for robust 3D multi-object tracking in autonomous driving scenarios.
    Shuxiao Ding, Yutong Yang, Julian Wiederer, Markus Braun, Peizheng Li, Juergen Gall, and Bin Yang
    arXiv preprint arXiv:2504.03258, Aug 2025
  6. arXiv 2025
    FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech
    Foundation-model assisted multimodal human-robot interaction combining gaze tracking and speech understanding.
    arXiv preprint arXiv:2503.16492, Aug 2025