Peizheng Li

PhD Researcher | Spatial Intelligence · Embodied AI · Foundation Models for Physical World


My research focuses on building autonomous systems that can perceive, model, and reliably act in the 3D physical world, spanning 3D/4D dynamic scene understanding, open-world modeling, spatial intelligence, and embodied systems.

I am a PhD researcher at the University of Tübingen and Mercedes-Benz AG R&D, advised by Prof. Andreas Geiger and Prof. Andreas Zell. My work bridges academic research and industrial-scale engineering, from real-world vehicle constraints to large-scale data pipelines, with first- or core-author publications at CVPR, ICCV, ECCV, and IJCAI across four consecutive years.

News

Feb 26, 2026 Our SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving paper is accepted by CVPR 2026. 🎉 Ranked 1st on the nuScenes benchmark and 2nd in closed-loop performance on the Bench2Drive leaderboard!
Jun 25, 2025 Our AGO: Adaptive Grounding for Open World 3D Occupancy Prediction paper is accepted by ICCV 2025. 🎉
Jul 01, 2024 Our SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving paper is accepted by ECCV 2024. 🎉 Ranked 1st on the Argoverse 2 self-supervised scene flow leaderboard!
Apr 19, 2023 PowerBEV, our paper on camera-based end-to-end instance prediction in bird’s-eye view, has been accepted by IJCAI 2023. 🎉

Publication Highlights

  1. CVPR 2026
    SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving
    Infusing explicit spatial representations into vision-language models for robust autonomous driving with 3D spatial reasoning.
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  2. ICCV 2025
    AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
    Adaptive grounding framework that bridges 2D vision-language features to open-world 3D occupancy prediction without manual vocabulary.
    In International Conference on Computer Vision (ICCV), 2025
  3. ECCV 2024
    SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
    Self-supervised scene flow estimation from point clouds, eliminating the need for expensive human annotations.
    In European Conference on Computer Vision (ECCV), 2024
  4. IJCAI 2023
    PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View
    Lightweight yet powerful BEV framework for joint instance segmentation and future motion prediction.
    In International Joint Conference on Artificial Intelligence (IJCAI), Aug 2023
  5. arXiv 2025
    TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking
    Temporal query denoising approach for robust 3D multi-object tracking in autonomous driving scenarios.
    Shuxiao Ding, Yutong Yang, Julian Wiederer, Markus Braun, Peizheng Li, Juergen Gall, and Bin Yang
    arXiv preprint arXiv:2504.03258, Aug 2025
  6. arXiv 2025
    FAM-HRI: Foundation-Model Assisted Multi-Modal Human-Robot Interaction Combining Gaze and Speech
    Foundation-model assisted multimodal human-robot interaction combining gaze tracking and speech understanding.
    arXiv preprint arXiv:2503.16492, Aug 2025