Multi-Sensor Pedestrian Re-Identification with Uncertainty-Guided Fusion of Vision–Language and LiDAR

Keywords

Multi-sensor fusion
LiDAR
pedestrian re-identification
uncertainty gating
autonomous driving

Abstract

In autonomous driving, pedestrian appearance cues from cameras can be unreliable under glare, fog, or low illumination, while LiDAR offers complementary geometry signals. Building on uncertainty-aware CLIP-based modal modeling, this work introduces a fusion framework that integrates vision–language embeddings with LiDAR-derived shape descriptors using uncertainty-gated feature mixing. The gating module increases reliance on LiDAR when visual uncertainty rises and maintains vision–language dominance in clear conditions. Experiments are conducted on multi-sensor datasets covering 210,000 paired camera–LiDAR observations and 26,000 identities. Baselines include camera-only ReID (OSNet, TransReID), CLIP-based ReID, and naive concatenation fusion. The proposed method improves overall mAP by 3.3%–4.8% and yields larger gains of 6.0%–7.5% on low-light subsets, while adding less than 8% inference latency compared with camera-only models.
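The gating behavior described above can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, the use of a per-dimension predicted log-variance as the visual uncertainty signal, and the sigmoid gate are all illustrative assumptions consistent with the abstract's description (more weight on LiDAR as visual uncertainty rises, vision–language dominance otherwise):

```python
import numpy as np

def uncertainty_gated_fusion(vis_feat, lidar_feat, vis_logvar):
    """Mix vision-language and LiDAR features with an uncertainty gate.

    vis_feat    : visual (CLIP-style) embedding, shape (d,)
    lidar_feat  : LiDAR-derived shape descriptor, shape (d,)
    vis_logvar  : predicted log-variance of the visual embedding,
                  shape (d,); higher values mean less reliable vision.
                  (Hypothetical uncertainty head, not from the paper.)
    """
    # Sigmoid gate in (0, 1): approaches 1 as visual uncertainty grows,
    # shifting weight from the visual embedding toward LiDAR geometry.
    gate = 1.0 / (1.0 + np.exp(-vis_logvar))
    fused = (1.0 - gate) * vis_feat + gate * lidar_feat
    return fused
```

Under this sketch, a confidently predicted visual embedding (strongly negative log-variance) yields a gate near 0, so the fused feature stays close to the vision–language embedding; under glare or low light the gate opens and LiDAR dominates.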

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2026 Carlos Martínez, Lucía Fernández, Javier Ruiz (Authors)