Abstract
The rapid expansion of e-commerce platforms has driven demand for information retrieval systems that let users search for products through intuitive modalities. Sketch-based image retrieval (SBIR) has emerged as a critical research area, bridging the domain gap between abstract, hand-drawn sketches and realistic product photographs. However, traditional SBIR methods often rely on a closed-set assumption, in which the training and testing categories overlap completely. This is a significant limitation in real-world scenarios, where new fashion trends and clothing categories emerge constantly. This paper addresses the challenge of Zero-Shot SBIR (ZS-SBIR) in the clothing domain by leveraging the semantic power of Contrastive Language-Image Pre-training (CLIP). We propose a novel feature alignment framework that uses CLIP as a backbone to extract robust semantic representations. By introducing a specialized projection module and a semantic consistency regularization mechanism, we effectively align the disparate visual features of sketches and photos within a shared embedding space. Our approach mitigates the domain shift problem and preserves semantic integrity for unseen categories. Extensive experiments on standard benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches on zero-shot retrieval tasks. The results validate the efficacy of contrastive learning in harmonizing cross-modal features and establish a new baseline for sketch-based clothing retrieval.
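The abstract does not include an implementation, but the core idea it describes (projecting frozen CLIP features of sketches and photos into a shared embedding space and aligning matching pairs with a contrastive objective) can be sketched minimally. The snippet below is an illustration under stated assumptions, not the paper's method: random vectors stand in for CLIP image features, the projection is a single shared linear layer, and the alignment loss is a symmetric InfoNCE in which matching sketch/photo pairs lie on the diagonal of the similarity matrix. All names (`project`, `contrastive_alignment_loss`, the dimensions) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize rows so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def project(feats, W, b):
    # Shared linear projection into the joint embedding space.
    return l2_normalize(feats @ W + b)

def contrastive_alignment_loss(sketch_emb, photo_emb, temperature=0.07):
    # Symmetric InfoNCE: each sketch should retrieve its own photo and
    # vice versa; positives sit on the diagonal of the logits matrix.
    logits = sketch_emb @ photo_emb.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Stand-ins for frozen CLIP image features of paired sketches and photos.
batch, clip_dim, emb_dim = 8, 512, 256
sketch_feats = rng.normal(size=(batch, clip_dim))
photo_feats = rng.normal(size=(batch, clip_dim))

# Hypothetical projection parameters (in practice these are trained).
W = rng.normal(scale=0.02, size=(clip_dim, emb_dim))
b = np.zeros(emb_dim)

loss = contrastive_alignment_loss(project(sketch_feats, W, b),
                                  project(photo_feats, W, b))
print(float(loss))
```

With untrained parameters the loss stays near log(batch) (chance level); training the projection to minimize it is what pulls paired sketch and photo embeddings together while pushing non-matching pairs apart.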

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2026 Sophie Bernard, Jean Dupont (Author)