Abstract
The rapid expansion of e-commerce platforms has driven demand for information retrieval systems that let users search for products through intuitive modalities. Sketch-based image retrieval (SBIR) has emerged as a critical research area, bridging the domain gap between abstract, hand-drawn sketches and realistic product photographs. However, traditional SBIR methods often rely on a closed-set assumption, in which the training and testing categories overlap completely. This is a significant limitation in real-world scenarios, where new fashion trends and clothing categories emerge constantly. This paper addresses the challenge of Zero-Shot SBIR (ZS-SBIR) in the clothing domain by leveraging the semantic power of Contrastive Language-Image Pre-training (CLIP). We propose a novel feature alignment framework that uses CLIP as a backbone to extract robust semantic representations. By introducing a specialized projection module and a semantic consistency regularization mechanism, we effectively align the disparate visual features of sketches and photos within a shared embedding space. Our approach mitigates the domain shift problem and preserves semantic integrity for unseen categories. Extensive experiments on standard benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches on zero-shot retrieval tasks. The results validate the efficacy of contrastive learning in harmonizing cross-modal features and establish a new baseline for sketch-based clothing retrieval.
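The abstract does not include an implementation, but the core idea it describes (projecting frozen CLIP features of sketches and photos into a shared embedding space and aligning matching pairs with a contrastive objective) can be sketched minimally. The snippet below is an illustration under stated assumptions, not the paper's method: random vectors stand in for CLIP image features, the projection is a single shared linear layer, and the alignment loss is a symmetric InfoNCE in which matching sketch/photo pairs lie on the diagonal of the similarity matrix. All names (`project`, `contrastive_alignment_loss`, the dimensions) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize rows so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def project(feats, W, b):
    # Shared linear projection into the joint embedding space.
    return l2_normalize(feats @ W + b)

def contrastive_alignment_loss(sketch_emb, photo_emb, temperature=0.07):
    # Symmetric InfoNCE: each sketch should retrieve its own photo and
    # vice versa; positives sit on the diagonal of the logits matrix.
    logits = sketch_emb @ photo_emb.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# Stand-ins for frozen CLIP image features of paired sketches and photos.
batch, clip_dim, emb_dim = 8, 512, 256
sketch_feats = rng.normal(size=(batch, clip_dim))
photo_feats = rng.normal(size=(batch, clip_dim))

# Hypothetical projection parameters (in practice these are trained).
W = rng.normal(scale=0.02, size=(clip_dim, emb_dim))
b = np.zeros(emb_dim)

loss = contrastive_alignment_loss(project(sketch_feats, W, b),
                                  project(photo_feats, W, b))
print(float(loss))
```

With untrained parameters the loss stays near log(batch) (chance level); training the projection to minimize it is what pulls paired sketch and photo embeddings together while pushing non-matching pairs apart.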

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2026 Sophie Bernard, Jean Dupont (Author)