Prompt reverse learning: enhancing visual language models for medical image recognition

Abstract

Large visual language models such as CLIP have demonstrated impressive performance on various downstream tasks involving natural images by leveraging prompt learning. However, these models often falter on tasks involving medical images. We provide an experimental insight into this phenomenon: CLIP is insensitive to the class names of medical images. For instance, replacing the class name “medulloblastoma” (a type of brain tumor) with “dog” in prompts has minimal impact on performance, an effect not observed with natural images. To realign prompt learning with medical image recognition, we propose a novel prompt learning strategy, termed prompt reverse learning (PeLen). Unlike existing methods, which adapt CLIP’s representations to downstream tasks, PeLen adapts task-specific representations to CLIP’s representations. Building on this insensitivity to medical class names, PeLen designates natural images and their class names to represent each class of medical images and its class name, e.g., letting the image and text of a dog stand in for the image and text of medulloblastoma. PeLen then learns prompts that align the representations of medical images with the visual and textual representations of the designated natural images. Our experiments demonstrate the efficacy of PeLen for medical image recognition.
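
The class-name substitution described above can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt template, the `SURROGATE` mapping (only the medulloblastoma-to-dog pairing comes from the abstract; the glioma-to-cat pairing is invented for illustration), the helper names, and the toy feature vectors standing in for CLIP encoder outputs are all assumptions.

```python
import math

# Hypothetical mapping from medical classes to natural-image classes.
# Only "medulloblastoma" -> "dog" appears in the abstract; the other
# pairing is an illustrative assumption.
SURROGATE = {"medulloblastoma": "dog", "glioma": "cat"}

# Assumed prompt template; the abstract does not specify one.
TEMPLATE = "a photo of a {}."


def build_prompts(medical_classes, surrogate=None):
    """Build one text prompt per class, optionally using surrogate names."""
    names = [surrogate[c] if surrogate else c for c in medical_classes]
    return [TEMPLATE.format(name) for name in names]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict(image_feature, text_features, medical_classes):
    """Return the medical class whose prompt feature is most similar."""
    scores = [cosine(image_feature, t) for t in text_features]
    return medical_classes[scores.index(max(scores))]


classes = ["medulloblastoma", "glioma"]
original_prompts = build_prompts(classes)
surrogate_prompts = build_prompts(classes, SURROGATE)

# Toy vectors standing in for encoded prompts and an encoded medical image.
toy_text_features = [[1.0, 0.0], [0.0, 1.0]]
toy_image_feature = [0.9, 0.1]
prediction = predict(toy_image_feature, toy_text_features, classes)
```

In this sketch the prediction is still reported as a medical class even when the prompts name natural-image classes, since each surrogate prompt keeps the index of the medical class it stands in for.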
