Identifying fine-grained book genres is essential for enhancing user experience through efficient discovery, personalized recommendations, and improved reader engagement. At the same time, it provides publishers and marketers with valuable insights into consumer preferences and emerging market trends. While traditional genre classification methods predominantly rely on textual reviews or content analysis, the integration of additional modalities, such as book covers, blurbs, and metadata, offers richer contextual cues. However, the effectiveness of such multi-modal systems is often hindered by incomplete, noisy, or missing data across modalities. To address this, we propose IMAGINE (Intelligent Multi-modal Adaptive Genre Identification NEtwork), a framework designed to leverage multi-modal data while remaining robust to missing or unreliable information. IMAGINE learns modality-specific feature representations and adaptively prioritizes the most informative sources available at inference time. It further employs a hierarchical classification strategy, grounded in a curated taxonomy of book genres, to capture inter-genre relationships and support multi-label assignments reflective of real-world literary diversity. A key strength of IMAGINE is its adaptability: it maintains high predictive performance even when one modality, such as text or image, is unavailable. We also curated a large-scale hierarchical dataset that structures book genres into multiple levels of granularity, allowing for a more comprehensive evaluation. Experimental results demonstrate that IMAGINE outperformed strong baselines in various settings, with significant gains in scenarios involving incomplete modality-specific data.
翻译:暂无翻译