Multimedia Search with Noisy Modalities: Fusion and Multistage Retrieval
We report our experiences from participating in the controlled experiment of the ImageCLEF 2010 Wikipedia Retrieval task. We built an experimental search engine which combines multilingual and multi-image search, employing a holistic web interface and enabling the use of highly distributed indices. Modalities are searched in parallel, and results can be fused via several selectable methods. The engine also provides multistage retrieval, as well as a single-text-index baseline for comparison purposes. Experiments show that the value added by image modalities is very small when textual annotations exist. The contribution of image modalities is larger when search is performed in a two-stage fashion, i.e., using image search to re-rank only a smaller set of the top results retrieved by text. Furthermore, first splitting annotations into multiple modalities with respect to natural language and/or type and then fusing results has the potential to achieve better effectiveness than using all textual information as a single modality. Concerning fusion, the simple method of linearly combining evidence is found to be the most robust, achieving the best effectiveness.
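The two techniques highlighted above, linear score fusion across modalities and two-stage retrieval where image search re-ranks the top text results, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes each modality returns a `{doc_id: score}` mapping, and the function names, min-max normalization, and weighting scheme are assumptions for illustration.

```python
# Hypothetical sketch of linear fusion and two-stage re-ranking.
# Assumption: each modality's run is a dict mapping doc_id -> score.

def normalize(scores):
    """Min-max normalize a run's scores into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero for constant runs
    return {doc: (s - lo) / span for doc, s in scores.items()}

def linear_fusion(runs, weights):
    """Weighted linear combination of normalized per-modality scores;
    returns doc ids ranked by fused score (highest first)."""
    fused = {}
    for run, w in zip(runs, weights):
        for doc, s in normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)

def two_stage(text_run, image_run, k=100):
    """Stage 1: keep only the top-k documents by text score.
    Stage 2: re-rank those k documents by image-similarity score."""
    top_k = sorted(text_run, key=text_run.get, reverse=True)[:k]
    return sorted(top_k, key=lambda d: image_run.get(d, 0.0), reverse=True)

text_run = {"a": 3.0, "b": 2.0, "c": 1.0}   # text modality scores
image_run = {"b": 0.9, "a": 0.1}            # image modality scores
print(linear_fusion([text_run, image_run], [0.7, 0.3]))
print(two_stage(text_run, image_run, k=2))
```

In the two-stage variant, documents outside the text top-k are never considered, which is why the image modality contributes more there: it only has to reorder a small, already relevant candidate set rather than rank the whole collection.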