VITA: Multimodal Knowledge Extraction and Fusion
In recent years, datasets have grown not only larger but also more complex, combining different types of data. Wikipedia, for example, is a huge dataset containing unstructured text, semi-structured documents, structured knowledge, and images. The VITA project focuses on extracting and fusing knowledge from such heterogeneous data sources. Below we introduce several projects that apply multimodal fusion to a variety of applications.
Multimodal Ensemble Fusion for Disambiguation and Retrieval
We designed a probabilistic ensemble fusion model that combines evidence from images and text for information retrieval and word sense disambiguation. In this model, text processing and image processing are carried out separately, and a fusion algorithm combines the per-modality results. By exploiting the complementary and correlative relations between modalities, the fused model achieves better performance than single-modality approaches.
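To make the fusion step concrete, here is a minimal sketch of one plausible realization: each modality model produces a probability distribution over candidate answers (e.g., word senses), and a weighted mixture combines them. The weights, candidate names, and function signatures below are illustrative assumptions, not the system's actual implementation.

```python
# Minimal sketch of probabilistic ensemble fusion (illustrative only):
# each modality model emits a probability distribution over candidate
# labels, and per-modality weights control how they are combined.

def fuse(text_probs, image_probs, w_text=0.6, w_image=0.4):
    """Combine per-candidate probabilities from the text and image models
    with a weighted linear mixture, then renormalize."""
    candidates = set(text_probs) | set(image_probs)
    fused = {c: w_text * text_probs.get(c, 0.0) +
                w_image * image_probs.get(c, 0.0)
             for c in candidates}
    total = sum(fused.values()) or 1.0
    return {c: p / total for c, p in fused.items()}

# Example: disambiguating "bank" with two modality-specific models.
text_probs = {"river_bank": 0.55, "financial_bank": 0.45}
image_probs = {"river_bank": 0.20, "financial_bank": 0.80}
print(fuse(text_probs, image_probs))
# -> financial_bank wins once the image evidence is folded in
```

A linear mixture is only one choice of fusion rule; the point of the sketch is that the two modalities vote on the same candidate set, so complementary evidence from one modality can override a weak signal from the other.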
Query-driven Knowledge Base Completion with Multimodal Fusion
Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases (KBs) such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases remain greatly incomplete. Knowledge base completion (KBC) is the task of filling in missing facts about entities in a knowledge base. We design and implement a query-driven KBC system that combines web-based question answering with augmented rule inference, merging information from the unstructured Web and structured KBs to improve completion accuracy. Query-driven techniques keep the pipeline efficient enough to return real-time responses to user queries. Experimental results show that our system achieves state-of-the-art KBC performance with high efficiency; a sketch of the fusion step follows.
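As a rough illustration of how the two evidence sources might be merged, the sketch below scores candidate answers for a query (subject, relation) using a placeholder web-QA component and a placeholder rule-inference component, then takes a weighted combination. All function names, rules, and scores here are hypothetical stand-ins for the actual pipeline.

```python
# Hypothetical sketch of query-driven KBC: for a query (subject, relation),
# gather candidate answers from web-based QA and from rule inference over
# the KB, then fuse their confidence scores. Everything below is a mock-up.

def web_qa_candidates(subject, relation):
    # Placeholder: a real system would extract and score answers
    # from web search snippets for the query.
    return {"Paris": 0.8, "Lyon": 0.2}

def rule_inference_candidates(subject, relation):
    # Placeholder: a real system would apply inference rules such as
    # capital(x, y) <- containsCity(x, y) AND seatOfGovernment(x, y).
    return {"Paris": 0.9}

def complete(subject, relation, alpha=0.5):
    """Fuse web-QA and rule-inference scores; alpha weights the web side."""
    web = web_qa_candidates(subject, relation)
    rules = rule_inference_candidates(subject, relation)
    answers = set(web) | set(rules)
    scored = {a: alpha * web.get(a, 0.0) + (1 - alpha) * rules.get(a, 0.0)
              for a in answers}
    return max(scored, key=scored.get)

print(complete("France", "capital"))  # -> "Paris"
```

Because candidates are fetched only for the queried entity and relation, the expensive extraction and inference work is confined to what the query actually needs, which is what enables the real-time responses described above.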
Faculty: Daisy Zhe Wang
Students: Yang Peng, Dihong Gong, Xiaofeng Zhou, Ishan Patwa
Publications
- Multimodal Learning for Web Information Extraction
  Dihong Gong, Daisy Zhe Wang, Yang Peng
  ACM Multimedia, 2017
- Extracting Visual Knowledge from the Web with Multimodal Learning
  Dihong Gong, Daisy Zhe Wang
  International Joint Conference on Artificial Intelligence (IJCAI), 2017
- Multimodal Ensemble Fusion for Disambiguation and Retrieval
  Yang Peng, Xiaofeng Zhou, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, Chunsheng Victor Fang
  IEEE MultiMedia, 2016
- Scalable Image Retrieval with Multimodal Fusion
  Yang Peng, Xiaofeng Zhou, Daisy Zhe Wang, Chunsheng Victor Fang
  Proceedings of the 29th International FLAIRS Conference, 2016
- Probabilistic Ensemble Fusion for Multimodal Word Sense Disambiguation
  Yang Peng, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, Chunsheng Victor Fang
  IEEE International Symposium on Multimedia, 2015