We consider the problem of extracting text instances of predefined categories (e.g., city and person) from the Web. Instances of a category may be scattered across thousands of independent sources in many different formats and are often noisy, which makes open-domain information extraction a challenging problem. Learning syntactic rules such as "cities such as _" or "_ is a city" in a semi-supervised manner from a few labeled examples is usually unreliable because 1) high-quality syntactic rules are rare and 2) the learning task is usually underconstrained. To address these problems, in this paper we propose to learn multimodal rules that complement syntactic rules. The multimodal rules are learned from information sources of different modalities, motivated by the intuition that information that is difficult to disambiguate in one modality may be easy to recognize in another. To demonstrate the effectiveness of this method, we have built an end-to-end multimodal information extraction system that takes unannotated raw web pages as input and produces a set of extracted instances (e.g., Boston is an instance of city) as output. More specifically, our system learns reliable relationships between information in different modalities by performing multimodal relation analysis on large collections of unstructured data. Based on the learned relationships, we then train a set of multimodal rules for information extraction. Experimental evaluation shows that multimodal learning achieves higher accuracy for information extraction.
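To make the syntactic-rule setting concrete, the following is a minimal illustrative sketch (not the authors' system) of how Hearst-style patterns such as "cities such as _" and "_ is a city" can be applied to raw text; the patterns and sample sentence are hypothetical.

import re

# Hypothetical syntactic patterns for the category "city".
PATTERNS = [
    re.compile(r"cities such as ([A-Z][A-Za-z]+(?:(?:,| and|, and) [A-Z][A-Za-z]+)*)"),
    re.compile(r"\b([A-Z][A-Za-z]+) is a city\b"),
]

def extract_city_candidates(text):
    """Return candidate city instances matched by any syntactic pattern."""
    candidates = set()
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            # Split list-style matches ("Boston, Miami and Chicago") into items.
            for name in re.split(r",| and ", match.group(1)):
                name = name.strip()
                if name:
                    candidates.add(name)
    return candidates

if __name__ == "__main__":
    sample = ("Popular cities such as Boston, Miami and Chicago attract tourists. "
              "Gainesville is a city in Florida.")
    print(extract_city_candidates(sample))  # e.g., {'Boston', 'Miami', 'Chicago', 'Gainesville'}

As the paper argues, such purely textual patterns are rare and easily underconstrained, which is the gap the proposed multimodal rules are intended to fill.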
Authors:
Dihong Gong, Daisy Zhe Wang, Yang Peng
Bibtex:
@inproceedings{gongmultimodal,
  title     = {Multimodal Learning for Web Information Extraction},
  author    = {Gong, Dihong and Wang, Daisy Zhe and Peng, Yang},
  booktitle = {The ACM International Conference on Multimedia, Mountain View, California, USA, October 23-27, 2017},
  pages     = {288--296},
  year      = {2017}
}
Download:
[pdf]