Current approaches to Information Extraction (IE) are capable of extracting large numbers of facts with associated probabilities. Because no current IE system is perfect, complementary and conflicting facts are obtained when different systems are run over the same data. Knowledge Fusion (KF) is the problem of aggregating facts from different extractors. Existing methods approach KF using supervised learning, which lacks sufficient training data, or deep linguistic knowledge, which is not robust enough. We propose a semi-supervised application of Consensus Maximization to the KF problem, using a combination of supervised and unsupervised models. Consensus Maximization Fusion (CM Fusion) is able to promote high-quality facts and eliminate incorrect ones. We demonstrate the effectiveness of our system on the NIST Slot Filler Validation contest, which seeks to evaluate and aggregate multiple independent information extractors. Our system achieved the highest F1 score among the submitted systems.
Authors:
Miguel E. Rodríguez, Sean Goldberg, Daisy Zhe Wang
Bibtex:
@article{rodriguezconsensus, title={Consensus Maximization Fusion of Probabilistic Information Extractors}, author={Rodr{\'\i}guez, Miguel and Goldberg, Sean and Wang, Daisy Zhe} }
Download:
[pdf]