ProbKB: Web-Scale Probabilistic Knowledge Base
Recent years have seen rapid growth in the construction of web-scale knowledge bases (e.g., Freebase, YAGO, DBpedia). These knowledge bases store structured information about real-world people, places, organizations, and other entities. However, due to the limitations of human knowledge and extraction algorithms, current knowledge bases are still far from complete. The ProbKB project aims to build a web-scale probabilistic knowledge base through scalable learning and inference. Two current projects support this goal:
Mining first-order knowledge
We design the Ontological Pathfinding algorithm, which scales first-order rule mining to web knowledge bases through a series of parallelization and optimization techniques: a relational knowledge base model that applies inference rules in batches, a new rule mining algorithm that parallelizes the join queries, a novel partitioning algorithm that breaks the mining task into smaller independent subtasks, and a pruning strategy that eliminates unsound and resource-consuming rules before they are applied. Combining these techniques, we develop the first rule mining system that scales to Freebase, the largest public knowledge base, with 112 million entities and 388 million facts. We mine 36,625 inference rules in 34 hours; no existing approach achieves this scale.
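To make the idea of join-based rule mining with pruning concrete, here is a minimal single-machine sketch in Python. It is not the Ontological Pathfinding implementation: the triple format, the function names (`index_by_predicate`, `score_chain_rule`, `mine_rules`), the toy facts, and the use of a plain confidence threshold as the pruning criterion are all assumptions made for illustration. It scores candidate rules of the form head(x, y) <- body1(x, z), body2(z, y) by joining the two body predicates on z and checking how many derived pairs already appear as head facts.

```python
from collections import defaultdict

def index_by_predicate(facts):
    """Group (pred, subj, obj) triples into {pred: {(subj, obj), ...}}."""
    idx = defaultdict(set)
    for pred, subj, obj in facts:
        idx[pred].add((subj, obj))
    return idx

def score_chain_rule(idx, head, body1, body2):
    """Score the candidate rule head(x, y) <- body1(x, z), body2(z, y).

    Returns (support, confidence), where support counts derived pairs that
    are already known head facts, and confidence = support / derived pairs.
    This confidence definition is an assumption of the sketch.
    """
    # Hash join on the shared variable z.
    objects_by_subject = defaultdict(set)
    for z, y in idx[body2]:
        objects_by_subject[z].add(y)
    derived = {(x, y) for x, z in idx[body1] for y in objects_by_subject.get(z, ())}
    if not derived:
        return 0, 0.0
    support = len(derived & idx[head])
    return support, support / len(derived)

def mine_rules(facts, candidates, min_confidence=0.5):
    """Keep candidates with nonzero support and confidence >= min_confidence."""
    idx = index_by_predicate(facts)
    kept = []
    for head, body1, body2 in candidates:
        support, confidence = score_chain_rule(idx, head, body1, body2)
        if support > 0 and confidence >= min_confidence:
            kept.append((head, body1, body2, support, confidence))
    return kept

# Toy data, invented for this example.
FACTS = {
    ("bornIn", "ada", "london"), ("cityIn", "london", "uk"),
    ("nationality", "ada", "uk"),
    ("bornIn", "bob", "paris"), ("cityIn", "paris", "france"),
    ("nationality", "bob", "france"),
    ("bornIn", "eve", "rome"), ("cityIn", "rome", "italy"),
}
CANDIDATES = [
    ("nationality", "bornIn", "cityIn"),
    ("cityIn", "bornIn", "cityIn"),
]

if __name__ == "__main__":
    for rule in mine_rules(FACTS, CANDIDATES):
        print(rule)
```

On the toy data, the first candidate derives three pairs, two of which are known nationality facts, so it is kept with confidence 2/3; the second has no support and is pruned. Because each candidate is scored independently, the loop in `mine_rules` is the natural place to split work into separate subtasks, which is the intuition behind partitioning the mining task.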
First-order inference engine
We design an efficient inference engine to infer implicit knowledge from existing knowledge bases: 1) we present a formal definition and a novel relational model for probabilistic knowledge bases, which allows an efficient SQL-based inference algorithm for knowledge expansion that applies inference rules in batches; 2) we implement ProbKB on massively parallel processing (MPP) databases to achieve further scalability; 3) we combine several quality control methods that identify erroneous rules and facts, as well as ambiguous entities, to improve the precision of inferred facts. The ProbKB inference engine outperforms the state-of-the-art inference engine in both performance and quality.
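The batch-application idea can be illustrated with a small sketch using Python's built-in `sqlite3` module. This is not the ProbKB schema or code: the `facts` and `rules` tables, their columns, the toy data, and the helper names are hypothetical, and the sketch ignores probabilities, MPP execution, and quality control entirely. What it does show is how storing rules as rows lets a single SQL join apply every rule of the same shape at once, instead of issuing one query per rule.

```python
import sqlite3

def build_kb():
    """Create an in-memory KB with hypothetical facts and rules tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE facts (pred TEXT, subj TEXT, obj TEXT,
                            PRIMARY KEY (pred, subj, obj));
        -- Each row encodes head(x, y) <- body1(x, z), body2(z, y).
        CREATE TABLE rules (head TEXT, body1 TEXT, body2 TEXT);
        """
    )
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", [
        ("bornIn", "ada", "london"),
        ("cityIn", "london", "uk"),
        ("countryIn", "uk", "europe"),
    ])
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", [
        ("nationality", "bornIn", "cityIn"),
        ("continentOf", "nationality", "countryIn"),
    ])
    return conn

def count_facts(conn):
    return conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]

def expand_once(conn):
    """Apply all rules in one join; return the number of new facts."""
    before = count_facts(conn)
    conn.execute(
        """
        INSERT OR IGNORE INTO facts (pred, subj, obj)
        SELECT r.head, f1.subj, f2.obj
        FROM rules AS r
        JOIN facts AS f1 ON f1.pred = r.body1
        JOIN facts AS f2 ON f2.pred = r.body2 AND f2.subj = f1.obj
        """
    )
    return count_facts(conn) - before

def expand_to_fixpoint(conn, max_rounds=10):
    """Repeat batch application until no new facts appear."""
    total = 0
    for _ in range(max_rounds):
        added = expand_once(conn)
        if added == 0:
            break
        total += added
    return total

if __name__ == "__main__":
    kb = build_kb()
    print("new facts:", expand_to_fixpoint(kb))
    for row in kb.execute("SELECT * FROM facts ORDER BY pred, subj, obj"):
        print(row)
```

In this toy run, nationality(ada, uk) is inferred first and continentOf(ada, europe) follows from it, so two facts are added in total. The number of SQL statements per round depends on the number of rule shapes, not on the number of rules, which is why the relational encoding suits large rule sets.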
Faculty: Daisy Zhe Wang
Students: Yang Chen, Sean Goldberg, Soumitra Siddharth Johri
Publications
- Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases
Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2016
- Knowledge Expansion over Probabilistic Knowledge Bases
Yang Chen, Daisy Zhe Wang
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014
- Web-Scale Knowledge Inference Using Markov Logic Networks
Yang Chen, Daisy Zhe Wang
Proceedings of the ICML Workshop on Structured Learning: Inferring Graphs from Structured and Unstructured Inputs (SLG), 2013, Atlanta
Software
- Ontological Pathfinding
Mining first-order knowledge from large knowledge bases.
- Knowledge Expansion
Inferring hidden knowledge from knowledge bases.
Data
- Freebase data dump; please contact Yang Chen for the clean 388M Freebase facts.
- 36,625 Freebase first-order rules
Acknowledgments
The ProbKB project is partially supported by NSF IIS Award #1526753, DARPA under FA8750-12-2-0348-2 (DEFT/CUBISM), and a generous gift from Google. We also thank Dr. Milenko Petrovic and Dr. Alin Dobra for helpful discussions on query optimization.
For any questions, please contact Yang Chen or Dr. Daisy Zhe Wang.