ProbKB: Web-Scale Probabilistic Knowledge Base
Recent years have seen a rapid rise in the construction of web-scale knowledge bases (e.g., Freebase, YAGO, DBpedia). These knowledge bases store structured information about real-world people, places, organizations, etc. However, due to the limitations of human knowledge and extraction algorithms, current knowledge bases are still far from complete. The ProbKB project aims to build a web-scale probabilistic knowledge base through scalable learning and inference. Two current projects support this goal:
Mining first-order knowledge
We design the Ontological Pathfinding algorithm, which scales first-order rule mining to web-scale knowledge bases through a series of parallelization and optimization techniques: a relational knowledge base model that applies inference rules in batches, a new rule mining algorithm that parallelizes the join queries, a novel partitioning algorithm that breaks the mining task into smaller independent subtasks, and a pruning strategy that eliminates unsound and resource-consuming rules before they are applied. Combining these techniques, we develop the first rule mining system that scales to Freebase, the largest public knowledge base, with 112 million entities and 388 million facts. We mine 36,625 inference rules in 34 hours; no existing approach achieves this scale.
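To make the idea of rule mining with pruning concrete, the following is a minimal single-machine sketch, not the Ontological Pathfinding algorithm itself. It assumes facts are (predicate, subject, object) triples and only considers candidate rules of the form head(x, y) <- body(x, y). The function name `mine_rules` and the support and confidence thresholds are illustrative assumptions; support and confidence are standard rule mining measures and stand in here for the pruning strategy described above.

```python
from collections import defaultdict
from itertools import permutations


def mine_rules(facts, min_support=2, min_confidence=0.8):
    """Mine rules head(x, y) <- body(x, y) from (pred, subj, obj) triples.

    Hypothetical helper for illustration; the thresholds are assumptions.
    """
    # Group the (subject, object) pairs by predicate.
    pairs = defaultdict(set)
    for pred, subj, obj in facts:
        pairs[pred].add((subj, obj))

    rules = []
    # Every ordered pair of distinct predicates is a candidate rule.
    for body, head in permutations(sorted(pairs), 2):
        # Support: number of pairs for which both body and head hold.
        support = len(pairs[body] & pairs[head])
        # Confidence: fraction of body pairs for which the head also holds.
        confidence = support / len(pairs[body])
        # Prune candidates that are rarely applicable or often wrong.
        if support >= min_support and confidence >= min_confidence:
            rules.append((head, body, support, confidence))
    return rules


facts = [
    ("capital_of", "Paris", "France"),
    ("capital_of", "Rome", "Italy"),
    ("located_in", "Paris", "France"),
    ("located_in", "Rome", "Italy"),
    ("located_in", "Lyon", "France"),
]
print(mine_rules(facts))
```

On this toy input the candidate located_in(x, y) <- capital_of(x, y) is kept, while the reverse rule is pruned because only two of the three located_in facts are also capital_of facts. Each candidate is scored independently of the others, which is the property that makes partitioning and parallelizing the mining task possible at larger scale.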
First-order inference engine
We design an efficient inference engine to infer implicit knowledge from existing knowledge bases: 1) We present a formal definition of and a novel relational model for probabilistic knowledge bases. This model enables an efficient SQL-based inference algorithm for knowledge expansion that applies inference rules in batches; 2) We implement ProbKB on massively parallel processing (MPP) databases to achieve further scalability; 3) We combine several quality control methods that identify erroneous rules, erroneous facts, and ambiguous entities to improve the precision of the inferred facts. The ProbKB inference engine outperforms the state-of-the-art inference engine in terms of both performance and quality.
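The batch application of rules can be sketched as follows. This is a simplified illustration under our own assumptions, not the ProbKB schema: it uses an in-memory SQLite database, a hypothetical `facts(pred, subj, obj)` table, and a hypothetical `rules(head, body1, body2)` table that stores only rules of the form head(x, z) <- body1(x, y), body2(y, z). Probabilities and quality control are omitted. Because the rules are stored as rows, a single join query applies all of them at once instead of issuing one query per rule.

```python
import sqlite3


def expand(facts, rules):
    """Apply all rules head(x, z) <- body1(x, y), body2(y, z) in batches.

    Hypothetical helper for illustration; table and column names are assumptions.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (pred TEXT, subj TEXT, obj TEXT, "
        "UNIQUE (pred, subj, obj))"
    )
    conn.execute("CREATE TABLE rules (head TEXT, body1 TEXT, body2 TEXT)")
    conn.executemany("INSERT OR IGNORE INTO facts VALUES (?, ?, ?)", facts)
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", rules)

    # Repeat the batch join until no new facts are derived.
    while True:
        before = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
        conn.execute(
            """
            INSERT OR IGNORE INTO facts (pred, subj, obj)
            SELECT r.head, f1.subj, f2.obj
            FROM rules AS r
            JOIN facts AS f1 ON f1.pred = r.body1
            JOIN facts AS f2 ON f2.pred = r.body2 AND f2.subj = f1.obj
            """
        )
        after = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
        if after == before:
            break

    result = set(conn.execute("SELECT pred, subj, obj FROM facts"))
    conn.close()
    return result


facts = [
    ("born_in", "Ann", "Paris"),
    ("city_in", "Paris", "France"),
]
rules = [("born_in_country", "born_in", "city_in")]
print(sorted(expand(facts, rules)))
```

The number of queries per iteration depends on the number of rule structures rather than the number of rules, so adding more rules of the same form only adds rows to the `rules` table.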
- Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases
Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2016
- Knowledge Expansion over Probabilistic Knowledge Bases
Yang Chen, Daisy Zhe Wang
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014
- Web-Scale Knowledge Inference Using Markov Logic Networks
Yang Chen, Daisy Zhe Wang
Proceedings of the ICML Workshop on Structured Learning: Inferring Graphs from Structured and Unstructured Inputs (SLG), 2013, Atlanta
- Ontological Pathfinding
Mining first-order knowledge from large knowledge bases.
- Knowledge Expansion
Inferring hidden knowledge from knowledge bases.
- Freebase data dump; please contact Yang Chen for the cleaned 388M Freebase facts.
- 36,625 Freebase first-order rules
The ProbKB project is partially supported by NSF IIS Award #1526753, DARPA under FA8750-12-2-0348-2 (DEFT/CUBISM), and a generous gift from Google. We also thank Dr. Milenko Petrovic and Dr. Alin Dobra for helpful discussions on query optimization.