Multiple web-scale knowledge bases (e.g., Freebase, YAGO, NELL) have been constructed using semi-supervised or unsupervised information extraction techniques, and many of them, despite their large sizes, are still growing continuously. Much research effort has been put into mining inference rules from these knowledge bases. To address the task of rule mining over evolving web-scale knowledge bases, we propose a parallel incremental rule mining framework. Our approach efficiently mines rules based on the relational model and applies updates to large knowledge bases; we propose an alternative metric that reduces computational complexity without compromising quality; and we apply multiple optimization techniques that reduce runtime by more than two orders of magnitude. Experiments show that our approach scales efficiently to web-scale knowledge bases and saves over 90% of the running time compared to the state-of-the-art batch rule mining system. We also apply optimization techniques to the batch rule mining algorithm, reducing its runtime by more than half compared to the state of the art. To the best of our knowledge, our incremental rule mining system is the first to handle updates to web-scale knowledge bases.
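The text above does not spell out how updates are applied or how the alternative metric is defined. The following is therefore only a minimal illustration, under our own assumptions, of why processing updates incrementally can be cheaper than re-mining from scratch. It assumes rules of the path shape head(x, y) ← b1(x, z) ∧ b2(z, y), the standard support and confidence measures (not the alternative metric proposed here), and insertion-only updates. The class `IncrementalPathRule` and its methods are hypothetical. When a fact arrives, only variable bindings that use that fact are re-evaluated, rather than the whole join.

```python
from collections import defaultdict


class IncrementalPathRule:
    """Hypothetical sketch (not the system described above): maintains the
    standard support and confidence of head(x, y) <- b1(x, z), b2(z, y)
    under fact insertions only."""

    def __init__(self, head, b1, b2):
        self.head, self.b1, self.b2 = head, b1, b2
        self.facts = defaultdict(set)                      # relation -> {(subj, obj)}
        self.out = defaultdict(lambda: defaultdict(set))   # relation -> subj -> {obj}
        self.inn = defaultdict(lambda: defaultdict(set))   # relation -> obj -> {subj}
        self.body_pairs = set()  # distinct (x, y) for which some z satisfies the body
        self.support = 0         # number of body pairs that are also head facts

    def _add_body_pair(self, x, y):
        # Count each (x, y) once, however many z values derive it.
        if (x, y) not in self.body_pairs:
            self.body_pairs.add((x, y))
            if (x, y) in self.facts[self.head]:
                self.support += 1

    def insert(self, rel, s, o):
        if (s, o) in self.facts[rel]:
            return  # duplicate fact: nothing changes
        self.facts[rel].add((s, o))
        self.out[rel][s].add(o)
        self.inn[rel][o].add(s)
        # A new head fact adds support only if its pair already satisfies the body.
        # This check runs before the body expansion so no pair is counted twice.
        if rel == self.head and (s, o) in self.body_pairs:
            self.support += 1
        # Only bindings that use the new fact can create new body pairs.
        if rel == self.b1:  # new b1(s, o) binds z = o
            for y in self.out[self.b2][o]:
                self._add_body_pair(s, y)
        if rel == self.b2:  # new b2(s, o) binds z = s
            for x in self.inn[self.b1][s]:
                self._add_body_pair(x, o)

    def confidence(self):
        # Standard confidence: support / number of body pairs.
        return self.support / len(self.body_pairs) if self.body_pairs else 0.0
```

For example, with the rule grandparent(x, y) ← parent(x, z) ∧ parent(z, y), inserting parent(a, b), parent(b, c), grandparent(a, c) and parent(c, d) produces two body pairs, (a, c) and (b, d). Only (a, c) is supported, so `rule.confidence()` returns 0.5. Each insertion touches only the neighbours of the new fact, which is the intuition behind avoiding a full batch recomputation.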
Comparison with the batch algorithm OP
The table below shows the runtime of the state-of-the-art batch rule mining system OP. The two figures below compare the three variants of our incremental rule mining algorithm on the two datasets. Our ‘xconf’ variant consistently outperforms the other two variants. In addition, our incremental algorithms save up to 90% of the running time compared to the batch counterpart on the Freebase dataset.