
DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs

December 16, 2019

by Ali Sadeghian

In this blog, we will learn about:

  1. Knowledge graphs
  2. Rule mining on KGs
  3. Differentiable rule mining (DRUM)
  4. Applications of DRUM

Introduction

AI systems can hugely benefit from incorporating knowledge that is often trivial for humans. For example, knowing that al-Khwarizmi was born in Khawrazm and that Khawrazm was part of Persia, we can infer that al-Khwarizmi was born in Persia. Here, we applied the rule:

    \[\text{BornIn}(x, y) \land \text{LocatedIn}(y, z) \implies \text{BornIn}(x, z)\]

to our known facts and inferred a new fact. Using rules for reasoning has various advantages, the most important being their interpretability for humans. So where can a computer learn these sorts of rules? One way is to manually ask domain experts to list as many rules as possible so the AI system can use them. This works at small scales, but it is too expensive and time-consuming for web-scale AI such as smart assistants. Therefore, methods that can automatically learn rules from the available knowledge would be very useful.
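To make the inference concrete, here is a minimal forward-chaining sketch in Python. The triple encoding and the helper name `apply_rule` are our own illustration, not part of any rule mining system:

```python
# A toy fact store: each fact is a (relation, subject, object) triple.
facts = {
    ("BornIn", "al-Khwarizmi", "Khawrazm"),
    ("LocatedIn", "Khawrazm", "Persia"),
}

def apply_rule(facts):
    """Apply BornIn(x, y) and LocatedIn(y, z) => BornIn(x, z) once."""
    inferred = set()
    for (r1, x, y) in facts:
        if r1 != "BornIn":
            continue
        for (r2, y2, z) in facts:
            if r2 == "LocatedIn" and y2 == y:
                inferred.add(("BornIn", x, z))
    return inferred - facts  # only the genuinely new facts

print(apply_rule(facts))
# {('BornIn', 'al-Khwarizmi', 'Persia')}
```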

In this blog, we will give an overview of traditional rule mining methods and describe our latest model, DRUM, a fully differentiable method for mining logical rules from knowledge graphs (KGs).

Knowledge Graphs and Rule Mining

Knowledge graphs store structured information about real-world entities such as people, locations, companies, and governments. Knowledge graph construction has attracted the attention of researchers, foundations, industry, and governments, and we can use these large collections of facts to learn rules. DRUM learns to mine first-order logical Horn clauses from a knowledge graph, in particular closed and connected rules. These restrictions ensure that the mined rules are meaningful and human-understandable; connectedness also prevents finding rules over unrelated relations.

Formally, we aim to find all T \in \mathbb{N}, relations B_1, B_2, \cdots, B_T, H and a confidence value \alpha \in \mathbb{R}, such that:

    \[B_1(x, z_1) \land B_2(z_1, z_2) \land \cdots \land B_T(z_{T-1}, y) \implies H(x, y) \, \, :\alpha\]

Here the z_i are variables that can be substituted with entities. This requires searching a discrete space to find the B_i and searching a continuous space to learn \alpha for each particular rule.
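Note how quickly the discrete part of the search grows: with |R| relations there are |R|^T candidate bodies of length T, before any confidence is learned. A small sketch (the relation names are hypothetical):

```python
from itertools import product

relations = ["BornIn", "LocatedIn", "CapitalOf"]  # hypothetical relations

# Enumerate every candidate body B_1, ..., B_T for each length T.
for T in range(1, 3):
    bodies = list(product(relations, repeat=T))
    print(f"length {T}: {len(bodies)} candidate bodies")
# length 1: 3 candidate bodies
# length 2: 9 candidate bodies
```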

Traditional rule mining methods like AMIE and Ontological Pathfinding (OP) [1, 2] use pre-defined formulas (e.g., Supp(.) and Conf(.)) for computing \alpha.

Intuitively, these metrics measure the correctness (Conf) and usefulness (Supp) of a rule. However, they are somewhat arbitrary and not tunable for the specific task that wants to take advantage of the rules. It would be great if we could instead learn an \alpha optimized for a given task. In DRUM, we show this significantly improves the task that uses the rules (we use link prediction accuracy as an example).
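For intuition, here is a minimal sketch of the textbook support/confidence computation for a length-2 chain rule, reusing the triple encoding from the earlier example. (AMIE's actual PCA confidence refines the denominator, so treat this as an approximation of those metrics.)

```python
def supp_conf(facts, b1, b2, head):
    """Supp and standard Conf for B1(x, z) and B2(z, y) => H(x, y)."""
    # (x, y) pairs for which the rule body is satisfied by some z.
    body_pairs = {
        (x, y)
        for (r1, x, z) in facts if r1 == b1
        for (r2, z2, y) in facts if r2 == b2 and z2 == z
    }
    head_pairs = {(x, y) for (r, x, y) in facts if r == head}
    supp = len(body_pairs & head_pairs)            # correct body matches
    conf = supp / len(body_pairs) if body_pairs else 0.0
    return supp, conf

facts = {
    ("BornIn", "al-Khwarizmi", "Khawrazm"),
    ("LocatedIn", "Khawrazm", "Persia"),
    ("BornIn", "al-Khwarizmi", "Persia"),
}
print(supp_conf(facts, "BornIn", "LocatedIn", "BornIn"))  # (1, 1.0)
```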

Differentiable Rule Mining

How can we tune the confidence of each rule for a specific task? Let's formulate it as a differentiable loss and solve it numerically. The two metrics discussed above are inherently discrete and cannot be used directly in a continuous cost function. NeuralLP and DRUM [3, 4] use the graph's adjacency matrices to formulate a differentiable loss based on rule structures and rule confidences. Let us use a toy KG with only 2 relations, and limit our rule lengths to 2, to show how algebraic tricks turn this discrete problem into a continuous cost function (for a more detailed explanation, please see Section 4.1 of DRUM [4]).
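The key algebraic fact is standard: if A_i is the binary adjacency matrix of relation R_i, then (A_i A_j)[x, y] counts the paths from x to y that take an R_i edge followed by an R_j edge, so a nonzero entry means the body R_i(x, z) \land R_j(z, y) holds for the pair (x, y). A numpy sketch on a made-up 3-entity KG:

```python
import numpy as np

# Toy KG with entities e0, e1, e2; A_i[x, z] = 1 iff R_i(x, z) holds.
A1 = np.array([[0, 1, 0],
               [0, 0, 0],
               [0, 0, 0]])  # the single fact R1(e0, e1)
A2 = np.array([[0, 0, 0],
               [0, 0, 1],
               [0, 0, 0]])  # the single fact R2(e1, e2)

# (A1 @ A2)[x, y] counts groundings of the body R1(x, z) and R2(z, y).
print(A1 @ A2)
# [[0 0 1]   <- the body holds exactly for the pair (e0, e2)
#  [0 0 0]
#  [0 0 0]]
```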

The possible nontrivial rules with a maximum length of 2 and head H = R_1(x, y) in our small KG are:

    \[R_2 \Rightarrow R_1; \; \alpha_1\]

    \[R_1 \wedge R_2 \Rightarrow R_1; \; \alpha_2\]

    \[R_2 \wedge R_1 \Rightarrow R_1; \; \alpha_3\]

Let’s use A_i to denote the adjacency matrix of R_i and let \Omega_H^I(\mathbf{a}) = (a_{1,0} I +a_{1,1} A_1 +a_{1,2} A_2) (a_{2,0} I +a_{2,1} A_1 +a_{2,2} A_2). Each rule and its confidence can then be read off the expanded form of \Omega, e.g., \alpha_3 \Longleftrightarrow a_{1,2} \cdot a_{2,1}. We use this observation to construct a cost function that optimizes for link prediction accuracy (see the paper for details) and find the optimal values for \mathbf{a}.
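Below is a minimal numpy sketch of this expansion for the toy KG above. The coefficient values are arbitrary stand-ins for what training would learn, and the identity matrix serves as a "skip" hop so rules shorter than the maximum length are also covered:

```python
import numpy as np

# Adjacency matrices of the toy KG, plus the identity as a "skip" hop.
A1 = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
A2 = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]])
A = [np.eye(3), A1, A2]

# a[t-1][i] weights matrix A[i] at hop t; these are the learnable parameters.
a = np.array([[0.1, 0.2, 0.7],
              [0.3, 0.6, 0.1]])

# Omega = (a_{1,0} I + a_{1,1} A1 + a_{1,2} A2)
#       @ (a_{2,0} I + a_{2,1} A1 + a_{2,2} A2)
omega = sum(a[0, i] * A[i] for i in range(3)) @ \
        sum(a[1, j] * A[j] for j in range(3))

# Expanding the product gives one term a_{1,i} * a_{2,j} * (A_i A_j) per
# candidate rule body, so each body's confidence is a product of weights;
# e.g. the body R2 ∧ R1 (confidence alpha_3) is weighted by a_{1,2}*a_{2,1}.
alpha_3 = a[0, 2] * a[1, 1]   # 0.7 * 0.6 = 0.42
```

Because omega is differentiable in \mathbf{a}, the rule confidences can be trained end to end by gradient descent on a link prediction loss.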


Experiments

To evaluate DRUM, we use the mined rules for the task of KG completion, on inductive and transductive link prediction tasks, over subsets of two widely used knowledge graphs, WordNet and Freebase (WN18RR and FB15K-237). The table below compares DRUM against embedding-based models (top section of the table) and path-based methods (bottom section).

[Table: link prediction results for DRUM, embedding-based baselines, and path-based baselines on WN18RR and FB15K-237.]
References

[1] Galárraga, Luis Antonio, et al. “AMIE: association rule mining under incomplete evidence in ontological knowledge bases.” Proceedings of the 22nd international conference on World Wide Web. 2013.

[2] Chen, Yang, et al. “Ontological pathfinding.” Proceedings of the 2016 International Conference on Management of Data. 2016.

[3] Yang, Fan, Zhilin Yang, and William W. Cohen. “Differentiable learning of logical rules for knowledge base reasoning.” Advances in Neural Information Processing Systems. 2017.

[4] Sadeghian, Ali, et al. “DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs.” Advances in Neural Information Processing Systems. 2019.
