by Ali Sadeghian
In this blog, we will learn about:
- Knowledge graphs
- Rule mining on KGs
- Differentiable rule mining (DRUM)
- Applications of DRUM
Introduction
AI systems can benefit hugely from incorporating knowledge that is often trivial for humans. For example, knowing that al-Khwarizmi was born in Khawrazm and that Khawrazm was part of Persia, we can infer that al-Khwarizmi was born in Persia. Here, we applied the rule:
BornIn(x, y) Λ LocatedIn(y, z) ⇒ BornIn(x, z)
to our known facts and inferred a new fact. Using rules for reasoning has various advantages, the most important being their interpretability for humans. So where can a computer learn these sorts of rules? One way is to ask domain experts to manually list as many rules as possible for the AI system to use. This works at small scale, but it is too expensive and time-consuming for web-scale AI such as smart assistants. Therefore, methods that can automatically learn rules from the available knowledge would be very useful.
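As a quick illustration (a minimal sketch, not how DRUM represents facts internally), here is how applying that single hand-written rule to a toy fact set produces the new fact. The entity and relation names come from the example above; the helper `apply_born_in_rule` is hypothetical:

```python
# Toy fact set from the example above, stored as (relation, subject, object) triples.
facts = {
    ("BornIn", "al-Khwarizmi", "Khawrazm"),
    ("LocatedIn", "Khawrazm", "Persia"),
}

def apply_born_in_rule(facts):
    """One application of BornIn(x, y) Λ LocatedIn(y, z) ⇒ BornIn(x, z)."""
    inferred = set()
    for rel1, x, y in facts:
        for rel2, y2, z in facts:
            if rel1 == "BornIn" and rel2 == "LocatedIn" and y == y2:
                inferred.add(("BornIn", x, z))
    return inferred - facts  # only the newly derived facts

print(apply_born_in_rule(facts))
# {('BornIn', 'al-Khwarizmi', 'Persia')}
```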
In this blog, we will give an overview of traditional rule mining methods and describe our latest model, DRUM, a fully differentiable method for mining logical rules from knowledge graphs (KGs).
Knowledge Graphs and Rule mining
Knowledge graphs store structured information about real-world entities such as people, locations, companies, and governments. Knowledge graph construction has attracted the attention of researchers, foundations, industry, and governments. We can use these large collections of facts to learn rules. DRUM learns to mine first-order logical Horn clauses from a knowledge graph, in particular closed and connected rules, i.e., rules whose body atoms form a chain of relations connecting the two entities in the head. These assumptions ensure that the mined rules are meaningful and human-understandable; connectedness also prevents finding rules that involve unrelated relations.
Formally, we aim to find all rules, i.e., all sets of relations B1, …, BT and a confidence value α ∈ [0, 1], such that:
B1(x, z1) Λ B2(z1, z2) Λ … Λ BT(zT−1, y) ⇒ H(x, y)
Here x, z1, …, zT−1, y are variables that can be substituted with entities. This requires searching a discrete space to find the Bi's and searching a continuous space to learn α for every particular rule.
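To get a sense of the discrete part of this search, the sketch below (relation names are hypothetical) enumerates every candidate rule body of length at most T over K relations. The number of candidates grows as K^T, and a confidence α would still have to be estimated for each one:

```python
from itertools import product

# Hypothetical relation vocabulary; a real KG like Freebase has thousands of relations.
relations = ["BornIn", "LocatedIn", "CapitalOf"]
max_length = 2  # T

# Every ordered choice of body relations B1, ..., BT is a candidate rule body.
candidates = [
    body
    for T in range(1, max_length + 1)
    for body in product(relations, repeat=T)
]
print(len(candidates))  # 3 + 3**2 = 12 here; grows as K**T in general
```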
Traditional rule mining methods like AMIE and OP [1, 2] use pre-defined formulas (e.g., Supp(·) and Conf(·)) for computing α.
Intuitively, these metrics measure the correctness (Conf) and usefulness (Supp) of a rule. However, they are somewhat arbitrary and not tunable for the specific task that wants to take advantage of the rules. It would be great if we could instead learn confidences optimized for a given task. In DRUM, we show that this significantly improves performance on the task that uses the rules (we use link prediction accuracy as an example).
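For concreteness, here is a rough sketch of support- and confidence-style metrics for the BornIn rule from the introduction, over a hand-made fact set. It only approximates the spirit of AMIE's definitions (whose exact formulas differ in details), and the helper `supp_and_conf` is hypothetical:

```python
facts = {
    ("BornIn", "al-Khwarizmi", "Khawrazm"),
    ("LocatedIn", "Khawrazm", "Persia"),
    ("BornIn", "Rumi", "Balkh"),
    ("LocatedIn", "Balkh", "Persia"),
    ("BornIn", "Rumi", "Persia"),   # the head already holds for one body match
}

def supp_and_conf(facts):
    """Support/confidence of BornIn(x, y) Λ LocatedIn(y, z) ⇒ BornIn(x, z)."""
    # All (x, z) pairs for which the rule body holds.
    body_pairs = {
        (x, z)
        for rel1, x, y in facts if rel1 == "BornIn"
        for rel2, y2, z in facts if rel2 == "LocatedIn" and y == y2
    }
    # Support: body matches where the head is also a known fact.
    support = sum(("BornIn", x, z) in facts for x, z in body_pairs)
    confidence = support / len(body_pairs) if body_pairs else 0.0
    return support, confidence

print(supp_and_conf(facts))  # (1, 0.5): one of the two body matches has the head
```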
Differentiable Rule Mining
How can we tune the confidence of each rule for a specific task? Let's formulate it as a differentiable loss and solve it numerically. The two metrics discussed above are inherently discrete and cannot be used directly in a continuous cost function. NeuralLP and DRUM [3, 4] use the graph's adjacency matrices to formulate a differentiable loss based on the rule structures and the rule confidences. Let us use a toy example KG with only 2 relations, and limit our rule lengths to 2, to show how we can use algebraic tricks to formulate this discrete problem as a continuous cost function (for a more detailed explanation, please see Section 4.1 of DRUM [4]).
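The core algebraic observation is that multiplying adjacency matrices composes relations along paths: if A_R1[x, z] = 1 whenever R1(x, z) is a known fact, then entry [x, y] of A_R1·A_R2 counts the entities z with R1(x, z) and R2(z, y). A minimal NumPy sketch with made-up toy matrices:

```python
import numpy as np

# Toy KG with 3 entities and 2 relations, as 0/1 adjacency matrices.
# A_R[i, j] = 1 means the fact R(entity_i, entity_j) is in the KG.
A_R1 = np.array([[0, 1, 0],
                 [0, 0, 0],
                 [0, 0, 0]])
A_R2 = np.array([[0, 0, 0],
                 [0, 0, 1],
                 [0, 0, 0]])

# Entry [x, y] of the product counts paths x -R1-> z -R2-> y,
# i.e., groundings of the body R1(x, z) Λ R2(z, y).
print(A_R1 @ A_R2)
# [[0 0 1]
#  [0 0 0]
#  [0 0 0]]   -> the rule body holds for (entity_0, entity_2)
```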
The possible nontrivial rules with a maximum length of 2 and head H = R1(x, y) in our small KG are:
- R2(x, y) ⇒ R1(x, y)
- R1(x, z) Λ R1(z, y) ⇒ R1(x, y)
- R1(x, z) Λ R2(z, y) ⇒ R1(x, y)
- R2(x, z) Λ R1(z, y) ⇒ R1(x, y)
- R2(x, z) Λ R2(z, y) ⇒ R1(x, y)
Let's use A_Ri to denote the adjacency matrix of Ri, and let Ω = (a0·I + a1·A_R1 + a2·A_R2)·(b0·I + b1·A_R1 + b2·A_R2), where I is the identity matrix and acts as a "no-op" relation so that rules shorter than the maximum length are also covered. Each rule and its confidence can then be read off from the expanded form of Ω; e.g., the term a1·b2·A_R1·A_R2 corresponds to the rule R1(x, z) Λ R2(z, y) ⇒ R1(x, y) with confidence a1·b2. We use this observation to construct a cost function that optimizes for link prediction accuracy (see the paper for details) and find the optimal values of the coefficients, and hence the rule confidences.
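Continuing the toy example, here is a minimal NumPy sketch of this expansion. The coefficient values are made up for illustration; in DRUM they are produced by a learned model and trained end-to-end:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                        # entities in the toy KG
I = np.eye(n)                # "no-op" relation
A_R1 = (rng.random((n, n)) < 0.3).astype(float)   # toy adjacency matrices
A_R2 = (rng.random((n, n)) < 0.3).astype(float)
mats = [I, A_R1, A_R2]

# Illustrative coefficients; in DRUM these are learned, not hand-picked.
a = np.array([0.05, 0.9, 0.05])   # weights over {I, R1, R2} at body position 1
b = np.array([0.1, 0.1, 0.8])     # weights over {I, R1, R2} at body position 2

W1 = sum(a[k] * mats[k] for k in range(3))   # a0*I + a1*A_R1 + a2*A_R2
W2 = sum(b[k] * mats[k] for k in range(3))   # b0*I + b1*A_R1 + b2*A_R2
omega = W1 @ W2

# Expanding the product gives one term per rule body of length <= 2; the
# coefficient of each term plays the role of that rule's confidence.
expanded = sum(a[j] * b[k] * (mats[j] @ mats[k])
               for j in range(3) for k in range(3))
print(np.allclose(omega, expanded))   # True
print(a[1] * b[2])  # 0.72: confidence of R1(x,z) Λ R2(z,y) ⇒ R1(x,y)
```

Scoring a candidate fact R1(x, y) then roughly amounts to reading off entry [x, y] of Ω, so a link prediction loss built on top of it is differentiable in the coefficients and can be optimized with gradient descent.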
Experiments
To evaluate DRUM, we use the mined rules for KG completion on inductive and transductive link prediction tasks over subsets of two widely used knowledge graphs, WordNet and Freebase (WN18RR and FB15K-237). The table below compares DRUM against embedding-based models (top section of the table) and path-based methods (bottom section).