• Home
  • Blog
  • People
  • Projects
  • Publications
  • Seminars
  • DSR Expo
  • Courses

Data Science Research

Menu
  • Home
  • Blog
  • People
  • Projects
  • Publications
  • Seminars
  • DSR Expo
  • Courses
Home › publications › research directions › Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases

Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases

June 26, 2017     Comment Closed     publications, research directions

We present the ARCHIMEDES system for efficient query processing over probabilistic knowledge bases. We design ARCHIMEDES for knowledge bases containing incomplete and uncertain information due to limitations of information sources and human knowledge. Answering queries over these knowledge bases requires efficient probabilistic inference. In this paper, we describe ARCHIMEDES’s efficient knowledge expansion and query-driven inference over UDA-GIST, an in-database unified data- and graph-parallel computation framework. With an efficient inference engine, ARCHIMEDES produces reasonable results for queries over large uncertain knowledge bases. We use the Reverb-Sherlock and Wikilinks knowledge bases to show ARCHIMEDES achieves satisfactory quality with real-time performance.

Probabilistic Knowledge Bases

A probabilistic knowledge base extends first-order knowledge bases to support uncertain facts and rules download maps for sygic. The primary goal of modeling uncertainty is to represent knowledge mined by probabilistic information extraction algorithms that contain uncertain facts and rules, as illustrated by the Reverb-Sherlock KB example in the table below:

We view a probabilistic knowledge base as a template for constructing ground factor graphs, which encodes a joint probability distribution over the facts (aka. factors, or relationships.) with the rules connecting them. The factor graph for the above table can be shown below in (b), with each fact explained in (c). Alternatively, it can also be represented with entities as nodes, and relations as edges in (a) as knowledge graph.

In a factor graph Φ = {φ1,…,φN}, the factors to- gether determine a joint probability distribution over the random vector X consisting of all the random variables in the factor graph:

where: X is the vector of factors as random variables, the ni(x) is the number of true groundings of rule Fi in x, Wi is its weight, and Z is the partition function,i.e., normalization constant freeplane herunterladen.

ARCHIMEDES answers user queries by computing the marginal probability P (X = x), the probability distribution of a query node X defined above. The computation of marginal probabilities is called marginal inference in probabilistic graphical models literature. Exact inference is tractable for only limited families of graphical models, and state-of-the-art MLN inference engines use sampling algorithms including Markov chain Monte Carlo (MCMC) and MC- SAT. Observing the ground factor graphs of real knowledge bases are large while user queries often focus on small parts of the knowledge graph, ARCHIMEDES employs a query-driven approach to focus MCMC on the query nodes to avoid computation over the entire factor graph herunterladen.

System Architecture

To efficiently process queries, we design three key components of ARCHIMEDES: an inference engine for efficient knowledge expansion to derive implicit knowledge from existing KBs, query-driven inference to compute probabilities of the query facts, and the UDA-GIST framework for in-database data-parallel and graph-parallel analytics. We provide a user interface for load, search, and update queries. The system architecture is shown in figure below:

Knowledge Expansion

To efficiently apply the inference rules, we represent a knowledge base as relational tables, first introduced by ProbKB.  Based on the relational model, we express the knowledge expansion algorithm as join queries between the facts and rules tables, one join for each rules table fe-font download for free. Our experiments show that applying rules in batches results in a 200-300 times of speedup over the state-of-the-art approaches.

Query-Driven Inference

ARCHIMEDES uses query-driven inference to speed up MLN inference algorithms by focusing computation on the query facts. The query-driven inference algorithm is designed with the UDA-GIST analytics framework to achieve efficient inference in a relational database system. Furthermore, we use K-hop approximation to focus computation on the query facts.

(a) The original factor graph background image for free. (b) 2-hop network.

We use the MCMC algorithms to compute the probabilities defined by the K-hop network. We optimize MCMC on the UDA-GIST in-database analytics framework: we build the factor graph by relational operations with User Defined Aggregates (UDAs) and compute probabilities of query facts by MCMC with General Iterative State Transition (GIST). The combined UDA-GIST framework extends relational database sys- tems to support algorithms requiring both data- and graph- parallel computation, including MCMC and MC-SAT kostenlos antivirus herunterladen. We describe the design of UDA-GIST in below:

UDA-GIST

Most major DBMSes support User-Defined Aggregates (UDAs) for parallel data analytics. UDAs are suitable for data-parallel analytics where data are naively partitioned and computation is performed on the partitions in parallel. However, UDAs do not support efficient statistical inference algorithms that perform iterative transitions over a large state, where the state is a graph-like data structure. The computation is not naively partitioned due to data dependency within the state as DBMSes are fundamentally data driven and computation is tied to the processing of tuples sims 4 kostenlos herunterladen origin.

To answer the demand of supporting in-database graph-parallel analytics, we propose the GIST abstraction to generalize the GraphLab API. We integrate the GIST operator into DBMSes with UDAs and User-Defined Functions. UDA-GIST unifies data-parallel (e.g., graph materialization) and graph- parallel computation (e.g., inference) into an integrated in-database analytics framework.

Experiment Result

Knowledge Expansion

To evaluate knowledge expansion, we use Tuffy as the baseline mods in minecraft downloaden. In figures below, (a) (b) compare performance of ARCHIMEDES with Tuffy on the synthetic knowledge base with varying numbers of facts and rules.

We see that ARCHIMEDES achieves more than 200 times of speedup over Tuffy for 107 facts. The speedup benefits from the batch application of rules with join operations supported by the relational knowledge base model. The precision of the inferred facts is shown in (c). We use semantic constraints and rule cleaning to improve precision of the inferred facts. Combining these two methods, we are able to infer 22,654 new facts at precision 0.65 using top 50% rules, and 16,394 new facts at precision 0.75 using top 20% rules software sicher downloaden.

Knowledge expansion results. (a)(b) Performance comparison with Tuffy. (c) Quality improvement on Reverb-Sherlock.

Query-Driven Inference

Below figures (a)-(c) report the runtime results for query-driven inference by K-hop approximation with different num- bers of hops from large, medium, and small clusters. We can see that query-driven inference achieves a speedup of more than one order of magnitude compared to using the entire factor graph for computation with acceptable error rates in the computed probabilities herunterladen.

(a)-(c) Performance of query-driven inference. (d)(e) Performance improvement of UDA-GIST com- pared to GraphLab.

We evaluate the performance and scalability of UDA- GIST by cross-document coreference using the Wikilinks datasets shown in (d) (d). Using our solution, within a manageable 10-hour computation in a single system the coreference analysis can be performed on the entire Wikilink dataset, 27 times larger than achieved by the current state-of-the-art.

For complete paper with references, please see paper

publications research directions

 Previous Post

Extracting Visual Knowledge from the Web with Multimodal Learning

― May 26, 2017

Next Post 

Multimodal Learning for Web Information Extraction

― November 8, 2017

Related Articles

A Brief Overview of Weak Supervision
DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs
IDTrees Data Science Challenge: 2017
Efficient Conditional Rule Mining over Knowledge Bases
Taming The Data Monster To Make Better Decisions

Sponsors

NIST

Adobe_Logo

DTCC

pcori uf-clinical

ICHP

Recent Posts

Recent Posts

  • A Brief Overview of Weak Supervision
  • DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs
  • IDTrees Data Science Challenge: 2017
  • Efficient Conditional Rule Mining over Knowledge Bases
  • Taming The Data Monster To Make Better Decisions

Related Blogs

  • ampLab
  • Data Beta
  • Fast ML

Post Categories

Categories

  • courses
  • ecology
  • NIST and open eval
  • publications
  • research
  • research directions
  • survey
  • Uncategorized

Archives

Archives

  • October 2020
  • December 2019
  • April 2019
  • December 2018
  • August 2018
  • February 2018
  • November 2017
  • June 2017
  • May 2017
  • March 2017
  • December 2016
  • October 2016
  • April 2016
  • March 2016
  • December 2015
  • November 2015
  • October 2015
  • May 2015
  • November 2014
  • October 2014
  • July 2014
  • May 2014
  • March 2014
  • December 2013
  • November 2013
  • October 2013
  • September 2013

Meta

DSR Wiki
Site Admin
WordPress.org

Recent Posts

  • A Brief Overview of Weak Supervision
  • DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs
  • IDTrees Data Science Challenge: 2017
  • Efficient Conditional Rule Mining over Knowledge Bases
  • Taming The Data Monster To Make Better Decisions
  • Whsmith Lodgers Agreement
  • What To Ask For In A Prenuptial Agreement
  • What Is Department Of State Corporation Bureau Or Business Partnership Agreement
  • What Is A General Security Agreement Nz
  • What Agreement Led To The Establishment Of The Euro A Common European Currency Quizlet
  • Vmware Service Provider License Agreement
  • Validity Of Debt Agreement In India
  • University Of Manitoba Unifor Collective Agreement
  • U.s.-China Trade Agreement 1999
  • Training Agreement Plan Definition
  • Thoroughbred Lease Agreement
  • The Canada-Us-Mexico Agreement Enters Into Force July 1
  • Td Ameritrade Brokerage Agreement
  • Subscription Service Agreements
  • Subject And Verbs Agreement
  • Standard Non Disclosure Agreement Australia
  • Source Code Development Agreement
  • Simple One Page Room Rental Agreement Pdf
  • Shareholders Agreements Sweet Maxwell
  • Service Purchase Agreement Meaning