Data Science Research

A Challenge for Long-Term Knowledge Base Maintenance

Knowledge bases (KBs) are repositories of interconnected facts paired with an inference engine. Companies are increasingly populating KBs with facts from disparate sources to create a central repository of information that gives users a richer and more integrated experience [Herman and Delurey 2013]. Additionally, inference over the constructed KB can produce new facts not explicitly stated in the KB. Google now employs KBs to surface additional information for user searches [Dong et al. 2014a]. Manually constructed KBs, such as YAGO [Hoffart et al. 2013] and DBpedia [Auer et al. 2007], are increasingly being used as the gold standard and ground truth for newer KBs [Dong et al. 2014b]. However, the growing number of KBs inside an organization must meet a sufficiently high level of quality and must be meticulously maintained. Both YAGO and DBpedia were constructed from Wikipedia data. Within Wikipedia, the median lag between the occurrence of a notable event and its addition to Wikipedia was measured at 356 days [Frank et al. 2012]. This fact spurred many efforts, including contests, to discover methods to automatically build, extend, and clean KBs [Frank et al. 2012; Ellis et al. 2012; Ji et al. 2014; Surdeanu and Ji 2014]. In these contests, teams build systems to explore the creation of Web-scale KBs; however, by and large, the contests stop short of designing systems for deployment in production. We believe that two main questions remain understudied across research communities: in KBs, over time, (1) what stale information needs to be cleaned, and (2) when should this information be updated? In this article, we present a challenge to the information quality community to develop techniques that support the long-term maintenance of critical, rapidly growing KBs. We follow this challenge with two notable papers that make strides in this direction. We end this group of papers with a discussion of three research questions in response to this challenge.

Authors: 

Christan Grant, Daisy Zhe Wang

Bibtex:

@article{,
 author = "Christan Grant and Daisy Zhe Wang",
 title = "A Challenge for Long-Term Knowledge Base Maintenance",
 journal = "Proceedings of ACM Journal on Data and Information Quality",
 year = "2015"
}

Download:
[pdf]
