Data Science Research

Archer: Query-Driven Machine Learning

In the Archer project we develop techniques that adapt analytics to a specific query rather than running a general computation over all of the data. Typical machine learning pipelines effectively perform a SELECT * FROM Table; we instead integrate selection-style queries, SELECT * FROM Table WHERE X, into the analytics.

Knowledge Base Acceleration

Wikipedia is the go-to knowledge base for information on events, people, and scores of other topics. It is collaboratively edited, but the number of editors is far below the number of entities, so it often takes a long time for important information to be added to the knowledge base.

The Knowledge Base Acceleration (KBA) task reads streams of documents and recommends documents to be cited by knowledge base pages. Several issues are involved in this task:

  • Many documents in the stream are not relevant; millions of these documents must be filtered out.
  • Some documents refer to different entities that share the same name. It is important to determine which entity a document is referring to.
  • Some information is not sufficient for citation. An event may have happened but not be notable enough to be included in the knowledge base.

In this work we attempt to filter a stream of documents and suggest pieces of information to be added to a set of Wikipedia entities.
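The first two issues in the list above can be illustrated with a minimal Python sketch. This is not the project's pipeline: the `Entity` and `Document` types, the `context_terms` field, the `min_overlap` threshold, and the word-overlap heuristic are all assumptions made for illustration. Notability, the third issue, is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str                  # surface name, possibly shared by several entities
    context_terms: frozenset   # hypothetical: words that separate this entity from namesakes


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text):
    """Lowercase the text and split it into a set of punctuation-stripped words."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def mentions(doc, entity):
    """Cheap relevance filter: does the entity's name appear in the document?"""
    return entity.name.lower() in doc.text.lower()


def context_overlap(doc, entity):
    """Crude disambiguation signal: count of shared context words."""
    return len(tokenize(doc.text) & entity.context_terms)


def recommend(stream, entities, min_overlap=2):
    """Yield (doc_id, entity name) pairs that pass both filters."""
    for doc in stream:
        for entity in entities:
            if not mentions(doc, entity):
                continue  # irrelevant document: filtered out early
            if context_overlap(doc, entity) < min_overlap:
                continue  # same name, but probably a different entity
            yield doc.doc_id, entity.name
```

Because `recommend` is a generator, it consumes the stream one document at a time, which matters when most of the millions of incoming documents are discarded by the first check.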

Faculty: Daisy Zhe Wang
Students: Christan Grant, Morteza Shahriari, Yang Peng
Collaborators: Milenko Petrovic (IHMC)

Query-Driven Entity Resolution

Entity resolution (ER) is the process of determining which records (mentions) in a database correspond to the same real-world entity. Leading ER systems solve this problem by resolving every record in the database; for large datasets, however, this is an expensive process. Such approaches are also wasteful because, in practice, users are interested in only one entity or a small subset of the entities mentioned in the database. In this work, we introduce new classes of SQL queries involving ER operators: selection-driven ER and join-driven ER. We develop novel variations of the Metropolis-Hastings algorithm and introduce selectivity-based scheduling algorithms to support the two classes of ER queries.
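To make the idea of selection-driven ER concrete, here is a toy Python sketch that resolves only the entity containing a single query mention instead of clustering every record. It is not the project's algorithm: the string-similarity `affinity` function, the pairwise scoring model, the `temperature` parameter, and the function names are all assumptions. The sampler is a plain Metropolis-Hastings chain with a symmetric flip proposal, where the state is the set of mentions currently assigned to the query entity.

```python
import math
import random
from difflib import SequenceMatcher


def affinity(a, b, threshold=0.6):
    """Positive when two mention strings look alike, negative otherwise (assumed model)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() - threshold


def selection_driven_er(mentions, query_idx, steps=2000, temperature=0.05, seed=0):
    """Sample the set of mention indices that co-refer with mentions[query_idx].

    The log-score of a set is the sum of pairwise affinities inside it.
    """
    rng = random.Random(seed)
    in_entity = {query_idx}  # the query mention always stays in its own entity
    candidates = [i for i in range(len(mentions)) if i != query_idx]
    if not candidates:
        return in_entity
    for _ in range(steps):
        i = rng.choice(candidates)  # symmetric proposal: flip one mention's membership
        others = in_entity - {i}
        gain = sum(affinity(mentions[i], mentions[j]) for j in others)
        delta = -gain if i in in_entity else gain  # change in log-score if flipped
        # Metropolis-Hastings acceptance: min(1, exp(delta / temperature))
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            in_entity ^= {i}
    return in_entity
```

A call such as `selection_driven_er(["John Smith", "john smith", "xyzzy"], query_idx=0)` returns one sample from the chain. Each step only scores pairs that involve the query entity, so mentions unrelated to the query never trigger work on the clusters of other entities.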

Faculty: Daisy Zhe Wang
Students: Christan Grant
Collaborators: Michael Wick (UMass)
