Daisy Zhe Wang
The September 2014 special issue of the IEEE Data Engineering Bulletin published articles describing efforts from various research groups on the recently emerged theme of Databases, Declarative Systems and Machine Learning. The seven research groups and their articles are:
- University of Washington: Lifted Probabilistic Inference: A Guide for the Database Researcher by Eric Gribkoff, Dan Suciu, and Guy Van den Broeck
- University of Oxford: Probabilistic Data Programming with ENFrame by Dan Olteanu and Sebastiaan J. van Schaik
- Stanford University/University of Wisconsin: Feature Engineering for Knowledge Base Construction by Christopher Ré, Amir Abbas Sadeghian, Zifei Shan, Jaeho Shin, Feiran Wang, Sen Wu, and Ce Zhang
- University of Florida: Efficient In-Database Analytics with Graphical Models by Daisy Zhe Wang, Yang Chen, Christan Grant, and Kun Li
- IBM Research: SystemML’s Optimizer: Plan Generation for Large-Scale Machine Learning Programs by Matthias Boehm, Douglas R. Burdick, Alexandre V. Evfimievski, Berthold Reinwald, Frederick R. Reiss, Prithviraj Sen, Shirish Tatikonda, and Yuanyuan Tian
- Brown University: Tupleware: Distributed Machine Learning on Small Clusters by Andrew Crotty, Alex Galakatos, and Tim Kraska
- Duke University: Cumulon: Cloud-Based Statistical Analysis from Users Perspective by Botong Huang, Nicholas W.D. Jarrett, Shivnath Babu, Sayan Mukherjee, and Jun Yang
The following is a summary of the University of Florida article, which covers a line of research led by Christan Grant, Kun Li, and Yang Chen.
Because applications require in-situ, real-time, query-driven advanced analytics, we need to extend database systems to perform large-scale in-database analytics based on probabilistic graphical models (PGMs). Such an extension requires first-class modeling and implementation of probabilistic graphical models and algorithms to support query-time model-based inference and reasoning. New query optimization and view maintenance techniques are also needed to support queries that combine relational and statistical analytics operations. The rest of the article describes our efforts to address three major challenges in supporting PGMs efficiently in a database system:
- Model Representation and Model-Data Join: First, we describe a relational representation of graphical models and model-data join algorithms that ground a first-order PGM over a large number of data instances, resulting in a large propositional PGM.
- Efficient In-Database Statistical Inference: Second, we describe mechanisms that are required for efficient implementation of inference operations in a database system, including data-driven feature extraction and iterative inference computations over grounded PGMs.
- Optimizing Queries with Inference: Third, we describe new query optimization techniques to support online queries involving PGM-based analytics, which contain relational and inference operators over graphical models.
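To make the first challenge more concrete, here is a minimal sketch of what a model-data join could look like when both the model and the data are stored as tables. It is an illustration only, not the schema or algorithm from the article: the `facts` and `rules` tables, the `ground_rules` function, and the restriction to rules of the form body(x, y) => head(x, y) are all assumptions made for this example. Each first-order rule is one row, and joining it with the matching facts yields one ground (propositional) factor per data instance.

```python
import sqlite3

def ground_rules(facts, rules):
    """Ground weighted rules body(x, y) => head(x, y) over facts with a join.

    facts: iterable of (pred, x, y) tuples (hypothetical schema).
    rules: iterable of (body, head, weight) tuples (hypothetical schema).
    Returns one row (body, head, x, y, weight) per ground factor.
    """
    conn = sqlite3.connect(":memory:")
    # Data instances and the first-order model both live in relations.
    conn.execute("CREATE TABLE facts (pred TEXT, x TEXT, y TEXT)")
    conn.execute("CREATE TABLE rules (body TEXT, head TEXT, weight REAL)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", facts)
    conn.executemany("INSERT INTO rules VALUES (?, ?, ?)", rules)
    # The model-data join: every fact that matches a rule body
    # instantiates that rule, producing one ground factor.
    rows = conn.execute(
        "SELECT r.body, r.head, f.x, f.y, r.weight "
        "FROM rules AS r JOIN facts AS f ON f.pred = r.body "
        "ORDER BY r.head, f.x, f.y"
    ).fetchall()
    conn.close()
    return rows

# Toy data, invented for illustration.
facts = [
    ("born_in", "ann", "paris"),
    ("born_in", "bob", "rome"),
    ("works_in", "ann", "rome"),
]
rules = [
    ("born_in", "lives_in", 0.8),
    ("works_in", "lives_in", 1.5),
]

for body, head, x, y, weight in ground_rules(facts, rules):
    print(f"{weight}: {body}({x}, {y}) => {head}({x}, {y})")
```

With two rules and three facts the join produces three ground factors. On real data the number of ground factors grows with the number of matching instances, which is why grounding results in a large propositional PGM and why the inference and query optimization challenges listed above matter.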
For more details, please refer to http://sites.computer.org/debull/A14sept/issue1.htm.