Data Science Research

Projects

Archimedes: A Probabilistic Master Knowledge Base System

The Archimedes project aims to build a probabilistic master knowledge base system by combining novel system components and algorithms that we are designing and building at UF. Within Archimedes, the UF Data Science Research (DSR) group pursues a spectrum of research directions, including query-driven and scalable statistical inference, probabilistic data models, state-parallel and data-parallel analytics frameworks, multimodal (e.g., text, image) information extraction, and KB schema enrichment. This line of research on supporting large-scale, automatically extracted knowledge bases has high impact in many application domains, from medical informatics to ecology. We have received funding from both industry and the federal government, including DARPA, EMC, Amazon, and Google. Related projects include DeepDive from Stanford, YAGO from the Max Planck Institute, NELL from CMU, Wikidata/Freebase, and Google Knowledge Vault.

ProbKB: Large-scale Probabilistic Reasoning over Uncertain Knowledge Bases
HypoGator: Distinct Hypotheses and Claims Retrieval with Stance Detection on Controversial Topics
DBlytics/MADLib: Textual Retrieval/Analytics in distributed MPP frameworks over hybrid hardware
Archer: Query-Driven Machine Learning
CAMeL: Leverage Crowd Support in Probabilistic Databases
SigmaKB: Knowledge fusion, cleaning and knowledge base integration
Rose: Knowledge Extraction and Exchange from Electronic Health Records
SMARTeR: Smarter information retrieval system
VITA: Multimodal knowledge extraction and fusion

Selected Publications

  • ArchimedesOne: Query Processing over Probabilistic Knowledge Bases
    Xiaofeng Zhou, Yang Chen, Daisy Zhe Wang
    Proceedings of the VLDB Endowment, 2016
  • Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases
    Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri
    Proceedings of the ACM SIGMOD International Conference on Management of Data, 2016
  • UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics
    Kun Li, Daisy Zhe Wang, Alin Dobra, Christopher Dudley
    Proceedings of the VLDB Endowment (41st International Conference on Very Large Data Bases), 2015
  • Efficient In-Database Analytics with Graphical Models
    Daisy Zhe Wang, Yang Chen, Christan Grant, Kun Li
    IEEE Data Engineering Bulletin, 2014
  • Knowledge Expansion over Probabilistic Knowledge Bases
    Yang Chen, Daisy Zhe Wang
    Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014
  • CASTLE: Crowd-Assisted System for Textual Labeling and Extraction
    Sean Goldberg, Daisy Zhe Wang, Tim Kraska
    Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing (HCOMP-13)
  • The MADlib Analytics Library or MAD Skills, the SQL
    Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleks Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, Arun Kumar
    Proceedings of the VLDB Endowment (38th International Conference on Very Large Data Bases), 2012