The Data Science Research (DSR) Lab at the University of Florida focuses on large-scale data management, data mining, and data analysis using technologies from database management systems (DBMSs), statistical machine learning (SML), and information visualization. Such research in the Big Data era is called Data Science, which is a profession, a research agenda, as well as a sport! The goal of Data Science research is to build systems and algorithms that extract knowledge, find patterns, and generate insights and predictions from diverse data for various applications and visualization.
Research challenges in Data Science include:
- Terabytes, even petabytes of data are generated each day;
- Almost every discipline is facing big data analysis problems, including medical sciences, life sciences, bioinformatics, law, civil engineering and government;
- Data comes in different forms, such as free text, structured data, audio/video, images;
- Analysis tasks performed over the data are becoming more and more sophisticated;
- High-performance computing platforms are advancing fast (e.g., cloud computing, multi-core machines, GPUs, mobile computing);
- Communication and feedback need to be established among machines, algorithms and people.
The Archimedes project aims to build a probabilistic master knowledge base system by combining novel system components and algorithms that we are designing and building at UF. In the context of the Archimedes project, the UF Data Science Research (DSR) group is exploring a spectrum of research directions, including: query-driven and scalable statistical inference, probabilistic data models, state-parallel and data-parallel data analytics frameworks, multimodal (e.g., text, image) information extraction, and KB schema enrichment. This line of research on supporting large-scale, automatically extracted knowledge bases has high impact in many application domains, from medical informatics to ecology. We have received funding from federal agencies and industry, including NSF, DARPA, EMC/Greenplum, Amazon, Pivotal and Google. Related projects include DeepDive from Stanford, YAGO from the Max Planck Institute, NELL from CMU, as well as Wikidata/Freebase and Google Knowledge Vault.
- DARPA: DEFT: Deep Extraction and Filtering of Text (2013-2017)
- NSF: Eureka: Efficient Query Processing over Large Probabilistic Knowledge Bases (2015-2020)
- DARPA: AIDA: Active Interpretation of Disparate Alternatives (2018-2022)
New! The DSR@UF lab is currently looking for exceptional candidates to fill a PhD student position.
New! The DSR@UF lab is currently looking for exceptional candidates to fill a postdoc position and multiple graduate student positions.
- [Sept 2018] We are participating in the 2018 TAC Streaming Multimedia Knowledge Base Population (SM-KBP) Task 3 hypothesis generation evaluation as part of the DARPA AIDA program. Dr. Wang was featured in the UF College of Engineering news article “Taming the Data Monster to make Better Decisions”.
- [June 2018] Prof. Daisy Zhe Wang and her collaborators have been awarded a 2018 Very Large Databases (VLDB) 10-Year Test-of-Time Award for their paper, “WebTables: exploring the power of tables on the web.” This award is given to the VLDB paper published ten years earlier that has had the most influence since its publication.
- [March 2018] We have been part of the larger PRISMA-P (Precision and Intelligent Systems in Medicine) project, funded by NIH, since its inception in 2013. One of its key publications has been accepted to Annals of Surgery. We continue to expand our research experience in biomedical and translational research through projects such as Rose and PRISMA-P.
- [Jan 2018] We are part of a newly funded NSF IUCRC (Industry and University Cooperative Research Center) program at the University of Florida: the Center for Big Learning, whose goal is to further advance the research, tech transfer and application of deep learning technologies.
- [Dec 2017] In collaboration with USC ISI, Columbia University and RPI, we have been selected to receive a grant to work on the DARPA Active Interpretation of Disparate Alternatives (AIDA) program. The UF team will focus on mining hypotheses from probabilistic knowledge graphs constructed from multimedia, event-driven corpora.
- [Oct 2017] Supported by NIST and co-PIed with Dr. Ethan White from the Weecology Lab, the Data Science Evaluation (DSE) for Plant Identification with NEON Remote Sensing data is well underway. The task and evaluation guideline documents have been released. Please join!
- [Aug 2017] The Apache Software Foundation announces Apache® MADlib™ as a Top-Level Project. MADlib 1.12 was recently released with a neural network implementation (multi-layer perceptron) and a Jupyter notebook demonstrating its application to the MNIST dataset.
- [May 2017] The UF Clinical and Translational Science Institute (CTSI) and the UF Institute for Child Health Policy (ICHP) sponsored the research and development of Rose, an intelligent virtual health navigator that is supported by research from the DSR Lab.
- [Jan 2017] Our journal paper by Sean Goldberg et al., “pi-CASTLE: A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction,” has been published in ACM JDIQ (Journal of Data and Information Quality), 2017.