The Data Science Research (DSR) Lab at the University of Florida focuses on large-scale data management, data mining, and data analysis using technologies from Database Management Systems (DBMS), Statistical Machine Learning (SML), and Information Visualization. Such research in the Big Data era is called Data Science, which is a profession, a research agenda, as well as a sport! The goal of Data Science research is to build systems and algorithms that extract knowledge, find patterns, and generate insights and predictions from diverse data for various applications and visualizations.
Research challenges in Data Science include:
- Terabytes, even petabytes of data are generated each day;
- Almost every discipline is facing big data analysis problems, including medical sciences, life sciences, bioinformatics, law, civil engineering and government;
- Data comes in different forms, such as free text, structured data, audio/video, images;
- Analysis tasks performed over the data are becoming more and more sophisticated;
- High performance computing platforms are advancing fast (e.g., cloud computing, multi-core machines, GPU, mobile-computing);
- Communication and feedback need to be established between machines, algorithms and people.
The Archimedes Project
The Archimedes project aims to build a probabilistic master knowledge base system by combining novel system components and algorithms that we are designing and building at UF. In the context of the Archimedes project, the UF Data Science Research (DSR) group pursues a spectrum of research directions, including query-driven and scalable statistical inference, probabilistic data models, state-parallel and data-parallel data analytics frameworks, multimodal (e.g., text, image) information extraction, and KB schema enrichment. This line of research on supporting large-scale, automatically extracted knowledge bases has high impact in many application domains, from medical informatics to ecology. We have received funding from industry as well as the federal government, including NSF, DARPA, EMC/Greenplum, Amazon, Pivotal, and Google. Other related projects include DeepDive from Stanford, YAGO from the Max Planck Institute, NELL from CMU, as well as WikiData/Freebase and Google Knowledge Vault.
- DARPA: ECOLE:CReLeRI: Concept-centric Representation, Learning, Reasoning, and Interaction (2023-2026)
- DARPA: DEFT: Deep Extraction and Filtering of Text (2013-2017)
- NSF: III: Eureka: Efficient Query Processing over Large Probabilistic Knowledge Bases (2015-2021)
- DARPA: AIDA: Active Interpretation of Disparate Alternatives (2018-2022)
- NSF: MRA: Disentangling cross-scale influences on tree species, traits, and diversity from individual trees to continental scales (2019-2022)
Hiring! The DSR@UF lab is currently looking for exceptional candidates to fill a PhD student position.
Hiring! The DSR@UF lab is currently looking for exceptional candidates to fill a Postdoc position and multiple graduate student positions.
News
- [Oct 2025] Our paper by Michael Perez et al.: CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis System was accepted by ACM MM 2025.
- [Aug 2025] Our paper by Reza Shahriari et al.: MuCHEx: A Multimodal Conversational Debugging Tool for Interactive Visual Exploration of Hierarchical Object Classification was accepted by IEEE Computer Graphics and Applications 2025.
- [May 2025] Our paper by Haodi Ma and Yifan Wang et al.: LaPuda: LLM-Enabled Policy-Based Query Optimizer for Multi-modal was accepted by PAKDD 2025.
- [May 2025] Our paper by Bushi Xiao et al.: From Text to Multi-Modal: Advancing Low-Resource-Language Translation through Synthetic Data Generation and Cross-Modal Alignments was accepted by the Eighth Workshop on Technologies for Machine Translation, 2025.
- [Nov 2024] Our paper by Ira Harmon et al.: A Neuro-Symbolic Framework for Tree Crown Delineation and Tree Species Classification was accepted by MDPI Remote Sensing 2024.
- [Feb 2024] Our paper by Yang Bai et al.: M3: A Multi-Task Mixed-Objective Learning Framework for Open-Domain Multi-Hop Dense Sentence Retrieval was accepted by COLING 2024.
- [Oct 2023] Our paper by Anthony Colas and Haodi Ma et al.: Can Knowledge Graphs Simplify Text? is published in CIKM 2023.
- [Oct 2023] Our paper by Yifan Wang et al.: LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval is published in VLDB 2023.
- [Jul 2023] Our research on Concept-centric Representation, Learning, Reasoning, and Interaction (CReLeRI) was funded ($4.4M) by DARPA (PI: Zhiting Hu (UCSD); Co-PIs: Jaime Ruiz (UF), Daisy Wang (UF), Eric Xing (CMU), Jun-Yan Zhu (CMU)).
- [Jul 2023] Our paper by Yang Bai et al.: MythQA: Query-Based Large-Scale Check-Worthy Claim Detection through Multi-Answer Open-Domain Question Answering is published in SIGIR 2023.
- [May 2023] Our paper by Ira Harmon et al.: Improving Rare Tree Species Classification Using Domain Knowledge is published in IEEE Geoscience and Remote Sensing Letters, vol. 20, 2023.
- [Mar 2023] Our paper by Yifan Wang et al.: Learned Accelerator Framework for Angular-Distance-Based High-Dimensional DBSCAN is published in EDBT 2023.