The Data Science Research (DSR) Lab at the University of Florida focuses on large-scale data management, data mining and data analysis using technologies from database management systems (DBMSs), statistical machine learning (SML), and information visualization. Such research in the Big Data era is called Data Science, which is a profession, a research agenda, as well as a sport! The goal of Data Science research is to build systems and algorithms that extract knowledge, find patterns, and generate insights and predictions from diverse data for various applications and visualization.
Research challenges in Data Science include:
- Terabytes, even petabytes of data are generated each day;
- Almost every discipline is facing big data analysis problems, including medical sciences, life sciences, bioinformatics, law, civil engineering and government;
- Data comes in different forms, such as free text, structured data, audio/video, and images;
- Analysis tasks performed over the data are becoming more and more sophisticated;
- High-performance computing platforms are advancing fast (e.g., cloud computing, multi-core machines, GPUs, mobile computing);
- Communication and feedback need to be established among machines, algorithms and people.
The Archimedes project aims to build a probabilistic master knowledge base system by combining novel system components and algorithms that we are designing and building at UF. In the context of the Archimedes project, the UF Data Science Research (DSR) group is exploring a spectrum of research directions, including query-driven and scalable statistical inference, probabilistic data models, a state-parallel and data-parallel analytics framework, multimodal (e.g., text, image) information extraction, and KB schema enrichment. This line of research on supporting large-scale, automatically extracted knowledge bases has high impact in many application domains, from medical informatics to ecology. We have received funding from the federal government and industry, including NSF, DARPA, EMC/Greenplum, Amazon, Pivotal and Google. Related projects include DeepDive from Stanford, YAGO from the Max Planck Institute, NELL from CMU, as well as WikiData/Freebase and Google Knowledge Vault.
- [March 2016] The UF DSR Lab has been invited to participate in the NIST Data Science pre-pilot evaluation workshop 2016 and will present (1) the results of UF's participation in the 2015 NIST Data Science pre-pilot evaluation and (2) a proposal for a new Data Science evaluation on Computational Ecology using remote sensing and data from the NSF NEON program.
- [March 2016] Consensus Maximization Fusion of Probabilistic Information Extractors by Miguel Rodriguez et al. has been accepted at NAACL HLT 2016. The CMF algorithm participated in the TAC KBP SFV evaluation organized by NIST in 2015 and ranked in the top 3 in the CSSF/CSKB and overall ensemble runs.
- [Feb 2016] Prof. Daisy Zhe Wang visited the Computer Science department at the University of Miami and the Information Sciences Institute at the University of Southern California, and gave talks on different aspects of Archimedes. She also visited UC Irvine to discuss research projects.
- [Jan 2016] UF CTSI is drawing on the NLP expertise of the UF DSR lab to support the newly funded OneFlorida Clinical Research Consortium, which was recently designated as one of the nation’s 13 clinical data research networks, or CDRNs, by the Patient-Centered Outcomes Research Institute (PCORI) to accelerate the translation of promising research findings into improved patient care.
- [Spring 2016] Prof. Daisy Zhe Wang is advising four student projects in CAP4773/CAP6779 Project in Data Science: (1) contributing to Apache MADlib; (2) legal citation graph analytics and case predictions; (3) automatically extracting biomedical knowledge bases; and (4) a distributed RDF store for query processing over large knowledge bases.
- [Nov 2015] Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases by Yang Chen et al. has been accepted at SIGMOD 2016.
- [October, 2015] MADlib, the open-source library for scalable in-database analytics, is now an Apache Software Foundation Incubator project: MADlib@ASF. We are excited to continue our contributions!
- [September, 2015] Our research on “Efficient Query Processing over Large Probabilistic Knowledge Bases” has been funded for 3 years by the NSF Division of Information & Intelligent Systems (IIS).
- [August, 2015] As part of the University of Florida Engineering team, Prof. Daisy Zhe Wang visited the Harris Corporation and presented and discussed past and future research and development projects at the Harris Technology Center.
- [August, 2015] Congratulations to Dr. Christan Grant on successfully defending his Ph.D. thesis on “Query-Driven Text Analytics for Knowledge Extraction, Resolution, and Inference”. Best of luck starting the Assistant Professorship at the University of Oklahoma in Data Science!
- [May, 2015] Congratulations to Dr. Kun Li on successfully defending his Ph.D. thesis on “In-Database Large-Scale Statistical Data Analysis”. Best of luck heading over to Google!
- [April, 2015] Prof. Daisy Zhe Wang visited the Bay Area and gave talks at the UC Berkeley AMP Lab Seminar and Google Research on “Archimedes: A Probabilistic Master Knowledge Base System”.
- [Feb, 2015] Harris Corporation provided seed funding to the UF DSR group to conduct a Research Excellence Endowment Project, in which Archimedes is the targeted smart big data engine to be implemented over the Gator SmartCloud. This is a collaboration with Prof. Xiaolin (Andy) Li from ECE.
- [Jan, 2015] Sean Goldberg has been selected as a new Sandia Campus Exec Fellow at UF from 2015 to 2017. Data Science is one of the recently identified Sandia Research Challenges. Congratulations to Sean!
- [Nov, 2014] Our paper UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics is accepted at VLDB 2015 (to appear).
- [Sept, 2014] We published an IEEE Bulletin journal paper describing the big picture of our ongoing efforts to extend databases to support Efficient In-Database Analytics with Graphical Models.
- [June, 2014] Our paper Knowledge Expansion over Probabilistic Knowledge Bases was presented at the ACM SIGMOD International Conference on Management of Data, 2014.
- [Apr, 2014] Our paper Exploring Netflow Data using Hadoop has been accepted by the Third ASE International Conference on Cyber Security, 2014.
- [Jan, 2014] The Three-Course Data Science Curriculum @ UF CISE starts in Spring 2014 with the first course in the series: Introduction to Data Science. For more information, please refer to the Dec 2013 blog post on this.