Daisy Zhe Wang
Arnold and Lisa Goldberg Rising Star Associate Professor
Director, Data Science Research Lab
Computer and Information Science and Engineering (CISE)
College of Engineering, University of Florida
Gainesville, FL 32611
Office: E456 CSE Building
Phone: (352) 294-6677; Fax: (352) 392-1220
Office Hours: TBA
Daisy Zhe Wang is an Associate Professor in the CISE department at the University of Florida. She is the Director of the Data Science Research Lab at UF. She obtained her Ph.D. degree from the EECS Department at the University of California, Berkeley in 2011 and her Bachelor’s degree from the ECE Department at the University of Toronto in 2005. At Berkeley, she was a member of the Database Group and the AMP/RAD Lab. She is particularly interested in bridging scalable data management and processing systems with probabilistic models and statistical methods. She currently pursues research topics such as probabilistic databases, probabilistic knowledge bases, large-scale inference engines, query-driven interactive machine learning, and crowd-assisted machine learning. She received a Google Faculty Award in 2014. Along with her co-authors, she received the VLDB 10-Year Test-of-Time Award in 2018 for the 2008 VLDB paper on WebTables. She received the Arnold and Lisa Goldberg Rising Star Associate Professorship in 2019. She was inducted as a Senior Member of ACM in 2022. Her research is externally funded by NSF, NIH, DARPA, NIST, Google, Amazon, Pivotal, Greenplum/EMC, Adobe, DTCC, Sandia National Labs and Harris Corporation.
If you are an undergraduate or graduate student interested in data science research, please refer to Prospective Students.
News
- [Aug 2022] Our paper by Anthony Colas et al.: GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation will appear in the proceedings of COLING 2022 in the Republic of Korea.
- [Jul 2022] Our paper by Yifan Wang et al.: Extensible Database Simulator for Fast Prototyping In-Database Algorithms will appear in the proceedings of ACM CIKM 2022 in Atlanta, Georgia.
- [Mar 2022] Our paper by Jayetri Bardhan et al.: DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records for Medicine Related Queries was published in the proceedings of LREC 2022 in Marseille, France.
- [Dec 2021] Our paper by Anthony Colas et al.: EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation was published in NeurIPS 2021.
- [Jan 2021] Our paper by Ali Sadeghian et al.: ChronoR: Rotation Based Temporal Knowledge Graph Embedding was published in AAAI 2021.
- [Dec 2020] Our work in GAIA at SM-KBP 2020 – A Dockerized Multi-media Multi-lingual Knowledge Extraction, Clustering, Temporal Tracking and Hypothesis Generation System in the DARPA AIDA program achieved top performance in the hypotheses generation task.
- [June 2020] The University of Florida Weecology Lab, in collaboration with the DSR Lab, is holding a data science challenge. The goal is to delineate tree crowns and classify tree species from hyperspectral and RGB remote sensing data. The IDTreeS challenge is open to all participants until August 15th.
- [Dec 2019] Our conference paper by Ali Sadeghian et al.: DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs was published at NeurIPS 2019.
- [Nov 2019] Our joint work with Sarvesh Soni et al. and Dr. Roberts at the School of Biomedical Informatics at the University of Texas Health Science Center: Using FHIR to Construct a Corpus of Clinical Questions Annotated with Logical Forms and Answers was published at AMIA 2019.
- [June 2019] A collaboration between faculty at the UF College of Engineering and UF IFAS has won an NSF award ($1.2M), MRA: Disentangling cross-scale influences on tree species, traits, and diversity from individual trees to continental scales, to investigate cross-scale tree identification.
- [Sept 2018] We are participating in the 2018 TAC Streaming Multimedia Knowledge Base Population (SM-KBP) Task 3 hypotheses generation evaluation as part of the DARPA AIDA program. Dr. Wang was featured in the UF College of Engineering news article “Taming the Data Monster to make Better Decisions”.
- [June 2018] Prof. Daisy Zhe Wang and her collaborators have been awarded a 2018 Very Large Databases (VLDB) 10-Year Test-of-Time Award for their paper, “WebTables: exploring the power of tables on the web.” This award is given to the VLDB paper published ten years earlier that has had the most influence since its publication.
- [March 2018] We have been part of the larger PRISMA-P (Precision and Intelligent Systems in Medicine) project, funded by NIH, since its inception in 2013. One of its key publications has been accepted to Annals of Surgery. We continue to expand our research experience in biomedical and translational research through projects such as Rose and PRISMA-P.
- [Jan 2018] We are part of a newly funded NSF IUCRC (Industry and University Cooperative Research Center) program at the University of Florida: Center for Big Learning, whose goal is to push further the research, tech transfer and application of deep learning technologies.
- [Dec 2017] In collaboration with USC ISI, Columbia University and RPI, we have been selected to receive a grant to work on the DARPA Active Interpretation of Disparate Alternatives (AIDA) program. The UF effort ($1.1M) will focus on mining hypotheses from probabilistic knowledge graphs constructed from a multimedia, event-driven corpus.
- [Oct 2017] Supported by NIST and co-PIed with Dr. Ethan White from the Weecology Lab, the Data Science Evaluation (DSE) for Plant Identification with NEON Remote Sensing Data is well underway. The task and evaluation guideline documents have been released. Please join!
- [Aug 2017] The Apache Software Foundation announced Apache® MADlib™ as a Top-Level Project. MADlib 1.12 was recently released with a neural network implementation of the multi-layer perceptron and a Jupyter notebook demonstrating its application to the MNIST dataset.
- [May 2017] The UF Clinical and Translational Science Institute (CTSI) and the UF Institute for Child Health Policy (ICHP) sponsored the research and development of an intelligent virtual health navigator, Rose, which is supported by research from the DSR Lab.
Current Research Projects
- ProbKB: Large-scale Probabilistic Reasoning over Uncertain Knowledge Bases
- HypoGator: Distinct Hypotheses and Claims Retrieval with Stance Detection on Controversial topics
- DBlytics/MADlib: Textual Retrieval/Analytics in distributed MPP frameworks over hybrid hardware
- Archer: Query-Driven Machine Learning
- CAMeL: Leverage Crowd Support in Probabilistic Databases
- SigmaKB: Knowledge fusion, cleaning and knowledge base integration
- VITA: Multimodal Knowledge Extraction and Fusion
- SMARTeR: Smarter information retrieval system
- Rose: Knowledge Extraction and Exchange over Electronic Health Records
Current Ph.D. Students
- Haodi Ma
- Jayetri Bardhan
- Bai (Tony) Yang
- Yifan Wang
- Ira Harmon
- Anthony Colas
- Sean Goldberg (at Microsoft)
Graduated Ph.D. Students
- Christan Grant (2015) University of Oklahoma
- Kun Li (2015) Google Inc
- Morteza Shahriari Nia (2016) Twitter Inc
- Yang Chen (2016) Google Inc
- Yang Peng (2017) Walmart Labs
- Xiaofeng Zhou (2018) Google Inc
- Dihong Gong (2019) Tencent Research
- Miguel Rodriguez (2020) Google Inc
- Ali Sadeghian (2021) Startup
Teaching
- CAP4770, Introduction to Data Science, Spring 2022
- CAP4770, Introduction to Data Science, Spring 2020
- CAP4770/CAP5771, Introduction to Data Science, Fall 2019
- CAP4773/CAP6779, Project In Data Science, Spring 2018
- CAP4770/CAP5771, Introduction to Data Science, Fall 2017
- CAP4773/CAP6779, Project In Data Science, Spring 2017
- CAP4770/CAP5771, Introduction to Data Science, Fall 2016
- CAP4773/CAP6779, Project In Data Science, Spring 2016
- CAP4770/CAP5771, Introduction to Data Science, Fall 2015
- CIS4301, Information and Data Management Systems, Spring 2015
- CAP4773/CIS6930, Projects in Data Science, Fall 2014
- CIS6930, Introduction to Data Science/Data Intensive Computing, Spring 2014
- COP5725, Data Management Systems, Fall 2013
- CIS6930, Data Science: Large-scale Advanced Data Analysis, Spring 2013
- COP5725, Data Management Systems, Fall 2012
- CIS4301, Information and Data Management Systems, Spring 2012
- CIS6930, Data Science: Large-scale Advanced Data Analysis, Fall 2011
Talks
- “HypoGator Hypotheses Generator”
  - DARPA AIDA PI meeting, October 2021
  - DARPA AIDA PI meeting, Feb 2021
  - GAIA Site Visit, December 2020
  - DARPA AIDA PI meeting, June 2020
  - DARPA AIDA PI meeting, November 2019
  - DARPA AIDA PI meeting, June 2019
  - DARPA AIDA PI meeting, Jan 2019
  - DARPA AIDA PI meeting, August 2018
- “HypoGator: TAC SM-KBP and DARPA AIDA Hypotheses Generation TA3 Evaluation”
  - TAC SM-KBP, Feb 2021
  - TAC SM-KBP, November 2019
  - TAC SM-KBP, August 2018
- “Neural-Symbolic models for Knowledge Graph Extraction and Reasoning”
  - “When Deep Learning meets Logic” Workshop, Samsung Research at Cambridge, Feb 2021
- “Rose: Virtual Health Navigator From SCD to SDoH”
  - UF Learning Health Systems and AI Symposium, Jan 2021
- “Inference, Learning and Question Answering over Knowledge Graphs”
  - Amazon Alexa, June 2020
- “Measuring Impact of Climate Change on Tree Species”
  - “Tackling Climate Change with Machine Learning” Workshop, NeurIPS 2019
- “DRUM: End-to-end Differentiable Rule Mining on Knowledge Graphs”
  - Paper poster presentation, NeurIPS 2019
- “QA with Alternative Hypotheses over Probabilistic Knowledge Graph”
  - DARPA AIDA PI meeting, August 2018
- “Weathering the (Technology) Hypes”
  - New Researcher Symposium, SIGMOD, May 2017
- “Archimedes: A Probabilistic Master Knowledge Base System”
  - Florida HLT Cofab, Feb 2017
- “Deep Learning over Large-scale Databases and Knowledge Graphs”
  - NSF IUCRC for Big Learning Planning meeting, Jan 2017
- “Archimedes: A Probabilistic Knowledge Base to Combine Information Extraction from Diverse Sources”
  - University of Southern California/Information Sciences Institute, Feb 2016
A Parable of Modern Research
Bob has lost his keys in a room which is dark except for one brightly lit corner.
“Why are you looking under the light? You lost them in the dark!”
“I can only see here.”