Daisy Zhe Wang
Director, Data Science Research Lab
Computer and Information Science and Engineering (CISE)
College of Engineering, University of Florida
Gainesville, FL 32611
Office: E456 CSE Building
Phone: (352) 505-7626; Fax: (352) 392-1220
Office Hours: Friday 3:00-4:00pm or by appointment
Daisy Zhe Wang is an Assistant Professor in the CISE department at the University of Florida. She is the Director of the Data Science Research Lab at UF. She obtained her Ph.D. degree from the EECS Department at the University of California, Berkeley in 2011 and her Bachelor's degree from the ECE Department at the University of Toronto in 2005. At Berkeley, she was a member of the Database Group and the AMP/RAD Lab. She is particularly interested in bridging scalable data management and processing systems with probabilistic models and statistical methods. She currently pursues research topics such as probabilistic databases, probabilistic knowledge bases, large-scale inference engines, query-driven interactive machine learning, and crowd-assisted machine learning. She received a Google Faculty Award in 2014. Her research is currently funded by NSF, DARPA, Google, Amazon, Pivotal, Greenplum/EMC, Sandia National Labs and Harris Corporation.
If you are an undergrad/graduate student interested in data science research, please refer to Prospective Students.
- March 2016, the UF DSR Lab is invited to participate in the NIST Data Science pre-pilot evaluation workshop 2016 and will be presenting (1) the results of UF's participation in the 2015 NIST Data Science pre-pilot evaluation and (2) a proposal for a new Data Science evaluation on Computational Ecology using remote sensing and data from the NSF NEON program.
- March 2016, Consensus Maximization Fusion of Probabilistic Information Extractors by Miguel Rodriguez et al. is accepted at NAACL HLT 2016. This CMF algorithm participated in the TAC KBP SFV evaluation organized by NIST in 2015 and achieved top-3 ranked results in CSSF/CSKB and overall ensemble runs.
- Feb 2016, I visited the Computer Science department at the University of Miami and the Information Sciences Institute at the University of Southern California, and gave talks on different aspects of Archimedes. I also visited UC Irvine to discuss research projects.
- Jan 2016, I participated in the OneFlorida Clinical Research Consortium’s Second Annual Stakeholder meeting. UF CTSI is drawing on the NLP expertise of the UF DSR Lab to support the newly funded OneFlorida Clinical Research Consortium, which was recently designated as one of the nation’s 13 clinical data research networks, or CDRNs, by the Patient-Centered Outcomes Research Institute (PCORI) to accelerate the translation of promising research findings into improved patient care.
- Spring 2016, I am advising four student projects in CAP4773/CAP6779 Project in Data Science: (1) contributing to Apache MADlib; (2) legal citation graph analytics and case predictions; (3) automatically extracting biomedical knowledge bases; and (4) a distributed RDF store for query processing over large knowledge bases.
- Nov 2015, Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases by Yang Chen et al. is accepted at SIGMOD 2016.
- In October 2015, MADlib, the open-source library for scalable in-database analytics, became an Apache Software Foundation Incubator project: MADlib@ASF. Students from the DSR Lab and my Data Science courses are excited to continue contributing!
- My research on “Efficient Query Processing over Large Probabilistic Knowledge Bases” is funded by the NSF Division of Information & Intelligent Systems (IIS), starting Sept 1st 2015.
- Fall 2015, I will be teaching Introduction to Data Science and giving guest lectures in informatics courses in other disciplines, such as Foundations of Biomedical Informatics taught by Dr. William Hogan from UF CTSI.
- As a PC group leader for SIGMOD 2016, I will be part of the effort led by Dr. Sam Madden from MIT to use online conference calls to enhance the paper review process.
- Summer 2015, the DSR Lab proudly graduated two Ph.D.’s: Dr. Kun Li and Dr. Christan Grant. One is currently at Google and the other is starting as an assistant professor at the University of Oklahoma with their Data Science and Analytics program.
- In Spring and Summer 2015, I gave different versions of the talk “Archimedes: A Master Probabilistic Knowledge Base System” at Google Research, Berkeley AMP Lab, Sandia Livermore Lab, University of Toronto and Harris Corporation.
- UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics by Kun Li et al. is accepted at VLDB 2015.
- Fall 2014, I am co-teaching Projects in Data Science, the second course in the three-course UF CISE Data Science Curriculum with Dr. Sanjay Ranka.
- Together with Dr. Tyson Condie at UCLA, I serve as the Proceedings Chair for VLDB 2015.
- The paper Knowledge Expansion over Probabilistic Knowledge Bases, with my student Yang Chen, was accepted and presented at SIGMOD 2014. I also gave an invited talk at the WACCK workshop (Workshop on Automatic Creation and Curation of Knowledge Bases) at SIGMOD 2014.
- I gave a talk on Knowledge Base Construction from Big Text, Images and Crowds at a WISE event in June 2014, organized by TRUST at Cornell University with Big Data research as the central theme.
- ProbKB: Large-scale Probabilistic Reasoning over Uncertain Knowledge Bases
- DBlytics/MADlib: Statistical Machine Learning and Text Analytics in MPP DBMS frameworks
- Archer: Query-Driven Machine Learning
- CAMeL: Leverage Crowd Support in Probabilistic Databases
- SigmaKB: Knowledge fusion, cleaning and knowledge base integration
- VITA: Multimodal Knowledge Extraction and Fusion
- SMARTeR: Smarter information retrieval system
- Panda: Knowledge Extraction and Exchange Using Medical Notes
- Past Projects
- Christan Grant (2015) University of Oklahoma
- Kun Li (2015) Google Inc
- Morteza Shahriari Nia (2016) Twitter Inc
- CAP4773/CAP6779, Project In Data Science, Spring 2016
- CAP4770/CAP5771, Introduction to Data Science, Fall 2015
- CIS4301, Information and Data Management Systems, Spring 2015
- CAP4773/CIS6930, Projects in Data Science, Fall 2014
- CIS6930, Introduction to Data Science/Data Intensive Computing, Spring 2014
- COP5725, Data Management Systems, Fall 2013
- CIS6930, Data Science: Large-scale Advanced Data Analysis, Spring 2013
- COP5725, Data Management Systems, Fall 2012
- CIS4301, Information and Data Management Systems, Spring 2012
- CIS6930, Data Science: Large-scale Advanced Data Analysis, Fall 2011
- “Archimedes: A Probabilistic Knowledge Base to Combine Information Extraction from Diverse Sources”
- “UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics”
- VLDB 2015, Waikoloa, Hawaii, September 2015
- “Archimedes: A Master Probabilistic Knowledge Base System”
- University of Miami, Nov 2015
- Harris Corporation, August 2015
- University of Toronto, July 2015
- Berkeley AMP Lab Seminar, April 2015
- Google Research, April 2015
- Sandia Livermore Lab, Jan 2015
- “Probabilistic Knowledge Base Construction from Big Text, Images and Crowds”
- TRUST WISE workshop at Cornell University, June 2014
- UF Big Data Workshop, June 2013
- “Probabilistic Knowledge Base Systems”
- Invited Talk, WACCK workshop at SIGMOD, June 2014
- Shanghai Jiao Tong University, China, April 2014
- ECE Department, University of Florida, October 2013
- Fudan University, China, August 2013
- Google Research, EMC, April 2013
- Rochester Big Data Forum, October 2012
- “Hybrid In-Database Inference for Declarative Information Extraction” sigmod11slides
- SIGMOD Conference, June 15, 2011
- “Selectivity Estimation for Extraction Operators over Text Data” icde11slides
- ICDE Conference, April 14, 2011
- “Querying Probabilistic Information Extraction”
- EMC/Greenplum Seminar, July 11, 2011
- CSAIL Seminar, MIT, November 17, 2010
- Database Seminar, University of Toronto, January 5, 2010
- “Querying Probabilistic Information Extraction” pvldb10slides
- VLDB Conference, September, 2010
- “Probabilistic Declarative Information Extraction” icde10slides
- ICDE Conference, March, 2010
- “Declarative Information Extraction in a Probabilistic Database System”
- Info Lab Seminar, Stanford, May, 2009.
A Parable of Modern Research
Bob has lost his keys in a room which is dark except for one brightly lit corner.
“Why are you looking under the light, you lost them in the dark!”
“I can only see here.”