Data Science Research

Publications

2023

  • Learned Accelerator Framework for Angular-Distance-Based High-Dimensional DBSCAN
    Yifan Wang and Daisy Zhe Wang.
    The 26th International Conference on Extending Database Technology (EDBT), 2023.

2022

  • LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval
    Yifan Wang, Haodi Ma, and Daisy Zhe Wang.
    International Conference on Very Large Data Bases [VLDB] 2023 in Vancouver, Canada.
  • GAP: A Graph-aware Language Model Framework for Knowledge Graph-to-Text Generation
    Anthony Colas, Mehrdad Alvandipour, and Daisy Zhe Wang.
    International Conference on Computational Linguistics [COLING] 2022 in the Republic of Korea.
  • Extensible Database Simulator for Fast Prototyping In-Database Algorithms
    Yifan Wang and Daisy Zhe Wang.
    ACM Conference on Information and Knowledge Management [CIKM] 2022 in Atlanta, Georgia.
  • DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records for Medicine Related Queries
    Jayetri Bardhan, Anthony Colas, Kirk Roberts, and Daisy Zhe Wang.
    Language Resources and Evaluation Conference [LREC] 2022 in Marseille, France.

2021

  • Hotel2vec: Learning Hotel Embeddings from User Click Sessions with Side Information
    Ioannis Partalas, Anne Morvan, Ali Sadeghian, Shervin Minaee, Xinxin Li, Brooke Cowan and Daisy Zhe Wang
    ACM RecSys Workshop on Recommenders in Tourism, 2021.
  • EventNarrative: A large-scale Event-centric Dataset for Knowledge Graph-to-Text Generation
    Anthony Colas, Ali Sadeghian, Yue Wang, Daisy Zhe Wang
    35th Conference on Neural Information Processing Systems (NeurIPS), 2021.
  • Data science competition for cross-site delineation and classification of individual trees from airborne remote sensing data
    Sarah Jane Graves, Sergio Marconi, Dylan Stewart, Ira Harmon, Ben G. Weinstein, Yuzi Kanazawa, Victoria M Scholl, Maxwell B Joseph, Joseph McClinchy, Luke Browne, Megan K Sullivan, Sergio Estrada-Villegas, Eduardo Tusa, Daisy Zhe Wang, Aditya Singh, Stephanie A Bohlman, Alina Zare, Ethan P. White
    bioRxiv, 2021.
  • ChronoR: Rotation Based Temporal Knowledge Graph Embedding
    Ali Sadeghian, Reza Armandpour, Anthony Colas, Daisy Zhe Wang
    AAAI, 2021.

2020

  • TutorialVQA: Question Answering Dataset for Tutorial Videos
    Anthony Colas, Seokhwan Kim, Franck Dernoncourt, Siddhesh Gupte, Daisy Zhe Wang, Doo Soon Kim
    LREC, 2020.
  • GAIA at SM-KBP 2020 – A Dockerized Multi-media Multi-lingual Knowledge Extraction, Clustering, Temporal Tracking and Hypothesis Generation System
    Manling Li, Ying Lin, Tuan Manh Lai, Xiaoman Pan, Haoyang Wen, Sha Li, Zhenhailong Wang, Pengfei Yu, Lifu Huang, Di Lu, Qingyun Wang, Haoran Zhang, Qi Zeng, Chi Han, Zixuan Zhang, Yujia Qin, Xiaodan Hu, Nikolaus Parulian, Daniel Campos, Heng Ji, Brian Chen, Xudong Lin, Alireza Zareian, Amith Ananthram, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Michael Spector, Mitchell DeHaven, Daniel Napierski, Marjorie Freedman, Pedro Szekely, Haidong Zhu, Ram Nevatia, Yang Bai, Yifan Wang, Ali Sadeghian, Haodi Ma, Daisy Zhe Wang
    Proc. Text Analysis Conference (TAC2020).

2019

  • Using FHIR to Construct a Corpus of Clinical Questions Annotated with Logical Forms and Answers
    Sarvesh Soni, Meghana Gudala, Daisy Zhe Wang, Kirk Roberts
    AMIA, 2019
  • GAIA at SM-KBP 2019 – A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System
    Manling Li, Ying Lin, Ananya Subburathinam, Spencer Whitehead, Xiaoman Pan, Di Lu, Qingyun Wang, Tongtao Zhang, Lifu Huang, Heng Ji, Alireza Zareian, Hassan Akbari, Brian Chen, Bo Wu, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Jennifer Chen, Eric Berquist, Kexuan Sun, Xujun Peng, Ryan Gabbard, Marjorie Freedman, Pedro Szekely, TK Satish Kumar, Arka Sadhu, Ram Nevatia, Miguel Rodriguez, Yifan Wang, Yang Bai, Ali Sadeghian, Daisy Zhe Wang
    Proc. Text Analysis Conference (TAC2019).
  • DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs [GitHub]
    Ali Sadeghian, Mohammadreza Armandpour, Patrick Ding, Daisy Zhe Wang
    Neural Information Processing Systems (NeurIPS), 2019.
  • Hotel2vec: Learning Attribute-Aware Hotel Embeddings with Self-Supervision
    Ali Sadeghian, Shervin Minaee, Ioannis Partalas, Xinxin Li, Daisy Zhe Wang, Brooke Cowan
    In Submission
  • Measuring Impact of Climate Change on Tree Species: analysis of JSDM on FIA data
    Hyun Choi, Ali Sadeghian, Sergio Marconi, Ethan White, Daisy Zhe Wang
    NeurIPS 2019 Workshop on Tackling Climate Change with Machine Learning
  • Comparing Clinical Judgement with MySurgeryRisk Algorithm for Preoperative Risk Assessment: A Pilot Study
    Meghan Brennan, Sahil Puri, Tezcan Ozrazgat-Baslanti, Rajendra Bhat, Zheng Feng, Petar Momcilovic, Xiaolin Li, Daisy Zhe Wang, Azra Bihorac
    Surgery. 2019 May;165(5):1035-1045.
  • Mining Rules Incrementally over Large Knowledge Bases
    Xiaofeng Zhou, Ali Sadeghian, Daisy Zhe Wang
    SIAM SDM, 2019

2018

  • GAIA – A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System
    Tongtao Zhang, Ananya Subburathinam, Ge Shi, Lifu Huang, Di Lu, Xiaoman Pan, Manling Li, Boliang Zhang, Qingyun Wang, Spencer Whitehead, Heng Ji, Alireza Zareian, Hassan Akbari, Brian Chen, Ruiqi Zhong, Steven Shao, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Dongyu Li, Xin Huang, Kexuan Sun, Xujun Peng, Ryan Gabbard, Marjorie Freedman, Mayank Kejriwal, Ram Nevatia, Pedro Szekely, T.K. Satish Kumar, Ali Sadeghian, Giacomo Bergami, Sourav Dutta, Miguel Rodriguez, Daisy Zhe Wang (Information Sciences Institute; Columbia University; Rensselaer Polytechnic Institute; University of Florida; University of Southern California)
    Proc. Text Analysis Conference (TAC2018).
  • Ten Years of WebTables
    Michael Cafarella, Alon Halevy, Hongrae Lee, Cong Yu, Daisy Zhe Wang, Eugene Wu
    Proceedings of the VLDB Endowment, Vol. 11, No. 12, 2018.
  • A data science challenge for converting airborne remote sensing data into ecological information
    Sergio Marconi, Sarah J. Graves, Dihong Gong, Morteza Shahriari Nia, Marion Le Bras, Bonnie J. Dorr, Peter Fontana, Justin Gearhart, Craig Greenberg, Dave J. Harris, Sugumar Arvind Kumar, Agarwal Nishant, Joshi Prarabdh, Sundeep U. Rege, Stephanie Ann Bohlman, Ethan P. White, Daisy Zhe Wang
    To Appear, PeerJ Journal, 2018.
  • Temporal Reasoning Over Event Knowledge Graphs
    Ali Sadeghian, Miguel Rodriguez, Daisy Zhe Wang, Anthony Colas
    ACM WSDM KBCOM Workshop, 2018. Best Paper Honorable Mention Award.
  • MySurgeryRisk: Development and Validation of a Machine-Learning Risk Algorithm for Major Complications and Death after Surgery
    Bihorac A, Ozrazgat-Baslanti T, Ebadi A, Motaei A, Madkour M, Pardalos PM, Lipori G, Hogan WR, Efron PA, Moore F, Moldawer LL, Wang DZ, Hobson CE, Rashidi P, Li X, Momcilovic P.
    Ann Surg. 2019 Apr;269(4):652-662.
  • Automatic Semantic Edge Labeling over Legal Citation Graphs
    Ali Sadeghian, Laksshman Sundaram, Daisy Zhe Wang, William F. Hamilton, Karl Branting, Craig Pfeifer
    Artificial Intelligence and Law, Volume 26, Issue 2, Springer, 2018.

2017

  • Archimedes: Efficient Query Processing over Probabilistic Knowledge Bases
    Yang Chen, Xiaofeng Zhou, Kun Li, Daisy Zhe Wang
    ACM SIGMOD Record, 2017.
  • Multimodal Learning for Web Information Extraction
    Dihong Gong, Daisy Zhe Wang, Yang Peng
    ACM Multimedia, 2017.
  • Extracting Visual Knowledge from the Web with Multimodal Learning
    Dihong Gong, Daisy Zhe Wang
    The International Joint Conference on Artificial Intelligence (IJCAI), 2017.
  • A Probabilistically Integrated System for Crowd-Assisted Text Labeling and Extraction
    Sean Goldberg, Daisy Zhe Wang, Christan Grant
    The ACM Journal of Data and Information Quality, 2017.
  • In-Database Batch and Query-time Inference over Probabilistic Graphical Models using UDA-GIST
    Kun Li, Xiaofeng Zhou, Daisy Zhe Wang, Christan Grant, Alin Dobra, Christopher Dudley
    The VLDB Journal, 2017, Vol 26, Issue 2.
  • Managing Probabilistic Entity Extraction
    Daisy Zhe Wang
    Encyclopedia of Database Systems, Springer, 2017

2016

  • ScaLeKB: Scalable Learning and Inference over Large Knowledge Bases
    Yang Chen, Daisy Zhe Wang, Sean Goldberg
    The VLDB Journal, 2016.
  • Book Chapter: Old media, new media, and public engagement with science
    Yulia A. Strekalova, Janice L. Krieger, Rachel E. Damiani, Sriram Kalyanaraman, Daisy Zhe Wang
    Citizen Engagement and Public Participation in the Era of New Media. Hershey, PA: IGI Global.
  • Query-driven Sampling for Collective Entity Resolution
    Christan Grant, Daisy Zhe Wang, Michael Wick
    IEEE 17th International Conference on Information Reuse and Integration, 2016.
  • ArchimedesOne: Query Processing over Probabilistic Knowledge Bases
    Xiaofeng Zhou, Yang Chen, Daisy Zhe Wang
    Proceedings of the VLDB Endowment, 2016
  • SigmaKB: Multiple Uncertain Knowledge Base Fusion
    Miguel E. Rodríguez, Sean Goldberg, Daisy Zhe Wang
    Proceedings of the VLDB Endowment, 2016.
  • Multimodal Ensemble Fusion for Disambiguation and Retrieval
    Yang Peng, Xiaofeng Zhou, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, Chunsheng Victor Fang
    Proceedings of the IEEE Multimedia Magazine, 2016.
  • Scalable Image Retrieval with Multimodal Fusion
    Yang Peng, Xiaofeng Zhou, Daisy Zhe Wang, Chunsheng Victor Fang
    Proceedings of the 29th International FLAIRS conference, 2016.
  • Consensus Maximization Fusion of Probabilistic Information Extractors
    Miguel E. Rodríguez, Sean Goldberg, Daisy Zhe Wang
    Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), 2016.
  • Ontological Pathfinding: Mining First-Order Knowledge from Large Knowledge Bases
    Yang Chen, Sean Goldberg, Daisy Zhe Wang, Soumitra Siddharth Johri
    Proceedings of the ACM SIGMOD International Conference on Management of Data, 2016.

2015

  • Probabilistic Ensemble Fusion for Multimodal Word Sense Disambiguation
    Yang Peng, Daisy Zhe Wang, Ishan Patwa, Dihong Gong, Victor Chunsheng Fang
    IEEE International Symposium on Multimedia, 2015
  • Impact of Atmospheric Correction and Image Filtering on Hyperspectral Classification of Tree Species Using Support Vector Machine
    Morteza Shahriari Nia, Daisy Zhe Wang, Stephanie Ann Bohlman, Paul Gader, Sarah J. Graves, Milenko Petrovic
    Journal of Applied Remote Sensing, 2015
  • Optimizing Sampling-based Entity Resolution over Streaming Documents
    Christan Grant, Daisy Zhe Wang
    Proceedings of SDM Big Data & Streaming Analytics Workshop, 2015
  • A Topic-Based Search, Visualization, and Exploration System
    Christan Grant, Clint P. George, Virupaksha Kanjilal, Supriya Nirkhiwale, Joseph Wilson, Daisy Zhe Wang
    Proceedings of the 28th International FLAIRS Conference, 2015
  • A Challenge for Long-term Knowledge Base Maintenance
    Christan Grant, Daisy Zhe Wang
    ACM Journal of Data and Information Quality, 2015
  • UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics
    Kun Li, Daisy Zhe Wang, Alin Dobra, Christopher Dudley
    Proceedings of the VLDB Endowment, 2015

2014

  • Efficient In-Database Analytics with Graphical Models
    Daisy Zhe Wang, Yang Chen, Christan Grant, Kun Li
    IEEE Data Engineering Bulletin, 2014
  • Knowledge Expansion over Probabilistic Knowledge Bases
    Yang Chen, Daisy Zhe Wang
    Proceedings of the ACM SIGMOD International Conference on Management of Data, 2014
  • Exploring Netflow Data using Hadoop
    Xiaofeng Zhou, Milenko Petrovic, Tom Eskridge, Marco Carvalho, Xi Tao
    Proceedings of the Third ASE International Conference on Cyber Security, 2014
  • Streaming Fact Extraction for Wikipedia Entities at Web-Scale
    Morteza Shahriari Nia, Christan Grant, Yang Peng, Daisy Zhe Wang, Milenko Petrovic
    Proceedings of the 27th International FLAIRS Conference, 2014
  • SemMemDB: In-Database Knowledge Activation
    Yang Chen, Milenko Petrovic, Micah H. Clark
    Proceedings of the 27th International FLAIRS Conference, 2014
  • SMART Electronic Legal Discovery via Topic Modeling
    Clint P. George, Sahil Puri, Daisy Zhe Wang, Joseph Wilson, William Hamilton
    Proceedings of the 27th International FLAIRS Conference, 2014

2013

  • CASTLE: Crowd-Assisted System for Textual Labeling & Extraction
    Sean Goldberg, Daisy Zhe Wang, Tim Kraska
    Proceedings of HCOMP, 2013
  • GPText: Greenplum Parallel Statistical Text Analysis Framework
    Kun Li, Christan Grant, Daisy Zhe Wang, Sunny Khatri, George Chitouras
    Data Analytics in the Cloud Workshop (DanaC) at SIGMOD, 2013
  • Web-Scale Knowledge Inference Using Markov Logic Networks
    Yang Chen, Daisy Zhe Wang
    Proceedings of ICML workshop on Structured Learning: Inferring Graphs from Structured and Unstructured Inputs (SLG), 2013
  • Knowledge Extraction and Outcome Prediction using Medical Notes
    Ryan Cobb, Sahil Puri, Daisy Zhe Wang, Tezcan Baslanti, Azra Bihorac
    Proceedings of ICML workshop on Role of Machine Learning in Transforming Healthcare, 2013

2012

  • A Machine Learning Based Topic Exploration and Categorization on Surveys
    Clint P. George, Daisy Zhe Wang, Joseph N. Wilson, Liana M. Epstein, Philip Garland, Annabell Suh
    Proceedings of the 11th International Conference on Machine Learning and Applications (ICMLA), 2012
  • MADden: Query-Driven Statistical Text Analytics
    Christan Grant, Jordan Gumbs, Kun Li, Daisy Zhe Wang, George Chitouras
    Proceedings of the 21st ACM CIKM International Conference on Information and Knowledge Management, 2012
  • Automatic Knowledge Base Construction using Probabilistic Extraction, Deductive Reasoning, and Human Feedback
    Daisy Zhe Wang, Yang Chen, Sean Goldberg, Christan Grant, and Kun Li
    Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX), 2012
  • The MADlib Analytics Library or MAD Skills, the SQL
    Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleks Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, Arun Kumar
    Proceedings of the VLDB Endowment, 2012
