Data Science Research

CAP4773/CAP6779, Projects in Data Science, Spring 2017

Course Description (3 credit hours)

To address the growing need in both industry and academia (e.g., medical informatics and bioinformatics, finance, law enforcement, economics, decision support, social networks) for big data analytics skills, including data management, data mining, machine learning, and data visualization, this course forms part of the three-course series in the Data Science curriculum. The aim is to apply data science and big data analytics tools to develop domain-specific applications. The course covers advanced topics in data science and individual projects in application areas such as vision, natural language processing, computational fluid dynamics, social networks, and bioinformatics.

Prerequisite

Introduction to Data Science (CAP4770/CAP5771) or equivalent.

Course Objectives

Building on the foundations of databases and data mining, this course will prepare students for a variety of individualized projects in interest areas such as bioinformatics, vision and imaging, sensor and social networks, computational neuroscience, natural language processing, medical informatics and scientific data analysis.

Instructor: Prof. Daisy Z. Wang

  • Office location: CSE E456
  • Telephone: (352) 505-7626
  • Email address: daisyw@cise.ufl.edu
  • Office hours: Fridays 3-4pm

Projects

Project members | Project summary | Posters | External advisors
Harish Balaji | ChronoSeek: Information Extraction from Temporal Knowledge Bases

Sequential pattern mining has focused more on instantaneous events than on
time intervals. KBs such as YAGO and Wikidata attach temporal annotations to
their relations by way of reification, and data models such as SPOT have been
proposed for them. These annotations can be exploited to find patterns not
only in a temporal arrangement but also in a combination of topological and
temporal arrangements. This combination has not been explored, and it leads
to fifteen different arrangements that prove to be interesting. The results
of the 2-arrangement phase of the enumeration tree are generated from Wikidata.
Poster | Miguel E. Rodriguez @ UF Data Science Research Lab
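
As a rough illustration of what a temporal arrangement between two interval-annotated facts looks like, the sketch below classifies a pair of intervals using the thirteen standard Allen interval relations. This is an assumption made for illustration only: the project summary describes fifteen arrangements that also combine topological structure, and that scheme is not reproduced here. The `Interval` type, the function name, and the example years are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A closed time interval, e.g. the validity period of a KB fact."""
    start: int
    end: int  # assumed: start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen interval relation of `a` with respect to `b`."""
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the two intervals share interior time.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


# Hypothetical validity periods (years) of two facts about the same entity.
held_office = Interval(2001, 2005)
lived_in_city = Interval(2003, 2009)
print(allen_relation(held_office, lived_in_city))
```

A pattern miner could count how often each relation occurs between pairs of relation types, which is one way to read the "2-arrangement" phase mentioned above.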
Akash Agarwal, Roukna Sengupta | Anomaly Detection over Graphical KB using HPCC

The goal of our project is anomaly detection over time-evolving network graphs using HPCC Systems. Our methods consider the network as it evolves and monitor properties of the network for changes. We use HPCC Systems ®, an open-source, massively parallel computing platform for big data processing and analytics.

In the presentation, we discuss what we learned from HPCC and evaluate its performance for querying and operating on a large dataset. We also discuss Enterprise Control Language (ECL), which is designed specifically for big data processing with HPCC.

We also discuss our evaluation of anomaly detection algorithms over a graphical KB, the Wikipedia revision history, where we try to detect events using a distribution-based methodology and structural changes in the graph over the time series.
Poster | HPCC LexisNexis
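
The idea of monitoring a graph property over time and flagging distributional outliers can be sketched as follows. This is a minimal Python illustration, not the project's ECL code: the choice of edge count as the monitored property, the z-score rule, and the `threshold` and `min_history` parameters are all assumptions.

```python
from statistics import mean, stdev


def edge_count_series(snapshots):
    """Reduce each graph snapshot (a set of edges) to one property: its edge count."""
    return [len(edges) for edges in snapshots]


def flag_anomalies(series, threshold=3.0, min_history=3):
    """Return indices whose value deviates strongly from all earlier values.

    A point is flagged when its z-score against the history so far exceeds
    `threshold` (hypothetical default of 3 standard deviations).
    """
    anomalies = []
    for t in range(min_history, len(series)):
        history = series[:t]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Constant history: any change at all counts as a deviation.
            if series[t] != mu:
                anomalies.append(t)
            continue
        if abs(series[t] - mu) / sigma > threshold:
            anomalies.append(t)
    return anomalies


# Hypothetical edge counts for six consecutive snapshots of a revision graph.
counts = [10, 11, 9, 10, 50, 10]
print(flag_anomalies(counts))  # the burst at index 4 is flagged
```

Other structural properties (degree distribution, number of connected components) could be substituted for the edge count without changing the detection loop.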

Auon Haidar Kazmi, Karthik Maharajan Sankara Subramanian | MADlib Analytics Library Contributions

MADlib is a free, open-source library of in-database analytic methods. It provides an evolving suite of SQL-based algorithms for machine learning, data mining, and statistics that run at scale within a database engine, with no need for data import/export to other tools. In this presentation we introduce the MADlib project, including the background that led to its beginnings and the motivation for using Python and C++ along with Postgres. We give an overview of the library's architecture and design patterns, and describe various statistical methods in that context. We also explain our key contributions to the MADlib project, including the perceptron and KNN algorithms.
Poster | MADlib Apache Project
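
For readers unfamiliar with KNN, the algorithm itself can be sketched in a few lines of plain Python. This is only an illustration of k-nearest-neighbor classification by majority vote; it is not MADlib's in-database implementation or API, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy two-class data set (assumed for illustration).
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
print(knn_predict(train, (0.5, 0.5)))
print(knn_predict(train, (5.5, 5.5)))
```

An in-database version would express the distance computation and the top-k selection as SQL over a table of points, which is what lets the method run without exporting data.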
Samskruthi Padigepati, Abhinav Shankar | Link Prediction on EHR Data using Medical Knowledge Base

Electronic health records (EHRs) store the medical and demographic information of patients in a digital format and can be used to advance clinical research. While EHR data can be used for predicting patient-centered outcomes, challenges arise when information is missing. In this project, we predict the missing links in the EHR data by integrating it with a biomedical knowledge base.
Poster | CTSI and UF Data Science Research Lab
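
The summary does not say which link prediction method the project uses. As one simple baseline, the sketch below merges EHR edges with knowledge base edges into a single graph and scores unlinked node pairs by the Jaccard similarity of their neighbor sets. The technique, the function names, and the toy entities are assumptions for illustration, not the project's approach or data.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard_scores(adj):
    """Score each unlinked pair that shares at least one neighbor."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = adj[u] & adj[v]
        if shared:
            scores[(u, v)] = len(shared) / len(adj[u] | adj[v])
    return scores


# Hypothetical toy data: one EHR fact and one knowledge base fact.
ehr_edges = [("patient_1", "metformin")]
kb_edges = [("metformin", "type_2_diabetes")]

graph = build_adjacency(ehr_edges + kb_edges)
print(jaccard_scores(graph))
```

Here the knowledge base edge is what makes the candidate link between the patient and the diagnosis visible; without it the two nodes would share no neighbors.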
Arvind Kumar Sugumar, Nishant Agarwal | NEON NIST DSE – Tree Crown Delineation

Automatic tree crown delineation has a great impact on tracking and preserving biodiversity in our world. To serve as the pre-pilot for the full DSE track, which comprises delineation, alignment, and classification, we propose using the watershed class of algorithms to implement a baseline model for the delineation task.

This talk picks up where we left off earlier and focuses on two approaches to improving the naive watershed segmentation: the Laplacian of Gaussian (LoG) method followed by morphological enhancement, and the region growing algorithm. We will go through the techniques we use for crown delineation and demo the current progress. We will also demo the participant evaluation system and generate a sample report.
Poster | UF WeEcologyLab and UF Data Science Research Lab
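
Of the techniques named in the summary, region growing is the easiest to show compactly. The sketch below grows a region outward from a seed cell on a small grid, accepting 4-connected neighbors whose value is within a tolerance of the seed value. It is a generic illustration under stated assumptions: the input is taken to be a hypothetical canopy height grid, and the seed choice and tolerance are made up; the project's watershed, LoG, and morphological steps are not reproduced.

```python
from collections import deque


def region_grow(height, seed, tolerance):
    """Return the set of (row, col) cells connected to `seed` whose height
    differs from the seed height by at most `tolerance`."""
    rows, cols = len(height), len(height[0])
    seed_value = height[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # Visit the four edge-adjacent neighbors.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(height[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical canopy height grid (metres): one crown surrounded by ground.
canopy = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 8, 7, 0],
    [0, 0, 0, 0],
]
crown = region_grow(canopy, seed=(1, 1), tolerance=2)
print(sorted(crown))
```

In practice the seeds would come from an earlier step, such as local maxima in the height data, and each grown region would be reported as one delineated crown.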
Caleb Bryant | The Rose Dialogue System

Personal digital assistants, such as Siri and Alexa, are the best-known examples of dialogue systems. In recent years, high-accuracy speech recognition and natural language processing tools have made building custom dialogue systems ever more feasible. In this talk, we take an end-of-semester look at the dialogue system for Rose, a virtual health navigator whose goal is to help patients understand their medical situations.
CTSI and UF Data Science Research Lab
