
NIST 2015 Pre-Pilot Data Science Evaluation

March 21, 2016     courses, NIST and open eval, research directions

Dihong Gong and Daisy Zhe Wang

We participated in the 2015 Pre-Pilot Data Science Evaluation organized by the National Institute of Standards and Technology (NIST). The primary goal of the pre-pilot evaluation was to develop and exercise the evaluation process in the context of data science. The evaluation consists of four tasks: data cleaning, data alignment, forecasting, and prediction. Our DSR lab participated in the data cleaning and traffic event prediction tasks, and submitted several running systems with different algorithms and configurations. Most of the submissions are based on final student project results from Dr. Daisy Zhe Wang’s Fall 2015 Introduction to Data Science class. The course introduced basic data science techniques, including programming in Python, exploratory and statistical analysis, and Map-Reduce for small and big data manipulation and analytics. In our class, 7 groups of 3-6 students participated in the pre-pilot, where 4 groups were mainly undergraduates and 3 groups were mainly graduate students (master’s and PhD).
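To give a flavor of the Map-Reduce pattern covered in the course, here is a minimal pure-Python sketch; the record format (detector id, velocity) and the per-detector mean aggregation are hypothetical, chosen only to illustrate the map, shuffle, and reduce steps.

from collections import defaultdict
from functools import reduce

# Hypothetical input records: "detector_id,velocity" strings.
lines = ["d1,55.0", "d2,61.5", "d1,58.2", "d2,59.9"]

# Map: emit (key, (sum, count)) partial aggregates per record.
mapped = [(det, (float(v), 1)) for det, v in (ln.split(",") for ln in lines)]

# Shuffle: group partial aggregates by detector id.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: combine partial (sum, count) pairs into a mean per detector.
means = {}
for key, vals in groups.items():
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), vals)
    means[key] = total / count

print(means)  # e.g. {'d1': 56.6, 'd2': 60.7}, up to float rounding

The same three-step structure carries over to Hadoop or Spark jobs once the data no longer fits on a single machine.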

 

Figure 1: NIST 2015 Data Science Pre-Pilot Evaluation Tasks in Traffic Domain

Since the 1980s, NIST has been conducting evaluations of data-centric technologies, including automatic speech transcription, information retrieval, machine translation, speaker and language recognition, image recognition, fingerprint matching, event detection from text, video, and multimedia, and automatic knowledge base construction, among many others. These evaluations have enabled rigorous research by sharing the following fundamental elements: (1) the use of common tasks, datasets, and metrics; (2) the presence of specific research challenges meant to drive the technology forward; (3) an infrastructure for developing effective measurement techniques and measuring the state of the art; and (4) a venue for encouraging innovative algorithmic approaches. The tasks included in the pre-pilot are illustrated in Figure 1 and consist of: 1) Cleaning: finding and eliminating errors in dirty data. 2) Alignment: relating different representations of the same object in different data sources. 3) Prediction: determining possible values for an unknown variable based on known variables. 4) Forecasting: determining future values for a variable based on past values.

In the data cleaning task, we are given a set of detector measurements (see Figure 2 for the detector distribution) of vehicle velocity, lane occupancy, and flow. Among the given measurements, some of the values are incorrect (e.g. noise added by an artificial program), and our task is to correct those incorrect values. The major challenges of this task come from three aspects. First, the data is big (150GB of text data with around 1.46 billion measurement entries), which makes system development, debugging, and fine-tuning difficult. Second, the model used to inject noise into the data is unknown, which makes it difficult to reliably detect erroneous values. Finally, once a measurement is considered incorrect, replacing that value with a correct one is a challenge as well. For this task, we submitted three runs, but none of them performed better than the baseline systems. The evaluation metric was the mean of the absolute errors (MAE); our best performing system had an MAE of 0.40, while the baseline systems had MAEs around 0.28. According to the report from NIST, most of our errors come from false alarms (cases where the flow value is correct but we flag it as incorrect). A follow-up discussion with NIST at a workshop confirmed that false alarms can easily occur because of the noising model they used. In addition, while some of the erroneous flow values can be detected easily (such as extremely high or negative values), a good portion of them cannot. In the future, we should improve our system to reliably detect such incorrect flow values and reduce the number of false alarms.
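The post does not spell out our cleaning algorithms, so the following is only a hedged sketch of one simple approach consistent with the discussion above: flag flow values that are physically impossible or extreme statistical outliers, then repair them with a local rolling median. The pandas column name, window size, and z-score threshold are all hypothetical.

import pandas as pd

def clean_flow(df: pd.DataFrame, z_thresh: float = 4.0) -> pd.DataFrame:
    """Flag and repair suspicious flow readings for one detector.

    Hypothetical schema: df has a 'flow' column indexed by time.
    """
    flow = df["flow"].astype(float)
    z = (flow - flow.mean()) / flow.std()

    # Detection: physically impossible or statistically extreme values.
    bad = (flow < 0) | (z.abs() > z_thresh)

    # Correction: replace flagged values with a rolling median computed
    # from the surviving (unflagged) neighboring measurements.
    masked = flow.mask(bad)
    local_median = masked.rolling(window=5, center=True, min_periods=1).median()

    out = df.copy()
    out["flow"] = masked.fillna(local_median)
    return out

def mae(pred: pd.Series, truth: pd.Series) -> float:
    # The task's metric: mean of the absolute errors.
    return (pred - truth).abs().mean()

A rule like this catches the easy cases (negative or extremely high flows) but misses subtler injected noise, and a loose threshold produces exactly the false alarms that dominated our errors.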

 


Figure 2. Detector Distribution


Figure 3. Number of Event Occurrences by Year

For the prediction task, we developed systems that predict the number and types of traffic events for a given (geographical bounding box, time interval) pair. In this task, we have event data from the past 12 years (see Figure 3 for the number of event occurrences by year), with event locations surrounding the Baltimore-DC area. At the testing stage, a system needs to predict the number of events that will occur within a given geographical bounding box and time interval (1 month). We submitted 7 runs in total. While all the runs are based on regression models, some of them make use of extra data such as weather and OpenStreetMap in addition to time and event counts from past years. We also tested different regression models, including linear regression, second-order polynomial regression, and support vector regression. The evaluation metric used in this task was Root Mean Squared Error (RMSE), and our submitted systems had RMSE values ranging from 5.17 to 33.44. Based on the report, we found that the best performing systems share the following features: 1) cleaning the noisy data (e.g. “zero counts”) before feeding it into regression models; 2) using higher-order regression models rather than plain linear regression; and 3) training one model per year instead of per month, to alleviate the curse of dimensionality caused by sparse training data.
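As a hedged sketch of the model comparison described above, the snippet below fits linear, second-order polynomial, and support vector regression models and scores them with RMSE. scikit-learn and the synthetic feature matrix are assumptions for illustration; in the real task the rows would be (bounding box, time interval) pairs with features such as past-year event counts and weather covariates.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR

# Synthetic stand-in for the real feature matrix (hypothetical).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 4))
y = 3 + 0.5 * X[:, 0] + 0.1 * X[:, 1] ** 2 + rng.normal(0, 1, 200)

models = {
    "linear": LinearRegression(),
    "poly2": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "svr": SVR(kernel="rbf", C=10.0),
}

for name, model in models.items():
    model.fit(X[:150], y[:150])    # train on the first 150 rows
    pred = model.predict(X[150:])  # evaluate on the held-out 50
    rmse = np.sqrt(mean_squared_error(y[150:], pred))
    print(f"{name}: RMSE = {rmse:.2f}")

On data with a quadratic component like this, the second-order model typically beats plain linear regression, mirroring the report’s second finding.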

Overall, the benefit of participating in the Pre-Pilot evaluation was tremendous. The students had a very realistic and positive experience tackling big data science challenges, including data volume in the data cleaning task and data veracity and value in the traffic event prediction task. The tasks of the pre-pilot evaluation are independent, allowing entry from groups with different expertise, and can be done by student groups at the undergraduate and graduate level with different levels of sophistication in models and tools. Students in our data science class learned a significant amount, from using basic tools like Pandas, Hadoop, or Apache Spark to developing scalable systems and combining machine learning with big data analytics. We also learned valuable lessons about the evaluation process in the context of data science, and provided feedback to further improve future evaluations (e.g. the upcoming Pilot by NIST). We conclude with three observations based on our participation in the NIST Data Science Evaluation workshop held at NIST in March 2016 (please see our presentation slides, part 1 and part 2, for more details): 1) prototype with a simple model and fewer data types first, and analyze potential correlations between data types; 2) the prediction task suffers from curse-of-dimensionality problems when the given training data is sparse; 3) it may be useful to release part of the ground-truth data for the cleaning task, to avoid over-aggressive or under-aggressive cleaning schemes.
