CAMeL: Crowd-Assisted Machine Learning
The trade-offs between human and mechanical computation are well known. Machines are cheaper and faster than humans, but on most tasks they are still far less accurate. Crowdsourcing services such as Amazon Mechanical Turk narrow this gap somewhat by making human computation cheaper and more readily available, though large-scale or repetitive tasks can still be expensive. Ideally, we want to automate most of a task and bring in humans only where they are really needed.
CAMeL (Crowd-Assisted Machine Learning) is a paradigm that treats human computation as a form of data cleaning: automated methods perform the task as well as they can, and humans then “clean up” the most erroneous or uncertain results. The research challenges become how best to decompose a machine learning problem into many micro-tasks that can be solved in parallel, how to optimize the number of questions asked, and how best to present information to the crowd and elicit its feedback. We have built several systems to address these challenges.
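The machine-first, crowd-second workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not CAMeL's actual implementation: it assumes each machine prediction carries a single confidence score, that a fixed budget caps how many items go to the crowd, and that `ask_crowd` is a hypothetical stand-in for posting a micro-task and collecting the answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # machine's probability for the label it chose


def route_to_crowd(
    predictions: List[Prediction], budget: int
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (kept, uncertain); the `budget` least confident are uncertain."""
    budget = max(0, budget)
    ranked = sorted(predictions, key=lambda p: p.confidence)
    return ranked[budget:], ranked[:budget]


def clean_up(
    predictions: List[Prediction], budget: int, ask_crowd: Callable[[str], str]
) -> Dict[str, str]:
    """Keep confident machine labels and replace the uncertain ones with crowd answers."""
    kept, uncertain = route_to_crowd(predictions, budget)
    labels = {p.item_id: p.label for p in kept}
    for p in uncertain:
        # `ask_crowd` is a hypothetical hook for posting a micro-task.
        labels[p.item_id] = ask_crowd(p.item_id)
    return labels


preds = [
    Prediction("a", "PER", 0.95),
    Prediction("b", "ORG", 0.40),
    Prediction("c", "LOC", 0.70),
]
labels = clean_up(preds, budget=1, ask_crowd=lambda item_id: "PER")
print(labels)  # {'c': 'LOC', 'a': 'PER', 'b': 'PER'}
```

Only the single least confident item ("b") is sent to the crowd; the budget is the knob that trades crowd cost against accuracy.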
CASTLE: Crowd-Assisted Information Extraction System
CASTLE is a crowd-assisted information extraction (IE) system based on statistical machine learning. It uses a conditional random field (CRF) to annotate an initial batch of text. Unlike other IE systems, however, CASTLE stores the results in a probabilistic data model, automatically issues crowdsourcing tasks to correct the most uncertain results, and integrates the crowd's responses back into the probabilistic data model.
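Selecting "the most uncertain results" requires an uncertainty measure over the CRF's output. A common choice, assumed here rather than taken from CASTLE itself, is the Shannon entropy of each token's marginal label distribution. The sketch below assumes the marginals have already been computed (for example by forward-backward inference) and uses made-up values for illustration.

```python
import math
from typing import Dict, List


def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy, in bits, of one token's marginal label distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def most_uncertain_tokens(marginals: List[Dict[str, float]], k: int) -> List[int]:
    """Indices of the k tokens whose marginals have the highest entropy."""
    order = sorted(
        range(len(marginals)), key=lambda i: entropy(marginals[i]), reverse=True
    )
    return order[: max(0, k)]


# Per-token marginals as a CRF might produce them (illustrative values only).
marginals = [
    {"B-PER": 0.98, "O": 0.02},
    {"B-ORG": 0.5, "B-LOC": 0.5},
    {"O": 0.9, "B-LOC": 0.1},
]
print(most_uncertain_tokens(marginals, k=2))  # [1, 2]
```

The evenly split token (index 1) has the maximum entropy of 1 bit and is the first candidate for a crowd question; the near-certain token (index 0) is left to the machine.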
Pi-CASTLE: Probabilistically Integrated CASTLE
Pi-CASTLE (Probabilistically Integrated CASTLE) extends CASTLE with a number of enhancements centered on probabilistic integration. It expands the data model of the original CASTLE system by pushing all operations into the database, where they are implemented as user-defined functions. Pi-CASTLE also has a more robust quality control mechanism than the original: a novel Bayesian scheme that maps crowd responses to probabilities and combines them before they are integrated back into the database.
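One way to picture Bayesian combination of crowd responses is a posterior update over a single token's candidate labels. The sketch below is an assumption, not Pi-CASTLE's published scheme: it uses a simple "one-coin" worker model in which each worker has a known accuracy and otherwise errs uniformly, and it treats the machine's marginal as the prior. The labels and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def combine_responses(
    prior: Dict[str, float], responses: List[Tuple[str, float]]
) -> Dict[str, float]:
    """Posterior over labels after folding in crowd answers with Bayes' rule.

    prior: machine (e.g. CRF) marginal over the candidate labels.
    responses: (answered_label, worker_accuracy) pairs.
    Assumes a one-coin worker model: a worker answers correctly with probability
    `accuracy` and otherwise picks uniformly among the other K - 1 labels.
    """
    k = len(prior)
    if k < 2:
        raise ValueError("need at least two candidate labels")
    posterior = dict(prior)
    for answer, accuracy in responses:
        for label in posterior:
            # Likelihood of observing `answer` if `label` were the true label.
            likelihood = accuracy if label == answer else (1 - accuracy) / (k - 1)
            posterior[label] *= likelihood
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("responses are inconsistent with the prior")
    return {label: p / total for label, p in posterior.items()}


prior = {"PER": 0.5, "ORG": 0.3, "LOC": 0.2}
post = combine_responses(prior, [("ORG", 0.8), ("ORG", 0.7)])
print(max(post, key=post.get))  # ORG
```

Two moderately reliable workers who both answer ORG move that label from a prior of 0.3 to a posterior of roughly 0.94, overriding the machine's preference for PER; weighting each answer by worker accuracy is what makes this more robust than accepting a single response at face value.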
CAKE: Crowd-Assisted Knowledge Extraction
The CAKE (Crowd-Assisted Knowledge Extraction) system is currently under development. Its goal is to use the crowd to automate the cleaning of a knowledge base (KB). Whereas the earlier CASTLE systems stored only text annotations, CAKE is a fully probabilistic KB containing facts, relations, and rules for generating new facts. We use human computation to improve all three aspects of the KB: cleaning up existing facts and relations, generating new ones, and even drawing on human ingenuity to create new rules that govern the data.
Publications
- CASTLE: Crowd-Assisted System for Textual Labeling & Extraction
Sean Goldberg, Daisy Zhe Wang, Tim Kraska
Proceedings of AAAI HCOMP 2013
- Pi-CASTLE: A Probabilistically Integrated System for Crowd-Assisted Textual Labeling & Extraction
Sean Goldberg, Daisy Zhe Wang, Christan Grant
Accepted to ACM JDIQ, 2016