
Extracting Visual Knowledge from the Web with Multimodal Learning

May 26, 2017 · publications, research directions

We consider the problem of automatically extracting visual objects from web images. Despite extraordinary advances in deep learning, visual object detection remains a challenging task. To overcome the deficiencies of purely visual techniques, we propose to use the meta text surrounding images on the Web to improve detection accuracy. In this work we present a multimodal learning algorithm that integrates text information into visual knowledge extraction. We developed a system that takes raw webpages and a small set of training images from ImageNet as inputs, and automatically extracts visual knowledge. Experimental results on 46 object categories show that extraction precision improves significantly from 73% (with state-of-the-art deep learning systems) to 81%, which is equivalent to a 31% reduction in error rate.
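For reference, the 31% figure is the relative reduction in error rate implied by the reported precisions (the detailed numbers, 72.95% and 81.43%, appear in the Experiments section):

\[ \frac{(1 - 0.7295) - (1 - 0.8143)}{1 - 0.7295} = \frac{0.2705 - 0.1857}{0.2705} \approx 0.31 \]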

 

Multimodal Embeddings

Our algorithm is closely related to the skip-gram model, which learns word embeddings by maximizing the following objective function:

\[ \frac{1}{T} \sum_{t=1}^{T} \; \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t) \]

where w1, w2, ..., wT is the sequence of training words in the corpus, and c is the size of the window around the target word wt. We extend this skip-gram model to a multimodal corpus to learn vector embeddings for both text words and image concepts, so that objects with similar semantic meanings are also close to each other in the embedding space.
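A minimal sketch of this extension: inject each image's candidate concept into the surrounding text as a pseudo-token, then train a standard skip-gram model over the mixed corpus. The IMG:<concept> token scheme and the use of gensim here are illustrative assumptions, not the paper's implementation.

```python
from gensim.models import Word2Vec

# Illustrative multimodal "sentences": web text surrounding an image, with the
# image's candidate concept injected as a pseudo-token (the IMG:<concept>
# naming is our own convention, not from the paper).
corpus = [
    ["a", "golden", "retriever", "playing", "fetch", "IMG:dog"],
    ["tabby", "cat", "sleeping", "on", "the", "sofa", "IMG:cat"],
    ["stray", "dog", "running", "in", "the", "park", "IMG:dog"],
]

# sg=1 selects the skip-gram objective; window corresponds to c above.
model = Word2Vec(corpus, vector_size=50, window=3, sg=1, min_count=1, epochs=50)

# Words and image concepts now share one embedding space, so similarity can
# be measured across modalities.
print(model.wv.similarity("IMG:dog", "retriever"))
```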

Structure Learning and Prediction

Given candidate image objects along with the text words describing them, our goal is to predict a confidence score that an image object belongs to one of a set of predefined image categories. Mathematically, we model the probability that an image In contains objects of category c with a logistic regression model:

\[ p(c \mid I_n) = \sigma\big(\theta_c^{\top} \mathbf{x}_n\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \]

where Wn is the set of multimodal words describing image In, and xn is a feature vector derived from the embeddings of the words in Wn. To learn the model parameters, we maximize the following regularized objective function:

\[ \mathcal{L}(\theta) = \sum_{n} \log p(c_n \mid I_n) \;-\; \lambda \sum_{c} \lVert \theta_c \rVert_2^2 \]
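A compact sketch of this prediction step, under the simplifying assumption that the feature vector xn is the average of the multimodal embeddings of the words in Wn (the paper's exact featurization may differ):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def image_features(words, embeddings):
    # Average the multimodal embeddings of the words describing an image.
    # Averaging is an illustrative choice, not necessarily the paper's.
    return np.mean([embeddings[w] for w in words], axis=0)

# Toy multimodal embeddings; in practice these come from the skip-gram
# model trained above.
rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=50) for w in ["dog", "cat", "park", "sofa"]}

# Wn for each training image, plus its category label.
train_words = [["dog", "park"], ["cat", "sofa"], ["dog", "sofa"]]
labels = ["dog", "cat", "dog"]
X = np.stack([image_features(w, embeddings) for w in train_words])

# Logistic regression with L2 regularization; sklearn's C is the inverse
# of the lambda in the regularized objective above.
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, labels)

# Confidence scores that a new image belongs to each category.
x_new = image_features(["dog", "park"], embeddings)
print(dict(zip(clf.classes_, clf.predict_proba([x_new])[0])))
```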

Experiments and Results

We evaluate our approach on a collection of web pages and images derived from the Common Crawl dataset, which is publicly available on Amazon S3. The data is processed to extract image objects along with text tags, resulting in around 10 million tagged images for our study. Table 1 shows some example documents.
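As one illustration of this preprocessing step, the snippet below pairs each <img> element in a crawled page with nearby meta text (its alt attribute plus the enclosing element's text). This is a minimal sketch using BeautifulSoup under our own simplified extraction rules; the actual pipeline is more involved.

```python
from bs4 import BeautifulSoup

def extract_tagged_images(html):
    """Pair each image in a webpage with surrounding meta text.
    Using alt text plus the enclosing element's text is a simplification
    of the real extraction rules."""
    soup = BeautifulSoup(html, "html.parser")
    tagged = []
    for img in soup.find_all("img"):
        tags = []
        if img.get("alt"):
            tags.extend(img["alt"].split())
        parent = img.find_parent(["p", "figure", "div"])
        if parent:
            tags.extend(parent.get_text(" ", strip=True).split())
        tagged.append({"src": img.get("src"), "tags": tags})
    return tagged

html = '<p>A golden retriever at the park <img src="dog.jpg" alt="dog"></p>'
print(extract_tagged_images(html))
```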

Quantitative evaluation on 46 image categories shows that, on average, the multimodal approach improves prediction precision by 8.48%, from 72.95% to 81.43%, which is equivalent to a 31% reduction in error rate. To examine the effectiveness intuitively, we visualize extracted examples in Table 3. From these examples we observe that the baseline unimodal approach (Uni.) extracts objects with the highest visual detection score (1st row), while the proposed multimodal approach (Mul.) leverages both text and visual information (2nd row). We also observe that the text descriptions of images retrieved with Mul. (2nd row) are more consistent with the visual objects in those images. The second image in the first row is a false-positive extraction, which illustrates the unreliability of algorithms that rely on a single source of information.

 

For more details, please see our paper (Gong et al., IJCAI 2017).
