We consider the problem of extracting text instances of predefined categories (e.g. city and person) from the Web. Instances of a category may be scattered across thousands of independent sources, in many different formats and with potential noise, which makes open-domain information extraction a challenging problem. Learning syntactic rules like “cities such as _” or “_