Domain Adaptation with Structural Correspondence Learning

Google Tech Talks
September 5, 2007

ABSTRACT

Statistical language processing tools are being applied to an ever-wider and more varied range of linguistic data. Researchers and engineers are using statistical models to organize and understand financial news, legal documents, biomedical abstracts, and weblog entries, among many other domains. Because language varies so widely, collecting and curating training sets for each domain is prohibitively expensive. At the same time, differences in vocabulary and writing style across domains can cause the error of state-of-the-art supervised models to increase dramatically. This talk describes structural correspondence learning (SCL), a method for adapting models from resource-rich source domains to resource-poor target domains. SCL uses unlabeled data from both domains to induce a common feature representation for domain adaptation. We demonstrate SCL on two NLP tasks: sentiment classification and part-of-speech tagging. For each task, SCL significantly reduces the error of a state-of-the-art discriminative model.

Speaker: John Blitzer
Production company: Google engEDU
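As a rough illustration of the idea of inducing a shared representation from unlabeled data, the sketch below follows the usual SCL recipe: pick "pivot" features common to both domains, learn a linear predictor for each pivot from the remaining features, and take a low-rank SVD of the stacked predictor weights as a projection that augments the original features. The synthetic data, the frequency-based pivot selection, the ridge-regression pivot predictors, and the dimensionality k are illustrative assumptions, not the exact setup used in the talk.

```python
# Minimal sketch of an SCL-style feature induction pipeline on synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled examples from both domains as sparse binary feature vectors.
n_examples, n_features = 2000, 200
X = (rng.random((n_examples, n_features)) < 0.1).astype(float)

# "Pivot" features: here, simply the features that occur most often overall
# (a stand-in for features frequent in both domains).
n_pivots = 20
pivot_idx = np.argsort(-X.sum(axis=0))[:n_pivots]
nonpivot_idx = np.setdiff1d(np.arange(n_features), pivot_idx)

# For each pivot, predict its presence from the non-pivot features.
# Ridge regression is used here for brevity in place of the modified
# Huber loss used in the original SCL work.
X_np = X[:, nonpivot_idx]
lam = 1.0
A = X_np.T @ X_np + lam * np.eye(len(nonpivot_idx))
W = np.zeros((len(nonpivot_idx), n_pivots))
for j, p in enumerate(pivot_idx):
    W[:, j] = np.linalg.solve(A, X_np.T @ X[:, p])

# SVD of the stacked pivot-predictor weights gives a low-dimensional
# projection theta whose rows capture correspondences between features
# that behave similarly with respect to the pivots.
k = 10
U, _, _ = np.linalg.svd(W, full_matrices=False)
theta = U[:, :k].T                       # shape (k, n_nonpivot)

def scl_features(x):
    """Original features concatenated with the projected non-pivot features."""
    return np.concatenate([x, theta @ x[nonpivot_idx]])

x_aug = scl_features(X[0])
print(x_aug.shape)                       # (n_features + k,)
```

A supervised model trained on labeled source-domain data would then use these augmented features, so that target-domain words sharing a projection direction with source-domain words can receive sensible weights even if they never appear in the labeled training set.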