NIPS 2011 Domain Adaptation Workshop: Training Structured Prediction Models

Domain Adaptation Workshop: Theory and Application at NIPS 2011
Invited Talk: Training Structured Prediction Models with Extrinsic Loss Functions by Slav Petrov

Slav Petrov is a Research Scientist at Google New York who works on problems at the intersection of natural language processing and machine learning. In particular, he is interested in syntactic parsing and its applications to machine translation and information extraction.

Abstract: We present an online learning algorithm for training structured prediction models with extrinsic loss functions. This allows us to extend a standard supervised learning objective with additional loss functions, based on either intrinsic or task-specific extrinsic measures of quality. We present experiments with sequence models on part-of-speech tagging and named entity recognition tasks, and with syntactic parsers on dependency parsing and machine translation reordering tasks.
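The abstract describes extending a supervised objective with an extra loss term and optimizing it online. As a rough sketch of that general idea (not the talk's actual algorithm), the snippet below mixes an intrinsic Hamming loss with a caller-supplied extrinsic loss inside a perceptron-style update; `decode`, `features`, and `extrinsic_loss` are hypothetical placeholders for a concrete structured model.

```python
# Rough sketch, not the talk's algorithm: a perceptron-style online update
# driven by a combined loss. `decode`, `features`, and `extrinsic_loss` are
# hypothetical callables standing in for a concrete structured model.

def combined_loss(gold, pred, extrinsic_loss, lam=0.5):
    """Mix an intrinsic Hamming loss with a task-specific extrinsic loss."""
    intrinsic = sum(g != p for g, p in zip(gold, pred)) / len(gold)
    return (1.0 - lam) * intrinsic + lam * extrinsic_loss(gold, pred)

def online_train(examples, features, decode, extrinsic_loss,
                 epochs=5, lr=0.1):
    """Sparse weight vector, updated whenever the combined loss is nonzero."""
    weights = {}
    for _ in range(epochs):
        for x, gold in examples:
            pred = decode(x, weights)  # current best structure under weights
            if combined_loss(gold, pred, extrinsic_loss) > 0.0:
                # Move toward the gold structure's features ...
                for f, v in features(x, gold).items():
                    weights[f] = weights.get(f, 0.0) + lr * v
                # ... and away from the predicted structure's features.
                for f, v in features(x, pred).items():
                    weights[f] = weights.get(f, 0.0) - lr * v
    return weights
```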

NIPS 2011 Domain Adaptation Workshop: History Dependent Domain Adaptation

Domain Adaptation Workshop: Theory and Application at NIPS 2011
Invited Talk: History Dependent Domain Adaptation by Allen Lavoie

Abstract: We study a novel variant of the domain adaptation problem, in which the loss function on test data changes due to dependencies on prior predictions. One important instance of this problem occurs in settings where it is more costly to make a new error than to repeat a previous error. We propose several methods for learning effectively in this setting, and test them empirically on the real-world tasks of malicious URL classification and adversarial advertisement detection.
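To make the setting concrete, here is a toy illustration (our own, not one of the paper's methods) of a loss in which repeating an earlier mistake on the same item is cheaper than making a brand-new one, as in the malicious-URL scenario:

```python
# Toy sketch of a history-dependent loss: a repeated error on a known item
# costs less than a new error. The cost values are arbitrary assumptions.

def history_dependent_loss(item, y_true, y_pred, past_errors,
                           new_cost=1.0, repeat_cost=0.2):
    """Return the loss for one prediction, updating the error history."""
    if y_pred == y_true:
        return 0.0
    if item in past_errors:      # repeating a known error: discounted
        return repeat_cost
    past_errors.add(item)        # a brand-new error: full cost
    return new_cost

past_errors = set()
stream = [("url-a", 1, 0), ("url-a", 1, 0), ("url-b", 0, 1)]
total = sum(history_dependent_loss(u, t, p, past_errors) for u, t, p in stream)
print(total)   # 1.0 (new) + 0.2 (repeat) + 1.0 (new) = 2.2
```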

NIPS 2011 Domain Adaptation Workshop: Transportability and the Bias-Variance Trade-off

Domain Adaptation Workshop: Theory and Application at NIPS 2011
Invited Talk: Transportability and the Bias-Variance Trade-off by Karthika Mohan

Abstract: Transportability is a recently proposed framework that examines whether or not a particular statistical or causal relation can be transported from a source domain to a related target domain given some knowledge of the differences between the domains, usually by appealing to graphical criteria. Transportability allows us to exploit the structure of the source and target domains, but is rigid in the sense that each relation is said to be either fully transportable or not. We propose a relaxation of transportability, and provide examples illustrating how this relaxation can be used to determine whether or not to conduct a new study or collect new data. Finally, we briefly mention ongoing research formalizing and quantifying the bias-variance trade-off that arises when determining whether to mix source and target data under various graphical criteria.
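The bias-variance trade-off mentioned at the end can be seen in a small simulation (our own illustration, independent of the talk's graphical criteria): pooling in plentiful but biased source data lowers variance at the cost of bias, so an intermediate mixing weight often minimizes mean squared error. The means, sample sizes, and weights below are arbitrary assumptions.

```python
# Monte-Carlo illustration of the source/target mixing trade-off.
import random

random.seed(0)
TARGET_MEAN, SOURCE_MEAN, SIGMA = 0.0, 0.5, 1.0   # source is biased by 0.5
N_TARGET, N_SOURCE, TRIALS = 10, 200, 2000        # target data is scarce

def mse_of_mix(alpha):
    """MSE of alpha * target_mean_hat + (1 - alpha) * source_mean_hat."""
    err = 0.0
    for _ in range(TRIALS):
        t = sum(random.gauss(TARGET_MEAN, SIGMA) for _ in range(N_TARGET)) / N_TARGET
        s = sum(random.gauss(SOURCE_MEAN, SIGMA) for _ in range(N_SOURCE)) / N_SOURCE
        est = alpha * t + (1 - alpha) * s
        err += (est - TARGET_MEAN) ** 2
    return err / TRIALS

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, round(mse_of_mix(alpha), 4))
# Neither pure source (alpha=0, high bias) nor pure target (alpha=1, high
# variance) wins; an intermediate alpha gives the lowest MSE.
```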

NIPS 2011 Domain Adaptation Workshop: Domain Adaptation with Multiple Latent Domains

Domain Adaptation Workshop: Theory and Application at NIPS 2011
Invited Talk: Domain Adaptation with Multiple Latent Domains by Kate Saenko

Abstract: Domain adaptation is important for practical applications of supervised learning, as the distribution of inputs can differ significantly between available sources of training data and the test data in a particular target domain. Many domain adaptation methods have been proposed, yet very few of them deal with the case of more than one training domain; methods that do incorporate multiple domains assume that the separation into domains is known a priori, which is not always the case in practice. In this paper, we introduce a method for multi-domain adaptation with unknown domain labels, based on learning nonlinear cross-domain transforms, and apply it to image classification. Our key contribution is a novel version of constrained clustering; unlike many existing constrained clustering algorithms, ours can be shown to provably converge locally while satisfying all constraints. We present experiments on a commonly available image dataset.
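For readers unfamiliar with constrained clustering, the sketch below shows the generic COP-KMeans-style variant, where an assignment is rejected if it would violate a must-link or cannot-link constraint. This is background for the abstract only, not the paper's provably convergent algorithm, and the 1-D points are a simplifying assumption.

```python
# Generic COP-KMeans-style constrained clustering on 1-D points.
import random

def violates(i, c, assign, must_link, cannot_link):
    """Would assigning point i to cluster c break any pairwise constraint?"""
    for a, b in must_link:
        j = b if a == i else a if b == i else None
        if j is not None and assign.get(j) is not None and assign[j] != c:
            return True
    for a, b in cannot_link:
        j = b if a == i else a if b == i else None
        if j is not None and assign.get(j) == c:
            return True
    return False

def cop_kmeans(points, k, must_link=(), cannot_link=(), iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = {}
    for _ in range(iters):
        assign = {}
        for i, p in enumerate(points):
            # Try centers nearest-first; take the first legal assignment.
            order = sorted(range(k), key=lambda c: (p - centers[c]) ** 2)
            for c in order:
                if not violates(i, c, assign, must_link, cannot_link):
                    assign[i] = c
                    break
            else:
                assign[i] = order[0]  # no legal cluster; real COP-KMeans fails here
        for c in range(k):  # recompute each center from its members
            members = [points[i] for i, ci in assign.items() if ci == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign, centers

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
print(cop_kmeans(points, k=2, cannot_link=[(0, 3)])[0])
```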

NIPS 2011 Domain Adaptation Workshop: Adaptation without Retraining

Domain Adaptation Workshop: Theory and Application at NIPS 2011
Invited Speaker: Adaptation without Retraining by Dan Roth

Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is also a Fellow of AAAI, for his contributions to the foundations of machine learning and inference and for developing learning-centered solutions for natural language processing problems.

Abstract: Natural language models trained on labeled data from one domain do not perform well on other domains. Most adaptation algorithms proposed in the literature train a new model for the target domain using a mix of labeled and unlabeled data. We discuss some limitations of existing general-purpose adaptation algorithms that are due to the interaction between differences in base feature statistics and task differences, and illustrate how this should be taken into account jointly. With these insights we propose a new approach to adaptation that avoids the need for retraining models. Instead, at evaluation time, we perturb the given instance to be more similar to instances the model can handle well, or perturb the model outcomes to fit our expectation of the target domain better, given some prior knowledge on the task and the target domain. We provide experimental evidence in a range of natural language processing tasks, including semantic role labeling and English as a Second Language (ESL) …
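As a toy illustration of test-time instance perturbation (our own sketch, not Roth's system), the snippet below rewrites out-of-vocabulary tokens to in-vocabulary substitutes before handing the instance to a fixed model; the vocabulary and synonym table are hypothetical.

```python
# Toy sketch of adaptation without retraining via instance perturbation:
# map unseen tokens to known substitutes so the fixed model sees familiar
# input. The vocabulary and synonym table are illustrative placeholders.

def perturb_instance(tokens, vocab, synonyms):
    """Replace OOV tokens with in-vocabulary substitutes when available."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            # Take the first known synonym, else keep the original token.
            sub = next((s for s in synonyms.get(tok, []) if s in vocab), tok)
            out.append(sub)
    return out

vocab = {"the", "movie", "was", "great"}
synonyms = {"flick": ["movie", "film"], "awesome": ["great"]}
print(perturb_instance(["the", "flick", "was", "awesome"], vocab, synonyms))
# ['the', 'movie', 'was', 'great'] -- the fixed model now sees familiar input
```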

NIPS 2011 Domain Adaptation Workshop: Overfitting and Small Sample Statistics

Domain Adaptation Workshop: Theory and Application at NIPS 2011
Invited Talk: Overfitting and Small Sample Statistics by Ruslan Salakhutdinov

Abstract: We study the prevalent problem in which the test distribution differs from the training distribution. We consider a setting where our training set consists of a small number of sample domains, but where we have many samples in each domain. Our goal is to generalize to a new domain. For example, we may want to learn a similarity function using only certain classes of objects, but we desire that this similarity function be applicable to object classes not present in our training sample (e.g., we might seek to learn that "dogs are similar to dogs" even though images of dogs were absent from our training set). Our theoretical analysis shows that we can select many more features than domains while avoiding overfitting by utilizing data-dependent variance properties. We present a greedy feature selection algorithm based on using t-statistics. Our experiments validate this theory, showing that our t-statistic-based greedy feature selection is more robust at avoiding overfitting than the classical greedy procedure.
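To sketch the idea of t-statistic-based feature scoring (an illustration of the idea, not the paper's exact procedure), the snippet below ranks features by a two-sample Welch t-statistic, which normalizes each feature's between-class gap by its sample variance, and greedily keeps the top-scoring ones:

```python
# Sketch of t-statistic-based greedy feature selection: score each feature
# by a Welch t-statistic between the two classes, then keep the features
# with the largest |t|. The toy data below is made up for illustration.

import math

def t_statistic(xs, ys):
    """Welch t-statistic between two samples of one feature."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny + 1e-12)

def greedy_select(pos_rows, neg_rows, n_keep):
    """Rank features by |t| and keep the top n_keep."""
    n_feat = len(pos_rows[0])
    scores = []
    for f in range(n_feat):
        t = t_statistic([r[f] for r in pos_rows], [r[f] for r in neg_rows])
        scores.append((abs(t), f))
    return [f for _, f in sorted(scores, reverse=True)[:n_keep]]

pos = [[1.0, 0.2, 5.0], [0.9, 0.1, 4.0], [1.1, 0.3, 6.0]]
neg = [[0.1, 0.2, 5.5], [0.0, 0.3, 4.5], [0.2, 0.1, 5.2]]
print(greedy_select(pos, neg, n_keep=1))   # feature 0 separates the classes
```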