Contrastive Estimation: Training Log-Linear Models on ...

Contrastive Estimation: Training Log-Linear Models on ...

Contrastive Estimation: (Efficiently) Training LogLinear Models (of Sequences) on Unlabeled Data Noah A. Smith and Jason Eisner Department of Computer Science / Center for Language and Speech Processing Johns Hopkins University {nasmith,jason}@cs.jhu.edu ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Nutshell Version unannotated text tractable training contrastive estimation with lattice neighborhoods Experiments on unlabeled data: max ent features sequence models POS tagging: 46% error rate reduction (relative to EM) Max ent features make it possible to survive damage to tag dictionary ACL 2005 N. A. Smith and J. Eisner Contrastiveparsing: Estimation Dependency

21% Red leaves dont hide blue jays. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Maximum Likelihood Estimation (Supervised) p JJ NNS MD VB JJ NNS y red leaves dont hide blue jays x

p ? ? * * * ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Maximum Likelihood Estimation (Unsupervised) p ? ? ? ? ? ? red leaves dont hide blue

jays This is what EM does. p ? ? * * * ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation x Focusing Probability Mass numerator denominator ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Focusing Probability Mass numerator denominator ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Conditional Estimation (Supervised) p p

JJ NNS MD VB JJ NNS y red leaves dont hide blue jays x ? ? ? ? ?

? red leaves dont hide blue jays (x) * A different denominator! ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Objective Functions Objective Optimizati on Algorithm MLE Count & Normalize* tags & words

* * MLE with hidden variables EM* words * * Conditional Likelihood Iterative Scaling tags & words (words) * tags & words hypothesize d tags & words Perceptron Backprop Denominat Numerator or

*For generative models. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Objective Functions Objective MLE MLE with Contrastive hidden Estimation variables Optimizati on Algorithm Count & generic Normalize* numerical solvers EM* Conditional Likelihood (in this talk, LMVM Iterative L-BFGS) Scaling Perceptron Backprop Denominat Numerator

or observed tags data& * * words (in this talk, raw words * * ? word sequence, tagsover & sum (words) * words all possible hypothesize taggings) tags & d tags & words words *For generative models. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation This talk is about denominators ... in the unsupervised case.

A good denominator can improve accuracy and tractability. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Language Learning (Syntax) red leaves EM dont hide blue jays Why didnt he say, birds fly or dancing granola or the wash dishes or any other sequence of words? ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Language Learning (Syntax) red leaves dont

hide blue jays Why did he pick that sequence for those words? Why not say leaves red ... or ... hide dont ... or ... ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation What is a syntax model supposed to explain? Each learning hypothesis corresponds to a denominator / neighborhood. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation The Job of Syntax Explain why each word is necessary. DEL1WORD neighborhood red dont hide blue jays leaves dont hide blue jays red leaves hide blue jays red leavesdont hide blue jays red leaves dont hide blue

red leaves dont blue jays red leaves dont hide jays ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation The Job of Syntax Explain the (local) order of the words. TRANS1 neighborhood red dont leaves hide blue jays leaves red dont hide blue jays red leavesdont hide blue jays red leaves hide dont blue jays red leaves dont hide jays blue red leaves dont blue hide jays ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation p ? ? ? ? ? ? red leaves dont

hide blue jays ? ? ? red leaves dont p ? ? ? hide blue jays ? ? ? ? ? ? leaves

red dont hide blue jays ? ? ? ? ? ? red dont leaves hide blue jays ? ?

? ? ? ? red leaves hide dont blue jays ? ? ? ? ? ? red leaves dont

blue hide jays ? ? ? ? ? ? red leaves dont hide jays blue sentences in TRANS1 neighborhoo d ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation p

p ? ? ? ? ? ? red leaves dont hide blue jays red leaves dont le a do n

ve s t re d hide hi de bl ue le a do n ve dont s t hide (with any tagging) ACL 2005 N. A. Smith and J. Eisner

blue jays ja ys hi d e blue bl ue jays sentences in TRANS1 neighborhoo Contrastive Estimation d The New Modeling Imperative A good sentence hints that a set of bad ones is nearby. numerator denominator (neighborhood) Make the good

sentence at the ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation expense likely, This talk is about denominators ... in the unsupervised case. A good denominator can improve accuracy and tractability. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Log-Linear Models score of x, y partition function Computing is undesirable! Conditional Estimation (Supervised) 1 sentence Contrastive Estimation (Unsupervised) a few

sentences Sums over all possible taggings of all possible sentences! ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation A Big Picture: Sequence Model Estimation unannotated data tractable sums generative, EM: p(x) log-linear, EM: p(x) log-linear, MLE: p(x, y) generative, MLE: p(x, y) log-linear, conditional estimation: p(y | x) log-linear, CE with lattice neighborhoods overlapping

features ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Contrastive Neighborhoods Guide the learner toward models that do what syntax is supposed to do. Lattice representation efficient algorithms. There is an art to choosing neighborhoo d functions. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Neighborhoods neighborhood size lattice arcs DEL1WORD n+1 O(n) delete up to 1 word TRANS1 n DELORTRANS1 O(n)

DEL1SUBSEQUENCE * (EM) perturbations O(n) transpose any bigram O(n) DEL1WORD TRANS1 O(n2 delete any contiguous O(n2) subsequence ) - replace each word with anything ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation The Merialdo (1994) Task Given unlabeled text and a POS dictionary (that tells all possible tags for each word type), A form of learn to tag. supervision. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation

Trigram Tagging Model JJ NNS MD VB JJ NNS red leaves dont hide blue jays feature set: tag trigrams tag/word pairs from a POS dictionary ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation CRF supervised 99.5

HMM 97.2 LENGTH 79.3 TRANS1 79.0 DELORTRANS1 78.8 10 data 70.0 DA Smith & Eisner (2004) 66.6 EM Merialdo (1994) EM 62.1 60.4 DEL1WORD 58.7 DEL1SUBSEQUENCE random loglinear EM

35.1 96K words 30.0 40.0 50.0 60.0 70.0 80.0 90.0 full POS dictionary tagging accuracy (ambiguous words) uninformative initializer best of 8 smoothing conditions ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation 100.0 Dictionary includes ... What if we damage the POS dictionary? all words words from 1st half of corpus words with count 2 words with count 3 Dictionary excludes OOV words, which can get any tag.

ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation 90.4 Dictionary excludes OOV words, which can get any tag. 60.5 56.6 65.0 all words words from 1st half of corpus words with count 2 words with count 3 69.5 66.5 70.0 70.9 75.0 84.4 80.5 80.0

Dictionary includes ... 78.3 75.2 72.3 84.8 81.3 77.2 85.0 60.0 51.0 55.0 96K words 17 coarse POS tags uninformative initializer do m ra n EM N G TH LE EL O RT

RA NS 1 50.0 D tagging accuracy (all words) 90.0 90.1 95.0 ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation Trigram Tagging Model + Spelling JJ NNS MD VB JJ NNS red leaves dont

hide blue jays feature set: tag trigrams tag/word pairs from a POS dictionary 1- to 3-character suffixes, contains hyphen, digit ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation 50.0 ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation + LEN sp G T el H lin g D EL + OR sp T R el AN lin S 1 g 55.0 51.0 60.5 56.6 60.0

do m 65.0 ra n EM 70.9 73.8 73.6 69.5 66.5 70.0 83.2 84.4 80.5 78.3 75.2 72.3 75.0 N G TH 80.0

84.8 81.3 77.2 85.0 LE EL O RT RA NS 1 91.9 91.1 90.8 90.3 89.8 90.4 90.1 90.0 D tagging accuracy (all words) 95.0 Spelling features aided recovery,

but only with a smart neighborhoo d. The model need not be finite-state. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation 35.2 42.1 23.6 33.8 48.7 50.0 40.0 37.4 30.0 20.0 EM 10.0 LENGTH 0.0 TRANS1 See our paper at the IJCAI 2005

Grammatical Inference workshop. ACL 2005 attachment accuracy Klein & Manning (2004) Unsupervised Dependency Parsing initializer cleve r unin form ative N. A. Smith and J. Eisner Contrastive Estimation To Sum Up ... ontrastive Estimation means denominator for tractability or for accuracy picking your own (or, as in our case, for both). w we can use the task to guide the unsupervised learner (like discriminative techniques do for supervised learners

s a particularly good fit for log-linear models: with max ent features unsupervised sequence models all in time for ACL 2006. ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation ACL 2005 N. A. Smith and J. Eisner Contrastive Estimation

Recently Viewed Presentations

  • Diapositive 1 - ESIEE Paris

    Diapositive 1 - ESIEE Paris

    PR3602 Résolution de problèmes en intelligence artificielle : les algorithmes A* Qu'est-ce que l'intelligence ? Qu'est-ce que l'intelligence ?
  • RIS Coding - pacsgroup.org.uk

    RIS Coding - pacsgroup.org.uk

    MRI Colonogram. MRI Cranio-Facial. MRI Elbow Lt. MRI Elbow Rt. MRI Femur (Thigh) Lt. MRI Femur (Thigh) Rt. MFINL. MRI Fingers Lt. MFINR. MRI Fingers Rt. MRI Foot Lt. MRI Foot Rt. Knees US Rt. Liver US. Abdomen Lower US....
  • Washington Army National Guard

    Washington Army National Guard

    RQ-7B. The AAI RQ-7 Shadow is an American unmanned aerial vehicle (UAV) used by the United States Army, Marine Corps, Australian Army and Swedish Army for reconnaissance, surveillance Target Acquisition and battle damage assessment. In service since 2007 as a...
  • Coronary Artery Bypass Grafting

    Coronary Artery Bypass Grafting

    Coronary ostia perfusion catheters (auto-inflating, gummy tip, or spencers (for aortic only) Supplies continued. Aortic cannula. Venous cannula (need two for bicaval cannulation-need for mitral valve surgery)
  • Patient Access Services (PAS) Program

    Patient Access Services (PAS) Program

    Your participation will help us better understand your concerns and needs as we work toward a successful implementation. ----- News Bytes: News Bytes is a bi-monthly newsletter showcasing the Florida Agency for Health Care Administration's Health Information Technology initiatives.
  • These training materials are provided by the New Mexico ...

    These training materials are provided by the New Mexico ...

    A USB-B plug may be seen on some printers. Symbol for a USB port or plug. ... This "digital subscriber line" transmits signals over a specially augmented phone line, using different frequencies than those of the phone conversation. DSL -...
  • Market Structures - Folsom Cordova Unified School District ...

    Market Structures - Folsom Cordova Unified School District ...

    The objective of differentiation is to develop a position that potential customers see as unique. Who has the advantage in each market? Monopolies. ... Innocent monopoly", or monopoly achieved solely by merit, is perfectly legal, but acts by a monopolist...
  • What Trustees Need to Know About Developing a

    What Trustees Need to Know About Developing a

    Former DPS SWAT Team Leader. Commander of DPS Dive Recovery Team. Federal Bureau of Investigation National Academy Graduate. 34 years of law enforcement experience. ... Use building and fire codes. Build and support a strong culture of safety and security.