Machine Translation in Academia and in the Commercial World: A Contrastive Perspective

Alon Lavie
Research Professor, LTI, Carnegie Mellon University
Co-founder, President and CTO, Safaba Translation Solutions
WMT-2014, June 26, 2014

My Two Perspective Views on MT

> Research Professor, Language Technologies Institute, Carnegie Mellon
> Main areas of research:
  > MT evaluation metrics: Meteor
  > Syntax-based MT: syntax-to-syntax models
  > MT system combination: CMU MEMT system
  > MT into morphologically-rich languages (Arabic)
  > MT for human translation and post-editing
> Co-founder, President and CTO, Safaba Translation Solutions
> Commercial MT technology company focused on solutions and services to global enterprises

Safaba Translation Solutions

> Mission Statement: Safaba helps global corporations translate and localize their large volumes of corporate content into the local languages of the markets in which they operate, by dramatically improving translation velocity and reducing translation costs
> Customers: global corporations, primarily in the hardware, software and IT space, such as Dell and PayPal
> Partners: select commercial Language Service Providers (LSPs), such as Welocalize and ABBYY-LS
> MT Solutions: primarily real-time MT services delivered as software-as-a-service (SaaS) using a dedicated hosted private-cloud platform

Safaba Translation Solutions

> Business Model:
  > Primary - Full-Service SaaS model: the client delivers data resources,

Safaba develops and deploys the MT engines as remote hosted services
  > Secondary - Full-Service with on-site installation
  > Secondary - Do-It-Yourself (DIY) service using Safaba's EMTGlobal Online platform
> Clients typically pay us for MT implementation, integration, and a volume-based annual license
> Our Largest Deployment: Dell.com content is translated daily from English into 28 different languages by Safaba's automated translation solutions, in collaboration with Welocalize
> Volume: Dell.com translates over 1M words per month through the Safaba EMTGlobal MT platform

Safaba Translation Solutions

> Enterprise Impact and ROI at Dell of the Welocalize + Safaba MT Program:
  (Wayne Bourland, Director of Translation, Dell.com Enterprise Language Strategy, TAUS ILF, June 2014)
  > Translation cost reduced by nearly 40% on average
  > Savings to date of $2.4M from using MT

  > Project delivery times reduced by 40% (5 days to 3)
  > Quality has been maintained at the same level as traditional HT
  > ROI for MT over 900%

Safaba MT Technology Overview

> Main MT Technology Stack:
  > Predominantly NLP-augmented phrase-based statistical MT technology
  > MT runtime decoding platform based on Moses, augmented with Safaba-proprietary pre- and post-processing modules
  > Safaba-proprietary MT development platform based in part on open-source components (Moses, FastAlign, KenLM, etc.)
  > DuctTape as a workflow management framework that supports the entire MT development workflow
> Main MT Technology Challenges:
  > Effective and scalable client-specific adaptation

  > Maximizing MT accuracy into many morphologically-rich languages
  > Translation of highly-structured content
  > Maximizing translator MT post-editing productivity
  > Frequent and ongoing adaptation

Talk Objectives

> Provide some deeper insight into the characteristic differences between typical academic MT systems (i.e., systems built for WMT and NIST evaluations) and Safaba's typical commercial systems
> Provide a closer look at some of the main R&D challenges and requirements for delivering advanced hosted real-time statistical MT services and solutions in commercial settings
> Motivate the broader research community to work more extensively on MT problems and solutions for commercially-relevant content types and domains

WMT MT Systems vs. Safaba MT Systems

> WMT: MT for Assimilation (mostly)
  > Broad-domain systems: news commentary, medical information
  > Training data: Europarl, News Commentary, Common Crawl, Gigaword
  > In and out of English and several major European languages
> Safaba: MT for Dissemination (mostly)

  > Client-specific and client-adapted MT engines for enterprise clients
  > Typically domain-focused and consistent content types: product information and documentation, customer support, marketing
  > Training data: translation memories and other assets from the client + domain-relevant background data (e.g., TAUS data)
  > Mostly out of English, into 30+ languages (European, Asian, South American variants of ES and PT)
  > Different language variants (FR-France/Canada, PT-Portugal/Brazil, ES-Spain/Latin America, EN-US/GB, etc.)

TAUS: Translation Automation User Society

> https://www.taus.net/

TAUS Data

> https://www.tausdata.org/
> Data repository consisting of pooled parallel translation data from over 100 contributors (primarily large corporations and LSPs)
> Total data assets: about 56 billion words (including matrix TMs)
> Variety of domains: hardware, software, IT, financial, automotive, medical and bio-pharma, etc.

> Mostly categorized, indexed and word-aligned
> Free online search as a translation memory and terminology DB
> Coming soon: freely available for non-commercial academic research!
> Data Example: EN-US to ES-ES: 217.4M source words
  > Computer Software: 66.9M words
  > Computer Hardware: 9.0M words
  > Legal Services: 2.4M words
  > Other: 138.5M words

Some Contrastive MT System Scores

> BLEU scores of the best WMT-2014 MT systems versus Safaba-developed TAUS-data generic MT systems:

  Language Pair | Best WMT-2014 | Safaba TAUS Generic
  EN-to-FR      | 35.8          | 65.4
  EN-to-ES      | 30.4 *        | 66.2
  EN-to-RU      | 29.9          | 41.6
  EN-to-CS      | 21.6          | 43.6
  EN-to-DE      | 20.6          | 52.5
  FR-to-EN      | 35.0          | 68.0
  RU-to-EN      | 31.8          | ---
  ES-to-EN      | 31.4 *        | 70.4
  DE-to-EN      | 29.0          | 62.4
  CS-to-EN      | 28.8          | ---

Sample Safaba Output

> Unseen test set output, Safaba ES-to-EN TAUS Generic:
  [Screenshot: sample MT output segments shown alongside the reference]

WMT vs. TAUS: EN-to-DE MT Systems

> Used Safaba's default EN-to-DE pipeline to develop a WMT-2014 EN-to-DE MT system, as a contrastive reference to our TAUS EN-to-DE system
> Safaba WMT system:
  > Phrase-based system with domain adaptation
  > Constrained WMT-2014 parallel data resources only
  > No extra monolingual data for LM (i.e., no Gigaword or Common Crawl)
  > News Commentary as in-domain, everything else as

background
> Resulting system scores 17.3 cased BLEU (best system is 20.6)
> Training Statistics:

                                | WMT 2014    | TAUS Generic
  Training segments             | 4,143,962   | 5,767,915
  Training tokens (EN)          | 106,951,743 | 85,331,463
  Training tokens (DE)          | 101,810,648 | 89,190,947
  Average tokens/segment (EN)   | 25.8        | 14.8
  Average tokens/segment (DE)   | 24.6        | 15.5
  Global length ratio (DE/EN)   | 95.2%       | 104.5%

WMT vs. TAUS: EN-to-DE MT Systems

[Chart: TAUS vs. WMT input length distribution - fraction of sentences per 10-token input-length bin; TAUS inputs are markedly shorter]

WMT vs. TAUS: EN-to-DE MT Systems

[Chart: TAUS vs. WMT input length distribution (cdf), input lengths 0-100 tokens]

WMT vs. TAUS: EN-to-DE MT Systems

> Word Alignment Statistics: MGIZA++, grow-diag symmetrization

                                | WMT 2014    | TAUS Generic
  # training segments           | 4,143,962   | 5,767,915
  # training tokens (EN)        | 106,951,743 | 85,331,463
  # training tokens (DE)        | 101,810,648 | 89,190,947
  # alignment links (grow-diag) | 91,519,169  | 85,607,364
  Average links per token (EN)  | 0.856       | 1.003
  Average links per token (DE)  | 0.899       | 0.960

[Charts: TAUS vs. WMT alignment link distributions (EN and DE) - fraction of source/target tokens with 0, 1, 2, and 3-10 alignment links]

WMT vs. TAUS: EN-to-DE MT Systems

> Phrase Extraction Statistics:

                                       | WMT 2014    | TAUS Generic
  # training tokens (EN)               | 106,951,743 | 85,331,463
  # training tokens (DE)               | 101,810,648 | 89,190,947
  Total extracted phrase instances     | 652,123,624 | 374,142,109
  Average phrases/token (EN)           | 6.10        | 4.38
  Average phrases/token (DE)           | 6.41        | 4.19
  Unique phrases (EN)                  | 156,911,242 | 80,497,425
  Unique phrases (DE)                  | 168,034,534 | 97,586,721
  Average instances per phrase (EN)    | 4.16        | 4.65
  Average instances per phrase (DE)    | 3.88        | 3.83
  Total unique phrase pairs            | 503,220,418 | 177,760,867
  Average instances per phrase pair    | 1.30        | 2.10
  Average translations per phrase (EN) | 3.21        | 2.21
  Average translations per phrase (DE) | 2.99        | 1.82

WMT vs. TAUS: EN-to-DE MT Systems

> Phrase Count Distribution Statistics:

> Phrase Pair Count Histogram:

  Count | WMT 2014    | %      | TAUS        | %
  1     | 485,511,302 | 96.48% | 137,309,184 | 77.24%
  2     | 10,193,347  | 2.03%  | 26,380,365  | 14.84%
  3     | 2,710,843   | 0.54%  | 5,760,614   | 3.24%
  4     | 1,291,623   | 0.26%  | 3,019,769   | 1.70%
  5+    | 3,513,303   | 0.70%  | 5,290,935   | 2.98%

[Chart: TAUS vs. WMT phrase pair count distribution - fraction of phrase pairs occurring 1 through 5+ times]

WMT vs. TAUS: EN-to-DE MT Systems

> Phrase Translation Ambiguity

[Chart: TAUS vs. WMT targets per source phrase (cdf), 0-20 target sides per source phrase]

WMT vs. TAUS: EN-to-DE MT Systems

> Test-set Decoding Statistics:

                                       | WMT 2014 (newstest2012) | WMT 2014 (newstest2014) | TAUS Generic (test)
  # test set segments                  | 3003   | 2737   | 1200
  # test set source types              | 10267  | 9650   | 4554
  # test set source tokens             | 73643  | 62871  | 19332
  Average test set tokens/segment      | 24.5   | 23.0   | 16.1
  # decoder phrases used on test set   | 39982  | 34631  | 8642
  Average decoder source phrase length | 1.84   | 1.82   | 2.24
  # test set OOV types                 | 450    | 493    | 82
  # test set OOV tokens                | 720    | 797    | 83
  OOV rate (types / type)              | 4.38%  | 5.11%  | 1.80%
  OOV rate (tokens / running token)    | 0.98%  | 1.27%  | 0.43%
  Test set BLEU                        | 15.0   | 17.1   | 52.5
  Test set METEOR                      | 34.8   | 38.8   | 63.5
  Test set TER                         | 67.9   | 66.5   | 38.5
  Test set length ratio (MT/Ref)       | 97.7   | 102.8  | 100.8

WMT vs. TAUS: EN-to-DE MT Systems

[Chart: TAUS (test) vs. WMT (newstest2014) segment-level METEOR score distribution - fraction of test-set segments per 0.1 score bin, 0.0 to 1.0]

WMT vs. TAUS: EN-to-DE MT Systems

[Chart: TAUS (test) vs. WMT (newstest2014) decoder phrase length distribution - fraction of phrase pair instances used on the test set, by source-token length 1-7]

WMT vs. TAUS MT Systems - Insights

> What explains the dramatic difference in translation quality between these two setups?
  > Consistent domain(s) versus broad domain:
    > Much lower OOV rates for TAUS (0.43% vs. 1.27%)
    > Longer phrase matches for TAUS (average 2.24 vs. 1.82)
    > Significantly more frequently-occurring phrases for TAUS
    > Lower translation ambiguity for TAUS (2.21 vs. 3.21)
  > Indirect evidence for significantly cleaner and more parallel training data:
    > Denser word alignments for TAUS (1.003 vs. 0.856 links per EN token)

    > Significantly fewer unaligned words for TAUS (9.39% vs. 22.27%)
    > TAUS's primary data source is highly-QAed commercial TMs
  > Shorter input segments allow limited-window reordering models to cover a significantly larger fraction of the data
> Conclusion: TAUS data is a cleaner, higher-quality, and potentially more suitable data source for clean-lab experiments with advanced translation models, with results having potentially significant commercial relevance

Multilingual Meteor

> http://www.cs.cmu.edu/~alavie/METEOR/
> We use Meteor extensively at Safaba:
  > As an MT evaluation toolkit
  > As a monolingual aligner with flexible matches

Multilingual Meteor

> Meteor has expanded to cover 17 languages

Meteor Universal

> [Denkowski and Lavie, 2014] WMT-2014 Metrics Task

> New support included in Meteor 1.5:
  > Support for any target language, using only the bi-text used to build statistical MT systems
  > Learn paraphrases by phrase pivoting (Bannard and Callison-Burch, 2005)
  > Learn function words by relative frequency in monolingual data
  > Universal parameter set learned by pooling data from all WMT languages
> Significantly outperforms baseline metrics on unseen languages with no development data

Safaba MT Architecture Overview

[Diagram: Development Platform (Development Frontend, MT Development Workflow, Deployment) feeding a Production Platform (Safaba MT API and Integration Connectors, Safaba Server Frontend, DB, MT System Production Cloud running multiple MT engines)]

MT Development Workflow Management

> Main Alternatives:
  > train-factored-model.perl
    > For Moses; fossilized 9 steps
  > Experiment.perl
    > For Moses; customizable
  > LoonyBin [Clark and Lavie, 2009]
    > General-purpose, customizable
  > DuctTape
    > Unix-based workflow management system for experimental NLP pipelines
    > General-purpose, customizable, with nice execution properties
    > Open-source, initial development by Jonathan Clark
    > https://github.com/jhclark/ducttape

DuctTape

> Break long pipelines into a series of tasks: each a small block of arbitrary Bash code
> Specify inputs, outputs, configuration parameters, and what tools are required for each task
> Designed to easily test multiple settings via branch points
> DuctTape runs everything in the right order

DuctTape: Tasks

    task align_mkcls_src : mgiza
        < corpus=$train_src_for_align
        > classes
        :: num_classes=50
        :: num_runs=2 {
      zcat -f $corpus > corpus
      $mgiza/bin/mkcls -c$num_classes -n$num_runs \
        -pcorpus -V$classes opt
      rm corpus
    }

    task align_mgiza_direction : mgiza
        < src_classes=$classes@align_mkcls_src
        < tgt_classes=$classes@align_mkcls_tgt
        < ...
        > src_tgt_alignments
        :: ... {
      ...
    }

DuctTape: Branch Points

> A branch point turns one parameter into multiple experimental settings; DuctTape then runs all downstream tasks once per branch:

    task align_mkcls_src : mgiza
        < corpus=$train_src_for_align
        > classes
        :: num_classes=(Classes: small=50 large=1000)
        :: num_runs=2 {
      zcat -f $corpus > corpus
      $mgiza/bin/mkcls -c$num_classes -n$num_runs \
        -pcorpus -V$classes opt
      rm corpus
    }

> With this branch point, align_mgiza_direction is run twice: once with num_classes=50 and once with num_classes=1000

DuctTape: Workflows

> A workflow is driven by the tasks file plus an INI configuration file (tasks declare required tools, such as mgiza above, via DuctTape package definitions):

    ducttape workflow.tape -C myparams.ini

Safaba MT Deployment Process

> Deployment involves:
  > Packaging a Safaba MT system coming out of the development process
  > Staging the system for production
  > Migrating the system to our production platform
  > Activating the system within production
> Packaging: generating a software container with local copies of all data files, software modules and parameter files required to run the MT system in production
> Staging: the MT system is staged locally as a real-time translator for rigorous functionality and unit testing
> Migration: secure rsync transfer of the staged MT system to the Safaba production platform
> Activation: updating of the runtime DB and configuration files, and MT engine launch in production

Safaba EMTGlobal Online

> Web-based overlay platform and UI that supports remote development, deployment, and runtime access and monitoring of Safaba EMTGlobal MT systems
> Provides functionality similar to MS Hub and other cloud-based MT development platforms
> Primary Use Cases:
  > DIY MT platform for select Safaba clients and partners
  > Monitoring and testing platform for our end clients
  > Safaba system demonstrations
  > Internal training and development

[Screenshots: Safaba EMTGlobal Online UI]

External Workflow Integrations

[Diagram: client CMS/TMS platform (content management, translation management workflow, translation memory, CAT translation UI, content publication) connected through a Safaba Connector to the Safaba production platform (Safaba MT API and Integration Connectors, Safaba Server Frontend, DB, MT System Production Cloud)]

Translation with MT Post-Editing

> Translation Setup:
  > Source document is pre-translated by translation memory matches, augmented by Safaba MT
  > Translation memory fuzzy match threshold typically set at 75-85%
  > Pre-translations are presented to the human translator as a starting point for editing; translators can use or ignore the suggested pre-translations
> Training: translation teams typically receive training in MT post-editing
> Post-Editing Productivity Assessment:
  > Contrastive translation projects that measure and compare translation team productivity with MT post-editing versus translation using just translation memories
  > Productivity measured by contrasting translated words per hour under both conditions: MT-PE throughput / HT throughput

MT Post-Editing Productivity Assessment

> Evaluated by Welocalize in the context of our joint Dell MT Program

[Chart: per-language BLEU, post-editing distance, and productivity delta from the Welocalize assessment]

Challenge: Structured Content Translation

> Commercial enterprise translation data is often in the form of files in structured formats, converted for translation into XML-based schemas (e.g., XLIFF and TMX) with tag-annotated segments of source text
> Correctly projecting and placing these segment-internal tags from the source language to the target language is a well-known, difficult challenge for MT in general, and for statistical MT engines in particular
> Safaba has focused significant effort on developing advanced high-accuracy algorithms for source-to-target tag projection within our EMTGlobal MT solution
> Example:
  Source (EN): Click the Advanced tab, and click Change.
  Reference (PT): Clique no separador Avançado e em Alterar.

Challenge: Structured Content Translation

> Structured Tag Projection Process (the slides walk through the FR example "les ordinateurs de bureau les plus populaires pour l'école et la maison" → "popular desktops for school and home"; the inline tags shown on the slides are lost in this transcript):
  1. Strip out all internal tags from the input and remember their original contexts
  2. Translate the pure text segment and preserve word and phrase alignments
  3. Reinsert tags with rules based on alignments, contexts and tag types
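The three-step process above can be sketched in a few lines. This is a hypothetical toy, not Safaba's implementation: it handles only well-nested paired tags and projects each pair onto the span of target tokens aligned to the source tokens it wrapped, whereas the production rules also consider contexts and tag types:

```python
def strip_tag_pairs(tokens):
    """Step 1: separate inline tag tokens (e.g. '<b>', '</b>') from plain text,
    recording the source-token span each pair wrapped."""
    plain, pairs, open_stack = [], [], []
    for tok in tokens:
        if tok.startswith("</"):            # closing tag: complete a pair
            pairs.append((open_stack.pop(), len(plain), tok.strip("</>")))
        elif tok.startswith("<"):           # opening tag: remember position
            open_stack.append(len(plain))
        else:
            plain.append(tok)
    return plain, pairs                     # pairs: (span_start, span_end, name)

def project_tags(tgt_tokens, pairs, alignment):
    """Step 3: reinsert each tag pair around the min..max target positions
    aligned to its source span (one simple rule of many possible)."""
    inserts = []
    for start, end, name in pairs:
        covered = [t for s, t in alignment if start <= s < end]
        if not covered:
            continue                        # drop pairs with no aligned target
        inserts.append((min(covered), "<%s>" % name))
        inserts.append((max(covered) + 1, "</%s>" % name))
    out = list(tgt_tokens)
    for pos, tag in sorted(inserts, key=lambda x: -x[0]):  # right-to-left
        out.insert(pos, tag)
    return " ".join(out)

# toy example, with hypothetical tag placement and word alignment
src = "<b> les ordinateurs de bureau les plus populaires </b> pour l'école et la maison"
plain, pairs = strip_tag_pairs(src.split())
# step 2 (translation) is assumed done, with alignment (src index, tgt index)
tgt = "popular desktops for school and home".split()
alignment = [(1, 1), (3, 1), (6, 0), (7, 2), (8, 3), (9, 4), (11, 5)]
print(project_tags(tgt, pairs, alignment))
# → <b> popular desktops </b> for school and home
```

Step 2 is the ordinary MT decoding pass over the stripped text; the decoder's word alignments are what make step 3 possible.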

Tag Projection Accuracy Evaluation

> Goal: assess tag projection and placement accuracy of EMTGlobal version 1.1 versus 2.1, based on analysis of post-edited MT segments generated by Welocalize for Safaba's eDell MT engines in production
> Methodology: estimate accuracy by aligning the target-language raw MT output with the post-edited MT version, and assessing whether each tag is placed between the same target words on both sides
> Example (the inline tags shown on the slide are lost in this transcript):
  Reference: Clique no separador Avançado e em Alterar.
  EMTGlobal v1.1: Clique na guia Avançado e em Alterar.
  EMTGlobal v2.1: Clique na guia Avançado e em Alterar.

Tag Projection Accuracy Evaluation

> [Beregovaya, Lavie and Denkowski, MT Summit 2013]

[Tables: context-match rates for EMTGlobal v1.1 and v2.1, broken down by tag type (beginning / ending / standalone / total) and matched context (both / left / right / neither); totals: v1.1 39.95% both and 30.21% neither, v2.1 65.90% both and 9.26% neither]

> Fraction of likely incorrectly placed tags reduced from 30% to 9%
> Fraction of confirmed correctly placed tags improved from 40% to 66%
> Fraction of tags with partially-matched contexts reduced from 30% to 25%
> Data: Welocalize post-editing productivity data set

For 15 languages (3211 segments), EMTGlobal v1.1 was post-edited For 11 languages (1696 segments), EMTGlobal v2.1 was post-edited Total of 830 tags in PE segments, 821 aligned with MT output (98.9%) Client-Specific Adaptation > The majority of the MT systems Safaba develops are specifically developed and optimized for specific client content types > Data Scenario: > Some amount of client-specific data: translation memories, terminology glossaries and monolingual data resources > Additional domain-specific and general background data resources: other client-specific content types, TAUS data, other general parallel and monolingual background data > Safaba Collection of Adaptation Approaches: > > > > Data selection, filtering and prioritization methods Data mixture and interpolation methods Model mixture and interpolation methods Client-specific Automated Post-Editing (Language Optimization Engine) > Styling and Formatting post-processing modules > Terminology and DNT runtime overrides

Challenge: Content Drift

> Client-specific systems often degrade in performance over time, for two main reasons:
  1. Client content, even in controlled domains, gradually changes over time: new products, new terminology, new content developers
  2. The typical integrated setup of MT and translation memories: TMs are updated more frequently, so only harder segments are sent to MT
> We see strong evidence of content drift over time with many of our clients, especially in post-editing setups
> The ongoing generation of new translated content with MT post-editing provides opportunities for an MT feedback loop: retrain and/or adapt the MT systems on an ongoing basis
> This motivates our focus on ongoing adaptation approaches

Challenge: Content Drift

> Evidence from a typical client-specific MT system, EN-to-DE, original and retrained:
  > February 2013 system: 565K client + 964K background segments
  > March 2014 system: 594K client + 6,795K background segments (including 140K aged-out client segments)
> Two test sets:
  > Original test set from the February 2013 system build (1,200 segments)
  > Incremental test set extracted from the incremental data (500 segments)
> System Test Scores and Statistics:

  Lang | System     | Gloss Inconsist. | Original test: BLEU / MET / TER / LEN / OOVs | Incremental test: BLEU / MET / TER / LEN / OOVs
  DE   | Feb. 2013  | 55.7%            | 51.0 / 63.4 / 38.2 / 101.2 / 63              | 41.7 / 56.6 / 45.0 / 101.2 / 107
  DE   | March 2014 | 24.8%            | 52.9 / 64.2 / 36.9 / 100.5 / 33              | 60.5 / 69.9 / 30.3 / 99.9 / 31

Overnight Incremental Adaptation

> Objective: counter content drift and help maintain and accelerate post-editing productivity with fast and frequent incremental adaptation retraining
> Setting: new post-edited client data is deposited and made available for adaptation in small incremental batches
> Challenge: full offline system retraining is slow and computationally intensive, and can take several days
> Safaba Solution: implement fast, light-weight adaptations that can be executed, tested and deployed

into production within hours (overnight)
  > A suffix-array variant of Moses supports rapid updating of indexed training data
  > The Safaba automated post-editing module supports rapid retraining
  > KenLM supports rapid rebuilding of language models
> Currently in pilot testing with Welocalize and one of our major clients

Real-time Online Adaptation

> Ultimate Goal: an immediate online feedback loop between MT post-editing and the live MT system in the background
> Engineering Challenge: requires a fully integrated online solution where the MT post-editor's translation environment is directly connected to the real-time MT engine, and feeds post-edited segments immediately back to the MT engine for online adaptation
> MT Challenge: extend training of all major MT system components to operate in online mode rather than batch mode
> Main focus of Michael Denkowski's PhD thesis at LTI: a fully implemented, fully online adapting MT system
> Recently published work:
  > [Denkowski, Dyer and Lavie, 2014] EACL 2014
  > [Denkowski, Lavie, Lacruz and Dyer, 2014] EACL 2014 Workshop on Humans and Computer-assisted Translation

Real-time Online Adaptation

> Static MT System:
  > Grammar: precompiled corpus-level grammar (Chiang, 2005)
  > LM: Kneser-Ney discounted N-gram model (Chen and Goodman, 1996)
  > Feature Weights: batch (corpus-level) optimization with MERT (Och, 2003)
> Online Adaptive MT System:
  > Grammar: on-demand, sentence-level, with online learning [Denkowski et al., 2014]
  > LM: updateable Bayesian N-gram model [Denkowski et al., 2014]
  > Feature Weights: online learning with MIRA [Chiang, 2012]
> Online Adaptation: update all components immediately after each sentence is post-edited, before MT is generated for the next sentence

[Diagrams: the online adaptation loop between the post-editor and the live MT system]

Real-time Online Adaptation

> Online Grammar Extraction:
  > Index the bi-text with a suffix array; extract sentence-level grammars on demand [Lopez, 2008]
  > Index bilingual sentences from post-editing data in a separate suffix array as they become available
  > The grammar for each sentence is learned using a sample from the suffix array (S) and the full locally-indexed post-editing data (L)
> Grammar Rule Features: [feature equations shown on the slide combine counts from the sample S and the local index L]

Real-time Online Adaptation

> Tuning an Online Adaptive System Using Simulated Post-Editing:

> Real post-edited segments are not available during initial system training and tuning
> Challenge: how do we learn discriminative weights for our online features?
> Solution: use pre-generated references in place of post-editing [Hardt and Elming, 2010]

Real-time Online Adaptation

> Simulated Post-Editing Experiments:
  > Baseline MT system (cdec):
    > Hierarchical phrase-based model with suffix-array grammars
    > Large modified Kneser-Ney smoothed LM
    > MIRA optimization
  > Online Adaptive Systems: update grammars, LM, and weights independently and in combination
  > Training Data: WMT-2012 Spanish-English and NIST 2012 Arabic-English
  > Evaluation Data: WMT News Commentary test sets and out-of-domain TED talks
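The simulated post-editing idea above can be illustrated with a deliberately tiny toy (hypothetical code, not the cdec-based system): the reference stands in for the human post-edit, and the "model" - here just a memory of previously seen segment pairs - is updated after every sentence, before the next one is translated:

```python
class ToyAdaptiveMT:
    """Stand-in for an online adaptive MT system: the 'model' is just a
    memory of previously seen (source, post-edit) pairs."""
    def __init__(self):
        self.memory = {}

    def translate(self, src):
        return self.memory.get(src, "<no translation>")

    def update(self, src, post_edit):
        # in the real system this updates the grammar, LM and feature weights
        self.memory[src] = post_edit

def simulated_post_editing(system, bitext):
    """Translate each source with the current model, then update the model
    with the reference, which simulates the human post-edit."""
    outputs = []
    for src, ref in bitext:
        outputs.append(system.translate(src))  # translate before updating
        system.update(src, ref)                # reference = simulated post-edit
    return outputs

# toy bitext with a repeated segment, as is common in enterprise content
bitext = [("haga clic en cambiar", "click change"),
          ("haga clic en cambiar", "click change")]
out = simulated_post_editing(ToyAdaptiveMT(), bitext)
print(out)  # → ['<no translation>', 'click change']
```

The second occurrence benefits from the update triggered by the first, which is exactly the effect the tuning procedure needs to expose so that MIRA can learn appropriate weights for the online features.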

Real-time Online Adaptation

> Evaluation Results:

[Table: evaluation results for static vs. online adaptive systems on the WMT and TED test sets]

Real-time Online Adaptation

> Evaluation with Live Human Translator Post-Editing:
  > Fully integrated adaptive MT system with TransCenter
> Experimental Setup:
  > Six translators post-edited 4 talk excerpts totaling 100 MT-generated segments
  > Two excerpts translated by the static system, two by the adaptive system
  > Evaluated post-editing effort (HTER) and translator rating of MT suitability
> Results:
  > Adaptive system significantly outperforms the static baseline
  > Compared to simulated post-editing with static references: a small improvement in the simulated scenario leads to a significant improvement in our live scenario

Concluding Remarks > MT for Dissemination vs. MT for Assimilation: quite different! > Commercially-relevant data such as TAUS data has some significant advantages for clean lab MT modeling research work > Commercially-useful MT systems have unique requirements and introduce a broad range of interesting problems for researchers to focus on: > High-accuracy translation of structured content > Translation of terminology-heavy content, respecting brand language and style > MT adaptation with limited amounts of client-specific data > Ongoing adaptation to address content drift > Optimizing MT post-editing productivity > Real-time online adaptation > Safaba is doing some cool MT stuff! 80 Acknowledgements > My CMU collaborators, students and contributors:

Chris Dyer, Noah Smith, Michael Denkowski, Greg Hanneman, Austin Matthews, Jonathan Clark, Kenneth Heafield, Wes Feely and the c-lab group > My Safaba colleagues, current and former staff members and consultants: Bob Olszewski, Udi Hershkovich, Jonathan Clark, Michael Denkowski, Greg Hanneman, Sagi Perel, Matt Fiorillo, Austin Matthews, Pradeep Dasigi, Wes Feely, Serena Jeblee, Callie Vaughn, Laura Kieras, Ryan Carlson, Kenton Murray, Chris Dyer. > Other Collaborators: Isabel Lacruz and Greg Shreve @ Kent State; Olga Beregovaya and the team @ Welocalize 81 Thank You! 82
