AACL Machine Translation Tutorial 2023

The document discusses the development of advanced multilingual machine translation (MNMT) systems, focusing on the importance of inclusivity and accessibility in bridging gaps between high-resource and low-resource languages. It outlines various architectures and techniques, including the evolution of machine translation methods, the significance of data quality, and the role of tokenization strategies in enhancing translation performance. The tutorial also highlights prominent MNMT models and their contributions to improving translation capabilities across multiple languages.


Developing State-Of-The-Art Massively Multilingual Machine Translation Systems for Related Languages

Jay Gala (AI4Bharat, IIT Madras, India)
Pranjal A. Chitale (AI4Bharat, IIT Madras, India)
Raj Dabre (NICT, Kyoto, Japan)
1
Get access to the slides here

https://github.com/AI4Bharat/aacl23-mnmt-tutorial
(under construction)
2
Self Introduction: Jay Gala (jaygala24.github.io)

● Experience
○ 2022 - present: AI Resident, AI4Bharat (IIT Madras)
○ 2021 - 2022: Research Intern, UCSD

● Research
○ Multilingual NLP
■ Translation, Language Modeling: 2022 - present
○ Efficient Deep Learning
■ Data Pruning: 2022 - present
■ Neural Architecture Search: 2021 - 2022
○ Federated Learning: 2021 - 2022

3
Self Introduction: Pranjal A. Chitale ([email protected])

● Experience
○ 2021-present: MS Student at IIT Madras (AI4Bharat)
○ 2017-2021: BE Computer Engineering, University of Mumbai
● Research
○ Multilingual NLP
■ Translation, Language Modeling.
○ Efficient Deep Learning

4
Self Introduction: Raj Dabre ([email protected])

● Experience
○ 2018-present: Researcher at NICT, Japan
■ Visiting researcher at AI4Bharat, IIT Madras (and perhaps more soon 🤫)
○ 2014-2018: MEXT Ph.D. scholar at Kyoto University, Japan
○ 2011-2014: M.Tech. Government RA at IIT Bombay, India

● Research
○ Low-Resource Natural Language Processing
■ Multilingual Machine Translation: 2012-present
■ Document Level Machine Translation: 2021-
■ Large Scale Pre-training for Generation: 2021-
○ Efficient Deep Learning:
■ Compact, flexible and fast models (2018-present)
5
Table of Contents

● Introduction + Prominent MNMT (35 mins)

● Data + Benchmark (40 mins)

● Vocabulary (20 mins)

● Break & Q/A (15-20 mins)

● Architecture + Training (70 mins)

● Automatic Evaluation (10 mins)

● Human Evaluation (10 mins)

● Future work (10 mins)

● Q/A
6
Why is Machine Translation still an important task?

● Inclusivity and Accessibility
  ○ Bridge the gap between low-resource languages (LRL) and high-resource languages (HRL)
  ○ Improve language coverage (MT currently covers only ~1K of the ~7K languages in the world)
● Data Augmentation for Multilingual Performance Enhancement
● Transfer Learning via Translation
● Unlocking Multilingual Capabilities of LLMs

7
Evolution of Machine Translation

● Rule-Based Machine Translation (RBMT), 1950 - 1980
  ○ Direct MT, Transfer-based MT, Interlingua MT
● Example-Based Machine Translation (EBMT), 1980 - 1990
● Statistical Machine Translation (SMT), 1990 - 2015
  ○ Word-based, Syntax-based, Phrase-based
● Neural Machine Translation (NMT), 2015 -
  ○ RNNs, LSTMs, Transformers

8

Evolution of Machine Translation

This tutorial focuses on the Neural Machine Translation (NMT) era of the timeline above.

9
Neural MT Basics: Encoder-Decoder Paradigm

image credits Sutskever et al. 2014 10


Neural MT Basics: Encoder-Decoder with Attention

image credits Bahdanau et al. 2015 11


Neural MT Basics: Transformer Architecture

image credits Vaswani et al. 2017 12


Neural MT Basics: Tokenization

● Word-level tokenization
  ○ Split on whitespace.
  ○ Drawbacks:
    1. Cannot handle OOV cases.
    2. Large vocabulary with redundant entries.
● Character-level tokenization
  ○ Split on characters.
  ○ No OOV. Small vocab size.
  ○ Drawbacks:
    1. Longer token sequences.
    2. Lacks the semantic meaning that is present at word level.
● Sub-word level tokenization
  ○ Intermediate solution between word-level and character-level.
  ○ Very frequent words stay at word level; rare words are represented at character level.
  ○ Best of both worlds.

13
Neural MT Basics: Subword MT
frequency word

5 low

2 lower

6 newest

3 wildest

{l, o, w, e, r, n, s, t, i, d}

Sennrich et al. 2016 14


Neural MT Basics: Subword MT - Sennrich et al. 2016
frequency word word

5 low low

2 lower lower

6 newest n e w es t

3 wildest w i l d es t

{l, o, w, e, r, n, s, t, i, d, es}

Sennrich et al. 2016 15


Neural MT Basics: Subword MT - Sennrich et al. 2016
frequency word word word

5 low low low

2 lower lower lower

6 newest n e w es t n e w est

3 wildest w i l d es t w i l d est

{l, o, w, e, r, n, s, t, i, d, es, est}

Sennrich et al. 2016 16


Neural MT Basics: Subword MT - Sennrich et al. 2016
frequency word word word word

5 low low low lo w

2 lower lower lower lo w e r

6 newest n e w es t n e w est n e w est

3 wildest w i l d es t w i l d est w i l d est

{l, o, w, e, r, n, s, t, i, d, es, est, lo}

Sennrich et al. 2016 17


Neural MT Basics: Subword MT - Sennrich et al. 2016
frequency word word word word word

5 low low low lo w low

2 lower lower lower lo w e r low e r

6 newest n e w es t n e w est n e w est n e w est

3 wildest w i l d es t w i l d est w i l d est w i l d est

{l, o, w, e, r, n, s, t, i, d, es, est, lo, low}

Sennrich et al. 2016 18
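The merge process walked through above can be reproduced in a few lines of Python. This is a minimal sketch of the BPE merge loop on the toy corpus from these slides (the helper names are ours; for real systems you would use subword-nmt or sentencepiece):

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of the chosen symbol pair into one symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words are space-separated symbols, initially single characters.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i l d e s t": 3}

for step in range(5):
    stats = get_pair_stats(vocab)
    best = max(stats, key=stats.get)          # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print(f"merge {step + 1}: {best} -> {''.join(best)}")
# On this corpus the first merges are: es, est, lo, low, ... (as on the slides)
```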


Neural MT Basics: Advances in Subword-level Tokenization

● Byte-level BPE (BBPE)
  ○ Including all Unicode characters inflates the base vocabulary.
  ○ Instead, use 256 byte-level base tokens to overcome this and ensure coverage with effectively no UNK token.
  ○ GPT-2 used BBPE, with 256 base tokens and 50K merges.
● WordPiece
  ○ Outlined in Schuster et al. 2012, popularized by BERT (Devlin et al. 2018).
  ○ Initialize a character-level vocab similar to BPE.
  ○ Instead of merging the most frequent symbol pair, choose the symbol that maximizes the likelihood of the training data once added to the vocabulary.
● Unigram (SentencePiece)
  ○ Initialize a large vocabulary and trim it based on training-data likelihood while minimizing the loss increase.
  ○ Prune until the desired size is reached, retaining base characters.
  ○ Store tokenization options with corpus probabilities, defaulting to the most likely choice.

Wang et al. 2019, Devlin et al. 2018, Kudo et al. 2018 19
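In practice these algorithms are rarely implemented by hand; SentencePiece covers both BPE and Unigram. A minimal training sketch is below; the corpus path, vocab size and character coverage are placeholder choices, not values from this tutorial:

```python
import sentencepiece as spm

# Train a Unigram-LM tokenizer on raw text (one sentence per line).
spm.SentencePieceTrainer.train(
    input="corpus.txt",            # placeholder path to your monolingual corpus
    model_prefix="mnmt_unigram",
    vocab_size=32000,
    model_type="unigram",          # "bpe" is also supported
    character_coverage=0.9995,     # <1.0 helps for scripts with very large charsets
)

sp = spm.SentencePieceProcessor(model_file="mnmt_unigram.model")
print(sp.encode("Multilingual machine translation for related languages", out_type=str))
```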


Prominent Massively Multilingual NMT systems

20
Google’s Multilingual NMT

● First approach to train a single enc-dec based model for multilingual NMT

● Shared vocabulary + Prepend target language token (<2en> / <2es>).

● Improves performance on low-resource languages (transfer-learning).

● Enables zero-shot translation.

● Data and compute efficient compared to bilingual NMT models.

Johnson et al. 2016 21
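A hedged sketch of the key data-side trick from Johnson et al.: prepend a target-language token to every source sentence so that one shared model knows which language to produce. The tag format and helper below are illustrative, not the authors' exact preprocessing script:

```python
def tag_for_target(src: str, tgt_lang: str) -> str:
    # The tag is just one extra token (from the shared vocabulary) at the start of the source.
    return f"<2{tgt_lang}> {src}"

training_examples = [
    ("How are you?", "¿Cómo estás?", "es"),
    ("How are you?", "आप कैसे हैं?", "hi"),
]
for src, tgt, lang in training_examples:
    print(tag_for_target(src, lang), "=>", tgt)
# Zero-shot translation then amounts to requesting a tag/source-language
# combination that was never seen together during training.
```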


Google’s Massively MNMT

● 59-lingual low-resource models (6-layer transformer base)

● One to many models better than many-to-many for non-English targets

○ 2-3 BLEU improvement

○ Relative over-representation of English

● Many to one models worse than many-to-many for English targets

○ 2-3 BLEU drop

○ Relative under-representation of English

Aharoni et al. 2019 22


Google’s Massively MNMT

● 103-lingual model (6-layer variation of transformer-big)

● Many-to-one and one-to-many are both better than many-to-many

○ N-way corpora is detrimental to many-to-one performance

● Supervised NMT quality degrades with more languages

● Zero-shot NMT quality increases with more languages

Ukrainian to Russian Zero-Shot

Aharoni et al. 2019 23


Google’s Massively MNMT Model In the Wild
● Main contributions

○ Temperature based data sampling for transfer-interference balance

○ Pushing number of supported language pairs to limit

○ Pushing MNMT performance with 1+ billion parameters

● Starting point: Quality is directly proportional to data size

Arivazhagan et al. 2019 24


tl;dr
One-to-many and many-to-one
models better than many-to-many
model

25
Arivazhagan et al. 2019
M2M-100: Beyond English-Centric NMT
● First MNMT model trained with large-scale non-English centric mined data.

● It outperforms English-centric systems by 10 points on the widely used BLEU


metric for evaluating machine translations
● M2M-100 is trained on a total of 2,200 language directions — or 10x more
than previous best, English-centric multilingual models.
● LASER2 model used for mining and fastText for language identification.

Fan et al. 2020 26


● M2M-100 Model : 15B parameter model.
● 12B dense and ~3B sparse language-specific (family) parameters.
● 1.2 BLEU point improvement on an average over 1B model (24E-24D).

Fan et al. 2020 27


BART

● Denoising pretraining for enc-dec LMs, similar in spirit to MLM models such as BERT, RoBERTa, etc.
● BART: English-only pretraining
● Fine-tune the models on specific tasks → transfer learning
● mBART-25 (Liu et al. 2020) and mBART-50 (Tang et al. 2020) extend the same idea to multilingual models.

Denoising Pretraining

Lewis et al. 2020 28


DeltaLM

● Initialize the encoder and decoder with pretrained multilingual encoders (InfoXLM).
● Trained with monolingual + bilingual data using MLM + TLM objectives.
● Interleaved decoder:
  ○ Leverages the complete weights of the pretrained multilingual encoder (PME), whereas XLM-style initialization randomly initializes the cross-attention (CA).
  ○ Self-attention (SA) and bottom FFNs are initialized from odd layers; CA and top FFNs from even layers.

Ma et al. 2021 29
DeltaLM

Ma et al. 2021 30
NLLB-200 : No Language Left Behind

● Pipeline components (figure): LID-200 language identification, the Stopes library for data processing and mining, MoE / dense / distilled model variants, low-resource pairs introduced at later training stages, and XSTS for human evaluation.

Costa-jussà et al. 2022 31
NLLB-200: Recipe to Scale up to 200 languages
● Mixture-of-Experts Model

○ Better sharing by routing low-resource through shared-weights.

○ Prevents overfitting on low-resource.

● Curriculum learning

○ Train on high-resource first, then introduce low-resource, to prevent overfitting.

● Self-supervised learning

○ Self-supervised learning on monolingual data for low-resource and linguistically


similar high-resource languages for improved performance.

● Diversified back-translation

○ Leverage BT data from various sources including Bilingual SMT models and
existing MNMT models (BT data diversity).

Costa-jussà et al. 2022 32


Results (figure): average BLEU on FLORES-101 and FLORES-200 (English-centric); baselines M2M-100 and DeltaLM compared with a 3.3B dense Transformer (base, with SSL, with BT) and a 54B MoE (SSL + BT).

Costa-jussà et al. 2022 33


MADLAD-400

● Rigorous filtering of monolingual data from CC across 419


languages.
● Monolingual data -> 3T tokens.
● Joint training (MASS + CE) -> 3 Enc-Dec variants (3B, 7.2B, 10.7B).
● UL2 - 8B-Decoder-only (monolingual data).

Kudugunta et al. 2023 34


Towards the Next 1000 Languages in Multilingual MT

● Scaling to 1000 languages by


leveraging supervised and
unsupervised objectives.
● Leveraging all HRL monolingual
and parallel data available to
enable transfer to LRL.
● Single stage joint training

○ denoising - MASS objective

○ translation - standard CE
objective

Siddhant et al. 2022 35


Towards the Next 1000 Languages in Multilingual MT :
Findings
● High-quality data leads to significant performance improvements on related
zero-resource pairs.
● Num (Supervised Directions) > Num (Self-supervised / Zero-resource
Directions) for maintaining performance.
● Scaling parallel data is more important than scaling monolingual data.

● Self-supervised pre-training leads to domain robustness in NMT.

● Joint-pretraining + NMT better than 2-stage training (like BART).

Siddhant et al. 2022 36


Models for related languages / Demography-specific 37
Main considerations to build SOTA models

Robust SOTA MT models follow the standard recipe for deep learning:

● Data: high-quality data, domain diversity, language relatedness
● Modeling: deeper architectures, training objectives, language relatedness
● Benchmark: multi-domain, demography-specific, formality levels

38
Data: Parallel Corpus Creation

39
Monolingual data : Introduction & Need

● Abundant on the Internet and in electronic format books


● Primarily English and available in document-level format
● Data from books not standardized, difficult to crawl and use
○ needs sophisticated extraction techniques
● Regional language websites - Valuable sources for Low-resource languages
● Collected using large-scale web-crawling efforts
○ example: C4 (English) and mC4 (multilingual)
● Web data crucial for training language models and other purposes

40
How is monolingual data curated?

Pipeline: Data Preparation → Filtering → Deduplication

● URL Filtering: remove offensive, copyrighted content and spam.
● Text Extraction: tools include boilerpipe, warcio, trafilatura, etc.
● Language Identification: script-based (Unicode) and model-based filtering (fastText, cld3, etc.).
● Line-wise Filtering: remove undesirable lines, repetitions, toxicity filters, etc.
● Document-wise Filtering: in-document repetition removal, toxicity filters, etc.
● Deduplication: fuzzy / exact substring deduplication.

Penedo et al. 2023 41
Monolingual Data Curation Efforts

● Large-scale: CommonCrawl (C4), mC4, Pile, RedPajama, RefinedWeb
  ○ General-purpose, one-size-fits-all heuristics at scale; might yield noisy corpora for some languages.
● Language-group focused: IndicCorp (v1, v2), Varta (Indian); IndoNLG (Indonesian); IndCorpus (Indigenous); WebCrawl African corpora (African); CreoleMT/CreoleEval
  ○ Fine-grained, language-specific heuristics designed to ensure high-quality corpora, even though the scale of curated data might be lower.

42
Sentence Embedding: LABSE
● Supports 109+ languages

● Dual-encoder approach for training

○ Stage 1: Continual pre-training


on MLM + TLM
○ Stage 2: Translation-ranking +
in-batch negative sampling
● Additive margin softmax similar to
SVM to discriminate good and bad
translation pairs
● One-for-all approach

Feng et al. 2020 43


Sentence Embedding: LEALLA
● A distilled LaBSE with increased
inference efficiency and low-dimensional
sentence embedding (128, 192 or 256)
● Performs comparably with LaBSE for
109+ languages

Mao and Nakagawa, 2023 44


Sentence Embedding: LASERx
● LASER1 Supports 93 languages

● LASER - encoder of BiLSTM enc-dec


NMT model
● LASER2 - SPM instead of BPE +
upsampling of low-resource
● LASER3 - distilled LASER2 into
language-specific encoders
● LASER3 competitive with LABSE

● Additional support for low-resource


languages (147 encoders in total)

Schwenk et al. 2017, Artetxe et al. 2018, Heffernan et al. 2022, Tan et al. 2023 45
Sentence Embedding: MuSR

Gao et al. 2023 46


Sentence Embedding: SONAR
● Supports 200 text languages and
37 speech.
● Dual-encoder approach for training

○ Speech-text alignment

○ Text-text alignment

Duquenne et al. 2023 47


Sentence Embedding: SONAR

Duquenne et al. 2023 48


Mining parallel data at scale: Basics

Embed sentences from comparable sources (e.g. https://news.un.org/sw/ and https://news.un.org/en/) into a shared multilingual vector space using LaBSE / LASER3, then match nearest neighbours across languages.

49
Issue: Infeasible at scale due to a very large search space

● Example: English (429M sentences) vs. Hindi (473M sentences).
● Brute-force search over 429M x 473M pairs is infeasible (~1000 sent/sec).
● FAISS index for efficient indexing, clustering, semantic matching and retrieval of dense vectors.

Johnson et al. 2019 50
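A toy sketch of the FAISS-based retrieval step (random vectors stand in for LaBSE / LASER3 embeddings, and an exact flat index is shown for clarity; web-scale mining uses trained approximate indexes plus a margin criterion over the retrieved neighbours):

```python
import faiss
import numpy as np

d = 768                                             # e.g. LaBSE embedding size
en = np.random.rand(20_000, d).astype("float32")    # stand-ins for real sentence embeddings
hi = np.random.rand(20_000, d).astype("float32")
faiss.normalize_L2(en)                              # with normalised vectors, inner product == cosine
faiss.normalize_L2(hi)

index = faiss.IndexFlatIP(d)                        # exact search; at scale use a trained
index.add(en)                                       # approximate index such as
                                                    # faiss.index_factory(d, "OPQ64,IVF65536,PQ64")

scores, neighbours = index.search(hi, 4)            # top-4 English candidates per Hindi sentence
print(scores.shape, neighbours.shape)               # (20000, 4) each
```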


CC-Matrix: Global Monolingual Mining

● Global comparison of all unique sentences between source and target


languages.
● High computational cost, but eliminates the need for manual intervention or
heuristics in identifying document-aligned data.
● FAISS-index used for efficient indexing and retrieval.

● Yields large-scale data, although compromising on quality due to global-level


comparison.
● Maximizes Recall, Precision might be compromised.

Schwenk et al. 2021 51


CC-Aligned: Comparable Corpora Mining

● Heuristic-based document-level alignment identification. (human effort).

● Sentence extraction from aligned documents.

● Local search only on sentences between aligned documents.

● Fast, scalable, more chance of mining high-quality bitext.

● Lower computational requirements compared to CC-Matrix style global


mining.
● Rich resource for non-English pairs (same English document translated to
multiple languages).
● Emphasis more on precision.

El-Kishky et al. 2020 52


Choosing the appropriate sentence embedding model

Check correlation with


human STS for the set
of languages you wish
to consider.

Gala et al. 2023 53


Parallel Corpus filtering
Q: Does noise affect model training ?

A: Depends.

If you operate at very large data and model scales.

⇒ Data at scale matters, noise has minimal impact.

[Gordon et al. 2021, Bansal et al. 2022]

If you are operating at lower data and model scales ?

=> It does. Eliminating noisy data improves NMT performance.

[Gala et al. 2023, Batheja et al. 2023]

54
Data Quality v/s Scale Tradeoff

55

Data Quality matters over scale

Gala et al. 2023
Parallel Corpus Filtering

● Embedding cosine-similarity thresholds
  ○ A 0.80 threshold is optimal for LaBSE (Ramesh et al. 2022, Gala et al. 2023).
  ○ Embedding models might vary in performance with length.
  ○ Token-count threshold heuristics might be useful to reduce false positives.
● COMET referenceless (QE) thresholds
  ○ Check COMET calibration for your desired language set.
  ○ Empirically determine language-specific QE thresholds (no prior work).

56
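A small filtering sketch, assuming the sentence-transformers port of LaBSE; the 0.80 cut-off follows the threshold cited above, and the example sentences are made up:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/LaBSE")

src = ["The weather is nice today.", "This line is unrelated noise."]
tgt = ["आज मौसम अच्छा है।", "यह वाक्य अनुवाद नहीं है।"]

src_emb = model.encode(src, convert_to_tensor=True, normalize_embeddings=True)
tgt_emb = model.encode(tgt, convert_to_tensor=True, normalize_embeddings=True)
sims = util.cos_sim(src_emb, tgt_emb).diagonal()     # similarity of each aligned pair

THRESHOLD = 0.80                                     # LaBSE threshold reported above
kept = [(s, t) for s, t, sim in zip(src, tgt, sims) if sim.item() >= THRESHOLD]
print(f"kept {len(kept)} / {len(src)} pairs")
```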
Seed Data: High-quality human-annotated data

Language-family-specific: ILCI (Jha et al. 2010) (Tourism).

Multilingual efforts:
● NLLB-seed (Maillard et al. 2023) - multi-domain, inclusion of low-resource.

● MASSIVE (FitzGerald et al. 2022) - spoken content.

BPCC-Human (Gala et al. 2023)


● Enhanced domain coverage, emphasizing diverse domains and demographic
representation to improve performance in India-centric use cases.
● Inclusion of content not typically covered in web crawls, such as informal or spoken
content, to make models more robust.
57
Benchmark

58
Benchmarks

● Benchmarks are indicators of the performance of systems on the task across different settings.
● Benchmarks are harder to create, limited in quantity as well as demographic diversity, and also variable in quality.
● NMT systems have reached a fair point; multi-domain, demography-specific benchmarks are important to drive progress further in terms of quality and suitability for production usage.
● Creation of a benchmark for tasks like NMT is much harder than it seems.
  ○ Why? With so many systems out there, biases are bound to come in. What QC procedure to use?

59
Existing benchmarks

● WMT Shared Task Benchmarks
  + Direction-specific
  + Human-generated
  - Limited language coverage
  - Limited domain coverage (usually news)
  - Only prose or formal style of text
● FLORES-x
  + Largest coverage (200+ languages)
  + Human-generated
  + Multi-domain
  - Limited demographic coverage
● NTREX-128
  + 128 languages
  + Human-generated
  - Limited demographic coverage
  - Only news (WMT)
  - Only prose or formal style of text

60
Wishlist for creating an NMT benchmark

● Multi-domain
● Diverse sources
● Demographic coverage
● Formal + informal styles
● Quality controlled
● Deduplicated with existing benchmarks

61
Benchmark Creation Procedure

Pipeline: Data Preparation → Translation → Quality Check

● Identifying Sources: multi-domain content from diverse sources of the said demographics.
● Sampling sentences: sample source sentences, keeping length and domain diversity in mind.
● Sentence Verification: get content from the sources, conduct sentence verification and domain classification.
● Translation: get the sentences translated into the target language.
● Automatic QC: automated plagiarism detection mechanism to avoid biases towards external NMT systems.
● Review (Manual QC): human verification and correction to ensure high quality.

62
Deduplication with Benchmarks

● Most benchmarks being created from Wikimedia entities / data on web.


● Most data crawls include entire internet, so susceptible to leakages.
● Apply strict deduplication technique with benchmarks, to ensure minimal
data leakages and ensure robust evaluation.

● Pre-hoc deduplication (existing benchmarks): eliminate data leakages from training / pre-training data to obtain an unbiased estimation of translation quality.
● Post-hoc deduplication (new benchmarks): only evaluate on those sentences from the benchmark which did not have overlaps with the training data.

63
Evaluation Set Leakage Elimination Strategies for NMT Data

(in increasing order of strictness)

● Parallel / Exact Dedup: eliminate (X, Y) pairs from the training data if the pair is present in the benchmark. Only eliminates exact pair matches.
● Exact Monolingual Dedup: eliminate all pairs whose monolingual side matches a benchmark sentence. Eliminates exact matches as well as paraphrases.
● Fuzzy Monolingual Dedup: eliminate all pairs whose monolingual side fuzzily matches a benchmark sentence. Here the matching is fuzzy, n-gram based and hence stricter (Gopher and Megatron use 13-gram fuzzy dedup).

64
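A minimal sketch of the fuzzy (n-gram) monolingual dedup idea: index the benchmark's word 13-grams and flag any training sentence whose monolingual side shares one. This is an illustrative implementation, not the exact Gopher/Megatron pipeline:

```python
def word_ngrams(text, n=13):
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_benchmark_index(benchmark_sentences, n=13):
    index = set()
    for sent in benchmark_sentences:
        index |= word_ngrams(sent, n)
    return index

def leaks(train_sentence, benchmark_index, n=13):
    """True if the sentence shares at least one n-gram with the benchmark."""
    return bool(word_ngrams(train_sentence, n) & benchmark_index)

benchmark = ["this held out test sentence must never be seen during training of the model at all"]
index = build_benchmark_index(benchmark)
print(leaks("web crawl line containing this held out test sentence must never be "
            "seen during training of the model at all", index))   # True -> drop the pair
```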
Modeling
Decision choices affecting massively multilingual MT

65
Core considerations

● Vocabulary

● Architecture

● Training

66
Core considerations
● Vocabulary

● Architecture

● Training

67
On Vocabulary

● Goal: Fair representation across languages


● Problem: Imbalance of data

○ Lower-resource languages get smaller share of vocab space

○ More character level representation

○ Higher fertilities

○ Longer sequences

○ Training time impacted

○ Downstream performance impacted

Arivazhagan et al. 2019 68


Solving The Vocabulary Skew

● Same number of tokens per language
  ○ Wasteful, since there is not enough training data anyway
● Balance the distribution with temperature
  ○ Original distribution: p_l, the empirical probability of sampling language l
  ○ Modified distribution: proportional to p_l^S
  ○ S = T^(-1), where T is the temperature
  ○ T = 1 recovers the original distribution; T = 100 approaches an equal distribution
  ○ Minor impact??

Arivazhagan et al. 2019 69


When does skew really hurt?

● Study by Zhang et al. 2022

○ Just don't go to character level or end up with high UNK rates

○ Training data balancing >> Vocab data balancing

Zhang et al. 2022 70


Vocabulary For Related Languages

● Shared scripts enable smaller vocabularies without going to character level

○ Opportunity to have compact models


○ Smaller softmaxes means faster training and decoding
● Transliteration to boost cognates: Boost transfer

○ Into English - Universal Romanization (Hermjakob et al. 2018,


Sennrich et al. 2016)
○ Into Kanji/Hanzi: Different languages - Similar scripts (
Song et al. 2020)
○ Into a common related language - IndicBERT (Kakwani et al. 2021),
IndicBART (Dabre et al. 2022), IndicTrans2 (Gala et al. 2023)
○ Boosting overlaps (NLU): Overlap BPE (Patil et al. 2022) 72
Vocabulary For Related Languages (2)

● Special case: Creoles (Dabre et al. 2014 & 2022, Lent et al. 2022 & 2023)

○ Free ride due to high similarity with parent languages

● Multiple segmentations as related languages

○ Kambhatla et al. 2022

○ Multiple segmentations of same sentences boost translation

■ Potato_, Po tat o_, Potat o_

■ BPE-dropout (Provilkov et al. 2020)

● Subword regularization boosts low-resource performance

73
Universal Romanization

Hermjakob et al. 2018 74


Family Specific Transliteration

● Hindi to Tamil (Indic NLP Library)


○ input_text = राजस्थान (rajasthan)
○ output_text = ராஜஸ்தாந
● Typical solution: Map to the highest resource or related language
○ IndicBART (Dabre et al. 2022)
○ IndicTrans2 (Gala et al. 2023)
○ RelateLM (Khemchandani et al. 2021)
■ Xlingual noising
○ Khatari et al. 2021
● Other solutions: Subgroup?
○ Indo-Aryan vs Dravidian
Kunchukuttan et al. 2017, Goyal et al. 2020 75
Zero-Shot Translation Capabilities of IndicTrans2

tl;dr. Zero-shot (into-English) performance of IndicTrans2 close to


NLLB-54B (N54) (explicitly tuned for those languages).
Potential for light-weight and rapid adaptation to extremely low-
resource languages. (Neubig et al. 2018)
Gala et al. 2023 76
Artificial Noise: Induce Vocabulary And Zero Shot Translation

● Bhojpuri, Chhattisgarhi
○ Extremely low-resource
○ Very similar to Hindi
■ Many spelling
variations

● Noise
○ Character span noise
○ Unigram noise

● Improved zero shot


translation
○ Up to 6 BLEU gains

● Linguistic noise ineffective
  ○ Scope for investigation

Aepli+, 2022, Maurya+, 2023 77
Orthogonal: Leveraging Ordering Information

Reorder parent-source sentence to match child-source sentence


Ensures better alignment of encoder contextual embeddings

● Significant improvements over baseline finetuning: needs a parser and re-ordering


system
○ Popovic et al. 2016, Mao et al., 2020, Jones et al. 2021, and Mao et al., 2022 also
focus on syntactic differences
○ Puduppully et al. 2023 show that monotone word order helps related language MT
○ Philippy et al. 2023 highlight more considerations for crosslingual transfer (survey)

Murthy et al., 2019 78


Summary: Make friends, not enemies!

● Share vocabulary (force it if needed?)
● Ensure balance
● Reorder
● Noise is your friend
● Leverage dictionaries
● Size impacts minimally

79
Core considerations

● Vocabulary

● Architecture

● Training

80
Architecture Variants

(figure: NLU vs. NLG model families)

image credits 81
Architecture Variants: Block Choices

● Dense
○ Most commonly used
○ Standard transformers

● Sparse
○ Recent interest
○ Mixtures-of-experts

● Hybrid
○ Partially explored (M2M)
○ Extra hyperparameter

82
(Sparsely Gated) Mixtures Of Experts

● 1 FFN becomes N FFNs

○ Insert every Kth layer

● Route to 1 or more

○ Load balancing
important
○ Lepikhin et al. 2020

● Explosively increase params

● Difficult to train

○ ST-MOE to the rescue

○ Zoph et al. 2022


83
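A toy top-1 routed MoE feed-forward layer in PyTorch, just to make the routing idea concrete; real systems (GShard, ST-MoE, NLLB) add expert capacity limits, load-balancing losses and router z-losses that this sketch omits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoELayer(nn.Module):
    """Sparsely gated FFN: every token is routed to exactly one expert."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)   # routing distribution per token
        probs, expert_ids = gate.max(dim=-1)       # top-1 routing decision
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_ids == e
            if mask.any():                         # scale each expert output by its gate prob
                out[mask] = probs[mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = Top1MoELayer(d_model=512, d_ff=2048, num_experts=4)
tokens = torch.randn(10, 512)
print(layer(tokens).shape)                         # torch.Size([10, 512])
```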
Routing Decisions Lead To Language Clusters

● Costa-jussà et al. 2022 analyze gating


vectors (NLLB)

● Plot of cosine similarity between


language level gating vectors
● Language families have similar
gating behavior
● Kudugunta et al. 2021 manually route
tasks or languages to experts
○ Suitable for related languages

○ MoLEs by Gu et al. 2018


85
Note On LLM Only MT Modeling

Zhang et al. 2022 87


Decoder Only MT Performance

Depth > Width at smaller scales!


Encoder-Decoder is still the best at scale! 88
But!

Better zero-shot performance of prefix-LMs is intriguing!


(Cool implications for related languages) 89
Summary: Wise Choices Maketh a Model!

● Use MoEs for scale
● Design language-group-specific modules
● LMs are decent MT models for zero-shot
● Dense models are not ideal
● Careful about PEs and normalization

90
Core considerations

● Vocabulary

● Architecture

● Training

91
All At Once (Joint) or Stage-wise (Incremental)?

● Joint: train one model on all corpora at once.
● Incremental: train on a corpora subset, modify the model (Stage 1), then continue training on further subsets with further modifications (Stage 2, Stage 3, ...).

92
Joint Training: All At Once!

93
Training Schedule: Joint Training

● Mixed language pairs batch (Johnson et al. 2017)

○ Mix all corpora, shuffle and then choose batches

● Useful for fully shared models

● For models with separate language encoders/decoders

○ Shard batch and feed to appropriate components

○ Special encoders for language families

95
Are Language Family Specific Models Better?

96
Yes They Are!

● From my Ph.D. thesis (2018)
● Goyal et al. 2020
● Jointly training HRL and LRL
  ○ A similar HRL paired with the LRL is best
● Training joint multilingual models
  ○ FS = Family Specific
  ○ FA = Family Agnostic
  ○ Family specific comes out ahead

Child language vs. parent language results (and multilingual Δ(FS - FA)):

Child language   | Turkish (135M) | Arabic (134M) | Hindi (60M) | Multilingual Δ(FS - FA)
Hausa (1.6M)     | -0.05          | +0.85         | +0.01       | +0.66
Uzbek (8M)       | +1.33          | +0.79         | +1.1        | +2.79
Marathi (7.3M)   | +1.64          | +1.35         | +1.87       | +2.88
Malayalam (4M)   | +2.27          | +1.53         | +2.27       | +0.44
Punjabi (5.7M)   | +1.15          | +0.34         | +1.88       | +3.75
Somali (3.5M)    | +2.28          | +2.68         | +2.55       | +0.96

97
Visualization Of MNMT Representations

● SVCCA similarity between representations (


Kuduganta et al. 2019)

○ Also see Dabre et al. 2017 and


Johnson et al. 2017

● Encoder representations cluster sentences into


language families

○ Regardless of script sharing

○ Script sharing for stronger clustering

● High resource languages cause partition

○ Low-resource languages ride the wave

● Evidence of representation invariance when fine-


tuning

○ Explains poor zero shot quality between


distant pairs 98
Empirically Determined Language Families

● Train a many-to-many model with language tokens
● Hierarchical clustering of the language tokens
  ○ Set the number of clusters by elbow-sampling
● Tan et al. 2019
● Also see Oncevay et al. 2020
● LangRank: find similar languages (Zhou et al. 2021)

Figure: predetermined language families vs. empirically determined language families via embedding clustering.
99
Is There An Optimal Number of Languages
● Does empirical clustering help? (Upper table)

○ Mostly yes

○ Random clustering gives poorer results

○ Predetermined clustering is equally good

● Language family specific models (Bottom table)

○ Universal model < Individual models

○ Family specific model > Individual models

○ Related to observations by Dabre et al. 2018

● Next steps

○ Family specific adapter layers (Bapna et al. 2019)

○ Family specific vocabulary and decoder separation

○ Behavior in extremely low-resource settings (<20k


pairs; Dabre et al, 2019) 100
Leveraging Noise and Paraphrasing For Related Languages (Aly+ 2021)

101
Joint Denoising and Adversarial Approaches For Alignment

Forcing representations of
related languages to be similar
helps!

Ko et al. 2021 102


Training Schedule: Addressing Language Equality

● Source of inequality: Corpora size skew

● Solutions: Oversampling smaller corpora

● Oversampling before training or during training?


○ Matter of implementation choice

○ Oversampling prior to training creates large duplicated corpora

103
Importance Of Temperature Based Sampling

● Naive approaches:
  ○ Ignore corpora size distributions
  ○ Sample from all corpora equally
● New approach: temperature based sampling, p_L^(1/i)
  ○ where p_L is the probability of sampling a sentence from a corpus
  ○ and i is the sampling temperature
  ○ Strongly benefits low-resource pairs

104
Arivazhagan et al. 2019
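A small sketch of temperature-based sampling as described above; the corpus sizes are invented, and values around T = 5 are commonly reported as a good transfer/interference balance:

```python
def temperature_sampling_probs(sizes, T):
    """p_l proportional to (D_l / sum_k D_k) ** (1 / T).
    T = 1 keeps the raw data distribution; large T approaches uniform sampling."""
    total = sum(sizes.values())
    weights = {lang: (n / total) ** (1.0 / T) for lang, n in sizes.items()}
    z = sum(weights.values())
    return {lang: round(w / z, 3) for lang, w in weights.items()}

corpus_sizes = {"en-fr": 40_000_000, "en-hi": 1_500_000, "en-ha": 60_000}
for T in (1, 5, 100):
    print(T, temperature_sampling_probs(corpus_sizes, T))
```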
Stage-wise Training: A Bit at a Time!

105
Why Stage-wise/Incremental Training?
● All at once can't be learned effectively

○ Missing languages

○ Data skew

○ Difficulty in handling language group phenomena explicitly

● Benefits

○ Incorporating new languages


○ Expanding capacity

■ New tokens

■ New layers

■ Adapters
106
Incorporating New Languages

● Language specific transfer
  ○ Replace vocabulary
  ○ Fine-tune on new data
  ○ Similar to Zoph et al. 2016; WECHSEL for initialization
● Expanding to new languages
  ○ Expand vocabulary
  ○ Fine-tune on old + new data
  ○ Increase computational capacity?
  ○ Surafel et al. 2018

108
Capacity Expansion Of Existing Models

● Add new components while freezing existing components


○ Lightweight training BUT
○ Previous components may not be aware of new languages
■ Poor transfer learning
○ Potential zero-shot learning
■ Will it work for distant languages
■ Similar recipe for multimodality (Duquenne et al. 2022)

● Lessons from Sachan et al. 2018; Firat et al. 2016a/b;


Bapna et al. 2019
○ Deepen encoders and decoders
○ Only train new components with old and/or new data
■ Vocabulary expansion by Surafel et al. 2018 will help
Escolano et al. 2019 109
Adapting Previously
Trained Models
● Feed forward layers to refine
outputs
○ Bapna et al. 2019
○ Partial solution to
bottleneck
○ Language pair specific
■ Zero-shot
performance?
● 13.5% larger models
● Improved high-resource pair
performance
○ Low-resource
performance kept
110
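A minimal bottleneck adapter in PyTorch in the style of Bapna et al. (2019): a small layer-normalised down/up projection with a residual connection, inserted after frozen transformer sub-layers so only these few parameters are trained; the dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: LayerNorm -> down-project -> ReLU -> up-project -> residual."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.layer_norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden):
        residual = hidden
        hidden = self.layer_norm(hidden)
        hidden = self.up(torch.relu(self.down(hidden)))
        return residual + hidden      # residual keeps the frozen base model's behaviour intact

adapter = Adapter(d_model=512)
x = torch.randn(8, 20, 512)           # (batch, seq_len, d_model)
print(adapter(x).shape)
```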
Adapters For Related Languages
● Language family specific adapters
○ Gumma et al. 2023
○ Chronopoulou et al. 2023
● Mixtures of adapters
○ Wang et al. 2023
○ Automatic adapter clustering
○ Language branches (Sun et al. 2022)
● Adapter ensembling
○ Use similar languages
○ Wang et al. 2021
● Hyperadapters (Baziotis et al. 2022)
○ Base parameters to create specific adapter instances
■ Contextual parameters (Platanios et al. 2018)
○ Similar languages support each other
○ Networks encode relatedness

111
Family Specific Adapters

● Definite advantage over language agnosticism
● Gains regardless of supervised or unsupervised directions!
● Strong gains for distant languages
  ○ Balto-Slavic languages are more similar to English

Chronopoulou et al. 2022 112


Vocabulary Expansion Not Needed For Related Languages

● Creoles and dialects can be well segmented by related language tokenizers


○ Selected tokenizer impacts performance (choose wisely!)
○ Chen et al. 2023 (mBART50 for South American indigenous languages
[Spanish based])
○ Dabre et al. 2022 (mBART50 for Mauritian Creole [French Creole])

113
Multi-stage training (Curriculum?)

● Auxiliary model (trained from scratch)
  ○ Stage 1: train the model on the BPCC
  ○ Stage 2: fine-tune the model on the seed corpora
● Downstream model (fine-tuned from the Stage 1 model)
  ○ Stage 1: train the model on the BPCC + BPCC-BT
  ○ Stage 2: fine-tune the model on the seed corpora
● Data augmentation using back translation: BPCC-BT (~400M bitext pairs)
● Experiment with forward as well as back translations!

114
Importance of Tuning on Clean Corpora

En-Indic did better with


forward translated data!
(Relatively smaller
monolingual corpora to
blame?)

Gala et al. 2023 115


Importance of Curricula
● Dabre et al. 2019: Introduce low-resource later
○ Jointly training HRLs and LRLs was suboptimal
○ Best recipe: HRL --> HRL+LRL --> LRL
■ Noisy --> Noisy+Clean --> Clean
○ Add complexity then reduce!
● Mohiuddin et al. 2022: Score and select subsets
○ Start with General Domain model
○ Score instances:
■ Use external scorer
■ Use model itself
○ Related to: Wang et al. 2017
● Faster convergence
● Better performance
● Low resource languages:
○ Scoring using related HRL?

116
Summary: Wise Choices Maketh a Good Model!

● Build incrementally, in stages, with curricula
● Freeze for efficiency and to avoid forgetting
● Add vocabulary as applicable
● Leverage noise, paraphrases and adversaries wisely
● Group-specific models and adapters are promising
● Careful about the balance of languages

117
Model Compression: Light as a Feather!

118
Why Compression?

Example: "I am a boy" → 「私は男の子です」

Behavior | #Params | Translation Quality | Translation Speed
Default  | Many    | High                | Slow
         | Few     | Low                 | Fast
Desired  | Few     | High                | Fast

The desired behavior is achieved using sequence distillation (Kim and Rush, 2016).

Kim and Rush, 2016 119


Compression Approach: Pruning

● Structured Pruning: Remove specific layers or groups


● Unstructured Pruning: Remove least important weights or neurons
Han et al. 2015 120
Memory-efficient NLLB-200
● Determine statistics: Expert importance and routing probabilities
● Prune: Per layer or Globally

Koishekenov et al. 2022 121
Quantization

7B params = 28 GB in FP32 = 3.5 GB in FP4

Go low!

Dettmers et al. 2022 122
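The arithmetic behind the slide, as a tiny helper (parameter memory only; activations, KV caches and optimizer state are ignored):

```python
def model_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Rough parameter-memory footprint in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

for bits in (32, 16, 8, 4):
    print(f"7B params @ {bits:>2}-bit: {model_memory_gb(7e9, bits):5.1f} GB")
# 32-bit: 28.0 GB, 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```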
Distillation

● Word level distillation (online)


○ Use parent distributions
○ Not always good

● Sequence level distillation (offline)


○ Translate to get parents token distributions
○ Mostly enough

● Interpolation (hybrid)
○ Use both losses

Kim and Rush, 2016 123
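A hedged sketch of the offline (sequence-level) recipe: decode the training sources with a strong teacher and use its outputs as the student's targets. The Hugging Face NLLB checkpoint below is only an example teacher, not the setup from Kim and Rush:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/nllb-200-distilled-600M"            # example teacher checkpoint
tok = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
teacher = AutoModelForSeq2SeqLM.from_pretrained(name)

sources = ["The committee approved the new policy yesterday."]
batch = tok(sources, return_tensors="pt", padding=True)
out = teacher.generate(
    **batch,
    forced_bos_token_id=tok.convert_tokens_to_ids("hin_Deva"),   # decode into Hindi
    num_beams=5,
    max_new_tokens=64,
)
synthetic_targets = tok.batch_decode(out, skip_special_tokens=True)
# (source, synthetic_target) pairs then form the student's training corpus,
# trained with ordinary cross-entropy: that is sequence-level distillation.
print(synthetic_targets)
```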


Distilling Large Models

● Train the large model: NLLB 54B MoE.
● Optionally adapt it to the target domain (e.g. a Wiki corpus).
● Apply word-level (online) distillation on the corpus to obtain 1.3B and 600M dense student models.
● Offline sequence-level distillation is too expensive for large NLLB models!

Costa-jussà et al. 2022 125


Summary: The Holy Trinity

● Prune and Tune
● Distill (online?)
● Quantize

128
Implementations and Toolkits

129
Toolkits: The Big 4
● Fairseq (v1/v2) by Meta
○ Pytorch
○ Comprehensive for MT pre-training and fine-tuning
○ All rounded and most popular among researchers
● Transformers by HuggingFace
○ Pytorch/Tensorflow
(Side note: these are large code bases, overwhelming for beginners!)
○ Most popular for fine-tuners
○ Has a hub for all models
● Tensor2tensor by google
○ Tensorflow
○ Deprecated in favor of TRAX
● OpenNMT by various researchers
○ One of the earliest
○ Both pytorch and tensorflow
130
More Toolkits
● MarianMT by several researchers
○ Written in C++ and minimal dependencies
● JoeyNMT by Amsterdam and Heidelberg universities
○ Minimal
○ For beginners and learners
● Sockeye by Amazon
○ Pytorch
○ Distributed training and efficient inference
● YANMTT by NICT (actually mainly ME)
○ Pytorch
○ Distributed multilingual pre-training and fine-tuning (of lightweight
models) at scale
■ Started out as a pre-training script
○ My hobby/pet project :-)
131
On Evaluation
We built a model but how good is it?

132
Taxonomy of MT Metrics

Lee et al. 2023, Sai et al. 2020 133


Which String-based Metric Is Reliable These Days?

chrF/chrF++ most reliable.


Do significance testing.
Especially for Indic languages.
End of BLEU?

Sai et al. 2023 134


COMET

● Inputs: source, translation, and (optionally) a reference.
● COMET-DA: trained on Direct Assessment annotations (a single score).
● COMET-MQM: trained on MQM annotations, converted to a score via a formula.

Rei et al. 2020, Freitag et al. 2021 135
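Scoring with a learned metric takes a few lines with the unbabel-comet package; the checkpoint name and the example triplet below are illustrative:

```python
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")   # example DA-trained checkpoint
model = load_from_checkpoint(model_path)

data = [{
    "src": "आज मौसम अच्छा है।",
    "mt":  "The weather is good today.",
    "ref": "The weather is nice today.",
}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.system_score)       # corpus-level score; output.scores has per-segment scores
```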
IndicCOMET

● COMET-DA fine-tuned on Indic DA data → IndicCOMET-DA
● COMET-MQM fine-tuned on Indic MQM data → IndicCOMET-MQM
● Improved human correlations!
● Excellent zero-shot capability. Influence of related languages?

Sai et al. 2023 136


Limitations of Learned Metrics
● Amrhein et al. 2022: COMET makes mistakes
○ Not sensitive to number and named entities discrepancies
○ Hard to fix biases via fine-tuning
● Moghe et al. 2023: Poor quality estimation - downstream performance
correlation
○ Metrics have negligible correlations with the extrinsic evaluation
of downstream outcomes
○ Scores provided by neural metrics are not interpretable
○ Diverse references and metrics to produce labels instead of scores
(MQM?)

137
LLMs For Evaluation (GEMBA)

Large PaLM models prompted for MQM gave better correlation with human annotations!

Kocmi et al. 2023, Fernandes et al. 2023 138


Evaluation Implementations

● String-based

○ SacreBLEU (Post et al. 2018)

● Model-based

○ COMET (Rei et al. 2020, Freitag et al. 2021)

○ BLEURT (Sellam et al. 2020, Pu et al. 2021)

○ IndicCOMET2 - WIP

always perform significance testing

139
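A minimal sketch with sacreBLEU for the string-based metrics discussed above (the hypotheses and references are toy data; a paired significance test is sketched via the CLI in the trailing comment):

```python
import sacrebleu

hyps = ["The cat sits on the mat .", "He did not go to school today ."]
refs = [["The cat sat on the mat .", "He didn't go to school today ."]]   # one reference stream

bleu = sacrebleu.corpus_bleu(hyps, refs)
chrf = sacrebleu.corpus_chrf(hyps, refs, word_order=2)   # word_order=2 gives chrF++
print(f"BLEU   = {bleu.score:.2f}")
print(f"chrF++ = {chrf.score:.2f}")

# For comparing systems, prefer paired significance testing, e.g. with the CLI:
#   sacrebleu refs.txt -i sysA.txt sysB.txt -m bleu chrf --paired-bs
```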
Human Evaluation: For Humans By Humans!

140
Human Evaluation

● Adequacy :- Faithfulness of the translation to its source.

○ Errors like omission, unwarranted additions / mistranslations.

○ Needs bilingual evaluators

● Fluency :- How fluent is the translation as a standalone sentence in the


target language.
○ Factors include naturalness, grammatical / spelling errors.

○ Can be conducted by monolingual evaluators

● Most critical aspect of Human evaluation : Inter-Annotator-Agreement

○ Humans can be subjective, sometimes too strict or too lenient, we want


consistency.

141
Human Evaluation : Relative Ranking

● Annotators shown outputs of 2 or more systems and have to rank the


systems.
● Relative measure.

● Not an indicator of how-good or how-bad a system is.

● Usually poor Inter-Annotator-Agreement (IAA) observed for relative ranking.

● Lacks interpretability and offers limited insights about aspects to improve


upon (Adequacy / Fluency).

142
Direct Assessment

Absolute 0–100 rating.

DA-Adequacy

Annotators rate how adequately the candidate expresses the meaning


of the corresponding reference translation on a scale of 0-100.

DA fluency

Annotators rate on a scale of 0-100 about how much they agree that a
given translation is fluent target language text.

Reference-free evaluation.

Graham et al. (2013, 2014) 143


Relative Ranking v/s Direct Assessment

Figure: an input text is translated by multiple MT systems; human evaluators either rank the system outputs against each other (relative ranking) or score each output on an absolute scale (direct assessment). Aggregation strategies include averaging and max-voting.

144
Multidimensional Quality Metrics - MQM

Lommel et al. 2014; Freitag et al. 2021 145


(Example of an MQM annotation)

Annotators assign scores based on the quality of the translation and the identified errors.

MQM provides more informed judgements.

Sai et al. 2023 146


Semantic Textual Similarity - STS

More focus on Adequacy than Fluency!

Why? Fluency is subjective, and modern use-cases like social media lack fluency by design.

Agirre et al. 2016 147


Cross-lingual STS - XSTS

● Large degree of variance in STS ratings across different language pairs as


pool of annotators and outputs are different.

Solution ? Calibration!
● Annotators perform the task on actual data + calibration set (English output
+ Reference) for cross-lingual consistency, as evaluation is English-centric.
● Compute average scores per annotator for calibration set.

● Check annotator-wise deviations on the calibration set and apply this


correction factor on the actual data as well to normalize it and impose some
cross-lingual consistency.

Licht et al. 2022 148
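A simple sketch of the calibration idea: estimate each annotator's offset on the shared calibration set and subtract it from their real ratings. The additive correction and the toy numbers are our own simplification of the procedure described above:

```python
import statistics

def calibrate_xsts(raw_scores, calibration_scores):
    """Per-annotator calibration in the spirit of Licht et al. (2022).

    raw_scores / calibration_scores: {annotator: [scores]}. Each annotator's
    deviation from the global calibration mean is subtracted from their real
    ratings; the exact normalisation used in practice may differ.
    """
    global_mean = statistics.mean(
        s for scores in calibration_scores.values() for s in scores
    )
    corrected = {}
    for annotator, scores in raw_scores.items():
        offset = statistics.mean(calibration_scores[annotator]) - global_mean
        corrected[annotator] = [s - offset for s in scores]
    return corrected

raw = {"a1": [4.0, 3.5, 5.0], "a2": [2.0, 3.0, 2.5]}    # lenient vs. strict annotator
calib = {"a1": [4.5, 4.0], "a2": [3.0, 2.5]}            # ratings on the same calibration items
print(calibrate_xsts(raw, calib))
```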


The future
Deepening the niche

150
What is still lacking?
● Low-resource performance can still be improved.
○ Still many extremely low-resource languages
○ Still many relatedness ideas to explore and exploit
● Idiomatic usage needs to be covered.
● Non-English numerals expressed in words might result in hallucinations.
○ Many specific cases
● Coverage of dialects and more languages
● Extension to speech-translation
○ Speech to speech is the dream
● Improve LLMs for MT
○ Will Decoders replace Encoder-Decoders? (I think not)
● Document-level translation underexplored
○ Improve long context handling 151
Bringing in dialects

● Dialects in India

○ Dialects of Marathi (42 dialects)

○ I speak only 3! :-(


(Language resource papers should not be looked down upon!)

● Dialects in Japan
  ○ I can handle the Osaka and Kyoto dialects, but there's MORE!

● Dialects is what people use but mostly spoken

○ Spoken language data collection needs focus

○ Creoles are mostly spoken

○ Case for direct speech-speech MT?

● Let us collect data aggressively!


152
Summary

● Big Picture

○ The current state of MT

● Data Creation

○ Manual and mining

● Modeling

○ Models at scale

○ Compactness

● Evaluation

○ Automatic

○ Human 153
Q&A
Thank You

154
