Style, Semantics, and Other Things
Krishnapriya Vishnubhotla (KP)
Intro: Style in NLP
● Uniqueness of writing style
● Due to:
○ Lexical choices (big words vs small words)
○ Sentence structure (short and simple vs. complex with clauses)
● Stylometry (toy feature sketch after this list):
○ Surface features (word lengths, sentence lengths)
○ Lexical features (LIWC, number of hapax legomena)
○ Syntactic features (function word frequencies, PoS tag frequencies, parse tree features, character trigrams)
● Authorship attribution, plagiarism detection, digital forensics
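As a toy illustration, a minimal extractor for a few of the surface/lexical features above might look like this (a sketch only; real stylometry pipelines use proper tokenizers, the LIWC lexicon, and syntactic parsers):

```python
from collections import Counter

def stylometric_features(text):
    # Toy extractor for a few surface/lexical stylometry features.
    words = text.split()
    counts = Counter(w.lower().strip(".,;!?") for w in words)
    n_sents = max(sum(text.count(p) for p in ".!?"), 1)
    return {
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "avg_sent_len": len(words) / n_sents,
        # Hapax legomena: words occurring exactly once.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / len(counts),
        # Character trigram counts, a common authorship-attribution feature.
        "char_trigrams": Counter(text[i:i + 3] for i in range(len(text) - 2)),
    }
```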
Form and Meaning
● Text generation process:
○ A meaning, or content, plus
○ A form, or style
● Multiple surface realisations are possible for the same meaning
● Natural language corpora:
○ Standard English Wikipedia vs. Simple English Wikipedia
○ Literary translations
● Closely related to: paraphrases
Paraphrases
● Paraphrase identification, generation
● Datasets: Quora Question Pairs, Microsoft Research Paraphrase Corpus, ParaNMT
● Semantic Textual Similarity tasks
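For concreteness, the most trivial baseline for these tasks is lexical overlap, e.g. tf-idf cosine similarity (a sketch with a made-up sentence pair, not a competitive system):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up question pair in the spirit of Quora Question Pairs.
a = "How do I get started with NLP?"
b = "What is the best way to start learning NLP?"

vec = TfidfVectorizer().fit([a, b])
sim = cosine_similarity(vec.transform([a]), vec.transform([b]))[0, 0]
print(f"tf-idf cosine similarity: {sim:.2f}")
```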
NLP: Style Transfer
● Lots of work on style transfer in NLP
● “Style” → factor of variation
○ Sentiment
○ Attributes
○ Topics
● Usually guided by the dataset used.
● Problematic:
○ What should be preserved?
○ Adds to already problematic evaluation metrics
Complications
● There are no true synonyms, only “near-synonyms”
● Changing active to passive → change of focus
● Pragmatics: viewpoint, framing, denotation, connotation, implication.
● Can draw some fuzzy boundaries between clusters of near-synonyms at a word-level
○ What about for phrases/sentences/documents?
● Style, by the literary definition: what is “lost in translation”
Meaning Representations
● Formal representation of meaning/semantics
● Lots of CL research on logical forms, compositionality
● Two relatively recent projects I came across
○ Abstract Meaning Representation (AMR)
○ Minimal Recursion Semantics (MRS)
Abstract Meaning Representation
● Rooted, directed, (edge+leaf)-labelled graph
● Uses PropBank frames
● Example: “The dog is eating a bone.”
[Figure: AMR graph for this sentence; nodes are variables/concepts, edge labels are relations]
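In PENMAN notation (the standard textual serialisation of AMR graphs), this graph comes out roughly as:

```
(e / eat-01
   :ARG0 (d / dog)
   :ARG1 (b / bone))
```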
● “The dog ate the bone that he found.”
● Has ways to handle:
○ Coreference
○ Negation
○ Numbers/quantity
○ Names
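Coreference, for instance, falls out of variable re-use: the second example sentence above becomes, roughly:

```
(e / eat-01
   :ARG0 (d / dog)
   :ARG1 (b / bone
            :ARG1-of (f / find-01
                        :ARG0 d)))
```

The re-used variable d encodes that “he” and “the dog” are the same entity.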
Generalisation capabilities
- The man described the mission as a disaster.
- The man’s description of the mission: disaster.
- As the man described it, the mission was a disaster.
- The man described the mission as disastrous.
● All four sentences receive the same AMR (below).
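The shared AMR, roughly (this is essentially the canonical example from the AMR literature):

```
(d / describe-01
   :ARG0 (m / man)
   :ARG1 (m2 / mission)
   :ARG2 (d2 / disaster))
```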
● Abstracts away morphological and syntactic variations.
● But does not handle synonyms
○ “afraid” and “terrified” are treated as different concepts.
● Useful?
○ Not yet.
○ Purpose: dataset to help develop algorithms that can generate AMRs.
Minimal Recursion Semantics
● Another formalism, from the head-driven phrase structure grammar (HPSG) tradition
● More fine-grained
● Can distinguish tense and number.
● Practical utility:
○ Has a command-line parser you can use
○ Can generate simple paraphrases
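As a concrete (hedged) sketch: the pyDelphin package wraps the ACE command-line parser mentioned above. This assumes ACE is installed, and erg.dat is a placeholder path for a compiled ERG grammar image you must obtain separately:

```python
from delphin import ace
from delphin.codecs import simplemrs

# "erg.dat" is a placeholder for a compiled ERG grammar image.
with ace.ACEParser("erg.dat") as parser:
    response = parser.interact("The dog is eating a bone.")
    mrs = response.result(0).mrs()
    print(simplemrs.encode(mrs, indent=True))
```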
Practical Utility
● These parsers fail on many real-world sentences:
○ The LIT paper reports successful parses for only 19.7% of SNLI sentences
● Using AMR to detect paraphrases (sketch below):
○ ~85% accuracy on the Microsoft Research Paraphrase Corpus
● For now, a separate research problem, not an off-the-shelf tool.
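Still, as a loose illustration of the AMR-based paraphrase idea (not the exact method behind the ~85% figure): threshold Smatch F1 between two AMR graphs. This assumes the smatch PyPI package and its get_amr_match function; the AMRs and threshold are made up:

```python
import smatch

# Toy AMRs for a made-up sentence pair.
amr_a = "(e / eat-01 :ARG0 (d / dog) :ARG1 (b / bone))"
amr_b = "(v / devour-01 :ARG0 (g / dog) :ARG1 (n / bone))"

# Returns (matched triples, triples in test AMR, triples in gold AMR).
match, n_test, n_gold = smatch.get_amr_match(amr_a, amr_b)
p, r = match / n_test, match / n_gold
f1 = 2 * p * r / (p + r) if p + r else 0.0
print("paraphrase" if f1 > 0.8 else "not paraphrase", round(f1, 2))
```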
Back to Representation Learning
● Let us assume we have some proxy information for:
○ Form
○ Meaning
[Figure: a text t is encoded into a form vector and a meaning vector, which capture stylistic similarity and semantic similarity respectively]
Neural Models
● Modified autoencoders, trained on paraphrases:
○ Encode the input into two vectors
○ Use both vectors to reconstruct the input
○ Restrict what each vector captures using motivational/adversarial discriminators (sketch below)
[Figure: autoencoder with separate semantic and syntactic latent vectors z, each fed to a classifier]
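A minimal sketch of such a model, assuming fixed-size sentence encodings as inputs (all layer shapes and names here are invented; real systems use sequence encoders and decoders):

```python
import torch
import torch.nn as nn

class TwoVectorAE(nn.Module):
    """Toy disentangling autoencoder: one latent for meaning, one for form."""

    def __init__(self, d_in=300, d_sem=64, d_syn=64, n_styles=2):
        super().__init__()
        self.enc_sem = nn.Linear(d_in, d_sem)      # semantic ("meaning") vector
        self.enc_syn = nn.Linear(d_in, d_syn)      # syntactic ("form") vector
        self.dec = nn.Linear(d_sem + d_syn, d_in)  # reconstruct from both
        # Motivational head: style SHOULD be predictable from the form vector.
        self.motiv = nn.Linear(d_syn, n_styles)
        # Adversarial head: style should NOT be predictable from the meaning
        # vector; the encoder is trained to fool this classifier.
        self.adv = nn.Linear(d_sem, n_styles)

    def forward(self, x):
        z_sem, z_syn = self.enc_sem(x), self.enc_syn(x)
        recon = self.dec(torch.cat([z_sem, z_syn], dim=-1))
        return recon, self.motiv(z_syn), self.adv(z_sem)
```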
What kinds of supervision?
● Style class labels
● Paraphrases
● Heuristic info:
○ BoW for content
● Syntax: syntax tree features
○ Tree edit distance
Datasets
● Paraphrase datasets
● Parallel style transfer datasets:
○ Formality
○ Diachronic language change
● Data-to-text datasets
○ ~Synthetic
Synthetic Dataset: PersonageNLG
● Personality model might be questionable
● BUT gives us two neat dimensions of variation.
All the losses, later…
Evaluation:
● Style transfer (swap variables + generate)
● Retrieval
● Prediction (kNN)
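The kNN probe is cheap to run; a sketch with random placeholder arrays standing in for learned form vectors and style labels:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholders: in practice these come from the trained encoder.
form_vecs = np.random.randn(200, 64)
style_labels = np.random.randint(0, 2, size=200)

knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
print("style accuracy:", cross_val_score(knn, form_vecs, style_labels, cv=5).mean())
```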
More supervision == better representations
● Kinda boring
● Just train a separate supervised model for each end-goal?
● Style transfer:
○ Generation problems
○ Evaluation problems
● Real-world text: not so cleanly separable.
:(
What would be interesting?
● Unsupervised disentanglement?
○ β-VAE in vision (loss sketch below)
○ At least for the synthetic dataset
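For reference, the β-VAE objective simply upweights the KL term of the standard VAE loss; a minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    # Reconstruction term.
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL(q(z|x) || N(0, I)); beta > 1 pressures the posterior toward the
    # factorised prior, which is what encourages disentanglement.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```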
● Evaluating the representations:
○ Probe for linguistic knowledge/features
○ Robust to “noise”? → domain adaptation/zero-shot prediction
● Using pre-trained models?
● (TBD) Should the latent spaces be entirely unrelated?
○ Where do style and semantics intersect?
○ What is a “latent space of sentences” anyway?