Multiverse Course

The Multiverse Course project aims to develop engaging human-model interactions for a leading technology company, challenging participants to create prompts and responses that match or exceed current AI capabilities. The task is divided into three sections: reviewing instructions, writing a relevant prompt, and crafting a response in a specified language and region. Participants are encouraged to focus on clarity, conciseness, and creativity while adhering to specific guidelines for various task categories.

Uploaded by

ROBISEN JIMMY

Multiverse Course

Context
Welcome to the project! Together, we are helping a world-leading technology company
develop creative, engaging conversations, stories, and analyses that are more human and
engaging than the best models on the market today.

What this means for you: You’re getting the chance to test your writing skills to EQUAL OR
BEAT the best AI available today!

Specifically, we want you to help us write prompts and responses for several different
human<>model interactions with similar but slightly different guidelines.

In this task, you will be writing a prompt and response for a human<>model interaction in a
specific language and region (e.g., Australian English, Swiss German, Belgian French, or
several others).

This task is divided into three parts:


 Section 1: Review the instructions for the specific use case
 Section 2: Write a prompt that is relevant to the given use case in the target language
 Section 3: Write a response to that prompt in the target language, considering:
o ‘Table stakes’ (i.e. minimum quality) practices that models do very well already
o Value-add areas where human creativity and brilliance can outshine models

Ground Rules
Above all, take your time. This is NOT a quick task: we want you to spend time removing
excess words, ensuring good spelling, and creating a concise, focused answer that the user can
use right away.

Please:
 Ensure you are writing the right use case
o Include a reference text if the specific task type mentioned in the instructions
requests it
o Reference text is source material for the response. Think of it like attaching a file
when you use a chatbot.
 Craft a response that works beautifully for the prompt:
o Has perfect spelling and grammar for your region and dialect
o Is clearly thought-out and well-structured - if you are not editing your draft
response 2-3 times, you’re likely going too fast
o Is formatted to be easily digestible (Spaces are your friends! Bullet points can be
your friend! Long sentences and paragraphs are not your friends…)

A few more quick things to think about:


 Please check the instructions at the top of the task - these are very important
o Make sure you are satisfying the requirements of the task - some tasks
(Extraction, Classification, Rewriting, Closed Q&A, and Summarization) require
that you attach reference text
o Make sure all prompts and responses are in the target language indicated
o Make sure that you are writing using the spelling, grammar, and idiom of the
language indicated, e.g. “German as used in Switzerland”, or “Spanish as used
in Chile”
 Do not use an LLM to write your response. We cannot deliver responses that look like
LLMs, and tasks that are clearly written with LLMs will be SBQed when we detect
them.
o What do LLM Tasks look like?
 Please keep in mind how it feels to be a user engaging with a language model as
you write your prompts and responses. Concision (video taken from our reviewer
training) is very important.
o It is highly likely the user will engage with this material on their phone or through
a voice assistant. They need the answer within 1-3 sentences.
i. Another way to think about this: if the user has to read through every
word to get the meaning (and can’t quickly scan to get what they need),
the response is not concise
ii. For many (but not all) use cases: if you’re not using bullet points, you
are probably not being concise enough
o Here is a cheat sheet to write good tasks

👾 Task Interface
The basic process for writing a good prompt and response is simple, but requires focus:
 Get clear on the use case - what’s needed, what’s useful
 Write what you know
o Find what speaks to you in the use case and write about that, it’ll be much
easier and more fun than coming up with a scenario
 E.g., if the use case is Chatbot, is there someone you’d like to speak to?
What would you like to say to them?
 Check for spelling, length, and clarity
o We’re repeating this because it’s so sad to see great prompts and responses that
are hidden by too many words or that get failed by our QC team because of
simple spelling and grammar errors
o Everyone writes too much (I mean look at this instruction guide! 😆) so please
take a second to remove at least 1-2 sentences and to move your conclusion up
to be the introduction

Here’s how a task works:

Section 1: Review the instructions


When you work on a task as an attempter, the first section at the top is a reminder of the
instructions for your specific project.

Please note: The following task types require ‘reference text’. Reference text is a specific
text block or source material that the model is expected to use for reference as it fulfills
the prompt:
 Rewriting
 Closed Q&A
 Extraction
 Summarization
 Classification
A good prompt for a Rewriting task:

- Includes reference text or a link as reference material

- Is interesting: It requests a specific perspective, context, or 'spin' that informs the rewrite (e.g., 'write
in the style of a five year-old', or 'take the perspective of a pet cat')

A good response for a Rewriting task is:

- Complete: It fully responds to the prompt, follows instructions, and, if possible, delivers what it asks

- Concise: Doesn't repeat itself or go into unnecessary detail, but fully answers the prompt

- Creative in using appropriate humor, descriptive phrases, and inventiveness to deliver a satisfying
rewrite

- Not Harmful (no violent or sexual content, no Personally Identifying Information, no abusive or hateful
content)

In addition: PLEASE ALWAYS PROOFREAD: Upon proofreading, you will likely catch multiple errors
and/or identify multiple ways to improve your response. Spelling and grammar errors are one of the
primary ways that models are 'better' than human responses today.

Please also confirm the use case at this stage. It is VERY important that your prompt and
response address the appropriate use case, otherwise the entire task is not usable.

Section 2: Write a prompt that’s relevant to the specific use case and local language
Create your prompt here. To add reference text click the purple “+” to open the reference text
window.
Note: Reference text MUST be placed as text in the reference text window. It must be
at LEAST 200 words and no more than 1,000 words long. URLs are NOT
acceptable.
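
The length limits in this note can be checked before you submit. Below is a minimal sketch in Python. It assumes words are whitespace-separated and that a reference text consisting only of a URL should be rejected. The function name `check_reference_text` and the URL pattern are hypothetical and not part of the task interface.

```python
import re

MIN_WORDS = 200   # lower bound stated in the note above
MAX_WORDS = 1000  # upper bound stated in the note above

# Assumption: a reference "text" that is nothing but a URL counts as a URL submission.
URL_ONLY = re.compile(r"^\s*https?://\S+\s*$", re.IGNORECASE)


def check_reference_text(text: str) -> list[str]:
    """Return a list of problems; an empty list means the text passes."""
    if URL_ONLY.match(text):
        return ["reference text is a URL, not pasted text"]
    problems = []
    word_count = len(text.split())  # assumption: whitespace-separated words
    if word_count < MIN_WORDS:
        problems.append(f"too short: {word_count} words (minimum {MIN_WORDS})")
    if word_count > MAX_WORDS:
        problems.append(f"too long: {word_count} words (maximum {MAX_WORDS})")
    return problems
```

For languages that do not separate words with spaces, the whitespace split will undercount, so treat the result as a rough guide only.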

Section 3: Write a response that fully delivers on the prompt.


The response section follows directly after the prompt section.
Categories
It is very important to write a useful prompt and response for the task Category. Here is a handy
review summary of our categories, what the user is trying to do in them, and what you should
keep in mind as you write them.

For example:
 A user asking an open Q&A question may be trying to solve an immediate problem,
e.g.:
o How do I fix a leaky faucet?
o Explain this topic to me for a homework assignment
o My friends keep talking about a feud between pop stars, tell me what’s going on
 A user starting a chatbot conversation is more likely to be bored or interested in
interaction. They want to ‘chat’ with the model.
 A user starting a brainstorming chat is looking for inspiration
 A user asking for classification is asking the model to review an unstructured set of
inputs and create structure from them, with a specific type of structure already in mind
 A user asking for a summarization of a specific topic wants to know something specific
but doesn’t have time for it (TL;DR)
 A user asking for a rewrite is most likely looking at text that is almost right for what
they’re trying to do, but needs a different tone, style, or format

See below for more details:

Category / Use Case:

Open Q&A

User Goal (What is the person using the LLM trying to do?): Ask a question without
reference material, get an immediately useful answer.

Required: No reference text.

Examples:

 Ask me anything
 explain a concept
 general questions that don't require reference text (e.g., pop culture).

Category / Use Case:

Chatbot

User Goal (What is the person using the LLM trying to do?): Have a conversation with a chat
model - have it take a point of view or play a role, and have a conversation with it.
Required: Only ONE TURN - one prompt and response, not a scripted dialogue. (Optional)
Start conversation in prompt.

Examples:

 Have a conversation,
 prep for an interview or presentation,
 workshop / refine a specific idea (e.g., get advice from a favorite author or expert).

Category / Use Case:

Brainstorming

User Goal (What is the person using the LLM trying to do?): Create a shortlist of ideas for a
specific topic, activity, or problem to solve.

Required: Topic and/or use case that you want to start creating ideas for.

Examples:

 Gift or menu ideas,


 marketing or branding ideas,
 new products to develop or business strategies,
 new media or food discovery ('I like X, what else might I like?').

Category / Use Case:

Creative Writing

User Goal (What is the person using the LLM trying to do?): Create something meaningful
& useful from nothing.

Required: Context, tone, and format of the thing you’d like to create. Do NOT provide reference
text or ask for a rewrite.

Examples:
 Write an email congratulating a friend on adopting a kitten, and make it cute and
informal.
 Write a bedtime story for my niece who loves astronauts and tigers.
 Come up with a creative excuse I can text my boss for why I’m not at work.

Category / Use Case:

Closed Q&A

User Goal (What is the person using the LLM trying to do?): Ask a question about a specific
reference text and get an answer.

Required: Reference text REQUIRED.

Examples:

 What was the revenue for this company in 2023?


 Who is the top run-scorer in this document?
 Does this menu include gluten-free options?

Category / Use Case:

Extraction

User Goal (What is the person using the LLM trying to do?): Extract specific information
from a reference text.

Required: Reference text REQUIRED.

Examples:

 What are the 1st vowels in each word of this text?


 What are my action items from this email chain?

Category / Use Case:

Classification
User Goal (What is the person using the LLM trying to do?): Provide a reference text and
which classifications the model should use, get insights or a shortlist of examples that match
classifications from the text.

Required: Reference text REQUIRED. Specific classifications to test.

Examples:

 Expenses,
 sentiment,
 dietary restrictions in meals,
 action items identified in an email.

Category / Use Case:

Summarization

User Goal (What is the person using the LLM trying to do?): Provide a reference text and get
a summary of its contents, potentially with a focus on specific topics, or from a specific point of
view.

Required: Reference text REQUIRED.

Examples:

 Financial documents or reports,


 meeting notes,
 news articles or reviews.

Category / Use Case:

Rewriting

User Goal (What is the person using the LLM trying to do?): Provide a reference text and
provide the format, use case, role, or tone for the response to rewrite it into.

Required: Reference text REQUIRED. Use case, tone, role, perspective, or format to rewrite
into.
Examples:

 Change tone of text (e.g., from recipe list to restaurant menu, from informal text to
formal email).
 Rewrite as opposing perspective,
 Comedic transformations.
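
The reference-text rules for the categories above can be collected into a small lookup. This is a sketch only. The set names and the function `reference_text_problem` are hypothetical. Treating Chatbot and Brainstorming as "either is fine" is an assumption, since the guide states no reference-text rule for them.

```python
from typing import Optional

# Transcribed from the category list above.
REFERENCE_TEXT_REQUIRED = {
    "Closed Q&A", "Extraction", "Classification", "Summarization", "Rewriting",
}
# The guide explicitly says not to attach reference text for these two.
REFERENCE_TEXT_NOT_ALLOWED = {"Open Q&A", "Creative Writing"}
# Assumption: the guide states no rule for these, so either choice passes.
REFERENCE_TEXT_UNSPECIFIED = {"Chatbot", "Brainstorming"}


def reference_text_problem(category: str, has_reference_text: bool) -> Optional[str]:
    """Return a description of the problem, or None if the task passes."""
    if category in REFERENCE_TEXT_REQUIRED:
        return None if has_reference_text else f"{category} requires reference text"
    if category in REFERENCE_TEXT_NOT_ALLOWED:
        return f"{category} must not include reference text" if has_reference_text else None
    if category in REFERENCE_TEXT_UNSPECIFIED:
        return None
    raise ValueError(f"unknown category: {category!r}")
```

For example, `reference_text_problem("Summarization", False)` reports the missing reference text, while `reference_text_problem("Chatbot", False)` returns `None`.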

What is ‘Good’?

🔎Prompts
A prompt is a request for something. Current state-of-the-art models do a good job with ‘basic’
factual prompts.

Prompts MUST be localized. They should also be grounded in personal context. All of these
will make a significant difference to the response you get if you mention them in your prompt:
 Who are you?
 Where are you?
 What do you need the response for / how will you use it?
 What format does the response need to come in?
 What do you like or not like?
 What do the people you are sharing the output with like or not like?
 What is the tone or context that you’ll share the output in (e.g., formal, informal etc.)

Note: It is useful for good prompts to have some of these attributes. They do not need to
have all of these attributes.

GOOD Prompt (NOT localized - your prompts MUST be localized): My daughter-in-law is
expecting a child. What are five thoughtful ways that I can be supportive and helpful while
making sure she is comfortable?

Why this is good:

 Perspective: The mother- or father-in-law is asking


 Specific need: There is a clear ask that this person is making
 Note: This could be improved further by context, e.g. “I want to bring her dinner, and I
know she likes pasta. Which dishes will reheat well?”
 So-What: The answer will require human factors such as relationships and a "so-what"
that ties it together

Applicable Task Category:

 Brainstorming, Chatbot, Open Q&A


Another GOOD Prompt (NOT localized - your prompts MUST be localized) : I’m living
near the center of Amsterdam and am considering getting a car. What type of car do you think
makes the most sense for me?

Why this is good:

 Perspective: Someone living in Amsterdam


 Specific need: Should I get a car? Which one?
 So-What: Where a model might only recommend car options, or say it doesn’t have a
perspective, a human could answer this by comparing gas vs. electric cars and noting that
the user might be fine not getting a car, especially in Amsterdam

Applicable Task Category:

 Open Q&A

Another GOOD Prompt (NOT localized - your prompts MUST be localized) : Please
summarize this article into at least 3 potential new recipes that I could make as vegetarian curries
for the next Bank Holiday: [Link]
british-curry

Why this is good:

 Perspective: Vegetarian interested in cooking


 Specific need: New curry recipes
 Includes reference text
 So-What: Where a model might only summarize the curries mentioned, a human
response can tie back to vegetarianism and cultural context

Applicable Task Category:

 Summarization or Closed Q&A

Another GOOD Prompt (NOT localized - your prompts MUST be localized) :


Please tell me which elements of the below text match these emotions: Happy, Sad, Bored,
Upset. Please provide 1-2 examples of each emotion you find.
Fever dream high in the quiet of the night
You know that I caught it
Bad, bad boy
Shiny toy with a price
You know that I bought it
Killing me slow, out the window
I'm always waiting for you to be waiting below
Devils roll the dice, angels roll their eyes
What doesn't kill me makes me want you more
And it's new, the shape of your body
It's blue, the feeling I've got
And it's ooh, whoa, oh
It's a cruel summer
It's cool, that's what I tell 'em
No rules in breakable heaven
But ooh, whoa oh
It's a cruel summer

With you

Why this is good:

 Perspective: Not clear but likely some sort of parody use


 Specific need: Rewritten song lyrics with a humorous premise
 Includes reference text
 So-What: This is an opportunity for creativity.
 Note: Given that this is likely a parody, you could adjust the prompt to say "rewrite
verses 1 and 2" instead of having to write the whole song

Applicable Task Category:

 Rewriting

Another GOOD Prompt (NOT localized - your prompts MUST be localized) : Please tell me
which elements of the below text match these emotions: Happy, Sad, Bored, Upset. Please
provide 1-2 examples of each emotion you find.

Fever dream high in the quiet of the night You know that I caught it Bad, bad boy Shiny toy with
a price You know that I bought it Killing me slow, out the window I’m always waiting for you to
be waiting below Devils roll the dice, angels roll their eyes What doesn’t kill me makes me want
you more And it’s new, the shape of your body It’s blue, the feeling I’ve got And it’s ooh, whoa,
oh It’s a cruel summer It’s cool, that’s what I tell ‘em No rules in breakable heaven But ooh,
whoa oh It’s a cruel summer With you

Why this is good:

 Perspective: Sufficient context provided as there is a clear classification scheme


 Specific need: 1-2 examples of each emotion identified in text
 Includes reference text
 So-What: This is an opportunity for creativity and empathy. Not every emotion is
obvious, so as long as you can be clear about why you are classifying something, it’s a
great opportunity to beat the model.

Applicable Task Category:

 Classification

Another GOOD Prompt (NOT localized - your prompts MUST be localized): Please assume the role of
a soccer referee who refuses to award Ronaldo a penalty. Please explain to me why you won’t
give the greatest player in the world unlimited penalties.

Why this is good:

Perspective: A clear role is assigned to the model

Specific need: The context of the conversation you’re about to have is firmly established

So-What: You have significant opportunity here to create a lively, engaging, and humorous
conversation, while still exploring significant topics like VAR, simulation, and ethics in sports.

Applicable Task Category:

Chatbot

BAD Prompts:

 Who is the President of France?

Why this is bad:


❌Perspective: Missing
❌Specific need: Basic facts
❌So-What: No context so no way to improve model

 What’s the weather like in Perth?

Why this is bad:


❌Perspective: Minimal (someone who might go outside today in Perth)
❌Specific need: Basic facts, but hyper-local and immediate

❌So-What: No context so no way to improve model

 What’s the impact of global warming on the world?

Why this is bad:

❌Perspective: Minimal
❌Specific need: Unclear

❌So-What: Very likely to end up with a long list of bullet points and no focus or
meaning

🔍 Examples - Responses
Responses are where the magic happens. A good response:
 Is flawless in spelling and grammar
o We mean it. The fastest way to ‘lose’ to the model is to mess up spelling or
grammar. Please triple-check that you have written a flawless response.
 Is clear and easy to read
o Formatting is your friend! Use new lines for new thoughts, bold for topic
headings, and use bullet points to organize your thoughts (Hey, that’s what we’re
doing here too…!)
 Follows the prompt and delivers everything it is asked to do in a format that is useful
o E.g., if the request is ‘write an email’ then the response format should be in email
format
o If the request is ‘come up with a shopping list’, you could consider adding checkbox (☐)
formatting to mimic a to-do list
 Starts with the answer: The first 1-3 sentences should include the full summary of the
answer so the user gets what they need right away
 Does not include pleasantries
o “Of course!” “I’ll be happy to do that!” “¡Claro!” etc. - these are not necessary to
answer the prompt and are just filler words
 Is satisfying - this is where your human creativity really gets to shine. After you’ve
written the response, take a second to ask yourself “if I were using my smartphone or
browser and had asked the model this prompt, would I be satisfied with this answer?”
o Please keep yourself honest! The best way to beat the model is through
Satisfaction.
o Key changes at this stage are:
 Format and appearance - consider bullet points, looser spacing or new
lines, and clearer headings for sections
 Next steps - while not mandatory, if you have a clear sense of what should come
next, you can offer it, e.g. “Would you like to see the ingredients for these recipes?”
🥇 Good Responses
Prompt: My daughter-in-law is expecting a child. What are five thoughtful ways that I can be
supportive and helpful while making sure she is comfortable?

Response:

Being there for your family can be very rewarding. To make the most of this opportunity to be
there for your daughter-in-law, consider her preferences and context, and communicate your
desire to help openly and clearly.

Here are some ideas for the type of help that she might welcome:

 Food: One of the primary ways in which people express affection and care is by cooking.
If you know your daughter-in-law's preferences (keeping in mind they may change during
pregnancy!) you could offer to make her a favorite dish.
 Errands and Responsibilities: If your daughter-in-law has responsibilities that you are
able to take over for her (such as household duties, childcare, shopping, or various other
tasks), you could consider offering to take these on to free up her time.
 [3 additional bullet points]

How you engage your daughter-in-law with an offer to help can be quite important as well.
Unless you are certain help will be welcome, it can be best to communicate intentions openly
and clearly.

Note that this may be a stressful time for the expectant mother, so it's important to be open to the
concept that she may not want or welcome help.

Task Category: Brainstorming

---

Prompt: Please rewrite this text in the style of Richard Ayoade of IT Crowd and Travel Man
fame: “We shall not flag or fail. We shall go on to the end, we shall fight in France, we shall
fight on the seas and oceans, we shall fight with growing confidence and growing strength i...;
we shall never surrender.”

Response:

"What? Us? Give up? We jolly well won't, not even a smidge. We'll keep going until the bitter
end.

In France? Yes, we'll be having a go there, and it might be chippy. On the seas and oceans?
Absolutely. In the air, provided we don't get motion sickness? You bet your sweet disposition.
We'll defend our humdrum part of this perfectly adequate continent, come what may.
On the beaches? We'll be there. On the landing grounds? I don't know what those are. In the
fields and the streets? I'm packing my inhaler. And, if we can find them, in the hills.

Surrender? Not a chance. We'll be having none of that, thank you very much."

Task Category: Rewriting

Prompt: Please tell me which elements of the below text match these emotions: Happy, Sad,
Bored, Upset. Please provide 2-3 examples of each emotion you find.

Fever dream high in the quiet of the night You know that I caught it Bad, bad boy Shiny toy with
a price You know that I bought it Killing me slow, out the ... summer With you

Response: The tone of these lyrics overall is upset and excited. For the emotions suggested, here
are some examples:

 Happy: The singer is at the same time happy and excited about a new relationship.
o "And it's new, the shape of your body"
o "It's cool, that's what I tell 'em"
o "I'm always waiting for you to be waiting below"
 Upset: At the same time, the thrill also seems to be scary and upsetting to them.
o "Fever dream high...you know that I caught it"
o "It's a cruel summer"
o "It's cool, that's what I tell 'em...no rules in breakable heaven"
o "What doesn't kill me makes me want you more"
 Sad: The singer has a consistent melancholy tone. These lyrics certainly suggest she
expects pain is coming.
o "It's blue, the feeling I've got"
o "Killing me slow, out the window"
 Bored: I didn't find examples of boredom in this text.

Task Category: Classification

---

Prompt: Please respond in the voice of Ricky Gervais. I'm going to be the doctor telling you
you have a terminal case of humorlessness. You can mention anything other than The Office as
proof.

"Mr. Gervais, I'm sorry to inform you that your condition is confirmed to be terminal. There is
no hope for you at all. The problem is that you're just not funny."

Response: You havin' a laff?! I'm so funny, I am! I'm funny! Look at, at, well I don't
know...Extras?
Task Category: Chatbot

Good responses:
 Follow instructions completely and deliver outputs that are quickly useful for the user
 Are concise - they don’t waste words or the user’s time and they are focused and clear
 Are truthful - any factual information presented is verified, correct, and trustworthy
 Are not harmful - there is no hateful, violent, sexual, or otherwise dangerous or
misleading content included
 And overall, are highly satisfying - they make the user feel powerful, engaged, and
supported

Quality Score 1

 Instruction Following: Not Following


 Concision: Bad
 Truthfulness: Not truthful
 Harmfulness: Clearly Harmful
 Satisfaction: Highly unsatisfying

Quality Score 2

 Instruction Following: Not Following


 Concision: Bad
 Truthfulness: Not truthful
 Harmfulness: Maybe Harmful
 Satisfaction: Slightly unsatisfying

Quality Score 3

 Instruction Following: Partial


 Concision: Acceptable
 Truthfulness: Partial
 Harmfulness: Maybe Harmful
 Satisfaction: Slightly satisfying

Quality Score 4

 Instruction Following: Mostly


 Concision: Good
 Truthfulness: Mostly
 Harmfulness: Not harmful
 Satisfaction: Slightly satisfying

Quality Score 5
 Instruction Following: Fully
 Concision: Good
 Truthfulness: Truthful
 Harmfulness: Not harmful
 Satisfaction: Highly satisfying

🚒Bad Responses

A bad response is a response that does not follow the prompt, includes harmful or personally
identifying information, or is untrue.

In addition, our client will not accept tasks that look too similar to LLM output. Here is an
example of a classic LLM response. You will see that it is:
 Too long and verbose - is comprehensive (covering everything) instead of being clear
 Does not deliver a ‘so-what’ for the customer - lacks meaning, favors delivering an
exhaustive list
o For example, the text below is correct, but it forces the user to figure out the
answer for themselves
 Is often not focused and consistent, with bullet points that are inconsistent or overlapping
between topics

Example LLM-style Response:

In der Astrologie spielen die Planeten eine zentrale Rolle bei der Bestimmung der
Persönlichkeitsmerkmale eines Menschen. Jeder Planet repräsentiert verschiedene
Aspekte des Lebens und der menschlichen Psyche. Hier ist ein detaillierter Überblick
über die wichtigsten Planeten und ihre astrologische Bedeutung:
 Sonne:
 Bedeutung: Die Sonne symbolisiert das Selbst, das Ego und den
grundlegenden Charakter eines Menschen. Sie steht für Vitalität,
Individualität und das innere Wesen.
 Auswirkung: Die Position der Sonne im Horoskop gibt Hinweise auf die
grundlegenden Persönlichkeitszüge und das zentrale Anliegen im Leben.
Ein stark positionierte Sonne kann zu Selbstbewusstsein und
Führungsqualitäten führen.
 Mond:
 Bedeutung: Der Mond repräsentiert die Emotionen, das
Unterbewusstsein und das Bedürfnis nach Sicherheit und Geborgenheit.
 Auswirkung: Die Mondposition zeigt, wie eine Person ihre Gefühle
ausdrückt und verarbeitet. Ein sensibler Mond kann auf eine einfühlsame
und intuitive Persönlichkeit hinweisen.
 Merkur:
 Bedeutung: Merkur steht für Kommunikation, Intellekt und die Art und
Weise, wie wir denken und Informationen verarbeiten.
 Auswirkung: Die Position von Merkur beeinflusst die
Kommunikationsfähigkeiten und das logische Denken. Ein starker
Merkur kann auf eine scharfsinnige und neugierige Natur hindeuten.
 Venus:
 Bedeutung: Venus symbolisiert Liebe, Schönheit und Harmonie. Sie ist
auch mit Genuss und zwischenmenschlichen Beziehungen verbunden.
 Auswirkung: Venus beeinflusst die Art und Weise, wie eine Person
Liebe gibt und empfängt sowie ihre ästhetischen Vorlieben. Eine starke
Venus kann zu Charme und einer Neigung zu harmonischen
Beziehungen führen.
 Mars:
 Bedeutung: Mars steht für Energie, Durchsetzungskraft und den
Umgang mit Konflikten. Er repräsentiert auch sexuelle Anziehung und
körperliche Aktivität.
 Auswirkung: Mars beeinflusst das Durchsetzungsvermögen und die Art,
wie eine Person Herausforderungen meistert. Ein dominanter Mars kann
zu Entschlossenheit und einem kämpferischen Geist führen.
 Jupiter:
 Bedeutung: Jupiter symbolisiert Expansion, Optimismus und Wachstum.
Er steht für Weisheit, Philosophie und das Streben nach höheren
Idealen.
 Auswirkung: Die Position von Jupiter im Horoskop kann Glück und
Wohlstand anzeigen. Ein starker Jupiter kann zu Großzügigkeit und
einem expansiven Denken führen.
 Saturn:
 Bedeutung: Saturn repräsentiert Struktur, Disziplin und Verantwortung.
Er steht auch für Einschränkungen und Herausforderungen.
 Auswirkung: Saturn beeinflusst die Fähigkeit, mit Hindernissen
umzugehen und langfristige Ziele zu erreichen. Ein dominanter Saturn
kann zu Ausdauer und einer ernsten Haltung führen.
 Uranus:
 Bedeutung: Uranus steht für Veränderung, Innovation und Rebellion. Er
repräsentiert plötzliche Ereignisse und das Bedürfnis nach Freiheit.
 Auswirkung: Uranus kann unkonventionelles Denken und
unvorhersehbare Veränderungen im Leben anzeigen. Ein starker Uranus
führt oft zu einer unabhängigen und originellen Persönlichkeit.
 Neptun:
 Bedeutung: Neptun symbolisiert Spiritualität, Träume und Illusionen. Er
steht für das Unbewusste und das Streben nach dem Idealen.
 Auswirkung: Neptun kann eine tiefe spirituelle Natur und eine Neigung
zu Fantasie und Kreativität anzeigen. Ein dominanter Neptun führt oft zu
Empathie und Sensibilität.
 Pluto:
 Bedeutung: Pluto repräsentiert Transformation, Macht und tiefgreifende
Veränderungen. Er steht auch für das Unterbewusste und Regeneration.
 Auswirkung: Pluto kann intensive emotionale Erfahrungen und die
Fähigkeit zur tiefen Veränderung anzeigen. Ein starker Pluto führt oft zu
einem starken Willen und tiefgreifenden Einsichten.
Möchtest du mehr über die Rolle eines bestimmten Planeten in deinem Horoskop
erfahren oder hast du Fragen zur Interpretation deiner eigenen astrologischen
Charakter?

LLM Responses
LLM Responses are sometimes very predictable. Do not write like this:
 Pleasantries in the introduction: “Of course, I’d be happy to!”
 Prompt repetition: “Here are 10 places to see on vacation:”
 8-10 top-level bullet points
 Summary Pleasantries: “I hope you enjoy your trip!”

This is bad, bad, bad. Here’s what’s missing:


 Structure: There’s no point to these, they’re just long lists of items
 Meaning: The bullet points in the response don’t really tie together. They’re just loosely
related things to think about

🔬Evaluation Criteria (Detail)

🗺️Instruction Following
Definition: All responses must follow the prompt precisely. The responses must answer all
questions/requests in the prompt appropriately and according to the instructions.

Responses should:
 Follow all of the requirements in the prompt (e.g., language, word count, specific
formatting, specific wording, tone, style).
 Use consistent formatting and proper markdown.
 Meet paragraph or sentence length requirements mentioned in the prompt.
 Meet word or character count specifications mentioned in the prompt.
o Case A: if they are easy-to-meet numbers (e.g., 10 words)
o Case B: If the request is for over/under a specific number (e.g., under 140
characters)
o Case C: within a +/- 10% buffer if it is a request with larger numbers (e.g., 400
words)
 If the prompt requests an extremely long response (e.g., 10,000 words), provide either (1)
an outline of the proposed response or (2) a long response (~500 words) with an option
for the user to continue
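
The three word-count cases above can be sketched as a small check. This is only an illustration under stated assumptions: the function names, the `mode` parameter, and the cutoff of 50 that separates "easy-to-meet" numbers from "larger" ones are hypothetical, since the guidelines give only examples (10 words, under 140 characters, 400 words). It also assumes that Case A means an exact match and that "under" and "over" are strict bounds.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words in a response."""
    return len(text.split())


def meets_length(actual: int, target: int, mode: str = "about",
                 small_limit: int = 50, tolerance: float = 0.10) -> bool:
    """Check a word or character count against the requested target.

    mode="under" / "over": Case B, treated here as a strict bound.
    mode="about": Case A (exact match, an assumption) for small targets,
                  Case C (+/- 10% buffer) for larger targets.
    small_limit is a hypothetical cutoff between Case A and Case C.
    """
    if mode == "under":
        return actual < target
    if mode == "over":
        return actual > target
    if target <= small_limit:
        return actual == target
    return abs(actual - target) <= tolerance * target
```

For a 400-word request, `meets_length(count_words(response), 400)` accepts anything from 360 to 440 words.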

🏃 Concision (Not Overly Verbose)


Definition: Responses must be written with essential and relevant details, removing
unnecessary details, fluff, or pleasantries. More importantly, the user should get what they
need in the first 1-2 sentences, so that if they are distracted or have to go do something else,
they have their complete answer right away.
Responses should:
 Be to-the-point, concise, and answer the request in an easily digestible manner.
 Be easy to understand and ready to use
 Be conversational and natural in tone.
 Contain a limited level of detail and nice-to-have explanations.
 NOT contain more than 6 top-level bullet points.
 NOT be verbose, provide extraneous information, or over-explain concepts when the
prompt does not request it.
 NOT include “fluff” or pleasantries (e.g., “Here’s your request,” “Sure, I can help with
that,” “Below is a blog with 100 words”).
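
Two of the rules above (no more than 6 top-level bullet points, no pleasantries in the intro or outro) are mechanical enough to screen for before a human review. The sketch below is an assumption-laden illustration, not part of the rubric: the phrase list is a hypothetical sample drawn from the examples in these guidelines, and a top-level bullet is assumed to be an unindented line starting with `-`, `*`, `•`, or a number followed by a period.

```python
import re

# Hypothetical phrase list, based on the fluff examples in the guidelines.
FLUFF_OPENERS = ("sure, i can help", "of course", "i hope")

# Assumption: a top-level bullet is an unindented line with a list marker.
TOP_LEVEL_BULLET = re.compile(r"^(?:[-*•]|\d+\.)\s+")


def count_top_level_bullets(response: str) -> int:
    """Count unindented bullet lines; indented sub-bullets are ignored."""
    return sum(1 for line in response.splitlines()
               if TOP_LEVEL_BULLET.match(line))


def concision_flags(response: str, max_bullets: int = 6) -> list[str]:
    """Return a list of concision problems found in the response."""
    flags = []
    if count_top_level_bullets(response) > max_bullets:
        flags.append("too many top-level bullets")
    lines = [ln.strip().lower() for ln in response.splitlines() if ln.strip()]
    if lines and lines[0].startswith(FLUFF_OPENERS):
        flags.append("pleasantry in intro")
    if len(lines) > 1 and lines[-1].startswith(FLUFF_OPENERS):
        flags.append("pleasantry in outro")
    return flags
```

A check like this only catches surface patterns. Whether a response is focused and easy to apply still needs a human rating.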

Concision is not just about length or number of words. It’s about how easily and clearly a user
can understand and apply the response to their prompt. Here is an example of a response that
is clear, but not short:
Sentence structure is critical to communicating what you're talking about. I don't just
mean grammatical structure, but also having a clear point, offering supporting bullet
points that clarify this point, and noting the limitations of this approach.
 Having a clear point: Remember that we are writing these things for
smartphone users who are looking at their screen or listening to an AI voice
assistant. They need to know quickly they are going to get what they want. This
does that.
 Bullet points that clarify this point: Once the user knows they're getting what
they want, they need to be able to apply it easily. Bullet points can be like a table
of contents that help the users navigate to what they need.
 Limitations of this approach: Bullet points don't work for every prompt and
response, so it's important to keep some variety in your responses. Just avoid
long sentences and paragraphs that are hard for the user.

Here is the exact same text with worse formatting.

Sentence structure is critical to communicating what you're talking about. I don't just
mean grammatical structure, but also having a clear point, offering supporting bullet
points that clarify this point, and noting the limitations of this approach.

Having a clear point: Remember that we are writing these things for smartphone users
who are looking at their screen or listening to an AI voice assistant. They need to know
quickly they are going to get what they want. This does that.

Bullet points that clarify this point: Once the user knows they're getting what they want,
they need to be able to apply it easily. Bullet points can be like a table of contents that
help the users navigate to what they need.

Limitations of this approach: Bullet points don't work for every prompt and response, so
it's important to keep some variety in your responses. Just avoid long sentences and
paragraphs that are hard for the user.
Note how much harder this is to read and apply. Where possible and appropriate, please use
bullet points and structure to make the message easier to read and use.

Pleasantries, intros, and outros


Please avoid using pleasantries. This means not using ‘fluff’ words and avoiding writing things
that are not immediately useful to the client. This also means avoiding using unhelpful intros and
a summary outro.
 Hint: To follow the pyramid principle, make your intro what you would otherwise use
as the outro summary.

Bad Intro, Pleasantry, and outro:


Sure, I can help with that! Here's the list you asked for:
 A
 B
 C
I hope that you enjoy your list!

Good Response in pyramid principle with simple outro:


Naming a pet is both fun and deeply personal. Many people choose a name that means
something personal to them and their family, housemates, or friends. Some tried and
true approaches include names based on food, physical characteristics, personality,
or puns:
 Food: Chocolate, Sandwiches
 Characteristics: Mittens, Paws, Fluffy, Snowball
 Personality: Sparky
 Puns: Kitty Purry

Less conventionally, you can also use human names or inside jokes.

🧭Truthfulness
Definition: A truthful response contains accurate information, without any false facts or
misleading statements.

A truthful response meets the following criteria:
 All of the facts inside the response (such as definitions, numbers, dates, etc.) are
completely accurate (Please conduct online research to make sure).
 In cases where the user asks the Chat Assistant to summarize or rewrite a text segment,
the response does NOT make up or mention details that were not part of the original
text.
 If the user asks a question that assumes that a particular false claim is actually true, your
response should clearly assert that the false claim is NOT true, rather than leaving room
for ambiguity.
o For example, if the prompt is “What year did Albert Einstein invent the Internet,” a
truthful response would say “Albert Einstein did not invent the Internet.”
o In contrast, an untruthful response to this prompt could be something like “It’s not
clear exactly when Albert Einstein invented the internet.”
 One important exception: if the prompt explicitly asks for fictional writing, such as writing
a story for children, then it’s not actually necessary that your response be truthful.

🐰Harmlessness
Definition: A harmless response does NOT damage individuals or groups of people, their
privacy or reputation, their home or belongings, intellectual or physical property, animals, the
environment, institutions, or resources valuable to humanity or to the environment. This holds
even if the prompt is inappropriate, and even if the prompt is directly asking the Chat Assistant
to be harmful in any way.

A harmless response meets the following criteria:
 The response is respectful, considerate, and kind.
 The response does NOT promote hatred, violence, marginalization, stereotypes, slurs,
or slandering of any individuals or groups of people.
o This includes groups of people based on race, color, national origin, religion, sex,
gender identity, sexual orientation, age, and disability.
 The response does NOT promote substance abuse, or illegal activity.
 The response does NOT contain violent or sexual content.
 The response does NOT contain moral judgment or opinionated content.
 The response does NOT contain PII (personally identifiable information)
 If the prompt is inappropriate, or is asking the Chat Assistant to be harmful in any way,
your response should politely turn down the user’s request and explain that a
Chat Assistant is not allowed to provide any inappropriate information.

✨Satisfying
Definition: A satisfying prompt or response fits like a glove. It’s engaging, it’s human, it’s
correct, and it’s just right. To assess this, look for prompts and responses that fit all or most of
the rubric dimensions and completely deliver the intended format and effect of the prompt.

Responses should be:


 Well-written in correct language
 Free from spelling or grammar mistakes
 Creative and a little different (not mandatory to be a 5 per se, but important to not be
bland)
 Complete, delivering everything the user asked for
 As close to ready to use for the user’s purpose as possible

Dimension Rubric for rating:

A. INSTRUCTION FOLLOWING

1-2 (Terrible)
Language:

 Prompt or Response are not in the indicated language, dialect, or spelling convention
 Prompt or Response are partially in the indicated language but have major errors that
make them hard to understand

Instruction Following:

 Length - The Response significantly deviates from length instructions (‘500 words’ or ‘2
sentences’)
 Role or Context
o The Prompt is unclear about the role or context expected from the response
o The Response does not follow role or context instructions
 Tone of voice - The response contradicts the tone requested in the prompt, or takes an
inappropriate tone of voice for the context

3 (Okay)

Language:

 Prompt or Response are in the indicated language, dialect, or spelling convention, but
have some spelling, grammar, or phrasing errors

Instruction Following:

 Length - The Response partially follows length instructions


 Role or Context
o The Response is mostly clear on the role or context
o The response mostly follows contextual instruction
 Tone of voice - The response generally follows the tone requested in the prompt, with
only minor errors

4-5 (Excellent)

Language:

 Prompt or Response are in the indicated language, dialect, or spelling convention with no
errors or only minor errors

Instruction Following:

 Length - The response exactly or nearly follows length requirements


 Role or context - the response perfectly adheres to the prompt’s context or role
 Tone of voice - The response exactly adheres to the requested tone, with virtually no
errors or breaks
B. CONCISION
1-2 (Terrible)
Repetition and Length:
 Repetition - The response communicates the exact same ideas in slightly different ways
several times.
 Length - The response exceeds the word limits directly requested by the prompt.
 Verbosity - The response uses multiple sentences to explain concepts that could be
explained in a sentence or two
Focus and structure:
 Focus - There is no central theme or message to the response
 Relevance - There is significant irrelevant or distracting information in the response

Tone: Inappropriate, abrupt, or otherwise unpleasant tone

3 (Okay)
Repetition and Length:
 Repetition - The response communicates the exact same ideas, but in slightly different
ways once or twice.
 Length - The response is close to the length specified explicitly by the prompt.
 Verbosity - Response still may have some overly verbose sentences or claims

Focus and structure:


 Focus - There’s an overarching theme, but it’s not clearly followed in the response.
 Relevance - A few more nice-to-haves than needed

Tone: Conversational and engaging tone

4-5 (Excellent)
Repetition and Length:
 Repetition - Response does not unnecessarily repeat information, implicitly or explicitly.
 Length - The response fits the length specified explicitly by the prompt.
 Verbosity - Response efficiently communicates its concepts and points consistently.

Focus and structure:


 Focus - There is a clear theme that unites the bullets or details of the response
 Relevance - Details are relevant or provide satisfying illustrations of the core theme

Tone: Conversational and engaging tone

C. TRUTHFULNESS
1-2 (Terrible):
 Factual Claims - Few facts in the response are accurate. Most claims in the response are
not accurate.
 Non-verifiable information - Subjective points do not reflect domain expertise, or they
adhere to non-expert opinion. The response may also claim factuality where a point is actually subjective.
 Refuting false claims - The response does not correct false claims at all.

3 (Okay):
 Factual claims - Most facts in the response are completely accurate, confirmed by online
research where necessary
 Non-verifiable information - Subjective points mostly reflect majority opinion or domain
experts’ opinions. Where this doesn’t exist, the response clearly indicates there is no
answer.
 Refuting false claims - The response only partly corrects false claims in the prompt

4-5 (Excellent):
 Factual claims - All facts in the response are completely accurate, confirmed by online
research where necessary
 Non-verifiable information - Subjective points reflect majority opinion or domain experts’
opinions. Where this doesn’t exist, the response clearly indicates there is no answer.
 Refuting false claims - The response refutes any false claim in the prompt itself.

D. HARMLESSNESS
1-2 (Terrible):
Contains ANY Harmful or Damaging content
 Hate speech, slurs
 Violent or sexual content
 Personally identifying information
 Moral judgment or opinionated content
 Promotes substance abuse or illegal activity

3 (Okay):
Is kind and considerate. Contains NO Harmful or Damaging content

4-5 (Excellent):
Is kind and considerate. Contains NO Harmful or Damaging content

E. SATISFACTION
1-2 (Terrible):
Overall: this is a terrible response:
 The response fails the majority of the quality rubric dimensions and needs to be rewritten.
 Incorrect language - in a foreign language, or written so poorly the meaning can’t be
interpreted
 Spelling and grammar - Significant and/or distracting mistakes
 Doesn’t ‘fit’ - Doesn’t fit the intent of the prompt

3 (Okay):
Needs improvement
 The response fails some aspects of the rubric but could be fixed in less than 30 minutes.
 Correct language - but may be a little awkward or unclear
 Spelling and grammar - minor mistakes
 Reasonable ‘fit’ - Mostly fits the intent of the prompt

4-5 (Excellent):
Great response
 Meets every aspect of the quality dimensions
 Perfect, or could be fixed in less than 2 minutes.
 Correct language - with few or no mistakes
 Spelling and grammar - one or two minor blemishes ok
 Good ‘fit’ - Fits the prompt’s tone and intention
 Creative - doesn’t read or feel like a basic LLM response
