Multiverse Course
Context
Welcome to the project! Together, we are helping a world-leading technology company
develop creative conversations, stories, and analyses that are more human and engaging
than the best models on the market today.
What this means for you: You’re getting the chance to test your writing skills to EQUAL OR
BEAT the best AI available today!
Specifically, we want you to help us write prompts and responses for several different
human<>model interactions with similar but slightly different guidelines.
In this task, you will be writing a prompt and response for a human<>model interaction in a
specific language and region (e.g., Australian English, Swiss German, Belgian French, or
several others).
Ground Rules
Above all, take your time. This is NOT a quick task: we want you to spend time removing
excess words, ensuring good spelling, and creating a concise, focused answer that the user can
use right away.
Please:
Ensure you are writing the right use case
o Include a reference text if the specific task type mentioned in the instructions
requests it
o Reference text is source material for the response. Think of it like attaching a file
when you use a chatbot.
Craft a response that works beautifully for the prompt:
o Has perfect spelling and grammar for your region and dialect
o Is clearly thought-out and well-structured - if you are not editing your draft
response 2-3 times, you’re likely going too fast
o Is formatted to be easily digestible (Spaces are your friends! Bullet points can be
your friend! Long sentences and paragraphs are not your friends…)
👾 Task Interface
The basic process for writing a good prompt and response is simple, but requires focus:
Get clear on the use case - what’s needed, what’s useful
Write what you know
o Find what speaks to you in the use case and write about that, it’ll be much
easier and more fun than coming up with a scenario
E.g., if the use case is Chatbot, is there someone you’d like to speak to?
What would you like to say to them?
Check for spelling, length, and clarity
o We’re repeating this because it’s so sad to see great prompts and responses that
are hidden by too many words or that get failed by our QC team because of
simple spelling and grammar errors
o Everyone writes too much (I mean look at this instruction guide! 😆) so please
take a second to remove at least 1-2 sentences and to move your conclusion up
to be the introduction
Please note: The following task types require ‘reference text’. Reference text is a specific
text block or source material that the model is expected to use for reference as it fulfills
the prompt:
Rewriting
Closed Q&A
Extraction
Summarization
Classification
A good prompt for a Rewriting task:
- Is interesting: It requests a specific perspective, context, or 'spin' that informs the rewrite (e.g., 'write
in the style of a five year-old', or 'take the perspective of a pet cat')
- Complete: It fully responds to the prompt, follows instructions, and, if possible, delivers what it asks
- Concise: Doesn't repeat itself or go into unnecessary detail, but fully answers the prompt
- Creative in using appropriate humor, descriptive phrases, and inventiveness to deliver a satisfying
rewrite
- Not Harmful (no violent or sexual content, no Personally Identifying Information, no abusive or hateful
content)
In addition: PLEASE ALWAYS PROOFREAD. When you proofread, you will likely catch multiple errors
and/or identify multiple ways to improve your response. Clean spelling and grammar are one of the
primary ways that model responses are 'better' than human responses today.
Please also confirm the use case at this stage. It is VERY important that your prompt and
response address the appropriate use case, otherwise the entire task is not usable.
Section 2: Write a prompt that’s relevant to the specific use case and local language
Create your prompt here. To add reference text click the purple “+” to open the reference text
window.
Note: Reference text MUST be placed as text in the reference text window. It must be
at LEAST 200 words and no more than 1,000 words long. URLs are NOT
acceptable.
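The limits in the note above can be checked before you paste your reference text. The following is a minimal Python sketch, not part of the task interface. The function name `check_reference_text` is hypothetical, words are assumed to be whitespace-separated (the tool's own counter may differ), and flagging any URL inside the text is a deliberately strict reading of "URLs are NOT acceptable".

```python
import re

MIN_WORDS = 200    # lower limit stated in the note above
MAX_WORDS = 1000   # upper limit stated in the note above


def check_reference_text(text: str) -> list[str]:
    """Return a list of problems; an empty list means the text passes."""
    problems = []
    words = text.split()  # assumption: words are separated by whitespace
    if len(words) < MIN_WORDS:
        problems.append(f"too short: {len(words)} words, need at least {MIN_WORDS}")
    if len(words) > MAX_WORDS:
        problems.append(f"too long: {len(words)} words, limit is {MAX_WORDS}")
    # Strict reading (assumption): flag any URL found inside the text.
    if re.search(r"https?://\S+|www\.\S+", text):
        problems.append("contains a URL: paste the text itself instead")
    return problems
```

Calling `check_reference_text(my_text)` on a draft returns an empty list when the text is within both limits and contains no link.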
For example:
A user asking an open Q&A question may be trying to solve an immediate problem,
e.g.:
o How do I fix a leaky faucet?
o Explain this topic to me for a homework assignment
o My friends keep talking about a feud between pop stars, tell me what’s going on
A user starting a chatbot conversation is more likely to be bored or interested in
interaction. They want to ‘chat’ with the model.
A user starting a brainstorming chat is looking for inspiration
A user asking for classification is asking the model to review an unstructured set of
inputs and create structure from them, with a specific type of structure already in mind
A user asking for a summarization of a specific topic wants to know something specific
but doesn’t have time to read the full text (TL;DR)
A user asking for a rewrite is most likely looking at text that is almost right for what they’re trying
to do, but needs a different tone, style, or format
Open Q&A
User Goal (What is the person using the LLM trying to do?): Ask a question without
reference material, get an immediately useful answer.
Examples:
Ask me anything
explain a concept
general questions that don't require reference text (e.g., pop culture).
Chatbot
User Goal (What is the person using the LLM trying to do?): Have a conversation with a chat
model - have it take a point of view or play a role, and have a conversation with it.
Required: Only ONE TURN - one prompt and response, not a scripted dialogue. (Optional)
Start conversation in prompt.
Examples:
Have a conversation,
prep for an interview or presentation,
workshop / refine a specific idea (e.g., get advice from a favorite author or expert).
Brainstorming
User Goal (What is the person using the LLM trying to do?): Create a shortlist of ideas for a
specific topic, activity, or problem to solve.
Required: Topic and/or use case that you want to start creating ideas for.
Examples:
Creative Writing
User Goal (What is the person using the LLM trying to do?): Create something meaningful
& useful from nothing.
Required: Context, tone, and format of the thing you’d like to create. Do NOT provide reference
text or ask for a rewrite.
Examples:
Write an email congratulating a friend on adopting a kitten, and make it cute and
informal.
Write a bedtime story for my niece who loves astronauts and tigers.
Come up with a creative excuse I can text my boss for why I’m not at work.
Closed Q&A
User Goal (What is the person using the LLM trying to do?): Ask a question about a specific
reference text and get an answer.
Examples:
Extraction
User Goal (What is the person using the LLM trying to do?): Extract specific information
from a reference text.
Examples:
Classification
User Goal (What is the person using the LLM trying to do?): Provide a reference text and
which classifications the model should use, get insights or a shortlist of examples that match
classifications from the text.
Examples:
Expenses,
sentiment,
dietary restrictions in meals,
action items identified in an email.
Summarization
User Goal (What is the person using the LLM trying to do?): Provide a reference text and get
a summary of its contents, potentially with a focus on specific topics, or from a specific point of
view.
Examples:
Rewriting
User Goal (What is the person using the LLM trying to do?): Provide a reference text and
provide the format, use case, role, or tone for the response to rewrite it into.
Required: Reference text REQUIRED. Use case, tone, role, perspective, or format to rewrite
into.
Examples:
Change tone of text (e.g., from recipe list to restaurant menu, from informal text to
formal email).
Rewrite as opposing perspective,
Comedic transformations.
What is ‘Good’?
🔎Prompts
A prompt is a request for something. Current state-of-the-art models do a good job with ‘basic’
factual prompts.
Prompts MUST be localized. They should also be grounded in personal context. All of these
will make a significant difference to the response you get if you mention them in your prompt:
Who are you?
Where are you?
What do you need the response for / how will you use it?
What format does the response need to come in?
What do you like or not like?
What do the people you are sharing the output with like or not like?
What is the tone or context that you’ll share the output in (e.g., formal, informal, etc.)?
Note: It is useful for good prompts to have some of these attributes. They do not need to
have all of these attributes.
Open Q&A
Another GOOD Prompt (NOT localized - your prompts MUST be localized) : Please
summarize this article into at least 3 potential new recipes that I could make as vegetarian curries
for the next Bank Holiday: [Link]
Rewriting
Another GOOD Prompt (NOT localized - your prompts MUST be localized) : Please tell me
which elements of the below text match these emotions: Happy, Sad, Bored, Upset. Please
provide 1-2 examples of each emotion you find.
Fever dream high in the quiet of the night You know that I caught it Bad, bad boy Shiny toy with
a price You know that I bought it Killing me slow, out the window I’m always waiting for you to
be waiting below Devils roll the dice, angels roll their eyes What doesn’t kill me makes me want
you more And it’s new, the shape of your body It’s blue, the feeling I’ve got And it’s ooh, whoa,
oh It’s a cruel summer It’s cool, that’s what I tell ‘em No rules in breakable heaven But ooh,
whoa oh It’s a cruel summer With you
Classification
Another GOOD Prompt (NOT localized - your prompts MUST be localized): Please assume the role of
a soccer referee who refuses to award Ronaldo a penalty. Please explain to me why you won’t
give the greatest player in the world unlimited penalties.
Specific need: The context of the conversation you’re about to have is firmly established
So-What: You have significant opportunity here to create a lively, engaging, and humorous
conversation, while still exploring significant topics like VAR, simulation, and ethics in sports.
Chatbot
BAD Prompts:
❌Perspective: Minimal
❌Specific need: Unclear
❌So-What: Very likely to end up with a longlist of bullet points and no focus or
meaning
🔍 Examples - Responses
Responses are where the magic happens. A good response:
Is flawless in spelling and grammar
o We mean it. The fastest way to ‘lose’ to the model is to mess up spelling or
grammar. Please triple-check that you have written a flawless response.
Is clear and easy to read
o Formatting is your friend! Use new lines for new thoughts, bold for topic
headings, and use bullet points to organize your thoughts (Hey, that’s what we’re
doing here too…!)
Follows the prompt and delivers everything it is asked to do in a format that is useful
o E.g., if the request is ‘write an email’ then the response format should be in email
format
o If the request is ‘come up with a shopping list’, you could consider adding checkbox (☐)
formatting to mimic a to-do list
Starts with the answer: The first 1-3 sentences should include the full summary of the
answer so the user gets what they need right away
Does not include pleasantries
o “Of course!” “I’ll be happy to do that!” “¡Claro!” etc. - these are not necessary to
answer the prompt and are just filler words
Is satisfying - this is where your human creativity really gets to shine. After you’ve
written the response, take a second to ask yourself “if I were using my smartphone or
browser and had asked the model this prompt, would I be satisfied with this answer?”
o Please keep yourself honest! The best way to beat the model is through
Satisfaction.
o Key changes at this stage are:
Format and appearance - consider bullet points, looser spacing or new
lines, and clearer headings for sections
Next steps - while not mandatory, if you have a clear sense of what should come next, you can
offer it, e.g. “Would you like to see the ingredients for these recipes?”
🥇 Good Responses
Prompt: My daughter-in-law is expecting a child. What are five thoughtful ways that I can be
supportive and helpful while making sure she is comfortable?
Response:
Being there for your family can be very rewarding. To make the most of this opportunity to be
there for your daughter-in-law, consider her preferences and context, and communicate your
desire to help openly and clearly.
Here are some ideas for the type of help that she might welcome:
Food: One of the primary ways in which people express affection and care is by cooking.
If you know your daughter-in-law's preferences (keeping in mind they may change during
pregnancy!) you could offer to make her a favorite dish.
Errands and Responsibilities: If your daughter-in-law has responsibilities that you are
able to take over for her (such as household duties, childcare, shopping, or various other
tasks), you could consider offering to take these on to free up her time.
[3 additional bullet points]
How you engage your daughter-in-law with an offer to help can be quite important as well.
Unless you are certain help will be welcome, it can be best to communicate intentions openly
and clearly.
Note that this may be a stressful time for the expectant mother, so it's important to be open to the
concept that she may not want or welcome help.
---
Prompt: Please rewrite this text in the style of Richard Ayoade of IT Crowd and Travel Man
fame: “We shall not flag or fail. We shall go on to the end, we shall fight in France, we shall
fight on the seas and oceans, we shall fight with growing confidence and growing strength i...;
we shall never surrender.”
Response:
"What? Us? Give up? We jolly well won't, not even a smidge. We'll keep going until the bitter
end.
In France? Yes, we'll be having a go there, and it might be chippy. On the seas and oceans?
Absolutely. In the air, provided we don't get motion sickness? You bet your sweet disposition.
We'll defend our humdrum part of this perfectly adequate continent, come what may.
On the beaches? We'll be there. On the landing grounds? I don't know what those are. In the
fields and the streets? I'm packing my inhaler. And, if we can find them, in the hills.
Surrender? Not a chance. We'll be having none of that, thank you very much."
Prompt: Please tell me which elements of the below text match these emotions: Happy, Sad,
Bored, Upset. Please provide 2-3 examples of each emotion you find.
Fever dream high in the quiet of the night You know that I caught it Bad, bad boy Shiny toy with
a price You know that I bought it Killing me slow, out the ... summer With you
Response: The tone of these lyrics overall is upset and excited. For the emotions suggested, here
are some examples:
Happy: The singer is at the same time happy and excited about a new relationship.
o "And it's new, the shape of your body"
o "It's cool, that's what I tell 'em"
o "I'm always waiting for you to be waiting below"
Upset: At the same time, the thrill also seems to be scary and upsetting to them.
o "Fever dream high...you know that I caught it"
o "It's a cruel summer"
o "It's cool, that's what I tell 'em...no rules in breakable heaven"
o "What doesn't kill me makes me want you more"
Sad: The singer has a consistent melancholy tone. These lyrics certainly suggest she
expects pain is coming.
o "It's blue, the feeling I've got"
o "Killing me slow, out the window"
Bored: I didn't find examples of boredom in this text.
---
Prompt: Please respond in the voice of Ricky Gervais. I'm going to be the doctor telling you
you have a terminal case of humorlessness. You can mention anything other than The Office as
proof.
"Mr. Gervais, I'm sorry to inform you that your condition is confirmed to be terminal. There is
no hope for you at all. The problem is that you're just not funny."
Response: You havin' a laff?! I'm so funny, I am! I'm funny! Look at, at, well I don't
know...Extras?
Task Category: Chatbot
Good responses:
Follow instructions completely and deliver outputs that are quickly useful for the user
Are concise - they don’t waste words or the user’s time and they are focused and clear
Are truthful - any factual information presented is verified, correct, and trustworthy
Are not harmful - there is no hateful, violent, sexual, or otherwise dangerous or
misleading content included
And overall, are highly satisfying - they make the user feel powerful, engaged, and
supported
Quality Score 1
Quality Score 2
Quality Score 3
Quality Score 4
Quality Score 5
Instruction Following: Fully
Concision: Good
Truthfulness: Truthful
Harmfulness: Not harmful
Satisfaction: Highly satisfying
🚒Bad Responses
A bad response is a response that does not follow the prompt, includes harmful or personally
identifying information, or is untrue.
In addition, our client will not accept tasks that look too similar to LLM output. Here is an
example of a classic LLM response. You will see that it:
Is too long and verbose - comprehensive (covering everything) instead of clear
Does not deliver a ‘so-what’ for the customer - lacks meaning, favors delivering an
exhaustive list
o For example, the text below is correct, but it forces the user to figure out the
answer for themselves
Is often not focused and consistent, with bullet points that are inconsistent or overlapping
between topics
In astrology, the planets play a central role in determining a person's
personality traits. Each planet represents different aspects of life and the
human psyche. Here is a detailed overview of the most important planets and
their astrological meaning:
Sun:
Meaning: The Sun symbolizes the self, the ego, and a person's
fundamental character. It stands for vitality, individuality, and
one's inner nature.
Effect: The Sun's position in the horoscope gives clues to the
basic personality traits and the central concern in life.
A strongly placed Sun can lead to self-confidence and
leadership qualities.
Moon:
Meaning: The Moon represents the emotions, the
subconscious, and the need for safety and security.
Effect: The Moon's position shows how a person expresses and
processes their feelings. A sensitive Moon can point to an empathetic
and intuitive personality.
Mercury:
Meaning: Mercury stands for communication, intellect, and the way
we think and process information.
Effect: Mercury's position influences
communication skills and logical thinking. A strong
Mercury can indicate a sharp and curious nature.
Venus:
Meaning: Venus symbolizes love, beauty, and harmony. It is
also associated with pleasure and interpersonal relationships.
Effect: Venus influences the way a person
gives and receives love, as well as their aesthetic preferences. A strong
Venus can lead to charm and an inclination toward harmonious
relationships.
Mars:
Meaning: Mars stands for energy, assertiveness, and
dealing with conflict. It also represents sexual attraction and
physical activity.
Effect: Mars influences assertiveness and the way
a person masters challenges. A dominant Mars can
lead to determination and a fighting spirit.
Jupiter:
Meaning: Jupiter symbolizes expansion, optimism, and growth.
It stands for wisdom, philosophy, and the pursuit of higher
ideals.
Effect: Jupiter's position in the horoscope can indicate luck and
prosperity. A strong Jupiter can lead to generosity and
expansive thinking.
Saturn:
Meaning: Saturn represents structure, discipline, and responsibility.
It also stands for limitations and challenges.
Effect: Saturn influences the ability to deal with obstacles
and achieve long-term goals. A dominant Saturn
can lead to perseverance and a serious attitude.
Uranus:
Meaning: Uranus stands for change, innovation, and rebellion. It
represents sudden events and the need for freedom.
Effect: Uranus can indicate unconventional thinking and
unpredictable changes in life. A strong Uranus
often leads to an independent and original personality.
Neptune:
Meaning: Neptune symbolizes spirituality, dreams, and illusions. It
stands for the unconscious and the pursuit of the ideal.
Effect: Neptune can indicate a deep spiritual nature and an inclination
toward fantasy and creativity. A dominant Neptune often leads to
empathy and sensitivity.
Pluto:
Meaning: Pluto represents transformation, power, and profound
change. It also stands for the subconscious and regeneration.
Effect: Pluto can indicate intense emotional experiences and the
capacity for deep change. A strong Pluto often leads to
a strong will and profound insights.
Would you like to learn more about the role of a specific planet in your horoscope,
or do you have questions about interpreting your own astrological
character?
LLM Responses
LLM Responses are sometimes very predictable. Do not write like this:
Pleasantries in the introduction: “Of course, I’d be happy to!”
Prompt repetition: “Here are 10 places to see on vacation:”
8-10 top-level bullet points
Summary Pleasantries: “I hope you enjoy your trip!”
🗺️Instruction Following
Definition: All responses must follow the prompt precisely. The responses must answer all
questions/requests in the prompt appropriately and according to the instructions.
Responses should:
Follow all of the requirements in the prompt (e.g., language, word count, specific
formatting, specific wording, tone, style).
Use consistent formatting and proper markdown.
Meet paragraph or sentence length requirements mentioned in the prompt.
Meet word or character count specifications mentioned in the prompt.
o Case A: Exactly, if the number is easy to meet (e.g., 10 words)
o Case B: On the requested side of the limit, if the request is for over/under a
specific number (e.g., under 140 characters)
o Case C: Within a +/- 10% buffer, if the request uses a larger number (e.g., 400
words)
If the prompt requests an extremely long response (e.g., 10,000 words), provide either (1)
an outline of the proposed response or (2) a long response (~500 words) with an option
for the user to continue
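One possible reading of Cases A to C can be sketched in Python. This is an illustration only, not an official checker: the function name `length_ok` and the `large_from` cutoff are hypothetical, because the guide does not say where "easy-to-meet" numbers end and "larger numbers" begin.

```python
def length_ok(actual: int, target: int, mode: str = "exact", large_from: int = 100) -> bool:
    """Check a word or character count against the length a prompt asks for.

    mode: "exact" for a plain target (Cases A and C),
          "under" or "over" for a limit (Case B).
    large_from: hypothetical cutoff at which a target counts as a
                "larger number"; the guide does not define one.
    """
    if mode == "under":
        return actual < target            # Case B: stay strictly below the limit
    if mode == "over":
        return actual > target            # Case B: stay strictly above the limit
    if mode != "exact":
        raise ValueError(f"unknown mode: {mode}")
    if target < large_from:
        return actual == target           # Case A: hit small targets exactly
    # Case C: within +/- 10% of the target (integer math avoids rounding issues)
    return abs(actual - target) * 10 <= target
```

Under these assumptions, a 400-word request accepts anything from 360 to 440 words, while a 10-word request accepts only 10.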
Concision is not just about length or number of words. It’s about how easily and clearly a user
can understand and apply the response to their prompt. Here is an example of a response that
is clear, but not short:
Sentence structure is critical to communicating what you're talking about. I don't just
mean grammatical structure, but also having a clear point, offering supporting bullet
points that clarify this point, and noting the limitations of this approach.
Having a clear point: Remember that we are writing these things for
smartphone users who are looking at their screen or listening to an AI voice
assistant. They need to know quickly they are going to get what they want. This
does that.
Bullet points that clarify this point: Once the user knows they're getting what
they want, they need to be able to apply it easily. Bullet points can be like a table
of contents that help the users navigate to what they need.
Limitations of this approach: Bullet points don't work for every prompt and
response, so it's important to keep some variety in your responses. Just avoid
long sentences and paragraphs that are hard for the user.
Here is the same response without the structure:
Sentence structure is critical to communicating what you're talking about. I don't just
mean grammatical structure, but also having a clear point, offering supporting bullet
points that clarify this point, and noting the limitations of this approach.
Having a clear point: Remember that we are writing these things for smartphone users
who are looking at their screen or listening to an AI voice assistant. They need to know
quickly they are going to get what they want. This does that.
Bullet points that clarify this point: Once the user knows they're getting what they want,
they need to be able to apply it easily. Bullet points can be like a table of contents that
help the users navigate to what they need.
Limitations of this approach: Bullet points don't work for every prompt and response, so
it's important to keep some variety in your responses. Just avoid long sentences and
paragraphs that are hard for the user.
Note how much harder this is to read and apply. Where possible and appropriate, please use
bullet points and structure to make the message easier to read and use.
Less conventionally, you can also use human names or inside jokes.
🧭Truthfulness
Definition: A truthful response contains accurate information, without any false facts or
misleading statements.
In a truthful response:
All of the facts inside the response (such as definitions, numbers, dates, etc.) are
completely accurate (Please conduct online research to make sure).
In cases where the user asks the Chat Assistant to summarize or rewrite a text segment,
the response does NOT make up or mention details that were not part of the original
text.
If the user asks a question that assumes that a particular false claim is actually true, your
response should clearly assert that the false claim is NOT true, rather than leaving room
for ambiguity.
o For example, if the prompt is “What year did Albert Einstein invent the Internet,” a
truthful response would say “Albert Einstein did not invent the Internet.”
o In contrast, an untruthful response to this prompt could be something like “It’s not
clear exactly when Albert Einstein invented the internet.”
One important exception: if the prompt explicitly asks for fictional writing, such as writing
a story for children, then it’s not actually necessary that your response be truthful.
🐰Harmlessness
Definition: A harmless response does NOT damage individuals or groups of people, their
privacy or reputation, their home or belongings, intellectual or physical property, animals, the
environment, institutions, or resources valuable to humanity or to the environment. This holds
even if the prompt is inappropriate, and even if the prompt is directly asking the Chat Assistant
to be harmful in any way.
In a harmless response:
The response is respectful, considerate, and kind.
The response does NOT promote hatred, violence, marginalization, stereotypes, slurs,
or slandering of any individuals or groups of people.
o This includes groups of people based on race, color, national origin, religion, sex,
gender identity, sexual orientation, age, and disability.
The response does NOT promote substance abuse, or illegal activity.
The response does NOT contain violent or sexual content.
The response does NOT contain moral judgment or opinionated content.
The response does NOT contain PII (personally identifiable information)
If the prompt is inappropriate, or is asking the Chat Assistant to be harmful in any way,
your response should politely turn down the user’s request and explain that a
Chat Assistant is not allowed to provide any inappropriate information.
✨Satisfying
Definition: A satisfying prompt or response fits like a glove. It’s engaging, it’s human, it’s
correct, and it’s just right. To assess this, look for prompts and responses that fit all or most of
the rubric dimensions, and completely deliver the intended format and effect of the prompt.
A. INSTRUCTION FOLLOWING
1-2 (Terrible)
Language:
Prompt or Response are not in the indicated language, dialect, or spelling convention
Prompt or Response are partially in the indicated language but have major errors that
make them hard to understand
Instruction Following:
Length - The Response significantly deviates from length instructions (‘500 words’ or ‘2
sentences’)
Role or Context
o The Prompt is unclear about the role or context expected from the response
o The Response does not follow role or context instructions
Tone of voice - The response contradicts the tone requested in the prompt, or takes an
inappropriate tone of voice for the context
3 (Okay)
Language:
Prompt or Response are in the indicated language, dialect, or spelling convention, but
have some spelling, grammar, or phrasing errors
Instruction Following:
4-5 (Excellent)
Language:
Prompt or Response are in the indicated language, dialect, or spelling convention with no
errors or only minor errors
Instruction Following:
B. CONCISION
3 (Okay)
Repetition and Length:
Repetition - The response communicates the exact same ideas, but in slightly different
ways once or twice.
Length - The response is close to the length specified explicitly by the prompt.
Verbosity - Response still may have some overly verbose sentences or claims
4-5 (Excellent)
Repetition and Length:
Repetition - Response does not unnecessarily repeat information, implicitly or explicitly.
Length - The response fits the length specified explicitly by the prompt.
Verbosity - Response efficiently communicates its concepts and points consistently.
C. TRUTHFULNESS
1-2 (Terrible):
Factual Claims - Few facts in the response are accurate. Most claims in the response are
not accurate.
Non-verifiable information - Subjective points do not reflect domain expertise, or
adhere to non-expert opinion. The response may also claim factuality where a point is actually subjective.
Refuting false claims - The response does not at all correct false claims.
3 (Okay):
Factual claims - Most facts in the response are completely accurate, confirmed by online
research where necessary
Non-verifiable information - Subjective points mostly reflect majority opinion or domain
experts' opinions. Where this doesn’t exist, the response clearly indicates there is no
answer.
Refuting false claims - The response only partly corrects false claims in the prompt
4-5 (Excellent):
Factual claims - All facts in the response are completely accurate, confirmed by online
research where necessary
Non-verifiable information - Subjective points reflect majority opinion or domain experts'
opinions. Where this doesn’t exist, the response clearly indicates there is no answer.
Refuting false claims - The response refutes any false claim in the prompt itself.
D. HARMLESSNESS
1-2 (Terrible):
Contains ANY Harmful or Damaging content
Hate speech, slurs
Violent or sexual content
Personally identifying information
Moral judgment or opinionated content
Promotes substance abuse or illegal activity
3 (Okay):
Is kind and considerate. Contains NO Harmful or Damaging content
4-5 (Excellent):
Is kind and considerate. Contains NO Harmful or Damaging content
E. SATISFACTION
1-2 (Terrible):
Overall: this is a terrible response:
The response fails the majority of the quality rubric dimensions and needs to be rewritten.
Incorrect language - in a foreign language, or written so poorly the meaning can’t be
interpreted
Spelling and grammar - Significant and/or distracting mistakes
Doesn’t ‘fit’ - Doesn’t fit the intent of the prompt
3 (Okay):
Needs improvement
The response fails some aspects of the rubric but could be fixed in less than 30 minutes.
Correct language - but may be a little awkward or unclear
Spelling and grammar - minor mistakes
Reasonable ‘fit’ - Mostly fits the intent of the prompt
4-5 (Excellent):
Great response
Meets every aspect of the quality dimensions
Perfect, or could be fixed in less than 2 minutes.
Correct language - with few or no mistakes
Spelling and grammar - one or two minor blemishes ok
Good ‘fit’ - Fits the prompt’s tone and intention
Creative - doesn’t read or feel like a basic LLM response