# Why sanitize when you can validate?
## Background
A sanitizer takes in a string in a language and puts out a *safe*
version.
Occasionally people ask for a function that, instead of returning a
safe version of the input, just labels the input as *safe* or
*unsafe*.
Here I explain why I think the latter is a bad idea for HTML
specifically. I hope this prompts a discussion, and I'm
interested in why people want validators. Please let me know your
thoughts on use cases and how they relate to the definition of
*valid*.
## Defining "Valid"
The sanitizer promises that its output can be safely embedded in a
larger document.
It seems to me that any *valid* input should also have this property.
#### Valid means idempotent
One naïve way to define *valid* is thus:
> A valid input is any input such that `input.equals(sanitized(input))`.
This is sound, but not very useful. Intuitively, it seems that there
must be a lot of inputs that don't have this property but are not
unsafe.
For example, maybe the sanitizer takes as an input
```html
<B>For example</B>
```
and returns
```html
<b>For example</b>
```
This difference seems unimportant.
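To make the fixed-point definition concrete, here is a minimal sketch. It assumes a toy `sanitize` that does nothing but lowercase tag names. The class name `FixedPointValidator` and both methods are hypothetical stand-ins, not part of any real sanitizer API.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedPointValidator {
    // Matches the start of an opening or closing tag, e.g. "<B" or "</B".
    private static final Pattern TAG_NAME =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)");

    // Toy stand-in for a real sanitizer (an assumption for illustration):
    // the only normalization it performs is lowercasing tag names.
    static String sanitize(String input) {
        Matcher m = TAG_NAME.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String normalized =
                "<" + m.group(1) + m.group(2).toLowerCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(normalized));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The naive definition: valid means the sanitizer leaves the input unchanged.
    static boolean isValid(String input) {
        return input.equals(sanitize(input));
    }
}
```

Under these assumptions, `isValid("<B>For example</B>")` is false even though the only change the sanitizer makes is cosmetic, while `isValid("<b>For example</b>")` is true.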
#### Valid according to policy
Instead, we could try to define *valid* thus:
> An input is valid when the policy rejects no part of it.
This misses part of the picture. A string is safe because of the way
browsers parse it, **not** the way the sanitizer parses it.
```html
<!--[if IE]><script>alert(1)</script><![endif]-->
```
contains a script tag when served to Internet Explorer, but contains
no tags at all when served to other browsers.
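The gap between the two parsers can be sketched as follows. This assumes a hypothetical validator whose parser drops comments before the policy runs, plus a hypothetical allow-list policy. The conditional-comment input in the usage note is also an illustrative assumption, and none of the names come from a real library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PolicyValidator {
    // Hypothetical policy: only these element names are allowed.
    private static final Set<String> ALLOWED =
        new HashSet<>(Arrays.asList("b", "i", "p"));

    private static final Pattern COMMENT =
        Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    private static final Pattern TAG_NAME =
        Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)");

    // "Valid" here means the policy rejected nothing that the
    // validator's own parser showed it.
    static boolean isValidAccordingToPolicy(String input) {
        // The toy parser treats comments as ignorable and drops them,
        // so anything inside a comment never reaches the policy.
        String visible = COMMENT.matcher(input).replaceAll("");
        Matcher m = TAG_NAME.matcher(visible);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (!ALLOWED.contains(name)) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, `isValidAccordingToPolicy("<!--[if IE]><script>alert(1)</script><![endif]-->")` returns true because the whole input is discarded as a comment, whereas the bare `<script>alert(1)</script>` is rejected. The verdict depends on how the validator parses the input, not on how a browser will.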
If the sanitizer interprets all comments as ignorable content, then
the policy never sees the `<script>` tag, so nothing is rejected and
the input counts as *valid*.
```html
```
contains a `