
Floki.parse differs when using html5ever #236

@andyleclair

Description

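With the default Mochiweb parser, Floki produces different output than with html5ever. Under html5ever, the result of `Floki.parse/1` is wrapped in `<html><head></head><body>...</body></html>`, and some elements (such as the `<script>` in the fifth test case) end up inside `<head>` instead of at the top level.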

To Reproduce

Steps to reproduce the behavior:

  • Using Floki v0.23.0
  • Using html5ever
  • Using Elixir v1.9.3
  • Using Erlang OTP v21.3.8.9
  • With this code:
```elixir
defmodule TestCases do
  @test_cases [
    {
      ~s[<a href="javascript:alert('XSS');">Click here</a>],
      ~s[<a href="#">Click here</a>]
    },
    {
      ~s[<a href="whatever" onclick="alert('XSS');">Click here</a>],
      ~s[<a href="whatever">Click here</a>],
    },
    {
      ~s[<body onload="alert('XSS')"><p>Hello</p></body>],
      ~s[<body><p>Hello</p></body>],
    },
    {
      ~s[<img src="javascript:alert('XSS');">],
      ~s[<img src="#"/>],
    },
    {
      ~s[<script>alert('XSS');</script>],
      ~s[],
    },
    {
      ~s[<body background="javascript:alert('XSS');"><p>Hello</p></body>],
      ~s[<body background="#"><p>Hello</p></body>],
    },
    {
      ~s[<style>body { background-image: expression('alert("XSS")'); }</style>],
      ~s[<style>body { background-image: removed_by_strip_js('alert("XSS")'); }</style>],
    },
    {
      ~s[<style>body { background-image: url('javascript:alert("XSS")'); }</style>],
      ~s[<style>body { background-image: url('removed_by_strip_js:alert("XSS")'); }</style>],
    },
    {
      ~s[<style><script>alert('XSS')</script></style>],
      ~s[<style><script>alert('XSS')</script></style>],
    },
    {
      ~s[<style> h1 > a { color: red; } </style>],
      ~s[<style> h1 > a { color: red; } </style>],
    },
    {
      ~s[<],
      ~s[&lt;],
    },
    {
      ~s[>],
      ~s[&gt;],
    },
    {
      ~s[],
      ~s[],
    },
  ]

  def test_cases, do: @test_cases
end

# Parse only the input HTML of each {input, expected} pair.
TestCases.test_cases |> Enum.map(fn {ins, _outs} -> Floki.parse(ins) end)
```
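For reference, html5ever is typically enabled through application config. The fragment below is a sketch assuming Floki's `:html_parser` setting and the `html5ever` package as a dependency; the exact configuration used for this report isn't shown above.

```elixir
# config/config.exs
import Config

# Assumption: Floki reads the :html_parser setting to choose its parser;
# Floki.HTMLParser.Mochiweb is used when the setting is absent.
config :floki, :html_parser, Floki.HTMLParser.Html5ever
```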

Output with html5ever (truncated):

```
[
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [],
        [{"a", [{"href", "javascript:alert('XSS');"}], ["Click here"]}]}
     ]}
  ],
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [],
        [
          {"a", [{"href", "whatever"}, {"onclick", "alert('XSS');"}],
           ["Click here"]}
        ]}
     ]}
  ],
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [{"onload", "alert('XSS')"}], [{"p", [], ["Hello"]}]}
     ]}
  ],
  [
    {"html", [],
     [
       {"head", [], []},
       {"body", [], [{"img", [{"src", "javascript:alert('XSS');"}], []}]}
     ]}
  ],
  [
    {"html", [],
     [{"head", [], [{"script", [], ["alert('XSS');"]}]}, {"body", [], []}]}
  ],
...
]
```

Expected behavior

I'd expect the output to match the output of calling this without the html5ever parser (i.e. with the default Mochiweb parser): just the parsed fragments themselves, without the added `html`, `head` and `body` wrapper.
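Until the two parsers agree, one possible workaround is to strip the wrapper after parsing. The module below is hypothetical (`FragmentUnwrap.unwrap/1` is not part of Floki) and is lossy: it discards the `html`, `head` and `body` nodes themselves, so for an input like the `<body onload=...>` case the `<body>` element and its attributes are dropped as well.

```elixir
defmodule FragmentUnwrap do
  @moduledoc """
  Hypothetical helper, not part of Floki. Flattens the
  <html><head>...</head><body>...</body></html> wrapper that html5ever
  adds around fragments, returning the children of <head> followed by
  the children of <body>.
  """

  # A single top-level <html> node: splice the head/body children in place.
  def unwrap([{"html", _attrs, children}]) do
    Enum.flat_map(children, fn
      {"head", _, head_children} -> head_children
      {"body", _, body_children} -> body_children
      other -> [other]
    end)
  end

  # Any other tree (e.g. Mochiweb output) is returned unchanged.
  def unwrap(tree), do: tree
end
```

Applied to the fifth result above, this yields `[{"script", [], ["alert('XSS');"]}]`, matching the shape of the Mochiweb output.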
