mirror of
https://github.com/servo/servo.git
synced 2025-08-07 06:25:32 +01:00
Update web-platform-tests to revision 0d318188757a9c996e20b82db201fd04de5aa255
This commit is contained in:
parent
b2a5225831
commit
1a81b18b9f
12321 changed files with 544385 additions and 6 deletions
20
tests/wpt/web-platform-tests/tools/html5lib/.gitignore
vendored
Normal file
20
tests/wpt/web-platform-tests/tools/html5lib/.gitignore
vendored
Normal file
|
@ -0,0 +1,20 @@
|
|||
# Because we never want compiled Python
|
||||
__pycache__/
|
||||
*.pyc
|
||||
|
||||
# Ignore stuff produced by distutils
|
||||
/build/
|
||||
/dist/
|
||||
/MANIFEST
|
||||
|
||||
# Generated by parse.py -p
|
||||
stats.prof
|
||||
|
||||
# From cover (esp. in combination with nose)
|
||||
.coverage
|
||||
|
||||
# Because tox's data is inherently local
|
||||
/.tox/
|
||||
|
||||
# We have no interest in built Sphinx files
|
||||
/doc/_build
|
3
tests/wpt/web-platform-tests/tools/html5lib/.gitmodules
vendored
Normal file
3
tests/wpt/web-platform-tests/tools/html5lib/.gitmodules
vendored
Normal file
|
@ -0,0 +1,3 @@
|
|||
[submodule "testdata"]
|
||||
path = html5lib/tests/testdata
|
||||
url = https://github.com/html5lib/html5lib-tests.git
|
37
tests/wpt/web-platform-tests/tools/html5lib/.travis.yml
Normal file
37
tests/wpt/web-platform-tests/tools/html5lib/.travis.yml
Normal file
|
@ -0,0 +1,37 @@
|
|||
language: python
|
||||
python:
|
||||
- "2.6"
|
||||
- "2.7"
|
||||
- "3.2"
|
||||
- "3.3"
|
||||
- "3.4"
|
||||
- "pypy"
|
||||
|
||||
env:
|
||||
- USE_OPTIONAL=true
|
||||
- USE_OPTIONAL=false
|
||||
|
||||
matrix:
|
||||
exclude:
|
||||
- python: "2.7"
|
||||
env: USE_OPTIONAL=false
|
||||
- python: "3.4"
|
||||
env: USE_OPTIONAL=false
|
||||
include:
|
||||
- python: "2.7"
|
||||
env: USE_OPTIONAL=false FLAKE=true
|
||||
- python: "3.4"
|
||||
env: USE_OPTIONAL=false FLAKE=true
|
||||
|
||||
before_install:
|
||||
- git submodule update --init --recursive
|
||||
|
||||
install:
|
||||
- bash requirements-install.sh
|
||||
|
||||
script:
|
||||
- nosetests
|
||||
- bash flake8-run.sh
|
||||
|
||||
after_script:
|
||||
- python debug-info.py
|
34
tests/wpt/web-platform-tests/tools/html5lib/AUTHORS.rst
Normal file
34
tests/wpt/web-platform-tests/tools/html5lib/AUTHORS.rst
Normal file
|
@ -0,0 +1,34 @@
|
|||
Credits
|
||||
=======
|
||||
|
||||
``html5lib`` is written and maintained by:
|
||||
|
||||
- James Graham
|
||||
- Geoffrey Sneddon
|
||||
- Łukasz Langa
|
||||
|
||||
|
||||
Patches and suggestions
|
||||
-----------------------
|
||||
(In chronological order, by first commit:)
|
||||
|
||||
- Anne van Kesteren
|
||||
- Lachlan Hunt
|
||||
- lantis63
|
||||
- Sam Ruby
|
||||
- Tim Fletcher
|
||||
- Thomas Broyer
|
||||
- Mark Pilgrim
|
||||
- Philip Taylor
|
||||
- Ryan King
|
||||
- Edward Z. Yang
|
||||
- fantasai
|
||||
- Philip Jägenstedt
|
||||
- Ms2ger
|
||||
- Andy Wingo
|
||||
- Andreas Madsack
|
||||
- Karim Valiev
|
||||
- Mohammad Taha Jahangir
|
||||
- Juan Carlos Garcia Segovia
|
||||
- Mike West
|
||||
- Marc DM
|
171
tests/wpt/web-platform-tests/tools/html5lib/CHANGES.rst
Normal file
171
tests/wpt/web-platform-tests/tools/html5lib/CHANGES.rst
Normal file
|
@ -0,0 +1,171 @@
|
|||
Change Log
|
||||
----------
|
||||
|
||||
0.9999
|
||||
~~~~~~
|
||||
|
||||
Released on XXX, 2014
|
||||
|
||||
* XXX
|
||||
|
||||
|
||||
0.999
|
||||
~~~~~
|
||||
|
||||
Released on December 23, 2013
|
||||
|
||||
* Fix #127: add work-around for CPython issue #20007: .read(0) on
|
||||
http.client.HTTPResponse drops the rest of the content.
|
||||
|
||||
* Fix #115: lxml treewalker can now deal with fragments containing, at
|
||||
their root level, text nodes with non-ASCII characters on Python 2.
|
||||
|
||||
|
||||
0.99
|
||||
~~~~
|
||||
|
||||
Released on September 10, 2013
|
||||
|
||||
* No library changes from 1.0b3; released as 0.99 as pip has changed
|
||||
behaviour from 1.4 to avoid installing pre-release versions per
|
||||
PEP 440.
|
||||
|
||||
|
||||
1.0b3
|
||||
~~~~~
|
||||
|
||||
Released on July 24, 2013
|
||||
|
||||
* Removed ``RecursiveTreeWalker`` from ``treewalkers._base``. Any
|
||||
implementation using it should be moved to
|
||||
``NonRecursiveTreeWalker``, as everything bundled with html5lib has
|
||||
for years.
|
||||
|
||||
* Fix #67 so that ``BufferedStream`` to correctly returns a bytes
|
||||
object, thereby fixing any case where html5lib is passed a
|
||||
non-seekable RawIOBase-like object.
|
||||
|
||||
|
||||
1.0b2
|
||||
~~~~~
|
||||
|
||||
Released on June 27, 2013
|
||||
|
||||
* Removed reordering of attributes within the serializer. There is now
|
||||
an ``alphabetical_attributes`` option which preserves the previous
|
||||
behaviour through a new filter. This allows attribute order to be
|
||||
preserved through html5lib if the tree builder preserves order.
|
||||
|
||||
* Removed ``dom2sax`` from DOM treebuilders. It has been replaced by
|
||||
``treeadapters.sax.to_sax`` which is generic and supports any
|
||||
treewalker; it also resolves all known bugs with ``dom2sax``.
|
||||
|
||||
* Fix treewalker assertions on hitting bytes strings on
|
||||
Python 2. Previous to 1.0b1, treewalkers coped with mixed
|
||||
bytes/unicode data on Python 2; this reintroduces this prior
|
||||
behaviour on Python 2. Behaviour is unchanged on Python 3.
|
||||
|
||||
|
||||
1.0b1
|
||||
~~~~~
|
||||
|
||||
Released on May 17, 2013
|
||||
|
||||
* Implementation updated to implement the `HTML specification
|
||||
<http://www.whatwg.org/specs/web-apps/current-work/>`_ as of 5th May
|
||||
2013 (`SVN <http://svn.whatwg.org/webapps/>`_ revision r7867).
|
||||
|
||||
* Python 3.2+ supported in a single codebase using the ``six`` library.
|
||||
|
||||
* Removed support for Python 2.5 and older.
|
||||
|
||||
* Removed the deprecated Beautiful Soup 3 treebuilder.
|
||||
``beautifulsoup4`` can use ``html5lib`` as a parser instead. Note that
|
||||
since it doesn't support namespaces, foreign content like SVG and
|
||||
MathML is parsed incorrectly.
|
||||
|
||||
* Removed ``simpletree`` from the package. The default tree builder is
|
||||
now ``etree`` (using the ``xml.etree.cElementTree`` implementation if
|
||||
available, and ``xml.etree.ElementTree`` otherwise).
|
||||
|
||||
* Removed the ``XHTMLSerializer`` as it never actually guaranteed its
|
||||
output was well-formed XML, and hence provided little of use.
|
||||
|
||||
* Removed default DOM treebuilder, so ``html5lib.treebuilders.dom`` is no
|
||||
longer supported. ``html5lib.treebuilders.getTreeBuilder("dom")`` will
|
||||
return the default DOM treebuilder, which uses ``xml.dom.minidom``.
|
||||
|
||||
* Optional heuristic character encoding detection now based on
|
||||
``charade`` for Python 2.6 - 3.3 compatibility.
|
||||
|
||||
* Optional ``Genshi`` treewalker support fixed.
|
||||
|
||||
* Many bugfixes, including:
|
||||
|
||||
* #33: null in attribute value breaks XML AttValue;
|
||||
|
||||
* #4: nested, indirect descendant, <button> causes infinite loop;
|
||||
|
||||
* `Google Code 215
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=215>`_: Properly
|
||||
detect seekable streams;
|
||||
|
||||
* `Google Code 206
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=206>`_: add
|
||||
support for <video preload=...>, <audio preload=...>;
|
||||
|
||||
* `Google Code 205
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=205>`_: add
|
||||
support for <video poster=...>;
|
||||
|
||||
* `Google Code 202
|
||||
<http://code.google.com/p/html5lib/issues/detail?id=202>`_: Unicode
|
||||
file breaks InputStream.
|
||||
|
||||
* Source code is now mostly PEP 8 compliant.
|
||||
|
||||
* Test harness has been improved and now depends on ``nose``.
|
||||
|
||||
* Documentation updated and moved to http://html5lib.readthedocs.org/.
|
||||
|
||||
|
||||
0.95
|
||||
~~~~
|
||||
|
||||
Released on February 11, 2012
|
||||
|
||||
|
||||
0.90
|
||||
~~~~
|
||||
|
||||
Released on January 17, 2010
|
||||
|
||||
|
||||
0.11.1
|
||||
~~~~~~
|
||||
|
||||
Released on June 12, 2008
|
||||
|
||||
|
||||
0.11
|
||||
~~~~
|
||||
|
||||
Released on June 10, 2008
|
||||
|
||||
|
||||
0.10
|
||||
~~~~
|
||||
|
||||
Released on October 7, 2007
|
||||
|
||||
|
||||
0.9
|
||||
~~~
|
||||
|
||||
Released on March 11, 2007
|
||||
|
||||
|
||||
0.2
|
||||
~~~
|
||||
|
||||
Released on January 8, 2007
|
60
tests/wpt/web-platform-tests/tools/html5lib/CONTRIBUTING.rst
Normal file
60
tests/wpt/web-platform-tests/tools/html5lib/CONTRIBUTING.rst
Normal file
|
@ -0,0 +1,60 @@
|
|||
Contributing
|
||||
============
|
||||
|
||||
Pull requests are more than welcome — both to the library and to the
|
||||
documentation. Some useful information:
|
||||
|
||||
- We aim to follow PEP 8 in the library, but ignoring the
|
||||
79-character-per-line limit, instead following a soft limit of 99,
|
||||
but allowing lines over this where it is the readable thing to do.
|
||||
|
||||
- We aim to follow PEP 257 for all docstrings, and make them properly
|
||||
parseable by Sphinx while generating API documentation.
|
||||
|
||||
- We keep ``pyflakes`` reporting no errors or warnings at all times.
|
||||
|
||||
- We keep the master branch passing all tests at all times on all
|
||||
supported versions.
|
||||
|
||||
`Travis CI <https://travis-ci.org/html5lib/html5lib-python/>`_ is run
|
||||
against all pull requests and should enforce all of the above.
|
||||
|
||||
We use `Opera Critic <https://critic.hoppipolla.co.uk/>`_ as an external
|
||||
code-review tool, which uses your GitHub login to authenticate. You'll
|
||||
get email notifications for issues raised in the review.
|
||||
|
||||
|
||||
Patch submission guidelines
|
||||
---------------------------
|
||||
|
||||
- **Create a new Git branch specific to your change.** Do not put
|
||||
multiple fixes/features in the same pull request. If you find an
|
||||
unrelated bug, create a distinct branch and submit a separate pull
|
||||
request for the bugfix. This makes life much easier for maintainers
|
||||
and will speed up merging your patches.
|
||||
|
||||
- **Write a test** whenever possible. Following existing tests is often
|
||||
easiest, and a good way to tell whether the feature you're modifying
|
||||
is easily testable.
|
||||
|
||||
- **Make sure documentation is updated.** Keep docstrings current, and
|
||||
if necessary, update the Sphinx documentation in ``doc/``.
|
||||
|
||||
- **Add a changelog entry** at the top of ``CHANGES.rst`` following
|
||||
existing entries' styles.
|
||||
|
||||
- **Run tests with tox** if possible, to make sure your changes are
|
||||
compatible with all supported Python versions.
|
||||
|
||||
- **Squash commits** before submitting the pull request so that a single
|
||||
commit contains the entire change, and only that change (see the first
|
||||
bullet).
|
||||
|
||||
- **Don't rebase after creating the pull request.** Merge with upstream,
|
||||
if necessary, and use ``git commit --fixup`` for fixing issues raised
|
||||
in a Critic review or by a failing Travis build. The reviewer will
|
||||
squash and rebase your pull request while accepting it. Even though
|
||||
GitHub won't recognize the pull request as accepted, the squashed
|
||||
commits will properly specify you as the author.
|
||||
|
||||
- **Attribute yourself** in ``AUTHORS.rst``.
|
20
tests/wpt/web-platform-tests/tools/html5lib/LICENSE
Normal file
20
tests/wpt/web-platform-tests/tools/html5lib/LICENSE
Normal file
|
@ -0,0 +1,20 @@
|
|||
Copyright (c) 2006-2013 James Graham and other contributors
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining
|
||||
a copy of this software and associated documentation files (the
|
||||
"Software"), to deal in the Software without restriction, including
|
||||
without limitation the rights to use, copy, modify, merge, publish,
|
||||
distribute, sublicense, and/or sell copies of the Software, and to
|
||||
permit persons to whom the Software is furnished to do so, subject to
|
||||
the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be
|
||||
included in all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
||||
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
||||
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
||||
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
||||
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
6
tests/wpt/web-platform-tests/tools/html5lib/MANIFEST.in
Normal file
6
tests/wpt/web-platform-tests/tools/html5lib/MANIFEST.in
Normal file
|
@ -0,0 +1,6 @@
|
|||
include LICENSE
|
||||
include CHANGES.rst
|
||||
include README.rst
|
||||
include requirements*.txt
|
||||
graft html5lib/tests/testdata
|
||||
recursive-include html5lib/tests *.py
|
157
tests/wpt/web-platform-tests/tools/html5lib/README.rst
Normal file
157
tests/wpt/web-platform-tests/tools/html5lib/README.rst
Normal file
|
@ -0,0 +1,157 @@
|
|||
html5lib
|
||||
========
|
||||
|
||||
.. image:: https://travis-ci.org/html5lib/html5lib-python.png?branch=master
|
||||
:target: https://travis-ci.org/html5lib/html5lib-python
|
||||
|
||||
html5lib is a pure-python library for parsing HTML. It is designed to
|
||||
conform to the WHATWG HTML specification, as is implemented by all major
|
||||
web browsers.
|
||||
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
Simple usage follows this pattern:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
document = html5lib.parse(f)
|
||||
|
||||
or:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
document = html5lib.parse("<p>Hello World!")
|
||||
|
||||
By default, the ``document`` will be an ``xml.etree`` element instance.
|
||||
Whenever possible, html5lib chooses the accelerated ``ElementTree``
|
||||
implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).
|
||||
|
||||
Two other tree types are supported: ``xml.dom.minidom`` and
|
||||
``lxml.etree``. To use an alternative format, specify the name of
|
||||
a treebuilder:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
|
||||
|
||||
When using with ``urllib2`` (Python 2), the charset from HTTP should be
|
||||
pass into html5lib as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from contextlib import closing
|
||||
from urllib2 import urlopen
|
||||
import html5lib
|
||||
|
||||
with closing(urlopen("http://example.com/")) as f:
|
||||
document = html5lib.parse(f, encoding=f.info().getparam("charset"))
|
||||
|
||||
When using with ``urllib.request`` (Python 3), the charset from HTTP
|
||||
should be pass into html5lib as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from urllib.request import urlopen
|
||||
import html5lib
|
||||
|
||||
with urlopen("http://example.com/") as f:
|
||||
document = html5lib.parse(f, encoding=f.info().get_content_charset())
|
||||
|
||||
To have more control over the parser, create a parser object explicitly.
|
||||
For instance, to make the parser raise exceptions on parse errors, use:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
parser = html5lib.HTMLParser(strict=True)
|
||||
document = parser.parse(f)
|
||||
|
||||
When you're instantiating parser objects explicitly, pass a treebuilder
|
||||
class as the ``tree`` keyword argument to use an alternative document
|
||||
format:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
|
||||
minidom_document = parser.parse("<p>Hello World!")
|
||||
|
||||
More documentation is available at http://html5lib.readthedocs.org/.
|
||||
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
html5lib works on CPython 2.6+, CPython 3.2+ and PyPy. To install it,
|
||||
use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ pip install html5lib
|
||||
|
||||
|
||||
Optional Dependencies
|
||||
---------------------
|
||||
|
||||
The following third-party libraries may be used for additional
|
||||
functionality:
|
||||
|
||||
- ``datrie`` can be used to improve parsing performance (though in
|
||||
almost all cases the improvement is marginal);
|
||||
|
||||
- ``lxml`` is supported as a tree format (for both building and
|
||||
walking) under CPython (but *not* PyPy where it is known to cause
|
||||
segfaults);
|
||||
|
||||
- ``genshi`` has a treewalker (but not builder); and
|
||||
|
||||
- ``charade`` can be used as a fallback when character encoding cannot
|
||||
be determined; ``chardet``, from which it was forked, can also be used
|
||||
on Python 2.
|
||||
|
||||
- ``ordereddict`` can be used under Python 2.6
|
||||
(``collections.OrderedDict`` is used instead on later versions) to
|
||||
serialize attributes in alphabetical order.
|
||||
|
||||
|
||||
Bugs
|
||||
----
|
||||
|
||||
Please report any bugs on the `issue tracker
|
||||
<https://github.com/html5lib/html5lib-python/issues>`_.
|
||||
|
||||
|
||||
Tests
|
||||
-----
|
||||
|
||||
Unit tests require the ``nose`` library and can be run using the
|
||||
``nosetests`` command in the root directory; ``ordereddict`` is
|
||||
required under Python 2.6. All should pass.
|
||||
|
||||
Test data are contained in a separate `html5lib-tests
|
||||
<https://github.com/html5lib/html5lib-tests>`_ repository and included
|
||||
as a submodule, thus for git checkouts they must be initialized::
|
||||
|
||||
$ git submodule init
|
||||
$ git submodule update
|
||||
|
||||
If you have all compatible Python implementations available on your
|
||||
system, you can run tests on all of them using the ``tox`` utility,
|
||||
which can be found on PyPI.
|
||||
|
||||
|
||||
Questions?
|
||||
----------
|
||||
|
||||
There's a mailing list available for support on Google Groups,
|
||||
`html5lib-discuss <http://groups.google.com/group/html5lib-discuss>`_,
|
||||
though you may get a quicker response asking on IRC in `#whatwg on
|
||||
irc.freenode.net <http://wiki.whatwg.org/wiki/IRC>`_.
|
37
tests/wpt/web-platform-tests/tools/html5lib/debug-info.py
Normal file
37
tests/wpt/web-platform-tests/tools/html5lib/debug-info.py
Normal file
|
@ -0,0 +1,37 @@
|
|||
from __future__ import print_function, unicode_literals
|
||||
|
||||
import platform
|
||||
import sys
|
||||
|
||||
|
||||
info = {
|
||||
"impl": platform.python_implementation(),
|
||||
"version": platform.python_version(),
|
||||
"revision": platform.python_revision(),
|
||||
"maxunicode": sys.maxunicode,
|
||||
"maxsize": sys.maxsize
|
||||
}
|
||||
|
||||
search_modules = ["charade", "chardet", "datrie", "genshi", "html5lib", "lxml", "six"]
|
||||
found_modules = []
|
||||
|
||||
for m in search_modules:
|
||||
try:
|
||||
__import__(m)
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
found_modules.append(m)
|
||||
|
||||
info["modules"] = ", ".join(found_modules)
|
||||
|
||||
|
||||
print("""html5lib debug info:
|
||||
|
||||
Python %(version)s (revision: %(revision)s)
|
||||
Implementation: %(impl)s
|
||||
|
||||
sys.maxunicode: %(maxunicode)X
|
||||
sys.maxsize: %(maxsize)X
|
||||
|
||||
Installed modules: %(modules)s""" % info)
|
177
tests/wpt/web-platform-tests/tools/html5lib/doc/Makefile
Normal file
177
tests/wpt/web-platform-tests/tools/html5lib/doc/Makefile
Normal file
|
@ -0,0 +1,177 @@
|
|||
# Makefile for Sphinx documentation
|
||||
#
|
||||
|
||||
# You can set these variables from the command line.
|
||||
SPHINXOPTS =
|
||||
SPHINXBUILD = sphinx-build
|
||||
PAPER =
|
||||
BUILDDIR = _build
|
||||
|
||||
# User-friendly check for sphinx-build
|
||||
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
|
||||
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
|
||||
endif
|
||||
|
||||
# Internal variables.
|
||||
PAPEROPT_a4 = -D latex_paper_size=a4
|
||||
PAPEROPT_letter = -D latex_paper_size=letter
|
||||
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
|
||||
# the i18n builder cannot share the environment and doctrees with the others
|
||||
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
|
||||
|
||||
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
|
||||
|
||||
help:
|
||||
@echo "Please use \`make <target>' where <target> is one of"
|
||||
@echo " html to make standalone HTML files"
|
||||
@echo " dirhtml to make HTML files named index.html in directories"
|
||||
@echo " singlehtml to make a single large HTML file"
|
||||
@echo " pickle to make pickle files"
|
||||
@echo " json to make JSON files"
|
||||
@echo " htmlhelp to make HTML files and a HTML help project"
|
||||
@echo " qthelp to make HTML files and a qthelp project"
|
||||
@echo " devhelp to make HTML files and a Devhelp project"
|
||||
@echo " epub to make an epub"
|
||||
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
|
||||
@echo " latexpdf to make LaTeX files and run them through pdflatex"
|
||||
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
|
||||
@echo " text to make text files"
|
||||
@echo " man to make manual pages"
|
||||
@echo " texinfo to make Texinfo files"
|
||||
@echo " info to make Texinfo files and run them through makeinfo"
|
||||
@echo " gettext to make PO message catalogs"
|
||||
@echo " changes to make an overview of all changed/added/deprecated items"
|
||||
@echo " xml to make Docutils-native XML files"
|
||||
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
|
||||
@echo " linkcheck to check all external links for integrity"
|
||||
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
|
||||
|
||||
clean:
|
||||
rm -rf $(BUILDDIR)/*
|
||||
|
||||
html:
|
||||
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
|
||||
@echo
|
||||
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
|
||||
|
||||
dirhtml:
|
||||
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
|
||||
@echo
|
||||
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
|
||||
|
||||
singlehtml:
|
||||
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
|
||||
@echo
|
||||
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
|
||||
|
||||
pickle:
|
||||
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
|
||||
@echo
|
||||
@echo "Build finished; now you can process the pickle files."
|
||||
|
||||
json:
|
||||
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
|
||||
@echo
|
||||
@echo "Build finished; now you can process the JSON files."
|
||||
|
||||
htmlhelp:
|
||||
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
|
||||
@echo
|
||||
@echo "Build finished; now you can run HTML Help Workshop with the" \
|
||||
".hhp project file in $(BUILDDIR)/htmlhelp."
|
||||
|
||||
qthelp:
|
||||
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
|
||||
@echo
|
||||
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
|
||||
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
|
||||
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/html5lib.qhcp"
|
||||
@echo "To view the help file:"
|
||||
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/html5lib.qhc"
|
||||
|
||||
devhelp:
|
||||
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
|
||||
@echo
|
||||
@echo "Build finished."
|
||||
@echo "To view the help file:"
|
||||
@echo "# mkdir -p $$HOME/.local/share/devhelp/html5lib"
|
||||
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/html5lib"
|
||||
@echo "# devhelp"
|
||||
|
||||
epub:
|
||||
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
|
||||
@echo
|
||||
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
|
||||
|
||||
latex:
|
||||
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
|
||||
@echo
|
||||
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
|
||||
@echo "Run \`make' in that directory to run these through (pdf)latex" \
|
||||
"(use \`make latexpdf' here to do that automatically)."
|
||||
|
||||
latexpdf:
|
||||
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
|
||||
@echo "Running LaTeX files through pdflatex..."
|
||||
$(MAKE) -C $(BUILDDIR)/latex all-pdf
|
||||
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
|
||||
|
||||
latexpdfja:
|
||||
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
|
||||
@echo "Running LaTeX files through platex and dvipdfmx..."
|
||||
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
|
||||
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
|
||||
|
||||
text:
|
||||
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
|
||||
@echo
|
||||
@echo "Build finished. The text files are in $(BUILDDIR)/text."
|
||||
|
||||
man:
|
||||
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
|
||||
@echo
|
||||
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
|
||||
|
||||
texinfo:
|
||||
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
|
||||
@echo
|
||||
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
|
||||
@echo "Run \`make' in that directory to run these through makeinfo" \
|
||||
"(use \`make info' here to do that automatically)."
|
||||
|
||||
info:
|
||||
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
|
||||
@echo "Running Texinfo files through makeinfo..."
|
||||
make -C $(BUILDDIR)/texinfo info
|
||||
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
|
||||
|
||||
gettext:
|
||||
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
|
||||
@echo
|
||||
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
|
||||
|
||||
changes:
|
||||
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
|
||||
@echo
|
||||
@echo "The overview file is in $(BUILDDIR)/changes."
|
||||
|
||||
linkcheck:
|
||||
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
|
||||
@echo
|
||||
@echo "Link check complete; look for any errors in the above output " \
|
||||
"or in $(BUILDDIR)/linkcheck/output.txt."
|
||||
|
||||
doctest:
|
||||
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
|
||||
@echo "Testing of doctests in the sources finished, look at the " \
|
||||
"results in $(BUILDDIR)/doctest/output.txt."
|
||||
|
||||
xml:
|
||||
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
|
||||
@echo
|
||||
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
|
||||
|
||||
pseudoxml:
|
||||
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
|
||||
@echo
|
||||
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
|
|
@ -0,0 +1,3 @@
|
|||
.. :changelog:
|
||||
|
||||
.. include:: ../CHANGES.rst
|
280
tests/wpt/web-platform-tests/tools/html5lib/doc/conf.py
Normal file
280
tests/wpt/web-platform-tests/tools/html5lib/doc/conf.py
Normal file
|
@ -0,0 +1,280 @@
|
|||
#!/usr/bin/env python3
|
||||
# -*- coding: utf-8 -*-
|
||||
#
|
||||
# html5lib documentation build configuration file, created by
|
||||
# sphinx-quickstart on Wed May 8 00:04:49 2013.
|
||||
#
|
||||
# This file is execfile()d with the current directory set to its containing dir.
|
||||
#
|
||||
# Note that not all possible configuration values are present in this
|
||||
# autogenerated file.
|
||||
#
|
||||
# All configuration values have a default; values that are commented out
|
||||
# serve to show the default.
|
||||
|
||||
import sys, os
|
||||
|
||||
# If extensions (or modules to document with autodoc) are in another directory,
|
||||
# add these directories to sys.path here. If the directory is relative to the
|
||||
# documentation root, use os.path.abspath to make it absolute, like shown here.
|
||||
#sys.path.insert(0, os.path.abspath('.'))
|
||||
|
||||
# -- General configuration -----------------------------------------------------
|
||||
|
||||
# If your documentation needs a minimal Sphinx version, state it here.
|
||||
#needs_sphinx = '1.0'
|
||||
|
||||
# Add any Sphinx extension module names here, as strings. They can be extensions
|
||||
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
|
||||
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.viewcode']
|
||||
|
||||
# Add any paths that contain templates here, relative to this directory.
|
||||
templates_path = ['_templates']
|
||||
|
||||
# The suffix of source filenames.
|
||||
source_suffix = '.rst'
|
||||
|
||||
# The encoding of source files.
|
||||
#source_encoding = 'utf-8-sig'
|
||||
|
||||
# The master toctree document.
|
||||
master_doc = 'index'
|
||||
|
||||
# General information about the project.
|
||||
project = 'html5lib'
|
||||
copyright = '2006 - 2013, James Graham, Geoffrey Sneddon, and contributors'
|
||||
|
||||
# The version info for the project you're documenting, acts as replacement for
|
||||
# |version| and |release|, also used in various other places throughout the
|
||||
# built documents.
|
||||
#
|
||||
# The short X.Y version.
|
||||
version = '1.0'
|
||||
# The full version, including alpha/beta/rc tags.
|
||||
sys.path.append(os.path.abspath('..'))
|
||||
from html5lib import __version__
|
||||
release = __version__
|
||||
|
||||
# The language for content autogenerated by Sphinx. Refer to documentation
|
||||
# for a list of supported languages.
|
||||
#language = 'en'
|
||||
|
||||
# There are two options for replacing |today|: either, you set today to some
|
||||
# non-false value, then it is used:
|
||||
#today = ''
|
||||
# Else, today_fmt is used as the format for a strftime call.
|
||||
#today_fmt = '%B %d, %Y'
|
||||
|
||||
# List of patterns, relative to source directory, that match files and
|
||||
# directories to ignore when looking for source files.
|
||||
exclude_patterns = ['_build', 'theme']
|
||||
|
||||
# The reST default role (used for this markup: `text`) to use for all documents.
|
||||
#default_role = None
|
||||
|
||||
# If true, '()' will be appended to :func: etc. cross-reference text.
|
||||
#add_function_parentheses = True
|
||||
|
||||
# If true, the current module name will be prepended to all description
|
||||
# unit titles (such as .. function::).
|
||||
#add_module_names = True
|
||||
|
||||
# If true, sectionauthor and moduleauthor directives will be shown in the
|
||||
# output. They are ignored by default.
|
||||
#show_authors = False
|
||||
|
||||
# The name of the Pygments (syntax highlighting) style to use.
|
||||
pygments_style = 'sphinx'
|
||||
|
||||
# A list of ignored prefixes for module index sorting.
|
||||
#modindex_common_prefix = []
|
||||
|
||||
# If true, keep warnings as "system message" paragraphs in the built documents.
|
||||
#keep_warnings = False
|
||||
|
||||
|
||||
# -- Options for HTML output ---------------------------------------------------
|
||||
|
||||
# The theme to use for HTML and HTML Help pages. See the documentation for
|
||||
# a list of builtin themes.
|
||||
html_theme = 'default'
|
||||
|
||||
# Theme options are theme-specific and customize the look and feel of a theme
|
||||
# further. For a list of options available for each theme, see the
|
||||
# documentation.
|
||||
#html_theme_options = {}
|
||||
|
||||
# Add any paths that contain custom themes here, relative to this directory.
|
||||
#html_theme_path = []
|
||||
|
||||
# The name for this set of Sphinx documents. If None, it defaults to
|
||||
# "<project> v<release> documentation".
|
||||
#html_title = None
|
||||
|
||||
# A shorter title for the navigation bar. Default is the same as html_title.
|
||||
#html_short_title = None
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top
|
||||
# of the sidebar.
|
||||
#html_logo = None
|
||||
|
||||
# The name of an image file (within the static path) to use as favicon of the
|
||||
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
|
||||
# pixels large.
|
||||
#html_favicon = None
|
||||
|
||||
# Add any paths that contain custom static files (such as style sheets) here,
|
||||
# relative to this directory. They are copied after the builtin static files,
|
||||
# so a file named "default.css" will overwrite the builtin "default.css".
|
||||
html_static_path = ['_static']
|
||||
|
||||
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
|
||||
# using the given strftime format.
|
||||
#html_last_updated_fmt = '%b %d, %Y'
|
||||
|
||||
# If true, SmartyPants will be used to convert quotes and dashes to
|
||||
# typographically correct entities.
|
||||
#html_use_smartypants = True
|
||||
|
||||
# Custom sidebar templates, maps document names to template names.
|
||||
#html_sidebars = {}
|
||||
|
||||
# Additional templates that should be rendered to pages, maps page names to
|
||||
# template names.
|
||||
#html_additional_pages = {}
|
||||
|
||||
# If false, no module index is generated.
|
||||
#html_domain_indices = True
|
||||
|
||||
# If false, no index is generated.
|
||||
#html_use_index = True
|
||||
|
||||
# If true, the index is split into individual pages for each letter.
|
||||
#html_split_index = False
|
||||
|
||||
# If true, links to the reST sources are added to the pages.
|
||||
#html_show_sourcelink = True
|
||||
|
||||
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
|
||||
#html_show_sphinx = True
|
||||
|
||||
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
|
||||
#html_show_copyright = True
|
||||
|
||||
# If true, an OpenSearch description file will be output, and all pages will
|
||||
# contain a <link> tag referring to it. The value of this option must be the
|
||||
# base URL from which the finished HTML is served.
|
||||
#html_use_opensearch = ''
|
||||
|
||||
# This is the file name suffix for HTML files (e.g. ".xhtml").
|
||||
#html_file_suffix = None
|
||||
|
||||
# Output file base name for HTML help builder.
|
||||
htmlhelp_basename = 'html5libdoc'
|
||||
|
||||
|
||||
# -- Options for LaTeX output --------------------------------------------------
|
||||
|
||||
latex_elements = {
|
||||
# The paper size ('letterpaper' or 'a4paper').
|
||||
#'papersize': 'letterpaper',
|
||||
|
||||
# The font size ('10pt', '11pt' or '12pt').
|
||||
#'pointsize': '10pt',
|
||||
|
||||
# Additional stuff for the LaTeX preamble.
|
||||
#'preamble': '',
|
||||
}
|
||||
|
||||
# Grouping the document tree into LaTeX files. List of tuples
|
||||
# (source start file, target name, title, author, documentclass [howto/manual]).
|
||||
latex_documents = [
|
||||
('index', 'html5lib.tex', 'html5lib Documentation',
|
||||
'James Graham, Geoffrey Sneddon, and contributors', 'manual'),
|
||||
]
|
||||
|
||||
# The name of an image file (relative to this directory) to place at the top of
|
||||
# the title page.
|
||||
#latex_logo = None
|
||||
|
||||
# For "manual" documents, if this is true, then toplevel headings are parts,
|
||||
# not chapters.
|
||||
#latex_use_parts = False
|
||||
|
||||
# If true, show page references after internal links.
|
||||
#latex_show_pagerefs = False
|
||||
|
||||
# If true, show URL addresses after external links.
|
||||
#latex_show_urls = False
|
||||
|
||||
# Documents to append as an appendix to all manuals.
|
||||
#latex_appendices = []
|
||||
|
||||
# If false, no module index is generated.
|
||||
#latex_domain_indices = True
|
||||
|
||||
|
||||
# -- Options for manual page output --------------------------------------------
|
||||
|
||||
# One entry per manual page. List of tuples
|
||||
# (source start file, name, description, authors, manual section).
|
||||
man_pages = [
|
||||
('index', 'html5lib', 'html5lib Documentation',
|
||||
['James Graham, Geoffrey Sneddon, and contributors'], 1)
|
||||
]
|
||||
|
||||
# If true, show URL addresses after external links.
|
||||
#man_show_urls = False
|
||||
|
||||
|
||||
# -- Options for Texinfo output ------------------------------------------------
|
||||
|
||||
# Grouping the document tree into Texinfo files. List of tuples
|
||||
# (source start file, target name, title, author,
|
||||
# dir menu entry, description, category)
|
||||
texinfo_documents = [
|
||||
('index', 'html5lib', 'html5lib Documentation',
|
||||
'James Graham, Geoffrey Sneddon, and contributors', 'html5lib', 'One line description of project.',
|
||||
'Miscellaneous'),
|
||||
]
|
||||
|
||||
# Documents to append as an appendix to all manuals.
|
||||
#texinfo_appendices = []
|
||||
|
||||
# If false, no module index is generated.
|
||||
#texinfo_domain_indices = True
|
||||
|
||||
# How to display URL addresses: 'footnote', 'no', or 'inline'.
|
||||
#texinfo_show_urls = 'footnote'
|
||||
|
||||
# If true, do not generate a @detailmenu in the "Top" node's menu.
|
||||
#texinfo_no_detailmenu = False
|
||||
|
||||
class CExtMock(object):
|
||||
"""Required for autodoc on readthedocs.org where you cannot build C extensions."""
|
||||
def __init__(self, *args, **kwargs):
|
||||
pass
|
||||
|
||||
def __call__(self, *args, **kwargs):
|
||||
return CExtMock()
|
||||
|
||||
@classmethod
|
||||
def __getattr__(cls, name):
|
||||
if name in ('__file__', '__path__'):
|
||||
return '/dev/null'
|
||||
else:
|
||||
return CExtMock()
|
||||
|
||||
try:
|
||||
import lxml # flake8: noqa
|
||||
except ImportError:
|
||||
sys.modules['lxml'] = CExtMock()
|
||||
sys.modules['lxml.etree'] = CExtMock()
|
||||
print("warning: lxml modules mocked.")
|
||||
|
||||
try:
|
||||
import genshi # flake8: noqa
|
||||
except ImportError:
|
||||
sys.modules['genshi'] = CExtMock()
|
||||
sys.modules['genshi.core'] = CExtMock()
|
||||
print("warning: genshi modules mocked.")
|
|
@ -0,0 +1,59 @@
|
|||
filters Package
|
||||
===============
|
||||
|
||||
:mod:`_base` Module
|
||||
-------------------
|
||||
|
||||
.. automodule:: html5lib.filters._base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`alphabeticalattributes` Module
|
||||
------------------------------------
|
||||
|
||||
.. automodule:: html5lib.filters.alphabeticalattributes
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`inject_meta_charset` Module
|
||||
---------------------------------
|
||||
|
||||
.. automodule:: html5lib.filters.inject_meta_charset
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`lint` Module
|
||||
------------------
|
||||
|
||||
.. automodule:: html5lib.filters.lint
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`optionaltags` Module
|
||||
--------------------------
|
||||
|
||||
.. automodule:: html5lib.filters.optionaltags
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`sanitizer` Module
|
||||
-----------------------
|
||||
|
||||
.. automodule:: html5lib.filters.sanitizer
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`whitespace` Module
|
||||
------------------------
|
||||
|
||||
.. automodule:: html5lib.filters.whitespace
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
77
tests/wpt/web-platform-tests/tools/html5lib/doc/html5lib.rst
Normal file
77
tests/wpt/web-platform-tests/tools/html5lib/doc/html5lib.rst
Normal file
|
@ -0,0 +1,77 @@
|
|||
html5lib Package
|
||||
================
|
||||
|
||||
:mod:`html5lib` Package
|
||||
-----------------------
|
||||
|
||||
.. automodule:: html5lib.__init__
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`constants` Module
|
||||
-----------------------
|
||||
|
||||
.. automodule:: html5lib.constants
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`html5parser` Module
|
||||
-------------------------
|
||||
|
||||
.. automodule:: html5lib.html5parser
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`ihatexml` Module
|
||||
----------------------
|
||||
|
||||
.. automodule:: html5lib.ihatexml
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`inputstream` Module
|
||||
-------------------------
|
||||
|
||||
.. automodule:: html5lib.inputstream
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`sanitizer` Module
|
||||
-----------------------
|
||||
|
||||
.. automodule:: html5lib.sanitizer
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`tokenizer` Module
|
||||
-----------------------
|
||||
|
||||
.. automodule:: html5lib.tokenizer
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`utils` Module
|
||||
-------------------
|
||||
|
||||
.. automodule:: html5lib.utils
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
Subpackages
|
||||
-----------
|
||||
|
||||
.. toctree::
|
||||
|
||||
html5lib.filters
|
||||
html5lib.serializer
|
||||
html5lib.treebuilders
|
||||
html5lib.treewalkers
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
serializer Package
|
||||
==================
|
||||
|
||||
:mod:`serializer` Package
|
||||
-------------------------
|
||||
|
||||
.. automodule:: html5lib.serializer
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`htmlserializer` Module
|
||||
----------------------------
|
||||
|
||||
.. automodule:: html5lib.serializer.htmlserializer
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
|
@ -0,0 +1,43 @@
|
|||
treebuilders Package
|
||||
====================
|
||||
|
||||
:mod:`treebuilders` Package
|
||||
---------------------------
|
||||
|
||||
.. automodule:: html5lib.treebuilders
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`_base` Module
|
||||
-------------------
|
||||
|
||||
.. automodule:: html5lib.treebuilders._base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`dom` Module
|
||||
-----------------
|
||||
|
||||
.. automodule:: html5lib.treebuilders.dom
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`etree` Module
|
||||
-------------------
|
||||
|
||||
.. automodule:: html5lib.treebuilders.etree
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`etree_lxml` Module
|
||||
------------------------
|
||||
|
||||
.. automodule:: html5lib.treebuilders.etree_lxml
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
|
@ -0,0 +1,59 @@
|
|||
treewalkers Package
|
||||
===================
|
||||
|
||||
:mod:`treewalkers` Package
|
||||
--------------------------
|
||||
|
||||
.. automodule:: html5lib.treewalkers
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`_base` Module
|
||||
-------------------
|
||||
|
||||
.. automodule:: html5lib.treewalkers._base
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`dom` Module
|
||||
-----------------
|
||||
|
||||
.. automodule:: html5lib.treewalkers.dom
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`etree` Module
|
||||
-------------------
|
||||
|
||||
.. automodule:: html5lib.treewalkers.etree
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`genshistream` Module
|
||||
--------------------------
|
||||
|
||||
.. automodule:: html5lib.treewalkers.genshistream
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`lxmletree` Module
|
||||
-----------------------
|
||||
|
||||
.. automodule:: html5lib.treewalkers.lxmletree
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
||||
:mod:`pulldom` Module
|
||||
---------------------
|
||||
|
||||
.. automodule:: html5lib.treewalkers.pulldom
|
||||
:members:
|
||||
:undoc-members:
|
||||
:show-inheritance:
|
||||
|
21
tests/wpt/web-platform-tests/tools/html5lib/doc/index.rst
Normal file
21
tests/wpt/web-platform-tests/tools/html5lib/doc/index.rst
Normal file
|
@ -0,0 +1,21 @@
|
|||
Overview
|
||||
========
|
||||
|
||||
.. include:: ../README.rst
|
||||
:start-line: 6
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
movingparts
|
||||
changes
|
||||
License <license>
|
||||
|
||||
|
||||
Indices and tables
|
||||
==================
|
||||
|
||||
* :ref:`genindex`
|
||||
* :ref:`modindex`
|
||||
* :ref:`search`
|
||||
|
|
@ -0,0 +1,4 @@
|
|||
License
|
||||
=======
|
||||
|
||||
.. include:: ../LICENSE
|
242
tests/wpt/web-platform-tests/tools/html5lib/doc/make.bat
Normal file
242
tests/wpt/web-platform-tests/tools/html5lib/doc/make.bat
Normal file
|
@ -0,0 +1,242 @@
|
|||
@ECHO OFF
|
||||
|
||||
REM Command file for Sphinx documentation
|
||||
|
||||
if "%SPHINXBUILD%" == "" (
|
||||
set SPHINXBUILD=sphinx-build
|
||||
)
|
||||
set BUILDDIR=_build
|
||||
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
|
||||
set I18NSPHINXOPTS=%SPHINXOPTS% .
|
||||
if NOT "%PAPER%" == "" (
|
||||
set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
|
||||
set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
|
||||
)
|
||||
|
||||
if "%1" == "" goto help
|
||||
|
||||
if "%1" == "help" (
|
||||
:help
|
||||
echo.Please use `make ^<target^>` where ^<target^> is one of
|
||||
echo. html to make standalone HTML files
|
||||
echo. dirhtml to make HTML files named index.html in directories
|
||||
echo. singlehtml to make a single large HTML file
|
||||
echo. pickle to make pickle files
|
||||
echo. json to make JSON files
|
||||
echo. htmlhelp to make HTML files and a HTML help project
|
||||
echo. qthelp to make HTML files and a qthelp project
|
||||
echo. devhelp to make HTML files and a Devhelp project
|
||||
echo. epub to make an epub
|
||||
echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter
|
||||
echo. text to make text files
|
||||
echo. man to make manual pages
|
||||
echo. texinfo to make Texinfo files
|
||||
echo. gettext to make PO message catalogs
|
||||
echo. changes to make an overview over all changed/added/deprecated items
|
||||
echo. xml to make Docutils-native XML files
|
||||
echo. pseudoxml to make pseudoxml-XML files for display purposes
|
||||
echo. linkcheck to check all external links for integrity
|
||||
echo. doctest to run all doctests embedded in the documentation if enabled
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "clean" (
|
||||
for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
|
||||
del /q /s %BUILDDIR%\*
|
||||
goto end
|
||||
)
|
||||
|
||||
|
||||
%SPHINXBUILD% 2> nul
|
||||
if errorlevel 9009 (
|
||||
echo.
|
||||
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
|
||||
echo.installed, then set the SPHINXBUILD environment variable to point
|
||||
echo.to the full path of the 'sphinx-build' executable. Alternatively you
|
||||
echo.may add the Sphinx directory to PATH.
|
||||
echo.
|
||||
echo.If you don't have Sphinx installed, grab it from
|
||||
echo.http://sphinx-doc.org/
|
||||
exit /b 1
|
||||
)
|
||||
|
||||
if "%1" == "html" (
|
||||
%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The HTML pages are in %BUILDDIR%/html.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "dirhtml" (
|
||||
%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "singlehtml" (
|
||||
%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "pickle" (
|
||||
%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can process the pickle files.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "json" (
|
||||
%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can process the JSON files.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "htmlhelp" (
|
||||
%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can run HTML Help Workshop with the ^
|
||||
.hhp project file in %BUILDDIR%/htmlhelp.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "qthelp" (
|
||||
%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; now you can run "qcollectiongenerator" with the ^
|
||||
.qhcp project file in %BUILDDIR%/qthelp, like this:
|
||||
echo.^> qcollectiongenerator %BUILDDIR%\qthelp\html5lib.qhcp
|
||||
echo.To view the help file:
|
||||
echo.^> assistant -collectionFile %BUILDDIR%\qthelp\html5lib.ghc
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "devhelp" (
|
||||
%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "epub" (
|
||||
%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The epub file is in %BUILDDIR%/epub.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "latex" (
|
||||
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "latexpdf" (
|
||||
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
|
||||
cd %BUILDDIR%/latex
|
||||
make all-pdf
|
||||
cd %BUILDDIR%/..
|
||||
echo.
|
||||
echo.Build finished; the PDF files are in %BUILDDIR%/latex.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "latexpdfja" (
|
||||
%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
|
||||
cd %BUILDDIR%/latex
|
||||
make all-pdf-ja
|
||||
cd %BUILDDIR%/..
|
||||
echo.
|
||||
echo.Build finished; the PDF files are in %BUILDDIR%/latex.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "text" (
|
||||
%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The text files are in %BUILDDIR%/text.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "man" (
|
||||
%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The manual pages are in %BUILDDIR%/man.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "texinfo" (
|
||||
%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "gettext" (
|
||||
%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "changes" (
|
||||
%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.The overview file is in %BUILDDIR%/changes.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "linkcheck" (
|
||||
%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Link check complete; look for any errors in the above output ^
|
||||
or in %BUILDDIR%/linkcheck/output.txt.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "doctest" (
|
||||
%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Testing of doctests in the sources finished, look at the ^
|
||||
results in %BUILDDIR%/doctest/output.txt.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "xml" (
|
||||
%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The XML files are in %BUILDDIR%/xml.
|
||||
goto end
|
||||
)
|
||||
|
||||
if "%1" == "pseudoxml" (
|
||||
%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
|
||||
if errorlevel 1 exit /b 1
|
||||
echo.
|
||||
echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
|
||||
goto end
|
||||
)
|
||||
|
||||
:end
|
|
@ -0,0 +1,7 @@
|
|||
html5lib
|
||||
========
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 4
|
||||
|
||||
html5lib
|
209
tests/wpt/web-platform-tests/tools/html5lib/doc/movingparts.rst
Normal file
209
tests/wpt/web-platform-tests/tools/html5lib/doc/movingparts.rst
Normal file
|
@ -0,0 +1,209 @@
|
|||
The moving parts
|
||||
================
|
||||
|
||||
html5lib consists of a number of components, which are responsible for
|
||||
handling its features.
|
||||
|
||||
|
||||
Tree builders
|
||||
-------------
|
||||
|
||||
The parser reads HTML by tokenizing the content and building a tree that
|
||||
the user can later access. There are three main types of trees that
|
||||
html5lib can build:
|
||||
|
||||
* ``etree`` - this is the default; builds a tree based on ``xml.etree``,
|
||||
which can be found in the standard library. Whenever possible, the
|
||||
accelerated ``ElementTree`` implementation (i.e.
|
||||
``xml.etree.cElementTree`` on Python 2.x) is used.
|
||||
|
||||
* ``dom`` - builds a tree based on ``xml.dom.minidom``.
|
||||
|
||||
* ``lxml.etree`` - uses lxml's implementation of the ``ElementTree``
|
||||
API. The performance gains are relatively small compared to using the
|
||||
accelerated ``ElementTree`` module.
|
||||
|
||||
You can specify the builder by name when using the shorthand API:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
with open("mydocument.html", "rb") as f:
|
||||
lxml_etree_document = html5lib.parse(f, treebuilder="lxml")
|
||||
|
||||
When instantiating a parser object, you have to pass a tree builder
|
||||
class in the ``tree`` keyword attribute:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
parser = html5lib.HTMLParser(tree=SomeTreeBuilder)
|
||||
document = parser.parse("<p>Hello World!")
|
||||
|
||||
To get a builder class by name, use the ``getTreeBuilder`` function:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
|
||||
minidom_document = parser.parse("<p>Hello World!")
|
||||
|
||||
The implementation of builders can be found in `html5lib/treebuilders/
|
||||
<https://github.com/html5lib/html5lib-python/tree/master/html5lib/treebuilders>`_.
|
||||
|
||||
|
||||
Tree walkers
|
||||
------------
|
||||
|
||||
Once a tree is ready, you can work on it either manually, or using
|
||||
a tree walker, which provides a streaming view of the tree. html5lib
|
||||
provides walkers for all three supported types of trees (``etree``,
|
||||
``dom`` and ``lxml``).
|
||||
|
||||
The implementation of walkers can be found in `html5lib/treewalkers/
|
||||
<https://github.com/html5lib/html5lib-python/tree/master/html5lib/treewalkers>`_.
|
||||
|
||||
Walkers make consuming HTML easier. html5lib uses them to provide you
|
||||
with has a couple of handy tools.
|
||||
|
||||
|
||||
HTMLSerializer
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
The serializer lets you write HTML back as a stream of bytes.
|
||||
|
||||
.. code-block:: pycon
|
||||
|
||||
>>> import html5lib
|
||||
>>> element = html5lib.parse('<p xml:lang="pl">Witam wszystkich')
|
||||
>>> walker = html5lib.getTreeWalker("etree")
|
||||
>>> stream = walker(element)
|
||||
>>> s = html5lib.serializer.HTMLSerializer()
|
||||
>>> output = s.serialize(stream)
|
||||
>>> for item in output:
|
||||
... print("%r" % item)
|
||||
'<p'
|
||||
' '
|
||||
'xml:lang'
|
||||
'='
|
||||
'pl'
|
||||
'>'
|
||||
'Witam wszystkich'
|
||||
|
||||
You can customize the serializer behaviour in a variety of ways, consult
|
||||
the :class:`~html5lib.serializer.htmlserializer.HTMLSerializer`
|
||||
documentation.
|
||||
|
||||
|
||||
Filters
|
||||
~~~~~~~
|
||||
|
||||
You can alter the stream content with filters provided by html5lib:
|
||||
|
||||
* :class:`alphabeticalattributes.Filter
|
||||
<html5lib.filters.alphabeticalattributes.Filter>` sorts attributes on
|
||||
tags to be in alphabetical order
|
||||
|
||||
* :class:`inject_meta_charset.Filter
|
||||
<html5lib.filters.inject_meta_charset.Filter>` sets a user-specified
|
||||
encoding in the correct ``<meta>`` tag in the ``<head>`` section of
|
||||
the document
|
||||
|
||||
* :class:`lint.Filter <html5lib.filters.lint.Filter>` raises
|
||||
``LintError`` exceptions on invalid tag and attribute names, invalid
|
||||
PCDATA, etc.
|
||||
|
||||
* :class:`optionaltags.Filter <html5lib.filters.optionaltags.Filter>`
|
||||
removes tags from the stream which are not necessary to produce valid
|
||||
HTML
|
||||
|
||||
* :class:`sanitizer.Filter <html5lib.filters.sanitizer.Filter>` removes
|
||||
unsafe markup and CSS. Elements that are known to be safe are passed
|
||||
through and the rest is converted to visible text. The default
|
||||
configuration of the sanitizer follows the `WHATWG Sanitization Rules
|
||||
<http://wiki.whatwg.org/wiki/Sanitization_rules>`_.
|
||||
|
||||
* :class:`whitespace.Filter <html5lib.filters.whitespace.Filter>`
|
||||
collapses all whitespace characters to single spaces unless they're in
|
||||
``<pre/>`` or ``textarea`` tags.
|
||||
|
||||
To use a filter, simply wrap it around a stream:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
>>> import html5lib
|
||||
>>> from html5lib.filters import sanitizer
|
||||
>>> dom = html5lib.parse("<p><script>alert('Boo!')", treebuilder="dom")
|
||||
>>> walker = html5lib.getTreeWalker("dom")
|
||||
>>> stream = walker(dom)
|
||||
>>> sane_stream = sanitizer.Filter(stream) clean_stream = sanitizer.Filter(stream)
|
||||
|
||||
|
||||
Tree adapters
|
||||
-------------
|
||||
|
||||
Used to translate one type of tree to another. More documentation
|
||||
pending, sorry.
|
||||
|
||||
|
||||
Encoding discovery
|
||||
------------------
|
||||
|
||||
Parsed trees are always Unicode. However a large variety of input
|
||||
encodings are supported. The encoding of the document is determined in
|
||||
the following way:
|
||||
|
||||
* The encoding may be explicitly specified by passing the name of the
|
||||
encoding as the encoding parameter to the
|
||||
:meth:`~html5lib.html5parser.HTMLParser.parse` method on
|
||||
``HTMLParser`` objects.
|
||||
|
||||
* If no encoding is specified, the parser will attempt to detect the
|
||||
encoding from a ``<meta>`` element in the first 512 bytes of the
|
||||
document (this is only a partial implementation of the current HTML
|
||||
5 specification).
|
||||
|
||||
* If no encoding can be found and the chardet library is available, an
|
||||
attempt will be made to sniff the encoding from the byte pattern.
|
||||
|
||||
* If all else fails, the default encoding will be used. This is usually
|
||||
`Windows-1252 <http://en.wikipedia.org/wiki/Windows-1252>`_, which is
|
||||
a common fallback used by Web browsers.
|
||||
|
||||
|
||||
Tokenizers
|
||||
----------
|
||||
|
||||
The part of the parser responsible for translating a raw input stream
|
||||
into meaningful tokens is the tokenizer. Currently html5lib provides
|
||||
two.
|
||||
|
||||
To set up a tokenizer, simply pass it when instantiating
|
||||
a :class:`~html5lib.html5parser.HTMLParser`:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
import html5lib
|
||||
from html5lib import sanitizer
|
||||
|
||||
p = html5lib.HTMLParser(tokenizer=sanitizer.HTMLSanitizer)
|
||||
p.parse("<p>Surprise!<script>alert('Boo!');</script>")
|
||||
|
||||
HTMLTokenizer
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
This is the default tokenizer, the heart of html5lib. The implementation
|
||||
can be found in `html5lib/tokenizer.py
|
||||
<https://github.com/html5lib/html5lib-python/blob/master/html5lib/tokenizer.py>`_.
|
||||
|
||||
HTMLSanitizer
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
This is a tokenizer that removes unsafe markup and CSS styles from the
|
||||
input. Elements that are known to be safe are passed through and the
|
||||
rest is converted to visible text. The default configuration of the
|
||||
sanitizer follows the `WHATWG Sanitization Rules
|
||||
<http://wiki.whatwg.org/wiki/Sanitization_rules>`_.
|
||||
|
||||
The implementation can be found in `html5lib/sanitizer.py
|
||||
<https://github.com/html5lib/html5lib-python/blob/master/html5lib/sanitizer.py>`_.
|
14
tests/wpt/web-platform-tests/tools/html5lib/flake8-run.sh
Executable file
14
tests/wpt/web-platform-tests/tools/html5lib/flake8-run.sh
Executable file
|
@ -0,0 +1,14 @@
|
|||
#!/bin/bash -e
|
||||
|
||||
if [[ ! -x $(which flake8) ]]; then
|
||||
echo "fatal: flake8 not found on $PATH. Exiting."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [[ $TRAVIS != "true" || $FLAKE == "true" ]]; then
|
||||
find html5lib/ -name '*.py' -and -not -name 'constants.py' -print0 | xargs -0 flake8 --ignore=E501
|
||||
flake1=$?
|
||||
flake8 --max-line-length=99 --ignore=E126 html5lib/constants.py
|
||||
flake2=$?
|
||||
exit $[$flake1 || $flake2]
|
||||
fi
|
|
@ -0,0 +1,23 @@
|
|||
"""
|
||||
HTML parsing library based on the WHATWG "HTML5"
|
||||
specification. The parser is designed to be compatible with existing
|
||||
HTML found in the wild and implements well-defined error recovery that
|
||||
is largely compatible with modern desktop web browsers.
|
||||
|
||||
Example usage:
|
||||
|
||||
import html5lib
|
||||
f = open("my_document.html")
|
||||
tree = html5lib.parse(f)
|
||||
"""
|
||||
|
||||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from .html5parser import HTMLParser, parse, parseFragment
|
||||
from .treebuilders import getTreeBuilder
|
||||
from .treewalkers import getTreeWalker
|
||||
from .serializer import serialize
|
||||
|
||||
__all__ = ["HTMLParser", "parse", "parseFragment", "getTreeBuilder",
|
||||
"getTreeWalker", "serialize"]
|
||||
__version__ = "0.9999-dev"
|
3104
tests/wpt/web-platform-tests/tools/html5lib/html5lib/constants.py
Normal file
3104
tests/wpt/web-platform-tests/tools/html5lib/html5lib/constants.py
Normal file
File diff suppressed because it is too large
Load diff
|
@ -0,0 +1,12 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
|
||||
class Filter(object):
|
||||
def __init__(self, source):
|
||||
self.source = source
|
||||
|
||||
def __iter__(self):
|
||||
return iter(self.source)
|
||||
|
||||
def __getattr__(self, name):
|
||||
return getattr(self.source, name)
|
|
@ -0,0 +1,20 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from . import _base
|
||||
|
||||
try:
|
||||
from collections import OrderedDict
|
||||
except ImportError:
|
||||
from ordereddict import OrderedDict
|
||||
|
||||
|
||||
class Filter(_base.Filter):
|
||||
def __iter__(self):
|
||||
for token in _base.Filter.__iter__(self):
|
||||
if token["type"] in ("StartTag", "EmptyTag"):
|
||||
attrs = OrderedDict()
|
||||
for name, value in sorted(token["data"].items(),
|
||||
key=lambda x: x[0]):
|
||||
attrs[name] = value
|
||||
token["data"] = attrs
|
||||
yield token
|
|
@ -0,0 +1,65 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from . import _base
|
||||
|
||||
|
||||
class Filter(_base.Filter):
|
||||
def __init__(self, source, encoding):
|
||||
_base.Filter.__init__(self, source)
|
||||
self.encoding = encoding
|
||||
|
||||
def __iter__(self):
|
||||
state = "pre_head"
|
||||
meta_found = (self.encoding is None)
|
||||
pending = []
|
||||
|
||||
for token in _base.Filter.__iter__(self):
|
||||
type = token["type"]
|
||||
if type == "StartTag":
|
||||
if token["name"].lower() == "head":
|
||||
state = "in_head"
|
||||
|
||||
elif type == "EmptyTag":
|
||||
if token["name"].lower() == "meta":
|
||||
# replace charset with actual encoding
|
||||
has_http_equiv_content_type = False
|
||||
for (namespace, name), value in token["data"].items():
|
||||
if namespace is not None:
|
||||
continue
|
||||
elif name.lower() == 'charset':
|
||||
token["data"][(namespace, name)] = self.encoding
|
||||
meta_found = True
|
||||
break
|
||||
elif name == 'http-equiv' and value.lower() == 'content-type':
|
||||
has_http_equiv_content_type = True
|
||||
else:
|
||||
if has_http_equiv_content_type and (None, "content") in token["data"]:
|
||||
token["data"][(None, "content")] = 'text/html; charset=%s' % self.encoding
|
||||
meta_found = True
|
||||
|
||||
elif token["name"].lower() == "head" and not meta_found:
|
||||
# insert meta into empty head
|
||||
yield {"type": "StartTag", "name": "head",
|
||||
"data": token["data"]}
|
||||
yield {"type": "EmptyTag", "name": "meta",
|
||||
"data": {(None, "charset"): self.encoding}}
|
||||
yield {"type": "EndTag", "name": "head"}
|
||||
meta_found = True
|
||||
continue
|
||||
|
||||
elif type == "EndTag":
|
||||
if token["name"].lower() == "head" and pending:
|
||||
# insert meta into head (if necessary) and flush pending queue
|
||||
yield pending.pop(0)
|
||||
if not meta_found:
|
||||
yield {"type": "EmptyTag", "name": "meta",
|
||||
"data": {(None, "charset"): self.encoding}}
|
||||
while pending:
|
||||
yield pending.pop(0)
|
||||
meta_found = True
|
||||
state = "post_head"
|
||||
|
||||
if state == "in_head":
|
||||
pending.append(token)
|
||||
else:
|
||||
yield token
|
|
@ -0,0 +1,93 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from gettext import gettext
|
||||
_ = gettext
|
||||
|
||||
from . import _base
|
||||
from ..constants import cdataElements, rcdataElements, voidElements
|
||||
|
||||
from ..constants import spaceCharacters
|
||||
spaceCharacters = "".join(spaceCharacters)
|
||||
|
||||
|
||||
class LintError(Exception):
|
||||
pass
|
||||
|
||||
|
||||
class Filter(_base.Filter):
|
||||
def __iter__(self):
|
||||
open_elements = []
|
||||
contentModelFlag = "PCDATA"
|
||||
for token in _base.Filter.__iter__(self):
|
||||
type = token["type"]
|
||||
if type in ("StartTag", "EmptyTag"):
|
||||
name = token["name"]
|
||||
if contentModelFlag != "PCDATA":
|
||||
raise LintError(_("StartTag not in PCDATA content model flag: %(tag)s") % {"tag": name})
|
||||
if not isinstance(name, str):
|
||||
raise LintError(_("Tag name is not a string: %(tag)r") % {"tag": name})
|
||||
if not name:
|
||||
raise LintError(_("Empty tag name"))
|
||||
if type == "StartTag" and name in voidElements:
|
||||
raise LintError(_("Void element reported as StartTag token: %(tag)s") % {"tag": name})
|
||||
elif type == "EmptyTag" and name not in voidElements:
|
||||
raise LintError(_("Non-void element reported as EmptyTag token: %(tag)s") % {"tag": token["name"]})
|
||||
if type == "StartTag":
|
||||
open_elements.append(name)
|
||||
for name, value in token["data"]:
|
||||
if not isinstance(name, str):
|
||||
raise LintError(_("Attribute name is not a string: %(name)r") % {"name": name})
|
||||
if not name:
|
||||
raise LintError(_("Empty attribute name"))
|
||||
if not isinstance(value, str):
|
||||
raise LintError(_("Attribute value is not a string: %(value)r") % {"value": value})
|
||||
if name in cdataElements:
|
||||
contentModelFlag = "CDATA"
|
||||
elif name in rcdataElements:
|
||||
contentModelFlag = "RCDATA"
|
||||
elif name == "plaintext":
|
||||
contentModelFlag = "PLAINTEXT"
|
||||
|
||||
elif type == "EndTag":
|
||||
name = token["name"]
|
||||
if not isinstance(name, str):
|
||||
raise LintError(_("Tag name is not a string: %(tag)r") % {"tag": name})
|
||||
if not name:
|
||||
raise LintError(_("Empty tag name"))
|
||||
if name in voidElements:
|
||||
raise LintError(_("Void element reported as EndTag token: %(tag)s") % {"tag": name})
|
||||
start_name = open_elements.pop()
|
||||
if start_name != name:
|
||||
raise LintError(_("EndTag (%(end)s) does not match StartTag (%(start)s)") % {"end": name, "start": start_name})
|
||||
contentModelFlag = "PCDATA"
|
||||
|
||||
elif type == "Comment":
|
||||
if contentModelFlag != "PCDATA":
|
||||
raise LintError(_("Comment not in PCDATA content model flag"))
|
||||
|
||||
elif type in ("Characters", "SpaceCharacters"):
|
||||
data = token["data"]
|
||||
if not isinstance(data, str):
|
||||
raise LintError(_("Attribute name is not a string: %(name)r") % {"name": data})
|
||||
if not data:
|
||||
raise LintError(_("%(type)s token with empty data") % {"type": type})
|
||||
if type == "SpaceCharacters":
|
||||
data = data.strip(spaceCharacters)
|
||||
if data:
|
||||
raise LintError(_("Non-space character(s) found in SpaceCharacters token: %(token)r") % {"token": data})
|
||||
|
||||
elif type == "Doctype":
|
||||
name = token["name"]
|
||||
if contentModelFlag != "PCDATA":
|
||||
raise LintError(_("Doctype not in PCDATA content model flag: %(name)s") % {"name": name})
|
||||
if not isinstance(name, str):
|
||||
raise LintError(_("Tag name is not a string: %(tag)r") % {"tag": name})
|
||||
# XXX: what to do with token["data"] ?
|
||||
|
||||
elif type in ("ParseError", "SerializeError"):
|
||||
pass
|
||||
|
||||
else:
|
||||
raise LintError(_("Unknown token type: %(type)s") % {"type": type})
|
||||
|
||||
yield token
|
|
@ -0,0 +1,205 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from . import _base
|
||||
|
||||
|
||||
class Filter(_base.Filter):
|
||||
def slider(self):
|
||||
previous1 = previous2 = None
|
||||
for token in self.source:
|
||||
if previous1 is not None:
|
||||
yield previous2, previous1, token
|
||||
previous2 = previous1
|
||||
previous1 = token
|
||||
yield previous2, previous1, None
|
||||
|
||||
def __iter__(self):
|
||||
for previous, token, next in self.slider():
|
||||
type = token["type"]
|
||||
if type == "StartTag":
|
||||
if (token["data"] or
|
||||
not self.is_optional_start(token["name"], previous, next)):
|
||||
yield token
|
||||
elif type == "EndTag":
|
||||
if not self.is_optional_end(token["name"], next):
|
||||
yield token
|
||||
else:
|
||||
yield token
|
||||
|
||||
def is_optional_start(self, tagname, previous, next):
|
||||
type = next and next["type"] or None
|
||||
if tagname in 'html':
|
||||
# An html element's start tag may be omitted if the first thing
|
||||
# inside the html element is not a space character or a comment.
|
||||
return type not in ("Comment", "SpaceCharacters")
|
||||
elif tagname == 'head':
|
||||
# A head element's start tag may be omitted if the first thing
|
||||
# inside the head element is an element.
|
||||
# XXX: we also omit the start tag if the head element is empty
|
||||
if type in ("StartTag", "EmptyTag"):
|
||||
return True
|
||||
elif type == "EndTag":
|
||||
return next["name"] == "head"
|
||||
elif tagname == 'body':
|
||||
# A body element's start tag may be omitted if the first thing
|
||||
# inside the body element is not a space character or a comment,
|
||||
# except if the first thing inside the body element is a script
|
||||
# or style element and the node immediately preceding the body
|
||||
# element is a head element whose end tag has been omitted.
|
||||
if type in ("Comment", "SpaceCharacters"):
|
||||
return False
|
||||
elif type == "StartTag":
|
||||
# XXX: we do not look at the preceding event, so we never omit
|
||||
# the body element's start tag if it's followed by a script or
|
||||
# a style element.
|
||||
return next["name"] not in ('script', 'style')
|
||||
else:
|
||||
return True
|
||||
elif tagname == 'colgroup':
|
||||
# A colgroup element's start tag may be omitted if the first thing
|
||||
# inside the colgroup element is a col element, and if the element
|
||||
# is not immediately preceeded by another colgroup element whose
|
||||
# end tag has been omitted.
|
||||
if type in ("StartTag", "EmptyTag"):
|
||||
# XXX: we do not look at the preceding event, so instead we never
|
||||
# omit the colgroup element's end tag when it is immediately
|
||||
# followed by another colgroup element. See is_optional_end.
|
||||
return next["name"] == "col"
|
||||
else:
|
||||
return False
|
||||
elif tagname == 'tbody':
|
||||
# A tbody element's start tag may be omitted if the first thing
|
||||
# inside the tbody element is a tr element, and if the element is
|
||||
# not immediately preceeded by a tbody, thead, or tfoot element
|
||||
# whose end tag has been omitted.
|
||||
if type == "StartTag":
|
||||
# omit the thead and tfoot elements' end tag when they are
|
||||
# immediately followed by a tbody element. See is_optional_end.
|
||||
if previous and previous['type'] == 'EndTag' and \
|
||||
previous['name'] in ('tbody', 'thead', 'tfoot'):
|
||||
return False
|
||||
return next["name"] == 'tr'
|
||||
else:
|
||||
return False
|
||||
return False
|
||||
|
||||
def is_optional_end(self, tagname, next):
|
||||
type = next and next["type"] or None
|
||||
if tagname in ('html', 'head', 'body'):
|
||||
# An html element's end tag may be omitted if the html element
|
||||
# is not immediately followed by a space character or a comment.
|
||||
return type not in ("Comment", "SpaceCharacters")
|
||||
elif tagname in ('li', 'optgroup', 'tr'):
|
||||
# A li element's end tag may be omitted if the li element is
|
||||
# immediately followed by another li element or if there is
|
||||
# no more content in the parent element.
|
||||
# An optgroup element's end tag may be omitted if the optgroup
|
||||
# element is immediately followed by another optgroup element,
|
||||
# or if there is no more content in the parent element.
|
||||
# A tr element's end tag may be omitted if the tr element is
|
||||
# immediately followed by another tr element, or if there is
|
||||
# no more content in the parent element.
|
||||
if type == "StartTag":
|
||||
return next["name"] == tagname
|
||||
else:
|
||||
return type == "EndTag" or type is None
|
||||
elif tagname in ('dt', 'dd'):
|
||||
# A dt element's end tag may be omitted if the dt element is
|
||||
# immediately followed by another dt element or a dd element.
|
||||
# A dd element's end tag may be omitted if the dd element is
|
||||
# immediately followed by another dd element or a dt element,
|
||||
# or if there is no more content in the parent element.
|
||||
if type == "StartTag":
|
||||
return next["name"] in ('dt', 'dd')
|
||||
elif tagname == 'dd':
|
||||
return type == "EndTag" or type is None
|
||||
else:
|
||||
return False
|
||||
elif tagname == 'p':
|
||||
# A p element's end tag may be omitted if the p element is
|
||||
# immediately followed by an address, article, aside,
|
||||
# blockquote, datagrid, dialog, dir, div, dl, fieldset,
|
||||
# footer, form, h1, h2, h3, h4, h5, h6, header, hr, menu,
|
||||
# nav, ol, p, pre, section, table, or ul, element, or if
|
||||
# there is no more content in the parent element.
|
||||
if type in ("StartTag", "EmptyTag"):
|
||||
return next["name"] in ('address', 'article', 'aside',
|
||||
'blockquote', 'datagrid', 'dialog',
|
||||
'dir', 'div', 'dl', 'fieldset', 'footer',
|
||||
'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
|
||||
'header', 'hr', 'menu', 'nav', 'ol',
|
||||
'p', 'pre', 'section', 'table', 'ul')
|
||||
else:
|
||||
return type == "EndTag" or type is None
|
||||
elif tagname == 'option':
|
||||
# An option element's end tag may be omitted if the option
|
||||
# element is immediately followed by another option element,
|
||||
# or if it is immediately followed by an <code>optgroup</code>
|
||||
# element, or if there is no more content in the parent
|
||||
# element.
|
||||
if type == "StartTag":
|
||||
return next["name"] in ('option', 'optgroup')
|
||||
else:
|
||||
return type == "EndTag" or type is None
|
||||
elif tagname in ('rt', 'rp'):
|
||||
# An rt element's end tag may be omitted if the rt element is
|
||||
# immediately followed by an rt or rp element, or if there is
|
||||
# no more content in the parent element.
|
||||
# An rp element's end tag may be omitted if the rp element is
|
||||
# immediately followed by an rt or rp element, or if there is
|
||||
# no more content in the parent element.
|
||||
if type == "StartTag":
|
||||
return next["name"] in ('rt', 'rp')
|
||||
else:
|
||||
return type == "EndTag" or type is None
|
||||
elif tagname == 'colgroup':
|
||||
# A colgroup element's end tag may be omitted if the colgroup
|
||||
# element is not immediately followed by a space character or
|
||||
# a comment.
|
||||
if type in ("Comment", "SpaceCharacters"):
|
||||
return False
|
||||
elif type == "StartTag":
|
||||
# XXX: we also look for an immediately following colgroup
|
||||
# element. See is_optional_start.
|
||||
return next["name"] != 'colgroup'
|
||||
else:
|
||||
return True
|
||||
elif tagname in ('thead', 'tbody'):
|
||||
# A thead element's end tag may be omitted if the thead element
|
||||
# is immediately followed by a tbody or tfoot element.
|
||||
# A tbody element's end tag may be omitted if the tbody element
|
||||
# is immediately followed by a tbody or tfoot element, or if
|
||||
# there is no more content in the parent element.
|
||||
# A tfoot element's end tag may be omitted if the tfoot element
|
||||
# is immediately followed by a tbody element, or if there is no
|
||||
# more content in the parent element.
|
||||
# XXX: we never omit the end tag when the following element is
|
||||
# a tbody. See is_optional_start.
|
||||
if type == "StartTag":
|
||||
return next["name"] in ['tbody', 'tfoot']
|
||||
elif tagname == 'tbody':
|
||||
return type == "EndTag" or type is None
|
||||
else:
|
||||
return False
|
||||
elif tagname == 'tfoot':
|
||||
# A tfoot element's end tag may be omitted if the tfoot element
|
||||
# is immediately followed by a tbody element, or if there is no
|
||||
# more content in the parent element.
|
||||
# XXX: we never omit the end tag when the following element is
|
||||
# a tbody. See is_optional_start.
|
||||
if type == "StartTag":
|
||||
return next["name"] == 'tbody'
|
||||
else:
|
||||
return type == "EndTag" or type is None
|
||||
elif tagname in ('td', 'th'):
|
||||
# A td element's end tag may be omitted if the td element is
|
||||
# immediately followed by a td or th element, or if there is
|
||||
# no more content in the parent element.
|
||||
# A th element's end tag may be omitted if the th element is
|
||||
# immediately followed by a td or th element, or if there is
|
||||
# no more content in the parent element.
|
||||
if type == "StartTag":
|
||||
return next["name"] in ('td', 'th')
|
||||
else:
|
||||
return type == "EndTag" or type is None
|
||||
return False
|
|
@ -0,0 +1,12 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from . import _base
|
||||
from ..sanitizer import HTMLSanitizerMixin
|
||||
|
||||
|
||||
class Filter(_base.Filter, HTMLSanitizerMixin):
|
||||
def __iter__(self):
|
||||
for token in _base.Filter.__iter__(self):
|
||||
token = self.sanitize_token(token)
|
||||
if token:
|
||||
yield token
|
|
@ -0,0 +1,38 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import re
|
||||
|
||||
from . import _base
|
||||
from ..constants import rcdataElements, spaceCharacters
|
||||
spaceCharacters = "".join(spaceCharacters)
|
||||
|
||||
SPACES_REGEX = re.compile("[%s]+" % spaceCharacters)
|
||||
|
||||
|
||||
class Filter(_base.Filter):
|
||||
|
||||
spacePreserveElements = frozenset(["pre", "textarea"] + list(rcdataElements))
|
||||
|
||||
def __iter__(self):
|
||||
preserve = 0
|
||||
for token in _base.Filter.__iter__(self):
|
||||
type = token["type"]
|
||||
if type == "StartTag" \
|
||||
and (preserve or token["name"] in self.spacePreserveElements):
|
||||
preserve += 1
|
||||
|
||||
elif type == "EndTag" and preserve:
|
||||
preserve -= 1
|
||||
|
||||
elif not preserve and type == "SpaceCharacters" and token["data"]:
|
||||
# Test on token["data"] above to not introduce spaces where there were not
|
||||
token["data"] = " "
|
||||
|
||||
elif not preserve and type == "Characters":
|
||||
token["data"] = collapse_spaces(token["data"])
|
||||
|
||||
yield token
|
||||
|
||||
|
||||
def collapse_spaces(text):
|
||||
return SPACES_REGEX.sub(' ', text)
|
2723
tests/wpt/web-platform-tests/tools/html5lib/html5lib/html5parser.py
Normal file
2723
tests/wpt/web-platform-tests/tools/html5lib/html5lib/html5parser.py
Normal file
File diff suppressed because it is too large
Load diff
285
tests/wpt/web-platform-tests/tools/html5lib/html5lib/ihatexml.py
Normal file
285
tests/wpt/web-platform-tests/tools/html5lib/html5lib/ihatexml.py
Normal file
|
@ -0,0 +1,285 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import re
|
||||
import warnings
|
||||
|
||||
from .constants import DataLossWarning
|
||||
|
||||
baseChar = """
|
||||
[#x0041-#x005A] | [#x0061-#x007A] | [#x00C0-#x00D6] | [#x00D8-#x00F6] |
|
||||
[#x00F8-#x00FF] | [#x0100-#x0131] | [#x0134-#x013E] | [#x0141-#x0148] |
|
||||
[#x014A-#x017E] | [#x0180-#x01C3] | [#x01CD-#x01F0] | [#x01F4-#x01F5] |
|
||||
[#x01FA-#x0217] | [#x0250-#x02A8] | [#x02BB-#x02C1] | #x0386 |
|
||||
[#x0388-#x038A] | #x038C | [#x038E-#x03A1] | [#x03A3-#x03CE] |
|
||||
[#x03D0-#x03D6] | #x03DA | #x03DC | #x03DE | #x03E0 | [#x03E2-#x03F3] |
|
||||
[#x0401-#x040C] | [#x040E-#x044F] | [#x0451-#x045C] | [#x045E-#x0481] |
|
||||
[#x0490-#x04C4] | [#x04C7-#x04C8] | [#x04CB-#x04CC] | [#x04D0-#x04EB] |
|
||||
[#x04EE-#x04F5] | [#x04F8-#x04F9] | [#x0531-#x0556] | #x0559 |
|
||||
[#x0561-#x0586] | [#x05D0-#x05EA] | [#x05F0-#x05F2] | [#x0621-#x063A] |
|
||||
[#x0641-#x064A] | [#x0671-#x06B7] | [#x06BA-#x06BE] | [#x06C0-#x06CE] |
|
||||
[#x06D0-#x06D3] | #x06D5 | [#x06E5-#x06E6] | [#x0905-#x0939] | #x093D |
|
||||
[#x0958-#x0961] | [#x0985-#x098C] | [#x098F-#x0990] | [#x0993-#x09A8] |
|
||||
[#x09AA-#x09B0] | #x09B2 | [#x09B6-#x09B9] | [#x09DC-#x09DD] |
|
||||
[#x09DF-#x09E1] | [#x09F0-#x09F1] | [#x0A05-#x0A0A] | [#x0A0F-#x0A10] |
|
||||
[#x0A13-#x0A28] | [#x0A2A-#x0A30] | [#x0A32-#x0A33] | [#x0A35-#x0A36] |
|
||||
[#x0A38-#x0A39] | [#x0A59-#x0A5C] | #x0A5E | [#x0A72-#x0A74] |
|
||||
[#x0A85-#x0A8B] | #x0A8D | [#x0A8F-#x0A91] | [#x0A93-#x0AA8] |
|
||||
[#x0AAA-#x0AB0] | [#x0AB2-#x0AB3] | [#x0AB5-#x0AB9] | #x0ABD | #x0AE0 |
|
||||
[#x0B05-#x0B0C] | [#x0B0F-#x0B10] | [#x0B13-#x0B28] | [#x0B2A-#x0B30] |
|
||||
[#x0B32-#x0B33] | [#x0B36-#x0B39] | #x0B3D | [#x0B5C-#x0B5D] |
|
||||
[#x0B5F-#x0B61] | [#x0B85-#x0B8A] | [#x0B8E-#x0B90] | [#x0B92-#x0B95] |
|
||||
[#x0B99-#x0B9A] | #x0B9C | [#x0B9E-#x0B9F] | [#x0BA3-#x0BA4] |
|
||||
[#x0BA8-#x0BAA] | [#x0BAE-#x0BB5] | [#x0BB7-#x0BB9] | [#x0C05-#x0C0C] |
|
||||
[#x0C0E-#x0C10] | [#x0C12-#x0C28] | [#x0C2A-#x0C33] | [#x0C35-#x0C39] |
|
||||
[#x0C60-#x0C61] | [#x0C85-#x0C8C] | [#x0C8E-#x0C90] | [#x0C92-#x0CA8] |
|
||||
[#x0CAA-#x0CB3] | [#x0CB5-#x0CB9] | #x0CDE | [#x0CE0-#x0CE1] |
|
||||
[#x0D05-#x0D0C] | [#x0D0E-#x0D10] | [#x0D12-#x0D28] | [#x0D2A-#x0D39] |
|
||||
[#x0D60-#x0D61] | [#x0E01-#x0E2E] | #x0E30 | [#x0E32-#x0E33] |
|
||||
[#x0E40-#x0E45] | [#x0E81-#x0E82] | #x0E84 | [#x0E87-#x0E88] | #x0E8A |
|
||||
#x0E8D | [#x0E94-#x0E97] | [#x0E99-#x0E9F] | [#x0EA1-#x0EA3] | #x0EA5 |
|
||||
#x0EA7 | [#x0EAA-#x0EAB] | [#x0EAD-#x0EAE] | #x0EB0 | [#x0EB2-#x0EB3] |
|
||||
#x0EBD | [#x0EC0-#x0EC4] | [#x0F40-#x0F47] | [#x0F49-#x0F69] |
|
||||
[#x10A0-#x10C5] | [#x10D0-#x10F6] | #x1100 | [#x1102-#x1103] |
|
||||
[#x1105-#x1107] | #x1109 | [#x110B-#x110C] | [#x110E-#x1112] | #x113C |
|
||||
#x113E | #x1140 | #x114C | #x114E | #x1150 | [#x1154-#x1155] | #x1159 |
|
||||
[#x115F-#x1161] | #x1163 | #x1165 | #x1167 | #x1169 | [#x116D-#x116E] |
|
||||
[#x1172-#x1173] | #x1175 | #x119E | #x11A8 | #x11AB | [#x11AE-#x11AF] |
|
||||
[#x11B7-#x11B8] | #x11BA | [#x11BC-#x11C2] | #x11EB | #x11F0 | #x11F9 |
|
||||
[#x1E00-#x1E9B] | [#x1EA0-#x1EF9] | [#x1F00-#x1F15] | [#x1F18-#x1F1D] |
|
||||
[#x1F20-#x1F45] | [#x1F48-#x1F4D] | [#x1F50-#x1F57] | #x1F59 | #x1F5B |
|
||||
#x1F5D | [#x1F5F-#x1F7D] | [#x1F80-#x1FB4] | [#x1FB6-#x1FBC] | #x1FBE |
|
||||
[#x1FC2-#x1FC4] | [#x1FC6-#x1FCC] | [#x1FD0-#x1FD3] | [#x1FD6-#x1FDB] |
|
||||
[#x1FE0-#x1FEC] | [#x1FF2-#x1FF4] | [#x1FF6-#x1FFC] | #x2126 |
|
||||
[#x212A-#x212B] | #x212E | [#x2180-#x2182] | [#x3041-#x3094] |
|
||||
[#x30A1-#x30FA] | [#x3105-#x312C] | [#xAC00-#xD7A3]"""
|
||||
|
||||
ideographic = """[#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]"""
|
||||
|
||||
combiningCharacter = """
|
||||
[#x0300-#x0345] | [#x0360-#x0361] | [#x0483-#x0486] | [#x0591-#x05A1] |
|
||||
[#x05A3-#x05B9] | [#x05BB-#x05BD] | #x05BF | [#x05C1-#x05C2] | #x05C4 |
|
||||
[#x064B-#x0652] | #x0670 | [#x06D6-#x06DC] | [#x06DD-#x06DF] |
|
||||
[#x06E0-#x06E4] | [#x06E7-#x06E8] | [#x06EA-#x06ED] | [#x0901-#x0903] |
|
||||
#x093C | [#x093E-#x094C] | #x094D | [#x0951-#x0954] | [#x0962-#x0963] |
|
||||
[#x0981-#x0983] | #x09BC | #x09BE | #x09BF | [#x09C0-#x09C4] |
|
||||
[#x09C7-#x09C8] | [#x09CB-#x09CD] | #x09D7 | [#x09E2-#x09E3] | #x0A02 |
|
||||
#x0A3C | #x0A3E | #x0A3F | [#x0A40-#x0A42] | [#x0A47-#x0A48] |
|
||||
[#x0A4B-#x0A4D] | [#x0A70-#x0A71] | [#x0A81-#x0A83] | #x0ABC |
|
||||
[#x0ABE-#x0AC5] | [#x0AC7-#x0AC9] | [#x0ACB-#x0ACD] | [#x0B01-#x0B03] |
|
||||
#x0B3C | [#x0B3E-#x0B43] | [#x0B47-#x0B48] | [#x0B4B-#x0B4D] |
|
||||
[#x0B56-#x0B57] | [#x0B82-#x0B83] | [#x0BBE-#x0BC2] | [#x0BC6-#x0BC8] |
|
||||
[#x0BCA-#x0BCD] | #x0BD7 | [#x0C01-#x0C03] | [#x0C3E-#x0C44] |
|
||||
[#x0C46-#x0C48] | [#x0C4A-#x0C4D] | [#x0C55-#x0C56] | [#x0C82-#x0C83] |
|
||||
[#x0CBE-#x0CC4] | [#x0CC6-#x0CC8] | [#x0CCA-#x0CCD] | [#x0CD5-#x0CD6] |
|
||||
[#x0D02-#x0D03] | [#x0D3E-#x0D43] | [#x0D46-#x0D48] | [#x0D4A-#x0D4D] |
|
||||
#x0D57 | #x0E31 | [#x0E34-#x0E3A] | [#x0E47-#x0E4E] | #x0EB1 |
|
||||
[#x0EB4-#x0EB9] | [#x0EBB-#x0EBC] | [#x0EC8-#x0ECD] | [#x0F18-#x0F19] |
|
||||
#x0F35 | #x0F37 | #x0F39 | #x0F3E | #x0F3F | [#x0F71-#x0F84] |
|
||||
[#x0F86-#x0F8B] | [#x0F90-#x0F95] | #x0F97 | [#x0F99-#x0FAD] |
|
||||
[#x0FB1-#x0FB7] | #x0FB9 | [#x20D0-#x20DC] | #x20E1 | [#x302A-#x302F] |
|
||||
#x3099 | #x309A"""
|
||||
|
||||
digit = """
|
||||
[#x0030-#x0039] | [#x0660-#x0669] | [#x06F0-#x06F9] | [#x0966-#x096F] |
|
||||
[#x09E6-#x09EF] | [#x0A66-#x0A6F] | [#x0AE6-#x0AEF] | [#x0B66-#x0B6F] |
|
||||
[#x0BE7-#x0BEF] | [#x0C66-#x0C6F] | [#x0CE6-#x0CEF] | [#x0D66-#x0D6F] |
|
||||
[#x0E50-#x0E59] | [#x0ED0-#x0ED9] | [#x0F20-#x0F29]"""
|
||||
|
||||
extender = """
|
||||
#x00B7 | #x02D0 | #x02D1 | #x0387 | #x0640 | #x0E46 | #x0EC6 | #x3005 |
|
||||
#[#x3031-#x3035] | [#x309D-#x309E] | [#x30FC-#x30FE]"""
|
||||
|
||||
letter = " | ".join([baseChar, ideographic])
|
||||
|
||||
# Without the
|
||||
name = " | ".join([letter, digit, ".", "-", "_", combiningCharacter,
|
||||
extender])
|
||||
nameFirst = " | ".join([letter, "_"])
|
||||
|
||||
reChar = re.compile(r"#x([\d|A-F]{4,4})")
|
||||
reCharRange = re.compile(r"\[#x([\d|A-F]{4,4})-#x([\d|A-F]{4,4})\]")
|
||||
|
||||
|
||||
def charStringToList(chars):
|
||||
charRanges = [item.strip() for item in chars.split(" | ")]
|
||||
rv = []
|
||||
for item in charRanges:
|
||||
foundMatch = False
|
||||
for regexp in (reChar, reCharRange):
|
||||
match = regexp.match(item)
|
||||
if match is not None:
|
||||
rv.append([hexToInt(item) for item in match.groups()])
|
||||
if len(rv[-1]) == 1:
|
||||
rv[-1] = rv[-1] * 2
|
||||
foundMatch = True
|
||||
break
|
||||
if not foundMatch:
|
||||
assert len(item) == 1
|
||||
|
||||
rv.append([ord(item)] * 2)
|
||||
rv = normaliseCharList(rv)
|
||||
return rv
|
||||
|
||||
|
||||
def normaliseCharList(charList):
|
||||
charList = sorted(charList)
|
||||
for item in charList:
|
||||
assert item[1] >= item[0]
|
||||
rv = []
|
||||
i = 0
|
||||
while i < len(charList):
|
||||
j = 1
|
||||
rv.append(charList[i])
|
||||
while i + j < len(charList) and charList[i + j][0] <= rv[-1][1] + 1:
|
||||
rv[-1][1] = charList[i + j][1]
|
||||
j += 1
|
||||
i += j
|
||||
return rv
|
||||
|
||||
# We don't really support characters above the BMP :(
|
||||
max_unicode = int("FFFF", 16)
|
||||
|
||||
|
||||
def missingRanges(charList):
|
||||
rv = []
|
||||
if charList[0] != 0:
|
||||
rv.append([0, charList[0][0] - 1])
|
||||
for i, item in enumerate(charList[:-1]):
|
||||
rv.append([item[1] + 1, charList[i + 1][0] - 1])
|
||||
if charList[-1][1] != max_unicode:
|
||||
rv.append([charList[-1][1] + 1, max_unicode])
|
||||
return rv
|
||||
|
||||
|
||||
def listToRegexpStr(charList):
|
||||
rv = []
|
||||
for item in charList:
|
||||
if item[0] == item[1]:
|
||||
rv.append(escapeRegexp(chr(item[0])))
|
||||
else:
|
||||
rv.append(escapeRegexp(chr(item[0])) + "-" +
|
||||
escapeRegexp(chr(item[1])))
|
||||
return "[%s]" % "".join(rv)
|
||||
|
||||
|
||||
def hexToInt(hex_str):
|
||||
return int(hex_str, 16)
|
||||
|
||||
|
||||
def escapeRegexp(string):
|
||||
specialCharacters = (".", "^", "$", "*", "+", "?", "{", "}",
|
||||
"[", "]", "|", "(", ")", "-")
|
||||
for char in specialCharacters:
|
||||
string = string.replace(char, "\\" + char)
|
||||
|
||||
return string
|
||||
|
||||
# output from the above
|
||||
nonXmlNameBMPRegexp = re.compile('[\x00-,/:-@\\[-\\^`\\{-\xb6\xb8-\xbf\xd7\xf7\u0132-\u0133\u013f-\u0140\u0149\u017f\u01c4-\u01cc\u01f1-\u01f3\u01f6-\u01f9\u0218-\u024f\u02a9-\u02ba\u02c2-\u02cf\u02d2-\u02ff\u0346-\u035f\u0362-\u0385\u038b\u038d\u03a2\u03cf\u03d7-\u03d9\u03db\u03dd\u03df\u03e1\u03f4-\u0400\u040d\u0450\u045d\u0482\u0487-\u048f\u04c5-\u04c6\u04c9-\u04ca\u04cd-\u04cf\u04ec-\u04ed\u04f6-\u04f7\u04fa-\u0530\u0557-\u0558\u055a-\u0560\u0587-\u0590\u05a2\u05ba\u05be\u05c0\u05c3\u05c5-\u05cf\u05eb-\u05ef\u05f3-\u0620\u063b-\u063f\u0653-\u065f\u066a-\u066f\u06b8-\u06b9\u06bf\u06cf\u06d4\u06e9\u06ee-\u06ef\u06fa-\u0900\u0904\u093a-\u093b\u094e-\u0950\u0955-\u0957\u0964-\u0965\u0970-\u0980\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09bb\u09bd\u09c5-\u09c6\u09c9-\u09ca\u09ce-\u09d6\u09d8-\u09db\u09de\u09e4-\u09e5\u09f2-\u0a01\u0a03-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a3b\u0a3d\u0a43-\u0a46\u0a49-\u0a4a\u0a4e-\u0a58\u0a5d\u0a5f-\u0a65\u0a75-\u0a80\u0a84\u0a8c\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abb\u0ac6\u0aca\u0ace-\u0adf\u0ae1-\u0ae5\u0af0-\u0b00\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34-\u0b35\u0b3a-\u0b3b\u0b44-\u0b46\u0b49-\u0b4a\u0b4e-\u0b55\u0b58-\u0b5b\u0b5e\u0b62-\u0b65\u0b70-\u0b81\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bb6\u0bba-\u0bbd\u0bc3-\u0bc5\u0bc9\u0bce-\u0bd6\u0bd8-\u0be6\u0bf0-\u0c00\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c3d\u0c45\u0c49\u0c4e-\u0c54\u0c57-\u0c5f\u0c62-\u0c65\u0c70-\u0c81\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cbd\u0cc5\u0cc9\u0cce-\u0cd4\u0cd7-\u0cdd\u0cdf\u0ce2-\u0ce5\u0cf0-\u0d01\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d3d\u0d44-\u0d45\u0d49\u0d4e-\u0d56\u0d58-\u0d5f\u0d62-\u0d65\u0d70-\u0e00\u0e2f\u0e3b-\u0e3f\u0e4f\u0e5a-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eaf\u0eba\u0ebe-\u0ebf\u0ec5\u0ec7\u0ece-\u0ecf\u0eda-\u0f17\u0f1a-\u0f1f\u0f2a-\u0f34\u0f36\u0f38\u0f3a-\u0f3d\u0f48\u0f6a-\u0f70\u0f85\u0f8c-\u0f8f\u0f96\u0f98\u0fae-\u0fb0\u0fb8\u0fba-\u109f\u10c6-\u10cf\u10f7-\u10ff\u1101\u1104\u1108\u110a\u110d\u1113-\u113b\u113d\u113f\u1141-\u114b\u114d\u114f\u1151-\u1153\u1156-\u1158\u115a-\u115e\u1162\u1164\u1166\u1168\u116a-\u116c\u116f-\u1171\u1174\u1176-\u119d\u119f-\u11a7\u11a9-\u11aa\u11ac-\u11ad\u11b0-\u11b6\u11b9\u11bb\u11c3-\u11ea\u11ec-\u11ef\u11f1-\u11f8\u11fa-\u1dff\u1e9c-\u1e9f\u1efa-\u1eff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5\u1fbd\u1fbf-\u1fc1\u1fc5\u1fcd-\u1fcf\u1fd4-\u1fd5\u1fdc-\u1fdf\u1fed-\u1ff1\u1ff5\u1ffd-\u20cf\u20dd-\u20e0\u20e2-\u2125\u2127-\u2129\u212c-\u212d\u212f-\u217f\u2183-\u3004\u3006\u3008-\u3020\u3030\u3036-\u3040\u3095-\u3098\u309b-\u309c\u309f-\u30a0\u30fb\u30ff-\u3104\u312d-\u4dff\u9fa6-\uabff\ud7a4-\uffff]')
|
||||
|
||||
nonXmlNameFirstBMPRegexp = re.compile('[\x00-@\\[-\\^`\\{-\xbf\xd7\xf7\u0132-\u0133\u013f-\u0140\u0149\u017f\u01c4-\u01cc\u01f1-\u01f3\u01f6-\u01f9\u0218-\u024f\u02a9-\u02ba\u02c2-\u0385\u0387\u038b\u038d\u03a2\u03cf\u03d7-\u03d9\u03db\u03dd\u03df\u03e1\u03f4-\u0400\u040d\u0450\u045d\u0482-\u048f\u04c5-\u04c6\u04c9-\u04ca\u04cd-\u04cf\u04ec-\u04ed\u04f6-\u04f7\u04fa-\u0530\u0557-\u0558\u055a-\u0560\u0587-\u05cf\u05eb-\u05ef\u05f3-\u0620\u063b-\u0640\u064b-\u0670\u06b8-\u06b9\u06bf\u06cf\u06d4\u06d6-\u06e4\u06e7-\u0904\u093a-\u093c\u093e-\u0957\u0962-\u0984\u098d-\u098e\u0991-\u0992\u09a9\u09b1\u09b3-\u09b5\u09ba-\u09db\u09de\u09e2-\u09ef\u09f2-\u0a04\u0a0b-\u0a0e\u0a11-\u0a12\u0a29\u0a31\u0a34\u0a37\u0a3a-\u0a58\u0a5d\u0a5f-\u0a71\u0a75-\u0a84\u0a8c\u0a8e\u0a92\u0aa9\u0ab1\u0ab4\u0aba-\u0abc\u0abe-\u0adf\u0ae1-\u0b04\u0b0d-\u0b0e\u0b11-\u0b12\u0b29\u0b31\u0b34-\u0b35\u0b3a-\u0b3c\u0b3e-\u0b5b\u0b5e\u0b62-\u0b84\u0b8b-\u0b8d\u0b91\u0b96-\u0b98\u0b9b\u0b9d\u0ba0-\u0ba2\u0ba5-\u0ba7\u0bab-\u0bad\u0bb6\u0bba-\u0c04\u0c0d\u0c11\u0c29\u0c34\u0c3a-\u0c5f\u0c62-\u0c84\u0c8d\u0c91\u0ca9\u0cb4\u0cba-\u0cdd\u0cdf\u0ce2-\u0d04\u0d0d\u0d11\u0d29\u0d3a-\u0d5f\u0d62-\u0e00\u0e2f\u0e31\u0e34-\u0e3f\u0e46-\u0e80\u0e83\u0e85-\u0e86\u0e89\u0e8b-\u0e8c\u0e8e-\u0e93\u0e98\u0ea0\u0ea4\u0ea6\u0ea8-\u0ea9\u0eac\u0eaf\u0eb1\u0eb4-\u0ebc\u0ebe-\u0ebf\u0ec5-\u0f3f\u0f48\u0f6a-\u109f\u10c6-\u10cf\u10f7-\u10ff\u1101\u1104\u1108\u110a\u110d\u1113-\u113b\u113d\u113f\u1141-\u114b\u114d\u114f\u1151-\u1153\u1156-\u1158\u115a-\u115e\u1162\u1164\u1166\u1168\u116a-\u116c\u116f-\u1171\u1174\u1176-\u119d\u119f-\u11a7\u11a9-\u11aa\u11ac-\u11ad\u11b0-\u11b6\u11b9\u11bb\u11c3-\u11ea\u11ec-\u11ef\u11f1-\u11f8\u11fa-\u1dff\u1e9c-\u1e9f\u1efa-\u1eff\u1f16-\u1f17\u1f1e-\u1f1f\u1f46-\u1f47\u1f4e-\u1f4f\u1f58\u1f5a\u1f5c\u1f5e\u1f7e-\u1f7f\u1fb5\u1fbd\u1fbf-\u1fc1\u1fc5\u1fcd-\u1fcf\u1fd4-\u1fd5\u1fdc-\u1fdf\u1fed-\u1ff1\u1ff5\u1ffd-\u2125\u2127-\u2129\u212c-\u212d\u212f-\u217f\u2183-\u3006\u3008-\u3020\u302a-\u3040\u3095-\u30a0\u30fb-\u3104\u312d-\u4dff\u9fa6-\uabff\ud7a4-\uffff]')
|
||||
|
||||
# Simpler things
|
||||
nonPubidCharRegexp = re.compile("[^\x20\x0D\x0Aa-zA-Z0-9\-\'()+,./:=?;!*#@$_%]")
|
||||
|
||||
|
||||
class InfosetFilter(object):
|
||||
replacementRegexp = re.compile(r"U[\dA-F]{5,5}")
|
||||
|
||||
def __init__(self, replaceChars=None,
|
||||
dropXmlnsLocalName=False,
|
||||
dropXmlnsAttrNs=False,
|
||||
preventDoubleDashComments=False,
|
||||
preventDashAtCommentEnd=False,
|
||||
replaceFormFeedCharacters=True,
|
||||
preventSingleQuotePubid=False):
|
||||
|
||||
self.dropXmlnsLocalName = dropXmlnsLocalName
|
||||
self.dropXmlnsAttrNs = dropXmlnsAttrNs
|
||||
|
||||
self.preventDoubleDashComments = preventDoubleDashComments
|
||||
self.preventDashAtCommentEnd = preventDashAtCommentEnd
|
||||
|
||||
self.replaceFormFeedCharacters = replaceFormFeedCharacters
|
||||
|
||||
self.preventSingleQuotePubid = preventSingleQuotePubid
|
||||
|
||||
self.replaceCache = {}
|
||||
|
||||
def coerceAttribute(self, name, namespace=None):
|
||||
if self.dropXmlnsLocalName and name.startswith("xmlns:"):
|
||||
warnings.warn("Attributes cannot begin with xmlns", DataLossWarning)
|
||||
return None
|
||||
elif (self.dropXmlnsAttrNs and
|
||||
namespace == "http://www.w3.org/2000/xmlns/"):
|
||||
warnings.warn("Attributes cannot be in the xml namespace", DataLossWarning)
|
||||
return None
|
||||
else:
|
||||
return self.toXmlName(name)
|
||||
|
||||
def coerceElement(self, name, namespace=None):
|
||||
return self.toXmlName(name)
|
||||
|
||||
def coerceComment(self, data):
|
||||
if self.preventDoubleDashComments:
|
||||
while "--" in data:
|
||||
warnings.warn("Comments cannot contain adjacent dashes", DataLossWarning)
|
||||
data = data.replace("--", "- -")
|
||||
return data
|
||||
|
||||
def coerceCharacters(self, data):
|
||||
if self.replaceFormFeedCharacters:
|
||||
for i in range(data.count("\x0C")):
|
||||
warnings.warn("Text cannot contain U+000C", DataLossWarning)
|
||||
data = data.replace("\x0C", " ")
|
||||
# Other non-xml characters
|
||||
return data
|
||||
|
||||
def coercePubid(self, data):
|
||||
dataOutput = data
|
||||
for char in nonPubidCharRegexp.findall(data):
|
||||
warnings.warn("Coercing non-XML pubid", DataLossWarning)
|
||||
replacement = self.getReplacementCharacter(char)
|
||||
dataOutput = dataOutput.replace(char, replacement)
|
||||
if self.preventSingleQuotePubid and dataOutput.find("'") >= 0:
|
||||
warnings.warn("Pubid cannot contain single quote", DataLossWarning)
|
||||
dataOutput = dataOutput.replace("'", self.getReplacementCharacter("'"))
|
||||
return dataOutput
|
||||
|
||||
def toXmlName(self, name):
|
||||
nameFirst = name[0]
|
||||
nameRest = name[1:]
|
||||
m = nonXmlNameFirstBMPRegexp.match(nameFirst)
|
||||
if m:
|
||||
warnings.warn("Coercing non-XML name", DataLossWarning)
|
||||
nameFirstOutput = self.getReplacementCharacter(nameFirst)
|
||||
else:
|
||||
nameFirstOutput = nameFirst
|
||||
|
||||
nameRestOutput = nameRest
|
||||
replaceChars = set(nonXmlNameBMPRegexp.findall(nameRest))
|
||||
for char in replaceChars:
|
||||
warnings.warn("Coercing non-XML name", DataLossWarning)
|
||||
replacement = self.getReplacementCharacter(char)
|
||||
nameRestOutput = nameRestOutput.replace(char, replacement)
|
||||
return nameFirstOutput + nameRestOutput
|
||||
|
||||
def getReplacementCharacter(self, char):
|
||||
if char in self.replaceCache:
|
||||
replacement = self.replaceCache[char]
|
||||
else:
|
||||
replacement = self.escapeChar(char)
|
||||
return replacement
|
||||
|
||||
def fromXmlName(self, name):
|
||||
for item in set(self.replacementRegexp.findall(name)):
|
||||
name = name.replace(item, self.unescapeChar(item))
|
||||
return name
|
||||
|
||||
def escapeChar(self, char):
|
||||
replacement = "U%05X" % ord(char)
|
||||
self.replaceCache[char] = replacement
|
||||
return replacement
|
||||
|
||||
def unescapeChar(self, charcode):
|
||||
return chr(int(charcode[1:], 16))
|
|
@ -0,0 +1,886 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
from six import text_type
|
||||
from six.moves import http_client
|
||||
|
||||
import codecs
|
||||
import re
|
||||
|
||||
from .constants import EOF, spaceCharacters, asciiLetters, asciiUppercase
|
||||
from .constants import encodings, ReparseException
|
||||
from . import utils
|
||||
|
||||
from io import StringIO
|
||||
|
||||
try:
|
||||
from io import BytesIO
|
||||
except ImportError:
|
||||
BytesIO = StringIO
|
||||
|
||||
try:
|
||||
from io import BufferedIOBase
|
||||
except ImportError:
|
||||
class BufferedIOBase(object):
|
||||
pass
|
||||
|
||||
# Non-unicode versions of constants for use in the pre-parser
|
||||
spaceCharactersBytes = frozenset([item.encode("ascii") for item in spaceCharacters])
|
||||
asciiLettersBytes = frozenset([item.encode("ascii") for item in asciiLetters])
|
||||
asciiUppercaseBytes = frozenset([item.encode("ascii") for item in asciiUppercase])
|
||||
spacesAngleBrackets = spaceCharactersBytes | frozenset([b">", b"<"])
|
||||
|
||||
invalid_unicode_re = re.compile("[\u0001-\u0008\u000B\u000E-\u001F\u007F-\u009F\uD800-\uDFFF\uFDD0-\uFDEF\uFFFE\uFFFF\U0001FFFE\U0001FFFF\U0002FFFE\U0002FFFF\U0003FFFE\U0003FFFF\U0004FFFE\U0004FFFF\U0005FFFE\U0005FFFF\U0006FFFE\U0006FFFF\U0007FFFE\U0007FFFF\U0008FFFE\U0008FFFF\U0009FFFE\U0009FFFF\U000AFFFE\U000AFFFF\U000BFFFE\U000BFFFF\U000CFFFE\U000CFFFF\U000DFFFE\U000DFFFF\U000EFFFE\U000EFFFF\U000FFFFE\U000FFFFF\U0010FFFE\U0010FFFF]")
|
||||
|
||||
non_bmp_invalid_codepoints = set([0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE,
|
||||
0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF,
|
||||
0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE,
|
||||
0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF,
|
||||
0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE,
|
||||
0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
|
||||
0x10FFFE, 0x10FFFF])
|
||||
|
||||
ascii_punctuation_re = re.compile("[\u0009-\u000D\u0020-\u002F\u003A-\u0040\u005B-\u0060\u007B-\u007E]")
|
||||
|
||||
# Cache for charsUntil()
|
||||
charsUntilRegEx = {}
|
||||
|
||||
|
||||
class BufferedStream(object):
|
||||
"""Buffering for streams that do not have buffering of their own
|
||||
|
||||
The buffer is implemented as a list of chunks on the assumption that
|
||||
joining many strings will be slow since it is O(n**2)
|
||||
"""
|
||||
|
||||
def __init__(self, stream):
|
||||
self.stream = stream
|
||||
self.buffer = []
|
||||
self.position = [-1, 0] # chunk number, offset
|
||||
|
||||
def tell(self):
|
||||
pos = 0
|
||||
for chunk in self.buffer[:self.position[0]]:
|
||||
pos += len(chunk)
|
||||
pos += self.position[1]
|
||||
return pos
|
||||
|
||||
def seek(self, pos):
|
||||
assert pos <= self._bufferedBytes()
|
||||
offset = pos
|
||||
i = 0
|
||||
while len(self.buffer[i]) < offset:
|
||||
offset -= len(self.buffer[i])
|
||||
i += 1
|
||||
self.position = [i, offset]
|
||||
|
||||
def read(self, bytes):
|
||||
if not self.buffer:
|
||||
return self._readStream(bytes)
|
||||
elif (self.position[0] == len(self.buffer) and
|
||||
self.position[1] == len(self.buffer[-1])):
|
||||
return self._readStream(bytes)
|
||||
else:
|
||||
return self._readFromBuffer(bytes)
|
||||
|
||||
def _bufferedBytes(self):
|
||||
return sum([len(item) for item in self.buffer])
|
||||
|
||||
def _readStream(self, bytes):
|
||||
data = self.stream.read(bytes)
|
||||
self.buffer.append(data)
|
||||
self.position[0] += 1
|
||||
self.position[1] = len(data)
|
||||
return data
|
||||
|
||||
def _readFromBuffer(self, bytes):
|
||||
remainingBytes = bytes
|
||||
rv = []
|
||||
bufferIndex = self.position[0]
|
||||
bufferOffset = self.position[1]
|
||||
while bufferIndex < len(self.buffer) and remainingBytes != 0:
|
||||
assert remainingBytes > 0
|
||||
bufferedData = self.buffer[bufferIndex]
|
||||
|
||||
if remainingBytes <= len(bufferedData) - bufferOffset:
|
||||
bytesToRead = remainingBytes
|
||||
self.position = [bufferIndex, bufferOffset + bytesToRead]
|
||||
else:
|
||||
bytesToRead = len(bufferedData) - bufferOffset
|
||||
self.position = [bufferIndex, len(bufferedData)]
|
||||
bufferIndex += 1
|
||||
rv.append(bufferedData[bufferOffset:bufferOffset + bytesToRead])
|
||||
remainingBytes -= bytesToRead
|
||||
|
||||
bufferOffset = 0
|
||||
|
||||
if remainingBytes:
|
||||
rv.append(self._readStream(remainingBytes))
|
||||
|
||||
return b"".join(rv)
|
||||
|
||||
|
||||
def HTMLInputStream(source, encoding=None, parseMeta=True, chardet=True):
|
||||
if isinstance(source, http_client.HTTPResponse):
|
||||
# Work around Python bug #20007: read(0) closes the connection.
|
||||
# http://bugs.python.org/issue20007
|
||||
isUnicode = False
|
||||
elif hasattr(source, "read"):
|
||||
isUnicode = isinstance(source.read(0), text_type)
|
||||
else:
|
||||
isUnicode = isinstance(source, text_type)
|
||||
|
||||
if isUnicode:
|
||||
if encoding is not None:
|
||||
raise TypeError("Cannot explicitly set an encoding with a unicode string")
|
||||
|
||||
return HTMLUnicodeInputStream(source)
|
||||
else:
|
||||
return HTMLBinaryInputStream(source, encoding, parseMeta, chardet)
|
||||
|
||||
|
||||
class HTMLUnicodeInputStream(object):
|
||||
"""Provides a unicode stream of characters to the HTMLTokenizer.
|
||||
|
||||
This class takes care of character encoding and removing or replacing
|
||||
incorrect byte-sequences and also provides column and line tracking.
|
||||
|
||||
"""
|
||||
|
||||
_defaultChunkSize = 10240
|
||||
|
||||
def __init__(self, source):
|
||||
"""Initialises the HTMLInputStream.
|
||||
|
||||
HTMLInputStream(source, [encoding]) -> Normalized stream from source
|
||||
for use by html5lib.
|
||||
|
||||
source can be either a file-object, local filename or a string.
|
||||
|
||||
The optional encoding parameter must be a string that indicates
|
||||
the encoding. If specified, that encoding will be used,
|
||||
regardless of any BOM or later declaration (such as in a meta
|
||||
element)
|
||||
|
||||
parseMeta - Look for a <meta> element containing encoding information
|
||||
|
||||
"""
|
||||
|
||||
# Craziness
|
||||
if len("\U0010FFFF") == 1:
|
||||
self.reportCharacterErrors = self.characterErrorsUCS4
|
||||
self.replaceCharactersRegexp = re.compile("[\uD800-\uDFFF]")
|
||||
else:
|
||||
self.reportCharacterErrors = self.characterErrorsUCS2
|
||||
self.replaceCharactersRegexp = re.compile("([\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF])")
|
||||
|
||||
# List of where new lines occur
|
||||
self.newLines = [0]
|
||||
|
||||
self.charEncoding = ("utf-8", "certain")
|
||||
self.dataStream = self.openStream(source)
|
||||
|
||||
self.reset()
|
||||
|
||||
def reset(self):
|
||||
self.chunk = ""
|
||||
self.chunkSize = 0
|
||||
self.chunkOffset = 0
|
||||
self.errors = []
|
||||
|
||||
# number of (complete) lines in previous chunks
|
||||
self.prevNumLines = 0
|
||||
# number of columns in the last line of the previous chunk
|
||||
self.prevNumCols = 0
|
||||
|
||||
# Deal with CR LF and surrogates split over chunk boundaries
|
||||
self._bufferedCharacter = None
|
||||
|
||||
def openStream(self, source):
|
||||
"""Produces a file object from source.
|
||||
|
||||
source can be either a file object, local filename or a string.
|
||||
|
||||
"""
|
||||
# Already a file object
|
||||
if hasattr(source, 'read'):
|
||||
stream = source
|
||||
else:
|
||||
stream = StringIO(source)
|
||||
|
||||
return stream
|
||||
|
||||
def _position(self, offset):
|
||||
chunk = self.chunk
|
||||
nLines = chunk.count('\n', 0, offset)
|
||||
positionLine = self.prevNumLines + nLines
|
||||
lastLinePos = chunk.rfind('\n', 0, offset)
|
||||
if lastLinePos == -1:
|
||||
positionColumn = self.prevNumCols + offset
|
||||
else:
|
||||
positionColumn = offset - (lastLinePos + 1)
|
||||
return (positionLine, positionColumn)
|
||||
|
||||
def position(self):
|
||||
"""Returns (line, col) of the current position in the stream."""
|
||||
line, col = self._position(self.chunkOffset)
|
||||
return (line + 1, col)
|
||||
|
||||
def char(self):
|
||||
""" Read one character from the stream or queue if available. Return
|
||||
EOF when EOF is reached.
|
||||
"""
|
||||
# Read a new chunk from the input stream if necessary
|
||||
if self.chunkOffset >= self.chunkSize:
|
||||
if not self.readChunk():
|
||||
return EOF
|
||||
|
||||
chunkOffset = self.chunkOffset
|
||||
char = self.chunk[chunkOffset]
|
||||
self.chunkOffset = chunkOffset + 1
|
||||
|
||||
return char
|
||||
|
||||
def readChunk(self, chunkSize=None):
|
||||
if chunkSize is None:
|
||||
chunkSize = self._defaultChunkSize
|
||||
|
||||
self.prevNumLines, self.prevNumCols = self._position(self.chunkSize)
|
||||
|
||||
self.chunk = ""
|
||||
self.chunkSize = 0
|
||||
self.chunkOffset = 0
|
||||
|
||||
data = self.dataStream.read(chunkSize)
|
||||
|
||||
# Deal with CR LF and surrogates broken across chunks
|
||||
if self._bufferedCharacter:
|
||||
data = self._bufferedCharacter + data
|
||||
self._bufferedCharacter = None
|
||||
elif not data:
|
||||
# We have no more data, bye-bye stream
|
||||
return False
|
||||
|
||||
if len(data) > 1:
|
||||
lastv = ord(data[-1])
|
||||
if lastv == 0x0D or 0xD800 <= lastv <= 0xDBFF:
|
||||
self._bufferedCharacter = data[-1]
|
||||
data = data[:-1]
|
||||
|
||||
self.reportCharacterErrors(data)
|
||||
|
||||
# Replace invalid characters
|
||||
# Note U+0000 is dealt with in the tokenizer
|
||||
data = self.replaceCharactersRegexp.sub("\ufffd", data)
|
||||
|
||||
data = data.replace("\r\n", "\n")
|
||||
data = data.replace("\r", "\n")
|
||||
|
||||
self.chunk = data
|
||||
self.chunkSize = len(data)
|
||||
|
||||
return True
|
||||
|
||||
def characterErrorsUCS4(self, data):
|
||||
for i in range(len(invalid_unicode_re.findall(data))):
|
||||
self.errors.append("invalid-codepoint")
|
||||
|
||||
def characterErrorsUCS2(self, data):
|
||||
# Someone picked the wrong compile option
|
||||
# You lose
|
||||
skip = False
|
||||
for match in invalid_unicode_re.finditer(data):
|
||||
if skip:
|
||||
continue
|
||||
codepoint = ord(match.group())
|
||||
pos = match.start()
|
||||
# Pretty sure there should be endianness issues here
|
||||
if utils.isSurrogatePair(data[pos:pos + 2]):
|
||||
# We have a surrogate pair!
|
||||
char_val = utils.surrogatePairToCodepoint(data[pos:pos + 2])
|
||||
if char_val in non_bmp_invalid_codepoints:
|
||||
self.errors.append("invalid-codepoint")
|
||||
skip = True
|
||||
elif (codepoint >= 0xD800 and codepoint <= 0xDFFF and
|
||||
pos == len(data) - 1):
|
||||
self.errors.append("invalid-codepoint")
|
||||
else:
|
||||
skip = False
|
||||
self.errors.append("invalid-codepoint")
|
||||
|
||||
def charsUntil(self, characters, opposite=False):
|
||||
""" Returns a string of characters from the stream up to but not
|
||||
including any character in 'characters' or EOF. 'characters' must be
|
||||
a container that supports the 'in' method and iteration over its
|
||||
characters.
|
||||
"""
|
||||
|
||||
# Use a cache of regexps to find the required characters
|
||||
try:
|
||||
chars = charsUntilRegEx[(characters, opposite)]
|
||||
except KeyError:
|
||||
if __debug__:
|
||||
for c in characters:
|
||||
assert(ord(c) < 128)
|
||||
regex = "".join(["\\x%02x" % ord(c) for c in characters])
|
||||
if not opposite:
|
||||
regex = "^%s" % regex
|
||||
chars = charsUntilRegEx[(characters, opposite)] = re.compile("[%s]+" % regex)
|
||||
|
||||
rv = []
|
||||
|
||||
while True:
|
||||
# Find the longest matching prefix
|
||||
m = chars.match(self.chunk, self.chunkOffset)
|
||||
if m is None:
|
||||
# If nothing matched, and it wasn't because we ran out of chunk,
|
||||
# then stop
|
||||
if self.chunkOffset != self.chunkSize:
|
||||
break
|
||||
else:
|
||||
end = m.end()
|
||||
# If not the whole chunk matched, return everything
|
||||
# up to the part that didn't match
|
||||
if end != self.chunkSize:
|
||||
rv.append(self.chunk[self.chunkOffset:end])
|
||||
self.chunkOffset = end
|
||||
break
|
||||
# If the whole remainder of the chunk matched,
|
||||
# use it all and read the next chunk
|
||||
rv.append(self.chunk[self.chunkOffset:])
|
||||
if not self.readChunk():
|
||||
# Reached EOF
|
||||
break
|
||||
|
||||
r = "".join(rv)
|
||||
return r
|
||||
|
||||
def unget(self, char):
|
||||
# Only one character is allowed to be ungotten at once - it must
|
||||
# be consumed again before any further call to unget
|
||||
if char is not None:
|
||||
if self.chunkOffset == 0:
|
||||
# unget is called quite rarely, so it's a good idea to do
|
||||
# more work here if it saves a bit of work in the frequently
|
||||
# called char and charsUntil.
|
||||
# So, just prepend the ungotten character onto the current
|
||||
# chunk:
|
||||
self.chunk = char + self.chunk
|
||||
self.chunkSize += 1
|
||||
else:
|
||||
self.chunkOffset -= 1
|
||||
assert self.chunk[self.chunkOffset] == char
|
||||
|
||||
|
||||
class HTMLBinaryInputStream(HTMLUnicodeInputStream):
|
||||
"""Provides a unicode stream of characters to the HTMLTokenizer.
|
||||
|
||||
This class takes care of character encoding and removing or replacing
|
||||
incorrect byte-sequences and also provides column and line tracking.
|
||||
|
||||
"""
|
||||
|
||||
def __init__(self, source, encoding=None, parseMeta=True, chardet=True):
|
||||
"""Initialises the HTMLInputStream.
|
||||
|
||||
HTMLInputStream(source, [encoding]) -> Normalized stream from source
|
||||
for use by html5lib.
|
||||
|
||||
source can be either a file-object, local filename or a string.
|
||||
|
||||
The optional encoding parameter must be a string that indicates
|
||||
the encoding. If specified, that encoding will be used,
|
||||
regardless of any BOM or later declaration (such as in a meta
|
||||
element)
|
||||
|
||||
parseMeta - Look for a <meta> element containing encoding information
|
||||
|
||||
"""
|
||||
# Raw Stream - for unicode objects this will encode to utf-8 and set
|
||||
# self.charEncoding as appropriate
|
||||
self.rawStream = self.openStream(source)
|
||||
|
||||
HTMLUnicodeInputStream.__init__(self, self.rawStream)
|
||||
|
||||
self.charEncoding = (codecName(encoding), "certain")
|
||||
|
||||
# Encoding Information
|
||||
# Number of bytes to use when looking for a meta element with
|
||||
# encoding information
|
||||
self.numBytesMeta = 512
|
||||
# Number of bytes to use when using detecting encoding using chardet
|
||||
self.numBytesChardet = 100
|
||||
# Encoding to use if no other information can be found
|
||||
self.defaultEncoding = "windows-1252"
|
||||
|
||||
# Detect encoding iff no explicit "transport level" encoding is supplied
|
||||
if (self.charEncoding[0] is None):
|
||||
self.charEncoding = self.detectEncoding(parseMeta, chardet)
|
||||
|
||||
# Call superclass
|
||||
self.reset()
|
||||
|
||||
def reset(self):
|
||||
self.dataStream = codecs.getreader(self.charEncoding[0])(self.rawStream,
|
||||
'replace')
|
||||
HTMLUnicodeInputStream.reset(self)
|
||||
|
||||
def openStream(self, source):
|
||||
"""Produces a file object from source.
|
||||
|
||||
source can be either a file object, local filename or a string.
|
||||
|
||||
"""
|
||||
# Already a file object
|
||||
if hasattr(source, 'read'):
|
||||
stream = source
|
||||
else:
|
||||
stream = BytesIO(source)
|
||||
|
||||
try:
|
||||
stream.seek(stream.tell())
|
||||
except:
|
||||
stream = BufferedStream(stream)
|
||||
|
||||
return stream
|
||||
|
||||
def detectEncoding(self, parseMeta=True, chardet=True):
|
||||
# First look for a BOM
|
||||
# This will also read past the BOM if present
|
||||
encoding = self.detectBOM()
|
||||
confidence = "certain"
|
||||
# If there is no BOM need to look for meta elements with encoding
|
||||
# information
|
||||
if encoding is None and parseMeta:
|
||||
encoding = self.detectEncodingMeta()
|
||||
confidence = "tentative"
|
||||
# Guess with chardet, if avaliable
|
||||
if encoding is None and chardet:
|
||||
confidence = "tentative"
|
||||
try:
|
||||
try:
|
||||
from charade.universaldetector import UniversalDetector
|
||||
except ImportError:
|
||||
from chardet.universaldetector import UniversalDetector
|
||||
buffers = []
|
||||
detector = UniversalDetector()
|
||||
while not detector.done:
|
||||
buffer = self.rawStream.read(self.numBytesChardet)
|
||||
assert isinstance(buffer, bytes)
|
||||
if not buffer:
|
||||
break
|
||||
buffers.append(buffer)
|
||||
detector.feed(buffer)
|
||||
detector.close()
|
||||
encoding = detector.result['encoding']
|
||||
self.rawStream.seek(0)
|
||||
except ImportError:
|
||||
pass
|
||||
# If all else fails use the default encoding
|
||||
if encoding is None:
|
||||
confidence = "tentative"
|
||||
encoding = self.defaultEncoding
|
||||
|
||||
# Substitute for equivalent encodings:
|
||||
encodingSub = {"iso-8859-1": "windows-1252"}
|
||||
|
||||
if encoding.lower() in encodingSub:
|
||||
encoding = encodingSub[encoding.lower()]
|
||||
|
||||
return encoding, confidence
|
||||
|
||||
def changeEncoding(self, newEncoding):
|
||||
assert self.charEncoding[1] != "certain"
|
||||
newEncoding = codecName(newEncoding)
|
||||
if newEncoding in ("utf-16", "utf-16-be", "utf-16-le"):
|
||||
newEncoding = "utf-8"
|
||||
if newEncoding is None:
|
||||
return
|
||||
elif newEncoding == self.charEncoding[0]:
|
||||
self.charEncoding = (self.charEncoding[0], "certain")
|
||||
else:
|
||||
self.rawStream.seek(0)
|
||||
self.reset()
|
||||
self.charEncoding = (newEncoding, "certain")
|
||||
raise ReparseException("Encoding changed from %s to %s" % (self.charEncoding[0], newEncoding))
|
||||
|
||||
def detectBOM(self):
|
||||
"""Attempts to detect at BOM at the start of the stream. If
|
||||
an encoding can be determined from the BOM return the name of the
|
||||
encoding otherwise return None"""
|
||||
bomDict = {
|
||||
codecs.BOM_UTF8: 'utf-8',
|
||||
codecs.BOM_UTF16_LE: 'utf-16-le', codecs.BOM_UTF16_BE: 'utf-16-be',
|
||||
codecs.BOM_UTF32_LE: 'utf-32-le', codecs.BOM_UTF32_BE: 'utf-32-be'
|
||||
}
|
||||
|
||||
# Go to beginning of file and read in 4 bytes
|
||||
string = self.rawStream.read(4)
|
||||
assert isinstance(string, bytes)
|
||||
|
||||
# Try detecting the BOM using bytes from the string
|
||||
encoding = bomDict.get(string[:3]) # UTF-8
|
||||
seek = 3
|
||||
if not encoding:
|
||||
# Need to detect UTF-32 before UTF-16
|
||||
encoding = bomDict.get(string) # UTF-32
|
||||
seek = 4
|
||||
if not encoding:
|
||||
encoding = bomDict.get(string[:2]) # UTF-16
|
||||
seek = 2
|
||||
|
||||
# Set the read position past the BOM if one was found, otherwise
|
||||
# set it to the start of the stream
|
||||
self.rawStream.seek(encoding and seek or 0)
|
||||
|
||||
return encoding
|
||||
|
||||
def detectEncodingMeta(self):
|
||||
"""Report the encoding declared by the meta element
|
||||
"""
|
||||
buffer = self.rawStream.read(self.numBytesMeta)
|
||||
assert isinstance(buffer, bytes)
|
||||
parser = EncodingParser(buffer)
|
||||
self.rawStream.seek(0)
|
||||
encoding = parser.getEncoding()
|
||||
|
||||
if encoding in ("utf-16", "utf-16-be", "utf-16-le"):
|
||||
encoding = "utf-8"
|
||||
|
||||
return encoding
|
||||
|
||||
|
||||
class EncodingBytes(bytes):
|
||||
"""String-like object with an associated position and various extra methods
|
||||
If the position is ever greater than the string length then an exception is
|
||||
raised"""
|
||||
def __new__(self, value):
|
||||
assert isinstance(value, bytes)
|
||||
return bytes.__new__(self, value.lower())
|
||||
|
||||
def __init__(self, value):
|
||||
self._position = -1
|
||||
|
||||
def __iter__(self):
|
||||
return self
|
||||
|
||||
def __next__(self):
|
||||
p = self._position = self._position + 1
|
||||
if p >= len(self):
|
||||
raise StopIteration
|
||||
elif p < 0:
|
||||
raise TypeError
|
||||
return self[p:p + 1]
|
||||
|
||||
def next(self):
|
||||
# Py2 compat
|
||||
return self.__next__()
|
||||
|
||||
def previous(self):
|
||||
p = self._position
|
||||
if p >= len(self):
|
||||
raise StopIteration
|
||||
elif p < 0:
|
||||
raise TypeError
|
||||
self._position = p = p - 1
|
||||
return self[p:p + 1]
|
||||
|
||||
def setPosition(self, position):
|
||||
if self._position >= len(self):
|
||||
raise StopIteration
|
||||
self._position = position
|
||||
|
||||
def getPosition(self):
|
||||
if self._position >= len(self):
|
||||
raise StopIteration
|
||||
if self._position >= 0:
|
||||
return self._position
|
||||
else:
|
||||
return None
|
||||
|
||||
position = property(getPosition, setPosition)
|
||||
|
||||
def getCurrentByte(self):
|
||||
return self[self.position:self.position + 1]
|
||||
|
||||
currentByte = property(getCurrentByte)
|
||||
|
||||
def skip(self, chars=spaceCharactersBytes):
|
||||
"""Skip past a list of characters"""
|
||||
p = self.position # use property for the error-checking
|
||||
while p < len(self):
|
||||
c = self[p:p + 1]
|
||||
if c not in chars:
|
||||
self._position = p
|
||||
return c
|
||||
p += 1
|
||||
self._position = p
|
||||
return None
|
||||
|
||||
def skipUntil(self, chars):
|
||||
p = self.position
|
||||
while p < len(self):
|
||||
c = self[p:p + 1]
|
||||
if c in chars:
|
||||
self._position = p
|
||||
return c
|
||||
p += 1
|
||||
self._position = p
|
||||
return None
|
||||
|
||||
def matchBytes(self, bytes):
|
||||
"""Look for a sequence of bytes at the start of a string. If the bytes
|
||||
are found return True and advance the position to the byte after the
|
||||
match. Otherwise return False and leave the position alone"""
|
||||
p = self.position
|
||||
data = self[p:p + len(bytes)]
|
||||
rv = data.startswith(bytes)
|
||||
if rv:
|
||||
self.position += len(bytes)
|
||||
return rv
|
||||
|
||||
def jumpTo(self, bytes):
|
||||
"""Look for the next sequence of bytes matching a given sequence. If
|
||||
a match is found advance the position to the last byte of the match"""
|
||||
newPosition = self[self.position:].find(bytes)
|
||||
if newPosition > -1:
|
||||
# XXX: This is ugly, but I can't see a nicer way to fix this.
|
||||
if self._position == -1:
|
||||
self._position = 0
|
||||
self._position += (newPosition + len(bytes) - 1)
|
||||
return True
|
||||
else:
|
||||
raise StopIteration
|
||||
|
||||
|
||||
class EncodingParser(object):
|
||||
"""Mini parser for detecting character encoding from meta elements"""
|
||||
|
||||
def __init__(self, data):
|
||||
"""string - the data to work on for encoding detection"""
|
||||
self.data = EncodingBytes(data)
|
||||
self.encoding = None
|
||||
|
||||
def getEncoding(self):
|
||||
methodDispatch = (
|
||||
(b"<!--", self.handleComment),
|
||||
(b"<meta", self.handleMeta),
|
||||
(b"</", self.handlePossibleEndTag),
|
||||
(b"<!", self.handleOther),
|
||||
(b"<?", self.handleOther),
|
||||
(b"<", self.handlePossibleStartTag))
|
||||
for byte in self.data:
|
||||
keepParsing = True
|
||||
for key, method in methodDispatch:
|
||||
if self.data.matchBytes(key):
|
||||
try:
|
||||
keepParsing = method()
|
||||
break
|
||||
except StopIteration:
|
||||
keepParsing = False
|
||||
break
|
||||
if not keepParsing:
|
||||
break
|
||||
|
||||
return self.encoding
|
||||
|
||||
def handleComment(self):
|
||||
"""Skip over comments"""
|
||||
return self.data.jumpTo(b"-->")
|
||||
|
||||
def handleMeta(self):
|
||||
if self.data.currentByte not in spaceCharactersBytes:
|
||||
# if we have <meta not followed by a space so just keep going
|
||||
return True
|
||||
# We have a valid meta element we want to search for attributes
|
||||
hasPragma = False
|
||||
pendingEncoding = None
|
||||
while True:
|
||||
# Try to find the next attribute after the current position
|
||||
attr = self.getAttribute()
|
||||
if attr is None:
|
||||
return True
|
||||
else:
|
||||
if attr[0] == b"http-equiv":
|
||||
hasPragma = attr[1] == b"content-type"
|
||||
if hasPragma and pendingEncoding is not None:
|
||||
self.encoding = pendingEncoding
|
||||
return False
|
||||
elif attr[0] == b"charset":
|
||||
tentativeEncoding = attr[1]
|
||||
codec = codecName(tentativeEncoding)
|
||||
if codec is not None:
|
||||
self.encoding = codec
|
||||
return False
|
||||
elif attr[0] == b"content":
|
||||
contentParser = ContentAttrParser(EncodingBytes(attr[1]))
|
||||
tentativeEncoding = contentParser.parse()
|
||||
if tentativeEncoding is not None:
|
||||
codec = codecName(tentativeEncoding)
|
||||
if codec is not None:
|
||||
if hasPragma:
|
||||
self.encoding = codec
|
||||
return False
|
||||
else:
|
||||
pendingEncoding = codec
|
||||
|
||||
def handlePossibleStartTag(self):
|
||||
return self.handlePossibleTag(False)
|
||||
|
||||
def handlePossibleEndTag(self):
|
||||
next(self.data)
|
||||
return self.handlePossibleTag(True)
|
||||
|
||||
def handlePossibleTag(self, endTag):
|
||||
data = self.data
|
||||
if data.currentByte not in asciiLettersBytes:
|
||||
# If the next byte is not an ascii letter either ignore this
|
||||
# fragment (possible start tag case) or treat it according to
|
||||
# handleOther
|
||||
if endTag:
|
||||
data.previous()
|
||||
self.handleOther()
|
||||
return True
|
||||
|
||||
c = data.skipUntil(spacesAngleBrackets)
|
||||
if c == b"<":
|
||||
# return to the first step in the overall "two step" algorithm
|
||||
# reprocessing the < byte
|
||||
data.previous()
|
||||
else:
|
||||
# Read all attributes
|
||||
attr = self.getAttribute()
|
||||
while attr is not None:
|
||||
attr = self.getAttribute()
|
||||
return True
|
||||
|
||||
def handleOther(self):
|
||||
return self.data.jumpTo(b">")
|
||||
|
||||
def getAttribute(self):
|
||||
"""Return a name,value pair for the next attribute in the stream,
|
||||
if one is found, or None"""
|
||||
data = self.data
|
||||
# Step 1 (skip chars)
|
||||
c = data.skip(spaceCharactersBytes | frozenset([b"/"]))
|
||||
assert c is None or len(c) == 1
|
||||
# Step 2
|
||||
if c in (b">", None):
|
||||
return None
|
||||
# Step 3
|
||||
attrName = []
|
||||
attrValue = []
|
||||
# Step 4 attribute name
|
||||
while True:
|
||||
if c == b"=" and attrName:
|
||||
break
|
||||
elif c in spaceCharactersBytes:
|
||||
# Step 6!
|
||||
c = data.skip()
|
||||
break
|
||||
elif c in (b"/", b">"):
|
||||
return b"".join(attrName), b""
|
||||
elif c in asciiUppercaseBytes:
|
||||
attrName.append(c.lower())
|
||||
elif c is None:
|
||||
return None
|
||||
else:
|
||||
attrName.append(c)
|
||||
# Step 5
|
||||
c = next(data)
|
||||
# Step 7
|
||||
if c != b"=":
|
||||
data.previous()
|
||||
return b"".join(attrName), b""
|
||||
# Step 8
|
||||
next(data)
|
||||
# Step 9
|
||||
c = data.skip()
|
||||
# Step 10
|
||||
if c in (b"'", b'"'):
|
||||
# 10.1
|
||||
quoteChar = c
|
||||
while True:
|
||||
# 10.2
|
||||
c = next(data)
|
||||
# 10.3
|
||||
if c == quoteChar:
|
||||
next(data)
|
||||
return b"".join(attrName), b"".join(attrValue)
|
||||
# 10.4
|
||||
elif c in asciiUppercaseBytes:
|
||||
attrValue.append(c.lower())
|
||||
# 10.5
|
||||
else:
|
||||
attrValue.append(c)
|
||||
elif c == b">":
|
||||
return b"".join(attrName), b""
|
||||
elif c in asciiUppercaseBytes:
|
||||
attrValue.append(c.lower())
|
||||
elif c is None:
|
||||
return None
|
||||
else:
|
||||
attrValue.append(c)
|
||||
# Step 11
|
||||
while True:
|
||||
c = next(data)
|
||||
if c in spacesAngleBrackets:
|
||||
return b"".join(attrName), b"".join(attrValue)
|
||||
elif c in asciiUppercaseBytes:
|
||||
attrValue.append(c.lower())
|
||||
elif c is None:
|
||||
return None
|
||||
else:
|
||||
attrValue.append(c)
|
||||
|
||||
|
||||
class ContentAttrParser(object):
|
||||
def __init__(self, data):
|
||||
assert isinstance(data, bytes)
|
||||
self.data = data
|
||||
|
||||
def parse(self):
|
||||
try:
|
||||
# Check if the attr name is charset
|
||||
# otherwise return
|
||||
self.data.jumpTo(b"charset")
|
||||
self.data.position += 1
|
||||
self.data.skip()
|
||||
if not self.data.currentByte == b"=":
|
||||
# If there is no = sign keep looking for attrs
|
||||
return None
|
||||
self.data.position += 1
|
||||
self.data.skip()
|
||||
# Look for an encoding between matching quote marks
|
||||
if self.data.currentByte in (b'"', b"'"):
|
||||
quoteMark = self.data.currentByte
|
||||
self.data.position += 1
|
||||
oldPosition = self.data.position
|
||||
if self.data.jumpTo(quoteMark):
|
||||
return self.data[oldPosition:self.data.position]
|
||||
else:
|
||||
return None
|
||||
else:
|
||||
# Unquoted value
|
||||
oldPosition = self.data.position
|
||||
try:
|
||||
self.data.skipUntil(spaceCharactersBytes)
|
||||
return self.data[oldPosition:self.data.position]
|
||||
except StopIteration:
|
||||
# Return the whole remaining value
|
||||
return self.data[oldPosition:]
|
||||
except StopIteration:
|
||||
return None
|
||||
|
||||
|
||||
def codecName(encoding):
|
||||
"""Return the python codec name corresponding to an encoding or None if the
|
||||
string doesn't correspond to a valid encoding."""
|
||||
if isinstance(encoding, bytes):
|
||||
try:
|
||||
encoding = encoding.decode("ascii")
|
||||
except UnicodeDecodeError:
|
||||
return None
|
||||
if encoding:
|
||||
canonicalName = ascii_punctuation_re.sub("", encoding).lower()
|
||||
return encodings.get(canonicalName, None)
|
||||
else:
|
||||
return None
|
|
@ -0,0 +1,271 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import re
|
||||
from xml.sax.saxutils import escape, unescape
|
||||
|
||||
from .tokenizer import HTMLTokenizer
|
||||
from .constants import tokenTypes
|
||||
|
||||
|
||||
class HTMLSanitizerMixin(object):
|
||||
""" sanitization of XHTML+MathML+SVG and of inline style attributes."""
|
||||
|
||||
acceptable_elements = ['a', 'abbr', 'acronym', 'address', 'area',
|
||||
'article', 'aside', 'audio', 'b', 'big', 'blockquote', 'br', 'button',
|
||||
'canvas', 'caption', 'center', 'cite', 'code', 'col', 'colgroup',
|
||||
'command', 'datagrid', 'datalist', 'dd', 'del', 'details', 'dfn',
|
||||
'dialog', 'dir', 'div', 'dl', 'dt', 'em', 'event-source', 'fieldset',
|
||||
'figcaption', 'figure', 'footer', 'font', 'form', 'header', 'h1',
|
||||
'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'input', 'ins',
|
||||
'keygen', 'kbd', 'label', 'legend', 'li', 'm', 'map', 'menu', 'meter',
|
||||
'multicol', 'nav', 'nextid', 'ol', 'output', 'optgroup', 'option',
|
||||
'p', 'pre', 'progress', 'q', 's', 'samp', 'section', 'select',
|
||||
'small', 'sound', 'source', 'spacer', 'span', 'strike', 'strong',
|
||||
'sub', 'sup', 'table', 'tbody', 'td', 'textarea', 'time', 'tfoot',
|
||||
'th', 'thead', 'tr', 'tt', 'u', 'ul', 'var', 'video']
|
||||
|
||||
mathml_elements = ['maction', 'math', 'merror', 'mfrac', 'mi',
|
||||
'mmultiscripts', 'mn', 'mo', 'mover', 'mpadded', 'mphantom',
|
||||
'mprescripts', 'mroot', 'mrow', 'mspace', 'msqrt', 'mstyle', 'msub',
|
||||
'msubsup', 'msup', 'mtable', 'mtd', 'mtext', 'mtr', 'munder',
|
||||
'munderover', 'none']
|
||||
|
||||
svg_elements = ['a', 'animate', 'animateColor', 'animateMotion',
|
||||
'animateTransform', 'clipPath', 'circle', 'defs', 'desc', 'ellipse',
|
||||
'font-face', 'font-face-name', 'font-face-src', 'g', 'glyph', 'hkern',
|
||||
'linearGradient', 'line', 'marker', 'metadata', 'missing-glyph',
|
||||
'mpath', 'path', 'polygon', 'polyline', 'radialGradient', 'rect',
|
||||
'set', 'stop', 'svg', 'switch', 'text', 'title', 'tspan', 'use']
|
||||
|
||||
acceptable_attributes = ['abbr', 'accept', 'accept-charset', 'accesskey',
|
||||
'action', 'align', 'alt', 'autocomplete', 'autofocus', 'axis',
|
||||
'background', 'balance', 'bgcolor', 'bgproperties', 'border',
|
||||
'bordercolor', 'bordercolordark', 'bordercolorlight', 'bottompadding',
|
||||
'cellpadding', 'cellspacing', 'ch', 'challenge', 'char', 'charoff',
|
||||
'choff', 'charset', 'checked', 'cite', 'class', 'clear', 'color',
|
||||
'cols', 'colspan', 'compact', 'contenteditable', 'controls', 'coords',
|
||||
'data', 'datafld', 'datapagesize', 'datasrc', 'datetime', 'default',
|
||||
'delay', 'dir', 'disabled', 'draggable', 'dynsrc', 'enctype', 'end',
|
||||
'face', 'for', 'form', 'frame', 'galleryimg', 'gutter', 'headers',
|
||||
'height', 'hidefocus', 'hidden', 'high', 'href', 'hreflang', 'hspace',
|
||||
'icon', 'id', 'inputmode', 'ismap', 'keytype', 'label', 'leftspacing',
|
||||
'lang', 'list', 'longdesc', 'loop', 'loopcount', 'loopend',
|
||||
'loopstart', 'low', 'lowsrc', 'max', 'maxlength', 'media', 'method',
|
||||
'min', 'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'open',
|
||||
'optimum', 'pattern', 'ping', 'point-size', 'poster', 'pqg', 'preload',
|
||||
'prompt', 'radiogroup', 'readonly', 'rel', 'repeat-max', 'repeat-min',
|
||||
'replace', 'required', 'rev', 'rightspacing', 'rows', 'rowspan',
|
||||
'rules', 'scope', 'selected', 'shape', 'size', 'span', 'src', 'start',
|
||||
'step', 'style', 'summary', 'suppress', 'tabindex', 'target',
|
||||
'template', 'title', 'toppadding', 'type', 'unselectable', 'usemap',
|
||||
'urn', 'valign', 'value', 'variable', 'volume', 'vspace', 'vrml',
|
||||
'width', 'wrap', 'xml:lang']
|
||||
|
||||
mathml_attributes = ['actiontype', 'align', 'columnalign', 'columnalign',
|
||||
'columnalign', 'columnlines', 'columnspacing', 'columnspan', 'depth',
|
||||
'display', 'displaystyle', 'equalcolumns', 'equalrows', 'fence',
|
||||
'fontstyle', 'fontweight', 'frame', 'height', 'linethickness', 'lspace',
|
||||
'mathbackground', 'mathcolor', 'mathvariant', 'mathvariant', 'maxsize',
|
||||
'minsize', 'other', 'rowalign', 'rowalign', 'rowalign', 'rowlines',
|
||||
'rowspacing', 'rowspan', 'rspace', 'scriptlevel', 'selection',
|
||||
'separator', 'stretchy', 'width', 'width', 'xlink:href', 'xlink:show',
|
||||
'xlink:type', 'xmlns', 'xmlns:xlink']
|
||||
|
||||
svg_attributes = ['accent-height', 'accumulate', 'additive', 'alphabetic',
|
||||
'arabic-form', 'ascent', 'attributeName', 'attributeType',
|
||||
'baseProfile', 'bbox', 'begin', 'by', 'calcMode', 'cap-height',
|
||||
'class', 'clip-path', 'color', 'color-rendering', 'content', 'cx',
|
||||
'cy', 'd', 'dx', 'dy', 'descent', 'display', 'dur', 'end', 'fill',
|
||||
'fill-opacity', 'fill-rule', 'font-family', 'font-size',
|
||||
'font-stretch', 'font-style', 'font-variant', 'font-weight', 'from',
|
||||
'fx', 'fy', 'g1', 'g2', 'glyph-name', 'gradientUnits', 'hanging',
|
||||
'height', 'horiz-adv-x', 'horiz-origin-x', 'id', 'ideographic', 'k',
|
||||
'keyPoints', 'keySplines', 'keyTimes', 'lang', 'marker-end',
|
||||
'marker-mid', 'marker-start', 'markerHeight', 'markerUnits',
|
||||
'markerWidth', 'mathematical', 'max', 'min', 'name', 'offset',
|
||||
'opacity', 'orient', 'origin', 'overline-position',
|
||||
'overline-thickness', 'panose-1', 'path', 'pathLength', 'points',
|
||||
'preserveAspectRatio', 'r', 'refX', 'refY', 'repeatCount',
|
||||
'repeatDur', 'requiredExtensions', 'requiredFeatures', 'restart',
|
||||
'rotate', 'rx', 'ry', 'slope', 'stemh', 'stemv', 'stop-color',
|
||||
'stop-opacity', 'strikethrough-position', 'strikethrough-thickness',
|
||||
'stroke', 'stroke-dasharray', 'stroke-dashoffset', 'stroke-linecap',
|
||||
'stroke-linejoin', 'stroke-miterlimit', 'stroke-opacity',
|
||||
'stroke-width', 'systemLanguage', 'target', 'text-anchor', 'to',
|
||||
'transform', 'type', 'u1', 'u2', 'underline-position',
|
||||
'underline-thickness', 'unicode', 'unicode-range', 'units-per-em',
|
||||
'values', 'version', 'viewBox', 'visibility', 'width', 'widths', 'x',
|
||||
'x-height', 'x1', 'x2', 'xlink:actuate', 'xlink:arcrole',
|
||||
'xlink:href', 'xlink:role', 'xlink:show', 'xlink:title', 'xlink:type',
|
||||
'xml:base', 'xml:lang', 'xml:space', 'xmlns', 'xmlns:xlink', 'y',
|
||||
'y1', 'y2', 'zoomAndPan']
|
||||
|
||||
attr_val_is_uri = ['href', 'src', 'cite', 'action', 'longdesc', 'poster',
|
||||
'xlink:href', 'xml:base']
|
||||
|
||||
svg_attr_val_allows_ref = ['clip-path', 'color-profile', 'cursor', 'fill',
|
||||
'filter', 'marker', 'marker-start', 'marker-mid', 'marker-end',
|
||||
'mask', 'stroke']
|
||||
|
||||
svg_allow_local_href = ['altGlyph', 'animate', 'animateColor',
|
||||
'animateMotion', 'animateTransform', 'cursor', 'feImage', 'filter',
|
||||
'linearGradient', 'pattern', 'radialGradient', 'textpath', 'tref',
|
||||
'set', 'use']
|
||||
|
||||
acceptable_css_properties = ['azimuth', 'background-color',
|
||||
'border-bottom-color', 'border-collapse', 'border-color',
|
||||
'border-left-color', 'border-right-color', 'border-top-color', 'clear',
|
||||
'color', 'cursor', 'direction', 'display', 'elevation', 'float', 'font',
|
||||
'font-family', 'font-size', 'font-style', 'font-variant', 'font-weight',
|
||||
'height', 'letter-spacing', 'line-height', 'overflow', 'pause',
|
||||
'pause-after', 'pause-before', 'pitch', 'pitch-range', 'richness',
|
||||
'speak', 'speak-header', 'speak-numeral', 'speak-punctuation',
|
||||
'speech-rate', 'stress', 'text-align', 'text-decoration', 'text-indent',
|
||||
'unicode-bidi', 'vertical-align', 'voice-family', 'volume',
|
||||
'white-space', 'width']
|
||||
|
||||
acceptable_css_keywords = ['auto', 'aqua', 'black', 'block', 'blue',
|
||||
'bold', 'both', 'bottom', 'brown', 'center', 'collapse', 'dashed',
|
||||
'dotted', 'fuchsia', 'gray', 'green', '!important', 'italic', 'left',
|
||||
'lime', 'maroon', 'medium', 'none', 'navy', 'normal', 'nowrap', 'olive',
|
||||
'pointer', 'purple', 'red', 'right', 'solid', 'silver', 'teal', 'top',
|
||||
'transparent', 'underline', 'white', 'yellow']
|
||||
|
||||
acceptable_svg_properties = ['fill', 'fill-opacity', 'fill-rule',
|
||||
'stroke', 'stroke-width', 'stroke-linecap', 'stroke-linejoin',
|
||||
'stroke-opacity']
|
||||
|
||||
acceptable_protocols = ['ed2k', 'ftp', 'http', 'https', 'irc',
|
||||
'mailto', 'news', 'gopher', 'nntp', 'telnet', 'webcal',
|
||||
'xmpp', 'callto', 'feed', 'urn', 'aim', 'rsync', 'tag',
|
||||
'ssh', 'sftp', 'rtsp', 'afs']
|
||||
|
||||
# subclasses may define their own versions of these constants
|
||||
allowed_elements = acceptable_elements + mathml_elements + svg_elements
|
||||
allowed_attributes = acceptable_attributes + mathml_attributes + svg_attributes
|
||||
allowed_css_properties = acceptable_css_properties
|
||||
allowed_css_keywords = acceptable_css_keywords
|
||||
allowed_svg_properties = acceptable_svg_properties
|
||||
allowed_protocols = acceptable_protocols
|
||||
|
||||
# Sanitize the +html+, escaping all elements not in ALLOWED_ELEMENTS, and
|
||||
# stripping out all # attributes not in ALLOWED_ATTRIBUTES. Style
|
||||
# attributes are parsed, and a restricted set, # specified by
|
||||
# ALLOWED_CSS_PROPERTIES and ALLOWED_CSS_KEYWORDS, are allowed through.
|
||||
# attributes in ATTR_VAL_IS_URI are scanned, and only URI schemes specified
|
||||
# in ALLOWED_PROTOCOLS are allowed.
|
||||
#
|
||||
# sanitize_html('<script> do_nasty_stuff() </script>')
|
||||
# => <script> do_nasty_stuff() </script>
|
||||
# sanitize_html('<a href="javascript: sucker();">Click here for $100</a>')
|
||||
# => <a>Click here for $100</a>
|
||||
def sanitize_token(self, token):
|
||||
|
||||
# accommodate filters which use token_type differently
|
||||
token_type = token["type"]
|
||||
if token_type in list(tokenTypes.keys()):
|
||||
token_type = tokenTypes[token_type]
|
||||
|
||||
if token_type in (tokenTypes["StartTag"], tokenTypes["EndTag"],
|
||||
tokenTypes["EmptyTag"]):
|
||||
if token["name"] in self.allowed_elements:
|
||||
return self.allowed_token(token, token_type)
|
||||
else:
|
||||
return self.disallowed_token(token, token_type)
|
||||
elif token_type == tokenTypes["Comment"]:
|
||||
pass
|
||||
else:
|
||||
return token
|
||||
|
||||
def allowed_token(self, token, token_type):
|
||||
if "data" in token:
|
||||
attrs = dict([(name, val) for name, val in
|
||||
token["data"][::-1]
|
||||
if name in self.allowed_attributes])
|
||||
for attr in self.attr_val_is_uri:
|
||||
if attr not in attrs:
|
||||
continue
|
||||
val_unescaped = re.sub("[`\000-\040\177-\240\s]+", '',
|
||||
unescape(attrs[attr])).lower()
|
||||
# remove replacement characters from unescaped characters
|
||||
val_unescaped = val_unescaped.replace("\ufffd", "")
|
||||
if (re.match("^[a-z0-9][-+.a-z0-9]*:", val_unescaped) and
|
||||
(val_unescaped.split(':')[0] not in
|
||||
self.allowed_protocols)):
|
||||
del attrs[attr]
|
||||
for attr in self.svg_attr_val_allows_ref:
|
||||
if attr in attrs:
|
||||
attrs[attr] = re.sub(r'url\s*\(\s*[^#\s][^)]+?\)',
|
||||
' ',
|
||||
unescape(attrs[attr]))
|
||||
if (token["name"] in self.svg_allow_local_href and
|
||||
'xlink:href' in attrs and re.search('^\s*[^#\s].*',
|
||||
attrs['xlink:href'])):
|
||||
del attrs['xlink:href']
|
||||
if 'style' in attrs:
|
||||
attrs['style'] = self.sanitize_css(attrs['style'])
|
||||
token["data"] = [[name, val] for name, val in list(attrs.items())]
|
||||
return token
|
||||
|
||||
def disallowed_token(self, token, token_type):
|
||||
if token_type == tokenTypes["EndTag"]:
|
||||
token["data"] = "</%s>" % token["name"]
|
||||
elif token["data"]:
|
||||
attrs = ''.join([' %s="%s"' % (k, escape(v)) for k, v in token["data"]])
|
||||
token["data"] = "<%s%s>" % (token["name"], attrs)
|
||||
else:
|
||||
token["data"] = "<%s>" % token["name"]
|
||||
if token.get("selfClosing"):
|
||||
token["data"] = token["data"][:-1] + "/>"
|
||||
|
||||
if token["type"] in list(tokenTypes.keys()):
|
||||
token["type"] = "Characters"
|
||||
else:
|
||||
token["type"] = tokenTypes["Characters"]
|
||||
|
||||
del token["name"]
|
||||
return token
|
||||
|
||||
def sanitize_css(self, style):
|
||||
# disallow urls
|
||||
style = re.compile('url\s*\(\s*[^\s)]+?\s*\)\s*').sub(' ', style)
|
||||
|
||||
# gauntlet
|
||||
if not re.match("""^([:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+'|"[\s\w]+"|\([\d,\s]+\))*$""", style):
|
||||
return ''
|
||||
if not re.match("^\s*([-\w]+\s*:[^:;]*(;\s*|$))*$", style):
|
||||
return ''
|
||||
|
||||
clean = []
|
||||
for prop, value in re.findall("([-\w]+)\s*:\s*([^:;]*)", style):
|
||||
if not value:
|
||||
continue
|
||||
if prop.lower() in self.allowed_css_properties:
|
||||
clean.append(prop + ': ' + value + ';')
|
||||
elif prop.split('-')[0].lower() in ['background', 'border', 'margin',
|
||||
'padding']:
|
||||
for keyword in value.split():
|
||||
if keyword not in self.acceptable_css_keywords and \
|
||||
not re.match("^(#[0-9a-f]+|rgb\(\d+%?,\d*%?,?\d*%?\)?|\d{0,2}\.?\d{0,2}(cm|em|ex|in|mm|pc|pt|px|%|,|\))?)$", keyword):
|
||||
break
|
||||
else:
|
||||
clean.append(prop + ': ' + value + ';')
|
||||
elif prop.lower() in self.allowed_svg_properties:
|
||||
clean.append(prop + ': ' + value + ';')
|
||||
|
||||
return ' '.join(clean)
|
||||
|
||||
|
||||
class HTMLSanitizer(HTMLTokenizer, HTMLSanitizerMixin):
|
||||
def __init__(self, stream, encoding=None, parseMeta=True, useChardet=True,
|
||||
lowercaseElementName=False, lowercaseAttrName=False, parser=None):
|
||||
# Change case matching defaults as we only output lowercase html anyway
|
||||
# This solution doesn't seem ideal...
|
||||
HTMLTokenizer.__init__(self, stream, encoding, parseMeta, useChardet,
|
||||
lowercaseElementName, lowercaseAttrName, parser=parser)
|
||||
|
||||
def __iter__(self):
|
||||
for token in HTMLTokenizer.__iter__(self):
|
||||
token = self.sanitize_token(token)
|
||||
if token:
|
||||
yield token
|
|
@ -0,0 +1,16 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from .. import treewalkers
|
||||
|
||||
from .htmlserializer import HTMLSerializer
|
||||
|
||||
|
||||
def serialize(input, tree="etree", format="html", encoding=None,
|
||||
**serializer_opts):
|
||||
# XXX: Should we cache this?
|
||||
walker = treewalkers.getTreeWalker(tree)
|
||||
if format == "html":
|
||||
s = HTMLSerializer(**serializer_opts)
|
||||
else:
|
||||
raise ValueError("type must be html")
|
||||
return s.render(walker(input), encoding)
|
|
@ -0,0 +1,320 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
from six import text_type
|
||||
|
||||
import gettext
|
||||
_ = gettext.gettext
|
||||
|
||||
try:
|
||||
from functools import reduce
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
from ..constants import voidElements, booleanAttributes, spaceCharacters
|
||||
from ..constants import rcdataElements, entities, xmlEntities
|
||||
from .. import utils
|
||||
from xml.sax.saxutils import escape
|
||||
|
||||
spaceCharacters = "".join(spaceCharacters)
|
||||
|
||||
try:
|
||||
from codecs import register_error, xmlcharrefreplace_errors
|
||||
except ImportError:
|
||||
unicode_encode_errors = "strict"
|
||||
else:
|
||||
unicode_encode_errors = "htmlentityreplace"
|
||||
|
||||
encode_entity_map = {}
|
||||
is_ucs4 = len("\U0010FFFF") == 1
|
||||
for k, v in list(entities.items()):
|
||||
# skip multi-character entities
|
||||
if ((is_ucs4 and len(v) > 1) or
|
||||
(not is_ucs4 and len(v) > 2)):
|
||||
continue
|
||||
if v != "&":
|
||||
if len(v) == 2:
|
||||
v = utils.surrogatePairToCodepoint(v)
|
||||
else:
|
||||
v = ord(v)
|
||||
if v not in encode_entity_map or k.islower():
|
||||
# prefer < over < and similarly for &, >, etc.
|
||||
encode_entity_map[v] = k
|
||||
|
||||
def htmlentityreplace_errors(exc):
|
||||
if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
|
||||
res = []
|
||||
codepoints = []
|
||||
skip = False
|
||||
for i, c in enumerate(exc.object[exc.start:exc.end]):
|
||||
if skip:
|
||||
skip = False
|
||||
continue
|
||||
index = i + exc.start
|
||||
if utils.isSurrogatePair(exc.object[index:min([exc.end, index + 2])]):
|
||||
codepoint = utils.surrogatePairToCodepoint(exc.object[index:index + 2])
|
||||
skip = True
|
||||
else:
|
||||
codepoint = ord(c)
|
||||
codepoints.append(codepoint)
|
||||
for cp in codepoints:
|
||||
e = encode_entity_map.get(cp)
|
||||
if e:
|
||||
res.append("&")
|
||||
res.append(e)
|
||||
if not e.endswith(";"):
|
||||
res.append(";")
|
||||
else:
|
||||
res.append("&#x%s;" % (hex(cp)[2:]))
|
||||
return ("".join(res), exc.end)
|
||||
else:
|
||||
return xmlcharrefreplace_errors(exc)
|
||||
|
||||
register_error(unicode_encode_errors, htmlentityreplace_errors)
|
||||
|
||||
del register_error
|
||||
|
||||
|
||||
class HTMLSerializer(object):
|
||||
|
||||
# attribute quoting options
|
||||
quote_attr_values = False
|
||||
quote_char = '"'
|
||||
use_best_quote_char = True
|
||||
|
||||
# tag syntax options
|
||||
omit_optional_tags = True
|
||||
minimize_boolean_attributes = True
|
||||
use_trailing_solidus = False
|
||||
space_before_trailing_solidus = True
|
||||
|
||||
# escaping options
|
||||
escape_lt_in_attrs = False
|
||||
escape_rcdata = False
|
||||
resolve_entities = True
|
||||
|
||||
# miscellaneous options
|
||||
alphabetical_attributes = False
|
||||
inject_meta_charset = True
|
||||
strip_whitespace = False
|
||||
sanitize = False
|
||||
|
||||
options = ("quote_attr_values", "quote_char", "use_best_quote_char",
|
||||
"omit_optional_tags", "minimize_boolean_attributes",
|
||||
"use_trailing_solidus", "space_before_trailing_solidus",
|
||||
"escape_lt_in_attrs", "escape_rcdata", "resolve_entities",
|
||||
"alphabetical_attributes", "inject_meta_charset",
|
||||
"strip_whitespace", "sanitize")
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
"""Initialize HTMLSerializer.
|
||||
|
||||
Keyword options (default given first unless specified) include:
|
||||
|
||||
inject_meta_charset=True|False
|
||||
Whether it insert a meta element to define the character set of the
|
||||
document.
|
||||
quote_attr_values=True|False
|
||||
Whether to quote attribute values that don't require quoting
|
||||
per HTML5 parsing rules.
|
||||
quote_char=u'"'|u"'"
|
||||
Use given quote character for attribute quoting. Default is to
|
||||
use double quote unless attribute value contains a double quote,
|
||||
in which case single quotes are used instead.
|
||||
escape_lt_in_attrs=False|True
|
||||
Whether to escape < in attribute values.
|
||||
escape_rcdata=False|True
|
||||
Whether to escape characters that need to be escaped within normal
|
||||
elements within rcdata elements such as style.
|
||||
resolve_entities=True|False
|
||||
Whether to resolve named character entities that appear in the
|
||||
source tree. The XML predefined entities < > & " '
|
||||
are unaffected by this setting.
|
||||
strip_whitespace=False|True
|
||||
Whether to remove semantically meaningless whitespace. (This
|
||||
compresses all whitespace to a single space except within pre.)
|
||||
minimize_boolean_attributes=True|False
|
||||
Shortens boolean attributes to give just the attribute value,
|
||||
for example <input disabled="disabled"> becomes <input disabled>.
|
||||
use_trailing_solidus=False|True
|
||||
Includes a close-tag slash at the end of the start tag of void
|
||||
elements (empty elements whose end tag is forbidden). E.g. <hr/>.
|
||||
space_before_trailing_solidus=True|False
|
||||
Places a space immediately before the closing slash in a tag
|
||||
using a trailing solidus. E.g. <hr />. Requires use_trailing_solidus.
|
||||
sanitize=False|True
|
||||
Strip all unsafe or unknown constructs from output.
|
||||
See `html5lib user documentation`_
|
||||
omit_optional_tags=True|False
|
||||
Omit start/end tags that are optional.
|
||||
alphabetical_attributes=False|True
|
||||
Reorder attributes to be in alphabetical order.
|
||||
|
||||
.. _html5lib user documentation: http://code.google.com/p/html5lib/wiki/UserDocumentation
|
||||
"""
|
||||
if 'quote_char' in kwargs:
|
||||
self.use_best_quote_char = False
|
||||
for attr in self.options:
|
||||
setattr(self, attr, kwargs.get(attr, getattr(self, attr)))
|
||||
self.errors = []
|
||||
self.strict = False
|
||||
|
||||
def encode(self, string):
|
||||
assert(isinstance(string, text_type))
|
||||
if self.encoding:
|
||||
return string.encode(self.encoding, unicode_encode_errors)
|
||||
else:
|
||||
return string
|
||||
|
||||
def encodeStrict(self, string):
|
||||
assert(isinstance(string, text_type))
|
||||
if self.encoding:
|
||||
return string.encode(self.encoding, "strict")
|
||||
else:
|
||||
return string
|
||||
|
||||
def serialize(self, treewalker, encoding=None):
|
||||
self.encoding = encoding
|
||||
in_cdata = False
|
||||
self.errors = []
|
||||
|
||||
if encoding and self.inject_meta_charset:
|
||||
from ..filters.inject_meta_charset import Filter
|
||||
treewalker = Filter(treewalker, encoding)
|
||||
# WhitespaceFilter should be used before OptionalTagFilter
|
||||
# for maximum efficiently of this latter filter
|
||||
if self.strip_whitespace:
|
||||
from ..filters.whitespace import Filter
|
||||
treewalker = Filter(treewalker)
|
||||
if self.sanitize:
|
||||
from ..filters.sanitizer import Filter
|
||||
treewalker = Filter(treewalker)
|
||||
if self.omit_optional_tags:
|
||||
from ..filters.optionaltags import Filter
|
||||
treewalker = Filter(treewalker)
|
||||
# Alphabetical attributes must be last, as other filters
|
||||
# could add attributes and alter the order
|
||||
if self.alphabetical_attributes:
|
||||
from ..filters.alphabeticalattributes import Filter
|
||||
treewalker = Filter(treewalker)
|
||||
|
||||
for token in treewalker:
|
||||
type = token["type"]
|
||||
if type == "Doctype":
|
||||
doctype = "<!DOCTYPE %s" % token["name"]
|
||||
|
||||
if token["publicId"]:
|
||||
doctype += ' PUBLIC "%s"' % token["publicId"]
|
||||
elif token["systemId"]:
|
||||
doctype += " SYSTEM"
|
||||
if token["systemId"]:
|
||||
if token["systemId"].find('"') >= 0:
|
||||
if token["systemId"].find("'") >= 0:
|
||||
self.serializeError(_("System identifer contains both single and double quote characters"))
|
||||
quote_char = "'"
|
||||
else:
|
||||
quote_char = '"'
|
||||
doctype += " %s%s%s" % (quote_char, token["systemId"], quote_char)
|
||||
|
||||
doctype += ">"
|
||||
yield self.encodeStrict(doctype)
|
||||
|
||||
elif type in ("Characters", "SpaceCharacters"):
|
||||
if type == "SpaceCharacters" or in_cdata:
|
||||
if in_cdata and token["data"].find("</") >= 0:
|
||||
self.serializeError(_("Unexpected </ in CDATA"))
|
||||
yield self.encode(token["data"])
|
||||
else:
|
||||
yield self.encode(escape(token["data"]))
|
||||
|
||||
elif type in ("StartTag", "EmptyTag"):
|
||||
name = token["name"]
|
||||
yield self.encodeStrict("<%s" % name)
|
||||
if name in rcdataElements and not self.escape_rcdata:
|
||||
in_cdata = True
|
||||
elif in_cdata:
|
||||
self.serializeError(_("Unexpected child element of a CDATA element"))
|
||||
for (attr_namespace, attr_name), attr_value in token["data"].items():
|
||||
# TODO: Add namespace support here
|
||||
k = attr_name
|
||||
v = attr_value
|
||||
yield self.encodeStrict(' ')
|
||||
|
||||
yield self.encodeStrict(k)
|
||||
if not self.minimize_boolean_attributes or \
|
||||
(k not in booleanAttributes.get(name, tuple())
|
||||
and k not in booleanAttributes.get("", tuple())):
|
||||
yield self.encodeStrict("=")
|
||||
if self.quote_attr_values or not v:
|
||||
quote_attr = True
|
||||
else:
|
||||
quote_attr = reduce(lambda x, y: x or (y in v),
|
||||
spaceCharacters + ">\"'=", False)
|
||||
v = v.replace("&", "&")
|
||||
if self.escape_lt_in_attrs:
|
||||
v = v.replace("<", "<")
|
||||
if quote_attr:
|
||||
quote_char = self.quote_char
|
||||
if self.use_best_quote_char:
|
||||
if "'" in v and '"' not in v:
|
||||
quote_char = '"'
|
||||
elif '"' in v and "'" not in v:
|
||||
quote_char = "'"
|
||||
if quote_char == "'":
|
||||
v = v.replace("'", "'")
|
||||
else:
|
||||
v = v.replace('"', """)
|
||||
yield self.encodeStrict(quote_char)
|
||||
yield self.encode(v)
|
||||
yield self.encodeStrict(quote_char)
|
||||
else:
|
||||
yield self.encode(v)
|
||||
if name in voidElements and self.use_trailing_solidus:
|
||||
if self.space_before_trailing_solidus:
|
||||
yield self.encodeStrict(" /")
|
||||
else:
|
||||
yield self.encodeStrict("/")
|
||||
yield self.encode(">")
|
||||
|
||||
elif type == "EndTag":
|
||||
name = token["name"]
|
||||
if name in rcdataElements:
|
||||
in_cdata = False
|
||||
elif in_cdata:
|
||||
self.serializeError(_("Unexpected child element of a CDATA element"))
|
||||
yield self.encodeStrict("</%s>" % name)
|
||||
|
||||
elif type == "Comment":
|
||||
data = token["data"]
|
||||
if data.find("--") >= 0:
|
||||
self.serializeError(_("Comment contains --"))
|
||||
yield self.encodeStrict("<!--%s-->" % token["data"])
|
||||
|
||||
elif type == "Entity":
|
||||
name = token["name"]
|
||||
key = name + ";"
|
||||
if key not in entities:
|
||||
self.serializeError(_("Entity %s not recognized" % name))
|
||||
if self.resolve_entities and key not in xmlEntities:
|
||||
data = entities[key]
|
||||
else:
|
||||
data = "&%s;" % name
|
||||
yield self.encodeStrict(data)
|
||||
|
||||
else:
|
||||
self.serializeError(token["data"])
|
||||
|
||||
def render(self, treewalker, encoding=None):
|
||||
if encoding:
|
||||
return b"".join(list(self.serialize(treewalker, encoding)))
|
||||
else:
|
||||
return "".join(list(self.serialize(treewalker)))
|
||||
|
||||
def serializeError(self, data="XXX ERROR MESSAGE NEEDED"):
|
||||
# XXX The idea is to make data mandatory.
|
||||
self.errors.append(data)
|
||||
if self.strict:
|
||||
raise SerializeError
|
||||
|
||||
|
||||
def SerializeError(Exception):
|
||||
"""Error in serialized tree"""
|
||||
pass
|
|
@ -0,0 +1 @@
|
|||
Each testcase file can be run through nose (using ``nosetests``).
|
|
@ -0,0 +1 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
|
@ -0,0 +1,41 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import sys
|
||||
import os
|
||||
|
||||
if __name__ == '__main__':
|
||||
# Allow us to import from the src directory
|
||||
os.chdir(os.path.split(os.path.abspath(__file__))[0])
|
||||
sys.path.insert(0, os.path.abspath(os.path.join(os.pardir, "src")))
|
||||
|
||||
from html5lib.tokenizer import HTMLTokenizer
|
||||
|
||||
|
||||
class HTMLParser(object):
|
||||
""" Fake parser to test tokenizer output """
|
||||
def parse(self, stream, output=True):
|
||||
tokenizer = HTMLTokenizer(stream)
|
||||
for token in tokenizer:
|
||||
if output:
|
||||
print(token)
|
||||
|
||||
if __name__ == "__main__":
|
||||
x = HTMLParser()
|
||||
if len(sys.argv) > 1:
|
||||
if len(sys.argv) > 2:
|
||||
import hotshot
|
||||
import hotshot.stats
|
||||
prof = hotshot.Profile('stats.prof')
|
||||
prof.runcall(x.parse, sys.argv[1], False)
|
||||
prof.close()
|
||||
stats = hotshot.stats.load('stats.prof')
|
||||
stats.strip_dirs()
|
||||
stats.sort_stats('time')
|
||||
stats.print_stats()
|
||||
else:
|
||||
x.parse(sys.argv[1])
|
||||
else:
|
||||
print("""Usage: python mockParser.py filename [stats]
|
||||
If stats is specified the hotshots profiler will run and output the
|
||||
stats instead.
|
||||
""")
|
|
@ -0,0 +1,36 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
|
||||
def f1():
|
||||
x = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
y = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
z = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
x += y + z
|
||||
|
||||
|
||||
def f2():
|
||||
x = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
y = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
z = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
x = x + y + z
|
||||
|
||||
|
||||
def f3():
|
||||
x = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
y = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
z = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
x = "".join((x, y, z))
|
||||
|
||||
|
||||
def f4():
|
||||
x = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
y = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
z = "ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ"
|
||||
x = "%s%s%s" % (x, y, z)
|
||||
|
||||
import timeit
|
||||
for x in range(4):
|
||||
statement = "f%s" % (x + 1)
|
||||
t = timeit.Timer(statement, "from __main__ import " + statement)
|
||||
r = t.repeat(3, 1000000)
|
||||
print(r, min(r))
|
|
@ -0,0 +1,177 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import os
|
||||
import sys
|
||||
import codecs
|
||||
import glob
|
||||
import xml.sax.handler
|
||||
|
||||
base_path = os.path.split(__file__)[0]
|
||||
|
||||
test_dir = os.path.join(base_path, 'testdata')
|
||||
sys.path.insert(0, os.path.abspath(os.path.join(base_path,
|
||||
os.path.pardir,
|
||||
os.path.pardir)))
|
||||
|
||||
from html5lib import treebuilders
|
||||
del base_path
|
||||
|
||||
# Build a dict of avaliable trees
|
||||
treeTypes = {"DOM": treebuilders.getTreeBuilder("dom")}
|
||||
|
||||
# Try whatever etree implementations are avaliable from a list that are
|
||||
#"supposed" to work
|
||||
try:
|
||||
import xml.etree.ElementTree as ElementTree
|
||||
treeTypes['ElementTree'] = treebuilders.getTreeBuilder("etree", ElementTree, fullTree=True)
|
||||
except ImportError:
|
||||
try:
|
||||
import elementtree.ElementTree as ElementTree
|
||||
treeTypes['ElementTree'] = treebuilders.getTreeBuilder("etree", ElementTree, fullTree=True)
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
try:
|
||||
import xml.etree.cElementTree as cElementTree
|
||||
treeTypes['cElementTree'] = treebuilders.getTreeBuilder("etree", cElementTree, fullTree=True)
|
||||
except ImportError:
|
||||
try:
|
||||
import cElementTree
|
||||
treeTypes['cElementTree'] = treebuilders.getTreeBuilder("etree", cElementTree, fullTree=True)
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
try:
|
||||
import lxml.etree as lxml # flake8: noqa
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
treeTypes['lxml'] = treebuilders.getTreeBuilder("lxml")
|
||||
|
||||
|
||||
def get_data_files(subdirectory, files='*.dat'):
|
||||
return glob.glob(os.path.join(test_dir, subdirectory, files))
|
||||
|
||||
|
||||
class DefaultDict(dict):
|
||||
def __init__(self, default, *args, **kwargs):
|
||||
self.default = default
|
||||
dict.__init__(self, *args, **kwargs)
|
||||
|
||||
def __getitem__(self, key):
|
||||
return dict.get(self, key, self.default)
|
||||
|
||||
|
||||
class TestData(object):
|
||||
def __init__(self, filename, newTestHeading="data", encoding="utf8"):
|
||||
if encoding is None:
|
||||
self.f = open(filename, mode="rb")
|
||||
else:
|
||||
self.f = codecs.open(filename, encoding=encoding)
|
||||
self.encoding = encoding
|
||||
self.newTestHeading = newTestHeading
|
||||
|
||||
def __del__(self):
|
||||
self.f.close()
|
||||
|
||||
def __iter__(self):
|
||||
data = DefaultDict(None)
|
||||
key = None
|
||||
for line in self.f:
|
||||
heading = self.isSectionHeading(line)
|
||||
if heading:
|
||||
if data and heading == self.newTestHeading:
|
||||
# Remove trailing newline
|
||||
data[key] = data[key][:-1]
|
||||
yield self.normaliseOutput(data)
|
||||
data = DefaultDict(None)
|
||||
key = heading
|
||||
data[key] = "" if self.encoding else b""
|
||||
elif key is not None:
|
||||
data[key] += line
|
||||
if data:
|
||||
yield self.normaliseOutput(data)
|
||||
|
||||
def isSectionHeading(self, line):
|
||||
"""If the current heading is a test section heading return the heading,
|
||||
otherwise return False"""
|
||||
# print(line)
|
||||
if line.startswith("#" if self.encoding else b"#"):
|
||||
return line[1:].strip()
|
||||
else:
|
||||
return False
|
||||
|
||||
def normaliseOutput(self, data):
|
||||
# Remove trailing newlines
|
||||
for key, value in data.items():
|
||||
if value.endswith("\n" if self.encoding else b"\n"):
|
||||
data[key] = value[:-1]
|
||||
return data
|
||||
|
||||
|
||||
def convert(stripChars):
|
||||
def convertData(data):
|
||||
"""convert the output of str(document) to the format used in the testcases"""
|
||||
data = data.split("\n")
|
||||
rv = []
|
||||
for line in data:
|
||||
if line.startswith("|"):
|
||||
rv.append(line[stripChars:])
|
||||
else:
|
||||
rv.append(line)
|
||||
return "\n".join(rv)
|
||||
return convertData
|
||||
|
||||
convertExpected = convert(2)
|
||||
|
||||
|
||||
def errorMessage(input, expected, actual):
|
||||
msg = ("Input:\n%s\nExpected:\n%s\nRecieved\n%s\n" %
|
||||
(repr(input), repr(expected), repr(actual)))
|
||||
if sys.version_info.major == 2:
|
||||
msg = msg.encode("ascii", "backslashreplace")
|
||||
return msg
|
||||
|
||||
|
||||
class TracingSaxHandler(xml.sax.handler.ContentHandler):
|
||||
def __init__(self):
|
||||
xml.sax.handler.ContentHandler.__init__(self)
|
||||
self.visited = []
|
||||
|
||||
def startDocument(self):
|
||||
self.visited.append('startDocument')
|
||||
|
||||
def endDocument(self):
|
||||
self.visited.append('endDocument')
|
||||
|
||||
def startPrefixMapping(self, prefix, uri):
|
||||
# These are ignored as their order is not guaranteed
|
||||
pass
|
||||
|
||||
def endPrefixMapping(self, prefix):
|
||||
# These are ignored as their order is not guaranteed
|
||||
pass
|
||||
|
||||
def startElement(self, name, attrs):
|
||||
self.visited.append(('startElement', name, attrs))
|
||||
|
||||
def endElement(self, name):
|
||||
self.visited.append(('endElement', name))
|
||||
|
||||
def startElementNS(self, name, qname, attrs):
|
||||
self.visited.append(('startElementNS', name, qname, dict(attrs)))
|
||||
|
||||
def endElementNS(self, name, qname):
|
||||
self.visited.append(('endElementNS', name, qname))
|
||||
|
||||
def characters(self, content):
|
||||
self.visited.append(('characters', content))
|
||||
|
||||
def ignorableWhitespace(self, whitespace):
|
||||
self.visited.append(('ignorableWhitespace', whitespace))
|
||||
|
||||
def processingInstruction(self, target, data):
|
||||
self.visited.append(('processingInstruction', target, data))
|
||||
|
||||
def skippedEntity(self, name):
|
||||
self.visited.append(('skippedEntity', name))
|
|
@ -0,0 +1,67 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import os
|
||||
import unittest
|
||||
|
||||
try:
|
||||
unittest.TestCase.assertEqual
|
||||
except AttributeError:
|
||||
unittest.TestCase.assertEqual = unittest.TestCase.assertEquals
|
||||
|
||||
from .support import get_data_files, TestData, test_dir, errorMessage
|
||||
from html5lib import HTMLParser, inputstream
|
||||
|
||||
|
||||
class Html5EncodingTestCase(unittest.TestCase):
|
||||
def test_codec_name_a(self):
|
||||
self.assertEqual(inputstream.codecName("utf-8"), "utf-8")
|
||||
|
||||
def test_codec_name_b(self):
|
||||
self.assertEqual(inputstream.codecName("utf8"), "utf-8")
|
||||
|
||||
def test_codec_name_c(self):
|
||||
self.assertEqual(inputstream.codecName(" utf8 "), "utf-8")
|
||||
|
||||
def test_codec_name_d(self):
|
||||
self.assertEqual(inputstream.codecName("ISO_8859--1"), "windows-1252")
|
||||
|
||||
|
||||
def runParserEncodingTest(data, encoding):
|
||||
p = HTMLParser()
|
||||
assert p.documentEncoding is None
|
||||
p.parse(data, useChardet=False)
|
||||
encoding = encoding.lower().decode("ascii")
|
||||
|
||||
assert encoding == p.documentEncoding, errorMessage(data, encoding, p.documentEncoding)
|
||||
|
||||
|
||||
def runPreScanEncodingTest(data, encoding):
|
||||
stream = inputstream.HTMLBinaryInputStream(data, chardet=False)
|
||||
encoding = encoding.lower().decode("ascii")
|
||||
|
||||
# Very crude way to ignore irrelevant tests
|
||||
if len(data) > stream.numBytesMeta:
|
||||
return
|
||||
|
||||
assert encoding == stream.charEncoding[0], errorMessage(data, encoding, stream.charEncoding[0])
|
||||
|
||||
|
||||
def test_encoding():
|
||||
for filename in get_data_files("encoding"):
|
||||
tests = TestData(filename, b"data", encoding=None)
|
||||
for idx, test in enumerate(tests):
|
||||
yield (runParserEncodingTest, test[b'data'], test[b'encoding'])
|
||||
yield (runPreScanEncodingTest, test[b'data'], test[b'encoding'])
|
||||
|
||||
try:
|
||||
try:
|
||||
import charade # flake8: noqa
|
||||
except ImportError:
|
||||
import chardet # flake8: noqa
|
||||
except ImportError:
|
||||
print("charade/chardet not found, skipping chardet tests")
|
||||
else:
|
||||
def test_chardet():
|
||||
with open(os.path.join(test_dir, "encoding" , "chardet", "test_big5.txt"), "rb") as fp:
|
||||
encoding = inputstream.HTMLInputStream(fp.read()).charEncoding
|
||||
assert encoding[0].lower() == "big5"
|
|
@ -0,0 +1,96 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import os
|
||||
import sys
|
||||
import traceback
|
||||
import warnings
|
||||
import re
|
||||
|
||||
warnings.simplefilter("error")
|
||||
|
||||
from .support import get_data_files
|
||||
from .support import TestData, convert, convertExpected, treeTypes
|
||||
from html5lib import html5parser, constants
|
||||
|
||||
# Run the parse error checks
|
||||
checkParseErrors = False
|
||||
|
||||
# XXX - There should just be one function here but for some reason the testcase
|
||||
# format differs from the treedump format by a single space character
|
||||
|
||||
|
||||
def convertTreeDump(data):
|
||||
return "\n".join(convert(3)(data).split("\n")[1:])
|
||||
|
||||
namespaceExpected = re.compile(r"^(\s*)<(\S+)>", re.M).sub
|
||||
|
||||
|
||||
def runParserTest(innerHTML, input, expected, errors, treeClass,
|
||||
namespaceHTMLElements):
|
||||
with warnings.catch_warnings(record=True) as caughtWarnings:
|
||||
warnings.simplefilter("always")
|
||||
p = html5parser.HTMLParser(tree=treeClass,
|
||||
namespaceHTMLElements=namespaceHTMLElements)
|
||||
|
||||
try:
|
||||
if innerHTML:
|
||||
document = p.parseFragment(input, innerHTML)
|
||||
else:
|
||||
document = p.parse(input)
|
||||
except:
|
||||
errorMsg = "\n".join(["\n\nInput:", input, "\nExpected:", expected,
|
||||
"\nTraceback:", traceback.format_exc()])
|
||||
assert False, errorMsg
|
||||
|
||||
otherWarnings = [x for x in caughtWarnings
|
||||
if not issubclass(x.category, constants.DataLossWarning)]
|
||||
assert len(otherWarnings) == 0, [(x.category, x.message) for x in otherWarnings]
|
||||
if len(caughtWarnings):
|
||||
return
|
||||
|
||||
output = convertTreeDump(p.tree.testSerializer(document))
|
||||
|
||||
expected = convertExpected(expected)
|
||||
if namespaceHTMLElements:
|
||||
expected = namespaceExpected(r"\1<html \2>", expected)
|
||||
|
||||
errorMsg = "\n".join(["\n\nInput:", input, "\nExpected:", expected,
|
||||
"\nReceived:", output])
|
||||
assert expected == output, errorMsg
|
||||
|
||||
errStr = []
|
||||
for (line, col), errorcode, datavars in p.errors:
|
||||
assert isinstance(datavars, dict), "%s, %s" % (errorcode, repr(datavars))
|
||||
errStr.append("Line: %i Col: %i %s" % (line, col,
|
||||
constants.E[errorcode] % datavars))
|
||||
|
||||
errorMsg2 = "\n".join(["\n\nInput:", input,
|
||||
"\nExpected errors (" + str(len(errors)) + "):\n" + "\n".join(errors),
|
||||
"\nActual errors (" + str(len(p.errors)) + "):\n" + "\n".join(errStr)])
|
||||
if checkParseErrors:
|
||||
assert len(p.errors) == len(errors), errorMsg2
|
||||
|
||||
|
||||
def test_parser():
|
||||
sys.stderr.write('Testing tree builders ' + " ".join(list(treeTypes.keys())) + "\n")
|
||||
files = get_data_files('tree-construction')
|
||||
|
||||
for filename in files:
|
||||
testName = os.path.basename(filename).replace(".dat", "")
|
||||
if testName in ("template",):
|
||||
continue
|
||||
|
||||
tests = TestData(filename, "data")
|
||||
|
||||
for index, test in enumerate(tests):
|
||||
input, errors, innerHTML, expected = [test[key] for key in
|
||||
('data', 'errors',
|
||||
'document-fragment',
|
||||
'document')]
|
||||
if errors:
|
||||
errors = errors.split("\n")
|
||||
|
||||
for treeName, treeCls in treeTypes.items():
|
||||
for namespaceHTMLElements in (True, False):
|
||||
yield (runParserTest, innerHTML, input, expected, errors, treeCls,
|
||||
namespaceHTMLElements)
|
|
@ -0,0 +1,64 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import io
|
||||
|
||||
from . import support # flake8: noqa
|
||||
from html5lib import html5parser
|
||||
from html5lib.constants import namespaces
|
||||
from html5lib import treebuilders
|
||||
|
||||
import unittest
|
||||
|
||||
# tests that aren't autogenerated from text files
|
||||
|
||||
|
||||
class MoreParserTests(unittest.TestCase):
|
||||
|
||||
def setUp(self):
|
||||
self.dom_tree = treebuilders.getTreeBuilder("dom")
|
||||
|
||||
def test_assertDoctypeCloneable(self):
|
||||
parser = html5parser.HTMLParser(tree=self.dom_tree)
|
||||
doc = parser.parse('<!DOCTYPE HTML>')
|
||||
self.assertTrue(doc.cloneNode(True))
|
||||
|
||||
def test_line_counter(self):
|
||||
# http://groups.google.com/group/html5lib-discuss/browse_frm/thread/f4f00e4a2f26d5c0
|
||||
parser = html5parser.HTMLParser(tree=self.dom_tree)
|
||||
parser.parse("<pre>\nx\n>\n</pre>")
|
||||
|
||||
def test_namespace_html_elements_0_dom(self):
|
||||
parser = html5parser.HTMLParser(tree=self.dom_tree, namespaceHTMLElements=True)
|
||||
doc = parser.parse("<html></html>")
|
||||
self.assertTrue(doc.childNodes[0].namespaceURI == namespaces["html"])
|
||||
|
||||
def test_namespace_html_elements_1_dom(self):
|
||||
parser = html5parser.HTMLParser(tree=self.dom_tree, namespaceHTMLElements=False)
|
||||
doc = parser.parse("<html></html>")
|
||||
self.assertTrue(doc.childNodes[0].namespaceURI is None)
|
||||
|
||||
def test_namespace_html_elements_0_etree(self):
|
||||
parser = html5parser.HTMLParser(namespaceHTMLElements=True)
|
||||
doc = parser.parse("<html></html>")
|
||||
self.assertTrue(list(doc)[0].tag == "{%s}html" % (namespaces["html"],))
|
||||
|
||||
def test_namespace_html_elements_1_etree(self):
|
||||
parser = html5parser.HTMLParser(namespaceHTMLElements=False)
|
||||
doc = parser.parse("<html></html>")
|
||||
self.assertTrue(list(doc)[0].tag == "html")
|
||||
|
||||
def test_unicode_file(self):
|
||||
parser = html5parser.HTMLParser()
|
||||
parser.parse(io.StringIO("a"))
|
||||
|
||||
|
||||
def buildTestSuite():
|
||||
return unittest.defaultTestLoader.loadTestsFromName(__name__)
|
||||
|
||||
|
||||
def main():
|
||||
buildTestSuite()
|
||||
unittest.main()
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
|
@ -0,0 +1,105 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
try:
|
||||
import json
|
||||
except ImportError:
|
||||
import simplejson as json
|
||||
|
||||
from html5lib import html5parser, sanitizer, constants, treebuilders
|
||||
|
||||
|
||||
def toxmlFactory():
|
||||
tree = treebuilders.getTreeBuilder("etree")
|
||||
|
||||
def toxml(element):
|
||||
# encode/decode roundtrip required for Python 2.6 compatibility
|
||||
result_bytes = tree.implementation.tostring(element, encoding="utf-8")
|
||||
return result_bytes.decode("utf-8")
|
||||
|
||||
return toxml
|
||||
|
||||
|
||||
def runSanitizerTest(name, expected, input, toxml=None):
|
||||
if toxml is None:
|
||||
toxml = toxmlFactory()
|
||||
expected = ''.join([toxml(token) for token in html5parser.HTMLParser().
|
||||
parseFragment(expected)])
|
||||
expected = json.loads(json.dumps(expected))
|
||||
assert expected == sanitize_html(input)
|
||||
|
||||
|
||||
def sanitize_html(stream, toxml=None):
|
||||
if toxml is None:
|
||||
toxml = toxmlFactory()
|
||||
return ''.join([toxml(token) for token in
|
||||
html5parser.HTMLParser(tokenizer=sanitizer.HTMLSanitizer).
|
||||
parseFragment(stream)])
|
||||
|
||||
|
||||
def test_should_handle_astral_plane_characters():
|
||||
assert '<html:p xmlns:html="http://www.w3.org/1999/xhtml">\U0001d4b5 \U0001d538</html:p>' == sanitize_html("<p>𝒵 𝔸</p>")
|
||||
|
||||
|
||||
def test_sanitizer():
|
||||
toxml = toxmlFactory()
|
||||
for tag_name in sanitizer.HTMLSanitizer.allowed_elements:
|
||||
if tag_name in ['caption', 'col', 'colgroup', 'optgroup', 'option', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr']:
|
||||
continue # TODO
|
||||
if tag_name != tag_name.lower():
|
||||
continue # TODO
|
||||
if tag_name == 'image':
|
||||
yield (runSanitizerTest, "test_should_allow_%s_tag" % tag_name,
|
||||
"<img title=\"1\"/>foo <bad>bar</bad> baz",
|
||||
"<%s title='1'>foo <bad>bar</bad> baz</%s>" % (tag_name, tag_name),
|
||||
toxml)
|
||||
elif tag_name == 'br':
|
||||
yield (runSanitizerTest, "test_should_allow_%s_tag" % tag_name,
|
||||
"<br title=\"1\"/>foo <bad>bar</bad> baz<br/>",
|
||||
"<%s title='1'>foo <bad>bar</bad> baz</%s>" % (tag_name, tag_name),
|
||||
toxml)
|
||||
elif tag_name in constants.voidElements:
|
||||
yield (runSanitizerTest, "test_should_allow_%s_tag" % tag_name,
|
||||
"<%s title=\"1\"/>foo <bad>bar</bad> baz" % tag_name,
|
||||
"<%s title='1'>foo <bad>bar</bad> baz</%s>" % (tag_name, tag_name),
|
||||
toxml)
|
||||
else:
|
||||
yield (runSanitizerTest, "test_should_allow_%s_tag" % tag_name,
|
||||
"<%s title=\"1\">foo <bad>bar</bad> baz</%s>" % (tag_name, tag_name),
|
||||
"<%s title='1'>foo <bad>bar</bad> baz</%s>" % (tag_name, tag_name),
|
||||
toxml)
|
||||
|
||||
for tag_name in sanitizer.HTMLSanitizer.allowed_elements:
|
||||
tag_name = tag_name.upper()
|
||||
yield (runSanitizerTest, "test_should_forbid_%s_tag" % tag_name,
|
||||
"<%s title=\"1\">foo <bad>bar</bad> baz</%s>" % (tag_name, tag_name),
|
||||
"<%s title='1'>foo <bad>bar</bad> baz</%s>" % (tag_name, tag_name),
|
||||
toxml)
|
||||
|
||||
for attribute_name in sanitizer.HTMLSanitizer.allowed_attributes:
|
||||
if attribute_name != attribute_name.lower():
|
||||
continue # TODO
|
||||
if attribute_name == 'style':
|
||||
continue
|
||||
yield (runSanitizerTest, "test_should_allow_%s_attribute" % attribute_name,
|
||||
"<p %s=\"foo\">foo <bad>bar</bad> baz</p>" % attribute_name,
|
||||
"<p %s='foo'>foo <bad>bar</bad> baz</p>" % attribute_name,
|
||||
toxml)
|
||||
|
||||
for attribute_name in sanitizer.HTMLSanitizer.allowed_attributes:
|
||||
attribute_name = attribute_name.upper()
|
||||
yield (runSanitizerTest, "test_should_forbid_%s_attribute" % attribute_name,
|
||||
"<p>foo <bad>bar</bad> baz</p>",
|
||||
"<p %s='display: none;'>foo <bad>bar</bad> baz</p>" % attribute_name,
|
||||
toxml)
|
||||
|
||||
for protocol in sanitizer.HTMLSanitizer.allowed_protocols:
|
||||
yield (runSanitizerTest, "test_should_allow_%s_uris" % protocol,
|
||||
"<a href=\"%s\">foo</a>" % protocol,
|
||||
"""<a href="%s">foo</a>""" % protocol,
|
||||
toxml)
|
||||
|
||||
for protocol in sanitizer.HTMLSanitizer.allowed_protocols:
|
||||
yield (runSanitizerTest, "test_should_allow_uppercase_%s_uris" % protocol,
|
||||
"<a href=\"%s\">foo</a>" % protocol,
|
||||
"""<a href="%s">foo</a>""" % protocol,
|
||||
toxml)
|
|
@ -0,0 +1,178 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import json
|
||||
import unittest
|
||||
|
||||
from .support import get_data_files
|
||||
|
||||
try:
|
||||
unittest.TestCase.assertEqual
|
||||
except AttributeError:
|
||||
unittest.TestCase.assertEqual = unittest.TestCase.assertEquals
|
||||
|
||||
import html5lib
|
||||
from html5lib import constants
|
||||
from html5lib.serializer import HTMLSerializer, serialize
|
||||
from html5lib.treewalkers._base import TreeWalker
|
||||
|
||||
optionals_loaded = []
|
||||
|
||||
try:
|
||||
from lxml import etree
|
||||
optionals_loaded.append("lxml")
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
default_namespace = constants.namespaces["html"]
|
||||
|
||||
|
||||
class JsonWalker(TreeWalker):
|
||||
def __iter__(self):
|
||||
for token in self.tree:
|
||||
type = token[0]
|
||||
if type == "StartTag":
|
||||
if len(token) == 4:
|
||||
namespace, name, attrib = token[1:4]
|
||||
else:
|
||||
namespace = default_namespace
|
||||
name, attrib = token[1:3]
|
||||
yield self.startTag(namespace, name, self._convertAttrib(attrib))
|
||||
elif type == "EndTag":
|
||||
if len(token) == 3:
|
||||
namespace, name = token[1:3]
|
||||
else:
|
||||
namespace = default_namespace
|
||||
name = token[1]
|
||||
yield self.endTag(namespace, name)
|
||||
elif type == "EmptyTag":
|
||||
if len(token) == 4:
|
||||
namespace, name, attrib = token[1:]
|
||||
else:
|
||||
namespace = default_namespace
|
||||
name, attrib = token[1:]
|
||||
for token in self.emptyTag(namespace, name, self._convertAttrib(attrib)):
|
||||
yield token
|
||||
elif type == "Comment":
|
||||
yield self.comment(token[1])
|
||||
elif type in ("Characters", "SpaceCharacters"):
|
||||
for token in self.text(token[1]):
|
||||
yield token
|
||||
elif type == "Doctype":
|
||||
if len(token) == 4:
|
||||
yield self.doctype(token[1], token[2], token[3])
|
||||
elif len(token) == 3:
|
||||
yield self.doctype(token[1], token[2])
|
||||
else:
|
||||
yield self.doctype(token[1])
|
||||
else:
|
||||
raise ValueError("Unknown token type: " + type)
|
||||
|
||||
def _convertAttrib(self, attribs):
|
||||
"""html5lib tree-walkers use a dict of (namespace, name): value for
|
||||
attributes, but JSON cannot represent this. Convert from the format
|
||||
in the serializer tests (a list of dicts with "namespace", "name",
|
||||
and "value" as keys) to html5lib's tree-walker format."""
|
||||
attrs = {}
|
||||
for attrib in attribs:
|
||||
name = (attrib["namespace"], attrib["name"])
|
||||
assert(name not in attrs)
|
||||
attrs[name] = attrib["value"]
|
||||
return attrs
|
||||
|
||||
|
||||
def serialize_html(input, options):
|
||||
options = dict([(str(k), v) for k, v in options.items()])
|
||||
stream = JsonWalker(input)
|
||||
serializer = HTMLSerializer(alphabetical_attributes=True, **options)
|
||||
return serializer.render(stream, options.get("encoding", None))
|
||||
|
||||
|
||||
def runSerializerTest(input, expected, options):
|
||||
encoding = options.get("encoding", None)
|
||||
|
||||
if encoding:
|
||||
encode = lambda x: x.encode(encoding)
|
||||
expected = list(map(encode, expected))
|
||||
|
||||
result = serialize_html(input, options)
|
||||
if len(expected) == 1:
|
||||
assert expected[0] == result, "Expected:\n%s\nActual:\n%s\nOptions:\n%s" % (expected[0], result, str(options))
|
||||
elif result not in expected:
|
||||
assert False, "Expected: %s, Received: %s" % (expected, result)
|
||||
|
||||
|
||||
class EncodingTestCase(unittest.TestCase):
|
||||
def throwsWithLatin1(self, input):
|
||||
self.assertRaises(UnicodeEncodeError, serialize_html, input, {"encoding": "iso-8859-1"})
|
||||
|
||||
def testDoctypeName(self):
|
||||
self.throwsWithLatin1([["Doctype", "\u0101"]])
|
||||
|
||||
def testDoctypePublicId(self):
|
||||
self.throwsWithLatin1([["Doctype", "potato", "\u0101"]])
|
||||
|
||||
def testDoctypeSystemId(self):
|
||||
self.throwsWithLatin1([["Doctype", "potato", "potato", "\u0101"]])
|
||||
|
||||
def testCdataCharacters(self):
|
||||
runSerializerTest([["StartTag", "http://www.w3.org/1999/xhtml", "style", {}], ["Characters", "\u0101"]],
|
||||
["<style>ā"], {"encoding": "iso-8859-1"})
|
||||
|
||||
def testCharacters(self):
|
||||
runSerializerTest([["Characters", "\u0101"]],
|
||||
["ā"], {"encoding": "iso-8859-1"})
|
||||
|
||||
def testStartTagName(self):
|
||||
self.throwsWithLatin1([["StartTag", "http://www.w3.org/1999/xhtml", "\u0101", []]])
|
||||
|
||||
def testEmptyTagName(self):
|
||||
self.throwsWithLatin1([["EmptyTag", "http://www.w3.org/1999/xhtml", "\u0101", []]])
|
||||
|
||||
def testAttributeName(self):
|
||||
self.throwsWithLatin1([["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": None, "name": "\u0101", "value": "potato"}]]])
|
||||
|
||||
def testAttributeValue(self):
|
||||
runSerializerTest([["StartTag", "http://www.w3.org/1999/xhtml", "span",
|
||||
[{"namespace": None, "name": "potato", "value": "\u0101"}]]],
|
||||
["<span potato=ā>"], {"encoding": "iso-8859-1"})
|
||||
|
||||
def testEndTagName(self):
|
||||
self.throwsWithLatin1([["EndTag", "http://www.w3.org/1999/xhtml", "\u0101"]])
|
||||
|
||||
def testComment(self):
|
||||
self.throwsWithLatin1([["Comment", "\u0101"]])
|
||||
|
||||
|
||||
if "lxml" in optionals_loaded:
|
||||
class LxmlTestCase(unittest.TestCase):
|
||||
def setUp(self):
|
||||
self.parser = etree.XMLParser(resolve_entities=False)
|
||||
self.treewalker = html5lib.getTreeWalker("lxml")
|
||||
self.serializer = HTMLSerializer()
|
||||
|
||||
def testEntityReplacement(self):
|
||||
doc = """<!DOCTYPE html SYSTEM "about:legacy-compat"><html>β</html>"""
|
||||
tree = etree.fromstring(doc, parser=self.parser).getroottree()
|
||||
result = serialize(tree, tree="lxml", omit_optional_tags=False)
|
||||
self.assertEqual("""<!DOCTYPE html SYSTEM "about:legacy-compat"><html>\u03B2</html>""", result)
|
||||
|
||||
def testEntityXML(self):
|
||||
doc = """<!DOCTYPE html SYSTEM "about:legacy-compat"><html>></html>"""
|
||||
tree = etree.fromstring(doc, parser=self.parser).getroottree()
|
||||
result = serialize(tree, tree="lxml", omit_optional_tags=False)
|
||||
self.assertEqual("""<!DOCTYPE html SYSTEM "about:legacy-compat"><html>></html>""", result)
|
||||
|
||||
def testEntityNoResolve(self):
|
||||
doc = """<!DOCTYPE html SYSTEM "about:legacy-compat"><html>β</html>"""
|
||||
tree = etree.fromstring(doc, parser=self.parser).getroottree()
|
||||
result = serialize(tree, tree="lxml", omit_optional_tags=False,
|
||||
resolve_entities=False)
|
||||
self.assertEqual("""<!DOCTYPE html SYSTEM "about:legacy-compat"><html>β</html>""", result)
|
||||
|
||||
|
||||
def test_serializer():
|
||||
for filename in get_data_files('serializer', '*.test'):
|
||||
with open(filename) as fp:
|
||||
tests = json.load(fp)
|
||||
for index, test in enumerate(tests['tests']):
|
||||
yield runSerializerTest, test["input"], test["expected"], test.get("options", {})
|
|
@ -0,0 +1,183 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from . import support # flake8: noqa
|
||||
import unittest
|
||||
import codecs
|
||||
from io import BytesIO
|
||||
|
||||
from six.moves import http_client
|
||||
|
||||
from html5lib.inputstream import (BufferedStream, HTMLInputStream,
|
||||
HTMLUnicodeInputStream, HTMLBinaryInputStream)
|
||||
|
||||
class BufferedStreamTest(unittest.TestCase):
|
||||
def test_basic(self):
|
||||
s = b"abc"
|
||||
fp = BufferedStream(BytesIO(s))
|
||||
read = fp.read(10)
|
||||
assert read == s
|
||||
|
||||
def test_read_length(self):
|
||||
fp = BufferedStream(BytesIO(b"abcdef"))
|
||||
read1 = fp.read(1)
|
||||
assert read1 == b"a"
|
||||
read2 = fp.read(2)
|
||||
assert read2 == b"bc"
|
||||
read3 = fp.read(3)
|
||||
assert read3 == b"def"
|
||||
read4 = fp.read(4)
|
||||
assert read4 == b""
|
||||
|
||||
def test_tell(self):
|
||||
fp = BufferedStream(BytesIO(b"abcdef"))
|
||||
read1 = fp.read(1)
|
||||
assert fp.tell() == 1
|
||||
read2 = fp.read(2)
|
||||
assert fp.tell() == 3
|
||||
read3 = fp.read(3)
|
||||
assert fp.tell() == 6
|
||||
read4 = fp.read(4)
|
||||
assert fp.tell() == 6
|
||||
|
||||
def test_seek(self):
|
||||
fp = BufferedStream(BytesIO(b"abcdef"))
|
||||
read1 = fp.read(1)
|
||||
assert read1 == b"a"
|
||||
fp.seek(0)
|
||||
read2 = fp.read(1)
|
||||
assert read2 == b"a"
|
||||
read3 = fp.read(2)
|
||||
assert read3 == b"bc"
|
||||
fp.seek(2)
|
||||
read4 = fp.read(2)
|
||||
assert read4 == b"cd"
|
||||
fp.seek(4)
|
||||
read5 = fp.read(2)
|
||||
assert read5 == b"ef"
|
||||
|
||||
def test_seek_tell(self):
|
||||
fp = BufferedStream(BytesIO(b"abcdef"))
|
||||
read1 = fp.read(1)
|
||||
assert fp.tell() == 1
|
||||
fp.seek(0)
|
||||
read2 = fp.read(1)
|
||||
assert fp.tell() == 1
|
||||
read3 = fp.read(2)
|
||||
assert fp.tell() == 3
|
||||
fp.seek(2)
|
||||
read4 = fp.read(2)
|
||||
assert fp.tell() == 4
|
||||
fp.seek(4)
|
||||
read5 = fp.read(2)
|
||||
assert fp.tell() == 6
|
||||
|
||||
|
||||
class HTMLUnicodeInputStreamShortChunk(HTMLUnicodeInputStream):
|
||||
_defaultChunkSize = 2
|
||||
|
||||
|
||||
class HTMLBinaryInputStreamShortChunk(HTMLBinaryInputStream):
|
||||
_defaultChunkSize = 2
|
||||
|
||||
|
||||
class HTMLInputStreamTest(unittest.TestCase):
|
||||
|
||||
def test_char_ascii(self):
|
||||
stream = HTMLInputStream(b"'", encoding='ascii')
|
||||
self.assertEqual(stream.charEncoding[0], 'ascii')
|
||||
self.assertEqual(stream.char(), "'")
|
||||
|
||||
def test_char_utf8(self):
|
||||
stream = HTMLInputStream('\u2018'.encode('utf-8'), encoding='utf-8')
|
||||
self.assertEqual(stream.charEncoding[0], 'utf-8')
|
||||
self.assertEqual(stream.char(), '\u2018')
|
||||
|
||||
def test_char_win1252(self):
|
||||
stream = HTMLInputStream("\xa9\xf1\u2019".encode('windows-1252'))
|
||||
self.assertEqual(stream.charEncoding[0], 'windows-1252')
|
||||
self.assertEqual(stream.char(), "\xa9")
|
||||
self.assertEqual(stream.char(), "\xf1")
|
||||
self.assertEqual(stream.char(), "\u2019")
|
||||
|
||||
def test_bom(self):
|
||||
stream = HTMLInputStream(codecs.BOM_UTF8 + b"'")
|
||||
self.assertEqual(stream.charEncoding[0], 'utf-8')
|
||||
self.assertEqual(stream.char(), "'")
|
||||
|
||||
def test_utf_16(self):
|
||||
stream = HTMLInputStream((' ' * 1025).encode('utf-16'))
|
||||
self.assertTrue(stream.charEncoding[0] in ['utf-16-le', 'utf-16-be'], stream.charEncoding)
|
||||
self.assertEqual(len(stream.charsUntil(' ', True)), 1025)
|
||||
|
||||
def test_newlines(self):
|
||||
stream = HTMLBinaryInputStreamShortChunk(codecs.BOM_UTF8 + b"a\nbb\r\nccc\rddddxe")
|
||||
self.assertEqual(stream.position(), (1, 0))
|
||||
self.assertEqual(stream.charsUntil('c'), "a\nbb\n")
|
||||
self.assertEqual(stream.position(), (3, 0))
|
||||
self.assertEqual(stream.charsUntil('x'), "ccc\ndddd")
|
||||
self.assertEqual(stream.position(), (4, 4))
|
||||
self.assertEqual(stream.charsUntil('e'), "x")
|
||||
self.assertEqual(stream.position(), (4, 5))
|
||||
|
||||
def test_newlines2(self):
|
||||
size = HTMLUnicodeInputStream._defaultChunkSize
|
||||
stream = HTMLInputStream("\r" * size + "\n")
|
||||
self.assertEqual(stream.charsUntil('x'), "\n" * size)
|
||||
|
||||
def test_position(self):
|
||||
stream = HTMLBinaryInputStreamShortChunk(codecs.BOM_UTF8 + b"a\nbb\nccc\nddde\nf\ngh")
|
||||
self.assertEqual(stream.position(), (1, 0))
|
||||
self.assertEqual(stream.charsUntil('c'), "a\nbb\n")
|
||||
self.assertEqual(stream.position(), (3, 0))
|
||||
stream.unget("\n")
|
||||
self.assertEqual(stream.position(), (2, 2))
|
||||
self.assertEqual(stream.charsUntil('c'), "\n")
|
||||
self.assertEqual(stream.position(), (3, 0))
|
||||
stream.unget("\n")
|
||||
self.assertEqual(stream.position(), (2, 2))
|
||||
self.assertEqual(stream.char(), "\n")
|
||||
self.assertEqual(stream.position(), (3, 0))
|
||||
self.assertEqual(stream.charsUntil('e'), "ccc\nddd")
|
||||
self.assertEqual(stream.position(), (4, 3))
|
||||
self.assertEqual(stream.charsUntil('h'), "e\nf\ng")
|
||||
self.assertEqual(stream.position(), (6, 1))
|
||||
|
||||
def test_position2(self):
|
||||
stream = HTMLUnicodeInputStreamShortChunk("abc\nd")
|
||||
self.assertEqual(stream.position(), (1, 0))
|
||||
self.assertEqual(stream.char(), "a")
|
||||
self.assertEqual(stream.position(), (1, 1))
|
||||
self.assertEqual(stream.char(), "b")
|
||||
self.assertEqual(stream.position(), (1, 2))
|
||||
self.assertEqual(stream.char(), "c")
|
||||
self.assertEqual(stream.position(), (1, 3))
|
||||
self.assertEqual(stream.char(), "\n")
|
||||
self.assertEqual(stream.position(), (2, 0))
|
||||
self.assertEqual(stream.char(), "d")
|
||||
self.assertEqual(stream.position(), (2, 1))
|
||||
|
||||
def test_python_issue_20007(self):
|
||||
"""
|
||||
Make sure we have a work-around for Python bug #20007
|
||||
http://bugs.python.org/issue20007
|
||||
"""
|
||||
class FakeSocket(object):
|
||||
def makefile(self, _mode, _bufsize=None):
|
||||
return BytesIO(b"HTTP/1.1 200 Ok\r\n\r\nText")
|
||||
|
||||
source = http_client.HTTPResponse(FakeSocket())
|
||||
source.begin()
|
||||
stream = HTMLInputStream(source)
|
||||
self.assertEqual(stream.charsUntil(" "), "Text")
|
||||
|
||||
|
||||
def buildTestSuite():
|
||||
return unittest.defaultTestLoader.loadTestsFromName(__name__)
|
||||
|
||||
|
||||
def main():
|
||||
buildTestSuite()
|
||||
unittest.main()
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
|
@ -0,0 +1,188 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import json
|
||||
import warnings
|
||||
import re
|
||||
|
||||
from .support import get_data_files
|
||||
|
||||
from html5lib.tokenizer import HTMLTokenizer
|
||||
from html5lib import constants
|
||||
|
||||
|
||||
class TokenizerTestParser(object):
|
||||
def __init__(self, initialState, lastStartTag=None):
|
||||
self.tokenizer = HTMLTokenizer
|
||||
self._state = initialState
|
||||
self._lastStartTag = lastStartTag
|
||||
|
||||
def parse(self, stream, encoding=None, innerHTML=False):
|
||||
tokenizer = self.tokenizer(stream, encoding)
|
||||
self.outputTokens = []
|
||||
|
||||
tokenizer.state = getattr(tokenizer, self._state)
|
||||
if self._lastStartTag is not None:
|
||||
tokenizer.currentToken = {"type": "startTag",
|
||||
"name": self._lastStartTag}
|
||||
|
||||
types = dict((v, k) for k, v in constants.tokenTypes.items())
|
||||
for token in tokenizer:
|
||||
getattr(self, 'process%s' % types[token["type"]])(token)
|
||||
|
||||
return self.outputTokens
|
||||
|
||||
def processDoctype(self, token):
|
||||
self.outputTokens.append(["DOCTYPE", token["name"], token["publicId"],
|
||||
token["systemId"], token["correct"]])
|
||||
|
||||
def processStartTag(self, token):
|
||||
self.outputTokens.append(["StartTag", token["name"],
|
||||
dict(token["data"][::-1]), token["selfClosing"]])
|
||||
|
||||
def processEmptyTag(self, token):
|
||||
if token["name"] not in constants.voidElements:
|
||||
self.outputTokens.append("ParseError")
|
||||
self.outputTokens.append(["StartTag", token["name"], dict(token["data"][::-1])])
|
||||
|
||||
def processEndTag(self, token):
|
||||
self.outputTokens.append(["EndTag", token["name"],
|
||||
token["selfClosing"]])
|
||||
|
||||
def processComment(self, token):
|
||||
self.outputTokens.append(["Comment", token["data"]])
|
||||
|
||||
def processSpaceCharacters(self, token):
|
||||
self.outputTokens.append(["Character", token["data"]])
|
||||
self.processSpaceCharacters = self.processCharacters
|
||||
|
||||
def processCharacters(self, token):
|
||||
self.outputTokens.append(["Character", token["data"]])
|
||||
|
||||
def processEOF(self, token):
|
||||
pass
|
||||
|
||||
def processParseError(self, token):
|
||||
self.outputTokens.append(["ParseError", token["data"]])
|
||||
|
||||
|
||||
def concatenateCharacterTokens(tokens):
|
||||
outputTokens = []
|
||||
for token in tokens:
|
||||
if "ParseError" not in token and token[0] == "Character":
|
||||
if (outputTokens and "ParseError" not in outputTokens[-1] and
|
||||
outputTokens[-1][0] == "Character"):
|
||||
outputTokens[-1][1] += token[1]
|
||||
else:
|
||||
outputTokens.append(token)
|
||||
else:
|
||||
outputTokens.append(token)
|
||||
return outputTokens
|
||||
|
||||
|
||||
def normalizeTokens(tokens):
|
||||
# TODO: convert tests to reflect arrays
|
||||
for i, token in enumerate(tokens):
|
||||
if token[0] == 'ParseError':
|
||||
tokens[i] = token[0]
|
||||
return tokens
|
||||
|
||||
|
||||
def tokensMatch(expectedTokens, receivedTokens, ignoreErrorOrder,
|
||||
ignoreErrors=False):
|
||||
"""Test whether the test has passed or failed
|
||||
|
||||
If the ignoreErrorOrder flag is set to true we don't test the relative
|
||||
positions of parse errors and non parse errors
|
||||
"""
|
||||
checkSelfClosing = False
|
||||
for token in expectedTokens:
|
||||
if (token[0] == "StartTag" and len(token) == 4
|
||||
or token[0] == "EndTag" and len(token) == 3):
|
||||
checkSelfClosing = True
|
||||
break
|
||||
|
||||
if not checkSelfClosing:
|
||||
for token in receivedTokens:
|
||||
if token[0] == "StartTag" or token[0] == "EndTag":
|
||||
token.pop()
|
||||
|
||||
if not ignoreErrorOrder and not ignoreErrors:
|
||||
return expectedTokens == receivedTokens
|
||||
else:
|
||||
# Sort the tokens into two groups; non-parse errors and parse errors
|
||||
tokens = {"expected": [[], []], "received": [[], []]}
|
||||
for tokenType, tokenList in zip(list(tokens.keys()),
|
||||
(expectedTokens, receivedTokens)):
|
||||
for token in tokenList:
|
||||
if token != "ParseError":
|
||||
tokens[tokenType][0].append(token)
|
||||
else:
|
||||
if not ignoreErrors:
|
||||
tokens[tokenType][1].append(token)
|
||||
return tokens["expected"] == tokens["received"]
|
||||
|
||||
|
||||
def unescape(test):
|
||||
def decode(inp):
|
||||
return inp.encode("utf-8").decode("unicode-escape")
|
||||
|
||||
test["input"] = decode(test["input"])
|
||||
for token in test["output"]:
|
||||
if token == "ParseError":
|
||||
continue
|
||||
else:
|
||||
token[1] = decode(token[1])
|
||||
if len(token) > 2:
|
||||
for key, value in token[2]:
|
||||
del token[2][key]
|
||||
token[2][decode(key)] = decode(value)
|
||||
return test
|
||||
|
||||
|
||||
def runTokenizerTest(test):
|
||||
warnings.resetwarnings()
|
||||
warnings.simplefilter("error")
|
||||
|
||||
expected = concatenateCharacterTokens(test['output'])
|
||||
if 'lastStartTag' not in test:
|
||||
test['lastStartTag'] = None
|
||||
parser = TokenizerTestParser(test['initialState'],
|
||||
test['lastStartTag'])
|
||||
tokens = parser.parse(test['input'])
|
||||
tokens = concatenateCharacterTokens(tokens)
|
||||
received = normalizeTokens(tokens)
|
||||
errorMsg = "\n".join(["\n\nInitial state:",
|
||||
test['initialState'],
|
||||
"\nInput:", test['input'],
|
||||
"\nExpected:", repr(expected),
|
||||
"\nreceived:", repr(tokens)])
|
||||
errorMsg = errorMsg
|
||||
ignoreErrorOrder = test.get('ignoreErrorOrder', False)
|
||||
assert tokensMatch(expected, received, ignoreErrorOrder, True), errorMsg
|
||||
|
||||
|
||||
def _doCapitalize(match):
|
||||
return match.group(1).upper()
|
||||
|
||||
_capitalizeRe = re.compile(r"\W+(\w)").sub
|
||||
|
||||
|
||||
def capitalize(s):
|
||||
s = s.lower()
|
||||
s = _capitalizeRe(_doCapitalize, s)
|
||||
return s
|
||||
|
||||
|
||||
def testTokenizer():
|
||||
for filename in get_data_files('tokenizer', '*.test'):
|
||||
with open(filename) as fp:
|
||||
tests = json.load(fp)
|
||||
if 'tests' in tests:
|
||||
for index, test in enumerate(tests['tests']):
|
||||
if 'initialStates' not in test:
|
||||
test["initialStates"] = ["Data state"]
|
||||
if 'doubleEscaped' in test:
|
||||
test = unescape(test)
|
||||
for initialState in test["initialStates"]:
|
||||
test["initialState"] = capitalize(initialState)
|
||||
yield runTokenizerTest, test
|
|
@ -0,0 +1,40 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
from . import support # flake8: noqa
|
||||
|
||||
import html5lib
|
||||
from html5lib.treeadapters import sax
|
||||
from html5lib.treewalkers import getTreeWalker
|
||||
|
||||
|
||||
def test_to_sax():
|
||||
handler = support.TracingSaxHandler()
|
||||
tree = html5lib.parse("""<html xml:lang="en">
|
||||
<title>Directory Listing</title>
|
||||
<a href="/"><b/></p>
|
||||
""", treebuilder="etree")
|
||||
walker = getTreeWalker("etree")
|
||||
sax.to_sax(walker(tree), handler)
|
||||
expected = [
|
||||
'startDocument',
|
||||
('startElementNS', ('http://www.w3.org/1999/xhtml', 'html'),
|
||||
'html', {(None, 'xml:lang'): 'en'}),
|
||||
('startElementNS', ('http://www.w3.org/1999/xhtml', 'head'), 'head', {}),
|
||||
('startElementNS', ('http://www.w3.org/1999/xhtml', 'title'), 'title', {}),
|
||||
('characters', 'Directory Listing'),
|
||||
('endElementNS', ('http://www.w3.org/1999/xhtml', 'title'), 'title'),
|
||||
('characters', '\n '),
|
||||
('endElementNS', ('http://www.w3.org/1999/xhtml', 'head'), 'head'),
|
||||
('startElementNS', ('http://www.w3.org/1999/xhtml', 'body'), 'body', {}),
|
||||
('startElementNS', ('http://www.w3.org/1999/xhtml', 'a'), 'a', {(None, 'href'): '/'}),
|
||||
('startElementNS', ('http://www.w3.org/1999/xhtml', 'b'), 'b', {}),
|
||||
('startElementNS', ('http://www.w3.org/1999/xhtml', 'p'), 'p', {}),
|
||||
('endElementNS', ('http://www.w3.org/1999/xhtml', 'p'), 'p'),
|
||||
('characters', '\n '),
|
||||
('endElementNS', ('http://www.w3.org/1999/xhtml', 'b'), 'b'),
|
||||
('endElementNS', ('http://www.w3.org/1999/xhtml', 'a'), 'a'),
|
||||
('endElementNS', ('http://www.w3.org/1999/xhtml', 'body'), 'body'),
|
||||
('endElementNS', ('http://www.w3.org/1999/xhtml', 'html'), 'html'),
|
||||
'endDocument',
|
||||
]
|
||||
assert expected == handler.visited
|
|
@ -0,0 +1,353 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import os
|
||||
import sys
|
||||
import unittest
|
||||
import warnings
|
||||
from difflib import unified_diff
|
||||
|
||||
try:
|
||||
unittest.TestCase.assertEqual
|
||||
except AttributeError:
|
||||
unittest.TestCase.assertEqual = unittest.TestCase.assertEquals
|
||||
|
||||
from .support import get_data_files, TestData, convertExpected
|
||||
|
||||
from html5lib import html5parser, treewalkers, treebuilders, constants
|
||||
|
||||
|
||||
def PullDOMAdapter(node):
|
||||
from xml.dom import Node
|
||||
from xml.dom.pulldom import START_ELEMENT, END_ELEMENT, COMMENT, CHARACTERS
|
||||
|
||||
if node.nodeType in (Node.DOCUMENT_NODE, Node.DOCUMENT_FRAGMENT_NODE):
|
||||
for childNode in node.childNodes:
|
||||
for event in PullDOMAdapter(childNode):
|
||||
yield event
|
||||
|
||||
elif node.nodeType == Node.DOCUMENT_TYPE_NODE:
|
||||
raise NotImplementedError("DOCTYPE nodes are not supported by PullDOM")
|
||||
|
||||
elif node.nodeType == Node.COMMENT_NODE:
|
||||
yield COMMENT, node
|
||||
|
||||
elif node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
|
||||
yield CHARACTERS, node
|
||||
|
||||
elif node.nodeType == Node.ELEMENT_NODE:
|
||||
yield START_ELEMENT, node
|
||||
for childNode in node.childNodes:
|
||||
for event in PullDOMAdapter(childNode):
|
||||
yield event
|
||||
yield END_ELEMENT, node
|
||||
|
||||
else:
|
||||
raise NotImplementedError("Node type not supported: " + str(node.nodeType))
|
||||
|
||||
treeTypes = {
|
||||
"DOM": {"builder": treebuilders.getTreeBuilder("dom"),
|
||||
"walker": treewalkers.getTreeWalker("dom")},
|
||||
"PullDOM": {"builder": treebuilders.getTreeBuilder("dom"),
|
||||
"adapter": PullDOMAdapter,
|
||||
"walker": treewalkers.getTreeWalker("pulldom")},
|
||||
}
|
||||
|
||||
# Try whatever etree implementations are available from a list that are
|
||||
#"supposed" to work
|
||||
try:
|
||||
import xml.etree.ElementTree as ElementTree
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
treeTypes['ElementTree'] = \
|
||||
{"builder": treebuilders.getTreeBuilder("etree", ElementTree),
|
||||
"walker": treewalkers.getTreeWalker("etree", ElementTree)}
|
||||
|
||||
try:
|
||||
import xml.etree.cElementTree as ElementTree
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
treeTypes['cElementTree'] = \
|
||||
{"builder": treebuilders.getTreeBuilder("etree", ElementTree),
|
||||
"walker": treewalkers.getTreeWalker("etree", ElementTree)}
|
||||
|
||||
|
||||
try:
|
||||
import lxml.etree as ElementTree # flake8: noqa
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
treeTypes['lxml_native'] = \
|
||||
{"builder": treebuilders.getTreeBuilder("lxml"),
|
||||
"walker": treewalkers.getTreeWalker("lxml")}
|
||||
|
||||
|
||||
try:
|
||||
from genshi.core import QName, Attrs
|
||||
from genshi.core import START, END, TEXT, COMMENT, DOCTYPE
|
||||
except ImportError:
|
||||
pass
|
||||
else:
|
||||
def GenshiAdapter(tree):
|
||||
text = None
|
||||
for token in treewalkers.getTreeWalker("dom")(tree):
|
||||
type = token["type"]
|
||||
if type in ("Characters", "SpaceCharacters"):
|
||||
if text is None:
|
||||
text = token["data"]
|
||||
else:
|
||||
text += token["data"]
|
||||
elif text is not None:
|
||||
yield TEXT, text, (None, -1, -1)
|
||||
text = None
|
||||
|
||||
if type in ("StartTag", "EmptyTag"):
|
||||
if token["namespace"]:
|
||||
name = "{%s}%s" % (token["namespace"], token["name"])
|
||||
else:
|
||||
name = token["name"]
|
||||
attrs = Attrs([(QName("{%s}%s" % attr if attr[0] is not None else attr[1]), value)
|
||||
for attr, value in token["data"].items()])
|
||||
yield (START, (QName(name), attrs), (None, -1, -1))
|
||||
if type == "EmptyTag":
|
||||
type = "EndTag"
|
||||
|
||||
if type == "EndTag":
|
||||
if token["namespace"]:
|
||||
name = "{%s}%s" % (token["namespace"], token["name"])
|
||||
else:
|
||||
name = token["name"]
|
||||
|
||||
yield END, QName(name), (None, -1, -1)
|
||||
|
||||
elif type == "Comment":
|
||||
yield COMMENT, token["data"], (None, -1, -1)
|
||||
|
||||
elif type == "Doctype":
|
||||
yield DOCTYPE, (token["name"], token["publicId"],
|
||||
token["systemId"]), (None, -1, -1)
|
||||
|
||||
else:
|
||||
pass # FIXME: What to do?
|
||||
|
||||
if text is not None:
|
||||
yield TEXT, text, (None, -1, -1)
|
||||
|
||||
treeTypes["genshi"] = \
|
||||
{"builder": treebuilders.getTreeBuilder("dom"),
|
||||
"adapter": GenshiAdapter,
|
||||
"walker": treewalkers.getTreeWalker("genshi")}
|
||||
|
||||
|
||||
def concatenateCharacterTokens(tokens):
|
||||
charactersToken = None
|
||||
for token in tokens:
|
||||
type = token["type"]
|
||||
if type in ("Characters", "SpaceCharacters"):
|
||||
if charactersToken is None:
|
||||
charactersToken = {"type": "Characters", "data": token["data"]}
|
||||
else:
|
||||
charactersToken["data"] += token["data"]
|
||||
else:
|
||||
if charactersToken is not None:
|
||||
yield charactersToken
|
||||
charactersToken = None
|
||||
yield token
|
||||
if charactersToken is not None:
|
||||
yield charactersToken
|
||||
|
||||
|
||||
def convertTokens(tokens):
|
||||
output = []
|
||||
indent = 0
|
||||
for token in concatenateCharacterTokens(tokens):
|
||||
type = token["type"]
|
||||
if type in ("StartTag", "EmptyTag"):
|
||||
if (token["namespace"] and
|
||||
token["namespace"] != constants.namespaces["html"]):
|
||||
if token["namespace"] in constants.prefixes:
|
||||
name = constants.prefixes[token["namespace"]]
|
||||
else:
|
||||
name = token["namespace"]
|
||||
name += " " + token["name"]
|
||||
else:
|
||||
name = token["name"]
|
||||
output.append("%s<%s>" % (" " * indent, name))
|
||||
indent += 2
|
||||
attrs = token["data"]
|
||||
if attrs:
|
||||
# TODO: Remove this if statement, attrs should always exist
|
||||
for (namespace, name), value in sorted(attrs.items()):
|
||||
if namespace:
|
||||
if namespace in constants.prefixes:
|
||||
outputname = constants.prefixes[namespace]
|
||||
else:
|
||||
outputname = namespace
|
||||
outputname += " " + name
|
||||
else:
|
||||
outputname = name
|
||||
output.append("%s%s=\"%s\"" % (" " * indent, outputname, value))
|
||||
if type == "EmptyTag":
|
||||
indent -= 2
|
||||
elif type == "EndTag":
|
||||
indent -= 2
|
||||
elif type == "Comment":
|
||||
output.append("%s<!-- %s -->" % (" " * indent, token["data"]))
|
||||
elif type == "Doctype":
|
||||
if token["name"]:
|
||||
if token["publicId"]:
|
||||
output.append("""%s<!DOCTYPE %s "%s" "%s">""" %
|
||||
(" " * indent, token["name"],
|
||||
token["publicId"],
|
||||
token["systemId"] and token["systemId"] or ""))
|
||||
elif token["systemId"]:
|
||||
output.append("""%s<!DOCTYPE %s "" "%s">""" %
|
||||
(" " * indent, token["name"],
|
||||
token["systemId"]))
|
||||
else:
|
||||
output.append("%s<!DOCTYPE %s>" % (" " * indent,
|
||||
token["name"]))
|
||||
else:
|
||||
output.append("%s<!DOCTYPE >" % (" " * indent,))
|
||||
elif type in ("Characters", "SpaceCharacters"):
|
||||
output.append("%s\"%s\"" % (" " * indent, token["data"]))
|
||||
else:
|
||||
pass # TODO: what to do with errors?
|
||||
return "\n".join(output)
|
||||
|
||||
import re
|
||||
attrlist = re.compile(r"^(\s+)\w+=.*(\n\1\w+=.*)+", re.M)
|
||||
|
||||
|
||||
def sortattrs(x):
|
||||
lines = x.group(0).split("\n")
|
||||
lines.sort()
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
class TokenTestCase(unittest.TestCase):
|
||||
def test_all_tokens(self):
|
||||
expected = [
|
||||
{'data': {}, 'type': 'StartTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'html'},
|
||||
{'data': {}, 'type': 'StartTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'head'},
|
||||
{'data': {}, 'type': 'EndTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'head'},
|
||||
{'data': {}, 'type': 'StartTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'body'},
|
||||
{'data': 'a', 'type': 'Characters'},
|
||||
{'data': {}, 'type': 'StartTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'div'},
|
||||
{'data': 'b', 'type': 'Characters'},
|
||||
{'data': {}, 'type': 'EndTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'div'},
|
||||
{'data': 'c', 'type': 'Characters'},
|
||||
{'data': {}, 'type': 'EndTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'body'},
|
||||
{'data': {}, 'type': 'EndTag', 'namespace': 'http://www.w3.org/1999/xhtml', 'name': 'html'}
|
||||
]
|
||||
for treeName, treeCls in treeTypes.items():
|
||||
p = html5parser.HTMLParser(tree=treeCls["builder"])
|
||||
document = p.parse("<html><head></head><body>a<div>b</div>c</body></html>")
|
||||
document = treeCls.get("adapter", lambda x: x)(document)
|
||||
output = treeCls["walker"](document)
|
||||
for expectedToken, outputToken in zip(expected, output):
|
||||
self.assertEqual(expectedToken, outputToken)
|
||||
|
||||
|
||||
def runTreewalkerTest(innerHTML, input, expected, errors, treeClass):
|
||||
warnings.resetwarnings()
|
||||
warnings.simplefilter("error")
|
||||
try:
|
||||
p = html5parser.HTMLParser(tree=treeClass["builder"])
|
||||
if innerHTML:
|
||||
document = p.parseFragment(input, innerHTML)
|
||||
else:
|
||||
document = p.parse(input)
|
||||
except constants.DataLossWarning:
|
||||
# Ignore testcases we know we don't pass
|
||||
return
|
||||
|
||||
document = treeClass.get("adapter", lambda x: x)(document)
|
||||
try:
|
||||
output = convertTokens(treeClass["walker"](document))
|
||||
output = attrlist.sub(sortattrs, output)
|
||||
expected = attrlist.sub(sortattrs, convertExpected(expected))
|
||||
diff = "".join(unified_diff([line + "\n" for line in expected.splitlines()],
|
||||
[line + "\n" for line in output.splitlines()],
|
||||
"Expected", "Received"))
|
||||
assert expected == output, "\n".join([
|
||||
"", "Input:", input,
|
||||
"", "Expected:", expected,
|
||||
"", "Received:", output,
|
||||
"", "Diff:", diff,
|
||||
])
|
||||
except NotImplementedError:
|
||||
pass # Amnesty for those that confess...
|
||||
|
||||
|
||||
def test_treewalker():
|
||||
sys.stdout.write('Testing tree walkers ' + " ".join(list(treeTypes.keys())) + "\n")
|
||||
|
||||
for treeName, treeCls in treeTypes.items():
|
||||
files = get_data_files('tree-construction')
|
||||
for filename in files:
|
||||
testName = os.path.basename(filename).replace(".dat", "")
|
||||
if testName in ("template",):
|
||||
continue
|
||||
|
||||
tests = TestData(filename, "data")
|
||||
|
||||
for index, test in enumerate(tests):
|
||||
(input, errors,
|
||||
innerHTML, expected) = [test[key] for key in ("data", "errors",
|
||||
"document-fragment",
|
||||
"document")]
|
||||
errors = errors.split("\n")
|
||||
yield runTreewalkerTest, innerHTML, input, expected, errors, treeCls
|
||||
|
||||
|
||||
def set_attribute_on_first_child(docfrag, name, value, treeName):
|
||||
"""naively sets an attribute on the first child of the document
|
||||
fragment passed in"""
|
||||
setter = {'ElementTree': lambda d: d[0].set,
|
||||
'DOM': lambda d: d.firstChild.setAttribute}
|
||||
setter['cElementTree'] = setter['ElementTree']
|
||||
try:
|
||||
setter.get(treeName, setter['DOM'])(docfrag)(name, value)
|
||||
except AttributeError:
|
||||
setter['ElementTree'](docfrag)(name, value)
|
||||
|
||||
|
||||
def runTreewalkerEditTest(intext, expected, attrs_to_add, tree):
|
||||
"""tests what happens when we add attributes to the intext"""
|
||||
treeName, treeClass = tree
|
||||
parser = html5parser.HTMLParser(tree=treeClass["builder"])
|
||||
document = parser.parseFragment(intext)
|
||||
for nom, val in attrs_to_add:
|
||||
set_attribute_on_first_child(document, nom, val, treeName)
|
||||
|
||||
document = treeClass.get("adapter", lambda x: x)(document)
|
||||
output = convertTokens(treeClass["walker"](document))
|
||||
output = attrlist.sub(sortattrs, output)
|
||||
if not output in expected:
|
||||
raise AssertionError("TreewalkerEditTest: %s\nExpected:\n%s\nReceived:\n%s" % (treeName, expected, output))
|
||||
|
||||
|
||||
def test_treewalker_six_mix():
|
||||
"""Str/Unicode mix. If str attrs added to tree"""
|
||||
|
||||
# On Python 2.x string literals are of type str. Unless, like this
|
||||
# file, the programmer imports unicode_literals from __future__.
|
||||
# In that case, string literals become objects of type unicode.
|
||||
|
||||
# This test simulates a Py2 user, modifying attributes on a document
|
||||
# fragment but not using the u'' syntax nor importing unicode_literals
|
||||
sm_tests = [
|
||||
('<a href="http://example.com">Example</a>',
|
||||
[(str('class'), str('test123'))],
|
||||
'<a>\n class="test123"\n href="http://example.com"\n "Example"'),
|
||||
|
||||
('<link href="http://example.com/cow">',
|
||||
[(str('rel'), str('alternate'))],
|
||||
'<link>\n href="http://example.com/cow"\n rel="alternate"\n "Example"')
|
||||
]
|
||||
|
||||
for tree in treeTypes.items():
|
||||
for intext, attrs, expected in sm_tests:
|
||||
yield runTreewalkerEditTest, intext, expected, attrs, tree
|
|
@ -0,0 +1,133 @@
|
|||
from __future__ import absolute_import, division, unicode_literals
|
||||
|
||||
import unittest
|
||||
|
||||
from html5lib.filters.whitespace import Filter
|
||||
from html5lib.constants import spaceCharacters
|
||||
spaceCharacters = "".join(spaceCharacters)
|
||||
|
||||
try:
|
||||
unittest.TestCase.assertEqual
|
||||
except AttributeError:
|
||||
unittest.TestCase.assertEqual = unittest.TestCase.assertEquals
|
||||
|
||||
|
||||
class TestCase(unittest.TestCase):
|
||||
def runTest(self, input, expected):
|
||||
output = list(Filter(input))
|
||||
errorMsg = "\n".join(["\n\nInput:", str(input),
|
||||
"\nExpected:", str(expected),
|
||||
"\nReceived:", str(output)])
|
||||
self.assertEqual(output, expected, errorMsg)
|
||||
|
||||
def runTestUnmodifiedOutput(self, input):
|
||||
self.runTest(input, input)
|
||||
|
||||
def testPhrasingElements(self):
|
||||
self.runTestUnmodifiedOutput(
|
||||
[{"type": "Characters", "data": "This is a "},
|
||||
{"type": "StartTag", "name": "span", "data": []},
|
||||
{"type": "Characters", "data": "phrase"},
|
||||
{"type": "EndTag", "name": "span", "data": []},
|
||||
{"type": "SpaceCharacters", "data": " "},
|
||||
{"type": "Characters", "data": "with"},
|
||||
{"type": "SpaceCharacters", "data": " "},
|
||||
{"type": "StartTag", "name": "em", "data": []},
|
||||
{"type": "Characters", "data": "emphasised text"},
|
||||
{"type": "EndTag", "name": "em", "data": []},
|
||||
{"type": "Characters", "data": " and an "},
|
||||
{"type": "StartTag", "name": "img", "data": [["alt", "image"]]},
|
||||
{"type": "Characters", "data": "."}])
|
||||
|
||||
def testLeadingWhitespace(self):
|
||||
self.runTest(
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "SpaceCharacters", "data": spaceCharacters},
|
||||
{"type": "Characters", "data": "foo"},
|
||||
{"type": "EndTag", "name": "p", "data": []}],
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "SpaceCharacters", "data": " "},
|
||||
{"type": "Characters", "data": "foo"},
|
||||
{"type": "EndTag", "name": "p", "data": []}])
|
||||
|
||||
def testLeadingWhitespaceAsCharacters(self):
|
||||
self.runTest(
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": spaceCharacters + "foo"},
|
||||
{"type": "EndTag", "name": "p", "data": []}],
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": " foo"},
|
||||
{"type": "EndTag", "name": "p", "data": []}])
|
||||
|
||||
def testTrailingWhitespace(self):
|
||||
self.runTest(
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": "foo"},
|
||||
{"type": "SpaceCharacters", "data": spaceCharacters},
|
||||
{"type": "EndTag", "name": "p", "data": []}],
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": "foo"},
|
||||
{"type": "SpaceCharacters", "data": " "},
|
||||
{"type": "EndTag", "name": "p", "data": []}])
|
||||
|
||||
def testTrailingWhitespaceAsCharacters(self):
|
||||
self.runTest(
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": "foo" + spaceCharacters},
|
||||
{"type": "EndTag", "name": "p", "data": []}],
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": "foo "},
|
||||
{"type": "EndTag", "name": "p", "data": []}])
|
||||
|
||||
def testWhitespace(self):
|
||||
self.runTest(
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": "foo" + spaceCharacters + "bar"},
|
||||
{"type": "EndTag", "name": "p", "data": []}],
|
||||
[{"type": "StartTag", "name": "p", "data": []},
|
||||
{"type": "Characters", "data": "foo bar"},
|
||||
{"type": "EndTag", "name": "p", "data": []}])
|
||||
|
||||
def testLeadingWhitespaceInPre(self):
|
||||
self.runTestUnmodifiedOutput(
|
||||
[{"type": "StartTag", "name": "pre", "data": []},
|
||||
{"type": "SpaceCharacters", "data": spaceCharacters},
|
||||
{"type": "Characters", "data": "foo"},
|
||||
{"type": "EndTag", "name": "pre", "data": []}])
|
||||
|
||||
def testLeadingWhitespaceAsCharactersInPre(self):
|
||||
self.runTestUnmodifiedOutput(
|
||||
[{"type": "StartTag", "name": "pre", "data": []},
|
||||
{"type": "Characters", "data": spaceCharacters + "foo"},
|
||||
{"type": "EndTag", "name": "pre", "data": []}])
|
||||
|
||||
def testTrailingWhitespaceInPre(self):
|
||||
self.runTestUnmodifiedOutput(
|
||||
[{"type": "StartTag", "name": "pre", "data": []},
|
||||
{"type": "Characters", "data": "foo"},
|
||||
{"type": "SpaceCharacters", "data": spaceCharacters},
|
||||
{"type": "EndTag", "name": "pre", "data": []}])
|
||||
|
||||
def testTrailingWhitespaceAsCharactersInPre(self):
|
||||
self.runTestUnmodifiedOutput(
|
||||
[{"type": "StartTag", "name": "pre", "data": []},
|
||||
{"type": "Characters", "data": "foo" + spaceCharacters},
|
||||
{"type": "EndTag", "name": "pre", "data": []}])
|
||||
|
||||
def testWhitespaceInPre(self):
|
||||
self.runTestUnmodifiedOutput(
|
||||
[{"type": "StartTag", "name": "pre", "data": []},
|
||||
{"type": "Characters", "data": "foo" + spaceCharacters + "bar"},
|
||||
{"type": "EndTag", "name": "pre", "data": []}])
|
||||
|
||||
|
||||
def buildTestSuite():
|
||||
return unittest.defaultTestLoader.loadTestsFromName(__name__)
|
||||
|
||||
|
||||
def main():
|
||||
buildTestSuite()
|
||||
unittest.main()
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
34
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/AUTHORS.rst
vendored
Normal file
34
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/AUTHORS.rst
vendored
Normal file
|
@ -0,0 +1,34 @@
|
|||
Credits
|
||||
=======
|
||||
|
||||
The ``html5lib`` test data is maintained by:
|
||||
|
||||
- James Graham
|
||||
- Geoffrey Sneddon
|
||||
|
||||
|
||||
Contributors
|
||||
------------
|
||||
|
||||
- Adam Barth
|
||||
- Andi Sidwell
|
||||
- Anne van Kesteren
|
||||
- David Flanagan
|
||||
- Edward Z. Yang
|
||||
- Geoffrey Sneddon
|
||||
- Henri Sivonen
|
||||
- Ian Hickson
|
||||
- Jacques Distler
|
||||
- James Graham
|
||||
- Lachlan Hunt
|
||||
- lantis63
|
||||
- Mark Pilgrim
|
||||
- Mats Palmgren
|
||||
- Ms2ger
|
||||
- Nolan Waite
|
||||
- Philip Taylor
|
||||
- Rafael Weinstein
|
||||
- Ryan King
|
||||
- Sam Ruby
|
||||
- Simon Pieters
|
||||
- Thomas Broyer
|
21
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/LICENSE
vendored
Normal file
21
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/LICENSE
vendored
Normal file
|
@ -0,0 +1,21 @@
|
|||
Copyright (c) 2006-2013 James Graham, Geoffrey Sneddon, and
|
||||
other contributors
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining
|
||||
a copy of this software and associated documentation files (the
|
||||
"Software"), to deal in the Software without restriction, including
|
||||
without limitation the rights to use, copy, modify, merge, publish,
|
||||
distribute, sublicense, and/or sell copies of the Software, and to
|
||||
permit persons to whom the Software is furnished to do so, subject to
|
||||
the following conditions:
|
||||
|
||||
The above copyright notice and this permission notice shall be
|
||||
included in all copies or substantial portions of the Software.
|
||||
|
||||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
||||
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
|
||||
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
|
||||
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
|
||||
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
|
||||
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
|
||||
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
|
@ -0,0 +1,51 @@
|
|||
老子《道德經》 第一~四十章
|
||||
|
||||
老子道經
|
||||
|
||||
第一章
|
||||
|
||||
道可道,非常道。名可名,非常名。無,名天地之始﹔有,名萬物之母。
|
||||
故常無,欲以觀其妙;常有,欲以觀其徼。此兩者,同出而異名,同謂之
|
||||
玄。玄之又玄,眾妙之門。
|
||||
|
||||
第二章
|
||||
|
||||
天下皆知美之為美,斯惡矣﹔皆知善之為善,斯不善矣。故有無相生,難
|
||||
易相成,長短相形,高下相傾,音聲相和,前後相隨。是以聖人處「無為
|
||||
」之事,行「不言」之教。萬物作焉而不辭,生而不有,為而不恃,功成
|
||||
而弗居。夫唯弗居,是以不去。
|
||||
|
||||
第三章
|
||||
|
||||
不尚賢,使民不爭﹔不貴難得之貨,使民不為盜﹔不見可欲,使民心不亂
|
||||
。是以「聖人」之治,虛其心,實其腹,弱其志,強其骨。常使民無知無
|
||||
欲。使夫智者不敢為也。為「無為」,則無不治。
|
||||
|
||||
第四章
|
||||
|
||||
「道」沖,而用之或不盈。淵兮,似萬物之宗﹔挫其銳,解其紛,和其光
|
||||
,同其塵﹔湛兮似或存。吾不知誰之子?象帝之先。
|
||||
|
||||
第五章
|
||||
|
||||
天地不仁,以萬物為芻狗﹔聖人不仁,以百姓為芻狗。天地之間,其猶橐
|
||||
蘥乎?虛而不屈,動而愈出。多言數窮,不如守中。
|
||||
|
||||
第六章
|
||||
|
||||
谷神不死,是謂玄牝。玄牝之門,是謂天地根。綿綿若存,用之不勤。
|
||||
|
||||
第七章
|
||||
|
||||
天長地久。天地所以能長且久者,以其不自生,故能長久。是以聖人後其
|
||||
身而身先,外其身而身存。非以其無私邪?故能成其私。
|
||||
|
||||
第八章
|
||||
|
||||
上善若水。水善利萬物而不爭。處眾人之所惡,故幾於道。居善地,心善
|
||||
淵,與善仁,言善信,政善治,事善能,動善時。夫唯不爭,故無尤。
|
||||
|
||||
第九章
|
||||
|
||||
持而盈之,不如其已﹔揣而銳之,不可長保。金玉滿堂,莫之能守﹔富貴
|
||||
而驕,自遺其咎。功遂身退,天之道。
|
10
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/encoding/test-yahoo-jp.dat
vendored
Normal file
10
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/encoding/test-yahoo-jp.dat
vendored
Normal file
|
@ -0,0 +1,10 @@
|
|||
#data
|
||||
<html>
|
||||
<head>
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=euc-jp">
|
||||
<!--京-->
|
||||
<title>Yahoo! JAPAN</title>
|
||||
<meta name="description" content="日本最大級のポータルサイト。検索、オークション、ニュース、メール、コミュニティ、ショッピング、など80以上のサービスを展開。あなたの生活をより豊かにする「ライフ・エンジン」を目指していきます。">
|
||||
<style type="text/css" media="all">
|
||||
#encoding
|
||||
euc_jp
|
394
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/encoding/tests1.dat
vendored
Normal file
394
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/encoding/tests1.dat
vendored
Normal file
File diff suppressed because one or more lines are too long
115
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/encoding/tests2.dat
vendored
Normal file
115
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/encoding/tests2.dat
vendored
Normal file
|
@ -0,0 +1,115 @@
|
|||
#data
|
||||
<meta
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<!
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta charset = "
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta charset=euc_jp
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta <meta charset='euc_jp'>
|
||||
#encoding
|
||||
euc_jp
|
||||
|
||||
#data
|
||||
<meta charset = 'euc_jp'>
|
||||
#encoding
|
||||
euc_jp
|
||||
|
||||
#data
|
||||
<!-- -->
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<!-- -->
|
||||
<meta http-equiv="Content-Type" content="text/html; charset=utf
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta http-equiv="Content-Type<meta charset="utf-8">
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta http-equiv="Content-Type" content="text/html; charset='utf-8'">
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<meta http-equiv="Content-Type" content="text/html; charset='utf-8">
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta charset =
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta charset= utf-8
|
||||
>
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<meta content = "text/html;
|
||||
#encoding
|
||||
windows-1252
|
||||
|
||||
#data
|
||||
<meta charset="UTF-16">
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<meta charset="UTF-16LE">
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<meta charset="UTF-16BE">
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<html a=ñ>
|
||||
<meta charset="utf-8">
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<html ñ>
|
||||
<meta charset="utf-8">
|
||||
#encoding
|
||||
utf-8
|
||||
|
||||
#data
|
||||
<html>ñ
|
||||
<meta charset="utf-8">
|
||||
#encoding
|
||||
utf-8
|
501
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/sanitizer/tests1.dat
vendored
Normal file
501
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/sanitizer/tests1.dat
vendored
Normal file
|
@ -0,0 +1,501 @@
|
|||
[
|
||||
{
|
||||
"name": "IE_Comments",
|
||||
"input": "<!--[if gte IE 4]><script>alert('XSS');</script><![endif]-->",
|
||||
"output": ""
|
||||
},
|
||||
|
||||
{
|
||||
"name": "IE_Comments_2",
|
||||
"input": "<![if !IE 5]><script>alert('XSS');</script><![endif]>",
|
||||
"output": "<script>alert('XSS');</script>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "allow_colons_in_path_component",
|
||||
"input": "<a href=\"./this:that\">foo</a>",
|
||||
"output": "<a href='./this:that'>foo</a>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "background_attribute",
|
||||
"input": "<div background=\"javascript:alert('XSS')\"></div>",
|
||||
"output": "<div/>",
|
||||
"xhtml": "<div></div>",
|
||||
"rexml": "<div></div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "bgsound",
|
||||
"input": "<bgsound src=\"javascript:alert('XSS');\" />",
|
||||
"output": "<bgsound src=\"javascript:alert('XSS');\"/>",
|
||||
"rexml": "<bgsound src=\"javascript:alert('XSS');\"></bgsound>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "div_background_image_unicode_encoded",
|
||||
"input": "<div style=\"background-image:\u00a5\u00a2\u006C\u0028'\u006a\u0061\u00a6\u0061\u00a3\u0063\u00a2\u0069\u00a0\u00a4\u003a\u0061\u006c\u0065\u00a2\u00a4\u0028.1027\u0058.1053\u0053\u0027\u0029'\u0029\">foo</div>",
|
||||
"output": "<div style=''>foo</div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "div_expression",
|
||||
"input": "<div style=\"width: expression(alert('XSS'));\">foo</div>",
|
||||
"output": "<div style=''>foo</div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "double_open_angle_brackets",
|
||||
"input": "<img src=http://ha.ckers.org/scriptlet.html <",
|
||||
"output": "<img src='http://ha.ckers.org/scriptlet.html'>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "double_open_angle_brackets_2",
|
||||
"input": "<script src=http://ha.ckers.org/scriptlet.html <",
|
||||
"output": "<script src=\"http://ha.ckers.org/scriptlet.html\" <=\"\">",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "grave_accents",
|
||||
"input": "<img src=`javascript:alert('XSS')` />",
|
||||
"output": "<img/>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "img_dynsrc_lowsrc",
|
||||
"input": "<img dynsrc=\"javascript:alert('XSS')\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "img_vbscript",
|
||||
"input": "<img src='vbscript:msgbox(\"XSS\")' />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "input_image",
|
||||
"input": "<input type=\"image\" src=\"javascript:alert('XSS');\" />",
|
||||
"output": "<input type='image'/>",
|
||||
"rexml": "<input type='image' />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "link_stylesheets",
|
||||
"input": "<link rel=\"stylesheet\" href=\"javascript:alert('XSS');\" />",
|
||||
"output": "<link rel=\"stylesheet\" href=\"javascript:alert('XSS');\"/>",
|
||||
"rexml": "<link href=\"javascript:alert('XSS');\" rel=\"stylesheet\"/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "link_stylesheets_2",
|
||||
"input": "<link rel=\"stylesheet\" href=\"http://ha.ckers.org/xss.css\" />",
|
||||
"output": "<link rel=\"stylesheet\" href=\"http://ha.ckers.org/xss.css\"/>",
|
||||
"rexml": "<link href=\"http://ha.ckers.org/xss.css\" rel=\"stylesheet\"/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "list_style_image",
|
||||
"input": "<li style=\"list-style-image: url(javascript:alert('XSS'))\">foo</li>",
|
||||
"output": "<li style=''>foo</li>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "no_closing_script_tags",
|
||||
"input": "<script src=http://ha.ckers.org/xss.js?<b>",
|
||||
"output": "<script src=\"http://ha.ckers.org/xss.js?&lt;b\">",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "non_alpha_non_digit",
|
||||
"input": "<script/XSS src=\"http://ha.ckers.org/xss.js\"></script>",
|
||||
"output": "<script XSS=\"\" src=\"http://ha.ckers.org/xss.js\"></script>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "non_alpha_non_digit_2",
|
||||
"input": "<a onclick!\\#$%&()*~+-_.,:;?@[/|\\]^`=alert(\"XSS\")>foo</a>",
|
||||
"output": "<a>foo</a>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "non_alpha_non_digit_3",
|
||||
"input": "<img/src=\"http://ha.ckers.org/xss.js\"/>",
|
||||
"output": "<img src='http://ha.ckers.org/xss.js'/>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "non_alpha_non_digit_II",
|
||||
"input": "<a href!\\#$%&()*~+-_.,:;?@[/|]^`=alert('XSS')>foo</a>",
|
||||
"output": "<a>foo</a>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "non_alpha_non_digit_III",
|
||||
"input": "<a/href=\"javascript:alert('XSS');\">foo</a>",
|
||||
"output": "<a>foo</a>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "platypus",
|
||||
"input": "<a href=\"http://www.ragingplatypus.com/\" style=\"display:block; position:absolute; left:0; top:0; width:100%; height:100%; z-index:1; background-color:black; background-image:url(http://www.ragingplatypus.com/i/cam-full.jpg); background-x:center; background-y:center; background-repeat:repeat;\">never trust your upstream platypus</a>",
|
||||
"output": "<a href='http://www.ragingplatypus.com/' style='display: block; width: 100%; height: 100%; background-color: black; background-x: center; background-y: center;'>never trust your upstream platypus</a>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "protocol_resolution_in_script_tag",
|
||||
"input": "<script src=//ha.ckers.org/.j></script>",
|
||||
"output": "<script src=\"//ha.ckers.org/.j\"></script>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_allow_anchors",
|
||||
"input": "<a href='foo' onclick='bar'><script>baz</script></a>",
|
||||
"output": "<a href='foo'><script>baz</script></a>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_allow_image_alt_attribute",
|
||||
"input": "<img alt='foo' onclick='bar' />",
|
||||
"output": "<img alt='foo'/>",
|
||||
"rexml": "<img alt='foo' />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_allow_image_height_attribute",
|
||||
"input": "<img height='foo' onclick='bar' />",
|
||||
"output": "<img height='foo'/>",
|
||||
"rexml": "<img height='foo' />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_allow_image_src_attribute",
|
||||
"input": "<img src='foo' onclick='bar' />",
|
||||
"output": "<img src='foo'/>",
|
||||
"rexml": "<img src='foo' />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_allow_image_width_attribute",
|
||||
"input": "<img width='foo' onclick='bar' />",
|
||||
"output": "<img width='foo'/>",
|
||||
"rexml": "<img width='foo' />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_handle_blank_text",
|
||||
"input": "",
|
||||
"output": ""
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_handle_malformed_image_tags",
|
||||
"input": "<img \"\"\"><script>alert(\"XSS\")</script>\">",
|
||||
"output": "<img/><script>alert(\"XSS\")</script>\">",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_handle_non_html",
|
||||
"input": "abc",
|
||||
"output": "abc"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_ridiculous_hack",
|
||||
"input": "<img\nsrc\n=\n\"\nj\na\nv\na\ns\nc\nr\ni\np\nt\n:\na\nl\ne\nr\nt\n(\n'\nX\nS\nS\n'\n)\n\"\n />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_0",
|
||||
"input": "<img src=\"javascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_1",
|
||||
"input": "<img src=javascript:alert('XSS') />",
|
||||
"output": "<img/>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_10",
|
||||
"input": "<img src=\"jav
ascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_11",
|
||||
"input": "<img src=\"jav
ascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_12",
|
||||
"input": "<img src=\"  javascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_13",
|
||||
"input": "<img src=\" javascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_14",
|
||||
"input": "<img src=\" javascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_2",
|
||||
"input": "<img src=\"JaVaScRiPt:alert('XSS')\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_3",
|
||||
"input": "<img src='javascript:alert("XSS")' />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_4",
|
||||
"input": "<img src='javascript:alert(String.fromCharCode(88,83,83))' />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_5",
|
||||
"input": "<img src='javascript:alert('XSS')' />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_6",
|
||||
"input": "<img src='javascript:alert('XSS')' />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_7",
|
||||
"input": "<img src='javascript:alert('XSS')' />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_8",
|
||||
"input": "<img src=\"jav\tascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_not_fall_for_xss_image_hack_9",
|
||||
"input": "<img src=\"jav	ascript:alert('XSS');\" />",
|
||||
"output": "<img/>",
|
||||
"rexml": "<img />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_sanitize_half_open_scripts",
|
||||
"input": "<img src=\"javascript:alert('XSS')\"",
|
||||
"output": "<img/>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_sanitize_invalid_script_tag",
|
||||
"input": "<script/XSS SRC=\"http://ha.ckers.org/xss.js\"></script>",
|
||||
"output": "<script XSS=\"\" SRC=\"http://ha.ckers.org/xss.js\"></script>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_sanitize_script_tag_with_multiple_open_brackets",
|
||||
"input": "<<script>alert(\"XSS\");//<</script>",
|
||||
"output": "<<script>alert(\"XSS\");//<</script>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_sanitize_script_tag_with_multiple_open_brackets_2",
|
||||
"input": "<iframe src=http://ha.ckers.org/scriptlet.html\n<",
|
||||
"output": "<iframe src=\"http://ha.ckers.org/scriptlet.html\" <=\"\">",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_sanitize_tag_broken_up_by_null",
|
||||
"input": "<scr\u0000ipt>alert(\"XSS\")</scr\u0000ipt>",
|
||||
"output": "<scr\ufffdipt>alert(\"XSS\")</scr\ufffdipt>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_sanitize_unclosed_script",
|
||||
"input": "<script src=http://ha.ckers.org/xss.js?<b>",
|
||||
"output": "<script src=\"http://ha.ckers.org/xss.js?&lt;b\">",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_strip_href_attribute_in_a_with_bad_protocols",
|
||||
"input": "<a href=\"javascript:XSS\" title=\"1\">boo</a>",
|
||||
"output": "<a title='1'>boo</a>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_strip_href_attribute_in_a_with_bad_protocols_and_whitespace",
|
||||
"input": "<a href=\" javascript:XSS\" title=\"1\">boo</a>",
|
||||
"output": "<a title='1'>boo</a>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_strip_src_attribute_in_img_with_bad_protocols",
|
||||
"input": "<img src=\"javascript:XSS\" title=\"1\">boo</img>",
|
||||
"output": "<img title='1'/>boo",
|
||||
"rexml": "<img title='1' />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "should_strip_src_attribute_in_img_with_bad_protocols_and_whitespace",
|
||||
"input": "<img src=\" javascript:XSS\" title=\"1\">boo</img>",
|
||||
"output": "<img title='1'/>boo",
|
||||
"rexml": "<img title='1' />"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "xml_base",
|
||||
"input": "<div xml:base=\"javascript:alert('XSS');//\">foo</div>",
|
||||
"output": "<div>foo</div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "xul",
|
||||
"input": "<p style=\"-moz-binding:url('http://ha.ckers.org/xssmoz.xml#xss')\">fubar</p>",
|
||||
"output": "<p style=''>fubar</p>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "quotes_in_attributes",
|
||||
"input": "<img src='foo' title='\"foo\" bar' />",
|
||||
"rexml": "<img src='foo' title='\"foo\" bar' />",
|
||||
"output": "<img title='"foo" bar' src='foo'/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "uri_refs_in_svg_attributes",
|
||||
"input": "<rect fill='url(#foo)' />",
|
||||
"rexml": "<rect fill='url(#foo)'></rect>",
|
||||
"xhtml": "<rect fill='url(#foo)'></rect>",
|
||||
"output": "<rect fill='url(#foo)'/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "absolute_uri_refs_in_svg_attributes",
|
||||
"input": "<rect fill='url(http://bad.com/) #fff' />",
|
||||
"rexml": "<rect fill=' #fff'></rect>",
|
||||
"xhtml": "<rect fill=' #fff'></rect>",
|
||||
"output": "<rect fill=' #fff'/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "uri_ref_with_space_in svg_attribute",
|
||||
"input": "<rect fill='url(\n#foo)' />",
|
||||
"rexml": "<rect fill='url(\n#foo)'></rect>",
|
||||
"xhtml": "<rect fill='url(\n#foo)'></rect>",
|
||||
"output": "<rect fill='url(\n#foo)'/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "absolute_uri_ref_with_space_in svg_attribute",
|
||||
"input": "<rect fill=\"url(\nhttp://bad.com/)\" />",
|
||||
"rexml": "<rect fill=' '></rect>",
|
||||
"xhtml": "<rect fill=' '></rect>",
|
||||
"output": "<rect fill=' '/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "allow_html5_image_tag",
|
||||
"input": "<image src='foo' />",
|
||||
"rexml": "<image src=\"foo\"></image>",
|
||||
"output": "<image src=\"foo\"/>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "style_attr_end_with_nothing",
|
||||
"input": "<div style=\"color: blue\" />",
|
||||
"output": "<div style='color: blue;'/>",
|
||||
"xhtml": "<div style='color: blue;'></div>",
|
||||
"rexml": "<div style='color: blue;'></div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "style_attr_end_with_space",
|
||||
"input": "<div style=\"color: blue \" />",
|
||||
"output": "<div style='color: blue ;'/>",
|
||||
"xhtml": "<div style='color: blue ;'></div>",
|
||||
"rexml": "<div style='color: blue ;'></div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "style_attr_end_with_semicolon",
|
||||
"input": "<div style=\"color: blue;\" />",
|
||||
"output": "<div style='color: blue;'/>",
|
||||
"xhtml": "<div style='color: blue;'></div>",
|
||||
"rexml": "<div style='color: blue;'></div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "style_attr_end_with_semicolon_space",
|
||||
"input": "<div style=\"color: blue; \" />",
|
||||
"output": "<div style='color: blue;'/>",
|
||||
"xhtml": "<div style='color: blue;'></div>",
|
||||
"rexml": "<div style='color: blue;'></div>"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "attributes_with_embedded_quotes",
|
||||
"input": "<img src=doesntexist.jpg\"'onerror=\"alert(1) />",
|
||||
"output": "<img src='doesntexist.jpg"'onerror="alert(1)'/>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
},
|
||||
|
||||
{
|
||||
"name": "attributes_with_embedded_quotes_II",
|
||||
"input": "<img src=notthere.jpg\"\"onerror=\"alert(2) />",
|
||||
"output": "<img src='notthere.jpg""onerror="alert(2)'/>",
|
||||
"rexml": "Ill-formed XHTML!"
|
||||
}
|
||||
]
|
125
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/core.test
vendored
Normal file
125
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/core.test
vendored
Normal file
|
@ -0,0 +1,125 @@
|
|||
{"tests": [
|
||||
|
||||
{"description": "proper attribute value escaping",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "test \"with\" ""}]]],
|
||||
"expected": ["<span title='test \"with\" &quot;'>"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value non-quoting",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo"}]]],
|
||||
"expected": ["<span title=foo>"],
|
||||
"xhtml": ["<span title=\"foo\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value non-quoting (with <)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo<bar"}]]],
|
||||
"expected": ["<span title=foo<bar>"],
|
||||
"xhtml": ["<span title=\"foo<bar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with =)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo=bar"}]]],
|
||||
"expected": ["<span title=\"foo=bar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with >)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo>bar"}]]],
|
||||
"expected": ["<span title=\"foo>bar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with \")",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo\"bar"}]]],
|
||||
"expected": ["<span title='foo\"bar'>"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with ')",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo'bar"}]]],
|
||||
"expected": ["<span title=\"foo'bar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with both \" and ')",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo'bar\"baz"}]]],
|
||||
"expected": ["<span title=\"foo'bar"baz\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with space)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo bar"}]]],
|
||||
"expected": ["<span title=\"foo bar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with tab)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo\tbar"}]]],
|
||||
"expected": ["<span title=\"foo\tbar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with LF)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo\nbar"}]]],
|
||||
"expected": ["<span title=\"foo\nbar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with CR)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo\rbar"}]]],
|
||||
"expected": ["<span title=\"foo\rbar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value non-quoting (with linetab)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo\u000Bbar"}]]],
|
||||
"expected": ["<span title=foo\u000Bbar>"],
|
||||
"xhtml": ["<span title=\"foo\u000Bbar\">"]
|
||||
},
|
||||
|
||||
{"description": "proper attribute value quoting (with form feed)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "foo\u000Cbar"}]]],
|
||||
"expected": ["<span title=\"foo\u000Cbar\">"]
|
||||
},
|
||||
|
||||
{"description": "void element (as EmptyTag token)",
|
||||
"input": [["EmptyTag", "img", {}]],
|
||||
"expected": ["<img>"],
|
||||
"xhtml": ["<img />"]
|
||||
},
|
||||
|
||||
{"description": "void element (as StartTag token)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "img", {}]],
|
||||
"expected": ["<img>"],
|
||||
"xhtml": ["<img />"]
|
||||
},
|
||||
|
||||
{"description": "doctype in error",
|
||||
"input": [["Doctype", "foo"]],
|
||||
"expected": ["<!DOCTYPE foo>"]
|
||||
},
|
||||
|
||||
{"description": "character data",
|
||||
"options": {"encoding":"utf-8"},
|
||||
"input": [["Characters", "a<b>c&d"]],
|
||||
"expected": ["a<b>c&d"]
|
||||
},
|
||||
|
||||
{"description": "rcdata",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "script", {}], ["Characters", "a<b>c&d"]],
|
||||
"expected": ["<script>a<b>c&d"],
|
||||
"xhtml": ["<script>a<b>c&d"]
|
||||
},
|
||||
|
||||
{"description": "doctype",
|
||||
"input": [["Doctype", "HTML"]],
|
||||
"expected": ["<!DOCTYPE HTML>"]
|
||||
},
|
||||
|
||||
{"description": "HTML 4.01 DOCTYPE",
|
||||
"input": [["Doctype", "HTML", "-//W3C//DTD HTML 4.01//EN", "http://www.w3.org/TR/html4/strict.dtd"]],
|
||||
"expected": ["<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">"]
|
||||
},
|
||||
|
||||
{"description": "HTML 4.01 DOCTYPE without system identifer",
|
||||
"input": [["Doctype", "HTML", "-//W3C//DTD HTML 4.01//EN"]],
|
||||
"expected": ["<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\">"]
|
||||
},
|
||||
|
||||
{"description": "IBM DOCTYPE without public identifer",
|
||||
"input": [["Doctype", "html", "", "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd"]],
|
||||
"expected": ["<!DOCTYPE html SYSTEM \"http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd\">"]
|
||||
}
|
||||
|
||||
]}
|
66
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/injectmeta.test
vendored
Normal file
66
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/injectmeta.test
vendored
Normal file
|
@ -0,0 +1,66 @@
|
|||
{"tests": [
|
||||
|
||||
{"description": "no encoding",
|
||||
"options": {"inject_meta_charset": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": [""],
|
||||
"xhtml": ["<head></head>"]
|
||||
},
|
||||
|
||||
{"description": "empytag head",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta charset=utf-8>"],
|
||||
"xhtml": ["<head><meta charset=\"utf-8\" /></head>"]
|
||||
},
|
||||
|
||||
{"description": "head w/title",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["StartTag", "http://www.w3.org/1999/xhtml","title",{}], ["Characters", "foo"],["EndTag", "http://www.w3.org/1999/xhtml", "title"], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta charset=utf-8><title>foo</title>"],
|
||||
"xhtml": ["<head><meta charset=\"utf-8\" /><title>foo</title></head>"]
|
||||
},
|
||||
|
||||
{"description": "head w/meta-charset",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EmptyTag","meta",[{"namespace": null, "name": "charset", "value": "ascii"}]], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta charset=utf-8>"],
|
||||
"xhtml": ["<head><meta charset=\"utf-8\" /></head>"]
|
||||
},
|
||||
|
||||
{"description": "head w/ two meta-charset",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EmptyTag","meta",[{"namespace": null, "name": "charset", "value": "ascii"}]], ["EmptyTag","meta",[{"namespace": null, "name": "charset", "value": "ascii"}]], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta charset=utf-8><meta charset=utf-8>", "<head><meta charset=utf-8><meta charset=ascii>"],
|
||||
"xhtml": ["<head><meta charset=\"utf-8\" /><meta charset=\"utf-8\" /></head>", "<head><meta charset=\"utf-8\" /><meta charset=\"ascii\" /></head>"]
|
||||
},
|
||||
|
||||
{"description": "head w/robots",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EmptyTag","meta",[{"namespace": null, "name": "name", "value": "robots"},{"namespace": null, "name": "content", "value": "noindex"}]], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta charset=utf-8><meta content=noindex name=robots>"],
|
||||
"xhtml": ["<head><meta charset=\"utf-8\" /><meta content=\"noindex\" name=\"robots\" /></head>"]
|
||||
},
|
||||
|
||||
{"description": "head w/robots & charset",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EmptyTag","meta",[{"namespace": null, "name": "name", "value": "robots"},{"namespace": null, "name": "content", "value": "noindex"}]], ["EmptyTag","meta",[{"namespace": null, "name": "charset", "value": "ascii"}]], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta content=noindex name=robots><meta charset=utf-8>"],
|
||||
"xhtml": ["<head><meta content=\"noindex\" name=\"robots\" /><meta charset=\"utf-8\" /></head>"]
|
||||
},
|
||||
|
||||
{"description": "head w/ charset in http-equiv content-type",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EmptyTag","meta",[{"namespace": null, "name": "http-equiv", "value": "content-type"}, {"namespace": null, "name": "content", "value": "text/html; charset=ascii"}]], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta content=\"text/html; charset=utf-8\" http-equiv=content-type>"],
|
||||
"xhtml": ["<head><meta content=\"text/html; charset=utf-8\" http-equiv=\"content-type\" /></head>"]
|
||||
},
|
||||
|
||||
{"description": "head w/robots & charset in http-equiv content-type",
|
||||
"options": {"inject_meta_charset": true, "encoding":"utf-8"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EmptyTag","meta",[{"namespace": null, "name": "name", "value": "robots"},{"namespace": null, "name": "content", "value": "noindex"}]], ["EmptyTag","meta",[{"namespace": null, "name": "http-equiv", "value": "content-type"}, {"namespace": null, "name": "content", "value": "text/html; charset=ascii"}]], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": ["<meta content=noindex name=robots><meta content=\"text/html; charset=utf-8\" http-equiv=content-type>"],
|
||||
"xhtml": ["<head><meta content=\"noindex\" name=\"robots\" /><meta content=\"text/html; charset=utf-8\" http-equiv=\"content-type\" /></head>"]
|
||||
}
|
||||
|
||||
]}
|
965
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/optionaltags.test
vendored
Normal file
965
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/optionaltags.test
vendored
Normal file
|
@ -0,0 +1,965 @@
|
|||
{"tests": [
|
||||
|
||||
{"description": "html start-tag followed by text, with attributes",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "html", [{"namespace": null, "name": "lang", "value": "en"}]], ["Characters", "foo"]],
|
||||
"expected": ["<html lang=en>foo"]
|
||||
},
|
||||
|
||||
|
||||
|
||||
{"description": "html start-tag followed by comment",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "html", {}], ["Comment", "foo"]],
|
||||
"expected": ["<html><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "html start-tag followed by space character",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "html", {}], ["Characters", " foo"]],
|
||||
"expected": ["<html> foo"]
|
||||
},
|
||||
|
||||
{"description": "html start-tag followed by text",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "html", {}], ["Characters", "foo"]],
|
||||
"expected": ["foo"]
|
||||
},
|
||||
|
||||
{"description": "html start-tag followed by start-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "html", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "html start-tag followed by end-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "html", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "html start-tag at EOF (shouldn't ever happen?!)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "html", {}]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
{"description": "html end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "html"], ["Comment", "foo"]],
|
||||
"expected": ["</html><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "html end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "html"], ["Characters", " foo"]],
|
||||
"expected": ["</html> foo"]
|
||||
},
|
||||
|
||||
{"description": "html end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "html"], ["Characters", "foo"]],
|
||||
"expected": ["foo"]
|
||||
},
|
||||
|
||||
{"description": "html end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "html"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "html end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "html"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "html end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "html"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "head start-tag followed by comment",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["Comment", "foo"]],
|
||||
"expected": ["<head><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "head start-tag followed by space character",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["Characters", " foo"]],
|
||||
"expected": ["<head> foo"]
|
||||
},
|
||||
|
||||
{"description": "head start-tag followed by text",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["Characters", "foo"]],
|
||||
"expected": ["<head>foo"]
|
||||
},
|
||||
|
||||
{"description": "head start-tag followed by start-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "head start-tag followed by end-tag (shouldn't ever happen?!)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["<head></foo>", "</foo>"]
|
||||
},
|
||||
|
||||
{"description": "empty head element",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
{"description": "head start-tag followed by empty-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}], ["EmptyTag", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "head start-tag at EOF (shouldn't ever happen?!)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "head", {}]],
|
||||
"expected": ["<head>", ""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
{"description": "head end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "head"], ["Comment", "foo"]],
|
||||
"expected": ["</head><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "head end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "head"], ["Characters", " foo"]],
|
||||
"expected": ["</head> foo"]
|
||||
},
|
||||
|
||||
{"description": "head end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "head"], ["Characters", "foo"]],
|
||||
"expected": ["foo"]
|
||||
},
|
||||
|
||||
{"description": "head end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "head"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "head end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "head"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "head end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "head"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "body start-tag followed by comment",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "body", {}], ["Comment", "foo"]],
|
||||
"expected": ["<body><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "body start-tag followed by space character",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "body", {}], ["Characters", " foo"]],
|
||||
"expected": ["<body> foo"]
|
||||
},
|
||||
|
||||
{"description": "body start-tag followed by text",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "body", {}], ["Characters", "foo"]],
|
||||
"expected": ["foo"]
|
||||
},
|
||||
|
||||
{"description": "body start-tag followed by start-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "body", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "body start-tag followed by end-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "body", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "body start-tag at EOF (shouldn't ever happen?!)",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "body", {}]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
{"description": "body end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "body"], ["Comment", "foo"]],
|
||||
"expected": ["</body><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "body end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "body"], ["Characters", " foo"]],
|
||||
"expected": ["</body> foo"]
|
||||
},
|
||||
|
||||
{"description": "body end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "body"], ["Characters", "foo"]],
|
||||
"expected": ["foo"]
|
||||
},
|
||||
|
||||
{"description": "body end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "body"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "body end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "body"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "body end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "body"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "li end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "li"], ["Comment", "foo"]],
|
||||
"expected": ["</li><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "li end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "li"], ["Characters", " foo"]],
|
||||
"expected": ["</li> foo"]
|
||||
},
|
||||
|
||||
{"description": "li end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "li"], ["Characters", "foo"]],
|
||||
"expected": ["</li>foo"]
|
||||
},
|
||||
|
||||
{"description": "li end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "li"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</li><foo>"]
|
||||
},
|
||||
|
||||
{"description": "li end-tag followed by li start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "li"], ["StartTag", "http://www.w3.org/1999/xhtml", "li", {}]],
|
||||
"expected": ["<li>"]
|
||||
},
|
||||
|
||||
{"description": "li end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "li"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "li end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "li"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "dt end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"], ["Comment", "foo"]],
|
||||
"expected": ["</dt><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "dt end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"], ["Characters", " foo"]],
|
||||
"expected": ["</dt> foo"]
|
||||
},
|
||||
|
||||
{"description": "dt end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"], ["Characters", "foo"]],
|
||||
"expected": ["</dt>foo"]
|
||||
},
|
||||
|
||||
{"description": "dt end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</dt><foo>"]
|
||||
},
|
||||
|
||||
{"description": "dt end-tag followed by dt start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"], ["StartTag", "http://www.w3.org/1999/xhtml", "dt", {}]],
|
||||
"expected": ["<dt>"]
|
||||
},
|
||||
|
||||
{"description": "dt end-tag followed by dd start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"], ["StartTag", "http://www.w3.org/1999/xhtml", "dd", {}]],
|
||||
"expected": ["<dd>"]
|
||||
},
|
||||
|
||||
{"description": "dt end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</dt></foo>"]
|
||||
},
|
||||
|
||||
{"description": "dt end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dt"]],
|
||||
"expected": ["</dt>"]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "dd end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"], ["Comment", "foo"]],
|
||||
"expected": ["</dd><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "dd end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"], ["Characters", " foo"]],
|
||||
"expected": ["</dd> foo"]
|
||||
},
|
||||
|
||||
{"description": "dd end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"], ["Characters", "foo"]],
|
||||
"expected": ["</dd>foo"]
|
||||
},
|
||||
|
||||
{"description": "dd end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</dd><foo>"]
|
||||
},
|
||||
|
||||
{"description": "dd end-tag followed by dd start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"], ["StartTag", "http://www.w3.org/1999/xhtml", "dd", {}]],
|
||||
"expected": ["<dd>"]
|
||||
},
|
||||
|
||||
{"description": "dd end-tag followed by dt start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"], ["StartTag", "http://www.w3.org/1999/xhtml", "dt", {}]],
|
||||
"expected": ["<dt>"]
|
||||
},
|
||||
|
||||
{"description": "dd end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "dd end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "dd"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "p end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["Comment", "foo"]],
|
||||
"expected": ["</p><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["Characters", " foo"]],
|
||||
"expected": ["</p> foo"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["Characters", "foo"]],
|
||||
"expected": ["</p>foo"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</p><foo>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by address start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "address", {}]],
|
||||
"expected": ["<address>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by article start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "article", {}]],
|
||||
"expected": ["<article>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by aside start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "aside", {}]],
|
||||
"expected": ["<aside>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by blockquote start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "blockquote", {}]],
|
||||
"expected": ["<blockquote>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by datagrid start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "datagrid", {}]],
|
||||
"expected": ["<datagrid>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by dialog start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "dialog", {}]],
|
||||
"expected": ["<dialog>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by dir start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "dir", {}]],
|
||||
"expected": ["<dir>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by div start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "div", {}]],
|
||||
"expected": ["<div>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by dl start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "dl", {}]],
|
||||
"expected": ["<dl>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by fieldset start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "fieldset", {}]],
|
||||
"expected": ["<fieldset>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by footer start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "footer", {}]],
|
||||
"expected": ["<footer>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by form start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "form", {}]],
|
||||
"expected": ["<form>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by h1 start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "h1", {}]],
|
||||
"expected": ["<h1>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by h2 start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "h2", {}]],
|
||||
"expected": ["<h2>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by h3 start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "h3", {}]],
|
||||
"expected": ["<h3>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by h4 start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "h4", {}]],
|
||||
"expected": ["<h4>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by h5 start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "h5", {}]],
|
||||
"expected": ["<h5>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by h6 start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "h6", {}]],
|
||||
"expected": ["<h6>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by header start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "header", {}]],
|
||||
"expected": ["<header>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by hr empty-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["EmptyTag", "hr", {}]],
|
||||
"expected": ["<hr>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by menu start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "menu", {}]],
|
||||
"expected": ["<menu>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by nav start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "nav", {}]],
|
||||
"expected": ["<nav>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by ol start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "ol", {}]],
|
||||
"expected": ["<ol>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by p start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "p", {}]],
|
||||
"expected": ["<p>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by pre start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "pre", {}]],
|
||||
"expected": ["<pre>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by section start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "section", {}]],
|
||||
"expected": ["<section>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by table start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "table", {}]],
|
||||
"expected": ["<table>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by ul start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["StartTag", "http://www.w3.org/1999/xhtml", "ul", {}]],
|
||||
"expected": ["<ul>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "p end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "p"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "optgroup end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "optgroup"], ["Comment", "foo"]],
|
||||
"expected": ["</optgroup><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "optgroup end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "optgroup"], ["Characters", " foo"]],
|
||||
"expected": ["</optgroup> foo"]
|
||||
},
|
||||
|
||||
{"description": "optgroup end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "optgroup"], ["Characters", "foo"]],
|
||||
"expected": ["</optgroup>foo"]
|
||||
},
|
||||
|
||||
{"description": "optgroup end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "optgroup"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</optgroup><foo>"]
|
||||
},
|
||||
|
||||
{"description": "optgroup end-tag followed by optgroup start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "optgroup"], ["StartTag", "http://www.w3.org/1999/xhtml", "optgroup", {}]],
|
||||
"expected": ["<optgroup>"]
|
||||
},
|
||||
|
||||
{"description": "optgroup end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "optgroup"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "optgroup end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "optgroup"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "option end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"], ["Comment", "foo"]],
|
||||
"expected": ["</option><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "option end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"], ["Characters", " foo"]],
|
||||
"expected": ["</option> foo"]
|
||||
},
|
||||
|
||||
{"description": "option end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"], ["Characters", "foo"]],
|
||||
"expected": ["</option>foo"]
|
||||
},
|
||||
|
||||
{"description": "option end-tag followed by optgroup start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"], ["StartTag", "http://www.w3.org/1999/xhtml", "optgroup", {}]],
|
||||
"expected": ["<optgroup>"]
|
||||
},
|
||||
|
||||
{"description": "option end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</option><foo>"]
|
||||
},
|
||||
|
||||
{"description": "option end-tag followed by option start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"], ["StartTag", "http://www.w3.org/1999/xhtml", "option", {}]],
|
||||
"expected": ["<option>"]
|
||||
},
|
||||
|
||||
{"description": "option end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "option end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "option"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "colgroup start-tag followed by comment",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}], ["Comment", "foo"]],
|
||||
"expected": ["<colgroup><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "colgroup start-tag followed by space character",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}], ["Characters", " foo"]],
|
||||
"expected": ["<colgroup> foo"]
|
||||
},
|
||||
|
||||
{"description": "colgroup start-tag followed by text",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}], ["Characters", "foo"]],
|
||||
"expected": ["<colgroup>foo"]
|
||||
},
|
||||
|
||||
{"description": "colgroup start-tag followed by start-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<colgroup><foo>"]
|
||||
},
|
||||
|
||||
{"description": "first colgroup in a table with a col child",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "table", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}], ["EmptyTag", "col", {}]],
|
||||
"expected": ["<table><col>"]
|
||||
},
|
||||
|
||||
{"description": "colgroup with a col child, following another colgroup",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "colgroup"], ["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "col", {}]],
|
||||
"expected": ["</colgroup><col>", "<colgroup><col>"]
|
||||
},
|
||||
|
||||
{"description": "colgroup start-tag followed by end-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["<colgroup></foo>"]
|
||||
},
|
||||
|
||||
{"description": "colgroup start-tag at EOF",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "colgroup", {}]],
|
||||
"expected": ["<colgroup>"]
|
||||
},
|
||||
|
||||
|
||||
|
||||
{"description": "colgroup end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "colgroup"], ["Comment", "foo"]],
|
||||
"expected": ["</colgroup><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "colgroup end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "colgroup"], ["Characters", " foo"]],
|
||||
"expected": ["</colgroup> foo"]
|
||||
},
|
||||
|
||||
{"description": "colgroup end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "colgroup"], ["Characters", "foo"]],
|
||||
"expected": ["foo"]
|
||||
},
|
||||
|
||||
{"description": "colgroup end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "colgroup"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<foo>"]
|
||||
},
|
||||
|
||||
{"description": "colgroup end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "colgroup"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "colgroup end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "colgroup"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "thead end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["Comment", "foo"]],
|
||||
"expected": ["</thead><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "thead end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["Characters", " foo"]],
|
||||
"expected": ["</thead> foo"]
|
||||
},
|
||||
|
||||
{"description": "thead end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["Characters", "foo"]],
|
||||
"expected": ["</thead>foo"]
|
||||
},
|
||||
|
||||
{"description": "thead end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</thead><foo>"]
|
||||
},
|
||||
|
||||
{"description": "thead end-tag followed by tbody start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}]],
|
||||
"expected": ["<tbody>"]
|
||||
},
|
||||
|
||||
{"description": "thead end-tag followed by tfoot start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["StartTag", "http://www.w3.org/1999/xhtml", "tfoot", {}]],
|
||||
"expected": ["<tfoot>"]
|
||||
},
|
||||
|
||||
{"description": "thead end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</thead></foo>"]
|
||||
},
|
||||
|
||||
{"description": "thead end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"]],
|
||||
"expected": ["</thead>"]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "tbody start-tag followed by comment",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["Comment", "foo"]],
|
||||
"expected": ["<tbody><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "tbody start-tag followed by space character",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["Characters", " foo"]],
|
||||
"expected": ["<tbody> foo"]
|
||||
},
|
||||
|
||||
{"description": "tbody start-tag followed by text",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["Characters", "foo"]],
|
||||
"expected": ["<tbody>foo"]
|
||||
},
|
||||
|
||||
{"description": "tbody start-tag followed by start-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["<tbody><foo>"]
|
||||
},
|
||||
|
||||
{"description": "first tbody in a table with a tr child",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "table", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "tr", {}]],
|
||||
"expected": ["<table><tr>"]
|
||||
},
|
||||
|
||||
{"description": "tbody with a tr child, following another tbody",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "tr", {}]],
|
||||
"expected": ["<tbody><tr>", "</tbody><tr>"]
|
||||
},
|
||||
|
||||
{"description": "tbody with a tr child, following a thead",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "thead"], ["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "tr", {}]],
|
||||
"expected": ["<tbody><tr>", "</thead><tr>"]
|
||||
},
|
||||
|
||||
{"description": "tbody with a tr child, following a tfoot",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"], ["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["StartTag", "http://www.w3.org/1999/xhtml", "tr", {}]],
|
||||
"expected": ["<tbody><tr>", "</tfoot><tr>"]
|
||||
},
|
||||
|
||||
{"description": "tbody start-tag followed by end-tag",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["<tbody></foo>"]
|
||||
},
|
||||
|
||||
{"description": "tbody start-tag at EOF",
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}]],
|
||||
"expected": ["<tbody>"]
|
||||
},
|
||||
|
||||
|
||||
|
||||
{"description": "tbody end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["Comment", "foo"]],
|
||||
"expected": ["</tbody><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "tbody end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["Characters", " foo"]],
|
||||
"expected": ["</tbody> foo"]
|
||||
},
|
||||
|
||||
{"description": "tbody end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["Characters", "foo"]],
|
||||
"expected": ["</tbody>foo"]
|
||||
},
|
||||
|
||||
{"description": "tbody end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</tbody><foo>"]
|
||||
},
|
||||
|
||||
{"description": "tbody end-tag followed by tbody start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}]],
|
||||
"expected": ["<tbody>", "</tbody>"]
|
||||
},
|
||||
|
||||
{"description": "tbody end-tag followed by tfoot start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["StartTag", "http://www.w3.org/1999/xhtml", "tfoot", {}]],
|
||||
"expected": ["<tfoot>"]
|
||||
},
|
||||
|
||||
{"description": "tbody end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "tbody end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tbody"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "tfoot end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"], ["Comment", "foo"]],
|
||||
"expected": ["</tfoot><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "tfoot end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"], ["Characters", " foo"]],
|
||||
"expected": ["</tfoot> foo"]
|
||||
},
|
||||
|
||||
{"description": "tfoot end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"], ["Characters", "foo"]],
|
||||
"expected": ["</tfoot>foo"]
|
||||
},
|
||||
|
||||
{"description": "tfoot end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</tfoot><foo>"]
|
||||
},
|
||||
|
||||
{"description": "tfoot end-tag followed by tbody start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"], ["StartTag", "http://www.w3.org/1999/xhtml", "tbody", {}]],
|
||||
"expected": ["<tbody>", "</tfoot>"]
|
||||
},
|
||||
|
||||
{"description": "tfoot end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "tfoot end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tfoot"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "tr end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tr"], ["Comment", "foo"]],
|
||||
"expected": ["</tr><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "tr end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tr"], ["Characters", " foo"]],
|
||||
"expected": ["</tr> foo"]
|
||||
},
|
||||
|
||||
{"description": "tr end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tr"], ["Characters", "foo"]],
|
||||
"expected": ["</tr>foo"]
|
||||
},
|
||||
|
||||
{"description": "tr end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tr"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</tr><foo>"]
|
||||
},
|
||||
|
||||
{"description": "tr end-tag followed by tr start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tr"], ["StartTag", "http://www.w3.org/1999/xhtml", "tr", {}]],
|
||||
"expected": ["<tr>", "</tr>"]
|
||||
},
|
||||
|
||||
{"description": "tr end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tr"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "tr end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "tr"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "td end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"], ["Comment", "foo"]],
|
||||
"expected": ["</td><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "td end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"], ["Characters", " foo"]],
|
||||
"expected": ["</td> foo"]
|
||||
},
|
||||
|
||||
{"description": "td end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"], ["Characters", "foo"]],
|
||||
"expected": ["</td>foo"]
|
||||
},
|
||||
|
||||
{"description": "td end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</td><foo>"]
|
||||
},
|
||||
|
||||
{"description": "td end-tag followed by td start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"], ["StartTag", "http://www.w3.org/1999/xhtml", "td", {}]],
|
||||
"expected": ["<td>", "</td>"]
|
||||
},
|
||||
|
||||
{"description": "td end-tag followed by th start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"], ["StartTag", "http://www.w3.org/1999/xhtml", "th", {}]],
|
||||
"expected": ["<th>", "</td>"]
|
||||
},
|
||||
|
||||
{"description": "td end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "td end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "td"]],
|
||||
"expected": [""]
|
||||
},
|
||||
|
||||
|
||||
|
||||
|
||||
{"description": "th end-tag followed by comment",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "th"], ["Comment", "foo"]],
|
||||
"expected": ["</th><!--foo-->"]
|
||||
},
|
||||
|
||||
{"description": "th end-tag followed by space character",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "th"], ["Characters", " foo"]],
|
||||
"expected": ["</th> foo"]
|
||||
},
|
||||
|
||||
{"description": "th end-tag followed by text",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "th"], ["Characters", "foo"]],
|
||||
"expected": ["</th>foo"]
|
||||
},
|
||||
|
||||
{"description": "th end-tag followed by start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "th"], ["StartTag", "http://www.w3.org/1999/xhtml", "foo", {}]],
|
||||
"expected": ["</th><foo>"]
|
||||
},
|
||||
|
||||
{"description": "th end-tag followed by th start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "th"], ["StartTag", "http://www.w3.org/1999/xhtml", "th", {}]],
|
||||
"expected": ["<th>", "</th>"]
|
||||
},
|
||||
|
||||
{"description": "th end-tag followed by td start-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "th"], ["StartTag", "http://www.w3.org/1999/xhtml", "td", {}]],
|
||||
"expected": ["<td>", "</th>"]
|
||||
},
|
||||
|
||||
{"description": "th end-tag followed by end-tag",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml", "th"], ["EndTag", "http://www.w3.org/1999/xhtml", "foo"]],
|
||||
"expected": ["</foo>"]
|
||||
},
|
||||
|
||||
{"description": "th end-tag at EOF",
|
||||
"input": [["EndTag", "http://www.w3.org/1999/xhtml" , "th"]],
|
||||
"expected": [""]
|
||||
}
|
||||
|
||||
]}
|
60
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/options.test
vendored
Normal file
60
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/options.test
vendored
Normal file
|
@ -0,0 +1,60 @@
|
|||
{"tests":[
|
||||
|
||||
{"description": "quote_char=\"'\"",
|
||||
"options": {"quote_char": "'"},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "span", [{"namespace": null, "name": "title", "value": "test 'with' quote_char"}]]],
|
||||
"expected": ["<span title='test 'with' quote_char'>"]
|
||||
},
|
||||
|
||||
{"description": "quote_attr_values=true",
|
||||
"options": {"quote_attr_values": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "button", [{"namespace": null, "name": "disabled", "value" :"disabled"}]]],
|
||||
"expected": ["<button disabled>"],
|
||||
"xhtml": ["<button disabled=\"disabled\">"]
|
||||
},
|
||||
|
||||
{"description": "quote_attr_values=true with irrelevant",
|
||||
"options": {"quote_attr_values": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "div", [{"namespace": null, "name": "irrelevant", "value" :"irrelevant"}]]],
|
||||
"expected": ["<div irrelevant>"],
|
||||
"xhtml": ["<div irrelevant=\"irrelevant\">"]
|
||||
},
|
||||
|
||||
{"description": "use_trailing_solidus=true with void element",
|
||||
"options": {"use_trailing_solidus": true},
|
||||
"input": [["EmptyTag", "img", {}]],
|
||||
"expected": ["<img />"]
|
||||
},
|
||||
|
||||
{"description": "use_trailing_solidus=true with non-void element",
|
||||
"options": {"use_trailing_solidus": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "div", {}]],
|
||||
"expected": ["<div>"]
|
||||
},
|
||||
|
||||
{"description": "minimize_boolean_attributes=false",
|
||||
"options": {"minimize_boolean_attributes": false},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "div", [{"namespace": null, "name": "irrelevant", "value" :"irrelevant"}]]],
|
||||
"expected": ["<div irrelevant=irrelevant>"],
|
||||
"xhtml": ["<div irrelevant=\"irrelevant\">"]
|
||||
},
|
||||
|
||||
{"description": "minimize_boolean_attributes=false with empty value",
|
||||
"options": {"minimize_boolean_attributes": false},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "div", [{"namespace": null, "name": "irrelevant", "value" :""}]]],
|
||||
"expected": ["<div irrelevant=\"\">"]
|
||||
},
|
||||
|
||||
{"description": "escape less than signs in attribute values",
|
||||
"options": {"escape_lt_in_attrs": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "a", [{"namespace": null, "name": "title", "value": "a<b>c&d"}]]],
|
||||
"expected": ["<a title=\"a<b>c&d\">"]
|
||||
},
|
||||
|
||||
{"description": "rcdata",
|
||||
"options": {"escape_rcdata": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "script", {}], ["Characters", "a<b>c&d"]],
|
||||
"expected": ["<script>a<b>c&d"]
|
||||
}
|
||||
|
||||
]}
|
51
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/whitespace.test
vendored
Normal file
51
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/serializer/whitespace.test
vendored
Normal file
|
@ -0,0 +1,51 @@
|
|||
{"tests": [
|
||||
|
||||
{"description": "bare text with leading spaces",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["Characters", "\t\r\n\u000C foo"]],
|
||||
"expected": [" foo"]
|
||||
},
|
||||
|
||||
{"description": "bare text with trailing spaces",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["Characters", "foo \t\r\n\u000C"]],
|
||||
"expected": ["foo "]
|
||||
},
|
||||
|
||||
{"description": "bare text with inner spaces",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["Characters", "foo \t\r\n\u000C bar"]],
|
||||
"expected": ["foo bar"]
|
||||
},
|
||||
|
||||
{"description": "text within <pre>",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "pre", {}], ["Characters", "\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C"], ["EndTag", "http://www.w3.org/1999/xhtml", "pre"]],
|
||||
"expected": ["<pre>\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C</pre>"]
|
||||
},
|
||||
|
||||
{"description": "text within <pre>, with inner markup",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "pre", {}], ["Characters", "\t\r\n\u000C fo"], ["StartTag", "http://www.w3.org/1999/xhtml", "span", {}], ["Characters", "o \t\r\n\u000C b"], ["EndTag", "http://www.w3.org/1999/xhtml", "span"], ["Characters", "ar \t\r\n\u000C"], ["EndTag", "http://www.w3.org/1999/xhtml", "pre"]],
|
||||
"expected": ["<pre>\t\r\n\u000C fo<span>o \t\r\n\u000C b</span>ar \t\r\n\u000C</pre>"]
|
||||
},
|
||||
|
||||
{"description": "text within <textarea>",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "textarea", {}], ["Characters", "\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C"], ["EndTag", "http://www.w3.org/1999/xhtml", "textarea"]],
|
||||
"expected": ["<textarea>\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C</textarea>"]
|
||||
},
|
||||
|
||||
{"description": "text within <script>",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "script", {}], ["Characters", "\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C"], ["EndTag", "http://www.w3.org/1999/xhtml", "script"]],
|
||||
"expected": ["<script>\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C</script>"]
|
||||
},
|
||||
|
||||
{"description": "text within <style>",
|
||||
"options": {"strip_whitespace": true},
|
||||
"input": [["StartTag", "http://www.w3.org/1999/xhtml", "style", {}], ["Characters", "\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C"], ["EndTag", "http://www.w3.org/1999/xhtml", "style"]],
|
||||
"expected": ["<style>\t\r\n\u000C foo \t\r\n\u000C bar \t\r\n\u000C</style>"]
|
||||
}
|
||||
|
||||
]}
|
43
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/sniffer/htmlOrFeed.json
vendored
Normal file
43
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/sniffer/htmlOrFeed.json
vendored
Normal file
|
@ -0,0 +1,43 @@
|
|||
[
|
||||
{"type": "text/html", "input": ""},
|
||||
{"type": "text/html", "input": "<!---->"},
|
||||
{"type": "text/html", "input": "<!--asdfaslkjdf;laksjdf as;dkfjsd-->"},
|
||||
{"type": "text/html", "input": "<!"},
|
||||
{"type": "text/html", "input": "\t"},
|
||||
{"type": "text/html", "input": "<!>"},
|
||||
{"type": "text/html", "input": "<?"},
|
||||
{"type": "text/html", "input": "<??>"},
|
||||
{"type": "application/rss+xml", "input": "<rss"},
|
||||
{"type": "application/atom+xml", "input": "<feed"},
|
||||
{"type": "text/html", "input": "<html"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>302 Found</title>\n</head><body>\n<h1>Found</h1>\n<p>The document has moved <a href=\"http://feeds.feedburner.com/gofug\">here</a>.</p>\n</body></html>\n"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">\r\n<HTML><HEAD>\r\n <link rel=\"stylesheet\" type=\"text/css\" href=\"http://cache.blogads.com/289619328/feed.css\" /><link rel=\"stylesheet\" type=\"text/css\" href=\"http://cache.blogads.com/431602649/feed.css\" />\r\n<link rel=\"stylesheet\" type=\"text/css\" href=\"http://cache.blogads.com/382549546/feed.css\" />\r\n<link rel=\"stylesheet\" type=\"text/css\" href=\"http://cache.blogads.com/314618017/feed.css\" /><META http-equiv=\"expires\" content="},
|
||||
{"type": "text/html", "input": "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\" \"http://www.w3.org/TR/html4/loose.dtd\">\r\n<html>\r\n<head>\r\n<title>Xiaxue - Chicken pie blogger.</title><meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\"><style type=\"text/css\">\r\n<style type=\"text/css\">\r\n<!--\r\nbody {\r\n background-color: #FFF2F2;\r\n}\r\n.style1 {font-family: Georgia, \"Times New Roman\", Times, serif}\r\n.style2 {\r\n color: #8a567c;\r\n font-size: 14px;\r\n font-family: Georgia, \"Times New Roman\", Times, serif;\r\n}\r"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\r\n<head> \r\n<title>Google Operating System</title>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\r\n<meta name=\"Description\" content=\"Unofficial news and tips about Google. A blog that watches Google's latest developments and the attempts to move your operating system online.\" />\r\n<meta name=\"generator\" c"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\r\n<head>\r\n <title>Assimilated Press</title> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\r\n<meta name=\"MSSmartTagsPreventParsing\" content=\"true\" />\r\n<meta name=\"generator\" content=\"Blogger\" />\r\n<link rel=\"alternate\" type=\"application/atom+xml\" title=\"Assimilated Press - Atom\" href=\"http://assimila"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\r\n<head>\r\n <title>PostSecret</title>\r\n<META name=\"keywords\" Content=\"secrets, postcard, secret, postcards, postsecret, postsecrets,online confessional, post secret, post secrets, artomatic, post a secret\"><META name=\"discription\" Content=\"See a Secret...Share a Secret\"> <meta http-equiv=\"Content-Type\" content=\"te"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xmlns='http://www.w3.org/1999/xhtml' xmlns:b='http://www.google.com/2005/gml/b' xmlns:data='http://www.google.com/2005/gml/data' xmlns:expr='http://www.google.com/2005/gml/expr'>\n <head>\n \n <meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>\n <meta content='true' name='MSSmartTagsPreventParsing'/>\n <meta content='blogger' name='generator'/>\n <link rel=\"alternate\" typ"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" dir=\"ltr\" lang=\"ja\">\n<head profile=\"http://gmpg.org/xfn/11\"> \n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" /> \n<title> CMS Lever</title><link rel=\"stylesheet\" type=\"text/css\" media=\"screen\" href=\"http://s.wordpress.com/wp-content/themes/pub/twenty-eight/2813.css\"/>\n<link rel=\"alternate\" type=\"application/rss+xml\" title=\"RSS 2.0\" h"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" dir=\"ltr\" lang=\"en\"><head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n<title> Park Avenue Peerage</title>\t<meta name=\"generator\" content=\"WordPress.com\" />\t<!-- feeds -->\n\t<link rel=\"alternate\" type=\"application/rss+xml\" title=\"RSS 2.0\" href=\"http://parkavenuepeerage.wordpress.com/feed/\" />\t<link rel=\"pingback\" href="},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" dir=\"ltr\" lang=\"ja\"><head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n<title> \u884c\u96f2\u6d41\u6c34 -like a floating clouds and running water-</title>\t<meta name=\"generator\" content=\"WordPress.com\" />\t<!-- feeds -->\n\t<link rel=\"alternate\" type=\"application/rss+xml\" title=\"RSS 2.0\" href=\"http://shw4.wordpress.com/feed/\" />\t<li"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n<meta name=\"generator\" content=\"http://www.typepad.com/\" />\n<title>Go Fug Yourself</title><link rel=\"stylesheet\" href=\"http://gofugyourself.typepad.com/go_fug_yourself/styles.css\" type=\"text/css\" />\n<link rel=\"alternate\" type=\"application/atom+xml\" title=\"Atom\" "},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" dir=\"ltr\" lang=\"en\"><head profile=\"http://gmpg.org/xfn/11\">\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" /><title> Ladies…</title><meta name=\"generator\" content=\"WordPress.com\" /> <!-- leave this for stats --><link rel=\"stylesheet\" href=\"http://s.wordpress.com/wp-content/themes/default/style.css?1\" type=\"tex"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\r\n<head>\r\n <title>The Sartorialist</title> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\r\n<meta name=\"MSSmartTagsPreventParsing\" content=\"true\" />\r\n<meta name=\"generator\" content=\"Blogger\" />\r\n<link rel=\"alternate\" type=\"application/atom+xml\" title=\"The Sartorialist - Atom\" href=\"http://thesartorialist.blogspot"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \n \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\" />\n<meta name=\"generator\" content=\"http://www.typepad.com/\" />\n<title>Creating Passionate Users</title><link rel=\"stylesheet\" href=\"http://headrush.typepad.com/creating_passionate_users/styles.css\" type=\"text/css\" />\n<link rel=\"alternate\" type"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" id=\"sixapart-standard\">\n<head>\n\t<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n\t<meta name=\"generator\" content=\"http://www.typepad.com/\" />\n\t\n\t\n <meta name=\"keywords\" content=\"marketing, blog, seth, ideas, respect, permission\" />\n <meta name=\"description\" content=\"Seth Godin's riffs on marketing, respect, and the "},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\n\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" id=\"sixapart-standard\">\n<head>\n\t<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n\t<meta name=\"generator\" content=\"http://www.typepad.com/\" />\n\t\n\t\n \n <meta name=\"description\" content=\" Western Civilization hangs in the balance. This blog is part of the solution,the cure. Get your heads out of the sand and Fight the G"},
|
||||
{"type": "text/html", "input": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\" dir=\"ltr\" lang=\"en\">\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=pahrefhttpwwwfeedburnercomtarget_blankimgsrchttpwwwfeedburnercomfbimagespubpowered_by_fbgifaltPoweredbyFeedBurnerstyleborder0ap\" />\n<title> From Under the Rotunda</title>\n<link rel=\"stylesheet\" href=\"http://s.wordpress.com/wp-content/themes/pub/andreas04/style.css\" type=\"text/css\""},
|
||||
{"type": "application/atom+xml", "input": "<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href=\"http://www.blogger.com/styles/atom.css\" type=\"text/css\"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-10861780</id><updated>2007-07-27T12:38:50.888-07:00</updated><title type='text'>Official Google Blog</title><link rel='alternate' type='text/html' href='http://googleblog.blogspot.com/'/><link rel='next' type='application/atom+xml' href='http://googleblog.blogs"},
|
||||
{"type": "application/rss+xml", "input": "<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-10861780</atom:id><lastBuildDate>Fri, 27 Jul 2007 19:38:50 +0000</lastBuildDate><title>Official Google Blog</title><description/><link>http://googleblog.blogspot.com/</link><managingEditor>Eric Case</managingEditor><generator>Blogger</generator><openSearch:totalResults>729</openSearch:totalResults><openSearc"},
|
||||
{"type": "application/rss+xml", "input": "<?xml version=\"1.0\" encoding=\"pahrefhttpwwwfeedburnercomtarget_blankimgsrchttpwwwfeedburnercomfbimagespubpowered_by_fbgifaltPoweredbyFeedBurnerstyleborder0ap\"?>\n<!-- generator=\"wordpress/MU\" -->\n<rss version=\"2.0\"\n\txmlns:content=\"http://purl.org/rss/1.0/modules/content/\"\n\txmlns:wfw=\"http://wellformedweb.org/CommentAPI/\"\n\txmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n\t><channel>\n\t<title>From Under the Rotunda</title>\n\t<link>http://dannybernardi.wordpress.com</link>\n\t<description>The Monographs of Danny Ber"},
|
||||
{"type": "application/rss+xml", "input": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<!-- generator=\"wordpress/MU\" -->\n<rss version=\"2.0\"\n\txmlns:content=\"http://purl.org/rss/1.0/modules/content/\"\n\txmlns:wfw=\"http://wellformedweb.org/CommentAPI/\"\n\txmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n\t><channel>\n\t<title>CMS Lever</title>\n\t<link>http://kanaguri.wordpress.com</link>\n\t<description>CMS\u306e\u6c17\u306b\u306a\u3063\u305f\u3053\u3068</description>\n\t<pubDate>Wed, 18 Jul 2007 21:26:22 +0000</pubDate>\n\t<generator>http://wordpress.org/?v=MU</generator>\n\t<language>ja</languag"},
|
||||
{"type": "application/atom+xml", "input": "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<feed xmlns=\"http://www.w3.org/2005/Atom\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:thr=\"http://purl.org/syndication/thread/1.0\">\n <title>Atlas Shrugs</title>\n <link rel=\"self\" type=\"application/atom+xml\" href=\"http://atlasshrugs2000.typepad.com/atlas_shrugs/atom.xml\" />\n <link rel=\"alternate\" type=\"text/html\" href=\"http://atlasshrugs2000.typepad.com/atlas_shrugs/\" />\n <id>tag:typepad.com,2003:weblog-132946</id>\n <updated>2007-08-15T16:07:34-04"},
|
||||
{"type": "application/atom+xml", "input": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/atom10full.xsl\" type=\"text/xsl\" media=\"screen\"?><?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\" type=\"text/css\" media=\"screen\"?><feed xmlns=\"http://www.w3.org/2005/Atom\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:thr=\"http://purl.org/syndication/thread/1.0\" xmlns:feedburner=\"http://rssnamespace.org/feedburner/ext/1.0\">\r\n <title>Creating Passionate Users</title>\r\n "},
|
||||
{"type": "application/atom+xml", "input": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/atom10full.xsl\" type=\"text/xsl\" media=\"screen\"?><?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\" type=\"text/css\" media=\"screen\"?><feed xmlns=\"http://www.w3.org/2005/Atom\" xmlns:feedburner=\"http://rssnamespace.org/feedburner/ext/1.0\">\r\n <title>Seth's Blog</title>\r\n <link rel=\"alternate\" type=\"text/html\" href=\"http://sethgodin.typepad.com/seths_blog/\" />\r\n <link rel=\"s"},
|
||||
{"type": "application/atom+xml", "input": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/atom10full.xsl\" type=\"text/xsl\" media=\"screen\"?><?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\" type=\"text/css\" media=\"screen\"?><feed xmlns=\"http://www.w3.org/2005/Atom\" xmlns:openSearch=\"http://a9.com/-/spec/opensearchrss/1.0/\" xmlns:feedburner=\"http://rssnamespace.org/feedburner/ext/1.0\"><id>tag:blogger.com,1999:blog-32454861</id><updated>2007-07-31T21:44:09.867+02:00</upd"},
|
||||
{"type": "application/atom+xml", "input": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/atomfull.xsl\" type=\"text/xsl\" media=\"screen\"?><?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\" type=\"text/css\" media=\"screen\"?><feed xmlns=\"http://purl.org/atom/ns#\" xmlns:dc=\"http://purl.org/dc/elements/1.1/\" xmlns:feedburner=\"http://rssnamespace.org/feedburner/ext/1.0\" version=\"0.3\">\r\n <title>Go Fug Yourself</title>\r\n <link rel=\"alternate\" type=\"text/html\" href=\"http://go"},
|
||||
{"type": "application/rss+xml", "input": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/rss2full.xsl\" type=\"text/xsl\" media=\"screen\"?><?xml-stylesheet href=\"http://feeds.feedburner.com/~d/styles/itemcontent.css\" type=\"text/css\" media=\"screen\"?><rss xmlns:creativeCommons=\"http://backend.userland.com/creativeCommonsRssModule\" xmlns:feedburner=\"http://rssnamespace.org/feedburner/ext/1.0\" version=\"2.0\"><channel><title>Google Operating System</title><link>http://googlesystem.blogspot.com/</link>"},
|
||||
{"type": "application/rss+xml", "input": "<?xml version=\"1.0\" encoding=\"\"?>\n<!-- generator=\"wordpress/MU\" -->\n<rss version=\"2.0\"\n\txmlns:content=\"http://purl.org/rss/1.0/modules/content/\"\n\txmlns:wfw=\"http://wellformedweb.org/CommentAPI/\"\n\txmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n\t><channel>\n\t<title>Nunublog</title>\n\t<link>http://nunubh.wordpress.com</link>\n\t<description>Just Newbie Blog!</description>\n\t<pubDate>Mon, 09 Jul 2007 18:54:09 +0000</pubDate>\n\t<generator>http://wordpress.org/?v=MU</generator>\n\t<language>id</language>\n\t\t\t<item>\n\t\t<ti"},
|
||||
{"type": "text/html", "input": "<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en\" lang=\"en\">\r\n<HEAD>\r\n<TITLE>Design*Sponge</TITLE><meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\r\n<meta name=\"MSSmartTagsPreventParsing\" content=\"true\" />\r\n<meta name=\"generator\" content=\"Blogger\" />\r\n<link rel=\"alternate\" type=\"application/atom+xml\" title=\"Design*Sponge - Atom\" href=\"http://designsponge.blogspot.com/feeds/posts/default\" />\r\n<link rel=\"alternate\" type=\"application/rss+xml\" title=\"Design*Sponge - RSS\" href="},
|
||||
{"type": "text/html", "input": "<HTML>\n<HEAD>\n<TITLE>Moved Temporarily</TITLE>\n</HEAD>\n<BODY BGCOLOR=\"#FFFFFF\" TEXT=\"#000000\">\n<H1>Moved Temporarily</H1>\nThe document has moved <A HREF=\"http://feeds.feedburner.com/thesecretdiaryofstevejobs\">here</A>.\n</BODY>\n</HTML>\n"}
|
||||
]
|
104
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/README.md
vendored
Normal file
104
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/README.md
vendored
Normal file
|
@ -0,0 +1,104 @@
|
|||
Tokenizer tests
|
||||
===============
|
||||
|
||||
The test format is [JSON](http://www.json.org/). This has the advantage
|
||||
that the syntax allows backward-compatible extensions to the tests and
|
||||
the disadvantage that it is relatively verbose.
|
||||
|
||||
Basic Structure
|
||||
---------------
|
||||
|
||||
{"tests": [
|
||||
{"description": "Test description",
|
||||
"input": "input_string",
|
||||
"output": [expected_output_tokens],
|
||||
"initialStates": [initial_states],
|
||||
"lastStartTag": last_start_tag,
|
||||
"ignoreErrorOrder": ignore_error_order
|
||||
}
|
||||
]}
|
||||
|
||||
Multiple tests per file are allowed simply by adding more objects to the
|
||||
"tests" list.
|
||||
|
||||
`description`, `input` and `output` are always present. The other values
|
||||
are optional.
|
||||
|
||||
### Test set-up
|
||||
|
||||
`test.input` is a string containing the characters to pass to the
|
||||
tokenizer. Specifically, it represents the characters of the **input
|
||||
stream**, and so implementations are expected to perform the processing
|
||||
described in the spec's **Preprocessing the input stream** section
|
||||
before feeding the result to the tokenizer.
|
||||
|
||||
If `test.doubleEscaped` is present and `true`, then `test.input` is not
|
||||
quite as described above. Instead, it must first be subjected to another
|
||||
round of unescaping (i.e., in addition to any unescaping involved in the
|
||||
JSON import), and the result of *that* represents the characters of the
|
||||
input stream. Currently, the only unescaping required by this option is
|
||||
to convert each sequence of the form \\uHHHH (where H is a hex digit)
|
||||
into the corresponding Unicode code point. (Note that this option also
|
||||
affects the interpretation of `test.output`.)
|
||||
|
||||
`test.initialStates` is a list of strings, each being the name of a
|
||||
tokenizer state. The test should be run once for each string, using it
|
||||
to set the tokenizer's initial state for that run. If
|
||||
`test.initialStates` is omitted, it defaults to `["data state"]`.
|
||||
|
||||
`test.lastStartTag` is a lowercase string that should be used as "the
|
||||
tag name of the last start tag to have been emitted from this
|
||||
tokenizer", referenced in the spec's definition of **appropriate end tag
|
||||
token**. If it is omitted, it is treated as if "no start tag has been
|
||||
emitted from this tokenizer".
|
||||
|
||||
### Test results
|
||||
|
||||
`test.output` is a list of tokens, ordered with the first produced by
|
||||
the tokenizer the first (leftmost) in the list. The list must mach the
|
||||
**complete** list of tokens that the tokenizer should produce. Valid
|
||||
tokens are:
|
||||
|
||||
["DOCTYPE", name, public_id, system_id, correctness]
|
||||
["StartTag", name, {attributes}*, true*]
|
||||
["StartTag", name, {attributes}]
|
||||
["EndTag", name]
|
||||
["Comment", data]
|
||||
["Character", data]
|
||||
"ParseError"
|
||||
|
||||
`public_id` and `system_id` are either strings or `null`. `correctness`
|
||||
is either `true` or `false`; `true` corresponds to the force-quirks flag
|
||||
being false, and vice-versa.
|
||||
|
||||
When the self-closing flag is set, the `StartTag` array has `true` as
|
||||
its fourth entry. When the flag is not set, the array has only three
|
||||
entries for backwards compatibility.
|
||||
|
||||
All adjacent character tokens are coalesced into a single
|
||||
`["Character", data]` token.
|
||||
|
||||
If `test.doubleEscaped` is present and `true`, then every string within
|
||||
`test.output` must be further unescaped (as described above) before
|
||||
comparing with the tokenizer's output.
|
||||
|
||||
`test.ignoreErrorOrder` is a boolean value indicating that the order of
|
||||
`ParseError` tokens relative to other tokens in the output stream is
|
||||
unimportant, and implementations should ignore such differences between
|
||||
their output and `expected_output_tokens`. (This is used for errors
|
||||
emitted by the input stream preprocessing stage, since it is useful to
|
||||
test that code but it is undefined when the errors occur). If it is
|
||||
omitted, it defaults to `false`.
|
||||
|
||||
xmlViolation tests
|
||||
------------------
|
||||
|
||||
`tokenizer/xmlViolation.test` differs from the above in a couple of
|
||||
ways:
|
||||
|
||||
- The name of the single member of the top-level JSON object is
|
||||
"xmlViolationTests" instead of "tests".
|
||||
- Each test's expected output assumes that implementation is applying
|
||||
the tweaks given in the spec's "Coercing an HTML DOM into an
|
||||
infoset" section.
|
||||
|
|
@ -0,0 +1,81 @@
|
|||
{"tests": [
|
||||
|
||||
{"description":"PLAINTEXT content model flag",
|
||||
"initialStates":["PLAINTEXT state"],
|
||||
"lastStartTag":"plaintext",
|
||||
"input":"<head>&body;",
|
||||
"output":[["Character", "<head>&body;"]]},
|
||||
|
||||
{"description":"End tag closing RCDATA or RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo</xmp>",
|
||||
"output":[["Character", "foo"], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"End tag closing RCDATA or RAWTEXT (case-insensitivity)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo</xMp>",
|
||||
"output":[["Character", "foo"], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"End tag closing RCDATA or RAWTEXT (ending with space)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo</xmp ",
|
||||
"output":[["Character", "foo"], "ParseError"]},
|
||||
|
||||
{"description":"End tag closing RCDATA or RAWTEXT (ending with EOF)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo</xmp",
|
||||
"output":[["Character", "foo</xmp"]]},
|
||||
|
||||
{"description":"End tag closing RCDATA or RAWTEXT (ending with slash)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo</xmp/",
|
||||
"output":[["Character", "foo"], "ParseError"]},
|
||||
|
||||
{"description":"End tag not closing RCDATA or RAWTEXT (ending with left-angle-bracket)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo</xmp<",
|
||||
"output":[["Character", "foo</xmp<"]]},
|
||||
|
||||
{"description":"End tag with incorrect name in RCDATA or RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</foo>bar</xmp>",
|
||||
"output":[["Character", "</foo>bar"], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"Partial end tags leading straight into partial end tags",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</xmp</xmp</xmp>",
|
||||
"output":[["Character", "</xmp</xmp"], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"End tag with incorrect name in RCDATA or RAWTEXT (starting like correct name)",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</foo>bar</xmpaar>",
|
||||
"output":[["Character", "</foo>bar</xmpaar>"]]},
|
||||
|
||||
{"description":"End tag closing RCDATA or RAWTEXT, switching back to PCDATA",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo</xmp></baz>",
|
||||
"output":[["Character", "foo"], ["EndTag", "xmp"], ["EndTag", "baz"]]},
|
||||
|
||||
{"description":"RAWTEXT w/ something looking like an entity",
|
||||
"initialStates":["RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"&foo;",
|
||||
"output":[["Character", "&foo;"]]},
|
||||
|
||||
{"description":"RCDATA w/ an entity",
|
||||
"initialStates":["RCDATA state"],
|
||||
"lastStartTag":"textarea",
|
||||
"input":"<",
|
||||
"output":[["Character", "<"]]}
|
||||
|
||||
]}
|
96
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/domjs.test
vendored
Normal file
96
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/domjs.test
vendored
Normal file
|
@ -0,0 +1,96 @@
|
|||
{
|
||||
"tests": [
|
||||
{
|
||||
"description":"CR in bogus comment state",
|
||||
"input":"<?\u000d",
|
||||
"output":["ParseError", ["Comment", "?\u000a"]]
|
||||
},
|
||||
{
|
||||
"description":"CRLF in bogus comment state",
|
||||
"input":"<?\u000d\u000a",
|
||||
"output":["ParseError", ["Comment", "?\u000a"]]
|
||||
},
|
||||
{
|
||||
"description":"CRLFLF in bogus comment state",
|
||||
"input":"<?\u000d\u000a\u000a",
|
||||
"output":["ParseError", ["Comment", "?\u000a\u000a"]]
|
||||
},
|
||||
{
|
||||
"description":"NUL in RCDATA and RAWTEXT",
|
||||
"doubleEscaped":true,
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"input":"\\u0000",
|
||||
"output":["ParseError", ["Character", "\\uFFFD"]]
|
||||
},
|
||||
{
|
||||
"description":"leading U+FEFF must pass through",
|
||||
"doubleEscaped":true,
|
||||
"input":"\\uFEFFfoo\\uFEFFbar",
|
||||
"output":[["Character", "\\uFEFFfoo\\uFEFFbar"]]
|
||||
},
|
||||
{
|
||||
"description":"Non BMP-charref in in RCDATA",
|
||||
"initialStates":["RCDATA state"],
|
||||
"input":"≂̸",
|
||||
"output":[["Character", "\u2242\u0338"]]
|
||||
},
|
||||
{
|
||||
"description":"Bad charref in in RCDATA",
|
||||
"initialStates":["RCDATA state"],
|
||||
"input":"&NotEqualTild;",
|
||||
"output":["ParseError", ["Character", "&NotEqualTild;"]]
|
||||
},
|
||||
{
|
||||
"description":"lowercase endtags in RCDATA and RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</XMP>",
|
||||
"output":[["EndTag","xmp"]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag in RCDATA and RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</ XMP>",
|
||||
"output":[["Character","</ XMP>"]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag in RCDATA and RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</xm>",
|
||||
"output":[["Character","</xm>"]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag in RCDATA and RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</xm ",
|
||||
"output":[["Character","</xm "]]
|
||||
},
|
||||
{
|
||||
"description":"bad endtag in RCDATA and RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"</xm/",
|
||||
"output":[["Character","</xm/"]]
|
||||
},
|
||||
{
|
||||
"description":"Non BMP-charref in attribute",
|
||||
"input":"<p id=\"≂̸\">",
|
||||
"output":[["StartTag", "p", {"id":"\u2242\u0338"}]]
|
||||
},
|
||||
{
|
||||
"description":"--!NUL in comment ",
|
||||
"doubleEscaped":true,
|
||||
"input":"<!----!\\u0000-->",
|
||||
"output":["ParseError", "ParseError", ["Comment", "--!\\uFFFD"]]
|
||||
},
|
||||
{
|
||||
"description":"space EOF after doctype ",
|
||||
"input":"<!DOCTYPE html ",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null , false]]
|
||||
}
|
||||
|
||||
]
|
||||
}
|
283
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/entities.test
vendored
Normal file
283
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/entities.test
vendored
Normal file
|
@ -0,0 +1,283 @@
|
|||
{"tests": [
|
||||
|
||||
{"description": "Undefined named entity in attribute value ending in semicolon and whose name starts with a known entity name.",
|
||||
"input":"<h a='¬i;'>",
|
||||
"output": [["StartTag", "h", {"a": "¬i;"}]]},
|
||||
|
||||
{"description": "Entity name followed by the equals sign in an attribute value.",
|
||||
"input":"<h a='&lang='>",
|
||||
"output": [["StartTag", "h", {"a": "&lang="}]]},
|
||||
|
||||
{"description": "CR as numeric entity",
|
||||
"input":"
",
|
||||
"output": ["ParseError", ["Character", "\r"]]},
|
||||
|
||||
{"description": "CR as hexadecimal numeric entity",
|
||||
"input":"
",
|
||||
"output": ["ParseError", ["Character", "\r"]]},
|
||||
|
||||
{"description": "Windows-1252 EURO SIGN numeric entity.",
|
||||
"input":"€",
|
||||
"output": ["ParseError", ["Character", "\u20AC"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u0081"]]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE LOW-9 QUOTATION MARK numeric entity.",
|
||||
"input":"‚",
|
||||
"output": ["ParseError", ["Character", "\u201A"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LETTER F WITH HOOK numeric entity.",
|
||||
"input":"ƒ",
|
||||
"output": ["ParseError", ["Character", "\u0192"]]},
|
||||
|
||||
{"description": "Windows-1252 DOUBLE LOW-9 QUOTATION MARK numeric entity.",
|
||||
"input":"„",
|
||||
"output": ["ParseError", ["Character", "\u201E"]]},
|
||||
|
||||
{"description": "Windows-1252 HORIZONTAL ELLIPSIS numeric entity.",
|
||||
"input":"…",
|
||||
"output": ["ParseError", ["Character", "\u2026"]]},
|
||||
|
||||
{"description": "Windows-1252 DAGGER numeric entity.",
|
||||
"input":"†",
|
||||
"output": ["ParseError", ["Character", "\u2020"]]},
|
||||
|
||||
{"description": "Windows-1252 DOUBLE DAGGER numeric entity.",
|
||||
"input":"‡",
|
||||
"output": ["ParseError", ["Character", "\u2021"]]},
|
||||
|
||||
{"description": "Windows-1252 MODIFIER LETTER CIRCUMFLEX ACCENT numeric entity.",
|
||||
"input":"ˆ",
|
||||
"output": ["ParseError", ["Character", "\u02C6"]]},
|
||||
|
||||
{"description": "Windows-1252 PER MILLE SIGN numeric entity.",
|
||||
"input":"‰",
|
||||
"output": ["ParseError", ["Character", "\u2030"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LETTER S WITH CARON numeric entity.",
|
||||
"input":"Š",
|
||||
"output": ["ParseError", ["Character", "\u0160"]]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE LEFT-POINTING ANGLE QUOTATION MARK numeric entity.",
|
||||
"input":"‹",
|
||||
"output": ["ParseError", ["Character", "\u2039"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LIGATURE OE numeric entity.",
|
||||
"input":"Œ",
|
||||
"output": ["ParseError", ["Character", "\u0152"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u008D"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LETTER Z WITH CARON numeric entity.",
|
||||
"input":"Ž",
|
||||
"output": ["ParseError", ["Character", "\u017D"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u008F"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u0090"]]},
|
||||
|
||||
{"description": "Windows-1252 LEFT SINGLE QUOTATION MARK numeric entity.",
|
||||
"input":"‘",
|
||||
"output": ["ParseError", ["Character", "\u2018"]]},
|
||||
|
||||
{"description": "Windows-1252 RIGHT SINGLE QUOTATION MARK numeric entity.",
|
||||
"input":"’",
|
||||
"output": ["ParseError", ["Character", "\u2019"]]},
|
||||
|
||||
{"description": "Windows-1252 LEFT DOUBLE QUOTATION MARK numeric entity.",
|
||||
"input":"“",
|
||||
"output": ["ParseError", ["Character", "\u201C"]]},
|
||||
|
||||
{"description": "Windows-1252 RIGHT DOUBLE QUOTATION MARK numeric entity.",
|
||||
"input":"”",
|
||||
"output": ["ParseError", ["Character", "\u201D"]]},
|
||||
|
||||
{"description": "Windows-1252 BULLET numeric entity.",
|
||||
"input":"•",
|
||||
"output": ["ParseError", ["Character", "\u2022"]]},
|
||||
|
||||
{"description": "Windows-1252 EN DASH numeric entity.",
|
||||
"input":"–",
|
||||
"output": ["ParseError", ["Character", "\u2013"]]},
|
||||
|
||||
{"description": "Windows-1252 EM DASH numeric entity.",
|
||||
"input":"—",
|
||||
"output": ["ParseError", ["Character", "\u2014"]]},
|
||||
|
||||
{"description": "Windows-1252 SMALL TILDE numeric entity.",
|
||||
"input":"˜",
|
||||
"output": ["ParseError", ["Character", "\u02DC"]]},
|
||||
|
||||
{"description": "Windows-1252 TRADE MARK SIGN numeric entity.",
|
||||
"input":"™",
|
||||
"output": ["ParseError", ["Character", "\u2122"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LETTER S WITH CARON numeric entity.",
|
||||
"input":"š",
|
||||
"output": ["ParseError", ["Character", "\u0161"]]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE RIGHT-POINTING ANGLE QUOTATION MARK numeric entity.",
|
||||
"input":"›",
|
||||
"output": ["ParseError", ["Character", "\u203A"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LIGATURE OE numeric entity.",
|
||||
"input":"œ",
|
||||
"output": ["ParseError", ["Character", "\u0153"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u009D"]]},
|
||||
|
||||
{"description": "Windows-1252 EURO SIGN hexadecimal numeric entity.",
|
||||
"input":"€",
|
||||
"output": ["ParseError", ["Character", "\u20AC"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u0081"]]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE LOW-9 QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"‚",
|
||||
"output": ["ParseError", ["Character", "\u201A"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LETTER F WITH HOOK hexadecimal numeric entity.",
|
||||
"input":"ƒ",
|
||||
"output": ["ParseError", ["Character", "\u0192"]]},
|
||||
|
||||
{"description": "Windows-1252 DOUBLE LOW-9 QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"„",
|
||||
"output": ["ParseError", ["Character", "\u201E"]]},
|
||||
|
||||
{"description": "Windows-1252 HORIZONTAL ELLIPSIS hexadecimal numeric entity.",
|
||||
"input":"…",
|
||||
"output": ["ParseError", ["Character", "\u2026"]]},
|
||||
|
||||
{"description": "Windows-1252 DAGGER hexadecimal numeric entity.",
|
||||
"input":"†",
|
||||
"output": ["ParseError", ["Character", "\u2020"]]},
|
||||
|
||||
{"description": "Windows-1252 DOUBLE DAGGER hexadecimal numeric entity.",
|
||||
"input":"‡",
|
||||
"output": ["ParseError", ["Character", "\u2021"]]},
|
||||
|
||||
{"description": "Windows-1252 MODIFIER LETTER CIRCUMFLEX ACCENT hexadecimal numeric entity.",
|
||||
"input":"ˆ",
|
||||
"output": ["ParseError", ["Character", "\u02C6"]]},
|
||||
|
||||
{"description": "Windows-1252 PER MILLE SIGN hexadecimal numeric entity.",
|
||||
"input":"‰",
|
||||
"output": ["ParseError", ["Character", "\u2030"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LETTER S WITH CARON hexadecimal numeric entity.",
|
||||
"input":"Š",
|
||||
"output": ["ParseError", ["Character", "\u0160"]]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE LEFT-POINTING ANGLE QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"‹",
|
||||
"output": ["ParseError", ["Character", "\u2039"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LIGATURE OE hexadecimal numeric entity.",
|
||||
"input":"Œ",
|
||||
"output": ["ParseError", ["Character", "\u0152"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u008D"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LETTER Z WITH CARON hexadecimal numeric entity.",
|
||||
"input":"Ž",
|
||||
"output": ["ParseError", ["Character", "\u017D"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u008F"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u0090"]]},
|
||||
|
||||
{"description": "Windows-1252 LEFT SINGLE QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"‘",
|
||||
"output": ["ParseError", ["Character", "\u2018"]]},
|
||||
|
||||
{"description": "Windows-1252 RIGHT SINGLE QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"’",
|
||||
"output": ["ParseError", ["Character", "\u2019"]]},
|
||||
|
||||
{"description": "Windows-1252 LEFT DOUBLE QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"“",
|
||||
"output": ["ParseError", ["Character", "\u201C"]]},
|
||||
|
||||
{"description": "Windows-1252 RIGHT DOUBLE QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"”",
|
||||
"output": ["ParseError", ["Character", "\u201D"]]},
|
||||
|
||||
{"description": "Windows-1252 BULLET hexadecimal numeric entity.",
|
||||
"input":"•",
|
||||
"output": ["ParseError", ["Character", "\u2022"]]},
|
||||
|
||||
{"description": "Windows-1252 EN DASH hexadecimal numeric entity.",
|
||||
"input":"–",
|
||||
"output": ["ParseError", ["Character", "\u2013"]]},
|
||||
|
||||
{"description": "Windows-1252 EM DASH hexadecimal numeric entity.",
|
||||
"input":"—",
|
||||
"output": ["ParseError", ["Character", "\u2014"]]},
|
||||
|
||||
{"description": "Windows-1252 SMALL TILDE hexadecimal numeric entity.",
|
||||
"input":"˜",
|
||||
"output": ["ParseError", ["Character", "\u02DC"]]},
|
||||
|
||||
{"description": "Windows-1252 TRADE MARK SIGN hexadecimal numeric entity.",
|
||||
"input":"™",
|
||||
"output": ["ParseError", ["Character", "\u2122"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LETTER S WITH CARON hexadecimal numeric entity.",
|
||||
"input":"š",
|
||||
"output": ["ParseError", ["Character", "\u0161"]]},
|
||||
|
||||
{"description": "Windows-1252 SINGLE RIGHT-POINTING ANGLE QUOTATION MARK hexadecimal numeric entity.",
|
||||
"input":"›",
|
||||
"output": ["ParseError", ["Character", "\u203A"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LIGATURE OE hexadecimal numeric entity.",
|
||||
"input":"œ",
|
||||
"output": ["ParseError", ["Character", "\u0153"]]},
|
||||
|
||||
{"description": "Windows-1252 REPLACEMENT CHAR hexadecimal numeric entity.",
|
||||
"input":"",
|
||||
"output": ["ParseError", ["Character", "\u009D"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN SMALL LETTER Z WITH CARON hexadecimal numeric entity.",
|
||||
"input":"ž",
|
||||
"output": ["ParseError", ["Character", "\u017E"]]},
|
||||
|
||||
{"description": "Windows-1252 LATIN CAPITAL LETTER Y WITH DIAERESIS hexadecimal numeric entity.",
|
||||
"input":"Ÿ",
|
||||
"output": ["ParseError", ["Character", "\u0178"]]},
|
||||
|
||||
{"description": "Decimal numeric entity followed by hex character a.",
|
||||
"input":"aa",
|
||||
"output": ["ParseError", ["Character", "aa"]]},
|
||||
|
||||
{"description": "Decimal numeric entity followed by hex character A.",
|
||||
"input":"aA",
|
||||
"output": ["ParseError", ["Character", "aA"]]},
|
||||
|
||||
{"description": "Decimal numeric entity followed by hex character f.",
|
||||
"input":"af",
|
||||
"output": ["ParseError", ["Character", "af"]]},
|
||||
|
||||
{"description": "Decimal numeric entity followed by hex character A.",
|
||||
"input":"aF",
|
||||
"output": ["ParseError", ["Character", "aF"]]}
|
||||
|
||||
]}
|
33
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/escapeFlag.test
vendored
Normal file
33
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/escapeFlag.test
vendored
Normal file
|
@ -0,0 +1,33 @@
|
|||
{"tests": [
|
||||
|
||||
{"description":"Commented close tag in RCDATA or RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo<!--</xmp>--></xmp>",
|
||||
"output":[["Character", "foo<!--"], ["EndTag", "xmp"], ["Character", "-->"], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"Bogus comment in RCDATA or RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo<!-->baz</xmp>",
|
||||
"output":[["Character", "foo<!-->baz"], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"End tag surrounded by bogus comment in RCDATA or RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo<!--></xmp><!-->baz</xmp>",
|
||||
"output":[["Character", "foo<!-->"], ["EndTag", "xmp"], "ParseError", ["Comment", ""], ["Character", "baz"], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"Commented entities in RCDATA",
|
||||
"initialStates":["RCDATA state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":" & <!-- & --> & </xmp>",
|
||||
"output":[["Character", " & <!-- & --> & "], ["EndTag", "xmp"]]},
|
||||
|
||||
{"description":"Incorrect comment ending sequences in RCDATA or RAWTEXT",
|
||||
"initialStates":["RCDATA state", "RAWTEXT state"],
|
||||
"lastStartTag":"xmp",
|
||||
"input":"foo<!-- x --x>x-- >x--!>x--<></xmp>",
|
||||
"output":[["Character", "foo<!-- x --x>x-- >x--!>x--<>"], ["EndTag", "xmp"]]}
|
||||
|
||||
]}
|
42210
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/namedEntities.test
vendored
Normal file
42210
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/namedEntities.test
vendored
Normal file
File diff suppressed because it is too large
Load diff
1313
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/numericEntities.test
vendored
Normal file
1313
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/numericEntities.test
vendored
Normal file
File diff suppressed because it is too large
Load diff
|
@ -0,0 +1,7 @@
|
|||
{"tests": [
|
||||
|
||||
{"description":"<!---- >",
|
||||
"input":"<!---- >",
|
||||
"output":["ParseError", "ParseError", ["Comment","-- >"]]}
|
||||
|
||||
]}
|
196
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test1.test
vendored
Normal file
196
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test1.test
vendored
Normal file
|
@ -0,0 +1,196 @@
|
|||
{"tests": [
|
||||
|
||||
{"description":"Correct Doctype lowercase",
|
||||
"input":"<!DOCTYPE html>",
|
||||
"output":[["DOCTYPE", "html", null, null, true]]},
|
||||
|
||||
{"description":"Correct Doctype uppercase",
|
||||
"input":"<!DOCTYPE HTML>",
|
||||
"output":[["DOCTYPE", "html", null, null, true]]},
|
||||
|
||||
{"description":"Correct Doctype mixed case",
|
||||
"input":"<!DOCTYPE HtMl>",
|
||||
"output":[["DOCTYPE", "html", null, null, true]]},
|
||||
|
||||
{"description":"Correct Doctype case with EOF",
|
||||
"input":"<!DOCTYPE HtMl",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"Truncated doctype start",
|
||||
"input":"<!DOC>",
|
||||
"output":["ParseError", ["Comment", "DOC"]]},
|
||||
|
||||
{"description":"Doctype in error",
|
||||
"input":"<!DOCTYPE foo>",
|
||||
"output":[["DOCTYPE", "foo", null, null, true]]},
|
||||
|
||||
{"description":"Single Start Tag",
|
||||
"input":"<h>",
|
||||
"output":[["StartTag", "h", {}]]},
|
||||
|
||||
{"description":"Empty end tag",
|
||||
"input":"</>",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"Empty start tag",
|
||||
"input":"<>",
|
||||
"output":["ParseError", ["Character", "<>"]]},
|
||||
|
||||
{"description":"Start Tag w/attribute",
|
||||
"input":"<h a='b'>",
|
||||
"output":[["StartTag", "h", {"a":"b"}]]},
|
||||
|
||||
{"description":"Start Tag w/attribute no quotes",
|
||||
"input":"<h a=b>",
|
||||
"output":[["StartTag", "h", {"a":"b"}]]},
|
||||
|
||||
{"description":"Start/End Tag",
|
||||
"input":"<h></h>",
|
||||
"output":[["StartTag", "h", {}], ["EndTag", "h"]]},
|
||||
|
||||
{"description":"Two unclosed start tags",
|
||||
"input":"<p>One<p>Two",
|
||||
"output":[["StartTag", "p", {}], ["Character", "One"], ["StartTag", "p", {}], ["Character", "Two"]]},
|
||||
|
||||
{"description":"End Tag w/attribute",
|
||||
"input":"<h></h a='b'>",
|
||||
"output":[["StartTag", "h", {}], "ParseError", ["EndTag", "h"]]},
|
||||
|
||||
{"description":"Multiple atts",
|
||||
"input":"<h a='b' c='d'>",
|
||||
"output":[["StartTag", "h", {"a":"b", "c":"d"}]]},
|
||||
|
||||
{"description":"Multiple atts no space",
|
||||
"input":"<h a='b'c='d'>",
|
||||
"output":["ParseError", ["StartTag", "h", {"a":"b", "c":"d"}]]},
|
||||
|
||||
{"description":"Repeated attr",
|
||||
"input":"<h a='b' a='d'>",
|
||||
"output":["ParseError", ["StartTag", "h", {"a":"b"}]]},
|
||||
|
||||
{"description":"Simple comment",
|
||||
"input":"<!--comment-->",
|
||||
"output":[["Comment", "comment"]]},
|
||||
|
||||
{"description":"Comment, Central dash no space",
|
||||
"input":"<!----->",
|
||||
"output":["ParseError", ["Comment", "-"]]},
|
||||
|
||||
{"description":"Comment, two central dashes",
|
||||
"input":"<!-- --comment -->",
|
||||
"output":["ParseError", ["Comment", " --comment "]]},
|
||||
|
||||
{"description":"Unfinished comment",
|
||||
"input":"<!--comment",
|
||||
"output":["ParseError", ["Comment", "comment"]]},
|
||||
|
||||
{"description":"Start of a comment",
|
||||
"input":"<!-",
|
||||
"output":["ParseError", ["Comment", "-"]]},
|
||||
|
||||
{"description":"Short comment",
|
||||
"input":"<!-->",
|
||||
"output":["ParseError", ["Comment", ""]]},
|
||||
|
||||
{"description":"Short comment two",
|
||||
"input":"<!--->",
|
||||
"output":["ParseError", ["Comment", ""]]},
|
||||
|
||||
{"description":"Short comment three",
|
||||
"input":"<!---->",
|
||||
"output":[["Comment", ""]]},
|
||||
|
||||
|
||||
{"description":"Ampersand EOF",
|
||||
"input":"&",
|
||||
"output":[["Character", "&"]]},
|
||||
|
||||
{"description":"Ampersand ampersand EOF",
|
||||
"input":"&&",
|
||||
"output":[["Character", "&&"]]},
|
||||
|
||||
{"description":"Ampersand space EOF",
|
||||
"input":"& ",
|
||||
"output":[["Character", "& "]]},
|
||||
|
||||
{"description":"Unfinished entity",
|
||||
"input":"&f",
|
||||
"output":[["Character", "&f"]]},
|
||||
|
||||
{"description":"Ampersand, number sign",
|
||||
"input":"&#",
|
||||
"output":["ParseError", ["Character", "&#"]]},
|
||||
|
||||
{"description":"Unfinished numeric entity",
|
||||
"input":"&#x",
|
||||
"output":["ParseError", ["Character", "&#x"]]},
|
||||
|
||||
{"description":"Entity with trailing semicolon (1)",
|
||||
"input":"I'm ¬it",
|
||||
"output":[["Character","I'm \u00ACit"]]},
|
||||
|
||||
{"description":"Entity with trailing semicolon (2)",
|
||||
"input":"I'm ∉",
|
||||
"output":[["Character","I'm \u2209"]]},
|
||||
|
||||
{"description":"Entity without trailing semicolon (1)",
|
||||
"input":"I'm ¬it",
|
||||
"output":[["Character","I'm "], "ParseError", ["Character", "\u00ACit"]]},
|
||||
|
||||
{"description":"Entity without trailing semicolon (2)",
|
||||
"input":"I'm ¬in",
|
||||
"output":[["Character","I'm "], "ParseError", ["Character", "\u00ACin"]]},
|
||||
|
||||
{"description":"Partial entity match at end of file",
|
||||
"input":"I'm &no",
|
||||
"output":[["Character","I'm &no"]]},
|
||||
|
||||
{"description":"Non-ASCII character reference name",
|
||||
"input":"&\u00AC;",
|
||||
"output":[["Character", "&\u00AC;"]]},
|
||||
|
||||
{"description":"ASCII decimal entity",
|
||||
"input":"$",
|
||||
"output":[["Character","$"]]},
|
||||
|
||||
{"description":"ASCII hexadecimal entity",
|
||||
"input":"?",
|
||||
"output":[["Character","?"]]},
|
||||
|
||||
{"description":"Hexadecimal entity in attribute",
|
||||
"input":"<h a='?'></h>",
|
||||
"output":[["StartTag", "h", {"a":"?"}], ["EndTag", "h"]]},
|
||||
|
||||
{"description":"Entity in attribute without semicolon ending in x",
|
||||
"input":"<h a='¬x'>",
|
||||
"output":[["StartTag", "h", {"a":"¬x"}]]},
|
||||
|
||||
{"description":"Entity in attribute without semicolon ending in 1",
|
||||
"input":"<h a='¬1'>",
|
||||
"output":[["StartTag", "h", {"a":"¬1"}]]},
|
||||
|
||||
{"description":"Entity in attribute without semicolon ending in i",
|
||||
"input":"<h a='¬i'>",
|
||||
"output":[["StartTag", "h", {"a":"¬i"}]]},
|
||||
|
||||
{"description":"Entity in attribute without semicolon",
|
||||
"input":"<h a='©'>",
|
||||
"output":["ParseError", ["StartTag", "h", {"a":"\u00A9"}]]},
|
||||
|
||||
{"description":"Unquoted attribute ending in ampersand",
|
||||
"input":"<s o=& t>",
|
||||
"output":[["StartTag","s",{"o":"&","t":""}]]},
|
||||
|
||||
{"description":"Unquoted attribute at end of tag with final character of &, with tag followed by characters",
|
||||
"input":"<a a=a&>foo",
|
||||
"output":[["StartTag", "a", {"a":"a&"}], ["Character", "foo"]]},
|
||||
|
||||
{"description":"plaintext element",
|
||||
"input":"<plaintext>foobar",
|
||||
"output":[["StartTag","plaintext",{}], ["Character","foobar"]]},
|
||||
|
||||
{"description":"Open angled bracket in unquoted attribute value state",
|
||||
"input":"<a a=f<>",
|
||||
"output":["ParseError", ["StartTag", "a", {"a":"f<"}]]}
|
||||
|
||||
]}
|
179
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test2.test
vendored
Normal file
179
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test2.test
vendored
Normal file
|
@ -0,0 +1,179 @@
|
|||
{"tests": [
|
||||
|
||||
{"description":"DOCTYPE without name",
|
||||
"input":"<!DOCTYPE>",
|
||||
"output":["ParseError", "ParseError", ["DOCTYPE", null, null, null, false]]},
|
||||
|
||||
{"description":"DOCTYPE without space before name",
|
||||
"input":"<!DOCTYPEhtml>",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, true]]},
|
||||
|
||||
{"description":"Incorrect DOCTYPE without a space before name",
|
||||
"input":"<!DOCTYPEfoo>",
|
||||
"output":["ParseError", ["DOCTYPE", "foo", null, null, true]]},
|
||||
|
||||
{"description":"DOCTYPE with publicId",
|
||||
"input":"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML Transitional 4.01//EN\">",
|
||||
"output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", null, true]]},
|
||||
|
||||
{"description":"DOCTYPE with EOF after PUBLIC",
|
||||
"input":"<!DOCTYPE html PUBLIC",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"DOCTYPE with EOF after PUBLIC '",
|
||||
"input":"<!DOCTYPE html PUBLIC '",
|
||||
"output":["ParseError", ["DOCTYPE", "html", "", null, false]]},
|
||||
|
||||
{"description":"DOCTYPE with EOF after PUBLIC 'x",
|
||||
"input":"<!DOCTYPE html PUBLIC 'x",
|
||||
"output":["ParseError", ["DOCTYPE", "html", "x", null, false]]},
|
||||
|
||||
{"description":"DOCTYPE with systemId",
|
||||
"input":"<!DOCTYPE html SYSTEM \"-//W3C//DTD HTML Transitional 4.01//EN\">",
|
||||
"output":[["DOCTYPE", "html", null, "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
|
||||
|
||||
{"description":"DOCTYPE with publicId and systemId",
|
||||
"input":"<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML Transitional 4.01//EN\" \"-//W3C//DTD HTML Transitional 4.01//EN\">",
|
||||
"output":[["DOCTYPE", "html", "-//W3C//DTD HTML Transitional 4.01//EN", "-//W3C//DTD HTML Transitional 4.01//EN", true]]},
|
||||
|
||||
{"description":"DOCTYPE with > in double-quoted publicId",
|
||||
"input":"<!DOCTYPE html PUBLIC \">x",
|
||||
"output":["ParseError", ["DOCTYPE", "html", "", null, false], ["Character", "x"]]},
|
||||
|
||||
{"description":"DOCTYPE with > in single-quoted publicId",
|
||||
"input":"<!DOCTYPE html PUBLIC '>x",
|
||||
"output":["ParseError", ["DOCTYPE", "html", "", null, false], ["Character", "x"]]},
|
||||
|
||||
{"description":"DOCTYPE with > in double-quoted systemId",
|
||||
"input":"<!DOCTYPE html PUBLIC \"foo\" \">x",
|
||||
"output":["ParseError", ["DOCTYPE", "html", "foo", "", false], ["Character", "x"]]},
|
||||
|
||||
{"description":"DOCTYPE with > in single-quoted systemId",
|
||||
"input":"<!DOCTYPE html PUBLIC 'foo' '>x",
|
||||
"output":["ParseError", ["DOCTYPE", "html", "foo", "", false], ["Character", "x"]]},
|
||||
|
||||
{"description":"Incomplete doctype",
|
||||
"input":"<!DOCTYPE html ",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"Numeric entity representing the NUL character",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Hexadecimal entity representing the NUL character",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Numeric entity representing a codepoint after 1114111 (U+10FFFF)",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Hexadecimal entity representing a codepoint after 1114111 (U+10FFFF)",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Hexadecimal entity pair representing a surrogate pair",
|
||||
"input":"��",
|
||||
"output":["ParseError", ["Character", "\uFFFD"], "ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Hexadecimal entity with mixed uppercase and lowercase",
|
||||
"input":"ꯍ",
|
||||
"output":[["Character", "\uABCD"]]},
|
||||
|
||||
{"description":"Entity without a name",
|
||||
"input":"&;",
|
||||
"output":[["Character", "&;"]]},
|
||||
|
||||
{"description":"Unescaped ampersand in attribute value",
|
||||
"input":"<h a='&'>",
|
||||
"output":[["StartTag", "h", { "a":"&" }]]},
|
||||
|
||||
{"description":"StartTag containing <",
|
||||
"input":"<a<b>",
|
||||
"output":[["StartTag", "a<b", { }]]},
|
||||
|
||||
{"description":"Non-void element containing trailing /",
|
||||
"input":"<h/>",
|
||||
"output":[["StartTag","h",{},true]]},
|
||||
|
||||
{"description":"Void element with permitted slash",
|
||||
"input":"<br/>",
|
||||
"output":[["StartTag","br",{},true]]},
|
||||
|
||||
{"description":"Void element with permitted slash (with attribute)",
|
||||
"input":"<br foo='bar'/>",
|
||||
"output":[["StartTag","br",{"foo":"bar"},true]]},
|
||||
|
||||
{"description":"StartTag containing /",
|
||||
"input":"<h/a='b'>",
|
||||
"output":["ParseError", ["StartTag", "h", { "a":"b" }]]},
|
||||
|
||||
{"description":"Double-quoted attribute value",
|
||||
"input":"<h a=\"b\">",
|
||||
"output":[["StartTag", "h", { "a":"b" }]]},
|
||||
|
||||
{"description":"Unescaped </",
|
||||
"input":"</",
|
||||
"output":["ParseError", ["Character", "</"]]},
|
||||
|
||||
{"description":"Illegal end tag name",
|
||||
"input":"</1>",
|
||||
"output":["ParseError", ["Comment", "1"]]},
|
||||
|
||||
{"description":"Simili processing instruction",
|
||||
"input":"<?namespace>",
|
||||
"output":["ParseError", ["Comment", "?namespace"]]},
|
||||
|
||||
{"description":"A bogus comment stops at >, even if preceeded by two dashes",
|
||||
"input":"<?foo-->",
|
||||
"output":["ParseError", ["Comment", "?foo--"]]},
|
||||
|
||||
{"description":"Unescaped <",
|
||||
"input":"foo < bar",
|
||||
"output":[["Character", "foo "], "ParseError", ["Character", "< bar"]]},
|
||||
|
||||
{"description":"Null Byte Replacement",
|
||||
"input":"\u0000",
|
||||
"output":["ParseError", ["Character", "\u0000"]]},
|
||||
|
||||
{"description":"Comment with dash",
|
||||
"input":"<!---x",
|
||||
"output":["ParseError", ["Comment", "-x"]]},
|
||||
|
||||
{"description":"Entity + newline",
|
||||
"input":"\nx\n>\n",
|
||||
"output":[["Character","\nx\n>\n"]]},
|
||||
|
||||
{"description":"Start tag with no attributes but space before the greater-than sign",
|
||||
"input":"<h >",
|
||||
"output":[["StartTag", "h", {}]]},
|
||||
|
||||
{"description":"Empty attribute followed by uppercase attribute",
|
||||
"input":"<h a B=''>",
|
||||
"output":[["StartTag", "h", {"a":"", "b":""}]]},
|
||||
|
||||
{"description":"Double-quote after attribute name",
|
||||
"input":"<h a \">",
|
||||
"output":["ParseError", ["StartTag", "h", {"a":"", "\"":""}]]},
|
||||
|
||||
{"description":"Single-quote after attribute name",
|
||||
"input":"<h a '>",
|
||||
"output":["ParseError", ["StartTag", "h", {"a":"", "'":""}]]},
|
||||
|
||||
{"description":"Empty end tag with following characters",
|
||||
"input":"a</>bc",
|
||||
"output":[["Character", "a"], "ParseError", ["Character", "bc"]]},
|
||||
|
||||
{"description":"Empty end tag with following tag",
|
||||
"input":"a</><b>c",
|
||||
"output":[["Character", "a"], "ParseError", ["StartTag", "b", {}], ["Character", "c"]]},
|
||||
|
||||
{"description":"Empty end tag with following comment",
|
||||
"input":"a</><!--b-->c",
|
||||
"output":[["Character", "a"], "ParseError", ["Comment", "b"], ["Character", "c"]]},
|
||||
|
||||
{"description":"Empty end tag with following end tag",
|
||||
"input":"a</></b>c",
|
||||
"output":[["Character", "a"], "ParseError", ["EndTag", "b"], ["Character", "c"]]}
|
||||
|
||||
]}
|
6047
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test3.test
vendored
Normal file
6047
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test3.test
vendored
Normal file
File diff suppressed because it is too large
Load diff
344
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test4.test
vendored
Normal file
344
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/test4.test
vendored
Normal file
|
@ -0,0 +1,344 @@
|
|||
{"tests": [
|
||||
|
||||
{"description":"< in attribute name",
|
||||
"input":"<z/0 <>",
|
||||
"output":["ParseError", "ParseError", ["StartTag", "z", {"0": "", "<": ""}]]},
|
||||
|
||||
{"description":"< in attribute value",
|
||||
"input":"<z x=<>",
|
||||
"output":["ParseError", ["StartTag", "z", {"x": "<"}]]},
|
||||
|
||||
{"description":"= in unquoted attribute value",
|
||||
"input":"<z z=z=z>",
|
||||
"output":["ParseError", ["StartTag", "z", {"z": "z=z"}]]},
|
||||
|
||||
{"description":"= attribute",
|
||||
"input":"<z =>",
|
||||
"output":["ParseError", ["StartTag", "z", {"=": ""}]]},
|
||||
|
||||
{"description":"== attribute",
|
||||
"input":"<z ==>",
|
||||
"output":["ParseError", "ParseError", ["StartTag", "z", {"=": ""}]]},
|
||||
|
||||
{"description":"=== attribute",
|
||||
"input":"<z ===>",
|
||||
"output":["ParseError", "ParseError", ["StartTag", "z", {"=": "="}]]},
|
||||
|
||||
{"description":"==== attribute",
|
||||
"input":"<z ====>",
|
||||
"output":["ParseError", "ParseError", "ParseError", ["StartTag", "z", {"=": "=="}]]},
|
||||
|
||||
{"description":"\" after ampersand in double-quoted attribute value",
|
||||
"input":"<z z=\"&\">",
|
||||
"output":[["StartTag", "z", {"z": "&"}]]},
|
||||
|
||||
{"description":"' after ampersand in double-quoted attribute value",
|
||||
"input":"<z z=\"&'\">",
|
||||
"output":[["StartTag", "z", {"z": "&'"}]]},
|
||||
|
||||
{"description":"' after ampersand in single-quoted attribute value",
|
||||
"input":"<z z='&'>",
|
||||
"output":[["StartTag", "z", {"z": "&"}]]},
|
||||
|
||||
{"description":"\" after ampersand in single-quoted attribute value",
|
||||
"input":"<z z='&\"'>",
|
||||
"output":[["StartTag", "z", {"z": "&\""}]]},
|
||||
|
||||
{"description":"Text after bogus character reference",
|
||||
"input":"<z z='&xlink_xmlns;'>bar<z>",
|
||||
"output":[["StartTag","z",{"z":"&xlink_xmlns;"}],["Character","bar"],["StartTag","z",{}]]},
|
||||
|
||||
{"description":"Text after hex character reference",
|
||||
"input":"<z z='  foo'>bar<z>",
|
||||
"output":[["StartTag","z",{"z":" foo"}],["Character","bar"],["StartTag","z",{}]]},
|
||||
|
||||
{"description":"Attribute name starting with \"",
|
||||
"input":"<foo \"='bar'>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"\"": "bar"}]]},
|
||||
|
||||
{"description":"Attribute name starting with '",
|
||||
"input":"<foo '='bar'>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"'": "bar"}]]},
|
||||
|
||||
{"description":"Attribute name containing \"",
|
||||
"input":"<foo a\"b='bar'>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"a\"b": "bar"}]]},
|
||||
|
||||
{"description":"Attribute name containing '",
|
||||
"input":"<foo a'b='bar'>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"a'b": "bar"}]]},
|
||||
|
||||
{"description":"Unquoted attribute value containing '",
|
||||
"input":"<foo a=b'c>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"a": "b'c"}]]},
|
||||
|
||||
{"description":"Unquoted attribute value containing \"",
|
||||
"input":"<foo a=b\"c>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"a": "b\"c"}]]},
|
||||
|
||||
{"description":"Double-quoted attribute value not followed by whitespace",
|
||||
"input":"<foo a=\"b\"c>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"a": "b", "c": ""}]]},
|
||||
|
||||
{"description":"Single-quoted attribute value not followed by whitespace",
|
||||
"input":"<foo a='b'c>",
|
||||
"output":["ParseError", ["StartTag", "foo", {"a": "b", "c": ""}]]},
|
||||
|
||||
{"description":"Quoted attribute followed by permitted /",
|
||||
"input":"<br a='b'/>",
|
||||
"output":[["StartTag","br",{"a":"b"},true]]},
|
||||
|
||||
{"description":"Quoted attribute followed by non-permitted /",
|
||||
"input":"<bar a='b'/>",
|
||||
"output":[["StartTag","bar",{"a":"b"},true]]},
|
||||
|
||||
{"description":"CR EOF after doctype name",
|
||||
"input":"<!doctype html \r",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"CR EOF in tag name",
|
||||
"input":"<z\r",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"Slash EOF in tag name",
|
||||
"input":"<z/",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"Zero hex numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", "ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Zero decimal numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", "ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Zero-prefixed hex numeric entity",
|
||||
"input":"A",
|
||||
"output":[["Character", "A"]]},
|
||||
|
||||
{"description":"Zero-prefixed decimal numeric entity",
|
||||
"input":"A",
|
||||
"output":[["Character", "A"]]},
|
||||
|
||||
{"description":"Empty hex numeric entities",
|
||||
"input":"&#x &#X ",
|
||||
"output":["ParseError", ["Character", "&#x "], "ParseError", ["Character", "&#X "]]},
|
||||
|
||||
{"description":"Empty decimal numeric entities",
|
||||
"input":"&# &#; ",
|
||||
"output":["ParseError", ["Character", "&# "], "ParseError", ["Character", "&#; "]]},
|
||||
|
||||
{"description":"Non-BMP numeric entity",
|
||||
"input":"𐀀",
|
||||
"output":[["Character", "\uD800\uDC00"]]},
|
||||
|
||||
{"description":"Maximum non-BMP numeric entity",
|
||||
"input":"",
|
||||
"output":["ParseError", ["Character", "\uDBFF\uDFFF"]]},
|
||||
|
||||
{"description":"Above maximum numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"32-bit hex numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"33-bit hex numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"33-bit decimal numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"65-bit hex numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"65-bit decimal numeric entity",
|
||||
"input":"�",
|
||||
"output":["ParseError", ["Character", "\uFFFD"]]},
|
||||
|
||||
{"description":"Surrogate code point edge cases",
|
||||
"input":"퟿����",
|
||||
"output":[["Character", "\uD7FF"], "ParseError", ["Character", "\uFFFD"], "ParseError", ["Character", "\uFFFD"], "ParseError", ["Character", "\uFFFD"], "ParseError", ["Character", "\uFFFD\uE000"]]},
|
||||
|
||||
{"description":"Uppercase start tag name",
|
||||
"input":"<X>",
|
||||
"output":[["StartTag", "x", {}]]},
|
||||
|
||||
{"description":"Uppercase end tag name",
|
||||
"input":"</X>",
|
||||
"output":[["EndTag", "x"]]},
|
||||
|
||||
{"description":"Uppercase attribute name",
|
||||
"input":"<x X>",
|
||||
"output":[["StartTag", "x", { "x":"" }]]},
|
||||
|
||||
{"description":"Tag/attribute name case edge values",
|
||||
"input":"<x@AZ[`az{ @AZ[`az{>",
|
||||
"output":[["StartTag", "x@az[`az{", { "@az[`az{":"" }]]},
|
||||
|
||||
{"description":"Duplicate different-case attributes",
|
||||
"input":"<x x=1 x=2 X=3>",
|
||||
"output":["ParseError", "ParseError", ["StartTag", "x", { "x":"1" }]]},
|
||||
|
||||
{"description":"Uppercase close tag attributes",
|
||||
"input":"</x X>",
|
||||
"output":["ParseError", ["EndTag", "x"]]},
|
||||
|
||||
{"description":"Duplicate close tag attributes",
|
||||
"input":"</x x x>",
|
||||
"output":["ParseError", "ParseError", ["EndTag", "x"]]},
|
||||
|
||||
{"description":"Permitted slash",
|
||||
"input":"<br/>",
|
||||
"output":[["StartTag","br",{},true]]},
|
||||
|
||||
{"description":"Non-permitted slash",
|
||||
"input":"<xr/>",
|
||||
"output":[["StartTag","xr",{},true]]},
|
||||
|
||||
{"description":"Permitted slash but in close tag",
|
||||
"input":"</br/>",
|
||||
"output":["ParseError", ["EndTag", "br"]]},
|
||||
|
||||
{"description":"Doctype public case-sensitivity (1)",
|
||||
"input":"<!DoCtYpE HtMl PuBlIc \"AbC\" \"XyZ\">",
|
||||
"output":[["DOCTYPE", "html", "AbC", "XyZ", true]]},
|
||||
|
||||
{"description":"Doctype public case-sensitivity (2)",
|
||||
"input":"<!dOcTyPe hTmL pUbLiC \"aBc\" \"xYz\">",
|
||||
"output":[["DOCTYPE", "html", "aBc", "xYz", true]]},
|
||||
|
||||
{"description":"Doctype system case-sensitivity (1)",
|
||||
"input":"<!DoCtYpE HtMl SyStEm \"XyZ\">",
|
||||
"output":[["DOCTYPE", "html", null, "XyZ", true]]},
|
||||
|
||||
{"description":"Doctype system case-sensitivity (2)",
|
||||
"input":"<!dOcTyPe hTmL sYsTeM \"xYz\">",
|
||||
"output":[["DOCTYPE", "html", null, "xYz", true]]},
|
||||
|
||||
{"description":"U+0000 in lookahead region after non-matching character",
|
||||
"input":"<!doc>\u0000",
|
||||
"output":["ParseError", ["Comment", "doc"], "ParseError", ["Character", "\u0000"]],
|
||||
"ignoreErrorOrder":true},
|
||||
|
||||
{"description":"U+0000 in lookahead region",
|
||||
"input":"<!doc\u0000",
|
||||
"output":["ParseError", ["Comment", "doc\uFFFD"]],
|
||||
"ignoreErrorOrder":true},
|
||||
|
||||
{"description":"U+0080 in lookahead region",
|
||||
"input":"<!doc\u0080",
|
||||
"output":["ParseError", "ParseError", ["Comment", "doc\u0080"]],
|
||||
"ignoreErrorOrder":true},
|
||||
|
||||
{"description":"U+FDD1 in lookahead region",
|
||||
"input":"<!doc\uFDD1",
|
||||
"output":["ParseError", "ParseError", ["Comment", "doc\uFDD1"]],
|
||||
"ignoreErrorOrder":true},
|
||||
|
||||
{"description":"U+1FFFF in lookahead region",
|
||||
"input":"<!doc\uD83F\uDFFF",
|
||||
"output":["ParseError", "ParseError", ["Comment", "doc\uD83F\uDFFF"]],
|
||||
"ignoreErrorOrder":true},
|
||||
|
||||
{"description":"CR followed by non-LF",
|
||||
"input":"\r?",
|
||||
"output":[["Character", "\n?"]]},
|
||||
|
||||
{"description":"CR at EOF",
|
||||
"input":"\r",
|
||||
"output":[["Character", "\n"]]},
|
||||
|
||||
{"description":"LF at EOF",
|
||||
"input":"\n",
|
||||
"output":[["Character", "\n"]]},
|
||||
|
||||
{"description":"CR LF",
|
||||
"input":"\r\n",
|
||||
"output":[["Character", "\n"]]},
|
||||
|
||||
{"description":"CR CR",
|
||||
"input":"\r\r",
|
||||
"output":[["Character", "\n\n"]]},
|
||||
|
||||
{"description":"LF LF",
|
||||
"input":"\n\n",
|
||||
"output":[["Character", "\n\n"]]},
|
||||
|
||||
{"description":"LF CR",
|
||||
"input":"\n\r",
|
||||
"output":[["Character", "\n\n"]]},
|
||||
|
||||
{"description":"text CR CR CR text",
|
||||
"input":"text\r\r\rtext",
|
||||
"output":[["Character", "text\n\n\ntext"]]},
|
||||
|
||||
{"description":"Doctype publik",
|
||||
"input":"<!DOCTYPE html PUBLIK \"AbC\" \"XyZ\">",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"Doctype publi",
|
||||
"input":"<!DOCTYPE html PUBLI",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"Doctype sistem",
|
||||
"input":"<!DOCTYPE html SISTEM \"AbC\">",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"Doctype sys",
|
||||
"input":"<!DOCTYPE html SYS",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false]]},
|
||||
|
||||
{"description":"Doctype html x>text",
|
||||
"input":"<!DOCTYPE html x>text",
|
||||
"output":["ParseError", ["DOCTYPE", "html", null, null, false], ["Character", "text"]]},
|
||||
|
||||
{"description":"Grave accent in unquoted attribute",
|
||||
"input":"<a a=aa`>",
|
||||
"output":["ParseError", ["StartTag", "a", {"a":"aa`"}]]},
|
||||
|
||||
{"description":"EOF in tag name state ",
|
||||
"input":"<a",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in tag name state",
|
||||
"input":"<a",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in before attribute name state",
|
||||
"input":"<a ",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in attribute name state",
|
||||
"input":"<a a",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in after attribute name state",
|
||||
"input":"<a a ",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in before attribute value state",
|
||||
"input":"<a a =",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in attribute value (double quoted) state",
|
||||
"input":"<a a =\"a",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in attribute value (single quoted) state",
|
||||
"input":"<a a ='a",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in attribute value (unquoted) state",
|
||||
"input":"<a a =a",
|
||||
"output":["ParseError"]},
|
||||
|
||||
{"description":"EOF in after attribute value state",
|
||||
"input":"<a a ='a'",
|
||||
"output":["ParseError"]}
|
||||
|
||||
]}
|
1295
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/unicodeChars.test
vendored
Normal file
1295
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/unicodeChars.test
vendored
Normal file
File diff suppressed because it is too large
Load diff
|
@ -0,0 +1,27 @@
|
|||
{"tests" : [
|
||||
{"description": "Invalid Unicode character U+DFFF",
|
||||
"doubleEscaped":true,
|
||||
"input": "\\uDFFF",
|
||||
"output":["ParseError", ["Character", "\\uDFFF"]]},
|
||||
|
||||
{"description": "Invalid Unicode character U+D800",
|
||||
"doubleEscaped":true,
|
||||
"input": "\\uD800",
|
||||
"output":["ParseError", ["Character", "\\uD800"]]},
|
||||
|
||||
{"description": "Invalid Unicode character U+DFFF with valid preceding character",
|
||||
"doubleEscaped":true,
|
||||
"input": "a\\uDFFF",
|
||||
"output":[["Character", "a"], "ParseError", ["Character", "\\uDFFF"]]},
|
||||
|
||||
{"description": "Invalid Unicode character U+D800 with valid following character",
|
||||
"doubleEscaped":true,
|
||||
"input": "\\uD800a",
|
||||
"output":["ParseError", ["Character", "\\uD800a"]]},
|
||||
|
||||
{"description":"CR followed by U+0000",
|
||||
"input":"\r\u0000",
|
||||
"output":[["Character", "\n"], "ParseError", ["Character", "\u0000"]],
|
||||
"ignoreErrorOrder":true}
|
||||
]
|
||||
}
|
22
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/xmlViolation.test
vendored
Normal file
22
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tokenizer/xmlViolation.test
vendored
Normal file
|
@ -0,0 +1,22 @@
|
|||
{"xmlViolationTests": [
|
||||
|
||||
{"description":"Non-XML character",
|
||||
"input":"a\uFFFFb",
|
||||
"ignoreErrorOrder":true,
|
||||
"output":["ParseError",["Character","a\uFFFDb"]]},
|
||||
|
||||
{"description":"Non-XML space",
|
||||
"input":"a\u000Cb",
|
||||
"ignoreErrorOrder":true,
|
||||
"output":[["Character","a b"]]},
|
||||
|
||||
{"description":"Double hyphen in comment",
|
||||
"input":"<!-- foo -- bar -->",
|
||||
"output":["ParseError",["Comment"," foo - - bar "]]},
|
||||
|
||||
{"description":"FF between attributes",
|
||||
"input":"<a b=''\u000Cc=''>",
|
||||
"output":[["StartTag","a",{"b":"","c":""}]]}
|
||||
]}
|
||||
|
||||
|
98
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/README.md
vendored
Normal file
98
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/README.md
vendored
Normal file
|
@ -0,0 +1,98 @@
|
|||
Tree Construction Tests
|
||||
=======================
|
||||
|
||||
Each file containing tree construction tests consists of any number of
|
||||
tests separated by two newlines (LF) and a single newline before the end
|
||||
of the file. For instance:
|
||||
|
||||
[TEST]LF
|
||||
LF
|
||||
[TEST]LF
|
||||
LF
|
||||
[TEST]LF
|
||||
|
||||
Where [TEST] is the following format:
|
||||
|
||||
Each test must begin with a string "\#data" followed by a newline (LF).
|
||||
All subsequent lines until a line that says "\#errors" are the test data
|
||||
and must be passed to the system being tested unchanged, except with the
|
||||
final newline (on the last line) removed.
|
||||
|
||||
Then there must be a line that says "\#errors". It must be followed by
|
||||
one line per parse error that a conformant checker would return. It
|
||||
doesn't matter what those lines are, although they can't be
|
||||
"\#document-fragment", "\#document", "\#script-off", "\#script-on", or
|
||||
empty, the only thing that matters is that there be the right number
|
||||
of parse errors.
|
||||
|
||||
Then there \*may\* be a line that says "\#document-fragment", which must
|
||||
be followed by a newline (LF), followed by a string of characters that
|
||||
indicates the context element, followed by a newline (LF). If this line
|
||||
is present the "\#data" must be parsed using the HTML fragment parsing
|
||||
algorithm with the context element as context.
|
||||
|
||||
Then there \*may\* be a line that says "\#script-off" or
|
||||
"\#script-in". If a line that says "\#script-off" is present, the
|
||||
parser must set the scripting flag to disabled. If a line that says
|
||||
"\#script-on" is present, it must set it to enabled. Otherwise, the
|
||||
test should be run in both modes.
|
||||
|
||||
Then there must be a line that says "\#document", which must be followed
|
||||
by a dump of the tree of the parsed DOM. Each node must be represented
|
||||
by a single line. Each line must start with "| ", followed by two spaces
|
||||
per parent node that the node has before the root document node.
|
||||
|
||||
- Element nodes must be represented by a "`<`" then the *tag name
|
||||
string* "`>`", and all the attributes must be given, sorted
|
||||
lexicographically by UTF-16 code unit according to their *attribute
|
||||
name string*, on subsequent lines, as if they were children of the
|
||||
element node.
|
||||
- Attribute nodes must have the *attribute name string*, then an "="
|
||||
sign, then the attribute value in double quotes (").
|
||||
- Text nodes must be the string, in double quotes. Newlines aren't
|
||||
escaped.
|
||||
- Comments must be "`<`" then "`!-- `" then the data then "` -->`".
|
||||
- DOCTYPEs must be "`<!DOCTYPE `" then the name then if either of the
|
||||
system id or public id is non-empty a space, public id in
|
||||
double-quotes, another space an the system id in double-quotes, and
|
||||
then in any case "`>`".
|
||||
- Processing instructions must be "`<?`", then the target, then a
|
||||
space, then the data and then "`>`". (The HTML parser cannot emit
|
||||
processing instructions, but scripts can, and the WebVTT to DOM
|
||||
rules can emit them.)
|
||||
- Template contents are represented by the string "content" with the
|
||||
children below it.
|
||||
|
||||
The *tag name string* is the local name prefixed by a namespace
|
||||
designator. For the HTML namespace, the namespace designator is the
|
||||
empty string, i.e. there's no prefix. For the SVG namespace, the
|
||||
namespace designator is "svg ". For the MathML namespace, the namespace
|
||||
designator is "math ".
|
||||
|
||||
The *attribute name string* is the local name prefixed by a namespace
|
||||
designator. For no namespace, the namespace designator is the empty
|
||||
string, i.e. there's no prefix. For the XLink namespace, the namespace
|
||||
designator is "xlink ". For the XML namespace, the namespace designator
|
||||
is "xml ". For the XMLNS namespace, the namespace designator is "xmlns
|
||||
". Note the difference between "xlink:href" which is an attribute in no
|
||||
namespace with the local name "xlink:href" and "xlink href" which is an
|
||||
attribute in the xlink namespace with the local name "href".
|
||||
|
||||
If there is also a "\#document-fragment" the bit following "\#document"
|
||||
must be a representation of the HTML fragment serialization for the
|
||||
context element given by "\#document-fragment".
|
||||
|
||||
For example:
|
||||
|
||||
#data
|
||||
<p>One<p>Two
|
||||
#errors
|
||||
3: Missing document type declaration
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <p>
|
||||
| "One"
|
||||
| <p>
|
||||
| "Two"
|
337
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/adoption01.dat
vendored
Normal file
337
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/adoption01.dat
vendored
Normal file
|
@ -0,0 +1,337 @@
|
|||
#data
|
||||
<a><p></a></p>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,10): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| <p>
|
||||
| <a>
|
||||
|
||||
#data
|
||||
<a>1<p>2</a>3</p>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,12): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| "1"
|
||||
| <p>
|
||||
| <a>
|
||||
| "2"
|
||||
| "3"
|
||||
|
||||
#data
|
||||
<a>1<button>2</a>3</button>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,17): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| "1"
|
||||
| <button>
|
||||
| <a>
|
||||
| "2"
|
||||
| "3"
|
||||
|
||||
#data
|
||||
<a>1<b>2</a>3</b>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,12): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| "1"
|
||||
| <b>
|
||||
| "2"
|
||||
| <b>
|
||||
| "3"
|
||||
|
||||
#data
|
||||
<a>1<div>2<div>3</a>4</div>5</div>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,20): adoption-agency-1.3
|
||||
(1,20): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| "1"
|
||||
| <div>
|
||||
| <a>
|
||||
| "2"
|
||||
| <div>
|
||||
| <a>
|
||||
| "3"
|
||||
| "4"
|
||||
| "5"
|
||||
|
||||
#data
|
||||
<table><a>1<p>2</a>3</p>
|
||||
#errors
|
||||
(1,7): expected-doctype-but-got-start-tag
|
||||
(1,10): unexpected-start-tag-implies-table-voodoo
|
||||
(1,11): unexpected-character-implies-table-voodoo
|
||||
(1,14): unexpected-start-tag-implies-table-voodoo
|
||||
(1,15): unexpected-character-implies-table-voodoo
|
||||
(1,19): unexpected-end-tag-implies-table-voodoo
|
||||
(1,19): adoption-agency-1.3
|
||||
(1,20): unexpected-character-implies-table-voodoo
|
||||
(1,24): unexpected-end-tag-implies-table-voodoo
|
||||
(1,24): eof-in-table
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| "1"
|
||||
| <p>
|
||||
| <a>
|
||||
| "2"
|
||||
| "3"
|
||||
| <table>
|
||||
|
||||
#data
|
||||
<b><b><a><p></a>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,16): adoption-agency-1.3
|
||||
(1,16): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <b>
|
||||
| <b>
|
||||
| <a>
|
||||
| <p>
|
||||
| <a>
|
||||
|
||||
#data
|
||||
<b><a><b><p></a>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,16): adoption-agency-1.3
|
||||
(1,16): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <b>
|
||||
| <a>
|
||||
| <b>
|
||||
| <b>
|
||||
| <p>
|
||||
| <a>
|
||||
|
||||
#data
|
||||
<a><b><b><p></a>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,16): adoption-agency-1.3
|
||||
(1,16): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| <b>
|
||||
| <b>
|
||||
| <b>
|
||||
| <b>
|
||||
| <p>
|
||||
| <a>
|
||||
|
||||
#data
|
||||
<p>1<s id="A">2<b id="B">3</p>4</s>5</b>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,30): unexpected-end-tag
|
||||
(1,35): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <p>
|
||||
| "1"
|
||||
| <s>
|
||||
| id="A"
|
||||
| "2"
|
||||
| <b>
|
||||
| id="B"
|
||||
| "3"
|
||||
| <s>
|
||||
| id="A"
|
||||
| <b>
|
||||
| id="B"
|
||||
| "4"
|
||||
| <b>
|
||||
| id="B"
|
||||
| "5"
|
||||
|
||||
#data
|
||||
<table><a>1<td>2</td>3</table>
|
||||
#errors
|
||||
(1,7): expected-doctype-but-got-start-tag
|
||||
(1,10): unexpected-start-tag-implies-table-voodoo
|
||||
(1,11): unexpected-character-implies-table-voodoo
|
||||
(1,15): unexpected-cell-in-table-body
|
||||
(1,30): unexpected-implied-end-tag-in-table-view
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| "1"
|
||||
| <a>
|
||||
| "3"
|
||||
| <table>
|
||||
| <tbody>
|
||||
| <tr>
|
||||
| <td>
|
||||
| "2"
|
||||
|
||||
#data
|
||||
<table>A<td>B</td>C</table>
|
||||
#errors
|
||||
(1,7): expected-doctype-but-got-start-tag
|
||||
(1,8): unexpected-character-implies-table-voodoo
|
||||
(1,12): unexpected-cell-in-table-body
|
||||
(1,22): unexpected-character-implies-table-voodoo
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "AC"
|
||||
| <table>
|
||||
| <tbody>
|
||||
| <tr>
|
||||
| <td>
|
||||
| "B"
|
||||
|
||||
#data
|
||||
<a><svg><tr><input></a>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,23): unexpected-end-tag
|
||||
(1,23): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| <svg svg>
|
||||
| <svg tr>
|
||||
| <svg input>
|
||||
|
||||
#data
|
||||
<div><a><b><div><div><div><div><div><div><div><div><div><div></a>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): adoption-agency-1.3
|
||||
(1,65): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| <a>
|
||||
| <b>
|
||||
| <b>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <div>
|
||||
| <div>
|
||||
|
||||
#data
|
||||
<div><a><b><u><i><code><div></a>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
(1,32): adoption-agency-1.3
|
||||
(1,32): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| <a>
|
||||
| <b>
|
||||
| <u>
|
||||
| <i>
|
||||
| <code>
|
||||
| <u>
|
||||
| <i>
|
||||
| <code>
|
||||
| <div>
|
||||
| <a>
|
||||
|
||||
#data
|
||||
<b><b><b><b>x</b></b></b></b>y
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <b>
|
||||
| <b>
|
||||
| <b>
|
||||
| <b>
|
||||
| "x"
|
||||
| "y"
|
||||
|
||||
#data
|
||||
<p><b><b><b><b><p>x
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,18): unexpected-end-tag
|
||||
(1,19): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <p>
|
||||
| <b>
|
||||
| <b>
|
||||
| <b>
|
||||
| <b>
|
||||
| <p>
|
||||
| <b>
|
||||
| <b>
|
||||
| <b>
|
||||
| "x"
|
|
@ -0,0 +1,39 @@
|
|||
#data
|
||||
<b>1<i>2<p>3</b>4
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,16): adoption-agency-1.3
|
||||
(1,17): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <b>
|
||||
| "1"
|
||||
| <i>
|
||||
| "2"
|
||||
| <i>
|
||||
| <p>
|
||||
| <b>
|
||||
| "3"
|
||||
| "4"
|
||||
|
||||
#data
|
||||
<a><div><style></style><address><a>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,35): unexpected-start-tag-implies-end-tag
|
||||
(1,35): adoption-agency-1.3
|
||||
(1,35): adoption-agency-1.3
|
||||
(1,35): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| <div>
|
||||
| <a>
|
||||
| <style>
|
||||
| <address>
|
||||
| <a>
|
||||
| <a>
|
178
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/comments01.dat
vendored
Normal file
178
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/comments01.dat
vendored
Normal file
|
@ -0,0 +1,178 @@
|
|||
#data
|
||||
FOO<!-- BAR -->BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- BAR -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
FOO<!-- BAR --!>BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,15): unexpected-bang-after-double-dash-in-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- BAR -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
FOO<!-- BAR -- >BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,15): unexpected-char-in-comment
|
||||
(1,21): eof-in-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- BAR -- >BAZ -->
|
||||
|
||||
#data
|
||||
FOO<!-- BAR -- <QUX> -- MUX -->BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,15): unexpected-char-in-comment
|
||||
(1,24): unexpected-char-in-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- BAR -- <QUX> -- MUX -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
FOO<!-- BAR -- <QUX> -- MUX --!>BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,15): unexpected-char-in-comment
|
||||
(1,24): unexpected-char-in-comment
|
||||
(1,31): unexpected-bang-after-double-dash-in-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- BAR -- <QUX> -- MUX -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
FOO<!-- BAR -- <QUX> -- MUX -- >BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,15): unexpected-char-in-comment
|
||||
(1,24): unexpected-char-in-comment
|
||||
(1,31): unexpected-char-in-comment
|
||||
(1,35): eof-in-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- BAR -- <QUX> -- MUX -- >BAZ -->
|
||||
|
||||
#data
|
||||
FOO<!---->BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
FOO<!--->BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,9): incorrect-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
FOO<!-->BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,8): incorrect-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
<?xml version="1.0">Hi
|
||||
#errors
|
||||
(1,1): expected-tag-name-but-got-question-mark
|
||||
(1,22): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <!-- ?xml version="1.0" -->
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hi"
|
||||
|
||||
#data
|
||||
<?xml version="1.0">
|
||||
#errors
|
||||
(1,1): expected-tag-name-but-got-question-mark
|
||||
(1,20): expected-doctype-but-got-eof
|
||||
#document
|
||||
| <!-- ?xml version="1.0" -->
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<?xml version
|
||||
#errors
|
||||
(1,1): expected-tag-name-but-got-question-mark
|
||||
(1,13): expected-doctype-but-got-eof
|
||||
#document
|
||||
| <!-- ?xml version -->
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
FOO<!----->BAZ
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,10): unexpected-dash-after-double-dash-in-comment
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <!-- - -->
|
||||
| "BAZ"
|
||||
|
||||
#data
|
||||
<html><!-- comment --><title>Comment before head</title>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <!-- comment -->
|
||||
| <head>
|
||||
| <title>
|
||||
| "Comment before head"
|
||||
| <body>
|
424
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/doctype01.dat
vendored
Normal file
424
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/doctype01.dat
vendored
Normal file
|
@ -0,0 +1,424 @@
|
|||
#data
|
||||
<!DOCTYPE html>Hello
|
||||
#errors
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!dOctYpE HtMl>Hello
|
||||
#errors
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPEhtml>Hello
|
||||
#errors
|
||||
(1,9): need-space-after-doctype
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE>Hello
|
||||
#errors
|
||||
(1,9): need-space-after-doctype
|
||||
(1,10): expected-doctype-name-but-got-right-bracket
|
||||
(1,10): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE >
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE >Hello
|
||||
#errors
|
||||
(1,11): expected-doctype-name-but-got-right-bracket
|
||||
(1,11): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE >
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato>Hello
|
||||
#errors
|
||||
(1,17): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato >Hello
|
||||
#errors
|
||||
(1,18): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato taco>Hello
|
||||
#errors
|
||||
(1,17): expected-space-or-right-bracket-in-doctype
|
||||
(1,22): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato taco "ddd>Hello
|
||||
#errors
|
||||
(1,17): expected-space-or-right-bracket-in-doctype
|
||||
(1,27): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato sYstEM>Hello
|
||||
#errors
|
||||
(1,24): unexpected-char-in-doctype
|
||||
(1,24): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato sYstEM >Hello
|
||||
#errors
|
||||
(1,28): unexpected-char-in-doctype
|
||||
(1,28): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato sYstEM ggg>Hello
|
||||
#errors
|
||||
(1,34): unexpected-char-in-doctype
|
||||
(1,37): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato SYSTEM taco >Hello
|
||||
#errors
|
||||
(1,25): unexpected-char-in-doctype
|
||||
(1,31): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato SYSTEM 'taco"'>Hello
|
||||
#errors
|
||||
(1,32): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato "" "taco"">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato SYSTEM "taco">Hello
|
||||
#errors
|
||||
(1,31): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato "" "taco">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato SYSTEM "tai'co">Hello
|
||||
#errors
|
||||
(1,33): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato "" "tai'co">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato SYSTEMtaco "ddd">Hello
|
||||
#errors
|
||||
(1,24): unexpected-char-in-doctype
|
||||
(1,34): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato grass SYSTEM taco>Hello
|
||||
#errors
|
||||
(1,17): expected-space-or-right-bracket-in-doctype
|
||||
(1,35): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato pUbLIc>Hello
|
||||
#errors
|
||||
(1,24): unexpected-end-of-doctype
|
||||
(1,24): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato pUbLIc >Hello
|
||||
#errors
|
||||
(1,25): unexpected-end-of-doctype
|
||||
(1,25): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato pUbLIcgoof>Hello
|
||||
#errors
|
||||
(1,24): unexpected-char-in-doctype
|
||||
(1,28): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato PUBLIC goof>Hello
|
||||
#errors
|
||||
(1,25): unexpected-char-in-doctype
|
||||
(1,29): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato PUBLIC "go'of">Hello
|
||||
#errors
|
||||
(1,32): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato "go'of" "">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato PUBLIC 'go'of'>Hello
|
||||
#errors
|
||||
(1,29): unexpected-char-in-doctype
|
||||
(1,32): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato "go" "">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato PUBLIC 'go:hh of' >Hello
|
||||
#errors
|
||||
(1,38): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato "go:hh of" "">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE potato PUBLIC "W3C-//dfdf" SYSTEM ggg>Hello
|
||||
#errors
|
||||
(1,38): unexpected-char-in-doctype
|
||||
(1,48): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE potato "W3C-//dfdf" "">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
||||
"http://www.w3.org/TR/html4/strict.dtd">Hello
|
||||
#errors
|
||||
#document
|
||||
| <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE ...>Hello
|
||||
#errors
|
||||
(1,14): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE ...>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "Hello"
|
||||
|
||||
#data
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
#errors
|
||||
(2,58): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
|
||||
#errors
|
||||
(2,54): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<!DOCTYPE root-element [SYSTEM OR PUBLIC FPI] "uri" [
|
||||
<!-- internal declarations -->
|
||||
]>
|
||||
#errors
|
||||
(1,23): expected-space-or-right-bracket-in-doctype
|
||||
(2,30): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE root-element>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "]>"
|
||||
|
||||
#data
|
||||
<!DOCTYPE html PUBLIC
|
||||
"-//WAPFORUM//DTD XHTML Mobile 1.0//EN"
|
||||
"http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
|
||||
#errors
|
||||
(3,53): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<!DOCTYPE HTML SYSTEM "http://www.w3.org/DTD/HTML4-strict.dtd"><body><b>Mine!</b></body>
|
||||
#errors
|
||||
(1,63): unknown-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "" "http://www.w3.org/DTD/HTML4-strict.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <b>
|
||||
| "Mine!"
|
||||
|
||||
#data
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">
|
||||
#errors
|
||||
(1,50): unexpected-char-in-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"'http://www.w3.org/TR/html4/strict.dtd'>
|
||||
#errors
|
||||
(1,50): unexpected-char-in-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<!DOCTYPE HTML PUBLIC"-//W3C//DTD HTML 4.01//EN"'http://www.w3.org/TR/html4/strict.dtd'>
|
||||
#errors
|
||||
(1,21): unexpected-char-in-doctype
|
||||
(1,49): unexpected-char-in-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<!DOCTYPE HTML PUBLIC'-//W3C//DTD HTML 4.01//EN''http://www.w3.org/TR/html4/strict.dtd'>
|
||||
#errors
|
||||
(1,21): unexpected-char-in-doctype
|
||||
(1,49): unexpected-char-in-doctype
|
||||
#document
|
||||
| <!DOCTYPE html "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
Binary file not shown.
723
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/entities01.dat
vendored
Normal file
723
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/entities01.dat
vendored
Normal file
|
@ -0,0 +1,723 @@
|
|||
#data
|
||||
FOO>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO>BAR"
|
||||
|
||||
#data
|
||||
FOO>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,6): named-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO>BAR"
|
||||
|
||||
#data
|
||||
FOO> BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,6): named-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO> BAR"
|
||||
|
||||
#data
|
||||
FOO>;;BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO>;;BAR"
|
||||
|
||||
#data
|
||||
I'm ¬it; I tell you
|
||||
#errors
|
||||
(1,4): expected-doctype-but-got-chars
|
||||
(1,9): named-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "I'm ¬it; I tell you"
|
||||
|
||||
#data
|
||||
I'm ∉ I tell you
|
||||
#errors
|
||||
(1,4): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "I'm ∉ I tell you"
|
||||
|
||||
#data
|
||||
FOO& BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO& BAR"
|
||||
|
||||
#data
|
||||
FOO&<BAR>
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,9): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO&"
|
||||
| <bar>
|
||||
|
||||
#data
|
||||
FOO&&&>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO&&&>BAR"
|
||||
|
||||
#data
|
||||
FOO)BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO)BAR"
|
||||
|
||||
#data
|
||||
FOOABAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOABAR"
|
||||
|
||||
#data
|
||||
FOOABAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOABAR"
|
||||
|
||||
#data
|
||||
FOO&#BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,5): expected-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO&#BAR"
|
||||
|
||||
#data
|
||||
FOO&#ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,5): expected-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO&#ZOO"
|
||||
|
||||
#data
|
||||
FOOºR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,7): expected-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOºR"
|
||||
|
||||
#data
|
||||
FOO&#xZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,6): expected-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO&#xZOO"
|
||||
|
||||
#data
|
||||
FOO&#XZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,6): expected-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO&#XZOO"
|
||||
|
||||
#data
|
||||
FOO)BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,7): numeric-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO)BAR"
|
||||
|
||||
#data
|
||||
FOO䆺R
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,10): numeric-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO䆺R"
|
||||
|
||||
#data
|
||||
FOOAZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,8): numeric-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOAZOO"
|
||||
|
||||
#data
|
||||
FOO�ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO<4F>ZOO"
|
||||
|
||||
#data
|
||||
FOOxZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOxZOO"
|
||||
|
||||
#data
|
||||
FOOyZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOyZOO"
|
||||
|
||||
#data
|
||||
FOO€ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO€ZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOO‚ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO‚ZOO"
|
||||
|
||||
#data
|
||||
FOOƒZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOƒZOO"
|
||||
|
||||
#data
|
||||
FOO„ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO„ZOO"
|
||||
|
||||
#data
|
||||
FOO…ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO…ZOO"
|
||||
|
||||
#data
|
||||
FOO†ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO†ZOO"
|
||||
|
||||
#data
|
||||
FOO‡ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO‡ZOO"
|
||||
|
||||
#data
|
||||
FOOˆZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOˆZOO"
|
||||
|
||||
#data
|
||||
FOO‰ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO‰ZOO"
|
||||
|
||||
#data
|
||||
FOOŠZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOŠZOO"
|
||||
|
||||
#data
|
||||
FOO‹ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO‹ZOO"
|
||||
|
||||
#data
|
||||
FOOŒZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOŒZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOOŽZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOŽZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOO‘ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO‘ZOO"
|
||||
|
||||
#data
|
||||
FOO’ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO’ZOO"
|
||||
|
||||
#data
|
||||
FOO“ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO“ZOO"
|
||||
|
||||
#data
|
||||
FOO”ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO”ZOO"
|
||||
|
||||
#data
|
||||
FOO•ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO•ZOO"
|
||||
|
||||
#data
|
||||
FOO–ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO–ZOO"
|
||||
|
||||
#data
|
||||
FOO—ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO—ZOO"
|
||||
|
||||
#data
|
||||
FOO˜ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO˜ZOO"
|
||||
|
||||
#data
|
||||
FOO™ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO™ZOO"
|
||||
|
||||
#data
|
||||
FOOšZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOšZOO"
|
||||
|
||||
#data
|
||||
FOO›ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO›ZOO"
|
||||
|
||||
#data
|
||||
FOOœZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOœZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOOžZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOžZOO"
|
||||
|
||||
#data
|
||||
FOOŸZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOŸZOO"
|
||||
|
||||
#data
|
||||
FOO ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO ZOO"
|
||||
|
||||
#data
|
||||
FOO퟿ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOO�ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO<4F>ZOO"
|
||||
|
||||
#data
|
||||
FOO�ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO<4F>ZOO"
|
||||
|
||||
#data
|
||||
FOO�ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO<4F>ZOO"
|
||||
|
||||
#data
|
||||
FOO�ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,11): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO<4F>ZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,13): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOO􈟔ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOOZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,13): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOOZOO"
|
||||
|
||||
#data
|
||||
FOO�ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,13): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO<4F>ZOO"
|
||||
|
||||
#data
|
||||
FOO�ZOO
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,13): illegal-codepoint-for-numeric-entity
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO<4F>ZOO"
|
283
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/entities02.dat
vendored
Normal file
283
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/entities02.dat
vendored
Normal file
|
@ -0,0 +1,283 @@
|
|||
#data
|
||||
<div bar="ZZ>YY"></div>
|
||||
#errors
|
||||
(1,20): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>YY"
|
||||
|
||||
#data
|
||||
<div bar="ZZ&"></div>
|
||||
#errors
|
||||
(1,15): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ&"
|
||||
|
||||
#data
|
||||
<div bar='ZZ&'></div>
|
||||
#errors
|
||||
(1,15): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ&"
|
||||
|
||||
#data
|
||||
<div bar=ZZ&></div>
|
||||
#errors
|
||||
(1,13): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ&"
|
||||
|
||||
#data
|
||||
<div bar="ZZ>=YY"></div>
|
||||
#errors
|
||||
(1,15): named-entity-without-semicolon
|
||||
(1,20): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>=YY"
|
||||
|
||||
#data
|
||||
<div bar="ZZ>0YY"></div>
|
||||
#errors
|
||||
(1,20): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>0YY"
|
||||
|
||||
#data
|
||||
<div bar="ZZ>9YY"></div>
|
||||
#errors
|
||||
(1,20): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>9YY"
|
||||
|
||||
#data
|
||||
<div bar="ZZ>aYY"></div>
|
||||
#errors
|
||||
(1,20): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>aYY"
|
||||
|
||||
#data
|
||||
<div bar="ZZ>ZYY"></div>
|
||||
#errors
|
||||
(1,20): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>ZYY"
|
||||
|
||||
#data
|
||||
<div bar="ZZ> YY"></div>
|
||||
#errors
|
||||
(1,15): named-entity-without-semicolon
|
||||
(1,20): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ> YY"
|
||||
|
||||
#data
|
||||
<div bar="ZZ>"></div>
|
||||
#errors
|
||||
(1,15): named-entity-without-semicolon
|
||||
(1,17): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>"
|
||||
|
||||
#data
|
||||
<div bar='ZZ>'></div>
|
||||
#errors
|
||||
(1,15): named-entity-without-semicolon
|
||||
(1,17): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>"
|
||||
|
||||
#data
|
||||
<div bar=ZZ>></div>
|
||||
#errors
|
||||
(1,14): named-entity-without-semicolon
|
||||
(1,15): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ>"
|
||||
|
||||
#data
|
||||
<div bar="ZZ£_id=23"></div>
|
||||
#errors
|
||||
(1,18): named-entity-without-semicolon
|
||||
(1,26): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ£_id=23"
|
||||
|
||||
#data
|
||||
<div bar="ZZ&prod_id=23"></div>
|
||||
#errors
|
||||
(1,25): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ&prod_id=23"
|
||||
|
||||
#data
|
||||
<div bar="ZZ£_id=23"></div>
|
||||
#errors
|
||||
(1,27): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ£_id=23"
|
||||
|
||||
#data
|
||||
<div bar="ZZ∏_id=23"></div>
|
||||
#errors
|
||||
(1,26): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ∏_id=23"
|
||||
|
||||
#data
|
||||
<div bar="ZZ£=23"></div>
|
||||
#errors
|
||||
(1,18): named-entity-without-semicolon
|
||||
(1,23): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ£=23"
|
||||
|
||||
#data
|
||||
<div bar="ZZ&prod=23"></div>
|
||||
#errors
|
||||
(1,22): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| bar="ZZ&prod=23"
|
||||
|
||||
#data
|
||||
<div>ZZ£_id=23</div>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
(1,13): named-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| "ZZ£_id=23"
|
||||
|
||||
#data
|
||||
<div>ZZ&prod_id=23</div>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| "ZZ&prod_id=23"
|
||||
|
||||
#data
|
||||
<div>ZZ£_id=23</div>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| "ZZ£_id=23"
|
||||
|
||||
#data
|
||||
<div>ZZ∏_id=23</div>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| "ZZ∏_id=23"
|
||||
|
||||
#data
|
||||
<div>ZZ£=23</div>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
(1,13): named-entity-without-semicolon
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| "ZZ£=23"
|
||||
|
||||
#data
|
||||
<div>ZZ&prod=23</div>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| "ZZ&prod=23"
|
|
@ -0,0 +1,291 @@
|
|||
#data
|
||||
<div<div>
|
||||
#errors
|
||||
(1,9): expected-doctype-but-got-start-tag
|
||||
(1,9): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div<div>
|
||||
|
||||
#data
|
||||
<div foo<bar=''>
|
||||
#errors
|
||||
(1,9): invalid-character-in-attribute-name
|
||||
(1,16): expected-doctype-but-got-start-tag
|
||||
(1,16): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| foo<bar=""
|
||||
|
||||
#data
|
||||
<div foo=`bar`>
|
||||
#errors
|
||||
(1,10): equals-in-unquoted-attribute-value
|
||||
(1,14): unexpected-character-in-unquoted-attribute-value
|
||||
(1,15): expected-doctype-but-got-start-tag
|
||||
(1,15): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| foo="`bar`"
|
||||
|
||||
#data
|
||||
<div \"foo=''>
|
||||
#errors
|
||||
(1,7): invalid-character-in-attribute-name
|
||||
(1,14): expected-doctype-but-got-start-tag
|
||||
(1,14): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
| \"foo=""
|
||||
|
||||
#data
|
||||
<a href='\nbar'></a>
|
||||
#errors
|
||||
(1,16): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <a>
|
||||
| href="\nbar"
|
||||
|
||||
#data
|
||||
<!DOCTYPE html>
|
||||
#errors
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
⟨⟩
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "⟨⟩"
|
||||
|
||||
#data
|
||||
'
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "'"
|
||||
|
||||
#data
|
||||
ⅈ
|
||||
#errors
|
||||
(1,12): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "ⅈ"
|
||||
|
||||
#data
|
||||
𝕂
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "𝕂"
|
||||
|
||||
#data
|
||||
∉
|
||||
#errors
|
||||
(1,9): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "∉"
|
||||
|
||||
#data
|
||||
<?import namespace="foo" implementation="#bar">
|
||||
#errors
|
||||
(1,1): expected-tag-name-but-got-question-mark
|
||||
(1,47): expected-doctype-but-got-eof
|
||||
#document
|
||||
| <!-- ?import namespace="foo" implementation="#bar" -->
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<!--foo--bar-->
|
||||
#errors
|
||||
(1,10): unexpected-char-in-comment
|
||||
(1,15): expected-doctype-but-got-eof
|
||||
#document
|
||||
| <!-- foo--bar -->
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<![CDATA[x]]>
|
||||
#errors
|
||||
(1,2): expected-dashes-or-doctype
|
||||
(1,13): expected-doctype-but-got-eof
|
||||
#document
|
||||
| <!-- [CDATA[x]] -->
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
|
||||
#data
|
||||
<textarea><!--</textarea>--></textarea>
|
||||
#errors
|
||||
(1,10): expected-doctype-but-got-start-tag
|
||||
(1,39): unexpected-end-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <textarea>
|
||||
| "<!--"
|
||||
| "-->"
|
||||
|
||||
#data
|
||||
<textarea><!--</textarea>-->
|
||||
#errors
|
||||
(1,10): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <textarea>
|
||||
| "<!--"
|
||||
| "-->"
|
||||
|
||||
#data
|
||||
<style><!--</style>--></style>
|
||||
#errors
|
||||
(1,7): expected-doctype-but-got-start-tag
|
||||
(1,30): unexpected-end-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <style>
|
||||
| "<!--"
|
||||
| <body>
|
||||
| "-->"
|
||||
|
||||
#data
|
||||
<style><!--</style>-->
|
||||
#errors
|
||||
(1,7): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <style>
|
||||
| "<!--"
|
||||
| <body>
|
||||
| "-->"
|
||||
|
||||
#data
|
||||
<ul><li>A </li> <li>B</li></ul>
|
||||
#errors
|
||||
(1,4): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ul>
|
||||
| <li>
|
||||
| "A "
|
||||
| " "
|
||||
| <li>
|
||||
| "B"
|
||||
|
||||
#data
|
||||
<table><form><input type=hidden><input></form><div></div></table>
|
||||
#errors
|
||||
(1,7): expected-doctype-but-got-start-tag
|
||||
(1,13): unexpected-form-in-table
|
||||
(1,32): unexpected-hidden-input-in-table
|
||||
(1,39): unexpected-start-tag-implies-table-voodoo
|
||||
(1,46): unexpected-end-tag-implies-table-voodoo
|
||||
(1,46): unexpected-end-tag
|
||||
(1,51): unexpected-start-tag-implies-table-voodoo
|
||||
(1,57): unexpected-end-tag-implies-table-voodoo
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <input>
|
||||
| <div>
|
||||
| <table>
|
||||
| <form>
|
||||
| <input>
|
||||
| type="hidden"
|
||||
|
||||
#data
|
||||
<i>A<b>B<p></i>C</b>D
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-start-tag
|
||||
(1,15): adoption-agency-1.3
|
||||
(1,20): adoption-agency-1.3
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <i>
|
||||
| "A"
|
||||
| <b>
|
||||
| "B"
|
||||
| <b>
|
||||
| <p>
|
||||
| <b>
|
||||
| <i>
|
||||
| "C"
|
||||
| "D"
|
||||
|
||||
#data
|
||||
<div></div>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <div>
|
||||
|
||||
#data
|
||||
<svg></svg>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <svg svg>
|
||||
|
||||
#data
|
||||
<math></math>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <math math>
|
|
@ -0,0 +1,54 @@
|
|||
#data
|
||||
<button>1</foo>
|
||||
#errors
|
||||
(1,8): expected-doctype-but-got-start-tag
|
||||
(1,15): unexpected-end-tag
|
||||
(1,15): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <button>
|
||||
| "1"
|
||||
|
||||
#data
|
||||
<foo>1<p>2</foo>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
(1,16): unexpected-end-tag
|
||||
(1,16): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <foo>
|
||||
| "1"
|
||||
| <p>
|
||||
| "2"
|
||||
|
||||
#data
|
||||
<dd>1</foo>
|
||||
#errors
|
||||
(1,4): expected-doctype-but-got-start-tag
|
||||
(1,11): unexpected-end-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <dd>
|
||||
| "1"
|
||||
|
||||
#data
|
||||
<foo>1<dd>2</foo>
|
||||
#errors
|
||||
(1,5): expected-doctype-but-got-start-tag
|
||||
(1,17): unexpected-end-tag
|
||||
(1,17): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <foo>
|
||||
| "1"
|
||||
| <dd>
|
||||
| "2"
|
|
@ -0,0 +1,47 @@
|
|||
#data
|
||||
<isindex>
|
||||
#errors
|
||||
(1,9): expected-doctype-but-got-start-tag
|
||||
(1,9): deprecated-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <form>
|
||||
| <hr>
|
||||
| <label>
|
||||
| "This is a searchable index. Enter search keywords: "
|
||||
| <input>
|
||||
| name="isindex"
|
||||
| <hr>
|
||||
|
||||
#data
|
||||
<isindex name="A" action="B" prompt="C" foo="D">
|
||||
#errors
|
||||
(1,48): expected-doctype-but-got-start-tag
|
||||
(1,48): deprecated-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <form>
|
||||
| action="B"
|
||||
| <hr>
|
||||
| <label>
|
||||
| "C"
|
||||
| <input>
|
||||
| foo="D"
|
||||
| name="isindex"
|
||||
| <hr>
|
||||
|
||||
#data
|
||||
<form><isindex>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
(1,15): deprecated-tag
|
||||
(1,15): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <form>
|
|
@ -0,0 +1,46 @@
|
|||
#data
|
||||
<!doctype html><p>foo<main>bar<p>baz
|
||||
#errors
|
||||
(1,36): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <p>
|
||||
| "foo"
|
||||
| <main>
|
||||
| "bar"
|
||||
| <p>
|
||||
| "baz"
|
||||
|
||||
#data
|
||||
<!doctype html><main><p>foo</main>bar
|
||||
#errors
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <main>
|
||||
| <p>
|
||||
| "foo"
|
||||
| "bar"
|
||||
|
||||
#data
|
||||
<!DOCTYPE html>xxx<svg><x><g><a><main><b>
|
||||
#errors
|
||||
* (1,42) unexpected HTML-like start tag token in foreign content
|
||||
* (1,42) unexpected end of file
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "xxx"
|
||||
| <svg svg>
|
||||
| <svg x>
|
||||
| <svg g>
|
||||
| <svg a>
|
||||
| <svg main>
|
||||
| <b>
|
Binary file not shown.
|
@ -0,0 +1,46 @@
|
|||
#data
|
||||
<input type="hidden"><frameset>
|
||||
#errors
|
||||
(1,21): expected-doctype-but-got-start-tag
|
||||
(1,31): unexpected-start-tag
|
||||
(1,31): eof-in-frameset
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <frameset>
|
||||
|
||||
#data
|
||||
<!DOCTYPE html><table><caption><svg>foo</table>bar
|
||||
#errors
|
||||
(1,47): unexpected-end-tag
|
||||
(1,47): end-table-tag-in-caption
|
||||
#document
|
||||
| <!DOCTYPE html>
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <table>
|
||||
| <caption>
|
||||
| <svg svg>
|
||||
| "foo"
|
||||
| "bar"
|
||||
|
||||
#data
|
||||
<table><tr><td><svg><desc><td></desc><circle>
|
||||
#errors
|
||||
(1,7): expected-doctype-but-got-start-tag
|
||||
(1,30): unexpected-cell-end-tag
|
||||
(1,37): unexpected-end-tag
|
||||
(1,45): expected-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <table>
|
||||
| <tbody>
|
||||
| <tr>
|
||||
| <td>
|
||||
| <svg svg>
|
||||
| <svg desc>
|
||||
| <td>
|
||||
| <circle>
|
Binary file not shown.
298
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/ruby.dat
vendored
Normal file
298
tests/wpt/web-platform-tests/tools/html5lib/html5lib/tests/testdata/tree-construction/ruby.dat
vendored
Normal file
|
@ -0,0 +1,298 @@
|
|||
#data
|
||||
<html><ruby>a<rb>b<rb></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rb>
|
||||
| "b"
|
||||
| <rb>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rb>b<rt></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rb>
|
||||
| "b"
|
||||
| <rt>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rb>b<rtc></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rb>
|
||||
| "b"
|
||||
| <rtc>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rb>b<rp></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rb>
|
||||
| "b"
|
||||
| <rp>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rb>b<span></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rb>
|
||||
| "b"
|
||||
| <span>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rt>b<rb></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rt>
|
||||
| "b"
|
||||
| <rb>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rt>b<rt></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rt>
|
||||
| "b"
|
||||
| <rt>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rt>b<rtc></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rt>
|
||||
| "b"
|
||||
| <rtc>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rt>b<rp></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rt>
|
||||
| "b"
|
||||
| <rp>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rt>b<span></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rt>
|
||||
| "b"
|
||||
| <span>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rtc>b<rb></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rtc>
|
||||
| "b"
|
||||
| <rb>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rtc>b<rt>c<rt>d</ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rtc>
|
||||
| "b"
|
||||
| <rt>
|
||||
| "c"
|
||||
| <rt>
|
||||
| "d"
|
||||
|
||||
#data
|
||||
<html><ruby>a<rtc>b<rtc></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rtc>
|
||||
| "b"
|
||||
| <rtc>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rtc>b<rp></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rtc>
|
||||
| "b"
|
||||
| <rp>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rtc>b<span></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rtc>
|
||||
| "b"
|
||||
| <span>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rp>b<rb></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rp>
|
||||
| "b"
|
||||
| <rb>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rp>b<rt></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rp>
|
||||
| "b"
|
||||
| <rt>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rp>b<rtc></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rp>
|
||||
| "b"
|
||||
| <rtc>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rp>b<rp></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rp>
|
||||
| "b"
|
||||
| <rp>
|
||||
|
||||
#data
|
||||
<html><ruby>a<rp>b<span></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rp>
|
||||
| "b"
|
||||
| <span>
|
||||
|
||||
#data
|
||||
<html><ruby><rtc><ruby>a<rb>b<rt></ruby></ruby></html>
|
||||
#errors
|
||||
(1,6): expected-doctype-but-got-start-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| <ruby>
|
||||
| <rtc>
|
||||
| <ruby>
|
||||
| "a"
|
||||
| <rb>
|
||||
| "b"
|
||||
| <rt>
|
|
@ -0,0 +1,365 @@
|
|||
#data
|
||||
FOO<script>'Hello'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'Hello'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script></script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script></script >BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script></script/>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,21): self-closing-flag-on-end-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script></script/ >BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,20): unexpected-character-after-solidus-in-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain"></scriptx>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,42): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "</scriptx>BAR"
|
||||
|
||||
#data
|
||||
FOO<script></script foo=">" dd>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,31): attributes-in-end-tag
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!-'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!-'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!--'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!--'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!---'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!---'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!-->'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!-->'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!-->'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!-->'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!-- potato'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!-- potato'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!-- <sCrIpt'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!-- <sCrIpt'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt>'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,56): expected-script-data-but-got-eof
|
||||
(1,56): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt>'</script>BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt> -'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,58): expected-script-data-but-got-eof
|
||||
(1,58): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt> -'</script>BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt> --'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,59): expected-script-data-but-got-eof
|
||||
(1,59): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt> --'</script>BAR"
|
||||
|
||||
#data
|
||||
FOO<script>'<!-- <sCrIpt> -->'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "'<!-- <sCrIpt> -->'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt> --!>'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,61): expected-script-data-but-got-eof
|
||||
(1,61): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt> --!>'</script>BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt> -- >'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,61): expected-script-data-but-got-eof
|
||||
(1,61): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt> -- >'</script>BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt '</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,56): expected-script-data-but-got-eof
|
||||
(1,56): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt '</script>BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt/'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
(1,56): expected-script-data-but-got-eof
|
||||
(1,56): expected-named-closing-tag-but-got-eof
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt/'</script>BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt\'</script>BAR
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt\'"
|
||||
| "BAR"
|
||||
|
||||
#data
|
||||
FOO<script type="text/plain">'<!-- <sCrIpt/'</script>BAR</script>QUX
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| type="text/plain"
|
||||
| "'<!-- <sCrIpt/'</script>BAR"
|
||||
| "QUX"
|
||||
|
||||
#data
|
||||
FOO<script><!--<script>-></script>--></script>QUX
|
||||
#errors
|
||||
(1,3): expected-doctype-but-got-chars
|
||||
#document
|
||||
| <html>
|
||||
| <head>
|
||||
| <body>
|
||||
| "FOO"
|
||||
| <script>
|
||||
| "<!--<script>-></script>-->"
|
||||
| "QUX"
|
Some files were not shown because too many files have changed in this diff Show more
Loading…
Add table
Add a link
Reference in a new issue