Handle nonmappable code points in Document::encoding_parse_a_url (#37541)

This is a followup to https://github.com/servo/servo/pull/33825. Using
`Encoder::encode` introduced a subtle bug: that function silently
replaces nonmappable code points (such as `㐀` in euc-jp) with HTML
numeric character references. The URL spec, however, expects nonmappable
code points to be handled differently. There is an open issue in the
`rust-url` repo about this: https://github.com/servo/rust-url/issues/649,
where the conclusion appears to be that this should not be implemented by
the url crate itself.
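
To illustrate the difference, here is a minimal std-only sketch of how the URL spec's "percent-encode after encoding" step treats a nonmappable code point: it emits the percent-encoded form of `&#NNNN;`, i.e. `%26%23`, the decimal code point, then `%3B`. This is not the code from this commit. `toy_encode`, `needs_escape` and `percent_encode_after_encoding` are hypothetical names, the ASCII-only encoder stands in for a real legacy encoder such as euc-jp, and the escape set is simplified rather than one of the spec's percent-encode sets.

```rust
/// Hypothetical stand-in for a legacy encoder: it maps only ASCII and
/// reports every other code point as nonmappable.
fn toy_encode(c: char) -> Result<u8, char> {
    if c.is_ascii() { Ok(c as u8) } else { Err(c) }
}

/// Simplified escape set for illustration only (not one of the spec's sets).
fn needs_escape(b: u8) -> bool {
    !(b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~'))
}

/// Sketch of "percent-encode after encoding" on top of the toy encoder.
fn percent_encode_after_encoding(input: &str) -> String {
    let mut out = String::new();
    for c in input.chars() {
        match toy_encode(c) {
            // Mappable byte that must be escaped: emit %XX.
            Ok(b) if needs_escape(b) => out.push_str(&format!("%{:02X}", b)),
            // Mappable byte that may appear literally.
            Ok(b) => out.push(b as char),
            // Nonmappable code point: percent-encoded "&#<decimal>;".
            Err(unmappable) => {
                out.push_str("%26%23");
                out.push_str(&(unmappable as u32).to_string());
                out.push_str("%3B");
            }
        }
    }
    out
}
```

With this sketch, `percent_encode_after_encoding("a㐀")` yields `a%26%2313312%3B`, since `㐀` is U+3400 (13312 in decimal). An encoder that substitutes the replacement itself never reports the error, so the caller cannot take this branch.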

Gecko implementation of the equivalent algorithm for reference:
https://searchfox.org/mozilla-central/rev/d52edf7ea4236446e118a2edc815023c5479663f/netwerk/base/nsStandardURL.cpp#116-172.

Testing: More web platform tests pass

Part of https://github.com/servo/servo/issues/5601

---------

Signed-off-by: Simon Wülker <simon.wuelker@arcor.de>
Simon Wülker 2025-06-19 12:14:45 +02:00 committed by GitHub
parent 3a54ddd034
commit a27c9ee691
14 changed files with 65 additions and 294148 deletions

@@ -6,6 +6,7 @@
#![crate_name = "servo_url"]
#![crate_type = "rlib"]
pub mod encoding;
pub mod origin;
use std::collections::hash_map::DefaultHasher;