mach: Make test-tidy line length check Unicode-aware (#38335)

Currently, each line-checking function receives the file contents as
bytes, so we need to decode each line from UTF-8 before measuring it.
This ensures the length is counted in characters rather than in bytes.
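
To illustrate why the byte count and the character count can disagree, here is a minimal sketch. The sample line and the names `sample_line`, `byte_length`, and `char_length` are hypothetical and are not taken from the Servo tree; only the 120-character limit comes from the diff in this commit.

```python
# Hypothetical sample: a comment made of 100 copies of U+00E9 (e with acute),
# each of which occupies two bytes when encoded as UTF-8.
sample_line = ("# " + "\u00e9" * 100 + "\n").encode("utf-8")

max_length = 120  # limit used for non-shell files in the diff

# Old behaviour: measure the raw bytes.
byte_length = len(sample_line.rstrip(b"\n"))

# New behaviour: decode first, then measure the resulting string.
char_length = len(sample_line.decode("utf-8").rstrip("\n"))

print(byte_length > max_length)  # the byte count exceeds the limit
print(char_length > max_length)  # the character count does not
```

Note that `len` on a Python `str` counts Unicode code points, so this is closer to what a reader sees than the byte count, but it is still not the same as display width or grapheme count.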

Testing: I tested this by changing a comment as described in the issue,
and the check does not report an error.
Fixes: #38237

Signed-off-by: Jerens Lensun <jerensslensun@gmail.com>
Jerens Lensun 2025-07-29 22:46:32 +08:00 committed by GitHub
parent eda83564f3
commit c738bbc41c

@@ -319,7 +319,7 @@ def check_length(file_name: str, idx: int, line: bytes) -> Iterator[tuple[int, s
     # Prefer shorter lines when shell scripting.
     max_length = 80 if file_name.endswith(".sh") else 120
-    if len(line.rstrip(b"\n")) > max_length and not is_unsplittable(file_name, line):
+    if len(line.decode("utf-8").rstrip("\n")) > max_length and not is_unsplittable(file_name, line):
         yield (idx + 1, "Line is longer than %d characters" % max_length)