Improve Unicode script (#881) #883
str += node.textContent;
}
- return str;
+ return trimText(str);
Calling `trimText` here introduces a side effect: a few pages (e.g., UTS37) separate authors with line breaks, and `trimText` replaces every run of whitespace and line terminators with a single space. The `parseEditor` function is then unable to split authors, as `\n` no longer matches anything.
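To make the side effect concrete, here is a minimal sketch. Both functions are hypothetical stand-ins written from the description above, not the actual `trimText` or `parseEditor` code from the script, and the author names are placeholders.

```javascript
// Hypothetical stand-in for trimText, based only on the description above:
// collapse every run of whitespace (line terminators included) to one space.
function trimText(str) {
  return str.replace(/\s+/g, " ").trim();
}

// Hypothetical stand-in for the author-splitting step of parseEditor:
// one author per line.
function splitAuthors(text) {
  return text
    .split("\n")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Placeholder input: two authors separated by a line break.
const raw = "First Author\nSecond Author\n";

const withoutTrim = splitAuthors(raw);        // two entries
const withTrim = splitAuthors(trimText(raw)); // one merged entry

console.log(withoutTrim);
console.log(withTrim);
```

Once the line breaks are collapsed, the split has nothing left to match and both names end up in a single entry.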
I see. The reason for the change is that some documents have inline coloring via `<span>`s.
For example, TR35-3 has its version set to:
1<span>.</span><span class="changed">2 (draft 5)</span>
The original version caused it to become:
1 . 2 (draft 5)
I will think of a better implementation that works for all cases.
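One possible direction, sketched here under assumptions and not necessarily what the PR ended up doing: concatenate the fragments without a separator, then normalize spaces and tabs line by line so that line breaks survive for the author split. `joinTextNodes` is a hypothetical helper, and `parts` stands in for the `textContent` of successive child nodes, assumed to carry no stray whitespace of their own in the version case.

```javascript
// Hypothetical helper, not part of the actual script.
// `parts` stands in for the textContent of successive child nodes.
function joinTextNodes(parts) {
  return parts
    .join("")                  // no separator: "1" + "." + "2 (draft 5)" stays "1.2 (draft 5)"
    .split("\n")               // treat each line separately so line breaks are preserved
    .map((line) => line.replace(/[ \t]+/g, " ").trim()) // collapse spaces/tabs only
    .filter((line) => line.length > 0)                  // drop empty lines
    .join("\n");
}

// Version split across inline <span>s.
const version = joinTextNodes(["1", ".", "2 (draft 5)"]);

// Authors separated by line breaks (placeholder names).
const authors = joinTextNodes(["  First Author \n", "  Second   Author\n"]);

console.log(version);
console.log(authors.split("\n"));
```

The design choice is to keep `\n` as the only meaningful separator, so a later split on line breaks still works while inline fragments are glued back together.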
Overhauled the script to extract all available revisions for each of the standards, so it is possible to link to a specific one. The main URL for each Unicode standard now also points to the latest version live on their website.
The update drops |
I can add that, no problem, but I'm not sure it's a good idea in general for versioned entries when not referencing any version in particular. If writers want to explicitly state the last version they checked for compatibility alongside its date, they can now reference a particular version.
For a non-specific version, however, the date would cause the documents referring to it to also change the date any time they're recompiled, even if the writer has not actually checked that the newer version is fully compatible with their documentation. For example, UTS46-33 made some changes to the processing that were not covered in the WHATWG URL spec at the time, and it needed some changes (whatwg/url#836). With the date there, any recompilation of the WHATWG URL document between the publication of the new UTS46-33 and the amending of the WHATWG URL standard would also cause the date to be updated, incorrectly implying that the UTS46-33 changes had already been taken into account.
IMO if they want to specify a non-specific version with a check date, that should be manually stated by the writer, as the compilation time will be later than the time they've checked it, and the |
I guess the argument goes both ways. That is, without any mention of date, you also imply that the latest version you're going to get when you retrieve the URL was the one taken into account. That's what you get when you choose to reference "the latest version of a spec". With a date, you could at least theoretically speaking spot the fact that the document you're referencing has changed when you re-build your spec.
That said, it seems wrong to use the latest URL along with a date and then, when you click on the link, you actually get a version published at a later date. It might actually explain why the script had been written this way: the URL and date at the root level were aligned, editors who care about dates could have both a way not to worry about versions when they edited their spec ("just use the latest one") and an automated pinning mechanism when they published their spec.
In short, I think you're right: if we're going to use a non-specific URL, dropping the date does indeed seem better.
I'm approving the PR but not merging immediately to leave a bit of time for other possible reviewers to chime in if they feel strongly about the direction here.
I think we should leave the date for the reason @tidoust mentioned here:
Would it make sense to do that outside the extraction script, though? Sort of |
Possibly. Would argue doing this in a separate PR, though |
I guess that could be done in https://github.com/tobie/specref/blob/main/lib/bibref.js#L263-L270, eg if |