Fonts 2025 queries #4175
@tunetheweb, I think you were the one reviewing the queries last year. If you do not mind, I would like to invite you to review this year, too, but please feel free to assign someone else. This year, we did not change anything. We just migrated the queries to
LGTM with some non-blocking comments.
(The linter is failing due to the code elsewhere.)
Fixing in #4196
That's fixed in After that are you good to merge this?
Thank you. Rebased. Well, I have not received any feedback from the lead. I would merge, if you are OK with potential follow-up PRs.
Yeah, let's do that.
Makes progress on #4073
Fonts
Resources
Structure
The queries are split by the section where they are used:
design/ is about foundries and families,
development/ is about tools and technologies, and
performance/ is about hosting and serving.
Each file name starts with one of the following prefixes indicating the primary subject of the corresponding analysis:
fonts_ is about font files,
pages_ is about HTML pages,
scripts_ is about JavaScript scripts, and
styles_ is about CSS style sheets.
The prefix is followed by the property studied, given in the singular and potentially extended by one or several suffixes narrowing down the scope, as in fonts_size_by_table.sql and pages_link_relation.sql.
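The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the repository: describe_query is a hypothetical helper, and placing fonts_size_by_table.sql under performance/ in the usage example is an assumption made only for illustration. Because the text does not say where the property ends and its suffixes begin, everything after the prefix is returned as a single string.

```python
from pathlib import PurePosixPath

# Sections and prefixes exactly as listed in the text above.
SECTIONS = {"design", "development", "performance"}
PREFIXES = {
    "fonts": "font files",
    "pages": "HTML pages",
    "scripts": "JavaScript scripts",
    "styles": "CSS style sheets",
}


def describe_query(path):
    """Split a query path into its section, subject, and property part.

    Hypothetical helper for illustration; it only validates the naming
    convention and does not read the file.
    """
    query = PurePosixPath(path)
    section = query.parent.name
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    if query.suffix != ".sql":
        raise ValueError(f"not an SQL file: {query.name!r}")
    # The prefix is everything before the first underscore.
    prefix, _, rest = query.stem.partition("_")
    if prefix not in PREFIXES or not rest:
        raise ValueError(f"unexpected file name: {query.name!r}")
    return {"section": section, "subject": PREFIXES[prefix], "property": rest}
```

For example, describe_query("performance/fonts_size_by_table.sql") reports the subject as font files and the property part as size_by_table.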
Content
Each query starts with a preamble indicating the section, question, and normalization type, as illustrated below:
Many queries rely on temporary functions for convenience and clarity. The functions that appear in several queries are extracted into a common file called common.sql. Whenever any of the functions defined in common.sql is used by a query, the query has the following pseudo-directive at the top:
-- INCLUDE https://github.com/HTTPArchive/almanac.httparchive.org/blob/main/sql/{year}/fonts/common.sql
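Expanding such a pseudo-directive is a plain textual substitution. The sketch below is an illustration under assumptions, not the logic of execute.py: expand_includes and its load callback are hypothetical names, and how the URL (including its {year} placeholder) maps to an actual copy of common.sql is left to the caller.

```python
import re

# Matches a whole line of the form "-- INCLUDE <url>" without consuming
# the line break, so the rest of the query keeps its layout.
INCLUDE_LINE = re.compile(r"^--[ \t]*INCLUDE[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def expand_includes(query, load):
    """Replace each INCLUDE pseudo-directive with the text of the included file.

    `load` is a caller-supplied function (an assumption of this sketch) that
    takes the URL from the directive and returns the file content as a string.
    """
    return INCLUDE_LINE.sub(
        lambda match: load(match.group(1)).rstrip("\n"),
        query,
    )
```

A caller with a local checkout could pass a loader that ignores the URL and reads common.sql from disk. Queries without the directive pass through unchanged.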
The pseudo-directive has to be replaced with the content of common.sql prior to executing the query in question.
In addition, queries generally have parameters, as in @date, so as to be able to run them for different configurations. The values for the parameters will have to be supplied upon execution.
All of the above is taken care of automatically if the queries are executed using execute.py, which we discuss next.
Execution
The queries can be executed using the execute.py script. The results are first saved in local CSV files sitting next to the SQL files and then uploaded to the spreadsheet. In the spreadsheet, for each query, a separate sheet is created and named after the question the query answers, which is given in its preamble. If the CSV file already exists, the corresponding query is not executed. If cell A1 is already populated, the corresponding sheet is not updated.
First, ensure that the Application Default Credentials authorization strategy is configured, and that the HTTP Archive project is used as the quota project:
Second, install the Python prerequisites for the script:
The script can be run for all or a subset of the queries as illustrated below:
By default, it operates in a dry-run mode: it does not run the queries but prints an estimate of the amount of data that would be processed by each query. To actually run the queries, pass the --no-dry-run option as follows:
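Separately from the command-line usage, it may help to see where a dry-run estimate of this kind can come from. The sketch below is not taken from execute.py. It assumes that the queries run on BigQuery through the google-cloud-bigquery package, and that @date is a DATE parameter, which the text above does not state. The names estimate_bytes and format_bytes are hypothetical.

```python
def format_bytes(count):
    """Render a byte count with a binary unit, for example 1536 as '1.5 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(count)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def estimate_bytes(sql, date):
    """Ask BigQuery how many bytes a query would process, without running it."""
    # Imported lazily so that format_bytes works without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # relies on Application Default Credentials
    config = bigquery.QueryJobConfig(
        dry_run=True,           # validate and estimate only
        use_query_cache=False,  # a cached result would hide the real cost
        # Assumption: @date is of type DATE; adjust if the queries differ.
        query_parameters=[bigquery.ScalarQueryParameter("date", "DATE", date)],
    )
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

With credentials configured, format_bytes(estimate_bytes(sql, some_date)) would give a readable estimate for one query, where sql is the query text with any INCLUDE pseudo-directive already expanded and some_date is a datetime.date chosen by the caller.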