
@sorairolake

This pull request adds a category for crates that implement archive formats such as tar and ZIP.

There does not seem to be a category for archive formats. As a result, the following crates are either not assigned to any category or, for file formats that can both archive and compress (such as LHA/LZH), are assigned to the compression category.

Examples of crates

  • ar - a library for encoding/decoding Unix archive files.
  • cab - Read/write Windows cabinet (CAB) files.
  • cpio-archive - cpio archive reading and writing.
  • delharc - a library for parsing and extracting files from LHA/LZH archives.
  • ouch - a command-line utility for easily compressing and decompressing files and directories.
  • sevenz-rust2 - a 7z decompressor/compressor written in pure Rust.
  • tar - a Rust implementation of a TAR file reader and writer.
  • unrar - list and extract RAR archives.
  • zip - library to support the reading and writing of zip files.

@Turbo87 (Member) commented Sep 8, 2025

IMHO, since most archive formats include compression by default, I'm not sure we really need another top-level category for this 🤔

@rust-lang/crates-io thoughts?

@sorairolake (Author) commented Sep 8, 2025

There are many file formats that can both archive and compress, such as ZIP and 7z, but there are also some that can only archive, such as tar and cpio.[1] Since tar and cpio only archive, I don't think it's appropriate to assign them to the compression category. Whether or not they include compression, these formats share the common feature of archiving. So I think having a category for archive formats would make it easier to find implementations of them.

Footnotes

  1. https://en.wikipedia.org/wiki/List_of_archive_formats

@Turbo87 moved this to Backlog in crates.io team meetings Sep 12, 2025
@Turbo87 moved this from Backlog to For next meeting in crates.io team meetings Sep 12, 2025
@jtgeibel (Member)

We discussed this at yesterday's team meeting, and while we see how a distinction can be made between compression and archive formats, we don't feel this is broad enough to be a top-level category.

Since that meeting I've gone through your list above and looked through the existing category structure. Of the existing categories, I think this could fit under either of the following. The result counts are from a search for "archive" within each category.

  • parser-implementations - "Parsers implemented for particular formats or languages." - 57 results
  • encoding - "Encoding and/or decoding data from one data format to another." - 97 results

We would reconsider this as a subcategory.

Speaking for myself and not the team, I'd like to provide a bit more background context. A few years ago we decided not to remove the categories feature, but it's kind of in a soft deprecation state for now. A few reasons for that:

  • As seen above, the existing structure isn't entirely orthogonal. It's not clear to me how these two top-level categories differ. That's fine to some extent, but compared to keywords, the point of categories is to provide a curated structure, and since we don't have any mechanism to reorganize things over time, we've been setting a higher bar for new categories recently. (Also, it's hard to enforce a good structure here long term when additions happen one at a time and are low priority compared to other issues.)
  • When a new category is added, we don't have any mechanism (beyond someone opening PRs against each crate) to encourage authors to use the new category. I'm kind of interested to see how much usage new categories added over the last few years have actually received, but haven't had an opportunity to investigate.
  • Compared to keywords, not a lot of crates use the categories feature. Out of the 9 crates you list, 8 of them have keywords but only 2 have categories assigned. In total, they list 24 (non-distinct) keywords but only 4 (non-distinct) categories.

As a user I really like the discoverability of categories, but if popular crates don't opt in to the appropriate category, a category can just as easily push users toward a worse selection than a more general search would. In that sense, new categories have a steep hill to climb before they are a net value add, and I think that is one of the reasons the team has been more reluctant to add new categories recently. Again, I'm sure the team is willing to reconsider this as a subcategory, but even then we may not be able to reach consensus on adding it.

@Turbo87 (Member) commented Sep 15, 2025

Thanks for your comprehensive feedback, Justin! If we can find a good parent category for archives, then I'm open to discussing the proposal. Until then, I guess we can close this PR.

@Turbo87 closed this Sep 15, 2025
@sorairolake deleted the sorairolake-archives-category branch September 15, 2025 09:38