Implement Compression and Extraction for Collection JSON Files #144
Conversation
- Added a new script `compress_collections.sh` to compress `items_by_ethscription.json` and `collections_by_name.json` into a single `collections_data.tar.gz` archive.
- Updated the `Erc721EthscriptionsCollectionParser` to check for the existence of the archive and extract JSON files if they are missing or outdated.
- Modified `.gitignore` to ignore uncompressed JSON files while keeping the compressed archive.
- Introduced logging for extraction processes to enhance traceability.
Pull Request Overview
This PR implements a compression and extraction system for collection JSON files to reduce repository size. The approach uses a tar.gz archive to store `items_by_ethscription.json` and `collections_by_name.json`, with automatic extraction when needed.
Key Changes:
- Added bash script to compress collection JSON files into a single archive
- Implemented automatic extraction logic in the parser when archive is present
- Updated contract metadata strings (name corrections and simplifications)
Reviewed Changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `compress_collections.sh` | New bash script that creates tar.gz archive from collection JSON files with compression stats |
| `app/models/erc721_ethscriptions_collection_parser.rb` | Added extraction logic to decompress archive on-demand and changed `ethscription_number` access to use `fetch` |
| `.gitignore` | Configured to ignore uncompressed JSON files while keeping the compressed archive |
| `lib/protocol_event_reader.rb` | Added `OwnershipTransferred` event parser for collection ownership changes |
| `contracts/src/NameRegistry.sol` | Fixed typo in description and updated collection name |
| `contracts/src/ERC20FixedDenominationManager.sol` | Simplified collection name/symbol initialization |
| `Dockerfile` | Broadened ownership permissions from specific directories to entire `/rails` directory |
```diff
- string.concat(deployOp.tick, " ERC-721"), // Collection name
- string.concat(deployOp.tick.upper(), "-ERC-721"), // Collection symbol
+ deployOp.tick, // Collection name
+ deployOp.tick.upper() // Collection symbol
```
Copilot AI · Nov 12, 2025
Missing comma after `deployOp.tick.upper()` on line 172, before the trailing comment. This will cause a compilation error, as the function call has 4 arguments but is missing a separator between the second and third arguments.
Suggested change:
```diff
- deployOp.tick.upper() // Collection symbol
+ deployOp.tick.upper(), // Collection symbol
```
```ruby
          end
          Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
        when 'collections_by_name.json'
          File.open(collections_path, 'wb') do |f|
            f.write(entry.read)
          end
```
Copilot AI · Nov 12, 2025
[nitpick] The extraction logic doesn't preserve file modification times from the archive. When extracting files, consider using `File.utime(entry.mtime, entry.mtime, path)` after writing to maintain consistent timestamps, which would make the `File.mtime` comparisons more reliable for detecting whether re-extraction is needed.
Suggested change:
```diff
            end
+           File.utime(entry.mtime, entry.mtime, items_path)
            Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
          when 'collections_by_name.json'
            File.open(collections_path, 'wb') do |f|
              f.write(entry.read)
            end
+           File.utime(entry.mtime, entry.mtime, collections_path)
```
```diff
  cname = it['collection_name']
  next unless cname.is_a?(String) && !cname.empty?
- num = it['ethscription_number'].to_i
+ num = it.fetch('ethscription_number').to_i
```
Copilot AI · Nov 12, 2025
Using `.fetch('ethscription_number')` will raise a `KeyError` if the key is missing, whereas the previous `.to_i` would return 0 for nil. While this makes the code more strict, consider whether this breaking change is intentional. If items without an `'ethscription_number'` key should be skipped rather than cause an exception, use `.fetch('ethscription_number', 0).to_i` or add error handling around this line.
Suggested change:
```diff
- num = it.fetch('ethscription_number').to_i
+ num = it.fetch('ethscription_number', 0).to_i
```
```ruby
  # Extract tar.gz archive
  Zlib::GzipReader.open(archive_path) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each do |entry|
        if entry.file?
          case entry.full_name
          when 'items_by_ethscription.json'
            File.open(items_path, 'wb') do |f|
              f.write(entry.read)
            end
            Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
          when 'collections_by_name.json'
            File.open(collections_path, 'wb') do |f|
              f.write(entry.read)
            end
            Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
          end
        end
      end
    end
  end
end
```
Copilot AI · Nov 12, 2025
The extraction process is not atomic and could lead to race conditions if multiple processes attempt to extract files simultaneously. Consider using a lock file mechanism or atomic operations to ensure only one process extracts the archive at a time. For example, you could use `File.open(lock_path, File::RDWR | File::CREAT, 0644) { |f| f.flock(File::LOCK_EX); ... }` to create an exclusive lock during extraction.
Suggested change:
```diff
- # Extract tar.gz archive
- Zlib::GzipReader.open(archive_path) do |gz|
-   Gem::Package::TarReader.new(gz) do |tar|
-     tar.each do |entry|
-       if entry.file?
-         case entry.full_name
-         when 'items_by_ethscription.json'
-           File.open(items_path, 'wb') do |f|
-             f.write(entry.read)
-           end
-           Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
-         when 'collections_by_name.json'
-           File.open(collections_path, 'wb') do |f|
-             f.write(entry.read)
-           end
-           Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
-         end
-       end
-     end
-   end
- end
- end
+ # Use a lock file to ensure only one process extracts at a time
+ lock_path = "#{archive_path}.lock"
+ File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock_file|
+   lock_file.flock(File::LOCK_EX)
+   # Re-check if extraction is still needed after acquiring the lock
+   extract_needed_after_lock = !File.exist?(items_path) || !File.exist?(collections_path) ||
+     File.mtime(archive_path) > (File.exist?(items_path) ? File.mtime(items_path) : 0) ||
+     File.mtime(archive_path) > (File.exist?(collections_path) ? File.mtime(collections_path) : 0)
+   if extract_needed_after_lock
+     # Extract tar.gz archive
+     Zlib::GzipReader.open(archive_path) do |gz|
+       Gem::Package::TarReader.new(gz) do |tar|
+         tar.each do |entry|
+           if entry.file?
+             case entry.full_name
+             when 'items_by_ethscription.json'
+               File.open(items_path, 'wb') do |f|
+                 f.write(entry.read)
+               end
+               Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
+             when 'collections_by_name.json'
+               File.open(collections_path, 'wb') do |f|
+                 f.write(entry.read)
+               end
+               Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
+             end
+           end
+         end
+       end
+     end
+   end
+ end
```
```ruby
Zlib::GzipReader.open(archive_path) do |gz|
  Gem::Package::TarReader.new(gz) do |tar|
    tar.each do |entry|
      if entry.file?
        case entry.full_name
        when 'items_by_ethscription.json'
          File.open(items_path, 'wb') do |f|
            f.write(entry.read)
          end
          Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
        when 'collections_by_name.json'
          File.open(collections_path, 'wb') do |f|
            f.write(entry.read)
          end
          Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
        end
      end
    end
  end
end
```
Copilot AI · Nov 12, 2025
If the extraction process fails partway through (e.g., disk full, permission issues), one JSON file might be written while the other is not, leaving the system in an inconsistent state. Consider wrapping the extraction in a transaction-like pattern: extract to temporary files first, then atomically rename them only if both extractions succeed. This ensures both files are available or neither is.
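A minimal sketch of that stage-then-rename pattern. The entry names match the archive, but `extract_atomically` and its argument shapes are hypothetical, not code from this PR, and it takes already-read entry data rather than streaming from the tar:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical sketch: stage both files in a temporary directory, then move
# them into place only after every write has succeeded. `entries` maps
# archive entry names to their contents.
def extract_atomically(entries, items_path:, collections_path:)
  targets = {
    'items_by_ethscription.json' => items_path,
    'collections_by_name.json'   => collections_path
  }
  Dir.mktmpdir do |tmp|
    staged = {}
    entries.each do |name, data|
      next unless (final_path = targets[name])
      tmp_path = File.join(tmp, name)
      File.binwrite(tmp_path, data) # raises on disk-full / permission errors
      staged[final_path] = tmp_path
    end
    # All writes succeeded; now move the staged files into place.
    # Note: FileUtils.mv degrades to copy+delete across filesystems, so keep
    # the temp directory on the same filesystem for true rename atomicity.
    staged.each { |final_path, tmp_path| FileUtils.mv(tmp_path, final_path) }
  end
end
```

If an exception is raised mid-write, `Dir.mktmpdir` cleans up the staged files on the way out and any previously extracted JSON files are left untouched, so readers never observe one updated file alongside a stale one.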
```diff
  # Set up non-root user
  RUN useradd rails --create-home --shell /bin/bash && \
-     chown -R rails:rails log tmp storage db config
+     chown -R rails:rails /rails
```
Copilot AI · Nov 12, 2025
Changing ownership from specific directories (`log tmp storage db config`) to the entire `/rails` directory is a significant broadening of permissions. This could potentially include files that should remain owned by root for security purposes. Consider verifying that all subdirectories under `/rails` should indeed be writable by the rails user, or revert to explicitly listing the directories that need to be owned by the rails user.
Suggested change:
```diff
- chown -R rails:rails /rails
+ chown -R rails:rails /rails/log /rails/tmp /rails/storage /rails/db /rails/config
```
```ruby
previous_owner: log['topics'][2] ? '0x' + log['topics'][2][-40..] : nil,
new_owner: log['topics'][3] ? '0x' + log['topics'][3][-40..] : nil
```
Copilot AI · Nov 12, 2025
The address extraction logic uses `log['topics'][2][-40..]` and `log['topics'][3][-40..]` to take the last 40 characters. For a well-formed 32-byte topic ('0x' prefix plus 64 hex characters, 66 characters total), this yields the correct address. But if a topic is shorter than 40 characters, `[-40..]` returns nil in Ruby, and `'0x' + nil` raises a `TypeError`. Consider validating that topics have the expected length before extraction.
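A minimal guard along those lines; the `topic_to_address` helper is hypothetical, not part of this PR:

```ruby
# Hypothetical helper: only derive an address from a well-formed 32-byte
# topic ('0x' plus 64 hex characters). Anything else yields nil instead of
# a truncated string or a TypeError.
def topic_to_address(topic)
  return nil unless topic.is_a?(String) && topic.match?(/\A0x\h{64}\z/)
  '0x' + topic[-40..]
end
```

The call sites could then become `previous_owner: topic_to_address(log['topics'][2])` and so on, with malformed or missing topics uniformly mapped to nil.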
```shell
fi
COMPRESSED_SIZE=$(stat -f %z "$ARCHIVE_FILE" 2>/dev/null || stat -c %s "$ARCHIVE_FILE" 2>/dev/null)
if [ -n "$ORIGINAL_SIZE" ] && [ -n "$COMPRESSED_SIZE" ]; then
  RATIO=$(echo "scale=2; (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100" | bc)
```
Copilot AI · Nov 12, 2025
[nitpick] The script uses `bc` for calculating the compression ratio, but `bc` may not be installed by default on all systems. Consider adding a check for `bc` availability or using a more portable alternative like `awk` for the calculation. For example: `RATIO=$(awk "BEGIN {printf \"%.2f\", (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100}")`
Suggested change:
```diff
- RATIO=$(echo "scale=2; (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100" | bc)
+ RATIO=$(awk "BEGIN {printf \"%.2f\", (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100}")
```
…RC721 collection properties
- Corrected the initialization of the collection symbol in `ERC20FixedDenominationManager` to include a trailing comma for clarity.
- Updated test assertions in `EthscriptionsTokenTest` to reflect the new collection name and symbol, ensuring consistency with the latest contract changes.
Pull Request Overview
Copilot reviewed 7 out of 9 changed files in this pull request and generated no new comments.
Note

Adds tar.gz-based collections data workflow with auto-extraction, updates ERC-721 collection name/symbol to tick-only, parses OwnershipTransferred events, tweaks NameRegistry metadata, and fixes Docker chown.

- `Erc721EthscriptionsCollectionParser`: extract the archive in `load_import_data` using `zlib` and `Gem::Package::TarReader`; add `DEFAULT_ARCHIVE_PATH` and extraction/logging when JSONs are missing/outdated; raise if data not found; use `it.fetch('ethscription_number')`.
- `compress_collections.sh`: bundle `items_by_ethscription.json` and `collections_by_name.json` into `collections_data.tar.gz`.
- `.gitignore`: ignore raw JSON files; keep archive.
- `Dockerfile`: chown `/rails` instead of specific subdirs.
- `ERC20FixedDenominationManager`: initialize collection with `tick` name and `tick.upper()` symbol (no " ERC-721" suffixes); tests updated accordingly.
- `NameRegistry`: update token metadata description and collection `contractURI` name to "Ethscription Names".
- `ProtocolEventReader`: add `OwnershipTransferred(bytes32,address,address)` signature and parser.

Written by Cursor Bugbot for commit 035e154.