Implement Compression and Extraction for Collection JSON Files #144
Conversation
- Added a new script `compress_collections.sh` to compress `items_by_ethscription.json` and `collections_by_name.json` into a single `collections_data.tar.gz` archive.
- Updated the `Erc721EthscriptionsCollectionParser` to check for the existence of the archive and extract JSON files if they are missing or outdated.
- Modified `.gitignore` to ignore uncompressed JSON files while keeping the compressed archive.
- Introduced logging for extraction processes to enhance traceability.
Pull Request Overview
This PR implements a compression and extraction system for collection JSON files to reduce repository size. The approach uses a tar.gz archive to store `items_by_ethscription.json` and `collections_by_name.json`, with automatic extraction when needed.
Key Changes:
- Added bash script to compress collection JSON files into a single archive
- Implemented automatic extraction logic in the parser when archive is present
- Updated contract metadata strings (name corrections and simplifications)
Reviewed Changes
Copilot reviewed 6 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `compress_collections.sh` | New bash script that creates tar.gz archive from collection JSON files with compression stats |
| `app/models/erc721_ethscriptions_collection_parser.rb` | Added extraction logic to decompress archive on-demand and changed `ethscription_number` access to use `fetch` |
| `.gitignore` | Configured to ignore uncompressed JSON files while keeping the compressed archive |
| `lib/protocol_event_reader.rb` | Added `OwnershipTransferred` event parser for collection ownership changes |
| `contracts/src/NameRegistry.sol` | Fixed typo in description and updated collection name |
| `contracts/src/ERC20FixedDenominationManager.sol` | Simplified collection name/symbol initialization |
| `Dockerfile` | Broadened ownership permissions from specific directories to entire `/rails` directory |
```diff
- string.concat(deployOp.tick, " ERC-721"), // Collection name
- string.concat(deployOp.tick.upper(), "-ERC-721"), // Collection symbol
+ deployOp.tick, // Collection name
+ deployOp.tick.upper() // Collection symbol
```
Copilot AI · Nov 12, 2025
Missing comma after `deployOp.tick.upper()` on line 172, before the trailing comment. This will cause a compilation error, as the function call has 4 arguments but is missing a separator between the second and third arguments.
Suggested change:
```diff
- deployOp.tick.upper() // Collection symbol
+ deployOp.tick.upper(), // Collection symbol
```
```ruby
          end
          Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
        when 'collections_by_name.json'
          File.open(collections_path, 'wb') do |f|
            f.write(entry.read)
          end
```
Copilot AI · Nov 12, 2025
[nitpick] The extraction logic doesn't preserve file modification times from the archive. When extracting files, consider using `File.utime(entry.mtime, entry.mtime, path)` after writing to maintain consistent timestamps, which would make the `File.mtime` comparisons more reliable for detecting whether re-extraction is needed.
Suggested change:
```diff
            end
+           File.utime(entry.mtime, entry.mtime, items_path)
            Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
          when 'collections_by_name.json'
            File.open(collections_path, 'wb') do |f|
              f.write(entry.read)
            end
+           File.utime(entry.mtime, entry.mtime, collections_path)
```
```diff
  cname = it['collection_name']
  next unless cname.is_a?(String) && !cname.empty?
- num = it['ethscription_number'].to_i
+ num = it.fetch('ethscription_number').to_i
```
Copilot AI · Nov 12, 2025
Using `.fetch('ethscription_number')` will raise a `KeyError` if the key is missing, whereas the previous `.to_i` would return 0 for nil. While this makes the code more strict, consider whether this breaking change is intentional. If items without an `'ethscription_number'` key should be skipped rather than cause an exception, use `.fetch('ethscription_number', 0).to_i` or add error handling around this line.
Suggested change:
```diff
- num = it.fetch('ethscription_number').to_i
+ num = it.fetch('ethscription_number', 0).to_i
```
```ruby
  # Extract tar.gz archive
  Zlib::GzipReader.open(archive_path) do |gz|
    Gem::Package::TarReader.new(gz) do |tar|
      tar.each do |entry|
        if entry.file?
          case entry.full_name
          when 'items_by_ethscription.json'
            File.open(items_path, 'wb') do |f|
              f.write(entry.read)
            end
            Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
          when 'collections_by_name.json'
            File.open(collections_path, 'wb') do |f|
              f.write(entry.read)
            end
            Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
          end
        end
      end
    end
  end
end
```
Copilot AI · Nov 12, 2025
The extraction process is not atomic and could lead to race conditions if multiple processes attempt to extract files simultaneously. Consider using a lock file mechanism or atomic operations to ensure only one process extracts the archive at a time. For example, you could use `File.open(lock_path, File::RDWR | File::CREAT, 0644) { |f| f.flock(File::LOCK_EX); ... }` to create an exclusive lock during extraction.
Suggested change:
```diff
- # Extract tar.gz archive
- Zlib::GzipReader.open(archive_path) do |gz|
-   Gem::Package::TarReader.new(gz) do |tar|
-     tar.each do |entry|
-       if entry.file?
-         case entry.full_name
-         when 'items_by_ethscription.json'
-           File.open(items_path, 'wb') do |f|
-             f.write(entry.read)
-           end
-           Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
-         when 'collections_by_name.json'
-           File.open(collections_path, 'wb') do |f|
-             f.write(entry.read)
-           end
-           Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
-         end
-       end
-     end
-   end
- end
- end
+ # Use a lock file to ensure only one process extracts at a time
+ lock_path = "#{archive_path}.lock"
+ File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock_file|
+   lock_file.flock(File::LOCK_EX)
+   # Re-check if extraction is still needed after acquiring the lock
+   extract_needed_after_lock = !File.exist?(items_path) || !File.exist?(collections_path) ||
+     File.mtime(archive_path) > (File.exist?(items_path) ? File.mtime(items_path) : 0) ||
+     File.mtime(archive_path) > (File.exist?(collections_path) ? File.mtime(collections_path) : 0)
+   if extract_needed_after_lock
+     # Extract tar.gz archive
+     Zlib::GzipReader.open(archive_path) do |gz|
+       Gem::Package::TarReader.new(gz) do |tar|
+         tar.each do |entry|
+           if entry.file?
+             case entry.full_name
+             when 'items_by_ethscription.json'
+               File.open(items_path, 'wb') do |f|
+                 f.write(entry.read)
+               end
+               Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
+             when 'collections_by_name.json'
+               File.open(collections_path, 'wb') do |f|
+                 f.write(entry.read)
+               end
+               Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
+             end
+           end
+         end
+       end
+     end
+   end
+ end
```
```ruby
Zlib::GzipReader.open(archive_path) do |gz|
  Gem::Package::TarReader.new(gz) do |tar|
    tar.each do |entry|
      if entry.file?
        case entry.full_name
        when 'items_by_ethscription.json'
          File.open(items_path, 'wb') do |f|
            f.write(entry.read)
          end
          Rails.logger.info "Extracted #{entry.full_name} to #{items_path}" if defined?(Rails)
        when 'collections_by_name.json'
          File.open(collections_path, 'wb') do |f|
            f.write(entry.read)
          end
          Rails.logger.info "Extracted #{entry.full_name} to #{collections_path}" if defined?(Rails)
        end
      end
    end
  end
end
```
Copilot AI · Nov 12, 2025
If the extraction process fails partway through (e.g., disk full, permission issues), one JSON file might be written while the other is not, leaving the system in an inconsistent state. Consider wrapping the extraction in a transaction-like pattern: extract to temporary files first, then atomically rename them only if both extractions succeed. This ensures both files are available or neither is.
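A minimal sketch of that stage-then-rename pattern. The entry names match the archive, but `extract_atomically` and its argument shapes are hypothetical, not code from this PR, and it takes already-read entry data rather than streaming from the tar:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical sketch: stage both files in a temporary directory, then move
# them into place only after every write has succeeded. `entries` maps
# archive entry names to their contents.
def extract_atomically(entries, items_path:, collections_path:)
  targets = {
    'items_by_ethscription.json' => items_path,
    'collections_by_name.json'   => collections_path
  }
  Dir.mktmpdir do |tmp|
    staged = {}
    entries.each do |name, data|
      next unless (final_path = targets[name])
      tmp_path = File.join(tmp, name)
      File.binwrite(tmp_path, data) # raises on disk-full / permission errors
      staged[final_path] = tmp_path
    end
    # All writes succeeded; now move the staged files into place.
    # Note: FileUtils.mv degrades to copy+delete across filesystems, so keep
    # the temp directory on the same filesystem for true rename atomicity.
    staged.each { |final_path, tmp_path| FileUtils.mv(tmp_path, final_path) }
  end
end
```

If an exception is raised mid-write, `Dir.mktmpdir` cleans up the staged files on the way out and any previously extracted JSON files are left untouched, so readers never observe one updated file alongside a stale one.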
```diff
  # Set up non-root user
  RUN useradd rails --create-home --shell /bin/bash && \
-     chown -R rails:rails log tmp storage db config
+     chown -R rails:rails /rails
```
Copilot AI · Nov 12, 2025
Changing ownership from specific directories (`log tmp storage db config`) to the entire `/rails` directory is a significant broadening of permissions. This could potentially include files that should remain owned by root for security purposes. Consider verifying that all subdirectories under `/rails` should indeed be writable by the rails user, or revert to explicitly listing the directories that need to be owned by the rails user.
Suggested change:
```diff
- chown -R rails:rails /rails
+ chown -R rails:rails /rails/log /rails/tmp /rails/storage /rails/db /rails/config
```
```ruby
previous_owner: log['topics'][2] ? '0x' + log['topics'][2][-40..] : nil,
new_owner: log['topics'][3] ? '0x' + log['topics'][3][-40..] : nil
```
Copilot AI · Nov 12, 2025
The address extraction logic uses `log['topics'][2][-40..]` and `log['topics'][3][-40..]` to take the last 40 characters. For a well-formed 32-byte topic ('0x' prefix plus 64 hex characters, 66 characters total), this yields the correct address. But if a topic is shorter than 40 characters, `[-40..]` returns nil in Ruby, and `'0x' + nil` raises a `TypeError`. Consider validating that topics have the expected length before extraction.
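A minimal guard along those lines; the `topic_to_address` helper is hypothetical, not part of this PR:

```ruby
# Hypothetical helper: only derive an address from a well-formed 32-byte
# topic ('0x' plus 64 hex characters). Anything else yields nil instead of
# a truncated string or a TypeError.
def topic_to_address(topic)
  return nil unless topic.is_a?(String) && topic.match?(/\A0x\h{64}\z/)
  '0x' + topic[-40..]
end
```

The call sites could then become `previous_owner: topic_to_address(log['topics'][2])` and so on, with malformed or missing topics uniformly mapped to nil.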
```shell
fi
COMPRESSED_SIZE=$(stat -f %z "$ARCHIVE_FILE" 2>/dev/null || stat -c %s "$ARCHIVE_FILE" 2>/dev/null)
if [ -n "$ORIGINAL_SIZE" ] && [ -n "$COMPRESSED_SIZE" ]; then
  RATIO=$(echo "scale=2; (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100" | bc)
```
Copilot AI · Nov 12, 2025
[nitpick] The script uses `bc` for calculating the compression ratio, but `bc` may not be installed by default on all systems. Consider adding a check for `bc` availability or using a more portable alternative like `awk` for the calculation. For example: `RATIO=$(awk "BEGIN {printf \"%.2f\", (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100}")`
Suggested change:
```diff
- RATIO=$(echo "scale=2; (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100" | bc)
+ RATIO=$(awk "BEGIN {printf \"%.2f\", (1 - $COMPRESSED_SIZE / $ORIGINAL_SIZE) * 100}")
```
…RC721 collection properties
- Corrected the initialization of the collection symbol in `ERC20FixedDenominationManager` to include a trailing comma for clarity.
- Updated test assertions in `EthscriptionsTokenTest` to reflect the new collection name and symbol, ensuring consistency with the latest contract changes.
Pull Request Overview
Copilot reviewed 7 out of 9 changed files in this pull request and generated no new comments.
Note

Adds tar.gz-based collections data workflow with auto-extraction, updates ERC-721 collection name/symbol to tick-only, parses OwnershipTransferred events, tweaks NameRegistry metadata, and fixes Docker chown.

- `Erc721EthscriptionsCollectionParser`: extract the archive in `load_import_data` using `zlib` and `Gem::Package::TarReader`; add `DEFAULT_ARCHIVE_PATH` and extraction/logging when JSONs are missing/outdated; raise if data not found; use `it.fetch('ethscription_number')`.
- `compress_collections.sh`: bundle `items_by_ethscription.json` and `collections_by_name.json` into `collections_data.tar.gz`.
- `.gitignore`: ignore raw JSON files; keep archive.
- `Dockerfile`: chown `/rails` instead of specific subdirs.
- `ERC20FixedDenominationManager`: initialize collection with `tick` name and `tick.upper()` symbol (no " ERC-721" suffixes); tests updated accordingly.
- `NameRegistry`: update token metadata description and collection `contractURI` name to "Ethscription Names".
- `ProtocolEventReader`: add `OwnershipTransferred(bytes32,address,address)` signature and parser.

Written by Cursor Bugbot for commit 035e154.