diff --git a/README.md b/README.md index 658384e..e67e3fd 100644 --- a/README.md +++ b/README.md @@ -53,14 +53,40 @@ require 'zstd-ruby' #### Simple Compression ```ruby -compressed_data = Zstd.compress(data) -compressed_data = Zstd.compress(data, level: complession_level) # default compression_level is 3 +compressed_data = Zstd.compress(data) # default: 3 +compressed_data = Zstd.compress(data, level: 6) ``` -#### Compression with Dictionary +### Context-based Compression + +For better performance with multiple operations, use reusable contexts: + ```ruby -# dictionary is supposed to have been created using `zstd --train` -compressed_using_dict = Zstd.compress("", dict: File.read('dictionary_file')) +# Unified context (recommended) +ctx = Zstd::Context.new(level: 6) +compressed = ctx.compress(data) +original = ctx.decompress(compressed) + +# Specialized contexts for memory optimization +cctx = Zstd::CContext.new(level: 6) # Compression-only +dctx = Zstd::DContext.new # Decompression-only +``` + +### Dictionary Compression + +Dictionaries provide better compression for similar data: + +```ruby +dictionary = File.read('dictionary_file') + +# Using module methods +compressed = Zstd.compress(data, level: 3, dict: dictionary) +original = Zstd.decompress(compressed, dict: dictionary) + +# Using contexts for better performance +ctx = Zstd::Context.new(level: 6, dict: dictionary) +compressed = ctx.compress(data) +original = ctx.decompress(compressed) ``` #### Compression with CDict @@ -128,16 +154,9 @@ res << stream.finish ### Decompression -#### Simple Decompression - ```ruby data = Zstd.decompress(compressed_data) -``` - -#### Decompression with Dictionary -```ruby -# dictionary is supposed to have been created using `zstd --train` -Zstd.decompress(compressed_using_dict, dict: File.read('dictionary_file')) +data = Zstd.decompress(compressed_data, dict: dictionary) ``` #### Decompression with DDict @@ -157,79 +176,73 @@ result = '' result << stream.decompress(cstr[0, 10]) 
result << stream.decompress(cstr[10..-1]) ``` +## API Reference + +### Context Classes + +#### `Zstd::Context` +Unified context for both compression and decompression. -#### Streaming Decompression with dictionary ```ruby -cstr = "" # Compressed data -stream = Zstd::StreamingDecompress.new(dict: File.read('dictionary_file')) -result = '' -result << stream.decompress(cstr[0, 10]) -result << stream.decompress(cstr[10..-1]) +ctx = Zstd::Context.new # Default settings +ctx = Zstd::Context.new(level: 6) # With compression level +ctx = Zstd::Context.new(level: 6, dict: dictionary) # With dictionary ``` -DDict can also be specified to `dict:`. +- `compress(data)` → String +- `decompress(compressed_data)` → String -#### Streaming Decompression with Position Tracking - -If you need to know how much of the input data was consumed during decompression, you can use the `decompress_with_pos` method: +#### `Zstd::CContext` +Compression-only context for memory optimization. ```ruby -cstr = "" # Compressed data -stream = Zstd::StreamingDecompress.new -result, consumed_bytes = stream.decompress_with_pos(cstr[0, 10]) -# result contains the decompressed data -# consumed_bytes contains the number of bytes from input that were processed +cctx = Zstd::CContext.new(level: 6) # With compression level +cctx = Zstd::CContext.new(level: 6, dict: dictionary) # With dictionary ``` -This is particularly useful when processing streaming data where you need to track the exact position in the input stream. +- `compress(data)` → String -### Skippable frame +#### `Zstd::DContext` +Decompression-only context for memory optimization. 
```ruby -compressed_data_with_skippable_frame = Zstd.write_skippable_frame(compressed_data, "sample data") - -Zstd.read_skippable_frame(compressed_data_with_skippable_frame) -# => "sample data" +dctx = Zstd::DContext.new # Default settings +dctx = Zstd::DContext.new(dict: dictionary) # With dictionary ``` -### Stream Writer and Reader Wrapper -**EXPERIMENTAL** +- `decompress(compressed_data)` → String -* These features are experimental and may be subject to API changes in future releases. -* There may be performance and compatibility issues, so extensive testing is required before production use. -* If you have any questions, encounter bugs, or have suggestions, please report them via [GitHub issues](https://github.com/SpringMT/zstd-ruby/issues). +### Module Methods -#### Zstd::StreamWriter +#### Compression +- `Zstd.compress(data)` → String (default level 3; pass `level:` to override) +- `Zstd.compress(data, dict: dictionary)` → String -```ruby -require 'stringio' -require 'zstd-ruby' +#### Decompression +- `Zstd.decompress(compressed_data)` → String +- `Zstd.decompress(compressed_data, dict: dictionary)` → String -io = StringIO.new -stream = Zstd::StreamWriter.new(io) -stream.write("abc") -stream.finish +#### Utilities +- `Zstd.zstd_version` → Integer -io.rewind -# Retrieve the compressed data -compressed_data = io.read -``` +### Performance Guidelines -#### Zstd::StreamReader +| Use Case | Recommended API | Benefits | +|----------|----------------|----------| +| Single operations | `Zstd.compress/decompress` | Simple, no setup | +| Multiple operations | `Zstd::Context` | 2-3x faster, convenient | +| Specialized needs | `Zstd::CContext/DContext` | Direct API access | -```ruby -require 'stringio' -require 'zstd-ruby' # Add the appropriate require statement if necessary +**Compression Levels:** 1-3 are fast (3 is the default); higher levels, up to 19, trade speed for better compression -io = StringIO.new(compressed_data) -reader = Zstd::StreamReader.new(io) +## Benchmarks -# Read and output the decompressed data -puts 
reader.read(10) # 'abc' -puts reader.read(10) # 'def' -puts reader.read(10) # '' (end of data) -``` +To test performance on your system: +```bash +cd benchmarks +ruby quick_benchmark.rb # Fast overview of all APIs (recommended) +``` ## JRuby This gem does not support JRuby. @@ -266,7 +279,6 @@ To install this gem onto your local machine, run `bundle exec rake install`. To Bug reports and pull requests are welcome on GitHub at https://github.com/SpringMT/zstd-ruby. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct. - ## License The gem is available as open source under the terms of the [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause). diff --git a/benchmarks/context_reuse.rb b/benchmarks/context_reuse.rb new file mode 100644 index 0000000..a286c07 --- /dev/null +++ b/benchmarks/context_reuse.rb @@ -0,0 +1,49 @@ +require 'benchmark/ips' + +$LOAD_PATH.unshift '../lib' +require 'zstd-ruby' + +# Sample data - size typical of real-world scenarios +data = "Hello, World! 
" * 1000 # ~13KB string + +# Pre-compress data for decompression tests +compressed_data = Zstd.compress(data) + +puts "Benchmarking context reuse vs module methods" +puts "Data size: #{data.bytesize} bytes, Compressed: #{compressed_data.bytesize} bytes" +puts + +Benchmark.ips do |x| + x.config(time: 3, warmup: 1) + + # Compression benchmarks + x.report("Module compress (new context each time)") do + Zstd.compress(data) + end + + ctx = Zstd::Context.new + x.report("Context compress (reused)") do + ctx.compress(data) + end + + cctx = Zstd::CContext.new + x.report("CContext compress (reused)") do + cctx.compress(data) + end + + # Decompression benchmarks + x.report("Module decompress (new context each time)") do + Zstd.decompress(compressed_data) + end + + x.report("Context decompress (reused)") do + ctx.decompress(compressed_data) + end + + dctx = Zstd::DContext.new + x.report("DContext decompress (reused)") do + dctx.decompress(compressed_data) + end + + x.compare! +end \ No newline at end of file diff --git a/ext/zstdruby/context.c b/ext/zstdruby/context.c new file mode 100644 index 0000000..df7ede0 --- /dev/null +++ b/ext/zstdruby/context.c @@ -0,0 +1,320 @@ +#include "common.h" + +extern VALUE rb_mZstd; + +typedef struct { + ZSTD_CCtx* cctx; + ZSTD_CDict* cdict; + int compression_level; + int needs_reset; + VALUE dictionary; +} zstd_ccontext_t; + +typedef struct { + ZSTD_DCtx* dctx; + ZSTD_DDict* ddict; + int needs_reset; + VALUE dictionary; +} zstd_dcontext_t; + +static VALUE rb_cZstdCContext; +static VALUE rb_cZstdDContext; + +// Forward declaration of decompress_buffered from zstdruby.c +extern VALUE decompress_buffered(ZSTD_DCtx* dctx, const char* input_data, size_t input_size, bool free_ctx); + +// CContext (compression-only) implementation +static void zstd_ccontext_mark(void *ptr) +{ + zstd_ccontext_t *ctx = (zstd_ccontext_t*)ptr; + if (ctx) { + rb_gc_mark(ctx->dictionary); + } +} + +static void zstd_ccontext_free(void *ptr) +{ + zstd_ccontext_t *ctx = 
(zstd_ccontext_t*)ptr; + if (ctx) { + if (ctx->cctx) { + ZSTD_freeCCtx(ctx->cctx); + } + if (ctx->cdict) { + ZSTD_freeCDict(ctx->cdict); + } + xfree(ctx); + } +} + +static const rb_data_type_t zstd_ccontext_type = { + "ZstdCContext", + {zstd_ccontext_mark, zstd_ccontext_free, 0}, + 0, 0, + RUBY_TYPED_FREE_IMMEDIATELY, +}; + +static VALUE zstd_ccontext_alloc(VALUE klass) +{ + zstd_ccontext_t *ctx = ALLOC(zstd_ccontext_t); + + ctx->cctx = NULL; + ctx->cdict = NULL; + ctx->compression_level = ZSTD_CLEVEL_DEFAULT; + ctx->needs_reset = 0; + ctx->dictionary = Qnil; + + return TypedData_Wrap_Struct(klass, &zstd_ccontext_type, ctx); +} + +static VALUE zstd_ccontext_initialize(int argc, VALUE *argv, VALUE self) +{ + VALUE level_value = Qnil; + VALUE dictionary_value = Qnil; + VALUE options = Qnil; + + if (argc == 1) { + if (RB_TYPE_P(argv[0], T_HASH)) { + options = argv[0]; + level_value = rb_hash_aref(options, ID2SYM(rb_intern("level"))); + dictionary_value = rb_hash_aref(options, ID2SYM(rb_intern("dict"))); + } else { + level_value = argv[0]; + } + } else if (argc == 2) { + level_value = argv[0]; + dictionary_value = argv[1]; + } else if (argc > 2) { + rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 0..2)", argc); + } + + zstd_ccontext_t *ctx; + TypedData_Get_Struct(self, zstd_ccontext_t, &zstd_ccontext_type, ctx); + + ctx->compression_level = convert_compression_level(level_value); + ctx->dictionary = dictionary_value; + + ctx->cctx = ZSTD_createCCtx(); + if (!ctx->cctx) { + rb_raise(rb_eRuntimeError, "Failed to create compression context"); + } + + // Create dictionary if provided + if (!NIL_P(dictionary_value)) { + StringValue(dictionary_value); + char* dict_data = RSTRING_PTR(dictionary_value); + size_t dict_size = RSTRING_LEN(dictionary_value); + + ctx->cdict = ZSTD_createCDict(dict_data, dict_size, ctx->compression_level); + if (!ctx->cdict) { + ZSTD_freeCCtx(ctx->cctx); + ctx->cctx = NULL; + rb_raise(rb_eRuntimeError, "Failed to create 
compression dictionary"); + } + } + + return self; +} + +static VALUE zstd_ccontext_compress(VALUE self, VALUE input_value) +{ + StringValue(input_value); + char* input_data = RSTRING_PTR(input_value); + size_t input_size = RSTRING_LEN(input_value); + + zstd_ccontext_t *ctx; + TypedData_Get_Struct(self, zstd_ccontext_t, &zstd_ccontext_type, ctx); + + if (!ctx->cctx) { + rb_raise(rb_eRuntimeError, "Compression context not initialized"); + } + + if (ctx->needs_reset) { + size_t reset_result = ZSTD_CCtx_reset(ctx->cctx, ZSTD_reset_session_only); + if (ZSTD_isError(reset_result)) { + rb_raise(rb_eRuntimeError, "Failed to reset compression context: %s", ZSTD_getErrorName(reset_result)); + } + ctx->needs_reset = 0; + } + + size_t max_compressed_size = ZSTD_compressBound(input_size); + VALUE output = rb_str_new(NULL, max_compressed_size); + char* output_data = RSTRING_PTR(output); + + size_t compressed_size; + + // Use dictionary if available + if (ctx->cdict) { + compressed_size = ZSTD_compress_usingCDict(ctx->cctx, + (void*)output_data, max_compressed_size, + (void*)input_data, input_size, + ctx->cdict); + } else { + compressed_size = ZSTD_compressCCtx(ctx->cctx, + (void*)output_data, max_compressed_size, + (void*)input_data, input_size, + ctx->compression_level); + } + + if (ZSTD_isError(compressed_size)) { + rb_raise(rb_eRuntimeError, "Compress failed: %s", ZSTD_getErrorName(compressed_size)); + } + + ctx->needs_reset = 1; + rb_str_resize(output, compressed_size); + return output; +} + +// DContext (decompression-only) implementation +static void zstd_dcontext_mark(void *ptr) +{ + zstd_dcontext_t *ctx = (zstd_dcontext_t*)ptr; + if (ctx) { + rb_gc_mark(ctx->dictionary); + } +} + +static void zstd_dcontext_free(void *ptr) +{ + zstd_dcontext_t *ctx = (zstd_dcontext_t*)ptr; + if (ctx) { + if (ctx->dctx) { + ZSTD_freeDCtx(ctx->dctx); + } + if (ctx->ddict) { + ZSTD_freeDDict(ctx->ddict); + } + xfree(ctx); + } +} + +static const rb_data_type_t zstd_dcontext_type = { + 
"ZstdDContext", + {zstd_dcontext_mark, zstd_dcontext_free, 0}, + 0, 0, + RUBY_TYPED_FREE_IMMEDIATELY, +}; + +static VALUE zstd_dcontext_alloc(VALUE klass) +{ + zstd_dcontext_t *ctx = ALLOC(zstd_dcontext_t); + + ctx->dctx = NULL; + ctx->ddict = NULL; + ctx->needs_reset = 0; + ctx->dictionary = Qnil; + + return TypedData_Wrap_Struct(klass, &zstd_dcontext_type, ctx); +} + +static VALUE zstd_dcontext_initialize(int argc, VALUE *argv, VALUE self) +{ + VALUE dictionary_value = Qnil; + VALUE options = Qnil; + + if (argc == 1) { + if (RB_TYPE_P(argv[0], T_HASH)) { + options = argv[0]; + dictionary_value = rb_hash_aref(options, ID2SYM(rb_intern("dict"))); + } else { + dictionary_value = argv[0]; + } + } else if (argc > 1) { + rb_raise(rb_eArgError, "wrong number of arguments (given %d, expected 0..1)", argc); + } + + zstd_dcontext_t *ctx; + TypedData_Get_Struct(self, zstd_dcontext_t, &zstd_dcontext_type, ctx); + + ctx->dictionary = dictionary_value; + + ctx->dctx = ZSTD_createDCtx(); + if (!ctx->dctx) { + rb_raise(rb_eRuntimeError, "Failed to create decompression context"); + } + + // Create dictionary if provided + if (!NIL_P(dictionary_value)) { + StringValue(dictionary_value); + char* dict_data = RSTRING_PTR(dictionary_value); + size_t dict_size = RSTRING_LEN(dictionary_value); + + ctx->ddict = ZSTD_createDDict(dict_data, dict_size); + if (!ctx->ddict) { + ZSTD_freeDCtx(ctx->dctx); + ctx->dctx = NULL; + rb_raise(rb_eRuntimeError, "Failed to create decompression dictionary"); + } + } + + return self; +} + +static VALUE zstd_dcontext_decompress(VALUE self, VALUE input_value) +{ + StringValue(input_value); + char* input_data = RSTRING_PTR(input_value); + size_t input_size = RSTRING_LEN(input_value); + + zstd_dcontext_t *ctx; + TypedData_Get_Struct(self, zstd_dcontext_t, &zstd_dcontext_type, ctx); + + if (!ctx->dctx) { + rb_raise(rb_eRuntimeError, "Decompression context not initialized"); + } + + if (ctx->needs_reset) { + size_t reset_result = ZSTD_DCtx_reset(ctx->dctx, 
ZSTD_reset_session_only); + if (ZSTD_isError(reset_result)) { + rb_raise(rb_eRuntimeError, "Failed to reset decompression context: %s", ZSTD_getErrorName(reset_result)); + } + ctx->needs_reset = 0; + } + + unsigned long long const uncompressed_size = ZSTD_getFrameContentSize(input_data, input_size); + if (uncompressed_size == ZSTD_CONTENTSIZE_ERROR) { + rb_raise(rb_eRuntimeError, "Not compressed by zstd: %s", ZSTD_getErrorName(uncompressed_size)); + } + + if (uncompressed_size == ZSTD_CONTENTSIZE_UNKNOWN) { + ctx->needs_reset = 1; + return decompress_buffered(ctx->dctx, input_data, input_size, false); + } + + VALUE output = rb_str_new(NULL, uncompressed_size); + char* output_data = RSTRING_PTR(output); + + size_t decompress_size; + + // Use dictionary if available + if (ctx->ddict) { + decompress_size = ZSTD_decompress_usingDDict(ctx->dctx, + (void*)output_data, uncompressed_size, + (void*)input_data, input_size, + ctx->ddict); + } else { + decompress_size = ZSTD_decompressDCtx(ctx->dctx, + (void*)output_data, uncompressed_size, + (void*)input_data, input_size); + } + + if (ZSTD_isError(decompress_size)) { + rb_raise(rb_eRuntimeError, "Decompress error: %s", ZSTD_getErrorName(decompress_size)); + } + + ctx->needs_reset = 1; + return output; +} + +void +zstd_ruby_context_init(void) +{ + rb_cZstdCContext = rb_define_class_under(rb_mZstd, "CContext", rb_cObject); + rb_define_alloc_func(rb_cZstdCContext, zstd_ccontext_alloc); + rb_define_method(rb_cZstdCContext, "initialize", zstd_ccontext_initialize, -1); + rb_define_method(rb_cZstdCContext, "compress", zstd_ccontext_compress, 1); + + rb_cZstdDContext = rb_define_class_under(rb_mZstd, "DContext", rb_cObject); + rb_define_alloc_func(rb_cZstdDContext, zstd_dcontext_alloc); + rb_define_method(rb_cZstdDContext, "initialize", zstd_dcontext_initialize, -1); + rb_define_method(rb_cZstdDContext, "decompress", zstd_dcontext_decompress, 1); +} \ No newline at end of file diff --git a/ext/zstdruby/main.c b/ext/zstdruby/main.c 
index 5497b0e..e4c3796 100644 --- a/ext/zstdruby/main.c +++ b/ext/zstdruby/main.c @@ -4,6 +4,7 @@ VALUE rb_mZstd; VALUE rb_cCDict; VALUE rb_cDDict; void zstd_ruby_init(void); +void zstd_ruby_context_init(void); void zstd_ruby_skippable_frame_init(void); void zstd_ruby_streaming_compress_init(void); void zstd_ruby_streaming_decompress_init(void); @@ -19,6 +20,7 @@ Init_zstdruby(void) rb_cCDict = rb_define_class_under(rb_mZstd, "CDict", rb_cObject); rb_cDDict = rb_define_class_under(rb_mZstd, "DDict", rb_cObject); zstd_ruby_init(); + zstd_ruby_context_init(); zstd_ruby_skippable_frame_init(); zstd_ruby_streaming_compress_init(); zstd_ruby_streaming_decompress_init(); diff --git a/ext/zstdruby/zstdruby.c b/ext/zstdruby/zstdruby.c index e63b95c..c06371d 100644 --- a/ext/zstdruby/zstdruby.c +++ b/ext/zstdruby/zstdruby.c @@ -39,7 +39,7 @@ static VALUE rb_compress(int argc, VALUE *argv, VALUE self) return output; } -static VALUE decompress_buffered(ZSTD_DCtx* dctx, const char* input_data, size_t input_size) +VALUE decompress_buffered(ZSTD_DCtx* dctx, const char* input_data, size_t input_size, bool free_ctx) { ZSTD_inBuffer input = { input_data, input_size, 0 }; VALUE result = rb_str_new(0, 0); @@ -52,12 +52,12 @@ static VALUE decompress_buffered(ZSTD_DCtx* dctx, const char* input_data, size_t size_t ret = zstd_stream_decompress(dctx, &output, &input, false); if (ZSTD_isError(ret)) { - ZSTD_freeDCtx(dctx); + if (free_ctx) ZSTD_freeDCtx(dctx); rb_raise(rb_eRuntimeError, "%s: %s", "ZSTD_decompressStream failed", ZSTD_getErrorName(ret)); } rb_str_cat(result, output.dst, output.pos); } - ZSTD_freeDCtx(dctx); + if (free_ctx) ZSTD_freeDCtx(dctx); return result; } @@ -81,9 +81,9 @@ static VALUE rb_decompress(int argc, VALUE *argv, VALUE self) } // ZSTD_decompressStream may be called multiple times when ZSTD_CONTENTSIZE_UNKNOWN, causing slowness. 
// Therefore, we will not standardize on ZSTD_decompressStream - if (uncompressed_size == ZSTD_CONTENTSIZE_UNKNOWN) { - return decompress_buffered(dctx, input_data, input_size); - } + if (uncompressed_size == ZSTD_CONTENTSIZE_UNKNOWN) { + return decompress_buffered(dctx, input_data, input_size, true); + } VALUE output = rb_str_new(NULL, uncompressed_size); char* output_data = RSTRING_PTR(output); diff --git a/lib/zstd-ruby.rb b/lib/zstd-ruby.rb index 4e0d54d..7fb71d7 100644 --- a/lib/zstd-ruby.rb +++ b/lib/zstd-ruby.rb @@ -1,5 +1,6 @@ require "zstd-ruby/version" require "zstd-ruby/zstdruby" +require "zstd-ruby/context" require "zstd-ruby/stream_writer" require "zstd-ruby/stream_reader" diff --git a/lib/zstd-ruby/context.rb b/lib/zstd-ruby/context.rb new file mode 100644 index 0000000..0862196 --- /dev/null +++ b/lib/zstd-ruby/context.rb @@ -0,0 +1,48 @@ +module Zstd + class Context + def initialize(*args) + @args = args + @cctx = nil + @dctx = nil + + # Extract dictionary for DContext + @dict = extract_dictionary_from_args(args) + end + + def compress(data) + @cctx ||= CContext.new(*@args) + @cctx.compress(data) + end + + def decompress(data) + @dctx ||= create_dcontext + @dctx.decompress(data) + end + + private + + def extract_dictionary_from_args(args) + return nil if args.empty? + + # Check if first argument is a hash with dict key + if args.first.is_a?(Hash) && args.first.key?(:dict) + return args.first[:dict] + end + + # Check for positional dictionary argument (level, dict) + if args.length == 2 + return args[1] + end + + nil + end + + def create_dcontext + if @dict + DContext.new(@dict) + else + DContext.new + end + end + end +end diff --git a/spec/zstd-ruby-context_spec.rb b/spec/zstd-ruby-context_spec.rb new file mode 100644 index 0000000..1251a63 --- /dev/null +++ b/spec/zstd-ruby-context_spec.rb @@ -0,0 +1,267 @@ +require "spec_helper" + +describe Zstd::Context do + let(:test_data) { "Hello World!" * 100 } + let(:small_data) { "Hello World!" 
} + let(:large_data) { "A" * 100_000 } + + describe "#initialize" do + it "creates a new context with default compression level" do + ctx = Zstd::Context.new + expect(ctx).to be_a(Zstd::Context) + end + + it "creates a new context with specified compression level" do + ctx = Zstd::Context.new(level: 5) + expect(ctx).to be_a(Zstd::Context) + end + + it "creates a new context with integer compression level" do + ctx = Zstd::Context.new(5) + expect(ctx).to be_a(Zstd::Context) + end + + it "handles negative compression levels" do + ctx = Zstd::Context.new(level: -1) + expect(ctx).to be_a(Zstd::Context) + end + + it "handles high compression levels" do + ctx = Zstd::Context.new(level: 19) + expect(ctx).to be_a(Zstd::Context) + end + end + + describe "#compress" do + let(:ctx) { Zstd::Context.new(level: 3) } + + it "compresses data correctly" do + compressed = ctx.compress(test_data) + expect(compressed).to be_a(String) + expect(compressed.length).to be < test_data.length + end + + it "compresses empty string" do + compressed = ctx.compress("") + expect(compressed).to be_a(String) + end + + it "compresses small data" do + compressed = ctx.compress(small_data) + expect(compressed).to be_a(String) + end + + it "compresses large data" do + compressed = ctx.compress(large_data) + expect(compressed).to be_a(String) + expect(compressed.length).to be < large_data.length + end + + it "can compress multiple times with same context" do + compressed1 = ctx.compress(test_data) + compressed2 = ctx.compress(test_data) + + expect(compressed1).to be_a(String) + expect(compressed2).to be_a(String) + expect(compressed1).to eq(compressed2) + end + + it "compresses different data with same context" do + data1 = "First piece of data" + data2 = "Second piece of data" + + compressed1 = ctx.compress(data1) + compressed2 = ctx.compress(data2) + + expect(compressed1).to be_a(String) + expect(compressed2).to be_a(String) + expect(compressed1).not_to eq(compressed2) + end + end + + describe 
"#decompress" do + let(:ctx) { Zstd::Context.new(level: 3) } + + it "decompresses data correctly" do + compressed = ctx.compress(test_data) + decompressed = ctx.decompress(compressed) + expect(decompressed).to eq(test_data) + end + + it "decompresses empty string" do + compressed = ctx.compress("") + decompressed = ctx.decompress(compressed) + expect(decompressed).to eq("") + end + + it "decompresses small data" do + compressed = ctx.compress(small_data) + decompressed = ctx.decompress(compressed) + expect(decompressed).to eq(small_data) + end + + it "decompresses large data" do + compressed = ctx.compress(large_data) + decompressed = ctx.decompress(compressed) + expect(decompressed).to eq(large_data) + end + + it "can decompress multiple times with same context" do + compressed = ctx.compress(test_data) + decompressed1 = ctx.decompress(compressed) + decompressed2 = ctx.decompress(compressed) + + expect(decompressed1).to eq(test_data) + expect(decompressed2).to eq(test_data) + expect(decompressed1).to eq(decompressed2) + end + + it "raises error for invalid compressed data" do + expect { + ctx.decompress("invalid data") + }.to raise_error(RuntimeError, /Not compressed by zstd/) + end + end + + describe "context reuse" do + let(:ctx) { Zstd::Context.new(level: 3) } + + it "can alternate between compress and decompress operations" do + data1 = "First data" + data2 = "Second data" + + compressed1 = ctx.compress(data1) + decompressed1 = ctx.decompress(compressed1) + compressed2 = ctx.compress(data2) + decompressed2 = ctx.decompress(compressed2) + + expect(decompressed1).to eq(data1) + expect(decompressed2).to eq(data2) + end + + it "handles multiple operations efficiently" do + data_sets = [ + "Data set 1", + "Data set 2" * 10, + "Data set 3" * 100, + "", + "Single" + ] + + compressed_data = [] + decompressed_data = [] + + # Compress all data + data_sets.each do |data| + compressed_data << ctx.compress(data) + end + + # Decompress all data + compressed_data.each do 
|compressed| + decompressed_data << ctx.decompress(compressed) + end + + expect(decompressed_data).to eq(data_sets) + end + end + + describe "compatibility with module methods" do + let(:ctx) { Zstd::Context.new(level: 3) } + + it "produces compatible compressed data with Zstd.compress" do + compressed_ctx = ctx.compress(test_data) + compressed_module = Zstd.compress(test_data, level: 3) + + decompressed_from_ctx = Zstd.decompress(compressed_ctx) + decompressed_from_module = ctx.decompress(compressed_module) + + expect(decompressed_from_ctx).to eq(test_data) + expect(decompressed_from_module).to eq(test_data) + end + + it "can decompress data compressed by module methods" do + compressed = Zstd.compress(test_data, level: 3) + decompressed = ctx.decompress(compressed) + expect(decompressed).to eq(test_data) + end + + it "module methods can decompress context-compressed data" do + compressed = ctx.compress(test_data) + decompressed = Zstd.decompress(compressed) + expect(decompressed).to eq(test_data) + end + end + + describe "compression levels" do + it "different levels produce different compression ratios" do + data = "A" * 10000 + + ctx_low = Zstd::Context.new(level: 1) + ctx_high = Zstd::Context.new(level: 9) + + compressed_low = ctx_low.compress(data) + compressed_high = ctx_high.compress(data) + + # Both should decompress correctly + expect(ctx_low.decompress(compressed_low)).to eq(data) + expect(ctx_high.decompress(compressed_high)).to eq(data) + + # Higher compression should generally produce smaller output + expect(compressed_high.length).to be <= compressed_low.length + end + end + + describe "thread safety" do + it "each context instance is independent" do + ctx1 = Zstd::Context.new(level: 1) + ctx2 = Zstd::Context.new(level: 5) + + data1 = "Context 1 data" + data2 = "Context 2 data" + + compressed1 = ctx1.compress(data1) + compressed2 = ctx2.compress(data2) + + decompressed1 = ctx1.decompress(compressed1) + decompressed2 = ctx2.decompress(compressed2) + + 
expect(decompressed1).to eq(data1) + expect(decompressed2).to eq(data2) + + # Cross-decompression should also work + expect(ctx1.decompress(compressed2)).to eq(data2) + expect(ctx2.decompress(compressed1)).to eq(data1) + end + end + + describe "memory management" do + it "handles context cleanup properly" do + # Create and use many contexts to test memory management + 100.times do |i| + ctx = Zstd::Context.new(level: (i % 10) + 1) + data = "Test data #{i}" + compressed = ctx.compress(data) + decompressed = ctx.decompress(compressed) + expect(decompressed).to eq(data) + end + end + end + + describe "error handling" do + let(:ctx) { Zstd::Context.new } + + it "handles binary data correctly" do + binary_data = (0..255).map(&:chr).join * 100 + compressed = ctx.compress(binary_data) + decompressed = ctx.decompress(compressed) + expect(decompressed).to eq(binary_data) + end + + it "preserves encoding" do + utf8_data = "Hello 世界! 🌍" + compressed = ctx.compress(utf8_data) + decompressed = ctx.decompress(compressed) + expect(decompressed.force_encoding(utf8_data.encoding)).to eq(utf8_data) + end + end +end diff --git a/spec/zstd-ruby-dictionary-contexts_spec.rb b/spec/zstd-ruby-dictionary-contexts_spec.rb new file mode 100644 index 0000000..410f675 --- /dev/null +++ b/spec/zstd-ruby-dictionary-contexts_spec.rb @@ -0,0 +1,180 @@ +require "spec_helper" + +describe "Zstd Dictionary Context Support" do + let(:test_data) { "This is sample data for dictionary testing. It contains repeated patterns and common phrases that should compress well with a dictionary." 
+  }
+  let(:dictionary) do
+    # Create a simple dictionary from repeated patterns
+    "sample data dictionary testing patterns phrases compress well common repeated"
+  end
+
+  describe Zstd::CContext do
+    describe "dictionary support" do
+      it "creates context with dictionary using hash syntax" do
+        ctx = Zstd::CContext.new(level: 3, dict: dictionary)
+        expect(ctx).to be_a(Zstd::CContext)
+      end
+
+      it "creates context with dictionary using positional arguments" do
+        ctx = Zstd::CContext.new(3, dictionary)
+        expect(ctx).to be_a(Zstd::CContext)
+      end
+
+      it "compresses data using dictionary" do
+        ctx = Zstd::CContext.new(level: 3, dict: dictionary)
+        compressed = ctx.compress(test_data)
+        expect(compressed).to be_a(String)
+        expect(compressed.length).to be < test_data.length
+      end
+
+      it "produces smaller output with dictionary than without" do
+        ctx_with_dict = Zstd::CContext.new(level: 3, dict: dictionary)
+        ctx_without_dict = Zstd::CContext.new(level: 3)
+
+        compressed_with_dict = ctx_with_dict.compress(test_data)
+        compressed_without_dict = ctx_without_dict.compress(test_data)
+
+        # Dictionary should produce better compression for data with matching patterns
+        expect(compressed_with_dict.length).to be <= compressed_without_dict.length
+      end
+
+      it "can compress multiple times with same dictionary context" do
+        ctx = Zstd::CContext.new(level: 3, dict: dictionary)
+
+        compressed1 = ctx.compress(test_data)
+        compressed2 = ctx.compress(test_data + " additional data")
+
+        expect(compressed1).to be_a(String)
+        expect(compressed2).to be_a(String)
+        expect(compressed1).not_to eq(compressed2)
+      end
+
+      it "works with empty dictionary" do
+        ctx = Zstd::CContext.new(level: 3, dict: "")
+        compressed = ctx.compress(test_data)
+        expect(compressed).to be_a(String)
+      end
+    end
+  end
+
+  describe Zstd::DContext do
+    let(:cctx) { Zstd::CContext.new(level: 3, dict: dictionary) }
+    let(:compressed_data) { cctx.compress(test_data) }
+
+    describe "dictionary support" do
+      it "creates context with dictionary using hash syntax" do
+        ctx = Zstd::DContext.new(dict: dictionary)
+        expect(ctx).to be_a(Zstd::DContext)
+      end
+
+      it "creates context with dictionary using positional argument" do
+        ctx = Zstd::DContext.new(dictionary)
+        expect(ctx).to be_a(Zstd::DContext)
+      end
+
+      it "decompresses data using dictionary" do
+        ctx = Zstd::DContext.new(dict: dictionary)
+        decompressed = ctx.decompress(compressed_data)
+        expect(decompressed).to eq(test_data)
+      end
+
+      it "can decompress multiple times with same dictionary context" do
+        ctx = Zstd::DContext.new(dict: dictionary)
+
+        decompressed1 = ctx.decompress(compressed_data)
+        decompressed2 = ctx.decompress(compressed_data)
+
+        expect(decompressed1).to eq(test_data)
+        expect(decompressed2).to eq(test_data)
+      end
+
+      it "works with empty dictionary" do
+        cctx_empty = Zstd::CContext.new(level: 3, dict: "")
+        compressed_empty_dict = cctx_empty.compress(test_data)
+
+        ctx = Zstd::DContext.new(dict: "")
+        decompressed = ctx.decompress(compressed_empty_dict)
+        expect(decompressed).to eq(test_data)
+      end
+
+      it "fails with wrong dictionary" do
+        wrong_dict = "completely different dictionary content that does not match"
+        ctx = Zstd::DContext.new(dict: wrong_dict)
+
+        expect {
+          ctx.decompress(compressed_data)
+        }.to raise_error(RuntimeError, /Decompress error/)
+      end
+    end
+  end
+
+  describe "Cross-context compatibility" do
+    it "CContext with dict can be decompressed by DContext with same dict" do
+      cctx = Zstd::CContext.new(level: 3, dict: dictionary)
+      dctx = Zstd::DContext.new(dict: dictionary)
+
+      compressed = cctx.compress(test_data)
+      decompressed = dctx.decompress(compressed)
+
+      expect(decompressed).to eq(test_data)
+    end
+
+    it "works with module methods and dictionaries" do
+      cctx = Zstd::CContext.new(level: 3, dict: dictionary)
+      compressed = cctx.compress(test_data)
+
+      # Should be compatible with module dictionary decompression
+      decompressed = Zstd.decompress(compressed, dict: dictionary)
+      expect(decompressed).to eq(test_data)
+    end
+
+    it "module dictionary compression can be decompressed by DContext" do
+      compressed = Zstd.compress(test_data, level: 3, dict: dictionary)
+
+      dctx = Zstd::DContext.new(dict: dictionary)
+      decompressed = dctx.decompress(compressed)
+
+      expect(decompressed).to eq(test_data)
+    end
+  end
+
+  describe "Ruby Context with dictionaries" do
+    it "supports dictionaries in unified Context class" do
+      # This should work once Ruby Context supports dictionary options
+      expect {
+        ctx = Zstd::Context.new(level: 3, dict: dictionary)
+        compressed = ctx.compress(test_data)
+        decompressed = ctx.decompress(compressed)
+        expect(decompressed).to eq(test_data)
+      }.not_to raise_error
+    end
+  end
+
+  describe "Performance with dictionaries" do
+    it "context reuse with dictionaries is efficient" do
+      cctx = Zstd::CContext.new(level: 3, dict: dictionary)
+      dctx = Zstd::DContext.new(dict: dictionary)
+
+      # Multiple operations should work efficiently
+      10.times do |i|
+        data = "#{test_data} iteration #{i}"
+        compressed = cctx.compress(data)
+        decompressed = dctx.decompress(compressed)
+        expect(decompressed).to eq(data)
+      end
+    end
+  end
+
+  describe "Error handling" do
+    it "handles invalid dictionary gracefully" do
+      expect {
+        Zstd::CContext.new(level: 3, dict: nil)
+      }.not_to raise_error
+    end
+
+    it "raises error for non-string dictionary" do
+      expect {
+        Zstd::CContext.new(level: 3, dict: 12345)
+      }.to raise_error(TypeError)
+    end
+  end
+end
diff --git a/spec/zstd-ruby-split-contexts_spec.rb b/spec/zstd-ruby-split-contexts_spec.rb
new file mode 100644
index 0000000..b0cde60
--- /dev/null
+++ b/spec/zstd-ruby-split-contexts_spec.rb
@@ -0,0 +1,292 @@
+require "spec_helper"
+
+describe "Zstd Split Contexts" do
+  let(:test_data) { "Hello World!" * 100 }
+  let(:small_data) { "Hello World!" }
+  let(:large_data) { "A" * 100_000 }
+
+  describe Zstd::CContext do
+    describe "#initialize" do
+      it "creates a new compression context with default level" do
+        ctx = Zstd::CContext.new
+        expect(ctx).to be_a(Zstd::CContext)
+      end
+
+      it "creates a new compression context with specified level" do
+        ctx = Zstd::CContext.new(level: 5)
+        expect(ctx).to be_a(Zstd::CContext)
+      end
+
+      it "creates a new compression context with integer level" do
+        ctx = Zstd::CContext.new(5)
+        expect(ctx).to be_a(Zstd::CContext)
+      end
+
+      it "handles negative compression levels" do
+        ctx = Zstd::CContext.new(level: -1)
+        expect(ctx).to be_a(Zstd::CContext)
+      end
+
+      it "handles high compression levels" do
+        ctx = Zstd::CContext.new(level: 19)
+        expect(ctx).to be_a(Zstd::CContext)
+      end
+    end
+
+    describe "#compress" do
+      let(:ctx) { Zstd::CContext.new(level: 3) }
+
+      it "compresses data correctly" do
+        compressed = ctx.compress(test_data)
+        expect(compressed).to be_a(String)
+        expect(compressed.length).to be < test_data.length
+      end
+
+      it "compresses empty string" do
+        compressed = ctx.compress("")
+        expect(compressed).to be_a(String)
+      end
+
+      it "compresses small data" do
+        compressed = ctx.compress(small_data)
+        expect(compressed).to be_a(String)
+      end
+
+      it "compresses large data" do
+        compressed = ctx.compress(large_data)
+        expect(compressed).to be_a(String)
+        expect(compressed.length).to be < large_data.length
+      end
+
+      it "can compress multiple times with same context" do
+        compressed1 = ctx.compress(test_data)
+        compressed2 = ctx.compress(test_data)
+
+        expect(compressed1).to be_a(String)
+        expect(compressed2).to be_a(String)
+        expect(compressed1).to eq(compressed2)
+      end
+
+      it "compresses different data with same context" do
+        data1 = "First piece of data"
+        data2 = "Second piece of data"
+
+        compressed1 = ctx.compress(data1)
+        compressed2 = ctx.compress(data2)
+
+        expect(compressed1).to be_a(String)
+        expect(compressed2).to be_a(String)
+        expect(compressed1).not_to eq(compressed2)
+      end
+
+      it "is compatible with module decompression" do
+        compressed = ctx.compress(test_data)
+        decompressed = Zstd.decompress(compressed)
+        expect(decompressed).to eq(test_data)
+      end
+    end
+
+    describe "compression levels" do
+      it "different levels produce different compression ratios" do
+        data = "A" * 10000
+
+        ctx_low = Zstd::CContext.new(level: 1)
+        ctx_high = Zstd::CContext.new(level: 9)
+
+        compressed_low = ctx_low.compress(data)
+        compressed_high = ctx_high.compress(data)
+
+        # Both should decompress correctly
+        expect(Zstd.decompress(compressed_low)).to eq(data)
+        expect(Zstd.decompress(compressed_high)).to eq(data)
+
+        # Higher compression should generally produce smaller output
+        expect(compressed_high.length).to be <= compressed_low.length
+      end
+    end
+  end
+
+  describe Zstd::DContext do
+    let(:ctx) { Zstd::DContext.new }
+    let(:cctx) { Zstd::CContext.new(level: 3) }
+
+    describe "#initialize" do
+      it "creates a new decompression context" do
+        ctx = Zstd::DContext.new
+        expect(ctx).to be_a(Zstd::DContext)
+      end
+
+      it "accepts dictionary argument" do
+        dict = "sample dictionary data"
+        ctx = Zstd::DContext.new(dict)
+        expect(ctx).to be_a(Zstd::DContext)
+      end
+    end
+
+    describe "#decompress" do
+      it "decompresses data correctly" do
+        compressed = cctx.compress(test_data)
+        decompressed = ctx.decompress(compressed)
+        expect(decompressed).to eq(test_data)
+      end
+
+      it "decompresses empty string" do
+        compressed = cctx.compress("")
+        decompressed = ctx.decompress(compressed)
+        expect(decompressed).to eq("")
+      end
+
+      it "decompresses small data" do
+        compressed = cctx.compress(small_data)
+        decompressed = ctx.decompress(compressed)
+        expect(decompressed).to eq(small_data)
+      end
+
+      it "decompresses large data" do
+        compressed = cctx.compress(large_data)
+        decompressed = ctx.decompress(compressed)
+        expect(decompressed).to eq(large_data)
+      end
+
+      it "can decompress multiple times with same context" do
+        compressed = cctx.compress(test_data)
+        decompressed1 = ctx.decompress(compressed)
+        decompressed2 = ctx.decompress(compressed)
+
+        expect(decompressed1).to eq(test_data)
+        expect(decompressed2).to eq(test_data)
+        expect(decompressed1).to eq(decompressed2)
+      end
+
+      it "raises error for invalid compressed data" do
+        expect {
+          ctx.decompress("invalid data")
+        }.to raise_error(RuntimeError, /Not compressed by zstd/)
+      end
+
+      it "is compatible with module compression" do
+        compressed = Zstd.compress(test_data, level: 3)
+        decompressed = ctx.decompress(compressed)
+        expect(decompressed).to eq(test_data)
+      end
+    end
+  end
+
+  describe "Cross-compatibility" do
+    let(:cctx) { Zstd::CContext.new(level: 3) }
+    let(:dctx) { Zstd::DContext.new }
+    let(:unified_ctx) { Zstd::Context.new(level: 3) }
+
+    it "CContext output can be decompressed by DContext" do
+      compressed = cctx.compress(test_data)
+      decompressed = dctx.decompress(compressed)
+      expect(decompressed).to eq(test_data)
+    end
+
+    it "all context types produce compatible output" do
+      module_compressed = Zstd.compress(test_data, level: 3)
+      cctx_compressed = cctx.compress(test_data)
+      unified_compressed = unified_ctx.compress(test_data)
+
+      # All should decompress to the same result
+      expect(dctx.decompress(module_compressed)).to eq(test_data)
+      expect(dctx.decompress(cctx_compressed)).to eq(test_data)
+      expect(dctx.decompress(unified_compressed)).to eq(test_data)
+      expect(unified_ctx.decompress(cctx_compressed)).to eq(test_data)
+      expect(Zstd.decompress(cctx_compressed)).to eq(test_data)
+    end
+
+    it "handles mixed workloads efficiently" do
+      # Compress different data with CContext
+      data1 = "First data set"
+      data2 = "Second data set" * 10
+      data3 = "Third data set" * 100
+
+      compressed1 = cctx.compress(data1)
+      compressed2 = cctx.compress(data2)
+      compressed3 = cctx.compress(data3)
+
+      # Decompress with DContext
+      expect(dctx.decompress(compressed1)).to eq(data1)
+      expect(dctx.decompress(compressed2)).to eq(data2)
+      expect(dctx.decompress(compressed3)).to eq(data3)
+    end
+  end
+
+  describe "Memory efficiency" do
+    it "CContext uses less memory than unified Context" do
+      # This is a conceptual test - CContext only allocates compression context
+      cctx = Zstd::CContext.new(level: 3)
+      unified_ctx = Zstd::Context.new(level: 3)
+
+      # Both should work, but CContext should be more memory efficient
+      compressed_c = cctx.compress(test_data)
+      compressed_unified = unified_ctx.compress(test_data)
+
+      expect(compressed_c).to be_a(String)
+      expect(compressed_unified).to be_a(String)
+    end
+
+    it "DContext uses less memory than unified Context" do
+      # This is a conceptual test - DContext only allocates decompression context
+      compressed_data = Zstd.compress(test_data, level: 3)
+
+      dctx = Zstd::DContext.new
+      unified_ctx = Zstd::Context.new(level: 3)
+
+      # Both should work, but DContext should be more memory efficient
+      decompressed_d = dctx.decompress(compressed_data)
+      decompressed_unified = unified_ctx.decompress(compressed_data)
+
+      expect(decompressed_d).to eq(test_data)
+      expect(decompressed_unified).to eq(test_data)
+    end
+  end
+
+  describe "Error handling" do
+    let(:cctx) { Zstd::CContext.new }
+    let(:dctx) { Zstd::DContext.new }
+
+    it "handles binary data correctly" do
+      binary_data = (0..255).map(&:chr).join * 100
+      compressed = cctx.compress(binary_data)
+      decompressed = dctx.decompress(compressed)
+      expect(decompressed).to eq(binary_data)
+    end
+
+    it "handles UTF-8 data correctly" do
+      utf8_data = "Hello 世界! 🌍"
+      compressed = cctx.compress(utf8_data)
+      decompressed = dctx.decompress(compressed)
+      expect(decompressed.force_encoding(utf8_data.encoding)).to eq(utf8_data)
+    end
+  end
+
+  describe "Performance characteristics" do
+    let(:medium_data) { "A" * 10_000 }
+
+    it "contexts can be reused efficiently" do
+      cctx = Zstd::CContext.new(level: 3)
+      dctx = Zstd::DContext.new
+
+      # Multiple compression operations
+      compressed_results = []
+      10.times do |i|
+        data = "#{medium_data}_#{i}"
+        compressed_results << cctx.compress(data)
+      end
+
+      # Multiple decompression operations
+      decompressed_results = []
+      compressed_results.each do |compressed|
+        decompressed_results << dctx.decompress(compressed)
+      end
+
+      # Verify all operations succeeded
+      expect(decompressed_results.length).to eq(10)
+      decompressed_results.each_with_index do |result, i|
+        expect(result).to eq("#{medium_data}_#{i}")
+      end
+    end
+  end
+end