diff --git a/doc/specs/index.md b/doc/specs/index.md index f709fb2ca..d104c0397 100644 --- a/doc/specs/index.md +++ b/doc/specs/index.md @@ -14,6 +14,7 @@ This is an index/directory of the specifications (specs) for each new module/fea - [ansi](./stdlib_ansi.html) - Terminal color and style escape sequences - [array](./stdlib_array.html) - Procedures for index manipulation and array handling - [ascii](./stdlib_ascii.html) - Procedures for handling ASCII characters + - [base64](./stdlib_base64.html) - Base64 encoding and decoding routines - [constants](./stdlib_constants.html) - Constants - [bitsets](./stdlib_bitsets.html) - Bitset data types and procedures - [error](./stdlib_error.html) - Catching and handling errors diff --git a/doc/specs/stdlib_base64.md b/doc/specs/stdlib_base64.md new file mode 100644 index 000000000..67ea000d4 --- /dev/null +++ b/doc/specs/stdlib_base64.md @@ -0,0 +1,172 @@ +--- +title: base64 +--- + +# The `stdlib_base64` module + +[TOC] + +## Introduction + +The `stdlib_base64` module provides procedures to encode and decode intrinsic +Fortran data using the Base64 encoding scheme defined by RFC 4648. + +## Specification of the `stdlib_base64` procedures + +### `base64_encode` + +#### Status + +Experimental + +#### Description + +Encodes intrinsic arrays into Base64 text. + +This is the ergonomic API for encoding. It allocates and returns the output +string automatically. + +#### Syntax + +`res =` [[stdlib_base64(module):base64_encode(interface)]] `(data)` + +#### Class + +Function. + +#### Argument + +`data`: shall be a contiguous array of an intrinsic type (`real`, `integer`, +`complex`, `logical`, or `integer(int8)`). It is an `intent(in)` argument. + +#### Result value + +The result `res` is an allocatable character string (`character(len=:)`) +containing the Base64 representation of the input `data`. + +#### Example + +```fortran +character(len=:), allocatable :: encoded +integer(int8) :: bytes(3) = [77_int8, 97_int8, 110_int8] + +encoded = base64_encode(bytes) ! "TWFu" +``` + +### `base64_encode_into` + +#### Status + +Experimental + +#### Description + +Encodes bytes into a caller-provided output buffer. + +This is the preallocated API for throughput-sensitive workflows. It does not +allocate and reports status through `err_state`. On success, +`err_state%ok()` is `.true.` and `encoded_len` is the number of meaningful +characters written to `str`. + +#### Syntax + +`call` [[stdlib_base64(module):base64_encode_into(subroutine)]] & +`(bytes, str, encoded_len, err_state)` + +#### Class + +Pure module subroutine. + +#### Arguments + +`bytes`: shall be a contiguous array of type `integer(int8)`. It is an +`intent(in)` argument. + +`str`: shall be an intrinsic character type. It is an `intent(out)` argument. + +`encoded_len`: shall be an `integer`. It is an `intent(out)` argument +representing the number of encoded characters written. + +`err_state`: shall be a `type(state_type)` from `stdlib_error`. It is an +`intent(out)` argument used to report success or errors. + +### `base64_decode` + +#### Status + +Experimental + +#### Description + +Decodes Base64 text and returns an allocated byte string. + +This is the ergonomic API for decoding. It allocates and returns the result +automatically. On error, an empty result is returned. If `err_state` is +present, error details are reported there. + +#### Syntax + +`res =` [[stdlib_base64(module):base64_decode(function)]] `(str [, err_state])` + +#### Class + +Function. + +#### Arguments + +`str`: shall be an intrinsic character type. It is an `intent(in)` argument. + +`err_state` (optional): shall be a `type(state_type)` from `stdlib_error`. +It is an `intent(out)` argument used to report invalid Base64 sequences. + +#### Result value + +The result `res` is an allocatable character string (`character(len=:)`) +containing the decoded bytes. + +#### Example + +```fortran +character(len=:), allocatable :: decoded + +decoded = base64_decode("TWFu") ! "Man" +``` + +### `base64_decode_into` + +#### Status + +Experimental + +#### Description + +Decodes Base64 text into a caller-provided output buffer. + +This is the preallocated API for throughput-sensitive workflows. It does not +allocate and reports status through `err_state`. The optional `skip_despace` +argument can be used when the input is already whitespace-free to bypass the +despacing step. + +#### Syntax + +`call` [[stdlib_base64(module):base64_decode_into(subroutine)]] & +`(str, res, decoded_len, err_state [, skip_despace])` + +#### Class + +Pure module subroutine. + +#### Arguments + +`str`: shall be an intrinsic character type. It is an `intent(in)` argument. + +`res`: shall be an intrinsic character type. It is an `intent(out)` argument. + +`decoded_len`: shall be an `integer`. It is an `intent(out)` argument +representing the number of decoded bytes written. + +`err_state`: shall be a `type(state_type)` from `stdlib_error`. It is an +`intent(out)` argument used to report success or errors. + +`skip_despace` (optional): shall be a `logical`. It is an `intent(in)` +argument. If `.true.`, the routine assumes the input contains no whitespace. diff --git a/src/core/stdlib_ascii.fypp b/src/core/stdlib_ascii.fypp index fa062900d..1555fc492 100644 --- a/src/core/stdlib_ascii.fypp +++ b/src/core/stdlib_ascii.fypp @@ -65,6 +65,7 @@ module stdlib_ascii character(len=*), public, parameter :: letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" !! A .. Za .. z character(len=*), public, parameter :: uppercase = letters(1:26) !! A .. Z character(len=*), public, parameter :: lowercase = letters(27:) !! a .. z + character(len=*), public, parameter :: base64_alphabet = letters//digits//"+/" !! RFC 4648 Base64 alphabet character(len=*), public, parameter :: whitespace = " "//TAB//VT//CR//LF//FF !! ASCII _whitespace diff --git a/src/strings/CMakeLists.txt b/src/strings/CMakeLists.txt index ad654dc9f..e12790797 100644 --- a/src/strings/CMakeLists.txt +++ b/src/strings/CMakeLists.txt @@ -1,4 +1,7 @@ set(strings_fppFiles + stdlib_base64_encode.fypp + stdlib_base64_decode.fypp + stdlib_base64.fypp stdlib_string_type.fypp stdlib_string_type_constructor.fypp stdlib_str2num.fypp diff --git a/src/strings/stdlib_base64.fypp b/src/strings/stdlib_base64.fypp new file mode 100644 index 000000000..275f22e17 --- /dev/null +++ b/src/strings/stdlib_base64.fypp @@ -0,0 +1,177 @@ +! SPDX-Identifier: MIT +#:include "common.fypp" + +!> Base64 encoding and decoding algorithms +!> +!> This module provides procedures to safely encode and decode data +!> using the standard Base64 encoding scheme defined in RFC 4648. +!> +!> @note +!> **Experimental:** This API is currently considered experimental and +!> may be subject to change in future releases of `stdlib`. + +module stdlib_base64 + use stdlib_kinds, only: sp, dp, xdp, qp, int8, int16, int32, int64, lk, c_bool + use, intrinsic :: iso_c_binding, only: c_size_t + use stdlib_ascii, only: base64_alphabet + use stdlib_error, only: state_type + + implicit none + private + + public :: base64_encode, base64_encode_into + public :: base64_decode, base64_decode_into + + ! Branchless RFC 4648 decode map: byte -> 6-bit value, invalid -> -1. + ! The main loop OR-reduces values and checks once at the end. + integer(int8), parameter :: DT(0:255) = int( [ & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,62,-1,-1,-1,63, & + 52,53,54,55,56,57,58,59,60,61,-1,-1,-1,-1,-1,-1, & + -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14, & + 15,16,17,18,19,20,21,22,23,24,25,-1,-1,-1,-1,-1, & + -1,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40, & + 41,42,43,44,45,46,47,48,49,50,51,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, & + -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1 & + ], int8) + + ! Branchless stream-compaction mask for despace step: keep(1) / drop(0). + integer(int32), parameter :: IS_VAL(0:255) = int([ & + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, & + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, & + 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, & + 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 & + ], int32) + + !> Encodes intrinsic arrays into Base64 text. + !> + !> This is the ergonomic API for encoding. It allocates and returns the + !> output string automatically. + !> + !> Use this form when convenience matters more than avoiding allocations. + !> + !> Example: + !>```fortran + !> character(len=:), allocatable :: encoded + !> integer(int8) :: bytes(3) = [77_int8, 97_int8, 110_int8] + !> encoded = base64_encode(bytes) ! "TWFu" + !>``` + interface base64_encode +#:for k1, t1, _, _ in REAL_KINDS_TYPES + module procedure :: base64_encode_real_${k1}$ +#:endfor +#:for k1, t1, _, _ in INT_KINDS_TYPES + module procedure :: base64_encode_int_${k1}$ +#:endfor +#:for k1, t1, _ in CMPLX_KINDS_TYPES + module procedure :: base64_encode_cmplx_${k1}$ +#:endfor +#:for k1, t1 in LOG_KINDS_TYPES + module procedure :: base64_encode_logical_${k1}$ +#:endfor + end interface base64_encode + + interface +#:for k1, t1, _, _ in REAL_KINDS_TYPES + module function base64_encode_real_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + end function base64_encode_real_${k1}$ +#:endfor + +#:for k1, t1, _, _ in INT_KINDS_TYPES + module function base64_encode_int_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + end function base64_encode_int_${k1}$ +#:endfor + +#:for k1, t1, _ in CMPLX_KINDS_TYPES + module function base64_encode_cmplx_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + end function base64_encode_cmplx_${k1}$ +#:endfor + +#:for k1, t1 in LOG_KINDS_TYPES + module function base64_encode_logical_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + end function base64_encode_logical_${k1}$ +#:endfor + + module function base64_encode_bytes(bytes) result(str) + integer(int8), intent(in), target, contiguous :: bytes(:) + character(len=:), allocatable, target :: str + end function base64_encode_bytes + + !> Encodes bytes into a caller-provided output buffer. + !> + !> This is the preallocated API for throughput-sensitive workflows. + !> It does not allocate and reports status through `err_state`. + !> + !> On success, `err_state%ok()` is `.true.` and `encoded_len` is the + !> number of meaningful characters written into `str`. + pure module subroutine base64_encode_into(bytes, str, encoded_len, err_state) + integer(int8), intent(in), target, contiguous :: bytes(:) + character(len=*), intent(out), target :: str + integer, intent(out) :: encoded_len + type(state_type), intent(out) :: err_state + end subroutine base64_encode_into + + !> Decodes Base64 text into a caller-provided output buffer. + !> + !> This is the preallocated API for throughput-sensitive workflows. + !> It does not allocate and reports status through `err_state`. + !> + !> The optional `skip_despace` input can be used when the input is + !> already whitespace-free. + pure module subroutine base64_decode_into(str, res, decoded_len, err_state, skip_despace) + character(len=*), intent(in) :: str + character(len=*), intent(out) :: res + integer, intent(out) :: decoded_len + type(state_type), intent(out) :: err_state + logical, intent(in), optional :: skip_despace + end subroutine base64_decode_into + + !> Decodes Base64 text and returns an allocated byte-string. + !> + !> This is the ergonomic API for decoding. It allocates and returns + !> the result automatically. + !> + !> On error, an empty result is returned. If `err_state` is present, + !> details are stored there. + !> + !> Example: + !>```fortran + !> character(len=:), allocatable :: decoded + !> decoded = base64_decode("TWFu") ! "Man" + !>``` + module function base64_decode(str, err_state) result(res) + character(len=*), intent(in) :: str + type(state_type), intent(out), optional :: err_state + character(len=:), allocatable :: res + end function base64_decode + end interface + +end module stdlib_base64 \ No newline at end of file diff --git a/src/strings/stdlib_base64_decode.fypp b/src/strings/stdlib_base64_decode.fypp new file mode 100644 index 000000000..e3c821176 --- /dev/null +++ b/src/strings/stdlib_base64_decode.fypp @@ -0,0 +1,217 @@ +! SPDX-Identifier: MIT + +submodule(stdlib_base64) stdlib_base64_decode + use stdlib_error, only: state_type, STDLIB_SUCCESS, STDLIB_VALUE_ERROR + +contains + + pure module subroutine base64_decode_into(str, res, decoded_len, err_state, skip_despace) + character(len=*), intent(in) :: str + character(len=*), intent(out) :: res + integer, intent(out) :: decoded_len + type(state_type), intent(out) :: err_state + logical, intent(in), optional :: skip_despace + + integer(int8), allocatable :: filtered(:) + logical :: clean_mode + + integer(int32) :: v0, v1, v2, v3, c, raw_c + integer :: i, j, slen, n_pad + integer(int32) :: error_accum, unicode_accum + + err_state = state_type(STDLIB_SUCCESS) + decoded_len = 0 + unicode_accum = 0 + + if (len(str) == 0) return + + clean_mode = .false. + if (present(skip_despace)) clean_mode = skip_despace + + if (clean_mode) then + slen = len(str) + if (mod(slen, 4) /= 0) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Input length must be a multiple of 4") + return + end if + + n_pad = 0 + if (str(slen:slen) == '=') then + n_pad = 1 + if (str(slen-1:slen-1) == '=') n_pad = 2 + end if + + decoded_len = (slen / 4) * 3 - n_pad + if (len(res) < decoded_len) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Output buffer too small") + decoded_len = 0 + return + end if + else + allocate(filtered(len(str))) + j = 0 + do i = 1, len(str) + raw_c = int(iachar(str(i:i)), int32) + unicode_accum = ior(unicode_accum, raw_c) + c = iand(raw_c, 255_int32) + if (IS_VAL(c) == 0) cycle + j = j + 1 + filtered(j) = int(c, int8) + end do + slen = j + + if (unicode_accum > 255_int32 .or. slen == 0 .or. mod(slen, 4) /= 0) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Invalid Base64 byte stream") + return + end if + + n_pad = 0 + if (filtered(slen) == 61_int8) then + n_pad = 1 + if (filtered(slen-1) == 61_int8) n_pad = 2 + end if + + decoded_len = (slen / 4) * 3 - n_pad + if (len(res) < decoded_len) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Output buffer too small") + decoded_len = 0 + return + end if + end if + + error_accum = 0 + + ! Fortran Scalar Decoder Loop + do i = 1, slen - 7, 4 + if (clean_mode) then + raw_c = int(iachar(str(i:i)), int32) + unicode_accum = ior(unicode_accum, raw_c) + v0 = int(DT(iand(raw_c, 255_int32)), int32) + raw_c = int(iachar(str(i+1:i+1)), int32) + unicode_accum = ior(unicode_accum, raw_c) + v1 = int(DT(iand(raw_c, 255_int32)), int32) + raw_c = int(iachar(str(i+2:i+2)), int32) + unicode_accum = ior(unicode_accum, raw_c) + v2 = int(DT(iand(raw_c, 255_int32)), int32) + raw_c = int(iachar(str(i+3:i+3)), int32) + unicode_accum = ior(unicode_accum, raw_c) + v3 = int(DT(iand(raw_c, 255_int32)), int32) + else + v0 = int(DT(iand(int(filtered(i), int32), 255_int32)), int32) + v1 = int(DT(iand(int(filtered(i+1), int32), 255_int32)), int32) + v2 = int(DT(iand(int(filtered(i+2), int32), 255_int32)), int32) + v3 = int(DT(iand(int(filtered(i+3), int32), 255_int32)), int32) + end if + + error_accum = ior(error_accum, ior(v0, ior(v1, ior(v2, v3)))) + + j = (i - 1) / 4 * 3 + 1 + res(j:j) = char(ior(ishft(v0, 2), ishft(v1, -4))) + res(j+1:j+1) = char(ior(ishft(iand(v1, 15_int32), 4), ishft(v2, -2))) + res(j+2:j+2) = char(ior(ishft(iand(v2, 3_int32), 6), v3)) + end do + + if (error_accum < 0 .or. unicode_accum > 255_int32) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Invalid Base64 alphabet or non-ASCII input") + decoded_len = 0 + return + end if + + ! Fortran Tail Padding Logic + i = slen - 3 + j = (i - 1) / 4 * 3 + 1 + + if (clean_mode) then + raw_c = int(iachar(str(i:i)), int32) + unicode_accum = ior(unicode_accum, raw_c) + c = iand(raw_c, 255_int32) + v0 = int(DT(c), int32) + raw_c = int(iachar(str(i+1:i+1)), int32) + unicode_accum = ior(unicode_accum, raw_c) + c = iand(raw_c, 255_int32) + v1 = int(DT(c), int32) + raw_c = int(iachar(str(i+2:i+2)), int32) + unicode_accum = ior(unicode_accum, raw_c) + c = iand(raw_c, 255_int32) + v2 = int(DT(c), int32) + raw_c = int(iachar(str(i+3:i+3)), int32) + unicode_accum = ior(unicode_accum, raw_c) + c = iand(raw_c, 255_int32) + v3 = int(DT(c), int32) + else + c = iand(int(filtered(i), int32), 255_int32) + v0 = int(DT(c), int32) + c = iand(int(filtered(i+1), int32), 255_int32) + v1 = int(DT(c), int32) + c = iand(int(filtered(i+2), int32), 255_int32) + v2 = int(DT(c), int32) + c = iand(int(filtered(i+3), int32), 255_int32) + v3 = int(DT(c), int32) + end if + + if (v0 < 0 .or. v1 < 0 .or. unicode_accum > 255_int32) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Invalid final quartet") + decoded_len = 0 + return + end if + + res(j:j) = char(ior(ishft(v0, 2), ishft(v1, -4))) + + if (n_pad < 2) then + if (v2 < 0) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Invalid final quartet") + decoded_len = 0 + return + end if + res(j+1:j+1) = char(ior(ishft(iand(v1, 15_int32), 4), ishft(v2, -2))) + end if + + if (n_pad == 0) then + if (v3 < 0) then + err_state = state_type("base64_decode_into", STDLIB_VALUE_ERROR, & + "Invalid final quartet") + decoded_len = 0 + return + end if + res(j+2:j+2) = char(ior(ishft(iand(v2, 3_int32), 6), v3)) + end if + end subroutine base64_decode_into + + module function base64_decode(str, err_state) result(res) + character(len=*), intent(in) :: str + type(state_type), intent(out), optional :: err_state + character(len=:), allocatable :: res + + integer :: decoded_len + type(state_type) :: local_state + + if (len(str) == 0) then + allocate(character(len=0) :: res) + local_state = state_type(STDLIB_SUCCESS) + if (present(err_state)) err_state = local_state + return + end if + + ! Pre-allocate maximum possible size needed + allocate(character(len=(len(str) / 4) * 3 + 3) :: res) + + call base64_decode_into(str, res, decoded_len, local_state) + + if (.not. local_state%ok()) then + deallocate(res) + allocate(character(len=0) :: res) + else + res = res(1:decoded_len) + end if + + if (present(err_state)) err_state = local_state + end function base64_decode + +end submodule stdlib_base64_decode \ No newline at end of file diff --git a/src/strings/stdlib_base64_encode.fypp b/src/strings/stdlib_base64_encode.fypp new file mode 100644 index 000000000..a2d87fca9 --- /dev/null +++ b/src/strings/stdlib_base64_encode.fypp @@ -0,0 +1,209 @@ +! SPDX-Identifier: MIT +#:include "common.fypp" + +submodule(stdlib_base64) stdlib_base64_encode + use stdlib_error, only: state_type, STDLIB_SUCCESS, STDLIB_VALUE_ERROR + +contains + +#:set RANKS = range(0, MAXRANK + 1) + +#:for k1, t1, _, _ in REAL_KINDS_TYPES + module function base64_encode_real_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + integer(int64) :: nbytes + nbytes = int(size(data), int64) * storage_size(data) / 8 + if (nbytes == 0) then + str = "" + return + end if + ! Standard-compliant conversion using transfer + select rank(data) + #:for r in RANKS + rank(${r}$) + str = base64_encode_bytes(transfer(data, [0_int8], int(nbytes))) + #:endfor + end select + end function base64_encode_real_${k1}$ +#:endfor + +#:for k1, t1, _, _ in INT_KINDS_TYPES + module function base64_encode_int_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + integer(int64) :: nbytes + nbytes = int(size(data), int64) * storage_size(data) / 8 + if (nbytes == 0) then + str = "" + return + end if + select rank(data) + #:for r in RANKS + rank(${r}$) + str = base64_encode_bytes(transfer(data, [0_int8], int(nbytes))) + #:endfor + end select + end function base64_encode_int_${k1}$ +#:endfor + +#:for k1, t1, _ in CMPLX_KINDS_TYPES + module function base64_encode_cmplx_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + integer(int64) :: nbytes + nbytes = int(size(data), int64) * storage_size(data) / 8 + if (nbytes == 0) then + str = "" + return + end if + select rank(data) + #:for r in RANKS + rank(${r}$) + str = base64_encode_bytes(transfer(data, [0_int8], int(nbytes))) + #:endfor + end select + end function base64_encode_cmplx_${k1}$ +#:endfor + +#:for k1, t1 in LOG_KINDS_TYPES + module function base64_encode_logical_${k1}$(data) result(str) + ${t1}$, intent(in), target, contiguous :: data(..) + character(len=:), allocatable :: str + integer(int64) :: nbytes + nbytes = int(size(data), int64) * storage_size(data) / 8 + if (nbytes == 0) then + str = "" + return + end if + select rank(data) + #:for r in RANKS + rank(${r}$) + str = base64_encode_bytes(transfer(data, [0_int8], int(nbytes))) + #:endfor + end select + end function base64_encode_logical_${k1}$ +#:endfor + + module function base64_encode_bytes(bytes) result(str) + integer(int8), intent(in), target, contiguous :: bytes(:) + character(len=:), allocatable, target :: str + ! Use c_size_t to safely track the size of massive arrays + ! without risking 32-bit integer overflow, and to ensure C-interop + integer(c_size_t) :: nbytes + integer :: str_len + integer :: i, j + integer(int32) :: triplet + + nbytes = size(bytes, kind=c_size_t) + str_len = 4 * ((int(nbytes) + 2) / 3) + + allocate(character(len=str_len) :: str) + if (nbytes == 0) return + + j = 1 + do i = 1, int(nbytes) - 2, 3 + triplet = ior(ishft(iand(int(bytes(i), int32), 255_int32), 16_int32), & + ior(ishft(iand(int(bytes(i+1), int32), 255_int32), 8_int32), & + iand(int(bytes(i+2), int32), 255_int32))) + + str(j:j) = base64_alphabet( & + ishft(triplet, -18_int32) + 1 : ishft(triplet, -18_int32) + 1) + str(j+1:j+1) = base64_alphabet( & + iand(ishft(triplet, -12_int32), 63_int32) + 1 : iand(ishft(triplet, -12_int32), 63_int32) + 1) + str(j+2:j+2) = base64_alphabet( & + iand(ishft(triplet, -6_int32), 63_int32) + 1 : iand(ishft(triplet, -6_int32), 63_int32) + 1) + str(j+3:j+3) = base64_alphabet(iand(triplet, 63_int32) + 1 : iand(triplet, 63_int32) + 1) + j = j + 4 + end do + + if (mod(nbytes, 3_c_size_t) == 1) then + triplet = ishft(iand(int(bytes(int(nbytes)), int32), 255_int32), 16_int32) + str(j:j) = base64_alphabet( & + ishft(triplet, -18_int32) + 1 : ishft(triplet, -18_int32) + 1) + str(j+1:j+1) = base64_alphabet( & + iand(ishft(triplet, -12_int32), 63_int32) + 1 : iand(ishft(triplet, -12_int32), 63_int32) + 1) + str(j+2:j+3) = "==" + else if (mod(nbytes, 3_c_size_t) == 2) then + triplet = ior(ishft(iand(int(bytes(int(nbytes)-1), int32), 255_int32), 16_int32), & + ishft(iand(int(bytes(int(nbytes)), int32), 255_int32), 8_int32)) + str(j:j) = base64_alphabet( & + ishft(triplet, -18_int32) + 1 : ishft(triplet, -18_int32) + 1) + str(j+1:j+1) = base64_alphabet( & + iand(ishft(triplet, -12_int32), 63_int32) + 1 : iand(ishft(triplet, -12_int32), 63_int32) + 1) + str(j+2:j+2) = base64_alphabet( & + iand(ishft(triplet, -6_int32), 63_int32) + 1 : iand(ishft(triplet, -6_int32), 63_int32) + 1) + str(j+3:j+3) = "=" + end if + + end function base64_encode_bytes + + pure module subroutine base64_encode_into(bytes, str, encoded_len, err_state) + integer(int8), intent(in), target, contiguous :: bytes(:) + character(len=*), intent(out), target :: str + integer, intent(out) :: encoded_len + type(state_type), intent(out) :: err_state + + integer(c_size_t) :: nbytes + integer :: str_len + integer :: i, j + integer(int32) :: triplet + + encoded_len = 0 + err_state = state_type(STDLIB_SUCCESS) + + nbytes = size(bytes, kind=c_size_t) + if (nbytes == 0) then + str = '' + return + end if + + str_len = 4 * ((int(nbytes) + 2) / 3) + + if (len(str) < str_len) then + str = '' + err_state = state_type("base64_encode_into", STDLIB_VALUE_ERROR, & + "Output buffer too small") + return + end if + + encoded_len = str_len + + j = 1 + do i = 1, int(nbytes) - 2, 3 + triplet = ior(ishft(iand(int(bytes(i), int32), 255_int32), 16_int32), & + ior(ishft(iand(int(bytes(i+1), int32), 255_int32), 8_int32), & + iand(int(bytes(i+2), int32), 255_int32))) + + str(j:j) = base64_alphabet( & + ishft(triplet, -18_int32) + 1 : ishft(triplet, -18_int32) + 1) + str(j+1:j+1) = base64_alphabet( & + iand(ishft(triplet, -12_int32), 63_int32) + 1 : iand(ishft(triplet, -12_int32), 63_int32) + 1) + str(j+2:j+2) = base64_alphabet( & + iand(ishft(triplet, -6_int32), 63_int32) + 1 : iand(ishft(triplet, -6_int32), 63_int32) + 1) + str(j+3:j+3) = base64_alphabet(iand(triplet, 63_int32) + 1 : iand(triplet, 63_int32) + 1) + j = j + 4 + end do + + if (mod(nbytes, 3_c_size_t) == 1) then + triplet = ishft(iand(int(bytes(int(nbytes)), int32), 255_int32), 16_int32) + str(j:j) = base64_alphabet( & + ishft(triplet, -18_int32) + 1 : ishft(triplet, -18_int32) + 1) + str(j+1:j+1) = base64_alphabet( & + iand(ishft(triplet, -12_int32), 63_int32) + 1 : iand(ishft(triplet, -12_int32), 63_int32) + 1) + str(j+2:j+3) = "==" + else if (mod(nbytes, 3_c_size_t) == 2) then + triplet = ior(ishft(iand(int(bytes(int(nbytes)-1), int32), 255_int32), 16_int32), & + ishft(iand(int(bytes(int(nbytes)), int32), 255_int32), 8_int32)) + str(j:j) = base64_alphabet( & + ishft(triplet, -18_int32) + 1 : ishft(triplet, -18_int32) + 1) + str(j+1:j+1) = base64_alphabet( & + iand(ishft(triplet, -12_int32), 63_int32) + 1 : iand(ishft(triplet, -12_int32), 63_int32) + 1) + str(j+2:j+2) = base64_alphabet( & + iand(ishft(triplet, -6_int32), 63_int32) + 1 : iand(ishft(triplet, -6_int32), 63_int32) + 1) + str(j+3:j+3) = "=" + end if + + end subroutine base64_encode_into + +end submodule stdlib_base64_encode \ No newline at end of file diff --git a/src/strings/stdlib_strings.fypp b/src/strings/stdlib_strings.fypp index 9cacb1f72..1f449e235 100644 --- a/src/strings/stdlib_strings.fypp +++ b/src/strings/stdlib_strings.fypp @@ -5,6 +5,7 @@ !> The specification of this module is available [here](../page/specs/stdlib_strings.html). module stdlib_strings use stdlib_ascii, only: whitespace + use stdlib_base64, only: base64_encode, base64_decode use stdlib_string_type, only: string_type, char, verify, repeat, len, len_trim, move use stdlib_optval, only: optval use stdlib_kinds, only: sp, dp, xdp, qp, int8, int16, int32, int64, lk, c_bool, c_char @@ -13,6 +14,7 @@ module stdlib_strings private public :: to_string + public :: base64_encode, base64_decode public :: to_c_char public :: strip, chomp public :: starts_with, ends_with diff --git a/test/string/CMakeLists.txt b/test/string/CMakeLists.txt index 833471c9e..a22145a2d 100644 --- a/test/string/CMakeLists.txt +++ b/test/string/CMakeLists.txt @@ -14,6 +14,7 @@ ADDTEST(string_intrinsic) ADDTEST(string_match) ADDTEST(string_derivedtype_io) ADDTEST(string_functions) +ADDTEST(base64) ADDTEST(string_strip_chomp) ADDTEST(string_to_number) ADDTEST(string_to_string) diff --git a/test/string/test_base64.f90 b/test/string/test_base64.f90 new file mode 100644 index 000000000..b8852d093 --- /dev/null +++ b/test/string/test_base64.f90 @@ -0,0 +1,209 @@ +! SPDX-Identifier: MIT +module test_base64 + use testdrive, only : new_unittest, unittest_type, error_type, check + use stdlib_base64, only : base64_encode, base64_decode, base64_encode_into, base64_decode_into + use stdlib_kinds, only : int8, int32, dp, lk + use stdlib_error, only : state_type + implicit none + +contains + + subroutine collect_base64(testsuite) + type(unittest_type), allocatable, intent(out) :: testsuite(:) + + testsuite = [ & + new_unittest("base64-known-vectors", test_known_vectors), & + new_unittest("base64-decode-whitespace", test_decode_whitespace), & + new_unittest("base64-decode-invalid", test_decode_invalid), & + new_unittest("base64-roundtrip-int32", test_roundtrip_int32), & + new_unittest("base64-roundtrip-real", test_roundtrip_real), & + new_unittest("base64-roundtrip-complex", test_roundtrip_complex), & + new_unittest("base64-roundtrip-logical", test_roundtrip_logical), & + new_unittest("base64-rank0", test_rank0_encode), & + new_unittest("base64-encode-into", test_encode_into), & + new_unittest("base64-decode-into", test_decode_into) & + ] + end subroutine collect_base64 + + subroutine test_known_vectors(error) + type(error_type), allocatable, intent(out) :: error + + call check(error, base64_encode([int(77, int8), int(97, int8), int(110, int8)]) == "TWFu") + if (allocated(error)) return + call check(error, base64_encode([int(77, int8), int(97, int8)]) == "TWE=") + if (allocated(error)) return + call check(error, base64_encode([int(77, int8)]) == "TQ==") + if (allocated(error)) return + + call check(error, base64_decode("TWFu") == "Man") + if (allocated(error)) return + call check(error, base64_decode("TWE=") == "Ma") + if (allocated(error)) return + call check(error, base64_decode("TQ==") == "M") + end subroutine test_known_vectors + + subroutine test_decode_whitespace(error) + type(error_type), allocatable, intent(out) :: error + + call check(error, base64_decode("T W" // new_line("a") // "Fu") == "Man") + end subroutine test_decode_whitespace + + subroutine test_decode_invalid(error) + type(error_type), allocatable, intent(out) :: error + type(state_type) :: err_flag + + call check(error, base64_decode("abc", err_flag) == "") + if (allocated(error)) return + call check(error, .not. err_flag%ok()) + if (allocated(error)) return + + call check(error, base64_decode("A===") == "") + if (allocated(error)) return + call check(error, base64_decode("AA=A") == "") + end subroutine test_decode_invalid + + subroutine test_roundtrip_int32(error) + type(error_type), allocatable, intent(out) :: error + + integer(int32) :: vals(4), got(4) + character(len=:), allocatable :: enc, dec + + vals = [1_int32, -2_int32, 1024_int32, -4096_int32] + + enc = base64_encode(vals) + dec = base64_decode(enc) + got = transfer(dec, got) + + call check(error, all(got == vals)) + end subroutine test_roundtrip_int32 + + subroutine test_roundtrip_real(error) + type(error_type), allocatable, intent(out) :: error + + real(dp) :: got(4) + character(len=:), allocatable :: enc, dec + + real(dp), parameter :: vals(*) = [1.5_dp, -2.25_dp, 0.125_dp, 9.0_dp] + + enc = base64_encode(vals) + dec = base64_decode(enc) + got = transfer(dec, got) + + call check(error, all(got == vals)) + end subroutine test_roundtrip_real + + subroutine test_roundtrip_complex(error) + type(error_type), allocatable, intent(out) :: error + + complex(dp) :: got(3) + character(len=:), allocatable :: enc, dec + + complex(dp), parameter :: vals(*) = [cmplx(1.0_dp, 2.0_dp, dp), & + cmplx(-3.0_dp, 0.5_dp, dp), cmplx(0.0_dp, -4.0_dp, dp)] + + enc = base64_encode(vals) + dec = base64_decode(enc) + got = transfer(dec, got) + + call check(error, all(got == vals)) + end subroutine test_roundtrip_complex + + subroutine test_roundtrip_logical(error) + type(error_type), allocatable, intent(out) :: error + logical(lk) :: vals(5), got(5) + character(len=:), allocatable :: enc, dec + + vals = [.true._lk, .false._lk, .true._lk, .true._lk, .false._lk] + + enc = base64_encode(vals) + dec = base64_decode(enc) + got = transfer(dec, got) + + call check(error, all((vals .neqv. .false._lk) .eqv. (got .neqv. .false._lk))) + end subroutine test_roundtrip_logical + + subroutine test_rank0_encode(error) + type(error_type), allocatable, intent(out) :: error + + integer(int32) :: v + + v = 42_int32 + call check(error, len(base64_encode(v)) > 0) + end subroutine test_rank0_encode + + subroutine test_encode_into(error) + type(error_type), allocatable, intent(out) :: error + integer(int8) :: bytes(3) + character(len=4) :: str + character(len=2) :: small_str + integer :: elen + type(state_type) :: err + + bytes = [int(77, int8), int(97, int8), int(110, int8)] + + ! Test successful zero-copy encode + call base64_encode_into(bytes, str, elen, err) + call check(error, err%ok()) + if (allocated(error)) return + call check(error, elen == 4) + if (allocated(error)) return + call check(error, str == "TWFu") + if (allocated(error)) return + + ! Test buffer too small (Expect Error) + call base64_encode_into(bytes, small_str, elen, err) + call check(error, .not. err%ok()) + if (allocated(error)) return + call check(error, small_str == "") + end subroutine test_encode_into + + subroutine test_decode_into(error) + type(error_type), allocatable, intent(out) :: error + character(len=4) :: str = "TWFu" + character(len=3) :: res + character(len=1) :: small_res + integer :: dlen + type(state_type) :: err + + ! Test successful decode + call base64_decode_into(str, res, dlen, err) + call check(error, err%ok()) + if (allocated(error)) return + call check(error, dlen == 3) + if (allocated(error)) return + call check(error, res == "Man") + if (allocated(error)) return + + ! Test buffer too small (Expect Error) + call base64_decode_into(str, small_res, dlen, err) + call check(error, .not. err%ok()) + if (allocated(error)) return + end subroutine test_decode_into + +end module test_base64 + +program tester + use, intrinsic :: iso_fortran_env, only : error_unit + use testdrive, only : run_testsuite, new_testsuite, testsuite_type + use test_base64, only : collect_base64 + implicit none + integer :: stat, is + type(testsuite_type), allocatable :: testsuites(:) + character(len=*), parameter :: fmt = '("#", *(1x, a))' + + stat = 0 + + testsuites = [ & + new_testsuite("base64", collect_base64) & + ] + + do is = 1, size(testsuites) + write(error_unit, fmt) "Testing:", testsuites(is)%name + call run_testsuite(testsuites(is)%collect, error_unit, stat) + end do + + if (stat > 0) then + write(error_unit, '(i0, 1x, a)') stat, "test(s) failed!" + error stop + end if +end program tester \ No newline at end of file