Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for 10/12-bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably caused by clipping
 - Improvements to the ARM64 msac, cdef and looprestoration optimizations
 - Improvements to the AVX2 optimizations for cdef_filter
 - Improvements to the C versions of itxfm and cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed on ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve speed by reducing L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speed and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly when debug info is enabled
 - Add a workaround for the Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements for blend_ on ARM
 - Speed improvements for decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match the reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - Initial PPC SIMD code, for cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release of the numerous speed improvements introduced in 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement in MSAC decoding with SSE, bringing a 4-6% speed increase;
   the impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crash fixes, improvements and build changes


| 114 | + |
| 115 | +Changes for 0.2.1 'Antelope': |
| 116 | +---------------------------- |
| 117 | + |
| 118 | + - SSSE3 optimization for cdef_dir |
| 119 | + - AVX2 improvements of the existing CDEF optimizations |
| 120 | + - NEON improvements of the existing CDEF and wiener optimizations |
| 121 | + - Clarification about the numbering/versionning scheme |
| 122 | + |
| 123 | + |
Changes for 0.2.0 'Antelope':
-----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32-bit and 64-bit
 - More AVX2 assembly, now nearly complete
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for undecodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bit depths: 8, 10 and 12 bits
 - Support for all chroma subsamplings: 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for 64-bit AVX2 processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors