122 commits
557aad1
Turn: Add RGB48, RGB64 and planar with alpha formats support.
chikuzen Aug 23, 2016
3e0456a
ConvertTo8/16/Float: true 10-12-14 bit range support. New parameters.
Aug 23, 2016
b09bc11
New params for Info(): c[font]s[size]f[text_color]i[halo_color]i. Fix…
Aug 23, 2016
06d0240
ConvertToRGBxxx: 16 bit/float, RGB48/64,PlanarRGB(A), except PlanarRG…
Aug 24, 2016
3a2b8b3
New: PlanarToRGBA. ToRGB conversions seem to be ready.
Aug 25, 2016
2301c2c
Fix: Convert RGB32->PlanarRGB (alpha->nonalpha)
Aug 25, 2016
976064b
Resizers: RGB48/64, PlanarRGB 8/16/Float
Aug 26, 2016
8a11a08
Resamplers: Alpha plane (YUVA, PlanarRGBA)
Aug 26, 2016
46089bb
Resize: SSE2/4.1 16bit/float in resize_v_sseX_planar_16or32 with pure…
Aug 26, 2016
c97f545
Remove unused code
Aug 26, 2016
721c29b
Convert: RGB48/64,PlanarRGB(A)8/10-16/32 -> YUV(A)8/10-16/32
Aug 26, 2016
8579f2a
Remove unused code
Aug 26, 2016
55057fa
Remove unused code
Aug 26, 2016
7167307
Typo left from earlier trials
Aug 26, 2016
7998456
RGB48/64+PlanarRGB(A) for
Aug 26, 2016
6780ff4
avisynth C: missing alpha fields in VideoFrame struct + planar R,G,B …
Aug 29, 2016
b514645
fix :Convert PlanarRGB->444 at 8 bit
Aug 29, 2016
815d59d
Horizontal/VerticalReduceBy2: PlanarRGB+alpha. NoRGB48/64yet
Aug 29, 2016
b84055d
StackHorizontal/Vertical: PlanarRGB, alpha
Aug 29, 2016
0f04e5f
Merge(Chroma,Luma):Planar RGB, float type fix
Aug 29, 2016
48eb640
fix: typo in default alpha value
Aug 29, 2016
0b6c7ad
make fill_plane and fill_chroma public
Aug 29, 2016
7f92e2c
ShowRed/Green/Blue/Alpha() for RGB48/64 source. New targets RGB48/64/…
Aug 29, 2016
783fe2e
ShowRed/G/B/A parameter check message fix
Aug 29, 2016
98376c9
ShowChannel: kill warnings
Aug 29, 2016
ef2eb0d
ConvertToY: RGB48/64, PlanarRGB8/16/float support
Aug 30, 2016
417282b
HorizontalReduceBy2: RGB48/64. No todo left here.
Aug 30, 2016
5b537c3
resize_h_c_planar: much faster by changing x-y loop order
Aug 30, 2016
3e8f2f8
resizer_h_ssse3_generic for 16bit/float. Resizers are superfast now f…
Aug 30, 2016
cd782ca
ColorYUV for 10-16 bit with native 10,12,14 bitdepth.
Sep 1, 2016
fab87f4
Text overlay native 10-12-14 bit-aware.
Sep 1, 2016
74116e7
Blur/Sharpen: RGB48/64 and Planar RGB(A)
Sep 1, 2016
4446dd4
Conditional (runtime) functions for YUV 16 bit/float and RGB64 and Pl…
Sep 2, 2016
3c6a1cd
Conditional_functions: Remove refactored creates
Sep 2, 2016
79cb2cd
conditional_functions: return from one common place
Sep 2, 2016
d97c5ee
Levels: 10-16 bit support for YUV(A), PlanarRGB(A), 16 bits for RGB48/64
Sep 4, 2016
1b7e57f
RGBAdjust: RGB48/64 and Planar RGB(A) 10-16 bit
Sep 4, 2016
440a2cd
RGBAdjust: code cleanup to templates
Sep 5, 2016
8360ed6
Fix Limiter parameter check (was never true)
Sep 5, 2016
c32c9ec
Allow plugins to use parameter type double
Sep 5, 2016
2639567
Limiter: 10-16 bit YUV support. Todo: show options for 10+ bits
Sep 5, 2016
e536fb1
Convert sse2 for YUV 8->10-16, 16-10->8 bit (with shifts)
Sep 5, 2016
21d0a98
Limiter: port show option for 10-16 bits
Sep 6, 2016
4f67b39
Limiter: Remove old non-templated 8 bit code
Sep 6, 2016
6c85f4c
Revert a false idea (double type support in plugins e.g. virtualdub)
Sep 7, 2016
34db747
Invert: YUV(A)/PlanarRGB(A) 8,10-16,32 bit, RGB48/64, with SSE2
Sep 7, 2016
34ea551
Blur/Sharpen: 10,12,14 bit clamp, TemporalSoften: SAD scale ok for 10…
Sep 7, 2016
105cdfc
PlaneSwap <plane>ToY: 10-12-14 bit aware
Sep 7, 2016
132b5f7
ConvertToY8: 10-12-14 bit aware
Sep 7, 2016
672d8bf
Mask: hbd alpha formats RGB64 and PlanarRGBA (8,10-16,float) support
Sep 7, 2016
6c5d0b0
GreyScale(): RGB64/PlanarRGB(A) + all 10-12-14 bit aware
Sep 8, 2016
0470711
ResetMask: RGB64, planars with alpha (RGBA, YUVA)
Sep 8, 2016
ec2c88c
Layer() for RGB64, ResetMask: new "mask" parameter. Default: max of p…
Sep 8, 2016
6340238
Info(): missing 8 bit YUVA color space descriptions
Sep 8, 2016
3967091
Planar RGBA -> 422/420 conversion chain results YUVA as it works for …
Sep 8, 2016
5c0be9b
ConvertToYUV411: alias for ConvertToYV411. For naming consistency
Sep 8, 2016
e780ae4
Convert: Planar RGB <> YUV: 10-12-14 bit range
Sep 10, 2016
808d9c9
StackVertical: don't reverse frame order for planar RGBs
Sep 10, 2016
8211308
General functions for pixel_type<->pixel_type_name, aliases also acce…
Sep 10, 2016
f55c7d4
ShowChannel() to use GetPixelTypeFromName()
Sep 10, 2016
1827ec4
BlankClip and Colorbars to use GetPixelTypeFromName()
Sep 10, 2016
712f342
FilterInfo to use GetPixelTypeName()
Sep 10, 2016
b2216fd
frame->GetRowSize() and GetHeight() to return 0 if no Alpha plane (fo…
Sep 12, 2016
8f0e2bf
Comment in order to not mislead me again
Sep 12, 2016
37173f1
ConvertToY: make rgb matrix offset_y always 8 bit, conversion later
Sep 12, 2016
53f46a5
Memo why AVSValue will never get 64 bit types in Avisynth(32)
Sep 12, 2016
7a34064
Overlay! Working "add" method for 16 bit input. Big changes, no clean…
Sep 12, 2016
3b603a2
AddBorders: 10-12-14 bit aware, stretch 8 bit colors for RGB (10-16)
Sep 13, 2016
c7c585c
BlankClip: 10-12-14 bit aware
Sep 13, 2016
f15cb76
Tweak: Luma LUT for 10-16 bits, Chroma LUT for 10 bit (old: LUT 8 bit…
Sep 14, 2016
aaf8977
Subtract: Planar RGB(A), YUV(A) 10-16,Float, RGB48 and RGB64
Sep 14, 2016
1ded5d5
ColorKeyMask: RGB64, Planar RGBA 8-16,float.
Sep 14, 2016
152f328
MergeChroma: fix regression for 8 bit (+variable renames)
Sep 15, 2016
873f4f7
MaskHS: 10-16bit,float. Tweak: fix using start/endHue, min/maxSat for…
Sep 15, 2016
cdab57e
Apply template<typename pixel_t> naming style
Sep 15, 2016
53890f5
Tweak dither strength back to base +/-0.5. Use env2->Allocate for RGB…
Sep 16, 2016
cf65dbc
Histogram: "Levels" with bits=10 gives 10-bit wide histogram for 10+ …
Sep 16, 2016
3caef6c
Histogram "levels": parameter: bits=8,9,10,11,12 for finer histograms…
Sep 17, 2016
37599c9
Levels, Tweak, RGBAdjust: dither range fix for 10-16 bit + float
Sep 17, 2016
a6c0d50
ConvertBits(), refactor, YUVA alpha always full scale to retain max. …
Sep 19, 2016
598efff
rec2020 matrix for RGB<->444 and GreyScale. not for YUY2
Sep 19, 2016
47571ad
Merge: SSE2 for 10-14 bits (10-16 for SSE4.1 still work)
Sep 22, 2016
0077f82
Ordered dither for 10-16->8 bit. SSE2 for 8->10-16 bit full scale (RGB).
Sep 22, 2016
547f536
ConvertBits: dither parameter check
Sep 22, 2016
ea0655a
TemporalSoften: SSE2 for SAD 16 bit, bits_per_pixel SAD scaling
Sep 29, 2016
36ef8a7
No need "typename" in non-templated function (compiler compatibility)
Sep 29, 2016
133b9f0
VDubFilter: really allow and convert double/long-type params
Sep 30, 2016
8865287
Merge remote-tracking branch 'chikuzen/turn' into MT
Sep 30, 2016
b450628
CPU feature constants for AVX2, FMA3, F16C (AES, MOVBE) + Info()
Oct 1, 2016
0023d71
Remove MSVC specific version checking from CPUCheckForExtensions
Oct 1, 2016
5dd0d8b
AddAlphaPlane/RemoveAlphaPlane + misc refactor on ->8bit conversion
Oct 1, 2016
2dba164
Avisynth.h: auto fallback to avs2.6 when avs+ VideoInfo:: member func…
Oct 5, 2016
c586ae0
YUV 10-16->10-16 bitdepth conversions to SSE2
Oct 5, 2016
8ab11f4
Missing parenthesis when x64 build
Oct 12, 2016
409069c
Fix some warnings
Oct 12, 2016
cdd15f6
Fix warning
Oct 12, 2016
595744e
AVX code path for 16/32 bit resampler
Oct 12, 2016
fdb0448
Resizers: proper clamping for 10,12,14 bits
Oct 12, 2016
6e942ec
Fix avisynth.h for x64 build
Oct 12, 2016
595d38f
AVX and AVX2 paths for 32->(8..16) and 10..16<->10..16 bitdepth conve…
Oct 12, 2016
e0640b5
Use AVX/AVX2 path for two bitdepth conversion function
Oct 12, 2016
8af39fa
cmakelist.txt: set special avx/avx2 options for files *_avx.cpp and *…
Oct 12, 2016
ffe5071
Test for fixing "Only a single prefetcher is allowed per script."
Oct 12, 2016
0b440c5
Comments in main cmakelist.txt
Oct 12, 2016
87ce6b4
Fix some size_t warnings for x64
Oct 13, 2016
a289325
Overlay: Subtract 10-16 bit. Unify with "Add"
Oct 13, 2016
384f10d
Overlay: Darken/Lighten to 10-16 bit
Oct 13, 2016
3c993d7
Overlay: Blend,Chroma,Luma 10-16 bit
Oct 14, 2016
8f17960
Overlay: Multiply 10-16 bits
Oct 14, 2016
f0cc244
Overlay: Difference 10-16 bit
Oct 14, 2016
e32f825
misc. comment in overlay add
Oct 14, 2016
1f4cf1b
Overlay: Exclusion 10-16 bit
Oct 14, 2016
101e3f1
Misc. line ordering
Oct 14, 2016
87426c7
Overlay: Softlight, Hardlight 10-16 bit. No more left.
Oct 14, 2016
066bc4b
Histogram: Classic 10-16 bit, and 32 bit float
Oct 14, 2016
42331b4
ShowRed/G/B/A: new ShowY/U/V. New:allow PlanarRGB(A)/YUV(A) src and t…
Oct 14, 2016
9d156c6
New script function: ColorSpaceNameToPixelType. Returns a VideoInfo::…
Oct 14, 2016
e10dcb8
AviSource/DirectShowSource: 16-bit RGB input support (BGR[48], BRA[64])
ignus2 Oct 23, 2016
c6d5d77
Merge pull request #7 from ignus2/input
pinterf Oct 23, 2016
0476aa9
DirectShowSource compilation help updated
Oct 26, 2016
b54a6be
Fix Overlay for 8 bit YV12
Oct 26, 2016
dbcfae2
DirectShowSource: Sensible defaults for CMake baseclasses lib, remove…
ignus2 Oct 30, 2016
4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -36,6 +36,7 @@ IF( MSVC_IDE ) # Check for Visual Studio
# Enable C++ with SEH exceptions
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} /EHa")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /EHa")
# or add_compile_options( /EHa ) for CMake>=3?

# Prevent VC++ from complaining about not using MS-specific functions
add_definitions("/D _CRT_SECURE_NO_WARNINGS /D _SECURE_SCL=0")
@@ -48,8 +49,9 @@ IF( MSVC_IDE ) # Check for Visual Studio
if(CMAKE_SIZEOF_VOID_P EQUAL 4)
# VC++ enables the SSE2 instruction set by default even on 32-bits. Step back a bit.
add_definitions("/arch:SSE")
#add_definitions("/arch:SSE2") # Better use this one, it's 2016 now
endif()

# Set additional optimization flags
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} /Oy /Ot /GS-")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /Oy /Ot /GS-")
8 changes: 8 additions & 0 deletions avs_core/CMakeLists.txt
@@ -25,6 +25,14 @@ foreach(FILE ${AvsCore_Sources})
source_group("${GROUP}" FILES "${FILE}")
endforeach()

# special AVX option for source files with *_avx.cpp pattern
file(GLOB_RECURSE SRCS_AVX "*_avx.cpp")
set_source_files_properties(${SRCS_AVX} PROPERTIES COMPILE_FLAGS " /arch:AVX ")

# special AVX2 option for source files with *_avx2.cpp pattern
file(GLOB_RECURSE SRCS_AVX2 "*_avx2.cpp")
set_source_files_properties(${SRCS_AVX2} PROPERTIES COMPILE_FLAGS " /arch:AVX2 ")

# Specify include directories
target_include_directories("AvsCore" PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include)
# Specify preprocessor definitions
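
The per-file `/arch:AVX` and `/arch:AVX2` flags mean that code in `*_avx.cpp` and `*_avx2.cpp` may contain instructions an older CPU cannot execute, so a caller presumably has to choose a variant at run time. Here is a minimal sketch of that selection, assuming hypothetical flag names (`kCpuSse2`, `kCpuAvx`, `kCpuAvx2`) and hypothetical function names; it does not use the real CPU-flag constants from `avisynth.h`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits for this sketch only; not the constants from avisynth.h.
enum CpuFlag : unsigned {
    kCpuSse2 = 1u << 0,
    kCpuAvx  = 1u << 1,
    kCpuAvx2 = 1u << 2
};

using RowFn = void (*)(const uint16_t* src, uint16_t* dst, int width);

// Shared scalar body: 10-bit -> 16-bit by a left shift of 6.
static inline void shift6_body(const uint16_t* src, uint16_t* dst, int width)
{
    assert(width >= 0);
    for (int x = 0; x < width; ++x)
        dst[x] = static_cast<uint16_t>(src[x] << 6);
}

// In a real build these three would sit in separate translation units
// (a plain .cpp, a *_avx.cpp and a *_avx2.cpp), each compiled with its own /arch flag.
void shift6_c(const uint16_t* s, uint16_t* d, int w)    { shift6_body(s, d, w); }
void shift6_avx(const uint16_t* s, uint16_t* d, int w)  { shift6_body(s, d, w); }
void shift6_avx2(const uint16_t* s, uint16_t* d, int w) { shift6_body(s, d, w); }

// Pick the most capable variant the CPU reports support for.
RowFn pick_shift6(unsigned cpu_flags)
{
    if (cpu_flags & kCpuAvx2) return shift6_avx2;
    if (cpu_flags & kCpuAvx)  return shift6_avx;
    return shift6_c;
}
```

Keeping the AVX-compiled bodies in their own files and calling them only through such a pointer is what stops the compiler from emitting AVX instructions into code that runs on every CPU.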
1,771 changes: 1,398 additions & 373 deletions avs_core/convert/convert.cpp

Large diffs are not rendered by default.

11 changes: 5 additions & 6 deletions avs_core/convert/convert.h
@@ -37,7 +37,7 @@

#include "../core/internal.h"

-enum {Rec601=0, Rec709=1, PC_601=2, PC_709=3, AVERAGE=4 };
+enum {Rec601=0, Rec709=1, PC_601=2, PC_709=3, AVERAGE=4, Rec2020=5 };
int getMatrix( const char* matrix, IScriptEnvironment* env);

/*****************************************************
@@ -66,7 +66,7 @@ inline int RGB2YUV(int rgb)
******* Colorspace GenericVideoFilter Classes ******
*******************************************************/


// YUY2 only
class ConvertToRGB : public GenericVideoFilter
/**
* Class to handle conversion to RGB & RGBA
@@ -80,15 +80,14 @@ class ConvertToRGB : public GenericVideoFilter
return cachehints == CACHE_GET_MTMODE ? MT_NICE_FILTER : 0;
}

-static AVSValue __cdecl Create(AVSValue args, void*, IScriptEnvironment* env);
-static AVSValue __cdecl Create32(AVSValue args, void*, IScriptEnvironment* env);
-static AVSValue __cdecl Create24(AVSValue args, void*, IScriptEnvironment* env);
+static AVSValue __cdecl Create(AVSValue args, void* user_data, IScriptEnvironment* env);

private:
int theMatrix;
-enum {Rec601=0, Rec709=1, PC_601=2, PC_709=3 };
+enum {Rec601=0, Rec709=1, PC_601=2, PC_709=3};
};

// YUY2 only
class ConvertToYV12 : public GenericVideoFilter
/**
* Class for conversions to YV12
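
The `Rec2020=5` entry added to the matrix enum in this header selects the BT.2020 coefficients. As a rough illustration of what such a matrix encodes, here is a sketch on normalized R'G'B' values. The function and type names are hypothetical, and it deliberately leaves out the integer scaling, offsets and range handling that the real conversion code in `convert.cpp` (not rendered in this diff) would need.

```cpp
#include <cassert>
#include <cmath>

struct Ycbcr { double y, cb, cr; };

// BT.2020 luma coefficients; Kg is derived so that the three sum to one.
const double kKr2020 = 0.2627;
const double kKb2020 = 0.0593;
const double kKg2020 = 1.0 - kKr2020 - kKb2020;  // 0.6780

// Hypothetical helper: normalized R'G'B' in [0,1] -> Y in [0,1], Cb/Cr in [-0.5,0.5].
inline Ycbcr rgb_to_ycbcr_2020(double r, double g, double b)
{
    assert(r >= 0.0 && g >= 0.0 && b >= 0.0);
    const double y = kKr2020 * r + kKg2020 * g + kKb2020 * b;
    Ycbcr out;
    out.y  = y;
    out.cb = (b - y) / (2.0 * (1.0 - kKb2020));
    out.cr = (r - y) / (2.0 * (1.0 - kKr2020));
    return out;
}
```

Pure white maps to Y = 1 with zero chroma, and pure red or pure blue reach the chroma extreme of 0.5, which is a quick sanity check for any set of matrix coefficients.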
121 changes: 121 additions & 0 deletions avs_core/convert/convert_avx.cpp
@@ -0,0 +1,121 @@
// Avisynth v2.5. Copyright 2002-2009 Ben Rudiak-Gould et al.
// http://www.avisynth.org

// This program is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 2 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA, or visit
// http://www.gnu.org/copyleft/gpl.html .
//
// Linking Avisynth statically or dynamically with other modules is making a
// combined work based on Avisynth. Thus, the terms and conditions of the GNU
// General Public License cover the whole combination.
//
// As a special exception, the copyright holders of Avisynth give you
// permission to link Avisynth with independent modules that communicate with
// Avisynth solely through the interfaces defined in avisynth.h, regardless of the license
// terms of these independent modules, and to copy and distribute the
// resulting combined work under terms of your choice, provided that
// every copy of the combined work is accompanied by a complete copy of
// the source code of Avisynth (the version of Avisynth used to produce the
// combined work), being distributed under the terms of the GNU General
// Public License plus this exception. An independent module is a module
// which is not derived from or based on Avisynth, such as 3rd-party filters,
// import and export plugins, or graphical user interfaces.


#include "convert.h"
#include "convert_planar.h"
#include "convert_rgb.h"
#include "convert_yv12.h"
#include "convert_yuy2.h"
#include <avs/alignment.h>
#include <avs/win.h>
#include <avs/minmax.h>
#include <emmintrin.h>
#include <immintrin.h>
#include <tuple>
#include <map>

#include "convert_avx.h"

template<typename pixel_t, uint8_t targetbits>
void convert_32_to_uintN_c_avx(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range)
{
  const float *srcp0 = reinterpret_cast<const float *>(srcp);
  pixel_t *dstp0 = reinterpret_cast<pixel_t *>(dstp);

  src_pitch = src_pitch / sizeof(float);
  dst_pitch = dst_pitch / sizeof(pixel_t);

  int src_width = src_rowsize / sizeof(float);

  float max_dst_pixelvalue = (float)((1<<targetbits) - 1); // 255, 1023, 4095, 16383, 65535.0

  float factor = 1.0f / float_range * max_dst_pixelvalue;

  for(int y=0; y<src_height; y++)
  {
    for (int x = 0; x < src_width; x++)
    {
      float pixel = srcp0[x] * factor + 0.5f; // 0.5f: keep the neutral grey level of float 0.5
      dstp0[x] = pixel_t(clamp(pixel, 0.0f, max_dst_pixelvalue)); // we clamp here!
    }
    dstp0 += dst_pitch;
    srcp0 += src_pitch;
  }
  _mm256_zeroupper();
}

template void convert_32_to_uintN_c_avx<uint8_t, 8>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx<uint16_t, 10>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx<uint16_t, 12>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx<uint16_t, 14>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx<uint16_t, 16>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);


// YUV: bit shift 10-12-14-16 <=> 10-12-14-16 bits
// shift right or left, depending on expandrange template param
template<bool expandrange, uint8_t shiftbits>
void convert_uint16_to_uint16_c_avx(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range)
{
  const uint16_t *srcp0 = reinterpret_cast<const uint16_t *>(srcp);
  uint16_t *dstp0 = reinterpret_cast<uint16_t *>(dstp);

  src_pitch = src_pitch / sizeof(uint16_t);
  dst_pitch = dst_pitch / sizeof(uint16_t);

  const int src_width = src_rowsize / sizeof(uint16_t);

  for(int y=0; y<src_height; y++)
  {
    for (int x = 0; x < src_width; x++)
    {
      if(expandrange)
        dstp0[x] = srcp0[x] << shiftbits; // expand range. No clamp before, source is assumed to have valid range
      else
        dstp0[x] = srcp0[x] >> shiftbits; // reduce range
    }
    dstp0 += dst_pitch;
    srcp0 += src_pitch;
  }
  _mm256_zeroupper();
}

// instantiate them
template void convert_uint16_to_uint16_c_avx<false, 2>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx<false, 4>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx<false, 6>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx<true, 2>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx<true, 4>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx<true, 6>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
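
The float-to-integer path in `convert_32_to_uintN_c_avx` scales by the target maximum, adds 0.5 and clamps before the truncating cast. Here is a standalone sketch of one row of that arithmetic, for experimenting outside Avisynth. The name `float_to_uint_row` is hypothetical, `std::min`/`std::max` stand in for the `clamp` helper from `avs/minmax.h`, and pitch handling is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical single-row version of the conversion shown in the diff.
// PixelT is uint8_t for 8 bits and uint16_t for 10..16 bits.
template <typename PixelT, int TargetBits>
void float_to_uint_row(const float* src, PixelT* dst, int width, float float_range)
{
    assert(float_range > 0.0f);
    const float max_value = static_cast<float>((1 << TargetBits) - 1);
    const float factor = 1.0f / float_range * max_value;
    for (int x = 0; x < width; ++x) {
        // Adding 0.5 makes the truncating cast round to nearest,
        // so float 0.5 lands on 128 (8 bit) or 512 (10 bit).
        const float scaled = src[x] * factor + 0.5f;
        dst[x] = static_cast<PixelT>(std::min(std::max(scaled, 0.0f), max_value));
    }
}
```

With `float_range = 1.0f`, inputs below 0 or above 1 saturate at 0 and at the target maximum, which matches the "we clamp here!" remark in the source.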

46 changes: 46 additions & 0 deletions avs_core/convert/convert_avx.h
@@ -0,0 +1,46 @@
// Avisynth v2.5. Copyright 2002 Ben Rudiak-Gould et al.
// http://www.avisynth.org

// This program is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 2 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA, or visit
// http://www.gnu.org/copyleft/gpl.html .
//
// Linking Avisynth statically or dynamically with other modules is making a
// combined work based on Avisynth. Thus, the terms and conditions of the GNU
// General Public License cover the whole combination.
//
// As a special exception, the copyright holders of Avisynth give you
// permission to link Avisynth with independent modules that communicate with
// Avisynth solely through the interfaces defined in avisynth.h, regardless of the license
// terms of these independent modules, and to copy and distribute the
// resulting combined work under terms of your choice, provided that
// every copy of the combined work is accompanied by a complete copy of
// the source code of Avisynth (the version of Avisynth used to produce the
// combined work), being distributed under the terms of the GNU General
// Public License plus this exception. An independent module is a module
// which is not derived from or based on Avisynth, such as 3rd-party filters,
// import and export plugins, or graphical user interfaces.

#ifndef __Convert_AVX_H__
#define __Convert_AVX_H__

#include "../core/internal.h"

template<bool expandrange, uint8_t shiftbits>
void convert_uint16_to_uint16_c_avx(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);

template<typename pixel_t, uint8_t targetbits>
void convert_32_to_uintN_c_avx(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);

#endif // __Convert_AVX_H__
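
The header declares templates whose direction and shift count are fixed at compile time, so a caller that only knows the bit depths at run time has to map them onto one of the explicit instantiations. Here is one possible sketch of that mapping, using a simplified row signature and hypothetical names (`shift_row`, `select_shift_row`); the actual selection logic lives in `convert.cpp`, which this diff does not render.

```cpp
#include <cassert>
#include <cstdint>

// Simplified signature for this sketch; the functions in the diff take
// BYTE pointers, pitches and a height instead.
using ShiftRowFn = void (*)(const uint16_t* src, uint16_t* dst, int width);

template <bool ExpandRange, int ShiftBits>
void shift_row(const uint16_t* src, uint16_t* dst, int width)
{
    for (int x = 0; x < width; ++x) {
        dst[x] = ExpandRange ? static_cast<uint16_t>(src[x] << ShiftBits)
                             : static_cast<uint16_t>(src[x] >> ShiftBits);
    }
}

inline bool is_hbd_depth(int bits)
{
    return bits == 10 || bits == 12 || bits == 14 || bits == 16;
}

// Returns nullptr when the pair of depths is not a 10/12/14/16-bit conversion.
ShiftRowFn select_shift_row(int src_bits, int dst_bits)
{
    if (!is_hbd_depth(src_bits) || !is_hbd_depth(dst_bits) || src_bits == dst_bits)
        return nullptr;
    if (dst_bits > src_bits) {
        switch (dst_bits - src_bits) {
        case 2:  return shift_row<true, 2>;
        case 4:  return shift_row<true, 4>;
        default: return shift_row<true, 6>;
        }
    }
    switch (src_bits - dst_bits) {
    case 2:  return shift_row<false, 2>;
    case 4:  return shift_row<false, 4>;
    default: return shift_row<false, 6>;
    }
}
```

Only the six combinations of direction and shift that the `.cpp` files instantiate can be returned, which is why the shift difference is limited to 2, 4 or 6.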
120 changes: 120 additions & 0 deletions avs_core/convert/convert_avx2.cpp
@@ -0,0 +1,120 @@
// Avisynth v2.5. Copyright 2002-2009 Ben Rudiak-Gould et al.
// http://www.avisynth.org

// This program is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 2 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA, or visit
// http://www.gnu.org/copyleft/gpl.html .
//
// Linking Avisynth statically or dynamically with other modules is making a
// combined work based on Avisynth. Thus, the terms and conditions of the GNU
// General Public License cover the whole combination.
//
// As a special exception, the copyright holders of Avisynth give you
// permission to link Avisynth with independent modules that communicate with
// Avisynth solely through the interfaces defined in avisynth.h, regardless of the license
// terms of these independent modules, and to copy and distribute the
// resulting combined work under terms of your choice, provided that
// every copy of the combined work is accompanied by a complete copy of
// the source code of Avisynth (the version of Avisynth used to produce the
// combined work), being distributed under the terms of the GNU General
// Public License plus this exception. An independent module is a module
// which is not derived from or based on Avisynth, such as 3rd-party filters,
// import and export plugins, or graphical user interfaces.


#include "convert.h"
#include "convert_planar.h"
#include "convert_rgb.h"
#include "convert_yv12.h"
#include "convert_yuy2.h"
#include <avs/alignment.h>
#include <avs/win.h>
#include <avs/minmax.h>
#include <emmintrin.h>
#include <immintrin.h>
#include <tuple>
#include <map>

#include "convert_avx2.h"

template<typename pixel_t, uint8_t targetbits>
void convert_32_to_uintN_c_avx2(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range)
{
  const float *srcp0 = reinterpret_cast<const float *>(srcp);
  pixel_t *dstp0 = reinterpret_cast<pixel_t *>(dstp);

  src_pitch = src_pitch / sizeof(float);
  dst_pitch = dst_pitch / sizeof(pixel_t);

  int src_width = src_rowsize / sizeof(float);

  float max_dst_pixelvalue = (float)((1<<targetbits) - 1); // 255, 1023, 4095, 16383, 65535.0

  float factor = 1.0f / float_range * max_dst_pixelvalue;

  for(int y=0; y<src_height; y++)
  {
    for (int x = 0; x < src_width; x++)
    {
      float pixel = srcp0[x] * factor + 0.5f; // 0.5f: keep the neutral grey level of float 0.5
      dstp0[x] = pixel_t(clamp(pixel, 0.0f, max_dst_pixelvalue)); // we clamp here!
    }
    dstp0 += dst_pitch;
    srcp0 += src_pitch;
  }
}

template void convert_32_to_uintN_c_avx2<uint8_t, 8>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx2<uint16_t, 10>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx2<uint16_t, 12>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx2<uint16_t, 14>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_32_to_uintN_c_avx2<uint16_t, 16>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);

// YUV: bit shift 10-12-14-16 <=> 10-12-14-16 bits
// shift right or left, depending on expandrange template param
template<bool expandrange, uint8_t shiftbits>
void convert_uint16_to_uint16_c_avx2(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range)
{
  const uint16_t *srcp0 = reinterpret_cast<const uint16_t *>(srcp);
  uint16_t *dstp0 = reinterpret_cast<uint16_t *>(dstp);

  src_pitch = src_pitch / sizeof(uint16_t);
  dst_pitch = dst_pitch / sizeof(uint16_t);

  const int src_width = src_rowsize / sizeof(uint16_t);

  for(int y=0; y<src_height; y++)
  {
    for (int x = 0; x < src_width; x++)
    {
      if(expandrange)
        dstp0[x] = srcp0[x] << shiftbits; // expand range. No clamp before, source is assumed to have valid range
      else
        dstp0[x] = srcp0[x] >> shiftbits; // reduce range
    }
    dstp0 += dst_pitch;
    srcp0 += src_pitch;
  }
  // Anti-sse2-avx penalty vzeroupper (_mm256_zeroupper()) is automatically placed here if ymm registers are used
}

// instantiate them
template void convert_uint16_to_uint16_c_avx2<false, 2>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx2<false, 4>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx2<false, 6>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx2<true, 2>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx2<true, 4>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
template void convert_uint16_to_uint16_c_avx2<true, 6>(const BYTE *srcp, BYTE *dstp, int src_rowsize, int src_height, int src_pitch, int dst_pitch, float float_range);
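
The shift-based functions above are labelled "YUV", while the commit list separately mentions a "full scale" conversion for RGB and alpha. The sketch below contrasts the two on single values. `expand_by_shift` mirrors the `<<` in the diff; `expand_full_scale` is a hypothetical rounding stretch written for illustration and is not taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Shift-based expansion, as in the diff: 10-bit -> 16-bit is a left shift by 6.
inline uint16_t expand_by_shift(uint16_t v, int shift_bits)
{
    return static_cast<uint16_t>(v << shift_bits);
}

inline uint16_t reduce_by_shift(uint16_t v, int shift_bits)
{
    return static_cast<uint16_t>(v >> shift_bits);
}

// Hypothetical full-scale stretch: maps the source maximum onto the target
// maximum, rounding to nearest. Assumes 1 <= src_bits <= dst_bits <= 16.
inline uint16_t expand_full_scale(uint16_t v, int src_bits, int dst_bits)
{
    assert(src_bits >= 1 && src_bits <= dst_bits && dst_bits <= 16);
    const uint32_t src_max = (1u << src_bits) - 1;
    const uint32_t dst_max = (1u << dst_bits) - 1;
    return static_cast<uint16_t>((v * dst_max + src_max / 2) / src_max);
}
```

Shifting keeps limited-range levels aligned (10-bit black 64 becomes 4096, white 940 becomes 60160) but sends the 10-bit maximum 1023 to 65472 rather than 65535, which is why a stretch is the natural choice when the full code range carries signal.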

