@@ -49,12 +49,12 @@ Most of my time now is focused on **LLM/VLM** Inference. Please check 📖[Aweso
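
### 📒 CV Mobile/Server Inference Deployment/C++/Algorithms/Technical Notes (written by me)

- - [[Inference Deployment] ⚡️🔥Covering all cloud, edge, and device scenarios: FastDeploy deploys 150+ CV, NLP, and Speech models in three lines of code](https://zhuanlan.zhihu.com/p/581326442)
+ - [[Inference Deployment] ⚡️Covering cloud, edge, and device: FastDeploy deploys 150+ CV, NLP, and Speech models in three lines of code](https://zhuanlan.zhihu.com/p/581326442)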
- [[Inference Deployment] 💡How to add your model to lite.ai.toolkit (3.5k+🔥stars)?](https://zhuanlan.zhihu.com/p/523876625)
- [[Inference Deployment] 🤓Joining the fun: Meituan YOLOv6 ORT/MNN/TNN/NCNN C++ inference deployment](https://zhuanlan.zhihu.com/p/533643238)
- [[Inference Deployment] 🌔ONNX inference acceleration technical docs: miscellaneous notes](https://zhuanlan.zhihu.com/p/524023964)
- [[Inference Deployment] 👉A guide to building TensorFlow C++ from source on Mac](https://zhuanlan.zhihu.com/p/524013615)
- - [[Inference Deployment] 👿1Mb! Head pose estimation: let's talk about FSANet, a small and elegant model (with ONNXRuntime/MNN C++ implementation)](https://zhuanlan.zhihu.com/p/447364201)
+ - [[Inference Deployment] 👿1Mb! Head pose estimation: FSANet, a small and elegant model (with ORT/MNN C++ implementation)](https://zhuanlan.zhihu.com/p/447364201)
- [[Inference Deployment] 🤓A complete guide to building and packaging opencv+ffmpeg](https://zhuanlan.zhihu.com/p/472115312)
- [[Inference Deployment] 🔧Pitfall fixes: converting RobustVideoMatting (5k+🔥star) video matting to a static ONNX model](https://zhuanlan.zhihu.com/p/459088407)
- [[Inference Deployment] 🔥190Kb! A detailed walkthrough of SSRNet age detection (with C++ project)](https://zhuanlan.zhihu.com/p/462762797)
@@ -73,7 +73,7 @@ Most of my time now is focused on **LLM/VLM** Inference. Please check 📖[Aweso
- [[Inference Deployment] 📒Very useful TensorFlow C++ engineering tips](https://zhuanlan.zhihu.com/p/449788027)
- [[Inference Deployment] 📒A collection of resources on deep learning model conversion](https://zhuanlan.zhihu.com/p/449759361)
- [[Inference Deployment] 🔥🔥Ultra-lightweight NanoDet MNN/TNN/NCNN/ONNXRuntime C++ project notes](https://zhuanlan.zhihu.com/p/443419387)
- - [[Inference Deployment] 🔥Detailed notes on MGMatting (CVPR2021) 🔥 MNN, TNN, and ONNXRuntime C++ porting (long read warning!)](https://zhuanlan.zhihu.com/p/442949027)
+ - [[Inference Deployment] 🔥Detailed notes on porting MGMatting to MNN, TNN, and ONNXRuntime C++ (long read warning!)](https://zhuanlan.zhihu.com/p/442949027)
- [[Inference Deployment] 🔥YOLOX NCNN/MNN/TNN/ONNXRuntime C++ project brief notes](https://zhuanlan.zhihu.com/p/447364122)
- [[Inference Deployment] 🔥Notes on manually editing YoloX's tnnproto: TNN C++](https://zhuanlan.zhihu.com/p/425668734)
- [[Inference Deployment] 🔥🔥🔥 The most detailed ONNXRuntime C++/Java/Python resources on the web!](https://zhuanlan.zhihu.com/p/414317269)
@@ -96,14 +96,8 @@ Most of my time now is focused on **LLM/VLM** Inference. Please check 📖[Aweso

### 📒 Recommended Articles on CUTLASS/CuTe/Tensor Cores and More (by other authors)

- - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Preface (@reed)](https://zhuanlan.zhihu.com/p/686198447)
- - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Registers (@reed)](https://zhuanlan.zhihu.com/p/688616037)
- - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Load and Cache (@reed)](https://zhuanlan.zhihu.com/p/692445145)
- - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Floating-Point Operations (@reed)](https://zhuanlan.zhihu.com/p/695667044)
- - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Integer Operations (@reed)](https://zhuanlan.zhihu.com/p/700921948)
- - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Bit and Logic Operations (@reed)](https://zhuanlan.zhihu.com/p/712356884)
- - [[cute series explained][Concepts Intro] 📖cutlass cute 101 (@朱小霖)](https://zhuanlan.zhihu.com/p/660379052)
- - [[cute series explained][Concepts Intro] 📖CUTLASS 2.x & CUTLASS 3.x Intro study notes (@BBuf)](https://zhuanlan.zhihu.com/p/710516489)
+ - [[cute series explained][Intro] 📖cutlass cute 101 (@朱小霖)](https://zhuanlan.zhihu.com/p/660379052)
+ - [[cute series explained][Intro] 📖CUTLASS 2.x & CUTLASS 3.x Intro study notes (@BBuf)](https://zhuanlan.zhihu.com/p/710516489)
- [[cute series explained][Layout] 📖cute: Layout (@reed)](https://zhuanlan.zhihu.com/p/661182311)
- [[cute series explained][Layout] 📖Algebraic and geometric interpretation of cute Layout (@reed)](https://zhuanlan.zhihu.com/p/662089556)
- [[cute series explained][Tensor] 📖cute: Tensor (@reed)](https://zhuanlan.zhihu.com/p/663093816)
@@ -117,20 +111,28 @@ Most of my time now is focused on **LLM/VLM** Inference. Please check 📖[Aweso
- [[cute series explained][GEMM] 📖cute: GEMM pipeline (@reed)](https://zhuanlan.zhihu.com/p/665082713)
- [[cute series explained][GEMM] 📖cute: efficient GEMM implementation (@reed)](https://zhuanlan.zhihu.com/p/675308830)
- [[cute series explained][GEMM] 📖GEMM pipelines: single-stage, pipelined, multi-stage (@Titus)](https://zhuanlan.zhihu.com/p/712451053)
- - [[cute series explained][Applied Practice] 📖Notes on the CUTLASS implementation of Hopper Mixed GEMM (@BBuf)](https://zhuanlan.zhihu.com/p/714378343)
- - [[cute series explained][Applied Practice] 📖CUTLASS CuTe in practice (1) - basics (@进击的Killua)](https://zhuanlan.zhihu.com/p/690703999)
- - [[cute series explained][Applied Practice] 📖CUTLASS CuTe in practice (2) - applications (@进击的Killua)](https://zhuanlan.zhihu.com/p/692078624)
- - [[cute series explained][Applied Practice] 📖FlashAttention fp8 implementation (Ada architecture) (@shengying.wei)](https://zhuanlan.zhihu.com/p/712314257)
- - [[cute series explained][Applied Practice] 📖FlashAttention notes: a walkthrough of tiny-flash-attention (@shengying.wei)](https://zhuanlan.zhihu.com/p/708867810)
- - [[cute series explained][Applied Practice] 📖Reproducing flash attention with cutlass cute (@66RING)](https://zhuanlan.zhihu.com/p/696323042)
+ - [[cute series explained][GEMM] 📖CuTe GEMM detail analysis (1): choosing ldmatrix (@Anonymous)](https://zhuanlan.zhihu.com/p/702818267)
+ - [[cute series explained][GEMM] 📖CuTe GEMM detail analysis (2): TiledCopy and cp.async (@Anonymous)](https://zhuanlan.zhihu.com/p/703560147)
+ - [[cute series explained][GEMM] 📖CuTe GEMM detail analysis (3): choosing Swizzle<B,M,S> parameter values (@Anonymous)](https://zhuanlan.zhihu.com/p/713713957)
+ - [[cute series explained][Practice] 📖Notes on the CUTLASS implementation of Hopper Mixed GEMM (@BBuf)](https://zhuanlan.zhihu.com/p/714378343)
+ - [[cute series explained][Practice] 📖CUTLASS CuTe in practice (1): basics (@进击的Killua)](https://zhuanlan.zhihu.com/p/690703999)
+ - [[cute series explained][Practice] 📖CUTLASS CuTe in practice (2): applications (@进击的Killua)](https://zhuanlan.zhihu.com/p/692078624)
+ - [[cute series explained][Practice] 📖FlashAttention fp8 implementation (Ada architecture) (@shengying.wei)](https://zhuanlan.zhihu.com/p/712314257)
+ - [[cute series explained][Practice] 📖FlashAttention notes: a walkthrough of tiny-flash-attention (@shengying.wei)](https://zhuanlan.zhihu.com/p/708867810)
+ - [[cute series explained][Practice] 📖Reproducing flash attention with cutlass cute (@66RING)](https://zhuanlan.zhihu.com/p/696323042)
- [[cutlass tutorial] 📖cutlass basics (@JoeNomad)](https://zhuanlan.zhihu.com/p/677616101)
- [[cutlass tutorial] 📖cutlass software architecture (@JoeNomad)](https://zhuanlan.zhihu.com/p/678915618)
- [[cutlass tutorial] 📖cutlass block swizzle and tile iterator (@JoeNomad)](https://zhuanlan.zhihu.com/p/679929705)
- [[cutlass tutorial] 📖cutlass bank-conflict-free shared memory layout (@JoeNomad)](https://zhuanlan.zhihu.com/p/681966685)
- [[cutlass tutorial] 📖cutlass multi-stage pipeline (@JoeNomad)](https://zhuanlan.zhihu.com/p/687397095)
- [[cutlass tutorial] 📖CUTLASS basic introduction (@进击的Killua)](https://zhuanlan.zhihu.com/p/671324125)
- [[cutlass tutorial] 📖Rambling about the CUTLASS GTC2020 SLIDES (@zzk again)](https://zhuanlan.zhihu.com/p/674693873)
-
+ - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Preface (@reed)](https://zhuanlan.zhihu.com/p/686198447)
+ - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Registers (@reed)](https://zhuanlan.zhihu.com/p/688616037)
+ - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Load and Cache (@reed)](https://zhuanlan.zhihu.com/p/692445145)
+ - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Floating-Point Operations (@reed)](https://zhuanlan.zhihu.com/p/695667044)
+ - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Integer Operations (@reed)](https://zhuanlan.zhihu.com/p/700921948)
+ - [[GPU Instruction Set Architecture] 📖NVidia GPU Instruction Set Architecture: Bit and Logic Operations (@reed)](https://zhuanlan.zhihu.com/p/712356884)

💡Note: The articles written by these experts are excellent, and I have learned a lot from them. PRs recommending more great articles are welcome!