Update README.md

DefTruth · web-flow · commit f9d542bfbdae · 2024-08-21T10:05:38.000+08:00
diff --git a/README.md b/README.md
@@ -24,57 +24,57 @@ Most of my time now is focused on **LLM/VLM** Inference. Please check 📖[Aweso
 
 ### 📒 LLM/Multimodal/Diffusion Inference Optimization (written by me)
 
-- [[VLM Inference Optimization][InternLM/VL Series][10k words]📒InternLM2/.../InternVL1.5 Series Notes: Key Points Explained](https://zhuanlan.zhihu.com/p/702481058)
+- [[VLM Inference Optimization][InternVL Series]📒InternLM2/.../InternVL1.5 Series Notes: Key Points Explained](https://zhuanlan.zhihu.com/p/702481058)
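 - [[LLM Inference Optimization][TensorRT-LLM][50k words]🔥TensorRT-LLM Deployment and Tuning Guide](https://zhuanlan.zhihu.com/p/699333691)
 - [[LLM Inference Optimization][KV Cache Optimization]🔥MQA/GQA/YOCO/CLA/MLKV Notes: Intra-layer and Cross-layer KV Cache Sharing](https://zhuanlan.zhihu.com/p/697311739)
 - [[LLM Inference Optimization][Prefill Optimization]🔥Illustrated vLLM Prefix Prefill Triton Kernel](https://zhuanlan.zhihu.com/p/695799736)
 - [[LLM Inference Optimization][Prefill Optimization][10k words]🔥Principles & Illustrations of vLLM Automatic Prefix Cache: Optimizing Time to First Token](https://zhuanlan.zhihu.com/p/693556044)
 - [[LLM Inference Optimization][Attention Optimization][20k words]🔥Principles & Illustrations: From Online-Softmax to FlashAttention V1/V2/V3](https://zhuanlan.zhihu.com/p/668888063)
 - [[LLM Inference Optimization][Decoding Optimization]🔥Principles & Illustrations of FlashDecoding/FlashDecoding++](https://zhuanlan.zhihu.com/p/696075602)
 - [[VLM Inference Optimization][LLaVA Series]📒CLIP/LLaVA/LLaVA1.5/VILA Notes: Key Points Explained](https://zhuanlan.zhihu.com/p/683137074)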
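-- [[LLM Inference Optimization][Attention Optimization][10k words]🔥TensorRT MHA/Myelin Optimize vs FlashAttention-2](https://zhuanlan.zhihu.com/p/678873216)
-- [[LLM Inference Optimization][CUDA 12 PTX Assembly]📒PRMT Instruction Explained (Generic Mode)](https://zhuanlan.zhihu.com/p/660630414)
-- [[LLM Inference Optimization][CUDA 12 PTX Assembly]📒LOP3 Instruction Explained](https://zhuanlan.zhihu.com/p/659741469)
-- [[LLM Inference Optimization][30k words]🔥Frequently Asked Interview Questions: Hand-written CUDA for LLMs](https://zhuanlan.zhihu.com/p/678903537)
-- [[LLM Inference Optimization]🔥WINT8/4-(00): An Accessible Explanation of Fast Dequantization Algorithms](https://zhuanlan.zhihu.com/p/657072856)
-- [[LLM Inference Optimization]🔥WINT8/4-(01): PRMT Instruction Explained and FasterTransformer Source Code Analysis](https://zhuanlan.zhihu.com/p/657070837)
-- [[LLM Inference Optimization]🔥WINT8/4-(02): Fast Dequantization, INT8 to BF16](https://zhuanlan.zhihu.com/p/657073159)
-- [[LLM Inference Optimization]🔥WINT8/4-(03): LOP3 Instruction Explained and INT4 to FP16/BF16 Analysis](https://zhuanlan.zhihu.com/p/657073857)
-- [[LLM Inference Optimization]🔥100+ Papers: A Roundup of New Developments Across LLM Inference](https://zhuanlan.zhihu.com/p/693680304)
-- [[LLM Inference Optimization]🔥30+ Papers: LLM Inference Paper Collection, 500-page PDF💡](https://zhuanlan.zhihu.com/p/669777159)
-- [[LLM Inference Optimization]🔥FlashDecoding++: Even Faster than FlashDecoding!](https://zhuanlan.zhihu.com/p/665022589)
-- [[LLM Inference Optimization]🔥News Flash: TensorRT-LLM Is Open-Sourced, and TensorRT 9.1 Is Here Too🤓](https://zhuanlan.zhihu.com/p/662361469)
-- [[LLM Inference Optimization]🔥20+ Papers: LLM Inference Paper Collection, 300-page PDF💡](https://zhuanlan.zhihu.com/p/658091768)
-- [[LLM Inference Optimization]🔥The PagedAttention Paper Is Fresh Off the Press](https://zhuanlan.zhihu.com/p/617015570)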
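+- [[LLM Inference Optimization][Attention Optimization][10k words]🔥TensorRT MHA/Myelin vs FlashAttention-2](https://zhuanlan.zhihu.com/p/678873216)
+- [[LLM Inference Optimization][PTX Assembly]📒CUDA 12 PTX Assembly: PRMT Instruction Explained (Generic Mode)](https://zhuanlan.zhihu.com/p/660630414)
+- [[LLM Inference Optimization][PTX Assembly]📒CUDA 12 PTX Assembly: LOP3 Instruction Explained](https://zhuanlan.zhihu.com/p/659741469)
+- [[LLM Inference Optimization][CUDA][30k words]🔥Frequently Asked Interview Questions: Hand-written CUDA for LLMs](https://zhuanlan.zhihu.com/p/678903537)
+- [[LLM Inference Optimization][Weight Only]🔥WINT8/4-(00): An Accessible Explanation of Fast Dequantization Algorithms](https://zhuanlan.zhihu.com/p/657072856)
+- [[LLM Inference Optimization][Weight Only]🔥WINT8/4-(01): PRMT Instruction Explained and FasterTransformer Source Code Analysis](https://zhuanlan.zhihu.com/p/657070837)
+- [[LLM Inference Optimization][Weight Only]🔥WINT8/4-(02): Fast Dequantization, INT8 to BF16](https://zhuanlan.zhihu.com/p/657073159)
+- [[LLM Inference Optimization][Weight Only]🔥WINT8/4-(03): LOP3 Instruction Explained and INT4 to FP16/BF16 Analysis](https://zhuanlan.zhihu.com/p/657073857)
+- [[LLM Inference Optimization][LLM Infra Roundup]🔥100+ Papers: A Roundup of New Developments Across LLM Inference](https://zhuanlan.zhihu.com/p/693680304)
+- [[LLM Inference Optimization][LLM Infra Roundup]🔥30+ Papers: LLM Inference Paper Collection, 500-page PDF💡](https://zhuanlan.zhihu.com/p/669777159)
+- [[LLM Inference Optimization][LLM Infra Roundup]🔥FlashDecoding++: Even Faster than FlashDecoding!](https://zhuanlan.zhihu.com/p/665022589)
+- [[LLM Inference Optimization][LLM Infra Roundup]🔥News Flash: TensorRT-LLM Is Open-Sourced, and TensorRT 9.1 Is Here Too🤓](https://zhuanlan.zhihu.com/p/662361469)
+- [[LLM Inference Optimization][LLM Infra Roundup]🔥20+ Papers: LLM Inference Paper Collection, 300-page PDF💡](https://zhuanlan.zhihu.com/p/658091768)
+- [[LLM Inference Optimization][LLM Infra Roundup]🔥The PagedAttention Paper Is Fresh Off the Press](https://zhuanlan.zhihu.com/p/617015570)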
 
 ### 📒 CV Mobile/Server-side Inference Deployment/C++/Algorithms/Technical Notes (written by me)
 
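-- [[Inference Deployment]⚡️Covering Cloud, Edge, and Device: FastDeploy Deploys 150+ CV and NLP Models in Three Lines of Code](https://zhuanlan.zhihu.com/p/581326442)
-- [[Inference Deployment]💡How to Add Your Model to lite.ai.toolkit (3.6k+🔥stars)?](https://zhuanlan.zhihu.com/p/523876625)
-- [[Inference Deployment]🤓Joining the Fun: Meituan YOLOv6 ORT/MNN/TNN/NCNN C++ Inference Deployment](https://zhuanlan.zhihu.com/p/533643238)
-- [[Inference Deployment]🌔ONNX Inference Acceleration Technical Docs: Miscellaneous Notes](https://zhuanlan.zhihu.com/p/524023964)
-- [[Inference Deployment]👉Guide to Building TensorFlow C++ from Source on Mac](https://zhuanlan.zhihu.com/p/524013615)
-- [[Inference Deployment]👿1Mb! Head Pose Estimation: FSANet, a Small but Elegant Model (with C++ Implementation)](https://zhuanlan.zhihu.com/p/447364201)
-- [[Inference Deployment]🤓Complete Guide to Building and Packaging opencv+ffmpeg](https://zhuanlan.zhihu.com/p/472115312)
-- [[Inference Deployment]🔧Pitfall Fix: Static ONNX Model Conversion for RobustVideoMatting Video Matting](https://zhuanlan.zhihu.com/p/459088407)
-- [[Inference Deployment]🔥190Kb! SSRNet Age Detection Explained in Detail (with C++ Project)](https://zhuanlan.zhihu.com/p/462762797)
-- [[Inference Deployment]🔥MGMatting (CVPR2021) Portrait Matting: C++ Application Notes](https://zhuanlan.zhihu.com/p/464732042)
-- [[Inference Deployment]🍅Highly Accurate Face Detection (with Landmarks): Detailed Notes on the YOLO5Face C++ Project](https://zhuanlan.zhihu.com/p/461878005)
-- [[Inference Deployment]👋Solved: ONNXRuntime (Python) GPU Deployment Configuration Notes](https://zhuanlan.zhihu.com/p/457484536)
-- [[Inference Deployment]🍅Notes on Productionizing SCRFD (CVPR2021) Face Detection in C++ (with docker Image)](https://zhuanlan.zhihu.com/p/455165568)
-- [[Inference Deployment]👋Unorthodox Approach: A Trick for Handling Unsupported Ops When Converting onnx to ncnn](https://zhuanlan.zhihu.com/p/451446147)
-- [[Inference Deployment]🔥Upgraded NanoDet-Plus: MNN/TNN/NCNN/ORT C++ Project Notes](https://zhuanlan.zhihu.com/p/450586647)
-- [[Inference Deployment]🔥Ultra-lightweight NanoDet: MNN/TNN/NCNN/ORT C++ Project Notes](https://zhuanlan.zhihu.com/p/443419387)
-- [[Inference Deployment]🔥Detailed Notes on Porting MGMatting to MNN, TNN, and ORT in C++ (Long Read Warning!)](https://zhuanlan.zhihu.com/p/442949027)
-- [[Inference Deployment]🔥YOLOX NCNN/MNN/TNN/ONNXRuntime C++ Project: Brief Notes](https://zhuanlan.zhihu.com/p/447364122)
-- [[Inference Deployment]🔥Notes on Manually Editing YoloX's tnnproto (TNN)](https://zhuanlan.zhihu.com/p/425668734)
-- [[Inference Deployment]🔥The Most Detailed ONNXRuntime C++/Java/Python Resources on the Web!](https://zhuanlan.zhihu.com/p/414317269)
-- [[Inference Deployment]🔥RobustVideoMatting🔥: C++ Productionization Notes, Implementation Part](https://zhuanlan.zhihu.com/p/413280488)
-- [[Inference Deployment]🔥RobustVideoMatting🔥: C++ Productionization Notes, Application Part](https://zhuanlan.zhihu.com/p/412491918)
-- [[Inference Deployment]💡ONNXRuntime C++ CMake Project Analysis and Build](https://zhuanlan.zhihu.com/p/411887386)
-- [[Inference Deployment]🤓How to Handle NCHW and NHWC Inputs with the ONNXRuntime C++ API?](https://zhuanlan.zhihu.com/p/524230808)
-- [[Inference Deployment]💡Brief Notes on Setting Up tnn-convert: Converting YOLOP to TNN](https://zhuanlan.zhihu.com/p/431418709)
-- [[Inference Deployment]💡YOLOP ONNXRuntime C++ Productionization Notes](https://zhuanlan.zhihu.com/p/411651933)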
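+- [[Inference Deployment][CV/NLP]⚡️FastDeploy Deploys 150+ CV and NLP Models in Three Lines of Code](https://zhuanlan.zhihu.com/p/581326442)
+- [[Inference Deployment][CV]💡How to Add Your Model to lite.ai.toolkit (3.6k+🔥stars)?](https://zhuanlan.zhihu.com/p/523876625)
+- [[Inference Deployment][CV]🤓Joining the Fun: Meituan YOLOv6 ORT/MNN/TNN/NCNN C++ Inference Deployment](https://zhuanlan.zhihu.com/p/533643238)
+- [[Inference Deployment][ONNX]🌔ONNX Inference Acceleration Technical Docs: Miscellaneous Notes](https://zhuanlan.zhihu.com/p/524023964)
+- [[Inference Deployment][TensorFlow]👉Guide to Building TensorFlow C++ from Source on Mac](https://zhuanlan.zhihu.com/p/524013615)
+- [[Inference Deployment][CV]👿1Mb! Head Pose Estimation: FSANet, a Small but Elegant Model (with C++ Implementation)](https://zhuanlan.zhihu.com/p/447364201)
+- [[Inference Deployment][CV]🤓Complete Guide to Building and Packaging opencv+ffmpeg](https://zhuanlan.zhihu.com/p/472115312)
+- [[Inference Deployment][CV]🔧Static ONNX Model Conversion for RobustVideoMatting Video Matting](https://zhuanlan.zhihu.com/p/459088407)
+- [[Inference Deployment][CV]🔥190Kb! SSRNet Age Detection Explained in Detail (with C++ Project)](https://zhuanlan.zhihu.com/p/462762797)
+- [[Inference Deployment][CV]🔥MGMatting (CVPR2021) Portrait Matting: C++ Application Notes](https://zhuanlan.zhihu.com/p/464732042)
+- [[Inference Deployment][CV]🍅Highly Accurate Face Detection (with Landmarks): Detailed Notes on the YOLO5Face C++ Project](https://zhuanlan.zhihu.com/p/461878005)
+- [[Inference Deployment][ORT]👋Solved: ONNXRuntime (Python) GPU Deployment Configuration Notes](https://zhuanlan.zhihu.com/p/457484536)
+- [[Inference Deployment][CV]🍅Notes on Productionizing SCRFD (CVPR2021) Face Detection in C++ (with docker Image)](https://zhuanlan.zhihu.com/p/455165568)
+- [[Inference Deployment][NCNN]👋Unorthodox Approach: A Trick for Handling Unsupported Ops When Converting onnx to ncnn](https://zhuanlan.zhihu.com/p/451446147)
+- [[Inference Deployment][CV]🔥Upgraded NanoDet-Plus: MNN/TNN/NCNN/ORT C++ Project Notes](https://zhuanlan.zhihu.com/p/450586647)
+- [[Inference Deployment][CV]🔥Ultra-lightweight NanoDet: MNN/TNN/NCNN/ORT C++ Project Notes](https://zhuanlan.zhihu.com/p/443419387)
+- [[Inference Deployment][CV]🔥Detailed Notes on Porting MGMatting to MNN, TNN, and ORT in C++ (Long Read Warning!)](https://zhuanlan.zhihu.com/p/442949027)
+- [[Inference Deployment][CV]🔥YOLOX NCNN/MNN/TNN/ONNXRuntime C++ Project: Brief Notes](https://zhuanlan.zhihu.com/p/447364122)
+- [[Inference Deployment][TNN]🔥Notes on Manually Editing YoloX's tnnproto (TNN)](https://zhuanlan.zhihu.com/p/425668734)
+- [[Inference Deployment][ORT]🔥The Most Detailed ONNXRuntime C++/Java/Python Resources on the Web!](https://zhuanlan.zhihu.com/p/414317269)
+- [[Inference Deployment][CV]🔥RobustVideoMatting🔥: C++ Productionization Notes, Implementation Part](https://zhuanlan.zhihu.com/p/413280488)
+- [[Inference Deployment][CV]🔥RobustVideoMatting🔥: C++ Productionization Notes, Application Part](https://zhuanlan.zhihu.com/p/412491918)
+- [[Inference Deployment][ORT]💡ONNXRuntime C++ CMake Project Analysis and Build](https://zhuanlan.zhihu.com/p/411887386)
+- [[Inference Deployment][ORT]🤓How to Handle NCHW and NHWC Inputs with the ONNXRuntime C++ API?](https://zhuanlan.zhihu.com/p/524230808)
+- [[Inference Deployment][TNN]💡Brief Notes on Setting Up tnn-convert: Converting YOLOP to TNN](https://zhuanlan.zhihu.com/p/431418709)
+- [[Inference Deployment][CV]💡YOLOP ONNXRuntime C++ Productionization Notes](https://zhuanlan.zhihu.com/p/411651933)
 - [[Inference Deployment][NCNN]📒Very Useful NCNN Reference Roundup](https://zhuanlan.zhihu.com/p/449765328)
 - [[Inference Deployment][MNN]📒Very Useful MNN Reference Roundup](https://zhuanlan.zhihu.com/p/449761992)
 - [[Inference Deployment][TNN]📒Very Useful TNN Reference Roundup](https://zhuanlan.zhihu.com/p/449769615)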
@@ -86,13 +86,13 @@ Most of my time now is focused on **LLM/VLM** Inference. Please check 📖[Aweso
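 - [[Technical Notes][C++][CMake]👋Very Useful CMake Reference Roundup](https://zhuanlan.zhihu.com/p/449779892)
 - [[Technical Notes][C++][30k words]💡A Practical Guide to Static Linking and Static Libraries: Principles Part](https://zhuanlan.zhihu.com/p/595527528)
 - [[Technical Notes][C++]🤓Guide to C++ Memory Checking on Mac (Valgrind VS Asan)](https://zhuanlan.zhihu.com/p/508470880)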
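-- [[Technical Notes]🔥torchlm: A Facial Landmark Detection Library](https://zhuanlan.zhihu.com/p/467211561)
-- [[Technical Notes]📒200-page PDF: Notes on Li Hang's "Statistical Learning Methods", from Principles to Implementation, Based on R](https://zhuanlan.zhihu.com/p/684885595)
-- [[Technical Notes]💡How to git clone and git submodule Gracefully?](https://zhuanlan.zhihu.com/p/639136221)
-- [[Technical Notes]📒3D Face Reconstruction Reference Roundup](https://zhuanlan.zhihu.com/p/524034741)
-- [[Technical Notes]📒BlendShapes Reference Roundup](https://zhuanlan.zhihu.com/p/524036145)
-- [[Technical Notes]🛠🛠Detailed Notes on Installing Pytorch3D from Source, plus Learning Resources](https://zhuanlan.zhihu.com/p/512347464)
-- [[Technical Notes]🍅🍅200 Pages: Notes on Li Hang's "Statistical Learning Methods", from Principles to Implementation](https://zhuanlan.zhihu.com/p/461520847)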
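+- [[Technical Notes][CV]🔥torchlm: A Facial Landmark Detection Library](https://zhuanlan.zhihu.com/p/467211561)
+- [[Technical Notes][ML]📒200-page PDF: Notes on Li Hang's "Statistical Learning Methods", from Principles to Implementation, Based on R](https://zhuanlan.zhihu.com/p/684885595)
+- [[Technical Notes][Git]💡How to git clone and git submodule Gracefully?](https://zhuanlan.zhihu.com/p/639136221)
+- [[Technical Notes][3D]📒3D Face Reconstruction Reference Roundup](https://zhuanlan.zhihu.com/p/524034741)
+- [[Technical Notes][3D]📒BlendShapes Reference Roundup](https://zhuanlan.zhihu.com/p/524036145)
+- [[Technical Notes][3D]🛠Detailed Notes on Installing Pytorch3D from Source, plus Learning Resources](https://zhuanlan.zhihu.com/p/512347464)
+- [[Technical Notes][ML]🍅200 Pages: Notes on Li Hang's "Statistical Learning Methods", from Principles to Implementation](https://zhuanlan.zhihu.com/p/461520847)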
   
 ### 📒 Recommended Articles on CUTLASS/CuTe/Tensor Cores and More (by other authors)
 
@@ -111,28 +111,28 @@ Most of my time now is focused on **LLM/VLM** Inference. Please check 📖[Aweso
 - [[cute Series Explained][GEMM]📖cute: GEMM Pipeline (@reed)](https://zhuanlan.zhihu.com/p/665082713)
 - [[cute Series Explained][GEMM]📖cute: Efficient GEMM Implementation (@reed)](https://zhuanlan.zhihu.com/p/675308830)
 - [[cute Series Explained][GEMM]📖GEMM Pipeline: single-stage, pipelined, multi-stage (@Titus)](https://zhuanlan.zhihu.com/p/712451053)
-- [[cute Series Explained][GEMM]📖CuTe GEMM Detail Analysis (1): Choosing ldmatrix (@Anonymous)](https://zhuanlan.zhihu.com/p/702818267)
-- [[cute Series Explained][GEMM]📖CuTe GEMM Detail Analysis (2): TiledCopy and cp.async (@Anonymous)](https://zhuanlan.zhihu.com/p/703560147)
-- [[cute Series Explained][GEMM]📖CuTe GEMM Detail Analysis (3): Choosing Swizzle<B,M,S> Parameter Values (@Anonymous)](https://zhuanlan.zhihu.com/p/713713957)
+- [[cute Series Explained][GEMM]📖GEMM Detail Analysis (1): Choosing ldmatrix (@Anonymous)](https://zhuanlan.zhihu.com/p/702818267)
+- [[cute Series Explained][GEMM]📖GEMM Detail Analysis (2): TiledCopy and cp.async (@Anonymous)](https://zhuanlan.zhihu.com/p/703560147)
+- [[cute Series Explained][GEMM]📖GEMM Detail Analysis (3): Choosing Swizzle<B,M,S> Parameter Values (@Anonymous)](https://zhuanlan.zhihu.com/p/713713957)
 - [[cute Series Explained][Practice]📖Notes on the CUTLASS Implementation of Hopper Mixed GEMM (@BBuf)](https://zhuanlan.zhihu.com/p/714378343)
 - [[cute Series Explained][Practice]📖CUTLASS CuTe in Practice (1): Basics (@进击的Killua)](https://zhuanlan.zhihu.com/p/690703999)
 - [[cute Series Explained][Practice]📖CUTLASS CuTe in Practice (2): Applications (@进击的Killua)](https://zhuanlan.zhihu.com/p/692078624)
 - [[cute Series Explained][Practice]📖FlashAttention fp8 Implementation (ada Architecture) (@shengying.wei)](https://zhuanlan.zhihu.com/p/712314257)
 - [[cute Series Explained][Practice]📖FlashAttention Notes: A Walkthrough of tiny-flash-attention (@shengying.wei)](https://zhuanlan.zhihu.com/p/708867810)
 - [[cute Series Explained][Practice]📖Reproducing flash attention with cutlass cute (@66RING)](https://zhuanlan.zhihu.com/p/696323042)
-- [[cutlass Tutorial]📖cutlass: Basic Concepts (@JoeNomad)](https://zhuanlan.zhihu.com/p/677616101)
-- [[cutlass Tutorial]📖cutlass: Software Architecture (@JoeNomad)](https://zhuanlan.zhihu.com/p/678915618)
-- [[cutlass Tutorial]📖cutlass: block swizzle and tile iterator (@JoeNomad)](https://zhuanlan.zhihu.com/p/679929705)
-- [[cutlass Tutorial]📖cutlass: bank-conflict-free shared memory layout (@JoeNomad)](https://zhuanlan.zhihu.com/p/681966685)
-- [[cutlass Tutorial]📖cutlass: Multi-stage Pipeline (@JoeNomad)](https://zhuanlan.zhihu.com/p/687397095)
-- [[cutlass Tutorial]📖CUTLASS: A Basic Introduction (@进击的Killua)](https://zhuanlan.zhihu.com/p/671324125)
-- [[cutlass Tutorial]📖Rambling on the CUTLASS GTC2020 SLIDES (@zzk again)](https://zhuanlan.zhihu.com/p/674693873)
-- [[GPU Instruction Set Architecture]📖NVidia GPU Instruction Set Architecture: Preface (@reed)](https://zhuanlan.zhihu.com/p/686198447)
-- [[GPU Instruction Set Architecture]📖NVidia GPU Instruction Set Architecture: Registers (@reed)](https://zhuanlan.zhihu.com/p/688616037)
-- [[GPU Instruction Set Architecture]📖NVidia GPU Instruction Set Architecture: Load and Cache (@reed)](https://zhuanlan.zhihu.com/p/692445145)
-- [[GPU Instruction Set Architecture]📖NVidia GPU Instruction Set Architecture: Floating-point Operations (@reed)](https://zhuanlan.zhihu.com/p/695667044)
-- [[GPU Instruction Set Architecture]📖NVidia GPU Instruction Set Architecture: Integer Operations (@reed)](https://zhuanlan.zhihu.com/p/700921948)
-- [[GPU Instruction Set Architecture]📖NVidia GPU Instruction Set Architecture: Bit and Logical Operations (@reed)](https://zhuanlan.zhihu.com/p/712356884)
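+- [[cutlass Tutorial][Beginner]📖cutlass: Basic Concepts (@JoeNomad)](https://zhuanlan.zhihu.com/p/677616101)
+- [[cutlass Tutorial][Beginner]📖cutlass: Software Architecture (@JoeNomad)](https://zhuanlan.zhihu.com/p/678915618)
+- [[cutlass Tutorial][Beginner]📖CUTLASS: A Basic Introduction (@进击的Killua)](https://zhuanlan.zhihu.com/p/671324125)
+- [[cutlass Tutorial][Beginner]📖Rambling on the CUTLASS GTC2020 SLIDES (@zzk again)](https://zhuanlan.zhihu.com/p/674693873)
+- [[cutlass Tutorial][Advanced]📖cutlass: block swizzle and tile iterator (@JoeNomad)](https://zhuanlan.zhihu.com/p/679929705)
+- [[cutlass Tutorial][Advanced]📖cutlass: bank-conflict-free shared memory layout (@JoeNomad)](https://zhuanlan.zhihu.com/p/681966685)
+- [[cutlass Tutorial][Advanced]📖cutlass: Multi-stage Pipeline (@JoeNomad)](https://zhuanlan.zhihu.com/p/687397095)
+- [[GPU Instruction Set Architecture][Detailed Analysis]📖NVidia GPU Instruction Set Architecture: Preface (@reed)](https://zhuanlan.zhihu.com/p/686198447)
+- [[GPU Instruction Set Architecture][Detailed Analysis]📖NVidia GPU Instruction Set Architecture: Registers (@reed)](https://zhuanlan.zhihu.com/p/688616037)
+- [[GPU Instruction Set Architecture][Detailed Analysis]📖NVidia GPU Instruction Set Architecture: Load and Cache (@reed)](https://zhuanlan.zhihu.com/p/692445145)
+- [[GPU Instruction Set Architecture][Detailed Analysis]📖NVidia GPU Instruction Set Architecture: Floating-point Operations (@reed)](https://zhuanlan.zhihu.com/p/695667044)
+- [[GPU Instruction Set Architecture][Detailed Analysis]📖NVidia GPU Instruction Set Architecture: Integer Operations (@reed)](https://zhuanlan.zhihu.com/p/700921948)
+- [[GPU Instruction Set Architecture][Detailed Analysis]📖NVidia GPU Instruction Set Architecture: Bit and Logical Operations (@reed)](https://zhuanlan.zhihu.com/p/712356884)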
 
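 💡Note: The articles these authors have written are truly excellent, and I have learned a great deal from them. PRs recommending more great articles are welcome!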