# EmiyaEngineNN

[English](README.md) | [简体中文](README_CN.md)

A completely reconstructed EmiyaEngine using neural networks.
An upsampling/restoration model suitable for common lossy audio.

---

## Methodology

EmiyaEngineNN is a high-fidelity broadband audio upsampling model based on a modified
[BAE-Net](https://github.com/yuguochencuc/BAE-Net).
Compared to the original design, network capacity has been roughly tripled by substantially
widening the FFT window (1576 -> 3072), increasing the channel counts of the intermediate layers,
and other changes, so that the model can handle the more complex scenarios of general lossy audio
rather than just VCTK speech.
On the engineering side, the design follows [kokoro](https://github.com/hexgrad/kokoro):
STFT/iSTFT are built into the network for end-to-end computation, which removes the alignment
cost of separate pre- and post-processing.
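The STFT-in-network idea can be sketched as a thin PyTorch wrapper. This is an illustrative
sketch, not the project's actual code: `STFTWrapper` and its placeholder core are hypothetical,
and only the 3072-sample window follows the README.

```python
import torch
import torch.nn as nn

class STFTWrapper(nn.Module):
    """Illustrative sketch: wrap a spectral model with STFT/iSTFT so the
    whole graph consumes and produces raw waveforms end to end."""

    def __init__(self, core: nn.Module, n_fft: int = 3072, hop: int = 768):
        super().__init__()
        self.core, self.n_fft, self.hop = core, n_fft, hop
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> complex spectrogram (batch, freq_bins, frames)
        spec = torch.stft(wav, self.n_fft, self.hop, window=self.window,
                          return_complex=True)
        spec = self.core(spec)  # spectral enhancement would happen here
        # back to the time domain with the same analysis parameters
        return torch.istft(spec, self.n_fft, self.hop, window=self.window,
                           length=wav.shape[-1])

# With an identity core, the wrapper simply round-trips the waveform.
model = STFTWrapper(nn.Identity())
out = model(torch.randn(1, 32000))
print(out.shape)  # torch.Size([1, 32000])
```

Because the transform lives inside the module, an exported graph takes waveforms directly and no
external framing/overlap-add code has to stay in sync with the model.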

## Usage

The environment uses Python 3.12 + PyTorch 2.7.1 + ONNX 1.18.0.
Prepare a directory named `dataset`, put the audio files into it, and start `train_aio.py` to
begin training.
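The setup step can be sketched in Python; `music_library` here is an illustrative source path,
not part of the project.

```python
from pathlib import Path
import shutil

# Collect training audio into ./dataset before launching `python train_aio.py`.
# "music_library" is a hypothetical folder holding your own recordings.
dataset = Path("dataset")
dataset.mkdir(exist_ok=True)
for src in Path("music_library").glob("*.flac"):
    shutil.copy2(src, dataset / src.name)
print(dataset.is_dir())
```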
If you just want to hear the results, you can download the prebuilt binary from the Releases
page.
It accepts common lossy audio formats as input (e.g., MP3, AAC, Opus); the output is always
losslessly compressed FLAC.

```shell
zansei.exe model.onnx input.mp3 output.flac
```

Note that the program internally downsamples the audio to 32 kHz to discard the empty upper
spectrum and optimize output quality.
Feeding it lossless audio, or any other input carrying information above this range, may degrade
the result.
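The choice of 32 kHz makes sense because its 16 kHz Nyquist limit roughly matches the cutoff of
typical lossy codecs. As a rough illustration (the binary's internal resampler is not specified,
so `scipy.signal.resample_poly` is used here as a stand-in):

```python
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 44100, 32000
wav = np.random.randn(2, sr_in)  # one second of stereo noise as a stand-in
# 32000 / 44100 reduces to 320 / 441, so resample with up=320, down=441;
# everything above the new 16 kHz Nyquist limit is filtered out
wav_32k = resample_poly(wav, up=320, down=441, axis=1)
print(wav_32k.shape)  # (2, 32000)
```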

## Training Details

Training used 226 stereo recordings randomly selected from a personal music library and ran for
about 90 hours, interrupted once by a machine failure and restart.
At the last checkpoint, the MS-STFT weighted loss was about 8.1 and the discriminator loss about
0.33; the remaining metrics were lost in the failure.