SCALE-Sim v3 integrates a detailed memory model with the systolic array computation. Users can evaluate the stall cycles caused by data loads from memory and by bank conflicts, and can experiment with different memory types (DDR3, DDR4, etc.) and configurations (channel, row, etc.). Currently, the design integrates the Ramulator [Link] DRAM simulator. The memory interface can be evaluated by performing the following steps:
python -m venv ./venv
source venv/bin/activate
pip install -r ./requirements.txt
git submodule update --init --recursive
cd submodules
cp ../scripts/ramulator_patch/* ./ramulator/
cd ramulator
make -j <num_jobs>
After this step, you will have a ramulator executable in the ./ramulator folder, which is used to simulate the memory. The Ramulator integration is a multi-step process. First, the demand trace is generated by running SCALE-Sim without any memory stalls. Next, the demand trace is fed to Ramulator. Each memory request is tagged with an arrival time, based on when the request is sent to Ramulator. Ramulator reports the response time of each individual request, and the memory round-trip time is saved in a numpy file. Finally, SCALE-Sim is rerun with the memory round-trip latency for each request, capturing realistic pipeline stalls caused by memory delays and reporting the resulting execution time. The plots are generated by running the following scripts:
source generate_fig9_ramulator_mem_bw_plot.sh
source generate_fig10_ramulator_stall_plot.sh
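The three-stage flow described above (stall-free demand trace, per-request round-trip latency, rerun with stalls) can be illustrated with a minimal Python sketch. This is not SCALE-Sim or Ramulator code: the function names, the toy open-row latency model standing in for Ramulator, and all parameter values (`row_bytes`, `hit_cycles`, `miss_cycles`, `hidden_cycles`) are assumptions chosen only for illustration.

```python
# Toy illustration of the demand-trace -> latency -> stall-replay flow.
# All names and numbers here are hypothetical; the real flow uses Ramulator
# for the latency step and stores the round-trip times in a numpy file.


def toy_round_trip_latencies(trace, row_bytes=4096, hit_cycles=20, miss_cycles=50):
    """Stand-in for the DRAM simulator: one latency per request.

    A request to the row that is already open costs `hit_cycles`;
    any other request costs `miss_cycles` and opens its row.
    """
    latencies = []
    open_row = None
    for _cycle, address in trace:
        row = address // row_bytes
        latencies.append(hit_cycles if row == open_row else miss_cycles)
        open_row = row
    return latencies


def replay_with_stalls(trace, latencies, hidden_cycles=20):
    """Replay the stall-free trace with per-request latencies.

    Assumes in-order requests and that up to `hidden_cycles` of each
    latency overlap with compute; the remainder stalls the array.
    Returns (total_cycles, stall_cycles).
    """
    if len(trace) != len(latencies):
        raise ValueError("need exactly one latency per request")
    stall_cycles = sum(max(0, latency - hidden_cycles) for latency in latencies)
    # The stall-free run ends one cycle after its last request is issued.
    ideal_cycles = trace[-1][0] + 1 if trace else 0
    return ideal_cycles + stall_cycles, stall_cycles


# Step 1: demand trace from a stall-free run, as (issue cycle, byte address).
demand_trace = [(0, 0x0000), (1, 0x0040), (2, 0x2000), (3, 0x2040)]
# Step 2: per-request round-trip latency from the memory model.
latencies = toy_round_trip_latencies(demand_trace)
# Step 3: rerun with those latencies to obtain the stalled execution time.
total_cycles, stall_cycles = replay_with_stalls(demand_trace, latencies)
print(latencies, total_cycles, stall_cycles)
```

In this sketch the first and third requests open a new row and each stall the array for 30 cycles, so the 4-cycle stall-free run stretches to 64 cycles. The actual simulator accounts for effects this toy model ignores, such as bank-level parallelism and request queuing.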