Problem
Given that we now have a use for trace files beyond simple debugging (e.g. building our own trace visualizer), the current CSV format output by the existing `TraceRecorder` introduces some unnecessary inconveniences when trying to parse it efficiently:
- We need to count & index the memory-mapped CSV file, which can still take 3-5 s on an SSD and ~11 s on an HDD even with numerous optimizations.
- This is how far I got optimizing the parsing of the CSV trace file for my trace visualizer; anything more gives diminishing returns: (Code)
- Since we're working with characters in a string, we need linear-time scans to find a row and then a specific entry within that row.
- The current format has odd quirks that could be optimized for file size, like:
  - Spaces after commas in the output CSV. Those add up! The first test on the `mess` branch produces a 27 MB file with the spaces, and removing them gives you something on the order of 23 MB, so imagine how much of a 1 GB file is spaces.
  - When something is broken/invalid, `TraceRecorder` outputs `-1` as the value in that entry, forcing us to use signed integer types and giving up half the positive number range for double the size. Depending on how we define the maximum values for entries in the address vector, one could easily cut the file size in half if it were stored in binary.
Solution
Yesterday, I wanted to validate my idea, so I implemented a binary trace format (+ recorder) with the following characteristics:
- It is fixed-width, meaning that after memory mapping we get `O(1)` lookup for each entry at any line.
- It defines a trace event to be exactly `32 B` wide, meaning two trace events fit into one `64 B` cache line (i.e. when we look up event `i`, event `i+1` gets loaded into the CPU cache for free).
- It is very friendly to size optimization if you can define tighter upper boundaries for the address vector entries.
I have forked Ramulator 2.0 and added:
- Header-only library defining the file format: (Code)
- New recorder called `BinaryTraceRecorder`: (Code)
How to test it
Since you guys don't seem to have a set-in-stone way of doing testing right now, I didn't want to add my own unit/integration testing setup without talking to you guys first, so I vibecoded (!) a script that takes the binary file and turns it back into a matching CSV. (Code)
I recommend the following testing procedure:
- Add both the `TraceRecorder` and the `BinaryTraceRecorder` to a project and generate some traces. (Params are the same.)
  - You should end up with the usual CSV files `{path}.ch{channel_id}` and new files that end in `{channel_id}.mtrc` (the new format).
- Pick one channel (e.g. `0`) and convert the file ending in `{id}.mtrc` to a CSV like so: `python3 test_mtrc.py <path>.mtrc`
  - You will end up with a file of the same name as the input file but ending in `.csv`.
- Get the diff of this file and the file produced by `TraceRecorder` for the same channel: `diff -a visualizer_trace_0.csv visualizer_trace_csv.ch0`
  - If the diff is empty, it means our binary format contains the same information as the original trace format!
I made this issue to hear you guys' thoughts on this, and whether you think it's a good change for the trace visualizer project and a good addition to the project in general. If you are interested, I can turn my branch into a PR. (CC: @nisabostanci, @RichardLuo79)