Imported Simulator
You can learn the following topics from this page:
- How to import a new simulator.
- Functions and APIs provided for imported simulators.
- What has been modified in SniperSim and GPGPUSim.
To import a new simulator, the following tasks are necessary:
- Provide implementations of the APIs in the benchmark. See Benchmark for the API list.
- Issue the CYCLE command to report the end cycle if the execution cycle of this simulator is taken into account when determining the total execution cycle. In general, simulators of PComps should provide the CYCLE command, while simulators of SComps can skip this task.
- Issue the PIPE command before opening the Pipe and reading/writing data in the functional model.
- Issue the READ/WRITE command in the timing model and adjust the execution cycle when receiving the SYNC command.
SniperSim is a trace-based CPU simulator that achieves high simulation speed with reasonable accuracy.
The read and write APIs are implemented as system calls. The following system call numbers are assigned to these two APIs:
```cpp
SYSCALL_SEND_TO_GPU = 508,   // Send data to GPU
SYSCALL_READ_FROM_GPU = 509, // Read data from GPU
```
The system calls take the following arguments: source address, destination address, a pointer to the data array, and the amount of data in bytes.
SniperSim provides separate functional and timing models. Hence, the APIs are handled in the functional and timing models separately.
In the functional model, system calls are handled in the file $SIMULATOR_ROOT/snipersim/sift/recorder/syscall_modeling.cc. The flow chart is as below:
```mermaid
flowchart TD
    subgraph Write Syscall
        A1[Issue PIPE command]
        B1[Wait for SYNC command]
        C1[Open PIPE]
        D1[Write data to PIPE]
    end
    A1-->B1-->C1-->D1
    B1-->B1
    subgraph Read Syscall
        A2[Issue PIPE command]
        B2[Wait for SYNC command]
        C2[Open PIPE]
        D2[Read data from PIPE]
    end
    A2-->B2-->C2-->D2
    B2-->B2
```
In the timing model, system calls are handled in the file $SIMULATOR_ROOT/snipersim/common/core/syscall_model.cc. The flow chart is as below:
```mermaid
flowchart TD
    subgraph Write Syscall
        A1[Get current execution cycle]
        B1[Issue WRITE command]
        C1[Wait for SYNC command]
        D1[Sleep core until cycle specified by SYNC command]
    end
    A1-->B1-->C1-->D1
    C1-->C1
    subgraph Read Syscall
        A2[Get current execution cycle]
        B2[Issue READ command]
        C2[Wait for SYNC command]
        D2[Sleep core until cycle specified by SYNC command]
    end
    A2-->B2-->C2-->D2
    C2-->C2
```
SniperSim is not a cycle-driven simulator. Hence, the execution cycle cannot be changed by modifying the value of some variable. Instead, a Sleep instruction is injected into the timing model; its duration equals the gap between the cycle at which the READ/WRITE command is issued and the cycle specified by the corresponding SYNC command.
```cpp
// Update simulator time.
ComponentPeriod time_wake_period = *(Sim()->getDvfsManager()->getGlobalDomain()) * end_time;
SubsecondTime time_wake = time_wake_period.getPeriod();
SubsecondTime sleep_end_time;
Sim()->getSyscallServer()->handleSleepCall(m_thread->getId(), time_wake, start_time, sleep_end_time);

// Sleep core until specified time.
if (m_thread->reschedule(sleep_end_time, core))
    core = m_thread->getCore();
core->getPerformanceModel()->queuePseudoInstruction(new SyncInstruction(sleep_end_time, SyncInstruction::SLEEP));
```
Because the CPU always controls the flow of the benchmark, the execution cycle of the CPU plays an important role in the execution cycle of the entire simulation. The CYCLE command is issued in the file $SIMULATOR_ROOT/snipersim/common/core/core.cc.
GPGPUSim is a cycle-accurate simulator that models the architecture of NVIDIA GPGPUs.
Unfortunately, there is no instruction similar to CPU system calls in the CUDA environment, so the APIs are emulated in GPGPUSim through a trick: the ADDC instruction with immediate operands is used to create a pseudo-syscall.
```
addc.u32 %0, %1, %2;
```
If the second source operand is 0, the instruction appends the unsigned 32-bit integer provided in the first source operand to the list of system call arguments. If the second source operand is 1, the instruction performs the functionality specified by the first source operand. The source code generating the send system call is shown below:
```cpp
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__dst_x), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__dst_y), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__src_x), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__src_y), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(lo_data_ptr), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(hi_data_ptr), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(byte_size), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(InterChiplet::SYSCALL_SEND_TO_GPU), "r"(InterChiplet::CUDA_SYSCALL_CMD));
*__res += t_res;
```
The return value __res prevents the compiler from removing instructions whose destination operands would otherwise be unused.
System calls are handled in the file $SIMULATOR_ROOT/gpgpu-sim/src/cuda-sim/instructions.cc. The flow chart is as below:
```mermaid
flowchart TD
    subgraph Write Syscall
        A1[Read data from GPU memory]
        B1[Issue PIPE command]
        C1[Wait for SYNC command]
        D1[Open PIPE]
        E1[Write data to PIPE]
        F1[Get current execution cycle]
        G1[Send WRITE command]
        H1[Wait for SYNC command]
        I1[Lazily adjust the clock cycle]
    end
    A1-->B1-->C1-->D1-->E1-->F1-->G1-->H1-->I1
    C1-->C1
    H1-->H1
    subgraph Read Syscall
        B2[Issue PIPE command]
        C2[Wait for SYNC command]
        D2[Open PIPE]
        E2[Read data from PIPE]
        A2[Write data to GPU memory]
        F2[Get current execution cycle]
        G2[Send READ command]
        H2[Wait for SYNC command]
        I2[Lazily adjust the clock cycle]
    end
    B2-->C2-->D2-->E2-->A2-->F2-->G2-->H2-->I2
    C2-->C2
    H2-->H2
```
Because the data pointer provided by CUDA is within the CUDA address space rather than the host address space, the data cannot be read or written directly at the memory location provided by the APIs. Instead, the value must be read from or written to CUDA memory through the interface provided by GPGPUSim.
```cpp
memory_space_t space;
space.set_type(global_space); // TODO: how to accept other spaces?
memory_space* mem = NULL;
addr_t addr = data_ptr;
decode_space(space, thread, dst, mem, addr);
mem->write(addr, nbytes, interdata, thread, pI);
```
GPGPUSim is a cycle-driven simulator, whose cycle loop can be found in the files $SIMULATOR_ROOT/gpgpu-sim/src/gpgpu-sim/gpu-sim.h and $SIMULATOR_ROOT/gpgpu-sim/src/gpgpu-sim/gpu-sim.cc. The variable gpgpu_sim::gpu_sim_cycle maintains the current execution cycle. gpgpu_sim::gpu_sim_cycle cannot be modified directly while system calls are being handled. Instead, the target execution cycle is recorded and assigned to gpgpu_sim::gpu_sim_cycle after the simulator has handled all events in the current cycle. To achieve this, the following variables and functions are added to gpgpu_sim:
gpu-sim.h
```cpp
class gpgpu_sim : public gpgpu_t {
  ...
  // Directly set GPU cycle.
  void chiplet_direct_set_cycle(long long int end_time);
  ...
};
```
gpu-sim.cc
```cpp
// Directly set GPU cycle.
bool g_chiplet_directly_set_cycle = false;
unsigned long long g_chiplet_directly_set_cycle_val = 0;
...

void gpgpu_sim::cycle() {
  ...
  gpu_sim_cycle++;
  // Directly set GPU cycle.
  if (g_chiplet_directly_set_cycle) {
    std::cout << "Directly set cycle to " << g_chiplet_directly_set_cycle_val << std::endl;
    gpu_sim_cycle = g_chiplet_directly_set_cycle_val;
    g_chiplet_directly_set_cycle = false;
  }
  ...
}

// Directly set cycle.
void gpgpu_sim::chiplet_direct_set_cycle(long long int end_time) {
  g_chiplet_directly_set_cycle_val = end_time;
  g_chiplet_directly_set_cycle = true;
}
```
By calling gpgpu_sim::chiplet_direct_set_cycle, the target execution cycle end_time is recorded. The source code to adjust the execution cycle in GPGPUSim is shown below:
```cpp
thread->get_gpu()->chiplet_direct_set_cycle(timeEnd);
```
Tasks on the GPU are triggered by the CPUs in the system. The data required by a task is prepared by the CPU, and the generated result is received by the CPU as well. The execution cycle of the CPU can therefore reflect the execution cycle of the GPU through the synchronization performed by data transmission. Hence, GPGPUSim does not issue the CYCLE command.
$SIMULATOR_ROOT/interchiplet/includes/pipe_comm.h provides utility APIs to handle the synchronization protocol.
To issue commands, the following APIs exist in pipe_comm.h:
- `InterChiplet::SyncProtocol::sendCycleCmd` sends the CYCLE command.
- `InterChiplet::SyncProtocol::pipeSync` sends the PIPE command and waits for the SYNC command. The function returns the cycle specified by the SYNC command.
- `InterChiplet::SyncProtocol::writeSync` sends the WRITE command and waits for the SYNC command. The function returns the cycle specified by the SYNC command.
- `InterChiplet::SyncProtocol::readSync` sends the READ command and waits for the SYNC command. The function returns the cycle specified by the SYNC command.
To reduce the overhead of opening, closing, reading, and writing Pipes, pipe_comm.h abstracts these operations into the class InterChiplet::PipeComm. PipeComm holds a list of opened pipes and one data buffer for each opened pipe, so that each pipe is opened only once during one simulator process. Meanwhile, many small reads are coalesced into fewer reads of the data buffer size.
The usage of InterChiplet::PipeComm is as below:
```cpp
InterChiplet::PipeComm global_pipe_comm;  // It is suggested to declare a global instance of PipeComm.

// It is suggested to use the API to get the file name of pipes.
char* fileName = InterChiplet::SyncProtocol::pipeName(src_x, src_y, dst_x, dst_y);

global_pipe_comm.write_data(fileName, interdata, nbytes);  // Write data to the Pipe.
global_pipe_comm.read_data(fileName, interdata, nbytes);   // Read data from the Pipe.
```
Although imported simulators need minor changes to support the synchronization protocol, it is still suggested to import third-party simulators as git submodules. The purpose behind this suggestion is to keep the repository clean and to respect the open-source spirit. The minor modifications should be stored in a dedicated diff file for each simulator.
patch.sh creates diff patches for all simulators. It also copies the modified files to .cache. It is forbidden to copy files from .cache back into the simulator directories because the copy operation cannot be rolled back. However, the files in .cache can be used as a reference when recovering from git conflicts.
apply_patch.sh applies the diff patches to all simulators. A git reset is necessary in each simulator before running apply_patch.sh to avoid git conflicts.
When adding new simulators, it is necessary to add the paths of the new simulators to patch.sh and apply_patch.sh.