Imported Simulator
You can learn the following topics from this page:
- How to import a new simulator.
- Functions and APIs provided for imported simulators.
- What has been modified in SniperSim and GPGPUSim.
To import a new simulator, the following tasks are necessary:
- Provide implementations of the APIs in the benchmark. See Benchmark for the API list.
- Issue the CYCLE command to report the end cycle if the execution cycle of this simulator is taken into account when determining the total execution cycle. In general, simulators of PComps should provide the CYCLE command, while simulators of SComps can skip this task.
- Issue the PIPE command before opening the Pipe and reading/writing data in the functional model.
- Issue the READ/WRITE command in the timing model and adjust the execution cycle when receiving the SYNC command.
SniperSim is a trace-based CPU simulator that achieves high simulation speed with reasonable accuracy.
The read and write APIs are implemented as system calls. The following system call numbers are assigned to these two APIs:
```cpp
SYSCALL_SEND_TO_GPU = 508,   // Send data to GPU
SYSCALL_READ_FROM_GPU = 509, // Read data from GPU
```
The system calls take the following arguments: source address, destination address, a pointer to the data array, and the amount of data in bytes.
SniperSim provides separate functional and timing models. Hence, the APIs are handled in the functional and timing models separately.
In the functional model, system calls are handled in the file $SIMULATOR_ROOT/snipersim/sift/recorder/syscall_modeling.cc. The flow chart is as below:
```mermaid
flowchart TD
    subgraph Write Syscall
        A1[Issue PIPE command]
        B1[Wait for SYNC command]
        C1[Open PIPE]
        D1[Write data to PIPE]
    end
    A1-->B1-->C1-->D1
    B1-->B1
    subgraph Read Syscall
        A2[Issue PIPE command]
        B2[Wait for SYNC command]
        C2[Open PIPE]
        D2[Read data from PIPE]
    end
    A2-->B2-->C2-->D2
    B2-->B2
```
In the timing model, system calls are handled in the file $SIMULATOR_ROOT/snipersim/common/core/syscall_model.cc. The flow chart is as below:
```mermaid
flowchart TD
    subgraph Write Syscall
        A1[Get current execution cycle]
        B1[Issue WRITE command]
        C1[Wait for SYNC command]
        D1[Sleep core until cycle specified by SYNC command]
    end
    A1-->B1-->C1-->D1
    C1-->C1
    subgraph Read Syscall
        A2[Get current execution cycle]
        B2[Issue READ command]
        C2[Wait for SYNC command]
        D2[Sleep core until cycle specified by SYNC command]
    end
    A2-->B2-->C2-->D2
    C2-->C2
```
SniperSim is not a cycle-driven simulator. Hence, the execution cycle cannot be changed by modifying the value of some variable. Instead, a Sleep instruction is injected into the timing model; its duration equals the gap between the cycle at which the READ/WRITE command is issued and the cycle specified by the corresponding SYNC command.
```cpp
// Update simulator time.
ComponentPeriod time_wake_period = *(Sim()->getDvfsManager()->getGlobalDomain()) * end_time;
SubsecondTime time_wake = time_wake_period.getPeriod();
SubsecondTime sleep_end_time;
Sim()->getSyscallServer()->handleSleepCall(m_thread->getId(), time_wake, start_time, sleep_end_time);

// Sleep core until specified time.
if (m_thread->reschedule(sleep_end_time, core))
    core = m_thread->getCore();
core->getPerformanceModel()->queuePseudoInstruction(new SyncInstruction(sleep_end_time, SyncInstruction::SLEEP));
```
Because the CPU always controls the flow of the benchmark, the execution cycle of the CPU plays an important role in the execution cycle of the entire simulation. The CYCLE command is issued in the file $SIMULATOR_ROOT/snipersim/common/core/core.cc.
GPGPUSim is a cycle-accurate simulator that models the architecture of NVIDIA GPGPUs.
Unfortunately, there is no instruction similar to CPU system calls in the CUDA environment, so the APIs are emulated in GPGPUSim through a trick: the ADDC instruction with immediate operands is used to create a pseudo-syscall.
```
addc.u32 %0, %1, %2;
```
If the second source operand is 0, the instruction appends the unsigned 32-bit integer provided in the first source operand to the list of system call arguments. If the second source operand is 1, the instruction performs the functionality specified by the first source operand. The source code generating the send system call is shown below:
```cpp
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__dst_x), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__dst_y), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__src_x), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(__src_y), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(lo_data_ptr), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(hi_data_ptr), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(byte_size), "r"(InterChiplet::CUDA_SYSCALL_ARG));
*__res += t_res;
asm("addc.u32 %0, %1, %2;" : "=r"(t_res) : "r"(InterChiplet::SYSCALL_SEND_TO_GPU), "r"(InterChiplet::CUDA_SYSCALL_CMD));
*__res += t_res;
```
The return value __res prevents the compiler from removing instructions whose destination operands would otherwise be unused.
System calls are handled in the file $SIMULATOR_ROOT/gpgpu-sim/src/cuda-sim/instructions.cc. The flow chart is as below:
```mermaid
flowchart TD
    subgraph Write Syscall
        A1[Read data from GPU memory]
        B1[Issue PIPE command]
        C1[Wait for SYNC command]
        D1[Open PIPE]
        E1[Write data to PIPE]
        F1[Get current execution cycle]
        G1[Send WRITE command]
        H1[Wait for SYNC command]
        I1[Lazily adjust the clock cycle]
    end
    A1-->B1-->C1-->D1-->E1-->F1-->G1-->H1-->I1
    C1-->C1
    H1-->H1
    subgraph Read Syscall
        B2[Issue PIPE command]
        C2[Wait for SYNC command]
        D2[Open PIPE]
        E2[Read data from PIPE]
        A2[Write data to GPU memory]
        F2[Get current execution cycle]
        G2[Send READ command]
        H2[Wait for SYNC command]
        I2[Lazily adjust the clock cycle]
    end
    B2-->C2-->D2-->E2-->A2-->F2-->G2-->H2-->I2
    C2-->C2
    H2-->H2
```
Because the data pointer provided by CUDA is within the CUDA address space rather than the host address space, the data cannot be read or written directly at the memory location provided by the APIs. Instead, the value must be read from or written to CUDA memory through the interface provided by GPGPUSim.
```cpp
memory_space_t space;
space.set_type(global_space); // TODO: how to accept other spaces?
memory_space* mem = NULL;
addr_t addr = data_ptr;
decode_space(space, thread, dst, mem, addr);
mem->write(addr, nbytes, interdata, thread, pI);
```
GPGPUSim is a cycle-driven simulator, whose cycle loop can be found in the files $SIMULATOR_ROOT/gpgpu-sim/src/gpgpu-sim/gpu-sim.h and $SIMULATOR_ROOT/gpgpu-sim/src/gpgpu-sim/gpu-sim.cc. The variable gpgpu_sim::gpu_sim_cycle maintains the current execution cycle. gpgpu_sim::gpu_sim_cycle cannot be modified directly while system calls are being handled. Instead, the target execution cycle is recorded and assigned to gpgpu_sim::gpu_sim_cycle after the simulator has handled all events in the current cycle. To achieve this, the following variables and functions are added to gpgpu_sim:
gpu-sim.h
```cpp
class gpgpu_sim : public gpgpu_t {
  ...
  // Directly set GPU cycle.
  void chiplet_direct_set_cycle(long long int end_time);
  ...
};
```
gpu-sim.cc
```cpp
// Directly set GPU cycle.
bool g_chiplet_directly_set_cycle = false;
unsigned long long g_chiplet_directly_set_cycle_val = 0;
...

void gpgpu_sim::cycle() {
  ...
  gpu_sim_cycle++;
  // Directly set GPU cycle.
  if (g_chiplet_directly_set_cycle) {
    std::cout << "Directly set cycle to " << g_chiplet_directly_set_cycle_val << std::endl;
    gpu_sim_cycle = g_chiplet_directly_set_cycle_val;
    g_chiplet_directly_set_cycle = false;
  }
  ...
}

// Directly set cycle.
void gpgpu_sim::chiplet_direct_set_cycle(long long int end_time) {
  g_chiplet_directly_set_cycle_val = end_time;
  g_chiplet_directly_set_cycle = true;
}
```
By calling gpgpu_sim::chiplet_direct_set_cycle, the target execution cycle end_time is recorded. The source code to adjust the execution cycle in GPGPUSim is shown below:
```cpp
thread->get_gpu()->chiplet_direct_set_cycle(timeEnd);
```
Tasks on the GPU are triggered by the CPUs in the system. The data required by a task is prepared by the CPU, and the generated result is received by the CPU as well. The execution cycle of the CPU can therefore reflect the execution cycle of the GPU through the synchronization performed by data transmission. Hence, GPGPUSim does not issue the CYCLE command.
$SIMULATOR_ROOT/interchiplet/includes/pipe_comm.h provides utility APIs to handle the synchronization protocol.
To issue commands, the following APIs exist in pipe_comm.h:
- `InterChiplet::SyncProtocol::sendCycleCmd` sends the CYCLE command.
- `InterChiplet::SyncProtocol::pipeSync` sends the PIPE command and waits for the SYNC command. The function returns the cycle specified by the SYNC command.
- `InterChiplet::SyncProtocol::writeSync` sends the WRITE command and waits for the SYNC command. The function returns the cycle specified by the SYNC command.
- `InterChiplet::SyncProtocol::readSync` sends the READ command and waits for the SYNC command. The function returns the cycle specified by the SYNC command.
To reduce the overhead of opening, closing, reading, and writing Pipes, pipe_comm.h abstracts these operations into the class InterChiplet::PipeComm. PipeComm holds a list of opened pipes and one data buffer for each opened pipe, so that each pipe is opened only once during one simulator process. Meanwhile, many small reads are coalesced into fewer reads of the data buffer size.
The usage of InterChiplet::PipeComm is as below:
```cpp
InterChiplet::PipeComm global_pipe_comm;  // It is suggested to declare a global instance of PipeComm.

// It is suggested to use the API to get the file name of pipes.
char* fileName = InterChiplet::SyncProtocol::pipeName(src_x, src_y, dst_x, dst_y);

global_pipe_comm.write_data(fileName, interdata, nbytes);  // Write data to the Pipe.
global_pipe_comm.read_data(fileName, interdata, nbytes);   // Read data from the Pipe.
```
Although imported simulators need minor changes to support the synchronization protocol, it is still suggested to import third-party simulators as git submodules. The purpose behind this suggestion is to keep the repository clean and to respect the open-source spirit. The minor modifications should be stored in a dedicated diff file for each simulator.
patch.sh creates diff patches for all simulators. It also copies the modified files to .cache. It is forbidden to copy files from .cache back into the simulator directories because the copy operation cannot be rolled back. However, the files in .cache can be used as a reference when recovering from git conflicts.
apply_patch.sh applies the diff patches to all simulators. A git reset is necessary in each simulator before running apply_patch.sh to avoid git conflicts.
When adding new simulators, it is necessary to add the paths of the new simulators to patch.sh and apply_patch.sh.