./bin/llama-cli --model DeepSeek-R1-Distill-Qwen-32B-Q2_K.gguf --cache-type-k q8_0 --threads 24 --prompt '<|User|>What is 1+1?<|Assistant|>' -no-cnv
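
A quick note on the flags: --cache-type-k q8_0 stores the K half of the KV cache as 8-bit q8_0 blocks instead of the default f16 (the V half stays f16, as the llama_kv_cache_init line further down confirms), --threads 24 runs inference on 24 CPU threads, and -no-cnv disables the interactive conversation mode so the quoted prompt is processed exactly once. To shrink the cache further one would presumably also quantize V with --cache-type-v q8_0, which in llama.cpp builds of this era additionally requires --flash-attn.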
build: 4798 (1782cdfe) with cc (Ubuntu 14.2.0-4ubuntu2~24.04) 14.2.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 27 key-value pairs and 771 tensors from DeepSeek-R1-Distill-Qwen-32B-Q2_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = DeepSeek R1 Distill Qwen 32B
llama_model_loader: - kv 3: general.organization str = Deepseek Ai
llama_model_loader: - kv 4: general.basename str = DeepSeek-R1-Distill-Qwen
llama_model_loader: - kv 5: general.size_label str = 32B
llama_model_loader: - kv 6: qwen2.block_count u32 = 64
llama_model_loader: - kv 7: qwen2.context_length u32 = 131072
llama_model_loader: - kv 8: qwen2.embedding_length u32 = 5120
llama_model_loader: - kv 9: qwen2.feed_forward_length u32 = 27648
llama_model_loader: - kv 10: qwen2.attention.head_count u32 = 40
llama_model_loader: - kv 11: qwen2.attention.head_count_kv u32 = 8
llama_model_loader: - kv 12: qwen2.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 13: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 14: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 15: tokenizer.ggml.pre str = deepseek-r1-qwen
llama_model_loader: - kv 16: tokenizer.ggml.tokens arr[str,152064] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 18: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 19: tokenizer.ggml.bos_token_id u32 = 151646
llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151643
llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151654
llama_model_loader: - kv 22: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 23: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 24: tokenizer.chat_template str = {% if not add_generation_prompt is de...
llama_model_loader: - kv 25: general.quantization_version u32 = 2
llama_model_loader: - kv 26: general.file_type u32 = 10
llama_model_loader: - type f32: 321 tensors
llama_model_loader: - type q2_K: 257 tensors
llama_model_loader: - type q3_K: 128 tensors
llama_model_loader: - type q4_K: 64 tensors
llama_model_loader: - type q6_K: 1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q2_K - Medium
print_info: file size = 11.46 GiB (3.01 BPW)
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 5120
print_info: n_layer = 64
print_info: n_head = 40
print_info: n_head_kv = 8
print_info: n_rot = 128
print_info: n_swa = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 5
print_info: n_embd_k_gqa = 1024
print_info: n_embd_v_gqa = 1024
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: n_ff = 27648
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 32B
print_info: model params = 32.76 B
print_info: general.name = DeepSeek R1 Distill Qwen 32B
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151646 '<|begin▁of▁sentence|>'
print_info: EOS token = 151643 '<|end▁of▁sentence|>'
print_info: EOT token = 151643 '<|end▁of▁sentence|>'
print_info: PAD token = 151654 '<|vision_pad|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|end▁of▁sentence|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: CPU_Mapped model buffer size = 11736.98 MiB
...............................................................................................
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 4096
llama_init_from_model: n_ctx_per_seq = 4096
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 1000000.0
llama_init_from_model: freq_scale = 1
llama_init_from_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: kv_size = 4096, offload = 1, type_k = 'q8_0', type_v = 'f16', n_layer = 64, can_shift = 1
llama_kv_cache_init: CPU KV buffer size = 784.00 MiB
llama_init_from_model: KV self size = 784.00 MiB, K (q8_0): 272.00 MiB, V (f16): 512.00 MiB
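
Those three figures can be cross-checked from the logged hyperparameters. A minimal sketch in Python, assuming the standard GGML storage costs (q8_0: 34 bytes per block of 32 values; f16: 2 bytes per value):

n_ctx, n_embd_kv, n_layer = 4096, 1024, 64   # kv_size, n_embd_k_gqa, n_layer from the log
values = n_ctx * n_embd_kv * n_layer         # elements in each of the K and V caches
k_mib = values * 34 / 32 / 1024**2           # q8_0 K cache -> 272.0 MiB
v_mib = values * 2 / 1024**2                 # f16 V cache  -> 512.0 MiB
print(k_mib, v_mib, k_mib + v_mib)           # 272.0 512.0 784.0, matching the log

Quantizing K alone already saves 240 MiB against an all-f16 cache (512 + 512 = 1024 MiB).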
llama_init_from_model: CPU output buffer size = 0.58 MiB
llama_init_from_model: CPU compute buffer size = 368.01 MiB
llama_init_from_model: graph nodes = 2246
llama_init_from_model: graph splits = 1
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 24

system_info: n_threads = 24 (n_threads_batch = 24) / 24 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |

sampler seed: 2940439051
sampler params:
 repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
 dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
 top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
 mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
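
At these settings most stages of that chain are no-ops (penalties at 1.000/0.000, dry_multiplier 0, typical_p 1.0, xtc_probability 0), so the active path is top-k -> top-p -> min-p -> temperature -> dist. A rough Python sketch of that active path, not llama.cpp's actual implementation:

import math, random

def sample_token(logits, top_k=40, top_p=0.95, min_p=0.05, temp=0.8, rng=None):
    rng = rng or random.Random()
    # softmax over the raw logits
    m = max(logits.values())
    p = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(p.values())
    p = {t: q / z for t, q in p.items()}
    # top-k: keep the 40 most probable candidates
    kept = sorted(p, key=p.get, reverse=True)[:top_k]
    # top-p: smallest prefix whose cumulative mass reaches 0.95
    nucleus, cum = [], 0.0
    for t in kept:
        nucleus.append(t)
        cum += p[t]
        if cum >= top_p:
            break
    # min-p: drop anything below 5% of the best candidate's probability
    nucleus = [t for t in nucleus if p[t] >= min_p * p[nucleus[0]]]
    # temperature rescale, then one multinomial draw ("dist")
    w = [math.exp((logits[t] - m) / temp) for t in nucleus]
    r = rng.random() * sum(w)
    for t, wt in zip(nucleus, w):
        r -= wt
        if r <= 0:
            return t
    return nucleus[-1]

print(sample_token({"2": 9.1, " 2": 6.3, "two": 4.0, "3": 0.5}))  # almost always "2"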
generate: n_ctx = 4096, n_batch = 2048, n_predict = -1, n_keep = 1

What is 1+1?<think>
I need to calculate the sum of two numbers: 1 and 1.

First, I'll identify the numbers involved in the addition: 1 and 1.

Next, I'll add these two numbers together: 1 plus 1.

Finally, the result of the addition is 2.
</think>

To solve \(1 + 1\), follow these steps:

1. **Identify the numbers to add:**
   The numbers are \(1\) and \(1\).

2. **Add the numbers together:**
   \(1 + 1 = 2\).

3. **Final Answer:**
   \(\boxed{2}\) [end of text]


llama_perf_sampler_print: sampling time = 8.30 ms / 147 runs ( 0.06 ms per token, 17708.71 tokens per second)
llama_perf_context_print: load time = 1083.18 ms
llama_perf_context_print: prompt eval time = 810.71 ms / 10 tokens ( 81.07 ms per token, 12.33 tokens per second)
llama_perf_context_print: eval time = 67364.69 ms / 136 runs ( 495.33 ms per token, 2.02 tokens per second)
llama_perf_context_print: total time = 68215.19 ms / 146 tokens
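
The reported rates follow directly from the counts and times, as a quick check confirms:

print(10 / 810.71 * 1000)     # prompt eval: ~12.33 tokens per second
print(136 / 67364.69 * 1000)  # generation: ~2.02 tokens per second

Roughly 2 tokens/s is plausible for a 32B Q2_K model on 24 CPU threads: generation is memory-bandwidth bound, and streaming the 11.46 GiB of weights once per token at 2.02 tokens/s implies on the order of 23 GiB/s of sustained read bandwidth.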