Updated some URLs. Added real-world workload docs to setup

Scusemua · Scusemua · commit 693ba2a96548 · 2023-10-19T10:49:45.000-04:00
diff --git a/README.md b/README.md
@@ -233,7 +233,7 @@ Hops is released under an [Apache 2.0 license](LICENSE.txt).
 
 # Associated Publications
 
-This software was the subject of the paper, *λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions*. This paper can be found [here](https://arxiv.org/abs/2306.11877) and is set to appear in the proceedings of ASPLOS'23. The software found [here](https://github.com/ds2-lab/LambdaFS-Benchmark-Utility) was used to evaluate λFS and HopsFS for the paper.
+This software was the subject of the paper, *λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions*. This paper can be found [here](https://arxiv.org/abs/2306.11877) and is set to appear in the proceedings of ASPLOS'23. The software found [here](https://github.com/ds2-lab/LambdaFS-Benchmarking) was used to evaluate λFS and HopsFS for the paper.
 
 **BibTeX Citation (for arXiv preprint)**:
 ```
diff --git a/aws-setup/documentation/setup.md b/aws-setup/documentation/setup.md
@@ -411,7 +411,7 @@ If everything works, then you'll also see a message prefixed with `[SUCCESS]` be
 
 # λFS Benchmarking Utility
 
-**NOTE:** This information is covered in greater detail in the LambdaFS-Benchmarking-Utility GitHub repository available [here](https://github.com/ds2-lab/LambdaFS-Benchmark-Utility).
+**NOTE:** This information is covered in greater detail in the LambdaFS-Benchmarking-Utility GitHub repository available [here](https://github.com/ds2-lab/LambdaFS-Benchmarking).
 
 To run the same software we used when evaluating λFS and HopsFS, you can navigate to the `/home/ubuntu/repos/LambdaFS-BenchmarkingUtility` directory. The branch compatible with λFS is the `origin/generic` branch, while the branch compatible with HopsFS is the `origin/vanilla-distributed` branch.
 
@@ -455,7 +455,7 @@ There are several configuration parameters to set:
 
 For each "follower" (i.e., other machine on which you'd like to run the benchmarking software), you must add an entry to the `followers` list using the format shown above. If deployed on AWS EC2 within a VPC, then the `ip` is the private IPv4 of the EC2 VM. For `user`, specify the OS username that should be used when SSH-ing to the machine. If using our provided EC2 AMIs, then this will be `ubuntu`. 
 
-## Full List of Available Command-Line Arguments
+## **Full List of Available Command-Line Arguments**
 
 The following is the full list of available command-line arguments for the λFS Benchmarking Utility.
 ```
@@ -482,4 +482,30 @@ The following is the full list of available command-line arguments for the λFS
   The command should SCP the hdfs-site.xml config file to each follower.
 
 -m  --manually_launch_followers   [no value] [default: false]
-```
+```
+
+## Real-World Workloads
+
+This software also drives simulations of the HDFS Spotify workload described in the paper. This option can be selected from the interactive menu along with all of the other experiments. The real-world workload expects a `workload.yaml` file in the root of the repository on the primary client (i.e., the experiment driver). The available parameters are described below.
+
+### **General Config Parameters for the Real-World Spotify Workload**
+- `num.worker.threads` (`int`): The total number of clients that each individual worker node should use. If this is set to `128` and there are 8 worker nodes used in the experiment, then there will be a total of 1,024 clients.
+- `files.to.create.in.warmup.phase` (`int`): The number of files that each individual client should create at the very beginning of the experiment. These files are used to perform `move`, `delete`, and `rename` operations.
+- `warmup.phase.wait.time` (`int`): How long to wait at the beginning for all "warm-up files" to be created before moving on to the actual experiment.
+- `interleaved.bm.duration` (`int`): How long the real-world experiment should last (in *milliseconds*). 
+- `interleaved.bm.iat.unit` (`int`) (**recommended:** `15`): How long, in seconds, the current randomly-generated throughput value should last before a new value is generated. 
+- `interleaved.bm.iat.skipunit` (`int`) (**recommended:** `0`): Skips rate-limiting for this number of ticks. Recommended to leave this at 0. 
+- `interleaved.bm.iat.distribution` (`string`) (**recommended:** `PARETO`): Defines the distribution to use when randomly generating file system operations. Options include `"UNIFORM"`, `"PARETO"` (default/recommended), `"POISSON"`, and `"ZIPF"`.
+- `interleaved.bm.iat.pareto.alpha` (`int`) (**recommended:** `2`): Shape parameter of the `Pareto` distribution.
+- `interleaved.bm.iat.pareto.location` (`int`) (**recommended:** `10000`): Used as a parameter to the `Pareto` distribution.
+
+### **File System Operation Distribution Parameters**
+- `interleaved.create.files.percentage` (**recommended:** `1.09`): Percentage of `CREATE-FILE` operations.
+- `interleaved.rename.files.percentage` (**recommended:** `0.55`): Percentage of `RENAME-FILE` operations.
+- `interleaved.delete.files.percentage` (**recommended:** `0.34`): Percentage of `DELETE-FILE` operations.
+- `interleaved.mkdir.percentage` (**recommended:** `0.02`): Percentage of `MKDIR` operations.
+- `interleaved.read.files.percentage` (**recommended:** `71.84`): Percentage of `READ-FILE` operations.
+- `interleaved.ls.dirs.percentage` (**recommended:** `8.17`): Percentage of `LIST-DIRECTORY` operations.
+- `interleaved.ls.files.percentage` (**recommended:** `0.68`): Percentage of `LIST-FILE` operations.
+- `interleaved.file.getInfo.percentage` (**recommended:** `13.54`): Percentage of `STAT-FILE` operations.
+- `interleaved.dir.getInfo.percentage` (**recommended:** `3.77`): Percentage of `STAT-DIRECTORY` operations.
diff --git a/aws-setup/documentation/setup_tldr.md b/aws-setup/documentation/setup_tldr.md
@@ -228,4 +228,4 @@ com.gmail.benrcarver.distributed.InteractiveTest --leader_ip <PRIVATE IPv4 OF VM
 
 Note that we're setting the JVM heap size to 2GB in the above command via the flags `-Xmx2g -Xms2g`. If you're using a VM with less than 2GB of RAM, then you should adjust this value accordingly. We recommend at least 256-512MB of RAM for basic testing with single file system operations and 1-2GB for benchmarks, especially if running in `distributed` mode. 
 
-For more detailed instructions on the `distributed` mode of the benchmarking utility, please refer to the `setup.md` file (in `aws-setup/documentation/setup.md`) or the LambdaFS-Benchmarking-Utility GitHub repository available [here](https://github.com/ds2-lab/LambdaFS-Benchmark-Utility).
+For more detailed instructions on the `distributed` mode of the benchmarking utility, please refer to the `setup.md` file (in `aws-setup/documentation/setup.md`) or the LambdaFS-Benchmarking-Utility GitHub repository available [here](https://github.com/ds2-lab/LambdaFS-Benchmarking).
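
Review note on the operation mix documented in the `setup.md` hunk: the nine recommended percentages sum to 100. The sketch below is only an illustration and is not code from the benchmarking utility. `validate_mix` and `sample_operations` are hypothetical helper names, and drawing operations with `random.choices` is an assumption about how such a mix could be consumed, not a description of how the utility does it. It may be useful as a sanity check before changing the percentages in `workload.yaml`.

```python
import math
import random

# Recommended operation mix, copied from the setup.md text in this diff.
RECOMMENDED_MIX = {
    "interleaved.create.files.percentage": 1.09,
    "interleaved.rename.files.percentage": 0.55,
    "interleaved.delete.files.percentage": 0.34,
    "interleaved.mkdir.percentage": 0.02,
    "interleaved.read.files.percentage": 71.84,
    "interleaved.ls.dirs.percentage": 8.17,
    "interleaved.ls.files.percentage": 0.68,
    "interleaved.file.getInfo.percentage": 13.54,
    "interleaved.dir.getInfo.percentage": 3.77,
}


def validate_mix(mix, tolerance=1e-6):
    """Hypothetical helper: return the total, or raise ValueError if the
    percentages are negative or do not sum to 100 within `tolerance`."""
    for key, value in mix.items():
        if value < 0:
            raise ValueError(f"{key} must be non-negative, got {value}")
    total = math.fsum(mix.values())
    if not math.isclose(total, 100.0, abs_tol=tolerance):
        raise ValueError(f"percentages sum to {total}, expected 100")
    return total


def sample_operations(mix, count, seed=None):
    """Hypothetical helper: draw `count` parameter names, weighted by their
    percentages. This is an assumed use of the mix, for illustration only."""
    rng = random.Random(seed)
    keys = list(mix)
    weights = [mix[key] for key in keys]
    return rng.choices(keys, weights=weights, k=count)


if __name__ == "__main__":
    validate_mix(RECOMMENDED_MIX)
    print(sample_operations(RECOMMENDED_MIX, count=5, seed=42))
```

With the recommended values, most sampled entries should be `interleaved.read.files.percentage`, since reads make up 71.84% of the mix.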
