Commit 4382435

Merge pull request #19 from mavroudo/v2
V2
2 parents bacd6d0 + 63a2a4a commit 4382435

File tree

10 files changed: +172 −218 lines changed

README.md

Lines changed: 37 additions & 35 deletions
````diff
@@ -45,39 +45,42 @@ a new network using the following command:
    docker network create --driver=bridge siesta-net
    ```

-2. **Deploy database:** From the root directory execute the following commands:
+2. **Deploy the infrastructure:** From the root directory execute the following command:
    ```bash
-   docker-compose -f dockerbase/docker-compose-s3.yml up -d
-   ```
-   for the S3
+   docker-compose up -d
+   ```
+   This will deploy the entire SIESTA infrastructure, which includes the Preprocessing Component
+   (integrated with a Python REST API implemented in FastAPI), the Query Processor, and the User Interface.
+   Additionally, it deploys a database for index storage. Currently, MinIO (S3) is active, while Cassandra is commented out.
+   You can switch between these two, or use environment variables to set up a connection with another database.

-   ```bash
-   docker-compose -f dockerbase/docker-compose-cassandra.yml up -d
-   ```
-   for Cassandra. The database will be deployed locally, opening the default ports, and then it will be
-   detached.
+   Before executing the preprocessing with S3, note that you must create a new bucket named **siesta**.
+   You can access the different services from the following endpoints:
+
+   - FastAPI: http://localhost:8000/docs
+   - S3: http://localhost:9000 (default username/password: minioadmin/minioadmin)

-   If you decide to use S3, you have to create a new bucket named "**siesta**" before proceeding to the next step. To do that,
-   log in to http://localhost:9000 using **minioadmin** for both username and password (the MinIO default). Click on
-   **Buckets** on the left and then press **Create Bucket**. Use the default settings.

-3. **Build Docker image:** From the root directory run the following command:
+### Build the preprocess component separately
+1. **Build Docker image:** From the root directory run the following command:
    ```bash
    docker build -t preprocess -f dockerbase/Dockerfile .
    ```
    This will download all the dependencies, build the jar file and finally download the Spark component. The image is now
    ready to be executed.

-4. **Run image:** After the image is built, it can be run with
+2. **Deploy a database:** From the root directory, run ```docker-compose up -d minio``` to deploy S3,
+   or ```docker-compose up -d cassandra``` to deploy Cassandra.
+
+3. **Run image:** if S3 is utilized
    ```bash
    docker run --network siesta-net preprocess
    ```
-   if S3 is utilized, or
+   if Cassandra is utilized
    ```bash
    docker run --network siesta-net preprocess -d cassandra
    ```
-   for Cassandra. The default execution will generate 200 synthetic traces,
+   The default execution will generate 200 synthetic traces,
    using 10 different event types, and lengths that vary from 10 to 90 events. The inverted indices will be stored
    using "test" as the logname.
````
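The two run commands in the hunk above differ only in the database argument. As a quick sketch (the helper function is hypothetical, not part of the repository), the choice can be wrapped like this:

```shell
# Hypothetical helper (not repo code): print the docker run command
# for the chosen index database, defaulting to S3 as the README does.
siesta_run_cmd() {
  db="${1:-s3}"
  cmd="docker run --network siesta-net preprocess"
  if [ "$db" = "cassandra" ]; then
    cmd="$cmd -d cassandra"
  fi
  printf '%s\n' "$cmd"
}

siesta_run_cmd            # S3 (default)
siesta_run_cmd cassandra  # Cassandra backend
```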

````diff
@@ -86,43 +89,42 @@ Connecting to already deployed databases or utilizing a spark cluster can be eas
 of parameters. The only thing that you should make sure of is that their urls are accessible
 by the docker container. This can be done by either making the url publicly available or by connecting the
 docker container to the same network (as done above with the siesta-net).
-- **Connect with spark cluster:** Change the value of the "**--master**" parameter in the ENTRYPOINT of the
-  Dockerfile from "**local[*]**" to the resource manager's url.
+- **Connect with spark cluster (with the api):** Change the value of the Spark master parameter before submitting the
+  preprocess job from "**local[*]**" to the resource manager's url.
+- **Connect with spark cluster (standalone):** Change the value of the "**--master**" parameter in the ENTRYPOINT of the
+  Dockerfile from "**local[*]**" to the resource manager's url. Then build the image again before executing it.
 - **Connect with Cassandra:** Change the values of the environmental parameters that start with **cassandra_**.
   These parameters include the contact point and the credentials required to achieve connection.
 - **Connect with S3:** Change the values of the environmental parameters that start with **s3**.
   These parameters include the contact point and the credentials required to achieve connection.

-At the end build the image again before executing it.
+

 ### Executing preprocess for a provided logfile
 Till now the supported file extensions are "**.xes**", which is the default format for Business Process
 Management logfiles, and "**.withTimestamp**", which is a generic file format generated for testing. A new
 connector can be easily implemented in _auth.datalab.siesta.BusinessLogic.IngestData.ReadLogFile_.

-In order to execute the preprocess for a provided logfile you need to take 2 steps. First ensure that the
+You can either submit a file to be preprocessed through the User Interface (Preprocessing tab), through the FastAPI docs,
+or in the standalone format. For the last one you need to take 2 steps.
+First ensure that the
 logfile is visible inside the docker container and second execute the preprocessing with the appropriate
 parameters. Therefore, place the logfile you want to preprocess inside the _experiments/input_ directory.
 Assuming that the logfile is named "log.xes" and the indices should have the name "log", run the following
 command from the root directory:
-```bash
-docker run --mount type=bind,source="$(pwd)"/experiments/input,target=/app/input \
-preprocess -f /input/log.xes --logname log
-```
-### Execute preprocessing through API

-There is another way to execute the preprocess component and this is by utilizing an API. To that end,
-FastAPI was used. The process allows the user to upload a log file, modify the environmental parameters (that
-describe, among others, the connection properties to the databases) and execute the preprocessing.
+You can submit a file for preprocessing through the User Interface (under the Preprocessing tab),
+via the FastAPI docs, or in standalone format. For the latter, two steps are required.
+First, ensure that the logfile is visible inside the Docker container.
+Second, execute the preprocessing with the appropriate parameters.
+To do this, place the logfile you wish to preprocess inside the _experiments/input_ directory.
+Assuming that the logfile is named log.xes and the indices should be named log,
+run the following command from the root directory:

-The same parameters used while executing the preprocessing jar can also be set here, as parameters
-in the request.
-
-To deploy the preprocess component with the api run the following command from the root directory:
 ```bash
-docker-compose -f dockerbase/docker-compose-preprocess-with-api.yml up
+docker run --mount type=bind,source="$(pwd)"/experiments/input,target=/app/input \
+  preprocess -f /app/input/log.xes --logname log
 ```
-and then access the docs (Swagger) from http://localhost:8000/docs.

 ### Complete list of parameters:
 ```
````
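The cassandra_* and s3* connection parameters in the hunk above are plain environment variables, so they can also be overridden per run with Docker's `-e` flag. A small sketch (the helper function is hypothetical; the endpoint and credential values are only examples):

```shell
# Hypothetical sketch (not a repo script): turn KEY=VALUE pairs into `-e` flags
# for overriding the connection parameters (s3*, cassandra_*) at run time.
build_docker_run() {
  args="docker run --network siesta-net"
  for kv in "$@"; do
    args="$args -e $kv"
  done
  printf '%s preprocess\n' "$args"
}

# Example: point the preprocess container at a different S3 endpoint.
build_docker_run s3endPointLoc=http://minio:9000 s3accessKeyAws=minioadmin
```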

config.json

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+{
+  "BASE_URL": "http://localhost:8090",
+  "PREPROCESS_BASE_URL": "http://localhost:8000"
+}
```
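These two endpoints tell the UI where to find the Query Processor (port 8090) and the preprocessing API (port 8000). The commented-out volumes hint in docker-compose.yml shows how a customized copy can be supplied to the UI container; a sketch of that mount (image name and paths taken from the compose file in this commit):

```yaml
# Sketch: serve a customized config.json to the UI container
# (mirrors the commented-out volumes hint in docker-compose.yml).
ui:
  image: mavroudo/siesta-ui:2
  ports:
    - "80:80"
  volumes:
    - ./config.json:/usr/share/nginx/html/config.json
```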

docker-compose.yml

Lines changed: 106 additions & 0 deletions
New file:

```yaml
version: '3.7'
services:
  preprocess:
    image: mavroudo/siesta-preprocess:2
    ports:
      - "8000:8000"
    networks:
      - siesta-net
    volumes:
      - preprocesses:/app/pythonAPI/dbSQL
    environment:
      # for cassandra
      cassandra_host: cassandra
      cassandra_port: 9042
      cassandra_user: cassandra
      cassandra_pass: cassandra
      cassandra_keyspace_name: siesta
      cassandra_replication_class: SimpleStrategy
      cassandra_replication_rack: replication_factor
      cassandra_replication_factor: 3
      cassandra_write_consistency_level: ONE
      cassandra_gc_grace_seconds: 864000
      # for s3 (minio)
      s3accessKeyAws: minioadmin
      s3ConnectionTimeout: 600000
      s3endPointLoc: http://minio:9000
      s3secretKeyAws: minioadmin

  query:
    image: mavroudo/siesta-query:2
    environment:
      master.uri: local[4] # or local[*]
      database: s3 # cassandra-rdd or s3
      # for s3 (minio)
      s3.endpoint: http://minio:9000
      s3.user: minioadmin
      s3.key: minioadmin
      s3.timetout: 600000
      # for cassandra
      cassandra.max_requests_per_local_connection: 32768
      cassandra.max_requests_per_remote_connection: 22000
      cassandra.connections_per_host: 1000
      cassandra.max_queue_size: 1024
      cassandra.connection_timeout: 30000
      cassandra.read_timeout: 30000
      spring.data.cassandra.contact-points: cassandra
      spring.data.cassandra.port: 9042
      spring.data.cassandra.user: cassandra
      spring.data.cassandra.password: cassandra
      server.port: 8090 # port of the application
    volumes:
      - ./build:/root/.m2
    ports:
      - '8090:8090'
    networks:
      - siesta-net

  ui:
    image: mavroudo/siesta-ui:2
    ports:
      - "80:80"
    # if you want to modify the base url (e.g. for setting it to a server), add and modify the config.json file
    # volumes:
    #   - ./config.json:/usr/share/nginx/html/config.json

  minio:
    container_name: minio
    image: minio/minio:RELEASE.2023-11-01T01-57-10Z
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_storage:/data
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    command: server --console-address ":9001" /data
    networks:
      - siesta-net

#  cassandra:
#    image: 'cassandra:4.0'
#    container_name: cassandra
#    ports:
#      - '7000:7000'
#      - '9042:9042'
#    volumes:
#      - './cassandra/data:/var/lib/cassandra'
#    environment:
#      PROJECT_NAME: siesta
#      CASSANDRA_SEEDS: cassandra
#      CASSANDRA_PASSWORD_SEEDER: yes
#      CASSANDRA_PASSWORD: cassandra
#    networks:
#      - siesta-net

networks:
  siesta-net:
    name: siesta-net
    external: true

volumes:
  minio_storage: {}
  preprocesses: {}
```
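Since the Cassandra service ships commented out, switching backends amounts to uncommenting it and pointing the query service's `database` setting at `cassandra-rdd` (the alternative value named in its own comment). For reference, the uncommented service reads:

```yaml
# The cassandra service from docker-compose.yml with the comment markers removed;
# also set `database: cassandra-rdd` on the query service to use it.
cassandra:
  image: 'cassandra:4.0'
  container_name: cassandra
  ports:
    - '7000:7000'
    - '9042:9042'
  volumes:
    - './cassandra/data:/var/lib/cassandra'
  environment:
    PROJECT_NAME: siesta
    CASSANDRA_SEEDS: cassandra
    CASSANDRA_PASSWORD_SEEDER: yes
    CASSANDRA_PASSWORD: cassandra
  networks:
    - siesta-net
```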

dockerbase/Dockerfile

Lines changed: 2 additions & 2 deletions
```diff
@@ -36,9 +36,9 @@ ENV cassandra_write_consistency_level=ONE
 ENV cassandra_gc_grace_seconds=864000
 ENV s3accessKeyAws=minioadmin
 ENV s3ConnectionTimeout=600000
-ENV s3endPointLoc=http://minio-contact:9000
+ENV s3endPointLoc=http://minio:9000
 ENV s3secretKeyAws=minioadmin


 ENTRYPOINT ["/opt/spark/bin/spark-submit","--master","local[*]","preprocess.jar"]
-CMD ["--logname","test","--delete_prev"]
+CMD ["--logname","test","--delete_prev"]
```
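The ENTRYPOINT/CMD split above means the `--logname test --delete_prev` defaults are replaced entirely by whatever arguments follow the image name in `docker run`, while the spark-submit ENTRYPOINT is always kept. A plain-shell sketch of that Docker behaviour (illustration only, not repo code):

```shell
# Sketch of Docker's ENTRYPOINT/CMD argument handling (illustration only):
# CMD provides default arguments appended to ENTRYPOINT; arguments given to
# `docker run image ...` replace CMD entirely, ENTRYPOINT stays fixed.
entrypoint='spark-submit --master local[*] preprocess.jar'
default_cmd='--logname test --delete_prev'

effective_command() {
  if [ "$#" -gt 0 ]; then
    printf '%s %s\n' "$entrypoint" "$*"
  else
    printf '%s %s\n' "$entrypoint" "$default_cmd"
  fi
}

effective_command                     # default run
effective_command --logname mylog     # user arguments replace CMD
```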

dockerbase/docker-compose-cassandra.yml

Lines changed: 0 additions & 19 deletions
This file was deleted.

dockerbase/docker-compose-preprocess-with-api.yml

Lines changed: 0 additions & 21 deletions
This file was deleted.

dockerbase/docker-compose-s3.yml

Lines changed: 0 additions & 26 deletions
This file was deleted.
