@@ -45,39 +45,42 @@ a new network using the following command:
 docker network create --driver=bridge siesta-net
 ```
 
-2. **Deploy database:** From the root directory execute the following commands:
+2. **Deploy the infrastructure:** From the root directory execute the following command:
 ```bash
-docker-compose -f dockerbase/docker-compose-s3.yml up -d
-```
-for the S3
+docker-compose up -d
+```
+This will deploy the entire SIESTA infrastructure, which includes the Preprocessing Component
+(integrated with a Python REST API implemented in FastAPI), the Query Processor, and the User Interface.
+Additionally, it deploys a database for index storage. Currently, MinIO (S3) is active, while Cassandra is commented out.
+You can switch between the two, or use environment variables to set up a connection to another database.
 
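The MinIO/Cassandra switch described above lives in the repository's `docker-compose.yml`. A minimal sketch of the idea follows; the service definitions here are illustrative assumptions, not the project's actual compose file:

```yaml
services:
  minio:                 # active index store (S3-compatible)
    image: minio/minio
    ports:
      - "9000:9000"
  # cassandra:           # commented out; uncomment to switch stores
  #   image: cassandra
  #   ports:
  #     - "9042:9042"
```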
-```bash
-docker-compose -f dockerbase/docker-compose-cassandra.yml up -d
-```
-for the Cassandra. The database will be deployed locally, opening the default ports, and then it will be
-detached.
+Before executing the preprocessing with S3, note that you must create a new bucket named **siesta**.
+You can access the different services from the following endpoints:
+
+- FastAPI: http://localhost:8000/docs
+- S3: http://localhost:9000 (default username/password: minioadmin/minioadmin)
 
-If you decide to use S3, you have to create a new bucket named "**siesta**" before proceeding to the next step. To do that
-login to http://localhost:9000 using for both username and password **minioadmin** (default option for minio). Click on
-**Buckets** from the left and then press **Create Bucket**. Use the default settings.
 
-3. **Build Docker image:** From the root directory run the following command:
+### Build the preprocess component separately
+1. **Build Docker image:** From the root directory run the following command:
 ```bash
 docker build -t preprocess -f dockerbase/Dockerfile .
 ```
 This will download all the dependencies, build the jar file and finally download the Spark component. The image is now
 ready to be executed.
 
-4. **Run image:** After image was built it can be run with the
+2. **Deploy a database:** From the root directory, run `docker-compose up -d minio` to deploy S3,
+or `docker-compose up -d cassandra` to deploy Cassandra.
+
+3. **Run image:** If S3 is utilized:
 ```bash
 docker run --network siesta-net preprocess
 ```
-
-if S3 is utilized or
+If Cassandra is utilized:
 ```bash
 docker run --network siesta-net preprocess -d cassandra
 ```
-for Cassandra. The default execution will generate 200 synthetic traces,
+The default execution will generate 200 synthetic traces,
 using 10 different event types, and lengths that vary from 10 to 90 events. The inverted indices will be stored
 using "test" as the logname.
 
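The default synthetic workload described above (200 traces, 10 event types, lengths between 10 and 90) can be approximated in a few lines. This is an illustrative sketch, not SIESTA's actual generator; the function and event names are made up:

```python
import random


def generate_traces(num_traces=200, num_event_types=10,
                    min_len=10, max_len=90, seed=42):
    """Sketch of a synthetic trace generator matching the defaults
    described in the text (hypothetical re-implementation)."""
    rng = random.Random(seed)
    event_types = [f"event_{i}" for i in range(num_event_types)]
    traces = []
    for _ in range(num_traces):
        length = rng.randint(min_len, max_len)  # inclusive bounds
        traces.append([rng.choice(event_types) for _ in range(length)])
    return traces


traces = generate_traces()
```

Each trace is simply a list of event-type labels; the real component additionally attaches timestamps and builds the inverted indices under the "test" logname.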
@@ -86,43 +89,42 @@ Connecting to already deployed databases or utilizing a spark cluster can be eas
 of parameters. The only thing you should make sure of is that their URLs are accessible
 by the docker container. This can be done by either making the URL publicly available or by connecting the
 docker container to the same network (as done above with the siesta-net).
-- **Connect with spark cluster:** Change the value of the "**--master**" parameter in the ENTRYPOINT of the
-Dockerfile from "**local[*]**" to the resource manager's url.
+- **Connect with spark cluster (with the api):** Change the value of the Spark master parameter before submitting the
+preprocess job from "**local[*]**" to the resource manager's URL.
+- **Connect with spark cluster (standalone):** Change the value of the "**--master**" parameter in the ENTRYPOINT of the
+Dockerfile from "**local[*]**" to the resource manager's URL. Then build the image again before executing it.
 - **Connect with Cassandra:** Change the values of the environment parameters that start with **cassandra\_**.
 These parameters include the contact point and the credentials required to establish a connection.
 - **Connect with S3:** Change the values of the environment parameters that start with **s3**.
 These parameters include the contact point and the credentials required to establish a connection.
 
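The environment-parameter mechanism in the bullets above can be illustrated with a short sketch. Only the `cassandra_` and `s3` prefixes come from the text; the concrete variable names in the example are hypothetical, not SIESTA's actual configuration keys:

```python
import os


def collect_settings(prefix, environ=None):
    """Collect connection settings whose variable names start with `prefix`,
    stripping the prefix (and a separating underscore) from each key."""
    environ = os.environ if environ is None else environ
    return {
        key[len(prefix):].lstrip("_"): value
        for key, value in environ.items()
        if key.lower().startswith(prefix)
    }


# Hypothetical example values for an external Cassandra and a MinIO endpoint.
example_env = {
    "cassandra_contact_point": "cassandra:9042",
    "cassandra_user": "cassandra",
    "s3_endpoint_url": "http://localhost:9000",
}
print(collect_settings("cassandra", example_env))
print(collect_settings("s3", example_env))
```

In a real deployment these values would be passed to the container with `docker run -e` or a compose `environment:` section rather than hard-coded.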
-At the end build the image again before executing it.
+
 
 ### Executing preprocess for a provided logfile
 Till now the supported file extensions are "**.xes**", which is the default file format for Business Process
 Management logfiles, and "**.withTimestamp**", which is a generic file format generated for testing. A new
 connector can be easily implemented in the _auth.datalab.siesta.BusinessLogic.IngestData.ReadLogFile_.
 
-In order to execute the preprocess for a provided logfile you need to take 2 steps. First ensure that the
+You can submit a file to be preprocessed through the User Interface (Preprocessing tab), through the FastAPI docs,
+or in the standalone format. For the latter, you need to take 2 steps.
+First, ensure that the
 logfile is visible inside the docker container, and second, execute the preprocessing with the appropriate
 parameters. Therefore, place the logfile you want to preprocess inside the _experiments/input_ directory.
 Assuming that the logfile is named "log.xes" and the indices should have the name "log", run the following
 command from the root directory:
-```bash
-docker run --mount type=bind,source="$(pwd)"/experiments/input,target=/app/input \
-preprocess -f /input/log.xes --logname log
-```
-### Execute preprocessing through API
 
-There is another way to execute the preprocess component and this is utilizing an API. To that end,
-FastAPI was used. The process allows to upload a log file, modify the environment parameters (that
-describe among others the connection properties to the databases) and execute the preprocessing.
 
-The same parameters used while executing the preprocessing jar can also be set here, as parameters
-in the request.
-
-To deploy the preprocess component with the api run the following command from the root directory:
 ```bash
-docker-compose -f dockerbase/docker-compose-preprocess-with-api.yml up
+docker run --mount type=bind,source="$(pwd)"/experiments/input,target=/app/input \
+preprocess -f /app/input/log.xes --logname log
 ```
-and then access the docs (Swagger) from http://localhost:8000/docs.
 
 ### Complete list of parameters:
 ```