Commit 4382435

Merge pull request #19 from mavroudo/v2
V2
2 parents bacd6d0 + 63a2a4a commit 4382435

File tree

10 files changed: +172 −218 lines changed

README.md

Lines changed: 37 additions & 35 deletions
````diff
@@ -45,39 +45,42 @@ a new network using the following command:
    docker network create --driver=bridge siesta-net
    ```

-2. **Deploy database:** From the root directory execute the following commands:
+2. **Deploy the infrastructure:** From the root directory execute the following command:
    ```bash
-   docker-compose -f dockerbase/docker-compose-s3.yml up -d
-   ```
-   for the S3
+   docker-compose up -d
+   ```
+   This will deploy the entire SIESTA infrastructure, which includes the Preprocessing Component
+   (integrated with a Python REST API implemented in FastAPI), the Query Processor, and the User Interface.
+   Additionally, it deploys a database for index storage. Currently, MinIO (S3) is active, while Cassandra is commented out.
+   You can switch between these two, or use environment variables to set up a connection with another database.

-   ```bash
-   docker-compose -f dockerbase/docker-compose-cassandra.yml up -d
-   ```
-   for Cassandra. The database will be deployed locally, opening the default ports, and then it will be
-   detached.
+   Before executing the preprocessing with S3, note that you must create a new bucket named **siesta**.
+   You can access the different services from the following endpoints:
+
+   - FastAPI: http://localhost:8000/docs
+   - S3: http://localhost:9000 (default username/password: minioadmin/minioadmin)

-   If you decide to use S3, you have to create a new bucket named "**siesta**" before proceeding to the next step. To do that,
-   log in to http://localhost:9000 using **minioadmin** for both username and password (the MinIO default). Click on
-   **Buckets** on the left and then press **Create Bucket**. Use the default settings.

-3. **Build Docker image:** From the root directory run the following command:
+### Build the preprocess component separately
+1. **Build Docker image:** From the root directory run the following command:
    ```bash
    docker build -t preprocess -f dockerbase/Dockerfile .
    ```
    This will download all the dependencies, build the jar file and finally download the Spark component. The image is now
    ready to be executed.

-4. **Run image:** After the image is built, it can be run with
+2. **Deploy a database:** From the root directory, run ```docker-compose up -d minio``` to deploy S3,
+   or ```docker-compose up -d cassandra``` to deploy Cassandra.
+
+3. **Run image:** if S3 is utilized
    ```bash
    docker run --network siesta-net preprocess
    ```
-   if S3 is utilized, or
+   if Cassandra is utilized
    ```bash
    docker run --network siesta-net preprocess -d cassandra
    ```
-   for Cassandra. The default execution will generate 200 synthetic traces,
+   The default execution will generate 200 synthetic traces,
    using 10 different event types, and lengths that vary from 10 to 90 events. The inverted indices will be stored
    using "test" as the logname.
````
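The two run commands in the hunk above differ only in the database argument. As a quick sketch (the helper function is hypothetical, not part of the repository), the choice can be wrapped like this:

```shell
# Hypothetical helper (not repo code): print the docker run command
# for the chosen index database, defaulting to S3 as the README does.
siesta_run_cmd() {
  db="${1:-s3}"
  cmd="docker run --network siesta-net preprocess"
  if [ "$db" = "cassandra" ]; then
    cmd="$cmd -d cassandra"
  fi
  printf '%s\n' "$cmd"
}

siesta_run_cmd            # S3 (default)
siesta_run_cmd cassandra  # Cassandra backend
```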

````diff
@@ -86,43 +89,42 @@ Connecting to already deployed databases or utilizing a spark cluster can be eas
 of parameters. The only thing that you should make sure of is that their urls are accessible
 by the docker container. This can be done by either making the url publicly available or by connecting the
 docker container to the same network (as done above with the siesta-net).
-- **Connect with spark cluster:** Change the value of the "**--master**" parameter in the ENTRYPOINT of the
-  Dockerfile from "**local[*]**" to the resource manager's url.
+- **Connect with spark cluster (with the api):** Change the value of the Spark master parameter before submitting the
+  preprocess job from "**local[*]**" to the resource manager's url.
+- **Connect with spark cluster (standalone):** Change the value of the "**--master**" parameter in the ENTRYPOINT of the
+  Dockerfile from "**local[*]**" to the resource manager's url. Then build the image again before executing it.
 - **Connect with Cassandra:** Change the values of the environmental parameters that start with **cassandra_**.
   These parameters include the contact point and the credentials required to achieve connection.
 - **Connect with S3:** Change the values of the environmental parameters that start with **s3**.
   These parameters include the contact point and the credentials required to achieve connection.

-At the end build the image again before executing it.
+

 ### Executing preprocess for a provided logfile
 Till now the supported file extensions are "**.xes**", which is the default format for Business Process
 Management logfiles, and "**.withTimestamp**", which is a generic file format generated for testing. A new
 connector can be easily implemented in _auth.datalab.siesta.BusinessLogic.IngestData.ReadLogFile_.

-In order to execute the preprocess for a provided logfile you need to take 2 steps. First ensure that the
+You can either submit a file to be preprocessed through the User Interface (Preprocessing tab), through the FastAPI docs,
+or in the standalone format. For the last one you need to take 2 steps.
+First ensure that the
 logfile is visible inside the docker container and second execute the preprocessing with the appropriate
 parameters. Therefore, place the logfile you want to preprocess inside the _experiments/input_ directory.
 Assuming that the logfile is named "log.xes" and the indices should have the name "log", run the following
 command from the root directory:
-```bash
-docker run --mount type=bind,source="$(pwd)"/experiments/input,target=/app/input \
-preprocess -f /input/log.xes --logname log
-```
-### Execute preprocessing through API

-There is another way to execute the preprocess component and this is by utilizing an API. To that end,
-FastAPI was used. The process allows the user to upload a log file, modify the environmental parameters (that
-describe, among others, the connection properties to the databases) and execute the preprocessing.
+You can submit a file for preprocessing through the User Interface (under the Preprocessing tab),
+via the FastAPI docs, or in standalone format. For the latter, two steps are required.
+First, ensure that the logfile is visible inside the Docker container.
+Second, execute the preprocessing with the appropriate parameters.
+To do this, place the logfile you wish to preprocess inside the _experiments/input_ directory.
+Assuming that the logfile is named log.xes and the indices should be named log,
+run the following command from the root directory:

-The same parameters used while executing the preprocessing jar can also be set here, as parameters
-in the request.
-
-To deploy the preprocess component with the api run the following command from the root directory:
 ```bash
-docker-compose -f dockerbase/docker-compose-preprocess-with-api.yml up
+docker run --mount type=bind,source="$(pwd)"/experiments/input,target=/app/input \
+  preprocess -f /app/input/log.xes --logname log
 ```
-and then access the docs (Swagger) from http://localhost:8000/docs.

 ### Complete list of parameters:
 ```
````
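The cassandra_* and s3* connection parameters in the hunk above are plain environment variables, so they can also be overridden per run with Docker's `-e` flag. A small sketch (the helper function is hypothetical; the endpoint and credential values are only examples):

```shell
# Hypothetical sketch (not a repo script): turn KEY=VALUE pairs into `-e` flags
# for overriding the connection parameters (s3*, cassandra_*) at run time.
build_docker_run() {
  args="docker run --network siesta-net"
  for kv in "$@"; do
    args="$args -e $kv"
  done
  printf '%s preprocess\n' "$args"
}

# Example: point the preprocess container at a different S3 endpoint.
build_docker_run s3endPointLoc=http://minio:9000 s3accessKeyAws=minioadmin
```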

config.json

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+{
+  "BASE_URL": "http://localhost:8090",
+  "PREPROCESS_BASE_URL": "http://localhost:8000"
+}
```
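These two endpoints tell the UI where to find the Query Processor (port 8090) and the preprocessing API (port 8000). The commented-out volumes hint in docker-compose.yml shows how a customized copy can be supplied to the UI container; a sketch of that mount (image name and paths taken from the compose file in this commit):

```yaml
# Sketch: serve a customized config.json to the UI container
# (mirrors the commented-out volumes hint in docker-compose.yml).
ui:
  image: mavroudo/siesta-ui:2
  ports:
    - "80:80"
  volumes:
    - ./config.json:/usr/share/nginx/html/config.json
```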

docker-compose.yml

Lines changed: 106 additions & 0 deletions
New file:

```yaml
version: '3.7'
services:
  preprocess:
    image: mavroudo/siesta-preprocess:2
    ports:
      - "8000:8000"
    networks:
      - siesta-net
    volumes:
      - preprocesses:/app/pythonAPI/dbSQL
    environment:
      # for cassandra
      cassandra_host: cassandra
      cassandra_port: 9042
      cassandra_user: cassandra
      cassandra_pass: cassandra
      cassandra_keyspace_name: siesta
      cassandra_replication_class: SimpleStrategy
      cassandra_replication_rack: replication_factor
      cassandra_replication_factor: 3
      cassandra_write_consistency_level: ONE
      cassandra_gc_grace_seconds: 864000
      # for s3 (minio)
      s3accessKeyAws: minioadmin
      s3ConnectionTimeout: 600000
      s3endPointLoc: http://minio:9000
      s3secretKeyAws: minioadmin

  query:
    image: mavroudo/siesta-query:2
    environment:
      master.uri: local[4] # or local[*]
      database: s3 # cassandra-rdd or s3
      # for s3 (minio)
      s3.endpoint: http://minio:9000
      s3.user: minioadmin
      s3.key: minioadmin
      s3.timetout: 600000
      # for cassandra
      cassandra.max_requests_per_local_connection: 32768
      cassandra.max_requests_per_remote_connection: 22000
      cassandra.connections_per_host: 1000
      cassandra.max_queue_size: 1024
      cassandra.connection_timeout: 30000
      cassandra.read_timeout: 30000
      spring.data.cassandra.contact-points: cassandra
      spring.data.cassandra.port: 9042
      spring.data.cassandra.user: cassandra
      spring.data.cassandra.password: cassandra
      server.port: 8090 # port of the application
    volumes:
      - ./build:/root/.m2
    ports:
      - '8090:8090'
    networks:
      - siesta-net

  ui:
    image: mavroudo/siesta-ui:2
    ports:
      - "80:80"
    # if you want to modify the base url (e.g. for setting it to a server), add and modify the config.json file
    # volumes:
    #   - ./config.json:/usr/share/nginx/html/config.json

  minio:
    container_name: minio
    image: minio/minio:RELEASE.2023-11-01T01-57-10Z
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_storage:/data
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    command: server --console-address ":9001" /data
    networks:
      - siesta-net

#  cassandra:
#    image: 'cassandra:4.0'
#    container_name: cassandra
#    ports:
#      - '7000:7000'
#      - '9042:9042'
#    volumes:
#      - './cassandra/data:/var/lib/cassandra'
#    environment:
#      PROJECT_NAME: siesta
#      CASSANDRA_SEEDS: cassandra
#      CASSANDRA_PASSWORD_SEEDER: yes
#      CASSANDRA_PASSWORD: cassandra
#    networks:
#      - siesta-net

networks:
  siesta-net:
    name: siesta-net
    external: true

volumes:
  minio_storage: {}
  preprocesses: {}
```
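Since the Cassandra service ships commented out, switching backends amounts to uncommenting it and pointing the query service's `database` setting at `cassandra-rdd` (the alternative value named in its own comment). For reference, the uncommented service reads:

```yaml
# The cassandra service from docker-compose.yml with the comment markers removed;
# also set `database: cassandra-rdd` on the query service to use it.
cassandra:
  image: 'cassandra:4.0'
  container_name: cassandra
  ports:
    - '7000:7000'
    - '9042:9042'
  volumes:
    - './cassandra/data:/var/lib/cassandra'
  environment:
    PROJECT_NAME: siesta
    CASSANDRA_SEEDS: cassandra
    CASSANDRA_PASSWORD_SEEDER: yes
    CASSANDRA_PASSWORD: cassandra
  networks:
    - siesta-net
```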

dockerbase/Dockerfile

Lines changed: 2 additions & 2 deletions
```diff
@@ -36,9 +36,9 @@ ENV cassandra_write_consistency_level=ONE
 ENV cassandra_gc_grace_seconds=864000
 ENV s3accessKeyAws=minioadmin
 ENV s3ConnectionTimeout=600000
-ENV s3endPointLoc=http://minio-contact:9000
+ENV s3endPointLoc=http://minio:9000
 ENV s3secretKeyAws=minioadmin


 ENTRYPOINT ["/opt/spark/bin/spark-submit","--master","local[*]","preprocess.jar"]
-CMD ["--logname","test","--delete_prev"]
+CMD ["--logname","test","--delete_prev"]
```
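The ENTRYPOINT/CMD split above means the `--logname test --delete_prev` defaults are replaced entirely by whatever arguments follow the image name in `docker run`, while the spark-submit ENTRYPOINT is always kept. A plain-shell sketch of that Docker behaviour (illustration only, not repo code):

```shell
# Sketch of Docker's ENTRYPOINT/CMD argument handling (illustration only):
# CMD provides default arguments appended to ENTRYPOINT; arguments given to
# `docker run image ...` replace CMD entirely, ENTRYPOINT stays fixed.
entrypoint='spark-submit --master local[*] preprocess.jar'
default_cmd='--logname test --delete_prev'

effective_command() {
  if [ "$#" -gt 0 ]; then
    printf '%s %s\n' "$entrypoint" "$*"
  else
    printf '%s %s\n' "$entrypoint" "$default_cmd"
  fi
}

effective_command                     # default run
effective_command --logname mylog     # user arguments replace CMD
```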

dockerbase/docker-compose-cassandra.yml

Lines changed: 0 additions & 19 deletions
This file was deleted.

dockerbase/docker-compose-preprocess-with-api.yml

Lines changed: 0 additions & 21 deletions
This file was deleted.

dockerbase/docker-compose-s3.yml

Lines changed: 0 additions & 26 deletions
This file was deleted.
