Commit eb09ec2

Merge remote-tracking branch 'upstream/hotfixes' into release
2 parents 5835610 + f045f12 commit eb09ec2

17 files changed

+7843
-0
lines changed

docs/01_handling_event_data.md

Lines changed: 279 additions & 0 deletions
@@ -0,0 +1,279 @@
Supported/Described Version(s): pm4py 2.7.11.11

This documentation assumes that the reader has a basic understanding of process mining and Python concepts.


# Handling Event Data

## Importing IEEE XES files

IEEE XES is a standard format describing how event logs are stored.
For more information about the format, please study the [IEEE XES Website](http://www.xes-standard.org).
A simple synthetic event log (running-example.xes) can be downloaded from [here](static/assets/examples/running-example.xes).
Note that several real event logs have been made available over the past few years.
You can find them [here](https://data.4tu.nl/search?q=:keyword:%20real%20life%20event%20logs).

The example code below shows how to import an event log, stored in the IEEE XES format, given a file path to the log file.
The code fragment uses the standard importer (iterparse, described in a later paragraph).
Note that IEEE XES event logs are imported into a Pandas dataframe object.

```python
import pm4py

if __name__ == "__main__":
    # Read the XES file at the given path
    log = pm4py.read_xes('tests/input_data/running-example.xes')
```


## Importing CSV files

Apart from the IEEE XES standard, a lot of event logs are actually stored in a [CSV file](https://en.wikipedia.org/wiki/Comma-separated_values).
In general, there are two ways to deal with CSV files in pm4py:

- Import the CSV into a [pandas](https://pandas.pydata.org) [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv).
  In general, most existing algorithms in pm4py are coded to be flexible in terms of their input, i.e., if a certain event log object is provided that is not in the right form, we translate it to the appropriate form for you.
  Hence, after importing a dataframe, most algorithms are directly able to work with the dataframe.
- Convert the CSV into an event log object (similar to the result of the IEEE XES importer presented in the previous section).
  In this case, the first step is to import the CSV file using pandas (similar to the previous bullet) and subsequently convert it to the event log object.
  In the remainder of this section, we briefly highlight how to convert a pandas DataFrame to an event log.
  Note that most algorithms use the same type of conversion, in case a given event data object is not of the right type.

The example code below shows how to convert a CSV file into the pm4py internal event data object types.
By default, the converter converts the dataframe to an Event Log object (i.e., not an Event Stream).

```python
import pandas as pd
import pm4py

if __name__ == "__main__":
    # Load the CSV, declare the case/activity/timestamp columns, then convert
    dataframe = pd.read_csv('tests/input_data/running-example.csv', sep=',')
    dataframe = pm4py.format_dataframe(dataframe, case_id='case:concept:name', activity_key='concept:name', timestamp_key='time:timestamp')
    event_log = pm4py.convert_to_event_log(dataframe)
```

Note that the example code above does not directly work in a lot of cases. Let us consider a very simple example event log, and assume it is stored as a `csv` file:

|CaseID|Activity|Timestamp|clientID|
|---|---|---|---|
|1|register request|20200422T0455|1337|
|2|register request|20200422T0457|1479|
|1|submit payment|20200422T0503|1337|

In this small example table, we observe four columns, i.e., `CaseID`, `Activity`, `Timestamp` and `clientID`.
Clearly, when importing the data and converting it to an Event Log object, we aim to combine all rows (events) with the same value for the `CaseID` column together.
Another interesting phenomenon in the example data is the fourth column, i.e., `clientID`.
In fact, the client ID is an attribute that will not change over the course of the execution of a process instance, i.e., it is a case-level attribute.
pm4py allows us to specify that a column actually describes a case-level attribute (under the assumption that the attribute does not change during the execution of a process).

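To make the notion of a case-level attribute concrete, the following minimal sketch checks which columns of the example table hold exactly one value per case. It uses only the Python standard library and is not how pm4py detects such attributes; the helper name `case_level_columns` is hypothetical and introduced purely for illustration.

```python
import csv
import io
from collections import defaultdict

# The example table from this section, stored as CSV text.
CSV_TEXT = """CaseID,Activity,Timestamp,clientID
1,register request,20200422T0455,1337
2,register request,20200422T0457,1479
1,submit payment,20200422T0503,1337
"""


def case_level_columns(rows, case_column):
    """Return the columns whose value never changes within a case (hypothetical helper)."""
    # For every (case, column) pair, collect the distinct values observed.
    seen = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            if column != case_column:
                seen[(row[case_column], column)].add(value)
    columns = [c for c in rows[0] if c != case_column]
    cases = {row[case_column] for row in rows}
    # A column is case-level if every case shows exactly one value for it.
    return [
        c for c in columns
        if all(len(seen[(case, c)]) == 1 for case in cases)
    ]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
print(case_level_columns(rows, "CaseID"))  # ['clientID']
```

Only `clientID` qualifies here, because case 1 has two different activities and two different timestamps.
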
The example code below shows how to convert the previously exemplified CSV data file.
After loading the CSV file of the example table, we rename the `clientID` column to `case:clientID` (this is a specific operation provided by pandas!).

```python
import pandas as pd
import pm4py

if __name__ == "__main__":
    dataframe = pd.read_csv('tests/input_data/running-example-transformed.csv', sep=',')
    # The 'case:' prefix marks clientID as a case-level attribute
    dataframe = dataframe.rename(columns={'clientID': 'case:clientID'})
    dataframe = pm4py.format_dataframe(dataframe, case_id='CaseID', activity_key='Activity', timestamp_key='Timestamp')
    event_log = pm4py.convert_to_event_log(dataframe)
```


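The `Timestamp` column of the example table uses a compact format (`20200422T0455`). This document does not state how pm4py interprets such values, so one cautious option is to parse them explicitly with pandas before handing the dataframe to pm4py. The sketch below is pandas-only and assumes the format is year, month, day, a literal `T`, hour and minute.

```python
import io

import pandas as pd

# The example table from this section, stored as CSV text.
CSV_TEXT = """CaseID,Activity,Timestamp,clientID
1,register request,20200422T0455,1337
2,register request,20200422T0457,1479
1,submit payment,20200422T0503,1337
"""

dataframe = pd.read_csv(io.StringIO(CSV_TEXT), sep=",")
# Parse the compact timestamps with an explicit format string
# (assumed layout: year, month, day, literal 'T', hour, minute).
dataframe["Timestamp"] = pd.to_datetime(dataframe["Timestamp"], format="%Y%m%dT%H%M")
print(dataframe["Timestamp"].iloc[0])  # 2020-04-22 04:55:00
```

Passing an explicit `format` avoids relying on automatic format detection, which can misread unusual layouts.
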
## Converting Event Data

In this section, we describe how to convert event log objects from one object type to another object type.
There are three objects, which we are able to 'switch' between, i.e., Event Log, Event Stream and Data Frame objects.
Please refer to the previous code snippet for an example of applying log conversion (applied when importing a CSV object).
Finally, note that most algorithms internally use the converters, in order to be able to handle an input event data object of any form.
In such a case, the default parameters are used.
To convert from any object to an event log, the following method can be used:

```python
import pm4py

if __name__ == "__main__":
    # `dataframe` is assumed to be an existing event data object
    event_log = pm4py.convert_to_event_log(dataframe)
```

To convert from any object to an event stream, the following method can be used:

```python
import pm4py

if __name__ == "__main__":
    # `dataframe` is assumed to be an existing event data object
    event_stream = pm4py.convert_to_event_stream(dataframe)
```

To convert from any object to a dataframe, the following method can be used:

```python
import pm4py

if __name__ == "__main__":
    # `log` is assumed to be an existing Event Log or Event Stream object
    dataframe = pm4py.convert_to_dataframe(log)
```


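As a conceptual aid only, the sketch below models the difference between an event stream (a flat sequence of events) and an event log (events grouped into one trace per case) with plain Python lists and dictionaries. These are not pm4py's classes, and the helper names `stream_to_traces` and `traces_to_stream` are hypothetical.

```python
from collections import defaultdict

# An event stream, conceptually: a flat sequence of events in order of occurrence.
event_stream = [
    {"CaseID": "1", "Activity": "register request", "Timestamp": "20200422T0455"},
    {"CaseID": "2", "Activity": "register request", "Timestamp": "20200422T0457"},
    {"CaseID": "1", "Activity": "submit payment", "Timestamp": "20200422T0503"},
]


def stream_to_traces(stream, case_key="CaseID", time_key="Timestamp"):
    """Group a flat list of events into one time-ordered trace per case."""
    traces = defaultdict(list)
    for event in stream:
        traces[event[case_key]].append(event)
    for events in traces.values():
        # The compact timestamps used here sort correctly as plain strings.
        events.sort(key=lambda e: e[time_key])
    return dict(traces)


def traces_to_stream(traces, time_key="Timestamp"):
    """Flatten traces back into a single time-ordered list of events."""
    events = [e for trace in traces.values() for e in trace]
    return sorted(events, key=lambda e: e[time_key])


traces = stream_to_traces(event_stream)
print([e["Activity"] for e in traces["1"]])  # ['register request', 'submit payment']
print(traces_to_stream(traces) == event_stream)  # True
```

The round trip returns the original stream because no event is dropped or altered; only the grouping changes.
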
## Exporting IEEE XES files

Exporting an Event Log object to an IEEE XES file is fairly straightforward in pm4py.
The example code fragment below depicts this functionality.

```python
import pm4py

if __name__ == "__main__":
    # `log` is assumed to be an existing Event Log object
    pm4py.write_xes(log, 'exported.xes')
```

In the example, the `log` object is assumed to be an Event Log object.
The exporter also accepts an Event Stream or DataFrame object as an input.
However, the exporter will first convert the given input object into an Event Log, using the standard parameters for the conversion.
Thus, if the user wants more control, it is advisable to apply the conversion to Event Log prior to exporting.


## Exporting logs to CSV

To export an event log to a `csv` file, pm4py uses Pandas.
Hence, an event log is first converted to a Pandas Data Frame, after which it is written to disk.

```python
import pandas as pd
import pm4py

if __name__ == "__main__":
    # `log` is assumed to be an existing event data object
    dataframe = pm4py.convert_to_dataframe(log)
    dataframe.to_csv('exported.csv')
```

In case an event log object is provided that is not a dataframe, i.e., an Event Log or Event Stream, the conversion is applied using the default parameter values, i.e., as presented in the [Converting Event Data](#item-convert-logs) section.
Note that exporting event data to a CSV file has no parameters.
In case more control over the conversion is needed, please apply a conversion to dataframe first, prior to exporting to CSV.


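Because the export step is an ordinary pandas `DataFrame.to_csv` call, the usual pandas options apply to it. As a minimal sketch on a small hand-built dataframe (not an actual pm4py conversion result): by default pandas writes the row index as an extra unnamed first column, and `index=False` omits it.

```python
import io

import pandas as pd

# A small hand-built dataframe standing in for a converted event log.
dataframe = pd.DataFrame(
    {
        "case:concept:name": ["1", "2", "1"],
        "concept:name": ["register request", "register request", "submit payment"],
        "time:timestamp": pd.to_datetime(
            ["2020-04-22 04:55", "2020-04-22 04:57", "2020-04-22 05:03"]
        ),
    }
)

# Write to in-memory buffers instead of files to keep the sketch self-contained.
with_index = io.StringIO()
dataframe.to_csv(with_index)
without_index = io.StringIO()
dataframe.to_csv(without_index, index=False)

print(with_index.getvalue().splitlines()[0])     # ,case:concept:name,concept:name,time:timestamp
print(without_index.getvalue().splitlines()[0])  # case:concept:name,concept:name,time:timestamp
```

Omitting the index keeps the exported file limited to the event attributes, which avoids an unnamed extra column when the file is read back.
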
## I/O with Other File Types

At this moment, I/O of any format supported by Pandas (dataframes) is implicitly supported.
As long as data can be loaded into a Pandas dataframe, pm4py is reasonably able to work with such files.
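
As one illustration of this idea, the sketch below loads the example table from JSON text into a pandas dataframe. The JSON layout (one object per event) is a hypothetical example and not a format defined by pm4py. Once loaded, the dataframe can be treated like the CSV-based dataframes shown earlier in this document.

```python
import json

import pandas as pd

# Hypothetical JSON export of the example table (one object per event).
JSON_TEXT = """[
  {"CaseID": 1, "Activity": "register request", "Timestamp": "20200422T0455"},
  {"CaseID": 2, "Activity": "register request", "Timestamp": "20200422T0457"},
  {"CaseID": 1, "Activity": "submit payment", "Timestamp": "20200422T0503"}
]"""

# Each JSON object becomes one row; the keys become the column names.
dataframe = pd.DataFrame(json.loads(JSON_TEXT))
print(list(dataframe.columns))  # ['CaseID', 'Activity', 'Timestamp']
print(len(dataframe))           # 3
```
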
