Event logs and models used in the Process Mining book

event-logs-process-mining-book.zip
Chapter 1
chapter_1.zip

The files

  • running-example.csv
  • running-example.xls
  • running-example.xes
  • running-example.mxml

contain the event log shown in Table 1.1 (and Table 1.2). The xes and mxml files can be loaded into ProM and used to discover the process model shown in Figure 1.5 (e.g., using the alpha algorithm).

Sometimes (e.g., in Tables 1.2 and 1.3) short names are used: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, and h = reject request.
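
The short names can be applied mechanically when comparing a log with the compact traces in the tables. The following is a minimal sketch in Python; the mapping is taken from the list above, while the function name `abbreviate` and the example trace are hypothetical and only serve as an illustration.

```python
# Short activity names as listed above (a-h).
SHORT_NAMES = {
    "register request": "a",
    "examine thoroughly": "b",
    "examine casually": "c",
    "check ticket": "d",
    "decide": "e",
    "reinitiate request": "f",
    "pay compensation": "g",
    "reject request": "h",
}


def abbreviate(trace):
    """Map a sequence of full activity names to the compact letter form."""
    return "".join(SHORT_NAMES[name.strip().lower()] for name in trace)


# Hypothetical trace, for illustration only (not copied from Table 1.1).
example = [
    "register request",
    "check ticket",
    "examine casually",
    "decide",
    "reject request",
]
print(abbreviate(example))  # adceh
```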

The files

  • running-example.pnml
  • running-example-v52.pnml
  • running-example-v52.tpn

contain the process model shown in Figure 1.5. The model can be imported into ProM and used for different types of analysis (e.g., verification and conformance checking).

The files

  • running-example-just-two-cases.csv
  • running-example-just-two-cases.xls
  • running-example-just-two-cases.xes
  • running-example-just-two-cases.mxml

contain the event log used to construct the process model in Figure 1.6 (cases 1 and 4).

The files

  • running-example-just-two-cases.pnml
  • running-example-just-two-cases-v52.pnml
  • running-example-just-two-cases-v52.tpn

contain the process model shown in Figure 1.6. The model can be imported into ProM and used for different types of analysis (e.g., verification and conformance checking).

The files

  • running-example-non-conforming.csv
  • running-example-non-conforming.xls
  • running-example-non-conforming.xes
  • running-example-non-conforming.mxml

contain the event log shown in Table 1.3. This log is used to illustrate the notion of conformance checking. For example, a conformance check of this event log and running-example.pnml will show that cases 7, 8 and 10 deviate.

Note that the above files are not representative of real-life event logs and models in terms of size and complexity. However, because of their simplicity, they serve as a nice illustration of the basic concepts.

Chapter 5
chapter_5.zip

The twelve logs named L1-L12 used in Chapter 5 are included. Per event log there are three files:

  • a .txt file describing the event log
  • a .xes file that can be loaded into ProM 6
  • a .mxml file that can be loaded into ProM 6 and earlier versions of ProM (e.g., ProM 5.2)

A short description of these event logs:

  • L1 is the log used to discover the WF-net N1 in Figure 5.1.
  • L2 is the log used to discover the WF-net N2 in Figure 5.2.
  • L3 is the log used to discover the WF-net N3 in Figure 5.5.
  • L4 is the log used to discover the WF-net N4 in Figure 5.6.
  • L5 is the log used to discover the WF-net N5 in Figure 5.8.
  • L6 is the log used to discover the WF-net N6 in Figure 5.9.
  • L7 is the log used to discover the incorrect WF-net N7 in Figure 5.10. The correct WF-net is shown in Figure 5.11.
  • L8 is the log used to discover the WF-net N8 in Figure 5.12. The correct WF-net is shown in Figure 5.13.
  • L9 is a log used to illustrate the limitations of the alpha algorithm (see Figure 5.14).
  • L10 is a log used to illustrate the limitations of the alpha algorithm (see Figure 5.20).
  • L11 is a log used to illustrate the limitations of the alpha algorithm (see Figure 5.21).
  • L12 is a log used to illustrate the dilemma related to infrequent sequences (relates to the choice between N4 and N9).

Note that the log files do not contain meaningful information related to time, resources, etc. They are intended to illustrate the alpha algorithm (and its limitations).
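
Because only the order of activities matters in these logs, the ordering relations on which the alpha algorithm is based can be computed from the traces alone. The sketch below derives the directly-follows counts and the resulting footprint (causality, parallelism, no direct succession). The toy log and all function names are hypothetical; this is not the ProM implementation, and it stops at the footprint rather than constructing a WF-net.

```python
from collections import Counter
from itertools import product


def directly_follows(traces):
    """Count how often activity x is immediately followed by y in a trace."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return counts


def footprint(traces):
    """Return the ordering relation for every ordered pair of activities.

    '->' causality, '<-' reverse causality, '||' parallel,
    '#'  the two activities never directly follow each other.
    """
    df = directly_follows(traces)
    activities = sorted({a for trace in traces for a in trace})
    table = {}
    for x, y in product(activities, repeat=2):
        xy, yx = (x, y) in df, (y, x) in df
        if xy and yx:
            table[(x, y)] = "||"
        elif xy:
            table[(x, y)] = "->"
        elif yx:
            table[(x, y)] = "<-"
        else:
            table[(x, y)] = "#"
    return table


# Hypothetical toy log; each tuple is one trace.
toy_log = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
fp = footprint(toy_log)
print(fp[("a", "b")], fp[("b", "c")], fp[("b", "e")])
```

In this toy log, b and c occur in both orders and are therefore classified as parallel, whereas b and e never follow each other directly.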

Several Petri net models (N1-N11) are also included (see the .pnml files). In some cases both the correct and the incorrect model are included.

The files bigger-example.xes and bigger-example.mxml contain the larger event log shown in Figure 5.24. This event log contains information about 1391 cases. The model discovered by the alpha algorithm is stored in N-bigger-example.pnml. This corresponds to WF-net N1 in Figure 5.24.

Note that event log bigger-example.xes is also used in Chapter 7.

Chapter 6
chapter_6.zip

The files L-heur-1.xes and L-heur-1.mxml contain event log L used in Section 6.2.2. This event log contains 40 cases and 139 events and is used to explain the heuristic mining algorithm.

The files L-heur-2.xes and L-heur-2.mxml contain the same event log but now the first sequence < a,e > is more frequent (50 times rather than 5 times).
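
The effect of the higher frequency can be made concrete with the dependency measure used in heuristic mining. The sketch below is hedged in two ways. First, the trace multiset is an assumption: it is reconstructed from the book and is not listed on this page, although its totals match the 40 cases and 139 events stated above. Second, the measure is the commonly used form, (|x>y| - |y>x|) / (|x>y| + |y>x| + 1), with |x>x| / (|x>x| + 1) for self-loops; the function names are hypothetical.

```python
from collections import Counter

# ASSUMPTION: trace -> frequency, reconstructed; not given on this page.
L_HEUR_1 = {
    "ae": 5,
    "abce": 10,
    "acbe": 10,
    "abe": 1,
    "ace": 1,
    "ade": 10,
    "adde": 2,
    "addde": 1,
}
# Same log, but the first sequence occurs 50 times rather than 5 times.
L_HEUR_2 = dict(L_HEUR_1, ae=50)


def directly_follows(log):
    """Frequency-weighted count of x immediately followed by y."""
    counts = Counter()
    for trace, freq in log.items():
        for x, y in zip(trace, trace[1:]):
            counts[(x, y)] += freq
    return counts


def dependency(df, x, y):
    """Dependency measure between x and y; values close to 1 suggest x causes y."""
    if x == y:
        return df[(x, x)] / (df[(x, x)] + 1)
    return (df[(x, y)] - df[(y, x)]) / (df[(x, y)] + df[(y, x)] + 1)


cases = sum(L_HEUR_1.values())
events = sum(len(trace) * freq for trace, freq in L_HEUR_1.items())
print(cases, events)  # 40 139

for name, log in (("L-heur-1", L_HEUR_1), ("L-heur-2", L_HEUR_2)):
    df = directly_follows(log)
    print(name, round(dependency(df, "a", "e"), 3))
```

Under these assumptions the dependency between a and e rises from 5/6 to 50/51, so a threshold that filters out the direct connection in the first log may keep it in the second.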

One can use these event logs to apply the various process discovery algorithms (not just the heuristics miner but also the genetic miner, alpha miner, fuzzy miner, etc.).

Event log L1 (see files L1.xes and L1.mxml) is used to build the transition systems in Figures 6.12, 6.13, 6.14, and 6.15. One can use ProM’s transition miner to reproduce these results.

The same log (L1.xes/L1.mxml) is used to illustrate state-based regions (see Figure 6.17).

Event log L9 (see files L9.xes and L9.mxml) is used to show that language-based regions can be used to discover the WF-net shown in Figure 6.19. (Use the proper settings to reproduce this using the ILP miner in ProM.)

Note that, again, these event logs contain no timestamps, resources, etc.

Chapter 7
chapter_7.zip

The files Lfull.xes and Lfull.mxml contain the event log described in Table 7.1. The log contains 1391 cases following 21 different traces (7539 events in total).
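
These figures can be checked outside ProM, since XES is plain XML in which each trace element holds event elements whose activity name is stored under the standard key concept:name. The sketch below uses only the Python standard library; the function names and the inline sample document are hypothetical, and no claim is made about any other attributes present in Lfull.xes.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.xes-standard.org/}'."""
    return tag.rsplit("}", 1)[-1]


def read_traces(source):
    """Yield one tuple of activity names per trace in an XES document.

    `source` is a file name or a binary file object.
    """
    root = ET.parse(source).getroot()
    for trace in root.iter():
        if local_name(trace.tag) != "trace":
            continue
        names = []
        for event in trace:
            if local_name(event.tag) != "event":
                continue
            # None is recorded if an event lacks a concept:name attribute.
            name = next(
                (attr.get("value") for attr in event
                 if attr.get("key") == "concept:name"),
                None,
            )
            names.append(name)
        yield tuple(names)


def summarize(traces):
    """Return the number of cases, distinct traces (variants), and events."""
    variants = Counter(traces)
    return {
        "cases": sum(variants.values()),
        "variants": len(variants),
        "events": sum(len(t) * n for t, n in variants.items()),
    }


# Hypothetical three-case sample, used here instead of a real file.
SAMPLE = b"""<log>
  <trace>
    <event><string key="concept:name" value="a"/></event>
    <event><string key="concept:name" value="b"/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="a"/></event>
    <event><string key="concept:name" value="b"/></event>
  </trace>
  <trace>
    <event><string key="concept:name" value="a"/></event>
    <event><string key="concept:name" value="c"/></event>
    <event><string key="concept:name" value="b"/></event>
  </trace>
</log>"""

summary = summarize(read_traces(io.BytesIO(SAMPLE)))
print(summary)
```

Calling summarize(read_traces("Lfull.xes")) on the unpacked file should report 1391 cases, 21 variants, and 7539 events if the file matches the description above.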

The following mapping is used: a = register request, b = examine thoroughly, c = examine casually, d = check ticket, e = decide, f = reinitiate request, g = pay compensation, and h = reject request.

The four WF-nets shown in Figure 7.2 are included in the files N1.pnml, N2.pnml, N3.pnml, and N4.pnml (.tpn files for ProM 5.2 are also included).

One can use the conformance checking in ProM 5.2 to compute the fitness values given in Chapter 7 (use tpn files and mxml logs). Note that the conformance checker also provides other conformance-related metrics (not just fitness).
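
For orientation, fitness based on token replay is commonly computed from four counters collected while replaying the log on the Petri net: produced (p), consumed (c), missing (m), and remaining (r) tokens. The sketch below implements that common definition, 0.5 * (1 - m/c) + 0.5 * (1 - r/p). The function name and the sample numbers are hypothetical, and it is an assumption that this matches exactly what the ProM 5.2 conformance checker reports.

```python
def token_replay_fitness(produced, consumed, missing, remaining):
    """Fitness in [0, 1] from token counts summed over all replayed cases."""
    if produced <= 0 or consumed <= 0:
        raise ValueError("produced and consumed must be positive")
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


# Hypothetical counts: a perfectly fitting replay and one with a deviation.
print(token_replay_fitness(produced=7, consumed=7, missing=0, remaining=0))
print(round(token_replay_fitness(produced=6, consumed=6, missing=1, remaining=1), 3))
```

A replay without missing or remaining tokens yields a fitness of 1; every missing or remaining token lowers the value.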

One can also use a range of conformance checking plug-ins in ProM 6, for example the “Replay log on Petri net” plug-in. This plug-in uses a cost-based approach with penalties for the various discrepancies between event log and model.

Chapter 8
chapter_8.zip

The files described below are not explicitly used in Chapter 8. However, their event logs also include information about resources and time. Therefore, these files can be used to apply the techniques described in Chapter 8.

The files reviewing.xes and reviewing.mxml contain an event log describing the handling of reviews for a journal. The event log consists of 100 cases (papers) and 3730 events. Each paper is sent to three different reviewers. The reviewers are invited to write a report. However, reviewers often do not respond. As a result, it is not always possible to make a decision after a first round of reviewing. If there are not enough reports, then additional reviewers are invited. This process is repeated until a final decision can be made (accept or reject). Note that this example is also used in Chapter 13 to illustrate the need for seamlessly zooming in and out (Figures 13.6 and 13.7 show the discovered models).

Use reviewing.xes/reviewing.mxml to discover the underlying process model (even the alpha algorithm finds a good model; see, for example, reviewing.pnml/reviewing.tpn). Since the log also contains originators, timestamps, etc., all the process mining techniques described in the book can be applied to it. Start by applying the dotted chart analysis. Then use the various discovery techniques. Also discover the social network and organizational structures. Using replay, check the conformance and locate the bottlenecks.
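
One simple way to derive a social network from such a log is to count handovers of work: how often one resource performs an event and a different resource performs the next event in the same case. The sketch below assumes the cases have already been read into lists of (activity, resource) pairs; in XES the resource is normally stored under the standard key org:resource. The activity names, resource names, and function name are hypothetical.

```python
from collections import Counter


def handover_of_work(cases):
    """Count direct handovers between different resources within each case."""
    handovers = Counter()
    for events in cases:
        resources = [resource for _activity, resource in events]
        for giver, receiver in zip(resources, resources[1:]):
            if giver != receiver:
                handovers[(giver, receiver)] += 1
    return handovers


# Hypothetical cases; each event is an (activity, resource) pair.
toy_cases = [
    [("invite reviewers", "r1"), ("get review", "r2"), ("decide", "r3")],
    [("invite reviewers", "r1"), ("get review", "r2"),
     ("get review", "r2"), ("decide", "r3")],
]
network = handover_of_work(toy_cases)
for (giver, receiver), count in sorted(network.items()):
    print(giver, "->", receiver, count)
```

The resulting counts can be read as weighted arcs of a directed graph over resources. Consecutive events by the same resource are ignored here, which is a design choice rather than the only possible definition.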

The files teleclaim.xes and teleclaims.mxml contain an event log describing the handling of claims in an insurance company. The log contains 46138 events related to 3512 cases (claims). The process deals with the handling of inbound phone calls, whereby different types of insurance claims (household, car, etc.) are lodged over the phone. The process is supported by two separate call centers operating for two different organizational entities (Brisbane and Sydney). Both centers are similar in terms of incoming call volume and average total call handling time, but differ in the way call center agents are deployed, in the underlying IT systems, etc. After the initial steps in the call center, the remainder of the process is handled by the back office of the insurance company. Although this is a synthetic event log without noise, it is difficult to mine: the alpha algorithm fails to extract the right model. The event log also contains information about resources and has transactional information. Therefore, it can be used to apply the techniques discussed in Chapter 8.

Use teleclaim.xes/teleclaims.mxml to extract different process models. First apply the dotted chart analysis to understand the event log. Then use the various discovery techniques. Try to understand why most algorithms fail to discover a good process model. Also discover the social network and organizational structures. Using replay, check the conformance and locate the bottlenecks. See file teleclaim.pnml/teleclaims.tpn for an example process model.

The files repairexample.xes / repairexample.mxml and repairexamplesample2.xes / repairexamplesample2.mxml are taken from the ProM Framework Tutorial (see www.processmining.org). We refer to this tutorial for details. These logs can be used to apply most of the techniques discussed in Chapter 8.

For more example event logs we refer to www.processmining.org and http://data.3tu.nl/repository/collection:event_logs. For example, see doi:10.4121/uuid:d9769f3d-0ab0-4fb8-803b-0d1120ffcf54 for a real-life log of a Dutch academic hospital. This event log was used for the first Business Process Intelligence Contest (BPIC 2011).