Event data are everywhere!
Data are collected about anything, at any time, and at any place.
Operational processes in finance, insurance, government,
healthcare, production, logistics, education, and maintenance are
no exception. A starting point for process mining is the event
data collected by the information systems supporting such
processes.
Data for process mining
Process mining assumes the existence of an event log where each event
refers to a case, an activity, and a point in time. An event log can
be seen as a collection of cases and a case can be seen as a
trace/sequence of events.
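To make the case/trace view concrete, here is a minimal sketch in Python. The event tuples, case identifiers, and the helper name build_traces are invented for illustration and are not taken from any particular tool or data set.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events, each referring to a case, an activity, and a point in time.
events = [
    ("case-2", "register request", "2024-01-01T09:30:00"),
    ("case-1", "register request", "2024-01-01T09:00:00"),
    ("case-1", "check ticket", "2024-01-01T10:00:00"),
    ("case-2", "decide", "2024-01-01T11:00:00"),
    ("case-1", "decide", "2024-01-01T12:00:00"),
]


def build_traces(events):
    """Group events by case and order the events of each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((datetime.fromisoformat(timestamp), activity))
    return {
        case_id: [activity for _, activity in sorted(evts, key=lambda e: e[0])]
        for case_id, evts in by_case.items()
    }


# The event log as a collection of cases, each case being a sequence of activities.
traces = build_traces(events)
```

Under these assumptions, traces maps each case identifier to its ordered list of activities, which is the trace view of the log.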
Event data may come from a wide variety of sources:
- a database system (e.g., patient data in a hospital),
- a comma-separated values (CSV) file or spreadsheet,
- a transaction log (e.g., a trading system),
- a business suite/ERP system (SAP, Oracle, etc.),
- a message log (e.g., from IBM middleware),
- an open API providing data from websites or social media,
- ...
Formats of event data
XES
eXtensible Event Stream (XES) is the standard format for process
mining supported by the majority of process mining tools. XES was
adopted in 2010 by the IEEE Task Force on Process Mining as the
standard format for logging events. It became an official IEEE
standard in 2016.
Currently, there are over 25 commercial process mining tools. The
adoption of process mining has been accelerating in recent years.
Tools like Disco (Fluxicon), Celonis Process Mining, ProcessGold
Enterprise Platform, Minit, myInvenio, Signavio Process Intelligence,
QPR ProcessAnalyzer, LANA Process Mining, Rialto Process, Icris
Process Mining Factory, Worksoft Analyze & Process Mining for SAP, SNP
Business Process Analysis, webMethods Process Performance Manager,
and Perceptive Process Mining are now available. Moreover, open-source
tools like ProM, ProM Lite, and RapidProM are widely used. It is vital
that event data can be exchanged between these tools. Several of these
tools already support XES. For example, it is easy to exchange XES
data between Disco, Celonis, ProM, Rialto Process, Minit, and SNP.
Purpose:
The purpose of this standard is to provide a generally acknowledged
XML format for the interchange of event data between information
systems in many application domains on the one hand and analysis tools
for such data on the other hand. As such, this standard aims to fix
the syntax and the semantics of the event data which, for example, is
being transferred from the site generating this data to the site
analyzing this data. As a result of this standard, if the event data
is transferred using the syntax as described by this standard, its
semantics will be well understood and clear at both sites.
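As a rough illustration of the XML shape, the sketch below reads a tiny hand-written XES-style fragment with Python's standard library. The sample log and the helper name read_xes are assumptions made for this example. Real XES files typically also declare an XML namespace, extensions, and global attributes, all of which this sketch ignores, so it is not a substitute for a proper XES library.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption): traces contain events, and attributes
# are typed key/value elements such as <string> and <date>.
XES_SAMPLE = """<log xes.version="1.0">
  <trace>
    <string key="concept:name" value="case-1"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2024-01-01T09:00:00+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="decide"/>
      <date key="time:timestamp" value="2024-01-01T12:00:00+00:00"/>
    </event>
  </trace>
  <trace>
    <string key="concept:name" value="case-2"/>
    <event>
      <string key="concept:name" value="register request"/>
      <date key="time:timestamp" value="2024-01-01T09:30:00+00:00"/>
    </event>
  </trace>
</log>"""


def read_xes(xml_text):
    """Return (case, activity, timestamp) tuples from a namespace-free XES string."""
    root = ET.fromstring(xml_text)
    rows = []
    for trace in root.iter("trace"):
        # The trace-level concept:name attribute holds the case identifier.
        case_id = None
        for child in trace:
            if child.tag == "string" and child.get("key") == "concept:name":
                case_id = child.get("value")
        for event in trace.findall("event"):
            attrs = {child.get("key"): child.get("value") for child in event}
            rows.append(
                (case_id, attrs.get("concept:name"), attrs.get("time:timestamp"))
            )
    return rows


rows = read_xes(XES_SAMPLE)
```

Each resulting tuple carries the case, the activity, and the timestamp of one event.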
Available data sets in XES:
- Hospital Billing - Event Log
- Sepsis Cases - Event Log
- Road Traffic Fine Management Process
- BPIC 2020 (BPI Challenge 2020)
- Purchase order handling process (BPI Challenge 2019)
- Payment process of Common Agricultural Policy (BPI Challenge 2018)
- Loan application process of a Dutch financial institute (BPI Challenge 2017)
- Municipality log 1 (BPI Challenge 2015)
- Municipality log 2 (BPI Challenge 2015)
- Municipality log 3 (BPI Challenge 2015)
- Municipality log 4 (BPI Challenge 2015)
- Municipality log 5 (BPI Challenge 2015)
- Incident management log (BPI Challenge 2013)
- Problem management log, open problems (BPI Challenge 2013)
- Problem management log, closed problems (BPI Challenge 2013)
- Event log of a loan application process (BPI Challenge 2012)
- Anonymized event log of a Dutch Academic Hospital (BPI Challenge 2011)
Object-Centric Event Logs (OCEL)
Input for process mining is an event log. A traditional event log views a process from a particular angle provided by the case notion that is used to correlate events. Each event in such an event log refers to (1) a particular process instance (called a case), (2) an activity, and (3) a timestamp. There may be additional event attributes referring to resources, people, costs, etc., but these are optional. With some effort, such data can be extracted from any information system supporting operational processes. Process mining uses these event data to answer a variety of process-related questions.
The assumption that there is just one case notion and that each event
refers to precisely one case is problematic in real-life processes.
Therefore, we drop the case notion and assume that an event can be
related to any number of objects. In such an object-centric event log,
we distinguish different object types (e.g., orders, items, packages,
customers, and products). Each event has three types of attributes:
- Mandatory attributes like activity and timestamp.
- Per object type, a set of object references (zero or more per object type).
- Additional attributes (e.g., costs).
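The three kinds of attributes can be pictured with a small Python data structure. This is only a sketch: the class name ObjectCentricEvent, its field names, and the sample values are hypothetical and do not reproduce the OCEL JSON/XML schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ObjectCentricEvent:
    # Mandatory attributes.
    activity: str
    timestamp: datetime
    # Per object type, zero or more object references.
    objects: dict[str, list[str]] = field(default_factory=dict)
    # Additional, optional attributes (e.g., costs).
    attributes: dict[str, object] = field(default_factory=dict)


# Hypothetical event that relates to one order, two items, and one package.
event = ObjectCentricEvent(
    activity="create package",
    timestamp=datetime(2024, 1, 1, 9, 0),
    objects={"orders": ["o1"], "items": ["i1", "i2"], "packages": ["p1"]},
    attributes={"costs": 12.5},
)


def events_for_object(events, object_type, object_id):
    """Select the events that reference a given object of a given type."""
    return [e for e in events if object_id in e.objects.get(object_type, [])]
```

Because an event may reference several objects, the same event can show up when selecting by order, by item, or by package. No single case notion is imposed.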
Purpose:
The purpose of the OCEL standard is to provide a general standard to
interchange object-centric event data with multiple case notions. We
set the following goals for the standard:
- Interoperability: with the provision of the OCEL standard and JSON/XML serializations of OCEL, we want to support a widespread collection of languages and systems.
- Generalization: the standard supports the storage of events, objects, and their attributes. Furthermore, the standard can be extended.
- Provision of a collection of examples: example logs, extracted from information systems supporting some widespread business processes, are provided for the OCEL standard.
- Tool/Library Support: to support the implementation of OCEL in custom applications, tool/library support shall be provided.
CSV
Ideally, event logs are stored in XES, the standard format for process
mining. However, the native format is seldom an event log.
Often, Comma-Separated Values (CSV) files are used as an intermediate
format. The rows in a CSV file correspond to events and the columns to
attributes of events. There should be columns for the case identifier,
the activity name, and the timestamp of an event, but there may be
many more attributes.
ProM and most other process mining tools can convert a CSV file into
an event log by assigning columns to process mining concepts.
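The column-assignment step might look roughly like the following Python sketch. The CSV content, the column names (order_id, step, when, clerk), the timestamp format, and the helper name csv_to_log are all assumptions made for illustration. Actual tools offer their own import dialogs or APIs for this step.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical CSV export: one row per event, one column per attribute.
CSV_TEXT = """order_id,step,when,clerk
A1,create order,2024-01-01 09:00:00,Ann
A2,create order,2024-01-01 09:05:00,Bob
A1,ship order,2024-01-02 14:00:00,Cid
"""

# Assignment of CSV columns to process mining concepts (an assumption).
COLUMN_MAPPING = {"case": "order_id", "activity": "step", "timestamp": "when"}


def csv_to_log(text, mapping):
    """Convert CSV rows into a dict of cases, each a time-ordered list of events."""
    mapped_columns = set(mapping.values())
    log = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        event = {
            "activity": row[mapping["activity"]],
            "timestamp": datetime.strptime(
                row[mapping["timestamp"]], "%Y-%m-%d %H:%M:%S"
            ),
        }
        # Keep every remaining column as an additional event attribute.
        event.update({k: v for k, v in row.items() if k not in mapped_columns})
        log[row[mapping["case"]]].append(event)
    for events in log.values():
        events.sort(key=lambda e: e["timestamp"])
    return dict(log)


log = csv_to_log(CSV_TEXT, COLUMN_MAPPING)
```

Changing COLUMN_MAPPING is enough to view the same file under a different case notion, for example by treating the clerk column as the case identifier.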
Available data sets in CSV:
- Purchase order handling process (BPI Challenge 2019)
- Click-data for the customers that are not logged in to the website (BPI Challenge 2016)
- Click-data for the customers that are logged in to the website (BPI Challenge 2016)
- Questions asked by customers (BPI Challenge 2016)
- Messages sent by customers (BPI Challenge 2016)
- Complaints filed by customers (BPI Challenge 2016)
- Change log (BPI Challenge 2014)
- Incident log (BPI Challenge 2014)
- Interaction log (BPI Challenge 2014)
- Incident activity log (BPI Challenge 2014)
- Incident management log (BPI Challenge 2013)
- Problem management log, open problems (BPI Challenge 2013)
- Problem management log, closed problems (BPI Challenge 2013)
- Anonymized event log of a Dutch Academic Hospital (BPI Challenge 2011)