In recent years, the chair has devoted much attention to data analytics, i.e., generating interesting insights from large bodies of data, and to business process management. This relates to the three pillars of KIT: research, education and innovation. While research and education are mainly described elsewhere, details on innovation follow.
KIT Facility Management
KIT has installed smart meters that allow energy-consumption data to be collected systematically. So far, KIT has mainly used the data collected by these meters to bill tenants. However, one would also like to learn more from this data, including regularities and peculiarities of energy consumption within KIT and problems with the energy-provisioning infrastructure. The data is collected by hundreds of meters, each of which has been reporting one or several values to a sink every quarter of an hour for several years. Several values are reported if a meter measures not only electricity but also other kinds of supply, such as heat or water. By analyzing this data, we have been able to identify meters that are not working properly. This is a significant improvement, since meters currently are inspected regularly 'by hand' in order to spot broken ones. Another issue is that high-voltage current is not measured directly; instead, voltage transformers generate a smaller electrical current by means of induction, with a fixed scaling factor. By applying conventional data-mining algorithms to the consumption data, we have detected that this parameter had been set incorrectly for one meter.
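To make the idea of spotting defective meters from the data more concrete, the following minimal sketch flags meters whose quarter-hourly readings do not change over a long period. It is an illustration only and not the analysis actually used at KIT: the function names, the data layout (a dictionary from meter ID to a list of readings) and the threshold of 96 intervals (one day) are assumptions, and a constant reading can of course also be legitimate.

```python
from itertools import groupby


def longest_constant_run(readings):
    """Return the length of the longest run of identical consecutive readings."""
    return max((sum(1 for _ in group) for _, group in groupby(readings)), default=0)


def suspicious_meters(readings_by_meter, max_run=96):
    """Return the IDs of meters whose readings stay constant for more than
    max_run consecutive intervals (assumed default: 96 quarter hours = 1 day)."""
    return sorted(
        meter_id
        for meter_id, readings in readings_by_meter.items()
        if longest_constant_run(readings) > max_run
    )


if __name__ == "__main__":
    # Hypothetical data: meter "A" fluctuates, meter "B" reports the same value.
    data = {"A": [1.0, 2.0, 1.5] * 50, "B": [3.0] * 200}
    print(suspicious_meters(data))
```

Meters flagged in this way would still have to be checked by a person; the sketch only narrows down which ones to inspect first.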
Measurement Data from an Office Building
An office building in Frankfurt is instrumented with sensors that measure various attributes such as temperature or humidity at different positions within the building or even within individual offices, and that also collect information on user behavior, e.g., whether a certain window is currently open. This data is collected and currently analyzed manually by domain experts in order to improve the control component of the climate-control and other infrastructure units. This in turn reduces the overall energy consumption of the building. By analyzing the data with new methods developed at our chair, we have identified correlations between sets of attributes systematically, without manual intervention. Some of these correlations were already known to the individuals familiar with the building infrastructure (thus confirming the validity of our method), while others were new to them. The domain experts are currently trying to understand the new findings in order to reach the goals mentioned earlier.
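As a simple illustration of what searching for correlated attributes means, the sketch below computes the plain Pearson correlation for every pair of sensor series and reports the strongly correlated pairs. This is a deliberately basic stand-in, not the method developed at the chair (which considers sets of attributes); the attribute names, the sample values and the threshold of 0.8 are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(var_x * var_y)


def correlated_pairs(series, threshold=0.8):
    """Return (attribute_a, attribute_b, r) for all pairs with |r| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


if __name__ == "__main__":
    # Hypothetical measurements taken at the same four points in time.
    measurements = {
        "temp": [20, 21, 22, 23],
        "humidity": [60, 58, 56, 54],
        "window_open": [0, 1, 0, 1],
    }
    print(correlated_pairs(measurements))
```

Pairwise checks like this scale quadratically with the number of attributes and miss relationships that only show up among three or more attributes, which is one reason why more elaborate methods are of interest.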
Localizing Defects in Software
Reducing the cost of localizing defects in software continues to be a fundamental problem. Our focus has been on non-terminating bugs that only occur in some executions of the program in question, i.e., defects that are notoriously hard to localize. Our approach is to collect information on program executions that are correct as well as on those that are not, and to compare this data by means of data-mining methods. By doing so, we have been able to localize defects (i.e., identify functions whose code contains the defect) with a high degree of accuracy. Our method not only finds known kinds of defects (i.e., our method is valid), it has also identified a new kind of defect in parallel programs that had not been described in the scientific literature before.
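The general idea of comparing correct and failing executions can be sketched as follows. The example ranks functions by how much more often they are called in failing runs than in correct ones. This simple score is an assumption chosen for illustration and is not the data-mining method developed at the chair; the function names and the representation of a run as a set of called functions are likewise hypothetical.

```python
from collections import Counter


def rank_functions(passing_runs, failing_runs):
    """Rank functions from most to least suspicious.

    Each run is the set of function names called during one execution.
    Score = share of failing runs calling the function
            minus share of passing runs calling it.
    """
    pass_counts = Counter(f for run in passing_runs for f in run)
    fail_counts = Counter(f for run in failing_runs for f in run)
    scores = {}
    for func in set(pass_counts) | set(fail_counts):
        fail_rate = fail_counts[func] / len(failing_runs) if failing_runs else 0.0
        pass_rate = pass_counts[func] / len(passing_runs) if passing_runs else 0.0
        scores[func] = fail_rate - pass_rate
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores, key=lambda func: (-scores[func], func))


if __name__ == "__main__":
    passing = [{"main", "parse"}, {"main", "parse", "save"}]
    failing = [{"main", "parse", "lock"}, {"main", "lock", "save"}]
    print(rank_functions(passing, failing))
```

A developer would then inspect the functions at the top of the ranking first.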
Optimizing Stockkeeping and Pricing for an Online Shop
In order to be successful, an online shop must use sophisticated pricing policies and must analyze the behavior of its customers systematically. We have carried out several analyses for one shop, with different objectives, using data-mining methods. One issue has been the prediction of return shipments of ordered goods. This is crucial for optimal stock planning. Further, we have computed the likelihood that a visitor to an online shop places an order, based on their series of clicks during the visit. A third issue we have addressed successfully is the prediction of quantities of items sold. We have done this mainly based on past sales and prices of items. This has been a prerequisite for the shop to optimize both stock planning and pricing. At the technical level, a problem that we faced in all cases has been that the volume of data ultimately available to us has been relatively small. A core difficulty has therefore been to extract useful information from the data.
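To illustrate what predicting quantities sold from past sales and prices can look like in its simplest form, the sketch below fits a straight line to past price and quantity pairs by ordinary least squares and uses it for a prediction. This is an assumption made for illustration, not the model used in the project; the numbers are invented.

```python
def fit_line(prices, quantities):
    """Ordinary least-squares fit of quantity = slope * price + intercept."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_q = sum(quantities) / n
    sxx = sum((p - mean_p) ** 2 for p in prices)
    sxy = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))
    if sxx == 0:
        raise ValueError("all prices are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_q - slope * mean_p
    return slope, intercept


def predict_quantity(price, slope, intercept):
    """Predicted quantity at the given price, clipped at zero."""
    return max(0.0, slope * price + intercept)


if __name__ == "__main__":
    # Hypothetical past observations for one item.
    past_prices = [10, 12, 14, 16]
    past_quantities = [100, 90, 80, 70]
    slope, intercept = fit_line(past_prices, past_quantities)
    print(predict_quantity(18, slope, intercept))
```

With as little data as described above, a model with few parameters such as this one is less prone to overfitting, but it also ignores effects such as seasonality or promotions.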
For a comprehensive list of relevant lectures, see Education.
There is one unbiased, relevant indicator of the quality of our teaching efforts in this field: As part of the data analytics lab course that takes place each summer semester, a group of students (mainly at Master's level) has participated in the Data Mining Cup (DMC) every year for some years now. The DMC arguably is the most prestigious competition for students in the field worldwide. Assignments typically are prediction tasks based on real data from a company, in the spirit of Optimizing Stockkeeping and Pricing for an Online Shop. The performance of participating teams is measured by means of objective benchmarks. In the past years, more than 100 teams from all over the world have participated, with at most two teams per academic institution. Our teams, which have consisted of different individuals each year, have obtained the following outstanding results:
- 3rd place - coupon effectiveness
- 1st place (Task2) - purchase prediction
- 3rd place - prediction of the number of items sold
- 3rd place - purchase prediction based on web clicks
- 2nd place - prediction of return shipments
- 10th place - prediction of redeemed coupons and of the shopping basket value for new orders in a shop
- 10th place - revenue prediction of a mail-order pharmacy
In addition to home-grown solutions, participants of the lab course have worked with SPSS Modeler, Weka, R, KNIME and, for data-warehousing functionality, Cognos.
Business Process Management (BPM)
Verification of Industrial Testing Processes
Testing in the automotive industry is supposed to guarantee that vehicles are shipped without any flaw. The respective processes are complex, due to the variety of components and electronic devices in modern vehicles. To achieve error-free processes, their formal analysis is required. To execute full-state verification techniques like model checking, the state space of the process needs to be constructed. This state space tends to grow exponentially with the size of the process schema. To address this issue, we have designed a domain-specific reduction technique that reduces this state space without changing the result of the verification. With this reduction, even complex processes with many parallel branches from our project partner from the automotive industry can be verified in less than 10 s. Another innovation is a framework we have developed that makes it easy to specify and maintain the many properties the processes must satisfy. Expert interviews have shown that our framework is indeed user-friendly and well suited to operate in a real production environment.
While this work has been carried out together with a partner from the automotive industry, the techniques designed and evaluated bear great potential for other manufacturing or service-oriented industries.
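The exponential growth of the state space mentioned above can be illustrated with a small calculation. The sketch assumes a strongly simplified process model: independent parallel branches, each a plain sequence of tasks, without synchronization between them. Under this assumption, a branch with k tasks can be in one of k + 1 positions, and the number of global states is the product over all branches. The function name is hypothetical, and the model is not the one used in our verification framework.

```python
from math import prod


def state_count(branch_lengths):
    """Number of global states of independent parallel branches.

    branch_lengths[i] is the number of sequential tasks in branch i; a branch
    with k tasks has k + 1 possible positions (before the first task, after
    each task).
    """
    return prod(k + 1 for k in branch_lengths)


if __name__ == "__main__":
    # Each additional branch of three tasks multiplies the state count by four.
    for branches in (1, 2, 5, 10):
        print(branches, state_count([3] * branches))
```

Ten such branches already yield more than a million states, which shows why reducing the state space before verification pays off.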
Automated Design of Industrial Testing Processes
If testing processes in the automotive industry are designed by hand, they tend to contain bugs, which might lead to damage to the vehicles being tested, incur unnecessary costs etc. To address this, for the specific setting of an automotive manufacturer, we have designed a method that generates the testing processes automatically from a given set of tasks, taking any predefined dependencies between these tasks into account. For this setting, we have demonstrated that (a) the duration of this generation is negligible, and (b) the processes generated automatically tend to be significantly better (here: faster) than processes designed by hand by domain experts.
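A very reduced version of generating a process from tasks and their dependencies is sketched below: tasks are grouped into stages so that every task starts only after all tasks it depends on have finished, and tasks within one stage could run in parallel. The sketch uses the topological sorter from the Python standard library; it is not the generation method described above, and the task names are invented.

```python
from graphlib import TopologicalSorter


def schedule_in_stages(dependencies):
    """Group tasks into stages that respect the given dependencies.

    dependencies maps each task to the set of tasks that must finish first.
    Raises graphlib.CycleError if the dependencies are cyclic.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose predecessors are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    # Hypothetical testing tasks of a vehicle.
    deps = {
        "flash_ecu": set(),
        "check_lights": {"flash_ecu"},
        "check_wipers": {"flash_ecu"},
        "final_report": {"check_lights", "check_wipers"},
    }
    for stage in schedule_in_stages(deps):
        print(stage)
```

A realistic generator would additionally have to consider task durations and limited resources in order to minimize the overall duration of the process.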
Modeling Secure Business Processes
A specification of business processes should not be confined to the arrangement of tasks; it should also state who is allowed to carry out a certain task or to access certain data objects, and under which circumstances. In general, such privileges depend on the current state of the process, i.e., they are context-specific. To illustrate, think of a company that supports other companies in developing the careers of their employees, and employees in developing their skills. An important process in this company collects information on existing skills, validates it and puts it in a repository in a unified format. Here, an example of a security constraint is that certain validations cannot be done by the same person. Another example is to ensure that users are informed when someone has accessed their personal data. We have systematically modeled this process and related processes of the company, together with their security constraints, using a language we have developed ourselves. Such a model is the prerequisite for an efficient implementation (where tools and components developed by us can again be used).
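The constraint that certain validations cannot be done by the same person (often called separation of duty) can be checked on the execution log of a process instance, as the following sketch shows. It is written in plain Python rather than in the modeling language mentioned above, and the task and user names as well as the log format are assumptions.

```python
def violates_separation_of_duty(log, conflicting_tasks):
    """Return True if one user performed more than one of the conflicting tasks.

    log is a list of (task, user) pairs recorded for one process instance.
    """
    tasks_by_user = {}
    for task, user in log:
        if task in conflicting_tasks:
            tasks_by_user.setdefault(user, set()).add(task)
    return any(len(tasks) > 1 for tasks in tasks_by_user.values())


if __name__ == "__main__":
    conflicting = {"validate_1", "validate_2"}
    ok_log = [("collect", "alice"), ("validate_1", "bob"), ("validate_2", "carol")]
    bad_log = [("collect", "alice"), ("validate_1", "bob"), ("validate_2", "bob")]
    print(violates_separation_of_duty(ok_log, conflicting))
    print(violates_separation_of_duty(bad_log, conflicting))
```

Checking a log detects violations only after the fact; enforcing the constraint at run time requires the process engine to know the current state of the instance, which is why such constraints belong in the process model itself.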