Cleaning Antipatterns in an SQL Query Log

  • Authors:

    Natalia Arzamasova, Martin Schäler, Klemens Böhm

  • Source:

    IEEE Transactions on Knowledge and Data Engineering (2018). DOI: 10.1109/TKDE.2017.2772252

  • Date: 10.11.2017
  • Download PDF

     

    This is the supplementary material for the article "Cleaning Antipatterns in an SQL Query Log".
    The framework described in the paper organizes the cleaning process as a workflow. Its purpose is to analyze query logs: depending on the analysis target, it finds query templates or patterns (series of query templates) within the log, or identifies and resolves antipatterns. The figure to the right shows the respective workflow. Rectangular boxes stand for input data, rounded boxes for processing steps, and gray boxes for results. Several results are extracted from an SQL log; the log itself is the only input. This section is meant to give the reader an impression of the potential of our solution and of how to use it. It also describes the parameters of our framework. We now discuss some relevant details.
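To make the notion of a query template concrete, here is a minimal sketch of the template-extraction idea: replacing literal constants in a query with placeholders yields its skeleton form, so queries that differ only in their constants map to the same template. The regular expressions and function name below are illustrative assumptions, not the framework's actual parser, which is considerably more elaborate.

```python
import re

def skeleton(query: str) -> str:
    """Illustrative template extraction: replace literal constants
    with a '?' placeholder so that queries differing only in their
    constants map to the same query template."""
    # Replace string literals first, then numeric literals.
    q = re.sub(r"'[^']*'", "?", query)
    q = re.sub(r"\b\d+(\.\d+)?\b", "?", q)
    return q

q1 = "SELECT name FROM users WHERE id = 42"
q2 = "SELECT name FROM users WHERE id = 7"
assert skeleton(q1) == skeleton(q2)  # both map to the same template
```

In the schema below, the SELECT_STATEMENT, FROM_STATEMENT, and WHERE_STATEMENT columns hold exactly such skeleton forms, computed per clause.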

Supplementary materials

Test dataset

An example of the input data can be downloaded here: DataSet

DB description

All operations are processed within an Oracle DBMS. Hence, we provide our database schema below (only the most important attributes are listed).

Table name             Description
T_ORIG_LOG             Contains the original (raw) log
PARSED_STATEMENTS      Contains the parsed log together with the skeleton parts of each query (SELECT, FROM, WHERE)
FROM_WHERE_STATEMENTS  Contains query templates with their skeleton parts, frequency, and user popularity
FREQUENCY_TABLE        Contains patterns
T_TRANSFORMED_LOG      Contains the cleaned query log
Table: T_ORIG_LOG

Column name  Type                 Description
SEQ          NUMBER(10,0)         Unique numeric identifier of a query
THETIME      TIMESTAMP(0)         Timestamp when a query was completed
STATEMENT    VARCHAR2(4000 BYTE)  Query

Table: PARSED_STATEMENTS

Column name       Type                 Description
SEQ               NUMBER(10,0)         Unique numeric identifier of a query
THETIME           TIMESTAMP(0)         Timestamp when a query was completed
STATEMENT         VARCHAR2(4000 BYTE)  Query
SELECT_STATEMENT  VARCHAR2(4000 BYTE)  Skeleton form of the SELECT clause
FROM_STATEMENT    VARCHAR2(4000 BYTE)  Skeleton form of the FROM clause
WHERE_STATEMENT   VARCHAR2(4000 BYTE)  Skeleton form of the WHERE clause
WHERE_ST_REAL     VARCHAR2(4000 BYTE)  Original WHERE clause
STAT_ID           NUMBER               Reference to ID in table FROM_WHERE_STATEMENTS, the table of query templates
CAN_BE_STIFLE     NUMBER               Indicator whether a query can be a Stifle antipattern

Table: FROM_WHERE_STATEMENTS

Column name         Type                 Description
ID                  NUMBER               Identifier of a query template
WHERE_TEMPLATE      VARCHAR2(4000 BYTE)  Skeleton form of the WHERE clause
FROM_TEMPLATE       VARCHAR2(4000 BYTE)  Skeleton form of the FROM clause
SELECT_TEMPLATE     VARCHAR2(4000 BYTE)  Skeleton form of the SELECT clause
COUNT               NUMBER               Frequency
DISTINCT_IPS_COUNT  NUMBER               User popularity

Table: FREQUENCY_TABLE

Column name       Type    Description
FIRST_STATEMENT   NUMBER  Identifier of the first statement in a pattern
SECOND_STATEMENT  NUMBER  Identifier of the second statement in a pattern
COUNT             NUMBER  Frequency
ANTIPATTERNTYPE   NUMBER  Type of antipattern. This is a bit-mask field:
                          1 = DS-Stifle, 2 = DS-Stifle, 4 = DF-Stifle, 8 = CTH candidate

Table: T_TRANSFORMED_LOG

Column name  Type                 Description
SEQ          NUMBER(10,0)         Unique numeric identifier of a query
THETIME      TIMESTAMP(0)         Timestamp when a query was completed
STATEMENT    VARCHAR2(4000 BYTE)  Query

Table: STATISTICS

Column name    Type          Description
ROWS_BEFORE    NUMBER(10,0)  Number of queries in the original log
ROWS_AFTER     NUMBER(10,0)  Number of queries in the transformed log
ROWS_MODIFIED  NUMBER(10,0)  Number of queries in the transformed log that were rewritten to resolve antipatterns
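Since ANTIPATTERNTYPE is a bit-mask field, a single row can flag several antipattern types at once. A small sketch of decoding such a mask (the bit values are copied from the table above; the function name is an illustrative assumption):

```python
# Bit values as listed for ANTIPATTERNTYPE above
# (note: 1 and 2 both read "DS-Stifle" in the source table).
FLAGS = {1: "DS-Stifle", 2: "DS-Stifle", 4: "DF-Stifle", 8: "CTH candidate"}

def decode_antipattern_type(mask: int) -> list[str]:
    """Return the antipattern labels whose bits are set in the mask."""
    return [name for bit, name in FLAGS.items() if mask & bit]

decode_antipattern_type(5)  # bits 1 and 4 set: DS-Stifle and DF-Stifle
```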


Scripts

Links to the scripts, together with short descriptions, are listed below:

Link      Description
CreateDB  Code to create the Oracle database for the framework
Run       Code to run the framework

After creating the database with the CreateDB script, import the data from the sample dataset into table T_ORIG_LOG. To run the framework on your own data, import it into the same table instead. Once the table has been populated, execute the Run script.
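The exact import mechanism depends on your environment (e.g. SQL*Loader or an Oracle driver). As a self-contained sketch of what populating the input table amounts to, here is a stand-in using SQLite instead of Oracle; the table and column names mirror T_ORIG_LOG from the schema above, while the sample rows are invented for illustration:

```python
import sqlite3

# Stand-in demonstration with SQLite (the framework itself uses Oracle).
# The column layout mirrors T_ORIG_LOG from the schema above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_ORIG_LOG (SEQ INTEGER, THETIME TEXT, STATEMENT TEXT)")
rows = [
    (1, "2017-11-10 12:00:00", "SELECT name FROM users WHERE id = 42"),
    (2, "2017-11-10 12:00:01", "SELECT name FROM users WHERE id = 7"),
]
conn.executemany("INSERT INTO T_ORIG_LOG VALUES (?, ?, ?)", rows)
count, = conn.execute("SELECT COUNT(*) FROM T_ORIG_LOG").fetchone()
```

With an actual Oracle database, the same INSERT would target the table created by the CreateDB script, after which the Run script processes the log.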