Cleaning Antipatterns in an SQL Query Log
Source: IEEE Transactions on Knowledge and Data Engineering
This is the supplementary material for the article "Cleaning Antipatterns in an SQL Query Log".
The framework described in the paper defines the workflow of the cleaning process. Its purpose is to analyze query logs: depending on the analysis target, it finds query templates or patterns (series of query templates) within the log, or identifies and resolves antipatterns. The figure to the right shows the respective workflow; rectangular boxes stand for input data, rounded boxes for processing steps, and gray boxes for results. Several results are extracted from the SQL log, which in turn is the only input. This section is meant to give the reader an impression of the potential of our solution and of how to use it; it also describes the parameters of our framework. We now discuss some relevant details.
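To make the notion of a query template concrete: a template (skeleton) is obtained by replacing the literal values of a query with placeholders, so that queries differing only in constants map to the same template. A small illustration (the table and column names as well as the `?` placeholder syntax are ours, chosen for readability):

```sql
-- Two log entries that differ only in a literal:
SELECT name, email FROM customers WHERE id = 17;
SELECT name, email FROM customers WHERE id = 42;

-- Both map to the same template, stored as three skeleton parts:
--   SELECT_TEMPLATE: SELECT name, email
--   FROM_TEMPLATE:   FROM customers
--   WHERE_TEMPLATE:  WHERE id = ?
```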
Supplementary materials
Test dataset
An example input dataset can be downloaded here: DataSet
DB description
All processing takes place within an Oracle DBMS. Hence, we describe our database schema below (not all attributes are listed, only the most important ones).
| Table name | Description |
|---|---|
| T_ORIG_LOG | Contains the original (raw) log |
| PARSED_STATEMENTS | Contains the queries of the original log after parsing, together with their skeleton parts (SELECT, FROM, WHERE) |
| FROM_WHERE_STATEMENTS | Contains query templates with their skeleton parts, frequency, and user popularity |
| FREQUENCY_TABLE | Contains patterns |
| T_TRANSFORMED_LOG | Contains the cleaned query log |
| STATISTICS | Contains summary statistics of the cleaning run |
Table: T_ORIG_LOG

| Column name | Type | Description |
|---|---|---|
| SEQ | NUMBER(10,0) | Unique numeric identifier of a query |
| THETIME | TIMESTAMP(0) | Timestamp when the query was completed |
| STATEMENT | VARCHAR2(4000 BYTE) | Query text |
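For reference, a minimal DDL sketch matching the columns above (the actual CreateDB script may define keys and further constraints):

```sql
CREATE TABLE T_ORIG_LOG (
    SEQ       NUMBER(10,0),         -- unique numeric identifier of a query
    THETIME   TIMESTAMP(0),         -- completion timestamp
    STATEMENT VARCHAR2(4000 BYTE)   -- the query text
);
```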
Table: PARSED_STATEMENTS

| Column name | Type | Description |
|---|---|---|
| SEQ | NUMBER(10,0) | Unique numeric identifier of a query |
| THETIME | TIMESTAMP(0) | Timestamp when the query was completed |
| STATEMENT | VARCHAR2(4000 BYTE) | Query text |
| SELECT_STATEMENT | VARCHAR2(4000 BYTE) | Skeleton form of the SELECT clause |
| FROM_STATEMENT | VARCHAR2(4000 BYTE) | Skeleton form of the FROM clause |
| WHERE_STATEMENT | VARCHAR2(4000 BYTE) | Skeleton form of the WHERE clause |
| WHERE_ST_REAL | VARCHAR2(4000 BYTE) | Original WHERE clause |
| STAT_ID | NUMBER | Reference to ID in table FROM_WHERE_STATEMENTS, the table of query templates |
| CAN_BE_STIFLE | NUMBER | Flag indicating whether the query is a Stifle antipattern candidate |
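Since STAT_ID references FROM_WHERE_STATEMENTS.ID, every parsed query can be joined to its template, for instance to annotate each query with the frequency of its template; a sketch:

```sql
SELECT p.SEQ,
       p.STATEMENT,
       t.COUNT AS TEMPLATE_FREQUENCY  -- how often this template occurs in the log
FROM PARSED_STATEMENTS p
JOIN FROM_WHERE_STATEMENTS t ON t.ID = p.STAT_ID;
```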
Table: FROM_WHERE_STATEMENTS

| Column name | Type | Description |
|---|---|---|
| ID | NUMBER | Identifier of a query template |
| WHERE_TEMPLATE | VARCHAR2(4000 BYTE) | Skeleton form of the WHERE clause |
| FROM_TEMPLATE | VARCHAR2(4000 BYTE) | Skeleton form of the FROM clause |
| SELECT_TEMPLATE | VARCHAR2(4000 BYTE) | Skeleton form of the SELECT clause |
| COUNT | NUMBER | Frequency |
| DISTINCT_IPS_COUNT | NUMBER | User popularity (number of distinct user IPs) |
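The COUNT and DISTINCT_IPS_COUNT columns allow ranking the templates directly, e.g. listing the ten most frequent ones (the ROWNUM idiom works on any Oracle version):

```sql
SELECT *
FROM (SELECT ID, SELECT_TEMPLATE, FROM_TEMPLATE, WHERE_TEMPLATE,
             COUNT, DISTINCT_IPS_COUNT
      FROM FROM_WHERE_STATEMENTS
      ORDER BY COUNT DESC)
WHERE ROWNUM <= 10;  -- top 10 templates by frequency
```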
Table: FREQUENCY_TABLE

| Column name | Type | Description |
|---|---|---|
| FIRST_STATEMENT | NUMBER | Identifier of the first statement in a pattern |
| SECOND_STATEMENT | NUMBER | Identifier of the second statement in a pattern |
| COUNT | NUMBER | Frequency |
| ANTIPATTERNTYPE | NUMBER | Type of antipattern. This is a bit-mask field: 1 = DS-Stifle, 2 = DS-Stifle, 4 = DF-Stifle, 8 = CTH candidate |
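Because ANTIPATTERNTYPE is a bit mask, a single row can carry several types at once; individual bits are tested with Oracle's BITAND function, e.g. to list all DF-Stifle patterns (bit value 4):

```sql
SELECT FIRST_STATEMENT, SECOND_STATEMENT, COUNT
FROM FREQUENCY_TABLE
WHERE BITAND(ANTIPATTERNTYPE, 4) = 4;  -- bit 4 set: DF-Stifle
```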
Table: T_TRANSFORMED_LOG

| Column name | Type | Description |
|---|---|---|
| SEQ | NUMBER(10,0) | Unique numeric identifier of a query |
| THETIME | TIMESTAMP(0) | Timestamp when the query was completed |
| STATEMENT | VARCHAR2(4000 BYTE) | Query text |
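Assuming the SEQ identifiers survive the transformation unchanged (an assumption of this sketch, not something the schema guarantees), the queries dropped by the cleaning can be listed with an anti-join:

```sql
SELECT o.SEQ, o.STATEMENT
FROM T_ORIG_LOG o
LEFT JOIN T_TRANSFORMED_LOG t ON t.SEQ = o.SEQ
WHERE t.SEQ IS NULL;  -- in the original log, but absent after cleaning
```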
Table: STATISTICS

| Column name | Type | Description |
|---|---|---|
| ROWS_BEFORE | NUMBER(10,0) | Number of queries in the original log |
| ROWS_AFTER | NUMBER(10,0) | Number of queries in the transformed log |
| ROWS_MODIFIED | NUMBER(10,0) | Number of queries in the transformed log that were rewritten in order to resolve antipatterns |
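The STATISTICS table summarizes a cleaning run; the relative reduction of the log, for example, is one expression away:

```sql
SELECT ROWS_BEFORE,
       ROWS_AFTER,
       ROWS_MODIFIED,
       ROUND(100 * (ROWS_BEFORE - ROWS_AFTER) / ROWS_BEFORE, 2)
           AS PCT_REMOVED  -- share of queries removed by the cleaning, in percent
FROM STATISTICS;
```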
Scripts
The links to the scripts, together with a short description, are listed below:

| Link | Description |
|---|---|
| CreateDB | Code to create the Oracle database for the framework |
| Run | Code to run the framework |
After creating the database with the CreateDB script, import the data from the sample dataset into table T_ORIG_LOG. To run the framework on your own data, import it into the same table instead. Once the table is initialized with data, use the Run script.
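For a quick manual test of the pipeline, rows can also be inserted into T_ORIG_LOG directly; the table and column names inside the sample statement are made up for illustration:

```sql
INSERT INTO T_ORIG_LOG (SEQ, THETIME, STATEMENT)
VALUES (1,
        TIMESTAMP '2015-06-01 10:15:00',
        'SELECT name FROM users WHERE id = 42');  -- sample log entry
COMMIT;
```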