Contextual Query Language

Polypheny extends the traditional CQL Specification by introducing several new features and adjustments to enhance query detail and clarity. This documentation provides an overview of the custom CQL implementation, including the absence of prefix assignments and search-term-only filters, and the introduction of additional keywords.

Polypheny implements namespaces to function as Context Sets, containing tables, which themselves host columns. The roles of these entities align with the “index” definition in the standard CQL specification. Operations performed on columns are referred to as ‘Filters’, while the outcomes of combined operations (such as Joins or Set Operations) on tables are designated as ‘Relations’.

Information: The integration of CQL into Polypheny was primarily executed as part of Vishal Dalwadi’s Google Summer of Code 2021 project, entitled Support for Contextual Query Language.

Basics

CQL can be executed using the query console provided by the Polypheny-UI. Additionally, Polypheny also supports an HTTP-Interface, which is able to handle CQL queries. After deploying the interface, CQL queries can be submitted via HTTP POST request to the specified port (default: 13137) and the corresponding route (/cql). The query needs to be placed in the body of the POST request. The result is returned as JSON.

Polypheny’s CQL implementation uses fully qualified names instead of indices. For example, column names like public.emps.emp and table names like public.emps.

Syntax

Polypheny’s CQL Implementation consists of four major parts: Filters, Relations, Sort Specifications and Projections. The format of each of these is discussed later. The basic syntax of a query consisting of these four parts is shown below. The parser is case-insensitive when it comes to keywords (such as modifiers, boolean operators or comparison operators) but is case-sensitive when it comes to names and literals (such column names, table names or literal values in filters).

CQL Query:
    (
        Filters
    |
        relation Relation
    |
        Filters relation Relation
    )
    [ sortby SortSpecification ]
    [ project Projection ]

Filters:
    ( Filters ) | Filter BooleanOp Filter | Filter

Relation:
    Table Combiner Relation | Table

SortSpecification:
    Column Modifiers

Projection:
    Column Modifiers

Filter:
    Column ComparisonOp ( Column | Literal value with or without double quotes )

BooleanOp:
    (AND | OR | NOT | PROX)
    Modifiers

Combiner:
    (AND | OR)
    Modifiers

Column:
    NamespaceName.TableName.ColumnName

Modifiers:
    ( Modifier )*

Modifier:
    / ModifierName
    [ ComparisonOp ModifierValue ]

ComparisonOp:
    (= | == | <> | < | > | <= | >= | NamedComparator)
    Modifiers

NamedComparator:
    String containing only alphabets.

Table:
    NamespaceName.TableName

NamespaceName, TableName, ColumnName:
    String containing only alphabets or underscores (_).

ModifierName:
    String without double quotes, white spaces, escaped double quote, parenthesis, =, <, > or /.

ModifierValue:
    String value with or without double quotes.

Filters

Filters are used to do comparisons on a column in the Relation. They are similar to SQL’s WHERE clause. If Relation is specified, the column must be from the relation specified. Comparisons can be between literal values or columns; however, support for comparisons with columns is still underway. Comparisons like equals, not equal, less than, greater than, less than or equals and greater than or equals can be done with support for more (between, any, all, etc.) to be added later. Multiple filters can be separated by AND, OR, NOT or PROX boolean operators; however, support for PROX is still underway. Boolean operators’ precedence depend on their position; i.e. First occurring boolean operator has higher precedence than those occurring later. However, this can be change by using parenthesis.

Relation

Relation is the final table that the query would be executed on. So the relation is a combination of multiple tables. The combination operation can be a join, union, intersection, set-difference, etc; however, implementation of set operations as combiners is still underway. The actual combiner keyword used is AND or OR. For joins, AND means a INNER join, whereas OR means a FULL, LEFT or RIGHT join. The combiner also takes modifiers used to modify its execution. The two modifiers currently supported are: null and on.

The null modifier is to be used with OR combiner to specify which of the rows can be null. Possible values of null modifier are: both (FULL join), right (LEFT join) or left (RIGHT join).

The on modifier is used to specify the columns to join on. It only works if the column(s) belong to both the tables. Possible values of on modifier are: all (finds the common columns between two tables; Default for AND), none (Default for OR), comma-separated list of column names (for example, ‘name,id’).

Sort Specification

Sort specification is used to specify a space separated column list on which to sort the query output. These columns must be projected if a projection clause is specified.

Projection

Projection is used to specify the columns for the result. It is also used for aggregations and grouping.

Example Queries

Consider a namespace “public” with tables “employee” and “dept” defined as follows:

CREATE TABLE public.dept(
deptno TINYINT NOT NULL,
deptname VARCHAR(30) NOT NULL,
PRIMARY KEY (deptno) );

CREATE TABLE public.employee(
empno INTEGER NOT NULL,
empname VARCHAR(20) NOT NULL,
salary REAL NOT NULL,
deptno TINYINT NOT NULL,
married BOOLEAN NOT NULL,
dob DATE NOT NULL,
joining_date DATE NOT NULL,
PRIMARY KEY (empno) );

Then the following CQL queries can be executed on the namespace.

Find employee named “Loki”:

public.employee.empname == "Loki"

Find all employees in the HR department that are married:

public.dept.deptname == "HR" and public.employee.married == TRUE

Find all employees from the HR or IT department:

public.dept.deptname == "HR" or public.dept.deptname == "IT" relation public.employee and public.dept

Find all employees from all departments except HR:

public.employee.empno >= 1 NOT public.dept.deptname == "HR"

Count the number of employees:

relation public.employee project public.employee.empno/count

Get all the employee names sorted by date of birth:

relation public.employee sortby public.employee.dob project public.employee.empname

Count the number of employees in each department:

relation public.employee project public.employee.empno/count public.employee.deptno

Future Directions and Desirable Features

Looking ahead, there are several promising enhancements and features that will further refine the functionality and user experience of Polypheny’s CQL:

  • Combiner’s getCommonColumns Optimization: By creating a cache, the performance of this method can be substantially improved.
  • Column Filters Support: This will provide a more versatile and refined filtering mechanism.
  • Set Operations Support: This will allow for more complex query operations.
  • Proximity Boolean Operator Support: This will enable advanced, proximity-based search operations.
  • Querying the Result of a Query: This will allow for nested querying, leading to more complex and comprehensive data extraction.
  • Modifiers for Sorting, Projection, and Filtering: These will allow for further customization and fine-tuning of data operations.
  • Enhanced Test Coverage: To ensure robustness and reliability of the CQL, comprehensive testing will be essential.
Licensed under CC BY-SA