CREATE TABLE

The CREATE TABLE statement is used to create a new table in a specific namespace. The table is the central object of a relational schema and contains data in rows and columns.

Syntax

The syntax for the CREATE TABLE statement is:

CREATE TABLE [ IF NOT EXISTS ] table_name (
    column_definition,
    column_definition,
    column_definition,
    ....
    constraint_definition,
    constraint_definition,
    ....
)
[ AS query ]
[ ON STORE store_name ]
[ PARTITION BY partitioning_information ]

In this syntax:

  • IF NOT EXISTS is optional. If included, a table with the given name will not be created if it already exists.
  • table_name is the name of the table you wish to create.
  • column_definition defines a column in the table. The syntax is described in its own section.
  • constraint_definition defines the constraints imposed on the data stored in the table. The syntax is described in its own section.
  • AS query is optional and specifies a query that provides the rows to populate the new table.
  • ON STORE store_name is optional. It specifies on which store the table should be created. If omitted, Polypheny selects the data store(s) on which the table is physically stored.
  • The optional PARTITION BY clause specifies a method of horizontally partitioning the table. The syntax is described in its own section.

Please note:

  • You need the necessary privileges to create a table.
  • The table names must be unique within the namespace.

Column Definition

In the CREATE TABLE statement, a column is defined by a column definition, which provides the name, the data type, and other attributes for the column. Here’s the general form:

column_name data_type [ NULL | NOT NULL ] [ DEFAULT expression ] 
[ GENERATED ALWAYS AS ( expression ) [ VIRTUAL | STORED ]]
[ COLLATE { CASE SENSITIVE | CASE INSENSITIVE }]

Details of each element are as follows:

  • column_name: The name of the column. Each column in a table should have a unique name.
  • data_type: The data type of the column.
  • NULL or NOT NULL: These are optional and specify whether the column can accept null values. By default, columns are allowed to contain null values. If NOT NULL is specified, the column cannot contain a null value.
  • DEFAULT expression: This is optional and sets a default value for the column when no value is specified at the time of row insertion. The default expression must match the data type of the column.
  • GENERATED ALWAYS AS (expression) [VIRTUAL | STORED]: This is optional and specifies a generated column. A generated column is a column that is computed from an expression that can use other columns in the same table. The keyword ALWAYS is required. The optional VIRTUAL or STORED keyword indicates how the generated column’s values are computed or stored.
  • COLLATE CASE SENSITIVE or COLLATE CASE INSENSITIVE: This is optional and specifies the collation of the column. CASE SENSITIVE means that the column distinguishes between uppercase and lowercase characters. CASE INSENSITIVE means that the column does not distinguish between uppercase and lowercase characters.
GENERATED columns are not yet supported in the current version of Polypheny. Stay tuned!

Here’s an example of a CREATE TABLE statement with various column definitions:

CREATE TABLE employees (
    id INTEGER NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    birth_date DATE NULL,
    salary_level INTEGER DEFAULT 1,
    salary REAL GENERATED ALWAYS AS (hours_worked * hourly_rate) STORED,
    employee_name VARCHAR(101) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) VIRTUAL,
    email VARCHAR(100) COLLATE CASE INSENSITIVE,
    PRIMARY KEY (id)
);

This example creates a table named ‘employees’ with the following columns:

  • id: An INTEGER data type, this field cannot contain null values (NOT NULL). Typically, id is used as a unique identifier for each record in the table. See next section on how to define a primary key constraint.
  • first_name: A VARCHAR(50) data type, this field represents the first name of an employee and cannot contain null values (NOT NULL).
  • last_name: A VARCHAR(50) data type, this field represents the last name of an employee and also cannot contain null values (NOT NULL).
  • birth_date: A DATE data type, this field stores the birth date of an employee. It’s allowed to contain null values (NULL).
  • salary_level: An INTEGER data type, this field denotes the salary level of an employee. It’s set to 1 by default (DEFAULT 1), which means if no value is specified, 1 will be used.
  • salary: A REAL data type, this field is computed based on hours_worked and hourly_rate. It’s always calculated as hours_worked * hourly_rate and the result is stored in the database (STORED). Note, hours_worked and hourly_rate should be other fields in the table.
  • employee_name: A VARCHAR(101) data type, this field is computed from first_name and last_name. It’s always formed by concatenating first_name and last_name with a space in between (CONCAT(first_name, ' ', last_name)). The result is not stored in the database (VIRTUAL), which means it is calculated on the fly whenever it is queried.
  • email: A VARCHAR(100) data type, this field contains the email of the employee. The COLLATE CASE INSENSITIVE directive specifies that any string comparison on this field is case insensitive. This means that John.Doe@example.com and john.doe@example.com would be considered the same.

Constraints Definition

Constraints are utilized to enforce rules on the data within a table. Their purpose is to constrain the type of data that can be stored in a table, ensuring the accuracy and reliability of the data within the table.

Polypheny supports thy following kinds of constraints:

  • PRIMARY KEY Constraint: This serves as a unique identifier for each entry in a database table. Primary keys must hold unique values and cannot contain null values. They can consist of one or more columns; when more than one column is used, it is commonly referred to as a composite primary key. Each table can only have one primary key.
  • FOREIGN KEY Constraint: This constraint establishes relationships between tables and prohibits any actions that would disrupt these connections. A foreign key in one table corresponds to a primary key in another table.
  • UNIQUE Constraint: This constraint guarantees that all column values are distinct. It can also be applied across a set of columns, in which case the combination of values within the set must be unique. An unlimited number of unique constraints can be established. In contrast to primary keys, columns under a unique constraint can contain NULL values. However, each NULL is deemed different, with multiple NULLs being seen as unique in relation to one another.
  • CHECK Constraint: This constraint confirms that all values in a column meet specified conditions. An unlimited number of check constraints can be created.
CHECK constraints are not yet supported in the current version of Polypheny. Stay tuned!

These constraints can be added or removed later using the ALTER TABLE statement. The following example illustrates the syntax for defining constraints as part of the create table statement:

CREATE TABLE employees (
    id INTEGER NOT NULL,
    email VARCHAR(50) NOT NULL,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    birth_date DATE,
    manager_id INTEGER,
    PRIMARY KEY(id),
    UNIQUE (email),
    CHECK (birth_date < CURRENT_DATE),
    FOREIGN KEY (manager_id) REFERENCES employees(id)
);

In this example:

  • id is a primary key and cannot contain null values.
  • email cannot contain null values and must be unique.
  • birth_date must be a date prior to the current date.
  • department_id is a foreign key that refers to the id field of the departments table.
  • manager_id is a foreign key that refers to the id field of the same employees table.

To define names for the individual constraints, the definition can be prefixed by the keyword CONSTRAINT and a name:

CREATE TABLE employees (
 id INTEGER NOT NULL,
 email VARCHAR(50) NOT NULL,
 first_name VARCHAR(50),
 last_name VARCHAR(50),
 birth_date DATE,
 manager_id INTEGER,
 CONSTRAINT cstr_primary PRIMARY KEY(id),
 CONSTRAINT cstr_email_unique UNIQUE(email), 
 CONSTRAINT cstr_bday CHECK(birth_date < CURRENT_DATE),
 CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES employees(id)
);

Horizontal Partitioning

Horizontal partitioning allows to split a table into multiple chunks of rows. It can improve query performance and aid in managing large data sets. In a CREATE TABLE statement, partitioning is specified using the PARTITION BY clause. See Data Partitioning to learn more on data partitioning in Polypheny.

In Polypheny, three methods of partitioning are supported:

  1. Basic partitioning defined by an identifier.
  2. Range-based partitioning.
  3. Temperature-based partitioning.

Syntax:

PARTITION BY { identifier | RANGE | TEMPERATURE (partition_column) }
    (
        PARTITION partition_name VALUES (value) [, ...]
    ) [, ...]
    [USING FREQUENCY 
        {
            ALL | WRITE | READ 
        } INTERVAL interval_value interval_unit
        WITH num_partitions identifier PARTITIONS
    ]

Basic Partitioning

You specify the partition column, and optionally the number of partition groups and their names.

Example:

PARTITION BY id

Range-Based Partitioning

Range-based partitioning allows you to specify partition boundaries based on column values.

Example:

PARTITION BY RANGE (date) (
    PARTITION p0 VALUES (2020-01-01),
    PARTITION p1 VALUES (2020-07-01),
    PARTITION p2 VALUES (2021-01-01)
)

Temperature-Based Partitioning

Temperature-based partitioning enables you to define partitions based on access patterns and frequency of the data. Hot data (frequently accessed) is stored separately from cold data (infrequently accessed), improving overall performance.

The syntax specifies the access pattern (ALL, WRITE, READ), the interval of data access frequency, the number of partitions, and the internal partition function.

Example:

PARTITION BY TEMPERATURE (date) (
    PARTITION p0 VALUES (10 PERCENT_REMAINDER),
    PARTITION p1 VALUES (90 PERCENT_REMAINDER)
)
USING FREQUENCY ALL INTERVAL 5 MINUTE
WITH 2 RANGE PARTITIONS

In this example, the date column is used for partitioning. Two partitions (p0 and p1) are defined, where p0 contains the hot (frequently accessed) 10% of the data and p1 contains the remaining 90% of the data. The USING FREQUENCY clause specifies that the frequency of data access is measured across all types of operations (ALL) and updated every 5 minutes. The WITH clause specifies that there will be 2 range partitions.

Examples

Here are a few examples showing the use of the CREATE TABLE statement:

Example 1: Simple table creation

CREATE TABLE students (
    id INTEGER NOT NULL,
    name VARCHAR(50) NOT NULL,
    age INTEGER DEFAULT 18,
    gpa REAL NOT NULL,
    credits INT DEFAULT 0,
    gradePoints REAL DEFAULT 0
);

This statement creates a new table named ‘students’. It includes columns for id, name, age, gpa, credits, and gradePoints.

Example 2: Table creation with query

CREATE TABLE highAchievers AS
SELECT id, name FROM students WHERE gpa >= 3.5;

This statement creates a new table named ‘highAchievers’ and populates it with rows from the ‘students’ table where the gpa is 3.5 or higher.

Example 3: Table creation with partitioning

CREATE TABLE students (
    id INTEGER NOT NULL,
    name VARCHAR(50) NOT NULL,
    age INTEGER DEFAULT 18,
    gpa REAL NOT NULL,
    credits INTEGER DEFAULT 0,
    gradePoints REAL DEFAULT 0,
    PRIMARY KEY (id)
)
PARTITION BY RANGE (age) (
    PARTITION p0 VALUES LESS THAN (20),
    PARTITION p1 VALUES LESS THAN (25),
    PARTITION p2 VALUES LESS THAN (30),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

This statement creates a partitioned table ‘students’. The table is partitioned based on the age column, with different partitions for ages less than 20, 25, 30, and a catch-all partition for ages 30 and above.

© Polypheny GmbH. All Rights Reserved.