The CREATE TABLE
statement is used to create a new table in a specific namespace. The table is the central object of a relational schema and contains data in rows and columns.
Syntax
The syntax for the CREATE TABLE
statement is:
CREATE TABLE [ IF NOT EXISTS ] table_name (
column_definition,
column_definition,
column_definition,
....
constraint_definition,
constraint_definition,
....
)
[ AS query ]
[ ON STORE store_name ]
[ PARTITION BY partitioning_information ]
In this syntax:
IF NOT EXISTS
is optional. If included, a table with the given name will not be created if it already exists.table_name
is the name of the table you wish to create.column_definition
defines a column in the table. The syntax is described in its own section.constraint_definition
defines the constraints imposed on the data stored in the table. The syntax is described in its own section.AS query
is optional and specifies a query that provides the rows to populate the new table.ON STORE store_name
is optional. It specifies on which store the table should be created. If omitted, Polypheny selects the data store(s) on which the table is physically stored.- The optional
PARTITION BY
clause specifies a method of horizontally partitioning the table. The syntax is described in its own section.
Please note:
- You need the necessary privileges to create a table.
- The table names must be unique within the namespace.
Column Definition
In the CREATE TABLE
statement, a column is defined by a column definition, which provides the name, the data type, and other attributes for the column. Here’s the general form:
column_name data_type [ NULL | NOT NULL ] [ DEFAULT expression ]
[ GENERATED ALWAYS AS ( expression ) [ VIRTUAL | STORED ]]
[ COLLATE { CASE SENSITIVE | CASE INSENSITIVE }]
Details of each element are as follows:
column_name
: The name of the column. Each column in a table should have a unique name.data_type
: The data type of the column.NULL
orNOT NULL
: These are optional and specify whether the column can accept null values. By default, columns are allowed to contain null values. IfNOT NULL
is specified, the column cannot contain a null value.DEFAULT expression
: This is optional and sets a default value for the column when no value is specified at the time of row insertion. The default expression must match the data type of the column.GENERATED ALWAYS AS (expression) [VIRTUAL | STORED]
: This is optional and specifies a generated column. A generated column is a column that is computed from an expression that can use other columns in the same table. The keywordALWAYS
is required. The optionalVIRTUAL
orSTORED
keyword indicates how the generated column’s values are computed or stored.COLLATE CASE SENSITIVE
orCOLLATE CASE INSENSITIVE
: This is optional and specifies the collation of the column.CASE SENSITIVE
means that the column distinguishes between uppercase and lowercase characters.CASE INSENSITIVE
means that the column does not distinguish between uppercase and lowercase characters.
Here’s an example of a CREATE TABLE
statement with various column definitions:
CREATE TABLE employees (
id INTEGER NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
birth_date DATE NULL,
salary_level INTEGER DEFAULT 1,
salary REAL GENERATED ALWAYS AS (hours_worked * hourly_rate) STORED,
employee_name VARCHAR(101) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) VIRTUAL,
email VARCHAR(100) COLLATE CASE INSENSITIVE,
PRIMARY KEY (id)
);
This example creates a table named ‘employees’ with the following columns:
id
: AnINTEGER
data type, this field cannot contain null values (NOT NULL
). Typically,id
is used as a unique identifier for each record in the table. See next section on how to define a primary key constraint.first_name
: AVARCHAR(50)
data type, this field represents the first name of an employee and cannot contain null values (NOT NULL
).last_name
: AVARCHAR(50)
data type, this field represents the last name of an employee and also cannot contain null values (NOT NULL
).birth_date
: ADATE
data type, this field stores the birth date of an employee. It’s allowed to contain null values (NULL
).salary_level
: AnINTEGER
data type, this field denotes the salary level of an employee. It’s set to1
by default (DEFAULT 1
), which means if no value is specified,1
will be used.salary
: AREAL
data type, this field is computed based onhours_worked
andhourly_rate
. It’s always calculated ashours_worked * hourly_rate
and the result is stored in the database (STORED
). Note,hours_worked
andhourly_rate
should be other fields in the table.employee_name
: AVARCHAR(101)
data type, this field is computed fromfirst_name
andlast_name
. It’s always formed by concatenatingfirst_name
andlast_name
with a space in between (CONCAT(first_name, ' ', last_name)
). The result is not stored in the database (VIRTUAL
), which means it is calculated on the fly whenever it is queried.email
: AVARCHAR(100)
data type, this field contains the email of the employee. TheCOLLATE CASE INSENSITIVE
directive specifies that any string comparison on this field is case insensitive. This means thatJohn.Doe@example.com
andjohn.doe@example.com
would be considered the same.
Constraints Definition
Constraints are utilized to enforce rules on the data within a table. Their purpose is to constrain the type of data that can be stored in a table, ensuring the accuracy and reliability of the data within the table.
Polypheny supports thy following kinds of constraints:
- PRIMARY KEY Constraint: This serves as a unique identifier for each entry in a database table. Primary keys must hold unique values and cannot contain null values. They can consist of one or more columns; when more than one column is used, it is commonly referred to as a composite primary key. Each table can only have one primary key.
- FOREIGN KEY Constraint: This constraint establishes relationships between tables and prohibits any actions that would disrupt these connections. A foreign key in one table corresponds to a primary key in another table.
- UNIQUE Constraint: This constraint guarantees that all column values are distinct. It can also be applied across a set of columns, in which case the combination of values within the set must be unique. An unlimited number of unique constraints can be established. In contrast to primary keys, columns under a unique constraint can contain NULL values. However, each NULL is deemed different, with multiple NULLs being seen as unique in relation to one another.
- CHECK Constraint: This constraint confirms that all values in a column meet specified conditions. An unlimited number of check constraints can be created.
These constraints can be added or removed later using the ALTER TABLE statement. The following example illustrates the syntax for defining constraints as part of the create table statement:
CREATE TABLE employees (
id INTEGER NOT NULL,
email VARCHAR(50) NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50),
birth_date DATE,
manager_id INTEGER,
PRIMARY KEY(id),
UNIQUE (email),
CHECK (birth_date < CURRENT_DATE),
FOREIGN KEY (manager_id) REFERENCES employees(id)
);
In this example:
id
is a primary key and cannot contain null values.email
cannot contain null values and must be unique.birth_date
must be a date prior to the current date.department_id
is a foreign key that refers to theid
field of thedepartments
table.manager_id
is a foreign key that refers to theid
field of the sameemployees
table.
To define names for the individual constraints, the definition can be prefixed by the keyword CONSTRAINT
and a name:
CREATE TABLE employees (
id INTEGER NOT NULL,
email VARCHAR(50) NOT NULL,
first_name VARCHAR(50),
last_name VARCHAR(50),
birth_date DATE,
manager_id INTEGER,
CONSTRAINT cstr_primary PRIMARY KEY(id),
CONSTRAINT cstr_email_unique UNIQUE(email),
CONSTRAINT cstr_bday CHECK(birth_date < CURRENT_DATE),
CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES employees(id)
);
Horizontal Partitioning
Horizontal partitioning allows to split a table into multiple chunks of rows. It can improve query performance and aid in managing large data sets. In a CREATE TABLE
statement, partitioning is specified using the PARTITION BY
clause. See Data Partitioning to learn more on data partitioning in Polypheny.
In Polypheny, three methods of partitioning are supported:
- Basic partitioning defined by an identifier.
- Range-based partitioning.
- Temperature-based partitioning.
Syntax:
PARTITION BY { identifier | RANGE | TEMPERATURE (partition_column) }
(
PARTITION partition_name VALUES (value) [, ...]
) [, ...]
[USING FREQUENCY
{
ALL | WRITE | READ
} INTERVAL interval_value interval_unit
WITH num_partitions identifier PARTITIONS
]
Basic Partitioning
You specify the partition column, and optionally the number of partition groups and their names.
Example:
PARTITION BY id
Range-Based Partitioning
Range-based partitioning allows you to specify partition boundaries based on column values.
Example:
PARTITION BY RANGE (date) (
PARTITION p0 VALUES (2020-01-01),
PARTITION p1 VALUES (2020-07-01),
PARTITION p2 VALUES (2021-01-01)
)
Temperature-Based Partitioning
Temperature-based partitioning enables you to define partitions based on access patterns and frequency of the data. Hot data (frequently accessed) is stored separately from cold data (infrequently accessed), improving overall performance.
The syntax specifies the access pattern (ALL, WRITE, READ), the interval of data access frequency, the number of partitions, and the internal partition function.
Example:
PARTITION BY TEMPERATURE (date) (
PARTITION p0 VALUES (10 PERCENT_REMAINDER),
PARTITION p1 VALUES (90 PERCENT_REMAINDER)
)
USING FREQUENCY ALL INTERVAL 5 MINUTE
WITH 2 RANGE PARTITIONS
In this example, the date
column is used for partitioning. Two partitions (p0
and p1
) are defined, where p0
contains the hot (frequently accessed) 10% of the data and p1
contains the remaining 90% of the data. The USING FREQUENCY
clause specifies that the frequency of data access is measured across all types of operations (ALL) and updated every 5 minutes. The WITH
clause specifies that there will be 2 range partitions.
Examples
Here are a few examples showing the use of the CREATE TABLE
statement:
Example 1: Simple table creation
CREATE TABLE students (
id INTEGER NOT NULL,
name VARCHAR(50) NOT NULL,
age INTEGER DEFAULT 18,
gpa REAL NOT NULL,
credits INT DEFAULT 0,
gradePoints REAL DEFAULT 0
);
This statement creates a new table named ‘students’. It includes columns for id
, name
, age
, gpa
, credits
, and gradePoints
.
Example 2: Table creation with query
CREATE TABLE highAchievers AS
SELECT id, name FROM students WHERE gpa >= 3.5;
This statement creates a new table named ‘highAchievers’ and populates it with rows from the ‘students’ table where the gpa
is 3.5 or higher.
Example 3: Table creation with partitioning
CREATE TABLE students (
id INTEGER NOT NULL,
name VARCHAR(50) NOT NULL,
age INTEGER DEFAULT 18,
gpa REAL NOT NULL,
credits INTEGER DEFAULT 0,
gradePoints REAL DEFAULT 0,
PRIMARY KEY (id)
)
PARTITION BY RANGE (age) (
PARTITION p0 VALUES LESS THAN (20),
PARTITION p1 VALUES LESS THAN (25),
PARTITION p2 VALUES LESS THAN (30),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
This statement creates a partitioned table ‘students’. The table is partitioned based on the age
column, with different partitions for ages less than 20, 25, 30, and a catch-all partition for ages 30 and above.