The CSV data source adapter enables the querying of CSV files as relational tables. Ensure that the first line of your CSV files contains a header in compliance with the specifications provided below. The adapter facilitates querying of multiple CSV files, where each file is represented as a table. This adapter also supports files compressed using gzip, identified by their filename endings of either .csv
or .csv.gz
.
The CSV adapter operates in read-only mode, and does not support DML queries. However, based on the mode of operation, the file content may be updated in the background as long as the schema (number and type of columns) remains consistent.
Adapter Configuration
The CSV adapter comprises two settings:
Setting | Description |
---|---|
directory | Specifies the location of the directory containing the CSV files. |
maxStringLength | Defines the length limit (in characters, including whitespace) for varchar columns. Ensure this value is equal to or larger than the length of your longest string across all columns. |
File Header Specifications
The first line of the CSV file must include a header indicating the name of each column.
id,name
Optionally, the header can specify explicit data types. If a data type is omitted, Polypheny will automatically determine it. The column name and the data type should be separated by a colon:
id:int,name:string
Supported Data Types
The CSV adapter supports the following data types:
Type | Description |
---|---|
boolean | Permissible values: true , false , 0 and 1 . |
date | Required format: yyyy-mm-dd . |
double | Floating point number with ‘.’ (dot) as decimal separator. Maps to the data type DOUBLE (8 bytes, IEEE 754). |
float | Floating point number with ‘.’ (dot) as decimal separator. Maps to the data type REAL (4 bytes, IEEE 754). |
int | Integer number in the range -2,147,483,648 to 2,147,483,647 . |
long | Integer number in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 . Mapped to the data type BIGINT. |
string | String type (can contain letters, numbers, and special characters). If it contains commas, it should be wrapped in double quotes. Maps to VARCHAR. The length can be specified using the adapter parameter maxStringLength . |
time | Required format: hh:mm:ss . Fractional seconds are not supported. |
timestamp | Required format: yyyy-mm-dd hh:mm:ss . Fractional seconds are not supported. |
Examples
Example without explicit data types:
id,name,level,birthday,fulltime,start,last_update,tfloat,tdouble
1,Alice,9,1992-11-14,true,08:00:00,2021-01-30 13:30:04+00:00,1.999,2.333
2,"Wurst, Hans",6,1989-08
-01,0,08:15:00,2020-11-01 09:05:12.111,0.1,999.999999
Example with explicit data types:
id:long,name:string,level:int,birthday:date,fulltime:boolean,start:time,last_update:timestamp,tfloat:float,tdouble:double
1,Alice,9,1992-11-14,true,08:00:00,2021-01-30 13:30:04+00:00,1.999,2.333
2,"Wurst, Hans",6,1989-08-01,0,08:15:00,2020-11-01 09:05:12.111,0.1,999.999999
Deployment
The adapter can be deployed using the Polypheny-UI (Adapters -> Sources). The adapter operates in two modes:
- Uploading: The file is uploaded to Polypheny. Any subsequent changes to the file will not be reflected in Polypheny.
- Linking: The file is accessed by Polypheny at a specified location. For this mode, the file must reside on the same host as Polypheny, and Polypheny must have access rights to the file. Any changes to the file are immediately visible in Polypheny. Make sure that the file name does not include spaces.
Once successfully deployed, all CSV files are mapped as tables in the public schema. The tables and columns can be renamed. Moreover, columns can be reordered and removed if they are not required (the column data are not deleted from the file). If required, columns that have been removed can be re-added using the Polypheny-UI or the following SQL statement:
ALTER TABLE tableName ADD COLUMN physicalName AS name
In this statement, physicalName
refers to the name specified in the CSV file header.