Data Source: CSV

The CSV data source adapter enables the querying of CSV files as relational tables. Ensure that the first line of your CSV files contains a header in compliance with the specifications provided below. The adapter facilitates querying of multiple CSV files, where each file is represented as a table. This adapter also supports files compressed using gzip, identified by their filename endings of either .csv or .csv.gz.

The CSV adapter operates in read-only mode, and does not support DML queries. However, based on the mode of operation, the file content may be updated in the background as long as the schema (number and type of columns) remains consistent.

Please note that the current version of the CSV adapter does not support null values.

Adapter Configuration

The CSV adapter comprises two settings:

Setting Description
directory Specifies the location of the directory containing the CSV files.
maxStringLength Defines the length limit (in characters, including whitespace) for varchar columns. Ensure this value is equal to or larger than the length of your longest string across all columns.

File Header Specifications

The first line of the CSV file must include a header indicating the name of each column.

id,name

Optionally, the header can specify explicit data types. If a data type is omitted, Polypheny will automatically determine it. The column name and the data type should be separated by a colon:

id:int,name:string

Supported Data Types

The CSV adapter supports the following data types:

Type Description
boolean Permissible values: true, false, 0 and 1.
date Required format: yyyy-mm-dd.
double Floating point number with ‘.’ (dot) as decimal separator. Maps to the data type DOUBLE (8 bytes, IEEE 754).
float Floating point number with ‘.’ (dot) as decimal separator. Maps to the data type REAL (4 bytes, IEEE 754).
int Integer number in the range -2,147,483,648 to 2,147,483,647.
long Integer number in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807. Mapped to the data type BIGINT.
string String type (can contain letters, numbers, and special characters). If it contains commas, it should be wrapped in double quotes. Maps to VARCHAR. The length can be specified using the adapter parameter maxStringLength.
time Required format: hh:mm:ss. Fractional seconds are not supported.
timestamp Required format: yyyy-mm-dd hh:mm:ss. Fractional seconds are not supported.

Examples

Example without explicit data types:

id,name,level,birthday,fulltime,start,last_update,tfloat,tdouble
1,Alice,9,1992-11-14,true,08:00:00,2021-01-30 13:30:04+00:00,1.999,2.333
2,"Wurst, Hans",6,1989-08

-01,0,08:15:00,2020-11-01 09:05:12.111,0.1,999.999999

Example with explicit data types:

id:long,name:string,level:int,birthday:date,fulltime:boolean,start:time,last_update:timestamp,tfloat:float,tdouble:double
1,Alice,9,1992-11-14,true,08:00:00,2021-01-30 13:30:04+00:00,1.999,2.333
2,"Wurst, Hans",6,1989-08-01,0,08:15:00,2020-11-01 09:05:12.111,0.1,999.999999

Deployment

The adapter can be deployed using the Polypheny-UI (Adapters -> Sources). The adapter operates in two modes:

  • Uploading: The file is uploaded to Polypheny. Any subsequent changes to the file will not be reflected in Polypheny.
  • Linking: The file is accessed by Polypheny at a specified location. For this mode, the file must reside on the same host as Polypheny, and Polypheny must have access rights to the file. Any changes to the file are immediately visible in Polypheny. Make sure that the file name does not include spaces.

Once successfully deployed, all CSV files are mapped as tables in the public schema. The tables and columns can be renamed. Moreover, columns can be reordered and removed if they are not required (the column data are not deleted from the file). If required, columns that have been removed can be re-added using the Polypheny-UI or the following SQL statement:

ALTER TABLE tableName ADD COLUMN physicalName AS name

In this statement, physicalName refers to the name specified in the CSV file header.

© Polypheny GmbH. All Rights Reserved.