Data Source: Google Sheets

⚠ Warning
This data source adapter is still in alpha stage and may not be stable.

The Google Sheets Data Source Adapter permits data from a Google Sheets document to be mapped into the logical schema and querried using Polypheny. The data is represented as a relational table. This adapter is read-only, meaning it does not support Data Manipulation Language (DML) queries.

Information: This adapter has been implemented by Anh Tran as part of his Google Summer of Code 2022 project, entitled Data Source Adapter for Google Sheets.

The content of the Google Sheets can be updated concurrently as long as there are no modifications to the the column names. The adapter designates the first column as the primary key and does not support null values.

The adapter accommodates different formatting, but users should be aware of potential issues in cases where:

  • Two tables are embedded within the same sheet in the URL, which will lead to an error.
  • There is an interruption in the row sequence, in which case the adapter will not read beyond the interruption.

Adapter Configuration

The Google Sheets Data Source Adapter can be customized with the following settings:

Name Description
Name Unique identifier for Google Sheets URL tables
sheetsURL The Google Sheets URL to extract data from
maxStringLength Maximum size of strings in all tables
querySize Number of rows inspected per network call with the Google Sheets API
resetRefreshToken Enter “Yes” to delete the existing refresh token associated with the current email used to read from Google Sheets and utilize a different account. “No” is the default setting.

Supported Data Types

All columns in the Google Sheets Adapter default to the string type unless otherwise specified. To change the data type, the column name should be formatted as columnName:type. The column data types that can be assigned are: int, string, boolean, long, float, double, date, time, timestamp.

For instance:

  • A Google Sheets column named user_id:int would be imported with the column name user_id and the column type int.
  • A column named birthday:date would result in the column name birthday and the type date.
  • A column simply named age, even if all values are integers, would be treated as a string type when mapped into Polypheny.

Deployment

The adapter can be deployed through Polypheny-UI (Adapters –> Sources), or with the following SQL statement:

ALTER ADAPTERS ADD unique_name USING 'org.polypheny.db.adapter.googlesheet.GoogleSheetSource' WITH 
'{maxStringLength: "255", querySize: "1000", sheetsURL: "https://docs.google.com/spreadsheets/d/1-int7xwx0UyyigB4FLGMOxaCiuHXSNhi09fYSuAIX2Q/edit#gid=0",
mode: "remote", resetRefreshToken: "No"}'

Please refer to the adapter configuration above and adjust maxStringLength, querySize, and other parameters as needed. Once the adapter is successfully deployed, all Google Sheets will be represented as a tables in the public namespace.

ℹ Info
The ability to specify the namespace will follow in a future version.

Roadmap

Token Management

  • The application currently operates using the Google Cloud Console tied to a private email project. A new project should be created under the Polypheny email, with the Google Sheets API enabled. An OAUTH credential should then be added under the “Credentials” section, configuring origins as needed. The redirect URLs should be http://localhost:8888/Callback and http://localhost:8888. This localhost port is utilized by the GoogleSheetReader to receive the access tokens from OAUTH, but can be changed as needed. The credentials should then be downloaded and added to google-sheet-adapter/src/main/resources/credentials.json.

  • At present, refresh tokens expire approximately after a week, potentially leading to user errors. While users can manually reset this using the resetRefreshToken option in the settings, to improve user experience and eliminate API call limits to Google Sheets servers, the OAuth Consent Screen should be navigated to and the app should be published. This extends the life of refresh tokens indefinitely and removes the API call limit.

Potential Enhancements

  • Segregate GoogleSheetTableScanProject into distinct Scan and Project classes.
  • Relocate the GoogleSheetSourceTest from dbms/src/test/java/org.polypheny.db/adapter/ to the google-sheet-adapter package.
Licensed under CC BY-SA