Tutorial: Working With Data From Different Data Sources

This tutorial demonstrates how to integrate and query data from both a CSV file and a SQL database using Polypheny. By linking these diverse data sources, Polypheny enables real-time querying without the need for time-consuming data import processes. Additionally, it offers the flexibility to access the data using various query languages and interfaces.

Motivation

Polypheny’s ability to connect different types of data sources empowers users to query data in real-time. Unlike traditional approaches that involve importing data from multiple sources into a database, Polypheny allows queries to be executed directly on the latest version of the data. This eliminates the need for ETL (Extract, Transform, Load) processes. Furthermore, Polypheny supports querying the data using different query languages and interfaces, enhancing flexibility and usability.

Linking the CSV File

To illustrate the process, we assume the existence of a CSV file containing price information for electronic components. The file consists of 12 rows and 5 columns, including data such as component name, manufacturer, part number, quantity, and price per unit.

Component,Manufacturer,Part Number,Quantity,Price
Resistor,ABC Electronics,RES123,100,0.50
Capacitor,XYZ Corp,CAP456,50,1.20
Transistor,DEF Inc,TRANS789,25,2.75
Diode,LMN Technologies,DIO987,75,0.80
IC Chip,PQR Semiconductors,IC1234,10,5.50
LED,UVW Components,LED567,200,0.25
Fuse,XYZ Corp,FUS890,30,1.10
Inductor,ABC Electronics,IND234,40,1.75
Relay,DEF Inc,REL567,15,3.20
Switch,LMN Technologies,SWT987,60,0.90
Resistor,UVW Components,RES890,80,0.40
Capacitor,PQR Semiconductors,CAP123,20,1.50

To link this CSV file in Polypheny, follow these steps:

  1. Create a file named prices.csv on the machine running Polypheny-DB containing the content above.
  2. Open the Polypheny-UI and navigate to “Adapters” in the menu.
  3. Click the “+” button in the Sources section to deploy a CSV source.
  4. In the dialog, provide a unique name for the source, e.g., dealer_prices.
  5. Select the “link” method and choose the CSV file. Proceed by clicking “Deploy.”
  6. For security purposes, Polypheny will ask you to create a file named polypheny.access containing the specified string. Place this file in the same folder as the CSV file.
  7. Once the file is created, click “Continue” in Polypheny-UI.

Congratulations! The CSV file has now been successfully linked to the Polypheny schema. You can explore the data by navigating to “Data” in the menu and selecting the table named “prices” in the public namespace. It’s worth noting that Polypheny reads the content of the file on query execution, allowing changes to the file’s content to be immediately reflected in Polypheny.

Linking a SQL Database

Polypheny supports the integration of various database systems as data sources. In this tutorial, we demonstrate how to link tables from a PostgreSQL database, although the process is similar for other databases.

In this example, we assume the PostgreSQL database system has a table with the name product_catalog. The table, including example data, can be created using the following SQL query:

CREATE TABLE product_catalog (
    ProductID SERIAL PRIMARY KEY,
    PartNumber VARCHAR(255) REFERENCES YourCSVTableNameHere(PartNumber),
    MarginAddPercent DECIMAL(10, 2),
    DiscountGroup VARCHAR(255),
    Title VARCHAR(255),
    Description TEXT
);

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('RES123', 10.00, 'GroupA', 'Resistor', 'High-quality resistor for electronic circuits');

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('CAP456', 12.50, 'GroupB', 'Capacitor', 'Electrolytic capacitor for power applications');

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('TRANS789', 15.00, 'GroupA', 'Transistor', 'NPN transistor for amplification purposes');

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('DIO987', 10.50, 'GroupC', 'Diode', 'Fast-recovery diode for rectification');

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('IC1234', 20.00, 'GroupB', 'IC Chip', 'Integrated circuit chip for digital applications');

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('LED567', 8.00, 'GroupA', 'LED', 'High-intensity LED for lighting applications');

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('FUS890', 7.50, 'GroupC', 'Fuse', 'Fast-acting fuse for circuit protection');

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('IND234', 12.00, 'GroupB', 'Inductor', 'Wirewound inductor for filtering and energy storage');

The steps to link a table from another database in Polypheny are similar to linking a CSV file:

  1. Open Polypheny-UI and navigate to “Adapters” in the menu.
  2. Click the “+” button in the Sources section and select the PostgreSQL source.
  3. In the dialog, specify the connection details for the PostgreSQL database.
  4. Provide product_catalog as the name of the table to be linked into the Polypheny schema.
  5. Proceed by clicking “Deploy”.

Congratulations! The product_catalog table has now been successfully linked into the Polypheny schema. You can browse the data by navigating to “Data” in the menu and selecting the table named “product_catalog” in the public namespace. Any changes made to the data in the PostgreSQL database will be immediately visible in Polypheny.

Joining the Data

Polypheny’s logical schema represents both the CSV file and the SQL table as entities, enabling the use of read-only queries and various query languages supported by Polypheny to query and combine these entities.

As an example,let’s join the prices table (from the CSV file) with the product_catalog table (from the PostgreSQL database) to calculate the sales price for a product with ID 5. Use the following query:

SELECT (cp.Quantity * pc.MarginAddPercent / 100) + cp.Price AS CalculatedPrice
FROM product_catalog pc
JOIN prices cp ON pc.PartNumber = cp.PartNumber
WHERE pc.ProductID = 5;

Modifying the Data

Please note that the ability to perform data modification queries depends on the data source adapter. While the PostgreSQL adapter supports data modification queries, the CSV adapter does not. For example, to add a new product to the product catalog offered by the distributor, execute the following query in Polypheny:

INSERT INTO product_catalog (PartNumber, MarginAddPercent, DiscountGroup, Title, Description)
VALUES ('RES890', 10.00, 'GroupA', 'Resistor', 'Resistor for high-current electronic applications');

Additionally, you can update existing entries in the linked data source. For instance:

UPDATE product_catalog
SET Description = 'Premium resistor for high-performance electronic circuits'
WHERE PartNumber = 'RES123';

Polypheny enables seamless integration and querying of data from diverse sources, allowing you to harness the power of real-time data analysis and manipulation.

© Polypheny GmbH. All Rights Reserved.