Tutorial: Analyze Data using Notebooks

On this page

Creating a New Notebook
Familiarizing Yourself with the UI
Querying Data from Polypheny using Query Cells
Advanced Analysis: Gaining Insights from Query Results
Closing Your Notebook

This tutorial shows how to leverage Polypheny’s Notebook feature to execute code and query data, and even perform a basic yet insightful analysis using Python. We’ll focus on querying the relational public.emp data source shipped with Polypheny for this tutorial.

Creating a New Notebook

Open Polypheny-UI: Navigate to the Notebooks section in the main menu.
Deploy Jupyter Server container: Make sure that the Jupyter server container is deployed. If it shows offline, click on the green Start Container Button and follow the instructions. Note that this requieres Docker to be setup.
Create New File: Click the + button in the left sidebar.
Name Your Notebook: Provide a unique name for your notebook.
File & Kernel Settings: Make sure the file type is set to Notebook and the kernel to Python 3 (ipykernel).
Create: Click on Create.

Congratulations, you have just set up a new notebook! This action starts up a Python kernel and immediately opens your notebook, ready for action.

Familiarizing Yourself with the UI

When your notebook opens, you’ll see a toolbar at the top and your new notebook displayed below it. By default, you have one empty code cell highlighted by a blue bar. Each cell has its own mini toolbar offering cell-related functionalities.

Markdown Introduction

Switch to Markdown: Convert the cell type to Markdown using either the notebook toolbar or the cell-specific toolbar.

Edit Markdown Cell: Click inside the cell’s editor and enter the following Markdown text:

# My First Notebook
This notebook demonstrates how to perform a simple 
analysis of some data stored in *Polypheny*.

Render Markdown: Click on the ▶ button next to the query cell to render the Markdown text.

Further information on the supported markdown syntax can be found here.

First Python Code Cell

Insert a Code Cell: Add a new code cell below your Markdown cell.
Write & Execute Code: Enter print('Hello World') and run the cell. You should see “Hello World” displayed as output.
Save Your Work: Don’t forget to save your progress by clicking on the “Save” button in the toolbar.

Querying Data from Polypheny using Query Cells

Add a Query Cell: Insert a new cell, but this time set its type to Query.
Query Settings: Ensure the language is set to SQL, the namespace to public, and specify employees as the variable name for storing the query results.
Execute Your Query: Run the following SQL query to get started.
```
SELECT * FROM emp
ORDER BY workingyears DESC
```
Inspect Results: You’ll now see the 10 employees with the most working years. The output is limited to 10 rows because no LIMIT clause was specified.
Modify Query: Update your SQL query to focus on the working years and monthly income of all employees.
```
SELECT monthlyincome, workingyears FROM emp
LIMIT 2000
```

Since the data set is large, Polypheny will show only the first 1000 entries for performance reasons. If you want, you can hide the table display by clicking on the blue bar on the left edge of the result cell.

Advanced Analysis: Gaining Insights from Query Results

Now we will analyse the data using Pandas.

Add a Code Cell: Create a new code cell bellow your query cell.

Data to Dataframe: Insert the following code to manipulate your data:

df = employees.as_df()
print(f"Total Number of Employees: {len(df)}")

Visual Analysis: To better understand the distribution of working years versus monthly income, add this code:

df.plot.scatter(x='workingyears', y='monthlyincome', title='Scatter Plot of Working Years vs Monthly Income')

Execute Analysis Cell: Run the cell to see the result. This code block performs several actions:
- First, it converts the query result stored in the employees variable into a Pandas DataFrame.
- It then prints the total number of employees for your quick reference.
- Finally, it generates a scatter plot that plots ‘Working Years’ against ‘Monthly Income’, offering a visual way to perceive any trends or anomalies in the data.

Closing Your Notebook

Navigate or Close: Either navigate away from the current notebook using the left sidebar or click on the close button.
Save and Shutdown: A dialog will appear asking if you’d like to save your notebook and shut down the kernel. Ensure both are checked, as it’s a good practice not to leave unnecessary kernels running.
Finalize: Click on “Close & Shut Down”.
Kernel Status: Finally, confirm via the notebooks dashboard that the kernel has indeed stopped running.

Different Data Sources

Cross-Model Queries

Quick Search Results

Creating a New Notebook

Familiarizing Yourself with the UI

Markdown Introduction

First Python Code Cell

Querying Data from Polypheny using Query Cells

Advanced Analysis: Gaining Insights from Query Results

Closing Your Notebook