This tutorial shows how to leverage Polypheny’s Notebook feature to execute code and query data, and even perform a basic yet insightful analysis using Python. We’ll focus on querying the relational public.emp
data source shipped with Polypheny for this tutorial.
Creating a New Notebook
- Open Polypheny-UI: Navigate to the
Notebooks
section in the main menu. - Deploy Jupyter Server container: Make sure that the Jupyter server container is deployed. If it shows offline, click on the green
Start Container
Button and follow the instructions. Note that this requieres Docker to be setup. - Create New File: Click the
+
button in the left sidebar. - Name Your Notebook: Provide a unique name for your notebook.
- File & Kernel Settings: Make sure the file type is set to
Notebook
and the kernel toPython 3 (ipykernel)
. - Create: Click on
Create
.
Congratulations, you have just set up a new notebook! This action starts up a Python kernel and immediately opens your notebook, ready for action.
Familiarizing Yourself with the UI
When your notebook opens, you’ll see a toolbar at the top and your new notebook displayed below it. By default, you have one empty code cell highlighted by a blue bar. Each cell has its own mini toolbar offering cell-related functionalities.
Markdown Introduction
- Switch to Markdown: Convert the cell type to
Markdown
using either the notebook toolbar or the cell-specific toolbar. - Edit Markdown Cell: Click inside the cell’s editor and enter the following Markdown text:
# My First Notebook This notebook demonstrates how to perform a simple analysis of some data stored in *Polypheny*.
- Render Markdown: Click on the
▶
button next to the query cell to render the Markdown text.
Further information on the supported markdown syntax can be found here.
First Python Code Cell
- Insert a Code Cell: Add a new code cell below your Markdown cell.
- Write & Execute Code: Enter
print('Hello World')
and run the cell. You should see “Hello World” displayed as output. - Save Your Work: Don’t forget to save your progress by clicking on the “Save” button in the toolbar.
Querying Data from Polypheny using Query Cells
- Add a Query Cell: Insert a new cell, but this time set its type to
Query
. - Query Settings: Ensure the language is set to
SQL
, the namespace topublic
, and specifyemployees
as the variable name for storing the query results. - Execute Your Query: Run the following SQL query to get started.
SELECT * FROM emp ORDER BY workingyears DESC
-
Inspect Results: You’ll now see the 10 employees with the most working years. The output is limited to 10 rows because no
LIMIT
clause was specified. - Modify Query: Update your SQL query to focus on the working years and monthly income of all employees.
SELECT monthlyincome, workingyears FROM emp LIMIT 2000
Advanced Analysis: Gaining Insights from Query Results
Now we will analyse the data using Pandas.
- Add a Code Cell: Create a new code cell bellow your query cell.
- Data to Dataframe: Insert the following code to manipulate your data:
df = employees.as_df() print(f"Total Number of Employees: {len(df)}")
- Visual Analysis: To better understand the distribution of working years versus monthly income, add this code:
df.plot.scatter(x='workingyears', y='monthlyincome', title='Scatter Plot of Working Years vs Monthly Income')
-
Execute Analysis Cell: Run the cell to see the result. This code block performs several actions:
- First, it converts the query result stored in the
employees
variable into a Pandas DataFrame. - It then prints the total number of employees for your quick reference.
- Finally, it generates a scatter plot that plots ‘Working Years’ against ‘Monthly Income’, offering a visual way to perceive any trends or anomalies in the data.
- First, it converts the query result stored in the
Closing Your Notebook
- Navigate or Close: Either navigate away from the current notebook using the left sidebar or click on the close button.
- Save and Shutdown: A dialog will appear asking if you’d like to save your notebook and shut down the kernel. Ensure both are checked, as it’s a good practice not to leave unnecessary kernels running.
- Finalize: Click on “Close & Shut Down”.
- Kernel Status: Finally, confirm via the notebooks dashboard that the kernel has indeed stopped running.