A schema in a database system describes how data is structured and organized. The level of detail and the components available for schema definitions depend on the data model in use. Since schemas often change over time, it’s crucial for a database system to support defining and modifying schemas dynamically.
Polypheny is a PolyDBMS (Polystore Database Management System) that lets users define schemas based on different data models. This document describes the conceptual model behind managing schemas in such a system.
Polypheny emphasizes three fundamental data models: the relational, document, and labeled property graph (LPG) models. These models differ in the semantic building blocks they provide for schema definition and in how much weight the schema carries. The relational model requires a strict schema, the document model doesn’t demand a schema, and the LPG model accepts optional schema constraints. A PolyDBMS such as Polypheny supports all of these data models and therefore needs a conceptual model that can construct schemas based on each of them.
While Polypheny’s schema model is grounded on these three data models, it can also integrate other popular data models, such as the wide-column, key-value, RDF, or object-relational model. This model uses a semi-mathematical notation for formal concept discussions, maintaining a significant level of abstraction from a practical implementation.
Overview
Polypheny’s schema model is a multi-layer concept that goes beyond merely representing multimodel schemas. It differentiates between distinct types of schemas, each of which can span multiple data models, and defines the mappings between these schema types.
Polypheny’s schema model comprises four schema types (or layers):
- Logical Schema: This schema is the backbone of the PolyDBMS, supporting various data models. The primary component is the ‘namespace’, with each namespace corresponding to a specific data model. The building blocks used within a namespace align with its respective data model. Logical schemas can also contain views that access entities from the same or different namespaces. In addition, the logical schema permits defining data constraints based on the data model of the namespace.
- Allocation Schema: This schema results from applying horizontal data partitioning and transforming materialized views into entities with specific freshness. It also represents schema elements hidden from the user (e.g., internal tuple identifiers).
- Physical Schemas: These schemas are materialized on the underlying data stores, each containing a subset of the schema defined in the allocation layer. When an entity’s data model doesn’t match the data store’s data model, an appropriate mapping is applied.
- Exposed Schemas: They represent a view on the logical schema, granting access to all data entities defined in the logical schema. Depending on the query language, the schema might exhibit a hierarchical naming structure to separate entities from different namespaces.
This multi-layered architecture caters to the requirements of a PolyDBMS and delivers numerous benefits. It enables virtual mapping between data models, granting access to data organized under a different data model than the query language. It also supports cross-model queries. Furthermore, it fully abstracts the exposed schema from the physical schema, separating the user-specified query schema from the actual physical structure in which the data is stored.
Detailed Layer Descriptions
Logical Schema
The logical schema is the heart of the PolyDBMS: it maintains the user-defined schema and describes how the data is organized. Through the concept of ‘namespaces’, it accommodates schema definitions based on various data models in one logical schema. Each namespace has a unique name and corresponds to a particular data model. A PolyDBMS’s logical schema comprises a finite set of these namespaces.
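As a rough illustration of this structure, the following Python sketch models a logical schema as a finite set of uniquely named namespaces, each bound to one data model. The class and method names (`DataModel`, `Namespace`, `LogicalSchema`, `add_namespace`) and the example namespaces are assumptions made for this example; they are not part of Polypheny’s API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class DataModel(Enum):
    RELATIONAL = "relational"
    DOCUMENT = "document"
    LPG = "lpg"


@dataclass
class Namespace:
    name: str                 # unique within the logical schema
    data_model: DataModel     # exactly one data model per namespace
    entities: Set[str] = field(default_factory=set)


class LogicalSchema:
    """Hypothetical container: a finite set of namespaces keyed by name."""

    def __init__(self) -> None:
        self._namespaces: Dict[str, Namespace] = {}

    def add_namespace(self, name: str, data_model: DataModel) -> Namespace:
        # Enforce the uniqueness of namespace names.
        if name in self._namespaces:
            raise ValueError(f"namespace {name!r} already exists")
        namespace = Namespace(name, data_model)
        self._namespaces[name] = namespace
        return namespace

    def namespace(self, name: str) -> Namespace:
        return self._namespaces[name]

    def __len__(self) -> int:
        return len(self._namespaces)


# Example namespaces (made up for illustration).
schema = LogicalSchema()
schema.add_namespace("sales", DataModel.RELATIONAL).entities.add("orders")
schema.add_namespace("catalog", DataModel.DOCUMENT).entities.add("products")
schema.add_namespace("social", DataModel.LPG)
```

Keying the namespaces by name keeps the uniqueness rule in one place; a second namespace called `sales` is rejected regardless of its data model.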
Allocation Schema
The allocation schema is an extension of the logical schema, implementing specific data management strategies like horizontal partitioning and converting materialized views into entities. It also presents schema elements typically hidden from the user, acting as a bridge between the logical schema and the physical schemas. It reflects data distribution and partitioning across multiple physical data stores.
The allocation schema’s main entities include:
- Allocations: They depict data distribution defined by the logical schema onto the physical schemas. Each allocation aligns with an entity in the logical schema and is assigned to a data store.
- Partitions: They signify horizontal partitioning of data. An entity in the allocation schema can be divided into several partitions, each representing a subset of the data defined by the entity. Partitions offer granular control of data distribution and enhance query execution efficiency.
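To make the relationship between allocations and partitions more concrete, here is a minimal Python sketch. It assumes hash partitioning on a key and a round-robin assignment of partitions to data stores; both policies, as well as the names `Allocation`, `partition_index`, `allocate`, and `store_for`, are hypothetical choices for this example and do not describe how Polypheny actually distributes data.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Allocation:
    entity: str                  # entity in the logical schema
    store: str                   # data store the allocation is assigned to
    partitions: Tuple[int, ...]  # partitions of the entity held by this store


def partition_index(key: str, num_partitions: int) -> int:
    """Hash partitioning: a stable hash of the key selects the partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def allocate(entity: str, num_partitions: int, stores: List[str]) -> List[Allocation]:
    """Spread the partitions over the stores round-robin (assumed policy)."""
    return [
        Allocation(entity, store, tuple(range(offset, num_partitions, len(stores))))
        for offset, store in enumerate(stores)
    ]


def store_for(key: str, num_partitions: int, allocations: List[Allocation]) -> str:
    """Find the store whose allocation holds the partition of the given key."""
    index = partition_index(key, num_partitions)
    for allocation in allocations:
        if index in allocation.partitions:
            return allocation.store
    raise LookupError(f"partition {index} is not allocated")


NUM_PARTITIONS = 4
allocations = allocate("orders", NUM_PARTITIONS, ["store_a", "store_b"])
target = store_for("order-17", NUM_PARTITIONS, allocations)
```

With four partitions and two stores, `store_a` holds partitions 0 and 2 and `store_b` holds partitions 1 and 3, so every key maps to exactly one store.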
Physical Schemas
Physical schemas materialize on the underlying data stores. Each data store holds a subset of the schema defined in the allocation layer based on defined allocations and partitions. In cases where an entity’s data model doesn’t align with the data store’s data model, a suitable mapping is performed. Physical schemas enable efficient data storage and retrieval according to each data store’s capabilities. Polypheny supports diverse data store types with different data models and maps entities from the allocation schema to the physical schemas optimally.
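One way to picture such a mapping is a document entity placed on a relational data store. The sketch below stores each document as a row with an identifier column and a JSON-encoded payload column, using SQLite from the Python standard library as a stand-in store. The table layout, the `_id` field, and the function names are assumptions for illustration; the mapping Polypheny actually performs may look quite different.

```python
import json
import sqlite3
from typing import Any, Dict, List, Optional


def store_documents(conn: sqlite3.Connection, table: str, docs: List[Dict[str, Any]]) -> None:
    """Map documents to rows: one key column plus one JSON payload column."""
    # The table name is interpolated directly, so it must come from trusted code.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (id TEXT PRIMARY KEY, data TEXT NOT NULL)'
    )
    conn.executemany(
        f'INSERT INTO "{table}" (id, data) VALUES (?, ?)',
        [(doc["_id"], json.dumps(doc)) for doc in docs],
    )


def load_document(conn: sqlite3.Connection, table: str, doc_id: str) -> Optional[Dict[str, Any]]:
    """Reverse the mapping: decode the JSON payload back into a document."""
    row = conn.execute(
        f'SELECT data FROM "{table}" WHERE id = ?', (doc_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
docs = [
    {"_id": "p1", "name": "lamp", "tags": ["home", "light"]},
    {"_id": "p2", "name": "desk"},  # documents need not share the same fields
]
store_documents(conn, "products", docs)
restored = load_document(conn, "products", "p1")
```

Keeping the payload opaque preserves the schema-free nature of the document model; a real mapping could instead extract frequently queried fields into their own columns.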
Exposed Schemas
Exposed schemas offer a view on the logical schema, granting users and applications access to Polypheny-managed data. They may present a hierarchical naming structure to separate entities from different namespaces, depending on the query language used. They encapsulate the complexity of underlying allocation and physical schemas, letting users interact with the data in line with the logical schema.
The abstraction offers several benefits:
- Simplicity: Users can interact with the data through a single, unified schema without needing to know about the underlying data management complexities.
- Flexibility: Decoupling the logical schema from the physical schemas allows Polypheny to optimize data storage and retrieval without impacting user interaction.
- Interoperability: Using namespaces in the logical schema makes it possible to query data from different data models in the same query, which allows powerful cross-model queries.
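The hierarchical naming structure mentioned for exposed schemas can be sketched as follows. The example assumes a dot-separated `namespace.entity` convention and a default namespace for unqualified names; the convention, the example schema, and the helper names `exposed_names` and `resolve` are illustrative assumptions rather than Polypheny’s actual behavior.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical logical schema: namespace name -> entity names.
LOGICAL: Dict[str, Set[str]] = {
    "sales": {"orders", "customers"},
    "catalog": {"products"},
}


def exposed_names(logical: Dict[str, Set[str]]) -> List[str]:
    """Expose every logical entity under a namespace-qualified name."""
    return sorted(
        f"{namespace}.{entity}"
        for namespace, entities in logical.items()
        for entity in entities
    )


def resolve(name: str, logical: Dict[str, Set[str]], default_namespace: str) -> Tuple[str, str]:
    """Resolve a possibly unqualified name to a (namespace, entity) pair."""
    namespace, separator, entity = name.rpartition(".")
    if not separator:
        # No qualifier given: fall back to the default namespace.
        namespace = default_namespace
    if entity not in logical.get(namespace, set()):
        raise KeyError(f"unknown entity {namespace}.{entity}")
    return namespace, entity


names = exposed_names(LOGICAL)
qualified = resolve("catalog.products", LOGICAL, default_namespace="sales")
unqualified = resolve("orders", LOGICAL, default_namespace="sales")
```

Qualifying names by namespace lets a single query refer to entities from different data models without ambiguity, even if two namespaces contain entities with the same name.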
In summary, Polypheny’s multi-layered schema model offers significant flexibility, effectively managing data across multiple data models and data stores while providing a simple and consistent interface to users.