DB Partitioning


Partitioning enhances the performance, manageability, and availability of a wide variety of applications and helps reduce the total cost of ownership for storing large amounts of data. Partitioning allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity.

Partitioning is a general concept that can be applied in many contexts. For relational data, it usually refers to decomposing tables either row-wise (horizontally) or column-wise (vertically).

–       Horizontal partitioning (row-wise partitioning): This involves putting different rows into different tables. For example, customers with ZIP codes less than 50000 might be stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, and a view with a union can be created over both of them to provide a complete view of all customers.

–       Vertical partitioning (column-wise partitioning): This involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves splitting columns across tables, but vertical partitioning goes beyond that and partitions columns even when the data is already normalized. Different physical storage might be used to realize vertical partitioning as well; storing infrequently used or very wide columns on a different device, for example, is a method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called “row splitting” (the row is split by its columns). A common form of vertical partitioning is to split dynamic data (slow to find) from static data (fast to find) in a table where the dynamic data is not used as often as the static data. Creating a view across the two newly created tables restores the original table at a performance penalty; however, performance improves when only the static data is accessed, e.g. for statistical analysis.
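The CustomersEast/CustomersWest example of horizontal partitioning described above can be sketched as follows. This is a minimal illustration using SQLite through Python's standard sqlite3 module, chosen only because it runs anywhere. The column names (id, name, zip_code), the view name Customers, and the helper function build_customer_partitions are assumptions made for this sketch. ZIP codes are stored as integers here so they can be compared with 50000.

```python
import sqlite3


def build_customer_partitions(rows):
    """Create CustomersEast/CustomersWest plus a view over both.

    rows: iterable of (id, name, zip_code) tuples. The column names are
    assumptions made for this sketch, not taken from a real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE CustomersEast (
            id       INTEGER PRIMARY KEY,
            name     TEXT    NOT NULL,
            zip_code INTEGER NOT NULL CHECK (zip_code < 50000)
        );
        CREATE TABLE CustomersWest (
            id       INTEGER PRIMARY KEY,
            name     TEXT    NOT NULL,
            zip_code INTEGER NOT NULL CHECK (zip_code >= 50000)
        );
        -- The view reassembles the complete customer list.
        CREATE VIEW Customers AS
            SELECT id, name, zip_code FROM CustomersEast
            UNION ALL
            SELECT id, name, zip_code FROM CustomersWest;
        """
    )
    for row in rows:
        # Route each row to its partition table by ZIP code.
        table = "CustomersEast" if row[2] < 50000 else "CustomersWest"
        conn.execute(
            f"INSERT INTO {table} (id, name, zip_code) VALUES (?, ?, ?)", row
        )
    conn.commit()
    return conn
```

A query that only needs eastern customers can read CustomersEast directly, while a query against the Customers view sees every row. The CHECK constraints keep a row from landing in the wrong table.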

Another example:

Assume you have a table like this:
  create table data (
      id integer primary key,
      status char(1) not null,
      data1 varchar2(10) not null,
      data2 varchar2(10) not null
  );

One way to partition data vertically is to split it as follows:
  create table data_main (
      id integer primary key,
      status char(1) not null,
      data1 varchar2(10) not null
  );

  create table data_rarely_used (
      id integer primary key,
      data2 varchar2(10) not null,
      foreign key (id) references data_main (id)
  );

This kind of partitioning can be applied, for example, when you rarely need column data2 in your queries. The data_main partition takes less space, so full table scans are faster and the table is more likely to fit into the DBMS’ page cache. The downside: when you have to query all columns of data, you obviously have to join the tables, which is more expensive than querying the original table.

Notice that you are splitting the columns in the same way as you would when normalizing tables. However, in this case data could already be normalized to 3NF (and even BCNF and 4NF), and you decide to split it further purely for physical optimization.
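The vertical split and its join cost can be tried out with a small sketch. It uses SQLite through Python's standard sqlite3 module instead of Oracle, so varchar2 becomes VARCHAR. The view named data and the helper functions build_vertical_split and insert_row are assumptions added for illustration and are not part of the original schema.

```python
import sqlite3


def build_vertical_split():
    """Create data_main and data_rarely_used plus a view joining them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_main (
            id     INTEGER PRIMARY KEY,
            status CHAR(1)     NOT NULL,
            data1  VARCHAR(10) NOT NULL
        );
        CREATE TABLE data_rarely_used (
            id    INTEGER PRIMARY KEY,
            data2 VARCHAR(10) NOT NULL,
            FOREIGN KEY (id) REFERENCES data_main (id)
        );
        -- Hypothetical view that restores the shape of the original table.
        CREATE VIEW data AS
            SELECT m.id, m.status, m.data1, r.data2
            FROM data_main AS m
            JOIN data_rarely_used AS r ON r.id = m.id;
        """
    )
    return conn


def insert_row(conn, row_id, status, data1, data2):
    """Write one logical row into both physical tables in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO data_main (id, status, data1) VALUES (?, ?, ?)",
            (row_id, status, data1),
        )
        conn.execute(
            "INSERT INTO data_rarely_used (id, data2) VALUES (?, ?)",
            (row_id, data2),
        )
```

Queries that need only id, status, and data1 can read data_main on its own. Selecting from the data view returns all four columns, but pays for the join on every access.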

One way to partition data horizontally, using Oracle syntax:
  create table data (
      id integer primary key,
      status char(1),
      data1 varchar2(10),
      data2 varchar2(10)
  )
  partition by list (status) (
      partition active_data values ('A'),
      partition other_data values (default)
  );

This tells the DBMS to internally store the table data in two segments (like two tables), depending on the value of the column status. This way of partitioning data can be applied, for example, when you usually query only rows of one partition, e.g., the status ‘A’ rows (let’s call them active rows). As before, full scans will be faster (particularly if there are only a few active rows), the active rows and the other rows are each stored contiguously (they won’t be scattered around pages that they share with rows of a different status value), and it is more likely that the active rows will be in the page cache.
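The routing idea behind list partitioning can be illustrated with a toy in-memory model. This is only a conceptual sketch in Python and does not reflect how Oracle implements partitions internally. The class ListPartitionedTable and its method names are hypothetical, invented for this illustration. Rows are plain dictionaries.

```python
class ListPartitionedTable:
    """Toy model of a table list-partitioned on a single column."""

    def __init__(self, column, partitions, default_partition):
        # partitions maps a partition name to the set of key values it holds,
        # e.g. {"active_data": {"A"}}; all other values go to the default.
        self.column = column
        self.default_partition = default_partition
        self.value_to_partition = {
            value: name
            for name, values in partitions.items()
            for value in values
        }
        # One separate list ("segment") per partition.
        self.segments = {name: [] for name in partitions}
        self.segments[default_partition] = []

    def partition_for(self, value):
        """Return the name of the partition responsible for a key value."""
        return self.value_to_partition.get(value, self.default_partition)

    def insert(self, row):
        """Store the row in the segment chosen by its partitioning column."""
        self.segments[self.partition_for(row[self.column])].append(row)

    def select_where_equals(self, value):
        """Scan only the one segment that can contain the value."""
        segment = self.segments[self.partition_for(value)]
        return [row for row in segment if row[self.column] == value]
```

With partitions set to {"active_data": {"A"}} and "other_data" as the default, a lookup for status ‘A’ touches only the active_data segment, which is the effect the partitioned table above aims for.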