CUBES

 Cubes

The data structures used in the OLAP are multidimensional data cubes or OLAP cubes:

image1

An OLAP cube is a multidimensional database that is optimized for data warehouse and online analytical processing (OLAP) applications.

An OLAP cube is an array of data understood in terms of its 0 or more dimensions. OLAP is an acronym for online analytical processing. OLAP is a computer-based technique for analyzing business data in the search for business intelligence.

So, in simple it is the data presentation in multiple dimensions.

Example: A company needs survey of their products all over the world.

So for this it requires answer of some questions like

–          Sell in USA?

–          Sell in USA of different products?

–          Sell in USA of different products in last 6 month?

–          Sell in Other country and total sell?

–          What type of consumers attract by this product?

–          Product details, with sell?

…etc

In this case we need one master table/Fact table where we store values commonly used (like last month sell) and also the relation of other table by using foreign key (Customer detail , USA sell). In such a manner that data retrieve much faster as possible.

 

The OLAP cube consists of facts, also called measures, categorized by dimensions (it can be much more than 3 Dimensions; dimensions referred from Fact Table by “foreign keys”). Measures are derived from the records in the Fact Table and Dimensions are derived from the dimension tables, where each column represents one attribute (also called dictionary; dimension can have many attributes)

Facts and Measures

Fact is most detailed information that can be measured.

In simple it works like central control, which has information of whole cube like where we find which thing.

image2

OLAP Common operations include slice and dice, drill down, roll up, and pivot:

 

Slice:

A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

image3

 

Dice:

The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).

image4

 

Drill Down/Up:

Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down).

image5

 

Roll-up:

(Aggregate, Consolidate) A roll-up involves computing all of the data relationships for one or more dimensions. To do this, a computational relationship or formula might be defined.

image6

 

Pivot:

This operation is also called rotate operation. It rotates the data in order to provide an alternative presentation of data – the report or page display takes a different dimensional orientation.

image7

 

Summary

OLAP cube is really helpful, when we are talking about billions of data. It is very easy and straightforward to translate business questions into multidimensional query.

By which we can really increase data performance as well as understandability of data in a better manner.

Reusability is also possible in cube, which make this more scalable.

Thanks for reading : Pushpraj Kumar ( Helical IT Solutions )

Advertisements

INNODB Backup and Recovery using mysqldump

InnoDB is a storage engine for MySQL. MySQL 5.5 and later use it by default. It provides the standard ACID-compliant transaction features, along with foreign key support (Declarative Referential Integrity). It is included as standard in most binaries distributed by “MySQL AB”, the exception being some “OEM” versions.

The key to safe database management is making regular backups. Depending on your data volume, number of MySQL servers, and database workload, you can use these techniques, alone or in combination: hot backup with MySQL Enterprise Backup; cold backup by copying files while the MySQL server is shut down; physical backup for fast operation (especially for restore); logical backup with mysqldump for smaller data volumes or to record the structure of schema objects

Issue : —

C:\Calpont\bin>mysqldump.exe  -u root -p  –database  TestDB > C:\Pushpraj\infinidbPushpraj.sql 

Enter password: *******

mysqldump: Got error: 138: IDB-2029: Lock table command is currently not supported in InfiniDB. when using LOCK TABLES

Solution  :-  use –skip-lock-tables

LOGICAL BACKUP

Windows :::

To take the dump from your mysql for infinidb —-

C:\Calpont\bin>mysqldump.exe –skip-lock-tables -u root -p  –database  TestDB > C:\Pushpraj\infinidbPushpraj.sql 

Enter password: *******

It will create your dump file .

To restore the dump in your mysql —-

First create an empty DB in your Mysql desitination DB.

C:\Calpont\bin>mysql.exe -u root -p  TestDB < C:\Pushpraj\infinidbPushpraj.sql 

Enter password: *******

 

🙂   Have fun   🙂

 

PUSHPRAJ KUMAR

MySQL C – API programming

MySQL C – API programming

MySQL is a leading open source database management system. It is a multi user, multithreaded database management system. MySQL is especially popular on the web.

Basic Structure of C Programs that uses MySQL C mysqlwithcAPI 1. All programs must include <mysql/mysql.h> as the last include.2. Define MYSQL type variable. NOTE: THERE CAN BE ONLY ONE MYSQL VARIABLE.3. Initialize MYSQL type variable with mysql_init()4. Load any options, if required, by using mysql_options(). If you don’t need don’t call. You can call this fuction multiple times if you require. If you call this, call this before mysql_real_connect() and after mysql_init().5. Connect by calling mysql_real_connect()6. Call the business logic and MySQL API’s7. Close the MYSQL type variable.

 

Template of program

#include <mysql/mysql.h><return_type> <function_name>(parameters)

{

MYSQL mysql; mysql_init(&mysql);

mysql_options(&mysql,MYSQL_OPT_COMPRESS,0);      

/*call only if required otherwise omit*/

mysql_options(&mysql,MYSQL_READ_DEFAULT_GROUP,”Pushpraj”);        

 /*call only if required otherwise omit*/

mysql_real_connect(….);      

/* now call other API’s*/

mysql_close(&mysql);

}

 Compiling and Running in UNIX :- 

To Compile :-  $gcc mysql_app.c -o mysql_app -I/usr/local/include -L/usr/local/lib/mysql –lmysqlclient

To Run :-  $./mysql_app

Examples:-

1. Test the connection.

#include </usr/include/mysql/my_global.h>

#include </usr/include/mysql/mysql.h>

int main(int argc, char **argv)

{

printf(“MySQL client version: %s\n”, mysql_get_client_info());

  exit(0);

}

Description:-  “mysql_get_client_info()”, this mysql function collects the version information of mysql.

2. Create table and insert data into table.

#include <my_global.h>

#include <mysql.h>

void finish_with_error(MYSQL *con){

  fprintf(stderr, “%s\n”, mysql_error(con));

  mysql_close(con);

  exit(1);       

}

 

int main(int argc, char **argv)

{

  MYSQL *con = mysql_init(NULL);

  if (con == NULL) {

      fprintf(stderr, “%s\n”, mysql_error(con));

      exit(1);

  } 

if (mysql_real_connect(con, “192.168.2.50”, “username”, “password”,”db-Name”, port-No, NULL, 0) == NULL)

  {

      finish_with_error(con);

  }   

 

  if (mysql_query(con, “DROP TABLE IF EXISTS Dell”)) {

      finish_with_error(con);

  }

  if (mysql_query(con, “CREATE TABLE Dell(Id INT, Name TEXT, Price INT)”)) {     

      finish_with_error(con);

  }

  if (mysql_query(con, “INSERT INTO Dell VALUES(1,’vostro14′,26042)”)) {

      finish_with_error(con);

  }

  if (mysql_query(con, “INSERT INTO Dell VALUES(2,’inspiron15′,37000)”)) {

      finish_with_error(con);

  }

  if (mysql_query(con, “INSERT INTO Dell VALUES(3,’inspiron16′,40000)”)) {

      finish_with_error(con);

  }

  if (mysql_query(con, “INSERT INTO Dell VALUES(4,’ vostro15′,29000)”)) {

      finish_with_error(con);

  }

  if (mysql_query(con, “INSERT INTO Dell VALUES(5,’express’,69000)”)) {

      finish_with_error(con);

  }

  mysql_close(con);

  exit(0);

}

Description:- mysql_init, allocates/intialises a Mysql object suitable for mysql_real_connect()establishes a connection to the database.We provide connection handler, host name, username ,password ,database name, port number , unix socket and client flag.

3. Create connection with Database & fetch the records.

#include <my_global.h>

#include <mysql.h>

void finish_with_error(MYSQL *con)

{

  fprintf(stderr, “%s\n”, mysql_error(con));

  mysql_close(con);

  exit(1);       

}

int main(int argc, char **argv)

{     

  MYSQL *con = mysql_init(NULL);

  if (con == NULL)

  {

      fprintf(stderr, “mysql_init() failed\n”);

      exit(1);

  } 

 

  if (mysql_real_connect(con, “192.168.2.50”, “username”, “password”,”db-Name”, port-No, NULL, 0) == NULL)

  {      finish_with_error(con);  }  

 

  if (mysql_query(con, “SELECT * FROM Dell”))

  {      finish_with_error(con);  }  

  MYSQL_RES *result = mysql_store_result(con);

  if (result == NULL)

  {      finish_with_error(con);  }  

  int num_fields = mysql_num_fields(result);

  MYSQL_ROW row;

  while ((row = mysql_fetch_row(result)))

  {

      for(int i = 0; i < num_fields; i++)

      {

          printf(“%s “, row[i] ? row[i] : “NULL”);

      }

      printf(“\n”);

  }

  mysql_free_result(result);

  mysql_close(con);

  exit(0);

}

Description:-

  • Create a connection:: mysql_real_connect(con, “192.168.2.50”, “username”, “password”,”db-Name”, port-No, NULL, 0)
  • Execute query:: mysql_query(con, “SELECT * FROM Dell”)
  • Get the result set:: MYSQL_RES *result = mysql_store_result(con)
  • Fetch all available rows one by one :: row = mysql_fetch_row(result)
  • Free the result set:: mysql_free_result(result)
  • Close the connection:: mysql_close(con);

Thanks

Pushpraj Kumar

DB Partitioning

dbpartition

Partitioning enhances the performance, manageability, and availability of a wide variety of applications and helps reduce the total cost of ownership for storing large amounts of data. Partitioning allows tables, indexes, and index-organized tables to be subdivided into smaller pieces, enabling these database objects to be managed and accessed at a finer level of granularity.

Partitioning is a rather general concept and can be applied in many contexts. When it considers the partitioning of relational data,

it usually refers to decomposing your tables either row-wise (horizontally) or column-wise (vertically).

–       Horizontal partition(Row-wise partitioning) :- It involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a viewwith a union might be created over both of them to provide a complete view of all customers.

–       Vertical partition(Column-wise partitioning) :- It involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized. Different physical storage might be used to realize vertical partitioning as well; storing infrequently used or very wide columns on a different device,

for example, is a method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called “row splitting” (the row is split by its columns). A common form of vertical partitioning is to split dynamic data (slow to find) from static data (fast to find) in a table where the dynamic data is not used as often as the static. Creating a view across the two newly created tables restores the original table with a performance penalty, however performance will increase when accessing the static data e.g. for statistical analysis.

Other Example:–

assume you have a table like this:
  create table data (    id integer primary key,     status char(1) not null,     data1 varchar2(10) not null,     data2 varchar2(10) not null);

One way to partition data vertically: Split it as follows:
create table data_main (    id integer primary key,    status char(1) not null,    data1 varchar2(10) not null ); create table data_rarely_used (    id integer primary key,    data2 varchar2(10) not null,    foreign key (id) references data_main (id) );

This kind of partitioning can be applied, for example, when you rarely need column data2 in your queries. Partition data_main will take less space, hence full table scans will be faster and it is more likely that it fits into the DBMS’ page cache. The downside: When you have to query all columns of data, you obivously have to join the tables, which will be more expensive that querying the original table.

Notice you are splitting the columns in the same way as you would when you normalize tables. However, in this case data could already be normalized to 3NF (and even BCNF and 4NF), but you decide to further split it for the reason of physical optimization.

One way to partition data horizontally, using Oracle syntax:
create table data (    id integer primary key,     status char(1),     data1 varchar2(10),     data2 varchar2(10) )    partition by list (status) (        partition active_data values ( ‘A’ ),       partition other_data values(default)     );

This would tell the DBMS to internally store the table data in two segments (like two tables), depending on the value of the column status. This way of partitioning data can be applied, for example, when you usually query only rows of one partition, e.g., the status ‘A’ rows (let’s call them active rows). Like before, full scans will be faster (particularly if there are only few active rows), the active rows (and the other rows resp.) are stored contiguously (they won’t be scattered around pages that they share with rows of a different status value, and it is more likely that the active rows will be in the page cache.

 

ETL JOB FOR DOWNLOADING AND UNTARING TAR FILE FROM FTP

etlpic1

 

ETL JOB FOR DOWNLOADING AND UNTARING TAR FILE FROM FTP

1. To Download File from FTP , we first have to create connection by providing all credentials of FTP Server (Host-URl, Username, Password, Port No and which type of connection it is like FTP or SFTP)component Name : tFTPConnection
2. Then Next Task is to Provide Ftp File path (For this mention Ftp Location of file’s ) component Name : tFTPFileList
3. Once Done then we have to mention where we want to put that file (Here we mention Local System path where we want to put our Ftp file’s) component Name : tFTPGet

filedownload

ETL JOB TO PROCESSED TAR FILE’S AND ITERATE THEM ONE-BY-ONE

1. To Untar a tar file there is a component tFileArchieve but instead of that I am using GZipCompressor by using Java code in tJava component .
2. Here we just need to drag-n-drop tJava component , and in that provide the location of tar file and path where you untaring your tar file…

File dest = new File(dirName);TarArchiveInputStream tarIn = new TarArchiveInputStream(

new GzipCompressorInputStream(

new BufferedInputStream( new FileInputStream( TarName ) )

);

TarArchiveEntry tarEntry = tarIn.getNextTarEntry();

while (tarEntry != null)

{

// create a file with the same name as the tarEntry

File destPath = new File(dest, tarEntry.getName());

System.out.println(“working: ” + destPath.getCanonicalPath()+”— Tar Entry: “);

context.csvloc=””+destPath.getParentFile();

System.out.println(“\nCSV FILE Location ::::”+context.csvloc+”\n”);

if(!(destPath.getParentFile().exists()))

{

System.out.println(“Dest: “+dest);

destPath.getParentFile().mkdirs();

}

if (tarEntry.isDirectory())

{

System.out.println(“Createing directory: “+tarEntry.getName());

destPath.mkdirs();

}

else

{

destPath.createNewFile();

byte [] btoRead = newbyte[2048];

BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath));

int len; //variable declared

while((len = tarIn.read(btoRead)) != -1)

{

bout.write(btoRead,0,len);

}

bout.close();

btoRead = null;

}

tarEntry = tarIn.getNextTarEntry();

}//while loop end here

tarIn.close();

(This code is capable of searching tar file in given folder as well as untaring that file into specified folder path)

Here “ dirName” denotes location where Tar file is present and “TarName” denotes name of the Tar file.
3. Regarding Iteration you can connect tFTPGet-component to this tJava-component by Iterate. By this way tJava-component get one Tar file at a time and processed it.

So lastly the flow is similar to the below picture….

etlpic