Change Hive Metastore from Derby to MySQL


 

Machine : Ubuntu 14.04 | Hive : Hive 1.2.1

To change the Hive Metastore from Derby to MySQL, follow these 8 simple steps.

 

Step-1 :

First, install MySQL and its Java connector on the system.

Command-1 : sudo apt-get install mysql-server

Note: Follow the installer prompts and set the password for the MySQL root user.

Command-2 : sudo apt-get install libmysql-java

 

Step-2 :

Create a soft link to the connector in the Hive lib directory.

Command-1 : ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar

 

Step-3 :

Log in to MySQL and create a new database named metastore for Hive:

Command : $ mysql -u root -p

Enter password:

mysql> CREATE DATABASE metastore;

 

Step-4 :

Next, Hive needs its own MySQL account to access the metastore. It is important to limit this account to the metastore database, so that it cannot change any other schema.

Command :

mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'password';

mysql> GRANT ALL ON metastore.* TO 'hive'@'%';

mysql> FLUSH PRIVILEGES;

 

Step-5 :

Now configure Hive to use the MySQL metastore by updating the hive-site.xml file in $HIVE_HOME/conf (if the file does not exist, create a new one):

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.8.99/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
    <description>password for connecting to mysql server</description>
  </property>
</configuration>

 

Note: If hive-site.xml already exists, copy only the property tags into its existing configuration tag.
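
Before running schematool it can help to read the configuration back and confirm the four connection properties are present. The following is only a minimal Java sketch under stated assumptions: the class name HiveSiteCheck and its helper methods are hypothetical, and the sample string simply mirrors the placeholder values used in this step. A real check would read $HIVE_HOME/conf/hive-site.xml from disk instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hive: parses hive-site.xml style text.
class HiveSiteCheck {
    // Builds one <property> element from a name and a value.
    static String property(String name, String value) {
        return "<property><name>" + name + "</name><value>" + value + "</value></property>";
    }

    // Sample content using the tutorial's placeholder values (assumption).
    static final String SAMPLE = "<configuration>"
        + property("javax.jdo.option.ConnectionURL",
                   "jdbc:mysql://192.168.8.99/metastore?createDatabaseIfNotExist=true")
        + property("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver")
        + property("javax.jdo.option.ConnectionUserName", "hive")
        + property("javax.jdo.option.ConnectionPassword", "password")
        + "</configuration>";

    // Parses the XML text into a name -> value map, keeping document order.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse configuration", e);
        }
    }

    public static void main(String[] args) {
        // Print each property so that typos in names or values are easy to spot.
        for (Map.Entry<String, String> e : readProperties(SAMPLE).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

If one of the four javax.jdo.option names is missing from the printed list, Hive will most likely fall back to its default embedded Derby settings for that property.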

 

Step-6 :

Now run the Hive schematool to initialize the MySQL metastore.

To do this, go to the $HIVE_HOME/bin folder.

Command-1 : schematool -initSchema -dbType mysql

Note : If you find that your metastore is corrupted, reinitialize it as follows.

  • Before you run Hive for the first time, run:

Command : schematool -initSchema -dbType mysql

  • If you already ran Hive and initSchema is now failing, move the old Derby metastore out of the way:

Command : mv metastore_db metastore_db.tmp

You will find the metastore_db directory in the location Hive was started from (here, $HIVE_HOME).

  • Then re-run the schematool command.

 

Step-7 :

Start Hive and access your tables.

 

Step-8 :

To validate the setup:

Open the Hive shell.

hive>

Then create a table in it and insert one record.

hive> create table saurzcode(id int, name string);

hive> insert into table saurzcode values (1, 'Helical');

Then log in to MySQL and open the metastore database.

mysql -u root -p

Enter password:

mysql> use metastore;

Your new table appears as a record in the TBLS table of the metastore database.

mysql> show tables ;

mysql> select * from TBLS;

 

Hurray, Completed !!!!

Thanks for visiting. Have a great day.

PUSHPRAJ KUMAR


Collection Part-1 : Java Arrays

Arrays are objects that store multiple values of the same type. An array is a collection of elements of one type held in contiguous memory locations.

The length of an array is established when the array is created. After creation, its length is fixed.

Figure: An array of 10 elements.

It is an array of size 10, which means we can store 10 elements in a single variable/object.

Define an array:

int[] arr = new int[10]; // We can give any size here. This allocates space for 10 int values in the JVM heap.

int[] arr2 = {1, 2, 3, 4, 5}; // Declares and initializes an array in one statement.

Elements are indexed starting with

arr[0] = 10;

to

arr[arr.length - 1] = 100; // arr.length - 1 = 9, because indexing starts from 0.
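
The points above (fixed length, zero-based indexing, default values) can be checked with a small program. This is a minimal sketch; the class name ArrayBasics and the helper fillEnds are made up for illustration.

```java
// Hypothetical demo class illustrating fixed length and zero-based indexing.
class ArrayBasics {
    // Creates an int array of the given size and sets only its first and last element.
    static int[] fillEnds(int size, int first, int last) {
        int[] arr = new int[size];   // every element starts as 0
        arr[0] = first;              // first valid index is 0
        arr[arr.length - 1] = last;  // last valid index is length - 1
        return arr;
    }

    public static void main(String[] args) {
        int[] arr = fillEnds(10, 10, 100);
        System.out.println("length = " + arr.length);
        System.out.println("first = " + arr[0] + ", last = " + arr[9]);
        System.out.println("untouched element = " + arr[5]);
        try {
            arr[10] = 1; // index 10 does not exist in an array of length 10
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index 10 is out of bounds");
        }
    }
}
```

Because the length is fixed at creation, storing an eleventh element requires allocating a new, larger array and copying the old contents into it.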

How to copy the content of one array to another array:

public static void arraycopy(Object src, int srcPos, Object dest, int destPos, int length)

 

src — This is the source array.

srcPos — This is the starting position in the source array.

dest — This is the destination array.

destPos — This is the starting position in the destination data.

length — This is the number of array elements to be copied.

Use this function to copy one array into another. Ex:

// Here declaration, instantiation and initialization are all done at once.
int[] arr = {8, 7, 6, 5, 4, 3, 2, 2, 0}; // Assume this is the source array.

int[] FinalArr = new int[9]; // Assume this is the destination array, where you want to copy your content.

So the copy command is:

System.arraycopy(arr, 0, FinalArr, 0, arr.length);

// Print the destination array.
for (int i = 0; i < arr.length; i++)

{ System.out.println("Value-" + i + " are : " + FinalArr[i]); }

O/P

Value-0 are : 8

Value-1 are : 7

Value-2 are : 6

Value-3 are : 5

Value-4 are : 4

Value-5 are : 3

Value-6 are : 2

Value-7 are : 2

Value-8 are : 0
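
The example above copies the whole array. The srcPos, destPos and length parameters also allow copying only a slice. Here is a minimal sketch of that; the class name ArrayCopyDemo and the helper copyRange are hypothetical, while System.arraycopy and Arrays.copyOf are the standard JDK methods.

```java
import java.util.Arrays;

// Hypothetical demo class for partial copies with System.arraycopy.
class ArrayCopyDemo {
    // Copies `length` elements of src, starting at srcPos, into a new
    // array of size destSize, starting at destPos.
    static int[] copyRange(int[] src, int srcPos, int destSize, int destPos, int length) {
        int[] dest = new int[destSize];
        System.arraycopy(src, srcPos, dest, destPos, length);
        return dest;
    }

    public static void main(String[] args) {
        int[] arr = {8, 7, 6, 5, 4, 3, 2, 2, 0};

        // Take 3 elements starting at index 2 (6, 5, 4) and place them at index 1.
        int[] part = copyRange(arr, 2, 5, 1, 3);
        System.out.println(Arrays.toString(part)); // [0, 6, 5, 4, 0]

        // Arrays.copyOf is a shortcut for "new array of this length plus copy".
        int[] grown = Arrays.copyOf(arr, 12);
        System.out.println(grown.length); // 12
    }
}
```

If srcPos + length or destPos + length runs past the end of either array, System.arraycopy throws an IndexOutOfBoundsException and leaves the destination unchanged.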

EX:

int c1[][][][]={{{{1,3,4},{3,4,5},{1,2,3}},{{1,3,4},{3,4,5},{1,2,3}},{{1,3,4},{3,4,5},{1,2,3}}},{{{1,3,4},{3,4,5},{1,2,3}},{{1,3,4},{3,4,5},{1,2,3}},{{1,3,4},{3,4,5},{1,2,3}}}};

for (int i = 0; i < 2; i++) {
    for (int j = 0; j < 3; j++) {
        for (int k = 0; k < 3; k++) {
            for (int l = 0; l < 3; l++) {
                System.out.print(c1[i][j][k][l] + "");
            }
        }
        System.out.print("\t");
    }
    System.out.print("\n");
}

O/P

134345123 134345123 134345123
134345123 134345123 134345123
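
The loops above hard-code the sizes 2, 3, 3 and 3. A multi-dimensional array in Java is an array of arrays, so each level carries its own length. The sketch below (class and method names are hypothetical) walks a four-dimensional array without hard-coded bounds and uses Arrays.deepToString for a quick printout.

```java
import java.util.Arrays;

// Hypothetical demo class: walking nested arrays without hard-coded sizes.
class NestedArrayDemo {
    // Counts the int values stored in a four-dimensional array by visiting every level.
    static int countElements(int[][][][] a) {
        int count = 0;
        for (int[][][] cube : a) {
            for (int[][] plane : cube) {
                for (int[] row : plane) {
                    count += row.length;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][][][] c1 = new int[2][3][3][3];  // same shape as the example above
        System.out.println(countElements(c1)); // 54

        int[][] small = {{1, 2}, {3, 4}};
        System.out.println(Arrays.deepToString(small)); // [[1, 2], [3, 4]]
    }
}
```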

Enjoy learning…. 🙂

Batch-Updation in Hibernate


JDBC has long been offering support for DML statement batching. By default, all statements are sent one after the other, each one in a separate network round-trip. Batching allows us to send multiple statements in one-shot, saving unnecessary socket stream flushing.

Hibernate hides the database statements behind a transactional write-behind abstraction layer. An intermediate layer allows us to hide the JDBC batching semantics from the persistence layer logic. This way, we can change the JDBC batching strategy without altering the data access code.

The update code snippet looks like this:

Session session = sessionFactory.openSession();
Transaction txInstance = session.beginTransaction();
// "Student" must match the mapped entity name; HQL entity names are case-sensitive.
ScrollableResults studentInstance = session.createQuery("FROM Student").scroll();
int count = 0;
while (studentInstance.next())
{
   // Each scrolled row holds the entity in column 0.
   Student student = (Student) studentInstance.get(0);
   student.setRegNo(regNo); // regNo: the new value, defined elsewhere
   session.update(student);
   // 50 - size of batch which you set earlier.
   // For Detail "http://helicaltech.com/batch-insertion-in-hibernate/"
   if (++count % 50 == 0)
   {
      // Send the pending updates to the database and free session memory.
      session.flush();
      session.clear();
   }
}
txInstance.commit();
session.close();
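
To see how often the flush actually fires, the counting logic of the loop above can be tried in isolation. The following is only a simulation under stated assumptions: it does not use Hibernate at all, the class BatchFlushSketch is hypothetical, and a plain list stands in for the session's pending updates. In Hibernate itself the JDBC batch size is normally set through the hibernate.jdbc.batch_size property.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the flush-every-N pattern; no Hibernate involved.
class BatchFlushSketch {
    // Returns how many flushes happen when `rows` items are processed
    // in batches of `batchSize`, including the final flush at commit time.
    static int countFlushes(int rows, int batchSize) {
        List<Integer> pending = new ArrayList<>(); // stands in for the session cache
        int flushes = 0;
        int count = 0;
        for (int row = 0; row < rows; row++) {
            pending.add(row);                      // like session.update(student)
            if (++count % batchSize == 0) {
                pending.clear();                   // like session.flush(); session.clear();
                flushes++;
            }
        }
        if (!pending.isEmpty()) {                  // leftover rows go out at commit
            pending.clear();
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(120, 50)); // 3: after row 50, after row 100, at commit
        System.out.println(countFlushes(100, 50)); // 2: nothing left over at commit
    }
}
```

Clearing the session after each flush is what keeps memory use flat: without it, every scrolled entity would stay in the first-level cache until the transaction ends.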

----------------------------------

 

PUSHPRAJ KUMAR (BI Developer)

DWR

DWR (DIRECT WEB REMOTING)

Introduction
DWR, or Direct Web Remoting, is a Java open source library that helps developers write web sites that include Ajax technology. It is a RPC library which makes it easy to call Java functions from JavaScript and to call JavaScript functions from Java.

How does it work?
It consists of two main parts:
• Code to allow JavaScript to retrieve data from a servlet-based web server using Ajax principles.
• A JavaScript library that makes it easier for the web site developer to dynamically update the web page with the retrieved data.
Basically, DWR generates JavaScript for the Java classes configured in dwr.xml, so that any function written in server-side Java can easily be called from the front end.
Why use DWR?
DWR has a number of features like call batching, marshalling of virtually any data-structure between Java and Javascript (including binary file uploading and downloading), exception handling, advanced CSRF protection and deep integration with several Java server-side technologies like Spring and Guice.
How to implement?
To implement DWR in your Spring application:
1. First download the DWR jars (dwr.jar and commons-logging.jar).
2. Put these jars into your application's jar folder (typically WEB-INF/lib).
3. Create a new configuration file (dwr.xml) in the same place where web.xml is present.
4. Register the DWR servlet in web.xml, so that the servlet container routes requests under /dwr/ to DWR.

<display-name>DWREasyAjax</display-name>
<servlet>
  <display-name>DWR Servlet</display-name>
  <servlet-name>dwr-invoker</servlet-name>
  <servlet-class>org.directwebremoting.servlet.DwrServlet</servlet-class>
  <init-param>
    <param-name>debug</param-name>
    <param-value>true</param-value>
  </init-param>
</servlet>
<servlet-mapping>
  <servlet-name>dwr-invoker</servlet-name>
  <url-pattern>/dwr/*</url-pattern>
</servlet-mapping>

 

5. Then declare in dwr.xml which Java classes you want to access from your front end.
Ex.

<dwr>
  <allow>
    <create creator="new" javascript="HorizontalMenu">
      <param name="class" value="samples.HorizontalMenu" />
    </create>
  </allow>
</dwr>

 

6. Create your Java class (the server-side page) and write whatever business logic you want.
7. Create your front-end page, and include the JavaScript that DWR generates for your class (under the name you configured in dwr.xml) as a js file.
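
For step 6, the server-side class can be an ordinary Java class. Below is a minimal sketch of what samples.HorizontalMenu from the dwr.xml example might look like. The two methods and their return values are purely hypothetical and are not taken from any real project. The package declaration is left out only to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the class named in dwr.xml. In a real application it would be
// declared public in package "samples", so that DWR's "new" creator can
// instantiate it through its no-argument constructor.
class HorizontalMenu {
    // Hypothetical business method: returns the entries to show in the menu.
    public List<String> getMenuItems() {
        return Arrays.asList("Home", "Reports", "Settings");
    }

    // Hypothetical business method with a parameter passed in from the page.
    public String describe(String item) {
        return "You selected " + item;
    }

    public static void main(String[] args) {
        HorizontalMenu menu = new HorizontalMenu();
        System.out.println(menu.getMenuItems());
        System.out.println(menu.describe("Reports"));
    }
}
```

On the front end, DWR would then expose a JavaScript object named HorizontalMenu (the javascript attribute in dwr.xml). Its functions are usually called with a callback as the last argument that receives the return value.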
Some important links:
1. https://www.packtpub.com/books/content/dwr-java-ajax-user-interface-basic-elements-part-1
2. http://java-x.blogspot.in/2007/03/reverse-ajax-with-direct-web-remoting.html
3. http://www.javaworld.com/article/2071890/web-app-frameworks/ajax-made-simple-with-dwr.html
4. http://www.jitendrazaa.com/blog/java/jsp/step-by-step-dwr-application-simple-ajax-in-java/
Pushpraj Kumar
Helical IT Solution

DATA VAULT MODELING

Data Vault Modeling is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that, apart from the modeling aspect, deals with issues such as auditing, tracing of data, loading speed and resilience to change.

Data Vault Modeling focuses on several things:
First, it emphasizes the need to trace where all the data in the database came from. This means that every row in a Data Vault must be accompanied by record source and load date attributes, enabling an auditor to trace values back to the source.
Second, it makes no distinction between good and bad data (“bad” meaning not conforming to business rules). This is summarized in the statement that a Data Vault stores “a single version of the facts”, as opposed to the practice in other data warehouse methods of storing “a single version of the truth”, where data that does not conform to the definitions is removed or “cleansed”.
Third, the modeling method is designed to be resilient to change in the business environment where the data being stored is coming from, by explicitly separating structural information from descriptive attributes.
Finally, Data Vault is designed to enable parallel loading as much as possible, so that very large implementations can scale out without the need for major redesign.

Data Vault’s philosophy is that all data is relevant data, even if it is not in line with established definitions and business rules. If data is not conforming to these definitions and rules then that is a problem for the business, not the data warehouse. The determination of data being “wrong” is an interpretation of the data that stems from a particular point of view that may not be valid for everyone or at every point in time. Therefore the Data Vault must capture all data and only when reporting or extracting data from the Data Vault is the data being interpreted.

BASIC NOTIONS
Data Vault attempts to solve the problem of dealing with change in the environment by separating the business keys (that do not mutate as often, because they uniquely identify a business entity) and the associations between those business keys, from the descriptive attributes of those keys.

The business keys and their associations are structural attributes, forming the skeleton of the data model. The Data Vault method has as one of its main axioms that real business keys only change when the business changes and are therefore the most stable elements from which to derive the structure of a historical database. If you use these keys as the backbone of a Data Warehouse, you can organize the rest of the data around them. This means that choosing the correct keys for the Hubs is of prime importance for the stability of your model. The keys are stored in tables with a few constraints on the structure. These key-tables are called Hubs.
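
The separation between a Hub (business key only) and the descriptive attributes stored around it can be illustrated with a few lines of code. This is a minimal in-memory sketch, not a real warehouse implementation: the class names, the customer key, and the source system names are all hypothetical, and the descriptive rows use the usual Data Vault term "Satellite". Every row carries the load date and record source attributes described earlier.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of one Hub with one Satellite.
class DataVaultSketch {
    // Hub row: the business key plus the audit attributes every row carries.
    static final class HubRow {
        final String businessKey;
        final LocalDate loadDate;
        final String recordSource;
        HubRow(String businessKey, LocalDate loadDate, String recordSource) {
            this.businessKey = businessKey;
            this.loadDate = loadDate;
            this.recordSource = recordSource;
        }
    }

    // Satellite row: descriptive data for one hub key as seen in one load.
    static final class SatelliteRow {
        final String businessKey;
        final String description;
        final LocalDate loadDate;
        final String recordSource;
        SatelliteRow(String businessKey, String description, LocalDate loadDate, String recordSource) {
            this.businessKey = businessKey;
            this.description = description;
            this.loadDate = loadDate;
            this.recordSource = recordSource;
        }
    }

    private final Map<String, HubRow> hub = new LinkedHashMap<>();
    private final List<SatelliteRow> satellite = new ArrayList<>();

    // Loads one source record: the hub keeps the first sighting of a key, while
    // every load appends a satellite row, so earlier facts are never overwritten.
    void load(String businessKey, String description, LocalDate loadDate, String recordSource) {
        hub.putIfAbsent(businessKey, new HubRow(businessKey, loadDate, recordSource));
        satellite.add(new SatelliteRow(businessKey, description, loadDate, recordSource));
    }

    int hubCount() { return hub.size(); }
    int satelliteCount() { return satellite.size(); }

    // Tiny example: the same customer arrives twice with a changed description.
    static DataVaultSketch demo() {
        DataVaultSketch vault = new DataVaultSketch();
        vault.load("CUST-001", "Acme Ltd", LocalDate.of(2016, 1, 1), "CRM");
        vault.load("CUST-001", "Acme Limited", LocalDate.of(2016, 2, 1), "BILLING");
        return vault;
    }

    public static void main(String[] args) {
        DataVaultSketch vault = demo();
        System.out.println("hub rows = " + vault.hubCount()
            + ", satellite rows = " + vault.satelliteCount()); // hub rows = 1, satellite rows = 2
    }
}
```

The hub stays at one row because the business key did not change, while both versions of the description are kept with their own load date and record source.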

DATA VAULT AND DIMENSIONAL MODELLING
The Data Vault modelled layer is normally used to store data. It is not optimized for query performance, nor is it easy to query by the well-known query-tools such as Cognos, SAP Business Objects, Pentaho et al. Since these end-user computing tools expect or prefer their data to be contained in a dimensional model, a conversion is usually necessary.
For performance reasons the dimensional model will usually be implemented in relational tables, after approval.
Note that while it is relatively straightforward to move data from a Data Vault model to a (cleansed) dimensional model, the reverse is not as easy.


For more detail, refer to: http://www.slideshare.net/dlinstedt

PUSHPRAJ (BI-Developer)