Skip to Main Content

How To Query Hive and Impala from Oracle using ODBC Heterogeneous Gateway

unknown-1040115Jul 6 2016 — edited Sep 1 2016

Introduction

Big Data technologies have spread throughout organizations and the challenge of integration with legacy systems must be addressed somehow. There are several ways to connect Oracle or other RDBMS systems with the big data ecosystem, such as dedicated connectors, ETL, and custom ones. In the following article we will demonstrate one simple method to connect and query from Oracle directly to Apache Hive and Cloudera Impala using the ODBC heterogeneous gateway. Heterogeneous gateways were developed by Oracle to address the direct connectivity and type translations from non-Oracle to Oracle databases. There are specialized gateways, subjected to licensing and the generic ODBC gateway which is free. The ODBC heterogeneous gateway is a generic gateway that can be used practically with any data source that provides a functional ODBC driver.

Apache Hive is a data warehouse framework built on top of Hadoop used for mostly for analytics. It is using a query-like language called HiveQL. All queries are translated into map reduce jobs and executed further in hadoop. It lacks transaction support and comes also with several limitations. More about Hive at https://hive.apache.org.

Cloudera Impala is a more elaborated framework, starting to get wider acceptance and support by more and more vendors such as Amazon, Oracle, MapR, and others. It is implemented as a Massive Parallel Processing query engine with SQL ANSI compatibility, having low latency that completely bypasses map reduce operations. Also it has support for many different file formats.  More about Impala (http://www.cloudera.com/documentation/enterprise/latest/topics/impala.html). 

Environmental Setup

Oracle server

Operating system: Oracle Virtual Box guest running OEL 6.5

Database server: Oracle Enterprise Edition Ver. 12.1.0.2.0

Hostname: Node1

IP address: 192.168.1.20

Cloudera QuickStart

Operating system: Oracle Virtual Box guest running CentOS 6.7

Cloudera version: 5.7

Hostname: Cloudera

IP address: 192.168.1.88

To setup the Cloudera environment we used a Cloudera Quickstart image that can be downloaded from http://www.cloudera.com/downloads/quickstart_vms/5-7.html. The hive tables used in this article are created according to Cloudera tutorial that can be found here (http://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-1.html)

Installing Cloudera Hive and Impala ODBC driver

As a perquisite to install the Cloudera Hive and Impala ODBC drivers we need to have installed unixODBC or iODBC. We have opted for unixODBC and installed as follows:

[root@node1 etc]# yum install unixODBC

Loaded plugins: refresh-packagekit, security

..........................................................................................................

Setting up Install Process

Resolving Dependencies

--> Running transaction check

--> Package unixODBC.x86_64 0:2.2.14-14.el6 will be installed

--> Finished Dependency Resolution

Dependencies Resolved

=============================================================

Package                     Arch                     Version                                    Repository                               

Size

=============================================================

Installing:

unixODBC                x86_64                    2.2.14-14.el6                           ol6_latest                              

377 k

Transaction Summary

============================================================

Install       1 Package(s)

Total download size: 377 k

Installed size: 1.1 M

Is this ok [y/N]: y

Downloading Packages:

unixODBC-2.2.14-14.el6.x86_64.rpm 

….............................................................................................................................  

Next, download the Cloudera Hive ODBC driver from the following address:

http://www.cloudera.com/downloads/connectors/hive/odbc/2-5-12.html

As root execute the following command to install the Cloudera Hive ODBC driver :

[root@node1 kit]# rpm -Uhv ClouderaHiveODBC-2.5.19.1004-1.el6.x86_64.rpm

Preparing...                                    ########################################### [100%]

   1:ClouderaHiveODBC                ########################################### [100%]

[root@node1 kit]#

Download the Cloudera Impala ODBC driver form the following location:

(http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html)

As root execute the following command to install the odbc cloudera impala driver :

[root@node1 kit]# rpm -Uhv ClouderaImpalaODBC-2.5.33.1004-1.el6.x86_64.rpm

Preparing...                                                           ###################### [100%]

   1:ClouderaImpalaODBC                                   ###################### [100%]

[root@node1 kit]#

Configuring the Cloudera Hive ODBC driver

The driver installation directory will be located in /opt/cloudera/hiveodbc. In the /opt/cloudera/hiveodbc/Setup directory we will find sample configuration files that can be used with few modification for performing the setup.

Create a directory as root /usr/local/odbc.This directory will be used as placement for odbcinst.ini:

[root@node1 ~]# mkdir -p /usr/local/odbc

Copy odbc.ini in /home/oracle directory as follows:

[root@node1 Setup]# cp odbc.ini /home/oracle


Next copy cloudera.hiveodbc.ini from /opt/cloudera/hiveodbc/lib/64 to /etc directory as follows:

[root@node1 64]# cp cloudera.hiveodbc.ini /etc

Next, we will configure the environment variables for Cloudera Hive ODBC driver.

Open /home/oracle/.bash_profile for editing and add the following variables (marked with bold) as follows:

Next add in the /home/oracle/.bash_profile the following variables.

[oracle@node1 ~]$ vi .bash_profile

# .bash_profile

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

            . ~/.bashrc

fi

# User specific environment and startup programs

export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1

export ORACLE_SID=ORCL

export ODBCINI=~/odbc.ini

export ODBCSYSINI=/usr/local/odbc

export CLOUDERAHIVEINI=/etc/cloudera.hiveodbc.ini

PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin

export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib

Source the profile to get the environment variables loaded as follows:

[oracle@node1 ~]$ source .bash_profile

The odbc.ini file copied under /home/oracle is containing two examples for configuring Data source names (DSN) for the 32 and 64 driver version. All parameters are explained with comments in the sample files, comments which are omitted from the current listing. If you are interested in learning more about these parameter then check the driver documentation at http://www.cloudera.com/documentation/other/connectors/hive-odbc/latest.html.

We will change only the host, port, and UID under [Cloudera ODBC Driver for Apache Hive (64-bit) DSN] and rename the data source name to HIVEDSN as follows:

[HiveDSN]

Description     = Cloudera ODBC Driver for Apache Hive (64-bit) DSN

Driver              = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

HOST                        = 192.168.1.88

PORT             = 10000

Schema            = default

ServiceDiscoveryMode                      = 0

ZKNamespace =

HiveServerType          = 2

AuthMech                  = 2

ThriftTransport          = 1

UseNativeQuery         = 0

KrbHostFQDN          = [Hive Server 2 Host FQDN]

KrbServiceName         = [Hive Server 2 Kerberos service name]

KrbRealm                    = [Hive Server 2 Kerberos realm]

SSL                              = 0

TwoWaySSL               = 0

ClientCert                   =

ClientPrivateKey        =

ClientPrivateKeyPassword                =

UID=cloudera

Next, test the connection as follows:

[oracle@node1 ~]$ isql -v hivedsn

+---------------------------------------+

| Connected!                            |

|                                                |

| sql-statement                         |

| help [tablename]                   |

| quit                                        |

|                                                |

+---------------------------------------+

SQL> 

List the tables and try a count against the orders table to ensure that everything is working fine:

SQL> show tables;

+-----------------------------------+

| tab_name |+-----------------------------------------+

| categories || customers || departments || order_items || orders || products |+------------------------------------------+

SQLRowCount returns -1

6 rows fetched

SQL> select count(*) from orders;

+---------------------+

| EXPR_1              |

+---------------------+

| 68883               |

+---------------------+

SQLRowCount returns -1

1 rows fetched

The connection and data retrieval seems to work fine.

Oracle ODBC transparent gateway configuration for Cloudera Hive ODBC

Under the directory $ORACLE_HOME/hs/admin we created a initialization parameters file named initHIVEDSN.ora (here we used the dg4odbc.ini file as a template) with the following contents:

[oracle@node1 admin]$ vi initHIVEDSN.ora

# This is a sample agent init file that contains the HS parameters that are needed for the Database Gateway for ODBC

HS_FDS_CONNECT_INFO = "HIVEDSN"

HS_FDS_TRACE_LEVEL = 0

HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

#

# ODBC specific environment variables

#

set ODBCINI=/home/oracle/odbc.ini

Note: The name of the init file must be init<HS_FDS_CONNECT_INFO value>.ora otherwise the connection will fail.

Listener configuration

Open the listener.ora file

[oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

# listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

# Generated by Oracle configuration tools.

SID_LIST_LISTENER =

  (SID_LIST =

    (SID_DESC =

      (PROGRAM = dg4odbc)

      (ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so)

      (SID_NAME = HIVEDSN)

      (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)

    )

  )

LISTENER =

  (DESCRIPTION_LIST =

    (DESCRIPTION =

      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))

    )

    (DESCRIPTION =

      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))

    )

  )

ADR_BASE_LISTENER = /u01/app/oracle

Add a network service in tnsnames.ora, called HIVEDSN as follows :

HIVEDSN =

  (DESCRIPTION=

(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))

(CONNECT_DATA=(SID=HIVEDSN))

      (HS=OK)

    )

Reload the listener as follows:

[oracle@node1 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

The command completed successfully

  [oracle@node1 admin]$

Check the services availability:

[oracle@node1 admin]$ lsnrctl status

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

STATUS of the LISTENER

-----------------------------------------------------------------------------------------------------------------------

Alias                            LISTENER

Version                        TNSLSNR for Linux: Version 12.1.0.2.0 - Production

Start Date                    19-JUN-2016 08:04:43

Uptime                        1 days 0 hr. 1 min. 49 sec

Trace Level                 off

Security                       ON: Local OS Authentication

SNMP                          OFF

Listener Parameter File   /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

Listener Log File /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml

Listening Endpoints Summary...

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))

  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))

(Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet))(Presentation=HTTP)(Session=RAW))

Services Summary...

Service "HIVEDSN" has 1 instance(s).

  Instance "HIVEDSN", status UNKNOWN, has 1 handler(s) for this service...

Service "ORCL" has 1 instance(s).

  Instance "ORCL", status READY, has 1 handler(s) for this service...

Service "ORCLXDB" has 1 instance(s).

  Instance "ORCL", status READY, has 1 handler(s) for this service...

The command completed successfully

Ping the network service HIVEDSN to check its availability as follows:

[oracle@node1 admin]$ tnsping HIVEDSN

TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56

Copyright (c) 1997, 2014, Oracle.  All rights reserved.

Used parameter files:

/u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION= (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)) (CONNECT_DATA=(SID=HIVEDSN)) (HS=OK))

OK (10 msec)

Create a public database link called hivedsn as follows:

[oracle@node1 admin]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> create public database link hivedsn connect to cloudera identified by cloudera using 'HIVEDSN';

Database link created.

Next, try to fetch some data from hive using the database link:

SQL> select * from customers@hivedsn;

select * from customers@hivedsn

                        *

ERROR at line 1:

ORA-28500: connection from ORACLE to a non-Oracle system returned this message:

[Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function.

{HY000,NativeErr = 11560}

ORA-02063: preceding 2 lines from HIVEDSN

The SQLGetPrivateProfileString function is contained and exported from libodbcinst.so shared library. To solve this issue we have to add the complete path (with bold) into /etc/cloudera.hiveodbc.ini driver configuration file as follows.:

[Driver]

ODBCInstLib=/usr/lib64/libodbcinst.so

ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/

LogLevel=0

LogPath=

SwapFilePath=/tmp

Also if the Hive ODBC driver is not linked with libodbcinst.so you should add the LD_PRELOAD variable to bash_profile or issue export LD_PRELOAD=/usr/lib64/libodbcinst.so.

[oracle@node1 ~]$ export LD_PRELOAD=/usr/lib64/libodbcinst.so

Verify that libodbcinst.so is loaded as follows:

[oracle@node1 ~]$ ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

            linux-vdso.so.1 =>  (0x00007fffad5ff000)

            libdl.so.2 => /lib64/libdl.so.2 (0x00007fd1af2b2000)

            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd1af094000)

            libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007fd1aee7a000)

            librt.so.1 => /lib64/librt.so.1 (0x00007fd1aec72000)

            libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fd1aea57000)

            libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fd1ae854000)

            libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fd1ae54e000)

            libm.so.6 => /lib64/libm.so.6 (0x00007fd1ae2c9000)

            libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd1ae0b3000)

            libc.so.6 => /lib64/libc.so.6 (0x00007fd1add20000)

            /lib64/ld-linux-x86-64.so.2 (0x00000033a1a00000)

            libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd1adae8000)

            libfreebl3.so => /lib64/libfreebl3.so (0x00007fd1ad886000)

Reload the listener, connect again and reissue the select as follows:

[oracle@node1 admin]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:23:36 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select * from customers@hivedsn;

select * from customers@hivedsn

                        *

ERROR at line 1:

ORA-28500: connection from ORACLE to a non-Oracle system returned this message:

[Cloudera][ODBC] (11470) Transactions are not supported. {HYC00,NativeErr =

11470}

ORA-02063: preceding 2 lines from HIVEDSN

The cause of this error is that  Hive is a non transactional/non logging data store, therefore we should issue our transactions in read only and non-logging mode. We can tweak this by using the heterogeneous service HS_TRANSACTION_MODEL initialization parameter changed to READ_ONLY_AUTOCOMMIT. More about initialization parameters used by ODBC heterogeneous gateway can be found at (https://docs.oracle.com/database/121/ODBCU/feature.htm#ODBCU763)

Open the initHIVEDSN.ora parameters file for editing and add the HS_TRANSACTION_MODEL parameter as follows:

# This is a sample agent init file that contains the HS parameters that are

# needed for the Database Gateway for ODBC

HS_FDS_CONNECT_INFO = "HIVEDSN"

HS_FDS_TRACE_LEVEL = 0

HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

HS_TRANSACTION_MODEL=READ_ONLY_AUTOCOMMIT

#

# ODBC specific environment variables

#

set ODBCINI=/home/oracle/odbc.ini

Save, connect again and reissue the statement as follows:

SQL> select * from customers@hivedsn

-------------------------------------------------------------------------------------------

1530 David        Smith           XXXXXXXXX         XXXXXXXXX                 

The connection and data retrieval is fully functional.

Configuring the Cloudera Impala ODBC driver

The driver installation directory will be located in /opt/cloudera/impalaodbc. In the /opt/cloudera/impalaodbc/Setup directory we will find sample configuration files that can be used with few modification for performing the setup.

Create a directory as root /usr/local/odbc as follows:

[root@node1 ~]# mkdir -p /usr/local/odbc

This directory will be used as placement for odbcinst.ini.

Copy odbc.ini in /home/oracle directory as follows:

[root@node1 Setup]# cp odbc.ini /home/oracle

Next, copy cloudera.impalaodbc.ini from /opt/cloudera/hiveodbc/lib/64

[root@node1 64]# cp cloudera.impalaodbc.ini /etc

Next, we will configure the environment variables for Cloudera Impala ODBC driver. Open .bash_profile for editing and add the following variables (marked with bold) as follows:

[oracle@node1 ~]$ vi .bash_profile

# .bash_profile

# Get the aliases and functions

if [ -f ~/.bashrc ]; then

            . ~/.bashrc

fi

# User specific environment and startup programs

export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1

export ORACLE_SID=ORCL

export ODBCINI=~/odbc.ini

export ODBCSYSINI=/usr/local/odbc

export SIMBAINI=/etc/cloudera.impalaodbc.ini

PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin

  export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib

Source the profile to get the environment variables loaded:

[oracle@node1 ~]$ source .bash_profile

Similarly with the Hive driver, the odbc.ini file copied under /home/oracle is containing two examples for configuring DSN for the 32 and 64 driver version. In our case we are going to use the 64 version. All parameters are explained with comments in the sample file, which are omitted from the listing. For more information about parameters you can consult the driver installation and configuration manual located at http://www.cloudera.com/documentation/other/connectors/impala-odbc/2-5-22.html.

We will change only the host, port, rename the data source to IMPLDSN as follows:

[ImplDSN]

# Description: DSN Description.

# This key is not necessary and is only to give a description of the data source.

Description=Cloudera ODBC Driver for Impala (64-bit) DSN

# Driver: The location where the ODBC driver is installed to.

Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so

HOST=192.168.1.88

PORT=21050

Database=default

AuthMech=0

Test the connection as follows:

[oracle@node1 ~]$ isql -v ImplDSN

+---------------------------------------+

| Connected!                   |

|                                       |

| sql-statement                |

| help [tablename]          |

| quit                               |

|                                       |

+---------------------------------------+

SQL>

List the tables and try a count against one table to ensure that everything is working fine:

SQL> show tables;

+------------------+

|tab_name                                                                                                                                                                                                                                                 |+------------------+

| categories

|| customers

|| departments

|| order_items

|| orders

|| products

|+-------------------+

SQLRowCount returns -1

6 rows fetched

SQL> select count(*) from orders;

+---------------------+

| EXPR_1              |

+---------------------+

| 68883               |

+---------------------+

SQLRowCount returns -1

1 rows fetched

Oracle ODBC Transparent Gateway Configuration for Cloudera Impala ODBC Driver

Under directory $ORACLE_HOME/hs/admin we created an initialization parameters file named initIMPLDSN.ora (we used the dg4odbc.ini file as a template) with the following contents :

[oracle@node1 admin]$ vi initHIVEDSN.ora

# This is a sample agent init file that contains the HS parameters that are

# needed for the Database Gateway for ODBC

HS_FDS_CONNECT_INFO = "IMPLDSN"

HS_FDS_TRACE_LEVEL = 0

HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so

HS_TRANSACTION_MODEL=READ_ONLY_AUTOCOMMIT

#

# ODBC specific environment variables

#

set ODBCINI=/home/oracle/odbc.ini

  Important : The name of the init file must be init<HS_FDS_CONNECT_INFO value>.ora otherwise the connection will fail.

Add also libodbcinst.so path in /etc/cloudera.impalaodbc.ini driver configuration file as follows :

[Driver]

ODBCInstLib=/usr/lib64/libodbcinst.so

ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/

LogLevel=0

LogPath=

SwapFilePath=/tmp

Listener configuration

Open the listener.ora file for editing and add the following (marked with bold):

[oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

# listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

# Generated by Oracle configuration tools.

SID_LIST_LISTENER =

  (SID_LIST =

    (SID_DESC =

      (PROGRAM = dg4odbc)

      (ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so)

      (SID_NAME = IMPLDSN)

      (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)

    )

  )

LISTENER =

  (DESCRIPTION_LIST =

    (DESCRIPTION =

      (ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))

    )

    (DESCRIPTION =

      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))

    )

  )

ADR_BASE_LISTENER = /u01/app/oracle

Add a network service in tnsnames.ora as follows:

IMPLDSN =

  (DESCRIPTION=

(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))

      (CONNECT_DATA=(SID=IMPLDSN))

      (HS=OK)

    )

Reload the listener as follows:

[oracle@node1 admin]$ lsnrctl reload

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

The command completed successfully

[oracle@node1 admin]$

Check the services availability:

[oracle@node1 admin]$ lsnrctl status

LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32

Copyright (c) 1991, 2014, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))

STATUS of the LISTENER

------------------------------------------------------------------------------------------------------------------------

Alias                            LISTENER

Version                        TNSLSNR for Linux: Version 12.1.0.2.0 - Production

Start Date                    19-JUN-2016 08:04:43

Uptime                        1 days 0 hr. 1 min. 49 sec

Trace Level                 off

Security                       ON: Local OS Authentication

SNMP                          OFF

Listener Parameter File /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora

Listener Log File             /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml

Listening Endpoints Summary...

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))

(Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet))

(Presentation=HTTP)(Session=RAW))

Services Summary...

Service "IMPLDSN" has 1 instance(s).

  Instance "IMPLDSN", status UNKNOWN, has 1 handler(s) for this service...

Service "ORCL" has 1 instance(s).

  Instance "ORCL", status READY, has 1 handler(s) for this service...

Service "ORCLXDB" has 1 instance(s).

  Instance "ORCL", status READY, has 1 handler(s) for this service...

The command completed successfully

Ping the network service IMPLDSN to check its availability as follows:

[oracle@node1 admin]$ tnsping IMPLDSN

TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56

Copyright (c) 1997, 2014, Oracle.  All rights reserved.

Used parameter files:

/u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora

Used TNSNAMES adapter to resolve the alias

Attempting to contact (DESCRIPTION=

(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))

(CONNECT_DATA=(SID=IMPLDSN)) (HS=OK))

OK (10 msec)

Create a public database link to connect from oracle as follows:

[oracle@node1 admin]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> create public database link impldsn connect to cloudera identified by cloudera using 'IMPLDSN';

  Database link created.

Test the database link as follows:

SQL> select * from customers@impldsn

-----------------------------------------------------------------------------------

1530 David     Smith               XXXXXXXXX         XXXXXXXXX      

  Connection and data retrieval from Cloudera Impala seems fine.

Observations

Whenever possible, use Impala over Hive because of the several limitations in terms of speed and syntax. It is also possible to have several syntax differences when you query from Hive or Impala directly from Oracle. If some complex query should be executed that it is better to create a view locally in Hive or Impala and try to select that directly. Another benefit is that this will leverage query optimizations on the Impala or Hive side.

One example, if you try to run the following query, you will get a warning that the count inside this query is not a supported feature :

SQL> l

  1  select c."category_name", count("order_item_quantity") as count

  2  from order_items@impaladsn oi

  3  inner join products@impaladsn p on oi."order_item_product_id" = p."product_id"

  4  inner join categories@impaladsn c on c."category_id" = p."product_category_id"

  5  group by c."category_name"

  6* order by count("order_item_quantity") desc

SQL> /

select c."category_name", count("order_item_quantity") as count

ERROR at line 1:

ORA-02070: database IMPALADSN does not support in this context

To solve this issue, create a view in Impala :

[cloudera@quickstart Desktop]$ impala-shell

Starting Impala Shell without Kerberos authentication

Connected to quickstart.cloudera:21000

Server version: impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)

***********************************************************************************

Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.

(Impala Shell v2.5.0-cdh5.7.0 (ad3f5ad) built on Wed Mar 23 11:33:33 PDT 2016)

After running a query, type SUMMARY to see a summary of where time was spent.

***********************************************************************************

[quickstart.cloudera:21000] >

create view summaryvw as -- Most popular product categories

                               > select c.category_name, count(order_item_quantity) as count

                               > from order_items oi

                               > inner join products p on oi.order_item_product_id = p.product_id

                               > inner join categories c on c.category_id = p.product_category_id

                               > group by c.category_name

                               > order by count desc

                               > limit 10;

Query: create view summaryvw as -- Most popular product categories

select c.category_name, count(order_item_quantity) as count

from order_items oi

inner join products p on oi.order_item_product_id = p.product_id

inner join categories c on c.category_id = p.product_category_id

group by c.category_name

order by count desc

limit 10

Reissue the query this time against the sumaryvw view created in Impala:

  1* select * from summaryvw@impaladsn

SQL> /

category_name count

-------------------------------------------------------------------------------- ----------

Cleats 24551

Men's Footwear 22246

Women's Apparel                                                                                 21035

Indoor/Outdoor Games 19298

Fishing                                                                                                 17325

Water Sports 15540

Camping & Hiking 13729

Cardio Equipment 12487

Shop By Sport 10984

Electronics 3156

10 rows selected.

Summary

In this article we demonstrated how to use Cloudera Hive and Impala ODBC drivers to connect from Oracle using ODBC heterogeneous gateway. This approach could be very helpful when we want to retrieve small amount of data using relative simple queries and need to setup the connection very quickly. Also the method can be used for archive or backup of sensitive data from Hive or Impala.

About the Authors

Y V Ravi Kumar is an Oracle ACE and Oracle Certified Master (OCM) with 17 years of experience in BFSI vertical. He is also OCP in Oracle 8i, 9i, 10g, 11g &12c and Certified in Golden Gate, RAC, Performance Tuning & Oracle Exadata. He continuously motivates many DBAs and helps the Oracle Community by publishing his tips/ideas/suggestions/solutions in his blog. He has written 40+ OTN articles on Oracle Exadata, Oracle RAC and Oracle GoldenGate for OTN for Spanish, Portuguese and English and 17 articles for TOAD World, 2 Articles for UKOUG, 3 Articles for OTech Magazine and 2 Articles for Redgate. He is a frequent Oracle speaker in @OTN, AIOUG, Sangam and IOUG. Learn more from his profile at @"LaserSoft"

Adrian Neagu has over 15 years of experience as a database administrator, having expertise in various RDBMS systems. He has experience in many areas such as financial industry, the pharmaceutical industry, telecom and aviation. He is an Oracle Certified Master 10g and 11g, Oracle Certified Professional 9i, 10g, and 11g, Cloudera Certified Administrator for Apache Hadoop, IBM DB2 Certified Administrator version 8.1.2 and 9, IBM DB2 9 Advanced Certified Administrator 9, and Sun Certified System Administrator Solaris 10. He is an expert in many areas of database administration, BigData and Operating systems including high-performance tuning, high availability, replication, backup, and recovery.

Comments
Post Details
Added on Jul 6 2016
4 comments
14,891 views