Introduction
Big Data technologies have spread throughout organizations and the challenge of integration with legacy systems must be addressed somehow. There are several ways to connect Oracle or other RDBMS systems with the big data ecosystem, such as dedicated connectors, ETL, and custom ones. In the following article we will demonstrate one simple method to connect and query from Oracle directly to Apache Hive and Cloudera Impala using the ODBC heterogeneous gateway. Heterogeneous gateways were developed by Oracle to address the direct connectivity and type translations from non-Oracle to Oracle databases. There are specialized gateways, subjected to licensing and the generic ODBC gateway which is free. The ODBC heterogeneous gateway is a generic gateway that can be used practically with any data source that provides a functional ODBC driver.
Apache Hive is a data warehouse framework built on top of Hadoop used for mostly for analytics. It is using a query-like language called HiveQL. All queries are translated into map reduce jobs and executed further in hadoop. It lacks transaction support and comes also with several limitations. More about Hive at https://hive.apache.org.
Cloudera Impala is a more elaborated framework, starting to get wider acceptance and support by more and more vendors such as Amazon, Oracle, MapR, and others. It is implemented as a Massive Parallel Processing query engine with SQL ANSI compatibility, having low latency that completely bypasses map reduce operations. Also it has support for many different file formats. More about Impala (http://www.cloudera.com/documentation/enterprise/latest/topics/impala.html).
Environmental Setup
Oracle server
Operating system: Oracle Virtual Box guest running OEL 6.5
Database server: Oracle Enterprise Edition Ver. 12.1.0.2.0
Hostname: Node1
IP address: 192.168.1.20
Cloudera QuickStart
Operating system: Oracle Virtual Box guest running CentOS 6.7
Cloudera version: 5.7
Hostname: Cloudera
IP address: 192.168.1.88
To setup the Cloudera environment we used a Cloudera Quickstart image that can be downloaded from http://www.cloudera.com/downloads/quickstart_vms/5-7.html. The hive tables used in this article are created according to Cloudera tutorial that can be found here (http://www.cloudera.com/developers/get-started-with-hadoop-tutorial/exercise-1.html)
Installing Cloudera Hive and Impala ODBC driver
As a perquisite to install the Cloudera Hive and Impala ODBC drivers we need to have installed unixODBC or iODBC. We have opted for unixODBC and installed as follows:
[root@node1 etc]# yum install unixODBC
Loaded plugins: refresh-packagekit, security
..........................................................................................................
Setting up Install Process
Resolving Dependencies
--> Running transaction check
--> Package unixODBC.x86_64 0:2.2.14-14.el6 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
=============================================================
Package Arch Version Repository
Size
=============================================================
Installing:
unixODBC x86_64 2.2.14-14.el6 ol6_latest
377 k
Transaction Summary
============================================================
Install 1 Package(s)
Total download size: 377 k
Installed size: 1.1 M
Is this ok [y/N]: y
Downloading Packages:
unixODBC-2.2.14-14.el6.x86_64.rpm
….............................................................................................................................
Next, download the Cloudera Hive ODBC driver from the following address:
http://www.cloudera.com/downloads/connectors/hive/odbc/2-5-12.html
As root execute the following command to install the Cloudera Hive ODBC driver :
[root@node1 kit]# rpm -Uhv ClouderaHiveODBC-2.5.19.1004-1.el6.x86_64.rpm
Preparing... ########################################### [100%]
1:ClouderaHiveODBC ########################################### [100%]
[root@node1 kit]#
Download the Cloudera Impala ODBC driver form the following location:
(http://www.cloudera.com/downloads/connectors/impala/odbc/2-5-22.html)
As root execute the following command to install the odbc cloudera impala driver :
[root@node1 kit]# rpm -Uhv ClouderaImpalaODBC-2.5.33.1004-1.el6.x86_64.rpm
Preparing... ###################### [100%]
1:ClouderaImpalaODBC ###################### [100%]
[root@node1 kit]#
Configuring the Cloudera Hive ODBC driver
The driver installation directory will be located in /opt/cloudera/hiveodbc. In the /opt/cloudera/hiveodbc/Setup directory we will find sample configuration files that can be used with few modification for performing the setup.
Create a directory as root /usr/local/odbc.This directory will be used as placement for odbcinst.ini:
[root@node1 ~]# mkdir -p /usr/local/odbc
Copy odbc.ini in /home/oracle directory as follows:
[root@node1 Setup]# cp odbc.ini /home/oracle
Next copy cloudera.hiveodbc.ini from /opt/cloudera/hiveodbc/lib/64 to /etc directory as follows:
[root@node1 64]# cp cloudera.hiveodbc.ini /etc
Next, we will configure the environment variables for Cloudera Hive ODBC driver.
Open /home/oracle/.bash_profile for editing and add the following variables (marked with bold) as follows:
Next add in the /home/oracle/.bash_profile the following variables.
[oracle@node1 ~]$ vi .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export ORACLE_SID=ORCL
export ODBCINI=~/odbc.ini
export ODBCSYSINI=/usr/local/odbc
export CLOUDERAHIVEINI=/etc/cloudera.hiveodbc.ini
PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib
Source the profile to get the environment variables loaded as follows:
[oracle@node1 ~]$ source .bash_profile
The odbc.ini file copied under /home/oracle is containing two examples for configuring Data source names (DSN) for the 32 and 64 driver version. All parameters are explained with comments in the sample files, comments which are omitted from the current listing. If you are interested in learning more about these parameter then check the driver documentation at http://www.cloudera.com/documentation/other/connectors/hive-odbc/latest.html.
We will change only the host, port, and UID under [Cloudera ODBC Driver for Apache Hive (64-bit) DSN] and rename the data source name to HIVEDSN as follows:
[HiveDSN]
Description = Cloudera ODBC Driver for Apache Hive (64-bit) DSN
Driver = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HOST = 192.168.1.88
PORT = 10000
Schema = default
ServiceDiscoveryMode = 0
ZKNamespace =
HiveServerType = 2
AuthMech = 2
ThriftTransport = 1
UseNativeQuery = 0
KrbHostFQDN = [Hive Server 2 Host FQDN]
KrbServiceName = [Hive Server 2 Kerberos service name]
KrbRealm = [Hive Server 2 Kerberos realm]
SSL = 0
TwoWaySSL = 0
ClientCert =
ClientPrivateKey =
ClientPrivateKeyPassword =
UID=cloudera
Next, test the connection as follows:
[oracle@node1 ~]$ isql -v hivedsn
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
List the tables and try a count against the orders table to ensure that everything is working fine:
SQL> show tables;
+-----------------------------------+
| tab_name |+-----------------------------------------+
| categories || customers || departments || order_items || orders || products |+------------------------------------------+
SQLRowCount returns -1
6 rows fetched
SQL> select count(*) from orders;
+---------------------+
| EXPR_1 |
+---------------------+
| 68883 |
+---------------------+
SQLRowCount returns -1
1 rows fetched
The connection and data retrieval seems to work fine.
Oracle ODBC transparent gateway configuration for Cloudera Hive ODBC
Under the directory $ORACLE_HOME/hs/admin we created a initialization parameters file named initHIVEDSN.ora (here we used the dg4odbc.ini file as a template) with the following contents:
[oracle@node1 admin]$ vi initHIVEDSN.ora
# This is a sample agent init file that contains the HS parameters that are needed for the Database Gateway for ODBC
HS_FDS_CONNECT_INFO = "HIVEDSN"
HS_FDS_TRACE_LEVEL = 0
HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
#
# ODBC specific environment variables
#
set ODBCINI=/home/oracle/odbc.ini
Note: The name of the init file must be init<HS_FDS_CONNECT_INFO value>.ora otherwise the connection will fail.
Listener configuration
Open the listener.ora file
[oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(PROGRAM = dg4odbc)
(ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so)
(SID_NAME = HIVEDSN)
(ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
)
)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
)
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
Add a network service in tnsnames.ora, called HIVEDSN as follows :
HIVEDSN =
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))
(CONNECT_DATA=(SID=HIVEDSN))
(HS=OK)
)
Reload the listener as follows:
[oracle@node1 admin]$ lsnrctl reload
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
The command completed successfully
[oracle@node1 admin]$
Check the services availability:
[oracle@node1 admin]$ lsnrctl status
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
STATUS of the LISTENER
-----------------------------------------------------------------------------------------------------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date 19-JUN-2016 08:04:43
Uptime 1 days 0 hr. 1 min. 49 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))
(Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet))(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "HIVEDSN" has 1 instance(s).
Instance "HIVEDSN", status UNKNOWN, has 1 handler(s) for this service...
Service "ORCL" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
Service "ORCLXDB" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
The command completed successfully
Ping the network service HIVEDSN to check its availability as follows:
[oracle@node1 admin]$ tnsping HIVEDSN
TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56
Copyright (c) 1997, 2014, Oracle. All rights reserved.
Used parameter files:
/u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION= (ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)) (CONNECT_DATA=(SID=HIVEDSN)) (HS=OK))
OK (10 msec)
Create a public database link called hivedsn as follows:
[oracle@node1 admin]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> create public database link hivedsn connect to cloudera identified by cloudera using 'HIVEDSN';
Database link created.
Next, try to fetch some data from hive using the database link:
SQL> select * from customers@hivedsn;
select * from customers@hivedsn
*
ERROR at line 1:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function.
{HY000,NativeErr = 11560}
ORA-02063: preceding 2 lines from HIVEDSN
The SQLGetPrivateProfileString function is contained and exported from libodbcinst.so shared library. To solve this issue we have to add the complete path (with bold) into /etc/cloudera.hiveodbc.ini driver configuration file as follows.:
[Driver]
ODBCInstLib=/usr/lib64/libodbcinst.so
ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp
Also if the Hive ODBC driver is not linked with libodbcinst.so you should add the LD_PRELOAD variable to bash_profile or issue export LD_PRELOAD=/usr/lib64/libodbcinst.so.
[oracle@node1 ~]$ export LD_PRELOAD=/usr/lib64/libodbcinst.so
Verify that libodbcinst.so is loaded as follows:
[oracle@node1 ~]$ ldd /opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
linux-vdso.so.1 => (0x00007fffad5ff000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fd1af2b2000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd1af094000)
libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007fd1aee7a000)
librt.so.1 => /lib64/librt.so.1 (0x00007fd1aec72000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fd1aea57000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fd1ae854000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fd1ae54e000)
libm.so.6 => /lib64/libm.so.6 (0x00007fd1ae2c9000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd1ae0b3000)
libc.so.6 => /lib64/libc.so.6 (0x00007fd1add20000)
/lib64/ld-linux-x86-64.so.2 (0x00000033a1a00000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd1adae8000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007fd1ad886000)
Reload the listener, connect again and reissue the select as follows:
[oracle@node1 admin]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:23:36 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> select * from customers@hivedsn;
select * from customers@hivedsn
*
ERROR at line 1:
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Cloudera][ODBC] (11470) Transactions are not supported. {HYC00,NativeErr =
11470}
ORA-02063: preceding 2 lines from HIVEDSN
The cause of this error is that Hive is a non transactional/non logging data store, therefore we should issue our transactions in read only and non-logging mode. We can tweak this by using the heterogeneous service HS_TRANSACTION_MODEL initialization parameter changed to READ_ONLY_AUTOCOMMIT. More about initialization parameters used by ODBC heterogeneous gateway can be found at (https://docs.oracle.com/database/121/ODBCU/feature.htm#ODBCU763)
Open the initHIVEDSN.ora parameters file for editing and add the HS_TRANSACTION_MODEL parameter as follows:
# This is a sample agent init file that contains the HS parameters that are
# needed for the Database Gateway for ODBC
HS_FDS_CONNECT_INFO = "HIVEDSN"
HS_FDS_TRACE_LEVEL = 0
HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HS_TRANSACTION_MODEL=READ_ONLY_AUTOCOMMIT
#
# ODBC specific environment variables
#
set ODBCINI=/home/oracle/odbc.ini
Save, connect again and reissue the statement as follows:
SQL> select * from customers@hivedsn
-------------------------------------------------------------------------------------------
1530 David Smith XXXXXXXXX XXXXXXXXX
The connection and data retrieval is fully functional.
Configuring the Cloudera Impala ODBC driver
The driver installation directory will be located in /opt/cloudera/impalaodbc. In the /opt/cloudera/impalaodbc/Setup directory we will find sample configuration files that can be used with few modification for performing the setup.
Create a directory as root /usr/local/odbc as follows:
[root@node1 ~]# mkdir -p /usr/local/odbc
This directory will be used as placement for odbcinst.ini.
Copy odbc.ini in /home/oracle directory as follows:
[root@node1 Setup]# cp odbc.ini /home/oracle
Next, copy cloudera.impalaodbc.ini from /opt/cloudera/hiveodbc/lib/64
[root@node1 64]# cp cloudera.impalaodbc.ini /etc
Next, we will configure the environment variables for Cloudera Impala ODBC driver. Open .bash_profile for editing and add the following variables (marked with bold) as follows:
[oracle@node1 ~]$ vi .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
export ORACLE_SID=ORCL
export ODBCINI=~/odbc.ini
export ODBCSYSINI=/usr/local/odbc
export SIMBAINI=/etc/cloudera.impalaodbc.ini
PATH=$PATH:$HOME/bin:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$LIBRARY_PATH:$ORACLE_HOME/lib
Source the profile to get the environment variables loaded:
[oracle@node1 ~]$ source .bash_profile
Similarly with the Hive driver, the odbc.ini file copied under /home/oracle is containing two examples for configuring DSN for the 32 and 64 driver version. In our case we are going to use the 64 version. All parameters are explained with comments in the sample file, which are omitted from the listing. For more information about parameters you can consult the driver installation and configuration manual located at http://www.cloudera.com/documentation/other/connectors/impala-odbc/2-5-22.html.
We will change only the host, port, rename the data source to IMPLDSN as follows:
[ImplDSN]
# Description: DSN Description.
# This key is not necessary and is only to give a description of the data source.
Description=Cloudera ODBC Driver for Impala (64-bit) DSN
# Driver: The location where the ODBC driver is installed to.
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=192.168.1.88
PORT=21050
Database=default
AuthMech=0
Test the connection as follows:
[oracle@node1 ~]$ isql -v ImplDSN
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
List the tables and try a count against one table to ensure that everything is working fine:
SQL> show tables;
+------------------+
|tab_name |+------------------+
| categories
|| customers
|| departments
|| order_items
|| orders
|| products
|+-------------------+
SQLRowCount returns -1
6 rows fetched
SQL> select count(*) from orders;
+---------------------+
| EXPR_1 |
+---------------------+
| 68883 |
+---------------------+
SQLRowCount returns -1
1 rows fetched
Oracle ODBC Transparent Gateway Configuration for Cloudera Impala ODBC Driver
Under directory $ORACLE_HOME/hs/admin we created an initialization parameters file named initIMPLDSN.ora (we used the dg4odbc.ini file as a template) with the following contents :
[oracle@node1 admin]$ vi initHIVEDSN.ora
# This is a sample agent init file that contains the HS parameters that are
# needed for the Database Gateway for ODBC
HS_FDS_CONNECT_INFO = "IMPLDSN"
HS_FDS_TRACE_LEVEL = 0
HS_FDS_SHAREABLE_NAME = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HS_TRANSACTION_MODEL=READ_ONLY_AUTOCOMMIT
#
# ODBC specific environment variables
#
set ODBCINI=/home/oracle/odbc.ini
Important : The name of the init file must be init<HS_FDS_CONNECT_INFO value>.ora otherwise the connection will fail.
Add also libodbcinst.so path in /etc/cloudera.impalaodbc.ini driver configuration file as follows :
[Driver]
ODBCInstLib=/usr/lib64/libodbcinst.so
ErrorMessagesPath=/opt/cloudera/impalaodbc/ErrorMessages/
LogLevel=0
LogPath=
SwapFilePath=/tmp
Listener configuration
Open the listener.ora file for editing and add the following (marked with bold):
[oracle@node1 admin]$ vi /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# listener.ora Network Configuration File: /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
# Generated by Oracle configuration tools.
SID_LIST_LISTENER =
(SID_LIST =
(SID_DESC =
(PROGRAM = dg4odbc)
(ARGS = ENVS=LD_LIBRARY_PATH=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so)
(SID_NAME = IMPLDSN)
(ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
)
)
LISTENER =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))
)
(DESCRIPTION =
(ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
)
)
ADR_BASE_LISTENER = /u01/app/oracle
Add a network service in tnsnames.ora as follows:
IMPLDSN =
(DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))
(CONNECT_DATA=(SID=IMPLDSN))
(HS=OK)
)
Reload the listener as follows:
[oracle@node1 admin]$ lsnrctl reload
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:05:57
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
The command completed successfully
[oracle@node1 admin]$
Check the services availability:
[oracle@node1 admin]$ lsnrctl status
LSNRCTL for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:06:32
Copyright (c) 1991, 2014, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1)(PORT=1521)))
STATUS of the LISTENER
------------------------------------------------------------------------------------------------------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date 19-JUN-2016 08:04:43
Uptime 1 days 0 hr. 1 min. 49 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/12.1.0/dbhome_1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=node1)(PORT=5500))
(Security=(my_wallet_directory=/u01/app/oracle/product/12.1.0/dbhome_1/admin/ORCL/xdb_wallet))
(Presentation=HTTP)(Session=RAW))
Services Summary...
Service "IMPLDSN" has 1 instance(s).
Instance "IMPLDSN", status UNKNOWN, has 1 handler(s) for this service...
Service "ORCL" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
Service "ORCLXDB" has 1 instance(s).
Instance "ORCL", status READY, has 1 handler(s) for this service...
The command completed successfully
Ping the network service IMPLDSN to check its availability as follows:
[oracle@node1 admin]$ tnsping IMPLDSN
TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 20-JUN-2016 08:07:56
Copyright (c) 1997, 2014, Oracle. All rights reserved.
Used parameter files:
/u01/app/oracle/product/12.1.0/dbhome_1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION=
(ADDRESS=(PROTOCOL=tcp)(HOST=node1)(PORT=1521))
(CONNECT_DATA=(SID=IMPLDSN)) (HS=OK))
OK (10 msec)
Create a public database link to connect from oracle as follows:
[oracle@node1 admin]$ sqlplus / as sysdba
SQL*Plus: Release 12.1.0.2.0 Production on Mon Jun 20 08:09:35 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
SQL> create public database link impldsn connect to cloudera identified by cloudera using 'IMPLDSN';
Database link created.
Test the database link as follows:
SQL> select * from customers@impldsn
-----------------------------------------------------------------------------------
1530 David Smith XXXXXXXXX XXXXXXXXX
Connection and data retrieval from Cloudera Impala seems fine.
Observations
Whenever possible, use Impala over Hive because of the several limitations in terms of speed and syntax. It is also possible to have several syntax differences when you query from Hive or Impala directly from Oracle. If some complex query should be executed that it is better to create a view locally in Hive or Impala and try to select that directly. Another benefit is that this will leverage query optimizations on the Impala or Hive side.
One example, if you try to run the following query, you will get a warning that the count inside this query is not a supported feature :
SQL> l
1 select c."category_name", count("order_item_quantity") as count
2 from order_items@impaladsn oi
3 inner join products@impaladsn p on oi."order_item_product_id" = p."product_id"
4 inner join categories@impaladsn c on c."category_id" = p."product_category_id"
5 group by c."category_name"
6* order by count("order_item_quantity") desc
SQL> /
select c."category_name", count("order_item_quantity") as count
ERROR at line 1:
ORA-02070: database IMPALADSN does not support in this context
To solve this issue, create a view in Impala :
[cloudera@quickstart Desktop]$ impala-shell
Starting Impala Shell without Kerberos authentication
Connected to quickstart.cloudera:21000
Server version: impalad version 2.5.0-cdh5.7.0 RELEASE (build ad3f5adabedf56fe6bd9eea39147c067cc552703)
***********************************************************************************
Welcome to the Impala shell. Copyright (c) 2015 Cloudera, Inc. All rights reserved.
(Impala Shell v2.5.0-cdh5.7.0 (ad3f5ad) built on Wed Mar 23 11:33:33 PDT 2016)
After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************
[quickstart.cloudera:21000] >
create view summaryvw as -- Most popular product categories
> select c.category_name, count(order_item_quantity) as count
> from order_items oi
> inner join products p on oi.order_item_product_id = p.product_id
> inner join categories c on c.category_id = p.product_category_id
> group by c.category_name
> order by count desc
> limit 10;
Query: create view summaryvw as -- Most popular product categories
select c.category_name, count(order_item_quantity) as count
from order_items oi
inner join products p on oi.order_item_product_id = p.product_id
inner join categories c on c.category_id = p.product_category_id
group by c.category_name
order by count desc
limit 10
Reissue the query this time against the sumaryvw view created in Impala:
1* select * from summaryvw@impaladsn
SQL> /
category_name count
-------------------------------------------------------------------------------- ----------
Cleats 24551
Men's Footwear 22246
Women's Apparel 21035
Indoor/Outdoor Games 19298
Fishing 17325
Water Sports 15540
Camping & Hiking 13729
Cardio Equipment 12487
Shop By Sport 10984
Electronics 3156
10 rows selected.
Summary
In this article we demonstrated how to use Cloudera Hive and Impala ODBC drivers to connect from Oracle using ODBC heterogeneous gateway. This approach could be very helpful when we want to retrieve small amount of data using relative simple queries and need to setup the connection very quickly. Also the method can be used for archive or backup of sensitive data from Hive or Impala.
About the Authors
Y V Ravi Kumar is an Oracle ACE and Oracle Certified Master (OCM) with 17 years of experience in BFSI vertical. He is also OCP in Oracle 8i, 9i, 10g, 11g &12c and Certified in Golden Gate, RAC, Performance Tuning & Oracle Exadata. He continuously motivates many DBAs and helps the Oracle Community by publishing his tips/ideas/suggestions/solutions in his blog. He has written 40+ OTN articles on Oracle Exadata, Oracle RAC and Oracle GoldenGate for OTN for Spanish, Portuguese and English and 17 articles for TOAD World, 2 Articles for UKOUG, 3 Articles for OTech Magazine and 2 Articles for Redgate. He is a frequent Oracle speaker in @OTN, AIOUG, Sangam and IOUG. Learn more from his profile at @"LaserSoft"
Adrian Neagu has over 15 years of experience as a database administrator, having expertise in various RDBMS systems. He has experience in many areas such as financial industry, the pharmaceutical industry, telecom and aviation. He is an Oracle Certified Master 10g and 11g, Oracle Certified Professional 9i, 10g, and 11g, Cloudera Certified Administrator for Apache Hadoop, IBM DB2 Certified Administrator version 8.1.2 and 9, IBM DB2 9 Advanced Certified Administrator 9, and Sun Certified System Administrator Solaris 10. He is an expert in many areas of database administration, BigData and Operating systems including high-performance tuning, high availability, replication, backup, and recovery.