Data Lake & Services

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Oracle R Enterprise Transparency Layer : Viewing behind the Scenes

Brendan TFeb 25 2016 — edited May 25 2016

**

Oracle R Enterprise Transparency Layer : Viewing behind the Scenes**

_by @"Brendan" Brendan Tierney, Oracle ACE Director
_

This purpose of this article is to give you an overview of the Oracle R Enterprise (ORE) Transparency Layer. The ORE Transparency Layer is a core component of Oracle R Enterprise and manages the process of translating your R code into SQL code, which is in turn executed on the data in the Oracle Database. A typical scenario when working with the R language is that the data is extracted to the data analysis machine, and then all the analysis is performed on the data. With Oracle R Enterprise, and using the Transparency Layer, you can keep your data inside the Oracle Database. Since there's zero data movement, your analytics can be performed at greater speed and on greater volumes of data. This is achieved by using the performance and scalabilities of the Oracle Database. An additional benefit is that it ensures the consistent security of your data.

1. Overview of ORE

Oracle R Enterprise (ORE) is a component of the Oracle Advanced Analytics option for the Oracle Database Enterprise Edition. Oracle R Enterprise allows you to use the R language to seamlessly integrate with the Oracle Database, allowing you to use the power and scalability of the Oracle Database. With Oracle R Enterprise you can quickly and easily migrate your R advanced analytics to use the Oracle Database with the minimum of code changes to your R scripts. By having the R language as a component of the Oracle Database significantly expands the statistical, analytical and graphical capabilities of the Oracle Database. It allows you to quickly and easily integrate the vast analytical capabilities of the R language with your applications and BI Dashboards. As part of ORE, you can store R code inside the Oracle Database and call this R code using SQL and PL/SQL.

A core feature of Oracle R Enterprise is the Transparency Layer. This allows for the seamless translation of your R code, where possible, into SQL code to run in the Oracle Database.

This article will demonstrate how the ORE Transparency Layer works using a number of examples. We will explore the meta data provided by the ORE Transparency Layer and what SQL is actually run in the Oracle Database for some R commands.

2.The ORE Transparency Layer

The Transparency Layer of Oracle R Enterprise supports the in-database data exploration, data preparation and analytics. The Transparency Layer allows a data analyst who is using R to seamlessly and with the minimum of changes, to work with data stored in the Oracle Database. The data contained in the tables and views in the database will appear as (ORE) data frames in the R development environment. No data will be moved from the Oracle Database to the R environment on the data analysts PC or laptop. Due to this zero data movement, being able to utilize the scalability and performance features of the Oracle Database allows the data analyst to work with a greater volume of data and get results from their analytics in a significantly shorter time.

When the data analyst writes their R code to process data in the Oracle Database, and this data is defined as an ORE data frame, the ORE Transparency Layer will look to translate the R code into the equivalent SQL code. By doing this you will gain from 35+ years of SQL optimization within the Oracle Database. Nothing can run faster on your data in an Oracle Database than the Oracle SQL Language.

After issuing the R code to run on the data in the Oracle Database, and this code has been executed in the Oracle Database, the Transparency Layer will take the results and will then transform the results into the typical R format. It is this R formatted results that are returennd and presented back to the data analyst.

The purpose of the Transparency Layer is to support in-database data exploration, data preparation and data analysis en route to application of predictive analytics where we have a mix of in-database and CRAN techniques.

In addition to the functionality outlined above the Transparency Layer will:

Stacked and accumulated all the commands. The R code is not executed line by line or function by function as they are processed in the R code.
Only when a final result is needed (for computation or viewing)
Oracle will perform some optimization & reorganization of the commands.
Oracle will perform query optimization.
Oracle will translate into in-Database SQL and PL/SQL function/procedures/packages.
Execute the SQL (& R) commands on the data in the database.
Oracle will manage the results.
The returned results will be translated back into R format.

In the following examples we will explore some examples of how you can examine what SQL code Transparency Layer is generating and running in the Oracle Database.

3. Example 1 : Moving data to the Oracle Database & Selecting Data

In my first example we will look at what happens when we take an R data frame and push it to the Oracle Database. In this example we will use the MTCARS data set that comes as standard with R. If you have R installed then you have this data set.

Why do we want to push a data frame to the Oracle Database. Well there are lots of reasons and most of these will involve wanting to use the power of the Oracle Database server to perform some additional processing on the data.

The following example uses the ore.push function to move the MTCARS data frame to the Oracle Database. The function will create a temporary table object in your schema, with a name beginning with ORE$ and will insert a record in the table for each row in the R data frame.

> cars_dataset

The cars_dataset is a local variable in my R session that contains a pointer to the ORE table in my schema. We can check this out by looking at the class of the object and we are told that it is an ore.frame.

> class(cars_dataset)
[1] "ore.frame"attr(,"package")
[1] "OREbase"

When you look in your schema for this object you will it having a name like ORE$. In my example scenario it is called ORE$73_8.

You can go an DESC this table and view the data that was inserted in this table using SQL.

When it comes to the ORE Transparency layer you may be interested in seeing what it is doing and what kind of information does it have about this object/table in my schema. To view this information you will need to use the R function called str. When we use this function on the cars_dataset ORE data frame we get the following information.

> str(cars_dataset) 
'data.frame':     32 obs. of  11 variables: 
Formal class 'ore.frame' [package "OREbase"] with 12 slots  
 ..@ .Data    : list()  
 ..@ dataQry  : Named chr "( select /*+ no_merge(t) */ VAL012 NAME001,VAL013 NAME002, VAL001 ,VAL002 ,VAL003 ,VAL004 ,VAL005 ,VAL006 ,VAL007 ,VAL008 ,VAL0"| __truncated__  
 .. ..- attr(*, "names")= chr "73_9"  
 ..@ dataObj  : chr "73_9"  
 ..@ desc     :'data.frame':     11 obs. of  2 variables:  
 .. ..$ name  : chr  "mpg" "cyl" "disp" "hp" ...  
 .. ..$ Sclass: chr  "numeric" "numeric" "numeric" "numeric" ...  
 ..@ sqlName  : Named chr  "VAL012" "VAL013"  
 .. ..- attr(*, "names")= chr  "asc" "" 
 ..@ sqlValue : chr  "VAL001" "VAL002" "VAL003" "VAL004" ... 
 ..@ sqlTable : chr "\"ORE_USER\".\"ORE$73_8\"" 
 ..@ sqlPred  : chr ""  
 ..@ extRef   :List of 1  
 .. ..$ :  
 ..@ names    : chr  
 ..@ row.names: int   
 ..@ .S3Class : chr "data.frame"

When you look at this information you will see some of the information displayed is truncated. To overcome this you need to query each of the items individually, just like I have shown below. These examples show you how to display the object/table name in the database, list the attribute/variable names to be used in the ORE data frame, list the data types of each attribute/variable and finally the query used by the ORE Transparency Layer to retrieve the data from the table in the Database.

# What is the name of the table in the Database 
> cars_dataset@sqlTable 
# Get the names of the attributes/variables 
> cars_dataset@desc$name 
# Get the data types of the attributes/variables 
> cars_dataset@desc$Sclass 
# Return the SQL Query used to retrieve the data from the table in the Database 
> cars_dataset@dataQry 
# Display the first 5 records from the ORE table 
> head(cars_dataset, 5)

When you look inside the Oracle Database to see what query was actually made, when we do this we will see that the query is slightly different. The following output is what is shown in v$sql for when we the last R command above.

with OBJ73_9 as 
( select /*+ no_merge(t) */ VAL012 NAME001,VAL013 NAME002, VAL001 ,VAL002 ,VAL003 ,VAL004 ,VAL005 ,VAL006 ,VAL007 ,VAL008 ,VAL009 ,VAL010 ,VAL011
  from "ORE_USER"."ORE$73_8" t  ),
OBJ73_14 as (
               select * 
               from (select * from  OBJ73_9 order by NAME001 asc, NAME002) 
               where  rownum

4. Example 2 : Aggregating Data

In the previous example we got to see some of the ORE Transparency Layer information that is available to us. We also got to see the actual query that was run on the data. The ORE Transparency Layer took a typical R function, translated this function into the equivalent in SQL, executed this query in the database and then presented the results (via the ORE Transparency Layer) back to the user in their R environment.

In the following example we will look at a slightly more complex scenario that involves performing some aggregation on our data. In this particular scenario we will be using a view that is defined over the CUSTOMERS table in the SH schema.

# Aggregate the data in the CUSTOMER_V (a view based on the sh.customer table) 
#   Aggregate based on each value of Customer Gender 
> AggData  AggData 
  CUST_GENDER     x 
F           F 18325 
M           M 37175

Again you can use the str function to examine each of the elements of the ORE Transparency Layer for this function call.

> str(AggData) 
> AggData@dataQry 
> AggData@dataObj 
> AggData@desc$name 
> AggData@sqlName 
> AggData@sqlValue 
> AggData@sqlTable 
> AggData@sqlPred 
> AggData@extRef 
> AggData@row.names 
> AggData@.Data

When we look at the @dataQry slot from the ORE Transparency Layer we can see the structure of the query it will perform in the database.

"( select \"CUST_GENDER\" NAME001, \"CUST_GENDER\" VAL001,count(*) VAL002 from \"ORE_USER\".\"CUSTOMER_V\" where (\"CUST_GENDER\" is not null) group by \"CUST_GENDER\" )"

When we look inside the Oracle Database and user the v$sql view we can see the actual query that was run to retrieve the results

with OBJ73_44 as 
   ( select "CUST_GENDER" NAME001, "CUST_GENDER" VAL001,count(*) VAL002 
     from "ORE_USER"."CUSTOMER_V" 
     where ("CUST_GENDER" is not null) group by "CUST_GENDER" ),
OBJ73_46 as 
   ( select /*+ no_merge(t) */ 1 NAME001, count(*) VAL001 from OBJ73_44 t  )  
select  *  from  OBJ73_46 t order by NAME001

5. Summary

In this technical paper we have explored some part of ORE Transparency Layer. The examples shown demonstrated how you can discover and access the ORE Transparency Layer meta data and how you can query/retrieve this information in your R environment. I also included the actual SQL that was run in the Oracle Database for each of the examples. You can find the actual SQL statements run in the Oracle Database using the v$sql view.

We also examined R code examples that can be translated, via the ORE Transparency Layer, into SQL statements. Many of the typical R statistical functions can be mapped into equivalent Oracle SQL. In scenarios when this is not possible an R process will be spanned on the Oracle Database server. This will be a topic for another technical article.

About the Author

Brendan Tierney [@"Brendan" ] is an Oracle ACE Director and an independent consultant based in Dublin, Ireland. He has 22+ years of extensive experience working in the areas of Data Mining, Predictive Analytics, Data Warehousing and Data Architecture. Brendan regularly works on projects in Ireland, UK, Europe, Canada and USA. For these projects Brendan is based in his home office and remotely connects to each organization network. Brendan is the editor of the UKOUG Oracle Scene magazine and deputy chair of the OUG Ireland BI & Big Data SIG. Brendan is a regular speaker at conferences across Europe and the USA and has written technical articles for OTN, Oracle Scene, IOUG SELECT Journal and ODTUG Technical Journal.

Brendan has published a number of books. These include the following books published by Oracle Press / McGraw-Hill:

Predictive Analytics using Oracle Data Miner (August 2014)
Oracle R Enterprise: Harnessing the Power of R in the Oracle Database (2016)
Real World SQL & PL/SQL: Advice from the Experts (2016), co-written with Arup Nada, Heli Helskyaho, Martin Widlake and Alex Nuijten.

Visit Brendan's Community profile for links to Brendan's blog, and contact information.

Added on Feb 25 2016

#legacy-documents

3 comments

1,455 views