Tectonic shifts in the technology space usually precipitate out of a necessity or a desire to change the status quo. Big data, as we know it today, is one such tectonic shift, born of the need to keep up with the deluge of data produced by the automation of pretty much everything that touches our lives today.
Big data encompasses data sets so large and complex that traditional data processing cannot keep up (think petabytes and exabytes of data). It matters in the Oracle Bare Metal Cloud Services context because the cloud has been the accelerator for big data adoption, making it accessible to all. Big data belongs in the cloud: the elasticity of the cloud makes big data analysis practical, and the scalability, durability, and cost effectiveness of object storage make it possible to store data at 'big data' scale.
The performance characteristics, cost effectiveness, and processing prowess of the Bare Metal Cloud make it the platform of choice to run all your big data workloads.
The MapReduce reference architecture was the harbinger of all things big data. It was the first step in re-imagining how data should be analyzed at cloud scale. Apache Hadoop brought this dynamic, distributed, and highly scalable framework to open source. With built-in fault tolerance and high scalability, this framework became the cornerstone of big data infrastructure, and over time it has spawned a range of interfaces in the community, such as Hive, Spark, and Presto. Engineers at Facebook first democratized the framework with Hive and made big data available and accessible to data users beyond developers. Since then, engines such as Spark have emerged to fill the needs of data scientists, while solutions like Presto and Impala have become popular with analysts.
Today, I am very excited to let you know that, in addition to supporting DIY big data workloads, Bare Metal Cloud Services now also supports a turnkey, cloud-scale big-data-as-a-service solution. We have accomplished this by integrating with Qubole's big-data-as-a-service platform, Qubole Data Service (QDS). Qubole knows a thing or two about big data processing at scale: it was founded by the same engineers who introduced Hive to Facebook and the Hadoop ecosystem. They were the architects of the Facebook data infrastructure, which today supports 1.85 billion users. In its five years of existence, QDS has scaled to process almost 500 petabytes of data each month.
Together, Oracle Bare Metal Cloud Services and Qubole offer a compelling big data solution for enterprises that wish to gain a competitive edge by generating business insights and making their data work for them.
DIY big data or a turnkey solution – your choice
We like to give our customers choice. You can take two routes to standing up your big data infrastructure on Oracle Bare Metal Cloud Services:
- DIY big data: You can select the Hadoop distribution of your choice, run it on the Bare Metal Cloud compute service, and stitch together the solution using multiple off-the-shelf products and tools. We'll describe this in detail in later posts.
- Oracle IaaS big data solution with Qubole: This option offers a turnkey, fully managed, self-service big data solution on Bare Metal Cloud. If you choose Qubole, you can delegate all the groundwork associated with standing up big data infrastructure in the cloud to the Bare Metal Cloud team. Qubole Data Service will enable you to go from 0 to query in 5 short steps.
Seven great things about the Oracle and Qubole big data solution
- A fully turnkey solution: Qubole on Bare Metal Cloud Services (BMCS) is a fully managed, turnkey big data solution. You don't need to install, configure, or deploy big data infrastructure to get results out of your data. Just sign up for BMCS and Qubole, define the clusters' characteristics, and start submitting queries. Qubole manages the rest for you.
- Separation of compute and storage: Qubole uses data stored in Object Storage, so data processing is not tied to where the data resides. This separation allows Qubole to scale compute independently of storage and vice versa, providing the elasticity and agility needed for big data processing.
- Auto-scaling: Qubole scales resources up and down based on customer-defined policies, adapting the cluster size to the workload. This saves costs, reduces complexity, and increases agility.
- Automatic cluster management: When a new workload starts, Qubole determines whether a new cluster is required and automatically starts it. It monitors when the cluster nodes need to scale and uses an auto-scaling algorithm to adapt the cluster's size to the workload's needs. Finally, when the job is done, Qubole terminates the cluster automatically, without user intervention.
- Multi-engine support: Qubole supports multiple big data processing engines. At GA, we offer support for Hadoop, Hive, and Spark; support for Presto is coming soon.
- Management interfaces: Manage your big data infrastructure through the UI, REST APIs, or the Python SDK (a short sketch follows this list).
- And did I mention scale: Qubole can easily scale to run thousands of big data nodes on BMCS. It operates at cloud scale.
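For those who prefer to drive QDS programmatically, here is a minimal sketch using the Qubole Python SDK (the 'qds-sdk' package on PyPI). The API token and the 'web_logs' table are placeholders, and depending on your account you may also need to point the SDK at the Qubole-on-Oracle endpoint, so check the QDS documentation for the exact configuration.

```python
# Minimal sketch: submit a Hive query to QDS from Python.
# 'YOUR_QDS_API_TOKEN' and the 'web_logs' table are placeholders.
import sys

from qds_sdk.qubole import Qubole
from qds_sdk.commands import HiveCommand

# Point the SDK at your QDS account.
Qubole.configure(api_token="YOUR_QDS_API_TOKEN")

# Submit a query and block until it finishes. QDS brings up (or reuses
# and auto-scales) a cluster behind the scenes.
cmd = HiveCommand.run(query="SELECT COUNT(*) FROM web_logs;")
print("Command %s finished with status %s" % (cmd.id, cmd.status))

# Stream the query results to stdout.
cmd.get_results(sys.stdout)
```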
Oracle and Qubole high-level architecture
Qubole offers native integration with the Oracle IaaS platform. The solution consists of three components:
- Bare Metal Cloud Services infrastructure: This is the basic compute, storage, and networking layer, where your data is hosted and where the Hadoop clusters run. You own the resources in your BMCS tenancy at all times, but Qubole manages them on your behalf using an IAM user that you create specifically for this purpose.
- Qubole tier: This is where the QDS engine is hosted. It drives activities such as provisioning, orchestration, management, auto-scaling, and data visualization.
- User access tier: You manage your big data infrastructure through this layer, either via the browser-based UI or via an SDK. ODBC drivers are supported as well.

Four steps to starting your Oracle IaaS big data solution with Qubole
Getting started is quick and easy. You can have your big data infrastructure up and running in a matter of hours.
Step #1: Create a BMCS account here and set up your tenancy to host Qubole big data clusters
Once you establish access to your Bare Metal Cloud Services account, log into the service console and set up your tenancy for Qubole to initialize your big data infrastructure. You'll need to do the following:
- Create compartments where your big data clusters will be hosted
- Create a new BMCS user
- Create a new user group and assign the newly created BMCS user to the group
- Create a public/private key pair and upload the public key to BMCS
- Configure the Virtual Cloud Network
- Set access policies on the compartment
Be sure to follow the instructions in the Qubole quick start guide closely for smooth sailing.
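If you prefer to script the IAM pieces of this setup rather than clicking through the console, the sketch below shows the general shape using the BMCS Python SDK (published as 'oraclebmc' at the time, later renamed 'oci'). The group, compartment, and policy names are placeholders, the class names follow my recollection of the SDK, and the exact policy statements Qubole needs are spelled out in the quick start guide, so treat this as an illustration rather than the definitive list. Creating the BMCS user, the API signing key pair, and the VCN are not shown.

```python
# Rough sketch: create the group and access policies Qubole will use.
# 'QuboleGroup' and 'QuboleCompartment' are placeholder names; verify the
# class names and required policy statements against the SDK docs and the
# Qubole quick start guide.
import oraclebmc

config = oraclebmc.config.from_file()  # default SDK config file
identity = oraclebmc.identity.IdentityClient(config)

# Groups live at the tenancy (root compartment) level.
group = identity.create_group(
    oraclebmc.identity.models.CreateGroupDetails(
        compartment_id=config["tenancy"],
        name="QuboleGroup",
        description="Group the Qubole Data Service acts through",
    )
).data

# Grant the group control over compute, networking, and object storage
# in the compartment that will host the big data clusters.
identity.create_policy(
    oraclebmc.identity.models.CreatePolicyDetails(
        compartment_id=config["tenancy"],
        name="QubolePolicy",
        description="Lets QuboleGroup manage big data resources",
        statements=[
            "Allow group QuboleGroup to manage instance-family in compartment QuboleCompartment",
            "Allow group QuboleGroup to manage virtual-network-family in compartment QuboleCompartment",
            "Allow group QuboleGroup to manage object-family in compartment QuboleCompartment",
        ],
    )
)
```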
Once the setup is done, identify the bucket in Bare Metal Cloud Object Storage that hosts your data, or create a bucket in the Bare Metal Cloud Services console and import your data into it.
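If your data is not in Object Storage yet, the following sketch shows one way to create a bucket and upload a file with the same Python SDK; the compartment OCID, bucket name, and file path are placeholders, and the console works just as well for this step.

```python
# Sketch: create an Object Storage bucket and upload a data file.
# 'QUBOLE_COMPARTMENT_OCID', the bucket name, and the local file path
# are placeholders for illustration.
import oraclebmc

config = oraclebmc.config.from_file()
object_storage = oraclebmc.object_storage.ObjectStorageClient(config)

# Every tenancy has a single Object Storage namespace.
namespace = object_storage.get_namespace().data

object_storage.create_bucket(
    namespace,
    oraclebmc.object_storage.models.CreateBucketDetails(
        name="big-data-input",
        compartment_id="QUBOLE_COMPARTMENT_OCID",
    ),
)

# Upload a sample data file into the new bucket.
with open("web_logs.csv", "rb") as data:
    object_storage.put_object(namespace, "big-data-input", "web_logs.csv", data)
```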

Step #2: Create a Qubole-Oracle account
Click here to establish your Qubole-Oracle account. You will receive a confirmation email from Qubole with your login details and a link to the Qubole on BMCS service console.

Step #3: Input your Bare Metal Cloud tenancy details into the Qubole service console
The compute and storage settings can be the same or different, depending on whether a single user or multiple users control access to the storage and compute resources.


Step #4: Start analyzing your data
You are all set. You can now start running queries to analyze your data, schedule jobs, and visualize data to detect patterns.
- Create a cluster configuration
- Create the Hive metastore and submit queries via QDS Analyze. You can also schedule recurring jobs via the QDS Scheduler.
- Monitor your jobs (see the sketch below)
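The same kind of monitoring can be done programmatically. Below is a small sketch that submits a query without blocking and then polls its status with the Qubole Python SDK; the query, table name, and API token are placeholders, and the method names follow my recollection of the 'qds-sdk' API, so double-check against its documentation.

```python
# Sketch: submit a Hive query asynchronously and poll it until it is
# done, roughly what the 'Monitor your jobs' page shows in the UI.
import time

from qds_sdk.qubole import Qubole
from qds_sdk.commands import Command, HiveCommand

Qubole.configure(api_token="YOUR_QDS_API_TOKEN")

# create() returns immediately; QDS runs the query on a BMCS cluster.
cmd = HiveCommand.create(
    query="SELECT page, COUNT(*) AS hits FROM web_logs GROUP BY page;"
)

# Poll until the command reaches a terminal state.
while not Command.is_done(cmd.status):
    time.sleep(10)
    cmd = HiveCommand.find(cmd.id)

print("Job %s ended with status %s" % (cmd.id, cmd.status))
```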

Blazing fast Spark performance
Workloads just run better on Oracle Bare Metal Cloud Services, and we have a benchmark to prove it: Qubole on BMCS runs Spark SQL 116% faster than a comparable on-premises solution.

Sample Data sets
Just in case you want to give our big data solution a test run and are looking for some sample data, here are some of the public data sets that you can download
Give it a try!
In summary, the Bare Metal Cloud Services team is invested in offering customers choice. We built our infrastructure to handle the most demanding workloads. Infrastructure-as-a-Service primitives like compute (with blazing fast NVMe storage), block storage, object storage, and networking were purposely architected to host big data workloads efficiently.
So whether you gravitate toward standing up your own big data cluster on BMCS or using Qubole as your turnkey big data solution, we have you covered. Go ahead, I encourage you to take big data out for a spin and see how it works for you. If you need help, feel free to reach out by commenting below, asking on Stack Overflow, or contacting your Oracle account manager so they can put you in direct contact with us.
Rachna Thusoo
Director, Product Management
Oracle Bare Metal Cloud Team