
Ingesting Large Oracle Table into Databricks Takes Too Long

User_1WSU2, Mar 10 2022

I have an Oracle table containing 50 million records with about 13-15 columns and a composite primary key. I am trying to ingest this table into Databricks using oracle.jdbc.driver.OracleDriver. I have tried two different approaches, shown below:

Approach 1

val myDF = spark.read.jdbc(
  url = url,
  table = "TableName",
  columnName = "PartitionColumn",
  lowerBound = lowerBound,
  upperBound = upperBound,
  numPartitions = 10,
  connectionProperties = connectionProperties)

myDF.write.option("mergeSchema", "true").format("delta").mode("overwrite").saveAsTable("TableName")
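
For reference, the same partitioned read can also be written with the option-based JDBC reader; in this sketch the partition column, credentials, and fetchsize are placeholder values, not my actual settings:

val partitionedDF = spark.read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "TableName")
  .option("user", user)                           // placeholders for the credentials held in connectionProperties
  .option("password", password)
  .option("partitionColumn", "PartitionColumn")   // must be a numeric, date, or timestamp column
  .option("lowerBound", lowerBound.toString)
  .option("upperBound", upperBound.toString)
  .option("numPartitions", "10")
  .option("fetchsize", "10000")                   // illustrative; the Oracle JDBC driver's default fetch size is small
  .load()
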
Approach 2

// query is assumed to be a parenthesized subquery with an alias, e.g. "(SELECT ... FROM TableName) t"
val myDF = spark.read.jdbc(url, query, connectionProperties)
myDF.write.option("mergeSchema", "true").format("delta").mode("overwrite").saveAsTable("TableName")

Both of these take more than 25 hours.
Also, when I load the data into a DataFrame and try to display it, no result is shown.
Can someone tell me what I am doing wrong here? Any advice on the best approach to achieve this would be appreciated.

Thanks
