
Ingesting a large Oracle table into Databricks takes too long

User_1WSU2, Mar 10 2022

I have an Oracle table containing about 50 million records, 13-15 columns, and a composite primary key. I am trying to ingest this table into Databricks using oracle.jdbc.driver.OracleDriver. I have tried two different approaches, shown below:

Approach 1
// Partitioned JDBC read over a numeric column
val myDF = spark.read.jdbc(
  url = url,
  table = "TableName",
  columnName = "PartitionColumn",
  lowerBound = lowerBound,
  upperBound = upperBound,
  numPartitions = 10,
  connectionProperties = connectionProperties)

myDF.write.option("mergeSchema", "true").format("delta").mode("overwrite").saveAsTable("TableName")
Approach 2
val myDF = spark.read.jdbc(url, query, connectionProperties)
myDF.write.option("mergeSchema", "true").format("delta").mode("overwrite").saveAsTable("TableName")

This takes more than 25 hours.
Also, when I try to load the data into a DataFrame and display it, no result ever comes back.
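For reference, the display step is nothing special, roughly like the following (a minimal sketch; display() is the Databricks notebook helper and myDF is the DataFrame from the reads above):

// Take a small sample so the action does not need the whole table,
// then hand it to the Databricks notebook display helper
val sampleDF = myDF.limit(10)
display(sampleDF)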
Can someone suggest what I am doing wrong here? Any advice on the best approach to achieve this would be appreciated.

Thanks
