Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Slow Ingestion - Large XML File /w Schema

521772Sep 26 2006 — edited Sep 5 2008
The File: 78-150 megs with 5 namespaces. This data contains configuration data based on a wireless standard set by 3GPP http://www.arib.or.jp/IMT-2000/V310Sep02/S3g/Rel4/32/32615-420.pdf . We will receive this file a few times daily from multiple sources and the goal is to have the data relationally loaded.

The Schema: XML Schema has been created and registered. There are 6 XSD's/Namespaces and circular dependencies have been edited to be out-of-line with default tables. Abstract class substitution has been removed per M.Drake's recommendation in an earlier post. I have set each complex type to not keep the DOM refererence and I have supressed the default table on most complex elements (defaultTable="").

Method: I have successfully registered the schema and loaded files 900k-28M without issues or severe performance degradation. I have written the XML/SQL extraction queries using XPath and these perform very well. The problem is that when the file goes larger than 28M there is a severe degradation of performance and interestingly, the performance is far worse on the large servers compared with my local instance. I have tested using createResource, FTP to repository, and BFILE insert.

Stats: (*Note* Dev server was quiet, Prod/Enterprise tested on a quiet instance)

Using BFILE method

9800k [Enterprise Server 72 CPU] 41 sec
14404k [Enterprise Server 72 CPU] 1:33 min
44066k [Enterprise Server 72 CPU] 20:20 min

900k [Dev Server 55 CPU] 17 secs
3.5 megs [Dev Server 55 CPU] 37 secs
10 megs [Dev Server 55 CPU] 2:17 mins
14.5 megs [Dev Server 55 CPU] 3:55 mins
29 megs [Dev Server 55 CPU] 14:00 mins
44 megs [Dev Server 55 CPU] 51:03 mins


Using createResource

3617k [Enterprise Server 72 CPU] 18 sec
9867k [Enterprise Server 72 CPU] 1:03 min
14404k [Enterprise Server 72 CPU] 2:13 min
29235k [Enterprise Server 72 CPU] 7:35 min
44066k [Enterprise Server 72 CPU] 16:15 min

3.5 megs [Dev Server 55 CPU] 31 secs
10 megs [Dev Server 55 CPU] 1:44 mins
14.5 megs [Dev Server 55 CPU] 3:07 mins
29 megs [Dev Server 55 CPU] 10:19 mins
44 megs [Dev Server 55 CPU] 21:35 mins

3.5 megs [Local Instance 1 CPU] 8 secs
10 megs [Local Instance 1 CPU] 21 secs
14.5 megs [Local Instance 1 CPU] 45 secs
29 megs [Local Instance 1 CPU] 2:04 mins
44 megs [Local Instance 1 CPU] 3:47 mins
76 megs [Local Instance 1 CPU] 11:52 mins

Using ftp to repository

29235k [Enterprise Server 72 CPU] 6 min
44066k [Enterprise Server 72 CPU] 14 min
76M [Enterprise Server 72 CPU] 75 min

As you can see, there is a large difference between the results on my personal database and the results on the actual servers. SR has been created SR 5814226.993 and includes the XML Schema and XML Files.


We plan to evolve our system to handle around 700K files per day (avg is 20k) including these large XML files. The files I am working with now are a proof of concept and it is important to demonstrate that Oracle XML is capable of handling these files very efficiently. Any help would be appreciated.

Thank you,
VJ
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Oct 3 2008
Added on Sep 26 2006
27 comments
3,850 views