Slow Ingestion - Large XML File /w Schema
521772Sep 26 2006 — edited Sep 5 2008The File: 78-150 megs with 5 namespaces. This data contains configuration data based on a wireless standard set by 3GPP http://www.arib.or.jp/IMT-2000/V310Sep02/S3g/Rel4/32/32615-420.pdf . We will receive this file a few times daily from multiple sources and the goal is to have the data relationally loaded.
The Schema: XML Schema has been created and registered. There are 6 XSD's/Namespaces and circular dependencies have been edited to be out-of-line with default tables. Abstract class substitution has been removed per M.Drake's recommendation in an earlier post. I have set each complex type to not keep the DOM refererence and I have supressed the default table on most complex elements (defaultTable="").
Method: I have successfully registered the schema and loaded files 900k-28M without issues or severe performance degradation. I have written the XML/SQL extraction queries using XPath and these perform very well. The problem is that when the file goes larger than 28M there is a severe degradation of performance and interestingly, the performance is far worse on the large servers compared with my local instance. I have tested using createResource, FTP to repository, and BFILE insert.
Stats: (*Note* Dev server was quiet, Prod/Enterprise tested on a quiet instance)
Using BFILE method
9800k [Enterprise Server 72 CPU] 41 sec
14404k [Enterprise Server 72 CPU] 1:33 min
44066k [Enterprise Server 72 CPU] 20:20 min
900k [Dev Server 55 CPU] 17 secs
3.5 megs [Dev Server 55 CPU] 37 secs
10 megs [Dev Server 55 CPU] 2:17 mins
14.5 megs [Dev Server 55 CPU] 3:55 mins
29 megs [Dev Server 55 CPU] 14:00 mins
44 megs [Dev Server 55 CPU] 51:03 mins
Using createResource
3617k [Enterprise Server 72 CPU] 18 sec
9867k [Enterprise Server 72 CPU] 1:03 min
14404k [Enterprise Server 72 CPU] 2:13 min
29235k [Enterprise Server 72 CPU] 7:35 min
44066k [Enterprise Server 72 CPU] 16:15 min
3.5 megs [Dev Server 55 CPU] 31 secs
10 megs [Dev Server 55 CPU] 1:44 mins
14.5 megs [Dev Server 55 CPU] 3:07 mins
29 megs [Dev Server 55 CPU] 10:19 mins
44 megs [Dev Server 55 CPU] 21:35 mins
3.5 megs [Local Instance 1 CPU] 8 secs
10 megs [Local Instance 1 CPU] 21 secs
14.5 megs [Local Instance 1 CPU] 45 secs
29 megs [Local Instance 1 CPU] 2:04 mins
44 megs [Local Instance 1 CPU] 3:47 mins
76 megs [Local Instance 1 CPU] 11:52 mins
Using ftp to repository
29235k [Enterprise Server 72 CPU] 6 min
44066k [Enterprise Server 72 CPU] 14 min
76M [Enterprise Server 72 CPU] 75 min
As you can see, there is a large difference between the results on my personal database and the results on the actual servers. SR has been created SR 5814226.993 and includes the XML Schema and XML Files.
We plan to evolve our system to handle around 700K files per day (avg is 20k) including these large XML files. The files I am working with now are a proof of concept and it is important to demonstrate that Oracle XML is capable of handling these files very efficiently. Any help would be appreciated.
Thank you,
VJ