Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

XML to CSV routine using XSLT. Oracle bugs?

paul zipJul 2 2013 — edited Jul 3 2013

I'm trying to write some XSLT so that I can apply it to any given XML using  XMLTransform to generate a flattened CSV structure.  My input can be any XML document, with any number of nested levels so the structure isn't known to the XSLT.  I'm fairly new to XSLT but am having problem with Oracle's version of it.  I've tested on both 10gR2 and 11gR2.


CSV Rules :

Adjacent fields must be separated by a comma

Embedded commas in fields are escaped e.g. 15, Maple Street => "15, Maple Street"

Embedded quotes in fields are escaped.  e.g. O'Brien => "O'Brien" or John "Jonner" McNabb => "John ""Jonner"" McNabb"

Embedded CR / LF or CRLF in fields are escaped with quotes.

Records are separated with CRLF pairs


Flattening Rules :

The text of leaf elements (elements without children) are output even if they are empty

Any element with text is output

A new line occurs whenever the nesting level changes. i.e. If an element has children.


When I run the following XML and my XSLT through Oracle I get results inconsistent with results I get when I run the XML and XSLT through online XSLT utils (including W3C's test pages).

with qryXML as (

  select xmltype(q'[

  <email>

    <to>Steve O'Brien</to>

    <cc/>

    <from>Jane "The Smiler" Griff</from>

    <heading>Reminder</heading>

    <body>Hey, don't forget me this weekend!  I've enclosed some pics from last weekend

  Love Jane

  </body>

    <attachments>

      <attachment>

        <filename>Dance.jpg</filename>

        <encoding>MIME</encoding>

      </attachment>

      <attachment>

        <filename>Signature.txt</filename>

        <encoding/>

      </attachment>

    </attachments>

  </email>

  as XMLDATA from dual)

select

   XMLTransform(q.XMLDATA,

      xmltype(q'[

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text"/>

    <xsl:strip-space elements="*"/>

    <xsl:template match="*[text() or not(*)]">

        <xsl:if test="position() &gt; 1">

            <xsl:text>,</xsl:text>

        </xsl:if>

    <xsl:call-template name="display_csv_field">

      <xsl:with-param name="field" select="."/>

    </xsl:call-template>

        <xsl:if test="not(following-sibling::*[text() or not(*)])">

            <xsl:text>

</xsl:text>         

        </xsl:if>

    </xsl:template>

  <xsl:template name="display_csv_field">

    <xsl:param name="field"/>

    <xsl:variable name="CRLF">

      <xsl:text>&#13;&#10;</xsl:text>

    </xsl:variable>

    <xsl:variable name="CR">

      <xsl:text>&#13;</xsl:text>

    </xsl:variable>

    <xsl:variable name="LF">

      <xsl:text>&#10;</xsl:text>

    </xsl:variable>

    <xsl:variable name="apos">'</xsl:variable>   

    <xsl:choose>

      <xsl:when test="contains( $field, '&quot;' )">

        <!-- Field contains a quote. So escape  -->

        <xsl:text>"</xsl:text>

        <xsl:call-template name="escape_quotes">

          <xsl:with-param name="string" select="$field" />

        </xsl:call-template>

        <xsl:text>"</xsl:text>

      </xsl:when>

      <xsl:when test="contains( $field, ',' ) or

                      contains( $field, $apos ) or

                      contains( $field, $CRLF ) or

                      contains( $field, $CR ) or

                      contains( $field, $LF )" >

        <!-- Field contains a comma, apostrophe and/or a linefeed, so quote -->

        <xsl:text>"</xsl:text>

        <xsl:value-of select="$field" />

        <xsl:text>"</xsl:text>

      </xsl:when>

      <xsl:otherwise>

        <!-- No need to enclose this field in quotes. -->

        <xsl:value-of select="$field" />

      </xsl:otherwise>

    </xsl:choose>

  </xsl:template>

  <xsl:template name="escape_quotes">

    <xsl:param name="string" />

    <xsl:value-of select="substring-before( $string, '&quot;' )" />

    <xsl:text>""</xsl:text>

    <xsl:variable name="substring_after_first_quote"

                  select="substring-after( $string, '&quot;' )" />

    <xsl:choose>

      <xsl:when test="not( contains( $substring_after_first_quote, '&quot;' ) )">

        <xsl:value-of select="$substring_after_first_quote" />

      </xsl:when>

      <xsl:otherwise>

        <!-- The substring after the first quote contains a quote.

             So, we call ourself recursively to escape the quotes

             in the substring after the first quote. -->

        <xsl:call-template name="escape_quotes">

          <xsl:with-param name="string" select="$substring_after_first_quote" />

        </xsl:call-template>

      </xsl:otherwise>

    </xsl:choose>

  </xsl:template> 

</xsl:stylesheet>

]'

)) from qryXML q  

/

Output from Oracle (which appears incorrect)

&quot;Steve O&apos;Brien&quot;&quot;Jane &quot;&quot;The Smiler&quot;&quot; Griff&quot;Reminder&quot;Hey, don&apos;t forget me this weekend!  I&apos;ve enclosed some pics from last weekend

  Love Jane

  &quot;

Dance.jpgMIME

Signature.txt


Output from most other XSLT test environments (which appears correct):

"Steve O'Brien",,"Jane ""The Smiler"" Griff",Reminder,"Hey, don't forget me this weekend!  I've enclosed some pics from last weekend

Love Jane

"

Dance.jpg,MIME

Signature.txt,


Oracle's XSLT handling appears incorrect in several ways:

1. There is no comma delimiter separating fields and on investigation, position() is always returning 1, which is incorrect.  From XSLT reference "The position function returns a number equal to the context position from the expression evaluation context." ...."the position, or index number, of the node, relative to all the selected nodes in the node list."  Therefore any node in a node list should be indexable with position.   I am selecting a node list *[text() or not(*)] => Any nodes with text or with no children, yet position of each node in this list is always 1.

2. My XSLT says the output is text, yet Oracle is still using HTML character escaping which is not text!  &quot; instead of " and &apos; instead of '.  I could search and replace, but I shouldn't have to.

Can anyone explain why Oracle's XSLT differs from other XSLT processors and offer suggestions to solve these issues.  Are these bugs?

Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jul 31 2013
Added on Jul 2 2013
4 comments
1,441 views