I'm trying to write some XSLT so that I can apply it to any given XML using XMLTransform to generate a flattened CSV structure. My input can be any XML document, with any number of nested levels so the structure isn't known to the XSLT. I'm fairly new to XSLT but am having problem with Oracle's version of it. I've tested on both 10gR2 and 11gR2.
CSV Rules :
Adjacent fields must be separated by a comma
Embedded commas in fields are escaped e.g. 15, Maple Street => "15, Maple Street"
Embedded quotes in fields are escaped. e.g. O'Brien => "O'Brien" or John "Jonner" McNabb => "John ""Jonner"" McNabb"
Embedded CR / LF or CRLF in fields are escaped with quotes.
Records are separated with CRLF pairs
Flattening Rules :
The text of leaf elements (elements without children) are output even if they are empty
Any element with text is output
A new line occurs whenever the nesting level changes. i.e. If an element has children.
When I run the following XML and my XSLT through Oracle I get results inconsistent with results I get when I run the XML and XSLT through online XSLT utils (including W3C's test pages).
with qryXML as (
select xmltype(q'[
<email>
<to>Steve O'Brien</to>
<cc/>
<from>Jane "The Smiler" Griff</from>
<heading>Reminder</heading>
<body>Hey, don't forget me this weekend! I've enclosed some pics from last weekend
Love Jane
</body>
<attachments>
<attachment>
<filename>Dance.jpg</filename>
<encoding>MIME</encoding>
</attachment>
<attachment>
<filename>Signature.txt</filename>
<encoding/>
</attachment>
</attachments>
</email>
as XMLDATA from dual)
select
XMLTransform(q.XMLDATA,
xmltype(q'[
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*[text() or not(*)]">
<xsl:if test="position() > 1">
<xsl:text>,</xsl:text>
</xsl:if>
<xsl:call-template name="display_csv_field">
<xsl:with-param name="field" select="."/>
</xsl:call-template>
<xsl:if test="not(following-sibling::*[text() or not(*)])">
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:template>
<xsl:template name="display_csv_field">
<xsl:param name="field"/>
<xsl:variable name="CRLF">
<xsl:text> </xsl:text>
</xsl:variable>
<xsl:variable name="CR">
<xsl:text> </xsl:text>
</xsl:variable>
<xsl:variable name="LF">
<xsl:text> </xsl:text>
</xsl:variable>
<xsl:variable name="apos">'</xsl:variable>
<xsl:choose>
<xsl:when test="contains( $field, '"' )">
<!-- Field contains a quote. So escape -->
<xsl:text>"</xsl:text>
<xsl:call-template name="escape_quotes">
<xsl:with-param name="string" select="$field" />
</xsl:call-template>
<xsl:text>"</xsl:text>
</xsl:when>
<xsl:when test="contains( $field, ',' ) or
contains( $field, $apos ) or
contains( $field, $CRLF ) or
contains( $field, $CR ) or
contains( $field, $LF )" >
<!-- Field contains a comma, apostrophe and/or a linefeed, so quote -->
<xsl:text>"</xsl:text>
<xsl:value-of select="$field" />
<xsl:text>"</xsl:text>
</xsl:when>
<xsl:otherwise>
<!-- No need to enclose this field in quotes. -->
<xsl:value-of select="$field" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="escape_quotes">
<xsl:param name="string" />
<xsl:value-of select="substring-before( $string, '"' )" />
<xsl:text>""</xsl:text>
<xsl:variable name="substring_after_first_quote"
select="substring-after( $string, '"' )" />
<xsl:choose>
<xsl:when test="not( contains( $substring_after_first_quote, '"' ) )">
<xsl:value-of select="$substring_after_first_quote" />
</xsl:when>
<xsl:otherwise>
<!-- The substring after the first quote contains a quote.
So, we call ourself recursively to escape the quotes
in the substring after the first quote. -->
<xsl:call-template name="escape_quotes">
<xsl:with-param name="string" select="$substring_after_first_quote" />
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
]'
)) from qryXML q
/
Output from Oracle (which appears incorrect)
"Steve O'Brien""Jane ""The Smiler"" Griff"Reminder"Hey, don't forget me this weekend! I've enclosed some pics from last weekend
Love Jane
"
Dance.jpgMIME
Signature.txt
Output from most other XSLT test environments (which appears correct):
"Steve O'Brien",,"Jane ""The Smiler"" Griff",Reminder,"Hey, don't forget me this weekend! I've enclosed some pics from last weekend
Love Jane
"
Dance.jpg,MIME
Signature.txt,
Oracle's XSLT handling appears incorrect in several ways:
1. There is no comma delimiter separating fields and on investigation, position() is always returning 1, which is incorrect. From XSLT reference "The position function returns a number equal to the context position from the expression evaluation context." ...."the position, or index number, of the node, relative to all the selected nodes in the node list." Therefore any node in a node list should be indexable with position. I am selecting a node list *[text() or not(*)] => Any nodes with text or with no children, yet position of each node in this list is always 1.
2. My XSLT says the output is text, yet Oracle is still using HTML character escaping which is not text! " instead of " and ' instead of '. I could search and replace, but I shouldn't have to.
Can anyone explain why Oracle's XSLT differs from other XSLT processors and offer suggestions to solve these issues. Are these bugs?