Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Problems with foreign characters in UTL_HTTP.request_pieces

user5614215Jul 24 2016 — edited Jul 25 2016

I have problems when downloading an XML document that contains

characters from other languages (in UTF-8) into the database

(Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production)

using the UTL_HTTP.request_pieces function.

Some of the characters of foreign languages are broken.

UTL_HTTP.request_pieces(url, int):

The function loads a file (in our case, XML) from the given URL in parts of

2000 characters/bytes(?) each (see PL/SQL concrete code below; the int

argument just states an upper bound how many pieces are read).

The problem occurs apparently when some characters (examples: russian, iranian,

japanese, armenian, burmese, ...) occur at/after(?) the split between the pieces. 

Note:

* I know that Oracle has problems when the encoding declaration is UTF-8,

  and that has to be changed in the XML file into the oracle-specific AL32UTF8.

* The DB runs under AL32UTF8 character set configuration (i.e., NLS_CHARACTERSET is  

  AL32UTF8).

We use an example, a geographical sample database for teaching, which includes

the local names of countries, provinces and cities, available at

'http://www.dbis.informatik.uni-goettingen.de/mondial-al32utf8.xml'

(already as AL32UTF8-encoded).

The PL/SQL code is appended below; the local result can be visited at

<http://www.semwebtech.org/sqlfrontend/>

with

*********

select * from mondial

**********

or

**********

xquery

for $i in ora:view("mondial")/mondial//*[localname]/(name|localname)

return $i

/

**********

In the example, it breaks with the russian/cyrillic name of the Astrakhanskaya

province, "Астраханская".

The result differs when the document changes, and other characters are affected

by the piece-breaks. In SQLplus, the output shows inverted "?" (usually one,

sometimes two) for the damaged characters.

SQL code:

#####

CREATE OR REPLACE FUNCTION getXMLunchanged(url IN VARCHAR2)

  -- requires encoding="AL32UTF8" set, no DTD reference

RETURN XMLType

IS

x  UTL_HTTP.html_pieces;

tempCLOB CLOB := NULL;

BEGIN

         x := UTL_HTTP.request_pieces(url, 10000);

         DBMS_LOB.createTemporary(tempCLOB, TRUE, DBMS_LOB.SESSION);

         IF x.COUNT > 0 THEN

   FOR i IN 1..x.COUNT LOOP

            DBMS_LOB.writeAppend(tempCLOB, LENGTH(x(i)), x(i));

           END LOOP;

         END IF;

         RETURN XMLType(tempCLOB);

END;

/

CREATE TABLE mondial OF XMLType;

INSERT INTO mondial

  VALUES (system.getXMLunchanged('http://www.dbis.informatik.uni-goettingen.de/mondial-al32utf8.xml'));

#####

Wolfgang

Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Aug 22 2016
Added on Jul 24 2016
1 comment
1,779 views