Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Get rows not containing UTF-8 characters.

581144Mar 12 2012 — edited Mar 13 2012
Hi guys,

The first part is just general information. If you want to go straight to my question go to the bottom.

Our business has recently loaded a lot of data from a legacy system to our database.
I have found out that there are some problems with the data added to our database.
Some characters which is not UTF-8(?) characters has been added to a column (LONG) in the table fnd_documents_long_text.

I noticed the data error because XML Publisher gave me the following warning:
-----
java.io.UTFDataFormatException: Invalid UTF8 encoding.


I then copied the output xml from the server to my own desktop. When I tries to open the XML file in my editor it replaces a character with a Unicode substitution character:
-----
Some bytes have been replace with the Unicode substitution character while..


Now. I found the line causing the error in XML Publisher and my editor.

The line is:
<LONG_TEXT>COPPER WIRE TO BS 4516 PT.1 [xCF].25 PVA GR.2 (LEWMEX) MAX REEL SIZE 25KG</LONG_TEXT>
The [xCF] are what raises the whole problem. When I try to copy the character into the editor in the forum it shows ϱ

QUESTION
I want to find all the attachments (rows in fnd_documents_long_text), which will cause XML Publisher to fail executing a report request.

I have created the following PL/SQL Block to identify the rows that needs correction:
DECLARE
  CURSOR c1 IS
    SELECT  media_id,
            long_text
      FROM  fnd_documents_long_text
    ;
    
  v_media_id              fnd_documents_long_text.media_id%TYPE;
  v_long_text             fnd_documents_long_text.long_text%TYPE;
  
  l_test                  varchar2(2000);
BEGIN
  dbms_output.put_line('START');
  
  IF (c1%ISOPEN)
  THEN
    CLOSE c1;
  END IF;
  
  OPEN c1;
  LOOP
    FETCH c1 INTO v_media_id, v_long_text;
    EXIT WHEN c1%NOTFOUND;
    
      l_test := REGEXP_REPLACE(v_long_text,'[\x80-\xFF]','');
      
      IF (l_test != ' ')
      THEN
        dbms_output.put_line('Media: ' || v_media_id || ', Text: ' || v_long_text);
      END IF;
      
  END LOOP;
  
  CLOSE c1;
  
  dbms_output.put_line('END');
END;
My problem is that the list I get back contains row where æøå üä etc. exists. How do I get the list so I only contains rows where screwed up characters like ϱ exists?


Thank you in advance.

/Kenneth
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Apr 10 2012
Added on Mar 12 2012
9 comments
8,852 views