Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

RSS feed extraction problem on Scandinavian characters

jariolaDec 8 2011 — edited Dec 11 2011
Hi,

I have problem when extracting RSS from some feeds, Scandinavian characters are converted.
When I view feed source, characters are correct. So I assume problem might be my XML query.
Example RSS feed from
http://www.mtv3.fi/rss/urheilu_f1.rss
I get results like
Pakokaasuvirtausten hyödyntäminen rajoitetaan minimiin
That should say
Pakokaasuvirtausten hyädyntäminen rajoitetaan minimiin
And problem do not occur e.g. feed from here
http://www.hs.fi/uutiset/rss/
I use below table and package to get RSS fead and store it to clob column
CREATE TABLE rss
  (payload CLOB
  );

create or replace
PACKAGE RSS_UTIL AS
  g_wallet_path     CONSTANT VARCHAR2(200) := 'file:/somepath/wallets';
  g_wallet_pass     CONSTANT VARCHAR2(200) := 'verysecret';
  g_collection_name CONSTANT VARCHAR2(200) := 'RSS_RESPONSE';
  g_user_agent      CONSTANT VARCHAR2(200) := 'Mozilla/5.0';
--------------------------------------------------------------------------------
  PROCEDURE get_rss_feed(
    p_url IN VARCHAR2
  );
END;
/
create or replace
PACKAGE BODY RSS_UTIL AS
  PROCEDURE get_rss_feed(
    p_url IN VARCHAR2
  )
  AS
    l_clob      CLOB;
    l_buffer    VARCHAR2(32767);
    l_http_req  utl_http.req;
    l_http_resp utl_http.resp;
  BEGIN
  --  
    --utl_http.set_wallet(g_wallet_path, g_wallet_pass);
    --
    utl_http.set_proxy(apex_application.g_proxy_server, NULL);
    --
    l_http_req  := utl_http.begin_request(p_url);
    --
    utl_http.set_header(l_http_req, 'User-Agent', g_user_agent);
    --
    l_http_resp := utl_http.get_response(l_http_req);
    --
    dbms_lob.createtemporary(l_clob, FALSE);
    dbms_lob.open(l_clob, dbms_lob.lob_readwrite);
    --
    BEGIN
      LOOP
        utl_http.read_text(l_http_resp, l_buffer);
        dbms_lob.writeappend(l_clob, LENGTH(l_buffer), l_buffer);
      END LOOP;
    EXCEPTION WHEN utl_http.end_of_body THEN
      NULL;
    END;
    --
    utl_http.end_response(l_http_resp);
    --
    EXECUTE IMMEDIATE 'TRUNCATE TABLE RSS';
    INSERT INTO RSS VALUES(l_clob);
    COMMIT;
  --
  END;
END;
/

BEGIN
  rss_util.get_rss_feed('http://www.mtv3.fi/rss/urheilu_f1.rss');
END;
/
And I query data from clob by
SELECT extractValue(value(t), '//link/text()')   AS LINK,
  extractValue(value(t), '//category[1]/text()') AS category,
  extractValue(value(t), '//category[2]/text()') AS sub_category,
  extractValue(value(t), '//title/text()')       AS title,
  extractValue(value(t), '//description/text()') AS description,
  extractValue(value(t), '//author/text()')      AS author,
  extractValue(value(t), '//pubDate/text()')     AS rss_date
FROM rss,
  TABLE(xmlsequence(extract(xmltype(rss.payload),'//rss/channel/item'))) t ;
I have same also as Apex app if you like check
http://actionet.homelinux.net/htmldb/lspdemo?p=48
Just press login button to enter application.

Only difference is that I store RSS to apex_collection clob column and query it from there to report.

How I should change my code or query to fix this problem?

Regards,
Jari

http://dbswh.webhop.net/dbswh/f?p=BLOG:HOME:0
This post has been answered by odie_63 on Dec 8 2011
Jump to Answer
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jan 8 2012
Added on Dec 8 2011
5 comments
350 views