Get content from e-mail with text/html as content type
981862, Dec 27 2012 (edited Jan 6 2013)
Hey there,
I am working on an e-mail application in Java. The application checks a mailbox every few minutes and loops through the unread e-mails looking for certain subjects. If a subject matches, I want to retrieve the content of that e-mail. Retrieving the content works fine for e-mails sent from Gmail, Outlook (desktop client) and Hotmail.
However, when I try to get the content of an e-mail sent from the Office 365 web client, the content type I get back is text/html. I printed the content and found that it consists of HTML code, but the HTML is not in a usable form:
<html dir=3D"ltr">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-=
1">
<style type=3D"text/css" id=3D"owaParaStyle"></style>
</head>
<body fpstyle=3D"1" ocsi=3D"0">
<div style=3D"direction: ltr;font-family: Tahoma;color: #000000;font-size: =
10pt;">
<div><span style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;"=
>Geachte heer/mevrouw,</span><br style=3D"font-family: 'Segoe UI', Helvetic=
a, Arial, sans-serif;">
<br style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">
<span style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">Wij =
hebben uw inzending ontvangen en gecontroleerd. Hierbij het verslag van</sp=
an><br style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">
<span style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">de c=
ontrole.</span><br style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans=
-serif;">
Note the = symbols.
When I get the content of e-mails sent from Gmail, Outlook or Hotmail, the content type is text/plain: just text, with no HTML or stray symbols.
How can I solve this? I tried parsing the content with Jsoup, but the stray = symbols cause problems.
Any help is appreciated,
Thanks!
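A note on the = symbols: the `=3D` sequences and the `=` at the end of each wrapped line match the quoted-printable transfer encoding described in RFC 2045, where `=3D` stands for a literal `=` and a trailing `=` marks a soft line break. If that is what this is, the body would need to be decoded before it is handed to Jsoup. Below is a minimal sketch of such a decoder using only the JDK. The class name `QuotedPrintableSketch` and the `decode` method are made up for illustration, and the ISO-8859-1 charset is an assumption taken from the meta tag in the dump above. JavaMail also provides `MimeUtility.decode(InputStream, String)`, which may be the better choice in a real application.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JavaMail or Jsoup.
class QuotedPrintableSketch {

    // Decodes a quoted-printable body: "=XX" becomes the byte 0xXX, and an
    // "=" directly before a line break is a soft break that is dropped.
    // Assumes the encoded body itself is plain ASCII.
    static String decode(String body, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '=' && body.startsWith("\r\n", i + 1)) {
                i += 3; // soft line break (CRLF): skip "=" and the break
            } else if (c == '=' && body.startsWith("\n", i + 1)) {
                i += 2; // soft line break (bare LF)
            } else if (c == '=' && i + 2 < body.length()
                    && Character.digit(body.charAt(i + 1), 16) >= 0
                    && Character.digit(body.charAt(i + 2), 16) >= 0) {
                // "=XX": two hex digits encode one byte
                out.write(Integer.parseInt(body.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c); // ordinary character, copied through unchanged
                i++;
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // Fragment modelled on the dump above, with the wrapped line rejoined.
        String raw = "<meta content=3D\"text/html; charset=3Diso-8859-=\r\n1\">";
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));
        // prints: <meta content="text/html; charset=iso-8859-1">
    }
}
```

The decoded string should then be ordinary HTML that `Jsoup.parse(...)` can handle. The sketch leaves hard line breaks in place and does not strip trailing whitespace before them, which a complete decoder would.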