HTML String to text
843841, Aug 4 2004 (edited Aug 4 2004)
Hi,
I wrote a servlet that receives e-mails and prints them. The problem is that some e-mails contain an HTML message, so I want to convert the String that holds the HTML message into plain text.
I found this program, which reads an HTML page from a URL and prints its text. That is exactly what I want, except that it reads the HTML from a URL rather than from a String:
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class HTML2Text {
    public static void main(String[] args) {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Document class does not yet handle charsets properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);
            // Parse the HTML.
            kit.read(rd, doc, 0);
            System.out.println(doc.getText(0, doc.getLength()));
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        // Exit explicitly; status 0 rather than 1, since 1 signals failure.
        System.exit(0);
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri)
            throws IOException {
        // Retrieve from Internet.
        if (uri.startsWith("http:")) {
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        }
        // Retrieve from file.
        else {
            return new FileReader(uri);
        }
    }
}
Can someone tell me how to change this program so that it takes a String containing the HTML instead of the URL of the HTML page?
Thanks, Naor.
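One possible approach, offered as a minimal sketch rather than a definitive answer: EditorKit.read accepts any java.io.Reader, so the URL/file reader in the program above can be swapped for a java.io.StringReader wrapped around the HTML String. The class name HtmlStringToText and the helper toText are hypothetical names introduced here for illustration; they are not part of the original program. The exact whitespace of the returned text (leading or trailing newlines, for instance) depends on the Swing parser, so it may need trimming.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.EditorKit;
import javax.swing.text.html.HTMLEditorKit;

class HtmlStringToText {
    // Hypothetical helper (not in the original program): parses HTML that is
    // already held in a String and returns the document's text content.
    static String toText(String html) throws IOException, BadLocationException {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // Same property as in the original program: ignore charset
        // directives found inside the HTML itself.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        // A StringReader replaces the URL/file reader from getReader().
        try (Reader rd = new StringReader(html)) {
            kit.read(rd, doc, 0);
        }
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Sample input, assumed here only for demonstration.
        String html = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(toText(html));
    }
}
```

In the servlet, the idea would be to call toText(...) on the message body whenever the e-mail's content type is HTML, and print the result in place of the raw markup.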