Extract PDF embedded images using iText.

807580, Nov 22 2009
Hi all, I am trying to extract images from a PDF document using the iText library.

I am able to find the image streams in the document by reading it with a PdfReader object.

For each image stream, I then try to create an Image instance to get information about the image embedded in the PDF document.

However, this only works for JPEG images (.jpg, .jpeg, .jpe):

Image imageObject = Image.getInstance(image);

For images of any other format embedded in the PDF document, the call fails.
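One likely reason only JPEG works: PdfReader.getStreamBytesRaw returns the stream bytes still encoded. An image stored with the /DCTDecode filter is a complete JPEG file, so Image.getInstance(byte[]) can recognize it, apparently by checking the leading signature ("magic") bytes. An image stored with /FlateDecode, by contrast, is just compressed pixel samples with no file header at all. The sketch below is plain Java with no iText dependency; ImageFormatSniffer is a hypothetical class that only mimics that kind of signature check and is not iText's actual detection code.

```java
/**
 * Hypothetical helper (not part of iText): guesses an image file format from
 * the signature bytes at the start of a byte array. Raw PDF image streams only
 * carry such a header when the stream is itself a complete file, e.g. a
 * /DCTDecode stream (a JPEG file) or a /JPXDecode stream (a JPEG 2000 file).
 */
final class ImageFormatSniffer {

    private ImageFormatSniffer() {}

    /** Returns "JPEG", "PNG", "GIF", "JPEG2000" or "UNKNOWN". */
    static String sniff(byte[] data) {
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return "JPEG";
        }
        if (startsWith(data, 0x89, 'P', 'N', 'G')) {
            return "PNG";
        }
        if (startsWith(data, 'G', 'I', 'F', '8')) {
            return "GIF";
        }
        // JPEG 2000: JP2 signature box, or a bare codestream (SOC + SIZ markers)
        if (startsWith(data, 0x00, 0x00, 0x00, 0x0C, 'j', 'P')
                || startsWith(data, 0xFF, 0x4F, 0xFF, 0x51)) {
            return "JPEG2000";
        }
        // Flate-compressed pixel samples (zlib data) end up here
        return "UNKNOWN";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Running the raw bytes of each image stream through a check like this would show that only the JPEG (and possibly JPEG 2000) streams look like image files; the rest need to be decoded first.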

Below is my method for extracting the images from the PDF document:

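// iText 2.x package names (com.lowagie); iText 5 uses com.itextpdf
import com.lowagie.text.Image;
import com.lowagie.text.pdf.PRStream;
import com.lowagie.text.pdf.PdfName;
import com.lowagie.text.pdf.PdfObject;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStream;

// Prints the resolution and size of every image stream in MyPdf.pdf
public void extractImagesInfo() {
    try {
        PdfReader chartReader = new PdfReader("MyPdf.pdf");
        // Walk every object in the cross-reference table
        for (int i = 0; i < chartReader.getXrefSize(); i++) {
            PdfObject pdfobj = chartReader.getPdfObject(i);
            if (pdfobj != null && pdfobj.isStream()) {
                PdfStream stream = (PdfStream) pdfobj;
                PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
                //System.out.println("Stream subType: " + pdfsubtype);
                // Image XObjects are marked with /Subtype /Image
                if (PdfName.IMAGE.equals(pdfsubtype)) {
                    // Raw stream bytes, still encoded with the stream's /Filter
                    byte[] image = PdfReader.getStreamBytesRaw((PRStream) stream);
                    Image imageObject = Image.getInstance(image);
                    System.out.println("Resolution: " + imageObject.getDpiX());
                    System.out.println("Height: " + imageObject.getHeight());
                    System.out.println("Width: " + imageObject.getWidth());
                }
            }
        }
        chartReader.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}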
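For the non-JPEG case, a common combination is an image stream with /Filter /FlateDecode, /ColorSpace /DeviceRGB and /BitsPerComponent 8. Under exactly those assumptions (and no /DecodeParms predictor, no /SMask, no indexed palette), the raw bytes can be inflated with java.util.zip.Inflater and copied into a BufferedImage, using the values of the /Width and /Height entries in the image dictionary. FlateRgbImageDecoder is a hypothetical helper written for illustration, not an iText class, and it does not handle any other filter, color space or bit depth.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/**
 * Hypothetical helper (not part of iText). Assumes /FlateDecode, /DeviceRGB,
 * 8 bits per component and no predictor, so the inflated data is
 * width * height * 3 bytes of R, G, B samples, stored row by row.
 */
final class FlateRgbImageDecoder {

    private FlateRgbImageDecoder() {}

    static BufferedImage decode(byte[] rawStream, int width, int height)
            throws DataFormatException {
        byte[] samples = inflate(rawStream);
        int expected = width * height * 3;
        if (samples.length < expected) {
            throw new DataFormatException(
                    "Expected " + expected + " sample bytes, got " + samples.length);
        }
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int p = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int r = samples[p++] & 0xFF;
                int g = samples[p++] & 0xFF;
                int b = samples[p++] & 0xFF;
                img.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return img;
    }

    /** Inflates zlib-wrapped data, which is what /FlateDecode streams contain. */
    static byte[] inflate(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input or preset dictionary: stop instead of looping
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

In the loop above, the bytes from getStreamBytesRaw plus the dictionary's width and height would be passed to decode(...), and the result could then be saved with javax.imageio.ImageIO.write(img, "png", file). Images that use other filters or color spaces would need their own handling.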

Thanks in advance.
Post Details
Locked on Dec 20 2009
Added on Nov 22 2009
0 comments
922 views