Java: Get Image from the document using Apache POI

I'm using Apache POI to read images from a .docx file.

Here is my code:

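import java.awt.Image;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public Image ReadImg(int imageid) throws IOException {
    // try-with-resources closes the stream and the document when done
    try (FileInputStream in = new FileInputStream("import.docx");
         XWPFDocument doc = new XWPFDocument(in)) {
        List<XWPFPictureData> pic = doc.getAllPictures();
        XWPFPictureData pict = pic.get(imageid);
        byte[] data = pict.getData();
        // try to read image data using javax.imageio.* (JDK 1.4+);
        // ImageIO.read returns null for formats it cannot decode (e.g. EMF/WMF)
        return ImageIO.read(new ByteArrayInputStream(data));
    }
}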

It reads the images correctly, but not in the right order.

For example, if the document contains

image1.jpeg
image2.jpeg
image3.jpeg
image4.jpeg
image5.jpeg

they come out as

image4
image3
image1
image5
image2

Can you help me solve this?

I want to read the images in the order they appear in the document.

Thanks,
西西克


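import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFPictureData;

public static void extractImages(XWPFDocument docx) throws IOException {
    Path outDir = Paths.get("D:/imagefromword");
    Files.createDirectories(outDir);

    List<XWPFPictureData> piclist = docx.getAllPictures();
    // traverse through the list and write each image to a file;
    // copying the raw bytes keeps each picture's original format, whereas
    // re-encoding everything as "jpg" with ImageIO can fail for pictures
    // ImageIO cannot decode (ImageIO.read returns null for EMF/WMF, for example)
    for (XWPFPictureData pic : piclist) {
        Files.write(outDir.resolve(pic.getFileName()), pic.getData());
    }
}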
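Note that the extraction code above still walks `getAllPictures()`, so it writes the images in the same order the question complains about. As far as I can tell, that list holds the picture parts of the package and is not documented to follow the order in which the images appear in the text. Inside the .docx (which is a ZIP file) the reading order is recorded in `word/document.xml`: each DrawingML picture points at its image with an `r:embed="rIdN"` attribute, and `word/_rels/document.xml.rels` maps `rIdN` to a part such as `media/image1.jpeg`. The sketch below swaps POI for plain `java.util.zip` plus regular expressions and follows those references in order. The class and method names (`DocxImageOrder`, `targetsInDocumentOrder`, `imagesInDocumentOrder`) are hypothetical, and it assumes Word's default part names, the conventional `r:` prefix and DrawingML (`a:blip`) pictures. Legacy VML images (`v:imagedata r:id`), headers and footers are not covered, and a regex is not a real XML parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: reads a .docx as a plain ZIP file, without Apache POI
public class DocxImageOrder {

    // <Relationship Id="rId5" Type="..." Target="media/image1.jpeg"/> in document.xml.rels
    private static final Pattern REL_ELEM = Pattern.compile("<Relationship\\b[^>]*>");
    private static final Pattern ID_ATTR = Pattern.compile("\\bId=\"([^\"]+)\"");
    private static final Pattern TARGET_ATTR = Pattern.compile("\\bTarget=\"([^\"]+)\"");
    // DrawingML pictures point at their image with <a:blip r:embed="rId5"/>
    // (assumes the conventional "r:" namespace prefix)
    private static final Pattern EMBED = Pattern.compile("\\br:embed=\"([^\"]+)\"");

    /** Maps every r:embed in document.xml, in order of appearance, to its rels Target. */
    public static List<String> targetsInDocumentOrder(String documentXml, String relsXml) {
        Map<String, String> targetById = new HashMap<>();
        Matcher rel = REL_ELEM.matcher(relsXml);
        while (rel.find()) {
            Matcher id = ID_ATTR.matcher(rel.group());
            Matcher target = TARGET_ATTR.matcher(rel.group());
            if (id.find() && target.find()) {
                targetById.put(id.group(1), target.group(1));
            }
        }
        List<String> ordered = new ArrayList<>();
        Matcher embed = EMBED.matcher(documentXml);
        while (embed.find()) {
            String target = targetById.get(embed.group(1));
            if (target != null) {
                ordered.add(target); // a picture used twice is listed twice
            }
        }
        return ordered;
    }

    /** Returns the raw bytes of each body image, in the order the images appear in the text. */
    public static List<byte[]> imagesInDocumentOrder(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            // assumes the default main-part names that Word writes
            List<String> targets = targetsInDocumentOrder(
                    readEntry(zip, "word/document.xml"),
                    readEntry(zip, "word/_rels/document.xml.rels"));
            List<byte[]> images = new ArrayList<>();
            for (String target : targets) {
                // relative targets ("media/image1.jpeg") resolve against word/
                String name = target.startsWith("/") ? target.substring(1) : "word/" + target;
                ZipEntry entry = zip.getEntry(name);
                if (entry == null) {
                    continue; // skip targets that are not inside the package
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    images.add(in.readAllBytes()); // Java 9+
                }
            }
            return images;
        }
    }

    private static String readEntry(ZipFile zip, String name) throws IOException {
        ZipEntry entry = zip.getEntry(name);
        if (entry == null) {
            throw new IOException("missing ZIP entry: " + name);
        }
        try (InputStream in = zip.getInputStream(entry)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In the question's `ReadImg`, `pic.get(imageid)` could then be replaced by `ImageIO.read(new ByteArrayInputStream(DocxImageOrder.imagesInDocumentOrder("import.docx").get(imageid)))`. If you would rather stay within POI, walking the body in order (`doc.getBodyElements()`, descending into tables) and collecting `XWPFRun.getEmbeddedPictures()` from each run, then calling `XWPFPicture.getPictureData()`, should give the same reading order without touching the XML; both approaches are worth checking against floating and text-box images.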