Hello, I'm trying to extract an MS Word file embedded in an RTF file using `RTFEmbeddedObject.getEmbeddedObjects(String)`. The method returns a list with four instances, as expected. When I check the resulting data array with Apache Tika, it reports the `application/x-tika-msoffice` MIME type, which seems correct.
However, when I open the resulting file in MS Word, it doesn't show the expected content. I will attach both files to this issue.
Here's the code I'm using:
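```java
import java.io.File;
import java.util.List;

import net.sf.mpxj.mpp.RTFEmbeddedObject;
import org.apache.commons.io.FileUtils;
import org.apache.tika.Tika;

// readLineByLine(file) reads the RTF file into a String (helper not shown)
List<List<RTFEmbeddedObject>> rtfl = RTFEmbeddedObject.getEmbeddedObjects(readLineByLine(file));
Tika t = new Tika();
for (List<RTFEmbeddedObject> l : rtfl) {
    // the second block of each embedded object is the one written out and checked
    byte[] data = l.get(1).getData();
    FileUtils.writeByteArrayToFile(new File("test.doc"), data);
    String s = t.detect(data);
    System.out.println("Mimetype: " + s);
}
```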
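To inspect every block rather than only block 1 of each object (which the loop above also overwrites into the same `test.doc`), each block's bytes could be dumped to its own file together with a check for the OLE2 compound file signature `D0 CF 11 E0 A1 B1 1A E1`, the container format Tika reports as `application/x-tika-msoffice`. This is a minimal diagnostic sketch, not a fix: the class name `EmbeddedBlockDump`, its methods, and the `block-N.bin` file names are hypothetical, and it takes plain byte arrays so it runs without MPXJ. In the loop above, the arrays would presumably come from calling `getData()` on each element of `l`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic helper: dumps each block to a separate file and
// reports whether it starts with the OLE2 compound file signature.
public class EmbeddedBlockDump {
    // First eight bytes of every OLE2 compound file.
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // Returns true if data begins with the OLE2 compound file signature.
    static boolean startsWithOle2Signature(byte[] data) {
        return data != null
            && data.length >= OLE2_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    // Writes each block to dir/block-<index>.bin so no block overwrites another,
    // prints its size and signature check, and returns the paths in block order.
    static List<Path> dumpBlocks(List<byte[]> blocks, Path dir) throws IOException {
        Files.createDirectories(dir);
        List<Path> paths = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            byte[] data = blocks.get(i) == null ? new byte[0] : blocks.get(i);
            Path out = dir.resolve("block-" + i + ".bin");
            Files.write(out, data);
            System.out.println(out.getFileName() + ": " + data.length
                + " bytes, OLE2 signature: " + startsWithOle2Signature(data));
            paths.add(out);
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for RTFEmbeddedObject.getData() results:
        // a class-name string block and a zero-padded buffer that starts with the signature.
        List<byte[]> blocks = List.of(
            "Word.Document.8".getBytes(StandardCharsets.US_ASCII),
            Arrays.copyOf(OLE2_SIGNATURE, 512));
        dumpBlocks(blocks, Files.createTempDirectory("rtf-blocks"));
    }
}
```

Comparing the dumped blocks (sizes, which ones carry the signature) against the attached sample may help narrow down which block, if any, is the document Word expects.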
Attachments: rtfword.zip
Thanks in advance!