{"id":3649,"date":"2014-03-29T07:29:53","date_gmt":"2014-03-29T07:29:53","guid":{"rendered":"https:\/\/unknownerror.org\/index.php\/2014\/03\/29\/how-to-extract-the-text-of-a-ppt-file-with-tika-collection-of-common-programming-errors\/"},"modified":"2014-03-29T07:29:53","modified_gmt":"2014-03-29T07:29:53","slug":"how-to-extract-the-text-of-a-ppt-file-with-tika-collection-of-common-programming-errors","status":"publish","type":"post","link":"https:\/\/unknownerror.org\/index.php\/2014\/03\/29\/how-to-extract-the-text-of-a-ppt-file-with-tika-collection-of-common-programming-errors\/","title":{"rendered":"How to extract the text of a .ppt file with tika?-Collection of common programming errors"},"content":{"rendered":"<p>I have extracted the text of a .pdf file with Tika using the <code>AutoDetectParser<\/code> class, but when I use the same code to extract the text of a .ppt file, it throws an exception. How can I do it? Thanks.<\/p>\n<p>EDIT:<br \/>\nThe code that I used is:<\/p>\n<pre><code>File file = new File(\"1.ppt\");\nInputStream input = new FileInputStream(file);\nParser autoDetectParser = new AutoDetectParser();\nMetadata metadata = new Metadata();\nStringWriter writer = new StringWriter();\nContentHandler handler = new WriteOutContentHandler(writer);\nautoDetectParser.parse(input, handler, metadata, new ParseContext());\n<\/code><\/pre>\n<p>and the exception was:<\/p>\n<pre><code>java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS\nat org.apache.poi.poifs.filesystem.NPOIFSFileSystem.&lt;init&gt;(NPOIFSFileSystem.java:93)\nat org.apache.poi.poifs.filesystem.NPOIFSFileSystem.&lt;init&gt;(NPOIFSFileSystem.java:190)\nat org.apache.poi.poifs.filesystem.NPOIFSFileSystem.&lt;init&gt;(NPOIFSFileSystem.java:184)\nat org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:371)\nat org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)\nat 
org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)\nat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)\nat ppt.PPTParserTest.test3(PPTParserTest.java:52)\n<\/code><\/pre>\n<p>I found out that the problem was caused by some extra <code>jars<\/code> on my <code>classpath<\/code>. A <code>NoSuchFieldError<\/code> like this typically means that a different version of Apache POI was being loaded than the one Tika was compiled against.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have extracted the text of a .pdf file with Tika using the AutoDetectParser class, but when I use the same code to extract the text of a .ppt file, it throws an exception. How can I do it? Thanks. EDIT: The code that I used is: File file = new File(&#8220;1.ppt&#8221;); InputStream input = new FileInputStream(file); [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3649","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/3649","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/comments?post=3649"}],"version-history":[{"count":0,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/posts\/3649\/revisions"}],"wp:attachment":[{"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/media?parent=3649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unknownerror.org\/index.php\/wp-json\/wp\/v2\/categories?post=3649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unknownerro
r.org\/index.php\/wp-json\/wp\/v2\/tags?post=3649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}