By bpriyada
Submitted: Mar 31 2013 / 20:30

By full-text search engines I mean engines that can search large volumes of unstructured text quickly and effectively. To extract the text from PDF documents, we will use Apache PDFBox, an open-source Java library that pulls the content out of PDF files so it can be fed to Lucene for indexing.
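As a rough illustration of why a full-text search engine can answer queries quickly, here is a toy inverted index in plain Java. Each term maps to the set of documents that contain it, so a query becomes a few map lookups instead of a scan over every document. This is only a sketch of the underlying idea under simplifying assumptions, not Lucene's API: Lucene adds analyzers, relevance scoring and on-disk index segments. The class name `InvertedIndexSketch` and its methods are hypothetical, and the strings passed to `addDocument` stand in for text that PDFBox's `PDFTextStripper.getText(...)` would return for a loaded PDF.

```java
import java.util.*;

/**
 * Hypothetical toy inverted index: maps each lower-cased term to the set of
 * document IDs that contain it. Not Lucene's API; analyzers, scoring and
 * on-disk storage are deliberately left out.
 */
public class InvertedIndexSketch {
    // term -> IDs of documents containing that term (sorted for stable output)
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    /** Splits text on runs of non-letter/non-digit characters and records each term. */
    public void addDocument(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue; // split yields a leading "" if text starts with a delimiter
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns IDs of documents that contain every query term (AND semantics). */
    public SortedSet<String> search(String query) {
        SortedSet<String> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue;
            SortedSet<String> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term seeds the result
            } else {
                result.retainAll(docs);         // later terms intersect it
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        // Hypothetical strings standing in for text extracted from two PDFs.
        index.addDocument("report.pdf", "Quarterly report: search latency improved.");
        index.addDocument("manual.pdf", "User manual for the search engine.");
        System.out.println(index.search("search"));        // [manual.pdf, report.pdf]
        System.out.println(index.search("search manual")); // [manual.pdf]
    }
}
```

Sorted sets are used only so the printed results are deterministic; a real engine would rank matches by relevance instead.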