Link Details


By bloid
via ddj.com
Submitted: Aug 27 2008 / 02:36

Some programs need to compare a lot of text strings quickly. Text output filters, search engines, and natural-language processors are examples of applications where rapid text string comparison is important. Besides simple character-by-character comparisons, many text comparison routines need to support matching on wildcard characters. One of the strings might treat the "*" character as a special wildcard character; for example, *sip* should match mississippi.
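To make the `*sip*` example concrete, here is a minimal sketch of "*"-only wildcard matching. It is not the code from the linked ddj.com article; the function name `wildcard_match` and the greedy backtracking approach are assumptions chosen for illustration, and matching is case-sensitive.

```python
def wildcard_match(pattern: str, text: str) -> bool:
    """Return True if text matches pattern, where '*' matches any run
    of characters (including an empty run). Illustrative sketch only."""
    p = t = 0
    star = -1   # index of the most recent '*' seen in pattern, or -1
    mark = 0    # text index that '*' was last asked to match up to
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            # Remember the wildcard and first try matching it against nothing.
            star = p
            mark = t
            p += 1
        elif p < len(pattern) and pattern[p] == text[t]:
            # Literal characters agree; advance both cursors.
            p += 1
            t += 1
        elif star != -1:
            # Mismatch: let the last '*' absorb one more character and retry.
            mark += 1
            t = mark
            p = star + 1
        else:
            return False
    # Text is exhausted; only trailing '*' characters may remain in pattern.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


print(wildcard_match("*sip*", "mississippi"))
```

Worst-case running time of this sketch grows with the product of the pattern and text lengths, because each mismatch can send the matcher back to the last wildcard.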

Comments


larsgregersen replied:

-1 votes

I fail to see why this is important. Regular expression libraries exist for almost any language and environment you could think of. There is no need to invent this type of feature again.

At any rate, I would like to see a comment on the complexity of the algorithm, i.e. how well does it scale for really long strings (e.g. millions of bytes)? The cases where you have "*" at the beginning and/or the end of the search string are special cases that are handled well by standard search algorithms.
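As a rough illustration of the special cases mentioned above, the sketch below reduces patterns whose only "*" characters sit at the start and/or end to ordinary prefix, suffix, or substring tests. The function name `match_edge_wildcards` is hypothetical, and patterns with an interior "*" are deliberately rejected rather than handled.

```python
def match_edge_wildcards(pattern: str, text: str) -> bool:
    """Match patterns whose '*' characters appear only at the ends.
    Illustrative sketch; interior wildcards are out of scope."""
    core = pattern.strip("*")  # literal part with edge wildcards removed
    if "*" in core:
        raise ValueError("interior '*' needs a general wildcard matcher")
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    if leading and trailing:
        return core in text           # "*sip*": plain substring search
    if leading:
        return text.endswith(core)    # "*sip": suffix test
    if trailing:
        return text.startswith(core)  # "sip*": prefix test
    return text == core               # no wildcard: exact comparison


print(match_edge_wildcards("*sip*", "mississippi"))
```

The substring case is where a standard search algorithm does the real work; the prefix and suffix cases only ever inspect as many characters as the literal part contains.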

Also I recommend the inclusion of existing tricks such as the Boyer-Moore algorithm for string searches in order to speed things up.
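For readers unfamiliar with the idea, here is a small sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm with the good-suffix rule). The name `horspool_find` is made up for this example, and the window comparison uses a plain slice for brevity.

```python
def horspool_find(needle: str, haystack: str) -> int:
    """Return the index of the first occurrence of needle in haystack,
    or -1 if absent, using Boyer-Moore-Horspool skipping."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each character of needle except the last, record how far its
    # rightmost occurrence is from the end of needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Skip ahead based on the text character aligned with needle's last
        # position; characters not in the table allow a full-length jump.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


print(horspool_find("sip", "mississippi"))
```

The speedup comes from the shift table: on a mismatch the search window can often jump several characters at once instead of advancing by one.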

In an international environment you'll probably see that the proposed conversion to lower case will fail. Conversion to lower case for Unicode is non-trivial and could amount to more computing time than the search.
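A short sketch of why simple lower-casing is fragile outside ASCII, using only Python's built-in string methods. The helper name `caseless_equal` is an assumption for this example, and a production comparison would likely also need Unicode normalization and locale-specific rules, which this sketch leaves out.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings using Unicode case folding rather than lower()."""
    return a.casefold() == b.casefold()


# German sharp s: lower() leaves it alone, so the two spellings differ.
print("Straße".lower() == "STRASSE".lower())
# Case folding maps the sharp s to "ss", so the spellings compare equal.
print(caseless_equal("Straße", "STRASSE"))
# Lower-casing the Turkish dotted capital I yields two code points
# ("i" plus a combining dot), so the string length changes.
print(len("İ"), len("İ".lower()))
```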


Voters For This Link (2)



Voters Against This Link (1)