Cleaning Strings With Regular Expressions
First, let's get rid of all non-ASCII characters.
=> text = "Normal ©®»λαβstring"
"Normal ©®»λαβstring"
=> stripped = text.gsub(/[^\x20-\x7E]/, '')
"Normal string"
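One thing worth noting is that the range \x20-\x7E covers only printable ASCII, so this pattern removes tabs and newlines as well as accented or non-Latin characters. The sketch below, written for plain Ruby 1.9 or later, shows that behavior next to a variant that keeps common whitespace. The method names `strip_non_printable_ascii` and `strip_non_ascii_keep_whitespace` are hypothetical helpers introduced here for illustration.

```ruby
# Hypothetical helpers for illustration; not part of the original snippet.

# Keep only printable ASCII (space through tilde).
# Tabs and newlines fall outside that range, so they are removed too.
def strip_non_printable_ascii(text)
  text.gsub(/[^\x20-\x7E]/, '')
end

# Variant: also keep tab, line feed and carriage return.
def strip_non_ascii_keep_whitespace(text)
  text.gsub(/[^\x20-\x7E\t\n\r]/, '')
end

sample = "Normal ©®»λαβstring\twith a tab\n"
puts strip_non_printable_ascii(sample).inspect        # "Normal stringwith a tab"
puts strip_non_ascii_keep_whitespace(sample).inspect  # "Normal string\twith a tab\n"
```

Which variant fits depends on whether line structure matters in the cleaned text.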
Now let's get rid of HTML tags.
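# Strip HTML tags, keeping any tag whose name is listed in preserve_tags.
def strip_html(str, preserve_tags = ['p'])
  return '' unless str.is_a?(String)
  str = str.strip
  # With nothing to preserve, remove every tag.
  return str.gsub(/<[^>]*>/, '') if preserve_tags.empty?
  preserve_el = preserve_tags.map { |tag| Regexp.escape(tag) }.join('|')
  # The negative lookahead skips opening and closing tags whose whole name
  # is in preserve_tags; \b keeps 'p' from also matching 'pre'.
  str.gsub(/<(?!\/?\s*(?:#{preserve_el})\b)[^>]*>/i, '')
end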
=> text = "<p>This is a <a href=\"http://www.example.com\">link</a> and a <span>span</span></p>"
"<p>This is a <a href=\"http://www.example.com\">link</a> and a <span>span</span></p>"
=> stripped = strip_html(text)
"<p>This is a link and a span</p>"
=> stripped = strip_html(text, [])
"This is a link and a span"
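A caveat when whitelisting tags by name in a regular expression: a bracket expression such as [^p] tests a single character, so it cannot tell <p> from <pre>. A negative lookahead compares the whole tag name instead. The following is a minimal sketch of that idea. `strip_tags_except` is a hypothetical name, and regex-based stripping is assumed to be adequate only for simple, well-formed markup, not arbitrary HTML.

```ruby
# Hypothetical helper for illustration; not part of the original snippet.
# Removes every tag except those whose full name appears in keep_tags.
def strip_tags_except(html, keep_tags = ['p'])
  return html.gsub(/<[^>]*>/, '') if keep_tags.empty?

  names = keep_tags.map { |tag| Regexp.escape(tag) }.join('|')
  # (?!...) spares a tag only when its whole name matches;
  # \b stops "p" from matching the start of "pre".
  html.gsub(/<(?!\/?\s*(?:#{names})\b)[^>]*>/i, '')
end

html = '<p>Keep <pre>code</pre> and <b>bold</b> and <i>italic</i></p>'
puts strip_tags_except(html)              # <p>Keep code and bold and italic</p>
puts strip_tags_except(html, ['p', 'b'])  # <p>Keep code and <b>bold</b> and italic</p>
puts strip_tags_except(html, [])          # Keep code and bold and italic
```

For untrusted or malformed input, an HTML parser is generally a safer choice than a regular expression.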
Finally, let's compact the whitespace so that at most one space remains between any two words.
=> text = "  This   is some    text with  strange   spacing patterns  "
"  This   is some    text with  strange   spacing patterns  "
=> stripped = text.gsub(/\s{2,}/, ' ').strip
"This is some text with strange spacing patterns"
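The compaction step can be checked against tabs and newlines as well as spaces. The sketch below uses plain Ruby, and `compact_whitespace` is a hypothetical helper name. It replaces each whitespace run with a single space (rather than deleting the run), so neighbouring words stay separated.

```ruby
# Hypothetical helper for illustration; not part of the original snippet.
# Collapses each run of whitespace (spaces, tabs, newlines) into one space
# and trims both ends.
def compact_whitespace(text)
  text.gsub(/\s+/, ' ').strip
end

puts compact_whitespace("  This   is some \t text\nwith    strange spacing  ")
# This is some text with strange spacing
```

If only literal spaces need collapsing, `String#squeeze(' ')` is a simpler alternative, though it leaves tabs and newlines untouched.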





