Counting Characters In String

07.26.2005
>>> s = 'a;jfkd;aflhakfhaskfjalghlakfhfnkjafyksd'
>>> cnt = {}
>>> for c in s:
...     cnt[c] = cnt.get(c, 0) + 1
...
>>> print cnt
{'a': 7, 'd': 2, 'g': 1, 'f': 7, 'h': 4, 'k': 6, 'j': 3, 'l': 3, 'n': 1, 's': 2, 'y': 1, ';': 2}
The same loop tallies the items of any iterable, as long as the items are hashable.
Note the use of dict.get(key, default), which returns 0
when the key is not yet in the dictionary.
If this were Perl, I would just do a
cnt[c] += 1
but on a plain dict Python raises a KeyError for a missing key instead of treating it as 0.
It's not too bad, though.
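If you do want the Perl-style increment, the standard library's collections module gets you there. The sketch below is written for Python 3 (the session above is Python 2) and is only one way to do it: defaultdict(int) supplies the missing 0, and Counter does the whole tally in one call.

```python
from collections import Counter, defaultdict

s = 'a;jfkd;aflhakfhaskfjalghlakfhfnkjafyksd'

# defaultdict(int) creates a missing key with the value 0 on first access,
# so the bare increment works without dict.get.
cnt = defaultdict(int)
for c in s:
    cnt[c] += 1

# Counter is a dict subclass that builds the same tally from any iterable.
counter = Counter(s)

print(cnt['a'], cnt[';'])            # 7 2
print(dict(cnt) == dict(counter))    # True
```

Counter also returns 0 for a key it has never seen, so a lookup like counter['z'] does not raise.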

Comments

Snippets Manager replied on Wed, 2009/07/15 - 3:38pm

Thanks so much for this - incredibly useful! Needed to use the strip_tags and strip_links functions to send plain text versions of an email.

Snippets Manager replied on Mon, 2009/01/12 - 12:16am

Be careful looping over strings like that. If you happen to be looping over a UTF-8 encoded byte string, you're going to get unexpected results.

>>> import sys
>>> import codecs
>>> # Unicode string
>>> us = u"a;bccd\u00E9\u00E9"
>>> s = us.encode("utf-8")
>>> def count_chars(s):
...     cnt = {}
...     for c in s:
...         cnt[c] = cnt.get(c, 0) + 1
...     return cnt
...
>>> # This doesn't work because it's a byte stream
>>> print s
a;bccdéé
>>> print len(s) # Unexpected str len, counts bytes not letters
10
>>> print count_chars(s)
{'a': 1, '\xc3': 2, 'b': 1, 'd': 1, '\xa9': 2, 'c': 2, ';': 1}
>>> # Since é is a multi-byte UTF-8 char, when you loop over the string
>>> # it counts each byte of the char instead of the char itself
>>> # We have to do this so that unicode objects written to stdout
>>> # are encoded automatically as UTF-8
>>> sys.stdout = codecs.EncodedFile(sys.stdout, "utf-8", "utf-8")
>>> # This will work because it's a unicode object
>>> print us
a;bccdéé
>>> print len(us) # Correct char count, counts Unicode code points, not bytes
8
>>> print count_chars(us)
{u'a': 1, u'c': 2, u'b': 1, u'd': 1, u'\xe9': 2, u';': 1}
>>> # That u'\xe9' is the é char.
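For anyone on Python 3, here is a rough equivalent of the session above. It is a sketch, not a port: Python 3's str is already Unicode text, the encoded form is a separate bytes type, and Counter from collections stands in for count_chars.

```python
from collections import Counter

us = "a;bccd\u00e9\u00e9"      # text: 8 code points
bs = us.encode("utf-8")        # bytes: each é becomes the pair 0xC3 0xA9

text_counts = Counter(us)      # keys are one-character strings
byte_counts = Counter(bs)      # keys are ints, since iterating bytes yields ints

print(len(us), len(bs))                       # 8 10
print(text_counts["\u00e9"])                  # 2
print(byte_counts[0xC3], byte_counts[0xA9])   # 2 2
```

The same pitfall applies: count the decoded text, not the encoded bytes, if you care about characters.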

Snippets Manager replied on Mon, 2012/05/07 - 2:13pm

>>> s = 'a;jfkd;aflhakfhaskfjalghlakfhfnkjafyksd'
>>> dict((c, s.count(c)) for c in s)
{'a': 7, 'd': 2, 'g': 1, 'f': 7, 'h': 4, 'k': 6, 'j': 3, 'l': 3, 'n': 1, 's': 2, 'y': 1, ';': 2}
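One caveat with the one-liner: s.count(c) scans the whole string on every call, and the generator calls it once per position, so repeated characters are recounted many times. A small variation (Python 3 syntax, offered as a sketch) iterates over set(s) so that each distinct character is counted only once.

```python
s = 'a;jfkd;aflhakfhaskfjalghlakfhfnkjafyksd'

# One str.count pass per distinct character instead of one per position.
counts = {c: s.count(c) for c in set(s)}

print(len(s), len(counts))         # 39 12
print(counts['a'], counts['k'])    # 7 6
```

For long strings with many distinct characters, the single-pass loop from the original post is still likely the better choice.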