DZone Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Snippets has posted 5883 posts at DZone. View Full User Profile

Balance Binary Search Tree In Python - Code Module

01.05.2011
| 6325 views |
  • submit to reddit
        
'''
updated 01/06/11 to simplify empty tree definition
Module for balanced binary search trees. Usage:
	from bbst import *
	defines two classes, bbstree and bbsnode.
	
	tree = bbstree() creates a new, empty tree.

	tnode = tree.lookup(k) searches tree for a node with key k.
		Returns the bbsnode with key k, or None if k is not found.
		tnode.key, tnode.value are usable attributes when found.

	tree.forward() is a generator that returns each tree node
		in ascending key order.
	tree.reverse() is a generator that returns each tree node
		in descending key order.
			for tnode in tree.forward/reverse():
				print(tnode.key, tnode.value)

	tree.height() returns the height of the tree, 0 for the empty tree.
		The tree's population is between 2**height and 2**(height+1),
		that is, if height is 7, the tree has from 128 to 255 nodes.
		The expected search length is height, the maximum is height+2.

	tree.insert(k, v=None) creates a node with key=k and value=v
		and inserts it into tree. If a node with key=k already
		exists, its value is updated to v. Nothing is returned.

	tree.instest(k, v=None) looks for a node k in the tree, and inserts
		it only if it does not exist. Returns the node. Use this for
		example if you are using the tree to keep a tally of items k:
		tree.instest(k,0) installs node k only if it didn't exist, but
		returns it, so tree.instest(k,0).value += 1 keeps a tally.
		
	tree.delete(k) deletes the tree node with key k if it exists, and
	        rebalances the tree as necessary.
	
Any data type may be used for a key (not just integers). However, during
lookup, insert, and delete, the key k may be compared to any node's key.
Thus all keys must support ==, >, and < comparisons and be mutually
comparable by Python type rules.

The value of a node may be any Python type. To associate a value with
a key, present it as v in insert(k,v) or instest(k,v). To retrieve a
value, access tree.lookup(k).value or tree.instest(k,v).value. To change
the value of an existing key, assign to tree.lookup(k).value or to
tree.instest(k,v).value, or simply call tree.insert(k,newvalue)

The tree is balanced; adding and deleting keys in any sequence will
not produce an unbalanced tree. The algorithms are based on the
discussion by Julienne Walker found at
http://eternallyconfuzzled.com/tuts/datastructures/jsw_tut_andersson.aspx
'''
#
# A "node" is an object containing a key, a value, a level, and
# links to at most two other nodes.
#
# Walker's algorithms expect the subtrees of any leaf node to be filled
# with pointers to a nil node that acts as a sentinel. It has level=0
# and left and right subs that point to itself. Here we create that one
# node in the bbst module namespace.
#
class bbsnode_0:
	def __init__(self):
		self.level = 0
		self.key = None
		self.value = None
		self.subs = [None,None]
#
NIL_NODE = bbsnode_0()
NIL_NODE.subs = [NIL_NODE,NIL_NODE]
#
# Here we create an official object of type node, with subtrees initialized
# to NIL_NODE:
#
class bbsnode:
	def __init__(self,k,v=None):
		self.level = 1
		self.key = k
		self.value = v
		self.subs = [NIL_NODE,NIL_NODE]
#
# End of class bbsnode.
#
class bbstree:
#
# A "tree" is an object that contains zero or more "nodes." The tree,
# which is the "self" argument to all these member functions, is not
# itself a node, but rather contains nodes, or actually, links to the
# one node which is currently the root of the tree. That in turn links
# to sub-nodes and they to sub-sub-nodes etc.
#
# Here we initialize a new tree. Its only node is NIL_NODE which has
# a level of 0, hence the height of an empty tree is 0.
#
	def __init__(self):
		self.root = NIL_NODE # the tree is empty
		self.lastinsert = None # save last-inserted node
#
# Height
#
	def height(self):
		return self.root.level

#
# Look for key k in the tree; if found, return its node, else return None.
# tree.lookup() is the "public" method; it handles the special case of
# the empty tree. We use __lookup() to recurse into the tree.
#
	def lookup(self, k):
		if self.root != NIL_NODE: # we are not an empty tree
			return self.__lookup(self.root, k)
		else: # this tree has no nodes
			return None # not found
#
# Recursive lookup
#
	def __lookup(self, tnode, k):
		if tnode.key == k:
			return tnode
		else:
			sub = tnode.subs[k > tnode.key] # k is in left or right?
			if sub != NIL_NODE: # we have a subtree that side
				return self.__lookup(sub, k)
			else:
				return None
#
# Generator functions, forward() for stepping through the tree in order, and
# reverse() backwards. Self is a tree object, and the generator really needs to
# operate on the tree nodes, so the real generator is __scan. To scan forward,
# one goes first down the left (0) side then down the right (1) side. To scan
# backward, the reverse. Since the code is the same except for the subscripts
# of tnode.subs[], those numbers are passed in as arguments to __scan().
# The scanning function is not recursive as this would be awkward for a python
# generator (yes you can have recursive generators but a generator that calls
# itself only returns another generator object!). It emulates recursion with a
# stack. Stack size is not an issue with a balanced tree; if we stack 31
# tnodes we have a tree of 2 giga-nodes and stack size is not our main issue!
# An empty tree has self.root = NIL_NODE, terminating the generator immediately.
#
	def forward(self):
		return self.__scan(self.root,0,1)

	def reverse(self):
		return self.__scan(self.root,1,0)

	def __scan(self,tnode,first,second):
		direction = 0
		stack = [NIL_NODE] # when popped, ends the loop
		while tnode != NIL_NODE:
			if direction == 0:
				while tnode.subs[first] != NIL_NODE:
					stack.append(tnode)
					tnode = tnode.subs[first]
			yield tnode
			if tnode.subs[second] != NIL_NODE:
				tnode = tnode.subs[second]
				direction = 0
			else:
				tnode = stack.pop()
				direction = 1

#
# The following are sub-functions of __insert, used to balance the tree.
# One basic operation is the skew: "A skew removes left horizontal links by 
# rotating right at the parent." In the example, a/b/c/d/e are key values
# and the digits are level values. Skew(node-with-d):
#
#             d,2               b,2
#            /   \             /   \
#         b,2     e,1  -->  a,1     d,2
#        /   \                     /   \
#     a,1     c,1               c,1     e,1
#
# The following is Walker's non-recursive skew. Although it gets a self
# arg by virtue of being a class attribute of tree, "self" (the tree object)
# is ignored. The function operates only on the argument (a node) t.
#
	def __skew(self, t):
		if t.level != 0: 
			if t.subs[0].level == t.level:
				tmp = t.subs[0]
				t.subs[0] = tmp.subs[1]
				tmp.subs[1] = t
				t = tmp
		return t
#
# The second basic operation is the split: "A split removes consecutive
# horizontal links by rotating left and increasing the level of the parent."
# Continuing the example following the above skew, split(node-with-b)
#
#         b,2                     d,3
#        /   \                   /   \
#     a,1     d,2     -->     b,2     e,2
#            /   \           /   \
#         c,1     e,2     a,1     c,1
#
# The following is Walker's non-recursive split. Again self (a tree) is ignored.
#
	def __split(self, t):
		if t.level != 0:
			if t.level == t.subs[1].subs[1].level:
				tmp = t.subs[1]
				t.subs[1] = tmp.subs[0]
				tmp.subs[0] = t
				t = tmp
				t.level += 1
		return t
#
# With that, we can insert key k into the tree (or simply find it, if
# it already exists). Nothing is returned. As with tree.lookup(),
# tree.insert() is the public method and handles the special case of
# the empty tree; then __insert() is called to do the recursion.
#
	def insert(self, k, v=None):
		if self.root != NIL_NODE: # we are not an empty tree
			self.root = self.__split( self.__skew( self.__insert(self.root, k, v) ) )
		else: # empty tree getting its first node
			self.root = bbsnode(k,v)
			self.lastinsert = self.root
#
# Recursive insertion
#
	def __insert(self, tnode, k, v):
		if k == tnode.key: # key exists, update value
			tnode.value = v
		else: # it goes in the left or the right
			side = k > tnode.key
			sub = tnode.subs[ side ]
			if sub != NIL_NODE: # there is a subtree that side
				tnode.subs[side] = self.__split( self.__skew( self.__insert(sub, k, v) ) )
			else: # no subtree on that side, make new leaf
				self.lastinsert = bbsnode(k,v)
				tnode.subs[side] = self.lastinsert
		return tnode
#
# Because of balancing, it is impractical for insert() to return the inserted 
# node: __insert() may return a different node as a result of balancing, and
# insert() doesn't know if self.lastinsert was updated or not. This makes it
# awkward to use the tree e.g. to keep a running tally of input symbols, where
# each symbol is stored as a key and the tally is the value. Here we provide
# instest() for this purpose: it can know that self.lastinsert was updated
# because of the failure of the preceding lookup.
#
	def instest(self, k, v):
		tnode = self.lookup(k)
		if tnode == None:
			self.insert(k, v)
			tnode = self.lastinsert
		return tnode
#
# Deletion, still following the lead of Julienne Walker although, you know,
# rewriting pretty freely. Again the public method is part of a tree and 
# handles the special case of the empty tree. __delete() does the recursion.
#
	def delete(self, k):
		if self.root != NIL_NODE: # we are not an empty tree
			self.root = self.__delete(self.root, k)
#
# Recursive deletion with nonrecursive skew and split. This is the second
# version Walker gives, described as "simple, but unconventional enough
# to be confusing." Yup. See her discussion for why it is always enough
# to do three skews and two splits. Also recall that the skew and split
# routines are conditional and do nothing but a test when not needed.
#
	def __delete(self, tnode, k):
		if tnode != NIL_NODE:
			if k == tnode.key:
				if (tnode.subs[0] != NIL_NODE) & (tnode.subs[1] != NIL_NODE) :
					heir = tnode.subs[0]
					while heir.subs[1] != NIL_NODE:
						heir = heir.subs[1]
					tnode.key = heir.key
					tnode.value = heir.value
					tnode.subs[0] = self.__delete(tnode.subs[0], tnode.key)
				else:
					tnode = tnode.subs[tnode.subs[0]==NIL_NODE]
			else:
				side = tnode.key < k # side==0 on key > OR EQUAL
				tnode.subs[side] = self.__delete(tnode.subs[side], k)
		lv = tnode.level-1
		if (tnode.subs[0].level < lv) | (tnode.subs[1].level < lv) :
			tnode.level = lv
			if tnode.subs[1].level > lv:
				tnode.subs[1].level = lv
			tnode = self.__skew(tnode)
			tnode.subs[1] = self.__skew(tnode.subs[1])
			tnode.subs[1].subs[1] = self.__skew(tnode.subs[1].subs[1])
			tnode = self.__split(tnode)
			tnode.subs[1] = self.__split(tnode.subs[1])
		return tnode