I have to start this post by saying that I love functional programming. I understand that purely functional languages can perfect for only a small subset of problems, thus are not really valuable for real-world scenarios, BUT its learning can prove to be fantastic to create a new mindset on how to approach problems. With that in mind, I decided to write this post on how to code a Binary Search Tree in Haskell alongside some classic BST algorithms.
Let's get started!
We start by telling Haskell what a BST is. In this case, a BST of type t can be None, which is just a name that we are going to use to say that the node is not defined, or can be a Node, that has a value of type t with two children that are also BSTs. You don't need to focus on the deriving (Show) part, it simply means that our data definition uses the Show class in order to print.
data BST t = None | Node t (BST a) (BST a) deriving (Show)
It's interesting to note here this powerful ability of defining names in Haskell. This can be used to easily create new types with definitions that don't necessarily exist in the language, like we just did.
Now that we have our definition of what a BST is, we want to create a function that adds a value to our tree.
add :: BST Integer -> Integer -> BST Integer add None x = Node x None None add (Node y left right) x | x > y = Node y left (add right x) | otherwise = Node y (add left x) right
For those who don't remember, a Binary Search Tree is a binary tree where, on every given node, all nodes to his right are bigger than him and all nodes to his left are smaller or equal to him. That's exactly what we are telling our function to do.
We first say that our add function expects a BST of integers and an integer, and returns a BST of integers. Note here that Haskell supports high order functions which means that this same definition can be used for a function that receives a BST of integers and returns a function that expects an integer and returns a BST of integers. I will not enter this subject right now, but you can keep that in mind as it is really cool.
With the function type set, we can tell what our function actually does. We only have two scenarios here. If our BST is empty, we return a Node with the given value and empty children, and if we have a Node already, we compare its value to one we want to add. If the new value is bigger than the value of the current node, we say that our right child is equal to the add function with our right child and the value we want to add, otherwise our left node is equal to the add function with our left child and the new value. Pretty, right?
Since Haskell is purely functional, we don't have stuff like variables, so it's important to have a function that receives a list of values and creates a BST out of them.
addMany :: BST Integer -> [Integer] -> BST Integer addMany tree  = tree addMany tree (head:tail) = addMany (add tree head) tail
Here we define our addMany function, which receives a BST and a list of integers. If our list is empty, it means that there are no new nodes to add, so we can simply return our current tree, but if we still have elements, we recursively call our addMany function by adding the first element of the list to our current tree, and the rest of our list.
And remember the high order stuff? For the more advanced people out there, we can also define our addMany function as the following, although I won't give you too much detail on this as I believe it would go out of the purpose of this post.
addMany :: BST Integer -> [Integer] -> BST Integer addMany = foldl add
Now let's tackle some classic BST algorithms. For the sake of length of the post, I will talk about only a few problems: check if a value exists, tree depth, and get tree levels - just to get a sense of a trickier one.
hasValue :: BST Integer -> Integer -> Bool hasValue None _ = False hasValue (Node x left right) y | x == y = True | otherwise = hasValue left y || hasValue right y
The hasValue function, that receives a BST and an integer and returns a bool if the value exists or not, has two scenarios. The first one, if the BST is None, returns false, because we didn't find the value on the tree. If the BST is not None, on the other hand, if the value we are looking for is equal to the current one, means that we found the node and return True, otherwise we simply search the value on the left and right children, and return if we found the value on either one of them.
depth :: BST Integer -> Integer depth None = 0 depth (Node _ left right) = 1 + bigger (depth left) (depth right) where bigger x y | x > y = x | otherwise = y
For the depth, if our BST is None, we simply return 0, but if it has a Node, we return 1 + the most profound subtree between the left and right child.
Here is nice to note the order of the sum. Since Haskell has a lazy evaluation, we can put the recursion in the end to create a tail recursion and save some memory.
levels :: BST Integer -> [[Integer]] levels None =  levels x = addAtLevel [(x, 0)] where appendAt x n  = appendAt x n [] appendAt x 0 (list : tail) = (x : list) : tail appendAt x at (head : tail) = head : appendAt x (at -1) tail addAtLevel  =  addAtLevel ((None, _) : tail) = addAtLevel tail addAtLevel ((Node v left right, pos) : tail) = appendAt v pos (addAtLevel (tail ++ [(left, pos + 1), (right, pos + 1)]))
Finally, a difficult one. Not exactly because the problem itself is hard, but because it requires us to think about how to do a level traverse in a recursive way, which sounds not very intuitive.
For this, we will create 2 auxiliary functions. The first one receives a value, an index, and a list of lists and inserts this value as the first element of the list of position index. If the position index is not yet a list, it creates one and inserts the value.
The second and most important function receives a list of pairs. Each pair contains a node and the level of that node, which will give us the index in which we will insert the value. If the list is empty, we simply return an empty list. If the node we want to insert is None, it means that is not a value, and, thus, will not be inserted in the final answer, so we return the recursive call of the remaining of the list. If the pair has a Node, on the other hand, we call the appendAt function on the value of the Node, to be inserted at position pos in the recursive call of the remaining of the list concatenated with our left and right children with the current position plus one. Appending these children to the end of the list will do the work of a queue, which will help us go through the tree level by level.
You can realize that the pattern is always the same. We first think of the scenario that ends the recursion and then the ones the call the function recursively.
Sometimes can be difficult to think like that, but forcing yourself to approach every problem this way can make it easier to find solutions where this is actually a good approach.