Posted on August 21, 2011 by

Nasdaq ITCH

I recently read through Veryon’s posts on bootstrapping a low latency HFT and I was intrigued. I had really thought that all automated trading was simply watching stock prices, comparing to different algorithms, and making decisions. I didn’t see any reason why you would need low latency or anything like that. I learned a lot.

I like the idea of doing things fast and using algorithms that I never get to use during my normal day. So I created a Python based ITCH 4.1 reader, which is hosted on github. It is terribly slow, and I intend to delve into the mulitprocessing library later when I have more time. But I wanted to start an implementation in C as well, which is where I started playing with a retrieval tree focused on stock tickers.

One of the primary obstacles of doing HFT is keeping track of every stock ticker while ever changing data comes in, and making decisions based on the information you have. We need a quick way to lookup stock ticker information, and I decided to use a retrieval tree.

Based on ITCH 4.1, a stock ticker is a maximum of 8 characters, all uppercase. This means that our retrieval tree will be at most 8 levels deep and 26 levels wide. Since we are, theoretically, playing with stocks and huge money, memory should be no problem. We are more concerned with speed. So instead of using a list of child nodes that has to be iterated at each level of depth, I have a static array of 26 child nodes. Each node will be accessed by subtracting 65 from the current key character (remember, all uppercase). Our memory requirements will be (26^8)*sizeof(node). HOWEVER, this is assuming we have every value in use. Really, we use on average 3 levels of depth. But that third level will still have a static array of nodes, so we plan for 4 levels. Our structure is 109 bytes, so we need (26^4)*109, or roughly 50MBs of ram to hold the trie structure.

At the expense of RAM, the search time of the trie is O(l) where l is the length of the key. This is because we don’t have to search a list of child nodes at each level, they are already created.

void *trie_get(struct node* root, char* key)
{
    int i = key[0] - 65;
 
    // If the next character is null, then this is
    // the end of our key.
    if( key[1] == 0 )
        return root->value_ptr;
 
    // Otherwise we go to the offset of the next character
    if( key[i] != NULL )
        return trie_get(root->nodes[i], &key[1]);
    return NULL;
}

Creating the trie is more expensive, because each node has a 26 element array of pointers that has to be allocated.

int trie_set(struct node* root, char* key, void* value)
{
    int i = key[0] - 65;
 
    // If this is the last letter in our key then
    // set the value and return
    if( key[1] == 0 )
    {
        root->value_ptr = value;
        return 1;
    }
 
    // If we have no node, then create the node, and
    // continue recursing the key
    if( root->nodes[i] == NULL )
    {
        struct node* new_node = malloc(sizeof(struct node));
        root->nodes[i] = new_node;
        return trie_set(new_node, &key[1], value);
    }
 
    // Otherwise just keep walking the trie
    return trie_set(root->nodes[i], &key[1], value);
}

But thats ok, because ITCH will send us every ticker at the beginning of the day before the market even opens. So we will create the entire trie before we need real speed.

With some more free time, I can hopefully get the whole parser written in C. Until then…