Design question for tick database
819828Nov 30 2010 — edited Nov 30 2010Hi,
I'm looking at using BerkeleyDB (JE) to store trades (financial market trades). Since I'm sure this is not an uncommon use, I thought I would poll for design advice.
Each record would have a datetime, symbol, price and size. A record in its own right is not unique - there can be duplicates - so I believe I'll need to add a unique sequence number, call it id.
A feed handler would be continually capturing market data and writing to the database - in normal mode this would always be writing data with the current datetime (increasing).
Queries would be a combination of replaying all data between two dates, and also instrument specific queries (give me the last trade for symbol X at datetime Y).
My initial thought was to have a separate database per symbol, allowing duplicates, keying only on the datetime. This is ok but gives bad write performance since if we are capturing 500 symbols, we are writing to 500 databases so the IO is very scattered.
My second thought was to have a single database, with a secondary index on the symbol. The problem with this is that I believe if I have a secondary index, the database cannot have duplicates. So I think I'd need to key on dttm + id to ensure uniqueness but still allow scrolling by datetime. I think this would give a good IO pattern for capturing data, and good access for querying both by symbol and from/to date ranges.
Any comments welcome - I'm sure this is a common use case and has been designed-to-death by know.
Thanks,
Jon