Coinciding with another project of mine, I've decided to design a graph database management system. In Python.
Why in the world would I want to do that? There are a number of solid graph databases out there. What do I hope to accomplish?
Primarily, I'd like to learn. Database systems require developing with a much keener eye toward performance than I'm used to. Beyond runtime and memory optimization, I'll need to constantly juggle I/O time. Revisiting hard drive geometry will be painful, but there is a certain thrill you get from understanding your application's performance to that level, and I plan to chase it.
I chose Python for the usual reasons- to save on development time. Designing a DBMS is serious work, or so I hear- and it'll be a while before I can hack on this stuff with confidence. Until then, I'd like to use an enabling language. There are better performing choices than Python, surely, but until I have an idea where the bottlenecks lie, why worry? Oh, and I almost forgot: Python is awesome.
I've already run into a number of challenges on this project.
- Resources on designing a DBMS are scarce.
- Resources on designing a graph DBMS are nonexistent. Looks like I'll be cracking open some source, and soon.
- Clever disk use is tough. Seemingly simple questions about, say, the benefits of memory-mapped I/O don't always have easy answers.
Despite these, I'm going to give this thing a shot. I'll post more on my progress as it, well, progresses.
