edit: links to home.etherpunk.com will likely not work properly since I'm moving servers around. I'll fix this when I get a chance.
Yesterday a bug bit me: I wanted to see what the fastest way to read and write data to a database was. Choosing how to do the writing was the fun part. I chose LINQ, SqlCommand, and SubSonic (2.1 Pakala). There are a couple of different ways to use each, and I wondered which way was the right way. I found some interesting results.
Firstly, I'd like to provide the link to the code that I used so you can verify the results yourself or critique my methods. It can be found at: http://home.etherpunk.com/svn/DatabaseSpeedTest/DatabaseSpeedTest/WriteT...
Secondly, the database structure is *very* simple. The only table I'm using is a TypeLookup table, which is essentially a primary key (set as a unique identifier), a name, and a createdon field. This is a table I'm using for another application (that application is exceptionally simple, which is why I chose it).
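For readers without the original code, here is a minimal sketch of a table with that shape. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the column names (TypeLookupID, Name, CreatedOn) are assumptions: the post only says the table has a primary key, a name, and a createdon field.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical column names; only "primary key, name, createdon" is known.
SCHEMA = """
CREATE TABLE TypeLookup (
    TypeLookupID TEXT PRIMARY KEY,   -- stands in for a SQL Server unique identifier
    Name         TEXT NOT NULL,
    CreatedOn    TEXT NOT NULL
)
"""


def new_row(name):
    """Build one (id, name, createdon) tuple with a client-generated key."""
    return (str(uuid.uuid4()), name, datetime.now(timezone.utc).isoformat())


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO TypeLookup VALUES (?, ?, ?)", new_row("example"))
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM TypeLookup").fetchone()[0]
print(row_count)
```

Generating the key on the client, as new_row does here, is only one option; the key could also be assigned by the database.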
Here are the results (in seconds):
Running write tests. Creating 10000 entities.
LINQ using Attach: 3.523
LINQ using InsertOnSubmit: 38.25
SqlCommand Dynamic Sql (constant connection string): 15.989
SqlCommand Dynamic Sql (static connection string): 15.123
SqlCommand Individual (constant connection string): 8.86
SqlCommand Individual (static connection string): 9.526
SqlCommand Individual More Efficient (constant connection string): 8.862
SqlCommand Individual More Efficient (static connection string): 9.527
SqlCommand with StringBuilder (constant connection string): 6.056
SqlCommand with StringBuilder (static connection string): 6.144
SubSonic with individual saves: 8.945
SubSonic using Collections: 9.654
An interesting thing is how horrible InsertOnSubmit() is for LINQ. I'm guessing that I'm using it wrong because, holy crap, it doesn't scale well at all. The Attach() method, however, seems to be significantly faster than everything else. I tried using 100,000 entities, but many of these methods just didn't scale well and the run took longer than 10 minutes. I'll probably do that later tonight and update this blog with those numbers in the morning.
I should also note that I learned how to use LINQ yesterday... so it's very likely I'm not using it properly but it's a pain to find good and simple examples. Luckily Visual Studio 2008 came with some which got me going... but some of those were either lacking in areas I needed or more complex than I cared for -- fortunately Google to the rescue.
I find it very interesting that opening and closing a connection has next to zero overhead when creating entities. Granted, this is likely using pooling, but still -- I would have thought even the code overhead would have been more than that. I also find it interesting how much faster a constant value is than a static value -- the read access must really be something to take into consideration.
Something I learned about LINQ is that it seems to bring the constraints over from the database into the code itself, while SubSonic seems to just set up classes and be fairly stupid (which isn't necessarily a bad thing and can give you more control in certain situations). Because of this, I wasn't able to take advantage of the LINQ collections: it forced me to make a unique ID upon adding an entity or else it would throw an exception ("Cannot add an entity with a key that is already in use."). I haven't figured out how to get around that one just yet, because I don't want to assign those keys in code; I want the database to assign them when the entities are saved.
I'd really like to compare how well LINQ and SubSonic each work with revision control systems. Regenerating the SubSonic DAL sucks because you normally delete the folder, recreate the folder, and regenerate the DAL -- and doing all that in a version control system can be very slow. LINQ doesn't seem to need tons of files; however, for bigger databases it seems to take longer to generate the XML. This was using LINQ with .NET 3.5 SP1.
SubSonic also doesn't seem to handle constraints very well. At least, when I placed constraints forcing a db structure, such as linking PersonIDs or forcing UNIQUE names (e.g. room names), it seemed to not finish the code upon generation but never threw an error, and as such the generated DAL would never compile. So I had to choose a simple table structure to make sure the playing field was level. However, given a database that needs to maintain ACID compliance and is of moderate size (more than a couple of users and more than a couple of tables), I would not choose SubSonic (2 weeks ago I would not have said this), though I plan on looking back at it later. This was using SubSonic 2.1 Pakala, which was somewhat recently released -- so I'm hoping it's just a few bugs to work out. If I were using SQLite, however, I would likely choose SubSonic.
I've recently come to the conclusion that using constraints is the only sane way of keeping a database consistent. Everything else either requires a *ton* of overhead checking (and you will still never be 100% safe without doing row-level or table-level locking) or means taking a chance on your database being inconsistent or slowly becoming more corrupt over time. I'm still trying to figure out how to cleanly handle the exceptions thrown when these constraints are violated... without having try/catches over every database write. Here lately I've been pushing for everything being saved in a stored procedure and wrapped in a transaction. This was accomplished only because I found out how to use the XML datatype in SQL Server to pass in mini-datatables. Doing this seems to have made things *really* fast, and most likely faster than LINQ, SqlCommand, or SubSonic will ever be, because it's all in the database. The only catch is you have a metric ton of variables for the save stored procedure -- which is where a strongly typed data access layer comes in handy.
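One possible way to avoid a try/catch around every individual write is to funnel a whole batch through a single helper that owns both the transaction and the exception handling. The sketch below shows that idea only; it uses Python's sqlite3 as a stand-in for SQL Server, and the Room table and save_all helper are hypothetical names, not anything from the original code.

```python
import sqlite3


def save_all(conn, names):
    """Insert all names in one transaction; return (ok, error_message)."""
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.executemany(
                "INSERT INTO Room (Name) VALUES (?)", [(n,) for n in names]
            )
        return True, None
    except sqlite3.IntegrityError as exc:
        # The constraint violation is handled in exactly one place.
        return False, str(exc)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Room (Name TEXT NOT NULL UNIQUE)")

first = save_all(conn, ["Lobby", "Lab"])
second = save_all(conn, ["Office", "Lobby"])  # "Lobby" violates UNIQUE
remaining = conn.execute("SELECT COUNT(*) FROM Room").fetchone()[0]
print(first, second[0], remaining)
```

Because the second batch fails as a unit, "Office" is rolled back along with the duplicate, so the table is left exactly as it was after the first batch.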
The next areas I plan on working on are the read tests and then a full database population read/write test -- with and without constraints. I'm very curious to know what I'll find.
Comments
I re-hashed some of the code. For some reason I thought .Attach() would create a new entity but apparently it only updates an entity. I removed that.
I adjusted the quantity of entities being made. Four tests are now performed -- 2500 objects, 5000 objects, 10000 objects, and 20000 objects. For some things I expected the speed increase to be linear, and for others to be fixed-ish. For example, the SqlCommand would always be the fastest if the string was pre-generated. However, I don't think that is fair to the others, because I then shouldn't count .Add() on the SubSonic ones -- since that's the same.
After I record the final datetime (when the process was finished), I now truncate the lookup table -- making it fresh for the next batch.
Finally, I adjusted the 'Release' mode to remove all debugging, told Visual Studio not to attach to it, and removed the vshost process -- hoping to speed things up as much as possible. Turns out, it did help some.
I'm still in awe that .Open() and .Close() for SqlConnection don't have much overhead. Even with pooling, I would have thought it to be a good slowdown, but apparently I was wrong.
I'm also thinking about adding a stored procedure which does all of this -- one that loops and another that is static. I don't know why... but "just because" sounds like as good an excuse as any.
Consistently, SubSonic is slower than LINQ, but not by as much as I thought, because of my previous mistake. What's still weird to me is that Collections aren't always faster than saving individually.
Another thing I want to add is running with .InsertOnSubmit() and .SubmitChanges() within the for loop -- since the .SubmitChanges() seems to have considerable overhead.
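The shape of that comparison (flushing inside the loop versus once after it) can be sketched as follows. This is not LINQ: it uses Python's sqlite3 with commit() standing in for .SubmitChanges(), an in-memory database, and hypothetical function names, so the timings it prints say nothing about SQL Server.

```python
import sqlite3
import time


def make_db():
    """Create a fresh in-memory database with a one-column lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TypeLookup (Name TEXT NOT NULL)")
    return conn


def commit_per_row(conn, count):
    """Flush to the database after every single insert."""
    for i in range(count):
        conn.execute("INSERT INTO TypeLookup (Name) VALUES (?)", (f"name-{i}",))
        conn.commit()


def commit_once(conn, count):
    """Queue all inserts, then flush once at the end."""
    for i in range(count):
        conn.execute("INSERT INTO TypeLookup (Name) VALUES (?)", (f"name-{i}",))
    conn.commit()


def timed(fn, count):
    """Run fn against a fresh database; return (elapsed seconds, rows written)."""
    conn = make_db()
    start = time.perf_counter()
    fn(conn, count)
    elapsed = time.perf_counter() - start
    rows = conn.execute("SELECT COUNT(*) FROM TypeLookup").fetchone()[0]
    conn.close()
    return elapsed, rows


for fn in (commit_per_row, commit_once):
    elapsed, rows = timed(fn, 2500)
    print(f"{fn.__name__}: {rows} rows in {elapsed:.3f}s")
```

Using a fresh database for each run plays the same role as truncating the lookup table between batches.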
I feel compelled to note why these numbers are important to me. One of the projects at work I'm working on is migrating data from an old system to a new system -- so speed is VERY important. It's why I use try/catch as little as possible because the catch part is *very* expensive in .NET. While never hitting an exception doesn't slow it down much, it does slow it down some (I ran loop tests to prove this with different percentages of exceptions from 0% to 100% exceptions). I also had a bad experience with SubSonic lately and reacted by looking at other solutions.
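A loop test of that kind could look roughly like the sketch below. It is written in Python rather than .NET, so the absolute cost of raising and catching is different; it only illustrates how the percentage of failing iterations can be varied. The run_loop name and the ValueError are placeholders.

```python
import time


def run_loop(iterations, failure_percent):
    """Run a loop in which failure_percent of every 100 iterations raise.

    Returns (elapsed seconds, number of exceptions caught).
    """
    caught = 0
    start = time.perf_counter()
    for i in range(iterations):
        try:
            # The first failure_percent iterations of each block of 100 fail.
            if i % 100 < failure_percent:
                raise ValueError("simulated failure")
        except ValueError:
            caught += 1
    return time.perf_counter() - start, caught


for percent in (0, 25, 50, 100):
    elapsed, caught = run_loop(100_000, percent)
    print(f"{percent:>3}% exceptions: {caught} caught in {elapsed:.4f}s")
```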
I also want to try NHibernate and see how it compares.
From what I'm seeing, LINQ or NHibernate should be the one to use, because it will likely be easier to hire someone who can code that than SubSonic -- assuming you must have an ORM. Not everyone needs or wants one. At the cost of overhead, it gives you cleaner code, and code that will break at compile time should a column not exist (hopefully).
Using StringBuilder still seems the best way to do anything in .NET dynamically.
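The general idea behind the StringBuilder variant (build one large statement, send it in a single round trip) can be sketched like this. It is not the original test code: it uses Python's sqlite3, builds the text with a list and join in place of StringBuilder, and swaps in parameter placeholders instead of concatenating values into the SQL. The column names, batch size, and row values are all assumptions.

```python
import sqlite3

# 300 rows x 3 parameters stays under the 999-variable limit of older SQLite builds.
BATCH_SIZE = 300


def build_insert(row_count):
    """Build one multi-row INSERT statement with placeholders for row_count rows."""
    parts = ["INSERT INTO TypeLookup (TypeLookupID, Name, CreatedOn) VALUES "]
    parts.append(", ".join(["(?, ?, ?)"] * row_count))
    return "".join(parts)


def insert_batched(conn, rows):
    """Insert rows in chunks, one statement per chunk, then commit once."""
    for start in range(0, len(rows), BATCH_SIZE):
        chunk = rows[start:start + BATCH_SIZE]
        flat = [value for row in chunk for value in row]
        conn.execute(build_insert(len(chunk)), flat)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TypeLookup (TypeLookupID TEXT PRIMARY KEY, Name TEXT, CreatedOn TEXT)"
)

# Placeholder values purely for illustration.
rows = [(f"id-{i}", f"name-{i}", "createdon-placeholder") for i in range(1000)]
insert_batched(conn, rows)

total = conn.execute("SELECT COUNT(*) FROM TypeLookup").fetchone()[0]
print(total)
```

Chunking keeps each statement under the database's parameter limit while still cutting the number of round trips from one per row to one per batch.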