Linq

Database Speed Tests

edit: links to home.etherpunk.com will likely not work properly since I'm moving servers around. I'll fix this when I get a chance.

Yesterday a bug hit me. I wanted to see what was the fastest way to read and write data to a database. The methods of which to do the writing was the fun part. I chose LINQ, SqlCommand (2.1 Pakala), and SubSonic. There are a couple different ways to use each and I wondered which way was the right way. I found some interesting results.

Firstly, I'd like to provide the link to the code that I used so you can verify the results yourself or critique my methods. It can be found at: http://home.etherpunk.com/svn/DatabaseSpeedTest/DatabaseSpeedTest/WriteT...

Secondly, the database structure is *very* simple. The only table I'm using is a TypeLookup table which is essentially a primary key (set as a unique identifier), a name, and createdon fields. This is a table I'm using for another application (that application is exceptionally simple, which is why I chose it).

Here are the results (in seconds):

Running write tests. Creating 10000 entities.
LINQ using Attach: 3.523
LINQ using InsertOnSubmit: 38.25
SqlCommand Dynamic Sql (constant connection string): 15.989
SqlCommand Dynamic Sql (static connection string): 15.123
SqlCommand Individual (constant connection string): 8.86
SqlCommand Individual (static connection string): 9.526
SqlCommand Individual More Effecient (constant connection string): 8.862
SqlCommand Individual More Effecient (static connection string): 9.527
SqlCommand with StringBuilder (constant connection string): 6.056
SqlCommand with StringBuilder (static conenction string): 6.144
SubSonic with individual saves: 8.945
SubSonic using Collections: 9.654

An intersting thing is how horrible InsertOnSubmit() is for LINQ... I'm guessing that I'm using it wrong because holy crap... it doesn't scale well at all however using the Attach() method seems to be significantly faster than everything else. I tried using 100,000 entities however many of these just didn't scale well and it took longer than 10 minutes. I'll probably do that later tonight and update this blog with those numbers in the morning.

I should also note that I learned how to use LINQ yesterday... so it's very likely I'm not using it properly but it's a pain to find good and simple examples. Luckily Visual Studio 2008 came with some which got me going... but some of those were either lacking in areas I needed or more complex than I cared for -- fortunately Google to the rescue.

I find it very intersting that opening and closing a connection have next to zero overhead for creating entities. Granted, this is likely using pooling but still -- I would have thought even the code overhead would have been more than that. I also find it intersting how much faster a constant value is than a static value -- the read access must really be something to take in to consideration.

Something I learned about LINQ is it seems to bring over the constraints from the database in to the code itself while SubSonic seems to be just setting up classes and being fairly stupid (which isn't neccassarily a bad thing and can give you more control in certain situations). Because of this, I wasn't able to take advantage of the LINQ collections because it forced me to make a unique ID upon adding it or else it would throw an exception ("Cannot add an entity with a key that is already in use.") -- one which I haven't figured out how to get around just yet because I don't want to assign those keys in code, I want those to be assigned when they are saved and have the database assign them.

I'd really like to compare how LINQ handles revision control systems against SubSonic... since regenerating that SubSonic DAL sucks because you normally delete the folder, recreate the folder, regenerate the DAL -- and doing all that in a version control system can be very slow. LINQ doesn't seem to need tons of files however for bigger databases it seems to take longer generating the XML. This was using LINQ .NET 3.5 SP1.

SubSonic also doesn't seem to handle constraints very well or at least when I placed constraints forcing a db structure such as linking PersonID's or forcing UNIQUE names (e.g. room names) it seemed to not finish the code upon generation but never threw an error and as such it would never compile to generated DAL. As such, I had to choose a simple table structure to make sure the playing field was level however given a database that needs to maintain ACID compliance and one of moderate size (will have more than a couple users and more than a couple tables), I would not choose SubSonic  (which 2 weeks ago I would not have said this) however I plan on looking back at it later. This was using SubSonic 2.1 Pakala which was somewhat recently released -- so I'm hoping it's just a few bugs to work out. However if I were using Sqlite -- I would likely choose SubSonic.

I've recently come to the conclusion that using constraints is the only sane way of keeping a database consistant since everything else requires a *ton* of overhead checking (and you will still never be 100% safe without doing row-level or table-level locking) or you take a chance at your database being inconsistant or having it slowly become more corrupt over time. I'm still trying to figure out how to cleanly handle exceptions thrown which violate these constraints... without having try/catches over ever database write. Herely lately I've been pushing for everything being saved in a stored procedure and wrapped in a transaction. This was accomplished only because I found out how to use the XML datatype in SQL Server and use it to pass in mini-datatables. Doing this seems to have made things *really* fast and most likely faster than LINQ, SqlCommand, or SubSonic will ever be because it's all in the database... only catch is you have a metric ton of variables for the save stored procedure -- which is where a strongly type data access layer comes in handy.

The next areas I plan on working on is the read tests and then a full database population read/write test -- with using constraints and without constraints. I'm very curious to know what I'll find.

Speed testing

I'm in the process of getting an app going under .NET 3.0 using Visual Studio 2008 to see the differences in speed between Linq, SubSonic, and using a plain SQLDataAdapter. Yes, I'm well aware using a DataReader / DataAdapter is going to be faster -- the question is: How much faster? Is it worth making the code that much more cleaner? What about if the code is only going to be used once 4 months from now and, hopefully, never touched again? My problem is I want something to be as fast as possible without making code ugly and without spending a lot of time tweaking -- so I'm taking the shotgun approach. At the moment, it seems like they are all fairly close unless you are dealing with LOTS (millions) of records -- which I am not. SubSonic has its fair share of bugs such as puking if a table doesn't have a primary key. I can understand it not understanding how to right back or do complex queries, I just want to populate a collection with a 'Select *' type statement -- so it doesn't need a care about primary keys as I'm just going to loop through item in the collection and pull data as I see fit. Also, SubSonic seems to always do a 'SELECT *' type query if I want to use the simple code such as 'Some_DB.Some_TableCollection TC = new Some_DB.Some_TableCollection().Load();'. If I want to be specific then I have to use a Query class. Even worse, it's /horrible/ at doing OR's. They really prefer you to use views or stored procedures. Neither of those are bad, but I'm not writing code for something anyone else but me should be using -- so I don't care about exploits or any safety measures. Even if I was, this engine doesn't have any direct input mechanism, so good luck with that. Linq seems overly complicated. Fucking A is it a bitch to setup. Nothing seems automatic. Code doesn't /feel/ cleaner (so far from my, admittedly, little testing). Google doesn't seem to know how to use it either. I'm finding contradicting information. So, this means I have to do a "best case scenario" speed test to get both Linq and SubSonic working, which isn't a bad idea. One of the programming styles I learned was "assume everything will work perfectly, *then* make exceptions". The alternative, which is more common, is plan for the worst then if it's the best do X. The problem with that is you are always occurring the overhead of the worst instead of doing the best first and if it fails (Which it rarely should) then go to some exception handling. This places me back at square 1. Here is my game plan:

  • Have a script which will populate a database + table with random data. This way nothing can be cached and you can regen as needed. This will probably populate a 'person' table with 2 million records. This is a small amount compared to some other databases, but for our instance it should be plenty to test for speed.
  • Write a GUI which has a listbox and a DataGridView. The listbox will contain different methods of getting data from a SQLServer to a DataGridView. The hope being that if I can get it from a SQLServer to a DGV then I can be fairly certain nothing got cached and everything got pulled over for data manipulation.
  • Write a a few tests. These tests will probably be a simple SubSonic, Linq, and SQLDataAdapter. I will later do more complex queries and embed them to attempt to bring out slowness or resource hogs. SQDA and SubSonic will be the first two. Linq will be third, and any other complex queries will be last.
  • Open source the program and release to world under BSD license.

I'm also probably going to try some of this code on other DB engines such as MySQL and SQLite.

Syndicate content