blog.jj5.net (2003 to 2005)

DBNull.Value in the .NET framework

Fri Jan 9 09:44:00 UTC+1100 2004

Categories:

Programming

I have an application framework that I'm working on where I create business objects (derived from base class Entity) that model records in a relational database. It poses a lot of pretty hairy design challenges, one that I'm using my blog (i.e. this post) to think through - it's like having a discussion with myself that you can watch.

I know about the problems associated with designing your business tier directly from your data tier, and understand that some developers attempt (very nobly) try to abstract Many-Many relationships not as objects in the business tier, but simply references, etc. Or more often (not so nobly) they simply program directly against the database from the client application, this undeniably has the effect of putting business logic (at least some, usually all) directly into the client, which is a bad thing.

My problem is that my database contains in excess of 100 hundred smallish (i.e. 'thin') tables (which I guess is a good thing since it kind of implies that it is likely to be at least third normal, which it mostly is) meaning that I need to generate a lot of code, and it is most convenient (at this stage) to model such n:m relationships as objects to allow for the easiest manipulation in the database. So I can add/remove simple relationships (i.e. foreign keys) by setting the foreign key, and more complex relationships (i.e. Many-Many) by adding and removing corresponding 'relationship objects'. Often too, a 'business relationship' is truly an object of it's own right anyway, since it might be necessary for the relationship to define properties of it's own, for example:

Consider a database application that had Author and Book tables. If an author could have many books, and a book could have many authors (which is reasonable) then I'd need an AuthorBook table to manage the Many-Many relationships. At first glance if I was to create a 'natural' API for this in a business object-model then I might be inclined to have a BookCollection object that managed a collection of books, and provide an Author.BookSet (avoiding plurals like a good boy ;) property with this type, allowing me to enumerate all the books by a given author. This is harder to do, because my framework needs to hide the fact that the persistence layer actually has a concept of an AuthorBook entity to mange this relationship. A problem might then present itself, where the relationship itself had a property, say for example, AuthorBook.ContributionLevel which models the level of contribution to the given book, say for example 'Primary Author', or 'Contributing Author', or 'Editor', etc. (a little contrived perhaps, but still reasonable for the sake of example). Now my business model can't simply provide the same level of abstraction, because I have no place to represent this extra information in my business model. So I need to admit in the business model that the 'business relationship' actually does exist, and provide an AuthorBookCollection and instead of Author.BookSet I _must_ provide an Author.AuthorBookSet property so that the client can get at the ContributionLevel. This makes the client application work harder to express the relationship of Authors to Books, but is seemingly unavoidable. Since I might have either type of these listed scenarios in any given application, ideally my business model might simply provide both types of interfaces as appropriate, however, a problem would then arise in the future if modifications where required, say for example audit information about the user who created a record in the database was required, and it became a requirement to know which user was the first to specify that a given author contributed to a specified book, and the client application needed to be aware of this information. I would then need to modify my business code to stop using the BookCollection in the Author.BookSet property and change to the second scenario, which would then require changes to all my client code, since it relied on that interface..

Face it, abstracting a relational database as an object model has always been, and it remains, a hard problem. In my defense I do attempt to further abstract this initial abstraction for the sake of providing a more 'natural' API for users of my business library and I can do this with composition of the initial abstraction.. anyway, I digress.

In my previous post, I mentioned the complications of Null values. The problem with NULL is that it really doesn't mean anything. It could mean unknown, could mean not applicable, etc. The other problem with NULL is that when abstracting a database as an object model, what do you do when the key is not in fact null, but rather simply 'not yet allocated'? Since most databases support the notion of an auto-increment primary key, you need to actually create a record _in the database_ before you can identify it in the application if you use the primary key as a means of identification. The problem is then, that a user might create a new business object locally, that has not yet been assigned a primary key (because the database will do this at persistence time) but they may be able to define a relationship between this 'new' business object and others at run-time prior to persistence (invariably this makes sense, i.e. I create a new Author object, two new Book objects and three new AuthorBook 'relationship' objects before I decide to persist all my changes inside the same database transaction).

Of course things get even more interesting, because of the three-value logic rules (TVL). Where a NULL value is not equal to another NULL value, but can be said to be NULL (i.e. IS NULL).

Say a database represents a 'missing' or Null value internally as NULL (all caps). The .NET framework and 'ADO.NET' will represent this same value as DBNull. This is a singleton class, accessed via DBNull.Value. Interestingly, this class does not support the concept of three-value logic, so a test of DBNull.Value == DBNull.Value will return true, even though the semantics of what it is modeling in fact deny that this is possible, and no NULL value could be said to equal another NULL value. So that's another interesting problem - and I'm sure that some thought went into this, and the decision the designers made is probably the best one, but you can see where it could lead to anomalies..

Now think about NULL foreign keys on a table. If these exist then they just mean 'there is no relationship' they don't imply anything about what that actually means (of themselves, although it is reasonable that the developer defines it to mean something, or assumes it just means, don't know yet, can't know yet, etc). This is one thing from a database perspective, but if you are modeling the relationships prior to database persistence, you might in fact understand what the relationship is, but have no means of expressing it via the Primary Key mechanism that the database is aware of, and ultimately controls.

So what I've done with my application framework (or business tier) is support the concept of both the related 'entity' and the related 'entity key', where the key need not be known prior to defining relationships between business objects (meaning that I don't need to 'persist as I go' or attempt to implement my own application level keying mechanism, or universally use GUIDs, etc). So I would have for example both a Project.Client and Project.ClientKey property on a Project business object. The Project database table would simply define a ClientId field for storing the foreign key, say this could be NULL if the project existed but had not been allocated a Client (probably contrived, since it sounds like this should probably be a non-nullable key, but nevertheless it'll do for the sake of example). The ClientKey is an abstraction of the primary key that identifies a Client object in the database. It is derived from a base class called Key and adds the appropriate, strong-typed properties that represent its value (can be more than one field/value that defines a key, can be of different types, say string, or int, etc.), is immutable, and overloads Equals and GetHashCode such that it can conveniently be used for comparing Client objects (i.e. if they are not new, they can be compared by their ClientKey objects using the == syntax even though they need not be the same object but simply model the same value in the same domain). Anyway, I run into the problem that if for example I've just created a new Client in my application, and prior to saving it to the database I've also created a new Project and allocated the Client to the Project. If the Client were not new, it would have a primary key, and I could simply set the Project.ClientKey object. Since the Client is new the Project can not yet be told the value of the ClientKey, because the Client has not yet been saved to the database to receive its auto-incremented key. This means that in order to specify the relationship I need to set the Project.Client value, to store a reference to this Client. (lets not get into the trouble this creates, trust me, I know). Anyway, I need to be able to do this - short of holding a database transaction open and 'saving as I go' during the entire process, which is unacceptable, and virtually impossible with a web-based client application and definitely impossible in an 'off-line' application where the object model would need to be serialized locally and imported at a later date because available keys could simply not be known. So say I have this new Client and Project, with the Project.Client property set but the Project.ClientKey property not yet known, but not NULL.

Maybe what I want to do in this case is throw an exception. I.e. you can not get the ClientKey if the ClientKey can not be known. This would mean though that I would also have to provide an interface for the client to check if the ClientKey could be known, with something like Project.IsClientKeyDeterminate for example, but such a solution just adds more weight to a class that is already getting pretty bulky. Then the client would have to check this value before any access of Project.ClientKey or risk receiving an exception. And that sounds like I'm creating a lot of work for the client application.

As a side note, NULL is generally a sticky issue even for 'normal' data that are not keys. Short of creating my own 'dual state' objects to aid in encapsulating primitive values and the possibility of them being NULL (which would kill performance and generally suck) or using the SqlDbType types (which already basically do this but is also a sux option) if I want strong types (which I do) I have to create an interface to query if a value is null on the business object, and throw an exception if an attempt is made to read a strong typed bit of data that is NULL. For example, if I have Project.CommencedOn as a nullable date field (where NULL could be taken to mean either 'not commenced' or 'we don't know when it commenced' or 'we haven't yet specified when it commenced, but we do know, although you don't, ner ner ni ner ner') and this returns a System.DateTime value, then if the value is NULL, since DateTime is a value type, I can not return a null reference, but I can not return a value either, since I don't have a value (short of using a sentinel value, which can also suck, what do I do if its a nullable boolean field?) I can not return a value, so I have to throw a NullPropertyException (or similar). This means I also need to add an interface for nullable values that allows the client to query if the value is null, for example, Project.IsCommencedOnNull, which they need to query before they access the property or again risk the exception. I also, for the sake of convenience provide a DispayCommencedOn interface that will format the value of the CommencedOn property as a string, regardless of whether it is NULL or not.. blah, blah..

Because Key objects are reference objects (even though they are immutable, and overload equality operators) I could actually return a null reference in the case that the Project.ClientKey could not be known. Note also, that the Project.ClientKey if it were not a nullable field (i.e. was required) could in fact still have no key defined, since it is practically impossible to provide a default value for such a thing, but it may not yet have been specified or known, even if the 'real' relationship with the 'new' Client was known. So assume I take a null reference to simply mean that the ClientKey is indeterminate. What do I do when that same key is in fact known and is in fact NULL. In other words, I have a Project record and it's ClientId field is NULL. I can't know return a null reference, since this would imply that the key was simply indeterminate. What I need to be able to do is represent this as a NULL value. I could do the same thing that I do with other properties, and throw a NullPropertyException if an attempt to read the value was made while it was null, but because this is a special purpose object (i.e. the abstract Key object and the concrete ClientKey object, etc) I have the flexibility to provide a sentinel value for a Null Key (which I don't have for value type objects). I could do this (and perhaps this is a horribly bad idea) by making Key not only the base class for other Key objects but also a singleton class that provides access to it's only instance via a Key.Null property. Or better still, I could derive a new type from Key, say NullKey, and implement this as a singleton with NullKey.Value. Then if a Foreign Key was NULL, I could return a reference to this singleton, although it would semantically be an error to return this for a Primary Key the class itself could not protect against this (that would be the responsibility of the concrete Entity). If I were to do this, consider that Key objects overload == and override Equals and GetHashCode. I would need to provide an implementation for these. The hash code is OK, it would be safe to return a constant, but what about the implementation of Equals()? What do I do there. Do I allow NullKey.Value to be compared effectively by reference such that NullKey.Value == NullKey.Value or do I try and bring in the TVL, and state that NullKey.Value can not be said to be equal to NullKey.Value. Interesting problem, and I'm not immediately sure of a solution. The other little problem that has crept in now, is that if I had to Project objects, p1 and p2. If I had a conditional statement p1.ClientKey == p2.ClientKey then (assuming for now that neither was NULL) then it could actually return true when both p1 and p2 did have clients defined, and each client was different. The reason that this could happen, is that if each client was a new client then neither would have a primary key, thus Project.ClientKey would be indeterminate, and would have returned a null reference and the comparison would seem to indicate that they were for the same client, even though this is not the case. I might then do something terrible based on my assumption, and call p1.SendConfidentialInformationRegardingProjectToClient(p2.Client) which could actually work (although a strange interface, and really you should check that the client really does exist in the database prior to such an action etc, it is an example of something that seems reasonable, but could be tragic).

So up until that last part, I kind of liked where I was heading with this. Part of the reason is then I could have a UserInterface control that knew how to deal with Keys and I could let that happen with a simple assignment. But if the assignment was capable of throwing exceptions then I'd have to do a series of checks first, and I was hoping to avoid that.

I don't want to throw an exception if a user calls Project.ClientKey because they are just requesting some information and I don't want to make them query another interface to see if the Project is in a state where this is OK for them to do. But based on this discussion so far, it seems as though the only reasonable way to go ahead is to throw an exception if the Key is indeterminate and provide an interface for checking this fact. Bummer.

The other question that remains is, what do I do with the NULL key issue. Do I create this NullKey class, and if so, what about the Equals() implementation. Or if I don't flag indeterminate (externally at least, internally it will still be a null reference) using the null reference it is now available again for use as a flag for NULL. null references have reference equality, so I'd still have the TVL problem. I.e. p1.ClientKey == p2.ClientKey returns true, even though neither Project actually has a Client defined. Or do I stick with exceptions for this too, and have an IsClientKeyNull property.

The reason that I've been thinking about this, is that I haven't been totally happy with my present implementation. Currently something like Project.ClientKey would return a null reference in both the case that the Client was indeterminate (i.e. configured, but new) and NULL (i.e. specifically flagged as NULL, because it was not known, etc). There is also a Project.IsClientNull property that returns a value indicating if the Project.ClientKey is in fact NULL. The problem here is that I'm still open to the possibility of a comparison of one project to another via something like p1.ClientKey == p2.ClientKey where p1.Client is a new client and p2.Client is null thus I could think that the projects are for the same client, where in fact they are not. Although this might seem like a strange thing to worry about, I really do worry that an interface where p1.ClientKey == p2.ClientKey doesn't mean the projects are for the same client, and p1.Client will actually return a reference to a Client object (a new one that has not been persisted) leaving the possibility that a call to p2.Client would return the same Client (although such a call would actually result in a NullPropertyException()) especially when ClientKey _is_ the preferred mechanism for determining equality, since it is the immutable, overloaded, overridden class designed for this purpose.

I think all this typing may have paid off, I'm tempted to change to this:

Project.IsClientKeyDeterminate Project.ClientKey

implement the NullKey singleton.

If the Project.IsClientKeyDeterminate returns true, then ClientKey will return a reference to a concrete Key object. If Project.IsClientKeyDeterminate returns false, then ClientKey will throw an IndeterminateKeyException(). The concrete NullKey singleton will provide an implementation for Equals that supports reference equality (i.e. not TVL). Thus if Project.ClientId (in the database) is NULL, then it will return NullKey.Value. If Project.ClientId is specified then Project.ClientKey will return a concrete ClientKey object. However, the immediate problem with this is that the type of Project.ClientKey would have to be Key, and not ClientKey (which is far more preferable). So taking it a little further, we can drop the NullKey.Value idea (since TVL is not going to be implemented) and instead just return a null reference.

So if Project.ClientKey is NULL it returns a null reference, if it is specified it returns a ClientKey instance, in any other case it _must be_ indeterminate in which case an exception is thrown. Setting ClientKey externally to a null reference has the effect of specifying a NULL value, setting to a ClientKey instance has the effect of specifying the Client identified by ClientKey and there is no way to set this to 'indeterminate', since such an operation makes no sense. A client application could affect the state of the Project such that it effectively set the ClientKey to an indeterminate value, by setting the Project.Client property to an instance of a new Client object. If the app set the Project.Client to an instance of Client that was not new (and therefore had a primary key) the value returned by Project.ClientKey on its next call would have been updated to reflect this change (all this is managed behind the scene).

That was fun. Turns out that was pretty simple after all. Anyone see any holes in this? :P

John.