Monday, May 26, 2008

Using null in programming

In C (and most of its derivatives), Java, SQL, and others, it is called null. In Objective-C, Ruby, and Pascal, it is nil. But regardless of the name, it means the same thing:

Nothingness - the absence of anything - utter non-existence.

Fredrik Normén wrote a post questioning whether null objects should be returned from methods. His post brings to light a flawed coding principle: the idea that a method should avoid returning null because callers risk null reference exceptions and must check for null afterward. I must say that the argument for avoiding null is baseless and absurd, but, like many other programming practices, it has its roots in billions of lines of poorly written code in which null is used inappropriately.

Programs should feel free to return null, as it represents a state of nothingness, a non-result. I will consider three examples, built around a fictional music player program. First, I have a method which looks for songs of a given genre and returns them:

(Note: the vast majority of code samples on this site will be in a C#/Ruby-like pseudo-code. They are not meant to be real code, though they are likely very close, since writing them that way is actually easier.)

public Song[] GetSongsByGenre(string genre)
{
    // Get songs by genre and return them.
}

This method should never return null. The result of a search operation is never null; it is never nothing. It is always at least an empty collection, a value which has a point beyond just being nothing: it can be searched against, queried, iterated, and so on. Null would not be a good thing to return here, and it almost never is for methods returning collections.
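As a sketch of what that looks like in practice, in the same pseudo-code style (the songLibrary field and the Song.Genre property are assumed here purely for illustration):

public Song[] GetSongsByGenre(string genre)
{
    List<Song> matches = new List<Song>();

    // songLibrary is an assumed collection of all known songs.
    foreach (Song song in songLibrary)
    {
        if (song.Genre == genre)
        {
            matches.Add(song);
        }
    }

    // An unknown genre simply produces zero matches - an empty
    // array, never null. Callers can iterate it without checking.
    return matches.ToArray();
}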

But what about methods which return single objects?

public class Song
{
    public int PlayCount()
    {
        // return the number of times this song has been played.
    }
}

Here is another method which should not return null (ignoring, for the moment, that it cannot in most languages, since int is a value type). Returning null is an indication of 'complete lack of anything', and this method cannot possibly legitimately return that. If it cannot legitimately execute for some reason, the proper response is to throw an exception. The method might return zero, but zero exists (unless you are still working in Roman numerals, in which case, I wish you good luck with zero-based arrays), so zero should be returned as zero, never represented as null.
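To make that concrete, here is one possible shape for the method (the playHistory field and the failure case are assumptions for illustration only):

public class Song
{
    private List<DateTime> playHistory; // assumed backing field

    public int PlayCount()
    {
        if (playHistory == null)
        {
            // Failure to execute is an exception, not a null result.
            throw new InvalidOperationException("Play history unavailable.");
        }

        // Zero is a real value: the song exists, it has simply
        // never been played.
        return playHistory.Count;
    }
}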

Now say I have this last bit of code:

public class User
{
    public Song GetFavoriteSong()
    {
        // Returns the user's favorite song
    }
}

public class UserInterface
{
    protected void OnPlayFavoriteHotkeyPressed()
    {
        currentUser.GetFavoriteSong().Play();
    }
}

It is this type of code which typically gives rise to the argument against using null. User.GetFavoriteSong is a method that should be able to return null. If the user has no favorite song, then the proper value to return is 'nothing' or 'non-existence'. But in this case, the end result would be a null reference exception, since Play() cannot be called on null.

The proper pattern here is to check for null:

protected void OnPlayFavoriteHotkeyPressed()
{
    // Fall back to asking the user if no favorite is set.
    Song favoriteSong = currentUser.GetFavoriteSong() ?? AskUserForFav(currentUser);

    // The user may decline to choose one; if so, stop trying.
    if (favoriteSong == null)
    {
        return;
    }

    favoriteSong.Play();
}

Why is this code, which is considerably more complex than the previous version, better? Complexity is bad, right?

In this case, it is not. Here the code genuinely needs to make a decision, whether or not the coder thinks decision making is acceptable (if not, he probably voted for John Kerry). The code can be explained more simply as:

Get the user's favorite song. If it doesn't exist, ask them if they want to choose one. If they still do not choose one, just stop trying. Otherwise play the favorite song.

There really is no alternative to this. No matter what you do, your program has to deal with whether or not the user has a favorite song.

Some would argue that by avoiding null, this can be solved in one place - the definition point - instead of at every point where the result is used. This rarely works, and this case is a fine example of why.

In this case, the code must use the object, and in order to use it, the code must know that it exists (or else be vulnerable to null reference exceptions). That means the code has to either accept its existence as guaranteed or perform validation. If I tried some other pattern, I would still be forced to return something to satisfy this call, and nothing cannot be converted to something here without user intervention.

I could throw an exception, but then I am forced to either let the exception bubble up and explode or catch it somewhere and do something about it, which really just becomes another, less obvious form of validation. (Don't get me wrong on exceptions - there are many great places where that kind of pattern rules; this just isn't one of them.)
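For comparison, here is roughly what the exception-based version would look like (NoFavoriteSongException is a hypothetical type, invented only to show the shape of the code):

protected void OnPlayFavoriteHotkeyPressed()
{
    try
    {
        currentUser.GetFavoriteSong().Play();
    }
    catch (NoFavoriteSongException) // hypothetical exception type
    {
        // The catch block is just the null check in disguise.
        Song chosen = AskUserForFav(currentUser);

        if (chosen != null)
        {
            chosen.Play();
        }
    }
}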

I could return some kind of sentinel object (like double.NaN) to signify null, but that is no better, as I still have to check for it.
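A sketch of that sentinel approach, assuming a hypothetical Song.None singleton, shows the same thing - the check merely changes shape:

protected void OnPlayFavoriteHotkeyPressed()
{
    Song favoriteSong = currentUser.GetFavoriteSong();

    // Song.None is a hypothetical sentinel standing in for null.
    // The comparison is still validation - it has just changed shape.
    if (favoriteSong == Song.None)
    {
        return;
    }

    favoriteSong.Play();
}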

The bottom line is that null is a perfectly valid and highly useful pattern when applied in the proper manner. Null is like every other programming pattern: an abstraction. Good programming recognizes that code abstractions should maintain a one-to-one relationship with real-world things, and null is a clear example of this. Null represents nothing, and developers can and should use it when nothing is called for - and only when nothing is called for.
