The C# language has changed a lot since its initial release in 2002. C# 2.0 brought us:
- Anonymous Methods
- Partial Classes
- Nullable Types
- Static Classes
- Property and Indexer Accessibility Changes
And C# 3.0 added:
- Automatic Properties
- Object and Collection Initializers
- Lambda Expressions
- Extension Methods
- Anonymous Types
- Local Variable Type Inference
And that is just the new language features in the last 3 years! This does not include all of the new classes added to the BCL over the same time period.
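To put several of the C# 3.0 additions in one place, here is a minimal sketch (the `Person` type and `IsAdult` extension are hypothetical examples, not from any particular library):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Person
{
    // Automatic properties: the compiler generates the backing fields.
    public string Name { get; set; }
    public int Age { get; set; }
}

static class PersonExtensions
{
    // Extension method: adds IsAdult() to Person without modifying it.
    public static bool IsAdult(this Person p) { return p.Age >= 18; }
}

static class Program
{
    static void Main()
    {
        // Object and collection initializers.
        var people = new List<Person>
        {
            new Person { Name = "Ann", Age = 34 },
            new Person { Name = "Ben", Age = 12 }
        };

        // Local variable type inference (var), a lambda expression,
        // and an anonymous type, all in one query.
        var adults = people.Where(p => p.IsAdult())
                           .Select(p => new { p.Name });

        foreach (var a in adults)
            Console.WriteLine(a.Name); // prints "Ann"
    }
}
```

Every line of this would have been a compile error in C# 1.0; most of it still would be in 2.0.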
Know the Cost
I am as excited about these new changes as the next guy. I have used various combinations of these improvements in my projects with great success. But is there a cost to new features?
Recently I read a couple of articles that reminded me that you not only have to understand how a feature is used, you also have to understand the cost of using it. Sometimes that cost is performance, other times it could be something like readability or portability. I’ll take one example from the 2.0 framework and then one from the 3.5 framework to show that you need to understand the implications of using new features.
First, there is Fritz Onion’s recent post on the amount of code generated by the yield return syntax introduced in C# 2.0:
> The array allocation function generated a total of 20 lines of IL, but the yield return function, if you included all of the IL instructions for the IEnumerable class generation as well, was over 100! That’s a 5x penalty in code generation to save 18 characters of typing.
So is the 5x penalty ever worth it? Yes, of course it is. There are things you can do with yield that you couldn’t do previously. In most cases, the use of yield improves readability dramatically. It also makes the task of creating a custom enumerator much easier.
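To make the trade-off concrete, here is a small sketch (hypothetical `FirstEvens` methods, not taken from Fritz's post) of the two approaches. The `yield return` version is a few lines shorter at the source level, but the compiler expands it into a generated state-machine class implementing `IEnumerator<int>` — the extra IL being measured:

```csharp
using System.Collections.Generic;

static class Evens
{
    // Hand-rolled approach: allocate the whole array up front.
    public static int[] FirstEvensArray(int count)
    {
        var result = new int[count];
        for (int i = 0; i < count; i++) result[i] = i * 2;
        return result;
    }

    // yield return approach: the compiler generates an enumerator
    // class behind the scenes, so this costs more IL but far less
    // hand-written plumbing than implementing IEnumerator yourself.
    public static IEnumerable<int> FirstEvens(int count)
    {
        for (int i = 0; i < count; i++) yield return i * 2;
    }
}
```

Both produce the same sequence; the difference is in what the compiler emits and when the values are materialized.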
In his article The power of yield return, Joshua Flanagan points to an example where using yield return improved performance dramatically:
> With the improved implementation that took advantage of the yield keyword, the program was able to finish its job in less than half the time! It also used much less memory, as it never had to store all 9 strings in a collection. Now imagine the potential impact if GetCombinations returned a collection with thousands of entries!
The point here is that you have to know the costs and how they fit in with the requirements of the project you are working on. If you are building something that must support thousands of concurrent users and must be very performant, you will most likely choose to use the yield return despite the extra code generated by the compiler.
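The memory and time savings come from lazy evaluation: a `yield return` iterator produces items only as the caller asks for them, so stopping early means most of the work never happens. A minimal sketch of that effect (the `Numbers` producer and `Produced` counter are illustrative, not from Joshua's article):

```csharp
using System;
using System.Collections.Generic;

static class LazyDemo
{
    // Counts how many items the producer actually generated.
    public static int Produced;

    public static IEnumerable<int> Numbers(int max)
    {
        for (int i = 0; i < max; i++)
        {
            Produced++;        // work happens only when an item is requested
            yield return i;
        }
    }

    static void Main()
    {
        foreach (int n in Numbers(1000000))
        {
            if (n == 3) break; // stop as soon as we find what we need
        }
        Console.WriteLine(Produced); // prints 4: only 4 of the million items were ever produced
    }
}
```

An eager version that built a full million-entry collection first would do all of that work, and hold all of that memory, before the caller could even look at the first item.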
LINQ is Dead, Long Live LINQ
Most of the time when I show somebody the new language enhancements in C# 3.0, the thing that interests them the most (and rightly so) is LINQ. It is just so different to see query syntax built right into the language. The next question they usually ask is how the performance of something written using LINQ would compare to the same procedure written using just C# 2.0 control structures.
Steve Hebert recently wrote a post about LINQ performance where he concludes:
> Despite my best efforts I just couldn’t make my hand-written code perform as poorly as Linq
What? Seriously? Wow. So we should stay away from LINQ then, right? Well, if you look closer at his post, you will notice that he optimized his non-LINQ version of the code for the underlying data structure, while LINQ has to be able to operate regardless of the underlying data structure. I am not interested in the particulars so much as in the idea that you should understand these things before you start replacing all of your for loops with LINQ queries.
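The general shape of that comparison can be sketched as follows (these two functions are my own illustration, not Steve's benchmark): both compute the same result, but the hand-written loop can exploit knowing the data is an `int[]`, while the LINQ query goes through `IEnumerable<int>` and delegate invocations that work against any source:

```csharp
using System.Linq;

static class Compare
{
    // General-case LINQ pipeline: iterator objects plus a delegate
    // call per element, but works over any IEnumerable<int>.
    public static int SumSquaresOfEvensLinq(int[] data)
    {
        return data.Where(n => n % 2 == 0).Select(n => n * n).Sum();
    }

    // Hand-optimized for the concrete data structure: direct array
    // indexing, no iterators, no delegate calls.
    public static int SumSquaresOfEvensLoop(int[] data)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            if (data[i] % 2 == 0) sum += data[i] * data[i];
        return sum;
    }
}
```

Whether that per-element overhead matters depends entirely on how hot the loop is, which is exactly the "know the cost" point.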
One more recent post fits in here. Rob Conery wrote about a recent experience where his 5-year-old code came back to haunt him:
> I spose the moral of the story is to always view the concept of maintenance with an eye towards shifting toolsets and platforms. In 4 years you will need to support the ASP.NET 2.0 site you’re on now, using Visual Studio 2012 and it’s Silverlight-generated Scaffolds :).
This is a great reminder of how tough it can be to deal with changes in technology. I mean, come on, he couldn’t find a machine that could compile his 5-year-old code?!? I know we can all relate. With the technology learning curve growing at a semi-exponential rate, just imagine where that same code will be in another 5 years.
Back to Machine Code
Obviously I am not advocating that we give up on new features because they might not be as fast or resource-efficient as hand-coded alternatives. In fact, I think we need to embrace abstractions wherever possible. They make our jobs as software developers much easier. Just be smart about it. Have some idea what is going on under the covers. It is important to realize that yield return eventually becomes a pile of compiler-generated code, and that LINQ queries are written for the general case, so they may not always beat code you write by hand.
Maybe the documentation on these new language features could do a better job at explaining what sorts of considerations we need to take into account when using them in our applications.