CLR Boxing Demystified

November 27, 2007

One area of programming that many developers do not fully understand is boxing, and that gap can lead to some interesting bugs and poor performance. A firm grasp of boxing is very important for any programmer who works with .NET and the CLR. I will attempt to demystify boxing from a C# perspective.

What? Why?

Firstly it’s important to understand some of the fundamentals of how types are allocated in memory on a CLR based system. There are two main categories of types in any CLR system, C# included: reference types and value types. Instances of reference types are allocated on the managed heap, whereas a value type is stored inline wherever it is declared: a local variable typically lives on the thread’s stack, while a value type field lives inside the object that contains it. Value types are the more lightweight of the two. When designing types it’s very important that you understand this difference for a range of reasons, mostly to do with memory management and performance. Reference types (being on the managed heap) are subject to garbage collection (and an allocation could force a GC operation), and every object on the heap carries some per-instance memory overhead. A value type local, on the other hand, is not individually tracked by the garbage collector – its memory is released as soon as the method that declares the variable returns.

Some examples of value types are structs, enums, int, bool etc.

It’s also important to understand that a value type variable stores its value directly, whereas a reference type variable stores a reference to the object on the managed heap. Another important point is that value types are sealed and cannot be inherited from. Value types all derive from System.ValueType, which itself is derived from System.Object (all types, both reference and value, end up being derived from System.Object – important to remember when thinking about boxing).

Important: When designing a value type, it should be immutable, which means that no members update any of its fields once it has been constructed – this is very important to save you confusion one day.

You should only use a value type when the following statements are true:

  • Type is simple and immutable (i.e. it acts as a primitive type)
  • Does not need to inherit from another type
  • No types will need to derive from it
  • Data will be relatively small in the type, or the type will never be passed into or returned from methods (this is important because each time a value type is assigned to a new variable its entire value is copied – whereas a reference type simply has its reference to the object on the heap copied)
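To make that last point concrete, here is a minimal sketch of the copy behaviour. The PointValue and PointRef types are made up for this illustration, and the struct is deliberately mutable (against the advice above) so that the copy is easy to see.

```csharp
using System;

// Hypothetical types, used only for this illustration.
struct PointValue { public int X; }
class PointRef { public int X; }

class CopyDemo
{
    static void Main()
    {
        PointValue v1 = new PointValue();
        v1.X = 1;
        PointValue v2 = v1;   // the entire value is copied
        v2.X = 2;             // v1 is not affected

        PointRef r1 = new PointRef();
        r1.X = 1;
        PointRef r2 = r1;     // only the reference is copied
        r2.X = 2;             // r1 sees the change: both point at one object

        Console.WriteLine("v1.X = {0}, v2.X = {1}", v1.X, v2.X); // v1.X = 1, v2.X = 2
        Console.WriteLine("r1.X = {0}, r2.X = {1}", r1.X, r2.X); // r1.X = 2, r2.X = 2
    }
}
```

The bigger the struct, the more bytes each of those assignments (and each method call that passes it by value) has to copy.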

So why do we need boxing?

A boxing operation occurs whenever a reference to a value type is required, for example when it is passed to a method that takes an object as a parameter. It also occurs when a value type is converted to one of the interfaces it implements (oh yeah, value types can implement interfaces :) ). As I mentioned earlier, value types all inherit from System.Object in the end, but System.Object is a reference type… so for a value type to be treated as a System.Object, its value must be copied into a new object on the managed heap and a reference to that object handed back. This is called boxing. Once the value has been boxed, it behaves like any other reference type instance.
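As a rough sketch of the situations just described, the snippet below boxes the same int three ways. The Describe method is a hypothetical helper invented for this example; IComparable is the real interface that System.Int32 implements.

```csharp
using System;

class BoxingSites
{
    // Hypothetical helper: any value type passed here is boxed at the call site.
    static string Describe(object o)
    {
        return o.GetType().Name;
    }

    static void Main()
    {
        int x = 5;

        object o = x;                // boxes: assignment to object
        string name = Describe(x);   // boxes: the parameter type is object
        IComparable c = x;           // boxes: conversion to an interface type

        Console.WriteLine(name);             // Int32
        Console.WriteLine(c.CompareTo(5));   // 0
        Console.WriteLine(o.Equals(5));      // True
    }
}
```

Note that the box still knows it holds an Int32, which is why GetType reports Int32 rather than Object.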

Why is this important to understand? Because boxing is relatively expensive… every box allocates a new object on the managed heap and copies the value into it, which means more work both at allocation time and later for the garbage collector. A boxed copy can also outlive the unboxed original. It also means you can end up with more than one copy of the value (one on the stack and one on the heap). Working with boxed objects can also be a little strange to the uninitiated.

So what does a box operation look like? Simple:

int x = 5;
Object o = x;

Done. A copy of the value of x has now been placed onto the managed heap, and o refers to it. The original variable x will stay on the stack until the current method exits.
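A quick way to convince yourself that the box holds its own copy is to change x afterwards. This is just a small sketch along the lines of the two-line example above:

```csharp
using System;

class BoxIsACopy
{
    static void Main()
    {
        int x = 5;
        object o = x;            // box: the value 5 is copied to the heap

        x = 6;                   // changes only the local variable

        Console.WriteLine(x);    // 6
        Console.WriteLine(o);    // 5

        int y = (int)o;          // unbox: the value is copied back out
        Console.WriteLine(y);    // 5
    }
}
```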

How can I tell when a box or unbox operation happens easily?

It’s very handy to know when a boxing or unboxing operation is happening, and it may not always be apparent (especially when you start overloading methods and implementing type safe versions of default methods etc). A very easy way to see this is by using ILDasm and looking at the IL version of your code. The IL for the code above, with one extra line added to unbox the value again (int y = (int)o;), would look like this:

.locals init ([0] int32 x,
          [1] object o,
          [2] int32 y)
 IL_0000:  nop
 IL_0001:  ldc.i4.5
 IL_0002:  stloc.0
 IL_0003:  ldloc.0
 IL_0004:  box        [mscorlib]System.Int32
 IL_0009:  stloc.1
 IL_000a:  ldloc.1
 IL_000b:  unbox.any  [mscorlib]System.Int32
 IL_0010:  stloc.2

You can clearly see the box and unbox.any instructions here… you don’t need to fully understand Intermediate Language to benefit from this.
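On the point about type safe versions of default methods: the sketch below shows one common way to do it, using a made-up Celsius struct that offers a typed Equals alongside the Equals(object) override. The two static counters are only there to show which overload was picked; they are not something you would ship.

```csharp
using System;

// Hypothetical struct for illustration only.
struct Celsius : IEquatable<Celsius>
{
    private readonly int degrees;

    public static int ObjectCalls;   // calls that needed a boxed argument
    public static int TypedCalls;    // calls that did not

    public Celsius(int degrees) { this.degrees = degrees; }

    // Default version: the argument arrives boxed and must be unboxed.
    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return obj is Celsius && ((Celsius)obj).degrees == degrees;
    }

    // Type safe version: no boxing or unboxing involved.
    public bool Equals(Celsius other)
    {
        TypedCalls++;
        return other.degrees == degrees;
    }

    public override int GetHashCode() { return degrees; }
}

class TypedEqualsDemo
{
    static void Main()
    {
        Celsius a = new Celsius(20);
        Celsius b = new Celsius(20);

        a.Equals(b);           // the compiler picks Equals(Celsius): no box
        a.Equals((object)b);   // forces the object overload: b is boxed

        Console.WriteLine("typed: {0}, object: {1}",
            Celsius.TypedCalls, Celsius.ObjectCalls);   // typed: 1, object: 1
    }
}
```

If you look at Main in ILDasm you should find a box instruction only for the second call (plus the ones for the counters passed to Console.WriteLine).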

Okay, so what are the traps – this seems pretty easy

Well, aside from the performance issues associated with unwanted boxing and unboxing operations, there is a really big gotcha when dealing with boxed versions of types.

Imagine this code:

A test structure

struct TestStruct
{
    private int value;

    public TestStruct(int initVar)
    {
        this.value = initVar;
    }

    public override string ToString()
    {
        return value.ToString();
    }

    public void ChangeVal(int newVal)
    {
        value = newVal;
    }
}

A test main method

static void Main(string[] args)
{
    TestStruct ts = new TestStruct(10);
    Object boxedObj = ts;
    ((TestStruct)boxedObj).ChangeVal(20);
    Console.WriteLine(string.Format("Boxed version: {0}", boxedObj.ToString()));
    Console.WriteLine(string.Format("Normal version: {0}", ts.ToString()));
    Console.ReadLine();
}

What will this code display? We first create the test struct and initialise it to 10. Then we box it into an object. Then we cast the boxed object back into a TestStruct and change its value to 20. Then we write out the two values. The output:

Boxed version: 10
Normal version: 10

What happened? Well first, Object knows nothing about TestStruct’s ChangeVal method, so it needs to be cast back to a TestStruct. That cast is an unbox operation, which copies the boxed value into a temporary TestStruct on the stack. ChangeVal then sets the temporary to 20, and the temporary is thrown away. The boxed version never gets updated! Oh and by the way, C# won’t let you assign to a field directly on the result of an unboxing cast.

So be very careful when dealing with boxing and unboxing; it could cause you some headaches one day!
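For what it’s worth, there is one way to change the boxed copy itself: call the method through an interface, because converting a box to an interface type does not unbox it. The sketch below assumes a hypothetical IChangeable interface and a renamed copy of the struct (TestStruct2); neither is part of the code above.

```csharp
using System;

// Hypothetical interface, introduced only for this example.
interface IChangeable
{
    void ChangeVal(int newVal);
}

struct TestStruct2 : IChangeable
{
    private int value;

    public TestStruct2(int initVar) { value = initVar; }

    public void ChangeVal(int newVal) { value = newVal; }

    public override string ToString() { return value.ToString(); }
}

class InterfaceMutationDemo
{
    static void Main()
    {
        TestStruct2 ts = new TestStruct2(10);
        object boxedObj = ts;

        // Unboxes into a temporary copy; the box is untouched.
        ((TestStruct2)boxedObj).ChangeVal(20);
        Console.WriteLine("After cast: {0}", boxedObj);             // After cast: 10

        // No unboxing here: the call operates on the boxed copy itself.
        ((IChangeable)boxedObj).ChangeVal(30);
        Console.WriteLine("After interface call: {0}", boxedObj);   // After interface call: 30

        Console.WriteLine("Normal version: {0}", ts);               // Normal version: 10
    }
}
```

This is less a technique to use than one more reason to keep value types immutable.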


Visual Studio 2008 RTM Available on MSDN

November 19, 2007

Just a quick note: VS 2008 RTM is available on MSDN to subscribers – http://msdn2.microsoft.com/en-gb/subscriptions/default.aspx.

Edit

ScottGu goes into some great detail about the new release, including the Express and trial versions, in his blog. He also gives a nice overview of some of the new features including multi-targeting support, built-in AJAX and JavaScript IntelliSense, CSS support, LINQ, extension methods and lambda expressions.

Daniel Moth has a great article outlining the top 10 things to know about Visual Studio 2008 and .NET 3.5.


Slow compilation of large ASP.NET sites on IIS

November 16, 2007

Let me start by saying this is not about slow compilation in VS 2005. If you are experiencing slow performance in Visual Studio, then look here.

Background

Our CMS utilises ASP.NET to render the live site that the end users see. The CMS basically publishes an ASP.NET site as if you hand wrote it yourself. The management classes and ASPX pages behind this are pre-compiled and the pre-compiled configuration file is set to allow updates i.e.:

<precompiledApp version="2" updatable="true"/>

This allows us to have a nice strong pre-compiled set of management classes etc whilst allowing the publication system to send new ASPX files to the live site at any time.

This means that there are heaps of ASPX files on the site possibly needing to be compiled at any given time (on some of our larger client sites there could be 20,000 ASPX files).

Problem

Normally you would expect that a new page would be compiled on the first visit and then run quickly and efficiently for subsequent hits… however a few months ago (June maybe) Microsoft released a patch which caused a few headaches for large sites that are not fully pre-compiled.

Our clients started ringing up complaining about slow downs after publication (i.e. a page had to be re-compiled by ASP.NET). Some of the slow downs were 15 or more minutes! The CPU and memory would max out even quite high spec machines – and the site would become totally unresponsive.

After some head scratching, remote access to the client networks (we couldn’t replicate the problem in our environment) and lots of file monitoring, run-time monitoring etc., I finally realised that ASP.NET was re-compiling every page in the site (including a special cache we keep for system purposes). Naturally on a large site this was taking forever. Further research on our part tracked the problem back to the KB article http://go.microsoft.com/fwlink/?LinkId=91233. Being a security update, some of our clients installed it straight away – against our general advice (i.e. install to a staging environment first :) ).

The solution is however quite simple:

<configuration>
    <system.web>
        <compilation debug="false" batch="false" numRecompilesBeforeAppRestart="50"></compilation>
    </system.web>
</configuration>

Update your web.config file and add the attribute batch="false" to the compilation element. The problem was that the framework patch had turned on batch compilation mode by default – causing obvious compilation delays (well this is what we assume has happened – information was a little scarce about the problem, most sites are not as dynamic as an Objectify live site). With this setting, affected machines now compile only one page at a time. Click here for more information.

You will also notice another setting here – numRecompilesBeforeAppRestart. Normally ASP.NET will restart the application after 15 dynamic recompiles. You can extend this out depending on your needs.

One final note… if you do use a dynamic site, remember that InProc state management is bad! Your sessions will be lost after 15 compiles!

NB: We couldn’t absolutely prove it was the KB article I listed here that caused the problem… it was our investigation, comparing patches and applying them in our environment, that led us to this (i.e. we ended up replicating the issue – thank god).

Edit

2007-11-28

I just remembered that the day(s) that I was investigating this issue as a matter of top priority was the same day that my entire company went to Calder Park Raceway to drive V8 Supercars! I missed out…oh boy that was a tough day.


How to do a join between two XML files with Linq

November 8, 2007

Linq is still a relatively new concept, but with the release of Visual Studio 2008 beta 2 and talk of a release candidate version it is about to become available for production use.

I’ve been knocking around with Linq for a little while now, and so far I really like what I see. There is plenty of talk about it around the internet and some excellent tutorials kicking around also. For a great look into Linq to SQL and also an overview of lambda expressions and extension methods check out Scott Guthrie’s blog.

First let me begin by saying that I make no pretences to being any form of Linq expert. My linq skills are still very much in their infancy… but I’m going to have a go at talking you through a little problem I had to solve the other day.

The Problem

You have two XML files, or a single XML file with two distinct trees. The first file/tree contains some records with primary keys of sorts. The second file/tree contains some joining information. The example I will use here is a list of words and synonyms.

File A may look like this:

<words>
    <word>
        <id>1</id>
        <text>Accident</text>
    </word>
    <word>
        <id>2</id>
        <text>Mistake</text>
    </word>
</words>

File B may look like this (linking the word and its synonyms):

<links>
    <link>
        <id>1</id>
        <refId>2</refId>
    </link>
</links>

A little disclaimer here: Obviously given the chance you would use the hierarchical nature of XML to store the relationships, but because the source of the data I had to deal with was flat I had no choice.

When using linq to SQL you get lovely typed data and all the relationships are mapped out for you (or you map them out using the linq to SQL designer if they are not present in the source data). Unfortunately with linq to XML we can’t explicitly map these relationships *yet*.

Enter stage left the linq join functionality. The join keyword available in linq basically lets you write joins similar to what you might be used to with SQL: a plain join is an inner equijoin, and adding into (with DefaultIfEmpty) gives you group joins and left outer joins.
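To get a feel for the join syntax before XML comes into it, here is a small sketch that joins two in-memory arrays. The anonymous types and their property names (Id, Text, RefId) are invented for this example and simply mirror the two sample files.

```csharp
using System;
using System.Linq;

class JoinSketch
{
    static void Main()
    {
        // Stand-ins for the word and link records (illustrative data only).
        var words = new[]
        {
            new { Id = 1, Text = "Accident" },
            new { Id = 2, Text = "Mistake" }
        };
        var links = new[] { new { Id = 1, RefId = 2 } };

        // Two inner joins: link -> word, then link -> synonym word.
        var pairs = from l in links
                    join w in words on l.Id equals w.Id
                    join s in words on l.RefId equals s.Id
                    select new { Word = w.Text, Synonym = s.Text };

        foreach (var p in pairs)
        {
            Console.WriteLine("{0} -> {1}", p.Word, p.Synonym);   // Accident -> Mistake
        }
    }
}
```

Note that linq uses the equals keyword rather than == in the on clause, and that the left-hand side must come from the outer sequence.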

An SQL query to select a word and find all the synonyms using the ref table might look something like this:

SELECT words_1.word
FROM links INNER JOIN
words ON links.id = words.id INNER JOIN
words AS words_1 ON links.refId = words_1.id

Pretty simple in SQL… but in linq to XML how do you achieve the same result? Using the join keyword of course.

Start off by loading the two XDocuments – one for the words and the other for the linking information:

XDocument wordDoc = XDocument.Load(AppDomain.CurrentDomain.BaseDirectory + "\\words.xml");
XDocument linkDoc = XDocument.Load(AppDomain.CurrentDomain.BaseDirectory + "\\links.xml");

Then start the linq query in the usual way.

 var wordMatched = from w in wordDoc.Elements("words").Elements("word")

This will iterate all the elements in words/word, and place them into the w variable.

Next we can start populating the return var with information on the currently selected word element.

 select new
 {
 id = w.Element("id").Value,
 term = w.Element("text").Value,

Basically each item in the wordMatched IEnumerable object will contain .id and .term when you iterate them. That’s all really interesting but it doesn’t really do much yet, we want each wordMatched element to contain multiple child synonym words.

 syns = from refIds in linkDoc.Elements("links").Elements("link")
        join synLinkedWord in wordDoc.Elements("words").Elements("word") on refIds.Element("refId").Value equals synLinkedWord.Element("id").Value
        where refIds.Element("id").Value == w.Element("id").Value
        select new
        {
           id = synLinkedWord.Element("id").Value,
           term = synLinkedWord.Element("text").Value
        }

The join here is selecting words from the same XML file first up… this is because the final data we want resides in the same XML file as the words (i.e. words and their synonyms are the same kind of record; it’s the linking table that joins them up). The next part of the join says: we want the refIds in the other XML file to match the ids of words in the original table.

Next is the where clause, which places a constraint on the query; without it the join would return every item that is referenced in the links table… so only return items from the links table that have an id the same as the id on the original “w” object.

Finally we select some data from the original table again (this time using the word that our linking XML gave us).

To read this data is quite simple:

 foreach (var word in wordMatched)
 {
     txtOutput.AppendText("\r\nWord: " + word.term);
     foreach (var syn in word.syns)
     {
          txtOutput.AppendText("\r\n\t->Syn: " + syn.term);
     }
 }

As you can see each word contains the id and term properties, but also contains another IEnumerable property called syns, each item of which contains the details of one synonym!

Problems

There is one problem which I cannot solve as yet, and this is due to my total lack of depth in linq… Iterating through the wordMatched enumerable will also return words which were listed as syns of other words… i.e. in our example Mistake will be listed as a syn under Accident, but it will also be listed by itself as a main word. This can probably be solved with some more where clauses and possibly another join… comments would be greatly appreciated on this!

Edit 9 Movember (sic) 2007

I solved my own problem whilst working on another problem… Insert the following line below the first from w in wordDoc xxxxx line:

where linkDoc.Elements("links").Elements("link").Where(p=>p.Element("refId").Value==w.Element("id").Value).Any() == false

This line adds a where clause that scans the links XML document for items that have a refId equal to the current word’s id. The lambda expression in the .Where checks every element in the linkDoc.Elements("links").Elements("link") enumerable to see whether its refId matches. The .Any() method returns true when the resulting enumerable contains one or more items. So if an item’s id appears as a refId in the links table we assume that it is a synonym and leave it out of the main word list.

Full Listing

            XDocument wordDoc = XDocument.Load(AppDomain.CurrentDomain.BaseDirectory + "\\words.xml");
            XDocument linkDoc = XDocument.Load(AppDomain.CurrentDomain.BaseDirectory + "\\links.xml");

            var wordMatched = from w in wordDoc.Elements("words").Elements("word")
                              where linkDoc.Elements("links").Elements("link").Where(p=>p.Element("refId").Value==w.Element("id").Value).Any() == false
                              select new
                              {
                                  id = w.Element("id").Value,
                                  term = w.Element("text").Value,
                                  syns = from refIds in linkDoc.Elements("links").Elements("link")
                                         join synLinkedWord in wordDoc.Elements("words").Elements("word") on refIds.Element("refId").Value equals synLinkedWord.Element("id").Value
                                         where refIds.Element("id").Value == w.Element("id").Value
                                         select new
                                         {
                                             id = synLinkedWord.Element("id").Value,
                                             term = synLinkedWord.Element("text").Value
                                         }
                              };

            foreach (var word in wordMatched)
            {
                txtOutput.AppendText("\r\nWord: " + word.term);
                foreach (var syn in word.syns)
                {
                    txtOutput.AppendText("\r\n\t->Syn: " + syn.term);
                }
            }
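
If you want to try the query without the two files or a form, the following is a self-contained console sketch of the same listing. It makes a few assumptions that are not in the original: the sample XML is parsed from inline strings with XDocument.Parse, output goes to Console instead of txtOutput, and the Where(...).Any() == false test is written with the equivalent Any(predicate) overload.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class SynonymJoin
{
    static void Main()
    {
        // Inline copies of the two sample files (assumption: same shape as above).
        XDocument wordDoc = XDocument.Parse(
            "<words>" +
            "<word><id>1</id><text>Accident</text></word>" +
            "<word><id>2</id><text>Mistake</text></word>" +
            "</words>");
        XDocument linkDoc = XDocument.Parse(
            "<links><link><id>1</id><refId>2</refId></link></links>");

        var wordMatched =
            from w in wordDoc.Elements("words").Elements("word")
            // Skip any word that appears as a refId, i.e. is somebody's synonym.
            where !linkDoc.Elements("links").Elements("link")
                          .Any(p => p.Element("refId").Value == w.Element("id").Value)
            select new
            {
                id = w.Element("id").Value,
                term = w.Element("text").Value,
                // Nested query: links for this word, joined back to the word list.
                syns = from refIds in linkDoc.Elements("links").Elements("link")
                       join synLinkedWord in wordDoc.Elements("words").Elements("word")
                           on refIds.Element("refId").Value
                           equals synLinkedWord.Element("id").Value
                       where refIds.Element("id").Value == w.Element("id").Value
                       select new
                       {
                           id = synLinkedWord.Element("id").Value,
                           term = synLinkedWord.Element("text").Value
                       }
            };

        foreach (var word in wordMatched)
        {
            Console.WriteLine("Word: " + word.term);
            foreach (var syn in word.syns)
            {
                Console.WriteLine("\t->Syn: " + syn.term);
            }
        }
    }
}
```

With the sample data this prints Accident as the only main word, with Mistake listed under it as a synonym.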