<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:copyright="http://blogs.law.harvard.edu/tech/rss" xmlns:image="http://purl.org/rss/1.0/modules/image/">
    <channel>
        <title>ScapeCode.Com</title>
        <link>http://scapecode.com/Default.aspx</link>
        <description>A strange land of meta-programming and template hackery!</description>
        <language>en-US</language>
        <copyright>Washu</copyright>
        <managingEditor>ryoohki@gmail.com</managingEditor>
        <generator>Subtext Version 1.9.4.78</generator>
        <image>
            <title>ScapeCode.Com</title>
            <url>http://scapecode.com/images/RSS2Image.gif</url>
            <link>http://scapecode.com/Default.aspx</link>
            <width>77</width>
            <height>60</height>
        </image>
        <item>
            <title>Playing With The .NET JIT Part 4</title>
            <link>http://scapecode.com/archive/2007/05/05/Playing-With-The-.NET-JIT-Part-4.aspx</link>
            <description>&lt;p&gt;As noted &lt;a href="http://scapecode.com/archive/2007/04/28/Playing-with-the-.NET-JIT-Part-3.aspx"&gt;previously&lt;/a&gt;, there are some cases where unmanaged code can beat the code produced by the managed JIT. In the previous case it was the matrix multiplication function. There is another possible performance benefit we can give to our .NET code, though: we can NGEN it. NGEN is an interesting utility; it can perform heavy optimizations that would not be possible in the standard runtime JIT (as we shall see). The question before us is: will it give us enough of a boost to surpass the performance of our unmanaged matrix multiplication?&lt;/p&gt;
&lt;h3&gt;An Analysis of Existing Code&lt;/h3&gt;
&lt;p&gt;We haven't yet looked at the code that was produced for our previous tests, so it is time we gave it a look to see what we have. To keep this shorter we'll only look at the inner product function. The code produced for the matrix multiplication suffers from the same problems and benefits from the same extensions. For the purposes of this writing we'll only consider the x64 platform. First up is our unmanaged inner product, which as we may recall is an SSE2 version. There are some things we should note: this method cannot be inlined into the managed code, and there are no frame pointers (they got optimized out).&lt;/p&gt;
&lt;pre&gt;00000001`800019c3 0f100a          movups  xmm1,xmmword ptr [rdx]&lt;br /&gt;00000001`800019c6 0f59c8          mulps   xmm1,xmm0&lt;br /&gt;00000001`800019c9 0f28c1          movaps  xmm0,xmm1&lt;br /&gt;00000001`800019cc 0fc6c14e        shufps  xmm0,xmm1,4Eh&lt;br /&gt;00000001`800019d0 0f58c8          addps   xmm1,xmm0&lt;br /&gt;00000001`800019d3 0f28c1          movaps  xmm0,xmm1&lt;br /&gt;00000001`800019d6 0fc6c11b        shufps  xmm0,xmm1,1Bh&lt;br /&gt;00000001`800019da 0f58c1          addps   xmm0,xmm1&lt;br /&gt;00000001`800019dd f3410f1100      movss   dword ptr [r8],xmm0&lt;br /&gt;00000001`800019e2 c3              ret&lt;/pre&gt;
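&lt;p&gt;To make the shuffle and add sequence above easier to follow, here is a scalar model of it in plain C++. This is only a sketch: &lt;font face="Courier New"&gt;Vec4&lt;/font&gt;, &lt;font face="Courier New"&gt;lane_mul&lt;/font&gt;, &lt;font face="Courier New"&gt;lane_add&lt;/font&gt;, &lt;font face="Courier New"&gt;shuffle_lanes&lt;/font&gt; and &lt;font face="Courier New"&gt;inner_product_model&lt;/font&gt; are hypothetical names that are not part of the test code, and the model assumes both shuffle operands are the same register, as they are in the listing.&lt;/p&gt;
&lt;pre&gt;
```cpp
// Scalar model of the SSE sequence in the listing; lane[0] is the lowest float.
struct Vec4 {
    float lane[4];
};

// Lane-wise multiply, the role mulps plays in the listing.
Vec4 lane_mul(Vec4 a, Vec4 b) {
    Vec4 r = {};
    for (int i = 0; i != 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Lane-wise add, the role addps plays in the listing.
Vec4 lane_add(Vec4 a, Vec4 b) {
    Vec4 r = {};
    for (int i = 0; i != 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// shufps with both operands equal: each two-bit field of imm, lowest field
// first, names the source lane copied into the next result lane.
Vec4 shuffle_lanes(Vec4 a, unsigned imm) {
    Vec4 r = {};
    for (int i = 0; i != 4; ++i) {
        r.lane[i] = a.lane[imm % 4];  // low two bits pick the source lane
        imm /= 4;                     // step to the next two-bit field
    }
    return r;
}

// Hypothetical stand-in for inner_product: multiply, then two shuffle/add steps.
float inner_product_model(const float* v1, const float* v2) {
    Vec4 x = {}, y = {};
    for (int i = 0; i != 4; ++i) {
        x.lane[i] = v1[i];
        y.lane[i] = v2[i];
    }
    Vec4 a = lane_mul(x, y);                  // movups + mulps
    a = lane_add(a, shuffle_lanes(a, 0x4E));  // swap the two halves, then add
    a = lane_add(a, shuffle_lanes(a, 0x1B));  // reverse the lanes, then add
    return a.lane[0];                         // movss stores lane 0
}
```
&lt;/pre&gt;
&lt;p&gt;With 4Eh the two halves swap, so the first add leaves pairwise sums in every lane. With 1Bh the lanes reverse, so the second add leaves the full sum in lane 0, which is the value that movss stores.&lt;/p&gt;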
&lt;p&gt;The code used to produce the managed version shown below has undergone a slight modification. The method no longer returns a float; instead it has an out parameter to a float, which ends up holding the result of the operation. This change was made to eliminate some compilation issues in both the managed and unmanaged versions. In the case of the managed version below, without the out parameter the store operation (at &lt;font face="Courier New"&gt;00000642`801673b3&lt;/font&gt;) would have required a conversion to a double and back to a single again. The new versions are shown at the end of this post. Examining the managed inner product we get a somewhat worse picture:&lt;/p&gt;
&lt;pre&gt;00000642`8016732f 4c8b4908        mov     r9,qword ptr [rcx+8]&lt;br /&gt;00000642`80167333 4d85c9          test    r9,r9&lt;br /&gt;00000642`80167336 0f8684000000    jbe     00000642`801673c0&lt;br /&gt;00000642`8016733c f30f104110      movss   xmm0,dword ptr [rcx+10h]&lt;br /&gt;00000642`80167341 488b4208        mov     rax,qword ptr [rdx+8]&lt;br /&gt;00000642`80167345 4885c0          test    rax,rax&lt;br /&gt;00000642`80167348 7676            jbe     00000642`801673c0&lt;br /&gt;00000642`8016734a f30f104a10      movss   xmm1,dword ptr [rdx+10h]&lt;br /&gt;00000642`8016734f f30f59c8        mulss   xmm1,xmm0&lt;br /&gt;00000642`80167353 4983f901        cmp     r9,1&lt;br /&gt;00000642`80167357 7667            jbe     00000642`801673c0&lt;br /&gt;00000642`80167359 f30f105114      movss   xmm2,dword ptr [rcx+14h]&lt;br /&gt;00000642`8016735e 483d01000000    cmp     rax,1&lt;br /&gt;00000642`80167364 765a            jbe     00000642`801673c0&lt;br /&gt;00000642`80167366 f30f104214      movss   xmm0,dword ptr [rdx+14h]&lt;br /&gt;00000642`8016736b f30f59c2        mulss   xmm0,xmm2&lt;br /&gt;00000642`8016736f f30f58c1        addss   xmm0,xmm1&lt;br /&gt;00000642`80167373 4983f902        cmp     r9,2&lt;br /&gt;00000642`80167377 7647            jbe     00000642`801673c0&lt;br /&gt;00000642`80167379 f30f105118      movss   xmm2,dword ptr [rcx+18h]&lt;br /&gt;00000642`8016737e 483d02000000    cmp     rax,2&lt;br /&gt;00000642`80167384 763a            jbe     00000642`801673c0&lt;br /&gt;00000642`80167386 f30f104a18      movss   xmm1,dword ptr [rdx+18h]&lt;br /&gt;00000642`8016738b f30f59ca        mulss   xmm1,xmm2&lt;br /&gt;00000642`8016738f f30f58c8        addss   xmm1,xmm0&lt;br /&gt;00000642`80167393 4983f903        cmp     r9,3&lt;br /&gt;00000642`80167397 7627            jbe     00000642`801673c0&lt;br /&gt;00000642`80167399 f30f10511c      movss   xmm2,dword ptr [rcx+1Ch]&lt;br /&gt;00000642`8016739e 483d03000000    cmp     rax,3&lt;br 
/&gt;00000642`801673a4 761a            jbe     00000642`801673c0&lt;br /&gt;00000642`801673a6 f30f10421c      movss   xmm0,dword ptr [rdx+1Ch]&lt;br /&gt;00000642`801673ab f30f59c2        mulss   xmm0,xmm2&lt;br /&gt;00000642`801673af f30f58c1        addss   xmm0,xmm1&lt;br /&gt;00000642`801673b3 f3410f114040    movss   dword ptr [r8+40h],xmm0&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;00000642`801673bd f3c3            rep ret&lt;br /&gt;00000642`801673bf 90              nop&lt;br /&gt;00000642`801673c0 e88b9f8aff      call    mscorwks!JIT_RngChkFail (00000642`7fa11350)&lt;/pre&gt;
&lt;p&gt;Wow! Lots of conditionals there. It's not vectorized either, but we don't expect it to be: automatic vectorization is a hit-and-miss affair even with most optimizing compilers (like the Intel one), and vectorizing in the runtime JIT would take up far too much time. This method is inlined for us (thankfully), but we see that it is littered with conditionals and jumps. So where are they jumping to? Well, they all end up just after the end of the method. Note the &lt;font face="Courier New"&gt;nop&lt;/font&gt; instruction that causes the jump destination to be paragraph (16-byte) aligned; that is intentional. As you can probably guess from the name of the jump destination, those conditionals are checking the array bounds, stored in &lt;font face="Courier New"&gt;r9&lt;/font&gt; and &lt;font face="Courier New"&gt;rax&lt;/font&gt;, against the indices being used. Those jumps aren't actually that friendly for branch prediction. For the most part they won't hamper the speed of this method much, but they are an additional cost. Unfortunately, they are rather problematic for the matrix version, and tend to cost quite a bit in performance.&lt;/p&gt;
&lt;p&gt;We can also see that in x64 mode the JIT will use SSE2 for floating point operations. This is quite nice, but it does have some interesting consequences. For instance, comparing floating point numbers generated using the FPU against those generated using SSE2 will more than likely fail, EVEN IF you truncate them to their appropriate sizes. The reason is that the XMM registers (when using the single versions of the instructions and not the double ones) store the floating point values as exactly 32 bit floats. The FPU, however, will expand them to 80 bit floats, which means that operations on those 80 bit floats before truncation can leave the 32 bit results differing in their lower bits. If you are wondering when this might become an issue, imagine the problems of running a managed networked game where you have 64 bit and 32 bit clients all sending packets to the server. This is just another reason why you should be using deltas for comparison of floats. Another thing to note is that with the addition of SSE2 support came the ability to use instructions that save us loads and stores, such as the &lt;font face="Courier New"&gt;cvtss2sd&lt;/font&gt; and &lt;font face="Courier New"&gt;cvtsd2ss&lt;/font&gt; instructions, which perform single to double and double to single conversions respectively.&lt;/p&gt;
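&lt;p&gt;The precision effect described above can be sketched in portable C++. Two assumptions are baked in: double is used as a stand-in for the 80 bit FPU format, so that the example does not depend on x87 code generation, and the target is assumed to evaluate float arithmetic in single precision, as SSE code does. The function names are hypothetical.&lt;/p&gt;
&lt;pre&gt;
```cpp
// Adds three floats, rounding to 32 bits after every step (SSE style).
float sum_single(float a, float b, float c) {
    float s = a + b;  // rounded to a 32 bit float here
    s = s + c;        // and rounded again here
    return s;
}

// Adds the same floats in a wider format and rounds to 32 bits once at the end.
// Assumption: double stands in for the 80 bit x87 format.
float sum_wide_then_truncate(float a, float b, float c) {
    double w = a;     // widening is exact
    w = w + b;        // no rounding to 32 bits between the steps
    w = w + c;
    return float(w);  // single truncation at the end
}
```
&lt;/pre&gt;
&lt;p&gt;Under those assumptions, with a = 16777216 (2&lt;sup&gt;24&lt;/sup&gt;), b = 1 and c = 1, the first function returns 16777216 while the second returns 16777218. Each intermediate sum in the first function falls exactly between two representable floats and is rounded back down, whereas the wider format keeps both additions. Two clients computing the same sum can therefore disagree in the low bits, which is why an exact equality test between them is fragile.&lt;/p&gt;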
&lt;h3&gt;Examining the Call Stack&lt;/h3&gt;
&lt;p&gt;Of course, there is also the question of exactly what our program goes through to call our unmanaged methods. First off, the JIT will have to generate several marshalling stubs (to deal with any non-blittable types, although in this case all of the passed types are blittable), along with the security demands. The total number of machine instructions for these stubs is around 10-30; nevertheless, they aren't inlinable and end up having to be created at runtime. The extra overhead of these calls can add up to quite a bit. First up we'll look at the PInvoke and the delegate stacks:&lt;/p&gt;
&lt;pre&gt;000006427f66bd14 ManagedMathLib!matrix_mul&lt;br /&gt;0000064280168b85 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78&lt;br /&gt;0000064280168ccc ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single[], Single[], Single[])+0xb5&lt;br /&gt;0000064280168a0f PInvokeTest!SecurityILStubClass.IL_STUB(Single[], Single[], Single[])+0x5c&lt;br /&gt;000006428016893e PInvokeTest!PInvokeTest.Program+&amp;lt;&amp;gt;c__DisplayClass8.&amp;lt;Main&amp;gt;b__0()+0x1f&lt;br /&gt;0000064280167ca1 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x6e&lt;br /&gt;000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x591&lt;/pre&gt;
&lt;pre&gt;000006427f66bd14 ManagedMathLib!matrix_mul&lt;br /&gt;0000064280168465 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78&lt;br /&gt;00000642801685c1 ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single[], Single[], Single[])+0xb5&lt;br /&gt;0000064280168945 PInvokeTest!SecurityILStubClass.IL_STUB(Single[], Single[], Single[])+0x51&lt;br /&gt;0000064280167d59 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x75&lt;br /&gt;000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x649&lt;/pre&gt;
&lt;p&gt;We can see the two stubs that were created, along with this last method called &lt;font face="Courier New"&gt;DoNDirectCall__PatchGetThreadCall&lt;/font&gt; that actually does the work of calling to our unmanaged function. Exactly what it does is probably what the name says, although I haven't actually dug in and tried to find out what's going on in the internals of it. One important thing to notice is the &lt;font face="Courier New"&gt;PInvokeTest!PInvokeTest.Program+&amp;lt;&amp;gt;c__DisplayClass8.&amp;lt;Main&amp;gt;b__0() &lt;/font&gt;&lt;font face="Arial"&gt;call in the first stack, which is actually a delegate used to call to our unmanaged method (passed in to TimeTest). In the second stack, where the delegate calls the matrix multiplication function directly, that extra call has been eliminated entirely. Other than that, the contents of the two sets of stubs are practically identical. The security stub asserts that we have the right to call to unmanaged code; since this is a security demand and can change at runtime, it cannot be eliminated. Calling to our unmanaged function from the managed DLL is up next, and it turns out that this is also the most direct call:&lt;/font&gt;&lt;/p&gt;
&lt;pre&gt;000006427f66bf32 ManagedMathLib!matrix_mul&lt;br /&gt;0000064280169601 mscorwks!DoNDirectCallWorker+0x62&lt;br /&gt;00000642801694ef ManagedMathLib!ManagedMathLib.ManagedMath.MatrixMul(Single[], Single[], Single[])+0xd1&lt;br /&gt;0000064280168945 PInvokeTest!PInvokeTest.Program+&amp;lt;&amp;gt;c__DisplayClass8.&amp;lt;Main&amp;gt;b__3()+0x1f&lt;br /&gt;0000064280167ecf PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x75&lt;br /&gt;000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x7bf&lt;/pre&gt;
&lt;p&gt;As we can see, the only real work that is done to call our unmanaged method is the call to &lt;font face="Courier New"&gt;DoNDirectCallWorker&lt;/font&gt;&lt;font face="Arial"&gt;. Digging around in that method we find that it is basically a wrapper that saves registers, sets up some registers and then dispatches to the unmanaged function. Upon returning it restores the registers and returns to the caller. There is no dynamic method construction, nor does this require any extra overhead on our end. In fact, one could say that the code is about as fast as we can expect it to be for a managed to unmanaged transition. Looking at the difference between the new version (which takes a pointer to the destination float) and the original unmanaged inner product call, both being made from the managed DLL, we can see a huge difference:&lt;/font&gt;&lt;/p&gt;
&lt;pre&gt;000006427f66bf32 ManagedMathLib!inner_product&lt;br /&gt;0000064280169bd0 mscorwks!DoNDirectCallWorker+0x62&lt;br /&gt;0000064280169acf ManagedMathLib!ManagedMathLib.ManagedMath.InnerProduct(Single[], Single[], Single ByRef)+0xc0&lt;br /&gt;0000064280168955 PInvokeTest!PInvokeTest.Program+&amp;lt;&amp;gt;c__DisplayClass8.&amp;lt;Main&amp;gt;b__7()+0x1f&lt;br /&gt;00000642801681c5 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x75&lt;br /&gt;000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0xab5&lt;/pre&gt;
&lt;pre&gt;000006427f66bd14 ManagedMathLib!inner_product&lt;br /&gt;0000064280169ca3 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78&lt;br /&gt;0000064280169ba0 ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single*, Single*)+0x43&lt;br /&gt;0000064280169b00 ManagedMathLib!ManagedMathLib.ManagedMath.InnerProduct(Single[], Single[])+0x50&lt;br /&gt;000006428016893e PInvokeTest!PInvokeTest.Program+&amp;lt;&amp;gt;c__DisplayClass8.&amp;lt;Main&amp;gt;b__7()+0x20&lt;br /&gt;00000642801681c5 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x6e&lt;br /&gt;000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0xab5&lt;/pre&gt;
&lt;p&gt;Notice the second call stack has the marshalling stub (also note the parameters to the stub). Returning value types has all sorts of interesting consequences. By changing the signature to write out to a float (in the case of the managed DLL it uses an out parameter), we eliminate the marshalling stub entirely. This improves performance by a decent bit, but nowhere near enough to make up for the call in the first place. The managed inner product is still significantly faster.&lt;/p&gt;
&lt;h3&gt;And then came NGEN&lt;/h3&gt;
&lt;p&gt;So, we've gone through and optimized our managed application, but yet it still is running too slow. We contemplate the necessity of moving some code over to the unmanaged world and shudder at the implications. Security would be shot, bugs abound...what to do! But then we remember that there's yet one more option, NGEN!&lt;/p&gt;
&lt;p&gt;Running NGEN on our test executable prejitted the whole thing, even methods that eventually ended up being inlined. So, what did it do to our managed inner product? Well first we'll look at the actual method that got prejitted:&lt;/p&gt;
&lt;pre&gt;PInvokeTest.Program.InnerProduct2(Single[], Single[], Single ByRef)&lt;br /&gt;Begin 0000064288003290, size b0&lt;br /&gt;00000642`88003290 4883ec28        sub     rsp,28h&lt;br /&gt;00000642`88003294 4c8bc9          mov     r9,rcx&lt;br /&gt;00000642`88003297 498b4108        mov     rax,qword ptr [r9+8]&lt;br /&gt;00000642`8800329b 4885c0          test    rax,rax&lt;br /&gt;00000642`8800329e 0f8696000000    jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032a4 33c9            xor     ecx,ecx&lt;br /&gt;00000642`880032a6 488b4a08        mov     rcx,qword ptr [rdx+8]&lt;br /&gt;00000642`880032aa 4885c9          test    rcx,rcx&lt;br /&gt;00000642`880032ad 0f8687000000    jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032b3 4533d2          xor     r10d,r10d&lt;br /&gt;00000642`880032b6 483d01000000    cmp     rax,1&lt;br /&gt;00000642`880032bc 767c            jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032be 41ba01000000    mov     r10d,1&lt;br /&gt;00000642`880032c4 4883f901        cmp     rcx,1&lt;br /&gt;00000642`880032c8 7670            jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032ca 41ba01000000    mov     r10d,1&lt;br /&gt;00000642`880032d0 483d02000000    cmp     rax,2&lt;br /&gt;00000642`880032d6 7662            jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032d8 41ba02000000    mov     r10d,2&lt;br /&gt;00000642`880032de 4883f902        cmp     rcx,2&lt;br /&gt;00000642`880032e2 7656            jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032e4 483d03000000    
cmp     rax,3&lt;br /&gt;00000642`880032ea 764e            jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032ec b803000000      mov     eax,3&lt;br /&gt;00000642`880032f1 4883f903        cmp     rcx,3&lt;br /&gt;00000642`880032f5 7643            jbe     PInvokeTest_ni!COM+_Entry_Point &amp;lt;PERF&amp;gt; (PInvokeTest_ni+0x333a) (00000642`8800333a)&lt;br /&gt;00000642`880032f7 f30f104a14      movss   xmm1,dword ptr [rdx+14h]&lt;br /&gt;00000642`880032fc f3410f594914    mulss   xmm1,dword ptr [r9+14h]&lt;br /&gt;00000642`88003302 f30f104210      movss   xmm0,dword ptr [rdx+10h]&lt;br /&gt;00000642`88003307 f3410f594110    mulss   xmm0,dword ptr [r9+10h]&lt;br /&gt;00000642`8800330d f30f58c8        addss   xmm1,xmm0&lt;br /&gt;00000642`88003311 f30f104218      movss   xmm0,dword ptr [rdx+18h]&lt;br /&gt;00000642`88003316 f3410f594118    mulss   xmm0,dword ptr [r9+18h]&lt;br /&gt;00000642`8800331c f30f58c8        addss   xmm1,xmm0&lt;br /&gt;00000642`88003320 f30f10421c      movss   xmm0,dword ptr [rdx+1Ch]&lt;br /&gt;00000642`88003325 f3410f59411c    mulss   xmm0,dword ptr [r9+1Ch]&lt;br /&gt;00000642`8800332b f30f58c8        addss   xmm1,xmm0&lt;br /&gt;00000642`8800332f f3410f1108      movss   dword ptr [r8],xmm1&lt;br /&gt;00000642`88003334 4883c428        add     rsp,28h&lt;br /&gt;00000642`88003338 f3c3            rep ret&lt;br /&gt;00000642`8800333a e811e0a0f7      call    mscorwks!JIT_RngChkFail (00000642`7fa11350)&lt;br /&gt;00000642`8800333f 90              nop&lt;/pre&gt;
&lt;p&gt;Interesting results, eh? First off, all of the checks are right up front, and ignoring the stack frames we can see exactly what will be inlined. This method looks a lot better than before, with all of the branches right up at the top, where one would assume branch prediction can best deal with them (the registers never change and are being compared to constants). Nevertheless there are some oddities in this code; for instance there appear to be some extraneous instructions like &lt;font face="Courier New"&gt;mov eax,3&lt;/font&gt;&lt;font face="Arial"&gt;. Yeah, don't ask me. Even so, the code is clearly superior to its previous form, and the matrix version improves just as much, with the range checks being spaced out significantly more (and a bunch are done right up front as well). Of course, the question now is: how much does this help our performance? First up we'll examine some results from the new code base, and then some from the NGEN results on the same code base.&lt;/font&gt;&lt;/p&gt;
&lt;pre&gt;Count: 50&lt;br /&gt;PInvoke MatrixMul : 00:00:07.6456226 Average: 00:00:00.1529124&lt;br /&gt;Delegate MatrixMul: 00:00:06.6500307 Average: 00:00:00.1330006&lt;br /&gt;Managed MatrixMul: 00:00:05.5783511 Average: 00:00:00.1115670&lt;br /&gt;Internal MatrixMul: 00:00:04.5377141 Average: 00:00:00.0907542&lt;br /&gt;PInvoke Inner Product: 00:00:05.4466987 Average: 00:00:00.1089339&lt;br /&gt;Delegate Inner Product: 00:00:04.5001885 Average: 00:00:00.0900037&lt;br /&gt;Managed Inner Product: 00:00:00.5535891 Average: 00:00:00.0110717&lt;br /&gt;Internal Inner Product: 00:00:02.2694728 Average: 00:00:00.0453894&lt;/pre&gt;
&lt;pre&gt;Count: 10&lt;br /&gt;PInvoke MatrixMul : 00:00:01.5706254 Average: 00:00:00.1570625&lt;br /&gt;Delegate MatrixMul: 00:00:01.2689247 Average: 00:00:00.1268924&lt;br /&gt;Managed MatrixMul: 00:00:01.1501118 Average: 00:00:00.1150111&lt;br /&gt;Internal MatrixMul: 00:00:00.9302144 Average: 00:00:00.0930214&lt;br /&gt;PInvoke Inner Product: 00:00:01.0198933 Average: 00:00:00.1019893&lt;br /&gt;Delegate Inner Product: 00:00:00.8538827 Average: 00:00:00.0853882&lt;br /&gt;Managed Inner Product: 00:00:00.0987369 Average: 00:00:00.0098736&lt;br /&gt;Internal Inner Product: 00:00:00.4287660 Average: 00:00:00.0428766&lt;/pre&gt;
&lt;p&gt;All in all, our performance changes have helped out the managed inner product a decent amount, although even the unmanaged calls managed to get a bit of a boost. Now for the NGEN results:&lt;/p&gt;
&lt;pre&gt;Count: 50&lt;br /&gt;PInvoke MatrixMul : 00:00:07.5788052 Average: 00:00:00.1515761&lt;br /&gt;Delegate MatrixMul: 00:00:06.2202549 Average: 00:00:00.1244050&lt;br /&gt;Managed MatrixMul: 00:00:04.0376665 Average: 00:00:00.0807533&lt;br /&gt;Internal MatrixMul: 00:00:04.5778189 Average: 00:00:00.0915563&lt;br /&gt;PInvoke Inner Product: 00:00:05.2785764 Average: 00:00:00.1055715&lt;br /&gt;Delegate Inner Product: 00:00:04.1814388 Average: 00:00:00.0836287&lt;br /&gt;Managed Inner Product: 00:00:00.5579279 Average: 00:00:00.0111585&lt;br /&gt;Internal Inner Product: 00:00:02.2419279 Average: 00:00:00.0448385&lt;/pre&gt;
&lt;pre&gt;Count: 10&lt;br /&gt;PInvoke MatrixMul : 00:00:01.3822036 Average: 00:00:00.1382203&lt;br /&gt;Delegate MatrixMul: 00:00:01.1436108 Average: 00:00:00.1143610&lt;br /&gt;Managed MatrixMul: 00:00:00.7386742 Average: 00:00:00.0738674&lt;br /&gt;Internal MatrixMul: 00:00:00.8427460 Average: 00:00:00.0842746&lt;br /&gt;PInvoke Inner Product: 00:00:00.9507331 Average: 00:00:00.0950733&lt;br /&gt;Delegate Inner Product: 00:00:00.7428082 Average: 00:00:00.0742808&lt;br /&gt;Managed Inner Product: 00:00:00.1005084 Average: 00:00:00.0100508&lt;br /&gt;Internal Inner Product: 00:00:00.4025611 Average: 00:00:00.0402561&lt;/pre&gt;
&lt;p&gt;So, now we can see that our unmanaged matrix multiplication doesn't offer any advantages over the managed version; in fact it's actually &lt;strong&gt;SLOWER&lt;/strong&gt; than the managed version! We can also see that the unmanaged invocations benefitted from the NGEN process, as their managed calls were also optimized somewhat, although the stub wrappers are still there and hence still add their overhead. Another thing we note is that the managed inner product function appears to have slowed down just a bit. This might be nothing, it might be due to machine load, or it might genuinely be slower. I'm tempted to say that it's actually slower now, though.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;You may recall that this was all sparked by a discussion I had &lt;a href="http://cowboyprogramming.com/2007/01/05/blob-physics/"&gt;way back when&lt;/a&gt; about comparing managed and unmanaged benchmarks and the disadvantages of just setting the /clr flag. I've gone a bit past that, though, in looking at our managed resources and optimized unmanaged resources and when it is actually beneficial to call into unmanaged code. It is still beneficial to do so, but only for operations that are taxing enough to bother with. In this case our matrix code, where the native code clearly beat the JIT produced code in a pure JIT situation, gets beaten by the NGENed managed version. So what is sufficiently taxing, then? Well, set processing might be taxing enough. That is: applying a set of vectorized operations to a collection of objects. But the reality is that you MUST profile first before you can be sure that optimizations of that sort are anywhere near what you need; if you just assume they are, you're probably mistaken.&lt;/p&gt;
&lt;p&gt;On a final note, the x86 version also performs better when NGENed than the native version, although in a surprise jump, the delegates actually cost significantly more:&lt;/p&gt;
&lt;pre&gt;Count: 50&lt;br /&gt;PInvoke MatrixMul : 00:00:07.9897235 Average: 00:00:00.1597944&lt;br /&gt;Delegate MatrixMul: 00:00:27.2561396 Average: 00:00:00.5451227&lt;br /&gt;Managed MatrixMul: 00:00:03.5224029 Average: 00:00:00.0704480&lt;br /&gt;Internal MatrixMul: 00:00:04.5232549 Average: 00:00:00.0904650&lt;br /&gt;PInvoke Inner Product: 00:00:05.5799834 Average: 00:00:00.1115996&lt;br /&gt;Delegate Inner Product: 00:00:29.5660003 Average: 00:00:00.5913200&lt;br /&gt;Managed Inner Product: 00:00:00.5755690 Average: 00:00:00.0115113&lt;br /&gt;Internal Inner Product: 00:00:01.8218949 Average: 00:00:00.0364378&lt;/pre&gt;
&lt;p&gt;Exactly why this is I haven't investigated, and perhaps I will next time.&lt;/p&gt;
&lt;p&gt;Sources for the new inner product functions:&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;void __declspec&lt;/font&gt;(&lt;font color="#0000ff"&gt;dllexport&lt;/font&gt;) inner_product(&lt;font color="#0000ff"&gt;float const&lt;/font&gt;* v1, &lt;font color="#0000ff"&gt;float const&lt;/font&gt;* v2, &lt;font color="#0000ff"&gt;float&lt;/font&gt;* out) {
	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; a = _mm_mul_ps(_mm_loadu_ps(v1), _mm_loadu_ps(v2));
	a = _mm_add_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(&lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;)));
	_mm_store_ss(out, _mm_add_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(&lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;))));
}

&lt;font color="#0000ff"&gt;static void&lt;/font&gt; InnerProduct(&lt;font color="#0000ff"&gt;array&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;float&lt;/font&gt;&amp;gt;^ v1, &lt;font color="#0000ff"&gt;array&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;float&lt;/font&gt;&amp;gt;^ v2, [Runtime::InteropServices::&lt;font color="#008080"&gt;Out&lt;/font&gt;] &lt;font color="#0000ff"&gt;float&lt;/font&gt;% result) {
	&lt;font color="#0000ff"&gt;pin_ptr&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;float&lt;/font&gt;&amp;gt; pv1 = &amp;amp;v1[&lt;font color="#800000"&gt;0&lt;/font&gt;];
	&lt;font color="#0000ff"&gt;pin_ptr&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;float&lt;/font&gt;&amp;gt; pv2 = &amp;amp;v2[&lt;font color="#800000"&gt;0&lt;/font&gt;];
	&lt;font color="#0000ff"&gt;pin_ptr&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;float&lt;/font&gt;&amp;gt; out = &amp;amp;result;

	inner_product(pv1, pv2, out);
}

&lt;font color="#0000ff"&gt;public&lt;/font&gt; &lt;font color="#0000ff"&gt;static void&lt;/font&gt; InnerProduct2(&lt;font color="#0000ff"&gt;float&lt;/font&gt;[] v1, &lt;font color="#0000ff"&gt;float&lt;/font&gt;[] v2, &lt;font color="#0000ff"&gt;out float&lt;/font&gt; f) {
	f = v1[&lt;font color="#800000"&gt;0&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;0&lt;/font&gt;] + v1[&lt;font color="#800000"&gt;1&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;1&lt;/font&gt;] + v1[&lt;font color="#800000"&gt;2&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;2&lt;/font&gt;] + v1[&lt;font color="#800000"&gt;3&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;3&lt;/font&gt;];
}&lt;/pre&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/05/05/Playing-With-The-.NET-JIT-Part-4.aspx</guid>
            <pubDate>Sun, 06 May 2007 05:47:25 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/12.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/05/05/Playing-With-The-.NET-JIT-Part-4.aspx#feedback</comments>
            <slash:comments>1</slash:comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/12.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/12.aspx</trackback:ping>
        </item>
        <item>
            <title>Playing With The .NET JIT Part 3</title>
            <link>http://scapecode.com/archive/2007/04/28/Playing-with-the-.NET-JIT-Part-3.aspx</link>
            <description>&lt;p&gt;Integrating unmanaged code into the managed platform is one of the problem areas of the managed world. Oftentimes the exact costs of calling into unmanaged code are unknown. This obviously leads to some confusion as to when it is appropriate to mix in unmanaged code to improve the performance of our application.&lt;/p&gt;
&lt;h3&gt;PInvoke&lt;/h3&gt;
&lt;p&gt;There are three ways to access an unmanaged function from managed code. The first is to use the PInvoke capabilities of the language. In C# this is done by declaring a method with external linkage and indicating (using the &lt;font face="Courier New" color="#008080"&gt;DllImportAttribute&lt;/font&gt; attribute) in which DLL the method may be found. The second way is to obtain a pointer to the function (using &lt;font face="Courier New"&gt;LoadLibrary&lt;/font&gt;/&lt;font face="Courier New"&gt;GetProcAddress&lt;/font&gt;/&lt;font face="Courier New"&gt;FreeLibrary&lt;/font&gt;), and marshal that pointer to a managed delegate using &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;Marshal&lt;/font&gt;.GetDelegateForFunctionPointer&lt;/font&gt;. Finally, you can write a managed wrapper around the function, using C++/CLI, and invoke that managed method, which will in turn call the unmanaged method.&lt;/p&gt;
&lt;p&gt;For the purposes of this post we’ll be using two mathematical sample functions. The first is the standard inner product (aka the dot product), which the code below computes over four-component vectors, and the second is a 4x4 matrix multiplication. We’ll be comparing two implementations of each: the first is a trivial managed implementation, and the second is an SSE2 optimized version. Thanks must be given to Arseny Kapoulkine for the SSE2 version of the matrix multiplication.&lt;/p&gt;
&lt;p&gt;First up are the implementations of the inner product functions. It should be noted that I’ll be doing the profiling in x64 mode; however, the results are similar (albeit a bit slower) for x86.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;public&lt;/font&gt; &lt;font color="#0000ff"&gt;static&lt;/font&gt; &lt;font color="#0000ff"&gt;float&lt;/font&gt; InnerProduct2(&lt;font color="#0000ff"&gt;float&lt;/font&gt;[] v1, &lt;font color="#0000ff"&gt;float&lt;/font&gt;[] v2) {&lt;br /&gt;	&lt;font color="#0000ff"&gt;return&lt;/font&gt; v1[&lt;font color="#800000"&gt;0&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;0&lt;/font&gt;] + v1[&lt;font color="#800000"&gt;1&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;1&lt;/font&gt;] + v1[&lt;font color="#800000"&gt;2&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;2&lt;/font&gt;] + v1[&lt;font color="#800000"&gt;3&lt;/font&gt;] * v2[&lt;font color="#800000"&gt;3&lt;/font&gt;];&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;float __declspec&lt;/font&gt;(&lt;font color="#0000ff"&gt;dllexport&lt;/font&gt;) inner_product(&lt;font color="#0000ff"&gt;float const&lt;/font&gt;* v1, &lt;font color="#0000ff"&gt;float const&lt;/font&gt;* v2) {&lt;br /&gt;	&lt;font color="#0000ff"&gt;float&lt;/font&gt; result;&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; a = _mm_mul_ps(_mm_loadu_ps(v1), _mm_loadu_ps(v2));&lt;br /&gt;	a = _mm_add_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(&lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;)));&lt;br /&gt;	_mm_store_ss(&amp;amp;result, _mm_add_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(&lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;))));&lt;br /&gt;	&lt;font color="#0000ff"&gt;return&lt;/font&gt; result;&lt;br /&gt;}&lt;/pre&gt;
&lt;p&gt;One thing to note about these implementations is that they both operate solely on arrays of floats. InnerProduct2 is inlineable since it’s only 23 bytes long and takes reference types as parameters. The unmanaged inner product could also be implemented using the SSE3 haddps instruction; however, I decided to keep it as processor neutral as possible by using only SSE2 instructions.&lt;/p&gt;
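&lt;p&gt;To make the two shuffle steps in inner_product easier to follow, here is a minimal scalar sketch of the same reduction in plain C++. The name inner_product_reference and its temporary arrays are hypothetical and exist only for illustration; the sketch mirrors the lane arithmetic and is not meant to replace the SSE version.&lt;/p&gt;
&lt;pre&gt;// Hypothetical scalar walk-through of the shuffle-and-add reduction in inner_product.
float inner_product_reference(float const* v1, float const* v2) {
    // _mm_mul_ps: multiply the four lanes pairwise.
    float a[4];
    for (int i = 0; i != 4; ++i) {
        a[i] = v1[i] * v2[i];
    }

    // _MM_SHUFFLE(1, 0, 3, 2) swaps the two halves, giving (a2, a3, a0, a1).
    // Adding that to a leaves a0 + a2 in lanes 0 and 2, and a1 + a3 in lanes 1 and 3.
    float b[4] = { a[0] + a[2], a[1] + a[3], a[2] + a[0], a[3] + a[1] };

    // _MM_SHUFFLE(0, 1, 2, 3) reverses the lanes, giving (b3, b2, b1, b0).
    // Lane 0 of the final add is b0 + b3, which is the full four-element sum.
    // _mm_store_ss then writes only lane 0 to the result.
    return b[0] + b[3];
}&lt;/pre&gt;
&lt;p&gt;For v1 = (1, 2, 3, 4) and v2 = (5, 6, 7, 8) the products are (5, 12, 21, 32), the first add yields 26 and 44 in alternating lanes, and the final lane 0 holds 70.&lt;/p&gt;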
&lt;p&gt;The implementations of the matrix multiplication vary quite significantly as well. The managed version is the trivial implementation, but its expansion into machine code is quite long. The unmanaged version is an SSE2 optimized one, and the raw performance boost of using it is quite significant.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;public static void&lt;/font&gt; MatrixMul2(&lt;font color="#0000ff"&gt;float&lt;/font&gt;[] m1, &lt;font color="#0000ff"&gt;float&lt;/font&gt;[] m2, &lt;font color="#0000ff"&gt;float&lt;/font&gt;[] o) {&lt;br /&gt;	o[&lt;font color="#800000"&gt;0&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;0&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;0&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;1&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;4&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;2&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;8&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;3&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;12&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;1&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;0&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;1&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;1&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;5&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;2&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;9&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;3&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;13&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;2&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;0&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;2&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;1&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;6&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;2&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;10&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;3&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;14&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;3&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;0&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;3&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;1&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;7&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;2&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;11&lt;/font&gt;] + m1[&lt;font 
color="#800000"&gt;3&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;15&lt;/font&gt;];&lt;br /&gt;&lt;br /&gt;	o[&lt;font color="#800000"&gt;4&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;4&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;0&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;5&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;4&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;6&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;8&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;7&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;12&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;5&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;4&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;1&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;5&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;5&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;6&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;9&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;7&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;13&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;6&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;4&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;2&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;5&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;6&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;6&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;10&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;7&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;14&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;7&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;4&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;3&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;5&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;7&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;6&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;11&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;7&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;15&lt;/font&gt;];&lt;br /&gt;&lt;br /&gt;	o[&lt;font color="#800000"&gt;8&lt;/font&gt;] = m1[&lt;font 
color="#800000"&gt;8&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;0&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;9&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;4&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;10&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;8&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;11&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;12&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;9&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;8&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;1&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;9&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;5&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;10&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;9&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;11&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;13&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;10&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;8&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;2&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;9&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;6&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;10&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;10&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;11&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;14&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;11&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;8&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;3&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;9&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;7&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;10&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;11&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;11&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;15&lt;/font&gt;];&lt;br /&gt;&lt;br /&gt;	o[&lt;font color="#800000"&gt;12&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;12&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;0&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;13&lt;/font&gt;] * m2[&lt;font 
color="#800000"&gt;4&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;14&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;8&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;15&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;12&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;13&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;12&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;1&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;13&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;5]&lt;/font&gt; + m1[&lt;font color="#800000"&gt;14&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;9&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;15&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;13&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;14&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;12&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;2&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;13&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;6&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;14&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;10&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;15&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;14&lt;/font&gt;];&lt;br /&gt;	o[&lt;font color="#800000"&gt;15&lt;/font&gt;] = m1[&lt;font color="#800000"&gt;12&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;3&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;13&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;7&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;14&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;11&lt;/font&gt;] + m1[&lt;font color="#800000"&gt;15&lt;/font&gt;] * m2[&lt;font color="#800000"&gt;15&lt;/font&gt;];&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;void __declspec&lt;/font&gt;(&lt;font color="#0000ff"&gt;dllexport&lt;/font&gt;) matrix_mul(&lt;font color="#0000ff"&gt;float const&lt;/font&gt;* m1, &lt;font color="#0000ff"&gt;float const&lt;/font&gt;* m2, &lt;font color="#0000ff"&gt;float&lt;/font&gt;* out)&lt;br /&gt;{&lt;br /&gt;	&lt;font 
color="#0000ff"&gt;__m128&lt;/font&gt; r;&lt;br /&gt;&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; col1 = _mm_loadu_ps(m2);&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; col2 = _mm_loadu_ps(m2 + &lt;font color="#800000"&gt;4&lt;/font&gt;);&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; col3 = _mm_loadu_ps(m2 + &lt;font color="#800000"&gt;8&lt;/font&gt;);&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; col4 = _mm_loadu_ps(m2 + &lt;font color="#800000"&gt;12&lt;/font&gt;);&lt;br /&gt;&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; row1 = _mm_loadu_ps(m1);&lt;br /&gt;&lt;br /&gt;	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(&lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;)), col1),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(&lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;)), col2),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(&lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;)), col3),&lt;br /&gt;		_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(&lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;)), col4))));&lt;br /&gt;&lt;br /&gt;	_mm_storeu_ps(out, r);  &lt;br /&gt;&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; row2 = _mm_loadu_ps(m1 + &lt;font color="#800000"&gt;4&lt;/font&gt;);&lt;br /&gt;&lt;br /&gt;	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(&lt;font color="#800000"&gt;0&lt;/font&gt;, 
&lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;)), col1),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(&lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;)), col2),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(&lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;)), col3),&lt;br /&gt;		_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(&lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;)), col4))));&lt;br /&gt;&lt;br /&gt;	_mm_storeu_ps(out + &lt;font color="#800000"&gt;4&lt;/font&gt;, r);  &lt;br /&gt;&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; row3 = _mm_loadu_ps(m1 + &lt;font color="#800000"&gt;8&lt;/font&gt;);&lt;br /&gt;&lt;br /&gt;	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row3, row3, _MM_SHUFFLE(&lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;)), col1),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row3, row3, _MM_SHUFFLE(&lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;)), col2),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row3, row3, _MM_SHUFFLE(&lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;)), col3),&lt;br /&gt;		_mm_mul_ps(_mm_shuffle_ps(row3, row3, 
_MM_SHUFFLE(&lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;)), col4))));&lt;br /&gt;&lt;br /&gt;	_mm_storeu_ps(out + &lt;font color="#800000"&gt;8&lt;/font&gt;, r);  &lt;br /&gt;&lt;br /&gt;	&lt;font color="#0000ff"&gt;__m128&lt;/font&gt; row4 = _mm_loadu_ps(m1 + &lt;font color="#800000"&gt;12&lt;/font&gt;);&lt;br /&gt;&lt;br /&gt;	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(&lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;, &lt;font color="#800000"&gt;0&lt;/font&gt;)), col1),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(&lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;, &lt;font color="#800000"&gt;1&lt;/font&gt;)), col2),&lt;br /&gt;		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(&lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;, &lt;font color="#800000"&gt;2&lt;/font&gt;)), col3),&lt;br /&gt;		_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(&lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;, &lt;font color="#800000"&gt;3&lt;/font&gt;)), col4))));&lt;br /&gt;&lt;br /&gt;	_mm_storeu_ps(out + &lt;font color="#800000"&gt;12&lt;/font&gt;, r);&lt;br /&gt;}&lt;/pre&gt;
&lt;p&gt;The managed version of the matrix multiplication is clearly far too large to be inlined. The overhead of the function call is the least of your worries, though; it is the smallest cost in the entire method. The unmanaged version is a nicely optimized SSE2 method that requires only a minimal number of loads and stores from main memory, and those loads and stores are reasonably cache friendly (the P4 will prefetch 128 bytes of memory).&lt;/p&gt;
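&lt;p&gt;The row-at-a-time scheme that matrix_mul uses can also be sketched without intrinsics. The following is a minimal scalar sketch, assuming row-major 4x4 matrices as in MatrixMul2; matrix_mul_reference is a hypothetical name, and the comments map each step to the intrinsic it stands in for. Note that col1 through col4 in matrix_mul each hold four consecutive floats of m2, which is one row in this layout. The sketch accumulates in a different order than the nested _mm_add_ps calls, so the last bits of a result may differ from the SSE version.&lt;/p&gt;
&lt;pre&gt;// Hypothetical scalar reference for the broadcast-and-accumulate scheme in matrix_mul.
// Assumes row-major 4x4 matrices and an out buffer that does not overlap m2.
void matrix_mul_reference(float const* m1, float const* m2, float* out) {
    for (int row = 0; row != 4; ++row) {
        float acc[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

        for (int k = 0; k != 4; ++k) {
            // _mm_shuffle_ps(rowN, rowN, _MM_SHUFFLE(k, k, k, k)):
            // broadcast element k of the current m1 row to all four lanes.
            float const broadcast = m1[row * 4 + k];

            // _mm_mul_ps followed by _mm_add_ps: scale the four floats
            // starting at m2 + 4 * k and add them to the running sum.
            for (int lane = 0; lane != 4; ++lane) {
                acc[lane] += broadcast * m2[k * 4 + lane];
            }
        }

        // _mm_storeu_ps(out + 4 * row, r): write one finished output row.
        for (int lane = 0; lane != 4; ++lane) {
            out[row * 4 + lane] = acc[lane];
        }
    }
}&lt;/pre&gt;
&lt;p&gt;Each output row costs four broadcasts, four multiplies and a handful of adds in the SSE version, which is why the whole method needs only eight loads and four stores.&lt;/p&gt;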
&lt;h4&gt;PInvoke&lt;/h4&gt;
&lt;p&gt;Of course, the question is how these perform against each other when called from a managed application. The profiling setup is quite simple: it runs the methods against a set of randomly generated matrices and vectors a million times. It repeats those tests several more times (100 in this case) and averages the results. Full optimizations were turned on for both the unmanaged and managed tests. The Internal calls are made from a managed class that calls directly into the unmanaged methods. Both the managed wrapper and the unmanaged methods are hosted in the same DLL (source for the full DLL at the end of this entry).&lt;/p&gt;
&lt;pre&gt;PInvoke MatrixMul : 00:00:15.0203285 Average: 00:00:00.1502032&lt;br /&gt;Delegate MatrixMul: 00:00:13.1004306 Average: 00:00:00.1310043&lt;br /&gt;Managed MatrixMul: 00:00:10.2809715 Average: 00:00:00.1028097&lt;br /&gt;Internal MatrixMul: 00:00:08.8992407 Average: 00:00:00.0889924&lt;br /&gt;PInvoke Inner Product: 00:00:10.6779944 Average: 00:00:00.1067799&lt;br /&gt;Delegate Inner Product: 00:00:09.3359882 Average: 00:00:00.0933598&lt;br /&gt;Managed Inner Product: 00:00:01.3460812 Average: 00:00:00.0134608&lt;br /&gt;Internal Inner Product: 00:00:05.6842336 Average: 00:00:00.0568423&lt;/pre&gt;
&lt;p&gt;The first thing to note is that the PInvoke calls for both the matrix multiplication and inner product were the slowest. The delegate calls were only slightly faster than the PInvoke calls. As we move into managed territory we find that the results begin to diverge. The managed matrix multiplication is slower than the internal matrix multiplication (roughly 0.103 versus 0.089 seconds per million calls); however, the managed inner product is several times faster than the internal one (roughly 0.013 versus 0.057 seconds).&lt;/p&gt;
&lt;p&gt;Part of the reason behind this divergence is the invocation framework. There is a cost to calling unmanaged methods from managed code, as each method must be wrapped to perform operations such as fixing any managed resources, performing marshalling for non-blittable types, and finally calling the actual native method. After the method returns, further marshalling of the return type may be required, along with checks on the condition of the stack and exception checks (SEH exceptions are caught and wrapped in the SEHException class). Even the internal calls to the unmanaged method require some amount of this, although the actual marshalling requirements are avoided, as are some of the other costs. The result is that the costs add up over time, and in the case of the inner product the additional cost outweighed the cost of the method body itself (which is fairly trivial). The case, on average, is different for the matrix multiplication. The additional costs of the call do not add a significant amount of overhead compared to the body of the method, which executes faster than the managed matrix multiplication due to vectorization.&lt;/p&gt;
&lt;p&gt;Further testing with counts of 50 and 25 reveals similar results, although the managed matrix multiplication begins to approach the performance of the internal one. Even at a count of 1 (that’s one million matrix multiplications), the internal matrix multiplication is still faster than the managed version.&lt;/p&gt;
&lt;pre&gt;Count = 50&lt;br /&gt;PInvoke MatrixMul : 00:00:07.4730356 Average: 00:00:00.1494607&lt;br /&gt;Delegate MatrixMul: 00:00:06.4519274 Average: 00:00:00.1290385&lt;br /&gt;Managed MatrixMul: 00:00:05.1662482 Average: 00:00:00.1033249&lt;br /&gt;Internal MatrixMul: 00:00:04.3371530 Average: 00:00:00.0867430&lt;br /&gt;PInvoke Inner Product: 00:00:05.3891030 Average: 00:00:00.1077820&lt;br /&gt;Delegate Inner Product: 00:00:04.7625597 Average: 00:00:00.0952511&lt;br /&gt;Managed Inner Product: 00:00:00.6791549 Average: 00:00:00.0135830&lt;br /&gt;Internal Inner Product: 00:00:02.6719175 Average: 00:00:00.0534383&lt;br /&gt;&lt;br /&gt;Count = 25&lt;br /&gt;PInvoke MatrixMul : 00:00:03.7432932 Average: 00:00:00.1497317&lt;br /&gt;Delegate MatrixMul: 00:00:03.2074834 Average: 00:00:00.1282993&lt;br /&gt;Managed MatrixMul: 00:00:02.6200096 Average: 00:00:00.1048003&lt;br /&gt;Internal MatrixMul: 00:00:02.2144342 Average: 00:00:00.0885773&lt;br /&gt;PInvoke Inner Product: 00:00:02.8778559 Average: 00:00:00.1151142&lt;br /&gt;Delegate Inner Product: 00:00:02.0178957 Average: 00:00:00.0807158&lt;br /&gt;Managed Inner Product: 00:00:00.3385675 Average: 00:00:00.0135427&lt;br /&gt;Internal Inner Product: 00:00:01.4391529 Average: 00:00:00.0575661&lt;br /&gt;&lt;br /&gt;Count = 5&lt;br /&gt;PInvoke MatrixMul : 00:00:00.7642981 Average: 00:00:00.1528596&lt;br /&gt;Delegate MatrixMul: 00:00:00.6407667 Average: 00:00:00.1281533&lt;br /&gt;Managed MatrixMul: 00:00:00.5231416 Average: 00:00:00.1046283&lt;br /&gt;Internal MatrixMul: 00:00:00.4458765 Average: 00:00:00.0891753&lt;br /&gt;PInvoke Inner Product: 00:00:00.5702666 Average: 00:00:00.1140533&lt;br /&gt;Delegate Inner Product: 00:00:00.4122217 Average: 00:00:00.0824443&lt;br /&gt;Managed Inner Product: 00:00:00.0683842 Average: 00:00:00.0136768&lt;br /&gt;Internal Inner Product: 00:00:00.2899304 Average: 00:00:00.0579860&lt;br /&gt;&lt;br /&gt;Count = 1&lt;br /&gt;PInvoke MatrixMul : 00:00:00.1476958 Average: 
00:00:00.1476958&lt;br /&gt;Delegate MatrixMul: 00:00:00.1337818 Average: 00:00:00.1337818&lt;br /&gt;Managed MatrixMul: 00:00:00.1155993 Average: 00:00:00.1155993&lt;br /&gt;Internal MatrixMul: 00:00:00.0919538 Average: 00:00:00.0919538&lt;br /&gt;PInvoke Inner Product: 00:00:00.1155769 Average: 00:00:00.1155769&lt;br /&gt;Delegate Inner Product: 00:00:00.0906768 Average: 00:00:00.0906768&lt;br /&gt;Managed Inner Product: 00:00:00.0155480 Average: 00:00:00.0155480&lt;br /&gt;Internal Inner Product: 00:00:00.0653527 Average: 00:00:00.0653527&lt;/pre&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Clearly we should reserve unmanaged operations for longer-running methods where the cost of the managed wrappers is negligible compared to the cost of the method. Even heavily optimized methods pay a significant cost in the wrapping code, and so trivial optimizations are easily overshadowed by that cost. It is best to use unmanaged operations wrapped in a C++/CLI wrapper (preferably with the wrapper in the same library as the operations). Next time we'll look at the assembly produced by the JIT for these methods under varying circumstances.&lt;/p&gt;
&lt;p&gt;Source for Managed DLL:&lt;/p&gt;
&lt;pre&gt;#pragma managed(push, off)
#include &amp;lt;intrin.h&amp;gt;

extern "C" {
	float __declspec(dllexport) inner_product(float const* v1, float const* v2) {
		float result;
		__m128 a = _mm_mul_ps(_mm_loadu_ps(v1), _mm_loadu_ps(v2));
		a = _mm_add_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 0, 3, 2)));
		_mm_store_ss(&amp;amp;result, _mm_add_ps(a, _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 1, 2, 3))));
		return result;
	}

void __declspec(dllexport) matrix_mul(float const* m1, float const* m2, float* out)
{
	__m128 r;

	__m128 col1 = _mm_loadu_ps(m2);
	__m128 col2 = _mm_loadu_ps(m2 + 4);
	__m128 col3 = _mm_loadu_ps(m2 + 8);
	__m128 col4 = _mm_loadu_ps(m2 + 12);

	__m128 row1 = _mm_loadu_ps(m1);
	
	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(0, 0, 0, 0)), col1),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(1, 1, 1, 1)), col2),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(2, 2, 2, 2)), col3),
		_mm_mul_ps(_mm_shuffle_ps(row1, row1, _MM_SHUFFLE(3, 3, 3, 3)), col4))));

	_mm_storeu_ps(out, r);  
	__m128 row2 = _mm_loadu_ps(m1 + 4);

	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(0, 0, 0, 0)), col1),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(1, 1, 1, 1)), col2),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(2, 2, 2, 2)), col3),
		_mm_mul_ps(_mm_shuffle_ps(row2, row2, _MM_SHUFFLE(3, 3, 3, 3)), col4))));

	_mm_storeu_ps(out + 4, r);  
	__m128 row3 = _mm_loadu_ps(m1 + 8);

	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row3, row3, _MM_SHUFFLE(0, 0, 0, 0)), col1),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row3, row3, _MM_SHUFFLE(1, 1, 1, 1)), col2),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row3, row3, _MM_SHUFFLE(2, 2, 2, 2)), col3),
		_mm_mul_ps(_mm_shuffle_ps(row3, row3, _MM_SHUFFLE(3, 3, 3, 3)), col4))));

	_mm_storeu_ps(out + 8, r);  
	__m128 row4 = _mm_loadu_ps(m1 + 12);

	r = _mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(0, 0, 0, 0)), col1),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(1, 1, 1, 1)), col2),
		_mm_add_ps(_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(2, 2, 2, 2)), col3),
		_mm_mul_ps(_mm_shuffle_ps(row4, row4, _MM_SHUFFLE(3, 3, 3, 3)), col4))));

	_mm_storeu_ps(out + 12, r);
}
}
#pragma managed(pop)

using namespace System;

namespace ManagedMathLib {
	public ref class ManagedMath {
	public:
		static IntPtr InnerProductPtr = IntPtr(inner_product);
		static IntPtr MatrixMulPtr = IntPtr(matrix_mul);

		static float InnerProduct(array&amp;lt;float&amp;gt;^ v1, array&amp;lt;float&amp;gt;^ v2) {
			pin_ptr&amp;lt;float&amp;gt; pv1 = &amp;amp;v1[0];
			pin_ptr&amp;lt;float&amp;gt; pv2 = &amp;amp;v2[0];

			return inner_product(pv1, pv2);
		}

		static void MatrixMul(array&amp;lt;float&amp;gt;^ m1, array&amp;lt;float&amp;gt;^ m2, array&amp;lt;float&amp;gt;^ out) {
			pin_ptr&amp;lt;float&amp;gt; pm1 = &amp;amp;m1[0];
			pin_ptr&amp;lt;float&amp;gt; pm2 = &amp;amp;m2[0];
			pin_ptr&amp;lt;float&amp;gt; outp = &amp;amp;out[0];
			matrix_mul(pm1, pm2, outp);
		}
	};
}&lt;/pre&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/04/28/Playing-with-the-.NET-JIT-Part-3.aspx</guid>
            <pubDate>Sun, 29 Apr 2007 01:21:31 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/11.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/04/28/Playing-with-the-.NET-JIT-Part-3.aspx#feedback</comments>
            <slash:comments>4</slash:comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/11.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/11.aspx</trackback:ping>
        </item>
        <item>
            <title>Playing With The .NET JIT Part 2</title>
            <link>http://scapecode.com/archive/2007/02/26/10.aspx</link>
            <description>&lt;p&gt;&lt;a href="http://scapecode.com/archive/2007/02/23/9.aspx"&gt;Previously&lt;/a&gt; I discussed various potential issues the x86 JIT had with inlining non-trivial methods and functions taking or returning value types. In this entry I hope to cover some potential pitfalls facing would-be optimizers, along with discussing some unexpected optimizations that do take place.&lt;/p&gt;
&lt;h3&gt;Optimizations That Aren’t&lt;/h3&gt;
&lt;p&gt;It is not uncommon to see people advocating the use of unsafe code as a means of producing “optimized” code in the managed environment. The idea is a simple one: by getting down to the metal with pointers and all that fun stuff, you can somehow produce code that will be “optimized” in ways that typical managed code cannot be.&lt;/p&gt;
&lt;p&gt;Unsafe code does not allow you to manipulate pointers to managed objects in whatever manner you please. Certain steps have to be taken to ensure that your operations are safe with regard to the managed heap. Just because your code is marked as “unsafe” doesn’t mean that it is free to do what it wants. For example, you cannot assign a pointer the address of a managed object without first pinning the object. Pointers to objects are not tracked by the GC, so should you obtain a pointer to an object and then attempt to use the pointer, you could end up accessing a now-collected region of memory. It can also happen that you obtain a pointer to an object, but when the GC runs your object is shuffled around on the heap. This shuffling would invalidate your pointer, and since pointers are not tracked by the GC it would not be updated (while references to objects are updated). Pinning objects solves this problem, which is why you are only allowed to take the address of an object that’s been pinned. In essence, a pinned object can be neither moved nor collected by the GC until it is unpinned. Pinning is typically done through the fixed keyword in C# or the GCHandle structure.&lt;/p&gt;
&lt;p&gt;Much like how a fixed object cannot be moved by the GC, a pointer to a fixed object cannot be reassigned. This makes it difficult to traverse primitive arrays, as you end up needing to create other temporary pointers, or limiting the size of the fixed area to a small segment. Fixed objects, and unsafe code, increase the overall size of the produced IL by a fairly significant margin. While an increase in the IL is not indicative of the size of the produced machine code, it does prevent the runtime from inlining such methods. As an example, the two following snippets reveal the difference between a safe Magnitude method and an unsafe one; note that the unsafe version uses a fixed-size buffer.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;public float&lt;/font&gt; Magnitude() {
    &lt;font color="#0000ff"&gt;return&lt;/font&gt; (&lt;font color="#0000ff"&gt;float&lt;/font&gt;)&lt;font color="#008080"&gt;Math&lt;/font&gt;.Sqrt(X * X + Y * Y + Z * Z);
}

&lt;font color="#000080"&gt;.method public hidebysig instance&lt;/font&gt; &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Magnitude() &lt;font color="#000080"&gt;cil managed&lt;/font&gt;
{
    .&lt;font color="#000080"&gt;maxstack&lt;/font&gt; 8
    L_0000: ldarg.0 
    L_0001: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::X
    L_0006: ldarg.0 
    L_0007: ldfld &lt;font color="#008080"&gt;&lt;font color="#0000ff"&gt;float32&lt;/font&gt; &lt;/font&gt;PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::X
    L_000c: mul 
    L_000d: ldarg.0 
    L_000e: ldfld &lt;font color="#008080"&gt;&lt;font color="#0000ff"&gt;float32&lt;/font&gt; &lt;/font&gt;PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Y
    L_0013: ldarg.0 
    L_0014: ldfld &lt;font color="#008080"&gt;&lt;font color="#0000ff"&gt;float32&lt;/font&gt; &lt;/font&gt;PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Y
    L_0019: mul 
    L_001a: add 
    L_001b: ldarg.0 
    L_001c: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Z
    L_0021: ldarg.0 
    L_0022: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Z
    L_0027: mul 
    L_0028: add 
    L_0029: conv.r8 
    L_002a: call &lt;font color="#0000ff"&gt;float64&lt;/font&gt; [mscorlib]System.&lt;font color="#008080"&gt;Math&lt;/font&gt;::Sqrt(&lt;font color="#0000ff"&gt;float64&lt;/font&gt;)
    L_002f: conv.r4 
    L_0030: ret 
}

&lt;font color="#0000ff"&gt;public float&lt;/font&gt; Magnitude() {
    &lt;font color="#0000ff"&gt;fixed&lt;/font&gt; (&lt;font color="#0000ff"&gt;float&lt;/font&gt;* p = V) {
       &lt;font color="#0000ff"&gt;return&lt;/font&gt; (&lt;font color="#0000ff"&gt;float&lt;/font&gt;)&lt;font color="#008080"&gt;Math&lt;/font&gt;.Sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
    }
}

&lt;font color="#000080"&gt;.method public hidebysig instance&lt;/font&gt; &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Magnitude() &lt;font color="#000080"&gt;cil managed&lt;/font&gt;
{
    .&lt;font color="#000080"&gt;maxstack&lt;/font&gt; 4
    .&lt;font color="#000080"&gt;locals init&lt;/font&gt; (
       [0] &lt;font color="#0000ff"&gt;float32&lt;/font&gt;&amp;amp; &lt;font color="#000080"&gt;pinned&lt;/font&gt; singleRef1,
       [1] &lt;font color="#0000ff"&gt;float32 &lt;/font&gt;single1)
    L_0000: ldarg.0 
    L_0001: ldflda PerformanceTests.Unsafe.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;/&lt;font color="#008080"&gt;&amp;lt;V&amp;gt;e__FixedBuffer0&lt;/font&gt; PerformanceTests.Unsafe.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::V
    L_0006: ldflda &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.Unsafe.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;/&lt;font color="#008080"&gt;&amp;lt;V&amp;gt;e__FixedBuffer0&lt;/font&gt;::FixedElementField
    L_000b: stloc.0 
    L_000c: ldloc.0 
    L_000d: conv.i 
    L_000e: ldind.r4 
    L_000f: ldloc.0 
    L_0010: conv.i 
    L_0011: ldind.r4 
    L_0012: mul 
    L_0013: ldloc.0 
    L_0014: conv.i 
    L_0015: ldc.i4.4 
    L_0016: conv.i 
    L_0017: add 
    L_0018: ldind.r4 
    L_0019: ldloc.0 
    L_001a: conv.i 
    L_001b: ldc.i4.4 
    L_001c: conv.i 
    L_001d: add 
    L_001e: ldind.r4 
    L_001f: mul 
    L_0020: add 
    L_0021: ldloc.0 
    L_0022: conv.i 
    L_0023: ldc.i4.8 
    L_0024: conv.i 
    L_0025: add 
    L_0026: ldind.r4 
    L_0027: ldloc.0 
    L_0028: conv.i 
    L_0029: ldc.i4.8 
    L_002a: conv.i 
    L_002b: add 
    L_002c: ldind.r4 
    L_002d: mul 
    L_002e: add 
    L_002f: conv.r8 
    L_0030: call &lt;font color="#0000ff"&gt;float64 &lt;/font&gt;[mscorlib]System.&lt;font color="#008080"&gt;Math&lt;/font&gt;::Sqrt(&lt;font color="#0000ff"&gt;float64&lt;/font&gt;)
    L_0035: conv.r4 
    L_0036: stloc.1 
    L_0037: leave.s L_0039
    L_0039: ldloc.1 
    L_003a: ret 
}&lt;/pre&gt;
&lt;p&gt;Note that neither of these two appear to be candidates for inlining, both being well over the 32 byte IL limit. The produced IL, while not directly indicative of the assembly produced by the JIT compiler, does tend to give an overall idea of how much larger we should expect this method to be when reproduced in machine code. Fixed length buffers have other issues that need addressing: You cannot access a fixed length buffer outside of a fixed statement. They are also an unsafe construct, and so you must indicate that the type is unsafe. Finally, they produce temporary types at compilation time that can throw off serialization and other reflection based mechanisms.&lt;/p&gt;
&lt;p&gt;In the end, unsafe code does not increase performance, and the reliance upon platform structures to ensure safety, such as the fixed construct, introduces more problems than it solves. Furthermore, even the smallest method that might be inlined tends to bloat up to the point where inlining by the JIT is no longer possible.&lt;/p&gt;
&lt;h3&gt;Surprising Developments and JIT Optimizations&lt;/h3&gt;
&lt;p&gt;Previously I noted that the JIT compiler can only inline a method that is a maximum of 32 bytes of IL in length. However, I wasn’t completely honest with you. In some cases the JIT compiler will inline chunks of code that are longer than 32 bytes of IL. I have not dug in-depth into the reasons for this, nor into when these conditions arise, so this information is presented as an informal experimental result. When a function returns the result of an intrinsic operation, the JIT may inline it even though it exceeds the limit. Two examples of this behavior are shown here. Note that in both cases the function used is an intrinsic math function, and that neither function is passed value types (which would prevent inlining). The first is the Magnitude function, which we saw above. Calling it produces the following inlined assembly.&lt;/p&gt;
&lt;pre&gt;00220164 D945D4         fld        dword ptr [ebp-2Ch]
00220167 D8C8           fmul       st,st(0)
00220169 D945D8         fld        dword ptr [ebp-28h]
0022016C D8C8           fmul       st,st(0)
0022016E DEC1           faddp      st(1),st
00220170 D945DC         fld        dword ptr [ebp-24h]
00220173 D8C8           fmul       st,st(0)
00220175 DEC1           faddp      st(1),st
00220177 DD5D9C         fstp       qword ptr [ebp-64h]
0022017A DD459C         fld        qword ptr [ebp-64h]
0022017D D9FA           fsqrt
&lt;/pre&gt;
&lt;p&gt;We note that this is the optimal form for the magnitude function, with a minimal number of memory reads, the majority of the work taking place on the FPU stack. Compared to the unsafe version, which is shown next, you can clearly see how much worse unsafe code is.&lt;/p&gt;
&lt;pre&gt;007A0438 55             push       ebp
007A0439 8BEC           mov        ebp,esp
007A043B 57             push       edi
007A043C 56             push       esi
007A043D 53             push       ebx
007A043E 83EC10         sub        esp,10h
007A0441 33C0           xor        eax,eax
007A0443 8945F0         mov        dword ptr [ebp-10h],eax
007A0446 894DF0         mov        dword ptr [ebp-10h],ecx
007A0449 D901           fld        dword ptr [ecx]
007A044B 8BF1           mov        esi,ecx
007A044D D80E           fmul       dword ptr [esi]
007A044F 8BF9           mov        edi,ecx
007A0451 D94704         fld        dword ptr [edi+4]
007A0454 8BD1           mov        edx,ecx
007A0456 D84A04         fmul       dword ptr [edx+4]
007A0459 DEC1           faddp      st(1),st
007A045B 8BC1           mov        eax,ecx
007A045D D94008         fld        dword ptr [eax+8]
007A0460 8BD8           mov        ebx,eax
007A0462 D84B08         fmul       dword ptr [ebx+8]
007A0465 DEC1           faddp      st(1),st
007A0467 DD5DE4         fstp       qword ptr [ebp-1Ch]
007A046A DD45E4         fld        qword ptr [ebp-1Ch]
007A046D D9FA           fsqrt
007A046F D95DEC         fstp       dword ptr [ebp-14h]
007A0472 D945EC         fld        dword ptr [ebp-14h]
007A0475 8D65F4         lea        esp,[ebp-0Ch]
007A0478 5B             pop        ebx
007A0479 5E             pop        esi
007A047A 5F             pop        edi
007A047B 5D             pop        ebp
007A047C C3             ret&lt;/pre&gt;
&lt;p&gt;Next up is a fairly ubiquitous utility function which obtains the angle between two unit length vectors. Note that acos is not directly producible as a single machine instruction; nonetheless, it is considered an intrinsic function. As we see below, this produces a nicely optimized set of instructions, with only a single call to a function (which computes acos).&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;public static float&lt;/font&gt; AngleBetween(&lt;font color="#0000ff"&gt;ref&lt;/font&gt; &lt;font color="#008080"&gt;Vector3&lt;/font&gt; lhs, &lt;font color="#0000ff"&gt;ref&lt;/font&gt; &lt;font color="#008080"&gt;Vector3&lt;/font&gt; rhs) {
    &lt;font color="#0000ff"&gt;return&lt;/font&gt; (&lt;font color="#0000ff"&gt;float&lt;/font&gt;)&lt;font color="#008080"&gt;Math&lt;/font&gt;.Acos(lhs.X * rhs.X + lhs.Y * rhs.Y + lhs.Z * rhs.Z);
}

&lt;font color="#000080"&gt;.method public hidebysig static&lt;/font&gt; &lt;font color="#0000ff"&gt;float32&lt;/font&gt; AngleBetween(PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;&amp;amp; lhs, PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;&amp;amp; rhs) &lt;font color="#000080"&gt;cil managed&lt;/font&gt;
{
     .&lt;font color="#000080"&gt;maxstack&lt;/font&gt; 8
     L_0000: ldarg.0 
     L_0001: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::X
     L_0006: ldarg.1 
     L_0007: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::X
     L_000c: mul 
     L_000d: ldarg.0 
     L_000e: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Y
     L_0013: ldarg.1 
     L_0014: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Y
     L_0019: mul 
     L_001a: add 
     L_001b: ldarg.0 
     L_001c: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Z
     L_0021: ldarg.1 
     L_0022: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; PerformanceTests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Z
     L_0027: mul 
     L_0028: add 
     L_0029: conv.r8 
     L_002a: call &lt;font color="#0000ff"&gt;float64&lt;/font&gt; [mscorlib]System.&lt;font color="#008080"&gt;Math&lt;/font&gt;::Acos(&lt;font color="#0000ff"&gt;float64&lt;/font&gt;)
     L_002f: conv.r4 
     L_0030: ret 
}

007A01D9 8D55D4          lea        edx,[ebp-2Ch]
007A01DC 8D4DC8          lea        ecx,[ebp-38h]
007A01DF D902            fld        dword ptr [edx]
007A01E1 D809            fmul       dword ptr [ecx]
007A01E3 D94204          fld        dword ptr [edx+4]
007A01E6 D84904          fmul       dword ptr [ecx+4]
007A01E9 DEC1            faddp      st(1),st
007A01EB D94208          fld        dword ptr [edx+8]
007A01EE D84908          fmul       dword ptr [ecx+8]
007A01F1 DEC1            faddp      st(1),st
007A01F3 83EC08          sub        esp,8
007A01F6 DD1C24          fstp       qword ptr [esp]
007A01F9 E868A5AF79      call       7A29A766 (System.&lt;font color="#008080"&gt;Math&lt;/font&gt;.Acos(&lt;font color="#0000ff"&gt;Double&lt;/font&gt;), mdToken: 06000b28)&lt;/pre&gt;
&lt;p&gt;Finally there is the issue of SIMD instruction sets. While the x86 JIT will not use SIMD instructions for general floating point math, it will utilize them for a few other operations. One common operation you see is the conversion of floating point numbers to integers. In .NET 2.0 the JIT will optimize this to use the SSE2 cvttsd2si instruction. For instance, the following snippet of code will result in the assembly dump that follows it.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;int&lt;/font&gt; n = (&lt;font color="#0000ff"&gt;int&lt;/font&gt;)r.NextDouble();

002A02FB 8BCB            mov        ecx,ebx
002A02FD 8B01            mov        eax,dword ptr [ecx]
002A02FF FF5048          call       dword ptr [eax+48h]
002A0302 DD5DA0          fstp       qword ptr [ebp-60h]
002A0305 F20F1045A0      movsd      xmm0,mmword ptr [ebp-60h]
002A030A F20F2CF0        cvttsd2si  esi,xmm0&lt;/pre&gt;
&lt;p&gt;While not quite as optimal as it could be if the JIT were using the full SSE2 instruction set, this minor optimization can go a long way.&lt;/p&gt;
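&lt;p&gt;For comparison with the unmanaged side, here is a minimal C++ sketch of the same conversion. The helper name truncate_to_int is hypothetical and not part of the test code. It only illustrates the rounding behavior: a C++ floating point to integer conversion discards the fractional part, which is the same truncation that cvttsd2si performs, and optimizing x64 C++ compilers typically emit that instruction for it.&lt;/p&gt;

```cpp
// Illustrative only: truncate_to_int is a hypothetical helper name.
// A C++ floating-point-to-integer conversion discards the fractional part
// (rounds toward zero), the same rounding that cvttsd2si performs.
int truncate_to_int(double value) {
    return int(value);
}
```

&lt;p&gt;Note that a value in the range [0, 1), such as the one returned by NextDouble, always truncates to zero.&lt;/p&gt;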
&lt;p&gt;So what is left to visit? Well, there's obviously the x64 platform, which is growing in popularity. The x64 platform presents new opportunities to explore, including certain guarantees and performance benefits that aren’t available on the x86 platform. Among them is a whole new set of optimizations and available instruction sets that the JIT can take advantage of. Finally there is the case of calling into unmanaged code for highly performance intensive operations: hand optimized SIMD code, and the potential performance benefits or hazards that calling an unmanaged function can incur.&lt;/p&gt;&lt;img src="http://scapecode.com/aggbug/10.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/02/26/10.aspx</guid>
            <pubDate>Tue, 27 Feb 2007 01:44:08 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/10.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/02/26/10.aspx#feedback</comments>
            <slash:comments>5</slash:comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/10.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/10.aspx</trackback:ping>
        </item>
        <item>
            <title>Playing With The .NET JIT</title>
            <link>http://scapecode.com/archive/2007/02/23/9.aspx</link>
            <description>&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;.NET has been getting some interesting press recently, even to the point where an &lt;a href="http://gamedeveloper.texterity.com/gamedeveloper/sample/?pg=41"&gt;article&lt;/a&gt; in Game Developer Magazine was published advocating the usage of managed code for rapid development of components. However, I did &lt;a href="http://cowboyprogramming.com/2007/01/05/blob-physics/trackback/"&gt;raise some issues&lt;/a&gt; with the author with regard to the performance metric he used. Thus I have decided to cover some issues with .NET performance, future benefits, and hopefully even a few solutions to some of the problems I'll be posing.&lt;/p&gt;
&lt;p&gt;Ultimately the performance of your application will be determined by the algorithms and data-structures you use. No amount of micro-optimization can hope to account for the huge performance differences that can crop up between different choices of algorithms. Thus the most important tool you can have in your arsenal is a decent profiler. Thankfully there are many good profilers available for the .NET platform. Some of the profiling tools are specific to certain areas of managed coding, such as the &lt;a href="http://www.microsoft.com/downloads/details.aspx?FamilyId=A362781C-3870-43BE-8926-862B40AA0CD0&amp;amp;displaylang=en"&gt;CLR Profiler&lt;/a&gt;, which is useful for profiling the allocation patterns of your managed application. Others, like &lt;a href="http://www.compuware.com/products/devpartner/"&gt;DevPartner&lt;/a&gt;, allow you to profile the entire application, identifying performance bottlenecks in both managed and unmanaged code. Finally there are the low level profiling tools, such as the SOS Debugging Tools. These tools give you extremely detailed information about the performance of your systems but are hard to use.&lt;/p&gt;
&lt;p&gt;Applications designed and built towards a managed platform tend to have different design decisions behind them than unmanaged applications. Even such fundamental things as memory allocation patterns are usually quite a bit different. With object lifetimes being non-deterministic, one has to apply different patterns to ensure the timely release of resources. Allocation patterns are also different, partly due to the inability to allocate objects on the stack, but also due to the ease of allocation on the managed heap. Allocating on an unmanaged heap typically requires a heap walk to find a block of free space that is at least the size of the block requested. The managed allocator typically allocates at the end of the heap, resulting in significantly faster allocation times (constant time, for the most part). These changes to the underlying assumptions that drive the system typically have large sweeping changes on the overall design of the systems.&lt;/p&gt;
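&lt;p&gt;To make the allocation cost difference concrete, here is a rough C++ sketch of the two strategies. It is a toy model under stated assumptions, not how the CLR or any particular C runtime is implemented, and every name in it (FreeBlock, first_fit, BumpArena, bump_allocate) is hypothetical. A real managed heap would also trigger a garbage collection rather than simply fail when the arena is full.&lt;/p&gt;

```cpp
// Illustrative only: two toy allocation strategies over a fixed-size arena.
// All names here (FreeBlock, first_fit, BumpArena, bump_allocate) are hypothetical.

// A free block in an unmanaged-style heap: an offset into the arena and a size.
struct FreeBlock {
    unsigned offset;
    unsigned size;
};

// First-fit: walk the free list until a block of at least `size` bytes is found.
// Returns the index of that block, or -1 if none fits.
// The cost grows with the length of the list.
int first_fit(const FreeBlock* blocks, int count, unsigned size) {
    for (int i = 0; i != count; ++i) {
        if (blocks[i].size >= size) {
            return i;
        }
    }
    return -1;
}

// Bump allocation: the next free byte is always at `top`, so an allocation is
// a bounds check plus an addition, no matter how much was allocated before.
struct BumpArena {
    unsigned capacity;
    unsigned top;
};

// Returns the offset of the new allocation, or -1 if the arena is full.
long bump_allocate(BumpArena* arena, unsigned size) {
    if (size > arena->capacity - arena->top) {
        return -1;
    }
    unsigned offset = arena->top;
    arena->top += size;
    return offset;
}
```

&lt;p&gt;The first function has to inspect blocks until one fits, while the second never looks at anything but the current end of the arena, which is where the (mostly) constant time behavior comes from.&lt;/p&gt;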
&lt;h3&gt;Future Developments&lt;/h3&gt;
&lt;p&gt;Theoretically a JIT compiler can outperform a standard compiler simply because it can target the platform in ways that traditional compilation cannot. Traditionally, to target different instruction sets, you would have to compile a binary for each instruction set. For instance, targeting SSE2 would require you to build a separate binary from that of your non-SSE2 branch. You could, of course, do this through the use of DLLs, or by custom writing your SSE2 code and using function pointers to dictate which branch to choose.&lt;/p&gt;
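&lt;p&gt;The function pointer approach can be sketched in C++ roughly as follows. This is a minimal illustration, not code from the test suite: the names dot3_scalar, dot3_sse2_stand_in and select_dot3 are hypothetical, and the SSE2 routine is only a placeholder that forwards to the scalar version so the sketch stays portable. The capability flag is assumed to come from elsewhere, for example a CPUID query at startup.&lt;/p&gt;

```cpp
// Illustrative only: pick an implementation once, then call through a pointer.
// dot3_scalar, dot3_sse2_stand_in and select_dot3 are hypothetical names.

using Dot3Fn = float (*)(const float*, const float*);

// Portable baseline implementation.
float dot3_scalar(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Stand-in for a hand-written SSE2 routine. A real one would use intrinsics or
// assembly; this placeholder simply forwards so the sketch stays portable.
float dot3_sse2_stand_in(const float* a, const float* b) {
    return dot3_scalar(a, b);
}

// Choose the branch from a capability flag that the caller obtained elsewhere.
Dot3Fn select_dot3(bool has_sse2) {
    return has_sse2 ? dot3_sse2_stand_in : dot3_scalar;
}
```

&lt;p&gt;The selection happens once, and every later call pays only for an indirect call. The cost is that the compiler generally cannot inline through the pointer, which is exactly the kind of decision a JIT could make for you at run time.&lt;/p&gt;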
&lt;p&gt;Hand written SIMD code is often faster than compiler generated SIMD, due to the ability to manually vectorize the data, thus enabling true SIMD to take place. Some compilers, like the &lt;a href="http://www.intel.com/cd/software/products/asmo-na/eng/compilers/279578.htm"&gt;Intel C++ Compiler&lt;/a&gt;, can perform automatic vectorization. However, such a compiler is unable to guarantee the accuracy of the resulting binary, and extensive testing typically has to be done in order to ensure that the functionality was correctly generated. While most compilers have the option to target SIMD instruction sets, they usually use it to replace standard floating point operations where they can, as the scalar (single element) SIMD instructions are generally faster than their FPU counterparts.&lt;/p&gt;
&lt;p&gt;The JIT compiler could target any SIMD instruction set supported by its platform, along with any other hardware specific optimizations it knew about. While automatic vectorization is not likely to be in a JIT release anytime soon, even using the non-vectorized SIMD instruction sets can help to parallelize your processing. As an example, multiple independent SIMD operations can typically run in parallel (that is, an add and a multiplication could both run simultaneously). Furthermore, the JIT can allow any .NET application to target any system it supports, provided the libraries it uses are also available on that system. This means that, provided you aren't doing anything highly non-portable, such as assuming that a pointer is 32 bits wide, your application could be JIT compiled to target a 64 bit platform and run natively that way.&lt;/p&gt;
&lt;p&gt;Another area of potential advancement includes the realm of Profile Guided Optimization. Currently POGO is restricted to the arena of unmanaged applications, as it requires the ability to generate raw machine code and to perform instruction reordering. In essence you instrument an application with a POGO profiler; then you use the application normally to allow the profiler to collect usage data and to find the hotspots. Finally you run the optimizer on the solution, which will rebuild the solution, using the profiling data it gathered to optimize the heavily utilized sections of your application. A JIT compiler could instrument a managed program on first launch and watch its usage, while in another thread it could be optimizing the machine code using the profiling data that it gathers. The resulting cached binary image would be optimized on the next launch (excepting those areas that had not been accessed, and thus the JIT hadn't compiled yet). This would be especially effective on systems with multiple cores.&lt;/p&gt;
&lt;h3&gt;JIT Compilation for the x86&lt;/h3&gt;
&lt;p&gt;The JIT compiler for the x86 platform, as of .NET 2.0, does not support SIMD instruction sets. It will generate occasional MMX or SSE instructions for some integral and floating point promotions, but otherwise it will not utilize SIMD instruction sets. Inlining poses its own problems for the JIT compiler. Currently the JIT compiler will only inline functions that are 32 bytes of IL or smaller. Because the JIT compiler runs under an extremely tight time constraint, it is forced to make sacrifices in the optimizations it can make. Inlining is typically an expensive operation because it requires shuffling around the addresses of everything that comes after the inlined code (which requires interpreting the IL, then determining whether its address is before or after the inlined code, then making the appropriate adjustments…). Because of this, all but the smallest of methods will not be inlined. Here’s a sample of a method that will not be inlined, and the IL that accompanies it:&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;public float&lt;/font&gt; SquareMagnitude() {
    &lt;font color="#0000ff"&gt;return&lt;/font&gt; X * X + Y * Y + Z * Z;
}

&lt;font color="#000080"&gt;.method public hidebysig instance&lt;/font&gt; &lt;font color="#0000ff"&gt;float32&lt;/font&gt; SquareMagnitude() &lt;font color="#000080"&gt;cil managed&lt;/font&gt;
{
    &lt;font color="#003366"&gt;.maxstack&lt;/font&gt; 8
    L_0000: ldarg.0 
    L_0001: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Performance_Tests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::X
    L_0006: ldarg.0 
    L_0007: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Performance_Tests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::X
    L_000c: mul 
    L_000d: ldarg.0 
    L_000e: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Performance_Tests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Y
    L_0013: ldarg.0 
    L_0014: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Performance_Tests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Y
    L_0019: mul 
    L_001a: add 
    L_001b: ldarg.0 
    L_001c: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Performance_Tests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Z
    L_0021: ldarg.0 
    L_0022: ldfld &lt;font color="#0000ff"&gt;float32&lt;/font&gt; Performance_Tests.&lt;font color="#008080"&gt;Vector3&lt;/font&gt;::Z
    L_0027: mul 
    L_0028: add 
    L_0029: ret 
}&lt;/pre&gt;
&lt;p&gt;This method, as you can tell, is 42 bytes long, counting the return instruction. Clearly this is over the 32 byte IL limit. However, the resulting machine code is only 25 bytes long:&lt;/p&gt;
&lt;pre&gt;002802C0 D901             fld         dword ptr [ecx]
002802C2 D9C0             fld         st(0)
002802C4 DEC9             fmulp       st(1),st
002802C6 D94104           fld         dword ptr [ecx+4]
002802C9 D9C0             fld         st(0)
002802CB DEC9             fmulp       st(1),st
002802CD DEC1             faddp       st(1),st
002802CF D94108           fld         dword ptr [ecx+8]
002802D2 D9C0             fld         st(0)
002802D4 DEC9             fmulp       st(1),st
002802D6 DEC1             faddp       st(1),st
002802D8 C3               ret&lt;/pre&gt;
&lt;p&gt;Methods that use this one, however, like the Magnitude method, may themselves be candidates for inlining, in which case they typically reduce to a call to the SquareMagnitude method followed by an fsqrt instruction.&lt;/p&gt;
&lt;p&gt;Another area where the JIT has issues deals with value-types and inlining. Methods that take value-type parameters are not currently considered for inlining. There is a fix in the pipeline for this, as it is considered a bug. This behavior can be seen in the following function, which, although far below the 32 bytes of IL limit, will not be inlined.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;static float &lt;/font&gt;WillNotInline32(&lt;font color="#0000ff"&gt;float&lt;/font&gt; f) {
    &lt;font color="#0000ff"&gt;return&lt;/font&gt; f * f;
}

&lt;font color="#003366"&gt;.method private hidebysig static&lt;/font&gt; &lt;font color="#0000ff"&gt;float32&lt;/font&gt; WillNotInline32(&lt;font color="#0000ff"&gt;float32&lt;/font&gt; f) &lt;font color="#000080"&gt;cil managed&lt;/font&gt;
{
    &lt;font color="#000080"&gt;.maxstack&lt;/font&gt;&lt;font color="#000000"&gt; 8&lt;/font&gt;
    L_0000: ldarg.0 
    L_0001: ldarg.0 
    L_0002: mul 
    L_0003: ret 
}&lt;/pre&gt;
&lt;p&gt;The resulting call to this function and the assembly code of the function look as follows:&lt;/p&gt;
&lt;pre&gt;0087008F FF75F4           push        dword ptr [ebp-0Ch]
00870092 FF154C302A00     call        dword ptr ds:[002A304Ch]
----
003F01F8 D9442404         fld         dword ptr [esp+4]
003F01FC DCC8             fmul        st(0),st
003F01FE C20400           ret         4&lt;/pre&gt;
&lt;p&gt;Clearly the x86 JIT requires a lot more work before it will be able to produce machine code approaching that of a good optimizing compiler. However, the news isn’t all grim. Interop between .NET and unmanaged code allows you to write those methods that need to be highly optimized in a lower level language.&lt;/p&gt;&lt;img src="http://scapecode.com/aggbug/9.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/02/23/9.aspx</guid>
            <pubDate>Sat, 24 Feb 2007 02:55:08 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/9.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/02/23/9.aspx#feedback</comments>
            <slash:comments>1</slash:comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/9.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/9.aspx</trackback:ping>
        </item>
        <item>
            <title>Playing With Template Meta-Programming Part 2</title>
            <link>http://scapecode.com/archive/2007/02/09/Playing-With-Template-MetaProgramming-Part-2.aspx</link>
            <description>&lt;p&gt;Previously I covered the topic of SFINAE as a means of using compile-time information provided by the C++ language’s overload resolution mechanism to make decisions at compile-time. In the context of TR1, specifically the &lt;font face="Courier New" color="#008080"&gt;reference_wrapper&lt;/font&gt; class, we can use SFINAE as a means of determining various properties that the reference_wrapper class is required to provide depending on the type being wrapped.&lt;/p&gt;
&lt;p&gt;&lt;font face="Courier New" color="#008080"&gt;reference_wrapper&lt;/font&gt; is a parameterized wrapper around a reference to a type T. It is a copy constructible and assignable class, thus enabling it to be changed even after creation, unlike standard references. &lt;font face="Courier New" color="#008080"&gt;reference_wrapper&lt;/font&gt; is also defined as inheriting from &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;u&lt;/font&gt;&lt;font color="#008080"&gt;nary_function&lt;/font&gt;&lt;/font&gt; or &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;binary_function&lt;/font&gt;&lt;/font&gt;, depending on if particular conditions are met (which will be detailed below). It also has what is known as a weak &lt;font face="Courier New" color="#008080"&gt;result_type&lt;/font&gt;; ergo the &lt;font face="Courier New" color="#008080"&gt;result_type&lt;/font&gt; is defined only if the type &lt;font face="Courier New" color="#008080"&gt;T&lt;/font&gt; is a function, reference to a function, pointer to a function type, member function pointer, or a class type with a &lt;font face="Courier New" color="#008080"&gt;result_type&lt;/font&gt; type member. In all other cases, &lt;font face="Courier New" color="#008080"&gt;result_type&lt;/font&gt; will not be defined.&lt;/p&gt;
&lt;h3&gt;Conditional Inheritance&lt;/h3&gt;
&lt;p&gt;The &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;reference_wrapper&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;&lt;/font&gt;&lt;t&gt;&lt;/t&gt; class is defined as inheriting from &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&lt;/font&gt; if the following conditions are met:&lt;/p&gt;
&lt;ul&gt;
    &lt;li&gt;If the type &lt;font face="Courier New" color="#008080"&gt;T&lt;/font&gt; is a function pointer or a function type that takes only one argument, hereafter called &lt;font face="Courier New" color="#008080"&gt;T1&lt;/font&gt;, and returning a result, hereafter called &lt;font face="Courier New" color="#008080"&gt;R&lt;/font&gt;. &lt;/li&gt;
    &lt;li&gt;If the type &lt;font face="Courier New" color="#008080"&gt;T&lt;/font&gt; is a pointer to a possibly cv-qualified member function taking no arguments and returning a result &lt;font face="Courier New" color="#008080"&gt;R&lt;/font&gt;, in which case &lt;font face="Courier New" color="#008080"&gt;T1&lt;/font&gt; is defined as &lt;font face="Courier New"&gt;&lt;em&gt;cv&lt;/em&gt; &lt;font color="#008080"&gt;Class&lt;/font&gt;*&lt;/font&gt;, where &lt;font face="Courier New" color="#008080"&gt;Class&lt;/font&gt; is the class that declares the member function. &lt;/li&gt;
    &lt;li&gt;If the type T is derived from &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;T1&lt;/font&gt;, &lt;font color="#008080"&gt;R&lt;/font&gt;&amp;gt;&lt;/font&gt;. &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Obviously the first two of these conditions are fairly easy to detect using template specialization. The last one, however, requires a bit more work, since we need to detect whether &lt;font face="Courier New" color="#008080"&gt;T&lt;/font&gt; is derived from a templated class. For our purposes we’ll define a template-structure named &lt;font face="Courier New" color="#008080"&gt;is_unary_function&lt;/font&gt; which will be used as our base class for determining if we should inherit from &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&lt;/font&gt;.&lt;/p&gt;
&lt;p&gt;Clearly the first step should be to detect the simplest cases first, that of the unary function type and unary function pointer type, so let us declare our template class type, and specialize it for the two function cases.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; T&amp;gt; &lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt;;&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Arg&lt;/font&gt;, &lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Result &lt;/font&gt;(&lt;font color="#008080"&gt;Arg&lt;/font&gt;)&amp;gt; : &lt;font color="#008080"&gt;true_type&lt;/font&gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef&lt;/font&gt; std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Arg&lt;/font&gt;, &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt; &lt;font color="#008080"&gt;unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Arg&lt;/font&gt;, &lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Result &lt;/font&gt;(*)(&lt;font color="#008080"&gt;Arg&lt;/font&gt;)&amp;gt; : &lt;font color="#008080"&gt;true_type&lt;/font&gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef&lt;/font&gt; std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Arg&lt;/font&gt;, &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt; &lt;font color="#008080"&gt;unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;/pre&gt;
&lt;p&gt;The next step should be to specialize the &lt;font face="Courier New" color="#008080"&gt;is_unary_function&lt;/font&gt; template for pointer to member function types. However, we must also specialize it for the various cv-qualified member function pointer types that exist. There are four such types, non-cv-qualified, &lt;font face="Courier New" color="#0000ff"&gt;const&lt;/font&gt; qualified, &lt;font face="Courier New" color="#0000ff"&gt;volatile&lt;/font&gt; qualified, and &lt;font face="Courier New" color="#0000ff"&gt;const volatile &lt;/font&gt;qualified.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Class&lt;/font&gt;, &lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Result &lt;/font&gt;(&lt;font color="#008080"&gt;Class&lt;/font&gt;::*)()&amp;gt; : &lt;font color="#008080"&gt;true_type&lt;/font&gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef&lt;/font&gt; std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Class&lt;/font&gt;*, &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt; &lt;font color="#008080"&gt;unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Class&lt;/font&gt;, &lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Result &lt;/font&gt;(&lt;font color="#008080"&gt;Class&lt;/font&gt;::* &lt;font color="#0000ff"&gt;const&lt;/font&gt;)()&amp;gt; : &lt;font color="#008080"&gt;true_type&lt;/font&gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef&lt;/font&gt; std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Class&lt;/font&gt; &lt;font color="#0000ff"&gt;const&lt;/font&gt;*, &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt; &lt;font color="#008080"&gt;unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Class&lt;/font&gt;, 
&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Result &lt;/font&gt;(&lt;font color="#008080"&gt;Class&lt;/font&gt;::* &lt;font color="#0000ff"&gt;volatile&lt;/font&gt;)()&amp;gt; : &lt;font color="#008080"&gt;true_type&lt;/font&gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef&lt;/font&gt; std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Class&lt;/font&gt; &lt;font color="#0000ff"&gt;volatile&lt;/font&gt;*, &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt; &lt;font color="#008080"&gt;unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Class&lt;/font&gt;, &lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Result &lt;/font&gt;(&lt;font color="#008080"&gt;Class&lt;/font&gt;::* &lt;font color="#0000ff"&gt;const volatile&lt;/font&gt;)()&amp;gt; : &lt;font color="#008080"&gt;true_type&lt;/font&gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef&lt;/font&gt; std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Class &lt;/font&gt;&lt;font color="#0000ff"&gt;const volatile&lt;/font&gt;*, &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt; &lt;font color="#008080"&gt;unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;/pre&gt;
&lt;p&gt;The final step is to identify types that inherit from &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;T1&lt;/font&gt;, &lt;font color="#008080"&gt;R&lt;/font&gt;&amp;gt;&lt;/font&gt;. For this purpose, we will use SFINAE, but we'll also require another template to correctly build the &lt;font face="Courier New" color="#008080"&gt;unary_function_type&lt;/font&gt; &lt;font face="Courier New" color="#0000ff"&gt;typedef&lt;/font&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;has_unary_base&lt;/font&gt; {&lt;br /&gt;&lt;font color="#0000ff"&gt;private&lt;/font&gt;:&lt;br /&gt;    &lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Arg&lt;/font&gt;, &lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;&lt;br /&gt;    &lt;font color="#0000ff"&gt;static&lt;/font&gt; &lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;one&lt;/font&gt; test(std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;Arg&lt;/font&gt;, &lt;font color="#008080"&gt;Result&lt;/font&gt;&amp;gt;*);&lt;br /&gt;    &lt;font color="#0000ff"&gt;static&lt;/font&gt; &lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;two&lt;/font&gt; test(...);&lt;br /&gt;&lt;font color="#0000ff"&gt;public&lt;/font&gt;:&lt;br /&gt;    &lt;font color="#0000ff"&gt;static const bool&lt;/font&gt; value = &lt;font color="#0000ff"&gt;sizeof&lt;/font&gt;(test((&lt;font color="#008080"&gt;T&lt;/font&gt;*)&lt;font color="#800000"&gt;0&lt;/font&gt;)) == &lt;font color="#0000ff"&gt;sizeof&lt;/font&gt;(&lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;one&lt;/font&gt;);&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;, &lt;font color="#0000ff"&gt;bool&lt;/font&gt; B&amp;gt; &lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;unary_base_typedef&lt;/font&gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef&lt;/font&gt; &lt;font 
color="#008080"&gt;empty unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt; &lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;unary_base_typedef&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;T&lt;/font&gt;, &lt;font color="#0000ff"&gt;true&lt;/font&gt;&amp;gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef &lt;font color="#000000"&gt;std::&lt;/font&gt;&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&lt;font color="#000000"&gt;&amp;lt;&lt;br /&gt;        &lt;/font&gt;typename &lt;font color="#008080"&gt;T&lt;/font&gt;&lt;font color="#000000"&gt;::&lt;/font&gt;&lt;font color="#008080"&gt;argument_type&lt;/font&gt;,&lt;br /&gt;        typename &lt;font color="#008080"&gt;T&lt;/font&gt;&lt;font color="#000000"&gt;::&lt;/font&gt;&lt;/font&gt;&lt;font color="#008080"&gt;result_type&lt;/font&gt;&lt;font color="#000000"&gt;&amp;gt;&lt;/font&gt; &lt;font color="#008080"&gt;unary_function_type&lt;/font&gt;;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;&lt;br /&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;is_unary_function&lt;/font&gt; : &lt;font color="#008080"&gt;integral_type&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;bool&lt;/font&gt;, &lt;font color="#008080"&gt;has_unary_base&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;::value&amp;gt; {&lt;br /&gt;    &lt;font color="#0000ff"&gt;typedef typename&lt;/font&gt; &lt;font color="#008080"&gt;unary_base_typedef&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;T&lt;/font&gt;, &lt;font color="#008080"&gt;has_unary_base&lt;/font&gt;&amp;lt;&lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;::value&amp;gt;::&lt;font color="#008080"&gt;unary_function_type &lt;font color="#008080"&gt;unary_function_type;&lt;/font&gt;&lt;/font&gt;&lt;br 
/&gt;};&lt;/pre&gt;
&lt;p&gt;This template serves a dual purpose: it determines whether a type is a unary function type, and if so it also provides the appropriate typedef for inheriting from &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;unary_function&lt;/font&gt;&lt;/font&gt;. A similar method works for the &lt;font face="Courier New"&gt;std::&lt;font color="#008080"&gt;binary_function&lt;/font&gt;&lt;/font&gt; type, whose requirements are similar to those of the unary function type.&lt;/p&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/02/09/Playing-With-Template-MetaProgramming-Part-2.aspx</guid>
            <pubDate>Fri, 09 Feb 2007 22:45:06 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/8.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/02/09/Playing-With-Template-MetaProgramming-Part-2.aspx#feedback</comments>
            <slash:comments>2</slash:comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/8.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/8.aspx</trackback:ping>
        </item>
        <item>
            <title>Playing With Template Meta-Programming Part 1</title>
            <link>http://scapecode.com/archive/2007/02/02/Playing-With-Template-MetaProgramming-Part-1.aspx</link>
            <description>&lt;p&gt;I've recently started playing around with a mock implementation of the &lt;a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf"&gt;Technical Report on C++ Library Extensions&lt;/a&gt; (draft link). First off, I should note that it's very difficult to implement these libraries properly. The standard makes many changes to the adopted boost libraries, both by enhancing functionality and by clarifying their properties and behaviors. One of the really interesting parts of implementing this standard is detecting the various conditions that need to be met by the &lt;font face="Courier New" color="#008080"&gt;reference_wrapper&lt;/font&gt; class. I will be discussing those conditions, and the methods that I have used to implement them. Mind you, implementing the TR1 standard is not an easy thing, and it requires a great deal of work on the part of compiler vendors to provide methods to determine type classifications at compile time, along with various other pieces of data that aren't typically available to template meta-programmers. &lt;/p&gt;
&lt;p&gt;Implementing, or at least attempting to implement, this standard will teach one who is not that familiar with template meta-programming techniques a great deal. I will attempt to cover as many of those points as I feel are relevant to the code I will be presenting, but omissions may be made either through accident, or because I consider them trivial. &lt;/p&gt;
&lt;p&gt;Meta-programming in C++ can be both frustrating and quite fun. The fun is all in the challenge of writing code that is executed not at runtime, but at compile time. Now, when one hears of “compile time execution” one tends to immediately jump to the conclusion that we are speaking of macros. Macros, however, are not executed at all but involve textual substitution. Fancy tricks can be performed using that textual substitution, as demonstrated by the &lt;font face="Courier New"&gt;boost::&lt;font color="#008080"&gt;preprocessor&lt;/font&gt;&lt;/font&gt; library. Compile time execution involves the evaluation of templates that use constants and the ability of compilers to evaluate mathematical and logical expressions involving constants. Templates are in fact &lt;a href="http://en.wikipedia.org/wiki/Turing_complete"&gt;Turing complete&lt;/a&gt; (within the compiler's instantiation limits), and the power of template meta-programming is immense. Unfortunately, this art is understood by only a few, and is viewed by many as simply a means of obfuscating code beyond the ability of others to understand. While bad template coding can easily result in this, good template coding should be simple enough to read and understand, at least at a relatively high level. Knowledge of the inner workings is not a prerequisite to understanding the concept and application of template meta-programming. This is where TR1 comes in. &lt;/p&gt;
&lt;h4&gt;Substitution Failure Is Not An Error (SFINAE) &lt;/h4&gt;
&lt;p&gt;Given a piece of code that calls an overloaded function, the compiler needs to select the best possible match based on a small set of criteria. These include the number of parameters, how well the arguments match the types of the parameters, how well the object matches the implied object parameter (for nonstatic member functions), and certain other properties of the candidate function. A set of candidate functions is initially built based upon the context of the call. This set is then narrowed to the viable functions: those that can accept the number of arguments supplied and that meet certain other conditions. Finally, the best viable function is selected based upon the implicit conversion sequences needed to match each argument to the corresponding parameter of each viable function. &lt;/p&gt;
&lt;p&gt;How can we use this? Quite simply: when substituting template arguments into a function template's signature produces an invalid type, that overload is silently dropped from the candidate set rather than causing an error. As long as a viable overload remains, we can therefore declare functions with signatures that would be impossible for a particular type. To give a simple example... &lt;/p&gt;
&lt;p&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;&lt;br /&gt;
&lt;font color="#0000ff"&gt;void&lt;/font&gt; f(&lt;font color="#008080"&gt;T&lt;/font&gt;*);&lt;br /&gt;
&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;&lt;br /&gt;
&lt;/font&gt;&lt;font color="#0000ff"&gt;&lt;font face="Courier New"&gt;void&lt;/font&gt;&lt;/font&gt;&lt;font face="Courier New"&gt; f(&lt;font color="#008080"&gt;T&lt;/font&gt;);&lt;br /&gt;
&lt;br /&gt;
&lt;/font&gt;&lt;font face="Courier New"&gt;f(&lt;font color="#800000"&gt;1&lt;/font&gt;);&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;The first overload cannot be used here, as 1 is not a pointer and so &lt;font face="Courier New" color="#008080"&gt;T&lt;/font&gt; cannot be deduced for it. This is not an error, however: since another viable overload is available, it is used instead. Had the argument been a pointer rather than an integral constant, the pointer overload would have been chosen, as it is the more specialized candidate. Due to this property we are able to use SFINAE to detect when certain conditions are present within our template code. &lt;/p&gt;
&lt;p&gt;It is useful to define a method whereby we can easily determine which path the compiler chose to take when selecting overloads. For this purpose we define a class named &lt;font face="Courier New" color="#008080"&gt;sfinae_types&lt;/font&gt; as follows: &lt;/p&gt;
&lt;p&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;sfinae_types&lt;/font&gt; {&lt;br /&gt;
    &lt;font color="#0000ff"&gt;typedef char&lt;/font&gt; &lt;font color="#008080"&gt;one&lt;/font&gt;;&lt;br /&gt;
    &lt;font color="#0000ff"&gt;typedef struct&lt;/font&gt; { &lt;font color="#0000ff"&gt;char&lt;/font&gt; t[&lt;font color="#800000"&gt;2&lt;/font&gt;]; } &lt;font color="#008080"&gt;two&lt;/font&gt;;&lt;br /&gt;
};&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;From this we can declare function overloads that use these return types. A simple &lt;font face="Courier New" color="#0000ff"&gt;sizeof &lt;/font&gt;check will determine which one was chosen, and thus we will know down which path the compiler has traveled. To give a more concrete example, here’s one way to determine if a type has a member type named &lt;font face="Courier New" color="#008080"&gt;argument_type&lt;/font&gt;. &lt;br /&gt;
&lt;br /&gt;
&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;&lt;br /&gt;
&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;has_argument_type&lt;/font&gt; {&lt;br /&gt;
&lt;font color="#0000ff"&gt;private&lt;/font&gt;:&lt;br /&gt;
    &lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;U&lt;/font&gt;&amp;gt; &lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;wrap&lt;/font&gt; { };&lt;br /&gt;
&lt;br /&gt;
    &lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;U&lt;/font&gt;&amp;gt;&lt;br /&gt;
    &lt;font color="#0000ff"&gt;static&lt;/font&gt; &lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;one&lt;/font&gt; test(&lt;font color="#008080"&gt;wrap&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;typename&lt;/font&gt; &lt;font color="#008080"&gt;U&lt;/font&gt;::&lt;font color="#008080"&gt;argument_type&lt;/font&gt;&amp;gt;*);&lt;br /&gt;
&lt;br /&gt;
    &lt;font color="#0000ff"&gt;template&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;class&lt;/font&gt; &lt;font color="#008080"&gt;U&lt;/font&gt;&amp;gt;&lt;br /&gt;
    &lt;font color="#0000ff"&gt;static&lt;/font&gt; &lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;two&lt;/font&gt; test(...);&lt;br /&gt;
&lt;font color="#0000ff"&gt;public&lt;/font&gt;:&lt;br /&gt;
    &lt;font color="#0000ff"&gt;static const bool&lt;/font&gt; value = &lt;font color="#0000ff"&gt;sizeof&lt;/font&gt;(test&amp;lt;&lt;font color="#008080"&gt;T&lt;/font&gt;&amp;gt;(&lt;font color="#800000"&gt;0&lt;/font&gt;)) == &lt;font color="#0000ff"&gt;sizeof&lt;/font&gt;(&lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;one&lt;/font&gt;);&lt;br /&gt;
};&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;It is important to remember that &lt;font face="Courier New" color="#0000ff"&gt;sizeof&lt;/font&gt; yields the size of the type of an expression, but it does not evaluate the expression. Thus things like function calls are never made, and the declaration is sufficient for the size of the result to be determined. In the case of a function call, it is the size of the return type, which for our purposes is either &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;one&lt;/font&gt;&lt;/font&gt; or &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;sfinae_types&lt;/font&gt;::&lt;font color="#008080"&gt;two&lt;/font&gt;&lt;/font&gt;. We can then use the size of the return type to determine down which path our compiler has traveled, and use the resulting boolean value to perform branching at compile time. It is important to note that a match against the ellipsis parameter ranks lowest among all implicit conversion sequences, so any other viable overload will be chosen in preference to it. In this case, if &lt;font face="Courier New" color="#008080"&gt;T&lt;/font&gt; has an &lt;font face="Courier New" color="#008080"&gt;argument_type&lt;/font&gt; typedef (or nested class), then the &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;wrap&lt;/font&gt;&amp;lt;&lt;font color="#0000ff"&gt;typename&lt;/font&gt; &lt;font color="#008080"&gt;T&lt;/font&gt;::&lt;font color="#008080"&gt;argument_type&lt;/font&gt;&amp;gt;*&lt;/font&gt; overload will be chosen instead. &lt;br /&gt;
&lt;br /&gt;
Anyways, that's where I'll finish up this time. &lt;/p&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/02/02/Playing-With-Template-MetaProgramming-Part-1.aspx</guid>
            <pubDate>Fri, 02 Feb 2007 22:34:41 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/7.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/02/02/Playing-With-Template-MetaProgramming-Part-1.aspx#feedback</comments>
            <slash:comments>2</slash:comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/7.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/7.aspx</trackback:ping>
        </item>
        <item>
            <title>The Third C++ Quiz - Aftermath</title>
            <link>http://scapecode.com/archive/2007/02/02/The-Third-C-Quiz--Aftermath.aspx</link>
            <description>&lt;p&gt;Quiz answers:&lt;/p&gt;
&lt;p&gt;During the construction of a const object, if the value of the object or any of its subobjects is accessed through an lvalue that is not obtained, either directly or indirectly, from the constructor’s this pointer, the value of the object or subobject thus obtained is unspecified. As such the answers to questions 1, 2 and 3 are: &lt;br /&gt;
1. &lt;font face="Courier New"&gt;i&lt;/font&gt; will contain an unspecified value as &lt;font face="Courier New"&gt;obj.c&lt;/font&gt; results in an unspecified value. &lt;br /&gt;
2. After the evaluation of the second numbered line, &lt;font face="Courier New"&gt;p-&amp;gt;c&lt;/font&gt; will contain the unspecified value that was in &lt;font face="Courier New"&gt;i&lt;/font&gt;. &lt;br /&gt;
3. The unspecified value in &lt;font face="Courier New"&gt;obj.c&lt;/font&gt; will be printed. &lt;/p&gt;
&lt;p&gt;At most one user-defined conversion (constructor or conversion function) is implicitly applied to a single value. As such the answers for questions 4 and 5 are: &lt;br /&gt;
4. This should result in a compile time error, as initializing the int would require two implicit user-defined conversions. &lt;br /&gt;
5. &lt;font face="Courier New"&gt;y&lt;/font&gt; is explicitly converted to type &lt;font face="Courier New" color="#008080"&gt;X&lt;/font&gt; using the conversion function &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;Y&lt;/font&gt;::&lt;font color="#0000ff"&gt;operator&lt;/font&gt; &lt;font color="#008080"&gt;X&lt;/font&gt;()&lt;/font&gt;, and &lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;operator int&lt;/font&gt;()&lt;/font&gt; is then implicitly called on the resulting &lt;font face="Courier New" color="#008080"&gt;X&lt;/font&gt;, so j contains the integral value returned by &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;X&lt;/font&gt;::&lt;font color="#0000ff"&gt;operator int&lt;/font&gt;().&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;An explicit constructor constructs objects only where the direct-initialization syntax or casts are explicitly used. As such the answer for question 6 is: &lt;br /&gt;
6. The first numbered line will result in an error, as copy-initialization cannot use an explicit constructor. The second numbered line will result in &lt;font face="Courier New"&gt;z2&lt;/font&gt; being initialized from the unnamed temporary constructed during the &lt;font face="Courier New" color="#0000ff"&gt;static_cast&lt;/font&gt;.&lt;/p&gt;
&lt;p&gt;A typedef-name that names a class shall not be used as the identifier in the declarator for a destructor declaration. As such the answers to question 7 are:&lt;br /&gt;
7. The results of the calls on the numbered lines, irrespective of the other lines, are: &lt;br /&gt;
a. Calls &lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;Base&lt;/font&gt;::~&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;&lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;Base&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;() &lt;/font&gt;&lt;br /&gt;
b. Calls &lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;&lt;font face="Courier New"&gt;&lt;font face="Courier New" color="#008080"&gt;Derived&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;::~&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;&lt;font face="Courier New"&gt;&lt;font face="Courier New" color="#008080"&gt;Derived&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;() &lt;br /&gt;
&lt;/font&gt;c. Calls &lt;font face="Courier New"&gt;&lt;font face="Courier New" color="#008080"&gt;Derived&lt;/font&gt;::~&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;&lt;font face="Courier New"&gt;&lt;font face="Courier New" color="#008080"&gt;Derived&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;() &lt;br /&gt;
&lt;/font&gt;d. Calls &lt;font face="Courier New"&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;&lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;Base&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;::~&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;&lt;font face="Courier New"&gt;&lt;font color="#008080"&gt;Base&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;&lt;/font&gt;()&lt;br /&gt;
&lt;/font&gt;e. This line should result in an error, as this is clearly a declaration of a destructor (used in a function call syntax). As such the usage of the typedef is illegal. However, it has been noted that many compilers will accept this syntax as legal. That does not mean that you should do it, nor that you should expect it to work correctly. &lt;br /&gt;
&lt;/p&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/02/02/The-Third-C-Quiz--Aftermath.aspx</guid>
            <pubDate>Fri, 02 Feb 2007 22:21:48 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/6.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/02/02/The-Third-C-Quiz--Aftermath.aspx#feedback</comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/6.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/6.aspx</trackback:ping>
        </item>
        <item>
            <title>The Third C++ Quiz</title>
            <category>C++</category>
            <category>Quizes</category>
            <link>http://scapecode.com/archive/2007/02/02/The-Third-C-Quiz.aspx</link>
            <description>&lt;p&gt;A lot of code in this one, but you guys should be able to handle it; it is all quite easy. &lt;/p&gt;
&lt;p&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;font color="#008080"&gt;C&lt;/font&gt;; &lt;br /&gt;
&lt;font color="#0000ff"&gt;void&lt;/font&gt; f(&lt;font color="#008080"&gt;C&lt;/font&gt;* p); &lt;br /&gt;
&lt;br /&gt;
&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;/font&gt;&lt;font color="#008080"&gt;C&lt;/font&gt; { &lt;br /&gt;
    &lt;font color="#0000ff"&gt;int&lt;/font&gt; c; &lt;br /&gt;
    &lt;font color="#008080"&gt;C&lt;/font&gt;() : c(&lt;font color="#800000"&gt;1&lt;/font&gt;) { f(&lt;font color="#0000ff"&gt;this&lt;/font&gt;); } &lt;br /&gt;
}; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#0000ff"&gt;const&lt;/font&gt; &lt;font color="#008080"&gt;C&lt;/font&gt; obj; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#0000ff"&gt;void&lt;/font&gt; f(&lt;font color="#008080"&gt;C&lt;/font&gt;* p) { &lt;br /&gt;
    &lt;font color="#0000ff"&gt;int&lt;/font&gt; i = obj.c &amp;lt;&amp;lt; &lt;font color="#800000"&gt;2&lt;/font&gt;;             &lt;font color="#339966"&gt;//1 &lt;br /&gt;
&lt;/font&gt;    p-&amp;gt;c = i;                       &lt;font color="#339966"&gt;//2 &lt;br /&gt;
&lt;/font&gt;    &lt;font color="#008080"&gt;std::cout&lt;/font&gt;&amp;lt;&amp;lt; obj.c &amp;lt;&amp;lt; std::endl; &lt;font color="#339966"&gt;//3 &lt;br /&gt;
&lt;/font&gt;} &lt;br /&gt;
&lt;/font&gt;&lt;br /&gt;
1. What is the value of i after the first numbered line is evaluated? &lt;br /&gt;
2. What is the value of p-&amp;gt;c after the second numbered line is evaluated? &lt;br /&gt;
3. What does the third numbered line print? &lt;br /&gt;
&lt;br /&gt;
&lt;font face="Courier New"&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;/font&gt;&lt;font color="#008080"&gt;X&lt;/font&gt; { &lt;br /&gt;
    &lt;font color="#0000ff"&gt;operator int&lt;/font&gt;() { &lt;font color="#0000ff"&gt;return&lt;/font&gt; &lt;font color="#800000"&gt;314159&lt;/font&gt;; } &lt;br /&gt;
};&lt;br /&gt;
&lt;br /&gt;
&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;/font&gt;&lt;font color="#008080"&gt;Y&lt;/font&gt; { &lt;br /&gt;
    &lt;font color="#0000ff"&gt;operator&lt;/font&gt; &lt;font color="#008080"&gt;X&lt;/font&gt;() { &lt;font color="#0000ff"&gt;return&lt;/font&gt; &lt;font color="#008080"&gt;X&lt;/font&gt;(); } &lt;br /&gt;
}; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#008080"&gt;Y&lt;/font&gt; y; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#0000ff"&gt;int&lt;/font&gt; i = y;    &lt;font color="#339966"&gt;//1&lt;/font&gt; &lt;br /&gt;
&lt;font color="#0000ff"&gt;int&lt;/font&gt; j = &lt;font color="#008080"&gt;X&lt;/font&gt;(y); &lt;font color="#339966"&gt;//2&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;Without compiling, answer the following questions: &lt;br /&gt;
4. What should you expect the compiler to do on the first numbered line? Why? &lt;br /&gt;
5. What should you expect the value of j to be after the second numbered line is evaluated? Why?&lt;br /&gt;
&lt;br /&gt;
&lt;font face="Courier New"&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;/font&gt;&lt;font color="#008080"&gt;Z&lt;/font&gt; { &lt;br /&gt;
&lt;font color="#008080"&gt;Z&lt;/font&gt;() {} &lt;br /&gt;
    &lt;font color="#0000ff"&gt;explicit&lt;/font&gt; &lt;font color="#008080"&gt;Z&lt;/font&gt;(&lt;font color="#0000ff"&gt;int&lt;/font&gt;) {} &lt;br /&gt;
}; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#008080"&gt;Z&lt;/font&gt; z1 = &lt;font color="#800000"&gt;1&lt;/font&gt;;                 &lt;font color="#339966"&gt;//1 &lt;br /&gt;
&lt;/font&gt;&lt;font color="#008080"&gt;Z&lt;/font&gt; z2 = &lt;font color="#0000ff"&gt;static_cast&lt;/font&gt;&amp;lt;Z&amp;gt;(&lt;font color="#800000"&gt;1&lt;/font&gt;); &lt;font color="#339966"&gt;//2&lt;/font&gt;&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;Without compiling, answer the following questions: &lt;br /&gt;
6. What should you expect the compiler to do on the first and second numbered lines? Why? &lt;br /&gt;
&lt;br /&gt;
&lt;font face="Courier New"&gt;&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;/font&gt;&lt;font color="#008080"&gt;Base &lt;/font&gt;{ &lt;br /&gt;
    virtual ~&lt;font color="#008080"&gt;Base&lt;/font&gt;() {} &lt;br /&gt;
}; &lt;br /&gt;
&lt;br /&gt;
&lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;struct&lt;/font&gt; &lt;/font&gt;&lt;font color="#008080"&gt;Derived&lt;/font&gt; : &lt;font color="#008080"&gt;Base &lt;/font&gt;{ &lt;br /&gt;
    ~&lt;font color="#008080"&gt;Derived&lt;/font&gt;() {} &lt;br /&gt;
}; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#0000ff"&gt;typedef&lt;/font&gt; &lt;font color="#008080"&gt;Base &lt;/font&gt;&lt;font color="#008080"&gt;Base2&lt;/font&gt;; &lt;br /&gt;
&lt;font color="#008080"&gt;Derived&lt;/font&gt; d; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#008080"&gt;Base&lt;/font&gt;* p = &amp;amp;d; &lt;br /&gt;
&lt;br /&gt;
&lt;font color="#3366ff"&gt;void&lt;/font&gt; f() { &lt;br /&gt;
    d.&lt;font color="#008080"&gt;Base&lt;/font&gt;::~&lt;font color="#008080"&gt;Base&lt;/font&gt;();    &lt;font color="#339966"&gt;//1 &lt;br /&gt;
&lt;/font&gt;    p-&amp;gt;~&lt;font color="#008080"&gt;Base&lt;/font&gt;();         &lt;font color="#339966"&gt;//2 &lt;br /&gt;
&lt;/font&gt;    p-&amp;gt;~&lt;font color="#008080"&gt;Base2&lt;/font&gt;();        &lt;font color="#339966"&gt;//3&lt;/font&gt; &lt;br /&gt;
    p-&amp;gt;&lt;font color="#008080"&gt;Base2&lt;/font&gt;::~&lt;font color="#008080"&gt;Base&lt;/font&gt;();  &lt;font color="#339966"&gt;//4&lt;br /&gt;
&lt;/font&gt;    p-&amp;gt;&lt;font color="#008080"&gt;Base2&lt;/font&gt;::~&lt;font color="#008080"&gt;Base2&lt;/font&gt;(); &lt;font color="#339966"&gt;//5 &lt;br /&gt;
&lt;/font&gt;}&lt;/font&gt;&lt;/p&gt;
&lt;p&gt;Without compiling, answer the following questions: &lt;br /&gt;
7. What should you expect the behavior of each of the numbered lines, irrespective of the other lines, to be? &lt;br /&gt;
&lt;/p&gt;</description>
            <dc:creator>Washu</dc:creator>
            <guid>http://scapecode.com/archive/2007/02/02/The-Third-C-Quiz.aspx</guid>
            <pubDate>Fri, 02 Feb 2007 22:18:50 GMT</pubDate>
            <wfw:comment>http://scapecode.com/comments/5.aspx</wfw:comment>
            <comments>http://scapecode.com/archive/2007/02/02/The-Third-C-Quiz.aspx#feedback</comments>
            <wfw:commentRss>http://scapecode.com/comments/commentRss/5.aspx</wfw:commentRss>
            <trackback:ping>http://scapecode.com/services/trackbacks/5.aspx</trackback:ping>
        </item>
        <item>
            <title>A Second C++ Quiz - Aftermath</title>
            <category>C++</category>
            <category>Quizes</category>
            <link>http://scapecode.com/archive/2007/02/02/A-Second-C-Quiz--Aftermath.aspx</link>
            <description>&lt;p&gt;&lt;em&gt;Originally posted at: &lt;a href="http://www.gamedev.net/community/forums/mod/journal/journal.asp?jn=259115&amp;amp;reply_id=2465469"&gt;Washu's GDNet Journal&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Well, ok, I lied. This quiz was a lot harder than the previous one. Obviously the first one required you to know the insides of multiple inheritance, while the second one was all about templates and the difference between complete and incomplete types.&lt;/p&gt;
&lt;p&gt;1.1) &lt;font face="Courier New"&gt;&lt;font color="#0000ff"&gt;typeid&lt;/font&gt;(*a);&lt;/font&gt;