<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ScapeCode &#187; Software Development</title>
	<atom:link href="http://scapecode.com/category/software-development/feed/" rel="self" type="application/rss+xml" />
	<link>http://scapecode.com</link>
	<description>Anime, tentacles, and software development.</description>
	<lastBuildDate>Thu, 26 Aug 2010 22:32:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>SlimGen and You, Part ADD EAX, [EAX] of N</title>
		<link>http://scapecode.com/2009/08/slimgen-and-you-part-add-eax-eax-of-n-2/</link>
		<comments>http://scapecode.com/2009/08/slimgen-and-you-part-add-eax-eax-of-n-2/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 17:22:47 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[SlimDX]]></category>
		<category><![CDATA[SlimGen]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=73</guid>
		<description><![CDATA[So far I’ve covered how SlimGen works and the difficulties in doing what it does, including calling convention issues that one must be made aware of when writing replacement methods for use with SlimGen. So the next question arises, just how much of a difference can using SlimGen make? Well, a lot of that will [...]]]></description>
			<content:encoded><![CDATA[<p>So far I’ve covered how SlimGen works and the difficulties in doing what it does, including calling convention issues that one must be made aware of when writing replacement methods for use with SlimGen.</p>
<p>So the next question is: just how much of a difference can using SlimGen make? Well, a lot of that will depend on the developer and their skill level. But we were also pretty curious about this, so we slapped together a test sample that runs through a series of matrix multiplications and times them. It uses three arrays to perform the multiplications: two of the arrays contain 100,000 randomly generated matrices, and the third is used as the destination for the results. Both matrix multiplications (the SlimGen one and the .Net one) assume that a source can also be used as a destination, and so they are overlap safe.</p>
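<p>For readers who want to try something similar, here is a minimal sketch of what such a timing loop might look like. It is not the actual test sample: the <code>BinaryOp</code> delegate, the <code>TimeRun</code> helper and the float stand-in for the matrix type are all hypothetical names made up for illustration, and using <code>Stopwatch</code> ticks as the unit is an assumption.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Diagnostics;

// Hypothetical delegate with the same shape as Multiply(ref, ref, out).
public delegate void BinaryOp&lt;T&gt;(ref T left, ref T right, out T result);

public static class MultiplyBenchmark {
    // Runs op over every pair of inputs and returns the elapsed Stopwatch ticks.
    public static long TimeRun&lt;T&gt;(BinaryOp&lt;T&gt; op, T[] left, T[] right, T[] results) {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i &lt; left.Length; i++) {
            op(ref left[i], ref right[i], out results[i]);
        }
        watch.Stop();
        return watch.ElapsedTicks;
    }

    // Stand-in operation so the sketch runs without a Matrix type.
    public static void MultiplyFloats(ref float left, ref float right, out float result) {
        result = left * right;
    }

    public static void Main() {
        const int count = 100000;
        Random random = new Random(12345); // fixed seed so runs are repeatable
        float[] left = new float[count];    // first source array
        float[] right = new float[count];   // second source array
        float[] results = new float[count]; // destination array
        for (int i = 0; i &lt; count; i++) {
            left[i] = (float)random.NextDouble();
            right[i] = (float)random.NextDouble();
        }
        long ticks = TimeRun&lt;float&gt;(MultiplyFloats, left, right, results);
        Console.WriteLine("Total Ticks: {0:N0}", ticks);
    }
}</pre></div></div>

<p>Swapping the float stand-in for a real matrix type and passing each multiply method in turn would give two tick counts to compare.</p>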
<p>The timing results will vary, of course, from machine to machine, depending on the processor, how much RAM you have, and what else you’re doing at the time. Running the test on my Phenom 9850 I get:</p>
<pre>Total Matrix Count Per Run:  100,000
Multiply        Total Ticks: 2,001,059
SlimGenMultiply Total Ticks: 1,269,200
Improvement:                 36.57 % </pre>
<p>When I run it on my T8300 Core 2 Duo laptop I get:</p>
<pre>Total Matrix Count Per Run:  100,000
Multiply        Total Ticks: 2,175,380
SlimGenMultiply Total Ticks: 1,621,830
Improvement:                 25.45 %</pre>
<p>Still, a 25-37% improvement over the FPU-based multiply is quite significant. Since x64 support hasn’t been fully hammered out (in that it “works” but hasn’t been sufficiently verified as working), those numbers are unavailable at the moment. However, they should be available in the near future as we finalize error handling and ensure that there are no bugs in the x64 assembly handling.</p>
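<p>The improvement figures quoted above are consistent with taking the ticks saved as a fraction of the .Net tick count. A small sketch of that arithmetic follows; the <code>PercentImprovement</code> helper is a hypothetical name, and the formula is inferred from the printed numbers rather than taken from the test sample itself.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Globalization;

public static class ImprovementCalc {
    // Ticks saved, expressed as a percentage of the baseline tick count.
    public static double PercentImprovement(long baselineTicks, long optimizedTicks) {
        return (baselineTicks - optimizedTicks) * 100.0 / baselineTicks;
    }

    public static void Main() {
        double phenom = PercentImprovement(2001059, 1269200); // Phenom 9850 run
        double core2 = PercentImprovement(2175380, 1621830);  // T8300 run
        // Invariant culture keeps the decimal separator a '.' on every machine.
        Console.WriteLine(phenom.ToString("F2", CultureInfo.InvariantCulture) + " %"); // 36.57 %
        Console.WriteLine(core2.ToString("F2", CultureInfo.InvariantCulture) + " %");  // 25.45 %
    }
}</pre></div></div>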
<p>So why the great difference in performance? Well, part of it is the method size. The .Net method is 566 bytes of pure code; that’s over half a kilobyte of code that has to be brought into the instruction cache on the CPU and executed. Meanwhile the SSE2 method is around half that size, at 266 bytes. The smaller your footprint in the I-cache, the fewer hits you take and the more likely your code is to actually be IN the I-cache. Then there are the instructions: SSE2 has been around for a while, so CPU manufacturers have had plenty of time to tune it for performance. There’s also the memory hit issue: the SSE2-based code touches memory a minimal number of times, which, after the first read/write and except in a few cases, reduces the chance of cache misses.</p>
<p>Finally there’s how it deals with storage of the temporary results. The .Net FPU-based version allocates a Matrix type on the stack, calls the constructor (which zero-initializes it), and then proceeds to overwrite those entries one by one with the results of each set of dot products. At the end of the method it does what amounts to a memcpy, copying the temporary matrix over the result matrix. The SSE2 version, however, doesn’t bother with initializing the stack and stores only three of the result rows on the stack, opting to write the final row directly to the destination. The three stored rows are then moved back into XMM registers and out to the destination.</p>
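<p>To see why some temporary storage is needed at all, consider what happens when the destination is also one of the sources. The sketch below uses a hypothetical 2x2 <code>Matrix2</code> type (purely to keep it short) and two made-up methods: one that writes straight into the result and one that goes through a temporary.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

// Hypothetical 2x2 type, used only to keep the sketch short.
public struct Matrix2 {
    public float M11, M12, M21, M22;
}

public static class OverlapDemo {
    // Writes each entry straight into result. If result aliases left,
    // later entries read values that have already been overwritten.
    public static void MultiplyInPlace(ref Matrix2 left, ref Matrix2 right, out Matrix2 result) {
        result.M11 = (left.M11 * right.M11) + (left.M12 * right.M21);
        result.M12 = (left.M11 * right.M12) + (left.M12 * right.M22);
        result.M21 = (left.M21 * right.M11) + (left.M22 * right.M21);
        result.M22 = (left.M21 * right.M12) + (left.M22 * right.M22);
    }

    // Builds the product in a temporary and copies it out at the end,
    // so result may safely alias either source.
    public static void MultiplyOverlapSafe(ref Matrix2 left, ref Matrix2 right, out Matrix2 result) {
        Matrix2 r;
        r.M11 = (left.M11 * right.M11) + (left.M12 * right.M21);
        r.M12 = (left.M11 * right.M12) + (left.M12 * right.M22);
        r.M21 = (left.M21 * right.M11) + (left.M22 * right.M21);
        r.M22 = (left.M21 * right.M12) + (left.M22 * right.M22);
        result = r;
    }

    public static Matrix2 Sample() {
        Matrix2 m;
        m.M11 = 1f; m.M12 = 2f; m.M21 = 3f; m.M22 = 4f;
        return m;
    }

    public static void Main() {
        Matrix2 a = Sample();
        Matrix2 b = Sample();
        MultiplyOverlapSafe(ref a, ref b, out a); // destination is also the left source
        Console.WriteLine("safe:     {0} {1} {2} {3}", a.M11, a.M12, a.M21, a.M22); // 7 10 15 22

        Matrix2 c = Sample();
        MultiplyInPlace(ref c, ref b, out c);     // same call shape, no temporary
        Console.WriteLine("in place: {0} {1} {2} {3}", c.M11, c.M12, c.M21, c.M22); // 7 22 15 46
    }
}</pre></div></div>

<p>The in-place version gets M12 and M22 wrong because M11 and M21 were overwritten before they were read again. Buffering rows, whether in a full temporary matrix or in three stack slots, avoids that.</p>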
<p>The SSE2 source code follows, and then the .Net source code. Note that both are functionally equivalent:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">start<span style="color: #339933;">:</span>      <span style="color: #00007f; font-weight: bold;">mov</span>     <span style="color: #00007f;">eax</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm5</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x10</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm6</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm7</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">-</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; store row 0 of new matrix</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x10</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">-</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; store row 1 of new matrix</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">-</span> <span style="color: #0000ff;">0x40</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; store row 2 of new matrix</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; store row 3 of new matrix</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">-</span> <span style="color: #0000ff;">0x40</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">-</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x10</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">-</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">ret</span>     <span style="color: #0000ff;">4</span></pre></div></div>
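<p>If the assembly is hard to follow, the same row-at-a-time idea can be sketched in C#. This is an illustration only, not code from SlimGen: it assumes row-major 4x4 matrices stored as 16 floats, that <code>ecx</code> points at the left matrix and <code>edx</code> at the right one, and it uses a full temporary array where the assembly uses three stack slots.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

public static class RowMultiplySketch {
    // left, right and result are 4x4 row-major matrices, 16 floats each (assumed layout).
    public static void Multiply(float[] left, float[] right, float[] result) {
        float[] temp = new float[16]; // temporary keeps the routine overlap safe
        for (int row = 0; row &lt; 4; row++) {
            for (int k = 0; k &lt; 4; k++) {
                // shufps step: take one element of the left row as a scalar...
                float scalar = left[row * 4 + k];
                for (int col = 0; col &lt; 4; col++) {
                    // ...mulps/addps step: scale row k of right and accumulate.
                    // The assembly sums as (p0 + p2) + (p1 + p3); this loop sums
                    // in order, so the last bit of rounding may differ.
                    temp[row * 4 + col] += scalar * right[k * 4 + col];
                }
            }
        }
        Array.Copy(temp, result, 16);
    }

    public static void Main() {
        float[] left = new float[16];
        float[] identity = new float[16];
        for (int i = 0; i &lt; 16; i++) {
            left[i] = i + 1;
        }
        for (int i = 0; i &lt; 4; i++) {
            identity[i * 4 + i] = 1f;
        }
        Multiply(left, identity, left); // destination overlaps a source
        Console.WriteLine(string.Join(" ", Array.ConvertAll(left, delegate(float f) { return f.ToString(); })));
    }
}</pre></div></div>

<p>Multiplying by the identity leaves the left matrix unchanged, which makes for a quick sanity check of the indexing.</p>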

<p>The .Net matrix multiplication source code:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">public</span> <span style="color: #0600FF;">static</span> <span style="color: #0600FF;">void</span> Multiply<span style="color: #000000;">&#40;</span><span style="color: #0600FF;">ref</span> Matrix left, <span style="color: #0600FF;">ref</span> Matrix right, <span style="color: #0600FF;">out</span> Matrix result<span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span>
    Matrix r<span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M11</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M11</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M11</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M12</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M21</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M13</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M31</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M14</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M41</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M12</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M11</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M12</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M12</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M22</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M13</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M32</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M14</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M42</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M13</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M11</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M13</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M12</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M23</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M13</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M33</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M14</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M43</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M14</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M11</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M14</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M12</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M24</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M13</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M34</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M14</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M44</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M21</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M21</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M11</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M22</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M21</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M23</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M31</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M24</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M41</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M22</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M21</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M12</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M22</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M22</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M23</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M32</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M24</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M42</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M23</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M21</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M13</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M22</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M23</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M23</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M33</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M24</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M43</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M24</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M21</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M14</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M22</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M24</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M23</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M34</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M24</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M44</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M31</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M31</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M11</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M32</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M21</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M33</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M31</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M34</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M41</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M32</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M31</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M12</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M32</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M22</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M33</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M32</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M34</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M42</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M33</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M31</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M13</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M32</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M23</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M33</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M33</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M34</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M43</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M34</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M31</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M14</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M32</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M24</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M33</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M34</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M34</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M44</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M41</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M41</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M11</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M42</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M21</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M43</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M31</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M44</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M41</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M42</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M41</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M12</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M42</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M22</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M43</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M32</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M44</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M42</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M43</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M41</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M13</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M42</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M23</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M43</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M33</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M44</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M43</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    r.<span style="color: #0000FF;">M44</span> <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M41</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M14</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M42</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M24</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M43</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M34</span><span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span>left.<span style="color: #0000FF;">M44</span> <span style="color: #008000;">*</span> right.<span style="color: #0000FF;">M44</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    result <span style="color: #008000;">=</span> r<span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/08/slimgen-and-you-part-add-eax-eax-of-n-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SlimGen and You, Part ADD AL, [RAX] of N</title>
		<link>http://scapecode.com/2009/08/slimgen-and-you-part-add-al-rax-of-n/</link>
		<comments>http://scapecode.com/2009/08/slimgen-and-you-part-add-al-rax-of-n/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 20:40:23 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[SlimDX]]></category>
		<category><![CDATA[SlimGen]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=72</guid>
		<description><![CDATA[The question does arise though, when using SlimGen and writing your SSE replacement methods, what kind of calling convention does the CLR use? The CLR uses a version of fastcall. On x86 processors this means that the first two parameters (that are DWORD or smaller) are passed in ECX and EDX. However, and this is [...]]]></description>
			<content:encoded><![CDATA[<p>The question does arise though, when using SlimGen and writing your SSE replacement methods, what kind of calling convention does the CLR use?</p>
<p>The CLR uses a variant of fastcall. On x86 processors this means that the first two parameters (that are DWORD or smaller) are passed in ECX and EDX. However, and this is where the CLR differs from standard fastcall, the parameters after the first two are pushed onto the stack from left to right, not right to left. This is important to remember, especially for functions that take a variable number of arguments. So a call like <tt>X('c', 2, 3.0f, "Hello");</tt> becomes:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">X<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #7f007f;">'c'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">3.0f</span><span style="color: #339933;">,</span> <span style="color: #7f007f;">&quot;Hello&quot;</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00000025</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #0000ff;">40400000h</span> <span style="color: #666666; font-style: italic;">; 3.0f</span>
<span style="color: #adadad; font-style: italic;">0000002a</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #00007f;">ds</span><span style="color: #339933;">:</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #0000ff;">03402088h</span><span style="color: #009900; font-weight: bold;">&#93;</span> <span style="color: #666666; font-style: italic;">;Address of &quot;Hello&quot;</span>
<span style="color: #adadad; font-style: italic;">00000030</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #00007f;">edx</span><span style="color: #339933;">,</span><span style="color: #0000ff;">2</span> 
<span style="color: #adadad; font-style: italic;">00000035</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #00007f;">ecx</span><span style="color: #339933;">,</span><span style="color: #0000ff;">63h</span> <span style="color: #666666; font-style: italic;">;'c'</span>
<span style="color: #adadad; font-style: italic;">0000003a</span>  <span style="color: #00007f; font-weight: bold;">call</span>        FFB8B040</pre></div></div>
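
<p>The immediates in that listing are just the raw bit patterns of the arguments. As a small sketch (the class name <tt>ImmediateDemo</tt> is made up for illustration), the following prints the values that show up as <tt>40400000h</tt> and <tt>63h</tt>:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

static class ImmediateDemo
{
    static void Main()
    {
        // Reinterpret the four bytes of the float 3.0f as a 32-bit integer.
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(3.0f), 0);
        Console.WriteLine(bits.ToString("X8"));      // 40400000
        // A char is just its code point; 'c' is 99 decimal.
        Console.WriteLine(((int)'c').ToString("X")); // 63
    }
}</pre></div></div>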

<p>The situation is the same for member functions, except that <tt>this</tt> is passed in ECX, which leaves only EDX to hold an additional parameter. The rest are passed on the stack as before:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">p<span style="color: #339933;">.</span>Y<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">3.0f</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">0000006d</span>  <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #0000ff;">40400000h</span>  <span style="color: #666666; font-style: italic;">; 3.0f</span>
<span style="color: #adadad; font-style: italic;">00000072</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #00007f;">ecx</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">40h</span><span style="color: #009900; font-weight: bold;">&#93;</span> <span style="color: #666666; font-style: italic;">;this</span>
<span style="color: #adadad; font-style: italic;">00000075</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #00007f;">edx</span><span style="color: #339933;">,</span><span style="color: #0000ff;">2</span>
<span style="color: #adadad; font-style: italic;">0000007c</span>  <span style="color: #00007f; font-weight: bold;">call</span>        FFA1B048</pre></div></div>
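
<p>The two x86 listings can be summed up with a toy model. This is only a sketch of the rule as described here, not anything taken from the CLR itself: the names <tt>X86ArgModel</tt> and <tt>Describe</tt> are invented, and marking the float as not eligible for a register is an assumption that happens to agree with both listings.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

static class X86ArgModel
{
    // Illustrative model only: 'enregisterable' marks arguments assumed to be
    // integer or reference types of DWORD size or smaller.
    static void Describe(string call, string[] args, bool[] enregisterable)
    {
        string[] registers = { "ECX", "EDX" };
        int next = 0;
        Console.WriteLine(call);
        for (int i = 0; i != args.Length; i++)
        {
            if (enregisterable[i] &amp;&amp; next != registers.Length)
                Console.WriteLine("  " + args[i] + " in " + registers[next++]);
            else
                Console.WriteLine("  push " + args[i]); // pushed in left-to-right order
        }
    }

    static void Main()
    {
        Describe("X('c', 2, 3.0f, \"Hello\")",
                 new[] { "'c'", "2", "3.0f", "\"Hello\"" },
                 new[] { true, true, false, true });
        // For an instance call, 'this' is simply treated as the first argument.
        Describe("p.Y(2, 3.0f)",
                 new[] { "this", "2", "3.0f" },
                 new[] { true, true, false });
    }
}</pre></div></div>

<p>Note that the model lists where each argument ends up, not the order of the instructions: in the real listings the pushes come first and the register loads follow.</p>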

<p>So this all seems clear enough, but it’s important to note these differences, especially when you’re poking around in the low-level bowels of the CLR or when you’re doing what SlimGen does, which is replacing actual method bodies.</p>
<p>So this raises the question, what about the x64 platform? Well, again, the calling convention is fastcall with a few differences. The first four parameters are passed in RCX, RDX, R8 and R9 (or their narrower sub-registers, such as EDX or CX), unless those parameters are floating point types, in which case they are passed in the XMM register for that position (XMM0 through XMM3).</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">Z<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #7f007f;">'c'</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">3.0f</span><span style="color: #339933;">,</span> <span style="color: #7f007f;">&quot;Hello&quot;</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">1.0</span><span style="color: #339933;">,</span> pa<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">000000c0</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         r9<span style="color: #339933;">,</span><span style="color: #0000ff;">124D3100h</span> 
<span style="color: #adadad; font-style: italic;">000000ca</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         r9<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r9<span style="color: #009900; font-weight: bold;">&#93;</span> <span style="color: #666666; font-style: italic;">; &quot;Hello&quot;</span>
<span style="color: #adadad; font-style: italic;">000000cd</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         rax<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">38h</span><span style="color: #009900; font-weight: bold;">&#93;</span> <span style="color: #666666; font-style: italic;">;pa (IntPtr[])</span>
<span style="color: #adadad; font-style: italic;">000000d2</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span>rax <span style="color: #666666; font-style: italic;">;pa - stack spill</span>
<span style="color: #adadad; font-style: italic;">000000d7</span>  <span style="color: #00007f; font-weight: bold;">movsd</span>       <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span>mmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #0000ff;">00000118h</span><span style="color: #009900; font-weight: bold;">&#93;</span> <span style="color: #666666; font-style: italic;">;1.0</span>
<span style="color: #adadad; font-style: italic;">000000df</span>  <span style="color: #00007f; font-weight: bold;">movsd</span>       mmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">20h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">;1.0 - stack spill</span>
<span style="color: #adadad; font-style: italic;">000000e5</span>  movss       <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #0000ff;">00000110h</span><span style="color: #009900; font-weight: bold;">&#93;</span> <span style="color: #666666; font-style: italic;">;3.0f</span>
<span style="color: #adadad; font-style: italic;">000000ed</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #00007f;">edx</span><span style="color: #339933;">,</span><span style="color: #0000ff;">2</span> <span style="color: #666666; font-style: italic;">;int (2)</span>
<span style="color: #adadad; font-style: italic;">000000f2</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #00007f;">cx</span><span style="color: #339933;">,</span><span style="color: #0000ff;">63h</span> <span style="color: #666666; font-style: italic;">;'c' </span>
<span style="color: #adadad; font-style: italic;">000000f6</span>  <span style="color: #00007f; font-weight: bold;">call</span>        FFFFFFFFFFEC9300</pre></div></div>

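<p>Whew, that looks pretty nasty, doesn’t it? But notice that the first four parameters are all passed in registers: <tt>'c'</tt> in CX, 2 in EDX, 3.0f in XMM2 and the string reference in R9. Only the fifth and sixth parameters (1.0 and <tt>pa</tt>) go to memory, in the stores marked “stack spill” above, at [rsp+20h] and [rsp+28h]. The 32 bytes below them are home space that the calling convention reserves so that register parameters can be spilled into memory (or read back from memory) when the register needs to be used. Calling an instance method follows pretty much the same rules, except the <tt>this</tt> pointer is passed in RCX first.</p>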
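
<p>Here is a rough model of where each argument of <tt>Z</tt> lands. It assumes the CLR follows the standard Windows x64 convention, which is what the listing above is consistent with; <tt>X64ArgModel</tt> and <tt>Locate</tt> are hypothetical names used only for this sketch.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

static class X64ArgModel
{
    static readonly string[] IntRegs = { "RCX", "RDX", "R8", "R9" };
    static readonly string[] FloatRegs = { "XMM0", "XMM1", "XMM2", "XMM3" };

    // Describes where the argument in zero-based position 'slot' is passed.
    static string Locate(int slot, bool isFloat)
    {
        // The register is chosen by position, so a float in slot 2 uses XMM2.
        if (slot &lt; 4)
            return isFloat ? FloatRegs[slot] : IntRegs[slot];
        // Later arguments sit just above the 32 bytes (20h) of home space.
        int offset = 0x20 + 8 * (slot - 4);
        return "[rsp+" + offset.ToString("X") + "h]";
    }

    static void Main()
    {
        string[] names = { "'c'", "2", "3.0f", "\"Hello\"", "1.0", "pa" };
        bool[] isFloat = { false, false, true, false, true, false };
        for (int i = 0; i != names.Length; i++)
            Console.WriteLine(names[i] + " : " + Locate(i, isFloat[i]));
    }
}</pre></div></div>

<p>Under those assumptions it reports RCX, RDX, XMM2 and R9 for the first four arguments and [rsp+20h] and [rsp+28h] for the last two, which lines up with the disassembly.</p>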

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">p<span style="color: #339933;">.</span>Q<span style="color: #009900; font-weight: bold;">&#40;</span>~0L<span style="color: #339933;">,</span> ~1L<span style="color: #339933;">,</span> ~2L<span style="color: #339933;">,</span> ~<span style="color: #0000ff;">3</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">0000010a</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         rcx<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">30h</span><span style="color: #009900; font-weight: bold;">&#93;</span> <span style="color: #666666; font-style: italic;">; this pointer</span>
<span style="color: #adadad; font-style: italic;">0000010f</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">20h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #0000ff;">0FFFFFFFFFFFFFFFCh</span> <span style="color: #666666; font-style: italic;">;~3L, spilled to stack</span>
<span style="color: #adadad; font-style: italic;">00000118</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         r9<span style="color: #339933;">,</span><span style="color: #0000ff;">0FFFFFFFFFFFFFFFDh</span> <span style="color: #666666; font-style: italic;">;~2L</span>
<span style="color: #adadad; font-style: italic;">0000011f</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         r8<span style="color: #339933;">,</span><span style="color: #0000ff;">0FFFFFFFFFFFFFFFEh</span> <span style="color: #666666; font-style: italic;">;~1L</span>
<span style="color: #adadad; font-style: italic;">00000126</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         rdx<span style="color: #339933;">,</span><span style="color: #0000ff;">0FFFFFFFFFFFFFFFFh</span> <span style="color: #666666; font-style: italic;">;~0L</span>
<span style="color: #adadad; font-style: italic;">0000012d</span>  <span style="color: #00007f; font-weight: bold;">call</span>        FFFFFFFFFFEC9310</pre></div></div>

<p>Calling a function and passing something larger than a register can store poses an interesting problem. The CLR deals with it by copying the entire value onto the stack and passing the address of that copy (hence call by value):</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">var v = new Vector<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
p<span style="color: #339933;">.</span>R<span style="color: #009900; font-weight: bold;">&#40;</span>v<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
<span style="color: #adadad; font-style: italic;">00000169</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         rcx<span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">40h</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">0000016e</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         rax<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rcx<span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">00000171</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">50h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span>rax 
<span style="color: #adadad; font-style: italic;">00000176</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         rax<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rcx<span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">0000017a</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">58h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span>rax 
<span style="color: #adadad; font-style: italic;">0000017f</span>  <span style="color: #00007f; font-weight: bold;">lea</span>         rdx<span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span>rsp<span style="color: #339933;">+</span><span style="color: #0000ff;">50h</span><span style="color: #009900; font-weight: bold;">&#93;</span> 
<span style="color: #adadad; font-style: italic;">00000184</span>  <span style="color: #00007f; font-weight: bold;">mov</span>         rcx<span style="color: #339933;">,</span>r8 
<span style="color: #adadad; font-style: italic;">00000187</span>  <span style="color: #00007f; font-weight: bold;">call</span>        FFFFFFFFFFEC9318</pre></div></div>

<p>As you can see, it copies the data from the vector onto the stack, puts the address of that copy in RDX, stores the <tt>this</tt> pointer in RCX, and then calls the function. This is why pass by reference is the preferred way (for fast code) to move around non-trivial structures.</p>
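
<p>On the C# side the difference looks like this. It is a minimal sketch: the 16-byte <tt>Vector</tt> below is a stand-in for the one used above (its fields are an assumption), and <tt>ByValue</tt> and <tt>ByRef</tt> are made-up names.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

// Hypothetical 16-byte value type standing in for the Vector used above.
struct Vector
{
    public float X, Y, Z, W;
}

static class PassingDemo
{
    // By value: the callee works on a copy, so the caller's data is untouched.
    static void ByValue(Vector v) { v.X = 1.0f; }

    // By reference: only an address is passed, so the write is visible to the caller.
    static void ByRef(ref Vector v) { v.X = 1.0f; }

    static void Main()
    {
        var v = new Vector();
        ByValue(v);
        Console.WriteLine(v.X); // 0
        ByRef(ref v);
        Console.WriteLine(v.X); // 1
    }
}</pre></div></div>
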
<p>All of this goes into writing our matrix multiplication method (which assumes the output is not one of the inputs):</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">BITS        <span style="color: #0000ff;">32</span>
<span style="color: #000000; font-weight: bold;">ORG</span>         <span style="color: #0000ff;">0x59f0</span>
<span style="color: #666666; font-style: italic;">;           void Multiply(ref Matrix, ref Matrix, out Matrix)</span>
start<span style="color: #339933;">:</span>      <span style="color: #00007f; font-weight: bold;">mov</span>     <span style="color: #00007f;">eax</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm4</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm5</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x10</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm6</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm7</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; Calculate row 0 of new matrix</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x10</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x10</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; Calculate row 1 of new matrix</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x20</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; Calculate row 2 of new matrix</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x00</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0x55</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xAA</span>
            <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">0xFF</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm4</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm5</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm6</span>
            <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm3</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm7</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm2</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm3</span>
            <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm1</span>
&nbsp;
            <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span> <span style="color: #339933;">+</span> <span style="color: #0000ff;">0x30</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> <span style="color: #00007f;">xmm0</span> <span style="color: #666666; font-style: italic;">; Calculate row 3 of new matrix</span>
            <span style="color: #00007f; font-weight: bold;">ret</span>     <span style="color: #0000ff;">4</span></pre></div></div>
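<p>If the SSE is hard to follow, here is a rough scalar sketch in Python of what the listing computes. It is only a reference model, not anything SlimGen uses: the function name multiply and the nested-list matrix layout are made up for illustration, and Python floats are doubles rather than the single-precision values the SSE code works on. Each row of the left matrix has its four elements broadcast (the shufps lines), multiplied against the four rows of the right matrix (mulps), and summed in the same order as the addps lines.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def multiply(left, right):
    """Row-major 4x4 matrix product, built one result row at a time."""
    result = []
    for row in left:
        # One product vector per broadcast element (shufps, then mulps).
        p = [[row[k] * right[k][j] for j in range(4)] for k in range(4)]
        # Same addition order as the listing: (p0 + p2) + (p1 + p3).
        result.append([(p[0][j] + p[2][j]) + (p[1][j] + p[3][j])
                       for j in range(4)])
    return result</pre></div></div>

<p>Multiplying by the identity matrix should hand back the other matrix unchanged, which makes a handy sanity check for any replacement method.</p>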

]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/08/slimgen-and-you-part-add-al-rax-of-n/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SlimGen and You, Part ADD [EAX], EAX of N</title>
		<link>http://scapecode.com/2009/08/slimgen-and-you-part-add-eax-eax-of-n/</link>
		<comments>http://scapecode.com/2009/08/slimgen-and-you-part-add-eax-eax-of-n/#comments</comments>
		<pubDate>Fri, 07 Aug 2009 23:00:16 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[SlimDX]]></category>
		<category><![CDATA[SlimGen]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=70</guid>
		<description><![CDATA[So previously we delved into one of the nastier performance corners on the .Net framework. Today I’m going to introduce you to a tool, that is in development currently, which allows you to take those slow math functions of yours and replace them with high performance SSE optimized methods. We’ve called it SlimGen, which although [...]]]></description>
			<content:encoded><![CDATA[<p>So previously we delved into one of the nastier performance corners of the .Net framework. Today I’m going to introduce you to a tool, currently in development, that lets you take those slow math functions of yours and replace them with high-performance, SSE-optimized methods.</p>
<p>We’ve called it <a href="http://code.google.com/p/slimgen/">SlimGen</a>, which although not exactly accurate, does fit nicely in with the other Slim projects currently underway including <a href="http://ventspace.wordpress.com/">SlimTune</a>, and the flagship that started it all, <a href="http://slimdx.org">SlimDX</a>.</p>
<p>So what does SlimGen do? Well, you pass it a .Net assembly and it replaces the native method bodies, which are generated using NGEN, with replacement ones written in assembly (for now). This modified assembly then replaces the original assembly that was stored in the native image store. SlimGen can operate on signed and unsigned assemblies alike, as the native image is not signed; more on this later.</p>
<p>Managed PE files contain a great deal of metadata stored in tables. You can enumerate these tables and parse them yourself, for instance if you were writing your own <a href="http://scientificninja.com/tag/clr">CLR</a>. Thankfully though, the .Net framework comes with several COM interfaces that are very helpful in accessing these tables without having to manually parse them out of the PE file. This is especially useful since the table rows are not a fixed format: indexes in the tables can be either 2 bytes or 4 bytes in size, depending on the size of the dataset indexed. In the case of SlimGen we use the <a href="http://msdn.microsoft.com/en-us/library/ms232953.aspx">IMetaDataImport2 interface</a> for accessing the metadata.</p>
<p>Of course, the managed metadata does not contain all of the information we need. NGEN manipulates the managed assembly and introduces pre-jitted versions of the functions contained within the assembly. However, their managed counterparts remain in the assembly and are what the metadata tables refer to. So how does one go from a managed method and its IL to the associated unmanaged code? Well, the CLR header of a PE file does contain a pointer to a table for a native header. However, the exact format of that table is undocumented, which makes it hard to parse and find the information we need. Therefore we have to use an alternative method…</p>
<p>When you load up an assembly, the CLR generates, using the metadata and other information found in the PE file, a set of runtime tables that it uses to indicate information about where things are in memory and their current state. For instance, it can tell whether it has jitted a method or not. When you load up an assembly that’s been NGENed, it checks the native images for an associated copy, assuming your assembly validates, and will load up the NGENed assembly and parse out the appropriate information from that. Therefore we need some way of gaining access to these runtime generated tables. Enter the debugger.</p>
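<p>To make the variable-width index point concrete, here is a small Python sketch. It assumes the rule from the ECMA-335 metadata format for simple table indexes: 2 bytes when the target table has fewer than 2^16 rows, 4 bytes otherwise. Coded indexes, which also pack a table tag into the value, use a smaller threshold and are not handled here. The function names are invented for illustration and have nothing to do with SlimGen or IMetaDataImport2.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def index_size(row_count):
    """Byte width of a simple index into a table with row_count rows."""
    # Tables with 2**16 or more rows need 4-byte indexes; smaller ones use 2.
    return 4 if row_count // 0x10000 else 2


def read_index(data, offset, row_count):
    """Read one little-endian index; return (value, next_offset)."""
    size = index_size(row_count)
    value = int.from_bytes(data[offset:offset + size], "little")
    return value, offset + size</pre></div></div>

<p>The same four bytes decode differently depending on how big the referenced table is, which is exactly why hand-parsing these rows is a chore and the COM interfaces are worth using.</p>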
<p>The .Net framework exposes debugging interfaces that are quite trivial to implement, but more importantly, they give you access to all of the runtime information available to the CLR. In the case of SlimGen, what we do is load (not run) your assembly into a host process and then simply have the host process execute a debugger breakpoint. The SlimGen analyzer first initializes itself as a debugger and then launches the host process with itself attached. When the breakpoint is hit, it breaks into the analyzer, which can then begin the work of processing the loaded assemblies. Since SlimGen knows which assembly it fed to the host, it is able to filter out all of the other assemblies that have been loaded and focus on the one we care about. First we check whether a native version of the assembly has been loaded; if one hasn’t, there is no point in continuing, so we simply report an error and clean up. Assuming there is a native version of the assembly loaded, we then use the aforementioned metadata interfaces to walk the assembly and find all of the methods that have been marked for replacement. Each method is examined to ensure that it has a native counterpart; if it doesn’t, a warning is issued and the method is skipped.</p>
<p>Now comes the annoying part. In .Net 1.x the framework had each method exist within a single code chunk, which made extracting that code quite easy. However, in .Net 2.x and later the framework allows a method to have multiple code chunks, each with a different base address and length. This is theoretically to allow an optimizer to work its magic, but it does make extracting methods harder. SlimGen will generate an assembly file per chunk, and all of the associated binaries for each chunk, generated from the assembly files, must be present for the method to be replaced. No dangling chunks please. The SlimGen analyzer extracts the base address of each chunk, along with the module base address. Using that information we can then calculate the relative virtual address of each method’s native counterpart within the NGENed file.</p>
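<p>The address arithmetic itself is just a subtraction. Here is a minimal Python sketch, assuming each chunk’s base address and the module base address have already been pulled out of the debugger. The function name chunk_rvas is made up, the module base below is an invented value, and the chunk address is simply the start address from a Matrix.Multiply disassembly.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def chunk_rvas(module_base, chunk_addresses):
    """Turn each chunk's load address into an RVA within the native image."""
    return [address - module_base for address in chunk_addresses]


# Hypothetical numbers: the module base is invented for this example.
rvas = chunk_rvas(0x5A7E0000, [0x5A856E64])
print([hex(rva) for rva in rvas])  # prints ['0x76e64']</pre></div></div>

<p>A method split across several chunks would simply pass several addresses and get one RVA back per chunk.</p>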
<p>Using that information the SlimGen client simply walks a copy of the native image performing the replacement of each method, and then when done (and assuming no errors), copies it back over the original NGEN image. Tada, you now have your highly optimized SSE code running in a managed application with no managed –&gt; unmanaged transitions in sight.</p>
]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/08/slimgen-and-you-part-add-eax-eax-of-n/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SlimGen and You, Part ADD [EAX], AL of N</title>
		<link>http://scapecode.com/2009/07/slimgen-and-you-part-add-eax-al-of-n/</link>
		<comments>http://scapecode.com/2009/07/slimgen-and-you-part-add-eax-al-of-n/#comments</comments>
		<pubDate>Sat, 01 Aug 2009 05:52:05 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[SlimDX]]></category>
		<category><![CDATA[SlimGen]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=68</guid>
		<description><![CDATA[Imagine you could have the safety of managed code, and the speed of SIMD all in one? Sounds like one of those weird dreams Trent has, or perhaps you are already thinking of using C++/CLI to wrap SIMD methods to help reduce the unmanaged transition overhead. You might also be thinking about pinvoking DLL methods [...]]]></description>
			<content:encoded><![CDATA[<p>Imagine you could have the safety of managed code and the speed of SIMD all in one. Sounds like one of those weird dreams <a href="http://polycat.net/" target="_blank">Trent</a> has, or perhaps you are already thinking of using C++/CLI to wrap SIMD methods to help reduce the unmanaged transition overhead. You might also be thinking about P/Invoking DLL methods such as those used in the D3DX framework to take advantage of its SIMD capabilities.</p>
<p>While all of those are quite possible, and for sufficiently large problems quite efficient too, they also have a relatively high cost of invocation. Managed to unmanaged transitions, even in the best of cases, cost a pretty penny. Registers have to be saved, marshalling of non-fundamental types has to be performed, and in many cases an interop thunk has to be created/jitted. This is a case where the best option is to do as much work as you can in one area before transitioning to the next.</p>
<p>But you can’t always do tons of work at once; a prime example is managing your game state. You’ll have discrete transformations of objects, but batching up those transformations to perform them all at once becomes a management nightmare. You have to craft special data structures to avoid marshalling and use pinned arrays, and in general you end up doing a lot of work maintaining the two sides, spend plenty of time debugging your interface, and may still not gain anything speed-wise.</p>
<p>If you’re wondering just how bad the interop transition is, you can take a look at my <a href="http://scapecode.com/?cat=3" target="_blank">previous entries</a>, where I explored the topic in some detail.</p>
<p>In the .Net framework, most code runs almost as fast, as fast, or faster than the comparable native counterparts. There are cases where the framework is significantly faster, and cases where it loses out at about 10% in the worst case. 10% isn’t a horrible loss, and it’s not a consistent loss either. The cost will vary depending on factors such as: is JITing required, is memory allocation performed, are you doing FPU math that would be vectorized in native code?</p>
<p>In fact, that 10% figure isn’t accurate either. If a method requires JITing the first time it is called, that could cost you 10% on the first invocation, but future invocations will not need JITing, so the cost may end up being the same as its native counterpart from then on. If the method is called a thousand times, that’s only an additional 0.01% cost over the entire set of invocations.</p>
<p>The only real area where the .Net framework seriously loses out to unmanaged code is in the math department. The inability to use vectorization can significantly increase the cost of managed math over that of unmanaged math code; this is where that 10% figure rears its ugly head. On the integer math side of things managed code is almost on equal footing with unmanaged code. There are some vectorized operations that can speed up integer math quite significantly, but in general the two come out about the same. However, when it comes to floating point performance managed code loses out due to its dependency on the FPU or single float SSE instructions. The ability to vectorize large chunks of floating point math can work wonders for unmanaged code.</p>
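<p>The arithmetic behind that figure is easy to check. This is a back-of-the-envelope Python sketch, assuming every call costs the same baseline amount and only the first call pays the extra 10%; the function name is invented for the example.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def amortized_overhead(first_call_overhead, calls):
    """One-time overhead spread across all calls, as a fraction of the total."""
    return first_call_overhead / calls


# A 10% hit on the first of 1000 calls.
print(f"{amortized_overhead(0.10, 1000):.2%}")  # prints 0.01%</pre></div></div>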
<p>Well, all is not lost for those of us who love the managed world… SlimGen is here. Exactly what SlimGen is will be delved into later, but here’s a sample preview of what it can do:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">SlimDX<span style="color: #339933;">.</span>Matrix<span style="color: #339933;">.</span>Multiply<span style="color: #009900; font-weight: bold;">&#40;</span>SlimDX<span style="color: #339933;">.</span>Matrix ByRef<span style="color: #339933;">,</span> SlimDX<span style="color: #339933;">.</span>Matrix ByRef<span style="color: #339933;">,</span> SlimDX<span style="color: #339933;">.</span>Matrix ByRef<span style="color: #009900; font-weight: bold;">&#41;</span>
Begin 5a856e64<span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">size</span> <span style="color: #0000ff;">293</span>
<span style="color: #adadad; font-style: italic;">5A856E64</span> 8B442404         <span style="color: #00007f; font-weight: bold;">mov</span>         <span style="color: #00007f;">eax</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span><span style="color: #339933;">+</span><span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">5A856E68</span> 0F1022           <span style="color: #00007f; font-weight: bold;">movups</span>      <span style="color: #00007f;">xmm4</span><span style="color: #339933;">,</span>xmmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">5A856E6B</span> 0F106A10         <span style="color: #00007f; font-weight: bold;">movups</span>      <span style="color: #00007f;">xmm5</span><span style="color: #339933;">,</span>xmmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">10h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">5A856E6F</span> 0F107220         <span style="color: #00007f; font-weight: bold;">movups</span>      <span style="color: #00007f;">xmm6</span><span style="color: #339933;">,</span>xmmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">20h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">5A856E73</span> 0F107A30         <span style="color: #00007f; font-weight: bold;">movups</span>      <span style="color: #00007f;">xmm7</span><span style="color: #339933;">,</span>xmmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">30h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">5A856E77</span> 0F1001           <span style="color: #00007f; font-weight: bold;">movups</span>      <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span>xmmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">5A856E7A</span> 0F28C8           <span style="color: #00007f; font-weight: bold;">movaps</span>      <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/07/slimgen-and-you-part-add-eax-al-of-n/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I Love Python</title>
		<link>http://scapecode.com/2009/06/i-love-python/</link>
		<comments>http://scapecode.com/2009/06/i-love-python/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 19:33:35 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=7</guid>
		<description><![CDATA[I think one could say that I&#8217;m somewhat infatuated with Python. It&#8217;s a wonderful language really. The language is easy to use, powerful, and developing things in it gets done so much faster than in most other languages I&#8217;ve used. It&#8217;s also pretty trivial to pick up, and once you&#8217;re past the basic strangeness of whitespace-based [...]]]></description>
			<content:encoded><![CDATA[<p>I think one could say that I&#8217;m somewhat infatuated with Python. It&#8217;s a wonderful language, really. The language is easy to use and powerful, and development in it goes so much faster than in most other languages I&#8217;ve used. It&#8217;s also pretty trivial to pick up, and once you&#8217;re past the basic strangeness of whitespace-based scope resolution, you will quickly find yourself writing what would otherwise be severely complex applications in but a few lines.</p>
<p>Now, at this point, most people would point out the functional origins of many of Python&#8217;s capabilities, and then point to the many functional languages that offer the same capabilities. Or perhaps they would point to the dynamic typing, and how that makes development so much more flexible&#8230; A few would probably point out that many bugs won&#8217;t show up until run-time that a statically typed language would catch at compile-time.</p>
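<p>As a quick illustration of that dynamic typing trade-off, the sketch below contains a misspelled method name. The shout function is made up for this example. Python happily accepts the definition and only complains when the offending line actually executes, whereas a statically typed language would typically refuse to compile a call to a method that doesn&#8217;t exist.</p>

```python
def shout(name):
    # Typo: the str method is 'upper', not 'uper'. Python accepts this
    # definition without complaint.
    return name.uper()


try:
    shout("Bronze Sword")
except AttributeError as error:
    # The mistake only surfaces here, when the line actually runs.
    caught = error
    print("Found at run-time:", error)
```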
<p>But none of that really matters to me, because the biggest thing that I find with Python is that it encourages readable code. Now, we&#8217;re not talking C# readable code, which, while it can be made readable, still tends to be mixed in with a great deal of language jargon that can confuse the casual reader. Nor am I talking about C++ readable code, which simply doesn&#8217;t exist. No, I&#8217;m talking about code that you can sit down with and almost read out loud in a sensible manner. Code that you can look at and, without having to filter out many of the little niggling bits, simply understand.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">items = <span style="color: black;">&#91;</span>
    <span style="color: black;">&#123;</span><span style="color: #483d8b;">'name'</span> : <span style="color: #483d8b;">&quot;Bronze Sword&quot;</span>, <span style="color: #483d8b;">'value'</span> : <span style="color: #ff4500;">50</span>, <span style="color: #483d8b;">'diceCount'</span> : <span style="color: #ff4500;">1</span>, <span style="color: #483d8b;">'diceSides'</span> : <span style="color: #ff4500;">4</span><span style="color: black;">&#125;</span>,
    <span style="color: black;">&#123;</span><span style="color: #483d8b;">'name'</span> : <span style="color: #483d8b;">&quot;Steel Sword&quot;</span>, <span style="color: #483d8b;">'value'</span> : <span style="color: #ff4500;">100</span>, <span style="color: #483d8b;">'diceCount'</span> : <span style="color: #ff4500;">2</span>, <span style="color: #483d8b;">'diceSides'</span> : <span style="color: #ff4500;">4</span><span style="color: black;">&#125;</span>,
    <span style="color: black;">&#123;</span><span style="color: #483d8b;">'name'</span> : <span style="color: #483d8b;">&quot;Adamantium Sword&quot;</span>, <span style="color: #483d8b;">'value'</span> : <span style="color: #ff4500;">200</span>, <span style="color: #483d8b;">'diceCount'</span> : <span style="color: #ff4500;">1</span>, <span style="color: #483d8b;">'diceSides'</span> : <span style="color: #ff4500;">10</span><span style="color: black;">&#125;</span>
<span style="color: black;">&#93;</span>
&nbsp;
result = <span style="color: black;">&#91;</span>item <span style="color: #ff7700;font-weight:bold;">for</span> item <span style="color: #ff7700;font-weight:bold;">in</span> items <span style="color: #ff7700;font-weight:bold;">if</span> item<span style="color: black;">&#91;</span><span style="color: #483d8b;">'value'</span><span style="color: black;">&#93;</span> <span style="color: #66cc66;">&gt;</span> <span style="color: #ff4500;">50</span><span style="color: black;">&#93;</span>
<span style="color: #ff7700;font-weight:bold;">print</span><span style="color: black;">&#40;</span>result<span style="color: black;">&#41;</span></pre></div></div>

<p>Now, looking over the above code, what immediately comes to mind is that items is an array containing objects with properties (strictly speaking, a list of dictionaries). Pretty cool in my opinion. A similar C# example could be done, but then you would have to rely on either an explicit Item object, anonymous types, or a dictionary of objects (which would involve typecasting and other nasty behavior)&#8230;</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">using</span> <span style="color: #008080;">System</span><span style="color: #008000;">;</span>
<span style="color: #0600FF;">using</span> <span style="color: #008080;">System.Collections.Generic</span><span style="color: #008000;">;</span>
<span style="color: #0600FF;">using</span> <span style="color: #008080;">System.Linq</span><span style="color: #008000;">;</span>
&nbsp;
<span style="color: #0600FF;">static</span> <span style="color: #FF0000;">class</span> Program <span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #0600FF;">static</span> <span style="color: #0600FF;">void</span> <span style="color: #0600FF;">ForEach</span><span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span><span style="color: #000000;">&#40;</span><span style="color: #0600FF;">this</span> IEnumerable<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span> cont, Action<span style="color: #008000;">&lt;</span>T<span style="color: #008000;">&gt;</span> action<span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">foreach</span> <span style="color: #000000;">&#40;</span>var t <span style="color: #0600FF;">in</span> cont<span style="color: #000000;">&#41;</span>
            action<span style="color: #000000;">&#40;</span>t<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF;">static</span> <span style="color: #0600FF;">void</span> Main<span style="color: #000000;">&#40;</span><span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> args<span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span>
        var items <span style="color: #008000;">=</span> <span style="color: #008000;">new</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span>
            <span style="color: #008000;">new</span> <span style="color: #000000;">&#123;</span> Name <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;Bronze Sword&quot;</span>, Value <span style="color: #008000;">=</span> <span style="color: #FF0000;">50</span>, DiceCount <span style="color: #008000;">=</span> <span style="color: #FF0000;">1</span>, DiceSides <span style="color: #008000;">=</span> <span style="color: #FF0000;">4</span> <span style="color: #000000;">&#125;</span>,
            <span style="color: #008000;">new</span> <span style="color: #000000;">&#123;</span> Name <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;Steel Sword&quot;</span>, Value <span style="color: #008000;">=</span> <span style="color: #FF0000;">100</span>, DiceCount <span style="color: #008000;">=</span> <span style="color: #FF0000;">2</span>, DiceSides <span style="color: #008000;">=</span> <span style="color: #FF0000;">4</span> <span style="color: #000000;">&#125;</span>,
            <span style="color: #008000;">new</span> <span style="color: #000000;">&#123;</span> Name <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;Adamantium Sword&quot;</span>, Value <span style="color: #008000;">=</span> <span style="color: #FF0000;">200</span>, DiceCount <span style="color: #008000;">=</span> <span style="color: #FF0000;">1</span>, DiceSides <span style="color: #008000;">=</span> <span style="color: #FF0000;">10</span> <span style="color: #000000;">&#125;</span>
        <span style="color: #000000;">&#125;</span><span style="color: #008000;">;</span>
&nbsp;
        var result <span style="color: #008000;">=</span> from item <span style="color: #0600FF;">in</span> items where item.<span style="color: #0000FF;">Value</span> <span style="color: #008000;">&gt;</span> <span style="color: #FF0000;">50</span> select item<span style="color: #008000;">;</span>
        result.<span style="color: #0600FF;">ForEach</span><span style="color: #000000;">&#40;</span>
            <span style="color: #000000;">&#40;</span>item<span style="color: #000000;">&#41;</span> <span style="color: #008000;">=&gt;</span>
                Console.<span style="color: #0000FF;">WriteLine</span><span style="color: #000000;">&#40;</span>
                    <span style="color: #666666;">&quot;{{Name : {0}, Value : {1}, DiceCount : {2}, DiceSides : {3}}}&quot;</span>,
                    item.<span style="color: #0000FF;">Name</span>, item.<span style="color: #0000FF;">Value</span>, item.<span style="color: #0000FF;">DiceCount</span>, item.<span style="color: #0000FF;">DiceSides</span>
                <span style="color: #000000;">&#41;</span>
        <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Now, think about how much longer that is. Heck, read through it a bit and try to understand it at a glance. It&#8217;s not that difficult to do, but you&#8217;re filtering out a lot of useless language garbage. Things like &#8220;new&#8221; or &#8220;var&#8221; just get in the way. Heck, look at what we had to do to print out the list in a single line! While we could certainly have embedded the foreach directly in the main function, the ability to apply that functionality to any query is just too useful not to define an extension method for it.</p>
<p>A C++ version, which I will not provide here, is even worse since it lacks many of the language niceties of C#. This means you end up spending more time doing all that lovely low down dirty work just to build a simple list of items, run some queries on it, and print out the results.</p>
<p>But how does it encourage readability? Well, first and foremost, there&#8217;s the scoping. Since scoping is based on indentation, you have to make sure your code is properly indented. The worst experience one can have is to open someone&#8217;s source code and find missing or haphazard indentation. It completely ruins the flow when reading the code. Then there&#8217;s the relative lack of keywords whose meaning you have to interpret from the context in which they are used; in many languages, some keywords behave differently depending on the context. A trivial example of this is new in C# when allocating a value type versus allocating a reference type. Last, but certainly not least, are all the libraries. These allow you to rapidly build entire applications without having to worry about all the low level nitty gritty stuff. You just get in there, and go.</p>
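<p>A small sketch of the scoping and library points, reusing one of the item dictionaries from earlier: the indentation alone shows what belongs to the loop and what belongs to the function, and the standard random module does the dice rolling. The roll_damage function, and the idea of rolling diceCount dice with diceSides sides each, are assumptions about how those fields might be used rather than anything taken from a real game.</p>

```python
import random


def roll_damage(item, rng=random):
    # Hypothetical helper: roll 'diceCount' dice with 'diceSides' sides each.
    total = 0
    for _ in range(item['diceCount']):
        # Indented under the for: runs once per die.
        total += rng.randint(1, item['diceSides'])
    # Back at function level: runs once, after the loop has finished.
    return total


steel_sword = {'name': "Steel Sword", 'value': 100, 'diceCount': 2, 'diceSides': 4}
print(roll_damage(steel_sword))  # somewhere between 2 and 8
```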
]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/06/i-love-python/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blinders</title>
		<link>http://scapecode.com/2009/06/blinders/</link>
		<comments>http://scapecode.com/2009/06/blinders/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 18:59:20 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[Software Development]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=9</guid>
		<description><![CDATA[Something that&#8217;s been bugging me for a while now is developers who appear to be wearing blinders. I don&#8217;t necessarily mean physically (although that is entirely possible), but mentally. They are confronted with a problem, and when a solution is suggested they are unable to see how to apply it, or come up with non-existent [...]]]></description>
			<content:encoded><![CDATA[<p>Something that&#8217;s been bugging me for a while now is developers who appear to be wearing blinders.</p>
<p>I don&#8217;t necessarily mean physically (although that is entirely possible), but mentally. They are confronted with a problem, and when a solution is suggested they either cannot see how to apply it or come up with non-existent limitations on it, because they cannot see how it fits into the bigger picture. Now, that doesn&#8217;t mean they are wearing blinders all the time, although I frequently find that they are, but it does suggest an endemic issue with developers being unable to focus on anything more than the problem at hand.</p>
<p>Architects have to be able to view the design from both a high vantage point and lower, more problem-oriented vantage points. This duality is what makes architects so special. As such, when they encounter a problem, the solution they devise tends to fit both the problem at hand and the design as a whole. However, this ability should also be present in developers (who will hopefully become architects), as you should be able to look beyond your current problem and its limitations and see how a solution fits into the whole.</p>
<p>As a simple example, take the case of submitting high scores to a server for a single player game; assume the game is something like DDR. The high scores must be submitted in some manner, such as a web post to the server. So the question is, how can we deter cheating? Some ideas were thrown around, including storing the high score in multiple places in memory, using MD5 to ensure that it hadn&#8217;t changed, and a few other solutions. The problems with these solutions should be fairly obvious. The first one doesn&#8217;t stop me from using a packet analyzer to examine the sent data and work out a way of submitting my own scores, and neither does the MD5 method, as I can just rehash my high score. A more complex method of submitting high scores was suggested: send the moves made to the server, and let the server run the moves through (it could do so quite fast for a DDR-based game, since it would just need the exact key time indices and an error margin).</p>
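<p>A minimal sketch of that replay idea in Python: the client submits its key presses as (time, key) pairs, and the server rescores them against the song&#8217;s known step chart. The chart format, the 0.1 second error margin, the one-point-per-hit scoring, and the names STEP_CHART and score_replay are all assumptions made up for illustration; none of them come from the game in question.</p>

```python
# Hypothetical step chart: (time in seconds, expected key) for one song.
STEP_CHART = [(1.0, 'left'), (1.5, 'up'), (2.0, 'right'), (2.5, 'down')]

ERROR_MARGIN = 0.1  # assumed tolerance, in seconds, around each step


def score_replay(presses, chart=STEP_CHART, margin=ERROR_MARGIN):
    """Recompute a score on the server from the submitted key presses.

    presses is a list of (time, key) pairs sent by the client. Each press
    can satisfy at most one step, so repeating a press earns nothing extra.
    """
    remaining = list(presses)
    score = 0
    for step_time, step_key in chart:
        for press in remaining:
            press_time, press_key = press
            if press_key == step_key and abs(press_time - step_time) <= margin:
                score += 1
                remaining.remove(press)
                break
    return score


honest = [(1.02, 'left'), (1.48, 'up'), (2.05, 'right'), (2.70, 'down')]
print(score_replay(honest))  # the last press is 0.2s late, so it misses
```

<p>The client never sends a score at all in this scheme, so the only thing left to tamper with is the list of moves itself.</p>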
<p>The solution was initially rejected because the author of said game couldn&#8217;t see how users might submit more than one high score. After it was pointed out that the same method that allows a person to submit multiple high scores would work anyway, he rejected it for other spurious reasons (such as &#8220;what happens if they quit the game part way through?&#8221;), all of which any of the proposed solutions would have suffered from. He was wearing blinders, and couldn&#8217;t see how the solution fit within the whole architecture.</p>
<p>Now, admittedly, the solution isn&#8217;t perfect, and it would take some work on the server side to properly recalculate the high score. It also doesn&#8217;t prevent cheating; it just makes it quite a bit harder, since a cheater would have to submit the key strokes with the proper timing, and, assuming some sort of random seed, that is harder to automate.</p>
<p>So the next time you find yourself having issues with a solution, stop. Take a step back, and look at the solution from a few more angles. Perhaps it fits, just not how you expected it to.</p>
]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/06/blinders/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Playing With The .NET JIT Part 4</title>
		<link>http://scapecode.com/2009/06/playing-with-the-net-jit-part-4/</link>
		<comments>http://scapecode.com/2009/06/playing-with-the-net-jit-part-4/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 18:58:11 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Unmanaged Code]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=12</guid>
		<description><![CDATA[As noted previously there are some cases where the performance of unmanaged code can beat that of the managed JIT. In the previous case it was the matrix multiplication function. We do have some other possible performance benefits we can give to our .NET code, specifically, we can NGEN it. NGEN is an interesting utility, [...]]]></description>
			<content:encoded><![CDATA[<p>As noted previously there are some cases where the performance of unmanaged code can beat that of the managed JIT. In the previous case it was the matrix multiplication function. We do have some other possible performance benefits we can give to our .NET code; specifically, we can NGEN it. NGEN is an interesting utility; it can perform heavy optimizations that would not be possible in the standard runtime JIT (as we shall see). The question before us is: will it give us enough of a boost to be able to surpass the performance of our unmanaged matrix multiplication?</p>
<p><strong>An Analysis of Existing Code</strong></p>
<p>We haven&#8217;t yet looked at the code that was produced for our previous tests, so I feel that it is time we gave it a look to see what we have. To keep this shorter we&#8217;ll only look at the inner product function. The code produced for the matrix multiplication suffers from the same problems and benefits from the same extensions. For the purposes of this writing we&#8217;ll only consider the x64 platform. First up we&#8217;ll look at our unmanaged inner product, which as we may recall is an SSE2 version. There are some things we should note: this method cannot be inlined into the managed code, and there are no frame pointers (they got optimized out).</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">00000001</span>`800019c3 0f100a          <span style="color: #00007f; font-weight: bold;">movups</span>  <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span>xmmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019c6 0f59c8          <span style="color: #00007f; font-weight: bold;">mulps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019c9 0f28c1          <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019cc 0fc6c14e        <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #0000ff;">4Eh</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019d0 0f58c8          <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019d3 0f28c1          <span style="color: #00007f; font-weight: bold;">movaps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019d6 0fc6c11b        <span style="color: #00007f; font-weight: bold;">shufps</span>  <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #0000ff;">1Bh</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019da 0f58c1          <span style="color: #00007f; font-weight: bold;">addps</span>   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`800019dd f3410f1100      movss   <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r8<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000001</span>`<span style="color: #0000ff;">800019e2</span> c3              <span style="color: #00007f; font-weight: bold;">ret</span></pre></div></div>
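
<p>To make the two shuffles easier to follow, here is a rough Python emulation of the listing above, treating each XMM register as a list of four floats. The helper names (mulps, addps, shufps, inner_product) are invented here and simply mirror the instructions. The sketch also assumes that xmm0 already holds the first vector when the listing begins, since that load is not shown.</p>

```python
def mulps(a, b):
    # Packed multiply: lane-by-lane product.
    return [x * y for x, y in zip(a, b)]


def addps(a, b):
    # Packed add: lane-by-lane sum.
    return [x + y for x, y in zip(a, b)]


def shufps(dest, src, imm8):
    # The low two result lanes are picked from dest, the high two from src,
    # each selected by a 2-bit field of the immediate.
    return [
        dest[imm8 & 3],
        dest[(imm8 >> 2) & 3],
        src[(imm8 >> 4) & 3],
        src[(imm8 >> 6) & 3],
    ]


def inner_product(a, b):
    xmm0 = list(a)                   # assumed: first vector already in xmm0
    xmm1 = list(b)                   # movups xmm1, [rdx]
    xmm1 = mulps(xmm1, xmm0)         # (p0, p1, p2, p3)
    xmm0 = list(xmm1)                # movaps xmm0, xmm1
    xmm0 = shufps(xmm0, xmm1, 0x4E)  # (p2, p3, p0, p1)
    xmm1 = addps(xmm1, xmm0)         # (p0+p2, p1+p3, p0+p2, p1+p3)
    xmm0 = list(xmm1)                # movaps xmm0, xmm1
    xmm0 = shufps(xmm0, xmm1, 0x1B)  # xmm1 with its lanes reversed
    xmm0 = addps(xmm0, xmm1)         # every lane now holds the full sum
    return xmm0[0]                   # movss [r8], xmm0 stores lane 0


print(inner_product([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # 70.0
```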

<p>The code used to produce the managed version shown below has undergone a slight modification. The method no longer returns a float; instead it has an out parameter to a float, which ends up holding the result of the operation. This change was made to eliminate some compilation issues in both the managed and unmanaged versions. In the case of the managed version below, without the out parameter the store operation (at 00000642`801673b3) would have required a conversion to a double and back to a single again. The new versions are shown at the end of this post. Examining the managed inner product we get a somewhat worse picture:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">8016732f</span> 4c8b4908        <span style="color: #00007f; font-weight: bold;">mov</span>     r9<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rcx<span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167333</span> 4d85c9          <span style="color: #00007f; font-weight: bold;">test</span>    r9<span style="color: #339933;">,</span>r9
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167336</span> 0f8684000000    <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`8016733c f30f104110      movss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rcx<span style="color: #339933;">+</span><span style="color: #0000ff;">10h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167341</span> 488b4208        <span style="color: #00007f; font-weight: bold;">mov</span>     rax<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167345</span> 4885c0          <span style="color: #00007f; font-weight: bold;">test</span>    rax<span style="color: #339933;">,</span>rax
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167348</span> <span style="color: #0000ff;">7676</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`8016734a f30f104a10      movss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">10h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">8016734f</span> f30f59c8        mulss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167353</span> 4983f901        <span style="color: #00007f; font-weight: bold;">cmp</span>     r9<span style="color: #339933;">,</span><span style="color: #0000ff;">1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167357</span> <span style="color: #0000ff;">7667</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167359</span> f30f105114      movss   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rcx<span style="color: #339933;">+</span><span style="color: #0000ff;">14h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8016735e 483d01000000    <span style="color: #00007f; font-weight: bold;">cmp</span>     rax<span style="color: #339933;">,</span><span style="color: #0000ff;">1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167364</span> 765a            <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167366</span> f30f104214      movss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">14h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8016736b f30f59c2        mulss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">8016736f</span> f30f58c1        addss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167373</span> 4983f902        <span style="color: #00007f; font-weight: bold;">cmp</span>     r9<span style="color: #339933;">,</span><span style="color: #0000ff;">2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167377</span> <span style="color: #0000ff;">7647</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167379</span> f30f105118      movss   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rcx<span style="color: #339933;">+</span><span style="color: #0000ff;">18h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8016737e 483d02000000    <span style="color: #00007f; font-weight: bold;">cmp</span>     rax<span style="color: #339933;">,</span><span style="color: #0000ff;">2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167384</span> 763a            <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167386</span> f30f104a18      movss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">18h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8016738b f30f59ca        mulss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">8016738f</span> f30f58c8        addss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167393</span> 4983f903        <span style="color: #00007f; font-weight: bold;">cmp</span>     r9<span style="color: #339933;">,</span><span style="color: #0000ff;">3</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167397</span> <span style="color: #0000ff;">7627</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">80167399</span> f30f10511c      movss   <span style="color: #00007f;">xmm2</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rcx<span style="color: #339933;">+</span><span style="color: #0000ff;">1Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8016739e 483d03000000    <span style="color: #00007f; font-weight: bold;">cmp</span>     rax<span style="color: #339933;">,</span><span style="color: #0000ff;">3</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`801673a4 761a            <span style="color: #00007f; font-weight: bold;">jbe</span>     <span style="color: #0000ff;">00000642</span>`801673c0
<span style="color: #adadad; font-style: italic;">00000642</span>`801673a6 f30f10421c      movss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">1Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`801673ab f30f59c2        mulss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`801673af f30f58c1        addss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`801673b3 f3410f114040    movss   <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r8<span style="color: #339933;">+</span><span style="color: #0000ff;">40h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #339933;">.</span>
<span style="color: #339933;">.</span>
<span style="color: #339933;">.</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`801673bd f3c3            <span style="color: #00007f; font-weight: bold;">rep</span> <span style="color: #00007f; font-weight: bold;">ret</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`801673bf <span style="color: #0000ff;">90</span>              <span style="color: #00007f; font-weight: bold;">nop</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`801673c0 e88b9f8aff      <span style="color: #00007f; font-weight: bold;">call</span>    mscorwks!JIT_RngChkFail <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`7fa11350<span style="color: #009900; font-weight: bold;">&#41;</span></pre></div></div>

<p>Wow! Lots of conditionals there. It&#8217;s not vectorized either, but we don&#8217;t expect it to be: automatic vectorization is hit-and-miss even with most optimizing compilers (like the Intel one), and vectorizing in the runtime JIT would take up far too much time. This method is inlined for us (thankfully), but it is littered with conditionals and jumps. So where are they jumping to? They all end up just after the end of the method. Note the nop instruction that makes the jump destination paragraph (16-byte) aligned; that is intentional. As you can probably guess from the name at the jump destination, those conditionals are checking the array bounds, stored in r9 and rax, against the indices being used. Those jumps aren&#8217;t especially friendly to branch prediction. For the most part they won&#8217;t hamper the speed of this method much, but they are an additional cost. Unfortunately, they are rather problematic for the matrix version, where they tend to cost quite a bit in performance.</p>
<p>We can also see that in x64 mode the JIT uses SSE2 for floating point operations. This is quite nice, but it has some interesting consequences. For instance, comparing floating point numbers generated using the FPU against those generated using SSE2 will more than likely fail, EVEN IF you truncate them to their appropriate sizes. The reason is that the XMM registers (when using the single versions of the instructions and not the double ones) store the floating point values as exactly 32 bit floats. The FPU, however, expands them to 80 bit floats, so operations on those 80 bit floats before truncation can change the low bits of the 32 bit result. If you are wondering when this might become an issue, imagine running a managed networked game where 64 bit and 32 bit clients all send packets to the server. This is just another reason why you should be using deltas for comparison of floats. Another thing to note is that with SSE2 support came the ability to use instructions that save us loads and stores, such as cvtss2sd and cvtsd2ss, which perform single to double and double to single conversions respectively.</p>
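<p>To make the &#8220;use deltas&#8221; advice concrete, here is a minimal C++ sketch of a tolerance-based comparison. The helper name nearly_equal and both tolerance defaults are assumptions picked for illustration, not anything from the benchmark code; the right tolerances depend on the magnitudes your application works with.</p>

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not part of the article's test code): treats two
// floats as equal when they differ by no more than a tolerance that scales
// with their magnitude, with a small absolute floor for values near zero.
// Both default tolerances are illustrative assumptions.
bool nearly_equal(float a, float b, float rel_tol = 1e-5f, float abs_tol = 1e-8f) {
    const float diff = std::fabs(a - b);
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}
```

<p>A server receiving a value computed by a 32 bit client and a 64 bit client would then compare the two with nearly_equal(x, y) rather than x == y.</p>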
<p><strong>Examining the Call Stack</strong></p>
<p>Of course, there is also the question of exactly what our program goes through to call our unmanaged methods. First off, the JIT has to generate several marshalling stubs (to deal with any non-blittable types, although in this case all of the passed types are blittable), along with the security demands. The total number of machine instructions for these stubs is around 10-30; nevertheless, they aren&#8217;t inlinable and have to be created at runtime. The extra overhead of these calls can add up to quite a bit. First up we&#8217;ll look at the pinvoke and the delegate stacks:</p>
<pre>000006427f66bd14 ManagedMathLib!matrix_mul
0000064280168b85 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78
0000064280168ccc ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single[], Single[], Single[])+0xb5
0000064280168a0f PInvokeTest!SecurityILStubClass.IL_STUB(Single[], Single[], Single[])+0x5c
000006428016893e PInvokeTest!PInvokeTest.Program+&lt;&gt;c__DisplayClass8.&lt;main&gt;b__0()+0x1f
0000064280167ca1 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x6e
000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x591
000006427f66bd14 ManagedMathLib!matrix_mul
0000064280168465 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78
00000642801685c1 ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single[], Single[], Single[])+0xb5
0000064280168945 PInvokeTest!SecurityILStubClass.IL_STUB(Single[], Single[], Single[])+0x51
0000064280167d59 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x75
000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x649
</pre>
<p>We can see the two stubs that were created, along with the last method called, DoNDirectCall__PatchGetThreadCall, which actually does the work of calling our unmanaged function. Exactly what it does is probably what the name says, although I haven&#8217;t actually dug in and tried to find out what&#8217;s going on in the internals of it. One important thing to notice is the PInvokeTest!PInvokeTest.Program+&lt;&gt;c__DisplayClass8.&lt;main&gt;b__0() call, which is actually a delegate used to call our unmanaged method (passed in to TimeTest). By using the delegate to call the matrix multiplication function, the JIT was able to eliminate that call entirely, as the second stack shows. Other than that, the contents of the two sets of stubs are practically identical. The security stub asserts that we have the right to call unmanaged code; since this is a security demand and can change at runtime, it cannot be eliminated. Calling our unmanaged function from the managed DLL is up next, and it turns out that this is also the most direct call:</p>
<pre>000006427f66bf32 ManagedMathLib!matrix_mul
0000064280169601 mscorwks!DoNDirectCallWorker+0x62
00000642801694ef ManagedMathLib!ManagedMathLib.ManagedMath.MatrixMul(Single[], Single[], Single[])+0xd1
0000064280168945 PInvokeTest!PInvokeTest.Program+&lt;&gt;c__DisplayClass8.&lt;main&gt;b__3()+0x1f
0000064280167ecf PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x75
000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0x7bf
</pre>
<p>As we can see, the only real work that is done to call our unmanaged method is the call to DoNDirectCallWorker. Digging around in that method we find that it is basically a wrapper that saves registers, sets up some registers and then dispatches to the unmanaged function. Upon returning it restores the registers and returns to the caller. There is no dynamic method construction, nor does this require any extra overhead on our end. In fact, one could say that the code is about as fast as we can expect it to be for a managed to unmanaged transition. Looking at the difference between the original unmanaged inner product call and the new version (which takes a pointer to the destination float), both being made from the managed DLL, we can see a huge difference:</p>
<pre>000006427f66bf32 ManagedMathLib!inner_product
0000064280169bd0 mscorwks!DoNDirectCallWorker+0x62
0000064280169acf ManagedMathLib!ManagedMathLib.ManagedMath.InnerProduct(Single[], Single[], Single ByRef)+0xc0
0000064280168955 PInvokeTest!PInvokeTest.Program+&lt;&gt;c__DisplayClass8.&lt;main&gt;b__7()+0x1f
00000642801681c5 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x75
000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0xab5
000006427f66bd14 ManagedMathLib!inner_product
0000064280169ca3 mscorwks!DoNDirectCall__PatchGetThreadCall+0x78
0000064280169ba0 ManagedMathLib!DomainBoundILStubClass.IL_STUB(Single*, Single*)+0x43
0000064280169b00 ManagedMathLib!ManagedMathLib.ManagedMath.InnerProduct(Single[], Single[])+0x50
000006428016893e PInvokeTest!PInvokeTest.Program+&lt;&gt;c__DisplayClass8.&lt;main&gt;b__7()+0x20
00000642801681c5 PInvokeTest!PInvokeTest.Program.TimeTest(TestMethod, Int32)+0x6e
000006427f66c5e2 PInvokeTest!PInvokeTest.Program.Main(System.String[])+0xab5
</pre>
<p>Notice the second call stack has the marshalling stub (also note the parameters to the stub). Returning value types has all sorts of interesting consequences. By changing the signature to write out to a float (in the case of the managed DLL it uses an out parameter), we eliminate the marshalling stub entirely. This improves performance by a decent bit, but nowhere near enough to make up for the call in the first place. The managed inner product is still significantly faster.</p>
<p><strong>And then came NGEN</strong></p>
<p>So, we&#8217;ve gone through and optimized our managed application, yet it is still running too slow. We contemplate the necessity of moving some code over to the unmanaged world and shudder at the implications. Security would be shot, bugs abound&#8230;what to do! But then we remember that there&#8217;s one more option: NGEN!</p>
<p>Running NGEN on our test executable prejitted the whole thing, even methods that eventually ended up being inlined. So, what did it do to our managed inner product? Well first we&#8217;ll look at the actual method that got prejitted:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;">PInvokeTest<span style="color: #339933;">.</span>Program<span style="color: #339933;">.</span>InnerProduct2<span style="color: #009900; font-weight: bold;">&#40;</span>Single<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> Single<span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span> Single ByRef<span style="color: #009900; font-weight: bold;">&#41;</span>
Begin <span style="color: #0000ff;">0000064288003290</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">size</span> b0
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003290</span> 4883ec28        <span style="color: #00007f; font-weight: bold;">sub</span>     rsp<span style="color: #339933;">,</span><span style="color: #0000ff;">28h</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003294</span> 4c8bc9          <span style="color: #00007f; font-weight: bold;">mov</span>     r9<span style="color: #339933;">,</span>rcx
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003297</span> 498b4108        <span style="color: #00007f; font-weight: bold;">mov</span>     rax<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r9<span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8800329b 4885c0          <span style="color: #00007f; font-weight: bold;">test</span>    rax<span style="color: #339933;">,</span>rax
<span style="color: #adadad; font-style: italic;">00000642</span>`8800329e 0f8696000000    <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032a4 33c9            <span style="color: #00007f; font-weight: bold;">xor</span>     <span style="color: #00007f;">ecx</span><span style="color: #339933;">,</span><span style="color: #00007f;">ecx</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032a6 488b4a08        <span style="color: #00007f; font-weight: bold;">mov</span>     rcx<span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032aa 4885c9          <span style="color: #00007f; font-weight: bold;">test</span>    rcx<span style="color: #339933;">,</span>rcx
<span style="color: #adadad; font-style: italic;">00000642</span>`880032ad 0f8687000000    <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032b3 4533d2          <span style="color: #00007f; font-weight: bold;">xor</span>     r10d<span style="color: #339933;">,</span>r10d
<span style="color: #adadad; font-style: italic;">00000642</span>`880032b6 483d01000000    <span style="color: #00007f; font-weight: bold;">cmp</span>     rax<span style="color: #339933;">,</span><span style="color: #0000ff;">1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032bc 767c            <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032be 41ba01000000    <span style="color: #00007f; font-weight: bold;">mov</span>     r10d<span style="color: #339933;">,</span><span style="color: #0000ff;">1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032c4 4883f901        <span style="color: #00007f; font-weight: bold;">cmp</span>     rcx<span style="color: #339933;">,</span><span style="color: #0000ff;">1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032c8 <span style="color: #0000ff;">7670</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032ca 41ba01000000    <span style="color: #00007f; font-weight: bold;">mov</span>     r10d<span style="color: #339933;">,</span><span style="color: #0000ff;">1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032d0 483d02000000    <span style="color: #00007f; font-weight: bold;">cmp</span>     rax<span style="color: #339933;">,</span><span style="color: #0000ff;">2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032d6 <span style="color: #0000ff;">7662</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032d8 41ba02000000    <span style="color: #00007f; font-weight: bold;">mov</span>     r10d<span style="color: #339933;">,</span><span style="color: #0000ff;">2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032de 4883f902        <span style="color: #00007f; font-weight: bold;">cmp</span>     rcx<span style="color: #339933;">,</span><span style="color: #0000ff;">2</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">880032e2</span> <span style="color: #0000ff;">7656</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">880032e4</span> 483d03000000    <span style="color: #00007f; font-weight: bold;">cmp</span>     rax<span style="color: #339933;">,</span><span style="color: #0000ff;">3</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032ea 764e            <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032ec b803000000      <span style="color: #00007f; font-weight: bold;">mov</span>     <span style="color: #00007f;">eax</span><span style="color: #339933;">,</span><span style="color: #0000ff;">3</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032f1 4883f903        <span style="color: #00007f; font-weight: bold;">cmp</span>     rcx<span style="color: #339933;">,</span><span style="color: #0000ff;">3</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032f5 <span style="color: #0000ff;">7643</span>            <span style="color: #00007f; font-weight: bold;">jbe</span>     PInvokeTest_ni!COM<span style="color: #339933;">+</span>_Entry_Point &lt;perf&gt; <span style="color: #009900; font-weight: bold;">&#40;</span>PInvokeTest_ni<span style="color: #339933;">+</span><span style="color: #0000ff;">0x333a</span><span style="color: #009900; font-weight: bold;">&#41;</span> <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`8800333a<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032f7 f30f104a14      movss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">14h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`880032fc f3410f594914    mulss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r9<span style="color: #339933;">+</span><span style="color: #0000ff;">14h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003302</span> f30f104210      movss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">10h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003307</span> f3410f594110    mulss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r9<span style="color: #339933;">+</span><span style="color: #0000ff;">10h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8800330d f30f58c8        addss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003311</span> f30f104218      movss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">18h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003316</span> f3410f594118    mulss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r9<span style="color: #339933;">+</span><span style="color: #0000ff;">18h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8800331c f30f58c8        addss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003320</span> f30f10421c      movss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>rdx<span style="color: #339933;">+</span><span style="color: #0000ff;">1Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003325</span> f3410f59411c    mulss   <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r9<span style="color: #339933;">+</span><span style="color: #0000ff;">1Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8800332b f30f58c8        addss   <span style="color: #00007f;">xmm1</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">8800332f</span> f3410f1108      movss   <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span>r8<span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm1</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003334</span> 4883c428        <span style="color: #00007f; font-weight: bold;">add</span>     rsp<span style="color: #339933;">,</span><span style="color: #0000ff;">28h</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">88003338</span> f3c3            <span style="color: #00007f; font-weight: bold;">rep</span> <span style="color: #00007f; font-weight: bold;">ret</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`8800333a e811e0a0f7      <span style="color: #00007f; font-weight: bold;">call</span>    mscorwks!JIT_RngChkFail <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">00000642</span>`7fa11350<span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00000642</span>`<span style="color: #0000ff;">8800333f</span> <span style="color: #0000ff;">90</span>              <span style="color: #00007f; font-weight: bold;">nop</span></pre></div></div>

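<p>Notice that every range check in the listing above runs before the first movss. As a rough C++ analogue of that shape (validate once, then do straight-line arithmetic), here is a small sketch. The function name and the use of std::vector are assumptions made for illustration; this is not what the CLR actually generates the code from.</p>

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical illustration: validate both lengths once, up front, then run
// the four multiply-adds with no further checks in between. The exception
// plays the role that JIT_RngChkFail plays in the listing.
float inner_product4_checked_once(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    constexpr std::size_t kCount = 4;
    if (a.size() < kCount || b.size() < kCount) {
        throw std::out_of_range("inner product needs at least 4 elements per vector");
    }
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```
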
<p>Interesting results, eh? First off, all of the checks are right up front, and ignoring the stack frames we can see exactly what will be inlined. This method looks a lot better than before, with all of the branches right at the top where one would assume branch prediction can best deal with them (the registers never change and are being compared to constants). Nevertheless, there are some oddities in this code; for instance, there appear to be some extraneous instructions like mov eax,3. Yeah, don&#8217;t ask me. Still, the code is clearly superior to its previous form, and the matrix version is improved just as much, with the range checks being spaced out significantly more (and a bunch are done right up front as well). Of course, the question now is: how much does this help our performance? First up we&#8217;ll examine some results from the new code base, and then some from the NGEN results on the same code base.</p>
<pre>Count: 50
PInvoke MatrixMul : 00:00:07.6456226 Average: 00:00:00.1529124
Delegate MatrixMul: 00:00:06.6500307 Average: 00:00:00.1330006
Managed MatrixMul: 00:00:05.5783511 Average: 00:00:00.1115670
Internal MatrixMul: 00:00:04.5377141 Average: 00:00:00.0907542
PInvoke Inner Product: 00:00:05.4466987 Average: 00:00:00.1089339
Delegate Inner Product: 00:00:04.5001885 Average: 00:00:00.0900037
Managed Inner Product: 00:00:00.5535891 Average: 00:00:00.0110717
Internal Inner Product: 00:00:02.2694728 Average: 00:00:00.0453894
Count: 10
PInvoke MatrixMul : 00:00:01.5706254 Average: 00:00:00.1570625
Delegate MatrixMul: 00:00:01.2689247 Average: 00:00:00.1268924
Managed MatrixMul: 00:00:01.1501118 Average: 00:00:00.1150111
Internal MatrixMul: 00:00:00.9302144 Average: 00:00:00.0930214
PInvoke Inner Product: 00:00:01.0198933 Average: 00:00:00.1019893
Delegate Inner Product: 00:00:00.8538827 Average: 00:00:00.0853882
Managed Inner Product: 00:00:00.0987369 Average: 00:00:00.0098736
Internal Inner Product: 00:00:00.4287660 Average: 00:00:00.0428766
</pre>
<p>All in all, our performance changes have helped out the managed inner product a decent amount, although even the unmanaged calls managed to get a bit of a boost. Now for the NGEN results:</p>
<pre>Count: 50
PInvoke MatrixMul : 00:00:07.5788052 Average: 00:00:00.1515761
Delegate MatrixMul: 00:00:06.2202549 Average: 00:00:00.1244050
Managed MatrixMul: 00:00:04.0376665 Average: 00:00:00.0807533
Internal MatrixMul: 00:00:04.5778189 Average: 00:00:00.0915563
PInvoke Inner Product: 00:00:05.2785764 Average: 00:00:00.1055715
Delegate Inner Product: 00:00:04.1814388 Average: 00:00:00.0836287
Managed Inner Product: 00:00:00.5579279 Average: 00:00:00.0111585
Internal Inner Product: 00:00:02.2419279 Average: 00:00:00.0448385
Count: 10
PInvoke MatrixMul : 00:00:01.3822036 Average: 00:00:00.1382203
Delegate MatrixMul: 00:00:01.1436108 Average: 00:00:00.1143610
Managed MatrixMul: 00:00:00.7386742 Average: 00:00:00.0738674
Internal MatrixMul: 00:00:00.8427460 Average: 00:00:00.0842746
PInvoke Inner Product: 00:00:00.9507331 Average: 00:00:00.0950733
Delegate Inner Product: 00:00:00.7428082 Average: 00:00:00.0742808
Managed Inner Product: 00:00:00.1005084 Average: 00:00:00.0100508
Internal Inner Product: 00:00:00.4025611 Average: 00:00:00.0402561
</pre>
<p>So, now we can see that our unmanaged matrix multiplication doesn&#8217;t offer any advantage over the managed version; in fact, it&#8217;s actually SLOWER than the managed version! We can also see that the unmanaged invocations benefitted from the NGEN process, as their managed callers were optimized somewhat, although the stub wrappers are still there and hence still add their overhead. Another thing to note is that the inner product function appears to have slowed down just a bit. This might be nothing, it might be due to machine load, or it might genuinely be slower. I&#8217;m tempted to say that it&#8217;s actually slower now, though.</p>
<p><strong>Conclusion</strong></p>
<p>You may recall that this was all sparked by a discussion I had way back when about comparing managed and unmanaged benchmarks and the disadvantages of just setting the /clr flag. I&#8217;ve gone a bit past that, though, in looking at our managed resources and optimized unmanaged resources, and at when it is actually beneficial to call into unmanaged code. It is still beneficial to do so, but only for operations that are taxing enough to justify the cost of the call. In this case, the native matrix code clearly beat the JIT-produced code in a pure JIT situation, but once NGEN is involved it gets beaten by the managed version. So what is sufficiently taxing, then? Well, set processing might be taxing enough. That is: applying a set of vectorized operations to a collection of objects. But the reality is that you MUST profile first before you can be sure that optimizations of that sort are anywhere near what you need; if you just assume they are, you&#8217;re probably mistaken.</p>
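<p>As a rough illustration of what set processing could look like on the native side, the sketch below computes many four-component inner products in a single call, so that the managed-to-unmanaged transition is paid once per batch rather than once per vector pair. The name inner_product_batch and the packed array layout are assumptions made for this illustration and are not part of the benchmark code; the loop body is plain scalar C++ and could be swapped for the SSE version.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical batch entry point (not from the benchmark sources).
// v1 and v2 each hold `count` vectors of four floats, packed back to back;
// out receives one inner product per vector pair.
void inner_product_batch(float const* v1, float const* v2, float* out, std::size_t count) {
    assert(count == 0 || (v1 != nullptr && v2 != nullptr && out != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        float const* a = v1 + 4 * i;  // start of the i-th vector in v1
        float const* b = v2 + 4 * i;  // start of the i-th vector in v2
        out[i] = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    }
}
```

<p>Whether a given batch is large enough to pay for the transition is still something to measure rather than assume.</p>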
<p>On a final note, on x86 the NGENed managed version also performs better than the native version, although, in a surprise jump, the delegates actually cost significantly more:</p>
<pre>Count: 50
PInvoke MatrixMul : 00:00:07.9897235 Average: 00:00:00.1597944
Delegate MatrixMul: 00:00:27.2561396 Average: 00:00:00.5451227
Managed MatrixMul: 00:00:03.5224029 Average: 00:00:00.0704480
Internal MatrixMul: 00:00:04.5232549 Average: 00:00:00.0904650
PInvoke Inner Product: 00:00:05.5799834 Average: 00:00:00.1115996
Delegate Inner Product: 00:00:29.5660003 Average: 00:00:00.5913200
Managed Inner Product: 00:00:00.5755690 Average: 00:00:00.0115113
Internal Inner Product: 00:00:01.8218949 Average: 00:00:00.0364378
</pre>
<p>I haven&#8217;t investigated exactly why this is; perhaps I will next time.</p>
<p>Sources for the new inner product functions:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">void</span> __declspec<span style="color: #008000;">&#40;</span>dllexport<span style="color: #008000;">&#41;</span> inner_product<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> v1, <span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> v2, <span style="color: #0000ff;">float</span><span style="color: #000040;">*</span> out<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        __m128 a <span style="color: #000080;">=</span> _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_loadu_ps<span style="color: #008000;">&#40;</span>v1<span style="color: #008000;">&#41;</span>, _mm_loadu_ps<span style="color: #008000;">&#40;</span>v2<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        a <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>a, _mm_shuffle_ps<span style="color: #008000;">&#40;</span>a, a, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        _mm_store_ss<span style="color: #008000;">&#40;</span>out, _mm_add_ps<span style="color: #008000;">&#40;</span>a, _mm_shuffle_ps<span style="color: #008000;">&#40;</span>a, a, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">static</span> <span style="color: #0000ff;">void</span> InnerProduct<span style="color: #008000;">&#40;</span>array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span><span style="color: #000040;">^</span> v1, array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span><span style="color: #000040;">^</span> v2, <span style="color: #008000;">&#91;</span>Runtime<span style="color: #008080;">::</span><span style="color: #007788;">InteropServices</span><span style="color: #008080;">::</span><span style="color: #007788;">Out</span><span style="color: #008000;">&#93;</span> float<span style="color: #000040;">%</span> result<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> pv1 <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> pv2 <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> out <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>result<span style="color: #008080;">;</span>
&nbsp;
        inner_product<span style="color: #008000;">&#40;</span>pv1, pv2, out<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">public</span> <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">void</span> InnerProduct2<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> v1, <span style="color: #0000ff;">float</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> v2, out <span style="color: #0000ff;">float</span> f<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        f <span style="color: #000080;">=</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/06/playing-with-the-net-jit-part-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Playing With The .NET JIT Part 3</title>
		<link>http://scapecode.com/2009/06/playing-with-the-net-jit-part-3/</link>
		<comments>http://scapecode.com/2009/06/playing-with-the-net-jit-part-3/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 18:57:54 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Unamanged Code]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=17</guid>
		<description><![CDATA[Integrating unmanaged code into the managed platform is one of the problem areas with the managed world. Oftentimes the exact costs of calling into unmanaged code are unknown. This obviously leads to some confusion as to when it is appropriate to mix in unmanaged code to help improve the performance of our application. [...]]]></description>
			<content:encoded><![CDATA[<p>Integrating unmanaged code into the managed platform is one of the problem areas with the managed world. Oftentimes the exact costs of calling into unmanaged code are unknown. This obviously leads to some confusion as to when it is appropriate to mix in unmanaged code to help improve the performance of our application.</p>
<p><strong>PInvoke</strong></p>
<p>There are three ways to access an unmanaged function from managed code. The first is to use the PInvoke capabilities of the language. In C# this is done by declaring a method with external linkage and indicating (using the DllImportAttribute attribute) in which DLL the method may be found. The second way is to obtain a pointer to the function (using LoadLibrary/GetProcAddress/FreeLibrary) and marshal that pointer to a managed delegate using Marshal.GetDelegateForFunctionPointer. Finally, you can write a managed wrapper around the function using C++/CLI and invoke that managed method, which will in turn call the unmanaged method.</p>
<p>For the purposes of this post we&#8217;ll be using two mathematical sample functions. The first is the standard inner product (aka the dot product) of two four-component vectors, and the second is a 4&#215;4 matrix multiplication. We&#8217;ll be comparing two implementations of each: the first is a trivial managed implementation, and the second is an SSE2 optimized version. Thanks must be given to Arseny Kapoulkine for the SSE2 version of the matrix multiplication.</p>
<p>First up are the implementations of the inner product functions. It should be noted that I&#8217;ll be doing the profiling in x64 mode; however, the results are similar (albeit a bit slower) for x86.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">public</span> <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">float</span> InnerProduct2<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> v1, <span style="color: #0000ff;">float</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> v2<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        <span style="color: #0000ff;">return</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">float</span> __declspec<span style="color: #008000;">&#40;</span>dllexport<span style="color: #008000;">&#41;</span> inner_product<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> v1, <span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> v2<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        <span style="color: #0000ff;">float</span> result<span style="color: #008080;">;</span>
        __m128 a <span style="color: #000080;">=</span> _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_loadu_ps<span style="color: #008000;">&#40;</span>v1<span style="color: #008000;">&#41;</span>, _mm_loadu_ps<span style="color: #008000;">&#40;</span>v2<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        a <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>a, _mm_shuffle_ps<span style="color: #008000;">&#40;</span>a, a, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        _mm_store_ss<span style="color: #008000;">&#40;</span><span style="color: #000040;">&amp;</span>result, _mm_add_ps<span style="color: #008000;">&#40;</span>a, _mm_shuffle_ps<span style="color: #008000;">&#40;</span>a, a, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #0000ff;">return</span> result<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>A few things should be noted about these implementations. Both operate solely on arrays of floats. InnerProduct2 is inlineable since it&#8217;s only 23 bytes long and takes reference types as parameters. The unmanaged inner product could also be implemented using the SSE3 haddps instruction; however, I decided to keep it as processor neutral as possible by using only SSE2 instructions.</p>
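<p>To make the two shuffle-and-add steps in inner_product easier to follow, here is a scalar model of them in portable C++. It is only an illustration: Vec4, shuffle, add, mul and inner_product_model are hypothetical names introduced for this sketch. The shuffle helper follows the documented lane selection of _mm_shuffle_ps, with the immediates written out as the lane indices that _MM_SHUFFLE(1, 0, 3, 2) and _MM_SHUFFLE(0, 1, 2, 3) encode.</p>

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Models _mm_shuffle_ps(a, b, _MM_SHUFFLE(z, y, x, w)):
// result lanes 0 and 1 are picked from a, lanes 2 and 3 from b.
Vec4 shuffle(Vec4 const& a, Vec4 const& b,
             std::size_t z, std::size_t y, std::size_t x, std::size_t w) {
    return Vec4{a[w], a[x], b[y], b[z]};
}

// Lane-wise addition, as _mm_add_ps does.
Vec4 add(Vec4 const& a, Vec4 const& b) {
    return Vec4{a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// Lane-wise multiplication, as _mm_mul_ps does.
Vec4 mul(Vec4 const& a, Vec4 const& b) {
    return Vec4{a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]};
}

float inner_product_model(Vec4 const& v1, Vec4 const& v2) {
    Vec4 a = mul(v1, v2);                   // p0, p1, p2, p3
    a = add(a, shuffle(a, a, 1, 0, 3, 2));  // halves swapped: lane 0 = p0 + p2, lane 1 = p1 + p3
    a = add(a, shuffle(a, a, 0, 1, 2, 3));  // lanes reversed: lane 0 = (p0 + p2) + (p3 + p1)
    return a[0];                            // _mm_store_ss writes only lane 0
}
```

<p>After the second addition every lane holds the full sum, which is why storing only the lowest lane is enough.</p>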
<p>The implementations of the matrix multiplication vary quite significantly as well. The managed version is the trivial implementation, but its expansion into machine code is quite long. The unmanaged version is an SSE2 optimized one, and the raw performance boost of using it is quite significant.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">public</span> <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">void</span> MatrixMul2<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> m1, <span style="color: #0000ff;">float</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> m2, <span style="color: #0000ff;">float</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> o<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">4</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">8</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">5</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">9</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">6</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">10</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
        o<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">12</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">3</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">13</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">7</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">14</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">11</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">15</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">void</span> __declspec<span style="color: #008000;">&#40;</span>dllexport<span style="color: #008000;">&#41;</span> matrix_mul<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> m1, <span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> m2, <span style="color: #0000ff;">float</span><span style="color: #000040;">*</span> out<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
        __m128 r<span style="color: #008080;">;</span>
&nbsp;
        __m128 col1 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 col2 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2 <span style="color: #000040;">+</span> <span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 col3 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2 <span style="color: #000040;">+</span> <span style="color: #0000dd;">8</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 col4 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2 <span style="color: #000040;">+</span> <span style="color: #0000dd;">12</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        __m128 row1 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
               _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        __m128 row2 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1 <span style="color: #000040;">+</span> <span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
               _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out <span style="color: #000040;">+</span> <span style="color: #0000dd;">4</span>, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        __m128 row3 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1 <span style="color: #000040;">+</span> <span style="color: #0000dd;">8</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
               _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out <span style="color: #000040;">+</span> <span style="color: #0000dd;">8</span>, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        __m128 row4 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1 <span style="color: #000040;">+</span> <span style="color: #0000dd;">12</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
               _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out <span style="color: #000040;">+</span> <span style="color: #0000dd;">12</span>, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>It is trivially obvious that the managed version of the matrix multiplication cannot be inlined. The overhead of the function call is the least of your worries, though; it is the smallest cost in the entire method. The unmanaged version is a nicely optimized SSE2 method that requires only a minimal number of loads and stores from main memory, and those loads and stores are reasonably cache friendly (a P4 will prefetch 128 bytes of memory).</p>
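<p>To make the vectorized version easier to follow, here is a minimal sketch of what one of its four blocks does for a single output row, next to a scalar reference. The function names <code>row_mul_scalar</code> and <code>row_mul_sse</code> are made up for illustration and are not part of the DLL; the sketch assumes row-major 4x4 matrices and an x86/x64 compiler that provides <code>xmmintrin.h</code>.</p>

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; x86/x64 targets only

// Scalar reference: one output row of a row-major 4x4 product.
void row_mul_scalar(const float* row, const float* m2, float* out) {
    assert(row && m2 && out);
    for (int c = 0; c < 4; ++c) {
        out[c] = row[0] * m2[c] + row[1] * m2[4 + c]
               + row[2] * m2[8 + c] + row[3] * m2[12 + c];
    }
}

// SSE sketch: splat each element of the m1 row across a register,
// multiply it by the matching four floats of m2, and sum the products.
void row_mul_sse(const float* row, const float* m2, float* out) {
    assert(row && m2 && out);
    const __m128 v = _mm_loadu_ps(row);
    __m128 r = _mm_mul_ps(_mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0)),
                          _mm_loadu_ps(m2));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)),
                                 _mm_loadu_ps(m2 + 4)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)),
                                 _mm_loadu_ps(m2 + 8)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)),
                                 _mm_loadu_ps(m2 + 12)));
    _mm_storeu_ps(out, r);  // unaligned store, as in matrix_mul
}
```

<p>Calling both functions on the same row and matrix should give matching results. The unaligned load and store variants are used here for the same reason as in the original: nothing guarantees that the incoming pointers are 16-byte aligned.</p>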
<p><strong>PInvoke</strong></p>
<p>Of course, the question is how these perform against each other when called from a managed application. The profiling setup is quite simple: it runs each method against a set of randomly generated matrices and vectors a million times, repeats that test several more times (100 in this case), and averages the results. Full optimizations were turned on for both the unmanaged and managed tests. The Internal calls are made from a managed class that calls directly into the unmanaged methods. Both the managed wrapper and the unmanaged methods are hosted in the same DLL (source for the full DLL is at the end of this entry).</p>
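<p>The shape of that timing loop can be sketched as follows. This is not the harness used for the numbers in this entry (that one is managed code and is not shown here); it is a native C++ approximation of the same structure, and <code>TimingResult</code> and <code>time_runs</code> are hypothetical names introduced only for this sketch.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical result type: total time over all runs plus the per-run average.
struct TimingResult {
    double total_seconds;
    double average_seconds;
};

// Calls `op` `calls_per_run` times, repeats that `runs` times,
// and reports the total and the average time per run.
template <typename Op>
TimingResult time_runs(Op op, std::size_t runs, std::size_t calls_per_run) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (std::size_t r = 0; r < runs; ++r) {
        const auto start = clock::now();
        for (std::size_t i = 0; i < calls_per_run; ++i) {
            op(i);  // the method under test; i can select the input data
        }
        const auto stop = clock::now();
        total += std::chrono::duration<double>(stop - start).count();
    }
    return TimingResult{total, total / static_cast<double>(runs)};
}
```

<p>With <code>runs</code> set to 100 and <code>calls_per_run</code> set to one million, the two fields would correspond to the total and the &#8220;Average&#8221; columns in the results.</p>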
<pre>Count = 100
PInvoke MatrixMul : 00:00:15.0203285 Average: 00:00:00.1502032
Delegate MatrixMul: 00:00:13.1004306 Average: 00:00:00.1310043
Managed MatrixMul: 00:00:10.2809715 Average: 00:00:00.1028097
Internal MatrixMul: 00:00:08.8992407 Average: 00:00:00.0889924
PInvoke Inner Product: 00:00:10.6779944 Average: 00:00:00.1067799
Delegate Inner Product: 00:00:09.3359882 Average: 00:00:00.0933598
Managed Inner Product: 00:00:01.3460812 Average: 00:00:00.0134608
Internal Inner Product: 00:00:05.6842336 Average: 00:00:00.0568423
</pre>
<p>The first thing to note is that the PInvoke calls for both the matrix multiplication and the inner product were the slowest. The delegate calls were only slightly faster than the PInvoke calls. As we move into managed territory we find that the results begin to diverge. The managed matrix multiplication is slower than the internal matrix multiplication; however, the managed inner product is roughly four times faster than the internal one.</p>
<p>Part of the reason behind this divergence is the invocation framework. There is a cost to calling unmanaged methods from managed code, as each method must be wrapped to perform operations such as fixing any managed resources, performing marshalling for non-blittable types, and finally calling the actual native method. After the method returns, further marshalling of the return type may be required, along with checks on the condition of the stack and exception checks (SEH exceptions are caught and wrapped in the SEHException class). Even the internal calls to the unmanaged method require some amount of this, although the actual marshalling requirements are avoided, as are some of the other costs. The result is that the costs add up over time, and in the case of the inner product the additional cost outweighed the cost of the method body itself (which is fairly trivial). The matrix multiplication, on average, is a different story. The additional costs of the call do not add a significant amount of overhead compared to the body of the method, which executes faster than the managed matrix multiplication due to vectorization.</p>
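<p>Since each run is one million calls, the averages above convert directly into a cost per call. The small helper below is only a convenience for that arithmetic; <code>ns_per_call</code> and <code>extra_ns_per_call</code> are names invented for this sketch, and attributing the whole difference between two call paths to invocation overhead is an approximation.</p>

```cpp
#include <cassert>

// Converts the average time of one run (in seconds) into nanoseconds per call.
double ns_per_call(double average_run_seconds, double calls_per_run = 1e6) {
    assert(calls_per_run > 0.0);
    return average_run_seconds / calls_per_run * 1e9;
}

// Extra cost per call of the slower variant over the faster one, in nanoseconds.
double extra_ns_per_call(double slower_average, double faster_average) {
    return ns_per_call(slower_average) - ns_per_call(faster_average);
}
```

<p>Fed the averages from the 100-run results above, the internal inner product comes to about 56.8 ns per call against about 13.5 ns for the managed one, roughly 43 ns extra for a method whose body is only a handful of instructions. For the matrix multiplication, PInvoke and the internal call run the same native body, yet PInvoke costs about 61 ns more per call (150.2 ns against 89.0 ns).</p>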
<p>Further testing with lower counts (50, 25, 5, and 1) reveals similar results, although the managed matrix multiplication begins to approach the performance of the internal one. Even so, at a count of 1 (that&#8217;s one million matrix multiplications), the internal matrix multiplication is still faster than the managed version.</p>
<pre>Count = 50
PInvoke MatrixMul : 00:00:07.4730356 Average: 00:00:00.1494607
Delegate MatrixMul: 00:00:06.4519274 Average: 00:00:00.1290385
Managed MatrixMul: 00:00:05.1662482 Average: 00:00:00.1033249
Internal MatrixMul: 00:00:04.3371530 Average: 00:00:00.0867430
PInvoke Inner Product: 00:00:05.3891030 Average: 00:00:00.1077820
Delegate Inner Product: 00:00:04.7625597 Average: 00:00:00.0952511
Managed Inner Product: 00:00:00.6791549 Average: 00:00:00.0135830
Internal Inner Product: 00:00:02.6719175 Average: 00:00:00.0534383

Count = 25
PInvoke MatrixMul : 00:00:03.7432932 Average: 00:00:00.1497317
Delegate MatrixMul: 00:00:03.2074834 Average: 00:00:00.1282993
Managed MatrixMul: 00:00:02.6200096 Average: 00:00:00.1048003
Internal MatrixMul: 00:00:02.2144342 Average: 00:00:00.0885773
PInvoke Inner Product: 00:00:02.8778559 Average: 00:00:00.1151142
Delegate Inner Product: 00:00:02.0178957 Average: 00:00:00.0807158
Managed Inner Product: 00:00:00.3385675 Average: 00:00:00.0135427
Internal Inner Product: 00:00:01.4391529 Average: 00:00:00.0575661

Count = 5
PInvoke MatrixMul : 00:00:00.7642981 Average: 00:00:00.1528596
Delegate MatrixMul: 00:00:00.6407667 Average: 00:00:00.1281533
Managed MatrixMul: 00:00:00.5231416 Average: 00:00:00.1046283
Internal MatrixMul: 00:00:00.4458765 Average: 00:00:00.0891753
PInvoke Inner Product: 00:00:00.5702666 Average: 00:00:00.1140533
Delegate Inner Product: 00:00:00.4122217 Average: 00:00:00.0824443
Managed Inner Product: 00:00:00.0683842 Average: 00:00:00.0136768
Internal Inner Product: 00:00:00.2899304 Average: 00:00:00.0579860

Count = 1
PInvoke MatrixMul : 00:00:00.1476958 Average: 00:00:00.1476958
Delegate MatrixMul: 00:00:00.1337818 Average: 00:00:00.1337818
Managed MatrixMul: 00:00:00.1155993 Average: 00:00:00.1155993
Internal MatrixMul: 00:00:00.0919538 Average: 00:00:00.0919538
PInvoke Inner Product: 00:00:00.1155769 Average: 00:00:00.1155769
Delegate Inner Product: 00:00:00.0906768 Average: 00:00:00.0906768
Managed Inner Product: 00:00:00.0155480 Average: 00:00:00.0155480
Internal Inner Product: 00:00:00.0653527 Average: 00:00:00.0653527
</pre>
<p><strong>Conclusion</strong></p>
<p>Clearly we should reserve unmanaged operations for longer-running methods where the cost of the managed wrappers is negligible compared to the cost of the method itself. Even heavily optimized methods pay a significant cost in the wrapping code, so trivial optimizations are easily overshadowed by that cost. It is best to use unmanaged operations wrapped in a C++/CLI wrapper (preferably one that is part of the same library as the operations). Next time we&#8217;ll look at the assembly produced by the JIT for these methods under varying circumstances.</p>
<p>Source for Managed DLL:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#pragma managed(push, off)</span>
<span style="color: #339900;">#include &lt;intrin.h&gt;</span>
&nbsp;
<span style="color: #0000ff;">extern</span> <span style="color: #FF0000;">&quot;C&quot;</span> <span style="color: #008000;">&#123;</span>
        <span style="color: #0000ff;">float</span> __declspec<span style="color: #008000;">&#40;</span>dllexport<span style="color: #008000;">&#41;</span> inner_product<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> v1, <span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> v2<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
               <span style="color: #0000ff;">float</span> result<span style="color: #008080;">;</span>
               __m128 a <span style="color: #000080;">=</span> _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_loadu_ps<span style="color: #008000;">&#40;</span>v1<span style="color: #008000;">&#41;</span>, _mm_loadu_ps<span style="color: #008000;">&#40;</span>v2<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
               a <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>a, _mm_shuffle_ps<span style="color: #008000;">&#40;</span>a, a, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
               _mm_store_ss<span style="color: #008000;">&#40;</span><span style="color: #000040;">&amp;</span>result, _mm_add_ps<span style="color: #008000;">&#40;</span>a, _mm_shuffle_ps<span style="color: #008000;">&#40;</span>a, a, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
               <span style="color: #0000ff;">return</span> result<span style="color: #008080;">;</span>
        <span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">void</span> __declspec<span style="color: #008000;">&#40;</span>dllexport<span style="color: #008000;">&#41;</span> matrix_mul<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> m1, <span style="color: #0000ff;">float</span> <span style="color: #0000ff;">const</span><span style="color: #000040;">*</span> m2, <span style="color: #0000ff;">float</span><span style="color: #000040;">*</span> out<span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
        __m128 r<span style="color: #008080;">;</span>
&nbsp;
        __m128 col1 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 col2 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2 <span style="color: #000040;">+</span> <span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 col3 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2 <span style="color: #000040;">+</span> <span style="color: #0000dd;">8</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 col4 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m2 <span style="color: #000040;">+</span> <span style="color: #0000dd;">12</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        __m128 row1 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
               _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row1, row1, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 row2 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1 <span style="color: #000040;">+</span> <span style="color: #0000dd;">4</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
               _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row2, row2, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out <span style="color: #000040;">+</span> <span style="color: #0000dd;">4</span>, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 row3 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1 <span style="color: #000040;">+</span> <span style="color: #0000dd;">8</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
               _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row3, row3, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out <span style="color: #000040;">+</span> <span style="color: #0000dd;">8</span>, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        __m128 row4 <span style="color: #000080;">=</span> _mm_loadu_ps<span style="color: #008000;">&#40;</span>m1 <span style="color: #000040;">+</span> <span style="color: #0000dd;">12</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        r <span style="color: #000080;">=</span> _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span>, <span style="color: #0000dd;">0</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col1<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span>, <span style="color: #0000dd;">1</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col2<span style="color: #008000;">&#41;</span>,
               _mm_add_ps<span style="color: #008000;">&#40;</span>_mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span>, <span style="color: #0000dd;">2</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col3<span style="color: #008000;">&#41;</span>,
                _mm_mul_ps<span style="color: #008000;">&#40;</span>_mm_shuffle_ps<span style="color: #008000;">&#40;</span>row4, row4, _MM_SHUFFLE<span style="color: #008000;">&#40;</span><span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span>, <span style="color: #0000dd;">3</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>, col4<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
        _mm_storeu_ps<span style="color: #008000;">&#40;</span>out <span style="color: #000040;">+</span> <span style="color: #0000dd;">12</span>, r<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span>
<span style="color: #339900;">#pragma managed(pop)</span>
&nbsp;
<span style="color: #0000ff;">using</span> <span style="color: #0000ff;">namespace</span> System<span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">namespace</span> ManagedMathLib <span style="color: #008000;">&#123;</span>
        <span style="color: #0000ff;">public</span> ref <span style="color: #0000ff;">class</span> ManagedMath <span style="color: #008000;">&#123;</span>
        <span style="color: #0000ff;">public</span><span style="color: #008080;">:</span>
               <span style="color: #0000ff;">static</span> IntPtr InnerProductPtr <span style="color: #000080;">=</span> IntPtr<span style="color: #008000;">&#40;</span>inner_product<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
               <span style="color: #0000ff;">static</span> IntPtr MatrixMulPtr <span style="color: #000080;">=</span> IntPtr<span style="color: #008000;">&#40;</span>matrix_mul<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
               <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">float</span> InnerProduct<span style="color: #008000;">&#40;</span>array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span><span style="color: #000040;">^</span> v1, array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span><span style="color: #000040;">^</span> v2<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
                       pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> pv1 <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>v1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
                       pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> pv2 <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>v2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
                       <span style="color: #0000ff;">return</span> inner_product<span style="color: #008000;">&#40;</span>pv1, pv2<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
               <span style="color: #008000;">&#125;</span>
&nbsp;
               <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">void</span> MatrixMul<span style="color: #008000;">&#40;</span>array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span><span style="color: #000040;">^</span> m1, array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span><span style="color: #000040;">^</span> m2, array<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span><span style="color: #000040;">^</span> out<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
                       pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> pm1 <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>m1<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
                       pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> pm2 <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>m2<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
                       pin_ptr<span style="color: #000080;">&lt;</span><span style="color: #0000ff;">float</span><span style="color: #000080;">&gt;</span> outp <span style="color: #000080;">=</span> <span style="color: #000040;">&amp;</span>out<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
                       matrix_mul<span style="color: #008000;">&#40;</span>pm1, pm2, outp<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
               <span style="color: #008000;">&#125;</span>
        <span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/06/playing-with-the-net-jit-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Playing With The .NET JIT Part 2</title>
		<link>http://scapecode.com/2009/06/playing-with-the-net-jit-part-2/</link>
		<comments>http://scapecode.com/2009/06/playing-with-the-net-jit-part-2/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 18:56:05 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Unmanaged Code]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=25</guid>
		<description><![CDATA[Previously I discussed various potential issues the x86 JIT had with inlining non-trivial methods and functions taking or returning value types. In this entry I hope to cover some potential pitfalls facing would-be optimizers, along with discussing some unexpected optimizations that do take place. Optimizations That Aren&#8217;t It is not that uncommon to see [...]]]></description>
			<content:encoded><![CDATA[<p>Previously I discussed various potential issues the x86 JIT had with inlining non-trivial methods and functions taking or returning value types. In this entry I hope to cover some potential pitfalls facing would-be optimizers, along with some unexpected optimizations that do take place.</p>
<p><strong>Optimizations That Aren&#8217;t</strong></p>
<p>It is not that uncommon to see people advocating the use of unsafe code as a means of producing &#8220;optimized&#8221; code in the managed environment. The idea is a simple one: by getting down to the metal with pointers and all that fun stuff, you can somehow produce code that will be &#8220;optimized&#8221; in ways that typical managed code cannot be.</p>
<p>Unsafe code does not allow you to manipulate pointers to managed objects in whatever manner you please. Certain steps have to be taken to ensure that your operations are safe with regard to the managed heap. Just because your code is marked as &#8220;unsafe&#8221; doesn&#8217;t mean that it is free to do what it wants. For example, you cannot assign a pointer the address of a managed object without first pinning the object. Pointers to objects are not tracked by the GC, so should you obtain a pointer to an object and then attempt to use the pointer, you could end up accessing a now-collected region of memory. It can also happen that you obtain a pointer to an object, and then the GC runs and shuffles your object around on the heap. This shuffling would invalidate your pointer, but since pointers are not tracked by the GC it would not be updated (while references to objects are updated). Pinning objects solves this problem, which is why you are only allowed to take the address of an object that&#8217;s been pinned. In essence, a pinned object cannot be moved or collected by the GC until it is unpinned. Pinning is typically done through the use of the fixed keyword in C# or the GCHandle structure.</p>
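<p>As a rough sketch of the two pinning mechanisms just mentioned, something along these lines shows the fixed statement pinning an array for the length of a block, and a GCHandle pinning it until the handle is explicitly freed. The class and method names are made up for illustration, and the code assumes it is compiled with /unsafe.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Runtime.InteropServices;

// Hypothetical names, for illustration only.
static class PinningSketch {
    // fixed pins the array only for the duration of the block.
    static unsafe float SumWithFixed(float[] values) {
        float sum = 0;
        fixed (float* p = values) {
            for (int i = 0; i &lt; values.Length; ++i)
                sum += p[i];
        }
        return sum; // the array is no longer pinned here
    }

    // GCHandle pins the array until Free is called.
    static unsafe float SumWithGCHandle(float[] values) {
        GCHandle handle = GCHandle.Alloc(values, GCHandleType.Pinned);
        try {
            float* p = (float*)handle.AddrOfPinnedObject().ToPointer();
            float sum = 0;
            for (int i = 0; i &lt; values.Length; ++i)
                sum += p[i];
            return sum;
        } finally {
            handle.Free(); // forgetting this leaves the array pinned
        }
    }

    static void Main() {
        float[] data = { 1f, 2f, 3f };
        Console.WriteLine(SumWithFixed(data));
        Console.WriteLine(SumWithGCHandle(data));
    }
}</pre></div></div>

<p>The fixed form is scoped, so the pin cannot outlive the block; the GCHandle form trades that guarantee for a pin that can span method calls, which is why the Free call sits in a finally block.</p>
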
<p>Much like how a fixed object cannot be moved by the GC, a pointer to a fixed object cannot be reassigned. This makes it difficult to traverse primitive arrays, as you end up needing to create other temporary pointers, or limiting the size of the fixed area to a small segment. Fixed objects, and unsafe code, increase the overall size of the produced IL by a fairly significant margin. While an increase in the IL is not indicative of the size of the produced machine code, it does prevent the runtime from inlining such methods. As an example, the two following snippets reveal the difference between a safe magnitude calculation and an unsafe one; note that the unsafe case uses a fixed-size buffer.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">public</span> <span style="color: #0000ff;">float</span> Magnitude<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span><span style="color: #008000;">&#41;</span>Math.<span style="color: #007788;">Sqrt</span><span style="color: #008000;">&#40;</span>X <span style="color: #000040;">*</span> X <span style="color: #000040;">+</span> Y <span style="color: #000040;">*</span> Y <span style="color: #000040;">+</span> Z <span style="color: #000040;">*</span> Z<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
.<span style="color: #007788;">method</span> <span style="color: #0000ff;">public</span> hidebysig instance float32 Magnitude<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> cil managed
<span style="color: #008000;">&#123;</span>
    .<span style="color: #007788;">maxstack</span> <span style="color: #0000dd;">8</span>
    L_0000<span style="color: #008080;">:</span> ldarg.0
    L_0001<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">X</span>
    L_0006<span style="color: #008080;">:</span> ldarg.0
    L_0007<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">X</span>
    L_000c<span style="color: #008080;">:</span> mul
    L_000d<span style="color: #008080;">:</span> ldarg.0
    L_000e<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Y</span>
    L_0013<span style="color: #008080;">:</span> ldarg.0
    L_0014<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Y</span>
    L_0019<span style="color: #008080;">:</span> mul
    L_001a<span style="color: #008080;">:</span> add
    L_001b<span style="color: #008080;">:</span> ldarg.0
    L_001c<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Z</span>
    L_0021<span style="color: #008080;">:</span> ldarg.0
    L_0022<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Z</span>
    L_0027<span style="color: #008080;">:</span> mul
    L_0028<span style="color: #008080;">:</span> add
    L_0029<span style="color: #008080;">:</span> conv.<span style="color: #007788;">r8</span>
    L_002a<span style="color: #008080;">:</span> call float64 <span style="color: #008000;">&#91;</span>mscorlib<span style="color: #008000;">&#93;</span>System.<span style="color: #007788;">Math</span><span style="color: #008080;">::</span><span style="color: #007788;">Sqrt</span><span style="color: #008000;">&#40;</span>float64<span style="color: #008000;">&#41;</span>
    L_002f<span style="color: #008080;">:</span> conv.<span style="color: #007788;">r4</span>
    L_0030<span style="color: #008080;">:</span> ret
<span style="color: #008000;">&#125;</span>
&nbsp;
<span style="color: #0000ff;">public</span> <span style="color: #0000ff;">float</span> Magnitude<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    fixed <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span><span style="color: #000040;">*</span> p <span style="color: #000080;">=</span> V<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
       <span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span><span style="color: #008000;">&#41;</span>Math.<span style="color: #007788;">Sqrt</span><span style="color: #008000;">&#40;</span>p<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> p<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> p<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> p<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">+</span> p<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span> <span style="color: #000040;">*</span> p<span style="color: #008000;">&#91;</span><span style="color: #0000dd;">2</span><span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
.<span style="color: #007788;">method</span> <span style="color: #0000ff;">public</span> hidebysig instance float32 Magnitude<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> cil managed
<span style="color: #008000;">&#123;</span>
    .<span style="color: #007788;">maxstack</span> <span style="color: #0000dd;">4</span>
    .<span style="color: #007788;">locals</span> init <span style="color: #008000;">&#40;</span>
       <span style="color: #008000;">&#91;</span><span style="color: #0000dd;">0</span><span style="color: #008000;">&#93;</span> float32<span style="color: #000040;">&amp;</span> pinned singleRef1,
       <span style="color: #008000;">&#91;</span><span style="color: #0000dd;">1</span><span style="color: #008000;">&#93;</span> float32 single1<span style="color: #008000;">&#41;</span>
    L_0000<span style="color: #008080;">:</span> ldarg.0
    L_0001<span style="color: #008080;">:</span> ldflda PerformanceTests.<span style="color: #007788;">Unsafe</span>.<span style="color: #007788;">Vector3</span><span style="color: #000040;">/</span><span style="color: #000080;">&lt;</span>v<span style="color: #000080;">&gt;</span>e__FixedBuffer0 PerformanceTests.<span style="color: #007788;">Unsafe</span>.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">V</span>
    L_0006<span style="color: #008080;">:</span> ldflda float32 PerformanceTests.<span style="color: #007788;">Unsafe</span>.<span style="color: #007788;">Vector3</span><span style="color: #000040;">/</span><span style="color: #000080;">&lt;</span>v<span style="color: #000080;">&gt;</span>e__FixedBuffer0<span style="color: #008080;">::</span><span style="color: #007788;">FixedElementField</span>
    L_000b<span style="color: #008080;">:</span> stloc.0
    L_000c<span style="color: #008080;">:</span> ldloc.0
    L_000d<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_000e<span style="color: #008080;">:</span> ldind.<span style="color: #007788;">r4</span>
    L_000f<span style="color: #008080;">:</span> ldloc.0
    L_0010<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_0011<span style="color: #008080;">:</span> ldind.<span style="color: #007788;">r4</span>
    L_0012<span style="color: #008080;">:</span> mul
    L_0013<span style="color: #008080;">:</span> ldloc.0
    L_0014<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_0015<span style="color: #008080;">:</span> ldc.<span style="color: #007788;">i4</span>.4
    L_0016<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_0017<span style="color: #008080;">:</span> add
    L_0018<span style="color: #008080;">:</span> ldind.<span style="color: #007788;">r4</span>
    L_0019<span style="color: #008080;">:</span> ldloc.0
    L_001a<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_001b<span style="color: #008080;">:</span> ldc.<span style="color: #007788;">i4</span>.4
    L_001c<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_001d<span style="color: #008080;">:</span> add
    L_001e<span style="color: #008080;">:</span> ldind.<span style="color: #007788;">r4</span>
    L_001f<span style="color: #008080;">:</span> mul
    L_0020<span style="color: #008080;">:</span> add
    L_0021<span style="color: #008080;">:</span> ldloc.0
    L_0022<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_0023<span style="color: #008080;">:</span> ldc.<span style="color: #007788;">i4</span>.8
    L_0024<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_0025<span style="color: #008080;">:</span> add
    L_0026<span style="color: #008080;">:</span> ldind.<span style="color: #007788;">r4</span>
    L_0027<span style="color: #008080;">:</span> ldloc.0
    L_0028<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_0029<span style="color: #008080;">:</span> ldc.<span style="color: #007788;">i4</span>.8
    L_002a<span style="color: #008080;">:</span> conv.<span style="color: #007788;">i</span>
    L_002b<span style="color: #008080;">:</span> add
    L_002c<span style="color: #008080;">:</span> ldind.<span style="color: #007788;">r4</span>
    L_002d<span style="color: #008080;">:</span> mul
    L_002e<span style="color: #008080;">:</span> add
    L_002f<span style="color: #008080;">:</span> conv.<span style="color: #007788;">r8</span>
    L_0030<span style="color: #008080;">:</span> call float64 <span style="color: #008000;">&#91;</span>mscorlib<span style="color: #008000;">&#93;</span>System.<span style="color: #007788;">Math</span><span style="color: #008080;">::</span><span style="color: #007788;">Sqrt</span><span style="color: #008000;">&#40;</span>float64<span style="color: #008000;">&#41;</span>
    L_0035<span style="color: #008080;">:</span> conv.<span style="color: #007788;">r4</span>
    L_0036<span style="color: #008080;">:</span> stloc.1
    L_0037<span style="color: #008080;">:</span> leave.<span style="color: #007788;">s</span> L_0039
    L_0039<span style="color: #008080;">:</span> ldloc.1
    L_003a<span style="color: #008080;">:</span> ret
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>Note that neither of these appears to be a candidate for inlining: at 49 and 59 bytes of IL respectively, both are well over the 32-byte IL limit. The produced IL, while not directly indicative of the assembly produced by the JIT compiler, does tend to give an overall idea of how much larger we should expect this method to be when reproduced in machine code. Fixed-length buffers have other issues that need addressing: you cannot access a fixed-length buffer outside of a fixed statement. They are also an unsafe construct, and so you must indicate that the type is unsafe. Finally, they produce temporary types at compilation time that can throw off serialization and other reflection-based mechanisms.</p>
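<p>To make those last points a little more concrete, here is a minimal sketch of what the declaration side of the unsafe version might look like. The type below is a hypothetical stand-in rather than the exact one used in my tests, and it assumes compilation with /unsafe.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

// Hypothetical type for illustration; compile with /unsafe.
public unsafe struct Vector3 {
    // A fixed-length buffer is only legal in an unsafe context.
    public fixed float V[3];

    public float Magnitude() {
        // The buffer is reached through a fixed statement.
        fixed (float* p = V) {
            return (float)Math.Sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
        }
    }
}

static class FixedBufferSketch {
    static void Main() {
        // The reflected field type is a compiler-generated nested struct,
        // not float[], which is what trips up reflection-based code.
        Console.WriteLine(typeof(Vector3).GetField("V").FieldType);
    }
}</pre></div></div>
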
<p>In the end, unsafe code does not increase performance, and the reliance upon platform structures to ensure safety, such as the fixed construct, introduces more problems than it solves. Furthermore, even the smallest method that might be inlined tends to bloat up to the point where inlining by the JIT is no longer possible.</p>
<p><strong>Surprising Developments and JIT Optimizations</strong></p>
<p>Previously I noted that the JIT compiler can only inline a method that is a maximum of 32 bytes of IL in length. However, I wasn&#8217;t completely honest with you. In some cases the JIT compiler will inline chunks of code that are longer than 32 bytes of IL. I have not dug in depth into the reasons for this, nor into when these conditions may arise. As such, this information is presented as an informal experimental result. In the case of a function returning the result of an intrinsic operation, there may arise a condition whereby the result is inlined. Two examples of this behavior will be shown; note that in both cases the function used is an intrinsic math function and that neither is passed value types (which would prevent inlining). The first is the Magnitude function, which we saw above. Calling it results in it being inlined and produces the following inlined assembly.</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">00220164</span> D945D4         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00220167</span> D8C8           <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #00007f;">st</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">0</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00220169</span> D945D8         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">28h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">0022016C</span> D8C8           <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #00007f;">st</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">0</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">0022016E</span> DEC1           <span style="color: #0000ff; font-weight: bold;">faddp</span>      <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">00220170</span> D945DC         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">24h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00220173</span> D8C8           <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #00007f;">st</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">0</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">00220175</span> DEC1           <span style="color: #0000ff; font-weight: bold;">faddp</span>      <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">00220177</span> DD5D9C         <span style="color: #0000ff; font-weight: bold;">fstp</span>       <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">64h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">0022017A</span> DD459C         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">64h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">0022017D</span> D9FA           <span style="color: #0000ff; font-weight: bold;">fsqrt</span></pre></div></div>

<p>We note that this is the optimal form for the magnitude function, with a minimal number of memory reads and the majority of the work taking place on the FPU stack. Compare this with the unsafe version, shown next, and you can clearly see how much worse the unsafe code is.</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">007A0438</span> <span style="color: #0000ff;">55</span>             <span style="color: #00007f; font-weight: bold;">push</span>       <span style="color: #00007f;">ebp</span>
<span style="color: #adadad; font-style: italic;">007A0439</span> 8BEC           <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">ebp</span><span style="color: #339933;">,</span><span style="color: #00007f;">esp</span>
<span style="color: #adadad; font-style: italic;">007A043B</span> <span style="color: #0000ff;">57</span>             <span style="color: #00007f; font-weight: bold;">push</span>       <span style="color: #00007f;">edi</span>
<span style="color: #adadad; font-style: italic;">007A043C</span> <span style="color: #0000ff;">56</span>             <span style="color: #00007f; font-weight: bold;">push</span>       <span style="color: #00007f;">esi</span>
<span style="color: #adadad; font-style: italic;">007A043D</span> <span style="color: #0000ff;">53</span>             <span style="color: #00007f; font-weight: bold;">push</span>       <span style="color: #00007f;">ebx</span>
<span style="color: #adadad; font-style: italic;">007A043E</span> 83EC10         <span style="color: #00007f; font-weight: bold;">sub</span>        <span style="color: #00007f;">esp</span><span style="color: #339933;">,</span><span style="color: #0000ff;">10h</span>
<span style="color: #adadad; font-style: italic;">007A0441</span> 33C0           <span style="color: #00007f; font-weight: bold;">xor</span>        <span style="color: #00007f;">eax</span><span style="color: #339933;">,</span><span style="color: #00007f;">eax</span>
<span style="color: #adadad; font-style: italic;">007A0443</span> 8945F0         <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">10h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #00007f;">eax</span>
<span style="color: #adadad; font-style: italic;">007A0446</span> 894DF0         <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">10h</span><span style="color: #009900; font-weight: bold;">&#93;</span><span style="color: #339933;">,</span><span style="color: #00007f;">ecx</span>
<span style="color: #adadad; font-style: italic;">007A0449</span> D901           <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A044B</span> 8BF1           <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">esi</span><span style="color: #339933;">,</span><span style="color: #00007f;">ecx</span>
<span style="color: #adadad; font-style: italic;">007A044D</span> D80E           <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esi</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A044F</span> 8BF9           <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">edi</span><span style="color: #339933;">,</span><span style="color: #00007f;">ecx</span>
<span style="color: #adadad; font-style: italic;">007A0451</span> D94704         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edi</span><span style="color: #339933;">+</span><span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A0454</span> 8BD1           <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">edx</span><span style="color: #339933;">,</span><span style="color: #00007f;">ecx</span>
<span style="color: #adadad; font-style: italic;">007A0456</span> D84A04         <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A0459</span> DEC1           <span style="color: #0000ff; font-weight: bold;">faddp</span>      <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">007A045B</span> 8BC1           <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">eax</span><span style="color: #339933;">,</span><span style="color: #00007f;">ecx</span>
<span style="color: #adadad; font-style: italic;">007A045D</span> D94008         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span><span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A0460</span> 8BD8           <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">ebx</span><span style="color: #339933;">,</span><span style="color: #00007f;">eax</span>
<span style="color: #adadad; font-style: italic;">007A0462</span> D84B08         <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A0465</span> DEC1           <span style="color: #0000ff; font-weight: bold;">faddp</span>      <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">007A0467</span> DD5DE4         <span style="color: #0000ff; font-weight: bold;">fstp</span>       <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">1Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A046A</span> DD45E4         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">1Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A046D</span> D9FA           <span style="color: #0000ff; font-weight: bold;">fsqrt</span>
<span style="color: #adadad; font-style: italic;">007A046F</span> D95DEC         <span style="color: #0000ff; font-weight: bold;">fstp</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">14h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A0472</span> D945EC         <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">14h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A0475</span> 8D65F4         <span style="color: #00007f; font-weight: bold;">lea</span>        <span style="color: #00007f;">esp</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">0Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A0478</span> 5B             <span style="color: #00007f; font-weight: bold;">pop</span>        <span style="color: #00007f;">ebx</span>
<span style="color: #adadad; font-style: italic;">007A0479</span> 5E             <span style="color: #00007f; font-weight: bold;">pop</span>        <span style="color: #00007f;">esi</span>
<span style="color: #adadad; font-style: italic;">007A047A</span> <span style="color: #0000ff;">5F</span>             <span style="color: #00007f; font-weight: bold;">pop</span>        <span style="color: #00007f;">edi</span>
<span style="color: #adadad; font-style: italic;">007A047B</span> 5D             <span style="color: #00007f; font-weight: bold;">pop</span>        <span style="color: #00007f;">ebp</span>
<span style="color: #adadad; font-style: italic;">007A047C</span> C3             <span style="color: #00007f; font-weight: bold;">ret</span></pre></div></div>

<p>Next up is a fairly ubiquitous utility function which obtains the angle between two unit length vectors. Note that acos cannot be produced directly as a single machine instruction; nonetheless, it is considered an intrinsic function. As we see below, this produces a nicely optimized set of instructions, with only a single call to a function (which computes acos).</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">public</span> <span style="color: #0000ff;">static</span> <span style="color: #0000ff;">float</span> AngleBetween<span style="color: #008000;">&#40;</span>ref Vector3 lhs, ref Vector3 rhs<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">return</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span><span style="color: #008000;">&#41;</span>Math.<span style="color: #007788;">Acos</span><span style="color: #008000;">&#40;</span>lhs.<span style="color: #007788;">X</span> <span style="color: #000040;">*</span> rhs.<span style="color: #007788;">X</span> <span style="color: #000040;">+</span> lhs.<span style="color: #007788;">Y</span> <span style="color: #000040;">*</span> rhs.<span style="color: #007788;">Y</span> <span style="color: #000040;">+</span> lhs.<span style="color: #007788;">Z</span> <span style="color: #000040;">*</span> rhs.<span style="color: #007788;">Z</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
.<span style="color: #007788;">method</span> <span style="color: #0000ff;">public</span> hidebysig <span style="color: #0000ff;">static</span> float32 AngleBetween<span style="color: #008000;">&#40;</span>PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #000040;">&amp;</span> lhs, PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #000040;">&amp;</span> rhs<span style="color: #008000;">&#41;</span> cil managed
<span style="color: #008000;">&#123;</span>
     .<span style="color: #007788;">maxstack</span> <span style="color: #0000dd;">8</span>
     L_0000<span style="color: #008080;">:</span> ldarg.0
     L_0001<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">X</span>
     L_0006<span style="color: #008080;">:</span> ldarg.1
     L_0007<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">X</span>
     L_000c<span style="color: #008080;">:</span> mul
     L_000d<span style="color: #008080;">:</span> ldarg.0
     L_000e<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Y</span>
     L_0013<span style="color: #008080;">:</span> ldarg.1
     L_0014<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Y</span>
     L_0019<span style="color: #008080;">:</span> mul
     L_001a<span style="color: #008080;">:</span> add
     L_001b<span style="color: #008080;">:</span> ldarg.0
     L_001c<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Z</span>
     L_0021<span style="color: #008080;">:</span> ldarg.1
     L_0022<span style="color: #008080;">:</span> ldfld float32 PerformanceTests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Z</span>
     L_0027<span style="color: #008080;">:</span> mul
     L_0028<span style="color: #008080;">:</span> add
     L_0029<span style="color: #008080;">:</span> conv.<span style="color: #007788;">r8</span>
     L_002a<span style="color: #008080;">:</span> call float64 <span style="color: #008000;">&#91;</span>mscorlib<span style="color: #008000;">&#93;</span>System.<span style="color: #007788;">Math</span><span style="color: #008080;">::</span><span style="color: #007788;">Acos</span><span style="color: #008000;">&#40;</span>float64<span style="color: #008000;">&#41;</span>
     L_002f<span style="color: #008080;">:</span> conv.<span style="color: #007788;">r4</span>
     L_0030<span style="color: #008080;">:</span> ret
<span style="color: #008000;">&#125;</span></pre></div></div>


<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">007A01D9</span> 8D55D4          <span style="color: #00007f; font-weight: bold;">lea</span>        <span style="color: #00007f;">edx</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">2Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01DC</span> 8D4DC8          <span style="color: #00007f; font-weight: bold;">lea</span>        <span style="color: #00007f;">ecx</span><span style="color: #339933;">,</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">38h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01DF</span> D902            <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01E1</span> D809            <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01E3</span> D94204          <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01E6</span> D84904          <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01E9</span> DEC1            <span style="color: #0000ff; font-weight: bold;">faddp</span>      <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">007A01EB</span> D94208          <span style="color: #0000ff; font-weight: bold;">fld</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">edx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01EE</span> D84908          <span style="color: #0000ff; font-weight: bold;">fmul</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01F1</span> DEC1            <span style="color: #0000ff; font-weight: bold;">faddp</span>      <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">007A01F3</span> 83EC08          <span style="color: #00007f; font-weight: bold;">sub</span>        <span style="color: #00007f;">esp</span><span style="color: #339933;">,</span><span style="color: #0000ff;">8</span>
<span style="color: #adadad; font-style: italic;">007A01F6</span> DD1C24          <span style="color: #0000ff; font-weight: bold;">fstp</span>       <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">007A01F9</span> E868A5AF79      <span style="color: #00007f; font-weight: bold;">call</span>       7A29A766 <span style="color: #009900; font-weight: bold;">&#40;</span>System<span style="color: #339933;">.</span>Math<span style="color: #339933;">.</span>Acos<span style="color: #009900; font-weight: bold;">&#40;</span>Double<span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span> mdToken<span style="color: #339933;">:</span> 06000b28<span style="color: #009900; font-weight: bold;">&#41;</span></pre></div></div>

<p>Finally there is the issue of SIMD instruction sets. While the JIT will not generate SIMD code for floating point math on the x86 platform, it will use individual SSE2 instructions for certain other operations. One common operation where you see this is the conversion of floating point numbers to integers. In .NET 2.0 the JIT will optimize this to use an SSE2 instruction (cvttsd2si). For instance, the following snippet of code results in the assembly dump that follows it.</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #00007f; font-weight: bold;">int</span> n = <span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #00007f; font-weight: bold;">int</span><span style="color: #009900; font-weight: bold;">&#41;</span>r<span style="color: #339933;">.</span>NextDouble<span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #666666; font-style: italic;">;</span>
&nbsp;
<span style="color: #adadad; font-style: italic;">002A02FB</span> 8BCB            <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">ecx</span><span style="color: #339933;">,</span><span style="color: #00007f;">ebx</span>
<span style="color: #adadad; font-style: italic;">002A02FD</span> 8B01            <span style="color: #00007f; font-weight: bold;">mov</span>        <span style="color: #00007f;">eax</span><span style="color: #339933;">,</span><span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">002A02FF</span> FF5048          <span style="color: #00007f; font-weight: bold;">call</span>       <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">eax</span><span style="color: #339933;">+</span><span style="color: #0000ff;">48h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">002A0302</span> DD5DA0          <span style="color: #0000ff; font-weight: bold;">fstp</span>       <span style="color: #000000; font-weight: bold;">qword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">60h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">002A0305</span> F20F1045A0      <span style="color: #00007f; font-weight: bold;">movsd</span>      <span style="color: #00007f;">xmm0</span><span style="color: #339933;">,</span>mmword <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">60h</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">002A030A</span> F20F2CF0        cvttsd2si  <span style="color: #00007f;">esi</span><span style="color: #339933;">,</span><span style="color: #00007f;">xmm0</span></pre></div></div>

<p>While not quite as optimal as it could be if the JIT were using the full SSE2 instruction set, this minor optimization can go a long way.</p>
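<p>The cvttsd2si instruction converts with truncation, which matches what C# specifies for an explicit cast from double to int: the fractional part is discarded and the value is rounded toward zero. The following is a minimal sketch of that behavior. The sample values are arbitrary, and it says nothing about which instructions a particular JIT version will actually emit.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

static class TruncationDemo
{
    static void Main()
    {
        // An explicit cast from double to int truncates toward zero,
        // the same rounding behavior that cvttsd2si implements.
        // Expected truncated values, in order: 2, -2, 0, 0.
        double[] samples = { 2.9, -2.9, 0.5, -0.5 };
        foreach (double d in samples)
        {
            int truncated = (int)d;
            Console.WriteLine("{0} truncates to {1}", d, truncated);
        }
    }
}</pre></div></div>
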
<p>So what is left to visit? Well, there&#8217;s obviously the x64 platform, which is growing in popularity. The x64 platform presents new opportunities to explore, including certain guarantees and performance benefits that aren&#8217;t available on the x86 platform. Amongst them are a whole new set of optimizations and available instruction sets that the JIT can take advantage of. Finally there is the case of calling into unmanaged code for highly performance intensive operations, such as hand optimized SIMD code, along with the potential performance benefits or hazards that calling an unmanaged function can incur.</p>
]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/06/playing-with-the-net-jit-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Playing With The .NET JIT Part 1</title>
		<link>http://scapecode.com/2009/06/playing-with-the-net-jit-part-1/</link>
		<comments>http://scapecode.com/2009/06/playing-with-the-net-jit-part-1/#comments</comments>
		<pubDate>Fri, 19 Jun 2009 18:55:26 +0000</pubDate>
		<dc:creator>Washu</dc:creator>
				<category><![CDATA[.Net]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Unmanaged Code]]></category>

		<guid isPermaLink="false">http://scapecode.com/?p=28</guid>
		<description><![CDATA[Introduction .NET has been getting some interesting press recently. Even to the point where an article in Game Developer Magazine was published advocating the usage of managed code for rapid development of components. However, I did raise some issues with the author in regards to the performance metric he used. Thus it is that I [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Introduction</strong></p>
<p>.NET has been getting some interesting press recently, even to the point where Game Developer Magazine published an article advocating the use of managed code for rapid development of components. However, I did raise some issues with the author regarding the performance metric he used. So I have decided to cover some issues with .NET performance, future benefits, and hopefully even a few solutions to some of the problems I&#8217;ll be posing.</p>
<p>Ultimately, the performance of your application will be determined by the algorithms and data structures you use. No amount of micro-optimization can hope to account for the huge performance differences that can crop up between different choices of algorithms. Thus the most important tool you can have in your arsenal is a decent profiler. Thankfully, there are many good profilers available for the .NET platform. Some of the profiling tools are specific to certain areas of managed coding, such as the CLR Profiler, which is useful for profiling the allocation patterns of your managed application. Others, like DevPartner, allow you to profile the entire application, identifying performance bottlenecks in both managed and unmanaged code. Finally, there are the low-level tools, such as the SOS Debugging Tools, which give you extremely detailed information about the performance of your systems but are hard to use.</p>
<p>Applications designed and built for a managed platform tend to have different design decisions behind them than unmanaged applications. Even such fundamental things as memory allocation patterns are usually quite a bit different. With object lifetimes being non-deterministic, one has to apply different patterns to ensure the timely release of resources. Allocation patterns are also different, partly due to the inability to allocate objects on the stack, but also due to the ease of allocation on the managed heap. Allocating on an unmanaged heap typically requires a heap walk to find a block of free space that is at least the size of the block requested. The managed allocator typically allocates at the end of the heap, resulting in significantly faster allocation times (constant time, for the most part). These changes to the underlying assumptions that drive the system typically lead to large, sweeping changes in the overall design of the systems.</p>
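<p>To make the difference between the two allocation strategies concrete, here is a minimal sketch in C#. It is only a toy model: the "heap" is just a range of integer offsets, and the <code>BumpHeap</code> and <code>FreeListHeap</code> classes are hypothetical names invented for this illustration, not how the CLR or any C runtime actually implements its allocator.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Collections.Generic;

// Illustrative model only: "addresses" are offsets into an imaginary heap.

// Managed-style allocation: hand out space from the end of the heap.
sealed class BumpHeap
{
    private readonly int capacity;
    private int next;   // offset of the first unused byte

    public BumpHeap(int capacity) { this.capacity = capacity; }

    public int Allocate(int size)
    {
        // A real runtime would trigger a garbage collection here instead of failing.
        if (size &gt; capacity - next) return -1;
        int offset = next;
        next += size;   // constant time: one comparison and one addition
        return offset;
    }
}

// Unmanaged-style allocation: walk a free list until a block is large enough.
sealed class FreeListHeap
{
    private struct Block { public int Offset; public int Size; }

    private readonly List&lt;Block&gt; free = new List&lt;Block&gt;();

    public int BlocksVisited;   // counts how many blocks the walk inspected

    public void AddFreeBlock(int offset, int size)
    {
        free.Add(new Block { Offset = offset, Size = size });
    }

    public int Allocate(int size)
    {
        for (int i = 0; i &lt; free.Count; i++)
        {
            BlocksVisited++;
            Block block = free[i];
            if (block.Size &lt; size) continue;   // too small, keep walking

            if (block.Size == size)
                free.RemoveAt(i);
            else
                free[i] = new Block { Offset = block.Offset + size, Size = block.Size - size };
            return block.Offset;
        }
        return -1;   // no block was large enough
    }
}

static class AllocationSketch
{
    static void Main()
    {
        BumpHeap managedHeap = new BumpHeap(1024);
        Console.WriteLine(managedHeap.Allocate(48));   // 0
        Console.WriteLine(managedHeap.Allocate(48));   // 48

        // A fragmented heap: three small holes, then one large block.
        FreeListHeap nativeHeap = new FreeListHeap();
        nativeHeap.AddFreeBlock(0, 16);
        nativeHeap.AddFreeBlock(32, 16);
        nativeHeap.AddFreeBlock(64, 16);
        nativeHeap.AddFreeBlock(96, 64);
        Console.WriteLine(nativeHeap.Allocate(48));    // 96
        Console.WriteLine(nativeHeap.BlocksVisited);   // 4
    }
}</pre></div></div>

<p>The bump allocator does the same small amount of work for every request, while the cost of the free-list walk grows with how fragmented the heap has become.</p>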
<p><strong>Future Developments</strong></p>
<p>Theoretically, a JIT compiler can outperform a standard compiler simply because it can target the platform in ways that traditional compilation cannot. Traditionally, to target different instruction sets, you would have to compile a binary for each instruction set. For instance, targeting SSE2 would require you to build a separate binary from that of your non-SSE2 branch. You could, of course, do this through the use of DLLs, or by custom writing your SSE2 code and using function pointers to dictate which branch to choose.</p>
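<p>The function pointer approach looks roughly like the sketch below, shown here in C# with a delegate playing the role of the function pointer. The names are made up for illustration: <code>hasSse2</code> is a hypothetical flag standing in for real CPU feature detection, and <code>DotUnrolled</code> is only a placeholder for where a hand-written SSE2 routine would go.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;

// Signature shared by every implementation of the operation.
public delegate float DotProduct(float[] a, float[] b);

public static class DispatchSketch
{
    // Baseline path that runs on any hardware.
    public static float DotScalar(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i &lt; a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    // Stand-in for the specialized path. In the unmanaged case this would be
    // the hand-written SSE2 routine; here it is just a four-way unrolled loop.
    public static float DotUnrolled(float[] a, float[] b)
    {
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int i = 0;
        for (; i + 3 &lt; a.Length; i += 4)
        {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        float sum = (s0 + s1) + (s2 + s3);
        for (; i &lt; a.Length; i++) sum += a[i] * b[i];   // leftover elements
        return sum;
    }

    // Choose the implementation once; callers only ever see the delegate.
    // hasSse2 is a hypothetical flag standing in for real CPU detection.
    public static DotProduct Select(bool hasSse2)
    {
        if (hasSse2) return DotUnrolled;
        return DotScalar;
    }

    static void Main()
    {
        float[] a = { 1f, 2f, 3f, 4f, 5f };
        float[] b = { 5f, 4f, 3f, 2f, 1f };
        DotProduct dot = Select(true);
        Console.WriteLine(dot(a, b));   // 35
    }
}</pre></div></div>

<p>The cost of this pattern is that every specialized path has to be written, built, and tested by hand, which is exactly the work a JIT could in principle take over.</p>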
<p>Hand-written SIMD code is often faster than compiler-generated SIMD, because the data can be manually vectorized, enabling true SIMD to take place. Some compilers, like the Intel C++ Compiler, can perform automatic vectorization. However, such a compiler is unable to guarantee the accuracy of the resulting binary, and extensive testing typically has to be done to ensure that the functionality was correctly generated. While most compilers have the option to target SIMD instruction sets, they usually use them to replace standard floating point operations where they can, as the scalar SIMD instructions are generally faster than their FPU counterparts.</p>
<p>The JIT compiler could target any SIMD instruction set supported by its platform, along with any other hardware-specific optimizations it knew about. While automatic vectorization is not likely to be in a JIT release anytime soon, even using the non-vectorized SIMD instructions can help to parallelize your processing. As an example, multiple independent SIMD operations can typically run in parallel (that is, an add and a multiplication could both run simultaneously). Furthermore, the JIT can allow any .NET application to target any system it supports, provided the libraries it uses are also available on that system. This means that, provided you aren&#8217;t doing anything highly non-portable such as assuming that a pointer is 32 bits wide, your application could be JIT compiled for a 64-bit platform and run natively that way.</p>
<p>Another area of potential advancement includes the realm of Profile Guided Optimization. Currently POGO is restricted to the arena of unmanaged applications, as it requires the ability to generate raw machine code and to perform instruction reordering. In essence you instrument an application with a POGO profiler; then you use the application normally to allow the profiler to collect usage data and to find the hotspots. Finally you run the optimizer on the solution, which will rebuild the solution, using the profiling data it gathered to optimize the heavily utilized sections of your application. A JIT compiler could instrument a managed program on first launch and watch its usage, while in another thread it could be optimizing the machine code using the profiling data that it gathers. The resulting cached binary image would be optimized on the next launch (excepting those areas that had not been accessed, and thus the JIT hadn&#8217;t compiled yet). This would be especially effective on systems with multiple cores.</p>
<p><strong>JIT Compilation for the x86</strong></p>
<p>The JIT compiler for the x86 platform, as of .NET 2.0, does not support SIMD instruction sets. It will generate occasional MMX or SSE instructions for some integral and floating point promotions, but otherwise it will not utilize SIMD instruction sets. Inlining poses its own problems for the JIT compiler. Currently the JIT compiler will only inline functions that are 32 bytes of IL or smaller. Because the JIT compiler runs in an extremely tight time constraint, it is forced to make sacrifices in the optimizations it can make. Inlining is typically an expensive operation because it requires shuffling around the addresses of everything that comes after the inlined code (which requires interpreting the IL, then determining if its address is before or after the inlined code, then making the appropriate adjustments&#8230;). Because of this, all but the smallest of methods will not be inlined. Here&#8217;s a sample of a method that will not be inlined, and the IL that accompanies it:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">public</span> <span style="color: #0000ff;">float</span> SquareMagnitude<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">return</span> X <span style="color: #000040;">*</span> X <span style="color: #000040;">+</span> Y <span style="color: #000040;">*</span> Y <span style="color: #000040;">+</span> Z <span style="color: #000040;">*</span> Z<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
.<span style="color: #007788;">method</span> <span style="color: #0000ff;">public</span> hidebysig instance float32 SquareMagnitude<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> cil managed
<span style="color: #008000;">&#123;</span>
    .<span style="color: #007788;">maxstack</span> <span style="color: #0000dd;">8</span>
    L_0000<span style="color: #008080;">:</span> ldarg.0
    L_0001<span style="color: #008080;">:</span> ldfld float32 Performance_Tests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">X</span>
    L_0006<span style="color: #008080;">:</span> ldarg.0
    L_0007<span style="color: #008080;">:</span> ldfld float32 Performance_Tests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">X</span>
    L_000c<span style="color: #008080;">:</span> mul
    L_000d<span style="color: #008080;">:</span> ldarg.0
    L_000e<span style="color: #008080;">:</span> ldfld float32 Performance_Tests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Y</span>
    L_0013<span style="color: #008080;">:</span> ldarg.0
    L_0014<span style="color: #008080;">:</span> ldfld float32 Performance_Tests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Y</span>
    L_0019<span style="color: #008080;">:</span> mul
    L_001a<span style="color: #008080;">:</span> add
    L_001b<span style="color: #008080;">:</span> ldarg.0
    L_001c<span style="color: #008080;">:</span> ldfld float32 Performance_Tests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Z</span>
    L_0021<span style="color: #008080;">:</span> ldarg.0
    L_0022<span style="color: #008080;">:</span> ldfld float32 Performance_Tests.<span style="color: #007788;">Vector3</span><span style="color: #008080;">::</span><span style="color: #007788;">Z</span>
    L_0027<span style="color: #008080;">:</span> mul
    L_0028<span style="color: #008080;">:</span> add
    L_0029<span style="color: #008080;">:</span> ret
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>This method, as you can tell, is 42 bytes long, counting the return instruction. Clearly this is over the 32-byte IL limit. However, the resulting machine code is just 25 bytes:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">002802C0</span> D901             <span style="color: #0000ff; font-weight: bold;">fld</span>         <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">002802C2</span> D9C0             <span style="color: #0000ff; font-weight: bold;">fld</span>         <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">0</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">002802C4</span> DEC9             <span style="color: #0000ff; font-weight: bold;">fmulp</span>       <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">002802C6</span> D94104           <span style="color: #0000ff; font-weight: bold;">fld</span>         <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">002802C9</span> D9C0             <span style="color: #0000ff; font-weight: bold;">fld</span>         <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">0</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">002802CB</span> DEC9             <span style="color: #0000ff; font-weight: bold;">fmulp</span>       <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">002802CD</span> DEC1             <span style="color: #0000ff; font-weight: bold;">faddp</span>       <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">002802CF</span> D94108           <span style="color: #0000ff; font-weight: bold;">fld</span>         <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ecx</span><span style="color: #339933;">+</span><span style="color: #0000ff;">8</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">002802D2</span> D9C0             <span style="color: #0000ff; font-weight: bold;">fld</span>         <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">0</span><span style="color: #009900; font-weight: bold;">&#41;</span>
<span style="color: #adadad; font-style: italic;">002802D4</span> DEC9             <span style="color: #0000ff; font-weight: bold;">fmulp</span>       <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">002802D6</span> DEC1             <span style="color: #0000ff; font-weight: bold;">faddp</span>       <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">1</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">002802D8</span> C3               <span style="color: #00007f; font-weight: bold;">ret</span></pre></div></div>

<p>Methods that use this one, however, such as the Magnitude method, may be candidates for inlining. Magnitude typically reduces to a call to the SquareMagnitude method followed by an fsqrt instruction.</p>
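<p>If you want to check IL sizes like the ones quoted above for your own methods, reflection can report them. The following is a minimal sketch: the <code>Vector3</code> struct is a reconstruction based on the listings above (the original type is not shown in full), <code>Magnitude</code> is one plausible way to write the method mentioned above, and <code>IlSize</code> is a helper made up for this example. The numbers it prints depend on your compiler and build settings (debug builds emit extra instructions), so treat them as a guide rather than as what the JIT will necessarily see.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System;
using System.Reflection;

// Reconstruction of the vector type used in the listings above.
public struct Vector3
{
    public float X, Y, Z;

    public Vector3(float x, float y, float z) { X = x; Y = y; Z = z; }

    public float SquareMagnitude()
    {
        return X * X + Y * Y + Z * Z;
    }

    // Small wrapper: a call to SquareMagnitude followed by a square root.
    public float Magnitude()
    {
        return (float)Math.Sqrt(SquareMagnitude());
    }
}

public static class IlSizeSketch
{
    // Returns the size in bytes of the IL body of a public method.
    public static int IlSize(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName);
        return method.GetMethodBody().GetILAsByteArray().Length;
    }

    static void Main()
    {
        Console.WriteLine("SquareMagnitude: {0} bytes of IL",
            IlSize(typeof(Vector3), "SquareMagnitude"));
        Console.WriteLine("Magnitude: {0} bytes of IL",
            IlSize(typeof(Vector3), "Magnitude"));

        Vector3 v = new Vector3(3f, 4f, 12f);
        Console.WriteLine(v.Magnitude());   // 13
    }
}</pre></div></div>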
<p>Another area where the JIT has issues is the combination of value types and inlining. Methods that take value-type parameters are not currently considered for inlining. There is a fix in the pipeline for this, as it is considered a bug. This behavior can be seen in the following example function which, although far below the 32-byte IL limit, will not be inlined.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">static</span> <span style="color: #0000ff;">float</span> WillNotInline32<span style="color: #008000;">&#40;</span><span style="color: #0000ff;">float</span> f<span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #0000ff;">return</span> f <span style="color: #000040;">*</span> f<span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span>
&nbsp;
.<span style="color: #007788;">method</span> <span style="color: #0000ff;">private</span> hidebysig <span style="color: #0000ff;">static</span> float32 WillNotInline32<span style="color: #008000;">&#40;</span>float32 f<span style="color: #008000;">&#41;</span> cil managed
<span style="color: #008000;">&#123;</span>
    .<span style="color: #007788;">maxstack</span> <span style="color: #0000dd;">8</span>
    L_0000<span style="color: #008080;">:</span> ldarg.0
    L_0001<span style="color: #008080;">:</span> ldarg.0
    L_0002<span style="color: #008080;">:</span> mul
    L_0003<span style="color: #008080;">:</span> ret
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>The resulting call to this function and the assembly code of the function look as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="asm" style="font-family:monospace;"><span style="color: #adadad; font-style: italic;">0087008F</span> FF75F4           <span style="color: #00007f; font-weight: bold;">push</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">ebp</span><span style="color: #339933;">-</span><span style="color: #0000ff;">0Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">00870092</span> FF154C302A00     <span style="color: #00007f; font-weight: bold;">call</span>        <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #00007f;">ds</span><span style="color: #339933;">:</span><span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #0000ff;">002A304Ch</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #339933;">----</span>
<span style="color: #adadad; font-style: italic;">003F01F8</span> D9442404         <span style="color: #0000ff; font-weight: bold;">fld</span>         <span style="color: #000000; font-weight: bold;">dword</span> <span style="color: #000000; font-weight: bold;">ptr</span> <span style="color: #009900; font-weight: bold;">&#91;</span><span style="color: #00007f;">esp</span><span style="color: #339933;">+</span><span style="color: #0000ff;">4</span><span style="color: #009900; font-weight: bold;">&#93;</span>
<span style="color: #adadad; font-style: italic;">003F01FC</span> DCC8             <span style="color: #0000ff; font-weight: bold;">fmul</span>        <span style="color: #00007f;">st</span><span style="color: #009900; font-weight: bold;">&#40;</span><span style="color: #0000ff;">0</span><span style="color: #009900; font-weight: bold;">&#41;</span><span style="color: #339933;">,</span><span style="color: #00007f;">st</span>
<span style="color: #adadad; font-style: italic;">003F01FE</span> C20400           <span style="color: #00007f; font-weight: bold;">ret</span>         <span style="color: #0000ff;">4</span></pre></div></div>

<p>Clearly the x86 JIT requires a lot more work before it will be able to produce machine code approaching that of a good optimizing compiler. However, the news isn&#8217;t all grim. Interop between .NET and unmanaged code allows you to write the methods that need to be highly optimized in a lower-level language.</p>
]]></content:encoded>
			<wfw:commentRss>http://scapecode.com/2009/06/playing-with-the-net-jit-part-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
