By m0nkeymafia
via systemcoder.blogspot.com
Published: Feb 13 2007 / 22:19
Comments
mezmo replied ago:
I hope no-one has forgotten these.
Sigh, makes me miss the days of the Abrash books.
MattGiuca replied ago:
Hm, I think this one is debatable:
"Pass objects by reference rather than value"
By debatable, I don't mean I disagree - I mean you literally could debate it for hours.
My hunch (and what I usually do in my code) is that the bigger the struct, the more you want to pass byref. The more times you access the struct within the function itself (especially taking loops into account), the more you want to pass byval.
The reason is that while the article is right that passing byval means more copying to the stack, what it doesn't tell you is that passing byref requires a dereference on each access (in general; it can be optimised down).
Furthermore, the performance penalties of each depend greatly on the language, and the compiler (and how the compiler optimises).
I tried it in C, with no optimisation first of all - compiling to x86 assembly with GCC. I defined a massive struct (about 128 bytes). When passing byval, it incurred a hefty call to memcpy. This was certainly not pleasant: (sub, mov, mov, lea, mov, sub, push, push, push, call memcpy).
However, the x86 arch has built-in string copying instructions (rep movs), and even if you didn't use them, you could inline memcpy - so while GCC didn't do the best here, another compiler or language could. Also, for smaller structs, I believe GCC will just push the bytes onto the stack, without using memcpy.
The upside of this is that once you're in the function, you can access all the elements of the struct by just grabbing them off the stack. This is a single mov relative to ebp.
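To make the byval case concrete, here's a minimal sketch of the kind of test described above. The struct layout and the names Big, make_big and sum_byval are made up for illustration; they are not the code that was actually compiled for this comment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-byte struct: 32 fields of 4 bytes each. */
typedef struct {
    int32_t values[32];
} Big;

/* By value: the caller copies the whole struct into the callee's
   argument area, so the callee works on a private copy. */
int32_t sum_byval(Big b)
{
    int32_t total = 0;
    for (int i = 0; i < 32; i++) {
        total += b.values[i];   /* typically a direct load from the frame */
    }
    b.values[0] = -1;           /* changes only the local copy */
    return total;
}

/* Helper that fills the struct with 0, 1, ..., 31. */
Big make_big(void)
{
    Big b;
    assert(sizeof(Big) == 128); /* sanity check on the assumed size */
    for (int i = 0; i < 32; i++) {
        b.values[i] = i;
    }
    return b;
}
```

Calling sum_byval(make_big()) copies all 128 bytes for the call; how that copy is done (memcpy, string instructions, or plain moves) is up to the compiler and target.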
Now to compare, using the byref passing. I did the same thing as above but passed byref instead of byval. The generated assembly code has a MUCH lighter call - as expected it simply pushes the pointer on the stack.
However, looking inside the function, it takes TWO movs to make each struct access instead of one. That's one mov to get the pointer off the stack into a register, and another mov to dereference the register location and access the data inside. This makes byref very slow if you're accessing struct elements a lot, especially inside a "tight loop".
Now GCC with no optimisation did not realise that it was re-loading the same pointer into a register each time. In other words, every time it accessed a struct member, it would first load the struct pointer off the stack into eax, and then proceed to dereference eax. Then the next access, it would reload from the stack into eax.
Obviously, when you optimise (using -O2 for example), it will realise it's wasting time and only load it once. If it's doing this, then the performance hit for using byref is negligible - so once again it depends on your compiler and optimisation level.
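As a rough sketch of the byref case (again with made-up names, Big and sum_byref, and an assumed 128-byte layout), only a pointer is passed and every field access goes through it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-byte struct: 32 fields of 4 bytes each. */
typedef struct {
    int32_t values[32];
} Big;

/* By reference: the call passes one pointer. Each access below
   needs the pointer in a register before the field can be read.
   const documents that the callee does not modify the caller's data. */
int32_t sum_byref(const Big *b)
{
    int32_t total = 0;
    assert(b != NULL);
    for (int i = 0; i < 32; i++) {
        total += b->values[i];  /* load via the pointer */
    }
    return total;
}
```

Compiling this with gcc -S -O0 and again with gcc -S -O2 and comparing the two assembly listings is one way to see whether the pointer gets reloaded on every access; the exact output will vary with compiler version and target.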
But this doesn't work in the general case, unfortunately. Remember, 32-bit x86 only has 8 general purpose registers, and esp (and usually ebp) are tied up managing the stack, which leaves about 6 for everything else. So in many (most?) cases, it will not be possible to hold a pointer in a register for very long. This means that in a practical situation, it may be necessary (even with optimisation) to keep reloading that pointer off the stack into a register.
The bottom line is: byref makes it slower INSIDE the function, on a per-access basis. byval makes it slower TO CALL the function, based on the size of the struct. There is really no general way to say which is better - the keen programmer will optimise on a case-by-case basis.
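If you want to check a particular case rather than guess, a crude timing harness along these lines is one option. Everything here (Big, the first_plus_last functions, time_byval, time_byref) is hypothetical, and an optimising compiler may inline the calls and erase the difference entirely, so treat the numbers as a hint, not a verdict.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical 128-byte struct: 32 fields of 4 bytes each. */
typedef struct {
    int32_t values[32];
} Big;

/* Two versions of the same tiny function: few accesses, big argument. */
int32_t first_plus_last_byval(Big b)
{
    return b.values[0] + b.values[31];
}

int32_t first_plus_last_byref(const Big *b)
{
    return b->values[0] + b->values[31];
}

/* Time `calls` by-value calls. Returns elapsed CPU seconds and stores
   the accumulated result in *sink so the work is not trivially dead. */
double time_byval(const Big *src, long calls, int64_t *sink)
{
    int64_t acc = 0;
    assert(src && sink);
    clock_t start = clock();
    for (long i = 0; i < calls; i++) {
        acc += first_plus_last_byval(*src);  /* copies the struct per call */
    }
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Same loop, passing only a pointer on each call. */
double time_byref(const Big *src, long calls, int64_t *sink)
{
    int64_t acc = 0;
    assert(src && sink);
    clock_t start = clock();
    for (long i = 0; i < calls; i++) {
        acc += first_plus_last_byref(src);
    }
    clock_t end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Run both with a large call count at the optimisation level you actually ship with, and prefer a real profiler for anything beyond a quick sanity check.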
mystilleef replied ago:
Don't these suggestions fall under premature optimization? Isn't using profilers the best way to optimize a program? In his last point, I'm surprised he didn't mention using asynchronous I/O libraries, as opposed to synchronous ones.