Regarding my post the other day about delaying an object copy until it is required, a friend asked me to show my working. So here is the "proof". The scenarios are somewhat contrived, as I didn't put much effort in initially; I felt the tip was self-explanatory.
Overview
The basic setup for the test is a simple function that takes either a by-value object, to force the copy, or a const reference, as the most efficient non-pointer implementation. The object has a method called isValid which returns true if one of its attributes is greater than zero. This attribute is set in the constructor and therefore remains constant over the life of the object.
To give the object something to do whilst in the function, there is a second method called update, which increments a different attribute by one. Because update is non-const, it also enforces the requirement that a non-const object exist at some point: in the const-reference version a copy has to be made before update can be called.
Objects
There are three types of objects: a simple object, a string object and a complex object. Each one has different members, making the copy constructor increasingly complex.
The Simple Object has eight 32-bit unsigned int attributes and nothing else. The String Object has the same eight attributes as the Simple Object plus five empty STL string objects. Finally, the Complex Object has the same attributes as the String Object as well as three dynamic byte arrays, each allocated to 100 bytes on construction, and three STL vectors of differing types. The Complex Object has a custom copy constructor to copy the dynamic byte arrays.
Figure: Simple object
Figure: String object
Figure: Complex object
There were two basic tests: always exit early and never exit early. There were also two builds: a Debug build and a Release build with "full optimisation" that favoured neither speed nor size. Both were compiled using the Microsoft Visual Studio 2005 compiler.
These are the release-full compiler switches: /Ox /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D  "_UNICODE" /D "UNICODE" /FD /EHsc /MD /Fo"Release-Full\\"  /Fd"Release-Full\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP  /errorReport:prompt
Each test was placed inside a loop and executed 50,000,000 times so that the timing code would return values greater than zero. The timing code had a resolution of one second; finer granularity wouldn't have made any real difference.
Figure: test code
Hardware/Software Configuration
The computer details are:
- Microsoft Windows XP Professional, Version 2002, SP2
- AMD Athlon 64 X2 Dual Core 4200+ (2.21GHz)
- 2GB 800MHz DDR3 RAM
The machine had a fair few other applications open at the same time, but none had focus aside from the executing test. Naturally, the other applications were still waking and sleeping from time to time; while they may have affected the absolute run times, they won't change the relative outcome of the test.
Results
Here are the results. Numbers indicate elapsed time in seconds, as the finest granularity I used was one second. The zero results are, more than likely, caused by the optimiser realising that the isValid method always returns the same value (always true or always false, since the attribute it tests never changes) and removing that check entirely. In the exit-early scenarios this collapses the function to nothing.
The simple object in the always-copy scenario ends up being "zero time", I believe because the compiler knew the number of iterations in advance and the work in update was trivial (an integer increment). This means it could calculate the end result without running the loop at all.
Irrespective of the poor test setup, it is blindingly obvious that the const-reference scenario wins as soon as the object being copied becomes non-trivial.
Figure: results
Conclusion
There you have it: const-reference parameters with late copying are faster than copied by-value parameters for anything but simple data objects. They are marginally slower when the late copy does need to occur, but the extra cost is roughly that of passing a four-byte reference (a pointer on a 32-bit build) on top of the copy itself. This can be offset by using the const-reference object for as long as possible, which gives the compiler more room to optimise: const objects are easier to optimise than non-const ones.
If I remember Java correctly, this applies to Java too. I don't know whether it applies to C#; I suspect that C# does some of this in the background.