Performance is a very complex issue because it is a composite of many different things. All companies figure out ways to look at those things which make them appear better than the competition -- the trick is to figure out "who is lying" (see marketing) the most.
Intel has a long proud history of being the best at "stretching" the truth -- in a very big way. Now they aren't lying, exactly, but users are highly unlikely to get performance near what they claim in the real world. Others are not saintly, and use some of these techniques -- but they do so to far lesser degrees.
So read on to understand how benchmarks can get so blown out of proportion.
The easiest way to exaggerate your performance is to test your machine on a hot box. A hot box is some super-fast, super-custom machine that no one in the real world can afford, but which makes your processor look faster than the competition's.
All companies use hot boxes (high end machines) to benchmark with; the difference is one of degree. IBM and Motorola usually use high end machines that they sell. Intel usually makes custom motherboards (or even controller chips) that don't ever seem to be sold, or at best are never mainstream. While this does mean that the processor can theoretically perform that fast, it does not mean that yours (of the same speed) will ever come close.
I've seen some Intel benchmarks that used SRAM (high speed, expensive cache RAM) as main memory, or that used 4 way interleaved SDRAM (a way to speed up memory access) -- long before you could buy an off-the-shelf memory controller that supported such a thing. The P6 (Pentium Pro) was also benchmarked using a version of the chip that had a larger integrated cache than users could buy (almost a custom version of the processor itself). Intel did sell a few of them (just to be in compliance with regulations), but that flavor of P6 (with the 1 Meg integrated cache) was never produced in volume, it cost nearly twice as much as the ones users got, and most of them were delivered late in the product's life-cycle (but benchmarked at the beginning). Basically, it was not the processor that 99% of the users bought (they had either 256K or 512K cache) -- and so not the performance that users were going to get. There are lots of little Intel tricks like that.
So most companies may be exaggerating their performance (by 5% or 10% from the norm) by using hot boxes, or boxes that are high end hardware (but that people can actually buy) -- but at least they are fairly close to their claims. Intel on the other hand is often off by 10%, 20% or even 30% -- easily. There are rumors of them tuning their processors to run benchmarks faster (which wouldn't be a surprise). Intel definitely does not test with off-the-shelf Dell or Compaq machines (even high end ones) -- those machines would rate far below what Intel claims. So in some ways the users are defrauded into believing that their machines run far faster than they actually do. But since users are only comparing performance to an older machine, which also had exaggerated claims, they will notice the improvement and never know the absolute truth.
So I always reduce Intel's Spec numbers by 25-30% just for their hardware exaggerations alone. And this is only one small technique they use to trick people into thinking they are faster than they are.
Most Apple machines seem to fall below the top benchmark numbers by closer to 5%, rather than the 20-40% on the Intel side. This is because Apple produces the whole System, and most users are more into buying quality instead of buying the cheapest. (The exceptions were some older low-end Performas, which were designed for price more than performance.) IBM and Motorola don't want to ruin their reputations with fraudulent claims, and must assume that their customers are smarter than Intel's customers or PC magazine reporters (who have a vested interest in perpetuating the fraud).
Intel uses a special compiler called the "Intel Reference Compiler". This is a compiler that no one really uses for writing commercial software -- mainly because it is "tuned" for Spec (a standard benchmark suite). Now Intel would go ballistic at that wording (it might violate the rules for Spec and imply fraud) -- so let's just say that it is "tuned" for things that are done in the Spec benchmarks, and for little else -- but the point is that this compiler is really only good for making Intel processors look better than they really are. One of the techniques they used was a sophisticated multi-pass dynamic profiling (tuning) technique (2). Basically, this gave Intel at least a 10-15% boost in performance, one that most other people can't afford to get, or won't spend the time to get. Partly, others don't need to do it, because the complexities of writing really fast x86 code may have surpassed those of RISC code when it comes to scheduling and so on -- or in other words, you are more likely to need such things in the Intel world.
(2) Basically, they compile the code once with lots of special stubs -- the compiler runs the code and profiles (analyzes) it as it is running to see if it is doing things that are used in the Spec suite of benchmarks. When it sees those things, it places a marker, and then the code is compiled again. On the second pass the compiler replaces your "tagged" code with highly optimized "hand tuned" assembly code to make it faster (basically, it rewrites your code for you, if you are doing things that are in the Spec suite). This takes a ton of time, and is only useful for a few things (like cheating at benchmarks), but hey, that's Intel.
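To make the two-pass idea concrete, here is a toy sketch in Python. It is only a model of the flow described in the footnote (instrumented run, mark the hot spots, rebuild with tuned replacements), not how Intel's compiler or any real compiler works. The routine names, the `TUNED` table, and the `hot_threshold` value are all made up for illustration.

```python
from collections import Counter

def slow_sum_of_squares(n):
    """Straightforward loop: stands in for ordinary compiled code."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def tuned_sum_of_squares(n):
    """Stands in for a hand-tuned replacement: closed form for 0^2 + ... + (n-1)^2."""
    return (n - 1) * n * (2 * n - 1) // 6

# Hypothetical table of hand-tuned replacements the toy "compiler" knows about.
TUNED = {"sum_of_squares": tuned_sum_of_squares}

def profile(workload, routines):
    """Pass 1: run the workload through counting stubs and record call counts."""
    counts = Counter()

    def instrument(name, fn):
        def stub(*args):
            counts[name] += 1
            return fn(*args)
        return stub

    instrumented = {name: instrument(name, fn) for name, fn in routines.items()}
    workload(instrumented)
    return counts

def rebuild(routines, counts, hot_threshold):
    """Pass 2: swap in tuned code for routines the profile marked as hot."""
    return {
        name: TUNED[name] if name in TUNED and counts[name] >= hot_threshold else fn
        for name, fn in routines.items()
    }

def workload(r):
    """Hypothetical benchmark-like workload that hammers one routine."""
    return [r["sum_of_squares"](k) for k in range(1, 50)]

routines = {"sum_of_squares": slow_sum_of_squares}
counts = profile(workload, routines)
optimized = rebuild(routines, counts, hot_threshold=10)
```

The rebuilt program gives the same answers, only faster, but only for routines that happen to be in the tuned table. Code that does anything else gets nothing out of the extra pass.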
While it is technically "legal" to have that kind of optimization, it is completely impractical to use it for almost anything other than specialized benchmark tuning -- it takes a lot of work (cost) to create (so no commercial App writers can afford to do it), and it only seems to work well for doing very specific things (like benchmarks). This is why almost no one uses the Intel Reference Compiler for writing apps -- not to mention that it has a lousy interface and horrid usability, and is very, very slow to use. Since Application writers don't use it, you will never see that "potential" performance. So while it is true that the processor is theoretically that fast, it is completely false that your applications will ever run that fast. Using traditional compilers which are less optimized in general (like Microsoft's), and given average development budgets (which don't usually include lots of time for specialized "tuning"), I would expect to see 15-20% less performance than Intel claims (at least).
Apple, IBM and Motorola all have high performance compilers as well. But all of them are far more likely to be used commercially. Motorola's and Apple's compilers are plug-ins for Code Warrior, and they are used in a lot of programs. IBM, and their clients, use their compiler as well (though most Mac users don't use IBM's compiler). Motorola also released a bunch of optimized math libraries, so that ALL Application developers can use those tuned libraries. Those libraries are more generic, and so they speed up more than just Spec. The point being that PowerPCs are far closer to achieving their theoretical maximum than Intel processors.
Pentium Optimized Code
Each processor requires the code to be "optimized" (tuned) for that machine. If you do not do this you can get a performance hit (loss) of anywhere between 5-30% (with most being in the 10-15% level).
But herein lies the problem -- you have to tune for each processor, but which processor do you tune for? The problem is made worse by the size of the PC market, and the number of different processors (that vary greatly in architecture) -- legacy is a millstone that holds you back.
If you have 100 million users that are using a pre-Pentium machine, then when do you come out with Pentium-optimized code? Do you wait until after ten thousand Pentiums are sold, or after 10 million, or do you wait until they are half of your users (years after introduction)? When the App does change, all of the older 486 users are going to see a performance drop (say a penalty of 5-15%) -- and they will not be happy with your "upgrade" -- and the magazines will show big charts of how much slower your new version of your application is, and so on. So App writers are slow to change optimizations.
The size of the PC marketplace (legacy) means that developers move far more slowly (they wait for their user base to catch up to any processor). So there is probably a 2-3 YEAR lag between processor release and when many Apps are optimized for that processor (at least). But by then Intel has moved on to the next flavor of processor (like the Pentium Pro, Pentium with MMX, or PentiumII). And Intel has to keep moving and changing things or the clone-chip makers will catch them. So almost every program is really optimized for the previous version of the processor, meaning that new processors are really going to be running Applications well below their rated performance. Only as the processors are becoming outdated (and replaced) will the new Apps be tuned for them, meaning that things are always slower than they could be on the newest processors.
But wait, there's more. Because there are 400 flavors of the x86 processor, it is even harder to pick well. You can't choose the PentiumII, because its scheduling may cause a performance hit for the Cyrix, AMD, other companies' chips, or other Intel processors. And imagine trying to tune for something like MMX, when Intel is going to have multiple flavors of that (already two slightly different flavors, a bigger change is coming soon, and more to come). The number of choices works against developers, and against achieving theoretical performance. So most don't really tune for any one processor at all -- they go for a lowest common denominator (some compromise that is sorta half tuned for a few different processors).
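The trade-off a developer faces can be sketched in a few lines of Python. Every number here is hypothetical (the installed-base shares, the penalty table, even the idea that a "blended" tuning exists as a single option); the only point is to show why a compromise target can win on average while still leaving the newest processor short of its rated speed.

```python
# All numbers below are hypothetical, chosen only to illustrate the trade-off.
INSTALLED_SHARE = {"486": 0.40, "Pentium": 0.35, "Pentium II": 0.10, "clone": 0.15}

# PENALTY[target][cpu]: fractional slowdown seen on `cpu` when the code
# is tuned for `target`.
PENALTY = {
    "486":        {"486": 0.00, "Pentium": 0.10, "Pentium II": 0.15, "clone": 0.05},
    "Pentium":    {"486": 0.10, "Pentium": 0.00, "Pentium II": 0.10, "clone": 0.10},
    "Pentium II": {"486": 0.15, "Pentium": 0.10, "Pentium II": 0.00, "clone": 0.20},
    "blended":    {"486": 0.05, "Pentium": 0.05, "Pentium II": 0.08, "clone": 0.06},
}

def average_penalty(target):
    """Slowdown averaged over the installed base, weighted by market share."""
    return sum(INSTALLED_SHARE[cpu] * PENALTY[target][cpu] for cpu in INSTALLED_SHARE)

# The developer picks whichever tuning hurts the installed base least overall.
best_target = min(PENALTY, key=average_penalty)

for target in PENALTY:
    print(f"tuned for {target:<10}  average penalty {average_penalty(target):.1%}")
print(f"chosen target: {best_target}")
print(f"penalty left on the newest chip: {PENALTY[best_target]['Pentium II']:.0%}")
```

With these made-up figures the compromise build comes out ahead on average, and tuning for the newest chip comes out last, because so few users own one yet.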
Remember, this doesn't only apply to Applications, but to Microsoft and the Windows Operating System itself. Some of the performance drops can be additive (Apps slower + OS slower + Drivers slower = computer far slower). So once again, the performance that Intel (or others) claim is assuming "optimized" code -- which programmers can't do as well in the PC world.
While this optimization (instruction scheduling/tuning) problem exists in the PowerPC world as well, it is far less of a problem for a variety of reasons:
Some of this problem of App tuning can be reduced by using Unix, if you are compiling your own Apps for each processor. But then Unix compilers may not have the best optimizers either (which can mean a pretty big performance hit as well), and many people are getting pre-compiled Unix code nowadays (with compromise optimizations).
Have you ever noticed that the benchmarks are not run on Microsoft Windows95 or 98? Most of the time they use a nice, clean, stripped-down version of Unix, to make things go as fast as possible. This is partly because Windows is a big, slow, bloated pig of an Operating System -- but it is also the Operating System that most people are using. So when people see the theoretical performance of "Spec", that only applies to the 2-4% of PC users that are running Unix. Most will probably see anywhere from a 5% - 15% performance drop for using Windows (and it is probably worse with Win95).
PowerPCs are also usually benchmarked using Unix. The difference is that Mac OS X is actually going to be built on Unix. The flavor of Unix that it is being built on is likely to be faster than the ones that things are benchmarked on right now (since Apple is tuning the scheduling). There also seems to be less of a performance difference between Unix and MacOS than between Unix and Windows when running benchmarks (the Mac is a tighter OS). So Apple is not only getting closer to actual performance today, but is likely to be right-on or even faster tomorrow.
16-Bit vs. 32-Bit
Remember that legacy nightmare on PCs? It is much worse than you think. That backwards compatibility thing holds everything back. Most people are using Windows95 (or Windows98). Windows9x still has a whole lot of 16 bit stuff buried in there, and there are 16 bit drivers, and many 16 bit Applications. Those performance ratings are all assuming nice clean 32 bit programs and Operating Systems. So once again, the real world and the benchmarks are two different things. In fact, so much so that while the Pentium Pro (P6) was twice as fast as the Pentium, it was actually the same speed (or possibly slower) when running 16 bit Apps. But remember, that ugliness and performance drop can get you even when running 32 bit Apps on Win95 (because Win95 itself is not 32 bit clean). So for most people the PentiumPro was actually SLOWER than the regular Pentiums -- despite performance claims that it was up to twice as fast.
So if you are not running WinNT and Win32 Apps only, then you can expect another performance penalty from 5-50% (usually towards the lower end of the scale).
There is no such thing as 16-bit (segmented) mode on the PowerPC, nor any of the other quirky modes of the x86. So there is no way for the PowerPC to take that performance hit.
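To get a feel for how these discounts stack up, here is a back-of-the-envelope sketch in Python. The ranges are the rough ones quoted so far in this article (hardware, compiler, Windows, 16-bit code). Treating them as independent and multiplying them together is my simplifying assumption for the sketch, not a measurement.

```python
# Illustrative only: compound the article's rough discount ranges.
# Treating the penalties as independent and multiplicative is an assumption.

def remaining_fraction(penalties):
    """Fraction of claimed performance left after applying each penalty in turn."""
    remaining = 1.0
    for p in penalties:
        remaining *= 1.0 - p
    return remaining

# (low, high) penalty estimates quoted in the article
DISCOUNTS = {
    "hot-box hardware": (0.25, 0.30),
    "benchmark-only compiler": (0.15, 0.20),
    "Windows instead of Unix": (0.05, 0.15),
    "16-bit code on Win9x": (0.05, 0.50),
}

best_case = remaining_fraction(low for low, _ in DISCOUNTS.values())
worst_case = remaining_fraction(high for _, high in DISCOUNTS.values())

print(f"best case:  {best_case:.0%} of claimed performance")
print(f"worst case: {worst_case:.0%} of claimed performance")
```

Even taking the low end of every range, the compounded result lands well under the claimed figure, which is the whole point: no single trick is enormous, but they multiply.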
Application benchmarks are where you compare one Application on one Platform to the same Application on another. These are the most important comparisons since they are how users work, and what users are trying to do.
Intel got tired of being whipped in these Application benchmarks -- so they paid to hand tune some key "benchmarking" programs and write key parts in Assembly-Language (faster but something most companies can't afford to do any more). So Intel tuned just for MMX, all so that it would look better compared to PowerPC's. They did this with Adobe Photoshop (and I believe one or two other Apps).
This means that Adobe Photoshop is anomalously fast compared to other PC programs because of the tuning. So it is a hot-app for the following reasons:
So this is just another case of Intel tilting the table in their favor (and slightly defrauding the public). Yet we all know the results. Even with Photoshop being biased in favor of the Pentium, it still runs twice as fast (or more) on the PowerMacs. There are a few small places where that investment paid off and it is as fast, or even a tad faster, but those are rare indeed. In fact, when Intel saw the "Toasted Bunnies" commercial, they got mad about getting crushed in ByteMarks, so Intel tried to use their tuned Photoshop Filters to do a benchmark. Even using specially tuned routines, at special sizes, the Mac was more than twice as fast in composite tests. And if it wasn't for Intel's biasing, the Pentium would be running even slower.
Of course, Photoshop isn't the only Application they've ever done this with -- they used to compare Office on the Mac to Office on the PC, because Office on the Mac actually ran with this big, ugly, fat Windows emulator underneath it (thanks Microsoft) -- even Office'98 still has some of that bloated pseudo-emulator in there.
MMX itself is somewhat fraudulent. It does help performance for certain things -- that performance is real. But the way it is presented may be more fraud than truth. Remember, most Apps don't use MMX. So while it may make some Apps seem faster (according to certain benchmarks), it may not apply across the board (only working in certain conditions).
Remember that Photoshop cook-off between the Mac and MMX tuned Pentium code? Not only is the Mac twice as fast, but the PC is only as fast as it pretends to be on a few specific filters, and only if those filters are done with very specific parameters. For example, if you do a Gaussian Blur with less than a 4 pixel diameter (fairly common), it runs as fast as demonstrated. But if you do a Gaussian Blur with a diameter of more than 4 pixels, the MMX won't be used (it only works with small sizes) -- and the Pentium is less than 1/2 as fast as it was before. The Mac doesn't take that performance hit. It is like this with most of the "filters" and specialty things that Intel demonstrates MMX with -- sure, it works for SOME things. Most users will see little if any difference with MMX, except in very rare cases -- which Intel plays up and tries to sell as "the norm".
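The effect of a size-limited fast path is easy to model. The Python sketch below is not Photoshop's code or Intel's; the costs are arbitrary units I picked so that the fallback is a bit worse than half speed, and both workloads are invented. The article only says "less than" and "more than" 4 pixels, so sending a diameter of exactly 4 down the slow path is also an assumption.

```python
# Hypothetical timings, in arbitrary units per call; only the shape matters.
FAST_PATH_MAX_DIAMETER = 4   # cutoff quoted in the article for the tuned routine
FAST_COST = 1.0              # assumed cost when the tuned routine applies
GENERIC_COST = 2.5           # assumed cost of the fallback (less than 1/2 as fast)

def blur_cost(diameter):
    """Cost of one blur call: tuned path for small sizes, generic path otherwise."""
    if diameter < FAST_PATH_MAX_DIAMETER:
        return FAST_COST
    return GENERIC_COST

def average_cost(diameters):
    """Mean cost per call over a list of blur diameters."""
    return sum(blur_cost(d) for d in diameters) / len(diameters)

demo_workload = [2, 3, 3, 2]            # what a vendor demo might run
mixed_workload = [2, 3, 6, 10, 16, 3]   # hypothetical everyday mix

demo = average_cost(demo_workload)
mixed = average_cost(mixed_workload)

print(f"demo workload:  {demo:.2f} units per blur")
print(f"mixed workload: {mixed:.2f} units per blur")
```

A demo that stays under the cutoff only ever measures the fast path. Any workload that strays outside it pays the generic cost, and the average drifts away from the demo number.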
ByteMarks vs. SpecMarks
This is a big debate among geeks. Basically, Spec is a far better benchmark -- in theory. They are both just a collection of sample "things" to do -- but Spec does more things. It is also more "controlled" in how tests will be run, and so on. But it may be those "controls" that allow for the impractical results. So in theory Spec should be more reflective of overall performance -- too bad the theory doesn't hold water.
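Since both suites boil a collection of tests down to one number, it helps to see how a composite score behaves. The sketch below assumes the composite is a geometric mean of per-test speed ratios against a reference machine, which is a common convention for benchmark suites. The ratios themselves are invented for illustration.

```python
import math

def composite(ratios):
    """Geometric mean of per-test speed ratios (test machine / reference machine)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical per-test ratios for a four-test suite.
untuned = [1.0, 1.0, 1.0, 1.0]      # machine matches the reference on every test
one_tuned = [1.0, 1.0, 1.0, 16.0]   # same machine, one test heavily tuned

print(f"untuned composite:   {composite(untuned):.2f}")
print(f"one test tuned:      {composite(one_tuned):.2f}")
```

In this made-up case, tuning a single test hard enough doubles the headline number even though three quarters of the suite runs no faster at all.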
Being a better benchmark in theory doesn't seem to matter, for all the reasons mentioned throughout this article. Intel taints the benchmarks by only running them in completely impractical conditions, with special compilers, on special hardware, with OS's that people aren't using, and so on. Intel may even have tuned their processor a little to make it faster especially for Spec. So while Spec shows theoretical processor performance (in a perfect world), that is irrelevant to all users -- what users care about is how a processor performs in their world. Compare the following:
So even though Spec is theoretically a more full-featured, and more pure, testing suite, it isn't one in real life. The only thing Spec seems to show is relative performance, or how well Intel can taint benchmarks. Best case, Spec is just a theoretical measurement of how fast Intel can make their processor go, if it wasn't in a system that you had to use, and in a purely academic way -- but that is completely unreflective of how fast your machine will run.
ByteMarks, while being a smaller and simpler suite, is a better measurement, probably just because it has not been the focus of Intel's tuning efforts (it is lower key)... at least up until recently. I don't doubt that Intel will figure out some ways to start polluting the facts for ByteMarks as well. So when the rubber hits the road (and I am working in the real world), I can compile a test to see how things work in the real world with ByteMarks. I've done a lot of comparative compiles and benchmarks (with our own apps, and many application fragments), and the tests almost always reflect a 2x Mac advantage (for processor intensive stuff). Commercial Apps that I run reflect a 2x Mac advantage (for processor intensive stuff). My Mac feels far faster than my PC (for processor intensive stuff). ByteMarks reflect a 2x Mac advantage (for processor intensive stuff). The only test that does NOT reflect a 2x Mac advantage is one benchmark that I have no reason to run in the real world. So what should we believe? That everything else is wrong, and Spec is the only true test? Or that despite being a well done and more thorough benchmark, Spec has been so polluted by Intel's marketing techniques, and the measurement is so impractically specialized, that ByteMarks are a better general measure of real world performance?