The Revolution

By:David K. Every
©Copyright 1999


The further we look into the future, the less is known. Remember, these numbers are highly speculative (varying from reliable rumor to wild-assed guessing). I would expect these processors some time between mid 2000 and the end of 2000 -- but Motorola has been missing some schedules (things seem to be slipping a bit). They are going to need something to compete with Merced, McKinley and bumps to the x86 core.




Clock speed:      800 MHz+?
Transistors:      20 million+?                        |  50 million+?
Process:          .15µm                               |  .15µm
Die size:         100 mm²?                            |  200-400 mm²?
Cache:            32K x 2 L1, integrated L2 (128K+?)  |  L0 (32K x 2) + L1 & L2, off-board L3?
Execution units:  7 - 10?                             |  depends how you count
Availability:     mid to late '00?                    |  late '00 to early '01

The G5 is supposed to have a redesigned core with a full 64 bit implementation of the PowerPC ISA (Instruction Set Architecture). This will include 64 bit addressing and integer units (the PowerPC already supports 64 bit floating point and a 128 bit vector unit).

Someone asked if 64 bit integers would help integer performance. The answer is probably not much at all. Superscalar 32 bit is pretty close (in performance) for most integer memory operations, and AltiVec can do 128 bit memory moves much faster. Most programmers who need more range than 32 bit ints can provide can just use 64 bit floating point. So the difference between 64 bit integers and 64 bit floating point (or 128 bit AltiVec) is probably nominal for most things. In fact, odds are that the larger 64 bit address space (when used) can slightly degrade performance for some things, since pointers double in size and take up more cache -- so I think it will probably be a wash (just more addressing space -- not more speed).
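One caveat on using floating point in place of bigger integers, sketched below in Python (whose floats are 64 bit IEEE 754 doubles): a double has a 53 bit significand, so it holds whole numbers exactly only up to 2^53. The helper name is made up for illustration, and this is a quick check rather than a full treatment.

```python
# A 64 bit double has a 53 bit significand, so every integer up to
# 2**53 is represented exactly; beyond that, gaps start to appear.
EXACT_LIMIT = 2 ** 53


def survives_float_round_trip(n: int) -> bool:
    """Return True if n is unchanged after converting to a double and back."""
    return int(float(n)) == n


print(survives_float_round_trip(2 ** 31))          # well past 32 bit signed range
print(survives_float_round_trip(EXACT_LIMIT))      # last value before gaps appear
print(survives_float_round_trip(EXACT_LIMIT + 1))  # rounded to a neighbor
```

So the floating point route covers most "I need more than 32 bits" cases, but not ones that need every last bit of a 64 bit integer.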

Some other people mistakenly think that a 64 bit version of the PowerPC will break most Mac apps. I doubt it. I think the way it will be implemented will allow it to work just like the older 32 bit versions. So programmers will probably be able to devote a separate 32 bit environment to each Application/Process, and only worry about full 64 bit addressing in a few Apps or in the OS itself (if necessary). Meaning compatibility, plus features. But I do not have the details on this.

I would expect some minor improvements to the AltiVec engine (maybe even dual issue ALU if they feel they have space to burn -- but I doubt it). And there should be some boosts to both the Integer and FP engines.

This core is going to include some changes to the instruction set, partly to support 64 bit -- IBM and Motorola are working on ways to do this that differ from the original ways in the PPC spec. Since the PPC camp's goal is to keep processors low(er) power, I doubt they will go crazy and add a lot of execution units (even though they could). Some moderate changes wouldn't be a surprise -- maybe another execution unit or two -- but nothing drastic.

Another instruction set change I would expect is some added predication. This allows both sides of a branch or conditional to be executed at once (with the unused result thrown out). There are already some limited forms of this in the PPC, but I would expect a few more. Predication allows for more parallelism, assuming you have enough execution units to keep busy -- so predication and more units seem to go hand-in-hand.
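To make the idea concrete, here is a rough sketch in Python of the difference between branching and predication. It only models the concept (real predication happens in hardware, per instruction), and both function names are invented for the illustration.

```python
def branchy_abs_diff(a: int, b: int) -> int:
    """Conventional style: a branch decides which side runs at all."""
    if a > b:
        return a - b
    return b - a


def predicated_abs_diff(a: int, b: int) -> int:
    """Predicated style: run both sides, then keep one via the predicate."""
    p = int(a > b)        # the predicate, 1 or 0
    taken = a - b         # "then" side, always computed
    not_taken = b - a     # "else" side, always computed
    # Select one result; the other is simply thrown away.
    return p * taken + (1 - p) * not_taken


print(branchy_abs_diff(7, 3), predicated_abs_diff(7, 3))
print(branchy_abs_diff(3, 7), predicated_abs_diff(3, 7))
```

The predicated version does more total work but has no branch to mispredict, which is why it only pays off when there are spare execution units to absorb the extra work.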

The change that is a given is deeper pipes! Motorola is not stupid, and the shallow and simple pipes of the G3 (and the G4 and earlier PPCs) are limiting the MHz that the processor can be run at. It is hard to fight the hype of MHz = Performance -- so they will stop fighting it. A deeper pipe will mean a significant bump in MHz. You aren't always getting as much work done per cycle with simpler stages (deeper pipes) -- but you can often get so many more cycles that you can still come out ahead. And good analysis and design can yield nice performance improvements.
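The tradeoff can be put into a back-of-the-envelope model: run time is instructions, divided by instructions per cycle, divided by clock rate. The sketch below uses that relation; the IPC and MHz figures are made-up placeholders to show the shape of the argument, not measurements of any real chip.

```python
def run_time_us(instructions: int, ipc: float, mhz: float) -> float:
    """Microseconds needed to retire `instructions` at a given IPC and clock."""
    cycles = instructions / ipc
    return cycles / mhz  # a clock of N MHz delivers N cycles per microsecond


# Hypothetical numbers: the deeper pipe loses work per cycle but gains clock.
shallow = run_time_us(1_000_000, ipc=1.5, mhz=450)
deep = run_time_us(1_000_000, ipc=1.2, mhz=800)

print(f"shallow pipe: {shallow:.0f} us")
print(f"deep pipe:    {deep:.0f} us")
```

With these placeholder values the deeper pipe finishes sooner despite doing less per cycle; shrink the clock gain or grow the IPC loss and the result flips, which is the design gamble.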

Another highly likely inclusion is on-chip L2 cache. Intel and AMD are moving to on-chip L2 caches. This has some small performance and cost/performance advantages (today) -- more needed in the PC world. On-chip L2 is usually a smaller cache (say 128K versus 512K to 1 MB) -- but it runs at higher speeds, making the two close to equal (better for some things, worse for others). In a few ways on-chip L2s are actually more limiting than off-chip L2s, since you can't bundle the same processor in as many different ways, it cuts yields on the chips, and so on.

So overall the on-chip L2 has some small cost and performance advantages (today) that may not be a huge win on their own -- but there are other issues, and this core is designed for tomorrow (next year+). The biggest issue is that with many of the big players going to on-chip L2s, the long term market for L2 cache chips could dry up, or suppliers could start demanding a premium for those chips. Since either of these things would hurt the PowerPC camp (or anyone not having on-chip L2) and change the cost-performance equation (against chips without on-chip cache), this is a more important move for the future than it might seem.

Even if the L2 cost equation doesn't change in the future, you can still use off-chip L3 cache to gain even more performance -- and the growing gap between processor speed and memory speed is justifying more layers. So on-chip L2 is really a low-risk insurance policy that can give some performance gains (in the future) as well.
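The "smaller but faster versus bigger but slower" claim can be sanity-checked with the standard textbook average-memory-access-time formula. That formula is my addition here, not something from the chip vendors, and every latency and miss rate below is an invented placeholder.

```python
def avg_access_cycles(l1_hit: float, l1_miss_rate: float,
                      l2_hit: float, l2_miss_rate: float,
                      memory: float) -> float:
    """Average cycles per memory access for an L1 + L2 + main memory hierarchy."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * memory)


# Hypothetical on-chip L2: small (misses more often) but fast to reach.
on_chip = avg_access_cycles(l1_hit=1, l1_miss_rate=0.05,
                            l2_hit=6, l2_miss_rate=0.30, memory=100)

# Hypothetical off-chip L2: bigger (misses less often) but slower to reach.
off_chip = avg_access_cycles(l1_hit=1, l1_miss_rate=0.05,
                             l2_hit=15, l2_miss_rate=0.20, memory=100)

print(f"on-chip L2:  {on_chip:.2f} cycles per access")
print(f"off-chip L2: {off_chip:.2f} cycles per access")
```

With these placeholders the two come out nearly even, and nudging the miss rates either way (a workload that fits in the small cache, or one that doesn't) tips the result toward one side or the other.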

Parallelism is the Key

There is a big debate about parallelism in design circles (what is the best way to get more work done at the same time).

Merced is trying to execute one stream of code, very fast, through the use of multiple inline streams and something called predication.

SIMD implementations (like AltiVec) are superior for many uses, and will be able to outperform Merced for many (most) special tasks.

The other way to do parallelism is to have multiple threads (mini-programs) running at the same time. Basically this is like having multiprocessing (multiple processors) in a single chip -- like IBM's GigaProcessor (a supercharged multicore PowerPC).

I suspect that Merced's parallelism will be able to outperform a single G4 or G5 core (sans-AltiVec) at the same MHz, by a reasonable amount (say 20-60% -- tending towards the lower). But then for many things AltiVec will likely beat IA64 -- and there are tradeoffs. Merced's approach will likely have a smaller cache, added complexity, wasted space for emulation, and tradeoffs that are likely to negate any Merced performance advantages. I do not think that Merced will be able to hold a candle to a dual core PowerPC, let alone 3 or 4 cores. I also think that it is easier to scale the multi-cored design than EPIC.

The problems with VLIW (EPIC) and compiler complexity have just not been solved (yet), despite years of work -- while we easily understand multiprocessor issues and there are already tools to take advantage of this style of coding. It will take years to work out the problems with complexity that the Merced will introduce. Plus, 99% of what Merced will be doing early on is just emulating x86 code (running legacy), while multicore PowerPCs will work with current applications "native".

All this leads me to believe that Merced will be left standing still at the starting gate. Not in marketing and sales -- Intel's size guarantees a certain degree of success, no matter how poor the product -- but in performance and usefulness I think it will fall far short.


There were rumored to be multiple-core flavors of the G4 in design -- but I think that may have been pushed out to the G5. Mainly, there has to be an MP Operating System in place for a while to see the true advantages of MP software on an MP processor. So there is some lead time -- but also a chicken and egg thing (do you create the OS first, or the processor?). I expect a few multiprocessor boxes, and then that will be the big hint that multiple cores are on their way.

Many people question whether MP is useful without lots of multithreaded applications first. The answer is yes, it can be. To justify a technology you really only need one or two key Apps to use multiple threads (cores) -- like Photoshop or rendering. That alone can justify the technology. And most people are running multiple Applications at once -- and MP can load-balance and keep all those threads running better overall. But to see the performance advantages in one single app, that App has to be threaded. Java is threaded, many parts of the OS are multiple threads, and some Apps can break themselves into multiple threads. So you will see performance returns.
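For a feel of what "that App has to be threaded" means, here is a minimal sketch of splitting one image-style job across worker threads using Python's standard `concurrent.futures` module. The function names and the brighten operation are invented for the example. Note that in standard CPython, threads will not actually speed up CPU-bound work like this; the point is only the shape of the code (split, run in parallel, stitch back together).

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(chunk, amount):
    """Brighten one slice of pixel values, clamping at 255."""
    return [min(255, p + amount) for p in chunk]


def brighten_threaded(pixels, amount, workers=2):
    """Split the pixels into one chunk per worker and process them in parallel."""
    size = max(1, -(-len(pixels) // workers))  # ceiling division, at least 1
    chunks = [pixels[i:i + size] for i in range(0, len(pixels), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        parts = list(pool.map(lambda c: brighten(c, amount), chunks))
    return [p for part in parts for p in part]


print(brighten_threaded([0, 100, 250, 255], amount=10))
```

An App written this way can use a second core the moment one shows up; an App written as one long loop cannot, no matter how many cores the chip has.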

For those who claim that RISC hasn't won, look anywhere in the computer world. RISC has totally taken over the embedded market, the controller market, the high-end market, and made a large dent in the mainstream market. Even Intel (the last holdout, with everything to lose) is giving in and updating their architecture with more and more RISC-like design.

We are just moving into an era of "post-RISC"... which is like RISC+. AltiVec isn't truly what people thought of (originally) when they thought of RISC.


I would not be surprised if, a little over a year from now, we are working with PowerPC processors that take a 2 - 4 times leap over the previous generation.

I'm interested in the new G5 core (ISA changes). I think they will give the PPC camp a lot more hype with the faster MHz, and some very nice performance improvements for a single thread. I would guesstimate something like a 10-20% improvement in performance due to changes in the ISA, and another 10 - 40% improvement from the changes in clock speed.

But I still think multiple cores are the longer term route to a lot more processing power. I expect we could see something like an 80% (or more) improvement because of dual cores -- and scaling to 3, 4 or 16 cores is possible.
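To see how these guesses stack up against a 2 - 4 times leap, here is the quick arithmetic, assuming the three gains are independent and simply multiply together (a simplification on my part; real gains overlap and interfere). The function name is made up for the sketch.

```python
def combined_speedup(*gains: float) -> float:
    """Multiply fractional gains together, e.g. 0.10 means 10% faster."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total


# Low end: 10% from the ISA, 10% from clock speed, 80% from dual cores.
low = combined_speedup(0.10, 0.10, 0.80)

# High end: 20% from the ISA, 40% from clock speed, 80% from dual cores.
high = combined_speedup(0.20, 0.40, 0.80)

print(f"low estimate:  {low:.2f}x")
print(f"high estimate: {high:.2f}x")
```

Under that assumption the estimates land at roughly 2.2x to 3.0x for a dual core part, which sits inside the 2 - 4 times range; going past two cores would be what pushes it toward the top.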

It would be great if all these changes could come at once -- but I suspect they would come out with single core first, then the multicore a few months after that. Of course Motorola and Apple could surprise us, and release a multicore version of the G4 -- but I doubt it will happen that soon.

I hear that, politically, Motorola's upper management is not convinced of multiple cores in the embedded space, and is being sort of daft about it. I also hear that at first Moto wasn't convinced of AltiVec, and it was Apple and IBM pushing them that got things started there too (Keith Diefendorff of Apple, and I forget the IBM designer's name, an Indian sounding name). But rumors are rumors -- I don't take them too seriously. I just hope, and believe, that intelligence will win out in the end. And that means multiple core versions of the PPCs. IBM is doing them with their Power4, and Motorola would be moronic not to cover their bets -- and it seems the best way to fight dramatically increasing design costs. So let's keep our fingers crossed. I think when we start seeing more MP machines out of Apple, the countdown to multiple cores will have begun.

Created: 07/06/98
Revised: 09/07/99
Updated: 11/09/02
