The Revolution

By: David K. Every
©Copyright 1999


G4 (7400) with AltiVec arrives

Motorola calls the Vector Processor in the G4 "AltiVec" -- and Apple calls the same technology the "Velocity Engine". Don't let the names confuse you -- it is the same thing.

CPU:    G4 w/AltiVec (7400)
Speed:  400-600 MHz
Size:   10.5 million transistors; 0.20 µm HiPerMOS process; copper process; 83 mm²
Power:  5-11 W; 1.8 V
Cache:  32 KB instruction / 32 KB data; supports 2 MB L2 (on-chip tags)
Units:  7 units: 2 Integer, Load/Store, Branch / System, 1 Floating Point, AltiVec (ALU), AltiVec (Permute)
Date:   Sept '99

There are improvements over previous PPCs, including:

  1. New MaxBus (mpx) and 60x bus support. The 60x bus is synchronous -- if you have an outstanding read operation, you have to wait for the result of that read before going on. Since RAM is much slower than the bus (and slower still than the processor), this means a lot of waiting. The mpx bus (used on the G4) is asynchronous and allows up to 4 outstanding accesses at the same time. So when you stall on one read operation, you just go on and issue another and another, knowing that the first one will be taken care of when it completes (without hanging everything else up). The result is up to a 3-fold performance increase for memory-bound operations.
     
    NOTE: This is why specs can be so deceptive. Without changing the speed (MHz) of the bus at all, Apple/Motorola made it up to 3 times faster!
     
  2. There is a version of the processor (G4) that uses a 128 bit mpx bus (called MaxBus), which is not being used. Adding all the extra pins to the package increases cost -- and would require a major redesign of motherboards, all the support chips, and so on -- so it will be a while (if ever) before Apple uses it.
       
  3. 128 bit data-path to cache -- with the option of supporting a 128 bit system bus as well. Macs will probably not use the 128 bit system bus because of cost and complexity issues -- but the wider path to cache alone will probably improve performance.
      
  4. 7400 also supports a 2 Megabyte L2 Cache (with all tags on-chip), which again can help performance over the previous 1 MB L2 limit.
     
  5. Cache-touch-ops. When the cache mispredicts what you are going to load next from RAM, it can be a BIG hit in performance. Some new instructions allow compilers (and programmers) to "pre-warm" the cache with the right information (in the background), so that by the time the processor needs the data it is already there. This is cache-hinting, and in some algorithms it can make a really big difference in performance. It was added along with the AltiVec engine, but they aren't married (they are separate additions). It allows 4 streams to be preloaded, has very low overhead (much better than the Pentium's limited hinting), and can make a difference to non-vectorized code (helping Integer and Floating Point performance as well as Vector performance).
     
  6. The floating point unit of the 603 and G3 had only a 32 bit multiplier (ALU) -- so a 64 bit floating point multiply took two passes (2 cycles). The 604s had a full 64 bit multiplier, and so did better at floating point math than the others (up to twice as fast). The G4 has picked up the 64 bit ALU, so the G4 should have better floating point performance than the G3. Furthermore, many operations (especially FP) can be bound by cache and bus speed -- these will gain further from the bus improvements and from the wider (and larger) cache.
     
  7. The big addition is the AltiVec (SIMD) units (called the Velocity Engine), which can do 4 way single precision floating point, or 16 way byte math, all in a single cycle. In fact, it is superscalar and can be doing up to two of these vector operations at the same time (ALU and Permute). For a few things this will give a 16 times or greater performance increase (though most real world increases will be more moderate). This does require Apps and the OS to take advantage of it first -- but it will make a very radical performance increase to a few critical things (QuickTime, Photoshop, 3D, speech, networking, sound, some graphics, some emulation, etc.). Read the various AltiVec articles (in Hardware) for more. This is a big deal!
      
  8. Full SMP support+. The G3 supported only the MEI part of the 4-State MESI (SMP) standard. You could do MP, but there were performance hits. The G4 supports the full 5-State MERSI standard, which not only supports MultiProcessing, but can directly transfer information chip-to-chip (at very high speed) -- so it is really good at MP. As MP becomes more important, the superior MP of the G4 will make things even better.
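
The bus behavior described in item 1 can be illustrated with a toy model. This is only a sketch under stated assumptions, not a model of the real 60x or mpx protocols: the function name, the fixed read latency, and the "one new read issued per cycle" rule are all hypothetical simplifications, and the model ignores contention for the data bus, so it overstates the gain compared with the "up to 3 fold" figure above.

```python
def cycles_for_reads(num_reads, latency, max_outstanding):
    """Toy model: cycles needed to finish num_reads memory reads.

    Assumptions (illustrative, not taken from any bus specification):
      - every read takes `latency` cycles from issue to data,
      - at most one new read can be issued per cycle,
      - at most `max_outstanding` reads may be in flight at once.
    """
    issue = []  # cycle at which each read is issued
    done = []   # cycle at which each read completes
    for i in range(num_reads):
        # Earliest issue slot: one cycle after the previous issue.
        t = issue[-1] + 1 if issue else 0
        # If all slots are busy, wait for the oldest in-flight read.
        if i >= max_outstanding:
            t = max(t, done[i - max_outstanding])
        issue.append(t)
        done.append(t + latency)
    return done[-1] if done else 0


# One read at a time (the blocking case) versus four in flight.
blocking = cycles_for_reads(8, latency=10, max_outstanding=1)   # 80 cycles
pipelined = cycles_for_reads(8, latency=10, max_outstanding=4)  # 23 cycles
print(blocking, pipelined, round(blocking / pipelined, 2))
```

The bus clock is the same in both runs; only the number of reads allowed in flight changes, which is the point of the NOTE under item 1.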

Some people are reporting very modest increases from the G4. But many benchmarks are not good at reflecting real world performance. For example, SPEC can't reflect some improvements because it doesn't even know how to handle AltiVec (SIMD) -- so it shows a 0% increase -- yet the real world performance gain could be substantial.
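
One way to see why a benchmark and a real application can disagree is Amdahl's law. The article does not use it; it is brought in here as a rough sketch. The function name and the example fractions are hypothetical, and the 16x figure is simply the best-case byte-math number mentioned in the list above.

```python
def overall_speedup(vector_fraction, vector_speedup):
    """Amdahl's law: overall speedup when only part of the run time benefits.

    vector_fraction -- share of the original run time spent in code that
                       can use the vector unit (0.0 to 1.0, hypothetical)
    vector_speedup  -- how much faster that share runs when vectorized
    """
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)


# A benchmark with no vectorized code sees no gain at all,
# while a workload that is mostly vector-friendly sees a large one.
for fraction in (0.0, 0.5, 0.9, 1.0):
    print(fraction, round(overall_speedup(fraction, 16), 2))
```

A benchmark whose code never touches the vector unit sits at the first row (no change), which matches the 0% result described above, while an application that spends most of its time in vectorizable loops lands much further down the table.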

And beyond the processor performance increases, the G4s will go in systems (Sawtooth and later Shark) that are faster than the other systems of the day. The improved I/O of these motherboards, the MaxBus, and so on should all add up. So I expect that G4 performance in a complete computer system will be significant.

Rumors -- 64 Bit? Multicore?

There are 64 bit versions of the G3 and G4 in early phases of design -- but there is little reason to manufacture them right now. Likely we will see them with the G5 Processor (and a new core). There is nothing stopping them from adding 64 bit support sooner -- it just makes more sense to wait.

There were some rumors of a processor (G4) with multiple cores -- but that was a while ago. The G5 is starting to come on the roadmap as well -- so I expect that multicore has been pushed out a bit to the G5 core. To see the benefits of multicore we need OS X shipping for a while and good MP support -- and it will likely be at least 6 months to a year after Apple starts selling MP boxes before we might see a processor that integrates multicore on chip. Apple first needs to prove the demand (and this guesstimate is based on lead times, and how long it will take to get software polished and all that). So I expect it would be about a year after OS X -- which gives them enough time to come out with the G5 core first.

Of course, just to confuse things, IBM has released a new piece of "big iron" (S/390 mainframe) that they call the G5. They are even going to have the G6. I believe they use PowerPCs in this box, but only for I/O controllers -- the main processors are some custom high-end ultra-CISC chips. Not all CISC is bad CISC.

Motorola and IBM were diverging -- but it looks like IBM caught on to how neat AltiVec is, and they are coming back into the fold and will probably make some G4 chips as well. With IBM's superior process technology, this should be good for all.

Conclusions

Change is constant, and progress will march on, while our computers get faster and faster. It is amazing that home computers of today are now performing like the supercomputers of a generation ago.

Apple is marketing the G4 as a "super-computer" -- which is true and not true. The AltiVec (Velocity Engine) vector unit can compete in some ways with a Cray Y-MP and other supercomputers of a decade ago (or even more modern ones) -- but today's supercomputers are massively parallel machines that can reach Teraflops (trillions of floating point operations per second) instead of the G4's Gigaflops (billions). Most modern supercomputers get their performance not from a fast single processor, but from thousands of off-the-shelf processors (and complex memory systems and OSs). Intel and IBM have both made Teraflop computers (using Pentiums and PPCs respectively -- with IBM's machine being superior to Intel's, as is to be expected). And these were made with Pentium Pros and PPC 604s, I believe -- imagine if they each did the same thing with newer processors!
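
The gap between Gigaflops and Teraflops is easy to put into numbers. The sketch below is back-of-the-envelope arithmetic only: the function name is hypothetical, the 1 Gigaflop per processor figure is a round illustrative number rather than a measured G4 result, and the efficiency factor is an assumed stand-in for the overhead of making many processors work together.

```python
import math

GIGA = 10**9   # billions of floating point operations per second
TERA = 10**12  # trillions of floating point operations per second


def processors_needed(target_flops, per_processor_flops, efficiency=1.0):
    """Processors required to reach target_flops.

    efficiency is the assumed fraction of each processor's peak that
    survives the overhead of running in parallel (hypothetical value).
    """
    return math.ceil(target_flops / (per_processor_flops * efficiency))


# Perfect scaling, then a case where half the peak is lost to overhead.
print(processors_needed(TERA, GIGA))
print(processors_needed(TERA, GIGA, efficiency=0.5))
```

Even with perfect scaling, a Teraflop machine built from one-Gigaflop processors needs a thousand of them, which is why these machines are built from "thousands of off-the-shelf processors".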

Remember, they were doing amazing things with those supercomputers of 10 years ago -- and you can now do much of that on your desktop for 1/1000th the price. The individual G4 processor is certainly in the supercomputer range (for a single processor) -- but a single-processor desktop machine isn't truly as fast as a brand new supercomputer, even if it is as fast as a supercomputer was just a few short years ago. So Apple isn't lying -- it is a supercomputer by export rules, and it is certainly faster (for some things) than most "super-computers" that are in most labs (unless they are very new supercomputers). And you can network an array of Apple G4 computers together, and get true (brand new) supercomputer performance at a fraction of the price -- as UCLA and some other places are doing. This is amazing stuff.

Of course home users really don't need to do the same things as supercomputers (and their users) -- but there are many things for which they do need a lot of computing power. What we users care about is not the speed itself -- but what the speed will allow us to do: real-time Photoshop filters, real-time rendering, better video and compression, faster 3-D, better sound and speech (recognition) processing, better networking, and so on. Time marches on -- and computer technology just keeps getting better.


Created: 07/06/98
Revised: 09/07/99
Updated: 11/09/02

