AltiVec sucks

PowerPC · 22 posts · Apr 23, 2005

#1 Sat, 23 Apr 2005 - 07:16

http://spl.haxial.net/weblog/index.php?p=85

Quote:

AltiVec (a.k.a. Velocity Engine and VMX) is a steaming pile of doggy doo. It is a real pain in the ass to use, and usually a waste of time. It is more of a marketing gimmick than a real feature, or at least it is highly overrated. The only reason they added it was because Intel had already added the equivalent to their CPUs, and they did not want to look lacking in comparison (they did a "me too!").

Yes, Intel had it first, it was called MMX (MultiMedia Extensions) and it was introduced in 1997. Yes, MMX is a steaming pile of doggy doo, but we are comparing doggy doo (MMX) with doggy doo (AltiVec) here.

These days, MMX has been extended, and now it is called SSE and SSE2 (Streaming SIMD Extensions, where SIMD is Single Instruction, Multiple Data). Or you can say MMX refers to the original instructions, and SSE to the more recent ones, whatever, MMX and SSE are both SIMD, and it began in 1997 before AltiVec in 1999. Motorola obtained the general idea for AltiVec from Intel.

However, Intel is more honest in their name for it: they do not call it "vector". Motorola's AltiVec name is misleading. The "Vec" is for "vector", meaning a 1-dimensional array, but what is the maximum sized vector it can handle per instruction? A vector of 4 normal-sized integers or 2 normal-size floating-points!

It is a technicality. True, 2 items can still be a vector, but practically speaking, it is just fugging ridiculous to call it vector when the maximum size of the vector is 2 items. 4 is little better.

If you use so-called single precision floating-point values, these are half the size of the normal floating-point values, and then the maximum vector size is 4, which is still pathetic, and comes at the expense of accuracy in the floating-point calculations.

The lower accuracy of single precision FP is usually unacceptable for scientific purposes. It is acceptable for games, however games do not use it anyway because nearly all the hard yakka (work) is done by the separate GPU on the graphics card, not the CPU. It has also been acknowledged that AltiVec is not usually suitable for server-type programs such as a web server, and that not all code (in fact only certain types of code) can benefit from AltiVec.

The bottom line is that AltiVec is NOT a vector processing unit, despite the name. It is a SIMD unit, and although it does have some benefits (which come at a cost), the benefits have been greatly exaggerated.

What are these costs I mention? To use AltiVec, you cannot write normal C code, you are forced to write AltiVec-specific assembly code. Assembly is considerably more difficult to write than normal C code. Furthermore, assembly means that you lose portability, making it difficult to take your code to other CPUs or operating systems. And because AltiVec cannot handle vectors, to process a vector you must kludge it by calling the instruction repeatedly, processing 2 or 4 elements at a time, until the whole vector is done. This makes it difficult, awkward, and messy to program.

Update: Actually, I just realized that I am not certain that AltiVec even supports normal-sized floating-point values. Possibly it only supports the single precision FP, which makes it even less useful. I cannot be bothered checking because either way, my point remains: AltiVec is crappy. By the way, when I say "normal-sized integers", that is 32-bit integers. When I say "normal-sized floating-point", that is 64-bit (which is the size of the normal floating-point registers in PowerPC). One AltiVec instruction works on one 128-bit block, so 128/32 is where the maximum vector size of 4 is obtained (it is whatever you can fit in 128 bits).

Can somebody refute this jerk?

#2 Sat, 23 Apr 2005 - 10:28

I don't believe that the speed increases Altivec promised have been realized.

#3 Sat, 23 Apr 2005 - 11:32

Yea but this guy is making like Apple are outright liars which is so false. Isn't somebody here an expert and can refute his BS?

#4 Sat, 23 Apr 2005 - 12:40

1. You certainly don't need to write in assembly to call AltiVec when programming on a Macintosh. Apple makes it VERY easy to code.

2. As a real life scientist, I use single precision numbers all the time in my code; with the number of significant figures available from my data sets, it's all I need. I suspect the same is true of many other scientists working on real world data sets, as most instrumentation hardly has the precision of a single, let alone a double.
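To put a number on point 2, here is a minimal sketch in C. It assumes IEEE 754 single precision (24-bit significand), and the helper name `readings_fit_in_float` is made up for illustration. It checks by brute force whether every raw reading from an n-bit instrument survives a round trip through a 32-bit float unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every raw reading from an n-bit
   instrument (0 .. 2^n - 1) converts to float and back unchanged,
   0 as soon as one reading is altered by the conversion. */
int readings_fit_in_float(unsigned bits)
{
    assert(bits >= 1 && bits <= 25);   /* keep the brute-force loop short */
    uint32_t limit = (uint32_t)1 << bits;
    for (uint32_t raw = 0; raw < limit; raw++) {
        float f = (float)raw;          /* store the reading in single precision */
        if ((uint32_t)f != raw)        /* did the round trip change it? */
            return 0;
    }
    return 1;
}
```

Under that assumption, a 16-bit or even a 24-bit converter loses nothing when its readings are stored as singles; only at 25 bits does a reading (2^24 + 1) fail to round-trip.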

#5 Sat, 23 Apr 2005 - 12:44

Quote:

Originally Posted by macintologist

Yea but this guy is making like Apple are outright liars which is so false. Isn't somebody here an expert and can refute his BS?

Velocity on a flat plane is a two number vector. It has a speed value and a direction. Once upon a time, IBM/Microsoft claimed a faster computer because they used single precision math as opposed to the extended single precision used in the Apple II. Single precision (32 bits) is useful in some engineering applications and certainly for games. Double precision (64 bits) is generally accurate enough for most purposes including calculating mortgage payments; however, spreadsheets long ago switched to extended double precision (80 bits) for their calculations. ANSI/IEEE-STD-754 is the defined standard for computer math. Texas Instruments has made a living selling a separate vector processor chip for years. I am not familiar with built-in vector instructions for IBM, Motorola, Intel, or processors nor am I aware of any simple support in C. I would think that one would need a higher level language to get vector instructions. C IS assembly language and to optimize its use on any specific processor requires specialized programming and/or libraries. So what! This is true for all functions on all processors. Most code is not optimized and it still works. By the way, he is saying that Apple/Motorola advertising executives are bragging about a minor glitzy thing. What a revelation. sam

#6 Sat, 23 Apr 2005 - 12:51

One of the projects I worked on last year was very CPU intensive. Without going into the gory details, I optimized the code to determine at runtime what kind of system it's on. The hardest job took 17 minutes to finish, and optimized for G5/Altivec it took 4 minutes.

There's your refute.

Mike
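For readers wondering what "determine at runtime what kind of system it's on" can look like, here is a rough sketch in C of the general pattern, not the code from the project above. All names (`add_fn`, `pick_add`, `add_unrolled4`) are hypothetical, the detection step is left to the caller because it is OS-specific, and the "fast" routine is just a portable unrolled loop standing in for a hand-tuned vector version.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*add_fn)(float *dst, const float *a, const float *b, size_t n);

/* Plain scalar version: works on any CPU. */
static void add_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Stand-in for a tuned version. A real one would use vector
   instructions; this one is only unrolled by 4 so it stays portable. */
static void add_unrolled4(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = a[i]     + b[i];
        dst[i + 1] = a[i + 1] + b[i + 1];
        dst[i + 2] = a[i + 2] + b[i + 2];
        dst[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; i++)                 /* leftover elements */
        dst[i] = a[i] + b[i];
}

/* Choose an implementation once, at startup. The flag is assumed to
   come from whatever CPU-feature query the platform provides. */
add_fn pick_add(int cpu_has_vector_unit)
{
    add_fn chosen = cpu_has_vector_unit ? add_unrolled4 : add_scalar;
    assert(chosen != NULL);
    return chosen;
}
```

The rest of the program calls through the returned pointer, so the same binary runs on machines with and without the vector unit.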

#7 Sat, 23 Apr 2005 - 15:39

Need to code in assembly? Mac OS X has a framework for AltiVec programming and 10.4 will auto-optimize code for AltiVec.

#8 Sat, 23 Apr 2005 - 15:40

Very nice, starman

SVass, this isn't the first time that jerk has blown things out of proportion

Ganesha, if only that idiot had guest comments enabled, that trash of his could be refuted. He password protected his comments section for a very good reason.

#9 Sat, 23 Apr 2005 - 15:45

Quote:

Originally Posted by macintologist

http://spl.haxial.net/weblog/index.php?p=85

Can somebody refute this jerk?

This guy is a douché. He's been ranting about Macs and many other subjects for years. Here's his "soap box" page:
http://spl.haxial.net/

#10 Sat, 23 Apr 2005 - 16:18

The only thing that sucks worse than AltiVec is not having AltiVec ... at least in my experience owning G3 and G4 machines.

#11 Sat, 23 Apr 2005 - 16:46

If the truth be known, altivec was probably designed to optimize certain operations (blur?) in Adobe Photoshop as that was used to benchmark computer speed in advertising. Anyway, optimization is a big risk when done for multiple processors and operating systems as well as a large expenditure of time. We used to optimize Conway's Life on different computers with different languages as an intellectual exercise and only succeeded in proving that clock speed, language implementation, instruction set, and algorithm design all had varying degrees of influence on the outcome. For older people who remember the ads for multiple jewel watches or multi-tube radios and for middle age folk who recall the 16 bit 8088 (actually 8 bit) and the toddlers who want the 300 hp Gargantua V13 auto, I remind you that the horsepower is measured with the exhaust system removed (no back pressure), the fuel economy is measured on a flat course with no traffic lights, and the clock speed is the front side and does not include the backside memory acquisition. sam

#12 Sat, 23 Apr 2005 - 16:55

Whoa.

And here I'd thought for years that He-Who-Was-Hinks was some kind of programming genius. If he doesn't even know what single-precision and double-precision floats are, and the difference between the two, not to mention what a vector is, then this goes past idiotic to bordering on outright incompetence.

No, seriously. It looks like he read the POV-Ray FAQ on why they don't support AltiVec and then drew his own conclusions from that (and drew them poorly, I might add). Except that had he even had that much information, then he would have known for certain that AltiVec doesn't do double-precision floats.

Either this guy is a complete fraud (in which case, the whole Hotline debacle should be investigated further) or he's just plain making stuff up, probably out of a distaste for Apple which has been mirrored for years in his idiotic "UI policy".

#13 Sat, 23 Apr 2005 - 17:47

Using Altivec amounts to the same as using SSE (MMX wasn't useful and AFAIK not really used) -- you have to write apps to use it. So most of the things that use it have relatively simple code which allows for parallelization. Media encoding would be the perfect example, another one are specific filters.

#14 Sat, 23 Apr 2005 - 21:07

You can program for AltiVec in C and even C++, it's just that assembly is as low level as you can get save for binary, which makes it much more efficient and faster.
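As a small illustration of the C route, here is a sketch that adds two float arrays four elements at a time. It assumes a compiler that provides `<altivec.h>` and defines `__ALTIVEC__` when AltiVec is enabled (GCC does with `-maltivec`), and it assumes all three arrays are 16-byte aligned, which `vec_ld` and `vec_st` need. The function name `add_floats` is made up. On any other machine the guarded part drops out and the plain loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* dst[i] = a[i] + b[i] for i in 0..n-1.
   Assumption: dst, a and b are all 16-byte aligned. */
void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    assert((((uintptr_t)dst | (uintptr_t)a | (uintptr_t)b) % 16) == 0);

#ifdef __ALTIVEC__
    /* Four floats per 128-bit register, so step through in chunks of 4. */
    for (; i + 4 <= n; i += 4) {
        vector float va = vec_ld(0, a + i);
        vector float vb = vec_ld(0, b + i);
        vec_st(vec_add(va, vb), 0, dst + i);
    }
#endif

    /* Scalar loop: handles the tail, or everything without AltiVec. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The chunk-of-4 loop plus a scalar tail is the "calling the instruction repeatedly" pattern the original rant complains about; written with intrinsics it is a handful of lines of ordinary C.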

#15 Sat, 23 Apr 2005 - 21:54

Quote:

Originally Posted by SVass

If the truth be known, altivec was probably designed to optimize certain operations (blur?) in Adobe Photoshop as that was used to benchmark computer speed in advertising.

yes, but also in scientific applications and pure processing information like server farms right?

so this altruistic dev of altivec is useful in that, the end-user benefits from the technology.

[i think i am right... right?]

#16 Sun, 24 Apr 2005 - 00:20

AltiVec is used extensively in bio science software, especially gene mapping and sequencing. I wanna find the article again, but it was a comparison of gene mapping software between a G4 and a P4. The G4 was multiples faster in gene sequencing (don't remember how fast, but I seem to recall something crazy like 6x faster.)

#17 Sun, 24 Apr 2005 - 00:21

Oh, here's the article. It's actually on MacNN's front page.

http://www-128.ibm.com/developerwor...gr-mw01Altivec2

Quote:

In this second article of a three-part series, Peter Seebach looks closer at AltiVec, the PowerPC SIMD unit. He explains further how you can effectively use AltiVec, discussing the choice between C and assembly, and shows some of the issues you'll face when trying to get the best performance out of an AltiVec processor.

#18 Sat, 23 Apr 2005 - 07:16

iTunes Encoding (192 Kbps MP3)

My 733 G4 peaks at 10.4x

My friend Andy's 1.8 GHz Dell peaks at 4.5x

Yeah I know iTunes is an Apple program, but c'mon, my old iMac used to encode at 4x.

#19 Sat, 23 Apr 2005 - 07:16

Quote:

Originally Posted by sek929

iTunes Encoding (192 Kbps MP3)
Yeah I know iTunes is an Apple program, but c'mon, my old iMac used to encode at 4x.

The AltiVec enhancements were very present back when it was SoundJam.

#20 Sat, 23 Apr 2005 - 07:16

Add me to the list of users who have recoded scientific apps and achieved significant performance gains over scalar code. The original rant is of dubious merit, even childish.

-Bryan

#21 Sat, 23 Apr 2005 - 07:16

Well, there are a few mistakes right off the bat:

1. He is focused on floating point operations only. A single AltiVec unit will work on a vector of 4 (single precision) floating point values at once. Notice that this is a perfectly appropriate use of the Computer Science term "vector". That is the definition.

2. The integer side of things is where AltiVec really shines. It is capable of chopping the 128-bit registers that it takes data from and viewing each one as 16 8-bit values, 8 16-bit values, or 4 32-bit values. The combinations of operations available for this are truly astounding. For example, multiplying 16 values times 16 other values and then adding 16 other values can be accomplished with a single instruction (not counting the loads and stores).

3. He focuses on SSE2's ability to work with 2 double precision values at the same time, but seems to ignore the fact that the G5's main floating point units can do this as well... since there are more than 2 of them. The only reason that Intel put in this feature was that the x86 instruction set is so starved for registers that it does make a difference to do this math in the extra registers available in the SSE2 extension (which is not subject to the design limitations on the x86).

4. Pretty much everyone agrees that "AltiVec is vector processing done right", and that Intel has been playing catchup.

5. While MMX was out ahead of AltiVec, IBM has had a long history of doing vector processing in their chips. And vector processing has been around for a long time; it was one of the main features of the old Crays.

6. You could argue that the old Mac IIfx's twin DSPs were effectively a vector processor unit... but this is stretching things a bit. They did do the same job, but in a slightly different way.
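The 128-bit register views mentioned above can be modelled in plain C. This is only a scalar sketch, not AltiVec code: the type `reg128` and the functions `lanes` and `mul_add_u16` are hypothetical names, and the multiply-add is shown for the eight 16-bit lanes only, keeping the low 16 bits of each result.

```c
#include <assert.h>
#include <stdint.h>

/* One 128-bit register, viewed at the three integer lane widths. */
typedef union {
    uint8_t  u8[16];   /* 16 lanes of 8 bits  */
    uint16_t u16[8];   /*  8 lanes of 16 bits */
    uint32_t u32[4];   /*  4 lanes of 32 bits */
} reg128;

/* How many lanes of a given width fit in 128 bits. */
int lanes(int bits_per_lane)
{
    assert(bits_per_lane > 0 && 128 % bits_per_lane == 0);
    return 128 / bits_per_lane;
}

/* Scalar model of a lane-wise multiply-add on eight 16-bit lanes:
   each result lane is the low 16 bits of a * b + c. A SIMD unit
   would do all eight lanes with one instruction. */
reg128 mul_add_u16(reg128 a, reg128 b, reg128 c)
{
    reg128 r;
    for (int i = 0; i < 8; i++)
        r.u16[i] = (uint16_t)((uint32_t)a.u16[i] * b.u16[i] + c.u16[i]);
    return r;
}
```

The same arithmetic explains the "maximum vector size of 4" figure in the original rant: 128 / 32 = 4 lanes for 32-bit values, but 8 or 16 lanes for narrower integers.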

#22 Sat, 23 Apr 2005 - 07:16

Quote:

Originally Posted by sek929

iTunes Encoding (192 Kbps MP3)

My 733 G4 peaks at 10.4x

My friend Andy's 1.8 GHz Dell peaks at 4.5x

Yeah I know iTunes is an Apple program, but c'mon, my old iMac used to encode at 4x.

And my dad's 2.4 Dell peaks at 11x...it depends on the system in PC world. There's a reason Macs are comparably fast to PCs of greater speeds, and that is because the hardware and OS are made for each other so well.