On the eve of the release of a new Linux kernel, OpenBench Labs found it quite appropriate to test AMD's extravagant claims of delivering "the world's most powerful x86 processors, outperforming Intel's Pentium III processor and delivering the highest integer, floating-point, and 3-D multimedia performance for applications running on x86 system platforms." To do this, we custom-designed a very high-end workstation to take full advantage of the number-crunching capabilities of AMD's Athlon processor and expanded driver support in areas such as USB devices and 3-D graphics controllers in the new Linux kernel.
The close relationship between AMD's Athlon and the Alpha EV6 design (AMD originally licensed this processor technology from Digital Equipment Corporation) motivated OpenBench Labs in this assessment. The upshot of this link is an x86-compatible processor with very RISC-like structures.
The Athlon CPU sports a superpipelined, nine-issue superscalar architecture and a 200-MHz, 1.6-gigabyte/sec system bus. The high-speed execution core includes multiple x86 instruction decoders, a large dual-ported 128-KB split L1 cache (64 KB for instructions and 64 KB for data), a 256- or 512-KB L2 cache, three independent 10-stage integer pipelines, and three address-calculation pipelines. More importantly, the Athlon is distinguished by the first floating-point engine in an x86-compatible platform with a 15-stage pipeline and multiple schedulers for superscalar, out-of-order, speculative execution of instructions.
For IT decision makers whose required college reading list excluded Hennessy and Patterson's tome on computer architecture, the business case for all of these microprocessor pyrotechnics is simply the ability to excel at executing software. Emerging cutting-edge software in such areas as digital content creation for streaming over the Internet, workstation-class 3-D modeling, commercial desktop publishing, and speech recognition significantly stress processor and system bandwidth. What's more, since Athlon processors are binary-compatible with existing x86 CPUs, all of the more than 60,000 existing commercial software applications, including those optimized for MMX, that run on Intel-based systems will run on Athlon-based systems.
Surprisingly, all of these design advances have been packaged into a CPU chip that is compatible with motherboards based on AMD's Socket A form factor. As a result, high-performance systems built using the AMD Athlon processor can leverage commonly available mechanical components including chassis, power supplies, heat sinks, and fans. This factor made it very easy for OpenBench Labs to act as a systems integrator and put together a very aggressive workstation.
Our foundation for this system was an ATX-format, bare-bones system from Enlight Corp. with a GA-7ZX motherboard from G.B.T. Technology Trading of Hamburg, Germany. At the heart of the board is the VIA Technologies Apollo KT133 chipset. The Apollo KT133 chipset provides the controller interconnects necessary to bridge the Athlon's 200-MHz front-side bus with a 133-MHz memory bus (PC100, PC133, and VC SDRAM can be mixed) and a 4X AGP graphics bus, as well as support for ATA-33/66 drives and up to four USB ports. The GA-7ZX motherboard provides three slots for memory (up to 1.5 GB), a 4X AGP slot for a graphics card, five 32-bit PCI slots, two USB ports, and, of course, a Socket A connection for an Athlon (K7) processor.
We began by installing an 850-MHz Athlon processor on the motherboard. We then loaded our workstation with three 128-MB ECC PC133 modules to keep it comparable to a Dell 2400 Server. For hard-disk storage, we utilized the system's three 5-inch open bays to install an Enlight 8720 storage unit, which provides five hot-swap drive trays. Into this unit, we loaded four Seagate Cheetah X15 Ultra160 SCSI drives. We then connected these drives to an ICP Vortex RAID controller. Using the firmware on the controller, we configured a single RAID 5 volume, presenting this to the Linux OS as a single logical drive.
Next, we loaded Red Hat v7.0 onto our workstation and upgraded the Linux kernel to 2.4.0-0.26enterprise along with the glibc library. Once we confirmed that the 2.4 kernel was working properly, we halted the system, replaced the PS/2 desktop devices with USB devices, and brought the system back up. Red Hat detected the new devices and provided a simple configuration option; in minutes we were up and running.
We repeated this sequence on a Dell PowerEdge 2400 Server powered by an 866-MHz Pentium III processor built on the Intel 440 chipset. Like our workstation, this server had 256 MB of PC-133 memory with ECC. This server would be the basis on which to compare CPU performance between the AMD Athlon and the Intel Pentium III using the OpenBench Labs oblcpu benchmark.
The oblcpu benchmark runs 34 numerically intensive kernels with a mix of integer and floating-point arithmetic done in both single and double precision. While OpenBench Labs' benchmark has an obvious relationship to the performance of scientific, mathematical, and engineering workstation applications, there is also a close relationship between floating-point calculation performance and high-end graphics performance.
Extensive floating-point arithmetic can be found in MPEG-2 video-encoding, speech-recognition, financial-modeling, and trading applications. These calculations are essential for geometric calculations common in CAD/CAE processing, high-precision mathematical calculations, and 3-D graphics applications involving physics, geometry, and triangle setup. With the need to deliver increasing levels of realism and detail when modeling physical objects in motion, higher-quality digital video, and a richer Web experience, good floating-point performance has become a "must-have" requirement.
When we ran our oblcpu benchmark on both systems, three out of every four kernels ran distinctly faster on the AMD Athlon system. In a few instances, the Athlon's advantage grew to a 200% to 300% differential. These instances were the exception, however, and not the norm. Overall, the best way to obtain a single performance number is to take the geometric mean of all of the kernels. This measure is the least susceptible to being over-weighted by data points at the extremities.
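To make the point about extremes concrete, here is a minimal sketch comparing a geometric mean with an arithmetic mean. The ratios in it are hypothetical values invented for illustration; they are not oblcpu measurements, and the function names are our own.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed as exp(mean of logs)."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    """Plain average, shown only for comparison."""
    return sum(values) / len(values)

# Hypothetical per-kernel speed ratios (NOT measured data): four modest
# gains plus one outlier kernel that runs four times faster.
ratios = [1.10, 1.15, 1.20, 1.25, 4.00]

print(f"arithmetic mean: {arithmetic_mean(ratios):.2f}")
print(f"geometric mean:  {geometric_mean(ratios):.2f}")
```

With a set like this one, the single outlier pulls the arithmetic mean well above the typical kernel, while the geometric mean stays closer to the cluster of modest gains.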
For our 34 kernels, the geometric mean of the performance indices of the kernels clocked in at 300 for the 850-MHz AMD Athlon and 254 for the 866-MHz Intel Pentium III. OpenBench Labs currently normalizes performance to a standard 300-MHz Pentium III running Windows 2000, which we set to 100. An even better indication of performance is a 95% statistical-confidence region. For the Athlon processor, this range was between 275 and 353, while for the Intel Pentium III it ranged from 238 to 299. The bottom line for Linux users is the likelihood of a 20% performance edge when running on an Athlon as opposed to a comparably clocked Pentium III.
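The arithmetic behind that bottom line can be checked with a short sketch. It uses only the figures quoted above; the per-clock adjustment is our own illustrative assumption about how to compare an 850-MHz part with an 866-MHz part, not a step the benchmark itself is stated to perform.

```python
def relative_edge(index_a, index_b):
    """Fractional advantage of index_a over index_b (0.20 means 20%)."""
    return index_a / index_b - 1.0

def interval_overlap(a, b):
    """Overlapping (low, high) range of two intervals, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None

# Figures quoted in the article.
athlon_index, athlon_mhz, athlon_ci = 300, 850, (275, 353)
p3_index, p3_mhz, p3_ci = 254, 866, (238, 299)

raw_edge = relative_edge(athlon_index, p3_index)
# Assumption: scale each index by its clock rate to compare per-MHz results.
per_clock_edge = relative_edge(athlon_index / athlon_mhz, p3_index / p3_mhz)

print(f"raw edge:       {raw_edge:.1%}")
print(f"per-clock edge: {per_clock_edge:.1%}")
print(f"confidence regions overlap on: {interval_overlap(athlon_ci, p3_ci)}")
```

The raw ratio of the two geometric means works out to roughly 18%, and roughly 20% once the small clock-speed difference is factored in. The two confidence regions share the range from 275 to 299, which is why the edge is best read as a likelihood rather than a guarantee.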
Our final test was OpenBench Labs' memband benchmark, which measures memory throughput by using the CPU to move data between memory locations and measuring the time taken. Few implementation factors are as critical to overall system performance as the usable bandwidth between the CPU and memory. The benchmark requires a minimum allocation of 64 MB of memory so that a broad range of physical addresses can be used and the address translation caches can be effective without dominating the behavior of the overall system. OpenBench Labs' memband varies the "stride" value as it proceeds through the test, stressing the bandwidth by transferring larger chunks of data. Since both the Enlight workstation and the Dell PowerEdge Server were using PC133 ECC SDRAMs, memband was essentially measuring the performance of the AMD Athlon and VIA Technologies Apollo KT133 chipset combination.
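The general idea of a stride-based copy test can be sketched as follows. This is not the memband source, which is not shown here; the function name, buffer size, and stride list are assumptions chosen for illustration. In an interpreted language the loop overhead dominates the timing, so the sketch shows only the access pattern and the MB/sec arithmetic, not realistic bandwidth figures.

```python
import time

def chunked_copy_throughput(size_bytes, stride):
    """Copy size_bytes between two buffers, stride bytes at a time.

    Returns (megabytes_per_second, copy_is_correct).
    """
    if stride <= 0 or size_bytes % stride != 0:
        raise ValueError("stride must be positive and divide size_bytes")
    src = bytearray(i % 251 for i in range(size_bytes))  # recognizable pattern
    dst = bytearray(size_bytes)
    src_view, dst_view = memoryview(src), memoryview(dst)

    start = time.perf_counter()
    for offset in range(0, size_bytes, stride):
        # Move one stride-sized chunk from source to destination.
        dst_view[offset:offset + stride] = src_view[offset:offset + stride]
    elapsed = time.perf_counter() - start

    megabytes_moved = size_bytes / (1024 * 1024)
    return megabytes_moved / max(elapsed, 1e-9), dst == src

# A 1-MB buffer keeps this demo quick; memband itself requires at least 64 MB.
for stride in (4, 8, 16, 32, 64):
    rate, ok = chunked_copy_throughput(1 << 20, stride)
    print(f"stride {stride:2d} bytes: {rate:8.1f} MB/sec (copy ok: {ok})")
```

A compiled implementation working on a buffer much larger than the caches would be needed before numbers like these said anything about the memory subsystem itself.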
Given the Athlon's 200-MHz system bus (which connects the Athlon to the Apollo KT133) and the Athlon's large integrated full-speed L1 cache (which is four times larger than the Pentium III processor's L1 cache), we expected to measure stellar memory bandwidth performance. On memory strides ranging from 8 to 64 bytes, this is exactly what we measured, with throughput in MB/sec averaging about 20% greater than the Dell PowerEdge 2400.
Nonetheless, we encountered a rather strange anomaly at a 4-byte stride: For this stride size, we would have predicted performance on the order of 350 MB/sec. Instead, the AMD/VIA combination demonstrated enormous variability in throughput for this particular stride. The performance range ran from a high of 330 MB/sec to a low of 209 MB/sec. This is the first time that OpenBench Labs has encountered such variability outside of a small-memory system. While all of the other data points were rock-solid in their consistency, throughput for a 4-byte stride wobbled like Jell-O.
Over the coming months we'll be testing other performance aspects of the new v2.4 kernel. One of the more interesting aspects of the enterprise version is the presence of 32 "async I/O" threads. These represent a new programming construct, and an application must be explicitly coded to take advantage of them.
OPENBENCH LABS SCENARIO
UNDER EXAMINATION
AMD Athlon-based workstation performance
WHAT WE TESTED
Enlight Bare Bones system for AMD Socket A CPUs with G.B.T. GA-7ZX motherboard
www.enlightcorp.com
HOW WE TESTED
Red Hat Linux v7.0 running beta kernel 2.4.0-0.26enterprise
www.redhat.com
AMD 850MHz Athlon processor
www.amd.com
ICP GDT7663RN, six-channel Ultra160 SCSI disk array controller
www.icp-vortex.com
(4) Seagate Cheetah X15 drives
www.seagate.com
Enlight EN-8720 hot-swappable drive module
www.enlight.com
OpenBench Labs oblcpu v1.0 benchmark
OpenBench Labs memband v1.0 benchmark
KEY FINDINGS
- AMD's Athlon provides approximately a 20% CPU-processing advantage over a Pentium III-based system running at the same clock speed.
- Memory throughput averaged approximately 20% greater than that of a Pentium III-based system. Note that at four-byte strides, a great deal of variability was consistently measured.
- Athlon architecture is binary-compatible with Intel x86.
- New Linux kernel provides full support for USB devices.