AMD Opteron Server – Part 2 (Favour of 64 bits)

In the first part of this series we discussed the hardware of an AMD64 server. In this article we describe some features of the AMD64 platform and the Opteron processor and briefly review Linux support.

10.5.2004 23:00 | Jan Houštěk | read 26299×

AMD64, x86-64, x86_64?

There is some confusion about these names. After using x86-64 for some time, AMD decided to switch to the slightly egotistic name AMD64. The old x86-64 still remains in many places (software, documentation) but is expected to be replaced. Some explanation can be found in this post from AMD to the discuss@x86-64.org list.

The form with the underscore is even more confusing. I suspect the real reason for the underscore in x86_64 in Linux is that autoconf/configure hate dashes in arch names, because of names written in this notation: x86_64-pc-linux-gnu. If a dash were used inside the arch name, the string could not be parsed without prior knowledge of all arch names.
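The parsing problem can be illustrated with a minimal sketch. It assumes the usual cpu-vendor-os layout of configuration names; the function name `parse_triplet` is hypothetical and this is not how autoconf itself is implemented.

```python
def parse_triplet(name):
    """Split a configuration name of the assumed form cpu-vendor-os."""
    # Split on the first two dashes only; the OS part may itself contain dashes.
    cpu, vendor, operating_system = name.split("-", 2)
    return cpu, vendor, operating_system


# With an underscore the arch name survives as one field.
print(parse_triplet("x86_64-pc-linux-gnu"))

# With a dash the arch name is cut in two and every field shifts by one.
print(parse_triplet("x86-64-pc-linux-gnu"))
```

A parser could only recover from the second case by carrying a list of every known arch name, which is the prior knowledge mentioned above.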

AMD64 architecture

The x86 architecture saw no fundamental changes for a long time, apart from new instructions being added (MMX, for example). Its key features (instruction addressing, memory segmentation, the x86 instructions themselves) have not changed since the i386, which was the last revolutionary processor.

The most valuable feature of AMD64 (and also the one most damned by some fundamentalists) is 100% backwards compatibility with x86. Nothing changes for 32bit applications, and you can run them in a 64bit OS. AMD claims that this feature increases the number of transistors by only 2%–3% and has no impact on 64bit performance. On the other hand, even legacy 32bit applications can benefit from the fast memory and IO and from the flat 64bit memory space (a single 32bit process cannot use more than 4GB, but several processes running simultaneously can use more than that in total).

Let's have a look at how the 64bit extension to x86 is achieved. Basically, it is done by adding a new mode called long mode. It is enabled by a global control bit called LMA (for Long Mode Active). When LMA is not set, the processor operates as a standard x86 processor and is compatible with all existing 16 and 32bit operating systems and applications. Long mode consists of two sub-modes, 64-bit mode and compatibility mode. Both compatibility mode and legacy mode are meant to run old 16 and 32bit applications, but compatibility mode requires a 64bit OS and does not support insanities such as x86 real mode or virtual-8086 mode (if you really want those, you have to use legacy mode and do without 64bit support).

LMA bit
AMD64 modes
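On Linux, whether a CPU is capable of long mode can be checked from user space. The sketch below assumes the `lm` flag that the kernel lists in the `flags` line of /proc/cpuinfo; the helper names are hypothetical. Note that it reports only what the CPU supports, not whether the running kernel actually uses long mode.

```python
def has_long_mode(cpuinfo_text):
    """Return True if the first 'flags' line in cpuinfo text lists 'lm'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Compare whole words so that other flags containing "lm" do not match.
            return "lm" in value.split()
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file; return an empty string where it does not exist."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


print("long mode supported:", has_long_mode(read_cpuinfo()))
```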

The 64bit mode adds the following new features:

Let's now explore what real AMD64 processors can do. Besides being 64bit and bringing some straightforward enhancements (such as 1MB of L2 cache in the Opteron, non-executable pages, IOMMU and others), they offer two major innovations: an integrated memory controller and the HyperTransport bus.

Memory access

In most x86 systems, CPU access to memory involves several steps. The CPU is connected to the FSB (Front-Side Bus), which operates at a much lower frequency (e.g. 10 times lower) than the CPU core. The FSB is connected to the memory controller, also known as the NorthBridge, which provides access to the memory. This approach has two flaws. First, the FSB is too slow for the performance needs of next-generation CPUs and is difficult to scale (the only way is to increase its operating frequency, which is not easy because components other than the CPU and the NorthBridge use it). Second, going through an intermediate controller causes unnecessary delays.

AMD's solution is really ambitious. Opteron and Athlon64 have the memory controller integrated into the CPU chip. Besides the obvious advantages of this approach, there is one interesting consequence for multiprocessor systems. While a common memory controller accessible by all CPUs fits the SMP architecture, a memory controller integrated into each CPU is mostly NUMA-like. Generally this is a good idea, since NUMA scales much better and performs better with properly designed applications. The problem is that many applications expect SMP. Realizing this, AMD provides a hybrid approach called SUMO (for Sufficiently Uniform Memory Organization), which makes the system appear to the OS like SMP while the physical architecture is NUMA-like (and NUMA-aware applications can still benefit from it). The SMP emulation is achieved by fast inter-CPU communication over the HyperTransport bus.

HyperTransport

HyperTransport is a high-speed point-to-point full-duplex link for integrated circuits. It combines a simple layout, excellent speeds, low latencies and good scalability; at the same time, it is compatible with the PCI software model. Links are scalable both in frequency and in data-path width. The default clock frequency is 200MHz and current implementations use up to 800MHz. This is similar to other commonly used buses. Further flexibility comes from the scalable data-path width of each link (currently 2, 4, 8, 16 or 32 bits are available). A 16bit HyperTransport device can be connected with a 2, 4, 8 or 16 bit link. The fastest 32bit link has an aggregate bandwidth of 12.8 GB/s.
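The 12.8 GB/s figure can be reproduced with a short calculation. This is a sketch under two assumptions that the text does not spell out: data is transferred on both clock edges (two transfers per clock), and "aggregate" counts both directions of the full-duplex link. The function name is hypothetical.

```python
def ht_bandwidth_gb_per_s(width_bits, clock_mhz, transfers_per_clock=2, directions=2):
    """Aggregate link bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock=2 assumes double data rate signalling;
    directions=2 counts both halves of the full-duplex link.
    """
    bytes_per_transfer = width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second * directions / 1e9


# Bandwidth of every available link width at the 800MHz clock.
for width in (2, 4, 8, 16, 32):
    print(f"{width:2d}bit link: {ht_bandwidth_gb_per_s(width, 800):.1f} GB/s")
```

Under the same assumptions, halving the link width or the clock halves the bandwidth, which is what makes the narrower links suitable for slower devices.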

The Opteron processor has three 16bit HT links and one 8bit link. In 2-way systems the 8bit link (3.2 GB/s) is used to connect slower IO devices (such as 32bit PCI), one 16bit link connects faster IO devices (e.g. PCI-X), and the remaining two links are aggregated into one 32bit link with a throughput of 12.8 GB/s, used for inter-CPU communication. The cheaper Athlon64 CPUs have only one 16bit link and one 8bit link; this still makes it possible to build 2-way systems. And of course, systems with more than two CPUs are also possible (only with Opterons). The picture below shows the wiring of HT links in a 4-way system.

HT links in 4-way Opteron

Below is a diagram that demonstrates the scalability of the HyperTransport bus.

HT scalability

Who would really benefit from AMD64

AMD64 is most likely to benefit:

Cryptography and security applications benefit greatly from 64bit integer calculations. In this area AMD64 can bring a real breakthrough. For example, one multiplication of 128-bit numbers takes 60 instructions on x86 (16 mul, 29 adc, 15 add) but only 12 instructions on AMD64 (4 mul, 5 adc, 3 add). There is a small catch, because execution times differ for 32bit and 64bit operands, but it is still hard to overrate the importance of 64 bits for cryptography. For a high-bandwidth VPN router or any other server delivering large amounts of encrypted data, AMD64 is an obvious choice.

In the next parts we will provide results of some benchmarks run on the server described in the first part.
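The mul counts quoted above match a schoolbook multiplication, where every machine word of one operand is multiplied by every word of the other. The following sketch counts only those word multiplications; the function name is hypothetical, and because Python integers are arbitrary precision, the add/adc instructions needed for carry handling on a real CPU are not modelled.

```python
def schoolbook_multiply(a, b, total_bits=128, limb_bits=32):
    """Multiply two total_bits-wide integers limb by limb.

    Returns (product, number_of_limb_multiplications).
    """
    limbs = total_bits // limb_bits
    mask = (1 << limb_bits) - 1
    # Cut both operands into machine-word sized pieces, least significant first.
    a_limbs = [(a >> (i * limb_bits)) & mask for i in range(limbs)]
    b_limbs = [(b >> (i * limb_bits)) & mask for i in range(limbs)]

    product = 0
    multiplications = 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            # One hardware "mul" per pair of limbs, shifted into position.
            product += (x * y) << ((i + j) * limb_bits)
            multiplications += 1
    return product, multiplications


a = 2**128 - 1
b = 2**127 + 12345
for limb_bits in (32, 64):
    result, count = schoolbook_multiply(a, b, limb_bits=limb_bits)
    print(f"{limb_bits}bit limbs: {count} multiplications, correct: {result == a * b}")
```

Doubling the word size halves the number of limbs, so the number of multiplications drops by a factor of four.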

Linux on AMD64

As mentioned, any x86 OS can run on an AMD64 machine in legacy mode, but nobody would really want that. Let's focus on truly 64bit systems. Microsoft is working on a 64bit version of Windows (as of May 2004 only a beta version is available; some people predict that we will have to wait 64 years for 64bit Windows :). Sun has Solaris 10, and development versions of FreeBSD and NetBSD also support AMD64.

Linux supports AMD64 quite well. Recent versions of the 2.4 kernel have AMD64 support, but 2.4 is being deprecated in favour of 2.6. Nearly all major distributions have AMD64 versions. Some of them and their availability are listed below (only stable releases are included):

Slackware does not provide an AMD64 version and there is currently no development effort in this direction. Debian has a version under heavy development, and it is possible that the next stable release will add AMD64 to the list of supported architectures.

 


Online version of the article: http://www.linuxsoft.cz/article.php?id_article=141