Performance loss using gcc 3.0 vs 2.9x on x86 platforms

NOTE: all of these problems appear to be fixed in gcc 3.1. I confirmed the fix on an Athlon using a prerelease, and on the PIII using official 3.1. Both performed indistinguishably from 2.95.2.

Some terminology

To simplify the reporting below, I'm going to separate compilers into the 2.9x series and 3.0. Despite its name, 2.96-80 is not included in the 2.9x moniker; if I want to reference it, I will give its release number explicitly. I would say "2.95 or earlier", but RedHat 7.0's 2.96 version acts like the 2.9x series, while 7.1's 2.96-80 acts more like 3.0.

Executive summary

This page reports on a performance problem with the new gccs. It all started when a user reported a factor of 2 drop in performance when compiling ATLAS with RedHat 7.1's 2.96-80 gcc compiler on an Athlon. I confirmed that 3.0-compiled code ran at 54% of the rate of 2.9x-compiled code on Athlons, and at 75% of the rate of 2.9x-compiled code on a PentiumIII.

From the experiments described below, I think it is highly probable that there are two significant problems with gcc 3.0 on x86:

  1. The 3.0 fetch scheduler is optimized for Pentiums, to the great detriment of Athlons. The scheduling algorithm used in the 2.9x series is much better for Athlons (2.96-80 has this problem as well). For ATLAS's kernel on Athlons, this problem causes an almost 50% drop in performance.
  2. 3.0's fpu stack handling is inferior to 2.9x's on all x86 platforms (2.96-80 does not have this problem, provided you raise the optimization level above what earlier releases required). For ATLAS's kernel on any x86 (actually, confirmed for PIII & Athlon only), this problem causes a roughly 10% drop in performance.
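For readers who want to check their own compiler, the two effects above can be probed with a small floating-point loop built under each gcc and timed. The following is only a minimal sketch: `dot_unrolled` and `time_kernel_mflops` are hypothetical names invented for this illustration, and the loop is a simple unrolled dot product, not the ATLAS kernel measured on this page. It is written in C89 style so that older compilers accept it.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in kernel (NOT the ATLAS kernel): a dot product
 * unrolled by four with independent accumulators, so the compiler has
 * several live floating-point values to keep on the x87 stack and
 * several loads to schedule per iteration. */
double dot_unrolled(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    /* Cleanup loop for the leftover elements when n is not a multiple of 4. */
    for (; i < n; i++)
        s0 += a[i] * b[i];

    return (s0 + s1) + (s2 + s3);
}

/* Calls the kernel reps times and returns the measured rate in MFLOPS
 * (2 flops per element), or 0.0 if the run was too short for clock()
 * to register. The accumulated result is written to *sink so the
 * compiler cannot discard the calls. */
double time_kernel_mflops(const double *a, const double *b, size_t n,
                          int reps, double *sink)
{
    clock_t t0, t1;
    double acc = 0.0, secs;
    int r;

    t0 = clock();
    for (r = 0; r < reps; r++)
        acc += dot_unrolled(a, b, n);
    t1 = clock();

    *sink = acc;
    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    return secs > 0.0 ? (2.0 * (double)n * (double)reps) / (secs * 1e6) : 0.0;
}
```

To use it, build the same file with each compiler at the same optimization level, call `time_kernel_mflops` on vectors small enough to stay in cache with enough repetitions to run for a second or more, and compare the reported rates. A toy loop like this may not reproduce the full slowdowns quoted above, which were measured on ATLAS's kernel.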

On 8/01/01, I submitted this problem as a bug to both RedHat and gnu:

On 12/03/01, a user suggested I try submitting under a different category with gnu, since I had gotten no response. It has been resubmitted as: