ATLAS 3.2.1 errata



NOTE:

This file is obsolete and no longer maintained, as is the ATLAS version it describes. The supported errata is always here; the previous errata was for ATLAS 3.2.0.



ATLAS v3.2 released on 12/20/00

First 3.2 update 3.2.1 released on 03/23/01

To see what errors were fixed, see the 3.2.0 errata file.

ATLAS errors:

System problems, user hints/tips:




ATLAS build dies on Red Hat 7.0 and/or gcc 2.9[6,7]

Red Hat 7.0 shipped with a version of gcc not supported by GNU (GCC 2.96 and/or 2.97). It contains error(s) causing the ATLAS build to fail. Red Hat has released a patch fixing the problems in the RH7.0 version, available here. If this doesn't work for you, the recommended fix is to install any of the GNU-supported gcc versions.

Installing with a non-default f77 compiler

The only Fortran routines in ATLAS are the Fortran77 interface routines, which do no computation. Therefore, the Fortran77 compiler has absolutely no effect on ATLAS's performance, and so the only reason you should need to use a non-default f77 compiler is if the f77 compiler you wish to use does not interoperate with ATLAS's default compiler.

To install with a non-default f77 compiler, when ATLAS prompts you for "use express setup?", answer no, and when prompted for the F77 compiler and flags, fill in those you wish to use (this will be made easier in the next release -- OK the release after that).

Installing additional f77 interfaces

The only Fortran routines in ATLAS are the Fortran77 interface routines, which do no computation. Therefore, the Fortran77 compiler has absolutely no effect on ATLAS's performance, and so the only reason you should need to use a non-default f77 compiler is if the f77 compiler you wish to use does not interoperate with ATLAS's default compiler.

If you want to install ATLAS so it can be called from multiple, non-interoperable Fortran compilers (or indeed, have already installed with the wrong f77 compiler), you can do this with moderate ease, assuming you know how C and the given F77 compiler(s) interoperate. If you do not know this interoperation information, you must get config to find it for you. To do this, in your ATLAS/ directory, run config again (usually, just issue make). When ATLAS prompts you for "use express setup?", answer no; when prompted for the F77 compiler and flags, fill in those you wish to use; and when prompted for the architecture name, give an unused architecture name like F2C_BOGUS. You can then look at the F2CDEFS in the generated Make.F2C_BOGUS for the appropriate settings, and replicate them, along with the new F77 compiler/linker information, into your original Make.ARCH. You should then get rid of the unnecessary files created by this "installation" by typing make killall arch=F2C_BOGUS.

For those users already aware of how C and F77 interoperate, ATLAS needs three pieces of information in order to correctly handle F77/C interoperation. This information appears as defines to the C compiler, set in your Make.ARCH's F2CDEFS.

The first macro controls the name space alterations necessary to make a C routine callable from Fortran77. The options are:

Add_
All F77-callable C routines should be lowercase, and have an underscore suffixed to their names.
Add__
All F77-callable C routines should be lowercase, have an underscore suffixed to their names, and if the F77 name itself possesses an underscore, two underscores should be suffixed.
NoChange
All F77-callable C routines should be lowercase, with no name alteration.
UpCase
All F77-callable C routines should be made uppercase, with no further name alteration.
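
As a rough illustration of the four naming options above, the following C sketch computes the symbol name a C routine would need under each scheme. The enumerator names and the f77_symbol helper are hypothetical and are not part of ATLAS; they only restate the rules listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enumerators standing in for the F2CDEFS naming options
 * Add_, Add__, NoChange and UpCase described above. */
enum f77_naming {
    F77_NAME_ADD1,      /* Add_     */
    F77_NAME_ADD2,      /* Add__    */
    F77_NAME_NOCHANGE,  /* NoChange */
    F77_NAME_UPCASE     /* UpCase   */
};

/* Write into `out` the symbol a C routine must carry so that F77 code
 * calling `name` resolves to it. Returns 0 on success, or -1 if `out`
 * (of size `outsz`) is too small. Illustration only, not an ATLAS routine. */
int f77_symbol(const char *name, enum f77_naming scheme,
               char *out, size_t outsz)
{
    size_t n, i, extra = 0;

    assert(name != NULL && out != NULL);
    n = strlen(name);

    /* Number of underscores to append under each scheme. */
    if (scheme == F77_NAME_ADD1)
        extra = 1;
    else if (scheme == F77_NAME_ADD2)
        extra = (strchr(name, '_') != NULL) ? 2 : 1;

    if (n + extra + 1 > outsz)
        return -1;

    /* UpCase makes the name uppercase; every other scheme lowercases it. */
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)name[i];
        out[i] = (char)(scheme == F77_NAME_UPCASE ? toupper(c) : tolower(c));
    }
    for (i = 0; i < extra; i++)
        out[n + i] = '_';
    out[n + extra] = '\0';
    return 0;
}
```

For instance, f77_symbol("DGEMM", F77_NAME_ADD1, buf, sizeof buf) leaves "dgemm_" in buf, which matches the kind of symbol (dgemm_) that appears in link errors against an Add_-style interface.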

The second macro provides a mapping between F77's INTEGER and the appropriate C integral type. Options are:

No definition
Default case where C's int corresponds to F77's INTEGER.
F77_INTEGER=long
F77's INTEGER corresponds to C's long.
F77_INTEGER=short
F77's INTEGER corresponds to C's short.

The third macro deals with F77 string handling. The options are:

StringSunStyle
The string's address is passed at the string's location on the stack, and the string's length is then passed as an F77_INTEGER after all explicit stack arguments.
CrayStyle
Special option for CRAY machines, which uses Cray's fcd (fortran character descriptor) for interoperation.
StringStructPtr
The address of a structure is passed for each Fortran77 string, and the structure is of the form:
      struct {char *cp; F77_INTEGER len;};
StringStructVal
A structure is passed by value for each Fortran77 string, and the structure is of the form:
      struct {char *cp; F77_INTEGER len;};
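
To make the second and third macros more concrete, here is a minimal C sketch of what a routine built with -DAdd_ -DStringSunStyle might look like from the C side. The routine demo_first_char_ and the helper f77_to_cstr are hypothetical names invented for this illustration; the sketch also assumes the trailing string length is passed by value, which the description above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* As described above: with no definition, F77's INTEGER is C's int.
 * Compiling with -DF77_INTEGER=long (or short) overrides this. */
#ifndef F77_INTEGER
#define F77_INTEGER int
#endif

/* Copy a blank-padded Fortran string (address plus length, no NUL
 * terminator) into a NUL-terminated C buffer, dropping trailing blanks.
 * Returns the number of characters copied, or -1 if `out` is too small. */
int f77_to_cstr(const char *fstr, F77_INTEGER len, char *out, size_t outsz)
{
    assert(fstr != NULL && out != NULL && len >= 0);
    while (len > 0 && fstr[len - 1] == ' ')
        len--;
    if ((size_t)len + 1 > outsz)
        return -1;
    memcpy(out, fstr, (size_t)len);
    out[len] = '\0';
    return (int)len;
}

/* Hypothetical F77-callable routine under Add_ naming and StringSunStyle:
 * the string's address arrives in its normal argument position, and its
 * length arrives as an extra F77_INTEGER after all explicit arguments
 * (assumed here to be passed by value). Returns the first non-blank-padded
 * character of the string, or -1 if the string is empty or too long. */
int demo_first_char_(const char *trans, F77_INTEGER *n, F77_INTEGER trans_len)
{
    char buf[16];

    (void)n; /* unused in this sketch; shown only to fix the argument order */
    if (f77_to_cstr(trans, trans_len, buf, sizeof buf) <= 0)
        return -1;
    return buf[0];
}
```

From Fortran77 this routine would be invoked by the name DEMO_FIRST_CHAR with two explicit arguments; the third C argument is supplied by the compiler, which is why a mismatch in string style tends to show up as corrupted character arguments rather than as a link error.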

By default, ATLAS builds the F77 interface to the BLAS into the file pointed at by Make.ARCH's F77BLASlib, and so changing this macro before recompiling the interface will allow you to build multiple F77 interfaces.

For example, say on a Solaris machine I want to build the f77 interface for both Sun's f77 and g77. First, I install ATLAS as normal, with the default f77 compiler. Now, to get a g77 interface lib, I edit my ATLAS/Make.SunOS_SunUS2, and I find that ATLAS has detected the C/F77 interface for Sun's f77 compiler as:

   F2CDEFS = -DAdd_ -DStringSunStyle
I then change this to match g77:
   F2CDEFS = -DAdd__ -DStringSunStyle
Now, so that my Sun f77 interface will not be overwritten, I also change:
   F77BLASlib = $(LIBdir)/libf77blas.a
to:
   F77BLASlib = $(LIBdir)/libg77blas.a
Finally, I change the f77 compiler/linker information from:
   F77 = /opt/SUNWspro/bin/f77
   F77FLAGS = -dalign -native -xarch=v8plusa -xO5
to:
   F77 = /usr/local/bin/g77
   F77FLAGS = -O3 -funroll-all-loops
Now, I cd ATLAS/interfaces/blas/F77/src/SunOS_SunUS2, and issue:
   make clean
   make lib
Now, when linking with Sun's f77, I link to -lf77blas -latlas, and when linking with g77 I use -lg77blas -latlas.

You can essentially repeat this process for the LAPACK F77 interface, but change LAPACKlib rather than F77BLASlib, and go to ATLAS/interfaces/lapack/F77/src/SunOS_SunUS2 rather than ATLAS/interfaces/blas/F77/src/SunOS_SunUS2.

I get "MAIN__ Unresolved" error when linking using Compaq/Dec f77

ATLAS's bin/ARCH/Makefile uses f77 to handle linking in order to satisfy F77 symbols brought in by linking the vendor's BLAS. However, all the programs in this directory are written in C, and you need to signal this to f77 by throwing the -nofor_main flag. The easiest place to add this flag is in your Make.ARCH's FLINKFLAG macro. Note that ATLAS doesn't just always throw this flag because the F77 interface testers, for instance, are written in Fortran77, so this "correction" can cause errors when compiling other executables.

How do I link with all these libraries?

The user libs created by ATLAS are:
liblapack.a
The LAPACK routines provided by ATLAS. If you want a full LAPACK library, the .o files in this lib can be archived into the f77 LAPACK lib without error.
libcblas.a
The ANSI C interface to the BLAS.
libf77blas.a
The Fortran77 interface to the BLAS.
libptcblas.a
The ANSI C interface to the threaded (SMP) BLAS. This library only appears if you have asked for SMP support.
libptf77blas.a
The Fortran77 interface to the threaded (SMP) BLAS. This library only appears if you have asked for SMP support.
libatlas.a
The main ATLAS library, providing low-level routines for all interface libs.
If you have missing symbols on link, make sure you are linking in all of the libraries you need, and remember that order *is* significant. For instance, a code calling the Fortran77 interface to the BLAS would need:
   -L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -lf77blas -latlas
The full LAPACK library created by merging ATLAS and netlib LAPACK requires both C and Fortran77 interfaces, and thus that link line would be:
   -L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -llapack -lf77blas -lcblas -latlas
If you wish to use threaded BLAS, you simply indicate those interface libs rather than the sequential. The above line for SMP would be:
   -L$(MY_HOME)/ATLAS/lib/$(MY_ARCH)/ -llapack -lptf77blas -lptcblas -latlas

Dec ALPHA compilers

If you are using a Compaq/DEC Alpha, it is extremely important to use GNU gcc, rather than egcs gcc or Compaq's cc. On my 533MHz DEC Alpha ev56 (21164), gcc 2.8.1 gets clearly better performance than egcs 1.1.1, egcs 2.91.66 (1.1.2), or gcc 2.95.1. The difference can be as much as 100 Mflop, so this is vitally important. For further information on the egcs/gcc issue, go here. I have put up a small page describing the process I used to install gcc 2.8.1 on my DEC ev56 running Linux here if you are having difficulty finding or installing it.

Users of DEC Unix should be aware that cc should not be used to compile the generated matmul code. The compiler does not allow the turning off of optimizations that cause optimal code to run slower, so you should compile these routines with gcc. Configure will do this automatically if gcc is installed. If you don't have gcc, we strongly recommend you download and install it (you don't have to be superuser to do this). Otherwise, expect a large performance drop (around 8% for an EV6, and considerably more for an EV56). If you insist on getting bad performance, and don't install an old gcc, make sure to edit your Make.ARCH include file to correct the gcc-only flags used in MMFLAGS, and tell config you do not want to use the ATLAS-provided defaults (they assume the correct gcc).

If you are using Linux/ev56, I recommend getting gcc-2.8.1 and installing it with the instructions given here. If you are using Tru64 (AKA OSF1), just do the default install using gcc-2.7.2.3.

x86 compatible compilers

For Linux x86 compilers, the only rule should be: don't use Portland Group cc. Egcs, pentium-optimized gcc, and GNU gcc all get roughly equivalent performance with ATLAS, but the Portland Group C compiler is much worse. Gcc should interoperate with the Portland Group compilers, so this should not be a problem. If you are using Portland Group Fortran, it does not natively interoperate with g77, which is what config will use to compile ATLAS's F77 interface routines by default. In order to force the use of Portland Group Fortran, answer "no" to the "use express setup?" question, and when prompted, give the path to the Portland Group Fortran compiler. Again, do not use Portland Group C, even if using Portland Group's Fortran.

For Windows, you need to install Cygwin in order to use gcc. It delivers significantly better performance than MSVC++ or Watcom C (as well as the Intel enhanced MSVC++).

My system doesn't have the -f option to cp

If you take the following line, and put it in a file cp you make executable, and then put it in your path before your system cp, it should get rid of the -f option:
/bin/cp `echo $* | sed -e 's/-f / /'`

How do I install on an Intel Celeron?

Config will ask you what your hardware is. It is recommended that you set it to Pentium II.

ATLAS fails Level 1 BLAS tester when compiled with gcc 2.95.2 on Compaq/DEC alphas

We have observed this problem, but not yet tracked it down. Since the exact same testers and code work correctly with older gccs (eg, 2.8 or 2.7), we suspect a compiler error. For now, the fix is to install gcc 2.8 or 2.7.

Building a complete LAPACK library

ATLAS does not provide a full LAPACK library. However, there is a simple way to get ATLAS to provide its faster LAPACK routines to a full LAPACK library. ATLAS's internal routines are distinct from LAPACK's, so it is safe to compile ATLAS's LAPACK routines directly into a netlib-style LAPACK library. First, download and install the standard LAPACK library from the LAPACK homepage. Then, in your ATLAS/lib/ARCH directory (where you should have a liblapack.a), issue the following commands:
  mkdir tmp
  cd tmp
  ar x ../liblapack.a
  cp <your LAPACK path & lib> ../liblapack.a
  ar r ../liblapack.a *.o
  cd ..
  rm -rf tmp

Just linking in ATLAS's liblapack.a first will not get you the best LAPACK performance, mainly because LAPACK's untuned ILAENV will be used instead of ATLAS's tuned one. So, if you use any LAPACK routine that is not provided by ATLAS, it is essential that you create this hybrid LAPACK/ATLAS library in order to get the best performance.

How do I restart an install from scratch?

From your ATLAS directory, issue:
   make killall arch=ARCH
   make startup arch=ARCH
   make install arch=ARCH

How do I restart an interrupted install?

If your ATLAS install was interrupted, and you have fixed the problem, you can usually safely (there are always exceptions; if the install died in the middle of an ar command, for instance, many systems cannot recover) restart the install by:

How do I get rid of all the .o's?

ATLAS does not have a working "make clean" that leaves the architecture-specific directory structure in place. Issuing "make kill arch=ARCH" in your ATLAS directory, however, will remove all architecture-specific subdirectories, with the exception of ATLAS/lib/ARCH, along with all related object files. Issuing "make killall arch=ARCH" gets rid of all architecture-specific subdirectories, including ATLAS/lib/ARCH.

Do NOT use the -fno-f2c flag with g77

Haven't tracked this down in a while, but it appears to break quite a few things in fairly non-obvious ways for mixed g77/gcc libs.

Building a Compaq Visual Fortran and MSVC++ compatible ATLAS under Windows

ATLAS should be MSVC++ compatible by default (ATLAS will compile itself using gcc, but with special tricks so that the library may be called from MSVC++). Getting it working with CVF, however, is not quite so trivial.

ATLAS can only interface to CVF using the /iface:cref interface (you may, in addition, use /iface=nomixed_str_len_arg, or not, as you choose). The following steps were necessary using cygwin 1.1.8, Visual Studio 6.0 and CVF 6.5:

Athlon performance varies on each install (it's slower than it used to be)

This is probably due to an empirically timed value called CacheEdge. Timings on the Athlon are not very repeatable, with the result that a non-optimal CacheEdge is often found. You can repeatedly run the executable that detects this value in order to get a better idea of what the setting should be. To do this, cd to ATLAS/tune/blas/gemm/ARCH, and issue make xdfindCE. Run this program as many times as you want, and see if you can detect a trend. Once you have your value, enter it in atlas_cacheedge.h and recompile, as shown below.

If you want to use the values we have settled on here at UT, for Athlons with 512K of L2 cache (classic Athlons), set your ATLAS/include/ARCH/atlas_cacheedge.h to:

#ifndef ATLAS_CACHEEDGE_H
   #define ATLAS_CACHEEDGE_H
   #define CacheEdge 307200
#endif

For Athlons with a 256K L2 (i.e. "enhanced" Athlons), set the same file to:

#ifndef ATLAS_CACHEEDGE_H
   #define ATLAS_CACHEEDGE_H
   #define CacheEdge 217088
#endif

If you are not sure what kind of Athlon you have, cat /proc/cpuinfo should tell most Linux users.

Once you make the change, typing make xdl3blastst xsl3blastst xcl3blastst xzl3blastst in your ATLAS/bin/ARCH will apply the fix to all of your data types.

What happens if I install with no Fortran compiler?

ATLAS will still install correctly, though it will obviously not create the Fortran77 interface libraries. You will not be able to run the testers under the ATLAS/interfaces/ directory, since these testers are written in Fortran. Further, ATLAS expects that you will be comparing against a Fortran77 interface BLAS, and this will obviously not be the case, and so you will need to make the following changes if you want to run any of the ATLAS tester/timers, even the ones written in C:

Post install tuning.

Here are some tips for improving ATLAS performance after an install:

Tuning CacheEdge.

CacheEdge is a Level 2 cache blocking parameter. Because its effects are fairly subtle on most machines, its detection often goes wrong on machines experiencing any kind of load, causing performance to be suboptimal. A well-chosen CacheEdge can improve performance by as much as 15%, and it can reduce ATLAS's memory usage as well.

In ATLAS/tune/blas/gemm/ARCH, issue make xdfindCE. Run this program several times to get a consensus idea of what a good setting would be. If a CacheEdge setting gets performance in the same range as no CacheEdge (a CacheEdge of 0 means no CacheEdge in the printout of xdfindCE), it is still recommended that you use that setting, since ATLAS with CacheEdge set will use less memory as problem sizes grow.

Once you have gotten an idea of what to set CacheEdge to, you can change it by editing ATLAS/include/ARCH/atlas_cacheedge.h. xdfindCE prints out data in KB, but atlas_cacheedge.h needs bytes, so multiply the xdfindCE result by 1024 to get the number you want to use in atlas_cacheedge.h.

Let's take an example. Say xdfindCE printed out this:

TA  TB       M       N       K   alpha    beta  CacheEdge       TIME    MFLOPS
==  ==  ======  ======  ======  ======  ======  =========  =========  ========

 T   N    1000    1000    1000    1.00    1.00          0      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00         16      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00         32      5.460    366.30
 T   N    1000    1000    1000    1.00    1.00         64      5.470    365.63
 T   N    1000    1000    1000    1.00    1.00        128      5.260    380.23
 T   N    1000    1000    1000    1.00    1.00        256      5.240    381.68

Initial CE=256KB, mflop=381.68


Best CE=256KB, mflop=381.68
So we want to set CacheEdge to 1024*256 = 262144. atlas_cacheedge.h will look something like:
#ifndef ATLAS_CACHEEDGE_H
   #define ATLAS_CACHEEDGE_H
   #define CacheEdge 196608
#endif
In the above example, we would simply replace 196608 with 262144. If your initial install did not use CacheEdge, the third line (the #define CacheEdge line) will be missing completely; in that case, simply add it, using the new value of 262144.
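
The arithmetic in this example can be sketched in C as follows. The struct and function names, and the tolerance parameter used to decide what counts as "the same range" as no CacheEdge, are assumptions made for this illustration; they are not part of ATLAS or of xdfindCE.

```c
#include <assert.h>
#include <stddef.h>

/* One row of xdfindCE output: the CacheEdge tried (in KB, 0 meaning no
 * CacheEdge) and the MFLOPS measured. Hypothetical type for this sketch. */
struct ce_sample {
    long   ce_kb;
    double mflops;
};

/* xdfindCE prints KB, but atlas_cacheedge.h needs bytes. */
long cacheedge_bytes(long ce_kb)
{
    return ce_kb * 1024L;
}

/* Return the nonzero CacheEdge (in KB) with the highest MFLOPS, provided it
 * is within the fraction `tol` of the no-CacheEdge result; otherwise return
 * 0, meaning "leave CacheEdge unset". `tol` is an assumed knob. */
long pick_cacheedge_kb(const struct ce_sample *s, size_t n, double tol)
{
    double none = 0.0, best = -1.0;
    long best_kb = 0;
    size_t i;

    assert(s != NULL && tol >= 0.0 && tol < 1.0);
    for (i = 0; i < n; i++) {
        if (s[i].ce_kb == 0) {
            none = s[i].mflops;
        } else if (s[i].mflops > best) {
            best = s[i].mflops;
            best_kb = s[i].ce_kb;
        }
    }
    return (best >= (1.0 - tol) * none) ? best_kb : 0;
}
```

Fed the six rows printed above, pick_cacheedge_kb selects 256, and cacheedge_bytes(256) gives the 262144 used in the header. Since single runs are noisy, it is probably more useful to apply this to results collected over several runs of xdfindCE than to any one printout.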

By successively editing this file and recompiling (for instance, ATLAS/bin/ARCH/x[d,s,z,c]mmtst), you can tune this value further. Many users expect that they should set CacheEdge to the actual size of their L2 cache. This is only rarely the best setting, mainly because L2 caches are normally combined data/instruction, and so a smaller setting, leaving room for instruction caching, is usually best. On some machines with large L2 caches, things like associativity, or even TLB issues, can make it more efficient to use a very small subset of the available cache.

Here are some CacheEdge settings that the ATLAS team has chosen:
Arch     L2 Cache   CacheEdge
======   ========   =========
PPRO     256K          147456
PII      512K          262144
PIII     512K          262144
PIII     256K          163840
P4       256K          131072
Athlon   256K          217088
Athlon   512K          307200

Once you have set CacheEdge to the value you need, update all libs with the new setting by issuing make xdl3blastst xsl3blastst xcl3blastst xzl3blastst in your ATLAS/bin/ARCH directory.

When linking ATLAS's testers, I'm getting a bunch of undefined BLAS symbols (eg. dgemm_, dgemv_, etc).

The ATLAS BLAS testers (x[s,d,c,z]l[1,2,3]blastst) expect to compare against an F77 interface BLAS library for performance and testing purposes. You get these missing symbols when your Make.ARCH's BLASlib is left blank, or does not point at a complete BLAS library. If you have a non-ATLAS BLAS built somewhere, point the BLASlib macro at it. If you don't, the easiest fix is probably to grab the Fortran77 reference BLAS tarfile, and build it into the required lib. If you don't want to do this, or don't have access to Fortran77, then you can have ATLAS test against its own C reference as discussed here.