      DO I = 1,N
        Y(J) = Y(J) + X(I)
      ENDDO

Ideally the compiler should hoist the load of Y(J) above the loop and sink the store to Y(J) below it. Because Y is subscripted by J, however, some compilers may have trouble performing this optimization on the loop above. How could you change this code to help the compiler perform this loop optimization?
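One possible approach, offered as a sketch rather than the only answer, is to carry the running sum in a scalar temporary so the load and the store each happen once, outside the loop. The temporary T, the array sizes, and the data values below are illustrative assumptions.

```fortran
      PROGRAM HOIST
C     Sketch: carry Y(J) in a scalar temporary across the loop.
C     N, J, T and the data values are illustrative assumptions.
      INTEGER N, I, J
      PARAMETER (N = 100)
      REAL X(N), Y(10), T
      DO I = 1, N
        X(I) = 1.0
      ENDDO
      J = 3
      Y(J) = 5.0
C     Load Y(J) once, before the loop.
      T = Y(J)
      DO I = 1, N
        T = T + X(I)
      ENDDO
C     Store Y(J) once, after the loop.
      Y(J) = T
      PRINT *, Y(J)
      END
```

A scalar has no subscript for the compiler to analyze, so it is a natural candidate to keep in a register for the whole loop.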
      REAL X(8,N)
      DO J = 1,8
        DO I = 1,N
          X(J,I) = 0.0
          CALL SUB1(X)
          ...
        ENDDO
      ENDDO

Assume that the loops cannot be interchanged because of other work in the inner loop. Suppose that the X array is out of cache and that the systems on which you will run the code have interleaved memory with eight banks. What performance problem might occur, and how might you change the code to improve performance?
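To explore the question, the following sketch prints which bank each successive inner-loop reference X(J,1), X(J,2), ... would touch. It assumes a simple model in which the bank number is the word offset modulo 8; real memory systems may map addresses differently. The leading dimension of 9 is a hypothetical padded value used for comparison.

```fortran
      PROGRAM BANKS
C     Sketch: banks touched by successive inner-loop references,
C     assuming bank = (word offset from X(1,1)) mod 8.
C     LD is the leading dimension; 9 is a hypothetical padding.
      INTEGER LD, I, J, OFFSET
      J = 1
      DO LD = 8, 9
        PRINT *, 'Leading dimension', LD
        DO I = 1, 8
C         Column-major word offset of X(J,I) from X(1,1).
          OFFSET = (I-1)*LD + (J-1)
          PRINT *, '  I =', I, ' bank =', MOD(OFFSET, 8)
        ENDDO
      ENDDO
      END
```

Under this model, a leading dimension of 8 sends every reference to the same bank, while a leading dimension of 9 cycles the references through all eight banks.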
      S = DDOT( 10, X, 1, Y, 1 )

where DDOT is defined as
      FUNCTION DDOT( N, X, IX, Y, IY )
      REAL*8 X(0:N-1), Y(0:N-1)
      S = 0.0
      IF (IX .EQ. 1 .AND. IY .EQ. 1) THEN
        DO I = 0,N-1
          S = S + X(I) * Y(I)
        ENDDO
      ELSE
        DO I = 0,N-1
          S = S + X(I*IX) * Y(I*IY)
        ENDDO
      ENDIF
      DDOT = S
      END

how might a compiler use inlining, constant propagation, and dead code elimination to optimize the code?
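As a sketch of where those three transformations might lead, the program below shows the call site after the function body has been substituted in, the constants N = 10 and unit strides have been propagated, and the unreachable strided branch has been removed. The data values and the explicit REAL*8 declaration of S are illustrative assumptions, and an actual compiler may produce something different.

```fortran
      PROGRAM DOTINL
C     Sketch of the call site after inlining, constant propagation
C     (N = 10, both strides 1) and dead code elimination.
C     Data values are illustrative assumptions.
      INTEGER I
      REAL*8 X(0:9), Y(0:9), S
      DO I = 0, 9
        X(I) = 1.0D0
        Y(I) = 2.0D0
      ENDDO
C     The stride test folds to .TRUE., so the strided ELSE loop is
C     dead; only the unit-stride loop with constant bounds remains.
      S = 0.0D0
      DO I = 0, 9
        S = S + X(I) * Y(I)
      ENDDO
      PRINT *, S
      END
```

With a constant trip count of 10, the remaining loop also becomes a candidate for full unrolling.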
      DO I=1,N
        IF (D(J) .LE. 0.0) X(I) = 0.0
        A(I) = B(I)+C(I)*D(I)
        E(I) = X(I)+F*G(I)
      ENDDO

What is inefficient about this code, and how could you recode it to be more efficient?
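One direction to consider, sketched here under assumptions, is that the test on D(J) does not depend on I, so it can be evaluated once outside the loop. The array sizes, the value of J, and the data values are illustrative and not part of the exercise.

```fortran
      PROGRAM UNSWCH
C     Sketch: hoist the loop-invariant test on D(J) out of the loop.
C     N, J, F and the data values are illustrative assumptions.
      INTEGER N, I, J
      PARAMETER (N = 8)
      REAL A(N), B(N), C(N), D(N), E(N), G(N), X(N), F
      DO I = 1, N
        B(I) = 1.0
        C(I) = 2.0
        D(I) = -1.0
        G(I) = 3.0
        X(I) = 4.0
      ENDDO
      F = 0.5
      J = 1
C     The test is evaluated once instead of N times.
      IF (D(J) .LE. 0.0) THEN
        DO I = 1, N
          X(I) = 0.0
          A(I) = B(I)+C(I)*D(I)
          E(I) = X(I)+F*G(I)
        ENDDO
      ELSE
        DO I = 1, N
          A(I) = B(I)+C(I)*D(I)
          E(I) = X(I)+F*G(I)
        ENDDO
      ENDIF
      PRINT *, A(1), E(1), X(1)
      END
```

Each resulting loop body is free of branches, which generally makes it easier for a compiler to pipeline or vectorize.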
      DO I=1,N
        DO J=1,N
          A(J,I)=B(J,I)*SIN(X(J))
        ENDDO
      ENDDO

How could you rewrite this code to reduce the number of calls to SIN?
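A minimal sketch of one possible rewrite follows. Since SIN(X(J)) does not depend on I, its values can be computed once into a temporary array and reused for every column. The array SX, the size N, and the data values are assumptions made for illustration.

```fortran
      PROGRAM SINTAB
C     Sketch: precompute SIN(X(J)) once per J and reuse it.
C     SX, N and the data values are illustrative assumptions.
      INTEGER N, I, J
      PARAMETER (N = 4)
      REAL A(N,N), B(N,N), X(N), SX(N)
      DO J = 1, N
        X(J) = 0.1 * J
        DO I = 1, N
          B(J,I) = 1.0
        ENDDO
      ENDDO
C     N calls to SIN instead of N*N.
      DO J = 1, N
        SX(J) = SIN(X(J))
      ENDDO
      DO I = 1, N
        DO J = 1, N
          A(J,I) = B(J,I) * SX(J)
        ENDDO
      ENDDO
      PRINT *, A(1,1), A(N,N)
      END
```

The trade-off is N extra words of storage for the temporary array in exchange for N*N - N fewer calls to SIN.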
      integer nx,nz
      parameter (nx=2048,nz=2048)
      real p(2,nx,nz)
      ...
      ...
      do 25 ix=2,nx-1
        do 20 iz=2,nz-1
          p(it1,ix,iz) = -p(it1,ix,iz)
     &      +s*p(it2,ix-1,iz)
     &      +s*p(it2,ix+1,iz)
     &      +s*p(it2,ix,iz-1)
     &      +s*p(it2,ix,iz+1)
 20     continue
 25   continue
      DO II = 1,N,NB
        DO JJ = 1,N,NB
          DO KK = 1,N,NB
            DO I = II,MIN(N,II+NB-1)
              DO J = JJ,MIN(N,JJ+NB-1)
                DO K = KK,MIN(N,KK+NB-1)
                  C(I,J) = C(I,J)+A(I,K)*B(K,J)
                ENDDO
              ENDDO
            ENDDO
          ENDDO
        ENDDO
      ENDDO