CS596 Assignment 1.2: Single Processor Performance


Due date 3/22/01 at 5pm in class

1. Write a program that declares two arrays, x and y, of 40,000,000 elements 
each and a scalar s. Assign some values to x, y, and s. Declare 
everything as float (in C) or real (in Fortran). Execute the 
following statement 40,000,000 times, once for each index i.

    x[i] = x[i] + s*y[i]

    Put a timer around the execution of the loop only (i.e., do not include 
    the initialization or the print statements at the end). After the loop, 
    print out some of the values of x. 
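
    A minimal sketch of the setup described above, in C. The function name
    time_axpy, the initial values, and the use of clock() from <time.h> as
    the timer are assumptions for illustration, not requirements of the
    assignment. The arrays are heap-allocated because two 40,000,000-element
    float arrays (about 160 MB each) are too large for a typical stack.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (name and signature are not from the assignment):
   allocates x and y with n floats each (n must be at least 1), runs
   x[i] = x[i] + s*y[i] once over all n elements, and returns the CPU
   time of that loop in seconds. Returns -1.0 if allocation fails. */
double time_axpy(size_t n, float s)
{
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    size_t i;
    clock_t start, stop;

    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }

    /* Initialization: arbitrary values, kept outside the timed region. */
    for (i = 0; i < n; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    /* Timed region: the loop only. */
    start = clock();
    for (i = 0; i < n; i++) {
        x[i] = x[i] + s * y[i];
    }
    stop = clock();

    /* Print some values of x, as the assignment asks. Using the results
       also discourages an optimizer from discarding the loop. */
    printf("x[0] = %f, x[%lu] = %f\n", x[0], (unsigned long)(n - 1), x[n - 1]);

    free(x);
    free(y);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

    Calling time_axpy(40000000, 2.0f) from main and printing the returned
    value would give the loop time for the full problem size. Note that
    clock() reports CPU time, so a wall-clock timer may be preferable if
    one is available on the machine.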

   In the steps below, the compiler option -fast requests a higher level of
   optimization.

a) Compile as cc -S code.c
   (for Fortran codes, replace cc with f90)
   This will produce a file called code.s, which contains the assembly code.
   Find the part of the assembly that executes the loop statement
   x[i] = x[i] + y[i]*s
   i.e., identify the above line in the assembly file; the assembly code
   for that line appears below it.
   In the C assembly code, notice the load (ld), floating-point multiply
   single (fmuls), floating-point add single (fadds), store (st), and
   branch (bl) instructions. In the Fortran assembly code, ld, fmuls, fadds,
   and st are the same, but the branch instruction may be 
   branch on less than or equal (ble).
   Floating-point registers are named f0, f1, ..., f4, ..., f14, etc. in
   your assembly code.
   Try to identify which register holds s and which registers hold x[i]
   and y[i].
   Was the loop unrolled by the compiler? (You may want to wait until
   you have looked at the assembly code of (b) before you answer this.)
   Recompile as 
   cc code.c -o code 
   then run the code and record the timing.
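
   As an aid for the unrolling question, here is a source-level sketch of
   what a loop unrolled by a factor of 4 might look like. The function name
   axpy_unrolled4 and the factor 4 are assumptions chosen for illustration;
   the compiler may choose a different factor or not unroll at all. In the
   assembly, unrolling would show up as several fmuls/fadds pairs between
   consecutive branch instructions, and the cleanup loop corresponds to
   extra fmuls and fadds outside the main loop body.

```c
#include <stddef.h>

/* Hypothetical illustration: x[i] = x[i] + s*y[i] written the way a
   compiler might unroll it by 4. Not the output of any specific compiler. */
void axpy_unrolled4(float *x, const float *y, float s, size_t n)
{
    size_t i = 0;

    /* Main body: four independent updates per trip, so there is one
       branch for every four elements instead of one per element. */
    for (; i + 3 < n; i += 4) {
        x[i]     = x[i]     + s * y[i];
        x[i + 1] = x[i + 1] + s * y[i + 1];
        x[i + 2] = x[i + 2] + s * y[i + 2];
        x[i + 3] = x[i + 3] + s * y[i + 3];
    }

    /* Cleanup: the 0 to 3 elements left when n is not a multiple of 4. */
    for (; i < n; i++) {
        x[i] = x[i] + s * y[i];
    }
}
```

   Counting how many distinct multiply/add pairs appear before the branch
   back to the top of the loop gives an estimate of the unroll factor.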

b) Compile as cc -fast -S code.c
   This will produce a file called code.s, which contains the assembly code.
   As in question (a), look at the part of the assembly that executes the 
   loop: x[i] = x[i] + y[i]*s 
   This will be a little more difficult than in question (a), 
   since the code first does some fmuls and fadds to set things up 
   (the header), then does the fadds and fmuls for the actual loop, and then
   does some additional fmuls and fadds at the end. Look for the 
   branch instruction (bl for C and ble for Fortran) to identify where 
   the actual loop calculation is done.
   Can you tell whether the compiler unrolled the loop and, if so, by how
   many times?
   Recompile as 
   cc -fast code.c 
   then run the code, record the timing, and compare it with case (a) above.
Hand in a hard copy of the source codes (for a and b) and the assembly codes
(for a and b). Answer the questions on loop unrolling (in a and b) and
also hand in the actual execution timings for a and b. As before, log into
gaos to write and compile the codes, and run them (using bsub) on ultra.