CS596 Assignment 1.2: Single Processor Performance
Due date 3/22/01 at 5pm in class
1. Write a code that declares two arrays, x and y, of size 40,000,000 each
and a scalar s. Assign some values to the arrays x and y and to the scalar s.
Declare everything as float (in C) or real (in Fortran). Execute the
following 40,000,000 times:
x[i] = x[i] + s*y[i]
Put a timer around the execution of the loop only (i.e. do not include
the initialization or the print statements at the end). Then print out
some of the values of x.
In the steps below, the compiler option -fast requests a higher level of
optimization.
a) Compile as cc -S code.c
(for Fortran codes, replace cc with f90)
This will produce a file called code.s which has the assembly code.
Look at the assembly part of the code that executes the loop
x[i] = x[i] + y[i]*s
i.e. find the above line in the assembly file; the assembly instructions
for that line appear directly below it.
Notice the load (ld), floating point multiply single (fmuls),
floating point add single (fadds), store (st), and branch (bl) instructions
for C assembly code. For Fortran assembly code the ld, fmuls, fadds, and
st instructions are the same, but the branch instruction may be
branch on less than or equal (ble).
Floating point registers will be named in your assembly code as f0, f1,
..., f4, ..., f14, etc.
Try to identify which register holds s and which registers hold x[i] and
y[i]. Was the loop unrolled by the compiler? (You may want to wait until
you have looked at the assembly code of (b) before you answer this.)
Recompile as
cc code.c -o code
and run the code and record timing.
b) Compile as cc -fast -S code.c
This will produce a file called code.s which has the assembly code.
As in question (a), look at the part of the assembly code that executes the
loop: x[i] = x[i] + y[i]*s
This will be a little more difficult than in question (a), since the code
will initially (in a part called the header) do some fmuls and fadds to
set things up, then do fadds and fmuls for the actual loop, and then
do some additional fmuls and fadds at the end. Look for the
branch instruction (bl for C and ble for Fortran) to identify where
the actual loop calculation is done.
Can you tell if and how many times the compiler unrolled the loop?
Recompile as
cc -fast code.c
and record timing and compare with case (a) above.
Hand in hard copy of the actual codes (for a and b) and assembly codes
(for a and b). Answer the questions on loop unroll (in a and b) and
also hand in the actual execution timings of a and b. As before, log into
gaos to write and compile your codes, and run them (using bsub) on ultra.