GrADS:   Experimental results from Sept 2001 ScaLAPACK runs

This page contains results gleaned from the Contract Monitor output for the ScaLAPACK runs Celso did in Sept 2001 using the updated ScaLAPACK model for predicting iteration duration.

The plots show

In addition, some notes about experiment parameters and measured durations are included. 



Experiment 1:   

N = 12000; NB=64; Processes=8;

machine opus16 opus13 opus14 opus15 torc1 torc7 torc4 torc6
mem(MB) 240 220 229 220 452 479 446 458
speed 270 270 270 270 330 330 330 330
load 0.88 0.97 1.00 0.86 0.35 1.12 0.38 1.35

Fine grid latency matrix :

1.00 0.22 0.39 0.22 81.52 81.52 81.52 81.52
0.22 -1.00 0.22 0.22 81.52 81.52 81.52 81.52
0.39 0.22 -1.00 0.22 81.52 81.52 81.52 81.52
0.22 0.22 0.22 -1.00 81.52 81.52 81.52 81.52
81.52 81.52 81.52 81.52 -1.00 27.53 0.32 1.22
81.52 81.52 81.52 81.52 27.53 -1.00 0.31 0.30
81.52 81.52 81.52 81.52 0.32 0.31 -1.00 0.31
81.52 81.52 81.52 81.52 1.22 0.30 0.31 -1.00


Fine grid Bandwidth matrix :

-1.00 249.54 244.88 245.22 4.39 4.39 4.39 4.39
249.54 -1.00 242.84 238.42 4.39 4.39 4.39 4.39
244.88 242.84 -1.00 239.51 4.39 4.39 4.39 4.39
245.22 238.42 239.51 -1.00 4.39 4.39 4.39 4.39
4.39 4.39 4.39 4.39 -1.00 83.04 82.14 60.12
4.39 4.39 4.39 4.39 83.04 -1.00 81.93 81.45
4.39 4.39 4.39 4.39 82.14 81.93 -1.00 81.23
4.39 4.39 4.39 4.39 60.12 81.45 81.23 -1.00

Data shown is for process 0.

In the following zoom view, the peaks occur at iterations 5, 13, 21, etc.

 


Experiment 2:  

This is the same problem size as Experiment 1, but here only 7 processors were selected for the run.

N = 12000; NB=64; Processes=7; 

machine opus14 opus13 opus16 opus15 torc4 torc6 torc7
mem(MB) 215 214 227 215 233 479 479
speed 270 270 270 270 330 330 330
load 1.00 0.99 1.00 0.99 1.00 1.04 0.87

Fine grid latency matrix :

-1.00 0.24 0.29 0.26 83.78 83.78 83.78
0.24 -1.00 0.24 0.23 83.78 83.78 83.78
0.29 0.24 -1.00 0.23 83.78 83.78 83.78
0.26 0.23 0.23 -1.00 83.78 83.78 83.78
83.78 83.78 83.78 83.78 -1.00 0.31 0.31
83.78 83.78 83.78 83.78 0.31 -1.00 0.31
83.78 83.78 83.78 83.78 0.31 0.31 -1.00
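
The block structure of the latency matrix above (small values within a site, one large value between sites) can be recovered programmatically. The following is a minimal sketch, not part of the original experiments: the function names and the 1.0 threshold are assumptions chosen for illustration, and the -1.00 diagonal is assumed to be a placeholder for "no measurement to self". The host names and matrix values are copied from the Experiment 2 tables.

```python
# Hosts and latency matrix copied from the Experiment 2 tables above.
HOSTS = ["opus14", "opus13", "opus16", "opus15", "torc4", "torc6", "torc7"]

LATENCY_TEXT = """
-1.00 0.24 0.29 0.26 83.78 83.78 83.78
0.24 -1.00 0.24 0.23 83.78 83.78 83.78
0.29 0.24 -1.00 0.23 83.78 83.78 83.78
0.26 0.23 0.23 -1.00 83.78 83.78 83.78
83.78 83.78 83.78 83.78 -1.00 0.31 0.31
83.78 83.78 83.78 83.78 0.31 -1.00 0.31
83.78 83.78 83.78 83.78 0.31 0.31 -1.00
"""


def parse_matrix(text):
    """Turn whitespace-separated rows into a list of lists of floats."""
    return [[float(v) for v in line.split()]
            for line in text.strip().splitlines()]


def is_symmetric(matrix):
    """Check that the matrix is square and equal to its transpose."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(n))


def group_by_latency(hosts, matrix, threshold):
    """Greedy grouping (hypothetical helper): a host joins the first group
    whose first member it reaches with latency below `threshold`;
    otherwise it starts a new group."""
    groups = []  # each group is a list of host indices
    for i in range(len(hosts)):
        for group in groups:
            if matrix[i][group[0]] < threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return [[hosts[i] for i in group] for group in groups]


latency = parse_matrix(LATENCY_TEXT)
groups = group_by_latency(HOSTS, latency, threshold=1.0)  # threshold is an assumption
for group in groups:
    print(group)
```

With the assumed threshold, the opus hosts and the torc hosts fall into two separate groups, which matches the two blocks visible in the matrix.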


Fine grid Bandwidth matrix :

-1.00 248.83 247.31 246.38 2.83 2.83 2.83
248.83 -1.00 244.54 240.94 2.83 2.83 2.83
247.31 244.54 -1.00 247.54 2.83 2.83 2.83
246.38 240.94 247.54 -1.00 2.83 2.83 2.83
2.83 2.83 2.83 2.83 -1.00 81.96 56.47
2.83 2.83 2.83 2.83 81.96 -1.00 50.90
2.83 2.83 2.83 2.83 56.47 50.90 -1.00

All data plotted is for process 0.

For Experiment 2, we also collected the raw sensor output for all processes, which shows each iteration's duration and the timestamp at which the measurement was made. That information is plotted here:

The following plot shows the ratio of the measured to the predicted values for 3 different "metrics of performance" which might be used as the basis for contract validation.
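
As a rough illustration of how a measured-to-predicted ratio could feed a contract check, the sketch below computes the ratio per iteration and flags iterations that fall outside a tolerance band. It is only a sketch under stated assumptions: the page does not name the three metrics, so a single per-iteration duration is used here, and the function names, the tolerance bounds, and the sample numbers are all hypothetical (they are not measurements from these runs).

```python
def ratios(measured, predicted):
    """Per-iteration ratio of measured to predicted values."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [m / p for m, p in zip(measured, predicted)]


def violations(ratio_list, lower, upper):
    """Indices of iterations whose ratio lies outside [lower, upper].
    The bounds are hypothetical tolerance limits, not values from the runs."""
    return [i for i, r in enumerate(ratio_list) if not (lower <= r <= upper)]


# Illustrative numbers only; NOT data from the experiments on this page.
measured = [10.0, 12.0, 30.0]
predicted = [10.0, 10.0, 10.0]

r = ratios(measured, predicted)
flagged = violations(r, lower=0.5, upper=2.0)
print(r, flagged)
```

A ratio near 1 means the model predicted the iteration well; which metric and which bounds are appropriate for contract validation is exactly what the plot is meant to help decide.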


Experiment 3:  

This is the same problem size as Experiments 1 and 2.  Here 8 systems at UIUC and UCSD were selected.

N = 12000; NB=64; Processes=8; 

machine opus16 opus14 opus13 opus15 dralion mystere quidam soleil
mem(MB) 225 212 214 213 215 210 224 183
speed 270 270 270 270 270 240 240 240
load 1.00 1.00 0.84 0.99 1.00 1.00 0.64 0.71

Fine grid latency matrix :

-1.00 0.25 0.25 0.30 134.94 134.94 134.94 134.94
0.25 -1.00 0.56 0.35 134.94 134.94 134.94 134.94
0.25 0.56 -1.00 0.23 134.94 134.94 134.94 134.94
0.30 0.35 0.23 -1.00 134.94 134.94 134.94 134.94
134.94 134.94 134.94 134.94 -1.00 0.23 31.41 0.38
134.94 134.94 134.94 134.94 0.23 -1.00 0.24 0.36
134.94 134.94 134.94 134.94 31.41 0.24 -1.00 0.23
134.94 134.94 134.94 134.94 0.38 0.36 0.23 -1.00

Fine grid Bandwidth matrix :

-1.00 253.16 251.58 244.65 5.89 5.89 5.89 5.89
253.16 -1.00 246.61 246.96 5.89 5.89 5.89 5.89
251.58 246.61 -1.00 239.40 5.89 5.89 5.89 5.89
244.65 246.96 239.40 -1.00 5.89 5.89 5.89 5.89
5.89 5.89 5.89 5.89 -1.00 70.90 56.22 38.10
5.89 5.89 5.89 5.89 70.90 -1.00 74.06 83.91
5.89 5.89 5.89 5.89 56.22 74.06 -1.00 71.08
5.89 5.89 5.89 5.89 38.10 83.91 71.08 -1.00

All data plotted is for process 0:

For Experiment 3, we also collected the raw sensor output for all processes, which shows each iteration's duration and the timestamp at which the measurement was made. That information is plotted here:


Experiment 4:  

This is the same problem size as Experiments 1, 2, and 3.   Here 7 systems at UIUC and UTK were selected. There was an extremely high network load on the UTK systems during this run.  An additional computational load was introduced on Processor X about 70 iterations into the run.

N = 12000; NB=64; Processes=7; 

machine opus15 opus14 opus16 torc4 torc6 torc5 torc7
mem(MB) 225 225 226 486 486 486 487
speed 270 270 270 330 330 330 330
load 1.00 1.00 1.00 1.00 1.56 1.17 1.11

Fine grid latency matrix :

-1.00 0.29 0.29 193.71 193.71 193.71 193.71
0.29 -1.00 0.22 193.71 193.71 193.71 193.71
0.29 0.22 -1.00 193.71 193.71 193.71 193.71
193.71 193.71 193.71 -1.00 0.32 0.31 0.28
193.71 193.71 193.71 0.32 -1.00 0.49 0.30
193.71 193.71 193.71 0.31 0.49 -1.00 0.31
193.71 193.71 193.71 0.28 0.30 0.31 -1.00

Fine grid Bandwidth matrix :

-1.00 258.52 242.39 0.73 0.73 0.73 0.73
258.52 -1.00 252.43 0.73 0.73 0.73 0.73
242.39 252.43 -1.00 0.73 0.73 0.73 0.73
0.73 0.73 0.73 -1.00 82.01 72.05 49.08
0.73 0.73 0.73 82.01 -1.00 58.88 51.89
0.73 0.73 0.73 72.05 58.88 -1.00 57.30
0.73 0.73 0.73 49.08 51.89 57.30 -1.00

All data plotted is for process 0:

For Experiment 4, we also collected the raw sensor output for all processes, which shows each iteration's duration and the timestamp at which the measurement was made. That information is plotted here:

The following plot shows the ratio of the measured to the predicted values for 3 different "metrics of performance" which might be used as the basis for contract validation.


This material is based upon work supported by the National Science Foundation under Grant No. 9975020.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Department of Computer Science
University of Illinois at Urbana-Champaign

webmaster@renci.org

Last modified: Tuesday, November 20, 2001 01:08 PM