Matrix times Matrix

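The tables below give the real (wall-clock) time for matrix-matrix multiplication with different numbers of threads. The speedup shown is based on the ratio between the time on one processor and the time on n processors (also known as the scaling factor); the tabulated values for n > 1 are consistent with this ratio divided by n and expressed in percent, e.g. 126.5 / (4 × 32.0) ≈ 98.8 % for the E6000 on level 6. The speedup for one processor (in italics) is the ratio between the time on the E6000 and the time on the respective system.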
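As a minimal sketch, the tabulated ratios can be reproduced from the raw times as follows. The function names are illustrative and not part of the original benchmark; the timings are the level 6 values from the E6000 and SunFire 6800 tables. Reading the multi-thread values as a percentage is an inference from the numbers, not something the page states.

```python
def speedup(t_one: float, t_n: float) -> float:
    """Ratio of the one-processor time to the n-processor time."""
    return t_one / t_n


def efficiency_percent(t_one: float, t_n: float, n: int) -> float:
    """Speedup divided by the thread count, in percent (assumed reading)."""
    return 100.0 * speedup(t_one, t_n) / n


def relative_to_reference(t_reference: float, t_system: float) -> float:
    """One-processor ratio between the reference system and another system."""
    return t_reference / t_system


# Level 6 timings copied from the tables on this page.
e6000_1_thread, e6000_4_threads = 126.5, 32.0
sf6800_1_thread = 39.3

print(round(speedup(e6000_1_thread, e6000_4_threads), 2))                # 3.95
print(round(efficiency_percent(e6000_1_thread, e6000_4_threads, 4), 1))  # 98.8
print(round(relative_to_reference(e6000_1_thread, sf6800_1_thread), 2))  # 3.22
```

The last two results match the entries 98.8 (E6000, 4 threads) and 3.22 (SunFire 6800, 1 thread) in the level 6 rows.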

An overview of the single-CPU results is given at the end of this page.

Sun E6000

level         1 thread        4 threads        8 threads       12 threads       16 threads  memory
          time speedup     time speedup     time speedup     time speedup     time speedup
6        126.5    1.00     32.0    98.8     16.4    96.4     11.5    91.7      9.6    82.4     138
7        830.6    1.00    216.0    96.1    119.3    87.0     75.2    92.0     57.6    90.1     444
8       5134.8    1.00   1285.9    99.8    643.9    99.7    443.1    96.6    346.8    92.5    1926
9      27885.9    1.00   7088.6    98.3   3586.9    97.2   2474.7    93.9   1937.3    90.0    8874
10    160680.9    1.00        -       -        -       -        -       -        -       -   40900

Sun SunFire 6800

level         1 thread        4 threads        8 threads       12 threads       16 threads  memory
          time speedup     time speedup     time speedup     time speedup     time speedup
6         39.3    3.22     10.4    99.0      5.1    98.1      3.6    91.9      2.8    82.3     138
7        256.0    3.24     65.8    97.3     34.5    92.8     23.0    92.8     17.4    92.0     444
8       1508.5    3.40    387.3    97.4    196.7    95.9    135.5    92.8    100.8    93.5    1926
9       8336.3    3.35   2180.9    95.6   1114.0    93.5    757.2    91.7    566.3    92.0    8874
10     45192.6    3.27  11762.7    96.1   5958.5    94.8   4146.5    90.8   3118.0    90.6   40040

Sun SunFire 15k

level         1 thread        4 threads        8 threads       12 threads       16 threads  memory
          time speedup     time speedup     time speedup     time speedup     time speedup
6         39.3    3.22     10.2    99.0      5.3    98.1      3.9    91.9      3.6    82.3     138
7        256.2    3.24     67.8    94.5     34.6    92.6     25.2    84.7     24.1    66.4     444
8       1544.7    3.32    395.4    97.7    206.3    93.6    148.3    86.8    127.3    75.8    1926
9       8376.9    3.33   2237.2    93.6   1192.5    87.8    824.8    84.6    758.7    69.0    8874
10     50474.7    2.93  12187.5   103.5   6185.0   102.0   4762.4    88.3   3887.3    81.2   40040

HP 9000 Superdome

level         1 thread        4 threads        8 threads       12 threads       16 threads  memory
          time speedup     time speedup     time speedup     time speedup     time speedup
6         29.3    4.32      7.4    99.0      3.7    99.0      2.6    93.9      2.0    91.6     137
7        193.5    4.29     48.8    99.1     24.6    98.3     16.8    96.0     12.9    93.8     444
8       1147.5    4.47    288.6    99.4    147.3    97.4     99.3    96.3     74.1    96.8    1925
9       6518.5    4.28   1637.9    99.5    839.2    97.1    577.7    94.0    426.2    95.6    8883
10     38540.6    4.17   9732.8    99.0   5030.8    95.8   3470.9    92.5   2575.0    93.5   40052

IBM eServer p690

level         1 thread        4 threads        8 threads       12 threads       16 threads  memory
          time speedup     time speedup     time speedup     time speedup     time speedup
6         22.9    5.52      6.3    90.9      4.0    71.6        -       -        -       -       -
7        149.1    5.57     41.2    90.5     25.7    72.5        -       -        -       -       -
8        842.3    6.10    255.8    82.3    135.9    77.5        -       -        -       -       -
9       4738.6    5.88   1302.1    91.0    735.9    80.5        -       -        -       -       -
10     25586.8    6.28   7257.3    88.1   4245.4    75.3   3425.7    62.2   3489.7    45.8   40052

AMD Athlon MP 2000

level         1 thread        4 threads        8 threads       12 threads       16 threads  memory
          time speedup     time speedup     time speedup     time speedup     time speedup
6         20.4    6.19        -       -        -       -        -       -        -       -      74
7        149.7    5.55        -       -        -       -        -       -        -       -     337
8        998.0    5.14        -       -        -       -        -       -        -       -    1610

Dell Xeon 2400

level         1 thread        4 threads        8 threads       12 threads       16 threads  memory
          time speedup     time speedup     time speedup     time speedup     time speedup
6         20.2    6.26        -       -        -       -        -       -        -       -      74
7        147.4    5.63        -       -        -       -        -       -        -       -     337
8        860.8    5.97        -       -        -       -        -       -        -       -    1610

Overview

The following table presents the single-processor speedup of the individual systems for matrix-matrix multiplication, relative to the E6000, in a more compact form. The best value on each level is printed in red.

level  SF6800   SF15k  HP9000    P690  Athlon
6        3.22    3.22    4.32    5.52    6.19
7        3.24    3.24    4.29    5.57    5.55
8        3.40    3.32    4.47    6.10    5.14
9        3.35    3.33    4.28    5.88       -
10       3.27    2.93    4.17    6.28       -
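
Since colour highlighting is not always preserved, the best system per level can also be read off programmatically. The following is a small sketch under the assumption that "best" means the largest single-processor speedup; the dictionary layout and the function name are illustrative, and only the systems listed in the overview table are considered.

```python
# Single-processor speedups relative to the E6000, copied from the overview table.
# None marks a level for which no measurement is listed.
overview = {
    6:  {"SF6800": 3.22, "SF15k": 3.22, "HP9000": 4.32, "P690": 5.52, "Athlon": 6.19},
    7:  {"SF6800": 3.24, "SF15k": 3.24, "HP9000": 4.29, "P690": 5.57, "Athlon": 5.55},
    8:  {"SF6800": 3.40, "SF15k": 3.32, "HP9000": 4.47, "P690": 6.10, "Athlon": 5.14},
    9:  {"SF6800": 3.35, "SF15k": 3.33, "HP9000": 4.28, "P690": 5.88, "Athlon": None},
    10: {"SF6800": 3.27, "SF15k": 2.93, "HP9000": 4.17, "P690": 6.28, "Athlon": None},
}


def best_per_level(table):
    """Return, for each level, the system with the largest listed speedup."""
    best = {}
    for level, row in table.items():
        # Skip systems without a measurement on this level.
        measured = {name: value for name, value in row.items() if value is not None}
        best[level] = max(measured, key=measured.get)
    return best


for level, system in best_per_level(overview).items():
    print(level, system, overview[level][system])
```

With the values above, the Athlon leads on level 6 and the p690 on levels 7 to 10.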