Matrix times Matrix
The tables below give the real (wall-clock) time for matrix-matrix
multiplication using different numbers of threads. The speedup shown next to
each multi-threaded time is the ratio between the time on one processor and the
time on n processors, expressed as a percentage of the ideal value n (also known
as the scaling factor). The speedup for one processor (in italic) is the ratio
between the time on the E6000 and the time on the respective system.
An overview of the single-CPU results is given at the end of this page.
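As a worked illustration of how the two kinds of ratio relate to the measured times, the following Python sketch recomputes them from two level-6 entries of the tables on this page. The function names are illustrative and not part of the original benchmark, and the percentage form of the scaling factor is an assumption inferred from the tabulated values.

```python
def scaling_percent(t1, tn, n):
    """Speedup t1/tn relative to the ideal speedup n, in percent."""
    return 100.0 * t1 / (n * tn)


def single_cpu_speedup(t_reference, t_system):
    """Ratio of the reference (E6000) time to another system's time,
    both measured on one processor."""
    return t_reference / t_system


# Level 6 on the Sun E6000: 126.5 with 1 thread, 32.0 with 4 threads.
print(round(scaling_percent(126.5, 32.0, 4), 1))     # 98.8

# Level 6, one thread: Sun E6000 (126.5) versus Sun SunFire 6800 (39.3).
print(round(single_cpu_speedup(126.5, 39.3), 2))     # 3.22
```

Both results match the corresponding table entries. Not every tabulated value reproduces this way, so the sketch is a consistency check rather than a description of how the page was generated.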
Sun E6000
| level | 1 thread: time | speedup | 4 threads: time | speedup | 8 threads: time | speedup | 12 threads: time | speedup | 16 threads: time | speedup | memory |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 126.5 | 1.00 | 32.0 | 98.8 | 16.4 | 96.4 | 11.5 | 91.7 | 9.6 | 82.4 | 138 |
| 7 | 830.6 | 1.00 | 216.0 | 96.1 | 119.3 | 87.0 | 75.2 | 92.0 | 57.6 | 90.1 | 444 |
| 8 | 5134.8 | 1.00 | 1285.9 | 99.8 | 643.9 | 99.7 | 443.1 | 96.6 | 346.8 | 92.5 | 1926 |
| 9 | 27885.9 | 1.00 | 7088.6 | 98.3 | 3586.9 | 97.2 | 2474.7 | 93.9 | 1937.3 | 90.0 | 8874 |
| 10 | 160680.9 | 1.00 | - | - | - | - | - | - | - | - | 40900 |
Sun SunFire 6800
| level | 1 thread: time | speedup | 4 threads: time | speedup | 8 threads: time | speedup | 12 threads: time | speedup | 16 threads: time | speedup | memory |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 39.3 | 3.22 | 10.4 | 99.0 | 5.1 | 98.1 | 3.6 | 91.9 | 2.8 | 82.3 | 138 |
| 7 | 256.0 | 3.24 | 65.8 | 97.3 | 34.5 | 92.8 | 23.0 | 92.8 | 17.4 | 92.0 | 444 |
| 8 | 1508.5 | 3.40 | 387.3 | 97.4 | 196.7 | 95.9 | 135.5 | 92.8 | 100.8 | 93.5 | 1926 |
| 9 | 8336.3 | 3.35 | 2180.9 | 95.6 | 1114.0 | 93.5 | 757.2 | 91.7 | 566.3 | 92.0 | 8874 |
| 10 | 45192.6 | 3.27 | 11762.7 | 96.1 | 5958.5 | 94.8 | 4146.5 | 90.8 | 3118.0 | 90.6 | 40040 |
Sun SunFire 15k
| level | 1 thread: time | speedup | 4 threads: time | speedup | 8 threads: time | speedup | 12 threads: time | speedup | 16 threads: time | speedup | memory |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 39.3 | 3.22 | 10.2 | 99.0 | 5.3 | 98.1 | 3.9 | 91.9 | 3.6 | 82.3 | 138 |
| 7 | 256.2 | 3.24 | 67.8 | 94.5 | 34.6 | 92.6 | 25.2 | 84.7 | 24.1 | 66.4 | 444 |
| 8 | 1544.7 | 3.32 | 395.4 | 97.7 | 206.3 | 93.6 | 148.3 | 86.8 | 127.3 | 75.8 | 1926 |
| 9 | 8376.9 | 3.33 | 2237.2 | 93.6 | 1192.5 | 87.8 | 824.8 | 84.6 | 758.7 | 69.0 | 8874 |
| 10 | 50474.7 | 2.93 | 12187.5 | 103.5 | 6185.0 | 102.0 | 4762.4 | 88.3 | 3887.3 | 81.2 | 40040 |
HP 9000 Superdome
| level | 1 thread: time | speedup | 4 threads: time | speedup | 8 threads: time | speedup | 12 threads: time | speedup | 16 threads: time | speedup | memory |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 29.3 | 4.32 | 7.4 | 99.0 | 3.7 | 99.0 | 2.6 | 93.9 | 2.0 | 91.6 | 137 |
| 7 | 193.5 | 4.29 | 48.8 | 99.1 | 24.6 | 98.3 | 16.8 | 96.0 | 12.9 | 93.8 | 444 |
| 8 | 1147.5 | 4.47 | 288.6 | 99.4 | 147.3 | 97.4 | 99.3 | 96.3 | 74.1 | 96.8 | 1925 |
| 9 | 6518.5 | 4.28 | 1637.9 | 99.5 | 839.2 | 97.1 | 577.7 | 94.0 | 426.2 | 95.6 | 8883 |
| 10 | 38540.6 | 4.17 | 9732.8 | 99.0 | 5030.8 | 95.8 | 3470.9 | 92.5 | 2575.0 | 93.5 | 40052 |
IBM eServer p690
| level | 1 thread: time | speedup | 4 threads: time | speedup | 8 threads: time | speedup | 12 threads: time | speedup | 16 threads: time | speedup | memory |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 22.9 | 5.52 | 6.3 | 90.9 | 4.0 | 71.6 | - | - | - | - | - |
| 7 | 149.1 | 5.57 | 41.2 | 90.5 | 25.7 | 72.5 | - | - | - | - | - |
| 8 | 842.3 | 6.10 | 255.8 | 82.3 | 135.9 | 77.5 | - | - | - | - | - |
| 9 | 4738.6 | 5.88 | 1302.1 | 91.0 | 735.9 | 80.5 | - | - | - | - | - |
| 10 | 25586.8 | 6.28 | 7257.3 | 88.1 | 4245.4 | 75.3 | 3425.7 | 62.2 | 3489.7 | 45.8 | 40052 |
AMD Athlon MP 2000
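| level | 1 thread: time | speedup | 4 threads: time | speedup | 8 threads: time | speedup | 12 threads: time | speedup | 16 threads: time | speedup | memory |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 20.4 | 6.19 | - | - | - | - | - | - | - | - | 74 |
| 7 | 149.7 | 5.55 | - | - | - | - | - | - | - | - | 337 |
| 8 | 998.0 | 5.14 | - | - | - | - | - | - | - | - | 1610 |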
Dell Xeon 2400
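| level | 1 thread: time | speedup | 4 threads: time | speedup | 8 threads: time | speedup | 12 threads: time | speedup | 16 threads: time | speedup | memory |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 20.2 | 6.26 | - | - | - | - | - | - | - | - | 74 |
| 7 | 147.4 | 5.63 | - | - | - | - | - | - | - | - | 337 |
| 8 | 860.8 | 5.97 | - | - | - | - | - | - | - | - | 1610 |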
In the following table the speedup of the individual systems for the
matrix-matrix-multiplication on one processor is presented in a
more compact form. The best on each level is printed in red colour.
| level | SF6800 | SF15k | HP9000 | P690 | Athlon |
|---|---|---|---|---|---|
| 6 | 3.22 | 3.22 | 4.32 | 5.52 | 6.19 |
| 7 | 3.24 | 3.24 | 4.29 | 5.57 | 5.55 |
| 8 | 3.40 | 3.32 | 4.47 | 6.10 | 5.14 |
| 9 | 3.35 | 3.33 | 4.28 | 5.88 | - |
| 10 | 3.27 | 2.93 | 4.17 | 6.28 | - |
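
Since the colour highlighting may not survive in every rendering of this page, the best system per level can also be recovered from the numbers themselves. The following Python sketch does this for the five systems in the compact table only. The dictionary layout and the helper name `best_system` are illustrative assumptions, not part of the original benchmark.

```python
# Single-processor speedups relative to the E6000, copied from the compact table.
# None marks a level for which no measurement is listed.
speedup = {
    6:  {"SF6800": 3.22, "SF15k": 3.22, "HP9000": 4.32, "P690": 5.52, "Athlon": 6.19},
    7:  {"SF6800": 3.24, "SF15k": 3.24, "HP9000": 4.29, "P690": 5.57, "Athlon": 5.55},
    8:  {"SF6800": 3.40, "SF15k": 3.32, "HP9000": 4.47, "P690": 6.10, "Athlon": 5.14},
    9:  {"SF6800": 3.35, "SF15k": 3.33, "HP9000": 4.28, "P690": 5.88, "Athlon": None},
    10: {"SF6800": 3.27, "SF15k": 2.93, "HP9000": 4.17, "P690": 6.28, "Athlon": None},
}


def best_system(row):
    """Return the name of the system with the largest measured speedup."""
    measured = {name: s for name, s in row.items() if s is not None}
    return max(measured, key=measured.get)


best = {level: best_system(row) for level, row in speedup.items()}
for level, name in best.items():
    print(level, name, speedup[level][name])
```

With these values the Athlon leads on level 6 and the P690 on levels 7 to 10.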