DARPA HPCS
Discrete Mathematics Benchmarks
|
|
|
|
DARPA HPCS Discrete Math Benchmarks
|
|
|
Overview
There are six (6) benchmarks in the DARPA HPCS Discrete Mathematics suite
for high productivity computing.
All have been coded in Fortran 77 with MIL-SPEC Fortran extensions or in C.
The benchmarks have been written for 64-bit machines.
In addition, some of the benchmarks are available in MPI, Shmem, and UPC.
The code provided implements a reasonably efficient algorithm.
Program documentation includes the statement of the problem, description of
the algorithm, definition of variables, and comments preceding sections of code.
DARPA is interested in all six benchmarks and therefore strongly encourages
experimentation with all of them.
Some of these benchmarks have specified sizes for particular data structures.
Random Access scales to fill the memory of whatever system it runs on.
One can expect that the scale of the other HPCS Discrete Math benchmarks
will likely change as a function of time.
Some
rules
for these benchmarks exist.
This information has been transcribed from information provided by Bob Lucas
of the Information Sciences Institute of the University of Southern California.
We hope that the transcriptions have been accurate;
any transcription errors are ours and not theirs.
Further information on HPC benchmarks, including some results
on Benchmark 0 (Random Access) can be found at an
HPC Challenge
website at the University of Tennessee.
|
|
0. Random Access
Let
T
be a table of size
2^n
and let
S
be a table of size
2^m
filled with random 64-bit integers. Let
{A_i}
be a stream of
64-bit integers of length
2^{n+2}
generated by the primitive polynomial over
GF(2),
X_{63}+1.
For each
a_i,
set
T[a_i <63, 64-n>] = T[a_i <63,64-n>] + S[a_i < m-1,0 >].
where "+" denotes addition in
GF(2),
i.e. exclusive "or", and
a_i
denotes a sequence of bits within
a_i
such that
are the lowest
m
bits.
Parameters;
-
Part a:
-
m = 9
-
n = 8 , 9, 10, ...,
maximum usable integer
-
Part b:
-
m
such that
2^m
is half the size of cache
-
n
such that
2^n
is half of memory
Some
further information
is also available, as well as a
description
of some sample performance.
|
|
1. Multiple Precision Arithmetic
Given two
N x N
matrices with entries in the ring of integers modulo some number
M, compute the product of the matrices.
Parameters:
N = 225,
M = 7^{183} - 6
|
|
2. Dynamic Program
Given:
-
A_1, A_2 , ..., A_K, N x N floating point matrices
(Assume all values in the A matrices are nonnegative)
-
D, a vector of length T, containing 8-bit integers where
1 <= D(t) <= K for t = 1, 2,...,T
Calculate:
-
Y_0, Y_1, ... Y_T,
64-bit floating point vectors of length N
-
P_1, P_2, ... P_T,
32-bit integer vectors of length N
-
B, a T+1-long 32-bit integer vector where
1 <= B(t) <= N for t = 0, 1, 2, ..., T
according to the following algorithm.
-
Y_0 (i) = 1 for i = 1, 2, ..., N
-
For t = 1, 2, ..., T
-
k = D(t)
-
For j = 1, 2, ..., N
-
Y_t (j) = max_i { Y_{t-1} (i) * A_k (i, j) }
-
P_t (j) = any i which maximized Y_t (j)
-
Next j
-
Next t
-
Let Z = max_i { Y_T } (i) and let I be any
i which maximized Z.
-
Set B(T) = I
-
For t =T - 1, T - 2, ..., 1, 0
-
Set B(t) = P_{t+1} (B(t+1))
-
Next t
-
Output Z and B(t) for t = 0, 1, 2, ..., T.
Parameters:
-
Dense case
-
N = 10, K = 30, T = 100,000
-
A_i ( i= 1,2 ,..., K) is dense.
-
Sparse case
-
N = 10^6, K = 2, T = 2000
-
A_i (i= 1,2, ..., K) has at most 4 non-zero elements per
column and at most 4 non-zero elements per row.
|
|
3. Data Transposition
Let {A_i} be a stream of n bit integers of length L.
Consider each successive block of n integers as an n x n
matrix of bits.
For each such matrix, transpose the bits such that bit b_{ij} is
interchanged with bit b_{ji}.
The resulting stream {A'_i} will have the property that:
-
A'_1 ... A'_{n+1} A'_{2n+1}
etc., will hold sequentially the high order bits of
{A_i}
-
A'_2 ... A'_{n+2} A'_{2n+2},
etc., will hold sequentially the next-to-high order bits of
{A_i},
-
and so forth.
Output the stream
{A'_i}.
Parameters:
-
n = 32, L = 10^7, ITER = 400
-
n = 64, L = 10^7, ITER = 230
-
n = 1024, L = 10^7, ITER = 12
|
|
4. Integer Sort
Given
{ A_i }, a stream of unsigned
n-bit integers of length
N,
generate the stream
{ B_i }
which is
{ A_i }
sorted in ascending order.
In-place sorting is not required and there may be non-unique
integers in the stream
Output selected values (as specified in the program code) from the stream
{ B_i }.
Parameters:
-
Part a: In-core sort
-
n = 64, N = 10^6, ITER = 300
-
n = 128, N = 5*10^7, ITER = 1
-
Part b: Secondary memory sort
This is a disk-based sort that uses an in-place radix sort for in-core
sorting and a 2-way merge sort for the disk sorting.
BASIC ALGORITHM:
Given a file of random number of length
RUNSIZE*NUMRUNS, break it into
NUMRUNS
runs of length
RUNSIZE.
Sort each run independently and write it to disk.
The runs are merge sorted together by making several passes through
the data, creating successively larger runs until the whole file is sorted.
MEMORY REQUIREMENTS:
For maximum efficiency set
RUNSIZE
as large as possible.
It will be approximately
1/4
the system memory.
There are two input buffer arrays of size
RUNSIZE
and one output
buffer of size
(2*RUNSIZE).
There is one additional array used by the radix sort of size
(64*65536).
DISK REQUIREMENTS:
The disk requirements are quite large.
There is a file of random numbers
RUNSIZE*NUMRUNS
in size.
There are also four files on disk that simulate tapes (T1, T2, T3, T4)
and total
2*RUNSIZE*NUMRUNS
in size.
These five files can be placed in any directory by identifying said
directories on the command-line.
|
|
5. Boolean Equation Solving
Let {s_i} be a stream of binary digits of length L.
Search for all occurrences of the pattern P where certain bits in the
pattern may be either 1 or 0, and let j be
the position in the stream immediately after an occurrence of P.
Example:
Assume P = 000???100??110?10?1111,
where `?' can be either 1 or 0.
Then in the stream we might have an alignment
011010001110101000001011001111001001111101...
000???100??110?10?1111
<- - - - - P - - - - >j
Starting at position j, form 700 equations
in 700 unknowns over
GF(2) as follows:
s_{j}x_1 + s_{j+1}x_2 + ... + s_{j+699}x_{700} = s_{j+700} (1)
s_{j+701}x_1 + s_{j+702}x_2 + ... + s_{j+1400}x_{700} = s_{j+1401}(2)
s_{j+1402}x_1 + s_{j+1403}x_2 + ... + s_{j+2101}x_{700} = s_{j+2102}(3)
. .
. .
. .
s_{j+489999}x_1 + s_{j+490000}x_2 + ... + s_{j+490698}x_{700} = s_{j+490699} (700)
For each occurrence of the pattern P in {s_i},
form the above equations
and determine whether the equations are dependent or independent.
If the equations are independent, then find the unique solution.
Gaussian elimination over GF(2) is the suggested method.
Output for all occurrences of P the list (j, flag, solution)
where:
-
j = starting position of the equations
-
flag = { 1, if the equations are independent; 0, otherwise}
-
solution = { unique Gaussian elimination solution (if it exists);
0, otherwise}
Parameter:
L = 10^7
|
|
|
|
|