DARPA HPCS DISCRETE MATH BENCHMARKS

-------------------------------------------------------------------------------
bench0 has been replaced by the RandomAccess benchmark found in the
HPCchallenge suite.

Bob Lucas : 040706
-------------------------------------------------------------------------------

INTRODUCTION

There are six (6) benchmarks in the suite. All benchmarks have been coded
in Fortran using Fortran 77 (with MIL-SPEC Fortran extensions) or C. The
benchmarks have been written for 64-bit machines. The code provided
implements a reasonably efficient algorithm. Program documentation
includes the statement of the problem, description of the algorithm,
definition of variables, and comments preceding sections of code. We are
interested in all 6 benchmarks and strongly suggest that you experiment
with all 6 programs.

The code provided is not straight serial Fortran/C code. The code has been
written in a "parallel" manner, i.e., written with the understanding that
it would be run on a vector supercomputer or parallel processing machine.
However, this does not mean that these benchmarks can be run on a parallel
system without any modification to the code. Some vendors will need to
put in more effort than others, depending on the architecture and software
available on the target system. The goal in coding was to avoid, wherever
possible, having any particular architecture perform poorly on any of the
benchmarks. Generally, it will be easier to adapt the code to vector or
shared memory machines than to other architectures.

PROVIDED

The tar file provided contains the benchmark code and problem statement.
The following delineates the files and directories.

Documentation files:

    README  - this file in ASCII

SOFTWARE PROVIDED

1.  A directory for each benchmark N which includes:

        makefile
        mN.f  - main program
        sN.f  - data generation routine (where needed)
        pN.f  - production routine which actually does the benchmark
        cN.f  - check routine to verify the results
        rN.f  - output routine

    For certain benchmarks, there may be other specialized routines.
    Note that benchmark #11 uses *11.F files. These are a form of Fortran
    file which must be passed through a C-like preprocessor to generate
    the *.f files.

2.  A utility directory which includes:

        rand.f   - The random number generator to be used in generating
                   data for the benchmark (seed is provided in the code)
        util.f   - A utility file which includes software functions for
                   popcount (tally) and leadz (leading zeroes) to be used
                   when such intrinsics are not available on the target
                   machine. The file also provides the timing functions
                   which return the CPU time and wall clock time.
        vanio.f  - Vanilla Fortran version of I/O routines
        bufio.f  - Buffering routines for asynchronous I/O
        ssdio.f  - I/O routines to use the SSD via SDS.

DIRECTIVES

Everybody receiving the benchmarks is expected to follow the directives
outlined below:

1.  IEEE 64-bit floating point format must be used for the floating point
    problems.

2.  Benchmarks are to be run on existing machines. Benchmarking on a
    simulator will be considered on a case by case basis. Actual runtimes
    are to be provided. Information on how the problem will scale to a
    larger system and the expected performance results may also be
    provided.

3.  The parameters specified in the benchmark statements are the
    parameters the vendor must use to report back results. For those
    benchmarks with more than one set of parameters, each set is to be
    used and the results reported back to us. If none of the parameter
    sets for a given benchmark is suitable for the target machine, other
    options will be provided in the documentation of each problem.
    Reports that include the specified information help us assess the
    results accurately.

4.  The output routine rN.f for each benchmark N will need to be modified
    by the vendor to include the machine information requested. This
    subroutine will print out the benchmark results and information on
    the computer system that was used to execute the code. This
    information will include:

    Heading:  BENCHMARK # N - Short Descriptive Title
    Date:     Date benchmark was run
    Timings:  Any time statistics we request, e.g., wall clock time and
              CPU time
    Output:   The output specified in the problem statement. For some
              (e.g., problem 10), only a subset of the results may be
              required.

    Machine Information:

    A.  Hardware configuration:
        1.  Exact model and serial number of machine being tested
            (e.g., CRAY YMP 864)
        2.  Memory
            a.  number of bytes/word
            b.  cycle time
            c.  bandwidth
        3.  CPUs
            a.  number of CPUs and how many were used for benchmarking
                the problem
            b.  cache types and sizes
            c.  clock speed
        4.  Disk sizes, transfer rates, etc.
            a.  hard disk space available
            b.  soft disk space available (e.g., SSD)
        5.  Special features of benchmarked machine. Vendor will specify
            which features were used for the benchmark.
            a.  extra logical units
            b.  extra computational units
            c.  special FFT hardware
            d.  asymmetric CPUs

    B.  Software configuration
        1.  Operating system and its version and date
        2.  Versions of the compilers used, including any assemblers
            used for the optimized versions of the problems, and the
            dates of each compiler.

5.  On the first pass, the Fortran code should be run "as is"; the only
    modifications allowed are those that are necessary to get the code
    running on a particular machine. The makefile, the timing functions
    and other routines in the util.f file, and the I/O routines in the
    vanio.f, bufio.f and/or ssdio.f files will need to be modified for
    the target machine. Specifying a particular vector length and
    inserting message passing, synchronization and parallelization
    primitives will be permitted. It is understood that the more radical
    architectures may require more extensive modifications to the code
    than the conventional types. Any modifications or insertions must be
    carefully documented and explained. This information, along with a
    copy of the resulting code, must be provided.

6.  On the second pass, the high level language code may be refined for
    the particular machine. This refinement may vary from merely
    inserting compiler directives to actually re-coding in assembly; the
    vendor may determine the level of effort. All code must be made
    available to us. A record must be kept of all modifications and must
    include a percentage estimate of how much code was modified and an
    estimate of how much time was expended. Information about step-wise
    refinement, if appropriate, should also be included. Any
    optimizations used must be clearly explained. If a different
    algorithm is used, this algorithm must also be given to us.

    Note: Directives 5 and 6 above represent our major goals in this
    benchmarking effort:

    a)  to determine how well the target machine does on our problems
        with minimal work on our part (measures portability); and
    b)  to determine how well the target machine does on our problems
        when someone takes the time to optimize the problems (indication
        of what we can expect to get from the compiler and/or the
        hardware if we choose to invest time and effort in optimizing).

7.  For each benchmark, the data generated will be stored on that memory
    "level" (e.g., core, SSD, disk, etc.) which is large enough to hold
    it. This also applies to the output. The vendor will specify which
    memory level was used for the input and which level was used for the
    output.

8.  For each benchmark, the vendor should report the compiler, CPU and
    vectorization/parallelization options used for that problem. System
    libraries used should also be reported for each problem.

9.  For each benchmark, the vendor must include a discussion of the
    analysis of the problem, how the problem was implemented, and why
    the particular timings were obtained, i.e., what architectural
    characteristics or compiler features contributed to the good (or
    poor) timings. We encourage vendors to present their experiences
    working with the benchmark codes, and we are willing to meet with
    those who wish to do so. We strongly suggest providing data to allow
    us to better prepare for these meetings.

10. We reserve the right to verify code and timings on the vendor's
    system where possible.
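
ILLUSTRATION: SOFTWARE POPCOUNT AND LEADZ

The util.f entry under SOFTWARE PROVIDED describes software replacements
for popcount (tally) and leadz (leading zeroes) on 64-bit words. The C
sketch below shows one way such fallbacks might look. It is not the code
shipped in util.f. The names popcnt64, leadz64 and util_selfcheck are
hypothetical, and returning 64 for a zero word in leadz64 is an assumption
about the expected convention; check util.f and the target machine's
intrinsics before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Software popcount (tally): number of 1 bits in a 64-bit word.
 * Hypothetical name; not taken from util.f. */
int popcnt64(uint64_t x)
{
    int n = 0;
    while (x != 0) {
        x &= x - 1;          /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Software leadz: number of zero bits above the highest set bit.
 * Assumption: a zero word yields 64. */
int leadz64(uint64_t x)
{
    int n = 0;
    if (x == 0)
        return 64;
    while ((x & (UINT64_C(1) << 63)) == 0) {
        x <<= 1;             /* move the next bit into the top position */
        n++;
    }
    return n;
}

/* Small consistency check that can be called from a test driver. */
void util_selfcheck(void)
{
    assert(popcnt64(UINT64_C(0xF0)) == 4);
    assert(leadz64(UINT64_C(1)) == 63);
    assert(leadz64(0) == 64);
}
```

Both loops favor clarity over speed. Where the target machine or compiler
offers a hardware intrinsic, that intrinsic would normally be preferred,
and any such substitution would need to be documented as directive 5
requires.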