N-body methods are used in the numerical simulation of systems ranging from the atomic to the cosmological. In addition, the mathematical techniques developed in conjunction with the N-body problem have found application in areas as diverse as electromagnetic scattering and stochastic process generation. The papers collected in this mini-symposium [4] and its predecessor [1] offer ample evidence of the breadth and importance of N-body methods.
A family of methods, collectively called ``treecodes'', use tree data structures to reduce the time required to approximately evaluate a set of interactions of the form:
\begin{equation}
\Phi(x_i) = \sum_{j \neq i} F(x_i, x_j), \qquad i = 1, \ldots, N,
\end{equation}
where the pairwise interaction F and the coordinates x_i and x_j are generalizations of the familiar force laws from Newtonian mechanics. Directly evaluating (1) for N right-hand sides, with N terms in each summation, requires O(N^2) operations. Treecodes typically produce approximate results in O(N) or O(N log N) operations, depending on the particular algorithm, and on exactly what is being held constant as N increases. Storage is usually proportional to N.
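For reference, the following is a minimal sketch of the direct evaluation of (1) that treecodes are designed to avoid. The softened gravitational potential stands in for the generic interaction F; the function name, data layout, and softening parameter eps are illustrative assumptions, not part of the implementation described in this paper.
\begin{verbatim}
#include <math.h>

/* Direct O(N^2) evaluation of the pairwise sum (1): each of the N
 * left-hand sides costs N-1 kernel evaluations.  The softened
 * gravitational potential plays the role of the interaction F. */
void direct_sum(int n, const double x[][3], const double m[],
                double eps, double phi[])
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double dx = x[i][0] - x[j][0];
            double dy = x[i][1] - x[j][1];
            double dz = x[i][2] - x[j][2];
            sum -= m[j] / sqrt(dx*dx + dy*dy + dz*dz + eps*eps);
        }
        phi[i] = sum;
    }
}
\end{verbatim}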
Very roughly speaking, typical astrophysics applications require about 80N bytes of storage and 30000N floating point operations per integration timestep. A ten-million-body simulation is nearly state-of-the-art, and few (if any) simulations have been done with more than 50 million bodies. The problem is not a lack of CPU cycles: a 200 MHz Pentium Pro processor can deliver the required cycles in a couple of months, and a small cluster can reduce the time to a few days. The problem is that memory is too expensive, so that systems with 1-10 GB of DRAM are still quite rare, even though readily available CPU technology would allow important work to be done on such a system.
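As a rough consistency check on these figures (the sustained flop rate assumed for the processor is illustrative, not measured), a ten-million-body run requires
\begin{displaymath}
80 \times 10^{7}\ \mathrm{bytes} \approx 0.8\ \mathrm{GB}
\qquad\mbox{and}\qquad
3\times 10^{4} \times 10^{7} = 3\times 10^{11}\ \mathrm{flops\ per\ timestep},
\end{displaymath}
so at an assumed sustained rate of 10^8 flops/s a single timestep takes roughly 3000 seconds, and a simulation of a few thousand timesteps occupies one processor for a few months, in line with the estimate above.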
As of October 1996, commodity, mass-market memory prices were about $6/MB, while disk prices were $0.10/MB, almost two orders of magnitude lower. On the other hand, obtainable data rates from disk are in the range of a few MB/s, approximately two orders of magnitude less than from memory. Even worse, the latency for a typical disk access is five orders of magnitude greater than that for a memory reference (10 million ns vs. 60 ns). Using disk as dynamic storage will be a challenge, but one that offers the opportunity to greatly reduce the hardware investment required for very large N-body simulation.
We have implemented out-of-core adaptive oct-trees because of our extensive prior experience with their numerical and computational behavior [6]. Our traversal differs substantially from [2, 3], allowing for a flexible criterion that decides whether a multipole is acceptable based on an error estimate that accounts for both the geometry and the contents of the cell. Groups of bodies are tested for acceptability all at once, and if the multipole is unacceptable, we dynamically decide whether to shrink the group or visit the daughters of the cell. Thus we are able to capture the essential features of O(N), O(N log N), O(N^2), and ``vectorizing'' algorithms all within the same implementation.
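In outline, the traversal can be pictured as the following recursive sketch. The type and helper names (Group, Cell, mac_accepts, split_group, and so on) are hypothetical placeholders, and the size comparison used to choose between shrinking the group and opening the cell is shown only as one plausible heuristic.
\begin{verbatim}
/* Sketch of a group-based tree traversal.  A group of sink bodies is
 * tested against a source cell's multipole all at once; on failure,
 * either the group is shrunk or the cell is opened. */
typedef struct Cell Cell;
struct Cell {
    double size;             /* edge length of the cell            */
    int ndaughters;
    Cell *daughter[8];       /* oct-tree: up to eight daughters    */
    /* ... multipole moments, body list, etc. ...                  */
};

typedef struct {
    double size;             /* spatial extent of the group        */
    int nbodies;
    /* ... member bodies ...                                       */
} Group;

/* Hypothetical helpers assumed to be defined elsewhere. */
int  mac_accepts(const Group *g, const Cell *c);  /* error-based MAC  */
void apply_multipole(Group *g, const Cell *c);    /* far-field update */
void split_group(const Group *g, Group out[2]);   /* halve the group  */
void direct_interact(Group *g, const Cell *c);    /* leaf fallback    */
int  is_leaf(const Cell *c);

void interact(Group *g, const Cell *c)
{
    if (mac_accepts(g, c)) {
        apply_multipole(g, c);           /* far field for whole group */
    } else if (is_leaf(c) && g->nbodies == 1) {
        direct_interact(g, c);           /* nothing left to subdivide */
    } else if (g->nbodies > 1 && (is_leaf(c) || g->size > c->size)) {
        Group half[2];
        split_group(g, half);            /* group is the larger (or   */
        interact(&half[0], c);           /* only) divisible object:   */
        interact(&half[1], c);           /* shrink it and retry       */
    } else {
        for (int i = 0; i < c->ndaughters; i++)
            interact(g, c->daughter[i]); /* otherwise open the cell   */
    }
}
\end{verbatim}
In this sketch, a criterion that never accepts drives the recursion all the way down to direct body-body interactions, which is how the O(N^2) limit coexists with the asymptotically faster regimes in a single framework.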