This document describes the Java Grande Forum and includes its initial deliverables. These are reports that convey a succinct set of recommendations from this forum to Sun Microsystems and other purveyors of Java technology that will enable Grande Applications to be developed.
The notion of a Grande Application (GA) is familiar to many researchers in academia and industry, but the term is new. In short, a GA is any application, scientific or industrial, that requires a large quantity of computing resources, such as those found on the Internet, to solve one or more problems. Examples of Grande Applications are presented in this report, as well as a discussion of why we believe Java technology has the greatest potential to support the development of Grande Applications.
The forum is motivated by the notion that Java could be the best possible Grande application development environment, and that the extensive use of Java could greatly help the large-scale computing and communication fields. However, this opportunity can only be realized if important changes are made to Java's libraries, language, and perhaps Virtual Machine.
The major goal of the forum is to clearly articulate the current problems with Java for Grande Applications and to detail the requirements, analysis, and suggestions for specific changes. It will also promote and energize widespread community activities investigating the use of Java for Grande Applications. The forum is open and operates with a mix of small working groups and public dissemination of, and requests for comments on, its recommendations. The recommendations of the forum are intended primarily for those developing Java Grande base resources, such as libraries, and those directly influencing the direction of the Java language proper. (Presently, this implies Sun Microsystems or any standards body that may be formed.)
Java has the potential to be a better environment for Grande application development than languages such as Fortran and C++. The goal of the Java Grande Forum (hereafter, JGF) is to develop community consensus and recommendations for either changes to Java or the establishment of standards (frameworks) for Grande libraries and services. These language changes or frameworks are intended to make Java the best possible Grande programming environment.
The Java Grande Forum does not intend to be a standards body for the Java language per se. Rather, JGF intends to act in an advisory capacity to ensure those working on Grande applications have a unified voice to address Java language design and implementation issues and communicate this input directly to Sun or a prospective Java standards group.
The remainder of this document is dedicated to addressing the following questions.
What is a Grande Application? What is an example of a Grande Application? What makes a Grande Application different from other applications? Why do we insist on Java? Are we saying there is no room for other languages? What is the Java Grande Forum? How can my organization or I participate? When is the next meeting? What is expected of participants? What are the planned deliverables?
Following the discussion of these general questions, we present the preliminary reports of the two JGF working groups: the Numerics group and the Applications/Frameworks group.
This section addresses the questions of immediate interest: What is a Grande Application? What is an example of a Grande Application? Why are Grande Applications important? After this, we will discuss the relevance of Java.
Grande Applications are suddenly everybody's interest. The explosive growth in the number of computers connected to the Internet has led many researchers and practitioners alike to consider the possibility of harnessing the combined power of these computers and the network connecting them to solve more interesting problems. In the past, only a handful of computational scientists were interested in such an idea, working on the so-called grand challenge problems, which required much more computational and I/O power than found on the typical personal computer. Specialized computing resources, called parallel computers, seemingly were the only computers capable of solving such problems in a cost-effective manner. The advent of more powerful personal computers, faster networks, widespread connectivity, etc. has made it possible to solve such problems even more economically, simply by using one's own computer, the Internet, and other computers.
With this background, a Grande Application is therefore defined as an application of large-scale nature, potentially requiring any combination of computers, networks, I/O, and memory. Examples are:
Commercial: Datamining, Financial Modeling, Oil Reservoir Simulation, Seismic Data Processing, Vehicle and Aircraft Simulation
Government: Nuclear Stockpile Stewardship, Climate and Weather, Satellite Image Processing, Forces Modeling
Academic: Fundamental Physics (particles, relativity, cosmology), Biochemistry, Environmental Engineering, Earthquake Prediction
Grande Applications may be categorized in several ways:
A question that naturally arises is: why use Java in Grande applications? The Java Grande Forum believes that, more than any other language technology introduced thus far, Java has the greatest potential to deliver an attractive, productive programming environment spanning the very broad range of tasks needed by the Grande programmer. This potential stems from a combination of Java's design features and the ready availability of excellent Java instructional material and development tools.
The Java language is not perfect; however, it promises a number of breakthroughs that have eluded most technologies thus far. Specifically, Java has the potential to be written once and run anywhere. This means, from a consumer standpoint, that a Java program can be run on virtually any conceivable computer available on the market. While this could be argued for C, C++, and FORTRAN, true portability has not been achieved in these languages, save by expert-level programmers.
While JGF is specifically focused on the use of Java to develop Grande Applications, the forum is not concerned with the elimination of other useful frameworks and languages. On the contrary, JGF intends to promote the establishment of standards and frameworks to allow Java to use other industry and research services, such as Globus and Legion. These services already provide many facilities for taking advantage of heterogeneous resources for high-performance computing applications, despite having been implemented in languages other than Java.
The forum intends to hold a set of working meetings with a core group of active participants. These will produce reports, which are reviewed in public forums and transmitted to the cognizant bodies within the Java and computational fields.
The forum is open to any qualified member of academia, industry, or government who is willing to play an active role. The summary of our last meeting in section 1.4 illustrates our approach. Our first major public meeting will be held on November 13, 1998, as a 3-hour panel session at SC98 in Orlando.
For more information on the forum itself and to provide comments, please direct e-mail to the academic coordinator (Geoffrey Fox); the liaisons with Sun Microsystems (Sia Zadeh and John Reynders); the Numerics working group leads (Ron Boisvert and Roldan Pozo); or the Applications/Concurrency working group leads (Dennis Gannon and Denis Caromel).
You may also wish to visit our web site, located at http://www.javagrande.org, which provides information about the Java Grande Forum activities, products, and upcoming events. See also http://math.nist.gov/javanumerics/.
Two relevant mailing lists are email@example.com for current forum members and firstname.lastname@example.org for a more general open group of individuals interested in this area.
The Second Java Grande Forum meeting was held May 9-10, 1998, in Palo Alto. It was sponsored by Sun Microsystems (Siamak Hassanzadeh) and coordinated by Geoffrey Fox, with George Thiruvathukal as secretary. The first meeting of the Forum was held in March 1998. Both of the initial meetings had over 30 participants from academia, industry, and government. The meeting started with technology updates from Sun (the HotSpot optimizing compiler and the Java Native Interface, JNI) and IBM (Marc Snir on the performance of Java in scientific computing).
Then we pursued the classic mix of parallel and plenary sessions using two working groups: Numerics and Libraries, led by Roldan Pozo and Ron Boisvert of NIST, and Applications and Concurrency, led by Dennis Gannon of Indiana.
Both groups made good progress, and their reports were made available by early June. These are used here to build the charter document defining the Forum. After appropriate review of our suggestions by key scientific computing communities, we expect to submit a set of near-term action items to JavaSoft. These will contain our proposals in the areas described in section 1.5 and will relate our numerics proposals to the presentations by James Gosling at SC97 and "Java Grande 98" (Feb 28 - Mar 1). Our proposal to JavaSoft will also discuss the Java VM and RMI enhancements, summarized in section 1.6, needed for scaling Java to large-scale concurrent applications.
We divided our action items into three categories:
- Proposals to JavaSoft as discussed above. These were further classified as either essential or desirable.
- Community activities to produce infrastructure and standards.
- Community research which will clarify the value of activities of types 1) and 2).
Action items of type 2) include standard interfaces and reference implementations for Java libraries of Math functions, matrix algebra, signal processing etc. We also proposed a Java Grande application benchmark suite with kernels and more substantial applications. There was significant discussion of the importance of a "Java Framework for computing" -- a set of interfaces to support seamless computing or the ability to run a given job on any one of many different computers with a single client interface. A typical community research activity is the study of the scaling of the Java Virtual Machine to large applications or understanding the tradeoffs between Java thread and distributed VM forms of parallelism.
Section 2 contains full details. This working group is currently studying:
Section 3 contains full details. The working group is currently studying:
Goals and Mission of Numerics Working Group
If Java is to become the environment of choice for high-performance scientific applications, then it must provide, for large scale floating-point computations, performance comparable to what is achieved in currently used programming languages (C or Fortran). In addition, it must have language features and core libraries that enable the convenient expression of mathematical algorithms. The goal of this working group is to assess the suitability of Java for numerical computation, and to work towards community consensus on actions which can be taken to overcome deficiencies of the language and its run-time environment. In this report, we present preliminary findings of the working group.
We begin by outlining critical issues that impede Java's effectiveness in applications that are dominated by the use of floating-point arithmetic. Unless these issues are satisfactorily resolved, it is unlikely that the numerical computation community will accept Java. This can impact the entire Java enterprise by slowing the dissemination of high quality components for solving commonly occurring mathematical and statistical problems.
For each issue, we present solutions recommended by the working group. In selecting such solutions, the working group has been careful to balance the needs of the numerical community with those of Java's wider audience. The proposed solutions require additions to the current Java and JVM design. We have tried to minimize the changes required in Java, relying on compiler technology whenever feasible. This minimizes the changes that affect all Java platforms, and enables implementers to optimize for high numerical performance only in those environments where such an effort is warranted.
Requirement: The Complex field is an essential tool in the analysis and solution of mathematical problems in all areas of science and engineering. Thus, it is essential that the use of complex numbers be as convenient and efficient as the use of floats and doubles.
Possible Solutions: The obvious solution is to develop a straightforward complex class with methods for each arithmetic operation and use such objects as needed.
There are several reasons why this approach fails.
(a) The object overhead of complex methods makes them unacceptably inefficient.
(b) The semantics of complex objects are different from those of floats and doubles. For example, the = and == operators manipulate references rather than values. Such differences lead to many errors.
(c) Use of method calls for elementary arithmetic operations leads to inscrutable code, which is very tedious to write and debug. Users would simply stay away.

The ideal solution is to add new base complex types to the language, on par with float and double. This, of course, requires a significant change to the language and the JVM to satisfy the needs of a relatively small community. In particular, it requires the addition of a significant number of new opcodes.
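The reference-semantics pitfall in (b) is easy to demonstrate with a naive complex class. The class and field names below are illustrative stand-ins, not a proposed API:

```java
// Naive complex class used only to illustrate reference semantics;
// the names here are hypothetical.
final class Complex {
    double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
}

public class ReferenceSemantics {
    public static void main(String[] args) {
        Complex a = new Complex(1.0, 2.0);
        Complex b = new Complex(1.0, 2.0);
        // == compares references, so two distinct objects holding the
        // same complex value are reported as unequal.
        System.out.println(a == b);   // prints false
        // = copies the reference: c aliases a, so mutating c mutates a.
        Complex c = a;
        c.re = 9.0;
        System.out.println(a.re);     // prints 9.0
    }
}
```

Neither behavior matches what a numerical programmer expects of a value type such as double, which is precisely the point of item (b).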
An alternative solution acceptable to the working group requires that the following actions occur:
(1) A complex arithmetic package be developed and included as a core Java package, perhaps as a subpackage of java.math. Such a package will support assignment by value, and standard arithmetic operations and relations on complex values.
(2) Use of these classes is made as efficient as float or double. This may require an extension to Java and JVM in support of lightweight classes; see following sections. This also requires the cooperation of compiler writers to use the opportunity provided to generate efficient code.
(3) Operator overloading can be used to bind natural notation for arithmetic, logical, and assignment operators to the methods of the complex classes; see following sections.

This alternate suite of changes requires fewer changes to Java and the JVM, but will require more compilation effort for an efficient implementation. Lightweight classes and operator overloading are general mechanisms which can satisfy the needs of many groups for alternate arithmetic systems, such as interval and multiple precision. In addition to complex arithmetic, the java.math library should be extended to support complex transcendental functions. The current proposal assumes that complex numbers are pairs of doubles. It is deemed acceptable (at least initially) not to support complex numbers with float components.
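A minimal sketch of the kind of complex class envisioned here, with deep (value) assign and equals methods and illustrative names (plus, times) for the arithmetic operations; none of these names is a settled java.math API:

```java
// Minimal sketch of a core complex class; all names are assumptions
// for illustration, not a settled API.
final class Complex {
    double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    // Deep (value) assignment: copies b's value into this object.
    void assign(Complex b) { re = b.re; im = b.im; }
    // Deep (value) comparison.
    boolean equals(Complex b) { return re == b.re && im == b.im; }
    Complex plus(Complex b)  { return new Complex(re + b.re, im + b.im); }
    Complex times(Complex b) {
        return new Complex(re * b.re - im * b.im,
                           re * b.im + im * b.re);
    }
}
```

Without the lightweight-class and operator-overloading support discussed in the following sections, every use of such a class pays a method-call and object-allocation cost, which is exactly the efficiency problem the proposal aims to remove.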
Requirement: Implementation of alternative arithmetic systems, such as complex, interval, and multiple precision, requires the support of new objects with value semantics. Compilers should be able to inline methods that operate on such objects and avoid the overheads of additional dereferencing. In particular, lightweight classes are critical for the implementation of complex arithmetic as described in Issue 1. A lightweight class is final. It holds a value and supports deep assignments and deep comparisons (which operate on the object's value, not its reference).
a.assign(b), assigns to object a the value of object b.
a.equals(b), tests that objects a and b have the same value.
Lightweight objects will usually support additional unary and binary operators. Note that there is no requirement for lightweight classes to be immutable (i.e., instance variables need not be final). Immutability would lead to unnecessary copying, preventing the in-place updating of arrays of objects, which is a key need of the numerical community.
Possible Solutions: There are two alternative designs for lightweight objects.
(1) Lightweight objects are new types of objects in the Java language. They are explicitly declared as such. A lightweight object is always accessed by value, as if it had a primitive Java type: e.g., the value of a complex variable is a complex number, not an object reference. Instances of lightweight objects can be assigned (using the assign operator) and compared (using the equals operator). On the other hand, if v is an instance of a lightweight object, then v = null and v == null are illegal expressions. Lightweight objects are passed by value in method invocations. Methods on such objects cannot be synchronized. It is expected that compilers will inline invocations of standard methods (assign, equals, ...) on such objects.
(2) Lightweight objects are regular Java objects, and are accessed by reference. Since they are final, and since the Java (back end) compiler has full knowledge of the semantics of the methods applied on these objects, it is expected that compilers will inline invocations of assign, equals and other predefined methods.
The first scheme is likely to lead to the best performance: one always saves the storage required for a Java object: a complex will always require two words of storage, no more. No additional referencing is needed to access an object value. Garbage collection for such objects is simplified.
On the other hand, this scheme seems to require significant changes in the Java language and the JVM. For example, the JVM instruction set does not support method invocations that return non-scalar values. It does not support arrays with entries that are not of a primitive or a reference type. Modifications will also be required in the Java program verifier.
The second scheme is more dependent on compiler optimization techniques for performance: a compiler will generally be able to inline invocations of predefined final methods. However, a lean storage layout that holds data but no object descriptor can be generated only if the compiler can determine that the object is not accessed by reference. Changes in the garbage collector might also be required in order to support a lean layout.
The second scheme has the added advantage that it supports both deep and shallow assignments and comparisons on lightweight objects. Thus, it provides functionality equivalent to that achieved in C or Fortran by the use of pointers. In this scheme, lightweight objects are always passed as reference arguments; this provides more flexibility (e.g., allowing a method to return multiple values).
Finally, the second scheme does not require changes in Java or JVM specifications. There are several alternative design points that should be evaluated. If the use of lightweight objects is restricted to predefined classes, such as Complex, then the inlining could be done by the front-end compiler.
However, such an approach does not extend to user-defined lightweight classes, and it makes optimizations by back-end (dynamic or static) compilers harder. Such a choice also impacts debuggers and other tools. An alternative approach to lightweight objects is to treat them as regular Java objects that must obey certain restrictions; e.g., no (reference) assignment; lightweight objects cannot be components of regular Java arrays or regular Java structures (they can be components of the special rectangular Java arrays, which are defined below).
This approach would still require Java language extensions (lightweight classes have to be declared as such, and the restrictions need to be spelled out). However, no changes are needed in JVM (beyond carrying in the class file an attribute that marks the class as lightweight). Back-end compilers might be able to optimize code better using such lightweight objects, because of the added constraints.
Requirement: Usable implementation of complex arithmetic, as well as other alternative arithmetics such as interval and multiprecision, requires that code be as readable as code based only on float and double.
Possible Solutions: Operator overloading is the obvious solution to this problem. Without it, code implementing complex arithmetic would be extremely difficult to develop, understand, and maintain. Such code will look very different from similar code using real arithmetic, thus burdening library developers. E.g., a simple statement such as
a = b+c*d
will be replaced by something like

a.assign(Complex.sum(b, Complex.product(c, d)))
Without operator overloading, a large portion of the scientific computing community would choose to avoid Java as being too unfriendly.
Only a limited facility for operator overloading is necessary to fulfill this requirement. If the first scheme is used, so that an object supports either shallow or deep assignment, but not both, the assignment operator = will be overloaded to signify deep assignment: a = b is interpreted as a.assign(b). Similarly, a == b is syntactic sugar for a.equals(b), if a and b are lightweight objects. On the other hand, if the second scheme is used, then new operators are needed for deep assignments and comparisons. Thus a <- b is syntactic sugar for a.assign(b), and a === b is syntactic sugar for a.equals(b). (It may be desirable to introduce new operators even if the first scheme is used, so as to reduce confusion. It may also be desirable to allow '<-' to be used for assignment of primitive types, for consistency.)
Thus, if the second scheme is used, then
Complex c = new Complex(0.0, 0.0);
Complex d = new Complex(1.0, 1.0);
Complex e = c;
c <- d;
System.out.println(e.real + "," + e.imag);
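Desugared into plain method calls (with a hypothetical Complex stand-in), the fragment above runs today and shows why the distinction matters: e = c is a shallow (reference) assignment, so the deep assignment c <- d also changes the value seen through e:

```java
// Hypothetical Complex stand-in with just the members the fragment uses.
final class Complex {
    double real, imag;
    Complex(double real, double imag) { this.real = real; this.imag = imag; }
    void assign(Complex b) { real = b.real; imag = b.imag; }  // deep assign
}

public class DeepAssignDemo {
    public static void main(String[] args) {
        Complex c = new Complex(0.0, 0.0);
        Complex d = new Complex(1.0, 1.0);
        Complex e = c;    // shallow: e and c refer to the same object
        c.assign(d);      // deep (the proposed c <- d)
        System.out.println(e.real + "," + e.imag);   // prints 1.0,1.0
    }
}
```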
The arithmetic, assignment, and logical operators must be extendable by overloading, using their natural notation. One may have a predefined naming scheme for methods that overload existing operators: 'sum' for '+', 'product' for '*', etc. It is not necessary to permit the introduction of new operators (beyond assign and equals). Thus, no new syntax is required for operator overloading, except that predefined operators apply to lightweight objects. An expression of the form 'a+b' is merely syntactic sugar for sum(a,b) (and is illegal if the 'sum' method is not defined on a and b).
There are several alternatives that should be evaluated.
Operator overloading may be restricted to predefined lightweight objects (such as complex) or extended to user-defined lightweight objects, or extended to arbitrary classes. It is reasonable to couple operator overloading with lightweight objects: the language conveys the right intuition by using operators for 'cheap' operations and method invocations for expensive operations.
Binary operators, such as 'sum', can be restricted to the case where both operands are of the same type, or extended to operands of distinct types. Type promotion is very natural in many cases (e.g., real*complex), and should be supported.
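Under the naming scheme above, a compiler would rewrite an expression such as a + s*b into nested method calls, with an overload handling the double-to-complex promotion. A sketch, with all names illustrative:

```java
// Sketch of the sum/product naming scheme with type promotion; the
// class and method names follow the convention suggested above and
// are not a settled API.
final class Complex {
    final double re, im;   // immutable here only to keep the sketch short
    Complex(double re, double im) { this.re = re; this.im = im; }
    static Complex sum(Complex a, Complex b) {
        return new Complex(a.re + b.re, a.im + b.im);
    }
    static Complex product(Complex a, Complex b) {
        return new Complex(a.re * b.re - a.im * b.im,
                           a.re * b.im + a.im * b.re);
    }
    // Promotion: real * complex, as in the real*complex example.
    static Complex product(double s, Complex b) {
        return new Complex(s * b.re, s * b.im);
    }
}
```

With this in place, 'a + s*b' would be syntactic sugar for Complex.sum(a, Complex.product(s, b)), with the overload resolution performing the promotion.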
Requirement: The high efficiency necessary for large-scale numerical applications requires aggressive exploitation of the unique facilities of local floating-point hardware. The current insistence on bitwise reproducibility of results on all JVMs makes it impossible to satisfy this requirement. Efficient processing of Java programs requires that compilers and JVMs provide the option to:
(a) Use IEEE extended arithmetic hardware anywhere in the computation
(b) Use the associative law to rearrange the order of computation
(c) Use possibly unsafe identities to eliminate computations
The use of IEEE extended arithmetic in intermediate computations can improve the accuracy and reliability of numerical results. Processors with hardware support for IEEE extended arithmetic should NOT be required to round intermediate results; such rounding slows the computation and makes it less accurate. There should not be a requirement that a rounding store be forced on each assignment statement in a user's program. Rounding should be done by the compiler only when necessary; for example, on a machine with extended-precision registers, rounding should occur only when registers must be spilled to memory. On a machine with fused multiply-add, a multiplication followed by an addition should always be replaceable by a fused multiply-add.
The associative law can be used by optimizing compilers to reorder arithmetic operations in order to make more efficient use of hardware. Such optimizations are crucial to improving the performance of numerical codes, and users should have access to this technology. A typical example for such an optimization is the use of associativity to execute a reduction in parallel.
On the other hand, bitwise reproducibility is important for code testing and is needed in many environments. Users may want to ensure strict reproducibility by enforcing the default Java model. Programmers may want to disable unsafe optimizations for selected codes in order to have better control on the execution (e.g., guarantee better precision).
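The reduction example can be made concrete: the strict left-to-right sum that default Java semantics mandate and a reassociated pairwise sum (the shape a parallelizing compiler would produce) compute the same mathematical result but need not agree bitwise. A sketch:

```java
// Demonstrates the associativity point: both methods sum the same data,
// but the pairwise version reassociates the additions the way a parallel
// reduction would, so the floating-point results may differ in the last bits.
public class ReductionOrder {
    // Strict left-to-right order: what default Java semantics require.
    static double sequentialSum(double[] a) {
        double s = 0.0;
        for (double x : a) s += x;
        return s;
    }
    // Reassociated tree order: what a parallel reduction computes.
    static double pairwiseSum(double[] a, int lo, int hi) {
        if (hi - lo == 1) return a[lo];
        int mid = (lo + hi) / 2;
        return pairwiseSum(a, lo, mid) + pairwiseSum(a, mid, hi);
    }
    public static void main(String[] args) {
        double[] a = new double[1000];
        for (int i = 0; i < a.length; i++) a[i] = 1.0 / (i + 1);
        // The two results agree to roundoff but not necessarily bitwise.
        System.out.println(sequentialSum(a));
        System.out.println(pairwiseSum(a, 0, a.length));
    }
}
```

A JVM running with a relaxed-semantics flag could substitute the second form for the first; under strict (default) semantics it may not.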
Possible Solutions: We do not feel that large changes to the Java specification are needed to satisfy the needs of the numerical community on this issue. In particular, the current JVM specification for bitwise reproducibility can remain the default behavior. Instead, what is needed is:
(a) JVM flags that allow the user to select efficiency over reproducibility at runtime.
(b) A class and method modifier, StrictNumerics, which specifies that the given class or method must adhere to the more restrictive Java arithmetic specification, regardless of flags which may be set by the user.
The number and semantics of runtime flags should be left up to the JVM developer. The important features of this proposal are the following:

- Strict Java semantics apply by default.
- JVMs may provide environment flags, similar to compiler optimization options, to overrule strict Java semantics, so that the user can decide to select efficiency over reproducibility at runtime.
- Developers can shield critical segments of code where these relaxations should never occur, using the StrictNumerics attribute.
The existing proposals for LooseNumerics and IdealizedNumerics seem unnecessarily complicated -- we do not feel that code developers will want this level of fine-grained control. The implementation mechanism would be the same as in those proposals: the StrictNumerics attribute is carried in the class file and observed by back-end compilers. Also, while optimizations may change the numerical outcome of a computation, or even cause a Not-a-Number value to be returned rather than a regular value, the optimizations should still preserve the "precise exception" model of Java: null pointer or index-out-of-bounds exceptions should occur in the optimized code in the same state as they would have occurred in the unoptimized code.
Requirement: Operations on multidimensional arrays of base types must be easily optimized. In addition, the memory layout of such arrays must be known to the algorithm developer in order to process array data in the most efficient way. The performance of Java code can suffer a deterioration of up to 25% because of the lack of true rectangular arrays. For native Java arrays, code generated for column traversal is less efficient because of pointer chasing. Compiler elimination of run-time tests for null pointers and out-of-bounds indices is harder if arrays can be jagged or can change shape at run time. More significantly, disambiguation is hard: even if two 2D arrays are not identical, they may still share a row. This forces compilers to generate superfluous stores because of potential aliasing. Finally, a clearly defined memory layout with guaranteed locality of data would allow developers to devise algorithms which can be processed more efficiently.
Possible Solutions: We propose that standard Java classes be developed which implement multidimensional rectangular arrays, and that these be included as a subpackage of java.math. These classes would store multidimensional arrays internally so as to provide access that is as efficient as if the arrays were stored in a canonical order (e.g., row major). The classes would support 1D, 2D, 3D, and possibly 4D...7D arrays with Int, Long, Float, Double, and Complex entries (a different class is needed for each dimensionality and each element type, since Java does not support templates). The classes provide the following methods:
(a) Get and set to access and update an array entry.
(b) Operations that correspond to Fortran 90 array intrinsics. In particular:
(b.1) Operations to access the number of dimensions and the extents of an array.
(b.2) Operations to reshape and transpose an array.
(b.3) Elemental conversion functions (e.g., the equivalent of REAL and AIMAG, which convert complex arrays into double arrays).
(b.4) Elemental transcendental functions
(b.5) Elemental boolean functions
(b.6) Array reduction functions (sum, minval, etc.)
(b.7) Array construction functions (merge, pack, spread, unpack)
(b.8) Array reshape function
(b.9) Array manipulation functions (shift, transpose)
(b.10) Array location functions (maxloc, minloc)
(b.11) Array scatter-gather and array scan operations (Fortran 95)
(b.12) Matrix multiply

Not all Fortran 90 and Fortran 95 operations are needed up front. One can likely do without elemental transcendental functions.
(c) Operations that correspond to array expressions (sum, scaling, etc.)
(d) Operations that create copies of or references to array sections. These operations allow one to copy subarrays (defined by subscript triplets or by vector subscripts) or to create references to such subarrays, thus supporting in place update of subarrays. (As in Fortran 90, references to subarrays may be restricted to subarrays described by subscript triplets, so as to have succinct subarray descriptors.) A possible mechanism is to support the definition of index sets (or array shapes) and the extraction of a subarray defined by such an index set.
(e) Operations to cast Java arrays into rectangular arrays, and vice-versa.
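A minimal sketch of what one such class might look like for the 2D double case, with the data held contiguously in row-major order so that get and set reduce to a single multiply-add index computation. All names here are illustrative, not a proposed API:

```java
// Sketch of a rectangular 2D array class: one contiguous 1-D backing
// array in row-major order, so rows cannot be jagged or shared and the
// shape is fixed at construction. Names are illustrative.
final class Double2D {
    private final double[] data;   // row-major: element (i,j) at i*cols + j
    private final int rows, cols;
    Double2D(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }
    double get(int i, int j)           { return data[i * cols + j]; }
    void   set(int i, int j, double v) { data[i * cols + j] = v; }
    int rows() { return rows; }
    int cols() { return cols; }
    // One Fortran-90-style reduction intrinsic, by way of example.
    double sum() {
        double s = 0.0;
        for (double x : data) s += x;
        return s;
    }
}
```

Because the layout is known, a compiler can hoist the i*cols computation out of inner loops and drop per-row null checks, which is exactly the optimization opportunity the requirement describes.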
The array classes can be implemented with no changes to Java or the JVM. However, it is essential that the get and set methods be implemented as efficiently as array-indexing operations are in Fortran or in C. We expect that inlining will be used for this purpose, and that garbage collectors will recognize rectangular arrays. Multidimensional arrays are extremely common in numerical computing, and hence we expect that efficient multidimensional array classes will be heavily used.
Note that an array of complex entries need not be implemented as an array of references (if Complex objects are regular Java objects) or as an array of lightweight objects. Rather, such an array can be implemented as an array of doubles (with twice as many entries as the complex array). The naive implementation of the get method will access two double values and return a (new) Complex object; better implementations will inline this code. Additional methods will be provided to convert a complex array into a double array (with twice as many entries), and vice-versa.
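The layout just described might be sketched as follows, with Complex again a hypothetical stand-in:

```java
// Sketch of a complex array stored as interleaved doubles:
// [re0, im0, re1, im1, ...]; the naive get allocates a new Complex.
final class Complex {
    double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
}

final class ComplexArray1D {
    private final double[] data;  // twice as many doubles as complex entries
    ComplexArray1D(int n) { data = new double[2 * n]; }
    Complex get(int i) {
        return new Complex(data[2 * i], data[2 * i + 1]);  // naive: allocates
    }
    void set(int i, Complex z) {
        data[2 * i]     = z.re;
        data[2 * i + 1] = z.im;
    }
}
```

An inlining compiler would replace the allocation in get with direct accesses to the two doubles, which is the "better implementation" mentioned above.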
The inclusion of standard array classes in java.math does not require any change to the Java language. However, the use of explicit method invocation to effect all array operations will significantly decrease the readability of Java code, and incur the wrath of users. The introduction of a simple notation for multidimensional arrays which maps to the standard array classes would make the use of such arrays much more natural. A multi-index notation, like a[i,j], to refer to such array elements would be ideal. This would allow statements like

a.set(i, j, b.get(i, j) + s*c.get(k, l));

to be more naturally expressed as
a[i,j] = b[i,j] + s*c[k,l];
Alternatively, one could reuse the bracket notation of Java, namely
a[i][j] = b[i][j] + s*c[k][l];
The front-end compiler would disambiguate the expression according to the type of a. This requires changes to the Java language or (with the second alternative) fancier operator-overloading mechanisms.
Some alternatives that need to be discussed: operator overloading may be applied to array arithmetic, e.g., A = B+C. This is nice, but not strictly necessary.
It would also be useful to facilitate indexing operations by explicitly supporting triplet notation. This implies either new syntax or fancy overloading of the indexing.
We did not impose a strict requirement that rectangular arrays be stored in contiguous memory in, say, row-major order. This is for two reasons:
(i) this requirement would have no semantic effect, since one cannot access a 2D or 3D array as if it were one-dimensional (we do not propose the equivalent of Fortran 90 assumed-size arrays). The requirement has only performance implications; e.g., in-place reshaping of a 2D array into a 1D array is expected to be very fast, as no data copying is required. In any case, contiguity is a significant requirement only within page boundaries: contiguous virtual pages are not necessarily contiguous in real memory.
(ii) a strict requirement that arrays be stored contiguously would require changes to the JVM. We therefore impose the weaker requirement that access be as efficient as if the arrays were stored in canonical order.
As for the storage order, one can follow two approaches.
There is a unique storage order, e.g., row major.
Arrays can be stored in distinct orders. For example, the storage order could be specified when the array is instantiated. Possible choices would be (i) row major (C order), for better performance when native C methods are invoked; (ii) column major (Fortran order), for better performance when native Fortran methods are invoked; and (iii) block major, for block-oriented, recursive algorithms. A default row-major layout would be used when users do not specify a layout.
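The second approach, selecting the storage order at instantiation time, could be sketched as follows; the Layout enum and the constructor signature are invented for illustration:

```java
// Sketch of layout selection at instantiation; LayoutArray2D and
// Layout are invented names, not part of the proposal.
public class LayoutArray2D {
    public enum Layout { ROW_MAJOR, COLUMN_MAJOR }

    private final double[] data;
    private final int rows, cols;
    private final Layout layout;

    public LayoutArray2D(int rows, int cols, Layout layout) {
        this.rows = rows;
        this.cols = cols;
        this.layout = layout;
        this.data = new double[rows * cols];
    }

    public LayoutArray2D(int rows, int cols) {
        this(rows, cols, Layout.ROW_MAJOR);   // default: row major (C order)
    }

    // The layout changes only the index mapping, not the semantics.
    private int index(int i, int j) {
        return layout == Layout.ROW_MAJOR ? i * cols + j : j * rows + i;
    }

    public double get(int i, int j) { return data[index(i, j)]; }
    public void set(int i, int j, double v) { data[index(i, j)] = v; }
}
```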
The numerics working group has agreed to begin the development of a variety of core numerical classes and interfaces to support the development of substantial Java applications in the sciences and engineering. The main purpose of this work is to standardize the interfaces to common mathematical operations. A reference implementation will be developed in each case. The purpose of the implementation will be to document clearly the class and its methods. Although we expect these to be reasonably efficient, we expect that highly tuned implementations or those relying on native methods will be developed by others. Also, the simple methods, such as get or set, will not provide reasonable performance unless they are inlined, because the method invocation overhead will be amortized over very few machine instructions. Unless otherwise specified, we will initially only define classes based on doubles, since computations with Java floats are less useful in numerical computing.
The classes identified for first consideration are the following. We expect to have the first three fully developed this year, with the others to follow soon after.
(a) Complex
This implements a complex data type for Java as described above. It includes methods for complex arithmetic and assignment, as well as the elementary functions.
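A minimal sketch of what such a class might contain (the method names plus, times and abs are assumptions, not the working group's interface):

```java
// Illustrative sketch of an immutable complex value class; not the
// numerics working group's actual design.
public class Complex {
    public final double re, im;

    public Complex(double re, double im) { this.re = re; this.im = im; }

    public Complex plus(Complex b) {
        return new Complex(re + b.re, im + b.im);
    }

    public Complex times(Complex b) {
        // (re + im·i)(b.re + b.im·i)
        return new Complex(re * b.re - im * b.im, re * b.im + im * b.re);
    }

    public double abs() { return Math.hypot(re, im); }  // modulus |z|
}
```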
Contacts: John Brophy, Visual Numerics, and Marc Snir, IBM.
(b) Multidimensional arrays
This implements one-, two- and three-dimensional arrays for Java as described above.
Contacts: Marc Snir, IBM and Roldan Pozo, NIST
(c) Linear algebra
This implements matrices (in the linear algebraic sense) and operations on matrices such as the computation of norms, standard decompositions, the solution of linear systems, and eigenvalue problems. A strawman proposal has already been developed here and will be released for comment soon.
Contacts: Cleve Moler, The MathWorks, Roldan Pozo, NIST, and Ron Boisvert, NIST
(d) Basic Linear Algebra Subroutines (BLAS)
These implement elementary operations on vectors and matrices of use to developers of linear algebra software (rather than to average users). This work will be done in conjunction with the BLAS Technical Forum.
Contacts: Roldan Pozo, NIST, Keith Seymour, University of Tennessee and Steve Hague, NAG
(e) Higher Mathematical Functions
This includes functions such as the hyperbolics, erf, gamma, Bessel functions, etc.
Contacts: Ron Boisvert, NIST and John Brophy, Visual Numerics
(f) Fourier Transforms
This includes not only a general complex transform, but also specialized real, sine and cosine transforms.
Contact: Lennart Johnsson, University of Houston
(g) Interval Arithmetic
This implements an interval real data type for Java. It includes methods for interval arithmetic, assignment, as well as elementary functions.
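A minimal sketch of such a type, assuming one-ulp outward widening via Math.nextDown/Math.nextUp as a stand-in for true directed rounding (which the JVM does not expose); the Interval class name is invented:

```java
// Hedged sketch of interval addition with outward rounding approximated
// by widening one ulp in each direction. Proper interval arithmetic
// requires IEEE directed-rounding control, which Java does not provide.
public class Interval {
    public final double lo, hi;

    public Interval(double lo, double hi) { this.lo = lo; this.hi = hi; }

    public Interval plus(Interval b) {
        // Widen each endpoint outward so the true sum is always enclosed.
        return new Interval(Math.nextDown(lo + b.lo), Math.nextUp(hi + b.hi));
    }

    public boolean contains(double x) { return lo <= x && x <= hi; }
}
```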
Contact: Dmitri Chiriaev, Sun
(h) Multiprecision Arithmetic
This implements a multiprecision real data type for Java. It includes methods for arithmetic, assignment, as well as elementary functions.
Contact: Sid Chatterjee, University of North Carolina
The working group will review these proposals and open them up for public comment. It will also set standards for testing and documentation for numeric classes. It will work with Sun and others to have such classes widely distributed.
The following problems were discussed by the forum, but no formal position was taken.
(1) Alternative definition of the Java.math library of transcendental functions. The current operational definition is imprecise and suboptimal (the functions are defined in terms of bitwise compatibility with a particular implementation). Alternative definitions are (i) precise rounding -- the result is as if computed in infinite-precision arithmetic and then rounded; (ii) within a fixed bound of the precise result; or (iii) an improved operational definition. The first definition is very desirable if it can be achieved with acceptable performance overhead. The second weakens bitwise reproducibility. Note that current Java implementations do not strictly adhere to this aspect of the Java standard: most JVMs use their native C math library.
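The bitwise-compatibility style of definition can be seen in the JDK itself: java.lang.StrictMath is specified bit for bit by the fdlibm reference implementation, while java.lang.Math is permitted to diverge (and typically calls the platform's native library). A small check, for illustration:

```java
// Compares Math.sin and StrictMath.sin bit for bit at one argument.
// Whether they agree for arbitrary inputs is platform-dependent, which
// is exactly the reproducibility concern discussed above.
public class TranscendentalCheck {
    public static boolean bitwiseEqual(double x) {
        return Double.doubleToLongBits(Math.sin(x))
            == Double.doubleToLongBits(StrictMath.sin(x));
    }
}
```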
(2) Improved native interfaces between Java and Fortran.
(3) Extensions to support multiple NaN values. This seems to be already in the making.
The following individuals contributed to the development of this document at the Java Grande Forum meeting on May 9-10 in Palo Alto, California.
The following additional individuals also contributed comments, which helped in the development of this document.
The primary concern of Java Grande is to ensure that the Java language, libraries and virtual machine can become the implementation vehicle of choice for future scientific and engineering applications. The first step in meeting this goal is to implement the complex and numerics proposals described in the previous sections. Accomplishing this task provides the essential language semantics needed to write high-quality scientific software. However, more will be required of the Java class libraries and runtime environment if we wish to capitalize on these language changes.
It is possible that many of the needed improvements will be driven by commercial sector efforts to build server-side enterprise applications. Indeed, the requirements of technical computing overlap with those of large enterprise applications in many ways.
For example, both technical and enterprise computing applications can be very large and they will stress the memory management of the VM. The demand for very high throughput on network and I/O services is similar for both. Many of the features of the Enterprise Bean model will be of great importance to technical computing.
However, there are also areas where technical computing differs significantly from enterprise applications. For example, fine-grain concurrency performance is substantially more critical in technical computing, where a single computation may require 10,000 threads that synchronize in frequent, regular patterns. These computations would need to run on desktops as well as very large, shared-memory multiprocessors. In technical applications, the same data may be accessed repeatedly, while in enterprise computing there is a great emphasis on transactions involving different data each time. Consequently, memory locality optimization may be more important for Grande applications than it is elsewhere in the Java world. Some technical applications will require the ability to link together multiple VMs concurrently executing on a dedicated cluster of processors which communicate through special high-performance switches. On such a system, specialized, ultra-low-latency versions of the RMI protocol would be necessary.
It is also important to observe that there are problems which can be described as technical computing today but which will become part of the enterprise applications of the future. For example, image analysis and computer vision are closely tied to applications of data mining. The processing and control of data from arrays of sensors has important applications in manufacturing and medicine. The large-scale simulation of non-linear mathematical systems is already finding its way into financial and marketing models.
While it is too soon for us to say exactly where a Grande Bean will differ from its Enterprise cousin, it is not too soon to begin working on it. In the pages that follow we describe two areas where critical improvements are needed and, where possible, make suggestions as to the solutions. We also propose three new community activities which, if successful, can open Java to new areas of technical computing and a new approach to technical problem solving that can profoundly impact both education and industry.
The first activity involves the construction of a suite of benchmark applications that can be used as guideposts for the Java VM and compiler development community. The benchmarks will fall into two categories. Kernel benchmarks will help provide insight into potential performance and scalability problems with the core Java technology. Application benchmarks will be designed to provide accurate information about how Java implementations compare to native C/C++/Fortran versions of the same program.
The second activity of the Grande Applications and Concurrency group will be to define an API for parallel applications in Java. This may take the form of a set of design patterns, or it may be a specification for Grande Beans. The third activity is the requirements analysis and specification of "seamless computing environments".
There are two areas of initial concern about the core Java technology. The first involves scalability of the virtual machine and the second involves the performance of the Java RMI. We treat each of these in turn below.
Technical computing often involves application components that require multi-gigabyte memory images. Unfortunately, many current VM implementations restrict the size of the application heap, and many others demonstrate poor memory-management and garbage-collection performance. The Grande team will implement a series of kernel benchmarks that test the scalability of VM implementations. Other features of the VM that are potential showstoppers when not implemented with efficiency and scalability in mind include:
1. Large numbers of threads. The scalability of thread synchronization as the number and size of thread objects grow.
2. Support for native threads and light weight process structures that are tuned for high-end SMPs with 32 or more processors.
3. Memory and synchronization primitive performance on distributed, cache-coherent multiprocessors with non-uniform access times and multi-level memory architectures.
Each of these VM properties can have a dramatic impact on our ability to construct Grande applications. However, they are not likely to affect the specification or semantics of the VM design. Consequently, the goal of the benchmarks is to make it easier to spot where implementation decisions impact scalable performance.
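As an illustration of the kind of kernel benchmark intended here, the following sketch (the class name and structure are invented, not part of the forum's actual suite) times contended monitor synchronization as the thread count grows:

```java
// Hedged sketch of a thread-scalability microbenchmark: nThreads
// threads hammer one shared lock; elapsed time reveals how well the
// VM's monitor implementation scales.
public class ThreadSyncBench {
    private long counter = 0;

    private synchronized void bump() { counter++; }

    // Returns elapsed nanoseconds for nThreads each doing nOps bumps.
    public long run(int nThreads, final int nOps) {
        Thread[] ts = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            ts[t] = new Thread(new Runnable() {
                public void run() {
                    for (int k = 0; k < nOps; k++) bump();
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try { t.join(); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return System.nanoTime() - start;
    }

    public long count() { return counter; }
}
```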
The Java Remote Method Invocation (RMI) mechanism is the most sophisticated and elegant RPC mechanism yet designed. It takes full advantage of the Java language and object model, and it is ideally suited to many Grande applications. However, the design is not without flaws, and implementations suffer from serious performance limitations. While RMI works well for communicating small objects over the commodity Internet, there are serious problems for Grande applications that must move multi-megabyte objects between distributed components over next-generation, high-speed networks such as the vBNS.
The Grande Kernel Benchmark for RMI will provide a series of tests that will allow implementers to see both the types of communications that are common in these technical applications and provide feedback on the performance of their implementation.
Object serialization is an important and critical feature of Java. It is central to persistence in the Java component architectures, and it is also fundamental to RMI object marshaling and un-marshaling. However, in technical applications where RMI arguments are often large arrays of relatively simple objects, many optimizations can be made in the serialization process. In addition, for many scientific applications it is not always necessary to have a full encoding of the object type as part of the stream.
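One such optimization can be sketched with the standard java.io.Externalizable hook: a wrapper class (DoubleBlock is an invented name) that writes a large double array as raw values, avoiding per-element type encoding. This illustrates the idea, not the actual protocol change:

```java
import java.io.*;

// Sketch: bypass default serialization for a large numeric array by
// writing a length followed by raw double values.
public class DoubleBlock implements Externalizable {
    public double[] data = new double[0];

    public DoubleBlock() {}                        // required no-arg ctor
    public DoubleBlock(double[] data) { this.data = data; }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(data.length);
        for (double d : data) out.writeDouble(d);  // no per-element headers
    }

    public void readExternal(ObjectInput in) throws IOException {
        int n = in.readInt();
        data = new double[n];
        for (int i = 0; i < n; i++) data[i] = in.readDouble();
    }

    // Convenience round trip through a byte buffer, used for testing.
    public static DoubleBlock roundTrip(DoubleBlock b) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(b);
            oos.close();
            return (DoubleBlock) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```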
A second problem with RMI has been the transport protocol. In many technical applications we will want to use the elegant RMI model to communicate over very specialized, high-performance network protocols. For example, SCI, ATM AAL5, shared memory, Myrinet, Fast Messages and Active Messages are all used in technical applications. The current 1.2 beta 3 version of the JDK provides a customizable socket layer, so it should be possible to support some of these protocols with that technique. However, some of the fast message protocols, like FM and Nexus, are not socket-level interfaces but support special forms of remote service requests. In these cases, a higher-level API is needed to layer RMI over them easily.
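The customizable socket layer can be illustrated with the JDK's java.rmi.server.RMIClientSocketFactory interface. The factory below merely tunes an ordinary TCP socket, but this hook is where a specialized transport could be substituted (the class name is invented for illustration):

```java
import java.io.IOException;
import java.io.Serializable;
import java.net.Socket;
import java.rmi.server.RMIClientSocketFactory;

// Sketch of a custom RMI client socket factory. A real high-performance
// transport would return a Socket subclass backed by, e.g., a
// specialized interconnect; here we just disable Nagle batching.
public class FastClientSocketFactory
        implements RMIClientSocketFactory, Serializable {
    public Socket createSocket(String host, int port) throws IOException {
        Socket s = new Socket(host, port);
        s.setTcpNoDelay(true);   // reduce latency for small messages
        return s;
    }
}
```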
It has been shown that it is possible to design fast, highly specialized forms of serialization and to re-host RMI over other special-purpose wire protocols. It may also be possible to build a smart, adaptive serialization/RMI protocol that uses knowledge about the context of the transaction to select the appropriate and available mechanism. At this point, however, this remains a Grande community research project, and the Grande group looks forward to working with Sun and its other partners on it.
In addition to the kernel VM and RMI benchmarks described above, the Grande Applications and Concurrency group has identified a series of real technical applications that can be provided to the community to support compiler and VM optimization efforts. This project has goals similar to those of the original NAS, Splash and Perfect Benchmarks, which were used by high-performance computer and compiler designers to gauge their progress. Many of the Perfect Benchmarks are now being integrated into the SPEC suite, which is the standard for the industry.
The Grande benchmarks should play the same role in the Java computing industry. The proposed benchmarks will include:
Each benchmark will be instrumented and have standard input data sets and configurations. Each will report success or failure at achieving the correct final state, and report on different aspects of performance relative to a C++ or Fortran program that implements the same computation. As with the NAS suite, each benchmark will contain small-, medium- and large-scale input data sets/configurations.
The role parallel computation plays in high-performance technical computing cannot be overstated. There are several ways to build a Java parallel computing environment.
One approach is to take the experience of the last ten years of parallel programming and build a set of Grande-parallelism design patterns that can be cast as a set of interfaces and base classes that simplify the task of writing parallel Java Grande applications. This API can then be hosted either on a set of concurrently executing VMs or in an environment where large numbers of native threads are well supported. Such an API may be as simple as a truly object-oriented version of MPI, or it may define a new category of distributed object aggregates and collective operations.
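To make the first approach concrete, here is a hedged sketch of what a truly object-oriented MPI-style interface might begin to look like; all names (Communicator, Reducer, allReduce) are invented for illustration, not a proposed standard:

```java
// Sketch of an object-oriented message-passing interface in the spirit
// of MPI; an implementation might bind each Communicator to one VM in
// a cluster or to one native thread group.
public interface Communicator {
    int rank();                                   // this process's id
    int size();                                   // number of processes
    void send(int dest, java.io.Serializable msg);
    Object receive(int source);

    // Collective operation: combine one value per process with op and
    // return the combined result to every process.
    Object allReduce(Object local, Reducer op);

    // User-supplied associative combining operation (e.g., sum, max).
    interface Reducer {
        Object combine(Object a, Object b);
    }
}
```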
A second approach, which may be more consistent with current Java directions, would be to design a Grande Bean specification that extends the basic Bean model to one appropriate for technical applications. This would follow what has been done with Enterprise Beans for transaction-oriented business applications. The Enterprise Beans model has allowed CORBA-based resources to be woven into a unified component model. Grande Beans can build upon this to incorporate high-end parallel computational modules, visualization and VR tools into a grid of resources controlled by the VM on the user's desktop system.
For the average scientist or engineer, one of the greatest difficulties in doing large-scale computation is the constant struggle required to port applications to each new environment, which involves a series of tedious tasks.
A seamless technical computing environment would provide a Java-based programming environment with a uniform interface to all of these remote resources. Java-based agents installed at each site could cooperate with users, guide them through the resource discovery and authorization process, and provide an integrated development environment for using these remote resources.
It is possible that such a system can be built on top of some of the existing and emerging meta-computing infrastructures. Many of these provide the tool kit and components to build on, and a few have partial solutions to the problems listed above. With a collective effort of the Grande team, it should be possible to do much more.