Java for parallel computing and as a general language for scientific and engineering simulation and modeling

We discuss the role of Java and Web technologies for general simulation. We classify the forms of concurrency typical in problems and analyze separately the role of Java in user interfaces, coarse grain software integration, and detailed computational kernels. We conclude that Java could become a major language for computational science, as it potentially offers good performance, excellent user interfaces, and the advantages of object-oriented structure.


Introduction
The World Wide Web provides important infrastructure for scientific and engineering computation. The distributed computing hardware of the Web has remarkable potential compute performance: 1,000 times that of the largest supercomputer. This ratio largely reflects the ratio of monetary investment in the two fields. Of course, the Web does not support the low latency and high bandwidth required by most parallel simulations. However, we believe that an attractive scientific computing environment can be built on top of Web software by adding, to the basic loosely coupled distributed computing model of the Web, the functionality needed for computational science. We analyze, in Section 2, the various forms of concurrency seen in applications and then, in the last three sections, discuss three major areas where Java can be effectively used. We conclude that Java could well become a dominant language in science and engineering.

Concurrency in Applications
In understanding the role of the Web in large-scale simulations, it is useful to classify the various forms of concurrency in problems into four types [Fox:96b].

Data Parallelism
This is illustrated by natural parallelism over the particles in a molecular dynamics computation, over the grid points in a partial differential equation, or over the random points in a Monte Carlo algorithm. In the Web computation of the factors of RSA130 [Cowie:97a], [RSA:97a], [RSA:97b], we can consider the parallelism over possible trials in the sieve algorithm as the "data" for data parallelism in this application. Data parallelism tends to be "massive" because computations are typically time consuming due to a large amount of data. Thus, data parallelism is parallelism over what is "large" in the problem. It is not difficult to find data parallel problems today with parallelism measured in the millions (e.g., a 100 × 100 × 100 grid).
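As a minimal sketch of parallelism over grid points, the following applies one relaxation sweep to a one-dimensional grid, where each interior point becomes the average of its neighbours. The class name GridSweep and the choice of update rule are illustrative assumptions, and the sketch assumes a modern JDK that provides parallel streams.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical example class: one relaxation sweep over a 1-D grid.
final class GridSweep {
    // Returns a new grid in which each interior point is the average of its two neighbours.
    static double[] sweep(double[] u) {
        double[] next = u.clone();               // boundary values are carried over unchanged
        IntStream.range(1, u.length - 1)
                 .parallel()                     // grid points may be updated by different workers
                 .forEach(i -> next[i] = 0.5 * (u[i - 1] + u[i + 1]));
        return next;
    }

    public static void main(String[] args) {
        double[] u = {0.0, 0.0, 0.0, 0.0, 1.0};
        System.out.println(Arrays.toString(sweep(u)));
    }
}
```

Each update reads only the old grid and writes one distinct element of the new grid, so the points can be processed in any order, which is what makes the loop data parallel.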

Functional Parallelism
Here we are thinking of typical thread parallelism, such as the overlap of computation (say, decompressing an image) and communication (fetching HTML from a server). More generally, problems typically support overlap of I/O (disk, visualization) with computation. We can also, of course, have multiple compute tasks executing concurrently. This form of parallelism is present in most problems; the units are of modest grain size (larger than a few instructions scheduled by a compiler, smaller than an application) and are typically not massively parallel. Further, such functional parallelism is typically implemented using shared memory and, indeed, its existence in most problems makes few-way parallel shared memory multiprocessors very attractive.
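The overlap of communication and computation described above might look roughly as follows. This is a sketch under stated assumptions: fetchPage is a stand-in that sleeps instead of contacting a server, decompress is a toy run-length decoder, and all names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example class: overlap a (simulated) fetch with a computation.
final class OverlapDemo {
    // Stand-in for communication: pretends to fetch a page; no network access is made.
    static String fetchPage() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "<html>hello</html>";
    }

    // Stand-in for computation: expands run-length-encoded text such as "3a2b" into "aaabb".
    static String decompress(String rle) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        for (char c : rle.toCharArray()) {
            if (Character.isDigit(c)) {
                count = count * 10 + (c - '0');
            } else {
                for (int k = 0; k < count; k++) out.append(c);
                count = 0;
            }
        }
        return out.toString();
    }

    // Starts the fetch on another thread, decompresses on the calling thread, then waits for the fetch.
    static String[] run(String rle) {
        CompletableFuture<String> page = CompletableFuture.supplyAsync(OverlapDemo::fetchPage);
        String image = decompress(rle);          // runs while the fetch is in progress
        return new String[] { image, page.join() };
    }

    public static void main(String[] args) {
        String[] r = run("3a2b");
        System.out.println(r[0] + " " + r[1]);
    }
}
```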

Object Parallelism
We could mean many things by this, but we have in mind the type of problems solved by discrete event simulators. These are illustrated by military simulations where the objects are "vehicles," "weapons," or "humans in the loop." The well-known SIMNET or DSI (Distributed Simulation Internet) have already illustrated the relevance of distributed (Internet) technology for this problem class [DMSO:97a]. Object descriptions are similar to data parallelism except that the fundamental units of parallelism, namely objects, are quite large, corresponding to a macroscopic description of an application. Thus, a military battle is described in terms of the units of force (tanks, soldiers) with phenomenological interactions rather than the (unrealistic in this case) fundamental description in terms of atomic particles or finite element nodes. For a typical "data parallel" problem, the fundamental units of parallelism (grid points) are typically smaller.
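A discrete event simulator of the kind referred to above can be sketched with a time-ordered event queue. The object names, the "fire" and "hit" actions, and the two-unit delay are invented for illustration and do not describe SIMNET or DSI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical example class: a minimal discrete event simulator.
final class EventSim {
    // An event: at a simulated time, some object performs an action.
    static final class Event implements Comparable<Event> {
        final double time;
        final String object;
        final String action;

        Event(double time, String object, String action) {
            this.time = time;
            this.object = object;
            this.action = action;
        }

        @Override
        public int compareTo(Event other) {
            return Double.compare(time, other.time);
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private final List<String> log = new ArrayList<>();

    void schedule(double time, String object, String action) {
        queue.add(new Event(time, object, action));
    }

    // Processes events in time order; a "fire" event schedules a "hit" two time units later.
    List<String> run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            log.add(e.time + " " + e.object + " " + e.action);
            if (e.action.equals("fire")) {
                schedule(e.time + 2.0, "target", "hit");
            }
        }
        return log;
    }

    public static void main(String[] args) {
        EventSim sim = new EventSim();
        sim.schedule(5.0, "soldier", "move");
        sim.schedule(1.0, "tank", "fire");
        sim.run().forEach(System.out::println);
    }
}
```

Simulated time advances from event to event rather than in fixed steps, and the units of work are whole objects rather than grid points.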

Metaproblems
This is another functional concurrency, but now with large grain size components. In image processing, one often sets up an analysis system where the pixels are processed by a set of separate filters, each with a different convolution or image understanding algorithm. Software systems such as AVS and Khoros are well-known tools to support such linked modules. So a metaproblem is a set of linked problems (databases, computer programs) where each unit is essentially a complete problem itself. Dataflow (a graph specifying how problems accept data from previous steps and produce data for further processing) is a successful paradigm for metaproblems. In manufacturing, one often sees metaproblems in building a complex system, such as an aircraft, which requires linking airflow, controls, manufacturing process, acoustic, pricing, and structural analysis simulations. It has been estimated that designing a complete aircraft could require some 10,000 separate programs: some complicated ones such as airflow simulation were mentioned above, but

Overview of Web and Parallel Computing Software Issues
We can view computing (as many other enterprises) in terms of a pyramid with widely deployed cheap systems at the bottom of the pyramid, and the few high-performance systems at the top (Figure 1). There is much more computing power in the distributed collection of consumer-oriented products: PCs, videogames, Personal Digital Assistants, Digital Set Top boxes, etc. This dominant dollar investment in the consumer products implies that one can expect the bottom of the pyramid to have much better software than the top. Software investment must be roughly proportional to market size, and so we see PCs, workstations, and MPPs (Massively Parallel Processors) offering increasing unit software price and decreasing software quality and functionality. The Web, perhaps, now offers the best available software (as it is potentially the largest market). When the PC market dominated quality consumer software, it was hard for the parallel processing community to take advantage of it. PCs offer, of course, a sequential computer model, but now the Web software targets a very rich distributed computing model. It seems clear to us that we can, and indeed must, build MPP software with a backbone architecture of Web software. As mentioned in the Introduction, we can then view parallel processing as a special case of a distributed model with stringent synchronization constraints. We view this as leading to a set of Compute Webs, which we describe in the following sections. This approach has the added advantage that we can build Compute Webs by either running Web clients or servers with synchronization/compute enhanced Web software, or use the latter software to provide a very attractive user environment on specialized MPPs whose low latency
In the following, we discuss the role of Web hardware and, especially, software for three distinct parts of computation (Figure 2).
1. User (client) view: problem specification, visualization, computational steering, data analysis
2. Metaproblem implemented on a distributed computer
3. Individual computationally complex components of the metaproblem implemented on high-performance computers, which could in fact be a distributed system itself.
We cover these three parts (graphical user interface, dataflow for metaproblems/software integration, and hardcore computation) in the next three sections.

WebWindows and the User View
We abstract future high-performance computing environments into four layers, shown in Figure 3.
b) Portable flexible programming at some performance cost, illustrated by Java in applet mode.
c) Traditional compiled level: offering a high-level language with few machine-dependent features, and getting high performance (traditionally within about a factor of two of the peak performance possible on the particular algorithm), illustrated by coupled Fortran, C, C++, and Java.
d) Traditional machine-specific level: rarely used by application programmers or even those building (high-level) tools. Clearly, this allows the user to obtain peak performance at the cost of a very inconvenient programming environment.
Levels c) and d) include the computationally intense parts of the problem, which can be implemented on appropriate servers. However, levels a) and b), which we discuss in this section, are likely to be executed in the client machine/environment. We describe the current trends in software strategy [Fox:95d] as a shift from Macintosh, UNIX, and similar environments to a WebWindows basis, i.e., software built on the interfaces defined by Web servers and Web clients. As shown in Figure 4, this is, of course, a valid approach whether one is writing for a single stand-alone machine (running a Web server and client) or the entire worldwide network. In this sense, the use of Web technology for user interfaces is trivial: the user interface is not constrained greatly by the difficulties of high-performance computation, as it runs on the "conventional" client side and so can naturally use the best client side technologies. Some examples of Web based user interfaces are: NCSA's biology workbench [NCSA:97a], which is a CGI interface to a collection of useful computational biology resources.
Figure 4: WebWindows for One PC or the World (panels: WebWindows for a Standalone PC; Network of Web Servers and Clients)
We expect this type of interface development to continue and become the norm. However, we see a particularly important role for Java (and VRML) in terms of level b) of Figure 3. Namely, Java seems an attractive language for building client side data analysis systems. These typically involve both computation and visualization, and Java has unique capabilities for linking the two. Thus, we expect a set of high quality Java applets (or compiled plug-ins) to be developed which support this analysis. Those applets will be used at level a) of Figure 3 by the general user, with the expert modifying the code of the applets (level b)) for customized capability. A good example of Java for scientific visualization is the work of Cornell [Houle:97a] and Syracuse [Warner:97a] on applets for teaching fracture mechanics, membrane physics, and other scientific areas. We depict, in Figure 5, the resultant environment, which essentially becomes a Java wrapper for code written in traditional languages and running on sequential, parallel, or distributed computers. This use of Java is likely to grow rapidly, as it requires modest changes to existing software and adds great value without changing the familiar programming paradigm. However, we see it as a natural Web "seed" that can grow into the more pervasive use of Java.

WebFlow and Coarse Grain Software Integration
As we have discussed, it is very natural to use Web hardware and software to implement control of metaproblems [WBH:97a]. Although we only described earlier the dataflow model for this, one can, of course, use these ideas for any application with linked components that have relatively large chunks of computation that dwarf the latency and bandwidth implied by using the Web as a compute engine. In fact, we can include our recently completed RSA130 factoring project [Cowie:97a], [RSA:97a] in this class. This distributed the sieving operations over a diverse range of clients (from an IBM SP2 at NPAC to a 386 laptop in England) under the control of a set of servers. This was implemented as a set of Web server CGI Perl scripts, FAFNER [RSA:97a], [RSA:97b]. These created daemons to control the computation on each client, which returned results to the server that accumulated results for final processing to locate factors.
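The server and client roles described above can be sketched as a work farm: a coordinator splits a search range into chunks, workers process the chunks independently, and the coordinator accumulates what they return. This is not the FAFNER code or the sieve used for RSA130; plain trial division over small numbers stands in for the real computation, a local thread pool stands in for remote clients, and every name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example class: a coordinator farming out independent chunks of work.
final class WorkFarm {
    // Work done by one "client": test candidate divisors in [lo, hi) and report those dividing n.
    static List<Long> searchRange(long n, long lo, long hi) {
        List<Long> found = new ArrayList<>();
        for (long d = Math.max(lo, 2); d < hi; d++) {
            if (n % d == 0) found.add(d);
        }
        return found;
    }

    // The "server": splits [2, limit) into chunks, hands them out, and accumulates the results.
    static List<Long> farm(long n, long limit, int clients, long chunk) {
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        try {
            List<CompletableFuture<List<Long>>> pending = new ArrayList<>();
            for (long lo = 2; lo < limit; lo += chunk) {
                final long a = lo;
                final long b = Math.min(lo + chunk, limit);
                pending.add(CompletableFuture.supplyAsync(() -> searchRange(n, a, b), pool));
            }
            List<Long> all = new ArrayList<>();
            for (CompletableFuture<List<Long>> f : pending) {
                all.addAll(f.join());            // results are collected in chunk order
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(farm(8633L, 100L, 4, 10L));   // 8633 = 89 * 97
    }
}
```

Because the chunks share no state, the coordinator needs no synchronization beyond waiting for each result, which is why this style tolerates the latency of a loosely coupled network.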
Note that a particularly interesting later computation (155 decimal digit or 512 binary digit factorization) would require about a teraop-month of computation (10,000 Pentium Pro PCs running flat out for a month) and will be quite practical as a Web computing project. 512 binary digit numbers are used as the basis of the security of many banking systems, which perhaps fail to realize that modern computing can crack such codes.
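As a rough consistency check on these figures, and assuming "teraop" is read as $10^{12}$ operations per second, spreading that rate over 10,000 machines implies the following sustained rate per PC:

\[
\frac{10^{12}\ \text{op/s}}{10^{4}\ \text{PCs}} = 10^{8}\ \text{op/s per PC} = 100\ \text{Mop/s per PC}.
\]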
We can extract two types of computing tasks from our factorization experience [Fox:95a]. The first is the resource management problem: identifying computer resources on the Web, assigning them suitable work, releasing them to users when needed, etc. A sophisticated Web system, ARMS [ARMS:96a], is being developed for this by Lifka at the Cornell Theory Center. Well-known distributed computing systems in this area include LSF, DQS, Codine, and Condor (see review in [Cluster:96a]), and this seems a very natural area for the use of Web systems, including linked databases to store job and machine parameters.
The second task is the actual synchronization of computation within a given problem; resource management, on the other hand, assigns problems to groups of machines and does not get involved with detailed parallel computing algorithm and synchronization issues. Here, we see two general concepts. One is support of the messaging between individual nodes that creates a virtual (parallel) machine out of the World Wide Web.
We call this low level support WebVM (Figure 1 of [Bhatia:97a]), and it should implement the functionality of parallel systems such as MPI in terms of Web technology message systems: either HTTP or direct Java server-server (client) connections. Here, the most elegant model is perhaps based on a mesh of Web servers [Bhatia:97a], [HPDC:96a], although today's most powerful implementations would use, like FAFNER, a mesh of Web clients controlled by a few servers [Alexandrov:97a], [JW:97a], [Superweb:96a]. In the spirit of WebWindows, we can expect servers or server equivalent capability to become available on all Web connected machines. Note that one can argue that the natural Web model is server-server, and not server-client, and indeed this supports the traditional NII dream of democracy with everybody capable of either publishing or consuming information.
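The send and receive functionality that such a messaging layer would expose can be sketched as follows. This is not WebVM or MPI: the class MiniVM is hypothetical, and in-process blocking queues stand in for the HTTP or socket links between real nodes, so that the example runs without a network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class: point-to-point messaging between numbered nodes.
final class MiniVM {
    // One mailbox per node; a real system would replace these with network connections.
    private final List<BlockingQueue<String>> inbox = new ArrayList<>();

    MiniVM(int nodes) {
        for (int i = 0; i < nodes; i++) inbox.add(new LinkedBlockingQueue<>());
    }

    // Send: deposit a message in the destination node's mailbox.
    void send(int dest, String message) {
        inbox.get(dest).add(message);
    }

    // Blocking receive: wait until a message for this node arrives.
    String recv(int self) {
        try {
            return inbox.get(self).take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a message", e);
        }
    }

    // Node 1 echoes back, in upper case, whatever node 0 sends it.
    static String pingPong(String text) {
        MiniVM vm = new MiniVM(2);
        Thread node1 = new Thread(() -> vm.send(0, vm.recv(1).toUpperCase()));
        node1.start();
        vm.send(1, text);
        String reply = vm.recv(0);
        try {
            node1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reply;
    }

    public static void main(String[] args) {
        System.out.println(pingPong("hello"));
    }
}
```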
On top of WebVM, one can build higher level systems, such as the distributed shared memory model (called WebHPL in Figure 1 of [Bhatia:97a]) or, more easily, an explicit message passing system, such as the dataflow model. WebFlow supports a graphical user interface ([Fox:95a], [Fox:95d], [WBH:97a]) specifying metaproblem component linkage, and one can naturally design domain specific problem solving environments in this framework.
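The dataflow model of linked components can be sketched as a small graph of named modules, each consuming the outputs of the modules upstream of it. This is an illustration only and does not reproduce the WebFlow interface; the class Dataflow, the module names, and the use of a single number as the data passed between modules are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical example class: evaluating a dataflow graph of linked modules.
final class Dataflow {
    // A module consumes the outputs of its upstream modules and produces one output.
    private static final class Module {
        final List<String> inputs;
        final Function<List<Double>, Double> body;

        Module(List<String> inputs, Function<List<Double>, Double> body) {
            this.inputs = inputs;
            this.body = body;
        }
    }

    private final Map<String, Module> modules = new LinkedHashMap<>();
    private final Map<String, Double> results = new LinkedHashMap<>();

    void add(String name, List<String> inputs, Function<List<Double>, Double> body) {
        modules.put(name, new Module(inputs, body));
    }

    // Evaluates a module after evaluating everything it depends on (the graph is assumed acyclic).
    double evaluate(String name) {
        Double cached = results.get(name);
        if (cached != null) return cached;
        Module m = modules.get(name);
        List<Double> args = new ArrayList<>();
        for (String in : m.inputs) args.add(evaluate(in));
        double out = m.body.apply(args);
        results.put(name, out);                  // each module runs at most once
        return out;
    }

    // Builds source -> {square, negate} -> sum and evaluates the final module.
    static double demo() {
        Dataflow g = new Dataflow();
        g.add("source", Arrays.asList(), in -> 3.0);
        g.add("square", Arrays.asList("source"), in -> in.get(0) * in.get(0));
        g.add("negate", Arrays.asList("source"), in -> -in.get(0));
        g.add("sum", Arrays.asList("square", "negate"), in -> in.get(0) + in.get(1));
        return g.evaluate("sum");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a metaproblem each module would be a complete program or database query rather than a one-line function, and the edges would carry files or messages rather than single numbers.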
In the notation of Figure 3, one would support scripted "little languages" (designed for each application) at the top level a) (in the classification of Section 4), which would allow for more flexible and dynamic metaproblem component linkage. An interesting feature of the coarse-grain Web computing models is that they naturally join collaboration with computing, as both are naturally implemented by linked Java servers and clients [Beca:97a], [Cowie:97a]. This could lead to powerful new approaches to multidisciplinary design, computational steering, and other applications linking computers and people in the loop. Now is, of course, a confusing time, for as shown in Table 1 of [Bhatia:97a], there are as many compute-web implementation strategies as there are major players in emerging Web technology, especially as we evolve from powerful, but rather ad hoc, server side CGI scripts to integrated dynamic Java client and server systems. Thus, now is not the time for "final solutions" but rather for experimentation and flexibility to examine and influence the key building blocks of future Web computers.
Finally, note that the Web encourages new models for computation, with problems publishing their needs, Web compute engines advertising their capabilities, and dynamic matching of problems with compute resources [Acharya:97a].
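One way to picture this publish-and-advertise model is a registry that records advertised engines and answers requests from problems. The sketch below is hypothetical throughout: the two attributes (processor count and low latency) and the rule of picking the smallest sufficient engine are assumptions made for the example, not a description of any cited system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical example class: matching published needs against advertised capabilities.
final class Matchmaker {
    static final class Engine {
        final String name;
        final int processors;
        final boolean lowLatency;

        Engine(String name, int processors, boolean lowLatency) {
            this.name = name;
            this.processors = processors;
            this.lowLatency = lowLatency;
        }
    }

    private final List<Engine> advertised = new ArrayList<>();

    // A compute engine advertises its capabilities.
    void advertise(String name, int processors, boolean lowLatency) {
        advertised.add(new Engine(name, processors, lowLatency));
    }

    // A problem publishes its needs; returns the smallest advertised engine that satisfies them.
    Optional<String> match(int minProcessors, boolean needsLowLatency) {
        Engine best = null;
        for (Engine e : advertised) {
            boolean ok = e.processors >= minProcessors && (e.lowLatency || !needsLowLatency);
            if (ok && (best == null || e.processors < best.processors)) best = e;
        }
        return best == null ? Optional.empty() : Optional.of(best.name);
    }

    // A registry populated with three invented engines.
    static Matchmaker demo() {
        Matchmaker m = new Matchmaker();
        m.advertise("laptop", 1, false);
        m.advertise("cluster", 64, false);
        m.advertise("mpp", 128, true);
        return m;
    }

    public static void main(String[] args) {
        Matchmaker m = demo();
        System.out.println(m.match(32, false));
        System.out.println(m.match(32, true));
    }
}
```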

Java as the Language for Computational Science and Engineering
The Syracuse workshop [Javaforcse:97a] covered, generally, the topics of the last two sections, where we saw Java as clearly attractive for user interfaces, wrappers, and metaproblem control. However, there was particular interest in Java's possible role as the basic programming language for science and engineering, taking the role now played by Fortran 77, Fortran 90, and C++. We now discuss this controversial area. Java's most important advantage over other languages is that it will be learnt and used by a broad group of users. Java is already being adopted in many entry level college programming courses and will surely be attractive for teaching in middle or high schools. Java is a very social language, as one naturally gets Web pages from one's introductory Java exercises that can be shared with one's peers. We have found this to be a helpful feature for introductory courses. Of course, the Web is the only real exposure to computers for many children, and the only languages they are typically exposed to are Java, JavaScript, and Perl. We find it difficult to believe that entering college students, fresh from their Java classes, will find it easy to accept Fortran, which will appear quite primitive in contrast. C++, as a more complicated systems building language, may well be a natural progression, but although quite heavily used, C++ has limitations as a language for simulation. In particular, it is hard for C++ to achieve good performance even on sequential and parallel code, and we expect Java not to have these problems.
In fact, let us now discuss performance, which is a key issue for Java. As shown in Figure 3, we have already suggested a multilevel scientific programming environment that would use purely scripted, applet mode, and purely compiled environments with different tradeoffs in usability and performance. As discussed at our workshop, there seems little reason why native Java compilers, as opposed to current portable JavaVM interpreters or Just in Time (JIT) compilers, cannot obtain comparable performance to C or Fortran compilers. As stressed by Budimlic at the workshop [Budimlic:96a], [Budimlic:97a], a major difficulty is the rich exception framework allowed by Java, which could restrict compiler optimizations. Users would need to avoid complex exception handlers in performance critical portions of a code.
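One programming style consistent with this advice is to validate inputs once, before the performance critical loop, so that the loop body itself contains no exception handlers. The sketch below only illustrates that style; the class name is hypothetical, and how much a particular compiler benefits is not something the discussion above quantifies.

```java
// Hypothetical example class: a kernel whose inner loop contains no exception handlers.
final class Kernel {
    // Validation happens once, outside the hot loop.
    static double dot(double[] x, double[] y) {
        if (x == null || y == null || x.length != y.length) {
            throw new IllegalArgumentException("vectors must be non-null and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];                  // straight-line arithmetic only
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(dot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
    }
}
```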
An important feature of Java is the lack of pointers; their absence, of course, allows much more optimization for both sequential and parallel codes. Optimistically, we can say that Java shares the object-oriented features of C++ and the performance features of Fortran.
One interesting area is the expected performance of Java interpreters (using just in time techniques) and compilers on the Java bytecodes (Virtual Machine). Here, we find today perhaps a factor of 2 to 4 lower performance from a PC JIT compiler compared to C compiled code [Applets:96a], [Linpack:96a]. Consensus at the workshop expected this performance degradation to be no worse than a factor of two for the portable applet mode. As described above, with some restrictions on programming style, we expect Java language or VM compilers [Cierniak:97a] to be competitive with the best Fortran and C compilers. Note that we can also expect a set of high performance "native class" libraries to be produced that can be downloaded and accessed by applets to improve performance in the usual areas where one builds scientific libraries.
One interesting omission in the framework of Figure 3 is a purely interpreted version of Java at level a). This would also be very helpful for teaching. JavaScript is interpreted, but we would view it as a "little language" for document handling, and not a general Java-like interpreted environment.
Finally, we will discuss parallelism in Java. Here, we return to the four categories of concurrency.

Data Parallelism
This is supported in Fortran by either high level data parallel HPF or, at a lower level, Fortran plus message passing (MPI). Java does not have any built in parallelism of this type, but at least the lack of pointers means that natural parallelism is less likely to get obscured. There seems no reason why Java cannot be extended to high level data parallel form (HPJava) in a similar way to Fortran (HPF) or C++ (HPC++) [Carpenter:97a], [DBC:97a]. Parallelism can be supported on both shared [Bik:97a] and distributed memory architectures for the SPMD programming model [Hummel:97a]. At the lower message passing level, the situation is clearly satisfactory for Java, as the language naturally supports inter-program communication, and the standard capabilities of high-performance message passing are being implemented for Java [Foster:97a], [Globus:96a].
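The SPMD style mentioned above, in which every worker runs the same program on its own block of the data, can be sketched with ordinary Java threads on shared memory. This is not HPJava or any of the cited systems; the class name, the block decomposition, and the final reduction on the calling thread are illustrative assumptions.

```java
// Hypothetical example class: an SPMD-style sum with one thread per rank.
final class Spmd {
    // Every worker runs this same method, distinguished only by its rank.
    static double partialSum(double[] data, int rank, int size) {
        int lo = (int) ((long) data.length * rank / size);        // start of this rank's block
        int hi = (int) ((long) data.length * (rank + 1) / size);  // end of this rank's block
        double s = 0.0;
        for (int i = lo; i < hi; i++) s += data[i];
        return s;
    }

    // Launches one thread per rank, then combines the partial results.
    static double sum(double[] data, int size) {
        double[] partial = new double[size];
        Thread[] workers = new Thread[size];
        for (int r = 0; r < size; r++) {
            final int rank = r;
            workers[r] = new Thread(() -> partial[rank] = partialSum(data, rank, size));
            workers[r].start();
        }
        double total = 0.0;
        for (int r = 0; r < size; r++) {
            try {
                workers[r].join();               // wait for this rank before reading its result
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
            total += partial[r];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] data = new double[10];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data, 3));
    }
}
```

On a distributed memory machine the shared array of partial results would be replaced by explicit messages to a root process, but the per-rank code would keep the same shape.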

Modest Grain Size Functional Parallelism
This is built into the language with threads for Java and has to be added explicitly with libraries for Fortran and C++.

Object Parallelism
This is quite natural for C++ or Java, where the latter can use the applet mechanism to portably represent objects. We have built a collaboration system, TANGOsim, where a Java server controls a set of Java applets and other applications spawned from them [Beca:97a], [Beca:97b], [Beca:97c], [Tango:96a]. We generalized the session manager present in collaborative systems to be a full event driven simulator. This illustrates the power of Java for this problem class and shows that it can unify traditional time stepped simulations (typical for data parallelism) with event driven forces modeling and other such simulations [Cowie:97a].

Metaproblems
We have already discussed, in Section 5, the power of Java in this case for overall coarse grain software integration.
In summary, we see that Java has no obvious major disadvantages and some clear advantages compared to C++ and especially Fortran as a basic language for large scale simulation and modeling. Obviously, we should not and cannot port all our codes to Java. Rather, we can start using Java for wrappers and user interfaces. As compilers get better, we expect users will find it more and more attractive to use Java for new applications. Thus, we can expect to see a growing adoption by computational scientists of Web technology in all aspects of their work.