WebFlow: A Visual Programming Paradigm for Web/Java Based Coarse Grain Distributed Computing

We present here the recent work at NPAC aimed at developing WebFlow, a general-purpose Web-based visual interactive programming environment for coarse grain distributed computing. We follow a 3-tier architecture with the central control and integration WebVM layer in tier-2, interacting with the visual graph editor applets in tier-1 (front-end) and the legacy systems in tier-3. WebVM is given by a mesh of Java Web servers such as Jeeves from JavaSoft or Jigsaw from MIT/W3C. All system control structures are implemented as URL-addressable servlets, which enable Web browser-based authoring, monitoring, publication, documentation and software distribution tools for distributed computing. We view WebFlow/WebVM as a promising programming paradigm and coordination model for the exploding volume of Web/Java software, and we illustrate it with a set of ongoing application development activities.


Introduction
As anticipated in our WebWindows ansatz [WebHPCC96], current Web systems, fueled by Java, are evolving rapidly towards a powerful open infrastructure that will enable world-wide distributed computing. In the current Web/Java expansion phase, we are witnessing a wide variety of interesting new tools and technologies, but the overall integration framework is still missing and software reuse remains difficult. We need a coarser grain encapsulation unit than a Java class to enable user-friendly distributed computing on the Web. Several attempts at specifying such a framework are in fact underway, for example JavaBeans from JavaSoft. However, the Web software industry is currently focused mainly on front-end support for component-based GUI integration, whereas the middleware and back-end layers are still an open research and prototyping area.
At NPAC, we are monitoring the emerging Web technologies pertaining to world-wide scalable distributed computing, and we are designing and prototyping a visual graph based dataflow environment, WebFlow, that uses a mesh of Java Web servers, WebVM, as its control and coordination middleware.
In this document, we briefly review our Web technology evaluation activities in Sections 2-4, followed by the presentation of our WebFlow/WebVM prototype (tier-2) in Section 5, which forms the core of this paper. This is followed by a discussion of the WebFlow front-end (tier-1) in Section 6 and some initial back-end (tier-3) activities in Section 7. Finally, in Section 8 we summarize a set of planned or ongoing application development activities in the areas of command and control, telemedicine, distance education and Internet commerce that will build on top of the WebFlow/WebVM infrastructure.

At NPAC, where we are closely monitoring this `bleeding edge' of the interactive Web, we observe that although these new systems offer attractive capabilities, the current generation of Java software is still difficult to customize, repackage or reuse. The reason is that a Java class is too small, too fine-grained an encapsulation unit, and hence reusing a package usually requires a detailed understanding of a large number of its tightly interwoven classes.

Figure 1: Overview of the WebFlow/WebVM architecture: WebVM is formed in tier-2 as a mesh of Java Web servers, managing WebFlow nets (or compute-webs) and interacting with the legacy systems in tier-3 (back-end) and with the visual graph editor applets in tier-1 (front-end).
3 WebFlow/WebVM Concepts
Our goal is to provide a coarser grain packaging model and an associated user-friendly authoring framework for distributed applications on the Web. We believe that we should build on top of established standards such as HTTP, HTML and Java, and hence we adopt the Java Web server as the base runtime and coordination node of any distributed Web system. The dataflow model, already proven effective by previous systems such as AVS, Khoros, CODE [Browne92], HeNCE [Dong94] and others, seems a natural coordination framework for extending the current 2-node model, in which HTTP/MIME data flows between a Web client and a server, towards multi-server systems.
Hence, we propose a runtime environment given by a mesh of Java Web servers that coordinates a distributed computation represented as a set of channel-connected coarse grain Java modules. Modules are thin-veneer Java interfaces, so that any chunk of Java code can easily be modularized and connected to other modules via suitable communication ports, which act as terminals of point-to-point dataflow channels. Modules run asynchronously, are mobile (i.e., they can be instantiated on any WebVM server), and communicate by exchanging Java objects along their dataflow channels.
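To make the module/channel model concrete, the following minimal Java sketch wires two modules together with one point-to-point channel. All names (`Channel`, `FlowModule`, `Counter`, `Summer`, `DataflowDemo`) are hypothetical; an in-process `BlockingQueue` stands in for the network channel that WebVM would set up between servers, and the end-of-stream marker is an assumption of the sketch, not part of WebFlow.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical point-to-point dataflow channel carrying Java objects.
// An in-process queue stands in for a network connection between servers.
class Channel<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    void send(T item) throws InterruptedException { queue.put(item); }
    T receive() throws InterruptedException { return queue.take(); }
}

// Hypothetical "thin veneer" module interface: any Runnable chunk of Java.
interface FlowModule extends Runnable { }

// Source module: writes 1..n to its output port, then an end marker.
class Counter implements FlowModule {
    static final int END = -1; // end-of-stream marker (assumption of this sketch)
    private final Channel<Integer> out;
    private final int n;
    Counter(Channel<Integer> out, int n) { this.out = out; this.n = n; }
    public void run() {
        try {
            for (int i = 1; i <= n; i++) out.send(i);
            out.send(END);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Sink module: sums everything arriving on its input port until the end marker.
class Summer implements FlowModule {
    private final Channel<Integer> in;
    private int total;
    Summer(Channel<Integer> in) { this.in = in; }
    public void run() {
        try {
            for (int v = in.receive(); v != Counter.END; v = in.receive()) total += v;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
    int total() { return total; }
}

class DataflowDemo {
    // Runs both modules asynchronously, each in its own thread, then collects the result.
    static int runPipeline(int n) {
        Channel<Integer> ch = new Channel<>();
        Summer sink = new Summer(ch);
        Thread source = new Thread(new Counter(ch, n));
        Thread consumer = new Thread(sink);
        source.start();
        consumer.start();
        try {
            source.join();
            consumer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sink.total();
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(10)); // prints 55
    }
}
```

Neither module knows where its peer runs; each only sees its port, which is the property that lets WebVM relocate modules between servers.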
Aspects of such an emerging architecture can already be found in current systems, analyzed in Table 1. For example: Jigsaw/Jeeves develop the concept of resources/servlets as control encapsulation units; Infospheres develops portlets/mailboxes as terminals for communication channels; Habanero is a multi-server system; and so on.

Early Experiments
We initiated the WebFlow/WebVM design process by experimenting with existing systems. Over the summer and fall of 1996, we evaluated a suite of new Java systems including Aglets, Habanero, Infospheres, Jeeves, Jigsaw, JSDA, Shaking Hands, and others. One of the early decisions we made was that, rather than developing custom Java servers from scratch as in Habanero or Infospheres, we prefer to add new services and maintain them within the address space of the Java Web server.
Such an organization facilitates management and offers natural, Web browser-based monitoring, publication and distribution support for Web software.
Figures 2 and 3 illustrate our early experiments with Jigsaw, where we constructed a chat collaboratory as a Jigsaw resource (Figure 2) and formed a token ring by connecting a set of Jigsaw resources, viewed as WebFlow modules, using Infospheres portlets (Figure 3). Later on, we switched to the Jeeves model, since the servlet API is likely to become a standard, given by the core Java package java.servlet. We intend to continue exploring Jigsaw and other promising public domain Java systems, and we tentatively base the WebFlow/WebVM prototype development on the Jeeves server architecture.
5 Tier-2 WebFlow/WebVM Prototype
called modules and they communicate by sending objects along channels attached to module ports. Unlike management servlets, which are usually persistent and application independent, modules are more transient and can be dynamically created, connected, scheduled, run, relocated and destroyed by servlets. WebFlow is a particular programming paradigm implemented over WebVM and given by a dataflow programming model (other models under experimentation include data parallel, collaboratory, and televirtual paradigms). A WebFlow application is given by a computational graph, visually edited by end-users using Java applets.
Modules are written by module developers, people who have only limited knowledge of the system on which the modules will run. They need not concern themselves with issues such as allocating and running the modules on various machines, creating connections among the modules, sending and receiving data across these connections, or running several modules concurrently on one machine. The WebFlow system hides these management and coordination functions from the developers, allowing them to concentrate on the modules being developed.
WebFlow management is currently implemented in terms of the following three servlets: Session Manager, Module Manager, and Connection Manager. These servlets are URL-addressable and can offer dynamic information about their services and current state. They can also communicate with each other through sockets, as discussed in the next section.
Figure 4 illustrates the three base servlets employed in setting up and managing WebFlow operation. The Session Manager receives the graph specification from the editor applet, creates an image of the whole compute-web using module proxy objects called ModuleRepresentation, decides on the compute-web decomposition strategy, and notifies the Module Manager about the local modules to be instantiated.
The Module Manager starts and maintains ModuleWrapper threads that run Modules. Each module, when created, notifies the ConnectionManager about the connectivity required by the module's Ports, and waits for the connections to be established.
WebFlow channels connecting two module Ports are formed dynamically by the corresponding ConnectionManagers: the Sockets returned by their 'accept' and 'connect' calls are passed to the appropriate ports. After all ports of a module receive their requested sockets, the module notifies the Module Manager and is ready to participate in the dataflow operations.
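The accept/connect handshake can be sketched with standard `java.net` sockets on the loopback interface. In this illustrative version, the class names `Port` and `ChannelSetupDemo` are hypothetical, both Connection Manager roles run in one process, and a `CountDownLatch` per module models "all ports have received their sockets"; the payload is sent with Java object serialization, in line with modules exchanging Java objects.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical port: usable once a connected socket has been attached to it.
class Port {
    private final CountDownLatch moduleReady; // counts down the owning module's unconnected ports
    private volatile Socket socket;
    Port(CountDownLatch moduleReady) { this.moduleReady = moduleReady; }
    void attach(Socket s) { socket = s; moduleReady.countDown(); }
    Socket socket() { return socket; }
}

class ChannelSetupDemo {
    // Forms one channel between an output port and an input port, then sends one object through it.
    static String connectAndSend(String message) {
        CountDownLatch senderReady = new CountDownLatch(1);   // sender module has one port
        CountDownLatch receiverReady = new CountDownLatch(1); // receiver module has one port
        Port out = new Port(senderReady);
        Port in = new Port(receiverReady);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free local port
            // 'accept' side, as one Connection Manager would run it
            Future<?> acceptor = pool.submit(() -> { in.attach(server.accept()); return null; });
            // 'connect' side, as the peer Connection Manager would run it
            out.attach(new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort()));
            acceptor.get();
            // Every port now has its socket: each module could notify its Module Manager here.
            senderReady.await();
            receiverReady.await();
            ObjectOutputStream writer = new ObjectOutputStream(out.socket().getOutputStream());
            writer.writeObject(message);
            writer.flush();
            ObjectInputStream reader = new ObjectInputStream(in.socket().getInputStream());
            String received = (String) reader.readObject();
            out.socket().close();
            in.socket().close();
            return received;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(connectAndSend("hello WebFlow"));
    }
}
```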

WebFlow requirements
The requirements placed on WebFlow stem from the discussion above. Namely, WebFlow shall: allow modules to be run on demand; support communication between the modules; and provide facilities for the user to create and destroy an application, where an application is a set of interconnected modules. To support these requirements, the following components have been created: the Module Manager, in charge of running modules on demand; the Connection Manager, in charge of creating connections between the modules; and the Session Manager, in charge of executing all the actions the user performs on the front end. In the following section, we describe each of these management entities in more detail.

WebFlow management
Module Manager
The Module Manager is the simplest of the three system components. It is in charge of running modules on demand. A user/editor request to create a module is sent to the Module Manager residing on the particular machine on which the module should be run. The Module Manager creates a separate thread for the module (thus enabling concurrent execution of multiple modules) and loads the module code, making the module ready for execution.
A request to run (destroy) a module triggers a special method called run (destroy). These methods are written by the module developers.
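A Module Manager along these lines can be sketched with Java reflection and one thread per module. The `WebFlowModule` interface, the integer ids, and the `HelloModule` example are hypothetical; the sketch only mirrors the create/run/destroy behavior described above, and, like the described Module Manager, it keeps no notion of a session.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical module contract: run() and destroy() are written by module developers.
interface WebFlowModule {
    void run();
    void destroy();
}

// Minimal Module Manager sketch: modules keyed by id, no session bookkeeping.
class ModuleManager {
    private final Map<Integer, WebFlowModule> modules = new ConcurrentHashMap<>();
    private final Map<Integer, Thread> threads = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    // "Create": load the module code by class name and instantiate it.
    int create(String className) {
        try {
            WebFlowModule m = (WebFlowModule) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            int id = nextId.incrementAndGet();
            modules.put(id, m);
            return id;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load module " + className, e);
        }
    }

    // "Run": execute the module's run() in its own thread.
    void run(int id) {
        Thread t = new Thread(modules.get(id)::run, "module-" + id);
        threads.put(id, t);
        t.start();
    }

    // "Destroy": call the developer's destroy() hook, then wait for the thread to end.
    void destroy(int id) {
        WebFlowModule m = modules.remove(id);
        if (m != null) m.destroy();
        Thread t = threads.remove(id);
        if (t != null) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    int liveModules() { return modules.size(); }
}

// Example module used for illustration only.
class HelloModule implements WebFlowModule {
    static final AtomicInteger runs = new AtomicInteger();
    public void run() { runs.incrementAndGet(); }
    public void destroy() { /* nothing to release */ }
}
```

Creating the same class twice yields two independent instances with distinct ids, which is what allows several copies of one module to run on the same machine.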
An important observation is that the Module Manager has no notion of a session built into it. It can support any number of modules, and requests coming from any number of Session Managers.
Connection Manager
The Connection Manager is in charge of establishing connections between modules. To be precise, it establishes connections between individual ports, regardless of the module on which they reside and regardless of the machine on which the module is run.
As each module is initialized, its ports register with the Connection Manager.This enables the Connection Manager to establish connections between registered ports as illustrated in Figure 6.
To connect port 1 and port 2 in Figure 6, a connect request is received by the first Connection Manager in step 1. In step 2, an establish request is sent to the second Connection Manager, which then, in step 3, sends an OK message back to the first Connection Manager to acknowledge the establish request. In step 4, the second Connection Manager sends a connection back to the first Connection Manager, which receives the connection and passes it on to the port. Finally, in step 5, the first Connection Manager replies that the operation has succeeded. If an error occurs at any stage of the protocol, error messages are sent back instead of OK messages, thus aborting the protocol and notifying the caller that the connection failed.
The figure shows the more general case, in which the two ports reside in separate Connection Managers. Of course, the two ports may be registered at the same Connection Manager, in which case the connection procedure is simplified and steps 2 and 3 are not needed. As with the Module Manager, the Connection Manager has no notion of a session built into it; it can support any number of Session Managers.
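The five-step protocol, including the single-manager shortcut and the error path, can be traced with an in-process simulation. This sketch is only illustrative: the class, method and message names are hypothetical, and direct method calls stand in for the socket messages that real Connection Managers would exchange.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-process sketch of the connect protocol between Connection Managers.
// Names and messages are illustrative; real managers would exchange them over sockets.
class ConnectionManager {
    private final String name;
    private final Set<String> ports = new HashSet<>();
    final List<String> log = new ArrayList<>();

    ConnectionManager(String name) { this.name = name; }

    // Ports register as their module is initialized.
    void register(String port) { ports.add(port); }

    // Remote side of steps 2-3: answer an establish request with OK (true) or ERROR (false).
    boolean establish(String port) { return ports.contains(port); }

    // Step 1: a connect request for (localPort, remotePort) arrives at this manager.
    boolean connect(String localPort, ConnectionManager remote, String remotePort) {
        log.clear();
        log.add("1 connect " + localPort + " -> " + remote.name + ":" + remotePort);
        if (!ports.contains(localPort)) return fail();
        if (remote != this) {                     // general case: two managers involved
            log.add("2 establish -> " + remote.name);
            boolean ok = remote.establish(remotePort);
            log.add(ok ? "3 OK" : "3 ERROR");
            if (!ok) return fail();
        } else if (!ports.contains(remotePort)) { // same manager: steps 2 and 3 are skipped
            return fail();
        }
        log.add("4 connection handed to " + localPort);
        log.add("5 OK");
        return true;
    }

    private boolean fail() {
        log.add("ERROR: connection failed");
        return false;
    }
}
```

With two managers the log records all five steps; when both ports are registered at the same manager it records only steps 1, 4 and 5.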

Session Manager
The Session Manager is the part of the system in charge of accepting user requests from the front end and forwarding them to the rest of the system. These requests include: creating a new module, connecting two ports, running the application, and destroying the application.
Both the Session Manager and the front end store a representation of the application that the user is building. The difference between the two is that the Session Manager needs to track the machines on which each of the modules has been started, while the front end tracks the position of each module's representation on the screen.
In the WebFlow prototype, the Session Manager can only work with one user at a time.In other words, there is only one session active at any one point in time (we are currently exploring JSDA support for WebFlow to provide multi-user collaborative editing capabilities).
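The Session Manager's side of the application representation can be sketched as a map of `ModuleRepresentation` proxies, each recording the machine its module was assigned to. The paper does not say which decomposition strategy is used; the round-robin placement below is purely an assumption chosen for brevity, and `SessionManagerSketch` is a hypothetical name. The returned plan groups module ids by host, i.e., what each host's Module Manager would be asked to instantiate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical proxy object for one module in the Session Manager's image of the compute-web.
class ModuleRepresentation {
    final String id;
    final String type;
    String host; // machine chosen by the decomposition step
    ModuleRepresentation(String id, String type) { this.id = id; this.type = type; }
}

class SessionManagerSketch {
    private final Map<String, ModuleRepresentation> modules = new LinkedHashMap<>();

    void addModule(String id, String type) { modules.put(id, new ModuleRepresentation(id, type)); }

    // Round-robin placement: an assumed strategy, not the one used by WebFlow.
    // Returns, for each host, the module ids its Module Manager should instantiate.
    Map<String, List<String>> decompose(List<String> hosts) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        int next = 0;
        for (ModuleRepresentation m : modules.values()) {
            m.host = hosts.get(next++ % hosts.size());
            plan.computeIfAbsent(m.host, h -> new ArrayList<>()).add(m.id);
        }
        return plan;
    }

    String hostOf(String moduleId) { return modules.get(moduleId).host; }
}
```

Screen coordinates are deliberately absent here; in the design described above they belong to the front end's copy of the graph.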

Internal communication in WebFlow
As illustrated in Figure 7, the WebFlow prototype supports two types of communication: via URL and via socket connections. In the figure, the client can be either the front end or the Session Manager, while the servlet can be any of the three servlets that exist in the system.
All the URLs point to the Web server. The Web server analyzes the URL and forwards the request to the servlet denoted in the URL. Socket connections are received directly by the servlet.
The former, communication via URL, is used when a component's socket address is unknown. This feature allows the whole system to be accessed over the Web. However, the current implementation of the URL addressing scheme does not provide a convenient way to send whole objects as parameters. The socket connection scheme, on the other hand, provides a very natural way of sending any object over the socket, provided the object knows how to serialize itself. This is very useful, as all the requests and replies can easily be expressed as objects whose internal state holds the type and parameters of the request or reply.
At present, each of the three servlets in the WebFlow system listens for both URL and socket connections at all times. Internal requests and replies for the creation, running and destruction of modules, as well as for connecting ports, all go through the socket connection, whereas URL communication is used to provide the socket address of the server and to perform system-wide operations such as giving usage statistics, resetting the system, adding new resources to the system, etc.
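Expressing a request as an object whose state holds its type and parameters can be sketched with standard Java serialization. The `Request` class, its `Type` values and the `Wire` helper are hypothetical; byte-array streams stand in for the socket streams so that the sketch runs without a network, but the same `ObjectOutputStream.writeObject`/`ObjectInputStream.readObject` calls would be applied to `socket.getOutputStream()` and `socket.getInputStream()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical request whose internal state carries the request type and its parameters.
class Request implements Serializable {
    private static final long serialVersionUID = 1L;
    enum Type { CREATE_MODULE, RUN_MODULE, DESTROY_MODULE, CONNECT_PORTS }
    final Type type;
    final HashMap<String, String> params;
    Request(Type type, Map<String, String> params) {
        this.type = type;
        this.params = new HashMap<>(params);
    }
}

class Wire {
    // Serializes an object exactly as it would be written to a socket's output stream.
    static byte[] write(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Reconstructs the object on the receiving side.
    static Object read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The receiver gets a new, equal-valued object; dispatching on `type` then replaces the string parsing that the URL addressing mode would require.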
It is conceivable that the HTTP protocol will evolve so that all WebFlow communication could eventually be handled uniformly in the URL addressing mode. For the time being, we will support both URL and socket based addressing modes, and we will monitor, participate in and respond to W3C efforts aimed at dynamic and object-oriented extensions of the HTTP protocol.

Module and Port identification in WebFlow
As soon as a module is created, it is assigned a unique identifier. This identifier is present in all the requests associated with the module, i.e., requests to run and destroy the module (recall that creation also creates the identifier). Module identifiers are necessary for the following reasons: they provide an easy way of identifying the target of module operations, and they enable multiple instances of the same module to be run on the same machine, each instance having a separate identifier. Each port also has an identifier, but port identifiers are less general than module identifiers. Since ports can never exist outside of a module, it suffices to assign identifiers to ports that are unique within one module. The current implementation is a bit more general, however, since it assigns port identifiers per Module Manager/Connection Manager combination.
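The two identifier scopes can be sketched with a pair of atomic counters per Module Manager/Connection Manager combination. The paper does not specify an identifier format; prefixing module ids with a node name so that they stay unique system-wide is an assumption of this sketch, and `NodeIds` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Identifier allocation for one Module Manager/Connection Manager pair.
// Prefixing module ids with a node name is an assumption made here to keep
// them unique system-wide; the actual WebFlow format is not specified.
class NodeIds {
    private final String node;
    private final AtomicLong modules = new AtomicLong();
    private final AtomicLong ports = new AtomicLong();

    NodeIds(String node) { this.node = node; }

    // Module id: distinct even for several instances of the same module on this node.
    String newModuleId() { return node + "/module-" + modules.incrementAndGet(); }

    // Port id: only needs to be unique within this manager pair.
    long newPortId() { return ports.incrementAndGet(); }
}
```

Two nodes may hand out the same numeric port id; that is harmless because a port is always addressed together with the manager pair that owns it.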

I/O modules in WebFlow
The previous discussion took the input and output modules of an application for granted. However, current Web browser restrictions make implementing input and output modules a non-trivial task.
Since the front end can be invoked from an arbitrary machine connected to the Web, the input and output modules should be able to receive their input from, and send their output to, that same arbitrarily chosen machine. In the current state of affairs, the only way of doing so is to provide applets that can receive user inputs and show the application's outputs.
Therefore, the input/output modules are made of two parts: a WebFlow part, which works under the WebFlow model, and an applet part, which provides the I/O capabilities, as illustrated in Figure 8. Upon initialization, the I/O modules inform the system that they require an applet to be spawned for them. That request is forwarded all the way to the system's front end, which has the capability to open a new frame on the screen and load an HTML page in it. That HTML page can contain an invocation of the I/O module's applet.
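The page that the front end loads for an I/O module can be sketched as a small generator that embeds the module's applet with the HTML `APPLET` element. The `AppletRequest` and `IoPageBuilder` names, the frame size, and the convention of passing the module id through a `PARAM` element are all assumptions for illustration; the paper only states that the page invokes the I/O module's applet.

```java
// Hypothetical spawn request raised by an I/O module during initialization.
class AppletRequest {
    final String moduleId;
    final String appletClass;
    AppletRequest(String moduleId, String appletClass) {
        this.moduleId = moduleId;
        this.appletClass = appletClass;
    }
}

class IoPageBuilder {
    // Builds the HTML page the front end loads in a new frame. The applet learns
    // which WebFlow module it serves from a PARAM element (an assumed convention).
    static String pageFor(AppletRequest r) {
        return "<html><body>\n"
             + "<applet code=\"" + r.appletClass + "\" width=\"400\" height=\"300\">\n"
             + "<param name=\"module\" value=\"" + r.moduleId + "\">\n"
             + "</applet>\n"
             + "</body></html>\n";
    }
}
```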
The front end receives the HTML pages by making separate requests to the Session Manager. In the long run, the responsibility for creating and serving these HTML pages will be placed in a separate manager, the Viewer Manager, a topic discussed further in the following sections.
could be grouped with the Module Manager, and in the second, it could be grouped with the Server Manager.
The Communication Manager will multiplex all the communication between ports registered on a given WebVM node. In the WebFlow prototype, each port has its own socket through which it communicates with the remote port. This not only wastes system resources (one extra socket per port), but also forces each port to deal with the low-level details of sending and receiving messages (although it has the distinct advantage of letting the sockets themselves take care of message buffering).
The Communication Manager will provide facilities for sending, receiving and buffering messages. Its natural place is together with the Connection Manager, since these two servlets actually represent two stages of the overall communication process. Future WebFlow implementations will probably have a single combined Connection and Communication Manager instead of two separate ones.
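The planned multiplexing can be sketched by tagging every message with its destination port and demultiplexing on arrival into per-port buffers. This is a minimal sketch of the idea under stated assumptions: `Envelope` and `CommunicationManager` are hypothetical names, and a shared queue stands in for the single socket between two WebVM nodes.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Every message on the shared node-to-node link is tagged with its destination port.
final class Envelope {
    final String portId;
    final Object payload;
    Envelope(String portId, Object payload) { this.portId = portId; this.payload = payload; }
}

// Sketch of a Communication Manager multiplexing many ports over one link.
// A queue stands in for the single socket between two WebVM nodes.
class CommunicationManager {
    private final BlockingQueue<Envelope> link;
    private final Map<String, BlockingQueue<Object>> inboxes = new ConcurrentHashMap<>();

    CommunicationManager(BlockingQueue<Envelope> link) { this.link = link; }

    // Send to a remote port over the shared link (never blocks on an unbounded queue).
    void send(String remotePortId, Object payload) { link.add(new Envelope(remotePortId, payload)); }

    // Demultiplex whatever has arrived into per-port buffers.
    void pump() {
        Envelope e;
        while ((e = link.poll()) != null) {
            inboxes.computeIfAbsent(e.portId, k -> new LinkedBlockingQueue<>()).add(e.payload);
        }
    }

    // Non-blocking receive for one port; null if nothing is buffered.
    Object receive(String portId) {
        BlockingQueue<Object> q = inboxes.get(portId);
        return q == null ? null : q.poll();
    }
}
```

Buffering moves from the per-port sockets into the manager's inboxes, which is the trade-off noted above: fewer sockets, but the manager now owns message buffering.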

Tier-1 Visual Graph Editor
Since the idea of WebFlow is to create and maintain a domain of world-wide reusable computational modules, the natural place for accessing and maintaining such a domain is the Web itself. We are therefore faced with existing browsers such as Netscape or Internet Explorer as the basis for the WebFlow Graphical User Interface. The security restrictions imposed by these browsers, the implementation differences due to ongoing corporate competition, and the recent developments in the Network Computer domain all point towards a design with a lightweight front end, accessible through any browser (including new consumer electronics front-ends), and a solid back end given by a personal Java Web server hooked to a WebVM network, which implements most of the functionality of the system.
The front end is designed as a tool for the visual authoring of computational dataflow graphs that integrate existing public domain software modules. It is based on highly intuitive visual icons and click-and-drag design metaphors, which hide the inherent complexity of the WebFlow system.
In the current implementation, we used UCI's Graph Editing Framework (GEF) [Robb96] as the basis for the front end of the WebFlow system. GEF supports the basic graph editing mechanisms and is naturally extensible. The framework is well structured, with cleanly decoupled layers, which makes it possible to concentrate on the application-specific details of the WebFlow front end. Figure 10 shows a snapshot of the current editor in action.
The front end is implemented as an applet; it resides in the top-level layer of the system, and it creates and maintains a connection with the Session Manager in the back end of the system.
The user creates a computational graph from modules as building blocks, by selecting the corresponding icons from a list of the modules available in the system and inserting them into the graph. Multiple instances of a specific module can be created, and their internal states and connections are completely independent.
After a module is inserted as a node in the graph, the applet requests its initialization from the back end. When the initialization is done, the back end replies to the applet with information about the interface of the selected module. The applet builds and stores the representation of the graph, keeping information only about its visual representation. The information about the actual modules and their mapping onto real machines is stored by the back end.
Connections between modules are created in the same fashion as the modules themselves. Connecting two modules means connecting a port of one module to a port of the other, by simple clicking and dragging.
Individual modules and/or connections can be removed from the graph, which results in deleting them from the structure maintained by the applet itself and in killing the initialized instances of the corresponding modules in the real system as well as breaking the real connections between initialized modules.
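The applet-side graph and its removal semantics can be sketched as a small data structure that stores only what is needed to draw the compute-web. `EditorGraph` and its nested types are hypothetical names; in this sketch, removing a module returns the connections it dropped, which are the ones whose real counterparts the back end would have to break.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Front-end view of a compute-web: only what is needed to draw it.
// Machine placement lives in the back end (Session Manager), not here.
class EditorGraph {
    static final class Node {
        final String moduleId;
        int x, y; // screen position of the module icon
        Node(String moduleId, int x, int y) { this.moduleId = moduleId; this.x = x; this.y = y; }
    }

    static final class Edge {
        final String from, fromPort, to, toPort;
        Edge(String from, String fromPort, String to, String toPort) {
            this.from = from; this.fromPort = fromPort; this.to = to; this.toPort = toPort;
        }
    }

    final Map<String, Node> nodes = new LinkedHashMap<>();
    final List<Edge> edges = new ArrayList<>();

    void addModule(String id, int x, int y) { nodes.put(id, new Node(id, x, y)); }

    // Connect a port of one module to a port of another.
    void connect(String from, String fromPort, String to, String toPort) {
        if (!nodes.containsKey(from) || !nodes.containsKey(to))
            throw new IllegalArgumentException("both modules must be in the graph");
        edges.add(new Edge(from, fromPort, to, toPort));
    }

    // Removing a module also drops every connection touching it; the returned
    // edges are the ones whose real connections the back end must break.
    List<Edge> removeModule(String id) {
        nodes.remove(id);
        List<Edge> removed = new ArrayList<>();
        for (Edge e : edges) {
            if (e.from.equals(id) || e.to.equals(id)) removed.add(e);
        }
        edges.removeAll(removed);
        return removed;
    }
}
```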
After the computational graph is created, it can be executed. The results are monitored through the input/output modules inserted in the graph. The execution of a computational graph can generate a variety of feedback patterns, ranging from just producing the final results of a complex computation, to periodic performance visualization and system monitoring modes, to real-time interactive display modes. The current WebFlow editor is restricted to single-user 2D graphics operations, but we are also initiating activities aimed at bringing the front end to the next level of interactivity. This includes integrating WebFlow with JSDA to support collaborative editing and with VRML2 to support a televirtual authoring paradigm.
8 Tier-3 Legacy Layer
In parallel with the core WebFlow development work described so far, we are also starting activities on building domain-specific tier-3 module libraries, including WebFlow wrappers for existing codes and legacy systems. The ModuleWrapper discussed in Section 5 can wrap any computation, including pure Java, native libraries, or external UNIX or NT processes.
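The idea of one wrapper for heterogeneous computations can be sketched as an abstract class with two factories: one for pure Java code and one for an external process started with `ProcessBuilder`. This is a hypothetical illustration rather than the actual ModuleWrapper; the factory names are invented, and `InputStream.readAllBytes` assumes Java 9 or later. Native libraries would fit the same shape via a Java method that calls into JNI.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical wrapper giving any computation the same module face:
// pure Java code, or an external UNIX/NT process.
abstract class ModuleWrapper<T> {
    abstract T compute() throws Exception;

    // Runs the wrapped computation asynchronously on the given pool.
    Future<T> start(ExecutorService pool) {
        Callable<T> task = this::compute;
        return pool.submit(task);
    }

    // Wraps a pure Java computation.
    static <T> ModuleWrapper<T> ofJava(Callable<T> body) {
        return new ModuleWrapper<T>() {
            T compute() throws Exception { return body.call(); }
        };
    }

    // Wraps an external process; its combined stdout/stderr becomes the result.
    static ModuleWrapper<String> ofProcess(String... command) {
        return new ModuleWrapper<String>() {
            String compute() throws Exception {
                Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
                byte[] output = p.getInputStream().readAllBytes(); // Java 9+
                p.waitFor();
                return new String(output, StandardCharsets.UTF_8);
            }
        };
    }
}
```

For example, `ModuleWrapper.ofProcess("ls", "-l").start(pool)` would wrap a UNIX command, while `ModuleWrapper.ofJava(() -> 6 * 7).start(pool)` wraps plain Java code; callers see the same `Future` either way.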
In the pure Java sector, we are developing control, monitoring and coordination support for the base WebVM/WebFlow operations. Native libraries with C-coded optimized primitives offer a natural extension for media processing and high performance computing. In the external processing sector, we are experimenting with JDBC drivers for Oracle and mSQL and with JDBC/ODBC drivers for PC databases such as Access or SQL Server, and we are developing a WebVM-based distributed database layer (see Figure 11), with intelligent agent (such as IBM aglets) based connectivity and visual WebFlow support for designing high-level information retrieval and data mining strategies.
In two `depth' projects, WebSpace and VDCE, we are probing selected tier-1 and tier-3 aspects of WebVM, respectively. In the `breadth' area, focused on system scalability, we are initiating a collaboration with IBM Watson in the area of Televirtuality, and we are seeking federal funds to address a World-Wide Virtual Machine architecture [HPDC96], [SC96].

Command and Control
In the VDCE project [VDCE96], we are analyzing C3I functions recently published by the RL C3I Parallel Benchmarking Project, and we are developing a library of C3I modules that would support the interactive composition of Battle Management C3I systems, such as the one shown in Figure 13, using the visual graph editing tools. More generally, VDCE addresses complementary aspects of Web based distributed computing and it offers a natural connectivity between the pure Java

Telemedicine
In the CareWeb project [CareWeb96], conducted jointly with the Syracuse University College of Nursing, the SUNY Health Science Center and the Syracuse City School District, we are developing a collaboratory telemedicine system for school nursing, based on the `bridge' topology [Bridge96]. Figure 14 illustrates a CareWeb bridge under development, connecting `points of need' (parents, nurses) with `points of care' (nurse practitioners, pediatricians) via an intelligent Web-based switchboard. Individual bridge services are managed as WebVM nodes and are connected, integrated and customized for individual healthcare provider needs using the WebFlow visual authoring tools.

Televirtuality
In a joint project with IBM Watson [TVR96], we are analyzing the scalability issues of the WebVM architecture in the context of televirtual environments, i.e., 3D multi-user collaboratory environments on the Internet.

Figure 3: Early integration experiments: a Portlet library extracted from Caltech Infospheres is used to form a token ring connecting a set of Jigsaw Resource nodes. A message packet rotates along the ring with a user-adjustable speed and generates visual feedback in a monitor applet.

Figure 4 :Figure 5 :
Figure 4: Initial design of the WebFlow management layer, implemented as a set of Jeeves servlets and including: Session Manager, Module Manager and Connection Manager.

Figure 6: Steps involved in making the connection between two ports.

Figure 7: Servers in the WebFlow system are accessible through both URL and socket connections.

Figure 10: Initial WebFlow front-end, based on the extended GEF (Graph Editing Framework) from UCI. Modules are selected from the palette in the click-and-drag style. Compute-webs are constructed interactively in the click-click-to-connect model. Individual modules can be given a user-programmable visual appearance. In the next step, vector graphics drawing tools will be provided for the interactive authoring of module icons.

Figure 11: Example of a distributed heterogeneous database environment, managed by the WebVM layer via JDBC interfaces, and custom editable via WebFlow visual graph editing tools.Intelligent agents such as IBM aglets are used to disseminate or search for the information, stored/retrieved by servlets and visualized by applets.

Figure 15: Example of a Televirtuality application with a non-trivial compute-web topology: Virtual Shopping Mall

Table 1: Comparative analysis of strategies and components for Web based distributed computing in selected systems investigated at NPAC.
5.1 Overview
Our prototype WebVM is given by a mesh of Jeeves servers, running servlets that manage and coordinate distributed computation. Atomic encapsulation units of WebVM computation are a set of standard Jigsaw resources, such as File or Editors, and our own experiments with multi-user and/or multi-server extensions, such as Chat session or WebFlow module Resources.
Figure 12: WebVM as a reusable middleware, tested in a set of research projects at Syracuse University such as WebSpace, VDCE and Televirtuality, focused on various front-end metaphors in tier-1 and/or various computational paradigms in tier-3.
We view WebVM as a reusable middleware, and we intend to test it in a set of Web based distributed applications under development. These efforts, partially supported by the Department of Energy, Rome Laboratory and IBM Watson, allow us to test various aspects of the WebVM architecture, as illustrated in Figure 12.