College of Engineering and Computer Science - Former Departments, Centers, Institutes and Projects

Visual Interactive Modeling in a Java-based Hierarchical Modeling and Simulation System

Robert G. Sargent et al. — Mon, 11 Oct 2010 10:06:09 PDT

The desired set of properties of a modern simulation system is presented. The portion of the Hierarchical Modeling And Simulation System-Java (HiMASS-j) used for specifying Hierarchical Control Flow Graph (HCFG) Models is described. In HiMASS-j, HCFG Models are specified by visual interactive modeling, using graphical user interfaces and dialog boxes. HCFG Models are specified using two complementary hierarchical specification structures: one specifies the components that make up a model and how these components are interconnected, and the other specifies the behaviors of individual atomic components. A simulation of a traffic intersection using HiMASS-j is briefly discussed.

A decentralized task scheduling algorithm and its performance modeling for computer networks

Ishfaq Ahmad et al. — Mon, 11 Oct 2010 10:06:09 PDT

A dynamic task scheduling algorithm that is stable, decentralized, and adaptive to network topology is presented. The proposed algorithm extends the nearest-neighbor load balancing strategy with an enhanced degree of efficiency, and it is intended for multicomputers connected by a store-and-forward communication network. The algorithm is modeled as a central-server open queuing network. It is shown that the response time of a task consists of two parts. The first part is the task's settling time, which consists of scheduling time, communication time, and waiting time in the scheduling and communication queues. The second part comprises the waiting time in the execution queue and the execution time itself. To reduce the first part of the response time, the scheduling algorithm needs to be stable, so that a task is quickly settled at some node. The second part, on the other hand, is reduced if the algorithm efficiently migrates the task to a lightly loaded node. The proposed algorithm is comprehensively evaluated through simulation and an analytical model, and is shown to be both stable and efficient. The performance evaluation also takes into account the task wander cost and the scheduling overhead. Experimental results are also obtained for another nearest-neighbor scheduling scheme and compared with those of the proposed algorithm.
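
The migration decision at the heart of a nearest-neighbor strategy can be sketched as below. This is a minimal illustration, not the authors' algorithm: the `Node` type, the `choose_destination` function, and the fixed `threshold` rule are hypothetical, chosen to show how a damping threshold trades a short settling time (stability) against moving work toward lightly loaded nodes (efficiency).

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    queue_length: int      # tasks waiting in the local execution queue
    neighbors: list        # ids of directly connected nodes


def choose_destination(nodes, src, threshold=2):
    """Return the id of the node a task arriving at `src` should run on.

    Hypothetical rule: move the task one hop to the least-loaded neighbor only
    if that neighbor's queue is shorter by at least `threshold`; otherwise keep
    it local. The threshold damps tasks bouncing between nodes (keeping the
    settling time short) while still shifting work toward light nodes.
    """
    here = nodes[src]
    if not here.neighbors:
        return src
    # break ties on load by the smaller node id so the choice is deterministic
    best = min(here.neighbors, key=lambda n: (nodes[n].queue_length, n))
    if here.queue_length - nodes[best].queue_length >= threshold:
        return best
    return src
```

For example, on a four-node ring where node 0 holds 5 tasks and its neighbors hold 1 and 2, a task arriving at node 0 is sent to node 1, while a task arriving at node 1 stays put because both of its neighbors are busier.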

Skew-Insensitive Parallel Algorithms for Relational Join

Khaled Alsabti et al. — Mon, 11 Oct 2010 10:06:08 PDT

Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of data skew. In this paper, we present two new parallel join algorithms for coarse-grained machines that work optimally in the presence of an arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both algorithms employ a preprocessing phase (prior to the redistribution phase) to partition the work equally among the processors. The proposed algorithms have been designed for memory-resident data, but they can be extended to disk-resident data. These algorithms are shown to be scalable both theoretically and in practice. Experimental results are provided on the IBM SP-2.
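
To make the preprocessing idea concrete, here is a sketch, under stated assumptions, of how per-key work could be measured and divided before redistribution. It is not the paper's algorithm: the function name `plan_join_partitions`, the work measure |R_k| * |S_k| (result tuples per key), and the rule of slicing a heavy key's R tuples while replicating its S tuples are assumptions made for illustration.

```python
from collections import Counter


def plan_join_partitions(r_keys, s_keys, p):
    """Split join work over p processors so each produces about the same output.

    Hypothetical preprocessing: the work of key k is |R_k| * |S_k|, the number
    of result tuples it yields. Keys are visited in sorted order; when a key
    does not fit on the current processor, its R tuples are divided into
    slices (each slice joins with all S tuples of k, which are replicated).
    Returns, per processor, a list of (key, r_start, r_end) slices.
    """
    r_count, s_count = Counter(r_keys), Counter(s_keys)
    keys = sorted(k for k in r_count if k in s_count)
    total = sum(r_count[k] * s_count[k] for k in keys)
    target = -(-total // p)                 # ceil(total / p)
    plan = [[] for _ in range(p)]
    proc, load = 0, 0
    for k in keys:
        s = s_count[k]
        r_left, start = r_count[k], 0
        while r_left > 0:
            if proc == p - 1:
                take = r_left               # the last processor absorbs the rest
            else:
                take = min(r_left, max(0, target - load) // s)
                if take == 0:
                    if load > 0:            # current processor is full: move on
                        proc, load = proc + 1, 0
                        continue
                    take = 1                # one R tuple alone exceeds the target
            plan[proc].append((k, start, start + take))
            load += take * s
            start += take
            r_left -= take
    return plan
```

With eight R tuples and four S tuples on key 1 (32 of 35 result tuples), hashing on the join key would send 32 units of work to one processor; this plan gives loads of 8, 8, 8 and 11 on four processors.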

Characterization of a Class of Sigmoid Functions With Applications to Neural Networks

Anil Ravindran Menon et al. — Mon, 11 Oct 2010 10:06:07 PDT

Sigmoid functions, whose graphs are “S-shaped” curves, appear in a great variety of contexts, such as the transfer functions in many neural networks. Their ubiquity is no accident; these curves are among the simplest non-linear curves, striking a graceful balance between linear and non-linear behavior. ... This paper undertakes a study of two classes of sigmoids: the simple sigmoids, defined to be odd, asymptotically bounded, completely monotone functions in one variable, and the hyperbolic sigmoids, a proper subset of the simple sigmoids and a natural generalization of the hyperbolic tangent. The class of hyperbolic sigmoids includes a surprising number of well-known sigmoids. The regular structure of the simple sigmoids often makes a theory tractable, paving the way for more general analysis.
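
The first two defining properties of a simple sigmoid, oddness and asymptotic boundedness, are easy to spot-check numerically. The sketch below does so for three familiar sigmoids on a grid of sample points; it does not test complete monotonicity, and the names `SIGMOIDS` and `check_sigmoid` are illustrative only.

```python
import math

# Three familiar sigmoids; each is odd, increasing, and tends to +1 / -1.
SIGMOIDS = {
    "tanh": math.tanh,
    "algebraic": lambda x: x / math.sqrt(1.0 + x * x),
    "arctan": lambda x: (2.0 / math.pi) * math.atan(x),
}


def check_sigmoid(f, xs):
    """Spot-check oddness, |f| < 1, and strict increase on the sample points xs."""
    odd = all(math.isclose(f(-x), -f(x), abs_tol=1e-12) for x in xs)
    bounded = all(abs(f(x)) < 1.0 for x in xs)
    ys = [f(x) for x in sorted(xs)]
    increasing = all(a < b for a, b in zip(ys, ys[1:]))
    return odd, bounded, increasing
```

The logistic function 1/(1 + e^(-x)) fails the oddness check, although its rescaled form 2/(1 + e^(-x)) - 1 equals tanh(x/2).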

Scheduling of Unstructured Communication on the Intel iPSC/860

Jhy-Chun Wang et al. — Mon, 11 Oct 2010 10:06:07 PDT

In this paper we present several algorithms for decomposing all-to-many personalized communication into a set of disjoint partial permutations. These partial permutations avoid node contention as well as link contention. We discuss the theoretical complexity of these algorithms and study their effectiveness from the viewpoint of both static and runtime scheduling. Experimental results for our algorithms are presented on the iPSC/860.
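
A simple way to obtain such a decomposition is a greedy pass that builds one contention-free phase at a time. The sketch below is a generic greedy heuristic, not necessarily one of the paper's algorithms; messages are assumed to be given as (source, destination) pairs with at most one message per pair. It only avoids node contention; link contention depends on the network's routing and is not modeled.

```python
def decompose_into_partial_permutations(messages):
    """Greedily split (src, dst) message pairs into contention-free phases.

    In each phase every processor sends at most one message and receives at
    most one message, so each phase is a partial permutation. Self-messages
    (src == dst) are assumed to be handled locally and are dropped.
    """
    remaining = sorted({(s, d) for s, d in messages if s != d})
    phases = []
    while remaining:
        busy_src, busy_dst, phase, leftover = set(), set(), [], []
        for s, d in remaining:
            if s in busy_src or d in busy_dst:
                leftover.append((s, d))        # conflicts: defer to a later phase
            else:
                phase.append((s, d))
                busy_src.add(s)
                busy_dst.add(d)
        phases.append(phase)
        remaining = leftover
    return phases
```

The number of phases can never be smaller than the largest number of messages any single processor sends or receives. By König's edge-coloring theorem that bound is always achievable, but a greedy pass may exceed it.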

Genetic Algorithms for Soft Decision Decoding of Linear Block Codes

Harpal Maini et al. — Mon, 11 Oct 2010 10:06:06 PDT

Soft-decision decoding is an NP-hard problem of great interest to developers of communication systems. We show that this problem is equivalent to the problem of optimizing Walsh polynomials. We present genetic algorithms for soft-decision decoding of binary linear block codes and compare their performance with various other decoding algorithms, including the recently developed A* algorithm. Simulation results show that our algorithms achieve bit-error probabilities as low as 0.00183 for a [104, 52] code at a low signal-to-noise ratio of 2.5 dB, exploring only 22,400 codewords, whereas the search space contains $4.5 \times 10^{15}$ codewords. We define a new crossover operator that exploits domain-specific information and compare it with uniform and two-point crossover. Keywords: genetic algorithms, soft-decision decoding, uniform crossover.
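
The sketch below illustrates the general shape of genetic soft-decision decoding on a toy code, assuming BPSK transmission over an AWGN channel, where maximum-likelihood decoding amounts to maximizing the correlation between the received soft values and a codeword's +1/-1 image. The [7, 4] Hamming code, the population parameters, and the use of plain uniform crossover (rather than the paper's domain-specific operator) are all assumptions; the exhaustive decoder is included only as a reference, which is feasible for such a tiny code.

```python
import random
from itertools import product

# Systematic generator matrix of the [7, 4] Hamming code: a toy stand-in for
# the much longer codes (such as [104, 52]) studied in the paper.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(info):
    """Multiply the information bits by G over GF(2)."""
    return [sum(info[i] * G[i][j] for i in range(len(G))) % 2
            for j in range(len(G[0]))]


def fitness(info, received):
    """Correlation of the BPSK image (bit 0 -> +1, bit 1 -> -1) with the soft values.

    On an AWGN channel the codeword maximizing this correlation is the
    maximum-likelihood decision.
    """
    return sum(r * (1 - 2 * c) for r, c in zip(received, encode(info)))


def ga_decode(received, pop_size=12, generations=30, p_mut=0.1, seed=0):
    """Search information words with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    k = len(G)
    pop = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda u: fitness(u, received), reverse=True)
        elite = pop[: pop_size // 2]                # better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # uniform crossover: each bit is taken from either parent with prob. 1/2
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # bit-flip mutation
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda u: fitness(u, received))
    return encode(best)


def ml_decode(received):
    """Exhaustive maximum-likelihood decoding; feasible only for tiny codes."""
    best = max(product([0, 1], repeat=len(G)), key=lambda u: fitness(u, received))
    return encode(best)
```

On a noiseless example the exhaustive decoder returns the transmitted codeword. The genetic decoder always returns some codeword but, like any heuristic search, is not guaranteed to return the maximum-likelihood one.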

Shape Recognition Using Genetic Algorithms

Ender Ozcan et al. — Mon, 11 Oct 2010 10:06:05 PDT

Shape recognition is a challenging task when shapes overlap, forming noisy, occluded, partial shapes. This paper uses a genetic algorithm for matching input shapes with model shapes described in terms of features such as line segments and angles (extracted using traditional algorithms). The quality of matching is gauged using a measure derived from attributed shape grammars [12, 13]. Preliminary results, using shapes with about 30 features each, are extremely encouraging.

Scatter Scheduling for Problems with Unpredictable Structures

Min-You Wu et al. — Mon, 11 Oct 2010 10:06:05 PDT

An extended scatter scheduling strategy was applied to problems with unpredictable, asynchronous structures. It was found that with this simple scheduling strategy, good load balance can be achieved without incurring much runtime overhead. This scheduling algorithm has been implemented on hypercube machines, and its performance is compared with that of other scheduling strategies.
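
Scatter scheduling places tasks cyclically across processors without consulting load information. The sketch below simulates one plausible extension to tasks created at run time; the per-processor round-robin cursor, the `spawn` callback, and the one-task-per-step simulation loop are illustrative assumptions, not the paper's implementation.

```python
from collections import deque


def scatter_schedule(root_tasks, spawn, p):
    """Simulate scatter (cyclic) placement of tasks created at run time.

    `spawn(task)` returns the child tasks a task creates when it runs; the
    scheduler does not know this structure in advance. Each processor hands
    out the children it creates round-robin, starting just after itself, so
    no load information is exchanged. Returns tasks executed per processor.
    """
    queues = [deque() for _ in range(p)]
    next_target = [(i + 1) % p for i in range(p)]  # per-processor round-robin cursor
    for i, t in enumerate(root_tasks):
        queues[i % p].append(t)                     # initial tasks: plain scatter
    executed = [0] * p
    while any(queues):
        for proc in range(p):                       # each processor runs one task per step
            if not queues[proc]:
                continue
            task = queues[proc].popleft()
            executed[proc] += 1
            for child in spawn(task):
                queues[next_target[proc]].append(child)
                next_target[proc] = (next_target[proc] + 1) % p
    return executed
```

For instance, a tree of 31 tasks that is revealed only as tasks run (each task spawning two children until the tree is complete) spreads over four processors with every processor receiving work, and no load information is ever exchanged.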

Integer Sorting Algorithms for Coarse-Grained Parallel Machines

Khaled Alsabti et al. — Mon, 11 Oct 2010 10:06:04 PDT

Integer sorting is a subclass of the sorting problem in which the elements have integer values and the largest element is polynomially bounded in the number of elements to be sorted. It is useful for applications in which the maximum value of the elements to be sorted is bounded. In this paper, we present a new distributed radix-sort algorithm for integer sorting. The structure of our algorithm is similar to that of radix sort, except that it typically requires fewer communication phases. We present experimental results for our algorithm on two distributed-memory multiprocessors, the Intel Paragon and the Thinking Machines CM-5. These results are compared with those of two other well-known practical parallel sorting algorithms, based on radix sort and sample sort. The experimental results show that the distributed radix sort is competitive with the other two algorithms.
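
For reference, the conventional parallel LSD radix sort, the kind of baseline the new algorithm is compared against, can be simulated serially as below. This is a textbook sketch under stated assumptions (non-negative keys below 2**key_bits, a fixed digit width `bits`, and an even re-split of the output after every pass); it does not show the paper's technique for reducing the number of communication phases.

```python
def radix_pass(local_lists, shift, bits):
    """One counting-based pass of a parallel LSD radix sort, simulated serially.

    local_lists[i] holds processor i's keys. Each processor histograms the
    current digit; a prefix sum in (digit, processor) order gives every key
    its global destination, which the all-to-all exchange would implement
    on a real machine.
    """
    p = len(local_lists)
    radix = 1 << bits

    def digit(x):
        return (x >> shift) & (radix - 1)

    hist = [[0] * radix for _ in range(p)]
    for i, keys in enumerate(local_lists):
        for x in keys:
            hist[i][digit(x)] += 1
    # offset[i][d]: global index of the first key with digit d held by processor i
    offset, running = [[0] * radix for _ in range(p)], 0
    for d in range(radix):
        for i in range(p):
            offset[i][d] = running
            running += hist[i][d]
    out = [None] * running
    for i, keys in enumerate(local_lists):
        for x in keys:                      # stable: keeps the current global order
            out[offset[i][digit(x)]] = x
            offset[i][digit(x)] += 1
    # re-split evenly: processor i receives a contiguous block of the new order
    n = len(out)
    return [out[i * n // p:(i + 1) * n // p] for i in range(p)]


def parallel_radix_sort(local_lists, key_bits=16, bits=4):
    """Apply one pass per digit, least significant digit first."""
    for shift in range(0, key_bits, bits):
        local_lists = radix_pass(local_lists, shift, bits)
    return local_lists
```

Each pass corresponds to one communication phase on a distributed-memory machine, so with 4-bit digits, 16-bit keys need four phases; reducing that count is exactly where the abstract says the new algorithm differs.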

Communication-Efficient and Memory-Bounded External Redistribution

Jang Sun Lee et al. — Mon, 11 Oct 2010 10:06:03 PDT

This paper presents communication-efficient algorithms for the external data redistribution problem. Deterministic lower and upper bounds are presented for the number of I/O operations, the communication time, and the memory requirements of external redistribution. Our algorithms differ from most other algorithms presented for out-of-core applications in that they are optimal (within a small constant factor) not only in the number of I/O operations but also in the time taken for communication. A coarse-grained MIMD architecture with I/O subsystems attached to each processor is assumed, but the results are expected to be applicable to a wider variety of architectures.

Analyzing Images Containing Multiple Sparse Patterns with Neural Networks

Rangachari Anand et al. — Mon, 11 Oct 2010 10:06:03 PDT

We have addressed the problem of analyzing images containing multiple sparse, overlapping patterns. This problem arises naturally when analyzing the composition of organic macromolecules using data gathered from their NMR spectra. Using a neural network approach, we have obtained excellent results in using NMR data to detect the presence of various amino acids in protein molecules. We have achieved a high correct-classification rate (about 87%) for images containing as many as five substantially distorted, overlapping patterns.

The expressiveness of locally stratified programs

Howard A. Blair et al. — Mon, 11 Oct 2010 10:06:02 PDT

This paper completes an investigation of the logical expressibility of finite, locally stratified, general logic programs. We show that every hyperarithmetic set can be defined by a suitably chosen locally stratified logic program (as a set of values of a predicate over its perfect model). This is an optimal result, since the perfect model of a locally stratified program is itself an implicitly definable hyperarithmetic set (under a recursive coding of the Herbrand base); hence, obtaining all hyperarithmetic sets requires something new, in this case selecting one predicate from the model. We find that the expressive power of programs does not increase when one considers programs that have a unique stable model or a total well-founded model. This shows that all these classes of structures (perfect models of locally stratified logic programs, well-founded models that turn out to be total, and stable models of programs possessing a unique stable model) are closely connected with Kleene’s hyperarithmetical hierarchy. Thus, for general logic programming, negation with respect to two-valued logic is related to the hyperarithmetic hierarchy in the same way as Horn logic is to the class of recursively enumerable sets. In particular, a set is definable in the well-founded semantics by a program P whose well-founded partial model is total iff it is hyperarithmetic.
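
To make "perfect model" concrete, the sketch below computes it for a finite propositional program with a given stratification, evaluating strata bottom-up. It assumes the stratification is valid (negative dependencies point to strictly lower strata) and that every rule head has a stratum; the function name is illustrative. Because, by the result above, perfect models can encode any hyperarithmetic set, no terminating procedure of this kind exists for infinite ground programs in general.

```python
def stratified_model(rules, strata):
    """Compute the perfect model of a finite, stratified propositional program.

    rules: list of (head, positive_body, negative_body), atoms as strings.
    strata: dict mapping each rule head to its stratum number, assumed to be
    a valid stratification. Strata are evaluated from the lowest up; within
    a stratum the least fixpoint is computed, and negated atoms (all from
    lower, already finished strata) are read against the model so far.
    """
    model = set()
    for level in sorted(set(strata.values())):
        layer = [r for r in rules if strata[r[0]] == level]
        changed = True
        while changed:
            changed = False
            for head, pos, neg in layer:
                ready = all(a in model for a in pos) and not any(a in model for a in neg)
                if head not in model and ready:
                    model.add(head)
                    changed = True
    return model
```

For the program r. q :- r. p :- not q. s :- not t. (with r, q, t in stratum 0 and p, s in stratum 1), the perfect model is {r, q, s}: p is false because q holds, and s is true because t has no supporting rule.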

Mapping Unstructured Computational Graphs for Adaptive and Nonuniform Computational Environments

Maher Kaddoura et al. — Mon, 11 Oct 2010 10:06:02 PDT

In this paper we study the problem of mapping a large class of irregular and loosely synchronous data-parallel applications in a nonuniform and adaptive computational environment. The computational structure of these applications can be described by a computational graph, in which nodes represent computational tasks and edges describe the communication between tasks. Parallelizing these applications on nonuniform computational environments requires partitioning the graph among the processors in such a fashion that the computational load on each processor is proportional to its computational power, while communication is minimized. We discuss the applicability of current graph partitioning methods to such environments. In an adaptive computational environment, the partitioning of the graph needs to be updated as the environment changes, which makes most algorithms described in the literature computationally prohibitive. We discuss novel strategies that allow for fast remapping.
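
One cheap remapping scheme consistent with this goal is to keep a locality-preserving ordering of the graph nodes fixed and simply re-cut it whenever processor powers change. The sketch below assumes such an ordering is already available (for example from a space-filling curve); the names `proportional_blocks` and `edge_cut` are hypothetical, and this is not presented as the paper's strategy.

```python
def proportional_blocks(order, powers):
    """Cut a node ordering into contiguous blocks sized in proportion to power.

    `order` lists the graph nodes so that neighbors tend to be close together;
    contiguous blocks then keep most edges inside one processor. Remapping
    after the powers change only recomputes the cut points.
    """
    total = sum(powers)
    n = len(order)
    cuts, acc = [0], 0
    for w in powers:
        acc += w
        cuts.append(round(n * acc / total))
    return [order[cuts[i]:cuts[i + 1]] for i in range(len(powers))]


def edge_cut(blocks, edges):
    """Count edges whose endpoints land on different processors."""
    owner = {v: i for i, block in enumerate(blocks) for v in block}
    return sum(owner[u] != owner[v] for u, v in edges)
```

On a 12-node path with relative processor powers 1, 2 and 3, the blocks have sizes 2, 4 and 6 and only two edges cross processors; if the powers change to 3, 2 and 1, re-cutting costs only a pass over the power list.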

MPI as a Coordination Layer for Communicating HPF Tasks

Ian Foster et al. — Mon, 11 Oct 2010 10:06:01 PDT

Data-parallel languages such as High Performance Fortran (HPF) present a simple execution model in which a single thread of control performs high-level operations on distributed arrays. These languages can greatly ease the development of parallel programs. Yet there are large classes of applications for which a mixture of task and data parallelism is most appropriate. Such applications can be structured as collections of data-parallel tasks that communicate by using explicit message passing. Because the Message Passing Interface (MPI) defines standardized, familiar mechanisms for this communication model, we propose that HPF tasks communicate by making calls to a coordination library that provides an HPF binding for MPI. The semantics of a communication interface for sequential languages can be ambiguous when the interface is invoked from a parallel language; we show how these ambiguities can be resolved by describing one possible HPF binding for MPI. We then present the design of a library that implements this binding, discuss issues that influenced our design decisions, and evaluate the performance of a prototype HPF/MPI library using a communications microbenchmark and an application kernel. Finally, we discuss how MPI features might be incorporated into our design framework.

Irregular Personalized Communication on Distributed Memory Machines

Sanjay Ranka et al. — Mon, 11 Oct 2010 10:06:00 PDT

In this paper we present several algorithms for performing all-to-many personalized communication on distributed memory parallel machines. We assume that each processor sends a different message (of potentially different size) to a subset of all the processors involved in the collective communication. The algorithms are based on decomposing the communication matrix into a set of partial permutations. We study the effectiveness of our algorithms from the viewpoint of both static and runtime scheduling.
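
One simple, fixed way to decompose the communication matrix into partial permutations, assumed here for illustration and not necessarily one of the paper's algorithms, uses XOR pairing on a machine whose processor count is a power of two. The function name and the set-of-destinations input format are hypothetical.

```python
def xor_schedule(sends, p):
    """Schedule irregular personalized messages with XOR pairing.

    sends[i] is the set of destinations processor i has a message for; p must
    be a power of two. In step k (1 <= k < p) processor i may send only to
    i ^ k. Since i -> i ^ k is a bijection, each step is a permutation of
    the processors, and keeping only the pairs that actually carry a message
    leaves a partial permutation. Steps with no traffic are skipped.
    """
    if p <= 0 or p & (p - 1):
        raise ValueError("p must be a power of two")
    steps = []
    for k in range(1, p):
        step = [(i, i ^ k) for i in range(p) if (i ^ k) in sends[i]]
        if step:
            steps.append((k, step))
    return steps
```

This schedule needs no analysis of the communication pattern at run time, but its number of steps depends on which XOR offsets carry traffic rather than on how many messages any processor sends, so it can exceed the minimum.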

Optimization Using Replicators

Anil Ravindran Menon et al. — Mon, 11 Oct 2010 10:05:59 PDT

Replicator systems are among the simplest complex systems and can be considered to be at the foundation of many popularly used models, ranging from theories of evolution and neurobiology to sociobiology and ecology. This paper presents the first successful application of replicators to optimization problems. For a graph bipartitioning problem with 50,000 nodes and 300,000 edges, for instance, close-to-optimal solutions were obtained in a few hundred iterations. Replicators provide a potentially powerful new tool for solving other optimization problems as well.
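
The basic discrete replicator iteration is short enough to show in full. The sketch below uses it to climb a quadratic form x^T A x over the probability simplex; the abstract does not say how the paper encodes graph bipartitioning as a payoff matrix, so the function name and the small matrices used with it are arbitrary examples.

```python
def replicator(A, x, iterations=100):
    """Discrete replicator dynamics: x_i <- x_i * (Ax)_i / (x^T A x).

    x stays on the probability simplex. For a symmetric A with nonnegative
    entries, x^T A x never decreases along the iteration (the Baum-Eagon
    inequality), so the dynamics act as a local optimizer of the quadratic
    form. Assumes x^T A x > 0 at every step.
    """
    n = len(x)
    for _ in range(iterations):
        ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        fit = sum(x[i] * ax[i] for i in range(n))   # average fitness x^T A x
        x = [x[i] * ax[i] / fit for i in range(n)]
    return x
```

With A = [[2, 0], [0, 1]] and a uniform start, the population concentrates on the first strategy, the maximizer of x^T A x on the simplex.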

Random Data Accesses on a Coarse-grained Parallel Machine II. One-to-many and Many-to-one Mappings

Ravi V. Shankar et al. — Mon, 11 Oct 2010 10:05:59 PDT

This paper describes deterministic, communication-efficient algorithms for performing random data accesses with hot spots on a coarse-grained parallel machine. General random-access read/write operations with hot spots can be completed in $C\mu n/p$ (+ lower-order terms) time and are optimal and scalable provided $n \geq O(pa + p^{2}\tau/\mu)$, where $n$ is the number of elements distributed across $p$ processors, $\tau$ is the start-up overhead, and $1/\mu$ is the data transfer rate. $C$ is a small constant between 3 and 4 for the random-access write operation and slightly higher for the random-access read operation. Monotonic random-access reads/writes can be completed with smaller constants and are optimal for smaller $n$ as well. The random-access read/write operations provide the framework for the communication-efficient simulation of CREW and CRCW PRAMs on a coarse-grained distributed-memory parallel machine. A companion paper [24] deals with the problem of performing dynamic permutations.

Performance modeling of load balancing algorithms using neural networks

Ishfaq Ahmad et al. — Mon, 11 Oct 2010 10:05:58 PDT

This paper presents a new approach that uses neural networks to predict the performance of a number of dynamic decentralized load balancing strategies. A distributed multicomputer system using any distributed load balancing strategy is represented by a unified analytical queuing model. A large simulation data set is used to train a neural network with the back-propagation learning algorithm, which is based on gradient descent. The performance model using the predicted data from the neural network produces the average response time of various load balancing algorithms under various system parameters. Validation against and comparison with simulation data show that the neural network is very effective in predicting the performance of dynamic load balancing algorithms. Our work leads to interesting techniques for designing load balancing schemes for large distributed systems that are computationally very expensive to simulate. One of the important findings is that, in a large distributed system, performance is affected least by the number of nodes and most by the number of links at each node.

Solving the region growing problem on the Connection Machine

Nawal Copty et al. — Mon, 11 Oct 2010 10:05:58 PDT

This paper presents a parallel algorithm for solving the region growing problem based on the split and merge approach. The algorithm was implemented on the CM-2 and the CM-5 in the data parallel and message passing models. The performance of these implementations is examined and compared.

Run-time Support for Parallelization of Data-Parallel Applications on Adaptive and Nonuniform Computational Environments

Maher Kaddoura et al. — Mon, 11 Oct 2010 10:05:57 PDT

In this paper we discuss the runtime support required for the parallelization of unstructured data-parallel applications on nonuniform and adaptive environments. The approach presented is reasonably general and is applicable to a wide variety of regular as well as irregular applications. We present performance results for the solution of an unstructured mesh on a cluster of heterogeneous workstations.