Document Type

Article

Date

1994

Keywords

algorithms, transportation primitive, distributed memory parallel architecture

Language

English

Disciplines

Computer Sciences

Description/Abstract

This paper presents algorithms for implementing the transportation primitive on a distributed memory parallel architecture. The transportation primitive performs many-to-many personalized communication with bounded incoming and outgoing traffic. We present a two-stage deterministic algorithm that decomposes the communication with possibly high variance in message size into two communication stages with low message size variance. If the maximum outgoing or incoming traffic at any processor is t, transportation can be done in 2t¯ time (+ lower order terms) when t O(p 2 + pø=¯) (¯ is the inverse of the data transfer rate, ø is the startup overhead). If the maximum outgoing and incoming traffic are r and c respectively, transportation can be done in (r+c)¯ time when either r O(p 2 ) or c O(p 2 ). Optimality and scalability are thus achieved when the traffic is large, a condition that is usually satisfied in practice. The algorithm was implemented on the Connection Machine CM-5. The implementation used the low latency communication primitives (active messages) available on the CM 5, but the algorithm as such is architecture independent. An alternate single-stage algorithm using distributed random scheduling was implemented on the CM-5 and the performance of the two algorithms were compared.

Share

COinS
 
 

To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.