Genetic algorithms for graph partitioning and incremental graph partitioning

Partitioning graphs into equally large groups of nodes, while minimizing the number of edges between different groups, is an extremely important problem in parallel computing. This paper presents genetic algorithms for suboptimal graph partitioning, with new crossover operators (KNUX, DKNUX) that lead to orders of magnitude improvement over traditional genetic operators in solution quality and speed. Our method can improve on good solutions previously obtained by using other algorithms or graph theoretic heuristics in minimizing the total communication cost or the worst case cost of communication for a single processor. We also extend our algorithm to Incremental Graph Partitioning problems, in which the graph structure or system properties change with time.


Introduction
Graph partitioning is the task of dividing the nodes of a graph into groups called parts (or bins) in such a way that each part has roughly the same number of nodes, while minimizing the cut-size, i.e., the number of edges that connect nodes in different parts. This problem has important applications in parallel computing. For instance, efficiently parallelizing many scientific and engineering applications requires partitioning data or tasks among processors, such that the computational load on each node is roughly the same, while inter-processor communication is minimized.
This is a revised version of a paper that appeared in Proc. IEEE Supercomputing Conf., 1994. Sanjay Ranka is currently at the University of Florida, Gainesville. His work was partially supported by NSF grant CCR-9110812 and DARPA contract #DABY63-91-C-0028. The contents of this paper do not necessarily reflect the position or policy of the United States government, and no official endorsement should be inferred. ISSN 1063-9535. Copyright (c) 1994 IEEE. All rights reserved.
Obtaining exact solutions for graph partitioning is computationally intractable, and several suboptimal methods have been suggested for finding good solutions to the graph partitioning problem. Important heuristics include recursive coordinate bisection, recursive graph bisection, recursive spectral bisection, mincut based methods, clustering techniques, geometry-based mapping, block-based spatial decomposition, and scattered decomposition [3,11,12,15].
We present genetic algorithms for graph partitioning, using new crossover operators that utilize information available from the history of genetic search. Our work is characterized by the following features:
1. Use of prior information to improve solutions.
2. Efficient partitioning of graphs to which incremental updates are made.
3. Parallelizability, using a distributed genetic algorithm model.
4. Refinement of parts obtained by other methods.
5. Optimization of the worst case communication cost, a non-differentiable function.
We have obtained excellent results due to newly developed genetic recombination operators (KNUX and DKNUX) that exploit domain-specific knowledge. These give improved solutions and faster convergence rates when compared with the traditional crossover operators. Exact comparisons of the different algorithms are not available due to the unavailability of benchmark problems and results. However, our experiments with the traditional crossover operators used by some of these researchers gave results of lower quality than using the operators presented in this paper.
The results achieved by our methods are better than or comparable to those of the best known methods for graph partitioning, for graphs with a few hundred nodes. The quality of solutions obtained using DKNUX is competitive with recursive spectral bisection as a graph partitioning strategy, especially for incremental graph partitioning. However, genetic algorithms do require much more execution time than greedy algorithms, and are recommended in applications where the quality of solution is important enough to warrant the extra computational effort. Fortunately, GAs are readily parallelizable, with near-linear speedups. A prior graph contraction step should precede the partitioning of very large graphs using GAs. Section 2 describes the task addressed by the genetic algorithm. Section 3 describes how genetic algorithms are applied to this problem. Experimental results are given in Section 4.

Graph Partitioning as Optimization
Let $V$ denote the set of vertices of the graph to be partitioned, and let $E$ denote its edges. The graph partitioning problem consists of finding an assignment scheme $M : V \to P$ that maps vertices to $n$ parts.
We denote by $B(q)$ the set of vertices assigned to a part $q$, i.e., $B(q) = \{v \in V : M(v) = q\}$. Edges may connect physically proximate vertices in graphs representing the computational structure of a physical domain.
Graph partitioning is a multi-objective optimization problem, since both load imbalance and communication costs must be minimized. These objectives are often achieved by minimizing the composite cost $\sum_q I(q) + \lambda \sum_q C(q)$, where $I(q)$ is the load imbalance attributed to part $q$, $C(q)$ is the communication cost attributed to part $q$, and $\lambda$ expresses the relative importance of the two objectives. This composite cost function focuses on the total communication cost; an alternative is to minimize $\sum_q I(q) + \lambda \max_q C(q)$, which focuses on the communication cost for the worst part. Our methods work with either formulation. For domain decomposition methods, optimizing the latter function is more desirable. The former is often used because most traditional methods require differentiability of the function being optimized.
The weight $w_i$ corresponds to the computation cost (or weight) of a vertex $v_i \in V$. The average load of each part is $\sum_{v_i \in V} w_i / n$. We define the load imbalance attributed to the $q$th part as $I(q) = \left( \sum_{v_i \in B(q)} w_i - \sum_{v_i \in V} w_i / n \right)^2$, where $n$ is the number of parts into which the graph must be partitioned.
The communication cost $w_e(v_1, v_2)$ corresponding to an edge describes the amount of interaction between vertices $v_1$ and $v_2$. The cost of all the outgoing edges from a part is $C(q) = \sum_{v_i \in B(q),\, v_j \notin B(q)} w_e(v_i, v_j)$.
Genetic algorithms attempt to maximize a "fitness" function, whose value is relatively high for candidate solutions of better quality. Our experiments were conducted using two fitness functions, which assume unit (equal) computation cost ($w_i$) for each node, unit communication cost ($w_e$) for each edge, and $\lambda = 1$.
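The two cost formulations of this section can be evaluated directly from a part assignment. The following is a minimal sketch, not the authors' implementation: the function names, the dictionary-based edge representation, and the placement of the weighting factor (`lam`) on the communication term are assumptions made for illustration.

```python
def part_costs(node_w, edge_w, assign, n_parts):
    """Return (I, C): per-part load imbalance and outgoing-edge cost.

    node_w : list of vertex weights w_i
    edge_w : dict mapping an undirected edge (u, v) to its weight w_e
    assign : assign[v] is the part of vertex v
    """
    avg = sum(node_w) / n_parts              # average load per part
    load = [0.0] * n_parts
    for v, q in enumerate(assign):
        load[q] += node_w[v]
    imbalance = [(l - avg) ** 2 for l in load]
    comm = [0.0] * n_parts
    for (u, v), w in edge_w.items():
        if assign[u] != assign[v]:           # edge leaves both of its parts
            comm[assign[u]] += w
            comm[assign[v]] += w
    return imbalance, comm


def total_cost(I, C, lam=1.0):
    # Composite cost based on total communication (lam is an assumed name).
    return sum(I) + lam * sum(C)


def worst_case_cost(I, C, lam=1.0):
    # Composite cost based on the worst part; not differentiable.
    return sum(I) + lam * max(C)


# Hypothetical 4-vertex weighted cycle split into two parts.
node_w = [1, 1, 1, 1]
edge_w = {(0, 1): 1, (1, 2): 3, (2, 3): 1, (0, 3): 2}
I, C = part_costs(node_w, edge_w, [0, 0, 1, 1], 2)
print(total_cost(I, C), worst_case_cost(I, C))
```

Since each cut edge contributes to the cost of both parts it joins, the sum of the per-part communication costs is twice the cut-size when all edge weights are one.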

Representation
For graph partitioning, we select a vector representation for each individual (candidate solution), in which the $i$th element of an individual is $j$ if and only if the $i$th node of the graph is allocated to the part labelled $j$. For instance, the string 11100011 represents the mapping that assigns nodes 1,2,3,7,8 to part (processor) 1 and nodes 4,5,6 to part (processor) 0. According to our definitions of fitness, if the graph is one in which the $i$th node is adjacent to the $(i+1)$st node for each $i$, then 11100011 would be less fit than 11100001 (which is a more balanced partition), but more fit than 10101011 (which has 6 inter-part edges).
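The comparison of the three example strings can be checked with a few lines of code. This sketch assumes the path graph described above (node i adjacent to node i+1) and two parts; the helper names are illustrative only.

```python
def decode(s):
    """Part label of each node; s[i] is the part of node i+1."""
    return [int(ch) for ch in s]


def part_sizes(assign, n_parts=2):
    """Number of nodes allocated to each part."""
    sizes = [0] * n_parts
    for q in assign:
        sizes[q] += 1
    return sizes


def path_cut_size(assign):
    """Inter-part edges when node i is adjacent to node i+1."""
    return sum(1 for x, y in zip(assign, assign[1:]) if x != y)


for s in ("11100011", "11100001", "10101011"):
    a = decode(s)
    print(s, part_sizes(a), path_cut_size(a))
# 11100011 [3, 5] 2
# 11100001 [4, 4] 2
# 10101011 [3, 5] 6
```

The first two strings have the same cut-size but the second is perfectly balanced, while the third matches the first in balance but has three times as many inter-part edges.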

Crossover
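One-point crossover [4] works by selecting a site $s$ in chromosomes $a_1 \ldots a_n$ and $b_1 \ldots b_n$ to produce $a_1 \ldots a_s b_{s+1} \ldots b_n$ and $b_1 \ldots b_s a_{s+1} \ldots a_n$. A popular generalization is 2-point crossover, in which the parents $a_1 \ldots a_n$ and $b_1 \ldots b_n$ produce offspring $a_1 \ldots a_s b_{s+1} \ldots b_t a_{t+1} \ldots a_n$ and $b_1 \ldots b_s a_{s+1} \ldots a_t b_{t+1} \ldots b_n$. This has been further generalized to 'k-point crossover'. In uniform crossover (UX) [14], the $i$th component of an offspring is chosen to be the same as that of one of the two parents, with equal probability.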
UX ignores the fact that one parent may have much better genetic material than another, or that one region of the search space is already known to produce individuals of higher fitness than other regions. UX can be described in terms of a bit-vector mask, each bit of which determines the parent from which an offspring inherits a value for a particular bit-position.
Our new Knowledge-based Non-Uniform Crossover operator (KNUX) generalizes this idea, using a bias probability vector $p = (p_1, \ldots, p_n)$, where each $p_i$ is a real number in $[0, 1]$. The value of each bias probability $p_i$ depends on $i$, the relative fitness of the parent strings, and on problem-specific knowledge. Given $p$ and the two parents, $a = (a_1, \ldots, a_n)$ and $b = (b_1, \ldots, b_n)$, the offspring $c = (c_1, \ldots, c_n)$ is obtained such that if $a_i = b_i$, then $c_i = a_i$, else the probability that $c_i = a_i$ is $p_i$.
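The recombination rule just stated can be sketched as follows. This is an illustrative sketch that takes the bias vector as given; the function name `knux` and the optional `rng` parameter are assumptions, not names from the paper.

```python
import random


def knux(a, b, p, rng=random):
    """Biased crossover: where parents agree, copy the shared value;
    where they differ, take a[i] with probability p[i], else b[i]."""
    child = []
    for ai, bi, pi in zip(a, b, p):
        if ai == bi:
            child.append(ai)
        else:
            # rng.random() is in [0, 1), so p[i] = 1 always picks a[i]
            # and p[i] = 0 always picks b[i].
            child.append(ai if rng.random() < pi else bi)
    return child


a = [1, 1, 1, 0, 0, 0, 1, 1]
b = [1, 0, 1, 0, 1, 0, 1, 1]
print(knux(a, b, [0.5] * len(a)))   # every p_i = 0.5 behaves like UX
```

Setting every bias probability to 0.5 recovers uniform crossover, which is the sense in which this rule generalizes UX.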
For graph partitioning, an initial candidate solution $I$ is first generated. For any node $i$ and candidate solution $X$, let $\#(i, X, I)$ be the number of neighbors of node $i$ in the graph under consideration that are allocated by $I$ to part $X_i$. If $a$ and $b$ are the two parents, then the bias probabilities $p_i$ are defined from the counts $\#(i, a, I)$ and $\#(i, b, I)$.
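The neighbor count used here is straightforward to compute. The sketch below also builds a bias vector from it, but note that the exact formula for the bias probabilities does not appear in this text: the ratio used in `bias_vector`, including the value 0.5 when both counts are zero, is an assumption chosen for illustration, as are all function names.

```python
def neighbor_agreement(i, x, est, adj):
    """#(i, X, I): neighbors of node i that the estimate `est`
    allocates to the part x[i] chosen for node i by candidate x."""
    return sum(1 for j in adj[i] if est[j] == x[i])


def bias_vector(a, b, est, adj):
    """ASSUMED rule: p_i is the share of the agreement count that
    belongs to parent a, or 0.5 when neither parent has support."""
    p = []
    for i in range(len(a)):
        na = neighbor_agreement(i, a, est, adj)
        nb = neighbor_agreement(i, b, est, adj)
        p.append(0.5 if na + nb == 0 else na / (na + nb))
    return p


# Hypothetical 4-node path graph, given as adjacency lists.
adj = [[1], [0, 2], [1, 3], [2]]
est = [0, 0, 1, 1]            # heuristic estimate I
a = [0, 0, 1, 1]
b = [1, 1, 0, 0]
print(bias_vector(a, b, est, adj))   # [1.0, 0.5, 0.5, 1.0]
```

Under this assumed rule, a parent whose choice for node i agrees with where the estimate places the neighbors of i is favored at position i, which is one way of exploiting locality in the graph.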

Dynamic KNUX (DKNUX)
The quality of solutions obtained by KNUX depends on the quality of the heuristic estimate (I above) used to derive bias probabilities. It is therefore important to obtain a good, fast heuristic estimate of a solution. DKNUX utilizes information inherent in the history of the genetic search, and continually updates the estimate I to be the current best solution, using this to build the bias vector.

Distributed Population Model
We use a coarse-grained, distributed-population genetic algorithm (DPGA), where individuals are distributed into various subpopulations which may be physically located on different processors configured in some architecture (e.g., mesh). Crossovers are restricted to occur between members of the same subpopulation. Each subpopulation periodically communicates copies of its best individuals to its neighboring subpopulations (situated on neighboring processors in the parallel architecture); this is how genetic information is exchanged.
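One migration step of such a model might look like the sketch below. It assumes a hypercube arrangement of subpopulations (the configuration used in the experiments reported later), a single migrant per neighbor, and a replace-the-worst policy applied only when the migrant is strictly fitter; the replacement policy and all names are assumptions, since the text does not specify them.

```python
def hypercube_neighbors(k, dim):
    """Subpopulations whose index differs from k in exactly one bit."""
    return [k ^ (1 << d) for d in range(dim)]


def migrate(subpops, fitness, dim):
    """Each subpopulation receives a copy of the best individual of
    every neighbor; a migrant replaces the current worst individual
    only if it is strictly fitter (assumed policy)."""
    # Snapshot the best individuals first so that all exchanges use
    # the state before migration.
    bests = [max(pop, key=fitness) for pop in subpops]
    for k, pop in enumerate(subpops):
        for nb in hypercube_neighbors(k, dim):
            worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
            if fitness(bests[nb]) > fitness(pop[worst]):
                pop[worst] = list(bests[nb])


# Toy example: 4 subpopulations on a 2-dimensional hypercube,
# with fitness taken to be the number of ones (illustration only).
subpops = [[[0, 0], [0, 1]], [[1, 1], [0, 0]],
           [[0, 0], [0, 0]], [[1, 0], [0, 0]]]
migrate(subpops, sum, 2)
print(subpops)
```

Crossover itself stays inside each subpopulation; only this periodic exchange couples them, which keeps communication between processors low.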

Population Initialization
The initial population can be seeded with a pre-estimated heuristic solution such as that obtained through an Index Based Partitioning scheme or the results of recursive spectral bisection. In the incremental case, the previous partitioning can itself be used to generate a good partitioning for the changed graph by randomly assigning new graph nodes to various parts, while at the same time ensuring that balance is maintained.
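For the incremental case, a seed individual can be built by extending the previous assignment. The text does not say how balance is enforced, so the rule below (each new node goes to a part chosen at random among the currently least loaded ones) is an assumption, as is the function name.

```python
import random


def seed_incremental(prev_assign, n_new, n_parts, rng=random):
    """Keep the old assignment and place n_new additional nodes,
    each in a randomly chosen least-loaded part (assumed rule)."""
    assign = list(prev_assign)
    loads = [0] * n_parts
    for q in assign:
        loads[q] += 1
    for _ in range(n_new):
        low = min(loads)
        q = rng.choice([p for p in range(n_parts) if loads[p] == low])
        assign.append(q)
        loads[q] += 1
    return assign


prev = [0, 0, 0, 1, 1]          # partition of the original 5-node graph
print(seed_incremental(prev, 3, 2))
```

Calling this several times yields different balanced seeds, so the initial population retains some diversity while starting close to the previous solution.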

Hill Climbing
It is possible to perform hill-climbing on offspring, to obtain the nearest local optimum of the fitness function. Only the "boundary points" of each part (with neighbors in other parts) are examined to see if migrating them to the appropriate neighboring part improves fitness.
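A boundary-only hill-climbing pass could be sketched as follows. The sketch assumes unit node and edge weights, uses the total-communication composite cost with a weighting factor of 1 as the quantity to reduce, and accepts any strictly improving move; these choices and the function names are assumptions rather than details given in the text.

```python
def cost(adj, assign, n_parts, lam=1.0):
    """Sum of squared load imbalances plus lam times the summed
    per-part cut counts (each cut edge is counted for both parts)."""
    avg = len(assign) / n_parts
    loads = [0] * n_parts
    for q in assign:
        loads[q] += 1
    imbalance = sum((l - avg) ** 2 for l in loads)
    cut = sum(1 for u in range(len(assign))
              for v in adj[u] if assign[u] != assign[v])
    return imbalance + lam * cut


def hill_climb(adj, assign, n_parts, lam=1.0):
    """Move boundary nodes to a neighboring part while this lowers cost."""
    assign = list(assign)
    best = cost(adj, assign, n_parts, lam)
    improved = True
    while improved:
        improved = False
        for u in range(len(assign)):
            # Parts of u's neighbors other than u's own part; this set
            # is empty unless u is a boundary point.
            targets = sorted({assign[v] for v in adj[u]} - {assign[u]})
            for q in targets:
                old = assign[u]
                assign[u] = q
                c = cost(adj, assign, n_parts, lam)
                if c < best:
                    best, improved = c, True
                else:
                    assign[u] = old      # undo a non-improving move
    return assign, best


# Path graph on 8 nodes, starting from the string 11100011.
adj = [[1]] + [[i - 1, i + 1] for i in range(1, 7)] + [[6]]
print(hill_climb(adj, [1, 1, 1, 0, 0, 0, 1, 1], 2))
# ([1, 1, 0, 0, 0, 0, 1, 1], 4.0)
```

Recomputing the full cost for every trial move keeps the sketch short; an implementation concerned with speed would update the cost incrementally from the moved node's neighborhood.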

Experimental Results
In this section, we compare the results obtained using our approach with those of traditional heuristics (e.g., IBP or RSB) as well as with genetic algorithms that invoke traditional crossover operators. The figures are obtained by averaging the results of 5 runs, and the tables represent the best solutions obtained in these 5 runs. All experiments were done with algorithm DPGA set with a total population size of 320. The crossover rate is $p_c = 0.7$ and the mutation rate is $p_m = 0.01$. Tables 1, 2 and 3 report $\sum_q C(q)/2$ values, while Tables 4, 5 and 6 report $\max_q C(q)$ values, where $C(q)$ is the number of edges that cut across part $q$.
Experiments were conducted with a single population as well as with 16 subpopulations configured as a four dimensional hypercube. Graphs with unit weight nodes and edges were assumed, although weighted edges and nodes can also be handled easily. For clarity, the cut-size numbers are given in the tables, instead of the actual fitness function values; for graph partitioning, smaller cut-size numbers indicate superior performance. The results establish very clearly the excellent performance of KNUX and DKNUX in comparison with two-point crossover and also that DKNUX is competitive with recursive spectral bisection as a graph partitioning strategy.

Improving solutions obtained using other methods
Fast heuristic algorithms can be used to obtain an initial candidate solution which is then improved by applying the genetic algorithm. Table 1

Incremental Graph Partitioning
For this series of experiments, we start with a graph, partition it, then modify it by adding some number of nodes in a local area chosen randomly within the graph. The modified graphs are then partitioned. "Worst Cut" refers to $\max_q C(q)$, where $C(q)$ is the number of edges leading out of part $q$. For the GA, the maximum number of edges leading out of a part is reported, for the best individual explored by the GA.

Minimizing Worst Case Communication Cost
Unlike other methods which can be used only with a differentiable optimization function, genetic algorithms can be used directly to optimize $\sum_q I(q) + \lambda \max_q C(q)$, a task that cannot be attempted with methods that require availability of the first derivative of the function to be optimized. Table 4 exhibits the effect of partitioning graphs of 78, 88, 98, 144 and 167 nodes into 4 and 8 parts. Table 4 shows that the best solution found using the DKNUX operator is better than that obtained using RSB in most cases. In other cases, improvements can be obtained by seeding the initial population with a heuristically obtained good solution such as that of the index based partitioner.

Conclusions
We have solved the graph partitioning problem using GAs with new knowledge-based crossover operators; problem-specific knowledge is used to generate bias probabilities, and the "environment" and current population play roles in controlling genetic expression. The trajectory that the population takes in search space is constrained, driving evolution in certain preferred directions.
We have introduced novel operators that exploit the locality information inherent in most computational graphs. We have shown that this enhances the speed and performance of genetic search by orders of magnitude. We have demonstrated that genetic algorithms can be used to greatly refine previously estimated parts with the help of KNUX and DKNUX. We show how the strategies discussed in this paper extend naturally to incremental graph partitioning. The incremental partitioning results obtained using DKNUX could not be obtained by a simple deterministic algorithm that assigns each new node to the part to which most of its nearest neighbors belong. Performance can further be improved by incorporating a hill-climbing step.
We have presented preliminary results showing the feasibility of this approach and the gains obtainable by examining the history of the search process; unfortunately, partitioning very large graphs does require high amounts of computation by the genetic algorithm. A prior graph contraction step would allow these techniques to be applied to graphs much larger than those explored in this paper [13]. Some gains can be expected from executing the GA on parallel computers, since DPGA is an inherently parallel algorithm from which we can expect near-linear speedups. We are currently parallelizing the algorithm to run on distributed memory machines such as the CM-5 and the Intel Paragon.
Index-based algorithms to partition graphs have been described in [10]. An IBP algorithm includes three phases: indexing, sorting, and coloring. The indexing scheme is based on converting an N-dimensional coordinate into a one-dimensional index such that proximity in the multi-dimensional space is maintained. Row-major indexing and shuffled row-major indexing are two of the several ways of indexing pixels in a two-dimensional grid. These two indexing schemes are shown in Figure 1 for a graph in which the vertices are arranged in a grid of size 8 × 8. A simple example of interleaving indices is as follows. Suppose index_1 = 001, index_2 = 010, and index_3 = 110. Then the interleaved index would be 001011100. In the above case the number of bits in each dimension is equal. This could easily be generalized to cases when the sizes are different. For example, if index_1 = 101, index_2 = 01, and index_3 = 0, then the interleaved index would be 100110. This is done by choosing bits (right to left) of each of the dimensions one by one, starting from dimension 3. When the bits of a particular dimension are no longer available, that dimension is not considered.
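The right-to-left interleaving rule can be written down compactly. The following sketch operates on bit strings so that the two examples above can be reproduced literally; the function name `interleave` is illustrative.

```python
def interleave(indices):
    """Interleave bit strings, given with dimension 1 first.

    Bits are taken right to left, one per dimension in each round,
    starting from the last dimension; a dimension that has run out
    of bits is skipped."""
    picked = []                                # collected right to left
    longest = max(len(s) for s in indices)
    for k in range(1, longest + 1):            # k-th bit from the right
        for s in reversed(indices):            # last dimension first
            if k <= len(s):
                picked.append(s[-k])
    return "".join(reversed(picked))


print(interleave(["001", "010", "110"]))   # 001011100
print(interleave(["101", "01", "0"]))      # 100110
```

Because the most significant bits of every dimension end up at the front of the result, vertices that are close in all dimensions tend to receive nearby one-dimensional indices.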
After indexing is done, an efficient sorting algorithm can be applied to sort these vertices according to their indices. Finally, this sorted list is divided into P equal sublists.
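Put together, the three phases amount to very little code for vertices on a two-dimensional grid. The sketch below assumes that shuffled row-major indexing corresponds to interleaving the bits of the row and column numbers, and that the number of vertices is divisible by the number of parts; the function names are illustrative.

```python
def shuffled_index(row, col, bits):
    """Interleave the bits of row and col, most significant first
    (assumed form of shuffled row-major indexing)."""
    idx = 0
    for k in reversed(range(bits)):
        idx = (idx << 1) | ((row >> k) & 1)
        idx = (idx << 1) | ((col >> k) & 1)
    return idx


def ibp_partition(coords, n_parts, bits):
    """Index, sort, then cut the sorted list into equal sublists.
    Returns part[v] for each vertex v with coordinates coords[v]."""
    order = sorted(range(len(coords)),
                   key=lambda v: shuffled_index(coords[v][0], coords[v][1], bits))
    part = [0] * len(coords)
    for rank, v in enumerate(order):
        part[v] = rank * n_parts // len(coords)   # which sublist v falls in
    return part


# 8 x 8 grid of vertices split into 4 parts.
coords = [(r, c) for r in range(8) for c in range(8)]
part = ibp_partition(coords, 4, 3)
for r in range(8):
    print("".join(str(part[r * 8 + c]) for c in range(8)))
```

On this grid the four parts come out as the four 4 × 4 quadrants, whereas plain row-major indexing would instead produce four horizontal strips of two rows each, with longer boundaries between parts.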