# Partitioned Parallel Radix Sort

@article{Lee2002PartitionedPR, title={Partitioned Parallel Radix Sort}, author={Shindong Lee and Minsoo Jeon and Dongseung Kim and Andrew Sohn}, journal={J. Parallel Distributed Comput.}, year={2002}, volume={62}, pages={656-668} }

Load balanced parallel radix sort solved the load imbalance problem present in parallel radix sort. By redistributing the keys in each round of radix, each processor has exactly the same number of keys, thereby reducing the overall sorting time. Load balanced radix sort is currently known as the fastest internal sorting method for distributed-memory multiprocessors. However, as the computation time is balanced, the communication time emerges as the bottleneck of the overall sorting performance… Expand

#### Figures, Tables, and Topics from this paper

#### 52 Citations

Partitioned Parallel Radix Sort

- Computer Science
- ISHPC
- 2000

A new parallel radix sorter is presented that solves the communication problem of balanced radix sort, called partitioned parallel radIX sort, and reduces the communication time by eliminating the redistribution steps. Expand

PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort

- Computer Science
- Proc. VLDB Endow.
- 2015

A novel parallel in-place radix sort algorithm, PARADIS, which addresses both problems: a "speculative permutation" solves the first problem by assigning multiple non-continuous array stripes to each processor, and the resulting shared-nothing scheme achieves full parallelization. Expand

Load-balanced parallel merge sort on distributed memory parallel computers

- Computer Science
- Proceedings 16th International Parallel and Distributed Processing Symposium
- 2002

Sort can be speeded up on parallel computers by dividing and computing data individually in parallel. Merge sort can be parallelized, however, the conventional algorithm implemented on distributed… Expand

A full parallel radix sorting algorithm for multicore processors

- Computer Science
- 2011

This paper introduces PARL, a parallel left radix sorting algorithm for use on ordinary shared memory multi core machines, that has just one simple statement in its sequential part and is user friendly in that the user calling this PARL algorithm does not have to deal with creating threads themselves; to sort their data, they just create a sorting object and make a call to a thread safe method in that object. Expand

Parallel Merge Sort with Load Balancing

- Computer Science
- International Journal of Parallel Programming
- 2004

The proposed load-balanced merge sort utilizes all processors throughout the computation and evenly distributes data to all processors in each stage, thereby every processor is forced to work in all phases. Expand

Parallelizing Merge Sort onto Distributed Memory Parallel Computers

- Computer Science
- ISHPC
- 2002

Merge sort is useful in sorting a great number of data progressively, especially when they can be partitioned and easily collected to a few processors. Merge sort can be parallelized, however,… Expand

High-speed parallel external sorting of data with arbitrary distribution

- Computer Science
- Int. J. High Perform. Comput. Netw.
- 2004

Two distribution-insensitive scalable parallel external sorting algorithms that use sampling technique and histogram counts to achieve even distribution of keys, which eventually contribute to achieve good performance are developed. Expand

Theoretically-Efficient and Practical Parallel In-Place Radix Sorting

- Computer Science
- SPAA
- 2019

The performance of Regions Sort is compared to existing parallel in-place and out-of-place sorting algorithms on a variety of input distributions and shown to be faster than optimized out- of-place radix sorting and comparison sorting algorithms. Expand

Modified Pure Radix Sort for Large Heterogeneous Data Set

- Computer Science
- 2012

This research paper presents a new modified pure radix sort algorithm for large heterogeneous data set, which tries to optimize all related problems of radixsort through this algorithm. Expand

Communication-efficient bitonic sort on a distributed memory parallel computer

- Computer Science
- Proceedings. Eighth International Conference on Parallel and Distributed Systems. ICPADS 2001
- 2001

An efficient way of data communication in bitonic sort to minimize the interprocessor communication and comparison time is presented and it is believed the scheme is a good way to shorten the communication time in similar applications. Expand

#### References

SHOWING 1-10 OF 26 REFERENCES

Partitioned Parallel Radix Sort

- Computer Science
- ISHPC
- 2000

A new parallel radix sorter is presented that solves the communication problem of balanced radix sort, called partitioned parallel radIX sort, and reduces the communication time by eliminating the redistribution steps. Expand

Load balanced parallel radix sort

- Computer Science
- ICS '98
- 1998

Experimental results indicate that balanced radix sort can sort OSG integers in 20 seconds and 128M doubles in 15 seconds on a 64-processor SPZWN while yielding over 40-fold speedup. Expand

Communication-efficient bitonic sort on a distributed memory parallel computer

- Computer Science
- Proceedings. Eighth International Conference on Parallel and Distributed Systems. ICPADS 2001
- 2001

An efficient way of data communication in bitonic sort to minimize the interprocessor communication and comparison time is presented and it is believed the scheme is a good way to shorten the communication time in similar applications. Expand

Parallel algorithms for personalized communication and sorting with an experimental study (extended abstract)

- Computer Science
- SPAA '96
- 1996

A novel variation on sample sort which uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead and performance is invariant over the set of input distributions unlike previous efficient algorithms. Expand

Fast Parallel Sorting Under LogP: Experience with the CM-5

- Computer Science
- IEEE Trans. Parallel Distributed Syst.
- 1996

The LogP model is shown to be a valuable guide in the development of parallel algorithms and a good predictor of implementation performance; the model encourages the use of data layouts which minimize communication and balanced communication schedules which avoid contention. Expand

Minimizing Communication in the Bitonic Sort

- Computer Science
- IEEE Trans. Parallel Distributed Syst.
- 2000

A recirculating bitonic sorting network is presented, which is composed of one level of N/2 comparators plus an /spl Omega/-network of (log N-1) switch levels, which reduces the cost complexity to O(N log N) compared with the O (N log/sup 2/ N) of the original bitonic sorted network, while preserving the same time complexity. Expand

Parallel sorting and data partitioning by sampling

- Computer Science
- 1983

The analysis is developed for parallel sorting in a local network environment, with distributed data sets in secondary storage devices, and a data partitioning method by sampling is proposed. Expand

Identifying the capability of overlapping computation with communication

- Computer Science
- Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique
- 1996

Experimental results indicate that both multiprocessors would yield up to 30% to 40% overlap of communication time when the message size is approximately 1K integers. Expand

Sorting networks and their applications

- Computer Science
- AFIPS '68 (Spring)
- 1968

To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several… Expand

Tight Bounds on the Complexity of Parallel Sorting

- Mathematics, Computer Science
- IEEE Transactions on Computers
- 1985

Tight upper and lower bounds are proved on the number of processors, information transfer, wire area, and time needed to sort N numbers in a bounded-degree fixed-connection network. Expand