Scalable locality

From Wikipedia, the free encyclopedia

Computer software is said to exhibit scalable locality[1] if it can continue to make effective use of a processor that outpaces its memory system as problem sizes grow. The term is the high-performance uniprocessor analog of scalable parallelism, which describes software that can employ increasing numbers of processors on ever larger problems.

YouTube Encyclopedic

  • 1/3
    Views:
    1 341
    2 865
    2 618
  • Google Cloud Platform's resilient and scalable software-defined cloud (Google Cloud Next '17)
  • Directory Example - Georgia Tech - HPCA: Part 5
  • PWLTO#11 – Peter Sobot on An Industrial-Strength Audio Search Algorithm

Transcription

Overview

Consider the memory usage patterns of the following loop nest (an iterative two-dimensional stencil computation):

for t := 0 to T do
    for i := 1 to N-1 do
        for j := 1 to N-1 do
            new(i,j) := (A(i-1,j) + A(i,j-1) + A(i,j) + A(i,j+1) + A(i+1,j)) * 0.2
        end
    end

    for i := 1 to N-1 do
        for j := 1 to N-1 do
            A(i,j) := new(i,j)
        end
    end
end

The entire loop nest touches about 2*N**2 array elements, and performs about 5*T*N**2 floating-point operations. Thus, the overall compute balance (ratio of floating-point computations to floating-point memory cells used) of this entire loop nest is about 5T/2. When the compute balance is a function of problem size, as it is here, the code is said to have scalable compute balance. Here, we could achieve any compute balance we desire by simply choosing a large enough T.

However, when N is large, this code will still not exhibit good cache reuse, due to poor locality of reference: by the time new(1,1) is needed in the second assignment, or the second time step's execution of the first assignment, the cache line holding new(1,1) will have been overwritten with some other part of one of the arrays.

Tiling of the first i/j loop nest can improve cache performance, but only by a limited factor, since that nest has compute balance of about 5/2. To produce a very high degree of locality, for example 500 (to run this code efficiently with an array that will not fit in RAM and is relegated to virtual memory), we must re-use values across time steps.
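A conventional space-only tiling of the first i/j nest looks like the sketch below; the tile edge TILE is an assumed tuning parameter. Each block of "new" is computed while its inputs stay cache-resident, but, as noted above, this cannot raise the compute balance beyond about 5/2 because values are still not reused across time steps:

```c
#define TILE 32   /* tile edge, a tuning parameter (assumption) */

/* Space-tiled version of the first i/j loop nest. */
void smooth_tiled(int n, double A[n][n], double out[n][n]) {
    for (int ii = 1; ii < n-1; ii += TILE)
        for (int jj = 1; jj < n-1; jj += TILE)
            for (int i = ii; i < (ii+TILE < n-1 ? ii+TILE : n-1); i++)
                for (int j = jj; j < (jj+TILE < n-1 ? jj+TILE : n-1); j++)
                    out[i][j] = (A[i-1][j] + A[i][j-1] + A[i][j]
                                 + A[i][j+1] + A[i+1][j]) * 0.2;
}
```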

Optimization across time steps has been explored in a number of research compilers; see work by Wonnacott,[1][2] by Song and Li,[3] or by Sadayappan et al.[4] for details of some approaches to time-tiling. Wonnacott[1] demonstrated that time tiling could be used to optimize for out-of-core data sets; in principle, any of these approaches[2][3][4] should be able to achieve arbitrarily high memory locality without requiring that the entire array fit in cache (the cache requirement does, however, grow with the required locality). The multiprocessor techniques cited above[2][4] should, in principle, simultaneously produce scalable locality and scalable parallelism.
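The cited approaches differ in detail; the sketch below only illustrates the core legality argument behind time skewing, on a simpler 1D three-point stencil (it is not the exact algorithm of any of the cited papers). Skewing space against time (s = i + t) makes every dependence point leftward or downward in the skewed space, so tiles over s can be executed left to right with all time steps completed inside each tile. For clarity the sketch stores every time step; real time-skewed and out-of-core implementations keep only a small working set. The tile width TB is an assumed tuning parameter:

```c
/* 1D three-point stencil over tsteps time steps, traversed in a
   time-skewed, tiled order.  u[t][i] holds the value at time t. */
void stencil_time_tiled(int n, int tsteps, double u[tsteps+1][n]) {
    enum { TB = 4 };   /* skewed tile width (assumed tuning parameter) */

    /* Fixed boundary values propagate unchanged through time. */
    for (int t = 1; t <= tsteps; t++) {
        u[t][0]   = u[0][0];
        u[t][n-1] = u[0][n-1];
    }

    /* Point (t, i) has skewed coordinate s = i + t and depends on
       (t-1, i-1..i+1), i.e. skewed coordinates s-2, s-1, s at t-1.
       Processing s-tiles left to right, with t ascending inside each
       tile, therefore satisfies every dependence. */
    for (int ss = 0; ss < n + tsteps; ss += TB)
        for (int t = 1; t <= tsteps; t++)
            for (int s = ss; s < ss + TB; s++) {
                int i = s - t;
                if (i >= 1 && i <= n - 2)
                    u[t][i] = (u[t-1][i-1] + u[t-1][i] + u[t-1][i+1]) / 3.0;
            }
}
```

Because each tile carries a point through many time steps while its neighborhood is still resident, the locality achieved grows with the number of time steps per tile, which is the sense in which such schemes deliver scalable locality.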

References

  1. David Wonnacott. Achieving Scalable Locality with Time Skewing. International Journal of Parallel Programming 30.3 (2002).
  2. David Wonnacott. Using Time Skewing to Eliminate Idle Time due to Memory Bandwidth and Network Limitations. International Parallel and Distributed Processing Symposium, 2000.
  3. Yonghong Song and Zhiyuan Li. New Tiling Techniques to Improve Cache Temporal Locality. PLDI '99.
  4. Sriram Krishnamoorthy, Muthu Baskaran, Uday Bondhugula, J. Ramanujam, Atanas Rountev, and P. Sadayappan. Effective Automatic Parallelization of Stencil Computations. PLDI '07.
This page was last edited on 25 October 2023, at 16:38.
The basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with the Wikimedia Foundation.