KASKADE 7 development version
Multithreading

Support routines and data structures for multithreaded execution.

Classes

class Kaskade::Kalloc
 A simple memory manager for NUMA systems.

class Kaskade::ConcurrentQueue< T >
 A concurrent FIFO queue.

class Kaskade::NumaThreadPool
 Implementation of thread pools suitable for parallelization of (more or less) memory-bound algorithms (not only) on NUMA machines.

class Kaskade::NumaAllocator< T >
 An STL allocator that uses memory of a specific NUMA node only.

class Kaskade::Mutex
 A utility class implementing appropriate copy semantics for boost mutexes.


Typedefs

typedef std::packaged_task< void()> Kaskade::Task
 Abstract interface for tasks to be scheduled for concurrent execution.

typedef std::future< void > Kaskade::Ticket
 Abstract waitable job ticket for submitted tasks.
 

Functions

void Kaskade::equalWeightRanges (std::vector< size_t > &x, size_t n)
 Computes partitioning points such that the sum of weights in each partition is roughly the same.

template<class BlockIndex , class Index >
Index Kaskade::uniformWeightRangeStart (BlockIndex i, BlockIndex n, Index m)
 Computes partitioning points of ranges for uniform weight distributions.

template<class Index >
Index Kaskade::uniformWeightRange (Index j, Index n, Index m)
 Computes the range in which an index is to be found when partitioned for uniform weights.

template<class Func >
void Kaskade::parallelFor (Func const &f, int maxTasks=std::numeric_limits< int >::max())
 A parallel for loop that executes the given functor in parallel on different CPUs.

template<class Func >
void Kaskade::parallelFor (size_t first, size_t last, Func const &f, size_t nTasks=std::numeric_limits< size_t >::max())
 A parallel for loop that executes the given functor in parallel on different CPUs.

template<class Func >
void Kaskade::parallelForNodes (Func const &f, int maxTasks=std::numeric_limits< int >::max())
 A parallel for loop that executes the given functor in parallel on different NUMA nodes.

void Kaskade::runInBackground (std::function< void()> &f)
 Executes a function in a child process.
 

Variables

std::mutex Kaskade::DuneQuadratureRulesMutex
 A global lock for the Dune::QuadratureRules factory, which is not thread-safe as of 2015-01-01.

boost::mutex Kaskade::refElementMutex
 A global lock for the Dune::GenericReferenceElement singletons, which are not thread-safe.
 

Detailed Description

Support routines and data structures for multithreaded execution.

Typedef Documentation

◆ Task

typedef std::packaged_task<void()> Kaskade::Task

Abstract interface for tasks to be scheduled for concurrent execution.

Definition at line 248 of file threading.hh.

◆ Ticket

typedef std::future<void> Kaskade::Ticket

Abstract waitable job ticket for submitted tasks.

Definition at line 254 of file threading.hh.
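
As an illustration of how the two typedefs fit together, the following minimal sketch uses only the standard library: a Task is created, its Ticket is obtained, the task is executed on a thread, and the caller waits on the ticket. The aliases are redeclared locally and the helper runTwoTasksSketch is hypothetical; how a Kaskade thread pool actually accepts tasks is not shown here.

```cpp
#include <future>
#include <thread>
#include <utility>

// Local aliases mirroring the two typedefs documented above.
using Task = std::packaged_task<void()>;
using Ticket = std::future<void>;

// Hypothetical helper: runs two tasks on their own threads and waits on their tickets.
inline int runTwoTasksSketch()
{
  int a = 0;
  int b = 0;
  Task first([&a] { a = 1; });
  Task second([&b] { b = 2; });

  // The ticket has to be obtained before the task is moved into its thread.
  Ticket firstDone = first.get_future();
  Ticket secondDone = second.get_future();

  std::thread w1(std::move(first));
  std::thread w2(std::move(second));

  // get() blocks until the task has run and rethrows any exception it raised.
  firstDone.get();
  secondDone.get();
  w1.join();
  w2.join();
  return a + b;
}
```

Waiting on the ticket, rather than only joining the thread, is what lets the caller observe an exception thrown inside the task.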

Function Documentation

◆ equalWeightRanges()

void Kaskade::equalWeightRanges ( std::vector< size_t > &  x,
size_t  n 
)

Computes partitioning points such that the sum of weights in each partition is roughly the same.

Let \( w_i \), \( 0\le i < N \), denote the weights x[i] on entry. On exit, x has size \( n+1 \) with values \( x_i \) such that \( x_0 = 0 \), \( x_n = N \), and, for \( 0 \le i < n \),

\[ \sum_{j=x_i}^{x_{i+1}-1} w_j \approx \frac{1}{n} \sum_{j=0}^{N-1} w_j. \]

Parameters
[in,out]  x  the array of (nonnegative) weights
[in]      n  the desired number of partitions (positive)
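
One way to obtain such partitioning points is to walk the prefix sums of the weights. The sketch below is a hypothetical stand-in (equalWeightRangesSketch), not the Kaskade implementation, and the rule it uses for placing each point (the first position at which the accumulated weight reaches the i-th share of the total) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in, not the Kaskade implementation: replaces the weights in x
// by n+1 partitioning points derived from the prefix sums of the weights.
inline void equalWeightRangesSketch(std::vector<std::size_t>& x, std::size_t n)
{
  assert(n > 0);
  std::size_t const N = x.size();

  // prefix[p] is the sum of the first p weights.
  std::vector<std::size_t> prefix(N + 1, 0);
  std::partial_sum(x.begin(), x.end(), prefix.begin() + 1);
  std::size_t const total = prefix[N];

  std::vector<std::size_t> points(n + 1, 0);
  std::size_t p = 0;
  for (std::size_t i = 1; i < n; ++i)
  {
    // Advance to the first position whose accumulated weight reaches i/n of the total.
    // The products may overflow for very large weights; a sketch only.
    while (p < N && prefix[p] * n < total * i)
      ++p;
    points[i] = p;
  }
  points[n] = N;
  x = points;
}
```

For the weights {4, 1, 1, 1, 1} and n = 2, this sketch yields the points {0, 1, 5}, so both partitions carry a weight of 4.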

Referenced by Kaskade::NumaCRSPattern< Index >::NumaCRSPattern().

◆ parallelFor() [1/2]

template<class Func >
void Kaskade::parallelFor ( Func const &  f,
int  maxTasks = std::numeric_limits<int>::max() 
)

A parallel for loop that executes the given functor in parallel on different CPUs.

Template Parameters
Func  a functor with an operator()(int i, int n)
Parameters
f         the functor to call
maxTasks  the maximal number of tasks to create

The given function object is called in parallel for values (i,n) with i ranging from 0 to n-1, i.e. each call is the i-th of n calls in total. The function object is responsible for partitioning the work appropriately, e.g., using uniformWeightRange(), and for performing locking if needed. The number n is the same for all calls and is guaranteed not to exceed maxTasks (but may be smaller).

The function returns only after the last task has completed. The computational effort for tasks 0,...,n-1 should therefore be roughly equal, whatever the value of n. For optimal performance, maxTasks should be either a small multiple of the number of CPUs in the system (so that ideally all CPUs are busy the whole time) or much larger than that (so that an imbalance has only a small impact).
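
The calling convention can be illustrated with a self-contained sketch. Here parallelForSketch is a hypothetical stand-in built on std::thread (the real function presumably schedules on a Kaskade thread pool), and the choice n = min(number of CPUs, maxTasks) as well as the partitioning rule floor(i*m/n) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the documented call pattern: invokes f(i,n) for
// i = 0..n-1 concurrently and returns once all calls have finished.
template <class Func>
void parallelForSketch(Func const& f, int maxTasks)
{
  int const nCpus = static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
  int const n = std::max(1, std::min(nCpus, maxTasks));
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i)
    workers.emplace_back([&f, i, n] { f(i, n); });
  for (auto& w : workers)
    w.join();
}

// Sums v; task i handles the index range [i*m/n, (i+1)*m/n).
inline long parallelSumSketch(std::vector<int> const& v, int maxTasks)
{
  long total = 0;
  std::mutex totalMutex;
  std::size_t const m = v.size();
  parallelForSketch([&](int i, int n) {
    // The functor itself partitions the work among the n calls.
    std::size_t const begin = static_cast<std::size_t>(i) * m / static_cast<std::size_t>(n);
    std::size_t const end = static_cast<std::size_t>(i + 1) * m / static_cast<std::size_t>(n);
    long local = 0;
    for (std::size_t k = begin; k < end; ++k)
      local += v[k];
    // The functor also does the locking it needs: the shared total is guarded.
    std::lock_guard<std::mutex> lock(totalMutex);
    total += local;
  }, maxTasks);
  return total;
}
```

Accumulating into a local variable and taking the lock once per task keeps contention low, which matters because the slowest task determines when the loop returns.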

Definition at line 489 of file threading.hh.

Referenced by Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::apply(), Kaskade::VariationalFunctionalAssembler< F, SparseIndex, BoundaryDetector, QuadRule >::assemble(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::deformedProlongationStack(), Kaskade::getContactConstraints(), Kaskade::gridIterate(), Kaskade::NumaBCRSMatrix< Entry, Index >::operator*(), Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::PatchDomainDecompositionPreconditioner(), Kaskade::prolongationStack(), Kaskade::TransferData< Space, CoarseningPolicy >::TransferData(), and Kaskade::BDDC::BDDCSolver< Subdomain >::update_rhs().

◆ parallelFor() [2/2]

template<class Func >
void Kaskade::parallelFor ( size_t  first,
size_t  last,
Func const &  f,
size_t  nTasks = std::numeric_limits<size_t>::max() 
)

A parallel for loop that executes the given functor in parallel on different CPUs.

Template Parameters
Func  a functor with an operator()(int i)
Parameters
first   the first iteration index, usually 0
last    one after the last iteration index; with first = 0 this is the number of iterations. Precondition: last >= first.
f       the functor to call
nTasks  an upper bound on the number of tasks to create

The given function is called last-first times, exactly once for each iteration index, with the only argument being the iteration index. The granularity can be quite fine. The computational work for different iteration indices can be rather different without sacrificing parallel efficiency.

min(last-first,nCpus,nTasks) tasks are created. Each task grabs the next iteration index that has not yet been executed, calls the functor with that index, and repeats until no indices remain.
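
The scheduling scheme described above can be sketched with a shared atomic counter from which every task draws its next index. The function dynamicParallelForSketch and the helper squaresSketch are hypothetical, and the use of std::thread instead of a thread pool is an assumption made to keep the example self-contained.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in: calls f(i) exactly once for each i in [first,last).
template <class Func>
void dynamicParallelForSketch(std::size_t first, std::size_t last, Func const& f, std::size_t nTasks)
{
  std::size_t const nCpus = std::max<std::size_t>(1, std::thread::hardware_concurrency());
  std::size_t const n = std::min({last - first, nCpus, nTasks});

  std::atomic<std::size_t> next(first);   // next iteration index not yet handed out
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < n; ++t)
    workers.emplace_back([&next, &f, last] {
      // Grab an index, process it, repeat until the range is exhausted.
      for (std::size_t i = next.fetch_add(1); i < last; i = next.fetch_add(1))
        f(i);
    });
  for (auto& w : workers)
    w.join();
}

// Fills out[i] = i*i; each index is written by exactly one task, so no locking is needed.
inline std::vector<std::size_t> squaresSketch(std::size_t count)
{
  std::vector<std::size_t> out(count, 0);
  dynamicParallelForSketch(0, count, [&out](std::size_t i) { out[i] = i * i; }, 8);
  return out;
}
```

Because indices are handed out one at a time, a task that receives a cheap iteration simply returns for the next one sooner, which is why uneven work per index does not hurt load balance.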

Definition at line 531 of file threading.hh.

◆ parallelForNodes()

template<class Func >
void Kaskade::parallelForNodes ( Func const &  f,
int  maxTasks = std::numeric_limits<int>::max() 
)

A parallel for loop that executes the given functor in parallel on different NUMA nodes.

Template Parameters
Func  a functor with an operator()(int i, int n)
Parameters
f         the functor to call
maxTasks  the maximal number of tasks to create

The given function object is called in parallel for values (i,n) with i ranging from 0 to n-1. The number n is min(nNodes,maxTasks) for all calls to f. Every task is executed on the node given by argument i.

The function returns only after the last task has completed. The computational effort for tasks 0,...,n-1 should therefore be roughly equal, whatever the value of n.

Definition at line 604 of file threading.hh.

Referenced by Kaskade::NumaCRSPatternCreator< Index >::addAllElements(), Kaskade::NumaCRSPatternCreator< Index >::addElements(), Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::apply(), Kaskade::NumaBCRSMatrix< Entry, Index >::NumaBCRSMatrix(), Kaskade::NumaCRSPattern< Index >::NumaCRSPattern(), Kaskade::NumaCRSPatternCreator< Index >::NumaCRSPatternCreator(), Kaskade::NumaBCRSMatrix< Entry, Index >::operator=(), and Kaskade::NumaCRSPatternCreator< Index >::~NumaCRSPatternCreator().

◆ runInBackground()

void Kaskade::runInBackground ( std::function< void()> &  f)

Executes a function in a child process.

The given function is executed independently of and in parallel to the caller in a child process. Due to the address space separation, the caller may proceed to modify its data structures without affecting the child. Use this to execute expensive "fire and forget" tasks that rely on the data structures not changing and that are difficult to parallelize.

The typical use case is to write output files in the background, while the computation continues.
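
The address space separation can be illustrated with a POSIX sketch based on fork(). The functions runInChildSketch and waitForChildSketch are hypothetical; whether the Kaskade function is implemented this way, and how it reaps finished children, is not stated here and is assumed only for the example.

```cpp
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical POSIX sketch: runs f in a forked child process and returns the
// child's pid to the parent (or -1 if fork failed).
inline pid_t runInChildSketch(std::function<void()> const& f)
{
  pid_t const pid = fork();
  if (pid == 0)
  {
    f();       // works on the child's private copy of the parent's memory
    _exit(0);  // terminate the child without running the parent's exit handlers
  }
  return pid;
}

// Returns true if the child terminated normally with exit status 0.
inline bool waitForChildSketch(pid_t pid)
{
  int status = 0;
  return pid > 0 && waitpid(pid, &status, 0) == pid
         && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

In this sketch, a write the child makes to a captured variable is invisible to the parent, and vice versa. Note that fork() duplicates only the calling thread, so a function run this way should not depend on other threads of the parent.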

◆ uniformWeightRange()

template<class Index >
Index Kaskade::uniformWeightRange ( Index  j,
Index  n,
Index  m 
)

Computes the range in which an index is to be found when partitioned for uniform weights.

Template Parameters
Index  an integral type
Parameters
j  the index for which the range number is to be computed. Has to be in [0,m[
n  the number of ranges
m  the total number of entries
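
The lookup can be sketched as follows, under the assumption that range i covers the indices from floor(i*m/n) up to, but not including, floor((i+1)*m/n). That formula and the names rangeStartSketch and rangeOfSketch are hypothetical and are not taken from threading.hh.

```cpp
#include <cassert>

// Assumed start of range i when m entries are split into n ranges.
template <class Index>
Index rangeStartSketch(Index i, Index n, Index m)
{
  return (i * m) / n;
}

// Returns the range i with rangeStartSketch(i) <= j < rangeStartSketch(i+1).
template <class Index>
Index rangeOfSketch(Index j, Index n, Index m)
{
  assert(n > 0 && j < m);
  // Integer-division guess; it never overshoots, because
  // floor(floor(j*n/m)*m/n) <= j.
  Index i = (j * n) / m;
  // Step forward past ranges that end at or before j (this also skips empty ranges).
  while (rangeStartSketch(i + 1, n, m) <= j)
    ++i;
  return i;
}
```

With m = 10 entries and n = 3 ranges, the assumed ranges are [0,3[, [3,6[ and [6,10[, so index 5 falls into range 1 and index 6 into range 2.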

Definition at line 91 of file threading.hh.

Referenced by Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::DiagonalBlock().

◆ uniformWeightRangeStart()

template<class BlockIndex , class Index >
Index Kaskade::uniformWeightRangeStart ( BlockIndex  i,
BlockIndex  n,
Index  m 
)

Computes partitioning points of ranges for uniform weight distributions.

The functionality is similar to equalWeightRanges, but for uniform weight distribution, the partitioning points can be computed directly.

Template Parameters
Index  an integral type
Parameters
i  the number of the range for which the starting point is to be computed. Has to be in [0,n]
n  the number of ranges
m  the total number of entries
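
For uniform weights a natural closed form for the starting point is floor(i*m/n). The sketch below assumes that formula (it is not confirmed by this page) and uses the hypothetical names uniformStartSketch and uniformPointsSketch to list all n+1 partitioning points.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed closed form: range i starts at floor(i*m/n), so i = n yields m.
template <class BlockIndex, class Index>
Index uniformStartSketch(BlockIndex i, BlockIndex n, Index m)
{
  assert(n > 0 && i <= n);
  return static_cast<Index>((i * m) / n);
}

// Collects the n+1 partitioning points 0 = x_0 <= x_1 <= ... <= x_n = m.
inline std::vector<std::size_t> uniformPointsSketch(std::size_t n, std::size_t m)
{
  std::vector<std::size_t> points;
  for (std::size_t i = 0; i <= n; ++i)
    points.push_back(uniformStartSketch(i, n, m));
  return points;
}
```

Allowing i = n is what makes the half-open range of the last block expressible as [start(n-1), start(n)[ without a special case.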

Definition at line 75 of file threading.hh.

Referenced by Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::apply(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::DiagonalBlock(), Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::PatchDomainDecompositionPreconditioner(), Kaskade::LocalVectors< Entry, SortedIdx, Vector >::scatter(), Kaskade::ThreadedMatrixDetail::CRSChunk< Entry, Index >::scatter(), and Kaskade::uniformWeightRange().

Variable Documentation

◆ DuneQuadratureRulesMutex

std::mutex Kaskade::DuneQuadratureRulesMutex
extern

A global lock for the Dune::QuadratureRules factory, which is not thread-safe as of 2015-01-01.
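
The intended usage pattern is to hold the lock for the duration of every access to the factory. The sketch below shows that pattern with stand-ins only: cachedRuleSketch imitates a factory that lazily fills a shared cache, and factoryMutexSketch plays the role of the global lock. Neither is a Dune or Kaskade name, and the cached values are arbitrary.

```cpp
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a factory that lazily fills a shared cache and is
// therefore not safe to call from several threads at once.
inline int cachedRuleSketch(int order)
{
  static std::map<int, int> cache;
  auto it = cache.find(order);
  if (it == cache.end())
    it = cache.emplace(order, 2 * order + 1).first;
  return it->second;
}

// Stand-in for the global lock guarding the factory.
inline std::mutex& factoryMutexSketch()
{
  static std::mutex m;
  return m;
}

// Every access to the factory goes through the lock.
inline int lockedRuleSketch(int order)
{
  std::lock_guard<std::mutex> lock(factoryMutexSketch());
  return cachedRuleSketch(order);
}

// Queries the factory from several threads and sums the results.
inline int sumRulesConcurrentlySketch(int nThreads)
{
  std::vector<int> results(nThreads, 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t)
    workers.emplace_back([&results, t] { results[t] = lockedRuleSketch(t % 3); });
  for (auto& w : workers)
    w.join();
  int sum = 0;
  for (int r : results)
    sum += r;
  return sum;
}
```

The lock is only effective if all threads that touch the factory take the same mutex, which is why a single global object is used.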

Referenced by Kaskade::QuadratureTraits< Dune::QuadratureRule< ctype, dim > >::rule().

◆ refElementMutex

boost::mutex Kaskade::refElementMutex
extern

A global lock for the Dune::GenericReferenceElement singletons, which are not thread-safe.