KASKADE 7 development version
Support routines and data structures for multithreaded execution.
Classes
class | Kaskade::Kalloc
A simple memory manager for NUMA systems.
class | Kaskade::ConcurrentQueue< T >
A concurrent FIFO queue.
class | Kaskade::NumaThreadPool
Implementation of thread pools suitable for parallelization of (more or less) memory-bound algorithms (not only) on NUMA machines.
class | Kaskade::NumaAllocator< T >
An STL allocator that uses memory of a specific NUMA node only.
class | Kaskade::Mutex
A utility class implementing appropriate copy semantics for boost mutexes.
Typedefs
typedef std::packaged_task< void()> | Kaskade::Task
Abstract interface for tasks to be scheduled for concurrent execution.
typedef std::future< void > | Kaskade::Ticket
Abstract waitable job ticket for submitted tasks.
Functions
void | Kaskade::equalWeightRanges (std::vector< size_t > &x, size_t n)
Computes partitioning points such that the sum of weights in each partition is roughly the same.
template<class BlockIndex, class Index>
Index | Kaskade::uniformWeightRangeStart (BlockIndex i, BlockIndex n, Index m)
Computes partitioning points of ranges for uniform weight distributions.
template<class Index>
Index | Kaskade::uniformWeightRange (Index j, Index n, Index m)
Computes the range in which an index is to be found when partitioned for uniform weights.
template<class Func>
void | Kaskade::parallelFor (Func const &f, int maxTasks=std::numeric_limits< int >::max())
A parallel for loop that executes the given functor in parallel on different CPUs.
template<class Func>
void | Kaskade::parallelFor (size_t first, size_t last, Func const &f, size_t nTasks=std::numeric_limits< size_t >::max())
A parallel for loop that executes the given functor in parallel on different CPUs.
template<class Func>
void | Kaskade::parallelForNodes (Func const &f, int maxTasks=std::numeric_limits< int >::max())
A parallel for loop that executes the given functor in parallel on different NUMA nodes.
void | Kaskade::runInBackground (std::function< void()> &f)
Executes a function in a child process.
Variables
std::mutex | Kaskade::DuneQuadratureRulesMutex
A global lock for the Dune::QuadratureRules factory, which is not thread-safe as of 2015-01-01.
boost::mutex | Kaskade::refElementMutex
A global lock for the Dune::GenericReferenceElement singletons, which are not thread-safe.
typedef std::packaged_task<void()> Kaskade::Task |
Abstract interface for tasks to be scheduled for concurrent execution.
Definition at line 248 of file threading.hh.
typedef std::future<void> Kaskade::Ticket |
Abstract waitable job ticket for submitted tasks.
Definition at line 254 of file threading.hh.
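The relationship between the two typedefs can be illustrated with the standard library alone. The following is a minimal sketch, not Kaskade code: the aliases `Task` and `Ticket` are redeclared locally, and `runOneTask` is a hypothetical helper that spawns a plain `std::thread` where a thread pool would presumably enqueue the task instead.

```cpp
#include <future>
#include <thread>
#include <utility>

// Local aliases mirroring the two typedefs documented above.
using Task   = std::packaged_task<void()>;
using Ticket = std::future<void>;

// Hypothetical helper (not part of Kaskade): runs one task on a fresh thread
// and blocks on its ticket. A thread pool would enqueue the task instead.
int runOneTask()
{
  int result = 0;
  Task task([&result] { result = 42; });  // wrap the work to be done
  Ticket ticket = task.get_future();      // obtain the ticket before moving the task
  std::thread worker(std::move(task));    // execute the task concurrently
  ticket.get();                           // wait for completion; rethrows if the task threw
  worker.join();
  return result;
}
```

Waiting on the ticket is what makes reading `result` safe here: the task's side effects are visible once the future has become ready.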
void Kaskade::equalWeightRanges(std::vector<size_t>& x, size_t n)
Computes partitioning points such that the sum of weights in each partition is roughly the same.
Let \( w_i, 0\le i < N \) denote the weights x[i] on entry. On exit, x has size \( k=n+1 \) with values \( x_i \) such that \( x_0 = 0 \), \( x_{k-1} = N\), and
\[ \sum_{j=x_i}^{x_{i+1}-1} w_j \approx \frac{1}{n} \sum_{j=0}^{N-1} w_j, \quad 0 \le i < n. \]
[in,out] | x | the array of (nonnegative) weights |
[in] | n | the desired number of partitions (positive). |
Referenced by Kaskade::NumaCRSPattern< Index >::NumaCRSPattern().
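One way to obtain such partitioning points is a single pass over the prefix sums of the weights. The sketch below is an assumption about a possible strategy, not the Kaskade implementation; the name `equalWeightRangesSketch` is hypothetical, and overflow of the products is ignored.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the routine documented above. On entry x holds the
// weights, on exit the n+1 partitioning points. The prefix-sum strategy is an
// assumption, not taken from the Kaskade sources.
void equalWeightRangesSketch(std::vector<std::size_t>& x, std::size_t n)
{
  std::size_t const N = x.size();
  std::size_t const total = std::accumulate(x.begin(), x.end(), std::size_t(0));

  std::vector<std::size_t> points(n + 1, N);  // the last point is N by construction
  points[0] = 0;

  std::size_t prefix = 0;  // sum of the weights w_0 .. w_{j-1}
  std::size_t j = 0;
  for (std::size_t i = 1; i < n; ++i)
  {
    // Advance j until the prefix sum reaches the i-th target i*total/n.
    while (j < N && prefix * n < i * total)
      prefix += x[j++];
    points[i] = j;
  }
  x = std::move(points);
}
```

For the weights {4, 1, 1, 1, 1} and n = 2, this yields the points {0, 1, 5}, so both partitions carry a weight of 4.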
template<class Func>
void Kaskade::parallelFor(Func const& f, int maxTasks = std::numeric_limits<int>::max())
A parallel for loop that executes the given functor in parallel on different CPUs.
Func | a functor with an operator ()(int i, int n) |
f | the functor to call |
maxTasks | the maximal number of tasks to create |
The given function object is called in parallel for values (i,n) with i ranging from 0 to n-1, i.e. it is the i-th call of n calls in total. The function object is responsible for partitioning the work appropriately, e.g., using uniformWeightRange(), and for performing any locking that is needed. The number n is the same for all calls and is guaranteed not to exceed maxTasks (but may be smaller).
The function returns only after the last task has completed, so the computational effort for tasks 0,...,n-1 should be roughly equal, whatever the value of n. For optimal performance, maxTasks should be either a small multiple of the number of CPUs in the system (such that ideally all CPUs are busy the whole time) or much larger than that (such that an imbalance has only a small impact).
Definition at line 489 of file threading.hh.
Referenced by Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::apply(), Kaskade::VariationalFunctionalAssembler< F, SparseIndex, BoundaryDetector, QuadRule >::assemble(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::deformedProlongationStack(), Kaskade::getContactConstraints(), Kaskade::gridIterate(), Kaskade::NumaBCRSMatrix< Entry, Index >::operator*(), Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::PatchDomainDecompositionPreconditioner(), Kaskade::prolongationStack(), Kaskade::TransferData< Space, CoarseningPolicy >::TransferData(), and Kaskade::BDDC::BDDCSolver< Subdomain >::update_rhs().
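The (i,n) calling convention can be tried out without Kaskade. The following is a minimal sketch under stated assumptions: `parallelForSketch` is a hypothetical stand-in built on `std::thread` (the real function presumably dispatches to a thread pool), the choice n = min(maxTasks, number of CPUs) is an assumption, and `parallelSum` is an illustrative functor that partitions its work by the formula i*m/n.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for parallelFor(f, maxTasks): calls f(i, n) for
// i = 0..n-1 on separate threads and returns once all calls have finished.
// Assumption: n = min(maxTasks, number of CPUs), but at least 1.
template <class Func>
void parallelForSketch(Func const& f, int maxTasks)
{
  int const hw = static_cast<int>(std::thread::hardware_concurrency());
  int const n = std::max(1, std::min(maxTasks, hw));
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i)
    workers.emplace_back([&f, i, n] { f(i, n); });
  for (auto& w : workers)
    w.join();
}

// Sums v by letting task i handle the slice [i*m/n, (i+1)*m/n).
// Precondition: maxTasks >= 1.
long parallelSum(std::vector<int> const& v, int maxTasks)
{
  // One slot per task, so the tasks never write to the same element and no
  // locking is needed.
  std::vector<long> partial(static_cast<std::size_t>(maxTasks), 0L);
  std::size_t const m = v.size();
  parallelForSketch([&](int i, int n) {
    std::size_t const ii = static_cast<std::size_t>(i);
    std::size_t const nn = static_cast<std::size_t>(n);
    for (std::size_t k = ii * m / nn; k < (ii + 1) * m / nn; ++k)
      partial[ii] += v[k];
  }, maxTasks);
  return std::accumulate(partial.begin(), partial.end(), 0L);
}
```

Because the slices are fixed in advance, the slowest slice determines the total run time, which is why the effort per task should be roughly equal.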
template<class Func>
void Kaskade::parallelFor(size_t first, size_t last, Func const& f, size_t nTasks = std::numeric_limits<size_t>::max())
A parallel for loop that executes the given functor in parallel on different CPUs.
Func | a functor with an operator ()(int i) |
f | the functor to call |
first | the first iteration index, usually 0 |
last | one after the last iteration index, i.e. the number of iterations. Precondition: last >= first. |
The given function is called last-first times, exactly once for each iteration index, with the iteration index as its only argument. The granularity can be quite fine, and the computational work may differ considerably between iteration indices without sacrificing parallel efficiency.
A total of min(last-first,nCpus,nTasks) tasks are created. Each task grabs the next iteration index that has not yet been executed, calls the functor with that index, and repeats until no indices remain.
Definition at line 531 of file threading.hh.
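The "grab the next index" scheme described above can be sketched with a shared atomic counter. This is an illustration under assumptions, not the Kaskade implementation: `dynamicParallelForSketch` is a hypothetical name, and plain `std::thread` objects stand in for whatever task mechanism the library uses.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the (first, last, f, nTasks) overload.
// Precondition: last >= first.
template <class Func>
void dynamicParallelForSketch(std::size_t first, std::size_t last, Func const& f, std::size_t nTasks)
{
  std::size_t const nCpus = std::max(1u, std::thread::hardware_concurrency());
  std::size_t const n = std::min({last - first, nCpus, nTasks});
  std::atomic<std::size_t> next(first);  // next iteration index not yet handed out

  // Each task repeatedly claims one index and processes it until none remain.
  auto worker = [&] {
    for (std::size_t i = next.fetch_add(1); i < last; i = next.fetch_add(1))
      f(i);
  };

  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < n; ++t)
    threads.emplace_back(worker);
  for (auto& th : threads)
    th.join();
}
```

Since indices are handed out one at a time, a task that finishes a cheap iteration simply claims the next one, which is what makes uneven work per index tolerable.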
template<class Func>
void Kaskade::parallelForNodes(Func const& f, int maxTasks = std::numeric_limits<int>::max())
A parallel for loop that executes the given functor in parallel on different NUMA nodes.
Func | a functor with an operator ()(int i, int n) |
f | the functor to call |
maxTasks | the maximal number of tasks to create |
The given function object is called in parallel for values (i,n) with i ranging from 0 to n-1. The number n is min(nNodes,maxTasks) for all calls to f. Every task is executed on the node given by argument i.
The function returns after the last task has been completed, therefore the computational effort for tasks 0,...,n-1 should be roughly equal, no matter which value n has.
Definition at line 604 of file threading.hh.
Referenced by Kaskade::NumaCRSPatternCreator< Index >::addAllElements(), Kaskade::NumaCRSPatternCreator< Index >::addElements(), Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::apply(), Kaskade::NumaBCRSMatrix< Entry, Index >::NumaBCRSMatrix(), Kaskade::NumaCRSPattern< Index >::NumaCRSPattern(), Kaskade::NumaCRSPatternCreator< Index >::NumaCRSPatternCreator(), Kaskade::NumaBCRSMatrix< Entry, Index >::operator=(), and Kaskade::NumaCRSPatternCreator< Index >::~NumaCRSPatternCreator().
void Kaskade::runInBackground(std::function<void()>& f)
Executes a function in a child process.
The given function is executed in a child process, independently of and in parallel to the caller. Because the address spaces are separate, the caller may go on modifying its data structures without affecting the child. Use this to execute expensive "fire and forget" tasks that rely on the data structures not changing and are difficult to parallelize.
The typical use case is to write output files in the background, while the computation continues.
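The address space separation can be demonstrated on POSIX systems with `fork()`. The sketch below assumes that a fork-like mechanism is what "child process" refers to; `runInChildSketch` and `valueSeenByChild` are hypothetical names, and the pipe exists only so that the parent can observe what the child saw.

```cpp
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in (POSIX only): runs f in a forked child process.
// Returns the child's pid in the parent, or -1 if fork failed.
pid_t runInChildSketch(std::function<void()> const& f)
{
  pid_t const pid = fork();
  if (pid == 0)
  {
    f();       // child: operates on its own copy of the address space
    _exit(0);  // leave without running the parent's atexit handlers
  }
  return pid;  // parent: continues immediately
}

// The child reports through a pipe which value of `data` it sees, while the
// parent changes its own copy right after the fork.
int valueSeenByChild()
{
  int fds[2];
  if (pipe(fds) != 0)
    return -1;
  int data = 1;
  pid_t const pid = runInChildSketch([&] {
    ssize_t const written = write(fds[1], &data, sizeof data);
    (void)written;
  });
  data = 2;       // modifies the parent's copy only
  close(fds[1]);  // the parent keeps just the read end
  int seen = -1;
  if (pid > 0)
  {
    if (read(fds[0], &seen, sizeof seen) != static_cast<ssize_t>(sizeof seen))
      seen = -1;
    waitpid(pid, nullptr, 0);  // reap the child; a fire-and-forget caller would not block here
  }
  close(fds[0]);
  return seen;
}
```

Regardless of whether the child writes before or after the parent's assignment, it reports the value that was current at the time of the fork.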
template<class Index>
Index Kaskade::uniformWeightRange(Index j, Index n, Index m)
Computes the range in which an index is to be found when partitioned for uniform weights.
Index | an integral type |
j | the index for which the range number is to be computed. Has to be in [0,m[ |
n | the number of ranges |
m | the total number of entries |
Definition at line 91 of file threading.hh.
Referenced by Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::DiagonalBlock().
template<class BlockIndex, class Index>
Index Kaskade::uniformWeightRangeStart(BlockIndex i, BlockIndex n, Index m)
Computes partitioning points of ranges for uniform weight distributions.
The functionality is similar to equalWeightRanges, but for uniform weight distribution, the partitioning points can be computed directly.
Index | an integral type |
i | the number of the range for which the starting point is to be computed. Has to be in [0,n] |
n | the number of ranges |
m | the total number of entries |
Definition at line 75 of file threading.hh.
Referenced by Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::apply(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::DiagonalBlock(), Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::PatchDomainDecompositionPreconditioner(), Kaskade::LocalVectors< Entry, SortedIdx, Vector >::scatter(), Kaskade::ThreadedMatrixDetail::CRSChunk< Entry, Index >::scatter(), and Kaskade::uniformWeightRange().
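For uniform weights, both the starting point of a range and the range containing a given index can be computed in closed form. The following sketch assumes the formula i*m/n (integer division) for the starting point; this is a plausible choice, not confirmed by the text above, and the names `rangeStartSketch` and `rangeOfSketch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counterpart of uniformWeightRangeStart: first entry of range i
// when m entries are split into n ranges. The formula i*m/n is an assumption.
std::size_t rangeStartSketch(std::size_t i, std::size_t n, std::size_t m)
{
  assert(n > 0 && i <= n);
  return i * m / n;
}

// Hypothetical counterpart of uniformWeightRange: the range number r with
// rangeStartSketch(r) <= j < rangeStartSketch(r + 1).
std::size_t rangeOfSketch(std::size_t j, std::size_t n, std::size_t m)
{
  assert(n > 0 && j < m);
  std::size_t r = j * n / m;  // initial guess; never too large, possibly too small
  while (rangeStartSketch(r + 1, n, m) <= j)
    ++r;
  return r;
}
```

With m = 10 entries and n = 3 ranges, the starting points are 0, 3, 6 (and 10 for i = n), so index 3 falls into range 1 and index 9 into range 2.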
extern std::mutex Kaskade::DuneQuadratureRulesMutex
A global lock for the Dune::QuadratureRules factory, which is not thread-safe as of 2015-01-01.
Referenced by Kaskade::QuadratureTraits< Dune::QuadratureRule< ctype, dim > >::rule().
extern boost::mutex Kaskade::refElementMutex
A global lock for the Dune::GenericReferenceElement singletons, which are not thread-safe.