Support routines and data structures for multithreaded execution.

Classes
class	Kaskade::Kalloc
	A simple memory manager for NUMA systems.

class	Kaskade::ConcurrentQueue< T >
	A concurrent fifo queue.

class	Kaskade::NumaThreadPool
	Implementation of thread pools suitable for parallelization of (more or less) memory-bound algorithms (not only) on NUMA machines.

class	Kaskade::NumaAllocator< T >
	An STL allocator that uses memory of a specific NUMA node only.

class	Kaskade::Mutex
	A utility class implementing appropriate copy semantics for boost mutexes.

Typedefs
typedef std::packaged_task< void()>	Kaskade::Task
	Abstract interface for tasks to be scheduled for concurrent execution.

typedef std::future< void >	Kaskade::Ticket
	Abstract waitable job ticket for submitted tasks.

Functions
void	Kaskade::equalWeightRanges (std::vector< size_t > &x, size_t n)
	Computes partitioning points such that the sum of weights in each partition is roughly the same.

template<class BlockIndex , class Index >
Index	Kaskade::uniformWeightRangeStart (BlockIndex i, BlockIndex n, Index m)
	Computes partitioning points of ranges for uniform weight distributions.

template<class Index >
Index	Kaskade::uniformWeightRange (Index j, Index n, Index m)
	Computes the range in which an index is to be found when partitioned for uniform weights.

template<class Func >
void	Kaskade::parallelFor (Func const &f, int maxTasks=std::numeric_limits< int >::max())
	A parallel for loop that executes the given functor in parallel on different CPUs.

template<class Func >
void	Kaskade::parallelFor (size_t first, size_t last, Func const &f, size_t nTasks=std::numeric_limits< size_t >::max())
	A parallel for loop that executes the given functor in parallel on different CPUs.

template<class Func >
void	Kaskade::parallelForNodes (Func const &f, int maxTasks=std::numeric_limits< int >::max())
	A parallel for loop that executes the given functor in parallel on different NUMA nodes.

void	Kaskade::runInBackground (std::function< void()> &f)
	Executes a function in a child process.

Variables
std::mutex	Kaskade::DuneQuadratureRulesMutex
	A global lock for the Dune::QuadratureRules factory, which is not thread-safe as of 2015-01-01.

boost::mutex	Kaskade::refElementMutex
	A global lock for the Dune::GenericReferenceElement singletons, which are not thread-safe.

Detailed Description

Support routines and data structures for multithreaded execution.

Typedef Documentation

◆ Task

typedef std::packaged_task<void()> Kaskade::Task

Abstract interface for tasks to be scheduled for concurrent execution.

Definition at line 248 of file threading.hh.

◆ Ticket

typedef std::future<void> Kaskade::Ticket

Abstract waitable job ticket for submitted tasks.

Definition at line 254 of file threading.hh.
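
Since both typedefs are plain standard library types, their interplay can be illustrated without any Kaskade headers. The following is a minimal sketch: the aliases are redeclared locally, and runOnThreadSketch is a hypothetical helper that spawns a thread per task, whereas a thread pool would enqueue the task instead.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Local aliases mirroring the documented typedefs, so that the sketch
// compiles without Kaskade headers.
using Task = std::packaged_task<void()>;
using Ticket = std::future<void>;

// Hypothetical helper (not part of Kaskade): runs the task on a fresh
// thread and hands back the ticket the caller can wait on.
inline Ticket runOnThreadSketch(Task task)
{
  assert(task.valid());                   // an empty task has no shared state
  Ticket ticket = task.get_future();      // fetch the ticket before the task is moved away
  std::thread(std::move(task)).detach();  // invoking the task makes the ticket ready
  return ticket;
}
```

A caller would create a task from any callable taking no arguments, e.g. Task([]{ /* work */ }), submit it, and call wait() on the returned ticket before relying on the task's side effects.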

Function Documentation

◆ equalWeightRanges()

void Kaskade::equalWeightRanges	(	std::vector< size_t > &	x,
		size_t	n
	)

Computes partitioning points such that the sum of weights in each partition is roughly the same.

Let \( w_i, 0\le i < N \) denote the weights x[i] on entry. On exit, x has size \( k=n+1 \) with values \( x_i \) such that \( x_0 = 0 \), \( x_{k-1} = N\), and, for \( 0 \le i < n \),

\[ \sum_{j=x_i}^{x_{i+1}-1} w_j \approx \frac{1}{n} \sum_{j=0}^{N-1} w_j. \]

Parameters

[in,out]	x	on entry, the array of (nonnegative) weights; on exit, the partitioning points
[in]	n	the desired number of partitions (positive).
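
One way to obtain such partitioning points is to locate equal shares of the total weight in the prefix sums of the weights. The sketch below follows that idea under the documented contract (weights in, n+1 points out); equalWeightRangesSketch is a hypothetical stand-in, and the actual Kaskade implementation may choose the points differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in: replaces the weights in x by n+1 partitioning points.
void equalWeightRangesSketch(std::vector<std::size_t>& x, std::size_t n)
{
  assert(n > 0);
  std::size_t const N = x.size();

  // prefix[p] = w_0 + ... + w_{p-1}, hence prefix[0] = 0 and prefix[N] = total weight
  std::vector<std::size_t> prefix(N + 1, 0);
  std::partial_sum(x.begin(), x.end(), prefix.begin() + 1);
  std::size_t const total = prefix[N];

  std::vector<std::size_t> points(n + 1);
  points[0] = 0;
  points[n] = N;
  for (std::size_t i = 1; i < n; ++i)
  {
    // first position at which the accumulated weight reaches i/n of the total
    std::size_t const target = (i * total) / n;
    auto it = std::lower_bound(prefix.begin(), prefix.end(), target);
    points[i] = static_cast<std::size_t>(it - prefix.begin());
  }
  x = std::move(points);
}
```

For the weights {4,1,1,1,1} and n=2, this sketch yields the points {0,1,5}, i.e. two partitions of weight 4 each.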

Referenced by Kaskade::NumaCRSPattern< Index >::NumaCRSPattern().

◆ parallelFor() [1/2]

template<class Func >

void Kaskade::parallelFor	(	Func const &	f,
		int	maxTasks = `std::numeric_limits<int>::max()`
	)

A parallel for loop that executes the given functor in parallel on different CPUs.

Template Parameters

Func	a functor with an operator ()(int i, int n)

Parameters

f	the functor to call
maxTasks	the maximal number of tasks to create

The given function object is called in parallel for values (i,n) with i ranging from 0 to n-1, i.e. each call is the i-th of n calls in total. The function object is responsible for partitioning the work appropriately, e.g., using uniformWeightRange(), and for performing any locking that is needed. The number n is the same for all calls and guaranteed not to exceed maxTasks (but may be smaller).

The function returns only after the last task has completed, so the computational effort for tasks 0,...,n-1 should be roughly equal, whatever the value of n. For optimal performance, maxTasks should be either a small multiple of the number of CPUs in the system (such that ideally all CPUs are busy the whole time) or much larger than that (such that an imbalance has only a small impact).
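
The calling convention can be illustrated with a self-contained stand-in. In the sketch below, parallelForSketch is hypothetical: it simply starts one std::thread per task, which is an assumption made for illustration and not how Kaskade schedules tasks. The functor in parallelSumSketch shows the two duties described above: it derives its own index range from (i,n) and locks when touching shared state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the documented interface: calls f(i,n) for
// i = 0,...,n-1 on separate threads and returns once all calls are done.
template <class Func>
void parallelForSketch(Func const& f, int maxTasks = std::numeric_limits<int>::max())
{
  assert(maxTasks > 0);
  int const nCpus = static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
  int const n = std::min(nCpus, maxTasks);   // same n for all calls, never above maxTasks
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i)
    workers.emplace_back([&f, i, n] { f(i, n); });
  for (auto& w : workers)
    w.join();
}

// Example functor use: each task sums its own slice and adds it to a shared total.
inline long parallelSumSketch(std::vector<int> const& data, int maxTasks)
{
  long total = 0;
  std::mutex totalMutex;
  std::size_t const m = data.size();
  parallelForSketch([&](int i, int n)
  {
    // task i handles the entries [i*m/n, (i+1)*m/n)
    std::size_t const begin = (static_cast<std::size_t>(i) * m) / static_cast<std::size_t>(n);
    std::size_t const end = (static_cast<std::size_t>(i + 1) * m) / static_cast<std::size_t>(n);
    long const local = std::accumulate(data.begin() + begin, data.begin() + end, 0L);
    std::lock_guard<std::mutex> lock(totalMutex);   // locking is the functor's job
    total += local;
  }, maxTasks);
  return total;
}
```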

Definition at line 489 of file threading.hh.

Referenced by Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::apply(), Kaskade::VariationalFunctionalAssembler< F, SparseIndex, BoundaryDetector, QuadRule >::assemble(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::deformedProlongationStack(), Kaskade::getContactConstraints(), Kaskade::gridIterate(), Kaskade::NumaBCRSMatrix< Entry, Index >::operator*(), Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::PatchDomainDecompositionPreconditioner(), Kaskade::prolongationStack(), Kaskade::TransferData< Space, CoarseningPolicy >::TransferData(), and Kaskade::BDDC::BDDCSolver< Subdomain >::update_rhs().

◆ parallelFor() [2/2]

template<class Func >

void Kaskade::parallelFor	(	size_t	first,
		size_t	last,
		Func const &	f,
		size_t	nTasks = `std::numeric_limits<size_t>::max()`
	)

A parallel for loop that executes the given functor in parallel on different CPUs.

Template Parameters

Func	a functor with an operator ()(int i)

Parameters

f	the functor to call
first	the first iteration index, usually 0
last	one after the last iteration index, i.e. the number of iterations. Precondition: last >= first.
nTasks	the maximal number of tasks to create

The given function is called last-first times, exactly once for each iteration index, with the only argument being the iteration index. The granularity can be quite fine. The computational work for different iteration indices can be rather different without sacrificing parallel efficiency.

min(last-first,nCpus,nTasks) tasks are created. Each task grabs the next iteration index that has not yet been executed, calls the functor with that index, and repeats until no indices remain.
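
The "grab the next index" scheme described above can be sketched with a shared atomic counter. The function dynamicParallelForSketch below is a hypothetical stand-in that uses plain std::thread workers; it is meant to illustrate the scheduling idea only and makes no claim about the actual implementation.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <thread>
#include <vector>

// Hypothetical stand-in: calls f(i) exactly once for each i in [first,last).
template <class Func>
void dynamicParallelForSketch(std::size_t first, std::size_t last, Func const& f,
                              std::size_t nTasks = std::numeric_limits<std::size_t>::max())
{
  assert(last >= first);
  std::size_t const nCpus = std::max<std::size_t>(1, std::thread::hardware_concurrency());
  std::size_t const n = std::min({last - first, nCpus, nTasks});

  std::atomic<std::size_t> next(first);   // next iteration index not yet handed out
  std::vector<std::thread> workers;
  for (std::size_t k = 0; k < n; ++k)
    workers.emplace_back([&]
    {
      for (;;)
      {
        std::size_t const i = next.fetch_add(1);   // claim one index atomically
        if (i >= last)
          break;                                   // nothing left to do
        f(i);
      }
    });
  for (auto& w : workers)
    w.join();
}
```

Because indices are claimed one at a time, a worker that receives cheap iterations simply claims more of them, which is why uneven work per index does not hurt the load balance much.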

Definition at line 531 of file threading.hh.

◆ parallelForNodes()

template<class Func >

void Kaskade::parallelForNodes	(	Func const &	f,
		int	maxTasks = `std::numeric_limits<int>::max()`
	)

A parallel for loop that executes the given functor in parallel on different NUMA nodes.

Template Parameters

Func	a functor with an operator ()(int i, int n)

Parameters

f	the functor to call
maxTasks	the maximal number of tasks to create

The given function object is called in parallel for values (i,n) with i ranging from 0 to n-1. The number n is min(nNodes,maxTasks) for all calls to f. Every task is executed on the node given by argument i.

The function returns only after the last task has completed, so the computational effort for tasks 0,...,n-1 should be roughly equal, whatever the value of n.

Definition at line 604 of file threading.hh.

Referenced by Kaskade::NumaCRSPatternCreator< Index >::addAllElements(), Kaskade::NumaCRSPatternCreator< Index >::addElements(), Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::apply(), Kaskade::NumaBCRSMatrix< Entry, Index >::NumaBCRSMatrix(), Kaskade::NumaCRSPattern< Index >::NumaCRSPattern(), Kaskade::NumaCRSPatternCreator< Index >::NumaCRSPatternCreator(), Kaskade::NumaBCRSMatrix< Entry, Index >::operator=(), and Kaskade::NumaCRSPatternCreator< Index >::~NumaCRSPatternCreator().

◆ runInBackground()

void Kaskade::runInBackground ( std::function< void()> & f )

Executes a function in a child process.

The given function is executed in a child process, independently of and in parallel to the caller. Because the address spaces are separate, the caller may go on to modify its data structures without affecting the child. Use this to execute expensive "fire and forget" tasks that rely on the data structures not changing and that are difficult to parallelize.

The typical use case is to write output files in the background, while the computation continues.
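
The address space separation can be illustrated with POSIX fork(). The sketch below assumes a POSIX system; runInChildSketch is a hypothetical stand-in that, unlike the documented function, returns the child's process id so that the example can be checked. Reaping the child (e.g. with waitpid) is left out for brevity.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in: runs f in a forked child process and returns at once
// in the parent. Returns the child's pid, or -1 if fork() failed.
pid_t runInChildSketch(std::function<void()> const& f)
{
  assert(f);                 // an empty std::function cannot be called
  pid_t const pid = fork();
  if (pid == 0)
  {
    f();                     // child: operates on its own copy of the caller's data
    _exit(0);                // leave the child without running the parent's exit handlers
  }
  return pid;                // parent: continues immediately
}
```

Whatever the parent writes to its data after the call is invisible to the child, which still sees the state as of the moment of the fork.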

◆ uniformWeightRange()

template<class Index >

Index Kaskade::uniformWeightRange	(	Index	j,
		Index	n,
		Index	m
	)

Computes the range in which an index is to be found when partitioned for uniform weights.

Template Parameters

Index an integral type

Parameters

j	the index for which the range number is to be computed. Has to be in [0,m[
n	the number of ranges
m	the total number of entries
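
To make the mapping concrete, the sketch below assumes that range i starts at floor(i*m/n); this closed form is an assumption made for illustration, not a statement about the library's code. Under that assumption, the range containing j is the largest i with i*m/n <= j, which works out to floor(((j+1)*n-1)/m). Both function names are hypothetical.

```cpp
#include <cassert>

// Assumed closed form for the start of range i (i = n yields m).
template <class BlockIndex, class Index>
Index rangeStartSketch(BlockIndex i, BlockIndex n, Index m)
{
  assert(n > 0 && i <= n);
  return (static_cast<Index>(i) * m) / static_cast<Index>(n);
}

// Number of the range that contains entry j, for j in [0,m[.
template <class Index>
Index rangeOfSketch(Index j, Index n, Index m)
{
  assert(n > 0 && j < m);
  // largest i with floor(i*m/n) <= j, i.e. with i*m < (j+1)*n
  return ((j + 1) * n - 1) / m;
}
```

With m=10 entries and n=3 ranges, the assumed starting points are 0, 3, 6 and 10, so entry 3 is the first entry of range 1 and entry 9 the last entry of range 2.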

Definition at line 91 of file threading.hh.

Referenced by Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::DiagonalBlock().

◆ uniformWeightRangeStart()

template<class BlockIndex , class Index >

Index Kaskade::uniformWeightRangeStart	(	BlockIndex	i,
		BlockIndex	n,
		Index	m
	)

Computes partitioning points of ranges for uniform weight distributions.

The functionality is similar to equalWeightRanges, but for uniform weight distribution, the partitioning points can be computed directly.

Template Parameters

Index an integral type

Parameters

i	the number of the range for which the starting point is to be computed. Has to be in [0,n]
n	the number of ranges
m	the total number of entries
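
For uniform weights, all n+1 partitioning points can be written down directly. The sketch below tabulates them under the assumption that the starting point of range i is floor(i*m/n); uniformPartitionPointsSketch is a hypothetical helper introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: tabulates the starting points of all n ranges plus
// the end point m, mirroring the layout produced by equalWeightRanges.
std::vector<std::size_t> uniformPartitionPointsSketch(std::size_t n, std::size_t m)
{
  assert(n > 0);
  std::vector<std::size_t> points(n + 1);
  for (std::size_t i = 0; i <= n; ++i)
    points[i] = (i * m) / n;   // assumed closed form; i = 0 gives 0, i = n gives m
  return points;
}
```

With this formula the range sizes differ by at most one entry, and ranges may be empty when n exceeds m.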

Definition at line 75 of file threading.hh.

Referenced by Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::apply(), Kaskade::NumaBCRSMatrix< Entry, Index >::conjugation(), Kaskade::JacobiPreconditionerDetail::DiagonalBlock< Entry, row, col >::DiagonalBlock(), Kaskade::PatchDomainDecompositionPreconditioner< Space, m, StorageTag, SparseMatrixIndex >::PatchDomainDecompositionPreconditioner(), Kaskade::LocalVectors< Entry, SortedIdx, Vector >::scatter(), Kaskade::ThreadedMatrixDetail::CRSChunk< Entry, Index >::scatter(), and Kaskade::uniformWeightRange().

Variable Documentation

◆ DuneQuadratureRulesMutex

std::mutex Kaskade::DuneQuadratureRulesMutex

extern

A global lock for the Dune::QuadratureRules factory, which is not thread-safe as of 2015-01-01.

Referenced by Kaskade::QuadratureTraits< Dune::QuadratureRule< ctype, dim > >::rule().

◆ refElementMutex

boost::mutex Kaskade::refElementMutex

extern

A global lock for the Dune::GenericReferenceElement singletons, which are not thread-safe.
