Parallel.ForEach? & PLINQ?
When you need to optimize a program for multi-core machines, a great place to start is by asking whether the program can be split into parts that can execute in parallel. If your solution can be viewed as a compute-intensive operation performed on each element of a large data set, it is a prime candidate for the new data-parallel programming capabilities in .NET Framework 4: Parallel.ForEach and Parallel LINQ (PLINQ).
This document will familiarize you with Parallel.ForEach and PLINQ, discuss how to use these technologies, and explain the specific scenarios that lend themselves to each.
Parallel.ForEach
The Parallel class’s ForEach method is a multi-threaded implementation of a common loop construct in C#, the foreach loop. Recall that a foreach loop allows you to iterate over an enumerable data set represented using an IEnumerable<T>. Parallel.ForEach is similar to a foreach loop in that it iterates over an enumerable data set, but unlike foreach, Parallel.ForEach uses multiple threads to evaluate different invocations of the loop body. As it turns out, these characteristics make Parallel.ForEach a broadly useful mechanism for data-parallel programming.
To evaluate a function over a sequence in parallel, an important consideration is how to break the iteration space into smaller pieces that can be processed in parallel. This partitioning allows each thread to evaluate the loop body over one partition.
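To make the idea of partitioning concrete, here is a minimal sketch that splits an index range into chunks explicitly, using Partitioner.Create from System.Collections.Concurrent. Parallel.ForEach partitions its input on its own by default, so this is only an illustration of the concept, not a description of its internal strategy. The class name RangePartitionSketch and the method SumSquares are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class RangePartitionSketch
{
    // Sums the squares of all elements, processing one index range per loop-body invocation.
    public static long SumSquares(int[] data)
    {
        // Partitioner.Create rejects an empty range, so handle that case up front.
        if (data.Length == 0) return 0;

        long total = 0;

        // Each invocation receives a Tuple<int, int>: (fromInclusive, toExclusive).
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                local += (long)data[i] * data[i];
            }

            // Combine the per-partition result into the shared total in a thread-safe way.
            Interlocked.Add(ref total, local);
        });

        return total;
    }
}
```

Accumulating into a local variable and touching the shared total only once per partition keeps synchronization out of the inner loop.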
Parallel.ForEach has numerous overloads; the most commonly used has the following signature:
public static ParallelLoopResult ForEach<TSource>(
    IEnumerable<TSource> source,
    Action<TSource> body)
The source parameter, of type IEnumerable<TSource>, specifies the sequence to iterate over, and the body parameter, of type Action<TSource>, specifies the delegate to invoke for each element. For the sake of simplicity, we won’t explain the details of the other overloads of Parallel.ForEach.
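As a rough illustration of this overload, the sketch below runs a stand-in for a compute-intensive operation on every element of a sequence. The names ParallelForEachSketch, SumOfSquaresUpTo, and ComputeAll are hypothetical and exist only for this example. A ConcurrentDictionary collects the results because the loop body may run on several threads at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForEachSketch
{
    // Hypothetical stand-in for a compute-intensive, per-element operation.
    public static long SumOfSquaresUpTo(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += (long)i * i;
        }
        return total;
    }

    // Invokes the body once per element; invocations may run on different threads.
    public static IDictionary<int, long> ComputeAll(IEnumerable<int> inputs)
    {
        var results = new ConcurrentDictionary<int, long>();

        Parallel.ForEach(inputs, n =>
        {
            // Thread-safe write: each input value is used as its own key.
            results[n] = SumOfSquaresUpTo(n);
        });

        return results;
    }

    public static void Main()
    {
        IDictionary<int, long> results = ComputeAll(Enumerable.Range(1, 1000));
        Console.WriteLine(results[10]); // prints 385
    }
}
```

Note that the order in which elements are processed is not guaranteed, so the body should not depend on the order of iterations.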
PLINQ
Like Parallel.ForEach, PLINQ is a programming model for executing data-parallel operations. The user defines a data-parallel operation by combining various predefined set-based operators such as projections, filters, aggregations, and so forth. Since LINQ is declarative, PLINQ is able to step in and handle parallelizing the query on the user’s behalf. Similarly to Parallel.ForEach, PLINQ achieves parallelism by partitioning the input sequence and processing different input elements on different threads. While each of these tools deserves an article of its own, a full treatment is beyond the scope of this document. Rather, the document will focus on the interesting differences between the two approaches to data parallelism. Specifically, this document will cover the scenarios when it makes sense to use Parallel.ForEach instead of PLINQ, and vice versa.
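For comparison, a minimal PLINQ sketch might look like the following. It combines a filter (Where) with an aggregation (Count), and the call to AsParallel is what opts the query into parallel execution. The names PlinqSketch, IsPrime, and CountPrimes are hypothetical, and the naive primality check is just a stand-in for a compute-intensive predicate.

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Naive trial-division check, used only as a compute-intensive stand-in.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts the primes in [2, limit] with a declarative, parallel query.
    public static int CountPrimes(int limit)
    {
        if (limit < 2) return 0;

        return Enumerable.Range(2, limit - 1) // the values 2..limit
            .AsParallel()                     // let PLINQ partition and parallelize
            .Where(IsPrime)                   // filter
            .Count();                         // aggregation
    }

    public static void Main()
    {
        Console.WriteLine(CountPrimes(100)); // prints 25
    }
}
```

The query describes what to compute rather than how to loop, which is what leaves PLINQ free to decide how the work is divided among threads.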