Webinar: Leverage multi-core performance using Intel® Threading Building Blocks (Intel® TBB)
This series of two-hour theory-and-practice webinars, delivered over three days, introduces Intel® Threading Building Blocks (Intel® TBB). Attendees need no prior knowledge of TBB and only a rudimentary understanding of parallel programming. On completing the series, participants will know what TBB is, how it enables parallel programming, what differentiates it from other parallel programming models, and how to use common parallel programming patterns to parallelize their own code. The series targets programmers with intermediate or advanced programming experience and beginner-to-intermediate parallel programming experience. It is aimed in particular at programmers who are just starting to need parallelism, such as those in the growing data analytics, AI, medical, and automation sectors, while also offering material for experienced high performance computing (HPC) programmers who have had little exposure to task-based tools such as TBB.
Webinar Structure and Outline
Each webinar follows the format below:
- 45-minute presentation
- 15-minute break, Q&A
- 60-minute live hands-on exercise
Day 1 (Tuesday 19 September 2017):
The first webinar introduces TBB, shows how to parallelize simple code, and highlights the differences between TBB and other models such as pthreads, MPI, and OpenMP. It emphasizes TBB's task-based nature and explains how the work-stealing scheduler operates. The webinar shows how to parallelize for loops with parallel_for and how to parallelize reductions with parallel_reduce. The examples show how the work-stealing scheduler can yield better performance than other parallel programming models.
The exercise shows where to download and install TBB, parallelizes a simple vector add, and then takes an existing conjugate gradient code and introduces parallelism with parallel_for and parallel_reduce.
Day 2 (Wednesday 20 September 2017):
The second webinar discusses more advanced TBB features that can improve the performance of TBB-based code, including blocked ranges and partitioners. Blocked ranges are used to improve the performance of nested for-loop codes with lopsided loop bounds, as well as to improve data locality. All of TBB's partitioners are presented, along with the situations in which each is useful.
The exercise revisits the conjugate gradient code and shows how to achieve more consistent performance with TBB blocked_range objects.
A second code is also shown: a simple seismic ray tracer. In this example, the TBB affinity partitioner is used to improve data re-use.
Day 3 (Thursday 21 September 2017):
The third webinar introduces the TBB flow graph. The flow graph allows more irregular code to be parallelized, but this webinar focuses on it as an advanced tool for avoiding fork-join-style parallelism. The webinar begins by showing what a basic skeleton of parallel code looks like using continue_node, then converts a Cholesky factorization from a purely sequential version into a blocked version, and from the blocked version into a TBB flow graph parallel version.
The exercise revisits the conjugate gradient example and removes the implicit barrier created by reductions through the TBB flow graph.
NAG implements its software for Intel/Windows, Intel/Linux, and Intel/Mac platforms. We partner on various projects, including the NAG Library for regular x86 hardware (Intel® Xeon® processors, etc.) and specialist hardware such as the Intel Many Integrated Core Architecture with the NAG Library for Intel Xeon Phi. NAG delivers Training Courses and a Software Modernization Service for clients who wish to get the most from their modern Intel hardware.