
Q&A with a TBB Junkie
Meet Dmitriy V'jukov, a Moscow-based high-performance computer systems developer who is an assiduous observer of Intel Threading Building Blocks (TBB) and the adoption of parallelism by developers around the world. 

Developing "lock-free, wait-free, obstruction-free, atomic-free synchronization algorithms and data structures" is his hobby. Based on his frequent postings, he's a "brown belt" ninja contributor on the Intel Software Network Forum, and one of the site's newest bloggers. Go Parallel invited V'jukov to share his opinions about TBB, the Microsoft Task Parallel Library, other tools to support concurrency and the proposed Intel Parallel Studio.

Q: What is your software development background?

A: I hold a master's degree in computer science from Moscow State Technical University. I have five years of experience as a C/C++ software development engineer, focused mainly on client/server systems and network servers. In my spare time, I work on synchronization algorithms, programming models for multi-core, and multi-threading verification tools.

Q: How long have you been using TBB and for what purpose?

A: I am quite aware of things happening around and inside TBB, but frankly I have not been using TBB "in production." I have been studying the user interfaces and implementation of TBB in detail. I've developed a library for unit-testing/formal verification of synchronization algorithms (or small pieces of multi-threaded code). It's called Relacy Race Detector.

I have had some preliminary conversations with TBB developers about using Relacy Race Detector in the development of TBB. I am going to provide a free license for TBB developers. I had an analogous conversation with IBM's Paul McKenney (he works on high-end Intel platforms and Linux technology) about using it in the development of the Linux kernel.

But I'm not sure whether Relacy Race Detector itself will be interesting to the general public, because it's targeted mostly at experts who develop very low-level and complicated algorithms.

Q: What difficulties do you see developers having with TBB?

A: In forums and discussion groups I see that developers face three kinds of problems with TBB algorithms:

1. Task granularity size. In order to achieve good performance, task granularity must be carefully chosen. Tasks that are too fine-grained will lead to high overheads. And tasks that are too coarse-grained will lead to bad scalability due to lack of "parallel slack."

2. Excessive sharing. In order to achieve good scalability, each thread must work mainly with private data. Having each thread, on each iteration, update some global variable (or variables) will turn scalability from a linear positive to a super-linear negative. Task-based programming is especially prone to the problem. Higher-level abstractions (tbb::parallel_reduce, tbb::parallel_scan) incorporate more intelligence to overcome the problem. This strongly suggests that developers should use the highest-level abstractions possible.

3. Locality. Though the modern computer memory sub-system is still called RAM (random-access memory), it is now a complicated, distributed, heterogeneous, hierarchical system. Fortunately, there are very simple tips on how to use it efficiently: First, prefer stride access; second, use all data loaded into the cache; and third, reuse the data in cache while it's still there.

While this advice is applicable to a single-threaded environment too, in a task-based model it's harder to realize whether, for example, access will be in stride or not. Once again, higher-level abstractions are less prone to the problem.
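The locality tips above can be illustrated with a small single-threaded sketch. It reads "stride access" as sequential, unit-stride traversal, which is an interpretation rather than something the interview spells out. The function names and the row-major layout are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <vector>

// A rows x cols matrix stored row-major in one contiguous vector:
// element (r, c) lives at index r * cols + c.

// Walks memory sequentially, so every element of each loaded cache
// line is used before the line is evicted.
long long sum_row_order(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same result, but consecutive accesses are cols elements apart.
// For large matrices this tends to touch a new cache line on almost
// every access and wastes most of what was loaded.
long long sum_column_order(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Both functions compute the same sum; only the order in which memory is touched differs, and that order is what becomes harder to see once the loops are split into tasks.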

Q: How much are these problems with parallel programming vs. problems with TBB in particular?

A: These problems are related to parallel programming in general, and in particular to all other parallel programming libraries: OpenMP, Task Parallel Library, Cilk, etc.
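Since these problems are not specific to TBB, the excessive-sharing issue can be sketched with plain standard C++ threads. This is a minimal illustration, not TBB code: the function names and the fixed block partitioning are hypothetical, and the second version only approximates by hand what a reduction abstraction would do.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Anti-pattern: every iteration of every thread updates one shared
// variable, so all cores contend for the same cache line.
long long sum_shared(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::atomic<long long> total{0};
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                total.fetch_add(data[i], std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Each thread accumulates into a private local and publishes its
// result once; the partial sums are combined after all threads join.
long long sum_private(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;  // thread-private accumulator
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;   // single shared write per thread
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Both versions return the same value. The difference is how often shared state is written: once per element in the first, once per thread in the second.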

Q: When you discuss granularity size, are you talking about the general parallel programming issue of task size, or referring to the problematic TBB 1.0 requirement to pick an explicit grain size (which was fixed in TBB 2.0 with the auto_partitioner)?

A: I am talking about the general parallel programming issue of task size.
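The general task-size trade-off can be made concrete with a short sketch in standard C++. The helper names and the `grain` parameter are hypothetical; this is not how TBB partitions ranges, only an illustration of how the grain size determines the number of tasks.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [begin, end)

// Splits [0, n) into consecutive chunks of at most `grain` elements.
std::vector<Range> make_chunks(std::size_t n, std::size_t grain) {
    if (grain == 0) grain = 1;
    std::vector<Range> chunks;
    std::size_t begin = 0;
    while (begin < n) {
        const std::size_t end = begin + std::min(grain, n - begin);
        chunks.emplace_back(begin, end);
        begin = end;
    }
    return chunks;
}

// Sums `data` by launching one asynchronous task per chunk and
// combining the per-task results.
long long chunked_sum(const std::vector<int>& data, std::size_t grain) {
    std::vector<std::future<long long>> tasks;
    for (const Range& r : make_chunks(data.size(), grain)) {
        tasks.push_back(std::async(std::launch::async, [&data, r] {
            return std::accumulate(data.begin() + r.first,
                                   data.begin() + r.second, 0LL);
        }));
    }
    long long total = 0;
    for (auto& t : tasks) total += t.get();
    return total;
}
```

With a grain of 1, a 10-element range becomes 10 tasks, and per-task overhead dominates the work. With a grain of 100, it becomes a single task, leaving no parallel slack. A workable grain lies somewhere in between and depends on the cost of each element's work.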

Q: What's your biggest challenge in concurrent programming?

A: My biggest challenge in concurrent programming is debugging. Things like non-determinism, asynchronism, the absence of a total order of events, and distributed state sometimes make debugging concurrent systems more than the human brain can handle. Every "little" error in source code can take up to several days or weeks to fix. And that's the best case scenario. In the worst case, you don't know that there is an error until you get the call from an enraged customer. And the customer can't say under what circumstances it happens.

This is a field where I am looking forward to strong tool support, of all kinds: static analysis, dynamic analysis, post-mortem analysis, advanced IDE support. I have developed some in-house tools for my purposes. But not every developer is able to develop a comprehensive toolset manually.
