ParallelFX,Performance - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

DevReach 2012: Task and Data Parallelism
Thanks for attending my DevReach session on task and data parallelism! We discussed the APIs available to you in the Task Parallel Library, how to avoid common pitfalls, and how to squeeze performance from algorithms that seem difficult to parallelize. Among the topics we covered: measuring concurrency using the Visual Studio Concurrency Visualizer; extracting parallelism from recursive algorithms; symmetric data processing and uneven work distribution; dependency management with continuations; synchronization...
The Future of Microprocessors—Must Read for Developers
Long-time readers of this blog know that I really don’t like rehashing someone else’s thoughts and linking to material that isn’t my own. However, the ACM article The Future of Microprocessors (S. Borkar, A. Chien) warrants an exception to this rule. If you can afford the time (approx. 2 hours), I strongly recommend that you read the article instead of my somewhat incoherent ramblings below. If you’re looking for an executive summary highlighting some of the biggest challenges and likely solutions...
DevAcademy4 Session: Watch the Video and Download the Slides and Demos
I promised you that my DevAcademy4 session would be recorded and available online shortly after the conference. Well, the conference was a blast, and the video recording, slides, and demos are all available online. To everyone who had to put up with me for over 60 minutes in the packed session hall: thanks a lot for coming, and I hope you had fun! If there’s anything at all that you would like to follow up on, feel free to use the contact form. It might take a while before the materials are available at the...
Fairness is Highly Overrated
Fairness with respect to synchronization mechanisms is a highly overrated property. When I talk about concurrency, parallelism, Windows synchronization and similar subjects, I’m often asked whether a specific algorithm, mechanism or feature is fair in some respect. First, let’s define fairness. I’ll use a simplistic yet rigorous definition of a fair lock. (Other synchronization mechanisms may have fairness defined in a similar fashion.) To begin with, a lock is exactly what you think it...
Parallelism in Visual Studio 2010: Demos Updated to Beta 2
As you probably know, Visual Studio 2010 has reached the Beta 2 milestone, with a go-live license available to start coding your production applications using this release. There have been some minor changes around the System.Threading.Tasks namespace and classes, with the most significant change being around the cancellation model for tasks. Instead of explicitly reaching out for a Task object and then calling its Cancel method (which has been removed, along with the static Task.Current property...
Parallel Programming in Visual Studio 2010: MSDN Event Deck and Demos
A couple of days ago I delivered a session on Parallel Programming in Visual Studio 2010 at the Microsoft offices in Raanana. The deck and code will appear on the official website (of Israel MSDN Events) shortly, but for now you can download them from my SkyDrive: Deck, Demo code. This session complements my earlier session on Concurrent Programming fundamentals, in which I provided a theoretical overview of some of the architectural and implementation issues in modern concurrent applications. In...
Practical Concurrency Patterns: Immutability (Freezables)
Another very simple pattern builds on the foundation of the Safe-Unsafe Cache pattern. What is the easiest way to protect data from multi-threaded access and to incur the minimal performance cost while doing so? Making it read-only! Read-only (immutable) objects are extremely convenient to use because there’s no need to enforce thread-safety – all accesses to read-only objects are thread-safe. Additionally, immutable objects decrease complexity – it’s impossible for one part of...
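To make the read-only idea concrete, here is a minimal sketch of an immutable value type. The original series targets .NET; Java is used here purely as a stand-in, and the `Quote` class with its fields is a hypothetical example, not code from the post.

```java
// Hypothetical immutable type: every field is final and there are no setters,
// so any number of threads can read an instance without synchronization.
final class Quote {
    private final String symbol;
    private final double price;

    Quote(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }

    String symbol() { return symbol; }

    double price() { return price; }

    // "Changing" a value produces a new object; the original stays untouched.
    Quote withPrice(double newPrice) {
        return new Quote(symbol, newPrice);
    }
}
```

A caller that needs an updated price calls `withPrice` and publishes the new reference; readers still holding the old instance keep seeing a consistent, unchanged object.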
Practical Concurrency Patterns: Spinlock
Previously in the series we have examined the performance differences between concurrency patterns based on kernel synchronization (critical sections, events, mutexes etc.) and concurrency patterns based on wait-free synchronization (such as the interlocked family of operations). Kernel synchronization tends to be expensive if locks are frequently acquired and released because of the associated context switches, introducing anti-patterns such as lock convoys. Wait-free synchronization tends to be...
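As a rough illustration of the spinlock named in the title, the sketch below busy-waits on an atomic flag instead of blocking in the kernel. It is written in Java as a stand-in for the .NET interlocked operations the series discusses; the `SpinLock` class is hypothetical, and `Thread.onSpinWait()` assumes Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal spinlock: no fairness, no reentrancy, no back-off.
final class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // Spin until this thread atomically flips the flag from false to true.
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that this is a busy-wait loop
        }
    }

    void unlock() {
        held.set(false);
    }
}
```

A lock like this only makes sense when the protected region is very short; a thread that spins for long burns CPU time that a blocking lock would have yielded.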
Practical Concurrency Patterns: Cyclic Lock-Free Buffer
Last time we minimized contention by using lock-free operations instead of acquiring a lock on a work item queue (neither a standard lock nor a reader-writer lock offers nearly-linear scaling). On the other hand, a few weeks ago we also saw a technique for eliminating contention altogether by keeping state in thread-local storage. However, there’s another trick in the bag for using shared state across threads without any locks. It is applicable for the scenario where multiple...
Practical Concurrency Patterns: Lock-Free Operations
In the previous installments we have reviewed multiple strategies for caching or storing calculated key-value data so that accesses to it are optimized to the highest applicable degree. For simpler storage types, such as a work item queue, there are even cheaper alternatives than what we've seen so far. Consider the following scenario: a stock trade application receives trade orders at the alarming rate of thousands of operations per second from multiple sources; orders are placed in a queue for...
Practical Concurrency Patterns: Safe/Unsafe Cache
After having examined a classical reader-writer-lock-based cache and a thread-local cache, we have come to terms with the deficiencies of both alternatives. A classical RWL-cache requires a lock (albeit a cheap one) for every operation, including a read, even though contention doesn't occur until there's a write. A TLS-cache is fine where the worker threads are countable and controllable, but becomes inapplicable when hundreds of threads from the thread pool are spawned and destroyed. An...
Practical Concurrency Patterns: Thread-Local Cache
In the previous post we looked at the fairly simple implementation of a read-write cache, where elements are created at the moment they are needed. We also noticed that for the specific cases where the data is accessed very frequently and calculating the data doesn't cost too much in terms of CPU time and memory, there's a large overhead to this approach because access to the cache must be synchronized. We can introduce a simple optimization by noticing that locks are unnecessary...
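One way to picture a thread-local cache is sketched below, using Java as a stand-in for the .NET original. The `ThreadLocalCache` class and its `compute` function are hypothetical names; the trade-off it illustrates is that each thread pays for its own computation and memory in exchange for never taking a lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical thread-local cache: every thread owns a private map.
final class ThreadLocalCache<K, V> {
    private final ThreadLocal<Map<K, V>> perThread =
            ThreadLocal.withInitial(HashMap::new);
    private final Function<K, V> compute;

    ThreadLocalCache(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) {
        // No lock needed: the map returned here is visible only to the caller's thread.
        return perThread.get().computeIfAbsent(key, compute);
    }
}
```

Because nothing is shared, the same key may be computed once per thread, which is acceptable only when the computation is cheap and the number of threads is bounded.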
Practical Concurrency Patterns: Read-Write Cache
Assume that you have a set of worker threads producing and consuming information. In the process, there's some generated data that can be cached instead of being calculated every time it is accessed. A typical solution for this problem is a thread-safe read-write cache (e.g. a .NET Dictionary) that will contain the calculated data. Due to its changing nature, a lock must be taken when reading from the cache and when writing to the cache. This is typically a reader-writer lock, unless the data...
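The read-write cache described above can be sketched as follows. This is a Java stand-in for a lock-protected .NET Dictionary, not the post's own code; `ReadWriteCache` and `compute` are hypothetical names, and the sketch assumes `compute` never returns null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical cache guarded by a reader-writer lock.
final class ReadWriteCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Function<K, V> compute;

    ReadWriteCache(Function<K, V> compute) {
        this.compute = compute;
    }

    V get(K key) {
        // Fast path: many readers may hold the read lock at the same time.
        lock.readLock().lock();
        try {
            V cached = map.get(key);
            if (cached != null) {
                return cached;
            }
        } finally {
            lock.readLock().unlock();
        }
        // Slow path: this lock cannot upgrade read to write, so the read lock
        // is released first and the entry is re-checked under the write lock.
        lock.writeLock().lock();
        try {
            V cached = map.get(key);
            if (cached == null) {
                cached = compute.apply(key);
                map.put(key, cached);
            }
            return cached;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that even a cache hit acquires and releases the read lock, which is the per-operation cost the rest of the series tries to reduce.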
Waltzing Through the Parallel Extensions June CTP: Known Issues
In the previous posts in this series, we have looked at a multitude of features provided by the PFX June CTP, including synchronization mechanisms, task-related features and new collection classes. However, there's also a large list of known issues with this release - it's obviously not production-ready, but nonetheless is a great milestone by the Parallel Extensions team. The most interesting issues mentioned are: TPL threads are not cleanly shut down when run in the Visual Studio test...
Waltzing Through the Parallel Extensions June CTP: Collection Classes
In the previous posts in this series, we have looked at the new synchronization mechanisms and the new task-related features in the PFX June CTP. This post features a brief overview of the new collection classes introduced in the CTP. In the new System.Threading.Collections namespace we find three new classes which facilitate concurrent programming. These collections do not yet represent the wealth of concurrent and non-blocking collections that might be implemented in the future, but they are certainly...