Debugging Parallel Code in Visual Studio

.NET 4.0 introduced the Task Parallel Library (TPL) and Parallel LINQ (PLINQ) to make parallel programming simpler and to make it easier to take full advantage of multi-core processors.

Recently I was playing around with Parallel.ForEach and the new enumeration APIs for the file system in System.IO, trying to build a fast folder scanner, when I chanced upon the parallel debugging options in Visual Studio. After fiddling around a little, I was able to make sense of the information, and it was kind of a ‘brain explode’ moment.

Let me share the things that I figured out.

The Harness Code

- Let’s create a Console Application called FastFolderScanner

- Next we put together the following code to scan folders for a particular type of file and print their names.


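// Needs: using System; using System.Collections.Generic; using System.IO;
//        using System.Linq; using System.Threading.Tasks;
static void Main(string[] args)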
{
    ScanFolders(new System.IO.DirectoryInfo(@"C:\Users\Public\Documents\My Projects\Github\"));
    Console.WriteLine("Done!!!");
    Console.ReadLine();
}


private static void ScanFolders(System.IO.DirectoryInfo dirInfo)
{
    try
    {
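        // EnumerateFiles streams results lazily instead of building an array up front.
        IEnumerable<FileInfo> files = dirInfo.EnumerateFiles("*.cs");
        Parallel.ForEach<FileInfo>(files, WriteName);
        IEnumerable<DirectoryInfo> directories = dirInfo.EnumerateDirectories();
        // Any() stops at the first match; Count() would walk the whole enumeration.
        if (directories.Any())
        {
            // Recurse into each subfolder in parallel.
            Parallel.ForEach<DirectoryInfo>(directories, ScanFolder);
        }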
    }
    catch (Exception ex)
    {
        Console.WriteLine("ERROR: " + ex.Message);
    }
}


private static void ScanFolder(DirectoryInfo currentFolder, ParallelLoopState arg2, long arg3)
{
    ScanFolders(currentFolder);
}

private static void WriteName(FileInfo currentFileInfo, ParallelLoopState arg2, long arg3)
{
    Console.WriteLine(currentFileInfo.FullName);
}

- The ScanFolders method is the crux of the code. The Main method passes in a DirectoryInfo object for the starting folder.

- Using the DirectoryInfo, we get an enumerator over the files of a particular type (*.cs here).

- Then we use Parallel.ForEach from the TPL to print the names of the files in the current folder.

- Next we retrieve an enumerator over the subdirectories of the current folder.

- In the second Parallel.ForEach of the method, we spin off a recursive call for each subdirectory via the ScanFolder method (which calls back into ScanFolders for the given DirectoryInfo).

- This spawns a nice tree of tasks and threads for us to visualize.

Visualizing the TPL Execution in Visual Studio Debugger

Note: Any capable modern system will most likely spin through a few thousand files in the blink of an eye. So experiment with a Thread.Sleep(xyz) call in the WriteName method, after the Console.WriteLine, to ‘slow’ things down for academic purposes.
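As a minimal sketch of that slowdown, WriteName could look like the following. The class name SlowWriter and the 200 ms delay are assumptions made for illustration; they are not part of the original harness.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class SlowWriter
{
    // Hypothetical delay; tune it until Break All reliably lands mid-scan.
    public const int DelayMilliseconds = 200;

    // Same signature as WriteName in the harness, so it can be passed
    // directly to Parallel.ForEach<FileInfo>(files, WriteName).
    public static void WriteName(FileInfo currentFileInfo, ParallelLoopState loopState, long index)
    {
        Console.WriteLine(currentFileInfo.FullName);
        // Artificial pause so the worker threads stay busy long enough to inspect.
        Thread.Sleep(DelayMilliseconds);
    }
}
```

Remove the Thread.Sleep call once you are done exploring the debugger windows, since it only exists to keep the threads alive long enough to look at them.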

In the above code, I’ve pointed the scanner at my local Git repo and asked it to look for .cs files; rest assured there are ‘lots’ of them.

1. We run the application

2. Switch to Visual Studio quickly and hit Break All (Debug > Break All, or Ctrl+Alt+Break with the default key bindings). Let’s assume the execution broke in the ScanFolders method.

[Screenshot: debugger paused in ScanFolders]
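If racing to hit Break All proves fiddly, one alternative (a different technique from the manual break described above) is to ask the debugger to pause from code using System.Diagnostics.Debugger. The sketch below is only an illustration: the BreakHelper class and its BreakOnce method are hypothetical names, not part of the original sample.

```csharp
using System.Diagnostics;
using System.Threading;

static class BreakHelper
{
    // 0 until the first break has been requested, 1 afterwards.
    private static int _hasBroken;

    // Hypothetical helper: pauses in the attached debugger the first time it is
    // called. Returns true if a break was requested, false otherwise.
    public static bool BreakOnce()
    {
        // Interlocked.Exchange lets only one of the parallel callers win.
        if (Debugger.IsAttached && Interlocked.Exchange(ref _hasBroken, 1) == 0)
        {
            Debugger.Break();
            return true;
        }
        return false;
    }
}
```

Calling BreakHelper.BreakOnce() at the top of ScanFolders would pause the process on the first folder scanned while a debugger is attached, and do nothing when the program runs on its own.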

The Parallel Tasks Window

3. Go to Debug > Windows > Parallel Tasks to bring up the Parallel Tasks window. You are likely to see a trace like the following. It shows each task’s ID, Status, Thread Assignment and so on. Note the Flags column: you can flag a particular task and track it specifically too.

[Screenshot: Parallel Tasks window]
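To relate the entries in the Parallel Tasks window back to the running code, it can help to log the current task and thread from inside the loop body. The following is a rough sketch under stated assumptions: TaskThreadLog and Run are hypothetical names, and the loop iterates over a range of integers rather than files so that it stays self-contained. Task.CurrentId and Thread.CurrentThread.ManagedThreadId are standard .NET properties. The managed thread ID is not necessarily the same number as the OS thread ID that the debugger displays.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class TaskThreadLog
{
    // Runs a small Parallel.ForEach and records which task and which
    // managed thread handled each item.
    public static ConcurrentBag<string> Run(int itemCount)
    {
        var lines = new ConcurrentBag<string>();
        Parallel.ForEach(Enumerable.Range(0, itemCount), item =>
        {
            int? taskId = Task.CurrentId;                        // null when not running inside a task
            int threadId = Thread.CurrentThread.ManagedThreadId; // managed ID, not the OS thread ID
            lines.Add(string.Format("item={0} task={1} thread={2}", item, taskId, threadId));
        });
        return lines;
    }
}
```

Printing the result of TaskThreadLog.Run(8) usually shows several items sharing the same thread, which is the same many-tasks-on-few-threads picture that the Parallel Tasks window presents.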

The Parallel Watch Window

4. Now repeat the ‘Continue and Break’ routine until you break in the WriteName method. If you keep landing in the ScanFolders method, put a breakpoint in the WriteName method after the first break and let execution hit it. Then go to Debug > Windows > Parallel Watch > Parallel Watch 1. You should see a window similar to this one.

[Screenshot: empty Parallel Watch window]

5. The above doesn’t tell us much about the parameter values in each thread. So select ‘currentFileInfo.FullName’ in the code editor, right-click, and choose ‘Add Parallel Watch’.

[Screenshot: Add Parallel Watch menu item]

This adds the expression to the Parallel Watch window, which now looks similar to the following.

[Screenshot: Parallel Watch window showing currentFileInfo.FullName per thread]

As you can see, you can now see the file name being processed by each active thread.

The Parallel Stack Visualization

So far we have seen the Parallel Tasks window and the Parallel Watch window. But we often refer to our stack traces to see where a call is coming from. In parallel execution, a single linear stack trace doesn’t help much, so Visual Studio provides a graphical chart. You can invoke it from Debug > Windows > Parallel Stacks. This opens up a window similar to the following:

[Screenshot: Parallel Stacks window]

Look at it and let it sink in. This is hardcore stuff. You actually have the stack trace of each thread being executed. You can hover over a frame (e.g. Program.ScanFolders) to see the stack frame details.

[Screenshot: stack frame tooltip in Parallel Stacks]

Then double-click on the frame to navigate to the code.

[Screenshot: jumping from Parallel Stacks to the code]

Once there, you can evaluate the files in the enumerator.

[Screenshot: evaluating the enumerator in the debugger]

The ‘Brain Explode’ moment!! Awesome stuff.

Conclusion

Even though the sample we saw today was academic, the tools for parallel debugging in Visual Studio 2012 are pretty awesome and will definitely help track down the pesky issues that crop up when doing complex operations in parallel.

Traditionally, parallel programming has been considered tough and reserved for the Coding Ninjas only. But with the introduction of the TPL, supported by these awesome debugging tools, the barrier to entry into parallel programming is getting lower (yeah, very slowly, I know :), but the intern-Ninjas have a lot of help now in the form of tooling support).

Download the entire source code (GitHub)




About The Author

Suprotim Agarwal
Suprotim Agarwal, Developer Technologies MVP (Microsoft Most Valuable Professional), is the founder of and a contributor to DevCurry, DotNetCurry and SQLServerCurry. He is the Chief Editor of a developer magazine called DNC Magazine. He has also authored two books: 51 Recipes using jQuery with ASP.NET Controls and The Absolutely Awesome jQuery CookBook.