Multicore Programming (Multiprocessing) Visual C++ Tips: member function execution flow - Asaf Shelly

Multicore Programming (Multiprocessing) Visual C++ Tips: member function execution flow

Published Wednesday, August 27, 2008 12:58 AM

Hey all,

Long time no read... Well, I've been very busy. I thought that I'd start with a technical post (maybe because there is nothing that exciting in my life, and maybe because I pretend that this is the case :)

As you all know, I have been busy in the area of parallel computing, what people call multi-core programming.

I have learned that Object Oriented programming is very problematic with parallel systems, and that OO helps us at design time but has almost no benefit at run-time and often even damages our ability to understand execution flow. See my post on my Intel blog here: http://softwareblogs.intel.com/2008/08/22/flaws-of-object-oriented-modeling/

It is very clear to me that people are used to OOP and will still be using it for many years to come so I have compiled a few basic guidelines for OO Programming to help us get the best of both worlds: manageable code, and clear execution flow. Here is the first:

Prefer clear execution flow for member functions

The execution flow is the most important aspect of a parallel application. When the source code has a clear and simple flow, the application is simple to debug and the code is easy to follow. One of the greatest problems of OOD is multiple short functions that call each other. Code review becomes almost meaningless, and it is extremely difficult to follow the execution flow using a debugger or by going over the execution logs.
Using a few long functions instead would thus be a great improvement in code usability. Sometimes it is easier to read a function of five lines that looks like this:
 
void SampleFunction()
{
   int ch = ReadByte();
   if ( TestValue( ch ) )
   {
      Logger.WriteLine( "OK" );
   }
   else
   {
      throw ( "Error" );
   }
}
 
The code above is simple to read and very clean; however, there is a problem with it. It is hard to tell whether any of these functions block, take locks, or throw exceptions, and worst of all, an application that uses such functions will usually have many such short functions. Here is a possible execution flow for such an application:
 
- SampleFunction
-   ReadByte
-     TestFileOpen
-       GetInputFileHandle
-     ReadCharFromFile
-   TestValue
-     GetHashFromInt
-     FindHash
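
To make that flow visible, here is a minimal sketch that records every function entry with its call depth. The TraceScope helper, the global trace vector, and all the stub bodies are my own assumptions for illustration; only the function names and their nesting come from the list above. Unlike the original, this SampleFunction returns a bool instead of logging or throwing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tracer: stores each function entry, indented by call depth.
std::vector<std::string> g_trace;
int g_depth = 0;

struct TraceScope {
    explicit TraceScope(const char* name) {
        g_trace.push_back(std::string(2 * g_depth, ' ') + name);
        ++g_depth;
    }
    ~TraceScope() { --g_depth; }
};

// Stub bodies (assumptions): only the call structure mirrors the listed flow.
int GetInputFileHandle() { TraceScope t("GetInputFileHandle"); return 1; }
bool TestFileOpen() { TraceScope t("TestFileOpen"); return GetInputFileHandle() > 0; }
int ReadCharFromFile() { TraceScope t("ReadCharFromFile"); return 'A'; }
int ReadByte() { TraceScope t("ReadByte"); return TestFileOpen() ? ReadCharFromFile() : -1; }
int GetHashFromInt(int v) { TraceScope t("GetHashFromInt"); return (v * 0x123 + 57) / 33; }
bool FindHash(int h) { TraceScope t("FindHash"); return h >= 0; }
bool TestValue(int v) { TraceScope t("TestValue"); return FindHash(GetHashFromInt(v)); }

// Three visible lines of work, eight function entries at run-time.
bool SampleFunction() {
    TraceScope t("SampleFunction");
    int ch = ReadByte();
    return TestValue(ch);
}
```

After one call to SampleFunction(), g_trace holds the eight entries from the list, each indented by its depth. None of that depth is visible when you read SampleFunction on its own.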
 
The same function could be implemented in the following illustrative way:
 
void SampleFunction()
{
   bool ok = false;
   HFILE hFile = GetInputFileHandle();
   if ( hFile > 0 )
   {
      int ch = fgetchar(hFile);
      if ( ch > EOF )
      {
         ch = (ch * 0x123 + 57 ) / 33;  // get HASH
         if ( lpHashTable[ ch ] != NULL )
         {
            ok = true;
         }
      }
   }
   if (ok)
   {
      Logger.WriteLine( "OK" );
   }
   else
   {
      throw ( "Error" );
   }
}
 
This may be a longer function and it is not "pure" object oriented programming, but now you can read the code and understand what it does. You can also find bugs in the execution flow by means of a simple code review, and if you debug it step by step you can understand where you are in the flow of the code.
Coding this way helps in finding idiotic deadlocks and reduces the number of exceptions thrown between functions, which also helps in keeping a clear execution flow.
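
As a rough illustration of the deadlock point, here is a minimal sketch in which every lock an operation takes is acquired in one flat function, so a reviewer sees both acquisitions side by side. The Account type, Transfer, and RunOpposingTransfers are hypothetical names of mine, not part of the original sample, and the sketch assumes a C++11 compiler with std::thread support.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type for illustration only.
struct Account {
    std::mutex lock;
    long balance = 0;
};

// Both locks are taken here, in plain sight, rather than inside helpers.
// std::lock acquires the two mutexes without risking a lock-order deadlock.
void Transfer(Account& from, Account& to, long amount) {
    std::lock(from.lock, to.lock);
    std::lock_guard<std::mutex> g1(from.lock, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.lock, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; returns the combined balance.
long RunOpposingTransfers() {
    Account a, b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < 10000; ++i) Transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 10000; ++i) Transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If the two locks were instead taken inside separate short helpers, one called from the other, the acquisition order would be spread over several functions and much harder to check in a code review.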


 
Your opinion is most welcome.

 Asaf

Comments

# Sasha Goldshtein said on Thursday, August 28, 2008 3:30 AM

Asaf, I have two problems with this approach - two extremes you could take it to, really.

If you take it to the "assembly language" extreme, then I can tell you that GetInputFileHandle and fgetchar() and Logger.WriteLine are also suspicious.  I'd expect Logger.WriteLine to take a lock, I'd expect fgetchar to throw an exception, I'd even expect lpHashTable[ch] to cause an AV or at least a page fault.  So unless you write down every single assembly language statement inside every function, you're in an infinite loop with this approach.

If you take it to the "let's apply this principle top-down" extreme, then what about the callers of your SampleFunction?  Do they assume that SampleFunction takes a lock, throws exceptions or blocks indefinitely?  If they don't want this assumption, then they have to inline the code of SampleFunction, and by applying this recursively you end up with one gigantic (say, 100KLOC) function per application that is *definitely* less readable.

Oh, and what about inter-application communication?  Do you inline that too?

In other words, this is utterly not practical.

# AsafShelly said on Friday, August 29, 2008 11:23 AM

Hi Sasha,

I wouldn't take it to the Assembly extreme, but I wouldn't take it to the Object Oriented extreme either. The answer is of course in the middle. The problem today is that developers are taught OOP and no Assembly (syntax is not a language!).

The closest practice I can find today is Windows NT device drivers, where each driver function has a clear flow and does not call multiple functions and still the entire system is fully object based. In that model even though you go through many objects for a single request, it is always clear where you are system wide, what initiated the operation, and inside a driver you can always follow the flow because it does not have multiple objects inside it.

Drivers are not practical here because the driver model has the Stack attached to the operation. In user mode, unless you create your own infrastructure, the only way to do that is to use a thread for every event and that really requires design methodologies that are not documented anywhere AFAIK.

Practical or not, my point in this article is that you should make your functions longer instead of having more functions. If you have a simple task to perform, don't produce many objects just because it looks like masterpiece OOD, because you damage run-time.

Asaf

# Sasha Goldshtein said on Sunday, August 31, 2008 3:33 AM

Are you joking?  Windows driver code does not call multiple functions?

http://snurl.com/3lhxh

This is the NTFS driver from ReactOS, which is the closest you get legally to Windows driver sources on the web.

On line 194 starts the NtfsMountVolume function.  It calls the following functions:

IoGetCurrentIrpStackLocation

NtfsHasFileSystem

IoCreateDevice

RtlZeroMemory

NtfsGetVolumeData

IoCreateStreamFileObject

ExAllocatePoolWithTag

CcRosInitializeFileCache

ExInitializeResourceLite

KeInitializeSpinLock

InitializeListHead

ObDereferenceObject

ExFreePool

IoDeleteDevice

Is it clear what the function does?  I don't know, I guess IT DEPENDS.  And if it were using multiple objects, would it be clearer?  I don't know, I guess IT DEPENDS.

So what's the point of making this kind of statement??

And why exactly does having many objects damage runtime?  Do you think that calling a function called IoCreateStreamFileObject is faster than calling the constructor for an object called, say, StreamFile?

# AsafShelly said on Sunday, August 31, 2008 12:02 PM

Hi Sasha,

I agree that it all depends. My problem is with a programming style that is legal for Object Oriented: see this code (first hit on google for "int return void class"):

gcc.gnu.org/.../java.lang.reflect.Array.diff

So many 3-to-5-line functions.

The driver code has much longer functions with clearer flow.

The code relates to NT file system and the following functions are lower layers:

IoGetCurrentIrpStackLocation

IoCreateDevice

RtlZeroMemory

IoCreateStreamFileObject

ExAllocatePoolWithTag

CcRosInitializeFileCache

ExInitializeResourceLite

KeInitializeSpinLock

InitializeListHead

ObDereferenceObject

ExFreePool

IoDeleteDevice

Some OO implementations for this code might have:

NTSTATUS VerifyDeviceObject(...)  // line 209
{
   if (DeviceObject != NtfsGlobalData->DeviceObject)
   {
      return(STATUS_INVALID_DEVICE_REQUEST);
   }
   return(STATUS_SUCCESS);
}

DEVICE_OBJECT* GetDeviceToMount(...)  // line 216
{
   return(Stack->Parameters.MountVolume.DeviceObject);
}

CreateDevice for line 225, CreateDevExt for line 236-237, and InitDevExt...

My point is that not every operation is a function and not every group of operations is a class.

Asaf

# Sasha Goldshtein said on Wednesday, September 03, 2008 6:11 AM

Huh???

Using a function to encapsulate two lines of code has nothing to do with object-oriented programming - it's a FUNDAMENTAL technique for REUSE and READABILITY.

You should encapsulate common code into functions whether you're using an OO language or a procedural language.

# AsafShelly said on Wednesday, September 03, 2008 11:13 AM

The problem is that when a developer is working on an object, the object is all that exists for that developer and then you see many 3 line functions that are not reusable code. These functions are there just because it makes a nicer object to have every operation put in a different function. My object opens a file, reads the file and closes the file so I use three functions because it looks cleaner, more object oriented, or for any other reason.

Procedural programming encourages you to use fewer functions or the code is unreadable.

My point is that it is nice to have readable code, but most efforts today are put into run-time. You have readable code that can be fixed in 5 seconds but it takes you a few days (if at all) to find a simple flow bug like a race condition.

Asaf

# Asaf Shelly said on Thursday, September 04, 2008 10:30 AM

There are still too few online resources available that can help with multicore programming (multiprocessing
