Bugs are the coolest, most exciting things in the whole wide world! If you are an entomologist, you just might agree with that statement. Any competent software engineer (and I am 99.999 percent sure that you are one, since you're reading this) would probably wonder what kind of drugs I'm on. Granted, bugs are definitely not cool when they cause you to stay up all night trying to figure them out, or worse yet, when they cause you to lose your job because too many customers send a buggy product back. Proactively finding and fixing bugs long before your product gets to the customer can be very satisfying, especially since you won't have to spend the last half of the development cycle sitting in a debugger wondering where the problems are.
Welcome to The Bugslayer! This article introduces my new column in MSJ. In my column, I will discuss ways that you can find and fix as many bugs as possible before the customer ever sees your product. Since bugs can creep into your product anywhere from design to final ship, there is a great deal of ground to cover. My goal is to provide information that will help you build automatic bug killing right into your products. While a few columns might contain higher-level discussions, most will focus on particular areas where bugs are found, such as multithreading. And following the MSJ tradition, they will offer tools, techniques, and a bunch of source code you can use.
Now that I have broadly outlined the column, here is where you come in. I want to cover the topics you are most interested in seeing, so let me know what your interests are. While some bug-solving techniques are generic across all environments, others aren't. I want to try to hit those that are most interesting to the most folks. If you have debugging or bug-squashing questions or ideas, drop me a line at john@jprobbins.com. I look forward to hearing from you.
Your Free Lunch
When Visual C++® 4.0 first arrived on my doorstep, it had all of those nice new features, but in my mind one small little addition was coolest: the Debug Run-Time Library. The strange thing is, most folks don't seem to realize that it exists because much of it is turned off by default. Once you get the proper flags switched on, however, it offers many great features: memory overwrite, underwrite, and freed memory access checking; memory leak checking; memory allocation hook; user-defined memory block dumping; and clean asserting and reporting macros. Plus, many of the features are extensible! In this article, I will extend the debug runtime so that you get even more functionality out of it.
The first library I developed, MemDumperValidator, provides a generic mechanism for hooking into the memory-dumping part of the debug runtime. So, when there is an error with your allocated memory, you get to see exactly what was in that memory. It also sets up a scheme to allow you to validate everything inside a memory block. The second library I developed, MemStressLib, hooks into the allocation portion of the debug runtime so that you can selectively fail memory allocations and test how your program handles them.
Since Microsoft is nice enough to provide the complete runtime source code, all the functionality for the debug runtime is easily seen. In the CRT\SRC directory, all the work takes place in the following files:
DBGDEL.CPP | The debug global delete operator.
DBGHEAP.C | All of the debug heap-handling functions.
DBGHOOK.C | The stub memory allocation hook function.
DBGINT.H | The internal debug headers and functions.
DBGNEW.CPP | The debug global new operator.
DBGRPT.C | The debug reporting functions.
CRTDBG.H | The header file you include. This is in the standard include directory.
The debug runtime is full of features, but I want to concentrate on the memory tracking and checking features that it offers. The first step to using the debug runtime is to include the main header CRTDBG.H, where all the functionality is defined. Right before you do the actual include, you will need to define _CRTDBG_MAP_ALLOC. This gets the allocation routines mapped to special versions that record the source and line of the call, which really helps give you some extra information about where things happened.
After you get the debug runtime included, you have to get it turned on. The documentation says that most of the debug runtime is turned off to keep the code small and to increase execution speed. While this may be important for a release build, the whole point of a debug build is to find bugs! The increased size and reduced speed of debug builds are inconsequential. The _CrtSetDbgFlag function takes a set of flags, shown in Figure 1, that turn on various options in the debug runtime. If you use either of my libraries, CRTDBG.H is already included and _CRTDBG_MAP_ALLOC is already defined for you. Both libraries make it a snap to get the full debug runtime library turned on.
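If you want to switch the checks on by hand, the sequence looks roughly like the sketch below. The helper name EnableDebugHeapChecks is my own invention for illustration and is not part of either library; the flags are the standard ones from CRTDBG.H. The guards only exist so the snippet still compiles (and does nothing) outside a Visual C++ debug build.

```cpp
#include <cassert>

// Define the mapping macro before the include so that malloc and friends
// record the source file and line of each allocation.
#if defined(_MSC_VER)
#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>
#endif

// Hypothetical helper (not from the article's libraries): turn on the debug
// heap options and return the flag set that was requested.
// Returns 0 when the debug runtime is not available.
int EnableDebugHeapChecks()
{
#if defined(_MSC_VER) && defined(_DEBUG)
    // Passing _CRTDBG_REPORT_FLAG reads the current flags without changing them.
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
    flags |= _CRTDBG_ALLOC_MEM_DF;      // track allocations on the debug heap
    flags |= _CRTDBG_DELAY_FREE_MEM_DF; // keep freed blocks so reuse can be caught
    flags |= _CRTDBG_CHECK_ALWAYS_DF;   // check the heap on every alloc/free (slow)
    flags |= _CRTDBG_LEAK_CHECK_DF;     // dump leaked blocks when the program exits
    _CrtSetDbgFlag(flags);
    return flags;
#else
    return 0;
#endif
}
```

Calling this once at the top of main is usually enough for code that has no static objects allocating memory before main runs.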
Now that you have the debug runtime fully available, you get a slew of functions that really help you control memory usage. One of the most useful functions that you can call is _CrtCheckMemory. This function walks through all of the memory you have allocated and checks to see if you have any underwrites or overwrites and if you have used any blocks that were previously freed. This one function alone makes the entire debug runtime worth using.
But wait, there's more! Another set of functions allows you to easily check the validity of any piece of memory. The _CrtIsValidHeapPointer, _CrtIsMemoryBlock, and _CrtIsValidPointer functions are perfect for using as debugging parameter validation functions. If they are wrapped in ASSERT macros, they become doubly useful. While these, combined with _CrtCheckMemory, offer sufficient memory checking, there's still more!
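As a rough illustration of parameter validation, here is a small routine that checks its arguments with the debug runtime before doing any work. CopyName is a made-up function used only for this sketch; _ASSERTE, _CrtIsValidPointer, and _CrtCheckMemory come from CRTDBG.H. Outside a Visual C++ debug build the runtime checks are compiled out and only the plain assert remains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// Hypothetical example function: copies src into dest, truncating to fit,
// and returns the number of characters copied (not counting the terminator).
std::size_t CopyName(char* dest, std::size_t destSize, const char* src)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    // dest must be writable for destSize bytes (last argument: 1 = read/write).
    _ASSERTE(_CrtIsValidPointer(dest, static_cast<unsigned int>(destSize), 1));
    // Walk the whole heap and complain about overwrites or underwrites.
    _ASSERTE(_CrtCheckMemory());
#endif
    assert(dest != nullptr && src != nullptr && destSize > 0);

    std::size_t length = std::strlen(src);
    if (length >= destSize)
    {
        length = destSize - 1; // leave room for the terminator
    }
    std::memcpy(dest, src, length);
    dest[length] = '\0';
    return length;
}
```

Running _CrtCheckMemory on every call is expensive, so in practice you would probably reserve it for the routines you suspect.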
Another neat feature of the debug runtime is the memory state routines. _CrtMemCheckpoint, _CrtMemDifference, and _CrtMemDumpStatistics make it easy to do before and after comparisons of the heap to see if anything is amiss. For example, if you are using a common library in a team environment, you could take before and after snapshots of the heap when you call the library to see if there are any leaks, or to see how much memory is used on the operation.
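A before-and-after comparison might be wrapped up along these lines. HeapChangedAcross and TidyOperation are names I made up for the sketch; the three _CrtMem functions and the _CrtMemState structure are the real debug runtime ones. When the debug runtime is not available, the wrapper simply runs the operation and reports no change.

```cpp
#include <cassert>
#include <cstdlib>
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// Hypothetical stand-in for a well-behaved library call: it frees what it
// allocates, so the heap should look the same afterwards.
void TidyOperation()
{
    void* scratch = std::malloc(64);
    std::free(scratch);
}

// Runs the operation between two heap snapshots. Returns true if the debug
// runtime reports a significant difference; always false without it.
bool HeapChangedAcross(void (*operation)())
{
#if defined(_MSC_VER) && defined(_DEBUG)
    _CrtMemState before, after, difference;
    _CrtMemCheckpoint(&before);
    operation();
    _CrtMemCheckpoint(&after);
    if (_CrtMemDifference(&difference, &before, &after))
    {
        // Report how many blocks and bytes of each type are still outstanding.
        _CrtMemDumpStatistics(&difference);
        return true;
    }
    return false;
#else
    operation();
    return false;
#endif
}
```

The same pattern works around a call into someone else's library: pass a small function that makes the call and see whether the snapshots match.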
The icing on the memory-checking cake is that the debug runtime allows you to hook into the memory allocation code stream so you can see each allocation and deallocation function call. If the allocation hook returns TRUE, the allocation is allowed to continue. If the allocation hook returns FALSE, then the allocation will fail. My immediate thought was that, with a small amount of work, I could have a means to test code in some really nasty boundary conditions that would be very difficult to duplicate. Fortunately, you will not have to do that work because it's handled by MemStressLib, one of the libraries that I will present later.
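To give a feel for what an allocation hook looks like, here is a minimal sketch that fails any request above a size limit. This is not how MemStressLib decides which allocations to fail; the size policy, ShouldAllowAllocation, and InstallFailLargeAllocHook are all hypothetical names chosen for the example. The hook signature and _CrtSetAllocHook are the ones declared in CRTDBG.H.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

// Hypothetical policy: fail requests larger than this many bytes (0 = no limit).
static std::size_t g_failAboveBytes = 0;

// The decision is kept in a plain function so it can be tested without the
// debug runtime.
bool ShouldAllowAllocation(std::size_t size, std::size_t limit)
{
    return limit == 0 || size <= limit;
}

#if defined(_MSC_VER) && defined(_DEBUG)
// Returning nonzero lets the request continue; returning zero makes it fail.
static int __cdecl FailLargeAllocHook(int allocType, void* /*userData*/,
                                      size_t size, int blockType,
                                      long /*requestNumber*/,
                                      const unsigned char* /*fileName*/,
                                      int /*lineNumber*/)
{
    // Never interfere with the runtime's own blocks, and always allow frees.
    if (blockType == _CRT_BLOCK || allocType == _HOOK_FREE)
    {
        return 1;
    }
    return ShouldAllowAllocation(size, g_failAboveBytes) ? 1 : 0;
}
#endif

// Sets the limit and, in a Visual C++ debug build, installs the hook.
void InstallFailLargeAllocHook(std::size_t limit)
{
    g_failAboveBytes = limit;
#if defined(_MSC_VER) && defined(_DEBUG)
    _CrtSetAllocHook(FailLargeAllocHook);
#endif
}
```

Skipping _CRT_BLOCK allocations matters because the runtime allocates memory for its own use, and failing those would test the runtime rather than your code.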
The cherry on top of the icing of the memory-checking cake is that the debug runtime allows you to hook the memory dumping routines and to enumerate client blocks (your allocated memory). With the memory dumping, you can now hook in a dump routine that knows about your data, so that instead of seeing the default cryptic dumped memory, which is not very helpful, you can see exactly what the memory block contains and format it exactly as you want. MFC has the Dump function for this purpose, but it only works with CObject-derived classes. If you're like me, you don't spend your entire coding life in MFC and you need dumping functions that are more generic to accommodate different types of code.
The client enumeration, as the name implies, allows you to enumerate the memory blocks you have allocated. This means you have an excellent opportunity for some interesting utilities. In the MemDumperValidator library, I combined the dumping hooks so the enumeration can dump and validate many types of allocated memory. The validation is very important when you consider that this extensible validation allows you to do "deep" validation instead of the surface checks of underwrites and overwrites. By deep, I mean something that knows about what is in the memory block so it can truly make sure that everything is correct.
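Stripped of everything my library adds, the raw hooks look something like the sketch below. The Point type and every function name here are hypothetical, and telling blocks apart by size alone is a crude shortcut that stands in for a real block identifier scheme. _malloc_dbg, _free_dbg, _CrtSetDumpClient, _CrtDoForAllClientObjects, and _RPT1 are the real debug runtime pieces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#if defined(_MSC_VER)
#include <crtdbg.h>
#endif

struct Point { int x; int y; }; // hypothetical data type for the example

// Formats a Point as readable text; returns what snprintf returns.
int FormatPoint(const Point& point, char* out, std::size_t outSize)
{
    return std::snprintf(out, outSize, "Point(%d, %d)", point.x, point.y);
}

#if defined(_MSC_VER) && defined(_DEBUG)
// Called by the debug runtime for each client block it dumps.
static void __cdecl DumpPointBlock(void* userData, size_t size)
{
    if (size == sizeof(Point)) // crude identification, assumed for this sketch
    {
        char text[64];
        FormatPoint(*static_cast<Point*>(userData), text, sizeof text);
        _RPT1(_CRT_WARN, "  %s\n", text);
    }
}

// Called once per client block during enumeration; context is our counter.
static void __cdecl CountBlock(void* /*userData*/, void* context)
{
    ++*static_cast<int*>(context);
}
#endif

// Allocates a Point as a client block so the dump hook will see it.
Point* AllocPoint(int x, int y)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    void* raw = _malloc_dbg(sizeof(Point), _CLIENT_BLOCK, __FILE__, __LINE__);
#else
    void* raw = std::malloc(sizeof(Point));
#endif
    if (raw == nullptr) return nullptr;
    Point* point = static_cast<Point*>(raw);
    point->x = x;
    point->y = y;
    return point;
}

void FreePoint(Point* point)
{
#if defined(_MSC_VER) && defined(_DEBUG)
    _free_dbg(point, _CLIENT_BLOCK);
#else
    std::free(point);
#endif
}

// Installs the dump hook and returns the number of live client blocks
// (always 0 when the debug runtime is not available).
int InstallDumperAndCountBlocks()
{
    int count = 0;
#if defined(_MSC_VER) && defined(_DEBUG)
    _CrtSetDumpClient(DumpPointBlock);
    _CrtDoForAllClientObjects(CountBlock, &count);
#endif
    return count;
}
```

Only blocks allocated with the _CLIENT_BLOCK type reach the dump hook and the enumeration, which is why the allocation goes through _malloc_dbg instead of plain malloc.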
I have just highlighted the debug runtime here so you'll more easily understand the libraries that I am about to present. There is a great deal of value in the debug runtime, and the best part is that, in conjunction with the MemStressLib and MemDumperValidator libraries, you get all of your cake and a fork to help you eat it. But before I jump right into the code, I need to explain a little bit about how things are initialized in Visual C++®.
Just How Are Things Initialized and Terminated in C++?
In my MemDumperValidator library, I take advantage of a small trick to get everything initialized by the compiler long before you use the library, and terminated long after the program is finished executing your code. While I could have you call initialization and shutdown functions to tell MemDumperValidator to start and stop, your calls could happen too late and too early, respectively, if you have any static C++ classes that use MemDumperValidator. Static C++ classes are constructed before main/WinMain is called and destructed after main/WinMain returns, so it might be rather difficult for you to figure out when to do the initialization and shutdown calls. My goal is to make the library as automatic as possible so you can just use it without spending a lot of time figuring it out. Also, controlling the initialization order is something that you do not think about much, but when it is wrong it takes some serious debugging effort to get right.
What I need is a way to tell the compiler to call my initialization routines before it calls the ones in my code, and to call my termination routines after it calls those in my code. The documentation refers to this as the initialization order; the #pragma init_seg is how you can control it. There are several "parameters" that you can pass to the init_seg directive: compiler, lib, user, section name, and func-name. The first three are the important ones.
The compiler directive is reserved for the Microsoft compiler, and any objects specified for this group are constructed first and destructed last. Those that are marked as lib are constructed next and destructed before the compiler-marked group, and those marked user are constructed last and terminated first.
Since the code that I developed needs to be initialized before your code, I could just specify lib as the directive to init_seg and be done with it. However, if you are creating libraries and marking them as lib segments (as you should) and want to use my code, my code still needs to be initialized before your code. To handle this contingency, I set the init_seg directive to compiler. While I would not suggest you do this with release-build code, it is safe enough with debug code.
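The general shape of the trick is sketched below. The class and variable names are hypothetical and the real initialization work is reduced to setting a flag. The pragma applies to the whole translation unit, so it belongs in a source file of its own. Visual C++ issues warning C4074 for the compiler group, which the sketch silences. On other compilers the pragma is skipped and the object is simply constructed before main.

```cpp
#include <cassert>

#if defined(_MSC_VER)
// C4074: initializers put in the compiler-reserved initialization area.
#pragma warning(disable : 4074)
// Statics in this file are constructed before, and destroyed after, the
// statics in files that use the lib or user groups.
#pragma init_seg(compiler)
#endif

// Hypothetical library state; a real library would set up much more.
static bool g_libraryReady = false;

// The constructor and destructor stand in for the library's initialization
// and shutdown routines.
class AutoInitializer
{
public:
    AutoInitializer()  { g_libraryReady = true; }
    ~AutoInitializer() { g_libraryReady = false; }
};

// The single static instance whose lifetime brackets everything else.
static AutoInitializer g_autoInitializer;

// C-style query the rest of the program can call at any time.
bool LibraryIsReady()
{
    return g_libraryReady;
}
```

Because the instance is a file-level static, nothing has to be called explicitly: by the time main (or any user-group static constructor) runs, LibraryIsReady already returns true.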
Since the initialization idea only works with C++ code, MemDumperValidator uses a special static class that simply calls the initializer functions for the libraries. The initializer function is a little more complex than just setting a couple of variables to known values, which is why I need to go to all of this trouble. Additionally, as discussed below, to get around some limitations in the debug runtime memory leak checking, I need to do some special processing on the class destruction. Even though the library only has a C interface, I can take advantage of C++ to get everything lined up so it is ready when you call it.
Using Memory Dumper and Validator Library
As I mentioned earlier, the stock memory dumping routines could be improved because they just display the first couple of bytes of the block and the address. The MemDumperValidator library allows you to hook into the debug runtime memory dumping; you can get nicely formatted output of exactly what is in your memory, so you have an idea what is in the block you are leaking or corrupting. When debugging, any extra piece of information is power.
In addition to memory dumping, the validator portion gives you the means not only to check for overwrites and underwrites, but a clean way to do deep validation of everything in the memory block. Before I delve into the inner workings, I will discuss the high-level view and how to use the library. If you want to follow along with the code, Figure 2 shows the header file MemDumperValidator.h.
The MemDumperValidator takes advantage of the debug runtime block identifier capabilities so that it can associate a block type to a set of routines that knows something about what is in the block. After you set up a class or C data type to use the MemDumperValidator library, the library will be called when the debug runtime wants to dump a block. The library will look at the block value, and if there is a matching dumping function, it will call it to dump the memory. The validation portion will do the same thing when called by the debug runtime, except it calls the validation function. As usual, describing it is easy, but getting it all to work is a little more difficult.
Setting up a C++ class so it can be handled by the MemDumperValidator library is a relatively simple operation. In the declaration of the C++ class, just specify the DECLARE_MEMDEBUG macro with the class name as the parameter. This macro is rather like some of the magic MFC macros in that it expands into a couple of data and method declarations. If you are following along in MemDumperValidator.h, you will notice that there are three inline functions: new, delete, and new with placement syntax. If you have any of these three operators in your class, then you will need to extract what these functions do and place it in your code.
In the implementation file for your C++ class, you need to use the IMPLEMENT_MEMDEBUG macro, again with your class name as the parameter. This sets up a static variable for your class. The IMPLEMENT_MEMDEBUG and DECLARE_MEMDEBUG macros only expand in _DEBUG builds, so they do not need to have conditional compilation used around them.
After you have specified both macros in the correct place, you will only need to implement the two functions that will do the actual dumping and validation for your class. The prototypes for those functions are shown below. Obviously, you will want to put some conditional compilation around them so they are not compiled into release builds.