Heap corruption when trimming delayed free queue

I'm currently trying to track down the source of a heap corruption in our code base. It doesn't show up when full page heap is turned on, so I'm running with normal page heap only.

I'm using Application Verifier to break on the corruption, and get a not-so-helpful stop code of 00000008:

APPLICATION_VERIFIER_HEAPS_CORRUPTED_HEAP_BLOCK (8)
Corrupted heap block.
This is a generic error issued if the corruption in the heap block cannot be placed in a more specific category.

=======================================
VERIFIER STOP 00000008: pid 0xD30: Corrupted heap block.

00000000 : Heap handle used in the call.
0861C000 : Heap block involved in the operation.
0000043C : Size of the heap block.
00000000 : Reserved

=======================================

I've had to trim down the report to protect the innocent, but bear with me. The call stack shows:

1000c540 00000008 00000000 vrfcore!VerifierStopMessageEx+0x543
00000008 7c969624 00000000 vrfcore!VfCoreRedirectedStopMessage+0x81
00000000 00000009 0861c000 ntdll!RtlpDphReportCorruptedBlock+0x101
04a680ee 01001002 03ce1000 ntdll!RtlpDphTrimDelayedFreeQueue+0x84
03ce1000 01001002 04a680ee ntdll!RtlpDphNormalHeapFree+0xc0
03ce0000 01001002 137a0040 ntdll!RtlpDebugPageHeapFree+0x79
03ce0000 01001002 137a0040 ntdll!RtlDebugFreeHeap+0x2c
03ce0000 01001002 137a0040 ntdll!RtlFreeHeapSlowly+0x37
03ce0000 00000000 137a0040 ntdll!RtlFreeHeap+0xf9
137a0040 137a0040 030dfe61 msvcrt!free+0xc3

Initially, I focused on the call to free(), assuming that the memory I was trying to free was the culprit. That may still be the case, but I'm no longer convinced. Watching 0x137a0040 as I step through the delete call, the memory seems to be freed properly by RtlpDphNormalHeapFree(). I'm surmising that it is freed properly because the memory from 0x137a0040 up to its upper bound, some 76 MB later, consists solely of f0, the pattern used for freed memory.

So my attention turns to the call made just before RtlpDphReportCorruptedBlock(), namely RtlpDphTrimDelayedFreeQueue(). My guess (I can't find the declarations of these functions anywhere) is that the arguments passed to RtlpDphReportCorruptedBlock() identify the corrupt block. Examining that block shows the following:

0861c000 f0 f0 f0 f0 4f f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 f0 ....O..............

Why is the 5th byte 4f when all the others are f0 (already freed)? What does RtlpDphTrimDelayedFreeQueue() do? Is the problem (if this is the problem) that this function is trying to free memory that has obviously already been freed, or does it expect the memory to be free already and lose the plot when it encounters that 5th byte?

(The 5th byte is the only odd one out; everything from 0x0861c000 to 0x0861c43c is f0.)

Unfortunately, while I can reproduce the heap corruption 100% of the time, the address of the corrupted block seems to change every time I try to set a data breakpoint on it.

I'm running on Windows XP SP3, and the application is written in VC++ 6.

Any ideas?


ANSWERS:


This suggests that the block was modified after you freed it, perhaps by a different thread, or because something still holds a pointer to it. (When you free a block, the debug heap fills it with F0, holds on to it for a while, then checks that it is still all F0. It isn't, so it must have been modified after the free.)

If the corruption is always at the same offset into the block, you could set a data breakpoint on that location (the block's address plus the offset) at the point where the block is passed to free().
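To make the fill, hold and check behaviour described in this answer concrete, here is a toy model in modern C++ (C++11, so it will not build with VC6 as-is). It is not how ntdll actually implements RtlpDphTrimDelayedFreeQueue(); `DelayedFreeQueue`, `kFreedFill` and the queue capacity are hypothetical names chosen for illustration. Freed blocks are filled with 0xF0, kept in a FIFO, and only checked for the fill pattern when a later free evicts them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <deque>

// Toy model of a delayed free queue; an illustration, not ntdll's RtlpDph* code.
const unsigned char kFreedFill = 0xF0;  // pattern seen in the freed-block dump

class DelayedFreeQueue {
public:
    explicit DelayedFreeQueue(std::size_t capacity) : capacity_(capacity) {}
    DelayedFreeQueue(const DelayedFreeQueue&) = delete;
    DelayedFreeQueue& operator=(const DelayedFreeQueue&) = delete;
    ~DelayedFreeQueue() {
        while (!queue_.empty()) release_oldest();
    }

    // "Frees" a block: fill it, hold on to it, then trim the queue. Returns the
    // offset of the first modified byte in a block evicted by the trim, or -1.
    long free_block(unsigned char* data, std::size_t size) {
        std::memset(data, kFreedFill, size);
        queue_.push_back(Entry{data, size});
        long first_bad = -1;
        while (queue_.size() > capacity_) {
            long bad = release_oldest();
            if (first_bad < 0) first_bad = bad;
        }
        return first_bad;
    }

private:
    struct Entry {
        unsigned char* data;
        std::size_t size;
    };

    // Verifies the oldest block still holds the fill pattern, then really frees it.
    long release_oldest() {
        assert(!queue_.empty());
        Entry e = queue_.front();
        queue_.pop_front();
        long bad = -1;
        for (std::size_t i = 0; i < e.size; ++i) {
            if (e.data[i] != kFreedFill) {
                bad = static_cast<long>(i);
                break;
            }
        }
        std::free(e.data);
        return bad;
    }

    std::size_t capacity_;
    std::deque<Entry> queue_;
};
```

With a capacity of one, writing 0x4F at offset 4 of a queued block is only reported (as offset 4) when the next free pushes that block out of the queue. That mirrors how the report in the question surfaced inside an unrelated free() call: the check happens long after the stray write.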


C or C++?

If it's C++, you could override new and delete and track it down yourself. Never actually deallocate memory; put it in your own "bank" instead. Allocate every block with poison fields before and after it, fill the memory with poison while it sits in the bank, and check that poison all the time.
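A minimal sketch of that bank-and-poison approach, assuming a single-threaded program and modern C++ (C++11; it will not compile under VC6 as-is). Everything in namespace `dbgheap`, the 0xAB guard pattern and the 16-byte guard size are hypothetical choices; the only standard hook used is replacing the global operator new and operator delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical debug heap (not a Microsoft or VC6 API). Single-threaded sketch:
// add a lock around allocate()/release() before using it with several threads.
namespace dbgheap {

const unsigned char kGuard = 0xAB;  // assumed pattern for the guard zones
const unsigned char kFreed = 0xF0;  // poison for deleted ("banked") blocks
const std::size_t kGuardSize = 16;  // guard bytes before and after each block

struct alignas(std::max_align_t) Header {
    Header* next;      // next block in the bank
    std::size_t size;  // size the caller asked for
    bool in_bank;      // true once the block has been deleted
};
static_assert(kGuardSize % alignof(std::max_align_t) == 0,
              "the front guard must preserve malloc's alignment");

Header* bank = nullptr;       // deleted blocks, never handed back to malloc
std::size_t corruptions = 0;  // number of damage reports so far

unsigned char* user(Header* h) {
    return reinterpret_cast<unsigned char*>(h + 1) + kGuardSize;
}

bool intact(const unsigned char* p, std::size_t n, unsigned char v) {
    for (std::size_t i = 0; i < n; ++i)
        if (p[i] != v) return false;
    return true;
}

// Reports damage, then restores the pattern so each stray write is reported once.
// Put a breakpoint (or std::abort()) here to stop at the first detection.
void report(Header* h, const char* what, unsigned char* p, std::size_t n,
            unsigned char v) {
    ++corruptions;
    std::fprintf(stderr, "dbgheap: %s, block %p, size %zu\n", what,
                 static_cast<void*>(user(h)), h->size);
    std::memset(p, v, n);
}

void check(Header* h) {
    unsigned char* u = user(h);
    if (!intact(u - kGuardSize, kGuardSize, kGuard))
        report(h, "front guard overwritten", u - kGuardSize, kGuardSize, kGuard);
    if (!intact(u + h->size, kGuardSize, kGuard))
        report(h, "back guard overwritten", u + h->size, kGuardSize, kGuard);
    if (h->in_bank && !intact(u, h->size, kFreed))
        report(h, "write after delete", u, h->size, kFreed);
}

// Re-checks every banked block; called on each new and delete.
void check_bank() {
    for (Header* h = bank; h != nullptr; h = h->next) check(h);
}

void* allocate(std::size_t size) {
    check_bank();
    const std::size_t overhead = sizeof(Header) + 2 * kGuardSize;
    if (size > SIZE_MAX - overhead) throw std::bad_alloc();
    Header* h = static_cast<Header*>(std::malloc(overhead + size));
    if (h == nullptr) throw std::bad_alloc();
    h->next = nullptr;
    h->size = size;
    h->in_bank = false;
    std::memset(user(h) - kGuardSize, kGuard, kGuardSize);
    std::memset(user(h) + size, kGuard, kGuardSize);
    return user(h);
}

void release(void* p) {
    if (p == nullptr) return;
    Header* h = reinterpret_cast<Header*>(static_cast<unsigned char*>(p) - kGuardSize) - 1;
    assert(!h->in_bank && "double delete");
    check(h);  // catches overruns/underruns made while the block was live
    std::memset(p, kFreed, h->size);
    h->in_bank = true;
    h->next = bank;  // bank the block instead of calling std::free
    bank = h;
    check_bank();
}

}  // namespace dbgheap

// Route every global new/delete through the debug heap.
void* operator new(std::size_t size) { return dbgheap::allocate(size); }
void operator delete(void* p) noexcept { dbgheap::release(p); }
void operator delete(void* p, std::size_t) noexcept { dbgheap::release(p); }
```

Because deleted blocks are never released and the whole bank is rescanned on every call, memory use only grows and allocation gets slower over time, so this only suits a debugging build. A write after delete is reported at the next new or delete rather than at the moment it happens, but the report names the damaged block while the program is still running.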

If it's C, you could do something similar with #define malloc. I would also check whether VC6 lets you install your own handlers in place of malloc and free.
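A rough sketch of the #define malloc idea, written as C++ to match the other examples (the same pattern works in C). `dbg_malloc`, `dbg_free`, `AllocTag`, `g_last_report` and the 0xFD tail byte are hypothetical names and values. Each block records the file and line that allocated it plus a few tail bytes, so an overrun detected at free() time can name where the block came from, which helps when the block's address changes on every run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical bookkeeping stored in front of each user block.
struct alignas(std::max_align_t) AllocTag {
    const char* file;  // where the block was allocated
    int line;
    std::size_t size;  // size the caller asked for
};

const unsigned char kTailByte = 0xFD;  // assumed "no man's land" pattern
const std::size_t kTailSize = 4;       // tail bytes after each block

char g_last_report[256] = "";  // most recent complaint, kept for inspection

void* dbg_malloc(std::size_t size, const char* file, int line) {
    void* raw = std::malloc(sizeof(AllocTag) + size + kTailSize);
    if (raw == nullptr) return nullptr;
    AllocTag* tag = static_cast<AllocTag*>(raw);
    tag->file = file;
    tag->line = line;
    tag->size = size;
    unsigned char* user = reinterpret_cast<unsigned char*>(tag + 1);
    std::memset(user + size, kTailByte, kTailSize);
    return user;
}

void dbg_free(void* p, const char* file, int line) {
    if (p == nullptr) return;
    AllocTag* tag = static_cast<AllocTag*>(p) - 1;
    const unsigned char* tail = static_cast<unsigned char*>(p) + tag->size;
    for (std::size_t i = 0; i < kTailSize; ++i) {
        if (tail[i] != kTailByte) {
            std::snprintf(g_last_report, sizeof g_last_report,
                          "overrun: block allocated at %s:%d, freed at %s:%d",
                          tag->file, tag->line, file, line);
            std::fprintf(stderr, "%s\n", g_last_report);
            break;
        }
    }
    std::free(tag);
}

// From here on, malloc/free in this file go through the wrappers above. In a
// real project these lines would live in a header included after all system
// headers, because they also rewrite any later malloc(/free( tokens.
#define malloc(n) dbg_malloc((n), __FILE__, __LINE__)
#define free(p) dbg_free((p), __FILE__, __LINE__)
```

This only catches writes past the end of a live block; catching writes after free, as in the question, needs the poison-and-hold approach as well. Microsoft's debug CRT has a built-in variant of the file/line idea (defining `_CRTDBG_MAP_ALLOC` before including `<crtdbg.h>`), so it may be worth checking whether the VC6 debug libraries you link against support it before rolling your own.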


It looks like you are dealing with heap corruption, and it almost certainly happened some time before the crash with the call stack you posted. The Rtl...() functions aren't causing the corruption; they are just forcing it to be detected.

This MSDN message describes a similar issue to yours and a few ways to debug it. There is also this MS-KB article which describes heap corruption in VC6. Both links (and a few others I've found) mention multi-threading, which is worth checking if your application uses threads.

There is also the PageHeap application from MS, although it may do the same thing as Application Verifier.


