Thursday, February 21, 2008

Shooting the PAGE_GUARD flag with MiniDumpWithIndirectlyReferencedMemory

A colleague of mine (thanks, Ralf, for pointing this out!) told me that passing MiniDumpWithIndirectlyReferencedMemory to MiniDumpWriteDump can cause nasty crashes.
Following the "in 99.9% of the cases it is your own fault" pattern, I suspected the problem was somewhere other than in dbghelp. Ralf kindly provided me with a sample project, which I condensed a bit to fit on a single page:

// GuardPageDump.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <windows.h>
#include <stdio.h>
#include <tchar.h>
#include <process.h>
#include "C:/Program Files/Debugging Tools for Windows/sdk/inc/dbghelp.h"
#pragma comment( lib, "C:/Program Files/Debugging Tools for Windows/sdk/lib/i386/dbghelp.lib")

void BigFunc()
{
    // 20000 bytes of locals: the stack has to grow by several pages here,
    // which the compiler-generated _chkstk call probes page by page.
    char sBigBuffer[20000] = {'\0'};
    printf("BigFunc was called!\nsBigBuffer: %s\n", sBigBuffer);
}

void ProblemFunc(HANDLE hWaitForMe)
{
    char sDummy1[] = {'A', '\0'};

    unsigned long iDummy = reinterpret_cast<unsigned long>(&(sDummy1[0]));

    // let iDummy look like a pointer into the guard page below this thread's stack
    iDummy -= 0x2000;
    printf("Integer value: %#lx\n", iDummy);

    // Make sure the integer value pointing into the guard page area is on this
    // thread's stack while MiniDumpWriteDump is called from another thread:
    // while we wait here, the main thread creates the dump.
    WaitForSingleObject(hWaitForMe, INFINITE);

    printf("Calling BigFunc crashes since the stack can no longer be extended\n");

    BigFunc();

    printf("Integer value: %#lx\n", iDummy);
}

unsigned __stdcall ProblemFuncThread(void* p)
{
    ProblemFunc(reinterpret_cast<HANDLE>(p));
    return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
    printf("This program demonstrates the damaging effect of creating a userdump with MiniDumpWithIndirectlyReferencedMemory\n");

    HANDLE hWaitForMe = CreateEvent(NULL, FALSE, FALSE, NULL);

    printf("Calling ProblemFunc on a different thread\n");
    // _beginthreadex (unlike _beginthread) returns a handle that stays valid
    // until we close it, so it is safe to wait on it below
    HANDLE hThread = reinterpret_cast<HANDLE>(
        _beginthreadex(NULL, 0, ProblemFuncThread, hWaitForMe, 0, NULL));
    if (hThread == NULL)
        return 1;
    // give the thread time to start
    ::Sleep(1000);

    printf("Creating a userdump of type MiniDumpWithIndirectlyReferencedMemory\n");
    printf("Resets guard page flag on the page pointed to by iDummy\n");
    HANDLE hFile = CreateFile(_T("c:\\temp\\test_indirect.dmp"), GENERIC_READ | GENERIC_WRITE,
                              0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile != INVALID_HANDLE_VALUE)
    {
        MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), hFile,
                          MiniDumpWithIndirectlyReferencedMemory, NULL, NULL, NULL);
        CloseHandle(hFile);
    }

    SetEvent(hWaitForMe);

    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    CloseHandle(hWaitForMe);
    return 0;
}


This sample assumes that you have installed Debugging Tools for Windows to the default location (C:/Program Files/Debugging Tools for Windows) and that you also chose to install the SDK.

I'm setting two breakpoints: one on the line that calls MiniDumpWriteDump, and another on the line before the call to BigFunc.

First, let's look at the state before MiniDumpWriteDump is called.

Switching to thread 001, we find the value of iDummy (0x88df38) on the stack of the waiting thread.


Now, before calling MiniDumpWriteDump, let's have a look at the memory layout:

0:001> !vadump

[...]

BaseAddress: 00125000
RegionSize: 00001000
State: 00001000 MEM_COMMIT
Protect: 00000104 PAGE_READWRITE + PAGE_GUARD
Type: 00020000 MEM_PRIVATE

[...]

BaseAddress: 0088d000
RegionSize: 00001000
State: 00001000 MEM_COMMIT
Protect: 00000104 PAGE_READWRITE + PAGE_GUARD
Type: 00020000 MEM_PRIVATE


OK, there is one page with the PAGE_GUARD flag for each thread's stack. Let's compare this with the TEBs:

0:000> ~0s;!teb
[...]
GuardPageDump!wmain+0x91:
00401991 6a00 push 0
TEB at 7ffdf000
[...]
StackLimit: 00126000
[...]
0:000> ~1s;!teb
[...]
ntdll!KiFastSystemCallRet:
771d9a94 c3 ret
TEB at 7ffde000
[...]
StackLimit: 0088e000
[...]

Adding the RegionSize to BaseAddress gives us the StackLimit observed by !teb.
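The relationship is easy to check by hand, but a few lines of portable C++ make the model explicit. This is a sketch only: the addresses are copied from the !vadump and !teb output above, `guardPageBelow` and `slotsPointingInto` are hypothetical helpers, and the scan is a deliberately naive stand-in for whatever heuristics dbghelp really applies. It just shows that a plain integer such as iDummy (0x88df38) looks exactly like a pointer into thread 001's guard page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kPageSize = 0x1000;  // x86 page size

struct Region {
    std::uint32_t base;  // BaseAddress as shown by !vadump
    std::uint32_t size;  // RegionSize as shown by !vadump
    constexpr std::uint32_t end() const { return base + size; }
    constexpr bool contains(std::uint32_t a) const { return a >= base && a < end(); }
};

// Model assumption: the guard page is the single page directly below the
// StackLimit reported by !teb, so BaseAddress + RegionSize == StackLimit.
constexpr Region guardPageBelow(std::uint32_t stackLimit) {
    return Region{stackLimit - kPageSize, kPageSize};
}

// Hypothetical, naive conservative scan: every stack slot is treated as a
// potential pointer, because the raw stack does not say whether a slot holds
// a pointer or just an integer. Returns the slot values that land in r.
std::vector<std::uint32_t> slotsPointingInto(const std::vector<std::uint32_t>& stackSlots,
                                             const Region& r) {
    std::vector<std::uint32_t> hits;
    for (std::uint32_t v : stackSlots)
        if (r.contains(v)) hits.push_back(v);
    return hits;
}
```

With StackLimit 0x0088e000 for thread 001, `guardPageBelow` yields the region at 0x0088d000, and a slot holding 0x0088df38 is reported as a hit even though it was never meant to be a pointer.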

Now comes the clue: iDummy holds the value 0x88df38 (not a pointer), which happens to be an address inside the guard page. Looking at the stack alone, dbghelp cannot tell whether this is a pointer or a plain value, so it follows the indirection. Looking at the memory layout after the call to MiniDumpWriteDump reveals the problem:
0:001> !vadump
[...]
BaseAddress: 00125000
RegionSize: 00001000
State: 00001000 MEM_COMMIT
Protect: 00000104 PAGE_READWRITE + PAGE_GUARD
Type: 00020000 MEM_PRIVATE

[...]

BaseAddress: 0088d000
RegionSize: 00003000
State: 00001000 MEM_COMMIT
Protect: 00000004 PAGE_READWRITE
Type: 00020000 MEM_PRIVATE

The PAGE_GUARD flag for thread 001 is gone! Now it's only a question of time until your application crashes without giving you any clue about the root cause of the problem:

0:001> g
(13e0.ee4): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0088c000 ebx=00bc2e30 ecx=0088b100 edx=771d9a94 esi=00000000 edi=00000000
eip=00401a07 esp=0088ff24 ebp=0088ff2c iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206
GuardPageDump!_chkstk+0x27:
00401a07 8500 test dword ptr [eax],eax ds:0023:0088c000=????????

0:001> g
(13e0.ee4): Access violation - code c0000005 (!!! second chance !!!)
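The faulting instruction, `test dword ptr [eax],eax` in `_chkstk`, is a stack probe: BigFunc needs about 20000 bytes of locals, so the stack is touched one page at a time, walking downwards from the current page. The portable model below is a simplification, not the CRT source. `StackState`, `touch` and `probeStack` are hypothetical names, the system's guard page handling is reduced to "commit the page and move the guard one page down", the stack reservation limit is ignored, and ecx is assumed to hold the new top of stack, as in the VC++ `_chkstk` probe loop. Given esp (0x0088ff24) and ecx (0x0088b100) from the register dump, the model runs through cleanly while the guard page is intact and faults at 0x0088c000 (the value in eax) once it is gone.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::uint32_t kPageSize = 0x1000;

// Simplified view of one thread's stack: [committedLow, stack base) is
// committed; if hasGuard is true, the page just below committedLow is the
// guard page. Everything below is only reserved (model assumption).
struct StackState {
    std::uint32_t committedLow;
    bool hasGuard;
};

// Touch one page like a probe does. Returns false on an access violation.
bool touch(StackState& s, std::uint32_t page) {
    if (page >= s.committedLow) return true;  // already committed
    if (s.hasGuard && page == s.committedLow - kPageSize) {
        // Guard page hit: the page becomes ordinary committed memory and the
        // guard moves one page down, so the stack grows transparently.
        s.committedLow = page;
        return true;
    }
    return false;  // reserved-only page without a guard: access violation
}

// Probe loop modeled on _chkstk: start at the page containing esp and step
// down one page at a time until the new top of stack is covered.
// Returns the first faulting page, or std::nullopt if every probe succeeds.
std::optional<std::uint32_t> probeStack(StackState s, std::uint32_t esp, std::uint32_t newTos) {
    std::uint32_t page = esp & ~(kPageSize - 1);
    while (newTos < page) {
        page -= kPageSize;
        if (!touch(s, page)) return page;
    }
    return std::nullopt;
}
```

The "before" state uses StackLimit 0x0088e000 with the guard page still in place; the "after" state uses the committed region starting at 0x0088d000 without a guard, as shown by the second !vadump.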


The next thing I will do is tell Microsoft about this. Meanwhile, I'll try to restore the PAGE_GUARD flag as a workaround using VirtualProtect, but I'm not sure whether this is really a good idea.
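A first step for such a workaround is deciding which page should get the guard back. The sketch below works on a VirtualQuery-style list of the regions of one thread's stack: `RegionInfo` is a hypothetical subset of MEMORY_BASIC_INFORMATION, and the constants are the winnt.h values that also appear in the !vadump output. If no committed page still carries PAGE_GUARD, it picks the lowest committed page. On Windows, the next step would be a VirtualProtect call with PAGE_READWRITE | PAGE_GUARD on that page; that call is not shown, and whether it keeps the TEB's StackLimit and the system's stack-growth bookkeeping consistent is untested.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Values as defined in winnt.h (compare "00000104 PAGE_READWRITE + PAGE_GUARD").
constexpr std::uint32_t kPageReadWrite = 0x04;
constexpr std::uint32_t kPageGuard = 0x100;
constexpr std::uint32_t kMemCommit = 0x1000;
constexpr std::uint32_t kMemReserve = 0x2000;

// Hypothetical subset of MEMORY_BASIC_INFORMATION for one stack region.
struct RegionInfo {
    std::uint32_t base;
    std::uint32_t size;
    std::uint32_t state;    // kMemCommit or kMemReserve
    std::uint32_t protect;  // e.g. kPageReadWrite | kPageGuard
};

// Returns the base of the page that would need PAGE_GUARD restored, or
// std::nullopt if a committed page of this stack is still guarded (or if
// nothing is committed at all).
std::optional<std::uint32_t> pageToReguard(const std::vector<RegionInfo>& stack) {
    std::optional<std::uint32_t> lowestCommitted;
    for (const RegionInfo& r : stack) {
        if (r.state != kMemCommit) continue;              // skip reserved-only parts
        if (r.protect & kPageGuard) return std::nullopt;  // guard still intact
        if (!lowestCommitted || r.base < *lowestCommitted) lowestCommitted = r.base;
    }
    return lowestCommitted;
}
```

For thread 001 after the dump (one committed, unguarded region of 0x3000 bytes at 0x0088d000), this model selects 0x0088d000, the page that carried the guard before the dump.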

BTW: I've tested this with the latest version of dbghelp.dll I know of. This is what I reported to Microsoft:
A colleague of mine recently investigated a crash that could be traced back to the use of MiniDumpWriteDump with the MiniDumpWithIndirectlyReferencedMemory flag. The problem is that MiniDumpWriteDump clears the PAGE_GUARD flag if a value or pointer referring to a guard page is on a stack other than that of the thread calling MiniDumpWriteDump.
As a result, the process crashes later when that stack needs to be extended.
I've described the problem in detail here:
http://voneinem-windbg.blogspot.com/2008/02/shooting-pageguard-flag-with.html

It would be nice to see a fix for this, because MiniDumpWithIndirectlyReferencedMemory is really useful.
I didn't investigate the other flags of the MINIDUMP_TYPE enumeration, but MiniDumpNormal and MiniDumpWithFullMemory seem to work without problems.

I've been testing this with the latest public version of dbghelp.dll
0:001> lmvm dbghelp
start end module name
68af0000 68c05000 dbghelp (deferred)
Image path: C:\Users\voneinem\Documents\Visual Studio 2008\Projects\GuardPageDump\Release\dbghelp.dll
Image name: dbghelp.dll
Timestamp: Thu Sep 27 23:27:05 2007 (46FC2029)
CheckSum: 0010087A
ImageSize: 00115000
File version: 6.8.4.0
Product version: 6.8.4.0

3 comments:

Volker von Einem said...

Just got a reply from Microsoft:

Basically MiniDumpWriteDump() calls ReadProcessMemory() to read and dump memory blocks. In your scenario, this is in-proc ReadProcessMemory() call (that is, the process reads its own process memory). Any in-proc attempt to read from a guard page will reset the guard page. That’s one of the reasons why we are explaining folks that using a random pointer, even under a try/except, is dangerous.

[Comment] There is no random pointer in the supplied demo; it is simply a ulong.

Bottom line, it is not recommended to call MiniDumpWriteDump() in-proc. In addition to the problem above, potentially it would also cause process deadlock and other weird badness.

[Comment] I have to think about this. We have been writing minidumps from within the process for a long time, in different projects, with great success. The described case is the only bad observation we have made so far...

Volker von Einem said...

I asked back about the problems that might arise when calling MiniDumpWriteDump in-proc, and about best practices for gathering production dumps.
Here's the answer I got:

[quote]
When you call MiniDumpWriteDump() with MiniDumpWithIndirectlyReferencedMemory flag, it will scan through thread stack and treat entries as potential pointers as it does not know whether the entries are ulong local variables, function parameters, stack/heap pointers, etc. Please consider the case that you only have running process without symbols nor sources, you could not say that the entry here is simply a ulong local variable.

Generating minidump in-proc has issues, especially in multi-thread process cases. Take deadlock as example, a lot of the system calls made by MiniDumpWriteDump() would request per-process system resources, and you never know whether those resources are free or being held by other threads. If you suspend all thread execution except the one that calls MiniDumpWriteDump(), potentially you would hit a deadlock; if you let all threads executing while you take the minidump, the information in the dump could be inconsistent (which makes the generated dump useless).

There are several approaches to diagnose application issues. You could instrument applications using ETW (http://msdn.microsoft.com/msdnmag/issues/07/04/ETW/default.aspx) then use the trace files as the application footprint (the trace buffer could be in kernel, so it will persist even if the app itself crashes); you could set up a postmortem debugger or WER (Windows Error Reporting in Vista) to catch crashes/AVs and generate needed minidumps. Also you could consider time travel debugging (http://research.microsoft.com/manuvir/papers/instruction_level_tracing_VEE06.pdf).