One of the hardest things to debug is a reference count leak. A COM object's lifetime depends on its reference count (read here for more...). So each client of a COM object must call AddRef on the IUnknown interface before using it and must call Release when done. If any client (and a single object might have many clients) violates this rule, you get into severe trouble.
Scenarios
1.) Number of Release calls = Number of AddRef calls
This is the normal scenario: as soon as no client needs the server object anymore, it is destroyed.
2.) Number of Release calls > Number of AddRef calls
If Release is called one time too often, another client might crash because the server gets destroyed too early. The bad thing here is that you see the crash in one place, but this does not tell you where the root cause is located. All you know is which object's reference count has been corrupted.
3.) Number of AddRef calls > Number of Release calls
If AddRef is called one time too often, the reference count never reaches 0 and hence the server object never gets destroyed. This causes memory leaks and might also cause resource leaks. The effect of this scenario is much less obvious: you might see memory usage increase over time, performance degrade, or resources stay locked when they should have been unlocked again.
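The three scenarios can be reproduced with a toy model. This is only a sketch in plain Python, not real COM: the class ToyComObject and the helper run are made up for illustration, and the "crash" is simulated with an exception where a real COM client would touch freed memory.

```python
class ToyComObject:
    """Toy stand-in for a COM server object (illustration only, not real COM)."""

    def __init__(self):
        self.ref_count = 0
        self.destroyed = False

    def add_ref(self):
        self.ref_count += 1
        return self.ref_count

    def release(self):
        if self.destroyed:
            # In real COM this would be a use-after-free, typically a crash
            # somewhere far away from the actual culprit.
            raise RuntimeError("Release called on a destroyed object")
        self.ref_count -= 1
        if self.ref_count == 0:
            self.destroyed = True
        return self.ref_count


def run(add_refs, releases):
    """Apply the given number of AddRef and Release calls, report the outcome."""
    obj = ToyComObject()
    for _ in range(add_refs):
        obj.add_ref()
    try:
        for _ in range(releases):
            obj.release()
    except RuntimeError:
        return "crash"
    return "destroyed" if obj.destroyed else "leaked"


print(run(2, 2))  # scenario 1: balanced
print(run(2, 3))  # scenario 2: one Release too many
print(run(3, 2))  # scenario 3: one AddRef too many
```

Note that in the model, as in real life, neither failure says anything about which client made the extra call. That is the information the rest of this post is trying to recover.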
Finding the place where the unbalanced AddRef/Release occurred can be like finding a needle in a haystack. I searched the web but didn't find a good tool that really assists with this task. Luckily, Sara Ford described in this post the first step you need to take to get the data necessary to drill down into the problem.
Somehow I didn't manage to set the tracepoints in Visual Studio 2005 (can anybody tell me how to set a breakpoint on a single object's AddRef and Release methods?), so I launched my beloved WinDbg.
First I created a script that emits an XML snippet for each event that alters the ref count (I didn't find a better name, so I called it ToXml.txt and placed it in my scripts folder):
.printf "-->\n<Event><Ref>%d</Ref><![CDATA[",poi(${$arg1})
k100
.printf "]]></Event>\n<!--\n"
Then I placed a breakpoint on the server object's constructor:
bp MyServer!CMyClass::CMyClass
When the breakpoint hit, I stepped out with <Shift>+<F11> into CComCreator::CreateInstance and then stepped over the p->SetVoid(pv); call in this class.
(I think it should be possible to set a breakpoint directly at MyServer!ATL::CComCreator<ATL::CComObject<CMyClass> >::CreateInstance+0xb1, but I didn't try...)
Then I retrieved the address of m_dwRef with:
0:000> ?? &(p->m_dwRef)
long * 0x110d724c
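The next step is to set the data breakpoint on that address:
ba w4 0x110d724c "$$>a<C:/windbg/scripts/ToXml.txt 0x110d724c;gc"
(The argument passed to ToXml.txt must be the address of m_dwRef found above, since the script reads the ref count from it with poi. You might also need to change the path 'C:/windbg/scripts/'.)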
With .logopen we make sure that all events are written directly into a log file:
.logopen c:\temp\Events.xml
Now let the application run with 'g' or <F5> and do whatever creates your ref counting problem.
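When done, break into the debugger and close the log with .logclose.
At this point we are halfway through. The Events.xml we created is not valid XML yet, because everything WinDbg printed between the events still has to be wrapped in comments. You need to add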
<?xml version="1.0" encoding="UTF-8"?>
<Events>
<!--
at the beginning and
--></Events>
at the end.
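Adding the header and footer by hand gets old quickly, so here is a minimal sketch of how the wrapping and a first parse could be automated. It is not part of the tool described below; the function names wrap_log and parse_events are made up, and it assumes the log contains only what ToXml.txt and the debugger printed.

```python
import xml.etree.ElementTree as ET

# Exactly the lines that have to be added at the beginning and at the end.
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<Events>\n<!--\n'
XML_FOOTER = '\n--></Events>\n'


def wrap_log(raw_log):
    """Surround the raw WinDbg log text with the header and footer."""
    return XML_HEADER + raw_log + XML_FOOTER


def parse_events(xml_text):
    """Return one (ref_count, stack_lines) tuple per <Event>, in log order."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    events = []
    for event in root.iter("Event"):
        ref = event.find("Ref")
        ref_count = int(ref.text)
        # The CDATA section with the 'k100' output follows </Ref>,
        # so ElementTree exposes it as the tail of the <Ref> element.
        lines = (ref.tail or "").splitlines()
        stack = [line.strip() for line in lines if line.strip()]
        events.append((ref_count, stack))
    return events
```

Typical use would be parse_events(wrap_log(text)) with text read from Events.xml. One caveat: XML comments must not contain "--", so if the debugger printed such a sequence between two events, the parser will reject the file and that part of the log has to be cleaned up first.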
Now comes the tooling. In my scenario I had around 1400 Events - a little tedious to analyze all by hand.
So I created "Volkers RefCount Buster" which does the following:
1.) After loading the file (enter the path in the first text box and press Start), every event is classified as either an AddRef or a Release.
2.) Then the call stack is taken to group the events:
First action is to exclude events that match the pattern entered in the second text box (exclude pattern):
var includeQuery = from frame in this.StackFrames.Frames
                   where String.IsNullOrEmpty(ExcludePattern) || !excludePattern.IsMatch(frame)
                   select frame;
Then the remaining frames are searched for the selection pattern:
var selectionQuery = from frame in includeQuery
                     let match = selectionPattern.Match(frame)
                     where match.Success
                     select match.Value;
and the top most match is taken:
string sourceGroup = selectionQuery.FirstOrDefault();
Then all events are grouped into the source groups found:
var ResultQuery = from refCountEvent in refCountEvents
                  group refCountEvent by refCountEvent.SourceGroup into g
                  select g;
Then the number of AddRefs and Releases is calculated for each group and accumulated:
foreach (var ResultSet in ResultQuery)
{
    long numOfAddRefs = (from rce in ResultSet.AsEnumerable()
                         where rce.RefCountType == EventType.AddRef
                         select rce).Count();
    long numOfReleases = (from rce in ResultSet.AsEnumerable()
                          where rce.RefCountType == EventType.Release
                          select rce).Count();
    long balance = numOfAddRefs - numOfReleases;
    ...
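For readers who want to experiment without the tool, the two steps above can be condensed into a small Python sketch. It is not the tool's code: the function names are made up, and the classification rule (a count higher than the previous one means AddRef, otherwise Release) is an assumption about how the events can be told apart from the logged ref counts alone.

```python
import re
from collections import defaultdict


def classify(ref_counts, initial=0):
    """Label each event by comparing its count with the previous one (assumption)."""
    kinds, previous = [], initial
    for count in ref_counts:
        kinds.append("AddRef" if count > previous else "Release")
        previous = count
    return kinds


def source_group(frames, exclude_pattern, selection_pattern):
    """Return the topmost selection match among the non-excluded frames, or None."""
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    select = re.compile(selection_pattern)
    for frame in frames:
        if exclude and exclude.search(frame):
            continue  # frame matches the exclude pattern, skip it
        match = select.search(frame)
        if match:
            return match.group(0)
    return None


def balances(events, exclude_pattern, selection_pattern, initial=0):
    """Map each source group to (number of AddRefs - number of Releases).

    `events` is a list of (ref_count, frames) tuples in log order.
    """
    kinds = classify([count for count, _ in events], initial)
    result = defaultdict(int)
    for kind, (_, frames) in zip(kinds, events):
        group = source_group(frames, exclude_pattern, selection_pattern)
        result[group] += 1 if kind == "AddRef" else -1
    return dict(result)
```

A group whose balance is not 0 is a candidate for the culprit. The value passed as initial must be the ref count at the moment the data breakpoint was set, otherwise the first event may be misclassified.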
Now it's up to you to find the Exclude Pattern and Selection Pattern that point you directly to the component or file that is the culprit, that is, the group whose balance is not 0. Then you just need to look at the stacks that belong to the bad guy you found, and you will also be able to see the source line that created the problem.
You can download the sources and binaries here...
Have fun,
Volker