after around 30 person-days of effort working on improving the
performance and locking characteristics of a particularly horrendous
distributed transaction in our .NET, ServicedComponent-based legacy
architecture, we encounter this microsoft hotfix which resolves the issue instantly.
to be precise, this wasn't actually a performance issue; it was a
process-wide hang which snared any thread that made a ServicedComponent
call after something, somewhere, happened. we're still not sure what
the something is, because the
microsoft hotfix bug description basically says "the servicedcomponent
code we shipped isn't re-entrant. here's an unsupported,
available-by-phone-only, hotfix". a shitty level of support, imho,
given what it cost us to diagnose the problem.
the top portion of the stack trace of the hung threads looks like this:
system.enterpriseservices.thunk.dll!System.EnterpriseServices.Thunk.Proxy.RevokeObject(int cookie) + 0x80 bytes
system.enterpriseservices.dll!System.EnterpriseServices.ServicedComponentProxy.CleanupQueues(bool bGit) + 0x71 bytes
system.enterpriseservices.dll!System.EnterpriseServices.ServicedComponentProxyAttribute.CreateInstance(System.Type serverType) + 0x3b bytes
mscorlib.dll!System.Runtime.Remoting.Activation.ActivationServices.IsCurrentContextOK(System.Type serverType, System.Object[] props, bool bNewObj) + 0x4b bytes
mscorlib.dll!System.Activator.CreateInstance(System.Type type, bool nonPublic) + 0x43 bytes
mscorlib.dll!System.Activator.CreateInstance(System.Type type) + 0x8 bytes
the key warning sign is the RevokeObject method at the top. from the
"CleanupQueues" method (second in the call stack), i'm inferring that
the code does opportunistic clean-up each time ServicedComponentProxyAttribute.CreateInstance() is called. something
has hosed the queue structures, or something referenced by them,
causing RevokeObject to block indefinitely, loop, or otherwise fail to return.
looking at the disassembly for RevokeObject, there's some GIT (global
interface table) calling going on which is opaque to me, given my level
of understanding. i haven't drilled down any further than that.
other people have encountered
the same problem. the cost to developers and clients is significant.
this is not the right way to manage an issue of this sort, MS.
update [later the same day]: a little more research turns up a DisableAsyncFinalization registry key:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\COM3\System.EnterpriseServices]
"DisableAsyncFinalization"=dword:00000001
which is suggested as a possible solution here.
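to confirm whether a given machine actually has that value set, something like the following could be used. it's a minimal sketch, not anything from the hotfix or the kb article: the class and method names (AsyncFinalizationCheck, IsDisabled) are made up for illustration, and only the key path and value name come from the registry snippet above. note that a 32-bit process on 64-bit windows reads HKLM\SOFTWARE through the WOW6432Node redirect, so the result can differ by process bitness.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical helper: reports whether DisableAsyncFinalization is set to 1.
public static class AsyncFinalizationCheck
{
    // Key path and value name are taken from the registry snippet in this post.
    public const string KeyPath = @"SOFTWARE\Microsoft\COM3\System.EnterpriseServices";
    public const string ValueName = "DisableAsyncFinalization";

    public static bool IsDisabled()
    {
        // OpenSubKey returns null when the key does not exist;
        // a using block tolerates a null resource.
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(KeyPath))
        {
            object value = key == null ? null : key.GetValue(ValueName);
            // A REG_DWORD value comes back as a boxed int.
            return value is int && (int)value == 1;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ValueName + " set: " + IsDisabled());
    }
}
```

this only reads the value; it doesn't change anything on the machine.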
there's also an ms kb article titled "FIX: COM+ application that uses the Global Interface Table (GIT) may deadlock". remember that within RevokeObject, there's a GIT-style method call. two excerpts from the article:
- "If you experience this issue, multiple threads in the process show call stacks that involve access to the Global Interface Table (GIT)."
- "When you use COM+ components that are written by using managed code, such as Visual C# or Visual Basic .NET and you do not explicitly call the Dispose method on these objects."
so it's possible that our app is just not being rigorous enough about doing its Dispose()s/using()s. more later.
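for reference, being rigorous about it would look roughly like this. it's a sketch under assumptions: OrderComponent, PlaceOrder and Caller are hypothetical names standing in for our real components, and it needs the .NET Framework with a reference to System.EnterpriseServices.dll (plus the usual COM+ registration) before it would actually run. the only things relied on from the framework are that ServicedComponent implements IDisposable and exposes the static ServicedComponent.DisposeObject method.

```csharp
using System;
using System.EnterpriseServices;

// Hypothetical component standing in for one of the real transactional ones.
[Transaction(TransactionOption.Required)]
public class OrderComponent : ServicedComponent
{
    [AutoComplete]
    public void PlaceOrder(int orderId)
    {
        // transactional work would go here
    }
}

public static class Caller
{
    // Preferred form: the using block calls Dispose() when the scope exits,
    // even if PlaceOrder throws, so the proxy is released deterministically
    // rather than being left for finalization.
    public static void PlaceOrderWithUsing(int orderId)
    {
        using (OrderComponent component = new OrderComponent())
        {
            component.PlaceOrder(orderId);
        }
    }

    // Equivalent form for cases where a using block is awkward,
    // e.g. when the component is created in one place and released in another.
    public static void PlaceOrderWithFinally(int orderId)
    {
        OrderComponent component = new OrderComponent();
        try
        {
            component.PlaceOrder(orderId);
        }
        finally
        {
            ServicedComponent.DisposeObject(component);
        }
    }
}
```

whether this alone avoids the hang is untested here; it just matches the condition the kb article describes (objects whose Dispose method is never explicitly called).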