Crash finalizing entries in EventManager dictionary #1044

@daniels220

Roughly speaking, I'm seeing a segfault somewhere in the execution of WeakIdentityKeyDictionary(IdentityDictionary)>scanFor:, specifically when finalizing associations from EventManager.actionMap whose key (a random object that is the source of events) has been garbage collected. But there is clearly more to this, because my naive attempt to reproduce outside of the application hasn't worked so far.

The steps that lead to the crash are:

  1. Open a window in my application that listens to events on lots of objects.
  2. Close the window. In this particular case this means that those events are not explicitly released, and the objects become garbage.
  3. Do some other work and either wait for a garbage collection to happen naturally, or trigger one manually.

Here's the crash.dmp output with application details removed. Just once, out of half a dozen occurrences, the crash happened while explicitly removing all events from an object rather than while finalizing after the object itself was gone. In that case the active-process stack looked like:

Process      0x10003eddb20 priority 40
       0x280033938 M WeakIdentityKeyDictionary(IdentityDictionary)>scanFor: 0x10001d52180: a(n) WeakIdentityKeyDictionary
       0x280033978 M WeakIdentityKeyDictionary(HashedCollection)>findElementOrNil: 0x10001d52180: a(n) WeakIdentityKeyDictionary
       0x2800339c8 M WeakIdentityKeyDictionary(Dictionary)>fixCollisionsFrom: 0x10001d52180: a(n) WeakIdentityKeyDictionary
       0x280033a10 M WeakIdentityKeyDictionary(Dictionary)>removeKey:ifAbsent: 0x10001d52180: a(n) WeakIdentityKeyDictionary
       0x280033a50 M EventManager class>releaseActionMapFor: 0x1000060ec68: a(n) EventManager class

but the C stack looked very similar.
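
For reference, the Smalltalk frames above correspond to an explicit release along the following lines. This is only a minimal sketch of the call path, not the application's code: `source` and `target` are hypothetical stand-ins, and it assumes the stock `Object>>releaseActionMap`, which delegates to `EventManager class>>releaseActionMapFor:`.

```smalltalk
| source target |
"Hypothetical stand-ins for an event source and a listener."
source := Object new.
target := Object new.

"Registering an action adds an entry for source to the EventManager's weak-keyed dictionary."
source when: #someEvent send: #yourself to: target.

"Explicit release: ends up in EventManager class>>releaseActionMapFor:, which removes the key via removeKey:ifAbsent:."
source releaseActionMap.
```

Run in isolation this just exercises the same `removeKey:ifAbsent:` path as the stack above; nothing here suggests it is enough to trigger the crash by itself.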

Factors I've thought of that might be relevant:

  • The objects in question are subject to swapping with become:/elementsExchangeIdentityWith:. I can't currently test whether the crash happens without this. If I can get a working repro outside my application that includes the swapping, I can then try removing it to see whether it matters.
  • They also appear as the values of a WeakValueDictionary. Again I can't see if this matters without having a working repro first.

This is what I've tried for the reproduction. As far as I know this is all the application is really doing, stripped down to bare essentials. Of course there's a tremendous amount actually going on in the application, almost all of it irrelevant, so it's very possible I've missed something:

| looping count objectsA objectsB eventTarget proxyCache |
count := 1000.
objectsA := Array new: count.
objectsB := Array new: count.
eventTarget := Object new.
proxyCache := WeakValueDictionary new.

looping := true.
"Simulate another application process doing some unrelated work which generates both garbage in general and event-bearing garbage in particular."
[
[ looping ] whileTrue: [
	(1 to: 1000) collect: [ :i |
		(Object new
			 when: #someOtherEvent send: #onSomeOtherEvent to: eventTarget;
			 yourself) -> (i / 7 * 22 / 4 * 9) ] ] ] forkAt:
	Processor userBackgroundPriority.
[
1 to: 100 do: [ :iteration |
	Transcript crShow: 'Iteration ' , iteration displayString.
	"'Open the window': create some objects and listen for events."
	1 to: count do: [ :i |
		| objectA |
		objectA := Object new.
		objectA when: #someEvent send: #onSomeEvent to: eventTarget.
		objectsA at: i put: objectA.
		proxyCache at: i put: objectA.
		objectsB at: i put: Object new ].
	"This is also likely to happen in the process of opening the window. It may even happen several times."
	objectsA elementsExchangeIdentityWith: objectsB.
	"User is not infinitely fast, so the other process gets a chance to run."
	Processor sleep: 50.
	"'Close the window': discard the only strong ref to the objects."
	objectsA atAllPut: nil.
	objectsB atAllPut: nil.
	Processor sleep: 50.
	"Trigger GC until all the objects are gone, though for some reason one seems to hang around!"
	[
	Smalltalk garbageCollect.
	Processor sleep: 10.
	proxyCache size > 1 ] whileTrue.
	Transcript crShow: 'done' ].
looping := false ] fork

I am running Pharo 11 on the 10.0.5 VM. I cannot update to the latest VM because of #1018.
