Discussion:
[Opensg-users] Core's parent links are inconsistent after cloneTree() and subref with multiple aspects
Marcus Lindblom Sonestedt
2017-02-08 16:54:24 UTC
Permalink
Hi all!

We are consistently triggering a probable bug in OpenSG where it complains
about "link inconsistent" after shallow clone and destroying all refs to
the original.

This happens on the following sequence of events (which the attached code
demonstrates)

1. create node with group core
2. apply changelist in another aspect's thread
3. clone node with cloneTree (share core)
4. clear refereces to original
5. apply change list in another aspect's thread again

I've attached the log output (with dumped changelists), if it helps.

The attached code provides a small repro case (testSyncChangeLists) which I
believe is the minimal code to trigger the bug, and does run on it's own
(no dependency except the changelist dump).

Before I dig deeper to the cause in OpenSG, are we doing something wrong
with how we sync changelists/clone objects, or is this a genuine bug?

Cheers,
--
Marcus Lindblom Sonestedt
*Systemarkitekt*
*BIT ADDICT *- Passion för utveckling
+46 (0)706 43 63 28
***@bitaddict.se
www.bitaddict.se
Marcus Lindblom Sonestedt
2017-02-08 17:15:39 UTC
Permalink
Hi again,

Some minor changes to cleans up thread and callback so test doesn't crash
on osgExit().

Then I also get these lines in the output:

FieldConnectorFactoryBase::terminate
WARNING: FieldContainerFactoryBase::terminate: Entry [290] is not NULL
([0000000002B7F060]).
WARNING: [0] [0000000000000000] [] [N/A N/A]
WARNING: [1] [0000000002BAE0B0] [Group] [1 0]
WARNING: FieldContainerFactoryBase::terminate: Entry [291] is not NULL
([0000000002B7F1E0]).
WARNING: [0] [0000000000000000] [] [N/A N/A]
WARNING: [1] [0000000002166CC0] [Node] [1 0]

/Marcus

2017-02-08 17:54 GMT+01:00 Marcus Lindblom Sonestedt <
Post by Marcus Lindblom Sonestedt
Hi all!
We are consistently triggering a probable bug in OpenSG where it complains
about "link inconsistent" after shallow clone and destroying all refs to
the original.
This happens on the following sequence of events (which the attached code
demonstrates)
1. create node with group core
2. apply changelist in another aspect's thread
3. clone node with cloneTree (share core)
4. clear refereces to original
5. apply change list in another aspect's thread again
I've attached the log output (with dumped changelists), if it helps.
The attached code provides a small repro case (testSyncChangeLists) which
I believe is the minimal code to trigger the bug, and does run on it's own
(no dependency except the changelist dump).
Before I dig deeper to the cause in OpenSG, are we doing something wrong
with how we sync changelists/clone objects, or is this a genuine bug?
Cheers,
--
Marcus Lindblom Sonestedt
*Systemarkitekt*
*BIT ADDICT *- Passion för utveckling
+46 (0)706 43 63 28
www.bitaddict.se
--
Med vÀnliga hÀlsningar,
Marcus Lindblom Sonestedt
*Systemarkitekt*
*BIT ADDICT *- Passion för utveckling
+46 (0)706 43 63 28
***@bitaddict.se
www.bitaddict.se
Marcus Lindblom Sonestedt
2017-02-09 08:55:03 UTC
Permalink
Just following up with new findings here.

First, those "entry [xxx] is not null" log messages are not a part of the
problem, as they disappear if I clear the refs in aspect 0 and apply the
changelist to aspect 1 again before exiting the test,

Second, the group's parents-field seems to auto-update via execSync, but
unlinkParent() is called anyway

* Main: linkParent( node A0) via setCore()

* Thread: applyChangeList A0 -> A1 - linkParent never called on group in
A1, happens automagicallly?

* Main: linkParent( clone A0); -> via cloneTree()

* Main: unlinkParent( node A0); -> via node-destructor as refs => 0 from
smartptr

* Thread: applyChangeList A0 -> A1, causes these calls, but no linkparent
for clone A1 to group 1
* unlinkParent( node A1) - KABOOM, group's parent list only has clone A1.
(.. is called via applyAndClear()->subreferencesunrecorded->(node
A1->resolvelinks()->setCore(null))

So, the unlinkParent(node A1) call via applyAndClear is unnecessary as the
node is already gone, and we get the warning.

So, the warning seems pretty harmless as the state is correct before
calling unlinkParent(), but when cloning trees of ~10000 nodes the log
noise is immense, so we have to filter it out.

That is of course doable, but undesireable, as we get the same error when
using the wrong aspect-pointers across threads.

Cheers,
/Marcus
Marcus Lindblom Sonestedt
2017-02-09 12:28:02 UTC
Permalink
Hi,

I managed to fix it for the simple test case by clearing core and children
of nodes that will be destroyed. This is done before applying the
changelist, using the following workaround function,.

We still get the warning in our app at some places so I need to work on a
small repro-case for that.

void clearCoresForNodesAboutToBeDestroyed(OSG::ChangeList* changeList,
bool log)
{
std::unordered_map<OSG::Node*, int> nodeRefs;

// compute final refcount on nodes in changelist
for (const auto e : *changeList)
{
if ((e->uiEntryDesc & (OSG::ContainerChangeEntry::SubReference |
OSG::ContainerChangeEntry::AddReference)) == 0)
continue;

auto fc =
OSG::FieldContainerFactory::the()->getContainer(e->uiContainerId);
if (!fc) {
if (log)
printf("Container id [%i] not created on this aspect yet\n",
e->uiContainerId);
continue;
}

auto node = dynamic_cast<OSG::Node*>(fc);
if (!node)
continue;

auto i = nodeRefs.find(node);
if (i == nodeRefs.end())
i = nodeRefs.insert({ node, node->getRefCount() }).first;

if (e->uiEntryDesc & OSG::ContainerChangeEntry::SubReference)
i->second -= 1;

if (e->uiEntryDesc & OSG::ContainerChangeEntry::AddReference)
i->second += 1;
}

// clear core and chjildren all nodes whose refcount becomes zero
for (const auto nodeRef : nodeRefs)
{
if (nodeRef.second > 0) // won't be deleted here
continue;

assert(nodeRef.second == 0);
auto node = nodeRef.first;

if (log) {
printf("Clearing Core & children on Node %p [%i], which is about to
die.\n",
node, node->getId());
}

node->setCore(nullptr);
node->clearChildren();
}
}


Cheers
/Marcus



2017-02-09 9:55 GMT+01:00 Marcus Lindblom Sonestedt <
Post by Marcus Lindblom Sonestedt
Just following up with new findings here.
First, those "entry [xxx] is not null" log messages are not a part of the
problem, as they disappear if I clear the refs in aspect 0 and apply the
changelist to aspect 1 again before exiting the test,
Second, the group's parents-field seems to auto-update via execSync, but
unlinkParent() is called anyway
* Main: linkParent( node A0) via setCore()
* Thread: applyChangeList A0 -> A1 - linkParent never called on group in
A1, happens automagicallly?
* Main: linkParent( clone A0); -> via cloneTree()
* Main: unlinkParent( node A0); -> via node-destructor as refs => 0 from
smartptr
* Thread: applyChangeList A0 -> A1, causes these calls, but no linkparent
for clone A1 to group 1
* unlinkParent( node A1) - KABOOM, group's parent list only has clone A1.
(.. is called via applyAndClear()->subreferencesunrecorded->(node
A1->resolvelinks()->setCore(null))
So, the unlinkParent(node A1) call via applyAndClear is unnecessary as the
node is already gone, and we get the warning.
So, the warning seems pretty harmless as the state is correct before
calling unlinkParent(), but when cloning trees of ~10000 nodes the log
noise is immense, so we have to filter it out.
That is of course doable, but undesireable, as we get the same error when
using the wrong aspect-pointers across threads.
Cheers,
/Marcus
--
Med vÀnliga hÀlsningar,
Marcus Lindblom Sonestedt
*Systemarkitekt*
*BIT ADDICT *- Passion för utveckling
+46 (0)706 43 63 28
***@bitaddict.se
www.bitaddict.se
Loading...