Add a new `infer_cardinality` method to the `MappingSetDataFrame` to
fill the `mapping_cardinality` slot with computed cardinality values.
The approach used here is more or less a direct Python translation of my
existing implementation in SSSOM-Java.
The gist of it is that we iterate over the entire set of records a first
time to populate two hash tables: one that associates a subject to all
the different objects it is mapped to, and one that associates an object
to all the different subjects it is mapped to. Then we can iterate over
the records a second time, and for every record we can immediately get
(1) the number of different objects mapped to the same subject and (2)
the number of different subjects mapped to the same object; the
combination of those two values gives us the cardinality we are looking
for.
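
As a rough illustration of the two passes described above (not the actual code of this commit), here is a minimal sketch that ignores scope and works on plain dictionaries rather than on a `MappingSetDataFrame`. The function name `infer_cardinality_sketch` is hypothetical, and the way the two counts are combined into `1:1`, `1:n`, `n:1` or `n:n` (left side for subjects, right side for objects) is an assumption about the convention.

```python
from collections import defaultdict


def infer_cardinality_sketch(records):
    """Return one cardinality string per record (hypothetical helper).

    `records` is a list of dicts with "subject_id" and "object_id" keys.
    """
    objects_by_subject = defaultdict(set)
    subjects_by_object = defaultdict(set)

    # First pass: fill the two hash tables.
    for rec in records:
        objects_by_subject[rec["subject_id"]].add(rec["object_id"])
        subjects_by_object[rec["object_id"]].add(rec["subject_id"])

    # Second pass: look up both counts and combine them.
    cardinalities = []
    for rec in records:
        n_objects = len(objects_by_subject[rec["subject_id"]])
        n_subjects = len(subjects_by_object[rec["object_id"]])
        # Assumed convention: "<subject side>:<object side>".
        left = "1" if n_subjects == 1 else "n"
        right = "1" if n_objects == 1 else "n"
        cardinalities.append(f"{left}:{right}")
    return cardinalities
```

Using sets in the tables means that duplicate records do not inflate the counts, and both passes are linear in the number of records.
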
To deal with the concept of "scope", the "subjects" and "objects" that
we use to fill the hash tables are built not only from the "subject_id"
slot or the "object_id" slot, but also from all the slots that define
the scope. For example, if the scope is `["predicate_id"]`, then for the
following record:
    subject_id  predicate_id     object_id
    DO:1234     skos:exactMatch  HP:5678
the "subject" string will contain both `DO:1234` and `skos:exactMatch`,
and the "object" string will contain both `HP:5678` and
`skos:exactMatch`.
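
The scoped keys described above could be built along these lines. This is only a sketch: the helper name `scoped_keys`, the tab separator, and the use of an empty string for a missing scope slot are all assumptions made for illustration, not details taken from the actual implementation.

```python
def scoped_keys(record, scope):
    """Build the "subject" and "object" strings for one record.

    Hypothetical helper: `record` is a dict of slot values and `scope`
    is a list of slot names, e.g. ["predicate_id"].
    """
    # Values of the scope slots; a missing slot becomes "" (assumption).
    scope_values = [str(record.get(slot, "")) for slot in scope]
    # Tab is an arbitrary separator chosen for this sketch.
    subject_key = "\t".join([record["subject_id"], *scope_values])
    object_key = "\t".join([record["object_id"], *scope_values])
    return subject_key, object_key


record = {
    "subject_id": "DO:1234",
    "predicate_id": "skos:exactMatch",
    "object_id": "HP:5678",
}
subject_key, object_key = scoped_keys(record, ["predicate_id"])
```

With such keys, two records that share a `subject_id` but differ in `predicate_id` land in different hash table entries, so they do not count towards each other's cardinality.
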