Compute percent agreement for production dataset #6

@misaugstad

Description

Here is the algorithm that @Manaswis and I came up with, as written by @Manaswis on Slack:

For agreement/consensus, we want to be able to say that an object was marked as a problem by X users out of Y users who audited this street. So in our results, we can say "Amongst streets audited by multiple users, X% of labels have Y% agreement" and then we break it down by label type.

To do this, for a given street, we first have to get both the number of users who said there was a problem (for a cluster) and the number of users who said there was no problem. We decided to do this in the following way: for a label cluster (after doing single- and multi-user clustering), the number of users who have a label in the cluster is the number who said there was a problem. To get the number who said it wasn't a problem, we find which street is closest to that cluster and then count the number of users who placed any label on that street but did not have a label in that cluster. Based on these numbers, we calculate the percent agreement for a problem cluster.
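The per-cluster computation above can be sketched roughly as follows. The data shapes (user IDs keyed to label sets) are hypothetical placeholders for illustration, not the project's actual schema:

```python
# Sketch of percent agreement for one problem cluster, per the algorithm above.
# Inputs are illustrative stand-ins for the real clustering output.

def percent_agreement(cluster_user_ids, street_user_labels):
    """Percent agreement for a single problem cluster.

    cluster_user_ids: set of user IDs with a label in this cluster
        (i.e., users who said "there is a problem").
    street_user_labels: dict mapping user ID -> set of label IDs that the
        user placed anywhere on the street closest to the cluster.
    """
    agree = len(cluster_user_ids)
    # Users who labeled something on this street but have no label in this
    # cluster count as having said "no problem here".
    disagree = sum(1 for uid in street_user_labels if uid not in cluster_user_ids)
    total = agree + disagree
    return agree / total if total else 0.0
```

With this per-cluster number in hand, the "X% of labels have Y% agreement" summary is just a matter of bucketing clusters by their agreement value, broken down by label type.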

That said, I have an idea for how we can simplify the algorithm while still feeling confident in the results. Instead of counting the number of users who placed a label on that street, I think we can just take the number of users who audited that street, but only because we are looking at the set of "good" users (i.e., those with a high labeling frequency). Looking at users who had placed a label on that particular street was mostly meant to find users who actually audited the street; I think this is covered when we restrict to our set of "good" users.
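The proposed simplification only changes the denominator: it becomes the count of "good" users who audited the closest street, whether or not they placed any label there. A minimal sketch, with illustrative names rather than the project's actual code:

```python
# Sketch of the simplified denominator: all "good" users who audited the
# street closest to the cluster, not just those who placed a label on it.

def percent_agreement_simplified(cluster_user_ids, street_auditor_ids):
    """cluster_user_ids: set of user IDs with a label in this cluster.
    street_auditor_ids: set of "good" users who audited the closest street.
    """
    agree = len(cluster_user_ids & street_auditor_ids)
    total = len(street_auditor_ids)
    return agree / total if total else 0.0
```

Note this assumes the user pool has already been filtered to "good" users; without that filter, users who audited the street but labeled carelessly would deflate agreement.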

@Manaswis how do you feel about that modification and its justification?
