ECOD and COPOD are the same algorithm. COPOD should be deprecated in favour of ECOD #655

Hi,

Thank you for creating a library with such a wide variety of outlier detection algorithms; it's been very helpful in my work! However, the presence of COPOD is misleading to the average user. COPOD is the same algorithm as ECOD, yet its use of copulas suggests that it accounts for the dependency between dimensions. I'm not knowledgeable about copulas, so I hesitate to claim where an error or incorrect assumption was made, but there is some discussion of it in #548. Also, this issue is separate from the incorrect implementation of both COPOD and ECOD (see #453 and #493).

Here I'll try to show that the algorithms are the same with reference to the papers ([COPOD](https://github.com/yzhao062/pyod?tab=readme-ov-file#li2020copod), [ECOD](https://github.com/yzhao062/pyod?tab=readme-ov-file#li2021ecod)). See **Algorithm 1** of both papers:

<div>
  <img align="top" src="https://github.com/user-attachments/assets/78245a45-b48c-49ad-8a58-06a86ae92440" alt="COPOD" width="400"/>
  <img align="top" src="https://github.com/user-attachments/assets/15df7604-f90f-4c2a-a6f4-72cb621b3139" alt="ECOD" width="400"/>
</div>

At a glance it is easy to see that they are similar, but to be thorough I'll go through each section.
The input and output formats are the same (see COPOD section II.D.). 
They both have two consecutive loops, the first one looping through columns (_j_ instead of _d_ in ECOD), and the second looping through rows. 
In both first loops, for each column, we calculate the left- and right-tail ECDFs (with some change in notation; the right-tail ECDFs are calculated differently but equivalently) and then the skewness (γ instead of _b_ in ECOD) as the third standardized moment.
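To make the "differently but equivalently" claim concrete, here is a minimal NumPy sketch of the first loop, written from my reading of the two papers rather than from PyOD's code. It assumes COPOD's right tail is the left-tail ECDF of -X evaluated at -x, and ECOD's right tail is the fraction of samples X_k >= x counted directly. The helper names (`left_tail_ecdf`, `right_tail_ecod`, `right_tail_copod`, `skewness`, `first_loop`) are hypothetical, and the biased (population) skewness estimator is my own choice.

```python
import numpy as np


def left_tail_ecdf(col):
    """Left tail: (1/n) * #{k : X_k <= x_i}, evaluated at every sample x_i of the column."""
    s = np.sort(col)
    return np.searchsorted(s, col, side="right") / len(col)


def right_tail_ecod(col):
    """ECOD-style right tail (assumed): (1/n) * #{k : X_k >= x_i}, counted directly."""
    s = np.sort(col)
    return (len(col) - np.searchsorted(s, col, side="left")) / len(col)


def right_tail_copod(col):
    """COPOD-style right tail (assumed): the left-tail ECDF of -X, evaluated at -x_i."""
    return left_tail_ecdf(-col)


def skewness(col):
    """Third standardized moment (biased estimator); returns 0 for a constant column."""
    centered = col - col.mean()
    std = centered.std()
    if std == 0:
        return 0.0
    return float(np.mean(centered ** 3) / std ** 3)


def first_loop(X):
    """Per-column quantities computed in the first loop of both Algorithm 1s (hypothetical helper)."""
    n, d = X.shape
    U = np.empty((n, d))  # left-tail probabilities
    V = np.empty((n, d))  # right-tail probabilities
    gamma = np.empty(d)   # per-column skewness
    for j in range(d):
        col = X[:, j]
        U[:, j] = left_tail_ecdf(col)
        V[:, j] = right_tail_ecod(col)
        gamma[j] = skewness(col)
    return U, V, gamma
```

Since #{k : -X_k <= -x} and #{k : X_k >= x} count exactly the same samples (ties included), `right_tail_copod` and `right_tail_ecod` return identical arrays for any column.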
The second loops are rearranged a bit. COPOD's lines 7-14 are condensed into ECOD's step 6 by substituting U, V, and W directly into the negative log sums instead of assigning them first. Nonetheless, in both loops, for every row, the left-tail probabilities, the right-tail probabilities, and the skewness-dependent tail probabilities are each aggregated over the columns with a negative log sum. Finally, COPOD's line 15 is the same as ECOD's step 7: for each row, the maximum of these three aggregates is chosen as the final score.
COPOD's return statement implies a different output format from the one defined in section II.D and doesn't make sense given the loop before it, so I'm chalking it up to a mistake. Thus, the algorithms are the same.
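As a sanity check on the second-loop comparison, here is a minimal NumPy sketch of both formulations, again from my reading of the papers and not PyOD's implementation. It takes as input the left-tail probabilities `U`, right-tail probabilities `V` (both n × d, from the first loop) and per-column skewness `gamma`. The function names are hypothetical, and the skewness rule (left tail when γ < 0, right tail otherwise) is my assumption about the sign convention. The COPOD-style version assigns W first and loops over rows; the ECOD-style version puts the skewness indicator inside a single log sum and is vectorized.

```python
import numpy as np


def copod_style_scores(U, V, gamma):
    """Second loop written COPOD-style: assign W first, then take negative log sums per row."""
    n, _ = U.shape
    scores = np.empty(n)
    for i in range(n):
        # Assumed convention: left tail for negatively skewed columns, right tail otherwise.
        W = np.where(gamma < 0, U[i], V[i])
        p_l = -np.sum(np.log(U[i]))
        p_r = -np.sum(np.log(V[i]))
        p_s = -np.sum(np.log(W))
        scores[i] = max(p_l, p_r, p_s)
    return scores


def ecod_style_scores(U, V, gamma):
    """Second loop written ECOD-style: indicators inside one log sum, vectorized over rows."""
    left = gamma < 0  # 1{gamma_j < 0}, broadcast across all rows
    o_left = -np.log(U).sum(axis=1)
    o_right = -np.log(V).sum(axis=1)
    o_auto = -(left * np.log(U) + (~left) * np.log(V)).sum(axis=1)
    # Final score per row: the maximum of the three aggregates.
    return np.vstack([o_left, o_right, o_auto]).max(axis=0)
```

Because every ECDF value is at least 1/n, all logs are finite, and the two functions agree up to floating-point summation order.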

Moreover, COPOD's **Table I** and **Table II** appear to match ECOD's **Table 4** and **Table 5** exactly, apart from different rounding and the additional comparison algorithms in ECOD's. I'll leave you to take a look at these tables yourself.

Assuming we've established that these algorithms are the same, which one should be deprecated? I suggest COPOD, since 1) ECOD is newer, 2) in my opinion ECOD's paper is more refined, with fewer mistakes and an added runtime evaluation section, and 3) the use of copulas is misleading.

Thanks,
Sam
