fix: correct and enrich 229 ISIN codes in equities database #126

Merged
JerBouma merged 1 commit into JerBouma:main from AlfaStake:fix/equities-isin-corrections
Mar 5, 2026
AlfaStake (Contributor) commented Mar 4, 2026

Summary

Cross-referenced ISIN codes against multiple authoritative sources (ESMA regulatory data, Borsa Italiana listings, and top-rated Kaggle datasets that were double-checked) to correct wrong ISINs and fill in missing ones.

  • 82 ISIN corrections (replaced incorrect/obsolete codes)
  • 147 ISIN additions (filled previously empty fields)
  • Focused on MIL (Milan) exchange equities verified against Borsa Italiana official data
  • Fixed 6 suspicious ISINs with country-code mismatches (e.g. non-IT ISINs for Italian companies)
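The country-code check in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the script used for this PR: the column names (`symbol`, `exchange`, `isin`), the `EXPECTED_PREFIX` mapping, and the sample rows are all hypothetical, and the ISINs in the sample are synthetic placeholders. A mismatch is only a flag for manual review, since a company listed in Milan can be incorporated abroad and legitimately carry a non-IT ISIN.

```python
import csv
import io

# Hypothetical mapping from exchange code to the ISIN country prefix
# one would usually expect for equities listed there.
EXPECTED_PREFIX = {"MIL": "IT"}


def flag_country_mismatches(rows):
    """Return rows whose ISIN prefix differs from the prefix expected for the exchange."""
    flagged = []
    for row in rows:
        isin = (row.get("isin") or "").strip().upper()
        expected = EXPECTED_PREFIX.get(row.get("exchange", ""))
        # Skip empty ISINs and exchanges without a configured prefix.
        if isin and expected and isin[:2] != expected:
            flagged.append(row)
    return flagged


# Synthetic sample rows (placeholder symbols and ISINs, not real data).
SAMPLE = """symbol,exchange,isin
AAA.MI,MIL,IT0000000001
BBB.MI,MIL,US0000000002
CCC.MI,MIL,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
suspicious = flag_country_mismatches(rows)
print([row["symbol"] for row in suspicious])
```

Rows flagged this way still need to be checked against an official listing before any ISIN is changed.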

PS

The situation with the equities DB is rather complex. I have reason to suspect that many ISINs are incorrect because they were extracted with overly simplistic or imprecise regex patterns, and/or because they were not properly joined or parsed on the "Symbol".
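To illustrate why a shape-only regex is not enough, here is a small sketch that pairs the usual ISIN pattern (two letters, nine alphanumerics, one digit) with the standard check-digit test (letters mapped to 10..35, then a Luhn checksum). The names `ISIN_PATTERN` and `isin_check_digit_ok` are made up for this example, and nothing here is claimed to be how the database was originally built.

```python
import re

# Shape of an ISIN: 2-letter country code, 9 alphanumerics, 1 check digit.
ISIN_PATTERN = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")


def isin_check_digit_ok(isin):
    """Return True if the string has ISIN shape and a valid check digit."""
    if not ISIN_PATTERN.fullmatch(isin):
        return False
    # Map each character to its base-36 value (A=10 ... Z=35) and concatenate.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    # Luhn: starting from the rightmost digit, double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# "EARNINGS2024" has the right shape, so the regex alone would accept it,
# but it fails the checksum.
for candidate in ("US0378331005", "US0378331006", "EARNINGS2024"):
    print(candidate, bool(ISIN_PATTERN.fullmatch(candidate)), isin_check_digit_ok(candidate))
```

A checksum pass only shows that a string is a well-formed ISIN. It does not show that the ISIN belongs to the row's "Symbol", so the join problem still needs a comparison against an external source.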

However, this is everything I was able to do using authoritative sources related to Borsa Italiana.

I believe the entire equities.csv database needs to be reviewed and thoroughly restructured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
JerBouma merged commit c60d504 into JerBouma:main on Mar 5, 2026

JerBouma (Owner) commented Mar 5, 2026

Thanks for this. I agree, but it simply requires more community involvement, especially given that this type of data is quite expensive to acquire elsewhere. For example, a strong up-to-date source is Bloomberg, which comes in at $2,500 a month.
