Open
Labels
enhancement, good first issue, help wanted
Description
A given title (in title_j, title_m) can appear under different forms in the database. This might be due to typos (e.g. Ibm Tchnical Disclosure Bulletin), abbreviations (e.g. Ibm Tdb), parsing errors (e.g. Ibm Tech-Nical Disclosure Bulletin, Ibm Corp), etc.
Example ⬇️
    SELECT
      DISTINCT title_j
    FROM
      `npl-parsing.patcit.beta`
    WHERE
      LOWER(title_j) LIKE "%ibm%"
    ORDER BY
      title_j DESC

| title_j |
|---|
| Ibme Technical Disclosure Bulletin |
| Ibm-Tdb |
| Ibm Tecnical Disclosure Bulletin |
| Ibm Technical Dosclosure Bulletin |
| Ibm Technical Document |
| Ibm Technical Dislosure Bulletin |
| Ibm Technical Disclusure Bulletin |
| Ibm Technical Disclosures Bulletin |
| Ibm Technical Disclosure Bulleting |
| Ibm Technical Disclosure Bulletin; 'Improved First-In First-Out' |
| Ibm Technical Disclosure Bulletin, Ref. No. Xp |
| Ibm Technical Disclosure Bulletin, Nn Corp., Us |
| Ibm Technical Disclosure Bulletin, Ibm Corp. Ny |
| Ibm Technical Disclosure Bulletin, Ibm Corp |
| Ibm Technical Disclosure Bulletin Ibm |
| Ibm Technical Disclosure Bulletin |
| Ibm Technical Disclosure Bullentin |
| Ibm Technical Disclosure Bulle |
| Ibm Technical Disclossure Bulletin |
| Ibm Techn.Discl.Mag |
| Ibm Techn. Discl. Bull |
| Ibm Tech-Nical Disclosure Bulletin, Ibm Corp |
| Ibm Tech Disc Bulletin |
| Ibm Tdb |
| Ibm Tchnical Disclosure Bulletin |
| Ibm Disclosure Bulletin |
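
Several of the rows above are truncations or abbreviations rather than typos (for example Ibm Techn. Discl. Bull or Ibm Tech Disc Bulletin). One possible way to catch them, sketched here in Python as a suggestion rather than a settled approach, is to check whether each token of the variant is a prefix of the token at the same position in a candidate full title. The names `tokens` and `is_abbreviation_of` are hypothetical and not part of the project.

```python
import re


def tokens(title):
    """Split a title into lowercase alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", title.lower())


def is_abbreviation_of(short, full):
    """Return True if `short` has as many tokens as `full` and each token
    of `short` is a prefix of the token at the same position in `full`."""
    short_tokens, full_tokens = tokens(short), tokens(full)
    if not short_tokens or len(short_tokens) != len(full_tokens):
        return False
    return all(f.startswith(s) for s, f in zip(short_tokens, full_tokens))


# Candidate full title, taken from the table above.
FULL = "Ibm Technical Disclosure Bulletin"

for variant in ["Ibm Techn. Discl. Bull", "Ibm Tech Disc Bulletin", "Ibm Tdb"]:
    print(variant, "->", is_abbreviation_of(variant, FULL))
```

Acronyms such as Ibm Tdb are not matched by this check, because "tdb" is not a prefix of "technical"; they would need separate handling, for instance by comparing initials.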
Feature description
Title variables are useful for many use cases. A clean and transparent disambiguation would be a strong plus.
- At this point, I have no particular idea of the most appropriate tools or algorithms for the disambiguation process. Anyone should feel free to contribute.
- Ultimately, we want a correspondence table between a "unique identifier" (e.g. "Ibm Technical Disclosure Bulletin") and all the related variations.
- The output of the disambiguation could be used to propagate ISSN(e)s (see issue #6, "Multiple `title_j` for the same `ISSN`/`ISSNe`").
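
As a starting point for discussion, the correspondence table described above could be sketched in Python with the standard library's `difflib.SequenceMatcher`. This is only a minimal illustration: the function names, the list of canonical titles, and the 0.85 threshold are all assumptions, not choices made by the project.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase the title and replace runs of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def build_correspondence(titles, canonicals, threshold=0.85):
    """Map each raw title to its most similar canonical title,
    or to None when no canonical title reaches the (assumed) threshold."""
    table = {}
    for title in titles:
        best = max(canonicals, key=lambda c: similarity(title, c))
        table[title] = best if similarity(title, best) >= threshold else None
    return table


# Hypothetical list of "unique identifiers"; in practice it would need curating.
canonicals = ["Ibm Technical Disclosure Bulletin"]
variants = [
    "Ibm Tchnical Disclosure Bulletin",
    "Ibm Technical Disclosure Bullentin",
    "Ibm Technical Document",
    "Ibm Tdb",
]
for raw, canonical in build_correspondence(variants, canonicals).items():
    print(f"{raw!r} -> {canonical!r}")
```

With these inputs the two misspellings map to the canonical title, while Ibm Technical Document and Ibm Tdb map to `None`. Character-level similarity handles typos reasonably well but not acronyms or heavy abbreviations, so a real solution would probably combine several signals.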