| api_or_bulk_downloads |
Bulk |
| citation |
Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming in Research Policy. (https://doi.org/10.1016/j.respol.2020.104144) |
| code |
https://github.com/sam-arts/respol_patents_code |
| contributors |
Sam Arts |
Jianan Hou |
Juan Carlos Gomez |
|
| cost |
None |
| datasets_and_publications_using_this_dataset |
Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming in Research Policy. (https://doi.org/10.1016/j.respol.2020.104144) |
| description |
Open-access data files derived from the text of USPTO patent documents: 1) for each US patent, a list of processed, cleaned, and stemmed keywords; 2) for each patent, the 1,000 most similar patents (by cosine similarity) among the entire population of US patents; 3) for each US patent, the average cosine similarity with all patents from the preceding 5 years and the average cosine similarity with all patents from the following 5 years; 4) each keyword (unigram), bigram (sequence of two adjacent keywords), trigram (sequence of three adjacent keywords), and pairwise keyword combination that a US patent introduces for the first time in history, together with the number of the patent introducing it and the total number of patents in the entire population that use it. |
| documentation |
https://zenodo.org/record/3515985 |
| doi |
https://doi.org/10.5281/zenodo.3515985 |
| error_metrics |
Yes |
| last_edit |
Fri, 01 Dec 2023 17:56:16 GMT |
| location |
https://zenodo.org/record/3515985 |
| maintained_by |
Sam Arts |
| open_access |
TRUE |
| related_projects |
|
| related_publications |
Arts S, Hou J, Gomez JC. (2020). Natural language processing to identify the creation and impact of new technologies in patent text: code, data, and new measures. Forthcoming in Research Policy. (https://doi.org/10.1016/j.respol.2020.104144) |
| shortname |
patent_text_new_measures |
| superseded_by |
Fri, 25 Feb 2022 23:35:52 GMT |
| tags |
patent measures |
text |
natural language processing |
novelty |
impact |
USPTO |
technological progress |
|
| terms_of_use |
Open Data Commons Attribution License v1.0 |
| timeframe |
1969-2018 |
| title |
Patent text: code, data, and new measures |
| uuid |
44f33a6f-5099-4481-abed-af9aadf0bd4f |
| versioning |
FALSE |
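
The measures named in the description (cosine similarity between patents, and first introduction of keywords and bigrams) can be illustrated with a minimal Python sketch. This is not the authors' code, which is in the linked repository. It assumes each patent is represented by its list of stemmed keywords, treats keywords as binary (present or absent) when computing cosine similarity, and counts the introducing patent in the usage total. The function names and the toy patent IDs are hypothetical.

```python
from math import sqrt


def cosine_similarity(keywords_a, keywords_b):
    """Cosine similarity of two patents, treating keywords as binary features (assumption)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return 0.0
    # For binary vectors, the dot product is the overlap size
    # and each norm is the square root of the set size.
    return len(a & b) / sqrt(len(a) * len(b))


def bigrams(keywords):
    """Sequences of two adjacent keywords, in document order."""
    return list(zip(keywords, keywords[1:]))


def first_introductions(patents):
    """Track which patent first uses each keyword or bigram.

    patents: list of (patent_id, keyword_list), ordered oldest first.
    Returns (first_seen, usage): the introducing patent per term, and the
    number of patents using the term (introducing patent included).
    """
    first_seen = {}
    usage = {}
    for patent_id, keywords in patents:
        # Count each term at most once per patent.
        for term in set(keywords) | set(bigrams(keywords)):
            first_seen.setdefault(term, patent_id)
            usage[term] = usage.get(term, 0) + 1
    return first_seen, usage


# Toy data; IDs and keywords are made up for illustration.
patents = [
    ("P1", ["laser", "diod", "array"]),
    ("P2", ["laser", "diod", "cool"]),
    ("P3", ["polym", "coat"]),
]
first_seen, usage = first_introductions(patents)
print(cosine_similarity(patents[0][1], patents[1][1]))
print(first_seen[("laser", "diod")], usage[("laser", "diod")])
```

Trigrams and pairwise keyword combinations would follow the same pattern with a different term generator, for example `itertools.combinations(sorted(set(keywords)), 2)` for pairs.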