Commit a309aba

Jill Grant authored
Merge pull request #278361 from fbsolo-ms1/update-open-datasets-files
Freshness update for dataset-catalog.md . . .
2 parents 6aff575 + 052c643 commit a309aba

1 file changed: +17 -16 lines changed

articles/open-datasets/dataset-catalog.md

Lines changed: 17 additions & 16 deletions
@@ -3,11 +3,12 @@ title: Datasets in Azure Open Datasets
 description: Explore the datasets in Azure Open Datasets.
 ms.service: open-datasets
 ms.topic: sample
-ms.date: 04/16/2021
+ms.reviewer: franksolomon
+ms.date: 06/13/2024
 ---
 # Azure Open Datasets
 
-Improve the accuracy of your machine learning models with publicly available datasets. Save time on data discovery and preparation by using curated datasets that are ready to use in machine learning projects.
+Improve the accuracy of your machine learning models with publicly available datasets. To save time on data discovery and preparation, use curated datasets that are ready for machine learning projects.
 
 ## Transportation
 
@@ -23,30 +24,30 @@ Improve the accuracy of your machine learning models with publicly available dat
 | Dataset | Description |
 |--|--|
 | [COVID-19 Data Lake](dataset-covid-19-data-lake.md) | COVID-19 Data Lake collection is a collection of COVID-19 related datasets from various sources, covering testing and patient outcome tracking data, social distancing policy, hospital capacity, mobility, etc. |
-| [COVID-19 Open Research Dataset](dataset-covid-19-open-research.md) | A full-text and metadata dataset of COVID-19 and coronavirus-related scholarly articles optimized for machine readability and made available for use by the global research community. |
-| [Genomics Data Lake](dataset-genomics-data-lake.md) | The Genomics Data Lake provides various public datasets that you can access for free and integrate into your genomics analysis workflows and applications. The datasets include genome sequences, variant info and subject/sample metadata in BAM, FASTA, VCF, CSV file formats. |
-
+| [COVID-19 Open Research Dataset](dataset-covid-19-open-research.md) | A full-text and metadata dataset of COVID-19 and coronavirus-related scholarly articles, optimized for machine readability and made available for use by the global research community. |
+| [Genomics Data Lake](dataset-genomics-data-lake.md) | The Genomics Data Lake provides various public datasets available for free, ready to integrate into your genomics analysis workflows and applications. The datasets include genome sequences, variant info, and subject/sample metadata in BAM, FASTA, VCF, CSV file formats. |
+
 ## Labor and economics
 
 | Dataset | Description |
 |--|--|
-| [US Labor Force Statistics](dataset-us-labor-force.md) | US Labor Force Statistics provides Labor Force Statistics, labor force participation rates, and the civilian noninstitutional population by age, gender, race, and ethnic groups. in the United States. |
+| [US Labor Force Statistics](dataset-us-labor-force.md) | US Labor Force Statistics provides Labor Force Statistics, labor force participation rates, and the civilian noninstitutional population by age, gender, race, and ethnic groups in the United States. |
 | [US National Employment Hours and Earnings](dataset-us-national-employment-earnings.md) | The Current Employment Statistics (CES) program produces detailed industry estimates of nonfarm employment, hours, and earnings of workers on payrolls in the United States. |
 | [US State Employment Hours and Earnings](dataset-us-state-employment-earnings.md) | The Current Employment Statistics (CES) program produces detailed industry estimates of nonfarm employment, hours, and earnings of workers on payrolls in the United States. |
 | [US Local Area Unemployment Statistics](dataset-us-local-unemployment.md) | The US Local Area Unemployment Statistics datasets provides monthly and annual employment, unemployment, and labor force data for Census regions and divisions, States, counties, metropolitan areas, and many cities in the United States. |
-| [US Consumer Price Index](dataset-us-consumer-price-index.md) | The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services. |
-| [US Producer Price Index - Industry](dataset-us-producer-price-index-industry.md) | The Producer Price Index (PPI) is a measure of average change over time in the selling prices received by domestic producers for their output. |
-| [US Producer Price Index - Commodities](dataset-us-producer-price-index-commodities.md) | The Producer Price Index (PPI) is a measure of average change over time in the selling prices received by domestic producers for their commodities. |
+| [US Consumer Price Index](dataset-us-consumer-price-index.md) | The Consumer Price Index (CPI) measures the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services. |
+| [US Producer Price Index - Industry](dataset-us-producer-price-index-industry.md) | The Producer Price Index (PPI) measures the average change, over time, in the selling prices received by domestic producers for their output. |
+| [US Producer Price Index - Commodities](dataset-us-producer-price-index-commodities.md) | The Producer Price Index (PPI) measures the average change, over time, in the selling prices received by domestic producers for their commodities. |
 
 ## Population and safety
 
 | Dataset | Description |
 |--|--|
-| [US Population by County](dataset-us-population-county.md) | US population by gender and race for each US county sourced from 2000 and 2010 Decennial Census. This dataset is sourced from the United States Census Bureau. |
-| [US Population by ZIP Code](dataset-us-population-zip.md) | US population by gender and race for each US ZIP code sourced from 2010 Decennial Census. This dataset is sourced from the United States Census Bureau. |
-| [Boston Safety Data](dataset-boston-safety.md) | Read data about 311 calls reported to the city of Boston. This dataset is stored in Parquet format and is updated daily. |
-| [Chicago Safety Data](dataset-chicago-safety.md) | Read data about 311 calls reported to the city of Chicago. This dataset is stored in Parquet format and is updated daily. |
-| [New York City Safety Data](dataset-new-york-city-safety.md) | This dataset contains all New York City 311 service requests from 2010 to the present. It’s stored in Parquet format and updated daily. |
+| [US Population by County](dataset-us-population-county.md) | US population by gender and race for each US county, sourced from 2000 and 2010 Decennial Census. This dataset is sourced from the United States Census Bureau. |
+| [US Population by ZIP Code](dataset-us-population-zip.md) | US population by gender and race for each US ZIP code, sourced from 2010 Decennial Census. This dataset is sourced from the United States Census Bureau. |
+| [Boston Safety Data](dataset-boston-safety.md) | Read data about 311 calls reported to the city of Boston. This dataset is stored in Parquet format and receives daily updates. |
+| [Chicago Safety Data](dataset-chicago-safety.md) | Read data about 311 calls reported to the city of Chicago. This dataset is stored in Parquet format and receives daily updates. |
+| [New York City Safety Data](dataset-new-york-city-safety.md) | This dataset contains all New York City 311 service requests from 2010 to the present. This dataset is stored in Parquet format and receives daily updates. |
 | [San Francisco Safety Data](dataset-san-francisco-safety.md) | Fire department calls for service and 311 cases in San Francisco. This dataset contains historical records accumulated from 2015 to the present. |
 | [Seattle Safety Data](dataset-seattle-safety.md) | Seattle Fire Department 911 dispatches. This dataset is updated daily, and contains historical records accumulated from 2010 to the present |
 
@@ -55,8 +56,8 @@ Improve the accuracy of your machine learning models with publicly available dat
 | Dataset | Description |
 |--|--|
 | [Diabetes](dataset-diabetes.md) | The Diabetes dataset has 442 samples with 10 features, making it ideal for getting started with machine learning algorithms. |
-| [OJ Sales Simulated Data](dataset-oj-sales-simulated.md) | This dataset is derived from the Dominick’s OJ dataset and includes extra simulated data with the goal of providing a dataset that makes it easy to simultaneously train thousands of models on Azure Machine Learning. |
-| [MNIST database of handwritten digits](dataset-mnist.md) | The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image. |
+| [OJ Sales Simulated Data](dataset-oj-sales-simulated.md) | This dataset is derived from the Dominick’s OJ dataset and includes extra simulated data, with the goal of providing a dataset that makes it easy to simultaneously train thousands of models on Azure Machine Learning. |
+| [MNIST database of handwritten digits](dataset-mnist.md) | The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digits are size-normalized and centered in a fixed-size image. |
 | [Microsoft News recommendation dataset](dataset-microsoft-news.md) | Microsoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It serves as a benchmark dataset for news recommendation, and facilitates research in news recommendation and recommender systems. |
 | [Public holidays](dataset-public-holidays.md) | Worldwide public holiday data sourced from PyPI holidays package and Wikipedia, covering 38 countries or regions from 1970 to 2099. |
 | [Russian open speech to text](dataset-open-speech-text.md) | Russian Open STT is a large-scale open speech to text dataset for the Russian language |
