added 2024 release in config #1837
Summary of Changes
Hello @niveditasing, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on updating and expanding the data configuration for the CDC 500 Places project. It involves refreshing the data source links for the existing 2024 release and establishing a new configuration for the 2025 release, which incorporates a set of data sources. Additionally, a minor documentation enhancement was made to provide clearer guidance on the location of a critical configuration file.
Code Review
This pull request updates the configuration for the CDC 500 Places data import to include the 2024 release. My review focuses on the correctness of these configuration changes and an update to the documentation.
I've found a few issues:
- The `README.md` update is slightly redundant and has a typo. I've suggested a clearer phrasing.
- In `download_config.json`, a new block for `release_year: 2025` has been added, but it incorrectly uses data URLs from the 2023 release. This is a critical issue that will lead to incorrect data ingestion and should be removed.
- The configuration for the 2024 release is missing the ZipCode (ZCTA) data. I've suggested adding it back with the correct URL to ensure data completeness.
Overall, the intent to update to the 2024 release is good, but the configuration needs to be corrected to avoid data issues.
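A missing geo level like the ZipCode gap noted above could be caught before merge with a small completeness check. The following is only a sketch: it assumes each release block has the `release_year` / `parameter` shape shown in the diff, and that the four `FILE_TYPE` values seen there are the full expected set. The function name `missing_file_types` and the sample block are hypothetical.

```python
# Assumption: these four geo levels, taken from the diff, are the complete set.
EXPECTED_FILE_TYPES = {"County", "City", "CensusTract", "ZipCode"}


def missing_file_types(release_block):
    """Return the expected FILE_TYPE values absent from one release block."""
    present = {entry["FILE_TYPE"] for entry in release_block["parameter"]}
    return sorted(EXPECTED_FILE_TYPES - present)


# Hypothetical 2024 block with no ZipCode entry; URLs are left out on purpose.
release_2024 = {
    "release_year": 2024,
    "parameter": [
        {"FILE_TYPE": "County", "FILE_NAME": "county_raw_data_2024.csv"},
        {"FILE_TYPE": "City", "FILE_NAME": "city_raw_data_2024.csv"},
        {"FILE_TYPE": "CensusTract", "FILE_NAME": "censustract_raw_data_2024.csv"},
    ],
}

print(missing_file_types(release_2024))  # ['ZipCode']
```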
| { | ||
| "release_year": 2025, | ||
| "parameter": [ | ||
| { | ||
| "URL": "https://data.cdc.gov/api/views/swc5-untb/rows.csv?accessType=DOWNLOAD", | ||
| "FILE_TYPE": "County", | ||
| "FILE_NAME": "county_raw_data_2025.csv" | ||
| }, | ||
| { | ||
| "URL": "https://data.cdc.gov/api/views/eav7-hnsx/rows.csv?accessType=DOWNLOAD", | ||
| "FILE_TYPE": "City", | ||
| "FILE_NAME": "city_raw_data_2025.csv" | ||
| }, | ||
| { | ||
| "URL": "https://data.cdc.gov/api/views/cwsq-ngmh/rows.csv?accessType=DOWNLOAD", | ||
| "FILE_TYPE": "CensusTract", | ||
| "FILE_NAME": "censustract_raw_data_2025.csv" | ||
| }, | ||
| { | ||
| "URL": "https://data.cdc.gov/api/views/qnzd-25i4/rows.csv?accessType=DOWNLOAD", | ||
| "FILE_TYPE": "ZipCode", | ||
| "FILE_NAME": "zipcode_raw_data_2024.csv" | ||
| "FILE_NAME": "zipcode_raw_data_2025.csv" | ||
| } | ||
| ] | ||
| } |
This new configuration block for release_year: 2025 appears to be incorrect. The URLs provided (e.g., swc5-untb, eav7-hnsx) correspond to the CDC PLACES 2023 release data, not 2025. This will cause the script to download and process 2023 data as if it were from 2025, leading to data correctness issues. This block was likely added by mistake and should be removed.
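Whether a dataset ID really belongs to a given release cannot be verified from the config alone, but two cheaper symptoms can be: a URL reused under more than one `release_year`, and a `FILE_NAME` that does not contain its block's year. The sketch below assumes the config is a list of release blocks shaped like the one in the diff. The helper names and the 2023 sample block (including its file name) are hypothetical and only illustrate the reuse described in this comment.

```python
from collections import defaultdict


def find_reused_urls(releases):
    """Map each URL listed under more than one release_year to those years."""
    years_by_url = defaultdict(set)
    for block in releases:
        for entry in block["parameter"]:
            years_by_url[entry["URL"]].add(block["release_year"])
    return {url: sorted(years) for url, years in years_by_url.items() if len(years) > 1}


def find_mismatched_file_names(releases):
    """List (release_year, FILE_NAME) pairs whose name lacks the release year."""
    return [
        (block["release_year"], entry["FILE_NAME"])
        for block in releases
        for entry in block["parameter"]
        if str(block["release_year"]) not in entry["FILE_NAME"]
    ]


COUNTY_URL = "https://data.cdc.gov/api/views/swc5-untb/rows.csv?accessType=DOWNLOAD"
ZIP_URL = "https://data.cdc.gov/api/views/qnzd-25i4/rows.csv?accessType=DOWNLOAD"

# Illustrative data only: the 2023 block is an assumption made for this example.
releases = [
    {"release_year": 2023, "parameter": [
        {"URL": COUNTY_URL, "FILE_TYPE": "County", "FILE_NAME": "county_raw_data_2023.csv"},
    ]},
    {"release_year": 2025, "parameter": [
        {"URL": COUNTY_URL, "FILE_TYPE": "County", "FILE_NAME": "county_raw_data_2025.csv"},
        {"URL": ZIP_URL, "FILE_TYPE": "ZipCode", "FILE_NAME": "zipcode_raw_data_2024.csv"},
    ]},
]

print(find_reused_urls(releases))
print(find_mismatched_file_names(releases))
```

A reused URL is a hint rather than proof of an error, so a hit from either check would still need a manual look at the CDC data portal.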
| "FILE_NAME": "censustract_raw_data_2024.csv" | ||
| } |
The ZipCode (ZCTA) data for the 2024 release seems to be missing from this configuration. According to the CDC PLACES data portal, the 2024 release includes ZCTA data. I've suggested adding it back with the correct URL for the 2024 release to ensure data completeness.
"FILE_NAME": "censustract_raw_data_2024.csv"
},
{
"URL": "https://data.cdc.gov/api/views/t2d6-nre4/rows.csv?accessType=DOWNLOAD",
"FILE_TYPE": "ZipCode",
"FILE_NAME": "zipcode_raw_data_2024.csv"
}
| For data refresh for CDC500 import we need to manually search in the website for the latest release files across all geo levels and add the required configuration in [Json file](gs://datcom-csv/cdc500_places/download_config.json) present in the GCP Bucket Location. The config file is present locally as well [download_config.json](https://github.com/datacommonsorg/data/blob/master/scripts/us_cdc/500_places/download_config.json) we can use this file as well to generate the output. | |
| NOTE: If any changes made in local config update same changes in config file present in GCP as well vice versa. We should always keep both config file in sync. | ||
| Here is the path for download_config.json in bucket : gs://datcom-csv/cdc500_places/download_config.json |
This line is a bit redundant as the GCS path is already linked in the paragraph above. It also contains a typo (double space after 'for'). If the goal is to make the path easily copy-pastable, consider rephrasing for clarity and formatting it as a code block.
| Here is the path for download_config.json in bucket : gs://datcom-csv/cdc500_places/download_config.json | |
| The GCS path for `download_config.json` is: `gs://datcom-csv/cdc500_places/download_config.json` |
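Since the README asks that the local and bucket copies of `download_config.json` stay in sync, a comparison step could back up that note. This is a minimal sketch that assumes both copies have already been read into strings (the bucket copy could be fetched with `gsutil cat`, for example); `configs_in_sync` is a hypothetical helper, not part of the import scripts.

```python
import json


def configs_in_sync(local_text, bucket_text):
    """True when both JSON documents parse to equal values.

    Whitespace and object key order are ignored; list order still matters.
    """
    return json.loads(local_text) == json.loads(bucket_text)


# Hypothetical contents: same data, different formatting and key order.
local_copy = '[{"release_year": 2024, "parameter": []}]'
bucket_copy = '[ {"parameter": [], "release_year": 2024} ]'
stale_copy = '[{"release_year": 2023, "parameter": []}]'

print(configs_in_sync(local_copy, bucket_copy))  # True
print(configs_in_sync(local_copy, stale_copy))   # False
```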
No description provided.