Import : IPEDS_FTE_Enrollment_National Auto Refresh #1796
krishnaswamypradeep marked this conversation as resolved.
Contributor
Can you add the LICENSE text that we usually add for all source files?
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,35 @@ | ||
| import os | ||
|
Contributor
Can you add a top-level description to this file, like the one you already documented in the README?
||
| import pandas as pd | ||
|
Contributor
I don't see any usage of pandas in this file. Can we remove it from the list of imports?
||
|
|
||
| def clean_csv(file_path): | ||
| with open(file_path, 'r') as f: | ||
| lines = f.readlines() | ||
|
|
||
| start_index = -1 | ||
| for i, line in enumerate(lines): | ||
| if line.strip().startswith('Year'): | ||
| start_index = i | ||
| break | ||
|
|
||
| if start_index != -1: | ||
| cleaned_content = lines[start_index:] | ||
| with open(file_path, 'w') as f: | ||
| f.writelines(cleaned_content) | ||
|
Contributor
We generally avoid modifying the original source files so that we can trace any issues back easily. Can we write the content to new files and have the statvar processor work on them instead?
||
| print(f"Cleaned {file_path} successfully, removed {start_index} initial rows.") | ||
|
||
| else: | ||
| print(f"Could not find 'Year' in {file_path}. No changes made.") | ||
|
|
||
| def clean_csv_in_directory(directory): | ||
| if not os.path.isdir(directory): | ||
| print(f"Directory '{directory}' not found.") | ||
| return | ||
|
|
||
| csv_files = [f for f in os.listdir(directory) if f.endswith('.csv')] | ||
|
|
||
| for csv_file in csv_files: | ||
| file_path = os.path.join(directory, csv_file) | ||
| clean_csv(file_path) | ||
|
|
||
| if __name__ == '__main__': | ||
| input_directory = 'input_files' | ||
| clean_csv_in_directory(input_directory) | ||
|
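One way to address the review comments on this file (a top-level description, no unused `pandas` import, and leaving the downloaded files untouched) is sketched below. This is not the PR author's final code: the `cleaned_files` directory name, the `dst_path` and `header_prefix` parameters, and the return values are assumptions made for illustration. Only the header detection (a line starting with `Year`) is taken from the diff.

```python
"""Strip preamble rows from downloaded IPEDS CSV files.

Each source file is copied to a separate output directory with every line
before the header row (the first line starting with 'Year') removed. The
downloaded originals are never modified.
"""
import os


def clean_csv(src_path, dst_path, header_prefix='Year'):
    """Write src_path to dst_path without the rows preceding the header.

    Returns the number of leading rows removed, or -1 if no header row was
    found (in which case nothing is written).
    """
    with open(src_path, 'r') as f:
        lines = f.readlines()

    # Index of the first line that looks like the header row, else -1.
    start_index = next(
        (i for i, line in enumerate(lines)
         if line.strip().startswith(header_prefix)),
        -1,
    )
    if start_index == -1:
        print(f"Could not find '{header_prefix}' in {src_path}. Skipping.")
        return -1

    with open(dst_path, 'w') as f:
        f.writelines(lines[start_index:])
    print(f"Wrote {dst_path}, removed {start_index} initial rows.")
    return start_index


def clean_csv_in_directory(input_dir, output_dir):
    """Clean every .csv in input_dir into output_dir."""
    if not os.path.isdir(input_dir):
        print(f"Directory '{input_dir}' not found.")
        return
    os.makedirs(output_dir, exist_ok=True)
    for name in sorted(os.listdir(input_dir)):
        if name.endswith('.csv'):
            clean_csv(os.path.join(input_dir, name),
                      os.path.join(output_dir, name))


if __name__ == '__main__':
    # 'cleaned_files' is a hypothetical output directory name.
    clean_csv_in_directory('input_files', 'cleaned_files')
```

With this layout the statvar processor would read from the cleaned directory rather than `input_files`; how that path is configured is not shown in this diff.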
Contributor
Similar to the previous comment, please add the license text to the top of this file.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,42 +4,4 @@ SCRIPT_PATH=$(realpath "$(dirname "$0")") | |
|
|
||
| mkdir -p "input_files" | ||
|
|
||
| gsutil -m cp -r gs://unresolved_mcf/IPEDS/Enrollment_FTE_National/input_files/*.csv "$SCRIPT_PATH/input_files" | ||
|
|
||
| python3 <<'END_PYTHON' | ||
| import os | ||
| import pandas as pd | ||
|
|
||
| def clean_csv(file_path): | ||
| with open(file_path, 'r') as f: | ||
| lines = f.readlines() | ||
|
|
||
| start_index = -1 | ||
| for i, line in enumerate(lines): | ||
| if line.strip().startswith('Year'): | ||
| start_index = i | ||
| break | ||
|
|
||
| if start_index != -1: | ||
| cleaned_content = lines[start_index:] | ||
| with open(file_path, 'w') as f: | ||
| f.writelines(cleaned_content) | ||
| print(f"Cleaned {file_path} successfully, removed {start_index} initial rows.") | ||
| else: | ||
| print(f"Could not find 'Year' in {file_path}. No changes made.") | ||
|
|
||
| def clean_csv_in_directory(directory): | ||
| if not os.path.isdir(directory): | ||
| print(f"Directory '{directory}' not found.") | ||
| return | ||
|
|
||
| csv_files = [f for f in os.listdir(directory) if f.endswith('.csv')] | ||
|
|
||
| for csv_file in csv_files: | ||
| file_path = os.path.join(directory, csv_file) | ||
| clean_csv(file_path) | ||
|
|
||
| if __name__ == '__main__': | ||
| input_directory = 'input_files' | ||
| clean_csv_in_directory(input_directory) | ||
| END_PYTHON | ||
| gsutil -m cp -r gs://unresolved_mcf/IPEDS/Enrollment_FTE_National/input_files/*.csv "$SCRIPT_PATH/input_files" | ||
|
Contributor
Please add a newline to the end of this file.
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| observationAbout,observationDate,value,variableMeasured,#input | ||
| country/USA,2024,10895410,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:2:2 | ||
| country/USA,2024,3884457,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:2:3 | ||
| country/USA,2024,1344830,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:2:4 | ||
| country/USA,2023,10568750,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:3:2 | ||
| country/USA,2023,3812142,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:3:3 | ||
| country/USA,2023,1260771,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:3:4 | ||
| country/USA,2022,10591338,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:4:2 | ||
| country/USA,2022,3818557,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:4:3 | ||
| country/USA,2022,1274605,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:4:4 | ||
| country/USA,2021,10985128,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:5:2 | ||
| country/USA,2021,3802117,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:5:3 | ||
| country/USA,2021,1301882,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:5:4 | ||
| country/USA,2020,11366064,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:6:2 | ||
| country/USA,2020,3852214,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:6:3 | ||
| country/USA,2020,1245792,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:6:4 | ||
| country/USA,2019,11420024,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:7:2 | ||
| country/USA,2019,3842713,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:7:3 | ||
| country/USA,2019,1238471,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:7:4 | ||
| country/USA,2018,11470565,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:8:2 | ||
| country/USA,2018,3788721,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:8:3 | ||
| country/USA,2018,1229025,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:8:4 | ||
| country/USA,2017,11429561,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:9:2 | ||
| country/USA,2017,3764543,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:9:3 | ||
| country/USA,2017,1411905,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:9:4 | ||
| country/USA,2016,11441625,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:10:2 | ||
| country/USA,2016,3741014,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:10:3 | ||
| country/USA,2016,1530130,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:10:4 | ||
| country/USA,2015,11490719,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:11:2 | ||
| country/USA,2015,3740928,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:11:3 | ||
| country/USA,2015,1769917,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:11:4 | ||
| country/USA,2014,11573864,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:12:2 | ||
| country/USA,2014,3697921,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:12:3 | ||
| country/USA,2014,2039179,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:12:4 | ||
| country/USA,2013,11682275,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:13:2 | ||
| country/USA,2013,3676175,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:13:3 | ||
| country/USA,2013,2157925,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:13:4 | ||
| country/USA,2012,11924029,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:14:2 | ||
| country/USA,2012,3651247,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:14:3 | ||
| country/USA,2012,2544278,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:14:4 | ||
| country/USA,2011,12059233,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:15:2 | ||
| country/USA,2011,3650465,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:15:3 | ||
| country/USA,2011,2633725,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:15:4 | ||
| country/USA,2010,11804731,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:16:2 | ||
| country/USA,2010,3520524,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:16:3 | ||
| country/USA,2010,2515909,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:16:4 |
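A quick sanity check on output of this shape, assuming the column names in the header row above, could sum `value` per `observationDate` across the three control-of-institution variables. The `SAMPLE` text copies the three 2024 rows from the table; `totals_by_year` is a hypothetical helper, not part of the import scripts.

```python
import csv
import io
from collections import defaultdict

# Three 2024 rows copied from the output above. A real check would open the
# generated CSV file instead of using an inline string.
SAMPLE = """\
observationAbout,observationDate,value,variableMeasured,#input
country/USA,2024,10895410,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PublicEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:2:2
country/USA,2024,3884457,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedNotForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:2:3
country/USA,2024,1344830,dcid:Count_Student_EnrolledInCollegeOrGraduateSchool_FullTimeEquivalent_PrivatelyOwnedForProfitEstablishment_PostSecondaryInstitution,input_files/controlOfInstitution_data.csv:2:4
"""


def totals_by_year(csv_text):
    """Return {observationDate: sum of value} for the given CSV text."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row['observationDate']] += int(row['value'])
    return dict(totals)


print(totals_by_year(SAMPLE))  # {'2024': 16124697}
```

Comparing such per-year totals against the source tables is one way to catch a dropped or duplicated column after the header-stripping step.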