
Albania fix func1 and mapping for admin2_new #55

Open
yukinko-iwasaki wants to merge 7 commits into main from fix/albania_func1

@yukinko-iwasaki (Contributor) commented Jul 23, 2025

This PR makes the following changes:

  1. Tags func1 for the year 2024 (it was previously untagged).
  2. Adds a mapping file for converting admin2 to admin2_new.
  3. Bases alb_publish on the silver table.
  4. Moves the filter on transfer status down to the gold table (both transfer states are needed for publication).
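The reordering in item 4 can be sketched as follows. This is a minimal illustration, not the PR's code: the column names `year`, `admin2`, `transfer` and `executed`, and the helpers `build_silver`, `build_publish` and `build_gold`, are all hypothetical. The idea is that silver keeps both transfer states, the publication step aggregates over both, and only gold applies the transfer-status filter.

```python
import pandas as pd


def build_silver(raw: pd.DataFrame) -> pd.DataFrame:
    # Silver keeps every row, including both transfer states (hypothetical `transfer` column)
    return raw.copy()


def build_publish(silver: pd.DataFrame) -> pd.DataFrame:
    # Publication totals need both transfer states, so aggregate before any filtering
    return silver.groupby(["year", "admin2"], as_index=False)["executed"].sum()


def build_gold(silver: pd.DataFrame, allowed_states: list) -> pd.DataFrame:
    # The transfer-status filter is applied only at the gold stage
    return silver[silver["transfer"].isin(allowed_states)].reset_index(drop=True)


raw = pd.DataFrame({
    "year": [2024, 2024, 2024],
    "admin2": ["01", "01", "02"],
    "transfer": ["yes", "no", "no"],
    "executed": [10.0, 5.0, 3.0],
})
silver = build_silver(raw)
publish = build_publish(silver)
gold = build_gold(silver, ["no"])
```

Keeping the filter out of silver means the publication step and the gold table can each select the rows they need from the same source.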

NOTE:

All the auxiliary files for Albania are now tracked by git and referenced by relative paths instead of being read from the volume. Relative paths only resolve when the code runs from the Repo folder, so please test this PR under the Repo folder, not in your personal workspace.
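A minimal sketch of loading a git-tracked auxiliary file through a relative path, assuming relative paths resolve against the current working directory (which, when running from the Repo folder, is assumed to be the folder that holds the code). The helpers `resolve_aux_file` and `load_aux_csv` are hypothetical; they only add an explicit error when the file is missing, which is the typical symptom of running from a personal workspace copy.

```python
from pathlib import Path
from typing import Optional, Union

import pandas as pd


def resolve_aux_file(name: str, base_dir: Optional[Union[str, Path]] = None) -> Path:
    # Resolve an auxiliary file against base_dir, defaulting to the current working directory
    base = Path.cwd() if base_dir is None else Path(base_dir)
    path = (base / name).resolve()
    if not path.is_file():
        # Fail loudly instead of silently falling back to a volume path
        raise FileNotFoundError(
            f"{path} not found; run this from the repo folder, not a personal workspace copy"
        )
    return path


def load_aux_csv(name: str, base_dir: Optional[Union[str, Path]] = None) -> pd.DataFrame:
    # Read a git-tracked auxiliary CSV (e.g. './mapping.csv') through its relative path
    return pd.read_csv(resolve_aux_file(name, base_dir))
```

Usage: `mapping = load_aux_csv("mapping.csv")` from the folder that contains the file.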

@bhupatiraju (Contributor) left a comment

@yukinko-iwasaki This looks good to me!

If I am not mistaken, you are not replacing admin2 at the extraction stage, as I thought you would have to. Instead, you add admin2_new as a new column, do the calculations with admin2 as before, and finally expose admin2_new as the new admin2 in gold.
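The final step the reviewer describes can be sketched as below, assuming the table reaching gold carries both `admin2` and `admin2_new`. The helper `finalize_admin2` and the `executed` column are hypothetical; this is an illustration of the rename pattern, not the PR's implementation.

```python
import pandas as pd


def finalize_admin2(df: pd.DataFrame) -> pd.DataFrame:
    # Upstream calculations used the old admin2 codes; in gold, drop them
    # and expose admin2_new under the familiar column name admin2
    return df.drop(columns=["admin2"]).rename(columns={"admin2_new": "admin2"})


silver = pd.DataFrame({
    "admin2": ["05", "12"],
    "admin2_new": ["501", "512"],
    "executed": [1.0, 2.0],
})
gold = finalize_admin2(silver)
```

Deferring the swap to gold keeps every intermediate calculation on the original codes, so only the published output changes.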


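import pandas as pd

# admin2 to admin2_new mapping
mapping = pd.read_csv('./mapping.csv')
# Keep the needed columns, rename admin2 to admin2_tmp for the temporary merge key, cast admin2_new to str
mapping = mapping[['admin2', 'admin2_new', 'county']].rename(columns={'admin2': 'admin2_tmp'}).astype({'admin2_new': 'str'})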
Contributor

Is this mapping going to change in the future? I assume this is a one-time adjustment to make the new admin regions conform to the old ones. If not, could we check whether admin2 is a 2-digit or a 3-digit code? In some cases the codes are padded to the correct length, but in others the leading zeros are missing. This may not be relevant here, though.
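The padding check raised above could look like the sketch below. Both helpers are hypothetical, and the default `width=3` is an assumption about the intended code length; adjust it if admin2 codes are meant to be 2 digits.

```python
import pandas as pd


def padding_report(codes: pd.Series) -> dict:
    # Count codes by string length, e.g. how many are 1, 2 or 3 characters long
    lengths = codes.astype(str).str.strip().str.len()
    return lengths.value_counts().sort_index().to_dict()


def normalize_admin2(codes: pd.Series, width: int = 3) -> pd.Series:
    # Left-pad with zeros so "5", "05" and "005" all become "005"
    return codes.astype(str).str.strip().str.zfill(width)


codes = pd.Series(["05", "5", "105", "12"])
report = padding_report(codes)
normalized = normalize_admin2(codes)
```

A report with more than one length signals inconsistent padding; normalizing to a fixed width makes string comparisons safe.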

@yukinko-iwasaki (Contributor, Author) Jul 25, 2025

> could we check if the admin2 is a 2 digit code or a 3 digit code?

When merging the mapping DataFrame with the main DataFrame, I temporarily converted the admin2 codes to integers so that padding-length inconsistencies are ignored. So I don't think we need to worry about padding in our case.
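The integer-key merge described above might look like this sketch. `apply_admin2_mapping` and `_admin2_key` are hypothetical names; the sketch joins on admin2 alone (the real merge may also use `county`, which the mapping file carries) and assumes every code is numeric and non-missing. `validate="many_to_one"` is a standard `pandas.merge` option that raises if an old code appears more than once in the mapping.

```python
import pandas as pd


def apply_admin2_mapping(df: pd.DataFrame, mapping: pd.DataFrame) -> pd.DataFrame:
    # Temporary integer join keys make "05", "5" and 5 compare equal.
    # Assumes every code is numeric and non-missing (astype("int64") fails on NaN).
    left = df.assign(_admin2_key=pd.to_numeric(df["admin2"]).astype("int64"))
    right = mapping.assign(
        _admin2_key=pd.to_numeric(mapping["admin2_tmp"]).astype("int64")
    )[["_admin2_key", "admin2_new"]]
    # many_to_one: each old code maps to at most one new code, so rows are not duplicated
    merged = left.merge(right, on="_admin2_key", how="left", validate="many_to_one")
    # Drop the temporary key once admin2_new is attached
    return merged.drop(columns="_admin2_key")


df = pd.DataFrame({"admin2": ["05", "12", "5"], "executed": [1.0, 2.0, 3.0]})
mapping = pd.DataFrame({
    "admin2_tmp": [5, 12],
    "admin2_new": ["501", "512"],
    "county": ["A", "B"],
})
result = apply_admin2_mapping(df, mapping)
```

Because the key is temporary, the original string `admin2` column keeps whatever padding it had.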


tag_code_mapping = pd.read_csv(TAG_MAPPING_URL)
tag_code_mapping = pd.read_csv(TAG_MAPPING_PATH)
Contributor Author

@bhupatiraju

Thank you so much for taking a look!
Could you try running this script here to confirm that the files can now be loaded using relative paths within the project?
Could you also make sure that you are in the "Repo/{your email address}/{repo_name}" folder when testing?
