Conversation
|
This data file is generated by a script that extracts data from http://mfdonline.bps.go.id/ . See https://github.com/edwardsamuel/Wilayah-Administratif-Indonesia/blob/master/scripts/run.sh#L12 There is no point in modifying this generated file: your changes will be overwritten the next time the script runs. Is the BPS data wrong? If so, it needs to be fixed in the BPS source. You can see "ALUE DUA MUKA 0" and "SITIMERT0" are used in Other occasions where this data has appeared; https://www.google.com/search?q=%22SITIMERT0%22+%223506190010%22 And a 'bot' created Wikipedia articles: And it appears in a wordlist here: |
|
If we can confirm that the BPS data is wrong, one solution is for this repository to have a 'fixes' list, which |
|
Hi @prasastoadi, I agree with @jayvdb. Generated files should not be edited manually; they will be overwritten on the next run. You need to modify the script that generates the files, in this project can be |
|
I am very confident that the two village names are wrong. We know that 0 (zero) is not a letter of the alphabet. Here is the Sitimerto village Alue Dua Muka O I propose a very simple method to apply before writing the data to CSV. |
def main(argv):
    if (len(argv) > 0):
        read_html_data(argv[0] + '/' + argv[1])
        fix_villages({1105130121: 'ALUE DUA MUKA O', 3506190010: 'SITIMERTO'})
I think this method is quite dangerous. If BPS renames 1105130121 and 3506190010, the generated data will no longer follow the BPS update. What do you think?
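One possible way to reduce that risk, sketched here under assumptions: record the known-bad name next to each correction and apply the fix only while the extracted name still matches it. The function name `apply_guarded_fixes` and the dict shapes are hypothetical and not part of this repository's script.

```python
def apply_guarded_fixes(villages, fixes):
    """Apply a fix only if the extracted name still equals the known-bad value.

    villages: dict of village code -> extracted name (assumed shape)
    fixes: dict of village code -> (known_bad_name, corrected_name)
    Returns the codes whose fix was skipped, so they can be reviewed.
    """
    skipped = []
    for code, (bad_name, good_name) in fixes.items():
        if villages.get(code) == bad_name:
            villages[code] = good_name
        else:
            # The source no longer has the known-bad name (renamed, fixed,
            # or missing), so leave the extracted data untouched.
            skipped.append(code)
    return skipped


villages = {1105130121: 'ALUE DUA MUKA 0', 3506190010: 'SITIMERT0'}
fixes = {
    1105130121: ('ALUE DUA MUKA 0', 'ALUE DUA MUKA O'),
    3506190010: ('SITIMERT0', 'SITIMERTO'),
}
skipped = apply_guarded_fixes(villages, fixes)
```

With this guard, a rename on the BPS side would pass through unchanged and the stale fix would show up in `skipped` instead of overwriting the update.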
In this case, @prasastoadi only found issues with those two villages. What about the rest of the data? Has he already checked the entire village data?
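A rough way to look for similar cases across the whole data set is to list every name that contains a digit and review the results by hand. This is only a sketch: the helper `find_names_with_digits` and the dict shape are assumptions, `CONTOH` is a made-up placeholder entry, and a name containing a digit is not necessarily wrong.

```python
import re

# Any ASCII digit inside a village name is treated as worth a manual look.
DIGIT = re.compile(r'[0-9]')


def find_names_with_digits(villages):
    """Return (code, name) pairs whose name contains a digit, sorted by code."""
    return sorted(
        (code, name) for code, name in villages.items() if DIGIT.search(name)
    )


sample = {
    1105130121: 'ALUE DUA MUKA 0',
    3506190010: 'SITIMERT0',
    1: 'CONTOH',  # placeholder entry without digits
}
suspicious = find_names_with_digits(sample)
```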
#16 is a way to check for more problems. But I think we should not wait for all problems to be found. They will be reported when people find them.
And we can't wait for the government to fix them. That doesn't happen quickly.
But the fixes should be optional, so we can still use this repo's tools to obtain raw data.
|
IMO it's pointless to update anything in this repo while the source data from BPS remains wrong. Dear @prasastoadi, what you should do instead is ask BPS to update their data. |
|
Maybe fixes should be wrapped in a separate function call (and possibly separate data file), so that users can easily apply all fixes on top of the existing data. |
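As a minimal sketch of that idea, the fixes could live in their own file and be layered over the raw data by a separate call, so the raw output stays available when the call is skipped. The `code,name` file layout and the names `load_fixes` and `apply_fixes` are assumptions for illustration, not something this repository defines.

```python
import csv
import io


def load_fixes(fileobj):
    """Read 'code,name' rows from a fixes file into a dict of code -> name."""
    return {int(row[0]): row[1] for row in csv.reader(fileobj) if row}


def apply_fixes(villages, fixes):
    """Return a copy of the raw data with fixes applied; the input is not modified."""
    fixed = dict(villages)
    for code, name in fixes.items():
        if code in fixed:
            fixed[code] = name
    return fixed


# In-memory stand-in for a hypothetical fixes file kept next to the script.
fixes_file = io.StringIO("1105130121,ALUE DUA MUKA O\n3506190010,SITIMERTO\n")
raw = {1105130121: 'ALUE DUA MUKA 0', 3506190010: 'SITIMERT0'}
fixed = apply_fixes(raw, load_fixes(fixes_file))
```

Because `apply_fixes` returns a new dict, users who want the unmodified BPS data can simply keep using `raw`.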
Fix '0' to 'O'
ALUE DUA MUKA 0 -> ALUE DUA MUKA O
SITIMERT0 -> SITIMERTO