
Fix villages data #11

Open
prasastoadi wants to merge 1 commit into edwardsamuel:master from prasastoadi:master

@prasastoadi

prasastoadi commented Nov 3, 2016

Fix '0' to 'O'

ALUE DUA MUKA 0 -> ALUE DUA MUKA O
SITIMERT0 -> SITIMERTO

@jayvdb
Contributor

jayvdb commented Nov 4, 2016

This data file is generated by a script that extracts data from http://mfdonline.bps.go.id/. See https://github.com/edwardsamuel/Wilayah-Administratif-Indonesia/blob/master/scripts/run.sh#L12

It is not useful to modify this generated file. Your changes will be overwritten when the script runs next time.

Is the BPS data wrong? If it is wrong, it needs to be fixed in the BPS source.

You can see that "ALUE DUA MUKA 0" and "SITIMERT0" are used in
https://web.archive.org/web/20150207100538/http://www.bps.go.id/eng/download_file/Population_of_Indonesia_by_Village_2010.pdf

Other places where this data has appeared:

https://www.google.com/search?q=%22SITIMERT0%22+%223506190010%22

And a bot has created Wikipedia articles:
https://nl.wikipedia.org/wiki/Alue_Dua_Muka_0
https://nl.wikipedia.org/wiki/Sitimert0

And it appears in a wordlist here:
https://id.wiktionary.org/wiki/Wiktionary:ProyekWiki_bahasa_Indonesia/Daftar_kata/Nama/Tempat/Semua

@jayvdb
Contributor

jayvdb commented Nov 4, 2016

If we can confirm that the BPS data is wrong, one solution is for this repository to have a 'fixes' list, which run.sh uses to fix the generated csv files.
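A minimal sketch of such a "fixes" list, written in Python (the project also has a Python script) rather than inside run.sh. It assumes the generated CSV has the village code in the first column and the name in the last column; the `FIXES` mapping and the `apply_fixes` function are hypothetical, and only the two codes and corrected names come from this PR.

```python
import csv
import io

# Hypothetical fixes list: village code -> corrected name.
# Only these two entries come from this PR; the structure is an assumption.
FIXES = {
    "1105130121": "ALUE DUA MUKA O",
    "3506190010": "SITIMERTO",
}


def apply_fixes(csv_text, fixes):
    """Return csv_text with the name replaced on every row whose code is in fixes.

    Assumes the code is the first column and the name is the last column.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0] in fixes:
            row[-1] = fixes[row[0]]
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Illustrative two-column rows, not the repository's real file layout.
    sample = "1105130121,ALUE DUA MUKA 0\n3506190010,SITIMERT0\n"
    print(apply_fixes(sample, FIXES), end="")
```

Because the fixes are applied as a post-processing step, regenerating the raw CSV from BPS would not lose them.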

@edwardsamuel
Owner

Hi @prasastoadi,

I agree with @jayvdb. Generated files shouldn't be edited manually, because they will be overwritten on the next run. You need to modify the script that generates the files, which in this project is run.sh or the Python script. But first you need to confirm that the source data (BPS MFD Online) is wrong.

@prasastoadi
Author

prasastoadi commented Nov 7, 2016

I am very confident that the two village names are wrong: 0 (zero) is not a letter of the alphabet.

Here is Sitimerto village:
https://goo.gl/maps/qMH3K7LjahB2

And Alue Dua Muka O:
http://lmgtfy.com/?q=alue+dua+muka+o+site%3Ago.id

I propose a very simple method: fix the names before writing the data to CSV.
I think it would be better to check the villages/districts/regencies/provinces one by one to prevent typos in the data. I hope someone does it in the next patch 😉
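One cheap way to start that check for this particular kind of typo is to flag every name that contains a digit and review the hits by hand. This is a hypothetical sketch, not part of the repository: the `suspicious_names` function and its `(code, name)` input shape are assumptions, and because some legitimate names might contain digits, it only reports candidates instead of rewriting them.

```python
import re

# Any decimal digit; in this PR the digit 0 had been typed where the letter O belongs.
DIGIT = re.compile(r"\d")


def suspicious_names(rows):
    """Return the (code, name) pairs whose name contains a digit.

    `rows` is an iterable of (code, name) tuples; that shape is an assumption.
    Hits are candidates for manual review, not automatic corrections.
    """
    return [(code, name) for code, name in rows if DIGIT.search(name)]


if __name__ == "__main__":
    rows = [
        ("1105130121", "ALUE DUA MUKA 0"),
        ("3506190010", "SITIMERT0"),
        ("9999999999", "OTHER VILLAGE"),  # hypothetical clean row
    ]
    for code, name in suspicious_names(rows):
        print(code, name)
```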

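def main(argv):
    # argv[0] and argv[1] are joined into one path, so both must be present
    if len(argv) > 1:
        read_html_data(argv[0] + '/' + argv[1])
        # Proposed in this PR: override two misspelled village names by code
        fix_villages({1105130121: 'ALUE DUA MUKA O', 3506190010: 'SITIMERTO'})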

Owner

I think this method is quite dangerous. If BPS renames villages 1105130121 and 3506190010, the generated data will no longer follow the BPS update. What do you think?
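One way to reduce that risk, sketched here under assumptions (the `GUARDED_FIXES` structure and the `apply_guarded_fixes` function are hypothetical), is to record the wrong value each fix expects from BPS and apply the correction only while the scraped name still matches it. If BPS renames or corrects the village, the BPS value wins and the fix is reported as stale so it can be removed.

```python
# Hypothetical guarded fixes: code -> (value BPS currently publishes, correction).
GUARDED_FIXES = {
    "1105130121": ("ALUE DUA MUKA 0", "ALUE DUA MUKA O"),
    "3506190010": ("SITIMERT0", "SITIMERTO"),
}


def apply_guarded_fixes(names, fixes):
    """Return (fixed_names, stale_codes) without modifying `names`.

    `names` maps village code -> name as scraped from BPS. A fix is applied
    only while the scraped name still equals the known-bad value; otherwise
    the BPS value is kept and the code is listed as stale.
    """
    fixed = dict(names)
    stale = []
    for code, (expected, corrected) in fixes.items():
        if fixed.get(code) == expected:
            fixed[code] = corrected
        else:
            stale.append(code)  # BPS changed or removed this entry
    return fixed, stale


if __name__ == "__main__":
    # "SITIMERTO BARU" is a made-up rename to show the stale case.
    scraped = {"1105130121": "ALUE DUA MUKA 0", "3506190010": "SITIMERTO BARU"}
    fixed, stale = apply_guarded_fixes(scraped, GUARDED_FIXES)
    print(fixed)
    print("stale fixes:", stale)
```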

Contributor

In this case, @prasastoadi only found issues with those two villages. What about the rest of the data? Has he already checked the entire village dataset?

Contributor

#16 is a way to check for more problems. But I think we should not wait for all problems to be found. They will be reported when people find them.

And we can't wait for the government to fix them. That doesn't happen quickly.
But the fixes should be optional, so we can still use this repo's tools to obtain the raw data.

@feryardiant
Contributor

IMO it's pointless to update anything in this repo while the source data from BPS remains wrong.

Dear @prasastoadi, what you should do instead is ask BPS to update their data.

@jayvdb
Contributor

jayvdb commented Feb 17, 2018

Maybe fixes should be wrapped in a separate function call (and possibly a separate data file), so that users can easily apply all fixes on top of the existing data.
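A sketch of that idea: the fixes live in a separate data file and are applied by a separate, opt-in function, so the raw BPS data is still what you get by default. The file format (`code,corrected_name` rows with `#` comment lines) and the names `load_fixes` and `with_optional_fixes` are assumptions for illustration, not existing code in this repository.

```python
import csv
import io


def load_fixes(lines):
    """Parse hypothetical fixes-file rows of the form code,corrected_name.

    `lines` is any iterable of text lines (an open file works). Blank lines
    and lines starting with '#' are skipped.
    """
    data = (line for line in lines if line.strip() and not line.lstrip().startswith("#"))
    return {code: corrected for code, corrected in csv.reader(data)}


def with_optional_fixes(raw_names, fixes_lines=None):
    """Return the raw BPS names unchanged unless a fixes file is supplied."""
    if fixes_lines is None:
        return dict(raw_names)
    fixes = load_fixes(fixes_lines)
    # Only codes present in the raw data are touched; unknown codes are ignored.
    return {code: fixes.get(code, name) for code, name in raw_names.items()}


if __name__ == "__main__":
    fixes_file = io.StringIO(
        "# code,corrected_name\n1105130121,ALUE DUA MUKA O\n3506190010,SITIMERTO\n"
    )
    raw = {"1105130121": "ALUE DUA MUKA 0", "3506190010": "SITIMERT0"}
    print(with_optional_fixes(raw))              # raw data, no fixes
    print(with_optional_fixes(raw, fixes_file))  # fixes applied on top
```

Keeping the fixes out of the generator means a BPS re-run never overwrites them, and users who want the unmodified source simply do not pass a fixes file.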

contactjavas added a commit to contactjavas/Wilayah-Administratif-Indonesia that referenced this pull request Aug 12, 2020


4 participants