
Geolocation Pipeline

Data pipelines that integrate Brazilian geographic and demographic datasets with fast‑food locations to generate an expansion strategy for McDonald's in Brazil. [see report]

This project was developed using the following tools:

  • 🔶 Kedro - Framework to create reproducible, maintainable, and modular data science code
  • 🐍 PuLP - Linear and mixed integer programming library for optimization problems
  • 📦 UV - Ultra-fast Python package manager
  • 🚀 Just - Modern command runner with powerful features
  • 💅 Ruff - Lightning-fast linter and formatter
  • 🧪 Pytest - Testing framework with fixtures and plugins
  • 🛫 Pre-commit - Hooks to ensure code quality and adherence to standards
  • 🐳 Docker - Multi-stage build and distroless image
  • 🔄 GitHub Actions - CI/CD pipeline
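PuLP handles the optimization behind the expansion strategy. As a minimal sketch of the kind of binary-choice model it supports (the city names, scores, costs, and budget below are hypothetical illustrations, not the project's actual formulation):

```python
import pulp

# Hypothetical inputs: attractiveness scores and opening costs per city.
cities = ["Sao Paulo", "Curitiba", "Recife"]
score = {"Sao Paulo": 9.0, "Curitiba": 6.5, "Recife": 7.2}
cost = {"Sao Paulo": 3, "Curitiba": 1, "Recife": 2}
budget = 4

# Maximize the total score of opened stores, subject to a budget constraint.
prob = pulp.LpProblem("store_selection", pulp.LpMaximize)
open_store = pulp.LpVariable.dicts("open", cities, cat="Binary")
prob += pulp.lpSum(score[c] * open_store[c] for c in cities)
prob += pulp.lpSum(cost[c] * open_store[c] for c in cities) <= budget

prob.solve(pulp.PULP_CBC_CMD(msg=False))
chosen = sorted(c for c in cities if open_store[c].value() == 1)
```

The binary decision variables make this a mixed-integer program, which is exactly the class of problem PuLP (with its bundled CBC solver) is designed for.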

Setup

Environment

This project uses UV as its package manager and Just as its command runner. You need both installed on your system to work on this project.

Once you have UV and Just installed, you can run just dev-sync in your terminal to create a virtual environment and install all the dependencies.

If you want to build a production environment without the development dependencies, you can run just prod-sync instead.

Finally, to install the pre-commit hooks, run just install-hooks.

Data

This project uses data about the Brazilian population published by the Brazilian Institute of Geography and Statistics (IBGE) and data scraped from the Brazilian McDonald's and Subway websites.

To download all the data needed for this project, you can either access this Google Drive URL or follow these instructions to get the data from the original sources:

  1. IBGE data

    a. Population data: access IBGE's Population Estimates page and download:

    • "Estimativas_2020" > POP2020_20220905.xls
    • "Estimativas_2021" > POP2021_20240624.xls

    b. Cities GDP data: access IBGE's Downloads page and download:

    • "Pib_Municipios" > "2021" > "base" > base_de_dados_2010_2021_xlsx.zip (then unzip the file)

    c. Brazil's shapefiles: access IBGE's Municipal Mesh page, select "Editions" > "2021" > "More on the product", and download:

    • "Municipalities" > BR_Municipios_2021.zip
    • "Federation Units" > BR_UF_2021.zip
    • "Microregions" > BR_Microrregioes_2021.zip
    • "Mesoregions" > BR_Mesorregioes_2021.zip
  2. Fast-food restaurants data: Location data for McDonald's and Subway restaurants was scraped from their websites using the code in notebooks/scrape_restaurants.ipynb. However, the scraping code may stop working if their websites change over time, so I have also included a copy of the scraped data in the notebooks/scraped_data/ folder.
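Once downloaded, the scraped files can be parsed with standard tooling. For example, assuming a hypothetical record layout for mcdonalds.json (the real scraped schema may differ), the restaurant coordinates could be extracted like this:

```python
import json

# Hypothetical layout for mcdonalds.json; the real scraped schema may differ.
raw = """
[
  {"name": "McDonald's Paulista", "city": "Sao Paulo", "lat": -23.561, "lon": -46.655},
  {"name": "McDonald's Centro", "city": "Curitiba", "lat": -25.429, "lon": -49.271}
]
"""

restaurants = json.loads(raw)
coordinates = [(r["city"], r["lat"], r["lon"]) for r in restaurants]
```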

After downloading the necessary files, move them to the data/01_raw/ folder, which should look like this:

data/01_raw
├── BR_Mesorregioes_2021.zip
├── BR_Microrregioes_2021.zip
├── BR_Municipios_2021.zip
├── BR_UF_2021.zip
├── mcdonalds.json
├── PIB dos Municípios - base de dados 2010-2021.xlsx
├── POP2020_20220905.xls
├── POP2021_20240624.xls
└── subway.html

Usage

The final output of this project's pipelines is a report on "McDonald's Expansion Opportunities in Brazil", which can be found in data/08_reporting/final_report.md.

Running the Pipelines

You can run the full set of Kedro pipelines in this project (process_data, merge_data, and build_report) with:

kedro run

If you want to run a specific pipeline, you can use the --pipeline option. For example, to run the process_data pipeline:

kedro run --pipeline process_data

Similarly, you can run specific nodes or tags by passing the --nodes and/or --tags options followed by the name(s) of the node(s) or tag(s) you want to run, e.g. kedro run --nodes <node_name>.

Visualizing the Pipelines

You can visualize the datasets, nodes, and connections of the Kedro pipelines in this project by running the following command:

kedro viz --autoreload

The image below shows the pipeline visualization for this project:

Formatting, Linting, and Testing

  • Run just format to format your code
  • Run just lint to run the linter
  • Run just test to run the tests
  • Run just validate to run all of the above (format, lint, and test)

You can configure Ruff by editing the .ruff.toml file. It is currently set to the default configuration.

Have a look at the file src/tests/test_run.py for instructions on how to write your tests. You can configure the coverage threshold in your project's pyproject.toml file under the [tool.coverage.report] section.
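As a sketch of what a node-level test might look like (the node function here is hypothetical, not one of the project's actual nodes, which live under src/):

```python
# Hypothetical node function; the project's real nodes live under src/.
def population_density(population: int, area_km2: float) -> float:
    """Return inhabitants per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return population / area_km2


def test_population_density():
    assert population_density(1_000_000, 500.0) == 2000.0


def test_population_density_rejects_zero_area():
    try:
        population_density(100, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Running just test (or pytest directly) will collect any function whose name starts with test_.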

Docker

This project includes a multi-stage Dockerfile, which produces an image with the code and the dependencies installed. You can build the image with:

just docker-build

Then, you can run the Kedro pipelines inside a container with the image you just built by running:

just docker-run

The outputs of the pipelines will still be saved in your local data/ folder, because the Docker container mounts the data/ folder as a volume.

GitHub Actions

This project includes a GitHub Actions workflow that runs the formatters, linters, and tests on every push and pull request to the main or develop branches. You can find the workflow file in .github/workflows/format-lint-test.yml.
