README for the EDA example project

This repository contains an example exploratory data analysis (EDA). It was a project assigned by neuefische GmbH as part of the graduation tests for their Data Science bootcamp. We were given an imaginary stakeholder with a specific profile and a data set about housing in the King County area around Seattle. My tasks were:

- fetch the data from a PostgreSQL database via SQL
- assess and clean the data
- create a presentation that is directed at the stakeholder and addresses his interests

If you want to run code from the Jupyter notebooks in this repository, please set up your system as instructed below. The notebook 1_Fetching_the_data_eda contains my code for accessing the data from the database and generating the eda.csv file. The EDA notebook contains all my code for data cleaning, data assessment, and image preparation. The final presentation to the stakeholder, Thomas Hansen, can be found in Keynote or PDF format.
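The notebook code is not reproduced here. As a rough illustration of the fetch-and-export step described above, the following sketch runs a SQL query and writes the result to CSV. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the PostgreSQL connection. The table name `house_sales`, its columns, the sample rows, and the helper `export_query_to_csv` are all hypothetical and not taken from the actual data set or notebook.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out):
    """Run `query` on `conn` and write a header plus all rows to the file-like `out`."""
    cursor = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds one tuple per result column; item 0 is the name.
    writer.writerow([col[0] for col in cursor.description])
    rows = cursor.fetchall()
    writer.writerows(rows)
    return len(rows)


# Stand-in database: hypothetical table and made-up rows, not the real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house_sales (id INTEGER, price REAL, zipcode TEXT)")
conn.executemany(
    "INSERT INTO house_sales VALUES (?, ?, ?)",
    [(1, 250000.0, "98001"), (2, 480000.0, "98115")],
)

buffer = io.StringIO()
n_rows = export_query_to_csv(conn, "SELECT * FROM house_sales ORDER BY id", buffer)
csv_text = buffer.getvalue()
print(n_rows)
print(csv_text)
```

To write an actual file, an open file handle (for example `open("eda.csv", "w", newline="")`) could be passed instead of the `StringIO` buffer.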

Requirements

The requirements are listed in requirements.txt.

Setup

To set up the system in order to run the notebooks, follow these steps:

- Set the Python version locally to 3.11.3.
- Create a virtual environment using the venv module.
- Activate your newly created environment.
- Upgrade pip. (This step is not strictly necessary, but it will save you trouble when installing some packages.)
- Install the required packages via pip.

```
pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
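To confirm that the setup worked, a small check can be run from inside the activated environment. This is only a sketch: the helper names `in_virtual_env` and `version_matches` are made up for illustration, and the check relies on the fact that `sys.prefix` differs from `sys.base_prefix` inside an environment created with venv.

```python
import sys


def in_virtual_env():
    """Return True if the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def version_matches(expected=(3, 11), actual=None):
    """Compare the major/minor version of `actual` against `expected`."""
    actual = sys.version_info if actual is None else actual
    return tuple(actual[:2]) == tuple(expected)


print("virtual environment active:", in_virtual_env())
print("Python 3.11 in use:", version_matches())
```

If either line prints `False`, the environment was probably not activated or pyenv did not select the intended version.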

The requirements.txt was generated with the following command at the end of my EDA:
```
pip freeze > requirements.txt
```

Note: In rare cases, a requirements file created with pip freeze might not ensure that another user (especially one with an M1 chip) can install and run the packages properly. This can happen if libraries need to be compiled (e.g. SciPy). The result then also depends on environment variables and the system libraries that are actually installed.
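If an installation behaves unexpectedly, it can help to compare the pinned versions with what is actually installed. The sketch below assumes the `name==version` line format that pip freeze usually writes and uses `importlib.metadata` from the standard library. The function names and the sample pins are hypothetical and not taken from this repository's requirements.txt.

```python
from importlib import metadata


def parse_pins(text):
    """Return {name: version} for lines of the form `name==version`."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and entries that are not pinned
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed)}; installed is None if the package is missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Made-up sample content; in practice, read the text from requirements.txt.
sample = "numpy==1.24.3\n# a comment\n\npandas==2.0.1\n"
pins = parse_pins(sample)
print(pins)
print(find_mismatches(pins))
```

An empty result from `find_mismatches` would indicate that every pinned package is installed in exactly the pinned version.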

Unit testing (Optional)

If you write Python scripts for your data processing methods, you can also write unit tests. To run the tests, execute the following in a terminal:

```
pytest
```

With its default settings, pytest collects files named test_*.py or *_test.py and runs the functions in them whose names start with test.
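As an illustration, a test file could look like the sketch below. The file name `test_cleaning.py`, the helper `drop_missing_prices`, and the sample rows are hypothetical and do not come from this repository. In a real project, the helper would normally live in its own module and be imported into the test file.

```python
# test_cleaning.py (hypothetical file name)


def drop_missing_prices(rows):
    """Keep only the rows whose 'price' value is not None."""
    return [row for row in rows if row.get("price") is not None]


def test_drop_missing_prices_removes_rows_without_price():
    rows = [{"id": 1, "price": 250000}, {"id": 2, "price": None}]
    assert drop_missing_prices(rows) == [{"id": 1, "price": 250000}]


def test_drop_missing_prices_keeps_complete_rows():
    rows = [{"id": 1, "price": 1}, {"id": 2, "price": 2}]
    assert len(drop_missing_prices(rows)) == 2
```

Each test function relies on a plain `assert`, and pytest reports a failure with the compared values when an assertion does not hold.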

Set up your Environment

This repo contains a requirements.txt file with a list of all the packages and dependencies you will need. Before you install the virtual environment, make sure to install PostgreSQL if you haven't done so already.

Check the PostgreSQL version by running the following command:
```
psql --version
```
If you haven't installed it yet, begin at step_1. Otherwise, proceed to step_2.

Before you can start using Plotly in JupyterLab, you have to install Node.js (if you haven't done so already).

Check the Node version by running the following command:
```
node -v
```
If you haven't installed it yet, begin at step_2. Otherwise, proceed to step_3.
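Both checks above can also be done in one go from Python. The following sketch uses `shutil.which` from the standard library to look up the `psql` and `node` commands on the PATH. The function name `find_tools` is made up for illustration.

```python
import shutil


def find_tools(names=("psql", "node")):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    status = path if path else "not found on PATH"
    print(f"{tool}: {status}")
```

A tool reported as not found is either not installed or installed in a directory that is missing from the PATH.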

`macOS`: type the following commands:

Step_1: Update Homebrew and install PostgreSQL with the following commands:
```
brew update
brew install postgresql@14
```
Restart your terminal and then check the PostgreSQL version by running the following command:
```
psql --version
```
If psql --version doesn't display the version, add PostgreSQL to your macOS PATH by following these steps:
- Find and copy the PostgreSQL bin directory on macOS.
  
  The default path is typically `/Library/PostgreSQL/<version>/bin`, where `<version>` is your PostgreSQL version.
- Edit the .zshrc file (or the equivalent configuration file of your shell) using a text editor like Nano, Vim, or VSCode.
```
nano ~/.zshrc
```
- Add the following line to the .zshrc file. Make sure to replace `<version>` with your PostgreSQL version.
```
export PATH="/Library/PostgreSQL/<version>/bin:$PATH"
```
- Save and exit the text editor. In nano, you can do this by pressing Ctrl + O, then Enter, and then Ctrl + X.
- Restart your terminal, or reload the configuration in the current one, then check the version again:
```
source ~/.zshrc
psql --version
```
Step_2: Update Homebrew and install Node with the following commands:
```
brew update
brew install node
```

Step_3: Install the virtual environment and the required packages with the following commands:
```
pyenv local 3.11.3
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

`Windows`: type the following commands:

Step_1: Update Chocolatey and install PostgreSQL with the following commands:
```
choco upgrade chocolatey
choco install postgresql14
```
Restart your terminal and then check the PostgreSQL version by running the following command:
```
psql --version
```
If psql --version doesn't display the version, add PostgreSQL to your Windows PATH by following these steps:
- Find and copy the PostgreSQL bin directory on Windows.
  
  The default path is typically `C:\Program Files\PostgreSQL\<version>\bin`, where `<version>` is your PostgreSQL version.
- Open Command Prompt as Administrator:
  - Search for "Command Prompt" in your Start menu.
  - Right-click on "Command Prompt" and select "Run as administrator."
- Add PostgreSQL to PATH:
  - Replace 14 with your PostgreSQL version if it's different.
```
setx PATH "%PATH%;C:\Program Files\PostgreSQL\14\bin"
```
- Close the Administrator Command Prompt window.
- Open a new terminal and run the following command:
```
psql --version
```
Step_2: Update Chocolatey and install Node with the following commands:
```
choco upgrade chocolatey
choco install nodejs
```

Step_3: Install the virtual environment and the required packages with the following commands.

For PowerShell CLI:
```
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt
```

For Git-Bash CLI:
```
python -m venv .venv
source .venv/Scripts/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Note: If you encounter an error when trying to run pip install --upgrade pip, try using the following command:
```
python.exe -m pip install --upgrade pip
```
