Commit 089d7a7

Adding cookiecutter example
1 parent 15300ee commit 089d7a7

7 files changed: +302 -0 lines changed

Lines changed: 161 additions & 0 deletions
@@ -0,0 +1,161 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## {{cookiecutter.project_name}}\n",
+    "\n",
+    "{{cookiecutter.description}}\n",
+    "\n",
+    "### Data Sources\n",
+    "- file1 : Description of where this file came from\n",
+    "\n",
+    "### Changes\n",
+    "- {% now 'utc', '%m-%d-%Y' %} : Started project"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from pathlib import Path\n",
+    "from datetime import datetime"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### File Locations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "today = datetime.today()\n",
+    "in_file = Path.cwd() / \"data\" / \"raw\" / \"FILE1\"\n",
+    "summary_file = Path.cwd() / \"data\" / \"processed\" / f\"summary_{today:%b-%d-%Y}.pkl\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.read_csv(in_file)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Column Cleanup\n",
+    "\n",
+    "- Remove all leading and trailing spaces\n",
+    "- Rename the columns for consistency."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# https://stackoverflow.com/questions/30763351/removing-space-in-dataframe-python\n",
+    "df.columns = [x.strip() for x in df.columns]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cols_to_rename = {'col1': 'New_Name'}\n",
+    "df.rename(columns=cols_to_rename, inplace=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Clean Up Data Types"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.dtypes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data Manipulation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Save output file into processed directory\n",
+    "\n",
+    "Save a file in the processed directory that is cleaned properly. It will be read in and used later for further analysis.\n",
+    "\n",
+    "Other options besides pickle include:\n",
+    "- feather\n",
+    "- msgpack\n",
+    "- parquet"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.to_pickle(summary_file)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
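
The data-prep notebook strips whitespace from column labels, renames columns, and pickles the result for a later notebook. A minimal sketch of that flow follows; the sample frame, its column names, and the temporary directory are illustrative assumptions and are not part of the commit.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical raw data; the padded column names are made up for illustration.
raw = pd.DataFrame({" col1 ": [1, 2], "col2  ": ["a", "b"]})

# Strip leading and trailing whitespace from every column label.
raw.columns = [name.strip() for name in raw.columns]

# Rename columns for consistency; this returns a new frame instead of
# mutating in place, which is a stylistic choice and not required.
cols_to_rename = {"col1": "New_Name"}
clean = raw.rename(columns=cols_to_rename)

# Round-trip through pickle in a throwaway directory that stands in
# for the template's data/processed folder.
with tempfile.TemporaryDirectory() as tmp:
    summary_file = Path(tmp) / "summary.pkl"
    clean.to_pickle(summary_file)
    restored = pd.read_pickle(summary_file)

print(list(restored.columns))
```

Pickle preserves dtypes across the round trip, which is presumably why the template uses it as the hand-off format between the two notebooks.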
Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## {{cookiecutter.project_name}}\n",
+    "\n",
+    "{{cookiecutter.description}}\n",
+    "\n",
+    "This notebook contains basic statistical analysis and visualization of the data.\n",
+    "\n",
+    "### Data Sources\n",
+    "- summary : Processed file from notebook 1-Data_Prep\n",
+    "\n",
+    "### Changes\n",
+    "- {% now 'utc', '%m-%d-%Y' %} : Started project"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from pathlib import Path\n",
+    "from datetime import datetime\n",
+    "import seaborn as sns"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### File Locations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "today = datetime.today()\n",
+    "in_file = Path.cwd() / \"data\" / \"processed\" / f\"summary_{today:%b-%d-%Y}.pkl\"\n",
+    "report_dir = Path.cwd() / \"reports\"\n",
+    "report_file = report_dir / f\"Excel_Analysis_{today:%b-%d-%Y}.xlsx\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.read_pickle(in_file)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Perform Data Analysis"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Save Excel file into reports directory\n",
+    "\n",
+    "Save an Excel file with intermediate results into the report directory"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "writer = pd.ExcelWriter(report_file, engine='xlsxwriter')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.to_excel(writer, sheet_name='Report')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "writer.close()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
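
Both notebooks build date-stamped file names by placing a `strftime` format spec inside string braces, which only works when the string literal carries the `f` prefix. A small standard-library sketch of the difference follows; the fixed date and the relative `reports` path are assumptions chosen so the result is reproducible.

```python
from datetime import datetime
from pathlib import Path

# Fixed date for reproducibility; the notebooks call datetime.today().
today = datetime(2018, 7, 9)

# Relative stand-in for the template's reports directory.
report_dir = Path("reports")

# A plain string keeps the braces literally in the file name.
literal_name = report_dir / "Excel_Analysis_{today:%b-%d-%Y}.xlsx"

# An f-string hands the spec after the colon to the datetime's
# __format__ method, which formats it like strftime would.
stamped_name = report_dir / f"Excel_Analysis_{today:%b-%d-%Y}.xlsx"

print(literal_name.name)
print(stamped_name.name)
```

Note that `%b` yields a locale-dependent month abbreviation, so an ISO-style spec such as `%Y-%m-%d` may be a safer choice when file names need to sort chronologically.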

pbp_cookiecutter/{{cookiecutter.directory_name}}/data/external/.gitkeep

Whitespace-only changes.

pbp_cookiecutter/{{cookiecutter.directory_name}}/data/interim/.gitkeep

Whitespace-only changes.

pbp_cookiecutter/{{cookiecutter.directory_name}}/data/processed/.gitkeep

Whitespace-only changes.

pbp_cookiecutter/{{cookiecutter.directory_name}}/data/raw/.gitkeep

Whitespace-only changes.

pbp_cookiecutter/{{cookiecutter.directory_name}}/reports/.gitkeep

Whitespace-only changes.
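
Git does not track empty directories, so the `.gitkeep` placeholders above keep the `data/*` and `reports` folders in the template, and the notebooks' paths resolve in a freshly generated project. A hedged sketch of recreating the same skeleton by hand follows; `make_skeleton` is a hypothetical helper, not something the commit provides.

```python
import tempfile
from pathlib import Path

# Directories that carry a .gitkeep file in this commit.
SUBDIRS = ["data/external", "data/interim", "data/processed", "data/raw", "reports"]


def make_skeleton(root: Path) -> list:
    """Create each subdirectory under root with an empty .gitkeep inside."""
    created = []
    for sub in SUBDIRS:
        target = root / sub
        target.mkdir(parents=True, exist_ok=True)
        (target / ".gitkeep").touch()
        created.append(target)
    return created


# Build the tree in a throwaway directory and check the placeholders.
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_skeleton(Path(tmp))
    all_present = all((d / ".gitkeep").is_file() for d in dirs)
    names = sorted(d.relative_to(tmp).as_posix() for d in dirs)

print(names)
```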
