capstone4ds_template/index.qmd at main · capstone4ds/capstone4ds_template · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
---
title: "Writing a great story for data science projects - Spring 2026 "
subtitle: "This is a Report Template Quarto"
author: "Student Name (Advisor: Dr. Cohen)"
date: '`r Sys.Date()`'
format:
  html:
    code-fold: true
course: Capstone Projects in Data Science
bibliography: references.bib # file contains bibtex for references
#always_allow_html: true # this allows to get PDF with HTML features
self-contained: true
execute:
  warning: false
  message: false
editor:
  markdown:
    wrap: 72
---

Slides: [slides.html](slides.html){target="_blank"} ( Go to `slides.qmd`
to edit)

::: callout-important
**Remember:** Your goal is to make your audience understand and care
about your findings. By crafting a compelling story, you can effectively
communicate the value of your data science project.

Carefully read this template since it has instructions and tips to
writing!

Nice report!
:::

## Introduction

The introduction should:

-   Develop a storyline that captures attention and maintains interest.

-   Your audience is your peers

-   Clearly state the problem or question you're addressing.

<!-- -->

-   Introduce why it is relevant needs.

-   Provide an overview of your approach.

Example of writing including citing references:

*This is an introduction to ..... regression, which is a non-parametric
estimator that estimates the conditional expectation of two variables
which is random. The goal of a kernel regression is to discover the
non-linear relationship between two random variables. To discover the
non-linear relationship, kernel estimator or kernel smoothing is the
main method to estimate the curve for non-parametric statistics. In
kernel estimator, weight function is known as kernel function
[@efr2008]. Cite this paper [@bro2014principal]. The GEE [@wang2014].
The PCA [@daffertshofer2004pca]*. Topology can be used in machine
learning [@adams2021topology]

For Symbolic Regression [@wang2019symbolic] *This is my work and I want
to add more work...*

Cite new paper [@su2012linear]

Classification of Knee OA was studied here [@guida2021knee]

## Methods

-   Detail the models or algorithms used.

-   Justify your choices based on the problem and data.

*The common non-parametric regression model is*
$Y_i = m(X_i) + \varepsilon_i$*, where* $Y_i$ *can be defined as the sum
of the regression function value* $m(x)$ *for* $X_i$*. Here* $m(x)$ *is
unknown and* $\varepsilon_i$ *some errors. With the help of this
definition, we can create the estimation for local averaging i.e.*
$m(x)$ *can be estimated with the product of* $Y_i$ *average and* $X_i$
*is near to* $x$*. In other words, this means that we are discovering
the line through the data points with the help of surrounding data
points. The estimation formula is printed below [@R-base]:*

$$
M_n(x) = \sum_{i=1}^{n} W_n (X_i) Y_i  \tag{1}
$$$W_n(x)$ *is the sum of weights that belongs to all real numbers.
Weights are positive numbers and small if* $X_i$ *is far from* $x$*.*

*Another equation:*

$$
y_i = \beta_0 + \beta_1 X_1 +\varepsilon_i
$$

## Analysis and Results

### Data Exploration and Visualization

-   Describe your data sources and collection process.

-   Present initial findings and insights through visualizations.

-   Highlight unexpected patterns or anomalies.

A study was conducted to determine how...

```{r, warning=FALSE, echo=T, message=FALSE}
# loading packages
library(tidyverse)
library(knitr)
library(ggthemes)
library(ggrepel)
library(dslabs)
```

```{python}
import pandas as pd
```

```{r, warning=FALSE, echo=TRUE}
# Load Data
kable(head(murders))

ggplot1 = murders %>% ggplot(mapping = aes(x=population/10^6, y=total))

  ggplot1 + geom_point(aes(col=region), size = 4) +
  geom_text_repel(aes(label=abb)) +
  scale_x_log10() +
  scale_y_log10() +
  geom_smooth(formula = "y~x", method=lm,se = F)+
  xlab("Populations in millions (log10 scale)") +
  ylab("Total number of murders (log10 scale)") +
  ggtitle("US Gun Murders in 2010") +
  scale_color_discrete(name = "Region")+
      theme_bw()


```

### Modeling and Results

-   Explain your data preprocessing and cleaning steps.

-   Present your key findings in a clear and concise manner.

-   Use visuals to support your claims.

-   **Tell a story about what the data reveals.**

```{r}

```

### Conclusion

-   Summarize your key findings.

-   Discuss the implications of your results.

## References