File: `_posts/2025-01-17-quadratik.md` (21 additions, 21 deletions)
@@ -34,9 +34,9 @@ last_modified: 2025-01-17

## Goodness-of-Fit (GoF) Tests

Goodness-of-Fit (GoF) tests are classical tools for assessing the compatibility of data with a given probability model. GoF tests typically compute a distance-like metric between the null distribution and the observations, and reject the null hypothesis if that distance exceeds a critical value.
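
To make this recipe concrete, here is a minimal sketch of a distance-based GoF test. It is not the kernel-based quadratic distance used by `QuadratiK`; as a stand-in, it uses the Kolmogorov-Smirnov distance to a standard normal null and estimates the critical value by simulating from the null. The function names `ks_distance` and `monte_carlo_gof`, and the values of `n_sim` and `alpha`, are illustrative assumptions.

```python
import math

import numpy as np


def ks_distance(sample):
    """Largest gap between the empirical CDF and the standard normal CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    null_cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])
    upper = np.arange(1, n + 1) / n - null_cdf  # ECDF just after each point
    lower = null_cdf - np.arange(0, n) / n      # ECDF just before each point
    return float(max(upper.max(), lower.max()))


def monte_carlo_gof(sample, n_sim=500, alpha=0.05, seed=0):
    """Reject the null when the distance exceeds a simulated critical value."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    # Distribution of the distance when the null hypothesis is true.
    null_stats = np.array([ks_distance(rng.standard_normal(n)) for _ in range(n_sim)])
    critical_value = float(np.quantile(null_stats, 1.0 - alpha))
    statistic = ks_distance(sample)
    return statistic, critical_value, statistic > critical_value


rng = np.random.default_rng(1)
stat_null, cv, reject_null = monte_carlo_gof(rng.standard_normal(200))
stat_alt, _, reject_alt = monte_carlo_gof(rng.standard_normal(200) + 1.0)

print(f"critical value: {cv:.3f}")
print(f"standard normal sample: distance = {stat_null:.3f}, reject = {reject_null}")
print(f"shifted sample:         distance = {stat_alt:.3f}, reject = {reject_alt}")
```

The same structure carries over to other distances: only the statistic changes, while the critical value still comes from the distribution of that statistic under the null.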

The methods for the normality, two-sample, and k-sample tests use a bandwidth parameter `h`. We also provide an algorithm for determining the optimal value of `h` based on mid-power analysis (see Markatou and Saraceno (2024)). More details on the algorithm are available in our [manual](https://quadratik.readthedocs.io/en/latest/user_guide/hselect.html).
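
To build some intuition for what a bandwidth does, the sketch below evaluates a Gaussian kernel for a few values of `h`. This is an illustration only: the helper `gaussian_kernel_matrix` and the parametrization `exp(-||x - y||^2 / (2 h^2))` are assumptions made for the example, and it does not reproduce the selection algorithm described in the manual.

```python
import numpy as np


def gaussian_kernel_matrix(x, h):
    """Pairwise Gaussian kernel values exp(-||xi - xj||^2 / (2 h^2))."""
    x = np.asarray(x, dtype=float)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * h ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal((50, 2))
off_diag = ~np.eye(len(x), dtype=bool)  # mask that drops the diagonal

# Average similarity between distinct points for a small, medium and large h.
mean_similarity = {
    h: float(gaussian_kernel_matrix(x, h)[off_diag].mean()) for h in (0.1, 1.0, 10.0)
}
for h, value in mean_similarity.items():
    print(f"h = {h:>4}: mean off-diagonal kernel value = {value:.3f}")
```

With a very small `h` almost every pair of points looks dissimilar, and with a very large `h` almost every pair looks identical. In both extremes a kernel-based distance has little information to work with, which is why the choice of `h` matters.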

In this section, the various GoF tests are shown with corresponding examples.

<img src="/images/quadratik/normality-test-results.jpg" alt="Results for the Normality Test." />
</picture>
</figure>

The test correctly fails to reject the null hypothesis, because the samples were generated from a standard normal distribution.

### Two-Sample Test
The two-sample GoF test is used to determine whether two separate samples are likely drawn from the same population distribution.

To illustrate the two-sample test, we generate n = 200 random samples from a multivariate standard normal distribution and from a skew-normal distribution with skewness parameter lambda = 0.1.
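
For readers who want to see the mechanics of a kernel two-sample test, here is a rough, self-contained sketch. It is not the `QuadratiK` implementation: it uses a Gaussian-kernel discrepancy between the two samples and a permutation p-value. The skew-normal sampler draws each coordinate independently, which is a simplifying assumption, and the helper names, `h = 1.0` and `n_perm = 200` are all arbitrary choices for illustration.

```python
import numpy as np


def sample_skew_normal(n, d, lam, rng):
    """n points in d dimensions with independent skew-normal coordinates."""
    delta = lam / np.sqrt(1.0 + lam ** 2)
    u0 = np.abs(rng.standard_normal((n, d)))
    u1 = rng.standard_normal((n, d))
    return delta * u0 + np.sqrt(1.0 - delta ** 2) * u1


def gaussian_kernel(z, h):
    """Pairwise Gaussian kernel matrix with bandwidth h."""
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * h ** 2))


def permutation_p_value(x, y, h=1.0, n_perm=200, seed=0):
    """Permutation p-value for a kernel discrepancy between x and y."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    k = gaussian_kernel(pooled, h)  # computed once, re-indexed per permutation
    n = len(x)

    def stat(idx):
        a, b = idx[:n], idx[n:]
        within = k[np.ix_(a, a)].mean() + k[np.ix_(b, b)].mean()
        return within - 2.0 * k[np.ix_(a, b)].mean()

    observed = stat(np.arange(len(pooled)))
    perm_stats = np.array([stat(rng.permutation(len(pooled))) for _ in range(n_perm)])
    return float((1 + np.sum(perm_stats >= observed)) / (n_perm + 1))


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
y = sample_skew_normal(200, 2, lam=0.1, rng=rng)

p_mild = permutation_p_value(x, y)
p_shifted = permutation_p_value(x, y + 1.0)  # an obviously different sample
print(f"p-value, skew-normal vs normal: {p_mild:.3f}")
print(f"p-value, shifted vs normal:     {p_shifted:.3f}")
```

The outcome for the mildly skewed sample depends on the statistic, the bandwidth and the random seed, so this sketch should not be expected to reproduce the package's result.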

<img src="/images/quadratik/two-sample-test-results.png" alt="Results for the Two Sample Test." />
</picture>
</figure>

The test rejects the null hypothesis, as the samples have been generated from two different distributions.

### K-Sample Test

Similar to the two-sample test, the k-sample test examines whether k groups of samples are obtained from the same distribution.

To illustrate the k-sample test, we use the glass identification dataset from the [UCI ML repository](https://archive.ics.uci.edu/dataset/42/glass+identification), restricted to the first three classes of glass types.
@@ -154,9 +154,9 @@ The null hypothesis is rejected for the k-sample test indicates that there is **
### Uniformity Test on the Sphere

In this section, we test the null hypothesis of uniformity on the sphere and illustrate the test with an example.

The data for this example are generated from a multivariate standard normal distribution, and each generated vector is then divided by its L2 norm. The resulting points are uniformly distributed on the surface of the unit sphere.
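
The construction just described takes only a few lines of NumPy. The sample size, dimension and seed below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw standard normal vectors, then project each one onto the unit sphere.
gaussian = rng.standard_normal((500, 3))
points = gaussian / np.linalg.norm(gaussian, axis=1, keepdims=True)

norms = np.linalg.norm(points, axis=1)
mean_direction = points.mean(axis=0)

print("all points have unit norm:", bool(np.allclose(norms, 1.0)))
print("length of the mean direction:", float(np.linalg.norm(mean_direction)))
```

This works because the standard normal distribution is rotationally symmetric, so no direction is favored after normalization. A mean direction close to the origin is consistent with uniformity, although it is only an informal check and not a test.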

The image is segmented into k clusters with k ranging from 2 to 8. Below, we display the regions identified for each value of k.

<figure style="float: center;">
<picture>
@@ -258,7 +258,7 @@ The image is segmented into k clusters with k ranging from 2 to 8. Below, we dis
</picture>
</figure>

On closer examination, the segmented images show only minor changes in the identified segments from k = 5 onward. Let us see whether the elbow plots support this observation.

The elbow plots show a clear elbow at k = 5, which aligns with our observation that all regions of the image are effectively identified at this value of k.
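
For readers unfamiliar with elbow plots, the sketch below shows how such a curve is computed. It deliberately swaps in plain k-means on synthetic 2-D points for the spherical clustering of image pixels used above, so it illustrates the idea and not the package's method. The helper `kmeans_wcss` and the synthetic data with five groups are assumptions made for the example.

```python
import numpy as np


def kmeans_wcss(data, k, n_init=5, n_iter=50, seed=0):
    """Lowest within-cluster sum of squares over several k-means runs."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        # Start from k distinct data points chosen at random.
        centers = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(n_iter):
            dists = np.sum((data[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
            labels = dists.argmin(axis=1)
            # Keep the old center if a cluster ends up empty.
            new_centers = np.array([
                data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        dists = np.sum((data[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        best = min(best, float(dists.min(axis=1).sum()))
    return best


# Synthetic data: five well-separated groups arranged on a circle.
rng = np.random.default_rng(42)
angles = 2.0 * np.pi * np.arange(5) / 5
true_centers = 10.0 * np.column_stack([np.cos(angles), np.sin(angles)])
data = np.vstack([c + 0.5 * rng.standard_normal((60, 2)) for c in true_centers])

wcss = {k: kmeans_wcss(data, k) for k in range(2, 9)}
for k, value in wcss.items():
    print(f"k = {k}: within-cluster sum of squares = {value:.1f}")
```

Plotting these values against k gives the elbow plot: the curve falls steeply while additional clusters capture real structure and flattens once they only split existing groups.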

The clustering algorithm proposed in Golzy and Markatou has been used in other works such as Golzy et al. (2023), Strelnikoff et al. (2020), and Strelnikoff et al. (2024).

The generated samples can also be visualized on the unit sphere.

```python
import matplotlib.pyplot as plt
# ... (intermediate plotting code is not shown in this diff view) ...
plt.tight_layout()
```

<br>

More details on Poisson Kernel-Based Distributions can be found in the [package documentation](https://quadratik.readthedocs.io/en/latest/user_guide/pkbd.html).

## Dashboard
@@ -366,7 +366,7 @@ UI().run()
`QuadratiK` gives researchers and practitioners methods to delve deeper into their data, draw robust inferences, and conduct potentially impactful analyses across a wide array of disciplines. The `QuadratiK` package is also available in `R` and is hosted on [CRAN](https://cran.r-project.org/web/packages/QuadratiK/index.html). You can learn more about `QuadratiK` in our [arXiv preprint](https://arxiv.org/abs/2402.02290). Additional theoretical papers of interest are listed in the reference section.

Please feel free to reach out to me at raktimmu at buffalo.edu.

Thank you! Happy coding to you — may your bugs be few, and your data ever insightful! 🚀😊
@@ -383,7 +383,7 @@ Thank you! Happy coding to you — may your bugs be few, and your data ever insi
- Markatou, M., & Saraceno, G. (2024). A unified framework for multivariate two-sample and k-sample kernel-based quadratic distance goodness-of-fit tests. DOI: 10.48550/arXiv.2407.16374v1

- Golzy, M., Rosen, G. H., Kruse, R. L., Hooshmand, K., Mehr, D. R., & Murray, K. S. (2023). Holistic assessment of quality of life predicts survival in older patients with bladder cancer. Urology, 174, 141-149.

- Strelnikoff, S., Jammalamadaka, A., & Warmsley, D. (2020, December). Causal maps for multi-document summarization. In 2020 IEEE International Conference on Big Data (Big Data) (pp. 4437-4445). IEEE.

- Strelnikoff, S., Jammalamadaka, A., & Warmsley, D. M. (2024). U.S. Patent No. 11,907,307. Washington, DC: U.S. Patent and Trademark Office.