Description
Submitting Author: Raktim Mukhopadhyay (@rmj3197)
All current maintainers: @rmj3197, @giovsaraceno
Package Name: QuadratiK
One-Line Description of Package: QuadratiK includes test for multivariate normality, test for uniformity on the sphere, non-parametric two- and k-sample tests, random generation of points from the Poisson kernel-based density and clustering algorithm for spherical data.
Repository Link: https://github.com/rmj3197/QuadratiK
Version submitted: 1.1.0
EIC: @Batalex
Editor: @isabelizimm
Reviewer 1: @acolum
Reviewer 2: @ab93
Archive: https://zenodo.org/records/14546750
JOSS DOI: TBD
Version accepted: 1.1.2
Date accepted (month/day/year): 12/03/2024
Code of Conduct & Commitment to Maintain Package
- I agree to abide by pyOpenSci's Code of Conduct during the review process and, should my package be accepted, in maintaining it afterward.
- I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.
Description
- Include a brief paragraph describing what your package does:
Documentation link : https://quadratik.readthedocs.io/en/latest/
We introduce the QuadratiK package, which incorporates innovative data analysis methodologies. The presented software, implemented in both R and Python, offers a comprehensive set of novel goodness-of-fit tests and clustering techniques using kernel-based quadratic distances. Our software implements one, two and k-sample tests for goodness of fit, providing an efficient and mathematically sound way to assess the fit of probability distributions. Expanded capabilities of our software include supporting tests for uniformity on the d-dimensional sphere. The R and Python packages serve as a powerful suite of tools, offering researchers and practitioners the means to delve deeper into their data, draw robust inference, and conduct potentially impactful analyses across a wide array of disciplines.
Scope
- Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
- Data retrieval
- Data extraction
- Data processing/munging
- Data deposition
- Data validation and testing
- Data visualization [1]
- Workflow automation
- Citation management and bibliometrics
- Scientific software wrappers
- Database interoperability
Domain Specific
- Geospatial
- Education
Community Partnerships
If your package is associated with an existing community please check below:
- Astropy:My package adheres to Astropy community standards
- Pangeo: My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook
-
For all submissions, explain how and why the package falls under the categories you indicated above. In your explanation, please address the following points (briefly, 1-2 sentences for each):
- Who is the target audience and what are scientific applications of this package?
- The QuadratiK package offers robust tools for goodness-of-fit testing, a fundamental aspect in statistical analysis, where accurately assessing the fit of probability distributions is essential. This is especially critical in research domains where model accuracy has direct implications on conclusions and further research directions.
- Spherical data structures are common in fields such as biology, geosciences and astronomy, where data points are naturally mapped to a sphere. QuadratiK provides a tailored approach to effectively handle and interpret these data.
- This package is also of particular interest to professionals in health and biological sciences, where understanding and interpreting spherical data can be crucial in studies ranging from molecular biology to epidemiology and public health.
- Are there other Python packages that accomplish the same thing? If so, how does yours differ?
- SciPy and hyppo also have collections of goodness-of-fit test functionalities. Our interest focuses on tests that are based on the family of kernel-based quadratic distances. The kernels we use are diffusion kernels, that is, probability distributions that depend on a tuning parameter and satisfy the convolution property. We also implement the Poisson kernel-based tests for uniformity on the d-dimensional sphere.
- We are aware of only a limited number of Python libraries that offer spherical clustering capabilities, such as spherecluster (last updated in November 2018) and soyclustering (last updated in May 2020). spherecluster implements Spherical K-Means and clustering using von Mises-Fisher distributions, as proposed in Banerjee, Arindam, et al. "Clustering on the Unit Hypersphere using von Mises-Fisher Distributions." Journal of Machine Learning Research 6.9 (2005). soyclustering implements spherical k-means for document clustering, as proposed in Kim, Hyunjoong, Han Kyul Kim, and Sungzoon Cho. "Improving spherical k-means for document clustering: Fast initialization, sparse centroid projection, and efficient cluster labeling." Expert Systems with Applications 150 (2020): 113288.
- In summary, the fundamental differences between QuadratiK and existing packages are as follows:
- The GOF tests are U-statistics based on centered kernels. The concept and methodology of centering is unique to our methods and is not part of the methods appearing in existing packages.
- An algorithm is provided for connecting the tuning parameter with the statistical properties of the test, namely power and degrees of freedom (DOF). This feature differentiates our novel methods from methods in other packages.
- A new clustering algorithm for data that reside on the sphere using the Poisson kernel-based densities is offered. This aspect is not a feature of the existing packages.
- We also offer algorithms for generating random samples from Poisson kernel-based densities. This capability is also unique to our package.
- We also implement a GUI, built with the streamlit library, to enable interaction with the software in a non-programmatic manner. We have not found any Python package that implements a GUI for the tasks described above.
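To make the idea of a kernel-based quadratic distance concrete, here is a minimal sketch of a two-sample U-statistic built from a normal kernel with tuning parameter h. It is an illustration only and not QuadratiK's API or implementation: the function names `normal_kernel` and `quadratic_distance_two_sample` are hypothetical, the kernel is not centered, and no tuning-parameter selection is performed.

```python
import math
import random


def normal_kernel(x, y, h):
    """Normal density with covariance h^2 * I, evaluated at x - y.

    h is the tuning parameter (hypothetical helper, not a QuadratiK function).
    """
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    norm_const = (2.0 * math.pi * h * h) ** (-d / 2.0)
    return norm_const * math.exp(-sq_dist / (2.0 * h * h))


def quadratic_distance_two_sample(xs, ys, h):
    """U-statistic estimate of a kernel-based quadratic distance between two samples.

    Uses the plain (non-centered) kernel, so this is a simplification of the
    centered-kernel statistics described above.
    """
    n, m = len(xs), len(ys)
    # Average kernel value over distinct pairs within the first sample.
    within_x = sum(
        normal_kernel(xs[i], xs[j], h)
        for i in range(n) for j in range(n) if i != j
    ) / (n * (n - 1))
    # Average kernel value over distinct pairs within the second sample.
    within_y = sum(
        normal_kernel(ys[i], ys[j], h)
        for i in range(m) for j in range(m) if i != j
    ) / (m * (m - 1))
    # Average kernel value over all cross-sample pairs.
    cross = sum(normal_kernel(x, y, h) for x in xs for y in ys) / (n * m)
    return within_x + within_y - 2.0 * cross


def sample_normal(rng, size, dim, shift=0.0):
    """Draw `size` points from N(shift * 1, I) in `dim` dimensions."""
    return [tuple(rng.gauss(shift, 1.0) for _ in range(dim)) for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = sample_normal(rng, 60, 2)
    y_same = sample_normal(rng, 60, 2)
    y_shifted = sample_normal(rng, 60, 2, shift=3.0)
    print("same distribution:   ", quadratic_distance_two_sample(x, y_same, h=1.0))
    print("shifted distribution:", quadratic_distance_two_sample(x, y_shifted, h=1.0))
```

The statistic stays near zero when both samples come from the same distribution and grows when they differ. In an actual test, the critical value would still have to be obtained separately, for example by resampling.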
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted:
Please see our pre-submission enquiry for this submission: Pre-submission Inquiry for QuadratiK #168
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
- does not violate the Terms of Service of any service it interacts with.
- uses an OSI approved license.
- contains a README with instructions for installing the development version.
- includes documentation with examples for all functions.
- contains a tutorial with examples of its essential functions and uses.
- has a test suite.
- has continuous integration set up, such as GitHub Actions, CircleCI, and/or others.
Publication Options
- Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Checks
- The package has an obvious research application according to JOSS's definition in their submission requirements. Be aware that completing the pyOpenSci review process does not guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS.
- The package is not a "minor utility" as defined by JOSS's submission requirements: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
Note: JOSS accepts our review as theirs. You will NOT need to go through another full review. JOSS will only review your paper.md file. Be sure to link to this pyOpenSci issue when a JOSS issue is opened for your package. Also be sure to tell the JOSS editor that this is a pyOpenSci reviewed package once you reach this step.
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
- Yes I am OK with reviewers submitting requested changes as issues to my repo. Reviewers will then link to the issues in their submitted review.
Confirm each of the following by checking the box.
- I have read the author guide.
- I expect to maintain this package for at least 2 years and can help find a replacement for the maintainer (team) if needed.
Please fill out our survey
- Last but not least, please fill out our pre-review survey. This helps us track
submissions and improve our peer review process. We will also ask our reviewers
and editors to fill this out.
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The editor template can be found here.
The review template can be found here.
Footnotes
1. Please fill out a pre-submission inquiry before submitting a data visualization package.