tutorials/intro.md
Lines changed: 59 additions & 26 deletions (+59 -26)
Original file line number
Diff line number
Diff line change
@@ -161,7 +161,10 @@ GitHub and GitLab also provide continuous integration and continuous deployment
161
161
**An example of Continuous deployment:**
162
162
* When you are ready to release your package to PyPI, a continuous deployment operation might be triggered on release to publish your package to PyPI.
163
163
164
-
Integrated CI/CD will help you maintain your software ensuing that changes to the code don't break things unexpectedly and also maintain a style and format consistency.
164
+
Integrated CI/CD will help you maintain your software, ensuring that
165
+
changes to the code don't break things unexpectedly. It can also
166
+
help you maintain code style and format consistency for every new
167
+
change to your code.
165
168
166
169
:::{figure-md} packaging-workflow
167
170
@@ -170,43 +173,36 @@ Integrated CI/CD will help you maintain your software ensuing that changes to th
170
173
The lifecycle of a scientific Python package.
171
174
:::
172
175
173
-
## What should code in a Python package look like?
176
+
## When should you turn your code into a Python package?
174
177
175
-
Ideally the code in your Python package is general. This means it
176
-
can be used on different data or for different scientific applications. An example
177
-
of a package that is written in a generalized way is matplotlib.
178
+
You may be wondering what types of code should become a Python package that is both hosted on GitHub and published to PyPI and/or conda-forge.
178
179
179
-
matplotlib does
180
-
one (big important) thing really well:
180
+
There are a few use cases to consider:
181
181
182
-
*It creates visual plots of data.*
182
+
1. **Creating a basic package for yourself:** Sometimes you want to create a package for your own personal use. This might mean making your code locally pip installable, and you may also want to publish it to GitHub. In that case you don't expect others to use your code, so you may only need documentation for yourself and your future self in case you need to update the package.
183
183
184
-
Matplotlib is used by thousands of users for different plotting applications
185
-
using different types of data. While few scientific packages will have the same
186
-
broad application as tools like matplotlib or NumPy, the
187
-
idea of code being used for something more than a single workflow still applies
188
-
to package development if you want other people to use your package.
184
+
> An example of this type of package might be a set of functions that you write that are useful across several of your projects. It could be useful to have those functions available to all of your projects.
189
185
190
-
### Code should also be clean & readable & documented
186
+
:::{todo}
187
+
LINK to pip installable lesson when it's published - it's in review now
188
+
:::
191
189
192
-
The code in your package should also be clean, readable and well documented.
190
+
2. In other cases, you may create some code that you soon realize might be useful not just to you, but to other people as well.
191
+
In that case, you might consider creating the package and publishing it on GitHub. Because other users may be using it, you may also make use of GitHub's infrastructure, including CI/CD pipelines and issue trackers. Because you want other people to use your package, you will also want to include LICENSE information, documentation for users and contributors, and tests. This type of package is most often published to PyPI.
193
192
194
-
**Clean code:** Clean code refers to code that uses expressive variable names,
195
-
is concise and does not repeat itself. We will dive deeper into best practices
196
-
for clean code in future pyOpenSci tutorials.
193
+
For example, all of the [pyOpenSci packages](https://www.pyopensci.org/python-packages.html) are public facing with an intended audience beyond just the maintainers.
197
194
198
-
**Readable code:** Readable code is code written with a consistent style.
199
-
You can use linters and code formatters such as black and flake8 to ensure
200
-
this consistency throughout your entire package. [Learn more about code formatters here.](../package-structure-code/code-style-linting-format.html)
195
+
### Packages that you expect others to use should be well-scoped
196
+
197
+
Ideally the code in your Python package is focused on a specific theme or use case. This theme is important as it's a way to scope the content of your package.
198
+
199
+
It can be tricky to decide when your code becomes something that might be more broadly useful to others. One question you can ask yourself is: is your code written specifically for a single research project, or could it have a broader application across multiple projects in your domain?
201
200
202
-
**Documented code:** documented code is written using docstrings that help a
203
-
user understand both what the functions and methods in your code does and also
204
-
what the input and output elements of each function is. [You can learn more about docstrings in our guide, here.](../documentation/write-user-documentation/document-your-code-api-docstrings)
205
201
206
-
:::{admonition} Where do research compendia fit in?
202
+
:::{admonition} How does this relate to code for a research project?
207
203
:class: note
208
204
209
-
A Research Compendium is an organized set of code, data and documentation that
205
+
A [Research Compendium](https://the-turing-way.netlify.app/reproducible-research/compendia.html#basic-compendium) is an organized set of code, data and documentation that
210
206
supports a specific research project. It aims to enhance the reproducibility and
211
207
transparency of research by providing a comprehensive record of the methods,
212
208
data, and analyses used in a study.
@@ -216,8 +212,45 @@ specific set of tasks that can be applied across numerous research projects.
216
212
As such a Python package is more generalizable than a Research Compendium
217
213
which supports a specific project.
218
214
215
+
* [Read about `Good enough practices in scientific computing`](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510)
216
+
* [Learn more about research compendia (also called repro-packs) in this blog post.](https://lorenabarba.com/blog/how-repro-packs-can-save-your-future-self/)
219
217
:::
220
218
219
+
220
+
Below are a few examples of well-scoped pyOpenSci packages:
221
+
222
+
* [Crowsetta](https://crowsetta.readthedocs.io/en/latest/) is a package designed to work with annotations of animal vocalizations and bioacoustics data. This package helps scientists process different types of bioacoustic data rather than focusing on a single research application tied to one user's workflow.
223
+
* [pandera](https://www.union.ai/pandera) is another more broadly used Python package. Pandera supports data testing and thus also has a broader research application.
224
+
225
+
:::{admonition} Matplotlib as an example
226
+
227
+
At the larger end of the user spectrum, Matplotlib is a great example.
228
+
Matplotlib does one (big important) thing really well:
229
+
230
+
*It creates visual plots of data.*
231
+
232
+
Matplotlib is used by thousands of users for different plotting applications
233
+
using different types of data. While few scientific packages will have the same
234
+
broad application and large user base as tools like Matplotlib, the
235
+
idea of scoping out what your package does is still important.
236
+
:::
237
+
238
+
### Code should also be clean & readable & documented
239
+
240
+
The code in your package should also be clean, readable and well documented.
241
+
242
+
**Clean code:** Clean code refers to code that uses expressive variable names,
243
+
is concise and does not repeat itself. We will dive deeper into best practices
244
+
for clean code in future pyOpenSci tutorials.
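
As a rough illustration of the points above, the sketch below uses expressive names and puts repeated logic into one function rather than writing it out twice. The function `normalize` and the variables `rainfall_mm` and `temperature_c` are hypothetical names made up for this example; they do not come from any pyOpenSci package.

```python
def normalize(values):
    """Rescale a list of numbers to the 0-1 range."""
    lowest, highest = min(values), max(values)
    # Assumes the values are not all identical (otherwise this divides by zero).
    return [(value - lowest) / (highest - lowest) for value in values]


# Hypothetical example data with descriptive variable names.
rainfall_mm = [0.0, 5.0, 10.0]
temperature_c = [10.0, 15.0, 20.0]

# The same function is reused instead of repeating the arithmetic.
normalized_rainfall = normalize(rainfall_mm)
normalized_temperature = normalize(temperature_c)
```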
245
+
246
+
**Readable code:** Readable code is code written with a consistent style.
247
+
You can use linters and code formatters such as black and flake8 to ensure
248
+
this consistency throughout your entire package. [Learn more about code formatters here.](../package-structure-code/code-style-linting-format.html)
249
+
250
+
**Documented code:** documented code is written using docstrings that help a
251
+
user understand both what the functions and methods in your code do and also
252
+
what the input and output elements of each function are. [You can learn more about docstrings in our guide, here.](../documentation/write-user-documentation/document-your-code-api-docstrings)
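
A minimal sketch of what a documented function might look like is shown below. It assumes the NumPy docstring style, which is one common convention and not the only option, and the function `celsius_to_fahrenheit` is a hypothetical example rather than code from a real package.

```python
def celsius_to_fahrenheit(temp_celsius):
    """Convert a temperature from Celsius to Fahrenheit.

    Parameters
    ----------
    temp_celsius : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in degrees Fahrenheit.
    """
    return temp_celsius * 9 / 5 + 32
```

The docstring states what the function does, what input it expects and what it returns, so a user can call `help(celsius_to_fahrenheit)` without reading the implementation.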
253
+
221
254
## Making your package installable - publishing to PyPI & conda-forge