Commit e1269bdb authored by Theresa Mieslinger

implement Bjorn's review

parent c9d26996
1 merge request: !57 First draft for Good Scientific and Coding Practice
Pipeline #72929 passed
......@@ -24,15 +24,15 @@ Principles formulated by the research community that define ***proper research be
:::
*copied from [European Code of Conduct for Research Integrity](https://allea.org/wp-content/uploads/2023/06/European-Code-of-Conduct-Revised-Edition-2023.pdf)*
## The pillars of Good Scientific Practice
## Agenda for this lecture
* **Reliability & Reproducibility**
* *primary data*
* *data management*
* Honesty
* (Honesty)
* **Respect & Accountability**
* *authorship*
* *citation*
:::notes
* we'll cover the topics of primary data, authorship and licenses
:::
# Reliability & Reproducibility
......@@ -49,14 +49,60 @@ Principles formulated by the research community that define ***proper research be
## What do we need to save and how? {.special}
### Primary Data
::: {.fragment}
* code, configuration, input data
* data management
## Data
:::: {.columns}
::: {.column width="50%"}
**Primary data**
* observational, experimental data
* code base / software version
* configuration
* input data: initial / boundary conditions
:::
::: {.column width="50%"}
**Derived data**
* previously published data
* publicly available data, e.g. most satellite data
* model output
:::
::::
::: {.notes}
* What is needed to reproduce the argument?
* Primary data is typically published for the first time and cannot be re-generated / measured again.
* Derived data is easy to reproduce from accessible sources.
:::
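
A minimal sketch of how the primary-data items listed above (code version, configuration, input data) could be recorded next to a result. The helper names `sha256_of` and `provenance_record`, the field names, and the example values are assumptions for illustration, not an established standard.

```python
import hashlib
import json
import platform


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw bytes, e.g. the content of an input file."""
    return hashlib.sha256(data).hexdigest()


def provenance_record(code_version: str, config: dict, input_data: bytes) -> dict:
    """Collect what is needed to reproduce a result (hypothetical layout)."""
    return {
        "code_version": code_version,  # e.g. a Git commit hash or release tag
        "python_version": platform.python_version(),
        "config": config,  # the configuration used for this run
        "input_sha256": sha256_of(input_data),  # fingerprint of the input data
    }


# Hypothetical values; in practice they come from the actual run.
record = provenance_record("v1.0.0", {"resolution_km": 5}, b"initial conditions")
print(json.dumps(record, indent=2, sort_keys=True))
```

Storing such a record as a JSON file next to the output makes it easier to answer later which code, configuration, and input produced a given figure.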
## (Meta)data and [FAIR principles](https://www.go-fair.org/fair-principles/)
* **Findable**: unique identifiers, metadata registered in a searchable resource
* **Accessible**: (meta)data retrievable via a standardized communication protocol
* **Interoperable**: compatibility with other data through, e.g., [CF conventions](http://cfconventions.org/) and common data formats such as `netCDF`, `zarr`, `csv`
* **Reusable**: (meta)data description, attributes, data usage license
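
As a rough illustration of the four principles, the sketch below checks a metadata dictionary for one field per principle. The field names in `REQUIRED_FIELDS` and the function `missing_fair_fields` are hypothetical; real repositories and conventions define their own required attributes.

```python
# One illustrative metadata field per FAIR principle (assumed names, not a standard).
REQUIRED_FIELDS = {
    "identifier": "Findable: a unique, persistent identifier such as a DOI",
    "access_url": "Accessible: where the (meta)data can be retrieved",
    "format": "Interoperable: a common data format such as netCDF, zarr or csv",
    "license": "Reusable: a data usage license",
}


def missing_fair_fields(metadata: dict) -> list[str]:
    """Return the names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# Hypothetical dataset description with two fields still missing.
metadata = {"identifier": "doi:10.0000/example", "format": "netCDF"}
for name in missing_fair_fields(metadata):
    print(f"missing '{name}' ({REQUIRED_FIELDS[name]})")
```

A check like this only covers the presence of metadata; whether the data are actually retrievable and well described still has to be verified by a person.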
## "FAIR is not fun, but fun is FAIR"
:::: {.columns}
::: {.column width="50%"}
**Issues with FAIR data**
* data availability not guaranteed
* access possible only with credentials
* DOIs don't point to data, but only to landing pages
:::
::: {.column width="50%"}
**Beyond FAIR data**
* openly accessible
* analysis-ready cloud-optimized data formats ([Abernathey et al., 2021](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9354557))
:::
::::
::: {.notes}
* “Cloud-ready”: it should be possible to open datasets directly from any other server, e.g. via OPeNDAP
* cloud-optimized (zarr, TileDB, …)
:::
## What should we document? {.special}
......@@ -64,7 +110,7 @@ Principles formulated by the research community that define ***proper research be
*intent and usage*
:::
##
### Dokumentation
### Documentation
| self-explanatory code | [Commenting Showing Intent (CSI)](https://standards.mousepawmedia.com/en/stable/csi.html) | Docstrings & Manuals
| -----------| ----------- | ----------- |
......@@ -100,6 +146,10 @@ int items_per_box = floor(items/17)
:::
:::
:::smaller
Examples from [MousePaw Media Standards](https://standards.mousepawmedia.com/en/stable/csi.html)
:::
::: {.notes}
* don't state the obvious
:::
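
A small sketch of the difference between stating the obvious and stating intent. The function `boxes_needed` and the box capacity of 12 are hypothetical and are not taken from the MousePaw Media examples.

```python
import math


def boxes_needed(items: int, box_capacity: int = 12) -> int:
    """Return the number of boxes required to ship `items` items."""
    # A comment that states the obvious would be:
    #   "divide items by box_capacity and round up".
    # A comment that shows intent explains why:
    #   a partly filled box still has to be shipped, so we round up.
    return math.ceil(items / box_capacity)


print(boxes_needed(25))
```

The intent comment stays valid even if the implementation changes, for example to integer arithmetic, whereas the obvious comment would have to be rewritten.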
......@@ -182,66 +232,27 @@ See also Python docstring conventions [PEP-257](https://peps.python.org/pep-0257
:::
::::
## (Meta)data and FAIR principles
* **Findable**: unique identifiers, metadata registered in a searchable resource
* **Accessible**: (meta)data retrievable via stnadardised communication protocol
* **Interoperable**: compatibility with other data through, e.g. [CF conventions](http://cfconventions.org/), common data formats `netCDF`, `zarr`, `csv`.
* **Reusable**: (meta)data description, attributes, data usage license
## Which tools shall we use? {.special}
:::{.info .smaller}
[FAIR principles](https://www.go-fair.org/fair-principles/)
:::
## Open Source / Open Development
*Open source is a decentralized software development model that encourages collaboration.*
## Beyond FAIR data
:::: {.columns}
::: {.column width="50%"}
**Issues with FAIR data**
* data availability not guaranteed
* accessibility only with credentials possible
* DOIs don't point to data, but only to landingpages
:::
::: {.column width="50%"}
**Make it FUN :)**
to work with the data
* openly accessible
* analysis-ready cloud-optimized data formats ([Abernathey et al., 2021](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9354557))
:::
::::
::: {.notes}
* “Cloud-ready”: it should be possible to open datasets directly from any other server, e.g. via OPeNDAP
* cloud-optimized (zarr, TileDB, …)
:::
## Which tools shall we use? {.leftalign}
:::fragment
### Open Source / Open Collaboration
*Open source is a decentralized software development model that encourages open collaboration.*
:::
:::fragment
> [It] relies on **goal-oriented** yet loosely coordinated participants who **cooperate voluntarily** to create a product (or service) of economic value, which is made **freely available** to contributors and noncontributors alike. [Levine and Prietula, 2013](https://pubsonline.informs.org/doi/10.1287/orsc.2013.0872)
:::
## Which tools shall we use? {.leftalign}
### Trustworthy sources {.leftalign}
:::fragment
## Trustworthy sources
* open source products
* products that are used by a critical mass of users (*"democracy works"*)
* software products under active development and maintenance, e.g. as shown by the Git commit history
* proper documentation
:::
::: {.notes}
* examples: NCL, pyicon
* you are responsible for the result!
* try to avoid supply-chain bugs
:::
## Summary on Reliability & Reproducibility
## Summary on Reliability & Reproducibility {.special}
* save the primary data needed to reproduce the argument of your scientific study
* go beyond FAIR data by creating user-friendly datasets and making them publicly available
* use trustworthy sources
......@@ -250,7 +261,7 @@ to work with the data
# Respect & Accountability
*What shall we credit and how?*
## Authorship versus Acknowledgements
## Authorship
:::fragment
> Authorship provides credit for a researcher's contributions to a study and carries accountability. [-- Nature](https://www.nature.com/nature-portfolio/editorial-policies/authorship)
......@@ -258,15 +269,14 @@ to work with the data
> To protect the integrity of authorship, only persons who have significantly contributed to the research and paper preparation should be listed as authors. [-- ACP](https://publications.copernicus.org/for_authors/obligations_for_authors.html)
:::
## Authorship
*Which contributions qualify for authorship?*
## Which contributions qualify for authorship? {.special}
## Authorship
:::leftalign
**Substantial contributions to**
* the conception or design of the work
* the acquisition, analysis, or interpretation of data for the work
* drafting the work or reviewing it critically for important intellectual content
:::
:::fragment
......@@ -274,11 +284,16 @@ to work with the data
:::
:::notes
* examples: PhD supervisor, technical assistance (setting up Python, redesigning data, data papers?), financial / administrative support, hierarchical / power positions
* approval of the published work
* Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
* includes being accountable for data / code!
:::
## Integrity of Authorship
* Credit for knowledge contributions should be attributed accurately and fairly.
* Adding gift or honorary authors diminishes the actual contributions of others and is thus unfair.
## Acknowledgements
:::leftalign
......@@ -287,7 +302,7 @@ Typically used to acknowledge
* acquisition of funding
* data sources / providers
* software
* general supervision of a research group or general administrative support
* administrative support
* writing or coding assistance
:::
:::notes
......@@ -337,9 +352,14 @@ Two main legal concepts relevant for software:
* **Copyleft / "Share-alike" licenses** ensure that any derivative work of a piece of software adopts the same license type, typically an open-source license (e.g. GPL, LGPL)
* **Proprietary / non-reuse licenses** restrict users from accessing, modifying, and redistributing the software.
:::info
[Further reading](https://osssoftware.org/blog/open-source-software-licenses-explained-a-beginners-overview/)
:::
:::notes
* if you upload code to GitHub without stating a license, in principle, nobody is allowed to use it
* GNU General Public License (GPL)-licensed code cannot be integrated with proprietary closed-source code. mostly used for Linux software and for ICON! :)
* ICON has BSD
* GNU General Public License (GPL)-licensed code cannot be integrated with proprietary closed-source code; it is mostly used for Linux software
* GPL, AGPL - require sharing of modified source code; LGPL allows linking with proprietary code
* changing licenses can be a huge challenge.
* Unlike proprietary software that places restrictions on usage and distribution, open source software guarantees end users the freedom to use, modify, and share the software.
......@@ -357,36 +377,42 @@ Two main legal concepts relevant for software:
## Is AI-generated work protected by copyright? {.special}
:::fragment
Many countries do not put AI-generated work under copyright. But the status is unclear.
*Many countries do not grant copyright protection to AI-generated work, but the legal status remains unclear.*
:::
## Summary on Authorship and Credit
## Summary on Authorship and Credit {.special}
* Authors have made substantial contributions to the published work and agree to be accountable for these.
* Respect intellectual property of others and communicate a suitable license for your own work.
# Good Coding Practice
## Good Coding Practice
* clean code: easier to understand for any reviewer and most important, fewer lines of code -> fewer bugs
* efficient code
* **clean code**: easy to understand for any reviewer, fewer lines of code -> fewer bugs
* **efficient code**
* use math if you can; otherwise, keep the order of complexity of your code in mind and check whether it behaves as you'd expect
* use parallel processing if you can pinpoint the performance bottleneck to a task that can be split
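
To illustrate "use math if you can", the sketch below computes the sum of the first `n` integers twice: with a loop, which takes a number of steps proportional to `n`, and with the closed-form expression n(n+1)/2, which takes constant time. The function names are chosen for this example only.

```python
def sum_loop(n: int) -> int:
    """Sum 1..n by iterating; the run time grows linearly with n."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


def sum_formula(n: int) -> int:
    """Sum 1..n with the closed form n(n+1)/2; the run time does not depend on n."""
    return n * (n + 1) // 2


# Checking both versions against each other is a cheap way to verify
# that the faster version behaves as expected.
for n in (0, 1, 10, 100):
    assert sum_loop(n) == sum_formula(n)
print(sum_formula(100))
```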
## Good Coding Practice
* understandable code: code should be clear in itself, but also accompanied by a statement of its purpose and proper usage (documentation).
* trustworthy code: testing and code review
* traceable code changes: version control ensures a traceable record of code changes, serves as a backup, and is indispensable in any collaborative code development.
* **understandable code**: self-explanatory code, documenting intent and usage
* **trustworthy code**: testing and code review
* **traceable code changes**: version control ensures a traceable record of code changes and serves as a backup.
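
As a sketch of what "testing" can look like in practice, the example below pairs a small function with two test functions in the style collected by test runners such as pytest. The function `celsius_to_kelvin` and its tests are hypothetical examples, not part of the lecture code.

```python
def celsius_to_kelvin(t_celsius: float) -> float:
    """Convert a temperature from degrees Celsius to Kelvin."""
    if t_celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return t_celsius + 273.15


def test_freezing_point():
    # A known reference value documents the expected behaviour.
    assert celsius_to_kelvin(0.0) == 273.15


def test_below_absolute_zero_is_rejected():
    # Invalid input should fail loudly instead of returning a wrong number.
    try:
        celsius_to_kelvin(-300.0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")


# Called directly here so that the sketch runs without a test runner.
test_freezing_point()
test_below_absolute_zero_is_rejected()
print("all tests passed")
```

Tests of this kind also help during code review, because they state the intended behaviour in executable form.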
# Summary
*stay up to date with coding trends and libraries*
:::notes
* be open and continue learning: new technologies typically improve your productivity :)
:::
# Summary {.special}
Good Scientific Practice ensures research integrity and the advancement of knowledge.
:::notes
* Use trustworthy sources
* understand your code
* communicate the license
* give credit to contributors
* respect intellectual property (IP)
* stay up to date with coding trends and libraries
* be open and continue learning: new technologies typically improve your productivity :)
:::
# Disclaimer
*This lecture was designed with the help of the Large Language Model OpenAI GPT-4.*
......