@@ -3,65 +3,96 @@ title: "Good scientific and coding practice"
author: "Bjorn Stevens and Theresa Mieslinger"
---
# Good coding practice
***consistency is the key***
# Good Scientific Practice
*Building trust in research. And in your own work.*
* clean code
* efficient code
* understandable code
* traceable code changes
## What is it and why should we care?
* Principles formulated by the research community that define proper research conduct, with the aim of ensuring high quality, robustness, and reproducibility of results (publications, data, code, software).
* How it relates to this course: it provides rules for building software, for using your own and others' software and data, and for communicating how that software and data were used.
## The pillars of Good Scientific Practice
::: {.incremental}
* **Reliability** in ensuring the quality of research, reflected in the design, methodology, analysis, and use of resources.
* **Honesty** in developing, undertaking, reviewing, reporting, and communicating research in a transparent, fair, full, and unbiased way.
* **Respect** for colleagues, research participants, research subjects, society, ecosystems, cultural heritage, and the environment.
* **Accountability** for the research from idea to publication, for its management and organisation, for training, supervision, and mentoring, and for its wider societal impacts.
:::
*copied from [European Code of Conduct for Research Integrity](https://allea.org/wp-content/uploads/2023/06/European-Code-of-Conduct-Revised-Edition-2023.pdf)*
## Style Guides
***code is read much more often than it is written***
## The pillars of Good Scientific Practice
Each programming language has its own guidelines:
* Reliability / **Reproducibility**
* primary data
* data management and sharing
* Honesty
* **Respect & Accountability**
* authorship
* proper citation and referencing
* [PEP8 for Python](https://peps.python.org/pep-0008/)
* C++ does not have an official guideline, but the C++ Core Guidelines by [Stroustrup and Sutter](https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md) are a good starting point
# Reproducibility
## Clean Code
* delete unused code blocks and keep only as much code as necessary
* fewer lines of code -> fewer bugs (add source to this statement)
* BUT: balance optimization against readability. Convoluted statements are acceptable if they improve efficiency, but they may need to be accompanied by some form of documentation.
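
A minimal sketch of that trade-off, assuming a purely illustrative function `is_power_of_two` that is not part of the lecture material: a compact bit-level expression can stay in the code when a docstring and a short comment explain why it works.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) clears that
    # bit and leaves zero. Compact, but hard to read without this comment.
    return n > 0 and (n & (n - 1)) == 0
```

Without the comment, a reviewer would have to rediscover the bit trick; with it, the efficient form costs little in readability.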
## What do we want to reproduce? {.special}
## Efficient Code
::: {.smaller}
A collection and repetition of statements from previous lectures
::: {.fragment}
*the argument*
:::
* use math if you can; otherwise, keep the order of complexity of your code in mind and check whether it behaves as you'd expect (complexity lecture)
* reduce explicit loops by vectorizing operations, e.g. in Python with NumPy array operations (list comprehensions are more compact, but still loop element by element)
* efficient memory usage (memory?)
* use parallel processing if you can pinpoint the performance bottleneck to a task that can be split between multiple processors. (parallel programming?)
* ...
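
The "use math if you can" point can be sketched with a deliberately simple case. The function names `sum_loop` and `sum_formula` are hypothetical and chosen only for this illustration: summing the integers from 1 to n with a loop takes a number of steps proportional to n, while Gauss's closed form needs a fixed handful of arithmetic operations.

```python
def sum_loop(n: int) -> int:
    """Sum the integers 1..n with an explicit loop: O(n) additions."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n: int) -> int:
    """Sum the integers 1..n with the closed form n(n+1)/2."""
    return n * (n + 1) // 2


# Both give the same answer; only the amount of work differs.
assert sum_loop(1000) == sum_formula(1000)
```

Comparing the fast version against the slow, obviously correct one is also a cheap way to check that the code behaves as expected.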
## What is needed to reproduce the argument? What do we need to save and how? {.special}
## Understandable Code / Documentation
Code should be clear in itself, but also accompanied by a statement of its purpose and proper usage. Additional information could include inputs and outputs, author, or date.
## What should we document? {.special}
* in-line documentation: docstrings are string literals written into your code (add example?)
* comments
* separate documentation: a common format is a text file, e.g. README.txt, or a chapter in a linked documentation file or handbook
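
As a small sketch of in-line documentation, the hypothetical function `celsius_to_kelvin` below carries a docstring (here in the widely used NumPy layout, which is one convention among several) and a comment. The docstring states purpose, input, and output; the comment explains a detail of the implementation.

```python
def celsius_to_kelvin(temperature_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature_c : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in kelvin.
    """
    # 273.15 is the offset between the Celsius and kelvin scales.
    return temperature_c + 273.15


# The docstring travels with the function and is what help() displays.
print(celsius_to_kelvin.__doc__)
```

Because the docstring is a string literal attached to the function, tools such as `help()` can show it without opening the source file.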
## FAIR data
* state the ideas and problems of FAIR data
## Version control for code changes
Version control software ensures a traceable record of code changes, serves as a backup, and is indispensable in any collaborative code development.
## Which tools shall we use?
* open source / development
* trustworthy sources
## Testing and Code Review
## Summary on Reproducibility
* save the primary data needed to reproduce the argument of your scientific study
## Staying up to date
* stay up to date with coding trends and libraries
* be open and continue learning: new technologies typically improve your productivity
# Respect & Accountability
*What shall we credit and how?*
## Authorship versus Acknowledgment
## Licenses
CC0 versus CC-BY
## Intellectual Property (IP)
## Using AI
## Summary on Authorship and Credit
# Good Coding Practice
## Good Coding Practice
* clean code: easier to understand for any reviewer and, most importantly, fewer lines of code -> fewer bugs
* efficient code
* use math if you can; otherwise, keep the order of complexity of your code in mind and check whether it behaves as you'd expect
* use parallel processing if you can pinpoint the performance bottleneck to a task that can be split
## Good Coding Practice
* understandable code: code should be clear in itself, but also accompanied by a statement of its purpose and proper usage (documentation).
* trustworthy code: testing and code review
* traceable code changes: version control ensures a traceable record of code changes, serves as a backup, and is indispensable in any collaborative code development.
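
Testing, as named in the list above, can start very small. The sketch below assumes a hypothetical helper `mean` and two hand-written checks: one for a known result and one for invalid input. It illustrates the idea and is not a prescribed workflow.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # A case whose answer we can verify by hand.
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_mean_rejects_empty_input():
    # Invalid input should fail loudly instead of returning nonsense.
    try:
        mean([])
    except ValueError:
        return  # expected
    raise AssertionError("mean([]) should raise ValueError")


# Run the checks directly; a test runner such as pytest can collect
# functions named test_* when they live in a test file.
test_mean_of_known_values()
test_mean_rejects_empty_input()
```

Tests like these also give a reviewer a concrete statement of what the code is supposed to do.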
# Summary
Good Scientific Practice ensures research integrity and the advancement of knowledge.
# Good scientific practice
* Use trustworthy sources
* understand your code
* communicate the license
* give credit to contributors
* respect intellectual property (IP)
* stay up to date with coding trends and libraries
* be open and continue learning: new technologies typically improve your productivity :)
## Licenses
CC0 versus CC-BY
## credit
# Disclaimer
*This lecture was designed with the help of the Large Language Model OpenAI GPT-4.*
## IP
# Further Reading
* [European Code of Conduct for Research Integrity](https://allea.org/wp-content/uploads/2023/06/European-Code-of-Conduct-Revised-Edition-2023.pdf)
* [DFG Guidelines for Safeguarding Good Research Practice. Code of Conduct](https://zenodo.org/records/6472827)