Part 2 Manifesto for ROS Projects

TODO: Need to incorporate the GSAW or other acronym throughout the manifesto

Note: Reproducible and open scientific practices and tools are constantly evolving. The principles outlined in this manifesto are designed to be as timeless and generic as possible. However, specific recommendations on individual tools, services, and workflows, and the ways they are incorporated, are likely to be modified and updated over time.

2.1 A constantly evolving but heavily opinionated and practical manifesto

We need a standard and consistent approach to doing modern science. While the requirements for quality science are changing rapidly, our approaches haven’t.

Conducting science is becoming increasingly difficult. Datasets are rapidly growing in size and complexity. Research questions are more complicated, and their answers are harder to obtain. Analytic techniques are increasingly sophisticated in order to answer these difficult questions and manage the complex data. On top of that, researchers, the public, and governing bodies are calling for greater transparency and rigor in scientific findings. For instance, the “irreproducibility crisis” has drawn substantial attention to the deep flaws in the current academic scientific establishment. The concepts of “open”, meaning that the scientific process is easily, freely, and publicly accessible, and of “reproducibility”, meaning that published scientific findings can be independently generated from the same data source, have become increasingly important in this changing scientific landscape.

These new demands should usher in better approaches to doing science, such as greater training in the computational aspects of research, data management, and dissemination of findings. While there are small pockets of change adapting to this new landscape, these are the exception, and mainstream academic science continues as it has for decades. Academia still (obsessively) rewards publications as the currency for promotion, funding, and achievement. Since following reproducible and open scientific practices is presently extremely difficult and requires additional knowledge, expertise, time, and effort, there is currently little incentive to adopt these practices, as doing so would reduce the number of publications produced in any given amount of time. Therefore, until the current obsession with publication numbers declines, efforts to make reproducible and open science (ROS) simpler, more accessible, and (relatively easily) achievable are a way to increase adherence to and acceptance of these practices. Even without the current incentive structure, simplifying these “ROS” practices would still benefit scientists, given the high expectations already placed on them.

2.1.1 The benefits of doing reproducible and open science are substantial

There are many benefits to adopting ROS practices for research and scientific activities. Publishing findings under an open access license increases exposure to the public, both via media and direct download, and also increases the number of scientists who may benefit from the findings. Being open with the data and the analysis code increases the transparency and reproducibility of the results and makes it easier to assess the validity of any claims made in the paper, improving the scientific rigor and strength of the study.

2.1.2 Considerable effort is currently required to practice reproducible and open science

An efficient, automated, and simplified workflow can make it easier to adhere to ROS practices. Many different tools and services presently exist that target one or more aspects of ROS practices or components of a workflow. These tools and services are constantly evolving: new ones are being developed, current ones are being updated at a rapid rate, and older ones are being deprecated. Many of these tools are built upon open source software. The difficulty for any given scientist is sifting through all these tools, choosing the appropriate ones, and then staying up to date with them. While learning these tools, services, and workflows can ultimately increase the effectiveness of conducting solid and rigorous science that is openly distributed, a scientist must (in the current academic environment) be extremely motivated and driven to adhere to these ROS practices. For most scientists, this approach is untenable given their other responsibilities and duties.

Despite the breadth of excellent software packages available to assist with ROS practices, it is often not clear how to link these tools together in chains to form complete workflows. It is common to perform many of these tasks manually or to string together individual tools on the go, which is both error-prone and time-consuming. Failing to integrate tools into effective workflows can hinder research effectiveness, and the lack of clear guidance on how tools can link together leads to large amounts of time lost searching for an effective workflow.

2.1.3 Two problems: Few opinionated workflow tools and lack of comprehensive documentation

There seem to be two main problems behind this lack of integration and uptake of ROS. One, there are not many opinionated workflow tools that try to automate and simplify many aspects of ROS. Two, the documentation on many of these ROS tools and services is often incomplete, not comprehensive enough, or not effectively targeted to the end user, who is likely completely unfamiliar with many of the ROS terms and concepts. There are other reasons for non-adherence to ROS practices, such as the aforementioned lack of incentive structures. However, these are massive systemic problems that can only be addressed with massive collective action.

As a result of these problems, many researchers conform to what their colleagues use or to what may work in the short term in order to meet an impending deadline, even if it leads to more work in the long term. Those who do try to learn tools often do so in isolation and without much formal guidance, trying to come up with their own workflows. These inefficiencies can be major barriers to conducting open science and can make closed source solutions appear superior, as vendors aggressively market the integrations between tools within their product suites. This moves researchers further away from open science principles, as any closed source product is by definition not open.

What is then needed to fix some of these problems is an opinionated toolkit on doing ROS.

2.2 Underlying philosophy

Our philosophy is to encourage reproducible and open scientific practices by automating and streamlining many aspects of a ROS project and by providing an opinionated view on which tools, services, and workflows to use when doing research. The goal is to reduce the burden on researchers and lower the barrier to doing open and reproducible science by creating a Generalized and Structured Analytical Workflow (GSAW).

For now, we are focusing on typical scientific activities such as creating abstracts, slides, posters, and manuscripts. We aim to incorporate creating research software packages, teaching modules, and other scholarly activities into our guiding principles.

Likewise, we currently are targeting biomedical, medical, and health researchers, though this toolkit could easily be adopted by other disciplines. This manifesto is not an exhaustive guide to all possible tool combinations that could yield effective workflows. We intentionally keep things heavily opinionated by design to provide clear and succinct recommendations.

2.2.1 Guiding principles on tool and service use

Since the practicalities of practicing ROS are constantly evolving, any tools or services we recommend and incorporate into our ROS framework will likely change over time. So, any present or future recommended tool or service must adhere to these guiding principles:

  • Tools should be open source, and services should preferably be from not-for-profit organizations (or at least from organizations with a strong history of supporting open source and open science activities)
  • Should be actively developed and well-maintained
  • Should have well-developed documentation, resources, and learning material
  • The company, organization, or community responsible for the tools or services should be ethical, have strong principles in favour of openness, and be a strong advocate and supporter of fairness and equity

When tools or services are otherwise mostly equal, prefer the one where:

  • The design emphasizes simplicity, usability, and accessibility
  • It is already widely used and accepted within the ROS community
  • It allows easy programmatic access (e.g. has a public API)
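
One way to read the two lists above is as a two-step decision: the first list acts as a filter that every tool must pass, and the second list only ranks the tools that remain. The sketch below illustrates that logic under simplifying assumptions. The `Candidate` class, its field names, and the tools `tool_a`, `tool_b`, and `tool_c` are all hypothetical, and each principle is reduced to a yes/no judgement that a person would still have to make.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A hypothetical tool or service, scored by hand against the principles."""
    name: str
    # Required principles (simplified to yes/no judgements).
    open_source: bool
    actively_maintained: bool
    well_documented: bool
    ethical_steward: bool
    # Tie-breaking considerations.
    simple_and_accessible: bool = False
    widely_used: bool = False
    has_public_api: bool = False

    def meets_requirements(self) -> bool:
        """True only if every required principle is satisfied."""
        return all([
            self.open_source,
            self.actively_maintained,
            self.well_documented,
            self.ethical_steward,
        ])

    def tie_break_score(self) -> int:
        """Count how many tie-breaking considerations are satisfied."""
        return sum([
            self.simple_and_accessible,
            self.widely_used,
            self.has_public_api,
        ])


def choose(candidates: List[Candidate]) -> Optional[Candidate]:
    """Filter by the required principles, then rank by the tie-breakers."""
    eligible = [c for c in candidates if c.meets_requirements()]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.tie_break_score())


# Hypothetical candidates, for illustration only.
candidates = [
    Candidate("tool_a", True, True, True, True, widely_used=True),
    Candidate("tool_b", True, True, True, True,
              widely_used=True, has_public_api=True),
    Candidate("tool_c", False, True, True, True,
              simple_and_accessible=True, widely_used=True,
              has_public_api=True),
]

best = choose(candidates)
print(best.name if best else "no eligible tool")
```

Here `tool_c` is excluded despite satisfying every tie-breaker, because it fails a required principle; the tie-breakers never compensate for a missing requirement.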

2.2.2 Guiding principles on workflow and processes

Likewise, for the analysis and workflow (the GSAW) aspect of ROS, we follow these guiding principles:

  • Favour readability over concision
  • Favour well-established infrastructures and approaches
  • Be internally consistent in file naming, code style, and language
  • Consider and abide by privacy rules and laws (e.g. GDPR in Europe)
  • Use and adhere to existing checklists (e.g. STROBE in epidemiology)
  • Favour approaches that explicitly show steps taken from data to final publication form
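
As an illustration of the last principle, the sketch below shows one possible way to make each processing step explicit: every step is a small named function, and the order of steps is written down in a single list that also produces a log. This is a minimal sketch, not a prescribed implementation. The function names, the example records, and the age threshold are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Step = Tuple[str, Callable[[List[Record]], List[Record]]]


def drop_missing_age(records: List[Record]) -> List[Record]:
    """Remove records without a recorded age."""
    return [r for r in records if r.get("age") is not None]


def keep_adults(records: List[Record]) -> List[Record]:
    """Keep participants aged 18 or older (hypothetical inclusion rule)."""
    return [r for r in records if r["age"] >= 18]


def run_steps(records: List[Record], steps: List[Step]):
    """Apply each named step in order and log the record count after each."""
    log = []
    for name, step in steps:
        records = step(records)
        log.append(f"{name}: {len(records)} records remain")
    return records, log


# Hypothetical example data; not from any real study.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 15},
]

# The full path from raw data to analysis-ready data, in one visible place.
steps = [
    ("drop_missing_age", drop_missing_age),
    ("keep_adults", keep_adults),
]

cleaned, log = run_steps(raw, steps)
for line in log:
    print(line)
```

Keeping the steps in one ordered list favours readability over concision: a reader can see what was done to the data, and in what order, without tracing through the rest of the code.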

TODO: Add more items?

2.2.3 Distinct stages of ROS adherence

We don’t expect researchers just becoming familiar with ROS to immediately have their work fully reproducible and completely adhering to all open scientific principles. We also understand that certain research projects are limited to some and not all aspects of ROS (e.g. working with sensitive personal data or running computationally time-consuming analyses). So, we have several “stages” in mind when creating and writing our recommendations, tools, and documentation.

  1. Beginner stage: For those who are new to ROS principles, who are beginning or early-career researchers, or who are unfamiliar with and just learning about modern approaches to data analysis (e.g. using a programming language like R)
    1. Wants to be more reproducible but is unable to be open
    2. Wants to be more open but is unable to be reproducible
  2. Advanced stage: For those who may already be familiar with ROS principles and/or who are comfortable with computational or programming terminology and concepts
    1. Wants to be more reproducible but is unable to be open
    2. Wants to be more open but is unable to be reproducible

Ideally, those who start at the “Beginner” stage eventually work their way to being in the “Advanced” stage.

2.2.4 Phases of a research project

To help navigate the recommendations and steps for a GSAW-ROS project, phases of a research project are split into:

  • Project management throughout (specifically regarding files and folders)
  • Literature review (note: we currently don’t cover this phase)
  • Data collection and storage (note: we currently don’t cover this phase)
  • Data analysis
  • Writing
  • Dissemination
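
To make the project-management phase more concrete, the sketch below shows how the currently covered phases could map onto a folder layout. The folder names and the `scaffold_project` function are hypothetical and chosen only for illustration; this manifesto does not prescribe a specific layout here.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical folder names, one per currently covered phase.
PHASE_FOLDERS = {
    "Data analysis": "analysis",
    "Writing": "writing",
    "Dissemination": "dissemination",
}


def scaffold_project(root: Path) -> List[Path]:
    """Create one folder per phase, each with a short README naming the phase."""
    created = []
    for phase, folder in PHASE_FOLDERS.items():
        path = root / folder
        path.mkdir(parents=True, exist_ok=True)
        (path / "README.md").write_text(f"# {phase}\n")
        created.append(path)
    return created


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    folders = scaffold_project(Path(tmp))
    folder_names = sorted(p.name for p in folders)
    readme_text = (Path(tmp) / "writing" / "README.md").read_text()

print(folder_names)
```

Generating the layout from a single mapping keeps folder names consistent across projects, in line with the principle of internal consistency in file naming.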

All current and future tools, services, and workflows incorporated into a GSAW-ROS project template must be based on these guiding principles and considerations.

TODO: Include guiding principles for creating teaching material