To begin

One of the first steps in a Muscle alignment is k-mer distance clustering and building a guide tree. But what value of ‘k’ is used for this?

./muscle --help

Muscle has been cited almost 23,000 times! The Muscle source code is “open” so you could go read the code and find out for yourself.
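If the term k-mer distance is new to you, the following is a minimal Python sketch of one common way to define it: count the overlapping substrings of length k in each sequence, then measure what fraction of them the two sequences share. This is an illustration only and is not taken from the Muscle source code. The function names `kmer_counts` and `kmer_distance` are hypothetical, and the default `k=3` is an arbitrary choice for the example, not Muscle's actual setting.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Return 1 minus the fraction of k-mers shared by a and b.

    The fraction is taken relative to the number of k-mers in the
    shorter sequence. This definition is an assumption made for the
    example, not Muscle's exact formula.
    """
    n_kmers = min(len(a), len(b)) - k + 1
    if n_kmers <= 0:
        raise ValueError("both sequences must be at least k characters long")
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    # Counter intersection keeps the minimum count of each shared k-mer.
    shared = sum((counts_a & counts_b).values())
    return 1.0 - shared / n_kmers


if __name__ == "__main__":
    print(kmer_distance("ACGTACGT", "ACGTACGT"))  # identical sequences
    print(kmer_distance("AAAA", "CCCC", k=2))     # no k-mers in common
```

Pairwise distances like these can be computed without aligning anything, which is why they are a cheap starting point for clustering sequences into a guide tree. Larger values of k make matches rarer and more specific, so the choice of k changes the distances.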

“Open”

What does it mean to create “open” works and why is that important?

In science, reproducibility is essential to progress. Today, the volume and pace of scientific progress necessitate significant transparency in the process. Publishing research that anyone can access is important. So is writing and distributing scientific software that is free and easy to install and use.

Open != Free: Open means transparency, but that does not mean that the research, document, or software is free to use in all cases.

Open Access publications:

Open Source programs in Bioinformatics:

In the field of Bioinformatics there has been a shift in thinking on how to publish code. It is now generally accepted that those who write code should publish it for research use free of charge. Furthermore, it is generally not enough to simply post a program where someone can download it. Rather, the source code must be made publicly available for others to use, edit, and review.

The most common way to share code is on GitHub. See your instructor’s repository page. For example, the code used to generate the course materials can be found here. However, there are many other services that are used to distribute scientific research software (e.g., CRAN and Anaconda*).

*Technically, Anaconda is not a repository itself, but it provides access to many open resources.

Scientific Publishing: Preprints

One way to rapidly publish new science and simultaneously ensure that there is free access to your work is to publish preprints. These are documents posted to public websites that archive the work in a way that is open and free. In biology, the bioRxiv project is the primary repository for preprints.

Let’s have a look around https://www.biorxiv.org/collection/bioinformatics for recent preprints in bioinformatics.

WARNING: Preprints have NOT been peer reviewed for quality and soundness of science. Use information from preprints at your own risk. However, this can be where the cutting-edge methods and research are appearing.

The Course Project

Details can be found here

Homework:

Find a repository that your instructor created and try to figure out what it does. How difficult this is will depend on how well the repository is documented. Submit answers to #git and collaborate with other students who pick the same repository to learn as much as possible about the code.