LETTER


Extraordinary Command Line: Basic Data Editing Tools for Biologists Dealing with Sequence Data



Magda Mielczarek1, 2, *, Bartosz Czech1, Jarosław Stańczyk1, Joanna Szyda1, 2, Bernt Guldbrandtsen3

1 Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences; Kozuchowska 7, 51-631 Wroclaw, Poland
2 National Research Institute of Animal Production, Krakowska 1, 32-083 Balice, Poland
3 Department of Animal Science, University of Bonn, Endenicher Allee 15, 53115 Bonn, Germany


© 2020 Mielczarek et al.

open-access license: This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International Public License (CC-BY 4.0), a copy of which is available at: https://creativecommons.org/licenses/by/4.0/legalcode. This license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

* Address correspondence to this author at Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences; Kozuchowska 7, 51-631 Wroclaw, Poland; E-mail: magda.mielczarek@upwr.edu.pl


Abstract

The command line is a standard way of using the Linux operating system. It offers many features essential for efficient data editing and analysis, which makes it very useful in bioinformatics applications. Commands allow rapid manipulation of large ASCII files or of large numbers of files, making basic command line programming skills a critical component of modern life science research. This article is not a general guide to Linux commands. In contrast to the many available Linux manuals, we aim to present basic command line tools that are helpful in handling biological sequence data. The manuscript provides a collection of simple and popular hacks aimed at users with very basic experience of the Linux command line. It describes four data formats popular in bioinformatics applications and gives examples of editing files in each.

Keywords: Bash, Command line, Data manipulation, DNA, Linux, Sequence data.