Log on to Jupyter.

Note: Man pages may not be installed on the system; you can find the man page for bash online.

Setting the stage

  • Using the icons in your browser, create a new directory called cmdline and, inside it, another one called project
  • Open a terminal and change directory to cmdline/project
  • On the command line, create the following directories: data, scripts, output and tmp

Data files

  • Create a Python script (.py extension) that takes two arguments, a number \(n\) and a string \(S\), and prints \(n\) lines, each containing the string \(S\)
  • Using a for loop, create data files named chr01.dat, chr02.dat, etc., up to chr22.dat. The file for chromosome \(k\) must contain \(10 k + 2\) lines of data (hint: use bash’s “arithmetic expansion”)
  • On the command line, give a detailed listing (ls -l) of the data files whose names contain the character “2” (hint: use globbing, a.k.a. pathname expansion)
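
As a starting point for the first item above, here is a minimal sketch of such a script. The file name repeat.py and the function names repeat_lines and main are assumptions made for illustration; they are not prescribed by the exercise, and other designs are equally valid.

```python
#!/usr/bin/env python3
"""Print a string a given number of times, one copy per line."""
import sys


def repeat_lines(n, s):
    """Return a list holding n copies of the string s."""
    return [s] * n


def main(argv):
    """Read [n, S] from argv and print S on n separate lines."""
    if len(argv) != 2:
        # Wrong number of arguments: report the usage on standard error.
        # "repeat.py" is a hypothetical file name.
        print("usage: repeat.py N STRING", file=sys.stderr)
        return
    n = int(argv[0])  # raises ValueError if N is not an integer
    for line in repeat_lines(n, argv[1]):
        print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

If the file is saved as scripts/repeat.py, then running python scripts/repeat.py 12 ACGT prints ACGT on 12 lines, which is the line count required for chromosome 1 (10 × 1 + 2).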

Data processing

  • Process chr01.dat using process_md_script.py
  • Do that again, but now redirect the standard output to a temporary file
  • How can you interpret the output?
  • Process each data file using script.py, producing two output files each time: for instance, processing chr01.dat must give chr01.out and chr01.err

Documentation

  • Improve the script by adding docopt-based documentation; you may first have to install docopt from the command line:
pip install docopt==0.6.2
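
For orientation, the sketch below shows one way docopt can be wired into a small script: the usage text is both the documentation and the argument specification. It is illustrated on a hypothetical line-repeating script (repeat.py, with arguments <n> and <string>) rather than on the processing script itself, whose interface is not given here; the names USAGE and main are likewise assumptions.

```python
# Usage text parsed by docopt; "repeat.py" is a hypothetical script name.
USAGE = """Print a string a given number of times, one copy per line.

Usage:
  repeat.py <n> <string>
  repeat.py -h | --help

Options:
  -h --help  Show this help message and exit.
"""


def main(argv=None):
    """Parse argv (default: sys.argv[1:]) with docopt and print the lines."""
    # Imported here so the file can still be loaded where docopt is missing.
    from docopt import docopt

    args = docopt(USAGE, argv=argv)  # prints the usage and exits on bad input
    n = int(args["<n>"])             # docopt returns positional values as strings
    for _ in range(n):
        print(args["<string>"])


# To use this as a script, call main() under the usual
# `if __name__ == "__main__":` guard.
```

With this in place, calling main(["3", "ACGT"]) prints three lines, and passing -h or --help makes docopt print the usage text and exit.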