Log on to Jupyter.
Note: Man pages may not be installed on the system; you can find the man page for bash online.
Setting the stage.
- Using the icons in your browser, create a new directory cmdline and, in that directory, another one called project
- Open a terminal and change the directory to cmdline/project
- On the command line, create the following directories: data, scripts, output and tmp
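One possible way to create the four directories, sketched under the assumption that the terminal is already inside cmdline/project:

```shell
# Create the four project directories in a single call.
# -p makes mkdir succeed even if some of them already exist.
mkdir -p data scripts output tmp

# List the directories themselves (-d) rather than their contents.
ls -d data scripts output tmp
```

Passing several names to a single mkdir call avoids repeating the command once per directory.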
Data files
- Create a Python script (.py extension) that takes two arguments, a number \(n\) and a string \(S\), and outputs \(n\) lines with the same string \(S\)
- Using a for loop, create data files named chr01.dat, chr02.dat, etc., up to chr22.dat. The file for chromosome \(k\) must contain \(10k + 2\) lines of data (hint: use bash’s “arithmetic expansion”)
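The loop could look roughly like the sketch below. Two things here are assumptions, not part of the exercise: the files are written into the data directory, and the shell function repeat_line is a hypothetical stand-in for the Python script from the previous step, so that the sketch runs on its own.

```shell
# Hypothetical stand-in for the Python script from the previous step:
# prints the string $2 on $1 separate lines.
repeat_line() {
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s\n' "$2"
        i=$((i + 1))
    done
}

mkdir -p data

for k in $(seq 1 22); do
    # Zero-pad the chromosome number for the file name only; a padded
    # value such as 08 would be rejected as invalid octal inside $(( )).
    name=$(printf 'chr%02d.dat' "$k")
    # Arithmetic expansion computes the required number of lines, 10k + 2.
    repeat_line "$((10 * k + 2))" "data for chromosome $k" > "data/$name"
done
```

In an actual solution, the call to repeat_line would be replaced by a call to the Python script with the same two arguments. Keeping the loop variable unpadded and padding it only when building the file name sidesteps the octal problem.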
- On the command line, give a detailed list (ls -l) of the data files whose name contains the character “2” (hint: use globbing, also known as pathname expansion)
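A minimal sketch of the globbing step, assuming the data files live in the data directory. The first few lines only make sure the files exist so that the sketch can be run on its own.

```shell
mkdir -p data
# touch only creates files that are missing; existing files keep their contents.
for k in $(seq 1 22); do
    touch "$(printf 'data/chr%02d.dat' "$k")"
done

# * matches any run of characters (including none), so *2* matches
# every name that has a 2 anywhere in it.
ls -l data/*2*
```

With only chr01.dat to chr22.dat present, the pattern matches chr02.dat, chr12.dat, chr20.dat, chr21.dat and chr22.dat. The shell expands the pattern before ls runs, so ls simply receives the list of matching names.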
Data processing
- Process chr01.dat using the process_md_script.py
- Do that again but now redirect the standard output to a temporary file
- How can you interpret the output?
- Process each data file using the script.py and create two outputs each time: for instance, chr01.dat must be processed to give chr01.out and chr01.err
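The last step can be sketched as a loop that redirects standard output and standard error separately. Several things here are assumptions: the function process is a hypothetical stand-in for the Python processing script, the inputs are read from the data directory, and the results are written to the output directory.

```shell
# Hypothetical stand-in for the processing script: writes a line count to
# standard output and a progress message to standard error.
process() {
    wc -l < "$1"
    echo "processed $1" >&2
}

mkdir -p data output
# Make sure at least one input exists so the sketch runs on its own.
[ -e data/chr01.dat ] || printf 'a\nb\n' > data/chr01.dat

for f in data/chr*.dat; do
    # Strip the directory and the extension: data/chr01.dat -> chr01
    base=$(basename "$f" .dat)
    # > captures standard output, 2> captures standard error.
    process "$f" > "output/$base.out" 2> "output/$base.err"
done
```

In an actual solution, the call to process would be replaced by a call to the Python script. Using two separate redirections keeps results and diagnostics apart, which is what makes the .out and .err pair useful.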
Documentation
- Improve the script by adding docopt-based documentation; you may have to install docopt first, on the command line: pip install docopt==0.6.2