Chapter 2
Linux programs, irrespective of whether they are compiled binaries, Bash scripts, or scripts written in another language, receive several kinds of input and produce several kinds of output. As shown in the previous chapter, programs receive command-line arguments and inherit environment variables from their parent processes. Aside from reading input files, they can also read from the standard input, or stdin. Additionally, they can receive signals such as terminate (from kill) or interrupt (from C-c). For output, programs report their exit status (by updating $? upon exit), write to output files, and print to standard output, or stdout. Programs can also print to standard error, or stderr.
Standard input is what the user types at the terminal when a program prompts for input. Standard output is what the program prints to the terminal. Standard error is also displayed on the terminal by default, but it is a separate data stream from standard output, typically reserved for error and diagnostic messages.
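To see that stdout and stderr really are two separate streams, try redirecting only stdout into a file. This is a minimal sketch; the missing directory name is made up for illustration:

```bash
# The listing of "." goes to stdout, which is redirected into listing.txt.
# The complaint about the missing directory goes to stderr,
# which still appears on the terminal.
ls . /no/such/directory > listing.txt
```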
We will use the following utility programs in this chapter:

head     Prints the first part of files.
tail     Prints the last part of files.
wc       Counts lines, words, and bytes.
tr       Translates characters.
sort     Sorts lines of text.
shuf     Shuffles lines of text.
diff     Finds differences between two files line by line.
uniq     Reports or omits repeated lines in a file.
cut      Cuts out selected columns of a table.
paste    Pastes lines from files side by side.
join     Joins two tables on a common field.
sed      Performs basic text transformations.
Note on macOS / BSD
There is a common misconception that macOS is Linux. It is not. macOS descends from the BSD family of Unix-like systems. Both Linux and the BSDs emulate the Unix operating system, but they have important differences, particularly in their utility programs. Therefore, if you are using macOS, you will want to install the GNU versions of the utilities (coreutils) using the brew package manager. If you run into problems, make sure you are using the GNU/Linux version of the programs.
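For example, assuming you already have Homebrew installed, the installation looks like this. Note that Homebrew installs the GNU tools with a g prefix (such as ghead) unless you add its gnubin directory to your PATH:

```bash
# Install the GNU coreutils on macOS via Homebrew
brew install coreutils

# The GNU tools are available with a "g" prefix, e.g.:
ghead --version
```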
Boilerplate
Let's start with a short template for writing your programs. Just a few lines of setup will make your life much easier down the road.
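A minimal version of this template might look like the following; the exact IFS value shown is one common choice, not necessarily the author's:

```bash
#!/bin/bash
set -euo pipefail
IFS=$'\n\t'
```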
The first line is the shebang line that you've seen before. Recall that it simply tells Linux how to execute this script: using the /bin/bash interpreter. This line is important because the default interpreter is sh, which may refer to a different shell on different systems; by specifying this line, we can be certain that we are using the correct interpreter.
The last two lines set up the so-called strict mode for Bash. It will make your debugging much easier, because you will not be hunting for the cause of some silent error. The -e flag ensures that the script stops at the first error it encounters. The -u flag disallows reading from undefined variables, which helps catch spelling mistakes in variable names. For example, suppose you defined the variable language=Bash, but when you later try to use it, you misspell it as langage. Under the set -u setting, bash will generate an error, because you are trying to read a variable whose value is undefined: langage. Under the default mode, your script would silently proceed until some non-silent error occurs. Lastly, the -o pipefail flag prevents errors in a series of pipes from being masked. We will discuss IFS in a later chapter.
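Here is a small sketch of the misspelling scenario just described; the echo line is simply an illustrative use of the variable:

```bash
#!/bin/bash
set -euo pipefail

language=Bash
# Typo below: with set -u, bash aborts here with "langage: unbound variable".
# Without strict mode, it would silently print an empty string instead.
echo "I am learning $langage"
```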
Soon we will segue into one of Bash's best features, pipes, but first, a word on getting help.
Help!
In the last chapter, we discussed how command-line arguments allow you to change a program's settings and influence its behaviour. Command-line arguments can take a short form (e.g. -h) or a long form (e.g. --help), where the former has one dash (-) and the latter has two (--).
Some programs support only short-form arguments, some support only long-form arguments, and some support both. Therefore, it is important to consult the help or manual documentation for the program.
All core Linux programs come with a manual entry, which can be viewed using the man command:
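For example (the choice of head here is just for illustration):

```bash
# Open the manual page for head; press q to quit the pager.
man head
```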
Not all programs have a manual entry, but most of them should have a help output, which you can request using the --help or the -h argument. (Sadly, some programs have neither.) Programs such as ps may treat -h as something else entirely, and other programs do not recognize --help. It would be nice if all software programmers followed the same standards, but eccentric programmers exist.
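With the GNU coreutils used in this chapter, the long form is the safe bet:

```bash
# Print a summary of options for head and wc
head --help
wc --help
```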
Now, you should pause here and examine the help pages of the utility programs in the table above.
We'll wait.
Pipes
Now that you have some idea of what the utility programs do, we'll proceed to join the programs together with pipes in order to form a pipeline.
Suppose we want to get the 5th line of the following poem:
sonnet104.txt
One way is to set up a pipe using head and tail. First, we print the first 5 lines using head:
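Assuming the poem is saved as sonnet104.txt, that looks like:

```bash
# Print the first 5 lines of the poem
head -n 5 sonnet104.txt
```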
Then, we can keep just the last line of this output by piping the results to tail, which in turn gives us what we want: the 5th line of the poem.
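Putting the two commands together, one reasonable way to write this is:

```bash
# The output of head becomes the input of tail,
# so only the 5th line of the poem is printed.
head -n 5 sonnet104.txt | tail -n 1
```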
What the pipe (|) operator does is connect the stdout (standard output stream) of the first program (head) to the stdin (standard input stream) of the second program (tail). In other words, the output of head is provided as the input to tail. These two programs combined form a pipeline.
For the second task, we would like to see how many times the word "you" appears in the poem. We can build this up one program at a time.
Print the text file to stdout.
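A plausible way to do this first step (the author may have used a different command) is simply:

```bash
# Send the contents of the poem to stdout
cat sonnet104.txt
```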
Now, we want to build on this first command. Instead of typing it again, we can simply press the up arrow key (or C-p) to recall the previous command and then type the next part of the pipeline.
Put each word on its own line.
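One way to do this is with tr, replacing runs of spaces with newlines; the exact invocation is a guess at the intended approach:

```bash
# -s squeezes repeated spaces so we do not produce blank lines
cat sonnet104.txt |
    tr -s ' ' '\n'
```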
Bash is generally fussy about line breaks. Normally, we would break a command into multiple lines by using the \ operator to continue the command onto the next line. In the case of the | operator without a right-hand side, however, Bash already expects the next line to be a continuation.
Convert all uppercase letters to lower case.
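Continuing the pipeline with another tr stage:

```bash
# tr maps every uppercase letter to its lowercase counterpart
cat sonnet104.txt |
    tr -s ' ' '\n' |
    tr '[:upper:]' '[:lower:]'
```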
Get lines containing just the word of interest.
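One option for this step is grep, which is not listed in the table above, so treat this as one possible approach rather than the author's:

```bash
# -x keeps only lines that are exactly "you";
# punctuation attached to words would need extra cleanup.
cat sonnet104.txt |
    tr -s ' ' '\n' |
    tr '[:upper:]' '[:lower:]' |
    grep -x 'you'
```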
Create a frequency table.
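Finally, sort and uniq -c turn the stream into a frequency table; the count printed next to "you" is our answer:

```bash
# uniq -c prefixes each distinct line with its number of occurrences
cat sonnet104.txt |
    tr -s ' ' '\n' |
    tr '[:upper:]' '[:lower:]' |
    grep -x 'you' |
    sort |
    uniq -c
```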
Now, we have our answer.
So far, we have only worked with a single stream of data, flowing from the stdout of one program to the stdin of the next. But what if you want to produce multiple streams of data to be processed by downstream programs? For this, we will want to give a name to each of the pipes, so that we can refer to them unambiguously. This is where named pipes come in, but that is a topic for another chapter.
Summary
Here, we saw how to accomplish a task by breaking it down into individual subtasks and combining those subtasks with pipes to form a pipeline. By focusing on one subtask at a time, we can turn a long, complicated task into a series of easy ones. This is precisely what makes Bash so powerful!