# Chapter 2

Linux programs, irrespective of whether they are compiled, Bash scripts, or scripts written in another language, receive several inputs and send several outputs. As shown in the previous chapter, programs receive **command-line arguments** and inherit **environment variables** from their parent processes. Aside from reading input files, they can also read from the **standard input**, or `stdin`. Additionally, they can also receive signals such as *terminate* (from `kill`) or *interrupt* (from `C-c`). For outputs, programs report their exit status (by updating `$?` upon exit), write to output files, and print to **standard output**, or `stdout`. Additionally, programs can also print to **standard error**, or `stderr`.

Standard input is what the user enters at the terminal when a program prompts for user input. Standard output is what is output to and displayed on the terminal. Standard error is also displayed on the terminal by default, but this data stream is different from standard output.

We will use the following utility programs in this chapter:

| Command | Use                                               |
| ------- | ------------------------------------------------- |
| `head`  | Prints the first part of files.                   |
| `tail`  | Prints the last part of files.                    |
| `wc`    | Counts lines, words, and bytes.                   |
| `tr`    | Translates characters.                            |
| `sort`  | Sorts lines of text.                              |
| `shuf`  | Shuffles lines of text.                           |
| `diff`  | Finds differences between two files line by line. |
| `uniq`  | Reports unique lines in a file.                   |
| `cut`   | Cuts out a column of a table.                     |
| `paste` | Pastes lines from files side by side.             |
| `join`  | Joins two tables on a common field.               |
| `sed`   | Performs basic text transformations.              |

## Note on MacOS / OpenBSD

There is a common misconception that MacOS is Linux. It is not. Instead, MacOS came from OpenBSD. Both Linux and OpenBSD sought to emulate the Unix operating system, but they do have important differences, particularly in the utility programs. Therefore, if you are using a MacOS, you will want to install the Linux utility programs (`coreutils`) using the installer `brew`. If you run into problems, make sure you are using the Linux version of the programs.

## Boilerplate

Let's start with a short template for writing your programs. Just a few lines of set up will make your life *so much* easier down the road.

```bash
#!/bin/bash
# Description of program

set -euo pipefail
IFS=$'\n\t'
```

The first line is the `shebang` line that you've seen before. Recall that it simply tells Linux how to execute this script: using the `/bin/bash` interpreter. This line is important because the default interpreter is `sh`, which may refer to different Shell script interpreters on different systems, and by specifying this line, we are certain that we will be using the correct interpreter.

The last two lines set up the so-called **strict mode** for Bash. It will make your debugging much easier so that you are hunting for the cause of some silent error. The `-e` flag ensures that the script stops on the first encountered error. The `-u` flag disallows reading from undefined variables, which helps prevent spelling mistakes in variable names. For example, suppose you defined variable `language=Bash`, but when you later try to use it, you misspelled it as `langage`. Under the `set -u` setting, `bash` will generate an error, because you are trying to read a variable whose value is still undefined: `langage`. Under the default mode, your script will proceed until some non-silent error occurs. Lastly, the `-o pipefail` flag prevents error in a series of pipes from being masked. We will discuss `IFS` in a later [chapter](https://github.com/djhshih/quick-bash/blob/master/TODO/README.md).

So, we now segue into one of Bash's best features: pipes.

## Help!

In the last chapter, we discussed how command line **arguments** allow you to change program settings and influence its behaviour. Command line arguments can take a short form (e.g. `-h`) or the long form (e.g. `--help`), where the former has one dash (`-`) and the latter has two (`--`).

Some programs support only short-form arguments, some support only long-form arguments, and some support both. Therefore, it is important to consult the help or manual documentation for the program.

All *core* Linux programs come with a **manual** entry, which can be viewed using the `man` command:

```bash
man head
```

Not all programs will have a manual entry, but most of them should have a help output, which you request using the `--help` or the `-h` argument. (Sadly, some programs have neither.) Programs such as `ps` may treat `-h` as something else, and other programs don't recognize `--help`. It would be nice if all software programmers follow the same standards, but eccentric programmers exist.

```bash
head -h
tail --help
```

Now, you should pause here and examine the help page of the utility programs in the above table.

We'll wait.

## Pipes

Now that you have some idea of what the utility programs do, we'll proceed to join the programs together with **pipes** in order to form a pipeline.

Suppose we want to get the 5th line of the following poem:

`sonnet104.txt`

```
To me, fair friend, you never can be old,
For as you were when first your eye I eyed,
Such seems your beauty still. Three winters cold
Have from the forests shook three summers’ pride,
Three beauteous springs to yellow autumn turned
In process of the seasons have I seen,
Three April perfumes in three hot Junes burned,
Since first I saw you fresh, which yet are green.
Ah, yet doth beauty, like a dial-hand,
Steal from his figure, and no pace perceived;
So your sweet hue, which methinks still doth stand,
Hath motion, and mine eye may be deceived:
For fear of which, hear this, thou age unbred:
Ere you were born was beauty’s summer dead.
```

One way is to set up a pipe using `head` and `tail`. First, we print the first 5 lines using `head`:

```bash
head -n 5 sonnet104.txt
```

Then, we can keep just the last line of this output by piping the results to `tail`, which in turn gives us what we want: the 5th line of the poem.

```bash
head -n 5 sonnet104.txt | tail -n 1
```

What the pipe `|` operator does is to connect the **stdout** (standard output stream) from the first program (`head`) to the **stdin** (standard input stream of the second program (`tail`). In other words, the output of `head` is provided as the input to `tail`. These two programs combined together form a **pipeline**.

For the second task, we would like to see how many times the word "you" appears in the poem. We can build this up one program at a time.

1. Print the text file to stdout.

   ```bash
   cat sonnet104.txt
   ```

   Now, we want to build on this first line. Instead of typing it again, we can simply press the up arrow key (or `C-p`) to get the previous command. We can then type the next part of the pipeline.
2. Put each word in its own line.

   ```bash
   cat sonnet104.txt |
   tr -cs A-Za-z '\n'
   ```

   Bash is generally fussy about line breaks. Normally, we would want to break a command into multiple lines by using the `\` operator in order to continue a command into the next line. In the case of the `|` operator without a right-hand side, Bash expects the next line to be a continuation.
3. Convert all uppercase letters to lower case.

   ```bash
   cat sonnet104.txt |
   tr -cs A-Za-z '\n' |
   tr A-Z a-z
   ```
4. Get lines containing just the word of interest.

   ```bash
   cat sonnet104.txt |
   tr -cs A-Za-z '\n' |
   tr A-Z a-z |
   grep '^you$'
   ```
5. Create a frequency table.

   ```bash
   cat sonnet104.txt |
   tr -cs A-Za-z '\n' |
   tr A-Z a-z |
   grep '^you$' |
   uniq -c
   ```

Now, we have our answer.

So far, we have only worked with a single stream of data: from the stdout of one program to the stdin of the next. Now, what if you want to output multiple streams of data that will be processed by downstream programs. For this, we'll want to give a name to each of the many pipes, so that we can refer to them unambiguously. This is where **named pipes** come in, but that is for another chapter.

## Summary

Here, we saw how to achieve a task by breaking it down into individual subtasks, and we combine these subtasks together using pipes in order to form a pipeline. By focusing on one subtask at a time, we can turn a long, complicated task into a series of easy subtasks. This is precisely what makes Bash so powerful!


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://djhshih.gitbook.io/quick-bash/ch02.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
