Chapter 1

This chapter will introduce you to the shell environment from both a user's and a programmer's perspective. The Portable Operating System Interface (POSIX) standards specify how programs should implement their interface so that they are portable across Unix-like platforms. Bash follows these standards and augments them with additional features.

A shell is a user interface to an operating system's services, and this guide is about the shell's command-line interface. We run a terminal program in order to use this interface. In turn, the terminal invokes a shell program, which is typically bash. So, launch a terminal for your platform, and let's get started!

Hopefully, your terminal now eagerly awaits your command, and it is running a shell program, which is probably bash. You can verify this by running

ps -p $$

After you type the above command and press Enter, it should print something similar to the output below.

  PID TTY          TIME CMD
25719 pts/0    00:00:00 bash

Your output will be slightly different, but the important part is that the CMD column shows bash. My output indicates that I am using an instance of bash with the process ID 25719 (yours will very likely be different). Every process on Linux is assigned a process ID so that it may be uniquely identified by commands such as kill, which can terminate a running process.

Note that most commands introduced from here on should be typed into the terminal in the same way, following the prompt string (a short piece of text that typically ends with the $ character). If the command-line prompt string is ever missing, it means that your current command is incomplete. Unless you know that your command spans multiple lines, it may contain a syntax error. In this case, you should press C-c (hold the Ctrl key and press c) to interrupt the current command and return to the command-line prompt so that you may start over from the last successfully executed command.

Commands are interpreted by bash and executed sequentially. Bash commands follow the format

program arg1 arg2 arg3 ...

Our command begins with the name of the program (ps), followed by the arguments (-p $$). (An argument is a value passed to a program; it can pretty much be anything and does not need to be persuasive.) The arguments are split by whitespace characters (i.e. space, tab, and newline) and sent as separate values to the program. The program will receive -p as the first argument and $$ as the second argument. Typically, Linux programs are written so that arguments beginning with the - character are interpreted as parameter names, which may consume one or more additional arguments. In our example, $$ is the argument to the -p parameter; that is, the program will set the -p parameter to the value of $$.

In general, words that begin with $ are variable names, and Bash will expand variable names into their current values. The $$ variable is expanded into 25719 in the above example and the ps program will actually receive -p and 25719 as command-line arguments (not -p and $$).
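To see what a program actually receives, one can print each argument in brackets. The following is a minimal sketch: show_args is a hypothetical helper function defined only for this illustration (functions are not otherwise covered in this chapter), and the number printed will be the process ID of your own shell.

```shell
# show_args is a hypothetical helper: it prints each argument it receives
# on its own line, wrapped in brackets.
show_args() {
  printf '[%s]\n' "$@"
}

# Bash expands $$ before show_args runs, so the function receives
# the process ID, not the literal text $$.
show_args -p $$
```

The second line of output should show a number rather than $$, which suggests that the expansion happened before the arguments were handed over.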

The above commands should work if you are using a shell that follows the POSIX standards. Since this guide is strictly about Bash, if the ps -p $$ command reported a shell other than bash, then invoke bash and enter the Bash environment by simply entering

bash

This command starts a new instance of bash and enters its environment. When you wish to return to your original shell environment, type

exit

Bash script

Commands can be typed into bash, but they can also be stored in a text file known as a script, which can be run upon request. By convention, shell scripts have the extension sh.

Open a new file named hello.sh in the current directory using your text editor (e.g. nano, vim, emacs), and enter the following text

echo "Hello, $1."

Save the file to disk. Then, run the script using bash by

bash hello.sh Brian Fox

In the above command, bash takes hello.sh, Brian, and Fox as arguments and runs the hello.sh script with the arguments Brian and Fox. The hello.sh script contains a single command that runs the echo program (which simply outputs the given text) with the argument "Hello, $1.". Bash expands this argument into Hello, Brian. and passes it to echo for output to the terminal.

The $1 variable refers to the first argument. Accordingly, $2 refers to the second, $3 refers to the third, and so on. Since $2 is not used, the second argument (Fox) does not appear in the output. In order to send Brian Fox as one argument, we must surround the text using quotes:

bash hello.sh "Brian Fox"

The quotes preserve the space characters and protect the argument Brian Fox from being split into two arguments.
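The effect of quoting on argument splitting can be checked by counting arguments. This is a small sketch under one assumption: count_args is a hypothetical helper function written for this illustration, and it relies on the $# special variable, which holds the number of arguments.

```shell
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() {
  echo "$#"
}

unquoted=$(count_args Brian Fox)      # two words, so two arguments
quoted=$(count_args "Brian Fox")      # one quoted string, so one argument

echo "unquoted: $unquoted, quoted: $quoted"
```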

The current script will work in any POSIX-compliant shell, but we can be (and should be) more specific in how the script is to be run by including a shebang line (which begins with #!) in the script:

#!/bin/bash

echo "Hello, $1."

Additionally, we need to make the script executable by enabling the x flag for the file using the chmod (change file modes) program:

chmod +x hello.sh

Then, we can execute (run) the script by

./hello.sh "Brian Fox"

The shebang line in the script instructs the shell to run the script using the executable at /bin/bash, the standard location for bash on most Linux platforms. Effectively, the script is executed by the following command

/bin/bash hello.sh "Brian Fox"

In the first command ./hello.sh, the ./ prefix is necessary so that the shell knows that the hello.sh script is found in the current directory (.). The /bin/bash in the second command explicitly specifies the path to the bash executable (in the event that multiple copies of bash are installed on the machine). In our casual use of the script earlier, we only specified bash instead of the full path, because executables are installed in standard locations on Linux platforms, and the shell searches these locations and uses the first bash that it finds.

To see where your default bash is stored, you can use the which program:

which bash

Of course, hello.sh can still be executed by invoking bash hello.sh instead of ./hello.sh regardless of whether the shebang line is present or not. Nonetheless, the shebang line also serves as documentation for how the script is meant to be run.
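The whole sequence (write the script, make it executable, run it both ways) can be sketched as a single session. This sketch assumes that bash is installed at /bin/bash and works in a scratch directory created with mktemp -d so that no existing file is overwritten; the work_dir name and the here-document used to write the file are conveniences of this example, not something the chapter has introduced.

```shell
# Work in a scratch directory so that no existing hello.sh is overwritten.
work_dir=$(mktemp -d)

# Write the two-line script; the quoted 'EOF' keeps $1 from being expanded now.
cat > "$work_dir/hello.sh" <<'EOF'
#!/bin/bash
echo "Hello, $1."
EOF

# Enable the x flag so that the file can be executed directly.
chmod +x "$work_dir/hello.sh"

# Run it directly (this uses the shebang line) and through an explicit bash call.
direct=$("$work_dir/hello.sh" "Brian Fox")
explicit=$(bash "$work_dir/hello.sh" "Brian Fox")
echo "$direct"
echo "$explicit"

rm -r "$work_dir"
```

Both invocations should print the same greeting, since the shebang line names the same interpreter that the explicit call uses.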

Parameters

When programs are called, the user may supply arguments to parameters in order to control the behaviour of the program. (Parameters are the variables that can be set in a program; arguments are values that a user supplies to the program.) Which parameters exist and how they are used ultimately depend on the author of the program. Fortunately, Linux programs tend to specify parameters in a similar pattern.

program [--opt_arg0] [--opt_arg1 X] [--opt_arg2 Y Z] <pos_arg1> <pos_arg2>

This template command is known as a usage statement for the program.

In the above usage statement, pos_arg1 and pos_arg2 are positional parameters, which are generally mandatory; the arguments passed are assigned to them in order, based on position. These parameters are often enclosed in angle brackets (inequality signs) <>. So, the command

program A B

will set the parameter pos_arg1 to value A and the parameter pos_arg2 to value B. Notice that the parameter names are not specified in the above command; rather, the program understands that the first specified value is for the first positional parameter and the second specified value is for the second positional parameter.

In contrast, optional parameters are not mandatory, and they are usually prefixed by one or two dashes - and possibly enclosed by square brackets []. If the user wishes to specify a value for an optional parameter, the user must refer to the parameter by name. For example,

program --opt_arg1 C A B

will set opt_arg1 to C, and the positional parameters pos_arg1 and pos_arg2 to the same values as before. Some programs will require that optional parameters be specified before positional parameters. That is,

program A B --opt_arg1 C

may not be parsed correctly. Some programs also support the use of -- to indicate the end of optional parameters, as in

program --opt_arg1 C -- A B

These parameter descriptions are simply guidelines that many Linux programs follow. Users should also consult the help message or the manual entry for specific instructions for the program.

It would be helpful to users if you follow these guidelines when you design the command-line interface to your program. Bash does not automatically parse command-line arguments: it simply splits the command line into words delimited by whitespace, as discussed in the ps -p $$ example earlier. For more complete command-line parsing, we can use the getopts command that is built into Bash. Alternatively, we can use newer programs such as docopts, or parse the arguments manually in Bash.
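As an illustration of the getopts built-in, here is a minimal sketch. The parse_args function and its options (-v as a flag, -n taking one value) are hypothetical and chosen only for this example; note also that getopts handles single-letter options such as -n, not long names such as --opt_arg1.

```shell
# parse_args is a hypothetical function: -v is a flag, -n takes one value,
# and everything after the options is treated as positional.
parse_args() {
  local OPTIND=1 opt verbose=0 name=""
  while getopts "vn:" opt; do
    case "$opt" in
      v) verbose=1 ;;
      n) name=$OPTARG ;;          # getopts stores the option's value in OPTARG
      *) echo "usage: parse_args [-v] [-n NAME] <pos_arg>..." >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))           # drop the options that were already parsed
  echo "verbose=$verbose name=$name positional=$*"
}

parse_args -v -n C A B
parse_args -n C -- A B            # -- marks the end of the options
```

The first call should report verbose=1 name=C positional=A B, and the second verbose=0 name=C positional=A B, since getopts stops at -- and leaves the remaining words as positional arguments.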

Job Control

Inside the Bash environment, commands are executed to run various programs to achieve a task. Each running program is also called a job, or more formally, a process. A running job can be terminated by pressing the Ctrl key together with c (referred to as C-c).

The C-z combination suspends a job (but does not terminate it) and returns the user to the Bash command-line prompt. To resume the suspended job in the background, so that the user can do additional work in parallel with it, run bg. To resume the job and bring it to the foreground (current command-line prompt), run fg.

If a command is executed with & at the end, the job is launched in the background. In other words, the & character has an effect similar to pressing C-z on a job and resuming it in the background with bg.

bash -c 'sleep 1; echo "Woke up."' &

Here, a new instance of bash is launched with the -c option, which takes a command to execute: sleep 1; echo "Woke up.". This command sleeps for 1 second, and then prints Woke up. to the terminal. After running this command, the user is immediately returned to the command-line prompt and allowed to issue more commands. After 1 second, the background job will complete and print Woke up. to the same terminal. If the & were omitted, the user would have to wait for the command to complete before running additional commands.
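In a script, the same background launch can be combined with the $! special variable and the wait built-in. This is a sketch under stated assumptions: wait is not discussed in this chapter, and log_file, job_pid, job_status, and message are names made up for this example. The job's output is sent to a scratch file so that it can be inspected afterwards.

```shell
log_file=$(mktemp)

# Launch the job in the background; its output goes to a scratch file.
bash -c 'sleep 1; echo "Woke up."' > "$log_file" &
job_pid=$!                 # $! holds the process ID of that background job
echo "Started job $job_pid; the shell is free to run other commands."

# wait pauses until the given job finishes and returns its exit status.
wait "$job_pid"
job_status=$?

message=$(cat "$log_file")
echo "$message (exit status $job_status)"
rm "$log_file"
```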

In contrast, the C-d combination terminates input to the program, for programs that expect additional user input, such as cat (which stands for concatenate). For example, cat does not terminate after the user calls it, because cat waits for additional input and will continue to wait until C-d indicates the end of the input. Type

cat > rose.txt
What's in a name? that which we call a rose
By any other name would smell as sweet.

and press C-d on a new line. Because of > rose.txt, cat writes your text input to a file named rose.txt instead of the terminal. If you were to press C-c to terminate the program instead, the output file may be incomplete, as the cat process would be interrupted and terminated immediately.

By default, when you launch a terminal, you enter a shell environment (e.g. Bash) and start in the home directory of the user (typically /home/brian for a user named brian on a Linux platform). To navigate to other directories, we use the cd (change directory) command:

cd dir/subdir

The argument following cd specifies the path of the new directory dir/subdir, which must already exist. Directories inside directories are separated by the slash character / on Linux platforms. cd recognizes the special paths ., .., and -. The . path refers to the current directory, .. refers to the parent directory, and - refers to the previous directory. To print the current directory, we use pwd (print working directory). Without any argument, cd returns the user to the home directory.

To print a list of directories and files within a directory, we use ls, and ls -l for a more verbose output that indicates permissions, ownership, time stamp, file size, and other details.
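The special paths can be tried out safely inside a scratch directory. The following is a sketch, assuming that mktemp -d and basename are available (as they are on typical Linux platforms); the variable names are invented for the example.

```shell
start_dir=$(pwd)
base=$(mktemp -d)
mkdir -p "$base/dir/subdir"       # -p also creates the parent directory dir

cd "$base"
cd dir/subdir
in_subdir=$(basename "$(pwd)")    # subdir

cd ..                             # up to the parent directory
in_parent=$(basename "$(pwd)")    # dir

cd - > /dev/null                  # back to the previous directory (cd - also prints it)
in_previous=$(basename "$(pwd)")  # subdir

echo "$in_subdir -> $in_parent -> $in_previous"

cd "$start_dir"
rm -r "$base"
```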

The table below lists additional commands.

Command   Use
cd        Change working directory.
pwd       Print working directory.
ls        List directory contents.
mkdir     Make directory.
rmdir     Remove directory only if it is empty.
touch     Touch a file, creating it if it does not exist.
rm        Remove files or directories.
mktemp    Create a temporary file.
mkfifo    Create a first-in-first-out file (named pipe).
ln -s     Create a symbolic link to a file or directory.
mv        Move files or directories.
cp        Copy files or directories.
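Several of these commands can be combined into a short session. This is only a sketch: the directory and file names (notes, draft.txt, final.txt, latest) are made up, and everything happens inside a scratch directory created with mktemp -d so that nothing of yours is touched.

```shell
start_dir=$(pwd)
scratch=$(mktemp -d)                    # with -d, mktemp creates a temporary directory
cd "$scratch"

mkdir notes                             # make a directory
touch notes/draft.txt                   # create an empty file
cp notes/draft.txt notes/copy.txt       # copy it
mv notes/copy.txt notes/final.txt       # rename (move) the copy
ln -s notes/final.txt latest            # symbolic link pointing at final.txt
listing=$(ls notes)
echo "$listing"

rm latest notes/draft.txt notes/final.txt   # remove the link and both files
rmdir notes                             # succeeds only because notes is now empty

cd "$start_dir"
rmdir "$scratch"
```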

Variables

In order to adapt to changing circumstances, Bash scripts use variables to hold information that is provided by the user and is subject to change. Variables are used to store values temporarily, and values they store can be changed subsequently. The $1 variable is an example of a special variable, which is not defined explicitly by the programmer. Aside from $1, $2, $3, ..., we have seen the $$ special variable, which holds the process id of the current running program (e.g. bash). Below are a few more useful special variables:

Variable     Value
$$           ID of current process
$!           ID of the most recently submitted background process
$0           Name of current process
$1 $2 ...    First, second, ... arguments
$#           Total number (#) of arguments
$@           All arguments as an array (starting with $1)
$?           Exit status of the most recently executed foreground program
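A few of these special variables can be observed with bash -c. In this sketch, demo, A, and "B C" are arbitrary example words; with bash -c, the first word after the command string becomes $0 and the rest become $1, $2, and so on. The report and child_status names are likewise invented for the example.

```shell
# The command string is in single quotes, so the child bash expands
# $0, $#, and $@ itself.
report=$(bash -c 'echo "name=$0 count=$#"; printf "[%s]\n" "$@"' demo A "B C")
echo "$report"

# $? holds the exit status of the last foreground command; here the child
# bash exits with status 3, and the || branch records that status.
child_status=0
bash -c 'exit 3' || child_status=$?
echo "child exit status: $child_status"
```

The report should list two arguments, [A] and [B C], because the quotes keep B C together as a single argument.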

A number of shell variables are also pre-defined by your platform or were defined earlier in the current Bash session. You can print all currently set (defined) shell variables by calling set without any arguments:

set

In addition, several environment variables are also pre-defined. Environment variables are always inherited by the new child processes spawned by the current shell program, whereas shell variables are not inherited by child processes. To see a list of currently defined environment variables, call env without arguments:

env

Among the environment variables, we will point out $PATH and $LD_LIBRARY_PATH, because they determine where executables and shared libraries are found. $PATH contains a list of paths that point to possible storage locations for executables, separated by the colon character :. On most Linux platforms, /usr/bin should be one of these possible locations. Bash searches for programs sequentially in the order the paths appear in $PATH. The $LD_LIBRARY_PATH environment variable lists additional locations that are searched for shared libraries when a program is loaded. Furthermore, the $HOME environment variable points to the home directory of the current user.

In Bash, you can define your own shell variables by using the following syntax:

author_name="Brian Fox"

Here, the author_name variable is defined and holds the value Brian Fox. Variable names cannot contain spaces or special characters, and they cannot begin with a digit. Generally, you should only use letters, numbers, and underscores for variable names. Importantly, no spaces are allowed around =. Inserting a space after author_name will cause Bash to interpret author_name as a program name, since it is an isolated word at the beginning of a command.

We access the value of a variable by using the $ prefix in front of the variable name. By doing so, variable expansion occurs: the variable reference is expanded to the value contained in the variable.

echo $author_name

If we omit the $ prefix, we are simply referring to the text author_name and not the value inside the $author_name variable.

echo author_name

If we were to run another instance of bash (by calling bash), we would not be able to access shell variables from the current bash session.

bash -c 'echo $author_name'

In this command, bash -c starts a new instance of bash, executes the specified command, and returns to the original shell environment following the execution of the command. The specified command echo $author_name is executed within the new instance of bash, and it will print nothing because $author_name is not defined for this bash instance.

In order for new bash instances (subprocesses) to have access to a shell variable, the latter must be exported into the list of environment variables:

export author_name

This command exports the shell variable with the name author_name into the environment, so subprocesses, such as another instance of Bash, can access this variable. (Notice that we did not use the $ prefix, because we are literally specifying the name of the variable: no variable expansion should occur.)

Now the command

bash -c 'echo $author_name'

should print Brian Fox.

To recap, $author_name is expanded to the value of the $author_name variable (Brian Fox) if it is defined in the current Bash session. The $author_name variable is defined in the current Bash session, but not in a new Bash session, until the $author_name has been exported as an environment variable.
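The before-and-after behaviour of export can be captured in one short experiment. This is a sketch: before_export and after_export are names invented for the example, and the unset at the top only ensures that a previously exported author_name does not interfere.

```shell
unset author_name                      # start from a clean slate
author_name="Brian Fox"

# The single quotes delay expansion until the child bash runs the command.
before_export=$(bash -c 'echo "$author_name"')

export author_name
after_export=$(bash -c 'echo "$author_name"')

echo "before export: [$before_export]"
echo "after export:  [$after_export]"
```

The first pair of brackets should be empty and the second should contain Brian Fox.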

The astute reader will notice that we used the single quote ' instead of the double quote " to pass echo $author_name to a new bash instance. This choice was intentional, as we wanted to pass the text echo $author_name literally, without variable expansion, to a new bash instance in order to illustrate the need to export variables.

Suppose we use double quotes in the argument to bash -c,

animal=Penguin
bash -c "echo $animal"

Then, Penguin is printed even though $animal is not defined in the new instance of bash. We observe this behaviour because double quotes " permit variable expansion by the current bash instance while single quotes ' do not. Therefore, the "echo $animal" argument is expanded to echo Penguin, which is then passed to the new instance of bash, which receives -c and echo Penguin as arguments. The new instance of bash no longer needs to know about $animal, because it receives the substituted value.
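The two quoting styles can be compared side by side. In this sketch, with_single and with_double are made-up variable names, and the animal= prefix in the output merely makes an empty expansion visible.

```shell
unset animal
animal=Penguin      # a shell variable, deliberately not exported

# Single quotes: the child bash receives $animal literally and finds it undefined.
with_single=$(bash -c 'echo "animal=$animal"')

# Double quotes: the current bash substitutes Penguin before the child starts.
with_double=$(bash -c "echo animal=$animal")

echo "single quotes: $with_single"
echo "double quotes: $with_double"
```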

The name=value syntax defines a shell variable in the current bash session, but it can also be used to define variables in a new environment if the syntax is directly followed by a command:

food=Bamboo bash -c 'echo $food'

If food=Bamboo and bash -c were on separate lines, the $food variable would be defined for the current bash instance and not for the new bash instance, as discussed earlier. Since food=Bamboo and bash -c are part of the same command, however, the $food variable is actually directly defined in the environment of the new bash instance (and not in the current bash).

Therefore, the new bash instance receives echo $food as arguments (without variable expansion), and it would expand the variable $food into the text value Bamboo, because it received the environment food=Bamboo. Subsequently, the new bash instance would print Bamboo and exit. In contrast, the current bash instance will not have access to $food. That is, the command

echo $food

now prints nothing to the terminal, because the $food variable was never defined in the current bash instance. If we use double quotes instead of single quotes, so that variable expansion occurs before bash -c is invoked, we similarly receive no output, because $food is not defined in the current bash instance, where variable expansion now takes place:

food=Bamboo bash -c "echo $food"

In summary, the syntax for executing a command is

var1=a var2=b var3=c ... program arg1 arg2 arg3 ...
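The per-command form of variable assignment can be checked with a short experiment. This is a sketch with invented names (child_sees, parent_sees, expanded_early); the food= prefix in each output only makes an empty value visible.

```shell
unset food

# The assignment applies only to the environment of this one command.
child_sees=$(food=Bamboo bash -c 'echo "food=$food"')

# The current shell never had $food defined.
parent_sees="food=$food"

# With double quotes, the current shell expands $food (to nothing) too early.
expanded_early=$(food=Bamboo bash -c "echo food=$food")

echo "child:          $child_sees"
echo "parent:         $parent_sees"
echo "expanded early: $expanded_early"
```

Only the first line should show Bamboo; the other two should show an empty value.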

Bash initialization files

When you first enter the shell environment in a terminal, you are working within an interactive, login shell. If you invoke bash at the command-line prompt without arguments, you are creating a subprocess: you are starting another instance of bash. This time, you are using an interactive, non-login bash instance, unless you invoke bash --login. You may exit this subprocess by invoking exit. If you run bash -c and supply an argument containing a command, you are starting a non-interactive, non-login bash instance, which will exit when the command completes. You can also invoke bash script.sh, which runs script.sh using a bash subprocess. Similarly, ./script.sh also runs script.sh in a subprocess as specified by the shebang line. To emphasize, no shell variables are inherited by subprocesses, but the environment variables are inherited by subprocesses, as illustrated with the earlier examples. Shell variables become environment variables when they are exported, and they are in turn inherited by subprocesses.

So what about an independent instance of bash that is invoked in another terminal? A bash instance in one terminal is not a subprocess of the bash instance in another terminal; they are completely independent, and they cannot send variables to each other. In order for both bash instances to have access to a common set of variables, you may define environment variables in initialization script files.

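Most commonly, you would define variables in the ~/.bashrc file (where ~ is a short-hand for $HOME). The ~/.bashrc script is run whenever an interactive, non-login bash instance is started (e.g. by invoking bash without arguments). Non-interactive instances (e.g. bash -c, bash script.sh, ./script.sh) do not read ~/.bashrc by default; they rely on the environment variables inherited from their parent. If bash is started as a login shell (e.g. by using bash --login or bash -l), the ~/.bash_profile initialization script will be run instead.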

Bash also runs the global /etc/profile initialization script in addition to the personal ~/.bash_profile. Furthermore, some platforms may store additional global initialization scripts in /etc/profile.d/ that are run by the /etc/profile script. These script files are only run in a login bash instance (e.g. the first bash invoked inside a terminal, or bash --login). The idea is that these profile scripts are sourced once during login, and each interactive bash subprocess sources ~/.bashrc.

On many systems, the ~/.bash_profile actually sources ~/.bashrc so that commands in ~/.bashrc will eventually be executed. (You should verify that your ~/.bash_profile contains a line with . ~/.bashrc, source ~/.bashrc, or something equivalent.) Above all, it is important to export all variables intended for global use in the ~/.bashrc initialization script.

Suppose you wish to define your own initialization script that defines non-exported variables for certain tasks. You can create some script file named initialize.sh stored at path and run it within the current bash instance by:

. path/initialize.sh

The space after . is critical, as this . does not refer to the current directory; it instructs bash to run the script in the current instance instead of a subprocess. If you run the script by ./path/initialize.sh or bash path/initialize.sh, you would define the variables in a subprocess, and these variables will not be accessible to the current bash instance. The export command will not help, since only subprocesses inherit environment variables: an environment variable in the subprocess will not be accessible to the parent bash instance. Executing initialize.sh with . is equivalent to copying over the content of initialize.sh and running it in the current bash instance. This command is also known as source, so the command below does exactly the same thing:

source path/initialize.sh

In fact, you may notice that the standard initialization script files (e.g. /etc/profile) often source other script files using . or source.
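The difference between running a script in a subprocess and sourcing it can be observed directly. In this sketch, the one-line initialize.sh is written to a scratch directory, and project_name, after_subprocess, and after_source are hypothetical names used only for the illustration.

```shell
unset project_name
scratch=$(mktemp -d)

# A one-line initialization script that defines a non-exported variable.
echo 'project_name=demo' > "$scratch/initialize.sh"

# Run in a subprocess: the variable is set there and lost when it exits.
bash "$scratch/initialize.sh"
after_subprocess="[$project_name]"

# Run in the current instance with the . command: the variable persists.
. "$scratch/initialize.sh"
after_source="[$project_name]"

echo "after bash:   $after_subprocess"
echo "after source: $after_source"
rm -r "$scratch"
```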

Summary

So what's with all the fuss about who inherits what and who has access to what? This chapter illustrated an important concept in modern programming: compartmentalization. Information stored in variables is compartmentalized so that one bash instance does not necessarily have access to all the information; an effective programmer will control the scope of each variable appropriately so that each process has access to the right information at the right time. Indeed, if all variables were shared among all processes, and each process could modify each variable, then it would become increasingly difficult to predict variable content and program behaviour as the size and complexity of the program grew. Therefore, it is important to understand how information flows from external environments to the current program (via command-line arguments), from parent bash instances to child instances (via environment variables), as well as how some information can be propagated among independent bash instances (via modifications to the initialization scripts). The next chapter will elaborate more on how information can flow from one program to another independent program.
