Chapter 1
This chapter will introduce you to the shell environment from both a user's and a programmer's perspective. The Portable Operating System Interface (POSIX) standards specify how programs should implement their interface so that they are portable across Unix-like platforms. Bash follows these standards and augments them with additional features.
A shell is a user-interface to an operating system's services, and this guide is about the command-line interface for the shell. We run a terminal program in order to use this interface. In turn, the terminal invokes a shell program, which is typically bash
. So, launch a terminal for your platform, and let's get started!
Hopefully, your terminal now eagerly awaits your command, and it is running a shell program, which is probably bash
. You can verify this by
After you typed the above command and pressed enter
, the command should print something similar to the output below.
Your output will be slightly different, but the important part is that the CMD
column shows bash
. My output indicates that I am using an instance of bash
with the process id of 25719 (yours will very likely be different). Every process on Linux is assigned a process ID so that they may be uniquely identified by commands such as kill
, which can terminate a launched process.
Note that most commands introduced hitherto should be similarly typed into the terminal following the prompt string (a short piece of text that typically ends with the $
character). If the command-line prompt string is ever missing, it means that your current command is incomplete. Unless you know your command spans multiple lines, your command may contain a syntax error. In this case, you should press C-c
(hold Ctrl
key and press c
) to interrupt the current command and return to the command-line prompt so that you may start over from the last successfully executed command.
Commands are interpreted by bash
and executed sequentially. Bash commands follow the format
Our command begins with the name of the program (ps
), followed by the arguments (-p $$
). (An argument is a value passed to a program; it can pretty much be anything and does not need to be persuasive.) The arguments are split by whitespace characters (i.e. space, tab, and new-line) and sent as separate values to the program. The program will receive -p
as the first argument and $$
as the second argument. Typically, Linux programs are written so that arguments begining with the -
character are interpreted as parameter names, which may consume one or more additional arguments. In our example, $$
is the argument to the -p
parameter; that is, the program will set the -p
parameter to the value of $$
.
In general, words that begin with $
are variable names, and Bash will expand variable names into their current values. The $$
variable is expanded into 25719 in the above example and the ps
program will actually receive -p
and 25719
as command-line arguments (not -p
and $$
).
The above commands should work if you are using a shell that follows the POSIX standards. Since this guide is strictly about Bash, if the ps -p $$
command outputted a shell other than bash
, then invoke bash
and enter the Bash environment by simplying entering
This command starts a new instance of bash and enters its environment. When you wish to return to your original shell environment, type
Bash script
Commands can be typed into bash
, but they can also be stored in a text file known as a script, which can be run upon request. By convention, shell scripts have the extension sh
.
Open a file in the current directory using your text editor (e.g. nano
, vim
, emacs
), and enter the following text
Save the file to disk. Then, run the script using bash
by
In the above command, bash
took hello.sh
, Brian
, and Fox
as arguments and runs the hello.sh
script with arguments Brian
and Fox
. The hello.sh
script contains a line of command that runs the echo
program (which simply outputs a given text) with the argument "Hello, $1."
. This argument is converted by Bash into Hello, Brian.
, and it is passed to echo
for output to the terminal.
The $1
variable refers to the first argument. Accordingly, $2
refers to the second, $3
refers to the third, and so on. Since $2
is not used, the second argument (Fox
) does not appear in the output. In order to send Brian Fox
as one argument, we must surround the text using quotes:
The quotes preserve the space characters and protect the argument Brian Fox
from being split into two arguments.
The current script will work in any POSIX-compliant shell, but we can be (and should be) more specific in how the script is to be run by including a shebang line (which begins with #!
in the script:
Additionally, we need to make the script executable by enabling the x
flag for the file using the chmod
(change file modes) program:
Then, we can execute (run) the script by
The shebang line in the script instructs the shell to run the script using the executable at /bin/bash
, the standard location for bash
on most Linux platforms. Effectively, the script is executed by the following command
In the first command ./hello.sh
, the .
is necessary so that the shell knows that the hello.sh
script is found in the current directory (.
). The /bin/bash
in the second command explicitly specifies the path to the bash
executable (in the event that multiple instances of bash
are installed on the machine. In our casual use of the script earlier, we only specified bash
instead of the full path, because executables are installed in standard locations on Linux platforms, and the shell searches standard locations for the first instance of bash.
To see where your default bash
is stored, you can use the which
program:
Of course, hello.sh
can still be executed by invoking bash hello.sh
instead of ./hello.sh
regardless of whether the shebang line is present or not. Nonetheless, the shebang line also serves as documentation for how the script is meant to be run.
Parameters
When programs are called, the user may supply arguments to parameters in order to control the behaviour of the program. (Parameters are the variables that can be set in a program; arguments are values that a user supplies to the program.) What parameters and how these parameters are used by the program ultimately depends on the author of the program. Fortunately, Linux programs tend to specify parameters in a similar pattern.
This template command is known as a usage statement for the program.
In the above usage statements, pos_arg1
and pos_arg2
are positional parameters, which are generally mandatory and passed arguments are assigned in order based on position. These parameters are often enclosed in angular brackets (inequality signs) <>
. So, the command
will set the parameter pos_arg1
to value A
and the parameter pos_arg1
to value B
. Notice that the parameter names are not specified in the above command; rather, the program understands that the first specified value is for the first positional parameter and the second specified value is for the second positional parameter.
In contrast, optional parameters are not mandatory, and they are usually prefixed by one or two dashes -
and possibly enclosed by square brackets []
. If the user wishes to specify a value for an optional parameter, the user must refer to the parameter by name. For example,
will set opt_arg1
to C
, and the positional parameters pos_arg1
and pos_arg2
to the same values as before. Some programs will require that optional parameters be specified before position parameters. That is,
may not be parsed correctly. Some programs also support the use of --
to indicate the end of optional parameters, as in
These parameter descriptions are simply guidelines that many Linux programs follow. Users should also consult the help message or the manual entry for specific instructions for the program.
It would be helpful to users if you follow these guidelines when you design the command-line interface to your program. bash
does not automatically parse command line arguments: it simply splits into words as delimited by whitespace, as discussed in the ps -p $$
example earlier. For more complete command line parsing, we can use the getopts
command that is built into Bash. Alternatively, we can use newer programs such as docoptp
, or parse the arguments using Bash.
Job Control
Inside the Bash environment, commands are executed to run various programs to achieve a task. Each running program is also called a job, or more formally, a process. A running job can be terminated by pressing the Ctrl
key together with c
(refered to as C-c
).
The C-z
combination suspends a job and sends it into the background (but does not terminate it) so that the user can continue to interact with the Bash command-line and do additional work in parallel to the background job. To resume the job previously sent to the background while keeping it in the background, run bg
. To resume a background job and bring it to the foreground (current command-line prompt), run fg
.
If a command is executed with &
at the end, the job is launched in the background. In other words, the &
character has a similar effect as pressing C-z
on a job and resuming it in the background with bg
.
Here, a new instance of bash
is launched with the -c
argument, which takes a command to execute: sleep 1; echo "Woke up."
. This command sleeps for 1 second, and prints Woke up
to the terminal. After running this command, the user is immediately returned to the command-line prompt and allowed to issue more commands. After 1 second, the background running job will complete and print Woke up.
to the same terminal. If the &
were omitted, the user would have to wait for the command to complete before running additional commands.
In contrast, the C-d
combination terminates input to the program, for programs that expect additional user inputs, such as cat
(which stands for concatenation). For example, cat
does not terminate after the user calls it, because cat
waits for additional input and will continue to wait unless it receives the C-d
signal. Type
and press C-d
. Thereafter, cat > rose.txt
outputs your text input to a file named rose.txt
. If you were to press C-c
to terminate the program, the output file may be incomplete, as the cat
process will be interrupted and terminated immediately.
Navigating directories
By default when you launch a terminal, you enter a shell environment (e.g. Bash) and start in the home directory of the user (typically /home/brian
for a user named brian
on a Linux platform. To navigate to other directories, we use the cd
(change directory) command:
The argument following cd
specifies the path of the new directory dir/subdir
, which must already exist. Directories inside directories are separated by the slash character /
on Linux platforms. cd
recognizes special paths .
, and ..
, and -
. The .
path refers to the current directory, ..
refers to the parent directory, and -
refers to the previous directory. To print the current directory, we use pwd
(print working directory). Without any argument cd
returns the user to the home directory.
To print a list of directories and files within a directory, we use ls
, and ls -l
for a more verbose output that indicates permission, ownership, time stamp, file size, and other detals.
The table below lists additional commands
cd
Change working directory.
pwd
Print working directory.
ls
List directory contents.
mkdir
Make directory.
rmdir
Remove directory only if it is empty.
touch
Touch a file, creating it if it does not exist.
rm
Remove files or directories.
mktemp
Create a temporary file.
mkfifo
Create a first-in-first-out file (named pipe).
ln -s
Create a symbolic link to a file or directory.
mv
Move files or directories.
cp
Copy files or directories.
Variables
In order to adapt to changing circumstances, Bash scripts use variables to hold information that is provided by the user and is subject to change. Variables are used to store values temporarily, and values they store can be changed subsequently. The $1
variable is an example of a special variable, which is not defined explicitly by the programmer. Aside from $1
, $2
, $3
, ..., we have seen the $$
special variable, which holds the process id of the current running program (e.g. bash
). Below are a few more useful special variables:
$$
ID of current process
$!
ID of the most recently submitted background process
$0
Name of current process
$1
$2
...
First, second, ... arguments
$#
Total number (#
) of arguments
$@
All arguments as an array (starting with $1
)
$?
Exit status of the most recently executed foreground program
A number of shell variables are also pre-defined by your platform or previously in the current Bash session. You can print all currently set (defined) shell variables by calling set
without any arguments:
In addition, several environment variables are also pre-defined. Environment variables are always inherited by the new children processes spawned by the current shell program, whereas shell variables are not inherited by children processes. To see a list of currently defined environment variables, call env
without arguments:
Among the environment variables, we will point out $PATH
and $LD_LIBRARY_PATH
, because they are particularly important for Bash to find executables and shared libraries. $PATH
contains a list of paths that point to possible storage locations for executables, separated by the colon character :
. On most Linux platforms, /usr/bin
should be one of these possible locations. Bash searches for programs sequentially in the order the paths appear in the $PATH
. The $LD_LIBRARY_PATH
environment variable contains locations where shared libraries are stored. Furthermore, the $HOME
environmental variable points to the home directory of the current user.
In Bash, you can define your own shell variables by using the following syntax:
Here, the author_name
variable is defined and holds the value Brian Fox
. Variable names cannot contain spaces or special characters. Generally, you should only use letters, numbers, and underscore for variable names. Importantly, no space surrounding =
is allowed. Inserting a space after author_name
will cause Bash to interpret author_name
as a program name, since it is an isolated word at the beginning of a command.
We access the value of a variable by using the $
prefix in front of the variable name. By doing so, variable expansion occurs: the variable reference is expanded to the value contained in the variable.
If we omit the $
prefix, we are simply refering to the text author_name
and not the value inside the $author_name
variable.
If we were to run another instance of bash (by calling bash
), we would not be able to access shell variables from the current bash session.
In this command, bash -c
starts a new instance of bash
, executes the specified command, and returns to the original shell environment following the execution of the command. The specified command echo $author_name
is executed within the new instance of bash
, and it will print nothing because $author_name
is not defined for this bash
instance.
In order for new bash
instances (subprocesses) to have access to a shell variable, the latter must be exported into the list of environment variables:
This command exports the shell variable with the name author_name
into the environment, so subprocesses, such as another instance of Bash, can access this variable. (Notice that we did not use the $
prefix, because we are literally specifying the name of the variable: no variable expansion should occur.)
Now the command
should print Brian Fox
.
To recap, $author_name
is expanded to the value of the $author_name
variable (Brian Fox
) if it is defined in the current Bash session. The $author_name
variable is defined in the current Bash session, but not in a new Bash session, until the $author_name
has been exported as an environment variable.
The astute reader will notice that we used the single quote '
instead of the double quote "
to pass echo $author_name
to a new bash
instance. This choice was intentional, as we wanted to pass the text echo $author_name
literally, without variable expansion, to a new bash
instance in order to illustrate the need to export variables.
Suppose we use double quotes in the argument to bash -c
,
Then, Penguin
is printed despite that the $animal
is not defined in the new instance of bash
. We observe this behaviour, because double quotes "
permit variable expansion by the current bash
instance while single quotes '
do not. Therefore, the "echo $animal"
argument is expanded to echo Penguin
, which is then passed to the new instance of bash
, which receives -c
and echo Penguin
as arguments. The new instance of bash
no longer needs to know about $animal
, because it receives the substituted value.
The name=value
syntax defines a shell variable in the current bash
session, but it can also be used to define variables in a new environemnt if the syntax is directly followed by a command:
If food=Bamboo
and bash -c
were on separate lines, the $food
variable would be defined for the current bash
instance and not for the new bash
instance, as discussed earlier. Since food=Bamboo
and bash -c
are part of the same command, however, the $food
variable is actually directly defined in the environment of the new bash
instance (and not in the current bash).
Therefore, the new bash
instance receives echo $food
as arguments (without variable expansion), and it would expand the variable $food
into the text value Bamboo
, because it received the environment food=Bamboo
. Subsequently, the new bash
instance would print Bamboo
and exit. In contrast, the current bash
instance will not have access to $food
. That is, the command
now prints nothing to the terminal, because the $food
variable is no longer defined. Suppose we now use double quotes instead of single quotes to permit variable expansion before invoking bash -c
, we similarly receive no output, because the $food
is not being defined in the current bash
instance, where variable expansion now takes place:
In summary, the syntax for executing a command is
Bash initialization files
When you first enter the shell environment in a terminal, you are working within an interactive, login shell. If you invoke bash
at the command-line prompt without arguments, you are creating a subprocess: you are starting another instance of bash
. This time, you are using an interactive, non-login bash
instance, unless you invoke bash --login
. You may exit this subprocess by invoking exit
. If you run bash -c
and supply an argument containing a command, you are starting a non-interactive, non-login bash
instance, which will exit when the command completes. You can also invoke bash script.sh
, which runs script.sh
using a bash
subprocess. Similarly, ./script.sh
also runs script.sh
in a subprocess as specified by the shebang line. To emphasize, no shell variables are inherited by subprocesses, but the environment variables are inherited by subprocesses, as illustrated with the earlier examples. Shell variables become environment variables when they are exported, and they are in turn inherited by subprocesses.
So what about an independent instance of bash
that is invoked in another terminal? A bash
instance in one terminal is not a subprocess of the bash
instance in another terminal; they are completely independent, and they cannot send variables to each other. In order for both bash
instances to have access to a common set of variables, you may define environment variables in initialization script files.
Most commonly, you would define variables in the ~/.bashrc
file (where ~
is a short-hand for $HOME
). The ~/.bashrc
script is run whenever a non-interactive bash
instance is started (e.g. bash -c
, bash script.sh
, ./script.sh
). If bash
is started interactively (e.g. by using bash --login
or bash -l
, the ~/.bash_profile
initialization script will be run.
Bash also runs the global /etc/profile
initialization script in addition to the personal ~/.bash_profile
. Furthermore, some platforms may store additional global initializations script /etc/profile.d/
that are run by the /etc/profile
script. These script files are only run in an interactive, login bash
instance (e.g. first invoked bash
inside a terminal and bash --login
). The idea is that these profile scripts are sourced once during login, and each bash
subprocess sources ~/.bashrc
.
On many systems, the ~/.bash_profile
actually sources ~/.bashrc
so that commands in ~/.bashrc
will eventually be executed. (You should verify that your ~/.bash_profile
contains a line with . ~/.bashrc
, source ~/.bashrc
, or something equivalent. Above all, it is important to export
all variables intended for global use in the ~/.bashrc
initialization script.
Suppose you wish to define your own initialization script that defines non-exported variables for certain tasks. You can create some script file named initialize.sh
stored at path
and run it within the current bash
instance by:
The space after .
is critical, as this .
does not refer to the current directory; it instructs bash
to run the script in the current instance instead of a subprocess. If you run the script by ./path/initialize.sh
or bash path/initialize.sh
, you would define the variables in a subprocess, and these variables will not be accessible to the current bash
instance. The export
command will not help, since only subprocesses inherit environment variables: an environment variable in the subprocess will not be accessible to the parent bash
instance. Executing initialize.sh
with .
is equivalent to copying over the content of initialize.sh
and running them in the current bash
instance. This command is also known as source
, so the below command does exactly the same:
In fact, you may notice that the standard initialize script files (e.g. /etc/profile
) often source other script files using .
or source
.
Summary
So what's with all the fuss about who inherits what and who has access to what? This chapter illustrated an important concept in modern programming: compartmentalization. Information stored in variables are compartmentalized so that one bash
instance does not necessarily have access to all the information; an effective programmer will control the scope of each variable appropriately so that each process has access to the right information at the right time. Indeed, if all variables are shared among all processes, and each process can modify each variable, then it would be increasingly difficult to predict variable content and program behaviour as the size and complexity of the program grows. Therefore, it is important to understand how information flows from external environments to the current program (via command-line arguments), from parent bash
instances to child instances (via environment variables), as well as how some information can be propagated among independent bash
instances (via modifications to the initialization scripts). The next chapter will elaborate more on how information can flow from one program to another independent program.
Last updated