1 Introducing the shell
2 Navigating files and directories
3 Working with files and directories
4 Pipes and filters
5 Shell scripts
6 Loops
7 Finding things
- 7.1 Find
- 7.2 Grep
8 Shell extras
9 Credits
10 References
11 Data Sources

Introducing the shell

Reasons to use the shell

Automate basic tasks
Underlies many other open source languages and applications and can be used to glue them together
Essential for system administration, remote computing, and high-performance computing
Many concise special-purpose tools that can make your life easier
Complements more fully-featured application programming languages

“Unix”

PowerPoint slides: Unix family tree

Unix-like operating systems share a common architecture and layout
Roughly compatible, with similar (or identical) shells and tools
The environment in which most open-source software was written

“Shell”

Broadly speaking, there is a tension between making computer systems fast and making them easy to use.
A common solution is to create a 2-layer architecture: A fast, somewhat opaque core surrounded by a more friendly scriptable interface (also referred to as “hooks” or an “API”). Examples of this include video games, Emacs and other highly customizable code editors, and high-level special-purpose languages like Stata and Mathematica.
The Unix shell is the scriptable layer around the operating system. It provides a simple interface for making the operating system do work without having to know exactly how it accomplishes that work.

The past is always with us

The design and terminology of modern computers are based on metaphors from a previous age.

Files and folders
Teletype input and output
Modern touch devices don’t expose the file system, so you may be less comfortable with navigating directory trees than people whose primary computing devices were desktop computers

Navigating files and directories

File system layout

PowerPoint slides: “Navigating files and directories”

Who are you?

whoami

Where are you?

Current working directory

pwd                             # Print Working Directory

By default, this is probably your home directory (discuss how to view this in Finder or File Explorer)
1. Linux
```
/home/nelle
```
2. Mac OS
```
/Users/nelle
```
3. Windows
```
C:\Users\nelle
```

What’s in this directory?

List the contents of the directory

ls                              # List directory contents

Command flags modify what a command does
```
ls -F     # mark each entry's type: / directory, * executable, @ symbolic link
```
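A small sketch of what the markers look like. It assumes a throwaway directory made with `mktemp -d` and hypothetical names (`data`, `notes.txt`, `run.sh`) that are not part of the lesson data:

```shell
# Work in a temporary directory so nothing in your home folder is touched
demo=$(mktemp -d)
cd "$demo"

mkdir data                  # a directory
echo "hello" > notes.txt    # a plain text file
echo 'echo hi' > run.sh     # a tiny script...
chmod u+x run.sh            # ...marked as executable

ls -F
# Prints something like: data/  notes.txt  run.sh*
```

Plain files get no marker; directories end in `/` and executables in `*`.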

Getting help

ls --help                       # In-line help; works with GNU ls (e.g. Linux, Git Bash on Windows), not macOS ls
man ls                          # Manual for "ls"

You can navigate through the man page using the space bar and arrow keys
Quit man with “q”
Online references are available for Windows users who don’t have man pages: https://linux.die.net/

Exploring other directories

When a command is followed by an argument, it acts on that argument.

ls -F Desktop                   # get contents of folder
ls -F Desktop/shell-lesson-data # get contents of subfolder

Move down the directory tree

cd Desktop
cd shell-lesson-data
cd exercise-data

Now that you’re “in” a new location, the context for your commands is different

pwd
ls -F

# This produces an error because the folder is in a different location
# relative to the working directory
cd shell-lesson-data

Move up the directory tree

In paths, . is shorthand for “current directory”; .. is shorthand for “parent directory”.

# Show hidden files, including current and parent directories
ls -a

# You can combine flags
ls -Fa

# Move to parent directory
cd ..

Shortcuts

cd ~   # go to home directory
cd -   # go back to previous directory

Relative vs. absolute paths

An absolute path specifies a location from the root of the file system.
A relative path specifies a location starting from the current location.
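To see that both kinds of path can name the same place, here is a sketch that builds a throwaway tree with `mktemp -d` (the directory names only mimic the lesson layout) and reaches one directory by an absolute and then a relative route:

```shell
# Build a throwaway tree with names that mimic the lesson layout
demo=$(mktemp -d)
mkdir -p "$demo/shell-lesson-data/exercise-data"

# Absolute path: begins with / (the root), so it works from any location
cd "$demo/shell-lesson-data/exercise-data"
abs_dir=$(pwd)

# Relative path: resolved from wherever you are now
cd "$demo"
cd shell-lesson-data/exercise-data
rel_dir=$(pwd)

# .. is relative too: one level up from the current directory
cd ..
parent_dir=$(pwd)

echo "$abs_dir"      # same directory...
echo "$rel_dir"      # ...reached by a different route
echo "$parent_dir"   # ends in /shell-lesson-data
```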

Working with files and directories

Creating directories

See where we are and what we have

pwd
cd exercise-data/writing  # traverse several layers at once
ls -F

Create a directory

# Make a subdirectory
mkdir thesis
ls -F

# Make multiple directories; create intermediate dirs as required
mkdir -p ../project/data ../project/results

# Show all directory contents recursively
ls -FR ../project
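The difference `-p` makes can be sketched in a temporary directory (the `project` names are illustrative). Plain `mkdir` refuses to create `project/data` while `project/` is missing; `mkdir -p` creates the missing parents and does not complain about directories that already exist:

```shell
demo=$(mktemp -d)
cd "$demo"

# Without -p, mkdir fails because the parent project/ does not exist yet
if mkdir project/data 2>/dev/null; then
    echo "created project/data"
else
    echo "mkdir failed: project/ does not exist"
fi

# With -p, missing parents are created as needed
mkdir -p project/data project/results
mkdir -p project/data          # running it again is not an error

ls -FR project
```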

Create a text file. Files you create in the terminal also appear in your graphical file browser (Finder or File Explorer), and vice versa.
```
cd thesis
nano draft.txt
```
```
This is my first draft
boop beep boop
```
Edit with Notepad / TextEdit, then re-edit with nano.

Moving files and directories

Move our file to a new location

cd ~/Desktop/shell-lesson-data/exercise-data/writing

# Rename the file by moving it
mv thesis/draft.txt thesis/quotes.txt

# Verify the new file name
ls thesis

# You can also specify the exact file name
ls thesis/quotes.txt

Move our file to the current working directory

mv thesis/quotes.txt .
ls thesis/quotes.txt # Not here anymore
ls                   # now here

Copying files and directories

Copy a single file

cp quotes.txt thesis/quotations.txt
ls thesis
ls

# Alternatively, check both files with a single command
ls quotes.txt thesis/quotations.txt

Copy a directory recursively

cp -r thesis thesis_backup
ls thesis thesis_backup

Removing files and directories

Remove a file
```
rm quotes.txt
ls quotes.txt  # Error: the file no longer exists
```
Remove a file interactively (-i asks for confirmation first). Deletion is forever!
```
rm -i thesis_backup/quotations.txt
```

Remove a directory and its contents

rm thesis      # Error: rm will not remove a directory without -r
rm -ri thesis  # Remove recursively, confirming each item (-i)

Create a backup archive

Deletion is forever. Consider making a backup archive as part of your workflow.

Create an archive with tar (“tape archive”).

cd ~/Desktop/shell-lesson-data/exercise-data/

# [c]reate a new archive with the given [f]ilename
tar -cf writing.tar writing/

Create a compressed (zipped) archive.

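# [a]uto-compress the archive based on its file extension
# (zip output needs bsdtar, the default tar on macOS and the tar.exe built into
# Windows; GNU tar, the usual tar on Linux, does not create zip archives)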
tar -acf writing.zip writing/

# FYI, you may also see
tar -a -cf writing.zip writing/

# FYI, Linux servers frequently use g[z]ip
tar -z -cf writing.tgz writing/

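tar is an old utility and can be finicky about the order of flags. In particular, -f takes the archive name as its argument, so keep f last in a group of flags (tar -cf, not tar -fc).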

Extract your archive

mv writing writing_backup

# e[x]tract the archive to get the original files back
tar -xf writing.zip

# Compare the old and restored directories
ls writing
ls writing_backup
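The whole round trip can be checked with `diff -r`, which prints nothing and succeeds when two directory trees are identical. This sketch uses a temporary directory, a made-up stand-in for `writing/`, and gzip (`-z`), which both GNU tar and bsdtar support:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical stand-in for the lesson's writing/ directory
mkdir -p writing/thesis
echo "This is my first draft" > writing/thesis/draft.txt
echo "boop beep boop" > writing/notes.txt

# [c]reate a g[z]ipped archive; f comes last because it takes the file name
tar -czf writing.tgz writing/

# Lis[t] the archive's contents without extracting anything
tar -tzf writing.tgz

# Move the original aside, then e[x]tract the archive
mv writing writing_backup
tar -xzf writing.tgz

# No output from diff -r means the restored tree matches the original
diff -r writing writing_backup && echo "restored copy matches"
```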

Many more standard command-line utilities are documented in the GNU Coreutils manual: https://www.gnu.org/software/coreutils/manual/coreutils.html

Operations with multiple files and directories

Copy with multiple file names

cd ~/Desktop/shell-lesson-data/exercise-data/

mkdir creatures_backup   # the destination directory must already exist
cp creatures/minotaur.dat creatures/unicorn.dat creatures_backup/

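Copy using globs (the name is short for “global”). A ? matches exactly one character and a * matches any number of characters, including none. The shell replaces the pattern with the list of matching file names before the command runs; this is an example of shell expansion.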

mkdir proteins_backup

# The shell expands *.pdb into the list of all matching files, then does `cp`
cp proteins/*.pdb proteins_backup/
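A quick way to preview a glob is to hand it to `echo`, which just prints whatever the shell expanded it into. This sketch creates empty files in a temporary directory, named like the files in the lesson's proteins/ folder:

```shell
demo=$(mktemp -d)
cd "$demo"

# Empty files named like the lesson's proteins/ data
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

# echo shows the expansion the shell performs before any command runs
echo *.pdb          # every .pdb file
echo *ethane.pdb    # ethane.pdb methane.pdb (* may match nothing)
echo ?ethane.pdb    # methane.pdb (? matches exactly one character)
echo p*.pdb         # pentane.pdb propane.pdb
```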

Pipes and filters

The “Unix Philosophy” is to combine many small tools that do one job into a processing pipeline.

Motivating example with `wc`

FYI, .pdb is the Protein Data Bank format

Count lines, words, and characters in a file using wc

cd ~/Desktop/shell-lesson-data/exercise-data/proteins/
ls

# Inspect cubane.pdb
cat cubane.pdb

# [w]ord [c]ount for cubane.pdb
wc cubane.pdb

Run wc for all files

# Run the command with default options
wc *.pdb


wc -l *.pdb # lines
wc -c *.pdb # bytes (same as characters for plain ASCII; -m counts characters)
wc -w *.pdb # words
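To check what each flag counts, here is a sketch on a tiny made-up file whose counts are easy to verify by hand. Feeding the file through `<` (standard input) makes wc print only the number, without the file name:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical file: 3 lines, 6 words, 28 bytes
printf 'one two\nthree\nfour five six\n' > sample.txt

wc sample.txt       # lines, words, bytes, then the file name

# Reading from stdin (<) drops the file name; $(( )) trims any padding
lines=$(( $(wc -l < sample.txt) ))
words=$(( $(wc -w < sample.txt) ))
bytes=$(( $(wc -c < sample.txt) ))
echo "$lines lines, $words words, $bytes bytes"
```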

Capturing output from commands

# Redirect output to file
wc -l *.pdb > lengths.txt
ls lengths.txt
cat lengths.txt       # Inspect contents
head -n 1 lengths.txt # Inspect 1st line
less lengths.txt      # Inspect with pager

Filtering output

sort is a filter: it reads lines of input, sorts them, and writes the result to standard output without changing the original file.

sort lengths.txt    # text sort: compares character by character, so 10 comes before 9
sort -n lengths.txt # numeric sort
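The difference between the two sorts shows up as soon as numbers have different widths. A minimal sketch (`LC_ALL=C` is added here only to make the text sort independent of your locale settings):

```shell
# The same three numbers sorted as text and as numbers
text_sorted=$(printf '10\n9\n100\n' | LC_ALL=C sort | tr '\n' ' ')
num_sorted=$(printf '10\n9\n100\n' | sort -n | tr '\n' ' ')

echo "$text_sorted"   # 10 100 9  (compared character by character)
echo "$num_sorted"    # 9 10 100  (compared as numbers)
```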

Send filtered output to new file

sort -n lengths.txt > sorted_lengths.txt
cat sorted_lengths.txt

(Optional) Append to the end of a file using >>

cd ~/Desktop/shell-lesson-data/exercise-data/animal-counts/

# Create new file
head -n 3 animals.csv > animals-subset.csv

# Append to that file
tail -n 2 animals.csv >> animals-subset.csv
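The key difference is that `>` empties the target file before writing, while `>>` adds to the end. A sketch with a hypothetical five-line file standing in for animals.csv:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical five-line input file
printf 'a\nb\nc\nd\ne\n' > letters.txt

head -n 3 letters.txt > subset.txt    # > creates (or empties) subset.txt
tail -n 2 letters.txt >> subset.txt   # >> appends to the end
after_append=$(( $(wc -l < subset.txt) ))

head -n 1 letters.txt > subset.txt    # > again: the previous contents are lost
after_overwrite=$(( $(wc -l < subset.txt) ))

echo "$after_append lines after appending, $after_overwrite after overwriting"
```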

Passing output to another command

Pipe output from one command directly into a second command without creating an intermediate file. This is the cornerstone of Unix workflows.

cd ~/Desktop/shell-lesson-data/exercise-data/proteins/  # if you did the optional step above
sort -n lengths.txt | head -n 1

Combining multiple commands

Daisy-chain your commands together. As long as the output of command X is a legitimate input for command Y, it will work.

# Return to the beginning
wc -l *.pdb | sort -n

# Add additional commands
wc -l *.pdb | sort -n | head -n 1
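To compare the two approaches side by side, here is a sketch with three hypothetical .pdb files of known length in a temporary directory. The pipeline gives the same answer as the intermediate-file version without leaving lengths.txt behind:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical .pdb files with 3, 1, and 2 lines
printf 'x\nx\nx\n' > a.pdb
printf 'x\n'       > b.pdb
printf 'x\nx\n'    > c.pdb

# Two steps with an intermediate file...
wc -l *.pdb > lengths.txt
sort -n lengths.txt | head -n 1

# ...or one pipeline. wc also prints a "total" line, which sort -n puts last.
shortest=$(wc -l *.pdb | sort -n | head -n 1)
echo "$shortest"    # the 1-line file, b.pdb
```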

History and pipes

The shell saves your command history (typically the last 500 or 1000 commands)
- You can see previous commands using the up/down arrows
- You can edit the command that’s currently visible and run it

Once your command history gets big, you might want to search it:

history           # or `history -1000` in zsh on Mac
history | grep ls # pipe the output of history into search

Shell scripts

We should save this stuff and reuse it.

Creating and running a script

Create a new script
```
cd ~/Desktop/shell-lesson-data/exercise-data/proteins/
nano middle.sh
```

Edit the script file and save

# Get lines 11-15
head -n 15 octane.pdb | tail -n 5

Execute the script
```
bash middle.sh
```

Generalize your script

Use a special variable to run the script on any file. $1 expands to the first argument given to the script; the double quotes around "$1" keep a file name that contains spaces together as a single argument.

nano middle.sh

# Use the 1st argument as your input.
head -n 15 "$1" | tail -n 5

bash middle.sh octane.pdb
bash middle.sh pentane.pdb

Use additional ordered arguments

nano middle.sh

# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"

bash middle.sh pentane.pdb 15 5

Use unlimited arguments

nano sorted.sh

# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n

bash sorted.sh *.pdb ../creatures/*.dat
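"$@" expands to every argument, each kept as a separate word even if it contains spaces. A sketch with two hypothetical input files, one of them with a space in its name:

```shell
demo=$(mktemp -d)
cd "$demo"

cat > sorted.sh <<'EOF'
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
EOF

# Hypothetical inputs: 3 lines and 1 line
printf '1\n2\n3\n' > long.txt
printf '1\n' > "short one.txt"

# "short one.txt" reaches wc as one file name, not two
bash sorted.sh long.txt "short one.txt"
```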

(Optional) Text processing with Unix tools

cd ~/Desktop/shell-lesson-data/exercise-data/animal-counts/

# Get the second column of the CSV
cut -d , -f 2 animals.csv

# Sort the values
cut -d , -f 2 animals.csv | sort

# Get unique values (`uniq` only removes adjacent duplicates, so sort first)
cut -d , -f 2 animals.csv | sort | uniq
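Why the sort matters can be seen on a few made-up rows. This sketch assumes a date,animal,count layout like the one the commands above imply; the values are invented:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical rows in a date,animal,count layout
cat > animals.csv <<'EOF'
2012-11-05,deer,5
2012-11-05,rabbit,22
2012-11-05,raccoon,7
2012-11-06,rabbit,19
2012-11-06,deer,2
EOF

# Without sort, no two equal names are adjacent, so all 5 lines survive
cut -d , -f 2 animals.csv | uniq

# Sorting first puts duplicates next to each other: deer, rabbit, raccoon
cut -d , -f 2 animals.csv | sort | uniq
```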

(Optional) Language interpreters are also shell commands

# 1. Run a python script that produces a .csv as output
# 2. Extract the 2nd column of that .csv and get the unique values
python script.py | cut -d , -f 2 | sort | uniq

Loops

Don’t repeat yourself.

A basic loop

cd ~/Desktop/shell-lesson-data/exercise-data/creatures/
nano latin.sh

for filename in basilisk.dat minotaur.dat unicorn.dat
do
    # Extract second line of file
    head -n 2 "$filename" | tail -n 1
done

bash latin.sh

Simplify your loop with globs

nano latin.sh

for filename in *.dat
do
    # Extract second line of file
    head -n 2 "$filename" | tail -n 1
done

bash latin.sh
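A self-contained version of this loop, using hypothetical .dat files with made-up contents in a temporary directory. One file name contains a space, which the quoted "$filename" handles correctly:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical .dat files: line 1 is a common name, line 2 a classification
printf 'COMMON NAME: basilisk\nCLASSIFICATION: basiliscus vulgaris\n' > basilisk.dat
printf 'COMMON NAME: sea serpent\nCLASSIFICATION: serpens marinus\n'  > "sea serpent.dat"
printf 'COMMON NAME: unicorn\nCLASSIFICATION: equus monoceros\n'      > unicorn.dat

for filename in *.dat
do
    # Quoting "$filename" keeps "sea serpent.dat" in one piece
    head -n 2 "$filename" | tail -n 1
done
```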

Generalize your loop with unlimited arguments

Create a separate directory for your scripts so that you can find them

cd ~/Desktop/shell-lesson-data/exercise-data/
mkdir scripts
cd scripts
nano aggregate.sh

Write a script that takes arbitrary arguments

for filename in "$@"
do
    echo "$filename"
done

Run the script against the contents of a different directory
```
bash aggregate.sh ../proteins/*.pdb
```

Do work in the script

nano aggregate.sh

for filename in "$@"
do
    echo "$filename"
    cat "$filename" >> alkanes.pdb
done

bash aggregate.sh ../proteins/*.pdb
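Because the loop uses `>>`, every run adds to whatever alkanes.pdb already contains. This sketch, with one-line hypothetical stand-ins for the .pdb files, shows the file doubling after a second run; delete alkanes.pdb before rerunning if you want a fresh result:

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir proteins scripts

# Hypothetical one-line stand-ins for the lesson's .pdb files
echo "ATOM ethane"  > proteins/ethane.pdb
echo "ATOM methane" > proteins/methane.pdb

cd scripts
cat > aggregate.sh <<'EOF'
for filename in "$@"
do
    echo "$filename"
    cat "$filename" >> alkanes.pdb
done
EOF

bash aggregate.sh ../proteins/*.pdb
bash aggregate.sh ../proteins/*.pdb   # a second run appends the same lines again
wc -l alkanes.pdb                     # 4 lines, not 2
```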

Make your script executable

# List file in long format to show current permissions
ls -l aggregate.sh

# Change file mode (i.e. permissions)
# User can read/write/execute, Group and Other can read
chmod u=rwx,go=r aggregate.sh

# Show changed permissions
ls -l aggregate.sh

# Invoke script
./aggregate.sh ../proteins/*.pdb
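aggregate.sh as written has no first line naming its interpreter. Started from bash, `./aggregate.sh` will usually still run, because bash falls back to treating the file as a shell script, but a "shebang" line makes the choice explicit. A sketch with a hypothetical hello.sh:

```shell
demo=$(mktemp -d)
cd "$demo"

# The first line (the "shebang") names the interpreter for ./hello.sh
cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "Hello from $0"
EOF

ls -l hello.sh                # e.g. -rw-r--r-- (not executable yet)
chmod u=rwx,go=r hello.sh     # user: read/write/execute; group, other: read
ls -l hello.sh                # -rwxr--r--
./hello.sh                    # Hello from ./hello.sh
```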

Finding things

Find

Find everything

cd ~/Desktop/shell-lesson-data/exercise-data/
find .

Find by type

# List all directories
find . -type d

# List all files
find . -type f
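A sketch on a small hypothetical tree whose contents are known in advance, so you can check what each `-type` test selects. Note that `.` itself counts as a directory:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical tree: . plus two directories, and three files
mkdir -p writing/thesis
touch numbers.txt writing/haiku.txt writing/thesis/draft.txt

find . -type d    # ., ./writing, ./writing/thesis
find . -type f    # the three files (order may vary)
```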

Find files

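# Unquoted: the shell expands *.txt before find runs, which gives wrong results (or an error) if the current directory contains .txt files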
find . -name *.txt

# Prevent shell expansion and match wildcard
find . -name "*.txt"
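The quoting pitfall is easiest to see when the current directory holds exactly one .txt file. In this hypothetical tree the unquoted pattern silently becomes `-name notes.txt`. With two or more .txt files in the current directory, find would instead report an error:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical tree: one .txt file here, two more in a subdirectory
mkdir sub
touch notes.txt sub/a.txt sub/b.txt

# Unquoted: the shell turns this into `find . -name notes.txt`
find . -name *.txt     # only ./notes.txt

# Quoted: find gets the pattern itself and applies it at every level
find . -name "*.txt"   # ./notes.txt, ./sub/a.txt, ./sub/b.txt (order may vary)
```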

Grep

Grep is a powerful tool for matching text patterns by using regular expressions. You can find introductory documentation for regular expressions in the References section.
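A few commonly used grep flags, sketched on a small made-up file. Each comment states the lines that command should print for this file:

```shell
demo=$(mktemp -d)
cd "$demo"

# Hypothetical text file to search
cat > notes.txt <<'EOF'
apple 3
banana 12
Apple pie
cherry 7
EOF

grep apple notes.txt           # case-sensitive: apple 3
grep -i apple notes.txt        # -i ignores case: apple 3, Apple pie
grep -n cherry notes.txt       # -n adds line numbers: 4:cherry 7
grep -E '[0-9]+$' notes.txt    # -E extended regex: lines ending in digits
grep -v '[0-9]' notes.txt      # -v inverts the match: Apple pie
```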

Shell extras

Consult the Wooledge Bash Guide (see references below) for more on these topics:

SSH
Permissions
Job control
Aliases and bash customization
Shell variables
Mini-languages (grep, sed, AWK)
Shell expansion
Conditional tests

Credits

The Unix Shell: https://swcarpentry.github.io/shell-novice/

References

A list of command line utilities: https://ss64.com/bash/
GNU core utilities: https://www.gnu.org/software/coreutils/manual/coreutils.html
Bash guide: https://mywiki.wooledge.org/BashGuide
Shell redirection operators (1): https://www.redhat.com/sysadmin/linux-shell-redirection-pipelining
Shell redirection operators (2): https://www.gnu.org/software/bash/manual/html_node/Redirections.html
Grep regular expressions: https://www.gnu.org/software/grep/manual/html_node/Regular-Expressions.html
Using zsh on macOS: https://scriptingosx.com/2019/06/moving-to-zsh/

Data Sources

Lesson data: http://swcarpentry.github.io/shell-novice/data/shell-lesson-data.zip