- 1 Introducing the shell
- 2 Navigating files and directories
- 3 Working with files and directories
- 4 Pipes and filters
- 5 Shell scripts
- 6 Loops
- 7 Finding things
- 8 Shell extras
- 9 Credits
- 10 References
- 11 Data Sources
Introducing the shell
Reasons to use the shell
- Automate basic tasks
- Underlies many other open source languages and applications and can be used to glue them together
- Essential for system administration, remote computing, and high-performance computing
- Many concise special-purpose tools that can make your life easier
- Complements more fully-featured application programming languages
“Unix”
PowerPoint slides: Unix family tree
- Unix-like operating systems share a common architecture and layout
- Roughly compatible, with similar (or identical) shells and tools
- The environment in which most open-source software was written
“Shell”
- Broadly speaking, there is a tension between making computer systems fast and making them easy to use.
- A common solution is to create a 2-layer architecture: A fast, somewhat opaque core surrounded by a more friendly scriptable interface (also referred to as “hooks” or an “API”). Examples of this include video games, Emacs and other highly customizable code editors, and high-level special-purpose languages like Stata and Mathematica.
- The Unix shell is the scriptable shell around the operating system. It provides a simple interface for making the operating system do work, without having to know exactly how it accomplishes that work.
The past is always with us
The design and terminology of modern computers are based on metaphors from a previous age.
- Files and folders
- Teletype input and output
- Modern touch devices don’t expose the file system, so you may be less comfortable with navigating directory trees than people whose primary computing devices were desktop computers
Navigating files and directories
File system layout
Powerpoint slides: “Navigating files and directories”
Who are you?
whoami
Where are you?
-
Current working directory
pwd # Print Working Directory
-
By default, this is probably your home directory (discuss how to view this in Finder or File Explorer)
-
Linux
/home/nelle
-
Mac OS
/Users/nelle
-
Windows
C:\Users\nelle
-
What’s in this directory?
-
List the contents of the directory
ls # List directory contents
-
Command flags modify what a command does
ls -F # show category markers
Getting help
ls --help # In-line help info; should work in Windows
man ls # Manual for "ls"
- You can navigate through the man page using the space bar and arrow keys
- Quit man with “q”
- Online references are available for Windows users who don’t have man pages: https://linux.die.net/
Exploring other directories
-
When a command is followed by an argument, it acts on that argument.
ls -F Desktop                     # get contents of folder
ls -F Desktop/shell-lesson-data   # get contents of subfolder
-
Move down the directory tree
cd Desktop
cd shell-lesson-data
cd exercise-data
-
Now that you’re “in” a new location, the context for your commands is different
pwd
ls -F
# This produces an error because the folder is in a different location
# relative to the working directory
cd shell-lesson-data
-
Move up the directory tree
. is shorthand for “current directory”; .. is shorthand for “parent directory”
# Show hidden files, including current and parent directories
ls -a
# You can combine flags
ls -Fa
# Move to parent directory
cd ..
-
Shortcuts
cd ~   # go to home directory
cd -   # go back to previous directory
Relative vs. absolute paths
- An absolute path specifies a location from the root of the file system.
- A relative path specifies a location starting from the current location.
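The two kinds of path can be compared side by side. The sketch below is only an illustration: it builds a throwaway directory with `mktemp -d` (an assumption, so that nothing in your real files is touched) and reaches the same folder once with a relative path and once with an absolute path.

```shell
# Work in a throwaway directory (hypothetical layout, not the lesson data).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/exercise-data/writing"

# Relative path: interpreted starting from the current working directory.
cd "$demo_dir/exercise-data"
cd writing
via_relative=$(pwd -P)

# Absolute path: starts at the root (/), so it works from anywhere.
cd /
cd "$demo_dir/exercise-data/writing"
via_absolute=$(pwd -P)

# Both lines show the same location.
echo "$via_relative"
echo "$via_absolute"
```

The relative form is shorter but only works from the right starting point; the absolute form is longer but unambiguous.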
Working with files and directories
Creating directories
-
See where we are and what we have
pwd
cd exercise-data/writing   # traverse several layers at once
ls -F
-
Create a directory
# Make a subdirectory
mkdir thesis
ls -F
# Make multiple directories; create intermediate dirs as required
mkdir -p ../project/data ../project/results
# Show all directory contents recursively
ls -FR ../project
-
Create a text file. Note that everything is available through the file browser and the terminal.
cd thesis
nano draft.txt
This is my first draft boop beep boop
-
Edit with Notepad / TextEdit, then re-edit with nano.
Moving files and directories
-
Move our file to a new location
cd ~/Desktop/shell-lesson-data/exercise-data/writing
# Rename the file by moving it
mv thesis/draft.txt thesis/quotes.txt
# Verify the new file name
ls thesis
# You can also specify the exact file name
ls thesis/quotes.txt
-
Move our file to the current working directory
mv thesis/quotes.txt .
ls thesis/quotes.txt   # Not here anymore
ls                     # now here
Copying files and directories
-
Copy a single file
cp quotes.txt thesis/quotations.txt
ls thesis
ls
# Alternatively
ls quotes.txt thesis/quotations.txt
-
Copy a directory recursively
cp -r thesis thesis_backup
ls thesis thesis_backup
Removing files and directories
-
Remove a file
rm quotes.txt
ls quotes.txt
-
Remove a file interactively. Deletion is forever!
rm -i thesis_backup/quotations.txt
-
Remove a directory and its contents
rm thesis       # This gives us an error
rm -ri thesis   # Remove recursively
Create a backup archive
Deletion is forever. Consider making a backup archive as part of your workflow.
-
Create an archive with tar (“tape archive”).
cd ~/Desktop/shell-lesson-data/exercise-data/
# [c]reate a new archive with the given [f]ilename
tar -cf writing.tar writing/
-
Create a compressed (zipped) archive.
# [a]uto-compress the archive based on its file extension
tar -acf writing.zip writing/
# FYI, you may also see
tar -a -cf writing.zip writing/
# FYI, Linux servers frequently use g[z]ip
tar -z -cf writing.tgz writing/
tar is an old utility and can be finicky about the order of flags.
-
Extract your archive
mv writing writing_backup
# e[x]tract the archive to get the original files back
tar -xf writing.zip
# Compare the old and restored directories
ls writing
ls writing_backup
-
There are many useful utilities: https://www.gnu.org/software/coreutils/manual/coreutils.html
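The archive round trip can be rehearsed safely before trying it on real work. This is a minimal sketch under assumptions: it uses a throwaway directory from `mktemp -d` and a made-up one-line `draft.txt`, and it adds the `-t` flag (list contents), which is not used elsewhere in these notes.

```shell
# Throwaway directory with a hypothetical file to archive.
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir writing
echo "first draft" > writing/draft.txt

# [c]reate an archive with the given [f]ilename
tar -cf writing.tar writing/

# lis[t] the archive contents without extracting anything
tar -tf writing.tar

# Move the original aside, then e[x]tract to get the files back
mv writing writing_backup
tar -xf writing.tar

# The restored file has the same content as the original
restored=$(cat writing/draft.txt)
echo "$restored"
```

Listing an archive before extracting it is a cheap way to check what it will write into the current directory.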
Operations with multiple files and directories
-
Copy with multiple file names
cd ~/Desktop/shell-lesson-data/exercise-data/
# The destination directory must already exist
mkdir creatures_backup
cp creatures/minotaur.dat creatures/unicorn.dat creatures_backup/
-
Copy using globs (“globals”). You can match a single character with ? or any number of characters (including none) with *. This is an example of shell expansion.
mkdir proteins_backup
# The shell expands *.pdb into the list of all matching files, then does `cp`
cp proteins/*.pdb proteins_backup/
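One way to see shell expansion at work is to hand a pattern to `echo`, which simply prints the arguments it receives. The sketch below assumes a throwaway directory and a few made-up, empty files; it is an illustration, not part of the lesson data.

```shell
# Throwaway directory with hypothetical file names.
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch cubane.pdb ethane.pdb methane.pdb notes.txt

# * matches any number of characters: every .pdb file is listed.
expanded_star=$(echo *.pdb)
echo "$expanded_star"        # cubane.pdb ethane.pdb methane.pdb

# ? matches exactly one character: methane.pdb has two letters
# before "thane", so only ethane.pdb matches.
expanded_question=$(echo ?thane.pdb)
echo "$expanded_question"    # ethane.pdb
```

The command (`echo`, `cp`, `wc`, ...) never sees the pattern itself; it only receives the list of names the shell produced.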
Pipes and filters
The “Unix Philosophy” is to combine many small tools that do one job into a processing pipeline.
Motivating example with wc
FYI, .pdb is the Protein Data Bank format
-
Count words in a file using wc
cd ~/Desktop/shell-lesson-data/exercise-data/proteins/
ls
# Inspect cubane.pdb
cat cubane.pdb
# [w]ord [c]ount for cubane.pdb
wc cubane.pdb
-
Run wc for all files
# Run the command with default options
wc *.pdb
wc -l *.pdb   # lines
wc -c *.pdb   # bytes (the same as characters for plain ASCII text)
wc -w *.pdb   # words
Capturing output from commands
# Redirect output to file
wc -l *.pdb > lengths.txt
ls lengths.txt
cat lengths.txt # Inspect contents
head -n 1 lengths.txt # Inspect 1st line
less lengths.txt # Inspect with pager
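One detail worth knowing about `>`: it creates the file if it does not exist and replaces the contents if it does. A small sketch, assuming a throwaway directory and a made-up file name:

```shell
# Throwaway directory so no real file is overwritten.
demo_dir=$(mktemp -d)
cd "$demo_dir"

echo "first run" > output.txt
echo "second run" > output.txt   # replaces the previous contents

cat output.txt                   # second run
```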
Filtering output
-
The sort command runs the file input through a filter and returns the filtered result.
sort lengths.txt      # alphanumeric sort (i.e. text)
sort -n lengths.txt   # numeric sort
-
Send filtered output to new file
sort -n lengths.txt > sorted_lengths.txt
cat sorted_lengths.txt
-
(Optional) Append to the end of a file using >>
cd ~/Desktop/shell-lesson-data/exercise-data/animal-counts/
# Create new file
head -n 3 animals.csv > animals-subset.csv
# Append to that file
tail -n 2 animals.csv >> animals-subset.csv
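The difference between the text sort and the numeric sort used in this section is easiest to see on a tiny file. The numbers below are made up for illustration, and the file lives in a throwaway directory.

```shell
# Throwaway directory with a hypothetical list of numbers.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '10\n9\n100\n2\n' > numbers.txt

# Text sort compares character by character, so "100" comes before "2".
sort numbers.txt      # 10, 100, 2, 9

# Numeric sort compares the values.
sort -n numbers.txt   # 2, 9, 10, 100
```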
Passing output to another command
Pipe output from one command directly into a second command without creating an intermediate file. This is the cornerstone of Unix workflows.
sort -n lengths.txt | head -n 1
Combining multiple commands
Daisy-chain your commands together. As long as the output of command X is a legitimate input for command Y, it will work.
# Return to the beginning
wc -l *.pdb | sort -n
# Add additional commands
wc -l *.pdb | sort -n | head -n 1
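The same chain can be extended to answer a slightly different question. As a sketch with made-up files in a throwaway directory: when `wc -l` is given several files it also prints a `total` line, which sorts last, so the longest file is the second-to-last line of the sorted output.

```shell
# Throwaway directory with three hypothetical files of 1, 2 and 3 lines.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'a\n' > one.pdb
printf 'a\nb\n' > two.pdb
printf 'a\nb\nc\n' > three.pdb

# Shortest file: first line of the numerically sorted counts.
shortest=$(wc -l *.pdb | sort -n | head -n 1)
echo "$shortest"

# Longest file: the last line is the "total" row, so keep the last
# two lines and then take the first of those.
longest=$(wc -l *.pdb | sort -n | tail -n 2 | head -n 1)
echo "$longest"
```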
History and pipes
-
The terminal saves your command history (typically 500 or 1000 commands)
- You can see previous commands using the up/down arrows
- You can edit the command that’s currently visible and run it
-
Once your command history gets big, you might want to search it:
history             # or `history -1000` in zsh on Mac
history | grep ls   # pipe the output of history into search
Shell scripts
We should save this stuff and reuse it.
Creating and running a script
-
Create a new script
cd ~/Desktop/shell-lesson-data/exercise-data/proteins/
nano middle.sh
-
Edit the script file and save
# Get lines 11-15
head -n 15 octane.pdb | tail -n 5
-
Execute the script
bash middle.sh
Generalize your script
-
Use a special variable to run the script on any file ($1 expands to the first argument passed to the script; the surrounding "" ensure that it still works if the file name contains spaces.)
nano middle.sh
# Use the 1st argument as your input.
head -n 15 "$1" | tail -n 5
bash middle.sh octane.pdb
bash middle.sh pentane.pdb
-
Use additional ordered arguments
nano middle.sh
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
bash middle.sh pentane.pdb 15 5
-
Use unlimited arguments
nano sorted.sh
# Sort files by their length.
# Usage: bash sorted.sh one_or_more_filenames
wc -l "$@" | sort -n
bash sorted.sh *.pdb ../creatures/*.dat
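To see why the quotes around `$1` matter, the three-argument version of `middle.sh` can be tried on a file whose name contains a space. This is a sketch under assumptions: the file `my notes.txt` and its contents are made up, everything happens in a throwaway directory, and the script is written with a here-document (`<<'EOF'`) instead of nano only so that the example runs unattended.

```shell
# Throwaway directory with a hypothetical six-line file.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'line %s\n' 1 2 3 4 5 6 > "my notes.txt"

# Write the script; the quoted 'EOF' keeps $1, $2, $3 literal in the file.
cat > middle.sh <<'EOF'
# Select lines from the middle of a file.
# Usage: bash middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"
EOF

# Take the first 4 lines, then keep the last 2 of those.
result=$(bash middle.sh "my notes.txt" 4 2)
echo "$result"
# line 3
# line 4
```

Without the quotes inside the script, `head` would receive `my` and `notes.txt` as two separate file names and fail.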
(Optional) Text processing with Unix tools
cd ~/Desktop/shell-lesson-data/exercise-data/animal-counts/
# Get the second column of the CSV
cut -d , -f 2 animals.csv
# Sort the values
cut -d , -f 2 animals.csv | sort
# Get unique values (`uniq` requires values to be adjacent to one another)
cut -d , -f 2 animals.csv | sort | uniq
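The note about adjacency can be checked on a small sample. The CSV below is made up for illustration (it is not the lesson's animals.csv) and is written to a throwaway directory.

```shell
# Throwaway directory with a hypothetical three-row CSV.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '%s\n' '2012-11-05,deer,5' '2012-11-05,rabbit,22' '2012-11-06,deer,2' > sample.csv

# Without sort, the two "deer" rows are not adjacent, so uniq keeps both.
unsorted=$(cut -d , -f 2 sample.csv | uniq)
echo "$unsorted"
# deer
# rabbit
# deer

# With sort, duplicates sit next to each other and collapse to one.
sorted_unique=$(cut -d , -f 2 sample.csv | sort | uniq)
echo "$sorted_unique"
# deer
# rabbit
```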
(Optional) Language interpreters are also shell commands
# 1. Run a python script that produces a .csv as output
# 2. Extract the 2nd column of that .csv and get the unique values
python script.py | cut -d , -f 2 | sort | uniq
Loops
Don’t repeat yourself.
A basic loop
cd ~/Desktop/shell-lesson-data/exercise-data/creatures/
nano latin.sh
for filename in basilisk.dat minotaur.dat unicorn.dat
do
    # Extract second line of file
    head -n 2 "$filename" | tail -n 1
done
bash latin.sh
Simplify your loop with globs
nano latin.sh
for filename in *.dat
do
    # Extract second line of file
    head -n 2 "$filename" | tail -n 1
done
bash latin.sh
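A glob-driven loop can be tried on throwaway files first. In this sketch the two `.dat` files and their contents are made up, and one name deliberately contains a space to show that a quoted `"$filename"` keeps each name in one piece.

```shell
# Throwaway directory with two hypothetical data files.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'header\nsecond line of a\n' > a.dat
printf 'header\nsecond line of b\n' > "b copy.dat"

# Collect the second line of every .dat file.
second_lines=$(
    for filename in *.dat
    do
        head -n 2 "$filename" | tail -n 1
    done
)
echo "$second_lines"
# second line of a
# second line of b
```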
Generalize your loop with unlimited arguments
-
Create a separate directory for your scripts so that you can find them
cd ~/Desktop/shell-lesson-data/exercise-data/ mkdir scripts cd scripts nano aggregate.sh
-
Write a script that takes arbitrary arguments
for filename in "$@"
do
    echo "$filename"
done
-
Run the script against the contents of a different directory
bash aggregate.sh ../proteins/*.pdb
-
Do work in the script
nano aggregate.sh
for filename in "$@"
do
    echo "$filename"
    cat "$filename" >> alkanes.pdb
done
bash aggregate.sh ../proteins/*.pdb
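Because `>>` appends, running an aggregating script twice would store every input twice. One possible guard, shown here as a sketch rather than as the lesson's method, is to remove the old output at the start of the script. The file names are made up, the output is called `combined.out` so that it cannot match `*.pdb`, and the script is written with a here-document only so that the example runs unattended.

```shell
# Throwaway directory with two hypothetical one-line inputs.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'A\n' > a.pdb
printf 'B\n' > b.pdb

cat > aggregate.sh <<'EOF'
# Start from an empty output file so repeated runs do not pile up.
rm -f combined.out
for filename in "$@"
do
    echo "$filename"
    cat "$filename" >> combined.out
done
EOF

# Run twice: the output still contains each input exactly once.
bash aggregate.sh *.pdb
bash aggregate.sh *.pdb
cat combined.out
# A
# B
```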
Make your script executable
# List file in long format to show current permissions
ls -l aggregate.sh
# Change file mode (i.e. permissions)
# User can read/write/execute, Group and Other can read
chmod u=rwx,go=r aggregate.sh
# Show changed permissions
ls -l aggregate.sh
# Invoke script
./aggregate.sh ../proteins/*.pdb
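The effect of `chmod` can be confirmed on a throwaway script. This sketch makes two assumptions that go beyond the notes above: the script `hello.sh` is made up, and it starts with a `#!/usr/bin/env bash` line (a "shebang"), a common convention that tells the system which interpreter to use when the file is run directly.

```shell
# Throwaway directory with a hypothetical script.
demo_dir=$(mktemp -d)
cd "$demo_dir"

cat > hello.sh <<'EOF'
#!/usr/bin/env bash
echo "hello from $0"
EOF

# User can read/write/execute, Group and Other can read
chmod u=rwx,go=r hello.sh

# The permission string now starts with -rwxr--r--
ls -l hello.sh

# Invoke the script directly
./hello.sh
```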
Finding things
Find
Find everything
cd ~/Desktop/shell-lesson-data/exercise-data/
find .
Find by type
# List all directories
find . -type d
# List all files
find . -type f
Find files
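# Unquoted: the shell expands *.txt first, so find only sees the names
# that matched in the current directory (or the literal pattern if none did)
find . -name *.txt
# Quoted: prevent shell expansion so that find itself matches the wildcard
find . -name "*.txt"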
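The difference between the two commands shows up as soon as a matching file sits in the current directory. A sketch with made-up files in a throwaway directory:

```shell
# Throwaway directory tree with three hypothetical .txt files.
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir -p writing/thesis
touch notes.txt writing/haiku.txt writing/thesis/draft.txt

# Unquoted: the shell turns *.txt into notes.txt before find runs,
# so find only looks for files named exactly notes.txt.
unquoted=$(find . -name *.txt)
echo "$unquoted"
# ./notes.txt

# Quoted: find receives the pattern and matches it at every level.
quoted=$(find . -name "*.txt" | sort)
echo "$quoted"
# ./notes.txt
# ./writing/haiku.txt
# ./writing/thesis/draft.txt
```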
Grep
Grep is a powerful tool for matching text patterns by using regular expressions. You can find introductory documentation for regular expressions in the References section.
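A few common grep invocations, as a minimal sketch. The three-line text file is made up for illustration and lives in a throwaway directory; `-c` (count matching lines), `-i` (ignore case) and `-w` (match whole words) are standard grep flags.

```shell
# Throwaway directory with a hypothetical three-line text file.
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf '%s\n' 'the shell is a tool' 'Tools can be chained together' 'a pipe joins the tools' > haiku.txt

# Lines containing "tool" (case-sensitive): lines 1 and 3
plain=$(grep -c tool haiku.txt)

# Ignore case: "Tools" on line 2 now matches as well
ignore_case=$(grep -ci tool haiku.txt)

# Whole words only: "tools" no longer counts, only line 1 matches
whole_word=$(grep -cw tool haiku.txt)

# A regular expression: ^ anchors the match to the start of the line
starts_with_a=$(grep '^a' haiku.txt)

echo "$plain $ignore_case $whole_word"   # 2 3 1
echo "$starts_with_a"                    # a pipe joins the tools
```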
Shell extras
Consult the Wooledge Bash Guide (see references below) for more on these topics:
- SSH
- Permissions
- Job control
- Aliases and bash customization
- Shell variables
- Mini-languages (grep, sed, AWK)
- Shell expansion
- Conditional tests
Credits
- The Unix Shell: https://swcarpentry.github.io/shell-novice/
References
- A list of command line utilities: https://ss64.com/bash/
- GNU core utilities: https://www.gnu.org/software/coreutils/manual/coreutils.html
- Bash guide: https://mywiki.wooledge.org/BashGuide
- Shell redirection operators (1): https://www.redhat.com/sysadmin/linux-shell-redirection-pipelining
- Shell redirection operators (2): https://www.gnu.org/software/bash/manual/html_node/Redirections.html
- Grep regular expressions: https://www.gnu.org/software/grep/manual/html_node/Regular-Expressions.html
- Using zsh on MacOS: https://scriptingosx.com/2019/06/moving-to-zsh/