Orientation
This is an introduction to the fundamental system administration skills a researcher will need when interacting with a Unix-like server. It is not an introduction to HPC or Containers; rather, it is intended to make explanations of HPC or Containers transparent.
Why are we here?
Everyone becomes a system administrator eventually.
No one wants this. We would all prefer that our tools “just work.” The reality is that open science is done with open source tools, most open source tools were (and are) created on Unix-like systems [1], and those tools embed the philosophy and assumptions of their home systems. Novices are frequently surprised to discover that Python development (for example) is much more pleasant on Linux than on user-friendly Windows. This is because Guido van Rossum created Python as a scripting language for Unix servers; you can take the language out of the server, but you can’t take the server out of the language. Other popular research programming languages (C, Perl, Java, R, et cetera) have similar origin stories: Born on Unix servers, best run on Unix-like servers, and disproportionately represented in the desktop world by machines running the Mac OS flavor of Unix.
The programming languages are the tip of the iceberg: Most open science depends in some way on the Unix ecosystem. Many (most?) local file servers, research group databases, university computing clusters, and cloud services run Unix [2]. Using them effectively requires a hands-on understanding of Unix. Everyone becomes a system administrator eventually.
- [1] A “Unix-like” system is one that behaves like Unix, typically by following the POSIX standard. Examples include BSD Unix, Linux, Solaris, Mac OS, and others.
- [2] Specifically, they run Linux, which is a Unix-like system.
Lesson Audience
This workshop is an introduction to the skills a researcher needs to effectively use and manage a Unix server. It is for anyone who needs to work in a Unix environment, or who wants to get more out of their Unix-based tools. This includes activities such as:
- Running a local file server
- Working with remote databases
- Running programs in an HPC or cloud environment
- Troubleshooting weird installation and configuration issues
- Backing up or syncing remote data
The workshop is intended to complement the pre-existing Unix Shell Carpentry workshop. It emphasizes portable skills that are useful in any Unix environment. It is not intended to be an introduction to high-performance computing, containers, or any other specialized topic. Rather, the goal is to give the learners the knowledge and skills they need to tackle those specialized topics if and when they encounter them.
(There are several Carpentry curricula on high-performance computing that partially overlap with this workshop. They are similarly structured in that they attempt to be self-contained and approachable for novices. As a result, each curriculum is a blend of Unix shell, system administration, and HPC-specific materials. Each of those topics is large enough to be its own workshop, and part of my motivation for proposing this workshop is to allow the specialized workshops to dedicate their full time to the specialized topics.)
Lesson Outline
Interacting with Remote Servers
Authentication
- Public key authentication
- Creating public keys
- Windows-specific issues
- RSA is Bad, and other cryptographic right answers
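As a rough illustration of the key-creation topic above, the following sketch generates an Ed25519 key pair with OpenSSH's `ssh-keygen`. The scratch directory, file name, and comment string are assumptions made for the example; real keys normally live in `~/.ssh` and should be protected with a passphrase.

```shell
# Hypothetical scratch location so the demo never touches ~/.ssh.
keydir=$(mktemp -d)
keyfile="$keydir/id_ed25519_demo"

# -t ed25519 : key type (an elliptic-curve alternative to RSA)
# -C "..."   : free-text comment stored in the public key
# -N ''      : empty passphrase, acceptable only for a throwaway demo key
# -f FILE    : where to write the private key (public key goes to FILE.pub)
# -q         : suppress the progress output
ssh-keygen -q -t ed25519 -C "demo key for workshop" -N '' -f "$keyfile"

# The private key should be readable by its owner only.
private_mode=$(ls -l "$keyfile" | cut -c1-10)

# The first field of the public key file names the key type.
public_type=$(cut -d' ' -f1 "$keyfile.pub")

echo "private key mode: $private_mode"
echo "public key type:  $public_type"
```

Only the `.pub` file is ever shared. On most systems it is appended to `~/.ssh/authorized_keys` on the server, for instance with `ssh-copy-id`.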
Using the terminal
- Connecting with SSH
- “The SSH protocol creates a secure tunnel through which you can transfer a bidirectional stream, and you can use that stream to connect any two processes you like. The most familiar two processes would be a shell (at the server) and an interactive terminal emulator (at the client). That’s what you’re using when you ssh to a server and type commands at the remote shell’s prompt.”
- https://unix.stackexchange.com/a/116691
- Managing multiple connections and environments
- Editing remote files through the terminal
- Latency issues
Moving files to and from the remote server
- SFTP features
- SFTP is implemented on top of SSH; it’s available on any system that has SSH installed, and secure by default (or as secure as your current version of SSH).
- Interactive: You can view remote file systems, modify file permissions, interrupt file transfer, resume file transfer, etc.
- Alternatives to SFTP (SCP, FTP, etc.)
- “The scp protocol is outdated, inflexible and not readily fixed. We recommend the use of more modern protocols like sftp and rsync for file transfer instead.”
- https://www.openssh.com/txt/release-8.0
- Managing multiple connections, revisited (latency, the danger of “focus follows mouse”)
Managing your environment
Permissions
- Understanding permissions
- Changing permissions with intuitive commands (e.g., “+r”)
- Permission masks
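The permission topics above can be sketched in a few commands. This is a minimal example run in a throwaway directory; the file names are made up for illustration, and the expected modes assume an ordinary local file system.

```shell
# Work in a scratch directory so no real files are affected.
workdir=$(mktemp -d)

# With a mask of 022, new files start as rw-r--r--
# (the default 666 with the "write" bits for group and other cleared).
umask 022
touch "$workdir/notes.txt"
before=$(ls -l "$workdir/notes.txt" | cut -c1-10)

# Symbolic modes: remove read access for "other", add write access for "group".
chmod o-r,g+w "$workdir/notes.txt"
after=$(ls -l "$workdir/notes.txt" | cut -c1-10)

# A stricter mask (077) makes new files private to their owner.
umask 077
touch "$workdir/private.txt"
private=$(ls -l "$workdir/private.txt" | cut -c1-10)

echo "created:       $before"
echo "after chmod:   $after"
echo "with umask 077: $private"
```

The mask only affects files created afterwards; existing files keep their modes until `chmod` changes them.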
Configuration
- Dot files (.profile, .bash_profile, .bashrc)
- Environment variables
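A small sketch of how shell variables and exported variables differ, which is the behavior that dot files such as `.bashrc` rely on. The variable name `PROJECT_DATA_DIR` and its value are hypothetical, chosen only for the example.

```shell
# A plain shell variable exists only in the current shell.
PROJECT_DATA_DIR="/tmp/demo-data"   # hypothetical name and path
child_before=$(sh -c 'printf "%s" "${PROJECT_DATA_DIR:-unset}"')

# Exporting it places it in the environment that child processes inherit.
export PROJECT_DATA_DIR
child_after=$(sh -c 'printf "%s" "${PROJECT_DATA_DIR:-unset}"')

echo "child shell before export: $child_before"
echo "child shell after export:  $child_after"

# Lines like the following are what typically end up in a dot file:
# they prepend a personal directory to the command search path.
PATH="$HOME/bin:$PATH"
export PATH
```

Settings placed in a dot file take effect the next time that file is read, usually at login or when a new shell starts.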
File management
- Searching with find and grep
- Shell redirection
- Making archives with tar
- Database dumps
- Naming things
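The searching, redirection, and archiving topics above fit together in one short sketch. The directory layout and file contents are invented for the example.

```shell
# Build a tiny, hypothetical project tree in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"

# ">" creates or overwrites a file; ">>" appends to it.
printf 'sample_id,value\nA1,3.2\n' > "$workdir/project/data/run1.csv"
printf 'B7,4.8\n' >> "$workdir/project/data/run1.csv"
printf 'scratch notes\n' > "$workdir/project/notes.txt"

# find: locate regular files by name pattern.
csv_files=$(find "$workdir/project" -type f -name '*.csv')

# grep -c: count the lines that match; "2>/dev/null" discards error messages.
matches=$(grep -c '^B7,' "$workdir/project/data/run1.csv" 2>/dev/null)

# tar: -c create, -z gzip, -f archive name; -C changes directory first,
# so the archive stores the relative path "project/..." rather than the full path.
tar -czf "$workdir/project.tar.gz" -C "$workdir" project

# -t lists an archive's contents; here the listing is redirected into a file.
tar -tzf "$workdir/project.tar.gz" > "$workdir/listing.txt"

echo "CSV files found: $csv_files"
echo "lines matching B7: $matches"
```

Listing an archive before extracting it is a cheap way to confirm what it contains and where the files will land.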
Getting files
- wget and curl
- git
System Administration and Troubleshooting
Utilities
- df, du, top, and cron
- Installing better utilities with the package manager
Owners and Groups
- Every process has an owner
- Changing owners
- Managing groups
- Setting and changing passwords
Becoming Root
- sudo and su
- With great power comes great responsibility
Starting and stopping processes
- OS level commands (e.g., kill)
- Utility level commands (e.g., service mysql restart)
- Rebooting
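For the OS-level case above, a minimal sketch of starting a process, checking on it, and stopping it with `kill`. The `sleep` command stands in for any long-running program.

```shell
# Start a long-running process in the background and remember its process ID.
sleep 300 &
pid=$!

# Signal 0 sends nothing; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then running_before=yes; else running_before=no; fi

# With no signal named, kill sends SIGTERM, a polite request to exit.
kill "$pid"

# wait collects the exit status; 128 + 15 (SIGTERM) = 143 means "ended by SIGTERM".
status=0
wait "$pid" || status=$?

if kill -0 "$pid" 2>/dev/null; then running_after=yes; else running_after=no; fi

echo "running before kill: $running_before"
echo "exit status:         $status"
echo "running after kill:  $running_after"
```

`kill -9` (SIGKILL) cannot be caught or ignored by the target process, so it is usually kept as a last resort after SIGTERM has failed.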
Backups and file transfers with rsync
When things go wrong
- Logs
- Startup scripts
- System testing
A very brief discussion of research computing
- High-Performance Computing (running discrete jobs)
- Containers (persistent environments)
References
General Unix and Unix Architecture
- Learning Modern Linux (Hausenblas): https://www.oreilly.com/library/view/learning-modern-linux/9781098108939/
- How Linux Works (Ward): https://www.oreilly.com/library/view/how-linux-works/9781098128913/
- Your terminal is not a terminal: An Introduction to Streams (Costa): https://lucasfcosta.com/2019/04/07/streams-introduction.html
Common Utilities
- GNU core utilities documentation (Free Software Foundation): https://www.gnu.org/software/coreutils/manual/coreutils.html
- Omnibus catalog of command line utilities (SS64.com): https://ss64.com/bash/
- System monitoring utilities (Debian project): https://wiki.debian.org/SystemMonitoring
(Bash) Shell
- Bash Guide (Wooledge): https://mywiki.wooledge.org/BashGuide
- Shell redirection operators (Oliveira): https://www.redhat.com/sysadmin/linux-shell-redirection-pipelining
Security
- Cryptographic Right Answers (Ptacek et al.): https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html
Prior Art (related Carpentry lessons)
- Using the Shell in a High-Performance Computing Context: http://www.hpc-carpentry.org/hpc-shell/
- Connecting to the remote HPC system (ssh)
- Introduction to High-Performance Computing: https://carpentries-incubator.github.io/hpc-intro/
- Connecting to a remote HPC system (ssh)
- Transferring files (rsync)
- Extra Unix Shell Material: https://carpentries-incubator.github.io/shell-extras/
- Working remotely (ssh)
- Permissions
- Job control (background jobs)