Utilities: Unix

Reading time: ~55 min

Unix is an operating system invented in the early 1970s at AT&T Bell Labs. Today there are many variants of Unix in wide use around the world, including the Linux operating systems and macOS.

The key elements provided by a Unix-like operating system are

  1. a file system, consisting of folders which can nest and store files,
  2. a set of programs, each serving a limited function,
  3. a shell which provides mechanisms for constructing workflows involving multiple programs and files.

Several Unix shells are available, but the most popular ones provide approximately the same functionality and interface. The most popular shell is called bash. Bash is the default shell in macOS (Mojave and earlier) and some Linux distributions. As of 2016, you can also run bash natively on Windows. If you are a Windows user, it is recommended that you go ahead and install the Windows Subsystem for Linux so you can use the same commands as Linux and Mac users.

If you want to follow along below before you figure out your local setup, you can use the executable cells you see in this page (which are bash cells, not Python cells) or launch a Binder instance (select Terminal or bash from the New pull-down menu in the top right). The latter approach is recommended, because that environment provides some shortcuts that will be helpful to practice (like completing commands and file names when you hit the tab key).

Navigation

When you first open the shell, you'll be in your home directory. You can check this by running the command pwd (which stands for print working directory).

pwd

On Linux, the users' home directories are in a directory called /home/, while on macOS they're in /Users/. Since your user name on Binder is jovyan (a sci-fi reference to a term that means an inhabitant of Jupiter), the directory printed when you run the cell above is called /home/jovyan. The character ~ has a special meaning: it is automatically expanded to the path for your home directory.

Exercise
Decide whether each of the following statements is true or false.

  1. ~ refers to the user's home directory
  2. pwd prints the contents of the current working directory

The string /home/jovyan is called a path. The forward slashes in a path separate directories, and each directory or file in the path is in the directory immediately to its left. For example, jovyan is a subdirectory of home.

The very first slash is the root directory, and all of the files and directories on the machine are nested in this directory.

You can view the contents of a directory with ls, and you can change directory using the cd command. If the initial slash is omitted in a directory name, the name is interpreted relative to the current directory. For example, you can navigate to /Users/jovyan from the /Users directory by running cd jovyan. Note that arguments are supplied to Unix commands by separating them with spaces following the name of the command. You can also navigate to containing folders using two dots (..). For example, cd ../../ navigates to the grandparent directory of the current directory.
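
You can try these navigation rules safely in a scratch directory. The following is only a sketch: the directory names project, data, and raw are made up for illustration, and it assumes the mktemp command is available (as it is on typical Linux and macOS systems).

```shell
# Create a scratch directory so that none of your real files are touched.
# (The names project, data, and raw are hypothetical.)
SCRATCH=$(mktemp -d)
mkdir -p "$SCRATCH/project/data/raw"

cd "$SCRATCH/project/data/raw"   # absolute path: it starts with a slash
cd ../..                         # up two levels, to the grandparent directory
AFTER_UP=$(basename "$(pwd)")    # last component of the current path
echo "$AFTER_UP"                 # prints project

cd data                          # relative path: resolved from the current directory
ls                               # lists the contents of data, namely raw
AFTER_DOWN=$(basename "$(pwd)")
echo "$AFTER_DOWN"               # prints data

cd /                             # go to the root directory
rm -r "$SCRATCH"                 # clean up the scratch directory
```

The variables AFTER_UP and AFTER_DOWN are there only to record where each cd command landed.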

Exercise
Write three lines of Unix code in the cell below which change directory into my-data-science-project, list the contents of that directory, and then change back to the original directory.

Solution. Here's an example solution:

cd my-data-science-project
ls
cd ../

Since the original directory was the user's home directory, we could have used cd ~ instead in the last step.

Exercise
List the files in the subdirectory bin of the root directory.

Solution. The simplest way to do it in one line is ls /bin.

The mkdir command makes a new directory. So we can make a new directory, check that it's there, and navigate into it as follows:

mkdir example-directory # won't return anything!
ls
cd example-directory # won't return anything!

One extremely useful shortcut is to type an initial part of the file or directory name and hit the tab key to get the rest to pop up (note that this does not work in the cells above, but it will work on your own computer or on mybinder.org). You can also hit the tab key twice to get a list of possible completions. Using this tab completion feature is advised, for two reasons: (1) it saves typing time, and (2) it reduces spelling errors. If the shell is still completing directory names in your path as you type it, you can be sure that those directories are actually present in the operating system. If you insist on typing out the path in full, it takes significantly longer to catch mistakes.

Another time-saving device is the use of the up and down arrow keys to access previously used commands. You can see a list of what you've run in the shell with the history command.

The position of the cursor in the shell cannot be controlled with your mouse or trackpad. Therefore, it is essential to master a few keyboard shortcuts to avoid having to press the forward and backward arrow keys dozens of times when you need to navigate the text at the prompt.

  • ctrl-a Move the cursor to the beginning of the line
  • ctrl-e Move the cursor to the end of the line
  • ctrl-l Clear the screen
  • ctrl-c Quit the command that is currently running
  • alt-f Move the cursor forward one word (esc-f on macOS)
  • alt-b Move the cursor backward one word (esc-b on macOS)

Note that you can't directly use a space character in a Unix path name, because it would be interpreted by bash as an argument separator. To accommodate a file with a space in its name, escape the space by putting a backslash in front of it. For example, cd My\ Essays changes directory into a folder called "My Essays".
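
The escaping rule can be tried out in a scratch directory. This sketch assumes mktemp is available. It also shows an alternative the text above does not mention: wrapping the whole name in double quotes has the same effect as escaping each space.

```shell
# Work in a scratch directory so that nothing real is modified.
SCRATCH=$(mktemp -d)
cd "$SCRATCH"

mkdir My\ Essays                  # the backslash escapes the space
cd My\ Essays                     # the same escape works for cd
cd ..
cd "My Essays"                    # quoting the whole name also works
ESSAYS_DIR=$(basename "$(pwd)")   # last component of the current path
echo "$ESSAYS_DIR"                # prints My Essays

cd /
rm -r "$SCRATCH"                  # clean up
```

Without the backslash or the quotes, mkdir My Essays would create two separate directories, My and Essays.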

Here are some other important commands:

  • mv Move a file from one directory to another
  • rm Remove a file
  • cp Copy a file from one directory to another
  • touch Create a file or update its last-modified time
  • open Open a file (xdg-open on Linux)
  • cat Print the contents of a file to the terminal
  • less View the contents of a file in a viewer
  • man Show the documentation for a command
  • head Print the first 10 lines of a file
  • tail Print the last 10 lines of a file
  • wc Count the number of words, lines, and characters in a file
  • grep Find specific text in file contents
  • vim Open an editor for making changes to a file
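
To get a feel for how a few of these commands fit together, here is a short sketch that works in a scratch directory. The file names notes.txt, copy.txt, and archive are made up, and the example assumes mktemp is available. It uses the > operator, which is explained in the Piping section, to put a line of text in a file.

```shell
# Work in a scratch directory so that nothing real is modified.
SCRATCH=$(mktemp -d)
cd "$SCRATCH"

touch notes.txt                      # create an empty file
echo "alpha beta" > notes.txt        # write one line of text into it
cp notes.txt copy.txt                # copy the file
mkdir archive
mv copy.txt archive/                 # move the copy into a subdirectory
cat archive/copy.txt                 # print the copy's contents
FOUND=$(grep alpha notes.txt)        # capture the lines that contain "alpha"
rm notes.txt                         # remove the original
REMAINING=$(ls)                      # only the archive directory is left
echo "$FOUND"                        # prints alpha beta
echo "$REMAINING"                    # prints archive

cd /
rm -r "$SCRATCH"                     # clean up
```

Note that rm deletes files immediately rather than moving them to a trash folder, which is one reason to practice in a scratch directory.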

Many commands in bash take options (analogous to keyword arguments in Python) which modify how they run. For example, rm -i gives you an interactive session where you can say for each file whether you want to delete it. Some options can themselves take arguments, in which case those arguments are listed directly after the option. For example, head -n 20 data.txt prints the first 20 lines of the file data.txt. You can read about the options a command takes by viewing its man page (for example, man head).
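
Here is a small sketch of options with and without arguments. It assumes the seq and mktemp commands are available (they are on typical Linux and macOS systems), and it uses the > operator from the Piping section to create a hypothetical file data.txt to work with.

```shell
# Work in a scratch directory so that nothing real is modified.
SCRATCH=$(mktemp -d)
cd "$SCRATCH"
seq 1 50 > data.txt               # the numbers 1 to 50, one per line

head -n 3 data.txt                # the option -n takes the argument 3: first 3 lines
LAST_LINE=$(tail -n 1 data.txt)   # capture the last line in a variable
echo "$LAST_LINE"                 # prints 50
wc -l data.txt                    # the option -l takes no argument: count lines only
ls -l                             # for ls, -l means something else: a long listing

cd /
rm -r "$SCRATCH"                  # clean up
```

As the last two commands suggest, the same option letter can mean different things for different commands, so the man page is the place to check.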

Exercise
Navigate into the my-data-science-project directory and then use the grep command to figure out which file contains the text find_packages.

Some helpful information: (i) grep -r text directory searches recursively for text in the directory, and (ii) . is an alias for the current directory.

Solution. Running the commands below, we find that setup.py contains the find_packages function.

cd my-data-science-project
grep -r find_packages .

Vim

Vim is the command line text editor which is most consistently available on Unix systems. As a result, you will sometimes find yourself needing some basic familiarity with it, even if you use another editor for the bulk of your work. Furthermore, vim is designed to prioritize efficiency over intuitiveness, so it's really helpful to learn a few vim ideas before you need them. To practice with Vim, open this course's Binder page, open a new Terminal ("New", top right), and run vim tmp.txt. Alternatively, you can run vim in your own Terminal if you have macOS or Linux, or you can download it for Windows.

The most important distinction between vim and most other text editors is that it has multiple modes, the main ones being insert mode and command mode. Insert mode is similar to what other editors provide: keystrokes you type appear as characters in the file. Command mode is for performing various actions on the file.

A vim session often opens to command mode by default. To activate insert mode, press i. To get back to command mode, press the escape key. To save a file, type :w while in command mode and press enter. To close the file, type :q from command mode and press enter. To force-exit vim, type :q! while in command mode and press enter.

To undo and redo, use u and ctrl-r. To copy (yank) the current line and paste it, use yy and p. To scroll up and down by half a page, use ctrl-u and ctrl-d.

Exercise
The single most important vim command is the one for force-exiting, because sometimes a vim editor opens automatically when you run some other command, and all you want to do is get out. If you are in insert mode, what key sequence must you enter to force-exit vim?

Solution. The correct key sequence is [esc]:q! followed by enter: the escape key switches to command mode, and then :q! force-exits.

Variables

Bash supports variable definition using similar syntax to Python. The main differences are (1) spaces cannot be used around the equals sign, and (2) variable names are conventionally all upper case. Another distinction from Python is that a dollar sign is required to access a variable's value:

MY_FAVORITE_NUMBER=3
echo $MY_FAVORITE_NUMBER

The command echo simply prints its arguments.

Some special variables are available in a bash session without you having to define them yourself. For example, if you run echo $PATH, you'll see a colon-separated list of directories. These are the directories where bash searches for executable files when you run a command. You can see which executable is being run for a given command name using the which command. For example, which echo prints a path such as /bin/echo. If you look in the /bin directory, you'll see that many of the bash commands we've discussed so far are actually executables in that directory.
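
The colon-separated list is easier to read with one directory per line. The sketch below does that with the tr command and the pipe operator | (explained in the Piping section), and then looks up where the ls executable lives. The variable name LS_LOCATION is made up for this example, and the exact paths printed depend on your system.

```shell
# Print each PATH directory on its own line, in the order bash searches them.
echo "$PATH" | tr ':' '\n'

# Ask which executable would run for the command name ls.
LS_LOCATION=$(which ls)
echo "$LS_LOCATION"

# The directory part of that location is one of the PATH entries.
dirname "$LS_LOCATION"
```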

Utilities you install on your computer often make their executables available at the command line by modifying PATH. This is done by inserting a line of code in your bash profile, which is a file with a special name that is read by bash every time you start a bash session. For example, if you have a directory, say /Users/jovyan/anaconda3/bin, which contains executables that you want to be able to run from the command line, you can add the line

export PATH="/Users/jovyan/anaconda3/bin:$PATH"

to ~/.bash_profile (the ~ refers to your home directory).

In the command export PATH="/Users/jovyan/anaconda3/bin:$PATH", the dollar sign is used to access the original value of PATH (so that you're adding to the set of PATH directories, not replacing all of the ones that were stored in PATH previously), and the export command marks PATH so that its new value is passed on to the programs started from the bash session (rather than being visible only to the shell that reads ~/.bash_profile).
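
One way to see what export does is to compare an exported variable with a plain one from inside a child process. In this sketch the variable names LOCAL_ONLY, SHARED, and CHILD_OUTPUT are made up, and sh -c is used to start a child shell that runs a single command.

```shell
# A plain assignment is visible only inside the current shell.
LOCAL_ONLY="not exported"
# An exported variable is also passed on to programs that the shell starts.
export SHARED="exported"

# Start a child shell and ask it to print both variables.
# The single quotes keep the current shell from expanding the $ signs itself.
CHILD_OUTPUT=$(sh -c 'echo "local=[$LOCAL_ONLY] shared=[$SHARED]"')
echo "$CHILD_OUTPUT"     # prints local=[] shared=[exported]
```

The child shell sees an empty value for LOCAL_ONLY because that variable was never exported.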

If you try to run a command and bash says command not found, one strong possibility is that the executable file that should run that command is "not on your PATH" (a phrase you will see often on StackOverflow!). The solution to this problem is to locate the executable's directory—usually by searching the internet to figure out where the installer puts the executable by default—and edit your ~/.bash_profile accordingly.
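
The following sketch reproduces the "not on your PATH" situation on a small scale, for the current session only and without touching ~/.bash_profile. The command name hello-demo and the scratch directory are made up for this example. It assumes mktemp is available, and it uses two commands not covered above: chmod +x, which marks a file as executable, and command -v, which reports whether the shell can find a command.

```shell
# Create a tiny executable called hello-demo in a scratch bin directory.
SCRATCH=$(mktemp -d)
mkdir "$SCRATCH/bin"
printf '#!/bin/sh\necho "hello from demo"\n' > "$SCRATCH/bin/hello-demo"
chmod +x "$SCRATCH/bin/hello-demo"

# Before PATH is changed, the shell cannot find the command.
if command -v hello-demo > /dev/null 2>&1; then
  FOUND_BEFORE=yes
else
  FOUND_BEFORE=no
fi
echo "$FOUND_BEFORE"            # prints no

# Put the scratch bin directory at the front of PATH.
export PATH="$SCRATCH/bin:$PATH"
GREETING=$(hello-demo)          # now the shell finds and runs the executable
echo "$GREETING"                # prints hello from demo

rm -r "$SCRATCH"                # clean up the scratch directory
```

Because the export line was typed at the prompt rather than saved in ~/.bash_profile, the change disappears when the session ends.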

Exercise
Write a line of bash code that adds /Library/Frameworks/R.framework/Resources to the end of PATH, so that directory is searched for executables last when a command is run in bash. Where should that line of code be placed?

Solution. The appropriate bash command is export PATH="$PATH:/Library/Frameworks/R.framework/Resources", and it should go in ~/.bash_profile.

Piping

The output of a command like echo $PATH, which prints to the screen by default, may be redirected to a file using the operators > or >> or fed as input to another bash command on the same line using the pipe operator |. The use of such operators in Unix is called piping, and it's a key element of bash's design.

The difference between > and >> is that the former eliminates whatever might have been in the file previously, and the latter appends to the end of the target file's current contents.

For example, tmp.txt will contain two lines of text after these two commands are run:

echo "This is the first line" > tmp.txt
echo "This is the second line" >> tmp.txt

You can check that this worked as expected by running cat tmp.txt.

The pipe operator is the mechanism for composing commands in Unix. For example,

echo "The quick brown fox jumped over the lazy dog" | wc

forwards the text returned by the first command to the wc command, thereby counting the number of lines, words, and characters in the sentence "The quick brown fox jumped over the lazy dog".

Exercise
Write a three-command pipe, using cat, head and tail, which prints the portion of a document mydoc.txt between lines 100 and 110.

Solution. If we select the first 110 lines, then the desired lines are the last 11 lines of that selection. So we can do

cat mydoc.txt | head -n 110 | tail -n 11
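
If you'd like to check this pipe yourself, one option is to build a stand-in document whose k-th line is the number k, so that the selected lines are easy to recognize. This sketch assumes the seq and mktemp commands are available. The 200-line length of the stand-in file is arbitrary.

```shell
# Work in a scratch directory so that nothing real is modified.
SCRATCH=$(mktemp -d)
cd "$SCRATCH"
seq 1 200 > mydoc.txt      # stand-in document: line k contains the number k

# Keep the first 110 lines, then keep the last 11 of those.
SELECTION=$(cat mydoc.txt | head -n 110 | tail -n 11)
echo "$SELECTION"          # prints 100 through 110, one number per line

cd /
rm -r "$SCRATCH"           # clean up
```

Using tail -n 10 instead would drop line 100, since lines 100 through 110 inclusive make up 11 lines.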

Glob Patterns

Performing actions on a single file at a time can get pretty time-consuming if there are many files involved. Consider, for example, a directory with 1000 image files, one for each frame of a short video. Suppose the images are named img000.png, img001.png, and so on. If you want to move all of these files into a subdirectory called frames, you can run the third and fourth lines of this block:

touch img000.png # make sure there are actually
touch img001.png # image files to move
mkdir frames
mv img*.png frames/

The asterisk in the file name tells the command to act on every file whose name looks like "img, followed by any number of other characters, followed by .png". We call img*.png a glob pattern (short for global). The asterisk is a wildcard. The other common wildcards are ?, which matches any single character, and expressions like [a-e] which match any single character in a given range of characters. You can also list out the characters to match: [aeiou] matches any lowercase vowel.
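
A convenient way to experiment with glob patterns is to pass them to echo, which simply prints the file names the pattern expands to. The sketch below does this in a scratch directory. The file names are made up, and the example assumes mktemp is available.

```shell
# Work in a scratch directory containing a few empty files.
SCRATCH=$(mktemp -d)
cd "$SCRATCH"
touch img000.png img001.png img010.png notes.txt

STAR=$(echo img*.png)          # * matches any number of characters
QUESTION=$(echo img00?.png)    # ? matches exactly one character
RANGE=$(echo img0[0-1]0.png)   # [0-1] matches one character in the range 0 to 1
echo "$STAR"                   # prints img000.png img001.png img010.png
echo "$QUESTION"               # prints img000.png img001.png
echo "$RANGE"                  # prints img000.png img010.png

cd /
rm -r "$SCRATCH"               # clean up
```

The expansion is carried out by bash before the command runs, which is why the same patterns work with mv, ls, rm, and any other command.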

Exercise
Which of the following names match the glob pattern [aA]nswer.*?

answer.1.txt
my-answer.py
Answer.tex

Solution. The first and third options match. The second one doesn't because the pattern specifies that the first character must be uppercase or lowercase a.
