Run, Subprocess, Run!

Python is awesome, and can pretty much do everything you ever wanted, but on rare occasion, you may want to call an external program. The original way to do this with Python was to use os.system. 

import os

return_code = os.system("echo 'May the force be with you'")

The message “May the force be with you” would be printed to the terminal via stdout, and the return code variable would be 0 as it did not error. Great for running a program, not so great if you need to capture its output.

So the Secret Order of the Pythonic Brotherhood* meet, performed the required rituals to appease our Benevolent Dictator for Life**, and brought fourth subprocess.                                                                                                         * not real  ** real 

Subprocess is a module dedicated to running other processes. You’ve probably already have used or encountered it in it’s many forms. subprocess.call , subprocess.check_callsubprocess.check_output or even the direct call to the process constructor subprocess.Popen.

These made life a lot easier, as you could now have easy interaction with the process pipes, running it as a direct call or opening a new shell first, and more. The downside was the inconvenience of having to switch between them depending on needs, as they all had different outputs considering how you interacted with them.

That all changed in Python 3.5, with the introduction of subprocess.run (for older versions check out reusables.run). It is simply the only subprocess command you should ever need! Let’s look at a quick example.

import subprocess

response = subprocess.run("echo 'Join the Dark Side!' 1>&2", 
                          stderr=subprocess.PIPE)

# CompletedProcess(args="echo 'Join the Dark Side!' 1>&2",
#                  returncode=0, 
#                  stderr=b"'Join the Dark Side!' \r\n")

Now check that response out. It’s an organized class, that stores what args you sent to start the subprocess, the returncode as well as stdout and/or stderr if there was a pipe specified for them. (If something was sent to stdout or stderr and there wasn’t a pipe specified, it would send it to the current terminal.)

As the return value is a class, you can access any of those attributes as normal.

print(response.stderr)
# Join the Dark Side!
print(response.returncode)
# 0
response.check_returncode() # Would return None in this case

It also includes a check_returncode function that will raise subprocess.CalledProcessError if the return code is not 0. 

Basically, you should use subprocess.run and never look back.  It’s only real limitation is that it is equivalent to using Popen(...).communicate() , which means you cannot provide multiple inputs, wait for certain output, or behave interactively in any manner.

There are plenty of additional capabilities that are good to know, this article will cover:

  1. Timeouts
  2. Shell
  3. Passing arguments as string or list
  4. Pipes and Buffers
  5. Input
  6. Working Directory
  7. Environment Variables

Timeout

In Python 2 it’s a real pain to have a timeout for a subprocess. You could potentially do a poll for a max amount of time before calling it quits. But if you had input it was much harder. On Linux you could use signals, but Windows required either a forever running background thread or run in a separate process.

Thankfully in the modern world we can simply specify one to the run command.

subprocess.run("ping 127.0.0.1", shell=True, timeout=1)

You’ll see some ping responses being printed to the terminal (as we didn’t send it to a pipe) then in a second (literally) see a traceback.

subprocess.TimeoutExpired: Command 'ping 127.0.0.1' timed out after 1 seconds

No crazy multiprocessing or signaling needed, only need to pass a keyword argument.

Shell

I see shell being overused and misunderstood a lot, so I want to define it’s behavior very clearly here. When shell=False  is set, there is no system shell started up., so the first argument must be a path to an executable file or else it will fail.

Setting shell=True will first spin up a system dependent shell process (commonly \bin\sh on Linux or cmd.exe on Windows) and run the command within it. With a shell you can use environment variables, shell built-in commands and have glob “*” expansion.

Also keep in mind a lot of programs are actual files on Linux, whereas they are shell built-ins on Windows. That’s why “echo” with shell=False will work on Linux but will break on Windows:

subprocess.run(["echo", "hi"])

# Linux: CompletedProcess(args=['echo', 'hi'], returncode=0)
# Windows: FileNotFoundError: 
#          [WinError 2] The system cannot find the file specified

“So, just always use shell?” Nope, it’s actually better to avoid it whenever possible. It’s costly, aka slower, to spin up a new shell, and it’s susceptible to shell injection vulnerabilities.

If you are going to be calling an executable file, it’s best to always keep shell=False unless you need one of the shell’s features.

Arguments as string or list

There seems to be very odd behavior with the first argument being passed to subprocess functions, including .run that changes from a list to a string if you use shell=True . In general if shell=False  (the default behavior) pass in a list of arguments. If shell=True, then pass in a string.

subprocess.run(['echo', 'howdy'])             # List when shell=False

subprocess.run('echo "howdy"', shell=True)    # String when shell=True

However it’s important to know why you should do that. It’s because of because of how Python has to create the new processes and send them information, which differs across operating systems.

On Windows you can get away with murder pass either a list or string for either case.  Because when creating a new process, Python has to interpret the list of arguments into a string anyways when shell=False; Otherwise, when shell=True, the string is sent directly to the shell as-is.

On Linux, the same scenario happens when shell=True. The string will be passed directly to the newly spawned shell as is, so it can expand globs and environment variables.  However, if a list is passed, it is sent as positional arguments to the shell. So if you have:

subprocess.run(['echo', 'howdy'], shell=True)

It is not sending “howdy” as an argument to echo , but rather to /bin/sh.

/bin/sh -c "echo" "howdy"

Which will result in confusing behavior of nothing being returned to stdout and no error.

And going the other direction can be a pain on Linux. When shell=False and a string is provided, the entire thing is treated as the path to the program. Which is helpful if you want to run something without passing any arguments, but can be confusing at first when it  returns a FileNotFoundError .

subprocess.run('echo "howdy"')

# FileNotFoundError: [Errno 2] No such file or directory: 'echo "howdy"'

So to be safe, simply remember:

subprocess.run(['echo', 'howdy'])             # List when shell=False

subprocess.run('echo "howdy"', shell=True)    # String when shell=True

You can also “cheat” by always building a string, then use shlex.split on it if you don’t need to use a shell.

import shlex

args = shlex.split("conquer --who 'mine enemy'" 
                   "--when 'Sometime in the next, eh, \"6\" minutes'")

print(args)
# ['conquer ', '--who', 'mine enemy', 
#  '--when', 'Sometime in the next, eh, "6" minutes']


subprocess.run(args)

(Note that shlex.split should also be sent posix=False when being used on Windows)

Stream, Pipes and Buffers

Pipes and buffers are both sections of memory used for the storage and exchange of data between processes. Pipes are designed to transfer and hold the data, while buffers act as temporary vessels to transfer data to files.

Setting stdout or stderr streams to subprocess.PIPE will save any output from the program to memory, and then stored in the CompletedProcess class under the corresponding attribute name.  If you do not set them, they will write to the corresponding default file descriptors, which are same as sys.stdout (aka file descriptor 1) and so on. So if you redirect sys.stdout the subprocess stdout will also be redirected there.

Another common use case is to send their output to a file, i.e:

subprocess.run('sh info_gathering.sh', 
               stdout=open('comp_info.txt', 'w'), 
               shell=True, 
               encoding='utf-8', 
               bufsize=4096)

That way the output is stored into a file, using a buffer to temporarily hold information in memory until a large enough section is worth writing to the file.

If encoding is specified (Python 3.6+), the incoming bytes will be decoded and the buffer will be treated as text, aka “text mode”. This can also happen if either errors or universal_newlines keyword arguments are specified.

There are multiple different ways to use buffering:

bufsizeshorthand description
0unbufferedData will be directly written to file
1line bufferedText mode only, will write out buffer on `\n`
-1system defaultLine buffered if text mode, otherwise will generally be 4096 or 8192
>= 1sized bufferWrite out when (approx) that amount of bytes are in the buffer

Input

With run there is a simple keyword argument of input  the same as Popen().communicate(input) . It is a one time dump to standard input of the new process, which can read any of it at it’s choosing. However it is not possible to wait for certain output for event before sending input, that is more suited to pexpect or similar.

Check

This allows run to be a drop in replacement of check_call. It makes sure the status return code is 0, or will raise subprocess.CalledProcessError.

s.run('exit 1', check=True, shell=True)
# subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1.
s.check_call('exit 1', shell=True)
# subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1.

Working directory

To start the process from a certain directory, pass in the argument cwd with the directory you want to start at.

s.run('dir', shell=True, cwd="C:\\")
# Volume in drive C has no label.
# Volume Serial Number is 58F1-C44C
# Directory of C:\

Environment Variables

To send environment variables to the program, it needs to be run in a shell.

s.run('echo "hey sport, here is a %TEST_VAR%"', 
      shell=True, 
      env={'TEST_VAR': 'fun toy'})

"hey sport, here is a fun toy"
# CompletedProcess(args='echo "hey sport, here is a %TEST_VAR%"',
#                  returncode=0)

It is not uncommon to want to pass the current environment variables as well as your own.

s.run('echo "hey sport, here is a %TEST_VAR%. Being run on %OS%"', 
      shell=True, 
      env=dict(os.environ, TEST_VAR='fun toy'))

"hey sport, here is a fun toy. Being run on Windows_NT"
# CompletedProcess(args='echo "hey sport, here is a %TEST_VAR%. Being run on %OS%"',
#                  returncode=0)