July 29, 2010 / gus3

Demonstrating multi-processing in Bash 4

I have an ongoing project: to code the Collatz sequence in as many languages as I can. So far, I’ve implemented it in C (integer and floating-point), assembly (x86, x86_64, and SPARC32), Java, BASIC in OpenOffice.org, Tcl, Lisp, Ada, Python, and Bash. Naturally, the fastest is assembly, which can bypass all the cautions of the other, translated languages.

One point I’ve given myself extra credit for is using both cores on my AMD Athlon64 X2, while making sure to print the results in order. Different languages have different ways to do this. (Python has the most elegant method I’ve seen: a source array and a result array, with Pool() dispatching the next iteration to whichever CPU just became available, until all results are finished. However, this means deferring printing until the entire result set is available.)

Bash has had multi-processing for a long time, via job control, the $! special parameter, and the wait command. Judicious use of parentheses for sub-processes, and pipes where necessary, can put comparatively long-running procedures into the background. One example of this is the time-of-day monitor I used in my holiday slideshow.
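Those mechanics can be sketched in a few lines. This is a minimal, hypothetical example (the worker messages and temp files are mine, not from the slideshow script): two subshells run in the background, $! captures each one’s PID, and wait blocks until both finish.

```shell
#!/bin/bash
# Classic Bash job control: run two subshells in the background,
# capture each PID with $!, and wait for both before reading results.
tmp1=$(mktemp) ; tmp2=$(mktemp)

( sleep 1 ; echo "worker 1 done" ) > "$tmp1" &
pid1=$!
( sleep 1 ; echo "worker 2 done" ) > "$tmp2" &
pid2=$!

wait "$pid1" "$pid2"   # both slept concurrently: ~1s total, not 2s
result1=$(cat "$tmp1")
result2=$(cat "$tmp2")
echo "$result1 / $result2"
rm -f "$tmp1" "$tmp2"
```

A bare wait with no arguments would also work here; naming the PIDs just makes the dependency explicit.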

In the case of multi-processing calculations, I wanted to dispatch one thread for calculating Collatz values for odd numbers, and another thread for even numbers. The Python version didn’t do this, but the C and Java versions did, using a global variable to indicate which thread was allowed to print its latest result. That is, the “odd” thread would wait until the flag indicated it was allowed to print, then it would print its result, and finally it would set the flag to allow the “even” thread to print. Each thread, when finished, would pass permission to the other thread.

The Bash version took a different approach: each thread simply printed its results in order, while another thread would read results alternately, via Unix pipes, from the worker threads. The use of Unix pipes provided some buffering, but if one of the threads got ahead of the other, and tried to supply more output than the pipe buffer could handle, it would be suspended until buffer space became available.

Okay, time for some code. First, the single-threaded version:

#!/bin/bash

# The Collatz sequence generator
# $1 = iteration limit

collatz() {
  retval=0
  current=$1
  odd=0

  while [ ! "$current" = "1" ] ; do
    let "retval += 1"
    # test for oddness
    let "odd = current % 2"
    if [ "$odd" = "1" ] ; then
      let "current = current * 3 + 1"
    else
      let "current = current / 2"
    fi
  done
  # give result to caller
  echo $retval
}

max=$1
for ((current=1 ;
      current <= max ;
      current++)) ; do
  echo "collatz($current) = `collatz $current`"
done

The design is purely single-threaded, and on a single-CPU system, it would be the fastest approach.

Now, for the multi-threaded approach:

#!/bin/bash
# The Collatz sequence generator
# re-worked to use Bash 4 coprocesses
# $1 = iteration limit (on the command line)

collatz() {
  retval=0
  current=$1 # from iterator() call
  odd=0

  while [ ! "$current" = "1" ] ; do
    let "retval += 1"

    # test for oddness
    let "odd = current % 2"
    if [ "$odd" = "1" ] ; then
      let "current = current * 3 + 1"
    else
      let "current = current / 2"
    fi
  done

  # give result to caller
  echo $retval
}

iterator() {
  # called in the "coproc" statements
  # in the main thread
  start=$1 ; end=$2 ; step=$3
  for ((current=$start ;
        current <= end ;
        current+=$step)) ; do
    echo "collatz($current) = `collatz $current`"
  done
  echo EOF
  # keep file descriptors open
  sleep 200
}

max=$1 # from command line

# We need separate pipes, so we
# need to give explicit names to
# the coprocesses' stdin/stdout
# pipe arrays. Bash 4 allows this
# only for compound commands,
# so we put the iterator call
# in braces.

# 1 to max, step 2
coproc ODD { iterator 1 $max 2 ; }
# 2 to max, step 2
coproc EVEN { iterator 2 $max 2 ; }

# Now, by reading from each pipe
# in turn, we can guarantee
# that results are printed in order

while true ; do
  read DATA <&${ODD[0]}
  # as soon as one thread's done, we're done
  [ "EOF" == "$DATA" ] && break;
  echo $DATA
  read DATA <&${EVEN[0]}
  [ "EOF" == "$DATA" ] && break;
  echo $DATA
done

kill $ODD_PID $EVEN_PID
exit 0

The collatz() function is the same, but the rest of the code does more work. The iteration has moved into its own iterator() function, so that it can be invoked by multiple coprocesses; that is exactly what happens in the coproc statements.

Some explanation is needed for simple and compound commands in coproc statements. For a simple command run as a coprocess, Bash creates the array variable $COPROC, holding two file descriptors, plus the variable $COPROC_PID, in the calling shell. ${COPROC[0]} and ${COPROC[1]} are the coprocess’s output and input pipes, respectively. This gives the calling process an easy way to communicate with the coprocess, as the final while loop demonstrates.
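The simple-command case can be sketched in a handful of lines. In this example, cat is chosen as the worker only because it echoes lines back without stdio buffering; a filter like tr may block-buffer when its output is a pipe, stalling the read.

```shell
#!/bin/bash
# With no name given, the coprocess variables get the default
# name COPROC. Write to its stdin via ${COPROC[1]}, read its
# stdout via ${COPROC[0]}.
coproc cat

echo "hello world" >&${COPROC[1]}   # into the coprocess's stdin
read -r answer <&${COPROC[0]}       # back from its stdout
echo "$answer"                      # prints: hello world

kill $COPROC_PID 2>/dev/null
```

Note that cat stays alive until the kill, because the calling shell still holds its stdin pipe open.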

If a coprocess terminates before its output is consumed, the pipe is closed and any unconsumed data in its output buffer is discarded. The calling process is left trying to read from a pipe that no longer exists, causing an “ambiguous redirect” error and possibly an infinite loop. Keeping the coprocess alive with a sleep 200 statement also keeps the output pipe open long enough for the calling process to retrieve all the data. Once all the data is consumed, the calling process can terminate the coprocess with kill $COPROC_PID.

However, calling and managing more than one coprocess requires separate names for their variables. Naming coprocesses is possible only for compound commands, enclosed in parentheses or braces. Even if the enclosed command list contains only one shell command, the braces or parentheses designate it as a compound command.

This makes it possible to assign a name to the coprocess, and that name forms the basis of the associated variables. Where a simple command gets the default $COPROC_PID, a named compound command gets $NAME_PID. In my script above, I used ODD and EVEN as coprocess names, so their process IDs are stored in $ODD_PID and $EVEN_PID. The same principle applies to the names of the file-descriptor arrays, so the script reads from ${ODD[0]} and ${EVEN[0]}.
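A stripped-down sketch of the named, two-coprocess case (the worker names and messages here are mine): each worker prints one line, then sleeps to keep its output pipe open, playing the same role as the sleep 200 above. Be aware that Bash prints a warning when a second coproc starts while the first is still alive, but it works as described.

```shell
#!/bin/bash
# Braces make each worker a compound command, which is what
# allows it to be named. The fd arrays become ${ODD[0]} and
# ${EVEN[0]}, and the PIDs become $ODD_PID and $EVEN_PID.
coproc ODD  { echo "from ODD" ; sleep 5 ; }
coproc EVEN { echo "from EVEN" ; sleep 5 ; }

read -r a <&${ODD[0]}
read -r b <&${EVEN[0]}
echo "$a / $b"              # prints: from ODD / from EVEN

kill $ODD_PID $EVEN_PID 2>/dev/null
```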

The main loop (the final while loop) reads alternately from the ODD and EVEN outputs, and so gets the Collatz values for 1, then 2, then 3, then 4, etc. When one of the coprocesses realizes its next iteration would exceed the final value, it emits “EOF”, causing the caller’s while loop to exit. The caller then terminates the coprocesses with kill $ODD_PID $EVEN_PID.

This seems like a lot of work, but is it worth it? On my dual-core AMD desktop, the single-threaded version took 44 seconds to calculate the first 10,000 Collatz sequences. The coprocess version took 28 seconds. In both cases, the invocation looked like:

$ /usr/bin/time ./collatz.sh 10000 > /dev/null

By directing stdout to /dev/null, any overhead caused by the terminal program (xterm, Gnome terminal, or even a text-mode VT) is eliminated.

I will definitely use the knowledge from this exercise in future scripts that I write.

(All Bash scripts in this article are in the public domain, as all shell scripts should be.)


4 Comments

  1. K.S. Bhaskar / Jul 29 2010 1:15 pm

    I have spec’d a benchmark based on computing Collatz sequences – see http://docs.google.com/View?id=dd5f3337_24gcvprmcw
    Also, I have published the results of a benchmark written in GT.M and run on my laptop – see http://docs.google.com/View?id=dd5f3337_25fv4vnnfw
    It would be interesting to work your programs into benchmarks. If you need help understanding my GT.M program, please do ask.

  2. Vance / Jul 31 2010 2:31 am

    I played around with this a bit in a few languages, here are my results of how long it took to calculate the stopping time for Collatz sequences for the integers from 1 to 20000:
    C: 3.13s (1.0x)
    FORTRAN77: 5.14s (1.6x)
    awk: 6.21s (2.0x)
    bc (GNU): 26.42s (8.4x)
    bc (POSIX): 28.49s (9.1x)
    shell: 244.77s (78x)
    No real shock that C was the fastest. awk was the big surprise; I tend to think of it as a text-processing tool, not a mathematical one. It also was the smallest source file and one of the quickest to write. Conversely, bc’s performance was a disappointment.
    All of these were written single-threaded because a) I’m lazy and also a terrible programmer and b) I only have a single-core CPU. You might want to look into the R language – I’m not very familiar with it but it seems like the multicore package can take care of the parallelism for you with a minimum of effort.

  3. gus3 / Jul 31 2010 3:36 am

    Python’s multiprocessing framework bears some resemblance to Mac OS X’s Grand Central Dispatch. Given a list of N things to do, the dispatch sends “the next thing to do” to the next available CPU/core, until the list is empty. For Python’s Pool() method, the “next thing to do” is iteration over an array; for Mac OS X, it’s a C block or dispatch_* function calls.

  4. K.S. Bhaskar / Jul 31 2010 9:52 pm

    Just by way of comparison, my Collatz using GT.M (http://fis-gtm.com) to find the longest sequence of the numbers 1 through 20,000 takes less than one second. It does 1 through 200,000 in 9 seconds on my laptop which has a dual core Intel Core2Duo T7500 at 2.2GHz using four parallel threads.
