Process Flow Control With Shell Scripts

This article is a critical examination of the use of the UNIX shell (sh, ksh, bash, ...) for process flow control. The issues and problems described in this document are conceptual in nature and also apply to other scripting languages (perl, python, ...). In no way is it our intention to criticize these languages as a whole. Each of them has its advantages, and the author has completed many successful projects using such languages over the course of his career.

Process flow control

Every IT department has tasks that can only be accomplished by executing several individual processes. Such process flows consist of anywhere from a few to several hundred processes. To execute them, it must be ensured that every single process runs in a coordinated and synchronized manner and in the right order. This requires process flow control. In practice, UNIX shell scripts or other scripting languages are frequently used to implement it.

In the beginning everything is easy

In the starting phase of a project, everything is always quite simple. The number of processes is low, they are of low complexity, the amounts of data are easily manageable and flow control isn't (yet) an issue. A few lines of shell script are enough to manage almost any process: fast, pragmatic and affordable. Precisely this hides a great risk. As complexity grows and the requirements for performance and reliability increase, these solutions tie up considerable development resources, and in the medium term they make reliable and stable operation at reasonable cost impossible. From a management point of view it is disastrous that the costs for process flow control and monitoring are so low at the beginning that they aren't budgeted separately. The continuous, creeping increase in costs remains invisible to management for a long time (idle power!). By the time the issue is recognized as a problem, enormous sums have already been spent on developing control scripts and the supporting functions and frameworks they require. To protect this investment, and out of fear of migrating these process flow control systems to a suitable scheduling system, organizations often try to stick to this mode of operation. Costs continue to rise and more and more staff resources are tied up. Turning away from this inefficient mode of operation becomes increasingly expensive.

In this document we want to raise awareness of this kind of problem and show that adopting a suitable scheduling system at an early stage prevents this dilemma, while at the same time reducing costs and freeing resources for the real tasks.

A simple example

We would like to illustrate how a process flow evolves by means of a simple example. We assume that two programs (P1 and P2) have to be executed in succession. This can be implemented with the following shell script:

#!/bin/sh
P1
P2

This is really simple, isn’t it?

Error handling

The script above does not handle errors. To prevent P2 from processing the wrong data, P2 must not start if P1 reports an error. Once this has become a recurring annoyance, we will have to add some error handling to the script. It will look something like this:

#!/bin/sh
P1
RET=$?
if [ $RET -ne 0 ]
then
    echo "Error $RET in P1!" >&2
    exit $RET
fi
P2
RET=$?
if [ $RET -ne 0 ]
then
    echo "Error $RET in P2!" >&2
    exit $RET
fi

As we can see, quite a few lines of code were added. These lines need to be tested, because an unhandled error can have fatal consequences. Moreover, different developers will implement this kind of error handling differently. Comprehensibility and maintainability decrease, and costs increase.
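One common way to keep this error handling uniform is to move it into a shell function through which every step is started. The following is a minimal sketch in POSIX sh, not a recommended standard: the function name run_step is our own hypothetical choice, and P1 and P2 stand for the programs of the example.

```sh
#!/bin/sh
# run_step NAME COMMAND [ARGS...]
# Hypothetical helper: runs one step, reports a non-zero exit status
# on stderr and returns that same status to the caller.
# POSIX sh has no "local", so rs_name and rs_ret are global variables.
run_step() {
    rs_name=$1
    shift
    "$@"
    rs_ret=$?
    if [ "$rs_ret" -ne 0 ]
    then
        echo "Error $rs_ret in $rs_name!" >&2
    fi
    return "$rs_ret"
}

# Usage in the flow: stop at the first failing step and propagate
# its exit status ($? on the right of || is the status of run_step).
# run_step P1 P1 || exit $?
# run_step P2 P2 || exit $?
```

This removes the duplication, but the helper itself has to be written, tested and adopted by every developer, which is precisely the kind of infrastructure effort this article is about.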

Restart

In our example, P1 has a running time of about three hours, and P2 often fails because of a shortage of temporary storage. Restarting the script would needlessly execute P1 again, which adds three hours to the time to repair. In many environments, a developer would copy the script, comment out the call of P1 and start the copy. This produces a lot of work and carries a high risk of error. Since this is not an acceptable situation, the script needs some persistent memory and looks approximately like this:

#!/bin/sh
#
# STEP denotes the last step which has been successfully executed
#
STEP=$(cat stepfile)
if [ "$STEP" -eq 0 ]
then
    P1
    RET=$?
    if [ $RET -ne 0 ]
    then
        echo "Error $RET in P1!" >&2
        exit $RET
    fi
    echo "1" > stepfile
    STEP=1
fi
if [ "$STEP" -eq 1 ]
then
    P2
    RET=$?
    if [ $RET -ne 0 ]
    then
        echo "Error $RET in P2!" >&2
        exit $RET
    fi
    echo "2" > stepfile
fi
echo "0" > stepfile

This looks like a quick fix that a developer might well decide to implement. In fact, it is not a real solution at all and has many problems. The stepfile has to be initialized before the first execution and may have to be re-initialized after the script aborts. Error handling for reading and writing the stepfile is missing entirely. Besides that, two instances of this script cannot run concurrently. Making this script production-ready requires a lot of development effort. From now on we will refrain from giving further code examples for our mini project, because it would easily grow into a script with several hundred lines of code. And of course, we would like to spare ourselves this development effort.
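Purely to illustrate the size of that effort, the following sketch addresses just three of the gaps mentioned: initializing a missing stepfile, writing it atomically, and preventing concurrent instances with a lock directory. It is a sketch under our own assumptions (POSIX sh, all files in the current directory); the names STEPFILE, LOCKDIR, acquire_lock, release_lock, read_step and write_step are hypothetical.

```sh
#!/bin/sh
# Hypothetical file names, overridable from the environment.
STEPFILE=${STEPFILE:-stepfile}
LOCKDIR=${LOCKDIR:-stepfile.lock}

# acquire_lock: mkdir either creates the directory or fails, atomically,
# so only one instance at a time can hold the lock.
acquire_lock() {
    if mkdir "$LOCKDIR" 2>/dev/null
    then
        return 0
    fi
    echo "Another instance seems to be running ($LOCKDIR exists)" >&2
    return 1
}

release_lock() {
    rmdir "$LOCKDIR"
}

# write_step N: writes to a temporary file in the same directory and
# renames it, so an abort during the write cannot leave a half-written
# stepfile behind.
write_step() {
    printf '%s\n' "$1" > "$STEPFILE.tmp" || return 1
    mv "$STEPFILE.tmp" "$STEPFILE"
}

# read_step: prints the last successful step, initializes a missing
# stepfile with 0 and rejects contents that are not a plain number.
read_step() {
    if [ ! -f "$STEPFILE" ]
    then
        write_step 0 || return 1
    fi
    step=$(cat "$STEPFILE") || return 1
    case $step in
        ''|*[!0-9]*)
            echo "Corrupt stepfile $STEPFILE: '$step'" >&2
            return 1 ;;
    esac
    echo "$step"
}
```

In the restart script, acquire_lock || exit 1 would come first, followed by trap release_lock EXIT; STEP=$(read_step) || exit 1 would replace the plain cat, and each echo N > stepfile would become write_step N. Even then, a lock directory left behind by a script killed with kill -9 blocks every further run until someone removes it by hand.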

What else?

The example above only showed the first few simple problems of process flow control by scripting. To ensure stable and smooth operation, at least the following functions, among others, have to be available:

Monitoring and operator action
Transfer of control information
Possibility of parallel execution
Distributed execution
Resource control

Implementing these and other functions in the process control system requires a substantial amount of development and maintenance effort. But if these functions are missing, the price is paid in (greatly) increased operating costs.

Monitoring and operator action

To guarantee stable operation, it must be possible to monitor all processes. This means that all processes must log their progress (the 'stepfile' in our example). These logs must be collected and evaluated to obtain an overview of the currently running processes, which requires a repository that stores status information about them. To enable a fast reaction to errors, the responsible staff must be notified when an error occurs. The required notification system has to be developed, and this alone is a small project that can easily cost a two-digit number of person-days. In case of problems or errors, the operator must be able to intervene in running process flows: he must be able to restart, suspend, resume, cancel or skip individual processes or entire process flows, etc. All of this has to be reasonably convenient, which again implies a substantial development effort.
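To give an impression of what logging progress and notification mean in a script environment, here is a minimal sketch that records the start and end of each step in a central status file and calls a notification hook on failure. STATUS_LOG, NOTIFY_CMD and the functions log_status, monitored and current_state are hypothetical names; NOTIFY_CMD is assumed to be any command that accepts the message as its single argument (logger, for instance, writes it to the system log).

```sh
#!/bin/sh
# Hypothetical settings: central status file and notification hook.
STATUS_LOG=${STATUS_LOG:-status.log}
NOTIFY_CMD=${NOTIFY_CMD:-}

# log_status FLOW STEP STATE: appends "timestamp flow step state".
log_status() {
    printf '%s %s %s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$1" "$2" "$3" \
        >> "$STATUS_LOG"
}

# monitored FLOW STEP COMMAND [ARGS...]: journals start and end of a step,
# calls the notification hook if it fails and returns its exit status.
monitored() {
    m_flow=$1
    m_step=$2
    shift 2
    log_status "$m_flow" "$m_step" RUNNING
    "$@"
    m_ret=$?
    if [ "$m_ret" -eq 0 ]
    then
        log_status "$m_flow" "$m_step" OK
    else
        log_status "$m_flow" "$m_step" "FAILED($m_ret)"
        if [ -n "$NOTIFY_CMD" ]
        then
            # Unquoted on purpose so that NOTIFY_CMD may contain options.
            $NOTIFY_CMD "$m_flow/$m_step failed with exit code $m_ret"
        fi
    fi
    return "$m_ret"
}

# current_state FLOW STEP: prints the most recent state of a step.
current_state() {
    awk -v f="$1" -v s="$2" '$2 == f && $3 == s { state = $4 }
        END { print state }' "$STATUS_LOG"
}
```

This covers only the journaling part. An overview across all flows and machines, escalation of notifications and operator commands such as suspend or resume would still have to be built on top.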

Transfer of control information

It is often necessary to pass information (timestamps, file names, ...) from one process to a subsequent one. In script-based flow control this is typically done with files that are written by one process and read by its successor. Again, this approach has many disadvantages: for example, file names must be agreed upon, concurrent runs must not overwrite each other's files, and stale files have to be cleaned up. At the latest when one of the processes involved has to run on another machine in the network, considerable development effort has to be spent on implementing the information transfer.
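A typical file-based handover might look like the following sketch, which stores key=value pairs in a file shared by two steps. The function names put_info and get_info and the file naming are hypothetical; putting a run identifier into the file name is one assumed way to keep concurrent runs from overwriting each other's information.

```sh
#!/bin/sh
# put_info FILE KEY VALUE: records one value for the successor step.
put_info() {
    printf '%s=%s\n' "$2" "$3" >> "$1"
}

# get_info FILE KEY: prints the last value recorded for KEY and fails
# if the key is absent. Everything after the first "=" is the value,
# so values may themselves contain "=".
get_info() {
    awk -F= -v k="$2" '$1 == k { v = substr($0, length(k) + 2); found = 1 }
        END { if (found) print v; else exit 1 }' "$1"
}
```

For example, P1's wrapper could call put_info "info.$RUN_ID" TIMESTAMP "$(date '+%Y%m%d%H%M%S')", and P2's wrapper could read the value with TS=$(get_info "info.$RUN_ID" TIMESTAMP) || exit 1, where RUN_ID is a hypothetical identifier of the current run. This only works as long as both steps see the same file system.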

Possibility of parallel execution

Script development takes on an entirely new dimension when parts of a process flow can be executed in parallel. Processes must be started in the background, and at certain points in the flow the script must wait for these parallel processes to terminate. This requires considerable technical skill on the part of the developer and, combined with error handling, restart, monitoring and operation, is truly not a trivial task.
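The basic mechanism can be sketched with the shell's own job control: start each step with &, remember its process ID from $!, and collect every exit status with wait. The function run_parallel below is a hypothetical helper; for brevity it takes each step as a command string that is executed by sh -c.

```sh
#!/bin/sh
# run_parallel CMD1 CMD2 ...: starts every command string in the
# background, waits for all of them and returns 0 only if all succeeded.
run_parallel() {
    pids=
    for cmd in "$@"
    do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids
    do
        # "wait PID" returns the exit status of that particular job,
        # even if it had already terminated before we got here.
        if ! wait "$pid"
        then
            echo "Background step (pid $pid) failed" >&2
            failed=$((failed + 1))
        fi
    done
    [ "$failed" -eq 0 ]
}
```

A flow in which the hypothetical steps P2a and P2b run in parallel after P1 could then read P1 || exit 1 followed by run_parallel P2a P2b || exit 1. Even this small function ignores restart: a rerun after a failure of P2a would start P2b again as well, and after a failure the remaining jobs simply keep running instead of being stopped.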

Distributed execution

When parts of a process flow have to be executed on different computers, the scripts controlling the flow have to start processes on remote computers. This adds another layer of complexity, because the pitfalls of remote commands (ssh, rsh, scp, ...) regarding monitoring and error detection have to be worked around. And that is not even counting the potential security risks, e.g. from passwords stored somewhere.
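One of these pitfalls is the exit status: ssh returns the exit status of the remote command, but uses 255 for its own errors, so a failed connection and a remote step that happens to exit with 255 look the same. The sketch below (run_remote is a hypothetical helper) at least reports the cases separately where possible; BatchMode=yes makes ssh fail instead of waiting for a password prompt, and ConnectTimeout bounds the connection attempt.

```sh
#!/bin/sh
# run_remote HOST COMMAND: runs one step on HOST via ssh.
# ssh exits with the remote command's status, or 255 if ssh itself
# failed (connection refused, authentication error, timeout, ...).
run_remote() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" "$2"
    r_ret=$?
    if [ "$r_ret" -eq 255 ]
    then
        echo "ssh to $1 failed (or the remote command itself exited with 255)" >&2
    elif [ "$r_ret" -ne 0 ]
    then
        echo "Remote step on $1 failed with exit code $r_ret" >&2
    fi
    return "$r_ret"
}
```

If the connection drops during a long-running step, ssh also returns 255, and the script cannot tell whether the remote step was aborted or is still running. In addition, the command string is interpreted again by the remote user's shell, so quoting has to be handled twice.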

Resource control

Available system resources are always limited. If at any time too many resources are requested, the result is reduced throughput and an accumulation of errors caused by resource shortages. It is therefore important to start new processes only when sufficient resources are available. Implementing this with reasonable effort in a script-based process control is hardly possible.
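A common scripting workaround is to poll for a resource before starting a step, for example the temporary storage that P2 in our example keeps running short of. The following is a minimal sketch using the POSIX output format of df (df -P -k); free_kb and wait_for_space are hypothetical helpers, and disk space stands in for resources in general.

```sh
#!/bin/sh
# free_kb DIR: prints the available space (in KB) of the file system
# holding DIR. With -P, df prints exactly one line per file system,
# and the 4th field is the available space.
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# wait_for_space DIR MIN_KB TRIES INTERVAL: checks up to TRIES times,
# sleeping INTERVAL seconds between checks; returns 1 if MIN_KB never
# became available.
wait_for_space() {
    w_try=1
    while [ "$w_try" -le "$3" ]
    do
        w_free=$(free_kb "$1")
        if [ -n "$w_free" ] && [ "$w_free" -ge "$2" ]
        then
            return 0
        fi
        if [ "$w_try" -lt "$3" ]
        then
            sleep "$4"
        fi
        w_try=$((w_try + 1))
    done
    echo "Less than $2 KB free in $1 after $3 checks" >&2
    return 1
}
```

Before P2, a script might call wait_for_space /var/tmp 5000000 30 60 || exit 1 (values purely illustrative). The weakness is inherent: two flows can both see enough free space and then start at the same moment, because the check reserves nothing. A real reservation would require a central component that all flows consult.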

So far we have only highlighted some of the most important aspects of flow control and the effort that results from using a scripting solution. Without an appropriate scheduling system, significant costs are incurred during operations. Attempting to control these costs by improving the scripting infrastructure requires a huge development effort and the maintenance effort that inevitably comes with it.

BICsuite as an alternative and escape

With its BICsuite Scheduling system, independIT offers an alternative and an escape from the script trap. The BICsuite Scheduling system offers all functions required to model huge and complex process flows without having to implement any part of the flow control within the subprocesses themselves. Using the BICsuite Scheduling system drastically reduces the effort for development, maintenance and operations. Additionally, operations become more stable, less error-prone and more secure. Recovery times can be noticeably shortened.

Concluding remark

Script-based process flow control requires unreasonably high effort in development, maintenance and operations. Optimal, transparent and efficient operation is an unreachable goal with script-based flow control. We therefore recommend using an appropriate scheduling system at a very early stage. The earlier the migration to such a system takes place, the less investment is lost and the smaller the migration effort. If you are currently using scripting for process flow control, we strongly recommend switching to a scheduling system promptly. The independIT BICsuite Scheduling system offers all functions for the development, monitoring and operation of complex process flows and minimizes the associated development and maintenance costs. At a fraction of the cost of a scripting solution, the BICsuite Scheduling system supports the development and operation of a permanently stable and reliable IT system. Act now!