
Disclaimer: The
Alternative TPF Homepage is not responsible for the content of external sites
TPF System Overview
The Transaction Processing Facility (TPF) is intended, one
would imagine, to process 'transactions'. So perhaps the first
thing to define as we venture into the workings of the system is
exactly what we mean by a 'transaction'. Already we are in some
difficulty! For the sake of continuity let us use an
example from the airline industry to try and explain the
potential problem. Imagine you are planning a vacation but you
haven't decided whether to head for the sun and sand of the
Caribbean or the snow and slopes of Colorado. You've decided
that you must take your vacation in the third or fourth week in
March and you'd rather not spend more than $2000 for a five night
stay.
Armed with this information you call your travel agent. Over
the next several minutes she (or he) will ask you several
questions to obtain the dates of your intended travel,
destination etc. You will have been able to get several fare
quotes as well, which should help you to decide which vacation
best fits your criteria and perhaps you will actually decide to
go ahead and make reservations. If you decide to finance your
vacation with your favorite credit card you could even purchase
the tickets and reserve your hotel room all in the same telephone
call.
Now let us look a little closer at what just happened. You
have made reservations and paid for the tickets, so in a sense
you have completed a transaction with the travel agent. Now let
us look at the same sequence of events from the perspective of
the travel agent. When she answered your call you probably first asked
about flight availability around the time you wanted
to travel. She would have typed in a message on her computer
terminal that returns a display of flights that still have seats
available for sale. She would then discuss with you the options
you have, with flight times and dates. When you asked about the
fares, having chosen one or two possible flights, the agent would
type another message into her terminal to obtain quotes for the
various fare possibilities. As we all know this would probably
not be a small display as the varieties of airline fares these
days are anything but small (or simple). This process of asking
the system, by means of typed commands, for information would
continue until you had decided on a particular itinerary and were
ready to make a reservation. The agent would then type in a new
sequence of messages, only this time she would be recording in
the system your own personal details, such as name and telephone
number, plus details of your itinerary. If you chose to purchase
the tickets via credit card she would also input your card number
to the system.
As you can see, if we regard the transaction as being the
entire telephone call, up to and including the purchase of the
tickets, then that one transaction involved many messages being
sent to the computer system.
Consider another example of a transaction, where you wish to
purchase an item with your credit card. As the cashier passes
your card through the card reader a message is sent to the credit
card company's TPF system and the response is sent back with an
authorization code (we hope) for the transaction. In this case
there is nothing more to be done. The entire transaction
consisted of just one actual message to the TPF system. So we can
see that a 'transaction' in the TPF world can consist of one or several
messages. However if there is more than one message in a
transaction then the messages must be logically linked in some
way. In other words they must be contributing to a single end
like selling a seat on an airplane or booking a hotel room etc.
In fact the link goes deeper than that for TPF since there is
actually a key on the keyboard designated as EOT
(End-Of-Transaction). When the system receives this message a
host of special processing is activated to tidy up any resources
in use as a result of the previous, associated, queries.
We could define a transaction in TPF in this way:
A transaction is composed of one or more messages into the
system designed to achieve a particular goal. The goal and how
it is actually achieved depends on the implementation of the
application and the complexity of the information required by the
agent.
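The definition above, together with the EOT key described earlier, can be made concrete with a small Python sketch. It assumes, purely for illustration, that each input message arrives tagged with the terminal it came from and that pressing EOT produces a message whose text is the literal "EOT"; the names `EOT` and `group_transactions` are hypothetical and are not TPF facilities. It models only the agent case; a one-message transaction such as the credit card authorization needs no grouping at all.

```python
from collections import defaultdict

EOT = "EOT"  # hypothetical stand-in for the End-Of-Transaction message


def group_transactions(messages):
    """Group (terminal, text) input messages into transactions.

    Each terminal's messages accumulate until that terminal sends EOT;
    the EOT message closes the transaction. Returns the completed
    transactions and any that are still open.
    """
    open_txns = defaultdict(list)   # terminal -> messages received so far
    completed = []                  # (terminal, [messages]) in completion order
    for terminal, text in messages:
        open_txns[terminal].append(text)
        if text == EOT:
            completed.append((terminal, open_txns.pop(terminal)))
    return completed, dict(open_txns)


# Two agents working at once; only T1 has pressed EOT so far.
msgs = [
    ("T1", "availability request"),
    ("T2", "availability request"),
    ("T1", "fare quote"),
    ("T1", "passenger name and phone"),
    ("T1", EOT),
    ("T2", "fare quote"),
]
done, still_open = group_transactions(msgs)
print(done)
print(still_open)
```

Interleaving is the point of keying on the terminal: messages from many agents arrive mixed together, but each transaction is still the logically linked series from one terminal.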
Now that we have a definition of a transaction, we need to
explore exactly what a 'message' is and how it travels from the
agent's terminal to the TPF system complex and what happens after
it gets there.
We're in luck here, as the definition of a message is
relatively simple. When the agent is typing at her keyboard the
terminal is simply echoing her typing to the computer screen
until she presses the 'Enter' key. By pressing this key she has
told the terminal to send the whole contents of its keyboard
buffer (everything the agent has just typed) to the next link in
the communications chain, which will be some sort of terminal
controller. For the communications network to function properly
some extra data must be added to the characters typed by the
agent. Some means of identifying from which terminal the input
is coming is required so that the system knows where to send the
response back to. This 'data stream' is what is eventually
delivered to the TPF system as an input 'message'. At this point
we should briefly mention what lies between the terminal and the
TPF system.
One thing we already know is that TPF supports large
networks that all access a centralized processing complex. The
diagram in figure 1 shows a sample TPF network and processing
complex. The terminals are attached to the central complex by
means of wide area communications facilities. This name is used
to describe transmission facilities that are supplied by common
communications carriers. There are typically two different ways
in which the TPF complex is connected to the terminal network.
One way is to lease communications channels (sometimes called
'leased lines'), connect terminal concentrators and form a
private network. The other is to lease local access lines to a
common communications carrier's data network (this network may
also be used by other systems). The terminal concentrators are
also connected to the network via the local access lines. As you
would expect in a system that routinely supports huge terminal
networks TPF is no stranger to communications. Details of the
various techniques used to integrate TPF into existing networks
or to establish a network solely for TPF will be discussed in
Chapter (x).
Now, in our whirlwind tour of a typical TPF network, we have
finally reached the TPF system itself. It is likely, however, that
we will be forced to wait a very short period of time, as the
system is almost certainly busy with other work. At regular
intervals TPF is interrupted and takes time out to accept more
messages before resuming where it left off with the work it had
in progress. When our message is accepted
into the system, it is placed on the input list, one of the many
TPF work lists. These lists will be discussed in detail later but
are simply used much as you or I would use a 'To Do' list to
schedule activities for the system.
It is now time to introduce for the first time the central
process of the TPF system: the CPU loop. The CPU loop is a
section of code within the control program that accesses and
processes the various 'To Do' lists maintained by TPF. After the
initial set-up of the system, which will be described in detail
later, the final action of the set-up programs is to pass control
to the CPU loop code. The loop of code then proceeds to check
each list in turn to decide if there is any work for the system
to do. New input from the communications subsystem, however, is
handled by a slightly different mechanism. We
have already said that TPF is an 'interrupt driven' system, and
one demonstration of this is the accepting of fresh input to the
main TPF processor(s). Based on a predetermined interval of time
an interrupt is generated by the system that causes the TPF
system to accept all input currently waiting in the
communications subsystem and place the items of work on the input
list. Once this has been accomplished the system returns control
to whatever process was active at the time of the interrupt. In
this way input from the communications subsystem is periodically
'collected', but TPF retains the basic premise that work
already within the system is more important, hence the need for
an interrupt to cause fresh input to be received. Although the
process might appear complicated from this brief introduction it
will be shown in later chapters to be quite simple and effective.
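To make the interplay between the interrupt, the input list and the ready list concrete, here is a toy Python simulation. Everything about it is an assumption for illustration: time advances in abstract ticks, each entry is modelled as a Python generator that `yield`s whenever it requests file I/O, every I/O takes a fixed number of ticks, and only the two lists named so far are modelled (the real loop services several more). The function names are hypothetical. It does capture the two points made above: fresh input is picked up only when the periodic interrupt fires, and entries already in the system (on the ready list) are dispatched ahead of new input.

```python
from collections import deque


def run_cpu_loop(arrivals, handler, interval=1, io_ticks=2, max_ticks=30):
    """Toy CPU loop driven by clock ticks.

    arrivals:  {tick: [message, ...]} arriving in the communications subsystem
    handler:   message -> generator; each `yield` is a request for file I/O
    interval:  ticks between the interrupts that collect fresh input
    io_ticks:  assumed time for one I/O to complete
    Returns (tick, entry) pairs in the order entries were given the CPU.
    """
    comms_buffer = deque()  # input waiting in the communications subsystem
    input_list = deque()    # new messages accepted by the interrupt
    ready_list = deque()    # suspended entries whose I/O has completed
    io_pending = []         # (completion tick, entry) for entries awaiting I/O
    trace = []

    for tick in range(max_ticks):
        comms_buffer.extend(arrivals.get(tick, []))

        # Periodic interrupt: accept everything waiting and queue it as new work.
        if tick % interval == 0:
            input_list.extend(comms_buffer)
            comms_buffer.clear()

        # I/O completions put suspended entries back on the ready list.
        still_waiting = []
        for done_at, entry in io_pending:
            if done_at <= tick:
                ready_list.append(entry)
            else:
                still_waiting.append((done_at, entry))
        io_pending = still_waiting

        # Work already in the system is served before new input.
        if ready_list:
            name, work = ready_list.popleft()
        elif input_list:
            name = input_list.popleft()
            work = handler(name)      # a new entry starts processing
        else:
            continue                  # nothing to do this tick

        trace.append((tick, name))
        try:
            next(work)                # run until the entry asks for I/O ...
            io_pending.append((tick + io_ticks, (name, work)))
        except StopIteration:
            pass                      # ... or finishes (its EXITC)
    return trace


def one_read_then_exit(message):
    yield f"read record for {message}"   # one file access, then done


print(run_cpu_loop({0: ["A"], 2: ["B"]}, one_read_then_exit))
```

At tick 2 both the resumed entry A and the new message B are waiting, and A goes first. With `interval=3`, a message that arrives at tick 1 is not even placed on the input list until the interrupt at tick 3.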
In some documentation about TPF and often in conversation
you will come across the term 'entry'. This is an often
mishandled term that is used to describe anything from an entire
transaction to a message or something entirely different and
equally imprecise. Since we are trying to be as accurate as
possible the term 'Entry' should really only be applied to our
message after it has been taken off the input list in the TPF
system and attached to an Entry Control Block (ECB). The ECB
will remain associated with this entry for its lifetime in the
system and holds within it numerous pointers and work areas that
are used by the TPF system as processing proceeds. The ECB will
be described in some detail in a later section as it is the
central repository for system data associated with the processing
of the particular entry. It is, however, interesting to note
here some key features of the ECB:
- ECBs exist only in main memory (not on file)
- ECBs contain pointers to main memory areas allocated to this
entry by the system
- ECBs contain pointers to the file areas that have been accessed
for the processing of this entry
- ECBs contain the system information necessary for nesting
programs (i.e. processing in program 1, 'entering' program 2,
perhaps to perform some small function, then returning to program
1, at the place you left previously)
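The ECB's role can be pictured as a plain record. The Python sketch below is not the real ECB layout; its fields and methods are hypothetical names chosen to mirror the features just listed, plus the terminal address that, as described later, is kept in the ECB. The `return_stack` shows why nesting works: each 'enter' pushes the caller's resume point, and each return pops it.

```python
from dataclasses import dataclass, field


@dataclass
class ECB:
    """Toy Entry Control Block; field and method names are hypothetical."""
    terminal_address: str                                # where the response goes
    message: str                                         # the input message
    storage_blocks: list = field(default_factory=list)   # main-memory areas held
    file_addresses: list = field(default_factory=list)   # file records accessed
    return_stack: list = field(default_factory=list)     # (program, resume point)

    def enter(self, caller, resume_at, callee):
        """Remember where the caller left off, then pass control to callee."""
        self.return_stack.append((caller, resume_at))
        return callee

    def back(self):
        """Return the program and place to resume after the callee finishes."""
        return self.return_stack.pop()

    def release_all(self):
        """Give every held main-memory area back to the system (cf. EXITC)."""
        freed, self.storage_blocks = self.storage_blocks, []
        self.file_addresses = []
        self.return_stack = []
        return freed


ecb = ECB(terminal_address="T1", message="availability request")
ecb.storage_blocks.append("block-1")
ecb.file_addresses.append(0x1A2B)
now_running = ecb.enter("program 1", 40, "program 2")   # program 1 enters program 2
resume = ecb.back()                                      # program 2 returns
freed = ecb.release_all()
print(now_running, resume, freed)
```

Because every resource the entry holds is recorded in one place, releasing them all at the end of processing is a simple sweep over the ECB.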
Based on a number of criteria associated with the type and
content of the input message the TPF system decides which
application program should process our message and activates it
when our item of work is next in line on the list. To achieve
this some standard TPF 'middleware' programs are used. After the
creation of the ECB, using the Control Program routine OPZERO,
control is passed to the TPF input message editor, UII. UII uses
a TPF utility program, WGR, to locate the Agent Assembly Area
(AAA) associated with the terminal that has input the message.
UII examines the Primary Action Code (PAC), the first character
of the input message (excluding communications information), and,
usually from tables, decides which application segment should
receive the entry for processing. Processing within that
program will now continue until some TPF system function, or
physical I/O, is requested. If it is necessary to access the
database for data the processing of our entry will be suspended
by the system to wait for the successful completion of the data
retrieval by the system routines. When the I/O completes our
entry is reactivated by being placed on another list called the
ready list. Finally, after performing whatever processing is
necessary to satisfy the requirements of the input message, the
application package will issue an 'EXITC' macro, which causes all
system resources, such as main storage areas that might have been
used, to be returned to the system. Normally, before issuing the
EXITC, the application will have sent an output message to the
terminal that supplied the input: either the requested
information or an error message if the input was mistyped
or some other problem occurred along the way.
Since we still have the original terminal address, saved in a
field in the ECB as we started our journey through the TPF system,
the communications software is able to locate
the correct terminal and process the response. This entire
process, greatly simplified for this introduction, should have
taken no more than three seconds.
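The routing step that UII performs, picking an application segment from the Primary Action Code and answering the terminal whose address was saved on the way in, can be sketched in a few lines of Python. The framing of the data stream (`terminal|text`), the action codes and the segment names in `PAC_TABLE` are all invented for illustration; real PAC assignments and message formats are not given in the text. The error replies stand in for the error messages mentioned above.

```python
# Hypothetical PAC table: first character of the message -> application segment.
PAC_TABLE = {
    "A": "AVAILABILITY",
    "$": "FARE_QUOTE",
    "N": "NAME_ENTRY",
}


def handle(data_stream, segments, pac_table=PAC_TABLE):
    """Route one input message and return (terminal, response).

    data_stream uses an invented 'terminal|text' framing: the part before
    '|' stands for the communications information, the rest is what the
    agent typed.
    """
    terminal, sep, text = data_stream.partition("|")
    if not sep or not text:
        return terminal, "FORMAT ERROR"
    segment_name = pac_table.get(text[0])     # PAC = first character typed
    if segment_name is None:
        return terminal, "INVALID ACTION CODE"
    # The saved terminal address travels with the response.
    return terminal, segments[segment_name](text)


segments = {
    "AVAILABILITY": lambda text: "FLIGHTS FOR " + text[1:],
    "FARE_QUOTE": lambda text: "FARES FOR " + text[1:],
    "NAME_ENTRY": lambda text: "NAME ADDED",
}
print(handle("T0412|A12MAR NYC SJU", segments))
print(handle("T0412|Z99", segments))
```

A table lookup keeps the editor itself small: adding an application means adding a table entry rather than changing the routing code.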
As we can see immediately there are some major differences
here from the more widely used mainframe operating systems.
There is no concept of a 'job' that is submitted by a user nor is
there a need to create a Tele-Processing (TP) monitor to run in
the framework of a batch environment such as is the case in the
Customer Information Control System (CICS) which runs under
Multiple Virtual Storage (MVS). There are drawbacks to the TPF
design strategy; batch style work, or those functions that are
computation intensive, are not so easily accommodated in the TPF
system with its philosophy of a quick turnaround for messages
into the system. On the other hand, there is no overhead of a
monitoring system such as those found in the batch-oriented
environments. We will see in a later chapter that some modern
TPF applications are approaching the line between a real-time and
a batch application. For some time this situation has been
eased by the ever-increasing power of the hardware, which has managed to
stay just ahead of user demands for system functionality. In the
last five years, however, the need for ever more sophisticated packages
has threatened to swamp even the largest and most powerful TPF
complexes worldwide.
The entire TPF system is structured to permit the maximum
number of messages to be processed in a unit interval of time,
which is usually expressed as 'messages per second'. This must
be accomplished while retaining the quick response times to the
agents in the field. Response time is measured as the time
between the agent pressing the key on the keyboard that
initiates the transmission of the message through the
communications network and the display of the first character of
the response, transmitted from the TPF system, on the agent's
screen. We can try to summarize the characteristics that a
typical TPF system will have:
- Realtime interaction with widely dispersed 'agents'
- Relatively short message lengths in both directions
- A common, central, database
- The need to update the database during realtime operation
- The need to perform database maintenance during system
operation
- Duplicate data records for performance and reliability
- A communications interface to support a large, widely
dispersed, network of terminals (of various types)
- The response time must 'match' the application
- The system must operate 24 hours a day
- High system availability is required
- System restart, in the event of a problem, must be fast
- Some form of dynamic monitoring of the system performance must
be available
To try and put these requirements in some sort of
perspective I would like to use some figures from an actual TPF
installation. The data is taken from a case study of Trans World
Airlines' (TWA) system which appeared in the Communications of
the ACM in 1984.
- The system had 11,000 - 12,000 terminals attached to it
throughout the world.
- Unlike some other businesses that generate vast amounts of
paper to archive information in parallel with any computer system
processing, the only paper backup for the airline is a passenger's
ticket. The airline cannot operate if the computer system is not
available. To improve availability TWA maintains full back-up
power systems that can be online in seconds in the event of power
failure. The interim period is protected by batteries. All data
is duplicated online. In 1976 (the last year that figures were
available) the TWA system had 131 incidents of system outage, but
the estimated losses from such outages in terms of revenue (a
notoriously subjective figure), were considered very small
compared to 1972's 548 outages, where revenue losses were
estimated to be nearly 10 times as great. To put this another
way the system was scheduled to be available 98.7% of the total
8784 hours in that year. It was actually available 99.85% of the
scheduled time. On 270 days there were no system outages at all,
and the mean outage was 6 minutes, though one single outage
lasted 230 minutes.
- So we can see that the system was available to the agents more than 98% of
the time, and TWA guaranteed travel agents connected to its system
an availability of 95% or no payment would be required. A typical
daily transaction volume was 7 million and at peak times the
message rate reached 170 messages per second with spikes of over
200 messages per second. (Even at that time another airline was
peaking at 1000 messages per second, and more recently an airline
system has processed in excess of 3000 messages per second using
TPF 3.1.) Average response time was 1.5 seconds, and the airline
attempts to ensure that 90% of the messages will have a response
time of no greater than 3 seconds.
- The online database contained between 1 and 1.5 million
passenger records, each consisting of 1.2 to 1.5K of data.
Passenger records constitute the bulk, although by no means all,
of the system's data. So passenger records alone made up a database of
roughly 2 billion bytes, fully duplicated. In addition to this there was
the inventory database, among others. Secondary storage had a capacity of
over 45 Gigabytes (1 Gigabyte = 1 billion bytes). Using native-mode
IBM 3350s, with a capacity of 317.5 megabytes each, this
would have required at least 142 such drives, each about the size of a
household washing machine.
- Reservations are indexed by passenger name, flight number and
date (all three being required to retrieve a reservation), while
inventory can be accessed by date, departure time or city.
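The TWA figures hang together, and a few lines of arithmetic show how. The Python sketch below uses only the numbers quoted above, with decimal units for storage as the text defines them. One assumption is flagged in the code: it treats the 7 million daily transactions as input messages, which the text does not state explicitly. It multiplies the scheduled and achieved availability, compares the implied unscheduled downtime with 131 outages at the 6-minute mean, converts the daily volume into an average rate, and recomputes the 3350 drive count.

```python
import math

# Availability, 1976 (a leap year: 366 days)
total_hours = 366 * 24                          # 8784 hours
scheduled = 0.987                               # share of the year scheduled to be up
achieved = 0.9985                               # share of scheduled time actually up
overall = scheduled * achieved                  # share of the whole year up
scheduled_hours = total_hours * scheduled
lost_hours = scheduled_hours * (1 - achieved)   # unscheduled downtime
outage_estimate_hours = 131 * 6 / 60            # 131 outages x 6-minute mean

# Message rates (assumption: 7 million daily transactions ~ input messages)
daily_messages = 7_000_000
average_rate = daily_messages / (24 * 60 * 60)  # messages per second

# Storage: decimal units, as in the text (1 GB = 10**9 bytes)
drives = math.ceil(45e9 / 317.5e6)              # IBM 3350s needed for 45 GB

print(f"overall availability: {overall:.2%}")
print(f"unscheduled downtime: {lost_hours:.1f} h "
      f"(131 x 6 min = {outage_estimate_hours:.1f} h)")
print(f"average rate: {average_rate:.0f} msg/s against a peak of 170")
print(f"3350 drives: {drives}")
```

The overall availability comes to about 98.55%, in line with the 98% quoted; roughly 13 hours of unscheduled downtime agrees with 131 outages averaging 6 minutes; and an average of about 81 messages per second puts the 170 per second peak at roughly twice the daily average.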
It should be clear from this, albeit dated, glimpse into the
real world, if it wasn't already plain, that TPF systems operate
under severe constraints. Since most TPF systems are what are
known as 'Strategic Systems', highly prized and closely guarded
corporate assets, it is not always possible to get
up-to-date performance figures. This is analogous to Formula One
motor racing teams not wishing to publish how their latest
modifications have improved their car's cornering or speed on the
straight. They prefer to demonstrate any increases in power by
the number of new users they can sign onto their systems or how
easily they can cope with major activity spikes (e.g. the day
after a major airline fare reduction...). Perhaps we need to
mention here some general ways in which TPF attempts to meet
these stringent requirements.
TPF should be considered a high performance, realtime,
message driven operating system. That last term describes the
way in which the incoming messages are collected and concentrated
by terminal concentrators on the network and then polled by the
TPF system. Messages therefore arrive at essentially random times,
and it is their arrival that drives the system's work: hence the term 'message-driven' for TPF.
From a design perspective there are a number of basic
techniques that can be employed to cope with the TPF constraints.
One is that some decisions that need to be made during system
operation could be made by either the system software or by an
operator at the system terminal. The decision to be made here is
whether it is better to spend the time waiting for a human
operator to respond to a situation or whether the time saved by
allowing the system to make a given decision is worth the
complexities that would be introduced into the software to handle
it. In practice TPF leaves to system-requested human intervention
only those decisions that either require knowledge the system could not
have or could potentially destroy data within the system. In almost
all other cases the system has a 'default' course of action if it
cannot solve the problem any other way.
In any large computer system there is a need for some
flexibility in the actual configuration of the components of the
system. There is often an ongoing need to add and remove
devices, perhaps for maintenance or possibly expansion. The
system's configuration is established by a process called System
Generation (sysgen) in which the software of the operating system
is set up to reflect the actual configuration of the hardware.
In some operating environments configuration information can be
supplied to the system software during actual initialization (the
act of bringing the system to an operable state, perhaps after a
power failure or nightly maintenance). In the case of TPF,
where the limiting of any outage is central to its philosophy,
all configuration information must be supplied at sysgen time
since the path through initialization must be as short as
possible to limit any downtime. This is not to say that every
INDIVIDUAL device must be known about before the system can IPL (Initial Program Load).
It means that every TYPE of device should be known about.
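The distinction between device types and individual devices can be sketched as a small Python class. This illustrates the rule stated above, not how sysgen actually works: the class, its methods and the type labels are hypothetical. A device of a type generated in advance is accepted with no further set-up, while a device of an unknown type is refused, standing in for the need to generate the system again.

```python
class SystemConfig:
    """Toy model of the sysgen rule: device TYPES are fixed when the
    system is generated; individual devices may be added or removed
    later. Class and method names are hypothetical."""

    def __init__(self, device_types):
        self.device_types = frozenset(device_types)  # fixed at sysgen
        self.devices = {}                            # address -> device type

    def add_device(self, address, device_type):
        if device_type not in self.device_types:
            raise ValueError(f"device type {device_type} needs a new sysgen")
        self.devices[address] = device_type

    def remove_device(self, address):
        self.devices.pop(address, None)


config = SystemConfig({"3350 DASD", "3270 terminal"})
config.add_device("0A1", "3350 DASD")          # known type: accepted
try:
    config.add_device("0B2", "3380 DASD")      # type not generated: refused
except ValueError as err:
    print(err)
```

Because the set of types is frozen, nothing about device types has to be discovered or validated on the initialization path, which is the property the text is after.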
TPF is also a highly structured system. It achieves this
structure by making each item of work as small as possible, using
as few resources as possible, and doing relatively trivial
amounts of actual computation work during its lifetime (it might
be wise to mention here that the phrase 'trivial amount of
computation' is widely used in computer science literature to
denote a small amount of processing, in comparison with
communications delays and I/O operational delays, and it does
not imply 'unimportant'!). Of course it is possible to have many
hundreds of tasks active simultaneously within the system, so
there is a very high degree of multiprogramming involved.
To revisit the aim of 24 hour availability, there should
always be a backup mainframe computer standing by in the case of
a hardware failure on the main system. A switch-over to the
alternate machine may also be made for regular maintenance. The backup machine must be capable
of being IPL'd as a TPF system almost immediately in the event of
a disaster, so although it may be used for other work, that work
should be either easy to repeat or easy to suspend in case of an
emergency. The failure of a machine is a fairly rare occurrence
these days, but nonetheless it does still happen, and most TPF
installations will have more than one machine capable of
providing backup to their online TPF machine(s).
A final general consideration to help meet the demands of
the TPF philosophy is the duplication of data within the database
and the physical arrangement of that data. It is
possible to duplicate either all of the data or only a selection. Again
most operating TPF installations choose the safer route and
duplicate all data. The arrangement of the application data
within the database can be crucial in minimizing I/O delays
during processing. Response times can be dramatically improved
by the following:
- Allocating shared data resources prior to online execution
- Organizing the physical structure of the shared data to improve
data accessibility. The TPF users are responsible for customizing
this aspect of the system to match their unique application
requirements.
- Placing the logic used to access the data in the system
software; in some other systems this logic is frequently found in
application appendages called 'access methods'.
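The reliability side of duplicating every record can be sketched as follows in Python. This is a toy under stated assumptions, not TPF's file-handling code: the class name and its two in-memory 'copies' are hypothetical, reads simply prefer the prime copy, and the work of bringing a repaired copy back into step with the other is left out. The performance benefit of duplication (serving reads from either copy) is not modelled.

```python
class DuplicatedFile:
    """Toy model of fully duplicated records: every write goes to each
    online copy; a read is served by the first online copy that holds
    the record. Names are hypothetical."""

    def __init__(self):
        self.copies = [{}, {}]          # [prime, duplicate]: address -> data
        self.online = [True, True]

    def write(self, address, data):
        for copy, up in zip(self.copies, self.online):
            if up:
                copy[address] = data

    def read(self, address):
        for copy, up in zip(self.copies, self.online):
            if up and address in copy:
                return copy[address]
        raise KeyError(address)


store = DuplicatedFile()
store.write(1001, "passenger record: SMITH")
store.online[0] = False                 # prime copy's device fails
print(store.read(1001))                 # still served from the duplicate
```

Writing every update to both copies doubles the write work, which is the price paid for being able to keep running when one device fails.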
It would probably be wise at this point to revisit the
concept of performance and the way it is customarily summarized
in the TPF world, by messages processed per second. Unless we
have a fairly good idea of what processing a 'message' involves,
how can we compare the real performance of one TPF system
with another, or of a TPF system with any other system? It is common
practice nowadays to issue benchmark test results for Personal
Computers for example. These results show the performance of
competing brands when processing 'standard' test programs that
are designed to test the computer in certain key functional
areas. Wouldn't it be convenient if such a set of benchmark
programs existed for TPF? Well, yes it would, but because of the
nature of the uses of TPF and the different developments in the
applications areas within various TPF installations, no single
benchmark program exists.
To approach this problem somewhat scientifically we would
need to select a 'typical' input message that we might expect on
a given TPF system. This 'typical' message should be similar to
as many other messages likely to be input across the entire
applications family as possible. Clearly it is no small task in
itself to identify such a message. It requires an intimate
knowledge of the various applications, including the number of
database accesses a given enquiry would generate. Having
identified this typical message, one would then write a program to
deliver a stream of these messages into the TPF system
being tested and, using the monitoring packages supplied, it would
be possible to observe how the various parts of the system
react to the message load. In an ideal world this would be the
preferred way to assess the performance of a TPF system
configuration for your typical system messages. The catch, of
course, is that selection of the typical message. There is also
the concept of the 'message mix'. Since at any given moment the
TPF system is likely to be handling many messages simultaneously
the types of messages active concurrently can be a factor in the
overall performance picture. If all active entries are
computation intensive relative to their I/O requirements they
could slow each other down. Similarly if several of the messages
each need the same record from DASD they could find themselves
having to wait for it.
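However the message mix is produced, a test run is finally judged by the two figures this section keeps returning to: throughput in messages per second and response time, measured as defined earlier from the key press to the first character of the response. Here is a minimal Python sketch of that last step, assuming the measurement tool can record a send time and a first-character time for each message; the function name and data layout are hypothetical. The 3-second target echoes the TWA objective quoted above.

```python
def summarize(samples, target=3.0):
    """Summarize one measured run.

    samples: (sent, first_char_shown) times in seconds for each message,
             i.e. from the key that sends the message to the first
             character of the response appearing on the screen.
    Returns (messages per second, mean response time, share within target).
    """
    if not samples:
        raise ValueError("no samples")
    responses = [shown - sent for sent, shown in samples]
    start = min(sent for sent, _ in samples)
    end = max(shown for _, shown in samples)
    if end <= start:
        raise ValueError("run has zero duration")
    rate = len(samples) / (end - start)
    mean = sum(responses) / len(responses)
    within = sum(r <= target for r in responses) / len(responses)
    return rate, mean, within


run = [(0.0, 1.0), (0.5, 2.0), (1.0, 4.5), (1.5, 3.0)]
rate, mean, within = summarize(run)
print(f"{rate:.2f} msg/s, mean {mean:.3f} s, {within:.0%} within 3 s")
```

Throughput here is measured over the span of the whole run, so a driver that idles before or after the mix would make the rate look lower than it really is.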
The way that most large TPF installations have approached
this problem is to simulate a normal message mix when testing or
stress testing their systems. They do this in one of two ways. The
first is to capture incoming messages to their online or programmer
test systems and replay them into the system being measured. This is also not
a trivial exercise, as the database must be closely coordinated
with the system from which the messages were captured; otherwise
the entries will not process correctly and the test will be
useless. The longer and more tedious method is to manually
input, into a package designed to transmit messages at a
predetermined rate from an intelligent termin