Sunday, September 18, 2011

Software System Implementation

cached from: http://www.comp.glam.ac.uk/pages/staff/tdhutchings/chapter8.html

Systems Implementation

Introduction

In this chapter we will discuss a number of issues concerned with the implementation of information systems. We will be looking at CASE and 4GL tools and at Relational Database Design, Prototyping and the Human Computer Interface.


Case Tools

CASE tools (Computer Aided Software/Systems Engineering, also known in the literature as CAISE - Computer Aided Information Systems Engineering) have been around for many years and range from basic diagramming packages to systems which take the information systems engineer through the complete process of analysis and design, producing fully documented system models and database creation statements in SQL; some go so far as to produce skeleton code.

At the University of Glamorgan we have the PC/Select package with the SSADM toolkit. Over the next week you will complete a PC/Select Open Learning Package, which covers the basic features of PC/Select, i.e. drawing an LDS, some DFDs and some ELHs, and specifying the attributes, primary keys and foreign keys of the LDS to enable the subsequent creation of a file of SQL Create Table and Create Index statements.


4th Generation Tools

There are many popular 4th Generation Tools (4GLs) which range from packages like ACCESS and PARADOX, suitable for the development of reasonably simple applications, to multi-featured packages such as ORACLE which is powerful enough to handle enormous distributed mainframe applications.

4GLs typically comprise a Relational Database Management System (RDBMS) surrounded by a number of interfaces, ranging from a simple SQL language interpreter to windows based screen development packages. Many 4GLs also have their own 3rd generation programming languages which can be used to add processing power to the default capabilities of the 4GL. Still other 4GLs allow front ends to be developed using languages such as Visual Basic.

Various 4GLs are available at the University of Glamorgan including PARADOX, ACCESS and ORACLE. PARADOX and ACCESS are typically used for developing stand-alone or end-user applications. ORACLE, however, is a fully featured 4th generation environment: it has a developers' toolkit and a database administrators' toolkit.

Developers' Tools

  • ORACLE Objects
  • ORACLE Book
  • ORACLE Forms
  • ORACLE Reports
  • ORACLE Graphics
  • SQL*Plus

Administrators' Tools

  • Database Manager
  • Database Expander
  • Object Manager
  • Session Manager
  • User Manager
  • Export/Import
  • SQL*Loader

Using ORACLE Forms it is easy to create rapid prototypes for applications using default processing capabilities. ORACLE also provides, through its own programming language (PL/SQL), the ability to introduce complex functionality. These features should enable your first rapid prototype to evolve into your finished application.


Relational Database Design and Creation

The first step in the implementation process is the creation or modification of the database in a test environment. In simple relational environments this is usually a straightforward task. Consider a simple LDS in which we have a customer entity type owning many orders and a many-to-many relationship between orders and parts resolved by an order-line entity type. The following attributes have been identified:-

Customer(Customer_Number, Customer_Surname, Customer_Initials, Address_Line1, District, Postal_Town, County, Postcode, Telephone, Credit_Limit, Outstanding_Balance)

Order(Order_Number, Customer_Number*, Order_Date)

Part(Part_Number, Part_Description, Part_Price, Quantity_In_Stock, Reorder_Level, Reorder_Quantity)

Order_Line(Order_Number*, Part_Number*, Quantity)

Key:- Order_Number = Primary Key; Customer_Number* = Foreign Key (foreign keys are marked with an asterisk).

Here are some rules of thumb for use when converting a documented LDS into an ORACLE database (these need to be upgraded to refer to ORACLE 7):-

  • Each entity in the LDS becomes a table in the RDBMS.
  • Primary key fields should have the not null attribute specified.
  • Foreign key fields where the relationship is mandatory at the member end should also have the not null attribute specified.
  • Each table should have a unique index specified for the columns (or concatenation of columns) making up the primary key.
  • Each foreign key should have a non-unique index associated with it.

This database could be created in the ORACLE RDBMS as follows (these need to be upgraded to refer to ORACLE 7):-

Create table Customers
(Customer_Number Char(6) primary key,
Customer_Surname Char(30),
Customer_Initials Char(6),
Address_Line1 Char(30),
District Char(30),
Postal_Town Char(30),
County Char(30),
Postcode Char(10),
Telephone Char(20),
Credit_Limit Number(8,2),
Outstanding_Balance Number(8,2));

Create table Orders
(Order_Number Char(6) primary key,
Customer_Number Char(6) not null
references Customers(Customer_Number),
Order_Date Date);

Create table Parts
(Part_Number Char(6) primary key,
Part_Description Char(30),
Part_Price Number(8,2),
Quantity_In_Stock Number(5),
Reorder_Level Number(5),
Reorder_Quantity Number(5));

Create table Order_Lines
(Order_Number Char(6)
references Orders(Order_Number),
Part_Number Char(6)
references Parts(Part_Number),
Quantity Number(4),
primary key (Order_Number, Part_Number));
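
The rules of thumb above also call for a non-unique index on each foreign key. In ORACLE 7 the primary key constraints are enforced by unique indexes, and Order_Number is already the leading column of the Order_Lines primary key, so as a minimal sketch (the index names below are illustrative only) the remaining foreign key columns could be indexed as follows:

Create index Orders_Customer_idx on Orders (Customer_Number);

Create index Order_Lines_Part_idx on Order_Lines (Part_Number);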


The Human Computer Interface

The Human Computer Interface (HCI) can be a key factor in the success of an application. A HCI designer should be aware of what humans are good at and what computers are good at and take this into account. Humans in general are good at controlling, monitoring, decision making and responding to unexpected events. Computers in general are good at storing and recalling data, processing information using pre-specified procedures and presenting options (e.g. a menu or a pick list). Remember your job is to make the task easier not more difficult! Here are some general principles of HCI design:-

  • Allocation of functions; Reduce the amount of memorisation required of commands, codes, syntax, and rules by the user. Reduce the amount of mental manipulation of data required.
  • Consistency; Be consistent within and across applications in the use of display formats, colours, key-strokes and on-line help, e.g. don’t use one key to move from field to field on one screen and a different key on the next screen.
  • Expectations; Be aware of what other applications the end-user makes use of, identify their expectations and try to conform to them.
  • Ease of Learning/Use; Applications need to be easy to learn, easy to use and provide the required functionality. A balancing act may be necessary. The HCI may have to be designed for novices, intermittent users and experts. This can be very difficult, e.g. the novice and intermittent user would probably be pleased to be led by the nose through a multi-level menu system but this approach would slow up the expert user who knows exactly what they want to do and how to do it.
  • Designing Display Formats; The placement, organisation, sequence, spacing, typography and labelling used in display formats can have a significant effect on the ease with which users can notice, interpret, remember and use the data presented.
  • Effective Wording; Text should be legible, brief, clear and consistent. Don’t abbreviate unless the abbreviation is significantly shorter or is more meaningful to the user, e.g. FORTRAN is clearer than Formula Translation Language.
  • Colour; Use colour conservatively. Overuse of colour can hinder rather than help causing particular problems to people with colour blindness.
  • Graphics; Sometimes a picture is worth a thousand words. It is usually easier to understand a graphical representation, e.g. a pie chart than a tabular representation of data.
  • Dialogue Design; Put the user in control, acknowledge their responses and give feedback. Not knowing whether a computer is responding to your actions or doing work on your behalf can be very exasperating.
  • Data Entry; Reduce the amount of data to be input by providing defaults and enabling the user to select data from a list displayed on the screen.
  • Help and Error Messages; use context sensitive help and clear helpful error messages.
  • Consistency; Information such as screen identifier, screen title, page number, date, error messages and system messages should always appear in the same place. Dates should be consistent throughout the application, e.g. DD-MM-YYYY.


Prototyping

One of the most useful techniques for getting the HCI right is prototyping. The word prototype is defined as ‘an original thing in relation to a copy, imitation, representation, later specimen, improved form etc.; a trial model, a preliminary version’ (Concise Oxford). In information systems engineering prototypes are used to identify and validate user requirements, to verify design and to provide a baseline for the eventual development of the system. The idea is that a working model is much easier to understand, from the end-user’s point of view, than a set of diagrams and supporting documentation.

Some people have viewed prototyping as an alternative to analysis and design. This is not the case: a prototyping life cycle carried out without thorough analysis and design is likely to be endless.

The prototyping lifecycle proceeds through six main phases:-

  • Scope; In this phase the boundaries of the system in question are identified.
  • Requirements Analysis; During this phase the requirements of the system are identified (this phase is roughly equivalent to the SSADM Requirements Analysis and Requirements Specification modules).
  • Design; This covers the Logical Design of the system in question (this phase is roughly equivalent to the SSADM Logical System Specification stage)
  • Prototype; During this phase a working model of the system is created by the developers using appropriate 4th generation tools (prototyping for information systems cannot really work effectively unless these tools are available).
  • Demonstrate; During this phase the prototype is presented to the end-user for comment. The demonstration can result in new or changed requirements, which would necessitate backtracking to the requirements analysis phase; holes being discovered in the analysis and design, which would necessitate backtracking to the design phase; and purely cosmetic changes concerning the HCI, which would necessitate backtracking to the prototype phase.
  • Deliver; A number of approaches can be taken to delivery, e.g. the big bang approach in which the whole application is delivered, or the incremental approach whereby parts of the application are delivered as they roll off the prototyping production line.

See slide 10

The literature commonly identifies 4 kinds of prototype:-

  • Cosmetic Prototypes; These are non-functional mock-ups of systems or parts of systems which have the appearance that the designer expects the final system to have.
  • Research Prototypes; Research prototyping occurs when there are specific and firm non-functional (i.e. regarding performance or security) requirements which the designers wish to prove.
  • Throwaway Prototypes; These are prototypes which are built solely with the aim of validating requirements, they are put through verification and then discarded.
  • Functional/Evolutionary Prototypes; Functional prototyping is an approach involving designing the system, producing a first working prototype, verifying the system’s ability to support user requirements and refining this prototype into the finished system. Functional prototyping depends to a large extent on whether the tools used in the development environment (e.g. a PC) can be ported to the implementation environment (e.g. an ICL mainframe).

Prototyping ‘v’ SSADM

As may be seen, there are some inconsistencies between the approach adopted under a prototyping scheme and that adopted under SSADM. When using prototyping, analysis and design are carried out in a less rigorous and comprehensive manner than when using SSADM. When using SSADM the assumption appears to be that the analyst will get it right first time; prototyping has only a minor role in SSADM.

There is no right approach which suits all projects. It may be better to use a functional prototyping approach for relatively simple end-user applications, but a complex fundamental business system like payroll or customer billing probably needs the rigour of SSADM. Alternatively a hybrid approach could be adopted, integrating a prototyping approach to implementation, with SSADM.

Tutorial Sheet Prototyping

Objectives

To evaluate the strengths and weaknesses of the prototyping method.

Question 1

Why would an end user find it easier to criticise a prototype than a DFD?

Question 2

In what kinds of situations would you adopt the different prototyping methods, and which approach are you adopting in your project?

Question 3

Why is prototyping ineffective when using 3rd generation languages such as COBOL?



Software Quality Assurance

cached from: http://www.comp.glam.ac.uk/pages/staff/tdhutchings/chapter7.html

Quality Assurance

Quality Terminology

  • Quality = Conformance to Requirements; The definition of quality in every-day terms involves some measure of excellence (‘degree of excellence’ - Concise Oxford). In the development of information systems quality has come to mean conformance to requirements, not just in terms of functionality but also with regard to ease of use, performance and value for money. slide 2
  • Quality Control; Quality control is the formal assessment of deliverables to ensure that they conform to requirements. A typical example of a quality control activity is a structured walkthrough. slide 4
  • Quality Assurance; Quality assurance is concerned with the development of guiding frameworks within which individual quality control mechanisms take place. slide 5
  • Quality Management; Quality management is concerned with co-ordinating the efforts of individuals, the tools and methods used and the quality assurance frameworks and quality control mechanisms, towards the common goal of improving quality. It is also concerned with the development of a culture in which each individual takes responsibility for the quality of their own work, has access to the appropriate tools and techniques for the job and has the management support necessary to enable them to do so. slide 6


Quality Assurance Frameworks

A quality assurance framework involves the following processes or activities:-

  • Identification of Requirements; This process involves identifying and documenting the requirements of the end-users. This has to be the first phase in the development of quality products (if the requirements are unknown or woolly how do we know when we have conformed to them?).
  • Produce Deliverables; This is where the work gets done, e.g. producing a Logical Data Structure, Data Flow Diagram or working program/module.
  • Quality Control; In this process deliverables are checked to ensure that they conform to identified requirements, any errors or areas of non-conformance are recorded. This list of errors feeds back into the produce deliverables process where the errors are corrected.
  • Improve Standards; During this process the errors identified in quality control are analysed and steps taken to prevent these errors from recurring, e.g. by improving standards or by providing training. slide 7

A corporate data architecture as described in chapter 5 would form part of the quality assurance framework for analysis and design since there is a requirement for the LDS to conform to the Corporate Data Architecture.


The Costs of Quality

The cost of quality has been defined by Philip B. Crosby in his book ‘Quality is Free’ as the cost of not getting things right first time.

  • Prevention Costs; These are costs incurred in preventing mistakes from happening such as, the implementation of standards, staff training and the costs involved in setting up quality management programs.
  • Appraisal Costs; These are the costs incurred in looking for mistakes before a product is released including, program/system/acceptance testing, walkthroughs, inspections and project reviews.
  • Failure Costs; These are the costs incurred because a product doesn’t conform to requirements such as, re-work, overtime, loss of business and maintenance.

The systems development lifecycle has occasionally been broken down into a break-down phase, consisting of identification of requirements, systems analysis and specification, logical design and physical design, and a build-up phase, consisting of program testing, system testing, acceptance testing and operation. slide 10

It is generally accepted that the cost of error correction increases sharply during the build-up phase, and again when errors are detected in live operational running (B. Boehm, ‘Software Engineering’, IEEE Transactions on Computers). Recently a large organisation, as part of its quality management program, analysed how the IT budget was being spent in terms of prevention, appraisal, failure and productive costs; the following results were found:-

Cost               Percentage of IT Budget
Prevention         2%
Appraisal          40%
Failure            35%
Productive Work    23%

The organisation also set itself some targets for improvement.

Cost               Percentage of IT Budget
Prevention         10%
Appraisal          40%
Failure            10%
Productive Work    40%

The organisation expected that a small increase in prevention costs (up to 10%) would result in a dramatic drop in the costs of failure (down from 35% to 10%) thus resulting in a substantial rise in productive work (up to 40%).

The aim of the organisation’s quality management program was to significantly reduce the costs of failure by:

  • using improved appraisal mechanisms to identify and correct errors as early as possible (i.e. when it is cheapest to do so) and
  • by taking action to prevent the recurrence of errors.

Error detection, correction and recurrence prevention is the essence of quality assurance, and the resulting increase in productive work is the thinking behind Philip Crosby’s belief that quality is free.


Walkthroughs, Inspections and Reviews

These can be informal, e.g. asking a colleague’s opinion; semi-formal, e.g. a brainstorming session; or formal, e.g. a formal inspection of an LDS.

The purpose of a formal inspection is the identification of errors and/or areas of non-conformity. The purpose is not to correct the errors; this, in a quality culture, has to be the responsibility of the original producer of the deliverables being inspected.

The following roles are found in most formal inspections:-

  • Moderator; Responsible for arranging the inspection, distributing the materials and acting as chairperson during the inspection.
  • Producer; Responsible for producing the deliverables and delivering them in good time to the moderator.
  • Standards Bearer; Responsible for ensuring that any relevant standards, e.g. SSADM Version 4 are adhered to.
  • Scribe; Responsible for documenting any areas of non-conformance.
  • Reviewer; Everyone has a responsibility for the identification of errors/non-conformity but a reviewer has no other specific responsibilities.

It is important that consensus is reached regarding the outcome of the inspection. There are only three possible outcomes: Pass (as is), Pass (minor modifications required) or Fail (further inspection required). The other important thing for everyone involved, but particularly for the producer, to remember is that what is being criticised is the work, not the person who produced the work. An example of a typical quality control form follows:
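
A representative layout, based on the roles and outcomes described above, might record the following (the exact form used will vary between organisations):

Deliverable / version:                      Date of inspection:
Producer:                                   Moderator:
Reviewers, Standards Bearer, Scribe:
Areas of non-conformance (number, description, standard or requirement violated):
Outcome: Pass (as is) / Pass (minor modifications required) / Fail (further inspection required)
Date of re-inspection (if required):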

Structured Walkthroughs are described in great depth in Ed Yourdon’s book ‘Structured Walkthroughs’.


Software Development Methodologies

cached from: http://www.comp.glam.ac.uk/pages/staff/tdhutchings/chapter6.html

Research in Methodologies

Aim

The aim of the research program is to develop a holistic methodology for information systems engineering based on a Corporate Information Management Strategy (CIMS). The intention is to realise the methodology by integrating a wide variety of contemporary approaches from the fields of business process re-engineering, strategic information systems planning, project management, socio-behavioural development, structured methodologies, the object oriented paradigm, rapid application development and end-user systems development.


Objectives

  • To define the class of information systems for which the methodology is applicable and explain why such a methodology is required based on an analysis of information systems failure.
  • To demonstrate that the majority of corporate data structures are fundamentally more stable and common across business areas than the majority of business processes and thus form a strong foundation for strategic information systems planning.
  • To demonstrate the importance of adopting, a strategic, rigorous approach to the development of corporate data structures.
  • To demonstrate the dynamism of the activities which organisations carry out in order to achieve their objectives.
  • To demonstrate the importance of adopting a flexible, evolutionary approach to the development of applications which support and enable business processes.
  • To review the following fields and identify the implications for the development of the methodology:-
    • Business Process Re-Engineering,
    • Socio-Behavioural Development,
    • Structured Methodologies,
    • Object Orientation,
    • Rapid Application Development,
    • End-User Systems Development.
  • To document the features required of the methodology.
  • To develop and illustrate the methodology.
  • To describe how the methodology enables rapid application development.
  • To describe how the methodology enables End-User Systems Development.


Information Systems Failure

The objective of this section is to define the class of information systems for which the methodology is applicable and explain why such a methodology is required based on an analysis of information systems failure. Much of the work in this area will be based on the material in Sauer’s book entitled Information Systems Failure: A Case Study Approach (Sauer, 1995).


Stable Corporate Data Structures


Corporate Information Management Strategy


Dynamic Business Activity

The objective of this section is to demonstrate the dynamism of the activities which organisations carry out in order to achieve their objectives.


Flexible Applications Development

The objective of this section is to demonstrate the importance of adopting a flexible, evolutionary approach to the development of applications which support and enable business activities.


Influencing Factors


Comparison and Critique of SSADM v4 and DSDM


SSADM v4+ (4.2)


Required Features

The objective of this section is to document the features required of the methodology.

The framework should have the following aims :-

  • Application for a wide variety of information systems;
  • Facilitation of consistent and long term RAD (through data and software re-use)
  • Effective development, facilitation and management of End-User Computing (EUC).


Methodology Development

The objective of this section is to develop and illustrate the methodology.

Overview

Figure <> provides an overview of the proposed methodology. The subsequent sections provide more detailed descriptions of the phases of the methodology.

Figure <> Information Systems Engineering Methodology

It should be noted that the remits of phases 2, 4 and 6 have to cut across any artificial project boundaries.

Corporate Data in Action

In this section the methodology is illustrated by using an example based on a hypothetical University environment. A common shortcoming of many methodologies is that they assume a greenfield site. The illustration will show how the methodology can be used in a more typical site, i.e. one in which there exists a confusing and sometimes conflicting variety of corporate, departmental and personal systems running on a variety of hardware and software platforms.

Scenario

Lecturers in a college are responsible for teaching modules and assessing the students taking the modules through a mixture of coursework and examination. The college has a central Student Administration System (SAS) which records data regarding which students are taking which modules.

The SAS holds data on the contributions that coursework and examination make to the final grade for the module but no detail concerning the number of courseworks and the format and structure of the examination.

At the start of each semester lecturers receive a printed list of students taking their modules from the SAS. The lecturer wants a system which enables the marks students achieve in courseworks and examinations to be recorded and presented back to the SAS. Faced with this problem, in previous years lecturers have been left to their own devices and many have resorted to building their own spreadsheets and re-keying the names of the students on the pre-printed list. This uncontrolled approach has led to:-

  • duplication of data: the same student is recorded by different lecturers and by the SAS,
  • duplication of effort: each lecturer builds their own spreadsheet, and
  • inconsistency of data: e.g. T.D. Hutchings is recorded up to six times (once by the SAS and up to five times for the modules they are taking) as ‘Mr. T.D. Hutchings’, ‘Hutchings, Timothy D’, ‘Tim Hutchings’, ‘Timothy David Hutchinson’ etc. (which is correct?).

The implications of this approach for the correct recording of students’ grades are obvious. Adopting the CDMS via the proposed methodology could address some of these problems.

Phase 1: Business Activity Analysis

Aims & Objectives:

  • To identify how the current business processes are carried out.
  • To identify the data structures required to support the current business processes

Inputs

<>

Structure

The business process analysis phase concentrates on identifying and documenting low level business activities and consists of two overlapping tasks :-

  • Detailed Data Flow Analysis using low level Data Flow Diagrams;
  • Detailed Data Structure Analysis using Relational Data Analysis (Normalisation)

The relationship between the tasks is shown in figure <>. The approach accepts the fact that conducting detailed data structure analysis may dictate further data flow analysis and therefore that this phase is iterative in nature.

figure <> Phase 1 Business Process Analysis

Techniques

<>

Deliverables

The deliverable from this process is a series of detailed process descriptions each consisting of a low level DFD and a set of normalised tables.

Practice

The first task for the systems analyst is to identify and document the processes which lecturers carry out in terms of assessing students. The following business processes are identified :-

  • The ‘Students Taking Module’ print out is received from the administrative staff of the department.
  • During the first few weeks of the semester the lecturer maintains the list by taking registers, crossing off drop outs and including late starters.
  • The corrected list is returned to administrative staff in order to update the SAS.
  • A spreadsheet is created and the names of the students are keyed in. A column or columns are included for each coursework and the examination.
  • Periodically the courseworks set are marked and the results keyed into the appropriate columns.
  • The examination is marked and the results entered.
  • Formulae are created which calculate the overall coursework and examination percentages for each student.
  • The appropriate columns from the spreadsheet are extracted and passed on to administrative staff for input to the SAS.

A detailed data flow diagram to represent these business processes is shown in figure <>

figure <> 1st Draft Data Flow Diagram

The key data structure which supports these processes is obviously the lecturer’s spreadsheet, an example of which is shown in figure <>.

figure <> Example Spreadsheet

Analysing the data structure using a combination of normalisation, top-down entity-relationship modelling, common sense and business awareness reveals the following model:

figure <> External Schema for Student Assessment
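
By way of illustration, such an external schema might be realised as a small set of normalised tables along the following lines. This is a sketch only: the table names, column names and sizes are assumed here rather than taken from the SAS.

Create table Students
(Student_Number Char(8) primary key,
Student_Name Char(30));

Create table Modules
(Module_Code Char(8) primary key,
Module_Title Char(30),
Coursework_Weighting Number(3),
Exam_Weighting Number(3));

Create table Students_Taking_Modules
(Student_Number Char(8) references Students(Student_Number),
Module_Code Char(8) references Modules(Module_Code),
Exam_Mark Number(3),
primary key (Student_Number, Module_Code));

Create table Coursework_Results
(Student_Number Char(8),
Module_Code Char(8),
Coursework_Number Number(2),
Coursework_Mark Number(3),
primary key (Student_Number, Module_Code, Coursework_Number),
foreign key (Student_Number, Module_Code)
references Students_Taking_Modules(Student_Number, Module_Code));

Separating the coursework marks into their own table removes the repeating ‘one column per coursework’ group found in the spreadsheet, which is exactly what the normalisation step is intended to achieve.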

Phase 2: Business Activity Re-Engineering

Aims & Objectives

  • To identify improved ways of conducting business activities. NB It is relatively easy to identify and remove duplication and inconsistency; however, substantial improvements rely on creative input from the people involved.
  • To ensure that the newly engineered business processes are consistent with the results of any overarching business process re-engineering initiatives and with other processes in the value chain of the organisation.
  • To describe the external schema requirements of the re-engineered business processes.

Inputs

<>

Structure

This phase begins with the relatively straightforward task of rationalising the process descriptions produced in phase 1 to remove duplication and inconsistency. The result of this rationalisation is a single set of normalised tables and a hierarchic set of DFDs.

Then begins the difficult task of re-engineering the business process. In this area there are no magic wands; there is no substitute for detailed business knowledge and experience. The systems analyst should adopt a facilitating role. It should be clear that this kind of task cannot be carried out within the boundaries of a particular project.

The final task in this phase involves carrying out any changes to the DFDs and the normalised tables necessitated by the re-engineering process.

NB: It should be noted that information collected in tasks 3 and 4 may necessitate revisiting phase 1.

figure <> Phase 2 Business Activity Re-engineering

Techniques

<>

Deliverables

The deliverables of this phase are (a) a set of hierarchic DFDs with supporting documentation consisting of text, decision tables, structure charts, algorithms etc. as considered appropriate by the analyst, and (b) a set of normalised tables with supporting documentation. The set of normalised tables forms the beginning of an external schema definition for the application.

Practice

During this phase the Data Flow Diagrams and External Schema are rationalised to remove duplication and inconsistency. The next step is to consider ways in which the business process could be re-engineered. Finally the Data Flow Diagrams and External Schema are redrafted to support the re-engineered activity.

Phase 3: Functional Model Iteration

Aims & Objectives

  • To iteratively build a prototype which embodies the functional requirements of the system.

Inputs

Analysis & Design Reports from phases 1 and 2.

Technical Standards (e.g. hardware / software platforms and HCI standards).

Structure

The structure of this phase is again iterative, allowing for the creation and refinement of the prototype based on feedback. This structure is shown in figure <>.

NB This phase is very close to the identically named phase of standard DSDM.

figure <> Phase 3 Functional Prototyping

Techniques

<>

Deliverables

The result of the prototyping process should be agreement on the functional requirements of the application and a revised external schema definition.

Practice

During this phase the analyst adopts the iterative prototyping approach in order to prove the functional requirements of the system. Ideally a CASE tool would be used to support the analysis process and be capable of generating a first cut prototype with little or no extra work from the analyst. Furthermore the use of a scaleable implementation platform within a common technical architecture should enable the functional prototype to be used as a baseline for the forthcoming design and build prototype.

Phase 4: Corporate Data Design

Aims & Objectives

  • To evaluate how the corporate database definition (the conceptual schema) affects and is affected by the new external schema.
  • To design and implement the required changes to the conceptual schema.
  • To design and implement the required changes to the internal schema and to create the new external schema in a test environment.

Inputs

<>

Structure

The Corporate Data Design phase consists of three tasks as illustrated in figure <>

During tasks 1 and 2 the required external schema definition (possibly represented as a set of normalised tables or an Entity-Relationship Model) is integrated with the conceptual schema at the logical level, i.e. in meta-data terms.

During task 3 the identified changes are implemented in a test environment.

figure <> Phase 4 Corporate Data Design

During this phase the External Schema is integrated with the Conceptual Schema. If there is no conceptual schema, i.e. this is the first project to adopt the CDMS, then the External Schema needs to be re-examined in the light of corporate requirements and appropriate parts of the External Schema used to form the beginnings of the conceptual model. (In this example the entities Student, Module and Student taking module would become the beginnings of the corporate database.)

The next task is to implement the changes to the internal schema in a test environment. This is the stage at which many attempts to adopt a CDMS fail due to a lack of commitment and understanding. The problem is that student data, currently held in the central SAS, has to be unlocked. Student data has to become centrally managed in a new internal schema independent of the SAS or any other system. The SAS itself has to be amended in order to work with a view of the new internal schema rather than its own student file. The seemingly wasted effort of amending the SAS is very difficult to justify in the short term, however only by bearing this cost can long term RAD be consistently achieved and EUC effectively enabled.

The Corporate Data Design phase by its very nature cuts across system boundaries and only at this stage can the knock on effects of systems development be identified.

The External Schema is realised by creating views of the internal schema and combining these views with local tables.
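
As a sketch of how this might look in SQL, using the illustrative student assessment tables assumed above, the lecturer’s external schema could combine a view of the centrally held student data with a local marks table:

Create view Students_On_My_Modules as
Select s.Student_Number, s.Student_Name, stm.Module_Code
from Students s, Students_Taking_Modules stm
where s.Student_Number = stm.Student_Number;

Create table Local_Coursework_Marks
(Student_Number Char(8),
Module_Code Char(8),
Coursework_Number Number(2),
Mark Number(3),
primary key (Student_Number, Module_Code, Coursework_Number));

The lecturer’s application reads student and module data through the view, so a change made centrally is seen immediately, while the coursework detail that the SAS does not hold remains in the local table.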

Techniques

<>

Corporate Object Modelling

At this stage the techniques of entity/event modelling may be used in order to provide a starting point for the production of ‘method’ specifications. Some explanation is required at this point. An object consists of data and methods. Methods are essentially modules of code which can be invoked in order to create, update, query or delete the object, depending on which messages are received from other objects.

This may seem far removed from the entity/event modelling technique found in the SSADM methodology. However, on looking at the technique in detail, Entity Life History diagrams describe not simply the business events which cause entities to be created, updated and deleted but also the order in which these events should occur. Therefore the combined use of Entity-Relationship Modelling and Entity Life Histories provides a good starting point for the definition of classes and methods. Effect Correspondence Diagrams (ELH diagrams show how a particular entity type is affected by the system’s events; Effect Correspondence Diagrams show how a particular event affects the system’s entities) may prove useful for identifying messages.

Deliverables

<>

Phase 5: Design and Build Iteration

Aims & Objectives

To iteratively build a prototype which embodies the functional and non-functional requirements of the system in a test environment which is, so far as possible, identical to the live or production environment.

Inputs

<>

Structure

This phase largely adopts the structure of the identically named phase of the DSDM methodology. However it should be noted that during this phase it may be necessary to ‘software engineer’ certain key methods, algorithms and modules/programs, using software engineering techniques such as program structure diagrams and formal methods as deemed appropriate.

It should be noted that this phase may indicate new or changed corporate data requirements and necessitate returning to the previous phase.

During this phase the local External Schema used by the Functional Prototype is replaced by the new External Schema. The prototype is then developed iteratively to prove both functional and non-functional requirements.

Techniques

<>

Deliverables

<>

Practice

<>

Phase 6: Production Implementation Control

Inadequate systems implementation can probably be blamed for a number of systems failures. Implementation could be inadequate for many reasons including (a) the failure to distinguish between final test versions and production or live systems, (b) inadequate training and (c) failure to co-ordinate production implementation when different systems are developed in parallel.

Aims & Objectives

  • To identify, plan and schedule appropriate ‘delivery units’;
  • To conduct the production implementation of delivery units.

Inputs

<>

Structure

A simple technique is to invoke a regular production implementation slot, prevent implementations at other times and adopt quality assurance procedures for implementation plans. This should enable delivery units and training to be scheduled accurately and avoid uncontrolled implementation.

Techniques

<>

Deliverables

<>

Practice

<>


Enabling Rapid Application Development

The objective of this section is to describe how the methodology enables rapid application development.


Enabling End-User Systems Development

The objective of this section is to describe how the methodology enables End-User Systems Development.


Introduction to Data Management and Database Technology

cached from: http://www.comp.glam.ac.uk/pages/staff/tdhutchings/chapter5.html

Introduction to Data Management and Database Technology

Introduction

The objectives of this section are to describe the development of the database approach, define the term data management, and distinguish between ‘Data Administration’ and ‘Database Administration’.

As described in chapter 1 many large organisations have come to realise that their data needs to be professionally managed. This chapter describes some of the activities involved in Data Management and some of the problems likely to be encountered when trying to adopt the database approach.


The Database Approach

Firstly, some explanation of the ‘database approach’ and its development. Historically, information systems development involved the development of stand-alone systems for separate applications, each with their own set of master files. This meant that data such as addresses could be duplicated in many separate systems, e.g. an employee’s address could be held once in the payroll system and again in the personnel system. This causes two problems: waste of space and inconsistency. Waste, because the address is held twice when it only really needs to be held once; inconsistency, because it is possible for the address to be updated in one system but not in the other (there may be no easy way of knowing which is definitive). slides 3 - 5

The problems described above led to the development of the database approach. There is a common misconception that the database approach is determined by whether the organisation uses database technology or not. This is not the case: the database approach is an approach in which an organisation attempts to build an integrated corporate database to support all the data requirements of the organisation (in this context database means all the data held by the organisation, whether held in computerised database applications, file systems, spreadsheets or any other medium). slide 6

The real test of whether an organisation is using the database approach is whether its systems are founded on data structures built solely for the immediate requirements of a particular system or on data structures built to support the overall requirements and objectives of the organisation. Of course the implementation of a corporate database won’t happen overnight and there will be compromises between short term requirements and long term aims and objectives. The important thing is that the organisation has a ‘data management’ strategy in place and is moving step by step towards an integrated corporate database.

The problems involved with the implementation of the database approach are concerned with people issues, not technical issues. The challenge for the analyst is to get people from different backgrounds with different knowledge of the organisation to agree on common data definitions.


What is Data Management?

Data Management is the process of planning, co-ordinating and controlling an organisation’s data resource.


What is a Data Architecture?

Data Management principles can be applied at different levels. At the highest level some organisations (British Gas, British Telecom and the NHS) have built corporate data architectures which define at a very high level the data structures required to support the operation of the whole enterprise. These architectures are not really intended to be used directly in the development of application systems, rather they provide a framework within which lower level data models for particular applications can be validated. In this way it is possible to ensure that individual systems have a view of data which is consistent with the company-wide view. At the other extreme it is often useful to apply data management principles to simple Access applications.


What is a Corporate Database?

Many organisations depend on the accurate and timely presentation of information to employees, customers, suppliers, shareholders and the government for their survival. In many cases computer systems are the only viable mechanism for storing and processing the data necessary to provide this information, given the volume of data and the complexity of the processing involved.

The data can be stored in a variety of ways, e.g.

  • Simple Sequential Files.
  • Indexed Sequential & Random Access Files.
  • Hierarchic Database Management Systems (IMS).
  • Network Database Management Systems (IDMS).
  • Relational Database Management Systems (ORACLE).
  • End-User Database Management Systems (DBASE, PARADOX, ACCESS).
  • Spreadsheets (EXCEL, QUATTRO PRO).

The different data storage techniques can exist on practically any level of machine from PC to Mainframe, in various architectures, e.g. Distributed or Client/Server. In practice large organisations that are mature computer users will use many hardware/software platforms appropriate to their needs.

The total collection of data held by an organisation - ‘The Corporate Database’ - can be its most valuable asset. The intrinsic value of data is hard to calculate, so many organisations use a replacement cost approach. Consider how much it would cost a large organisation such as a bank (where the number of customers is measured in millions and the size of the database measured in gigabytes) to replace information on all their customers: names, addresses, telephone numbers, current balance, account history, overdraft limit etc.

A valuable resource such as a corporate database has to be managed effectively on a day to day basis and over the longer term.

At the day to day level the corporate database has to:

  • perform at acceptable levels in supporting the required processing,
  • make reasonably efficient use of disk space and be flexible enough to accommodate changes in storage mechanisms,
  • cope with changing data structure and/or processing requirements.

At the abstract level, in the longer term, the corporate database has to:

  • accurately reflect the organisation’s view of the real world and
  • be flexible enough to change as business requirements change.

Looking to the future, the corporate database environment is becoming more and more technically complex, in order to support end users who are increasingly sophisticated both in their understanding of the technology and in terms of their requirements. Users expect, from their desktops, to be able to use a friendly (icon or object based) interface to access whatever corporate data they require from many different locations, on lots of different platforms, held in different database management systems or indeed in spreadsheets. They expect to be able to manipulate the data locally using their own particular tools and then transmit updates to other locations in a network, client/server or distributed environment.

The challenge for the database manager is to retain control over data supporting core business activities and to ensure that business rules are consistently applied when this core data is accessed and manipulated. The challenge obviously becomes more difficult in a rapidly changing technical environment.


What Activities are Involved in Data Management?

Typically data management activities are divided into two areas, i.e. ‘data administration’ and ‘database administration’.

Data Administration

Data Administration is concerned with the management of an organisation’s meta-data (i.e. data about data and processes) rather than the physical information held on storage devices. In other words it deals with the logical or business view of data. Data administration involves the following activities:-

  • Offering consultancy on all aspects of meta-data
  • Promoting corporate awareness of the data available (most people have a very narrow view of data relating it only to the particular requirement of their job)
  • Identification of corporate data requirements
  • Data analysis and definition
  • Data dictionary control and monitoring
  • Defining data privacy policy (who has access to what?)
  • Defining security and recovery policy
  • Defining standards for physical data models (diagrams which show the physical structure of a database)

Database Administration

Database administration is primarily concerned with the technical integrity of the physical database environment, the day to day operations of an organisation’s databases and the policies governing their everyday use. The following activities are involved:-

  • Control of the database environment, including carrying out running repairs in the event of failure
  • Database Design
  • Implementation of database, security, recovery and integrity checking mechanisms
  • Technical standards and guidelines
  • Impact assessment
  • Monitoring data usage
  • Training for the database environment


Tutorial Sheet Data Management

Objectives

To ensure that you understand the terms ‘database approach’, ‘data management’, ‘data administration’ and ‘database administration’, and understand some of the problems involved in implementing them.

Question 1

Why is data so valuable, why does it need to be professionally managed and what steps can organisations take to protect their investments?

Question 2

How can an organisation using database technology not be adopting the database approach?

Question 3

How could adopting a data architecture affect the timescales of a project? Can a data architecture approach be justified in an environment where everybody wants everything yesterday?

Question 4

Consider the following tasks and discuss whether they are data administration activities or database administration activities:-

  1. Changing the definition of an entity in a data dictionary.
  2. Copying a database onto tape and taking the tapes to an off-site safe.
  3. Deciding how often to take the tapes off-site.
  4. Restructuring a database.
  5. Interviewing the director of marketing
  6. Patching a database at 2 am.
  7. Deciding whether the director of finance should be allowed to access individual salary fields.
  8. Giving the finance director access to salary fields.
  9. Deciding whether to build an index for performance reasons.


Stable, Common Data Structures

The Data Structures which are required to support the operations of an enterprise are relatively stable over time. This philosophical belief developed by Martin (1976) underpins the Information Engineering methodology (Martin, 1989).

The Information Engineering Methodology (IE, Martin, 1989) differs from other structured methodologies in that its scope is very broad (the whole organisation and all its information systems) and deep (the methodology extends from strategic information systems planning through to implementation and maintenance). IE is also unusual because of its emphasis on the importance of structural data requirements as opposed to processing requirements.

IE consists of 4 phases :-

  • Information Strategy Planning: This process is carried out at the corporate level and results in the production of a corporate information architecture, a cross mapping of the key corporate entity types and the business functions which access them.
  • Business Area Analysis: During this phase the business areas identified in the information strategy plan are analysed in detail using data and functional analysis.
  • System Planning and Design: This phase is divided into business system design and technical design.
  • Construction and Cutover: This phase covers the construction of software, implementation and maintenance.

In 1989 a large organisation (Organisation A) adopted the Information Strategy Planning phase of the IE Methodology and built a Corporate Data Architecture (CDA) to provide a framework for the development of corporate systems. The architecture consisted of a very high level Entity-Relationship Model. In 1991 a similar organisation (Organisation B) produced a CDA using the same approach. Superficially the CDAs appeared dissimilar. However on closer examination many of the data structures involved were common to both organisations and in many cases the apparent differences concerned the terminology not the underlying meaning.

The CDA for Organisation A contained an entity called ‘Person_Or_Organisation’. The definition of the entity was "A Person or Organisation who has a relationship with Organisation A (past/present/potential). This entity also includes Organisation A itself". The CDA for Organisation B contained an entity called ‘Party’, which had the following sub-types: Individual Party and Organisational Party. The definition of the entity was "An identifiable body that may be of interest to the company".

There are many other examples to be found within the architectures. Clearly, even though the organisations are currently undergoing fundamental change, both internally and with regard to their respective markets, the underlying data structures required to hold information about the people or organisations involved in the business are largely stable. This would appear to be the case both within and across organisations.

The architectures also include matrix analyses of business processes against entities; both sets of matrices indicated that the fundamental data structures are shared by many processes from different functional areas of the businesses.


Corporate Information Management

The paper has indicated that the information systems supporting core business activities such as Engineering, Marketing, Customer Billing, Payroll and Stock Control have largely stable, common data structures. Unfortunately many organisations in the past developed discrete systems to support these areas, each with their own sets of programs and files or databases. The result has been considerable overlap and duplication of data with all the resultant problems (Howe, 1984). A common example of corporate data concerns people and places, e.g. the names and addresses of customers, shareholders, employees.

Discrete Application Systems

Adopting this approach may cause problems in the not unlikely scenario of a single person being a customer, shareholder and employee of the organisation. Logically the person has one name and address. However the name and address is recorded in three places. Which is correct? What happens if the person changes their name or address? slide 1

Corporate Database

Adopting the corporate data approach enables a person’s name to be held in a single central location. The specific details of a person as a customer are held separately. slide 2

The example chosen is a gross simplification of a complex data modelling area, however it serves to illustrate the concept of ‘Corporate Information Management’. A Corporate Information Management Strategy (CIMS) would imply that the data structures required to support an information system be designed not simply to support the immediate functional requirements of the system in question but also to support the overall aims and objectives of the organisation.

Kent (1978) argues that the data model produced can only ever be a model not the model. However in my view the important thing is that the model is effective in supporting the organisation and that common understanding and agreement is reached.

The approach brings numerous long term benefits and not simply concerning the quality of the data. Adopting the first approach implies that three separate programs are required and that each of these programs will have a section of code to deal with names and addresses. Under the corporate data approach a single names and addresses module would be called from the customer, shareholder and employee programs. The sum of many seemingly small gains could result in a drastic reduction in system maintenance. (It may be noted that this approach dovetails neatly with the ‘gen-spec’ and ‘inheritance’ concepts from the field of object orientation, the ‘customer object’ could inherit name and address data from the ‘person’ object.)
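
A minimal sketch of the corporate approach in relational terms, using assumed table names, holds the common name and address data once and lets the role-specific tables reference it:

Create table Persons
(Person_Number Char(8) primary key,
Surname Char(30),
Initials Char(6),
Address_Line1 Char(30),
Postal_Town Char(30),
Postcode Char(10));

Create table Customer_Details
(Person_Number Char(8) primary key references Persons(Person_Number),
Credit_Limit Number(8,2));

Create table Shareholder_Details
(Person_Number Char(8) primary key references Persons(Person_Number),
Number_Of_Shares Number(8));

Create table Employee_Details
(Person_Number Char(8) primary key references Persons(Person_Number),
Salary Number(8,2));

A change of address is then a single update to Persons and is immediately reflected wherever that person appears as a customer, shareholder or employee; the sub-type tables mirror the ‘gen-spec’ structure mentioned above.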


Database Technology

A Brief History

Machine Code

In the earliest days of computing, programming was effectively carried out with a soldering iron. Problems in this area led to the development of the first programming ‘language’ - machine code (1st generation language). A machine code program consisted of a sequence of binary instructions which the computer would process. At this time there was no real division of labour; a small number of people with great expertise acted as analysts, designers, programmers and operators.

Assembler

The next step was the development of assembler (2nd generation language). An assembler program consisted of a series of mnemonic instructions which were translated/compiled into machine code by the computer. Assembler, though complex, was obviously easier to learn and use than machine code, which reduced the reliance on technical experts. Assembler is still widely used in real time and safety critical systems and in games programming; however it would be unusual to come across assembler in a business information system.

3rd Generation Languages

The real leap forward in terms of programming languages was the advent of 3rd generation languages (3GLs), the most common of which is COBOL (Common Business Oriented Language). The instructions in a COBOL program consist of English-like statements such as ‘open customer-file’, ‘read customer-file’, ‘add bill-amount to outstanding-balance’. During the 70s large organisations developed complex information systems for billing, payroll and stock control using the COBOL programming language and a combination of sequential, indexed and random access files; some of these systems are in fact still in existence.

The Database Approach & 3rd Generation Database Systems

As described in section 6, historically the development of information systems involved the use of stand alone systems each with their own set of master files. To overcome the problems (waste of space, inconsistency, duplication of code and very importantly lost opportunities to share data) associated with application centred development approaches the hierarchic and network database management systems were developed.

The mid to late 70’s saw the development of the hierarchic and network database systems (3rd generation database systems). Through the 80’s many organisations converted their existing (file based) information systems to use either hierarchic or network database systems.

These systems allowed files to be linked and shared between applications; they provided code to maintain the integrity of data and common I/O routines. This approach took a considerable weight from the shoulders of programmers, who no longer had to concern themselves with the intricate details of how data was physically stored, only with its logical structure; the physical complexity was managed by the DBMS. The Hierarchic and Network systems continue to provide an established technology with which to maintain corporate data and are still in common use.

Relational Database Systems (The 4th Generation)

The Hierarchic and Network systems, although more flexible than traditional file systems, are still not flexible enough. The limitations of these so called 3rd generation database systems led to the development of 4th generation or Relational Database Management Systems. The mid to late 80’s through to the present day has seen the increasing realisation of large scale main-frame relational database systems (although the technology has been in existence for much longer). Relational Database Management Systems (RDBMS) are 4th generation database systems. The key difference between 4th and 3rd generation database systems is in terms of flexibility. It is relatively easy and quick to create new database structures and amend existing structures in relational systems, whereas to restructure a network database is a complex and time consuming task.

An integral part of a relational database system is the language SQL (Structured Query Language). SQL is the 4th generation language (4GL). The key difference between 4th and 3rd generation languages is database navigation. In a language like COBOL the programmer has to specify in detail the database navigation required to produce the desired results. Using SQL the developer simply has to specify what information is required, e.g. ‘Select Customer_Name from Customers;’. The RDBMS itself then decides what navigation is required.
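
As an illustrative sketch (assuming Customers and Orders tables like those used later in this chapter), a two table query simply states the required result and leaves the navigation to the RDBMS:

Select Customers.Customer_Name, Orders.Order_Date
From Customers, Orders
Where Customers.Customer_Number = Orders.Customer_Number;

A COBOL program producing the same listing would have to open both files, read each customer and then explicitly locate and read that customer's order records.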

Summary

The current systems development environment, particularly in large organisations which have adopted the database approach but still have legacy systems to maintain and end-user applications to enable, is complex, with a wide variety of different hardware and software platforms.

It can be seen that data/database administration in this environment is a challenging task, not simply because of the complex technical environment but also because of the inherent ‘people’ problems associated with the adoption of a corporate database approach.

The task is not likely to become any easier;

The technical environment may become more complex with the increasing use of Object Oriented Database Management Systems (OODBMS),

The current political climate advocates short term gain, whereas the corporate database approach is a strategy for long term benefit.

Contents Page

Home Page

The Relational Model

Introduction

The first ideas on using a ‘Relational’ model of data were developed by Ted Codd, while working for the IBM Research Laboratory in San Jose, California. Ted Codd published the seminal work on relational systems in an article entitled ‘A Relational Model of Data for Large Shared Data Banks’ in the Communications of the ACM (Volume 13 / Number 6 / June 1970). The article provides a mathematical underpinning for relational systems based on set theory. Since the publication of this article relational theory has been refined and extended by experts in the field.

The main elements of Relational Theory are the following;

  • Ted Codd’s 12 rules for relational systems
  • Relational Integrity
  • Normalisation

In this document I do not intend to describe the theoretical basis of the relational model, but to describe in practical terms how the body of theory affects data analysis and database design.

The three fundamentals of relational theory are

  • That all information is held in table structures and that unlike hierarchic and network systems all relationships among data are described using data values, i.e. foreign keys, rather than physical pointers.
  • That each value in a column in a table is atomic, i.e. does not contain a repeating group or any directly accessible subsidiary elements.
  • The use of SQL

With these basics in mind we now go on to discuss Ted Codd’s 12 rules for relational systems.

The Relational Rules

The following discussion is based on the findings of an experienced data manager at a large UK organisation. Please remember that the fact that an advertised system fails to comply with one of the following rules doesn’t mean it’s necessarily bad. In fact Ted Codd set a very difficult task which, to date, to my knowledge, has not been met. There is more to life (in the world of databases) than fidelity to these rules, such as performance.

Rule 0

‘For any system that is advertised as, or is claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities.’

In practice this means that all data manipulation must be achievable through SQL. Early versions of Paradox, i.e. those prior to the addition of the SQL interface, though advertised as relational, were not truly relational.

Rule 1 The Information Rule

‘All information in a relational database is represented explicitly at the logical level in exactly one way - values in tables.’

The table is the only available structure for storing data in a relational system.

Rule 2 Guaranteed Access Rule

‘Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table-name, primary key value and column name’.

The common Customer/Order scenario could be implemented in Cobol, or indeed in a network DBMS as a single record type as follows

01 CUSTOMER
    03 Customer-Number
    03 Customer-Name
    03 Customer-Address
      05 Street-Address
      05 District
      05 Postal-Town
      05 County
      05 Postcode
    03 ORDER Occurs 10 Times
      05 Order-Number
      05 Order-Date
      05 ORDER-LINE Occurs 15 Times
        07 Quantity
        07 Part-Number
        07 Part-Description
        07 Part-Price
        07 Supplier-Number
        07 Supplier-Name.

What are the problems with this approach regarding the following type of operation and how does the relational model overcome them ?

  1. Insertion
  2. Update
  3. Deletion

The distinguishing features and benefits of the relational model stem from the first three rules;

  1. Simplicity; The simple tabular representation of data allows easier understanding and communication.
  2. Flexibility; Navigation using foreign keys does not have to be predefined.
  3. Productivity; In general SQL is easier to learn and use than a 3GL such as Cobol, since it is possible to define what you want without specifying how to get it.

Rule 3 Systematic Treatment of Null Values

‘Null values are supported in fully relational systems for representing missing information and inapplicable information in a systematic way, independent of the data type.’

In simple terms the RDBMS should be able to distinguish between 0 and null, and between ‘ ‘ and null. Furthermore the RDBMS should allow columns to be defined as ‘Not Null’.
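
A minimal sketch of how this appears in practice (the table and column names are illustrative): mandatory columns are declared Not Null when the table is created, and missing values are tested with ‘is null’ rather than being compared with 0 or with an empty string.

Create Table Customers (Customer_Number Number(6) Not Null,
Customer_Name Char(30) Not Null,
Credit_Limit Number(8,2));

Select Customer_Name From Customers Where Credit_Limit is null;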

Rule 4 Dynamic On-Line Catalogue Based on the Relational Model

‘The Database Description is represented at the logical level in the same way as ordinary data, so that authorised users can apply the same relational language to its interrogation as they apply to regular data.’

This approach enables the use of SQL statements to access database descriptions, i.e. the ‘DESC table’ command enables the user to access tables containing the description of a table, in fact the RDBMS converts DESC into a series of SQL statements. This is one of the fundamental reasons why RDBMS are more flexible than network systems. In network systems the database description has to be predefined and compiled, in relational systems each time a command like ‘Select * From Customers’ is issued the RDBMS accesses the on-line catalogue to establish what ‘*’ means.
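
As an illustration, in ORACLE the catalogue can itself be queried with ordinary Select statements; the sketch below uses ORACLE's data dictionary view USER_TAB_COLUMNS (shown purely as an example of a catalogue table) to list the columns of the Customers table, much as DESC does:

Select Column_Name, Data_Type
From User_Tab_Columns
Where Table_Name = 'CUSTOMERS';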

Rule 5 Comprehensive Data Sub-language Rule

‘A relational system may support several languages and various modes of use. However there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items;

  • Data Definition, e.g. Create Table,
  • View Definition, e.g. Create View CUSTORD as Select etc.,
  • Data Manipulation, e.g. Select,
  • Integrity Constraints (see next section)
  • Authorisation, e.g. Grant Select on Customers to TDHUTCHI.
  • Transaction Boundaries, (Commit and Rollback)’

This means that a single language (SQL) must be available for all the above purposes. This in effect is the same as Rule 0.
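
A minimal sketch of a single session in which the one language covers several of the items listed above (the table, view and user names follow the examples in the list and are purely illustrative):

Create Table Customers (Customer_Number Number(6) not null, Customer_Name Char(30));
Create View Cust_Names as Select Customer_Name from Customers;
Insert into Customers values (1, 'Tim');
Grant Select on Customers to TDHUTCHI;
Commit;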

Rule 6 View Updating Rule

‘All views that are theoretically updatable are also updatable by the system’

This is best explained using an example;

Consider the following SQL;

Create Table Employees (Employee_Number Number(6) not null,

Employee_Name Char(30) not null,

Employee_Salary Number(10,2));

Create View Emp_Sal

as Select Employee_Name, Employee_Salary from Employees;

Create View Emp_Name

as Select Employee_Number, Employee_Name from Employees;

Which of the views, Emp_Sal or Emp_Name could be used to insert a new Employee row ?

Rule 7 High Level Insert, Update and Delete

‘The capability of handling a base table or a view using set at a time operation applies not only to the retrieval of data but also to the insertion, update and deletion of data.’

The Select statement provides set at a time operation, i.e. The command ‘Select * from Customers’ will retrieve all customers without the need to code an iteration, in other words a single statement can access a set of records. Rule 7 says that the facility for set at a time operation should also apply to insert, update and delete statements.
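
For instance, a single Update or Delete statement can act on a whole set of rows without any explicit loop; a sketch using the Parts table introduced later in this chapter (the selection criteria are invented for illustration):

Update Parts
Set Part_Price = Part_Price * 1.1
Where Part_Price < 5.00;

Delete From Parts
Where Part_Price > 100.00;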

Rule 8 Physical Data Independence

‘Application programs should remain unimpaired whenever any changes are made in either storage representation or access methods.’

In other words physical changes should not affect the logical function of any programs. Thus a DBA should be able to move a physical data file from one disk to another or decide to index a column or drop an index without any coding changes being necessary.
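
For instance, a DBA might add or drop an index purely to tune performance; in a sketch such as the following (assuming the Customers table used elsewhere in this chapter, with an invented index name), existing Select statements continue to work unchanged:

Create Index Cust_Name_Idx on Customers (Customer_Name);

Drop Index Cust_Name_Idx;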

ORACLE6 falls down somewhat in this respect. What would happen to a program used to create new customers if the unique index on Customer_Number was dropped ?

Rule 9 Logical Data Independence

‘Application programs should remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to base tables.’

Suppose that a single Employee table had to be split into two separate tables: one for sales employees who earn commission and another for other employees. Consider the following SQL statements;

Create table Employees (Employee_Number Number(6) not null,

Employee_Name Char(30) not null,

Commission Number(8,2));

(*1) Select Employee_Number, Employee_Name From Employees;

Create Table Sales_Employees

as Select * From Employees where Commission is not null;

Create Table Other_Employees

as Select Employee_Number, Employee_Name from Employees where commission is null;

Drop Table Employees;

Create View Employees as

Select Employee_Number, Employee_Name, Commission from Sales_Employees

UNION

Select Employee_Number, Employee_Name, NULL from Other_Employees;

(*2) Select Employee_Number, Employee_Name from Employees;

Should the select statements numbered *1 and *2 work in the same way ?

Rule 10 Integrity Independence

‘Integrity Constraints specific to a particular relational database must be definable in the relational data sub-language and storable in the catalogue, not in application programs.’

More on relational integrity in the next section.

Rule 11 Distribution Independence

‘Application programs should not be logically impaired when changes are made to the geographical location of data.’

This is simply an extension of rule 9 regarding physical data independence. Put simply your SQL statements should not be affected if your company moves the employee table from Cardiff to Newport.

Rule 12 The Non-Subversion Rule

‘If a relational system has a low-level (row at a time) language (ORACLE has a language called PL/SQL), the language should not be able to subvert or bypass any rules specified for the high level language.’

If a low level language such as PL/SQL allowed you to create tables with repeating groups then the system itself would not be truly relational.

Relational Integrity

The Relational Model has three integrity rules;

  1. Entity Integrity; No attribute participating in the primary key of a table is allowed to accept null values. A primary key value that was null would be a contradiction in terms, in effect saying that some entity had no identity, i.e. it did not exist. In addition the primary key must be unique, otherwise how could an individual row be accessed?
  2. Referential Integrity; If a table contains a foreign key from another table, there are two possibilities for the values in this foreign key; if the member takes part in a mandatory relationship with the owner table then the foreign key must be equal to a primary key in the owner table; if the table takes part in an optional relationship with the owner then the foreign key can either take the value of a primary key in the owner table or the value null. (This emphasises the importance of the systematic treatment of null values.)
  3. Other Integrity Rules; The Catalogue should enable the definition of business rules at table creation, e.g. default values and ranges.

ORACLE (at version 6) falls down with each of these rules;

  • Entity Integrity is partially catered for by allowing the DBA to specify not null for primary keys, but uniqueness requires the creation of an index.
  • Referential Integrity is partially catered for by allowing the DBA to specify not null for foreign keys.
  • Other Integrity isn’t catered for at all, default values and ranges have to be specified in programs.

ORACLE version 7 is much better so far as relational integrity is concerned in that it allows you to specify primary and foreign keys at table creation, e.g.

Create Table Orders (Order_Number Number(6) Primary Key,

Customer_Number Number(6)

References CUSTOMERS(Customer_Number),

Order_Date Date);
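
Version 7 also allows default values and simple range rules to be declared at table creation, addressing the ‘other integrity’ category described above; a minimal sketch (the column definitions are illustrative):

Create Table Parts (Part_Number Number(6) Primary Key,
Part_Description Char(30) Not Null,
Part_Price Number(8,2) Default 0 Check (Part_Price >= 0));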

Normalisation

Normalisation is defined briefly but accurately in the following statement;

‘The Key, the Whole Key and Nothing but the Key’

Typically the literature on normalisation covers many levels of normalisation (9 is not uncommon), but this seems to me to be a race amongst academics to identify as many levels as possible; in 99 cases out of 100, 3 levels of normalisation are all that is required.

1st Normal Form; converting an un-normalised data structure such as a report or an order form into 1st Normal Form (1NF) is commonly referred to as removing repeating groups but also may involve removing complex groups such as the Address Group described in rule 2. The aim is to ensure that each item is atomic.

2nd Normal Form; Converting a 1NF data structure into 2nd Normal Form (2NF) involves looking at each non-primary key attribute and ensuring that it depends on the whole of the key and not just part of it.

3rd Normal Form; Converting a 2NF data structure into 3rd Normal Form (3NF) involves looking at the interrelationships between non key attributes to see if any non key attributes depend only on each other.

This is all best described by looking at an example. Consider the following table which has been built up by an order entry clerk;

Cust#  Name  Ord#  Date  Part#  Desc  Qty  Price  Supp#  Supp_Name
1      Tim   123   20/3  1      AA    2    1.99   23     ABC
                         2      BB    3    2.99   23     ABC
                         3      CC    4    3.99   24     DEF
             456   21/3  4      DD    5    4.99   25     GHI
                         5      EE    6    5.99   26     JKL
2      John  789   21/3  4      DD    7    3.99   25     GHI
                         6      FF    8    6.99   27     MNO

This table structure could be implemented quite easily in Cobol or in a network DBMS as shown in the example under rule 2, with all the associated problems.

A common representation of this kind of table in textbooks is as follows;

CUSTOMERS(Customer_Number, Customer_Name, (Order_Number, Order_Date, (Part_Number, Part_Description, Part_Quantity, Part_Price, Supplier_Number, Supplier_Name)))

The internal brackets are meant to represent repeating groups and the underline represents a primary key. This is called an un-normalised or 0NF data structure.

There are two approaches to converting this 0NF structure to 1NF; the first involves replicating the values in the table as follows;

Cust#  Name  Ord#  Date  Part#  Desc  Qty  Price  Supp#  Supp_Name
1      Tim   123   20/3  1      AA    2    1.99   23     ABC
1      Tim   123   20/3  2      BB    3    2.99   23     ABC
1      Tim   123   20/3  3      CC    4    3.99   24     DEF
1      Tim   456   21/3  4      DD    5    4.99   25     GHI
1      Tim   456   21/3  5      EE    6    5.99   26     JKL
2      John  789   21/3  4      DD    7    3.99   25     GHI
2      John  789   21/3  6      FF    8    6.99   27     MNO

However this seems to be a clumsy approach and results in a three part key consisting of Cust#, Ord# and Part#. A simpler approach is to separate the repeating groups out into separate tables.

Step 1 remove the repeating group of orders

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date, (Part_Number, Part_Description, Part_Quantity, Part_Price, Supplier_Number, Supplier_Name))

Step 2 remove the repeating group of parts

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date)

ORDER_PARTS(Part_Number, Order_Number*, Part_Description, Part_Quantity, Part_Price, Supplier_Number, Supplier_Name)

The structure is now in 1NF since there are no repeating or complex group items (each item depends on the key). The next step is to convert the structure into 2NF, by examining each non primary key attribute to ensure that each depends on the whole of the key.

The CUSTOMERS and ORDERS tables each have a single column making up their primary key and are therefore by definition in 2NF. However looking at the ORDER_PARTS table it can be seen that Part_Description, Part_Price, Supplier_Number and Supplier_Name only depend on Part_Number, i.e. their values are the same regardless of Order_Number. (Part_Quantity depends on the whole of the key since different quantities can appear on different orders.) To convert to 2NF a separate table is created for part descriptions, prices and supplier details.

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date)

ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)

PARTS(Part_Number, Part_Description, Part_Price, Supplier_Number, Supplier_Name)

The structures are now in 2NF since every non-primary key attribute depends on the whole of the key. The next step is to convert the structure into 3NF by ensuring that each non-primary key attribute depends on nothing but the key.

The CUSTOMERS table is patently in 3NF because there is no non-primary key attribute for Customer_Name to depend on. The ORDERS table is in 3NF because there is no dependency between Order_Date and Customer_Number (a customer can place different orders on different dates). The ORDER_PARTS table is in 3NF because the quantity ordered is dependent on both the order number and the part number. Looking however at the PARTS table it can be seen that the Supplier_Name attribute depends on the Supplier_Number and has nothing to do with the part number. To convert the structure into 3NF a separate table is created containing supplier details.

CUSTOMERS(Customer_Number, Customer_Name)

ORDERS(Order_Number, Customer_Number*, Order_Date)

ORDER_PARTS(Part_Number, Order_Number*, Part_Quantity)

PARTS(Part_Number, Supplier_Number*, Part_Description, Part_Price)

SUPPLIERS(Supplier_Number, Supplier_Name)
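
The 3NF structures above translate directly into Create Table statements; the following sketch uses ORACLE 7 style primary and foreign key clauses (the data types are assumptions chosen for illustration):

Create Table Customers (Customer_Number Number(6) Primary Key,
Customer_Name Char(30) Not Null);

Create Table Orders (Order_Number Number(6) Primary Key,
Customer_Number Number(6) References Customers(Customer_Number),
Order_Date Date);

Create Table Suppliers (Supplier_Number Number(6) Primary Key,
Supplier_Name Char(30));

Create Table Parts (Part_Number Number(6) Primary Key,
Supplier_Number Number(6) References Suppliers(Supplier_Number),
Part_Description Char(30),
Part_Price Number(8,2));

Create Table Order_Parts (Part_Number Number(6) References Parts(Part_Number),
Order_Number Number(6) References Orders(Order_Number),
Part_Quantity Number(3),
Primary Key (Part_Number, Order_Number));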

Introduction to SQL

In this section we discuss what has become the de facto standard language for accessing relational database systems, i.e. SQL (Structured Query Language - pronounced ‘sequel’ or spelt out as S-Q-L). It should be remembered that packages such as Paradox and Access have only recently had SQL interfaces added to them.

The SQL language can be broken down into three types of statement (an example of each is sketched after the list) :-

  • Environment Statements; These control the run time environment, things like pagesize;
  • DDL Statements; These Data Description Language statements enable the user to create and restructure tables. These statements are largely unnecessary in Paradox since it provides a tool for creating tables;
  • DML Statements; the Data Manipulation Language statements allow the user to physically insert data into tables, to update this data, to delete data from tables and to run queries against these tables.
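
A minimal sketch showing one statement of each type: an environment statement (here a SQL*Plus page setting), a DDL statement and a DML statement (the table name is invented for illustration).

Set Pagesize 24

Create Table Demo (Demo_Number Number(6));

Insert into Demo values (1);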

In this section we will concentrate on DML and in particular the ‘Select’ statement.

Consider the following tables :-

CUSTOMERS (Customer_Number, Customer_Name)

1 Tim

2 John

3 Anna

ORDERS (Order_Number, Customer_Number*, Order_Date)

1 1 17/10/95

2 1 18/10/95

3 2 19/10/95

ORDER_LINES (Part_Number, Order_Number*, Part_Quantity)

1 1 30

1 2 20

2 1 10

3 1 20

3 2 40

4 3 10

PARTS (Part_Number, Part_Description, Part_Price)

1 aaa 1.99

2 bbb 2.99

3 ccc 3.99

4 ddd 4.99

5 eee 5.99

The DML part of the SQL language consists of 4 statements, i.e. Insert, Update, Delete & Select.

The insert statement is used to create new rows in tables;

INSERT INTO PARTS (PART_NUMBER,PART_DESCRIPTION,PART_PRICE)

VALUES (6,’fff’,6.99);

The update statement is used to change the information in existing rows in a table;

UPDATE PARTS

SET PART_DESCRIPTION = ‘xyz’

WHERE PART_NUMBER = 1;

The delete statement is used to remove information from tables;

DELETE FROM PARTS

WHERE PART_NUMBER = 5;

The select statement is used to display the information in tables;

SELECT PART_NUMBER,PART_DESCRIPTION

FROM PARTS

WHERE PART_NUMBER <> 1;

The following section describes the ‘Select’ statement in question and answer format;

How can I display all the columns in all the rows of the CUSTOMERS table ?

SELECT * FROM CUSTOMERS;

(* could be replaced by a list of all the columns in the customers table)

How can I display all the columns in some of the rows in the CUSTOMERS table ?

SELECT * FROM CUSTOMERS

WHERE (CUSTOMER_NUMBER < 3);

(the where clause can use equality, inequality and other features like > and <)

How can I display some of the columns from all of the rows in the PARTS table ?

SELECT PART_NUMBER, PART_PRICE FROM PARTS;

(the list of columns doesn’t have to include all the columns in the table)

How can I display some of the columns from some of the rows in the PARTS table and put the results into a particular order ?

SELECT PART_NUMBER,PART_PRICE

FROM PARTS

WHERE PART_PRICE > 2.99

ORDER BY PART_PRICE DESC;

(the selection criteria doesn’t have to be based on the primary key)

What happens if I want to display information from two tables, e.g. display a list of customers together with the dates on which they have placed orders ?

SELECT CUSTOMERS.CUSTOMER_NUMBER, CUSTOMERS.CUSTOMER_NAME, ORDERS.ORDER_DATE

FROM CUSTOMERS,ORDERS

WHERE (CUSTOMERS.CUSTOMER_NUMBER =

ORDERS.CUSTOMER_NUMBER)

(In this example two tables have been joined together using the foreign key)

What happens if I want to display information from three tables, e.g. display a list of all orders, together with their Order_Lines including the part description, placed on a particular date ?

SELECT ORDERS.ORDER_NUMBER, ORDERS.CUSTOMER_NUMBER, ORDERS.ORDER_DATE, ORDER_LINES.PART_NUMBER,

PARTS.PART_DESCRIPTION

FROM ORDERS,ORDER_LINES,PARTS

WHERE ((ORDERS.ORDER_NUMBER =

ORDER_LINES.ORDER_NUMBER)

AND (ORDER_LINES.PART_NUMBER =

PARTS.PART_NUMBER))

AND (ORDERS.ORDER_DATE = ‘17/10/95’);

(In this example the ORDERS table has been JOINed to the ORDER_LINES table and the ORDER_LINES table has been JOINed to the PARTS table)

What happens if the results of one query depend on the results of another, e.g. display a list of all customers who haven’t placed orders ?

SELECT *

FROM CUSTOMERS

WHERE CUSTOMER_NUMBER NOT IN

(SELECT DISTINCT CUSTOMER_NUMBER

FROM ORDERS);

(In this example the results of the outer query depend on the results of the inner query; the other point to note is the use of the ‘distinct’ clause.)

What happens if summary results are required, e.g. How many customers are in the database ?

SELECT COUNT(*)

FROM CUSTOMERS;

(Running this query would simply display the answer 3)

What happens if information has to be displayed in groups, e.g. which parts appear on more than one order?

SELECT PARTS.PART_NUMBER, PARTS.PART_DESCRIPTION

FROM PARTS,ORDER_LINES

WHERE (PARTS.PART_NUMBER = ORDER_LINES.PART_NUMBER)

GROUP BY PARTS.PART_NUMBER, PARTS.PART_DESCRIPTION

HAVING COUNT(*) > 1;

(This example joins the parts and order_lines tables, groups the result on the part_number and only displays those groups where there is more than one row)

Display the most expensive part.

SELECT PART_NUMBER, PART_PRICE

FROM PARTS

WHERE PART_PRICE =

(SELECT MAX(PART_PRICE)

FROM PARTS);

What is the average part price?

SELECT AVG(PART_PRICE)

FROM PARTS;

What if the results have to contain calculations, e.g. display each order_line together with its value ?

SELECT ORDER_LINES.ORDER_NUMBER, ORDER_LINES.PART_NUMBER, ORDER_LINES.PART_QUANTITY,

ORDER_LINES.PART_QUANTITY*PARTS.PART_PRICE

FROM ORDER_LINES,PARTS

WHERE (ORDER_LINES.PART_NUMBER =

PARTS.PART_NUMBER);
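
One further example in the same style, combining a join, a calculation and grouping (a sketch using the tables above): how can I display the total value of each order ?

SELECT ORDERS.ORDER_NUMBER, SUM(ORDER_LINES.PART_QUANTITY * PARTS.PART_PRICE)
FROM ORDERS, ORDER_LINES, PARTS
WHERE (ORDERS.ORDER_NUMBER = ORDER_LINES.ORDER_NUMBER)
AND (ORDER_LINES.PART_NUMBER = PARTS.PART_NUMBER)
GROUP BY ORDERS.ORDER_NUMBER;

(This example joins the three tables, multiplies each line's quantity by its price and sums the results for each order)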

Contents Page

Home Page

Hierarchic and Network Database Systems

In this module we will be discussing the Relational Model in some detail; however, a brief summary of the features of the Hierarchic and Network Systems follows.

Hierarchic Database Systems

Hierarchic Database Systems (HDS), e.g. IMS (Information Management System) produced by IBM, enable tree structured relationships to be implemented efficiently using indexes and pointers (these concepts will be discussed in detail in the next section on Network Database Systems since the underlying technology is the same). A brief definition of a tree structure is that each child node only has one parent node and at the top of the tree is a single parentless node.

A HDS could be used to implement a customers, orders and order lines structure; however, the introduction of parts causes a problem: the model is no longer a tree structure, since the order lines entity has two owners. This common structure is difficult to implement efficiently using a HDS and typically extra indexing or data duplication is required.

Network Database Systems

Network Database Systems (NDS) are more flexible and enable ‘network’ structures to be implemented efficiently. A brief definition of a network structure is that child nodes may have any number of parent nodes. We have already seen that LDSs are network structures and that these network structures can be implemented in an RDBMS such as ORACLE, using appropriate tables and foreign keys.

In the Relational Model the underlying implementation structure is the table. Each entity type in an LDS becomes a table in the RDBMS. The relationships between entities are represented implicitly in tables using a system of foreign keys.

In the Network Model the underlying implementation structures are Record Types and Set Types, typically each entity type becomes a record type and each relationship type becomes a set type.

Network Schemas

Prior to using a NDS each record type and set type has to be defined in a SCHEMA. A schema is a logical definition of the record and set types which comprise a NDS. The first step in the creation of a NDS is the production of a schema. An example schema for the well-known Customers, Orders, Order_Lines and Parts scenario follows.

Schema Name is ORDERS
Record Name is CUSTOMER
01 CUSTOMER
03 Customer-Number Pic 9(8)
03 Customer-Name Pic X(30)
03 Address
05 Street-Address Pic X(30)
05 District Pic X(20)
05 Postal-Town Pic X(20)
05 County Pic X(20)
05 Postcode Pic X(9)
03 Telephone Pic X(20)
03 Credit-Limit Pic 9(6)V9(2)
03 Outstanding-Balance Pic S9(6)V9(2)
Record Name is ORDER-HEADER
01 ORDER-HEADER
03 Order-Number Pic 9(8)
03 Order-Date
05 Day Pic 9(2)
05 Month Pic 9(2)
05 Year Pic 9(2)
Record Name is PART
01 PART
03 Part-Number Pic 9(8)
03 Part-Description Pic X(30)
03 Quantity-In-Stock Pic 9(6)
03 Price Pic 9(6)V9(2)
Record Name is ORDER-LINE
01 ORDER-LINE
03 Quantity Pic 9(3)
Set Name is PLACES
Owner is CUSTOMER
Member is ORDER-HEADER
Insertion is AUTOMATIC
Retention is MANDATORY
Set Name is CONSISTS-OF
Owner is ORDER-HEADER
Member is ORDER-LINE
Insertion is AUTOMATIC
Retention is MANDATORY
Set Name is APPEARS-ON
Owner is PART
Member is ORDER-LINE
Insertion is AUTOMATIC
Retention is MANDATORY.

There are two points to note so far as the record definitions are concerned;

Firstly, group items such as address and date are allowed in a NDS (they aren’t in Relational Systems, where each item has to be atomic).

Secondly, in this example there are no foreign keys. In network systems foreign keys are not required since, as will be seen later, the network ‘set’ mechanism physically links each customer to all their orders.

In the set definition part of the schema the ‘owner is’ and ‘member is’ clauses are self explanatory, however some explanation of the ‘insertion’ and ‘retention’ clauses is required.

Insertion; The insertion clause defines what happens to the member record of a set when it is created. Insertion can be specified as Automatic, in which case the member is automatically connected to the current owner record of the set. The alternative is Manual, in which case it is the programmer’s responsibility to connect the member to the appropriate owner or possibly leave the record free standing.

Retention; The retention clause defines what happens to the member records of a set after initial storage. Retention can be specified as Fixed, in which case the member record must always remain connected to the same owner record. One alternative is Mandatory, in which case the member must remain connected to some owner record. The other alternative is Optional, in which case the member can be disconnected from one set occurrence and connected to another or again left free standing.

A schema has to be prepared (cf. compiled) before it can be used, to ensure that the record and set definitions are valid.

Network Storage Schema

The next step is to create a storage schema which defines how the record types and set types will be physically implemented. An example Storage Schema follows.

Storage Schema Name is SSORDERS
Schema name is ORDERS
Area Name is CUSTORD-AREA
Block Size is 11K
Page Range is 1 to 10,000
Area Name is PART-AREA
Block Size is 2K
Page Range is 10,001 to 12,000
Record Name is CUSTOMER
Area Name is CUSTORD-AREA
Page Range is 1 to 10,000
Location Mode
CALC Using CUSTOMER-NUMBER Duplicates Not Allowed
Set Pointers
Set Name is PLACES
FIRST in Position 1
LAST in Position 2
Record Name is ORDER-HEADER
Area Name is CUSTORD-AREA
Page Range is 1 to 10,000
Location Mode
VIA Set PLACES
Set Pointers
Set Name is PLACES
NEXT in Position 1
PRIOR in Position 2
OWNER in Position 3
Set Name is CONSISTS-OF
FIRST in Position 4
LAST in Position 5
Record Name is PART
Area Name is PART-AREA
Page Range is 10,001 to 12,000
Location Mode
CALC using PART-NUMBER Duplicates Not Allowed
Set Pointers
Set Name is APPEARS-ON
FIRST in Position 1
LAST in Position 2

Record Name is ORDER-LINE
Area Name is CUSTORD-AREA
Page Range is 1 to 10,000
Location Mode
VIA Set CONSISTS-OF
Set Pointers
Set Name is CONSISTS-OF
NEXT in Position 1
PRIOR in Position 2
Set Name is APPEARS-ON
NEXT in Position 3
PRIOR in Position 4
OWNER in Position 5
Set Name is PLACES
Order is SORTED
Key is ASCENDING ORDER-DATE
Set Pointers
NEXT, PRIOR, OWNER
Set Name is CONSISTS-OF
Order is NEXT
Set Pointers
NEXT, PRIOR
Set Name is APPEARS-ON
Order is NEXT
Set Pointers
NEXT, PRIOR, OWNER

The Storage Schema has to be prepared to check that the area, record and set definitions are correct.

There are a number of areas of storage schema definitions which require explanation;

  • Areas
  • Page Ranges
  • Location Mode
  • Pointers
  • Set Order

What is an Area?

An area is a physical division of a database which is used to hold records and sets of various types. Areas map onto physical files and can thus be used to spread a large database over many disks. An area is broken down into a number of pages, with each page mapping onto one block of a physical file.

How do Page Ranges Work?

As described above an area is used to store records and sets. When defining a record type in a storage schema it is necessary to specify which area and what page range within the area the records will be stored in.

How do Location Modes Work?

Two location modes appear in the example storage schema;

CALC; This is the equivalent of a random access file in Cobol. In outline, when a CALC record is to be stored an algorithm is used to calculate where the record should be stored. The algorithm takes as input the field defined as the CALC key (e.g. Customer-Number or Part-Number) and the page range specified for the record, and outputs a page number within that range. CALC provides fast access providing that the key is known. If the CALC key is provided to the NDS it passes through the same algorithm; the result is the page number which holds the record, which can be used to retrieve the correct block from disk in a single I/O operation.

VIA; Using VIA it is possible to store member records on the same page as their owner records, which again provides rapid access, since accessing a particular customer will in this case also retrieve all that customer’s Order-Headers and Order-Lines.

What are Pointers?

Some further explanation of how records are stored on pages is required before describing pointers. When a record is stored on a page it has a line number allocated to it. A record’s page number and line number are physically stored within the record itself (although they are not accessible by programmers). The concatenated page and line number is called a ‘database key’.

Owner records in sets have first and last pointers, pointing to the first and last members of the set. Member records must have a next pointer which points to the next member in the set and may have a prior pointer which points to the previous member in the set and/or an owner pointer which points to the owner of the set. In effect a set with next pointers is a singly linked list, a set with Next and Prior pointers is a doubly linked list and a set with Next, prior and Owner pointers is a doubly linked list with owner pointers.

We have still not answered the question of what a pointer is. A pointer is a database key which provides the location of the next, prior or owner record of the set. It can be seen that the first member record in a set will have identical values in its prior and owner pointers and that the last member in a set will have identical values in its owner and next pointers.

Set Order

The set order clause defines the order in which member records are inserted into sets. There are 5 possibilities;

  • Order is Next; In this case the member record is inserted after the current record, whether this is a member or the owner.
  • Order is Prior; In this case the member record is inserted prior to the current record.
  • Order is First; In this case the member is placed as the first member of the set regardless of which is the current record of the set.
  • Order is Last; In this case the member is placed as the last member of the set.
  • Order is Sorted; In this case the member is inserted into a sorted list according to the value of the data item defined in the storage schema.

Physical Database Creation

The next step is the creation and mapping of physical data files. On an ICL mainframe system running IDMS/X (Integrated Database Management System / Extra) this would be done as follows.

CRF(CUSTORDFILE, BLOCKSIZE=11K, NUMBEROFBLOCKS=10000, DISK=DISK1);

CRF(PARTFILE,BLOCKSIZE=2K,NUMBEROFBLOCKS=2000, DISK=DISK2);

SVF(CUSTORDFILE);

SVF(PARTFILE);

IDMSXFILES(STORAGESCHEMA=SSORDERS,

AREAS=CUSTORD-AREA & PART-AREA,

FILES=CUSTORDFILE&PARTFILE);

Network Subschemas

The final stage before the database can be used is the creation of subschemas, which are programmers’ views of the database. Suppose that in this environment various programs require read only access to the customer record and that the order processing program needs write access to all records. Two subschemas would be required.

Subschema Name is CUSTREAD-SUB
Storage Schema Name is SSORDERS
Areas
CUSTORD-AREA
Records
CUSTOMER READ-ONLY
Sets.

Subschema Name is ALLWRITE-SUB
Storage Schema name is SSORDERS
Areas
CUSTORD-AREA
PART-AREA
Records
CUSTOMER
ORDER-HEADER
ORDER-LINE
PART
Sets
PLACES
CONSISTS-OF
APPEARS-ON.

These subschemas have to be compiled before they can be used by programs (typically written in COBOL). In this case the CUSTREAD-SUB subschema would be held in a public object code library, whereas ALLWRITE-SUB would be held in a private object code library.

Example (skeleton) COBOL Program

Identification Division
Program-ID. Program1.
Subschema name is CUSTREAD-SUB
Environment Division
Data Division
COPYIDMS CUSTOMER
Procedure Division
Ready CUSTORD-AREA
Obtain First CUSTOMER
Perform Until DB-END-OF-AREA
Display CUSTOMER-NAME
Obtain Next CUSTOMER
End-Perform
Finish CUSTORD-AREA
Stop Run.

What does this program do?

Data Manipulation

In the example Cobol Program in the previous section we saw examples of 3 Data Manipulation Language (DML) statements: READY (this prepares a database area for use by a program, cf. ‘OPEN’), OBTAIN (this in effect is a read statement) and FINISH (this closes a database area, cf. ‘CLOSE’).

There are many other DML statements that can be embedded in Cobol programs, such as;

  • STORE; inserts a record into the database.
  • MODIFY; updates an existing record.
  • CONNECT; inserts a record into a set occurrence.
  • DISCONNECT; takes a record out of a set.

Contents Page

Home Page
