
Term Paper on the File System


Term Paper Contents:

  1. Term Paper on the Introduction to File System
  2. Term Paper on File Processing
  3. Term Paper on the Activeness of Files
  4. Term Paper on Processing Modes
  5. Term Paper on the Reconstruction of Data File
  6. Term Paper on File Organisation
  7. Term Paper on System Security
  8. Term Paper on System Audit
  9. Term Paper on Audit Trail
  10. Term Paper on Auditing Computer System

Term Paper # 1. Introduction to the File System:

In practical life, whenever we wish to keep some facts and figures for future use, we write them down on a piece of paper and preserve it. When the number of such papers, memos, notes, receipts, bills, etc., grows, we keep them in a systematic manner by grouping like items together and storing each group in a separate file with an appropriate name.

In a computer system, the basic concept of a file system is much the same. However, even if we want to keep a single character in storage, it has to be placed in a file with a unique name attached to it.


As similar items are grouped together, we have different types of files: program files containing readily executable programs, data files containing an organisation's data in various formats, and readable documents called text files. Generally, a data file is defined as a collection of records, or a unified set of stored data.

Based on the type of usage in business applications, we have different types of data files. A data file of a more or less permanent nature, containing all the necessary data for a specific application, is called a Master File; it is regularly updated so that it holds current data. There could be a master-file of employees, materials in stock, customers, etc.

The files created while recording transactions from period to period are called Transaction Files. Usually there are a number of transaction files in a business organisation, such as a purchase file, a sales file, etc., each recording the corresponding type of transaction.

The name Work File is used to denote a file created at some intermediate stage of a processing operation. Sometimes Temporary Files are created to hold data under processing temporarily; these are deleted after the final processing is over. A Scratch File is one which is no longer needed, containing out-dated data.



Term Paper # 2. File Processing:

The term file processing is loosely used to mean processing of the data stored in data files. Sorting means arranging the records in a file in ascending or descending sequence based on a particular field [data-item] holding unique data values, called the primary key. Sorting involves physical movement of records into a newly created file, and hence there must be enough storage space to hold a duplicate copy of the original file in sorted order.

Sorting does not preserve the order in which records were entered in the original file. It is regularly carried out with files stored on magnetic tape to prepare them for different kinds of processing. Sorting is a much slower process than indexing.
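To make the idea concrete, the following minimal Python sketch sorts the records of one file into a newly created file, ordered on the first field as the primary key. The comma-separated layout and the file names are assumptions made purely for illustration.

```python
# A minimal sketch of sorting: read all records, order them on the primary
# key, and write the ordered copy to a new file, leaving the original intact.

import csv

def sort_file(input_path, output_path, key_index=0):
    # Read every record into memory; a real tape sort would work in batches.
    with open(input_path, newline="") as src:
        records = list(csv.reader(src))

    # Arrange the records in ascending order of the chosen field (the key).
    records.sort(key=lambda record: record[key_index])

    # Write the sorted duplicate copy to a new file.
    with open(output_path, "w", newline="") as dst:
        csv.writer(dst).writerows(records)

if __name__ == "__main__":
    sort_file("employees.csv", "employees_sorted.csv")
```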

i. Matching:


Matching involves comparing the records of two data files on the basis of the data values, that is, the data elements in a specified field. It is generally carried out to determine how the files differ, whether some records are missing, etc. It is an essential prerequisite for the merging operation.
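A minimal sketch of matching is given below, assuming two comma-separated files whose first field is the common key; it simply reports the keys present in one file but missing from the other. The file names are illustrative.

```python
# Matching: compare the records of two files on a key field and report which
# keys are missing from either side.

import csv

def load_keys(path, key_index=0):
    with open(path, newline="") as f:
        return {row[key_index] for row in csv.reader(f)}

def match_files(path_a, path_b):
    keys_a, keys_b = load_keys(path_a), load_keys(path_b)
    print("Records missing from the second file:", sorted(keys_a - keys_b))
    print("Records missing from the first file:", sorted(keys_b - keys_a))

if __name__ == "__main__":
    match_files("master.csv", "transactions.csv")
```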

ii. Merging:

Merging is a processing operation required extensively with sequential files stored on magnetic tape. Since a block of data on a magnetic tape cannot be accurately overwritten by another set of data, that is, overlaying is not possible, the addition, deletion, or modification of data in such a file has to be carried out using two files: the original file which is to be modified, and a second file containing the records to be changed, sorted in the same sequence as the original file.

During the process of merging, records of the two files are matched and a third file is created using records of both, the changed records coming from the second file. The file so created contains all the up-to-date records and is called the “son” of the original file, which is the “father”.

[Figure: Process of Merging]
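The father-and-son idea can be sketched as below. For simplicity the two sorted files are read into memory rather than streamed record by record as a true tape merge would do; the file names and the key-in-first-column layout are assumptions for illustration only.

```python
# Merge an old (father) master file with a sorted change file to create a new
# (son) master file: changed records come from the change file, unchanged
# records are carried over from the father.

import csv

def merge_master(old_master, changes, new_master, key_index=0):
    with open(old_master, newline="") as f:
        father = list(csv.reader(f))
    with open(changes, newline="") as f:
        updates = {row[key_index]: row for row in csv.reader(f)}

    # carry over each father record, substituting a changed record if present
    son = [updates.pop(row[key_index], row) for row in father]
    son.extend(updates.values())              # records that are entirely new
    son.sort(key=lambda row: row[key_index])  # keep the sequential order

    with open(new_master, "w", newline="") as f:
        csv.writer(f).writerows(son)

if __name__ == "__main__":
    merge_master("master_old.csv", "changes.csv", "master_new.csv")
```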

iii. Summarising:

Summarising is the process of creating a summary of the data values of all records based on a common data element in a specific field. For example, a purchase file may contain a number of records for purchases of each item, such as chairs, tables, etc.

Summarising will create a file giving the total purchase for each item (chair, table, etc.), which involves totalling the amount field of all the chair records, say, and creating one record with the total value; the result contains one record per item in place of multiple records per item.
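A minimal sketch of summarising, assuming a purchase file with just two columns (item and amount), is given below; the field layout and file names are illustrative assumptions.

```python
# Summarising: total the amount field for every item and write one record per
# item in place of many detail records.

import csv
from collections import defaultdict

def summarise(purchases_path, summary_path):
    totals = defaultdict(float)
    with open(purchases_path, newline="") as f:
        for item, amount in csv.reader(f):    # e.g. "chair", "1200.00"
            totals[item] += float(amount)

    with open(summary_path, "w", newline="") as f:
        writer = csv.writer(f)
        for item in sorted(totals):
            writer.writerow([item, f"{totals[item]:.2f}"])

if __name__ == "__main__":
    summarise("purchases.csv", "purchase_summary.csv")
```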

iv. Searching:


Searching is the process of finding out whether a particular data element exists in a particular field. It is most effective when the records are arranged in sequence, either by sorting or by indexing, using the specified field as the key. If the same data element occurs more than once, searching stops at the first occurrence of the value.
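The following sketch shows a search over records already sorted on the key, stopping at the first occurrence of the value, as described above. The sample records are invented for illustration.

```python
# Searching a sorted list of records on a key field; the search halts at the
# first occurrence of the value being looked for.

from bisect import bisect_left

def search(records, key, key_index=0):
    keys = [record[key_index] for record in records]  # records must be sorted on this field
    pos = bisect_left(keys, key)
    if pos < len(records) and keys[pos] == key:
        return pos, records[pos]     # position and first matching record
    return None                      # the key does not exist in the file

if __name__ == "__main__":
    employees = [("E001", "Asha"), ("E002", "Bikram"),
                 ("E002", "Duplicate"), ("E005", "Chitra")]
    print(search(employees, "E002"))   # -> (1, ('E002', 'Bikram'))
```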

v. Updating:

Updating is the process of bringing the data values in a master-file up to date using the latest values from a current transaction file; the process is completed by the creation of a second master-file. In direct access files, the records of the existing master get updated without the creation of a new master file, as the data is overlaid.
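A minimal sketch of sequential updating is given below: the balances in a stock master file are brought up to date from a transaction file, and a second master file is written out. The two-column layout and the file names are assumptions for illustration.

```python
# Updating: apply the latest transaction values to the master records and
# complete the process by creating a second master file.

import csv

def update_master(master_path, transactions_path, new_master_path):
    with open(master_path, newline="") as f:
        balances = {item: float(qty) for item, qty in csv.reader(f)}

    with open(transactions_path, newline="") as f:
        for item, change in csv.reader(f):    # e.g. "bolt-m6", "-25"
            balances[item] = balances.get(item, 0.0) + float(change)

    with open(new_master_path, "w", newline="") as f:
        writer = csv.writer(f)
        for item in sorted(balances):
            writer.writerow([item, balances[item]])

if __name__ == "__main__":
    update_master("stock_master.csv", "stock_transactions.csv",
                  "stock_master_new.csv")
```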

vi. Indexing:

Indexing is the process of logically sorting the records in a file, which allows the user to view them in ascending or descending order of a data item that forms the primary key. Indexing creates a small second file containing the record numbers of the indexed file in the required sequence.
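The sketch below builds such a second file: it records, in key order, the record numbers of the data file, so the data file itself never moves. File names and the key-in-first-column layout are illustrative assumptions.

```python
# Indexing: create a small second file holding the record numbers of the data
# file arranged in ascending order of the key field.

import csv

def build_index(data_path, index_path, key_index=0):
    with open(data_path, newline="") as f:
        rows = list(csv.reader(f))

    # record numbers of the data file, sorted by the key values they hold
    order = sorted(range(len(rows)), key=lambda n: rows[n][key_index])

    with open(index_path, "w", newline="") as f:
        writer = csv.writer(f)
        for position, record_number in enumerate(order):
            writer.writerow([position, record_number])

if __name__ == "__main__":
    build_index("employees.csv", "employees.idx")
```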


Term Paper # 3. Activeness of Files:

Activity refers to the degree to which a particular file, generally a master-file, is processed during a specific period, that is, how many times the file has been processed in a year. A high-activity file is one which is accessed over 300 times in a year, a medium-activity file is one which has been processed at least 50 times, and the rest are rated as low-activity files.

Activity Rate means the number of records processed in a data file as against the number of records stored; the ratio is called the hit-rate or activity-rate. The term volatility is sometimes used to refer to the frequency at which records are added to or deleted from a file. A highly volatile file is one in which records are frequently added or deleted.


Term Paper # 4. Processing Modes:

The mode of processing, based on the timing of processing, is either Batch Processing or Transaction Processing. In the latter case, each transaction is processed as it is completed, which obviously can only be done in the interactive mode. In Batch Processing, a number of transactions are processed together after the input data relating to them have been accumulated.

Depending on the physical organisation of the file in secondary storage, files can be processed in either Sequential or Direct Mode, also called Random Mode. In sequential processing, the records are processed in the sequence in which they are physically stored; in the case of magnetic-tape files this is the sequence in which the records were stored, generally after sorting, and processing always starts with the first record even if it is not required.

In the case of disk-files, the files are processed directly by locating the record to be processed, without going through all the records that precede it. Obviously, some method has to be followed to keep track of where the desired data values are stored.
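One common method is to give every record the same fixed length, so that the byte position of any record can be computed from its record number. The sketch below assumes 64-byte records purely for illustration.

```python
# Direct (random) access: jump straight to the wanted record of a disk file
# without reading the records before it. Fixed-length records are assumed so
# that the offset can be computed from the record number.

RECORD_SIZE = 64   # bytes per record (an illustrative choice)

def read_record(path, record_number):
    with open(path, "rb") as f:
        f.seek(record_number * RECORD_SIZE)   # position directly at the record
        return f.read(RECORD_SIZE)

if __name__ == "__main__":
    # build a small demonstration file of four fixed-length records
    with open("stock.dat", "wb") as f:
        for n in range(4):
            f.write(f"record-{n}".encode().ljust(RECORD_SIZE, b" "))
    print(read_record("stock.dat", 2).rstrip())   # only record 2 is read
```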


Term Paper # 5. Reconstruction of Data File:

Reconstruction is the process of recovering or rebuilding a data file which has been damaged by a mistake or some other failure resulting in loss of data. The two methods followed for the reconstruction of databases are called the roll-forward and roll-back processes. The roll-forward procedure starts with a valid copy backed up earlier and then updates it by incorporating the necessary changes to bring it up to date with the current data.

In the roll-back procedure, the starting point is the current corrupted file, from which the invalid changes are removed to make the database correct again. The procedure to be followed depends on the programs designed to handle the different causes of failure.

In the roll-forward method it is absolutely essential to have a proper backup copy of the database, and as a routine, backups must be made at regular intervals, especially with disk files, because changes are made by direct overlaying, that is, writing over the existing records. It is also essential to maintain a transaction log detailing the changes made to update the master file.
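A miniature roll-forward, assuming a two-column balance file and a transaction log of signed changes (both file formats and names are invented for illustration), might look like this:

```python
# Roll-forward: start from the last valid backup and re-apply the changes in
# the transaction log, oldest first, to rebuild an up-to-date file.

import csv
import shutil

def roll_forward(backup_path, log_path, rebuilt_path):
    shutil.copyfile(backup_path, rebuilt_path)       # start from the backup

    with open(rebuilt_path, newline="") as f:
        balances = {item: float(qty) for item, qty in csv.reader(f)}

    with open(log_path, newline="") as f:            # replay the logged changes
        for item, change in csv.reader(f):
            balances[item] = balances.get(item, 0.0) + float(change)

    with open(rebuilt_path, "w", newline="") as f:
        writer = csv.writer(f)
        for item in sorted(balances):
            writer.writerow([item, balances[item]])

if __name__ == "__main__":
    roll_forward("stock_backup.csv", "transaction_log.csv", "stock_rebuilt.csv")
```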

Sequential files stored on magnetic tape have one automatic advantage: during the process of updating, a new master-file containing the modified records is always created from the old master-file. As per standard practice, the old master-file is preserved until another updating takes place, creating a still newer master-file; the result is three generations of files, grandfather, father, and son.

These master files of previous generations can always be used in the reconstruction of damaged sequential files stored on magnetic tape. As already stated, in the case of files stored on magnetic disks, the data is updated directly, and so special measures have to be taken to preserve old data. This is generally done by taking periodic back-ups, for which appropriate utilities are provided by the operating system.

In some application programs, generally text editors, the last version of the file before updating is automatically preserved by giving it a different file extension, say .bak. Whatever system is followed, there should be well-laid-out and tested procedures for reconstruction before they have to be tried out on damaged files.

Overlay:

In direct access storage devices, individual records can be accessed directly, and the data stored in them can be read, modified, or deleted without affecting any other record. This allows records to be updated in place, a technique called overlaying.
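A minimal sketch of overlaying, carrying over the fixed-length 64-byte records assumed earlier, is given below; one record is written over in place while the rest of the file is untouched.

```python
# Overlay: modify a single record of a direct access file in place, without
# creating a new file and without disturbing any other record.

RECORD_SIZE = 64   # same illustrative fixed record length as before

def overlay_record(path, record_number, new_contents):
    padded = new_contents.ljust(RECORD_SIZE, b" ")[:RECORD_SIZE]
    with open(path, "r+b") as f:                # open for in-place update
        f.seek(record_number * RECORD_SIZE)     # position at the record
        f.write(padded)                         # write over the old data

if __name__ == "__main__":
    # create a demonstration file of three blank records, then overlay record 1
    with open("stock.dat", "wb") as f:
        f.write(b" " * (RECORD_SIZE * 3))
    overlay_record("stock.dat", 1, b"item=chair;qty=35")
```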


Term Paper # 6. File Organisation:

It is obvious that data in files have to be organised in a systematic manner for efficient storage as well as retrieval from secondary storage devices for processing. The choice of file organisation depends largely on the available types of storage medium, the type of processing envisaged, the user’s convenience, and the cost of processing.

The three choices available are:

i. Sequential,

ii. Random, and

iii. Indexed Sequential

i. Sequential Files:

These types of files are physically organised so that, based on the values of a selected field, records are arranged one after another in some order, ascending or descending. During input of data to a sequential file, the records are placed in the order in which they are entered: the first input becomes the first record, the second input becomes the second record, and so on.

To obtain the records in a different sequence, say in ascending order of employee-number, the original file has to be sorted on the employee-number field, and a new sequential file has to be created with employee-numbers in the proper sequence. Sequential files may be stored on magnetic tape or magnetic disk, but the choice is generally in favour of the former, if it is available.

During processing, the computer also processes the file sequentially, starting with the first physical record and reading each subsequent one in turn, so the records are accessed sequentially. Being kept on tapes, where only sequential files can be stored, the cost of storage is much lower, leading to inexpensive I/O operations.

The processing is also economical and efficient if the Activity Rate [hit-rate] is high, that is, a large percentage of the records in the file need to be processed when the file is used. Incidentally, it is much easier to write programs for processing sequential files. Because different generations of the file are created automatically during processing, reconstruction in case of a file crash is much easier.

However, the greatest disadvantage is that even if only a few records need to be accessed, the whole file has to be processed, reading each record in turn, causing unnecessary delay and cost. Statistically, to access any record, half the records have to be read on average. Secondly, for each different operation the same file has to be sorted in a different order, and different copies of the same file, arranged in different sequences, have to be kept. Not much advantage is gained if sequential files are stored on direct access storage devices, that is, as disk-files.

ii. Random Access or Direct Files:

In these files, as the name signifies, the records can be entered directly in any random order. Generally a record-number is available which can be used to store or access individual records if no other addressing method is followed, and there are a number of such methods. Often an arithmetic procedure called a transform is used to decide the storage location.
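A very simple transform can be sketched as follows: the key is hashed and reduced modulo the number of available slots to obtain a storage location. The table size and the use of a CRC-32 checksum are illustrative assumptions; practical systems choose the transform carefully and handle collisions explicitly.

```python
# A toy "transform" for a direct file: derive a slot (relative record number)
# from the record key so the record can be stored and fetched directly.

import zlib

TABLE_SIZE = 997   # number of available record slots (a prime, by convention)

def slot_for(key):
    # map the key to a slot number between 0 and TABLE_SIZE - 1
    return zlib.crc32(key.encode()) % TABLE_SIZE

if __name__ == "__main__":
    for key in ("E0042", "E1315", "CUST-77"):
        print(key, "->", slot_for(key))
```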

A single record can be accessed directly no matter where it is stored, whether at the beginning or at the end of the file, without the need to read other records of the file. Because of this nature of storage and retrieval, random-access files cannot be stored on magnetic tape; a disk-file on a Direct Access Storage Device [DASD] is a must.

Updating of a master-file can be carried out directly, without sorting the transaction files and without creating a new master-file. However, the cost of secondary storage on disks is much higher, and it is difficult to reconstruct crashed files unless conscious efforts are made to take regular back-ups. This organisation is suitable when only a few records need updating. The processing time is much lower for disk-files as compared to tape-files. Direct files are also called relative files.

iii. Indexed Sequential Files:

This is an organisation of files which tries to have the best of both systems by combining some of the features of sequential and random files. The data in the file are arranged in sequence, as in a sequential file, based on the values [data elements] of a specific field [data item] which is used as the key.

In addition, an index is prepared which allows any record to be accessed without having to go through each record in sequence. So the file can be processed in batch mode using the physical sequence, and also in direct mode using the index. For example, a data file of finished products may be stored under this method. This master file can then be updated at weekends in batch mode to account for all sales made and quantities produced during the week, bringing the balances up to date.

Again, during working days, the marketing people can access the stock of any product through the index to meet enquiries from customers. Apart from this, when the activity-rate is low, the index can be used for updating the file, leaving sequential operation for periods of high activity-rate. Obviously, direct access storage devices have to be used for this type of file. The method is called the Indexed Sequential Access Method, or ISAM for short.

Serial Files:

This is the simplest organisation for a data file: records remain serially in the order in which they are entered into the file and are also accessed serially. The basic difference between a serial file and a sequential file is that, although in both the records are arranged one after another, in a sequential file the order is based on the sequence of values in a specific field.

For example, in a serial file of employees, the employee-numbers will generally not be in sequence unless special precautions are taken to enter the records in the proper order based on employee-number. But the same serial file will become a sequential file when its records are sorted by employee-number in ascending or descending order.


Term Paper # 7. System Security:

The data stored in a computer system are of vital importance to any organisation for planning and control, as well as for maintaining its position in the market. Damage to or destruction of data can result from hardware malfunction, software failures, accidents, mistakes, or deliberate vandalism.

Leakage of vital information may adversely affect the performance of the business organisation. System security refers to protection against all the above factors so as to maintain the integrity and privacy of data.

Basics of Security:

The seven essentials of data base security, as defined by James Martin, are as follows:

A data base should be:

i. Protected: against all kind of damage and destruction

ii. Reconstructable: should be able to rebuild when damaged

iii. Auditable: provision to check the security system

iv. Tamperproof: must be guarded against vandalism

Its users should be:

v. Identifiable: unauthorised persons must not be able to access it

Their actions should be:

vi. Authorised: specific system of granting permission must exist

vii. Monitored: abuse of authority continuously guarded against

The other aspects of security of computer systems are:

i. Ensuring processing of all data from source documents, by pre-numbering, logging activities, and constant monitoring.

ii. Checking the validity of input data by incorporating check digits, as well as by using special programs for validation. The system must be capable of blocking or trapping incorrect data. Another measure is to ensure receipt of input data only from authorised sources. In the case of on-line entries, control measures should be built into the input program, beyond the reach of the operator. Proper logging of transactions is a must.

iii. In case of damage, the lost data should be recoverable by reconstructing the damaged files. It is essential to maintain a proper catalogue of disks and tapes, and these must be properly labelled for clear identification, to ensure that a wrong file is not processed by mistake and thereby damaged.

iv. Making provision for system audit to ensure that things are being done as prescribed, in a well-planned manner.

v. Unauthorised access to data has to be prevented by the available measures, one of which is password control. When trying to access the system, or even a part of it, the user must be required to identify himself by entering a predetermined alphanumeric code, which is compared with an existing one kept in the program, preferably in an encrypted form, and entry is allowed only when the two codes match (a minimal sketch of such a check is given after this list). For further safety, the password is not displayed on the screen as the user types it.

It is not enough merely to have a password system; it must also be ensured that password owners maintain secrecy by not disclosing their passwords to unauthorised persons or noting them down in such a manner that an unauthorised person can get hold of them. In many organisations, passwords are systematically changed at frequent intervals to doubly ensure their secrecy. Authorised owners should be able to change their passwords as required, and the new passwords should not be decipherable by others.

vi. Of late, the menace of viruses has assumed alarming proportions. A virus is a self-replicating program code, activated by some normal operation of the computer, such as the copying of files, which quietly attaches itself to other programs or files, or even to specific locations of storage devices, and then starts causing damage after a period of dormancy.

The entry of a virus into a system can be prevented by detecting it at the entry stage using virus-cards or special software. Some anti-virus programs keep a record of existing files and their sizes and report any change in them to the user. Most viruses can be detected by scanning the infected files and then eliminated. However, prevention is always better than cure, so floppies must be checked before being used in the system, and stray floppies must not be allowed to be used.
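As mentioned in point (v) above, the stored password should preferably be kept in an encrypted form. A minimal sketch of one way of doing this, using the salted-hash facilities of Python's standard library (an illustrative choice, not a prescribed method), is given below.

```python
# Store only a salted hash of the password, never the password itself, and
# compare hashes when the user tries to log in.

import hashlib
import hmac
import os

def make_record(password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest           # this pair is what gets stored, not the password

def verify(password, salt, stored_digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored_digest)   # constant-time compare

if __name__ == "__main__":
    salt, digest = make_record("s3cret")
    print(verify("s3cret", salt, digest))   # True
    print(verify("guess", salt, digest))    # False
```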

To summarise, the three basic pillars of data base security are:

i. Confidentiality:

Ensuring that the data, programs, etc., are not exposed to fraud through access by unauthorised persons.

ii. Integrity:

Ensuring that the computer system and its data are protected against possible mistakes by authorised persons and malicious modification by unauthorised persons.

iii. Availability:

Ensuring that the data, etc., are available to authorised persons without any difficulty or cumbersome procedure.

Just visualise the computerised railway reservation system and you will realise the importance of security. The operators are not given the freedom to manipulate passenger entries, or to favour somebody who is not on the waiting list or is out of turn on the list. There are many other built-in controls to prevent malpractice.


Term Paper # 8. System Audit:

It is not enough merely to install a security system and hope that everything will function properly; it needs to be checked often to see that the security requirements of the computer system are being complied with. The age-old term for such verification, which originated with the accounting system and has now spread to many areas, is audit; in computer terminology it is called System Audit.

It is the group of activities undertaken to obtain confirmatory evidence of the authenticity, validity, and safety of the data entered and stored in the data base system. The auditing or checking activities can be broadly classified into two areas: one relating to what goes on inside the hardware [CPU], and the other to what is done outside the computer’s hardware but within the computer system.

The former could be designated as ‘auditing through the system’ and the latter as ‘auditing around the system’. In the first case, to test the vulnerability of the system, test data can be entered and traced up to the output stage to see what conversion takes place and whether it matches the planned expectation.

Auditing around the system involves checking the distribution of authority within the organisation, which must separate different areas of activity, for example, ensuring that those who prepare the programs do not handle the data to be processed. It also covers checking the log-books recording the time and duration of processing different jobs, mistakes made and corrected, etc.

The difficulty with normal computer operation is that unless the data is stored, it is not possible to know what has gone on outside the computer, as everything is lost when the power is switched off. Hence, when designing data base systems, this aspect has to be kept in mind and provided for; the provision is called the audit trail.


Term Paper # 9. Audit Trail:

It is the path taken by a transaction when it is processed to generate the output — it includes the source document from which the data is entered, the entries that are subsequently made, and finally the tabulation that is prepared.

To provide for audit trail, it is necessary to ensure that:

i. The documents are so designed, with the necessary details, that they can be used to link output records with their corresponding input documents.

ii. Evidence is created to establish that all the input documents relating to a particular output have been entered and processed.

iii. The documents make it possible to recreate transactions for reconstruction purposes in case of data loss.

It is absolutely necessary to have an audit trail to enable auditing through the system.
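A bare-bones audit trail can be kept as an append-only log in which every transaction records when it happened, who entered it, what was done, and a reference back to the source document. The fields and file name below are illustrative assumptions.

```python
# Append one line per transaction to an audit log so that any output can later
# be traced back to its input document.

import csv
from datetime import datetime, timezone

def log_transaction(log_path, user, action, document_ref):
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),  # when it happened
            user,                                    # who did it
            action,                                  # what was done
            document_ref,                            # link to the source document
        ])

if __name__ == "__main__":
    log_transaction("audit.log", "clerk01", "invoice-entered", "INV-2024-0113")
```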

Physical Security:

Providing valid identity cards, recording entry and exit, etc., are the physical measures required for system security. It is also necessary to ensure that not everybody has the authority to enter all areas, such as the store, the processing areas, etc. A system of message logging, recording the communication between different parts of the computer system and the users, is also a must.

Dumping is the process of taking a hard copy of all the processing done by the CPU up to a given stage while carrying out a job. It is used mainly for debugging, but is also used for auditing at times.


Term Paper # 10. Auditing Computer System:

After the introduction of computers into business operations, the age-old accounting system has not undergone any drastic change, but the system of maintaining records such as vouchers, and the preparation of accounting statements, have undergone vast changes.

Hence, to audit a computer system, it is necessary to develop new strategies and techniques to ensure not only that the accounting statements prepared are correct, but also that the computer system itself is not being abused.

The basic job of a computer being to receive input data and generate output information, checks on these two areas deserve maximum attention. In the case of centralised processing, as used to be done with mainframe computers, there was enough scope to check the incoming and outgoing documents.

But with interactive systems being increasingly used, the strategy has to be different, as the types of input document have changed. The sales transaction itself may be recorded directly at a computer terminal, where the bill is also printed.

i. Input Control:

Depending on the type of processing being followed, the following aspects may be checked:

i. Whether the various input documents being received are properly numbered and properly grouped.

ii. Whether the corrections made in the input documents are properly authenticated.

iii. What measures are taken to ensure that each of them has been correctly entered.

iv. What procedures have been introduced for data validation. It may be necessary actually to input some incorrect data to see whether it is accepted by the system.

v. Whether the coding system is being consistently followed.

ii. Output Control:

The best way to ensure that outputs are being properly generated is to take some relevant data, enter them, and check the information generated. It should be verified that this information is correct according to the processing methods that are supposed to be followed.

One must keep in mind that, the computer being a fast processing machine, the implications of incorrect input and/or processing are stupendous. A wrong material code may cause a huge loss, and it may go undetected because the final amount is generated by the computer and there is usually no other means of verifying such totals.

For instance, in an actual payroll accounting system it was found that the very program developed to generate the pay statements was faulty, resulting in higher payments. It is for this reason that it is always recommended that, before switching over to a computerised system, it should be run in parallel with the manual system to ensure that the output generated is correct.

