2.2 Required Properties Of A Distributed System
As seen already, all distributed systems exhibit common characteristics, along with associated advantages and disadvantages. Five areas must be dealt with to create a distributed system: transparency, flexibility, reliability, performance and scalability.
Transparency means making the system appear to the user as a single system. There are four properties of transparency: location, migration, replication and concurrency. Users should not be able to tell where resources are located within the system (location). Resources should be able to move anywhere in the system without having their names changed (migration). Resources such as systems, files and others should be able to be replicated without users noticing (replication). Finally, multiple users should be able to share these resources at the same time (concurrency).
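To make the idea concrete, the following sketch illustrates location and migration transparency; all names in it (NameService, ResourceProxy, read) are hypothetical illustrations rather than anything from a particular system. A stable resource name is resolved through a name service on every access, so the resource can move without its users noticing:

    # Hypothetical sketch of location/migration transparency.
    # NameService and ResourceProxy are illustrative names only.

    class NameService:
        """Maps stable resource names to their current physical location."""
        def __init__(self):
            self._locations = {}          # name -> host address

        def register(self, name, host):
            self._locations[name] = host  # migration: re-register the same name

        def resolve(self, name):
            return self._locations[name]

    class ResourceProxy:
        """Clients use only the stable name; the location stays hidden."""
        def __init__(self, name, name_service):
            self.name = name
            self.ns = name_service

        def read(self):
            host = self.ns.resolve(self.name)  # looked up on every access
            return f"data for {self.name} fetched from {host}"

    ns = NameService()
    ns.register("accounts.db", "host-a.example.org")
    proxy = ResourceProxy("accounts.db", ns)
    print(proxy.read())                                # served from host-a
    ns.register("accounts.db", "host-b.example.org")   # resource migrates
    print(proxy.read())                                # same name, new host

Because the proxy resolves the name on each access, re-registering the resource under the same name is all a migration requires; the caller's code never changes.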
Flexibility is one of the more important properties to be kept in the forefront of design. Designs must be able to adjust to the changing needs of the system. Systems designed for one purpose may quickly outlive their usefulness. If these systems are locked into that one design, then an entirely new system will need to be constructed for the new problem. A better way is to keep the designs adjustable enough that changes can occur with minimal effort. One way to keep a distributed system flexible is to use a microkernel as the operating system. A microkernel contains only the basic elements necessary to operate the system it is installed on. Modules that provide additional functions can be added or removed as needed. This type of operating system provides better allocation of resources and more flexibility than a monolithic kernel.
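A minimal sketch of this module idea, assuming invented names (ModuleRegistry, FileServiceModule), shows how services can be added or removed at run time without rebuilding the core:

    # Hypothetical sketch of microkernel-style pluggable services.
    # All class and method names are illustrative, not from the text.

    class ModuleRegistry:
        """The 'kernel' core: it only knows how to load, unload and dispatch."""
        def __init__(self):
            self._modules = {}

        def load(self, name, module):
            self._modules[name] = module

        def unload(self, name):
            self._modules.pop(name, None)   # service removed without a rebuild

        def call(self, name, request):
            return self._modules[name].handle(request)

    class FileServiceModule:
        def handle(self, request):
            return f"file service handling {request!r}"

    kernel = ModuleRegistry()
    kernel.load("files", FileServiceModule())       # add a function at run time
    print(kernel.call("files", "open /etc/hosts"))
    kernel.unload("files")                          # remove it when not needed

The core stays small and stable; adapting the system to a new purpose means writing and loading a new module rather than constructing an entirely new system.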
Reliability is one of the most important properties in the eyes of the user. Nothing causes more stress and anger than a service that fails to do what it was designed to do. The system should be designed with a target for how long the service must remain available. A system whose mission is critical to an operation should also have built-in redundancy. The idea is that if one component fails, another should be there as a backup to maintain availability of the service. At no point should files be damaged by a failure. These failures should be hidden from the users, giving the impression that the system is always performing its duty, while behind the scenes the system deals with an assortment of problems.
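The redundancy described above might look like the following sketch (Replica and ReplicatedService are illustrative names): the wrapper silently falls back to a backup, so the failure stays hidden from the user:

    # Hypothetical failover sketch: a primary replica with a backup.
    # Class names and behaviour are illustrative assumptions.

    class Replica:
        def __init__(self, name, healthy=True):
            self.name = name
            self.healthy = healthy

        def serve(self, request):
            if not self.healthy:
                raise ConnectionError(f"{self.name} is down")
            return f"{self.name} handled {request!r}"

    class ReplicatedService:
        """Tries each replica in order; callers never see individual failures."""
        def __init__(self, replicas):
            self.replicas = replicas

        def serve(self, request):
            for replica in self.replicas:
                try:
                    return replica.serve(request)   # first healthy replica wins
                except ConnectionError:
                    continue                        # hide the failure, try backup
            raise RuntimeError("all replicas failed")

    service = ReplicatedService([Replica("primary", healthy=False),
                                 Replica("backup")])
    print(service.serve("read record 42"))   # backup answers; user never notices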
Providing flexibility and reliability should cause no reduction in the performance of the system. The user should not notice a difference in the system's performance compared to a single system running the same application. In measuring the performance of the system, response times, throughput (number of jobs per hour), system utilisation and the amount of bandwidth used should be monitored. These values provide an insight into how the system is actually performing versus benchmarks. This can reveal areas that need to be targeted: bottlenecks in the network to be reduced, slow algorithms to be replaced with quicker methods, and so on. Andrew Tanenbaum says this about performance:
"The performance problem is compounded by the fact that communication
which is essential in a distributed system (and absent in a
single-processor system) is typically quite slow. To optimise
performance, on often has to minimize the number of messages. The
difficulty with this strategy is that the best way to gain
performance is to have many activities running in parallel on
different processors, but doing so requires sending many messages.
One possible way out is to pay considerable attention to the grain size
of all computations. In general, jobs that involve a large number of
small computations, especially ones that interact highly with one
another, may cause trouble on a distributed system with relatively
slow communication. Such jobs are said to exhibit fine-grained
parallelism. On the other hand, jobs that involve large
computations, low interaction rates, and little data, that is, may
be a better fit."
12
Fault tolerance also poses problems by requiring extra work to handle errors within the system. This adds overhead, thereby reducing the performance of the system. To improve performance, many systems should co-operate to provide the same services.
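As a simple illustration of the monitoring described above (handle_job and measure_jobs are hypothetical stand-ins, and the timings are placeholders for real work), response times and throughput can be sampled like this:

    # Hypothetical sketch of measuring response time and throughput.
    # The function names and the simulated workload are assumptions.

    import time

    def handle_job(job_id):
        time.sleep(0.01)          # stand-in for real distributed work
        return job_id

    def measure_jobs(n_jobs):
        latencies = []
        start = time.perf_counter()
        for job_id in range(n_jobs):
            t0 = time.perf_counter()
            handle_job(job_id)
            latencies.append(time.perf_counter() - t0)  # per-job response time
        elapsed = time.perf_counter() - start
        return {
            "mean_response_s": sum(latencies) / n_jobs,
            "throughput_jobs_per_hour": n_jobs / elapsed * 3600,
        }

    print(measure_jobs(100))   # compare these figures against benchmarks

Figures gathered this way can be compared against benchmarks to decide where tuning effort should go.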
The final element, scalability, is one that must also be kept in the forefront of the design. A distributed system must continually adjust to handle change. Can the system grow into a larger system to handle growth in its users or the demands on it? This question must always be answered whenever a new system is constructed. To ensure that the system is scalable, one principle must be followed: centralized components, tables and algorithms must be avoided. The system should have no single point of failure that can bring the whole service down. By avoiding centralization, the system will maintain its flexibility and reliability and remain genuinely scalable.
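One decentralised design in this spirit, offered as an illustrative sketch rather than a prescription from the text, lets every client compute a resource's home node from a hash of its name, so no central directory or table is needed:

    # Hypothetical sketch of avoiding a centralized table: each client
    # computes a resource's home node locally from a hash, so no single
    # node holds the whole mapping. node_for and nodes are invented names.

    import hashlib

    nodes = ["node-a", "node-b", "node-c", "node-d"]

    def node_for(resource_name):
        """Deterministically map a resource to a node, no central lookup."""
        digest = hashlib.sha256(resource_name.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(nodes)
        return nodes[index]

    # Any client, anywhere, computes the same placement independently.
    for name in ["accounts.db", "mail/inbox", "web/cache"]:
        print(name, "->", node_for(name))

Real systems typically refine this simple scheme with consistent hashing, so that adding or removing a node reshuffles only a fraction of the resources rather than nearly all of them.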