Parallel Systems

Parallel systems improve processing and I/O speeds by using multiple CPUs and disks in parallel. The driving force behind parallel database systems is the demands of applications that have to query extremely large databases (of the order of terabytes, that is, 10^12 bytes) or that have to process an extremely large number of transactions per second (of the order of thousands of transactions per second). Centralized and client–server database systems are not powerful enough to handle such applications.

There are three measures of performance of a database system:
Throughput: The number of tasks completed per unit time.
Turnaround time: The interval from the time a task is submitted to the time it completes.
Response time: The interval from the time a task is submitted to the time the first response is produced.

Two important issues in studying parallelism are speedup and scaleup.
1. Running a given task in less time by increasing the degree of parallelism is called speedup.
2. Handling larger tasks by increasing the degree of parallelism is called scaleup.
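The measures and the two notions of parallel improvement above can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not a definitive formulation: it assumes the common convention that speedup is the elapsed time on the smaller system divided by the elapsed time on the larger system for the same task, and that scaleup is the same ratio when the task grows in proportion to the system. The function names and all timings are hypothetical.

```python
"""Back-of-the-envelope helpers for throughput, speedup, and scaleup.

All function names and timings here are hypothetical illustrations.
"""


def _check_positive(*values: float) -> None:
    # Elapsed times must be positive for the ratios below to make sense.
    if any(v <= 0 for v in values):
        raise ValueError("elapsed times must be positive")


def throughput(tasks_completed: int, elapsed_seconds: float) -> float:
    """Tasks completed per unit time (here, per second)."""
    _check_positive(elapsed_seconds)
    return tasks_completed / elapsed_seconds


def speedup(time_small_system: float, time_large_system: float) -> float:
    """Same task on both systems: small-system time / large-system time.

    A value of N on a system N times larger is called linear speedup.
    """
    _check_positive(time_small_system, time_large_system)
    return time_small_system / time_large_system


def scaleup(time_small_task: float, time_large_task: float) -> float:
    """Task and system both grow by the same factor.

    time_small_task: time for the original task on the original system.
    time_large_task: time for the N-times-larger task on the N-times-larger
    system. A value of 1.0 is called linear scaleup.
    """
    _check_positive(time_small_task, time_large_task)
    return time_small_task / time_large_task


if __name__ == "__main__":
    # Hypothetical measurements, chosen only to illustrate the arithmetic.
    print("throughput:", throughput(6000, 60.0), "tasks/s")
    print("speedup on 8 processors:", speedup(100.0, 25.0))
    print("scaleup with 8x data on 8 processors:", scaleup(100.0, 125.0))
```

In this made-up run, a speedup of 4.0 on eight processors is sublinear, and a scaleup of 0.8 means the larger task on the larger system took somewhat longer than the original task did on the original system.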
Parallel Database Architectures

There are several architectural models for parallel machines. Let M denote memory and P a processor; disks are shown as cylinders.
1. Shared memory. All the processors share a common memory.
2. Shared disk. All the processors share a common set of disks. Shared-disk systems are sometimes called clusters.
3. Shared nothing. The processors share neither a common memory nor a common disk.
4. Hierarchical. This model is a hybrid of the preceding three architectures.

Shared Memory

In a shared-memory architecture, the processors and disks have access to a common memory, typically via a bus or through an interconnection network.

Advantages
The benefit of shared memory is extremely efficient communication between processors: data in shared memory can be accessed by any processor. A processor can send messages to other
processors much faster by using memory writes (which usually take less than a microsecond) than by sending a message through a communication mechanism.

Disadvantages
The downside of shared-memory machines is that the architecture is not scalable beyond 32 or 64 processors, because the bus or the interconnection network is shared by all processors and becomes a bottleneck. Shared-memory architectures usually have large memory caches at each processor, so that references to the shared memory are avoided whenever possible. However, the caches need to be kept coherent; that is, if a processor performs a write to a memory location, the data in that memory location must be either updated at or removed from every processor where the data is cached. Maintaining cache coherency becomes an increasing overhead as the number of processors grows. Consequently, shared-memory machines are not capable of scaling up beyond a point; current shared-memory machines cannot support more than 64 processors.

Shared Disk

In the shared-disk model, all processors can access all disks directly via an interconnection network, but the processors have private memories.

Benefits
The shared-disk model has two benefits. First, since each processor has its own memory, the memory bus is not a bottleneck. Second, it offers a cheap way to provide a degree of fault tolerance: if a processor (or its memory) fails, the other processors can take over its tasks, since the database is resident on disks that are accessible from all processors.

Disadvantages
The main problem with a shared-disk system is again scalability. Although the memory bus is no longer a bottleneck, the interconnection to the disk subsystem is now a bottleneck. Compared to shared-memory systems, shared-disk systems can scale to a somewhat larger number of processors, but communication across processors is slower (up to a few milliseconds
in the absence of special-purpose hardware for communication), since it has to go through a communication network.

Shared Nothing

In a shared-nothing system, each node of the machine consists of a processor, memory, and one or more disks. A processor at one node can communicate with a processor at another node over a high-speed interconnection network.

Benefits
Because each node serves local disk references from its own disks, only queries, accesses to nonlocal disks, and result relations pass through the interconnection network. Consequently, shared-nothing architectures are more scalable and can easily support a large number of processors.

Disadvantages
The main drawbacks of shared-nothing systems are the costs of communication and of nonlocal disk access, which are higher than in a shared-memory or shared-disk architecture, since sending data involves software interaction at both ends.

Hierarchical

The hierarchical architecture combines the characteristics of shared-memory, shared-disk, and shared-nothing architectures. Commercial parallel database systems today run on several of these architectures.

Types of Query