Abstract
The recent emergence of heterogeneous chip multiprocessors requires a different operating system organization from the usual SMP (symmetric multiprocessing) organization. Although the SMP organization has been widely adopted in modern multiprocessor operating systems, it is restricted to homogeneous processors with a global shared memory. On the other hand, the master-slave organization has little dependency upon the underlying hardware architecture, thus having great potential to cope with heterogeneous multiprocessors. This motivated us to reexamine the master-slave approach. In this paper, we attempt to address real-time and performance issues associated with the master-slave approach. Specifically, we first describe our previous design of a master-slave architecture, called APRIX. We then present an improved communication mechanism between the master and slave, which allows the master to provide priority-based system call services to slave kernels and also improves the overall multiprocessing performance.
This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) KRF-2005-041-D00625. Corresponding author: msryu@hanyang.ac.kr.
1. Introduction
The recent emergence of heterogeneous chip multiprocessors such as Philips Nexperia, TI OMAP, ST Nomadik, Qualcomm MSM, and STI Cell [11] requires a different operating system organization from the usual SMP (symmetric multiprocessing) organization [7, 2, 6]. Although the SMP organization has been widely adopted in modern multiprocessor operating systems, it is restricted to homogeneous processors with a global shared memory so that all processors run a single copy of the SMP kernel located in the shared memory. On the other hand, the master-slave organization [3] has little dependency upon the underlying hardware architecture since it merely assigns different roles to different processors. In a usual master-slave organization, one processor is designated as the master, which can execute in supervisor mode, while the other processors are designated as slaves, which execute only in user mode. This organization does not assume any specific hardware architecture and is thus applicable to a wide range of hardware architectures, from symmetric multiprocessors with UMA (uniform memory access) to asymmetric multiprocessors with NUMA (nonuniform memory access).
Although the master-slave organization has great potential to cope with heterogeneous multiprocessors, it has been less well studied than the SMP approach due to some serious concerns about performance and difficulty of software development. One concern is that the master processor would become a performance bottleneck as the number of slaves increases, thus offering lower scalability than the SMP organization. However, it should be noted that most modern heterogeneous multiprocessor systems of practical interest are developed with a specific purpose in mind, such as multimedia processing and/or signal processing rather than massively parallel computation, and thus they usually contain fewer than about eight CPUs. The Cell processor [11] is a good example, which contains one Power processor as the master and eight synergistic processors as slaves. Furthermore, target applications like multimedia and signal processing are inherently compute-intensive and require few of the master's supervisor-mode services. This implies that the master-slave organization can be used without serious performance degradation. Note that a common bus-based SMP organization also has limited scalability due to bus contention, lock contention, and cache coherency. It is usually accepted that even the bus-based SMP organization limits the number of processors to about 8 or 16 CPUs.
The other concern raised by the master-slave organization is the difficulty of software development. In a master-slave setting, the target software needs to be divided into several parts, and each part must be programmed separately for each processor. Communication between the separate parts can be accomplished by explicit message-passing mechanisms. On the other hand, the SMP organization provides the same programming model as the multithreaded uniprocessor programming model. Communication is also easy since a shared address space is available between threads. Here, we note that the difficulty of programming in the master-slave organization is mainly caused by the underlying hardware architecture, not by the organization style. Programs must be developed separately because of processor heterogeneity. Communication must be done via message passing if a shared address space is not available. It is not difficult to imagine a master-slave organization on top of homogeneous multiprocessors with UMA, as presented in the seminal work [3], which may be able to provide the same programming model as the uniprocessor programming model.
Recently, we developed a prototype of a master-slave operating system, called APRIX (Asymmetric Parallel RealtIme KernelS), for multiprocessor real-time systems [9]. APRIX has been constructed by converting a uniprocessor RTOS kernel into two cooperative kernels, i.e., a master kernel and a slave kernel. The master kernel is responsible for allocating and scheduling application tasks onto slaves and providing remote system call (RSC) services requested by slaves. The slave kernel executes allocated tasks in user mode and requests RSC services when needed. In the previous work, we designed and implemented APRIX with an emphasis on the structural design issues. In this paper, we attempt to address real-time and performance issues in more detail.
Specifically, we will focus on the RSC mechanism, which plays a central role in master-slave communications and thus is a significant factor contributing to the system's overall performance. First, we will discuss how to correctly design the RSC mechanism with common operating system architectures in mind. Second, we identify the problem of priority inversion and describe a priority-based RSC mechanism. Finally, we present several options to increase the overall multiprocessing performance, which also help mitigate the problem of the master being a bottleneck.
The remainder of this paper is organized as follows. Section 2 describes related work and gives an overview of APRIX. Section 3 presents the improved design of the RSC mechanism. Section 4 describes another prototype implementation of APRIX on TI DaVinci and experimental results. Section 5 concludes this paper with future work.
code and data of the SMP kernel, synchronization is a key issue for the SMP architecture. Early implementations of SMP kernels relied on coarse-grained locking to simplify the synchronization problem, but almost all modern SMP kernels are now based on fine-grained locking to achieve the maximum parallelism [7, 2, 6]. Note that modern SMP kernels have evolved from monolithic uniprocessor kernels [10, 1, 2, 4, 8]. Since several processors can execute simultaneously in the kernel and may access the same kernel data structures, sophisticated synchronization mechanisms were required in adapting uniprocessor kernels to SMP versions.
The other class of operating system architectures is the master-slave architecture [3]. In [3], Goble et al. implemented a master-slave system on a dual-processor VAX 11/780, where the master is responsible for handling all system calls and interrupts. Slaves execute processes in user mode and send a request to the master when a process makes a system call. Recent work on the master-slave approach has been reported in [5]. Kagstrom et al. attempted to provide multiprocessor support without modifying the original uniprocessor kernel. Their idea was to create and run two threads for each application, a bootstrap thread and an application thread. The application thread runs the application on the slave, and the bootstrap thread runs on the master, awaiting a system call request. On receiving a request, the bootstrap thread invokes the requested system call on behalf of the application thread.
2. Overview of APRIX
APRIX is based on a simple model of master-slave that is similar to that of [3]. The master kernel has two basic functions: assigning application tasks to slaves and providing kernel services to slaves. Every slave, on the other hand, has a simple function of executing application tasks in user mode. Note that the master kernel itself is also able to run application tasks.
The master kernel has been constructed by incorporating our uniprocessor kernel, called QURIX [9], with additional components. The additional components are a global task scheduler and a kernel service handler. Figure 1 (A) shows the master kernel's structure, where shaded boxes represent added components and white boxes represent native components. Since the slave kernel requires only a minimal set of functions, many of the original components have been removed from the uniprocessor kernel. The remaining components include the local scheduler and task manager. The slave kernel also contains two additional components, a local dispatcher and a kernel service proxy, which are responsible for handling the master's commands for task assignment and managing remote invocation, respectively. Figure 1 (B) shows the resulting structure of the slave kernel.
Figure 1. The structures of APRIX: (A) master kernel and (B) slave kernel.
service proxy can be viewed as a stub procedure, which converts the operation name and parameters into a message and sends the message to the master. When the kernel service handler receives the message, it invokes the requested operation, packs the result into a message, and sends it to the slave. The kernel service proxy in turn receives the message and returns the result to the caller task.
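The stub behavior described above can be illustrated with a minimal sketch. It is not APRIX code: the operation names (`add`, `sem_create`), the JSON message format, and the function names are hypothetical, and the master-slave channel is replaced by a direct function call so that the example runs in a single process.

```python
import json

# Hypothetical kernel operations offered by the master. These names are
# illustrative assumptions, not the APRIX system call interface.
def _add(a, b):
    return a + b

def _sem_create(initial):
    return {"sem_id": 1, "count": initial}

KERNEL_OPS = {"add": _add, "sem_create": _sem_create}

def pack_request(op, *params):
    """Slave side: convert the operation name and parameters into a message."""
    return json.dumps({"op": op, "params": list(params)}).encode("utf-8")

def handle_request(message):
    """Master side: unpack the message, invoke the operation, pack the result."""
    request = json.loads(message.decode("utf-8"))
    operation = KERNEL_OPS.get(request["op"])
    if operation is None:
        reply = {"ok": False, "error": "unknown operation"}
    else:
        reply = {"ok": True, "result": operation(*request["params"])}
    return json.dumps(reply).encode("utf-8")

def remote_system_call(op, *params, transport=handle_request):
    """Kernel service proxy: looks like a local call to the caller task.

    `transport` stands in for the master-slave channel (an assumption);
    here it simply calls the master-side handler directly.
    """
    reply = json.loads(transport(pack_request(op, *params)).decode("utf-8"))
    if not reply["ok"]:
        raise RuntimeError(reply["error"])
    return reply["result"]
```

A caller task would invoke `remote_system_call("add", 2, 3)` exactly as it would a local function; the packing, transfer, and unpacking stay hidden inside the proxy.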
extended version of APRIX provides a pool of RSC threads that are solely intended for providing RSC services. APRIX creates the same number of RSC threads as the number of slave processors. Note that there can be different choices for the number of RSC threads. One approach would be to create as many RSC threads as there are application tasks in the entire system. However, this would increase the overhead of creating and scheduling RSC threads, thus significantly degrading the overall performance. On the other hand, if we choose too small a number, the probability of the above phenomenon would increase. A reasonable choice seems to be the number of slave processors, which represents the actual degree of physical parallelism.
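The sizing rule above can be sketched as a small thread pool. This is an illustration under stated assumptions rather than the APRIX implementation: the class name `RscThreadPool`, the convention that a numerically smaller value means a higher priority, and the use of the caller's priority to order pending requests are all hypothetical choices made for the example.

```python
import itertools
import queue
import threading

class RscThreadPool:
    """Pool of RSC threads, one per slave processor (hypothetical sketch)."""

    def __init__(self, num_slaves):
        self._requests = queue.PriorityQueue()
        self._seq = itertools.count()  # FIFO tie-break among equal priorities
        self._lock = threading.Lock()
        self.served = []               # (name, result) in completion order
        self._threads = [
            threading.Thread(target=self._serve, daemon=True)
            for _ in range(num_slaves)
        ]

    def submit(self, priority, name, operation):
        """Queue a request; a smaller priority value is served first (assumption)."""
        self._requests.put((priority, next(self._seq), name, operation))

    def start(self):
        for thread in self._threads:
            thread.start()

    def _serve(self):
        while True:
            _, _, name, operation = self._requests.get()
            if operation is None:      # shutdown sentinel
                return
            result = operation()
            with self._lock:
                self.served.append((name, result))

    def shutdown(self):
        """Let pending requests finish, then stop every RSC thread."""
        for _ in self._threads:
            # Sentinels carry the lowest priority, so real requests go first.
            self._requests.put((float("inf"), next(self._seq), "stop", None))
        for thread in self._threads:
            thread.join()
```

With `RscThreadPool(num_slaves=1)` and three requests submitted before `start()`, the requests are served strictly in priority order; with more threads, up to `num_slaves` requests are serviced concurrently.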
5. Conclusion
In this paper, we presented a master-slave operating system architecture for heterogeneous multiprocessors. Specifically, we improved our initial design of APRIX with an emphasis on real-time and performance issues. We have not yet performed a thorough evaluation of the proposed techniques because the implementation is still in progress at the time of writing. We will continue to implement the techniques described in Sections 2 and 3 and report more detailed results in the near future.
References
[1] M. J. Bach. Design of the UNIX Operating System. Prentice Hall, 1986.
[2] R. Clark, J. O'Quin, and T. Weaver. Symmetric multiprocessing for the AIX operating system. In Proceedings of the 40th IEEE Computer Society International Conference, pages 110-115, 1996.
[3] G. H. Goble and M. H. Marsh. A dual processor VAX 11/780. In Proceeding of the 9th Annual Symposium on Computer Architecture, pages 291-298, 1982.
[4] M. D. Janssens, J. K. Annot, and A. J. Van De Goor. Adapting UNIX for a multiprocessor environment. Communications of the ACM, 29:895-901, 1986.
[5] S. Kagstrom, H. Grahn, and L. Lundberg. Experiences from implementing multiprocessor support for an industrial operating system kernel. In Proceeding of 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, pages 365-368, 2005.
[6] S. Kleiman, J. Voll, J. Eykholt, A. Shivalingiah, and D. Williams. Symmetric multiprocessing in Solaris 2.0. In Proceedings of the Thirty-Seventh International Conference on COMPCON, pages 181-186, 1992.
[7] G. Lehey. Improving the FreeBSD SMP implementation. In Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference, pages 155-164, 2001.
[8] C. H. Russell and P. J. Waterman. Variations on UNIX for parallel-processing computers. Communications of the ACM, 30:1048-1055, 1987.
[9] M. Seo, H. S. Kim, J. C. Maeng, J. Kim, and M. Ryu. An effective design of master-slave operating system architecture for multiprocessor embedded systems. In Proceedings of the 12th Asia-Pacific Computer Systems Architecture Conference, pages 114-125, 2006.
[10] U. Vahalia. UNIX Internals: The New Frontiers. Prentice Hall, 1996.
[11] W. Wolf. The future of multiprocessor systems-on-chips. In Proceeding of the 41st Annual Conference on Design Automation, pages 681-685, 2004.