M.H. Neishaburi, Masoud Daneshtalab, Mohammad Reza Kakoee, Saeed Safari
University of Tehran, Iran
mhnisha@cad.ece.ut.ac.ir; {kakoee, safari}@cad.ece.ut.ac.ir; m.daneshtalab@ece.ut.ac.ir
Abstract
Critical applications with stringent real-time constraints are increasingly deployed on top of a Real-Time Operating System (RTOS). The services an RTOS provides are subject to faults that affect both the functional and the timing behavior of the tasks running on it. In this paper, we evaluate and analyze the robustness of these services against soft-errors in two RTOS architectures: SW-RTOS and HW/SW-RTOS. Based on the experimental results, we propose an architecture that provides services more robust to soft-errors. RTOS users desire predictable response time at an affordable cost; this demand led to Hardware/Software Real-Time Operating Systems (HW/SW-RTOS). This paper analyzes the impact of soft-errors in real-time systems running applications under a purely software RTOS versus a HW/SW-RTOS. The proposed model is used to evaluate the robustness of services such as scheduling, synchronization, time management, memory management, and interprocess communication in a software-based RTOS and in HW/SW-RTOS. Experimental results show that HW/SW-RTOS provides services that are more robust to soft-errors than those of a purely software-based RTOS.
General Terms
Reliability, Verification.
Keywords
Software Real-Time Operating System (SW-RTOS), Hardware/Software Real-Time Operating System (HW/SW-RTOS), Soft-Error.
1. Introduction
Up to now, the silicon device industry has remained loyal to Moore's Law. Feature size and other aspects of the integrated circuit fabrication process have improved constantly. The delay and size of transistors have decreased tremendously compared to a couple of years ago. Unfortunately, but not unexpectedly, these improvements have not come for free. The gradual shift into the sub-micron era raises many new problems and challenges that must be addressed; otherwise the prophecy of Moore's Law cannot hold, and the demise of CMOS technology will be seen within a couple of years. The sub-micron effects beyond 10 nm will be so fundamental that they need answers at every level of the design hierarchy, bottom to top, from the device to the system. The system designer must keep in mind that, regardless of the efforts of the device engineer, faults and single event upsets are inevitable. The system should be designed to be robust against all kinds of soft-errors and faults.
This new aspect of design becomes critical in hard real-time systems, in which the output must be valid by the required deadline and the reliability of the system is as important as its functional accuracy. Failure to meet the deadline or a crash of the application may result in catastrophic events. Many applications fall into this class, e.g. life-supporting instruments, aerospace equipment, and traffic control. System designers have previously introduced several innovations to deal with these problems. These solutions were mostly concerned with designing a robust application; for example, the method presented in [8] includes an additional application that checks the other applications in their workspace memory. In [9] the old idea of system replication is proposed. Finally, in [10], [11] the researchers were concerned with designing a robust scheduling algorithm. As stated in [7], these techniques are not enough. By injecting faults into the system, the authors of [7] found that a soft error may cause a failure in the multi-tasking process of an RTOS. This fault may propagate to the application level and thus defeat all the envisaged fault-tolerance mechanisms. As mentioned before, this could endanger valuable assets and lives. Hence, we conclude that a robust, fault-tolerant RTOS is necessary. In [1], the authors suggested that by HW/SW partitioning of the operating system and moving some RTOS functionality (such as task synchronization and scheduling) to HW, much faster execution may be obtained. This improvement came at a cost of only 13K gates.
1-4244-1031-2/07/$25.00 ©2007 IEEE
Although much research has been done on this topic [2-6], there is still no commercial RTOS that takes advantage of this feature. The researchers hoped that the fast inter-chip communication introduced by SoCs would overcome the remaining hurdles. The main contributions of this paper are as follows: (i) we analyze and evaluate the effect of soft-errors on the services provided by a purely software-based RTOS (SW-RTOS) and by a Hardware/Software RTOS (HW/SW-RTOS); (ii) we propose an RTOS architecture that provides services that are more effective and more robust with respect to soft-errors. The rest of the paper is organized as follows: Section 2 introduces basic preliminaries and definitions; Section 3 explains the experimental framework used in this research; Section 4 presents our experimental results; and Section 5 concludes the work.
2. PRELIMINARIES
A Real-Time Operating System (RTOS) provides an "abstraction layer" that hides the hardware details of the processor (or set of processors) from the software layer. In providing this abstraction layer, the RTOS kernel supplies four main types of basic services to application software; Figure 1 shows these services.
[Figure 1: basic RTOS kernel services. Panels: Task Management; InterProcess Communication (IPC) & Synchronization; Time Management (Timer); Dynamic Memory Allocation.]
Task Management is the most basic category of kernel services. This module provides services such as task creation, task scheduling, and priority assignment. With the help of these services, software developers can partition their design into a number of separate pieces of software, each of which handles a distinct topic, a distinct goal, and perhaps its own real-time deadline. Each such piece of software is called a "task." Furthermore, the Task Scheduler controls the execution of application software tasks and can make them run in a very timely and responsive fashion.
The second category of kernel services shown in Figure 1 is InterProcess (InterTask) Communication and Synchronization. To pass information from one process (task) to another, the software developer should be familiar with these RTOS services. They make it possible for tasks to pass information to one another without danger of that information being damaged. They also make it possible for tasks to coordinate so that they can cooperate productively. Because most real-time applications that run under an RTOS have stringent timing requirements, most RTOS kernels also provide basic Timer services such as task delays and time-outs.
Dynamic Memory Allocation services are another category often provided by RTOS kernels. This category of services allows tasks to "borrow" blocks of RAM for temporary use in application software. Often these blocks of memory are then passed from task to task as a means of quickly communicating large amounts of data between tasks.
Many non-real-time operating systems also provide similar kernel services. The key difference between general-computing operating systems and real-time operating systems is the need for deterministic timing behavior in the latter. Formally, deterministic timing means that operating system services consume only known and expected amounts of time.
3. EXPERIMENTAL FRAMEWORK
In this section we present our approach for both designing a HW/SW-RTOS and injecting faults into the proposed model.
[Figure 2: standard software RTOS (SW-RTOS, top) and the proposed HW/SW-RTOS (bottom). Both panels show an Application Level with a SW part (tasks Ts1, Ts2, Ts3) and a HW part (HW1, HW2) above an RTOS Level. SW-RTOS panel labels: RTOS Kernel, Context Switch, scheduler, MMU, Data Exchanger, Buffers. HW/SW-RTOS panel labels: RTOS Kernel, Hardware Scheduler (callRTOS, nextTask), Context Switch, Data Exchanger, scheduler, MMU, SocLock, CACHE.]
RTOS we directly update the memory locations used to implement semaphores and mutexes. Interprocess communication, in contrast, is performed by generating standard bus transactions; consequently it is done more efficiently in HW/SW-RTOS than in the SW-RTOS implementation. Figure 2 compares a standard software RTOS (top) with our proposed HW/SW-RTOS architecture (bottom). As shown in Figure 2, the operating system in the proposed HW/SW-RTOS is composed of three parts: (i) the Scheduling unit, (ii) the DataExchanger unit, and (iii) the Context switching unit.
3.2.2. DataExchanger
The DataExchanger unit uses buffers to pass data between tasks. When a task wants to send data to another task, it gives the DataExchanger unit the identifier of the destination task and the value to be sent. The DataExchanger then manages its internal buffers to guarantee that the value reaches the specified task. Conversely, when a task needs data provided by another task, it gives the DataExchanger unit the identifier of the source task. The DataExchanger then blocks the waiting task and calls the scheduler unit, which sends the identifier of the next schedulable task to the CPU.
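The send and receive behavior described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware unit itself: the class name `DataExchanger`, the methods `send` and `receive`, the per-pair FIFO buffers, and the `pick_next` scheduler callback are all hypothetical names introduced for this example. The sketch also assumes that a receive for which data is already buffered returns immediately instead of blocking.

```python
from collections import defaultdict, deque


class DataExchanger:
    """Toy software model of the DataExchanger unit (all names hypothetical)."""

    def __init__(self, scheduler):
        # One FIFO buffer per (source task, destination task) pair.
        self.buffers = defaultdict(deque)
        # Receivers blocked while waiting for data: destination -> source.
        self.blocked = {}
        # Callback standing in for the scheduler unit; it returns the
        # identifier of the next schedulable task.
        self.scheduler = scheduler

    def send(self, src, dst, value):
        """Buffer `value` for `dst` and unblock `dst` if it was waiting on `src`."""
        self.buffers[(src, dst)].append(value)
        if self.blocked.get(dst) == src:
            del self.blocked[dst]

    def receive(self, dst, src):
        """Return ('data', value) if a value is buffered; otherwise block
        `dst` and return ('switch', next_task_id) from the scheduler."""
        fifo = self.buffers[(src, dst)]
        if fifo:
            return ("data", fifo.popleft())
        self.blocked[dst] = src
        return ("switch", self.scheduler(set(self.blocked)))


def pick_next(blocked_tasks, all_tasks=("T1", "T2", "T3")):
    """Hypothetical scheduler: the first task that is not blocked."""
    for task in all_tasks:
        if task not in blocked_tasks:
            return task
    return None
```

For instance, `DataExchanger(pick_next).receive("T1", "T2")` blocks T1 and hands the CPU to T2; a later `send("T2", "T1", value)` makes T1 schedulable again.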
The fault tracer collects information about the services that are currently executing in the RTOS; both SW-RTOS and HW/SW-RTOS inform this part of the FIE about the kinds of services that are active.
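A minimal sketch of how a tracer and a fault generator could cooperate is given below. It assumes that a soft error is modeled as a single bit flip in a memory word, which is a common single-event-upset model but is not confirmed by the surviving text as the one used here. The 32-bit word width, the dictionary-based memory, and the names `FaultTracer`, `FaultGenerator`, `notify`, and `inject` are all hypothetical.

```python
import random

WORD_BITS = 32  # assumed word width


def flip_bit(word, bit):
    """Return `word` with one bit inverted (single bit flip model)."""
    return (word ^ (1 << bit)) & ((1 << WORD_BITS) - 1)


class FaultTracer:
    """Records which RTOS service is currently active (hypothetical interface)."""

    def __init__(self):
        self.active_service = None

    def notify(self, service):
        # Called by the RTOS model whenever a service starts executing.
        self.active_service = service


class FaultGenerator:
    """Injects bit flips into a memory model and logs the active service."""

    def __init__(self, memory, tracer, seed=0):
        self.memory = memory          # dict: address -> word value
        self.tracer = tracer
        self.rng = random.Random(seed)
        self.log = []                 # (service, address, bit) per injection

    def inject(self, address):
        """Flip one randomly chosen bit of the word stored at `address`."""
        bit = self.rng.randrange(WORD_BITS)
        self.memory[address] = flip_bit(self.memory[address], bit)
        self.log.append((self.tracer.active_service, address, bit))
        return bit
```

Logging the active service with each injection is what would let the results be broken down per service category afterwards.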
[Figure: fault injection environment. Legible labels: Fault Generator, Fault Injection, Fault Tracer, activate; SW-RTOS and HW/SW-RTOS, each with CPU, MEM, and tasks TS1, TS2, TS3.]
sends a message periodically into a mailbox, while T2, the receiver, consumes the message and uses it in its future operations.
Group4: tasks T1, T2, and T3 access a global variable protected by a mutual exclusion semaphore (mutex).
Group5: tasks T1, T2, T3, and T4 access a global variable using semaphore1 (sem1), while tasks T5, T6, T7, and T1 access a global variable using semaphore2 (sem2).
Group6: tasks T1 and T2 access a global variable using a mutex; each of them, once it gains access to the global variable, sends the result of its computation to the message queue (QM), and finally task T3 receives its message from the message queue (QM).
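To make the Group6 interaction concrete, the sketch below reproduces its structure with Python threads standing in for RTOS tasks. It is not the benchmark code used in the experiments: the function name `run_group6`, the increments applied to the shared variable, and the choice to let T3 read one message per producer are assumptions made for illustration.

```python
import queue
import threading


def run_group6(increments=(("T1", 1), ("T2", 10))):
    """Run a Group6-style workload once and return (messages, final value)."""
    shared = {"global_var": 0}      # the global variable
    mutex = threading.Lock()        # protects the global variable
    qm = queue.Queue()              # message queue (QM)
    received = []

    def worker(name, increment):
        # T1 / T2: update the global variable under the mutex,
        # then post the result to the message queue.
        with mutex:
            shared["global_var"] += increment
            result = shared["global_var"]
        qm.put((name, result))

    def receiver(n_messages):
        # T3: receive the messages posted by the producers.
        for _ in range(n_messages):
            received.append(qm.get())

    t3 = threading.Thread(target=receiver, args=(len(increments),))
    workers = [threading.Thread(target=worker, args=item) for item in increments]
    for thread in [t3, *workers]:
        thread.start()
    for thread in [*workers, t3]:
        thread.join()
    return received, shared["global_var"]
```

Corrupting the lock state or the queue contents in such a workload is the kind of disturbance that can surface as incorrect results or hung tasks at the application level.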
[Figure: task groups used in the experiments. Legible labels: Group2, Group5, Group6; tasks T1 to T7; Global VAR; Mutex; QM.]
4. Experimental Results
This section describes and analyzes the results obtained, to give evidence of the consequences of soft-errors for a real-time application. Transient faults may cause several malfunctions when the real-time kernel's services are corrupted. These malfunctions are classified as follows. Safe: no visible effect on system functionality. Application failure: a class of faults with some effect at the application level, which can be subdivided into (i) incorrect output results, (ii) real-time problem, and (iii) process hanging (the system continues working but some processes stop their operations). Application exception: one or more application tasks trigger an exception routine (e.g. illegal instruction or division by zero). System crash: the system stops functioning.
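The malfunction classes used in this section can be encoded as a small classifier over the observations from one fault-injection run. This is a sketch under assumptions: the `RunObservation` fields and the precedence applied when several symptoms occur together (crash, then exception, then hanging, then timing, then output) are hypothetical choices, since the paper does not specify how overlapping symptoms are resolved.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SAFE = "safe"
    INCORRECT_OUTPUT = "application failure: incorrect output results"
    REAL_TIME_PROBLEM = "application failure: real-time problem"
    PROCESS_HANGING = "application failure: process hanging"
    APPLICATION_EXCEPTION = "application exception"
    SYSTEM_CRASH = "system crash"


@dataclass
class RunObservation:
    """Symptoms observed after one injected fault (hypothetical fields)."""
    crashed: bool = False
    exception_raised: bool = False
    hung_tasks: int = 0
    deadline_missed: bool = False
    output_correct: bool = True


def classify(obs):
    """Map the observations of one run to a single malfunction class.

    The precedence below is an assumption made for this sketch.
    """
    if obs.crashed:
        return Outcome.SYSTEM_CRASH
    if obs.exception_raised:
        return Outcome.APPLICATION_EXCEPTION
    if obs.hung_tasks > 0:
        return Outcome.PROCESS_HANGING
    if obs.deadline_missed:
        return Outcome.REAL_TIME_PROBLEM
    if not obs.output_correct:
        return Outcome.INCORRECT_OUTPUT
    return Outcome.SAFE
```

Counting the outcomes of `classify` per active service would give the kind of per-service breakdown reported in the figures.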
[Figure: chart of fault-injection outcomes per service. Service labels: Memory Management, Synchronization, Task Management, Time Management, Scheduler. Legend: System Crash, Exception. Axis ticks: 34 to 46.]
A remarkable feature of our results, apparent from Figures 5 and 6, is that all services provided by HW/SW-RTOS are more robust than the same services provided by eCos (SW-RTOS). Figure 7 shows the effectiveness of HW/SW-RTOS services in terms of reliability with respect to soft-errors.
[Figure: chart with categories Safe, Application Failure, System Crash, Exception. Axis ticks: 0 to 80.]
Figure 8 shows the hardware overhead of the different units of HW/SW-RTOS. As shown in this figure, the HW/SW-RTOS implementation imposes a hardware overhead of 12830 gates.
[Figure 8: hardware overhead of the HW/SW-RTOS units. Vertical axis: Number of Gates.]
Services related to both synchronization and time management are considerably improved, as shown in Figure 7. These improvements can be attributed to the dedicated hardware synchronization part of our HW/SW-RTOS.
5. Conclusion
Real-time applications with safety-critical constraints are often based on real-time operating systems. Real-time operating systems are subject to faults that affect both the correctness of logical results and the timing of task responses. Hardware/Software Real-Time Operating Systems (HW/SW-RTOS) appeared in order to provide predictable response time at an affordable cost. In this paper, we analyzed the impact of soft-errors on real-time applications running under an RTOS implemented in HW/SW. Our experimental results show that soft-errors occurring in a real-time operating system (in either the SW or the HW kernel) have a major impact on the system's behavior. Moreover, it was found that all groups of eCos services have the same sensitivity profile. The experimental results also show that HW/SW-RTOS services are more robust to soft-errors than SW-RTOS services. In particular, the synchronization services provided by HW/SW-RTOS are considerably more robust than those of SW-RTOS, owing to the dedicated synchronization hardware.
6. Acknowledgment
The authors wish to acknowledge the Iran Telecommunication Research Center (ITRC) for partial financial support during the course of this research.
7. References
[1] S. Chandra, F. Regazzoni, and M. Lajolo, Hardware/Software Partitioning of Operating Systems: a Behavioral Synthesis Approach, in Proc. GLSVLSI'06, pp. 324-329.
[2] V. J. Mooney and D. M. Blough, A hardware-software real-time operating system framework for SoCs, IEEE Des. Test, vol. 19, no. 6, pp. 44-51, 2002.
[3] M. Imai, Hardware implementation of a real-time operating system, in Proceedings of the 12th TRON Project International Symposium, pp. 34-42, 1995.
[4] ... soc designs, in SAC '04: Proceedings of the 2004 ACM Symposium on Applied Computing, (New York, NY, USA), pp. 869-875, ACM Press, 2004.
[5] ... Hardware Operating System based Approach for Run-time Reconfigurable Platform of Embedded Devices, in 6th Real Time Linux Workshop, (Singapore), Nov 3-5 2004.
[6] L. Lindh and F. Stanischewski, Fastchart-idea and ...
[7] ... Error Classification and Impact Analysis on Real-Time Operating Systems, DATE 2006.
[8] Ph. Shirvani, R. Saxena, E.J. McCluskey, Software-implemented EDAC protection against SEUs, IEEE Transactions on Reliability, Vol. 49, No. 3, Sept. 2000.
[9] V. Izosimov, P. Pop, P. Eles, Z. Peng, Design optimization of time- and cost-constrained fault-tolerant distributed embedded systems, Design, Automation and Test in Europe, Munich, Germany, 7-11 March 2005, pp. 864-869.
[10] S. Ghosh, R. Melhem, D. Mossé, J. Sarma, Fault-tolerant Rate Monotonic Scheduling, Journal of Real-Time Systems, vol. 15, no. 2, September 1998.
[11] P. Mejia-Alvarez, D. Mossé, A responsiveness approach for scheduling fault-recovery in real-time systems, 5th Real-Time Technology and Applications Symposium, 2-4 June 1999, pp.4
[12] http://en.wikipedia.org/wiki/eCos
[13] B. Nicolescu, N. Ignat, Y. Savaria, G. Nicolescu, Sensitivity of Real-Time Operating Systems to Transient Faults: A case study for MicroC kernel, IEEE Radiation and its Effects on Components and Systems, Cap d'Agde, France, Sept. 19-23, 2005.
[14] Motorola HC12 CPU awareness and true-time ...