
Failure Rates

Terminology:

MTTF - Mean time to failure
MTBF - Mean time between failures
MTTR - Mean time to repair

Availability = MTTF/MTBF, where MTBF = MTTF + MTTR
Downtime = 1 - Availability
MTTF of disk = 300,000 hours ~= 34 years
  o This is a misleading measure; no one really knows, but it's probably lower.
AFR (Annualized Failure Rate) is a more meaningful measurement.
  o A normal bathtub curve would be expected, but the data doesn't show one.
  o In reality, it starts low (due to factory testing), then trends sharply upwards and levels off at about 10%.
The RAID 5 failure rate increases more sharply and to a higher percentage. This is because if two drives fail, the data on the whole array is lost.
Many RAID boxes have a hot spare they switch to in case of a failure.
ASSUMPTION: Failures are independent.
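The arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions: the AFR conversion assumes a constant failure rate (exponential lifetimes), which the notes say real disks do not follow, and the RAID 5 estimate assumes independent failures and ignores hot spares and rebuild time. The function names are made up for this example.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760; leap years ignored


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / MTBF, with MTBF = MTTF + MTTR."""
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


def afr_from_mttf(mttf_hours: float) -> float:
    """Annualized failure rate, ASSUMING a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def raid5_loss_probability(n_drives: int, p_drive: float) -> float:
    """Chance that two or more of n drives fail, ASSUMING independence.

    p_drive is the per-drive failure probability over the period of interest.
    """
    p_none = (1 - p_drive) ** n_drives
    p_exactly_one = n_drives * p_drive * (1 - p_drive) ** (n_drives - 1)
    return 1 - p_none - p_exactly_one


if __name__ == "__main__":
    print("MTTF in years:", 300_000 / HOURS_PER_YEAR)
    print("AFR implied by that MTTF:", afr_from_mttf(300_000))
    print("Availability with a 1-hour repair:", availability(300_000, 1))
    print("5-drive RAID 5, 10% per-drive rate:", raid5_loss_probability(5, 0.10))
```

The quoted MTTF implies an AFR of only a few percent, well below the roughly 10% the field data levels off at, which is one way to see why the MTTF figure is misleading.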

Distributed Systems

In a distributed system, a client cannot reach a service by trapping into kernel mode, because the service runs on another machine. Instead, the two sides use message passing.
Specifically, they use remote procedure calls (RPC).
  o An RPC is abstracted to look like a normal function call, like close(19).
Under the hood, it's very different:
Hard modularity comes for free, since caller and callee run on separate machines.
On the other hand, we lose call by reference, because there is no shared address space.
It is limited by network bandwidth.
It is far less secure.
Messages can get lost or duplicated.
The server can become a bottleneck (however, this can happen to anything).
Machines may have different architectures.
  o This can lead to issues in interpretation, for example big-endian vs. little-endian.
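To make the interpretation problem concrete, here is a small sketch of what happens when a 4-byte integer written by a little-endian machine is read as if it were big-endian. The variable names are illustrative only.

```python
value = 1

# The same integer laid out in memory on the two kinds of machine.
little = value.to_bytes(4, byteorder="little")  # least significant byte first
big = value.to_bytes(4, byteorder="big")        # most significant byte first

# A big-endian receiver that is handed the little-endian bytes unchanged
# decodes a completely different number.
misread = int.from_bytes(little, byteorder="big")

if __name__ == "__main__":
    print(little.hex(), big.hex(), misread)
```

Here the value 1 is misread as 16777216 (0x01000000), so both sides must agree on a byte order before exchanging binary data.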

Dealing with architectural differences


Standard interchange format: XML text (this has considerable network and CPU overhead).
Have everybody know everybody else's architecture (this doesn't scale: O(N^2)).
Standardize on big-endian (this is how it is done on the net).
  o This is what is normally done.
  o Other architectural differences are handled the same way: pick one standard representation.
  o In the case of big-endian vs. little-endian, there's a machine instruction in x86 that swaps back and forth between the two.
Marshalling: Converting data in memory to a data format suitable for storage or transmission.
  o This is also known as serialization, or pickling.
  o It's a pain to do this by hand every time you do a function call.
  o Instead, stubs are used (automatically derived from a protocol description).
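As a rough illustration of marshalling with a standardized byte order, the sketch below packs the arguments of a draw request into big-endian bytes using Python's struct module. The wire layout (two unsigned 32-bit coordinates, a 1-byte length, then the color name in UTF-8) is an assumption invented for this example, not a format described in these notes; in practice a stub generator would emit code like this from a protocol description.

```python
import struct

# "!" selects network (big-endian) byte order with standard sizes and no padding.
# ASSUMED layout: x (uint32), y (uint32), length of color name (uint8).
HEADER = struct.Struct("!IIB")


def marshal_draw(x: int, y: int, color: str) -> bytes:
    """Convert in-memory arguments to bytes suitable for transmission."""
    name = color.encode("utf-8")
    if len(name) > 255:
        raise ValueError("color name too long for a 1-byte length field")
    return HEADER.pack(x, y, len(name)) + name


def unmarshal_draw(payload: bytes) -> tuple:
    """Recover (x, y, color) from bytes, whatever the receiver's architecture."""
    x, y, name_len = HEADER.unpack_from(payload, 0)
    name = payload[HEADER.size:HEADER.size + name_len]
    if len(name) != name_len:
        raise ValueError("truncated message")
    return x, y, name.decode("utf-8")


if __name__ == "__main__":
    wire = marshal_draw(10, 20, "blue")
    print(wire.hex())
    print(unmarshal_draw(wire))
```

Because the format string fixes the byte order, a little-endian sender and a big-endian receiver produce and consume identical bytes.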

Example of RPC
Scenario: A client talking to a window server.
Request: Draw 10 20 blue
Response: OK
Result: The pixel at coordinates (10,20) is turned blue.

To optimize, simple commands could be combined into more complex ones.
  o For example, there could be a rectangle command to draw a rectangle, instead of just a point.
  o This complicates the API further, though.
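The window-server exchange can be sketched as a client stub that hides the message format behind ordinary-looking method calls. Everything here is hypothetical: the class names, the "Rect" command and its argument order, and the "ERROR" response are assumptions for illustration, and the "network" is just a direct function call.

```python
from typing import Callable, Dict, Tuple


class WindowServer:
    """Toy server: parses text requests and updates a pixel map."""

    def __init__(self) -> None:
        self.pixels: Dict[Tuple[int, int], str] = {}

    def handle(self, request: str) -> str:
        parts = request.split()
        try:
            if len(parts) == 4 and parts[0] == "Draw":
                x, y = int(parts[1]), int(parts[2])
                self.pixels[(x, y)] = parts[3]
                return "OK"
            if len(parts) == 6 and parts[0] == "Rect":  # hypothetical command
                x, y, width, height = (int(p) for p in parts[1:5])
                for dx in range(width):
                    for dy in range(height):
                        self.pixels[(x + dx, y + dy)] = parts[5]
                return "OK"
        except ValueError:
            pass  # non-numeric coordinates fall through to the error reply
        return "ERROR"


class WindowClient:
    """Client stub: each method looks local but sends one request message."""

    def __init__(self, send: Callable[[str], str]) -> None:
        self._send = send  # stands in for the network transport

    def draw(self, x: int, y: int, color: str) -> None:
        self._call(f"Draw {x} {y} {color}")

    def rect(self, x: int, y: int, width: int, height: int, color: str) -> None:
        self._call(f"Rect {x} {y} {width} {height} {color}")

    def _call(self, request: str) -> None:
        response = self._send(request)
        if response != "OK":
            raise RuntimeError(f"server rejected {request!r}: {response}")


server = WindowServer()
client = WindowClient(server.handle)
client.draw(10, 20, "blue")        # one message, one pixel
client.rect(0, 0, 2, 3, "red")     # one message, six pixels
```

The rectangle call shows the trade-off: one message replaces six, at the cost of a second command that both sides must implement.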

RPC Failure Modes


Lost messages
Duplicated messages
Corrupted messages
Network may be down or slow
Server may be down or slow

Solutions for Lost Messages:
AT-LEAST-ONCE RPC: If no response, resend the request. This isn't always the correct action: if only the response was lost, the request executes twice, which is safe only for idempotent operations.
AT-MOST-ONCE RPC: If no response, report an error.
EXACTLY-ONCE RPC: This is the ideal. It is also very hard.
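A minimal sketch of the first two policies follows, using a simulated server that executes a request but loses its response. The names (CounterServer, the "increment" request, max_tries) are invented for the example, and a real at-least-once client would also need timeouts; bounding the retries is a simplification made here so the loop terminates.

```python
from typing import Callable, Optional

# A transport returns the response text, or None if no response arrived.
Transport = Callable[[str], Optional[str]]


def at_least_once(send: Transport, request: str, max_tries: int = 5) -> str:
    """Resend until some response arrives (bounded here for the sketch)."""
    for _ in range(max_tries):
        response = send(request)
        if response is not None:
            return response
    raise TimeoutError("gave up after repeated attempts")


def at_most_once(send: Transport, request: str) -> str:
    """Send once; on silence, report an error instead of retrying."""
    response = send(request)
    if response is None:
        raise TimeoutError("no response; the request may or may not have run")
    return response


class CounterServer:
    """Simulated server whose first few responses are lost in transit."""

    def __init__(self, lost_responses: int) -> None:
        self.count = 0
        self._lost = lost_responses

    def send(self, request: str) -> Optional[str]:
        if request == "increment":
            self.count += 1      # the request is executed...
        if self._lost > 0:
            self._lost -= 1
            return None          # ...but the response never arrives
        return "OK"


retried = CounterServer(lost_responses=1)
at_least_once(retried.send, "increment")   # succeeds on the second attempt

single = CounterServer(lost_responses=1)
try:
    at_most_once(single.send, "increment")
    error_reported = False
except TimeoutError:
    error_reported = True
```

After the retry, the first server's counter has been incremented twice for one logical call, which is why blind resending is only safe for idempotent requests. The second server ran the request once, but the client cannot tell.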

Performance Issues

Travel time between computers is often the largest source of delay.
  o To alleviate this, we can use asynchronous calls. However, this can cause other problems: if calls depend on each other, an earlier call failing can corrupt the state of the program.
  o Another solution is to coalesce calls. For example, there could be a rectangle command to draw a rectangle, instead of just a point. This complicates the API further.
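A back-of-the-envelope cost model shows why both techniques help. It is deliberately crude and rests on assumptions not in the notes: a fixed round-trip time, a fixed per-call service time, a server that processes calls one after another, and no bandwidth limits. The function names and the numbers in the usage lines are illustrative.

```python
import math


def synchronous_time(n_calls: int, round_trip_s: float, service_s: float) -> float:
    """Each call waits for its response before the next one is sent."""
    return n_calls * (round_trip_s + service_s)


def asynchronous_time(n_calls: int, round_trip_s: float, service_s: float) -> float:
    """All requests are sent back to back; only one round trip is on the critical path."""
    return round_trip_s + n_calls * service_s


def coalesced_time(n_calls: int, batch_size: int,
                   round_trip_s: float, service_s: float) -> float:
    """Calls are grouped into larger commands, one synchronous message per group."""
    n_messages = math.ceil(n_calls / batch_size)
    return n_messages * round_trip_s + n_calls * service_s


if __name__ == "__main__":
    # Hypothetical numbers: 100 pixel draws, 50 ms round trip, 1 ms of server work each.
    print(synchronous_time(100, 0.050, 0.001))
    print(asynchronous_time(100, 0.050, 0.001))
    print(coalesced_time(100, 10, 0.050, 0.001))
```

Under this model the round trip dominates the synchronous case, so removing round trips matters far more than speeding up the server.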

Professional Maintenance
25 November 2007

Moving from repair-driven maintenance, acting as firefighters, to a profession of preventers and experts aiming for zero breakdowns is a cultural change for:

management:
  o recognizing what cannot be seen: preventive work,
  o rather than what can be seen: breakdown repairs
the maintenance manager: a different challenge
the maintenance technicians: a different profession

The objectives of professional maintenance:

Maximize equipment reliability at an economical cost. Improving reliability also improves safety and product quality.
Eliminate unplanned, improvised maintenance activities.
Use the maintenance methods (periodic, condition-based, autonomous, ...) according to machine criticality, for the best cost.
Develop the skills of maintenance personnel and operators to support the professional maintenance strategy.
Create a zero-failure culture.
Plan activities to minimize production stoppages.

Do the minimum work necessary to eliminate breakdowns and make maximum use of the equipment.

The 8 foundations of a maintenance department:
1. Establish the classification of equipment: AA, A, B and C
2. Define the flows of information and parts (work flow management)
3. Develop the machine files: skills, equipment
4. Spare parts management, storeroom
5. Maintenance resources: 5S in the maintenance workshop, subcontractor management, skills of the maintenance technicians
6. Lubrication management
7. Breakdown management, information gathering, analysis
8. Establishment and monitoring of key indicators

Maintenance indicators:

MTBF (Mean Time Between Failures): the mean time between two breakdowns. It measures equipment reliability and the maintenance department's effectiveness at preventing breakdowns.
MTTR (Mean Time To Repair): the mean repair time. It measures the maintenance department's effectiveness at repairing.
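As a sketch of how these two indicators might be computed from a breakdown log, the example below takes a list of (breakdown time, back-in-service time) pairs. The log format, the hours-based timestamps, and the sample data are all hypothetical, and MTBF is taken here as the mean gap between the starts of consecutive breakdowns, which is one reading of the definition above.

```python
from statistics import mean
from typing import List, Tuple

# Each entry: (hour the breakdown started, hour the machine was back in service).
Outage = Tuple[float, float]


def mttr(outages: List[Outage]) -> float:
    """Mean time to repair: average duration of an outage."""
    return mean(end - start for start, end in outages)


def mtbf(outages: List[Outage]) -> float:
    """Mean time between the starts of consecutive breakdowns.

    ASSUMES the log is sorted by breakdown time.
    """
    starts = [start for start, _ in outages]
    if len(starts) < 2:
        raise ValueError("need at least two breakdowns to measure a gap")
    return mean(later - earlier for earlier, later in zip(starts, starts[1:]))


# Hypothetical log for one machine, in hours since the start of observation.
sample_log = [(100, 104), (300, 302), (600, 606)]

if __name__ == "__main__":
    print("MTTR (hours):", mttr(sample_log))
    print("MTBF (hours):", mtbf(sample_log))
```

Tracking both over time separates the two effects the indicators are meant to capture: a rising MTBF reflects prevention, while a falling MTTR reflects faster repair.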

The different types of maintenance:
And the use of the maintenance types according to equipment criticality:
We will not deploy the same resources for a critical AA piece of equipment as for one that is not critical at all!
