Oh, what to do? The boss wants the system up 100% of the time, but he wants to spend nothing to do it.
How do I cope?
What if there was a way to give him a number that quantifies availability, so that the cost of each step toward a more reliable system could be weighed as a cost-to-benefit ratio?
High availability is more complicated than having five 9’s. There are a lot of things that can be done to improve the availability profile of an IT system, and they are varied and have different effects on the system. When communicating everything that needs to be done, it's easy to get lost in the weeds.
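For reference, five 9’s means 99.999% uptime. The short Python sketch below, assuming a 365-day year and counting every outage (planned or not) against the budget, converts a number of nines into the downtime it allows per year. The function names are illustrative only; they are not part of the spreadsheet.

```python
# Convert an availability target ("number of nines") into allowed downtime.
# Assumption: a 365-day year; leap years and maintenance windows are ignored.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def nines_to_availability(nines: int) -> float:
    """Return availability as a fraction, e.g. 5 nines -> 0.99999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes per year the system may be down at the given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in range(2, 6):
        a = nines_to_availability(n)
        print(f"{n} nines ({a:.3%}): {downtime_minutes_per_year(a):,.2f} min/year")
```

Each extra nine divides the allowed downtime by ten: roughly 8.76 hours a year at three 9’s versus about 5.26 minutes at five.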
One of the problems is that non-techies see only the cost, along with the big companies' marketing of expensive solutions built on fancy equipment. But high availability is not just about cost; it is about taking advantage of what you already have and what you could easily do. Of course, good-quality equipment is required for good results: unreliable, inexpensive hardware is, well, unreliable. However, there are many things that can be done with new or existing equipment to get the maximum reliability it is capable of. Some of them are inexpensive, and we need to track them and make sure they are both done and maintained.
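To make the cost-to-benefit idea concrete, here is a minimal Python sketch that ranks candidate improvements by score gained per unit of cost. The `Improvement` type, its fields, and the example figures are hypothetical; in practice the score gain would come from the scoring spreadsheet and the cost from your own quotes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Improvement:
    """One candidate step toward higher availability (hypothetical model)."""
    name: str
    cost: float        # cost in any currency unit; 0 means "free"
    score_gain: float  # increase in the availability score if the step is done


def benefit_per_cost(item: Improvement) -> float:
    """Score gained per unit of cost; free steps with any gain rank first."""
    if item.cost <= 0:
        return float("inf") if item.score_gain > 0 else 0.0
    return item.score_gain / item.cost


def rank_improvements(items: list[Improvement]) -> list[Improvement]:
    """Order steps so the best value (most gain per unit cost) comes first."""
    return sorted(items, key=benefit_per_cost, reverse=True)


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    candidates = [
        Improvement("Extra cluster node", cost=10_000, score_gain=8),
        Improvement("Redundant power supply", cost=500, score_gain=3),
        Improvement("Enable alerts on existing monitoring", cost=0, score_gain=2),
    ]
    for step in rank_improvements(candidates):
        print(f"{step.name}: {benefit_per_cost(step):.4f} points per unit cost")
```

Treating free steps as infinitely good value puts the "use what you already have" items at the top of the list.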
To overcome this, I created a spreadsheet that implements the Availability Scoring system as a prototype. The scoring is subjective, based on my perspective as a Microsoft Exchange engineer; it's a starting point for discussion. Your site and technology might change which options you select from to make the scores work. For example, a different backup strategy would have a different maintainability profile, and of course there are different considerations for different kinds of servers and systems. Take the ball and run with it: I am sure there are many things I didn't think of that you could do to improve the reliability of the servers in your environment. Look at the aspects of the scoring system below and think of features and functions you're not using that may already be available, or could be added easily or cheaply.
Excel Calculator Spreadsheet for Availability calculation
(It is a multi-page spreadsheet made with Microsoft Excel. Page 1 contains the instructions, page 2 is the input page, and page 3 is the results page.)
These are the aspects of Availability Scoring:
Aspect | Description
Resistance | A measure of durability: how a feature or set of features affects resistance to an outage caused by a failure or an attack.
Resilience | Recovery from a failure: whether a system can repair, reset, or overcome an attack or failure and bounce back from an outage.
Maintainability | How the feature affects upkeep and the cost in labor and time to keep the system running smoothly, covering all aspects such as adding users, user support, configuration, patching, service, upgrades, etc.
Recoverability | The ability of a system to be repaired after a failure, whether partial, minor, major, catastrophic, or total.
Security | A configuration, service, or device that protects the C.I.A. (confidentiality, integrity, and availability) of a system.
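As a rough sketch of how such scores might be combined (the spreadsheet's own formulas may differ), assuming each feature in place gets a subjective point value per aspect and the overall score is an unweighted sum, the Python below totals the five aspects from the table. The function names, feature names, and point values are hypothetical.

```python
from __future__ import annotations

# The five aspects of Availability Scoring.
ASPECTS = ("Resistance", "Resilience", "Maintainability", "Recoverability", "Security")


def aspect_totals(selected: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum each aspect's score over the features that are in place.

    `selected` maps a feature name to its per-aspect scores; aspects a
    feature does not affect can be left out (they count as 0).
    """
    totals = {aspect: 0 for aspect in ASPECTS}
    for feature, scores in selected.items():
        for aspect, points in scores.items():
            if aspect not in totals:
                raise ValueError(f"{feature}: unknown aspect {aspect!r}")
            totals[aspect] += points
    return totals


def availability_score(selected: dict[str, dict[str, int]]) -> int:
    """Overall score: the plain, unweighted sum of all aspect totals."""
    return sum(aspect_totals(selected).values())


if __name__ == "__main__":
    # Hypothetical features and point values, for illustration only.
    in_place = {
        "Nightly backup": {"Recoverability": 3, "Maintainability": 1},
        "Antivirus": {"Security": 2, "Resistance": 1},
    }
    print(aspect_totals(in_place))
    print("Overall:", availability_score(in_place))
```

A weighted sum would be an easy variation if your site cares more about one aspect than another.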
In the right hands, this can be a powerful tool for an organization to manage, communicate, and comprehend the complexity of making its IT infrastructure much more reliable and maintainable.
Oh, and leave me a note of what you make of it and how it helps you!