Thoughtful Problem #7:
The core mission of any school is to promote powerful learning through effective teaching. As technology director, you are responsible for systems that are directly related to this core mission of teaching and learning such as Internet access, student reporting and curriculum systems. You are also responsible for systems that relate to student safety, finance and personnel that are fundamental to running the institution on a daily basis. You are told that network-related issues, perhaps virus-related, are interrupting the operation of all major systems.
Which systems do you try to stabilize and restore to service first? Why? What systems might be your second priority? Which systems would have a lower priority? Answer in 3-5 paragraphs using systems that provide clear examples and choices (you don't need to discuss lots of enterprise or auxiliary systems).
Response
Every technology department should have a well-defined and tested disaster recovery plan (DRP). As the tech director, it is your responsibility to make sure your organization's plan is well defined and updated as new technology is deployed. With this plan in hand, your response to this problem should already be scripted. Now you must put the plan to work.
One of the most important parts of the "recovery" section of a DRP is the systems analysis. This analysis should list every system in the organization and rate each one by its criticality to the organization. Each rating records an allowable downtime, the ease of recovery and any other conditions that affect restoration. This then becomes your recoverability guideline.
Most importantly, the analysis needs to be realistic. Users of each system will always say their system is the most important, so the analysis needs to be conducted without bias. In addition, realistic downtimes must be listed. If the email system is rated as having an allowable downtime of 4 hours, then the appropriate resources must be in place to support this. Otherwise, the plan is worthless.
In the time section, a six means less than 4 hours, a five is 4-6 hours, a four is 7-12 hours, a three is 13-24 hours, a two is 24-48 hours and a zero means the system has no criticality and restoration can occur once all other systems are stable. The other headings refer to the conditions that may impact recovery. According to this chart, the first system I would get working is my directory service, Active Directory.
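The time ratings described above can be sketched as a simple lookup. This is only an illustration, assuming the ratings are stored as integers; the names `TIME_RATING_HOURS` and `allowable_downtime` are hypothetical and not part of any actual plan. A rating of one is not defined in the scale, so the sketch leaves it out rather than guessing.

```python
# Time ratings from the scale in the text, mapped to allowable downtime
# windows in hours as (lower bound, upper bound).
# A rating of 1 is not defined in the text, so it is deliberately omitted.
TIME_RATING_HOURS = {
    6: (0, 4),
    5: (4, 6),
    4: (7, 12),
    3: (13, 24),
    2: (24, 48),
}


def allowable_downtime(rating):
    """Return a readable downtime window for a time rating (hypothetical helper)."""
    if rating == 0:
        # Zero means no criticality: restore only after everything else is stable.
        return "no criticality; restore after all other systems are stable"
    if rating not in TIME_RATING_HOURS:
        raise ValueError(f"rating {rating} is not defined in the scale")
    low, high = TIME_RATING_HOURS[rating]
    if low == 0:
        return f"less than {high} hours"
    return f"{low}-{high} hours"
```

Keeping the scale in one place like this makes it easier to check that every system in the analysis carries a rating the plan actually defines.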
Active Directory makes sense as the first system because it is what allows users to log on to the network. If users cannot log in, all other systems are irrelevant. Once it is stable and users are able to log on, the next step is to restore mail flow so that communication can resume.
Many may think that the phone system is the most important system to recover first because the district cannot be reached in an emergency without it. In reality, a quick call to the phone vendor can reroute all incoming calls to a cell phone or an analog phone that is separate from the district phone system. With this in place, phones are "restored" before many even know there is an issue.
Once logon and mail flow are restored, you can work through the other systems in the chart until all systems are stabilized. Without this type of plan, valuable time may be wasted just after the outage deciding what to recover first.
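The recovery order can be sketched as a sort on the time rating, highest first. This is a minimal illustration, not the actual chart: the `SystemEntry` class, the `recovery_order` function and every rating below are hypothetical, chosen only so that Active Directory and email come out first and second as described above. "Library catalog" is an invented placeholder for a low-priority system.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    name: str
    time_rating: int  # 6 = most urgent, 0 = restore last


def recovery_order(systems):
    """Return system names sorted so the highest time rating comes first."""
    # sorted() is stable, so systems with equal ratings keep their listed order.
    ranked = sorted(systems, key=lambda s: s.time_rating, reverse=True)
    return [s.name for s in ranked]


# Hypothetical ratings for illustration only; real values come from the chart.
systems = [
    SystemEntry("Email", 5),
    SystemEntry("Library catalog", 0),
    SystemEntry("Active Directory", 6),
    SystemEntry("Student reporting", 3),
]

print(recovery_order(systems))
# ['Active Directory', 'Email', 'Student reporting', 'Library catalog']
```

A plain sort ignores the other chart headings, such as ease of recovery, so a real plan would likely weigh those conditions as well.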
