Tuesday, October 23, 2007

TP#7



Thoughtful Problem #7:


The core mission of any school is to promote powerful learning through effective teaching. As technology director, you are responsible for systems that are directly related to this core mission of teaching and learning such as Internet access, student reporting and curriculum systems. You are also responsible for systems that relate to student safety, finance and personnel that are fundamental to running the institution on a daily basis. You are told that network-related issues, perhaps virus-related, are interrupting the operation of all major systems.
Which systems do you try to stabilize and restore to service first? Why? What systems might be your second priority? Which systems would have a lower priority? Answer in 3-5 paragraphs using systems that provide clear examples and choices (you don't need to discuss lots of enterprise or auxiliary systems).




Response
Every technology department should have a well-defined and tested disaster recovery plan (DRP). As the tech director, you are responsible for making sure your organization's plan is well defined and updated as new technology is deployed. With this plan in hand, your response to this problem should already be scripted. Now you must put your plan to work.
One of the most important parts of the "recovery" section of a DRP is the systems analysis. This analysis should list every system in the organization and rate each one by its criticality to the organization. The rating includes an allowable downtime, and ease of recovery and other conditions are recorded alongside it. This then becomes your recoverability guideline.
Most importantly, the analysis needs to be realistic. Users of each system will always say their system is the most important, so the analysis needs to be conducted without bias. In addition, realistic downtimes must be listed. If the email system is rated as having an allowable downtime of 4 hours, then the appropriate resources must be in place to support this. Otherwise, the plan is worthless.
Here is an example of a systems analysis:











In the time section, a six means less than 4 hours, a five is 4-6 hours, a four is 7-12 hours, a three is 13-24 hours, a two is 24-48 hours, and a zero means the system has no time criticality and restoration can occur once all other systems are stable. According to this chart, the first system I would get working is my directory service, Active Directory. The other headings refer to the conditions that may impact recovery.
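
For readers who want to keep such an analysis in a script rather than a spreadsheet, here is a minimal sketch of the time scale described above. It covers only the time rating, not the other chart headings. The class name, function name, the systems other than Active Directory and email, and every rating value are hypothetical and chosen for illustration; they are not taken from the chart. A rating of one is left out because it is not defined above.

```python
from dataclasses import dataclass

# Time ratings as described in the post. A rating of 1 is not defined
# there, so it is omitted rather than guessed.
ALLOWABLE_DOWNTIME = {
    6: "less than 4 hours",
    5: "4-6 hours",
    4: "7-12 hours",
    3: "13-24 hours",
    2: "24-48 hours",
    0: "once all other systems are stable",
}


@dataclass
class SystemEntry:
    name: str
    time_rating: int  # must be a key of ALLOWABLE_DOWNTIME


def recovery_order(systems):
    """Return the systems sorted from most to least time-critical."""
    for system in systems:
        if system.time_rating not in ALLOWABLE_DOWNTIME:
            raise ValueError(
                f"{system.name}: unknown time rating {system.time_rating}"
            )
    # sorted() is stable, so systems with equal ratings keep their listed order.
    return sorted(systems, key=lambda s: s.time_rating, reverse=True)


# Hypothetical entries and ratings, for illustration only.
systems = [
    SystemEntry("Email", 5),
    SystemEntry("Library catalog", 0),
    SystemEntry("Active Directory", 6),
    SystemEntry("Student reporting", 3),
]

for system in recovery_order(systems):
    print(f"{system.name}: restore within {ALLOWABLE_DOWNTIME[system.time_rating]}")
```

With these made-up ratings, Active Directory comes out first and the zero-rated system last, which matches the order of work described in this post. Ratings alone do not capture dependencies between systems, so a real plan would still need a person to review the order.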

Active Directory makes sense as the first system because it is the system that allows users to log on to the network at all. If users cannot log in, all other systems are irrelevant. Once this system is stable and users are able to log on, the next step would be to restore mail flow. This allows communication to begin flowing again.

Many may think that the phone system is the most important system to recover first, because otherwise the district is not reachable in an emergency. In reality, one quick call to the phone vendor is enough to have all incoming calls routed to a cell phone or an analog phone that is separate from the district phone system. With this in place, phones are "restored" before many even know there is an issue.

Once logon and mail flow are restored, you can begin to work through the other systems in the chart until all systems are stabilized. Without this type of plan, valuable time may be wasted just after the outage deciding what to recover first.

2 comments:

Lifang said...

It is really interesting that “Users of each system will always say their system is the most important”. Teachers may think their assignment grading is the most important, and financial staff may argue the accounting system should be the top priority for a school.

People take it for granted that the network works well all the time. When something is wrong with the school’s network, the technology technician will work like a firefighter. So the technology director should be concerned not only with the disaster recovery plan but also with daily maintenance for network stability.

Tim McCann said...

In my own personal blog post I had a hard time coming up with a name and structure for ranking importance related to systems. I like the DRP ideas and its ranking system. I thought about the concept of RECOVERY TIME in my post, but I was unsure of the actual time it would take to restore certain systems (I would have to consult with the network group). Still, I see a lot of validity in using it in the ranking. I too agree with the BIAS issue, and I realize that this is a school and teachers need to teach, BUT I know from teaching experience that technology cannot be relied upon to be working all the time. Non-technological backup plans always need to be thought about, and the act of teaching can be done without technology (even if teachers and students are inconvenienced). Communication is key: if people are informed of the plan, then I feel that all parties involved will be more inclined to understand the process.