Like any large technical organization, we had outages during the six years I was CIO, and the team responded in a very professional manner. All three of the major outages came from external sources.
The most common outage was caused by squirrel attacks. These occurred weekly and mostly went unnoticed by the universities. We ran two separate fiber lines to all universities except College of Coastal Georgia (it was impossible to buy a second fiber line to College of Coastal Georgia at the time), and we worked tirelessly to unfold fiber runs. For non-techies, that means we ensured the two fiber runs would not overlap to create a common point of failure. It is much harder than you think, but we made a lot of progress over time on PeachNet.
Ellucian releases a software patch monthly. Contrary to their usual practice, they once released a patch that contained massive memory leaks. To make this an extra spicy incident, the patch was released right before spring semester registration for classes. For those unfamiliar with memory leaks, there are three approaches to dealing with them:
- Find the bad code and release a new patch without the memory leaks. This was the job of Ellucian since it was their code. They took about a week to find the bad code.
- Lengthen the time between system crashes by maximizing memory and storage. To do so, you shut down the service (which takes 20 minutes) with a stack of memory cards and hard drives nearby, plug everything in as fast as you can, and reboot (which takes another 20 minutes or more).
- Once you have maxed out resources, time how long it takes the memory leaks to exhaust your servers. Schedule reboots and communicate when the reboots will occur so that there are scheduled reboots instead of crashes.
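For the technically inclined, the third approach boils down to simple arithmetic. Here is a rough sketch in Python. Every number, name, and the safety margin in it is a hypothetical chosen for illustration, not what we actually measured or ran at the time.

```python
from datetime import datetime, timedelta

def reboot_interval(usable_memory_gb, leak_rate_gb_per_hour, safety_margin=0.8):
    """Return how long a server can run before a planned reboot.

    The safety margin (an assumed value) reboots the server before the
    leak is expected to exhaust memory, rather than right at the limit.
    """
    if leak_rate_gb_per_hour <= 0:
        raise ValueError("leak rate must be positive")
    hours_to_exhaustion = usable_memory_gb / leak_rate_gb_per_hour
    return timedelta(hours=hours_to_exhaustion * safety_margin)

def reboot_schedule(service_up_at, interval, downtime, count):
    """List the times at which the next `count` planned reboots begin.

    Each cycle is: run for `interval`, go down for `downtime`, repeat.
    """
    schedule = []
    up_at = service_up_at
    for _ in range(count):
        down_at = up_at + interval
        schedule.append(down_at)
        up_at = down_at + downtime
    return schedule

if __name__ == "__main__":
    # Hypothetical figures: 64 GB usable, leaking 32 GB per hour.
    interval = reboot_interval(64, 32, safety_margin=0.5)
    times = reboot_schedule(datetime(2020, 1, 1, 0, 0), interval,
                            downtime=timedelta(minutes=20), count=3)
    for t in times:
        print("Planned reboot begins:", t.strftime("%H:%M"))
```

The point of publishing the resulting list is the same as in the bullet above: users see announced reboots at known times instead of surprise crashes.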
My team and I did not sleep for three days as we ameliorated the situation as best we could while waiting on Ellucian to fix their memory leaks. The memory leaks affected all of Ellucian’s customers, so at least there was shared misery. The Chancellor and Executive Vice Chancellors were appreciative of our efforts, and student registration progressed in forty-minute intervals, 24 hours a day, until Ellucian released a fixed patch.
The third major outage affected only the University of North Georgia but was significant. AT&T, in the process of installing new fiber with a high-speed bore, drilled into their own fiber and shredded it for about a half mile both upstream and downstream. This cut all internet access to the University of North Georgia. If you are going to make mistakes, go big or go home.
Rather than use that handy high-speed bore to just bypass the one mile of shredded fiber cable and restore service, AT&T faithfully tried to patch the thousands of shreds in the cable. Doing so would save them pennies, as long as they assumed the internet service they provided was without value. They firmly believed that the service they provided was without value, because they tried to splice the shreds all day. Then, a strange thing happened. It got dark, so they went home. Meanwhile, everyone at the University of North Georgia was without internet service. The next morning, AT&T eventually returned to their still dormant high-speed bore. We sent guards (really) to make it clear that no one was going home until the service was restored. The University of North Georgia sent guards (really) as well. Remarkably, attempts to patch the thousands of shreds ended and they discovered that handy nearby high-speed bore.
We held an after-action review at the University of North Georgia. AT&T declined to attend based on legal advice.
The University System of Georgia did not have any major cybersecurity attacks while I was CIO. The closest we came involved the Shared Service Center’s partner, ADP, routinely giving our monthly payroll data to the wrong organization. I may have identity protection for life given how many times this happened. Thankfully, the receiving organization immediately realized the mistake and protected our data.
If you are progressing through the website sequentially, the next chapter is an interlude with a brief but humorous account of my personal experiences during Snow Armageddon.