ATO battles with corrupted data

Sometimes spending AUD$1.26 billion is no guarantee against massive system failures.

As reported by ITNews, the ATO lost 1PB (petabyte) of storage on December 12, 2016 due to an unspecified hardware issue. The failure is understood to have been in two HPE (Hewlett Packard Enterprise) 3PAR SAN (Storage Area Network) installations acquired late last year.

Things got much worse when the backup systems failed to kick in and the corrupted data was copied to the fault-tolerant copy, spreading the corruption to both SAN arrays. Some in the industry are speculating that the backups may also have been corrupted as a result.

Pretty much a perfect storm when it comes to system failures.

The ATO has stressed that no taxpayer information has been compromised, which can be interpreted to mean that no personal information has actually left the ATO systems.
It is just, well, gone.

Tax agents, other individuals and businesses have been having problems using the ATO portal, costing them time and productivity as well as long-term damage from lost goodwill with customers.

This outage, coming after the ABS Census failure, has left many asking how efficiently and reliably the government supports its own services.

Some of the numbers regarding the ATO and technology:

  • AUD$700,000,000 spent on hardware and software since 2011
  • AUD$34,000,000 spent on consultancies during 2015/2016 financial year
  • Outage was not due to upgrade projects being undertaken at the time of failure. This outage occurred during normal operations.

Some questions answered:

  • How big is 1 PB of data?
    • 1 petabyte is 1024 terabytes. An equivalent would be 205 5TB portable hard drives, worth over $63,000. Of course, the real figure would be much higher, as we are dealing with a very different class of hard drive suited to heavy use.
  • How long does it take to restore 1PB of data?
    • Using some of the highest-spec robotic tape storage libraries, it could take as little as a few hours; with some of the lower-end, high-capacity systems it could take over 200 hours. If you were using consumer-class technology, a petabyte of data would take over a month, running 24 hours a day, to transfer from one system to another.
  • Is there a chance that the backup is corrupted too?
    • There is no guarantee that the backup is not corrupted, as it seems the data corruption may have been happening for some time before it was noticed. It would be hard to be sure until the backup has been fully restored and the data properly checked. This could add significantly to the time the restore process takes.
  • What happened?
    • It seems a drive or drives may have become corrupted and the system did not pick up the failure right away. This resulted in the drive array spreading the corrupted data to the other drives. A secondary array mirrors the original array to provide fault tolerance by keeping an exact copy of it. Unfortunately, as the first corrupted drive was not detected, the corruption may also have spread to the second array, forcing the ATO systems offline.
      To make matters worse, it is not known whether the backups are also corrupted.
  • Who is impacted?
    • Tax agents, individuals and businesses trying to access the ATO portal.
  • How long until services are back?
    • It is hard to tell, but I would estimate at least several days to restore the data, test it and then bring systems online again.
  • Is there a chance that tax records will be lost?
    • Hard to say at this point, as the ATO has not been specific about what data was impacted, or whether it was just the online portal or also included archival storage.
  • Where to from here?
    • Worst case, the ATO may have to rely on the public to resubmit their data, which should be straightforward but could take months and incur costs from tax agents. Best case, the systems are restored and running in a few days.
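
For anyone who wants to check the back-of-envelope numbers in the answers above (the drive count and the restore times), here is a rough sketch in Python. The transfer rates are my own assumptions, picked only to illustrate the three scenarios; they are not ATO or HPE figures, and real restores rarely sustain a constant rate.

```python
import math

# The post's convention: 1 PB = 1024 TB (binary units throughout).
TB_PER_PB = 1024
MB_PER_TB = 1024 * 1024


def drives_needed(total_tb: float, drive_tb: float) -> int:
    """Whole drives required to hold total_tb, rounding up."""
    return math.ceil(total_tb / drive_tb)


def transfer_hours(total_tb: float, mb_per_second: float) -> float:
    """Hours to move total_tb at a constant sustained rate."""
    total_mb = total_tb * MB_PER_TB
    return total_mb / mb_per_second / 3600


# Hypothetical sustained rates in MB/s, chosen for illustration only.
ASSUMED_RATES = {
    "consumer-class link": 400,
    "lower-end high-capacity system": 1_400,
    "high-end tape library": 100_000,
}

if __name__ == "__main__":
    print(drives_needed(TB_PER_PB, 5), "x 5TB drives")  # 205 x 5TB drives
    for label, rate in ASSUMED_RATES.items():
        hours = transfer_hours(TB_PER_PB, rate)
        print(f"{label}: {hours:,.1f} hours ({hours / 24:.1f} days)")
```

With those assumed rates, the consumer-class case works out to roughly 31 days of non-stop copying, which lines up with the "over a month" estimate. Verifying the restored data afterwards would add to that.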

Additional coverage:

http://www.afr.com/technology/enterprise-it/ato-promises-definitive-independent-review-into-its-hpe-tech-failure-20161216-gtcp2o

http://www.afr.com/technology/ato-says-it-will-get-all-data-back-but-systems-issues-continue-20161213-gtaopo

 
