SOLVED: SoftLayer Datacenter Issues Leading to Decreased DPD Performance

  • Jason@DPD
  • August 27, 2014

DPD’s primary servers are located in the SoftLayer Washington, DC datacenter. At approximately 1:30 PM on August 26, one of SoftLayer’s fiber optic cables was damaged, resulting in 6 minutes of downtime. While SoftLayer completes repairs, network performance when connecting to the DPD servers has been poor. This affects all DPD services, including mail, product downloads, and the vendor admin.

Status updates are being posted to the two primary SoftLayer Twitter accounts:

https://twitter.com/SoftLayerNotify
https://twitter.com/SoftLayer

Aug 26, 2014 1:26 PM EST: 6 minutes of downtime when a fiber cable was cut. Because the downtime was brief, this was not a critical event.

Aug 26, 2014 1:32 PM EST: All DPD services restored. Our servers and data were not affected; this is a network issue with the cables between data centers.

Ongoing: SoftLayer Update: “Until WDC01 redundant links are completely restored, customers may experience higher than normal network latency and some packet loss.”

DPD posted a notice in the DPD admin and on the company Twitter accounts.

Aug 26, 2014 4:00 PM EST: SoftLayer Update: “Fiber crews are replacing the severed network link between WDC01 and WDC02. This work is expected to take several hours to complete.”

Aug 26, 2014 10:31 PM: SoftLayer Update: “The crews encountered difficulty in pulling the cable through the conduit, so the repair has not been completed yet.”

Aug 27, 2014 1:00 AM: Performance seems to be improving. There is no SoftLayer update at this time, but we are monitoring network traffic to the DPD servers and throughput is rising.

Aug 27, 2014 10:45 AM: SoftLayer announced that the repairs were completed. We’re still experiencing latency issues and are working with the provider to address them.

Aug 27, 2014 12:38 PM: All network issues are resolved.

Summary:

1. A major fiber optic interconnect was severed at the SoftLayer datacenter in Washington, DC, where DPD’s primary servers are located. This resulted in a 6-minute period of downtime while traffic switched to backup lines.

2. While SoftLayer was repairing the severed line, all DPD traffic moved over backup lines that struggled to handle the load, leading to poor DPD performance for about 23 hours.

3. The line was fixed. Over the following several hours, traffic returned to normal levels as it was re-routed back to the primary lines and bottlenecks were addressed.

Total downtime was 6 minutes.

Total time with degraded performance was about 23 hours.
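For anyone who wants to check these totals against the timeline, here is a minimal sketch of the arithmetic in Python. It rests on two assumptions: that the final timeline entry refers to 12:38 PM on Aug 27 (it follows the 10:45 AM entry), and that degraded performance is counted from the moment services were restored at 1:32 PM. The variable names and the `minutes` helper are illustrative only.

```python
from datetime import datetime

# Timestamp layout used in the timeline above, e.g. "Aug 26, 2014 1:26 PM".
FMT = "%b %d, %Y %I:%M %p"

cable_cut = datetime.strptime("Aug 26, 2014 1:26 PM", FMT)
restored = datetime.strptime("Aug 26, 2014 1:32 PM", FMT)
# Assumption: the last timeline entry is read as 12:38 PM on Aug 27.
resolved = datetime.strptime("Aug 27, 2014 12:38 PM", FMT)


def minutes(delta):
    """Return a timedelta as a whole number of minutes."""
    return int(delta.total_seconds() // 60)


downtime = restored - cable_cut   # services unreachable
degraded = resolved - restored    # services up, but slow

print(f"Downtime: {minutes(downtime)} minutes")
print(f"Degraded: {minutes(degraded) // 60} h {minutes(degraded) % 60} min")
```

Under these assumptions the degraded period works out to a little over 23 hours. Counting from the cable cut at 1:26 PM instead adds the 6 minutes of downtime to that figure.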
