Historical status reports for 2006
09:40 AAPT seem to have stuffed up the BGP announcements, resulting
in routes not being advertised. Fixed.
08:29 There seems to be a widespread connectivity issue.
Seems to have started at 0300, most significantly affecting our DSL
customers. Problem appears to be upstream of our upstream provider.
Reports from other ISPs suggest a Connect or Optus problem. No ETR yet.
13:45 Due to fire and exploding gas cylinders in nearby buildings, the ALI
offices have been evacuated. Due to the short notice, no phone diversion
or other arrangements have been made for customer support.
06:30 Intermittent but severe packet loss between Telstra Albury and
Melbourne routers. This doesn't seem to be affecting everything,
and that which is affected is affected in unpredictable ways.
Reported to Telstra, waiting on response.
05:00 Again, Glebe is off-air, resulting in loss of all DSL services.
Engineer has been sent to the site, updates as they come to hand.
Updated: 07:30. All equipment is powered-down, both primary and
secondary Cisco routers are dead.
Updated: 11:00. Servers are running, Cisco routers are terminal.
Notwithstanding that it's a public holiday in NSW, a replacement big
router has been sourced and is en route to the datacentre now.
Anticipating restoration of services around 13:00.
10:00 Techs have gained access to the site and determined that the same
interface card on both the primary AND BACKUP routers has failed.
A temporary fix has been achieved, but replacement parts from Cisco
may not be available until early next week, so there may be some
small outage in order to bring new equipment on-line then.
09:00 Something has happened at the data-centre in Glebe, loss of all
connectivity for ADSL services. Engineers are at the site, but
their security cards are not letting them in, and the staff at
the security desk can't get in either. Waiting on an update.
00:23 One of the big APC UPSs has died, shutting down all the servers.
UPS replaced with a hot spare, servers back up. Just bedding things
in, making sure all services are running properly etc.
(First time all the servers have been off for 4 years)
23:56 Major loss at our Albury office. Remote tests indicate most (but not
all) primary servers failed. Tech recalled to site.
04:25 At 04:25 (approx) there was a widespread ADSL outage. The cause is still
being investigated. Although many services re-established within a few
minutes, a significant proportion required a power-cycle of the routers.
12:30 Strange delays and packet loss from Albury to and beyond Melbourne.
121265350 Telstra advise "there are no outages, anywhere" and told me to reset
my ADSL modem, and seemed perplexed at what a "megalink" was. They have
taken details and promised a call back shortly.
Traceroute to www.abc.net.au:
2 albury-core.albury.NET.AU (22.214.171.124) 3.829 ms
3 Serial2-6.alb3.Albury.telstra.net (126.96.36.199) 9.529 ms
2 albury-core.albury.NET.AU (188.8.131.52) 4.631 ms
3 Serial2-6.alb3.Albury.telstra.net (184.108.40.206) 16.222 ms
4 ATM6-0-0-4.lon-core2.Melbourne.telstra.net (220.127.116.11) 97.309 ms
5 TenGigabitEthernet8-2.lon55.Melbourne.telstra.net (18.104.22.168) 57.639 ms
Same results to various national and international sites.
09:25 Multilink group bundles were out of sync from end to end, and it seemed
they could not recover. Required a "reset" of the Albury Tigris to
get things back in sync. Seems to have only been the Corryong link
that was affected, although a small number of users who were dialed
into the Albury Tigris at the time will have needed to redial.
04:19 Loss of connectivity to both Corryong and Wangaratta sites.
Appears to be a Telstra ISDN problem affecting some links only.
Corryong has dialed back in after 11 seconds, but is not passing traffic.
Wangaratta came back on-line 08:05 after a call to the NAS prompted
it to bring up the link.
Still trying to resolve the issue with Corryong. Routing, call state
etc. all look perfect from the Albury end.
07:44 ADSL Services all restored. The exact cause is still being isolated, but
it appears the border router (Cisco 7200) lost power to both supplies,
which were plugged into separate rails on separate circuits over two
phases... and for reasons still being investigated, when power returned
the router had lost its entire config, which required additional people
and equipment to attend the site to restore. More information as it is
17:41 Loss of all DSL connectivity affecting all our ADSL customers and
some office services on IP addresses serviced by our own ADSL link.
Due to a system error at AAPT wholesale, disconnection requests were
sent to Telstra during internal churn of all our services. Restoration
of line codes on customers' lines is being carried out as quickly as possible.
04:28 Loss of radio connectivity from office to Springdale AP resulting in
all radio sites down. Tech called to site. All equipment operating
correctly but not passing traffic. Interface administratively marked
as down, then up, everything working again at 04:46
18:05 Connectivity to Albury seems to have been restored through all services
although Telstra are still advising it may be up to midnight before all
services through the affected areas are back on-line. Damage to two
different fibre-optic cables in two separate locations caused the fault.
11:03 A major fibre cable cut is affecting all our Albury services.
AAPT, Telstra, Comindico/Soul are all down, data and voice.
No restoration time advised yet, although apparently both Albury
and Griffith are affected (quite likely elsewhere too; parts of
Wodonga are also known to be down).
14:12 Strange memory corruption on nameserver has required a
rare reboot. Brief interruption (about 2 mins)
11:00 Just discovered that today is the day that all the default
timezone files on all our servers said daylight saving
ended. Alas, it was changed due to the Commonwealth Games
and now ends 2nd April. Main server fixed now, and I am
rolling out the changes to all other servers.
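For anyone wanting to confirm that a server has picked up the changed
date, here is a rough sketch of one way to check. It is not the procedure
used above, just an illustration. The function name dst_end_date and the
zone name "Australia/Sydney" in the note below are assumptions, and it
needs Python 3.9 or later plus the system's timezone database.

```python
from datetime import date, datetime, timedelta, tzinfo
from typing import Optional


def dst_end_date(tz: tzinfo, year: int) -> Optional[date]:
    """Return the first date in `year` whose local noon is outside
    daylight saving when the previous day's noon was inside it.

    Hypothetical helper for illustration only. Returns None if the
    zone never leaves daylight saving during that year.
    """
    day = date(year, 1, 1)
    prev_in_dst = None
    while day.year == year:
        # Sample at local noon so the exact changeover hour does not matter.
        noon = datetime(day.year, day.month, day.day, 12, tzinfo=tz)
        in_dst = bool(noon.dst())
        if prev_in_dst and not in_dst:
            return day
        prev_in_dst = in_dst
        day += timedelta(days=1)
    return None
```

Calling dst_end_date(zoneinfo.ZoneInfo("Australia/Sydney"), 2006) on a
server with updated zone files should report the 2nd April date noted
above; an earlier date would suggest the old files are still in place.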
21:00 Power restored to safe levels, site again operational.
Problem believed to be AVR (regulator) at Shelley,
which has now been forced into Manual mode until
repairs can be effected, hopefully tomorrow morning.
19:55 UPSs have shut down; mains still at dangerous levels,
with horrible waveforms.
Still no word from TXU. At this stage, all customers
using our Corryong POP are affected.
19:17 Major power problems in Corryong. Mains voltage at 280V.
TXU called but have no idea of the fault or restoration
time. Site running on UPS for now.
12:10 Brief interruption to service on all Albury digital services
while Telstra re-configured the Onramp lines. (This had been
scheduled for 1/Feb/2006 but Telstra let us down, again!)
(13:45, Wangaratta also re-configured, no users affected)
11:50 Wangaratta site required urgent chassis change due to
catastrophic failure of 2 cooling fans. Chassis changed
and back on-air 12:05
12:32 Power restored. Genset shut down, refuelled, ready again.
11:33 Power failure in head office. Moderately widespread, much
of the immediate area was completely out, or mostly out.
We lost 2 of 3 phases, running computer room on genset.
08:20 Power glitch, loss of temporary power supply in the Albury
wireless AP basestation. Proper 18V supply sourced and now
installed, site back to full operation 10am.
07:26 Power restored to Corryong.
06:50 Power loss in Corryong at 06:03, running on UPS. Batteries
finally gave out, site down.
12:50 Power loss in much of Albury, believed to be caused by
an "incident" by Abigroup with the freeway construction.
Power restored to most areas by 13:10, but some wireless
customers still out due to loss of our albury-central AP.
Took some time to gain access to the site because of no
keyholder. Finally gained access about 16:45 and was able
to identify the cause was a fried switchmode power supply
for the radio equipment. Replaced and site back online in
a matter of 3-4 minutes.
17:59 Wangaratta restored.
17:05 Lost Wangaratta site. Strong winds and storms; a
power failure is believed to be the problem. Trying to
get someone to attend the site.
Sites affected will be Yarrawonga, Benalla, Wangaratta