Historical status reports for 1999
09:05 Power restored to MtBeauty and all services operational.
07:35 At approximately 6:35am, Tawonga South (and probably a
much larger area) suffered a widespread power failure.
Our UPS ran out of reserve about 7:35am and all our MtBeauty
facilities are off-air until power is restored. No ETA yet.
10:20 MtBeauty service restored, suspect faulty UPS now isolated
and to be replaced.
09:55 MtBeauty link again failed. Technician recalled to site.
05:15 MtBeauty restored. UPS temperature indicates power fail.
05:00 Interruption to service in MtBeauty. Cause unknown as yet.
14:40 We've upgraded our links to Wangaratta! Routing updates
caused a few minutes of unexpected interruption, but
all services are now back and running better than before.
22:50 Finally fixed. Yet ANOTHER OSPF routing problem.
Apparently this is an open case with Cisco.
22:40 Tim Harman from Telstra Internet has called and confirmed
the problem is indeed in Melbourne, and is working on
21:40 More information, it appears it may not be international
links but a router problem at lonsdale exchange in Melb.
I still can't get anyone at telstra to answer their phone
to give me more information.
20:55 International connectivity has gone, apparently due to
11306155 a link failure on the return path from USA to Australia.
Telstra have recalled a tech crew to isolate and fix.
No ETR advised yet.
14:10 Scheduled restart of our proxy to bring the new kernel
on-line as previously advised. This should cure the
intermittent problem that server has been experiencing.
12:20 The problem was apparently that one of the major core routers
ran out of memory and started flapping routes. They have
re-loaded the router and the network is operating ok now.
11:10 Telstra routing problems. All I can get from telstra is:
"We are currently having core networking routing problems.
This is causing packet loss and delays through many parts
of the network."
14:21 A 2-minute interruption to proxy traffic was required during
investigative work for last night's proxy/LAN problem.
We expect one more 2-minute interruption later today to bring
a new kernel on-line which should resolve the problem.
19:47 Our proxy spontaneously re-booted. We're investigating
the problem. Users were unable to use the proxy for 6 mins.
20:54 Just got a phone call from Ian at telstra who cheerfully
reported that things are all working again. They have no
idea what was wrong. It all just started working. This just
isn't good enough and I've demanded full details!
20:26 Still no word from telstra, but as of 1 minute ago, it looks
like at least partial international connectivity is restored.
19:37 OK. Peter has admitted that there "seems" to be a problem with
the core router in lonsdale exchange (Melbourne). No estimated
time of restoration although technicians are working on it now.
18:15 Finally, someone has taken a message and is trying to get onto
11196484 Telstra to chase down the problem. Still no update 18:40!
Finally! Chris at telstra finally reported the fault!
18:07 Finally given up on hold, called Robert at corporate faults
to see why telstra internet are not answering their phones and
to get an alternate contact.
17:20 There appears to be no international connectivity. I'm *STILL*
on hold after 15 minutes waiting to report this to telstra.
It seems we can get to anywhere in Australia, just not outside it.
13:30 A 5 minute interruption to service is scheduled to load a new kernel
on our main servers in Albury. Completed on time without incident.
11:35 Yes, there is a telstra routing issue. I've argued with telstra faults and
11178641 finally got them to admit there IS a problem. They are working on it, but
have no restoration time at this stage. It appears to be in Melbourne.
12:30 Finally, all the routing is fixed, the new link is back on-line and operational.
Full bandwidth is now restored.
11:10 It looks like the router update didn't occur at 11 today; perhaps it ran a few
minutes early and missed our update. I'm in contact with a human at telstra to get the
router's routes manually updated.
10:50 The new router is configured and installed. Routing update requests have been
submitted to telstra, we're waiting on them to become active at 11:00am
10:30 The replacement router arrived at last. We're making remedial repairs now.
10:15 Things are NOT going well. The upgrade supplied has broken the router and
we have lost all connectivity. We're assessing options.
Replacement router will be here this afternoon; in the meantime, we have
restored 50% capacity through backup links. We do not anticipate any loss
of service quality, as our links are now back to what they were last week.
There MAY be a brief interruption to service when we transfer back to the
main 2 meg link.
10:08 Scheduled upgrade of our core router, should interrupt service for 5 minutes.
12:00 One analogue dial-in line on the Albury pool is faulty. It's been locked
11114661 out, so should not cause any issues to subscribers. No dial tone, no side
tone, no B+, appears that the LI is faulty at the exchange.
11:30 Finally, our new 2 megabit link is running. There will be some periods
of intermittent, or slow access while we adjust routing, and perhaps a
little instability over the next day while we try to load balance the
traffic over the new link.
01:44 Mains power restored, our server is back on-line. Full service restored.
00:45 Our UPS finally exhausted its capacity and shut down. All contact with
our corryong facilities has been lost until power is restored.
23:15 Mains power failure in Corryong. Unknown cause, or duration. Our site
is running on UPS.
Sometime around about now, we hope to be bringing on-line a further
2 megabits of capacity. More details as they come to hand.
14:48 Re-boot of our Albury analogue terminal server required to install
new software. Outage of 3 minutes was unavoidable.
17:00 Telstra advises most services are restored to normal operation.
15:34 Telstra advises: "There has been a break in the Asia-Pacific cable.
11022441 This is causing traffic delays to sites outside Australia.
Estimated restore time is currently unknown."
15:00 There appear to be various routing problems to the USA and beyond.
11022441 It's been reported to telstra who are now working on it.
No ETA at this stage.
17:15 After several days work with telstra and Cisco, we've finally
managed to resolve the technical issues previously preventing
us adding the bandwidth we were working on!
Effective immediately, we have increased our bandwidth by 35%
with a further four-fold increase expected soon.
18:00 After several hours working with telstra and Cisco staff,
several serious technical issues remain unresolved. Cisco
are working on an alternative approach, we hope to resume
work on the project tomorrow. There has been no impact on
our links at this stage, although reconfiguration tomorrow
MAY require a brief outage of up to 2 minutes. We hope to
be able to provide alternate routing during this time.
14:00 Due to unavailability of technically competent telstra staff
we have had to carry this over until tomorrow.
14:00 We are adding more bandwidth and re-balancing our links. We do
not anticipate any interruption to services during this time.
08:25-08:30 The upgrade scheduled for 08:30 commenced slightly ahead of time
and was completed in 4 minutes without incident.
08:30 There will be a brief outage affecting dial-in customers calling our
ISDN and 56K facilities in Albury while we upgrade our Digital
terminal server firmware. We anticipate a 5 minute interruption.
06:52 alb2 router is back in operation and our links are once again
operational. This whole reliability issue is being raised at
administrative and ministerial levels.
06:25 DAMN! These guys are good. alb2 has disappeared AGAIN!
Reported to Mick at telstra. Expected back in 40 minutes.
They're replacing the alb2 chassis.
02:30 Service restored. No indication what exactly failed, but I
have asked for a full report and will be making waves.
00:15 Telstra say they have fixed the bearer problem, and are trying
to get someone from "WAN Services" to reload the router. No
indication of how much longer it will take!
22:15 Telstra's *other* albury router (alb1) has dropped its bundle.
10943146 I've called in a fault (22:40, Mick) and telstra are recalling
a technician to fix the problem. Nothing we can do but wait.
11:45 Routing and infrastructure alterations at our Corryong facilities
required to facilitate bandwidth upgrades resulted in a 15 minute
interruption to connectivity to our Corryong facilities.
22:20 Some international routing, noticeably to the USA via Sydney, has
10917180 a serious problem at Kent/Wellington routers. Telstra have recalled
technicians to work on the problem. No restoration time available.
15:37 Looks like telstra's router has just come back on-line. There is a huge
backlog of mail, so performance could be a bit slow during that time.
Finally back up after 61347 calls and nearly 11 HOURS isolation!
14:40 Latest word from telstra is that Cisco are on site trying to restore
service. The 240V to 50V power supplies have failed. Although telstra
had offered restoration times of 10:30, then 1PM, it's still nowhere
near fixed, and telstra are still unable to offer any estimated
10:30 Telstra have now determined their router has suffered a serious failure
and are awaiting replacement parts. Restoration now anticipated to be
2.5 hours. Our router has so far made 46275 calls to bring up the link.
09:10 Telstra have recalled a technician to restore service to their router.
Restoration time expected to be within 90 minutes.
04:59 Yet another telstra router failure. I have been unable to contact the
10911954 Fault centre. Seems they are having phone problems too! Finally got
through at 08:30 to report the faults and get some action.
Our router has made 22681 calls since the link failure to try to bring
the link back up, but has been unsuccessful.
18:51 Our main DNS suffered a memory fault and shut down. Re-started 18:52
but took until 19:02 to complete the restart and be back on-line.
Many services will have been affected unless configured to use our
secondary DNS as an alternate.
16:40 Routing has been a problem all day. Telstra core routers have serious
10876603 problems and engineers are working on them. Restoration time unknown.
13:09 Re-boot of Wangaratta server required to cure a modem problem.
15:32 Links back up again. Waiting on a cause.
15:25 Links failed AGAIN! Reported to telstra.
14:55 Links restored, telstra don't know what happened, are investigating.
14:50 Links failed. Reported to telstra.
7:45 Links restored. It's taken all day to confirm the cause, which
is apparently due to a macrolink failure at telstra. telstra
helpdesk seem to want to pass this off as 'unknown' saying it
seems to be working now, and there's nothing they can do about
it. I'm pressing for a cause and resolution. Time will tell.
07:30 All external connections AGAIN lost! Same as at 6:10am, but this
10817118 time it isn't scheduled. Re-reported. Let's see how slow they are.
7:10 Links restored. This was apparently "scheduled" downtime for
telstra to upgrade their router. Should have been 6-7am.
06:10 All external connection lost! Our ISDN calls are all up, but
10817118 no activity through the telstra router. Reported to telstra who
are now aware of the problem and are working on it.
10:00 Mail was interrupted for approximately 60 seconds while we updated
from Sendmail 8.8.8 to 8.9.3 to enhance anti-spam/anti-relay security.
18:27 Our server is back on-line. The mains power dropped to a severe
brown-out just before 6pm, eventually failed at about 6:15pm.
Our UPS was able to maintain our site fully operational until
7:15pm at which point its batteries became discharged and the
site was shut-down. Power returned about 10 minutes later and
the server re-started successfully.
18:00 Mains power problems in Corryong. Seems to be widespread.
At this stage, our site is running on UPS (standby batteries).
10:10 The Upgrade started about 9:53, and completed nicely.
Interruption was 3 minutes x 2. A routing issue resulted in our
MtBeauty server being unreachable for about 10 minutes.
09:30 We will be upgrading our Digital Terminal Servers in Albury.
Expected to commence shortly after 09:30 Saturday morning,
the upgrade should take approximately half an hour, during which
time access may be intermittent. This should only affect callers
to our Albury Digital pool. Some MtBeauty and Wangaratta users
may experience connectivity problems during this time, but should
not have their calls interrupted.
15:02 Our weather station has died. Re-built and back on-line 16:05 but
the environment sensors need re-calibrating, so rather than show
incorrect values, we've taken them off-line until things are right.
10:20 Scheduled 3-minute shutdown of our Wangaratta server was completed
on time. No users were affected.
16:30 Telstra have finally repaired their news servers and things should
be fully operational again. It seems the problem extended to BOTH
telstra's NSW and Victorian servers, due to an upgrade that went
wrong. With both being down, our redundancy was bypassed.
14:40 Telstra's News server(s) are down, so we have no new news. Telstra
are aware of the problem and are hopeful it will be fixed within
the next 2 hours.
09:30 USA Link problems are being experienced by Telstra, and thus we are
10626770 also seeing significant packet loss and connectivity issues to sites
outside Australia. Telstra estimate a 2 hour repair, they have
engineers working on the problem at the moment.
09:00 Our mail server experienced a memory problem and was re-started.
Total outage was 3 minutes.
08:33 Scheduled maintenance completed without incident, apart from
starting 8 minutes later than planned and running 3 mins over
the scheduled time window.
08:23 Missed our scheduled maintenance window of 8:15-8:30.
12:43 MtBeauty connection restored. Required a very rare reset of the router
at MtBeauty to restore the ISDN link.
12:30 Loss of connectivity to MtBeauty. Cause currently unknown.
07:10 Inetd died on one of our main servers. This prevented anyone being
able to check mail from our primary mail server or being able to ftp
to our www.albury.net.au site. Investigation as to cause is continuing.
23:20 (Found 7:03am 18/Apr/99). Significant loss of service at Albury.
17:38 All connectivity restored, the fault appears to have been an IOS fault
in one of the routers.
17:10 All connectivity currently lost. Reported to telstra for urgent action.
3429495 No indication of the fault at this stage except that our ISDN lines are
going up and down which indicates a router problem at the Telstra end.
13:30 USA Connectivity is poor again. Reported to telstra, they claim to have
3429112 no known problems, but are investigating.
06:50 USA Connectivity restored. Still no indication as to the cause.
04:15 Very slow connections to the USA.
3428886 No significant packet loss, just very slow response. Australian connectivity
appears not to be affected. Telstra have been advised.
23:45 All servers back on-line.
23:31 Replacement UPS in place, power restored. Now begins the task of bringing up
all the servers and checking their operation. I took this opportunity to perform
the memory upgrade while the server was already down. Unfortunately, the new
memory which is the correct type, now conflicts with the 64-port serial processor!
Back to the drawing board!
22:40 During a low-level thunderstorm, a power disturbance resulted in our main UPS
going into a shutdown mode for the second time in 5 years. A replacement UPS
should be here within a week, in the meantime our backup will run the site.
12:07 Servers back on-line.
12:04 Unscheduled restart of our main DNS and Authentication server was required to
clear a memory fault. Some users on our Albury Analogue pool had calls terminated
and a brief period of "BUSY" on the analogue lines will have been experienced.
9:12 The upgrade happened within the time anticipated. Unfortunately, the shipped
memory was incompatible with the existing memory. Another shutdown will need
to be scheduled when the correct memory has been supplied.
09:04 Scheduled shutdown of our main DNS and Authentication server for a memory upgrade.
16/Feb/99 to 19/Feb/99
International link performance is intermittently poor as a result of several
combined factors, all beyond our control and outside our network:
Telstra lost a 45 Mbps link, suffered a DoS attack from a US source,
and has an overloaded Fast-Ethernet segment at the Paddington exchange. All are
being attended to (or have been fixed already) and we are assured of an
improvement very shortly.
15:20 The link came good while I was reporting it down. Telstra are investigating.
15:15 Main internet link interrupted due to a failure between two Telstra routers.
23:28 Service finally restored to both Corryong servers.
17:45 Widespread power fail continues. Our UPS just shut down.
No estimate on restoration at this time.
16:05 Widespread power fail in Corryong. Our UPS is holding for the moment.