Historical status reports for 1999

09:05	 Power restored to MtBeauty and all services operational.

07:35	 At approximately 6:35am, Tawonga South (and probably a
	 much larger area) has suffered a widespread power failure.
	 Our UPS ran out of reserve about 7:35am and all our MtBeauty
	 facilities are off-air until power is restored. No ETA yet.

10:20	 MtBeauty service restored, suspect faulty UPS now isolated
	 and to be replaced.

09:55	 MtBeauty link again failed. Technician recalled to site.
05:15	 MtBeauty restored. UPS temperature indicates power fail.

05:00	 Interruption to service in MtBeauty. Cause unknown as yet.

14:40 	 We've upgraded our links to Wangaratta! A few minutes of
	 interruption was unexpected due to routing updates, but
	 all services are now back and running better than before.

22:50	 Finally fixed. Yet ANOTHER OSPF routing problem.
	 Apparently this is an open case with Cisco.

22:40	 Tim Harman from Telstra Internet has called and confirmed
	 the problem is indeed in Melbourne, and is working on
	 restoring service.

21:40	 More information, it appears it may not be international
	 links but a router problem at lonsdale exchange in Melb.
	 I still can't get anyone at telstra to answer their phone
	 to give me more information.

20:55	 International connectivity has gone, apparently due to
11306155 a link failure on the return path from USA to Australia.
	 Telstra have recalled a tech crew to isolate and fix.
	 No ETR advised yet.

14:10	 Scheduled restart of our proxy to bring the new kernel
	 on-line as previously advised. This should cure the 
	 intermittent problem that server has been experiencing.

12:20	 The problem was apparently that one of the major core routers
	 ran out of memory and started flapping routes. They have
	 re-loaded the router and the network is operating ok now.

11:10	 Telstra routing problems. All I can get from telstra is:
	 "We are currently having core networking routing problems.
         This is causing packet loss and delays through many parts
	 of the network."

14:21 	 A 2-minute interruption to proxy traffic was required during
	 investigative work for last night's proxy/LAN problem.
	 We expect one more 2-minute interruption later today to bring
	 a new kernel on-line which should resolve the problem.

19:47 	 Our proxy spontaneously re-booted. We're investigating
	 the problem. Users were unable to use the proxy for 6 mins.

20:54	 Just got a phone call from Ian at telstra who cheerfully
	 reported that things are all working again. They have no
	 idea what was wrong. It all just started working. This just
	 isn't good enough and I've demanded full details!

20:26	 Still no word from telstra, but as of 1 minute ago, it looks
	 like at least partial international connectivity is restored.

19:37	 OK. Peter has admitted that there "seems" to be a problem with
	 the core router in lonsdale exchange (Melbourne). No estimated
	 time of restoration although technicians are working on it now.

18:15	 Finally, someone has taken a message and is trying to get onto
11196484 Telstra to chase down the problem. Still no update 18:40!
	 Finally! Chris at telstra finally reported the fault!

18:07	 Finally given up on hold, called Robert at corporate faults
	 to see why telstra internet are not answering their phones and
	 to get an alternate contact.

17:20	 There appears to be no international connectivity. I'm *STILL*
	 on hold after 15 minutes waiting to report this to telstra.
	 It seems we can get to anywhere in Australia, just not outside it.

13:30	 A 5 minute interruption to service is scheduled to load a new kernel
	 on our main servers in Albury. Completed on time without incident.

11:35	 Yes, there is a telstra routing issue. I've argued with telstra faults and 
11178641 finally got them to admit there IS a problem. They are working on it, but
	 have no restoration time at this stage. It appears to be in Melbourne.

12:30	 Finally, all the routing is fixed, the new link is back on-line and operational.
	 Full bandwidth is now restored.

11:10	 It looks like the router update didn't occur at 11 today; perhaps it ran a few
	 minutes early and missed the update. I'm in contact with a human at telstra to
	 get the router's routes manually updated.

10:50	 The new router is configured and installed. Routing update requests have been
	 submitted to telstra, we're waiting on them to become active at 11:00am.

10:30	 The replacement router arrived at last. We're making remedial repairs now.

10:15	 Things are NOT going well. The upgrade supplied has broken the router and
	 we have lost all connectivity. We're assessing options.
	 Replacement router will be here this afternoon; in the meantime, we have
	 restored 50% capacity through backup links. We do not anticipate any loss
	 of service quality, as our links are now back to what they were last week.
	 There MAY be a brief interruption to service when we transfer back to the
	 main 2 meg link.

10:08	 Scheduled upgrade of our core router, should interrupt service for 5 minutes.

12:00	 One analogue dial-in line on the Albury pool is faulty. It's been locked
11114661 out, so should not cause any issues to subscribers. No dial tone, no side
	 tone, no B+, appears that the LI is faulty at the exchange.

11:30	 Finally, our new 2 megabit link is running. There will be some periods
	 of intermittent, or slow access while we adjust routing, and perhaps a
	 little instability over the next day while we try to load balance the
	 traffic over the new link.

01:44	 Mains power restored, our server is back on-line. Full service restored.

00:45	 Our UPS finally exhausted its capacity and shut down. All contact with
	 our corryong facilities has been lost until power is restored.

23:15	 Mains power failure in Corryong. Unknown cause, or duration. Our site
	 is running on UPS.

	 Sometime around about now, we hope to be bringing on-line a further
	 2 megabits of capacity. More details as they come to hand.

14:48	 Re-boot of our Albury analogue terminal server required to install
	 new software. Outage of 3 minutes was unavoidable.

17:00	 Telstra advises most services are restored to normal operation.

15:34	 Telstra advises: "There has been a break in the Asia-Pacific cable.
11022441 This is causing traffic delays to sites outside Australia.
         Estimated restore time is currently unknown."

15:00	 There appears to be various routing problems to the USA and beyond.
11022441 It's been reported to telstra who are now working on it.
	 No ETA at this stage.

17:15	 After several days work with telstra and Cisco, we've finally
	 managed to resolve the technical issues previously preventing
	 us adding the bandwidth we were working on!
	 Effective immediately, we have increased our bandwidth by 35%
	 with a further four-fold increase expected soon.

18:00	 After several hours working with telstra and Cisco staff,
	 several serious technical issues remain unresolved. Cisco
	 are working on an alternative approach, we hope to resume
	 work on the project tomorrow. There has been no impact on
	 our links at this stage, although reconfiguration tomorrow
	 MAY require a brief outage of up to 2 minutes. We hope to
	 be able to provide alternate routing during this time.

14:00	 Due to unavailability of technically competent telstra staff
	 we have had to carry this over until tomorrow.

14:00	 We are adding more bandwidth and re-balancing our links. We do
	 not anticipate any interruption to services during this time.
08:25-08:30 The upgrade scheduled for 08:30 commenced slightly ahead of time
	 and was completed in 4 minutes without incident.
08:30	 There will be a brief outage affecting dial-in customers calling our
	 ISDN and 56K facilities in Albury while we upgrade our Digital 
	 terminal server firmware. We anticipate a 5 minute interruption.

06:52	 alb2 router is back in operation and our links are once again
	 operational. This whole reliability issue is being raised at
	 administrative and ministerial levels.

06:25	 DAMN! These guys are good. alb2 has disappeared AGAIN!
	 Reported to Mick at telstra. Expected back in 40 minutes.
	 They're replacing the alb2 chassis.

02:30	 Service restored. No indication what exactly failed, but I
	 have asked for a full report and will be making waves.

00:15	 Telstra say they have fixed the bearer problem, and are trying
	 to get someone from "WAN Services" to reload the router. No
	 indication of how much longer it will take!

22:15	 Telstra's *other* Albury router (alb1) has dropped its bundle.
10943146 I've called in a fault (22:40, Mick) and telstra are recalling
	 a technician to fix the problem. Nothing we can do but wait.

11:45	 Routing and infrastructure alterations at our Corryong facilities
	 required to facilitate bandwidth upgrades resulted in a 15 minute
	 interruption to connectivity to our Corryong facilities.

22:20	 Some international routing, notably to the USA via Sydney, has
10917180 a serious problem at Kent/Wellington routers. Telstra have recalled
	 technicians to work on the problem. No restoration time available.

15:37	 Looks like telstra's router has just come back on-line. There is a huge
	 backlog of mail, so performance could be a bit slow during that time.
	 Finally back up after 61347 calls and nearly 11 HOURS isolation!

14:40	 Latest word from telstra is that Cisco are on site trying to restore
	 service. The 240V to 50V power supplies have failed. Although telstra
	 had offered restoration times of 10:30, then 1PM, it's still nowhere
	 near fixed, and telstra are still unable to offer any estimated
	 restoration time. 

10:30	 Telstra have now determined their router has suffered a serious failure
	 and are awaiting replacement parts. Restoration now anticipated to be
	 2.5 hours. Our router has so far made 46275 calls to bring up the link.

09:10	 Telstra have recalled a technician to restore service to their router.
	 Restoration time expected to be within 90 minutes.

04:59	 Yet another telstra router failure. I have been unable to contact the
10911954 Fault centre. Seems they are having phone problems too! Finally got
	 through at 08:30 to report the faults and get some action.
	 Our router has made 22681 calls since the link failure to try to bring
	 the link back up, but has been unsuccessful.

18:51	 Our main DNS suffered a memory fault and shut down. Re-started 18:52
	 but took until 19:02 to complete the restart and be back on-line.
	 Many services will have been affected unless configured to use our
	 secondary DNS as an alternate. 

16:40	 Routing has been a problem all day. Telstra core routers have serious
10876603 problems and engineers are working on them. Restoration time unknown.

13:09	 Re-boot of Wangaratta server required to cure a modem problem.

15:32	 Links back up again. Waiting on a cause.

15:25	 Links failed AGAIN! Reported to telstra.

14:55	 Links restored, telstra don't know what happened, are investigating.

14:50	 Links failed. Reported to telstra.

7:45	 Links restored. It's taken all day to confirm the cause, which
	 is apparently due to a macrolink failure at telstra. telstra
	 helpdesk seem to want to pass this off as 'unknown' saying it
	 seems to be working now, and there's nothing they can do about
	 it. I'm pressing for a cause and resolution. Time will tell.

07:30	 All external connections AGAIN lost! Same as at 6:10am, but this
10817118 time it isn't scheduled. Re-reported. Let's see how slow they are.

7:10	 Links restored. This was apparently "scheduled" downtime for
	 telstra to upgrade their router. Should have been 6-7am.

06:10	 All external connections lost! Our ISDN calls are all up, but
10817118 no activity through the telstra router. Reported to telstra who
	 are now aware of the problem and are working on it.

10:00	 Mail was interrupted for approximately 60 seconds while we updated
	 from Sendmail 8.8.8 to 8.9.3 to enhance anti-spam/anti-relay security.

18:27	 Our server is back on-line. The mains power dropped to a severe
	 brown-out just before 6pm, eventually failed at about 6:15pm.
	 Our UPS was able to maintain our site fully operational until
	 7:15pm at which point its batteries became discharged and the
	 site was shut-down. Power returned about 10 minutes later and
	 the server re-started successfully.

18:00	 Mains power problems in Corryong. Seems to be widespread.
	 At this stage, our site is running on UPS (standby batteries).

10:10	 The Upgrade started about 9:53, and completed nicely.
	 Interruption was 3 minutes x 2. A routing issue resulted in our
	 MtBeauty server being unreachable for about 10 minutes.

09:30	 We will be upgrading our Digital Terminal Servers in Albury.
	 Expected to commence shortly after 09:30 Saturday morning,
	 the upgrade should take approximately half an hour, during which
	 time access may be intermittent. This should only affect callers
	 to our Albury Digital pool. Some MtBeauty and Wangaratta users 
	 may experience connectivity problems during this time, but should
	 not have their calls interrupted.

15:02	 Our weather station has died. Re-built and back on-line 16:05 but
	 the environment sensors need re-calibrating, so rather than show
	 incorrect values, we've taken them off-line until things are right.

10:20	 Scheduled 3-minute shutdown of our Wangaratta server was completed
	 on time. No users were affected.

16:30	 Telstra have finally repaired their news servers and things should
	 be fully operational again. It seems the problem extended to BOTH
	 telstra's NSW and Victorian servers, due to an upgrade that went
	 wrong. With both being down, our redundancy was bypassed.

14:40	 Telstra's News server(s) are down, so we have no new news. Telstra
	 are aware of the problem and are hopeful it will be fixed within
	 the next 2 hours.

09:30	 USA Link problems are being experienced by Telstra, and thus we are
10626770 also seeing significant packet loss and connectivity issues to sites
	 outside Australia. Telstra estimate a 2 hour repair, they have 
	 engineers working on the problem at the moment.

09:00	Our mail server experienced a memory problem and was re-started.
	Total outage was 3 minutes.

08:33	Scheduled maintenance completed without incident, apart from
	starting 8 minutes later than planned and running 3 mins over 
	the scheduled time window.

08:23	Missed our scheduled maintenance window of 8:15-8:30.

12:43	MtBeauty connection restored. Required a very rare reset of the router
	at MtBeauty to restore the ISDN link.

12:30	Loss of connectivity to MtBeauty. Cause currently unknown.

07:10	Inetd died on one of our main servers. This prevented anyone being
	able to check mail from our primary mail server or being able to ftp
	to our www.albury.net.au site. Investigation as to cause is continuing.

23:20	(Found 7:03am 18/Apr/99). Significant loss of service at Albury.

17:38	All connectivity restored, the fault appears to have been an IOS fault
	in one of the routers.

17:10	All connectivity currently lost. Reported to telstra for urgent action.
3429495	No indication of the fault at this stage except that our ISDN lines are
	going up and down which indicates a router problem at the Telstra end.

13:30	USA Connectivity is poor again. Reported to telstra, they claim to
3429112	have no known problems, but are investigating.

06:50	USA Connectivity restored. Still no indication as to the cause.

04:15	Very slow connections to the USA.
3428886	No significant packet loss, just very slow response. Australian connectivity
	appears not to be affected. Telstra have been advised.

23:45	All servers back on-line.

23:31	Replacement UPS in place, power restored. Now begins the task of bringing up
	all the servers and checking their operation. I took this opportunity to perform
	the memory upgrade while the server was already down. Unfortunately, the new
	memory which is the correct type, now conflicts with the 64-port serial processor!
	Back to the drawing board!

22:40	During a low-level thunderstorm, a power disturbance resulted in our main UPS
	going into a shutdown mode for the second time in 5 years. A replacement UPS
	should be here within a week, in the meantime our backup will run the site.

12:07	Servers back on-line.

12:04	Unscheduled restart of our main DNS and Authentication server was required to
	clear a memory fault. Some users on our Albury Analogue pool had calls terminated
	and a brief period of "BUSY" on the analogue lines will have been experienced.

9:12	The upgrade happened within the time anticipated. Unfortunately, the shipped
	memory was incompatible with the existing memory. Another shutdown will need
	to be scheduled when the correct memory has been supplied.

09:04	Scheduled shutdown of our main DNS and Authentication server for a memory upgrade.

16/Feb/99 to 19/Feb/99
	International link performance is intermittently poor as a result of several
	combined factors, all beyond our control and outside our network:
	Telstra lost a 45 Mbps link, suffered a DoS attack from a US source,
	and has an overloaded Fast-Ethernet segment at the Paddington exchange. All are
	being attended to (or have been fixed already) and we are assured of an
	improvement very shortly.

15:20	The link came good while I was reporting it down. Telstra are investigating.

15:15	Main internet link interrupted due to a failure between two Telstra routers.

23:28	Service finally restored to both Corryong servers.

17:45	Widespread power fail continues. Our UPS just shut down. 
	No estimate on restoration at this time.

16:05	Widespread power fail in Corryong. Our UPS is holding for the moment.