Historical status reports for 2002

23:15    No notice from anyone, but the problem has gone away again.
14517308 Presumably as a result of my e-mail.

22:35    William called back, asked me to mail directly to qwest
14517308 and advise them the problem has returned.

20:20    Steve Scott. USA Latency issue again. Exactly the same as
14517308 last night's issue, he's referring it directly to William.

10:15    Reach advised there was traffic congestion and has forwarded
14516808 the problem to qwest for investigation. No further notice,
         but the problem cleared at 11:00, pings returned to normal.

07:20    Another call from William, and more tests and e-mail, reach
14516808 replied overnight saying there was no problem, yet clearly
         there is. Provided more traceroute information and requested
         a more thorough investigation from reach.

23:20    After a 36 minute call from William in Perth, the fault
14516808 is being escalated to Reach, who do all Telstra's international
         traffic and routing etc. SEEMS to be either a problem IN the
         USA, or MANIFESTING itself there at the moment.
         Expecting a call from William, no ETA.

22:20    Excessive (2900 ms) ping times to the USA again. Called back
14516808 to Chris at telstra, new fault logged.

19:55    Chris from Telstra called to say they had rebooted lon-core3
14516765 and fault fixed. (Noticed it here at 19:30)

19:10    Excessive (2900 ms) ping times to the USA. Reported to telstra,
14516765 waiting for any response.

23:00    Corryong telstra site has (again) gone into "Rectifier Failure"
14485378 alarm and is running on batteries. Telstra are working on it.
08:30    Corryong site VCTS rack has power problem. Since this equipment is
14403820 NOT ours or under our control, it has been reported to telstra.
         "Rectifier failure" suggests a serious fault that will require some
         attention by telstra, however the site is running on battery for now.
18:25    Seems sometime between 17:43 and 17:55 telstra have done a major
14391656 exchange fiddle and "broken" all our out of town diversions. People
         calling directly into our numbers in Culcairn, Holbrook, Corowa,
         Yarrawonga or Winton may have experienced "busy" signals as a result.
         Telstra staff have been working on the fault, identified and fixed
         the exchange problems on both ALBG and WNGX exchange nodes at 18:45.
         Expecting full report into the issue tomorrow.
05:30    Telstra advise there will be an interruption to Albury services for
20021118-0500-01 approx 15 minutes:
         "Services connected via Albury will be interrupted at 05:00hrs ACDT
         (UTC+10h30m) on Mon 18 Nov 2002 in order to perform scheduled 
         maintenance work.
         The services will be interrupted for approximately 15 minutes
         during this activity."
00:00    Pathetic responses to USA via Sydney, gone from 200 ms to over 800 ms.
14375587 Reported to telstra, I suppose we just sit and wait now.
         (Update: fixed 01:00, no word from telstra as to cause)
01:20    Albury annex stopped responding again. Tech recalled to site.
         System re-initialized and seems operational now.
         Additional monitoring applied.
13:55    Annex ping times excessive, suggest imminent failure.
         Tech recalled to site, fixed and back online 14:10
02:00    Somewhere between 02:00 and 04:00, we have been advised to expect
P75592V  an unspecified interruption to services at our MtBeauty, Winton,
PSI75582 Beechworth, Wangaratta, and Yarrawonga facilities for 25 minutes,
         while telstra "upgrade" the software on all their exchanges.
         We have no guarantee when this will happen, nor that they will
         "get it right" this time.

02:00    Somewhere between 02:00 and 04:00, we have been advised to expect
P75413N  an unspecified interruption to services at our Culcairn, Albury,
PSI75413 Corowa, Corryong and Howlong facilities for 25 minutes, while
         telstra "upgrade" the software on all their exchanges. We have
         no guarantee when this will happen, nor that they will "get it
         right" this time.

11:03    Indigo Indial number coming up "busy" even though we have plenty
S14272667 free lines. Reported to Chris at ISP faults centre.
08:35    Getting intermittent Megalink failures. Seems to have been only
14260611 in the last 2 days, and since telstra decided to "remove" the
         original "redundant, hot-spare" unit.
         *** TRAP from local agent at 24-Sep-2002 17:51:58
         *** Link Down, WAN1
         *** TRAP from local agent at 24-Sep-2002 17:51:59 
         *** Link Up, WAN1
         *** TRAP from local agent at 24-Sep-2002 18:11:49
         *** Link Down, WAN1
         *** TRAP from local agent at 24-Sep-2002 18:11:51
         *** Link Up, WAN1
         *** TRAP from local agent at 24-Sep-2002 20:13:11
         *** Link Down, WAN1
         *** TRAP from local agent at 24-Sep-2002 20:13:11
         *** Link Up, WAN1
03:58    Corryong site restored.

02:55    Corryong UPS batteries have given up. Site down.

02:28    Mains failure in Corryong. Site running on UPS.

20:00    We are planning a brief (approx 5 minutes) interruption to service
         for our Mount Beauty site to facilitate the replacement of a
         defective UPS. Exact time will depend on when cabling is completed.
         (Changeover completed approx 20:38, interruption to users approx
         2 minutes only, new UPS should improve things up there now)

17:30    Seems someone at our co-location site did something resulting in
         complete loss of ventilation, and subsequent rapid temperature
         rise in the computer room, resulting in UPS emergency shutdown.
         Unusual that it didn't generate any indication before however
         (investigations are continuing on this one). Temperatures
         restored, equipment checked and safe, site fully restored.

16:45    Complete loss of our MtBeauty site. No indicated power fail prior
         to the event, so unsure of the cause at this time.
         Tech being called to site ASAP.

09:10    Scheduled brief interruption to Corryong site to replace UPS.
         Exact time was unknown as we were waiting until no users would
         be affected by the interruption. Downtime under 3 minutes.
09:15    Reported general slowness, particularly on http, but also on other
14129135 protocols (but not generally noticeable on icmp).
         Spent 70 odd minutes on the phone with Tony Van Ree, one of Telstra's
         router techs, but have been unable to identify anything specifically wrong
         except that things just don't look quite right somewhere.
11:32    Telstra routing issue, transient, only seems to have lasted a couple
14058577 of minutes, but reported to Helen who will investigate.
         traceroute to www.abc.net.au (, 30 hops max, 40 byte packets
          1  albury-core.albury.NET.AU (  3.953 ms  4.117 ms  2.535 ms
          2  Serial4.alb1.Albury.telstra.net (  7.624 ms  9.574 ms  8.810 ms
          3  * * Serial4.alb1.Albury.telstra.net (  8.146 ms !H
         (We were pushing over 1.5 megabits/second outbound at the time)
17:35	 Finally identified the problem, tracked to a faulty Ethernet Hub at
	 our Wangaratta site. Fault fixed, connection restored.
16:55	 Wangaratta Tigris has become unreachable. We're trying to regain comms
	 and control now. No ETA at this stage.
09:38	 At 9:27 our primary proxy ran into memory resource exhaustion and was
	 swapping heavily. Killed and restarted squid after 8 months, which did
	 fix the memory problem, but restarting picked up several inconsistencies
	 in the cache contents. At 9:29 it was decided the best option was to blow
	 away the cache, rebuild the hierarchy and restart squid. Completed at
	 9:34, everything operating within specs now, but will keep a watch on it.
15:30	 Annex has stopped responding. Tech recalled to site.
	 Reloaded 16:05, back on-line.
09:10	 Telstra tech (Norm) has found and fixed the fault. 
13942952	 From yesterday's work with Clive at the fault centre, the TDR showed the
	 line open at 2190 metres. Since the shop is only 190 metres, and since
	 TDR to another line at home shows 2200 metres, I concluded that someone
	 has un-jumpered the shop and re-jumpered to home. No idea why or who.
16:50	 Our fax line is dead. Not sure how long it has been out for either.
13942952 Called in to faults, they can't get anyone on to it immediately, but
	 will get it done between 9am-midday tomorrow. Sigh!
9:08	 Telstra have finally fixed the problem, and confirmed/admitted that it
13936771	 was widespread and within their network. Service restored between 9:26
	 and 9:29am, we were notified at 9:48am. Their official release says:
	   "During a scheduled network upgrade of a core router, a software 
	    issue occured which disrupted the routing table. This caused some
	    access routers within the Telstra Internet Direct Network to become
	    unstable and provide intermittent service during the fault period."
09:11	 Steven (Level 2 router tech) from telstra has finally determined that
13936771	 this is NOT a problem peculiar to ALI, but it is in fact Australia
	 wide and affecting most telstra customers. Believed to be a BGP problem
	 but they are working on it harder than ever now that it is a "wide area"
	 fault affecting so many.... Still no ETR.
07:55	 There are currently routing issues affecting some domestic and most
13936771	 international traffic. Telstra have been advised but are yet to respond
	 with a cause or restoration time. Dial-in is not affected. Trace to
	 pretty much any international destination goes thus:
	  4  ATM6-0-0-4.lon-core2.Melbourne.telstra.net (  26.156 ms
	  5  GigabitEthernet4-1.lon-core3.Melbourne.telstra.net (  26.848 ms
	  6  Pos1-0.fli-core1.Adelaide.telstra.net (  31.906 ms
	  7  Pos7-0.pie-core1.Perth.telstra.net (  63.818 ms 
	  8  * * *
	  9  * * *
	 10  * * *
16:02	 Finally, the Culcairn number seems to be working again, but I am still
13926723	 waiting on word back from telstra re HOW this happened and what they
	 are doing to prevent it happening again!
15:08	 Called the supervisor (eventually, after 4 dropouts and 14 more mins
13926723	 on the phone) of the faults section. "Vince" in faults, Adelaide, 
	 claims they cannot find who or why the block was put on and says their
	 system does not keep a history of who, when or why changes are made.
	 I have "persuaded" him to go find out who it was and lodge an official
	 complaint, which he has undertaken to do and call back within the hour.
15:03	 Called again at 14:00, and again at 14:30 when I was promised a call
13926723	 back "almost immediately" but did not get any calls! Conrad finally
	 called back about 14:55, our number had somehow mysteriously been put
	 on "block incoming calls". He cannot (or would not) say who had done
	 it or why. He fixed that, but of course the diversion is now cleared.
	 He has gone to re-instate the diversion, should be working again RSN.
12:30	 Culcairn indial number has stopped working. The fault appears to be at
13926723	 the Culcairn exchange, it has been reported to telstra, waiting on a
	 response. Callers just get a "this call could not be connected" msg.
06:43	 Loss of routing to SOME USA sites, dies at telstra ken-core4.sydney
13914373	 Reported to John, no answer as yet. I suspect routing issue as no loss
	 of connectivity to our own USA server or various other sites I checked.
19:10	 Still no word from telstra, but approx 19:10, service has been restored.

19:06	 Update: One of the major core routers (LON-CORE3) in Melbourne failed
13851446	 close to 18:00, and has taken down "most of" the Telstra national
	 network. Telstra staff are working on the fault, but have absolutely
	 no idea when it may be fixed. Will update the website as soon as I get
	 more information of relevance.
18:00	 Cause unknown, still trying to get onto telstra.
	 No connectivity past our core router, suspect a major telstra router
	 failure, but until we can get in touch with them, have no idea.
06:20	 Total loss of connectivity past Telstra's alb3 router. Reported to Frank
13848092	 at telstra, appears to have started at 05:50 or thereabouts.
	  1  9 ms  8 ms  7 ms
	  2  10 ms  12 ms  15 ms
	  3  12 ms  24 ms  12 ms
	  4  *
	  5  *
20:54	 Brief interruption to Wangaratta services from 20:50 to 20:54. Our POP
	 in Wangaratta became unreachable, possibly due to significant electrical
	 storms in the area at the time. Site has been re-contacted, all users
	 using that POP will have been affected for the 3-4 minutes the site was
	 unreachable. Investigation into the cause is underway.
16:16	 Corryong power restored, site fully operational again.

14:48	 Power failed at Corryong 14:20, UPS just gave up.
	 No ETA on restoration at this stage, but until power returns, our site
	 is off-line, affecting users from Corryong, Walwa, Khancoban etc.
08:00	 Albury Annex stopped responding, affecting a small number of remote
	 users. Tech called to site, NAS recovered, services operational.
15:35	 Widespread mail problem affecting lots of ISPs. Problem finally found
	 to be unresolvable inputs.orbz.org, so to get things back on track, I
	 have disabled the orbz RBL from our mailer. This will probably affect
	 lots of other ISPs too. All services restored to normal operation by
	 15:55 once we found the problem and devised a workaround.
16:38	 No word on the telstra problem, but checking with various looking glasses
	 around the world, they had a major route-flap and most of AS1221 was
	 damped. They may get around to telling us what it was some year...
15:37	 Major problems on the telstra network - intermittent connectivity only
	 from Albury to Melbourne, and it seems no connectivity from Sydney to
	 the USA at this time. Telstra are aware they have a problem and are
	 presently trying to identify what it is. No ETR at this stage.
21:03	 Power restored to parts of Lavington. Generator returned to standby,
	 site running on mains. Again, no interruption to any services.
	 Electricity authority has a LOT of repairs to make: a pole just down
	 the road took a direct strike and sustained serious damage, and there
	 are numerous minor damage points throughout Lavington.
17:55	 Major power outage in Lavington at 17:53, power authority STILL trying
	 to locate it at 20:10!! Our secondary site is still running on generator,
	 so no ALI services have been affected, and none are expected to be for
	 at least the next several hours.
18:30	 Lightning strike approx 17:55 resulted in multiple power surges at our
	 Albury office. Only one server was affected, which was running without
	 a UPS (don't ask!). As a result, traffic to ethernet-connected remote
	 clients (but NOT dial-in modem users) at MtBeauty, Wangaratta and 
	 Corryong, plus the Albury Webcam, were all off-line until the system 
	 could be repaired by technical staff (completed 18:26). Another 
	 shutdown will probably be required at a later time (to be advised)
	 in order to swap out the temporary UPS for its proper one.
17:00	 Reported to Anthea @ telstra helpdesk - increase in latency to and through
13693264	 Sydney commencing from 16:00. Increased from 40 ms to 200 ms gradually over
	 the hour to 17:00 when I called it in.
16:31	 Excessive delay in Telstra's network at Sydney between Kent and Paddington
13682331	 exchange routers. Telstra have confirmed the problem exists and are calling
	 suitable technical staff to attend to it.
	  4  ATM2-0-4.win-core2.Melbourne.telstra.net (  32.579 ms
	  5  GigabitEthernet3-0.win-core1.Melbourne.telstra.net (  35.170 ms
	  6  Pos2-0.ken-core4.Sydney.telstra.net (  44.919 ms
	  7  GigabitEthernet0-0.pad-core4.Sydney.telstra.net (  235.986 ms
	  8  GigabitEthernet0-1.syd-core01.Sydney.net.reach.com (  227.695 ms
	  9 (  401.247 ms
14:05	 Telstra have re-arranged their network routing in order to minimise
	 the effect of the loss of international bandwidth.
18:30	 Telstra have confirmed the cause of the slow international traffic is
	 the failure of SMW3 submarine fibre optic cable (yet again). As is now
	 quite typical, they have passed the buck to someone else (reach.com)
	 who supply all Telstra's international bandwidth, and have said they have
	 no estimated time of restoration of service. (My guess is several days
	 at least, perhaps more from past experiences).
17:45	 Excessive delay on all international traffic (via both Sydney and Perth)
13678131	  4  ATM2-0-4.win-core2.Melbourne.telstra.net (  23.587 ms
	  5  GigabitEthernet3-0.win-core1.Melbourne.telstra.net (  25.230 ms
	  6  Pos2-0.ken-core4.Sydney.telstra.net (  33.851 ms
	  7  GigabitEthernet0-0.pad-core4.Sydney.telstra.net (  38.073 ms
	  8  GigabitEthernet0-1.syd-core01.Sydney.net.reach.com (  33.800 ms
	  9 (  603.833 ms
	 10  p4-2.lsanca2-cr1.bbnplanet.net (  598.576 ms
	 11  p3-0.lsanca2-br2.bbnplanet.net (  586.290 ms
	 Called in to Lindsay at telstra bpd helpdesk.
16:15	 Secondary DNS was rebooted to clear a hardware fault associated with
	 NFS mounted CD-ROM file system. Uptime was just on 400 days. Sob.
14:40	 Power restored (earlier than expected), however it is still coming
	 and going. Hopefully no more significant breaks before the UPS has
	 fully recharged!
13:33	 There has been a widespread power failure from Tallangatta east to well
	 past Corryong. The Electricity authority advise estimated time for
	 restoration of services to be approx 16:00 today.
	 Corryong site shutdown 14:00.
12:50	 No explanation, yet.
	 Fault is cleared. Had spoken to Helen, tech working on it (Dan) was as
	 much use as tits on a bull and seemed to have no clue at all.
12:17	 Telstra have lost a major router at Albury.
13588927	 They are getting someone onto it immediately.
	  1  albury-core.albury.NET.AU (  2.270 ms
	  2  Serial4.alb1.Albury.telstra.net (  5.075 ms
	  3  FastEthernet0-0.alb3.Albury.telstra.net (  6.940 ms
	  4  *
	  5  *
16:30	 Cooma finally Restored.  Major fault with Telstra SDH which knocked
	 out Telstra services to the Cooma area.  Most of the Snowy area
	 affected, including Cooma, Jindabyne, Bredbo.
10:30	 Cooma Link dropped away, equipment still operating.  Appears that
13580253	 Telstra link has a fault.  Telstra contacted and will notify us
	 as soon as they find the fault!
21:21	 Contrary to previous information, power to site was only just restored
	 at 21:17. MtBeauty site back up and operational. Replacement UPS to be
	 ordered tomorrow and installed ASAP.
21:03	 At 17:45 there was a widespread power fail in the MtBeauty/Falls Creek
	 area, co-incident with a thunderstorm. For unknown reasons, we lost our
	 MtBeauty server within 20 minutes of that time. Despite power being 
	 restored to the area approx 20:15, our server has not come back on line
	 and has defied attempts to access it remotely. A tech has been despatched
	 to investigate.