THE MYSTERY
A large municipality hired ACES to upgrade its telemetry system from MDS serial radios to MDS Ethernet radios. We replaced almost twenty radios and installed two access points. After the replacement, one access point periodically locked up. Cycling power restored operation, but ACES Control Systems Investigators (CSIs) were not able to duplicate the problem.
THE CLUES
On the recommendation of MDS, the CSI upgraded all firmware and swapped out the access points, but neither attempt had any effect. The problem escalated from short communication dropouts to total communications failure. Then, one cold and stormy night, communication ceased and cycling power would not restore operation. A CSI traveled to the site to gather evidence. He found that the access point could not communicate with the master remote, although all other remotes worked fine. He investigated the master remote in the middle of the rainstorm but could find no reason for the failure.
The next day the CSI and his assistant traveled to the master remote. They replaced the 10-year-old antenna in case corrosion or fatigue had degraded it, but that had no effect. Another clue was that the antenna worked for the serial radios but not at the higher Ethernet data rates. The only thing left to check was the antenna cable, and that is where they found the problem.
THE PERP
The CSIs examined the antenna cable and found a secondary lightning arrestor installed in it. When they removed the rubber tape from the arrestor, they found that the connector was loose. After they tightened the connector, the signal improved 200%.
THE SOLUTION
Since the connector was tightened, the access point has not locked up once. The lead CSI surmised that the bad connection at the remote caused the access point to intermittently enter a routine in which it tried to reestablish communication with the failed remote. Eventually this routine took so much time that all other remotes lost communication.
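The lead CSI's theory can be illustrated with a small timing model. The sketch below is not how the MDS access point is known to work. It assumes a hypothetical access point that services its remotes one after another, retries a failed remote several times, and drops any remote it has not reached within a fixed timeout. The function names, the 19-remote list, and every timing value are made up for illustration.

```python
# Hypothetical timing model of one access point servicing its remotes in turn.
# All names and numbers here are illustrative assumptions, not MDS behavior.

REMOTES = ["master"] + [f"remote{i}" for i in range(1, 19)]  # 19 remotes (assumed)


def cycle_time_ms(remotes, bad, ok_ms=20, timeout_ms=500, max_retries=0):
    """Return the time for one full pass over every remote.

    A healthy remote costs ok_ms. A remote in `bad` never answers, so it
    costs one timeout for the first attempt plus one per retry.
    """
    total = 0
    for name in remotes:
        if name in bad:
            total += timeout_ms * (1 + max_retries)
        else:
            total += ok_ms
    return total


def starved_remotes(remotes, bad, link_timeout_ms=2000, **timing):
    """Return the healthy remotes that are dropped because a full pass
    takes longer than link_timeout_ms."""
    if cycle_time_ms(remotes, bad, **timing) <= link_timeout_ms:
        return []
    return [name for name in remotes if name not in bad]
```

With these assumed numbers, a healthy pass takes 380 ms. With the master remote failing and no retries, a pass takes 860 ms and no other remote is dropped. With five retries, a pass takes 3360 ms, which exceeds the 2000 ms timeout, so all 18 healthy remotes lose communication. That matches the pattern in the theory: one bad link degrades the whole network only once the recovery attempts consume enough time.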
POSTMORTEM ANALYSIS
All clues pointed ACES and MDS to the access point as the culprit. In the end it was just a loose connection, one that cost ACES and the customer thousands of dollars in wasted resources. Until the communication path failed completely, we could not diagnose the problem.
Every day parts are replaced and resources are lost trying to find and solve control system problems. At ACES we always try to find the problem before replacing parts. In this case we never suspected the remote, and nothing pointed us to the antenna and cable, especially since the same antenna and cable had worked successfully for ten years as part of the serial radio link.