Observer

Internet Hop in TW router causing drop in connection

Our headquarters is located in Bowling Green, Ohio. (I’ll label this “HQ” from here on out – IP address is 64.31.80.90)

 

We have a branch office in Lebanon, OH using Spectrum Business Class (I’ll label this “BO” – IP address is 98.103.33.38)

 

We have a Site to Site VPN tunnel between HQ and BO. This allows the BO to use the servers at the HQ as well as the phone system. This tunnel has been up and running well for a couple of years.

 

Around July 16th, we started to see the tunnel drop briefly, anywhere from 0-5 times per day. Each drop is short, but it is enough to close connections and drop phone calls, and it is irritating the BO staff members greatly.

 

I started by rebooting routers and firewalls at both ends of the tunnel. The problem continued. I spent literally hours poring over logs in the firewalls looking for something that could explain what I was seeing and could find no indication that anything was wrong. I talked to WatchGuard (the firewall manufacturer); they looked at it and were able to find nothing other than that the firewalls were losing communication with each other (but not with the Internet).

 

I started placing ping traps on a machine at the BO. What I found was that when the tunnel went down, the BO was still reaching the Internet but was dropping pings to the HQ public IP (64.31.80.90).
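A ping trap like this can be roughed out in a few lines of Python. This is only a sketch, not the exact setup used at the BO: `INTERNET_REFERENCE` (8.8.8.8) is an assumed stand-in for "the Internet", the function and enum names are made up for illustration, and the `-W` timeout flag is written for Linux `ping` (other platforms may interpret it differently). A ping exit code can also be misleading on some systems, so treat the result as a hint.

```python
import platform
import subprocess
from enum import Enum

HQ_PUBLIC_IP = "64.31.80.90"      # HQ public IP, taken from the post
INTERNET_REFERENCE = "8.8.8.8"    # assumption: any well-known public host would do


class LinkState(Enum):
    OK = "HQ reachable"
    PATH_TO_HQ_DOWN = "HQ unreachable, Internet still up"
    LOCAL_INTERNET_DOWN = "HQ and Internet both unreachable"


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one echo request with the system ping; True if it exited with 0."""
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        # Assumes Linux-style ping, where -W is a timeout in seconds.
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def classify(hq_reachable: bool, internet_reachable: bool) -> LinkState:
    """Separate 'only the path to HQ is broken' from 'the whole BO circuit is down'."""
    if hq_reachable:
        return LinkState.OK
    if internet_reachable:
        return LinkState.PATH_TO_HQ_DOWN
    return LinkState.LOCAL_INTERNET_DOWN
```

Calling `classify(ping_once(HQ_PUBLIC_IP), ping_once(INTERNET_REFERENCE))` once a second and logging anything other than `LinkState.OK` gives the same kind of evidence described above: the BO's own circuit is fine while the path to HQ is not.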

 

I checked for bandwidth spikes on both ends. The BO traffic is barely using any bandwidth. The HQ normally stays in the 20% range, with very occasional spikes up to 60%, but nowhere near hitting the bandwidth cap.

 

I then ran a traceroute and mapped out the hops. I started running continuous pings on those hops to see where the traffic breaks down. This is where I found a problem. Below is a screenshot of me pinging the hops and capturing a breakdown in the tunnel. What you see is a series of continuous pings from the HQ to the BO. The first square is a ping to a device at the BO through the tunnel. The second is a ping of the WAN of the BO. Each square after that is a ping to the sequential hops between the HQ and the BO. As you can see, the first drop is at the 96.11.184.177 hop, and everything after it fails in a cascade. This capture was taken on 7/25 at 3:04 PM:

 pingPEH.png
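The "first drop, then everything after it fails" reading of the screenshot can be written down as a small check. This is a sketch under assumptions: apart from 96.11.184.177 and the BO WAN address, the hop addresses below are placeholders from the documentation ranges, not the real path, and `cascade_break` is a made-up name. A single silent hop with working hops behind it is ignored on purpose, since many routers simply rate-limit ping replies.

```python
from typing import Optional, Sequence


def cascade_break(hops: Sequence[str], reachable: Sequence[bool]) -> Optional[str]:
    """Return the first hop from which every later hop (and itself) failed.

    `hops` is in path order, source side first. Returns None when the final
    hop still answers, because then the path as a whole is intact.
    """
    if len(hops) != len(reachable):
        raise ValueError("hops and reachable must have the same length")
    break_index = None
    # Walk backwards from the destination while hops keep failing.
    for i in range(len(hops) - 1, -1, -1):
        if reachable[i]:
            break
        break_index = i
    return hops[break_index] if break_index is not None else None


# Placeholder path: only 96.11.184.177 and 98.103.33.38 come from the post.
path = ["192.0.2.1", "192.0.2.2", "96.11.184.177", "198.51.100.1", "98.103.33.38"]
during_drop = [True, True, False, False, False]
print(cascade_break(path, during_drop))
```

Feeding it one round of ping results per second makes it easy to see whether the break point is the same hop every time the tunnel drops.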

 

This pattern repeats each time the tunnel goes down. I also tried it the other way (using a PC at the BO, I pinged the hops to the HQ) and I got similar results, EXCEPT I was able to ping all the hops up until 96.11.184.177, at which point I got a mix of “Request timed out” and “TTL expired in transit”. I checked, and the 96.11.184.177 hop is a TW router. Do you guys have a router flaking out?
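To turn a saved ping log into counts of those two failure types, something like the following could work. It is a sketch that assumes the usual English-language Windows `ping` wording; the sample lines are illustrative of that format, not copied from the actual capture, and the 192.0.2.1 address is a placeholder. "TTL expired in transit" on a plain ping is often a sign of a transient routing loop, which is why it is counted separately from a simple timeout.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns for English-language Windows ping output (an assumption).
REPLY_RE = re.compile(r"Reply from (\S+): bytes=\d+ time[=<](\d+)ms TTL=\d+")
TTL_EXPIRED_RE = re.compile(r"Reply from (\S+): TTL expired in transit")


def classify_ping_line(line: str) -> str:
    """Label one line of ping output as reply, ttl_expired, timeout or other."""
    line = line.strip()
    if REPLY_RE.search(line):
        return "reply"
    if TTL_EXPIRED_RE.search(line):
        return "ttl_expired"
    if line.startswith("Request timed out"):
        return "timeout"
    return "other"


def summarize(lines: Iterable[str]) -> Counter:
    """Count each kind of line, skipping headers and blank lines."""
    kinds = (classify_ping_line(line) for line in lines)
    return Counter(kind for kind in kinds if kind != "other")


# Illustrative lines only, written to match the standard output format.
sample = [
    "Pinging 96.11.184.177 with 32 bytes of data:",
    "Reply from 96.11.184.177: bytes=32 time=23ms TTL=54",
    "Request timed out.",
    "Reply from 192.0.2.1: TTL expired in transit.",
    "Request timed out.",
]
print(summarize(sample))
```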

 

I called and spent 2 hours on the phone with Spectrum support. The woman said that the engineers refused to speak with me without a traceroute of the incidents. I have since gathered some and sent them, and have heard nothing back. Is there anyone there willing to look into this issue for us, please?

SO-TR-0803a.png

13 REPLIES
Sharer

Re: Internet Hop in TW router causing drop in connection

Just out of curiosity, I ran that IP through a looking glass, and routing from the west (slingshotting through Chicago) was getting bad lag spikes coming through the Great Lakes region. From the south (around Nashville) it was solid, though. Wasn't all that surprised to see it, tbh. We've seen such issues before... that region around Ohio has always been a bit of a problem.

Something that may be worth looking into are the options for keepalives and/or idle timeouts. You might be able to tweak things a bit to help mitigate some of the disconnects. I remember having to fiddle with them on the old AS/400s... having to vary Ethernet off/on and all that crud several times a day was a pain.
Observer

Re: Internet Hop in TW router causing drop in connection

TBH the outage only lasts ~5 seconds. If we were running something buffered like video, it probably would not be that noticeable. Unfortunately we have stuff going through the tunnel like VoIP and telnet that is extremely susceptible to packet loss, and this is enough to be extremely irritating. It was all working fine until about 3-4 weeks ago, when we started having daily trouble. It took me forever to track down where the breakdown was. Once I did and went to Spectrum, I've been kind of blown off.....

Sharer

Re: Internet Hop in TW router causing drop in connection

Been ages since I farted around on a WatchGuard... formally left the field in 2006. Could look into the key management keepalive (IKE Keepalive was the option, IIRC). It allows you to set a sort of resync interval... on that recurring cycle it will rebuild the session if it detects stalls. They also started a dead peer detection mechanism, where it sort of does a check before the tunnel transmits or something like that. One creates more traffic because of the "heartbeat" cycle, while the other may inject a touch of delay if the tunnel frequently goes idle.
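A back-of-the-envelope way to think about the keepalive side of this: a peer is normally declared dead only after several probes in a row go unanswered, so the detection time is roughly the probe interval times the number of allowed misses. The sketch below is generic arithmetic, not WatchGuard's actual logic; `interval_s` and `max_failures` are hypothetical names rather than real firewall settings, and probe timing relative to the start of the outage is ignored.

```python
def detection_time_s(interval_s: float, max_failures: int) -> float:
    """Rough time before a silent peer is declared dead."""
    return interval_s * max_failures


def tunnel_torn_down(outage_s: float, interval_s: float, max_failures: int) -> bool:
    """True if an outage of this length would outlast the detection time."""
    return outage_s >= detection_time_s(interval_s, max_failures)


# The ~5 second figure is the outage length reported in this thread;
# the interval and failure counts are made-up values for comparison.
print(tunnel_torn_down(outage_s=5, interval_s=10, max_failures=3))
print(tunnel_torn_down(outage_s=5, interval_s=1, max_failures=3))
```

Even when the tunnel is not torn down, packets sent during the outage are still lost, so this can at best avoid a renegotiation on top of the drop.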

By no means good fixes, but one of them may offer a workaround of sorts.
Observer

Re: Internet Hop in TW router causing drop in connection

I do not think that a keep alive setting on the firewalls is going to help if a hop on the internet stops responding. 

Think of the old 2 tin cans and a string. If someone cuts the string in the middle, no amount of fiddling with the cans is going to help. 

Sharer

Re: Internet Hop in TW router causing drop in connection

Depends on the routing policy... more specifically how quickly routing responds to the conditions surrounding that hop.

Right now it sounds like there may not be a mechanism for recovery for IPsec when the connection stalls... it just drops the connection with no retry. That, or you have exhausted the number of recovery attempts... I seem to recall there was a threshold for that as well. It is all a bit fuzzy for me, but I do recall there were options for guarding against such session instability scenarios. Though they have their limits, they are worth a look if no one has worked on them yet.
Observer

Re: Internet Hop in TW router causing drop in connection

Well, as soon as the hop starts responding again, the VPN tunnel reconnects immediately. 

Bear in mind that the trace that I am posting is not going through the tunnel in any way, it is a straight trace between the two endpoints.  

Re: Internet Hop in TW router causing drop in connection

I'm having a similar problem with my home connection. But this is just basic service in general. Our connection drops 5 times a day, that we notice. The router and extenders have to reconnect because of "loss of internet".
Replaced the modem with no change; same with the router.
It's never down for long. But it's a pain. I wonder if this is TWC testing throttling, that they paid (aka bribed) the FTC Chair, POTUS and others for. Anyway, does anyone know of a monitoring tool I can use? I could set up a ping or tracert with logging, but I think I would get the standard "We at TWC do not believe in QOS, just pay us and shut up" type answer.
Observer

Re: Internet Hop in TW router causing drop in connection

PingPlotter (pingplotter.com) is a traceroute program that can show you where the hops are breaking down.
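If you would rather script the logging yourself, the useful part is collapsing raw per-second ping results into outage windows with a start time and a duration, since that is what support will ask for. Here is a minimal sketch; the `Outage` class, the `find_outages` name, and the timestamps are all made up for illustration, and it assumes you already have a list of (timestamp, success) samples from whatever pinger you use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple


@dataclass
class Outage:
    start: datetime  # first failed sample
    end: datetime    # first good sample afterwards (or last failed one)

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def find_outages(samples: Iterable[Tuple[datetime, bool]]) -> List[Outage]:
    """Group consecutive failed samples into outage windows."""
    outages: List[Outage] = []
    start: Optional[datetime] = None
    last_fail: Optional[datetime] = None
    for timestamp, ok in samples:
        if not ok:
            if start is None:
                start = timestamp
            last_fail = timestamp
        elif start is not None:
            outages.append(Outage(start, timestamp))
            start = None
    if start is not None and last_fail is not None:
        # Log ended while still down: close the window at the last failure.
        outages.append(Outage(start, last_fail))
    return outages


# Made-up samples, one per second, with five failures in a row.
base = datetime(2000, 1, 1, 15, 4, 0)
flags = [True, True, False, False, False, False, False, True, True]
samples = [(base + timedelta(seconds=i), ok) for i, ok in enumerate(flags)]
for outage in find_outages(samples):
    print(outage.start.isoformat(), outage.duration.total_seconds())
```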

 

Observer

Re: Internet Hop in TW router causing drop in connection

Just happened again:

 

08061430.JPG