Adventures with IPv6 Path MTU Discovery

As I mentioned in a recent blog post, after signing up for a Social Security online account, I wasn’t able to access the login page at their website from my home. After some investigation, I discovered that the problem was caused by my “tunneled” IPv6 connection, which only accepts packets of a certain size (maximum transmission unit, or MTU) and no larger. The website wasn’t discovering my MTU correctly, and as a result large packets it sent just weren’t making it to me.
Path MTU Discovery
For those of you unfamiliar with IPv6, the next generation of Internet protocols, here’s some background on how this should work. Most systems send packets of up to about 1500 bytes, the limit imposed by Ethernet. But there are some network paths that can’t handle packets that big. Connections that are tunneled, which means that a packet is enclosed in another packet, can’t handle packets as large because of the size of the enclosing packet. The IPv6 connection to my home is tunneled, enclosed in IPv4 (the predominent, “old” Internet Protocol) packets, because my Internet Service Provider doesn’t yet provide IPv6 service in my area and I wanted to experiment with it.
Two ways that oversize packets can be handled are fragmentation and Path MTU Discovery. IPv4 uses fragmentation, which calls for routers to split incoming packets into two or more pieces and send them onward separately. That’s a fair amount of overhead for the routers, doesn’t address of the root cause of the problem, and is otherwise considered harmful. IPv6 uses Path MTU Discovery, where the router that runs into the MTU limitation sends the originator back a message, called an ICMP Packet Too Big message, asking them to resend at an acceptable MTU. The sender is expected to adjust its packet transmission and send this and subsequent packets at a forwardable size.
My Problem
Unfortunately, the symptoms of an MTU blockage aren’t at all obvious. In my case, the website download just stalled. I did a little work with a debugging tool called Firebug and it showed that link I clicked had redirected to another site, and then nothing. Another tool, Wireshark, that shows the individual packets, showed that I had opened a connection to the website, sent a “TLS Client Hello” packet, (the first step in making a secure https: connection) and then got nothing, except the keep-alive packet. The response to a Client Hello is a Server Hello, which is typically quite large because it contains such things as the server’s certificate. I retried from a nearby coffee shop that happens to have IPv6 for its customers, and it worked from there. It also worked if I manually tweaked my network interface to smaller packets (the server takes the hint and does likewise) or if I just turned off IPv6. Apparently the IPv6 Path MTU Discovery mechanism wasn’t working.
I began with the standard Social Security support resources, which of course aren’t designed for this sort of problem at all. I then consulted with a former colleague of mine from Cisco, Dan Wing, who suggested I try a mailing list dealing with US Federal IPv6 deployment. Unfortunately that didn’t reach any of the right people at SSA. Dan then tried a private mailing list for IPv6 providers, and got a lot of attention at very high levels. The person responsible for IPv6 deployment at Social Security reached out to me, and convened a conference call, including his firewall and load balancer engineers, during which I reproduced the problem. They reported that they had not received any Packet Too Big (PTB) packets.
I opened a case with Hurricane Electric, my IPv6 tunnel provider (a free service, I might add). They reproduced the problem and sent traces indicating that they are sending PTBs in this specific case. The question became one of where the packets are being dropped.
I traced the route to Social Security’s website, and wanted to confirm that the last hop on the traceroute was, in fact, theirs. Their firewall engineer then noticed that a network, which turned out to be the one from which the Hurricane Electric PTB packets was originating, was blocked due to a DDoS attack that had occurred last summer. The block was removed (temporarily, for now) and the problem cleared. Social Security is currently determining whether it can stay removed, or what should be put in its place that won’t cause collateral problems.
Lessons
I learned a great deal about IPv6 from this experience, which was much of my motivation for deploying IPv6 in the first place. But that’s not everybody’s motivation, and we need IPv6 to work reliably for people who aren’t network engineers with enough time, persistence, and helpful friends to solve these problems. Here are a few take-aways from this particular experience:
- Path MTU Discovery failures cause problems that aren’t always obvious and easy to spot.
- Especially at this stage of IPv6 deployment where more people have to use tunnels to get IPv6 service, having Path MTU Discovery work correctly is very important and should be included in testing (if it isn’t already).
- Bear in mind that PTB packets may come from intermediate routers in the network, typically not from the user’s endpoint, in designing firewall rules and access lists.
- Permit PTB packets wherever possible, including access lists used to mitigate DoS attacks (when this is consistent with the type of DoS attack, of course)
- Few Path MTU Discovery testing resources seem to exist on the Web. It would be nice to have some/more. As it happens, they probably wouldn’t have helped here since the problem was specific to particular IPv6 networks, but it would have helped eliminate some possibilities more easily.
Acknowledgements
I’d like to especially thank the many folks from the Social Security Administration Division of Network Engineering/Office of Telecommunications and Network Operations, whose responsiveness and commitment to getting this fixed were superb. Thanks also to Hurricane Electric, both for their free IPv6 tunnel service and willingness to provide support despite the fact that is is free, and to the folks on the IPv6 provider mailing list who jumped in to help me find the right people to contact. Special thanks to Dan Wing for using his contacts and for providing encouragement and advice along the way.
For security, this problem (how to get hold of someone clueful) has been largely solved. Every large organization has an email address that is easily findable to report security issues. In the absence of this, there is no reason to delay exposing a discovered vulnerability.
Unfortunately, the solution in http://xkcd.com/806/ doesn’t seem to be widely implemented (if at all).
I don’t know how to map the security solution onto the general network connectivity / ipv6 specific problem space.
Turns out that IPv6 tunneled connections to ssa.gov, pay.gov and a host of other federal sites are not working as of today for the same reasons as this article documents. No idea who to contact or whether it will every be resolved as the issue has been ongoing now for at least a couple of YEARS that I am personally aware of. In addition, Netflix and Yahoo are also both suffering from similar problems that dues to the large number of IPv6 resources in the background I have been unable to isolate which specific IPv6 networks are affected. This all goes to show the amateur/inexperienced networks engineers that work at these places.
Sorry to hear that the problem has returned. I’m fortunately on a native IPv6 connection so I’m not subject to the problem any more. Unfortunately, blocking of ICMP packets by firewalls causes more collateral damage for IPv6 than it does for IPv4, and the problems can be subtle as well (as in this case where the problem only shows up for MTU-constrained paths).
I’m not sure what to suggest. Previously I had some very high-level help, and the problem didn’t stay fixed. Not sure what else we can do.