After a bit of debugging it turned out that this issue was related to cookies. The specific URL we were trying to load went through a number of redirects before ending up at the final page. The problem was that the first redirect set a cookie, and that needed to be present for the further redirects to work. By default Java, and hence GATE, doesn't maintain cookies across requests, as each connection is handled independently.
If you are using GATE in an embedded context, then it is trivial to add support for cookies using the default Java cookie handler. This is a JVM level setting so once configured in your own code, all requests made by GATE to load documents will also gain support for handling cookies. The entire solution is the following single line of code:
java.net.CookieHandler.setDefault(new java.net.CookieManager());
The problem we faced though, was that we wanted to be able to load documents that required cookies from within GATE Developer and that required a little more thought. Whilst we could have just added the code to GATE there are a number of reasons not to (details of which are outside the scope of this blog post) and I wanted to make it easier for all existing GATE users to be able to use cookies without needing to upgrade. The answer is the rather versatile Groovy plugin.
If you load the Groovy plugin into GATE Developer you can then access the Groovy Console from within the tools menu. Simply pasting that single line of code into the console and executing it is enough to add the cookie support within that instance of GATE. It's slightly annoying that it won't persist across multiple instances of GATE, but as it's such a simple trick hopefully it's easy enough to apply when needed.