After a bit of debugging it turned out that this issue was related to cookies. The specific URL we were trying to load went through a number of redirects before ending up at the final page. The problem was that the first redirect set a cookie, and that needed to be present for the further redirects to work. By default Java, and hence GATE, doesn't maintain cookies across requests, as each connection is handled independently.
If you are using GATE in an embedded context, then it is trivial to add support for cookies using the default Java cookie handler. This is a JVM level setting so once configured in your own code, all requests made by GATE to load documents will also gain support for handling cookies. The entire solution is the following single line of code:
If you load the Groovy plugin into GATE Developer you can then access the Groovy Console from within the tools menu. Simply pasting that single line of code into the console and executing it is enough to add the cookie support within that instance of GATE. It's slightly annoying that it won't persist across multiple instances of GATE, but as it's such a simple trick hopefully it's easy enough to apply when needed.