Multipart Download and Byte Ranges
The why and how of it.
Let's break down what happens when you click a button to download a file online. Your browser sends an HTTP GET request to the website's web server, asking for the object, and the server responds with a "200 OK" ("you're good to go") with the requested object in its body. The browser takes care of sending the requests and processing the responses (pretty smart-ass). You can track all of this in the Network tab of your browser's developer tools (Ctrl + Shift + I).

Typically a server sends out the entire object packed into one response message, which we download. What if we could split the object into fragments and download them simultaneously? Picture a queue in front of an ice-cream shop: it stretches a few meters, and serving each customer takes a few minutes. Boy oh boy, that's gonna take you a while till you get your cone. What if, instead of one long queue, we had three counters serving three parallel queues at the same rate? That is multipart download in a nutshell: break the file up, grab the pieces concurrently, piece them back together and hand the result to you, voila. We all use download managers to get our downloading done quickly and efficiently, don't we? This is the underlying principle. Now how do we get this done?
Byte-Range Requests
HTTP supports what are called byte-range requests (or simply range requests). We add an extra Range header to our HTTP request, in the format `Range: bytes=0-1000`. Both ends are inclusive, so this asks for the first 1001 bytes. The total size of the object does not go in the request; it comes back in the server's Content-Range response header, e.g. `Content-Range: bytes 100-200/2700`, where 2700 is the size of the object. Also known as byte serving, range requests let you request just a portion of a file instead of downloading the entire file. This is very helpful when streaming large files, especially video files: you can skip to the part you wish to view without waiting for the entire file to download, because the player only has to send a range request for that portion of the video. It's just as useful when viewing PDFs: you download just the portion you are currently viewing and nothing more!
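To make the inclusive ends concrete, here is a minimal Python sketch (standard library only) that builds Range values and splits an object into the byte ranges a multipart download would request. The helper names `range_header` and `split_ranges` are hypothetical, not part of any library.

```python
from typing import List, Tuple


def range_header(start: int, end: int) -> str:
    """Value for a Range header asking for bytes start..end, both inclusive."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return f"bytes={start}-{end}"


def split_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split an object of `size` bytes into `parts` contiguous inclusive ranges."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")
    parts = min(parts, size)  # never produce an empty range
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        # The first `extra` ranges get one extra byte to absorb the remainder.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges
```

For example, `range_header(0, 1000)` returns `"bytes=0-1000"` (1001 bytes), and `split_ranges(2700, 3)` returns `[(0, 899), (900, 1799), (1800, 2699)]`: one Range header per part.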
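The server has to support range requests as well. You can check this by sending a HEAD request, which asks only for the headers of the response and leaves out the content (with cURL, a command-line tool, that is `curl -I` followed by the URL). cURL can also send a GET request with our Range header; `-H` adds the header and `-i` prints the response headers along with the body:

```
curl -i -H "Range: bytes=0-100" "http://demo8127239.mockable.io/hi"
```

This is translated into (cURL also adds a few headers of its own, such as User-Agent and Accept):

```
GET /hi HTTP/1.1
Host: demo8127239.mockable.io
Range: bytes=0-100
```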
For the HEAD request, the server responds with

```
HTTP/1.1 200 OK
Accept-Ranges: bytes
```

if range requests are supported. Otherwise, it responds with

```
HTTP/1.1 200 OK
Accept-Ranges: none
```
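The same HEAD check can be scripted. This is a minimal sketch using only Python's `urllib.request`; the function names are hypothetical, and the network call is kept separate from the header inspection so the logic can be tried offline.

```python
import urllib.request


def accepts_ranges(headers) -> bool:
    """True if the response headers advertise byte-range support.

    A missing Accept-Ranges header is treated as "not advertised" (False).
    """
    value = headers.get("Accept-Ranges", "")
    return "bytes" in value.lower()


def check_range_support(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and inspect Accept-Ranges (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return accepts_ranges(response.headers)
```

`check_range_support("http://demo8127239.mockable.io/hi")` would run the check against the article's demo endpoint, which may no longer be online.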
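However, if a GET request with a Range header is placed, we get either a 200 OK response, if byte ranges aren't supported, or a 206 Partial Content status:

```
HTTP/1.1 206 Partial Content
Content-Range: bytes 0-100/343
Content-Length: 101
Content-Type: application/json
```

Content-Range tells us we received bytes 0 through 100 of a 343-byte object, and Content-Length is the size of this partial body: 101 bytes. If the server doesn't support byte ranges, it ignores the Range header and sends out the entire file instead.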
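A client has to tell these two outcomes apart before trusting the body. Below is a minimal Python sketch (hypothetical helper names, standard library only) that parses a Content-Range value and fetches one range, failing loudly if the server answers with something other than 206.

```python
import re
import urllib.request

_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str):
    """Parse 'bytes 0-100/343' into (0, 100, 343); total is None for '*'."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


def fetch_range(url: str, start: int, end: int, timeout: float = 10.0) -> bytes:
    """GET bytes start..end (inclusive); needs network access."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:
            # A 200 here means the server ignored Range and sent the whole object.
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        got_start, got_end, _ = parse_content_range(response.headers["Content-Range"])
        body = response.read()
        # This sketch treats any mismatch (even a range clamped at end of file) as an error.
        if (got_start, got_end) != (start, end) or len(body) != end - start + 1:
            raise RuntimeError("server returned a different range than requested")
        return body
```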
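Multiple ranges can also be requested together:

```
Range: bytes=0-100, 200-300
```

A server that supports this answers with a single 206 response of type multipart/byteranges, in which each part carries its own Content-Range header.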
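Building such a multi-range value is plain string formatting. In this small Python sketch (the function name is hypothetical), each range is a `(start, end)` pair; `end=None` means "to the end of the file" and `start=None` means a suffix range, i.e. the last `end` bytes.

```python
def multi_range_header(ranges) -> str:
    """Build one Range value from (start, end) pairs, ends inclusive."""
    specs = []
    for start, end in ranges:
        if start is None:
            specs.append(f"-{end}")  # suffix range: the last `end` bytes
        elif end is None:
            specs.append(f"{start}-")  # open-ended: from `start` to the end
        else:
            specs.append(f"{start}-{end}")
    return "bytes=" + ", ".join(specs)
```

`multi_range_header([(0, 100), (200, 300)])` produces `"bytes=0-100, 200-300"`, the header shown above.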
BROWSER SUPPORT:
Browsers restrict the number of concurrent TCP connections they will open to a particular server (host). These limits apply to HTTP/1.x connections; the browsers and their limits are listed below.
| Browser | Maximum connections per host |
|---|---|
| Internet Explorer® 7.0 | 2 |
| Internet Explorer 8.0 and 9.0 | 6 |
| Internet Explorer 10.0 | 8 |
| Internet Explorer 11.0 | 13 |
| Firefox® | 6 |
| Chrome™ | 6 |
| Safari® | 6 |
| Opera® | 6 |
| iOS® | 6 |
| Android™ | 6 |
This restriction exists because many simultaneous connections to the same server can flood it with requests, which might even lead the server into thinking it is under a denial-of-service (DoS) attack.
Why multipart
I get that the number of chunks you can download concurrently is limited, and the limit is pretty meagre, so how beneficial can it actually be? Bandwidth is typically shared per connection: the server (and the network along the way) treats each connection as a separate flow and gives it its own share. So when we open several connections to a server at once, the server ends up pushing out more data to the same user at a given time. How's that for beneficial, huh? Of course, many users each opening several connections could overload the server or mislead it into thinking it is under attack. This is where the browser's built-in limits come into play, although there are workarounds: Firefox, for example, lets the user set the limit (the network.http.max-persistent-connections-per-server preference in about:config).

Also, when the server you are requesting content from sits on the other side of the planet, the distance adds latency to every response (the propagation delay). In such a scenario, having several connections in flight keeps more data moving during each round trip, which offsets the delay, so yay! The actual gain depends heavily on your internet connection and bandwidth, the distance between you and the server, and the server's own bandwidth (that's a lot of factors coming into play). Still, as long as there are download requests lined up, multipart download saves you time by using all of the available bandwidth, and this is what makes byte ranges a better option than a single-stream download.
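The break, grab concurrently, piece back together loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `fetch(start, end)` is any function you supply that returns bytes `start..end` inclusive (for example one that sends a GET with a Range header), and the thread pool size stands in for the browser's per-host connection limit (6 for most browsers in the table above). The function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Fetch = Callable[[int, int], bytes]  # fetch(start, end) -> bytes start..end inclusive


def parallel_download(fetch: Fetch, size: int, parts: int, max_connections: int = 6) -> bytes:
    """Download `size` bytes as `parts` ranges, at most `max_connections` at a time."""
    if size <= 0 or parts <= 0:
        raise ValueError("size and parts must be positive")

    # Break: split into contiguous inclusive ranges of near-equal length.
    parts = min(parts, size)
    base, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length

    # Grab concurrently: the pool never runs more than max_connections fetches at once.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        chunks = list(pool.map(lambda r: fetch(*r), ranges))

    # Piece it back together: pool.map returns results in input order.
    data = b"".join(chunks)
    if len(data) != size:
        raise RuntimeError(f"expected {size} bytes, got {len(data)}")
    return data
```

Because `fetch` is a parameter, the same function works with a real HTTP client or, as in a quick test, with a fake that slices an in-memory byte string.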
Most download managers support what is called dynamic segmentation (or dynamic fragmenting): a thread is spawned for each byte range. A standalone application might run 8 parallel threads by default, while a browser extension is bound by the number of parallel connections the browser supports (see the table above for your browser's limit). Whenever a thread frees up, a segment that is still waiting, or one that another thread is still downloading, is split so the free thread can take over part of it, keeping the total bandwidth in use all the time. This speeds up your download.
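The paragraph above doesn't say exactly how a segment gets split. One common approach, assumed here, is to halve the segment with the most bytes left: the busy thread keeps the first half and the freed thread takes the second. A minimal Python sketch of that bookkeeping (the names `Segment` and `steal_work` are hypothetical):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    next_byte: int  # first byte of this segment not downloaded yet
    end: int        # last byte of this segment (inclusive)

    @property
    def remaining(self) -> int:
        return self.end - self.next_byte + 1


def steal_work(segments: List[Segment], min_split: int = 1) -> Optional[Segment]:
    """Called when a thread frees up: split the segment with the most bytes left.

    The original segment keeps its first half; the returned new segment (also
    appended to `segments`) covers the second half. Returns None when nothing
    is big enough to split into two halves of at least `min_split` bytes.
    """
    victim = max(segments, key=lambda s: s.remaining, default=None)
    if victim is None or victim.remaining < 2 * min_split:
        return None
    half = victim.remaining // 2
    new = Segment(next_byte=victim.end - half + 1, end=victim.end)
    victim.end = new.next_byte - 1  # shrink the in-progress segment
    segments.append(new)
    return new
```

For instance, with one thread 10 bytes from finishing and another with 80 bytes left in `bytes=100-199`, the freed thread takes bytes 160 to 199 and the busy one now stops at 159.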
Byte-range requests also come to the rescue if your download fails midway. You don't have to download the entire file all over again; you just pick up where you left off.
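Resuming boils down to asking for everything after the bytes you already have, with an open-ended range such as `bytes=500-`. A minimal Python sketch, assuming the partial data sits in a local file (function names are hypothetical):

```python
import os
import urllib.request
from typing import Optional


def resume_header(path: str) -> Optional[str]:
    """Range value that continues a partial download, or None to start fresh."""
    done = os.path.getsize(path) if os.path.exists(path) else 0
    return f"bytes={done}-" if done > 0 else None


def resume_download(url: str, path: str, timeout: float = 30.0) -> None:
    """Fetch the missing tail of `url` into `path` (needs network access).

    If the file is already complete, the server typically answers
    416 Range Not Satisfiable, which urlopen raises as HTTPError.
    """
    headers = {}
    value = resume_header(path)
    if value is not None:
        headers["Range"] = value
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206: the range was honoured, so append; 200: the whole file came back, so overwrite.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while chunk := response.read(64 * 1024):
                out.write(chunk)
```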
Since each chunk is much smaller than the whole file, a chunk that arrives corrupted or fails midway can be re-requested on its own, so an error costs you one chunk rather than the entire download.
Pretty cool, huh?
But wait.
There might be one teeny tiny drawback. When a web page makes a cross-origin request (say, with fetch or XMLHttpRequest) and sets a Range header, the browser first sends what is called a preflight request: an OPTIONS request that checks whether the server's CORS (Cross-Origin Resource Sharing) policy allows the Range header and the requested method. The actual request is placed only after the response to this OPTIONS request explicitly states that the requested method and header are allowed.
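A preflight response that allows range requests might carry headers like these:

```
Access-Control-Allow-Headers: Range
Access-Control-Allow-Methods: DELETE, GET, PATCH, POST, PUT, OPTIONS
Access-Control-Allow-Origin: *
```

Access-Control-Allow-Headers accepts requests that carry the Range header, Access-Control-Allow-Methods accepts requests only for the listed methods, and `Access-Control-Allow-Origin: *` accepts requests from any origin.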
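Preflights were brought in to protect servers from CSRF-style (cross-site request forgery) attacks. Yay for users' security, nay for multipart downloading: each byte-range request has to go through the preflight OPTIONS round trip first, which adds substantial overhead, unless the server lets the browser cache the preflight result with an Access-Control-Max-Age header. :(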
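To see the server side of a preflight, here is a minimal sketch built on Python's `http.server` (the class name `RangeCORSHandler` is a hypothetical choice, and this is not production-grade). It only answers the OPTIONS preflight; a real server would also have to serve the ranged GET itself and include Access-Control-Allow-Origin on that response too. It sends Access-Control-Max-Age so the browser can reuse the answer instead of preflighting every chunk.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RangeCORSHandler(BaseHTTPRequestHandler):
    """Answers CORS preflights so cross-origin pages may send Range headers."""

    def do_OPTIONS(self):
        self.send_response(204)  # No Content: a preflight answer is headers only
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "GET, HEAD, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Range")
        # Let the browser cache this answer for 10 minutes.
        self.send_header("Access-Control-Max-Age", "600")
        self.end_headers()

    def log_message(self, format, *args):  # keep the console quiet
        pass
```

To try it locally, run `ThreadingHTTPServer(("127.0.0.1", 8000), RangeCORSHandler).serve_forever()` and send an OPTIONS request to any path.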
Getting over this.
Browsers don't send a preflight request as long as the generated HTTP request is a "simple" one. Now what counts as simple?
A simple request uses only one of these methods: GET, HEAD, POST.
Apart from the headers the browser sets on its own, it sets only these headers: Accept, Accept-Language, Content-Language, Content-Type (but note the additional requirement below), DPR, Downlink, Save-Data, Viewport-Width, Width.
If it sets the Content-Type header, the value is one of: application/x-www-form-urlencoded, multipart/form-data, text/plain.
Range is not on this list, so a request that sets it gets preflighted.
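As a rough sketch, the rules above fit in a small Python predicate. The names are hypothetical, the header list mirrors the one in the text (browsers and the Fetch standard have revised it over time), and real browsers apply extra conditions, such as limits on header values, that this ignores.

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_HEADERS = {
    "accept", "accept-language", "content-language", "content-type",
    "dpr", "downlink", "save-data", "viewport-width", "width",
}
SIMPLE_CONTENT_TYPES = {
    "application/x-www-form-urlencoded", "multipart/form-data", "text/plain",
}


def needs_preflight(method: str, headers: dict) -> bool:
    """Approximate check: would a cross-origin request with these
    author-set headers be preflighted under the rules listed above?"""
    if method.upper() not in SIMPLE_METHODS:
        return True
    for name, value in headers.items():
        lname = name.lower()
        if lname not in SIMPLE_HEADERS:
            return True  # e.g. Range
        if lname == "content-type":
            mime = value.split(";")[0].strip().lower()  # ignore parameters like charset
            if mime not in SIMPLE_CONTENT_TYPES:
                return True
    return False
```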
Cool.
Me: But how do we get around the Range header?
Byte ranges in the URL: bypass the Range header, you say? Say no more, fam.
Byte ranges can also be set in the URL itself:

```
http://host/dir/foo;bytes=0-499
http://host/dir/foo;bytes=0-99,500-1499,-200
```

The first URL requests the first 500 bytes (0 through 499). The second combines multiple ranges: the first 100 bytes, bytes 500 through 1499, and the last 200 bytes.
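Again, the server must support this. This URL syntax isn't part of the HTTP standard, so it only works on servers that explicitly implement it.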
Even if it doesn't, multipart download still makes a difference when downloading large files. So if you ask me, I'd say multipart for the win!