The Life and Times of Michael Shadle


Large File Uploads Over HTTP - The Final Solution (I Think)
Tuesday, August 26th, 2008 at 8:18 pm

Problem statement: HTTP sucks for file uploads.

You know it. I know it. The problems?

What would the ideal file upload experience support?

With all this in mind, I stumbled across an idea (roughly posted here) based on time-tested lessons from Usenet, NZB files, and BitTorrent. The main idea? Split the file into manageable segments. There is some other logic too, but that's the core of it.
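To make the segment idea concrete, here is a rough Python sketch of the splitting step. It is only an illustration under my own assumptions: the function name `build_manifest`, the field names, the 4-byte segment size (tiny on purpose), and the choice of SHA256 are all hypothetical, not a settled format.

```python
import hashlib
import json

SEGMENT_SIZE = 4  # assumption: absurdly small so the output is easy to read


def build_manifest(data: bytes, filename: str, segment_size: int = SEGMENT_SIZE) -> dict:
    """Describe a file as whole-file metadata plus a list of fixed-size segments."""
    segments = []
    for segment_id, offset in enumerate(range(0, len(data), segment_size)):
        chunk = data[offset:offset + segment_size]
        segments.append({
            "id": segment_id,                             # position of the segment in the file
            "size": len(chunk),                           # the last segment may be shorter
            "sha256": hashlib.sha256(chunk).hexdigest(),  # lets each piece be verified alone
        })
    return {
        "filename": filename,
        "filesize": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),       # hash of the complete file
        "segments": segments,
    }


manifest = build_manifest(b"hello world", "hello.txt")
print(json.dumps(manifest, indent=2))
```

Because every segment carries its own size and hash, a bad transfer only costs one segment's worth of retransmission instead of the whole file.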

Why do I claim this is the final solution?

What's required, how does it work?

As of right now, this is what I have down (it has changed already since the PHP post):

  1. The client contacts the server to begin the transaction. It supplies the following information:
    • Action = "begin"
    • Final filesize
    • Final filename
    • Final file hash (SHA256 or MD5, still haven't determined which one)
    • A list of all the segments - their segment id, byte size, hash (again SHA256 or MD5) - XML or JSON or something
  2. Server sends back "server ready, here's your $transaction_id"
  3. Client starts sending the file, one segment at a time, with the following information:
    • Action = "process"
    • Transaction ID = $transaction_id
    • Segment ID = $segment_id
    • Content = base64 or uuencoded segment (for safe transit)
  4. Server replies back "segment received, transaction id $transaction_id, segment id $segment_id, checksum $checksum"
  5. Client compares the checksum for $segment_id. If it matches, it moves on to the next segment. If not, it retransmits.
  6. When the client is done sending all the segments, client sends message to the server:
    • Action = "finish"
    • Transaction ID = $transaction_id
  7. Server assembles all the segments (if they're separate) and sends to the client:
    • Transaction ID = $transaction_id
    • Checksum = $checksum of the final file
  8. Client compares the checksum to its own checksum. If it matches, client sends message to server:
    • Action = "complete"
    • Transaction ID = $transaction_id
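
The eight steps above can be sketched end to end in Python. This is a toy, not the real thing: `UploadServer.handle()` stands in for a single HTTP request, everything lives in memory, and the class name, message keys, retry limit, SHA256 choice, and the reply to "complete" are my assumptions for illustration only.

```python
import base64
import hashlib
import uuid


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class UploadServer:
    """In-memory stand-in for the server; handle() plays the role of one HTTP request."""

    def __init__(self):
        self.transactions = {}

    def handle(self, msg: dict) -> dict:
        action = msg["action"]
        if action == "begin":
            # Steps 1-2: remember the announced file, hand back a transaction id.
            tid = uuid.uuid4().hex
            self.transactions[tid] = {"manifest": msg, "segments": {}, "complete": False}
            return {"status": "ready", "transaction_id": tid}
        tid = msg["transaction_id"]
        txn = self.transactions[tid]
        if action == "process":
            # Steps 3-4: decode the segment, store it, echo back its checksum.
            body = base64.b64decode(msg["content"])
            txn["segments"][msg["segment_id"]] = body
            return {"status": "segment received", "transaction_id": tid,
                    "segment_id": msg["segment_id"], "checksum": sha256(body)}
        if action == "finish":
            # Steps 6-7: assemble segments in id order, report the final checksum.
            txn["data"] = b"".join(txn["segments"][i] for i in sorted(txn["segments"]))
            return {"transaction_id": tid, "checksum": sha256(txn["data"])}
        if action == "complete":
            # Step 8: the client confirmed the checksum (this reply is an assumption).
            txn["complete"] = True
            return {"status": "ok", "transaction_id": tid}
        raise ValueError(f"unknown action: {action}")


def upload(server: UploadServer, data: bytes, filename: str,
           segment_size: int = 4, max_retries: int = 3) -> str:
    chunks = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    reply = server.handle({
        "action": "begin", "filename": filename, "filesize": len(data),
        "hash": sha256(data),
        "segments": [{"id": i, "size": len(c), "hash": sha256(c)}
                     for i, c in enumerate(chunks)],
    })
    tid = reply["transaction_id"]
    for segment_id, chunk in enumerate(chunks):
        for _ in range(max_retries):
            reply = server.handle({
                "action": "process", "transaction_id": tid, "segment_id": segment_id,
                "content": base64.b64encode(chunk).decode("ascii"),
            })
            if reply["checksum"] == sha256(chunk):  # step 5: match, so move on
                break
        else:
            raise RuntimeError(f"segment {segment_id} kept failing its checksum")
    reply = server.handle({"action": "finish", "transaction_id": tid})
    if reply["checksum"] != sha256(data):
        raise RuntimeError("assembled file does not match the local checksum")
    server.handle({"action": "complete", "transaction_id": tid})
    return tid


server = UploadServer()
tid = upload(server, b"hello world", "hello.txt")
print(server.transactions[tid]["data"])
```

A real client would swap `server.handle(...)` for an HTTP POST and would want to resume from the last confirmed segment after a dropped connection, which this sketch does not attempt.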

Voilà, done. I think the "protocol" transmits some extra information that isn't needed, so some of this might need to be cleaned up. This is the initial idea though. Props to Newzbin for inventing NZB files, which were a big influence on this concept.

I'm rushing this post out somewhat; hopefully it solicits some feedback. I'm going to be revising this and working with a Java developer on a client written in Java. Hopefully someday we'll get one with less overhead. I'll also post PHP code for the server portion as I write it.


One Response to "Large File Uploads Over HTTP - The Final Solution (I Think)"

  1. On August 27th, 2008 at 8:50 am, mike says:

    In layman's terms: I tried to turn it into a conversation, and I think I determined that I could send even less information back and forth.

    Client says: "I have $filename and want to upload it" (action = 'begin')
    Server says: "I am ready, here is your $transaction_id" (status = 'OK', transaction = 'transaction id')

    Client says: "Here is segment $segment_id for transaction $transaction_id; it has $size bytes, its MD5 checksum is $checksum, and the body is uuencode($body)" (action = 'process')
    Server says: "I have received $segment_id for transaction $transaction_id" (status = 'OK', transaction = 'transaction id'). The status reports whether the segment arrived successfully. Repeat these two messages as many times as needed.

    Client says: "I'm all done with $transaction_id now. Final full file checksum is $checksum and $size bytes" (action = 'finish')
    Server says: "I have received $transaction_id in full" (status = 'OK', transaction = 'transaction id')

    Client now knows the server has received the file in full and moves on to the next file (if uploading multiple files)

    This is starting to feel more and more like parts of a Usenet client reused to work with HTTP :)
