Archive for August, 2008

Large file uploads over HTTP - the final solution (I think)

August 26th, 2008 3 comments

Problem statement: HTTP sucks for file uploads.

You know it. I know it. The problems?

  • No resuming
  • POST multipart/form-data bloats the size of the file due to encoding
  • Slow connections typically time out on large files
  • Any server reset or other network "burp" on the path from client to server effectively kills the upload
  • People have had moderate success by tuning their webserver and PHP to accept large POSTs, and in general, it works - but not for everyone and it suffers from everything previously noted.

What would the ideal file upload experience support?

  • It should resume after manual pause, a server reset, a timeout, or any other error.
  • It should allow for multiple files being uploaded at once.
  • It should work transparently over HTTP - which means proxies will support it like any normal web request, it can be done over HTTPS (SSL), it will reuse cookies and standard HTTP authentication methods.
  • (Ideally!) the browser would handle this itself without requiring Java, Flash, or any other applets.

With all this in mind, I somehow stumbled across the idea (roughly posted here), based on the time-tested lessons of Usenet with its NZB files, and of BitTorrent. The main idea? Splitting the file up into manageable segments. There's some other logic too, but that's the core of it.
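
A minimal sketch of the segmenting step, in Python purely for illustration. The post doesn't settle on a segment size or a hash, so `SEGMENT_SIZE`, the `build_manifest` name and the choice of SHA-256 are all assumptions here, not part of the design:

```python
import hashlib

SEGMENT_SIZE = 256 * 1024  # hypothetical segment size; the post does not fix one


def build_manifest(data: bytes, segment_size: int = SEGMENT_SIZE) -> dict:
    """Split data into fixed-size segments and describe each one."""
    segments = []
    for segment_id, offset in enumerate(range(0, len(data), segment_size)):
        chunk = data[offset:offset + segment_size]
        segments.append({
            "id": segment_id,
            "size": len(chunk),
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    return {
        "filesize": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # hash of the whole file
        "segments": segments,
    }


# 1000 bytes in 300-byte segments: three full segments plus a 100-byte tail.
manifest = build_manifest(b"x" * 1000, segment_size=300)
print([s["size"] for s in manifest["segments"]])  # [300, 300, 300, 100]
```

The last segment is simply whatever is left over, which is why each segment carries its own byte size.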

Why do I claim this is the final solution?

  • It can reuse the same HTTP/HTTPS connection, so proxies and HTTP authentication can be honored.
  • It doesn't care about the speed of your connection - because the segments are small, it's easier to get them to the server, and each piece can be confirmed one step at a time. No more having to start from the beginning due to a failure or timeout.
  • It will support multiple files at once. The server could (although we might not implement it this way) support multi-threaded uploading of the same file, too, just like BitTorrent or Usenet downloading - upload multiple pieces at the same time and assemble them in the end. We're trying to make a decision whether or not we want to do that right now. The fundamental difference is an implementation detail on the server end.
  • It allows for any client that can split a file up, hash it, encode it and upload it via POST
  • It will still require an applet, since browsers have no support for anything but standard file upload semantics (Although this would be a neat thing to get into a specification)
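
To make the resume and out-of-order points concrete, here is a rough Python sketch. It assumes the server keeps received segments keyed by segment id and that the client has some way to learn which ids already arrived; the post doesn't specify either, so `missing_segments` and `assemble` are hypothetical helpers:

```python
def missing_segments(expected_ids, received_ids):
    """Return the segment ids that still need to be sent, in order."""
    received = set(received_ids)
    return [sid for sid in expected_ids if sid not in received]


def assemble(segments_by_id):
    """Join segments in id order, regardless of the order they arrived in."""
    return b"".join(segments_by_id[sid] for sid in sorted(segments_by_id))


# Segments 2 and 0 made it to the server before the connection dropped.
stored = {2: b"cc", 0: b"aa"}
todo = missing_segments(range(4), stored)  # only 1 and 3 are left to send

# The remaining segments can arrive in any order (or in parallel).
stored[3] = b"dd"
stored[1] = b"bb"
print(todo, assemble(stored))  # [1, 3] b'aabbccdd'
```

Because assembly only depends on the segment ids, sending pieces one at a time or several at once is the same thing from the server's point of view.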

What's required, how does it work?

As of right now, this is what I have down (it has changed already since the PHP post):

  1. The client contacts the server to begin the transaction. It supplies the following information:
    • Action = "begin"
    • Final filesize
    • Final filename
    • Final file hash (SHA256 or MD5, still haven't determined which one)
    • A list of all the segments - their segment id, byte size, hash (again SHA256 or MD5) - XML or JSON or something
  2. Server sends back "server ready, here's your $transaction_id"
  3. Client starts sending the file, one segment at a time, with the following information:
    • Action = "process"
    • Transaction ID = $transaction_id
    • Segment ID = $segment_id
    • Content = base64 or uuencoded segment (for safe transit)
  4. Server replies back "segment received, transaction id $transaction_id, segment id $segment_id, checksum $checksum"
  5. Client compares the checksum for $segment_id, if it matches, move on to the next segment. If not, retransmit.
  6. When the client is done sending all the segments, client sends message to the server:
    • Action = "finish"
    • Transaction ID = $transaction_id
  7. Server assembles all the segments (if they're separate) and sends to the client:
    • Transaction ID = $transaction_id
    • Checksum = $checksum of the final file
  8. Client compares the checksum to its own checksum. If it matches, client sends message to server:
    • Action = "complete"
    • Transaction ID = $transaction_id

Voilà, done. I think the "protocol" transmits some extra information that isn't needed, so some of this might need to be cleaned up. This is the initial idea though. Props to Newzbin for inventing NZB files, which were a big influence on this concept.
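
The eight steps above can be sketched end to end. This is Python for illustration only, with an in-memory `UploadServer` class standing in for the real HTTP POST endpoint; the class, the dictionary field names, the transaction id scheme, the retry limit and the choice of SHA-256 and base64 are all assumptions rather than anything decided in this post:

```python
import base64
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class UploadServer:
    """In-memory stand-in for the server; a real one would sit behind HTTP POST."""

    def __init__(self):
        self.transactions = {}

    def handle(self, request: dict) -> dict:
        action = request["action"]
        if action == "begin":  # steps 1-2: hand out a transaction id
            tid = str(len(self.transactions) + 1)  # hypothetical id scheme
            self.transactions[tid] = {"segments": {}, "complete": False}
            return {"transaction_id": tid}
        tid = request["transaction_id"]
        txn = self.transactions[tid]
        if action == "process":  # steps 3-4: store a segment, echo its checksum
            data = base64.b64decode(request["content"])
            txn["segments"][request["segment_id"]] = data
            return {"transaction_id": tid,
                    "segment_id": request["segment_id"],
                    "checksum": sha256(data)}
        if action == "finish":  # steps 6-7: assemble and report the final checksum
            parts = txn["segments"]
            txn["file"] = b"".join(parts[i] for i in sorted(parts))
            return {"transaction_id": tid, "checksum": sha256(txn["file"])}
        if action == "complete":  # step 8: client confirmed the final checksum
            txn["complete"] = True
            return {"transaction_id": tid}
        raise ValueError(f"unknown action: {action}")


def upload(server, filename, data, segment_size=4, max_retries=3):
    chunks = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    begin = {"action": "begin", "filesize": len(data), "filename": filename,
             "hash": sha256(data),
             "segments": [{"id": i, "size": len(c), "hash": sha256(c)}
                          for i, c in enumerate(chunks)]}
    tid = server.handle(begin)["transaction_id"]
    for i, chunk in enumerate(chunks):
        for _ in range(max_retries):
            reply = server.handle({"action": "process", "transaction_id": tid,
                                   "segment_id": i,
                                   "content": base64.b64encode(chunk).decode("ascii")})
            if reply["checksum"] == sha256(chunk):  # step 5: match, move on
                break
        else:  # every attempt came back with the wrong checksum
            raise IOError(f"segment {i} failed after {max_retries} attempts")
    reply = server.handle({"action": "finish", "transaction_id": tid})
    if reply["checksum"] != sha256(data):
        raise IOError("final checksum mismatch")
    server.handle({"action": "complete", "transaction_id": tid})
    return tid


server = UploadServer()
tid = upload(server, "demo.bin", b"hello, segmented world")
print(server.transactions[tid]["file"])  # b'hello, segmented world'
```

Note that this toy server ignores the segment list sent with "begin"; a real one would presumably check each segment's size and hash against it before acknowledging.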

I'm somewhat rushing this post out, hopefully it solicits some feedback. I'm going to be revising this and working with a Java developer to work on a client written in Java. Hopefully someday we'll get one with less overhead. I'll post PHP code as I write it too to handle the server portion of it.

Categories: Development

Updates on home storage solutions

August 17th, 2008 No comments

For a while I was looking into and hoping to go the eSATA route. Immature chipsets and lack of OS support have somewhat kept that idea frozen.

I want to use ZFS for a filesystem. Or a filesystem -like- ZFS. Currently there are no others out there like it. There is Btrfs, and there is another one I thought was picking up steam (although I can't remember its name for the life of me) - neither of those is stable yet, however. ZFS is still not as stable as I wish on FreeBSD. It won't run natively on Linux, and I don't think it's very stable either. The only true way to get a stable filesystem like ZFS is to in fact run ZFS on Solaris.

I was not excited to try Solaris. It used to be a joke to call it "Slowaris" - I remember the old days of using random UNIX shells and hating Solaris boxes because I could hardly run or compile anything. However, that's changed somewhat now. I took the plunge and installed SXCE (Nevada build 94) since Solaris 10u5 did not support the new CIFS implementation. So far, I've learned a little bit here and there about Solaris system administration and I've been using ZFS to create some snapshots, filesystems, etc. It is so easy even my mom could handle it. Not to mention Solaris has some pretty neat tools like the Solaris Fault Manager, which I have crontabbed to run every 30 minutes and email me if -any- hardware faults get reported. So I have this great box sitting there running the best filesystem possible integrity-wise, and it is also damn quiet. It's not the small form factor I would have liked, though.

So I began looking into trying to get a small form factor ZFS box. I might be able to, if I want to hack up a Shuttle style case (see Udat at Mashie Design) - there are Mini-ITX motherboards now with 6 SATA ports onboard, which would allow for a 5 drive RAIDZ1 + maybe a Compact Flash card for a boot drive. However, that requires case modding and can only fit 5 drives. I'm not sure I really want to try all that.

Instead, even Mashie himself has admitted to moving into larger form factors for storage boxes (I believe he's using a CM Stacker nowadays) - and from a space perspective, it probably does make the most sense.

Currently I'm exploring going with a full-size case that could hold 15 drives, or a mid-size case that could hold 10 drives (not including optical + boot).

I think I may have found a winner, for the mid-size option. Lian-li has a self-proclaimed "silent" chassis that has 9 bays (which means 6 bays for 2x5-in-3 modules) + optical + 1 boot disk in the spare 5.25" bay. Roughly 8-9TB usable in a mid-size case that would be about as silent as it can get. It even has a front door on it. Actually, there's a second place one - this one is much more extensible, but has no door on it, which I think would help shield any noise coming from these 5-in-3 modules. See here. Cooler Master also has a case like that too - again, no door on the front. I wish I had local access to all of these cases to try each of them out. Right now I have to order them online, and then pay possible restocking fees, and at least the cost of shipping the product back. I'm tired of that back and forth game. I've had to do it too many times in the past.
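
For what it's worth, the "roughly 8-9TB usable" figure lines up with ten drives if you assume 1 TB each (the drive size isn't stated above, so that is a guess). A quick Python back-of-the-envelope check, using only the fact that RAIDZ1 gives up one drive per vdev to parity and ignoring filesystem overhead:

```python
def raidz1_usable_tb(drives_per_vdev: int, vdevs: int, drive_tb: float) -> float:
    """Rough usable capacity: each RAIDZ1 vdev loses one drive's worth to parity."""
    return (drives_per_vdev - 1) * vdevs * drive_tb


# Ten drives of an assumed 1 TB each, in two possible layouts.
print(raidz1_usable_tb(5, 2, 1.0))   # two 5-drive RAIDZ1 vdevs -> 8.0
print(raidz1_usable_tb(10, 1, 1.0))  # one 10-drive RAIDZ1 vdev -> 9.0
```

The two layouts trade a terabyte of space for the ability to survive one failed drive in each 5-in-3 module instead of one failure overall.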

Lian-li also has a full-size chassis that already includes 10 internal drive bays, + 5x 5.25" front bays. Those extra bays could be used for more drives too. So many options... I'm trying to determine the amount of space I want to use in my office and how large and bulky I want these machines to be. Ideally I would like as few CPUs as possible - no need to have full-out operating systems installed and to manage all that. I'm just tired of all my equipment making noise, getting hot, failing, and data corrupting due to failure or bit-rot. It's time to upgrade and streamline.

I've pretty much examined every chassis at Lian-li, Cooler Master and Antec. I have an Antec P182 right now. It's great and quiet, but does not support as many drives as I want for these next boxes. Stay tuned as I pull the trigger and build another storage box soon. Perhaps I'll share some pictures and specifications of the storage box I just built, which is for "off site" daily snapshots of my hosting infrastructure and some other servers I administer.

These next machines will be for my own personal use... and now I've gained some good knowledge on what to expect. It's been a while since I've built a normal-sized machine, as I've been a Shuttle XPC user for years now :)

Categories: Toys

nginx + WordPress - redux

August 17th, 2008 1 comment

Note: this has been outdated once again. Now it can be simplified with one simple directive shown here.


There's been a minor tweak required in my original WordPress+nginx rewrite rule post.

I think we've finally got it right now. I've been exchanging emails ad nauseam with Igor and I believe the behavior now works properly (part of the exchange was regarding PHP+basic http auth, there were some bugs around the regexps/nested location blocks or something).

Anyway, here is a perfect working example, running this very website:

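server {
   listen 80;
   server_name michaelshadle.com;
   index index.php;
   root /home/mike/web/michaelshadle.com/;
   include /etc/nginx/defaults.conf;
   include /etc/nginx/expires.conf;
   # Anything that isn't a real file falls through to WordPress, which gets
   # the original URI in q; the "=" lets WordPress set the response code.
   error_page 404 = /wordpress/index.php?q=$request_uri;
   # Protect the admin area with basic HTTP auth. ^~ stops nginx from checking
   # the regex location further down, so PHP needs its own nested block here.
   location ^~ /wordpress/wp-admin {
      auth_basic "wordpress";
      auth_basic_user_file /home/mike/web/michaelshadle.com/.htpasswd;
      location ~ \.php$ {
         fastcgi_pass 127.0.0.1:11000;
      }
   }
   # All other PHP requests go straight to the FastCGI backend.
   location ~ \.php$ {
      fastcgi_pass 127.0.0.1:11000;
   }
}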

Likewise, you can also omit the basic HTTP authentication if you don't think you need it:

server {
   listen 80;
   server_name michaelshadle.com;
   index index.php;
   root /home/mike/web/michaelshadle.com/;
   include /etc/nginx/defaults.conf;
   include /etc/nginx/expires.conf;
   # Hand anything that would 404 to WordPress, original URI in q.
   error_page 404 = /wordpress/index.php?q=$request_uri;
   # PHP requests go to the FastCGI backend.
   location ~ \.php$ {
      fastcgi_pass 127.0.0.1:11000;
   }
}

Note: this does require a patched version of 0.7.10 - I assume Igor will put these changes into 0.7.11. Some of the changes required include the basic HTTP auth stuff. Also, using error_page 404 used to generate noise in the error log. That was fixed as of 0.7.9 or 0.7.10, so you will no longer see script-handled 404s in your error log.

This should be the last change needed to run WordPress properly under nginx. This has the approval of Igor, the creator - you cannot get better than that. (Note: this is WordPress 2.6.1, but I have not seen any reason the rewrite rule would have changed since even before 2.x.)

Let me know if it doesn't work! Or better yet, let the nginx mailing list know :)

Categories: nginx, WordPress

Simple WordPress hack - redirect the index.php!

August 16th, 2008 No comments

I don't know how or when, but I wound up getting indexed with index.php in some of my URLs.

For some reason, WordPress hasn't decided that they should parse and remove that. So for the interim, I've decided to finally throw in a couple quick lines of code to do the trick. Throw it in any plugin you want or make a standalone file, it works fine with my 2.6.1 so far.

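add_action('init', 'chop_index');

// Redirect any URL ending in index.php to the same URL without it.
function chop_index() {
   if (preg_match('/index\.php$/', $_SERVER['REQUEST_URI'])) {
      // Swap the trailing index.php for a slash...
      $url = preg_replace('/index\.php$/', '/', $_SERVER['REQUEST_URI']);
      // ...then collapse the double slash that leaves at the end.
      $url = preg_replace('/\/\/$/', '/', $url);
      // Permanent (301) redirect so search engines update their index.
      header("Location: http://".$_SERVER['HTTP_HOST'].$url, true, 301);
      exit();
   }
}

Categories: WordPress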

Netflix - the company that actually gives a crap

August 13th, 2008 No comments

You know, Netflix is neat. Not only is their website pretty simple and clean, the company itself seems to "do right" by its customers too. Today is the second time they've made me feel warm and fuzzy being a customer. The first time was when they decided to change their rate plans to be cheaper and automatically started charging me less. A lot of companies drop the price on their monthly rates and you have to re-signup or call and complain, etc...

Dear Michael,

Great news! We're lowering the price of your 3 DVDs out at-a-time plan to $16.99 a month plus applicable taxes. Now you can enjoy Netflix for less!

You don't need to do a thing - except pay less. Your membership will automatically move to the lower price and be reflected in your Membership Terms and Details. The lower price will take effect beginning with your statement on or after July 23, 2007.

Membership Terms and Details: http://www.netflix.com/Terms

Your $16.99 plan not only gives you 3 DVDs out at-a-time but you can also watch 17 hours of movies and TV episodes instantly on your PC each month - for no additional charge.

Check it out: http://www.netflix.com/WatchNow

Enjoy!
Your friends at Netflix

Now that is cool. But now they've given me another reason to praise them. They're reporting disc shipment issues, and to be nice about it, they've decided to issue a credit since we're technically losing a few days of service. They didn't have to do that, but they did.

I just received this email today:

We're Sorry DVD Shipments Are Delayed

Dear Michael,

Our shipping system is unexpectedly down. We received a DVD back from you and should have shipped you a DVD, but we likely have not. Our goal is to ship DVDs as soon as possible, and we will keep you posted on the status of your DVD shipments.

We are sorry for any inconvenience this has caused. If your DVD shipment is delayed, we will be issuing a credit to your account in the next few days. You don't need to do anything. The credit will be automatically applied to your next billing statement.

Again, we apologize for the delay and thank you for your understanding. If you need further assistance, please call us at 1-888-638-3549.

-The Netflix Team

Categories: Consumerism