Wednesday, February 9, 2011

Bulletproof Asset Hosting and CDNs with Rails


After setting up a CDN like CloudFront to quickly serve your Rails site's assets worldwide, you'll probably want Rails to actually link to the assets through the CDN, instead of locally. Rails makes this easy with ActionController::Base.asset_host. The easiest way to use asset_host is by putting the following code in a Rails initializer or production.rb:

ActionController::Base.asset_host = "assets.example.com"

After that, calls like image_tag("logo.png") will produce links like:

<img alt="Logo" src="http://assets.example.com/images/logo.png?1297189653" />

Going past the basics

While this works for very simple cases, more complicated scenarios need more code. If you put the above snippet in an initializer instead of production.rb, it will run in every environment, so you'll have to wrap it in an environment check:

if Rails.env.production?
  ActionController::Base.asset_host = "assets.example.com"
end

Why would you want to do that when you could stick the code in production.rb? For our site, it's because we rely on configuration set up in other initializers, which won't be ready by the time production.rb runs.

YSlow will point out the next thing to fix. Some browsers open only two simultaneous connections to any given hostname, so assets served from a single host queue up behind one another. To let a browser fetch assets faster, you can spread them across several hostnames, which takes something a little more complex with asset_host.

Giving a Proc to asset_host

Besides a string, ActionController::Base.asset_host can also take a proc. This gives us more flexibility when generating these hosts. We can work around the per-hostname connection limit by using a proc that spreads assets across several hosts instead of a single string:

if Rails.env.production?
  # Source is the asset path as a string, request is a request object
  ActionController::Base.asset_host = Proc.new do |source, request|
    # hosts can also be pulled from a config file for more flexibility
    # or skipping the CDN in staging environments
    hosts = ['http://assets1.example.com', 'http://assets2.example.com']

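    # Get a numeric checksum that is consistent for the same source, to make
    # sure that the same asset file always maps to the same asset host
    # (This helps the browser cache these assets). Zlib.crc32 is used because
    # String#hash differs between processes on Ruby 1.9 and later; add
    # require 'zlib' at the top of the file if it isn't already loaded.
    hash = Zlib.crc32(source)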

    # Return a semirandomly selected host from the hosts array
    hosts[hash % hosts.length]
  end
end
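To see why a stable checksum matters here, the host selection can be tried outside Rails. The following is a minimal sketch, not the app's actual code: pick_asset_host is a hypothetical helper name, and it assumes Zlib.crc32 from Ruby's standard library as the checksum, since it returns the same number for the same string in every process.

```ruby
require 'zlib'

# Hypothetical helper (not part of Rails): pick one host for an asset path.
def pick_asset_host(source, hosts)
  # Zlib.crc32 gives the same non-negative integer for the same string in
  # every Ruby process, so the mapping survives restarts and is shared by
  # all app servers.
  hosts[Zlib.crc32(source) % hosts.length]
end

hosts = ['http://assets1.example.com', 'http://assets2.example.com']

first  = pick_asset_host('/images/logo.png', hosts)
second = pick_asset_host('/images/logo.png', hosts)

puts first == second  # true: the same path always lands on the same host
```

Because the result depends only on the path and the list of hosts, changing the order or number of hosts reshuffles the mapping, which is worth keeping in mind for browser caching.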

Dealing with secure pages

The next thing you'll notice, especially if you don't have SSL certificates for your asset hosts, is that browsers start showing mixed-content warnings on secure pages, because those pages load their assets over plain HTTP. This is easily fixed by adding another condition to the above proc, so that secure pages fetch assets from the host serving the page:

if Rails.env.production?
  # Source is the asset path as a string, request is a request object
  ActionController::Base.asset_host = Proc.new do |source, request|

    if request.ssl?
      "#{request.protocol}#{request.host_with_port}"
    else
      # hosts can also be pulled from a config file for more flexibility
      # or skipping the CDN in staging environments
      hosts = ['http://assets1.example.com', 'http://assets2.example.com']
      
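      # Get a numeric checksum that is consistent for the same source, to make
      # sure that the same asset file always maps to the same asset host
      # (This helps the browser cache these assets). Zlib.crc32 is used because
      # String#hash differs between processes on Ruby 1.9 and later; add
      # require 'zlib' at the top of the file if it isn't already loaded.
      hash = Zlib.crc32(source)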
      
      # Return a semirandomly selected host from the hosts array
      hosts[hash % hosts.length]
    end
  end
end

It also turns out that when referring to assets in ActionMailer, you sometimes won't get a request. We need to handle that case, too:

if Rails.env.production?
  # Source is the asset path as a string, request is a request object
  ActionController::Base.asset_host = Proc.new do |source, request|

    if !request #request == false for emails, apparently
      "http://#{AppSettings.host}"
    elsif request.ssl?
      "#{request.protocol}#{request.host_with_port}"
    else
      # hosts can also be pulled from a config file for more flexibility
      # or skipping the CDN in staging environments
      hosts = ['http://assets1.example.com', 'http://assets2.example.com']
      
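      # Get a numeric checksum that is consistent for the same source, to make
      # sure that the same asset file always maps to the same asset host
      # (This helps the browser cache these assets). Zlib.crc32 is used because
      # String#hash differs between processes on Ruby 1.9 and later; add
      # require 'zlib' at the top of the file if it isn't already loaded.
      hash = Zlib.crc32(source)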
      
      # Return a semirandomly selected host from the hosts array
      hosts[hash % hosts.length]
    end
  end
end
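The three branches are easy to get wrong, so it can help to exercise the same logic without booting Rails. This is a rough sketch under a few assumptions: stub_request is a hypothetical stand-in that provides only the methods the proc calls, fallback_host plays the role of AppSettings.host, and Zlib.crc32 is used as the checksum so the result is repeatable.

```ruby
require 'zlib'

# Hypothetical stand-in for the Rails request object; it provides only the
# three methods the proc calls.
stub_request = Struct.new(:protocol, :host_with_port, :secure) do
  def ssl?
    secure
  end
end

cdn_hosts     = ['http://assets1.example.com', 'http://assets2.example.com']
fallback_host = 'www.example.com' # assumed value, stands in for AppSettings.host

asset_host = lambda do |source, request|
  if !request
    # No request at all, e.g. while rendering an email
    "http://#{fallback_host}"
  elsif request.ssl?
    # Secure page: serve assets from the page's own host
    "#{request.protocol}#{request.host_with_port}"
  else
    # Plain HTTP page: spread assets across the CDN hosts
    cdn_hosts[Zlib.crc32(source) % cdn_hosts.length]
  end
end

plain  = stub_request.new('http://', 'www.example.com', false)
secure = stub_request.new('https://', 'www.example.com', true)

puts asset_host.call('/images/logo.png', nil)    # http://www.example.com
puts asset_host.call('/images/logo.png', secure) # https://www.example.com
puts asset_host.call('/images/logo.png', plain)  # one of the two CDN hosts
```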

Dynamic assets

Your app might also use 'dynamic assets', like captchas, images that are generated on the fly by the server, or anything else that's not stored on the filesystem. The asset server has no copy of these, so you'll probably want to serve them from your web server instead:

if Rails.env.production?
  # Source is the asset path as a string, request is a request object
  ActionController::Base.asset_host = Proc.new do |source, request|

    # Redirect dynamic assets to the web server
    is_asset = source.match(/images|javascripts|assets|stylesheets/)
    
    if !request #request == false for emails, apparently
      "http://#{AppSettings.host}"
    elsif request.ssl? || !is_asset
      "#{request.protocol}#{request.host_with_port}"
    else
      # hosts can also be pulled from a config file for more flexibility
      # or skipping the CDN in staging environments
      hosts = ['http://assets1.example.com', 'http://assets2.example.com']
      
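      # Get a numeric checksum that is consistent for the same source, to make
      # sure that the same asset file always maps to the same asset host
      # (This helps the browser cache these assets). Zlib.crc32 is used because
      # String#hash differs between processes on Ruby 1.9 and later; add
      # require 'zlib' at the top of the file if it isn't already loaded.
      hash = Zlib.crc32(source)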
      
      # Return a semirandomly selected host from the hosts array
      hosts[hash % hosts.length]
    end
  end
end
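One thing to watch with the is_asset check is that the pattern matches anywhere in the path, so a dynamic route that happens to contain "images" would still be sent to the asset hosts. The sketch below compares it with an anchored pattern. It assumes asset paths reach the proc with a leading slash, and the captcha path is a made-up example.

```ruby
# Assumption: asset paths reach the proc with a leading slash,
# e.g. "/images/logo.png?1297189653".
loose  = /images|javascripts|assets|stylesheets/
strict = %r{\A/(?:images|javascripts|assets|stylesheets)/}

static_path  = '/images/logo.png?1297189653'
dynamic_path = '/captcha/images/42' # hypothetical dynamic route

puts !!(static_path  =~ loose)  # true
puts !!(dynamic_path =~ loose)  # true: "images" appears mid-path
puts !!(static_path  =~ strict) # true
puts !!(dynamic_path =~ strict) # false
```

Whether the stricter pattern is needed depends on your routes; if no dynamic path contains one of those directory names, the original check behaves the same.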

Hardcoding exceptions

Finally, you might run into some JavaScript libraries that don't work when they are loaded from a different domain than the page that uses them. You can hardcode these to be loaded from the web server instead, for the final bit of code:

if Rails.env.production?
  # Source is the asset path as a string, request is a request object
  ActionController::Base.asset_host = Proc.new do |source, request|

    # Some assets aren't happy being loaded from the assets server :-(
    borked_assets = ['tiny_mce']
    
    # Redirect dynamic assets to the web server
    is_asset = source.match(/images|javascripts|assets|stylesheets/)
    
    if !request #request == false for emails, apparently
      "http://#{AppSettings.host}"
    elsif request.ssl? || !is_asset || borked_assets.any? {|asset| source.match(asset)}
      "#{request.protocol}#{request.host_with_port}"
    else
      # hosts can also be pulled from a config file for more flexibility
      # or skipping the CDN in staging environments
      hosts = ['http://assets1.example.com', 'http://assets2.example.com']
      
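      # Get a numeric checksum that is consistent for the same source, to make
      # sure that the same asset file always maps to the same asset host
      # (This helps the browser cache these assets). Zlib.crc32 is used because
      # String#hash differs between processes on Ruby 1.9 and later; add
      # require 'zlib' at the top of the file if it isn't already loaded.
      hash = Zlib.crc32(source)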
      
      # Return a semirandomly selected host from the hosts array
      hosts[hash % hosts.length]
    end
  end
end

This snippet of code should handle pretty much any asset you throw at it, and is easily expandable using whatever type of environment-specific configuration you use. Give it a try, and let me know what you think.
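As one possible way to do the environment-specific part, the list of hosts could come from a small YAML document keyed by environment. This is only a sketch: the asset_hosts key, the asset_hosts_for helper, and the inline config text are all assumptions for illustration, and in a real app the text would be read from a file.

```ruby
require 'yaml'

# Hypothetical per-environment settings; in an app this would live in a file.
config_text = <<~YAML
  production:
    asset_hosts:
      - http://assets1.example.com
      - http://assets2.example.com
  staging:
    asset_hosts: []
YAML

settings = YAML.safe_load(config_text)

# Hypothetical helper: a missing environment or a missing key both mean
# "no CDN configured".
def asset_hosts_for(settings, env)
  (settings[env] || {})['asset_hosts'] || []
end

production_hosts  = asset_hosts_for(settings, 'production')
staging_hosts     = asset_hosts_for(settings, 'staging')
development_hosts = asset_hosts_for(settings, 'development')

puts production_hosts.length   # 2
puts staging_hosts.empty?      # true
puts development_hosts.empty?  # true
```

With an empty list, the proc can fall back to the web server's own host, which is one way to skip the CDN in staging.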
