OPTEN, the only Umbraco-certified company in Switzerland

Umbraco + Varnish = ♥


In most Varnish Configuration Language files (VCL for short) that I have seen, the cache expiration times are fixed. It seems their authors wanted the best possible performance and therefore hard-coded the time after which a cached object expires.

Let's say you want to cache each page/file in Varnish for one hour:

sub vcl_fetch {
      set beresp.ttl = 1h;
      return (deliver);
}

Varnish will now cache every page or file that passes through vcl_fetch for one hour.

If you want to exclude a page, you have to let its URL bypass the cache in the vcl-File:

if (req.url ~ "/liveticker") {
      return (pass);
}

This means that every URL which contains /liveticker (/liveticker, /liveticker/, /liveticker?success=false)
bypasses the Varnish cache.
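To make the matching rule concrete, here is a small Python sketch (Python only because it is easy to run, it is not part of the Umbraco or Varnish setup). It assumes that the VCL operator ~ behaves like an unanchored regular-expression search; the helper name bypasses_cache and the sample URLs are made up for illustration.

```python
import re

# Hypothetical helper: mimics `req.url ~ "/liveticker"` as an unanchored
# regular-expression search (a "contains" test, not an exact comparison).
def bypasses_cache(url: str, pattern: str = "/liveticker") -> bool:
    return re.search(pattern, url) is not None

for url in ["/liveticker", "/liveticker/", "/liveticker?success=false",
            "/news/liveticker-archive", "/news"]:
    print(url, bypasses_cache(url))
```

Note that an unanchored pattern also catches URLs such as /news/liveticker-archive. If that is not wanted, the pattern would need an anchor, for example "^/liveticker".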

This is not wrong – but I want to show you how Umbraco and Varnish could work together to achieve higher performance.

Varnish is genius. It can calculate the expiration time from the standardized HTTP cache headers like Expires and Cache-Control. Therefore you can remove the beresp.ttl = 1h from vcl_fetch and add Cache-Control: max-age=3600 to each response.
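As a rough illustration of that calculation, the following Python sketch derives a TTL from response headers. It is a simplified model and not Varnish's actual code: the function name ttl_from_headers, the precedence (s-maxage, then max-age, then Expires) and the 120 second fallback are assumptions based on my understanding of Varnish's defaults.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

DEFAULT_TTL = 120  # seconds; assumed fallback, modelled on Varnish's default_ttl

def ttl_from_headers(headers: dict, now: datetime) -> int:
    """Return the number of seconds a response may stay in the cache."""
    cache_control = headers.get("Cache-Control", "")
    # A shared cache prefers s-maxage over max-age.
    for directive in ("s-maxage", "max-age"):
        match = re.search(r"(?:^|[,\s])" + directive + r"\s*=\s*(\d+)", cache_control)
        if match:
            return int(match.group(1))
    # Without Cache-Control, fall back to the absolute Expires date.
    expires = headers.get("Expires")
    if expires:
        delta = parsedate_to_datetime(expires) - now
        return max(0, int(delta.total_seconds()))
    return DEFAULT_TTL

now = datetime(2015, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(ttl_from_headers({"Cache-Control": "public, max-age=3600"}, now))
```

With the header from the text (max-age=3600) the sketch yields the same one hour that beresp.ttl = 1h set by hand.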

Well... how can you do that with Umbraco?

The Application sets the Cache-Control for each (content) request to one hour:

protected override void ApplicationStarted(...)
{
      // Add Cache-Control HTTP header
      PublishedContentRequest.Prepared += PublishedContentRequest_Prepared;
}

void PublishedContentRequest_Prepared(object sender, EventArgs e)
{
      PublishedContentRequest request = sender as PublishedContentRequest;

      if (request == null || request.HasPublishedContent == false) return;

      HttpContext httpContext = HttpContext.Current;

      if (httpContext == null) return;

      HttpResponse response = httpContext.Response;

      int maxAge = 3600; // 1 hour

      response.Cache.SetCacheability(HttpCacheability.Public);
      response.Cache.SetExpires(DateTime.Now.AddSeconds(maxAge));
      response.Cache.SetMaxAge(new TimeSpan(0, 0, maxAge));
}

Honestly it's not really different yet – only that the Application now decides when the cache gets invalidated (with this change it's possible to use the same vcl-File for different applications).

How about setting the cache expiration for each page in the CMS? Sounds nice, doesn't it?

We add a Numeric-Property varnishCacheControlMaxAge to the Document Type. And in my opinion the default maxAge (e.g. 3600) should be editable through the web.config.

int maxAge;
int.TryParse(ConfigurationManager.AppSettings["varnish:maxAge"], out maxAge);

IPublishedContent content = request.PublishedContent;

if (content.HasProperty("varnishCacheControlMaxAge") &&
    content.HasValue("varnishCacheControlMaxAge"))
{
      maxAge = content.GetPropertyValue<int>("varnishCacheControlMaxAge");
}

// Only set cache headers if we want to cache the page
if (maxAge > 0)
{
      response.Cache.SetCacheability(HttpCacheability.Public);
      response.Cache.SetExpires(DateTime.Now.AddSeconds(maxAge));
      response.Cache.SetMaxAge(new TimeSpan(0, 0, maxAge));
}

With this change it's easy to disable the caching for the Liveticker-Page (which I showed above) by setting the property varnishCacheControlMaxAge to 0. Thus the if (req.url ~ "/liveticker") line can be removed completely from the vcl-File.

Backend

I want to see my changes ASAP! 

No problem! There are smart bans to invalidate the cache.

The application just has to send a BAN request to the Varnish server. BAN is used here as a custom HTTP method, in the same position as GET or POST. We want to do that whenever something gets published.

And how does Varnish know which page has to be invalidated?
This is simple: Umbraco sends the ID of the page in an HTTP header of the response, and Varnish caches that header together with the page.

(It's important to know that Varnish does not only cache the page itself; it also caches the cookies (if not removed), the HTTP headers and more.)

Therefore we can send the ID within the BAN request and Varnish invalidates all pages with this ID.
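The idea can be modelled in a few lines of Python. This is a toy sketch, not how Varnish is implemented: the class TinyCache, the page IDs and the URLs are made up, and real Varnish evaluates bans lazily instead of deleting objects immediately. It only shows why caching the header makes it possible to invalidate every cached variant of a page by its ID.

```python
class TinyCache:
    """Toy stand-in for a cache that stores response headers next to the body."""

    def __init__(self):
        self._objects = {}  # url -> (headers, body)

    def store(self, url, headers, body):
        self._objects[url] = (dict(headers), body)

    def get(self, url):
        entry = self._objects.get(url)
        return entry[1] if entry else None

    def ban(self, header, value):
        """Drop every object whose stored header equals value; return the count."""
        doomed = [url for url, (headers, _) in self._objects.items()
                  if headers.get(header) == value]
        for url in doomed:
            del self._objects[url]
        return len(doomed)

cache = TinyCache()
cache.store("/news", {"Umb-PageId": "1234"}, "<html>news</html>")
cache.store("/news?page=2", {"Umb-PageId": "1234"}, "<html>news 2</html>")
cache.store("/about", {"Umb-PageId": "1050"}, "<html>about</html>")

removed = cache.ban("Umb-PageId", "1234")
print(removed, cache.get("/news"), cache.get("/about"))
```

Both URLs of the page with ID 1234 disappear from the cache, while /about stays untouched.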

To achieve that, we have to change the response so the ID is sent as well. I just extend the method which I used before for sending the Cache-Control header.

void PublishedContentRequest_Prepared(object sender, EventArgs e)
{
      
      // Only set cache headers if we want to cache the page
      if (maxAge > 0)
      {
            //TODO: Umbraco 7.3.x add GUID?
            response.Headers.Add("Umb-PageId", content.Id.ToString());
            
      }
}

Next we have to implement the Event which gets fired when something is published.

protected override void ApplicationStarted(...)
{
      
      // Issue BAN for content when published
      ContentService.Published += ContentService_Published;
}

void ContentService_Published(IPublishingStrategy sender, PublishEventArgs<IContent> e)
{
      try
      {
            foreach (IContent content in e.PublishedEntities)
            {
                  if (content.HasProperty("varnishInvalidateCacheOnPublish") &&
                      content.GetValue<bool>("varnishInvalidateCacheOnPublish"))
                  {
                        //TODO: This could be in a separate class/method
                        using (HttpClient httpClient = new HttpClient())
                        {
                              httpClient.BaseAddress = new Uri("http://localhost"); //TODO: Get from web.config?
                              httpClient.DefaultRequestHeaders.Clear();
                              httpClient.DefaultRequestHeaders.Add("Varnish-Ban-Umb-PageId", content.Id.ToString());

                              HttpMethod method = new HttpMethod("BAN");
                              HttpRequestMessage request = new HttpRequestMessage(method, httpClient.BaseAddress);

                              // The handler is not async, so block until the BAN has been sent
                              httpClient.SendAsync(request).Wait();
                        }
                  }
            }
      }
      catch (Exception ex)
      {
            LogHelper.Error<Startup>("Error issuing BAN to Varnish.", ex);
      }
}

The varnishInvalidateCacheOnPublish property is just a True/false datatype. With this property I make sure that only the selected pages send a BAN request to Varnish (e.g. news pages which change a lot).

Finally a short change in the vcl-File:

sub vcl_recv {
      # Catch BAN Command
      # ("ban" is the name of an ACL with the allowed clients; it has to be defined in the vcl-File)
      if (req.request == "BAN" && client.ip ~ ban) {

            if (req.http.Varnish-Ban-Umb-PageId) {
                  ban("obj.http.Umb-PageId == " + req.http.Varnish-Ban-Umb-PageId);
                  error 200 "Banned Umbraco Page " + req.http.Varnish-Ban-Umb-PageId;
            }
      }
}

Since we now send the Umb-PageId in the HTTP header of every page with maxAge > 0, we can polish the vcl-File.

sub vcl_fetch {
      …

      # Cache static Pages
      if (beresp.http.Umb-PageId) {
            unset beresp.http.Set-Cookie;
            return (deliver);
      }

      # do not cache everything else
      return (hit_for_pass);
}

sub vcl_deliver {
      # Expires Header set by Umbraco are used to define Varnish caching only
      # therefore do not send them to the Client
      if (resp.http.Umb-PageId) {
            unset resp.http.expires;
            unset resp.http.pragma;
            unset resp.http.cache-control;
            unset resp.http.Etag;
      }

      # smart Ban related
      unset resp.http.Umb-PageId;
      
      return (deliver);
}

Now a page only gets cached when its response carries the Umb-PageId HTTP header (except for
everything which is caught earlier, like images). In vcl_deliver all cache headers then get
removed: Varnish is now responsible for the caching, so the client (browser) does not need them
anymore. The Umb-PageId header gets removed as well because the client has no use for it.

Here you can find more information about BANs.

Umbraco API- and Surface-Controller

They probably get skipped by the check on Umb-PageId, because their responses do not carry that header, but Varnish should cache them too!

I solved this problem with the following solution (maybe you know a better one?):

public class TestApiController : UmbracoApiController {

      [VarnishCacheOutput(ClientTimeSpan = 3600)]
      public string Get() {
      
      }
}

This will add some HTTP Headers to the response when the action gets executed.

public override void OnActionExecuted(HttpActionExecutedContext actionExecutedContext)
{
      // The response only exists after the action has run
      HttpResponseMessage response = actionExecutedContext.Response;
      if (response == null) return;

      // This is for Varnish resp.http.Varnish-Cache-Output
      response.Headers.Add("Varnish-Cache-Output", "true"); //TODO: Is there a better name?
      response.Headers.CacheControl = new CacheControlHeaderValue
      {
            MaxAge = new TimeSpan(0, 0, ClientTimeSpan),
            MustRevalidate = false, // Could be defined through property
            Private = false // Could be defined through property
      };
}

In order for this to work you have to update the following lines in the vcl-File:

sub vcl_fetch {

      
      # Cache static Pages
      if (beresp.http.Umb-PageId || beresp.http.Varnish-Cache-Output) {
            unset beresp.http.Set-Cookie;
            return (deliver);
      }

      # do not cache everything else
      return (hit_for_pass);
}

sub vcl_deliver {
      # Expires Header set by Umbraco are used to define Varnish caching only
      # therefore do not send them to the Client
      if (resp.http.Umb-PageId || resp.http.Varnish-Cache-Output) {
            unset resp.http.expires;
            unset resp.http.pragma;
            unset resp.http.cache-control;
            unset resp.http.Etag;
      }

      # smart Ban related
      unset resp.http.Umb-PageId;
      unset resp.http.Varnish-Cache-Output;

      return (deliver);
}

With this attribute you can decide which Controller gets cached and when the cache gets invalidated.

If you have a multilingual site, you have to make sure that the URLs differ per language. Otherwise Varnish caches one result and delivers it for every language. Normally you add a query string parameter like ?language=de and ?language=en (you can name it however you want).
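Why the URL matters can be shown with a small Python sketch of a cache key. It assumes that the key is built from the URL (including the query string) and the host only, which is my understanding of Varnish's built-in vcl_hash; the function cache_key, the host name and the controller URL are made up for illustration.

```python
import hashlib

def cache_key(host: str, url: str) -> str:
    # Assumption: only the URL (with query string) and the host go into the key.
    # Cookies or an Accept-Language header are not part of it.
    return hashlib.sha256((url + "\n" + host).encode("utf-8")).hexdigest()

plain = cache_key("example.com", "/umbraco/api/testapi/get")
german = cache_key("example.com", "/umbraco/api/testapi/get?language=de")
english = cache_key("example.com", "/umbraco/api/testapi/get?language=en")

print(german != english)
```

Two requests that differ only in a language cookie or header would produce the same key as plain and would therefore share one cached result.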

Backend Health Checks

Health checks are not configured in most installations!

If they're configured, Varnish can keep delivering pages even if IIS or MSSQL is down (but only pages which were already in the cache when the backend went down).

These checks have to be simple calls. If the check doesn't answer, Varnish knows that the backend isn't healthy anymore.

Important: These calls have to answer with status code 200. E.g. you can't have calls which do redirects, because they answer with 301 or 302.

[PluginController("Varnish")]
public sealed class HealthApiController : UmbracoApiController
{
      public string ApplicationIsRunning()
      {
            // Just a simple return: if Varnish does not get an answer, it knows the server is not healthy.
            return "Application is running and healthy.";
      }

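      public bool DatabaseCanConnect()
      {
            // If we cannot connect, Varnish should hold the cache longer until the database is healthy.
            // Varnish judges a probe by its status code, so answer with 503 instead of a 200 "false".
            if (base.DatabaseContext == null || base.DatabaseContext.CanConnect == false)
                  throw new HttpResponseException(HttpStatusCode.ServiceUnavailable);
            return true;
      }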
}

We just add them to the vcl-File:

backend default {
      .host = "127.0.0.1";
      .probe = {
            .url = "/Umbraco/Varnish/HealthApi/ApplicationIsRunning";
            .interval = 1s;
            .timeout = 50 ms;
            .window = 5;
            .threshold = 3;
      }
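      # A backend accepts only one .probe definition. To check the database
      # instead, use this probe in place of the one above:
      # .probe = {
      #       .url = "/Umbraco/Varnish/HealthApi/DatabaseCanConnect";
      #       .interval = 60s;
      #       .timeout = 1 s;
      #       .window = 5;
      #       .threshold = 3;
      # }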
}

ApplicationIsRunning() is polled every second and has to answer within 50 ms. Varnish looks at
the last 5 polls, and at least 3 of them have to succeed for the backend to count as healthy.
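The interplay of .window and .threshold can be sketched in Python. This is a simplified model under stated assumptions: the function backend_health and the poll sequence are made up, and the sketch starts with an empty history, whereas Varnish has an additional .initial setting that influences the verdict right after startup.

```python
from collections import deque

def backend_health(poll_results, window=5, threshold=3):
    """Yield the health verdict after each poll (True = healthy)."""
    recent = deque(maxlen=window)  # keeps only the last `window` results
    for ok in poll_results:
        recent.append(ok)
        yield sum(recent) >= threshold

# Three good polls, three failures (e.g. IIS restarting), three good polls again.
polls = [True, True, True, False, False, False, True, True, True]
print(list(backend_health(polls)))
```

In this model a single failed poll does not mark the backend as sick, and after an outage it takes three successful polls before it counts as healthy again.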

Here is more information about Health Checks.

Care about the smallest file

Basically you can configure it so that everything in the /images/ folder (or /img/) is cached for a year, because normally these files do not change anymore. The same goes for /fonts/ (or /font/), unless you use something like Google Fonts.

<location path="Images">
      <system.webServer>
            <staticContent>
                  <clientCache cacheControlCustom="public" cacheControlMode="UseMaxAge"
                                       cacheControlMaxAge="365.00:00:00" />
            </staticContent>
      </system.webServer>
</location>
<location path="Fonts">
      <system.webServer>
            <staticContent>
                  <clientCache cacheControlCustom="public" cacheControlMode="UseMaxAge"
                                       cacheControlMaxAge="365.00:00:00" />
            </staticContent>
      </system.webServer>
</location>

The requests to the IIS are dramatically reduced (even though they're just Static Files).

Another important thing to know: If you use Bundling and Minification from ASP.NET, you can have CSS and JavaScript files without an extension:

bundles.Add(new ScriptBundle("~/bundles/jquery").Include("~/Scripts/jquery-{version}.js"));

In your HTML you include the script like: <script src="/bundles/jquery"></script>

Most likely Varnish will not handle this optimally. Therefore I would make the following change:

sub vcl_fetch {
      

      # Cache static files
      if (req.url ~
          "^[^?]*\.(css|js|htc|txt|swf|flv|pdf|gif|jpe?g|png|ico|woff|ttf|eot|otf|xml|md5|json)($|\?)") {
            return (deliver);
      }

      # Catch static files generated from System.Web.Optimization (because these could have no extension)
      if (req.url ~ "(?i)^/scripts/" || req.url ~ "(?i)^/bundles/" || req.url ~ "(?i)^/css/") {
            return (deliver);
      }

      
}

(?i) means case insensitive
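To check which URLs the two rules catch, the patterns can be tried out in Python. This sketch assumes that Python's re module treats these particular patterns the same way as the PCRE engine used by Varnish; the function is_static and the sample URLs are made up, and the three prefix checks are folded into one pattern.

```python
import re

# Same extension list as in the VCL rule above (case sensitive).
STATIC_EXT = re.compile(
    r"^[^?]*\.(css|js|htc|txt|swf|flv|pdf|gif|jpe?g|png|ico|woff|ttf|eot|otf|xml|md5|json)($|\?)")
# The three prefix checks combined; (?i) makes them case insensitive.
BUNDLE_PREFIX = re.compile(r"(?i)^/(scripts|bundles|css)/")

def is_static(url: str) -> bool:
    return bool(STATIC_EXT.search(url) or BUNDLE_PREFIX.search(url))

for url in ["/css/site.css?v=3", "/bundles/jquery", "/Bundles/jquery?v=abc",
            "/news/article", "/search?file=a.js", "/Images/logo.PNG"]:
    print(url, is_static(url))
```

The extension-less bundle URLs are only caught by the prefix rule. The sketch also shows that the extension rule has no (?i), so a file like /Images/logo.PNG is not matched by it.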

This command is very useful. It shows you which URLs are missed in the cache:

varnishlog -m "VCL_call:miss" | grep "RxURL"

(A URL also shows up here on its very first request.) This list should be as short as possible.

Epilogue

In my opinion Varnish should have less logic and fewer responsibilities: it should only deliver cached pages/files. I've also seen that some sites handle 404 and 500 errors with Varnish. The vcl-File can get really nasty and confusing, especially if you use magic regex.

Furthermore every URL should go through Varnish. As an example: We had an application which delivered the pages very fast, but some modules which are APIs (UmbracoApiController) were extremely slow. This was because these APIs weren't cached by Varnish and caused a high CPU workload on the Windows Server. As soon as the APIs and other small files (mostly JSON, JavaScript and CSS) also went through Varnish, the CPU was almost bored.

