Commit 814ebbc9 authored by Per Andreas Buer

Split up the hitrate chapter into four and added a introduction to ESI. ESI needs a bit of work wrt params and operational factors.

git-svn-id: http://www.varnish-cache.org/svn/trunk/varnish-cache@5577 d4fa192b-c00b-0410-8231-f00ffab90ce4
parent 5bc87030

.. _tutorial-cookies:

Cookies
-------

Varnish will not cache an object coming from the backend with a
Set-Cookie header present. Also, if the client sends a Cookie header,
Varnish will bypass the cache and go directly to the backend.

This can be overly conservative. A lot of sites use Google Analytics
(GA) to analyse their traffic. GA sets a cookie to track you. This
cookie is used by the client-side JavaScript and is therefore of no
interest to the server.

For a lot of web applications it makes sense to completely disregard
the cookies unless you are accessing a special part of the web site.
This VCL snippet in vcl_recv will disregard cookies unless you are
accessing /admin/::

  if (!(req.url ~ "^/admin/")) {
    unset req.http.Cookie;
  }

Quite simple. If, however, you need to do something more complicated,
like removing one out of several cookies, things get difficult.
Unfortunately Varnish doesn't have good tools for manipulating
cookies. We have to use regular expressions to do the work. If you are
familiar with regular expressions you'll understand what's going on.
If you aren't, I suggest you either pick up a book on the subject,
read through the *pcrepattern* man page or read through one of many
online guides.

Let me show you what Varnish Software uses. We use some cookies for
Google Analytics tracking and similar tools. The cookies are all set
and used by JavaScript. Varnish and Drupal don't need to see those
cookies, and since Varnish will cease caching of pages when the client
sends cookies, we discard these unnecessary cookies in VCL.

In the following VCL we discard the has_js cookie and all cookies that
start with an underscore::

  // Remove has_js and Google Analytics __* cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "");
  // Remove a ";" prefix, if present.
  set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

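If you want to see what these two substitutions do before you put them
into production, you can replay them outside Varnish. The following is
only a sketch in Python, assuming that Python's ``re`` module treats
these particular patterns the same way as the PCRE engine Varnish
uses. The function name and the sample header are made up for
illustration.

```python
import re


def strip_tracking_cookies(cookie_header):
    # Mirrors the regsuball() call: drop has_js and every cookie whose
    # name is an underscore followed by underscores or lowercase letters.
    cleaned = re.sub(r"(^|;\s*)(_[_a-z]+|has_js)=[^;]*", "", cookie_header)
    # Mirrors the regsub() call: remove a single leading ";" if one is left.
    return re.sub(r"^;\s*", "", cleaned, count=1)


# Hypothetical header: two GA cookies, has_js and one application cookie.
sample = "__utma=1.2.3; has_js=1; SESSabc=xyz; __utmz=4.5"
print(strip_tracking_cookies(sample))
```

With this sample only the application cookie should survive. Note that
when every cookie is stripped the result is an empty string rather
than a missing header, which is what the next example guards against.
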
Let me show you an example where we remove everything except the
cookies named COOKIE1 and COOKIE2, and you can marvel at it::

  sub vcl_recv {
    if (req.http.Cookie) {
      set req.http.Cookie = ";" req.http.Cookie;
      set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
      set req.http.Cookie = regsuball(req.http.Cookie, ";(COOKIE1|COOKIE2)=", "; \1=");
      set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
      set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
      if (req.http.Cookie == "") {
        remove req.http.Cookie;
      }
    }
  }

The example is taken from the Varnish Wiki, where you can find other
scary examples of what can be done in VCL.

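The five substitutions are easier to follow when you watch them work
on an actual header. Here is a rough Python sketch of the same
pipeline, again assuming Python's ``re`` behaves like PCRE for these
patterns. The function name ``keep_only`` and the sample cookies are
hypothetical.

```python
import re


def keep_only(cookie_header, wanted=("COOKIE1", "COOKIE2")):
    names = "|".join(re.escape(name) for name in wanted)
    c = ";" + cookie_header                          # 1. every cookie now starts with ";"
    c = re.sub(r"; +", ";", c)                       # 2. drop the spaces after each ";"
    c = re.sub(r";(" + names + r")=", r"; \1=", c)   # 3. mark wanted cookies with a space
    c = re.sub(r";[^ ][^;]*", "", c)                 # 4. delete every cookie that is not marked
    c = re.sub(r"^[; ]+|[; ]+$", "", c)              # 5. trim leftover separators at both ends
    return c


print(keep_only("foo=1; COOKIE1=a; bar=2; COOKIE2=b"))
```

The trick is in step 3: a wanted cookie gets a space after its
semicolon, and step 4 only deletes cookies that do not have one.
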
.. _tutorial-esi:

Edge Side Includes
------------------

*Edge Side Includes* is a language to include *fragments* of web pages
in other web pages. Think of it as an HTML include statement that
works over HTTP.

On most web sites a lot of content is shared between pages.
Regenerating this content for every page view is wasteful, and ESI
addresses that by letting you decide the cache policy for each
fragment individually.

In Varnish we've only implemented a small subset of ESI. As of 2.1 we
have three ESI statements:

 * esi:include
 * esi:remove
 * <!--esi ...-->

Content substitution based on variables and cookies is not implemented
but is on the roadmap.

Example: esi include
~~~~~~~~~~~~~~~~~~~~
Let's see an example of how this could be used. This simple CGI script
outputs the date::

  #!/bin/sh

  echo 'Content-type: text/html'
  echo ''
  date "+%Y-%m-%d %H:%M"

Now, let's have an HTML file that has an ESI include statement::

  <HTML>
  <BODY>
  The time is: <esi:include src="/cgi-bin/date.cgi"/>
  at this very moment.
  </BODY>
  </HTML>

For ESI to work you need to activate ESI processing in VCL, like this::

  sub vcl_fetch {
    if (req.url == "/test.html") {
      esi;                    /* Do ESI processing */
      set beresp.ttl = 24h;   /* Sets the TTL on the HTML above */
    } elseif (req.url == "/cgi-bin/date.cgi") {
      set beresp.ttl = 1m;    /* Sets a one minute TTL on */
                              /* the included object */
    }
  }

Example: esi remove
~~~~~~~~~~~~~~~~~~~
The *remove* keyword allows you to remove output. You can use this to make
a fallback of sorts, when ESI is not available, like this::

  <esi:include src="http://www.example.com/ad.html"/>
  <esi:remove>
    <a href="http://www.example.com">www.example.com</a>
  </esi:remove>

Example: <!--esi ... -->
~~~~~~~~~~~~~~~~~~~~~~~~
This is a special construct to allow HTML marked up with ESI to render
without processing. ESI processors will remove the start ("<!--esi")
and end ("-->") when the page is processed, while still processing the
contents. If the page is not processed, the markers remain and the
whole construct becomes an ordinary HTML/XML comment. For example::

  <!--esi
  <p>Warning: ESI Disabled!</p>
  -->

This ensures that the ESI markup will not interfere with the rendering
of the final HTML if not processed.

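To make the three statements a bit more concrete, here is a toy model
of what an ESI processor does with them, written in Python. It is not
how Varnish implements ESI and it ignores caching entirely.
``process_esi`` and the ``fetch`` callback are hypothetical names, and
the fragment content is made up.

```python
import re


def process_esi(html, fetch):
    """Toy ESI pass. `fetch` maps the src of an include to fragment text."""
    # <!--esi ... -->: strip the comment markers, keep the contents.
    html = re.sub(r"<!--esi(.*?)-->", r"\1", html, flags=re.DOTALL)
    # <esi:remove>...</esi:remove>: drop the element and everything in it.
    html = re.sub(r"<esi:remove>.*?</esi:remove>", "", html, flags=re.DOTALL)
    # <esi:include src="..."/>: replace the tag with the fetched fragment.
    return re.sub(r'<esi:include\s+src="([^"]*)"\s*/>',
                  lambda match: fetch(match.group(1)), html)


# Stand-in for the backend: one fragment with a made-up timestamp.
fragments = {"/cgi-bin/date.cgi": "2010-12-01 12:00"}
page = 'The time is: <esi:include src="/cgi-bin/date.cgi"/> at this very moment.'
print(process_esi(page, fragments.get))
```

A client that receives the unprocessed page sees none of this: it
ignores the unknown esi:include tag, renders the contents of
esi:remove and treats the <!--esi block as a comment.
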
.. _tutorial-purging:

Purging and banning
-------------------

One of the most effective ways of increasing your hit ratio is to
increase the time-to-live (ttl) of your objects. But, as you're
aware, in this twitterific day and age serving content that is
outdated is bad for business.

The solution is to notify Varnish when there is fresh content
available. This can be done through two mechanisms: HTTP purging and
bans. First, let me explain HTTP purges.

HTTP Purges
~~~~~~~~~~~
An HTTP purge is similar to an HTTP GET request, except that the
*method* is PURGE. Actually you can call the method whatever you'd
like, but most people refer to this as purging. Squid supports the
same mechanism. In order to support purging in Varnish you need the
following VCL in place::

  acl purge {
    "localhost";
    "192.168.55.0/24";
  }

  sub vcl_recv {
    # allow PURGE from localhost and 192.168.55...
    if (req.request == "PURGE") {
      if (!client.ip ~ purge) {
        error 405 "Not allowed.";
      }
      return (lookup);
    }
  }

  sub vcl_hit {
    if (req.request == "PURGE") {
      # Note that setting ttl to 0 is magical.
      # the object is zapped from cache.
      set obj.ttl = 0s;
      error 200 "Purged.";
    }
  }

  sub vcl_miss {
    if (req.request == "PURGE") {
      error 404 "Not in cache.";
    }
  }

As you can see we have used two new VCL subroutines, vcl_hit and
vcl_miss. When we call lookup Varnish will try to look up the object
in its cache. It will either hit an object or miss it, and so the
corresponding subroutine is called. In vcl_hit the object that is
stored in cache is available and we can set the TTL.

So for vg.no to invalidate their front page they would call out to
Varnish like this::

  PURGE / HTTP/1.0
  Host: vg.no

And Varnish would then discard the front page. If there are several
variants of the same URL in the cache, however, only the matching
variant will be purged. To purge a gzip variant of the same page the
request would have to look like this::

  PURGE / HTTP/1.0
  Host: vg.no
  Accept-Encoding: gzip

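In practice your CMS or a deploy script will be the one sending these
requests. As a minimal sketch, assuming Varnish listens on an address
you can reach and that the client is allowed by the purge ACL above,
this is roughly what it could look like with Python's standard
``http.client`` module. The function names are hypothetical. Note that
``http.client`` speaks HTTP/1.1 and adds ``Accept-Encoding: identity``
on its own unless you supply the header yourself, which matters when
you are aiming at a particular variant.

```python
import http.client


def purge_headers(site, accept_encoding=None):
    """Headers for a PURGE of `site`, optionally aimed at one variant."""
    headers = {"Host": site}
    if accept_encoding is not None:
        headers["Accept-Encoding"] = accept_encoding
    return headers


def send_purge(varnish_addr, varnish_port, site, path="/", accept_encoding=None):
    """Send the PURGE and return (status, reason) as answered by Varnish."""
    conn = http.client.HTTPConnection(varnish_addr, varnish_port, timeout=5)
    try:
        conn.request("PURGE", path, headers=purge_headers(site, accept_encoding))
        response = conn.getresponse()
        response.read()  # drain the synthetic error page
        return response.status, response.reason
    finally:
        conn.close()
```

Calling ``send_purge("127.0.0.1", 80, "vg.no", "/", "gzip")`` would
correspond to the second request above. With the VCL shown earlier you
should get back 200 when the object was in the cache, 404 when it was
not and 405 when the ACL refuses you.
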
Bans
~~~~
There is another way to invalidate content: bans. You can think of
bans as a sort of filter. You *ban* certain content from being
served from your cache. You can ban content based on any metadata we
have.

Support for bans is built into Varnish and available in the CLI.
For VG to ban every png object belonging to vg.no they could
issue::

  purge req.http.host == "vg.no" && req.url ~ "\.png$"

Quite powerful, really.

Bans are checked when we hit an object in the cache, but before we
deliver it. An object is only checked against bans that are newer than
the object itself. If you have a lot of objects with long TTLs in your
cache you should be aware of the potential performance impact of
having many bans.

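If the "only checked against newer bans" part sounds abstract, the
following toy model in Python may help. It is a sketch of the idea
only and says nothing about how Varnish actually stores objects or
bans. The class, its methods and the logical clock are all made up.

```python
import itertools
import re


class ToyCache:
    """Toy model of ban checking; not Varnish's real data structures."""

    def __init__(self):
        self._clock = itertools.count()  # logical time, larger means newer
        self.objects = {}                # (host, url) -> (stored_at, body)
        self.bans = []                   # list of (added_at, predicate)

    def store(self, host, url, body):
        self.objects[(host, url)] = (next(self._clock), body)

    def ban(self, predicate):
        self.bans.append((next(self._clock), predicate))

    def lookup(self, host, url):
        entry = self.objects.get((host, url))
        if entry is None:
            return None                  # plain miss
        stored_at, body = entry
        # Only bans added after the object was stored are tested.
        for added_at, predicate in self.bans:
            if added_at > stored_at and predicate(host, url):
                del self.objects[(host, url)]
                return None              # banned, so treated as a miss
        return body


cache = ToyCache()
cache.store("vg.no", "/logo.png", "old logo")
cache.store("vg.no", "/index.html", "front page")
cache.ban(lambda host, url: host == "vg.no" and re.search(r"\.png$", url) is not None)
print(cache.lookup("vg.no", "/logo.png"))    # older than the ban and matching
print(cache.lookup("vg.no", "/index.html"))  # older than the ban, not matching
cache.store("vg.no", "/logo.png", "new logo")
print(cache.lookup("vg.no", "/logo.png"))    # newer than the ban
```

Every hit on an old object has to be tested against each newer ban,
which is why a long ban list combined with long-lived objects can cost
you.
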
You can also add bans to Varnish via HTTP. Doing so requires a bit of VCL::

  sub vcl_recv {
    if (req.request == "BAN") {
      # Same ACL check as above:
      if (!client.ip ~ purge) {
        error 405 "Not allowed.";
      }
      purge("req.http.host == " req.http.host
            " && req.url == " req.url);

      # Throw a synthetic page so the
      # request won't go to the backend.
      error 200 "Ban added";
    }
  }

This VCL snippet enables Varnish to handle an HTTP BAN method, adding
a ban on the URL, including the host part.

.. _tutorial-vary:

Vary
~~~~

The Vary header is sent by the web server to indicate what makes an
HTTP object vary. This makes a lot of sense with headers like
Accept-Encoding. When a server issues a "Vary: Accept-Encoding" it
tells Varnish that it needs to cache a separate version for every
different Accept-Encoding that is coming from the clients. So, if a
client only accepts gzip encoding, Varnish won't serve the version of
the page encoded with the deflate encoding.

The problem is that the Accept-Encoding field contains a lot of
different encodings. If one browser sends::

  Accept-Encoding: gzip,deflate

And another one sends::

  Accept-Encoding: deflate,gzip

Varnish will keep two variants of the page requested due to the
different Accept-Encoding headers. Normalizing the Accept-Encoding
header will ensure that you have as few variants as possible. The
following VCL code will normalize the Accept-Encoding headers::

  if (req.http.Accept-Encoding) {
    if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
      # No point in compressing these
      remove req.http.Accept-Encoding;
    } elsif (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } elsif (req.http.Accept-Encoding ~ "deflate") {
      set req.http.Accept-Encoding = "deflate";
    } else {
      # unknown algorithm
      remove req.http.Accept-Encoding;
    }
  }

The code sets the Accept-Encoding header from the client to either
gzip or deflate, with a preference for gzip.

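To convince yourself that the two browsers from the example end up
sharing a variant, you can mirror the decision logic outside Varnish.
This is a small Python sketch of the same rules, with ``None`` standing
in for a removed header. The function name is hypothetical.

```python
import re

# Same extension list as the VCL above: formats that are already compressed.
ALREADY_COMPRESSED = re.compile(r"\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$")


def normalize_accept_encoding(url, accept_encoding):
    """Return the normalized header value, or None when it is removed."""
    if not accept_encoding:
        return None                 # header absent, nothing to normalize
    if ALREADY_COMPRESSED.search(url):
        return None                 # no point in compressing these
    if "gzip" in accept_encoding:
        return "gzip"
    if "deflate" in accept_encoding:
        return "deflate"
    return None                     # unknown algorithm


print(normalize_accept_encoding("/", "gzip,deflate"))
print(normalize_accept_encoding("/", "deflate,gzip"))
```

Both calls give the same value, so Varnish would only have to keep one
variant for the two browsers.
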
Pitfall - Vary: User-Agent
~~~~~~~~~~~~~~~~~~~~~~~~~~
Some applications or application servers send *Vary: User-Agent* along
with their content. This instructs Varnish to cache a separate copy
for every variation of User-Agent there is. There are plenty. Even a
single patchlevel of the same browser will generate at least 10
different User-Agent headers based just on what operating system it is
running on.

So if you *really* need to vary based on User-Agent be sure to
normalize the header or your hit rate will suffer badly. Use the above
code as a template.

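What "normalize" means here depends entirely on what your application
actually does with the User-Agent. As an illustration only, the Python
sketch below collapses raw headers into three buckets. The bucket
names, the matching rules and the sample headers are all assumptions
made for this example, so replace them with whatever your application
really distinguishes between.

```python
def normalize_user_agent(user_agent):
    """Collapse a raw User-Agent into one of a few hypothetical buckets."""
    ua = (user_agent or "").lower()
    if "iphone" in ua or "android" in ua:
        return "mobile"
    if "msie" in ua:
        return "msie"
    return "other"


# Illustrative headers: the same browser on two systems, plus two others.
samples = [
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)",
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_2_1 like Mac OS X; en-us) Mobile/8C148",
]
variants = {normalize_user_agent(sample) for sample in samples}
print(len(samples), "raw headers,", len(variants), "variants after normalizing")
```

The fewer buckets you can get away with, the fewer copies Varnish has
to keep of each object.
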