Commit a95c1435 authored by Per Andreas Buer

New tutorial

git-svn-id: http://www.varnish-cache.org/svn/trunk/varnish-cache@4792 d4fa192b-c00b-0410-8231-f00ffab90ce4
parent 07384f36
Advanced Backend configuration
------------------------------

At some point you might need Varnish to cache content from several
servers. You might want Varnish to map all the URLs into one single
host, or not. There are lots of options.

Let's say we need to introduce a Java application into our PHP web
site, and that our Java application should handle URLs beginning with
/java/.

We manage to get the thing up and running on port 8000. Now, let's have
a look at default.vcl::
    backend default {
        .host = "127.0.0.1";
        .port = "8080";
    }

We add a new backend::

    backend java {
        .host = "127.0.0.1";
        .port = "8000";
    }

Now we need to tell Varnish where to send the different URLs. Let's
look at vcl_recv::

    sub vcl_recv {
        if (req.url ~ "^/java/") {
            set req.backend = java;
        } else {
            set req.backend = default;
        }
    }

It's quite simple, really. Let's stop and think about this for a
moment. As you can see, you can choose backends based on almost
arbitrary data. You want to send mobile devices to a different
backend? No problem. ``if (req.http.User-Agent ~ "mobile")`` should do
the trick.
Directors
~~~~~~~~~~
You can also group several backends into a group of backends. These
groups are called directors. Directors give you increased performance
and resilience. You define several backends and group them
together in a director::
    backend server1 {
        .host = "192.168.0.10";
    }

::

    backend server2 {
        .host = "192.168.0.11";
    }

Now we create the director::

    director example_director round-robin {
        # server1
        {
            .backend = server1;
        }
        # server2
        {
            .backend = server2;
        }
    }

This director is a round-robin director. This means the director will
distribute the incoming requests on a round-robin basis. There is
also a *random* director which distributes requests in a, you guessed
it, random fashion.
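
The difference between the two policies can be sketched in a few lines of
Python (a toy model for illustration only; Varnish implements its directors
in C, and ``backends``, ``round_robin`` and ``pick_random`` are made-up
names)::

```python
import itertools
import random

backends = ["server1", "server2"]

def round_robin(backends):
    """Cycle through the backends in order, forever."""
    return itertools.cycle(backends)

def pick_random(backends):
    """Pick a backend uniformly at random for each request."""
    return random.choice(backends)

rr = round_robin(backends)
# Four successive requests alternate between the two backends.
print([next(rr) for _ in range(4)])  # → ['server1', 'server2', 'server1', 'server2']
```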
But what if one of your servers goes down? Can Varnish direct all the
requests to the healthy server? Sure it can. This is where the Health
Checks come into play.
Health checks
~~~~~~~~~~~~~
Let's set up a director with two backends and health checks. First
let's define the backends::
    backend server1 {
        .host = "server1.example.com";
        .probe = {
            .url = "/";
            .interval = 5s;
            .timeout = 1s;
            .window = 5;
            .threshold = 3;
        }
    }

::

    backend server2 {
        .host = "server2.example.com";
        .probe = {
            .url = "/";
            .interval = 5s;
            .timeout = 1s;
            .window = 5;
            .threshold = 3;
        }
    }

What's new here is the probe. Varnish will check the health of each
backend with a probe. The options are:

url
    What URL Varnish should request.

interval
    How often Varnish should poll.

timeout
    The timeout of the probe.

window
    Varnish will maintain a *sliding window* of the results. Here the
    window has five checks.

threshold
    How many of the last .window polls must be good for the backend to
    be declared healthy.

XXX: Ref to reference guide.
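
The ``.window``/``.threshold`` logic described above can be sketched as
follows (a hypothetical Python model, not Varnish's actual implementation)::

```python
from collections import deque

# Keep only the last .window probe results; declare the backend
# healthy when at least .threshold of them succeeded.
WINDOW = 5
THRESHOLD = 3

results = deque(maxlen=WINDOW)  # True = the probe got a good answer in time

def record_probe(ok: bool) -> bool:
    """Record one probe result and return the resulting health state."""
    results.append(ok)
    return sum(results) >= THRESHOLD

for ok in [True, True, False, True, False]:
    healthy = record_probe(ok)

# Three of the last five probes succeeded, so the backend is healthy.
print(healthy)  # → True
```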
Now we define the director::

    director example_director round-robin {
        # server1
        {
            .backend = server1;
        }
        # server2
        {
            .backend = server2;
        }
    }

You use this director just as you would use any other director or
backend. Varnish will not send traffic to hosts that are marked as
unhealthy.
.. _tutorial-backend_servers:

Backend servers
---------------
Varnish has a concept of "backend" or "origin" servers. A backend
server is the server providing the content Varnish will accelerate.
Our first task is to tell Varnish where it can find its content. Start
your favorite text editor and open the varnish default configuration
file. If you installed from source this is
/usr/local/etc/varnish/default.vcl, if you installed from a package it
is probably /etc/varnish/default.vcl.
Near the top there will be a section that looks a bit like this::

    # backend default {
    #     .host = "127.0.0.1";
    #     .port = "8080";
    # }

We uncomment this bit of text and change the port setting from 8080
to 80, making the text look like this::
    backend default {
        .host = "127.0.0.1";
        .port = "80";
    }

Now, this piece of configuration defines a backend in Varnish called
*default*. When Varnish needs to get content from this backend it will
connect to port 80 on localhost (127.0.0.1).
Varnish can have several backends defined, and you can even join
several backends together into clusters of backends for load balancing
purposes.
Now that we have the basic Varnish configuration done, let us start up
Varnish on port 8080 so we can do some fundamental testing on it.
Achieving a high hitrate
========================
Now that Varnish is up and running and you can access your web
application through Varnish, you probably need to make some changes to
either the configuration or the application so that you'll get a high
hit rate in Varnish.
HTTP Headers
------------
Cache-control
~~~~~~~~~~~~~
Cookies
~~~~~~~
Vary
~~~~
Authentication
~~~~~~~~~~~~~~
Normalizing your namespace
--------------------------
.. _tutorial-increasing_your_hitrate-purging:

Purging
-------
HTTP Purges
~~~~~~~~~~~
Bans
~~~~
.. _Tutorial:
.. _tutorial-index:

%%%%%%%%%%%%%%%%
Varnish Tutorial
%%%%%%%%%%%%%%%%
Welcome to the Varnish Tutorial, we hope this will help you get to
know and understand Varnish.
.. toctree::

    intro.rst
    tut001.rst
    tut002.rst
    backend_servers.rst
    starting_varnish.rst
    logging.rst
    putting_varnish_on_port_80.rst
    vcl.rst
    statistics.rst
    increasing_your_hitrate.rst
    advanced_backend_servers.rst
    troubleshooting.rst
.. todo::

    starting varnish with -d, seeing a transaction go through
Introduction
%%%%%%%%%%%%
Varnish is a web accelerator. It is installed in front of your web
application and it caches the responses, making your web site run
faster. Varnish is fast, flexible and easy to use.

This tutorial does not go through every bit of functionality Varnish
has. It will give you a good overview of what Varnish does and how it
is done.

We assume you have a web server and a web application up and running
and that you want to accelerate this application with Varnish.

Furthermore we assume you have read the :ref:`Installation` and that
it is installed with the default configuration.

Most tutorials are written in "subject-order", as the old Peanuts
strip goes::

    Jogging: A Handbook
    Author: S. Noopy

    Chapter 1: Left foot

    It was a dark and stormy night...

This is great when the reader has no choice, and nothing better to
do, but read the entire document before starting.

We have taken the other approach: "breadth-first", because experience
has shown us that Varnish users want to get things running, and then
polish things up later on.

With that in mind, we have written the tutorial so you can break off,
as Calvin tells Ms. Wormwood, "when my brain is full for today", and
come back later and learn more.

That also means that right from the start, we will have several
things going on in parallel, and you will need at least four, sometimes
more, terminal windows at the same time to run the examples.
A word about TCP ports
----------------------
We have subverted our custom-built regression test tool, a program
called ``varnishtest``, to help you get through these examples
without having to waste too much time setting up web servers as
backends or browsers as clients to drive the examples.
But there is one complication we cannot escape: TCP port numbers.
Each of the backends we simulate and the varnishd instances we run
needs a TCP port number to listen to, and it is your job to find them,
because we have no idea what servers are running on your computer
nor what TCP ports they use.
To make this as easy as possible, we have implemented a ``-L
number`` argument to all the varnish programs, which puts them in
"Learner" mode, and in all the examples we have used 20000 as
the number, because on most systems the middle of the range
(1000...65000) is usually not used.

If these ports are in use on your system (is your colleague also running
the Varnish tutorial?) simply pick another number and use that
instead of 20000.
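
If you want to check whether a port is free before starting, one quick
way is to try binding it (a small Python sketch; ``port_is_free`` is a
made-up helper, not part of Varnish)::

```python
import socket

def port_is_free(port: int) -> bool:
    """Return True if we can bind the given TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False

# Check the base port used throughout the examples.
print(port_is_free(20000))
```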
Logging in Varnish
------------------
One of the really nice features in Varnish is how logging
works. Instead of logging to a normal log file, Varnish logs to a
shared memory segment. When the end of the segment is reached we start
over, overwriting old data. This is much, much faster than logging to
a file and it doesn't require disk space.

The flip side is that if you forget to have a program actually write
the logs to disk, they will disappear.
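
The "start over and overwrite old data" behaviour is that of a ring
buffer, which can be illustrated with a bounded deque (a toy Python
model; Varnish itself uses a shared memory segment, not a deque)::

```python
from collections import deque

# A fixed-size log: once full, each new entry pushes out the oldest.
log = deque(maxlen=4)

for i in range(6):
    log.append(f"entry {i}")

# The two oldest entries have been overwritten.
print(list(log))  # → ['entry 2', 'entry 3', 'entry 4', 'entry 5']
```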
varnishlog is one of the programs you can use to look at what Varnish
is logging. varnishlog gives you the raw logs, everything that is
written to the logs. There are other clients as well; we'll show you
these later.

In the terminal window where you started Varnish, type *varnishlog*
and press enter.

You'll see lines like these scrolling slowly by::
    0 CLI - Rd ping
    0 CLI - Wr 200 PONG 1273698726 1.0

This is the Varnish master process checking up on the caching process
to see that everything is OK.

Now go to the browser and reload the page displaying your web
app. You'll see lines like these::
    11 SessionOpen c 127.0.0.1 58912 0.0.0.0:8080
    11 ReqStart c 127.0.0.1 58912 595005213
    11 RxRequest c GET
    11 RxURL c /
    11 RxProtocol c HTTP/1.1
    11 RxHeader c Host: localhost:8080
    11 RxHeader c Connection: keep-alive

The first column is an arbitrary number which identifies the
request. Lines with the same number are part of the same HTTP
transaction. The second column is the *tag* of the log message. All
log entries are tagged with a tag indicating what sort of activity is
being logged. Tags starting with Rx indicate Varnish is receiving data
and Tx indicates sending data.

The third column tells us whether the data is coming from or going to
the client (c), or to/from the backend (b). The fourth column is the
data being logged.
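
Given that fixed four-column layout, splitting a raw log line into its
fields is straightforward (an illustrative Python sketch;
``parse_log_line`` is a made-up helper, not a Varnish tool)::

```python
def parse_log_line(line: str):
    """Split a varnishlog line into (request id, tag, direction, data)."""
    fields = line.split(None, 3)
    reqid, tag, direction = fields[0], fields[1], fields[2]
    data = fields[3] if len(fields) > 3 else ""
    return int(reqid), tag, direction, data

print(parse_log_line("11 RxHeader c Host: localhost:8080"))
# → (11, 'RxHeader', 'c', 'Host: localhost:8080')
```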
Now, you can filter quite a bit with varnishlog. The basic options you
want to know are:
-b
    Only show log lines from traffic going between Varnish and the
    backend servers. This will be useful when we want to optimize
    cache hit rates.

-c
    Same as -b but for client side traffic.

-i tag
    Only show lines with a certain tag. "varnishlog -i SessionOpen"
    will only give you new sessions.

-I Regex
    Filter the data through a regex and only show the matching
    lines. To show all cookie headers coming from the clients:
    ``$ varnishlog -c -i RxHeader -I Cookie``

-o
    Group log entries by request ID.

Now that Varnish seems to work OK, it's time to put Varnish on port 80
while we tune it.
Put Varnish on port 80
----------------------
If your application works OK we can now switch the ports so Varnish
will listen on port 80. Kill Varnish::

    # pkill varnishd

and stop your web server. Edit the configuration for your web server
and make it bind to port 8080 instead of 80. Now open the Varnish
default.vcl and change the port of the default backend to 8080.

Start up your web server and then start Varnish::

    # varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000

We've removed the -a option, so Varnish will bind to the HTTP port
(80), which is its default. Now try your web application and see if it
works OK.
.. _tutorial-starting_varnish:

Starting Varnish
----------------
I assume varnishd is in your path. You might want to run ``pkill
varnishd`` to make sure Varnish isn't running. Become root and type:
``# varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000 -a 0.0.0.0:8080``
I added a few options; let's go through them:
``-f /usr/local/etc/varnish/default.vcl``
    The -f option specifies which configuration file varnishd should
    use.

``-s malloc,1G``
    The -s option chooses the storage type Varnish should use for
    storing its content. I used the type *malloc*, which just uses
    memory for storage. There are other backends as well, described in
    :ref:`tutorial-storage`. 1G specifies how much memory should be
    allocated - one gigabyte.
``-T 127.0.0.1:2000``
    Varnish has a built-in text-based administration
    interface. Activating the interface makes Varnish manageable
    without stopping it. You can specify what interface the management
    interface should listen to. Make sure you don't expose the
    management interface to the world, as you can easily gain root
    access to a system via the Varnish management interface. I
    recommend tying it to localhost. If you have users on your system
    that you don't fully trust, use firewall rules to restrict access
    to the interface to root only.
``-a 0.0.0.0:8080``
    I specify that I want Varnish to listen on port 8080 for incoming
    HTTP requests. For a production environment you would probably
    make Varnish listen on port 80, which is the default.
Now you have Varnish running. Let us make sure that it works
properly. Use your browser to go to http://192.168.2.2:8080/ - you
should now see your web application running there.
Let's make sure that Varnish really does do something to your web
site. To do that, we'll take a look at the logs.
.. _tutorial-statistics:

Statistics
----------
Now that your Varnish is up and running, let's have a look at how it is
doing. There are several tools that can help.
varnishtop
==========
The varnishtop utility reads the shared memory logs and presents a
continuously updated list of the most commonly occurring log entries.
With suitable filtering using the -I, -i, -X and -x options, it can be
used to display a ranking of requested documents, clients, user
agents, or any other information which is recorded in the log.
XXX Show some nice examples here.
varnishhist
===========
The varnishhist utility reads varnishd(1) shared memory logs and
presents a continuously updated histogram showing the distribution of
the last N requests by their processing time. The value of N and the
vertical scale are displayed in the top left corner. The horizontal
scale is logarithmic. Hits are marked with a pipe character ("|"),
and misses are marked with a hash character ("#").
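
The bucketing idea can be illustrated with a toy renderer (the sample
timings below are made up, and this is only a rough model; varnishhist
itself reads the shared memory log)::

```python
import math

# (processing time in seconds, was it a cache hit?)
samples = [(0.0001, True), (0.00012, True), (0.0001, False), (0.01, False)]

# Bucket on a logarithmic scale; hits render as "|", misses as "#".
buckets = {}
for seconds, hit in samples:
    b = round(math.log10(seconds))
    buckets.setdefault(b, []).append("|" if hit else "#")

for b in sorted(buckets):
    print(f"1e{b:+d}s {''.join(sorted(buckets[b]))}")
```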
varnishsizes
============
Varnishsizes does the same as varnishhist, except it shows the size of
the objects rather than the time taken to complete each request. This
gives you a good overview of how big the objects you are serving are.
varnishstat
===========
Varnish has lots of counters. We count misses, hits, information about
the storage, threads created, deleted objects. Just about
everything. varnishstat will dump these counters. This is useful when
tuning varnish.
There are programs that can poll varnishstat regularly and make nice
graphs of these counters. One such program is Munin. Munin can be
found at http://munin-monitoring.org/ . There is a plugin for munin in
the varnish source code.
Troubleshooting Varnish
=======================
When Varnish won't start
------------------------
Sometimes Varnish won't start. There is a plethora of reasons why
Varnish won't start on your machine. We've seen everything from wrong
permissions on /dev/null to other processes blocking the ports.

Start Varnish in debug mode to see what is going on.
Try to start varnish by::

    # varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000 -a 0.0.0.0:8080 -d

Notice the -d option. It will give you some more information on what
is going on. Let us see how Varnish will react to something else
listening on its port::

    # varnishd -n foo -f /usr/local/etc/varnish/default.vcl -s malloc,1G -T 127.0.0.1:2000 -a 0.0.0.0:8080 -d

    storage_malloc: max size 1024 MB.
    Using old SHMFILE
    Platform: Linux,2.6.32-21-generic,i686,-smalloc,-hcritbit
    200 193
    -----------------------------
    Varnish HTTP accelerator CLI.
    -----------------------------
    Type 'help' for command list.
    Type 'quit' to close CLI session.
    Type 'start' to launch worker process.

Now Varnish is running, but only the master process; in debug mode the
cache does not start. Now you're on the console. You can instruct the
master process to start the cache by issuing ``start``::

    start
    bind(): Address already in use
    300 22
    Could not open sockets

And here we have our problem. Something else is bound to the HTTP port
of Varnish.
Varnish is crashing
-------------------
When varnish goes bust.
.. _TUT001:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TUT001: Let's see if varnishtest is working
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
In the first window, start this command::

    varnishtest -L20000 S001.vtc

Then in the second window, run this one::

    varnishtest -L20000 C001.vtc

If things work as they should, both programs spit out 5 lines, looking
like this in the first window::

    * top TEST S001.vtc starting
    ** s1 Started on 127.0.0.1:20012
    ** s1 Ending
    * top RESETTING after S001.vtc
    * top TEST S001.vtc completed

and like this in the second window::

    * top TEST C001.vtc starting
    ** c1 Starting client
    ** c1 Ending
    * top RESETTING after C001.vtc
    * top TEST C001.vtc completed

If that did not work, please consult the XXX: troubleshooting section.
Now, try again, but this time run the client with a couple of ``-v``
arguments::

    varnishtest -vv -L20000 C001.vtc

Now the output contains a lot more detail::
    * top TEST C001.vtc starting
    *** top client
    ** c1 Starting client
    *** c1 Connect to 127.0.0.1:20012
    *** c1 connected fd 3
    *** c1 txreq
    **** c1 txreq| GET / HTTP/1.1\r\n
    **** c1 txreq| \r\n
    *** c1 rxresp
    **** c1 rxhdr| HTTP/1.1 200 Ok\r\n
    **** c1 rxhdr| Foo: bar\r\n
    **** c1 rxhdr| Content-Length: 12\r\n
    **** c1 rxhdr| \r\n
    **** c1 http[ 0] | HTTP/1.1
    **** c1 http[ 1] | 200
    **** c1 http[ 2] | Ok
    **** c1 http[ 3] | Foo: bar
    **** c1 http[ 4] | Content-Length: 12
    **** c1 body| Hello World!
    **** c1 bodylen = 12
    *** c1 closing fd 3
    ** c1 Ending
    * top RESETTING after C001.vtc
    * top TEST C001.vtc completed

First the client does a ``txreq`` -- "transmit request", and you can
see the HTTP request it sends to the server, a plain boring "GET".
Then it does a ``rxresp`` -- "receive response", and we can see the
HTTP response we get back, including the HTTP object body.
Now try again, this time running the server in the first window with
``-vv``.
If things do not work the way you expect, adding those ``-v`` options
is a good place to start debugging.
Next we put a Varnish cache between the server and the client: :ref:`TUT002`
.. _TUT002:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
TUT002: Caching an object with varnish
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Now it is time to lose your virginity and cache an object with ``varnishd``.

In the first window, start this command::

    varnishd -L20000 -t30

You should see something like this::

    storage_malloc: max size 1 MB.
    Using old SHMFILE
    Platform: FreeBSD,9.0-CURRENT,amd64,-smalloc,-hcritbit
    200 193
    -----------------------------
    Varnish HTTP accelerator CLI.
    -----------------------------
    Type 'help' for command list.
    Type 'quit' to close CLI session.
    Type 'start' to launch worker process.

We will explain this stuff later; for now just type ``start`` and you
should see something like::

    child (88590) Started
    200 0
    Child (88590) said
    Child (88590) said Child starts

Next, start a backend in a different window::

    varnishtest -L20000 S002.vtc

And finally, send a request from a third window::

    varnishtest -L20000 -vv C002.vtc

You will notice that the backend and client windows both do
their thing and exit back to the shell prompt.

In the client window you will have a line like::

    **** c1 http[ 6] | X-Varnish: 17443679

This is Varnish telling you that it was involved. (The exact number will
be different for you, it is just a transaction-id.)
Now, try running *only* the client command again::

    varnishtest -L20000 -vv C002.vtc

Tada! You have just received a cache-hit from varnishd.
This time the ``X-Varnish`` line will have two numbers::

    **** c1 http[ 6] | X-Varnish: 17443680 17443679

The first number is the XID for this request; the second one is the
XID that brought this object into varnishd's cache, and it matches the
number you saw above.

If you run the client again, you will see::

    **** c1 http[ 6] | X-Varnish: 17443681 17443679

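
Since the header carries one XID on a miss and two on a hit, you can
classify responses by counting the numbers in it (an illustrative
Python sketch; ``classify`` is a made-up helper, not a Varnish tool)::

```python
def classify(x_varnish: str) -> str:
    """Classify a response from its X-Varnish header value."""
    xids = x_varnish.split()
    return "hit" if len(xids) == 2 else "miss"

print(classify("17443679"))           # → miss
print(classify("17443680 17443679"))  # → hit
```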
You can keep running the client and you will get hits, but
30 seconds after the object was put in the cache, ``varnishd``
will expire it, and then you will get Varnish's "Guru Meditation"
message, because the backend does not respond::
    **** c1 body| <head>\n
    **** c1 body| <title>503 Service Unavailable</title>\n
    **** c1 body| </head>\n
    **** c1 body| <body>\n
    **** c1 body| <h1>Error 503 Service Unavailable</h1>\n
    **** c1 body| <p>Service Unavailable</p>\n
    **** c1 body| <h3>Guru Meditation:</h3>\n
    **** c1 body| <p>XID: 1685367940</p>\n
    **** c1 body| <hr>\n

If you start the backend again::

    varnishtest -L20000 S002.vtc

then you can fetch the object for another 30 seconds.
Varnish Configuration Language - VCL
====================================

How ordinary configuration files work
-------------------------------------
Varnish has a really neat configuration system. Most other systems use
configuration directives, where you basically turn a bunch of switches
on and off.

A very common thing to do in Varnish is to override the cache headers
from our backend. Let's see how this looks in Squid, which has a
standard configuration::
    refresh_pattern ^http://images. 3600 20% 7200
    refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
    refresh_pattern -i (/\.jpg) 1800 10% 3600 override-expire
    refresh_pattern . 0 20% 4320

If you are familiar with Squid, that probably made sense to you. But
let's point out a few weaknesses with this model.

1) It's not intuitive. You can guess what the options mean, and you
   can (and should) document them in your configuration file.

2) Which rules have precedence? Does the last rule to match stick? Or
   the first? Or does Squid try to combine all the matching rules? I
   actually don't know.
Enter VCL
---------
Now enter Varnish. Varnish takes your configuration file and
translates it to C code, then runs it through a compiler and loads
it. When requests come along, Varnish just executes the relevant
subroutines of the configuration at the relevant times.

Varnish will execute these subroutines of code at different stages of
its work. Since it is code, it is executed line by line and precedence
isn't a problem.
99% of all the changes you'll need to do will be done in two of these
subroutines.
vcl_recv
~~~~~~~~
vcl_recv (yes, we're skimpy with characters, it's Unix) is called at
the beginning of a request, after the complete request has been
received and parsed. Its purpose is to decide whether or not to serve
the request, how to do it, and, if applicable, which backend to use.

In vcl_recv you can also alter the request, e.g. by dropping cookies
or rewriting headers.
vcl_fetch
~~~~~~~~~
vcl_fetch is called *after* a document has been successfully retrieved
from the backend. Normal tasks here are to alter the response headers,
trigger ESI processing, or try alternate backend servers in case the
request failed.
# $Id$
client c1 -connect 12 {
txreq
rxresp
} -run
# $Id$
client c1 {
txreq
rxresp
} -run
# $Id$
server s1 -listen 2 {
rxreq
txresp -hdr "Foo: bar" -body "Hello World!"
} -start -wait
# $Id$
server s1 {
rxreq
txresp -hdr "Foo: bar" -body "Hello World!"
} -start -wait