Commit 6e616f98 authored by Poul-Henning Kamp's avatar Poul-Henning Kamp

Chapter 5 & 6 in the CHERI saga

parent df9bc186
.. _phk_cheri_5:
How Varnish met CHERI 5/N
=========================
Varnish Workspaces
------------------
To process a HTTP request or response, varnish must allocate bits
of memory which will only be used for the duration of that processing,
and all of it can be released back at the same time.
To avoid calling ``malloc(3)`` a lot, which comes with a locking
overhead in a heavily multithreaded process, but even more to
avoid having to keep track of all these allocations in order to be able
to ``free(3)`` them all, varnish has "workspaces":
.. code-block:: none
struct ws {
[…]
char *s; /* (S)tart of buffer */
char *f; /* (F)ree/front pointer */
char *r; /* (R)eserved length */
char *e; /* (E)nd of buffer */
};
The ``s`` pointer points at the start of a slab of memory, owned
exclusively by the current thread and ``e`` points to the end.
Initially ``f`` is the same as ``s``, but as allocations are made
from the workspace, it moves towards ``e``. The ``r`` pointer is
used to make "reservations", we will ignore that for now.
Workspaces look easy to create:
.. code-block:: none
ws->s = space;
ws->e = ws->s + len;
ws->f = ws->s;
ws->r = NULL;
… only, given the foot-shooting-abetting nature of the C language,
we have bolted on a lot of seat-belts:
.. code-block:: none
#define WS_ID_SIZE 4
struct ws {
unsigned magic;
#define WS_MAGIC 0x35fac554
char id[WS_ID_SIZE]; /* identity */
char *s; /* (S)tart of buffer */
char *f; /* (F)ree/front pointer */
char *r; /* (R)eserved length */
char *e; /* (E)nd of buffer */
};
void
WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
{
unsigned l;
DSLb(DBG_WORKSPACE,
"WS_Init(%s, %p, %p, %u)", id, ws, space, len);
assert(space != NULL);
assert(PAOK(space));
INIT_OBJ(ws, WS_MAGIC);
ws->s = space;
l = PRNDDN(len - 1);
ws->e = ws->s + l;
memset(ws->e, WS_REDZONE_END, len - l);
ws->f = ws->s;
assert(id[0] & 0x20); // cheesy islower()
bstrcpy(ws->id, id);
WS_Assert(ws);
}
Let me walk you through that:
The ``DSLb()`` call can be used to trace all operations on the
workspace, so we can see what actually goes on.
(Hint: Your ``malloc(3)`` may have something similar,
look for ``utrace`` in the manual page.)
Next we check the provided space pointer is not NULL, and
that it is properly aligned, these are both following
a varnish style-pattern, to sprinkle asserts liberally,
both as code documentation, but also because it allows
the compiler to optimize things better.
The ``INIT_OBJ() and ``magic`` field is a style-pattern
we use throughout varnish: Each structure is tagged with
a unique magic, which can be used to ensure that pointers
are what we are told, when they get passed through a ``void*``.
We set the ``s`` pointer.
We calculate a length at least one byte shorter than what
we were provided, align it, and point ``e`` at that.
We fill that extraspace at and past ``e``, with a "canary" to
stochastically detect overruns. It catches most but not
all overruns.
We set the name of the workspace, ensuring it is not already
marked as overflowed.
And finally check that the resulting workspace complies with
the defined invariants, as captured in the ``WS_Assert()``
function.
With CHERI, it looks like this:
.. code-block:: none
void
WS_Init(struct ws *ws, const char *id, void *space, unsigned len)
{
unsigned l;
DSLb(DBG_WORKSPACE,
"WS_Init(%s, %p, %p, %u)", id, ws, space, len);
assert(space != NULL);
INIT_OBJ(ws, WS_MAGIC);
assert(PAOK(space));
ws->s = cheri_bounds_set(space, len);
ws->e = ws->s + len
ws->f = ws->s;
assert(id[0] & 0x20); // cheesy islower()
bstrcpy(ws->id, id);
WS_Assert(ws);
}
All the gunk to implement a canary to detect overruns went
away, because with CHERI we can restrict the ``s`` pointer so writing
outside the workspace is *by definition* impossible, as long as your
pointer is derived from ``s``.
Less memory wasted, much stronger check and more readable source-code,
what's not to like ?
When an allocation is made from the workspace, CHERI makes it possible
to restrict the returned pointer to just the allocated space:
.. code-block:: none
void *
WS_Alloc(struct ws *ws, unsigned bytes)
{
char *r;
[…]
r = ws->f;
ws->f += bytes;
return(cheri_bounds_set(r, bytes));
}
Varnish String Buffers
----------------------
Back in the mists of time, Dag-Erling Smørgrav and I designed a
safe string API called ``sbuf`` for the FreeBSD kernel.
The basic idea is you set up your buffer, you call functions to stuff
text into it, and those functions do all the hard work to ensure
you do not overrun the buffer. When the string is complete, you
call a function to "finish" the buffer, and if returns a flag which
tells you if overrun (or other problems) happened, and then you can
get a pointer to the resulting string from another function.
Varnish has adopted sbuf's under the name ``vsb``. This should
really not surprise anybody: Dag-Erling was also involved
in the birth of varnish.
It should be obvious that internally ``vsb`` almost always operate
on a bigger buffer than the result, so this is another obvious
place to have CHERI cut a pointer down to size:
.. code-block:: none
char *
VSB_data(const struct vsb *s)
{
assert_VSB_integrity(s);
assert_VSB_state(s, VSB_FINISHED);
return (cheri_bounds_set(s->s_buf, s->s_len + 1));
}
Still no bugs though.
*/phk*
.. _phk_cheri_6:
How Varnish met CHERI 6/N
=========================
Varnish Socket Addresses
------------------------
Socket addresses are a bit of a mess, in particular because nobody
dared shake up all the IPv4 legacy code when IPv6 was introduced.
In varnish we encapsulate all that uglyness in a ``struct suckaddr``,
so named because it sucks that we have to spend time and code on this.
In a case like this, it makes sense to make the internals strictly
read-only, to ensure nobody gets sneaky ideas:
.. code-block:: none
struct suckaddr *
VSA_Build(void *d, const void *s, unsigned sal)
{
struct suckaddr *sua;
[… lots of ugly stuff …]
return (RO(sua));
}
It would then seem logical to use C's ``const`` to signal this fact,
but since the current VSA api is currently defined such that users
call ``free(3)`` on the suckaddrs when they are done with them, that does
not work, because the prototype for ``free(3)`` is:
.. code-block:: none
void free(void *ptr);
So you cannot call it with a ``const`` pointer.
All hail the ISO-C standards committee!
This brings me to a soft point with CHERI: Allocators.
How to free things with CHERI
-----------------------------
A very common design-pattern in encapsulating classes look something
like this:
.. code-block:: none
struct real_foo {
struct foo foo;
[some metadata about foo]
};
const struct foo *
Make_Foo([arguments])
{
struct real_foo *rf;
rf = calloc(1, sizeof *rf);
if (rf == NULL)
return (rf);
[fill in rf]
return (&rf->foo);
}
void
Free_Foo(const struct foo **ptr)
{
const struct foo *fp;
struct real_foo *rfp;
assert(ptr != NULL);
fp = *ptr;
assert(fp != NULL);
*ptr = NULL;
rfp = (struct real_foo*)((uintptr_t)fp);
[clean stuff up]
}
We pass a ``**ptr`` to ``Free_Foo()``, another varnish style-pattern,
so we can NULL out the holding variable in the calling function,
to avoid a dangling pointer to the now destroyed object from
causing any kind of trouble later.
In the calling function this looks like:
.. code-block:: none
const struct foo *foo_ptr;
[…]
Free_Foo(&foo_ptr);
If we use CHERI to make the foo truly ``const`` for the users of
the API, we cannot, as above, wash the ``const`` away with a trip through
``uintptr_t`` and then write to the metadata.
The CHERI C/C++ manual, a terse academic tome, laconically mention that:
*»Therefore, some additional work may be required to derive
a pointer to the allocation’s metadata via another global capability,
rather than the one that has been passed to free().«*
Despite the understatement, I am very much in favour of this, because
this is *precisely* why my own
`phkmalloc <https://papers.freebsd.org/1998/phk-malloc/>`_
became a big hit twenty years ago: By firmly separating the metadata
from the allocated space, several new classes of mistakes using the
``malloc(3)`` API could, and was, detected.
But this *is* going to be an embuggerance for CHERI users, because
with CHERI getting from one pointer to different one is actual work.
The only "proper" solution is to build some kind of datastructure:
List, tree, hash, DB2 database, pick any poison you prefer, and
search out the metadata pointer using the impotent pointer as key.
Given that CHERI pointers are huge, it may be a better idea to embed
a numeric index in the object and use that the key,
An important benefit of this »additional work« is that if your
free-function get passed a pointer to something else, you will
find out, because it is not in your data-structure.
It would be a good idea if CHERI came with a quality implementation
of "Find my other pointer", so that nobody is forced to crack The
Art of Computer Programming open for this.
When the API is "fire-and-forget" like VSA, in the sense that there
is no metadata to clean up, we can leave the hard work to the
``malloc(3)`` implementation.
Ever since ``phkmalloc`` no relevant implementation of ``malloc(3)``
has dereferenced the freed pointer, in order to find the metadata
for the allocation. Despite its non-const C prototype ``free(3)``,
will therefore happily handle a ``const`` or even CHERIed read-only
pointer.
But you *will* have to scrub the ``const`` off with a ``uintptr_t``
to satisfy the C-compiler:
.. code-block:: none
void
VSA_free(const struct suckaddr **vsap)
{
const struct suckaddr *vsa;
AN(vsap);
vsa = *vsap;
*vsap = NULL;
free((char *)(uintptr_t)vsa);
}
Or in varnish style:
.. code-block:: none
void
VSA_free(const struct suckaddr **vsap)
{
const struct suckaddr *vsa;
TAKE_OBJ_NOTNULL(vsa, vsap, SUCKADDR_MAGIC);
free(TRUST_ME(vsa));
}
Having been all over this code now, I have decided to constify ``struct
suckaddr`` in varnish, even without CHERI, it is sounder that way.
It is not bug, but CHERI made it a lot simpler and faster for me
to explore the consequences of this change, so I will forfeit
a score of "half a bug" to CHERI at this point.
*/phk*
......@@ -13,6 +13,8 @@ You may or may not want to know what Poul-Henning thinks.
.. toctree::
:maxdepth: 1
cheri6.rst
cheri5.rst
cheri4.rst
cheri3.rst
cheri2.rst
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment