AZ(pthread_key_create) fails if Varnish has been running long enough #1

Closed
opened 2016-08-01 12:10:44 +00:00 by geoff · 4 comments
geoff commented 2016-08-01 12:10:44 +00:00 (Migrated from code.uplex.de)

After an uptime of 41 days, vcl.load with RE object initializations in vcl_init was causing Varnish to crash due to this assertion failure:

#2  0x000000000043ce98 in pan_ic (func=0x2ba69e8025f0 "vmod_regex__init", 
    file=0x2ba69e802112 "vmod_re.c", line=96, 
    cond=0x2ba69e802518 "(pthread_key_create(&re->ovk, ((void *)0))) == 0", 
    err=1, kind=VAS_ASSERT) at cache/cache_panic.c:513
#3  0x00002ba69e801962 in vmod_regex__init (ctx=<value optimized out>, 
    rep=0x2ba6c0296ae8, vcl_name=<value optimized out>, 
    pattern=0x2ba6c0037256 <Address 0x2ba6c0037256 out of bounds>)
    at vmod_re.c:96

errno==1==EPERM=="Operation not permitted" on Linux, but I'm inclined to believe that there were just too many VCLs accumulated, too many RE objects, and we may have hit a limit for pthread keys. The VMOD may have to do a better job of cleaning up pthread keys when they are unused.

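A minimal sketch of the cleanup idea mentioned above, assuming each object owns one pthread key and releases it when the object is destroyed. `struct re_obj`, `re_obj_new`, `re_obj_free` and `re_obj_churn` are hypothetical names for illustration, not the VMOD's actual code; only `pthread_key_create` and `pthread_key_delete` are real API.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the VMOD's regex object; only the key matters. */
struct re_obj {
	pthread_key_t ovk;
};

/* Create an object and its key; report failure instead of asserting. */
struct re_obj *
re_obj_new(void)
{
	struct re_obj *re = malloc(sizeof(*re));

	if (re == NULL)
		return (NULL);
	if (pthread_key_create(&re->ovk, NULL) != 0) {
		free(re);
		return (NULL);
	}
	return (re);
}

/*
 * Destroy an object and give its key slot back to the process.
 * pthread_key_delete does not run destructors, so any per-thread
 * values still attached to the key must be released separately.
 */
void
re_obj_free(struct re_obj *re)
{
	if (re == NULL)
		return;
	(void)pthread_key_delete(re->ovk);
	free(re);
}

/* Create and destroy n objects in sequence; return how many succeeded. */
int
re_obj_churn(int n)
{
	int ok = 0;

	for (int i = 0; i < n; i++) {
		struct re_obj *re = re_obj_new();

		if (re == NULL)
			break;
		ok++;
		re_obj_free(re);
	}
	return (ok);
}
```

Because every key is deleted before the next one is created, `re_obj_churn` can run for far more iterations than the per-process key limit. Without the `pthread_key_delete` call, the same loop would stop once the limit is reached.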
geoff commented 2016-08-01 12:43:55 +00:00 (Migrated from code.uplex.de)

It's also possible that we hit PTHREAD_KEYS_MAX.

$ getconf PTHREAD_KEYS_MAX
1024
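The same limit can be read from inside a program. This is a small sketch, assuming a POSIX system; `max_pthread_keys` is a hypothetical helper name, while `sysconf` and `_SC_THREAD_KEYS_MAX` are standard.

```c
#include <limits.h>
#include <unistd.h>

/*
 * Return the per-process limit on pthread keys, or -1 if it cannot be
 * determined. Falls back to the compile-time constant when sysconf
 * does not report a value.
 */
long
max_pthread_keys(void)
{
	long n = sysconf(_SC_THREAD_KEYS_MAX);

#ifdef PTHREAD_KEYS_MAX
	if (n < 0)
		n = PTHREAD_KEYS_MAX;
#endif
	return (n);
}
```

The limit applies to the whole process, so keys created by other libraries or by the C library itself count against it as well.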
geoff commented 2016-08-10 11:48:58 +00:00 (Migrated from code.uplex.de)

From the [glibc docs about pthread_keys](http://www.sbin.org/doc/glibc/libc_34.html#SEC674):

> pthread_key_create returns 0 unless PTHREAD_KEYS_MAX keys have already been allocated, in which case it fails and returns EAGAIN.

The docs don't mention any other reason for pthread_key_create to fail; but in that case we'd expect to see `err=EAGAIN=11` in the panic message above (it might be that the panic message is not setting `err` to the value of `errno` correctly).
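One way to check the EAGAIN theory is to exhaust the keys deliberately and compare the return value with `errno`. POSIX specifies that `pthread_key_create` returns the error number directly and does not require it to set `errno`. If `err` in the panic reflects `errno` rather than the return value (an assumption about the panic code, not verified here), then `err=1` may be a stale value unrelated to the failure. `exhaust_keys` and `MAX_TRY` are hypothetical names for this sketch.

```c
#include <errno.h>
#include <pthread.h>

#define MAX_TRY 65536	/* arbitrary upper bound for the experiment */

/*
 * Create keys until pthread_key_create fails. Stores the last return
 * value in *rc and the value of errno observed afterwards in
 * *seen_errno. All keys created here are deleted before returning.
 * Returns the number of keys that were created.
 */
int
exhaust_keys(int *rc, int *seen_errno)
{
	static pthread_key_t keys[MAX_TRY];
	int n = 0;

	*rc = 0;
	errno = 0;
	while (n < MAX_TRY) {
		*rc = pthread_key_create(&keys[n], NULL);
		if (*rc != 0)
			break;
		n++;
	}
	*seen_errno = errno;

	for (int i = 0; i < n; i++)
		(void)pthread_key_delete(keys[i]);
	return (n);
}
```

On a system with the limit shown above, `*rc` should end up as EAGAIN, and the returned count should be at most 1024 (fewer if other code in the process already holds keys). Whatever `*seen_errno` holds is not defined by POSIX to describe the failure.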
geoff commented 2016-11-07 18:55:55 +00:00 (Migrated from code.uplex.de)

Since Varnish 4.1, we can use PRIV_TASK scope with objects by using VRT_priv_task() from vrt.h, with the object's address as the "vmod_id" (we have this working in the blobdigest VMOD).

So that will be the solution (meaning that anyone who encounters this bug will need to upgrade to at least Varnish 4.1). Just have to implement it.

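The pattern described above can be illustrated without Varnish headers. The sketch below is a self-contained mock, not the VMOD's implementation: `struct priv`, `struct task`, `task_priv`, `task_end` and `obj_task_buf` are all hypothetical stand-ins. In a real VMOD the lookup would be `VRT_priv_task()` with the object's address as the id, and Varnish itself would run the `free` callback at the end of the task.

```c
#include <stdlib.h>

/* Reduced stand-in for a per-task private slot (pointer plus free hook). */
struct priv {
	void *priv;
	void (*free)(void *);
};

/* Hypothetical stand-in for one task: a small id-to-slot table. */
#define TASK_SLOTS 8
struct task {
	const void *id[TASK_SLOTS];
	struct priv p[TASK_SLOTS];
	int n;
};

/* Mock lookup: return the slot for id, creating an empty one if needed. */
struct priv *
task_priv(struct task *t, const void *id)
{
	for (int i = 0; i < t->n; i++)
		if (t->id[i] == id)
			return (&t->p[i]);
	if (t->n == TASK_SLOTS)
		return (NULL);
	t->id[t->n] = id;
	t->p[t->n].priv = NULL;
	t->p[t->n].free = NULL;
	return (&t->p[t->n++]);
}

/* Mock end-of-task cleanup: run each slot's free hook, then reset. */
void
task_end(struct task *t)
{
	for (int i = 0; i < t->n; i++)
		if (t->p[i].free != NULL && t->p[i].priv != NULL)
			t->p[i].free(t->p[i].priv);
	t->n = 0;
}

/* Per-object, per-task scratch buffer, keyed by the object's address. */
int *
obj_task_buf(struct task *t, const void *obj, size_t len)
{
	struct priv *p = task_priv(t, obj);

	if (p == NULL)
		return (NULL);
	if (p->priv == NULL) {
		p->priv = calloc(len, sizeof(int));
		p->free = free;
	}
	return (p->priv);
}
```

The point of the design is that per-object state lives in storage scoped to the task instead of in a process-wide pthread key, so the number of objects is no longer bounded by the key limit and cleanup happens when the task ends.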
geoff commented 2017-05-11 20:04:36 +00:00 (Migrated from code.uplex.de)

Status changed to closed by commit 26f3394ebfee66b996a4fbe18d61601824d6874d
Reference
uplex-varnish/libvmod-re#1