uplex-varnish / libvmod-j
Commit cb714804, authored Sep 08, 2023 by Nils Goroll
Improve documentation wrt UTF-8
parent 7882fd78

Showing 3 changed files with 77 additions and 2 deletions
* README.rst (+7, -0)
* src/vmod_j.man.rst (+35, -1)
* src/vmod_j.vcc (+35, -1)

README.rst
...
...
@@ -19,6 +19,13 @@ PROJECT RESOURCES
* the mirror at https://gitlab.com/uplex/varnish/libvmod-j for issues,
  merge requests and all other interactions.

.. _Höhrmann UTF-8 decoder: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
.. _Exhaustive Test Program: https://git.sr.ht/~slink/hoehrmann-utf8

This project contains, as a submodule, the `Höhrmann UTF-8 decoder`_
as tested by the `Exhaustive Test Program`_.

INTRODUCTION
============
...
...
src/vmod_j.man.rst
...
...
@@ -50,7 +50,7 @@ THE** `WARNING`_.
.. _JSON: https://www.json.org/json-en.html
-.. _RFC 8259: https://www.rfc-editor.org/rfc/rfc8259
+.. _RFC 8259: https://www.ietf.org/rfc/rfc8259.txt

Formatting `JSON`_ in pure VCL is a PITA, because string processing in
VCL was never made for it. VCL being a Domain Specific Language, it
...
...
@@ -227,6 +227,40 @@ use ``j.array(j.number(1) + j.number(2) + j.number(3))`` and to create
an array of three strings (``["1","2","3"]``) use
``j.array(j.string(1) + j.string(2) + j.string(3))``

UNICODE / UTF-8
===============

.. _Höhrmann: https://git.sr.ht/~slink/hoehrmann-utf8

JSON does not strictly mandate strings to contain valid UTF-8. `RFC
8259`_ section 8.2 reads:

  [...] this specification allows member names and string values to
  contain bit sequences that cannot encode Unicode characters [...]

however

  When all the strings represented in a JSON text are composed
  entirely of Unicode characters [...] (however escaped), then that
  JSON text is interoperable [...]

For the most part, this module is not concerned with whether or not
strings represent valid UTF-8 or Unicode:

`j.string()`_ with the ``escape=none`` and ``escape=minimal``
(default) options only checks/ensures that strings are properly
escaped and is otherwise transparent with the exception of NUL /
``\0``, which marks the end of the string.

The two exceptions are:

* `j.string()`_ with the ``escape=ascii`` option decodes UTF-8 using
  the `Höhrmann`_ decoder, which fails for invalid UTF-8, but only
  conducts minimal checks on Unicode points.

* `j.unquote()`_ fails if the input is not a valid JSON string or if
  invalid UTF-8 would be produced.

VMOD INTERFACE REFERENCE
========================
...
...
src/vmod_j.vcc
...
...
@@ -36,7 +36,7 @@ THE** `WARNING`_.
.. _JSON: https://www.json.org/json-en.html
-.. _RFC 8259: https://www.rfc-editor.org/rfc/rfc8259
+.. _RFC 8259: https://www.ietf.org/rfc/rfc8259.txt

Formatting `JSON`_ in pure VCL is a PITA, because string processing in
VCL was never made for it. VCL being a Domain Specific Language, it
...
...
@@ -213,6 +213,40 @@ use ``j.array(j.number(1) + j.number(2) + j.number(3))`` and to create
an array of three strings (``["1","2","3"]``) use
``j.array(j.string(1) + j.string(2) + j.string(3))``

UNICODE / UTF-8
===============

.. _Höhrmann: https://git.sr.ht/~slink/hoehrmann-utf8

JSON does not strictly mandate strings to contain valid UTF-8. `RFC
8259`_ section 8.2 reads:

  [...] this specification allows member names and string values to
  contain bit sequences that cannot encode Unicode characters [...]

however

  When all the strings represented in a JSON text are composed
  entirely of Unicode characters [...], then that JSON text is
  interoperable [...]

For the most part, this module is not concerned with whether or not
strings represent valid UTF-8 or Unicode:

`j.string()`_ with the ``escape=none`` and ``escape=minimal``
(default) options only checks/ensures that strings are properly
escaped and is otherwise transparent with the exception of NUL /
``\0``, which marks the end of the string.

The two exceptions are:

* `j.string()`_ with the ``escape=ascii`` option decodes UTF-8 using
  the `Höhrmann`_ decoder, which fails for invalid UTF-8, but only
  conducts minimal checks on Unicode points.

* `j.unquote()`_ fails if the input is not a valid JSON string or if
  invalid UTF-8 would be produced.

VMOD INTERFACE REFERENCE
========================
...
...
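
The distinction drawn in the UNICODE / UTF-8 section above can be illustrated outside of VCL. The following is a minimal Python sketch, not the module's implementation: it uses Python's standard ``json`` module and strict UTF-8 codec in place of the Höhrmann decoder, and the helper names ``escape_ascii``, ``is_valid_utf8`` and ``encodes_to_utf8`` are hypothetical. It shows that a grammatically valid JSON text can represent a string that is not encodable as UTF-8 (the RFC 8259 section 8.2 case), and what an ASCII-only escaping step that rejects invalid UTF-8 might look like.

```python
import json


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def escape_ascii(raw: bytes) -> str:
    """Return an ASCII-only JSON string literal for UTF-8 input.

    Hypothetical analogue of an "escape to ASCII" mode, not vmod_j code:
    decoding is strict, so invalid UTF-8 raises UnicodeDecodeError
    instead of being passed through.
    """
    text = raw.decode("utf-8")
    # ensure_ascii=True emits \uXXXX escapes (surrogate pairs above U+FFFF)
    return json.dumps(text, ensure_ascii=True)


def encodes_to_utf8(text: str) -> bool:
    """Return True if text can be encoded as UTF-8 (hypothetical helper)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# Valid JSON by grammar, but the string it represents is a lone surrogate,
# which is not a Unicode character and has no UTF-8 encoding.
lone = json.loads('"\\udead"')

print(escape_ascii("é".encode("utf-8")))
print(is_valid_utf8(b"\xc3\x28"))
print(encodes_to_utf8(lone))
```

The last check mirrors the situation in which unquoting a JSON string would produce invalid UTF-8: the JSON text parses, yet the result cannot be written out as UTF-8, so a strict consumer has to fail.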