In this file a quick overview of all the modifications that have been made for
zone verification.


Configuring the verifier
========================

Configure (nsd.conf) options were added. In the new "verify:" clause:
	enable:
	port:
	ip-address:
	verify-zones:
	verifier:
	verifier-count,
	verifier-feed-zone,
    and verifier-timeout.

And for the "zone:" and "pattern:" clauses:
	verify-zone,
	verifier,
	verifier-feed-zone,
    and verifier-timeout.

To parse the syntax for those options, configlexer.lex and configparser.y are
modified. To hold those configuration values, the structs nsd_options and
pattern_options in the file options.h are extended.

The type of pattern_options::verifier, char**, is in the vector of arguments
form that can be used by the execve family of executing functions. The helper
type "struct component" is defined to help parsing a command with arguments.
A zone_verifier is a list of STRING tokens. A stack of component is
constructed from those strings, that eventually is converted to an argument
in configparser.y.


Difffile modifications
======================

It is possible that during a reload updates for multiple different zones are
read. If some should be loaded (because they verified or didn't need to be
verified) and some not, we have a problem because the database is updated
with all the updates (also the bad ones) and we cannot easily selectively
undo only the bad updates.

In order to break this situation the committed field of each transfer is
utilized. Initially it will be assigned the value DIFF_NOT_COMMITTED (0).
When an update is verified this will be modified to DIFF_COMMITTED (1),
DIFF_CORRUPT (2) or DIFF_INCONSISTENT (4) depending on whether the update
was applied and verified successfully. When a reload resulted in one or
more zones being corrupt or inconsistent, the newly forked server will quit
with exit status NSD_RELOAD_FAILED and the parent server will initiate a new
reload. Then it is clear which updates should be merged with the database (the
updates which committed field is neither DIFF_CORRUPT or DIFF_INCONSISTENT).

	Handling of the NSD_RELOAD_FAILED exit status of a child reload server
	is in server_main (server.c)

To allow updates to be applied again on failure, xfrd has been updated to keep
all updates for each zone around until a reload succeeds. The set of updates
is fixed once a reload has been initiated to avoid a potentially infinite
loop. During the update window, xfrd will accept and transfer updates, but
does not schedule them until the reload finishes. As a result, xfrd manages
the updates stored on disk rather than the server, which previously just
removed each update during the reload process regardless of the result.
Potentially resulting in the same transfer being tried mutiple times if the
set of updates contained a bad update.


Running verifiers
=================

In server_reload (in server.c) the function server_verify is called just after
all updates are merged into the (in memory) database, but just before the new
database will be served. server_verify sets up a temporary event loop, calls
verify_zone repeatedly to run the verifiers and mark each updated zone.
server_reload then inspects the update status for each zone and communicates
the number of good and bad zones in the update. server_reload then decides how
to continue based on the number of good and bad zones as described above.

verify_zone is defined in verify.c (and .h). The function creates the
necessary pipes, starts the verifier and then sets up the required events and
registers them with the event loop.

The state for each verifier is maintained an array of struct verifier. The
size of the array is "verifier-count:" big. Each verifier that runs
simultaneously is assigned a slot. When no free slots are available it waits
until a running verifier is finished (or timed out) and a free slot is
available for a potential next verifier to run simultaneously with the already
running verifiers. The default setting is to run just one verifier at once,
which will probably be fine in most situations.

Once all verifiers are finised (or timed out), the event loop is exited and
server_reload communicates the status for each updated zone.


Environment variables for the verifiers
=======================================

Verifiers are informed on how a zone can be verified through environment
variables. The information on which addresses and ports a verifier may query a
zone to be assessed is available and set on startup just after reading the
configuration and setting up the sockets in nsd.c by calling
setup_verifier_environment (also in nsd.c).

Verifiers are spawned (via verify_zone) with popen3. verify_zone sets the zone
specific environment variables (VERIFY_ZONE and VERIFY_ZONE_ON_STDIN) just
before it executes the verifier with execvp. Server sockets are automatically
closed when the verifier is executed.


Logging a verifiers standard output and error streams
=====================================================

Everything a verifier outputs to stdin and stderr is logged in the nsd log
file.  Handler with handle_log_from_fd (verify.c) as a callback are setup by
server_verifiers_add. The log_from_fd_t struct is the user_data for the handler
and contains besides the priority and the file descriptor, variables that are
used by handle_log_from_fd to make sure logged lines will never exceed
LOGLINELEN in length and will be split into parts if necessary.

Note that in practice error messages are always logged before messages on the
standard output, because stdout is buffered and stderr is not. Maybe it is more
convenient to set stdout to unbuffered too.


Feeding a zone to a verifier
============================

The complete zone may be fed to the standard input of a verifier when the
"verifier-feed-zone:" configuration option has value "yes" (the default). For
this purpose a verify_handle_feed (verify.c) handler is called when the
standard input file descriptor of the verifier is writeable. The function
utilizes the zone_rr_iter_next (verify.c) function to get the next rr to
write to the verifier. The verifier_zone_feed struct is used to maintain state
(the file handle, the rr pretty printing state and the zone iterator).


Serving a zone to a verifier
============================

The nsd struct (in nsd.h) is extended with two arrays of nsd_socket structs:
verify_tcp and verify_udp and an verify_ifs size_t which holds the number of
sockets for verifying. This reflects the tcp, udp and ifs members that are used
for normal serving. Several parts in the code that operate on the tcp and udp
arrays is simply reused with the verify_tcp and verify_udp arrays.

Furthermore, in places in server.c were before the server_close_all_sockets
(server.c) function was used with the normal server sockets, the function is
called subsequently for the verify sockets. Also in server_start_xfrd the
sockets for verifiers are closed in the xfrd child process, because it has no
need for them.


Verifier timeouts
=================

A handler for timeouts (as configured with the "verifier-timeout:" option) is
added by server_verifiers_add at verifier initialization time. The callback is
handle_verifier_timeout (verify.c) and the verifier_state_type for the verifier
is used as user_data.

verify_handle_timeout simply kills the verifier (by sending SIGTERM) and does
not cleanup the verifier state for reuse. This is done in verify_handle_exit,
which is triggered once the verifier exits, because it can handle and start
more verifiers simultaneously.


Aborting the reload process (and killing all running verifiers)
===============================================================

A reload might (especially with a verifier) take some time. A parent server
process could in this time be asked to quit. If that happens and it has a child
reload server process, it sends the NSD_QUIT command over the communication
channel. verify_handle_command, which is registered when the temporary event
loop is created, is triggered and sends a SIGTERM signal to each of the
verifiers.


Refreshing and expiring zones
=============================

When the SOA-Refresh timer runs out, a fresh zone is tried to be fetched from
the master server. If that fails, each SOA-Retry time will be tried again. To
prevent a bad zone from being verified again and again, xfrd remembers the
last serial number of the zone that didn't verify. It will not try to transfer
a zone with the bad serial number again.

Before afer reloading, the reload process informed xfrd which SOA's were
merged in the database, so that xfrd knew when zone needed to be refreshed.
This is adapted to inform xfrd about bad zones. The function
inform_xfrd_new_soas is called for this in server.c. It communicated either
good or bad soas. When bad soas are communicated a session starts with
NSD_BAD_SOA_BEGIN. For only good zones it starts with NSD_SOA_BEGIN. Each soa
is preceded by a NSD_SOA_INFO. When all soas are communicated, NSD_SOA_END is
send. Reception of these messages by xfrd is handled by function
xfrd_handle_ipc_read in ipc.c. In the xfrd_state struct (in xfrd.h), the
boolean parent_bad_soa_infos is added to help with this control flow in ipc.

The soas are eventually processed by xfrd, via xfrd_handle_ipc_SOAINFO in
ipc.c, with the xfrd_handle_incoming_soa function in xfrd.c.  The function
make sure that if a bad soa was received it is remembered in the xfrd_zone
struct. Two new variables are added for the purpose to this struct: soa_bad
and soa_bad_acquired.  The values are stored and read to the xfrd.state file
with the functions xfrd_write_state_soa and xfrd_read_state respectively.

In xfrd.c function xfrd_parse_received_xfr_packet is adapted to make sure that
known bad serials are not transfered again unless the transfer is in a
response to a notify. And even then only when the SOA matches the one in the
notify (if it contained one, otherwise any SOA is good).