TEST PLAN for NSD. By W.C.A. Wijngaards, July 2006, NLnetLabs. 1. Introduction --------------- NSD 3 contains far more features than a typical point release. These features need to be tested and checked to make sure they work well. This document describes a plan to test all the features that have been added to NSD. Regression testing is also very important. The old features must remain working. We have a set of tpkg packages to help with it. And also root-trace speed tests to regression test NSD. The feature tests are to be automated, using tpkg packages where possible. 2. Minor Features ----------------- Some minor features for the test: 2.1. DNAME ---------- DNAME support - there are already extensive DNAME tests. (closed). 2.2. NSEC3 ---------- NSEC3 support - use the perl automated nsec3 test - port to tpkg perhaps. Note NSEC3 hash length byte to be implemented, test against others. Test interoperability of that. A simple zone transfer with Bind. (experimental, no need to test any more). 2.3. NSID --------- Would make a nice nsid.tpkg package. NSID support - run NSD with different NSIDs and queries. a- test NSID with zero length, query with NSID. b- very long length, query with NSID c- 012345678 and query for different things. 1- query OK things. - query error 2- nxdomain, 3- loop, 4- nodata. 5- query error (bad queries, wrong zone) - is NSID present. d- NSID and TSIG. 1- query has OK TSIG 2- query has BAD TSIG 3- query for nxdomain 4- bad query, wrong zone e- configure NSID from config file? - is this possible f- test if NSID in NOTIFY responses. (should there?) ldns-notify and parse result packet for nsid. g- test if NSID in AXFR responses. (should there?) drill axfr and see if nsid in result packets. (experimental, so low priority). (need a way to send NSID enabled queries - no test). 3. Transfers ------------ For the transfers the test are to be done using - NSD as a master(AXFR) in 3.1, or a ldns-ixfr miniserver(IXFR) as a master in 3.2 that serves pre-made ixfr answers. The zone transfer tests can be put in one tpkg by using servers at different ports. The allow- lines are then for localhost, all ports (since the sending process uses ephemeral ports, all must be allowed). The request- lines contain the correct port numbers to send to. 3.1. AXFR --------- 3.1.1. AXFR features -------------------- Setup is a secondary zone which requests to a master. the master zone is updated. Then, the secondary should be informed with the notify: statements. And test if secondary got the same zone as master. By doing axfr from both servers and check if the same, and serial nr. Tests 3.1.1 can be one tpkg. (with serial numbers for SOA, to perform serial rollover). - secondary starts with a zone without content (soa=1) so the zone is only mentioned in the config, the zonefile is empty/nonexist on the slave. Master has a soa and three text records. - axfr an empty zone - only the SOA (soa=2) - axfr a zone with only little data. (soa=3) some NS, MX, A, AAAA records. [NOTE: apparently, due to the linked list mgt in domains (of rrset*) the ordering of rrtypes for a domain is reversed after a zone transfer for NSD, i.e. for query type=any. Ordering within an rrset is preserved. Created fix to ordering, but is slow for many rr types... ] - old zone unsigned, new zone signed. (soa=4) sign with two KSKs and one ZSK. And a prepublished ZSK in the zone. ZSK1: Kexample.com.+005+44537 ZSK2: Kexample.com.+005+03824 (prepublish) KSK1: Kexample.com.+005+53988 KSK2: Kexample.com.+005+25320 (presign) - old zone signed, new zone unsigned. (soa=5) different zone contents, some names are still there, unchanged, some names are there RRs changed, some names there different RRtypes, and some names removed, some names added. www: unchanged (including nsec,rrsig). webmail: mail prio changed. printer: name removed. terms: different RR types, now A type. mail: type TXT added. apex: type DNSKEY, nsec, rrsig removed. newservice, ooo: new names - new zone with nsec3. (soa=6) iter=33 salt=AA44FF11 slave detects NSEC3 settings. - new parameters for nsec3. (soa=7) iter=1078 salt=00998877665544332211AADDCCFF slave detects NSEC3 settings. - new zone no longer uses nsec3. (soa=8) uses nsec. - axfr an empty zone (only the SOA) (soa=9 + a lot for serial wraparound) 2**31 = 2147483648 9 + 2**31 : 2147483657 also tested 9 + 2**31-3 -- works. notify and transfer, zone updated. 9 + 2**31-2 -- works. notify and transfer, zone updated. 9 + 2**31-1 -- notify works, but at transfer time 'serial old'. fixed: works, zone updated. 9 + 2**31 -- notify is ignored, 'serial old'. 9 + 2**31+1 -- notify is ignored, 'serial old'. - axfr wraparound zone, couple txts. (soa=2) serial=2 works after serial=9 + 2**31-2 before. These can be done in order. Test done for AXFR. RRset-type ordering preserve fixed. Serial rollover fixed. Serial printed as unsigned. 3.1.2. Huge xfr - see test tpkg for this. It works already. 3.1.3. AXFR and TSIG Like 3.1.1. but enable tsig. Tested, it works. 3.2. IXFR --------- 3.2.1. IXFR Features -------------------- Setup is a secondary with ldns-mini-ixfr server as a master. (ldns/examples/nsd-test/ldns-testns.c). The mini ixfr server responds with canned replies to a IXFR query. - secondary loaded with only the soa = 1. notified with 10. ixfr server responds with soa=1. (i.e. no update available). - same, ixfr server responds with soa=2, and TC (udp). on tcp connection, it responds with a simple difference package. SOA2 SOA1 SOA2 newTXTA newTXTB SOA2 adds a couple records. - to soa is 3, and this one removes the TXT records, makes a new TXTB record and a TXTC record. test that TXTA domain does not exist NXDOMAIN. test that TXTB is updated. test that TXTC exists. - from 3 to 5, with 4 in between. make a ixfr packet that is 3 .. 4 and 4 .. 5 concatenated without compression, so it means more work for processing. So, in the ixfr packet TXTB is removed from 3, added in 4, removed from 4, added in 5. Also in vs4 a txt5 record is added, which stays around. - from 5 to 7, but this time the redundant work is removed from ixfr packet. - 7 to 8, have one domain name where two RR types exist, A and DNAME. remove one RR type in IXFR then make sure the other type still exists. - 8 to 9, have a name with many different A records. Remove one A record from it. Add another A record to it. Test if the rest is there. [Note: when you delete on RR from an RRset, the ordering of the RRset changes, the contents of the rrset get shuffled (last put in empty slot).] 3.2.2. a huge ixfr. ------------------- create test for it. Should be several MBs worth or data removed, MBs of data that stays the same, and MBs that are added. To make sure that the code can handle multiple packet IXFRs, and the state memory between IXFR packets. - created testns version for multiple packet reply. Small, multiple packets. Test from 3.1. but using multiple packets: one RR per packet. This test also falls over from udp to tcp for ixfr. - This works. The Mbs in size is tested in huge axfr test already. 3.2.3. Test remove domain ------------------------- - is_existing = 0 used to remove a domain. Check and test carefully. - test delete middle name i.e. you have a zone with: c.example.com TXT "x" b.c.example.com TXT "x" a.b.c.example.com TXT "x" and you delete the b.c. record. The b.c becomes empty nonterminal. If you then delete a.b.c. TXT, the b.c becomes NXdomain. [- fixed delete with IXFR for empty nonterminals.] - test delete/add a domain and NXdomain/exist replies. - tested in 3.2.1 already, works. - test delet domain and wildcard replies. - Tested, it works, fix for IXFR that makes empty nonterminals. 3.3. Timeouts ------------- Get zone to expire. Check it does not answer. Start only a secondary server, no master. Set expire timeout short. Timers set as refresh=1 retry=1 expire=10 minimum=10 Provide an update. Check it does answer again. Startup the master server after a while. Transfer should happen within the retry interval. Wait for zone to expire again. Check that. Provide old zone on the master, after expire the slave must transfer it. The above works, old zone is transferred and served. Test that the master says that serial number is OK, in 3.2.1 tests. This test also includes IXFR reply from the master that contains AXFR contents. 3.4. TSIG zone transfers ------------------------ Already TSIG tpkg tests, with transfers TSIG protected, so that is ok. TSIG notifies - test it, create test for it. - notify accepted, nsd->nsd notify by starting master and slave server with tsig keys for a zone, update zone at master. Done in 3.1.3. - notify refused. nsd->nsd notify same but use different secret at one server. Test done. 4. IPC -------- 4.1. deadlocks -------------- Have 100.000 zones, all with short SOA timeouts, expire=1 sec. refresh=10. Expire very quickly. This gives many messages from xfrd to server. Send notifies to the server in a loop from a shell script. Lots of messages the other way around. Provide a master server that will serve all the zones (and say they are ok). then proceed to send queries for the zones to the server and see if you get answers. Wait for an hour and try again. Result, the IPC works okay, but xfrd uses much memory, 16Kb for TSIG regions, per zone. With the 2.5 kb in xfrd almost 20 Kb per zone. For 2G for 100.000. A bit much memory, for the largely unused tsig regions. Fixed, tsig for xfrd uses no preallocated worst case memory use, but only a small footprint. During use this may grow; about 1 K per zone perhaps. About 2.5Kb per secondary zone in xfrd, below 1 Kb for a master zone, that works out for 100.000 secondary zones as 250 Mb for xfrd. Perhaps do also with 100 child servers for the NSD. see if it can keep up and the result if it cannot keep up sending to child servers. Since it has to send for each zone to each child a message, this will take more resources. Tested, it cannot keep up. Child servers operate using old zone status of expired/ok, also the machine load is 100%. Also fixed tsig.other_size to be checked when reading TSIG from network. Due to the length and size, more an incidental test, but can be tpkg-ed. 4.2. IPC FORKS -------------- Infinite loop of reloads on a server. Has 10 child servers. wait. See if it runs out of sockets, file descriptors, etc. incidental test. Tested, with adjusted source that repeats reloads. This puts strain on the reload ipc handshake code. And ipc socket code. It works fine. 5. Random mess test ------------------- Setup 7 servers. In master->intermed->slave, with multiple master(2), intermed(3) and slave(2) servers. TSIGs (different) for everyone. Perhaps also include never respond entries (fake address) in acls. - Load random SOA + random data in servers. Backup the setup so it is repeatable. Let them work out what version to run. - Provide updated zone for a master. See what happens. - Send notifies to the slave servers. - Send notifies to the intermed servers. - Send notifies to the master servers. - Kill some server. Start it again. - Kill some server & delete some file (ixfr.db or xfrd.status). - delete ixfr.db - delete xfrd.status - delete ixfr and xfrd files. - run nsdc patch on a server. - pretend an intermediary was offline for a long time with old zone files and old ixfr.db and xfrd.state(!!) files. and see what happens :-) It should refresh/expire and so based on timers in xfrd.state. Tested: - nsd returns formerr on IXFR queries because of data in NS section. But this is correct, fixed NSD, so it is no longer formerr, but refused / not authorised instead. (or whatever we put in axfr.c). - depending on which server they are asking, servers will use one of the master zones (after expiry time exceeded). If master updated, intermediaries, then slaves update themselves too. - NSD would not start with a corrupt diff file. Now logs error and ignores, fixes, the diff file. 6. Portability test ------------------- Port NSD to as many platforms as possible - local: sparc5(ok), alpha(ok), amd64/OpenBsd(jelte thuis), open=FreeBSD(ok), linuxes(ok), MacOsX(ok), Sunos4(ok). - sf compilefarm for more. - x86-linux2 has ip6 disabled. tests dont work with that. - minix3 if we can get it working (the minix3 setup fails somehow). - would be good to have a test set of tpkg (and tools required) to run after a port-test. A very portable set of tpkgs. OSTYPE: (g)make. autoreconf. (g)indent. -> defaults for * systems. dig 8.3 too old (format of output). Need 9+. however dig/bind is not portable enough. ldns: pcat, pcat-diff, pcat-print. xfr1,2:nsd-ldnsd. pcat-grep.pl manual: md5sum/md5. hping(sudo). long: ldns-testns. Made tests more portable, ran tests on linux, freebsd, Solaris. Full testset run on SPARC/SunOS2.5, and fixed two unaligned memory accesses, all tests succeed now. Full testset runs on Powerpc/MacOSX. 7. CODE REVIEW -------------- Code has already had 1x review by Wouter, some review by Miek. More review (again), Jelte, Wouter. - Do some spots of interest. - perhaps a full review as well. 8. todo-tests ideas ------------------- These would be nice as tpkgs, but perhaps manual tests are needed. 8.1. test combinations of configure options and shells ------------------------------------------------------ " run tests with different shells, aka ==-bug bash 1.1. is too old for [[ in tpkg and tests. Some hosts have awk that puts a space before .pre files in tpkg. Some hosts have bash in /usr/local/bin so tpkg fails on that. run tests with different configure options and combinations of them. Many tests fail with disable-ipv6. implement this in a xen-like environment so that different OSs can be checked. run this daily or only when subversion changes for each test, run our "test-suite" " 8.2. patch file remove ---------------------- rm patch file, check xfrd's behavior. Refetches zones Checked in section transfer_axfr. 8.3. 64 bit ----------- GB 64 bit file size transfers. On alpha so nastiest alignment on 64bit machine. Do transfer of > 4 Gb zone. Needs lots of memory(swap space) and disk space. Not done; no host for test. 8.4. Valgrind ------------- run with valgrind - on two nsds. then do the nsd-nsd, and notify the master to get axfr to happen test, with tsig as well enabled. Done, found one uninit variable. 8.5. Chroot ----------- test chroot and the new files/directories. (And the file/dir not in chroot problem, and if all is OK that it works). Done, default locations for ixfr.db and xfrd.state have full pathnames. 8.6. nsdc --------- In temporary test setup above, test nsdc tool. works. Make sure that if nsdc patch breaks a zone transfer in progress it is reattempted later on. hard to test. 8.7. nsd-patch -------------- nsd-patch - run nsd patch and compare zone files, like AXFR/IXFR tests. Done test axfr run, or test-mess. 8.8. gcov --------- gcov to look at code coverage of the tests. Tests added to improve coverage.