Please use the ANRW reference: Johannes Naab, Patrick Sattler, Johannes Zirngibl, Stephan Günther, and Georg Carle. 2023. Gotta Query ’Em All, Again! Repeatable Name Resolution with Full Dependency Provenance. In Applied Networking Research Workshop (ANRW ’23), July 22–28, 2023, San Francisco, CA, USA.
This dataset contains the raw queries captured by the resolver.
The input is the combination of the (outdated) Alexa List and the Majestic Million.
The embedded delegations for the .com
domains have been extracted from the .com
zone file.
The resolver queries A
, AAAA
, CAA
, MX
and TXT
records for the input domains.
www
subdomains (where reasonable) have been queried for A
and AAAA
records.
The resolver issued internal queries for address resolution (A
, AAAA
) and zone setup (SOA
, NS
, DNSKEY
) as necessary.
The scan was executed on 2023-05-10.
[ 39M] inputlist.zst
Used as input for speedbag run #1 and speedbag run #2.
[2.0M] nameserver.csv.zst [3.2G] queries.csv.zst [4.0G] queries.embedded-json-data.csv.zst [3.7G] results.zip
Used as input for speedbag run #3.
[2.0M] nameserver.csv.zst [3.3G] queries.csv.zst [4.1G] queries.embedded-json-data.csv.zst [3.9G] results.zip
[2.0M] nameserver.csv.zst [ 55k] unknownquery.csv.zst [ 62k] unqueried.csv.zst
[2.0M] nameserver.csv.zst [ 55k] unknownquery.csv.zst [ 62k] unqueried.csv.zst
[2.0M] nameserver.csv.zst [ 61] unknownquery.csv.zst [ 90] unqueried.csv.zst
Data is compressed using Zstandard, commonly available as zstd
.
Important columns are:
nsid
identifies the name server and is referenced in queries.csv
.
ip
provides the IP address.
Important columns are:
qid
query id, used for internal tracking.
nsid
references the name server/IP address via nameserver.csv
.
srcport
and proto
identify the local port and transport protocol.
sent
provides a epoch time stamp in nano seconds, inflight
gives the delta for the response.
qname
, qclass
and qtype
identify the query.
isedns
indicates EDNS0 (RFC 2671) usage.
neterr
, dnserr
,generr
are various possible error.
Minimal DNS message content is provided in flags
, questioncount
, answercount
, authcount
, addcount
.
zid
internal references in which zone the query was executed.
Raw query data is stored in results.zip, the roffset
provides the byte offset for the start of the response in this data structure.
rlen
provides the length (which is also encoded in the data structure).
qoffset
provides the byte offset for the query itself.
prevq
and isfinalq
track reties.
A query that has not been retried later is marked with isfinalq
, prevq
references a qid
for which this query is a retry off.
same as queries.csv
but with the additional columns question
, answer
, authority
, additional
containing a JSON array with decoded resource records of the corresponding section.
RRSIG
, NSEC
, NSEC3
records are omitted in these lists.
Is a zip container for the raw query data.
It contains multiple files starting at querydata/0000000000000000.zst
.
Each one of the files contains about ~4MB raw query data, compressed with zstd.
The basename of the file 00000000003ffdec
contains the start offset of the contained data.
Example: The query 10187,475,44051,udp,1683751435217100208,34120135,f-dns.pl.,1,28,true,,,,33808,1,2,9,15,74,4193772,4193811,1095,0,true
points to roffset
4193811 (hex 0x3ffe13). This is contained in the file querydata/00000000003ffdec.zst
at offset 39 (when considering uncompressed content, 4193811 - 0x3ffdec.
Each individual exchange is stored with a 2 byte length prefix (i.e. TCP DNS exchange framing).
Contains name server statistics as seen by speedbag
.
List of unknown queries, i.e. queries speedbag
received but did not have an original answer to.
List of unqueried queries, i.e. queries speedbag
knows but were not asked in this run.