January 26, 2023
How Does a Network Packet Reach Its Destination?
Visiting Mozilla's website is easy, right? Just enter mozilla.org in the browser's address bar, and everything's ready.
Let's stop and think for a minute. How can that be so easy? How can your computer know which server, out of the billions of servers on the planet, belongs to mozilla.org? How can the request then navigate through the Internet and manage to reach that server? Should we come to the conclusion that it is anything but "easy"?
This post explains, at a beginner-to-intermediate level, what has to happen for one of these network packets to arrive at its destination. Some basic networking knowledge is assumed, but the post should otherwise be accessible to all. There are a lot of technologies in play; some will be silently skipped and others skimmed quickly, but I will focus on two:
- Hostname Resolution with DNS;
- Network Routing with BGP.
The explanations and examples are from a Debian Linux computer. It's possible to follow along from within a Docker container:
$ docker run -ti --rm debian:latest bash
root@debian$ apt update && apt install -y dnsutils iproute2 traceroute
Hostname Resolution
When we type curl example.com in a terminal, a network packet cannot yet be created, because the process doesn't know what example.com is, nor how to communicate with its server. For that, an IP address is required, so the first step is to convert the hostname, example.com, into an IP address.
Linux uses the Name Service Switch, or NSS, which is configured in the file /etc/nsswitch.conf. In this file, the hosts database tells the computer how to try resolving hostnames:
$ cat /etc/nsswitch.conf | grep hosts
hosts: files dns
As seen in the previous output, our hosts database is set to use two services: files and dns. These services are tried in order until one successfully resolves the hostname; if none does, the hostname is unresolvable.
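The "try each service in order" behaviour can be sketched in a few lines of Python. This is only an illustration of the ordering logic, not how the C library implements NSS; the two lookup functions and their hard-coded tables are hypothetical stand-ins for the real files and dns services.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-ins for the real "files" and "dns" services.
def lookup_files(hostname: str) -> Optional[str]:
    return {"localhost": "127.0.0.1"}.get(hostname)

def lookup_dns(hostname: str) -> Optional[str]:
    return {"example.com": "93.184.216.34"}.get(hostname)

# Same order as the "hosts: files dns" line.
SERVICES: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("files", lookup_files),
    ("dns", lookup_dns),
]

def resolve_with_services(hostname, services=SERVICES):
    """Return (service name, address) from the first service that answers."""
    for name, lookup in services:
        address = lookup(hostname)
        if address is not None:
            return name, address
    return None  # no service could resolve the hostname
```

With these toy tables, localhost is answered by files, while example.com falls through to dns.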
/etc/hosts
The service named files says that the hostname should be resolved using the file /etc/hosts. That file is a relic of how hostnames worked in the early days of ARPANET. Each line contains an IP address, followed by the hostname it belongs to and any aliases it may have.
$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
If example.com had been found on a line, that would give us its IP address, but since it wasn't, the next service is used.
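As a rough sketch of that lookup, the file format can be parsed with a few lines of Python. The function name parse_hosts and the SAMPLE string are my own, and the parser only handles the simple "address, then names" layout described above, plus # comments.

```python
def parse_hosts(text):
    """Map each hostname or alias to the list of addresses declared for it."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(address)
    return table

# Same content as the /etc/hosts file shown above.
SAMPLE = """\
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
"""

hosts = parse_hosts(SAMPLE)
print(hosts.get("localhost"))    # both the IPv4 and IPv6 loopback addresses
print(hosts.get("example.com"))  # None: the next service has to be tried
```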
DNS
The service named dns tells the computer to use the Domain Name System, or DNS, to find a hostname's IP address.
Note: To keep a nice flow in this section, I am ignoring that DNS queries are packets that go through a network, which we haven't yet seen.
DNS is a hierarchical system separated into zones of authority. At the top of this hierarchy is the root zone, written as a single dot ("."), in control of all the TLD zones (e.g. "com.", among many others), which themselves can be split into subzones (e.g. "example.com.") in control of all domain names under them.
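To make the hierarchy concrete, here is a small sketch that lists the chain of names from the root down to a given domain. The helper zone_chain is hypothetical, and it only splits the name on dots: whether each level is actually delegated as its own zone depends on how the domain is configured.

```python
def zone_chain(domain):
    """List the names from the root zone down to the given domain."""
    labels = [label for label in domain.rstrip(".").split(".") if label]
    chain = ["."]  # the root zone
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(zone_chain("example.com."))
```

For example.com. this gives the root, then com., then example.com. itself, which is the order in which a resolver with an empty cache would have to ask.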
A DNS server is called a nameserver and can be queried for a domain name's resource records. There are many types of records, but we'll focus on two: A, which defines an IPv4 address, and NS, which points to an authoritative nameserver.
An authoritative nameserver is one that actually knows the records that are configured on a domain name. Resource records can also be retrieved from other nameservers that have cached them, but these answers are not considered "authoritative".
To retrieve the A record of example.com., we just need to make a DNS query to a nameserver. Which DNS server should be used? That's configured in /etc/resolv.conf:
$ cat /etc/resolv.conf
nameserver 192.168.1.1
This tells the computer that it should send its DNS queries to 192.168.1.1 (port 53). That could be the router's IP address (if it can handle DNS resolution), a resolver operated by the ISP, or any other DNS resolver, e.g. Cloudflare's DNS service at 1.1.1.1.
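For the curious, the query that travels to port 53 is a small binary message. The sketch below builds one by hand with Python's struct module, following the standard DNS wire format (a 12-byte header followed by one question). The function name build_query and the fixed query ID are arbitrary choices of mine; a real resolver picks a random ID and handles many more cases.

```python
import struct

def build_query(hostname, query_id=0x1234, qtype=1):
    """Build a minimal DNS query message (qtype 1 is an A record)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other sections.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b""
    for label in hostname.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"
    # Question: QNAME, then QTYPE and QCLASS (1 = IN, the Internet class).
    return header + qname + struct.pack("!HH", qtype, 1)

message = build_query("example.com.")
print(len(message), message.hex())
```

Sending these bytes in a UDP datagram to the configured nameserver is, in essence, what a stub resolver does.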
Now that we know the DNS server, all that's left is to query it for an A record, which would hopefully result in the IP address of example.com. The command dig can be used to retrieve resource records:
$ dig +short example.com. A
93.184.216.34
What the previous command does would be similar to the following steps, but we'll ignore caching to illustrate the recursive nature of DNS. Actual steps depend on the nameserver's implementation.
- the computer makes a DNS query to 192.168.1.1, asking for the A record of example.com.;
- 192.168.1.1 is not an authority for example.com., so it doesn't know its records;
- 192.168.1.1 wants to make a new DNS query to the nameserver of example.com., but it doesn't know what it is, so it first needs to ask com. about example.com.;
- 192.168.1.1 doesn't know how to reach com. either, so it first needs to ask the root zone (".") about com.;
- 192.168.1.1 actually knows where to reach the root zone, because recursive loops need an exit point, and all DNS servers are configured with at least one hard-coded IP address for the root zone. It makes a DNS query to the root zone, asking for the NS records of com., and receives them (e.g. a.gtld-servers.net.) along with their IP addresses;
- 192.168.1.1 makes a DNS query to one of the com. nameservers, asking for the NS records of example.com., and receives them (e.g. a.iana-servers.net.);
- since the previous response did not include the nameservers' IP addresses as additional information, 192.168.1.1 needs to retrieve one of them itself, asking the root zone about net., then net. about iana-servers.net., which helpfully includes the desired IP addresses as well;
- 192.168.1.1 makes a DNS query to one of the example.com. nameservers, asking for the A record of example.com., and receives 93.184.216.34 as a response;
- the computer receives the response 93.184.216.34 from 192.168.1.1.
These steps are listed next:
# Already known: . A 198.41.0.4
dig +norecurse @198.41.0.4 com. NS # a.gtld-servers.net. A 192.5.6.30
dig +norecurse @192.5.6.30 example.com. NS # example.com. NS a.iana-servers.net.
dig +norecurse @198.41.0.4 net. NS # a.gtld-servers.net. A 192.5.6.30
dig +norecurse @192.5.6.30 iana-servers.net. NS # a.iana-servers.net. A 199.43.135.53
dig +norecurse @199.43.135.53 example.com. A # example.com A 93.184.216.34
All of this, in order to get the IP address of example.com! With proper caching, most of these steps can be skipped, but they give an overview of how DNS works.
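The referral-following logic in the steps above can be modelled with a toy, in-memory version of the servers involved. This is a sketch under simplifying assumptions: the SERVERS table is hand-written from the addresses in the dig commands, every referral is assumed to come with the nameserver's IP address (so the extra lookup for iana-servers.net. is skipped), and no real DNS traffic is sent.

```python
ROOT = "198.41.0.4"

# What each nameserver (keyed by IP address) knows in this toy model.
SERVERS = {
    "198.41.0.4": {"referrals": {"com.": "192.5.6.30", "net.": "192.5.6.30"}, "answers": {}},
    "192.5.6.30": {"referrals": {"example.com.": "199.43.135.53"}, "answers": {}},
    "199.43.135.53": {"referrals": {}, "answers": {"example.com.": "93.184.216.34"}},
}

def in_zone(name, zone):
    """True if name is the zone itself or sits somewhere below it."""
    return name == zone or name.endswith("." + zone)

def resolve_a(name, max_hops=10):
    """Follow referrals from the root until a server answers; return (IP, servers asked)."""
    server, asked = ROOT, []
    for _ in range(max_hops):
        asked.append(server)
        data = SERVERS[server]
        if name in data["answers"]:
            return data["answers"][name], asked
        zones = [zone for zone in data["referrals"] if in_zone(name, zone)]
        if not zones:
            raise LookupError(f"{server} has no referral for {name}")
        server = data["referrals"][max(zones, key=len)]  # most specific zone wins
    raise LookupError("too many referrals")

print(resolve_a("example.com."))
```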
Now that we have the target IP address, a network packet can finally be created, but how can we send it to that IP address? How does it actually reach its destination?
Network Routing
The first step for the network packet to example.com is to leave the computer, and find out where to go next. From the point of view of the computer, this part is just a series of networking rules.
Let's first look at the routing table. I've removed IPv6 and broadcast entries for brevity.
$ ip route show table all
default via 192.168.1.1 dev enp5s0 proto dhcp metric 100
192.168.1.0/24 dev enp5s0 proto kernel scope link src 192.168.1.201 metric 100
local 127.0.0.0/8 dev lo table local proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
local 192.168.1.201 dev enp5s0 table local proto kernel scope host src 192.168.1.201
With this routing table and the target IP address, a computer can find out where next to send the packet. On each line, the name following dev (for "device") determines the interface that the packet should go through. These usually lead to a NIC but don't need to.
Loopback
Let's pretend we're trying to send a network packet to localhost, which was resolved to 127.0.0.1 using /etc/hosts. Looking at the routing table, the target IP address 127.0.0.1 matches exactly with one of the lines:
local 127.0.0.1 dev lo table local proto kernel scope host src 127.0.0.1
This tells us that the packet will go through the lo interface. Now, what's the lo interface?
$ ip link show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
The interface lo does not have any hardware behind it and is just a virtual interface that is used for loopback. It works by handing every packet sent through it straight back to the local network stack, so the packet never leaves the machine.
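A quick way to see loopback in action is to send a UDP datagram to 127.0.0.1 and read it back on the same machine. The sketch below uses Python's standard socket module; the function name loopback_roundtrip is my own, and it assumes the environment allows binding a UDP socket on the loopback address.

```python
import socket

def loopback_roundtrip(payload):
    """Send a UDP datagram to 127.0.0.1 and return (data received, sender address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        receiver.settimeout(2.0)
        sender.sendto(payload, receiver.getsockname())
        data, source = receiver.recvfrom(4096)
        return data, source[0]

print(loopback_roundtrip(b"ping"))
```

The datagram comes back unchanged, and the sender's address is also 127.0.0.1, since both ends live on the same machine.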
Inside the Network
Let's say we try to reach another computer, one on the local network, whose IP address happens to be 192.168.1.210. There is a line in the routing table that matches this IP address with the CIDR 192.168.1.0/24:
192.168.1.0/24 dev enp5s0 proto kernel scope link src 192.168.1.201 metric 100
It means that these packets should be sent to the interface enp5s0.
$ ip link show enp5s0
2: enp5s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 00:01:02:03:04:05 brd ff:ff:ff:ff:ff:ff
Right now, this interface has the computer's NIC behind it, the one with MAC address 00:01:02:03:04:05, so all packets sent to enp5s0 will go out through that NIC.
Since 192.168.1.210 is in the same network, we only need the MAC address of the target computer's NIC. It may be found in the ARP cache, but if it isn't, it can be retrieved via an ARP broadcast:
$ ip neigh show
192.168.1.210 dev enp5s0 lladdr 08:09:0a:0b:0c:0d REACHABLE
All of this stays within the local network, in my case keeping the communications in my router's switch.
Outside the Network
Let's finally continue with our network packet to example.com. No specific route matches 93.184.216.34, so what happens now? There's one route, though, that is a bit different from the others:
default via 192.168.1.1 dev enp5s0 proto dhcp metric 100
This line means that packets whose destination matches none of the other routes are sent to the gateway at 192.168.1.1 through the enp5s0 interface. This is the default gateway, in my case a router, whose job it is to find out where to route these packets.
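The three cases above (loopback, local network, default gateway) all come down to picking the most specific route that contains the destination address. Here is a minimal sketch of that rule using Python's ipaddress module. The ROUTES list is transcribed by hand from the routing table shown earlier, with the default route written as 0.0.0.0/0; it deliberately ignores details of the real kernel lookup such as separate tables, metrics, and policy rules.

```python
import ipaddress

# (prefix, device, gateway), transcribed from the routing table above.
ROUTES = [
    ("0.0.0.0/0", "enp5s0", "192.168.1.1"),  # "default via 192.168.1.1"
    ("192.168.1.0/24", "enp5s0", None),
    ("127.0.0.0/8", "lo", None),
    ("127.0.0.1/32", "lo", None),
    ("192.168.1.201/32", "enp5s0", None),
]

def lookup(destination):
    """Return (prefix, device, gateway) of the longest prefix containing destination."""
    address = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), device, gateway)
        for prefix, device, gateway in ROUTES
        if address in ipaddress.ip_network(prefix)
    ]
    network, device, gateway = max(candidates, key=lambda route: route[0].prefixlen)
    return str(network), device, gateway

for target in ("127.0.0.1", "192.168.1.210", "93.184.216.34"):
    print(target, "->", lookup(target))
```

Only the last address falls through to the default route, which is the only entry that carries a gateway.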
Now typically, that packet will go out of the router, through the modem, reach an ISP, and go into the wild Internet to somehow finally arrive at the target server. That's way too high-level though, so let's dig deeper into the last piece of technology.
BGP
The path a network packet takes is determined by an interesting, albeit daunting, technology: the Border Gateway Protocol, or BGP. BGP is how routers across the Internet learn which paths lead to which IP addresses, so that a packet can be forwarded from router to router.
The Internet is separated into Autonomous Systems, or ASes, which are networks of routers under the control of a single entity such as an ISP, a large company, or a hosting provider. Each is in charge of overseeing and/or assigning its allocated IP addresses.
When my router sends the packet to my ISP, their router receives it and matches it with a route in its routing table. There may be different paths available towards the destination, and the best one will be chosen. The network packet will then be sent to the next BGP router in the path, and the cycle continues, until the packet arrives at the target server.
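How "the best one" gets chosen is a long story, but the first two tie-breakers commonly used by BGP routers can be sketched briefly: prefer the route with the highest local preference, then the one with the shortest AS path. Everything below is a simplified illustration. The Route class is hypothetical, the second candidate path uses AS numbers reserved for documentation (64496 and 64497), and real routers apply many more rules and operator policies.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Route:
    prefix: str
    as_path: List[int]     # ASes the announcement travelled through
    local_pref: int = 100  # operator-assigned preference, higher wins

def best_route(routes):
    """Pick a route: highest local preference first, then shortest AS path."""
    return min(routes, key=lambda route: (-route.local_pref, len(route.as_path)))

candidates = [
    Route("93.184.216.0/24", [64496, 64497, 15133]),  # hypothetical longer path
    Route("93.184.216.0/24", [1299, 15133]),          # via AS1299
]
print(best_route(candidates).as_path)
```

Raising local_pref on the longer route would make it win despite the extra AS, which is one way operators express routing policy.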
Let's look at an actual example using traceroute:
$ traceroute example.com
traceroute to example.com (93.184.216.34), 30 hops max, 60 byte packets
1 pfSense.localdomain (192.168.1.1) 0.244 ms 0.268 ms 0.302 ms
2 198-84-254-97.cpe.teksavvy.com (198.84.254.97) 10.884 ms 10.867 ms 10.877 ms
3 10.170.192.53 (10.170.192.53) 14.954 ms 15.005 ms 15.038 ms
4 xe-0-1-0-0-bdr01-mtl.teksavvy.com (192.171.63.17) 14.318 ms 14.264 ms xe-5-2-1-0-bdr01-mtl.teksavvy.com (206.248.155.109) 13.974 ms
5 motl-b2-link.ip.twelve99.net (62.115.44.222) 14.930 ms 14.859 ms 14.901 ms
6 chi-b23-link.ip.twelve99.net (62.115.118.188) 33.839 ms 32.588 ms 32.610 ms
7 edgecast-ic315149-chi-b23.ip.twelve99-cust.net (213.248.97.182) 32.664 ms 33.045 ms 32.752 ms
8 ae-65.core1.chb.edgecastcdn.net (192.229.225.131) 32.955 ms ae-66.core1.chb.edgecastcdn.net (192.229.227.131) 29.664 ms 35.778 ms
9 93.184.216.34 (93.184.216.34) 31.456 ms 28.201 ms 32.559 ms
10 93.184.216.34 (93.184.216.34) 32.550 ms 30.099 ms 32.577 ms
What we can learn from this, using online tools like ipinfo.io, is that a network packet's path from my computer to example.com goes through three ASes, and arrives at its destination in 10 hops. It should be similar to the following:
Hops | ASN | Company | Router Location |
---|---|---|---|
1 | | | Quebec City |
2, 3, 4 | AS5645 | TekSavvy Solutions | Montreal |
5 | AS1299 | Arelion | Montreal |
6, 7 | AS1299 | Arelion | Chicago |
8 | AS15133 | Edgecast | Chicago |
9, 10 | AS15133 | Edgecast | New York |
The packets leave my router for my ISP, TekSavvy, move inside their network in Montreal for a couple of hops, continue towards Chicago through a Swedish Tier 1 network named Arelion, and finally enter Edgecast's network.
If I do the same from a server hosted in a Toronto data center:
Hops | ASN | Company | Router Location |
---|---|---|---|
1, 2, 3 | AS14061 | Digital Ocean | Toronto |
4, 5 | AS6453 | Tata Communications | Toronto |
6 | AS6453 | Tata Communications | Chicago |
7, 8, 9, 10 | AS6453 | Tata Communications | New York |
11, 12, 13 | AS15133 | Edgecast | New York |
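Both tables can be reduced to the sequence of ASes a packet crosses by collapsing consecutive hops that belong to the same AS. The sketch below does that with itertools.groupby. The two hop lists are transcribed by hand from the tables above, with None standing for the first home hop, which is a private address without an ASN.

```python
from itertools import groupby

# ASN of each hop, in order, taken from the two tables above.
HOME_HOPS = [None] + ["AS5645"] * 3 + ["AS1299"] * 3 + ["AS15133"] * 3
TORONTO_HOPS = ["AS14061"] * 3 + ["AS6453"] * 7 + ["AS15133"] * 3

def as_path(hops):
    """Collapse consecutive hops in the same AS into a single entry."""
    known = (asn for asn in hops if asn is not None)
    return [asn for asn, _ in groupby(known)]

print(as_path(HOME_HOPS))
print(as_path(TORONTO_HOPS))
```

Both paths cross three ASes and end in AS15133, but they get there through different transit networks.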
Because a BGP router can make any and all decisions it wants about where next to send a packet, the path a packet will take is not deterministic. In the previous example (from a Digital Ocean droplet), the third hop can go through different routers: 138.197.249.78, 138.197.249.82, 138.197.249.86, or 138.197.249.90. Which one is chosen depends on the previous router's decision, which may take many things into account, including routing rules, latency, congestion, etc.
This also means that a router or server is not necessarily in a definite location, and multiple servers in different cities can receive requests targeting the same IP address (a technique known as anycast). This is one way to implement a Content Delivery Network, or CDN, which appears to be the case for example.com: the server in New York seems to be a CDN server for the origin server located in Los Angeles. I can trigger a different path if I use a VPN in Vancouver, which results in a path that stays near the west coast, indeed ending up in Los Angeles. Which server is the origin server may be impossible to verify, though, and could even change at any time.
From a VPN in Ukraine, the packets are routed through several cities in Europe, then in a transatlantic cable towards Boston, before ending up in New York.
The End at the Target Server
The network packet to example.com just reached the target server, which can build and send back a response, if it wants. For that packet to get all the way to that server, though, it had a lot of things to go through.
We've seen the computer figure out the IP address from a hostname, and followed the packet from the computer to the target server.
Resolving example.com to 93.184.216.34 was done with DNS, a hierarchical system that decentralizes domain name resolution. With more than 350 million domain names accessible from the Internet, DNS allows everyone to have access to most of them by only remembering a short name.
Finding the path from the ISP to the target server required BGP, a decentralized planetary network mesh that boils down to simple routing decisions. With more than 100,000 ASes taking part in BGP, this makes for a constantly evolving network defined by routing tables containing a million entries.
It's easy to forget how the Internet works, and despite its complexity, these systems allow communications across the world in as little as a few tens of milliseconds.