
July 17, 2023

Network Addresses: History and Characteristics of IP Addresses

From the beginning of networking, remote hosts were identified by short strings called hostnames, which spared users from having to remember numeric network addresses. This post is about the history and details of these addresses, from the early 1970s to IPv4 and IPv6.

History

When the ARPANET brought the early research computers together, they were interconnected by IMPs (Interface Message Processors), the fridge-sized ancestors of today’s routers, placed at key locations and linked through leased telephone lines. Addresses were highly dependent on the architecture, operating system, and type of network, but the IMPs allowed these different systems to talk to each other, as long as they followed the correct protocol (RFC 1).

Users who wanted to connect to a remote server would use a terminal and connect using a hostname. This hostname was mapped to an address that was one octet (8 bits) long, in which the high two bits designated the host number and the low six bits the IMP number. This allowed for 64 IMPs, each with a maximum of 4 hosts, for a total of 256 distinct hosts, which was deemed large enough for the foreseeable future.

yyxxxxxx

As examples of network addresses, the following table lists the hosts that were available in 1972 on the first four IMPs (out of 35 at the time):

hostname     net addr  IMP  host  description
UCLA-NMC     1         1    0     UCLA, Network Measurement Center
UCLA-CCN     65        1    1     UCLA, Campus Computing Network
UCSD-CC      129       1    2     UCSD
SRI-ARC      2         2    0     SRI, Augmented Research Center
SRI-AI       66        2    1     SRI, Artificial Intelligence Group
UCSB-MOD75   3         3    0     UCSB
UTAH-10      4         4    0     University of Utah
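To make the bit layout concrete, here is a minimal Python sketch (the function names are hypothetical) that splits one of these one-octet addresses into its IMP and host numbers, assuming the layout above: two high bits for the host, six low bits for the IMP.

```python
def decode_arpanet_address(addr: int) -> tuple[int, int]:
    """Split a 1972-era one-octet address into (IMP number, host number)."""
    if not 0 <= addr <= 0xFF:
        raise ValueError("an ARPANET address is a single octet (0-255)")
    host = addr >> 6        # high two bits: host number on the IMP (0-3)
    imp = addr & 0b111111   # low six bits: IMP number (0-63)
    return imp, host


def encode_arpanet_address(imp: int, host: int) -> int:
    """Inverse of decode_arpanet_address."""
    if not (0 <= imp < 64 and 0 <= host < 4):
        raise ValueError("IMP must be 0-63 and host 0-3")
    return (host << 6) | imp


# Values taken from the table above
print(decode_arpanet_address(65))    # UCLA-CCN: (1, 1)
print(decode_arpanet_address(129))   # UCSD-CC: (1, 2)
print(encode_arpanet_address(2, 1))  # SRI-AI: 66
```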

IPv4

Needless to say, as interest grew and networks were added, a single octet for network addresses was not sufficient, and work towards a replacement began. After a few years of work, at the turn of the decade, in January 1980, RFC 760 proposed a better alternative known as IPv4.

As part of the Internet Layer of the Internet Protocol Suite, IPv4 was officially deployed in 1983, and quickly became a strong pillar of the Internet.

IPv4 addresses were four octets (32 bits) long, the first of which designated the network number. This allowed for four times as many networks, 256, each with up to 16 million hosts. These new addresses were four times as long as the previous ones, and were deemed large enough for the foreseeable future.

xxxxxxxx yyyyyyyy yyyyyyyy yyyyyyyy

IP Address Classes

When it became clear the very next year that there weren’t going to be enough networks, IPv4 was revised with RFC 791. It split the same 32-bit address into different network classes:

class  format                               networks  hosts per network
A      0xxxxxxx yyyyyyyy yyyyyyyy yyyyyyyy  128       16M
B      10xxxxxx xxxxxxxx yyyyyyyy yyyyyyyy  16k       65k
C      110xxxxx xxxxxxxx xxxxxxxx yyyyyyyy  2M        256

The idea was that organisations and companies, instead of requesting a network number, would request a network class that would be adequate for their need. With this classful system, more than 2 million networks would be available.
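The leading bits alone tell which class an address belongs to. A small Python sketch, assuming only the three classes in the table (the `ipv4_class` helper is hypothetical, and the standard `ipaddress` module is only used to turn the text into an integer); it also recomputes the network counts from the number of free network bits:

```python
import ipaddress

def ipv4_class(address: str) -> str:
    """Return the RFC 791 class (A, B or C) of a dot-decimal IPv4 address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if first_octet >> 7 == 0b0:     # 0xxxxxxx
        return "A"
    if first_octet >> 6 == 0b10:    # 10xxxxxx
        return "B"
    if first_octet >> 5 == 0b110:   # 110xxxxx
        return "C"
    return "other"  # 111xxxxx, outside the three classes in the table

# Networks per class = 2 ** (network bits left after the class prefix)
layouts = {"A": (7, 24), "B": (14, 16), "C": (21, 8)}
for cls, (net_bits, host_bits) in layouts.items():
    print(cls, 2 ** net_bits, "networks of", 2 ** host_bits, "addresses")
print(sum(2 ** n for n, _ in layouts.values()))  # 2113664, i.e. "more than 2 million"

print(ipv4_class("10.18.52.86"), ipv4_class("172.16.0.1"), ipv4_class("192.168.1.1"))  # A B C
```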

Of course, the existing networks were grandfathered in and kept their network numbers (as class A networks), even if most of them were far from needing that many IP addresses.

Subnets

The Internet became increasingly popular in the mid-1980s, as more and more companies started using this new tool called electronic mail. It soon became evident that not even network classes would be enough, and RFC 950 was published in 1985 to standardize how a network could be split into an arbitrary number of subnets, with each network free to choose its own subnetting.

For example, a class B network could have opted for 16 subnets (four bits), thus 4,096 hosts (12 bits) in each subnet:

10xxxxxx xxxxxxxx yyyyzzzz zzzzzzzz
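Assuming the 4-bit subnet and 12-bit host split from this example, extracting each field is a matter of shifting and masking. A Python sketch with a hypothetical `split_class_b` helper and an arbitrary sample address:

```python
import ipaddress

SUBNET_BITS = 4   # 16 subnets, as in the example above
HOST_BITS = 12    # 4,096 host numbers per subnet

def split_class_b(address: str) -> tuple[ipaddress.IPv4Address, int, int]:
    """Split a class B address into (network, subnet number, host number)."""
    value = int(ipaddress.IPv4Address(address))
    if value >> 30 != 0b10:  # class B addresses start with the bits 10
        raise ValueError(f"{address} is not a class B address")
    network = ipaddress.IPv4Address(value & 0xFFFF0000)       # the x bits: first two octets
    subnet = (value >> HOST_BITS) & ((1 << SUBNET_BITS) - 1)  # the y bits
    host = value & ((1 << HOST_BITS) - 1)                     # the z bits
    return network, subnet, host

print(split_class_b("172.16.37.10"))  # (IPv4Address('172.16.0.0'), 2, 1290)
```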

Even though subnets were only visible within a specific network, this was still an important change in how routing worked. Before that, organizations that wanted separate networks, e.g. for separate buildings, required multiple IP address ranges. If an organization had four buildings with 1,000 hosts each, it would need four class B networks, and would only use 4,000 IP addresses out of the ~260,000 (4 × 65,536) it was allocated. Subnets allowed for more efficient use of the network address ranges, since that same organization could now subnet a single class B network.

Imminent Problems

At the beginning of the 1990s, RFC 1380 was published, explaining some problems that required immediate attention. In particular, the Internet was facing two imminent and critical issues:

  • class B networks would soon be exhausted
  • the routing tables were becoming too large for the current technology

The source of the first issue was that already in 1992, barely 10 years after they were implemented, 54% of class A and 43% of class B were assigned. Since class A networks were too large and rarely given, and class C networks were too small for a typical organization, class B was going to run out.

The second issue arose because computer performance doubled roughly every 18 months, whereas the Internet was doubling in size every 12 months. Router hardware simply could not keep up for long.
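As a rough illustration, assuming both trends are clean exponentials (an idealization, with t in years), the ratio between the size of the Internet and router performance grows as:

```latex
\frac{2^{t/1}}{2^{t/1.5}} = 2^{\,t - \frac{2t}{3}} = 2^{t/3}
```

In other words, the gap itself doubles every three years: after six years, routers would face a workload four times larger relative to what their hardware could handle.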

The Border Gateway Protocol (BGP), proposed in 1989 with RFC 1105 and deployed on the Internet around 1994, brought some relief to the routing tables. With BGP, not all routers needed to keep the complete routing table: a larger network range could be forwarded to another BGP router, and be further refined as it neared its destination.

There was also a third issue, which did not require immediate attention like the other two: the actual shortage of IPv4 addresses. It was bound to happen eventually, so the only thing anyone could do was delay it.

On top of all that, the mid-1990s were when computers became affordable to almost everyone. Having reached most countries by then, the Internet was about to see its biggest growth yet, exacerbating the pressing need to find an adequate replacement for IPv4.

CIDR

RFC 1517 was published in September 1993, describing Classless Inter-Domain Routing (CIDR), a measure to slow down the issues mentioned earlier. CIDR removed the old classful network ranges in favor of an additional number that defined how many bits were reserved for the network prefix.

Routing throughout the Internet now needed this additional piece of information. Unallocated IP ranges, mostly class C at this point, could now be assigned with more flexibility, according to an organization’s needs. This system also allowed contiguous address ranges to be aggregated into a single route, simplifying the routing tables.

At its core, CIDR is just a bitmask. For example, a prefix length of 22 results in a bitmask with the highest 22 bits set to 1, and the network prefix is calculated by ANDing the IP address with the bitmask:

    xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
AND 11111111 11111111 11111100 00000000
---------------------------------------
    xxxxxxxx xxxxxxxx xxxxxx00 00000000
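The same computation as a short Python sketch; `prefix_mask` and `network_prefix` are hypothetical helpers and the sample address is arbitrary. The standard `ipaddress` module can do this too (`ip_network(..., strict=False)`), which makes a handy cross-check:

```python
import ipaddress

def prefix_mask(prefix_len: int) -> int:
    """Return a 32-bit mask with the highest prefix_len bits set to 1."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("an IPv4 prefix length is between 0 and 32")
    return (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF

def network_prefix(address: str, prefix_len: int) -> str:
    """AND the address with the mask, keeping only the network bits."""
    value = int(ipaddress.IPv4Address(address))
    return str(ipaddress.IPv4Address(value & prefix_mask(prefix_len)))

print(ipaddress.IPv4Address(prefix_mask(22)))  # 255.255.252.0
print(network_prefix("10.18.55.86", 22))       # 10.18.52.0
```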

NAT

Network Address Translation, or NAT, was described in 1994 in RFC 1631, the same year RFC 1597 set aside address ranges for private networks. Before that, a company would need an IP address for each computer that was connected to the Internet. NAT helped by allowing a NAT gateway to map an entire private network to a single public IP address, using port numbers to tell the connections apart.
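To illustrate the idea, here is a deliberately simplified Python sketch of port-based translation (often called NAPT). The `NatGateway` class, the starting port 40000, and the 203.0.113.7 public address (taken from a documentation range) are all assumptions made for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)

@dataclass
class NatGateway:
    """Toy port-based NAT: many private endpoints share one public IP address.

    Hypothetical illustration only: real NAT devices also track the protocol,
    the remote endpoint, timeouts, and much more.
    """
    public_ip: str
    next_port: int = 40000  # arbitrary start of the public port pool
    outbound: Dict[Endpoint, int] = field(default_factory=dict)
    inbound: Dict[int, Endpoint] = field(default_factory=dict)

    def translate_outgoing(self, private: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet to the public address."""
        if private not in self.outbound:
            self.outbound[private] = self.next_port
            self.inbound[self.next_port] = private
            self.next_port += 1
        return self.public_ip, self.outbound[private]

    def translate_incoming(self, public_port: int) -> Optional[Endpoint]:
        """Find which private endpoint a reply belongs to (None if unknown)."""
        return self.inbound.get(public_port)

nat = NatGateway("203.0.113.7")
print(nat.translate_outgoing(("192.168.1.10", 51000)))  # ('203.0.113.7', 40000)
print(nat.translate_outgoing(("192.168.1.11", 51000)))  # ('203.0.113.7', 40001)
print(nat.translate_incoming(40001))                    # ('192.168.1.11', 51000)
```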

IPv6

A couple of years later, in December 1995, IPv6 was proposed as the successor to IPv4 with RFC 1883, and later standardized in 1998. Since it changed the Internet Protocol itself, IPv6 is not backward-compatible with IPv4, and a number of transition mechanisms were implemented over the years during the slow migration that is still going on to this day.

IPv6 addresses are 16 octets (128 bits) long: the first half identifies the network and subnet, and the second half identifies the host within that network. This results in 2^64 (about 18 quintillion) networks, each with as many possible hosts. This is again four times larger than before, and was deemed large enough, again, for the foreseeable future.
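Assuming the 64/64 split described above, separating the two halves is a shift and a mask. A Python sketch with a hypothetical `split_ipv6` helper, relying on the standard `ipaddress` module for parsing:

```python
import ipaddress

def split_ipv6(address: str) -> tuple[int, int]:
    """Split a 128-bit IPv6 address into its upper and lower 64-bit halves."""
    value = int(ipaddress.IPv6Address(address))
    network_part = value >> 64           # network identifier and subnet
    host_part = value & ((1 << 64) - 1)  # host (interface) identifier
    return network_part, host_part

net, host = split_ipv6("2001:db8::ff00:42:8329")
print(hex(net), hex(host))  # 0x20010db800000000 0xff0000428329
print(2 ** 64)              # 18446744073709551616, about 18 quintillion
```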

IP Address Format

So far, only the internal format of IP addresses has been discussed. An IPv4 address is 4 octets (32 bits) long, while an IPv6 address is 16 octets (128 bits). This is how hardware and software handle them internally, as a simple integer value, and how they should be persisted to a database (not as a string!). Even better, whenever possible, the features that a database offers around IP addresses should be used. For example, PostgreSQL has inet and cidr types, and functions that can be used on them.

This raw value, e.g. 0x7f000001 for the localhost IPv4 address, is obviously not how IP addresses are meant to be read by humans, and the following sections show how they are displayed and parsed.

Displaying IPv4 Addresses

An IPv4 address is displayed by writing each of its four octets in decimal, separated by a dot character. This is called the dot-decimal notation. Each part is a single octet, so it ranges from 0 to 255, and leading zeroes are not displayed. For example, the localhost IPv4 address is displayed as 127.0.0.1.
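Formatting is the reverse of the integer representation: extract each octet with a shift and a mask, then print it in decimal. A minimal Python sketch (`format_ipv4` is a hypothetical name); the standard library's `str(ipaddress.IPv4Address(0x7F000001))` gives the same result:

```python
def format_ipv4(value: int) -> str:
    """Render a 32-bit integer in dot-decimal notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("an IPv4 address is a 32-bit value")
    # Take each octet, from the most significant to the least significant
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)  # str() never pads with zeroes

print(format_ipv4(0x7F000001))  # 127.0.0.1
```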

Parsing IPv4 Addresses

From the dot-decimal notation, it is easy to parse a string into its IPv4 address: split the string at every . character, convert each part into an integer value, shift each by the appropriate number of bits, then add them all together.
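Those steps translate directly into code. A Python sketch of a strict parser (the name `parse_ipv4_strict` is hypothetical), which also rejects leading zeroes since some parsers read them as octal:

```python
def parse_ipv4_strict(text: str) -> int:
    """Parse strict dot-decimal notation: exactly four decimal parts, 0 to 255."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts in {text!r}, got {len(parts)}")
    value = 0
    for index, part in enumerate(parts):
        # Only plain ASCII digits, and no leading zeroes
        if not (part.isascii() and part.isdigit()) or (len(part) > 1 and part[0] == "0"):
            raise ValueError(f"invalid part {part!r} in {text!r}")
        octet = int(part)
        if octet > 255:
            raise ValueError(f"part {part!r} out of range in {text!r}")
        value += octet << (8 * (3 - index))  # the first part is the most significant octet
    return value

print(hex(parse_ipv4_strict("127.0.0.1")))  # 0x7f000001
```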

Most systems and tools use inet_aton (and its inverse, inet_ntoa) to convert a string into an IPv4 address’ underlying 32 bits. inet_aton is a library function that was added in the BSD operating system in the mid-1980s, and which gained popularity in other operating systems.

Unfortunately, that would be too easy: there are actually other ways to write an IPv4 address string that still parse into its internal representation.

Expansion

Some parts can be intentionally omitted. When this happens, the last part is expanded to fill the missing ones: 127.0.1 is valid and is expanded into 127.0.0.1, and the string 10.0.4660 is expanded into 10.0.18.52.

Where do the 18 and 52 come from? It is easier to use hexadecimal to illustrate what is happening. The decimal number 4660 is written as 1234 in hexadecimal. Since each octet is two hexadecimal digits, the last two parts become 12 and 34, which are displayed in decimal as 18 and 52.

Going further, 10.1193046 expands to 10.18.52.86 (1193046 is 123456 in hexadecimal) and, taken to the extreme, even the first part can be omitted: 168965206 (0a123456 in hexadecimal) expands into 10.18.52.86 as well.
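The expansion rule can be sketched in Python as follows. This hypothetical `expand_ipv4` only handles decimal parts, and unlike the real C function it reads a leading 0 as decimal:

```python
import ipaddress

def expand_ipv4(text: str) -> int:
    """Sketch of the inet_aton-style shorthand, limited to decimal parts.

    With n parts, the first n - 1 are single octets and the last part
    fills all of the remaining 32 - 8 * (n - 1) bits.
    """
    raw_parts = text.split(".")
    if not 1 <= len(raw_parts) <= 4 or not all(p.isascii() and p.isdigit() for p in raw_parts):
        raise ValueError(f"expected 1 to 4 decimal parts: {text!r}")
    *leading, last = [int(p) for p in raw_parts]
    remaining_bits = 32 - 8 * len(leading)
    if any(p > 255 for p in leading) or last >= 1 << remaining_bits:
        raise ValueError(f"part out of range: {text!r}")
    value = 0
    for octet in leading:
        value = (value << 8) | octet
    return (value << remaining_bits) | last  # the last part fills the gap

for shorthand in ["127.1", "10.0.4660", "10.1193046", "168965206"]:
    print(shorthand, "->", ipaddress.IPv4Address(expand_ipv4(shorthand)))
# 127.1 -> 127.0.0.1, 10.0.4660 -> 10.0.18.52, and the last two -> 10.18.52.86
```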

Sometimes, this may be seen as a handy shortcut if the application allows it, for example 127.1 for localhost (127.0.0.1), or 1.1 for Cloudflare’s alternate DNS IP address (1.0.0.1).

Other Bases

The typical IPv4 address parts are displayed in decimal, but this is not always required. IPv4 addresses can be written in hexadecimal (0xa.0x12.0x34.0x56) and octal (012.022.064.0126).

This follows the C convention for integer literals, where a leading 0x means hexadecimal and a leading 0 means octal, and it is why it is bad practice to display IP addresses in decimal with zero padding, e.g. 010.018.052.086. This may look nice in a column of monospaced IP addresses, but a copy-pasted address might very well be parsed as octal: 010 would become 8 and 052 would become 42, while 018 and 086 are not even valid octal numbers.
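The per-part base detection can be sketched as below, assuming the C integer-literal convention (a leading 0x means hexadecimal, a leading 0 means octal). `parse_c_style_number` is a hypothetical helper, not the C library's code:

```python
def parse_c_style_number(part: str) -> int:
    """Read one address part: 0x prefix -> hex, leading 0 -> octal, else decimal."""
    if part[:2] in ("0x", "0X"):
        base, digits = 16, part[2:]
    elif len(part) > 1 and part.startswith("0"):
        base, digits = 8, part[1:]
    else:
        base, digits = 10, part
    valid = "0123456789abcdef"[:base]  # the digits allowed in this base
    if not digits or any(c not in valid for c in digits.lower()):
        raise ValueError(f"{part!r} is not a valid base-{base} number")
    return int(digits, base)

print([parse_c_style_number(p) for p in "012.022.064.0126".split(".")])    # [10, 18, 52, 86]
print([parse_c_style_number(p) for p in "0xa.0x12.0x34.0x56".split(".")])  # [10, 18, 52, 86]
print(parse_c_style_number("010"), parse_c_style_number("052"))            # 8 42
```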

Resulting Confusion

To make matters worse, all of these can be mixed together to create monstrosities such as 10.022.0x3456, which results in 10.18.52.86. This is because the text representation of IPv4 addresses was never standardized, which led to differing and confusing implementations over the years.
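Python makes the difference easy to observe: on typical Unix systems, `socket.inet_aton` delegates to the C library’s `inet_aton`, while the `ipaddress` module only accepts strict dot-decimal notation. The exact leniency depends on the platform’s C library, so treat the commented output as what glibc or a BSD libc are expected to produce:

```python
import ipaddress
import socket

text = "10.022.0x3456"

# socket.inet_aton hands the string to the C library, which accepts this form
print(socket.inet_ntoa(socket.inet_aton(text)))  # 10.18.52.86

# ipaddress only accepts strict dot-decimal notation
try:
    ipaddress.IPv4Address(text)
except ipaddress.AddressValueError as err:
    print("rejected:", err)
```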

Even if these various ways to write IPv4 addresses are accepted by some parsers, it is strongly recommended to only use the typical dot-decimal format. Taking two modern programming languages as examples, Go’s net.ParseIP has historically been more permissive (before Go 1.17, it accepted parts with leading zeroes and read them as decimal), while Rust only accepts the “strict” format, following the recommendations in RFC 6943.

Displaying IPv6 Addresses

Contrary to IPv4, IPv6 defined a way to display its addresses from the beginning. RFC 5952 later added recommendations for a canonical text representation, to remove some possible confusion.

An IPv6 address is displayed as eight groups of two octets (16 bits) written in hexadecimal, separated by colon characters:

2001:db8::ff00:42:8329

The previous example uses the following two rules to shorten an otherwise much longer representation (2001:0db8:0000:0000:0000:ff00:0042:8329):

  1. Leading zeroes in a group do not indicate a different base and are removed.
  2. The longest run of consecutive 0000 groups is replaced with ::, which can only be used once per address.
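The two rules can be sketched in Python as follows. The hypothetical `compress_ipv6` takes the eight groups as integers, picks the longest run of zero groups (leftmost on ties), and leaves a lone zero group alone, as RFC 5952 recommends:

```python
def compress_ipv6(groups: list[int]) -> str:
    """Shorten eight 16-bit groups into the compressed text form."""
    if len(groups) != 8 or any(not 0 <= g <= 0xFFFF for g in groups):
        raise ValueError("an IPv6 address is eight 16-bit groups")
    texts = [format(g, "x") for g in groups]  # rule 1: no leading zeroes
    best_start, best_len = -1, 0              # rule 2: longest run of zero groups
    i = 0
    while i < 8:
        if groups[i] != 0:
            i += 1
            continue
        j = i
        while j < 8 and groups[j] == 0:
            j += 1
        if j - i > best_len:                  # strictly longer keeps the leftmost run
            best_start, best_len = i, j - i
        i = j
    if best_len < 2:                          # a single zero group stays as "0"
        return ":".join(texts)
    head = ":".join(texts[:best_start])
    tail = ":".join(texts[best_start + best_len:])
    return f"{head}::{tail}"

full = "2001:0db8:0000:0000:0000:ff00:0042:8329"
print(compress_ipv6([int(g, 16) for g in full.split(":")]))  # 2001:db8::ff00:42:8329
```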

Dot-Decimal Exception

There is only one well-defined exception to these rules, a format to help systems in a mixed environment of IPv4 and IPv6, with the last four octets written in dot-decimal as if they were an IPv4 address:

2001:db8::93.184.216.34
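Python’s standard `ipaddress` module accepts this mixed notation, which makes it easy to see that the last 32 bits are just the IPv4 address written differently:

```python
import ipaddress

mixed = ipaddress.IPv6Address("2001:db8::93.184.216.34")
print(mixed)           # 2001:db8::5db8:d822
print(mixed.exploded)  # 2001:0db8:0000:0000:0000:0000:5db8:d822
# The low 32 bits hold the IPv4 address 93.184.216.34
print(ipaddress.IPv4Address(int(mixed) & 0xFFFFFFFF))  # 93.184.216.34
```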

Parsing IPv6 Addresses

Parsing a typical IPv6 address string is straightforward: expand :: into the missing 0000 groups, split the string by :, convert each hexadecimal part into an integer value, bit-shift each by the correct amount, and add them all together.
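A Python sketch of that procedure, including the expansion of :: (the name `parse_ipv6` is hypothetical, and the embedded dot-decimal form is deliberately not handled):

```python
def parse_ipv6(text: str) -> int:
    """Parse hexadecimal IPv6 text (no embedded IPv4) into a 128-bit integer."""
    if text.count("::") > 1:
        raise ValueError(f"'::' may appear only once: {text!r}")
    if "::" in text:
        head, tail = text.split("::")
        head_groups = head.split(":") if head else []
        tail_groups = tail.split(":") if tail else []
        missing = 8 - len(head_groups) - len(tail_groups)
        if missing < 1:
            raise ValueError(f"too many groups: {text!r}")
        groups = head_groups + ["0"] * missing + tail_groups  # expand ::
    else:
        groups = text.split(":")
    if len(groups) != 8:
        raise ValueError(f"expected 8 groups: {text!r}")
    value = 0
    for index, group in enumerate(groups):
        if not 1 <= len(group) <= 4 or any(c not in "0123456789abcdefABCDEF" for c in group):
            raise ValueError(f"invalid group {group!r} in {text!r}")
        value += int(group, 16) << (16 * (7 - index))  # shift into position, then add
    return value

print(hex(parse_ipv6("2001:db8::ff00:42:8329")))  # 0x20010db8000000000000ff0000428329
```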

Similarly to IPv4, there are library functions for IPv6 addresses: inet_pton and its inverse, inet_ntop. inet_pton parses a string into an internal IPv6 value, and also works with IPv4 addresses, although unlike inet_aton, it only accepts them in strict dot-decimal notation.
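Python’s `socket` module exposes both functions. A short example, assuming a C library whose `inet_ntop` produces the compressed form (glibc does):

```python
import socket

packed = socket.inet_pton(socket.AF_INET6, "2001:0db8:0000:0000:0000:ff00:0042:8329")
print(len(packed))                                # 16
print(socket.inet_ntop(socket.AF_INET6, packed))  # 2001:db8::ff00:42:8329

# For IPv4, inet_pton only takes the strict four-part dot-decimal form
print(socket.inet_pton(socket.AF_INET, "127.0.0.1"))  # b'\x7f\x00\x00\x01'
try:
    socket.inet_pton(socket.AF_INET, "127.1")
except OSError as err:
    print("rejected:", err)
```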

Displaying CIDR

As mentioned previously, CIDR notation is used as a bitmask to separate the network and the host. When displayed together with an IPv4 or IPv6 address, it directly follows the IP address with a slash character.

For example, the IPv4 address 93.184.216.34 belongs to a network whose prefix is 24 bits long, which is written 93.184.216.0/24. The IPv4 address 198.202.90.193 is in the network 198.202.64.0/18, which indicates that the network starts at 198.202.64.0 and ends at 198.202.127.255.

IPv6 addresses with CIDR notation are displayed exactly the same. 2001:db8::ff00:42:8329 is in the 2001:db8::/32 network, which means that the first 32 bits represent the network number. The addresses within that network then go from 2001:db8:: to 2001:db8:ffff:ffff:ffff:ffff:ffff:ffff.
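Python’s standard `ipaddress` module can do the same bookkeeping; here it recovers the networks and their boundaries for the examples above:

```python
import ipaddress

net = ipaddress.ip_interface("198.202.90.193/18").network
print(net)                    # 198.202.64.0/18
print(net.network_address)    # 198.202.64.0
print(net.broadcast_address)  # 198.202.127.255 (last address in the range)

v6 = ipaddress.ip_network("2001:db8::/32")
print(ipaddress.ip_address("2001:db8::ff00:42:8329") in v6)  # True
print(v6[-1])                 # 2001:db8:ffff:ffff:ffff:ffff:ffff:ffff
```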

Displaying Port Numbers

So far, only IP addresses have been discussed. On top of the Internet Layer lies the Transport Layer, which comes with port numbers. Where an IP address targets a host, a port number targets a process on that host.

When a port number is mentioned with an IPv4 address, they are displayed together and separated with a colon character. For example, 93.184.216.34:443 means the port 443 on the host located at 93.184.216.34.

On the other hand, IPv6 addresses already use the colon character, so to prevent any ambiguity, the address is enclosed in brackets, e.g. [2001:db8::ff00:42:8329]:22 for port 22.
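A minimal Python sketch of both directions; `format_endpoint` and `parse_endpoint` are hypothetical helpers, and the parsing side borrows the URL parser from `urllib.parse`, which already understands the bracket syntax:

```python
import ipaddress
from urllib.parse import urlsplit

def format_endpoint(address: str, port: int) -> str:
    """Join an IP address and a port, bracketing IPv6 addresses."""
    ip = ipaddress.ip_address(address)
    return f"[{ip}]:{port}" if ip.version == 6 else f"{ip}:{port}"

def parse_endpoint(endpoint: str) -> tuple[str, int]:
    """Split "host:port" or "[v6]:port" into (address, port)."""
    parts = urlsplit("//" + endpoint)  # the "//" makes it parse as a network location
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"missing address or port: {endpoint!r}")
    return parts.hostname, parts.port  # hostname comes back without brackets

print(format_endpoint("93.184.216.34", 443))           # 93.184.216.34:443
print(format_endpoint("2001:db8::ff00:42:8329", 22))   # [2001:db8::ff00:42:8329]:22
print(parse_endpoint("[2001:db8::ff00:42:8329]:22"))   # ('2001:db8::ff00:42:8329', 22)
```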

Assignment of IP Addresses

This section gives an overview of how IP addresses are distributed, and how the end-user gets one.

The IANA

The IANA is an organisation in charge of assigning Internet numbers, including, most relevant for this post, IP addresses. It assigns IPv4 and IPv6 address blocks to the five Regional Internet Registries (RIRs), each of which manages a region of the world: AFRINIC, APNIC, ARIN, LACNIC, and RIPE NCC. An RIR then assigns sub-blocks to Local Internet Registries (LIRs), which are Internet Service Providers (ISPs), large companies, or academic institutions.

For example, IANA assigned 2600::/12 to ARIN (an RIR) in December 2006, which then assigned 2607:c000::/32 to Teksavvy (an LIR, and my ISP) in March 2010.
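The delegation chain can be checked with Python’s `ipaddress` module, since each assigned block must fit inside its parent block:

```python
import ipaddress

arin_block = ipaddress.ip_network("2600::/12")           # IANA -> ARIN
teksavvy_block = ipaddress.ip_network("2607:c000::/32")  # ARIN -> Teksavvy
print(teksavvy_block.subnet_of(arin_block))  # True
# How many /32 blocks the size of Teksavvy's fit in ARIN's /12
print(2 ** (32 - 12))  # 1048576
```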

Note that some RIRs instead assign blocks to National Internet Registries (NIRs), which are essentially intermediaries that represent a country. These NIRs then assign sub-blocks to LIRs.

The ISP

An ISP is a company that connects end-users to the Internet. As seen previously, ISPs receive blocks of IP addresses from an RIR (or NIR), which they then assign individually to their customers. For most residential customers, that IP address is dynamic and may change over time. Static addresses, on the other hand, do not change, but usually must be purchased. They are sought by companies whose servers are accessible from the Internet, so that they do not need to reconfigure their servers whenever their IP address changes.

The Router

When a user’s router is connected to the modem and turned on, it receives an IP address from the ISP’s DHCP server. All connections between the network behind the router and the Internet will use this public IP address.

The Device

Similarly to how the router received an IP address from the ISP, a network-connected device receives one from the router via DHCP. The difference is that this IP address is private to the network behind the router.

The device has no idea what its public IP address is. To find it, one must ask the router (usually via its GUI), or make a request to one of various services, like ip.me or ifconfig.me. Both of these services are accessible from a browser, and from the terminal via curl. They simply reply with the public IP address the request came from.

Geolocation Based on IP Addresses

Because an ISP usually operates around a region or country, the IP addresses that an ISP assigns to its customers can give an idea about where the user is located.

This is only a guess and doesn’t always work, but it is one way geoblocking works. For example, let’s say that a server receives a request from 93.184.216.34. A simple RDAP query will show that it belongs to the network 93.184.216.0/24, registered to the following address, and the server can then decide whether to block the request:

EdgeCast Networks, Inc.
13031 W Jefferson Blvd
90094
Playa Vista CA
UNITED STATES

A Virtual Private Network (VPN) like Mullvad or ProtonVPN works by acting as a middleman between a user and the Internet. The VPN makes queries on behalf of the user, so the target website does not see the user’s location, but the VPN’s instead. Note that bypassing geoblocks may sometimes be illegal, and VPN connections might be flagged by some websites (e.g. banks) as suspicious.

Reserved IP Addresses

This section contains IP addresses that are sometimes useful to know. It is not meant to be comprehensive at all, but contains links that can be followed for more information.

Loopback

The localhost IP address, 127.0.0.1, is known as the loopback address, and connects back to the host itself. Less well known is that the whole 127.0.0.0/8 range is reserved for loopback, so any IP address that starts with 127 will loop back to the host, if the system and its routing table are configured accordingly.

On the other hand, a single IPv6 address, ::1 (technically ::1/128) is reserved for loopback.
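Python’s `ipaddress` module encodes exactly this difference in its `is_loopback` property, which covers all of 127.0.0.0/8 for IPv4 but only ::1 for IPv6:

```python
import ipaddress

for text in ["127.0.0.1", "127.45.3.2", "128.0.0.1", "::1", "::2"]:
    address = ipaddress.ip_address(text)
    print(f"{text:>10}  is_loopback={address.is_loopback}")
# True for 127.0.0.1, 127.45.3.2 and ::1; False for 128.0.0.1 and ::2
```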

Private Network

Some IP ranges are reserved for private networks. IPv4 addresses in the following ranges are only meant for use within a private network, and will not be routed out to the Internet:

block           address range                        class
10.0.0.0/8      from 10.0.0.0 to 10.255.255.255      class A
172.16.0.0/12   from 172.16.0.0 to 172.31.255.255    class B
192.168.0.0/16  from 192.168.0.0 to 192.168.255.255  class C

IPv6 has a huge range reserved for a private network, also called unique local address: fc00::/7. This means that all IPv6 addresses that start with fc or fd are only routable within the current private network.
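A small Python check against the ranges listed above. The `in_private_block` helper is hypothetical; the standard library’s `is_private` property also exists, but it covers more reserved ranges than these four:

```python
import ipaddress

# The three IPv4 private ranges and the IPv6 unique local range
PRIVATE_BLOCKS = [ipaddress.ip_network(block) for block in
                  ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "fc00::/7")]

def in_private_block(address: str) -> bool:
    """True if the address falls in one of the blocks listed above."""
    ip = ipaddress.ip_address(address)
    # `in` simply returns False when the IP versions differ
    return any(ip in block for block in PRIVATE_BLOCKS)

for text in ["172.31.255.255", "172.32.0.1", "fd12:3456::1", "2001:db8::1"]:
    print(text, in_private_block(text))  # True, False, True, False
```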

Unspecified Address

The IP addresses 0.0.0.0 and :: denote an unspecified address. Their exact meaning depends on the context, but they are commonly used when starting a server, to indicate that it listens for incoming requests on all of the host’s addresses.
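A minimal Python sketch of a server socket bound to the unspecified IPv4 address; the `open_listener` helper is hypothetical, and port 0 simply asks the operating system for any free port:

```python
import ipaddress
import socket

def open_listener(host: str = "0.0.0.0", port: int = 0) -> socket.socket:
    """Open a TCP listening socket; 0.0.0.0 means every local IPv4 address."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind((host, port))  # port 0 lets the OS pick a free port
        server.listen()
    except OSError:
        server.close()
        raise
    return server

print(ipaddress.ip_address("0.0.0.0").is_unspecified, ipaddress.ip_address("::").is_unspecified)  # True True
with open_listener() as server:
    host, port = server.getsockname()
    print(host, port)  # 0.0.0.0 followed by whichever port the OS picked
```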

Documentation and Examples

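Although RFC 1166 mentions some IPv4 addresses that are, or used to be, for documentation and examples (RFC 5737 now reserves 192.0.2.0/24, 198.51.100.0/24, and 203.0.113.0/24 for that purpose), RFC 3849 does a better job for IPv6 by reserving the whole 2001:db8::/32 range.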

Conclusion

IP addresses have been in use for more than 40 years. Their history is a ride through time, explaining why and how they were modified and expanded to fit the needs of a growing planetary network.

It’s easy to dismiss or laugh at past decisions, but they must be viewed within their own context. One-octet network addresses may seem ludicrous now, but they came from a time when connections were slow and costly. Only research centers, universities, and the military were thought to ever have a use for computers, which were then huge, expensive appliances used only for science.

IPv4 and its many extensions came from the context of an exponentially-growing network. Nobody at that time could predict how the Internet would be used by companies, let alone by everyone at home.

Even though IPv6 offers an almost unimaginable number of addresses (2^128, roughly 3.4 × 10^38), the protocol was still created before the popularity of mobile and IoT devices. Who knows what kind of technology or paradigm shift might appear in this decade or the next? We might be one “secure connections using unique addresses for each packet” or “time-based rotating quantum qubit addresses” away from realizing we need to start thinking about the next network address upgrade. After all, if history has taught us anything, it’s that foreseeing the future is not something we’re often good at.

Simon Bernier
