Network Working Group Ralph Droms
INTERNET DRAFT Bucknell University
Greg Rabil
Mike Dooley
Arun Kapur
Quadritek Systems
Kim Kinnear
American Internet
Steve Gonczi
Bernie Volz
Process Software
August 1998
Expires March 1999
DHCP Failover Protocol
<draft-ietf-dhc-failover-02.txt>
Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
ftp.isi.edu (US West Coast).
Abstract
DHCP [RFC 2131] allows multiple servers to operate on a single
network. Some sites are interested in running multiple servers in
such a way as to provide redundancy in case of server failure.
In order for this to work reliably, the cooperating Primary and
Secondary servers must maintain a consistent database of the lease
Droms, et. al. [Page 1]
DRAFT January 1998
information. This implies that servers will need to coordinate any
and all lease activity so that this information is synchronized in
case of failover.
This document defines a protocol to provide this synchronization
between two servers. One server is designated the "Primary" server,
the other is the "Secondary" server. Additionally, this document
describes a protocol for the automatic transfer of control from the
Primary to the Secondary in the case of failure (failover), as well
as in the case of a network partition.
This document is a merge of draft-ietf-dhc-failover-01.txt and
draft-ietf-dhc-safe-failover-proto-00.txt, along with substantial
changes to each. Unfortunately, this merge was not completed with
sufficient time to allow review by any of the authors of draft-ietf-
dhc-failover-01.txt, and so it may well not reflect their views even
though their names appear as authors. See Section 11, issue #1 and
Section 12 for more details.
1. Introduction
As the use of DHCP servers in networked environments grows, the
dependency of those networks on the DHCP server increases. This is
particularly true of the hosts that receive their configuration
information from the DHCP server. Therefore, it is very important to
be able to provide reliable, continuous availability of DHCP ser-
vices.
This specification describes a protocol to support automatic failover
from a primary to its secondary server. The failover mechanism
allows the secondary server to perform DHCP actions while the primary
is down, or when a network failure prevents the primary and secondary
from communicating. The protocol also specifies how reintegration is
achieved when the primary again becomes operational or when the pri-
mary and secondary can again communicate.
In providing the specification for the failover, the protocol speci-
fies how to guarantee reliable delivery of changes to the secondary.
This is required to synchronize the secondary's lease data with that
of the primary. The protocol further specifies a mechanism to allow
the secondary to determine if it can communicate with the primary
server. The secondary will automatically begin to service DHCP
requests whenever it cannot communicate with the primary. When the
primary server becomes available again, the secondary will convey any
changes that occurred since the time of failover back to the primary.
Through careful control of the difference between the lease times
offered to DHCP clients and the lease time known by the secondary
server, the protocol allows the primary to communicate with the
secondary after the primary has completed communication with the DHCP
client (a technique known as "lazy" update) and still guarantee that
duplicate IP address allocations do not occur. Thus, the protocol
does not directly impact the ability of a DHCP server to respond to
DHCP client requests.
1.1. Requirements Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC 2119].
1.2. DHCP Terminology
This document uses the following terms:
o "DHCP client" or "client"
A DHCP client is an Internet host using DHCP to obtain confi-
guration parameters such as a network address.
o "DHCP server" or "server"
A DHCP server is an Internet host that returns configuration
parameters to DHCP clients.
o "binding"
A binding is a collection of configuration parameters, includ-
ing at least an IP address, associated with or "bound to" a
DHCP client. Bindings are managed by DHCP servers.
o "binding database"
The collection of bindings managed by a primary and secondary.
o "subnet address pool"
A subnet address pool is the set of IP addresses that is
associated with a particular network number and subnet mask. In
the simple case, there is a single network number and subnet
mask and a set of IP addresses. In the more complex case
(sometimes called "secondary subnets", sometimes "superscopes"),
several (apparently unrelated) network number and subnet mask
combinations with their associated IP addresses
may all be configured together into one subnet address pool.
o "primary server" or "primary"
A DHCP server configured to provide primary service to a set
of DHCP clients for a particular set of subnet address pools.
o "secondary server" or "secondary"
A DHCP server configured to act as backup to a primary server
for a particular set of subnet address pools.
o "stable storage"
Every DHCP server is assumed to have some form of what is
called "stable storage". Stable storage is used to hold
information concerning IP address bindings (among other
things) so that this information is not lost in the event of a
server failure which requires restart of the server.
1.3. Requirements for this protocol
The following requirements must be (and are) met by this protocol.
1. Implementations of this protocol must work with existing DHCP
client implementations based on the DHCP protocol [1].
2. Implementations of the protocol must work with existing BOOTP
relay implementations.
3. The protocol must provide failover redundancy between servers
that are not located on the same subnet.
1.4. Goals for this protocol
1. Provide for continued service to DHCP clients through an
automated mechanism in the event of failure of the Primary
Server.
2. Avoid binding an IP address to a client while that binding is
currently valid for another client. In other words, don't
allocate the same IP address to two clients.
3. Minimize any need for manual administrative intervention.
4. Introduce no additional delays in server response time as a
result of inter-server communication.
5. Share IP address ranges between primary and secondary
servers; i.e., impose no requirement that the pool of avail-
able addresses be divided between servers.
6. Continue to meet the goals and objectives of this protocol in
the event of server failure or network partition.
7. Provide graceful reintegration of full protocol service after
server failure or network partition.
8. Allow for one computer to act as a Secondary Server for multiple
Primary Servers. Other topologies (e.g., mesh) are also possible.
Primary and Secondary Servers SHOULD be viewed as "logical"
servers and not necessarily physical computers.
9. Ensure that an existing client can keep its existing IP
address binding if it can communicate with either the Primary
or Secondary DHCP server implementing this protocol, not just
the server that originally offered it the binding.
10. Ensure that a new client can get an IP address from some
server. Ensure that in the face of partition, where servers
continue to run but cannot communicate with each other, the
above goals and requirements can be met. In addition, when
the partition condition is removed, allow graceful automatic
re-integration without requiring human intervention.
11. If either the Primary or Secondary Server loses all of the
information that it has stored in stable storage, it should be
able to refresh its stable storage from the other server.
1.5. Limitations of this Protocol
The following are explicit limitations of this protocol.
1. Under normal operation, only one server at a time will ser-
vice DHCP client requests; this protocol provides reliability
through redundancy but not load balancing.
2. This protocol provides only one level of redundancy through a
single Secondary Server for each Primary Server.
3. The protocol provides a way to detect when the primary and
secondary server cannot communicate, but once this condition
has been detected, does not (indeed, cannot) provide any way
to further distinguish between network failure and failure of
one of the servers.
4. A small number of IP addresses are reserved for Secondary
Server use. In order to handle the failure case where both
servers are able to communicate with DHCP clients, but unable
to communicate with each other, a small number of IP
addresses must be set aside as a private address pool for the
Secondary Server. The Secondary can use these to service
newly arrived DHCP clients during such a period. The size of
this private pool SHOULD be based only on the arrival rate of
new DHCP clients and the length of expected downtime, and is
not influenced in any way by the total number of DHCP clients
supported by the server pair.
5. The Primary and Secondary Servers SHOULD pause normal DHCP
transaction processing while resynchronizing after a system
failure.
2. Protocol Operations
The protocol messages needed to provide redundant (failover) servers
can be grouped into three areas:
o Messages to keep the Secondary Server's lease data synchron-
ized with that of the Primary so that when failover occurs,
there is no degradation of service.
o Messages that allow the Secondary to determine the operational
state of the Primary, so as to know when to start servicing
DHCP traffic.
o Messages that are used to coordinate the Primary regaining
control when it has become available again.
2.1. Time synchronization between communicating servers
Each binding update message carries a "sent time stamp" (the GMT
time at which the message was sent). This provides a simple
mechanism to determine any "time drift" between communicating
servers.
DISCUSSION:
If a UDP packet is successfully transmitted (i.e., it is not
lost), the packet travel time is negligible in the framework
of DHCP leases. By providing a GMT "sent time" stamp, the
recipient can compare this with its notion of the current GMT time
at the time it receives the packet. The difference is the time
drift plus the packet travel time; because the travel time is
ignored, the difference is taken as the time drift. The recipient
can use this time drift value to bias all "absolute time" values
it receives from the sender.
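The drift correction described above can be sketched in a few lines. This is a minimal illustration, assuming that all times are integer GMT seconds since the epoch and, as the text does, ignoring packet travel time; the function names are hypothetical and not part of this protocol.

```python
def time_drift(sent_time_stamp: int, local_arrival_time: int) -> int:
    """Recipient clock minus sender clock, in seconds.

    Packet travel time is ignored, so the drift is simply the local GMT
    arrival time minus the sender's GMT "sent time stamp".
    """
    return local_arrival_time - sent_time_stamp


def to_local_time(sender_absolute_time: int, drift: int) -> int:
    """Bias an absolute time received from the sender onto the local clock."""
    return sender_absolute_time + drift


# Example: the sender's clock runs 30 seconds behind the recipient's.
drift = time_drift(sent_time_stamp=1_000_000, local_arrival_time=1_000_030)
lease_grant_local = to_local_time(999_000, drift)  # 999_030 on the local clock
```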
2.2. Failover Protocol Messages
The Failover Protocol messages are encoded using a packet format
specific to the Failover Protocol. To allow easy recognition of
Failover Protocol messages, BOOTP packet "op" field values 3..14 are
proposed to mark the various Failover Protocol messages. A Failover
Protocol message is always unicast from the source to the
destination. The sender, and never the recipient, is responsible
for reliable retransmission.
2.3. Failover Protocol packet header format
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| op (1) | rev (1) | payload offset (2) |
+---------------+---------------+---------------+---------------+
| xid (4) |
+---------------------------------------------------------------+
| 0 or more additional header bytes (variable) |
+---------------------------------------------------------------+
| Payload data, formatted as DHCP-style options |
| (although using a unique option number space) |
| (variable) |
+---------------------------------------------------------------+
op - 1 byte
These values extend the number space of the existing BOOTP message
type "Op" field. The following types are defined:
3 DHCPPOOLREQ
4 DHCPPOOLRESP
5 DHCPBNDUPD
6 DHCPBNDACK
7 DHCPPOLL
8 DHCPPRPL
9 DHCPCTLREQ
10 DHCPCTLRET
11 DHCPCTLACK
12 DHCPCTLACKACK
13 DHCPREQUEREQ
14 DHCPREQUERESP
rev - 1 byte
Failover protocol version supported. Set to 1 for the Failover Proto-
col described in this draft.
payload offset - 2 bytes, network byte order
The byte offset of the Payload area, from the beginning of the Fail-
over packet header. The value for the current protocol version is 8.
xid - 4 bytes, network byte order
The sender of a failover protocol packet is responsible for setting
this number, and the receiver of the packet copies the number over
into any response packet. To the receiver it is opaque. The sender
SHOULD ensure that every packet sent to a particular IP address and
port combination has a unique transaction id unless that packet is a
re-transmission.
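The fixed header can be sketched with Python's struct module. This is an illustrative sketch only: the class and function names are hypothetical, and the decoder simply skips any additional header bytes that lie between the fixed 8-byte header and the payload offset.

```python
import struct
from enum import IntEnum
from typing import Tuple


class FailoverOp(IntEnum):
    """Failover message types, extending the BOOTP "op" field (3..14)."""
    DHCPPOOLREQ = 3
    DHCPPOOLRESP = 4
    DHCPBNDUPD = 5
    DHCPBNDACK = 6
    DHCPPOLL = 7
    DHCPPRPL = 8
    DHCPCTLREQ = 9
    DHCPCTLRET = 10
    DHCPCTLACK = 11
    DHCPCTLACKACK = 12
    DHCPREQUEREQ = 13
    DHCPREQUERESP = 14


HEADER = struct.Struct("!BBHI")  # op, rev, payload offset, xid; network order
REV = 1                          # protocol version described in this draft
PAYLOAD_OFFSET = HEADER.size     # 8 bytes for the current protocol version


def pack_message(op: FailoverOp, xid: int, payload: bytes = b"") -> bytes:
    """Build a failover packet: fixed header followed by the payload."""
    return HEADER.pack(op, REV, PAYLOAD_OFFSET, xid) + payload


def unpack_message(packet: bytes) -> Tuple[FailoverOp, int, int, bytes]:
    """Return (op, rev, xid, payload), honoring the payload offset field."""
    op, rev, offset, xid = HEADER.unpack_from(packet)
    if offset < HEADER.size or offset > len(packet):
        raise ValueError(f"bad payload offset {offset}")
    # Bytes between the fixed header and `offset` are additional header
    # bytes; this sketch does not interpret them.
    return FailoverOp(op), rev, xid, packet[offset:]
```

For example, pack_message(FailoverOp.DHCPPOLL, 0x12345678) yields the bytes 07 01 00 08 12 34 56 78.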
2.4. DHCPPOOLREQ and DHCPPOOLRESP:
Whenever the Secondary server transitions into NORMAL mode, it first
sends a DHCPPOOLREQ message to initiate a transfer of a small range
of IP addresses that will serve as its private address pool.
This is necessary because initially the Secondary server has no such
address pool, and its pool gets depleted when it hands out addresses
in COMMUNICATION-INTERRUPTED mode. This is why the request is sent
every time the Secondary server transitions into NORMAL mode. The
DHCPPOOLREQ message does not carry any payload data. When the Primary
Server gets a DHCPPOOLREQ message, it computes which addresses should
be transferred to the Secondary and queues up DHCPBNDUPD
transactions, setting the Status of these bindings to "BACKUP". Having
done this, it sends a DHCPPOOLRESP message. The DHCPPOOLRESP message
carries the "Number of addresses transferred" as its payload.
The Secondary server keeps sending DHCPPOOLREQ messages until it
receives a DHCPPOOLRESP with "Number of addresses transferred" = 0,
or it decides that the partner is not responding. Each of these
messages MUST have the same transaction ID. If a new transaction ID
is used in one of these messages, the receiving server will restart
the transmission of the DHCPBNDUPD messages from the beginning. To be
clear, if the Secondary Server receives a DHCPPOOLRESP message with
"Number of addresses transferred" > 0, it MUST send another
DHCPPOOLREQ message. This mechanism makes it possible for the Primary
Server to pace the transfer (e.g., it could generate all addresses at
once, or one by one).
The Primary Server must respond to each DHCPPOOLREQ message it
receives. If it has already generated all private addresses, or it
has no available addresses, it MUST send DHCPPOOLRESP with "Number
of addresses transferred" = 0.
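The Secondary's side of this exchange reduces to a loop that reuses one transaction ID until the Primary reports zero addresses. A minimal sketch, assuming a hypothetical send_poolreq() transport hook; the max_timeouts threshold is also an assumption, since this draft leaves the test for "the partner is not responding" to the implementation.

```python
from typing import Callable, Optional


def request_private_pool(
    send_poolreq: Callable[[int], Optional[int]],
    xid: int,
    max_timeouts: int = 3,
) -> bool:
    """Run the Secondary side of the DHCPPOOLREQ/DHCPPOOLRESP exchange.

    send_poolreq(xid) is a hypothetical transport hook: it sends one
    DHCPPOOLREQ carrying `xid` and returns the "Number of addresses
    transferred" from the matching DHCPPOOLRESP, or None on timeout.
    Returns True once the Primary reports 0, or False if the partner is
    judged not to be responding.
    """
    timeouts = 0
    while timeouts < max_timeouts:
        count = send_poolreq(xid)  # every request in the exchange reuses xid
        if count is None:
            timeouts += 1  # no DHCPPOOLRESP; retransmit with the same xid
        elif count == 0:
            return True  # Primary has nothing more to transfer
        else:
            timeouts = 0  # count > 0: Primary is pacing; ask again
    return False
```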
2.5. DHCPREQUEREQ and DHCPREQUERESP:
Whenever either server wishes to be updated with the information that
the other server knows and has not yet transmitted to it, it will send
a DHCPREQUEREQ.
The DHCPREQUEREQ message does not carry any payload data. When either
server gets a DHCPREQUEREQ message, it computes which updates should
be transferred to the requesting server, and queues up DHCPBNDUPD
transactions as appropriate. Having done this, it sends a
DHCPREQUERESP message. The DHCPREQUERESP message carries the "Number
of addresses queued up" as its payload. The set of binding updates
queued up will depend on the requesting server's state. (The state
has already been communicated via prior DHCPPOLL/DHCPPRPL messages.)
The requesting server keeps sending DHCPREQUEREQ messages until it
receives a DHCPREQUERESP with "Number of addresses queued up" = 0,
or it decides that the partner is not responding. This is the same
approach as is used for the DHCPPOOLREQ/DHCPPOOLRESP messages. Each
of these DHCPREQUEREQ messages MUST have the same transaction ID.
Use of a new transaction ID will cause rebuilding of the outgoing
binding update queue.
The responding server must respond to each DHCPREQUEREQ message it
receives. If it has already queued up all of the previously unsent
binding updates, then it MUST send a DHCPREQUERESP with "Number of
addresses queued up" = 0.
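On the responding side, the rule that a new transaction ID rebuilds the outgoing queue might look like the following. This is a hypothetical sketch: the class name, the batch_size pacing parameter, and the representation of queued updates as IP address strings are assumptions for illustration. As written, a retransmitted DHCPREQUEREQ with the same xid is treated as a request for the next batch.

```python
from typing import Callable, List, Optional


class RequeueResponder:
    """Responder side of DHCPREQUEREQ/DHCPREQUERESP (hypothetical helper)."""

    def __init__(self, unsent_updates: Callable[[], List[str]], batch_size: int = 100):
        self.unsent_updates = unsent_updates  # returns bindings not yet sent
        self.batch_size = batch_size          # assumed pacing parameter
        self.xid: Optional[int] = None
        self.remaining: List[str] = []
        self.outgoing: List[str] = []         # DHCPBNDUPD queue to transmit

    def on_requeuereq(self, xid: int) -> int:
        """Queue the next batch; return "Number of addresses queued up"."""
        if xid != self.xid:
            # A new transaction ID rebuilds the outgoing queue from scratch.
            self.xid = xid
            self.remaining = list(self.unsent_updates())
            self.outgoing.clear()
        batch = self.remaining[: self.batch_size]
        del self.remaining[: self.batch_size]
        self.outgoing.extend(batch)
        return len(batch)  # 0 once everything has been queued
```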
2.6. DHCPBNDUPD
The Primary uses this message to notify the Secondary (or the
Secondary to notify the Primary) of a change in a binding's state
and data.
In response to a binding update, the recipient server MUST respond
with a DHCPBNDACK message. Multiple binding updates can be batched
up, and sent in one Failover Protocol message.
2.7. DHCPBNDACK
This message implements a positive or negative acknowledgement of
one or more binding updates.
A binding update (or a batch of binding updates sent as one message)
is matched up with its associated acknowledgment by having the same
xid field value in the message header.
The server sending a DHCPBNDACK message MAY include in it any of the
options that are acceptable in a DHCPBNDUPD message. If any of this
information differs from the information in the DHCPBNDUPD message,
the receiver of the DHCPBNDACK SHOULD update its bindings database
with that information upon receipt of the DHCPBNDACK message.
The DHCPBNDACK MAY selectively reject one or more updates by includ-
ing one or more IP address - Reject Reason option pairs in the mes-
sage body.
The DHCPBNDACK implicitly acknowledges any binding updates it replies
to, except those it enumerates using Reject Reason Codes.
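The implicit-acknowledgement rule can be illustrated as follows. This sketch assumes a hypothetical table of outstanding updates keyed by xid and represents each update only by its IP address; the reason strings are taken from the Reject Reason Code option (Section 3.12).

```python
from typing import Dict, List, Tuple

REJECT_REASONS = {
    1: "Illegal IP address (not part of any address pool)",
    2: "Fatal conflict exists: address in use by another client",
}


def process_bndack(
    outstanding: Dict[int, List[str]],
    xid: int,
    rejects: Dict[str, int],
) -> Tuple[List[str], Dict[str, str]]:
    """Split the updates sent under `xid` into accepted and rejected.

    outstanding: hypothetical map of xid -> IP addresses of the binding
    updates sent in that message (a single update or a batch).
    rejects: IP address -> Reject Reason Code pairs from the DHCPBNDACK.
    Every update not enumerated in `rejects` is implicitly acknowledged.
    """
    sent = outstanding.pop(xid, [])  # match the ack to its update(s) by xid
    accepted = [ip for ip in sent if ip not in rejects]
    rejected = {
        ip: REJECT_REASONS.get(code, f"unknown reason code {code}")
        for ip, code in rejects.items()
        if ip in sent
    }
    return accepted, rejected
```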
2.8. DHCPPOLL
In order to determine the state of a given server, or to communicate
a critical change in its own status, a participant can use the
DHCPPOLL message.
This message inquires about the current state of the recipient, and
tells the recipient what state the sender is in.
After sending a DHCPPOLL message, the participant will listen for a
DHCPPRPL message in reply.
2.9. DHCPPRPL
This message replies to the DHCPPOLL message (PRPL=Poll reply). The
DHCPPRPL also carries server status information (see message payload
details below).
After a failover, when the Primary Server is restarted, the following
messages are used to coordinate the Primary taking control back from
the Secondary:
DHCPCTLREQ - Request for control
DHCPCTLRET - Return of control initiated
DHCPCTLACK - Return of control completed
DHCPCTLACKACK - Return of control completed message acknowledged.
The Primary Server sends a DHCPCTLREQ message, indicating that it
would like to take control of the bindings database. The Secondary
Server replies with a DHCPCTLRET message, which serves as a signal to
the Primary "Stand by to receive binding updates". This message then
is followed by a set of binding updates from the secondary to the
primary. When all updates have been transmitted (and acknowledged)
from Secondary to Primary, a DHCPCTLACK message is sent from the
Secondary to the Primary, to signal that "all updates from the Secon-
dary are now completed".
DISCUSSION:
Note that the DHCPCTLACK message must be transmitted reliably,
as the Primary Server will not start servicing clients until it
has received the DHCPCTLACK message. To provide this
reliability, the DHCPCTLACKACK message is provided. This provides
an acknowledgment of the DHCPCTLACK message, and the DHCPCTLACK
message will be periodically re-sent until it is acknowledged. We
could just periodically re-send the DHCPCTLACK message until we
start receiving binding updates from the Primary, but the Primary
may not have any updates to send at all, hence the need for an
explicit DHCPCTLACKACK message.
The Primary Server transitions into NORMAL state upon receiving a
DHCPCTLACK from the secondary, when the secondary has completed
sending all of its updates during synchronization. The DHCPCTLACKACK
message is needed to prevent the primary from waiting and not
servicing clients if the DHCPCTLACK message got lost. The Secondary
server will keep re-sending the DHCPCTLACK message until:
1. It decides that the primary is not responding, so the Secondary
server goes into COMMUNICATION-INTERRUPTED mode.
2. It receives a DHCPCTLACKACK or a DHCPBNDUPD message from the
primary. The Primary's DHCPBNDUPD messages would start
arriving at the Secondary server, if the Primary did get the
DHCPCTLACK, but the DHCPCTLACKACK message got lost.
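The Secondary's retransmission of DHCPCTLACK and its two stopping conditions can be sketched as below. The hooks send_ctlack() and wait_for_message(), the retry interval, and the attempt limit are hypothetical; this draft does not specify how "not responding" is decided. Any other message (or a timeout) simply triggers another retransmission.

```python
from typing import Callable, Optional


def send_ctlack_reliably(
    send_ctlack: Callable[[], None],
    wait_for_message: Callable[[float], Optional[str]],
    retry_interval: float = 2.0,
    max_attempts: int = 5,
) -> bool:
    """Resend DHCPCTLACK until the Primary confirms it.

    wait_for_message(timeout) is a hypothetical hook returning the type
    name of the next failover message from the Primary, or None if the
    timeout expires. Returns True when the hand-back is confirmed, or
    False if the Primary is judged not to be responding.
    """
    for _ in range(max_attempts):
        send_ctlack()
        reply = wait_for_message(retry_interval)
        # A DHCPBNDUPD also proves the Primary received DHCPCTLACK,
        # even if its DHCPCTLACKACK was lost.
        if reply in ("DHCPCTLACKACK", "DHCPBNDUPD"):
            return True
    # Not responding: the caller enters COMMUNICATION-INTERRUPTED mode.
    return False
```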
3. Protocol Payload Data Format
Payload data is encoded as a set of flexible DHCP/BOOTP style
options (the usual 1 byte option code, 1 byte length, and "length"
bytes of data). The options are placed after the header, starting
"payload offset" bytes from the beginning of the packet. The payload
data options are not preceded by a "magic cookie" value.
Since the packet is NOT a DHCP/BOOTP protocol packet, the options
used here do not conflict with any existing "proper" DHCP/BOOTP
options. In fact, these options are allocated in relationship to the
DHCP option space in the following way. In cases where the syntax
and semantics of a Failover Payload Option are identical to those of
a DHCP/BOOTP option, the same option number is used. For options
unique to the Failover protocol, option numbers starting at 230 are
used.
Thus, all new Failover Protocol option numbers are assigned from a
continuous range beginning with 230. This number is shown as an X in
the tables below.
The protocol is permissive in allowing various other DHCP options in
binding updates. As long as the sender wishes to use an option, it
MAY include it. On the other hand, the recipient MUST ignore any
option it is not expecting.
Multiple DHCPBNDUPD transactions can be batched together in one UDP
packet. The option set for each individual transaction MUST always
begin with the IP address (Option 50). This is the only restriction
on payload item ordering; otherwise, payload data items can be
included in any desired order.
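A minimal sketch of the payload encoding and of splitting a batched DHCPBNDUPD into per-transaction option sets at each Option 50. It assumes every option carries a length byte; because this draft does not say whether the DHCP pad (0) and end (255) options are used here, they receive no special treatment. Function names are hypothetical.

```python
from typing import Iterable, List, Tuple

OPTION_ASSIGNED_IP = 50  # must open every transaction's option set

Option = Tuple[int, bytes]


def encode_options(options: Iterable[Option]) -> bytes:
    """Encode (code, data) pairs as 1-byte code, 1-byte length, data."""
    out = bytearray()
    for code, data in options:
        if not 0 <= code <= 255 or len(data) > 255:
            raise ValueError("option code or length out of range")
        out += bytes((code, len(data))) + data
    return bytes(out)


def decode_options(payload: bytes) -> List[Option]:
    """Decode a payload back into (code, data) pairs."""
    options: List[Option] = []
    i = 0
    while i < len(payload):
        if i + 2 > len(payload):
            raise ValueError("truncated option header")
        code, length = payload[i], payload[i + 1]
        data = payload[i + 2 : i + 2 + length]
        if len(data) != length:
            raise ValueError("truncated option data")
        options.append((code, data))
        i += 2 + length
    return options


def split_transactions(options: List[Option]) -> List[List[Option]]:
    """Split a batched payload into per-transaction option sets."""
    transactions: List[List[Option]] = []
    for code, data in options:
        if code == OPTION_ASSIGNED_IP:
            transactions.append([])  # Option 50 starts a new transaction
        elif not transactions:
            raise ValueError("transaction does not begin with Option 50")
        transactions[-1].append((code, data))
    return transactions
```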
In case an implementation chooses to use the DHCPBNDNAK mechanism,
the DHCPBNDNAK message SHOULD contain one or more Option 50s from the
NAK-ed message, to indicate which specific update items are being
NAK-ed.
While the synchronization is in progress, the secondary MUST NOT
accept client requests, and the primary MUST NOT send any updates to
the secondary. This is necessary to allow the Primary to be the sole
arbitrator of any conflicting updates.
3.1. DHCP Server Status
This option is used to convey the current state of a server.
Code Len Type
+--+---+------+
| X| 1 | 1-15 |
+--+---+------+
Allowed values for this option:
Value  Message Type          Description
-----  --------------------  ------------------------------------
  1    UNKNOWN-STATE
  2    PRIMARY-NORMAL        Normal state
  3    BACKUP-NORMAL
  4    PRIMARY-COMINT        Communication interrupted (safe)
  5    BACKUP-COMINT
  6    PRIMARY-PARTNERDOWN   Partner down (unsafe mode)
  7    BACKUP-PARTNERDOWN
  8    PRIMARY-CONFLICT      Synchronizing after a "Partner-Down"
                             divergence
  9    PRIMARY-SYNC          Synchronizing after a
                             "communications-interrupted"
                             divergence
 10    BACKUP-SYNC
 11    PRIMARY-RECOVER       Recovering ALL bindings from partner
 12    BACKUP-RECOVER
 13    FAILOVER-DISABLED     The server is running with the
                             failover protocol disabled
                             (standalone)
 14    SERVER-PAUSED         The server is inactive, shutting down
                             for a short period
 15    SERVER-SHUTDOWN       The server is inactive, shutting down
                             for an extended period
When a server is being re-started, it should send a DHCPPOLL message
to its partner, reporting its status (SERVER-PAUSED). In response,
the recipient SHOULD go into COMMUNICATION-INTERRUPTED mode.
When a server is being shut down, it should send a DHCPPOLL message
to its partner, reporting its status (SERVER-SHUTDOWN).
In response, the recipient SHOULD go into PARTNER-DOWN mode.
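The status values and the two recommended reactions to a partner's DHCPPOLL can be expressed as follows. This is an illustrative sketch: the enum names replace hyphens with underscores, the function names are hypothetical, and X is taken as 230, the first Failover-specific option number given in Section 3.

```python
from enum import IntEnum
from typing import Optional


class ServerStatus(IntEnum):
    UNKNOWN_STATE = 1
    PRIMARY_NORMAL = 2
    BACKUP_NORMAL = 3
    PRIMARY_COMINT = 4
    BACKUP_COMINT = 5
    PRIMARY_PARTNERDOWN = 6
    BACKUP_PARTNERDOWN = 7
    PRIMARY_CONFLICT = 8
    PRIMARY_SYNC = 9
    BACKUP_SYNC = 10
    PRIMARY_RECOVER = 11
    BACKUP_RECOVER = 12
    FAILOVER_DISABLED = 13
    SERVER_PAUSED = 14
    SERVER_SHUTDOWN = 15


def reaction_to_partner(status: ServerStatus) -> Optional[str]:
    """Mode this server SHOULD enter when a partner's DHCPPOLL reports `status`."""
    if status == ServerStatus.SERVER_PAUSED:
        return "COMMUNICATION-INTERRUPTED"  # partner is restarting
    if status == ServerStatus.SERVER_SHUTDOWN:
        return "PARTNER-DOWN"  # partner is down for an extended period
    return None  # no transition is specified here for other states


def encode_status_option(status: ServerStatus, code: int = 230) -> bytes:
    """DHCP Server Status option: code X, length 1, status value."""
    return bytes((code, 1, status))
```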
3.2. DHCP Binding Status
This option is used to convey the current state of a binding. This
option is mandatory for DHCPBNDUPD messages.
Code Len Type
+-----+-----+-----+
| X+1 | 1 | 1-7 |
+-----+-----+-----+
Legal values for this option are:
Value  Message Type  Description
-----  ------------  ---------------------------------------
  1    FREE          The lease has never been used
  2    ACTIVE        Assigned to a client *
  3    EXPIRED
  4    RELEASED      A client released the lease
  5    ABANDONED     A server or client flagged the address
                     as not usable
  6    RESET         Lease was freed by some external agent
  7    BACKUP        Lease is set aside for the Secondary
                     server's private address pool
3.3. Assigned IP address
Uses identical code and format to DHCP Option 50 (requested IP
address).
Code Len Address
+-----+-----+-----+-----+-----+-----+
| 50 | 4 | a1 | a2 | a3 | a4 |
+-----+-----+-----+-----+-----+-----+
3.4. Lease grant time
This option carries an absolute GMT time value, which is possible
because time synchronization has already been achieved between the
source and the target server using the Sent Time Stamp option. The
value is represented as seconds since January 1, 1970 (i.e., the
ANSI C time_t time value representation).
Code Len Time
+------+-----+-----+-----+-----+-----+
| X+2 | 4 | t1 | t2 | t3 | t4 |
+------+-----+-----+-----+-----+-----+
3.5. Sent Time Stamp
The GMT time at which the packet was sent. It is used to determine
the time drift between the sender and the recipient. The time drift
is defined as the difference between "Arrive Time (GMT)" and "Send
Time (GMT)". The actual packet travel time is assumed to be
negligible in this context. All Date-Time values contained in
Failover messages will be corrected by the time drift before being
stored by the recipient.
Code Len Time
+-----+-----+-----+-----+-----+-----+
| X+3 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+-----+-----+-----+-----+
The time is a 32 bit unsigned long in network byte order, in units of
seconds (GMT since EPOCH).
3.6. Number of addresses transferred to Secondary Server
A 32 bit unsigned long in network byte order. Reports the number of
addresses transferred by the Primary to the Secondary Server (the
addresses to be used for the Secondary Server's private address
pool).
Code Len Count
+-----+-----+-----+-----+-----+-----+
| X+4 | 4 | n1 | n2 | n3 | n4 |
+-----+-----+-----+-----+-----+-----+
3.7. Lease Duration
Uses the format and code of the standard DHCP IP Address Lease Time
option, with the same meaning as when that option is used in a
DHCPOFFER message. The time is in units of seconds, and is specified
as a 32-bit unsigned integer. A Lease Duration of 0xFFFFFFFF
indicates an infinite lease.
Code Len Lease Time
+-----+-----+-----+-----+-----+-----+
| 51 | 4 | t1 | t2 | t3 | t4 |
+-----+-----+-----+-----+-----+-----+
3.8. Client Identifier
The format, code and conventions used are identical to DHCP option
61.
Code Len Type Client-Identifier
+-----+-----+-----+-----+-----+---
| 61 | n | t1 | i1 | i2 | ...
+-----+-----+-----+-----+-----+---
3.9. Client Hardware Address
The format is similar to DHCP option 61. T1 (type) MUST be set to the
proper ARP hardware address code (it MUST NOT be zero). TBD:
Reference the ARP document here.
Code Len Type Hardware-Address
+-----+-----+-----+-----+-----+---
| X+5 | n | t1 | i1 | i2 | ...
+-----+-----+-----+-----+-----+---
Either Client Id, Client Hardware Address or BOTH MAY be present in
binding update transactions. At least one of them MUST be present.
If both are present, the Client Id MUST be used to uniquely identify
the owner of the binding (exactly as in RFC 2131).
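Putting Sections 3.2 through 3.9 together, one DHCPBNDUPD transaction could be assembled as a list of (code, data) pairs ready for a TLV encoder. This is a hypothetical sketch assuming X = 230 and IPv4 addresses; it enforces only the rules stated above: Option 50 first, a mandatory binding status, a nonzero hardware type, and at least one of Client Identifier or Client Hardware Address.

```python
import socket
import struct
from typing import List, Optional, Tuple

X = 230  # first Failover-specific option number (Section 3)
OPT_ASSIGNED_IP, OPT_LEASE_DURATION, OPT_CLIENT_ID = 50, 51, 61
OPT_BINDING_STATUS, OPT_LEASE_GRANT_TIME, OPT_CLIENT_HWADDR = X + 1, X + 2, X + 5
INFINITE_LEASE = 0xFFFFFFFF


def binding_update_options(
    ip: str,
    binding_status: int,
    grant_time: int,
    lease_duration: int,
    client_id: Optional[bytes] = None,
    hw_type: Optional[int] = None,
    hw_addr: Optional[bytes] = None,
) -> List[Tuple[int, bytes]]:
    """Assemble one DHCPBNDUPD transaction as (code, data) option pairs."""
    if client_id is None and hw_addr is None:
        raise ValueError("need a Client Identifier or Client Hardware Address")
    if not 1 <= binding_status <= 7:
        raise ValueError("binding status must be 1..7")
    opts = [
        (OPT_ASSIGNED_IP, socket.inet_aton(ip)),          # must come first
        (OPT_BINDING_STATUS, bytes((binding_status,))),    # mandatory
        (OPT_LEASE_GRANT_TIME, struct.pack("!I", grant_time)),
        (OPT_LEASE_DURATION, struct.pack("!I", lease_duration)),
    ]
    if client_id is not None:
        # Option 61 data: type byte followed by the identifier.
        opts.append((OPT_CLIENT_ID, client_id))
    if hw_addr is not None:
        if not hw_type:
            raise ValueError("hardware type must be a nonzero ARP code")
        opts.append((OPT_CLIENT_HWADDR, bytes((hw_type,)) + hw_addr))
    return opts
```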
3.10. Host Name
Uses the format and code of DHCP option 12.
Code Len Host Name
+-----+-----+-----+-----+-----+-----+-----+-----+--
| 12 | n | h1 | h2 | h3 | h4 | h5 | h6 | ...
+-----+-----+-----+-----+-----+-----+-----+-----+--
3.11. Domain Name
Uses the format and code of DHCP option 15.
Code Len Domain Name
+-----+-----+-----+-----+-----+-----+--
| 15 | n | d1 | d2 | d3 | d4 | ...
+-----+-----+-----+-----+-----+-----+--
3.12. Reject Reason Code
This option is used to selectively reject binding updates. It MAY be
used in a DHCPBNDACK message, always following an Option 50 (the
Option 50 contains the IP address of the specific update being
rejected).
Code Len Reason code
+-----+-----+-----+
| X+6 | 1 | R1 |
+-----+-----+-----+
Reason codes:
1 Illegal IP address (not part of any address pool)
2 Fatal conflict exists: address in use by another client.
3.13. MDLI
Maximum Delta Lease Interval, in seconds. A 32 bit integer value,
in network byte order.
Code Len Time
+------+-----+-----+-----+-----+-----+
| X+7 | 4 | t1 | t2 | t3 | t4 |
+------+-----+-----+-----+-----+-----+
4. Exchange of control between Primary and Secondary
The Primary and Secondary Servers coordinate the exchange of control
over the bindings database through the use of DHCPPOLL and DHCPCTLREQ
messages. In normal operation:
The Primary sends notification of each change to its bindings data-
base to the Secondary, and the Secondary keeps its bindings database
synchronized with the Primary's database.
The Secondary periodically sends DHCPPOLL messages to the Primary,
and the Primary responds to each DHCPPOLL message with a DHCPPRPL
message. If the Secondary does not receive a DHCPPRPL response
message, the Secondary takes control of the bindings database and
begins answering requests from DHCP clients. Note that it should be
possible to configure the Secondary not to perform the automatic
switch-over.
The conditions under which a Secondary takes control of the bindings
database, e.g., the number of consecutive missing DHCPPRPL replies,
should be configurable in the Secondary by the DHCP administrator.
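The switch-over rule can be sketched as a small counter of consecutive missing DHCPPRPL replies. The class name, the missed_limit default, and the auto_takeover flag are assumptions standing in for the administrator-configurable conditions described above; control is released only on DHCPCTLREQ, not merely because replies resume.

```python
class PollMonitor:
    """Tracks DHCPPOLL/DHCPPRPL outcomes on the Secondary (hypothetical)."""

    def __init__(self, missed_limit: int = 3, auto_takeover: bool = True):
        self.missed_limit = missed_limit    # consecutive misses before takeover
        self.auto_takeover = auto_takeover  # False: never switch over alone
        self.consecutive_missed = 0
        self.in_control = False

    def poll_result(self, got_reply: bool) -> bool:
        """Record one DHCPPOLL outcome; return True if the Secondary has control."""
        if got_reply:
            self.consecutive_missed = 0
        else:
            self.consecutive_missed += 1
            if self.auto_takeover and self.consecutive_missed >= self.missed_limit:
                self.in_control = True  # begin answering DHCP client requests
        return self.in_control

    def on_ctlreq(self) -> None:
        """Primary asked for control: stop serving clients, then send DHCPCTLRET."""
        self.in_control = False
        self.consecutive_missed = 0
```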
The Secondary records any changes it makes to the bindings database
while it has control. The Secondary continues to send DHCPPOLL mes-
sages to the Primary. The DHCPPOLL messages also carry information
on the state of the Secondary Server.
To regain control of the bindings database, e.g., after the Primary
Server has recovered from a failure, or a partitioned network condi-
tion, the Primary sends a DHCPCTLREQ message to the Secondary. The
Secondary stops answering DHCP client requests, and responds to its
Primary with a DHCPCTLRET message. After sending the DHCPCTLRET mes-
sage, the Secondary sends DHCPBNDUPD messages for each of the changes
it has made to the bindings database.
The Primary sends a DHCPBNDACK for each DHCPBNDUPD message it
receives. The Secondary completes the transfer of control by sending
a DHCPCTLACK message to the Primary as soon as all of its updates
have been acknowledged.
Note that the Primary SHOULD NOT send any DHCPBNDUPD messages while
synchronization is in progress with the Secondary.
Once synchronization is complete, the Primary transitions into
NORMAL state and starts sending DHCPBNDUPD transactions for any
accumulated binding changes it may have.
5. Duplicate address assignment scenarios
In the following two scenarios, the protocol could end up allocating
duplicate IP addresses unless the measures recommended in Section 6
are taken:
Primary Server crash before "lazy" update: In the case where the
Primary Server sends an ACK to a client for a newly allocated IP
address and then crashes before sending the corresponding update to
the Secondary Server, the Secondary Server will have no record of the
IP address allocation. When the Secondary Server takes over, it may
well try to allocate that IP address to a different client. If the
first client to receive the IP address is not on the network at the
time (even though its lease has not yet expired), an ICMP echo (i.e.,
ping) will not prevent the Secondary Server from allocating that IP
address to a different client.
A more likely and subtle version of this problem is where the Primary
Server crashes after extending a client's lease time, and before
updating the Secondary with a new time using a lazy update. After the
Secondary takes over, if the client is not connected to the network
the Secondary will believe the client's lease has expired when, in
fact, it has not. In this case as well, the IP address might be
reallocated to a different client while the first client is still
using it.
Network partition where servers can't communicate but each can talk
to clients: Several conditions are required for this situation to
occur. First, due to a network failure, the Primary and Secondary
Servers cannot communicate. In addition, some of the DHCP clients
must be able to communicate with the Primary Server, while other
clients can now communicate only with the Secondary Server. When
this condition occurs, both Primary and Secondary Servers could
attempt to allocate IP addresses for new clients from the same pool
of available addresses. At some point, two clients will then end up
being allocated the same IP address. This can cause serious problems
when the network failure that created this situation is corrected.
The next section details how the Failover Protocol prevents either of
the above scenarios (and other related scenarios) from causing dupli-
cate IP address allocation.
6. Duplicate Address Assignment Control
There are several ways that the Failover protocol avoids the possi-
bility of duplicate address assignment.
6.1. Control of lease time
The key problem with lazy update is that when the primary server
fails after updating a client with a particular lease time and before
updating the secondary server, the secondary server will believe that
a lease has expired even though the client still retains a valid
lease on that IP address.
In order to handle this problem, a period of time known as the
"maximum delta lease interval" (MDLI) is defined and must be known to
both the primary and secondary servers. Proper use of this time
interval places an upper bound on the difference allowed between the
lease time provided to a DHCP client and the lease time known by the
secondary server. So that the MDLI does not become the maximum lease
time that the primary can ever provide to a client, during a lazy
update the primary typically gives the secondary a lease time that is
longer than the lease time it gave to the client.
In the case where the secondary needs to take over from the primary,
the secondary will not reallocate any IP address from one client to
a different client. When transitioning to the PARTNER-DOWN state
(where the secondary is allowed to reallocate IP addresses), the
secondary will wait the maximum delta lease interval before
completing the state transition. Thus, any client that has a lease on
an IP address with a lease time greater than that known by the
secondary will either have contacted the secondary during that time
or its lease will have expired.
This protocol requires a DHCP server to deal with several different
lease intervals and places specific restrictions on their relation-
ships. The purpose of these restrictions is to allow the other server
in the pair to be able to make certain assumptions in the absence of
an ability to communicate between servers.
The different lease times are:
o desired client lease interval
The desired client lease interval is the lease interval that
the DHCP server would like to give to the DHCP client in the
absence of any restrictions imposed by the Failover Protocol.
Its determination is outside of the scope of this protocol.
Typically this is the result of external configuration of a
DHCP server.
o actual client lease interval
The actual client lease interval is the lease interval that
the DHCP server gives out to the DHCP client. It may be
shorter than the desired client lease interval (as explained
below).
o Primary Server lease interval
The Primary Server lease interval is the interval after which
the Primary Server believes that the DHCP client's lease will
expire.
o desired Secondary Server lease interval
The desired Secondary Server lease interval is the interval,
communicated by the Primary Server to the Secondary Server,
after which the lease will expire.
o acknowledged Secondary Server lease interval
The acknowledged Secondary Server lease interval is the
interval the Secondary Server has most recently acknowledged.
The key restriction (and guarantee) that the Primary Server makes
with respect to lease intervals is that the actual client lease
interval never exceeds the acknowledged Secondary Server lease
interval (if any) by more than a fixed amount. This fixed amount is
called the "maximum delta lease interval" (MDLI).
The MDLI MAY be configurable, but for correct server operation it
MUST be known to both the Primary and Secondary Servers.
The Primary Server MUST record in its state both the Primary Server
lease interval and the most recently acknowledged Secondary Server
lease interval. It is assumed that the desired client lease interval
can be determined through techniques outside of the scope of this
protocol.
The above lease time descriptions are written for the case where the
Primary server is operating and in communication with the Secondary
server. In the case where the Secondary server is operating out of
communication with the Primary server, the relationships must hold
in the other direction.
The fundamental relationship among these times which MUST be main-
tained is:
actual client lease interval <
( acknowledged other server lease interval + MDLI )
The "acknowledged other server lease interval" is the acknowledged
secondary server lease interval for the Primary server, and it would
be the acknowledged primary server lease interval for the Secondary
server when it is operating out of contact with the Primary server.
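A minimal Python sketch of the fundamental relationship, assuming all intervals are `datetime.timedelta` values and that a server simply grants the largest interval the relationship permits. The function names are hypothetical. The sketch treats the bound as inclusive, following the prose "never exceeds ... by more than the MDLI"; an implementation that follows the strict "<" of the formula would subtract a small margin.

```python
from datetime import timedelta


def actual_client_lease(desired: timedelta, acked_other: timedelta,
                        mdli: timedelta) -> timedelta:
    """Cap the desired client lease interval so that it is never more than
    the MDLI beyond the acknowledged other-server lease interval."""
    return min(desired, acked_other + mdli)


def relationship_holds(actual: timedelta, acked_other: timedelta,
                       mdli: timedelta) -> bool:
    # Inclusive bound; a strict reading of the formula would use "<".
    return actual <= acked_other + mdli
```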
DISCUSSION:
This protocol mandates no particular detailed algorithms concerning
these lease intervals, as long as the above fundamental relationship
is preserved.
In the interests of clarity, however, let's examine a specific
example. The MDLI in this case is 1 hour. The desired client
lease interval is 3 days. In operation this might work as fol-
lows:
When a Primary Server makes an offer for a new lease on an IP
address to a DHCP client, it determines the desired client lease
interval (in this case, 3 days). It then examines the ack-
nowledged Secondary lease interval (which in this case is zero).
Since the actual client lease interval cannot be allowed to
exceed the current Secondary lease interval by more than the MDLI,
the offer made to the DHCP client (the actual client lease
interval) is for (essentially) the MDLI, 1 hour.
Once the Primary Server has performed the ACK to the DHCP client,
it will update the Secondary Server with the lease information.
However, the Secondary Server lease interval will be composed of
the current actual client lease interval + ( 1.5 * desired client
lease interval). Thus, the Secondary Server is updated with a
lease interval of 4.5 days + 1 hour.
When the Primary Server receives an ACK to its update of the
Secondary Server's lease interval, it records that as the ack-
nowledged Secondary Server lease interval. The Primary Server
MUST ensure that the Secondary Server has received and recorded in
its stable storage the Secondary Server lease interval.
When the DHCP client attempts to renew at T1 (approximately half
an hour from the start of the lease), the Primary Server again
determines the desired client lease time, which is still 3 days.
It then compares this with the remaining acknowledged Secondary
Server lease interval (adjusting for the time passed since the
Secondary Server was last updated), which is 4.5 days + 1/2 hour.
The actual client lease interval is set to the desired client
lease interval, as it is less than the acknowledged Secondary
lease interval.
When the Primary DHCP server updates the Secondary DHCP server
after the DHCP client's renewal ACK is complete, it will calculate
the Secondary Server lease interval as the actual client lease
interval (3 days this time) + 0.5 times the desired client lease
interval (1.5 days). In this way, the Primary attempts to have the
Secondary always "lead" the client in its understanding of the
client's lease interval.
Once the initial actual client lease interval of the MDLI is past,
the protocol operates effectively like the DHCP protocol does
today in its behavior concerning lease intervals. However, the
guarantee that the actual client lease interval will never exceed
the acknowledged Secondary Server lease interval by more than the
MDLI allows full recovery from failures in lazy update.
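The arithmetic of this example can be replayed in a short Python sketch. The 1.5 and 0.5 multipliers and the half-hour renewal time come from the example itself; the variable names are hypothetical and, as the DISCUSSION notes, no particular algorithm is mandated by the protocol.

```python
from datetime import timedelta

MDLI = timedelta(hours=1)
DESIRED = timedelta(days=3)


def client_lease(desired, remaining_acked, mdli):
    # Fundamental relationship: at most MDLI beyond the acknowledged interval.
    return min(desired, remaining_acked + mdli)


# Initial allocation: the Secondary has acknowledged nothing yet.
first_lease = client_lease(DESIRED, timedelta(0), MDLI)
# Lazy update after the ACK: actual lease + 1.5 * desired lease.
acked_after_alloc = first_lease + 1.5 * DESIRED

# Renewal at T1, about half an hour into the one-hour lease.
remaining = acked_after_alloc - timedelta(minutes=30)
second_lease = client_lease(DESIRED, remaining, MDLI)
# Lazy update after the renewal: actual lease + 0.5 * desired lease.
acked_after_renewal = second_lease + 0.5 * DESIRED

print(first_lease, acked_after_alloc, remaining, second_lease,
      acked_after_renewal)
```

The first lease comes out at one hour, the Secondary is told 4.5 days + 1 hour, and the renewal grants the full 3 days because the remaining acknowledged interval already exceeds it.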
6.2. Controlled re-allocation of IP addresses
When the servers cannot communicate, neither server will allow an IP
address previously used by one client to be offered to a different
client. As a corollary, during normal operations the primary server
must update the secondary server whenever a lease expires or an IP
address is released, and must receive acknowledgement of that update
before offering the expired or released IP address to a different
client.
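One way for the primary to enforce this rule is to hold each released or expired address in a pending set until the secondary's DHCPBNDACK arrives. A minimal Python sketch; the class and method names are hypothetical, and `send_update` stands in for whatever actually sends the DHCPBNDUPD message.

```python
class ReleasedAddressTracker:
    """Hypothetical Primary-side bookkeeping for released/expired addresses."""

    def __init__(self):
        self.pending_ack = set()  # DHCPBNDUPD sent, no DHCPBNDACK yet
        self.reusable = set()     # acknowledged by the Secondary

    def release(self, ip, send_update):
        # Lease expired or was released: tell the Secondary first.
        self.pending_ack.add(ip)
        send_update(ip)

    def on_bndack(self, ip):
        # Only an acknowledged update makes the address reusable.
        if ip in self.pending_ack:
            self.pending_ack.remove(ip)
            self.reusable.add(ip)

    def can_offer_to_other_client(self, ip):
        return ip in self.reusable
```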
7. Server States
The following server states are defined:
NORMAL State:
NORMAL state is the state used by a server when it can communicate
with the other server in the Primary-Secondary Server pair. When in
this state, the Primary responds to DHCP client requests, while the
Secondary does not.
COMMUNICATION-INTERRUPTED state:
A server goes into this state whenever it is unable to communicate
with the other server. Both the Primary and Secondary Servers can go
into this state, although the behavior changes that result are dif-
ferent. Primary and Secondary Servers cycle automatically (without
administrative intervention) between NORMAL and COMMUNICATION-
INTERRUPTED state as the network connection between them fails and
recovers, or as the partner server cycles between operational and
non-operational. No duplicate IP address allocation can occur while
the servers cycle between these states. In this state both servers
may respond to DHCP client requests. When allocating new IP
addresses, each server allocates from a different pool. When respond-
ing to renewal requests, each server will allow continued renewal of
a DHCP client's current lease on an IP address.
PARTNER-DOWN state:
PARTNER-DOWN state is a state either server can enter. Once a server
has entered NORMAL state, the PARTNER-DOWN state is entered only on
command of an external agency (typically an administrator of some
sort) or after the expiration of an externally configured minimum
safe-time after the beginning of COMMUNICATION-INTERRUPTED state.
When in this state, the server no longer assumes that the other
server could still be operational and servicing a different set of
clients, but instead assumes that it is the only server operating.
Only one server should be operating in this state at a time. The
server in this state will respond to DHCP client requests. It will
allow renewal of all outstanding leases on IP addresses, will
allocate IP addresses from its own pool, and, after a fixed period of
time, will allocate IP addresses from the set of all available IP
addresses. The server will transition out of PARTNER-DOWN state after
automatic re-integration with the companion server is complete. This
automatic re-integration will typically be initiated by the restart
of the server which was down.
POTENTIAL-CONFLICT state:
This state indicates that the two servers are attempting to rein-
tegrate with each other, but at least one of them was running in a
state that did not guarantee automatic reintegration would be possi-
ble. In POTENTIAL-CONFLICT state the servers may determine that the
same IP address has been offered and accepted by two different DHCP
clients.
RECOVER state:
This state indicates that the server has no information in its stable
storage. A server in this state will attempt to refresh its stable
storage from the other server.
SYNC state:
In this state, the Secondary Server attempts to synchronize its
stable storage with the Primary Server. Both the Primary and Secon-
dary may have information that the other lacks.
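For reference, the six states can be captured in a Python enumeration together with whether each role answers DHCP clients in that state. The responsiveness sets are taken from the "responsive"/"unresponsive" legends in Figures 8.2-1 and 9.2-1 and from the opening of Section 9; the identifiers themselves are hypothetical. SYNC appears only in the Secondary's diagram.

```python
from enum import Enum


class State(Enum):
    NORMAL = "NORMAL"
    COMMUNICATION_INTERRUPTED = "COMMUNICATION-INTERRUPTED"
    PARTNER_DOWN = "PARTNER-DOWN"
    POTENTIAL_CONFLICT = "POTENTIAL-CONFLICT"
    RECOVER = "RECOVER"
    SYNC = "SYNC"  # used by the Secondary only


# States in which each role responds to DHCP client requests.
RESPONSIVE = {
    "primary": {State.NORMAL, State.COMMUNICATION_INTERRUPTED,
                State.PARTNER_DOWN, State.POTENTIAL_CONFLICT},
    "secondary": {State.COMMUNICATION_INTERRUPTED, State.PARTNER_DOWN},
}


def answers_clients(role: str, state: State) -> bool:
    return state in RESPONSIVE[role]
```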
8. Primary Server Operation
This section discusses the operation of the primary server using the
state transition diagram in Figure 8.2-1.
8.1. Primary Server Initialization
When the Primary Server starts, there are three possibilities: it
has never started before and therefore has no record of any previous
state nor of any client binding information; it has started before
and has a record of a previous state and possibly of some client
binding information; it has started before, but failed catastrophi-
cally, and now has no record of any previous state (nor of any client
binding information).
When the Primary Server starts, if it has any record of a previous
state, then if that state was NORMAL or COMMUNICATION-INTERRUPTED it
moves to COMMUNICATION-INTERRUPTED state. If that state was
PARTNER-DOWN or POTENTIAL-CONFLICT, then it moves to PARTNER-DOWN
state. If that state was RECOVER, then the Primary Server moves into
the RECOVER state.
Droms, et. al. [Page 25]
DRAFT January 1998
If it has no record of any previous state, then either this is an
initial startup, or a recovery from a catastrophic failure where
stable storage and all client binding information was lost. These are
distinguished by recovery from a catastrophic failure being indicated
by some external configuration indication to the Primary Server.
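The startup rules can be summarised as a small Python function. This is a sketch under stated assumptions: `catastrophic_recovery` is a hypothetical name for the external configuration indication, and the mapping of a first-ever start (no record, no indication) to PARTNER-DOWN is read from the "<none>" entry in Figure 8.2-1 rather than from the prose.

```python
from typing import Optional


def primary_start_state(previous: Optional[str],
                        catastrophic_recovery: bool = False) -> str:
    """Choose the Primary's state at startup.

    previous: last state recorded in stable storage, or None if none.
    catastrophic_recovery: hypothetical flag for the external indication
    that stable storage was lost.
    """
    if previous in ("NORMAL", "COMMUNICATION-INTERRUPTED"):
        return "COMMUNICATION-INTERRUPTED"
    if previous in ("PARTNER-DOWN", "POTENTIAL-CONFLICT"):
        return "PARTNER-DOWN"
    if previous == "RECOVER":
        return "RECOVER"
    if previous is not None:
        raise ValueError(f"unexpected recorded state: {previous}")
    # No record: a catastrophic-failure recovery goes to RECOVER; a first
    # start appears to go to PARTNER-DOWN ("<none>" in Figure 8.2-1).
    return "RECOVER" if catastrophic_recovery else "PARTNER-DOWN"
```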
8.2. Primary Server State Transitions
Figure 8.2-1 is the diagram of the Primary Server's state transi-
tions. The remainder of this section contains information important
to the understanding of that diagram.
The server stays in the current state until all of the actions
specified on the state transition are complete. If communication
fails during one of the actions, the server simply stays in the
current state and attempts a transition whenever the conditions for a
transition are later fulfilled.
In the state transition diagram below, the "+" or "-" in the upper
right corner of each state is a notation about whether communication
is ongoing with the Secondary Server. The legend "responsive" and
"unresponsive" in each state indicates whether the Primary Server is
responsive to DHCP client requests in the respective state.
In the state transition diagram below, when communication is
reestablished between the Primary and Secondary Server, the Primary
server must record the state of the Secondary Server when
communication was reestablished.
If the state of the Secondary Server changes while communicating,
then the Primary Server moves through the communications-failed tran-
sition, and into whatever state results. It then immediately moves
through whatever state transition is appropriate given the current
state of the Secondary Server.
DISCUSSION:
The point of this technique is simplicity, both in explanation of
the protocol and in its implementation. The alternative to this
technique of memory of partner state and automatic state transi-
tion on change of partner state is to have every state in the fol-
lowing diagram have a state transition for every possible state of
the partner. With the approach adopted, only the states in which
communications are reestablished require a state transition for
each possible partner state.
All state transitions of the Primary Server must be recorded in its
stable storage, and thus be available to the server after a server
restart.
Previous Primary State:
NORMAL or RECOVER PARTNER DOWN
COMMUNICATION <ext. cmd> POTENTIAL CONFLICT
INTERRUPTED | <none>
+---+ V |
| +----------------+ +-----------------+
| | - | | - |
| | RECOVER | | PARTNER DOWN |<-----+
| | (unresponsive) | | (responsive) | |
| +----------------+ +-----------------+ |
| | | | ^ |
| Comm. OK | Comm. OK | |
| Sec. State: | Sec. State: Comm. |
| | | V All Others Failed |
| | RECOVER +<---+ V | |
| All | | +-------------+ |
| Others | Comm. OK | POTENTIAL +| |
| | Note Sec. State: | CONFLICT | |
| | Poss. RECOVER |(responsive) |<---- | --+
| V Error NORMAL +-------------+ | |
| Sec->Pri | Pri->Sec | | |
| Sync | Sync. Resolve Conflict | |
| | | V V | |
| Wait MDLI | +-----------------+ | |
| from Fail. | | + | External | |
| V V | NORMAL |-Command-->+ |
| +-----++------>| (responsive) | | |
| ^ +-----------------+ | |
| | | | |
| Pri<->Sec Comm. External |
| Sync Failed Command |
| | | or |
| Comm. OK | "Safe Period" |
| Sec. State: V expiration |
| NORMAL +-----------------+ | |
| COMM. INT. | - |---------->+ |
| RECOVER------| COMMUNICATIONS | |
| | INTERRUPTED | Comm. OK |
+------------------>| (responsive) |--Sec. State:--+
+-----------------+ All Others
Figure 8.2-1: Primary Server state diagram.
8.3. Primary Server in PARTNER-DOWN state
When it is in PARTNER-DOWN state, the Primary Server operates largely
as does a normal DHCP server, with none of the special algorithms
described below. In PARTNER-DOWN state the Primary Server MUST
respond to DHCP client requests.
Any available IP address tagged as belonging to the Secondary Server
(at entry to PARTNER-DOWN state) MUST NOT be used until the MDLI
beyond the entry into PARTNER-DOWN state has elapsed.
The Primary Server MUST NOT allocate an IP address to a DHCP client
different from that to which it was allocated at the entrance to
PARTNER-DOWN state until the MDLI beyond its expiration time has
elapsed. If this time would be earlier than the current time plus
the MDLI, then the current time plus the MDLI is used.
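A minimal Python sketch of these two timing rules. It assumes that "current time" in the preceding paragraph means the moment of entry into PARTNER-DOWN state (an interpretation, not stated explicitly), and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def earliest_reallocation(lease_expiry: datetime,
                          entered_partner_down: datetime,
                          mdli: timedelta) -> datetime:
    """Earliest time an address bound to a client at entry to PARTNER-DOWN
    may be allocated to a different client."""
    # MDLI past the lease expiration, but never sooner than MDLI past the
    # entry into PARTNER-DOWN state (assumed meaning of "current time").
    return max(lease_expiry + mdli, entered_partner_down + mdli)


def secondary_pool_usable_at(entered_partner_down: datetime,
                             mdli: timedelta) -> datetime:
    # Available addresses tagged as belonging to the Secondary.
    return entered_partner_down + mdli
```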
Two options exist for lease times, with different ramifications flow-
ing from each.
If the Primary Server wishes the Failover Protocol to protect it from
loss of stable storage in any state, then it should ensure that the
MDLI based lease time restrictions in Section 6.1 are maintained,
even in PARTNER-DOWN state.
If the Primary Server wishes to forego the protection of the Failover
Protocol in the event of loss of stable storage, then it need recog-
nize no restrictions on actual client lease times while in PARTNER-
DOWN state.
The Primary Server MUST poll the Secondary Server and attempt to
establish communications and synchronization with it.
Once the Primary succeeds in contacting the Secondary Server, the
Primary examines the state of the Secondary Server. If the state of
the Secondary Server is RECOVER or NORMAL, then both servers have
been running in such a way that duplicate IP address allocations were
inhibited. In this case, the Primary Server updates the Secondary
Server with its client binding information, and moves into the NORMAL
state.
Once contact has been established, if the state of the Secondary
Server is anything other than RECOVER or NORMAL then the Primary
Server moves into the POTENTIAL-CONFLICT state.
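The decision a Primary in PARTNER-DOWN makes on regaining contact can be sketched in Python as follows. `push_bindings` is a hypothetical callable that sends the Primary's client bindings and returns True once all are acknowledged; the "remain in PARTNER-DOWN on failure" branch reflects the rule in Section 8.2 that a server stays in its current state if communication fails during a transition.

```python
from typing import Callable


def primary_partner_down_on_contact(secondary_state: str,
                                    push_bindings: Callable[[], bool]) -> str:
    """Next state for a Primary in PARTNER-DOWN that has reached the
    Secondary."""
    if secondary_state in ("RECOVER", "NORMAL"):
        # Duplicate allocations were inhibited on both sides: synchronize.
        if push_bindings():
            return "NORMAL"
        # Communication failed mid-update: remain in the current state.
        return "PARTNER-DOWN"
    return "POTENTIAL-CONFLICT"
```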
8.4. Primary Server in RECOVER state
When the Primary Server is initialized in the RECOVER state it
expects to refresh its stable storage from an existing Secondary
Server. In this state the Primary Server MUST NOT respond to DHCP
client requests.
When the Primary Server succeeds in contacting the Secondary Server,
if it determines that the Secondary Server is itself in the RECOVER
state (which indicates that the Secondary Server has no existing
client binding information), the Primary Server will move directly
into NORMAL state after signaling an error (since an operator had to
explicitly start the Primary Server in RECOVER state to refresh its
lost client binding information from the Secondary, and the Secondary
had no state).
If the Primary Server determines that the Secondary Server is in any
state other than RECOVER, then the Secondary Server has some client
binding information that the Primary Server needs before it moves
into the NORMAL state. The Primary Server will attempt to refresh
its state from the Secondary Server, and it will remain in the
RECOVER state until it is successful in doing so.
The Primary Server MUST remain in RECOVER state until a period of at
least the MDLI has passed since the Primary Server was known to have
failed. This allows any DHCP client holding an IP address allocated by
the Primary Server, before the Primary Server lost its client binding
information from stable storage, either to contact the Secondary
Server or to let its lease time out.
DISCUSSION:
The actual requirement on this wait period in RECOVER is that it
start when the Primary Server went down, not necessarily when it
came back up. If the time when the Primary Server failed is
known, then it could be communicated to the recovering server, and
the wait period could be reduced to the MDLI less the difference
between the current time and the time the server failed. In this
way, the waiting period could be minimized.
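The reduced waiting period suggested in this DISCUSSION can be computed as in the following Python sketch. The function name is hypothetical, and counting the MDLI from the start of RECOVER when the failure time is unknown is an assumed conservative default.

```python
from datetime import datetime, timedelta
from typing import Optional


def recover_wait_remaining(now: datetime, mdli: timedelta,
                           started_at: datetime,
                           failed_at: Optional[datetime] = None) -> timedelta:
    """Time the Primary must still spend in RECOVER before leaving it.

    If the failure time is known, the MDLI is counted from the failure;
    otherwise it is counted conservatively from when RECOVER began."""
    start = failed_at if failed_at is not None else started_at
    return max(timedelta(0), start + mdli - now)
```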
8.5. Primary Server in NORMAL state
When in NORMAL state, the Primary Server takes the following actions
to implement the Safe Failover Protocol:
o Lease Time Calculations
As discussed in Section 6.1, "Control of lease time", the
lease interval given to a DHCP client can never be more than
the maximum delta lease interval greater than the acknowledged
Secondary Server lease interval.
As long as the Primary Server adheres to this constraint, the
specifics of the lease intervals that it gives to either the
DHCP client or the Secondary DHCP server are implementation
dependent. One possible approach is shown in Section 6.1, but
that particular approach is in no way required by this proto-
col.
o Lazy Update of Secondary Server
After an ACK of an IP address binding, the Primary Server
attempts to update the Secondary with the binding information.
The lease time used in the update of the Secondary MUST be at
least that given to the DHCP client in the DHCPACK. It MAY,
however, be longer.
o Reallocation of IP Addresses Between Clients
Whenever a client binding is released, a DHCPBNDUPD message
must be sent to the Secondary Server, setting the binding
state to RELEASED. However, until a DHCPBNDACK is received for
this message, the IP address cannot be allocated to another
client.
8.6. Primary Server in COMMUNICATION-INTERRUPTED state
When in COMMUNICATION-INTERRUPTED state the Primary Server operates
in such a way that correct operation is ensured even if the Secondary
Server is still up and operational, but unable to communicate with
the Primary Server. When communications are reestablished between the
Primary and Secondary Servers, if both are still in COMMUNICATION-
INTERRUPTED state, then the re-integration of their operation will
proceed automatically and without human intervention. The protocol
is designed to ensure that reintegration will proceed in an
error-free manner and that no actions taken by either server while in
COMMUNICATION-INTERRUPTED state will cause problems during
reintegration.
The Primary Server operates in COMMUNICATION-INTERRUPTED state as it
does in NORMAL state.
However, since it cannot communicate with the Secondary in this
state, the acknowledged-Secondary-lease-time will not be updated in
any new bindings. This is likely to eventually cause the actual-
client-lease-times to be the current-time plus the MDLI (unless this
is greater than the desired-client-lease-time).
The Primary Server can simply queue updates to the Secondary on com-
munication interruption and stay in the NORMAL state. If, at the time
communication with the Secondary is reestablished, the Secondary
remains in the NORMAL state as well, then the queued updates for the
Secondary will simply be processed.
COMMUNICATION-INTERRUPTED state for the Primary Server is a signal
that it has stopped queuing updates to the Secondary, and is able to
respond to a variety of possible Secondary states.
It is anticipated that some alarm condition would be raised upon the
transition from NORMAL state to COMMUNICATION-INTERRUPTED state. Once
the Primary Server has been in COMMUNICATION-INTERRUPTED state for a
period equal to the safe-period, then it can (if configured to do so)
transition into the PARTNER-DOWN state. An external command may also
force a transition to PARTNER-DOWN state.
9. Secondary Server Operation
The Secondary Server responds to DHCP client requests only in the
PARTNER-DOWN and COMMUNICATION-INTERRUPTED states.
9.1. Secondary Server Initialization
When the Secondary Server starts, there are three possibilities: it
has never started before and therefore has no record of any previous
state nor of any client binding information; it has started before
and has a record of a previous state and possibly of some client
binding information; it has started before, but failed catastrophi-
cally, and now has no record of any previous state (nor of any client
binding information).
When the Secondary Server starts, if it has any record of a previous
state, then if that state was NORMAL, COMMUNICATION-INTERRUPTED, or
SYNC, it moves to COMMUNICATION-INTERRUPTED state. If that state was
PARTNER-DOWN or POTENTIAL-CONFLICT, then it moves to PARTNER-DOWN
state. In all other cases (both other previous states and the cases
where there is no record of a previous state), the Secondary Server
moves into the RECOVER state.
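Unlike the Primary, the Secondary's startup rule is fully stated in the prose and fits in a few lines of Python. A minimal sketch; the function name is hypothetical.

```python
from typing import Optional


def secondary_start_state(previous: Optional[str]) -> str:
    """Choose the Secondary's state at startup from its last recorded state
    (None if stable storage holds no record)."""
    if previous in ("NORMAL", "COMMUNICATION-INTERRUPTED", "SYNC"):
        return "COMMUNICATION-INTERRUPTED"
    if previous in ("PARTNER-DOWN", "POTENTIAL-CONFLICT"):
        return "PARTNER-DOWN"
    # All other previous states, and no record at all.
    return "RECOVER"
```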
9.2. Secondary Server State Transitions
The server stays in the current state until all of the actions
specified on the state transition are complete. If communication
fails during one of the actions, the server simply stays in the
current state and attempts a transition whenever the conditions for
a transition are later fulfilled.
In the state transition diagram below, the "+" or "-" in the upper
right corner of each state is a notation about whether communication
is ongoing with the Primary Server. The legend "responsive" and
"unresponsive" in each state indicates whether the Secondary Server
is responsive to DHCP client requests in the respective state.
In the state transition diagram below, when communication is
reestablished between the Secondary and Primary Server, the Secondary
Server must record the state of the Primary Server when communication
was reestablished. If the state of the Primary Server changes while
communicating, then the Secondary Server moves through the
communications-interrupted transition, and into whatever state
results. It then immediately moves through whatever state transition
is appropriate for the current state of the Primary Server.
All state transitions of the Secondary Server must be recorded in its
stable storage, and thus be available to the server after a server
restart.
Previous Secondary State:
NORMAL RECOVER PARTNER DOWN
COMM. INT. <none> POTENTIAL CONFLICT
SYNC | |
+---+ V V
| +----------------+ +-----------------+
| | RECOVER - | | PARTNER DOWN - |<-----+
| | (unresponsive) | | (responsive) | |
| +----------------+ +-----------------+ |
| | | | ^ |
| Comm. OK | Comm. OK | |
| Pri. State: | Pri. State: Comm. |
| | | V All Others Failed |
| | RECOVER +<---+ V | |
| | | | +--------------+ |
| | | Comm. OK | POTENTIAL + | |
| All | Pri. State: | CONFLICT | |
| Others | RECOVER |(unresponsive)|<--- | --+
| | Note | +--------------+ | |
| | Poss. Sec->Pri | | |
| V Error Sync. Resolve Conflict | |
| Pri->Sec | V V | |
| Sync | +-----------------+ | |
| V V | NORMAL + |-External->+ |
| +-----++------>| (unresponsive) | Command | |
| ^ +-----------------+ | |
| Pri<->Sec | ^ | |
| Sync | Start Alloc Timer | |
| | | Sec->Pri | |
| +--------------+ | Sync | |
| | + |--->+ | External |
| | SYNC | Comm. Comm. OK Command |
| | unresponsive | Failed Pri. State: or |
| +--------------+ | RECOVER "Safe Period" |
| ^ V | expiration |
| | +------------------+ | |
| Comm. OK | COMMUNICATIONS - |---------->+ |
| Pri. State: | INTERRUPTED | Comm. OK |
| NORMAL-----| (responsive) |--Pri. State:--+
| COMM. INT. +------------------+ All Others
| ^
+---------------------+
Figure 9.2-1: Secondary Server State Diagram.
9.3. Secondary Server in RECOVER state
The Secondary DHCP server comes up in the RECOVER state when it has
no record of any previous state (or that previous state was RECOVER).
It stays in this state until it establishes communication with the
Primary Server, and is unresponsive to DHCP client requests in this
state. Essentially it is idle until it can contact the Primary
Server.
When it establishes communication with the Primary Server, it
attempts to load its client binding database from that of the Primary
Server using the techniques specified in section 6.
Once the Secondary Server's client binding database is refreshed from
that of the Primary, the Secondary Server moves into NORMAL state.
9.4. Secondary Server in NORMAL state
In NORMAL state, the Secondary Server receives state updates from the
Primary Server in DHCPBNDUPD messages. It records these in its
client binding database in stable storage and then sends the
corresponding DHCPBNDACK message to the Primary Server.
While in NORMAL state, the Secondary Server MUST also acquire a
series of IP addresses from the Primary Server to be used to satisfy
DHCPDISCOVER requests from DHCP clients when in
COMMUNICATION-INTERRUPTED state. See Section 2.2.2 for details of
this acquisition process.
The Secondary Server periodically polls the Primary Server with the
DHCPPOLL message. If it fails to receive a DHCPPRPL message in reply
after a configured number of retries or some administratively deter-
mined time, the Secondary Server transitions into COMMUNICATION-
INTERRUPTED state. Both the DHCPPOLL and DHCPPRPL messages carry the
current status of the sender.
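The polling rule can be sketched in Python as a loop over consecutive missed replies. `send_poll` is a hypothetical callable that sends a DHCPPOLL and returns the state carried in the DHCPPRPL reply, or None on timeout; `max_missed` stands for the configured number of retries. Real implementations would also pace the polls in time, which this sketch omits.

```python
from typing import Callable, Optional, Tuple


def poll_partner(send_poll: Callable[[], Optional[str]],
                 max_missed: int) -> Tuple[str, Optional[str]]:
    """Return (new Secondary state, Primary state from the reply)."""
    missed = 0
    while missed < max_missed:
        reply = send_poll()
        if reply is not None:
            # DHCPPRPL received: stay in NORMAL and note the Primary's state.
            return ("NORMAL", reply)
        missed += 1
    # Too many consecutive polls went unanswered.
    return ("COMMUNICATION-INTERRUPTED", None)
```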
If an external command is received by the Secondary Server, it can
move from NORMAL to PARTNER-DOWN state directly. Such a command
might be sent when the Primary Server was removed from service, and
an operator wanted the Secondary Server to take over immediately and
completely from the Primary Server. (Note that the Secondary Server
takes over from the Primary Server when in COMMUNICATION-INTERRUPTED
state, but less completely than in PARTNER-DOWN state.)
9.5. Secondary Server in COMMUNICATION-INTERRUPTED state
When in COMMUNICATION-INTERRUPTED state the Secondary Server operates
in such a way that correct operation is ensured even if the Primary
Server is still up and operational, but unable to communicate to the
Secondary Server. When communications are reestablished between the
Primary and Secondary Servers, if both are still in COMMUNICATION-
INTERRUPTED state, then the re-integration of their operation will
proceed automatically and without human intervention. The protocol
is designed to ensure that reintegration will proceed in an error
free manner and that no actions taken by either server while in
COMMUNICATION-INTERRUPTED state will cause any conflicts to occur
during re-integration.
In COMMUNICATION-INTERRUPTED state, the Secondary Server responds to
DHCP client requests.
When processing a DHCPREQUEST from a DHCP client, the Secondary
Server MUST ensure that the client-lease-time is never more than the
maximum-delta-lease-interval (MDLI) from the current-time,
independent of the desired-client-lease-time.
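As a rough illustration, this lease-time restriction amounts to
taking a minimum. The sketch assumes that both client-lease-time and
desired-client-lease-time are absolute expiration times in seconds;
the function and parameter names are hypothetical.

```python
def capped_lease_end(current_time: int, desired_client_lease_time: int,
                     mdli: int) -> int:
    """Absolute lease expiration the Secondary Server may grant in
    COMMUNICATION-INTERRUPTED state (all values in seconds, assumed)."""
    # Never more than one MDLI beyond the current time, whatever the
    # desired expiration is.
    return min(desired_client_lease_time, current_time + mdli)
```

For example, with a current-time of 1000, an MDLI of 3600, and a
desired expiration of 5000, the grant would end at 4600.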
When processing a DHCPRELEASE request from a DHCP client or the
expiration of a lease, the Secondary Server must not reallocate the
IP address to a different client. If the same client subsequently
performs a DHCPDISCOVER request, the Secondary Server SHOULD offer it
the previously used IP address.
When processing a DHCPDISCOVER request from a DHCP client, the secon-
dary MUST allocate IP addresses from the list of IP addresses that it
acquired from the Primary Server in RECOVER state. When it exhausts
this list, it MUST stop responding to DHCPDISCOVER requests (except
those it can satisfy by offering expired or released IP addresses to
their previously bound clients).
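The allocation rules for COMMUNICATION-INTERRUPTED state can be
illustrated with a small allocator. In this sketch (all names
hypothetical), an offer is treated as a binding for simplicity,
addresses are plain strings, and the acquired list stands for the
addresses obtained from the Primary Server in RECOVER state.

```python
from typing import Dict, List, Optional


class InterruptedAllocator:
    """Hypothetical Secondary-side allocator for COMMUNICATION-INTERRUPTED."""

    def __init__(self, acquired: List[str]):
        self.free = list(acquired)        # acquired from the Primary in RECOVER
        self.bound: Dict[str, str] = {}   # client_id -> address in use
        self.held: Dict[str, str] = {}    # client_id -> released/expired address

    def release(self, client_id: str) -> None:
        # Released or expired addresses are never given to a different
        # client; they are held for their previous owner.
        address = self.bound.pop(client_id, None)
        if address is not None:
            self.held[client_id] = address

    def discover(self, client_id: str) -> Optional[str]:
        if client_id in self.bound:
            return self.bound[client_id]
        if client_id in self.held:        # SHOULD offer the previous address
            address = self.held.pop(client_id)
            self.bound[client_id] = address
            return address
        if self.free:
            address = self.free.pop(0)
            self.bound[client_id] = address
            return address
        return None                       # list exhausted: do not respond
```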
The Secondary Server MUST continue to send DHCPPOLL messages to the
Primary Server when in COMMUNICATION-INTERRUPTED state. If it
receives a DHCPPRPL message in reply, the Secondary Server determines
the state of the Primary Server. If the Primary Server is in NORMAL
or COMMUNICATION-INTERRUPTED state, then the Secondary Server moves
into the SYNCH state.
If, however, the Primary Server is in RECOVER state, then the Secon-
dary Server updates the Primary Server with its known client binding
information, and moves into NORMAL state upon completion of that
update.
If instructed to by an outside agency (e.g., an administrator), the
Secondary Server SHOULD move into PARTNER-DOWN state. Once the
Secondary Server has been in COMMUNICATION-INTERRUPTED state for a
period equal to the safe-period, then it may (if configured to do so)
transition into the PARTNER-DOWN state in the absence of an external
command.
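The transitions out of COMMUNICATION-INTERRUPTED state described
above are summarized in the following sketch. The function and
parameter names are hypothetical: primary_state is None when no
DHCPPRPL arrived, and safe_period is None when automatic transition
is not configured. Giving a received DHCPPRPL precedence over an
administrative command is a choice made here, not something this
draft specifies. Primary states not covered above are treated as if
no reply had been received.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    NORMAL = "NORMAL"
    COMMUNICATION_INTERRUPTED = "COMMUNICATION-INTERRUPTED"
    SYNCH = "SYNCH"
    RECOVER = "RECOVER"
    PARTNER_DOWN = "PARTNER-DOWN"


def next_state_interrupted(primary_state: Optional[State],
                           admin_partner_down: bool,
                           interrupted_for: float,
                           safe_period: Optional[float]) -> State:
    """Next state of a Secondary Server in COMMUNICATION-INTERRUPTED."""
    if primary_state in (State.NORMAL, State.COMMUNICATION_INTERRUPTED):
        return State.SYNCH
    if primary_state is State.RECOVER:
        # Assumes the binding update to the Primary has completed.
        return State.NORMAL
    if admin_partner_down:
        return State.PARTNER_DOWN
    if safe_period is not None and interrupted_for >= safe_period:
        return State.PARTNER_DOWN
    return State.COMMUNICATION_INTERRUPTED
```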
9.6. Secondary Server in SYNCH state
The Secondary Server does not respond to DHCP client requests when in
SYNCH state.
DISCUSSION:
This is the entire reason for this state's existence; otherwise, the
activities specified for this state could happen as part of a
state transition from the COMMUNICATION-INTERRUPTED state to the
NORMAL state. However, in the COMMUNICATION-INTERRUPTED state the
Secondary Server responds to DHCP client requests. Having the
Secondary Server respond to DHCP client requests during the
synchronization process (and thus take actions requiring further
synchronization) seemed like a bad idea.
The Secondary Server synchronizes its information with the Primary
Server while in SYNCH state. Both Primary and Secondary Servers may
have information the other lacks because of operations performed
while communications were interrupted.
During the synchronization process, the Secondary Server continues to
poll the Primary Server with DHCPPOLL messages. If it fails to
receive a reply, it moves back into COMMUNICATION-INTERRUPTED state.
When synchronization is complete, the Secondary Server moves into
NORMAL state.
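The SYNCH-state rules reduce to a three-way decision after each
DHCPPOLL. The flags below are hypothetical inputs; how
synchronization progress is tracked is not specified here.

```python
from typing import Optional


def synch_step(got_prpl: bool, sync_complete: bool) -> str:
    """Next state for a Secondary Server in SYNCH (hypothetical inputs)."""
    if not got_prpl:
        # Contact lost mid-synchronization: fall back and serve clients again.
        return "COMMUNICATION-INTERRUPTED"
    if sync_complete:
        return "NORMAL"
    return "SYNCH"


def handle_client_request_in_synch(request: bytes) -> Optional[bytes]:
    # No reply while in SYNCH, so no new bindings appear while the two
    # servers are reconciling their information.
    return None
```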
9.7. Secondary Server in PARTNER-DOWN state
The Secondary Server responds to DHCP client requests when in
PARTNER-DOWN state.
Any available IP address which does not belong to the private pool
established by the Secondary Server (at entry to PARTNER-DOWN state)
MUST NOT be used until the MDLI beyond the entry into PARTNER-DOWN
state has elapsed.
The Secondary Server MUST NOT allocate an IP address to a DHCP client
different from that to which it was allocated at the entrance to
PARTNER-DOWN state until the MDLI beyond its expiration time has
elapsed. If this time would be earlier than the current time plus the
MDLI, then the current time plus the MDLI is used.
Two options exist for lease times, with different ramifications flow-
ing from each.
If the Secondary Server wishes the Failover Protocol to protect it
from loss of stable storage in any state, then it should ensure that
the MDLI-based lease time restrictions in Section 6.1 are maintained,
even in PARTNER-DOWN state.
If the Secondary Server wishes to forgo the protection of the safe
Failover Protocol in the event of loss of stable storage, then it MAY
recognize no restrictions on actual client lease times while in
PARTNER-DOWN state.
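The two lease-time options can be expressed as a single policy flag.
This sketch assumes that the Section 6.1 restriction is the same cap
(lease expiration no later than current time plus MDLI) used in
COMMUNICATION-INTERRUPTED state. Section 6.1 itself is not reproduced
here, and the names are hypothetical.

```python
def partner_down_lease_end(now: int, desired_end: int, mdli: int,
                           protect_stable_storage: bool) -> int:
    """Lease expiration granted in PARTNER-DOWN state (seconds, assumed)."""
    if protect_stable_storage:
        # Keep the MDLI cap so loss of stable storage stays survivable.
        return min(desired_end, now + mdli)
    # Forgo that protection: no restriction on the client lease time.
    return desired_end
```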
The Secondary Server continues to poll the Primary Server with
DHCPPOLL messages. If the Secondary Server receives a reply, and the
Primary Server is in the RECOVER state, the Secondary Server updates
the Primary Server with all of the Secondary's client binding infor-
mation, and then moves into the NORMAL state.
If communications with the Primary Server are reestablished, and the
Primary Server is in any other state but RECOVER, the Secondary
Server moves into the POTENTIAL-CONFLICT state (as does the Primary
Server).
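The PARTNER-DOWN reaction to a DHCPPOLL reply can be sketched as
follows. The name is hypothetical, None means no DHCPPRPL arrived,
and the binding update sent to a recovering Primary Server is not
shown.

```python
from typing import Optional


def partner_down_on_poll(primary_state: Optional[str]) -> str:
    """Next state for a Secondary Server in PARTNER-DOWN."""
    if primary_state is None:
        return "PARTNER-DOWN"          # still no contact
    if primary_state == "RECOVER":
        # After sending all of the Secondary's binding information.
        return "NORMAL"
    return "POTENTIAL-CONFLICT"        # any other Primary state
```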
9.8. Secondary Server in POTENTIAL-CONFLICT state
The Secondary Server enters POTENTIAL-CONFLICT state when the
combination of its state and that of the Primary Server indicates
that a conflict of IP address allocation may have occurred. There is
no guarantee that such a conflict has occurred, only the possibility.
In this state each server compares its client binding information
with that of the other server, and any conflicts are resolved in an
implementation-dependent manner.
When (and if) the resolution process completes, each server moves
into the NORMAL state.
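Because conflict resolution is left to each implementation, the
sketch below shows only one possible detection step. It assumes each
server's bindings are available as a map from IP address to client
identifier (a hypothetical format) and reports the addresses that the
two servers have bound to different clients.

```python
from typing import Dict, List, Tuple


def find_conflicts(mine: Dict[str, str],
                   theirs: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (address, my_client, their_client) for each address the
    two servers have bound to different clients."""
    conflicts = []
    for address in sorted(mine.keys() & theirs.keys()):
        if mine[address] != theirs[address]:
            conflicts.append((address, mine[address], theirs[address]))
    return conflicts
```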
10. Safe Period
Due to the restrictions imposed on each server while in
COMMUNICATION-INTERRUPTED state, long-term operation in this state is
not feasible for either server. One reason that these states exist at
all is to allow the servers to easily survive transient network
communications failures of a few minutes to a few days (although the
actual time periods will depend a great deal on the DHCP activity of
the network in terms of arrival and departure of DHCP clients on the
network).
Eventually, when the servers are unable to communicate, they will
have to move into a state where they no longer can re-integrate
without some possibility of a duplicate IP address allocation.
There are two ways that they can move into this state (known as
PARTNER-DOWN).
They can either be informed by external command that, indeed, the
partner server is down. In this case, there is no difficulty in mov-
ing into the PARTNER-DOWN state since it is an accurate reflection of
reality and the protocol has been designed to operate correctly (even
during reintegration) if, when in PARTNER-DOWN state the partner is,
indeed, down.
The other way applies when the servers are running unattended for
extended periods. For this case, the option is provided to configure
a "safe-period" into each server. This OPTIONAL
safe-period is the period after which either the Primary or Secondary
Server will automatically transition to PARTNER-DOWN from
COMMUNICATION-INTERRUPTED state. If this transition is completed and
the partner is not down, then the possibility of duplicate IP address
allocations will exist.
The goal of the "safe-period" is to allow network operations staff
some time to react to a server moving into COMMUNICATION-INTERRUPTED
state. During the safe-period the only requirement is that the net-
work operations staff determine if both servers are still running --
and if they are, to either fix the network communications failure
between them, or to take one of the servers down before the expira-
tion of the safe-period.
The length of the safe-period is installation dependent, and depends
in large part on the number of unallocated IP addresses within the
subnet address pool and the expected frequency of arrival of previ-
ously unknown DHCP clients requiring IP addresses. Many environments
should be able to support safe-periods of several days.
During this safe period, either server will allow renewals from any
existing client. The only limitation concerns the need for IP
addresses for the DHCP server to hand out to new DHCP clients and the
need to re-allocate IP addresses to different DHCP clients.
The number of "extra" IP addresses required is equal to the expected
total number of new DHCP clients encountered during the safe period.
This is dependent only on the arrival rate of new DHCP clients, not
the total number of outstanding leases on IP addresses.
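The sizing rule above is simple arithmetic. As a worked illustration
with assumed figures: if about 5 previously unknown clients arrive per
hour, a 72-hour safe-period needs about 5 x 72 = 360 spare addresses;
conversely, 360 spare addresses support roughly a 72-hour safe-period
at that rate. The helper names are hypothetical.

```python
import math


def extra_addresses_needed(new_clients_per_hour: float,
                           safe_period_hours: float) -> int:
    # Depends only on the arrival rate of new clients, not on the
    # number of outstanding leases.
    return math.ceil(new_clients_per_hour * safe_period_hours)


def max_safe_period_hours(spare_addresses: int,
                          new_clients_per_hour: float) -> float:
    # Longest safe-period the spare addresses can cover at this rate.
    return spare_addresses / new_clients_per_hour
```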
In the unlikely event that a relatively short safe period of an hour
is all that can be used (given a dearth of IP addresses or a very
high arrival rate of new DHCP clients), even that can provide
substantial benefits in allowing the DHCP subsystem to ride through
minor problems that could occur and be fixed within that hour. In
these cases, no possibility of duplicate IP address allocation
exists, and re-integration after the failure is resolved will be
automatic and require no operator intervention.
11. Open Issues
A number of details remain to be worked out. They are as follows:
1. Level of Agreement and Completion
This draft is incomplete in two senses. First, none of the
authors agree with everything written, and quite a number of
issues remain to be worked out among the various authors (to say
nothing about the rest of the community). Second, this draft is
not yet complete enough to support creation of interoperable
implementations.
However, we believe that even though this draft is very much a
work in progress, there is value in sharing it with the rest
of the DHCP community in its current form.
2. Failover Port
We need to resolve whether the Failover protocol uses the same
port as the DHCP protocol or a different one. In the interests
of allowing implementation of the Failover protocol by a
different process or sub-process, having it use a different port
seems reasonable.
3. High Level Operations
While the detailed operations are beginning to come together,
the higher level operations (like reintegration) are, as yet,
incompletely specified. This will be rectified in a later
revision.
4. Option Spaces
The draft currently reflects some rather fuzzy goals of using
DHCP options where they apply but also defining new options. It
uses the "user defined option space" for this, which is probably
not a good idea. Perhaps the DHCP Panel will produce a larger
option space in which all of these options can be defined, or
perhaps (as is written in the draft) this protocol will just
have to define entirely unique options.
5. Subnet Level Granularity
This protocol talks about a server being in one state or
another, however the desire is for this protocol to operate
independently in each address pool for which a primary and a
secondary server are defined. In this way, the "server" state
really refers to the "subnet" state. Once the protocol is vali-
dated, the editing work to make it operate at subnet granularity
will be performed.
6. Secondary Server Communications with DHCP Clients
There are two situations where we may want to allow the secon-
dary server to communicate with DHCP clients even though the
secondary can communicate with the primary and would normally be
unresponsive to DHCP client requests.
The first situation which deserves consideration is where the
secondary has given a DHCP client a lease on an IP address when
it was not able to communicate with the primary, and then subse-
quently the secondary becomes able to communicate with the pri-
mary. When the client unicasts its DHCPREQUEST to the secondary
to renew its lease, the secondary will not be able to communi-
cate with the client (as this protocol is defined). Should we
allow the Secondary to extend the lease for the DHCP client and
then inform the primary of that extension using the DHCPBNDUPD
message in the same way as the Primary uses that message?
The second situation arises where a client can only communicate
with the secondary due to some network failure, but the primary
and secondary server can communicate. As written, the protocol
will not allow the secondary to offer a lease to the DHCP
client, but it would be straightforward to modify the protocol
to allow the secondary to do so. The only difficult part of
this change to the protocol would be to suggest how the secon-
dary would know that the DHCP client could talk only to the
secondary. However, if the primary could talk to the DHCP client,
the secondary would expect to hear about that client in DHCPBNDUPD
messages at some point, so the absence of such messages could be
used as a signal that the secondary should communicate with the
DHCP client in question.
7. UDP or TCP
There has been much debate about the utility of using UDP for
the failover protocol, since it doesn't supply guaranteed
delivery. Certainly rebuilding TCP out of UDP would be a mis-
take. Some factors to consider in this debate are as follows:
First, it is important to recognize that mere receipt of a
packet by the other server in the pair (e.g., receipt of a
DHCPBNDUPD packet by the secondary server) is not sufficient for
the primary to update its own bindings database with new infor-
mation about what the secondary knows. In all cases of
transfers of bindings information, the receiver of a DHCPBNDUPD
message MUST update its own stable storage prior to replying
with a DHCPBNDACK message (except in the marginal case where all
of the updates are rejected). An action is required by the
receiving server and an explicit ACK is needed by the sending
server to ensure the integrity of the protocol. So, just know-
ing that the other server has received a Failover protocol
packet is not intrinsically interesting.
Second, the DHCP protocol, both the client and server side, is
being implemented in progressively smaller and smaller machines.
While this progression is most evident in DHCP clients, there
exist implementations today of DHCP servers embedded in devices
that are by no stretch of the imagination traditional "servers"
running mainstream operating systems. In many ways, the Fail-
over protocol is very well suited to such devices. Adding addi-
tional protocol infrastructure requirements to implement the
Failover protocol could easily prevent its implementation in
devices that in some ways need it most.
Third, there are only a few cases where the Failover protocol
requires guaranteed delivery of packets. In particular, the
normal Primary-to-Secondary DHCPBNDUPD message does not have to be
delivered reliably. The consequences of lost DHCPBNDUPD messages
are handled by the use of the MDLI, for the simple reason
that since these messages are "lazy", they may not get delivered
because of a server failover prior to their transmission. Given
that the protocol is robust in the face of loss of either a
DHCPBNDUPD message or a DHCPBNDACK message, a technique known as
"fire and forget" may be used with this protocol and two
cooperating implementations. If the DHCPBNDACK message contains
all of the information originally in the DHCPBNDUPD message,
then the DHCPBNDUPD message may be transmitted and forgotten by
the sending server (typically the primary). When and if the
secondary receives the DHCPBNDUPD and replies with a DHCPBNDACK
message and the primary receives it, the primary will update its
stable storage with a new picture of what the secondary knows
about the lease time. If either of these messages is lost, the
only downside is that the DHCP client associated with the bind-
ing in question may receive a shorter lease for one lease period
than it would otherwise. This "fire and forget" technique
could substantially ease both the complexity of implementation
and memory requirements of an implementation of the Failover
protocol, especially where two servers were communicating over a
very slow link.
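The "fire and forget" technique, together with the rule that the
receiver commits to stable storage before sending DHCPBNDACK, can be
sketched with in-memory stand-ins. In this illustration (all names
hypothetical), the DHCPBNDACK echoes every field of the DHCPBNDUPD, so
the Primary keeps no table of outstanding updates, and the Secondary
appends the update to a list standing in for stable storage before it
builds the acknowledgment. No real transport is modeled; transmit is
any callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BndUpd:                  # hypothetical DHCPBNDUPD contents
    address: str
    client_id: str
    lease_end: int             # lease end the Primary reports to the Secondary


@dataclass(frozen=True)
class BndAck:                  # echoes every field of the DHCPBNDUPD
    address: str
    client_id: str
    lease_end: int


@dataclass
class Secondary:
    stable_storage: List[BndUpd] = field(default_factory=list)  # stand-in for disk

    def on_update(self, upd: BndUpd) -> BndAck:
        self.stable_storage.append(upd)   # commit first...
        return BndAck(upd.address, upd.client_id, upd.lease_end)  # ...then ACK


@dataclass
class Primary:
    # address -> lease end the Secondary is known to have stored
    secondary_knows: Dict[str, int] = field(default_factory=dict)

    def send_update(self, upd: BndUpd, transmit: Callable[[BndUpd], None]) -> None:
        transmit(upd)                     # no retry queue, nothing retained

    def on_ack(self, ack: BndAck) -> None:
        # Everything needed is in the ACK, so no pending-update state exists.
        self.secondary_knows[ack.address] = ack.lease_end
```

If either message is lost, secondary_knows simply keeps its older
value, which corresponds to the shorter-lease consequence described
above.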
12. Acknowledgments
Ralph Droms started it all, by sketching out an initial interserver
draft that embodied ideas from several past IETF meetings. In that
draft, he acknowledged contributions by Jeff Mogul, Greg Minshall,
Rob Stevens, Walt Wimer, Ted Lemon, and the DHC working group.
Kim Kinnear and Bob Cole each extended that draft, separately and
then together, until they created an interserver draft that supported
any number of servers. The complexity of that approach was just too
great, and led to a much simpler approach embodied in the first
Failover draft by Greg Rabil, Mike Dooley, Arun Kapur, and Ralph
Droms. This draft posited only two servers -- a primary and a
secondary. Kim Kinnear then wrote the Safe Failover draft to layer on
top of the Failover draft and increase its robustness in the face of
certain rare network failures. At the spring 1998 IETF meeting in LA,
the DHC working group said that they wanted a merged Failover and
Safe Failover draft. Steve Gonczi and Bernie Volz stepped up and
produced the raw material for such a merged draft, along with a new
message format designed around DHCP options and other extensions and
clarifications. Kim Kinnear edited their work into draft format and
made other changes, and that is what you have in your hands.
Many people have reviewed the various drafts that went into this
result. At American Internet, ideas have been contributed by Mark
Stapp, Brad Parker, and Ellen Garvey. Glenn Waters of Bay Networks
contributed ideas and enthusiasm to make a Failover protocol that was
both "safe" and "lazy".
13. References
[1] Droms, R., "Dynamic Host Configuration Protocol", RFC 2131,
March 1997.
[2] Alexander, S., Droms, R., "DHCP Options and BOOTP Vendor
Extensions", RFC 2132, March 1997.
[3] Rabil, G., Dooley, M., Kapur, A., Droms, R., "DHCP Failover
Protocol", draft-ietf-dhc-failover-00.txt.
[4] Gudmundsson, Olafur, "Security Architecture for DHCP",
draft-ietf-dhc-security-arch-00.txt.
14. Authors' Information
Ralph Droms
323 Dana Engineering
Bucknell University
Lewisburg, PA 17837
Phone: (717) 524-1145
EMail: droms@bucknell.edu
Greg Rabil, Mike Dooley, Arun Kapur
Quadritek Systems, Inc.
10 Valley Stream Parkway, Suite 240
Malvern, PA 19355
Phone: (800) 208-2747
EMail: grabil@quadritek.com
mdooley@quadritek.com
akapur@quadritek.com
Kim Kinnear
American Internet Corporation
4 Preston Ct.
Bedford, MA 01730-2334
Phone: (781) 276-4587
EMail: kinnear@american.com
Steve Gonczi, Bernie Volz
Process Software Corporation
959 Concord St.
Framingham, MA 01701
Phone: (508) 879-6994
EMail: gonczi@process.com
volz@process.com