CGNAT and PCP

Introduction

Description

PCP support is to be added to the CGN system as a way for individual subscribers to open a public port, so as to allow inbound connections to be made.

This is going to be somewhat similar to Figure 3 of RFC 7843, but more accurately as shown below:

        Internet
           ^
           |
          CGN <=== [PCP Protocol] ======= Web Portal
           ^                                  ^
           |                                  |
           + --- [Subscriber HTTP request] ---+
           |
           ^
           |
       Subscriber

The subscriber will not be adjacent to the CGN/PCP device, so multicast PCP messages will not be useful; similarly, the subscriber will probably not be informed of the address of the CGN/PCP device. In spite of this, it will be possible for the subscriber (and/or a public peer) to learn some CGN/PCP host addresses through normal network probing techniques.

The subscriber will be able to log in to a Web Portal and use this to configure an open public (address/port/protocol) tuple which will be mapped through to a corresponding private (address/port/protocol) tuple. This would require the subscriber to provide the private protocol and port (the subscriber private address being fixed), and would yield a corresponding public address and port for the same protocol. These public values would then be provided to nodes on the Internet in order that they may initiate communication destined to the subscriber.

Distinct CGN interfaces will be used for the link towards the subscriber, the link towards the Internet, and the (management) link towards the Web Portal. The Web Portal may consist of more than one device, the front end HTTP server and a back end device which speaks PCP towards the CGN device.

Only the Web Portal (or its back end function/device) should be allowed to send PCP messages; the subscriber is not to have direct PCP control of the CGN. Likewise, PCP messages from the Internet are to be ignored. Since PCP runs over UDP and is essentially a command/response protocol, there are security concerns to be addressed, including the possibility of spoofed PCP commands directed at the CGN box.

Requirements

Details

In terms of PCP commands, this will require the MAP command, together with the THIRD_PARTY option. The ANNOUNCE command will also be supported in order to allow for easier state restore in case of CGN restart. The PEER command will not be supported, nor will the AUTHENTICATION command.

Contrary to section 13.3 of RFC 6887 which indicates that at least one FILTER option must be supported in combination with the MAP command, we will not support any FILTER options. Any attempt to configure a MAP with a FILTER will be rejected with an appropriate error message. This is due to limitations in the CGN implementation, and should not cause any issues for this deployment.

The CGN boxes support translating TCP, UDP, UDP-Lite, DCCP and ICMP (for pings), together with related ICMP error messages. We will support MAP commands for TCP, UDP, UDP-Lite and DCCP.

The CGN element in the above will actually consist of two devices, an active and standby. Both will be configured to use the same set of per subscriber public addresses and port ranges. As such the Web Portal will need to program both CGN devices with identical values. This means that some invocations of the MAP command will have to specify the required address+port together with the PREFER_FAILURE option.

Since PCP will be used in an unauthenticated mode, and given that the subscriber and/or public peer will be able to learn host addresses for the CGN box(es), we need to restrict which network elements can send PCP messages towards the CGN box. This cannot simply be done by source address matching, as PCP is essentially a stateless protocol, and spoofing of the allowed source address (that of the Web Portal back end) could allow for unrestricted programming. As such it is suggested that the CGN system be deployed in combination with appropriate "local firewall" rules such that PCP messages (UDP destination ports 5350 and 5351) are dropped except when they arrive over the management interface and have the expected source address for the Web Portal back end.
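As an illustration only, the receive-path side of this restriction could look like the following Python sketch. All addresses, port numbers, interface names and function names here are made-up example values, not product code; the real enforcement is the firewall rules described above plus the server's own configuration.

```python
import socket

# Illustrative values: in practice these come from configuration.
MGMT_LISTEN_ADDR = "10.10.101.2"     # management-side listen address
MGMT_IFACE = "dp0p1s1.101"           # management interface
ALLOWED_CLIENTS = {"10.10.101.1"}    # Web Portal back end address(es)
PCP_SERVER_PORT = 5351

def is_trusted_source(src_ip: str, ingress_ifname: str) -> bool:
    """Accept PCP only from a configured back end via the management link."""
    return ingress_ifname == MGMT_IFACE and src_ip in ALLOWED_CLIENTS

def serve(handle_request):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Binding to the management address alone already excludes the
    # subscriber- and Internet-facing interfaces.
    sock.bind((MGMT_LISTEN_ADDR, PCP_SERVER_PORT))
    while True:
        data, (src_ip, src_port) = sock.recvfrom(1500)
        if not is_trusted_source(src_ip, MGMT_IFACE):
            continue                 # silently drop spoof candidates
        handle_request(data, src_ip, src_port)
```

Note that, as stated above, this source check is not sufficient on its own, since the allowed source address can be spoofed; it complements rather than replaces the firewall rules.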

It may also be appropriate to give the CGN box and Web Portal back end IPv6 addresses, and hence operate PCP over IPv6, as this will further restrict the ability of untrusted parties to program the CGN devices.

Functional Behaviour

When PCP encodes a MAP request, it sends a 5-tuple representing the desired mapping. This tuple is (protocol, private-address, private-port, public-address, public-port); of those fields protocol and the private fields are fixed values which are required to be used in the mapping, however, the public fields can either be treated as a hint, and ignored by the receiving server, or treated as mandatory. The default mode of operation is to treat the public fields as a hint. If the request includes a PREFER_FAILURE option, then the public fields are treated as mandatory, and if the exact mapping represented by the tuple can not be created, the request will fail.
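For illustration, a MAP request with the THIRD_PARTY and (optional) PREFER_FAILURE options could be encoded as in the following Python sketch of the RFC 6887 wire format. The function and constant names are ours, not part of any product code, and a real back end would also handle responses, retransmission, and the nonce lifecycle.

```python
import socket
import struct

PCP_VERSION = 2          # RFC 6887
OPCODE_MAP = 1           # MAP opcode
OPT_THIRD_PARTY = 1      # carries the subscriber's internal address
OPT_PREFER_FAILURE = 2   # zero-length option: fail rather than remap

def ip_to_16(addr: str) -> bytes:
    """PCP carries all addresses as 128 bits; IPv4 becomes IPv4-mapped."""
    try:
        return socket.inet_pton(socket.AF_INET6, addr)
    except OSError:
        return b"\x00" * 10 + b"\xff\xff" + socket.inet_pton(socket.AF_INET, addr)

def build_map_request(client_ip, lifetime, nonce, proto, subscriber_ip,
                      internal_port, ext_ip="::", ext_port=0,
                      prefer_failure=False):
    assert len(nonce) == 12  # 96-bit mapping nonce
    # Common request header: version, R=0|opcode, reserved, lifetime, client IP.
    pkt = struct.pack("!BBHI", PCP_VERSION, OPCODE_MAP, 0, lifetime)
    pkt += ip_to_16(client_ip)          # the Web Portal back end's own address
    # MAP opcode payload: nonce, protocol, reserved, ports, suggested ext IP.
    pkt += nonce + struct.pack("!B3xHH", proto, internal_port, ext_port)
    pkt += ip_to_16(ext_ip)             # "::" / port 0 is the wildcard hint
    # THIRD_PARTY: we are mapping on behalf of the subscriber, not ourselves.
    pkt += struct.pack("!BBH", OPT_THIRD_PARTY, 0, 16) + ip_to_16(subscriber_ip)
    if prefer_failure:
        pkt += struct.pack("!BBH", OPT_PREFER_FAILURE, 0, 0)
    return pkt
```

The initial request towards the active CGN would use the wildcard defaults (`ext_ip="::"`, `ext_port=0`) without PREFER_FAILURE; refreshes and the copy towards the standby would pin the returned values and set `prefer_failure=True`.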

The initial attempt by the back end server to create an explicit mapping should be performed with the public port and address being the wildcard value. It should also be without the PREFER_FAILURE option. This is because, if a specific public port were requested, it might conflict with existing use of that port by an implicit outgoing connection, and there is no good reason to force such a request to fail.

This implies that the Web Portal should not allow the user to specify the public port. If the user could specify the desired public port, then the Web Portal would have to be able to cope with either a different port being allocated or the request to allocate a specific port failing (i.e. when using PREFER_FAILURE in the latter case).

The PCP/CGN box will not persistently store mappings; as such they will have to be refreshed by the back end server. This implies that such a server will have to record the actual allocated port, then send periodic MAP commands to refresh the allocation before the existing mapping has timed out. It also implies that the back end server will have to note the actual returned timeout for the mapping, and refresh mappings within a sufficient time period.
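The refresh bookkeeping described above can be sketched as follows (illustrative Python; the field and method names are ours, and the example addresses are made up):

```python
import time
from dataclasses import dataclass

@dataclass
class Mapping:
    """The back end's view of one subscriber mapping."""
    proto: int
    internal_ip: str
    internal_port: int
    external_ip: str = ""     # learned from the MAP response
    external_port: int = 0    # learned from the MAP response
    lifetime: int = 0         # seconds granted by the server
    acquired_at: float = 0.0

    def record_response(self, ext_ip, ext_port, lifetime):
        self.external_ip, self.external_port = ext_ip, ext_port
        self.lifetime = lifetime
        self.acquired_at = time.monotonic()

    def refresh_due(self) -> float:
        # RFC 6887 suggests renewing between 1/2 and 5/8 of the lifetime;
        # the halfway point leaves headroom for retransmission on loss.
        return self.acquired_at + self.lifetime / 2

m = Mapping(proto=6, internal_ip="10.10.2.5", internal_port=8080)
m.record_response("10.10.30.7", 4242, lifetime=7200)
# The refresh MAP must repeat the *actual* allocated port (4242 here), not
# a wildcard, and must be sent no later than m.refresh_due().
```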

This also implies that the primary CGN box will initially receive a MAP request without PREFER_FAILURE, while the subsequent refresh commands it receives will contain PREFER_FAILURE; similarly, all MAP commands received by the standby CGN box will contain the PREFER_FAILURE option.

It should not be possible for the primary and standby CGN boxes to return inconsistent responses (i.e. success from the primary and failure from the standby), but the back end service will have to handle this scenario in some fashion.
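The active/standby programming flow, including one possible recovery from an inconsistent response, might look like the following sketch. The CGN client objects and their `map()` method are a hypothetical wrapper around the MAP exchange, not a real API.

```python
def program_mapping(primary, standby, proto, internal_ip, internal_port,
                    lifetime):
    """Program the same public tuple on both CGN boxes.

    Each client's map() is assumed to return the granted (ext_ip, ext_port)
    on success, or None on failure."""
    # Step 1: wildcard request to the active CGN without PREFER_FAILURE,
    # so an existing implicit mapping can simply be adopted.
    granted = primary.map(proto, internal_ip, internal_port, lifetime,
                          ext_ip="::", ext_port=0, prefer_failure=False)
    if granted is None:
        raise RuntimeError("primary CGN refused mapping")
    ext_ip, ext_port = granted
    # Step 2: mirror the allocated tuple to the standby. The exact values
    # are now mandatory, hence PREFER_FAILURE.
    if standby.map(proto, internal_ip, internal_port, lifetime,
                   ext_ip=ext_ip, ext_port=ext_port,
                   prefer_failure=True) is None:
        # Inconsistent success/failure: one option is to release the
        # primary's explicit state (lifetime 0) and report the error.
        primary.map(proto, internal_ip, internal_port, 0,
                    ext_ip=ext_ip, ext_port=ext_port, prefer_failure=True)
        raise RuntimeError("standby CGN rejected mapping; rolled back")
    return ext_ip, ext_port
```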

The PCP server will not send unsolicited responses indicating a change in mappings, e.g. in response to a CGN pool configuration change.

Upon startup, the PCP server will send ANNOUNCE responses to the configured addresses in order to solicit the restoration of lost state.

CGN Behaviour

Our CGN operates in one of two modes, 5-tuple or 3-tuple. In 3-tuple mode, it only maintains public information for its own addresses and ports used in translations. In 5-tuple mode it also maintains associated public peer addresses and ports used for live flows. In 5-tuple mode, the 3-tuple entry is maintained as an implicit entry without a timeout, in 3-tuple mode as an explicit entry with a timeout.

In both modes, it exhibits Endpoint-Independent Mapping and Endpoint-Independent Filtering behaviour, such that inbound packets matching the 3-tuple session will be translated.

We will then exhibit two different behaviours depending upon which mode the CGN is operating in.

In 5-tuple mode, we would need to be able to create an explicit 3-tuple session with an associated and effective timeout, despite the timeout on the 3-tuple session currently being ignored in this mode. Once such an explicit (MAP created) 3-tuple session causes an associated 5-tuple session to come into existence, we would still need the 3-tuple session timeout to be honoured, staying around as long as the timeout has not expired, or there are child 5-tuple sessions. If the 3-tuple timeout expires while there are still 5-tuple sessions in existence, then the 3-tuple session will have to be converted to an implicit mapping. This implicit status will persist unless and until a new MAP command arrives to once again 'overlay' an explicit mapping with a timeout.
In 3-tuple mode, the default timeout behaviour will apply - i.e. that outgoing packets extend the lifetime of a session. If the timeout of an existing explicit 3-tuple session is set to 0 by a MAP command, then this simply converts the session to an implicit one, it does not delete it. (See RFC 6887, pg 70, final para - section 15).
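The explicit/implicit transitions described above can be sketched as a small state machine. This is illustrative Python only, not the CGN's real data model; the class and method names are ours.

```python
class Session3Tuple:
    """Sketch of one 3-tuple session's explicit/implicit state."""

    def __init__(self):
        self.explicit = False
        self.expires_at = None   # None => implicit entry, no timeout
        self.children = 0        # live child 5-tuple sessions (5-tuple mode)

    def apply_map(self, lifetime, now):
        if lifetime == 0:
            # RFC 6887 section 15: lifetime 0 removes the explicit state,
            # but a session still in use merely reverts to an implicit one.
            self.explicit = False
            self.expires_at = None
        else:
            # A MAP command 'overlays' an explicit mapping with a timeout.
            self.explicit = True
            self.expires_at = now + lifetime

    def tick(self, now):
        """Timer pass: expire or downgrade an explicit session."""
        if self.explicit and self.expires_at <= now:
            if self.children > 0:
                self.apply_map(0, now)   # keep the session, drop the timeout
            else:
                return "deleted"         # real code would remove the entry
        return "explicit" if self.explicit else "implicit"
```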

We believe there is no need for the PCP server to query/validate the existence of a mapping without causing it to be updated. If such a need did arise it could be provided by the netconf support.

We will not support DMZ behaviour, nor "all ports" mappings; i.e. a priv-port of zero will be rejected with a failure, as will a proto of zero.

When a command fails, the request fields will be repeated in the response JSON - except for the timeout.

The error codes returned should allow for specific PCP error codes to be generated, e.g. the USER_EX_QUOTA or NO_RESOURCES cases.
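For reference, the relevant PCP result codes come from RFC 6887 section 7.4. The following Python fragment sketches how CGN-internal failure reasons might map onto them; the reason strings are hypothetical and depend on the dataplane implementation.

```python
# PCP result codes from RFC 6887, section 7.4 (subset relevant here).
SUCCESS = 0
NOT_AUTHORIZED = 2
MALFORMED_REQUEST = 3
UNSUPP_OPCODE = 4
UNSUPP_OPTION = 5
NO_RESOURCES = 8
UNSUPP_PROTOCOL = 9
USER_EX_QUOTA = 10
CANNOT_PROVIDE_EXTERNAL = 11

# Hypothetical mapping from CGN-internal failure reasons to PCP results.
CGN_ERROR_TO_PCP = {
    "subscriber-block-limit": USER_EX_QUOTA,   # max-blocks-per-subscriber hit
    "pool-exhausted": NO_RESOURCES,
    "port-in-use": CANNOT_PROVIDE_EXTERNAL,    # with PREFER_FAILURE
    "bad-protocol": UNSUPP_PROTOCOL,
}

def pcp_result(cgn_reason: str) -> int:
    # Fall back to NO_RESOURCES for reasons with no specific PCP code.
    return CGN_ERROR_TO_PCP.get(cgn_reason, NO_RESOURCES)
```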

There will be no explicit dataplane support or PCP server for helping with PCP address change events, the Web Portal will have to handle this itself. 

Configuration Data Model  

A section under "service" defines a named PCP instance, together with restrictions (e.g. only MAP and ANNOUNCE supported): the service address(es) to listen on (if unspecified, then any), the address(es) to which to send ANNOUNCE messages, and the address(es) of peer elements (e.g. Web Portal back ends) from which to accept commands.

When multiple instances are in use on different interfaces, each needs a different listen address so that the server can determine which instance is intended, and hence which interface it will target in the dataplane commands for session creation. If multiple instances of the same type are applied to the same interface, then it should be possible for all bar one of the instances to omit the listen address - i.e. instances of the same type are in effect merged behind the scenes.

In the below if/when this is extended to support other use cases, does 'type' need to allow multiple elements? i.e. for the shared session table used by NAT44/Firewall/NAT64/NAT46? The alternative would be that multiple instances (of different type) could share the same listen address, but behind the scenes this simply results in a single merged instance, allowing functionality across different features.

service {
    pcp {
        feature-interface dp0s5 {
            instance SOME-NAME
        }
        instance SOME-NAME {
            description
            type CGNAT  /* Restricts contents of messages, hint to find reference */
            client {
                prefix 2.3.4.5/32 {
                    description
                    third-party {
                        /* Must use THIRD_PARTY option */
                        interface MAN-IF-A
                        interface MAN-IF-B
                        ...
                    }
                    /* authentication / interface restrictions */
                }
                prefix 2.3.5.0/24 {
                }
                prefix 2::4 {
                }
                ...
            }
            listen {
                address 1.2.3.4
                address 1::2
                ...
            }
            command {
                map {
                }
                announce {
                    to {
                        address 3.4.5.6
                        address 3::6
                        ...
                    }
                }
            }
        }
    }
}

Hence all of the PCP specific configurations are under the one named section, and a given section is applied to an interface. It should be possible to configure multiple PCP instances and have those referenced from different CGN instances. Maybe this latter could be left for an enhancement. The idea here is that we may end up extending PCP usage/control to more features (e.g. NAT46, NAT64, NAT44, Firewall). We could possibly have multiple instances (for different features) applied to the one interface, but a given instance would not be applicable to more than one interface at any time. i.e. a many to one mapping.

We believe that we need to support two named instances, each applied to a distinct (Internet-facing) CGN feature interface for the current intended CGN deployment. This is in part due to the fact that if the one pool of addresses were shared between the two Internet-facing interfaces, there would be no guarantee which interface would receive an inbound packet, resulting in potential session table lookup failure if the desired address + port mapping was on the other interface. I recall talk of two distinct blocks of subscriber addresses, and two distinct Internet-facing links. This may be necessary in order to derive the interface element which has to be passed to the dataplane to establish a mapping.

It is assumed that the CGN pool of addresses used on the public side is not also used by the CGN box as host addresses.

Yang Model

Note that the number of templates and the number of servers is currently limited to 1 (per routing-instance). The template must be of type "cgnat".

The possibility is open to add different types of templates and multiple servers in future.

Vyatta Service PCP

module vyatta-service-pcp-v1 {
    namespace "urn:vyatta.com:mgmt:vyatta-service-pcp:1";
    prefix vyatta-service-pcp-v1;

    import vyatta-services-v1 {
        prefix service;
    }
    import vyatta-types-v1 {
        prefix types;
    }
    import configd-v1 {
        prefix configd;
    }

    organization "AT&T Inc.";
    contact
        "AT&T Inc.
         Postal: 280 S. Akard Street
                 Dallas, TX 75202
         Web: www.att.com";

    description
        "Copyright (c) 2019 AT&T Intellectual Property.
         All rights reserved.

         The YANG module for vyatta-service-pcp-v1";

    revision 2019-07-29 {
        description "Initial revision of version 1.";
    }

    typedef pcp-types {
        type enumeration {
            enum cgnat {
                description "Carrier Grade NAT";
                configd:help "Carrier Grade NAT";
            }
        }
    }

    grouping pcp {
        container pcp {
            description "Port Control Protocol";
            configd:help "Port Control Protocol";
            list feature-interface {
                description "List of feature interfaces managed by PCP";
                configd:help "List of feature interfaces managed by PCP";
                configd:allowed "vyatta-interfaces.pl --show all --skip=lo";
                key "name";
                leaf name {
                    type types:interface-ifname;
                }
                list template {
                    description "List of templates";
                    configd:help "List of templates";
                    max-elements 1;
                    key "name";
                    leaf name {
                        description "Template name";
                        configd:help "Template name";
                        type leafref {
                            path "../../../server/template/name";
                        }
                    }
                    list internal-prefix {
                        description "Internal prefixes that template is applied to";
                        configd:help "Internal prefixes that template is applied to";
                        min-elements 1;
                        key "prefix";
                        leaf prefix {
                            type types:ip-prefix;
                        }
                    }
                }
            }
            list server {
                description "List of PCP servers";
                configd:help "List of PCP servers";
                max-elements 1;
                key "name";
                leaf name {
                    type types:alpha-numeric;
                }
                list listener {
                    description "List of listen addresses";
                    configd:help "List of listen addresses";
                    key "address";
                    leaf address {
                        type types:ip-address;
                    }
                    leaf port {
                        description "UDP port to listen on";
                        configd:help "UDP port to listen on";
                        type types:port;
                        default 5351;
                    }
                }
                container log {
                    description "PCP logging";
                    configd:help "PCP logging";
                    leaf debug {
                        description "Enable debug logging";
                        configd:help "Enable debug logging";
                        type empty;
                    }
                }
                leaf nonce-check {
                    description "Validate nonce";
                    configd:help "Validate nonce";
                    type boolean;
                    default true;
                }
                list template {
                    description "List of templates";
                    configd:help "List of templates";
                    max-elements 1;
                    key "name";
                    leaf name {
                        description "Template name";
                        configd:help "Template name";
                        type types:alpha-numeric;
                        must "count(../../../server/template[name = current()]) = 1" {
                            error-message "template names must be unique across servers";
                        }
                    }
                    container opcodes {
                        description "Supported Opcodes";
                        configd:help "Supported Opcodes";
                        container announce {
                            presence "ANNOUNCE Opcode";
                            configd:help "ANNOUNCE Opcode";
                        }
                        container map {
                            presence "MAP Opcode";
                            configd:help "MAP Opcode";
                        }
                    }
                    leaf type {
                        description "Template type";
                        configd:help "Template type";
                        type pcp-types;
                        mandatory true;
                    }
                }
                container third-party {
                    presence "Require third-party option";
                    configd:help "Require third-party option";
                    list interface {
                        description "Interfaces to accept requests from";
                        configd:help "Interfaces to accept requests from";
                        configd:allowed "vyatta-interfaces.pl --show all --skip=lo";
                        key "name";
                        leaf name {
                            type types:interface-ifname;
                        }
                    }
                }
                container announce {
                    description "ANNOUNCE response from listener on start-up";
                    configd:help "ANNOUNCE response from listener on start-up";
                    leaf multicast {
                        description "Multicast ANNOUNCE message";
                        configd:help "Multicast ANNOUNCE message";
                        type empty;
                    }
                    list unicast {
                        description "List of clients to unicast ANNOUNCE message";
                        configd:help "List of clients to unicast ANNOUNCE message";
                        key "address";
                        leaf address {
                            type types:ip-address;
                        }
                        leaf port {
                            description "UDP port client is listening on";
                            configd:help "UDP port client is listening on";
                            type types:port;
                            default 5350;
                        }
                    }
                }
            }
        }
    }

    augment /service:service {
        uses pcp;
    }
}

Vyatta Service PCP Routing Instance

module vyatta-service-pcp-routing-instance-v1 {
    namespace "urn:vyatta.com:mgmt:vyatta-service-pcp-routing-instance:1";
    prefix vyatta-service-pcp-routing-instance-v1;

    import vyatta-routing-v1 {
        prefix routing;
    }
    import vyatta-service-pcp-v1 {
        prefix pcp;
    }

    organization "AT&T Inc.";
    contact
        "AT&T Inc.
         Postal: 280 S. Akard Street
                 Dallas, TX 75202
         Web: www.att.com";

    description
        "Copyright (c) 2019 AT&T Intellectual Property.
         All rights reserved.

         The YANG module for vyatta-service-pcp-routing-instance-v1";

    revision 2019-07-05 {
        description "Initial revision of version 1.";
    }

    augment /routing:routing/routing:routing-instance/routing:service {
        uses pcp:pcp;
    }
}

Configuration Examples

Both examples assume the following baseline CGNAT configuration:

set service nat cgnat session-timeout tcp established 30
set service nat pool POOL1 entry RANGE1 ip-address range start 10.10.30.1
set service nat pool POOL1 entry RANGE1 ip-address range end 10.10.30.254
set service nat pool POOL1 type CGNAT
set service nat pool POOL1 address-allocation round-robin
set service nat pool POOL1 address-pooling paired
set service nat pool POOL1 port dynamic-block-allocation block-size 512
set service nat pool POOL1 port dynamic-block-allocation max-blocks-per-subscriber 32
set service nat pool POOL1 port allocation sequential
set service nat pool POOL1 port range start 1024
set service nat pool POOL1 port range end 65535
set service nat cgnat policy POLICY1 match source ip-address prefix 10.10.0.0/16
set service nat cgnat policy POLICY1 priority 10
set service nat cgnat policy POLICY1 translation pool POOL1
set service nat cgnat interface dp0p1s2 policy POLICY1

Global

set service pcp feature-interface dp0p1s2 template cgnat internal-prefix 10.10.2.0/24
set service pcp feature-interface dp0p1s2 template cgnat internal-prefix 10.10.20.0/24
set service pcp server default listener 10.10.101.2
set service pcp server default log debug
set service pcp server default template cgnat opcodes announce
set service pcp server default template cgnat opcodes map
set service pcp server default template cgnat type cgnat
set service pcp server default third-party interface dp0p1s1.101

Routing Instance

set routing routing-instance mgmt interface dp0p1s1.100
set routing routing-instance mgmt service pcp feature-interface dp0p1s2 template cgnat internal-prefix 10.10.2.0/24
set routing routing-instance mgmt service pcp feature-interface dp0p1s2 template cgnat internal-prefix 10.10.20.0/24
set routing routing-instance mgmt service pcp server mgmt listener 10.10.100.2
set routing routing-instance mgmt service pcp server mgmt listener '2000:10:100::2'
set routing routing-instance mgmt service pcp server mgmt log debug
set routing routing-instance mgmt service pcp server mgmt template cgnat opcodes announce
set routing routing-instance mgmt service pcp server mgmt template cgnat opcodes map
set routing routing-instance mgmt service pcp server mgmt template cgnat type cgnat
set routing routing-instance mgmt service pcp server mgmt third-party interface dp0p1s1.100

Open Source

The PCP daemon will be based on the open source "libre" library and the "repcpd" daemon.