Network Working Group | M. Thomson |
Internet-Draft | Mozilla |
Intended status: Standards Track | J. Yasskin |
Expires: February 14, 2019 | |
August 13, 2018 |
Merkle Integrity Content Encoding
draft-thomson-http-mice-latest
This memo introduces a content-coding for HTTP that provides progressive integrity for message contents. This integrity protection can be evaluated on a partial representation, allowing a recipient to process a message as it is delivered while retaining strong integrity protection.
RFC EDITOR: please remove this section before publication
Discussion of this draft takes place on the HTTP working group mailing list (ietf-http-wg@w3.org), which is archived at https://lists.w3.org/Archives/Public/ietf-http-wg/.
The source code and issues list for this draft can be found at https://github.com/martinthomson/http-mice.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on February 14, 2019.
Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Integrity protection for HTTP content is highly valuable. HTTPS [RFC2818] is the most common form of integrity protection deployed, but it requires a direct TLS [RFC8446] connection to a host. However, additional integrity protection might be desirable for some use cases. This might be for additional protection against failures or attacks (see [SRI]), or because content needs to remain unmodified throughout multiple HTTPS-protected exchanges.
This document describes a “mi-sha256” content-encoding (see Section 2) that is a progressive, hash-based integrity check based on Merkle Hash Trees [MERKLE].
The means of conveying the root integrity proof used by this content encoding will depend on deployment requirements. This document defines a digest algorithm (see Section 3) that can carry an integrity proof.
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].
A Merkle Hash Tree [MERKLE] is a structured integrity mechanism that collates multiple integrity checks into a tree. The leaf nodes of the tree contain data (or hashes of data) and non-leaf nodes contain hashes of the nodes below them.
Balanced Merkle Hash Trees are commonly used to efficiently prove membership in large sets (as in [RFC6962]). Here, however, a right-skewed tree is used to provide a progressive integrity proof. This integrity proof is used to establish that a given record is part of a message.
The hash function used for the “mi-sha256” content encoding is SHA-256 [FIPS180-4]. The integrity proof for each record other than the last is the hash of the concatenation of the record, the integrity proof of the next record (which in turn covers all subsequent records), and a single octet with a value of 0x1:
proof(r[i]) = SHA-256(r[i] || proof(r[i+1]) || 0x1)
The integrity proof for the final record is the hash of the record followed by a single octet with a value of 0x0:
proof(r[last]) = SHA-256(r[last] || 0x0)
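As an illustration only, the two definitions above can be computed with Python's standard hashlib module. The function name record_proofs is hypothetical, and the sketch assumes the content has already been split into records held in memory. Proofs are computed from the last record backwards, because each proof depends on the proof of the record that follows it.

```python
import hashlib
from typing import List


def record_proofs(records: List[bytes]) -> List[bytes]:
    """Return proof(r[i]) for every record (hypothetical helper)."""
    if not records:
        raise ValueError("at least one record is required")
    proofs = [b""] * len(records)
    # proof(r[last]) = SHA-256(r[last] || 0x0)
    proofs[-1] = hashlib.sha256(records[-1] + b"\x00").digest()
    # proof(r[i]) = SHA-256(r[i] || proof(r[i+1]) || 0x1), walking backwards
    for i in range(len(records) - 2, -1, -1):
        proofs[i] = hashlib.sha256(records[i] + proofs[i + 1] + b"\x01").digest()
    return proofs
```

The first entry of the returned list is the proof for the whole message, proof(A) in Figure 1.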
Figure 1 shows the structure of the integrity proofs for a message that is split into 4 blocks (A, B, C, and D). As shown, the integrity proof for the entire message (that is, proof(A)) is derived from the content of the first block (A) plus the proof for the second block, proof(B), which itself covers all subsequent blocks.
     proof(A)
        /\
       /  \
      /    \
     A    proof(B)
            /\
           /  \
          /    \
         B    proof(C)
                /\
               /  \
              /    \
             C    proof(D)
                     |
                     |
                     D
Figure 1: Proof structure for a message with 4 blocks
The final encoded message is formed from the record size and the first record, followed by zero or more pairs, each consisting of the integrity proof of the next record followed by that record. Thus, for the message in Figure 1, the body is:
rs || A || proof(B) || B || proof(C) || C || proof(D) || D
A message that has a content length less than or equal to the record size does not include any inline proofs. The proof for a message with a single record is simply the SHA-256 hash of that record followed by a single zero octet.
As a special case, the encoding of an empty payload is itself an empty message (i.e. it omits the initial record size), and its integrity proof is SHA-256(“\0”).
RFC EDITOR: Please remove the next paragraph before publication.
Implementations of drafts of this specification MUST implement a content encoding named “mi-sha256-##” instead of the “mi-sha256” content encoding specified by the final RFC, with “##” replaced by the draft number being implemented. For example, implementations of draft-thomson-http-mice-03 would implement “mi-sha256-03”.
To produce the content encoding, the content of the message is split into records of equal size. The final record can be shorter than the record size.
For non-empty payloads, the first 8 octets of the message contain the record size, encoded as an unsigned 64-bit integer. This value is the length, in octets, of each record of content.
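A minimal sketch of writing and reading the record size prefix follows. It assumes network (big-endian) byte order, which is consistent with the <0x0000000000000029> prefix shown in the examples; the helper names are hypothetical.

```python
import struct


def encode_record_size(rs: int) -> bytes:
    # ">Q" packs an unsigned 64-bit integer in big-endian order (assumed).
    return struct.pack(">Q", rs)


def decode_record_size(body: bytes) -> int:
    if len(body) < 8:
        raise ValueError("fewer than 8 octets: no record size present")
    (rs,) = struct.unpack(">Q", body[:8])
    return rs
```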
The final encoded stream comprises the record size (“rs”) followed by a sequence of records, each “rs” octets in length except the last, which can be shorter. Each record other than the last is followed by a 32-octet proof for the record that follows. This allows a receiver to validate and act upon each record as soon as it has received that record and the proof that follows it, using the proof that precedes it (or top-proof, for the first record) as the expected value. The final record is not followed by a proof.
Constructing a message with the “mi-sha256” content encoding requires processing of the records in reverse order, inserting the proof derived from each record before that record.
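The reverse-order construction can be sketched as follows. This is a non-normative illustration: mi_sha256_encode is a hypothetical name, and the sketch assumes the whole content is in memory, the record size is positive, and the record size prefix uses network byte order.

```python
import hashlib
import struct
from typing import Tuple


def mi_sha256_encode(content: bytes, rs: int) -> Tuple[bytes, bytes]:
    """Return (encoded_body, top_proof) for content split into rs-octet records."""
    if rs <= 0:
        raise ValueError("record size must be positive")
    if not content:
        # Empty payload: empty message, proof is SHA-256 of one zero octet.
        return b"", hashlib.sha256(b"\x00").digest()

    records = [content[i:i + rs] for i in range(0, len(content), rs)]

    # Start from the last record, which is not followed by a proof.
    proof = hashlib.sha256(records[-1] + b"\x00").digest()
    parts = [records[-1]]
    # Walk backwards, inserting each record's successor proof before moving on.
    for record in reversed(records[:-1]):
        parts.append(proof)    # proof of the record that follows `record`
        parts.append(record)
        proof = hashlib.sha256(record + proof + b"\x01").digest()
    parts.reverse()            # restore rs || A || proof(B) || B || ...
    return struct.pack(">Q", rs) + b"".join(parts), proof
```

With the 41-octet content used in the examples, a record size of 41 yields a 49-octet body and a record size of 16 yields a 113-octet body (8 + 41 + 2 × 32), matching the Content-Length values shown there.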
This structure permits the use of range requests [RFC7233]. However, to validate a given record, a contiguous sequence of records back to the start of the message is needed.
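Because every record except the last occupies exactly “rs” + 32 octets in the encoded body, the offsets involved in a range request can be computed directly. The sketch below, with hypothetical helper names, computes the encoded length and the prefix of the encoded body needed to validate record k (counted from 0), assuming the content length n is known and non-zero.

```python
from typing import Tuple

RS_LEN = 8      # length of the record size prefix
PROOF_LEN = 32  # length of an inline SHA-256 proof


def encoded_length(n: int, rs: int) -> int:
    if n <= 0 or rs <= 0:
        raise ValueError("n and rs must be positive")
    num_records = -(-n // rs)  # ceiling division
    return RS_LEN + n + PROOF_LEN * (num_records - 1)


def record_offset(k: int, rs: int) -> int:
    # Each earlier record is followed by a proof for its successor.
    return RS_LEN + k * (rs + PROOF_LEN)


def validation_prefix(k: int, n: int, rs: int) -> Tuple[int, int]:
    """Inclusive (first, last) octet positions needed to validate record k."""
    if k < 0 or k >= -(-n // rs):
        raise ValueError("no such record")
    # Record k's proof covers its data and the following proof, so the
    # range runs to the start of record k + 1 (or the end of the body).
    end = min(record_offset(k + 1, rs), encoded_length(n, rs))
    return 0, end - 1
```

For the second example below (41 octets of content, record size 16), validating the first record requires octets 0 through 55: the record size, the first record, and the proof that follows it.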
A receiver of a message with the “mi-sha256” content coding applied first attempts to acquire the integrity proof for the first record, “top-proof”. If a Digest header field with an mi-sha256 parameter is present, that parameter might carry this value.
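The receiver attempts to read the first 8 octets as an unsigned 64-bit integer, “rs”. If 8 octets aren’t available then: if the message is empty and top-proof is SHA-256(“\0”), the decoded payload is empty; otherwise, validation fails.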
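The remainder of the message is read into records of size “rs” plus 32 octets. The last record is between 1 and “rs” octets in length; if not, validation fails. For each record, the receiver computes its integrity proof as defined above: the SHA-256 hash of the record (including its trailing 32-octet proof) followed by a 0x1 octet, or, for the last record, the SHA-256 hash of the record followed by a 0x0 octet. The result is compared with the expected proof, which is top-proof for the first record and the 32-octet proof at the end of the preceding record otherwise. If the values differ, the integrity check fails; if they match, the first “rs” octets of the record (or the entire last record) can be processed.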
If an integrity check fails, the message SHOULD be discarded and the exchange treated as an error unless explicitly configured otherwise. A client treats this as equivalent to a server error; a server SHOULD generate a 400 or other 4xx status code. However, if the integrity proof for the first record is not known, this check SHOULD NOT fail unless explicitly configured to do so.
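The receiver procedure above might be implemented along the following lines. This sketch assumes the whole body is buffered, uses a hypothetical IntegrityError exception to signal a failed check, and yields each record only after it has been validated. When top_proof is None (the first proof is not known), the first record is accepted unchecked while later records are still checked against the inline proofs. A record size of zero is rejected, since no final record could then be between 1 and “rs” octets.

```python
import hashlib
import struct
from typing import Iterator, Optional


class IntegrityError(Exception):
    """Raised when the encoded body fails validation (hypothetical name)."""


def mi_sha256_decode(body: bytes, top_proof: Optional[bytes]) -> Iterator[bytes]:
    if len(body) < 8:
        # Only an empty message is acceptable without a record size.
        empty_proof = hashlib.sha256(b"\x00").digest()
        if not body and top_proof in (None, empty_proof):
            return
        raise IntegrityError("missing or truncated record size")
    (rs,) = struct.unpack(">Q", body[:8])
    if rs == 0:
        raise IntegrityError("record size of zero")
    pos, expected = 8, top_proof
    while True:
        remaining = len(body) - pos
        if remaining <= rs:
            # Last record: 1..rs octets with no trailing proof.
            if remaining < 1:
                raise IntegrityError("missing final record (truncated)")
            record = body[pos:]
            if expected is not None and hashlib.sha256(record + b"\x00").digest() != expected:
                raise IntegrityError("final record failed validation")
            yield record
            return
        if remaining < rs + 32:
            raise IntegrityError("truncated record or proof")
        chunk = body[pos:pos + rs + 32]  # record data plus next record's proof
        if expected is not None and hashlib.sha256(chunk + b"\x01").digest() != expected:
            raise IntegrityError("record failed validation")
        yield chunk[:rs]
        expected = chunk[rs:]  # expected proof for the next record
        pos += rs + 32
```

Because records are yielded progressively, a caller may act on earlier records before a later failure is detected; the security considerations below discuss why that can be hazardous.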
[RFC3230] describes digests applying to “the entire instance associated with the message”. The instance corresponds to the “representation” in Section 3 of [RFC7231], but unlike the existing digest algorithms, the “mi-sha256” digest algorithm specifies the top-level digest at the point when the “mi-sha256” content coding (Section 2) is applied to or removed from the representation.
When the “mi-sha256” digest algorithm is specified for a representation, the recipient MUST use the base64-decoding (Section 4 of [RFC4648]) of the “mi-sha256” digest as the top-proof for the “mi-sha256” content encoding (Section 2.2).
The recipient MUST behave as described by Section 4.2.9 of [I-D.ietf-httpbis-header-structure] if it encounters improper padding, non-zero padding bits, or non-alphabet characters, where rejecting the data means to reject the representation.
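One way to approximate these checks with Python's base64 module is sketched below. The function name is hypothetical, and the 32-octet length check is an added assumption based on the SHA-256 output size rather than a requirement stated here; the Structured Headers draft remains the authority on the exact rules.

```python
import base64


def parse_mi_sha256_digest(value: str) -> bytes:
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of discarding them; incorrect padding also raises.
        proof = base64.b64decode(value, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("invalid base64 in mi-sha256 digest") from exc
    # Re-encoding gives the canonical form; a mismatch indicates non-zero
    # padding bits or another non-canonical encoding.
    if base64.b64encode(proof).decode("ascii") != value:
        raise ValueError("non-canonical base64 in mi-sha256 digest")
    if len(proof) != 32:
        raise ValueError("expected a 32-octet SHA-256 value")
    return proof
```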
If different mechanisms specify different top-proof values for the “mi-sha256” content encoding, the recipient MUST reject the representation.
If “mi-sha256” content coding has not been applied to the representation exactly once (Section 3.1.2.2 of [RFC7231]), the recipient MUST reject the representation.
When rejecting the representation, clients SHOULD treat this as equivalent to a server error, and servers SHOULD generate a 400 or other 4xx status code.
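The two rejection rules above (conflicting top-proof values, and the content coding not being applied exactly once) can be sketched as follows. The sketch assumes header field values have already been collected as lists of strings; the function names are hypothetical, and the coding parameter allows a draft-specific “mi-sha256-##” name to be checked instead.

```python
from typing import Iterable, List, Optional


def coding_applied_once(content_encoding_fields: Iterable[str],
                        coding: str = "mi-sha256") -> bool:
    """True if `coding` appears exactly once across Content-Encoding fields."""
    codings: List[str] = []
    for field in content_encoding_fields:
        # Content-Encoding is a comma-separated list, and content-coding
        # names are case-insensitive (Section 3.1.2.1 of RFC 7231).
        codings.extend(c.strip().lower() for c in field.split(",") if c.strip())
    return codings.count(coding.lower()) == 1


def agreed_top_proof(candidates: Iterable[Optional[bytes]]) -> Optional[bytes]:
    """Return the single top-proof offered by all mechanisms, or None if none.

    Raises ValueError when mechanisms disagree, so the caller can reject
    the representation.
    """
    values = {c for c in candidates if c is not None}
    if len(values) > 1:
        raise ValueError("conflicting top-proof values")
    return values.pop() if values else None
```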
RFC EDITOR: Please remove the next paragraph before publication.
Implementations of drafts of this specification MUST use a digest algorithm named the same as the “mi-sha256-##” content encoding they implement, with the meaning described for “mi-sha256” above.
The following example contains a short message. This contains just a single record, so there are no inline integrity proofs, just a single value in the mi-sha256 parameter of a Digest header field. The record size is prepended to the message body (shown here in angle brackets).
HTTP/1.1 200 OK
Digest: mi-sha256=dcRDgR2GM35DluAV13PzgnG6+pvQwPywfFvAu1UeFrs=
Content-Encoding: mi-sha256
Content-Length: 49

<0x0000000000000029>When I grow up, I want to be a watermelon
This example shows the same message as above, but with a smaller record size (16 octets). This results in two integrity proofs being included in the representation.
PUT /test HTTP/1.1
Host: example.com
Digest: mi-sha256=IVa9shfs0nyKEhHqtB3WVNANJ2Njm5KjQLjRtnbkYJ4=
Content-Encoding: mi-sha256
Content-Length: 113

<0x0000000000000010>When I grow up, 
OElbplJlPK+Rv6JNK6p5/515IaoPoZo+2elWL7OQ60A=
I want to be a w
iPMpmgExHPrbEX3/RvwP4d16fWlK4l++p75PUu_KyN0=
atermelon
Since the inline integrity proofs contain non-printing characters, these are shown here using the base64 encoding [RFC4648] with new lines between the original text and integrity proofs. Note that there is a single trailing space (0x20) on the first line.
The integrity of an entire message body depends on how the integrity proof for the first record is protected. If this value comes from the same place as the message, it provides only limited protection against transport-level errors, which TLS already protects against adequately.
Separate protection for header fields might be provided by other means if the first record retrieved is the first record in the message, but range requests do not allow for this option.
This integrity scheme permits the detection of truncated messages. However, it enables and even encourages processing of messages before the complete message has been received. Actions taken on a partial message can produce incorrect results. For example, a message could say “I need some 2mm copper cable, please send 100mm for evaluation purposes” then be truncated to “I need some 2mm copper cable, please send 100m”. A network-based attacker might be able to force this sort of truncation by delaying packets that contain the remainder of the message.
Whether it is safe to act on partial messages will depend on the nature of the message and the processing that is performed.
A new content encoding type is needed in order to define the use of a hash function other than SHA-256.
This memo registers the “mi-sha256” HTTP content-coding in the HTTP Content Codings Registry, as detailed in Section 2.
This memo registers the “mi-sha256” digest algorithm in the HTTP Digest Algorithm Values registry:
[FIPS180-4] | National Institute of Standards and Technology, "Secure Hash Standard", FIPS PUB 180-4, March 2012. |
[I-D.ietf-httpbis-header-structure] | Nottingham, M. and P. Kamp, "Structured Headers for HTTP", Internet-Draft draft-ietf-httpbis-header-structure-07, July 2018. |
[MERKLE] | Merkle, R., "A Digital Signature Based on a Conventional Encryption Function", International Cryptology Conference - CRYPTO, 1987. |
[RFC2119] | Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997. |
[RFC3230] | Mogul, J. and A. Van Hoff, "Instance Digests in HTTP", RFC 3230, DOI 10.17487/RFC3230, January 2002. |
[RFC4648] | Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006. |
[RFC7231] | Fielding, R. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content", RFC 7231, DOI 10.17487/RFC7231, June 2014. |
[RFC2818] | Rescorla, E., "HTTP Over TLS", RFC 2818, DOI 10.17487/RFC2818, May 2000. |
[RFC6962] | Laurie, B., Langley, A. and E. Kasper, "Certificate Transparency", RFC 6962, DOI 10.17487/RFC6962, June 2013. |
[RFC7233] | Fielding, R., Lafon, Y. and J. Reschke, "Hypertext Transfer Protocol (HTTP/1.1): Range Requests", RFC 7233, DOI 10.17487/RFC7233, June 2014. |
[RFC8446] | Rescorla, E., "The Transport Layer Security (TLS) Protocol Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018. |
[SRI] | Akhawe, D., Braun, F., Marier, F. and J. Weinberger, "Subresource Integrity", W3C CR, November 2015. |
David Benjamin and Erik Nygren both separately suggested that something like this might be valuable. James Manger and Eric Rescorla provided useful feedback.