Structure of an SSL (X.509) certificate

2020-05-22

I've been working on some tooling for pulling certificate information out of the certificate transparency logs on and off for a while. I started looking at this again after a few weeks away from it and I've forgotten quite a lot! I've even forgotten some of the basics of what makes a certificate. In this post I want to dive into the structure of a certificate, what it is made of at a high level. I won't talk much about how certificates are used in protocols (e.g. Transport Layer Security (TLS)).

This post started as a reference for myself but other folks may find it interesting or useful. It has a lot of external references to RFCs, which are stored in footnotes.

Originally posted on dev.to.

A quick note: SSL certificates are X.509 certificates. The term "SSL certificate" is deeply ingrained on the web and is still used everywhere, even though the SSL protocol itself should no longer be used.

Information in a certificate

We'll use the openssl cli to retrieve a certificate, then we can start looking into its structure. If the openssl cli is not installed you should be able to install it through your operating system's package manager.

$ openssl s_client -connect google.com:443 2>/dev/null < /dev/null \
    | sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > google.com.crt

$ cat google.com.crt
-----BEGIN CERTIFICATE-----
MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw
QjELMAkGA1UEBhMCVVMxHjAcBgNVBAoTFUdvb2dsZSBUcnVzdCBTZXJ2aWNlczET
MBEGA1UEAxMKR1RTIENBIDFPMTAeFw0yMDA0MTUyMDE2NDdaFw0yMDA3MDgyMDE2
...
V+hT9mqgeN10ryOWyN74CvBaw73K3hobSkDAyQS1HkbAqJP9VTuvjZl4PE0ndaIN
yiz/84k5xbSwxO++BuJgMUwj+WaLcvDW
-----END CERTIFICATE-----

$ openssl x509 -text -noout -in google.com.crt
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            9b:f1:8b:d7:9b:46:58:99:02:00:00:00:00:63:98:71
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C = US, O = Google Trust Services, CN = GTS CA 1O1
        Validity
            Not Before: Apr 15 20:16:47 2020 GMT
            Not After : Jul  8 20:16:47 2020 GMT
        Subject: C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:8e:a4:03:0d:0c:a7:1d:52:28:80:ba:89:51:b9:
                    45:7a:7a:60:33:a5:ab:25:a4:05:c8:32:d9:b6:5c:
                    ...
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Key Identifier:
                D0:7D:02:36:9B:CD:47:0B:C5:9C:51:0F:27:A7:70:65:5A:C5:50:E9
            X509v3 Authority Key Identifier:
                keyid:98:D1:F8:6E:10:EB:CF:9B:EC:60:9F:18:90:1B:A0:EB:7D:09:FD:2B

            Authority Information Access:
                OCSP - URI:http://ocsp.pki.goog/gts1o1
                CA Issuers - URI:http://pki.goog/gsr2/GTS1O1.crt

            X509v3 Subject Alternative Name:
                DNS:*.google.com, DNS:*.android.com, DNS:*.appengine.google.com, ...
            X509v3 Certificate Policies:
                Policy: 2.23.140.1.2.2
                Policy: 1.3.6.1.4.1.11129.2.5.3

            X509v3 CRL Distribution Points:

                Full Name:
                  URI:http://crl.pki.goog/GTS1O1.crl

            CT Precertificate SCTs:
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : B2:1E:05:CC:8B:A2:CD:8A:20:4E:87:66:F9:2B:B9:8A:
                                25:20:67:6B:DA:FA:70:E7:B2:49:53:2D:EF:8B:90:5E
                    Timestamp : Apr 15 21:16:49.089 2020 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:45:02:20:26:77:E7:A4:C6:F9:D3:C0:0E:95:15:3C:
                                A2:08:F0:DB:77:9F:1F:7A:EC:7A:26:9B:E8:82:95:33:
                                ...
                Signed Certificate Timestamp:
                    Version   : v1 (0x0)
                    Log ID    : 5E:A7:73:F9:DF:56:C0:E7:B5:36:48:7D:D0:49:E0:32:
                                7A:91:9A:0C:84:A1:12:12:84:18:75:96:81:71:45:58
                    Timestamp : Apr 15 21:16:49.137 2020 GMT
                    Extensions: none
                    Signature : ecdsa-with-SHA256
                                30:45:02:21:00:FC:5A:10:9B:63:81:BB:16:81:8B:D5:
                                88:AF:09:A1:D8:83:FD:C3:86:CB:B1:CD:55:71:FF:76:
                                ...
    Signature Algorithm: sha256WithRSAEncryption
         20:69:ba:0b:e5:b4:7a:36:f7:4f:d2:b2:0f:0d:c1:10:b0:12:
         7e:13:f9:f1:ca:6c:a0:c2:46:21:fb:8a:fd:a8:66:a9:96:43:
         ...

There's quite a lot of information in the certificate. Before we break this down, a quick side note on the openssl command we used in the above code block. Feel free to skip this next section if you understood it.

Side note on the openssl command

The command we used was:

$ openssl s_client -connect google.com:443 2>/dev/null < /dev/null \
    | sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > google.com.crt

openssl s_client is used for connecting to hosts over TLS (and originally SSL, but no server should be using this anymore...). With -connect we tell it to connect to google.com on port 443. 2>/dev/null redirects anything the openssl s_client command writes to stderr into /dev/null; essentially this says ignore stderr. < /dev/null feeds /dev/null into the stdin of the process, in this case the stdin of openssl. Reading from /dev/null always returns an end of file.

Without this stdin redirect the openssl s_client command will hang waiting for input on stdin. With the redirect, as soon as it tries to read from stdin it reads an end of file and the read finishes. The last part of the command, | sed -n '/BEGIN CERTIFICATE/,/END CERTIFICATE/p' > google.com.crt, pipes the output from openssl s_client into sed, which pulls the certificate out of the output, and the result is redirected to a file called google.com.crt.
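If you'd rather do the sed step in a script, here is a minimal Python sketch of the same idea. It is not part of the original command: extract_first_certificate is a name made up for illustration, and unlike sed (which prints every matching range) it only keeps the first PEM block it finds.

```python
import re

# Matches everything from the BEGIN marker to the first END marker after it.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----",
    re.DOTALL,
)


def extract_first_certificate(text: str) -> str:
    """Return the first PEM certificate block found in text (hypothetical helper)."""
    match = PEM_BLOCK.search(text)
    if match is None:
        raise ValueError("no PEM certificate block found")
    return match.group(0) + "\n"
```

You would feed it the text that openssl s_client prints to stdout and write the return value to google.com.crt.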

So why does it wait for input? There are a few commands you can send it. Normally I'd just send Q to quit, however you can also tell it to send an HTTP request. Try it! Run openssl s_client -connect google.com:443, then when it hangs type in:

GET / HTTP/1.1

And press enter twice. It should return the HTML, CSS and JavaScript that make up google.com. That was a bit of a digression, back to certs now.

A breakdown of the main fields

Now we have google.com's cert, what do all the fields mean? Let's break it down one by one. As we do this I will also mention the type and structure of each field in the ASN.1 (Abstract Syntax Notation One) [1] specification for certificates and try to explain things which may not be obvious. Wherever I outline a part of the definition I will start it with a comment # ASN.1. I will also strip out pieces of information which are not relevant to a particular section and replace them with .... The full ASN.1 definition can be found in Appendix A.1 of RFC 5280 - X.509 Public Key Infrastructure.

Note that the ASN.1 spec for certificates describes a high level representation of certificate information. As bytes, this information is further encoded as DER (Distinguished Encoding Rules). I won't go into the details of ASN.1 -> DER encoding [2], but this post should lay some groundwork to make this encoding clearer in a future post.

Above we used openssl to pull out the information from an existing cert. The fields mentioned there can have many possible values. The main structure is defined as follows in ASN.1 - don't worry if you don't understand it, we will cover each field in depth.

# ASN.1
Certificate  ::=  SEQUENCE  {
     tbsCertificate       TBSCertificate,
     signatureAlgorithm   AlgorithmIdentifier,
     signature            BIT STRING  }

TBSCertificate  ::=  SEQUENCE  {
     version         [0]  Version DEFAULT v1,
     serialNumber         CertificateSerialNumber,
     signature            AlgorithmIdentifier,
     issuer               Name,
     validity             Validity,
     subject              Name,
     subjectPublicKeyInfo SubjectPublicKeyInfo,
     issuerUniqueID  [1]  IMPLICIT UniqueIdentifier OPTIONAL,
                          -- If present, version MUST be v2 or v3
     subjectUniqueID [2]  IMPLICIT UniqueIdentifier OPTIONAL,
                          -- If present, version MUST be v2 or v3
     extensions      [3]  Extensions OPTIONAL
                          -- If present, version MUST be v3 --  }
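The two nested SEQUENCEs in this definition can be seen directly in the bytes. Without going into DER properly, here is a rough Python sketch that reads only the tag and length at the start of a DER element. read_header is a hypothetical helper and it assumes single-byte tags; the input is just the first base64 line of google.com.crt shown earlier, so we can look at headers but not at whole values.

```python
import base64

# First line of the PEM body of google.com.crt, as shown earlier in the post.
FIRST_LINE = "MIIJRDCCCCygAwIBAgIRAJvxi9ebRliZAgAAAABjmHEwDQYJKoZIhvcNAQELBQAw"


def read_header(data: bytes, offset: int = 0):
    """Return (tag, length, content_offset) for the DER element at offset."""
    tag = data[offset]
    first = data[offset + 1]
    if first < 0x80:
        # Short form: the byte itself is the length.
        return tag, first, offset + 2
    # Long form: the low 7 bits say how many length bytes follow.
    count = first & 0x7F
    length = int.from_bytes(data[offset + 2:offset + 2 + count], "big")
    return tag, length, offset + 2 + count


der = base64.b64decode(FIRST_LINE)
certificate = read_header(der)                       # outer Certificate
tbs_certificate = read_header(der, certificate[2])   # first field inside it
```

Both headers carry tag 0x30, the tag for a SEQUENCE: the first is the outer Certificate and the second, which starts where the Certificate's content starts, is the TBSCertificate.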

Certificate

Certificate is an ASN.1 SEQUENCE. A SEQUENCE is an ordered list of values. In this case a list with tbsCertificate, signatureAlgorithm and signature. First let's look at signature and signatureAlgorithm.

Signature and Signature Algorithm

In the openssl output for google.com's cert both Signature [3] and Signature Algorithm [4] can be seen right at the bottom:

Signature Algorithm: sha256WithRSAEncryption
     20:69:ba:0b:e5:b4:7a:36:f7:4f:d2:b2:0f:0d:c1:10:b0:12:
     7e:13:f9:f1:ca:6c:a0:c2:46:21:fb:8a:fd:a8:66:a9:96:43:
     ...

I've stripped out part of the Signature and replaced it with ... as it's not relevant. The Signature Algorithm field indicates the algorithm used by the issuing Certificate Authority (CA) to sign this certificate. Here its value is sha256WithRSAEncryption. For more information on this algorithm see Section 5 of RFC 4055 - Additional Algorithms and Identifiers for RSA ... and RFC 2313 - PKCS #1: RSA Encryption. signature and signatureAlgorithm are defined as follows in the ASN.1 spec:

# ASN.1
Certificate  ::=  SEQUENCE  {
     ...
     signatureAlgorithm   AlgorithmIdentifier,
     signature            BIT STRING  }

signature has type BIT STRING, which is simply a string of bits. In this case those bits contain a digital signature computed over the DER encoding of the tbsCertificate field of this cert, using the algorithm identified in signatureAlgorithm. openssl prints the signature as colon-separated hexadecimal, e.g. 20:69:ba:0b:e5:b4.... signatureAlgorithm is of type AlgorithmIdentifier. AlgorithmIdentifier itself is a SEQUENCE, in this case an ordered list with two values, algorithm and parameters.

# ASN.1
AlgorithmIdentifier  ::=  SEQUENCE  {
     algorithm               OBJECT IDENTIFIER,
     parameters              ANY DEFINED BY algorithm OPTIONAL  }
                                -- contains a value of the type
                                -- registered for use with the
                                -- algorithm object identifier value

Here algorithm has type OBJECT IDENTIFIER. An OBJECT IDENTIFIER, or OID, is a standard way of identifying objects, standardised by the International Telecommunication Union (ITU). RFC 3061 - A URN Namespace of Object Identifiers gives some background on OIDs, and they are definitely not something I'll go into more detail on in this post, it is a big topic! For the curious, the OID for sha256WithRSAEncryption is 1.2.840.113549.1.1.11 [5]. It is defined in RFC 4055 Section 5; to fully understand that definition you may have to go down a rabbit hole of RFCs...
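As a small taste of what sits behind an OID, here is a hedged Python sketch that turns the DER content bytes of an OBJECT IDENTIFIER back into dotted form. decode_oid is a name made up for illustration; the input bytes are the nine bytes that follow the 06 09 header when the first base64 line of google.com.crt is decoded.

```python
def decode_oid(content: bytes) -> str:
    """Decode the content bytes of a DER OBJECT IDENTIFIER (illustrative helper)."""
    values = []
    current = 0
    for byte in content:
        # Each byte carries 7 bits of value; a set high bit means "more to come".
        current = (current << 7) | (byte & 0x7F)
        if not byte & 0x80:
            values.append(current)
            current = 0
    # The first value packs the first two arcs together as 40 * arc1 + arc2.
    first = values[0]
    if first < 40:
        arcs = [0, first]
    elif first < 80:
        arcs = [1, first - 40]
    else:
        arcs = [2, first - 80]
    arcs.extend(values[1:])
    return ".".join(str(arc) for arc in arcs)


SHA256_WITH_RSA = bytes.fromhex("2a864886f70d01010b")
print(decode_oid(SHA256_WITH_RSA))  # 1.2.840.113549.1.1.11
```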

parameters defines parameters to the specified algorithm. We will see an example of this in the Subject and Subject Public Key Info section.

TBSCertificate

TBSCertificate contains the information on the subject of the certificate and the issuing CA. We will cover all fields except extensions, issuerUniqueID and subjectUniqueID. There is quite a lot in extensions, so I will leave it for another post. issuerUniqueID and subjectUniqueID are optional and do not appear in google.com's cert. It is also recommended that these not be set [6].

Version

In the cert we decoded, the Version [7] is as follows:

Version: 3 (0x2)

Version is defined as:

# ASN.1
TBSCertificate  ::=  SEQUENCE  {
     version         [0]  Version DEFAULT v1,
     ...
}
...
Version  ::=  INTEGER  {  v1(0), v2(1), v3(2)  }

This says the version field of a certificate can be one of three values: 0 which means the version is v1, 1 which means the version is v2 and 2 which means the version is v3. It defaults to v1, which is 0. This explains why there is a 3 and a 0x2 in the output from openssl: it's showing the version number (3) and the real value of the field (2).
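The off-by-one between the stored value and the displayed version is easy to show in a couple of lines of Python. This is only a sketch; describe_version is a hypothetical name, and the output format simply imitates what openssl printed above.

```python
# Named values from the ASN.1 definition: v1(0), v2(1), v3(2).
VERSION_NAMES = {0: "v1", 1: "v2", 2: "v3"}


def describe_version(raw_value: int = 0) -> str:
    """Format a version the way openssl did above; absent field means v1 (0)."""
    if raw_value not in VERSION_NAMES:
        raise ValueError(f"unknown X.509 version value: {raw_value}")
    return f"{raw_value + 1} (0x{raw_value:x})"
```

Calling describe_version(2) gives the same "3 (0x2)" that appears in the decoded cert, and calling it with no argument models the DEFAULT v1 case.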

So what does version mean? It just indicates which version of the X.509 spec a given cert is using, and there are three versions of X.509 certificates. I will gloss over the differences between the versions in this post as it's not too important for an overview. You will mainly see X.509 v3 certificates in the wild.

Serial Number

The Serial Number [8] is a unique integer given to the certificate. It is unique among all certificates issued by the same CA, e.g. DigiCert or Let's Encrypt, but not globally unique, meaning two certs from different CAs could potentially have the same Serial Number. Serial Number looks as follows in the cert we decoded:

Serial Number:
    9b:f1:8b:d7:9b:46:58:99:02:00:00:00:00:63:98:71

And its ASN.1 definition:

# ASN.1
TBSCertificate  ::=  SEQUENCE  {
     ...
     serialNumber         CertificateSerialNumber,
     ...
}

CertificateSerialNumber  ::=  INTEGER

It is of type CertificateSerialNumber, which is just an alias for an integer.
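Since it is just an integer, the colon-separated hex that openssl prints can be turned into a number directly. The Python sketch below also shows one DER detail: an INTEGER is stored in two's complement, so a positive value whose top bit is set (0x9b here) gets a leading 00 byte. der_integer_content is a hypothetical helper and only handles non-negative values.

```python
SERIAL_HEX = "9b:f1:8b:d7:9b:46:58:99:02:00:00:00:00:63:98:71"


def der_integer_content(value: int) -> bytes:
    """Content bytes of a DER INTEGER for a non-negative value (sketch only)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # One extra bit is needed for the sign, hence the + 1 after integer division.
    length = value.bit_length() // 8 + 1
    return value.to_bytes(length, "big", signed=True)


serial = int(SERIAL_HEX.replace(":", ""), 16)
content = der_integer_content(serial)  # 17 bytes: 00 followed by the 16 above
```

This lines up with the start of the cert itself, where the serial appears as 02 11 00 9b f1 ...: tag 02 (INTEGER), length 0x11 (17), then the padded value.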

Signature Algorithm

This field [9] must contain the same algorithm as the signatureAlgorithm field outside the tbsCertificate, which we discussed in the Signature and Signature Algorithm section. And in this case it does:

Signature Algorithm: sha256WithRSAEncryption

This is the signature field in the TBSCertificate:

# ASN.1
TBSCertificate  ::=  SEQUENCE  {
     ...
     signature            AlgorithmIdentifier,
     ...
}

We looked at AlgorithmIdentifier in Signature and Signature Algorithm.

This field is duplicated, why?

This was interesting as the reason for the duplication was not clear. We saw the Signature Algorithm appear in the top level Certificate and it also appears inside the TBSCertificate. Section 4.1.2.3 of RFC 5280, which talks about the signature field within TBSCertificate, states:

This field MUST contain the same algorithm identifier as the signatureAlgorithm field in the sequence Certificate (Section 4.1.1.2)

But nowhere in that RFC does it give a reason. Section 1 of RFC 6211 - Cryptographic Message Syntax (CMS) gives some clue as to a possible reason - to prevent algorithm substitution attacks:

The Cryptographic Message Syntax [CMS], unlike X.509/PKIX certificates [RFC5280], is vulnerable to algorithm substitution attacks. In an algorithm substitution attack, the attacker changes either the algorithm being used or the parameters of the algorithm in order to change the result of a signature verification process. In X.509 certificates, the signature algorithm is protected because it is duplicated in the TBSCertificate.signature field with the proviso that the validator is to compare both fields as part of the signature validation process.

I don't know enough about this to comment further. But I am currently investigating. Once I understand more I will write about it.

Issuer

The Issuer [10] field is a unique identifier for the CA issuing this certificate.

Issuer: C = US, O = Google Trust Services, CN = GTS CA 1O1

The cert we decoded was issued by Google Trust Services. Google have a number of CAs under Google Trust Services, see https://pki.goog/ for more details. The Issuer field along with the Serial Number will uniquely identify a certificate, as long as the Issuer is a globally trusted CA.

Issuer is defined as a Name in the spec:

# ASN.1
TBSCertificate  ::=  SEQUENCE  {
     ...
     issuer               Name,
     ...
}

Name itself is a bit weird:

# ASN.1
Name ::= CHOICE { -- only one possibility for now --
      rdnSequence  RDNSequence }

RDNSequence ::= SEQUENCE OF RelativeDistinguishedName

...

RelativeDistinguishedName ::= SET SIZE (1..MAX) OF AttributeTypeAndValue

# AttributeTypeAndValue is defined as follows
AttributeTypeAndValue ::= SEQUENCE {
     type     AttributeType,
     value    AttributeValue }

AttributeType ::= OBJECT IDENTIFIER

AttributeValue ::= ANY -- DEFINED BY AttributeType

CHOICE defines a list of options to pick from. So, Name is a choice with a single option, an RDNSequence, and it even says there is only one possibility for now. I don't really know why this is, it could be for forwards compatibility on changes to the spec.

Have a look at the definition of RDNSequence: it is defined as SEQUENCE OF RelativeDistinguishedName. This is different from the plain SEQUENCE we saw in the Signature and Signature Algorithm section. SEQUENCE OF defines a list of values which are all the same type, in this case they are all RelativeDistinguishedName.

We haven't seen SET yet either. This defines an unordered set of objects. SET SIZE (1..MAX) OF AttributeTypeAndValue defines a set of one or more AttributeTypeAndValue values. MAX is an ASN.1 keyword rather than a number defined elsewhere in the spec; in a size constraint like this it means there is no fixed upper bound.

In short, Name is a list of OID/value pairs where the value is some object bound by that OID. Section 2.3 of RFC 2253 - LDAP v3 describes RelativeDistinguishedName and its encoding in more depth.
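To make that shape concrete, here is a rough Python sketch that rebuilds the list-of-sets structure from the one-line form openssl prints. parse_openssl_name and ATTRIBUTE_OIDS are names made up for illustration. The sketch assumes one attribute per RelativeDistinguishedName and no commas inside values, which holds for the Issuer above but not for every certificate.

```python
# Short names as printed by openssl, mapped to their attribute type OIDs.
ATTRIBUTE_OIDS = {
    "C": "2.5.4.6",    # countryName
    "ST": "2.5.4.8",   # stateOrProvinceName
    "L": "2.5.4.7",    # localityName
    "O": "2.5.4.10",   # organizationName
    "CN": "2.5.4.3",   # commonName
}


def parse_openssl_name(text: str):
    """Turn 'C = US, O = ...' into a list (RDNSequence) of lists (RDN sets)."""
    rdn_sequence = []
    for part in text.split(", "):
        short_name, value = part.split(" = ", 1)
        # Each RDN is modelled as a list holding a single (type, value) pair.
        rdn_sequence.append([(ATTRIBUTE_OIDS[short_name], value)])
    return rdn_sequence


issuer = parse_openssl_name("C = US, O = Google Trust Services, CN = GTS CA 1O1")
```

The outer list plays the role of the SEQUENCE OF, each inner list stands in for a SET, and each tuple is an AttributeTypeAndValue.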

Validity

Validity [11] specifies the time window during which a certificate is valid.

Validity
    Not Before: Apr 15 20:16:47 2020 GMT
    Not After : Jul  8 20:16:47 2020 GMT

The cert we decoded is valid from Apr 15 20:16:47 2020 GMT to Jul 8 20:16:47 2020 GMT. The certificate is invalid outside of this timeframe.

# ASN.1
TBSCertificate  ::=  SEQUENCE  {
     ...
     validity             Validity,
     ...
}
...
Validity ::= SEQUENCE {
     notBefore      Time,
     notAfter       Time  }

Time ::= CHOICE {
     utcTime        UTCTime,
     generalTime    GeneralizedTime }

Validity has two fields which are both of type Time. Time can be one of two types, UTCTime [12] or GeneralizedTime [13]. Dates through the year 2049 must be encoded as UTCTime and dates in 2050 or later must be encoded as GeneralizedTime. This is outlined in the definition of Validity in RFC 5280.
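The two-digit year in UTCTime is the reason for the 2050 cut-off. Here is a hedged Python sketch of how the YYMMDDHHMMSSZ form can be read using the century rule from RFC 5280 (YY below 50 means 20YY, otherwise 19YY). parse_utctime and time_type_for are hypothetical names, and the sample strings used with them are simply the Not Before and Not After values above written in that form.

```python
from datetime import datetime, timezone


def parse_utctime(text: str) -> datetime:
    """Parse a YYMMDDHHMMSSZ UTCTime string using the RFC 5280 century rule."""
    if len(text) != 13 or not text.endswith("Z") or not text[:12].isdigit():
        raise ValueError(f"not a YYMMDDHHMMSSZ string: {text!r}")
    yy = int(text[0:2])
    year = 2000 + yy if yy < 50 else 1900 + yy
    month, day, hour, minute, second = (
        int(text[i:i + 2]) for i in range(2, 12, 2)
    )
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)


def time_type_for(moment: datetime) -> str:
    """Which Time CHOICE a validity date is expected to use."""
    return "UTCTime" if moment.year <= 2049 else "GeneralizedTime"


not_before = parse_utctime("200415201647Z")
not_after = parse_utctime("200708201647Z")
```

The year is handled by hand rather than with strptime and %y, because %y applies its own century rule, which is not the one RFC 5280 specifies.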

Certificate validity and the Brazilian government

I stumbled across this issue on GitHub while on my certificate information seeking adventures. As mentioned above, dates before the year 2050 should be encoded as UTCTime, but the Brazilian government had their own specification which required the use of GeneralizedTime for all dates. This is a good example of what you see in an RFC/specification not being exactly what you see in a real system.

Subject and Subject Public Key Info

Subject [14] identifies the owner of the public key in the Subject Public Key Info [15] section; it is the "thing" this certificate identifies. In this case it's identifying Google domains.

Subject: C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com

Its definition is:

# ASN.1
TBSCertificate  ::=  SEQUENCE  {
    ...
    subject              Name,
    ...
}

Name was explained in the Issuer section, so it should be clear what this is. Subject Public Key Info has a few fields. Let's look at its ASN.1 definition first.

# ASN.1
TBSCertificate  ::=  SEQUENCE  {
    ...
    subjectPublicKeyInfo SubjectPublicKeyInfo,
    ...
}
...
SubjectPublicKeyInfo  ::=  SEQUENCE  {
        algorithm            AlgorithmIdentifier,
        subjectPublicKey     BIT STRING  }
...

Let's see the actual value of Subject Public Key Info in the certificate again:

Subject Public Key Info:
  Public Key Algorithm: id-ecPublicKey
      Public-Key: (256 bit)
      pub:
          04:0f:45:4e:2f:0c:a7:88:9a:b9:24:ff:57:50:dc:
          f1:ab:6e:dd:3e:7f:82:26:30:a7:12:9f:81:8a:27:
          9d:7d:06:2e:d3:e2:50:3b:ce:6c:2d:2e:5b:32:ce:
          7d:eb:86:06:7c:8c:29:2b:47:61:de:f0:ca:f8:b7:
          98:00:21:6a:34
      ASN1 OID: prime256v1
      NIST CURVE: P-256

Public Key Algorithm defines the algorithm the key can be used with. This is the algorithm inside the AlgorithmIdentifier of the SubjectPublicKeyInfo defined above, and it is also where parameters comes in: for id-ecPublicKey the parameters name the curve, which openssl shows as ASN1 OID: prime256v1 (NIST CURVE: P-256). The pub bytes are the BIT STRING. The BIT STRING itself is constrained by the type of algorithm. The information in the fields the BIT STRING decodes to is beyond this post. If interested see Section 2.3.5 of RFC 3279 - Algorithms and Identifiers X.509 PKI ... for more on id-ecPublicKey.
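Without decoding anything properly, one thing can still be read off the pub bytes by eye: they start with 04 and there are 65 of them. For an elliptic curve key, a leading 04 marks an uncompressed point, which is followed by the X and Y coordinates, 32 bytes each on a 256 bit curve. The Python sketch below just does that split; split_uncompressed_point is a hypothetical helper and it does not check that the point is actually on the curve.

```python
# The pub bytes exactly as printed by openssl above.
PUB_HEX = (
    "04:0f:45:4e:2f:0c:a7:88:9a:b9:24:ff:57:50:dc:"
    "f1:ab:6e:dd:3e:7f:82:26:30:a7:12:9f:81:8a:27:"
    "9d:7d:06:2e:d3:e2:50:3b:ce:6c:2d:2e:5b:32:ce:"
    "7d:eb:86:06:7c:8c:29:2b:47:61:de:f0:ca:f8:b7:"
    "98:00:21:6a:34"
)


def split_uncompressed_point(point: bytes):
    """Split 04 || X || Y into the two coordinates as integers."""
    if len(point) % 2 != 1 or point[0] != 0x04:
        raise ValueError("not an uncompressed EC point")
    size = (len(point) - 1) // 2
    x = int.from_bytes(point[1:1 + size], "big")
    y = int.from_bytes(point[1 + size:], "big")
    return x, y


point = bytes.fromhex(PUB_HEX.replace(":", ""))
x, y = split_uncompressed_point(point)
```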

Conclusion

If you made it to here, kudos! I hope this has given you a better understanding of what really makes up a certificate, and the sheer complexity around X.509 in general. There is a lot I glossed over in this post which I hope to drill deeper into in the future.

In the next post I'll cover the TBSCertificate extensions fields.


[1] ASN.1 - see Introduction to ASN.1.

[2] ASN.1 to DER encoding - see ASN.1 encoding rules.

[3] Signature - see Section 4.1.1.3 of RFC 5280.

[4] Signature Algorithm in Certificate - see Section 4.1.1.2 of RFC 5280.

[5] OID for sha256WithRSAEncryption - see https://oidref.com/1.2.840.113549.1.1.11.

[6] issuerUniqueID and subjectUniqueID - see Section 4.1.2.8 of RFC 5280.

[7] Version - see Section 4.1.2.1 of RFC 5280.

[8] Serial Number - see Section 4.1.2.2 of RFC 5280.

[9] Signature Algorithm in TBSCertificate - see Section 4.1.2.3 of RFC 5280.

[10] Issuer - see Section 4.1.2.4 of RFC 5280.

[11] Validity - see Section 4.1.2.5 of RFC 5280.

[12] UTCTime - see Section 4.1.2.5.1 of RFC 5280.

[13] GeneralizedTime - see Section 4.1.2.5.2 of RFC 5280.

[14] Subject - see Section 4.1.2.6 of RFC 5280.

[15] Subject Public Key Info - see Section 4.1.2.7 of RFC 5280.

[16] X509v3 Key Usage - see Section 4.2.1.3 of RFC 5280.

[17] X509v3 Extended Key Usage - see Section 4.2.1.12 of RFC 5280.

[18] critical - see Section 4.2 of RFC 5280.

[19] TLS 1.3 Digital Signature - see Section 4.4.2.2 of RFC 8446.

[20] id-ecPublicKey - see Section 2.1 of RFC 5480.

[21] X509v3 Basic Constraints - see Section 4.2.1.9 of RFC 5280.