simphone security architecture (version 0.8)

Simphone implements state-of-the-art, 256-bit cryptographic security (the highest by today's standards) with the following features:

Simphone uses only standard cryptographic primitives, but combines them in simple ways to increase their security level and add safety margins.

The rest of this document describes the cryptographic procedures used in detail. It is assumed that the reader is familiar with cryptography. Readers who are familiar with cryptography but not with the TLS protocol may find it helpful to read RFC5246 and other relevant RFCs for details not repeated in this document.

1. Use of Openssl

The openssl library is used to implement the initial handshake between two connecting peers. The TLS handshake is first simplified (by removing all unnecessary features) and then strengthened, as described under Key Derivation.

Everything that could be removed from the openssl/crypto library at compile time without preventing the handshake from succeeding was eliminated by #defining the appropriate OPENSSL_NO_xxx symbols, including the following features:

In addition, the following features that could not be disabled at compile-time are always disabled at run-time:

Note that when a client initiates an Anonymous Key Exchange, more TLS ciphers, elliptic curves and signature algorithms are enabled. But if the server responds with anything other than what is required, the client closes the connection.

2. Symmetric Ciphers

The following block ciphers are currently implemented using the crypto++ library:

   name                      block size (bytes)   key size (bytes)   security level (bits)
   Rijndael (triple AES)     16                   96                 512
   Serpent                   16                   32                 256
   Twofish                   16                   32                 256
   RC6-32 (with 40 rounds)   16                   64                 256
   MARS-2                    16                   56                 256
   Camellia                  16                   32                 256
   ARIA                      16                   32                 256
   CAST-256                  16                   32                 256
   IDEA (with 16 rounds)     8                    16                 128
   DES-EDE3 (triple DES)     8                    24                 112

Ciphers are strengthened as follows:

The given key sizes are used for both encryption and key derivation (CBC-MAC). For encryption, ciphers are used in EAX mode, which allows arbitrary cipher block size.

A special composite cipher is used for file encryption and as a default cipher for data transfer. This cipher is defined as a random permutation of the first eight ciphers from the above table, all of them used with the same key. Single (not triple) rijndael is used as a part of the composite cipher. To decide which permutation of the eight ciphers to apply, an SHA2-256 hash of the encryption key is calculated and the first (unsigned big endian) 64 bits of this hash are taken modulo 40320, to obtain one of 40320 possible permutations of eight numbers. This composite cipher is then used in EAX mode to encrypt files or transfer data, and in ECB mode to implement a sub-RNG (it is also used as a hash function to obtain its own key from input entropy when generating a seed, just like other sub-RNGs; see Random Number Generation for details).
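The permutation selection described above can be sketched as follows. The hash step (SHA2-256 of the key, first 64 bits read as an unsigned big-endian integer, modulo 40320) follows the text. The mapping from the resulting number to a concrete ordering is not specified in this document, so the factorial-number-system decoding in index_to_permutation is an assumption made for illustration, and both function names are hypothetical.

```python
import hashlib

NUM_CIPHERS = 8
NUM_PERMUTATIONS = 40320  # 8! possible orderings of eight ciphers


def permutation_index(key: bytes) -> int:
    """SHA2-256 of the key, first 64 bits as unsigned big endian, modulo 40320."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PERMUTATIONS


def index_to_permutation(index: int, n: int = NUM_CIPHERS) -> list[int]:
    """Decode an index into an ordering of 0..n-1.

    Assumption: factorial-number-system decoding. The document does not say
    which index-to-permutation mapping Simphone actually uses.
    """
    remaining = list(range(n))
    order = []
    for size in range(n, 0, -1):
        index, position = divmod(index, size)
        order.append(remaining.pop(position))
    return order


if __name__ == "__main__":
    key = bytes(64)  # placeholder 64-byte composite cipher key
    print(index_to_permutation(permutation_index(key)))
```

Any bijection between 0..40319 and the orderings would serve equally well; the only property the text relies on is that both peers derive the same ordering from the same key.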

The key size of the composite cipher is 64 bytes; this key is truncated to the maximal key size each cipher supports according to the table above. For rijndael, the key is truncated to 32 bytes.

3. Anonymous Key Exchange

The following terms are used when describing handshakes (for brevity):

  • address is a simphone address of one of the peers (unless it's an IP address)
  • client is the peer that initiated a TCP connection
  • server is the peer that received a TCP connection
  • proxy is a third peer that facilitates a TCP connection between client and server
  • customer is a server which has no incoming connectivity on its own but is connected to a proxy

Before login (program start), each peer generates two anonymous self-signed secp256r1 elliptic curve keys, using RNG1. These keys are kept in memory; the first one is used only for the anonymous key exchange described in this chapter, while the second one is used only for the Proxy Key Exchange. In either case, the self-signature is not verified. Note that as far as the client is concerned, these keys aren't used, but server keys on any openssl-supported elliptic curve are accepted by the client.

The server calls SSL_accept with its anonymous key while the client calls SSL_connect with no key to do a TLS handshake. When the handshake has completed successfully, both peers have calculated a shared value known as TLS "premaster secret".

A key for the Twofish cipher is derived from this value as described under Key Derivation with only one particularity: if the premaster secret in this case is shorter than 64 bytes, it is replicated in order to become exactly 64 bytes long (or if it was longer than 64 bytes, it would be truncated to 64 bytes) before doing the calculations to derive the cipher key. In practice, the secp256r1 curve generates a 32-byte premaster secret, so it is simply duplicated to become 64 bytes long; however, the current proxy cipher can only take a 32-byte key, so the replicated data have no effect. The counter value for key derivation is 0.
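The length adjustment of the premaster secret can be illustrated with a short sketch. The function name fit_to_64 is hypothetical. The text only covers the 32-byte case in practice, so the repeat-then-truncate behaviour for lengths that do not divide 64 evenly is an assumption.

```python
def fit_to_64(secret: bytes, size: int = 64) -> bytes:
    """Replicate a short secret, or truncate a long one, to exactly `size` bytes.

    Assumption: a secret whose length does not divide `size` is repeated and
    then cut off; the document only describes the 32-byte case explicitly.
    """
    if not secret:
        raise ValueError("secret must not be empty")
    repeats = -(-size // len(secret))  # ceiling division
    return (secret * repeats)[:size]


if __name__ == "__main__":
    premaster = bytes(range(32))  # placeholder 32-byte secp256r1 secret
    adjusted = fit_to_64(premaster)
    print(len(adjusted), adjusted[:32] == adjusted[32:])
```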

As soon as this is done, the client (customer) sends a packet containing its version number and a list of supported proxy ciphers (as their unique 62-bit identifiers, the same as described under Cipher Setup). The only currently supported proxy cipher is Twofish.

This version request packet also contains the IP address and port number that the client (customer) has established a connection to. The server sends back a version reply packet containing its version number and the unique 62-bit identifier of the chosen proxy cipher; if none of the proxy ciphers supported by the client can be chosen, the proxy closes this connection immediately. If the "proxy" is actually the server that this client is trying to connect to, it sends back a zero 62-bit identifier, so the client will discard the Twofish (proxy cipher) key, as described under Proxy Key Exchange, as it is a direct connection (no relaying of packets).

The version reply packet also contains the IP address and source port number of the connecting client, as seen by the proxy; this also happens when a customer is connecting to its proxy. Both version request and version reply packets are encrypted exactly as described under Data Transfer. The tag size is fixed to 16 (equal to the block size of the Twofish cipher).

If the client and server version numbers are compatible, the EC and RSA key exchanges (described in the next two chapters) then take place.

4. EC (Elliptic Curve) Key Exchange

All data of this handshake are encrypted with the chosen proxy cipher (Twofish); the tag size is fixed to the block size of the used cipher.

The server now calls SSL_connect while the client first generates an ephemeral (temporary) EC key and calls SSL_accept. This means that the server sends a "client hello" TLS message which includes a 32-byte "client random" value, while the client sends a "server hello" TLS message, which includes, among other things, a 32-byte "server random" value and an X.509 certificate, which contains the EC public key and the RSA public key of the client. Both public keys are signed by the EC public key; this means that the EC public key is self-signed. The signatures are there only to comply with the TLS protocol; they are not verified.

When the server or the client receives the other peer's public keys, openssl calls a "verify" callback. This callback checks that:

  • at least two public keys were received
  • the first public key is of type EC and uses the brainpoolP512r1 curve
  • the first public key is actually a point on that curve
  • the second public key is of type RSA with a modulus size between 2048 and 16384 bits
  • the signature type of the first key is ECDSA/SHA2-512 or RSA/SHA2-512
  • the signature type of the second key is ECDSA/SHA2-512

If any of these checks fails, the verify callback fails. Otherwise, a RIPEMD-160 hash of the SHA2-256 hash of the concatenation of the EC public key followed by the RSA public key is calculated; this is the client's address. The server matches this address against its list of contacts to identify the client. If there is no match, the verify callback will either fail (if configured to refuse contact with strangers) or, by default, the user program will receive a contact request from a new contact with the calculated address (hash of the hash of both public keys). No response is sent to the client until this contact request is accepted by the human user (the client does receive the server's public key by default, but this can be disabled if not desired). A match against the list of blocked or deleted contacts will always fail the verify callback.
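The structural checks made by the verify callback (before the address is calculated) can be sketched as a pure function. This is not the openssl callback itself: the PublicKey record and its field names are hypothetical stand-ins for what would be parsed from the received certificate, and the modulus range is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass


@dataclass
class PublicKey:
    """Hypothetical summary of one received public key."""
    key_type: str            # "EC" or "RSA"
    signature_type: str      # e.g. "ECDSA/SHA2-512"
    curve: str = ""          # EC keys only
    on_curve: bool = False   # EC keys only: point validated against the curve
    modulus_bits: int = 0    # RSA keys only


def keys_acceptable(keys: list[PublicKey]) -> bool:
    """Apply the six checks listed above to the received keys."""
    if len(keys) < 2:
        return False
    ec, rsa = keys[0], keys[1]
    return (
        ec.key_type == "EC"
        and ec.curve == "brainpoolP512r1"
        and ec.on_curve
        and rsa.key_type == "RSA"
        and 2048 <= rsa.modulus_bits <= 16384  # assumed inclusive bounds
        and ec.signature_type in ("ECDSA/SHA2-512", "RSA/SHA2-512")
        and rsa.signature_type == "ECDSA/SHA2-512"
    )


if __name__ == "__main__":
    ec = PublicKey("EC", "ECDSA/SHA2-512", curve="brainpoolP512r1", on_curve=True)
    rsa = PublicKey("RSA", "ECDSA/SHA2-512", modulus_bits=4096)
    print(keys_acceptable([ec, rsa]))
```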

Should the verify callback fail, the TLS handshake is aborted and the TCP connection is closed immediately; the server does not reveal its public keys to the client in this case. Otherwise, the server proceeds to send a certificate containing its EC and RSA public keys. The client then verifies the keys in the same manner except it compares the address it was trying to connect to, to the address calculated from received public keys; a mismatch fails the verify callback at the client side.

During the handshake, the server receives the client's ephemeral key and generates its own ephemeral key on the same curve, which is sent to the client. Peers then use both ephemeral keys to calculate a premaster secret. The handshake is signed by the server using SHA2-512 and ECDSA through its non-ephemeral (permanent) EC key. After the handshake is authenticated and completed successfully, the server now checks that the used ephemeral EC key is on the brainpoolP512r1 curve and if this is so, the TLS handshake succeeds.

5. RSA Key Exchange

When/if the EC key exchange (handshake) succeeds, another handshake is initiated through the same TCP connection; all network packets of this exchange are encrypted with the composite cipher. The encryption key of the chosen proxy cipher is discarded and a new key for the composite cipher is derived from the premaster secret of the EC key exchange, as described under Key Derivation (the counter value is 0). Data is transferred as described under Data Transfer, but data packets are padded to a length divisible by 1448 instead of 181; raw (non-TLS) openssl RSA routines are called to implement RSA encryption and decryption.

First the client sends a packet as described by the table below. The server receives this packet and sends a response packet described by the same table. If hs was not sent by the client, this means that only status packets are being exchanged; the server closes the connection immediately after sending its response.

contents

client

server

fwd

rev

list of supported key exchange types (currently only RSA-XOR, described in this chapter)

hs

chosen key exchange type (currently RSA-XOR)

sh

tag size (the maximal block size of any supported block cipher, currently 16)

blk

blk

client_preferred: list of preferred ciphers

pref

client_supported: list of other supported but non-preferred ciphers (not including the composite cipher, which is always supported)

supp

encrypt_cipher: cipher to use for server-side encryption (client-side decryption)

encrypt

decrypt_cipher: cipher to use for server-side decryption (client-side encryption)

decrypt

revocation reason string (public key has been irreversibly revoked)

REVOKE

REVOKE

client-server protocol version number

vereq

verep

"

"

32-bit random identifier of this connection (generated by RNG0)

rnd

rnd

"

"

62-bit number of last received message from this peer

ack

ack

"

"

current status (off/on/away/busy/hide)

s

s

"

"

contact flags: allowed communication types (text chat, audio call, etc.)

f

f

"

"

number of last sent messages that contact wishes to allow you to edit

em

em

"

"

nickname or real name (as configured by the user). Present ONLY when sending a contact request

nick

"

client: IP address this connection was established to
server: IP address this connection was received from

to

from

"

"

port number this connection was established to

top

"

"

IP address of own proxy (not present if no proxy used)

pip

"

listening port number of own proxy (not present if no proxy used)

pp

"

own (internet) IP address

gip

gip

own (internet) listening port number

gp

gp

"

local (intranet) IP address

lip

lip

local (intranet) listening port number

lp

lp

list of supported audio codecs

ac

ac

name of insecure operating system (windows-10) or empty string if used system is secure

bad

bad

client flags: verify IP address
server flags: pending messages, willing to serve as a proxy etc.

flags

flags

"

minimally desired contact period. This reveals the number of authorized contacts of the server (unless manually overridden)

logon

"

number of milliseconds since the last status change. Present ONLY if both connecting peers have the same address

age

age

"

"

The columns of the above table have the following meanings:

  • client: sent by client under the specified name
  • server: sent by server under the specified name
  • fwd: sent ALSO when exchanging only a (forward) status packet
  • rev: sent ALSO when exchanging only a reverse status packet (client-only)

If vereq and verep are compatible, the rest of the packet is processed. Then, if hs was sent, both sides check the received blk and force it to 0 if less than 0 or to 256 if more than 256. This is used during data transfer as number of additional bytes at the beginning (IV) and end (hash tag) of each data packet.

Afterwards, both the client and server simultaneously use their RNG1 to generate a random value as long as the other peer's RSA public key (excluding 42 bytes, required by the used OAEP padding), encrypt it with that key and send it to the other peer. Each of them then receives this encrypted value from the other peer and decrypts it with its private RSA key.
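As a worked example of the length rule above: the random value is as long as the peer's RSA modulus in bytes, minus the 42 bytes of OAEP overhead. The figure of 42 matches OAEP with a 20-byte hash (2 * 20 + 2), which is an inference and not something this document states. The function name is hypothetical, and secrets.token_bytes merely stands in for RNG1.

```python
import secrets

OAEP_OVERHEAD = 42  # bytes, as stated in the text


def rsa_secret_length(modulus_bits: int) -> int:
    """Number of random bytes that fit into one OAEP-padded RSA block."""
    modulus_bytes = (modulus_bits + 7) // 8
    return modulus_bytes - OAEP_OVERHEAD


if __name__ == "__main__":
    for bits in (2048, 4096, 16384):
        length = rsa_secret_length(bits)
        value = secrets.token_bytes(length)  # stand-in for RNG1 output
        print(bits, length, len(value))
```

Even the smallest accepted modulus (2048 bits) yields 214 bytes, which is more than the 96 bytes that Key Derivation takes per derived key.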

The server then checks again whether the address of the client is already known and if not, handles this "contact request" by adding a new contact with the calculated address and "contact requested" status (not allowed to send or receive data until contact is accepted manually by the human user). There is a user-defined limit of the number of non-accepted contact requests that can be received within 24 hours; if more than that arrive, they are not added to the list of contacts and the handshake fails after the double public key exchange (TLS handshake and RSA handshake) has succeeded.

Finally, both sides check to see if they have stored public keys for their connected peer. If so, received public keys are compared byte-by-byte to the stored ones; if one or both of them are not identical, the handshake fails, and an event is sent to the user program, which displays an error message and plays a special alarm sound in an "infinite" loop. If there were no stored public keys for this contact, both received public keys are stored silently.
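The stored-key comparison amounts to trust on first use. A minimal sketch, assuming an in-memory dictionary as the key store; the function name, the store layout, and the returned strings are all hypothetical:

```python
def check_pinned_keys(store: dict, address: str, ec_key: bytes, rsa_key: bytes) -> str:
    """Compare received public keys with stored ones, storing them on first contact.

    Returns "stored" (first contact, keys saved silently), "match", or
    "mismatch" (the handshake must fail and the user must be alerted).
    """
    stored = store.get(address)
    if stored is None:
        store[address] = (ec_key, rsa_key)
        return "stored"
    if stored != (ec_key, rsa_key):  # either key differing is a failure
        return "mismatch"
    return "match"


if __name__ == "__main__":
    pins = {}
    print(check_pinned_keys(pins, "peer-address", b"ec-key", b"rsa-key"))
    print(check_pinned_keys(pins, "peer-address", b"ec-key", b"rsa-key"))
    print(check_pinned_keys(pins, "peer-address", b"ec-key", b"other-rsa-key"))
```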

If REVOKE has been received, data transfer proceeds only for the revoker to send a special key revocation packet containing an RSA-signature of the Whirlpool hash of concatenation of the revoker's EC public key followed by RSA public key followed by the revocation reason followed by REVOKED. The PSS-MGF1 padding scheme with SHA2-512 is used here. The receiver verifies this signature and on success blocks the revoking peer immediately and irreversibly. A verification failure causes the user program to display a warning and play a warning sound.

6. Key Derivation

openssl usually derives a "master secret" from its "premaster secret" and "client random" and "server random" values, and then proceeds to derive session keys from this master secret. But with Simphone, these steps are skipped and a symmetric cipher key is derived directly from the "premaster secret" and "client random" and "server random" values in a slightly different manner. Note that because CBC-MAC is used for this key derivation, the derived key also depends on the cipher to be used.

The other input values (in addition to the cipher) to this key derivation procedure are:

  • tls_premaster: "premaster secret" agreed upon during the EC handshake
  • client_random: public "client random" value sent by the server during the EC handshake
  • server_random: public "server random" value sent by the client during the EC handshake
  • counter: number of key to be derived
  • user_address: address of server (only if counter is 0)
  • RSA_client: secret random value generated by the client during the RSA handshake
  • RSA_server: secret random value generated by the server during the RSA handshake

tls_premaster is always as long as the elliptic curve size, that is 64 bytes (except for anonymous handshake with a 256-bit elliptic curve, in which case it is 32 bytes). TLS random values are 32 bytes each.

Let L be the block cipher's key size in bytes (as listed in the "symmetric ciphers" table). Then:

   if L < 32 then
     l = L
   else
     l = 32
   endif
   K0 = CBC-MAC-cipher (tls_premaster[0, l], client_random || server_random)
   K1 = HKDF-Whirlpool (tls_premaster[l, 64 - l], client_random || server_random)
   if counter < 2 then
     K2 = user_address
   else
     K2 = RSA_client[32 * (counter - 2), 96] xor RSA_server[32 * (counter - 2), 96]
   endif
   K = K0 xor K1[64 * counter, 96] xor K2

In the above equations, || denotes concatenation, xor is "exclusive or", array[i, n] are n bytes taken from array starting at (zero-based) index i, and K is the derived key.

All results are XORed to produce a derived key. Note that CBC-MAC produces a number of bytes equal to the cipher block size; HKDF-Whirlpool (as defined by RFC5869) produces "unlimited" output, of which 96 bytes are taken, while RSA_client and RSA_server are also quite long, but only up to 96 bytes per key are taken from them. Values are XORed from the beginning; this means that CBC-MAC does not participate in deriving the later bytes. The resulting 96-byte value is then truncated to be only as many bytes long as the cipher can take as a key.
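The final combination step can be illustrated in isolation. The sketch below only shows how values of different lengths are XORed from the beginning and then truncated to the cipher key size; it does not compute CBC-MAC or HKDF-Whirlpool, and the inputs in the example are placeholder bytes. Both function names are hypothetical.

```python
def xor_from_start(values: list[bytes], length: int = 96) -> bytes:
    """XOR all values into a `length`-byte buffer, aligned at index 0.

    Shorter values only affect the leading bytes, which is why CBC-MAC
    (one cipher block long) does not contribute to the later key bytes.
    """
    out = bytearray(length)
    for value in values:
        for i, byte in enumerate(value[:length]):
            out[i] ^= byte
    return bytes(out)


def combine_key(k0: bytes, k1_chunk: bytes, k2: bytes, key_size: int) -> bytes:
    """K = K0 xor K1-chunk xor K2, truncated to the cipher's key size."""
    return xor_from_start([k0, k1_chunk, k2])[:key_size]


if __name__ == "__main__":
    k0 = b"\x01" * 16   # placeholder for a 16-byte CBC-MAC result
    k1 = b"\x02" * 96   # placeholder for 96 bytes of HKDF output
    k2 = b"\x04" * 20   # placeholder for a 20-byte address (RIPEMD-160 size)
    print(combine_key(k0, k1, k2, 32).hex())
```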

This means that:

  • the number of bytes taken from the beginning of tls_premaster is equal to the key size of the used cipher but at most 32 bytes
  • the rest of tls_premaster is used to key HKDF-Whirlpool (at least 32 bytes) while the two public random values are used as a salt
  • the counter byte is used to derive multiple keys from the same inputs, taking a different 64-byte chunk of K1 for each derived key
  • RSA secret values are taken as they are (without applying any key derivation function)
  • different counter values use different bytes from RSA secret values; counter 0 uses the address

Note that when deriving from tls_premaster with rijndael, taking only 32 bytes of key material means that CBC-MAC is based on single (not triple) rijndael.

7. Cipher Setup

This part takes place during the already described RSA handshake. Both peers agree on cipher(s) that will be used to encrypt and decrypt data. No information about the cipher(s) agreed upon is revealed in packets sent over the network.

Each user has a non-empty list of preferred ciphers, which by default contains all of the supported ciphers that take at least a 32-byte key, except the composite cipher. During the cipher handshake, both peers attempt to agree on a random cipher preferred by both sides; if such a cipher doesn't exist, each side will use one of their preferred ciphers for their sending side of the (TCP and UDP) connection.

The server sets encrypt_cipher and decrypt_cipher to decide which cipher(s) will be used for encryption (client-side decryption) and decryption (client-side encryption) by applying the following algorithm:

   if intersection (server_preferred - composite, client_preferred) is non-empty then
     decrypt_cipher = encrypt_cipher = random_cipher (from intersection)
   else if composite in client_preferred and composite in server_preferred then
     decrypt_cipher = encrypt_cipher = composite
   else
     decrypt_cipher = composite
     if intersection (server_preferred, client_supported) is non-empty then
       encrypt_cipher = random_cipher (from intersection)
     else if composite in server_preferred or intersection (server_supported, client_supported) is empty then
       encrypt_cipher = composite
     else
       decrypt_cipher = encrypt_cipher = random_cipher (from intersection)
     endif
     if intersection (server_supported, client_preferred) is non-empty then
       decrypt_cipher = random_cipher (from intersection)
     else if composite in client_preferred then
       decrypt_cipher = composite
     else if intersection (client_supported, server_supported) is empty then
       decrypt_cipher = encrypt_cipher
     else if decrypt_cipher is composite then
       decrypt_cipher = random_cipher (from intersection)
     endif
   endif

In the above description, client_supported and server_supported never include the composite cipher.
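For experimentation, the selection algorithm can be transcribed into Python roughly as follows. This is a sketch, not the Simphone source: cipher names are plain strings, the function name is hypothetical, and each unqualified "from intersection" in the pseudocode is assumed to refer to the intersection of server_supported and client_supported. Candidates are sorted before the random choice only to make the sketch reproducible with a seeded generator.

```python
import random

COMPOSITE = "composite"


def choose_ciphers(server_preferred, server_supported,
                   client_preferred, client_supported, rng=random):
    """Return (encrypt_cipher, decrypt_cipher) from the server's point of view."""
    sp, ss = set(server_preferred), set(server_supported)
    cp, cs = set(client_preferred), set(client_supported)

    def pick(candidates):
        return rng.choice(sorted(candidates))

    common_preferred = (sp - {COMPOSITE}) & cp
    if common_preferred:
        cipher = pick(common_preferred)
        return cipher, cipher
    if COMPOSITE in cp and COMPOSITE in sp:
        return COMPOSITE, COMPOSITE

    common_supported = ss & cs  # assumed meaning of "from intersection"
    decrypt = COMPOSITE
    if sp & cs:
        encrypt = pick(sp & cs)
    elif COMPOSITE in sp or not common_supported:
        encrypt = COMPOSITE
    else:
        decrypt = encrypt = pick(common_supported)

    if ss & cp:
        decrypt = pick(ss & cp)
    elif COMPOSITE in cp:
        decrypt = COMPOSITE
    elif not common_supported:
        decrypt = encrypt
    elif decrypt == COMPOSITE:
        decrypt = pick(common_supported)
    return encrypt, decrypt


if __name__ == "__main__":
    print(choose_ciphers({"Serpent"}, {"Twofish"}, {"Twofish"}, {"Serpent"}))
```

In the example call, each side ends up sending with a cipher from its own preferred list (the server encrypts with Serpent and decrypts Twofish), which is the fallback behaviour described at the start of this chapter.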

Cipher lists sent over the network (client_preferred and client_supported) transmit a unique 62-bit identifier for each listed cipher. These are precalculated as a truncated hash tag of a 128-bit block full of zeros, encrypted in the cipher's mode (EAX mode) with a key derived from zeros, zero IV, zero sequence number and zero counter value.

encrypt_cipher and decrypt_cipher do not use these cipher identifiers. Instead, each of them contains RNG0-generated random bytes encrypted with the cipher and its would-be derived key. The encrypted string has a RNG0-generated random IV and a hash tag; the total number of encrypted bytes (including the IV and tag) is equal to three times the server tag size sent in the same packet. The client figures out which ciphers are requested by trying to decrypt the packet with all of its supported ciphers and chooses the cipher that could authenticate the bytes correctly. The sequence number is 0, while the counter value is as described in the next paragraph.

With the cipher(s) agreed upon, a symmetric key or keys are derived as described under Key Derivation; the composite cipher key used to protect the RSA key exchange is discarded. If the same cipher is used for both encryption and decryption, the counter value is 2. If different ciphers are used, the client-side encryption cipher uses a counter value of 2, while the client-side decryption (server-side encryption) cipher uses a counter value of 2 + ks, where ks is the client-side encryption cipher key size (counted as a whole number of 32-byte chunks).
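The counter assignment can be written out as a small helper. The function name is hypothetical, and rounding a key size that is not a multiple of 32 bytes up to the next whole chunk is an assumption; the text only says "a whole number of 32-byte chunks".

```python
def key_counters(client_encrypt_key_size: int, same_cipher: bool) -> tuple[int, int]:
    """Return (client-side encryption counter, client-side decryption counter)."""
    if same_cipher:
        return 2, 2
    ks = -(-client_encrypt_key_size // 32)  # assumption: round up to whole chunks
    return 2, 2 + ks


if __name__ == "__main__":
    print(key_counters(32, same_cipher=True))
    print(key_counters(32, same_cipher=False))
    print(key_counters(56, same_cipher=False))
```

Given the indexing RSA_client[32 * (counter - 2), 96] under Key Derivation, offsetting the second counter by ks appears intended to make the second key start after the RSA secret bytes consumed by the first.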

Note that the cipher handshake may force you to use a non-preferred cipher also for the sending side. In any case, if a non-preferred cipher is used (for either receiving or sending), an event is sent to the user program, which displays a warning.

8. Data Transfer

When the cipher handshake has completed successfully, data transfer can finally begin. Data is sent in packets, each of which has a 5-byte header, containing the data length (up to 16384) in bytes. This is compatible with the TLS protocol, but openssl is not called to send or receive the data. As required by this protocol, an invisible 64-bit "sequence number" is authenticated together with the data and incremented on each processed packet. These packets go over TCP.
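A 5-byte header of this kind can be sketched using the TLS record layout from RFC5246 (one byte of content type, two bytes of version, two bytes of length). The content type and version values below are assumptions; this document does not state which values Simphone writes into those fields. The function names are hypothetical.

```python
import struct

APPLICATION_DATA = 23   # assumption: TLS "application data" content type
VERSION = (3, 3)        # assumption: TLS 1.2 version bytes
MAX_LENGTH = 16384
HEADER_FORMAT = "!BBBH"  # network byte order, 1 + 1 + 1 + 2 = 5 bytes


def pack_header(length: int) -> bytes:
    """Build a 5-byte record header carrying the data length."""
    if not 0 <= length <= MAX_LENGTH:
        raise ValueError("length out of range")
    return struct.pack(HEADER_FORMAT, APPLICATION_DATA, *VERSION, length)


def unpack_length(header: bytes) -> int:
    """Extract and validate the data length from a 5-byte header."""
    _type, _major, _minor, length = struct.unpack(HEADER_FORMAT, header)
    if length > MAX_LENGTH:
        raise ValueError("length out of range")
    return length


if __name__ == "__main__":
    header = pack_header(181)
    print(header.hex(), unpack_length(header))
```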

Unlike TLS, where two different keys (both derived from the master key) are always used for the two sides of the connection and their sequence numbers always start at 0 for each side, a single key derived directly from the premaster secret is used when using the same cipher for both sides (if two different ciphers are used, then two different keys are used). The server starts its sequence numbers at 2; the client starts its sequence numbers at 2^62 + 2. The sequence numbers are incremented by 2 on each packet, so they are always even numbers.
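The sequence numbering can be sketched as a small counter class (the class name is hypothetical). Because the server starts at 2 and the client at 2^62 + 2, the two sides never produce the same sequence number under the shared key for any realistic number of packets.

```python
SERVER_START = 2
CLIENT_START = 2**62 + 2
STEP = 2


class SequenceCounter:
    """Per-direction TCP sequence number, always even."""

    def __init__(self, is_server: bool):
        self.value = SERVER_START if is_server else CLIENT_START

    def next(self) -> int:
        """Return the number for the current packet, then advance by 2."""
        current = self.value
        self.value += STEP
        return current


if __name__ == "__main__":
    server = SequenceCounter(is_server=True)
    client = SequenceCounter(is_server=False)
    print([server.next() for _ in range(3)])
    print(client.next() - 2**62)
```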

UDP data transfer (used only for audio calls) is started upon request and agreement of both parties as follows:

  • when a peer starts talking, "UDP request" packets are sent to the peer's known IP addresses and (UDP) port numbers
  • when a "UDP request" packet is received, a "UDP reply" response is sent back
  • when a "UDP reply" packet is received, a "UDP start" response is sent back

These packets are internally padded by RNG0-generated bytes, so they have at least the same size as audio packets that would follow.

Receiving a "UDP reply" or "UDP start" locks the socket to the IP address where the packet came from and any further UDP packets that arrive over this socket for the duration of the audio call are accepted only from that IP address. It also sets the UDP port for this peer, so that any further audio data packets will, from this point on, be sent via UDP to this port number (and the locked IP address).

UDP packets use the same encryption key as the TCP stream; they do not have a TLS or DTLS header. The sequence number of UDP control packets is fixed to -1 for the server and to 2^62 - 1 for the client. The sequence number of UDP audio data packets is fixed to -ss for the server and to 2^62 - cs for the client, where ss is the audio sampling rate for the server, and cs is the audio sampling rate for the client. Their contents can be up to 16384 bytes long (but in practice always less than 1472 bytes, so datagrams aren't fragmented). All UDP packets contain their own incrementing timestamps encrypted inside them.

Each data stream counts the number of bytes sent and received; when the sum of these two exceeds a certain limit (set to one gigabyte by default), the connection is closed by both sides. Both TCP and UDP data are taken into account. A limit of one gigabyte is adequate for ciphers with an 8-byte block size; if those aren't used (and they aren't by default), it should be safe for the user to increase this limit. The maximum allowed is one terabyte.

Data packets are always padded with random data to a length divisible by 181. The packet data contain a random IV, followed by the encrypted data, followed by padding of necessary length, followed finally by the hash tag (so the random padding is also encrypted and authenticated). The length of the IV and hash tag is the maximum of the block cipher's block size and blk (the tag size, for this side of the connection, sent during the RSA handshake). Thus ciphers with an 8-byte block size in the current version will still use 16 bytes as an IV/tag size which should make them indistinguishable from other ciphers to network packet observers. The receiver then will truncate the IV/tag to 8 bytes for those ciphers to process the packets.
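The padding arithmetic can be sketched as follows. Which parts of the packet count towards the padded length (data only, or data plus IV and tag) is not spelled out here, so the function simply takes whatever length is being padded; the function names are hypothetical.

```python
PAD_UNIT = 181


def padding_needed(length: int, unit: int = PAD_UNIT) -> int:
    """Number of random padding bytes that brings `length` up to a multiple of `unit`."""
    return -length % unit


def iv_tag_size(block_size: int, blk: int) -> int:
    """IV and hash tag length: the larger of the cipher block size and blk."""
    return max(block_size, blk)


if __name__ == "__main__":
    for length in (100, 181, 182):
        print(length, padding_needed(length))
    print(padding_needed(1000, unit=1448))  # unit used during the RSA key exchange
    print(iv_tag_size(block_size=8, blk=16))
```

Note that 1448 is 8 * 181, so packets padded for the RSA key exchange also satisfy the divisibility by 181 used for ordinary data packets.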

Audio packets have a constant size (variable bit-rate is not used), so they are not padded, regardless of whether they are sent over a direct (TCP or UDP) or over a relayed connection.

IVs are generated by RNG0 and then encrypted with the cipher in EAX mode with a random pre-IV (also generated by RNG0) and zero sequence number before being sent over the network. The hash tag and the pre-IV of this encryption is discarded (see Random Number Generation for details on RNG0).

At the receiving side, if a packet does not pass authentication, it is discarded before decryption and the connection to the peer is closed immediately, as required by the TLS protocol. Alerts are neither sent nor processed. However, a successfully authenticated packet that has corrupted contents will send an event to the user program, if detected. The user program displays a warning and plays a special warning sound.

9. Proxy Key Exchange

The previous chapters fully describe direct communication between two peers. However, if a server has no incoming connectivity, it is necessary for a proxy to help others establish connections to this server. When they want to connect to this server, they actually connect to its proxy, which does have incoming connectivity and can relay packets between both sides.

Any user who has incoming connectivity can serve as a proxy for any other user, but contacts are preferred to non-contacts when choosing a proxy. The key exchange through a proxy differs from a direct handshake as described below.

The proxy key exchange takes place between customer and proxy, while client does only an anonymous key exchange with the customer's proxy. EC and RSA key exchanges take place between client and server, through the proxy. The proxy uses the chosen proxy cipher to decrypt received packets from client and re-encrypts them with the chosen proxy cipher before forwarding them to its customer. The same is done when forwarding packets from customer to client. For encrypting packets to client, the proxy uses a key derived from the anonymous handshake with client; for encrypting packets to customer, the proxy uses a key derived from the EC handshake with customer.

After all key exchanges succeed, the Twofish (chosen proxy cipher) keys are NOT discarded by either client or customer (if the 62-bit cipher identifier sent by the proxy with the version reply packet is non-zero); data already encrypted with the client-to-server key are re-encrypted for the proxy with the chosen proxy cipher (client-to-proxy or server-to-proxy). Data received from the proxy by either client or customer are first decrypted with the chosen proxy cipher key and then decrypted with the client-to-server key.

The client always knows the address it is connecting to, so it can easily use it in its key derivation procedure. At the proxy side, the address a client is trying to connect to can be one of the following:

  • the proxy's address (a direct client-to-server connection where the "proxy" is actually the server)
  • address of one of the proxy's customers
  • address = 0 (client is actually a new customer requesting a proxy service, also known as the "control" connection of that customer)
  • address = 1 (special NAT traverser connection, which only reports source port number and IP address back to the anonymous requester)
  • any other address is invalid

The proxy finds out which of these five is the case by trying to authenticate and decrypt the client's version packet using encryption keys derived from its own address, zero and one, and addresses of each of its current customers. Should all these decryption attempts fail, the TCP connection is closed immediately.

If the version packet was successfully decrypted with an address of one of the customers, the proxy starts forwarding packets between client and customer as described above.

If it was decrypted with a zero address, an EC key exchange takes place where the customer sends both of its public keys (as described under EC Key Exchange), but the proxy usually sends its anonymous key. This means the proxy always identifies its customer while the customer cannot identify the proxy. A new key for the chosen proxy cipher is derived using the brainpoolP512r1 curve. Any further data transferred through the control connection is then encrypted with this key. Control packets are padded to a length divisible by 181; the rest have already been padded by the proxy's customer and client, so the proxy forwards them as they are. Tag size is fixed to 16.

In the special case where a customer is an authorized contact of the proxy, the proxy sends both of its public keys instead of the anonymous key. Only in this case can the client identify the proxy as its contact; it does not award any special privileges to the proxy other than not disconnecting from it when another contact shows up.

The single TCP control connection of a customer can serve many connected clients, which are identified by a unique 16-bit client number at the proxy; this number is added to all packets and used by the customer to demultiplex them. Those packets are also encrypted with the chosen proxy cipher in both directions. Note that encapsulated packets that flow between customer and client through the proxy are encrypted as usual, before being re-encrypted with the chosen proxy cipher. When a customer sends a packet to the proxy to forward, the proxy uses the 16-bit client number to forward the packet to the right client, but first checks whether this client belongs to the customer that sent the packet; otherwise, the packet is discarded. When a new client connects to the proxy, a special "connection request" control packet is sent to the customer, which includes a new client number, the IP address of the connecting client and the IP address of the proxy that the client has connected to. Closing the TCP connection at either side sends a special "close" control request over the customer connection (with client number) or client connection, which triggers the proxy to close the other side of the relayed connection.
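The ownership check that the proxy performs before forwarding can be sketched with a simple table. The class and method names are hypothetical, and the way client numbers are allocated (next free 16-bit value, wrapping around) is an assumption; the text only requires that each number be unique at the proxy.

```python
class ProxyTable:
    """Maps 16-bit client numbers to the customer that owns each relayed client."""

    MAX_CLIENTS = 1 << 16

    def __init__(self):
        self.owner = {}        # client number -> customer address
        self.next_number = 0

    def register_client(self, customer: str) -> int:
        """Assign a free client number to a newly connected client."""
        for _ in range(self.MAX_CLIENTS):
            number = self.next_number
            self.next_number = (number + 1) % self.MAX_CLIENTS
            if number not in self.owner:
                self.owner[number] = customer
                return number
        raise RuntimeError("no free client numbers")

    def may_forward(self, customer: str, client_number: int) -> bool:
        """True only if this client belongs to the customer that sent the packet."""
        return self.owner.get(client_number) == customer

    def close(self, client_number: int) -> None:
        """Forget a client when either side of the relayed connection closes."""
        self.owner.pop(client_number, None)


if __name__ == "__main__":
    table = ProxyTable()
    number = table.register_client("customer-A")
    print(table.may_forward("customer-A", number))
    print(table.may_forward("customer-B", number))
```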

10. NAT Traversal

As soon as the RSA handshake and cipher setup have succeeded over a proxy connection, the server (proxy customer) immediately tries to connect to the client's gip:gp and lip:lp directly. This reverse connection includes a new double (anonymous and EC) handshake, which includes vereq with the value of -1 (instead of the protocol version number), and the rnd which the other peer sent when it established the relayed connection. An incoming reverse connection is answered only if a relayed connection to the same contact with the same rnd exists. The client then replies with verep value of -1 and the server's rnd, to complete the RSA handshake. Nothing else from the table from RSA Key Exchange is sent by either side, except for ack and flags.

If this succeeds, the reversal requester (proxy customer) sends a "reverse switch" packet over the relayed connection and stops sending data over that connection. When the "reverse switch" packet is received, the reversal replier (proxy client) closes its connection to the proxy, and starts sending and receiving data over the new (direct) connection. The proxy sends a "close" request (with client number) to the reversal requester over its control connection, and the reversal requester starts sending and receiving data over the direct (reverse) connection. The client has now become a server, while the customer has become a client.

In case the reverse connection has failed, the customer sends a "reverse failure" packet over the relayed connection, and the client now attempts to connect to the customer's gip:gp and lip:lp directly, in exactly the same manner. The client is now a reversal requester while the customer is a reversal replier. In case of success, the client would send a "reverse switch" packet over the relayed connection, while the customer would send a "close" request to its proxy.

When the reverse direct connection is about to be used for data transfer, the encryption key of the composite cipher is discarded and the encryption key of the original (relayed) connection is reused with the reverse connection. The encryption and decryption sequence numbers of the reverse connection are also discarded and replaced with the sequence numbers of the relayed connection, which continue seamlessly over the reverse connection.

The proxy can also sometimes help its customer and client connect directly to each other, even if reversing the connection as described above cannot succeed. Such a NAT traversal request is sent by the client (traversal requester) to the customer (traversal replier) immediately after the client-side direct connection attempt has failed, but can also be sent by either side at any time later, after the relayed connection has already been used for quite a while.

At first, the traversal requester opens a second connection to the proxy with an address of 1. When the anonymous key exchange has completed successfully (in this case the requester does not use a key), the proxy decrypts the version packet with an address of 1 and responds to that by reporting the requester's "external" IP address and source port number back to the requester (this is Twofish-encrypted as usual). Afterwards, the proxy waits for the requester to close the TCP connection; if that does not happen, the proxy closes it after a predefined timeout.

The traversal requester then sends a traversal request over the relayed connection, which includes the source port number that it has just learned from the proxy, and a 62-bit random number (generated by RNG0). Upon receiving this request, the traversal replier also opens a second connection to the proxy with an address of 1, to learn its source port, and sends back that port number to the traversal requester.

Sending/receiving of this traversal reply triggers a simultaneous traversal attempt at both sides; they try to connect directly to each other's IP addresses. Should a traversal request be rejected for any reason (for example, if one of the sides does not allow traversal - such as when using TOR, or is not using a proxy, or encounters a local error), the side which rejected the request will not accept any more traversal requests for as long as this relayed TCP connection is established, but will first inform the other side not to send any more traversal requests.

If traversal succeeds, a direct TCP connection between both peers will appear automagically on a new socket. If this was an incoming connection, the peer that received it checks whether the source IP address is identical to the one it was trying to connect to. Mismatch causes the peer to skip authentication and assume failure (but continue trying to connect and listening for a connection from the correct IP address until timeout).

Both peers then simultaneously send each other a traversal authentication packet which includes the 62-bit random number from the traversal request, over the new connection. There is no key exchange; the authentication packet is encrypted by the cipher and key that was already established for the relayed connection (not including the external encapsulation with the chosen proxy cipher). The sequence number is 1 for the customer and 2^62 + 1 for the client. If this authenticates and decrypts successfully to the requested 62-bit random number and the other known data, it means that traversal has succeeded. The traversal authentication packet from each side also includes an initial sequence number for this side of the traversed connection, which is currently an odd number. Note that the customer sets the 62nd bit of the sequence number of the client to one.

When traversal authentication succeeds, both sides close their second TCP connection to the proxy (that was opened especially to learn the source port number) and report success to each other over the traversed connection (with the new sequence number); this is repeated a few times. Upon receiving these success packets, the client sends a final "traverse switch" packet over the relayed connection and stops sending data over that connection. When the "traverse switch" packet is received, the customer sends a "close" request (with client number) to the proxy over its control connection, and starts sending and receiving data over the direct connection with the same cipher and key and (odd) sequence numbers that increment by 2 on each TLS packet. The proxy closes its TCP connection to the client, and the client starts sending and receiving data over the direct connection.
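The sequence numbering used during and after traversal authentication can be sketched as follows. The names are hypothetical, the client's authentication value is taken to be 2^62 + 1 (bit 62 set), and it is an assumption of this sketch that the first data packet uses the initial sequence number itself rather than the next odd value.

```python
CLIENT_BIT = 1 << 62  # bit 62, set in the client's sequence number


def auth_sequence_number(is_client: bool) -> int:
    """Sequence number of the traversal authentication packet:
    1 for the customer, 2^62 + 1 for the client."""
    return CLIENT_BIT + 1 if is_client else 1


class TraversedSequence:
    """Odd sequence numbers that advance by 2 for every packet sent."""

    def __init__(self, initial: int) -> None:
        if initial % 2 == 0:
            raise ValueError("initial sequence number must be odd")
        self.value = initial

    def next(self) -> int:
        """Return the number for the next packet and advance by 2."""
        current = self.value
        self.value += 2
        return current
```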

If the traversal authentication packet is not received from the other side within one second of sending one's own traversal authentication packet, or if what was received over the direct connection could not be authenticated, the newly established direct connection is closed and traversal attempts continue up to a mutually agreed time limit of a few seconds (the same thing happens if a direct connection does not appear or does not succeed). Timeout means that traversal has ultimately failed; both sides close their second TCP connections to the proxy and further communication between them proceeds over the relayed connection as before.

UDP traversal (for audio calls) is also implemented. Both peers first send a "UDP proxy request" to the current proxy (over UDP). The proxy figures out who these packets came from by checking the source IP address and replies with a "UDP proxy reply" packet which contains the source IP address and port number of the requester. When a peer receives the "UDP proxy reply", it sends a "UDP traversal request" to the other peer over the existing TCP connection. As soon as a peer learns both its IP address and port number from the proxy, and the other peer's IP address and port number from the other peer, it starts the same UDP connection procedure used for direct connections (as described under Data Transfer).

UDP proxy request and proxy reply packets use the same cipher and key already used by the TCP connection; they are padded to a length divisible by 181. Proxy request packets have a fixed sequence number of 2^62 - 1, while proxy reply packets have a fixed sequence number of -1. These are subject to rate limiting: when the set speed limit is exceeded, UDP control packets are dropped instead of being decrypted until speed falls below the limit.
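Padding to a length divisible by 181 amounts to rounding the packet length up to the next multiple. A minimal sketch, with a hypothetical function name; the contents of the padding bytes are not specified here, and whether an empty or already aligned packet receives extra padding is an assumption (this sketch adds none).

```python
PAD_MULTIPLE = 181


def padded_length(length: int, multiple: int = PAD_MULTIPLE) -> int:
    """Smallest multiple of `multiple` that is not below `length`."""
    if length < 0:
        raise ValueError("length must not be negative")
    return -(-length // multiple) * multiple  # ceiling division
```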

11. Random Number Generation

Five random number generators (RNGs) are used. The simple standard C library rand() is used as needed for DHT network participation (deciding which DHT nodes to contact, calculating timeouts, and so on). This RNG is initialized with the current system time.

Additionally, four cryptographic RNGs are used:

  • RNG0: the public RNG (random_public)
  • RNG1: the session RNG (random_session)
  • RNG2: the private RNG (random_private)
  • RNG3: the seeded RNG (random_seeded)

All four cryptographic generators are implemented through a single algorithm, but are used and initialized differently.

RNG0 is used for "public" random data (stored to a file or sent over the network, such as Initialization Vectors) and initialized with:

  • the list of contacts
  • the configuration table
  • the DHT cache of IP addresses
  • various system data (the current system time, list of processes, etc.)
  • four bytes output by the just initialized rand() generator

If the minimal number of entropy bytes required for initialization was not collected during this process, RNG0 is also initialized with an "uninitialized" variable on the stack.

RNG0 is also used in the following special cases, in order to generate:

  • random bytes for cipher identification during the "cipher setup" and file encryption
  • 64-bit DHT "secret" values (required for generating DHT tokens)
  • a 160-bit DHT node ID and 32-bit DHT search identifiers for finding malicious DHT nodes
  • a random 15-bit port number for UPnP router port forwarding
  • 62-bit message identifier for each sent chat message
  • 16-bit client numbers for proxy customers, and 32-bit connection identifiers for connection reversal
  • 62-bit tokens for NAT traversal
  • contents of special data packets for internet speed measurement
  • other random data sent encrypted to contacts while they are already connected (internal padding of UDP control packets, initial audio timestamp, etc.)
  • a random permutation to shuffle the list of contacts on login so as to set a random contact order
  • random bytes for probabilistic private key validation during login (in which case the RNG0 state after validation is reset back to what it was before validation)
  • random bytes for a dummy RSA operation, for estimation of time required to generate an RSA key before generating it

RNG1 is used for "private" random data (session keys) and initialized with:

  • 256 bytes output by the just initialized RNG0
  • 16128 bytes output by the system cryptographic generator (CryptGenRandom on windows or /dev/urandom on unix)

It is re-initialized with audio data recorded from the first few seconds of the first audio call (except for audio test) after login.

RNG1 is used in the following particular cases:

  • to decide which cipher to use during the cipher setup
  • to generate random bytes to encrypt by RSA, as described under RSA Key Exchange
  • for private key operations during the openssl key exchange
  • to generate 256-bit elliptic curve private keys for the "anonymous key exchange"
  • to generate a random 32-bit salt for the hash function of hash tables

Public values such as the IV required by the AES256-GCM cipher (used by openssl to encrypt its key exchange) and "client random" and "server random" values are generated by RNG0.

RNG2 is used to generate permanent keys or permanent seeds, and is initialized with data recorded from an audio test. Audio data is collected as 16-bit samples at the default sample rate (usually 44100 Hz) and first debiased using the von Neumann algorithm (which removes groups of two identical consecutive bits from the bit stream) before being fed into RNG2. Only audio frames where speech was detected are counted towards the minimal input entropy requirements (but the rest of the frames are used, too).
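The debiasing step can be sketched with the textbook von Neumann extractor, which reads the bit stream in non-overlapping pairs, drops pairs of identical bits and emits the first bit of every other pair. The text above only states that identical pairs are removed, so the one-bit-per-pair output and the most-significant-bit-first order of sample bits are assumptions, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def bits_of(samples: Iterable[int], width: int = 16) -> Iterator[int]:
    """Yield the bits of each sample, most significant bit first."""
    for sample in samples:
        sample &= (1 << width) - 1  # also maps negative samples to unsigned
        for shift in range(width - 1, -1, -1):
            yield (sample >> shift) & 1


def von_neumann_debias(bits: Iterable[int]) -> List[int]:
    """Drop identical pairs; emit the first bit of each differing pair."""
    out: List[int] = []
    it = iter(bits)
    for first in it:
        second = next(it, None)
        if second is None:
            break  # an unpaired trailing bit is discarded
        if first != second:
            out.append(first)
    return out
```

For example, `von_neumann_debias(bits_of(samples))` turns a list of 16-bit samples into a shorter bit list in which a constant bias of the individual input bits has been removed, provided the input bits are independent.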

If the mouse is moved during this audio recording, data collected from mouse movements (cursor position and timing data) is also used. Cursor positions are taken as a difference to the previous cursor positions. Leading zero bits are removed from all mouse values; the sign bit is stored separately for cursor position differences. The mouse data is then XORed cyclically into the debiased audio data (which is usually a lot more), before splitting it as described below.
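A rough sketch of the mouse mixing step is given below. It covers only the position differences and the cyclic XOR; the removal of leading zero bits and the separate sign bit are omitted. The function names are hypothetical, and "cyclically" is interpreted here as wrapping around the audio buffer when the mouse data is longer, which is an assumption.

```python
from typing import List, Tuple


def cursor_deltas(positions: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Differences between consecutive cursor positions."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def xor_cyclic(target: bytes, data: bytes) -> bytes:
    """XOR `data` into `target`, wrapping around when `data` is longer."""
    if not target:
        return b""
    out = bytearray(target)
    for i, byte in enumerate(data):
        out[i % len(out)] ^= byte
    return bytes(out)
```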

When using RNG2 to generate a seed, that seed is generated in consecutive 16-byte chunks. Input audio entropy data is split to parts of equal size; the number of parts is equal to the number of 16-byte chunks in the seed to be generated. Then each part of the seed is generated independently of the others from its corresponding part of the input data. This increases input entropy requirements for longer seeds; the audio test terminates automatically when it detects that enough entropy for the seed size to be generated has been collected (this is the minimal required entropy multiplied by a hard-coded safety factor, currently two). It is possible to collect an unlimited amount of input entropy by performing multiple audio tests in a sequence and then use it for key generation (collected audio data is removed from memory only after it is actually used to initialize an RNG).
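The split of input entropy into one part per 16-byte seed chunk can be sketched as below. The function name is hypothetical, and dropping leftover bytes that do not divide evenly is an assumption of this sketch.

```python
from typing import List


def split_entropy(entropy: bytes, seed_len: int, chunk: int = 16) -> List[bytes]:
    """Split entropy into equal parts, one per `chunk`-byte piece of the seed."""
    if seed_len <= 0 or seed_len % chunk:
        raise ValueError("seed length must be a positive multiple of the chunk size")
    parts = seed_len // chunk
    size = len(entropy) // parts  # leftover bytes are ignored in this sketch
    return [entropy[i * size:(i + 1) * size] for i in range(parts)]
```

Because every 16-byte chunk draws only on its own part, a seed twice as long needs twice as much input entropy to give each chunk the same margin.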

RNG3 is used to generate permanent keys, and is initialized with (part of) a permanent seed generated by RNG2.

The algorithm used to implement the four RNGs is based on ANSI X9.31 (also known as ANSI X9.17). Such an RNG is initialized (keyed) with "truly" random data exactly as big as a block cipher's key size, while its internal state is as big as that cipher's block size. Each time a block of random bytes (equal in size to the state size) needs to be output, the following algorithm is applied per sub-RNG:

   if RNG3 then
     DT = encrypt (salt)
     salt = salt + 1
   else
     DT = encrypt (DT xor time)
   endif
   state = encrypt (state xor DT)
   output = state
   state = encrypt (state xor DT)

If more random data is needed, this procedure is repeated as many times as necessary. If less random data is needed, the rest of output (up to the state size) is discarded. time is some data that depends on the current system (clock) time. For RNG3, salt is initialized to zero.
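The procedure above can be sketched for a single sub-RNG as follows. This is not the actual implementation: `encrypt` is a stand-in built from truncated HMAC-SHA256 rather than a block cipher, the class name is hypothetical, and the initial value of DT, the encoding of salt and the exact form of the time input are assumptions.

```python
import hashlib
import hmac
import time

BLOCK = 16  # state size in bytes, as for the cipher-based sub-RNGs


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class SubRNG:
    """One X9.17-style sub-RNG with a stand-in encrypt function."""

    def __init__(self, key: bytes, state: bytes, seeded: bool = False, clock=None):
        self.key = key
        self.state = state
        self.seeded = seeded             # True models RNG3
        self.salt = 0                    # RNG3 only, starts at zero
        self.dt = bytes(BLOCK)           # assumption: DT starts as zeros
        self.clock = clock or time.time_ns

    def encrypt(self, block: bytes) -> bytes:
        # Stand-in only: truncated HMAC-SHA256, NOT a real block cipher.
        return hmac.new(self.key, block, hashlib.sha256).digest()[:BLOCK]

    def next_block(self) -> bytes:
        if self.seeded:
            self.dt = self.encrypt(self.salt.to_bytes(BLOCK, "big"))
            self.salt += 1
        else:
            now = (self.clock() % (1 << 128)).to_bytes(BLOCK, "big")
            self.dt = self.encrypt(xor(self.dt, now))
        self.state = self.encrypt(xor(self.state, self.dt))
        output = self.state
        self.state = self.encrypt(xor(self.state, self.dt))
        return output

    def generate(self, n: int) -> bytes:
        """Repeat the procedure as needed and discard the unused tail."""
        out = b""
        while len(out) < n:
            out += self.next_block()
        return out[:n]
```

With `seeded=True` the output depends only on the key and initial state, which is what allows RNG3 to regenerate the same permanent keys from the same seed.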

Each of the four RNGs is implemented using ten X9.17 RNGs (sub-RNGs), each of them having its own encrypt function. The first two use hash functions: HMAC-Whirlpool and HMAC-SHA3-512, which is calculated simply as SHA3-512 (key || state). The remaining eight sub-RNGs use block ciphers that have a 16-byte block size (composite, serpent, twofish, RC6, MARS, camellia, ARIA and CAST-256). To generate a random block, outputs of all sub-RNGs are XORed.

To initialize or re-initialize an RNG, entropy data (as described above for each of the four RNG types) and/or saved data is used. Saved data consists of the initial key for each sub-RNG (the same key that each sub-RNG was last initialized with) and a saved state for each RNG (obtained by generating 16 or 64 bytes through each sub-RNG when saving the random data to a file).

Only the data of RNG0 and RNG1 is saved to a file and reloaded at login. The data (key and state) of RNG2 and RNG3 is removed from memory immediately after generating a permanent key. Therefore, no saved data is used when initializing RNG2 with audio data.

Re-initializing RNG1 is implemented by saving its data to memory (instead of a file) and then immediately initializing the RNG with this just saved data and, in this particular case, collected audio entropy data.

When initializing RNG3, entropy data is missing, while "saved" data consists of:

  • the same user seed as a key to each of the sub-RNGs. This key is padded with zeros to match key size requirements (to a 64-bit boundary for cipher sub-RNGs) or else optionally truncated to the cipher's key size (if the cipher's key size is smaller than the size of the user seed)
  • initial sub-RNG state is calculated as an SHA2-512 hash of the user seed. If the user seed is at least 16 bytes longer than cipher's key size, the first bytes up to the key size do not take part in the calculation; else, the whole seed does. For cipher-based sub-RNGs, the initial state is set to the first 16 bytes of the SHA2-512 hash in either case (hash functions set it to the whole SHA2-512 hash)
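For a cipher-based sub-RNG, the two rules above can be sketched as follows. The function names are hypothetical; the reading that a seed longer than the key size is truncated, while a shorter one is zero-padded only up to the next 64-bit boundary, is an interpretation of the text rather than a confirmed detail.

```python
import hashlib


def rng3_cipher_key(seed: bytes, key_size: int) -> bytes:
    """Key for a cipher sub-RNG: truncate a long seed, otherwise
    pad it with zeros to a multiple of 8 bytes (a 64-bit boundary)."""
    if len(seed) > key_size:
        return seed[:key_size]
    padded_len = -(-len(seed) // 8) * 8
    return seed.ljust(padded_len, b"\x00")


def rng3_cipher_state(seed: bytes, key_size: int, block_size: int = 16) -> bytes:
    """Initial state: the first bytes of SHA2-512 over the seed, skipping
    the key part when the seed is at least 16 bytes longer than the key."""
    material = seed[key_size:] if len(seed) >= key_size + 16 else seed
    return hashlib.sha512(material).digest()[:block_size]
```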

Since initializing X9.17 requires fixed-size keys (equal to each cipher's key size for each sub-RNG) and states (equal to the ciphers' block sizes), entropy data needs to be hashed to produce keys and states of the required sizes. Input (entropy) data is processed in blocks in the following way:

  • the first ten such blocks are hashed to produce the 64-byte keys of the two HMAC functions and the first 16 bytes of each of the eight ciphers' keys
  • the second eight blocks are hashed to produce the second 16 bytes of the eight ciphers' keys
  • the third and fourth groups of two blocks are hashed to produce the third and fourth 16 bytes of the keys of the two ciphers that take 64-byte keys (composite and RC6). Note that the MARS cipher takes only 32 bytes of key material when used for hashing (but 56 bytes when used for encryption and as a sub-RNG)
  • finally, the fifth group of ten blocks is hashed to produce the initial states of each of the sub-RNGs

After the fifth group is hashed and if there is more input entropy, the rest of the entropy is hashed continuously the same way (first taking a first group of ten blocks, then a second group of eight blocks, and so on).

Input blocks have different sizes, each equal to the key size of the sub-RNG's cipher. For hashing that produces the key or state of a sub-RNG, a hash function based on the same block cipher as the one used by that sub-RNG is applied. This hash function has a block size equal to its cipher's key size, and a state as big as its cipher's block size. The hash state is always first initialized to the ASCIIz string "Merkle-Daamgard". An input block is processed by using it as a key to the block cipher to encrypt the current hash state; the result is then XORed into the state and becomes the new state of that hash function. To obtain a final result, the input is padded by appending a 1 bit, followed by the input's total length, and padding with zeros to a block size. Whirlpool and SHA3-512 are used directly (not in HMAC mode) to obtain a 64-byte key from hashing a 64-byte first block, and a 64-byte state from hashing a 64-byte fifth block (this key and state are then used by their sub-RNGs, as for cipher-based sub-RNGs).
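The cipher-based hash described above follows the Davies-Meyer pattern: each input block keys the cipher, the current state is encrypted, and the result is XORed back into the state. The sketch below shows the structure only. `toy_encrypt` is a stand-in built from SHA-256 and is not a block cipher, the function names are hypothetical, and the padding details (a 0x80 byte for the 1 bit, an 8-byte big-endian bit count) are assumptions.

```python
import hashlib

# The ASCIIz initial state: 15 characters plus a terminating NUL = 16 bytes.
INITIAL_STATE = b"Merkle-Daamgard\x00"


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for a real block cipher; for illustration only.
    return hashlib.sha256(key + block).digest()[:len(block)]


def pad(message: bytes, block_size: int) -> bytes:
    """Append a 1 bit, the total length, then zeros up to a block boundary."""
    padded = message + b"\x80" + (8 * len(message)).to_bytes(8, "big")
    return padded + bytes(-len(padded) % block_size)


def cipher_hash(message: bytes, key_size: int,
                state: bytes = INITIAL_STATE) -> bytes:
    """Hash `message` in blocks of `key_size` bytes into a 16-byte state."""
    data = pad(message, key_size)
    for i in range(0, len(data), key_size):
        block = data[i:i + key_size]            # the input block is the key
        encrypted = toy_encrypt(block, state)   # encrypt the current state
        state = bytes(a ^ b for a, b in zip(encrypted, state))
    return state
```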

To fully initialize all sub-RNGs, the following input blocks are therefore needed:

  • 1 x 2 x 64 = 128 bytes to produce keys for Whirlpool and SHA3-512 (key size is 64 bytes)
  • 4 x 2 x 64 = 512 bytes to produce keys for composite and RC6 (key size is 64 bytes)
  • 2 x 6 x 32 = 384 bytes to produce keys for the other six ciphers (key size 32 bytes)
  • 1 x 2 x 64 = 128 bytes to produce state for Whirlpool and SHA3-512 (block size is 64 bytes)
  • 1 x (2 x 64 + 6 x 32) = 320 bytes to produce state for the eight ciphers (block size is 16 bytes)

The total size of all input blocks needed is then 1472 bytes, so that's the minimum amount of input entropy required for fully initializing an RNG. Less entropy could also work by simply deriving shorter keys (and/or constant initial state) but this is not used.
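The total can be checked by adding up the five items listed above; the variable names in this short check are illustrative.

```python
# (description, bytes of input entropy), taken from the list above
requirements = [
    ("keys for Whirlpool and SHA3-512", 1 * 2 * 64),
    ("keys for composite and RC6", 4 * 2 * 64),
    ("keys for the other six ciphers", 2 * 6 * 32),
    ("state for Whirlpool and SHA3-512", 1 * 2 * 64),
    ("state for the eight ciphers", 1 * (2 * 64 + 6 * 32)),
]

total = sum(size for _, size in requirements)
print(total)  # 1472
```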

Keys and states so produced from input data are used directly to initialize sub-RNGs in case no saved data is present. If saved data is present, it is merged into the produced values by XORing (both key and state) to them before initializing sub-RNGs.

12. Key and File Management

Private keys are generated either from seed or directly from audio data. A random binary seed is converted to an ASCII word list after adding key type and checksum to the binary data. Four key types are currently supported: 2048-bit, 4096-bit, 8192-bit and 16384-bit (EC private keys are always 512-bit). RSA keys generated directly from audio data may be of arbitrary size (between 2048-bit and 16384-bit).

Private keys are generated from a user-specified seed by initializing RNG3 and then using crypto++ to generate first an RSA key followed by an EC key. Basically this amounts to generating random blocks of the required key size, until a generated block happens to be within the required range and, for RSA, also a probable prime.

First, a check is made that the seed security level is at least as high as the key security level, as follows:

  • 2048- or 4096-bit RSA keys require at least a 128-bit seed
  • 8192-bit RSA keys require at least a 192-bit seed
  • 16384-bit RSA keys require at least a 256-bit seed

If the seed size is at least three times the required size, RNG3 is initialized three times to independently generate first the two prime factors of an RSA key and then an EC key. One-third of the seed is used to generate each of these three numbers.

If the seed size is at least twice the required size, RNG3 is initialized twice, once with each half of the seed, to independently generate the two prime factors of an RSA key. It is then initialized a third time with the whole seed to generate an EC key.

If the seed size is less than twice the required size, RNG3 is initialized only once with the whole seed and then used to generate first the two prime factors of an RSA key and then an EC key.

The maximal supported seed size is 3072 bits, which allows each of the three required large numbers to be generated independently from its own 1024-bit seed.

When generating private keys directly from audio data, audio entropy is always split into three parts of equal size, and then RNG2 is initialized three times with each of the three parts in order to generate each of the three large numbers.

In the above calculations, if the seed size is not divisible by the number of parts (two or three), the remaining bytes are always added to the last part.
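The seed splitting rules above can be summarized in a short sketch. The function names and the return shape (a list of seed parts, each paired with the numbers it generates) are hypothetical, and seed sizes are handled in whole bytes.

```python
from typing import List, Tuple

# Minimal seed size in bits for each supported RSA key size.
MIN_SEED_BITS = {2048: 128, 4096: 128, 8192: 192, 16384: 256}
MAX_SEED_BYTES = 3072 // 8


def split_into(seed: bytes, n: int) -> List[bytes]:
    """Split into n parts; the remaining bytes go to the last part."""
    size = len(seed) // n
    parts = [seed[i * size:(i + 1) * size] for i in range(n - 1)]
    parts.append(seed[(n - 1) * size:])
    return parts


def seeds_for_key_generation(seed: bytes, rsa_bits: int) -> List[Tuple[bytes, List[str]]]:
    """Return (seed part, numbers generated from it) for each RNG3 initialization."""
    required = MIN_SEED_BITS[rsa_bits] // 8
    if not required <= len(seed) <= MAX_SEED_BYTES:
        raise ValueError("seed size out of range for this key size")
    if len(seed) >= 3 * required:
        p, q, ec = split_into(seed, 3)
        return [(p, ["p"]), (q, ["q"]), (ec, ["ec"])]
    if len(seed) >= 2 * required:
        p, q = split_into(seed, 2)
        return [(p, ["p"]), (q, ["q"]), (seed, ["ec"])]
    return [(seed, ["p", "q", "ec"])]
```

For example, a 50-byte seed for a 2048-bit key yields parts of 16, 16 and 18 bytes, one for each of the two RSA primes and the EC key.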

Generated RSA keys currently have a public exponent of 65537, but any exponent is accepted for other users' public keys.

To support use of multiple user identities on the same system, data files for each identity are stored in a separate subdirectory. The key file, saved random data, configuration table, saved pending messages, DHT node cache, contact and proxy lists are encrypted with the composite cipher by default. The key file (which contains the two private keys, RSA and EC) may be:

  • encrypted using a Whirlpool hash of the binary seed as a key (this is the default mode)
  • encrypted using a user-defined password (key is derived from it by PBKDF2/Whirlpool with 12345 iterations)
  • not encrypted at all to allow automatic login at program start

The other files are encrypted using a key derived as a Whirlpool hash of the EC private key concatenated with the RSA private key.

A saved file has an IV (generated by RNG0) and a hash tag, as usual. When decrypting files, an authenticity check is done on the whole contents before decryption.

The user can choose a preferred cipher for file encryption. In order to decrypt a file encrypted with any supported cipher, a block of random bytes is encrypted with that cipher and prepended to each file. The algorithm to do so is the same as for encrypt_cipher and decrypt_cipher, as described under Cipher Setup.

The key generator (login prompt) does not store data of its random number generator. To generate an IV for saving a generated key file, RNG0 is initialized with the generated 160-bit address in addition to system data (as described under Random Number Generation).

When a file is saved, the previously saved file is first renamed to a .old file for backup purposes. If saving failed for any reason, the .old file is renamed back to the original file name. Before the first rename, the previous .old file is deleted. When loading a file, if loading fails for any reason, an attempt is made to load the .old file. Thus no information should be lost if the program crashed or was otherwise interrupted during the saving, and could not rename the .old file back to the original after the failed save. The only exception is the random data file, which does not store a backup copy. The key file is always stored twice (so the .old file is created on its first saving). When deleting a file, it is always filled with a random pattern first.
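The save and load sequence described above can be sketched as follows. The function names are hypothetical, encryption is omitted, and the random-fill on deletion and the special handling of the random data file and key file are not shown.

```python
import os


def save_with_backup(path: str, data: bytes) -> None:
    """Save data, keeping the previously saved file as a .old backup."""
    old = path + ".old"
    had_previous = os.path.exists(path)
    if had_previous:
        if os.path.exists(old):
            os.remove(old)       # delete the previous backup first
        os.rename(path, old)     # the current file becomes the backup
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError:
        if had_previous:         # saving failed: restore the original
            if os.path.exists(path):
                os.remove(path)
            os.rename(old, path)
        raise


def load_with_fallback(path: str) -> bytes:
    """Load a file; if that fails for any reason, try the .old backup."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError:
        with open(path + ".old", "rb") as f:
            return f.read()
```

If the program is interrupted between the rename and the completed write, the next load still finds the last good contents in the .old file.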

13. Additional Notes

Contacts are flagged as voice-authenticated by reading out the first 160 bits of the Whirlpool hash of the concatenation of their EC public key followed by RSA public key followed by the first four bytes of K3 (the RSA-XOR value of the current session) and entering it into the program at the other side during an audio call. One can also read the peer's address for additional security, but this is not required.

Users announce their IP addresses to the DHT network by publishing a 160-bit hash of their address, which is calculated as follows:

  • first a Whirlpool hash of the binary (160-bit) address is calculated
  • a 32-bit salt value is appended to it (this is a constant value)
  • RIPEMD-160 is applied to this 544-bit value and the result is published

In TOR mode, the user does NOT announce to the DHT but only searches for other users. If DNS queries need to be made for dynDNS hosts, they go through the TOR proxy, and audio data is transferred only over TCP (through TOR). Incoming connections are not accepted (the listening port is closed). Intranet and internet IP addresses and port numbers are not reported for connection reversal or traversal.

When connected through a proxy (as described under Proxy Key Exchange), the proxy announces its own IP address to the DHT for all of its customers. Note that when a user has connected to a proxy, the proxy has verified that this customer is in possession of private/public key pairs that hash to its address, before announcing it to the DHT.

Throughout the library, length-checked strings and collections of strings known as "simtypes" are used. Network packets (before encryption) are made out of them, and they are also used to store/load files and to communicate with the GUI, in addition to being used internally. The few cases where static buffers are used internally are length-checked, too. This hopefully avoids the susceptibility to buffer overflows that most C programs have. When freeing a simtype, stored data is overwritten with zeros before releasing memory back to the operating system. There is also a global setting that allows locking the whole program into RAM to prevent swapping to disk (not turned on by default).

14. Security Risks

This table lists possible security breaches and their expected impact in descending order of severity.

| risk | If... | Then... | countermeasures |
|---|---|---|---|
| 1:1000000 | a large quantum computer has broken both ECDSA and RSA | your security is totally compromised; third parties can also revoke your identity | not available |
| 1:1000000 | both RSA and SHA2-512 are broken | you have no security any longer; past communication (prior to the break of SHA2-512) is still private | use a larger RSA key size |
| 1:100 | a friend of yours has tricked you into running a trojan horse, or has run one when you let him use your computer | nothing is private or secure, including your simphone communication and private keys. Your former friend is now your overlord | do not use software unless absolutely sure it's backdoor-free and do not leave your computer unattended |
| 1:1000 | someone else has physical access to your computer or remote access to your user account at the computer | nothing is private or secure, including your simphone communication and private keys | keep your computer secure |
| 1:1000 | unauthorized parties have obtained a copy of your private keys | the parties can impersonate you or revoke your identity; future communication is no longer private, but past communication still is, even if network traffic has been recorded prior to stealing your private keys | keep your private key safe and your computer secure |
| 1:10000000 | the eight RNG symmetric ciphers AND Whirlpool AND SHA3-512 are all broken | the simphone random number generator is not secure. Your keys may be compromised; your communication is probably not private | not available |
| 1:100000 | a symmetric cipher is broken | your communication (past or present) may not be private only in case you have used that cipher | choose a strong symmetric cipher |
| 1:10000 | Whirlpool is broken | contact verification is not secure; a man-in-the-middle could successfully impersonate your contacts even after verification | when reading out the verification tokens, read out and compare your simphone addresses, too |
| 1:10000 | the mainline DHT network is broken | you will not be able to connect to any of your contacts unless you have already connected to them previously and they have not changed their dynDNS host name (if they have any) or IP address since then | use dynDNS |
| 1:100000 | ECDSA is broken, but RSA is not | someone can send you fake status messages and eventually perform denial-of-service attacks against you, but they will not be able to send or receive data to or from you | not available |
| 1:10000 | SHA2-512 is broken, but RSA is not | same as the previous point, but it may be more difficult to mount a successful attack in practice | not available |
| 1:100000 | both RIPEMD-160 and SHA2-256 are broken | someone can impersonate contacts you have added, but only before you have connected to them for the first time. You will detect the security breach as soon as you attempt to verify your contacts | verify all your contacts |
| 1:10000 | AES256-GCM is broken | third parties can replay status messages you have previously received from your contacts | not available |
| 1:1000 | a man-in-the-middle has taken control of your communication lines | the man can disrupt your communications at will; public keys are revealed | use different networks |
| 1:1000 | a man-in-the-middle has taken control of your communication lines | the man can impersonate unverified contacts, which you have added after your connection was hijacked; communication with verified contacts (even after the hijack) is still fully secure. For unverified (impersonated) contacts, Simphone would eventually raise an alarm if you moved your computer to a non-hijacked network environment | verify all your contacts and use different networks |
| 1:100 | someone is running a malicious simphone proxy | if you happen to connect to that proxy by chance, the proxy will obtain your public key and can prevent your contacts from connecting to you or drop such connections at random times; you can still connect to your contacts, unless they suffer from the same problem | make sure you have incoming connectivity |
| 1:10 | your network traffic is being recorded | the records will show IP addresses of your contacts (unless you or they are using TOR) | use TOR |
| 1:100000 | RSA is broken, but neither ECDSA nor SHA2-512 is | increased risk in case something else gets broken | use a larger RSA key size |

Numbers in the above table are an attempt to compare likelihoods. They are not real probabilities.