Simphone implements state-of-the-art, 256-bit cryptographic security (the highest by today's standards) with the following features:
Simphone uses only standard cryptographic primitives, but combines them in simple ways to increase their security level and add safety margins.
The rest of this document describes the cryptographic procedures used in detail. It is assumed that the reader is familiar with cryptography. Readers who are familiar with cryptography but not with the TLS protocol may find it helpful to read RFC 5246 and other relevant RFCs for details that are not repeated in this document.
The openssl library is used to implement the initial handshakes between two connecting peers. The TLS handshake is first simplified (by removing all unnecessary features) and then strengthened, as described under Key Derivation.
Everything that could be removed from the openssl/crypto library at compile time without preventing the handshake from succeeding was eliminated by #defining the appropriate OPENSSL_NO_xxx symbols, including the following features:
In addition, the following features that could not be disabled at compile-time are disabled at run-time:
Note that when a client initiates an Anonymous Key Exchange, more TLS ciphers, elliptic curves and signature algorithms are enabled. But if the server responds with anything other than what is required, the client closes the connection.
The following block ciphers are currently implemented through the crypto++ library:
name | block size (bytes) | key size (bytes) | security level (bits) |
---|---|---|---|
Rijndael (triple AES) | 16 | 96 | 512 |
Serpent | 16 | 32 | 256 |
Twofish | 16 | 32 | 256 |
RC6-32 (with 40 rounds) | 16 | 64 | 256 |
MARS-2 | 16 | 56 | 256 |
Camellia | 16 | 32 | 256 |
ARIA | 16 | 32 | 256 |
CAST-256 | 16 | 32 | 256 |
IDEA (with 16 rounds) | 8 | 16 | 128 |
DES-EDE3 (triple DES) | 8 | 24 | 112 |
Ciphers are strengthened as follows:
The given key sizes are used for both encryption and key derivation (CBC-MAC). For encryption, ciphers are used in EAX mode, which allows for ciphers with any block size.
A special composite cipher is used for file encryption and as a default cipher for data transfer. This cipher is defined as a random permutation of the first eight ciphers from the above table, all of them used with the same key. Single (not triple) Rijndael is used as a part of the composite cipher. To decide which permutation of the eight ciphers to apply, an SHA2-256 hash of the encryption key is calculated and the first (unsigned big-endian) 64 bits of this hash are taken modulo 40320, to obtain one of 40320 possible permutations of eight numbers. This composite cipher is then used in EAX mode to encrypt files or transfer data, and in ECB mode to implement a sub-RNG (it is also used as a hash function to obtain its own key from input entropy when generating a seed, just like other sub-RNGs; see Random Number Generation for details).
The key size of the composite cipher is 64 bytes; this key is truncated to the maximal key size each cipher supports according to the table above. For Rijndael, the key is truncated to 32 bytes.
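The permutation and key-truncation steps can be sketched in Python as follows. The SHA2-256 index (first 64 bits, unsigned big-endian, modulo 40320) and the per-cipher key sizes follow the text and table above. How an index in [0, 40320) is turned into an ordering of the eight ciphers is not specified here, so the factorial-number-system (Lehmer code) mapping below is an assumption, as are the cipher name strings.

```python
import hashlib
import math

# First eight ciphers of the table, with the key size each one takes inside
# the composite cipher (single Rijndael, so 32 bytes).
MAX_KEY_SIZE = {
    "Rijndael": 32, "Serpent": 32, "Twofish": 32, "RC6-32": 64,
    "MARS-2": 56, "Camellia": 32, "ARIA": 32, "CAST-256": 32,
}
CIPHERS = list(MAX_KEY_SIZE)


def permutation_index(key: bytes) -> int:
    """First 64 bits (unsigned big-endian) of SHA2-256(key), modulo 8! = 40320."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % math.factorial(8)


def nth_permutation(items, index):
    """Map an index in [0, n!) to a permutation via the factorial number system.

    ASSUMPTION: illustrative mapping; the document does not specify this step.
    """
    pool = list(items)
    order = []
    for k in range(len(pool), 0, -1):
        pos, index = divmod(index, math.factorial(k - 1))
        order.append(pool.pop(pos))
    return order


def composite_schedule(key: bytes):
    """Return (cipher name, truncated key) pairs in the permuted order."""
    if len(key) != 64:
        raise ValueError("the composite cipher takes a 64-byte key")
    order = nth_permutation(CIPHERS, permutation_index(key))
    return [(name, key[: MAX_KEY_SIZE[name]]) for name in order]
```

For example, `composite_schedule(bytes(64))` yields all eight ciphers exactly once, each paired with the prefix of the shared key it can accept.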
The following terms are used when describing handshakes (for brevity):
Before login (program start), each peer generates two anonymous self-signed elliptic curve keys, using RNG1. These keys are kept in memory; the first one is on the secp256r1 curve and is used only for the anonymous key exchange described in this chapter, while the second one is on the brainpoolP512r1 curve and is used only for the Proxy Key Exchange. In either case, the self-signature is not verified.
The server calls SSL_accept with its anonymous key while the client calls SSL_connect with no key to do a TLS handshake. When the handshake has completed successfully, both peers have calculated a shared value known as the TLS premaster secret.
A key for the Twofish cipher is derived from this value as described under Key Derivation with only one particularity: if the premaster secret in this case is shorter than 64 bytes, it is replicated in order to become exactly 64 bytes long (or if it was longer than 64 bytes, it would be truncated to 64 bytes) before doing the calculations to derive the cipher key. In practice, the secp256r1 curve generates a 32-byte premaster secret, so it is simply duplicated to become 64 bytes long; however, the current proxy cipher can only take a 32-byte key, so the replicated data have no effect. The counter value for key derivation is 0.
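A minimal sketch of the replicate-or-truncate step, assuming the premaster secret is available as a byte string (the function name is hypothetical). A 32-byte secp256r1 premaster secret is simply duplicated, as described above.

```python
def normalize_premaster(premaster: bytes, size: int = 64) -> bytes:
    """Replicate a short premaster secret (or truncate a long one) to `size` bytes."""
    if not premaster:
        raise ValueError("empty premaster secret")
    repeats = -(-size // len(premaster))  # ceiling division
    return (premaster * repeats)[:size]
```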
As soon as the key has been derived, the client (customer) sends a packet containing its version number and a list of supported proxy ciphers (as their unique 62-bit identifiers, as described under Cipher Setup). The only currently supported proxy cipher is Twofish.
This version request packet also contains the IP address and port number to which the client (customer) has established the connection. The server sends back a version reply packet containing its version number and the unique 62-bit identifier of the chosen proxy cipher; if none of the proxy ciphers supported by the client can be chosen, the proxy closes the connection immediately. If the "proxy" is actually the server that this client is trying to connect to, it sends back a zero 62-bit identifier; because this is a direct connection (no relaying of packets), the client then discards the Twofish (proxy cipher) key, as described under Proxy Key Exchange.
The version reply packet also contains the IP address and source port number of the connecting client, as seen by the proxy; this also happens when a customer is connecting to its proxy. Both version request and version reply packets are encrypted exactly as described under Data Transfer. The tag size is fixed to 16 bytes (equal to the block size of the Twofish cipher).
If the client and server version numbers are compatible, the EC and RSA key exchanges (described in the next two chapters) then take place.
All data of this handshake are encrypted with the chosen proxy cipher (Twofish); the tag size is fixed to the block size of the used cipher.
The server now calls SSL_connect while the client calls SSL_accept. This means that the server sends a "client hello" TLS message, which includes a 32-byte "client random" value, while the client sends a "server hello" TLS message, which includes, among other things, a 32-byte "server random" value and an X.509 certificate containing the client's EC public key and RSA public key. Both public keys are signed with the EC private key; that is, the EC key is self-signed. The signatures are there only to comply with the TLS protocol; they are not verified.
When the server or the client receives the other peer's public keys, openssl calls a verify callback. This callback checks that:
If any of these checks fail, the verify callback fails. Otherwise, a RIPEMD-160 hash of the SHA2-256 hash of the concatenation of EC public key followed by RSA public key is calculated; this is the client's address. The server matches this address to its list of contacts, to identify the client. If there is no match, the verify callback will either fail (if configured to refuse contact with strangers) or (by default), the user program will receive a contact request from a new contact with the calculated address (hash of the hash of both public keys). No response is sent to the client until this contact request is accepted by the human user (the client does receive the server's public key by default, but this can be disabled if not desired). A match to the list of blocked or deleted contacts will always fail the verify callback.
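The address calculation and the resulting verify-callback decision can be sketched as below. This assumes the two public keys are passed as the exact byte encodings that Simphone hashes (the encoding is not specified here); `classify_peer` and its return strings are hypothetical names for the outcomes described above. Python's `hashlib` offers RIPEMD-160 only when the underlying OpenSSL build provides it.

```python
import hashlib


def contact_address(ec_public_key: bytes, rsa_public_key: bytes) -> bytes:
    """RIPEMD-160 of SHA2-256 over (EC public key || RSA public key): 20 bytes."""
    inner = hashlib.sha256(ec_public_key + rsa_public_key).digest()
    # Raises ValueError if this Python/OpenSSL build lacks RIPEMD-160.
    return hashlib.new("ripemd160", inner).digest()


def classify_peer(address: bytes, contacts: set, blocked_or_deleted: set,
                  refuse_strangers: bool = False) -> str:
    """Hypothetical helper: the verify-callback outcome for a calculated address."""
    if address in blocked_or_deleted:
        return "reject"            # always fails the verify callback
    if address in contacts:
        return "known contact"
    # Unknown address: a contact request by default, unless strangers are refused.
    return "reject" if refuse_strangers else "contact request"
```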
Should the verify callback fail, the TLS handshake is aborted and the TCP connection is closed immediately; the server does not reveal its public keys to the client in this case. Otherwise, the server proceeds to send a certificate containing its EC and RSA public keys. The client then verifies the keys in the same manner, except that it compares the address it was trying to connect to with the address calculated from the received public keys; a mismatch fails the verify callback on the client side.
During the handshake, the client generates an ephemeral (temporary) EC key and sends it to the server; the server receives the client's ephemeral key, generates its own ephemeral key on the same curve and sends it back to the client. Both peers then use the two ephemeral keys to calculate a premaster secret. The handshake is signed by the server using SHA2-512 and ECDSA with its non-ephemeral (permanent) EC key. After the handshake has been authenticated and completed successfully, the server checks that the ephemeral EC key used is on the brainpoolP512r1 curve; only if it is does the TLS handshake succeed.
If the EC key exchange (handshake) succeeds, another handshake is initiated through the same TCP connection; all network packets of this exchange are encrypted with the composite cipher. The encryption key of the chosen proxy cipher is discarded, and a new key for the composite cipher is derived from the premaster secret of the EC key exchange, as described under Key Derivation (the counter value is 0). Data is transferred as described under Data Transfer, except that data packets are padded to a length divisible by 1448 instead of 181; raw (non-TLS) openssl RSA routines are called to implement RSA encryption and decryption.
First the client sends a packet as described by the table below. The server receives this packet and sends a response packet described by the same table. If hs was not sent by the client, this means that only status packets are being exchanged; the server closes the connection immediately after sending its response.
contents | client | server | fwd | rev |
---|---|---|---|---|
list of supported key exchange types (currently only RSA-XOR, described in this chapter) | hs | | | |
chosen key exchange type (currently RSA-XOR) | | sh | | |
tag size (the maximal block size of any supported block cipher, currently 16) | blk | blk | | |
client_preferred: list of preferred ciphers | pref | | | |
client_supported: list of other supported but non-preferred ciphers (not including the composite cipher, which is always supported) | supp | | | |
encrypt_cipher: cipher to use for server-side encryption (client-side decryption) | | encrypt | | |
decrypt_cipher: cipher to use for server-side decryption (client-side encryption) | | decrypt | | |
revocation reason string (public key has been irreversibly revoked) | REVOKE | REVOKE | | |
client-server protocol version number | vereq | verep | " | " |
32-bit random identifier of this connection (generated by RNG0) | rnd | rnd | " | " |
initial sequence number for successfully reversed connection | seq | seq | " | " |
61-bit number of last received chat message from this peer | ack | ack | " | " |
current status (off/on/away/busy/hide) | s | s | " | " |
contact flags: allowed communication types (text chat, file transfer, audio call, etc.) | f | f | " | " |
number of last sent chat messages that contact wishes to allow you to edit | em | em | " | " |
nickname or real name (as configured by the user). Present ONLY when sending a contact request | nick | " | | |
client: IP address this connection was established to | to | from | " | " |
port number this connection was established to | top | " | " | |
IP address of own proxy (not present if no proxy used) | pip | " | | |
listening port number of own proxy (not present if no proxy used) | pp | " | | |
own (internet) IP address | gip | gip | | |
own (internet) listening port number | gp | gp | " | |
local (intranet) IP address | lip | lip | | |
local (intranet) listening port number | lp | lp | | |
list of supported audio codecs | acs | acs | | |
name of insecure operating system or an empty string if used system is secure | bad | bad | | |
unique 60-bit identifier of this computer (generated by RNG3) | cid | cid | " | " |
client flags: verify IP address | flags | flags | " | |
minimally desired contact period. This reveals the number of authorized contacts of the server (unless manually overridden) | logon | " | | |
number of milliseconds since the last status change. Present ONLY if both connecting peers have the same address | age | age | " | " |
The columns of the above table have the following meanings:
If vereq and verep are compatible, the rest of the packet is processed. Then, if hs was sent, both sides check the received blk and force it to 0 if less than 0 or to 256 if more than 256. This is used during data transfer as a number of additional bytes at the beginning (IV size) and end (hash tag size) of each data packet.
Afterwards, both the client and server simultaneously use their RNG1 to generate a random value as long as the other peer's RSA public key (excluding 42 bytes, required by the used OAEP padding), encrypt it with that key and send it to the other peer. Each of them receives this encrypted value from the other peer and decrypts it with its private RSA key.
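The 42-byte allowance matches the RFC 8017 OAEP limit of k - 2*hLen - 2 bytes for a 20-byte hash such as SHA-1; that hash choice is inferred from the number, not stated here. The sketch below uses `secrets.token_bytes` as a stand-in for RNG1. The two exchanged values are later combined by XOR (RSA_client xor RSA_server) during key derivation, which `xor_bytes` illustrates.

```python
import secrets


def oaep_payload_len(modulus_bytes: int, hash_len: int = 20) -> int:
    """Largest message RSA-OAEP can encrypt: k - 2*hLen - 2 (RFC 8017)."""
    return modulus_bytes - 2 * hash_len - 2


def rsa_random_value(peer_modulus_bytes: int) -> bytes:
    # secrets.token_bytes stands in for RNG1 in this sketch.
    return secrets.token_bytes(oaep_payload_len(peer_modulus_bytes))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR over the common prefix of two values."""
    return bytes(x ^ y for x, y in zip(a, b))
```

For a 4096-bit (512-byte) RSA key, each peer would send 470 random bytes.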
The server then checks again whether the address of the client is already known and, if not, handles this "contact request" by adding a new contact with the calculated address and "contact requested" status (not allowed to send or receive data until the contact is accepted manually by the human user). There is a user-defined limit on the number of non-accepted contact requests that can be received within 24 hours; if more than that arrive, they are not added to the list of contacts, and the handshake fails after the double public key exchange (TLS handshake and RSA handshake) has succeeded.
Finally, both sides check to see if they have stored public keys for their connected peer. If so, received public keys are compared byte-by-byte to the stored ones; if one or both of them are not identical, the handshake fails, and an event is sent to the user program, which displays an error message and plays a special alarm sound in an "infinite" loop. If there were no stored public keys for this contact, both received public keys are stored silently.
If REVOKE has been received, data transfer proceeds only so that the revoker can send a special key revocation packet containing an RSA signature of the Whirlpool hash of the concatenation of the revoker's EC public key, followed by its RSA public key, followed by the revocation reason, followed by REVOKED. The PSS-MGF1 padding scheme with SHA2-512 is used here. The receiver verifies this signature and, on success, blocks the revoking peer immediately and irreversibly. A verification failure causes the user program to display a warning and play a warning sound.
openssl usually derives a "master secret" from its premaster secret and "client random" and "server random" values, and then proceeds to derive session keys from this "master secret". But with Simphone, these steps are skipped and a symmetric cipher key is derived directly from the premaster secret and "client random" and "server random" values in a slightly different manner. Note that because CBC-MAC is used for this key derivation, the derived key also depends on the cipher that is to be used.
The other input values (in addition to the cipher) to this key derivation procedure are:
tls_premaster is always as long as the elliptic curve size, that is, 64 bytes (except for the anonymous handshake with a 256-bit elliptic curve, in which case it is 32 bytes). TLS random values are 32 bytes each.
Let L be the block cipher's key size in bytes (as listed in the "symmetric ciphers" table). Then:
if L < 32 then
    l = L
else
    l = 32
endif
K0 = CBC-MAC-cipher (tls_premaster[0, l], client_random || server_random)
K1 = HKDF-Whirlpool (tls_premaster[l, 64 - l], client_random || server_random)
if counter < 2 then
    K2 = user_address
else
    K2 = RSA_client[32 * (counter - 2), 96] xor RSA_server[32 * (counter - 2), 96]
endif
K = K0 xor K1[64 * counter, 96] xor K2
In the above equations, || denotes concatenation, xor is "exclusive or", array[i, n] are n bytes taken from array starting at (zero-based) index i, and K is the derived key.
All results are thus XORed to produce a derived key. Note that CBC-MAC produces a number of bytes equal to the cipher block size; HKDF-Whirlpool (as defined by RFC5869) produces "unlimited" output, of which 96 bytes are taken, while RSA_client and RSA_server are also quite long, but only up to 96 bytes per key are taken from them. Values are XORed from the beginning; this means that CBC-MAC does not participate in deriving the later bytes. The resulting 96-byte value is then truncated to be only as many bytes long as the cipher can take as a key.
This means that:
Note that when deriving from tls_premaster with Rijndael, taking only 32 bytes of key material means that CBC-MAC is based on single (not triple) Rijndael.
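The following Python sketch mirrors the derivation pseudocode above, with stated substitutions. HKDF-SHA-512 stands in for HKDF-Whirlpool (hashlib does not reliably provide Whirlpool). The caller supplies the CBC-MAC function because the standard library has no block ciphers. client_random || server_random is assumed to be the HKDF salt, with empty info. The HKDF routine itself follows RFC 5869.

```python
import hashlib
import hmac
from typing import Callable

HKDF_HASH = "sha512"  # stand-in for Whirlpool (same 64-byte output size)


def hkdf(ikm: bytes, salt: bytes, length: int, info: bytes = b"",
         hash_name: str = HKDF_HASH) -> bytes:
    """RFC 5869 HKDF: extract a pseudorandom key, then expand it to `length` bytes."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("HKDF output too long")
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hash_name).digest()
    okm, block, i = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([i]), hash_name).digest()
        okm += block
        i += 1
    return okm[:length]


def xor_into(acc: bytearray, data: bytes) -> None:
    """XOR `data` into `acc` from the beginning; shorter inputs affect only a prefix."""
    for i, byte in enumerate(data[: len(acc)]):
        acc[i] ^= byte


def derive_key(cbc_mac: Callable[[bytes, bytes], bytes], tls_premaster: bytes,
               client_random: bytes, server_random: bytes, key_size: int,
               counter: int, user_address: bytes = b"",
               rsa_client: bytes = b"", rsa_server: bytes = b"") -> bytes:
    l = key_size if key_size < 32 else 32
    randoms = client_random + server_random
    k0 = cbc_mac(tls_premaster[:l], randoms)                    # one cipher block
    k1 = hkdf(tls_premaster[l:64], randoms, 64 * counter + 96)  # tls_premaster[l, 64 - l]
    if counter < 2:
        k2 = user_address
    else:
        off = 32 * (counter - 2)
        k2 = bytes(a ^ b for a, b in zip(rsa_client[off:off + 96], rsa_server[off:off + 96]))
    k = bytearray(k1[64 * counter: 64 * counter + 96])          # K1[64 * counter, 96]
    xor_into(k, k0)
    xor_into(k, k2)
    return bytes(k[:key_size])                                  # truncate to key size
```

Because K0 is only one cipher block long, changing the CBC-MAC output changes only the first 16 bytes of a 32-byte key, which matches the remark that CBC-MAC does not participate in deriving the later bytes.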
This part takes place during the already described RSA handshake. Both peers agree on cipher(s) that will be used to encrypt and decrypt data. No information about the cipher(s) agreed upon is revealed in packets sent over the network.
Each user has a non-empty list of preferred ciphers, which by default contains all of the supported ciphers that take at least a 32-byte key, except the composite cipher. During the cipher handshake, both peers attempt to agree on a random cipher preferred by both sides; if such a cipher doesn't exist, each side will use one of their preferred ciphers for their sending side of the (TCP and UDP) connection.
The server sets encrypt_cipher and decrypt_cipher to decide which cipher(s) will be used for encryption (client-side decryption) and decryption (client-side encryption) by applying the following algorithm:
if intersection (server_preferred - composite, client_preferred) is non-empty then
    decrypt_cipher = encrypt_cipher = random_cipher (from intersection)
else if composite in client_preferred and composite in server_preferred then
    decrypt_cipher = encrypt_cipher = composite
else
    decrypt_cipher = composite
    if intersection (server_preferred, client_supported - composite) is non-empty then
        encrypt_cipher = random_cipher (from intersection)
    else if composite in server_preferred then
        encrypt_cipher = composite
    else if intersection (server_supported - composite, client_supported - composite) is empty then
        encrypt_cipher = composite
    else
        decrypt_cipher = encrypt_cipher = random_cipher (from intersection)
    endif
    if intersection (server_supported - composite, client_preferred) is non-empty then
        decrypt_cipher = random_cipher (from intersection)
    else if composite in client_preferred then
        decrypt_cipher = composite
    else if intersection (client_supported - composite, server_supported - composite) is empty then
        decrypt_cipher = encrypt_cipher
    else
        if decrypt_cipher is composite then
            decrypt_cipher = random_cipher (from intersection)
        endif
    endif
endif
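The selection algorithm above can be transcribed into Python as a sketch for experimentation. Cipher names are plain strings, "composite" marks the composite cipher, and `random.Random.choice` stands in for random_cipher. The function follows the pseudocode only and is not Simphone's implementation.

```python
import random
from typing import Iterable, Optional, Tuple

COMPOSITE = "composite"


def _pick(candidates: set, rng: random.Random) -> str:
    """random_cipher (from intersection); sorting keeps seeded runs reproducible."""
    return rng.choice(sorted(candidates))


def choose_ciphers(server_preferred: Iterable[str], server_supported: Iterable[str],
                   client_preferred: Iterable[str], client_supported: Iterable[str],
                   rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Return (encrypt_cipher, decrypt_cipher) as chosen by the server."""
    rng = rng or random.Random()
    sp, ss = set(server_preferred), set(server_supported)
    cp, cs = set(client_preferred), set(client_supported)
    comp = {COMPOSITE}

    common = (sp - comp) & cp
    if common:
        encrypt = decrypt = _pick(common, rng)
    elif COMPOSITE in cp and COMPOSITE in sp:
        encrypt = decrypt = COMPOSITE
    else:
        decrypt = COMPOSITE
        shared = (ss - comp) & (cs - comp)
        # Server's sending side (encrypt_cipher).
        common = sp & (cs - comp)
        if common:
            encrypt = _pick(common, rng)
        elif COMPOSITE in sp:
            encrypt = COMPOSITE
        elif not shared:
            encrypt = COMPOSITE
        else:
            encrypt = decrypt = _pick(shared, rng)
        # Client's sending side (decrypt_cipher).
        common = (ss - comp) & cp
        if common:
            decrypt = _pick(common, rng)
        elif COMPOSITE in cp:
            decrypt = COMPOSITE
        elif not shared:
            decrypt = encrypt
        elif decrypt == COMPOSITE:
            decrypt = _pick(shared, rng)
    return encrypt, decrypt
```

For instance, with no common preferred cipher, a server preferring Serpent and a client preferring ARIA (each supporting the other's choice) end up with `("Serpent", "ARIA")`: each side sends with one of its own preferred ciphers.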
Cipher lists sent over the network (client_preferred and client_supported) transmit a unique 62-bit identifier for each listed cipher. These are precalculated as a truncated hash tag of a 128-bit block full of zeros, encrypted in the cipher's mode (EAX mode) with a key derived from zeros, zero IV, zero sequence number and zero counter value.
encrypt_cipher and decrypt_cipher do not use these cipher identifiers. Instead, each of them contains RNG0-generated random bytes encrypted with the cipher and its would-be derived key. The encrypted string has a RNG0-generated random IV and a hash tag; the total number of encrypted bytes (including the IV and tag) is equal to three times the server tag size sent in the same packet. The client figures out which ciphers are requested by trying to decrypt the packet with all of its supported ciphers and chooses the cipher that could authenticate the bytes correctly. The sequence number is 0, while the counter value is as described in the next paragraph.
With the cipher(s) agreed upon, a symmetric key or keys are derived as described under Key Derivation; the composite cipher key used to protect the RSA key exchange is discarded. If the same cipher is used for both encryption and decryption, the counter value is 2. If different ciphers are used, the client-side encryption cipher uses a counter value of 2, while the client-side decryption (server-side encryption) cipher uses a counter value of 2 + ks, where ks is the client-side encryption cipher key size (counted as a whole number of 32-byte chunks).
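The counter assignment can be sketched as follows. Rounding a partial 32-byte chunk up (for example, 56 bytes to 2 chunks) is an assumption. Because the RSA part of the derivation starts at offset 32 * (counter - 2), the second key's RSA slice appears to begin past the bytes consumed by the first (truncated) key, which would explain the 2 + ks choice.

```python
import math


def derivation_counters(client_enc_key_size: int, same_cipher: bool):
    """Return (client-side encryption counter, server-side encryption counter)."""
    if same_cipher:
        return 2, 2
    ks = math.ceil(client_enc_key_size / 32)  # whole 32-byte chunks (rounding up assumed)
    return 2, 2 + ks


def rsa_slice_offset(counter: int) -> int:
    """Offset into RSA_client / RSA_server used by the key derivation for counter >= 2."""
    return 32 * (counter - 2)
```

With MARS-2 (56-byte key) on the client-side encryption path, the server-side key would use counter 4 and start its RSA slice at offset 64.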
Note that the cipher handshake may force you to use a non-preferred cipher also for the sending side. In any case, if a non-preferred cipher is used (for either receiving or sending), an event is sent to the user program, which displays a warning.
When the cipher handshake has completed successfully, data transfer can finally begin. Data is sent in packets, each of which has a 5-byte header, containing the data length (up to 16384) in bytes. This is compatible with the TLS protocol, but openssl is not called to send or receive the data. As required by this protocol, an invisible 64-bit "sequence number" is authenticated together with the data and incremented on each processed packet. These packets go over TCP.
Unlike TLS, where two different keys (both derived from the master key) are always used for the two directions of the connection and the sequence numbers always start at 0 on each side, Simphone uses a single key derived directly from the premaster secret when the same cipher is used in both directions (if two different ciphers are used, then two different keys are used). The server starts its sequence numbers at 2; the client starts its sequence numbers at $2^{62} + 2$. The sequence numbers are incremented by 2 on each packet, so they are always even.
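A sketch of the TCP sequence-number schedule, assuming 64-bit unsigned arithmetic. The 8-byte big-endian encoding mirrors the TLS 1.2 MAC input and is an assumption here.

```python
SEQ_MODULUS = 2 ** 64     # sequence numbers are 64-bit
CLIENT_OFFSET = 2 ** 62   # the client's sequence numbers start above this


def tcp_sequence_numbers(is_server: bool, count: int) -> list:
    """First `count` sequence numbers for one side's TCP packets (always even)."""
    start = 2 if is_server else CLIENT_OFFSET + 2
    return [(start + 2 * i) % SEQ_MODULUS for i in range(count)]


def seq_bytes(seq: int) -> bytes:
    # ASSUMPTION: 8 big-endian bytes, as in the TLS 1.2 MAC input.
    return seq.to_bytes(8, "big")
```

Since the server counts up from 2 and the client from $2^{62} + 2$, the two directions could only reuse a sequence number under a shared key after roughly $2^{61}$ server packets.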
UDP data transfer (used only for audio calls) is started upon request and agreement of both parties as follows:
These packets are internally padded by RNG0-generated bytes, so they have at least the same size as audio packets that would follow.
Receiving a "UDP reply" or "UDP start" locks the socket to the IP address where the packet came from and any further UDP packets that arrive over this socket for the duration of the audio call are accepted only from that IP address. It also sets the UDP port for this peer, so that any further audio data packets will, from this point on, be sent via UDP to this port number (and the locked IP address).
UDP packets use the same encryption key as the TCP stream; they do not have a TLS or DTLS header. The sequence number of UDP control packets is fixed to -1 for the server and to $2^{62} - 1$ for the client. The sequence number of UDP audio data packets is fixed to -ss for the server and to $2^{62} - cs$ for the client, where ss is the audio sampling rate of the server and cs is the audio sampling rate of the client. Their contents can be up to 16384 bytes long (but in practice are always less than 1472 bytes, so datagrams are not fragmented). All UDP packets contain their own incrementing timestamps encrypted inside them.
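The fixed UDP sequence numbers can be computed as below, reading "-1" and "-ss" as 64-bit two's-complement values (an assumption). 48000 is used only as an example sampling rate.

```python
SEQ_MODULUS = 2 ** 64
CLIENT_OFFSET = 2 ** 62


def udp_control_seq(is_server: bool) -> int:
    # ASSUMPTION: "-1" is taken modulo 2**64, i.e. 2**64 - 1.
    return (-1) % SEQ_MODULUS if is_server else CLIENT_OFFSET - 1


def udp_audio_seq(is_server: bool, sampling_rate: int) -> int:
    """Fixed sequence number for audio datagrams: -ss (server) or 2**62 - cs (client)."""
    return (-sampling_rate) % SEQ_MODULUS if is_server else CLIENT_OFFSET - sampling_rate
```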
Each data stream counts the number of bytes sent and received; when the sum of these two exceeds a certain limit (set to one gigabyte by default), the connection is closed by both sides. Both TCP and UDP data are taken into account. A limit of one gigabyte is adequate for ciphers with an 8-byte block size; if those are not used (and they aren't by default), it should be safe for the user to increase this limit. The maximum allowed is one terabyte.
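A sketch of the per-stream byte accounting. Whether "one gigabyte" means $10^9$ or $2^{30}$ bytes is not stated, so decimal units are assumed, and the class name is hypothetical. For context, a 64-bit block cipher reaches the birthday bound around $2^{32}$ blocks, that is $2^{35}$ bytes (32 GiB), so the default stays well below that.

```python
GIGABYTE = 10 ** 9    # ASSUMPTION: decimal units
TERABYTE = 10 ** 12


class TrafficMeter:
    """Counts TCP and UDP bytes in both directions against a per-stream limit."""

    def __init__(self, limit: int = GIGABYTE):
        if not 0 < limit <= TERABYTE:
            raise ValueError("limit must be positive and at most one terabyte")
        self.limit = limit
        self.total = 0

    def record(self, sent: int = 0, received: int = 0) -> bool:
        """Add traffic; return True once the sum exceeds the limit (close the connection)."""
        self.total += sent + received
        return self.total > self.limit
```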
Data packets are always padded with random data to a length divisible by 181. The packet data contain a random IV, followed by the encrypted data, followed by padding of necessary length, followed finally by the hash tag (so the random padding is also encrypted and authenticated). The length of the IV and hash tag is the maximum of the block cipher's block size and blk (the tag size, for this side of the connection, sent during the RSA handshake). Thus ciphers with an 8-byte block size in the current version will still use 16 bytes as an IV/tag size which should make them indistinguishable from other ciphers to network packet observers. The receiver then will truncate the IV/tag to 8 bytes for those ciphers to process the packets.
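The padding rule can be sketched as follows, together with the 0..256 clamping of the received blk value described for the RSA handshake. This assumes the multiple of 181 (or 1448 during the RSA handshake) applies to IV + data + padding + tag and excludes the 5-byte header; the function names are hypothetical.

```python
def clamp_blk(blk: int) -> int:
    """The received tag size is forced into the range 0..256."""
    return max(0, min(256, blk))


def packet_layout(data_len: int, block_size: int, blk: int, unit: int = 181):
    """Return (iv_len, pad_len, tag_len, total_len) for one data packet."""
    iv_len = tag_len = max(block_size, clamp_blk(blk))
    unpadded = iv_len + data_len + tag_len
    pad_len = -unpadded % unit  # smallest padding making the total a multiple of unit
    return iv_len, pad_len, tag_len, unpadded + pad_len
```

With an 8-byte-block cipher and blk = 16, `packet_layout(0, 8, 16)` still yields a 16-byte IV and tag, so the packet looks the same on the wire as one produced by a 16-byte-block cipher.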
Audio packets have a constant size (variable bit-rate is not used), so they are not padded, regardless of whether they are sent over a direct (TCP or UDP) or over a relayed connection.
IVs are generated by RNG0 and then encrypted with the cipher in EAX mode, with a random pre-IV (also generated by RNG0) and a zero sequence number, before being sent over the network. The hash tag and the pre-IV of this encryption are discarded (see Random Number Generation for details on RNG0).
At the receiving side, if a packet does not pass authentication, it is discarded before decryption and the connection to the peer is closed immediately, as required by the TLS protocol. Alerts are neither sent nor processed. However, a successfully authenticated packet that has corrupted contents will send an event to the user program, if detected. The user program displays a warning and plays a special warning sound.
The previous chapters fully describe direct communication between two peers. However, if a server has no incoming connectivity, it is necessary for a proxy to help others establish connections to this server. When they want to connect to this server, they actually connect to its proxy, which does have incoming connectivity and can relay packets between both sides.
Any user who has incoming connectivity can serve as a proxy for any other user, but contacts are preferred to non-contacts when choosing a proxy. The key exchange through a proxy differs from a direct handshake as described below.
The proxy key exchange takes place between the customer and the proxy, while the client performs only an anonymous key exchange with the customer's proxy. The EC and RSA key exchanges take place between client and server, through the proxy. The proxy uses the chosen proxy cipher to decrypt packets received from the client and re-encrypts them with the chosen proxy cipher before forwarding them to its customer. The same is done when forwarding packets from the customer to the client. For encrypting packets to a client, the proxy uses a key derived from the anonymous handshake with that client; for encrypting packets to a customer, it uses a key derived from the EC handshake with that customer.
After all key exchanges succeed, neither the client nor the customer discards the Twofish (chosen proxy cipher) keys (provided the 62-bit cipher identifier sent by the proxy in the version reply packet is non-zero); data already encrypted with the client-to-server key are re-encrypted for the proxy with the chosen proxy cipher (client-to-proxy or server-to-proxy). Data received from the proxy by either the client or the customer are first decrypted with the chosen proxy cipher key and then decrypted with the client-to-server key.
The client always knows the address it is connecting to, so it can easily use it in its key derivation procedure. At the proxy side, the address a client is trying to connect to can be one of the following:
The proxy finds out which of these five is the case by trying to authenticate and decrypt the client's version packet using encryption keys derived from its own address, zero and one, and addresses of each of its current customers. Should all these decryption attempts fail, the TCP connection is closed immediately.
If the version packet was successfully decrypted with an address of one of the customers, the proxy starts forwarding packets between client and customer as described above.
If it was decrypted with a zero address, an EC key exchange takes place in which the customer sends both of its public keys (as described under EC Key Exchange), but the proxy usually sends its anonymous key. This means the proxy always identifies its customer, while the customer cannot identify the proxy. A new key for the chosen proxy cipher is derived from this exchange, which uses the brainpoolP512r1 curve. Any further data transferred through the control connection is then encrypted with this key. Control packets are padded to a length divisible by 181; the other packets have already been padded by the proxy's customer and client, so the proxy forwards them as they are. The tag size is fixed to 16 bytes.
In the special case where a customer is an authorized contact of the proxy, the proxy sends both of its public keys instead of the anonymous key. Only in this case can the customer identify the proxy as its contact; this does not give the proxy any special privileges other than the customer not disconnecting from it when another contact shows up.
The single TCP control connection of a customer can serve many connected clients, which are identified by a unique 16-bit client number at the proxy; this number is added to all packets and used by the customer to demultiplex them. Those packets are also encrypted with the chosen proxy cipher in both directions. Note that encapsulated packets that flow between customer and client through the proxy are encrypted as usual, before being re-encrypted with the chosen proxy cipher. When a customer sends a packet to the proxy to forward, the proxy uses the 16-bit client number to forward the packet to the right client, but first checks whether this client number belongs to the customer that sent the packet; otherwise, the packet is discarded. When a new client connects to the proxy, a special "connection request" control packet is sent to the customer, which includes a new client number, the IP address of the connecting client and the IP address of the proxy that the client has connected to. Closing the TCP connection at either side sends a special "close" control request over the customer connection (with client number) or client connection, which triggers the proxy to close the other side of the relayed connection.
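A sketch of the proxy-side ownership check for multiplexed packets. The 2-byte big-endian client-number prefix and the class and method names are assumptions; only the 16-bit size and the discard rule come from the text.

```python
import struct
from typing import Dict, Optional, Tuple


class ProxyDemux:
    """Hypothetical proxy-side table mapping 16-bit client numbers to customers."""

    def __init__(self) -> None:
        self.owner: Dict[int, str] = {}

    def new_client(self, customer: str) -> int:
        """Allocate the lowest free client number for a newly connected client."""
        for number in range(1 << 16):
            if number not in self.owner:
                self.owner[number] = customer
                return number
        raise RuntimeError("all 65536 client numbers are in use")

    def close(self, number: int) -> None:
        self.owner.pop(number, None)

    def route_from_customer(self, customer: str, frame: bytes) -> Optional[Tuple[int, bytes]]:
        """Return (client number, payload), or None if the packet must be discarded."""
        if len(frame) < 2:
            return None
        (number,) = struct.unpack(">H", frame[:2])
        if self.owner.get(number) != customer:
            return None  # number belongs to another customer (or to nobody)
        return number, frame[2:]


def frame_for_customer(client_number: int, payload: bytes) -> bytes:
    """Prefix a payload with its 16-bit client number (big-endian assumed)."""
    return struct.pack(">H", client_number) + payload
```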
As soon as the RSA handshake and cipher setup have succeeded over a proxy connection, the server (proxy customer) immediately tries to connect to the client's gip:gp and lip:lp directly. This reverse connection includes a new double (anonymous and EC) handshake, which includes vereq with the value of -1 (instead of the protocol version number), seq (currently an odd number), and the rnd which the other peer sent when it established the relayed connection. An incoming reverse connection is answered only if a relayed connection to the same contact with the same rnd exists. The client then replies with verep value of -1 and the server's seq (currently an odd number) and rnd, to complete the RSA handshake. Nothing else from the table from RSA Key Exchange is sent by either side, except for ack and flags.
If this succeeds, the reversal requester (proxy customer) sends a "reverse switch" packet over the relayed connection and stops sending data over that connection. When the "reverse switch" packet is received, the reversal replier (proxy client) closes its connection to the proxy, and starts sending and receiving data over the new (direct) connection. The proxy sends a "close" request (with client number) to the reversal requester over its control connection, and the reversal requester starts sending and receiving data over the direct (reverse) connection. The client has now become a server, while the customer has become a client.
In case the reverse connection has failed, the customer sends a "reverse failure" packet over the relayed connection, and the client now attempts to connect to the customer's gip:gp and lip:lp directly, in exactly the same manner. The client is now a reversal requester while the customer is a reversal replier. In case of success, the client would send a "reverse switch" packet over the relayed connection, while the customer would send a "close" request to its proxy.
When the reverse direct connection is about to be used for data transfer, the encryption key of the composite cipher is discarded and the encryption key of the original (relayed) connection is reused with the reverse connection. The encryption and decryption sequence numbers of the reverse connection are replaced by the seq values sent during the reversal double handshake.
The proxy can also sometimes help its customer and client connect directly to each other, even if reversing the connection as described above cannot succeed. Such a NAT traversal request is sent by the client (traversal requester) to the customer (traversal replier) immediately after the client-side direct connection attempt has failed, but can also be sent by either side at any time later, after the relayed connection has already been used for quite a while.
First, the traversal requester opens a second connection to the proxy with an address of 1. When the anonymous key exchange has completed successfully (in this case the requester does not use a key), the proxy decrypts the version packet with an address of 1 and responds by reporting the requester's "external" IP address and source port number back to the requester (this is Twofish-encrypted as usual). Afterwards, the proxy waits for the requester to close the TCP connection; if that does not happen, the proxy closes it after a predefined timeout.
The traversal requester then sends a traversal request over the relayed connection, which includes the source port number that it has just learned from the proxy, and a 62-bit random number (generated by RNG0). Upon receiving this request, the traversal replier also opens a second connection to the proxy with an address of 1, to learn its source port, and sends back that port number to the traversal requester.
Sending and receiving this traversal reply triggers a "simultaneous" traversal attempt at both sides: they try to connect directly to each other's IP addresses. If a traversal request is rejected for any reason (for example, if one of the sides does not allow traversal, as when using TOR, is not using a proxy, or encounters a local error), the rejecting side first informs the other side not to send any more traversal requests, and then accepts no further traversal requests for as long as this relayed TCP connection remains established.
If traversal succeeds, a direct TCP connection between both peers will appear automagically on a new socket. If this was an incoming connection, the peer that received it checks whether the source IP address is identical to the one it was trying to connect to. Mismatch causes the peer to skip authentication and assume failure (but continue trying to connect and listening for a connection from the correct IP address until timeout).
Over the new connection, both peers then send each other a traversal authentication packet that includes the 62-bit random number from the traversal request. There is no key exchange; the authentication packet is encrypted with the cipher and key that were already established for the relayed connection (not including the external encapsulation with the chosen proxy cipher). The sequence number is 1 for the customer and $2^{62} + 1$ for the client. If the packet authenticates and decrypts successfully to the requested 62-bit random number and the other known data, traversal has succeeded. The traversal authentication packet from each side also includes an initial sequence number for that side of the traversed connection, which is currently an odd number. Note that the customer sets the 62nd bit of the sequence number of the client to one.
When traversal authentication succeeds, both sides close their second TCP connection to the proxy (that was opened especially to learn the source port number) and report success to each other over the traversed connection (with the new sequence number); this is repeated a few times. Upon receiving these success packets, the client sends a final "traverse switch" packet over the relayed connection and stops sending data over that connection. When the "traverse switch" packet is received, the customer sends a "close" request (with client number) to the proxy over its control connection, and starts sending and receiving data over the direct connection with the same cipher and key and (odd) sequence numbers that increment by 2 on each TLS packet. The proxy closes its TCP connection to the client, and the client starts sending and receiving data over the direct connection.
If the traversal authentication packet is not received from the other side within one second of sending its own traversal authentication packet, or if what was received over the direct connection could not be authenticated, the newly established direct connection is closed and traversal attempts continue up to a mutually agreed time limit of a few seconds (the same thing happens if a direct connection does not appear or does not succeed). Timeout means that traversal has ultimately failed; both sides close their second TCP connections to the proxy and further communication between them proceeds over the relayed connection as before.
UDP traversal (for audio calls) is also implemented. Both peers first send a "UDP proxy request" to the current proxy (over UDP). The proxy figures out who these packets came from by checking the source IP address and replies with a "UDP proxy reply" packet which contains the source IP address and port number of the requester. When a peer receives the "UDP proxy reply", it sends a "UDP traversal request" to the other peer over the existing TCP connection. As soon as a peer learns both its IP address and port number from the proxy, and the other peer's IP address and port number from the other peer, it starts the same UDP connection procedure used for direct connections (as described under Data Transfer).
UDP proxy request and proxy reply packets use the same cipher and key already used by the TCP connection; they are padded to a length divisible by 181. Proxy request packets have a fixed sequence number of $2^{62} - 1$, while proxy reply packets have a fixed sequence number of -1. These packets are subject to rate limiting: when the configured speed limit is exceeded, UDP control packets are dropped without being decrypted until the speed falls below the limit.
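The padding rule and the fixed sequence numbers above can be expressed in a few lines of Python. The text does not say how the speed limit is measured, so the limiter below is a hypothetical token bucket (a swapped-in technique, not necessarily what Simphone uses); `padded_length` and `TokenBucket` are illustrative names.

```python
import time

# Values taken from the text.
PAD_MULTIPLE = 181
PROXY_REQUEST_SEQ = 2**62 - 1
PROXY_REPLY_SEQ = -1


def padded_length(n: int) -> int:
    """Smallest multiple of PAD_MULTIPLE that is >= n (n > 0 assumed)."""
    return -(-n // PAD_MULTIPLE) * PAD_MULTIPLE


class TokenBucket:
    """Hypothetical limiter: packets over the byte rate are dropped undecrypted."""

    def __init__(self, rate_bytes_per_sec: float, burst_bytes: float,
                 clock=time.monotonic):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.clock = clock
        self.last = clock()

    def allow(self, packet_len: int) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_len:
            self.tokens -= packet_len
            return True    # within the limit: decrypt and process
        return False       # over the limit: drop without decrypting
```

Checking the limit before decryption keeps the cost of a flood of bogus control packets low, which matches the stated intent of dropping them "instead of being decrypted".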
Five random number generators (RNGs) are used. The simple standard C library rand() is used as needed for DHT network participation (deciding which DHT nodes to contact, calculating timeouts, and so on). This RNG is initialized with the current system time.
Additionally, four cryptographic RNGs are used:
All four cryptographic generators are implemented through a single algorithm, but are used and initialized differently.
RNG0 is used for "public" random data (stored to a file or sent over the network, such as Initialization Vectors) and initialized with:
If the minimal number of entropy bytes required for initialization was not collected during this process, RNG0 is also initialized with an "uninitialized" variable on the stack.
RNG0 is also used in the following special cases, in order to generate:
RNG1 is used for "private" random data (session keys) and initialized with:
It is re-initialized with audio data recorded from the first few seconds of the first audio call (except for audio test) after login.
RNG1 is used in the following particular cases:
Public values such as the IV required by the AES256-GCM cipher (used by openssl to encrypt its key exchange) and "client random" and "server random" values are generated by RNG0.
RNG2 is used to generate permanent keys or permanent seeds, and is initialized with data recorded from an audio test. Audio data is collected as 16-bit samples at the chosen sample rate and first debiased using the von Neumann algorithm (which removes pairs of identical consecutive bits from the bit stream) before being fed into RNG2. Only audio frames where speech was detected are counted towards the minimal input entropy requirements (but the rest of the frames are used, too).
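As an illustration of the debiasing step, here is a minimal Python sketch of the textbook von Neumann extractor, which looks at non-overlapping bit pairs, drops 00 and 11, and keeps one bit from 01 or 10. The bit order within each 16-bit sample and the packing into bytes are assumptions, Simphone's exact variant may differ in detail, and the function names are hypothetical.

```python
def samples_to_bits(samples):
    """Flatten signed 16-bit samples into bits, MSB first (assumed order)."""
    bits = []
    for s in samples:
        u = s & 0xFFFF                      # two's-complement view of the sample
        bits.extend((u >> i) & 1 for i in range(15, -1, -1))
    return bits


def von_neumann(bits):
    """Textbook von Neumann extractor over non-overlapping bit pairs."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:          # 01 -> 0, 10 -> 1
            out.append(a)
        # 00 and 11 pairs are discarded; an odd trailing bit is ignored
    return out


def bits_to_bytes(bits):
    """Pack bits MSB first; an incomplete trailing byte is dropped."""
    out = bytearray()
    for i in range(0, len(bits) - 7, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)
```

For independent but biased input bits, 01 and 10 are equally likely, so the kept bits are unbiased at the cost of discarding most of the input.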
If the mouse is moved during this audio recording, data collected from mouse movements (cursor position and timing data) is also used. Cursor positions are taken as differences from the previous cursor positions. Leading zero bits are removed from all mouse values; the sign bit is stored separately for cursor position differences. The mouse data is then XORed cyclically into the debiased audio data (which is usually much larger), before it is split as described below.
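A rough sketch of the mouse-data encoding and the cyclic XOR follows. The input format `(x, y, t)`, the placement of the sign bit right after each difference's magnitude bits, and the reading of "cyclically" as wrapping the mouse bytes around the audio buffer are all assumptions made for illustration; the function names are hypothetical.

```python
def value_bits(v: int) -> list:
    """Magnitude bits of v, leading zeros removed (0 contributes no bits)."""
    return [int(c) for c in bin(abs(v))[2:]] if v else []


def mouse_bits(events) -> list:
    """events: (x, y, t) tuples, a hypothetical format; t is timing data."""
    bits, prev = [], None
    for x, y, t in events:
        if prev is not None:
            for d in (x - prev[0], y - prev[1]):   # differences to previous position
                bits.extend(value_bits(d))
                bits.append(1 if d < 0 else 0)     # sign bit stored separately
        bits.extend(value_bits(t))                 # timing value, no sign
        prev = (x, y)
    return bits


def pack_bits(bits) -> bytes:
    """Pack bits MSB first; an incomplete trailing byte is dropped."""
    n = len(bits) // 8 * 8
    return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, n, 8))


def xor_cyclic(audio: bytes, mouse: bytes) -> bytes:
    """XOR mouse bytes into the audio buffer, wrapping around if needed."""
    if not audio:
        raise ValueError("no audio data to mix into")
    out = bytearray(audio)
    for i, m in enumerate(mouse):
        out[i % len(out)] ^= m
    return bytes(out)
```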
When using RNG2 to generate a seed, that seed is generated in consecutive 16-byte chunks. Input audio entropy data is split into parts of equal size; the number of parts is equal to the number of 16-byte chunks in the seed to be generated. Each chunk of the seed is then generated independently of the others from its corresponding part of the input data. This increases input entropy requirements for longer seeds; the audio test terminates automatically when it detects that enough entropy has been collected for the seed size to be generated (this is the minimal required entropy multiplied by a hard-coded safety factor, currently two). It is possible to collect an unlimited amount of input entropy by performing multiple audio tests in sequence and then using it for key generation (collected audio data is removed from memory only after it is actually used to initialize an RNG).
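The splitting and stopping rule can be sketched as follows. `min_per_part` is a hypothetical parameter standing for the minimal entropy needed to initialize an RNG once, and appending leftover bytes to the last part is an assumption; the function names are illustrative.

```python
CHUNK = 16          # the seed is produced in 16-byte chunks (from the text)
SAFETY_FACTOR = 2   # hard-coded safety factor (from the text)


def chunk_count(seed_len: int) -> int:
    """Number of 16-byte chunks, and therefore entropy parts, for a seed."""
    return -(-seed_len // CHUNK)


def required_entropy(seed_len: int, min_per_part: int) -> int:
    """Entropy the audio test waits for before stopping automatically."""
    return chunk_count(seed_len) * min_per_part * SAFETY_FACTOR


def split_parts(data: bytes, parts: int) -> list:
    """Equal-size parts; leftover bytes go to the last part (assumption)."""
    size = len(data) // parts
    out = [data[i * size:(i + 1) * size] for i in range(parts - 1)]
    out.append(data[(parts - 1) * size:])
    return out
```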
RNG3 is used to generate permanent keys at login, and is initialized with (part of) a permanent seed generated by RNG2.
The algorithm used to implement the four RNGs is based on ANSI X9.31 (also known as ANSI X9.17). Such an RNG is initialized (keyed) with "truly" random data exactly as big as a block cipher's key size, while its internal state is as big as that cipher's block size. Each time a block of random bytes (equal in size to the state size) needs to be output, the following algorithm is applied per sub-RNG:
    if RNG3 then
        DT = encrypt (salt)
        salt = salt + 1
    else
        DT = encrypt (DT xor time)
    endif
    state = encrypt (state xor DT)
    output = state
    state = encrypt (state xor DT)
If more random data is needed, this procedure is repeated as many times as necessary. If less random data is needed, the rest of output (up to the state size) is discarded. time is some data that depends on the current system (clock) time. For RNG3, salt is initialized to zero.
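To make the per-sub-RNG algorithm concrete, here is a Python sketch of one sub-RNG that uses the keyed SHA3-512 function described in the next paragraph (SHA3-512 of key followed by input) as its `encrypt`. The all-zero initial DT and the encoding of `time` as a big-endian integer from `time.time_ns()` are assumptions; cipher-based sub-RNGs would follow the same steps with a 16-byte block. `SubRng` is a hypothetical name.

```python
import hashlib
import time

BLOCK = 64  # SHA3-512 digest size; the cipher-based sub-RNGs would use 16


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class SubRng:
    """One X9.17-style sub-RNG whose encrypt() is SHA3-512(key || block)."""

    def __init__(self, key: bytes, state: bytes, rng3: bool = False,
                 clock=time.time_ns):
        self.key = key
        self.state = state          # expected to be BLOCK bytes long
        self.dt = bytes(BLOCK)      # initial DT: assumed all-zero
        self.salt = 0               # RNG3: salt starts at zero (from the text)
        self.rng3 = rng3
        self.clock = clock          # source of the "time" input

    def encrypt(self, block: bytes) -> bytes:
        return hashlib.sha3_512(self.key + block).digest()

    def next_block(self) -> bytes:
        if self.rng3:
            self.dt = self.encrypt(self.salt.to_bytes(BLOCK, "big"))
            self.salt += 1
        else:
            now = self.clock().to_bytes(BLOCK, "big")
            self.dt = self.encrypt(xor(self.dt, now))
        self.state = self.encrypt(xor(self.state, self.dt))
        output = self.state
        self.state = self.encrypt(xor(self.state, self.dt))
        return output

    def read(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += self.next_block()
        return out[:n]              # unused tail of the last block is discarded
```

In RNG3 mode the output depends only on the key, the initial state and the salt counter, so the same seed always reproduces the same permanent keys; the other RNGs mix in the clock instead.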
Each of the four RNGs is implemented using ten X9.17 RNGs (sub-RNGs), each of them having its own encrypt function. The first two use hash functions: HMAC-Whirlpool and HMAC-SHA3-512, which is calculated simply as SHA3-512 (key || state). The remaining eight sub-RNGs use block ciphers that have a 16-byte block size (composite, Serpent, Twofish, RC6, MARS, Camellia, ARIA and CAST-256). To generate a random block, the outputs of all sub-RNGs are XORed.
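The XOR combination can be sketched independently of how each sub-RNG works. Below, `CounterStream` is a hypothetical stand-in (SHA-512 in counter mode) used only so the example runs; in Simphone each stream would be one of the ten sub-RNGs listed above. `CombinedRng` is likewise an illustrative name.

```python
import hashlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("sub-RNG outputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


class CounterStream:
    """Stand-in sub-RNG (SHA-512 in counter mode), for illustration only."""

    def __init__(self, seed: bytes):
        self.seed = seed
        self.counter = 0

    def read(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += hashlib.sha512(self.seed + self.counter.to_bytes(8, "big")).digest()
            self.counter += 1
        return out[:n]


class CombinedRng:
    """Output = XOR of n bytes read from every sub-RNG."""

    def __init__(self, subs):
        self.subs = subs

    def read(self, n: int) -> bytes:
        return reduce(xor_bytes, (s.read(n) for s in self.subs))
```

XORing independent streams keeps the result unpredictable as long as at least one stream remains unpredictable and independent of the others, which is what makes a break of a single primitive insufficient.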
To initialize or re-initialize an RNG, entropy data (as described above for each of the four RNG types) and/or saved data is used. Saved data consists of the initial key for each sub-RNG (the same key that each sub-RNG was last initialized with) and a saved state (obtained by generating 16 or 64 bytes through each sub-RNG when saving the random data to a file).
Only the data of RNG0 and RNG1 is saved to a file and reloaded at login. The data (key and state) of RNG2 and RNG3 is removed from memory immediately (after generating a permanent key). Therefore, no saved data is used when initializing RNG2 with audio data.
Re-initializing RNG1 is implemented by saving its data to memory (instead of a file) and then immediately initializing the RNG with this just saved data and, in this particular case, collected audio entropy data.
When initializing RNG3, entropy data is missing, while "saved" data consists of:
RNG3 is also used to generate a unique 60-bit identifier of each sent file. In this case, almost the same procedure is applied, but it includes a 64-byte random token. This token is generated by RNG0 the first time such an identifier is needed and usually remains constant for as long as Simphone is installed. Then:
A single pseudo-random block is then output and 60 bits are taken from it. This value is multiplied by two and the value of one is optionally added, to produce an identifier of a file that is to be sent. The peer with the lesser ASCII address always adds one to the value, thus always producing odd identifiers, while the peer with the greater ASCII address never adds one, thus always producing even identifiers.
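The identifier arithmetic can be illustrated with a small Python function. Taking the low 60 bits of the first eight bytes of the block is an assumption (the text only says that 60 bits are taken), and the function names are hypothetical.

```python
MASK60 = (1 << 60) - 1


def take60(block: bytes) -> int:
    """60 bits from a pseudo-random block (low bits of first 8 bytes: assumed)."""
    return int.from_bytes(block[:8], "big") & MASK60


def file_identifier(block: bytes, own_address: str, peer_address: str) -> int:
    ident = take60(block) * 2
    if own_address < peer_address:   # lesser ASCII address -> odd identifiers
        ident += 1
    return ident


def computer_identifier(block: bytes) -> int:
    return take60(block)             # not multiplied by two
```

Because one peer always produces odd values and the other always even ones, both peers can assign identifiers to the files they send each other without coordinating and without colliding.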
Exactly the same procedure is used to obtain a unique and permanent 60-bit computer identifier, which is used internally by file transfer to keep track of which files were sent to which computer, in case a user connects from or to the same user identity at more than one computer. System file information is taken from the user directory in this case or it is empty if there is no user directory. This value is not multiplied by two.
Since initializing X9.17 requires fixed-size keys (equal to the cipher key size of each sub-RNG) and states (equal to the cipher block sizes), entropy data needs to be hashed to produce keys and states of the required sizes. Input (entropy) data is processed in blocks in the following way:
After the fifth group is hashed and if there is more input entropy, the rest of the entropy is hashed continuously the same way (first taking a first group of ten blocks, then a second group of eight blocks, and so on).
Input blocks have different sizes, equal to the key size of the sub-RNG's cipher, as follows. For hashing that produces the key or state of a sub-RNG, a hash function based on the same block cipher as the one used by that sub-RNG is applied. This hash function has a block size equal to its cipher's key size, and a state as big as its cipher's block size. The hash state is always first initialized to the ASCIIz string "Merkle-Daamgard". An input block is processed by using it as a key to the block cipher to encrypt the current hash state; the result is then XORed into the state and becomes the new state of that hash function. To obtain a final result, the input is padded by appending a 1 bit, followed by the input's total length, and then zeros up to a multiple of the block size. Whirlpool and SHA3-512 are used directly (not in HMAC mode) to obtain a 64-byte key from hashing a 64-byte first block, and a 64-byte state from hashing a 64-byte fifth block (this key and state are then used by their sub-RNGs, as for cipher-based sub-RNGs).
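The cipher-based hash described above is a Davies-Meyer compression step iterated in Merkle-Damgard style. The sketch below follows the text (input block as key, hash state as plaintext, result XORed into the state, 16-byte initial state "Merkle-Daamgard" plus its terminating NUL), but `placeholder_encrypt` is a hash-based stand-in, not a block cipher, and the byte-level padding layout (0x80, then an 8-byte big-endian byte count, then zeros) is an assumption. All names are hypothetical.

```python
import hashlib

STATE_SIZE = 16
IV = b"Merkle-Daamgard\x00"   # ASCIIz string: 15 characters + NUL = 16 bytes


def placeholder_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for Serpent, Twofish, etc.: NOT a block cipher, illustration only.
    return hashlib.sha256(key + block).digest()[:STATE_SIZE]


def pad(data: bytes, block_size: int) -> bytes:
    # Assumed encoding: 0x80 for the "1 bit", an 8-byte big-endian byte
    # length, then zeros up to a multiple of block_size.
    padded = data + b"\x80" + len(data).to_bytes(8, "big")
    return padded + bytes(-len(padded) % block_size)


def cipher_hash(data: bytes, key_size: int, encrypt=placeholder_encrypt) -> bytes:
    state = IV
    padded = pad(data, key_size)
    for i in range(0, len(padded), key_size):
        block = padded[i:i + key_size]        # input block used as the cipher key
        enc = encrypt(block, state)           # encrypt the current hash state
        state = bytes(a ^ b for a, b in zip(enc, state))  # XOR into the state
    return state
```

Feeding the message in as the key and XORing the result back into the state is what makes the step hard to invert even though the underlying cipher is a permutation.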
To fully initialize all sub-RNGs, the following input blocks are therefore needed:
The total size of all input blocks needed is then 1472 bytes, so that's the minimum amount of input entropy required for fully initializing an RNG. Less entropy could also work by simply deriving shorter keys (and/or constant initial state) but this is not used.
Keys and states so produced from input data are used directly to initialize sub-RNGs in case no saved data is present. If saved data is present, it is merged into the produced values by XORing (both key and state) to them before initializing sub-RNGs.
Private keys are generated either from a seed or directly from audio data. A random binary seed is converted to an ASCII word list after adding the key type and a checksum to the binary data. Four key types are currently supported: 2048-bit, 4096-bit, 8192-bit and 16384-bit (EC private keys are always 512-bit). RSA keys generated directly from audio data may be of arbitrary size (between 2048 and 16384 bits).
Private keys are generated from a user-specified seed by initializing RNG3 and then using crypto++ to generate first an RSA key followed by an EC key. Basically, this amounts to generating random blocks of the required key size until a generated block happens to be within the required range and, for RSA, is also a probable prime.
First, a check is made as to whether the seed security level exceeds the key security level, as follows:
If the seed size is at least three times the required size, RNG3 is initialized three times to independently generate the two prime factors of an RSA key and then an EC key. One third of the seed is used to generate each of these three numbers.
If the seed size is at least twice the required size, RNG3 is initialized twice, once with each half of the seed, to independently generate the two prime factors of an RSA key. It is then initialized a third time with the whole seed to generate an EC key.
If the seed size is less than twice the required size, RNG3 is initialized only once with the whole seed and then used to generate first the two prime factors of an RSA key and then an EC key.
The maximal supported seed size is 3072 bits, which allows each of the three required large numbers to be generated independently from a 1024-bit seed.
When generating private keys directly from audio data, audio entropy is always split into three parts of equal size, and then RNG2 is initialized three times with each of the three parts in order to generate each of the three large numbers.
In the above calculations, if the seed size is not divisible by the number of independently generated numbers (two or three), the remaining bytes are always added to the last part.
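The three-way decision and the remainder rule can be summarized as a small planning function. Sizes are in bytes, `required` is a hypothetical parameter for the seed size needed by the key's security level, and the function only decides which seed slice initializes RNG3 for which number; it does not generate keys. Names are illustrative.

```python
def split(seed: bytes, parts: int) -> list:
    """Equal parts; the remainder bytes are added to the last part."""
    size = len(seed) // parts
    chunks = [seed[i * size:(i + 1) * size] for i in range(parts - 1)]
    chunks.append(seed[(parts - 1) * size:])
    return chunks


def plan_rng3_inits(seed: bytes, required: int) -> list:
    """Return (seed slice, purpose) pairs, one per RNG3 initialization."""
    n = len(seed)
    if n >= 3 * required:
        p, q, e = split(seed, 3)
        return [(p, "RSA prime 1"), (q, "RSA prime 2"), (e, "EC key")]
    if n >= 2 * required:
        p, q = split(seed, 2)
        return [(p, "RSA prime 1"), (q, "RSA prime 2"), (seed, "EC key")]
    return [(seed, "RSA primes 1 and 2, then EC key")]
```

For keys generated directly from audio data, the same three-way `split` would apply to the audio entropy, with RNG2 taking the place of RNG3.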
Generated RSA keys currently have a public exponent of 65537, but any exponent is accepted for other users' public keys.
To support use of multiple user identities at the same computer, data files for each identity are stored in a separate subdirectory. The key file, saved random data, configuration table, saved pending messages, list of current file transfers, DHT node cache, contact and proxy lists are encrypted with the composite cipher by default. The key file (which contains the two private keys, RSA and EC) may be:
The other files are encrypted using a key derived as a Whirlpool hash of the EC private key concatenated with the RSA private key.
A saved file has an IV (generated by RNG0) and a hash tag, as usual. When decrypting files, an authenticity check is done on the whole contents before decryption. For non-encrypted files, only an integrity check is done.
The user can choose a preferred cipher for file encryption. In order to decrypt a file encrypted with any supported cipher, a block of random bytes is encrypted with that cipher and prepended to each file. The algorithm to do so is the same as for encrypt_cipher and decrypt_cipher, as described under Cipher Setup.
The key generator (login prompt) does not store data of its random number generator. To generate an IV for saving a generated key file, RNG0 is initialized with the generated 160-bit address in addition to system data (as described under Random Number Generation).
When a file is saved, the previously saved file is first renamed to a .old file for backup purposes. If saving fails for any reason, the .old file is renamed back to the original file name. Before the first rename, the previous .old file is deleted. When loading a file, if loading fails for any reason, an attempt is made to load the .old file. Thus no information should be lost if the program crashed or was otherwise interrupted during saving and could not rename the .old file back to the original after the failed save. The only exception is the random data file, which has no backup copy. The key file is always stored twice (so the .old file is created when it is first saved). Before a file is deleted, its contents are first overwritten with a random pattern.
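The save and load procedure can be sketched with Python's standard `os` module. This is an illustration under assumptions: the function names are hypothetical, the optional `validate` hook stands in for whatever integrity or authenticity check makes a load "fail", and overwriting a file before deletion may not reach the physical storage on SSDs or journaling file systems.

```python
import os
import tempfile


def wipe_and_remove(path: str) -> None:
    """Overwrite a file with random bytes, then delete it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())
    os.remove(path)


def save_with_backup(path: str, data: bytes) -> None:
    old = path + ".old"
    if os.path.exists(old):
        wipe_and_remove(old)            # previous .old is deleted first
    had_original = os.path.exists(path)
    if had_original:
        os.rename(path, old)            # keep the last good copy as .old
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
    except OSError:
        if had_original:
            os.replace(old, path)       # roll back to the previous copy
        raise


def load_with_fallback(path: str, validate=lambda data: True) -> bytes:
    """Return the first copy (path, then path.old) that reads and validates."""
    for candidate in (path, path + ".old"):
        try:
            with open(candidate, "rb") as f:
                data = f.read()
        except OSError:
            continue
        if validate(data):
            return data
    raise FileNotFoundError(f"no usable copy of {path}")


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = os.path.join(d, "contacts")
    save_with_backup(p, b"v1")
    save_with_backup(p, b"v2")
    print(load_with_fallback(p))                        # b'v2'
    print(load_with_fallback(p, lambda b: b != b"v2"))  # b'v1'
```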
Contacts are flagged as voice-authenticated by reading out the first 160 bits of the Whirlpool hash of the concatenation of their EC public key followed by RSA public key followed by the first four bytes of K2 (the RSA-XOR value of the current session) and entering it into the program at the other side during an audio call. One can also read the peer's address for additional security, but this is not required.
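Computing the verification token can be sketched as follows. Python's `hashlib` provides Whirlpool only when the underlying OpenSSL build does, so the digest is passed in as a parameter; rendering the 160 bits as hex in groups of four digits is an assumption about presentation, not Simphone's actual display format. The function names are hypothetical.

```python
import hashlib


def whirlpool(data: bytes) -> bytes:
    # Raises ValueError if the local OpenSSL build lacks Whirlpool.
    return hashlib.new("whirlpool", data).digest()


def verification_token(ec_pub: bytes, rsa_pub: bytes, k2: bytes,
                       digest=whirlpool) -> str:
    """First 160 bits of digest(EC public key || RSA public key || K2[:4])."""
    h = digest(ec_pub + rsa_pub + k2[:4])
    token = h[:20].hex()                  # 160 bits = 20 bytes = 40 hex digits
    return " ".join(token[i:i + 4] for i in range(0, len(token), 4))
```

Including part of K2 ties the token to the current session, so a token read out in one call cannot simply be replayed in another.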
Users announce their IP addresses to the DHT network by publishing a 160-bit hash of their address, which is calculated as follows:
In TOR mode, the DHT is still searched for contacts, but no announcements are made to it. If DNS queries need to be made for DynDNS hosts, they go through the TOR proxy, and audio data is transferred only over TCP (through TOR). Incoming connections are not accepted (the listening port is closed). Intranet and internet IP addresses and port numbers are not reported for connection reversal or traversal.
When connected through a proxy (as described under Proxy Key Exchange), the proxy announces its own IP address to the DHT for all of its customers. Note that when a user has connected to a proxy, the proxy has verified that this customer is in possession of private/public key pairs that hash to its address, before announcing it to the DHT.
Throughout the library, length-checked strings and collections of strings known as "simtypes" are used. Network packets (before encryption) are made out of them, and they are also used to store and load files and to communicate with the GUI, in addition to being used internally. The few cases where static buffers are used internally are length-checked, too. This hopefully avoids the buffer overflows to which many C programs are susceptible. When a simtype is freed, its stored data is overwritten with zeros before the memory is released back to the operating system. There is also a global setting that locks the whole program into RAM to prevent swapping to disk (not turned on by default).
This table lists possible security breaches and their expected impact in descending order of severity.
| risk | If... | Then... | countermeasures |
|---|---|---|---|
| 1:1000000 | a large quantum computer has broken both ECDSA and RSA | your security is totally compromised; third parties can also revoke your identity | not available |
| 1:1000000 | both RSA and SHA2-512 are broken | you no longer have any security; past communication (prior to the break of SHA2-512) is still private | use a larger RSA key size |
| 1:100 | a friend of yours has tricked you into running a trojan horse or has run one when you let him use your computer | nothing is private or secure, including your Simphone communication and private keys. Your former friend is now your overlord | do not use software unless you are absolutely sure it is backdoor-free, and do not leave your computer unattended |
| 1:1000 | someone else has physical access to your computer or remote access to your user account on the computer | nothing is private or secure, including your Simphone communication and private keys | keep your computer secure |
| 1:1000 | unauthorized parties have obtained a copy of your private keys | the parties can impersonate you or revoke your identity; future communication is no longer private, but past communication still is, even if network traffic was recorded before your private keys were stolen | keep your private keys safe and your computer secure |
| 1:10000000 | the eight RNG symmetric ciphers AND Whirlpool AND SHA3-512 are all broken | the Simphone random number generator is not secure. Your keys may be compromised; your communication is probably not private | not available |
| 1:100000 | a symmetric cipher is broken | your communication (past or present) may no longer be private, but only if you have used that cipher | choose a strong symmetric cipher |
| 1:10000 | Whirlpool is broken | contact verification is not secure; a man-in-the-middle could successfully impersonate your contacts even after verification | when reading out the verification tokens, read out and compare your Simphone addresses, too |
| 1:10000 | the mainline DHT network is broken | you will not be able to connect to any of your contacts unless you have already connected to them previously and they have not changed their DynDNS hostname (if they have any) or IP address since then | use DynDNS |
| 1:100000 | ECDSA is broken, but RSA is not | someone can send you fake status messages and possibly perform denial-of-service attacks against you, but they will not be able to send or receive data to or from you | not available |
| 1:10000 | SHA2-512 is broken, but RSA is not | same as the previous point, but it may be more difficult to mount a successful attack in practice | not available |
| 1:100000 | both RIPEMD-160 and SHA2-256 are broken | someone can impersonate contacts you have added, but only before you have connected to them for the first time. You will detect the security breach as soon as you attempt to verify your contacts | verify all your contacts |
| 1:10000 | AES256-GCM is broken | third parties can replay status messages you have previously received from your contacts | not available |
| 1:1000 | a man-in-the-middle has taken control of your communication lines | the attacker can disrupt your communications at will; public keys are revealed | use different networks |
| 1:1000 | a man-in-the-middle has taken control of your communication lines | the attacker can impersonate unverified contacts that you have added after your connection was hijacked; communication with verified contacts (even after the hijack) is still fully secure. For unverified (impersonated) contacts, Simphone would eventually raise an alarm if you moved your computer to a non-hijacked network environment | verify all your contacts and use different networks |
| 1:100 | someone is running a malicious Simphone proxy | if you happen to connect to that proxy by chance, the proxy will obtain your public key and can prevent your contacts from connecting to you or drop such connections at random times; you can still connect to your contacts, unless they suffer from the same problem | make sure you have incoming connectivity |
| 1:10 | your network traffic is being recorded | the records will show the IP addresses of your contacts (unless you or they are using TOR) | use TOR |
| 1:100000 | RSA is broken, but neither ECDSA nor SHA2-512 is | increased risk in case something else gets broken | use a larger RSA key size |
Numbers in the above table are an attempt to compare likelihoods. They are not real probabilities.