[Cryptography] doing traffic analysis for good - analysing TLS metadata for evidence of malware

iang iang at iang.org
Wed Jun 21 07:50:21 EDT 2017


(It is not clear to me what key data here indicates malware, but it is an 
interesting story nonetheless.  Only a few snippets copied below...)

https://continuum.cisco.com/2017/06/20/security-without-compromise-how-cisco-engineers-used-machine-learning-to-solve-an-impossible-problem/

...by analyzing millions of TLS flows, malware samples and packet captures, 
Anderson and McGrew found that the unencrypted metadata in a TLS flow 
contains fingerprints that attackers cannot hide, even with encryption. 
TLS is really good at obscuring plain text, but by doing so it also 
creates a “complex set of observable parameters” that engineers like 
McGrew and Anderson can use to train their data model.

For instance, when a TLS flow begins, it starts with a handshake. The 
client (like your Chrome browser) sends a ClientHello message to the 
server it’s trying to reach (like Facebook). The “hello” message 
includes a list of parameters, such as the cipher suites the client 
supports, the protocol versions it accepts and a list of optional extensions.
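To make the ClientHello description concrete, here is a minimal Python sketch that pulls the cleartext fields (offered version, cipher suites, extension types) out of a raw ClientHello record. It assumes the whole ClientHello arrives in a single TLS record and does only minimal validation; the function names are hypothetical, and `build_client_hello` exists only to produce test input.

```python
import struct


def parse_client_hello(record: bytes) -> dict:
    """Extract the cleartext ClientHello fields from one TLS record.

    Assumes the ClientHello is not fragmented across records.
    """
    content_type, _record_version, record_len = struct.unpack("!BHH", record[:5])
    if content_type != 0x16:  # 22 = handshake
        raise ValueError("not a TLS handshake record")
    body = record[5:5 + record_len]
    if body[0] != 0x01:  # handshake type 1 = ClientHello
        raise ValueError("not a ClientHello")
    hs_len = int.from_bytes(body[1:4], "big")
    hello = body[4:4 + hs_len]

    # legacy_version (TLS 1.3 clients still send 0x0303 here and signal
    # 1.3 through the supported_versions extension)
    version = struct.unpack_from("!H", hello, 0)[0]
    pos = 2 + 32                    # skip version and the 32-byte random
    pos += 1 + hello[pos]           # skip the session id
    cs_len = struct.unpack_from("!H", hello, pos)[0]
    pos += 2
    suites = [struct.unpack_from("!H", hello, pos + i)[0] for i in range(0, cs_len, 2)]
    pos += cs_len
    pos += 1 + hello[pos]           # skip compression methods

    extensions = []
    if pos < len(hello):            # very old clients send no extensions block
        end = pos + 2 + struct.unpack_from("!H", hello, pos)[0]
        pos += 2
        while pos < end:
            ext_type, ext_len = struct.unpack_from("!HH", hello, pos)
            extensions.append(ext_type)
            pos += 4 + ext_len
    return {"version": version, "cipher_suites": suites, "extensions": extensions}


def build_client_hello(version, cipher_suites, extensions):
    """Build a minimal synthetic ClientHello record (test input only)."""
    suites = b"".join(struct.pack("!H", s) for s in cipher_suites)
    exts = b"".join(struct.pack("!HH", t, len(d)) + d for t, d in extensions)
    hello = (struct.pack("!H", version) + bytes(32)      # version + all-zero random
             + b"\x00"                                   # empty session id
             + struct.pack("!H", len(suites)) + suites
             + b"\x01\x00"                               # one compression method: null
             + struct.pack("!H", len(exts)) + exts)
    handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
    return struct.pack("!BHH", 0x16, 0x0301, len(handshake)) + handshake
```

Every value returned here is visible to any on-path observer, which is what makes these fields usable as a fingerprint without decrypting anything.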

ETA (Cisco’s Encrypted Traffic Analytics) examines the ClientHello 
message, which holds many fingerprints that can help identify malware 
traffic. TLS metadata such as the ClientHello is not encrypted, because 
it is exchanged before the encrypted messages begin. This means 
Anderson’s model can analyze the unencrypted data with no knowledge of 
what is actually inside the encrypted messages. The model then 
classifies each flow as malware or benign.
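The snippet does not say how ETA encodes these ClientHello fields. One widely used, public way to condense them into a fingerprint is JA3, which joins the version, cipher suites, extensions, elliptic curves and point formats as decimal lists and takes the MD5 hash. The sketch below follows that recipe and skips the GREASE values reserved by RFC 8701. It is a separate technique, not Cisco’s model, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """True for the reserved GREASE values 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               curves: Iterable[int] = (), point_formats: Iterable[int] = ()) -> str:
    """Comma-separated fields, each a dash-separated list of decimal values."""
    def join(values: Iterable[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))
    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_fingerprint(*args, **kwargs) -> str:
    """MD5 hex digest of the JA3-style string."""
    return hashlib.md5(ja3_string(*args, **kwargs).encode("ascii")).hexdigest()
```

A fingerprint like this only identifies the client software that built the ClientHello; a classifier would still need to learn which fingerprints, alone or combined with other features, tend to come from malware.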

According to Anderson’s latest testing, this approach not only preserves 
user privacy by not breaking encryption, but tests of ETA against large 
samples of network data and malware also show promising accuracy. Using 
only NetFlow features, ETA 
catches malware about 67 percent of the time. When ETA is fed those 
NetFlow features with additional feature sets such as the Sequence of 
Packet Lengths (SPL), DNS, TLS metadata, HTTP and others, the accuracy 
rises to more than 99 percent.
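The snippet does not describe how these feature sets are combined. A minimal sketch, interpreting SPL as the sequence of packet lengths observed in a flow: bin the lengths, compute a first-order Markov transition matrix so that flows of any length yield a fixed-size vector, and append it to a few NetFlow-style counters. The bin width, bin count, feature layout and function names are all assumptions for illustration, not ETA’s actual encoding.

```python
from typing import List

BIN_WIDTH = 150   # bytes per length bin (assumed)
NUM_BINS = 10     # lengths of 1350 bytes or more share the last bin (assumed)


def spl_markov_features(lengths: List[int]) -> List[float]:
    """Flattened NUM_BINS x NUM_BINS transition-probability matrix of binned lengths."""
    bins = [min(n // BIN_WIDTH, NUM_BINS - 1) for n in lengths]
    counts = [[0] * NUM_BINS for _ in range(NUM_BINS)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1
    feats: List[float] = []
    for row in counts:
        total = sum(row)
        # rows with no outgoing transitions stay all-zero
        feats.extend(c / total if total else 0.0 for c in row)
    return feats


def flow_features(lengths: List[int], duration_s: float) -> List[float]:
    """NetFlow-style counters (packets, bytes, duration) followed by SPL features."""
    netflow = [float(len(lengths)), float(sum(lengths)), duration_s]
    return netflow + spl_markov_features(lengths)
```

The resulting fixed-length vectors can be fed to any standard classifier; the reported jump from 67 to over 99 percent suggests that most of the signal comes from the richer per-flow features rather than the NetFlow counters alone.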


