Machine Learning and Data Mining for Network Security

Pursuing basic and applied research on network security and intrusion detection, with a focus on applying statistical methods and machine learning techniques to massive data sets in order to measure and predict security incidents, identify precursors of malicious activity, and develop automated methods that assist with forensics, analysis, and remediation. Related work in authentication fraud and mobile malware detection/identification.

Related questions of interest:

  1. Are there 'natural laws' of network security? There is tantalizing evidence to suggest that -- at least at large scale -- network security incidents begin to take on some properties (memory and temporal clustering) more reminiscent of natural events than human-generated ones. Is this true? What physical/logical aspects of the network drive this behavior? How do we model them? Can we?

  2. How can computational complexity results be tied into network security? Accurate parsing of user-supplied inputs (e.g., to avoid SQL injection attacks) is well studied but poorly adopted. A surprisingly deep "underground" literature suggesting the ultimate impossibility of reliable virus detection also exists. Can we make "good enough" probabilistic recognizers for certain classes of grammars where the recognition problem is formally undecidable? Here "good enough" means, for example, a recognizer that errs on at most some (hopefully small) fraction of inputs with at least some (hopefully large) probability.

  3. What is algorithmically feasible for intrusion detection, particularly in the domain of deep packet inspection (DPI)? The amount of data to be inspected appears set to continue on its exponential trajectory, while the introduction of IPv6 has made DPI harder even for a single packet and has resurrected some classic IPv4 attacks in new IPv6-specific variants. How can we successfully drink from the proverbial firehose?
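
On question 1, one simple first check for temporal clustering is the index of dispersion (variance-to-mean ratio) of incident counts in fixed time windows: a memoryless Poisson process gives a ratio near 1, while bursty arrivals give a ratio well above 1. The Python sketch below is illustrative only. The function names, the synthetic event generators, and every parameter value are assumptions made for the example, not a description of any particular data set or method.

```python
import random
from statistics import mean, pvariance


def dispersion_index(timestamps, window, horizon):
    """Variance-to-mean ratio of event counts in windows covering [0, horizon)."""
    n_bins = int(horizon // window)
    counts = [0] * n_bins
    for t in timestamps:
        idx = int(t // window)
        if 0 <= idx < n_bins:  # ignore events outside the observation period
            counts[idx] += 1
    m = mean(counts)
    if m == 0:
        raise ValueError("no events in the observation period")
    return pvariance(counts) / m


def poisson_events(rate, horizon, rng):
    """Synthetic memoryless arrivals: exponential gaps with the given rate."""
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return events
        events.append(t)


def bursty_events(burst_rate, burst_size, spread, horizon, rng):
    """Synthetic clustered arrivals: Poisson burst starts, each followed by
    burst_size events scattered uniformly over the next `spread` time units."""
    events = []
    for start in poisson_events(burst_rate, horizon, rng):
        for _ in range(burst_size):
            t = start + rng.uniform(0.0, spread)
            if t < horizon:
                events.append(t)
    return sorted(events)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the comparison is repeatable
    horizon, window = 10_000.0, 10.0
    memoryless = poisson_events(1.0, horizon, rng)
    clustered = bursty_events(0.1, 10, 1.0, horizon, rng)
    print("memoryless:", dispersion_index(memoryless, window, horizon))
    print("clustered: ", dispersion_index(clustered, window, horizon))
```

Both synthetic streams have the same average rate, so any difference in the ratio reflects clustering rather than volume. A ratio above 1 is only a symptom: it does not distinguish long memory from simple batching, which would need further analysis (for example, how the ratio scales with the window size).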