Analyzing network flows is challenging both because of the complexity of the interactions they capture and because of the sheer volume of data that can be collected from routers and monitors in large networks. Hadoop is a popular parallel processing framework that is widely used for working with large datasets. However, there is little published information about how to use Hadoop effectively on NetFlow datasets. Research publications typically present the results of work built on top of Hadoop rather than shed light on effective ways to use the framework itself. In this paper we take a first step toward closing that gap. We identify the basic tasks that make up any exploratory analysis of a NetFlow dataset, describe their realization in the Hadoop framework, and characterize their performance in two commonly used Hadoop deployments.
Authors:
Xiaofeng Zhou, Milenko Petrovic, Tom Eskridge, Marco Carvalho, Xi Tao
Bibtex:
@inproceedings{zhou2014exploring, author = "Xiaofeng Zhou and Milenko Petrovic and Tom Eskridge and Marco Carvalho and Xi Tao", title = "Exploring Netflow Data using Hadoop", booktitle = "Proceedings of the Third ASE International Conference on Cyber Security", year = "2014" }