Packet analysis – or information about traffic coming into a network – has been the focus of most big data analyses related to cybersecurity, particularly in the federal government. Whether looking at the volume of traffic, where it is coming from, or where it is going, agencies have traditionally viewed cybersecurity threats from the outside in.
But insider threats have emerged as a great concern for government agencies, and they often go beyond whistleblowers or deliberate employee attacks. In many cases, innocent behaviors can be just as threatening. Even simple actions like employees connecting to unsecured networks, clicking on bad links, or hosting sensitive information in public places can pose a significant threat to an agency.
The time has come for agencies to view cybersecurity from the inside out. Instead of focusing only on packets, agencies also need to focus on people and, more specifically, their behaviors. Insider threats, whether aggressive or accidental, pose a real risk to federal agencies. Equipping each agency with a behavioral analytics toolkit should be the first line of defense.
Behavioral analytics to predict threats
Behavioral analytics aim to understand the interaction between people and systems. They tell us who is doing what, when and from where, and can help us predict potential threats. For example, Suzy in accounting may suddenly be visiting a website 100 times a day that she had never visited before. A behavioral analytics solution would flag this as unusual behavior that should be explored further to uncover a would-be threat.
When a potential threat is observed, confirming or denying it involves a combination of manual and programmatic analyses of new data combined with historic data. The manual effort often consists of analysts interactively slicing and dicing data to learn more about the potential threat. Each potential threat will be unique, and new questions will be asked each time, which requires getting instant answers from the data.
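The Suzy example can be made concrete with a small baseline check. The following is only a minimal sketch in plain Python, not the method of any particular product: the function name, the site names, the 20-visit cutoff for never-before-seen sites and the z-score threshold of 3 are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def flag_unusual_visits(history, today, min_visits=20, z_threshold=3.0):
    """Compare today's visit counts against each user's own baseline.

    history: dict mapping (user, site) -> list of past daily visit counts
    today:   dict mapping (user, site) -> today's visit count
    Returns a list of (user, site, count, reason) tuples worth a closer look.
    All thresholds are hypothetical defaults, not recommended values.
    """
    alerts = []
    for (user, site), count in today.items():
        past = history.get((user, site), [])
        if not past or sum(past) == 0:
            # No baseline at all: only flag if the volume is high.
            if count >= min_visits:
                alerts.append((user, site, count, "new site, high volume"))
            continue
        avg = mean(past)
        # Floor the spread at 1.0 so a perfectly flat history
        # does not turn a tiny change into a huge z-score.
        spread = max(pstdev(past), 1.0)
        if (count - avg) / spread >= z_threshold:
            alerts.append((user, site, count, "well above baseline"))
    return alerts

# Hypothetical data: Suzy's normal intranet use plus a site she has never visited.
history = {("suzy", "intranet.example"): [12, 15, 11, 14, 13]}
today = {
    ("suzy", "intranet.example"): 14,
    ("suzy", "unknown-site.example"): 100,
}
alerts = flag_unusual_visits(history, today)
for alert in alerts:
    print(alert)
```

In this sketch Suzy's routine intranet use passes quietly, while the 100 visits to a site with no history are surfaced for an analyst to investigate. A flag here is a prompt for further review, not a verdict.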
Apply this concept to an entire agency and the massive amounts of data that it has, and it becomes a big data challenge. For example, the EINSTEIN project — an intrusion detection system that monitors the network gateways of federal departments and agencies for unauthorized traffic — deals with petabytes of data from multiple agencies. The depth and scope of these datasets can be a challenge as well, as agencies are looking at broad time frames and different data elements.
Luckily, there are several technologies that agencies can consider to leverage all of their data as part of a behavioral analytics solution.
Cluster computing – One tool that has gotten a lot of attention lately is Apache Spark, a cluster computing framework. Cluster computing speeds up data processing by spreading the work across many machines. This is important, especially for government agencies that need information now, not at the end of the day. Spark also facilitates streaming analytics: the ability to identify threats in real time by processing data as it comes in. Spark Streaming allows agencies to monitor streams of data to identify events that may be possible threats. CISOs like Spark because it is easy to use and works well with visualization tools, allowing them to see different modes of data.
Machine learning – Machine learning algorithms can draw insights, identify patterns and continually improve the accuracy of their insights based on massive, historic and dynamically changing data. This makes predictive analytics possible. MLlib is a popular open-source machine learning library that is part of the Apache Spark project.
Business intelligence – Business intelligence involves data mining, online analytical processing, querying and reporting. Business intelligence tools have been in use for decades, but recent innovations that take advantage of today’s big data platforms are rejuvenating their relevance and impact. Impala is an open-source SQL query engine that business intelligence tools can connect to, allowing them to analyze broader, deeper datasets on Hadoop.
Storage – Data is deep and wide in government agencies, and storing all of it is no easy task. The Hadoop ecosystem of open-source projects provides a cost-efficient way to store massive quantities of sparse data, while also offering capabilities such as compression, in-memory operations and real-time, large-scale input and output.
Governance and security – Governance and security controls are meant to ensure that only those given proper authority can access and manipulate data. There are a number of tools within the Hadoop ecosystem that have been purpose-built to address these needs:
- Apache HBase provides random, real-time read/write access to large data sets, including cell-based access control.
- Sentry is an independent security module that integrates with the open-source SQL query engines Apache Hive and Impala, delivering advanced authorization controls to enable multi-user applications and cross-functional processes for enterprise data sets.
- RecordService is a distributed, scalable, data-access service for unified access control and enforcement.
People before packets
Behavioral analytics have emerged as an impactful way for agencies to predict threats, particularly insider threats. The volume, variety and distributed nature of agencies’ data sets can make the application of behavioral analytics seem like an insurmountable challenge, but with the right tools, agencies can make data work for them.
Of course, each agency’s individual needs, experiences and current stage in the data journey will drive its adoption of a particular suite of tools. But what holds true across all agencies is that these open and flexible technologies are driving government innovation and can help agencies employ behavioral analytics to thwart insider threats.
Rob Morrow is senior systems engineer for Cloudera. Previously, he was the engineered systems architect, Oracle National Security Group, and principal systems engineer at Oracle.