Collateral Damage in the Cloud

So you're hanging out minding your own business when suddenly you get hit by a car. After the investigation, it turns out that the car was actually aiming for the person next to you. Only you're the one who is dead.

I was participating in a Cloud Computing Security workgroup run by ENISA this week, and the concept of collateral damage came up.

Cloud computing derives its benefit from efficient resource allocation: multiple parties share the same pool of resources, since each of them needs only some subset of the maximum available capacity at any given time.

Regardless of the architecture, there will be a point where resources (CPU, network, storage, application, etc.) are shared, and an attack against those resources may affect multiple parties. In these scenarios, the risk associated with the other parties sharing those resources must be considered.
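To make that a bit more concrete, here is a minimal Python sketch of collateral exposure. It assumes (and this is a big assumption) that each co-tenant is attacked independently with the same probability per period, and that an attack on a neighbor reaches you with some fixed "spillover" probability. The function name and both probabilities are hypothetical inputs you would have to estimate for yourself.

```python
def p_collateral(p_target: float, neighbors: int, p_spillover: float) -> float:
    """Chance that at least one attack on a co-tenant spills over to you in a period.

    Assumes each of `neighbors` co-tenants is attacked independently with
    probability `p_target`, and each such attack reaches you with probability
    `p_spillover`. Both are hypothetical inputs, not measured values.
    """
    p_per_neighbor = p_target * p_spillover
    return 1 - (1 - p_per_neighbor) ** neighbors


# Hypothetical: 50 neighbors, each with a 1% chance of being attacked.
for spill in (1.0, 0.1, 0.01):
    print(f"spillover={spill:<5} -> P(collateral) = {p_collateral(0.01, 50, spill):.3f}")
```

With worst-case spillover, the chance that something hits you through a neighbor comes out just under 40%, and it shrinks roughly in proportion to the spillover probability when that probability is small. The independence assumption is the weak spot: a single VM escape would hit many tenants at once.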

Some examples of collateral damage:

  • An attacker acquires a virtual machine through commercial means or compromise and executes a previously unknown VM escape attack, or an attack against the storage subsystem.
  • An attacker finds a SQL injection vulnerability against a SaaS provider.
  • An attacker operating from resources you share is identified, and those resources are confiscated by government authorities.

It is not clear that these risks are particularly high, and if you are extremely risk tolerant it is unlikely that they will impact you at all (it is the resource with the

Security Measurements Illustrated

Once again, Richard Bejtlich at TaoSecurity takes quantitative measures to task:

"Calculating risk" or "measuring ROI/ROSI" are all subjective jokes.
 


Remarkably, after making this statement, he follows it with:

It doesn't matter how much you spend on security (inputs) if the organization is horribly compromised (outputs).

The irony of this superficial take on security spending (and its "if," perhaps the biggest weak probability statement in the security profession) highlights the nature of the problem and gives us an opportunity to see the value of applying some quantitative measures. Some of you may agree with the statement, but if I say, "Okay, I am going to spend a trillion dollars on it," you would likely call that a ludicrous amount. It might even make you mad, even though I wasn't the one who said "it doesn't matter."

Here's the point: someone lost in ambiguity-land can say "it doesn't matter how much you spend" and not really mean it, while someone with quantitative measures would quickly be shown the door at $1 trillion, $1 billion, $100 million... until some point that is considered "reasonable" within the context of the situation.
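Here is a minimal Python sketch of that escalation, using one common formulation of return on security investment (ROSI): loss avoided minus cost, divided by cost. The expected annual loss and mitigation ratio are hypothetical numbers made up for the example; the point is only that once numbers are on the table, each budget can be judged.

```python
def rosi(expected_annual_loss: float, mitigation_ratio: float, cost: float) -> float:
    """Return on security investment: (loss avoided - cost) / cost."""
    loss_avoided = expected_annual_loss * mitigation_ratio
    return (loss_avoided - cost) / cost


# Hypothetical inputs for illustration only.
ALE = 50_000_000   # expected annual loss without the spend, in dollars
MITIGATION = 0.8   # fraction of that loss the spend is assumed to prevent

for budget in (1e12, 1e9, 1e8, 1e7, 1e6):
    r = rosi(ALE, MITIGATION, budget)
    verdict = "arguably reasonable" if r > 0 else "shown the door"
    print(f"${budget:>17,.0f}  ROSI={r:>9.2f}  {verdict}")
```

Holding the mitigation ratio fixed regardless of spend is a simplification; in practice each additional dollar buys less, which only makes the big budgets look worse.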

In short, folks who use quantitative measures can be judged. The numbers may not be accurate, but everyone understands them and can apply their own value judgments to the approach.

On the other hand, using subjective "expert" opinion, one can establish much more wiggle-room. I think of this as a cop-out.

No doubt, this stuff is hard. It will never be perfect. But it is ultimately more beneficial to an enterprise than existing guesswork, if only to get past the subjective joke of suggesting "it doesn't matter how much you spend on security." Clearly, it matters.

Security and "Healthy"

Last week, Hoff pointed to a presentation by Mark Masterson on cloud security. Given Hoff's level of enthusiasm, I was underwhelmed by the content, but sometimes that can happen.

An excerpt:

(slide) What's the cloud got to do with this? (slide) It increases the complexity of the overall system. (slide) Makes an existing problem more urgent.


My takeaways from Masterson's deck were:

  1. You can't use the risk formula to "prove" that a complex system is "secure."
  2. Cloud computing increases complexity, so you can use it (the formula) to prove it even less.
  3. Somehow, this proves that people who support "defense in depth" are wrong.
  4. You are better off "finding ways to design systems that cope gracefully with uncertainty." (This, of course, is also unprovable.)
  5. We need to stop thinking in terms of "security" and start thinking in terms of "health."
  6. The Cloud will be just as "safe" (or "healthy") as you already are.

To be honest, I think there are HUGE holes in Masterson's logic (I still can't decide whether the first part was intended to be logical or simply analogical). That said, I agree with a lot of what he is saying, which boils down to a call for de-perimeterization and component-based security supported by autonomic computing.

I suppose my biggest criticism is that Masterson talks about "healthy" as if everyone agrees on what's healthy (and, analogously, what is "safe"). I don't trust/believe this by a long shot. Take, for example, this snippet from a recent NY Times article on health:

No study, these critics say, has ever proved a causal relationship between moderate drinking and lower risk of death -- only that the two often go together. It may be that moderate drinking is just something healthy people tend to do, not something that makes people healthy.


(This little snippet was my original inspiration for writing this post, though it sort of went in a different direction.)

Ultimately, I think people are much better off using the risk formula as a model to consider how things will change in their cloud environment - exposure to more users (sources of threats), potential for more components (attack surface), and effect on the consequences.
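The post doesn't spell out the risk formula, so this sketch assumes the common multiplicative form, risk = threat × vulnerability × consequence, and models a move to the cloud as a multiplier on each factor. The multipliers below are hypothetical placeholders, not measurements.

```python
from math import prod

# Hypothetical multipliers describing how each factor changes when moving to
# the cloud (1.0 = no change). These are placeholders, not measurements.
CHANGES = {
    "threat (more users / sources of threats)": 3.0,
    "vulnerability (more components / attack surface)": 1.5,
    "consequence (effect on the consequences)": 1.2,
}


def relative_risk(changes: dict[str, float]) -> float:
    """Relative change in risk, assuming risk = threat x vulnerability x consequence."""
    return prod(changes.values())


print(f"Cloud risk relative to today: {relative_risk(CHANGES):.2f}x")
```

A factor below 1.0 (say, better configuration control at a SaaS provider) pulls the total down, which is exactly the kind of trade-off the model is meant to surface.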

Cost-Benefit vs. Cost-Effectiveness

Dans Geer and Conway have their new "For Good Measure" column up where they deprecate cost-benefit in favor of cost-effectiveness. It is a great column to learn about cost-effectiveness. The part that leverages true/false positives/negatives is particularly useful to folks trying to work out effectiveness of any controls. They also indirectly show a process that can be used to calculate ROI in security.
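For anyone who wants to play with the true/false positive reasoning, here is a minimal Python sketch of two tests run in sequence, where the second (expensive) test is applied only to what the first (cheap) test flags. The population, sensitivities, specificities, and per-item costs are all hypothetical, NOT the figures from the column, and the names are mine.

```python
from dataclasses import dataclass


@dataclass
class Screen:
    name: str
    sensitivity: float    # P(flagged | flawed): true positive rate
    specificity: float    # P(not flagged | clean): true negative rate
    cost_per_item: float  # dollars to run the test on one item


def run(screen: Screen, flawed: float, clean: float) -> tuple[float, float, float]:
    """Apply a test to a population; return (true positives, false positives, cost)."""
    true_pos = flawed * screen.sensitivity
    false_pos = clean * (1 - screen.specificity)
    cost = (flawed + clean) * screen.cost_per_item
    return true_pos, false_pos, cost


def run_in_sequence(first: Screen, second: Screen, flawed: float, clean: float):
    """Run `first` on everything, then `second` only on what `first` flagged."""
    tp1, fp1, cost1 = run(first, flawed, clean)
    tp2, fp2, cost2 = run(second, tp1, fp1)
    return tp2, fp2, cost1 + cost2


# Hypothetical population and tests (NOT the figures from the column).
FLAWED, CLEAN = 1_000, 9_999_000
cheap = Screen("cheap scan", sensitivity=0.95, specificity=0.90, cost_per_item=0.01)
costly = Screen("expert review", sensitivity=0.95, specificity=0.999, cost_per_item=10.0)

tp, fp, cost = run_in_sequence(cheap, costly, FLAWED, CLEAN)
print(f"found {tp:.0f} of {FLAWED} flaws, {fp:.0f} false positives, "
      f"${cost:,.0f} total, ${cost / tp:,.2f} per flaw found")
```

With these made-up parameters, the cheap scan alone leaves roughly a million false positives, the expensive review alone would cost $100,000,000, and the sequence finds about 90% of the flaws at roughly $11,200 apiece, which is the same shape of result the column describes.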

The only piece I take issue with is the initial assertion that cost-benefit is worthless. This, of course, is flawed. If you are operating in the interests of your enterprise, you can't opt-out of cost-benefit, you can only obscure it. Luckily, the meaning detective is at work to show you the light ;-).

The article shows its true colors in the last paragraph of the first page:

The first test, if used alone, would leave you with nearly a million false positives—too many to fix; the second test, if used alone, would cost you $100,000,000—completely unaffordable; but used together and in that order, you find 90% of the flaws for US$11,233.34 apiece.


Do you see all the value statements in the paragraph? The phrase "too many," the word "unaffordable," and the implicit assumption that $11,233.34 (note the impact of perceptual contrast) must somehow be "affordable" all suggest that the cost is not worth it in the first two cases and clearly worth it in the final one. So we have a rudimentary confidence interval of expected value between the "unaffordable" $100,000,000 and an "affordable" $10,000,000 (the $11k times 900 vulns).
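The arithmetic behind that bracket is easy to check. A small Python sketch using Decimal so the dollar amounts stay exact; the only inputs are the figures quoted above.

```python
from decimal import Decimal

# Figures from the quoted paragraph and the estimate above.
COST_PER_FLAW = Decimal("11233.34")   # "US$11,233.34 apiece"
FLAWS_FOUND = 900                     # 90% of the flaws, i.e. the 900 vulns
UNAFFORDABLE = Decimal("100000000")   # the second test used alone

affordable = COST_PER_FLAW * FLAWS_FOUND
print(f"'affordable' total:   ${affordable:,.2f}")
print(f"'unaffordable' total: ${UNAFFORDABLE:,.2f}")
print(f"ratio: {UNAFFORDABLE / affordable:.1f}x")
```

The two endpoints differ by roughly a factor of ten, and somewhere in that range sits the authors' implicit threshold for "worth it."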

A more detailed analysis would factor in risk to make the full value judgment, and there we have the makings of value which can be used in a cost-benefit analysis.

It is clear that the authors are working from experience to assess what is "worth it" or not. That is what we all do. But we shouldn't ignore the fact that we are making value judgments all the time, and we can (should, and in many cases must) translate them with a bit more rigor, lest we end up spending too much in the first place.

Security and Risk in the Cloud, ongoing...

There has been a lot of discussion recently about whether the cloud is the same or different. Most of the time, these chocolate-peanut butter (tastes great - less filling?) arguments in tech involve different levels of granularity in thought. On the one hand, everything is different, but on the other, everything is the same. So everybody is right... and wrong.

I can understand Schneier's point that the aggregation of computing resources (timesharing -> cloud) is similar, but I think he completely misses the changes in architecture and appears to ignore the changes in risk.

From an architecture perspective, the big difference between cloud and timesharing is the entire notion of "loosely-coupled" that dominates service-oriented architecture. In the early days of the mainframe, architectures were closed and security was dominated by menu-based access control. This worked fine given the threat profile of the time.

Today, the flexibility provided by "loosely-coupled" systems and open networks increases the attack surface dramatically (note that we have gotten used to this over the past 20/30/40 years but it is a significant change compared with timesharing). Add standards to the equation and we have a number of dots that are connected in delivering any service, where each dot is a computing component but also an attack point.
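One way to picture the "dots" is as a dependency graph: every component reachable from the point where a user enters is a potential attack point, and every link between components is another interface to attack. A minimal Python sketch follows; both topologies and all component names are hypothetical and deliberately simplified.

```python
from collections import deque


def attack_points(graph: dict[str, list[str]], entry: str) -> set[str]:
    """Breadth-first search: every component reachable from `entry`."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def connections(graph: dict[str, list[str]], nodes: set[str]) -> int:
    """Count the outgoing links from the reachable components."""
    return sum(len(graph.get(src, [])) for src in nodes)


# Hypothetical topologies for illustration only.
timesharing = {"terminal": ["mainframe"]}
loosely_coupled = {
    "browser": ["web_frontend"],
    "web_frontend": ["auth_service", "order_service"],
    "auth_service": ["directory"],
    "order_service": ["inventory_service", "payment_api", "message_queue"],
    "message_queue": ["billing_service"],
}

for name, graph, entry in [("timesharing", timesharing, "terminal"),
                           ("loosely coupled", loosely_coupled, "browser")]:
    nodes = attack_points(graph, entry)
    print(f"{name}: {len(nodes)} components, {connections(graph, nodes)} connections")
```

Even this toy service has several times the components and connections of the timesharing case, and real deployments are far larger.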

On the risk front, the cloud has the potential to be significantly different from what you are used to. On the SaaS side, the value proposition to attackers is significantly higher if they can compromise a single application and potentially gain access to the data of many organizations. Of course, with SaaS there are also significant opportunities to enhance controls over configuration and patch management.

On the PaaS and IaaS side, you can't ignore your neighbors and what they might be doing; the computing they do may affect your own installation. Consider the attacker who purchases his own compute VM and therefore has a higher incentive to escape the VM and get to the hypervisor level. Or, on the business side, consider the possibility of being co-located with an illegal VM and having your compute resources confiscated by authorities as they sort through the illegal activity.

As usual, a full assessment involves understanding the relative changes as you consider the cloud: ask yourself what changes for your programs, data, and users.

R.I.P. Peter Bernstein

I just found out that Peter Bernstein died last week. I can think of no other book that has influenced me more in my career in risk management than his "Against the Gods: The Remarkable Story of Risk." Not only did it help me realize that risk is a function of bad events (i.e., process matters indirectly; the percentage of bad transactions drives risk), but it also introduced me to Kahneman and Tversky's Prospect Theory and the entire field of behavioral economics.

For that, I thank him.

(hat tip: the Curious Capitalist)

The Cloud's Pay-per-use Model

A while ago I posted my take on "Defining the Cloud" that was mostly tongue in cheek, but I also made a comment about the NIST definition of cloud:

And I have to say that I really, really HATE the idea that the "pay-per-use model" is considered some sort of seminal part of all this...


In the comments, "Jeff" asked:

    "Can you elaborate as to why you hate the "pay per use," portion of the definition?"

And I replied:

Yes, mostly because my position is that cloud should focus on technical architecture and ppu is a pricing model. If the "cloud" thrives (which I am not convinced it will) then demand will go up, scalability will take over, and all the providers will convert to a subscription service (people like ppu when they are dabbling and subscription when they become heavy users).

So I see it as limiting* and unnecessary.

*This might seem strange since I am railing on the all-encompassing nature of the cloud, but at this stage the buzz is too strong and the cloud has lost all meaning.
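The "dabbling vs. heavy user" point is really just a break-even calculation. A minimal Python sketch, with made-up prices that are not any provider's actual rates; the function names are hypothetical.

```python
def ppu_cost(units: float, rate_per_unit: float) -> float:
    """Pay-per-use: you pay only for what you consume."""
    return units * rate_per_unit


def breakeven_units(rate_per_unit: float, subscription_fee: float) -> float:
    """Usage level at which pay-per-use costs the same as the flat subscription."""
    return subscription_fee / rate_per_unit


def cheaper_plan(units: float, rate_per_unit: float, subscription_fee: float) -> str:
    """Pick whichever pricing model costs less at this level of usage."""
    return "pay-per-use" if ppu_cost(units, rate_per_unit) < subscription_fee else "subscription"


# Hypothetical pricing: $0.10 per compute-hour vs. a flat $500 per month.
RATE, FEE = 0.10, 500.0
print(f"break-even: {breakeven_units(RATE, FEE):,.0f} hours/month")
for hours in (100, 2_000, 10_000):   # dabbler, moderate user, heavy user
    print(f"{hours:>6} hours -> {cheaper_plan(hours, RATE, FEE)}")
```

Below the break-even point the metered price wins; above it, the flat fee does. That is why I expect heavy users to gravitate toward subscriptions, and why the pricing model seems like a poor thing to bake into a definition.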

I was recently researching the Eucalyptus open-source private cloud offering and found this:

Myth #4: "Clouds only provide 'pay-as-you-go' access."

One of the most attractive features of the public clouds is that they allow users to change their resource usage dynamically in response to customer demand or offered load, and to pay only for the resources being used from moment to moment. While this type of charging is an important feature, it is by no means the only method a cloud can and should support. In particular, if an allocation is to be shared among several users within a single organization, it may make more sense to offer a maximum resource quota on a subscription basis to keep conflicting resource needs from causing confusion. If multiple users are to share the VMs within a single allocation, enabling all of them to acquire and release resources dynamically (possible resources in use by other users of the allocation) can lead to chaos.

This is an excellent illustration of my point - putting something so arbitrary in a definition welcomes the opportunity to ignore the definition entirely (heck, there are 20+ to choose from).

Recession-ready Acquisitions: The Strong will Survive

Well, from a mergers/acquisitions standpoint, 2009 has been fairly robust for security companies. Perhaps there are fire sales going on, or perhaps the strongest companies are flexing their muscles. More likely a combination of both...

Here's what I show so far:

Month     Acquirer              Target
January   Computer Associates   Orchestria
January   Grisoft / AVG         Sana Security
January   Riverbed              Mazu
January   RIM                   Certicom
February  Netezza               Tizor Systems
February  netForensics          High Tower (assets)
February  Trustwave             Mirage Networks
February  Versata               Alterpoint
April     LogLogic              Exaprotect
April     Marshal8e6            Avinti
April     Symantec              Mi5 Networks
April     Thoma Bravo           Entrust
April     Trend Micro           Third Brigade
May       McAfee                Solidcore
May       QinetiQ               Cyveillance
May       EMC                   ConfigureSoft


Have I missed any 2009 acquisitions? (BTW, send me a note if you want my Security M&A spreadsheet that lists almost 250 acquisitions).

A Few "Favorite" Security Metrics - RSA 2009 Edition

I moderated a "Security Metrics Exchange" peer-to-peer roundtable at RSA 2009. Here is the abstract:

"Many metrics sessions never actually get to metrics. In this p2p, we will discuss real-world metrics in use. To participate, you must bring your own 'Top 5' metrics and be ready to discuss their value proposition and use cases. We will interactively evaluate the metrics and everyone will leave with the group's list of the most useful metrics in today's enterprise security environments."


The goal was a simple one. There is clearly no overarching structure to the resulting metrics, but I think it is useful to see which metrics were top-of-mind. Here are the ones we came up with (and discussed) during the session:

  • % of Managers that Certify User Roles
  • Number of Password Resets
  • Cost per Password Reset
  • Number of Failed Logins
  • Number of Stale Accounts
  • Dead but Active Accounts
  • Login Spoof Attempts
  • Avg Time to Provision
  • Number of Privileged User Accounts
  • Number of VPN Connections per Week
  • User Account Growth Rate
  • % Systems Patched {OS; Platform use}
  • % Bandwidth Change
  • Number of Viruses
  • Avg Time to Recover
  • Avg Vulns per Host
  • Number of Applications
  • Number of Ports Open
  • Number of Servers
  • Number of Desktops
  • % of Systems with AV Installed {turned on; up-to-date}
  • Number of Exploitable Vulns
  • % of Security-related Defects
  • Avg Time to Find Vulns
  • Incidents per Month

I am not a fan of some of these metrics, but they are interesting nonetheless. I hope to find time to analyze them further in the future.
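For anyone wondering how a few of these fall out of ordinary asset data, here is a minimal Python sketch. The Host record and the inventory are hypothetical; in practice the fields would come from your patch, AV, and vulnerability-scanning tools.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    patched: bool
    av_installed: bool
    av_up_to_date: bool
    vulns: int
    exploitable_vulns: int


def pct(numerator: int, denominator: int) -> float:
    """Percentage, guarding against an empty inventory."""
    return 100.0 * numerator / denominator if denominator else 0.0


def summarize(hosts: list[Host]) -> dict[str, float]:
    n = len(hosts)
    return {
        "% Systems Patched": pct(sum(h.patched for h in hosts), n),
        "% of Systems with AV Installed": pct(sum(h.av_installed for h in hosts), n),
        "% of Systems with AV Up-to-date": pct(
            sum(h.av_installed and h.av_up_to_date for h in hosts), n),
        "Avg Vulns per Host": sum(h.vulns for h in hosts) / n if n else 0.0,
        "Number of Exploitable Vulns": sum(h.exploitable_vulns for h in hosts),
    }


# Hypothetical inventory for illustration only.
inventory = [
    Host("web01", True, True, True, 3, 1),
    Host("db01", False, True, False, 7, 2),
    Host("hr-laptop", True, False, False, 2, 0),
    Host("build01", True, True, True, 0, 0),
]

for metric, value in summarize(inventory).items():
    print(f"{metric}: {value:g}")
```

Note how easily the counts become percentages and averages; the harder part, and the reason I am lukewarm on some of these, is deciding what a given value should mean for a decision.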


Is PCI Working?

I was just starting to look for PCI numbers to follow up on my post about the Verizon DBIR Report and PCI Compliance when, lo and behold, they came down from the heavens (thanks, New School!). So now we know that there are 362, 702, and 2,634 Level 1, 2, and 3 merchants, respectively, and essentially all of them are PCI certified.

In my previous post, I wrote:

So, for example, if 19 PCI-compliant companies were breached out of, say 2 million, then that is a pretty good effectiveness ratio. And if we compared it to 81 non-PCI companies out of, say 1 million, then it would be an interesting point in favor of PCI. Of course, since we are working hypotheticals here it is easy to imagine a scenario that is exactly the opposite here.


We don't have Level 4 merchant numbers*, which some accounts I've read suggest number in the millions, but it seems unlikely to me that Verizon would be called into shops that small (this is a bigger assumption than I like, but there it is nonetheless). At least we can create a better example with the numbers. Of the 90 companies Verizon worked at, 17 (19%) claimed to be PCI compliant. Using the Level 1-3 numbers (362 + 702 + 2,634 = 3,698, or about 3,700), that means 17 out of roughly 3,700 PCI-compliant companies were compromised, for a success rate of 99.54%.

We can now suggest that PCI is "working" if the success rate for non-PCI-compliant companies is lower than 99.54%. We don't have a good way to determine a comparable population of companies in this group, but we can find the equivalent population size and infer from there. To have the same success rate, the remaining 73 cases (90 - 17) must come from a comparable group of about 16,000 non-PCI-compliant companies (73 × 3,698 / 17 ≈ 15,880).

Here is where it would help to have a better sense for the number of companies at various revenue levels, but I don't have quick access to them, so you have to decide for yourself whether the comparable population is greater than 16,000, in which case PCI is not working, or less than 16,000, in which case PCI may be working.  
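Here is the same arithmetic as a small Python sketch, so anyone who wants to plug in their own estimate of the comparable population can. The merchant counts and case numbers are the ones above; the function names are mine, and the model inherits every caveat in the footnote.

```python
# Merchant counts from the post; Level 4 is excluded (see the caveats).
LEVEL_1, LEVEL_2, LEVEL_3 = 362, 702, 2634
COMPLIANT_POP = LEVEL_1 + LEVEL_2 + LEVEL_3           # 3,698, "about 3,700"

CASES = 90                                             # companies Verizon worked at
COMPLIANT_BREACHED = 17
NONCOMPLIANT_BREACHED = CASES - COMPLIANT_BREACHED     # the remaining 73


def success_rate(breached: int, population: float) -> float:
    """Share of a population that was NOT breached."""
    return 1 - breached / population


def breakeven_population() -> float:
    """Non-compliant population size giving the same success rate as the compliant group."""
    return NONCOMPLIANT_BREACHED * COMPLIANT_POP / COMPLIANT_BREACHED


def pci_verdict(noncompliant_pop: float) -> str:
    """Compare success rates for a guessed size of the comparable non-compliant group."""
    compliant = success_rate(COMPLIANT_BREACHED, COMPLIANT_POP)
    other = success_rate(NONCOMPLIANT_BREACHED, noncompliant_pop)
    return "PCI may be working" if other < compliant else "PCI is not working"


print(f"compliant success rate: {success_rate(COMPLIANT_BREACHED, COMPLIANT_POP):.2%}")
print(f"break-even non-compliant population: {breakeven_population():,.0f}")
```

Feed pci_verdict() your own estimate of the comparable non-compliant population; anything above roughly 16,000 tips the answer toward "not working."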

Anyone want to offer an opinion?


* There are a lot of caveats here, but this is just a thought exercise anyway.