As an industry, we're much better at looking at code than at design when it comes to security. You can run all the tools you want and you'll find security bugs, but you won't uncover any design bugs: those are the hardest to fix.
At a very high level, threat modeling is about identifying threats, mitigating them, and moving on. A couple of years ago, the threat modeling guidance we were giving was pretty lousy; you couldn't use it much unless you were a security expert. That's what we've been focusing on fixing over the last few years.
When the Windows Mobile team did their first threat model, they had to answer a question: is a stolen mobile device still in the threat model? If the bad guy has the device, are you still trying to mitigate? They decided to put up a fight and include it.
For Windows Server 2003, we asked the team to think about an admin browsing on a Domain Controller. If you compromise a DC, you can compromise the whole environment. The question always comes down to mobile code, and we decided to disable all mobile code.
Finally, you need to define who your users are. If your user is an IT guy rather than a mother, they're different people. When you present a mitigation to a user, will they understand it?
You have to model your application; we use Data Flow Diagrams (DFDs) for that. You model the data, which is where most of the security issues come from these days.
Most whiteboard architectures look like DFDs. [EDIT: Won't reproduce the diagrams, the process is defined on Larry's blog.]
When DFDs talk about processes, they don't mean .exe files; a process is whatever processes data.
You need to define trust boundaries in your DFD. You need to decide whether two processes, when they talk to each other, trust each other. In a DFD this is explicit.
For Vista we decided to add the ability to run some processes with lower privileges; for example IE runs in protected mode, and there's a trust boundary between IE and the OS. A process boundary may sit between a system process and the user, since you don't trust data coming from the user. Finally, a kernel transition is also a trust boundary.
There are several types of DFDs. Context diagrams give a high-level view. In Level 0, you dig into more detail. I've rarely seen anything much deeper than Level 1; if you go deeper, you're probably in analysis paralysis.
One feature that never made it into Vista was the Castle project: it was where workgroups meet domains. In a domain, the DC authenticates you. At home, you have several computers but no domain. If you have 5 computers and no DC, how do you synchronize files, passwords, etc. across machines? So we came up with the idea of a castle: a machine responsible for synchronizing the passwords.
Local user ---> Castle service -/-/-> Remote castle
The Level 0 DFD contains many services. The shell runs as the user and communicates with the castle service: there's a trust boundary.
The castle service runs as SYSTEM because it has to manipulate the SAM. Configuration data in the registry is accessed by the castle service, but there's no trust boundary there. And finally, there's the machine boundary.
Purists would say that a service should be modeled as a complex process and described at deeper levels, but if it all runs within the same process boundary, a simple process is good enough.
Each DFD element is susceptible to certain kinds of threats: STRIDE. [EDIT: again, Larry has the description. Saves me some typing.]
Example of repudiation: my brother-in-law lied when sending a check. He said it was sent; I thought he was lying. I got proof when the check arrived, because checks are all post-marked. You use a disinterested and trusted third party.
Each element in the DFD is subject to attack, and how you attack it depends on the element type: you attack data stores differently than you attack processes. How do you protect these elements?
You need to determine threats. You have primary threats and secondary threats.
If you have an external entity (a user, for example), it's subject to spoofing. It's also subject to repudiation. Data flows and data stores can all be tampered with. The vast majority of DoS attacks are against processes, rarely against data stores. Finally, processes are subject to all of the STRIDE attacks.
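The element-to-threat mapping above can be sketched as a simple lookup table. A minimal sketch, assuming the standard SDL element/threat chart for the full sets (the inclusion of information disclosure on flows and stores comes from that chart, not just the sentences above; the names are illustrative):

```python
# STRIDE threats that apply to each DFD element type:
# S=Spoofing, T=Tampering, R=Repudiation, I=Information disclosure,
# D=Denial of service, E=Elevation of privilege.
STRIDE_BY_ELEMENT = {
    "external_entity": {"S", "R"},
    "data_flow":       {"T", "I", "D"},
    "data_store":      {"T", "I", "D"},  # DoS against stores is rare in practice
    "process":         {"S", "T", "R", "I", "D", "E"},  # all of STRIDE
}

def primary_threats(element_type):
    """Look up which STRIDE threats to consider for a DFD element."""
    return STRIDE_BY_ELEMENT[element_type]
```

Walking every element of the DFD through this table is what turns the diagram into a list of primary threats to analyze.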
Secondary threats come from threat trees. They've been around for a long time. They're a bunch of pre-conditions that can lead to a threat. Unless you're a security expert, you don't know what the pre-conditions are. We did away with ad-hoc threat trees and work with canonical threats.
For every primary threat, we build the threat tree. At the root of the tree you have the primary threat. For example, for spoofing a user there are 4 ways of doing it. For each way, there are several reasons and several causes, which lead you to the leaf nodes of the tree. You can ask very interesting questions about the leaf nodes. [EDIT: Example about user spoofing].
An example: analyzing a bank log-in system. The connection is over SSL, and the password has to be strong. However, the cookie was predictable: it was i+1 for each new authentication. Suddenly, by predicting the cookie, you can hijack someone else's session.
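The i+1 cookie flaw is easy to demonstrate. A minimal sketch contrasting the predictable scheme with an unguessable one (the function names are made up for illustration; `secrets` is Python's CSPRNG module):

```python
import itertools
import secrets

# The broken scheme from the example: each authentication hands out the
# previous session cookie plus one, so any logged-in attacker can hijack
# a neighboring session just by trying i+1, i-1, ...
_counter = itertools.count(1000)

def predictable_cookie():
    return str(next(_counter))

# The fix: draw the cookie from a cryptographically secure RNG, so seeing
# your own cookie tells you nothing about anyone else's.
def random_cookie():
    return secrets.token_urlsafe(32)  # ~256 bits of entropy
```

With the predictable scheme, an attacker who logs in and receives cookie 1042 knows 1041 and 1043 are almost certainly live sessions; with the random one, guessing is infeasible even though SSL and strong passwords never entered into it.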
The SDL book has all the threat trees, and all the questions you can ask about the tree nodes.
Every single information disclosure threat can become a privacy issue. If PII is involved, you can get a lawsuit. Any data store can be attacked.
Now you need to calculate the risk of each of these threats. We used to use numbers to calculate risk, but they're always wrong. Plus, the more degrees of freedom you have in the calculation, the more easily you can fudge the numbers so you don't have to fix the bugs. The more numbers, the easier that is.
We used to use DREAD. I don't like it. Even David posted a weblog about it.
Calculate risk with heuristics, with simple rules of thumb: we remind you of that with the MSRC bulletin rankings. The problem with numbers is that if the number falls under some level, subjectivity permits you not to fix the bug, whereas with the MSRC rankings you have no way of doing that. You ask: is it remote or local, does it crash the machine, is it a server or a client product? The different levels are all described in the book.
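That kind of heuristic bug bar can be sketched as a decision ladder. This is a toy illustration built only from the three questions above; the real levels and criteria are the ones in the SDL book, not these made-up thresholds:

```python
def bug_bar(remote: bool, crashes_machine: bool, server_product: bool) -> str:
    """Illustrative severity ladder in the spirit of the MSRC-style
    questions above; the actual criteria live in the SDL book."""
    if remote and server_product:
        return "Critical"    # remotely reachable on a server: no debate
    if remote or crashes_machine:
        return "Important"
    return "Moderate"
```

The point of a ladder like this is that yes/no answers place the bug in a level directly; there's no score for anyone to fudge below the fix-it line.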
Now you need to mitigate the threats. You have a bunch of options: leave it as-is (it's low priority, trying to fix it may delay the product, etc.); remove the feature, like we did with Castle; remedy it with a technology countermeasure; or warn users. But I don't believe in the prompt: you push the security of the system onto the user instead of taking ownership of the problem. It's not a good solution if it's your only mitigation.
Each STRIDE threat can be mitigated. For spoofing, you have authentication, and the authentication technologies you use differ: Amazon.com uses SSL, which gives them encryption, authentication, and integrity. For tampering, you have integrity: digital signatures, hashes, etc. A repudiation threat is mitigated with non-repudiation: trusted third parties, etc. For information disclosure, you have confidentiality, like ACLs. For DoS, you have availability. For elevation of privilege, you have authorization.
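That threat-to-mitigation pairing boils down to one small table; a sketch, using only the categories and example technologies mentioned above:

```python
# Mitigation category for each STRIDE threat, with the example
# technologies from the talk.
MITIGATIONS = {
    "Spoofing":               "Authentication (e.g. SSL)",
    "Tampering":              "Integrity (digital signatures, hashes)",
    "Repudiation":            "Non-repudiation (trusted third parties)",
    "Information disclosure": "Confidentiality (ACLs)",
    "Denial of service":      "Availability",
    "Elevation of privilege": "Authorization",
}
```

Once every threat in the model points at one of these categories (or at an explicit accept/remove decision), the threat model is done.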
[EDIT: Won't take notes on the castle example, so I can enjoy it. Maybe Michael could publish that on his blog as an example!]