Ethics versus economics for security research

Independent security researchers often have a reputation as narcissistic vulnerability pimps (true or not), but the environment that has evolved around information security largely drives this behavior. This came to a head for me tonight in a Twitter discussion kicked off by Steve Werby.

Creating an exploit can pay anywhere from 1k to 100k (or possibly more in specific circumstances), depending on the researcher’s choice of market and product (or technology). This even affects areas that many users believe are unrelated, like mobile OS jailbreaks, which essentially consist of exploits that gain root control despite the operating system’s best efforts to the contrary.

No equivalent market exists for threat-related research. Freelance malware analysts don’t have similar economic drivers because organizations with an interest in this information generally do the research themselves. You can’t monetize malware or attribution the same way. Put another way, nobody believes that Krebs and Danchev get rich from what they do. I don’t think we can “fix” this with the market, although I’d welcome discussion of ideas or evidence to the contrary. But we need to recognize this when thinking about issues around software security and threat identification.

Believing that security, on its own, adds value often turns into a form of the broken windows fallacy. And creating artificial demand for threat intelligence could lead to all sorts of perverse incentives. Some of the same organizations interested in purchasing vulnerabilities and exploits might have an interest in highly-focused intelligence, such as espionage on particular threat groups, but at this point the line between “offense” and “defense” becomes very fuzzy.

I’d love to hear alternate viewpoints and suggestions on where this can go.

Pizza with a bad taste: BHEK intel

I got some spam today that made me hungry (even after eating real spam so many times as a kid).

You've just ordered pizza from our site

[snipped yummy but long listing of pizzas and drinks including crappy beer]

If you haven’t made the order and it’s a fraud case, please follow the link and cancel the order.
CANCEL ORDER NOW!

If you don’t do that shortly, the order will be confirmed and delivered to you.

With Respect
AZZO`s Pizzeria

Since I wasn’t worried about the fraud possibility, I ignored the spam and instead took the opportunity to run the URL through thug, a low-interaction honeyclient. It performed spectacularly well: it grabbed the page, found at least some of the exploits, and kept everything neat, orderly, and secure.

hxxp://sweety-angel[.]de/local.htm redirects to hxxp://gimalayad[.]ru:8080/forum/links/column.php, which loaded a Java applet, a Flash file, and two PDF documents. When I ran them through VirusTotal, none of the files had been seen before, but a few engines identified the PDFs and the Flash file as part of the Black Hole Exploit Kit. I found the use of old (2010-vintage) Adobe Reader vulnerabilities a little humorous. Contact me via Twitter or email if you’d like the actual files. I published the IOCs as a Google Doc for reference.
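The URLs above are written in “defanged” form (hxxp instead of http, [.] instead of the dots in the hostname) so nobody clicks them by accident and no mail client turns them into live links. Feeding them to a tool like thug means reversing that first. Here is a minimal Python sketch that handles only the two conventions this post uses; other styles such as `[:]` or `(dot)` are out of scope, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def defang(url):
    """Defang a URL the way this post does: hxxp scheme, [.] in the hostname only."""
    parts = urlsplit(url)
    scheme = parts.scheme.replace("t", "x")    # http -> hxxp, https -> hxxps
    netloc = parts.netloc.replace(".", "[.]")  # path dots (local.htm) stay as-is
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))


def refang(indicator):
    """Reverse the two conventions above so tools can consume the URL."""
    restored = re.sub(r"^hxxp", "http", indicator, flags=re.IGNORECASE)
    return restored.replace("[.]", ".")


print(refang("hxxp://sweety-angel[.]de/local.htm"))  # http://sweety-angel.de/local.htm
```

Refanging should happen only at the last moment, inside the analysis environment; anything that gets shared or published stays defanged.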

Brain dump of DFIR and network security research ideas

Maybe I could get more of these done with this.


I’ve seen several people talk about lacking ideas for research projects, often around DFIR or network security. Personally, I have the opposite problem: endless ideas for projects, often with the barest hint of a start, but not enough time to pursue them all. So I thought I’d publish a bit of a brain dump. I have actually made good progress on a few of these, and I have concrete plans for others (beyond just “wouldn’t it be cool if…”), but in any case I’d love to see other people pick them up and run with them.

If you do happen to get interested in any of the following, I wouldn’t mind a quick note so we can touch base about possibilities for collaboration, or at least an acknowledgement in whatever you publish. Don’t interpret that as any sort of requirement, though; ideas have no value without execution, and for these, the hard work hasn’t even begun.

  • Malware
    • Classification across a large corpus
    • Automated IOC extraction and publication
  • Threat Actors
    • Profiling systems, particularly based on OSINT
    • Underanalyzed crime groups (e.g. drug cartels’ involvement in malware, spam, and fraud)
    • Hacktivism motivations and methods
  • Passwords
    • Cracking lab setups
    • Useful entropy calculations
  • Quantitative analysis of incidents
    • DDoS attacks (hard to get numbers on these)
    • Defacements and low-level leaks
  • Active Defense
    • Honeypots and honeyclients
    • Vocabulary or taxonomy on various methods
    • Callback Trojans in documents
    • C2 / RAT vulnerability research
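On “useful entropy calculations”: the usual back-of-the-envelope figure is length × log2(pool size), which assumes every character was chosen uniformly at random. Human-chosen passwords violate that assumption badly, so the number is at best an upper bound, and the gap between it and reality is exactly the research question. A minimal Python sketch of the naive estimate, plus an empirical Shannon figure for comparison; both are standard formulas, while the function names and the four character pools are assumptions.

```python
import math
import string
from collections import Counter

# Assumed character pools for the naive estimate.
POOLS = [string.ascii_lowercase, string.ascii_uppercase, string.digits, string.punctuation]


def pool_size(password):
    """Sum the sizes of the pools the password draws from; count other chars individually."""
    size = sum(len(pool) for pool in POOLS if any(c in pool for c in password))
    others = {c for c in password if not any(c in pool for pool in POOLS)}
    return size + len(others)


def naive_entropy_bits(password):
    """length * log2(pool size): only meaningful if every character is uniformly random."""
    n = pool_size(password)
    return len(password) * math.log2(n) if n else 0.0


def shannon_bits(password):
    """Empirical Shannon entropy of the characters actually used, times length."""
    if not password:
        return 0.0
    total = len(password)
    per_char = -sum((k / total) * math.log2(k / total) for k in Counter(password).values())
    return per_char * total


print(round(naive_entropy_bits("Password1!"), 1))  # 65.5
```

The naive formula credits “Password1!” with about 65.5 bits, even though a capitalized dictionary word plus a digit and a symbol is a pattern that dictionary-plus-rules cracking targets early; neither number captures that.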

Maltrieve: retrieving malware for research

As I continued to hack on mwcrawler over the last month, I found that it didn’t really meet my needs for various reasons: slowness, difficulty of maintaining and adding sources, repeated grabbing of the same URL, and lack of response from the original author. So I’ve rewritten it and released Maltrieve, which (as the name indicates) retrieves malware directly from the sources listed at a number of sites. Improvements listed in the README include:

  • Proxy support
  • Multithreading for improved performance
  • Logging of source URLs
  • Multiple user agent support
  • Better error handling

Right now, Maltrieve only looks at four meta-sources because two of the six in mwcrawler appear to be offline. But I have at least four more on deck, and mwcrawler didn’t parse all of its meta-sources correctly in any case. I also know of a few bugs that I haven’t figured out how to squash yet, but the core functionality works and it needs a broader audience to bang on it. Thus, I’ve tagged this version “beta-1”. Please don’t rely on it for serious production use.
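The README features map onto fairly standard Python plumbing. The following is a minimal sketch, not Maltrieve’s actual code: it assumes you already have a list of candidate URLs scraped from the meta-sources, and the function names, user-agent strings, proxy address, and MD5-based file naming are illustrative choices. It covers de-duplicating URLs, proxy support, rotating user agents, a thread pool, error handling, and logging each file’s source URL.

```python
import concurrent.futures
import hashlib
import logging
import os
import random
import urllib.request

# Hypothetical pool of user agents; rotating them makes requests look less uniform.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20100101 Firefox/17.0",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
]


def unique_urls(urls):
    """Drop repeated URLs (keeping first-seen order) so each is fetched once."""
    seen = set()
    result = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


def build_opener(proxy=None):
    """Route HTTP(S) through an optional proxy, e.g. "http://127.0.0.1:8118"."""
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    return urllib.request.build_opener(*handlers)


def fetch(opener, url, dump_dir, timeout=15):
    """Download one sample, save it under its MD5, and log where it came from."""
    request = urllib.request.Request(url, headers={"User-Agent": random.choice(USER_AGENTS)})
    try:
        with opener.open(request, timeout=timeout) as response:
            data = response.read()
    except Exception as exc:  # dead links are normal for malware URL lists
        logging.warning("failed to fetch %s: %s", url, exc)
        return None
    digest = hashlib.md5(data).hexdigest()
    with open(os.path.join(dump_dir, digest), "wb") as fh:
        fh.write(data)
    logging.info("saved %s from %s", digest, url)  # source URL metadata
    return digest


def retrieve_all(urls, dump_dir, proxy=None, workers=8):
    """Fetch unique URLs concurrently; return the MD5s of the files saved."""
    os.makedirs(dump_dir, exist_ok=True)
    opener = build_opener(proxy)
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda u: fetch(opener, u, dump_dir), unique_urls(urls)))
    return [digest for digest in results if digest]
```

Something like `retrieve_all(urls, "samples", proxy="http://127.0.0.1:8118")` would fetch through a local proxy. Threads fit here because the work is almost entirely network I/O, so Python’s GIL isn’t the bottleneck.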

If you use it, please let me know just so I can bask in the warm glow of productivity. The project itself remains under the GPL, of course. Suggestions, bug reports, etc. also would make me happy, whether via issues and pull requests on Github, contacting me on Twitter, or comments here.

Konig: malware, graph theory, and fuzzy hashes

As a small personal research and learning project, I spent a few hours this weekend writing Konig. This is intended to evolve into a framework for investigating relationships between fuzzy hashes (e.g. a corpus of malware gathered with mwcrawler) using graph-theoretical methods. Underneath, it basically just marries NetworkX and ssdeep.

At the moment, the code is fairly barebones: create the hash library based on files in a particular directory, then construct a graph of the relationships between those files where the similarity meets or exceeds a user-specified threshold. Also, please keep in mind that my Twitter bio for a while just said “I write bad code”, and for good reason: I do. The GUI consists purely of a matplotlib window and needs a lot of work. (I have less experience with interfaces than with almost anything else, so keep your expectations even lower.) I’ve added some very basic information on the properties of the graph (order, density, etc.), as well as the ability to select the connected component that includes a node (file) of interest.

Example output:

kmaxwell@gauss:~/src/konig$ python konig.py -d ~/data/mwcrawler/unsorted/PE32 -t 90 -i PE32.json
Loading saved hash database
Calculating fuzzy hashes for all files in /home/kmaxwell/data/mwcrawler/unsorted/PE32...
Creating graph structure for files with similarity >= 90...
Name:
Type: Graph
Number of nodes: 2932
Number of edges: 265625
Average degree: 181.1903
Graph density: 0.0618185990375
Preparing plot of graph structure...

Konig screenshot
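The core idea fits in a few lines. This sketch is not Konig’s code: to stay dependency-free it uses a plain dict of sets instead of NetworkX and takes the comparison function as a parameter. With the python-ssdeep bindings you would build `hashes` from `ssdeep.hash_from_file(path)` and pass `ssdeep.compare`, which returns a 0 to 100 match score; in NetworkX, `nx.density` and `nx.node_connected_component` cover the last two helpers. The formulas agree with the example output above: 2 × 265625 / 2932 ≈ 181.19 average degree, and 2 × 265625 / (2932 × 2931) ≈ 0.0618 density.

```python
from itertools import combinations


def build_similarity_graph(hashes, compare, threshold):
    """Undirected graph as {name: set(neighbors)}; an edge joins two files
    whose similarity score is >= threshold. Compares every pair (O(n^2))."""
    graph = {name: set() for name in hashes}
    for a, b in combinations(hashes, 2):
        if compare(hashes[a], hashes[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def graph_stats(graph):
    """Order N, size E, average degree 2E/N, and density 2E/(N(N-1))."""
    n = len(graph)
    e = sum(len(neighbors) for neighbors in graph.values()) // 2
    return {
        "order": n,
        "size": e,
        "average_degree": 2 * e / n if n else 0.0,
        "density": 2 * e / (n * (n - 1)) if n > 1 else 0.0,
    }


def component_of(graph, start):
    """All files reachable from start, i.e. its connected component (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        for neighbor in graph[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen
```

The all-pairs loop is why this naive approach gets expensive quickly: 2932 files already means roughly 4.3 million comparisons.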

The goals here include refreshing my knowledge of graph theory, as the last time I seriously studied this stuff, I think the OJ Simpson verdict hadn’t come back. Also, this code will help pave the way for some related work I have slated to use mwcrawler and vxcage together. In fact, I really think of Konig as a proof-of-concept implementation to throw away before doing something more useful and robust.

Getting into the guts of mwcrawler

Earlier this week, my buddy Ken Pryor mentioned a project I hadn’t come across before.

So I went over and dug into mwcrawler. From the project README:

mwcrawler is a simple python script that parses malicious url lists from well-known websites (i.e. MDL, Malc0de) in order to automatically download the malicious code. It can be used to populate malware repositories or zoos.

It turns out that it really is pretty simple and hackish, which fits my needs perfectly. This is all a very experimental side project just to keep me amused during the (relatively) cold weather here in Texas.

Given how much I already love Github, I forked the project, then made a few improvements to allow the use of a proxy (for OPSEC reasons) and to specify a dump directory from the command line. Requiring the user to modify source just to change config options works fine for an alpha, but a little bit of polish goes a long way. I’ve also started implementing some logging to keep metadata (like the source URL for each file). And yes, I’ve submitted pull requests, but neither mine nor the user agent randomization patch from Ben Jackson has gotten any response from the project owner. Hopefully that will change now that the holidays have finally run their course.

Once I had all this data, I wanted to do something with it. Just for messing around, I went with the old standby of ssdeep to find relationships. That doesn’t mean it’s a final step at all; this weekend, for example, I’ll run the samples through the VirusTotal API to classify known samples by hash, and perhaps also incorporate something like pyew for clustered analysis to pull out interesting features. mwcrawler also features integration with thug, which I haven’t started using yet. Some bugs still exist, like unhandled exceptions when the script can’t reach a page, and a dependence on the semi-deprecated Beautiful Soup 3.

But my current tiny little repository includes 227 MB in 344 PE32 executables (not counting other file types like archives and such). As an extremely simple preview, even basic fuzzy hashing as mentioned above creates some interesting clusters (graph generated with awk and Maltego):

mwcrawler-ssdeep
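The awk step isn’t shown here, but a rough Python equivalent is easy to sketch. It assumes ssdeep’s matching mode (for example `ssdeep -r -d <dir>`) prints one line per pair in the form `A matches B (score)`; check that against your ssdeep version. The output is a CSV edge list with a header row, which a graph tool such as Maltego should be able to import as a table. The function names and sample file names are hypothetical.

```python
import csv
import io
import re

# Assumed line format from ssdeep's matching mode: "<file A> matches <file B> (<score>)"
MATCH_LINE = re.compile(r"^(?P<a>.+) matches (?P<b>.+) \((?P<score>\d+)\)$")


def parse_matches(text, min_score=0):
    """Extract (file A, file B, score) edges, keeping scores >= min_score."""
    edges = []
    for line in text.splitlines():
        match = MATCH_LINE.match(line.strip())
        if not match:
            continue  # blank lines or output in some other format
        score = int(match.group("score"))
        if score >= min_score:
            edges.append((match.group("a"), match.group("b"), score))
    return edges


def edges_to_csv(edges):
    """Render edges as CSV with a header row, ready for a table import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["source", "target", "score"])
    writer.writerows(edges)
    return buffer.getvalue()


sample = "a.exe matches b.exe (88)\nc.exe matches d.exe (40)\n"
print(edges_to_csv(parse_matches(sample, min_score=50)))
```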

Book Review: Challenges in Intelligence Analysis

I have always believed in the value of interdisciplinary studies. Specifically, I like to examine approaches taken in superficially dissimilar fields whose underlying problems or solutions, on closer examination, connect strongly to the ones I work on. For example, nearly 10 years ago I read Level 4: Virus Hunters of the CDC and found a number of useful lessons for combating malware outbreaks and dealing with large-scale incidents.

More recently, my interest has turned to applying lessons from intelligence analysis. This isn’t much of a reach, truthfully, because those of us working in infosec (“cyberintelligence”) frequently do the same work as those in military intelligence and related agencies. As part of this effort, I recently finished reading Challenges in Intelligence Analysis by Timothy Walton (ISBN 0521132657). Out of all the books I’ve read recently on intelligence, this offered perhaps the most direct application in any number of fields (including mine). I read the Kindle edition, so I can’t say much about the quality of the printing, readability of the text, or appearance of the figures.

The structure makes it particularly straightforward to read. After the initial chapters dealing with challenges and solutions in somewhat general and abstract terms, Walton runs through nearly 40 case studies ranging from the Israelite spies in Canaan (as recounted in the Book of Numbers, chapter 13) to George Washington to the pre-WWII Luftwaffe to Aldrich Ames to Aum Shinrikyo. Beyond the history lessons, each case study examines the intelligence analysis techniques used and discusses what might have improved on the approach. The “Questions for Further Thought” suit classroom settings or readers who want to take the time to structure their own responses. Each case also has a recommended reading list, which I find particularly useful because a number of historical cases have striking parallels in current situations (beyond their own intellectual appeal).

For example, Chapter 10 “Estimating the Strength of the Luftwaffe in the 1930s” immediately resonated with me in thinking about challenges regarding ‘cyberwar’ with China and understanding their strengths. The same challenge would apply in looking at the US, I’d think. And Chapter 17 “Counterinsurgency in Malaya” has a number of connections to the US’s recent conflicts in Iraq and Afghanistan, something not lost on General David Petraeus and Lieutenant General James Amos when they wrote the new Counterinsurgency Field Manual.

Several techniques appear frequently in the text, and the discussion isn’t limited to easily understood tools like timelines, flow charts, and matrices. Walton also reviews link and network analysis (particularly applicable in cyberintelligence), analysis of competing hypotheses, indicators (sound familiar?), and red teaming. The latter goes beyond a simple penetration test to emulate the tactics, techniques, and procedures of specific adversaries. Decision trees and especially scenario analysis also recur throughout the case studies. Cognitive biases play a significant role in the discussions as well, especially confirmation bias, groupthink, and (given the context of the book) even hindsight bias.

A few of the case studies seem a little rushed. Even where the historical record is thin, Walton doesn’t always take the opportunity to explore analysis techniques in greater detail. Relatedly, a few case studies seem a little forced (“Sun Tzu” has a lot to say about intelligence analysis, but he isn’t a case study per se). And I would have liked a little more explanation of why he recommends certain books for further reading, especially in the general (non-case-specific) list at the end of the book.

In general, I highly recommend this book to anyone with an interest in intelligence analysis, world history, or critical and analytical thinking.

A version of this review also appears on Amazon.

DFIR fundamentals with Mandiant updates


Chewbacha revisits the classics

Today, I had the opportunity to listen to the latest installment of Mandiant’s web series “Fresh Prints of Mal-ware”: The Nutts and Boltz of APT Persistence Mechanisms, hosted by Chris Nutt and Jason Rebholz. (The puns are strong with this one!)

The first part of this discussion covered some DFIR fundamentals, like building a file system timeline. This should include all eight NTFS timestamps: the four MACB times (modified, accessed, MFT entry changed, born) stored in both the $STANDARD_INFORMATION and $FILE_NAME attributes. Rather than just start “looking for evil,” the investigator needs to start with a question. My favorite, where applicable, is to look at all system activity around the time of whatever suspicious activity (e.g. network traffic) caused me to look at the system in the first place. A colleague also mentioned using Splunk for forensic timeline research. I haven’t used this technique myself, but the concept is solid.
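A minimal sketch of the “start with a question” approach: expand each file’s eight NTFS timestamps into timeline rows, then keep only the rows near a pivot time, such as the timestamp of the suspicious network traffic. It assumes an MFT parser has already extracted the timestamps; the `FileTimes` class, its field names, the sample path, and the 15-minute default window are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileTimes:
    """One file's NTFS times; each dict maps "M", "A", "C", "B" to a datetime."""
    path: str
    si: dict  # $STANDARD_INFORMATION attribute
    fn: dict  # $FILE_NAME attribute


def timeline_rows(files):
    """Flatten each file into one (timestamp, "SI:M"-style label, path) row per time."""
    rows = []
    for f in files:
        for attr, stamps in (("SI", f.si), ("FN", f.fn)):
            for kind, ts in stamps.items():
                rows.append((ts, f"{attr}:{kind}", f.path))
    return rows


def around(rows, pivot, window=timedelta(minutes=15)):
    """Rows within +/- window of the pivot event, in chronological order."""
    low, high = pivot - window, pivot + window
    return sorted(row for row in rows if low <= row[0] <= high)


pivot = datetime(2013, 1, 10, 14, 0)  # hypothetical time of the suspicious traffic
sample = FileTimes(r"C:\Windows\Temp\x.exe",
                   si={k: pivot - timedelta(minutes=2) for k in "MACB"},
                   fn={k: pivot - timedelta(days=30) for k in "MACB"})
for row in around(timeline_rows([sample]), pivot):
    print(row)  # only the four SI rows fall inside the window
```

Labeling each row with its attribute keeps disagreements between the $STANDARD_INFORMATION and $FILE_NAME sets visible instead of collapsing them into one “file time.”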

The second part discussed persistence mechanisms in more detail, like autoruns and their various locations. On Twitter, the #m_fp discussion pointed me to two resources, one from Silent Runners and another from Trusted Signal. The presenters also spent a good amount of time on DLL search order hijacking, which doesn’t get a lot of attention but which they’ve seen used by targeted (as opposed to opportunistic) malware.
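To see why search order hijacking works, it helps to model the lookup. This is a simplified sketch of the documented standard search order for desktop applications with SafeDllSearchMode enabled (the default): the application’s directory, the system directory, the 16-bit system directory, the Windows directory, the current directory, then the PATH directories. It deliberately ignores what Windows checks first (already-loaded modules, the KnownDLLs list, manifests and redirection) and APIs like `SetDllDirectory`; the function names, `foo.dll`, and the sample paths are hypothetical.

```python
import ntpath


def search_order(app_dir, system_dir, system16_dir, windows_dir, cwd, path_dirs):
    """Standard order with SafeDllSearchMode enabled (simplified; see above)."""
    return [app_dir, system_dir, system16_dir, windows_dir, cwd, *path_dirs]


def resolve(dll_name, dirs, existing):
    """First directory containing dll_name wins; `existing` holds lowercase full paths."""
    for directory in dirs:
        candidate = ntpath.join(directory, dll_name)
        if candidate.lower() in existing:  # Windows paths are case-insensitive
            return candidate
    return None


dirs = search_order(r"C:\App", r"C:\Windows\System32", r"C:\Windows\System",
                    r"C:\Windows", r"C:\Users\me", [r"C:\Tools"])
legit = {r"c:\windows\system32\foo.dll"}
print(resolve("foo.dll", dirs, legit))  # C:\Windows\System32\foo.dll
# An attacker who can write to the application's directory plants a same-named DLL:
print(resolve("foo.dll", dirs, legit | {r"c:\app\foo.dll"}))  # C:\App\foo.dll
```

The planted copy wins simply because the application directory is searched before System32, which is how a legitimate program can end up loading attacker code.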

I think this approach of revisiting fundamentals with a few new twists to keep things fresh works really well, and I hope to see more of this sort of thing from Mandiant (and whoever else!) in the future.

Threat intel sharing with OpenIOC

Indicator of Compromise by Kool-Aid Man

Mandiant recently announced OpenIOC, “an extensible XML schema that enables you to describe the technical characteristics that identify a known threat, an attacker’s methodology, or other evidence of compromise.” For example, you might have an IOC listing something as simple as a set of MD5 hashes and file names, or as complex as descriptors of the structure of a particular executable (PE) file. The schema includes terms for network indicators as well, like URIs, IP addresses, and strings in network traffic.
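To make the structure concrete, here is a trimmed-down IOC modeled on the OpenIOC 1.0 layout (an `ioc` root whose `definition` holds `Indicator` and `IndicatorItem` elements), plus a short parser using Python’s standard library. The hash and file name are placeholders, not real indicators, and a real IOC carries more metadata (authoring details, descriptions, ids on every element). The parser flattens the items and ignores the AND/OR logic of nested `Indicator` elements, so treat it as a starting point rather than a complete reader.

```python
import xml.etree.ElementTree as ET

# Placeholder IOC: values are illustrative, not real indicators.
SAMPLE_IOC = """<ioc xmlns="http://schemas.mandiant.com/2010/ioc" id="example">
  <short_description>Example dropper</short_description>
  <definition>
    <Indicator operator="OR" id="i-1">
      <IndicatorItem id="ii-1" condition="is">
        <Context document="FileItem" search="FileItem/Md5sum" type="mir"/>
        <Content type="md5">0123456789abcdef0123456789abcdef</Content>
      </IndicatorItem>
      <IndicatorItem id="ii-2" condition="is">
        <Context document="FileItem" search="FileItem/FileName" type="mir"/>
        <Content type="string">dropper.exe</Content>
      </IndicatorItem>
    </Indicator>
  </definition>
</ioc>"""


def local_name(tag):
    """Strip the "{namespace}" prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def indicator_items(xml_text):
    """Return (search term, condition, value) for every IndicatorItem."""
    items = []
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "IndicatorItem":
            continue
        children = {local_name(child.tag): child for child in element}
        context, content = children.get("Context"), children.get("Content")
        if context is not None and content is not None:
            items.append((context.get("search"), element.get("condition"),
                          (content.text or "").strip()))
    return items


for item in indicator_items(SAMPLE_IOC):
    print(item)
```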

Those of us who react to threats every day already know we need to get better at sharing threat intel and acting on it quickly. A number of industry and other organizations exist to help get these data out to folks who can use them, but the intel often comes in the form of a human-written report. This means that systems can’t parse the data easily, and the communication sometimes contains significant ambiguity. When systems and tools can’t parse the data, that not only introduces delays into the detection process but also makes validation difficult. So sometimes we get notified of malware with the MD5 sum “d41d8cd98f00b204e9800998ecf8427e” (the hash of empty, zero-length input), or of “http://google.com?webhp&hl=en”. Both of these have happened to me in the last few months, and while that’s simple human error, letting tools run some basic sanity checks would catch it.
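Both mistakes are easy for a machine to catch once indicators arrive in a parseable form. A minimal Python sketch of two such checks; the function names are hypothetical, and the known-benign domain list is a two-entry stand-in for whatever allowlist a team actually maintains.

```python
import hashlib
import re
from urllib.parse import urlsplit

EMPTY_MD5 = hashlib.md5(b"").hexdigest()  # d41d8cd98f00b204e9800998ecf8427e
MD5_HEX = re.compile(r"^[0-9a-f]{32}$")
# Hypothetical allowlist; a real one would be longer and locally maintained.
KNOWN_BENIGN_DOMAINS = {"google.com", "www.google.com"}


def check_md5(value):
    """Return a reason to distrust an MD5 indicator, or None if it looks usable."""
    digest = value.strip().lower()
    if not MD5_HEX.match(digest):
        return "not a valid MD5 hex digest"
    if digest == EMPTY_MD5:
        return "MD5 of empty input (zero-length file)"
    return None


def check_url(value):
    """Return a reason to distrust a URL indicator, or None if it looks usable."""
    host = (urlsplit(value.strip()).hostname or "").lower()
    if not host:
        return "no hostname found"
    if host in KNOWN_BENIGN_DOMAINS:
        return "known-benign domain"
    return None


print(check_md5("d41d8cd98f00b204e9800998ecf8427e"))  # MD5 of empty input (zero-length file)
print(check_url("http://google.com?webhp&hl=en"))     # known-benign domain
```

Checks like these belong on both ends: the producer can reject obvious errors before publishing, and the consumer can quarantine suspicious indicators instead of alerting on them.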

This, of course, exposes OpenIOC’s weakness: a classic chicken-and-egg problem. The XML files don’t serve much purpose until tools can read them, but at the moment the only tools that can read them come from Mandiant: their commercial enterprise product MIR and the no-cost IOC Finder. (Note that, while OpenIOC is released under the Apache 2 license and therefore qualifies as ‘free software’, the same does not hold true for IOC Finder.)

For OpenIOC to work well, we need more tools and responders to support it. That could start with freely available tools like Splunk, the Sleuth Kit, and Snort, but I’d like to see large commercial tools like ArcSight, EnCase, and Sourcefire incorporate it as well. This applies as much to producing IOCs as it does to consuming them, by the way: if FireEye’s malware detection and analysis tools could export an IOC, detection across the network would become much more straightforward. But Mandiant, as much as I love many of the people who work there, has something of a not-invented-here (NIH) problem: they like to blaze new trails and do cool new stuff, but working with other vendors has always seemed to stymie them, as far as I can tell. Hopefully Doug Wilson, the new point man on OpenIOC, can turn that around.

OpenIOC can solve a key problem, but we will see whether anybody actually uses it to do so.

Theory versus practice: threat-centrism

I currently work in a threat-centric role, in the sense that we detect and respond to threats as they occur. We handle malware, log analysis, and network & system forensics. So I use “threat” in a concrete sense: bits that represent the actions of outside parties who may do harm to our enterprise.

At the same time, many security roles (including an opening I’m considering at my company) sit on an “information security architecture” team. These roles often handle vulnerability assessment, data leakage prevention, and general issues of design, planning, and policy. Note that the incident response team usually exists separately from architecture, which is where I have to make some private assessments.

I’ve started following Greg Pendergast’s advice by “assessing, to the extent possible, whether you could make this new position your own by working in the threat-centric aspects.”

This concept strikes me as really interesting: how do we work real threat data into architecture? This differs in important ways from threat modelling, in which we design systems to counter different possible threats. In theory, theory and practice are the same, but in practice, they’re completely different.

I’ve got some ideas of how that could work specifically in our enterprise, but generalized answers might be worth considering as well. For example, how do organizations handle the sharing, both inbound and outbound, of threat data? Who handles the overall architecture of security monitoring systems? What log data can you get that analysts may not even realize exists (or could exist)?

The ideas have started to flow and I look forward to seeing what happens next.