The cautionary saga of Log4Shell – Part 1

log4j is a popular Java logging component. Let's dive deep into the vulnerabilities that recently put it in the crosshairs.

2021 ended with a massive bang: the Log4Shell vulnerability in log4j, a very widely used Java logging component, compromised countless devices, servers, and cloud infrastructures, causing immeasurable damage and exposing 93% of all cloud environments to risk. Let’s take an in-depth look at how this all happened. But to understand Log4Shell, we first need to understand what made it so special.

If we look back at 2021, it was a rather harrowing year for software security: over 20,000 new vulnerabilities were discovered during the year, continuing the unfortunate sharp upward trend. Over 2,500 of these were rated critical, with a CVSS score of 9-10, which is usually an indicator of remote code execution (RCE) on the affected system. Of course, not all high-CVSS vulnerabilities imply code execution; some compromise security in other ways (for one example, check our writeup from last year about a critical SSRF vulnerability in Apache httpd).

And out of these 2500 critical vulnerabilities, one vulnerability from the end of the year stood above all others: CVE-2021-44228 or ‘Log4Shell’ (sometimes also called Logjam, though that can be confused with a certain cryptographic attack). But as we’ll see, this story is not just about this single vulnerability – in fact, it tells us a lot about attack surface management, secure software design, and defensive programming.

In this first article, we’ll deal with the vulnerability itself, and how it got into log4j. But stay tuned – in subsequent articles we’ll show why this was such a huge problem, and how one could exploit it.

UPDATE! You can already access the subsequent articles here: Part 2 and Part 3.

The birth of a (vulnerability) unicorn

Every so often, an RCE vulnerability will be a zero-day: exploitable by various attackers before anyone can patch their affected systems (or even before the developers have a chance to create such a patch). And out of all zero-day vulnerabilities, some will be in a widely used software component or even an operating system, exposing potentially millions of targets to the attackers, causing massive damage and making international headlines. There may be just a handful of these each year, but they’re all notorious enough to have actual names. Just a few examples: Shellshock (2014), Dirty COW (2016), EternalBlue leading to the infamous WannaCry and NotPetya ransomware (2017), Drupalgeddon 2 (2018), BlueKeep (2019), SMBGhost (2020)… and now Log4Shell (2021).

For contrast, consider Stagefright, a set of vulnerabilities from 2015 which allowed full remote compromise of Android phones by sending them a malicious MMS. It didn’t make this list because wide-scale exploitation just didn’t happen – probably because an exploit tool was only available 5 months after the vulnerability’s publication, and most affected systems were patched by then.

Log4Shell is on this list because it’s a critical RCE vulnerability in log4j, a widely used open-source component for Java. Even if a Java developer wasn’t using log4j directly, they were most probably using a component that did – and then all an attacker needed to do was to send the right input and force the program to log it (e.g. by triggering an error state).

Log4Shell was exploited in a wide variety of targets, from web applications to video games like Minecraft… and even reverse engineering tools like Ghidra. As we know from the Java installer splash screen, “3 billion devices run Java”. Now that’s a pretty big attack surface!

Image: Log4Shell logo, from https://www.lunasec.io/docs/blog/log4j-zero-day/

What if I told you that log4j had been vulnerable for 10 years?

The story starts with functionality that doesn’t sound too crazy: Lookups, a kind of templating that performs interpolation (basically: text substitution) in the strings of the configuration file. For example, it’s possible to add the current Java runtime version to the beginning of every log file by adding the following section to the configuration file:

<File name="Application" fileName="application.log">
  <PatternLayout header="${java:runtime} - ${java:vm} - ${java:os}">
    <Pattern>%d %m%n</Pattern>
  </PatternLayout>
</File>

Nothing out of the ordinary so far. At first glance this may raise the question of a malicious lookup string that forces log4j to read a sensitive file or environment variable and write it to a log file. But that is only an issue if the attacker can modify the log4j configuration file and read the log file. And if they can actually do that, we have much bigger problems anyway.
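To make the idea of a lookup more tangible, here is a minimal sketch of ${prefix:key} substitution in plain Java. It is a toy model and not log4j's implementation: the class LookupSketch, its interpolate method and the resolver map are our own hypothetical names, and the "sys" and "env" resolvers simply read system properties and environment variables.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of "${prefix:key}" lookups. NOT log4j code; all names are hypothetical.
class LookupSketch {
    // Matches ${prefix:key} where the key contains no braces.
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{([a-z]+):([^{}]+)\\}");

    // Replaces every lookup whose prefix has a resolver; anything else is left untouched.
    static String interpolate(String text, Map<String, Function<String, String>> resolvers) {
        Matcher m = LOOKUP.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Function<String, String> resolver = resolvers.get(m.group(1));
            String value = resolver == null ? null : resolver.apply(m.group(2));
            // quoteReplacement keeps '$' and '\' in the value from being read as regex syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? m.group() : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Function<String, String>> resolvers = new HashMap<>();
        resolvers.put("sys", key -> System.getProperty(key)); // system properties
        resolvers.put("env", key -> System.getenv(key));      // environment variables
        // The output depends on the machine, so no expected value is shown here.
        System.out.println(interpolate("Header: ${sys:java.version} on ${sys:os.name}", resolvers));
    }
}
```

As long as only the author of the configuration file chooses the text passed to interpolate, this is harmless. The trouble starts when somebody else gets to pick that text.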

But there’s a twist. When we try to find where this functionality is actually implemented in log4j, we find a class called StrSubstitutor (note that we intentionally linked the JavaDoc of the still vulnerable version here) that performs these text substitutions recursively. And even though the documentation only talks about text substitution in configuration files, quite a few methods in that class actually take LogEvent parameters… wait, are these substitutions also performed on log entries? Possibly even on data that comes directly from the user?! Yep, this seems to be the case.

At this point, alarm bells should be ringing like crazy, because this is a classic example of an injection vulnerability! If the attacker can get the program to log their input, they can potentially force the system to do something nefarious. Say our code called logger.error(evilUserInput), where the attacker had set evilUserInput to e.g. "${env:DB_PASSWORD}" (with DB_PASSWORD standing in for a hypothetical environment variable that holds database credentials): the credentials would end up in the log. This is clearly bad, even if the attacker (hopefully) can’t easily access the log file to read them out.
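The injection can be reproduced with a few lines of plain Java. The sketch below is a toy logger, not log4j: InjectionSketch, its fakeEnv map and the DB_PASSWORD entry are hypothetical names we made up for illustration. It runs a recursive substitution over the final log message, which is the behavior described above, so attacker-supplied text gets the same treatment as a configuration string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy logger that imitates the vulnerable behavior. NOT log4j; all names are hypothetical.
class InjectionSketch {
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{([a-z]+):([^{}]*)\\}");
    private static final int MAX_PASSES = 10; // guards against endless self-references

    private final Map<String, String> fakeEnv; // stands in for real environment variables
    final List<String> lines = new ArrayList<>(); // stands in for the log file

    InjectionSketch(Map<String, String> fakeEnv) {
        this.fakeEnv = fakeEnv;
    }

    // Resolves ${env:NAME} repeatedly, so values that contain lookups are expanded as well.
    String substitute(String text) {
        String current = text;
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            Matcher m = LOOKUP.matcher(current);
            StringBuilder out = new StringBuilder();
            boolean changed = false;
            while (m.find()) {
                String value = "env".equals(m.group(1)) ? fakeEnv.get(m.group(2)) : null;
                changed |= value != null;
                m.appendReplacement(out, Matcher.quoteReplacement(value == null ? m.group() : value));
            }
            m.appendTail(out);
            current = out.toString();
            if (!changed) {
                break; // nothing left to resolve
            }
        }
        return current;
    }

    // The flaw: the substitution is applied to the message, including any user input in it.
    void error(String message) {
        lines.add("ERROR " + substitute(message));
    }

    public static void main(String[] args) {
        InjectionSketch logger = new InjectionSketch(Map.of("DB_PASSWORD", "hunter2"));
        String evilUserInput = "${env:DB_PASSWORD}"; // chosen by the attacker
        logger.error("Login failed for user " + evilUserInput);
        System.out.println(logger.lines.get(0)); // prints: ERROR Login failed for user hunter2
    }
}
```

Note that the application code never asked for the password to be logged. The attacker's input did.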

What makes this even worse is that the standard defense against injection does not help here. That defense separates the context from the user-provided parameters, so that user data cannot change the context in which it is interpreted. Thus, logger.error("User " + userName + " did something bad") is obviously vulnerable code, but logger.error("User {} did something bad", userName) does not seem to be vulnerable, since we would expect log4j not to interpolate the string any further after substituting userName for the placeholder. But due to the recursive implementation of the interpolation in log4j, lookups were still evaluated after the placeholder substitution, so these two lines were actually equivalent. Consequently, even developers following the aforementioned secure coding best practices could end up with vulnerable code!

Even worse: attackers can use this to evade security software such as web application firewalls (WAF) and intrusion detection systems (IDS) whenever the payload can be split across several values that end up in the same log message. For example, logger.error("{}{}{}{}", "${ja", "va:run", "time", "}") will evaluate ${java:runtime} in the end, while an IDS that inspects the individual inputs will most probably not flag any of them as dangerous.
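Both effects are easy to see in a small model. The following sketch is again a toy and not log4j's code: PlaceholderSketch, fillPlaceholders, render and naiveFilterBlocks are hypothetical names, and the lookup table holds a made-up value for ${java:runtime}. The assumption it encodes is the one described above: placeholders are filled first, and lookups are evaluated on the finished message afterwards.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of "fill {} first, then evaluate lookups". NOT log4j; all names are hypothetical.
class PlaceholderSketch {
    private static final Pattern LOOKUP = Pattern.compile("\\$\\{([a-z]+):([^{}]*)\\}");

    // Step 1: replace each "{}" in the format string with the next argument, in order.
    static String fillPlaceholders(String format, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = format.indexOf("{}", from);
            if (at < 0) {
                break; // more arguments than placeholders
            }
            out.append(format, from, at).append(arg);
            from = at + 2;
        }
        return out.append(format.substring(from)).toString();
    }

    // Step 2: evaluate lookups on the finished message, wherever its parts came from.
    static String render(Map<String, String> lookups, String format, Object... args) {
        Matcher m = LOOKUP.matcher(fillPlaceholders(format, args));
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = lookups.get(m.group(1) + ":" + m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? m.group() : value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // A deliberately naive input filter of the kind a simple WAF or IDS rule might apply.
    static boolean naiveFilterBlocks(String input) {
        return input.contains("${java:");
    }

    public static void main(String[] args) {
        Map<String, String> lookups = Map.of("java:runtime", "toy-runtime-1.0"); // made-up value
        String userName = "${java:runtime}"; // chosen by the attacker

        // Concatenation and parameterization produce the same evaluated message.
        System.out.println(render(lookups, "User " + userName + " did something bad"));
        System.out.println(render(lookups, "User {} did something bad", userName));

        // The split payload passes the naive filter piece by piece, yet is still evaluated.
        String[] pieces = {"${ja", "va:run", "time", "}"};
        for (String piece : pieces) {
            System.out.println(piece + " blocked? " + naiveFilterBlocks(piece));
        }
        System.out.println(render(lookups, "{}{}{}{}", (Object[]) pieces));
    }
}
```

In this model the parameterized call offers no protection at all, because the separation between format string and parameters is gone by the time the lookups are evaluated.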

And when was this text substitution function enabled for log entries? By the time the vulnerability was discovered in 2021, it had been there for over 10 years: it was added on October 19, 2011, even before the release of log4j 2.0.0. If you’re interested in a deep-dive on the origins of the vulnerability and how it flew under the radar for a decade, this article is a great read.

The bottom line is: we had a very bad vulnerability in log4j, a commonly used logging component. Attackers could force our logging code to do something nasty and dangerous simply by making the application log something they pick. They can create such a log entry by intentionally triggering an error situation with their input containing a malicious lookup string that could “hurt” log4j. This input would be logged, and then their malicious lookup string would get interpolated.

As shown above, the vulnerability is a classic example of injection; it can be considered a type of template injection, to be more specific. Best practices against injection are well-known at this point, along with cheat sheets and guidelines – but the design of the text interpolation in log4j actively worked against some of these practices. Injection is one of the most common secure coding mistakes and we discuss that topic in literally all of our courses; check them out in our catalog.

In the next part of the article, we’ll show how one could exploit this vulnerability. As you will see, it’s not trivial, but it isn’t rocket science either.