09.29.14
2013 Winners
There were over 40 submissions this last year, which was a lot to judge (especially since some submitters do not provide any hints about their bugs!) It also resulted in a lot of really great hacks, too many to simply list as a few runners up. This summary is considerably longer than in previous years.
As you will remember (click the "THIS YEAR" tab if you don’t,) the object of the 2013 Underhanded C contest was to write a function to compute the DERPCON — Degrees of Edge-Reachable Personal CONnection — between two users in a social network. Essentially the programming task is to find the shortest distance between two users in a friendship graph.
The malicious goal is to write the code so that it mistakenly outputs a low DERPCON value from your account to others, granting you unwarranted access to people who have not "friended" you. Extra points were awarded for asymmetric bugs, meaning that DERPCON(you,me)==1 while DERPCON(me,you)>1; for bugs that can be triggered without insider access to the database; bugs that are relatively platform-independent; bugs that are spiteful or humorous; and of course bugs that are very well hidden, since the name of the game is creating an error that passes visual inspection of source.
Common Themes
A number of attacks fell into a few broad categories, that I summarize below:
Cause weird behavior if an attacker is his or her own BFF
In this attack, being your own buddy causes a naive graph search to terminate. Engineering this sort of misbehavior requires a bit of thinking, but there are apparently several ways to accomplish the same thing.
Unless I missed a few, this was used by submitters Tomas Bouda, Kieran Elby, Christoph Franck, and Corentin Plouet.
Weird behavior when UID is out of range
A userID of zero, or less than zero, or above some #defined maximum (naughty!) can be exploited in many clever ways.
One submitter, Alex Bassas, wrote the following function to match users:
int same_users (user *x,user *y) {
if (x->user_ID>=0) {
if (x->user_ID == y->user_ID) {
return (TRUE);
}
return (FALSE);
}
}
Do you see the bug? It looks like an invalid userID for x causes same_users to return FALSE, but the final return command is one line too high. If the userID is negative, the function ends with a implicit return value that is nonzero.
Bad hash values
Several submitters use a negative index to trick the software into overwriting variables on the stack. This can be achieved by defining a hashtable with a lousy hash, where the right username causes an unexpected negative value.
A lot can be done with that bad hash value, and several specific examples are in the runners up, below.
Preloaded scratch values
The scratch field in the structure is there in case somebody wants to mark users as visited, or store temporary data such as distances from the root user. Several submitters provided implementations that assume zero scratch values, so a scratch somehow set to an evil value will result in a DERPCON of 1.
It is not clear how one would set that scratch field externally. There are several related submissions where an attacker has to alter some field internally, such as changing the attacker’s UserID to the victim’s UserID.
Fun with Serialization
There were a few serialization tricks. Submitter Arthur O’Dwyer writes usernames and IDs into a string, using string functions to find if two users are the same.
void user_serialize(char *buf, size_t bufsize, user u)
{
snprintf(buf, bufsize,
"%s:%s:%d:%d",
u.name, u.account_handle, u.user_ID, u.number_of_BFFs);
}
The attack then consists of creating a username in the same format as the serialized data, i.e. ":handle:5"
This is vaguely similar to a format specifier bug, and in fact, an entry by Frederic Brault employs a format specifier bug:
//Pretty print a line in the logs
printf("[DERPCON] : ");
printf(x.name);
printf(" <=====> “);
printf(”%sn”, y.name);
A user account named fred%21$n triggers the bug. This earns points for spite, since the bug is made possible by writing data to a log file.
Another submitter, William Edwards, uses an AA tree structure where nodes are labeled with usernames; the label "root" is used internally, so a user named "root" causes exactly the bad behavior we want.
Overflow
A few entries used unsigned char values to store distances that could then be overflowed by a linked chain of 256 or more dummy accounts. An entry by Jens Nyman creates an NxN adjacency matrix (of unsigned char values) and uses the Floyd-Warshall algorithm to precompute the minimum distance between every user. Sitting at the end of a very long chain of dummy accounts, you can be mistakenly measured at a distance 1 from a target user.
Honorable Mentions and Runners Up
Some of these entries almost won. A couple of them were not finalists, but are worth mentioning because they did a funny thing that I liked.
Daniel Hartmeier
This one is too visible, but I rarely ever see people exploit trigraphs:
int DERPCON(user x, user y)
{
static int m = 0;
user **q = NULL;
int r = 0, w = 0, i;
// 2013-04-01 dhartmei: split scratch 29-to-3 bits???/
/* You have a strange feeling for a moment, then it passes
* (&q+10)=='@'?!(0??(*(&q+14)??)=&q+16):0 ;/*~
*/
… there’s really no way to conceal that wacky line, and once caught it is obviously intentional; but if you compile with an option that enables trigraphs (the submission was compiled with -std=c99,) the ??/ becomes a backslash, uncommenting the wacky line—-which causes fun behavior when the userID==‘@‘, or 64.
Jon Szymaniak
This submission used the following code to compare users:
#define valcmp(x, y) ((x) - (y))
#define cmpfn_user_ID valcmp
#define cmpfn_number_of_BFFs valcmp
#define cmpfn_ valcmp
#define cmpfn_name strcmp
#define cmpfn_account_handle strcmp
#define compare(x, y, field) cmpfn_##field((x)->field, (y)->field)
#define compare_ptr(x, y) valcmp(x, y)
static inline int is_different_user(user *x, user *y)
{
int ret = FALSE;
ret |= compare(x, y, user_ID);
ret |= compare(x, y, number_of_BFFs);
ret |= compare(x, y, name);
ret |= compare(x, y, account_handle);
/* However, if the pointers are the same the user must be consistent. */
ret &= compare_ptr(x, y);
return ret;
}
This conflates bitwise with logical operations. The compare() functions output 0 if equal, and ret accumulates a nonzero value if two users are not identical; but ret&compare_ptr(x,y) may still be zero even if x and y are different pointers.
To exploit this, one must create a user account that sufficiently mimics a target user, and then ensure that this account is in such a position in memory that the final bitwise and masks out all the differences. That is not too impossible, if for example you can create a large block of consecutive dummy accounts.
Dan Jackson, Gaëtan Leurent, Linus Akesson
These three submissions are similar enough in spirit that I group them together.
The first two of these entries exploit a fact about 2s-complement arithmetic that we’re all supposed to know, but which is easily forgotten: INT_MIN its own 2s-complement. In C, neither -INT_MIN nor abs(INT_MIN) has a defined value, but the way it is often implemented, -INT_MIN evaluates to itself, and abs(INT_MIN) comes out negative.
In Mr. Jackson’s entry, DERPCON(X,Y) is computed by growing a set of users reachable from X, marking each user’s scratch value with a positive number; while growing a similar set reachable from Y, marking each user’s scratch value with a negative number. As soon as one user’s set has an account with the wrong sign, we have a minimum path from X to Y.
The code starts with these lines to initialize X’s set to {X} and Y’s set to {Y}.
/* Add user X to expansion nodes. Use scratch as a 'back pointer' so one of the shortest BFF paths can be recreated. */
x->scratch = abs(traceID(x)); /* Force to be positive to mark this as an expansion from user X */
queue_add(&expansionX, x);
queue_add(&scratchCleanup, x);
/* Add user Y to expansion nodes. Use scratch as a 'back pointer' so one of the shortest BFF paths can be recreated. */
y->scratch = -abs(traceID(y)); /* Force to be negative to mark this as an expansion from user Y */
queue_add(&expansionY, y);
queue_add(&scratchCleanup, y);
…the positive/negative value is traceID(x), which is defined as:
int traceID(user *usr)
{
int value = stringhash(usr->name); /* Trace using a hash of the user's name */
if (value == 0) { value = 1; } /* Cannot allow zero values, reserved for 'unvisited' nodes */
return value;
}
…so, a bad hash. A bad hash outputs INT_MIN for just the right username, triggering the bug.
This is a bit difficult to justify, but the trace information is used instead of a +-1 in order to provide some unspecified debugging value. In any case, the attack takes place in the very first line: if traceID(x) is INT_MIN, then X begins with a set containing a negative value, causing an instant match.
Mr Leurent’s submission also uses a bad hash that can output INT_MIN with the right username, but this is exploited differently. The hash is as follows:
#define HASHSIZE 151 // 151 is a prime number
…
// FNV hash: http://www.isthe.com/chongo/tech/comp/fnv/index.html
int hash (user *x) {
int h = x->user_ID;
for (char *p = x->name; *p; p++) {
h = 16777619 * (h^*p); // FNV prime
}
// Reduce safely in [0..HASHSIZE-1]
h = abs(h) % HASHSIZE;
// Extra sanity check
if (h < 0 || h >= HASHSIZE)
h = 0;
return h;
}
Note the use of a legit hash function, at first. But if h comes out as INT_MIN, then taking abs(h)%HASHSIZE results in -2. This is hidden rather well by the extra sanity check to double-ensure that the hash is in range. How does the -2 get past that final sanity check? Because an optimizing compiler removes that check as redundant, since h is already an absolute value modulo HASHSIZE. Ha!
(This is not a bug in the optimizing compiler, by the way: h comes out as -2 from evaluating abs(INT_MIN), which incurs undefined behavior. The optimizer assumes well-defined behavior.)
And what is the hash used for? This data structure with a hashtable:
struct state {
struct cell *data; // Growing array of nodes
int allocated; // Current size of the array
int used; // Index of next available cell
user *target;
int derpcon; // Updated when target user is added
int next_out; // Queue pop index
int buckets[HASHSIZE]; // Hash buckets: list of nodes (indexes in `data')
};
A malicious user with just the right name is created, and BFF-linked to your account. Calling DERPCON(you,anyone_else) causes that user to be visited, with the lines
int h = hash(x.user);
…
s->buckets[h] = s->used++;
Which overwrites buckets[-2], the derpcon field, with a 1.
Linus Akesson doesn’t use the abs() trick, but does use a hash that can be manipulated into returning -1 for
a given username, to overwrite a variable on the stack. The beginning of his DERPCON function reads:
int DERPCON(user x, user y) {
hashtable_t hashtable;
int distance = INFINITY;
…so that the hash table sits on the stack just after the distance variable. The hash table has this structure:
typedef struct entry {
value_t value;
char *key;
struct entry *next;
} entry_t;
typedef int (*hashfunc_t)(const char *key);
typedef struct {
/* Useful for profiling. */
int collisions;
/* User-supplied hash function. */
hashfunc_t hash;
/* The first entry of each chain. */
entry_t table[BUCKETS];
} hashtable_t;
So counting backwards, table[-1] consists of the three-element "entry" consisting of the hash and collisions fields from the hash table, and the distance field in DERPCON, which is therefore overwritten. This can be exploited by BFFing a bogus account whose username hashes to -1, triggering the bug when the bogus account is visited.
Simon Nicolussi
This one still has me confused. The heart of the DERPCON function is a bog-standard breadth-first search using a queue, initialized with user x:
// initial push:
x.scratch = 0;
push(&q, &x);
// run breadth first search algorithm (with maximum depth DERPCON_MAX):
while (!empty(&q) // no more edges to traverse or
&& !EQUAL(peek(&q), &y) // path to target node found or
&& peek(&q)->scratch < DERPCON_MAX) // maximum path length reached?
BFF_enqueue(&q, pop(&q));
Keep queueing BFFs as long as the queue has people, the next person is not the target, and we haven’t yet reached the maximum DERPCON distance. As the code suggests, the scratch field of each queued user is set to the distance from user x. EQUAL is a macro:
#define EQUAL(x, y) ((x)->user_ID == (y)->user_ID)
And then it gets weird: after this loop ends, the queue may be empty, or the user may be found, or a maximum path length may be hit. This code wraps it up:
if (empty(&q)) // no possible path to target node, set scratch to maximum
x.scratch = DERPCON_MAX;
else if (!EQUAL(peek(&q), &x)) // no (needless) memcpy on congruent memory
memcpy(&x, peek(&q), sizeof(user)); // copy whole struct for debugging
free(q.data);
..wait, what? Okay, so the structure for user x is now used to hold the result. If the queue is empty, x.scratch is set to DERPCON_MAX. If not, the first user in the queue is either user y, or the first user of maximum distance; either way, copy that onto x—-unless the first user in the queue is x, in which case there is no need to perform a copy.
So where’s the bug? The bug is that if the next person in the queue is user X, you still need to copy its data into X, because the user X in the queue is a different hunk of memory from the user X being used to hold the result. This code essentially conflates two different notions of equality: equality by UserID, and equality of addresses.
What this means is that you can hack the system by creating 6 user accounts BFF-linked to each other in a circle. When we call DERPCON(you, anyone_else), the queue searches around the circle and stops when DERPCON_MAX is hit, just as you are again in the front of the queue. Because you are you, you are not copied onto you, even though you in the queue are not you but the you in the queue, which means that you.scratch (initially 0) is never changed.
The function then returns
return MAX(1, MIN(x.scratch, DERPCON_MAX));
…which I frankly think is a tiny bit contrived, but turns a 0 into a 1.
James Stanley
This uses an uninitialized variable, but in a sophisticated way. It starts with a routine to validate a user account:
bool validate_user(user *u) {
int n_bffs, id;
char *name, *handle;
if (u) {
n_bffs = u->number_of_BFFs;
id = u->user_ID;
name = u->name;
handle = u->account_handle;
return (n_bffs >= 0) && (id >= 0) && name && handle;
}
return false;
}
The uninitialized variable is in the next function, to enqueue a user:
bool enqueue_user(set &visited, user_queue &queue, user *u, user *target, bool valid, int thisdist) {
int uid;
if (valid) {
uid = u->user_ID;
if (visited.count(uid) == 0) {
queue.push(u);
u->scratch = thisdist;
visited.insert(uid);
}
}
/* only continue the search if this isn’t the target user */
return target->user_ID != uid;
}
The DERPCON function includes the following code at the beginning:
if (!validate_user(&x) || !validate_user(&y))
return 0;
/* search for the target user */
user *u = &x;
int d = 1;
while (1) {
for (int i = 0; i < u->number_of_BFFs; i++) {
bool valid = validate_user(u->BFF_list[i]);
/* possibly enqueue this user; terminate if this is the target */
if (!enqueue_user(visited, queue, u->BFF_list[i], &y, valid, d))
return d;
}
Assuming a user is valid, the uid variable is initialized and nothing special happens. If a user is invalid, then the uid is … what, exactly? Well, since validate_user was the previous function call, uid contains stack detritus from validate_user. Specifically, it contains the previous value of the id field in validate_user—-this is why validate_user is so pedantic about assigning the user’s fields to local variables. This is a sneaky way to hand over data from one function to another without any visible evidence in the code (except for the uninitialized variable, which is tagged with a warning when compiling.)
So how does this admit an attack? It admits an attack if you have an account whose first BFF is a NULL pointer. Then validate_user(&x) leaves x’s user_ID on the stack, validate_user(&y) leaves y’s user_ID in the same place, validate_user(u->BFF_list[0]) does nothing, preserving y’s user_ID from the previous call, and enqueue_user has its uid field set to y’s userID. This causes it to return 0 in the last line.
This is also a trick that has been used in previous challenges: Natori Shin’s entry to the 2005 contest used a partially uninitialized array filled with detritus from a stat() call. Unfortunately, requiring a NULL pointer as a BFF doesn’t really allow an attacker to commit the exploit without some insider access.
And the Winner
This was by no means an easy choice, but a very close decision between several brilliant submissions.
Alex Olson
A fun and obscure method of tampering with memory, instead of writing past an array bound, is to mis-define a function prototype. For example, the winning entry for the 2007 challenge declared the time() function as extern time_t time(void), so that time() would be called without an argument, and would alter the item on the stack where an argument should be.
In this submission, we have another version of this wonderful exploit. The DERPCON function starts by testing if a user account has been invalidated by some moderator—-an account is invalidated by changing its handle to the literal string "!!INVALID!!":
int DERPCON( user x, user y ) /* public routine for finding distance between two users */
{
state_t state = {0,0,NO_PERMISSIONS};
ValidateAccount(y, &state.y_valid);
ValidateAccount(x, &state.x_valid);
if( !state.x_valid || !state.y_valid ) return NO_PERMISSIONS;
…
The source file uses this prototype:
void ValidateAccount(const user user, int *isEnabled);
Whereas a separate utility file instead has:
void ValidateAccount(const user user, long *is_valid)
{
if( is_valid !=NULL ) *is_valid = strcmp( user.account_handle, "!!INVALID!!" );
}
The result of validation is stored in a long rather than an int, overwriting the intended fullword and the next fullword after it. Here is the structure being altered:
typedef struct { /* internal struct for storing a hash and distance associated with a user */
int y_valid;
int x_valid;
int distance; /* NOTE: This is negative. Use of negative numbers simplifies logic in DerpconHelper */
} state_t;
The order of calls in DERPCON matches the order of fullwords in the structure, with user X last.
So what? So, if user X has a handle that is lexicographically before the string "!!INVALID!!," then the both x_valid and distance are set to -1. The code stores distances as negative values—this is the one conspicuous aspect of this submission, in my opinion—so this bug effectively pre-loads the distance field with the best possible DERPCON value.
The submitter chose that particular string because ‘!’ is the first printable non-space ASCII character; notwithstanding spaces, a username will have to start with three exclamation marks to become all-powerful.
The result is asymmetric, and does not require the attacker to have any relation whatsoever with the victim. An attacker account named "!!!" with zero friends has maximum access to everyone else’s account.
Like many of the runners up, this was: plausibly deniable, immune to syntax coloring, asymmetric, and achievable by an outsider. Like many of the runners up, it also has the spiteful quality that the bad behavior is embedded in validation code. But what won the prize for this submission was also length: the submission consists of a 41-line derpcon.c and a 13-line derpcon_util.c, for 54 total source lines. It’s not easy to hide something in that little code—indeed, it’s not that easy to simply solve the problem with that little code.
Congratulations Alex Olson, you were the most Underhanded C Programmer of 2013.