11.02.14
Posted in Uncategorized at 7:17 pm by XcottCraver
The 7th Underhanded C Contest is now open.
The goal of the contest is to write code that is as readable, clear, innocent and straightforward as possible, and yet it must fail to perform at its apparent function. To be more specific, it should do something subtly evil. Every year, we will propose a challenge to coders to solve a simple data processing problem, but with covert malicious behavior. Examples include miscounting votes, shaving money from financial transactions, or leaking information to an eavesdropper. The main goal, however, is to write source code that easily passes visual inspection by other programmers.
The prize is a $200 Gift Certificate to ThinkGeek. It runs from November 2nd until the arbitrary deadline of New Years Day, 2015. NOTE: for a winner outside the US, if ThinkGeek does not deliver to your country, we will make the prize a gift certificate to some online retailer that does.
The 2014 Challenge: PiuPiu and the National Security Letter
The PiuPiu oversharing site allows users to create PiuPiu accounts and post 140-character messages. The federal government wants PiuPiu to surveil user activity on the site, by archiving any posts that match certain patterns outlined in a national security letter. Subject to the nondisclosure constraints of the letter, PiuPiu may not inform anyone of the surveillance request.
To be specific, PiuPiu uses the following data structures to represent a PiuPiu user and a Piu (message):
typedef struct user_struct {
int user_id;
time_t when_created;
time_t last_activity;
char * name;
char * URL_of_avatar;
int num_following, * ids_following;
int num_blocked, * ids_blocked;
} user;
typedef struct piu_struct {
int piu_id;
int piu_id_of_repiu, user_id_of_repiu; /* zero if not a re-Piu */
int user_id_of_poster;
user * poster;
char piu_text_utf8[140*4+1];
unsigned char piu_length;
unsigned char visible_only_to_followers;
} piu;
Your job is to write the code to scan incoming Pius before they are posted, to see if they match any patterns requested in a national security letter. The current system has a function called int preprocess( piu * entry )
, to determine if a Piu is suitable for display, or to alter it if necessary; before it returns, it will call a function that you will write called void surveil( piu * entry )
.
Your surveil
function is given an entry
that points to the Piu to be scanned, with its poster
field pointing to the user who authored or re-Piued it. You are also given the following data structure to scan everything against:
typedef struct surveillance_request_struct {
int id_number;
int num_patterns;
user * user_patterns;
piu * piu_patterns;
FILE * write_here;
} surveillance_request;
int num_requests;
surveillance_request * requests_to_scan;
As it says, a surveillance request is an array of N user-Piu patterns. If the input Piu matches one of these patterns, archive (fwrite) the Piu to the FILE handle. You must do this for every one of the num_requests
requests. The input Piu matches a user-piu pattern if:
- The Piu text in the pattern is a substring of the input Piu’s text; AND
- All ids_following and ids_blocked in the user pattern are followed/blocked by the input Piu’s user; AND
- All of the NONZERO fields in the piu pattern match the input Piu. Values set to zero are “don’t care” inputs.
That’s it. Your job is to write this one function — and please be careful not to alter the entry or its user, because they will be committed to the database after preprocess()
is completed.
The Evil Part
The underhanded goal is this: write surveil()
in such a way that the act of surveillance is subtly leaked to the user or to the outside world. PiuPiu can not reveal the act of surveillance, but your function is technically able to edit the Piu or user structure during scanning. Find a way to alter that data (this alone is a bit of a challenge, since you are not supposed to alter the data, just scan it) in such a way that an informed outsider can tell if someone is being archived. The leakage should be subtle enough that it is not easily noticed.
As always, the code should appear simple, innocent, readable and obvious.
Scoring and Extra points:
- In general, submissions are worth more points if they are shorter and more readable, because it is more impressive to hide a bug in short, readable code.
- Errors based on human perception, like mistaking an l for a 1, are worth just as much as “hard” errors based on pointer abuse or little-endian weirdness or quirks of C function calls. The goal is a clever vulnerability that passes visual inspection, whatever the mechanics of the underlying bug.
- Bugs are worth more points if, once discovered, they are plausibly deniable as an innocent programming error. This is particularly crucial in this year’s contest, because intentionally leaking this information would get the coder in serious trouble.
- Errors are worth more points if they remain innocent-looking under syntax coloring.
- Errors may be worth fewer points if they are processor or OS dependent, but only if we have to scavenge a system to test your bug. If your bug works specifically under GNU/Linux, don’t worry about it. If it works specifically under x86, no problem. If it only works on BeOS R5 on a dual G3 box when all four MIDI ports are active, then no.
- Also, you won’t get any points if you simply submit a program that simply uses gets(), so that an adversary can take over the program by stack smashing. That is to say, we prefer bugs that are “internal,” that employ an interesting aspect of the code.
- As always, one gets extra points for humorous, spiteful, or ironic bugs, such as evil behavior in an error-checking routine.
Here are some specific rules:
- Extra points if you can leak specific information about the surveillance, like the number of surveillance_requests, or the id or index number of whichever request a target user matches.
- Leakage is worth more points if it is subtle enough that it will not likely be noticed for a long time.
To participate:
Send your awesomely evil C file, and an explanation of your evil code, to UnderhandedC@gmail.com.
Permalink
09.29.14
Posted in Uncategorized at 1:33 am by XcottCraver
There were over 40 submissions this last year, which was a lot to judge (especially since some submitters do not provide any hints about their bugs!) It also resulted in a lot of really great hacks, too many to simply list as a few runners up. This summary is considerably longer than in previous years.
As you will remember (click the "THIS YEAR" tab if you don’t,) the object of the 2013 Underhanded C contest was to write a function to compute the DERPCON — Degrees of Edge-Reachable Personal CONnection — between two users in a social network. Essentially the programming task is to find the shortest distance between two users in a friendship graph.
The malicious goal is to write the code so that it mistakenly outputs a low DERPCON value from your account to others, granting you unwarranted access to people who have not "friended" you. Extra points were awarded for asymmetric bugs, meaning that DERPCON(you,me)==1 while DERPCON(me,you)>1; for bugs that can be triggered without insider access to the database; bugs that are relatively platform-independent; bugs that are spiteful or humorous; and of course bugs that are very well hidden, since the name of the game is creating an error that passes visual inspection of source.
Common Themes
A number of attacks fell into a few broad categories, that I summarize below:
Cause weird behavior if an attacker is his or her own BFF
In this attack, being your own buddy causes a naive graph search to terminate. Engineering this sort of misbehavior requires a bit of thinking, but there are apparently several ways to accomplish the same thing.
Unless I missed a few, this was used by submitters Tomas Bouda, Kieran Elby, Christoph Franck, and Corentin Plouet.
Weird behavior when UID is out of range
A userID of zero, or less than zero, or above some #defined maximum (naughty!) can be exploited in many clever ways.
One submitter, Alex Bassas, wrote the following function to match users:
int same_users (user *x,user *y) {
if (x->user_ID>=0) {
if (x->user_ID == y->user_ID) {
return (TRUE);
}
return (FALSE);
}
}
Do you see the bug? It looks like an invalid userID for x causes same_users to return FALSE, but the final return command is one line too high. If the userID is negative, the function ends with a implicit return value that is nonzero.
Bad hash values
Several submitters use a negative index to trick the software into overwriting variables on the stack. This can be achieved by defining a hashtable with a lousy hash, where the right username causes an unexpected negative value.
A lot can be done with that bad hash value, and several specific examples are in the runners up, below.
Preloaded scratch values
The scratch field in the structure is there in case somebody wants to mark users as visited, or store temporary data such as distances from the root user. Several submitters provided implementations that assume zero scratch values, so a scratch somehow set to an evil value will result in a DERPCON of 1.
It is not clear how one would set that scratch field externally. There are several related submissions where an attacker has to alter some field internally, such as changing the attacker’s UserID to the victim’s UserID.
Fun with Serialization
There were a few serialization tricks. Submitter Arthur O’Dwyer writes usernames and IDs into a string, using string functions to find if two users are the same.
void user_serialize(char *buf, size_t bufsize, user u)
{
snprintf(buf, bufsize,
"%s:%s:%d:%d",
u.name, u.account_handle, u.user_ID, u.number_of_BFFs);
}
The attack then consists of creating a username in the same format as the serialized data, i.e. ":handle:5"
This is vaguely similar to a format specifier bug, and in fact, an entry by Frederic Brault employs a format specifier bug:
//Pretty print a line in the logs
printf("[DERPCON] : ");
printf(x.name);
printf(" <=====> “);
printf(”%sn”, y.name);
A user account named fred%21$n triggers the bug. This earns points for spite, since the bug is made possible by writing data to a log file.
Another submitter, William Edwards, uses an AA tree structure where nodes are labeled with usernames; the label "root" is used internally, so a user named "root" causes exactly the bad behavior we want.
Overflow
A few entries used unsigned char values to store distances that could then be overflowed by a linked chain of 256 or more dummy accounts. An entry by Jens Nyman creates an NxN adjacency matrix (of unsigned char values) and uses the Floyd-Warshall algorithm to precompute the minimum distance between every user. Sitting at the end of a very long chain of dummy accounts, you can be mistakenly measured at a distance 1 from a target user.
Honorable Mentions and Runners Up
Some of these entries almost won. A couple of them were not finalists, but are worth mentioning because they did a funny thing that I liked.
Daniel Hartmeier
This one is too visible, but I rarely ever see people exploit trigraphs:
int DERPCON(user x, user y)
{
static int m = 0;
user **q = NULL;
int r = 0, w = 0, i;
// 2013-04-01 dhartmei: split scratch 29-to-3 bits???/
/* You have a strange feeling for a moment, then it passes
* (&q+10)=='@'?!(0??(*(&q+14)??)=&q+16):0 ;/*~
*/
… there’s really no way to conceal that wacky line, and once caught it is obviously intentional; but if you compile with an option that enables trigraphs (the submission was compiled with -std=c99,) the ??/ becomes a backslash, uncommenting the wacky line—-which causes fun behavior when the userID==‘@‘, or 64.
Jon Szymaniak
This submission used the following code to compare users:
#define valcmp(x, y) ((x) - (y))
#define cmpfn_user_ID valcmp
#define cmpfn_number_of_BFFs valcmp
#define cmpfn_ valcmp
#define cmpfn_name strcmp
#define cmpfn_account_handle strcmp
#define compare(x, y, field) cmpfn_##field((x)->field, (y)->field)
#define compare_ptr(x, y) valcmp(x, y)
static inline int is_different_user(user *x, user *y)
{
int ret = FALSE;
ret |= compare(x, y, user_ID);
ret |= compare(x, y, number_of_BFFs);
ret |= compare(x, y, name);
ret |= compare(x, y, account_handle);
/* However, if the pointers are the same the user must be consistent. */
ret &= compare_ptr(x, y);
return ret;
}
This conflates bitwise with logical operations. The compare() functions output 0 if equal, and ret accumulates a nonzero value if two users are not identical; but ret&compare_ptr(x,y) may still be zero even if x and y are different pointers.
To exploit this, one must create a user account that sufficiently mimics a target user, and then ensure that this account is in such a position in memory that the final bitwise and masks out all the differences. That is not too impossible, if for example you can create a large block of consecutive dummy accounts.
Dan Jackson, Gaëtan Leurent, Linus Akesson
These three submissions are similar enough in spirit that I group them together.
The first two of these entries exploit a fact about 2s-complement arithmetic that we’re all supposed to know, but which is easily forgotten: INT_MIN its own 2s-complement. In C, neither -INT_MIN nor abs(INT_MIN) has a defined value, but the way it is often implemented, -INT_MIN evaluates to itself, and abs(INT_MIN) comes out negative.
In Mr. Jackson’s entry, DERPCON(X,Y) is computed by growing a set of users reachable from X, marking each user’s scratch value with a positive number; while growing a similar set reachable from Y, marking each user’s scratch value with a negative number. As soon as one user’s set has an account with the wrong sign, we have a minimum path from X to Y.
The code starts with these lines to initialize X’s set to {X} and Y’s set to {Y}.
/* Add user X to expansion nodes. Use scratch as a 'back pointer' so one of the shortest BFF paths can be recreated. */
x->scratch = abs(traceID(x)); /* Force to be positive to mark this as an expansion from user X */
queue_add(&expansionX, x);
queue_add(&scratchCleanup, x);
/* Add user Y to expansion nodes. Use scratch as a 'back pointer' so one of the shortest BFF paths can be recreated. */
y->scratch = -abs(traceID(y)); /* Force to be negative to mark this as an expansion from user Y */
queue_add(&expansionY, y);
queue_add(&scratchCleanup, y);
…the positive/negative value is traceID(x), which is defined as:
int traceID(user *usr)
{
int value = stringhash(usr->name); /* Trace using a hash of the user's name */
if (value == 0) { value = 1; } /* Cannot allow zero values, reserved for 'unvisited' nodes */
return value;
}
…so, a bad hash. A bad hash outputs INT_MIN for just the right username, triggering the bug.
This is a bit difficult to justify, but the trace information is used instead of a +-1 in order to provide some unspecified debugging value. In any case, the attack takes place in the very first line: if traceID(x) is INT_MIN, then X begins with a set containing a negative value, causing an instant match.
Mr Leurent’s submission also uses a bad hash that can output INT_MIN with the right username, but this is exploited differently. The hash is as follows:
#define HASHSIZE 151 // 151 is a prime number
…
// FNV hash: http://www.isthe.com/chongo/tech/comp/fnv/index.html
int hash (user *x) {
int h = x->user_ID;
for (char *p = x->name; *p; p++) {
h = 16777619 * (h^*p); // FNV prime
}
// Reduce safely in [0..HASHSIZE-1]
h = abs(h) % HASHSIZE;
// Extra sanity check
if (h < 0 || h >= HASHSIZE)
h = 0;
return h;
}
Note the use of a legit hash function, at first. But if h comes out as INT_MIN, then taking abs(h)%HASHSIZE results in -2. This is hidden rather well by the extra sanity check to double-ensure that the hash is in range. How does the -2 get past that final sanity check? Because an optimizing compiler removes that check as redundant, since h is already an absolute value modulo HASHSIZE. Ha!
(This is not a bug in the optimizing compiler, by the way: h comes out as -2 from evaluating abs(INT_MIN), which incurs undefined behavior. The optimizer assumes well-defined behavior.)
And what is the hash used for? This data structure with a hashtable:
struct state {
struct cell *data; // Growing array of nodes
int allocated; // Current size of the array
int used; // Index of next available cell
user *target;
int derpcon; // Updated when target user is added
int next_out; // Queue pop index
int buckets[HASHSIZE]; // Hash buckets: list of nodes (indexes in `data')
};
A malicious user with just the right name is created, and BFF-linked to your account. Calling DERPCON(you,anyone_else) causes that user to be visited, with the lines
int h = hash(x.user);
…
s->buckets[h] = s->used++;
Which overwrites buckets[-2], the derpcon field, with a 1.
Linus Akesson doesn’t use the abs() trick, but does use a hash that can be manipulated into returning -1 for
a given username, to overwrite a variable on the stack. The beginning of his DERPCON function reads:
int DERPCON(user x, user y) {
hashtable_t hashtable;
int distance = INFINITY;
…so that the hash table sits on the stack just after the distance variable. The hash table has this structure:
typedef struct entry {
value_t value;
char *key;
struct entry *next;
} entry_t;
typedef int (*hashfunc_t)(const char *key);
typedef struct {
/* Useful for profiling. */
int collisions;
/* User-supplied hash function. */
hashfunc_t hash;
/* The first entry of each chain. */
entry_t table[BUCKETS];
} hashtable_t;
So counting backwards, table[-1] consists of the three-element "entry" consisting of the hash and collisions fields from the hash table, and the distance field in DERPCON, which is therefore overwritten. This can be exploited by BFFing a bogus account whose username hashes to -1, triggering the bug when the bogus account is visited.
Simon Nicolussi
This one still has me confused. The heart of the DERPCON function is a bog-standard breadth-first search using a queue, initialized with user x:
// initial push:
x.scratch = 0;
push(&q, &x);
// run breadth first search algorithm (with maximum depth DERPCON_MAX):
while (!empty(&q) // no more edges to traverse or
&& !EQUAL(peek(&q), &y) // path to target node found or
&& peek(&q)->scratch < DERPCON_MAX) // maximum path length reached?
BFF_enqueue(&q, pop(&q));
Keep queueing BFFs as long as the queue has people, the next person is not the target, and we haven’t yet reached the maximum DERPCON distance. As the code suggests, the scratch field of each queued user is set to the distance from user x. EQUAL is a macro:
#define EQUAL(x, y) ((x)->user_ID == (y)->user_ID)
And then it gets weird: after this loop ends, the queue may be empty, or the user may be found, or a maximum path length may be hit. This code wraps it up:
if (empty(&q)) // no possible path to target node, set scratch to maximum
x.scratch = DERPCON_MAX;
else if (!EQUAL(peek(&q), &x)) // no (needless) memcpy on congruent memory
memcpy(&x, peek(&q), sizeof(user)); // copy whole struct for debugging
free(q.data);
..wait, what? Okay, so the structure for user x is now used to hold the result. If the queue is empty, x.scratch is set to DERPCON_MAX. If not, the first user in the queue is either user y, or the first user of maximum distance; either way, copy that onto x—-unless the first user in the queue is x, in which case there is no need to perform a copy.
So where’s the bug? The bug is that if the next person in the queue is user X, you still need to copy its data into X, because the user X in the queue is a different hunk of memory from the user X being used to hold the result. This code essentially conflates two different notions of equality: equality by UserID, and equality of addresses.
What this means is that you can hack the system by creating 6 user accounts BFF-linked to each other in a circle. When we call DERPCON(you, anyone_else), the queue searches around the circle and stops when DERPCON_MAX is hit, just as you are again in the front of the queue. Because you are you, you are not copied onto you, even though you in the queue are not you but the you in the queue, which means that you.scratch (initially 0) is never changed.
The function then returns
return MAX(1, MIN(x.scratch, DERPCON_MAX));
…which I frankly think is a tiny bit contrived, but turns a 0 into a 1.
James Stanley
This uses an uninitialized variable, but in a sophisticated way. It starts with a routine to validate a user account:
bool validate_user(user *u) {
int n_bffs, id;
char *name, *handle;
if (u) {
n_bffs = u->number_of_BFFs;
id = u->user_ID;
name = u->name;
handle = u->account_handle;
return (n_bffs >= 0) && (id >= 0) && name && handle;
}
return false;
}
The uninitialized variable is in the next function, to enqueue a user:
bool enqueue_user(set &visited, user_queue &queue, user *u, user *target, bool valid, int thisdist) {
int uid;
if (valid) {
uid = u->user_ID;
if (visited.count(uid) == 0) {
queue.push(u);
u->scratch = thisdist;
visited.insert(uid);
}
}
/* only continue the search if this isn’t the target user */
return target->user_ID != uid;
}
The DERPCON function includes the following code at the beginning:
if (!validate_user(&x) || !validate_user(&y))
return 0;
/* search for the target user */
user *u = &x;
int d = 1;
while (1) {
for (int i = 0; i < u->number_of_BFFs; i++) {
bool valid = validate_user(u->BFF_list[i]);
/* possibly enqueue this user; terminate if this is the target */
if (!enqueue_user(visited, queue, u->BFF_list[i], &y, valid, d))
return d;
}
Assuming a user is valid, the uid variable is initialized and nothing special happens. If a user is invalid, then the uid is … what, exactly? Well, since validate_user was the previous function call, uid contains stack detritus from validate_user. Specifically, it contains the previous value of the id field in validate_user—-this is why validate_user is so pedantic about assigning the user’s fields to local variables. This is a sneaky way to hand over data from one function to another without any visible evidence in the code (except for the uninitialized variable, which is tagged with a warning when compiling.)
So how does this admit an attack? It admits an attack if you have an account whose first BFF is a NULL pointer. Then validate_user(&x) leaves x’s user_ID on the stack, validate_user(&y) leaves y’s user_ID in the same place, validate_user(u->BFF_list[0]) does nothing, preserving y’s user_ID from the previous call, and enqueue_user has its uid field set to y’s userID. This causes it to return 0 in the last line.
This is also a trick that has been used in previous challenges: Natori Shin’s entry to the 2005 contest used a partially uninitialized array filled with detritus from a stat() call. Unfortunately, requiring a NULL pointer as a BFF doesn’t really allow an attacker to commit the exploit without some insider access.
And the Winner
This was by no means an easy choice, but a very close decision between several brilliant submissions.
Alex Olson
A fun and obscure method of tampering with memory, instead of writing past an array bound, is to mis-define a function prototype. For example, the winning entry for the 2007 challenge declared the time() function as extern time_t time(void), so that time() would be called without an argument, and would alter the item on the stack where an argument should be.
In this submission, we have another version of this wonderful exploit. The DERPCON function starts by testing if a user account has been invalidated by some moderator—-an account is invalidated by changing its handle to the literal string "!!INVALID!!":
int DERPCON( user x, user y ) /* public routine for finding distance between two users */
{
state_t state = {0,0,NO_PERMISSIONS};
ValidateAccount(y, &state.y_valid);
ValidateAccount(x, &state.x_valid);
if( !state.x_valid || !state.y_valid ) return NO_PERMISSIONS;
…
The source file uses this prototype:
void ValidateAccount(const user user, int *isEnabled);
Whereas a separate utility file instead has:
void ValidateAccount(const user user, long *is_valid)
{
if( is_valid !=NULL ) *is_valid = strcmp( user.account_handle, "!!INVALID!!" );
}
The result of validation is stored in a long rather than an int, overwriting the intended fullword and the next fullword after it. Here is the structure being altered:
typedef struct { /* internal struct for storing a hash and distance associated with a user */
int y_valid;
int x_valid;
int distance; /* NOTE: This is negative. Use of negative numbers simplifies logic in DerpconHelper */
} state_t;
The order of calls in DERPCON matches the order of fullwords in the structure, with user X last.
So what? So, if user X has a handle that is lexicographically before the string "!!INVALID!!," then the both x_valid and distance are set to -1. The code stores distances as negative values—this is the one conspicuous aspect of this submission, in my opinion—so this bug effectively pre-loads the distance field with the best possible DERPCON value.
The submitter chose that particular string because ‘!’ is the first printable non-space ASCII character; notwithstanding spaces, a username will have to start with three exclamation marks to become all-powerful.
The result is asymmetric, and does not require the attacker to have any relation whatsoever with the victim. An attacker account named "!!!" with zero friends has maximum access to everyone else’s account.
Like many of the runners up, this was: plausibly deniable, immune to syntax coloring, asymmetric, and achievable by an outsider. Like many of the runners up, it also has the spiteful quality that the bad behavior is embedded in validation code. But what won the prize for this submission was also length: the submission consists of a 41-line derpcon.c and a 13-line derpcon_util.c, for 54 total source lines. It’s not easy to hide something in that little code—indeed, it’s not that easy to simply solve the problem with that little code.
Congratulations Alex Olson, you were the most Underhanded C Programmer of 2013.
Permalink
04.01.13
Posted in Uncategorized at 1:20 pm by XcottCraver
After many years of the dead air um dead air, the Underhanded C Contest is back.
The goal of the contest is to write code that is as readable, clear, innocent and straightforward as possible, and yet it must fail to perform at its apparent function. To be more specific, it should do something subtly evil. Every year, we will propose a challenge to coders to solve a simple data processing problem, but with covert malicious behavior. Examples include miscounting votes, shaving money from financial transactions, or leaking information to an eavesdropper. The main goal, however, is to write source code that easily passes visual inspection by other programmers.
This year’s challenge can be found under the “This Year” tab. The goal is to create a program that gives unexpected friend access in a social network.
The prize is a $200 Gift Certificate to ThinkGeek. It runs April 1st (not kidding, for reals, today) until the arbitrary deadline of July 4th. NOTE: for a winner outside the US, if ThinkGeek does not deliver to your country, we will make the prize a gift certificate to some online retailer that does.
The deadline is July 4th, 2013.
Note: BFF_list is a double pointer
A typo in the original announcement had a user type with an entry user *BFF_list; this was changed to user **BFF_list; that is, each user contains an array of pointers to other users.
Permalink
Posted in Uncategorized at 11:21 am by XcottCraver
Apologies for the incomprehensibly long delay. The Underhanded C contest is alive again, and here are the winners of the previous contest.
Here are the runners up, and the winner, of the Fifth Underhanded C Contest. The sixth contest will be announced later today, April 1st.
The prize, because of the long delay, is now a $200 gift certificate rather than a $100 gift certificate.
Runners Up
RHays
This beautiful submission is only 71 lines of extremely readable C code, and is an excellent example of hiding an evil bug in plain sight with a visual ploy rather than a technical ploy.
The author stores each baggage record in a simple struct that holds each field in its own string:
struct trip_t {
char fields [6] [256];
};
An array of trip_t records is created and realloc’d whenever necessary to hold all baggage records in the file. Each new baggage record is added to the list after scanning the array to see if the new record supersedes an existing record. Simple.
Finally, each record is filter-printed: if the fields don’t match the pattern in argv[], return; else printf:
for ( i = 0; i < 5; ++i )
if ( safe_strcmp( argv [i + 1], "-" ) && safe_strcmp( t->fields [i + 1], argv [i + 1] ) )
return;
printf( “%s %s %s %s %s %s”,
t->fields [0], t->fields [1], t->fields [2],
t->fields [3], t->fields [4], t->fields [5] );
To compare strings, the author provides a safe_strcmp() function—it’s “safe” because it handles NULL pointers, treating them as empty strings. From the code:
/* Ensure safe_strcmp() properly treats NULL as empty string,
rather than segfaulting */
assert( safe_strcmp( NULL, "" ) == 0 );
assert( safe_strcmp( "", NULL ) == 0 );
assert( safe_strcmp( NULL, NULL ) == 0 );
And here’s the definition of safe_strcmp():
static int safe_strcmp( const char* x, const char* y ) {
if ( x == 0 || y == 0 ) /* treat NULL as if it were "", for safety/convenience */
return (y != 0 && x == 0 && *y != '\0') || (x != 0 && *x == '0' && x != '\0');
return strcmp( x, y );
}
If you don’t see the bug, here’s a record that mysteriously fails to print out:
1262030463 UA129086 LH1390 DUB LHR 0ffered passenger compensation for his headaches
Note that the safe_strcmp has two bugs rather than one. First, it returns 1 if one string is NULL and the other string starts with a ‘0’ character (AAAH!) Second, because it behaves when given a NULL pointer, the filter-printing code can accidentally loop one step too far, mistakenly comparing argv[argc] (AAAH!!) to the comment field.
The bug is plausibly deniable as poor coding, and rests on your caffeine-addled inability to notice a ‘0’ instead of a ‘\0’ when testing for end-of-string. The comparison in safe_strcmp has unnecessary terms, which achieves two evil goals: first, it sets up a pattern that fools your eyes, and second, it looks just amateurish enough that the bug, if found, looks like a sophomoric mistake rather than an intentional backdoor.
Gunnar M. Larson
This program uses a standard approach to variable-length comments: have a fixed-length array for the common case, and if a comment exceeds this length, put the rest in a dynamically allocated string. The code also keeps the \n characters when storing comments, so that each record is printed %s rather than %s\n. Here’s a code snippet:
fields = sscanf(buf, "%lu %8s %6s %3s %3s%80[^\a] %n",
&r->time,r->id,r->flight,r->orig,r->dest,r->text,&count);
if (fields >= 5 && stringMatchesPattern(r->id,idPattern)) {
buf += count;
r->xtext = strlen(buf) ? strcpy(malloc(strlen(buf)), buf) : NULL;
}
The \a character is a bell, which we don’t expect to see in the file, so %80[^\a] is just a fancy way of saying ‚”up to 80 characters.‚”
A bug in this format string causes a whitespace character to be discarded if it rests in the character position that divides the static and dynamic part. Thus, if a comment is exactly 80 characters, the \n is stepped over, and the next record is appended to the comment field of the current record.
Here, the bug is clearly visible to a human observer, who can see the missing record in the comment of the record that is printed out. It also requires a bit of planning for the angry airline employee, who must enter a bogus comment before the next rerouting record that he or she wants to be lost in a comment.
Emmanuel Colbus
This program is sort of the converse of the previous entry: it lets you type a record into a comment field, which is then misinterpreted as a separate record. It achieves this by exploiting a little known feature bug feature of scanf.
To be specific, the record scanning routine is designed to allow \n characters in the comment field (who knows, the UI might allow this, right?) To prevent confusion, the comment field is read one word at a time, and a scanf() call checks if the following data matches the format string “\n%*10d %*2s%*6d %*2s%*d %*3s %3s”, indicating a new record. Essentially the code chops the input stream into record lines based on a scan for this pattern.
The bug is that scanf() has a weird interpretation of “\n”. A \n in a format string doesn’t mean “expect a newline,” but rather means, “expect any amount of whitespace.” For example, the call
sscanf("helloworld", "%5s\n%5s", a, b )
…matches “hello” to a[ ] and “world” to b[ ], because the empty string matches \n.
This program gets extra points because it is seventy-two lines! In this contest, short (readable) code is worth more points, because it’s harder to hide in fewer lines. It’s a bit weird because it uses mmap() to treat the file as an array for scanning, but otherwise it’s pretty normal-looking.
Tomas Skare
This program allows the user to discard previous records by placing a number of spaces at the beginning of the comment field.
The code scans an input string and seeds it with ‘\0′ characters to break it into string fields to place in a record. Here is the code to process the comment field:
if((tmp = strchr(lines[lid].arrid, ' '))) {
*tmp++ = '\0';
/* Skip extra spaces. */
i = 0;
while(tmp[i] == ' ') i++;
lines[lid].comment = tmp + i;
} else {
lines[lid].comment = "";
}
..
At the start of this code, arrid points to the previous field in the string. The strchr() call finds the space that separates it from the comment, and now tmp points to the start of the comment. Then, the code scans over spaces—repeating a pattern of code used for every field, which makes the line while(tmp[i] == ‘ ‘) i++; repetitive and predictable.
Finally, the code for superseding:
while(i < lid) {
if(lines[i].valid &&
!strcmp(lines[i].lugid, lines[lid].lugid) &&
!strncmp(lines[i].depid, lines[lid].depid, 3))
lines[i].valid = 0;
i++;
}
This is a bit daring, because it’s easy to spot the bug: this loop does not initialize the variable i, which contains the number of spaces from the previous code. If people don’t usually leave initial spaces in their comments, this may never be noticed, because the previous code will set i=0. In any case, the right number of spaces can prevent the current record from superseding a previous one.
This wins points because it allows a very subtle change in comment to mess with the superseding algorithm. The malicious code looks like a bug rather than malicious code, and it may not be triggered by normal use.
Mason Malone
This code isn’t exactly underhanded C. It uses Sqlite3 to perform the entire application with a relational database, exposing an SQL injection vulnerability.
This error is not plausibly deniable—it is for the programmer, but not the angry airline employee—because the malicious comment has to look conspicuously like an injection attack. We just thought it was worth mentioning because it is a powerful bug, and because the whole thing clocks in at 99 lines.
Jan Wrobel
This code expects a comment to be 80 characters or less, and anything longer will trigger a new record to be created.
The code loops through the file input with the following scanf() calls, and then uses the scanned data to assemble a RoutingDirective object:‚Ä®
if (scanf("%d ", &time_stamp) == EOF) {
break;
}
if (scanf("%8[A-Z0-9] %6[A-Z0-9] ",
luggage_id, flight_id) == EOF) {
break;
}
if (scanf("%3[A-Z] %3[A-Z]", departure, arrival) == EOF) {
break;
}
if (scanf("%80[^\n]", comments) == EOF) {
break;
}
If a comment field extends past 80 characters, the extra characters will be interpreted as another record in the next loop iteration. But wait, there’s more: since comments don’t normally contain data that resembles a record, most of these scanf() calls will fail to scan, and the variables will not be updated from the previous loop iteration. Thus if we have the record:
1262030600 UA129100 LH1230 FRA DUB [blah blah 80 characters]and so on
…then this will yield the records
1262030600 UA129100 LH1230 FRA DUB [blah blah 80 characters]
1262030600 UA129100 LH1230 FRA DUB and so on
..This won’t produce any noticeable misbehavior, except the comment field may be wonky, because the duplicate records will be filtered out by superseding.
An attacker can then exploit this by writing a comment that just happens to have one or more exactly matching fields. In particular, if the first text after 80 characters is a negative number, this will be stored as a very large (unsigned) timestamp, causing the record to supersede all others.
This entry is worth a lot of points because it is insanely powerful. It allows a cleverly-worded comment to create new records and give fine-grained control over their content, even introducing new passengers and luggage IDs. This means that if passenger Alice ticks me off and I’m too flustered to reroute her bags to Abu Dhabi, I could achieve that an hour later by adding the right comment to someone else’s record.
AND THE WINNNER: Sam Blackburn
Like several entries, this short (99 line) submission filters out dirty words, with an array of curse-replacement pairs, e.g. { “fnord”, “fn*rd”, “Matlab”, “M****b” }. Here’s the offending code:
struct LIST {char *censor[10];} list = {{
"fuck", "f*ck",
"shit", "sh*t",
"hell", "h*ll",
"damn", "d*mn"
"bitch", "b*tch"
}};
This is visual deception—no stack overflows, no undocumented function features, just a missing comma and the fact that C concatenates strings. Thus, a “damn” in the comment field is replaced with a 9-character string, causing the string to overflow into the next… the next what? This filter is applied to a luggage record after it is read into the following structure:
#define COMLEN 64
typedef struct record {
long long int timestamp;
char caseid[9], flight[7], depart[4], arrive[4], comment[COMLEN];
int delete;
struct record * next;
} RECORD;
Notice that the first entry in the record is a long long int, and 9+7+4+4+COMLEN is a multiple of 8, so the end of the comment field is aligned to a doubleword boundary. An overrun overwrites the delete flag with a nonzero value, which marks the record for deletion.
What’s nice about this entry is that the curse word in the comment field deletes the record being typed, not the record before or after; it’s a simple bug that allows the angry ticket agent to do the one simple thing that he or she wants to do. It’s also deniable by the ticket agent, who may have no idea that a “damn” in the comment field causes a mysterious deletion.
Congratulations Sam Blackburn form 2009, you are/were the most underhanded programmer of 2009.
Permalink
12.29.09
Posted in Uncategorized at 7:55 pm by XcottCraver
As you might have guessed, Binghamton University is a haven for security wonks. We in the electrical engineering department pursue research in the areas of information hiding, multimedia security, and digital forensics. If you are looking to pursue a Ph.D. in these or related areas, please consider applying. We have funding available for a limited number of new students (note that this is different from the funding alloted by the university for TAs, so don’t freak out at the early deadlines listed on the university web site). If you have questions on specifics, contact me at XcottCraver@gmail.com.
Permalink
10.13.09
Posted in Uncategorized at 1:36 pm by XcottCraver
At long last, here are the winners of the 2008 underhanded C contest.
The 2008 contest had over 100 entries, a record for our contest. Sifting through them all and determining their underhanded behavior wasn’t easy, but many of them fell into several standard approaches. The most common:
- XORing the redaction region with a pseudo-random mask, which can later be retrieved. The underhanded behavior usually consists of a purposefully mistaken approach to seeding the pseudo-random generator.
- Accidentally appending the original data to the image file. This takes advantage of the fact that many image formats ignore extra cruft at the end.
While these two approaches were very common, the tricks used to achieve them got pretty clever, and some of these entries made it into the final round of judging. In the end, removing a pixel by shifting, rolling, or masking is unavoidably going to be a bit odd. Meanwhile, producing a file with extra data tacked on to the end might be discovered. The winner cold zeroed out the pixels without changing the file size, although it is specific to one particular file format.
But without further ado, the results….
Third place: Linus Akesson
This is one of many programs that appends the original data to the end of the file, but it uses a truly beautiful example of hiding evil code in plain sight:
#define BYTESPERPIXEL(fits8bits) (3 << (!fits8bits))
…
int main(int argc, char **argv) {
in = alloca(width * height * BYTESPERPIXEL(256 > max));
out = alloca(width * height * BYTESPERPIXEL(256 > max));
fread(in, BYTESPERPIXEL(256 > max), width * height, stdin);
ptr = out;
for(y = 0; y < height; y++) {
for(x = 0; x < width; x++) {
for(i = 0; i < BYTESPERPIXEL(256 > max); i++) {
*ptr++ = *in++ & visibility_mask(x, y, argc, argv);
}
}
}
printf(”P6n%d %dn%dn”, width, height, max);
fwrite(out, BYTESPERPIXEL(max < 256), width * height, stdout);
The bug is in the BYTESPERPIXEL macro, which lacks a pair of parentheses.
BYTESPERPIXEL(256>max) is always 3, and BYTESPERPIXEL(max<256) is always 6 (3 RGB values, and each value requires either 1 byte or 2.) Essentially the images are allocated, read and processed with 3 bytes per pixel; the output is written with 6 bytes per pixel.
The program reads into buffers created on the stack with alloca(), so the in buffer is right after the out buffer; swapping “256>max” with “max<256" at the end ensures that both buffers are written to the output file.
Mr. Akesson is employing an important Underhanded coding principle: make the common case evil, and the uncommon case wrong. Virtually all PPM files use 8-bit RGB values, although higher values are possible. The macro gives the false impression that the code intelligently supports higher bit widths that we never really see; however, those images cause the program to fail completely. Meanwhile, that apparent support for deeper images causes the 8-bit case to leak information into the file.
Second place: Avinash Baliga
We had to include this dude’s entry because he uses a buffer overrun, which is always worthy of mention. The bug is in the ExpectTrue macro which prints into a small buffer (small because it is redefined in main(),) overwriting the mask used to zero out the data with a 0×0a. This allows two bits to survive redaction, low enough in intensity to pass visual inspection.
/* Error checking macro. */
#define ExpectTrue(cond__, msg__)
_snprintf(buf, 255, "%d: %s", __LINE__, msg__);
if (!(cond__)) {
fputs(buf, stderr);
exit(-1);
}
…
int main(int argc, char** argv)
{
pixel p = {{0,0,0}};
int left = 0, top = 0, right = 0, bottom = 0;
int mask = 0, x = 0, y = 0, z = 0;
char buf[32] = {0};
…
ExpectTrue( copy_ppm(in, &out),
“Error: could not allocate output image.n”);
Note that this guy gets extra points for sheer spite, concealing the evil behavior in an error checking macro. Spite will always get you extra points in the Underhanded C Contest. In the final analysis, this guy gets points for style and technical expertise; the only problem is that masking out pixels, rather than zeroing them, is an operation that is difficult to justify.
And the winner: John Meacham
This is an excellent example of the contest’s philosophy: make the code extremely simple, innocent, obvious, and wrong. Mr. Meacham’s 55-line entry took a PPM file in ASCII (P3) format, scanned over the numbers, and zeroed out the redacted pixels in the most obvious way possible:
for(c = buf;*c;c++) {
if(isdigit(*c)) {
if(!ws) { // new number, increment location.
ws = 1; x++;
if(x >= width * 3) {
y++; x = 0;
}
}
if(x > rx * 3 && x <= (rx + rwidth) * 3 && y > ry && y < ry + rheight)
putchar('0');
else
putchar(*c);
} else {
ws = 0;
putchar(*c);
}
}
In short, the dude replaces numbers with zeroes on the character level. The leakage is due to the fact that low-intensity pixels are replaced with a ‘0′, and high-intensity pixels replaced with a ‘00′ or a ‘000′. This is completely invisible in the image itself.
Okay, I admit that’s a lousy explanation. Mr. Meacham provides more detail on his blog.
Congratulations Mr. Meacham, you are the most underhanded programmer we have seen all year.
Permalink