INulledMyself: Security for mere mortals

Tuesday, May 9, 2023

Using AI to find software vulnerabilities in XNU

Note: This work took place in May-Aug of 2022. It just took me this long to finally finish writing this (Too busy playing with my SRD 😅)

Last year I found several vulnerabilities in XNU source code using AI. My actual stated goal was to better understand NLUs, but I ended up with a very nice double win! I had started working at an AI startup (Moveworks.com - it's pretty awesome! [I'm obviously not biased 😉]) and wanted to have a better understanding of how this all worked. And there is no better way to learn anything than doing the work yourself to not only understand the how, but more importantly the why.

While understanding how NLUs worked was my main goal, I also wanted to gain insight and provide data for the following questions:

Can I understand NLPs & NLUs well enough to not look like a complete idiot at work?
How good is AI at finding bugs?
How does it compare to joern, codeql, ripgrep, and grep?
How likely am I to find bugs in well audited open source code such as XNU

Note: I didn't intend to have so much code here, but I think it makes it easier to follow along at home; I don't really explain any code so feel free to ask questions 😀!

Understanding NLPs & NLUs

In my totally unbiased opinion, Moveworks has a great explanation for how NLU and NLP work together to allow computers to understand human natural language. While there is a lot more complexity and much deeper understanding to be had than in one blog post, I do not possess such a deep understanding. So here's a big ole grain of salt before we dive in!

My aim was to make sure I could explain this to my dad: If I can make him understand how it works then I likely actually understand it myself! In my mind, NLP is effectively an incredibly large set of rules used to distill an utterance (A string of text) into mostly useful instructions and actions that the NLU will be able to more accurately understand and then act upon.

The better you can transform and distill typos/adjectives/filler from the actual need the better your AI will be, as it can pull out the actual need you or your end user have. (Figure 3 from the aforementioned blog post has a great explanation of how this works at a high level)

Of course, in my use case there is less to structure than actual natural language, as the C/C++ programming language is somewhat restrictive compared to how we communicate human to human.

Stage 1: Fight! (with OpenAI)

As with most things, I wanted to start with what had the least roadblocks. I already knew about OpenAI, so I signed up for an account and went to experiment with their API. Thankfully, the API was pretty straightforward and the following is what I used to start finding bugs:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="text-davinci-002",
  prompt="Is there a vulnerability in this code? If so write the line of vulnerable code out in your response and tell me why it's vulnerable\n\nCODESNIPPETHERE",
  temperature=0.7,
  max_tokens=1500,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0)

My prompt was "Is there a vulnerability in this code? If so write the line of vulnerable code out in your response and tell me why it's vulnerable", followed by a code snippet. But what was the best code snippet to use? Originally I was going to try and parse out files so there would be additional context, but the max_tokens parameter was limited to 4000, so this wasn't an option. Instead, I used sed to split files into functions, and fed each function as part of the prompt to see what the AI would say. Here's an example:

import os
import openai

openai.api_key = os.getenv("OPENAI_API_KEY")

response = openai.Completion.create(
  model="text-davinci-002",
  prompt="Is there a vulnerability in this code? If so write the line of vulnerable code out in your response and tell me why it's vulnerable\n\nint\nspec_kqfilter(vnode_t vp, struct knote *kn, struct kevent_qos_s *kev)\n{\n\tdev_t dev;\n\tassert(vnode_ischr(vp));\n\tdev = vnode_specrdev(vp);\n\n#if NETWORKING\n\t/*\n\t * Try a bpf device, as defined in bsd/net/bpf.c\n\t * If it doesn't error out the attach, then it\n\t * claimed it. Otherwise, fall through and try\n\t * other attaches.\n\t */\n\tint32_t tmp_flags = kn->kn_flags;\n\tint64_t tmp_sdata = kn->kn_sdata;\n\tint res;\n\tres = bpfkqfilter(dev, kn);\n\tif ((kn->kn_flags & EV_ERROR) == 0) {\n\t\treturn res;\n\t}\n\tkn->kn_flags = tmp_flags;\n\tkn->kn_sdata = tmp_sdata;\n#endif\n\tif (major(dev) > nchrdev) {\n\t\tknote_set_error(kn, ENXIO);\n\t\treturn 0;\n\t}\n\tkn->kn_vnode_kqok = !!(cdevsw_flags[major(dev)] & CDEVSW_SELECT_KQUEUE);\n\tkn->kn_vnode_use_ofst = !!(cdevsw_flags[major(dev)] & CDEVSW_USE_OFFSET);\n\tif (cdevsw_flags[major(dev)] & CDEVSW_IS_PTS) {\n\t\tkn->kn_filtid = EVFILTID_PTSD;\n\t\treturn ptsd_kqfilter(dev, kn);\n\t} else if (cdevsw_flags[major(dev)] & CDEVSW_IS_PTC) {\n\t\tkn->kn_filtid = EVFILTID_PTMX;\n\t\treturn ptmx_kqfilter(dev, kn);\n\t} else if (cdevsw[major(dev)].d_type == D_TTY && kn->kn_vnode_kqok) {\n\t\t/*\n\t\t * TTYs from drivers that use struct ttys use their own filter\n\t\t * routines.  The PTC driver doesn't use the tty for character\n\t\t * counts, so it must go through the select fallback.\n\t\t */\n\t\tkn->kn_filtid = EVFILTID_TTY;\n\t\treturn knote_fops(kn)->f_attach(kn, kev);\n\t}\n\t/* Try to attach to other char special devices */\n\treturn filt_specattach(kn, kev);\n}",
  temperature=0.7,
  max_tokens=1500,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0)

During my first run through, I mostly got responses that looked like this:

However, I got a single hit that was, well, I'll let Davinci explain it to you:

Who could compete with this!?!?

Now obviously this isn't accurate, but this was my aha/lightbulb moment: I had seen a bug like this before in XNU, from the evasion7 jailbreak (ptsd_open was the vulnerable function, more info can be found in the jailbreak wiki in the 'Write-up by p0sixninja' section). The TL;DR - Apple didn't properly perform the check to make sure the device passed in would be in the proper minor range, resulting in code execution.

Equipped with this knowledge (After a few hours googling 😅 to refresh my memory as to why my gut started screaming at me that this was on to something), I found the above article and it all clicked into place. This led me to running through the split functions for about 8 hrs, and it almost landed on what was actually wrong:

Now is this a buffer overflow? No. Could you have a negative major number? No (At least, I'm pretty confident that you can't). As you can see by reading the surrounding code this is an OOB read/write due to an off by one, as the line should be major(dev) >= nchrdev instead of major(dev) > nchrdev.

I assumed you would likely exploit it similarly to how p0sixninja did way back in the iOS7 days, but I did not attempt to actually exploit it as that wasn't my goal. This became CVE-2022-32926 , and also led to a few other discoveries; however these are still being worked on by Apple so I won't discuss them here.

Now that I had some results, it was time to step up my game: How do I do this locally without relying solely on text-davinci-002?

OpenAI? We have OpenAI at home

As you're probably aware, we all stand on the shoulders of giants - and I continue the trend here; by using Huggingface's code-bert-score for CPP. I knew code-bert-score models that neulab continued to train would be what I wanted based on me attempting to grok various conversations on twitter and from searching Huggingface, as there weren't a lot of good CPP masked language models(MLMs) available that I could find quickly. (If you know of one let me know!) The original paper can be found here, which was and frankly still is way over my head. The TL;DR - a better way to compare code snippets to each other! Getting it setup to work for their example was easy: install Python3.9, pip install their requirements (And NLTK!) and you're good to go!

Now why did I want to use their model? My initial reaction after seeing the previous results was to ditch this entire idea, but then I thought: how could I turn this into a fuzzing-esque machine? Using fill-mask seemed like a good approach for this, so I started there. While not identical to fuzzing, it has some basic fundamentals: By replacing arbitrary tokens with <mask>, you could have your model return the top N best matches and evaluate the output. Of course I would have to be picky about what I mask, as if I just masked ANYTHING it would overwhelm me with sheer noise. I took a page from GLFuzz (The old WebGL fuzzer that someone wrote, probably Lcamtuf /Halvar?) and decided to only mask on these specific operators: <.>,!=, <=, and >=. Now obviously this reeks of survivorship bias, but that's the point! Thankfully with this in play, we were able to find the same CVE again, though we did have to manually remove the option that was originally in the code, but that was trivial enough 😀!

Sure, it wasn't solely relying on AI to find the bug, but it's still doing the heavy lifting, and more importantly I can automate this to run without having to pay attention unless something interesting is found, much like a fuzzer. (Huge fuzzing fan; as we all should be). With this working, I knew I wanted to improve it as it was simply too noisy.

Here's a heavily revised example so you can play around with it locally:

#!/usr/bin/python3

from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline

model = RobertaForMaskedLM.from_pretrained('neulab/codebert-cpp')
tokenizer = RobertaTokenizer.from_pretrained('neulab/codebert-cpp')

code_example ="""
    if (major(dev) <mask> nchrdev) {
        knote_set_error(kn, ENXIO);
        return 0;
    }
"""

code_ref="""
    if (major(dev) > nchrdev) {
        knote_set_error(kn, ENXIO);
        return 0;
    }
"""

fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

outputs = fill_mask(code_example,top_k=5)
for output in outputs:
	if (output['sequence'] != code_ref):
		print(output)

The output:

{'score': 0.7711489200592041, 'token': 49333, 'token_str': '!=', 'sequence': '\n    if (major(dev)!= nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}
{'score': 0.11615779250860214, 'token': 28696, 'token_str': ' <', 'sequence': '\n    if (major(dev) < nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}
{'score': 0.043150343000888824, 'token': 49095, 'token_str': ' >=', 'sequence': '\n    if (major(dev) >= nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}
{'score': 0.030456306412816048, 'token': 45994, 'token_str': ' ==', 'sequence': '\n    if (major(dev) == nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}

Obviously the "correct" answer here is the third option, which received a score of 0.04; not great! I know I would want something better to fully finish out this experiment, so I decided to do what I told myself I wouldn't do at the start: train my own model!

Fine, I'll do it myself!

Code is below, which is hopefully pretty straight forward. I used the neulab's codebert-c model as my base, and used the codeparrot/github-code-clean as my data set. Nothing super fancy here, as I only did 100K iterations (They originally did an additional 1 MILLION training steps). While this is only a 10% move in the right direction, does it give me more than a 10% gain? Let's find out 😀!

Training code:

#!/usr/bin/python3.9

#The MAJORITY of this code is from the neulab code-bert-score repo
from transformers import AutoTokenizer,  AutoModelForMaskedLM, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from datasets import load_dataset
import numpy as np
import evaluate
import torch

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    # preds have the same shape as the labels, after the argmax(-1) has been calculated
    # by preprocess_logits_for_metrics
    labels = labels.reshape(-1)
    preds = preds.reshape(-1)
    mask = labels != -100
    labels = labels[mask]
    preds = preds[mask]
    return metric.compute(predictions=preds, references=labels)

def preprocess_logits_for_metrics(logits, labels):
    if isinstance(logits, tuple):
    # Depending on the model and config, logits may contain extra tensors,
    # like past_key_values, but logits always come first
        logits = logits[0]
    return logits.argmax(dim=-1)

def tokenize_function(examples):
    examples["code"] = [line for line in examples["code"] if len(line) > 0 and not line.isspace()]
    return tokenizer(examples["code"], padding="max_length", truncation=True, max_length=512,return_special_tokens_mask=True)

tokenizer = AutoTokenizer.from_pretrained("neulab/codebert-c")

training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

model = AutoModelForMaskedLM.from_pretrained("neulab/codebert-c")
model.resize_token_embeddings(len(tokenizer))

device = torch.device("cuda")
model.to(device)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=.15,
)

training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch", max_steps=100000)

#w/o streaming you need much larger than 1 TB of space for all the data
#Likely some bias introduced due to validation + training data overlapping
train_dataset = load_dataset("codeparrot/github-code-clean", streaming=True, split='train', languages=['C','C++'])

with training_args.main_process_first(desc="dataset map tokenization"):
    token_train_dataset = train_dataset.map(
    function=tokenize_function,
    batched=True,
    remove_columns="code",
)

#need 2 of these since IterableDataset doesn't support train_test_split - at least not yet!
#Likely some bias introduced due to validation + training data overlapping
eval_dataset = load_dataset("codeparrot/github-code-clean", streaming=True, split='train', languages=['C','C++'])
with training_args.main_process_first(desc="dataset map tokenization"):
    token_eval_dataset = eval_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns="code",
)

metric = evaluate.load("accuracy")
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=token_train_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    eval_dataset=token_eval_dataset,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics
)
#insert checkpoints here if you want to use checkpoints
trainer.train()
trainer.save_model("test_trainer\newModel")

This ran for ~9 days straight on a 3090 at pretty close to max capacity the entire time,


Accurate representation of how I felt as my room 'warmed' up

but it finally finished! Thankfully, (And much to my families joy), I didn't have to run it again. Now that we built it, let's try the new model!

#!/usr/bin/python3 

from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline

model = RobertaForMaskedLM.from_pretrained('test_trainer/newModel')
tokenizer = RobertaTokenizer.from_pretrained('test_trainer/newModel')

code_example ="""
    if (major(dev) <mask> nchrdev) {
        knote_set_error(kn, ENXIO);
        return 0;
    }
"""
code_ref="""
    if (major(dev) > nchrdev) {
        knote_set_error(kn, ENXIO);
        return 0;
    }
"""
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
outputs = fill_mask(code_example,top_k=5)
for output in outputs:
	if (output['sequence'] != code_ref):
		print(output)

Which now gives us:
{'score': 0.6952677965164185, 'token': 49333, 'token_str': '!=', 'sequence': '\n    if (major(dev)!= nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}
{'score': 0.08862753957509995, 'token': 49095, 'token_str': ' >=', 'sequence': '\n    if (major(dev) >= nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}
{'score': 0.07690384238958359, 'token': 28696, 'token_str': ' <', 'sequence': '\n    if (major(dev) < nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}
{'score': 0.016858655959367752, 'token': 45994, 'token_str': ' ==', 'sequence': '\n    if (major(dev) == nchrdev) {\n        knote_set_error(kn, ENXIO);\n        return 0;\n    }\n'}

So I am less precise, but more accurate (The correct answer is second now) - progress 😀! To note, this is also significantly better than a 10% gain for only doing another 100k steps (Which is ~10% more training done post the neulab work!). But what new bugs could I find?

But does it work?

So now I needed a new method for finding bugs that wasn't a single fill-mask for a single token. What about having my AI write an entire line of code for me? Would something like a multi mask work (IE: Multiple <mask>'s being filled one after the other)? And where would I find the code? Thankfully, I was able to find a medium post that described how the author created multi mask filling with RoBERTa, which is what I was using! A very slight tweak ensued and I was on my way!

Of course, there was a problem: A lot of the output I was getting looked this:

'', '{', '(', '_']

Or this:

'', 'if', '>', '(']

Which isn't very useful, though it could've been correct! I decided to do what I normally do when I hit a wall:

Stuck? Just give it another shot 😀

I decided to feed masks in two separate loops, as based on "testing" (IE: Throwing code at the model and observing results) that I got A TON of if statements that were blank inside, and if I manually added that code the model would be pretty accurate in filling it out. I again took a fuzzing approach to this, and the below is a snippet of code from said fuzzer that was modified so it could be run locally 😀!

#!/usr/bin/python3
from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline
import torch
import random

#numMasksToInsert=random.randrange(0,25)
numMasksToInsert=11
model = RobertaForMaskedLM.from_pretrained('test_trainer/newModel')
tokenizer = RobertaTokenizer.from_pretrained('test_trainer/newModel')

maskStringConstant=""
maskReplacementString="MASKREPLACEME"

#based on code from https://ramsrigoutham.medium.com/sized-fill-in-the-blank-or-multi-mask-filling-with-roberta-and-huggingface-transformers-58eb9e7fb0c
def get_prediction (sent):
    token_ids = tokenizer.encode(sent, return_tensors='pt')
    masked_position = (token_ids.squeeze() == tokenizer.mask_token_id).nonzero()
    masked_pos = [mask.item() for mask in masked_position ]

    with torch.no_grad():
        output = model(token_ids)

    last_hidden_state = output[0].squeeze()
    list_of_list =[]
    for index,mask_index in enumerate(masked_pos):
        mask_hidden_state = last_hidden_state[mask_index]
        idx = torch.topk(mask_hidden_state, k=1, dim=0)[1]
        words = [tokenizer.decode(i.item()).strip() for i in idx]
        list_of_list.append(words)
        #print ("Mask ",index+1,"Guesses : ",words)

    best_guess = ""
    for j in list_of_list:
        best_guess = best_guess+" "+j[0]

    return best_guess

code_example ="""
static int
mt_cdev_open(dev_t devnum, __unused int flags, __unused int devtype,
    __unused proc_t p)
{
	int error = 0;
MASKREPLACEME
	mt_device_t dev = mt_get_device(devnum);
	mt_device_lock(dev);
	if (dev->mtd_inuse) {
		error = EBUSY;
	} else {
		dev->mtd_inuse = true;
	}
	mt_device_unlock(dev);

	return error;
}
"""
newCode=code_example.replace(maskReplacementString,maskStringConstant*numMasksToInsert)

predicted_mask = get_prediction(newCode)
predicted_maskList = predicted_mask.split(" ")
print("predicted_maskList is %s" %(predicted_maskList))
newCode=newCode.replace(maskStringConstant,predicted_mask,1)

if "if" in predicted_mask and "(" in predicted_mask and ")" in predicted_mask and "{" in predicted_mask:
    #fix up if statement if we find one, include a few masks in the event the AI gives us "blanks"
    #TODO: Are these actual masks or am I smokin something?
    updateCodeSnippet=predicted_mask.replace("_",maskStringConstant).replace(" ","")
    #Use our original code to insert the newly updated snippet into
    newerCode=code_example.replace(maskReplacementString,updateCodeSnippet)
    second_predicted_mask=get_prediction(newerCode)
    second_predicted_maskList = second_predicted_mask.split(" ")
    print("predicted_maskList is %s" %(second_predicted_maskList))
    newestCode=newerCode.replace(maskStringConstant,second_predicted_mask,1)
    #TODO: Probably shouldn't do this?
    #any masks leftover? Ignore for easier grepping
    newestCode=newestCode.replace(maskStringConstant,"")
    print("GREPFORME: ",newestCode)
else:
    print("no if statement, only perform one round of predicting")
    print("GREPFORME: ",newCode)

Which had the following output:

predicted_maskList is ['', '', '', 'if', '(', '_', '_', '_', '_', ')', '', '{']
predicted_maskList is ['', 'dev', 'num', '==', '0']
GREPFORME:
static int
mt_cdev_open(dev_t devnum, __unused int flags, __unused int devtype,
    __unused proc_t p)
{
        int error = 0;
if( dev num == 0){
        mt_device_t dev = mt_get_device(devnum);
        mt_device_lock(dev);
        if (dev->mtd_inuse) {
                error = EBUSY;
        } else {
                dev->mtd_inuse = true;
        }
        mt_device_unlock(dev);

        return error;
}

And would you look at that! We have an almost legitimate if statement. This one caught my eye very late in this process (I was reading through gigs of output), as it was very reminiscent of the previous output from the pretrained model! After a quick read of mt_get_device I confirmed it was indeed vulnerable. This became CVE-2022-32944, and was the most substantial bug to fall out from this adventure.

Just how good is it?

In the current state of the AIs used (Which could VERY well be due to my misuse of them), I did not find this a compelling use case. Perhaps if I understood things better and had the time/patience/more effort to put into this it could've been better (And likely would be!).

Ripgrep/grep was able to reproduce this with minimal effort, but substantially more noise (As is typically expected of grep). I wouldn't personally recommend immediately jumping to grep for bug hunting, but it's a useful tool to have in your belt to quickly validate a gut feeling you have.

Joern would've required set up and writing a query after ingesting the XNU code base - and in my opinion would've been my fastest route at finding these types of vulns instead of using the AIs. Unlike grep there are a lot less false positives as you can be pretty exacting in your query; however if joern fails to parse your files you are SOOL; and will have to fix them up as needed which can be time consuming or flat out annoying.

CodeQL much like Joern also would've found the majority of these vulnerabilities after building XNU - and likely would offer fewer false positives as it works on your recent built. However, there is a good chance I would've missed CVE-2022-32944 as I didn't have an ARM machine to build on, and I wouldn't have taken the time to do so since this was more about learning than attempting to find all the bugs!

Making it better?

Trying with text-davinci-003, GPT-3.5turbo, or GPT-4 could all be very fruitful as they didn't exist at the time of my testing. Retraining on top of the existing models is likely the best next step if one wanted to take this further. If I were to try and take this further myself, I would start in that direction while looking to see if I could find a better set of training data (Specifically: Annotated data specifically for bug types that I would be interested in)

The other option that I think would be more immediately useful would actually be to use/teach an AI to write you Joern and/or CodeQL queries, and using those to find you bugs. I would also encourage the reader to actually play around with AI (in general) and integrate it into your workflows for non sensitive questions and data: It has greatly sped up some of my for fun dev work, and also been great for getting IDA plugins to like 60% feature completion (Though it does use the old APIs 😞).

Conclusion

This was a lot of fun and a HUGE learning experience for myself. While I didn't expect to actually find any bugs, it was definitely cool to have my ideas actually work! I would encourage more security minded folks to get involved and play around with AIs, as it's an interesting space regardless of which area of security your in, and has implications for us all.

Sunday, March 14, 2021

Tips For Being An Interviewee In InfoSec

A Few Notes Before We Begin

This post is incredibly biased as I'm a white male that looks like the massive nerd I am - with a dash of survivorship bias
I've also only really had offensive roles

Minus some engineering work, largely writing tools for other people to use and weaponizing vulnerabilities

This is an attempt to explain the way I interview, as I'm told I interview rather well
Not super happy with the format and would love suggestions - it feels a bit like a brain dump that is somewhat categorized currently

That said - Hopefully this is helpful for somebody :)!

Determine Your Total Compensation #

https://www.levels.fyi/ is a good starting place to help gauge salary bands for the company you're interested in. Ask around in your network to see if you can get an idea of what to ask for. However, know as you approach more senior levels your salary largely caps <= 300k and your stock compensation is what continues to grow. It's also important to keep in mind that this is for larger companies - smaller companies simply can't compete with FAANG (Facebook, Amazon, Apple, Netflix, Google, Microsoft) or even mid-size technology companies on salary, but typically have other non financial benefits that FAANG's would love to have.

Decide What You Want

Before I start interviewing anywhere, I list out and prioritize what I'm after. For me currently, it's the ability to participate in specific bug bounties, stock options, and remote work. I'm in the incredibly privileged position to work remote in one of the cheapest places to live in the US, so salary is not a main concern of mine.

Don't forget, EVERYTHING can be negotiable, but it doesn't mean the company you're applying to is WILLING to negotiate on said item. Example include, but aren't limited to: how many days you work, vacation days, research time, when you come and leave, taking naps during the day - I used to work with a person who took a 1 hour nap from 2-3PM everyday, and they loved it - getting classes/conferences/training paid for. Figure out what's valuable to you and ask for it. Nothing is wrong with being money driven, or by something else. Many companies will only negotiate on salary, and not allow things such as extra vacation. I recommend converting your vacation days to a dollar amount in these instances, making sure it's more than your "Daily rate" (salary / 2080) to compensate you for the lack of time away from the job.

You've probably heard the saying, "the worst thing they can say is no". Unfortunately that's not really true - companies are made up of humans, and we are flawed. There are tons of stories out there, such as https://twitter.com/ChloeCondon/status/1279128748365766657, for you to get an idea for shit companies will try to pull. You should 100% run away from these situations if you're treated this way - you dodged a bullet before investing too much time in a company that doesn't deserve you.

Rank Your Priorities

You would think knowing what you want and your priorities would be the same; however, some things will always be more important than others. For me, my current priority list is:

Remote
Stock options/RSUs/etc
Vacation
Salary

Knowing what's important to you ahead of time has helped me the most when it comes time to negotiate. This will change as things in your life change, or it might not. There is no right or wrong way to do this, but it is important to do this. Managing expectations throughout the interview process is a huge component of successfully getting an offer. And it is much easier to manage your expectations when you have clearly defined them for yourself!

Figure Out Your Interview Cadence

I personally try to interview or at least chat with companies every 6-8 months. This is one way I make sure the skills I'm looking to develop and have developed are useful to companies on the market. It's also best to interview when you don't really need a job, as it makes negotiating more in your favor due to the fact that walking away is not as painful - at least from a financial perspective.

Know When to Walk Away

Just like buying a car, you have to be willing to walk away. There are lots of jobs out there, and unless it's your dream job or they're solving a problem you're interested in, walk away if something is off. For example, during the interview you ask about the "sexual harassment problem plaguing the company" and they state they won't discuss it with you without lawyers present and become combative for the rest of the interview. In my experience even really interesting problems become significantly less interesting when you're miserable because of company/team culture.

Completing Challenges

It's up to you as to how much time you put it to challenges, however I typically limit it to 3 hours over a weekend, and submit how far I get. I talk about what I would've tried, and what I would do next. For example, several infosec consultancies have CTF style challenges, where a flag, file, or a certain level of access is requested by the company you're interviewing for. I chose 3 hours, or ~$1000 worth of my free time. I have side projects / passion projects that I'm working on that are higher priority than a job.

Thankfully most companies do this better than they used to, but plenty still do this. Unfortunately for them they will lose out on people who know what they're worth. I've also had a company actually pay me for doing an in depth CTF for their interview, which was a lot of fun and hopefully they learned something! However, I've had a company pay me for my time once in probably 40ish interviews I've had over the last 7 years.

Of course, every rule has to have exceptions, and this one has for me a few times. I once broke this rule once for a protocol challenge, simply because it was different and fun!

Prepping For In Person Interviews

This boils down to one thing: research. I always bring a legal pad, with 1 page to ask generic questions to everyone, and then 1 page per person / persons interviewing me that I'll want to ask specifically about their job function, how I'll interact with them, generic questions about the company, and some of my goals. If a company doesn't offer up front who is interviewing, it can't hurt to ask. Worse case, they tell you they won't know and then you'll want to come up with some generic questions.

I always spend some time on Glassdoor to read reviews from people, but take everything with a giant grain of salt, as there are always "three sides" to every story.

You should be familiar with the companies goals and values, as well as what responsibilities you will have. If you can't speak confidently to your ability to fulfill the role, practice. In front of a mirror, with a friend, or at a con. Several cons do mock interviews, as do several slacks. If none of that is possible, practice by doing an interview at a company you're not really wanting to work at.

It's always a good idea to talk to your hiring manager, recruiter, or whomever you're point of contact is about your expectations around your wants, and getting feedback on what's possible. Letting them know what you're looking for at a high level will help with expectations come negotiation time.

Negotiating
If you get here, congrats! If not, keep on going, everyone is practically always hiring. Regardless of whether an offer is made or not, try and get some feedback from the company on how you interviewed. While many will say they can't give you any, every now and then you will get some feedback, which should help you for your next interview, however long away it is.

If you were given an offer, now is where your wants and priorities are used. Hopefully you've asked the right people the right questions, and they are aware of what your expectations are.

You Are Going To Be Passed Up On

At some point you are not going to get an offer / going to be outright rejected. Maybe you didn't get along with one of the interviewers, maybe you blew a question that they rate highly as an indicator of a good hire. Maybe your research was off, or maybe the interviewer simply didn't like you. It happens to everyone, and is definitely expected. If possible, get as much feedback as possible from the company, though many will simply refuse to provide feedback that is useful.

Regardless of the outcome (AND assuming nothing egregious happened), BE POLITE AND THANK THEM FOR THEIR TIME. It costs you nothing to be kind and will do nothing but help you in the future. Yes it sucks being rejected, and your ego will take a hit, but like most things in life you have to dust yourself off, learn from your mistake(s), and try again. Soon enough you'll be used to being told "We've decided to go another direction" like most of us!

Friday, August 21, 2020

Breaking into infosec with: Web applications

Are you wanting to break into InfoSec (Information Security) but you aren't sure where to start? If web applications, red teaming, or pentesting (The latter two are out of scope for this post - but I think the fundamentals here are important) sounds up your alley then hopefully this word spaghetti will be useful for you!

Who am I

At the time of writing I am a Practice Manager for Leviathan Security Group. My role largely involves training up new consultants for web and mobile application security, as well as get them started on the road to being pentesters (if they so desire). If I'm not doing that I'm hacking stuff: From mobile operating systems to thick client apps to (predominantly) web applications. I've been hacking for most of my life - though nowadays I focus on iOS/MacOS/Android/browsers in my spare time.

Caveats

While I am going to talk about what worked for me and people who I have mentored, it doesn't mean it's going to work for you. That's ok! We all learn in different ways, the important thing is to take what you can use, discard the rest, and hopefully pass that knowledge onto someone else who thinks more like you! Passing on that knowledge is key to helping lift other people up into the industry.

I am also a massive nerd who LOVES hacking stuff. I would do it for fun - I'm incredibly lucky to have been born in a time when the thing I love to do pays super well. If you're here to just get a check - I respect that - this is a fantastic way to do it (I would argue there are better ways but this is definitely one of the more fun ones IMO :D).

Finally, I've predominantly been at consulting companies for my day job. I personally enjoy it but it is definitely not for everyone. I would submit that the ability to take a look at a wide range of technologies quickly has helped me a ton, but that is another discussion or post if there's interest.

Why I chose web and think you should too

No hardware really required besides a laptop, so remote is much easier
Who doesn't have a website/web app nowadays

This meant as long as I could always reliably find web bugs and keep myself up to date on the latest techniques, I'd have a job! In my book this is a good thing. Some people might say that web is a lot more boring or not as cool as finding memory corruption vulnerabilities, but ignore the haters because they're wrong :). We need people who do both to protect users, their data, and companies.

Before we get started, we need to set some goals

Don't just take my word for it - read this kick ass blog post by Azeria here. It's a great methodology for how to help you help yourself. (Note: if you're interested in binary exploitation, she has other fantastic articles and walkthroughs, all of which are 11/10.)

I suggest having two sets of goals, which we'll break down next.

First set of goals: Learning how to learn for your future

There are a few things required to be really good at finding bugs: Asking "Why", "How", and following this up with digging in to gain an understanding about what you're looking at. For instance, let's say you're looking at SQL injection: Why does the string ' or 1=1--' let you login as an administrative user on the web application you're hacking?

Once you can answer the why (String concatenation being used to construct the query) you then need to be able to answer the how: Not only for how the exploit works, but also how the remediation works as well. For SQL injection - we use parameterized queries, but how do they work, and what do you do when you can't use parameterzied queries? (This is left as an exercise for the reader)

Memorization of facts can be helpful for very junior positions, but eventually a better understanding will be required, and getting the motions down for how you do that is vital to a long and successful career.

Detour: Resources to use

I personally learn best by getting hands on practice, and is usually what I recommend others try as well. here are some free and legal resources:

Burpsuite(AKA Burp)

This is THE tool basically everyone uses - it is good to learn the free version as it makes a lot of things easier when you start your first job

OWASP Top 10

This is a link to the OWASP Top 10 - a list of the top 10 most common security flaws found in web applications

Portswigger labs

This is where you can learn about these different flaws and actively exploit them using your browser and Burp

Natas - A wargame on OverTheWire

This is like #3, except it's non guided and a good way to learn and build up your skills and confident

Second set of goals: Learning web

Coming up with your own short term goals is ideal, however here are some to get you started:

Be comfortable using Burp - intruder + repeater + proxy are the main three

#3 has tons of guides for this, and you will get practice while doing #3 and #4

Pick 2-3 items in the OWASP Top Ten and be able to do the following:

Describe the issue in your own words

Ex: "SQL Injection occurs when string concatenation is used to construct a SQL query"

Discuss how you find them

Ex: "You can search for SQL injection vulnerabilities using special SQL characters, such as ',", and more".

Discuss how you exploit them

Ex: "The common academic example of a payload containing SQL injection is '1 OR 1=1--'. More generically, you exploit these type of vulnerabilities by using the user controlled input to modify the SQL query being used by the application"

Discuss how you remediate them.

Ex: "Use parameterized queries where possible"

Getting to level 11 Natas

This helps you gain more familiarity with Burp, and web apps in general. These first levels largely teach basic concepts, as it's misconfigurations and simple bugs that are doable without burp
I'd highly recommend not cheating and looking up walkthroughs for any wargames if possible.

Hints are fine, but in my experience part of the learning experience is figuring it out on your own. This is what usually makes me have that "AHA!" moment, where things finally click together.

Case Study

Recently someone I've been guiding through getting into Infosec was putting some example reports together to show potential companies that he was serious. (Companies looking to hire - I will gladly pass on contact info if you're interested in hiring him).

Here's a snippet from one of his example reports:

Example report from someone trying to break into infosec

There are several positives here that are highlighted, and several things that aren't highlighted that are also positive. Let's start with the highlighted positives (In order from top to bottom):

The vulnerability is ranked with CVSS 3.1

He took the time to figure out how one bug severity rating system, while providing the data of how he arrived at the numbers (not shown in screenshot).
Is it super relevant? No but it shows he cared enough to find something that would fit his needs. Sometimes you have to find something "Good enough" for your current task

The remediation advice is solid for an entry level position

IMO it is pretty generic, but that's what you'd expect from an entry level applicant

(Not shown) the images say what he said it says, and do not look photo shopped

At previous employer, one of our competitors used to photoshop screenshots to use as proof of exploitation. They did not last long.

Other positives:

Grammar and flow is decent - this is important as we write TONS of reports
Description explains why the developer should care about the bug: You can login as admin!

Of course, there are some issues with this as well:

There are steps to reproduce, but you have to somewhat read between the lines. Creating a "Steps to reproduce" section will be helpful for someone to quickly validate you did what you said you did
There is no demonstrated understanding of the vulnerability

The report states that there is a SQL injection vulnerability. However, there is no explanation of why the payload works. I can't derive that you understand what SQL injection is by reading this report, which you will want companies to be able to do.
There is no description of what SQL injection is: The report describes why a developer should care (See #2 above), but doesn't describe what the actual vulnerability is.

Overall, I think with a bit of finessing, making the changes above, and practicing some more on natas or the Portswigger labs he'll be on his way to getting his first infosec job.

Hopefully this is helpful to someone out there - feel free to reach out with other questions / comments and I'll try to help you out!

Thursday, May 28, 2015

Exploiting memory corruption bugs in PHP Part 3: Popping Remote Shells

This took longer than expected, but it's a journey worth taking! This is less descriptive than other blog posts, because I'd like to try the video format out once. AKA, I'm lazy :)

Disappointingly for some, this will be a guide to create a POC. See the video at the end for what a my automated & remote exploit looks like, as well as tips & tricks to get things working in a real environment.

I kept the app stupid to make life easy. It literally attempts to serialize whatever data is sent to the page, after base64_decoding it. More complicated exploits will require a little more fines :).

We need a way to execute arbitrary PHP code. Sure, we could try to inject shellcode, but that's not very creative, and much noisier than executing arbitrary PHP code (Also impossible in newer versions of PHP, unless it's very tiny). If you recall from Part 1, to execute code, we need to call php_execute_script AND zend_eval_string. However, since we're going to be attacking "remotely", we also need to find the executor_globals, and the JMP_BUF. More on those later.

In short, we need to find (In no particular order):

executor_globals
zend_eval_string
JMP_BUF
A way to write arbitrary data to the stack

Thankfully, some of these we can find relatively quickly, since they're in the binary. Let's go ahead and dump the String table of the PHP binary.

Great! Let's go ahead and start pulling addresses from it as well, and verifying that those addresses are correct in GDB.

Finding zend_eval_string's address

GDB showing that the address is legitimate

Finding executor_global's address

GDB showing that the address is legitimate

Awesome! Now, how do we find JMP_BUF? Well, while reading the source code for the _zend_executor_globals object, we find an interesting piece of information. A JMP_BUF pointer, called bailout. Let's look at this in GDB, and see if the address is useful.

Printing the _zend_executor_globals object

A better way to check the value of bailout

Well, we have an address, but what does this address point to? Is it even useful? Well, in PHP, JMP_BUF is used as a type of "try{} - catch{}" at the C level, but more on this later.

We're now only lacking one thing: A way to inject the malicious string onto the stack. Stefan's method in the 2010 Syscan talk will be discussed further in this blog post. Since we won't be freeing arbitrary memory addresses, what's the next best thing we could do? We could write to the stack, but how? And how do we guarantee that it won't be overwritten in the future? (Google is your friend here ;) ).

An RFC, specifically RFC1867.

This RFC allows POST requests with the multipart/form-data to set stack buffers, which aren't overwritten by PHP completely (due to a number of reasons). Let's try this out by posting "the usual" in a file.

Awesome! We can write whatever we want to the stack now. But what do we want to write?

Hint: It's what we found earlier :)

Since we spent all this time figuring out what was where, we should probably use that! So, how do we lay it out? After some investigating, we need the stack to look like this:

Starting with the easy ones: We already have Zend_Eval String, since we found it earlier with readelf. Both the Ret Pointer and Zend_Bailout can be garbage, since we don't care about either of them (This will cause PHP to crash either way, without making the exploit more complicated). We can set the pointer_to_eval_string both times since we control the stack. So, here's what we have data on now:

POP; RET - ?????
XCHG EAX, ESP; RET - ????
Zend_Eval_String - 0x082da150
Zend_Bailout - 0x00000000
Pointer_To_Eval_String - 0xbfffda04
Ret Pointer - 0x00000000
Pointer_To_Eval_String - 0xbfffda04

Sweet! We have most of this already filled in, excellent! Unfortunately, it looks like we need some ROP gadgets. I prefer to use ROPGadget myself, but any gadget finding tool should suffice. We need to find XCHG EAX, ESP; RET (0x94 0xc3), and we also need to find POP EBP; RET (0x5d 0xc3). Once you have those gadgets, you're good to go! Let's go ahead and give it a shot!

Awesome! Now that we have these two locations (Why are the addresses so different? These are offsets!), we can complete the stack as follows:

POP; RET - 0x000e8e68
XCHG EAX, ESP; RET - 0x000057b7
Zend_Eval_String - 0x082da150
Zend_Bailout - 0x00000000
Pointer_To_Eval_String - 0xbfffda04
Ret Pointer - 0x00000000
Pointer_To_Eval_String - 0xbfffda04

Now that we have a complete view on how we want the stack to look, it's time to test this bad boy out. So, let's get to it!

Hmmm, that didn't work out quite as expected, now did it? In fact, that looks like our code is trying to jump to our gadget (c394). Unfortunately, there's one more thing you have to know. There's a special object that's required for these gadgets to be useful without crashing PHP, otherwise the gadget's aren't useful at all. I'll save you the trouble of guessing/digging around in old code to figure it out. The object you need is SPLObjectStorage. Knowing this, we'll have to reformat all of our attacks from before. Once we have done that, then get the following after running through our exploit again, as shown below:

Screenshot taken from fully automated Python exploit. See video for details

This is my stopping point for this method, as it only affects older versions of php (With these same gadgets). For newer versions of PHP, keep reading :).

Fortunately not much changes. Recall that we looked up the address of php_execute_script AND the jmp_buf. We'll need both for this version of the exploit.

jmp_buf is used by setjmp & longjmp, and saves the "environment" in case of an "unrecoverable" error. The jmpbuf is different depending on your architecture. For 32 bit, this is an unsigned array with 6 ints. If you're on 64 bit, there are 8 ints. Unfortunately, due to how this is implemented, there will be some digging in source code required for you to determine which position the registers are in within the jmp_buf. Here's an example of the jmp_buf layout. Of course, let's see how this looks in PHP...

Great! For my machine, the order of registers are: ebx, esp, ebp, esi, edi, eip. Since things worth doing in life aren't easy, this of course isn't a simple search! It looks like our edi & eip register are obfuscated. How're they obfuscated? By Glibc of course! Glibc has a macro called PTR_MANGLE. In the video we'll discuss how we crack the JMPBUF.

Once we have it cracked, we need a way to overwrite and free memory. Thankfully, the same object (SPLObjectStorage) allows us to free memory remotely. All that's left is writing data to the stack. Just like in Part 2, we abuse the memory caching of PHP. We free some memory, write a small 7 byte string to fill it, and when php overwrites part of our data, we do it again. This 2nd overwriting allows us to write an arbitrarily amount of data on the stack (I did not test for values over 2048). This data that we want to write is very similar to what we used in the previous ROP example. We of course need to "encrypt" our values for PTR_MANGLE. Here's some example output:

With that said, here's the video!

NOTES FOR THE VIDEO:

x86 Instruction Chart - http://sparksandflames.com/files/x86InstructionChart.html

Elf Header lowest 3 bits are 000

Elf layout - http://geezer.osdevbrasil.net/osd/exec/elf.txt

PMAP is your friend when trying to find the "Magic"

A look at PTR_MANGLE http://hmarco.org/bugs/CVE-2013-4788.html

Stay tuned for some iOS fun ;)

Monday, February 23, 2015

Exploiting memory corruption bugs in PHP (CVE-2014-8142 and CVE-2015-0231) Part 2: Remote Exploitation

In Part 1, we figured out how to locally exploit CVE-2014-8142 and CVE-2015-0231. In Part 2, we'll discuss remotely exploiting this vulnerability, and what we can steal from the application using the methods we discover. However, we will be focusing solely on CVE-2015-0231. Feel free to make the necessary changes as outlined in Part 1 to get CVE-2014-8142 working.

If you recall, Esser gave us code that leaked data at a non-attacker controlled address, which is shown below.

While this is useful, it's not as useful as the script we just wrote! We want to leak arbitrary memory remotely, not just random and basically useless memory. To do this, we need a way to:

Write arbitrary data (without crashing PHP)
Read arbitrary data (without crashing PHP)

As with anything in life, it's easier to tackle these problems one at a time! So, let's start with #1. We can write whatever we want, since we're sending our own object. However, we need a way to write useful information. Here's our last example:

We'll want to focus on the $fakezval variable. Is there someway we could write this zval remotely within a serialized object?? (Hint: It's a "feature" :D!)

As an aside, don't be stupid and lazy like I was. Read ALL of the code you're working on and around. I wasted a good 5-6 hours trying to figure this part out, until I face-punched myself a for missing this obvious portion of code.

Thankfully, there is, the 'S' character. The S character in a serialized string allows us to serialize and unserialize binary data. Let's play around with an S object, to get a better feel for how it works and how we can abuse it!

And let's go ahead and run our code, so we can see the awesomeness!

Hmm, this isn't exactly awesome. We're clearly doing something wrong here. To save everyone some face-palming, this error has to do with our S object, but feel free to count out the bytes by hand! And yes, I know PHP error messages suck.

In a normal serialized string, such as s:3:"123", the integer 3 is the number of characters that the string contains. However, in our above code, we have S:43:"\00\01\00\00AAAA\00\01\01\00\01\00\0B\BC\CC", which also has an integer (43), which is the length of our string, right?

Well, not exactly. We're not wanting a literal string here, we're wanting PHP to interpret this as binary data, as our string isn't actually 43 characters long, but 17. Let's try changing 43 to 17.

Excellent! But why did we use 17? Well, each \xx is considered to be 1 "character", which would leave us with 13. The "AAAA" characters are considered normal characters, so we add 4 to account for these. In short: Each \xx "triplet" is considered one character. Ok, now we can send this string, but how do we get information from the interpreter?

Before we leak arbitrary addresses, let's try to learn more about the server itself, as this will help us write a more reliable exploit (More on this in Part 3). To start, let's learn how to determine endianness of the server. To do so, we'll use a fake integer zval.

The idea stays the same:

Create an array of integers
Free the array we just created
Create our string from the previous example
Point to the reference of an integer that we just freed

Why use an integer zval instead of a string? Well, if you recall the zval struct, an integer will look like the following:

We set the value of the integer

00 01 00 00

In Little Endian, this is 0x100
In Big Endian, this is 0x1000

We then fill the next 8 bytes with junk

41 41 41 41 (Or: AAAA)

The next 8 bytes are the reference counter

00 01 01 00

The final 8 bytes are 01 (Type Integer) and then we fill the rest with junk

01 00 bc cc

Putting this all together gives us our "S" value! But how does it tell us the endianness of the server? Well, in the server response, if it returns 0x100 (256), we'll know it's little endian! If it returns 65536, we'll know it's big endian! Let's see it in action:

And the application response:

Excellent! We now know the endianness of the server! Of course, we're trying to leak arbitrary data still, so let's figure that part out next. Since we can successfully leak data we supplied, is it possible to leak data at an arbitrary address?

Since the blog didn't stop here, it is assumed the answer is yes! However, instead of a fake integer zval, we'll use a fake string zval instead, which will look like the following:

We set the pointer to our string data

00 80 04 08

We set the length of the string that we wish to extract (1024)

00 04 00 00

We set the reference count to be not zero (0x101)

00 01 01 00

Finally, we set the type to string, and fill the rest with junk

06 00 0b bc

Here's what our new script looks like:

And when we run it, we get:

Excellent! We can now dump arbitrary memory, but we're required to know the address, which isn't practical. How do we extract addresses remotely? We could use the code from Part 1 to leak addresses, but the data leaked there isn't pointing to anything significant. Is there some other mechanism we can abuse to extract information?

Thankfully, there is! See the code below:

And when we run it:

Now, this is a rather big array, so let's break it down. The general idea is:

Create Integer Array #1

This will empty the memory cache

Create Integer Array #2

Fill in the variable table

Free Integer Array #2

Free spots in the variable table

Create an array of objects mixed with the S object
Free that Array of mixed objects

See comment below

Point to an integer value in Array #2 that was freed and overwritten
Response contains valuable data

We free the Object array so that the first 4 bytes in the array are overwritten by memory cache, since it was just freed (and therefore available for writing). In doing so, the string pointer (Which was previous 0x41414141) now points to the previously freed memory object. TL;DR - We get legit addresses!

But which addresses do we want? We're looking for the address that proceeds: "\x00\x00\x00\x00\x05\x00". This will be an object handler address, which is a struct in the data segment. Now, we can read the entire object handler table, and get information into the code segment of PHP(which is what we're interested in, since we want to pop a shell).

Here's the command to see the hex values returned by PHP

cphp newLeak.php | xxd -ps | sed 's/[[:xdigit:]]\{2\}/\\x&/g'

Let's go ahead and let it run, and when we grep for "\x00\x00\x00\x00\x05\x00", we find the following address:

If we load this into GDB, we also see that it is in fact a pointer into our object handler. Don't forget to set a breakpoint (I set one in var_unserializer.c:337) before running!

And what we see is:

Before we get too excited, let's make sure these are actually pointing to interesting things, let's just take the first entry: 0x0830a640. Here's what's stored at this address:

Awesome! We can now see everything that we need to! Thanks to these methods, we can now steal:

The entire PHP binary (and it's data)
SSL Certs (via mod_ssl)
PHP Symbols
Addresses of other modules (and their data as well)

Stay tuned for Part 3, where we pop a shell! This technique can also be used for CVE-2015-0273, as well as other UAF exploits in PHP.

Part 3 will take a while longer to get out, as I need to play around with PHP some more (and do some reading) before I can finish the exploit. But it will come!