Binary Analysis & CTF stuff
- Reversing
- Debuggers
- Code Search
- Dynamic Binary Instrumentation
- Vulnerabilities
- Exploits
- Elf stuff
- Virus
- (De)Obfuscation
- CTF
- What is Binary Analysis
- Misc
- nmap - Digital forensics - pwn.college
- shellcode
Reversing
Disassembly
disassembler linear sweep Shingled recusrive traversal anti-disassembly https://github.com/AppleReer/Anti-Disassembly-On-Arm64
relative disassembler performance
capstone http://www.capstone-engine.org/ - disassembler. converse of key zydis xed distrom iced bddisasm yaxpeax
McSema - older trail of bits lifter. Uses llvm as IR remill lift to llvm bitcode anvill processing remill rellic makes C like code
BAP ANGR - VEX which is valgrind’s ir
gtirb ddisasm grammatech https://grammatech.github.io/gtirb/ https://github.com/GrammaTech/gtirb “GTIRB explicitly does NOT represent instructions or instruction semantics but does provide symbolic operand information and access to the bytes. “
Speculative disssembly https://ieeexplore.ieee.org/document/7745279 decode every offset. Refine blocks. Spedi, open source spcualtive disassembler https://github.com/abenkhadra/spedi Nucleus paper https://mistakenot.net/papers/eurosp-2017.pdf Compiler-Agnostic Function Detection in Binaries
superset disassembler kenneth https://personal.utdallas.edu/~hamlen/bauman18ndss.pdf civuentes thesis
probablistic disassembly using proabablistic datalog? bap mc + datalog?
Formally Verified Lifting of C-Compiled x86-64 Binaries Scalable validation of binary lifters
- Linear
- Recursive
-
Delay slots are an annoyance. Some architectures allow instructions to exist in the shadow of jump instructions that logically execute beofre the jump instruction. This makes sense from a micro architectural perspective, but it is bizarre to disassemble
https://rev.ng/ rev.ng paper osr - offset shift range. I;ve seen this called value set analysis? rev.ng thesis
Analyzing Memory Accesses in x86 Executables Reps and Balakrishnnan
byteweight - bap neural network thing identifies function starts
Ghidra repackaging: lifting bits sleigh pypcode https://github.com/StarCrossPortal/sleighcraft https://github.com/black-binary/sleigh
An Empirical Study on ARM Disassembly Tools
https://github.com/simonlindholm/asm-differ
Decompiler
Interesting. they are actuallyb trying to get source that compiles to exactly the assembly. They might have the original tool chain “matching” decompilation https://decomp.me/ https://github.com/simonlindholm/decomp-permuter https://github.com/matt-kempster/m2c mips and powerpc decompiler
https://github.com/zeldaret/oot ocarina of time deocmpilation https://objmap.zeldamods.org/#/map/z3,0,0 https://github.com/pmret/papermario paper maro https://github.com/pret/pokeemerald https://github.com/pret
https://github.com/leoetlino/project-restoration
https://github.com/trailofbits/BTIGhidra bincat retypd
https://x.com/mahal0z/status/1717600833037377613?s=20 https://www.zionbasque.com/files/publications/sailr_usenix24.pdf Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation Graph schema matching? Smart methodology. Take codebase, decompile, compare number of gotos in original vs decompiled functions. Find hotspots. Binary search for passes responsible
DREAM Pheonix
FoxDec - Formally verified x86-64 decompilation
relic - no more gotos
a comb for decompiled C https://rev.ng/downloads/asiaccs-2020-presentation.pdf
Cifuentes thesis kind of a ludicrous amount of info “Decompiler compiler” references BB91 BB92 bbl91 bbl93 bow91 bow93 A Compendium of Formal Techniques for Software Maintenance the redo project final report
Decompilation: The Enumeration of Types and Grammars From programs to object code and back again using logic programming: Compilation and decompilation The art of computer un-programming: Reverse engineering in Prolog Generating Decompilers
https://www2.cs.arizona.edu/~collberg/Teaching/620/1999/Handouts/rmeadows1.pdf
Polymorphic Type Inference for Machine Code
Static Single Assignment for Decompilation - Emmerik https://twitter.com/brightprogramer/status/1626841275801702400?s=20
Decompilers and beyond Ilfak Guilfanov, Hex-Rays SA, 2008
Type-Based Decompilation? (or Program Reconstruction via Type Reconstruction)- Alan Mycroft
Loop recovery
Getting structured control flow from CFG Stackify https://gitlab.inria.fr/why3/why3/-/merge_requests/453
havlak and tarjan testing redicibility with union find
A New Algorithm for Identifying Loops in Decompilation
dfs. label nodes. Intervals of labels are subsets of nodes. timestamp of first visit timestamp of last ivist. backedge forward edge, cross edge
https://zenodo.org/record/6727752#.YsxRw9rMI2w
Also… egraphs
webassembly is at the core of both of these Allllll things come around I wonder if webassembly would be a good universal disassembler ir someone should do that https://github.com/nforest/awesome-decompilation
Interactive
- IDA
- Ghidra
- Binary Ninja
- Cutter
- angr management
https://github.com/GrammaTech/gtirb-vscode
decompiler explorer Hmm. too bad it’s not a web service
Binary Ninja
SCC- shellcde compiler. Why is this a top level thing?
Binary View Making a plugin - has remote denug to vs code https://docs.binary.ninja/dev/plugins.html
current_function
bv.functions
f.callers f.callees
Types
https://docs.binary.ninja/guide/type.html#boolean ` bv.parse_type_string(“uint64_t”) ` slow but convenient
Type.bool Type.char
#!/usr/bin/env python3
import binaryninja
with binaryninja.open_view("/bin/ls") as bv:
print(f"Opening {bv.file.filename} which has {len(list(bv.functions))} functions")
IR Tower
Lifted IL
LLIL
HLIL
Ghidra
See ghidra notes
radare
https://book.rada.re/first_steps/history.html
radare2.ra
radare2.radiff2
radare2.sleighc
https://github.com/radareorg/r2ghidra
echo "
int foo(int x) {return x*x + 1;}
" > /tmp/foo.c
gcc /tmp/foo.c -c -o /tmp/foo.o
radare2.r2
Angr
import angr #, monkeyhex
proj = angr.Project('/bin/true')
state = proj.factory.entry_state()
code = '''
int fact(int x){
int acc = 1;
while(x > 0){
acc *= x;
x--;
}
return acc;
}
'''
import tempfile
import subprocess
import angr #, monkeyhex
import os
with tempfile.NamedTemporaryFile(suffix=".c") as fp:
with tempfile.TemporaryDirectory() as mydir:
fp.write(code.encode())
fp.flush()
fp.seek(0)
print(fp.readlines())
print(fp.name)
print(mydir)
outfile = mydir + "/fact"
print(outfile)
print(subprocess.run(["gcc", "-g", "-c","-O1", "-o", outfile, fp.name], check=True))
print(os.listdir(mydir))
print(subprocess.run(["objdump", "-d", outfile], check=True))
proj = angr.Project(outfile)
print(dir(proj))
print(proj.arch)
print(proj.entry)
print(proj.filename)
print(proj.loader)
block = proj.factory.block(proj.entry)
print(block.pp())
print(block)
print(block.vex)
print(block.instructions) # numebr of instryuctions
print(dir(proj.analyses))
state = proj.factory.entry_state()
print(dir(state))
print(proj.loader.find_symbol("fact"))
#state = proj.factory.entry_state()
print(state.step())
print(dir(state.solver))
print(state.solver.all_variables)
echo "
int fact(int x){
int acc = 1;
for (int i = 1; i < x; i++){
acc *= i;
}
return acc;
}
" > /tmp/fact.c
gcc /tmp/fact.c -c -o /tmp/fact
echo "
int max(int x){
return x > 0 ? x : -x;
}
" > /tmp/max.c
gcc /tmp/max.c -c -o /tmp/max
import angr
p = angr.Project('/tmp/fact')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
print(entry_func.name)
blocks = list(entry_func.blocks)
print(blocks)
irsb = blocks[0].vex
print(dir(entry_func))
print(entry_func.block_addrs)
print(dir(blocks[0]))
print(entry_func.addr)
import angr
p = angr.Project('/tmp/max')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
print(entry_func.name)
blocks = list(entry_func.blocks)
print(blocks)
irsb = blocks[0].vex
from pyvex.stmt import *
from pyvex.expr import *
jmp_table = ["""
jump_table:
switch(state.gr[184]){
"""]
#for addr in addrs:
# jmp_table.append(f"case 0x{addr:x}: goto {label(addr)};")
#jmp_table.append("""
#default:
# assert(false); // unexpected indirect jump
#}
#""")
def proc_binop(expr):
#print(expr.op)
# TODO: perhaps I need casts here to signed unsigned?
if expr.op == "Iop_Sub64":
return f"({proc_expr(expr.args[0])} - {proc_expr(expr.args[1])})"
elif expr.op == "Iop_Sub32":
return f"({proc_expr(expr.args[0])} - {proc_expr(expr.args[1])})"
elif expr.op == "Iop_Add64":
return f"({proc_expr(expr.args[0])} + {proc_expr(expr.args[1])})"
elif expr.op == "Iop_Add32":
return f"({proc_expr(expr.args[0])} + {proc_expr(expr.args[1])})"
elif expr.op == "Iop_Shr64":
return f"({proc_expr(expr.args[0])} >> {proc_expr(expr.args[1])})"
elif expr.op == "Iop_And64":
return f"({proc_expr(expr.args[0])} & {proc_expr(expr.args[1])})"
elif expr.op == "Iop_Xor64":
return f"({proc_expr(expr.args[0])} & {proc_expr(expr.args[1])})"
elif expr.op == "Iop_CmpLT32S":
return f"({proc_expr(expr.args[0])} < {proc_expr(expr.args[1])})"
elif expr.op == "Iop_Mul32":
return f"({proc_expr(expr.args[0])} * {proc_expr(expr.args[1])})"
else:
assert False, f"Unimplemented binop {expr.op}"
def proc_unop(expr):
#print(expr.op)
if expr.op == "Iop_64to32":
return f"((int32_t) {proc_expr(expr.args[0])})"
elif expr.op == "Iop_32Uto64":
return f"((int64_t) {proc_expr(expr.args[0])})"
elif expr.op == "Iop_1Uto64":
return f"((int64_t) {proc_expr(expr.args[0])})"
elif expr.op == "Iop_64to1":
return f"((bool) {proc_expr(expr.args[0])})"
else:
assert False, f"Unimplemented unop {expr.op}"
def proc_expr(expr : IRExpr):
#print(expr)
if isinstance(expr, Binder):
assert False
elif isinstance(expr, VECRET):
assert False
elif isinstance(expr, GSPTR):
assert False
elif isinstance(expr, RdTmp):
return str(expr)
elif isinstance(expr, Get):
assert expr.ty == "Ity_I64"
return f"state->gr[{expr.offset}]"
elif isinstance(expr, Qop):
assert False
elif isinstance(expr, Triop):
assert False
elif isinstance(expr, Binop):
return proc_binop(expr)
elif isinstance(expr, Unop):
return proc_unop(expr)
elif isinstance(expr, Load):
#assert expr.ty == "Ity_I32" # TODO. We need to deal with these types and endianess. Yuck
return f"state->mem[{expr.addr}]"
elif isinstance(expr, Const):
return str(expr)
elif isinstance(expr, ITE):
return f"{proc_expr(expr.cond)} ? {proc_expr(expr.iftrue)}: {proc_expr(expr.iffalse)}"
elif isinstance(expr, CCall):
assert False
else:
assert False
def label(addr):
return f"L_0x{addr:x}"
def proc_type(typ):
if ty == "Ity_I32":
return "int32_t"
elif ty == "ItyI64":
return "int64_t"
else:
assert False
def proc_stmt(stmt : IRStmt):
#print(stmt.tag)
if isinstance(stmt, NoOp):
return "// NoOp"
elif isinstance(stmt, IMark):
# assert PC = expected pc here?
# f"assert(state->gr[{}] == stmt.addr);"
#return f"break; case 0x{stmt.addr:x}: //{label(stmt.addr)}:" # also print original assembly in comment
return f"//{label(stmt.addr)}:"
elif isinstance(stmt, AbiHint):
return f"return; // {stmt}" # TODO: add return parameter? Or state is good. Is abihint always at returns?
elif isinstance(stmt, Put):
return f"state->gr[{stmt.offset}] = {proc_expr(stmt.data)};"
assert False
elif isinstance(stmt, PutI):
assert False
elif isinstance(stmt, WrTmp):
return f"t{stmt.tmp} = {proc_expr(stmt.data)};"
elif isinstance(stmt, Store):
return f"state->mem[{stmt.addr}] = {proc_expr(stmt.data)};"
elif isinstance(stmt, CAS):
assert False
elif isinstance(stmt, LLSC):
assert False
elif isinstance(stmt, MBE):
assert False
elif isinstance(stmt, Dirty):
assert False
elif isinstance(stmt, Exit):
print(stmt.jk)
assert stmt.jk == "Ijk_Boring"
return f"if({proc_expr(stmt.guard)}){{ state->gr[{stmt.offsIP}] = 0x{stmt.dst.value:x}; break; }}"
# TODO deal with other jumpkind
# What represents an indirect jump? goto jump_table;
# elif stmt.jk == "Ijk_Ret" f"return;"
# elif stmt.jk == Ijk_Call hmmm. f"{funname}()" maybe?
# elif stmt.jk == "Ijk_Exit"
elif isinstance(stmt, LoadG):
assert False
else:
print("unrecognized IRStmt")
assert False
output = []
tmps = set()
for block in blocks:
#print(dir(block))
#print(block.instructions)
output.append("/*")
output.append(str(block.disassembly))
output.append("*/")
output.append(f"case 0x{block.addr:x}:")
for stmt in block.vex.statements:
#stmt.pp()
output.append(proc_stmt(stmt))
if isinstance(stmt,WrTmp):
tmps.add(stmt.tmp)
# output.append("goto jump_table;")
output.append("break;")
output.append("default: assert(0); // Unexpected PC value. Something has gone awry.")
#print("\n".join(output))
header = """
#include <stdint.h>
#include <assert.h>
#include <stdbool.h>
#define PUT(reg) reg
#define PC 184
typedef struct state_t
{
int64_t *mem;
int64_t *gr;
} state_t;
"""
with open("/tmp/decomp.c", "w") as file:
file.write(header)
file.write(f"void {entry_func.name}_decomp(state_t *state){{\n") # use entry_func.name?
file.write("int " + ",".join([f"t{tmp}" for tmp in tmps]) + ";\n") # declare temps
file.write(f"state->gr[PC] = 0x{entry_func.addr:x};\n") # initilizae PC to entry point
file.write("while(1){\n") # interpreter loop invariant on PC? Could make gas?
file.write("switch(state->gr[PC]){")
file.writelines([x + "\n" for x in output])
file.write("}}}\n")
clang-format -i --style=google /tmp/decomp.c
cat /tmp/decomp.c
gcc /tmp/decomp.c -O2 -c -o /tmp/decomp.o -Wall -Wextra -Wcast-align -Wcast-qual -Wmissing-declarations
esbmc /tmp/decomp.c --function max_decomp #--goto-functions-only
Control flow encoding. Do I do separate jump table, go to jump table every time?
while(true){switch state.gr[pc]{ case: case: case: case default: } }
This is fairly conservative.
Big block encoding trusts fall through behavior.
Byte address and then cast mem pointers.
import angr
p = angr.Project('/tmp/fact')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
print(entry_func.name)
blocks = list(entry_func.blocks)
print(blocks)
irsb = blocks[0].vex
import pyvex
stmts = []
counter = 0
def fresh():
global counter
counter += 1
return f"%v{counter}"
def print_expr(dst, expr):
if isinstance(expr, pyvex.expr.Const):
stmts.append( { "op": "const", "type": "int", dest: dst, "value": expr.con })
elif isinstance(expr, pyvex.expr.Binop):
args = [fresh() for _ in range(2)]
for a, e in zip(args, expr.child_expressions):
print_expr(a,e)
stmts.append({ "op": expr.op, "type": "int", "dest": dst, "args": args })
elif isinstance(expr, pyvex.expr.RdTmp):
print(expr.tmp)
#stmts.append({ "op": "id", "type": "int", "dest": dst, "args": })
elif isinstance(expr, pyvex.expr.Get):
return f"%{expr.offset}"
else:
print(expr)
print(type(expr))
assert False
for stmt in irsb.statements:
stmt.pp()
if isinstance(stmt, pyvex.IRStmt.Store):
expr = print_expr(stmt.data)
print(stmt.end)
assert stmt.end == "Iend_LE"
expr["op"] = "store"
expr["dest"] = expr.addr
stmts.append(expr)
elif isinstance(stmt, pyvex.IRStmt.Put):
print_expr(stmt.data)
print(stmt.offset)
elif isinstance(stmt, pyvex.IRStmt.WrTmp):
print_expr(stmt.data)
print(stmt.tmp)
elif isinstance(stmt, pyvex.IRStmt.IMark):
pass
else:
print(stmt)
print(type(stmt))
assert "unrecognized stmt" == None
print(stmts)
Patching
See notes on patching
Diffing
see note on patching
Binary reversing
https://corte.si/posts/visualisation/binvis/index.html hilbert curves for binary vsiualization benford’s law
binwalk ofrak fra cwechecker
entropy visualization
https://www.usenix.org/system/files/sec22-burk.pdf Decomperson: How Humans Decompile and What We Can Learn From It
Debuggers
See note on debuggers
Code Search
https://github.com/bstee615/tree-climber https://github.com/langston-barrett/tree-sitter-souffle https://github.com/weggli-rs/weggli https://github.com/BrianHicks/tree-grepper isn’t semgrep basically a treesitter grepper?
Joern codeql semgrep
Dynamic Binary Instrumentation
dynamorio https://dynamorio.org/ frida https://frida.re/ injects a quickjs huh pin https://www.intel.com/content/www/us/en/developer/articles/tool/pin-a-dynamic-binary-instrumentation-tool.html
Vulnerabilities
https://github.com/river-li/awesome-uefi-security
CWE - common weakenss enumeration https://attack.mitre.org/
integer overflow https://cwe.mitre.org/data/definitions/190.html
sophos - comprehensive exploit prevention 2018
Current State of Exploit Dev 2020
Web App
Burp suite
Windows
microsoft exploits and exploit kits
Mitigations
Control Flow integrity is a broad term for many of these CONFIRM: Evaluating Compatibility and Relevance of Control-flow Integrity Protections for Modern Software
DEP - data execution prevention executable space protection This says DEP is Windows terminology? NX bit
shadow stack
control flow guard - windows reverse flow guard extreme flow guard kernel data protection
stack canary https://www.keil.com/support/man/docs/armclang_ref/armclang_ref_cjh1548250046139.htm -fstack-protector. Guard variable put on stack SSP stack smashing protection. Stackguard, Propolice. https://embeddedartistry.com/blog/2020/05/18/implementing-stack-smashing-protection-for-microcontrollers-and-embedded-artistrys-libc/ Buffer overflow protection
ASLR ASLP A Address Space Layout Randomization. Libraries are linked in at a different location. This make code reuse in an exploit more difficult.
Fat pointers
endbr intel control flow enforcement technology (CET). Valid locations for indirect jumps.
ASLR - Addresses are randomized cat /proc/mem/self ? To look at what actually loaded Also ldd shows were libraries get loaded in memory Stack canaries - set once per binary run, so with forking you can brute force them or maybe leak them?
checksec tells you about which things are enabled. https://opensource.com/article/21/6/linux-checksec which also has a rundown of the different things and how you could check them manually. Can output into xml, json, csv
gcc options -no-pie -pie -fpie -no-stack-protection -fstack-protector-all -z execstack makes stack executable
RELRO - relocation read only. GOT table becomes read only. Prevents relocation attacks
binary diversification - compiler differently every time. code reuse becomes way harder diversification make many versions of binary to make code reuse attacks harder. disunison
Exploits
Buffer Overflows
https://punkx.org/overflow/ a game version of buffer overflows. cool.
buffer overflow When a buffer overflow occurs you are writing to memory that possibly had a different purpose. Maybe other stack variables, maybe return address pointers, maybe over heap metadata.
Sanitization of user input Off by one errors String termination
Primitives
AWP Arbitrary write primitive CWE-123: Write-what-where Condition ARP arbitrary read primitive
Return Oriented Programming (ROP)
ropgadget ropium supports semantics queries. hmm in cpp. V impressive.
return to libc libc is very common and you can weave together libc calls. “Solar Designer” solar designer 1997
smashing the stack for fun and profit - stacks are no longer executable
https://acmccs.github.io/papers/geometry-ccs07.pdf geometry of innocent flesh on the bone. ROP
https://github.com/sashs/Ropper
pop_gadget ; value ; nextgadget loads from stack into register
pure buffer overflow from command line:
#include <stdio.h>
int main(int argc, char *argv[])
{
char buf[256];
memcpy(buf, argv[1],strlen(argv[1]));
printf(buf);
}
rop prevention by binary rewriting ropguard
stack pivoting moving over to a different stack.
ret2libc ret2dlresolve ret2csu ret2plt
https://medium.com/cyber-unbound/buffer-overflows-ret2libc-ret2plt-and-rop-e2695c103c4c
JOP COP
jop rocket. blackhat talk
https://i.blackhat.com/asia-21/Thursday-Handouts/as-21-Brizendine-Babcock-Prebuilt-Jop-Chains-With-The-Jop-Rocket-wp.pdf
https://github.com/Bw3ll/JOP_ROCKET
Data Oriented Programming (DOP)
Heap
https://c4ebt.github.io/2021/01/22/House-of-Rust.html house of rust
If you can overwrite a struct that contains a pointer, you can use this to obtain reads or writes when that pointer is read or written. If the struct contains a code pointer, you get control flow execution.
Heap layout problem Heap layout manipulation
Metadata
Double free Use after free
http://phrack.org/issues/61/6.html advanced doug lea malloc hacking
https://milianw.de/blog/heaptrack-a-heap-memory-profiler-for-linux.html valgrind massif perf-mem, valgrind massif, and heaptrack
https://heap-exploitation.dhavalkapil.com/ https://github.com/DhavalKapil/heap-exploitation
advanced doug lea malloc - phreak post
glibc.
ldd /bib/ls
libc.so.6 - symbolic link probably
glibc 2.27
libc-2.31.so actually
pie
You can run it? /lib/x86_64-linux-gnu/libc.so.6
malloc chunks of memory new/delete make_unique
pwndbg vis
let’s you see allocated chunks. How do it do it?
vmmap also shows memory regions classified
top chunk. I think the top chunk is resized to hand out new memory.
Metadata - size field. prev_in_use flag.
Allocator hands out in discrete sizes, not arbitrarily flexible
Playing for K(H)eaps: Understanding and Improving Linux Kernel Exploit Reliability defragmentation
slake
libheap examine glibc heap in gdb. seems like there is a python model of heap in here.
https://ir0nstone.gitbook.io/notes/types/heap/bins
chunks - has prev size, size, info bits. Free chunks have pointers in content top chunk - large piece of memory new chunks are carved out of last remainder chunk tcache fastbin - last in first out. a stacklike structure
large bins - varying size. chunks put in order
printf(“%p”) is your friend
free lists when we call free memory fastbin. hold free checks of a specific size set context-sections code b main vis to visualize heap fastbins command -x20 - 0x80 arenas main arena - glibc data section pwndbg arena
fd forward pointer
House of Force
fastbin dup
arbitrary write double free fastbin will return twice. frame command to context code
immediate double free is prtoected
free_hook
find_fake_fast one_gadget - constraints https://github.com/david942j/one_gadget
malloc_alloc
unsafe unlink
n vis
unsorted bin doubly linked circular free unsorted chunks have forward and backward pointers. new free added to head. malloced from tail
adjeacent chunks get considlated prev in use flags unlinking. chunks being removed from doubly linked list
2000 solar designer voodoo malloc tricks
Type Confusion
Kernel
double free ps4 https://github.com/Cryptogenic/Exploit-Writeups/blob/master/FreeBSD/PS4%205.05%20BPF%20Double%20Free%20Kernel%20Exploit%20Writeup.md
https://twitter.com/sirdarckcat/status/1584846038866989056?s=20&t=udFq9u7zLY-5-Ae6VrdqeQ Joy of explooitation the kernel http://slides.kernel.kitchen
Browser
https://twitter.com/5aelo?lang=en Attacking JavaScript Engines: A case study of JavaScriptCore and CVE-2016-4622
How I started chasing speculative type confusion bugs in the kernel and ended up with ‘real’ ones
Return to sender Detecting kernel exploits with eBPF https://github.com/Gui774ume/krie
Automated Exploit Generation (AEG)
usenix security heaphopper angr symbolic analysis for heap exploits? archeap maze toward heap feng shui backward search from heaphopper teerex discover of memory corrupton vulen [symcc]
Elf stuff
See linker elfmaster elf reverse
example interesting elf files overlaying headers. smallest that doesn’t voilate spec
binary golf workshop https://codegolf.stackexchange.com/ size coding dead bytes libgolf Elf Binary Mangling Pt. 4: Limit Break
Virus
https://tmpout.sh/2/6.html Preloading the linker for fun and profit ~ elfmaster https://arcana-technologies.io/blog/the-philosophy-of-elves-in-linux-threat-detection/
https://www.vx-underground.org/#E:/root vxunderground VXHeaven - mirror https://github.com/opsxcq/mirror-vxheaven.org tmp.out https://tmpout.sh/3/02.html second to hell interview https://github.com/SPTHvx/SPTH roy g biv hh86 herm1t
https://spthvx.github.io/ezines/ ezines
Language infection project Valhalla 0-4 zine Metamorphism, Formal grammars and Undecidable Code Mutatio “Polymorphism and Grammars” - qozah
see herm1t’s metamorphic Linux virus Linux.Lacrimae in EOF #2 and his article “Recompiling the Metamorphism”
From the design of a generic metamorphic engine to a black-box classification of antivirus detection techniques - tau obfuscation UNIX ELF Parasites and virus - Silvio Cesare
More virus ezines https://ivanlef0u.fr/repo/madchat/vxdevl/vdat/ezines1.htm ~2000 mostly
https://privacy-pc.com/articles/vx-the-virus-underground.html initeresting little article
https://www.exploit-db.com/ also has zines in papers https://gitlab.com/exploit-database/exploitdb
antivirus AV
metamorphic / polymorphic - mutate themselves to avoid simple detection Relationship to quines?
(De)Obfuscation
https://github.com/mrphrazer/obfuscation_detection https://obfuscator.re/omvll/ like obfuscation passes for llvm? https://github.com/javascript-obfuscator/javascript-obfuscator https://github.com/HikariObfuscator/Hikari llvm archived https://github.com/Guardsquare/proguard java https://obfuscator.re/dprotect/ bytecode android https://github.com/dashingsoft/pyarmor python
anti hooking - maybe more generally these are anti-RE techniques. debugger detection. VM detection. control flow breaking opaque constants
virtual machine VM obfuscation https://reverseengineering.stackexchange.com/questions/24956/virtual-machine-code-obfuscation-implementation-details jit anti alias analysis self modification
https://tigress.wtf/index.html - C source to source https://tigress.wtf/transformations.html list of transformations
Mixed Boolean Arithmetic (MBA)
https://arxiv.org/pdf/2209.06335.pdf Efficient Deobfuscation of Linear Mixed Boolean-Arithmetic Expressions ∗
https://www.youtube.com/watch?v=5yDzbFz2yWo&ab_channel=HackInTheBoxSecurityConference HITB2022SIN #LAB Advanced Code Obfuscation With MBA Expressions - Arnau Gàmez Montolio
https://archive.bar/pdfs/bar2020-preprint9.pdf QSynth - A Program Synthesis based Approach for Binary Code Deobfuscation https://github.com/quarkslab/qsynthesis
https://secret.club/2022/08/08/eqsat-oracle-synthesis.html https://github.com/fvrmatteo/oracle-synthesis-meets-equality-saturation
mixed Boolean arithmetic https://github.com/softsec-unh/MBA-Solver/blob/main/README.md mbasolver https://github.com/quarkslab/sspam/blob/master/README.md
https://github.com/RUB-SysSec/syntia/blob/master/README.md https://github.com/astean1001/ProMBA simb gamba goomba msynth https://github.com/mrphrazer/msynth
https://dl.acm.org/doi/pdf/10.1145/3453483.3454068 Boosting SMT Solver Performance on Mixed-Bitwise-Arithmetic Expressions
Viruses try to obfuscate themselves t avd detection
CTF
SUID https://gtfobins.github.io/ GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems.
Misconfiguration https://osquery.readthedocs.io/en/stable/
commando vm
yolo space hacker. steam game of ctf stuff exploit_me very vulnerable example applications pwntools
pwndbg
cyclic_gen
What is Binary Analysis
Dynamic - It feels like you’re running the binary in some sense. Maybe on an emulated environment Static - It feels like you’re not running the binary
Fuzzing is definitely dynamic. Dataflow analysis on a CFG is static There are greys areas. Symbolic execution starts to feel like a grey area. I would consider it to be largely dynamic, but you are executing in a rather odd way.
Trying to understand a binary Why?
- Finding vulnerabilities for defense or offense
- buffer overflows
- double frees
- use after frees
- memory leaks - just bad performance
- info leaks - bad security
- Verification - Did your compiler produce a thing that does what your code said?
- Reversing/Cracking closed source software.
- Patching and Code injection. Finding Bugs for use in speed runs. Game Genie.
- Auditing
- Aids for manual RE work. RE is useful because things may be undocumented intentionally or otherwise. You want to reuse a chip, or turn on secret functionality, or reverse a protocol.
- Discovery of patent violation or GPL violations
- Comparing programs. Discovering what has been patched.
I don’t want my information stolen or held ransom. I don’t want people peeping in on my conversations. I don’t want my computer wrecked. These are all malicious actors We also don’t want our planes and rockets crashing. This does not require maliciousness on anyone’s part persay.
- Symbol recovery
- Disassembly
- CFG recovery
- Taint tracking
- symbolic execution
https://github.com/analysis-tools-dev/dynamic-analysis A list of tools https://analysis-tools.dev/ CSE597: Binary-level program analysis Spring 19 Gang Tan
Program Analysis
What’s the difference? Binaries are less structured than what you’ll typically find talked about in program analysis literature.
Binaries are tough because we have tossed away or the coupling has become very loose between high level intent and constructs and what is there.
How are binaries made
C preprocessor -> Maintains file number information isn’t that interesting
C compiler -> assembly. You can ask for this assembly with -S
. You can also
Or more cut up
C -> IR
IR -> MIR (what does gcc do? RTL right? )
MIR -> Asm
Misc
https://twitter.com/33y0re/status/1528719776142475264?s=20&t=vcTgXMu6ZeZRjj7LiSMcFg kernel exploitation brwoser exploitation blog posts
- Arbitrary code guard
- Code interity guard
- hypervisor protected code integrity (HVCI) “the acg of kernel mode”
- vistualization based security VBS. credential guard
-
local privilege escalation (LPE)
People
-
elfmaster ryan oneill
cfi directives - call frame information
joern.io https://github.com/RUB-SysSec/EvilCoder automatic bug insertion using joern phaser slither
Hiding instructions in instructions https://lucris.lub.lu.se/ws/portalfiles/portal/78489284/nop_obfs.pdf
Thomas stars https://github.com/bsoddreams?tab=stars
SGX enclaves
obfuscation snapchat ollvm vmprotect https://github.com/void-stack/VMUnprotect opaque preciates - one branch always taken
chris domas https://github.com/xoreaxeaxeax/movfuscator tom 7
firmadyne emulating and analyzing firmware
burp suite idor - autorize
bloodhound
shellcode encoding and decoding - sometimes you need to avoid things like \0 termination. https://www.ired.team/offensive-security/code-injection-process-injection/writing-custom-shellcode-encoders-and-decoders Shellcode generators. What do they do? shellcode database
google dorking Like using google with special commands? Why “dork”? shodan
nmap
-A -T4. OS detection nmap nse - nmap scriping engine. There is a folder of scripts
p0f - passive sniffing. fingerprinting
malware reversing class live overflow youtube exploit education rop emporium linux exploitation course yara - patterns to recognize malware. Byte level patterns? Sigma snort
SIEM IDS - intrusin detection systems https://en.wikipedia.org/wiki/Intrusion_detection_system
shellcode encoder/decoder/generator https://www.msreverseengineering.com/blog/2017/7/15/the-synesthesia-shellcode-generator-code-release-and-future-directions synesthesia
FLIRT https://github.com/avast/retdec
https://github.com/grimm-co/NotQuite0DayFriday exploit examples
Gray Hat Hacking The Shellcoder’s handbook Attacking network Protocols Implementing Effective Code Review
https://objective-see.com/blog/blog_0x64.html
Hacking: http://langsec.org/papers/Bratus.pdf sergey weird machine paper
https://github.com/sashs/filebytes
blackhat defcon bluehat ccc https://en.wikipedia.org/wiki/Security_BSides bsides ctf project zero kpaersky blog https://usa.kaspersky.com/blog/ spectre/meltdown https://www.youtube.com/watch?v=b7urNgLPJiQ&ab_channel=PinkDraconian
return oriented programming sounds like my backwards pass. Huh.
Digital forensics
- Volatility https://www.volatilityfoundation.org/
- wireshark
- sleuth kit?
radare2, a binary analysis thingo. rax is useful for conversion of hex
binary ninja
ghidra
IDA
RSACTFTool
factordb
manticore
Maybe we should get a docker of all sorts of tools. Kali Linux? https://github.com/zardus/ctf-tools
klee, afl, other fuzzers? valgrind
cwe-checker
shellcode
ROP
https://quipqiup.com/ - solve substitution cyphers
https://github.com/openwall/john john the ripper. Brute force password cracker
ropper
Best CTFs. I probably don’t want the most prestigious ones? They’ll be too high level? I want the simple stuff
https://ctf101.org/ - check out the heap exploitation github thing
pwntools
metasploit, pacu - aws, cobalt strike
and the pwn category of ctf
ROP JOP SROP BOP - block oriented
return 2 libc - a subset of rop?
pwn.college
ryan chapman syscall
https://github.com/revng/revng
privilege escalation - getuid effective id.. Inherit user and group from parent process. switching to user resets the setuid bit. sticky bits id command
shellcode - binary that launchs the shell system call execv(“/bin/sh”, NULL, NULL) - args and env params
intel vs at&t syntax Load up addresses constantrs in binary with .string gcc -static -nostdlib objcopy –section .text=outfile exiting cleanly is smart. Helps know what is screwing up ldd
Trying out shellcode mmap. mprotect? read() deref function pointer
gdb x for eXamine $rsi x/5i $rip gives assembly? x/gx break *0xx040404 n next s step ni si
strace is useful first debugging
Intro
system calls set rax to syscall number. call syscall instruction https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/ man yada strace
- fork
- execve
- read
- write
- wait
- brk - program brk. change size of data segment. sbrk by increments. sbrk(0) returns current address of break
stack. rbp, rsp. stack grows down decreasing. Rsp + 0x8 is on stack, rbp - 8 is on stack most systems are little endian calling conventions. rdi rsi rdx rcx r8 r9, return in rax rbx rbp r12 r13 r14 r15 are callee saved. guaranteed not smashed
http://ref.x86asm.net/coder64.html opcode listing https://github.com/yrp604/rappel - assembly repl https://github.com/zardus/ctf-tools
binary files
file - tells info about file
elf - interpreter,
- sections - text, plt/got resolve and siprach library calls, data preinitilize data, rodata, global read only,, bss for uniitialized data. sections are not required to run a binary
- symbols -
- segments - where to load
readelf, objdump, nm - reads symbols, patchelf, objcopy, strip, kaitai struct https://www.intezer.com/blog/malware-analysis/executable-linkable-format-101-part-2-symbols/
process loading
https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-4.html what to load. look for #! or elf magic. /proc/sys/fs/binsmt_misc can match a string there. hand off to elf defined interpeter is dynamically linked.
Then it’s onto ld probably. LD_PRELOAD,, LD_LIBRARY_PATH,, DT_RUNTIME in binary file,, system wide /etc/ld.so.conf, /lib and /usr/lib relocations updated /proc/self/maps https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4 libc is almost always linked. printf, scanf, socket, atoi, amlloc, free
shellcode
Protection
ASLR - Addresses are randomized cat /proc/mem/self ? To look at what actually loaded Also ldd shows were libraries get loaded in memory Stack canaries - set once per binary run, so with forking you can brute force them or maybe leak them?
checksec tells you about which things are enabled.
gcc options -no-pie -no-stack-protection
pwntools
attaching to gdb and/or a process is really useful. cyclic bytes can let you localize what ends up where in a buffer overflow for example cyclic_find