Getting Bap in the Browser 1
At my day job, I spend a lot of time loving and fighting bap, the binary analysis platform.
For some reason, I have a mind virus that makes it highly compelling for me to compile things to javascript. I like being able to share. I think if I have some interesting idea or doo dad, the odds of anyone downloading and compiling some script of mine to try it out are almost 0. But, if it runs right in their browser, hey maybe they’ll give it a try, appreciate it.
Hence, a very compelling aspect of ocaml is the existence of the js_of_ocaml compiler. It converts ocaml bytecode produced by ocamlc
into javascript. There are often hiccups in these things.
I really want bap to run in the browser, because I think we do cool stuff at work and want to show it off. It is also irritating to me that I am developing all this knowledge. Bap has a number of useful IRs, data structures, and compiler passes inside it.
Bap is a very complex system. I have gotten much further than I thought I would in getting it running in browser, so I’m writing up what I’ve got so far. Here is the github repo with the bits and pieces.
It’s been a weird week. I was supposed to go home but my flight got cancelled because of omicron covid so I had a rather desolate week fighting build systems, reading bap source, and tracking down javascript bugs instead. Still ultimately productive feeling. I had some big wins.
Comments on Ocaml compiling flags and dune
Ocaml has two main compiler exectuables, ocamlc
and ocamlopt
. ocamlc
produces bytecode and ocamlopt
produces native code. By and large, I use native code generation. It produces faster code. Js_of_ocaml operates on bytecode however.
The ocaml manual for these tools is very nice btw. One should always give such things at least one read over to kind of know what is there
I have realized that using dune
had crippled my understanding of the ocaml compilation process, since it takes care of it for you. This is a problem of all build tools. I have somehow gotten this far by barely understanding build tools and linkers and just smashing my way through via google-fu and guess work. One way to start learning is using --verbose
which will tell you the commands dune is running. The will look huge though.
It is somewhat surprising if you’ve always dealt with package managers and build tools what the raw way of adding libraries to a compiler is. The flags for this are rather similar for many compilers. -I
adds to the include directories. For every library in ocaml, you actually need to write the file name of the library to the compiler. libraries are actually executed in the order you specify them to the compiler. Consider adding print statements to some ocaml files. The order they print depends on the order you hand them to the compiler. Also, this perhaps help explain why the dependencies of your modules need to form a DAG, for which you give the compiler a valid topological ordering.
Ocaml has some different file types. .cmo
are object files (corresponding to .ml files), .cmi
are interface (corresponding to .mli files ish), .cma
are archive files which are bundles. This is also the case for C compilers, which have .o
, .h
, and .a
files, which are kind of similar. THe native compilers has variants of these .cmx .cmxi .cmxa
.
To build something using raw js_of_ocaml run ocamlc on the different bits to get an .cmo
or a .out
file. I kept screwing up whether I was using .cmo
or .out
files. The first is a library, the second is a linked runnable file. You can tell the difference between these things also by running the command file
on them.
ocamlfind is a separate package finding facility for ocaml that basically comes by default with opam (I think?). You can find the location of installed packages ocamlfind query z3
for example. You can also use it alongside other compiler commands. This is very useful and how you can specify libraries without finding all the paths and files yourself to add to the compiler.
Js_of_ocaml has two different modes, separate compilation and whole compilation. Seperate compilation is probably good for compile times. Whole compilation appears to still be necessary if you want to use dynlinking and the toplevel. Whole compilation is achieved in dune via the --release
flag.
If you just want to use bap data structures like Var.t
, you can get by with using dune and including bap as a library. You will need the stanza (flags -linkall)
. You can compile for js_of_ocaml by including (modes byte js)
and running dune build main.bc.js
.
The difference between a --release
build and dev build for js_of_ocaml in dune are very different. I tend to need --release
A tricky bug
My initial attempt just to see what I can do with bap compiling https://gist.github.com/philzook58/6501f86082954f795812b50181d014bf. Here I was attempting to manually add all the appropriate javascript stubs. Eventually I did switch over to using dune.
This stopped a a certain point complaining of closing an already closed stream. It is somewhat unfortunate actually that there is a software engineering tendency to fail gracefully by catching all errors. Actually, this significantly hinders the debugging process. A completely dirty fail gives a stack trace at the actual problem location.
In this case, the problem was actually a missing javascript stub for a unix system call utime
. This was completely obscured by the exception handling mechanisms. This call the utime was actually in the camlzip library and it took me a little bit of digging to find exactly where.
Adding the file
//Provides: unix_utimes
function unix_utimes(x,y,z,w){
return;
}
and adding to dune the stanza (js_of_ocaml (javascript_files helpers.js))
which I figured out by examining the zarith_stubs_js example. This package does need to be added to bap for javascript compiling.
Dynamic Linking
I was just a bit confused about how dynlinking was to work in js_of_ocaml. Bap uses a plugin system which is based around the Dynlink module.
You start a bap program, you are always supposed to call Bap_main.init and really in typical usage you should be letting bap do that for you. This loads up all the plugins that bap needs to implement the features you require. Bap plugins are zip files and are why camlzip was being called above. I’m actuall shocked this is working, because some external functions of camlzip are marked as missing still. The bap_plugin library is orchestrating this. It in turn uses stuff from the Bap_bundle library. load
in bap_plugin is a ref which can be set to either the toplevel dynamic linker interface or the Dynlink one. For a moment I thought maybe js_of_ocaml only support the toplevel dynamic linker, but now I’m not so sure. Baptop
uses this mechanism for example like so
let loader = Topdirs.dir_load Format.err_formatter in
setup_dynamic_loader loader;
I asked about how to use dynlinking here. Long story short, use whole compilation mode, --toplevel --dynlink +dynlink.js +toplevel.js
. In addition, maybe on new versions of ocaml you need to add Sys.interactive := false
. Here is a simple repo I put together.
open Js_of_ocaml_toplevel
let () = JsooTop.initialize ()
let () = Sys.interactive := false
let () = Printf.printf "%b\n" Dynlink.is_native
let _ = Dynlink.loadfile "plugin.cmo"
let () = print_endline "hello"
Stack overflow and Js_of_ocaml
Unfortuantely, Javascript and ocaml are not in alignment in regards to expectations of the stack and tail call elimination. Because of this, it is quite possible to get stack overflow errors.
Javascript apparently put tail call elimination into the spec quite a while ago, but there was disagreement about this and the browsers have not really implemented it.
Scala I believe has a similar proble, being a functional programming language on the JVM.
I ran into this problem in multiple places. The first is a stack overflow in the js_of_ocaml compiler itself when used from within javascript while bap is dynamically linking in libraries (bap_c in this case). I made a variant of the compiler that switched from a non tail recursive call to one that is with an explicit accumulator parameter. You can fidn that here.
Bap recently converted some of it’s monads into a CPS style. This is probably not good without tail call optimization. Bap 2.3.0 had an older style that did not run into these issues on my simple test. Here I followed JSCoq’s lead and added trampolines to these monads.
If you define a thunk datatype, you can early return while there is still computaiton to do. What this does is unwind the stack up the the point where you call trampoline
and save any necessary data inside closures that go in the heap. So at any point in a 'a thunk
computation, you can turn a my_complicated_computation
TailCall (fun () => my_complicated_compuation)` and this will demarcate a point in which you move stuff from stack to heap.
type 'a thunk =
| Fin of 'a
| TailCall of (unit -> 'a thunk)
let rec trampoline r = match r with
| Fin a -> a
| TailCall f -> trampoline (f ())
I added this transformation just the to return
case of the monad, which is sufficient for the moment. See the patch to bap here This was surprising to me. I thought I’d be changing bind
, but I followed jsCoq’s lead.
There is some possiblity the js_of_ocaml flag --disable inline
may help stack overflows, but I didn’t see it ever work.
Js_of_ocaml external file system
Bap needs many files to work right.
Js_of_ocaml includes a super cool ability to include arbitrary files into a psuedo file system. The command line flag --file abolsute/path/to/myfile
will inlcude this file in the produced javascript. --file
apparently adds files in a folder called /static/
. This also needs to be added to bap’s various internal lookup paths.
This was a nice oppurtunity to use strace. Once I had something working in my command line node
but not in browser, I could strace all the file openings to try to include them with --file
one by one.
strace -e trace=open,openat,close,connect,accept dune exec ./main.bc
strace is quite the handy thing sometimes. It traces system calls, which are the portal from your program to the world.
Js of ocaml troubleshooting tips
Make sure to use all the prettifying command line options. Your life will be impossible without them
--pretty --debuginfo --source-map-inline
. You need to run ocamlc
with debug generation on -g
.
I tended to use node
to make sure stuff initially works. node
allows you to increase the stack size with stack-size=10000
. Browsers do not. node
is also faster apparently.
Surprisingly, I found it very helpful to edit the generated _build/default/main.bc.js
directly by hand to do console.log
debugging, since some errors were coming from libraries that were very annoying to print debug at the ocaml level.
The actual error being printed was often being thrown and caught from somewhere. Grepping for that string in main.bc.js and grepping for RangeError
and adding console.log(exn.stack)
at those places was immesnely helpful.
For stack overflow it is often helpful to Error.stackTraceLimit = Infinity;
to get the entire trace.
TODO: The Actual Hard Parts
Emscriptenize the llvm disassembler / ghidra / z3 and then connect those to js_of_ocaml.
However, with just the primus lisp loaders, I think I can do a lot of fun stuff. Who cares about binaies anyway