diff --git a/blog/2017/10/10/seccomp-and-you/index.markdown b/blog/2017/10/10/seccomp-and-you/index.markdown
new file mode 100644
index 0000000..29aa0dd
--- /dev/null
+++ b/blog/2017/10/10/seccomp-and-you/index.markdown
@@ -0,0 +1,98 @@
+---
+tags: ~
+title: Seccomp and you
+---
+
+So one of the big goals for App::EvalServerAdvanced is to make creating and maintaining a
+sandbox for arbitrary code easier. The biggest way it does this is via Seccomp-bpf
+(hereafter referred to as seccomp).
+
+    seccomp-bpf is an extension to seccomp that allows filtering of system calls using
+    a configurable policy implemented using Berkeley Packet Filter rules. It is used by
+    OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on Chrome OS and
+    Linux. (In this regard seccomp-bpf achieves similar functionality to the older
+    systrace—which seems to be no longer supported for Linux).
+    -- https://en.wikipedia.org/wiki/Seccomp
+
+Right now this is all handled in App::EvalServerAdvanced::Seccomp, with a large set of
+predefined rules, organized into 'profiles'. Each profile is intended to represent a
+single kind of action that a program could do, such as open a file for reading, open a
+file for writing, etc.
+
+I've created a few profiles to start with:
+
+ - stdio
+   Allows reading from STDIN, and writing to STDOUT/STDERR.
+
+ - file_open
+   Allows calling some file-related system calls, such as: open, openat, close, select,
+read (on any descriptor), pread64, lseek, fstat, lstat, stat, fcntl, and ioctl with flags to detect if it's a
+tty. The flags that may be passed when opening a file are defined in the "open_modes"
+rules, which are covered later.
+
+ - file_opendir
+   Allows opening a directory to get a list of files, and also includes the file_open
+profile to allow interacting with the handle. Essentially allows the behavior of /bin/ls
+or similar programs.
+
+ - file_tty
+   Adds O_NOCTTY to the flags that can be passed to open() and similar calls.
+
+ - file_readonly
+   Adds O_NONBLOCK, O_EXCL, O_RDONLY, O_NOFOLLOW, and O_CLOEXEC to the flags that can be passed to open() and
+similar calls.
+
+ - file_write
+   Adds O_CREAT, O_WRONLY, O_TRUNC, and O_RDWR to the flags that can be passed to open() and similar calls.
+   Also allows the use of the write, pwrite64, mkdir, and chmod syscalls.
+
+ - time_calls
+   Allows calling the nanosleep, clock_gettime, and clock_getres syscalls. For Perl this
+means allowing time() and similar calls, as well as sleep() and Time::HiRes.
+
+ - ruby_timer_thread
+   This one is a special Ruby-specific profile. It allows Ruby to create a thread that
+it uses internally, and only allows that thread creation with a specific set of flags:
+CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID.
+This prevents it from doing arbitrary fork() calls, while still allowing the interpreter
+to run. It also allows pipe2 to be called to create communication between the two
+threads.
+
+ - perl_file_temp
+   This was added specifically for the behavior of File::Temp, and might get folded into a
+more generic profile. It allows chmod with a mode of 0600 and unlink to be called.
+
+ - exec_wrapper
+   This one is seriously special. It's not a predefined set of rules, but in fact
+generates the rules at runtime. This is because of limitations of seccomp. Since
+seccomp can't inspect inside of pointers, there's no way to verify the contents of a
+string being passed to execve(). Instead we create a whitelist of strings that can be
+passed to it, and only allow calls to execve that are passed pointers to those strings.
+This isn't perfectly secure, since someone could overwrite the contents at a later point,
+but it's safe enough because an attacker can't view the generated BPF to extract the
+addresses, and the strings themselves should be gone from memory by the time their code
+runs, preventing them from recreating the original addresses. This requires ASLR in order
+to be effective at preventing an attacker from deriving the address of the strings from
+previous runs.
+
+There are also some other profiles, like ruby_timer_thread, specifically for allowing node.js
+to do similar things to Ruby (create a thread, use epoll, etc.).
+
+
+=== Handling flags to syscalls
+
+The way the rules are defined allows syscalls like open() to not need special handling.
+Since many syscalls can take flags, it's useful to be able to limit the flags they can
+take.
+
+    {syscall => 'openat', permute_rules => [['2', '==', \'open_modes']]},
+
+Inside App::EvalServerAdvanced::Seccomp you can define a syscall like the above, to take a set of
+automatically generated rules from a permutation. In this case it's called 'open_modes'.
+A profile can add (but not remove) values to the permutation rules, and then when the
+whole BPF program gets compiled it'll generate all the applicable rules for you. This
+makes setting up calls like open much simpler, since you don't have to write out all
+possible modes yourself. This is also an area where I could be doing better to optimize
+the whole thing, but have not done so yet. Seccomp itself supports doing some bitwise
+operations that could make this more effective, but they were not well exposed through
+Linux::Seccomp when this was originally designed.
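
As a rough illustration of the permutation idea described above, here is a minimal Perl sketch. It is not the code from App::EvalServerAdvanced::Seccomp: the helper name `expand_flag_combinations`, the choice of flags, and the layout of the expanded rule hashes are all assumptions made for this example. It ORs together every subset of the allowed flags and emits one equality rule on argument 2 of openat for each distinct value.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY O_NONBLOCK O_EXCL);

# Hypothetical helper (not part of App::EvalServerAdvanced::Seccomp):
# OR together every subset of the given flags and return the distinct
# results in ascending numeric order.
sub expand_flag_combinations {
    my @flags  = @_;
    my @values = (0);    # the empty subset: no flags set
    for my $flag (@flags) {
        my %seen;
        # Keep each existing value and add a copy with $flag set, dropping
        # duplicates (O_RDONLY is 0 on Linux, so OR-ing it in changes nothing).
        @values = grep { !$seen{$_}++ } (@values, map { $_ | $flag } @values);
    }
    return sort { $a <=> $b } @values;
}

# Assumed contents of 'open_modes' once a read-only style profile has
# added its flags; a short list keeps the output small.
my @open_modes = (O_RDONLY, O_NONBLOCK, O_EXCL);

# One rule per combination. The hash layout here is illustrative only.
my @rules = map { +{ syscall => 'openat', rules => [[2, '==', $_]] } }
    expand_flag_combinations(@open_modes);

# Print each generated comparison with the flag value in octal.
printf "openat: arg 2 == %#o\n", $_->{rules}[0][2] for @rules;
```

With n independent flag bits this approach produces up to 2^n rules, which is why a masked comparison (libseccomp exposes one as SCMP_CMP_MASKED_EQ) would be a more compact way to express the same policy.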