Initial article ready

2017-10-11 14:39:31 -07:00 · 2017-10-11 14:39:31 -07:00 · ff1283df19
commit ff1283df19
parent 27733001ac
1 changed files with 98 additions and 0 deletions
--- a/blog/2017/10/10/seccomp-and-you/index.markdown
+++ b/blog/2017/10/10/seccomp-and-you/index.markdown
@@ -0,0 +1,98 @@
+---
+tags: ~
+title: Seccomp and you
+---
+
+So one of the big goals for App::EvalServerAdvanced is to make creating and maintaining a
+sandbox for arbitrary code easier.  The biggest way it does this is via Seccomp-bpf
+(hereafter referred to as seccomp).
+
+    seccomp-bpf is an extension to seccomp[8] that allows filtering of system calls using
+    a configurable policy implemented using Berkeley Packet Filter rules. It is used by
+    OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on Chrome OS and
+    Linux. (In this regard seccomp-bpf achieves similar functionality to the older
+    systrace—which seems to be no longer supported for Linux).
+    -- https://en.wikipedia.org/wiki/Seccomp
+
+Right now this is all handled in App::EvalServerAdvanced::Seccomp, with a large set of
+predefined rules, organized into 'profiles'.  Each profile is intended to represent a
+single kind of action that a program could do, such as open a file for reading, open a
+file for writing, etc.
+
+I've created a few profiles to start with:
+
+  - stdio
+    Allow reading from STDIN, and writing to STDOUT/STDERR.
+
+  - file_open
+    Allows calling some file-related system calls, such as: open, openat, close, select,
+read (on any descriptor), pread64, lseek, fstat, lstat, stat, fcntl, and ioctl with flags to
+detect whether it's a tty.  The flags that may be passed when opening a file are defined in
+the "open_modes" rules, which are covered later.
+
+  - file_opendir
+    Allows opening a directory to get a list of files, and also includes the file_open
+profile to allow interacting with the handle.  Essentially allows the behavior of /bin/ls
+or similar programs.
+
+  - file_tty
+    Adds O_NOCTTY to the allowed flags passed to open() and similar calls.
+
+  - file_readonly
+    Adds O_NONBLOCK, O_EXCL, O_RDONLY, O_NOFOLLOW, and O_CLOEXEC to the flags that may be
+passed to open() and similar calls.
+
+  - file_write
+    Adds O_CREAT, O_WRONLY, O_TRUNC, and O_RDWR to the flags that may be passed to open()
+and similar calls.
+    Also allows the use of write, pwrite64, mkdir, and chmod syscalls.
+
+  - time_calls
+    Allows calling nanosleep, clock_gettime, and clock_getres syscalls.  For Perl this
+means allowing time() and similar calls, as well as sleep() and Time::HiRes.
+
+  - ruby_timer_thread
+    This one is a special Ruby-specific profile.  It allows Ruby to create a thread that
+it uses internally, and only allows that thread creation with a specific set of flags:
+CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID.
+This prevents it from doing arbitrary fork() calls, while still allowing the interpreter
+to run.  It also allows pipe2 to be called to create communication between the two
+threads.
+
+  - perl_file_temp
+    This was added specifically for the behavior of File::Temp, and might get folded into a
+more generic profile.  It allows chmod with a mode of 0600 and unlink to be called.
+
+  - exec_wrapper
+    This one is seriously special.  It's not a predefined set of rules, but in fact
+generates the rules at runtime.  This is because of a limitation of seccomp: since
+seccomp can't inspect the memory behind pointers, there's no way to verify the contents
+of a string being passed to execve().  Instead we create a whitelist of strings that can
+be passed to it, and only allow calls to execve that are passed pointers to those strings.
+This isn't perfectly secure, since someone could overwrite the contents at a later point,
+but it's safe enough because an attacker can't view the generated BPF to extract the
+addresses, and the strings themselves should be gone from memory by the time the
+attacker's code runs, preventing them from recreating the original addresses.  This
+requires ASLR in order to be effective at preventing an attacker from deriving the
+addresses of the strings from previous runs.
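+
+To make the runtime generation a bit more concrete, here is a minimal sketch (not the
+actual exec_wrapper code) of turning a whitelist of strings into rules keyed on their
+addresses.  The whitelist contents, the string_address helper, and the 'rules' key are
+all assumptions made for illustration.  pack 'p' followed by unpack 'J' is one way to
+read a scalar's buffer address from pure Perl, and it assumes Perl's integers are the
+same size as a pointer.
+
+```perl
+use strict;
+use warnings;
+
+# Hypothetical whitelist of paths that may be handed to execve().
+my @allowed = ('/usr/bin/perl', '/usr/bin/ruby');
+
+# Hypothetical helper: return the address of a scalar's string buffer.
+sub string_address {
+    # pack 'p' stores a pointer to the string buffer; unpack 'J' reads it
+    # back as an unsigned integer (assumes pointer-sized integers).
+    return unpack 'J', pack 'p', $_[0];
+}
+
+# One rule per string: execve is only allowed when argument 0 (the path
+# pointer) equals the address of a whitelisted string.
+my @exec_rules = map {
+    +{ syscall => 'execve', rules => [[0, '==', string_address($_)]] }
+} @allowed;
+
+printf "execve allowed only when arg 0 == 0x%x\n", $_->{rules}[0][2]
+    for @exec_rules;
+```
+
+In a sketch like this the strings have to stay alive and unmodified until the filter is
+loaded and the wrapper has called execve, otherwise the addresses baked into the rules no
+longer point at them.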
+
+There are also some other profiles, similar to ruby_timer_thread, specifically for allowing
+node.js to do the same kinds of things as Ruby (create a thread, use epoll, etc.).
+
+
+=== Handling flags to syscalls
+
+The way the rules are defined allows syscalls like open() to not need special handling.
+Since many syscalls can take flags, it's useful to be able to limit the flags they can
+take.
+
+  {syscall => 'openat', permute_rules => [['2', '==', \'open_modes']]},
+
+Inside A::ESA::Seccomp you can define a syscall like the above, to take a set of
+automatically generated rules from a permutation.  In this case it's called 'open_modes'.
+A profile can add (but not remove) values to the permutation rules, and then when the
+whole BPF program gets compiled it'll generate all the applicable rules for you.  This
+makes setting up calls like open much simpler, since you don't have to write out all
+possible modes yourself.  This is also an area where I could be doing better to optimize
+the whole thing, but have not done so yet.  Seccomp itself supports doing some bitwise
+operations that could make this more effective but they were not well exposed through
+Linux::Seccomp when this was originally designed.
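+
+As a rough illustration of the idea (not the module's actual implementation), the
+permutation can be thought of as OR-ing together every subset of the allowed flags and
+emitting one equality rule per resulting value.  The function names, the 'rules' key, and
+the choice of three flags below are assumptions for the sake of the example.  The second
+function shows the kind of single masked test that the bitwise operations would allow
+instead.
+
+```perl
+use strict;
+use warnings;
+use Fcntl qw(O_NONBLOCK O_EXCL O_NOCTTY);
+
+# Return every value that can be built by OR-ing together a subset of @flags.
+sub permute_flags {
+    my @flags = @_;
+    my $n     = @flags;
+    my %seen;
+    # Each number from 0 .. 2**n - 1 selects one subset of the n flags.
+    for my $subset (0 .. 2**$n - 1) {
+        my $value = 0;
+        for my $i (0 .. $#flags) {
+            $value |= $flags[$i] if $subset & (1 << $i);
+        }
+        $seen{$value} = 1;
+    }
+    my @values = sort { $a <=> $b } keys %seen;
+    return @values;
+}
+
+# The bitwise alternative: a single masked test instead of one rule per value.
+sub allowed_by_mask {
+    my ($value, @flags) = @_;
+    my $mask = 0;
+    $mask |= $_ for @flags;
+    return ($value & ~$mask) == 0 ? 1 : 0;
+}
+
+# Stand-in for the values that profiles have added to 'open_modes'.
+my @open_modes = (O_NONBLOCK, O_EXCL, O_NOCTTY);
+
+# One equality rule on argument 2 (the flags argument of openat) per value.
+my @rules = map { +{ syscall => 'openat', rules => [[2, '==', $_]] } }
+            permute_flags(@open_modes);
+
+printf "%d equality rules generated for %d flags\n",
+    scalar @rules, scalar @open_modes;
+```
+
+With n distinct single-bit flags this approach produces 2**n rules, so every flag a
+profile adds doubles the rule count, which is why a masked comparison would be the
+cheaper route.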