Initial article ready
parent 27733001ac
commit ff1283df19
1 changed files with 98 additions and 0 deletions
98
blog/2017/10/10/seccomp-and-you/index.markdown
Normal file
@@ -0,0 +1,98 @@
---
tags: ~
title: Seccomp and you
---

So one of the big goals for App::EvalServerAdvanced is to make creating and maintaining a
sandbox for arbitrary code easier. The biggest way it does this is via seccomp-bpf
(hereafter referred to as seccomp).

> seccomp-bpf is an extension to seccomp that allows filtering of system calls using
> a configurable policy implemented using Berkeley Packet Filter rules. It is used by
> OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on Chrome OS and
> Linux. (In this regard seccomp-bpf achieves similar functionality to the older
> systrace—which seems to be no longer supported for Linux).
>
> -- https://en.wikipedia.org/wiki/Seccomp

Right now this is all handled in App::EvalServerAdvanced::Seccomp, with a large set of
predefined rules, organized into 'profiles'. Each profile is intended to represent a
single kind of action that a program could do, such as opening a file for reading, opening a
file for writing, etc.

I've created a few profiles to start with:

- stdio
  Allows reading from STDIN, and writing to STDOUT/STDERR.

- file_open
  Allows calling some file-related system calls, such as: open, openat, close, select,
  read (on any descriptor), pread64, lseek, fstat, lstat, stat, fcntl, and ioctl with
  flags to detect if it's a tty. The flags that are allowed when opening a file are
  defined in the "open_modes" rules that will be covered later.

- file_opendir
  Allows opening a directory to get a list of files, and also includes the file_open
  profile to allow interacting with the handle. Essentially allows the behavior of /bin/ls
  or similar programs.

- file_tty
  Adds O_NOCTTY to the allowed flags passed to open() and similar calls.

- file_readonly
  Adds O_NONBLOCK, O_EXCL, O_RDONLY, O_NOFOLLOW, O_CLOEXEC to the flags that can be passed
  to open() and similar calls.

- file_write
  Adds O_CREAT, O_WRONLY, O_TRUNC, O_RDWR to the flags that can be passed to open() and
  similar calls. Also allows the use of the write, pwrite64, mkdir, and chmod syscalls.

- time_calls
  Allows calling the nanosleep, clock_gettime, and clock_getres syscalls. For perl this
  means allowing time() and similar calls, and sleep() along with Time::HiRes.

- ruby_timer_thread
  This one is a special ruby-specific profile. It allows ruby to create a thread that
  it uses internally, and only allows that thread creation with a specific set of flags:
  `CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID`
  This prevents it from doing arbitrary fork() calls, while still allowing the interpreter
  to run. It also allows pipe2 to be called to create communication between the two
  threads.

- perl_file_temp
  This was added specifically for the behavior of File::Temp, and might get folded into a
  more generic profile. It allows chmod with a mode of 0600 and unlink to be called.

- exec_wrapper
  This one is seriously special. It's not a predefined set of rules; instead it
  generates the rules at runtime. This is because of a limitation of seccomp: since
  seccomp can't look inside pointers, there's no way to verify the contents of a
  string being passed to execve(). Instead we create a whitelist of strings that can be
  passed to it, and only allow calls to execve that are passed pointers to those strings.
  This isn't perfectly secure, since someone could overwrite the contents at a later point,
  but it's safe enough because an attacker can't view the generated BPF to extract the
  addresses, and the strings themselves should be gone from memory by the time their code
  runs, preventing them from recreating the original addresses. This requires ASLR in order
  to be effective at preventing an attacker from deriving the address of the strings from
  previous runs.
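
To make the exec_wrapper idea a bit more concrete, here is a minimal sketch in Python (not the module's actual Perl code). The whitelist contents, the rule layout, and the helper names `build_exec_rules` and `would_allow` are all hypothetical. The only point it illustrates is that a rule compares the pointer value handed to execve, never the bytes the pointer refers to.

```python
import ctypes

# Hypothetical whitelist; the real one is whatever the sandbox is configured with.
ALLOWED_PATHS = [b"/usr/bin/perl", b"/usr/bin/ruby"]

def build_exec_rules(paths):
    """Copy each path into its own C buffer and emit one rule per buffer address."""
    buffers = [ctypes.create_string_buffer(p) for p in paths]
    rules = [
        # Made-up rule layout: argument 0 of execve must equal this exact address.
        {"syscall": "execve", "arg": 0, "op": "==", "value": ctypes.addressof(buf)}
        for buf in buffers
    ]
    # The buffers are returned too: they must stay alive so the wrapper can
    # pass these exact addresses to execve later on.
    return buffers, rules

def would_allow(pointer_value, rules):
    """What the filter effectively checks: is this pointer one of the known addresses?"""
    return any(rule["value"] == pointer_value for rule in rules)

buffers, exec_rules = build_exec_rules(ALLOWED_PATHS)
```

In this sketch a second copy of an allowed string, living at a different address, would be rejected, which is why the scheme depends on the addresses being hard to predict.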

There are also some other profiles, similar to ruby_timer_thread, specifically for allowing
node.js to do the same kinds of things as ruby (create a thread, use epoll, etc.).
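
The way profiles build on each other, as file_opendir does with file_open, can be sketched roughly like this. This is a Python illustration under assumptions, not the module's real data structure: the `PROFILES` table, its keys (`include`, `syscalls`, `open_modes`), the `resolve` helper, and the `getdents` entry are all made up for the example, and the syscall lists are abbreviated.

```python
import os

# Hypothetical profile table, loosely modelled on the descriptions above.
PROFILES = {
    "file_open": {"syscalls": {"open", "openat", "close", "read"}},
    # "getdents" is a guess at what listing a directory needs; it is not from the article.
    "file_opendir": {"include": ["file_open"], "syscalls": {"getdents"}},
    "file_readonly": {"open_modes": {os.O_RDONLY, os.O_NONBLOCK, os.O_CLOEXEC}},
    "file_write": {
        "syscalls": {"write", "pwrite64"},
        "open_modes": {os.O_WRONLY, os.O_CREAT, os.O_TRUNC},
    },
}

def resolve(names, profiles=PROFILES):
    """Collect the syscalls and open() flags of the named profiles and their includes."""
    syscalls, open_modes, seen = set(), set(), set()
    stack = list(names)
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already merged; also guards against include cycles
        seen.add(name)
        profile = profiles[name]
        syscalls |= profile.get("syscalls", set())
        open_modes |= profile.get("open_modes", set())  # profiles only ever add flags
        stack.extend(profile.get("include", []))
    return syscalls, open_modes

syscalls, open_modes = resolve(["file_opendir", "file_readonly"])
```

Asking for file_opendir pulls in file_open's syscalls automatically, while nothing from file_write leaks in unless it is requested.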


=== Handling flags to syscalls

The way the rules are defined means syscalls like open() don't need special handling.
Since many syscalls can take flags, it's useful to be able to limit the flags they can
take.

    {syscall => 'openat', permute_rules => [['2', '==', \'open_modes']]},

Inside A::ESA::Seccomp you can define a syscall like the above, so that it takes a set of
automatically generated rules from a permutation. In this case it's called 'open_modes'.
A profile can add (but not remove) values to the permutation rules, and then when the
whole BPF program gets compiled it'll generate all the applicable rules for you. This
makes setting up calls like open much simpler since you don't have to write out all
possible modes yourself. This is also an area where I could be doing better to optimize
the whole thing, but have not done so yet. Seccomp itself supports doing some bitwise
operations that could make this more effective, but they were not well exposed through
Linux::Seccomp when this was originally designed.
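
As a rough illustration of what the permutation amounts to, here is a Python sketch (again, not the module's Perl code). It assumes the flag list from file_readonly, reads the Linux flag values from Python's `os` module, and uses hypothetical helper names (`permute_flags`, `expand_rules`, `allowed_by_mask`). It also shows the bitwise alternative: a single masked comparison, which libseccomp exposes as `SCMP_CMP_MASKED_EQ`.

```python
import os
from itertools import combinations

# Flags contributed by file_readonly, per the profile list above.
OPEN_MODES = [os.O_RDONLY, os.O_NONBLOCK, os.O_EXCL, os.O_NOFOLLOW, os.O_CLOEXEC]

def permute_flags(flags):
    """Return every value obtainable by OR-ing together any subset of flags."""
    values = set()
    for size in range(len(flags) + 1):
        for combo in combinations(flags, size):
            value = 0
            for flag in combo:
                value |= flag
            values.add(value)
    return sorted(values)

def expand_rules(syscall, arg_index, flags):
    """One equality rule per permitted value, mirroring the permute_rules idea."""
    return [(syscall, arg_index, "==", value) for value in permute_flags(flags)]

rules = expand_rules("openat", 2, OPEN_MODES)

# Bitwise alternative: one masked comparison instead of many equality rules.
ALLOWED_MASK = 0
for flag in OPEN_MODES:
    ALLOWED_MASK |= flag

def allowed_by_mask(value):
    """True when value sets no bit outside the allowed mask."""
    return (value & ~ALLOWED_MASK) == 0
```

The two checks accept the same values here only because every non-zero flag in the list is a single bit; the number of equality rules doubles with each such flag added, while the mask check stays a single comparison.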