Protecting Users
John Moser
October 17, 2004

Abstract

This paper presents a collection of existing and hypothetical technologies which can be deployed in such a way as to create a heightened level of security without changing the overall user experience or significantly impacting the general administrative experience negatively. The main ideas are (1) protecting the user from malicious attacks utilizing bugs in software; (2) preserving the user's experience such that the operation of a system with the security enhancements is indistinguishable from that of a system without; (3) maintaining a binarily and operationally compatible environment such that third party software continues to operate; and (4) maintaining a similar administrative environment such that the administrator is not given a significant added workload.

Introduction

Code run on a system is built from source code in various languages. Most of this source code is compiled to code native to the target machine; in some cases, the code is compiled for a byte code interpreter or written for a script interpreter. No matter how it is delivered, all code executed must in some way interact as native code. Byte code and scripts may either be interpreted, triggering sequences of native code; or they may be just-in-time compiled, being themselves turned into native code on demand and then executed.

If the code has bugs, then it may be possible to disrupt the program and cause it to execute a sequence of code crafted by an external source. This can be used to gain access to a machine illegitimately and carry out an array of tasks. This paper explores the possibility of preventing such bugs from being used to cause great damage, in ways not incurring a great disruption to the user or administrative experience. The burden of protecting against these bugs is placed on others; their research is brought together here to form an effective method of protecting the general user base.

Section 2 classifies the environments in which users may operate and defines our focus, providing a background for explaining the nuisances and threats we aim to mitigate; section 2 also explains in greater detail the nuisances and threats, and which environments are most affected. Section 3 describes related work, particularly the protection methods we will employ to achieve our ends. Section 4 explains the deployment of each protection method to meet our goals. Section 5 presents conclusions and discusses issues for future work.

Target Environments

There are several classes of computing environments which can be examined and assessed to determine whether it is necessary and prudent to protect them. The classes here are somewhat arbitrary, only loosely based on computer science concepts.

1. Embedded Systems

Embedded systems are those systems which exist implanted in devices such as PDAs or cell phones. Such systems rarely contain sensitive information, and are rarely primary targets for attackers. We shouldn't need much if any security here, except in very special cases, such as on routers. The special case embedded systems may be high risk; however, their primary security concerns are often kernel level. As such, protection is difficult and normally restricted to code auditing.

2. Desktop Systems

Desktop systems are those which users access directly on a frequent basis. They are most frequently used to play games, do graphics work, edit documents, send and receive e-mail, and browse the Internet. Desktop systems may be for home or office use.
All desktop systems face the concern of worms. Worms are programs, often small, which propagate their code through computer networks via bugs or design flaws in programs. They may use e-mail or Web client flaws which automatically execute scripts; or they may use programming bugs which allow them to inject themselves into a running program to propagate. They may spread harmlessly; interfere with usage of the machine; destroy files; or mine data. Because worms may spread stealthily and mine data, they are a serious security risk.

Desktop systems also face the risk of crackers. Crackers are commonly separated into two classes: "Script Kiddies" and "Blackhats." These are de-facto standard classifications. Script Kiddies attack as a joke or out of simple anger, and use exploits written by others, rarely having the technical knowledge to understand them, much less create their own. In contrast, Blackhats have enough technical knowledge to write their own exploits, viruses, and worms; and often have a definite malicious intent, such as stealing sensitive client data for identity theft, or simply destroying as much data as possible. Both are obvious security threats. Script Kiddies are more often responsible for desktop intrusions perpetrated directly by a human.

3. Servers

Servers are machines dedicated to providing services such as web serving, e-mail access, IRC, streaming media, shared files, or shell access for group work on programming projects. They are often run on a higher class of hardware, but can be run from a 486 in somebody's basement. In either case, the environment is the same; servers are impossible to fully firewall, because their most vulnerable point of attack--the service they supply--is required by design to be fully exposed.

Servers are vulnerable to worms and crackers as well. Blackhats focus their efforts towards servers more often than Script Kiddies do; Blackhats are more prone to have a real objective, such as locating and mining sensitive data.

Our primary focus is on desktop systems. These systems are the most common, being the dominant computing environment at home and in business settings. Embedded systems may in truth hold a much wider range of influence, being inside PDAs, VCRs, DVD players, video game systems, cell phones, and many other small appliances; but most of these are of no concern security-wise.

Related Work

Several projects have addressed the many security concerns present, each with its own technique. Those covered here are focused on Linux or BSD, but could be deployed on other operating systems if rewritten for them. Each project uses its own technique to prevent entire classes of exploits or to stop exploits at classes of bugs.

1. Stack Smash Protection

Stack Smash Protection is a method for detecting buffer overflows in programs written in languages which do not support strict bounds checking, such as C. The concept was covered in detail in a paper titled "Protecting from stack smashing attacks"[1] by Hiroaki Etoh and Kunikazu Yoda of the IBM Tokyo Research Laboratory. The paper detailed the approaches of several systems, including their own, which evolved from Immunix StackGuard[2] into the IBM Stack Smash Protector[3]. The IBM Stack Smash Protector is implemented as a patch to the gcc compiler[4]. It is deployed in Adamantix, Hardened Gentoo, and OpenBSD, to name a few. Microsoft's newer C compilers are equipped with stack smash protection, as outlined in a paper[5] at CoreLabs.
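To make the class of bug concrete, below is a minimal, hypothetical example of the kind of stack overflow a compiler-inserted guard is designed to catch; it is illustrative only and not taken from any of the cited projects. Built with a protector such as -fstack-protector, an over-long first argument causes the process to abort at the guard check instead of returning into attacker-controlled data.

        #include <stdio.h>
        #include <string.h>

        /* A classic unbounded copy: anything longer than 15 characters
         * overruns 'name' and clobbers the saved data above it on the
         * stack, including the guard value inserted by SSP. */
        static void greet(const char *input)
        {
                char name[16];

                strcpy(name, input);            /* no bounds check: the bug */
                printf("hello, %s\n", name);
        }

        int main(int argc, char **argv)
        {
                greet(argc > 1 ? argv[1] : "world");
                return 0;
        }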
2. Executable Space Protection

Operating systems impose memory protections on the virtual memory space of tasks. They set which memory is readable, writable, and executable. If memory is accessed in a mode which has not been explicitly granted, the CPU raises a fault (reported to the program as a Segmentation Fault), which the OS can handle either by adjusting permissions to allow the program to continue, or by terminating the program. In general, data memory such as the stack and heap must be given Read and Write permissions; and code memory such as shared libraries and the executable (.text) segment of programs must be given Read and Execute permissions. By enforcing these separations, attacks such as direct code injection cannot succeed; these attacks rely on writing code into memory and then executing it, and thus require an area that is first Writable and then Executable[6].

To complete the separation, operating system functions which allow tasks to change their memory protections must prevent certain operations from occurring[7]. No program should be able to create memory that is both Writable and Executable; and no program should be able to transition non-Executable memory to Executable. Administrative control over this restriction should be in place so that it can be disabled for select programs for compatibility reasons. In this way, the administrator can at all times know which programs may be a risk; can judge which are critical enough that the risk is acceptable; and can enforce that judgment.

A very small set of programs rely on Read, Write, and Execute permissions being combined, notably the Just-In-Time compilers commonly used to execute Java and .NET code. These can be modified[8] to produce shared libraries in a temporary directory with proper file system protections on them, and to map them into memory as any other shared library; thus, they can exist without relying on RWX pages. On the other hand, full machine emulators which generate native code in real time cannot function with such a work-around; under current technology they cannot predict what is code in the virtual machine well enough to generate large translated buffers ahead of time. Emulators such as Qemu[9] are a good example. These will require the ability to mark pages as RW and then transition them to RX, at the very least.

Several implementations supply viable Executable Space Protections, ranging across several open source and proprietary operating systems. Linux on architectures supplying an NX bit properly marks pages initially, so that the stack and heap are non-Executable and the code is non-Writable. POSIX defines the mprotect() function and the PROT_READ, PROT_WRITE, and PROT_EXEC protections; other Unix-type systems should therefore function similarly. A patch for Linux known as PaX[10] supplies an NX bit on the x86 architecture, where there is no hardware available to back the PROT_EXEC protection. PaX also utilizes mprotect() restrictions and ensures that the most secure memory protection policy is enforced. PaX also supplies fine grained administrative control over which programs get which protections.

OpenBSD supplies a system known as W^X[11]. This system supplies proper memory protection and attempts to emulate an NX bit on x86 architectures. It does not, however, supply mprotect() restrictions. There is also a very similar patch for Linux by Ingo Molnar, known as Exec Shield; it is more similar to W^X than to PaX.
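The RW-then-RX transition described above for JIT-style code generators can be sketched as follows. This is a hedged illustration assuming a POSIX system and x86-64 machine code; under W^X or Exec Shield the mprotect() call succeeds, while under PaX's full mprotect() restrictions it would be refused unless the restriction is relaxed for the binary.

        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        /* x86-64 machine code for "mov eax, 42; ret" -- a stand-in for
         * whatever a real code generator would emit. */
        static const unsigned char stub[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

        int main(void)
        {
                size_t len = (size_t)sysconf(_SC_PAGESIZE);

                /* 1. Ask for ordinary writable, non-executable memory. */
                void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (buf == MAP_FAILED)
                        return 1;

                /* 2. Emit the generated code while the page is still RW. */
                memcpy(buf, stub, sizeof(stub));

                /* 3. Drop the write permission before executing, so the page
                 *    is never Writable and Executable at the same time. */
                if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
                        perror("mprotect");   /* refused under strict mprotect() restrictions */
                        return 1;
                }

                int (*fn)(void) = (int (*)(void))buf;
                printf("generated code returned %d\n", fn());

                munmap(buf, len);
                return 0;
        }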
Microsoft Windows XP Service Pack 2 introduces the ability for the OS to control the NX bit on AMD's x86_64 processors. Implementation details are unknown to the author of this paper at the time of this writing.

3. Address Space Layout Randomization

When loading a program into memory, an operating system separates out various parts or "segments" and loads them at various addresses in the virtual memory space. Using conventional methods, the resulting address space is always predictable based on the executable and the libraries loaded. Any information an attacker needs to acquire from the address space, including the addresses of functions he may want to use in injected code, can be easily obtained. With Address Space Layout Randomization[12] (ASLR), such information is located at unpredictable offsets in memory. This makes Return-to-Libc style attacks impossible to guarantee, and makes the task of mining data from the address space dangerous and unpredictable; a miss with these attacks results in various terminations, ranging from Segmentation Faults to Illegal Instruction Faults.

Full range ASLR is supplied by both PaX[13] and Exec Shield[14] on Linux. With PaX, the entropy ranges from 16 to 24 bits (65,000 to 16 million positions), depending on segment, on 32 bit x86 architectures. On AMD64, the randomization ranges from 24 to 32 bits (16 million to 4 billion positions). The level of randomization, as illustrated here, varies from architecture to architecture. OpenBSD provides a form of ASLR as well, as a complement to W^X.

A much less entropic form of ASLR, Library Load Order Randomization, is also possible, in which only the order in which libraries are loaded is randomized. This creates a situation in which it is much easier to guess at the placement of a loaded library. The attacker only needs to know the address layout relative to the libraries containing the functions or data he needs; thus, the chances of success are, as with full ASLR, continuously lower as the attack becomes more complex. Attacks needing only a single library, however, will probabilistically succeed approximately 1/N of the time, where N is the number of libraries loaded at execution time.

4. Information Leak Containment

Security by obscurity[15] is the concept in which security flaws in a system are kept secret in the hope that attackers will not find out about them. The flaws are known, but not publicized. This does not work, because attackers always find the flaws eventually, either by debugging their copies of the code or by simple trial and error. Once the flaw is known, it is easily exploited.

Information Leaks do not necessarily relate to security by obscurity. Flaws in a system can be known, but can require certain information to exploit. For example, Stack Smash Protection works on the premise that a successful attacker must know the value of a random guard variable. An Information Leak supplying this value would make Stack Smash Protection useless. Knowing the design details of SSP, however, does not help an attacker discover the variable.

Information Leaks are contained by two methods used in conjunction. First, the information has to be obscured from the attacker. Second, the information must be altered in a way which is both unpredictable and obscured from the attacker. For example, with Stack Smash Protection and a randomized address space, the heap is at an indeterminate offset and the value of the canary is chosen in an indeterminate way. An attacker cannot simply look to a known or predictable offset or use a known or predictable value.
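The practical effect of such randomization can be observed with a small demonstration program (hypothetical, for illustration only): it prints a few addresses from different memory regions. On a system with ASLR the values change on every run; on a conventional system they are identical every time, which is exactly the predictability an attacker relies on.

        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
                int stack_var = 0;
                void *heap_ptr = malloc(16);

                printf("stack : %p\n", (void *)&stack_var);
                printf("heap  : %p\n", heap_ptr);
                printf("libc  : %p\n", (void *)&printf);   /* library text */
                /* The executable's own text only moves if it is built as a
                 * Position Independent Executable or ET_EXEC randomization
                 * is in effect. */
                printf("text  : %p\n", (void *)&main);

                free(heap_ptr);
                return 0;
        }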
Information Leaking can be prevented using a variety of methods. The GrSecurity project[16] protects the entropy of ASLR, for example, by setting all address information supplied by the kernel to user space to 0. This includes, but is not limited to, the /proc/<pid>/maps file, which details the memory mappings of a program, as well as what type of mappings they are and what file they are mapped to if they are file backed.

GrSecurity also employs a variety of randomizations, many adapted from OpenBSD or other sources. For example, GrSecurity will randomly assign process IDs to tasks. This, in combination with the /proc restrictions which prevent users from getting information about other users' tasks, assures that even local attackers do not have a starting point when directly attacking a process, unless they have administrative access. Also employed by GrSecurity are various randomizations of network data, which prevent OS fingerprinting. This is simple security by obscurity, which at best will avoid scripted attacks reliant on OS detection. It may also help obscure design details about the internal network, such as how many clients are running behind a NAT and whether a particular connection is from a NAT client, as such things can also be fingerprinted.

5. Format String Vulnerabilities

Format String Vulnerabilities are defined in a paper[17] about FormatGuard. They are vulnerabilities which use user-supplied format strings to read and write data in arbitrary areas of program memory. Among other potential methods of exploitation, the %n format specifier can be included in the string for particularly dangerous results. Format strings are processed with %n indicating that the corresponding argument is of type (int*); the size of the formatted output thus far is written to the address given by that argument. A carefully crafted format string can write an arbitrary number of bytes to an arbitrary location in memory. This allows protections such as Stack Smash Protection to be completely evaded.

Immunix developed a protection known as FormatGuard, which wraps the most common format string functions--syslog(), printf(), fprintf(), sprintf(), snprintf()--in a set of macros which count the arguments and format specifiers. These macros work for calls with 0 to 99 arguments. If the number of format specifiers in the string is greater than the number of arguments, then the call to the function is not allowed; otherwise, the wrapper simply calls the function.

Another potential approach would be to use a macro which handles a variable length argument list together with a function which counts the format specifiers. This would require a non-standard patch to the compiler to allow variadic macros to know how many arguments they were called with. This approach has two advantages. First, a single macro and function handle all cases, instead of just up to N cases; although format string functions normally don't have more than 99 arguments. Second, the complexity of the code is reduced.

Still another way, not requiring non-standard compiler alterations, would be to use a macro with a variable argument list to verify the number and type of arguments. A macro could call a function, fmt_check(char *fmt, ..., int args_end), which would verify that argument consumption while parsing *fmt ended at &args_end. It would be unacceptable to ever end beyond &args_end. It would also be unacceptable to end before &args_end if a %n was found, as carefully crafted format specifiers could use the misalignment to write to arbitrary data.
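As a rough user-space sketch of the counting idea described earlier (hedged: this is neither the FormatGuard code nor the fmt_check() scheme above, and the names checked_printf and safe_printf are invented for illustration), a C99 variadic macro can count its own arguments and pass that count to a wrapper which refuses calls whose format string demands more:

        #include <stdarg.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Count conversion specifiers in a format string. "%%" does not
         * consume an argument and is skipped. This is a simplification:
         * '*' width and precision fields also consume arguments and are
         * ignored here. */
        static int count_specifiers(const char *fmt)
        {
                int n = 0;

                for (; *fmt; fmt++) {
                        if (*fmt == '%') {
                                if (fmt[1] == '%')
                                        fmt++;          /* literal percent sign */
                                else
                                        n++;
                        }
                }
                return n;
        }

        /* Refuse the call if the format string demands more arguments
         * than the caller supplied; otherwise forward to vprintf(). */
        static int checked_printf(int nargs, const char *fmt, ...)
        {
                va_list ap;
                int ret;

                if (count_specifiers(fmt) > nargs) {
                        fprintf(stderr, "format string mismatch, aborting\n");
                        abort();
                }
                va_start(ap, fmt);
                ret = vprintf(fmt, ap);
                va_end(ap);
                return ret;
        }

        /* C99 trick: the preprocessor counts the variadic arguments
         * (works for 1 to 9 arguments; calls with none are not covered). */
        #define NARGS(...)  NARGS_(__VA_ARGS__, 9, 8, 7, 6, 5, 4, 3, 2, 1)
        #define NARGS_(_1, _2, _3, _4, _5, _6, _7, _8, _9, N, ...)  N
        #define safe_printf(fmt, ...)  checked_printf(NARGS(__VA_ARGS__), fmt, __VA_ARGS__)

        int main(void)
        {
                const char *user = "alice";

                safe_printf("hello %s\n", user);   /* one specifier, one argument */
                return 0;
        }

A real wrapper would also have to cover the remaining printf-family functions and handle '*' width and precision arguments.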
The fmt_check() approach would predictably have higher overhead than simple counting, and would possibly raise more false positives. It would also require a full format string implementation, as all valid format specifiers would need to be properly interpreted. All of these methods have the potential to break compatibility with existing code, or to raise false positives or false negatives. False positives are events where harmless, normal operation is identified as an attack; false negatives are events where attacks go uncaught and may succeed. Any implementation must be careful not to instill a false sense of security.

6. Fork Bomb Defusers

Sometimes users are, as a joke, told to run simple commands such as `:(){:|:;};:` in their shell sessions. These commands recursively produce processes without bound, grinding the machine to a halt. To control this, a Fork Bomb Defuser can be employed.

Normal resource limiting can be bothersome to a user on a single user machine such as a business or home desktop. With only one user at a time using the machine, there is little logic in restricting the maximum resource usage of that user. Such limiting would constrict the number of processes the user can run at once; and while effective, it may impose a delay lasting longer than that of a Fork Bomb Defuser. An FBD may also be able to trap well-crafted attacks which don't die when they hit the upper limit: `:(){while(true);do :|:;done;};:`. Of course these attacks are more obvious, but not when embedded in a script, which the user may not necessarily visually check.

There is an excellent FBD known as rexFBD[18] which can be configured to limit how many fork()s can be done per second, as well as how many tasks each user may run. It is not clear whether this is per second per process tree, or just per second. It does manage to kill the task at the top of the fork() bomb, although the details have not been researched. A good method would be to trace up the task tree starting at whichever process violated max_forks_per_second, and kill the last task with the same executable image.

Certain applications may violate the max_forks_per_second rule in a server environment. Apache will likely do this if max_forks_per_second is 20 or some other relatively small but effective number and the server is under high load. An FBD is definitely not appropriate in a server environment.

7. Ultimate Anti-Virus

Viruses spread to executable programs, injecting themselves into the code and changing the executable image. This is largely protected against by not running strange programs with administrative access. These guidelines are not always followed, however. Oftentimes single user machines are run by users who like to install programs to the system by switching to an administrative account and running an installer program instead of a package manager. Even worse, attackers may install root kits to ease repeated penetration of a machine.

Digital signatures can be used to rectify these problems. Digital signatures are created when an asymmetric encryption algorithm is used to encrypt a hash with a private key. There are also algorithms specifically created for signing data, such as DSA. In either case, a secret key is used to encrypt a hash of a set of data, which is decrypted with a publicly known key. This secret "Private Key" must not be recoverable within any feasible measure of time by any method, especially by derivation from the "Public Key."
By using digital signatures to sign executable files and libraries, alterations to these files can be detected. Furthermore, digitally signing drivers would allow the kernel to verify that tainted code is not being loaded. Thus, digital signatures can be used to protect against viruses, root kits, and stealth attacks involving malicious kernel modules.

The DigSig project[19] implements signing of executable binaries in a very robust manner. It uses RSA encryption and includes caching of signature checks to greatly reduce the overhead. It uses the BSign program to insert digital signatures into binaries. This is an excellent start and could be further extended to a full featured implementation, which would have both user space and kernel space code.

There are several operating systems which deploy a variety of protections. For example, Adamantix, Hardened Gentoo, and OpenBSD deploy the IBM Stack Smash Protector. Adamantix and Hardened Gentoo also deploy PaX, with Position Independent Executables to enhance the effectiveness of ASLR; while OpenBSD deploys the similar but less mature W^X. Red Hat Linux deploys Exec Shield, which is similar to W^X. Adamantix and Hardened Gentoo employ protections from GrSecurity as well, to prevent basic information leaking, obscure OS details, and prevent chroot() jail breakouts.

Deployment

It is important to cover as wide a range of vulnerabilities as possible when deploying a secured system. The ICAT Metabase publicizes statistics[20] on the distribution of security vulnerabilities for each year, each class, each type, and other attributes of the vulnerabilities.

In deployment, we are most focused on desktop machines. As such, some of this section will be slanted towards compatibility, even in scenarios where the security will be decreased. Normal users will prefer their browser to be vulnerable over being forced to use an alternative; at the very least, they will want to be able to make the choice themselves. This is obviously not acceptable for production servers, and may not be acceptable in business settings even just for employees to get their e-mail and browse the company Intranet; such situations would either simply abandon having a web browser, or use a different browser.

Our reference model here is based around the Linux operating system kernel. Being an open source and actively developed operating system, Linux makes an excellent demonstration base; however, the deployment of these enhancements is not limited to Linux, or even to open source systems. Rather than repeatedly constructing statements around phrases like "... or a similar system," we will simply discuss implementation on Linux and allow those implementing on other operating systems to adapt the protections.

1. Stack Smash Protection

The recommended deployment method for SSP is to use the IBM Stack Smash Protector with the latest version of gcc, using the -fstack-protector switch, to build all programs which do not suffer incompatibilities with the protector. Compared with -fstack-protector-all, this switch is less prone to breaking things due to bugs in the SSP implementation, and because it heuristically decides which functions to protect, it also incurs less overhead. The IBM Stack Smash Protector protects the stack frame pointer, return pointer, and arguments to functions. It is the most robust form of SSP that could be located in research for this paper, and is thus the most suited for all applications which benefit from SSP.
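As a simplified illustration of the heuristic (exact size thresholds are a compiler detail and configurable), -fstack-protector typically instruments functions that contain character buffers, while -fstack-protector-all instruments every function:

        #include <stdio.h>
        #include <string.h>

        /* Contains a character array, so -fstack-protector inserts a guard
         * in the prologue and a check in the epilogue of this function. */
        static void copies_string(const char *src)
        {
                char buf[64];

                strncpy(buf, src, sizeof(buf) - 1);
                buf[sizeof(buf) - 1] = '\0';
                printf("%s\n", buf);
        }

        /* No character buffers: left alone by -fstack-protector, but still
         * instrumented under -fstack-protector-all. */
        static int add(int a, int b)
        {
                return a + b;
        }

        int main(void)
        {
                copies_string("example");
                return add(1, 2) == 3 ? 0 : 1;
        }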
The protector itself may have bugs in the way it applies protection; and some applications may contain harmless buffer overflow bugs which set off the protections in the normal course of use of the program. Such programs must be built without protection, or must be repaired to work properly with protection. In the case of bugs in the protector itself, programs should be built with protection in a debugging environment so that the bugs can be analyzed and the protector fixed; however, deployment to the user base must still remain unprotected.

There are many ways the protector could be varied to be more controlled, or to be more useful in providing information. Such variations may exist in user space; or they may bring SSP functions into the kernel itself. They are a design decision to be handled by those deploying SSP, although recommendations will be made.

A. Kernel Side SSP

One interesting approach would be to implement the protector code in the kernel. This would place the protector's guard value, __guard, and the check against __guard in the kernel. The guard value would have to be accessed via syscalls, as would the protection checking. This would potentially incur greater overhead than storing it in the heap; but if done properly it would protect the exact location of the guard value from attackers who know its static offset from the base of whatever library supplies it.

If SSP is integrated with the kernel, the user space side could still copy the guard value into user space. Utilizing mmap() and ASLR to map a single page of memory at a random virtual memory address, the guard value could be loaded into a random location in memory. This still leaves the fault that the address is stored in a specific symbol supplied by a given library, and can be located in the GOT.

The above technique can protect the guard value itself by mprotect()ing the mmap()ed page that holds __guard. The page would be PROT_NONE when not in use, and PROT_READ when the guard value is being checked. This would need to be done via a supplied function which locks a mutex, calls mprotect(PROT_READ), gets the value, calls mprotect(PROT_NONE), and returns the value. This would, of course, be more overhead than simply reading the guard value each time using a syscall; mprotect() is itself a syscall, so this would be guaranteed to be at least twice as slow, and more so due to the extra code calling mprotect(). It is probably wisest to simply keep the guard value in the kernel if this route is taken. This would allow the kernel to seed and check the __guard value, and to log the abort of the program itself. Possible pseudocode transformations are shown in Fig. 3.1-1.

        /* unprotected */
        int foo()
        {
                char a[15];
                /* insert code here */
                return 0;
        }

        /* protected */
        int foo()
        {
                int local_guard;
                char a[15];

                local_guard = sys_getguard();   /* set our guard value */
                /* insert code here */
                sys_stack_smash_handler(local_guard, __FUNC__);   /* check our guard */
                return 0;
        }

Fig. 3.1-1

In the above, local_guard and sys_stack_smash_handler() are placed in the same way that the full user space implementation of SSP places them. The difference is that local_guard is seeded with a syscall, sys_getguard(), and that the guard check is done by a syscall, sys_stack_smash_handler(). In the kernel, task initialization would be augmented to pull a random piece of data from the system entropy pool and attach it to the task_struct for the process. The sys_getguard() syscall would simply return a copy of this value to user space.
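For the mmap()-and-mprotect() variant described a few paragraphs above, the user space accessor might look roughly like the following sketch. It is hypothetical: init_guard() and get_guard() are invented names, and the seed would in practice come from the kernel rather than a constant.

        #include <pthread.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        static void *guard_page;        /* PROT_NONE except while being read */
        static pthread_mutex_t guard_lock = PTHREAD_MUTEX_INITIALIZER;

        /* Map one page at a (randomized, under ASLR) address, store the
         * guard value in it, then make it inaccessible. */
        static int init_guard(int seed)
        {
                guard_page = mmap(NULL, (size_t)getpagesize(),
                                  PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (guard_page == MAP_FAILED)
                        return -1;
                memcpy(guard_page, &seed, sizeof(seed));
                return mprotect(guard_page, (size_t)getpagesize(), PROT_NONE);
        }

        /* Lock, briefly expose the page read-only, copy the value out, and
         * lock the page down again -- the sequence described in the text. */
        static int get_guard(void)
        {
                int value;

                pthread_mutex_lock(&guard_lock);
                mprotect(guard_page, (size_t)getpagesize(), PROT_READ);
                memcpy(&value, guard_page, sizeof(value));
                mprotect(guard_page, (size_t)getpagesize(), PROT_NONE);
                pthread_mutex_unlock(&guard_lock);

                return value;
        }

        int main(void)
        {
                if (init_guard(0x5eed) != 0)
                        return 1;
                return get_guard() == 0x5eed ? 0 : 1;
        }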
sys_stack_smash_handler() would check the passed local_guard against the value stored in the task_struct, and would kill the task and log the function and local_guard if there was a mismatch. The code in Fig. 3.1-2 illustrates.

        int sys_getguard()
        {
                return current->guard_value;
        }

        int sys_stack_smash_handler(int local_guard, char *fcn)
        {
                if (local_guard == current->guard_value)
                        return 0;       /* we're fine, they match */

                printk("Stack smashing attack in process %i in function %s;"
                       " Task aborted with damage %i\n",
                       current->pid, fcn, local_guard);
                kill_task(current);
                return -EFAULT;
        }

Fig. 3.1-2

One thing the kernel side variation could supply over a full user space implementation is the ability to control SSP via a program header marking. Currently, PT_PAX_FLAGS allows tristates for 6 flags. Assuming each tristate uses 2 bits--default (00), on (01), off (10)--this makes 12 bits used in total. The minimum that the PT_PAX_FLAGS header can hold in this case would be 16 bits, room enough for 2 more tristates. The `paxctl` program could easily be altered to take -gG switches to disable/enable checking of __guard.

It is important to note that moving __guard into the kernel will not prevent attackers from reading the copies of __guard on the stack. No risk analysis has been done to determine how likely it is that an exploit could be crafted to trace the location of __guard from the GOT in the presence of ASLR, nor how likely it is that an exploit could determine the address of any given copy of __guard on the stack. For this reason, it is indeterminate at best whether kernel-side handling of SSP would be a significant enhancement in security.

B. Kernel Handled SSP

Control over SSP can be achieved without a full kernel implementation. The responsibility of killing the task can be left to the kernel, which could decide, based on the tristate mentioned above, whether to actually kill or just log. This would only require that __stack_smash_handler() hand the event to the kernel via a syscall. With this method, gcc would not need additional modifications to the protector code; the current SSP implementation could be altered to do the syscall from __stack_smash_handler(), which is already used to handle a detected attack. Fig. 3.1-3 illustrates possible variations of the protected foo(), the syscall sys_stack_smash_handler(), and the __stack_smash_handler() function.

        /* protected, with kernel kill responsibility */
        int foo()
        {
                int local_guard = __guard;
                char a[15];

                /* insert code here */

                if (local_guard != __guard)
                        __stack_smash_handler(local_guard, __FUNC__);   /* bad guard */
                return 0;
        }

        int __stack_smash_handler(int damage, char *fcn)
        {
                return sys_stack_smash_handler(damage, fcn);
        }

        int sys_stack_smash_handler(int local_guard, char *fcn)
        {
                printk("Stack smashing attack in process %i in function %s;",
                       current->pid, fcn);
                if (current->flags & PT_PAX_GUARD) {
                        printk(" task aborted with damage %i\n", local_guard);
                        kill_task(current);
                } else
                        printk(" task continuing with damage %i\n", local_guard);
                return -EFAULT;
        }

Fig. 3.1-3

In both cases, the damaged __guard is reported for debugging reasons. This is not a serious information leak. If the task is to be killed, salvageable parts of __guard after a partial overwrite are useless; if the task is going to continue, then any attacker has no use for the __guard value anyway. It is important that a good entropy source be used, however, to prevent local attackers from seeking patterns by overwriting only the first byte of __guard in multiple attacks.
With a protector implemented as such, current programs built with SSP will use sys_stack_smash_handler() automatically, as it is called by the already in use __stack_smash_handler() supplied by SSP. This is a definite advantage over full Kernel Side SSP.

C. Information Gathering Augmentation

LD_PRELOAD libraries could augment SSP to gather debugging information and automatically pool it on a remote server. This would be bad form security-wise for a mission critical server; but for less critical environments that can allow for it, it can aid in discovering buggy programs and new exploits. Such environments may realistically include home and business desktops; but it is important to leave this as an opt-in option only.

Because SSP catches exploits before they are triggered by a return, it is theoretically not a security issue to make function calls which package debugging data and ship it out to a server, or dump it to a file. When these functions have finished, SSP will abort the program at a point still preempting the return from the exploited function. Still, these actions may package up sensitive information, or may themselves call vulnerable code, and thus are not appropriate to deploy by default. The addition of information gathering systems would be possible for all SSP implementations; any implementation can use __stack_smash_handler(), even if it simply makes a syscall by default.

The library or libraries supplying SSP symbols, and all symbols needed by SSP, must be placed in /lib. Because system-critical programs such as mount and init require libraries in /lib, /lib is always on the / partition. Thus, a library in /lib is always accessible, unlike libraries in /usr, which may exist on a separate partition in some installations.

Currently, the IBM implementation of SSP places __guard and __stack_smash_handler() in libgcc, which commonly exists somewhere below /usr. Programs such as gzip exist in /bin, with libz in /lib. These programs and libraries may be vulnerable to buffer overflow based attacks, and would benefit from SSP. Without /usr mounted, such programs cannot function with SSP; yet these programs may be needed in exactly such cases, for example to decompress a compressed backup of /usr.

Some implementations of the IBM Stack Smash Protector, such as that used by Gentoo Linux, modify it to place the __guard and __stack_smash_handler symbols in glibc. It has also been discussed that these symbols could be placed in their own library to decrease the memory footprint of protected programs not using glibc; however, this route would prevent the library from using calls such as syslog() to inform users and developers about the fault.

It would be possible to provide multiple SSP symbol implementations, especially since some programs compile against libc replacements such as uClibc. Managing which is used, and where SSP is used, would be an important task in this case. For example, uClibc and glibc would need to be patched to supply __guard and __stack_smash_handler; programs using other libraries would need to either not have SSP, or have a stripped down SSP in a libssp. However, such programs may use their own __guard and __stack_smash_handler, or may get them from yet another libc clone. The task of handling alternative libc implementations is fortunately neither extremely complex nor extremely common.
Most programs will invariably use the system's default libc implementation, which makes case-by-case handling infrequent; such handling is, in any case, for all intents and purposes easy.

This paper recommends Kernel Handled SSP, with a tristate in the program header to control enforcement, as illustrated by Fig. 3.1-3 above. A library in /lib should supply __guard and __stack_smash_handler(), whether it be libc itself or a stand-alone libssp. There are other variations not covered here. These may be interesting to research, as they may reveal ways to prevent the leaking of the __guard value to an attacker, or may present other advantages.

To deploy SSP, the gcc specs file can be modified[21] to include the use of -fstack-protector by default. This is done by the Hardened Gentoo project, for example. A variation on the affected section from the Hardened Gentoo version of the specs file is shown in Fig. 3.1-4, including the modification to generate PIE binaries.

        *cc1: %(cc1_cpu) %{profile:-p} %{m32: %{!msse2:-mno-sse2} } \
        %{!D__KERNEL__: %{!static: %{!fno-PIC: %{!fno-pic: %{!shared: \
        %{!nostdlib: %{!nostartfiles: %{!fno-PIE: %{!fno-pie: %{!nopie: \
        %{!fPIC:%{!fpic:-fPIE}}} } } } } } } } } %{!nostdlib: \
        %{!fno-stack-protector: -fstack-protector } } }

Fig. 3.1-4

Changing the specs file system-wide breaks isolation: two tasks which normally would have no effect on one another can now interact through the shared specs, and the added flags may cause some programs to fail to compile. In order to fix this, an environment variable should be used to control which specs file gcc uses. CFLAGS can supply -specs=/path/to/specs for this; or a separate environment variable such as GCC_SPECS can hold the path to the specs file. The second solution is being explored by Gentoo Linux at the time of this writing. Because environment variables are inherited as new environments are created, such as during the run of a build process, they are prime candidates for holding such information. However, some build systems ignore CFLAGS. By modifying gcc to check the GCC_SPECS environment variable, the best guarantee that the proper specs will be used is achieved.

On a final note, some versions of the protector are known to have the protection code optimized away at -O3. This should be fixed in the IBM Stack Smash Protector as soon as possible; in the mean time, avoid -O3.

2. Executable Space Protections

Executable Space Protections can be deployed on many architectures using PaX. A number of deployment methods could be used, each striking its own balance of security versus compatibility. The recommended course of action is to allow the administrator to control how protections are applied, either by setting an automatic default method or by being asked where protections should be applied on a case by case basis. Any binary which can function under full restrictions should be set to function under full restrictions automatically, without asking. There may be an option to ask the administrator in every case, including those where the greatest security is used by default; but in most cases, the administrator will not want to be bothered unless a security concern is raised.

There are three states for each restriction. In the Default state, the restriction is not explicitly enabled or disabled; PaX decides whether to apply the restriction based on the Softmode setting.
If the system is in Softmode, PaX does not enable restrictions in the Default state; if the system is not in Softmode, PaX enables restrictions in the Default state. In contrast, restrictions in the Enabled state are enabled under PaX regardless of Softmode, while restrictions in the Disabled state are disabled under PaX regardless of Softmode.

Here, the term "compatibility" is used to indicate how much software does not work. A system with low compatibility will have software that does not run due to security restrictions, while a system with high compatibility will run most if not all software, including third party software.

There are four basic methods of PaX flag control, each detailed briefly below. As stated above, the administrator should choose which method to employ.

A. Manual Control

Manual Control is not recommended as a default. Under Manual Control, all restrictions remain in the Default state on all binaries at installation time. This imposes the most added administrative duty and the least compatibility.

B. Selective Disable

Selective Disable is the most basic form of control, allowing the implementation to ship with everything working. Under Selective Disable, binaries known to break due to PaX restrictions have those restrictions set to the Disabled state when installed, leaving the rest in the Default state. This relieves most administrative duty and increases compatibility, although third party binaries may not come marked.

C. Inheritive Selective Disable

Inheritive Selective Disable is similar to Selective Disable, except that libraries are also marked and tabs are kept on these markings. When software is installed which uses a library, the Disabled features of the executable and of each library are masked together to come up with the final mask to apply to the executable (illustrated in the sketch below). These masks can later be generated for third party programs with an administrative tool in order to enhance compatibility further; although third party programs and libraries requiring markings of their own, not also needed by other libraries, will still break.

D. Selective Enable

Selective Enable is the only method leveraging Softmode to enhance compatibility. It is also the only method which will leave third party binaries completely exposed, for no reason other than that they are not explicitly packaged with a set of listed restrictions. Under Selective Enable, executable binaries have all restrictions except those known to break them set to Enabled, leaving the rest in the Default state. Third party binaries which come with no markings will have no restrictions in Softmode, and so full compatibility is reached with the maximum justifiable trade-off in the range of executables protected by PaX.

The above methods become progressively more compatible, but at the same time less secure. Both the standard and the Inheritive variations of the Selective Disable method are about on par in principle; the administrator will obviously disable protections on third party software that breaks, so attempting to pre-empt this by identifying what absolutely will break the software may prevent shotgunning (disabling everything that could possibly cause a problem). This is a best-effort compatibility model; anything obviously incompatible is adjusted, but some things may be missed. The Selective Enable method, on the other hand, takes compatibility out to its edge and switches completely to a best-effort security model, where anything obviously compatible is secured, but anything not defined is left alone to avoid breakage.
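The masking step in Inheritive Selective Disable might look roughly like the following sketch. It is hypothetical: the flag names and the idea of reading per-file markings into integers are illustrative, and do not match the real PaX or paxctl encoding.

        #include <stdio.h>

        /* Illustrative flag bits for restrictions that can be force-disabled.
         * These names are hypothetical; they do not match PaX's real format. */
        enum {
                DISABLE_PAGEEXEC = 1 << 0,   /* non-executable pages */
                DISABLE_MPROTECT = 1 << 1,   /* mprotect() restrictions */
                DISABLE_RANDMMAP = 1 << 2,   /* mmap() base randomization */
        };

        /* Inheritive Selective Disable: the final mask applied to the
         * executable is the union of its own disabled flags and those of
         * every library it loads. */
        unsigned int inherit_disabled(unsigned int exe_flags,
                                      const unsigned int *lib_flags, int nlibs)
        {
                unsigned int final = exe_flags;

                for (int i = 0; i < nlibs; i++)
                        final |= lib_flags[i];
                return final;
        }

        int main(void)
        {
                /* The executable itself needs nothing disabled, but one of
                 * its libraries needs mprotect() restrictions turned off. */
                unsigned int libs[] = { 0, DISABLE_MPROTECT };
                unsigned int mask = inherit_disabled(0, libs, 2);

                printf("final disabled mask: 0x%x\n", mask);
                return 0;
        }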
Microsoft's implementation of NX protections on 64 bit Windows XP SP2 allows a mode akin to Selective Enable, in which only core system software is protected[22].

Whatever method is chosen, third party vendors can mark their binaries to explicitly enable and disable protections, making them compatible under every method. If this additional information could be guaranteed from third party vendors, the Inheritive Selective Disable method could be completed by masking the still Enabled protections with the inherited Disabled protections to produce the final mask, giving guaranteed compatibility with least privileges.

To use PaX, the Linux kernel must be patched and rebuilt. Fig. 3.2-1 shows settings for PaX that may be used.

        Security options->PaX
          [*] Enable various PaX features
        Security options->PaX->PaX Control
          [*] Support soft mode
          [*] Use legacy ELF header marking
          [*] Use ELF program header marking
              MAC system integration (none) --->
        Security options->PaX->Non-executable pages
          [*] Enforce non-executable pages
          [*] Paging based non-executable pages
          [*] Restrict mprotect()
          [ ] Disallow ELF text relocations

Fig. 3.2-1

The settings shown in Fig. 3.2-1 are derived from those available on the AMD x86_64 architecture. Other architectures with hardware-based NX support will appear similar; however, the 32 bit x86 architecture will have significant differences, most importantly a second method of non-executable page enforcement and the ability to emulate trampolines. The affected sections are shown in Fig. 3.2-2.

        Security options->PaX->Non-executable pages
          [*] Enforce non-executable pages
          [*] Paging based non-executable pages
          [*] Segmentation based non-executable pages
              Default non-executable page method (SEGMEXEC) --->
          [*] Emulate trampolines
          [*] Restrict mprotect()
          [ ] Disallow ELF text relocations

Fig. 3.2-2

It is important that those deploying PaX read the help on the options not covered here, and preferable that they read the help on all options. One important example is the "Automatically emulate ELF PLT" option on several platforms. This option's help screen explains that it may be impossible to even boot the system without it, but that the preferred solution would be to rebuild binaries with a toolchain which produces non-writable PLTs.

3. Address Space Layout Randomization

Address Space Layout Randomization can also be applied to multiple architectures using PaX. To further enhance the randomization, Position Independent Executables (PIE) should be used. These can be built using the modified *cc1 section of the gcc specs file, as shown in Fig. 3.1-4. To build a kernel with ASLR supplied by PaX, the kernel must be patched with PaX and certain options enabled. Fig. 3.3-1 shows which options to enable for x86_64, while Fig. 3.3-2 shows which options to enable for x86.

        Security options->PaX
          [*] Enable various PaX features
        Security options->PaX->Address Space Layout Randomization
          [*] Address Space Layout Randomization
          [*] Randomize user stack base
          [*] Randomize mmap() base
          [*] Randomize ET_EXEC base

Fig. 3.3-1

        Security options->PaX
          [*] Enable various PaX features
        Security options->PaX->Address Space Layout Randomization
          [*] Address Space Layout Randomization
          [ ] Randomize kernel stack base
          [*] Randomize user stack base
          [*] Randomize mmap() base
          [*] Randomize ET_EXEC base
          --- Disable the vsyscall page

Fig. 3.3-2

As always, it is wise for those deploying ASLR to read the help for each option.
Especially noteworthy here is the "Randomize kernel stack base" option on x86, which may cause unexpected stack overflows. It cannot be disabled on a per-process basis, so it should only be enabled after careful testing and with the understanding that third party software may still break irreparably.

The vsyscall page is automatically disabled on x86 when PAGEEXEC is built. The logic of PAGEEXEC allows an enhancement which gains much speed over the original method when tasks are protected with PAGEEXEC; however, this enhancement cannot protect data below the highest executable page, in which case the old method must be fallen back to. Because the vsyscall page places executable code at a very high address, it would cause PAGEEXEC to always fall back to the original method, incurring a large level of overhead. For this reason, the PaX team has decided to disable the vsyscall page on x86 whenever PAGEEXEC is built in. Some versions of glibc break without the vsyscall page, leaving the system non-bootable. Debian bug #245563[23] was opened for this reason, and was closed when GOTO Masanori backported the proper behavior to glibc 2.3.2.

4. Information Leak Containment

There will always be information leaking bugs. The best that can be done is to find and fix them, while containing basic information leaks with patches such as GrSecurity. While bugs are an unexpected phenomenon which cannot be controlled, basic information leaks in expected operation are both highly predictable and easily contained.

GrSecurity brings with it several information leak controls. One interesting one is the "Remove addresses from /proc/<pid>/[maps|stat]" option, which obscures the address space layout of processes. This makes it impossible for attackers to assess the organization of a program's address space without making non-guaranteed guesses in most if not all cases. This is an important complement to ASLR. Other options that may prevent information leaking include the "Hide kernel symbols" option; the "Proc restrictions" and related options; the "Dmesg(8) restriction" option; and various randomization options such as "Randomized PIDs," "Truly random TCP ISN selection," "Randomized IP IDs," "Randomized TCP source ports," and "Randomized RPC XIDs." These have varied effects, some as simple as preventing automated scripts from discovering which OS is running; while others, such as the combination of /proc restrictions and randomized PIDs, leave attackers without a starting point for certain attacks such as ptrace() based exploits.

5. Format String Vulnerabilities

Format String Vulnerabilities have not yet seen a perfect solution. Technologies such as FormatGuard should be examined and improved; but a full solution may never be ready.

6. Fork Bomb Defusers

The rexFBD fork bomb defuser should be improved and ported to the latest Linux kernel. Because fork bombs cannot be effective while evading detection, fork bomb detection and defusing need not be precise. Also, some tasks may legitimately fork() a lot, and may be killed by the fork bomb defuser; these tasks will need a control on them to prevent this, such as a tristate in PT_PAX_FLAGS. It should be noted that rexFBD has not been examined in detail, and so this paper may recommend that it do things which it already does. These guidelines are still valid for new implementations.

The FBD should be both time sensitive and fork() count sensitive. It should count forks, and reset the count every second if it has not reached max_forks_per_second.
This reset should also reset the timer. Fig. 3.6-1 illustrates pseudocode with this logic.

        /*
         * Check for a fork bomb.
         * This won't even get called if the fork bomb defuser is disabled
         * for this task; why waste the cycles if you're just not going to
         * do anything.
         */
        int fork_check()
        {
                /* We want the highest parent in the process tree with the
                 * same image, without skipping any in between. */
                task_t *fork_parent = top_parent_with_image(current);

                if (time() - fork_parent->forkbomb_time > 1*SECOND) {
                        fork_parent->forkbomb_time = time();
                        fork_parent->forks = 1;
                        return 0;       /* eh, we're fine */
                }

                fork_parent->forks++;

                /* Check if too much is going on, kill if so */
                if (fork_parent->forks > max_forks_per_second) {
                        kill_task_tree(fork_parent);
                        return -1;
                }

                return 0;       /* still fine */
        }

Fig. 3.6-1

Although defusing fork() bombs is important, there is also the issue of sudden memory allocation. Because different programs have different demands, it is difficult to tune a system to kill sudden high order allocations. A task which normally uses 16M and suddenly allocates 600M would be killed, as it would likely be a runaway; however, a filesystem browser which suddenly allocates 400M to display a directory listing of 1,000,000 files would be killed for a legitimate allocation. This problem may deserve some research.

A fork bomb defuser may also test whether a program has a certain program header flag set. This could be added to the PaX PT_PAX_FLAGS header and controlled via paxctl -fF; the paxctl tool would need modification for this.

References

[1] http://www.research.ibm.com/trl/projects/security/ssp/main.html
[2] http://www.immunix.com/technology/compiler.php
[3] http://www.research.ibm.com/trl/projects/security/ssp/
[4] http://gcc.gnu.org/
[5] http://www.coresecurity.com/common/showdoc.php?idx=242&idxseccion=11
[6] http://pax.grsecurity.net/docs/pax.txt
[7] http://pax.grsecurity.net/docs/mprotect.txt
[8] http://www.kaffe.org/pipermail/kaffe/2004-October/099938.html
[9] http://savannah.nongnu.org/projects/qemu/
[10] http://pax.grsecurity.net/
[11] http://www.openbsd.org/33.html
[12] http://en.wikipedia.org/wiki/ASLR
[13] http://pax.grsecurity.net/docs/aslr.txt
[14] http://people.redhat.com/mingo/exec-shield/
[15] http://en.wikipedia.org/wiki/Security_by_obscurity
[16] http://grsecurity.net/
[17] http://www.usenix.org/events/sec01/cowanbarringer.html
[18] http://rexgrep.tripod.com/rexfbd.htm
[19] http://disec.sourceforge.net/
[20] http://icat.nist.gov/icat.cfm?function=statistics
[21] http://www.gentoo.org/proj/en/hardened/etdyn-ssp.xml?style=printable
[22] http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2mempr.mspx
[23] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=245563