ELF Internals - Objects, Executables, and Libraries

Previously, in “Why Userspace ELF Loading isn’t Stealthy”, I wrote about my attempt to replicate a Windows-style stealthy execution technique by building a custom ELF loader. That article focused on what can be observed when using such a loader and highlighted its limitations, but it didn’t go into what an ELF loader actually is or why it works the way it does. While that piece was likely most useful to developers with a Windows background and some Linux familiarity, I think it’s valuable to step back and examine the ELF format itself. This will form the foundation for a short series, starting from the basics of ELF, through PIE and non-PIE executables, and eventually moving into Windows PE internals.

Define ELF #

Executable and Linkable Format (ELF) is a standard binary file format for Unix and Unix-like systems. Every ELF file begins with an obligatory ELF header (Elf32_Ehdr/Elf64_Ehdr), which identifies the file as an ELF object and provides essential metadata, including the target architecture, endianness, word size (32-bit or 64-bit), and the type of file (executable, shared library, or object file). Besides that, the header contains offsets to other structures within the file: the program header table (an array of Elf32_Phdr/Elf64_Phdr entries), which defines how the binary is mapped into memory, and the section header table (an array of Elf32_Shdr/Elf64_Shdr entries), which describes how sections like .text, .data, and .bss are organized. A visual representation can be found in the corkami project - elf101

Technically, an ELF file without a program header table (Phdr) or a section header table (Shdr) can still be valid, depending on what the file is used for.

The ELF header (Ehdr) is the only structure every ELF file must have. An executable or shared library also needs a Phdr, because it tells the loader which segments to map into memory. The Shdr is optional at runtime and is used primarily for linking and debugging, which is why a relocatable object file carries one even though it has no Phdr.

.o .out .so #

For the following explanations I will use this short C program payload.c:

#include <stdio.h>  
#include <stdlib.h>  
  
int main(int argc, char *argv[], char *envp[]) {  
   printf("[*] Hello payload!\n");  
   printf("[*] Received %d arguments.\n", argc);  
  
   for(int i = 0; i < argc; i++) {  
       printf("    argv[%d]: %s\n", i, argv[i]);  
   }  
  
   if (envp[0]) {  
       printf("[*] First environment variable: %s\n", envp[0]);  
   }  
  
   return 69;  
}

Object files #

These files, usually with a .o extension, contain machine code (.text), data (.data), and symbols (.symtab), but they cannot run on their own: they lack a program header table (Phdr), and references to external symbols such as printf are still unresolved. These files can be created with gcc -c (compile and assemble, but do not link).

Example after compiling with the -c flag:

$ file payload.o
payload.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
  • relocatable - ET_REL
  • LSB - little-endian
  • x86-64 - target architecture
  • version 1 - ELF version
  • SYSV - ABI (System V)
  • not stripped - symbol table present (.symtab)

Note: For many more details readelf would be better, though I will continue using file, because its output is enough for the scope of this article.

If you strip the file, .symtab and .strtab will disappear. The section header table (Shdr) may remain, but some tools can remove it entirely. The Shdr is needed for linking, but not for runtime execution.

Since this is a relocatable object (ET_REL), you need to link it. The linker (ld, usually invoked by gcc) takes one or more object files and combines them into an executable.

Example:

$ gcc payload.o -o payload

Executables #

These are ELF files that can be directly run by the kernel. Unlike object files, they contain all the information needed to start a process. Executables contain the entry point, the necessary segments containing code and initialized/uninitialized data, and a program header table (Phdr), which tells the kernel which segments should be mapped into memory and with what permissions. After linking the previous object file payload.o into payload, we see a pie executable - ET_DYN (instead of relocatable - ET_REL), which is dynamically linked:

$ file payload  
payload: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=c7b15a1e169780efdf1a65815b07f58373fec1ce, for GNU/Linux 3.2.0, not stripped

$ ./payload
[*] Hello payload!  
[*] Received 1 arguments.  
   argv[0]: ./payload  
[*] First environment variable: SHELL=/bin/bash

I will write more on statically/dynamically linked PIE/non-PIE executables in the articles that follow.

Shared Objects #

Shared objects are ELF files of type ET_DYN, usually with a .so extension. They contain code and data intended to be linked with other programs at runtime, rather than at build time. They allow multiple processes to share the same physical memory for the read-only parts of common libraries (like libc), saving resources.

Both PIE executables and shared objects are ET_DYN. The difference is how they are intended to be used, not their ELF type.

For instance, if I attempt to create a shared object from the above code like this:

$ gcc payload.c -o libpayload.so

I will still get a pie executable (the extension is completely irrelevant):

$ file libpayload.so    
libpayload.so: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=b1dbd97462f8fc20da006dc7ee2b3f869328594e, for GNU/Linux 3.2.0, not stripped

$ ./libpayload.so    
[*] Hello payload!  
[*] Received 1 arguments.  
   argv[0]: ./libpayload.so  
[*] First environment variable: SHELL=/bin/bash

A conventional shared library does not define main and is linked with the -shared flag. Linking the same payload.c that way gives:

$ gcc -shared payload.c -o libpayload.so

$ file libpayload.so
libpayload.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=0ac7932048a12f46af0867367a82484be82ad11f, not stripped

$ ./libpayload.so    
Segmentation fault (core dumped)

So, is it impossible to run a library?

Effectively, the -shared flag stripped away the executable characteristics. A shared object like this cannot be executed directly, because it has no interpreter entry (.interp), so the kernel doesn’t know it should load the dynamic linker (ld-linux). The startup code that provides _start isn’t linked in either, so the entry point in the ELF header does not point to real startup code (depending on the linker, it is 0 or the start of .text). The library assumes that its caller, not the kernel, has set up the process state. When you run it anyway, the kernel maps the segments and jumps straight to the address specified in the ELF header. The CPU then executes whatever bytes are there, with no relocations applied and no runtime initialization, resulting in an immediate segmentation fault.

So, shared objects are meant to be loaded by another program, not the kernel. Even if they define main(), without linking as an executable, the kernel cannot run them.

However, you can engineer a shared object that is also a valid executable. The best-known example is libc:

$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu GLIBC 2.39-0ubuntu8.6) stable release version 2.39.  
Copyright (C) 2024 Free Software Foundation, Inc.  
This is free software; see the source for copying conditions.  
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A  
PARTICULAR PURPOSE.  
Compiled by GNU CC version 13.3.0.  
libc ABIs: UNIQUE IFUNC ABSOLUTE  
Minimum supported kernel: 3.2.0  
For bug reporting instructions, please see:  
<https://bugs.launchpad.net/ubuntu/+source/glibc/+bugs>.

The glibc (GNU C Library) build process links libc.so with a custom entry point and explicitly defines an interpreter path, effectively creating a hybrid ET_DYN that works both ways.

To see how a shared object is normally used, create a new file payload_lib.c:

#include <stdio.h>  
  
void payload_run(void) {  
   printf("[*] Function from libpayload.so executed\n");  
}

New file payload_call.c:

#include <stdio.h>  
  
void payload_run(void);  
  
int main(void) {  
   printf("[*] Calling function from shared library\n");  
   payload_run();  
   return 0;  
}

Now compile and execute:

$ gcc -shared payload_lib.c -o libpayload.so  
$ file libpayload.so
libpayload.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=62c81b235174fc966b4dc79233b151b9ed620928, not stripped

$ gcc payload_call.c -L. -lpayload -o payload_call  
$ file payload_call
payload_call: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=06fa46ecd5268b95fdfc1c3454c243ba0e54d0a7, for GNU/Linux 3.2.0, not stripped

$ LD_LIBRARY_PATH=. ./payload_call    
[*] Calling function from shared library  
[*] Function from libpayload.so executed  

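By default, when the kernel starts payload_call, it loads the interpreter (ld-linux) named in .interp. Then ld-linux searches only a few locations to find libpayload.so, among them any runtime path embedded in the binary, the directories in LD_LIBRARY_PATH, the ld.so cache (/etc/ld.so.cache), and the default system library directories. The current directory is not one of them, so to tell the dynamic linker (ld-linux) where to search at runtime, we use LD_LIBRARY_PATH. This is okay for testing, but not for production.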

A better practice is to embed a runtime search path in the binary itself by passing -rpath to the linker through -Wl:

$ gcc payload_call.c -L. -lpayload -Wl,-rpath,'$ORIGIN' -o payload_call
$ ./payload_call
[*] Calling function from shared library  
[*] Function from libpayload.so executed  

$ORIGIN expands to the directory containing the executable, so no environment variables are needed. Many real-world applications ship their bundled .so files this way.

What is next #

This article intentionally stopped at what ELF binaries are and how executables and shared objects differ at a structural level. The next articles in this series will cover PIE vs non-PIE: what actually changes between ET_EXEC and ET_DYN, how ASLR applies in each case, and why modern Linux defaults to PIE. They will also cover what gets resolved at link time versus runtime, what the dynamic linker is responsible for, and how relocation actually occurs.

Also, there are more interesting nuances related to what we did when creating the final shared object, and I plan on doing an article about that as well. It demonstrates a dynamic loader security issue that is categorized as a MITRE ATT&CK technique.


Quickly on AI-hallucinated knowledge #

When reading technical details about how Linux works (this probably applies to most content online nowadays), treat the following as red flags:

  • Invented function names that only sound plausible
  • Collapsing multiple call layers into fake high-level summaries
  • Mixing historical kernels and academic descriptions
  • Mentioning functions that cannot be found in the kernel via grep
  • Avoiding concrete call graphs or code references
  • Relying on phrases like “high level actions” instead of actually showing the control flow

Any explanation that invents functions or obscures the call chain should be treated as untrusted. Read kernel code. Verify claims. And if I get something wrong in the articles that follow, correct me.

Get Involved #

I think knowledge should be shared and discussions encouraged. So, don’t hesitate to ask questions, or suggest topics you’d like me to cover in future posts.

Stay Connected #

You can contact me at ion.miron@tutanota.com