Outside of a dog, a book is a man's best friend. Inside a dog it's too dark to read. | |
Groucho Marx |
Let's get a bit more serious and examine the assembly program from In the language of evil with readelf, part of the binutils package.
Command.
#!/bin/sh strip tmp/evil_magic/nasm ls -l tmp/evil_magic/nasm readelf -l tmp/evil_magic/nasm |
Output.
-rwxrwxr-x 1 alba alba 476 Apr 10 00:03 tmp/evil_magic/nasm Elf file type is EXEC (Executable file) Entry point 0x8048080 There are 1 program headers, starting at offset 52 Program Header: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x000000 0x08048000 0x08048000 0x00097 0x00097 R E 0x1000 Section to Segment mapping: Segment Sections... 00 .text |
Nice to see the entry point we retrieved through od again. Program layout is a simplified variation of Sort of an answer. The value of FileSiz includes ELF header and program header. The size of this overhead is:
So effective code size is:overhead = Entry point - VirtAddr = 0x08048080 - 0x8048000 = 0x80 bytes
This matches with the disassembly listing. However, the ratio of file size to effective code deserves the title "Bloat", with capital B.code size = FileSiz - overhead = 0x97 - 0x80 = 0x17 = 23 bytes
Only 5 percent of the file actually do something useful!code size / file size = 23 / 476 = 0.048
Anyway, we see that even for trivial examples the code is surrounded by lots of other stuff. Let's zoom in on our target.
Command.
#!/bin/sh ls -l /bin/bash readelf -l /bin/bash |
Output.
-rwxr-xr-x 1 root root 519964 Jul 9 2001 /bin/bash Elf file type is EXEC (Executable file) Entry point 0x8059380 There are 6 program headers, starting at offset 52 Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align PHDR 0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4 INTERP 0x0000f4 0x080480f4 0x080480f4 0x00013 0x00013 R 0x1 [Requesting program interpreter: /lib/ld-linux.so.2] LOAD 0x000000 0x08048000 0x08048000 0x79273 0x79273 R E 0x1000 LOAD 0x079280 0x080c2280 0x080c2280 0x057e0 0x09bd0 RW 0x1000 DYNAMIC 0x07e980 0x080c7980 0x080c7980 0x000e0 0x000e0 RW 0x4 NOTE 0x000108 0x08048108 0x08048108 0x00020 0x00020 R 0x4 Section to Segment mapping: Segment Sections... 00 01 .interp 02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.got .rel.bss .rel.plt .init .plt .text .fini .rodata 03 .data .eh_frame .ctors .dtors .got .dynamic .bss 04 .dynamic 05 .note.ABI-tag |
Looks intimidating. But then the ELF specification says that only segments of type "LOAD" are considered for execution. Since the flags of the first one are R E, meaning "read & execute", we know that it must be the code segment. The other one has RW, meaning "read & write", so it must be the data segment.
MemSiz is larger than FileSiz in the data segment. Just like with mmap(2) excessive bytes are defined to be initialized with 0. The linker takes advantages of that by grouping all variables that should be initialized to zero at the end. Note that the last section of segment 3 (counting starts with 0) is called .bss, the traditional name for this kind of area.
The mapping for segment 2 looks even more complex. But I would guess that .rodata means "read-only data" and .text contains productive code, as opposed to the administrative stuff in the other sections.
The distance between the two LOAD segments is interesting:
VirtAddr[2] - VirtAddr[1] - FileSiz[1] = 0x80c2280 - 0x8048000 - 0x79273 = 0x100d = 4109 bytes
Only 13 bytes (0xd) would be needed to align the first LOAD segment up to the alignment of 0x1000. For some reason at least one complete page lies between code segment and data segment. This would be easy target for a tiny virus. So lets check out whether this is a unique phenomenon.
Source.
#!/usr/bin/perl -w use strict; my $min = 0xFFFFFFFF; my $max = 0; LOOP: while(my $filename = <>) { chomp $filename; $filename =~ s/^\s*//; if ( ! -e $filename ) { printf "Can't find [%s]\n", $filename; next LOOP; } open(ELF, '-|', "readelf -l $filename 2>&1") || die "$1 ($filename)"; my $nrLoad = 0; my $end = 0; while(my $line = <ELF>) { chomp $line; if ($line =~ m/^\s*LOAD\s*/) { $nrLoad++; my @number = split / +/, $line; my $virtaddr = hex($number[3]); my $filesiz = hex($number[5]); if ($end != 0) { my $dist = $virtaddr - $end; if ($dist < 0x1000) { printf "%-32s virtaddr=%#08x dist=%#08x\n", $filename, $virtaddr, $dist; } $max = $dist if ($dist > $max); $min = $dist if ($dist < $min); } $end = $virtaddr + $filesiz; } } if ($nrLoad != 2) { printf "%-32s has %d LOAD segments.\n", $filename, $nrLoad; } close ELF; } printf "\n%d files; min_distance=%#08x max_distance=%#08x\n", $., $min, $max; |
Command.
#!/bin/sh find /bin -type f -maxdepth 1 | src/check_dist/check_dist.pl echo "" echo tmp/evil_magic/nasm | src/check_dist/check_dist.pl |
Output.
/bin/igawk has 0 LOAD segments. /bin/vimtutor has 0 LOAD segments. 73 files; min_distance=0x001000 max_distance=0x00101f tmp/evil_magic/nasm has 1 LOAD segments. 1 files; min_distance=0xffffffff max_distance=00000000 |
Yes, this empty page is common usage, at least in /bin.
You may have heard that Linux is a difficult target for malware because there are so many different distributions. Well, they all use basically the same compiler, producing the same idiosyncrasies. This allows us to cheat in big style.
Insert our code between code segment and data segment.
Modify inserted code to jump to original entry point afterwards.
Change entry point to start of our code.
Modify program header
Include increased amount of code in entry of code segment.
Move all following entries down the file.
Modify section header
Include trailing code in last section of code segment (should be .rodata).
Move all following sections down the file.
This setup has two big problems, however.
Code size is limited to 0x1000 bytes. Manageable with assembly. Tough luck for C.
Infected executables will be detected by above Perl script. Yes, I actually wrote the scanner before the virus. The truly paranoid doesn't trust himself.
Of course the naive implementation through parsing readelf's output significantly limits performance. But use of file as a fast file-type filter will lower noise ("has 0 LOAD segments") and duration to acceptable regions.
Command.
#!/bin/sh find /usr/bin -type f -maxdepth 1 -print0 \ | xargs -r0 file -i \ | sed -ne 's/: *application\/x-executable-file,.*//p' \ | src/check_dist/check_dist.pl |
Output.
file: Using regular magic file `/usr/share/magic.mime' file: Using regular magic file `/usr/share/magic.mime' 1031 files; min_distance=0x001000 max_distance=0x00101f |
Since all executables in /bin and /usr/bin follow the same layout, a heuristic scanner can easily spot deviations. A "perfect infection", resulting in a executable indistinguishable from the real thing, is far from sight. But then there are bigger issues an innocent virus seeking a warm nest in the wild would face.
For example RPM-based distributions maintain a checksum database. Verifying a single file, a complete package, or even all installed packages takes just one command.
If you know what you are looking for:
rpm --verify -f /bin/sh |
For dedicated people with enough time to read the output:
/bin/nice -n 19 rpm --verify --all |
A possible counter attack is to patch the database after infection. This is distribution dependent and requires root permissions. And it won't help against people who have the checksums offline, e.g. with tripwire.
Another possible attack is to hide the original (uninfected) executable on the file system, and patch the kernel via an inserted module to fake calculation of the checksum. And if the kernel is compiled without module-support, there is still direct access to /dev/kmem to install a kernel-patch…
On this road lies madness.
<<< Previous | Home | Next >>> |
The magic of the Elf | One step closer to the edge |