Using The Coroner's Toolkit: Rescuing files with lazarus

Implementation Details
Applies to the practice:
Analyze all available information to characterize an intrusion

Applicable technologies:
Sun Solaris 2.x; UNIX operating systems and derivatives

 
Introduction

In the aftermath of a network break-in, system administrators are often asked to explain what happened. The Coroner's Toolkit (TCT) is a collection of tools that gather and analyze data on a UNIX system and help the administrator answer this question. unrm and lazarus are part of the toolkit and can be used to restore deleted files and data that cannot be easily accessed.
  • unrm provides access to the unallocated portions of a UNIX file system and writes the raw data into an output file.
  • lazarus attempts to reconstruct files or data from raw data. Usually this raw data is provided by unrm. In addition, lazarus can be used to reconstruct data objects from other sources such as system memory and swap space. The tool has been used with data from UFS, EXT2, NTFS, and FAT32 file systems, but it can be used on just about any type of file system. Your success will vary with the way the data is stored.
Together both tools provide important assistance whenever files are accidentally deleted. For the forensic analyst, these tools provide access to files an intruder may have deleted to hide his attack tools or eliminate important log information. 

As the TCT authors point out, "If there was a theme, it would be the reconstruction of the past - determining as much as possible what happened with a static snapshot of a system." Certainly such activities require an experienced and committed system administrator during the forensic investigation phase of an intrusion. No software can replace someone who knows his or her system, but TCT is a start. 

In addition to unrm and lazarus, TCT contains another tool, called grave-robber, that can aid in identifying what happened after a break-in. Grave-robber controls several other tools in an attempt to capture as much information as possible about a potentially compromised system and its files. Its use is explained in the implementation Using the Coroner's Toolkit: Harvesting information with grave-robber.

Using the TCT tools can require a great deal of time and effort. You need to review all documentation carefully and test all parts before use so that you can understand and take full advantage of their features. In particular, when using unrm and lazarus, you need to read the files help-recovering-file and docs/lazarus.README, which come as part of the TCT package.

The installation of the TCT toolkit is explained within the implementation Installing The Coroner's Toolkit and using the mactime utility.

This implementation discusses the use of two TCT tools, unrm and lazarus, on the Sun Solaris operating system, version 2.x. You can use this approach with other UNIX operating systems and hosts.


Effort Estimates

The time unrm needs to retrieve unallocated disk space depends on the underlying system, its processor, and the amount of disk space involved. The time lazarus needs to analyze the raw data depends on the same factors.

Using these tools can take considerable time and create a very large number of output files. The space needed for the output of unrm is 100% of the unallocated blocks that should be recovered. lazarus will also use approximately 120% of this space.

The technical analysis of this output can easily take days.


Prerequisites

The output of unrm and lazarus can take considerable space. For example, for a system with 1.0 GB of unallocated disk space, unrm and lazarus will need approximately 2.2 GB of free space to store all output data. You need to create this output data on another file system; otherwise the unallocated disk space will be used for the newly created files, thereby destroying the files you are attempting to recover. In addition, during forensic analysis, you want to minimize changes to the system under investigation.
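
The space estimate above can be turned into a quick check before any data is written. The following is a minimal sketch, not part of TCT: the function names required_kb, free_kb, and enough_space are hypothetical, and the factor of 2.2 is simply the rule of thumb from the example above (1.0 GB of unallocated space needing about 2.2 GB of output space).

```shell
# Sketch only: these helper names are hypothetical and not part of TCT.

# Estimate the free space (in KB) needed to hold both the unrm output and
# the lazarus output, using the 2.2x rule of thumb from the text.
required_kb() {
    unalloc_kb=$1
    echo $(( unalloc_kb * 220 / 100 ))
}

# Report the free space (in KB) on the file system that holds a directory.
# df -P -k prints POSIX-format output; column 4 is the available space.
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if the directory has room for the expected output.
enough_space() {
    [ "$(free_kb "$1")" -ge "$(required_kb "$2")" ]
}

# Example: check /spare/tct-data against 1 GB (1048576 KB) of unallocated space
#   enough_space /spare/tct-data 1048576 || echo "not enough free space"
```

The estimate is deliberately coarse. Treat a positive result as a minimum, not as a guarantee that the run will fit.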

Make sure that spare disks are reserved and available when you need them to store the unrm and lazarus output. Make sure you know how to connect such disks to a live system so that minimal changes occur on the system being analyzed. Ensure that you mount the disk that needs to be analyzed as read only on a different system to minimize the risk of losing interesting data as a result of normal system events. This implies that unrm and lazarus are used after grave-robber.

To reliably analyze any system, you must use unmodified, authentic tools. Therefore, use write-protected media to store tools like the TCT and others used during forensic analysis.


Recovering data from the unallocated section with unrm

To run unrm, you first need to identify the appropriate system device from which to recover the unallocated disk space. Then you need to identify a safe place to store the retrieved raw data.

For our example, we assume that the system device you want to recover raw data from is mounted read-only as /dev/sd6. To create a safe place for the analysis results, first create a new directory for all results. Then assign another account, not the root account, as owner of this directory. lazarus does not depend on root privileges. Check that the output will be stored on a local disk (/dev/sd2) that hosts the /spare/tct-data directory and that enough free space is available:

# mkdir /spare/tct-data
# chmod 750 /spare/tct-data
# chown kpk:wheel /spare/tct-data

Based on this information, unrm is invoked with the following command line. We can use the local copy of the TCT tools because we are working from a secure workstation with the potentially compromised disk mounted. Additional commands are necessary to arrange the right access controls as well as the ownership and location of the created file:

# cd /spare/tct-data
# /usr/local/tct/bin/unrm /dev/sd6 > unrm-20010303.out
# chmod 440 unrm-20010303.out
# chown kpk:wheel unrm-20010303.out

If you are interested in the whole disk, use dd instead of unrm. Invoke the following command line with the correct values: 

# /bin/dd bs=1024k if=/dev/sd6 > dd-20010303.out

The overall size of the output file together with the lazarus output will be 220% of the size of the entire disk.

The dd command can also be used to retrieve raw data from other devices. Use the correct device name as the argument for the if option in the command line above. Be aware that direct interference with critical files such as kernel memory has the potential to crash your system. 

At this stage you have all raw data within a single file, and you can start to use other tools, such as od, strings, or less, to review this data.
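
When a single raw file is hundreds of megabytes in size, it often helps to look at one small piece at a time. The following is a minimal sketch under stated assumptions: extract_block is a hypothetical helper, not a TCT tool, and the default block size of 1024 bytes is chosen only because it matches the default block size lazarus uses.

```shell
# Sketch only: extract_block is a hypothetical helper, not a TCT tool.

# Write block number N (counting from 0) of a raw data file to standard
# output. The block size defaults to 1024 bytes.
extract_block() {
    file=$1
    block=$2
    size=${3:-1024}
    dd if="$file" bs="$size" skip="$block" count=1 2>/dev/null
}

# Example: show block 1500 of the unrm output as characters and as strings
#   extract_block unrm-20010303.out 1500 | od -c | less
#   extract_block unrm-20010303.out 1500 | strings
```

Because the helper only reads from the raw file, it can be used on the write-protected output without changing it.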

For the forensic analyst it usually will be helpful to use lazarus to split the huge binary file into several smaller files in addition to any other method to review the raw data.


Running lazarus to split the raw data into smaller blocks

lazarus will operate on any binary file and try to break it into smaller blocks. While splitting the data into blocks, lazarus will try to identify which kind of data the blocks belong to. The size of the blocks is 1 KB by default and can be modified by changing the variable $BLOCK_SIZE in the TCT configuration file /usr/local/tct/conf/lazarus.cf (if TCT was installed in /usr/local/tct/).

We recommend that you create separate directories to keep the different kinds of output data apart. After changing the user id (to kpk as assumed for our example), change to the directory containing the raw data file and create two subdirectories for lazarus-specific files:

# su kpk
$ cd /spare/tct-data
$ mkdir blocks
$ mkdir html

You can start lazarus using the following options: 

-h creates an HTML-based view that can be used with any browser to ease access to individual blocks
-D <dir> directs that all blocks be written to the given directory 
-H <dir> directs the main HTML files that provide the overall navigation to the specified directory 
-w <dir> directs that all other HTML output be written to the given directory 

The last argument is the binary file that serves as input for lazarus. After the command terminates, make sure to remove write access to the created files. 

$ /usr/local/tct/bin/lazarus -h -H . -D /spare/tct-data/blocks -w /spare/tct-data/html unrm-20010303.out
$ chmod 440 *.html html/*.html

lazarus takes a considerable amount of time to split the input file into smaller blocks. After this step, the directory will look similar to the output below: 

$ ls -la /spare/tct-data
total 24
drwxr-x---   4 kpk     wheel         1024 Mar  3 22:36 .
drwxr-xr-x  23 kpk     wheel         1024 Mar  3 22:25 ..
drwx------   2 kpk     wheel         2048 Mar  3 22:36 blocks
drwx------   2 kpk     wheel         5120 Mar  3 22:36 html
-r--r-----   1 kpk     wheel    545611359 Mar  3 22:16 unrm-20010303.out
-r--r-----   1 kpk     wheel          233 Mar  3 22:35 unrm-20010303.out.frame.html
-r--r-----   1 kpk     wheel        11359 Mar  3 22:36 unrm-20010303.out.html
-r--r-----   1 kpk     wheel         1472 Mar  3 22:35 unrm-20010303.out.menu.html

Start reviewing the smaller blocks by loading the file unrm-20010303.out.html into your browser and go from there.

Understanding the lazarus output

lazarus handles the input file as follows:
  1. A block of the input file is read and the first 10% of the data is analyzed to see if it is text or binary.
  2. If it consists of printable characters only, it is assumed to be text and it is then tested against a number of regular expressions to identify C, HTML or other specific text formats.
  3. If the block is considered to be binary, a TCT-specific version of the file(1) command is run over this block. If this gives no results, the first few bytes are analyzed to see if the block is in ELF format.
Each block is then written to an output file. If the block has the same type as the previous block, it is appended. It is also appended if the block was not recognized as any other known data type. This is based on the assumption that data from the same file will often be stored contiguously on a disk. Otherwise, a new file is opened and the block is written to it.
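
The text-or-binary decision in the first two steps can be illustrated with a few lines of shell. This is only a simplified stand-in and not how lazarus itself is implemented: classify_block is a hypothetical name, the sketch treats whitespace such as newlines as printable (an assumption), and it does not attempt the later regular-expression or file(1) tests.

```shell
# Sketch only: a simplified stand-in for the text/binary test described
# above. classify_block is a hypothetical name, not part of lazarus.

# Print "text" if the first 10% of the file consists only of printable
# or whitespace characters, and "binary" otherwise.
classify_block() {
    file=$1
    size=$(wc -c < "$file")
    sample=$(( size / 10 ))
    [ "$sample" -lt 1 ] && sample=1
    # Delete every printable or whitespace byte; anything left is non-printable.
    leftover=$(dd if="$file" bs=1 count="$sample" 2>/dev/null |
        LC_ALL=C tr -d '[:print:][:space:]' | wc -c)
    if [ "$leftover" -eq 0 ]; then
        echo text
    else
        echo binary
    fi
}

# Example: classify one of the block files written by lazarus
#   classify_block blocks/1500.t.txt
```

Sampling only the start of a block keeps the test cheap, but it also means that a block beginning with text and ending with binary data is reported as text. This is one reason the results always need manual review.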

The files written to disk are named according to a specific naming scheme. The first part of the name is based on the actual number of the analyzed block. The second part (separated by ".") shows the type using a single letter as defined in the table below. All files are assigned a "txt" extension (again separated by "."), except graphics files, which are given the extension "gif" or "jpg".
 
Key Letter and Color Code    Explanation
A                            archive
C                            C code
E                            ELF
f                            sniffers
H                            HTML
I                            image/pix
L                            logs
M                            mail
O                            null
P                            programs
Q                            mailq
R                            removed
S                            lisp
T                            text
U                            uuencoded
W                            password file
X                            exe
Z                            compressed
.                            binary
!                            sound

Links to all files are collected in a kind of map that is presented as an HTML page. This allows you to access a file by simply clicking its link. The map uses the same approach to represent the type of each file as is used for assigning the file names. The first block of a sequence of blocks is shown as a capitalized letter. The number of blocks that are represented within the map is encoded to save space. The first character represents one block or less, the second from 0 to 2 blocks, the third from 0 to 4 and so on. If five characters are displayed for one type, this reflects 16 to 31 blocks of data.

A typical result might look like the following example (note that the example does not contain active links but underlined characters only).
 
...!!!!!!Tttt.!!Ttt...TtMmmm
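
To get a feeling for how much data a run of characters in the map stands for, the encoding can be sketched in shell. The sketch assumes that a run of n blocks is drawn with floor(log2(n)) + 1 characters. This assumption is consistent with the statement that five characters reflect 16 to 31 blocks, but it has not been checked against the lazarus source. The names map_chars and map_run are hypothetical.

```shell
# Sketch only: assumes a run of n blocks is shown with floor(log2(n)) + 1
# characters, which matches "five characters = 16 to 31 blocks" in the text.
# map_chars and map_run are hypothetical names, not part of lazarus.

# Print how many map characters represent a run of n blocks (n >= 1).
map_chars() {
    n=$1
    chars=1
    while [ "$n" -gt 1 ]; do
        n=$(( n / 2 ))
        chars=$(( chars + 1 ))
    done
    echo "$chars"
}

# Print the map entry for a run: the key letter as given, followed by its
# lowercase form for every remaining character.
map_run() {
    letter=$1
    count=$(map_chars "$2")
    rest=$(printf '%s' "$letter" | tr '[:upper:]' '[:lower:]')
    out=$letter
    while [ "$count" -gt 1 ]; do
        out=$out$rest
        count=$(( count - 1 ))
    done
    echo "$out"
}

# Example: a run of 9 text blocks
#   map_run T 9
```

Under this assumption, the entry Tttt in the example above would stand for 8 to 15 blocks of text, and Mmmm for 8 to 15 blocks of mail.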


Analyzing the lazarus output

The analysis is far from being automated. The level and success of the analysis depend on the ability of the analyst as well as the way the data was originally stored on the analyzed system. Therefore, we can provide no meaningful estimates on how long it will take to go through the data or how good the results will be.

The lazarus documentation suggests various approaches for reviewing the output:

  • If you are looking for specific files with known content, search for this content using characteristic key words. The best tool for this is grep or egrep. The results will depend on the quality of the key words.
  • If you know nothing at all about what might be contained in the raw data, use a tool such as less, which can display binary data, to page through it. The HTML navigation map will help in this task as well.
  • If you are only interested in readable output, you might use a tool like strings. Use either the single binary file as input for strings or the split blocks.
  • Review specific file types by using the naming convention. For example, to separate all files that contain C code, list them using ls *.c.txt and then review the files produced.
  • Graphic files could be viewed with a graphical browser.
The tools you are using should be capable of handling binary data. 
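
Two of these approaches, selecting files by type and searching for key words, are easy to wrap in small helpers. The following is a minimal sketch: list_type and search_blocks are hypothetical names, and the sketch assumes the block files sit in one directory and follow the naming scheme described earlier.

```shell
# Sketch only: hypothetical helpers built on the file naming scheme
# <block number>.<type letter>.txt described earlier.

# List all block files of one type, for example "c" for C code.
list_type() {
    dir=$1
    letter=$2
    ls "$dir"/*."$letter".txt 2>/dev/null
}

# Print the names of block files that contain a fixed key word.
# grep -l lists matching files only; -F treats the key word as a literal string.
search_blocks() {
    dir=$1
    keyword=$2
    grep -l -F -- "$keyword" "$dir"/* 2>/dev/null
}

# Example: list C code blocks, then find blocks mentioning a suspicious host
#   list_type /spare/tct-data/blocks c
#   search_blocks /spare/tct-data/blocks 'evil.example.com'
```

The host name in the example is a placeholder. In practice the key words come from what is already known about the intrusion.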

lazarus provides no assistance to help the analyst rearrange single blocks to build larger aggregations. This can be done by concatenating files manually and editing them to remove raw data not belonging to the file. This involves making decisions about which blocks belong together. This task is difficult and might affect the overall correctness and result of the analysis.
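
The manual concatenation step can be sketched as follows. join_blocks is a hypothetical helper, not part of TCT, and the block file names in the example are invented for illustration. Deciding which blocks belong together, and in what order, remains the analyst's job.

```shell
# Sketch only: join_blocks is a hypothetical helper, not part of TCT.

# Concatenate block files, in the order given, into a new output file.
# The original block files are left untouched; the function refuses to
# overwrite an existing output file.
join_blocks() {
    out=$1
    shift
    if [ -e "$out" ]; then
        echo "join_blocks: $out already exists" >&2
        return 1
    fi
    cat -- "$@" > "$out"
}

# Example: the analyst decides that three blocks belong to one log file
#   join_blocks recovered-log.txt blocks/1500.l.txt blocks/1502.l.txt blocks/1510.l.txt
```

Writing to a new file instead of editing the blocks in place keeps the lazarus output intact, so a wrong guess about block order can be undone by simply deleting the joined file.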