Understanding the collected information

This system collects different events during the execution of the sample. Afterwards, we compute a set of properties that measure the complexity of a run-time packer. These properties capture different aspects of an unpacking process, and combined together, allow us to classify packers into six classes of incremental complexity (see Packer Complexity Types).

For a formal description of these properties, refer to our research presented at IEEE Security and Privacy.

1. Layers

A layer represents the set of instructions that were modified by (at least) the previous layer (and afterwards executed).

An instruction belongs to layer 0 (first layer) if it was present in the binary when it was loaded in memory (i.e., it was not modified, decoded, or decrypted at run-time). Whenever a new instruction is executed, we will find the deepest layer (l) that modified its overlapping memory, and assign it to the following layer (l+1).

Types:

  1. Single: Single-layer packers contain one layer for the unpacking routine, and one single layer for the unpacked content.
  2. Multi-layer: Multi-layer packers contain more than one layer of unpacked code.
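
The layer assignment rule described in this section can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the implementation used by the system: memory is tracked byte by byte, and the class name LayerTracker and its methods on_execute and on_write are hypothetical.

```python
class LayerTracker:
    """Toy model of layer assignment (hypothetical names, byte granularity)."""

    def __init__(self):
        # address -> deepest layer whose code wrote to that address
        self.written_by = {}
        # layer of the instruction that is currently executing
        self.current_layer = 0

    def on_execute(self, address, size=1):
        """Return the layer of the instruction at [address, address + size)."""
        writers = [self.written_by[a]
                   for a in range(address, address + size)
                   if a in self.written_by]
        # Untouched bytes were present at load time: layer 0.
        # Otherwise the instruction sits one layer above its deepest writer.
        self.current_layer = max(writers) + 1 if writers else 0
        return self.current_layer

    def on_write(self, address, size=1):
        """Record a memory write made by the currently executing instruction."""
        for a in range(address, address + size):
            previous = self.written_by.get(a, -1)
            self.written_by[a] = max(previous, self.current_layer)
```

Feeding the tracker an execution in untouched memory yields layer 0; code written by layer 0 and then executed yields layer 1, and so on.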

2. Transitions

A transition occurs when the execution 'jumps' from one layer to another layer.

Forward transitions bring the execution to a higher layer, whereas backward transitions jump back to a previously unpacked layer.

Overall, a run-time packer can present two different transition models:

  1. Linear: there is only one transition from each layer to the following one.
  2. Cyclic: the packer has backward transitions from a layer to one of its predecessors.
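
As a rough sketch, the transition model can be derived from the ordered list of layers that the execution visits. The function name summarize_transitions and the input format (one layer number per executed instruction or block) are assumptions made for this example.

```python
def summarize_transitions(layer_trace):
    """Count forward/backward transitions and name the transition model.

    layer_trace is the sequence of layer numbers visited, in execution order.
    """
    forward = backward = 0
    for src, dst in zip(layer_trace, layer_trace[1:]):
        if dst > src:
            forward += 1      # jump to a higher layer
        elif dst < src:
            backward += 1     # jump back to a previously unpacked layer
    # Any backward transition makes the model cyclic in this simplified view.
    model = "cyclic" if backward else "linear"
    return forward, backward, model
```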

3. Isolation

This feature measures the interaction between the unpacking code and the protected code.

Types:

  1. Tail Transition packers execute all the unpacking code, and once the original application has been recovered, the execution is redirected to it.
  2. Interleaved packers mix the execution of certain parts of the unpacking routine with the original application code.

4. Frames

A layer can be unpacked at different times. For instance, we might observe that the unpacking routine first recovers and executes a part of the original code. Later during the execution, it might recover a different part (e.g., a different routine), and execute it. Since both routines would have been modified by the same layer, both would belong to the same layer.

An unpacking frame is a subset of a layer representing a region of memory that was written and executed at one time.

In other words, an unpacking frame is a region of memory in which we observe a sequence of memory writes followed by the execution of that memory.

Types:

  1. Single: packers that have one unpacking frame for each layer; the code is fully unpacked in one layer before the next layers are unprotected.
  2. Multiple: the code of one layer is reconstructed and executed one piece at a time; there are multiple frames per layer.
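
The write-then-execute pattern that defines a frame can be illustrated with a small counter. This is a deliberately simplified sketch for a single layer: the event format and the function name count_frames are hypothetical, and everything written before an execution is treated as one frame.

```python
def count_frames(events):
    """Count unpacking frames in a list of ("write" | "exec", address) events."""
    pending = set()   # addresses written but not yet executed
    frames = 0
    for kind, address in events:
        if kind == "write":
            pending.add(address)
        elif kind == "exec" and address in pending:
            # First execution of freshly written memory: a new frame begins,
            # and everything written so far is considered part of it.
            frames += 1
            pending.clear()
    return frames
```

A packer that unpacks everything and then runs it produces one frame; a packer that unpacks a routine, runs it, returns to the unpacker and repeats produces one frame per routine.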

5. Code Visibility

This property measures how the original code is revealed in memory.

Types:

  1. Full-code packers first unpack all the original code and data, and then redirect the execution to the original entry point. There is always a point in time in which the entire code of the malware can be scanned or retrieved from memory.
  2. Incremental packers reconstruct the original code on demand, just before it is executed.
  3. Shifting decode frames packers present a more complex version of incremental unpacking that involves re-packing each frame of code after its execution.

6. Granularity

This property measures the granularity of the protected regions of code (for incremental and shifting decode frames packers).

  1. Page, for packers whose code is unpacked one memory page at a time.
  2. Function, when each function is unpacked before it gets invoked.
  3. Basic Block / Instruction, when the unpacking is done at the level of basic blocks or single instructions.

Packer Complexity Types

Taxonomy

Type I

  • One single unpacking routine.
  • This routine is executed before transferring the control to the unpacked program (which resides in the second layer).
  • Example: UPX
  • Sample report (UPX).

Type II

  • Multiple unpacking layers.
  • Each layer is executed sequentially to unpack the following layer.
  • When the original code has been reconstructed, the last transition transfers the control to it.
  • Sample report (Unknown packer).

Type III

  • Multiple unpacking layers.
  • The unpacking routines are more complex, including loops, i.e., transitions back and forth between layers.
  • The original code is not necessarily located in the last (deepest) layer, and the last layer can contain code belonging to the packer, including routines such as integrity checks, anti-debug routines or part of the obfuscated code of the packer.
  • A tail transition exists to separate the packer and the application code. Once the final application code starts executing, the code belonging to the packer is not executed any more.
  • Example: UPolyX 0.4
  • Sample report (Themida).

Type IV

  • Single- or multi-layer packers.
  • Part of the packer code (but not the one responsible for unpacking) is interleaved with the execution of the original program.
  • There is a moment when the entire original code is completely unpacked in memory. The execution can jump from the packer code to the unpacked application, back and forth between different layers in different ways (multi-threaded application, hooking certain API calls…).
  • Example: ACProject 1.09.
  • Sample report (Upack).

Type V

  • The unpacking code is mangled with the original program.
  • The layer containing the original code has multiple frames, and the packer unpacks these frames incrementally.
  • They have a tail jump, but only a single frame of code may have been revealed at this point. If a snapshot of the process memory is taken after the end of the program execution, all the executed code can be successfully extracted and analyzed.
  • Example: Beria.
  • Sample report (Beria).

Type VI

  • Packers in which only a single fragment of the original program is available in memory at any given moment in time.
  • This fragment can be as little as a single instruction.
  • Example: Armadillo 8.0
  • Sample report (Armadillo).

Not packed

  • The sample is not packed! You got an easy one.
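
One possible reading of the taxonomy as a decision procedure is sketched below. It is an interpretation of the descriptions above, not the classifier used by the system; the function name complexity_type, its parameters, and the visibility labels 'full', 'incremental' and 'shifting' are assumptions made for this example.

```python
def complexity_type(num_unpacked_layers, cyclic, interleaved, visibility):
    """Map the measured properties to a complexity type (0 = not packed).

    visibility is 'full', 'incremental' or 'shifting' (see Code Visibility).
    """
    if visibility not in ("full", "incremental", "shifting"):
        raise ValueError("unknown visibility: %r" % (visibility,))
    if num_unpacked_layers == 0:
        return 0          # nothing was unpacked at run-time
    if visibility == "shifting":
        return 6          # only one fragment visible at any time
    if visibility == "incremental":
        return 5          # frames are unpacked one at a time
    if interleaved:
        return 4          # packer code runs alongside the original code
    if cyclic:
        return 3          # tail transition, but with backward transitions
    # Tail transition and linear model: one or several unpacking layers.
    return 1 if num_unpacked_layers == 1 else 2
```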

How to interpret graphs

Graphs show the structure of the packer and give, at a glance, a general idea of how the unpacking routines have recovered the protected code.

Color blindness

The unpacking graphs are color-coded and the explanations below use colors to refer to certain parts of the graphs.

The following image shows the colors and color names that we will refer to. Note that there are two types of greens, Arrow Green and Box Green, that appear on arrows (connectors) and boxes respectively.

Processes

Our system monitors all the processes created during execution, as well as those the sample interacts with. A graph may show one or several processes if we detect, for example, that the binary has injected code into another process. Each process is designated by a process number (e.g. P0, P1). The unpacking layers and memory regions of each process are contained in a separate box for each process.

The graph shown below shows 2 processes.

Layers

Each process will have at least 1 layer of code (if some code was executed), and up to any number of layers.

Each layer is represented as a blank box containing horizontally aligned colored boxes (that represent memory regions). Each layer has a header that follows the format [LayerNumber]#[NumberOfFrames]. The first number is the layer number: 0 for the executed code that was present in the binary (typically, the code of the packer), and greater than 0 for every unpacked layer. The second number is the number of frames of code that the layer contains.

The figure shown below represents a packer with 4 layers (3 of them containing unpacked code, because they are gray, see Memory regions (boxes)). All the layers contain a single frame. Nevertheless, if we look at the previous example, the layer 1 in process P1 contains 4 different frames, given that it is an incremental packer that unpacks memory pages on-demand, just before they are executed (i.e., there is one frame per memory page executed).

Memory regions (boxes)

The colored boxes inside each layer are memory regions. Each memory region represents a set of contiguous memory addresses that were executed. Furthermore, we group into the same region all executed instructions located at a distance lower than one memory page (4096 bytes). This does not mean that the binary executed ALL the possible instructions in that region, but we group them together to facilitate visual representation.
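
The grouping rule can be sketched as follows. This is an illustration only: it works on instruction start addresses and ignores instruction lengths, and the function name group_into_regions is hypothetical.

```python
PAGE_SIZE = 4096  # one memory page, as stated above


def group_into_regions(addresses, max_gap=PAGE_SIZE):
    """Group executed addresses into (lowest, highest) regions.

    Two consecutive addresses share a region when they are less than
    max_gap bytes apart.
    """
    regions = []
    for address in sorted(set(addresses)):
        if regions and address - regions[-1][1] < max_gap:
            regions[-1][1] = address              # extend the current region
        else:
            regions.append([address, address])    # start a new region
    return [tuple(region) for region in regions]
```

For example, instructions executed at 0x401000, 0x401010 and 0x401FFF end up in one region, while an instruction at 0x403000 starts a new one because it is more than a page away from the previous address.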

These regions follow a simple color scheme:

  • Yellow regions. Represent regions in which there is not a single instruction that wrote the memory of another region. In other words, it represents a piece of code with no unpacking behavior.
  • Gray regions. These regions, on the contrary, contain at least one instruction that wrote the memory of another region: it shows some unpacking behavior.
  • Green regions. These regions represent memory that has been written remotely from another process (either via WriteProcessMemory, shared memory regions, or by loading a file that was written by another process).
  • Red regions. You will only find one red region in each graph, and it contains the last instruction that was executed during analysis.

Also, the regions contain 4 lines of text with different types of information:

  • Line 1: Type of memory and base address. We distinguish between 3 types of memory: “M” for module address space, “H” for heap, and “S” for stack. We use “N” whenever our system cannot properly retrieve the memory type.
  • Line 2: Size. The size of the region in bytes (in hexadecimal).
  • Line 3: APIs executed. It shows 3 attributes separated by #. [NumAPICalls]#[NumDiffAPICalls]#[APICallsByFamily]

    The first attribute is the total number of API calls made from the region, and the second is the number of different API functions called. The third has 4 positions for 4 letters, “VCGM”. Each letter represents the presence of a given family of API calls (an underscore “_” represents the absence of that family). “V” corresponds to the GetVersion function family, “C” to the GetCommandLine function family, “G” to the GetModuleHandle function family, and “M” to the MessageBox related group of functions. The first 3 groups are related to typical C runtime API calls, and are sometimes used as a way to locate the original entry point of an application. MessageBox related functions were also monitored for testing purposes, and are kept because many unpack-me challenges show a message box as a payload. See API call families for more information.

  • Line 4: Frames. Finally, the last line represents the number of frames that the region contains.
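
If you transcribe region labels by hand, the third line can be decoded with a short helper like the one below. It is a sketch with a hypothetical name (parse_api_line), and it assumes that both counters are printed in decimal, which the description above does not state.

```python
FAMILIES = "VCGM"  # GetVersion, GetCommandLine, GetModuleHandle, MessageBox


def parse_api_line(line):
    """Decode "[NumAPICalls]#[NumDiffAPICalls]#[APICallsByFamily]".

    Returns (total_calls, different_calls, {letter: present}).
    Assumption: both counters are decimal numbers.
    """
    total, different, flags = line.strip().split("#")
    if len(flags) != len(FAMILIES):
        raise ValueError("expected %d family flags" % len(FAMILIES))
    present = {}
    for letter, flag in zip(FAMILIES, flags):
        if flag not in (letter, "_"):
            raise ValueError("unexpected flag %r for family %s" % (flag, letter))
        present[letter] = (flag == letter)
    return int(total), int(different), present
```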

Memory write operations (green and red connectors)

Memory write operations between regions are represented as green and red connectors. The green color is used whenever a region writes the memory of another region. In contrast, the red color is used whenever there is both a memory write and an execution transition between the same pair of regions: if a region writes another region and then the execution jumps to this code, the connector will be red. Each connector has a hexadecimal number next to it, showing the number of bytes written. For clarity, we only show the connectors between contiguous layers. Showing all the connections would produce an unreadable graph in certain cases.

Execution transitions (gray and blue connectors)

Execution transitions are depicted as gray connectors, and show the execution jumps from one region to another region in the following layer. As with memory write operations, we omit execution transitions that occur inside the same layer, as well as transitions between non-contiguous layers. The number shown next to each connector represents the number of transitions observed. If the connector is blue instead of gray, it means the transition occurred between two different processes. An inter-process transition does not imply process synchronization and might just be a consequence of process scheduling.

Frames

As described before, the number of frames is shown at two different levels: per layer and per region. These numbers may not coincide. A frame represents a set of memory regions written and executed at one time. For instance, imagine a packer that first unpacks and executes a given routine, then goes back to the packer code, unpacks another routine, and then executes it. This packer would present two frames, one for each routine. The reason for counting frames at two different granularities (layer and region) is simple: these two frames may be located in the same layer (and therefore the layer header would show “2” next to the layer number), but the code for each frame might be located in different regions, and thus each region would contain only one frame of code.

Now, look at the first example graph. Layer 1 in process P1 has 4 frames. Nevertheless, only the region starting at 0x401000 contains 2 frames. This packer protects each memory page separately, so whenever the execution jumps to a protected memory page, the packer comes in and decrypts its contents (resulting in a new frame). The only region with a size greater than one page is the one at 0x401000 (with 0x1001 bytes), and as a consequence, it presents 2 frames.

API call families

We monitor API calls and check whether certain specific API functions are used. A typical heuristic to find the original code is to wait until its C run-time initialization routine is executed. Therefore, we monitor the execution of the following functions, related to these initialization routines.

  • GetVersion* family (V): GetVersion, GetVersionExA, and GetVersionExW.
  • GetCommandLine* family (C): GetCommandLineA and GetCommandLineW.
  • GetModuleHandle* family (G): GetModuleHandleA, GetModuleHandleW, GetModuleHandleExA and GetModuleHandleExW.
  • MessageBox* family (M): MessageBox, MessageBoxEx and MessageBoxIndirect.
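
The families listed above map naturally onto the four-letter “VCGM” string shown in each region. The following sketch builds that string from a list of called API names. The names API_FAMILIES and family_flags are hypothetical, and the function names are matched exactly as listed; whether other variants (for example A/W suffixes of MessageBox) are also counted is not specified here.

```python
# Function names exactly as listed in this section.
API_FAMILIES = {
    "V": {"GetVersion", "GetVersionExA", "GetVersionExW"},
    "C": {"GetCommandLineA", "GetCommandLineW"},
    "G": {"GetModuleHandleA", "GetModuleHandleW",
          "GetModuleHandleExA", "GetModuleHandleExW"},
    "M": {"MessageBox", "MessageBoxEx", "MessageBoxIndirect"},
}


def family_flags(called_apis):
    """Return the 4-character flag string, e.g. "V_G_"."""
    called = set(called_apis)
    return "".join(
        letter if API_FAMILIES[letter] & called else "_"
        for letter in "VCGM"
    )
```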

Deep Packer Inspector API v1.0


Overview

Public API

The public API is available to all registered users. It allows you to perform the operations that are usually made in the web interface, under the same visibility and access constraints.

Public API keys are limited to 4 requests per minute.

Submissions made by API have lower priority than those made via the web interface.
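
A client can stay within the 4 requests per minute limit by throttling itself before each call. The sketch below is one way to do that with the standard library only. The class name RateLimiter is hypothetical, and it assumes the limit is enforced over a sliding 60-second window, which is not stated here.

```python
import time
from collections import deque


class RateLimiter:
    """Client-side throttle: at most max_calls per period (sliding window)."""

    def __init__(self, max_calls=4, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()   # timestamps of the most recent requests

    def wait(self):
        """Block until another request may be sent, then record it."""
        now = self._clock()
        # Forget requests that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded request leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
```

Call limiter.wait() right before every request to the API; the clock and sleep parameters exist only to make the class easy to test.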

Endpoint                        Description                                          Required permissions
POST /dpiapi/v1/scan            Upload a sample to scan with Deep Packer Inspector.  perm-scan
GET /dpiapi/v1/report/{id}      Retrieve the results of a report given its id.       perm-get-report
GET /dpiapi/v1/memorydump/{id}  Download the memory dump of a sample.                perm-get-memorydump
GET /dpiapi/v1/graph/{id}       Download the unpacking graph of a sample.            perm-get-unpacking-graph

Endpoints

POST /dpiapi/v1/scan

To request a file scan, you must include a form field api-key with your API key, and a form field main-file with the file. The content length must not exceed 8 MB.

Form field   Required?  Description
api-key      Yes        Your API key.
main-file    Yes        The file you wish to analyze.
private      No         Set to Yes to mark your submission as private; otherwise it will be public.
(any other)  No         Any remaining fields are treated as auxiliary files of the analysis (e.g. DLLs). The only restriction is that their names cannot be any of the above.

The response will have the following fields if everything went OK:

JSON field     Type    Description
status         Number  200
dpicode        Number  1 or 2: 1 if the request was queued, 2 if the submission was repeated (only on public submissions).
description    String  Verbose description of what has happened with the request.
id             String  Unique ID of this submission.
upload_sha256  String  SHA256 that identifies all the submissions with the same main file and auxiliary files.
report-url     String  A URL to the report. You must be logged in to the service to access a private submission.

If there is an error the response will have the following fields:

JSON field   Type    Description
status       Number  40X
dpicode      Number  0
description  String  Verbose description of what has happened with the request.

Example requests

cURL

        curl --form api-key=your-api-key --form private=Yes \
        --form "main-file=@packedfile.exe" --form "dll=@somelib.dll" \
        https://www.packerinspector.com/dpiapi/v1/scan
      
Python

        import requests
        
        form = {'api-key': 'your-api-key', 'private': 'Yes'}
        # Open both files in a with-block so they are closed after the upload.
        with open('packedfile.exe', 'rb') as main_file, open('somelib.dll', 'rb') as dll:
            files = [('main-file', main_file), ('dll', dll)]
            response = requests.post('https://www.packerinspector.com/dpiapi/v1/scan',
                                     data=form, files=files)
        json_response = response.json()
      

Example response


        {
            "report-url": "https://www.packerinspector.com/report/somepath",
            "description": "Success. Your request has been queued.",
            "dpicode": 1,
            "id": "MzA5.Xf22qJROMIiCAwrt9w2-ByFTEgc",
            "status": 200,
            "upload_sha256": "9c516a5304c9cd70196362995564c812a68fc50982784bf24f0077151f94a942"
        }
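
Once the response has been parsed, the status and dpicode fields tell you whether the submission was accepted. The helper below is a minimal sketch of that check on the parsed JSON; the names DPIError and submission_id are hypothetical and not part of the API.

```python
class DPIError(Exception):
    """Raised when the API reports a failure (hypothetical helper type)."""


def submission_id(payload):
    """Return (id, repeated) from a parsed scan response, or raise DPIError.

    payload is the dictionary obtained from the JSON response.
    """
    if payload.get("dpicode") == 0 or payload.get("status") != 200:
        raise DPIError(payload.get("description", "unknown error"))
    # dpicode 1: the request was queued; dpicode 2: a repeated submission.
    return payload["id"], payload["dpicode"] == 2
```

The returned id is the value to use in the report, memorydump and graph endpoints.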
      

GET /dpiapi/v1/report/{id}

With this endpoint you can retrieve the results of a file scan. The id path refers to the id field obtained in the JSON response of the POST /dpiapi/v1/scan endpoint.

URL parameter       Required?  Description
api-key             Yes        Your API key.
get-static-pe-info  No         Set to Yes to obtain the static PE file information; it is omitted by default.
get-vt-scans        No         Set to Yes to obtain VirusTotal's scans; they are omitted by default.

The response will have the following fields:

JSON field             Type     Description
status                 Number   200
dpicode                Number   1
description            String   Verbose description of what has happened with the request.
id                     String   Copy of the ID received in the URL path.
report-url             String   A URL to the report. You must be logged in to the service to access a private submission.
file-identification    Boolean  Whether the file identification information is available or not.
packer-analysis        Boolean  Whether the packer analysis information is available or not.
static-pe-information  Boolean  Whether the static PE file information is available or not. (The information can be available, but if you haven't explicitly requested its retrieval it won't be returned.)
vt-scans               Boolean  Whether the VirusTotal scan information is available or not. (The information can be available, but if you haven't explicitly requested its retrieval it won't be returned.)
report                 Object   This field will have the keys file-identification, packer-analysis, static-pe-analysis (optional) and vt-scans (optional); see the example.

If there is an error the response will have the following fields:

JSON field   Type    Description
status       Number  40X
dpicode      Number  0
description  String  Verbose description of what has happened with the request.
flag         String  (Optional) A warning that your request has been considered a brute-force attempt.

Example requests

cURL

        curl -G -d "api-key=somekey" -d "get-static-pe-info=Yes" -d "get-vt-scans=Yes" \
        https://www.packerinspector.com/dpiapi/v1/report/report-id
      
Python

        import requests
        
        params = {'api-key':'your-api-key', 'get-static-pe-info': 'Yes', 'get-vt-scans': 'Yes'}
        response = requests.get('https://www.packerinspector.com/dpiapi/v1/report/some-id',
                                 params=params)
        json_response = response.json()
      

Example responses

Example response with a successful packer analysis

{
  "description": "Report successfully retrieved.",
  "dpicode": 1,
  "file-identification": true,
  "id": "some id",
  "packer-analysis": true,
  "report-url": "https://www.packerinspector.com/report/someurl",
  "static-pe-information": true,
  "status": 200,
  "vt-scans": true,
  "report": {
    "file-identification": {
      "auxiliary-files": [{"name": "somename", "sha256": "somehash"}, ...],
      "entropy": 6.16549,
      "file-type": "PE32 executable (GUI) Intel 80386, for MS Windows",
      "first-seen": "Tue, 14 Feb 2017 15:57:34 GMT",
      "imphash": "5a498eee87e4d89512a84502f500181f",
      "known-names": [
        "SolidMorph.v1.UnPackMe.exe"
      ],
      "md5": "1386c0a3d4e8e3b4de99eb0495af120e",
      "mime-type": "application/x-dosexec",
      "packer-signatures": [],
      "sdhash": "very long string",
      "sha1": "109ae218c0589b056bf4d31e053e717f7ddc45c5",
      "sha256": "7fdb3557f55865502544f847bab83558931b24450b7621a4a5735ef1b1f372aa",
      "size": 42496,
      "ssdeep": "768:1tvEsbxsCSLMd/YGoc1lU9aaspMJeWWGsxHbPrf:1pEsFJVd/YG0YaheWWJj",
      "trid": [
        {
          "percent": 52.9,
          "type": "(.EXE) Win32 Executable (generic)"
        },...
      ]
    },
    "packer-analysis": {
      # From 0 (not packed) to 6
      "complexity-type": 3,
      # Seconds
      "execution-time": 22,
      "granularity": "Not applicable",
      "graph": "https://www.packerinspector.com/graph/somepath",
      "last-executed-region": {
        "address": 3735588,
        "calls-api-getcomm": true,
        "calls-api-getmodu": true,
        "calls-api-getvers": true,
        "layer-num": 1,
        "memory-type": "",
        "modified-by-extern-pro": false,
        "num-api-fun-called": 356,
        "num-diff-apis-called": 356,
        "process": 0,
        "region-num": 0,
        "size": 6471,
        "writes-exe-region": true
      },
      "num-downward-trans": 22,
      "num-layers": 3,
      "num-pro-ipc": 0,
      "num-processes": 1,
      "num-regions": 6,
      "num-regions-special-apis": 2,
      "num-upward-trans": 23,
      # The elements of this array will have the same fields as the 'last-executed-region'
      "regions-pot-original": [],
      "layers-and-regions": [
        {
          "frames": 0,
          "highest-address": 4277394,
          "layer-num": 0,
          "lowest-address": 4204353,
          "regions": 2,
          "size": 6912
        },...
      ],
      "api-calls": {
        # The first key in this object is the layer number.
        # (You can tell how many layers there are, and how many regions each
        #  layer has, by looking at the 'layers-and-regions' field above.)
        "0": {
          # Inside each layer the regions are listed.
          # Each region has an 'address-space' and the total number of api calls
          # made in that region ('total-api-calls').
          # The rest of the keys (if any) are the dlls names with an array of
          # the functions used from that dll.
          "0": {
            "address-space": "4204353-4204359",
            "total-api-calls": 0
          },
          "1": {
            "address-space": "4277394-4284300",
            "kErNeL32.dll": [
              "VirtualFreeEx",...
            ],
            "ntdll.dll": [
              "RtlFindCharInUnicodeString",...
            ],
            "total-api-calls": 136
          },
          # Finally, we have the total number of api calls made in the layer.
          "total-api-calls": 136
        }, ...
      },
      "loaded-modules": [
        {
          "name": "msctf.dll",
          "pid": 644,
          "size": 311296,
          "start-address": 1953169408
        },...
      ],
      "remote-memory-writes": [
        {
          "dest-address": 9764864,
          "dest-process": 0,
          "size": 262144,
          "source-address": "",
          "source-process": 0,
          "type": "Memory unmap|deallocate"
        },...
      ]
    },
    "vt-scans": [
      # This array shows the scan for the main file and (if any) auxiliary files.
      {
        "sha256": "some sha",
        "scans": {
          "status": 3,
          "description": "VT scan available.",
          "date": "Wed, 30 Mar 2016 12:12:47 GMT",
          "results": [
            {
              "result": "HW32.Packed.DB81",
              "antivirus": "Bkav",
              "update": 20160330
            }, ...
          ]
        }
      },...
    ],
    "static-pe-analysis": {
      "compi-timestamp": "Wed, 13 Apr 2011 01:34:12 GMT",
      "entry-point": "0x402741",
      "target-machine": "Intel 386 or later processors and compatible processors",
      "exports": ["somename", ...],
      "imports": {
        "kErNeL32.dll": [
          "GetProcAddress",
          "GetModuleHandleA",
          "LoadLibraryA"
        ]
      },
      "overlay-entropy": 0,
      "overlay-size": 0,
      "resources": [
        {
          "count": 1,
          "md5": "d41d8cd98f00b204e9800998ecf8427e",
          "name": "RT_DIALOG",
          "sdhash": "Not applicable",
          "sha1": "da39a3ee5e6b4b0d3255bfef95601890afd80709",
          "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
          "size": 382,
          "ssdeep": "3::",
          "type": "empty"
        },
        {
          "count": 3,
          "md5": "76e49a891b3576e44dd22c8b1fe92a21",
          "name": "RT_MANIFEST",
          "sdhash": "long string",
          "sha1": "ff938b12ed87d10a58bbc76f330f0eacbd839925",
          "sha256": "3d48dbb04e791a9341f940b8657dd7547e19fa8612b220fe36bf613bbd06f9c0",
          "size": 855,
          "ssdeep": "24:OJiNK+bIg4y5xvW0WiEgug5vc+Ar4+AzfR+:OgK+bIg4y/5yg5ngio",
          "type": "ASCII text, with very long lines, with no line terminators"
        },...
      ],
      "sections": [
        {
          "entropy": 7.90744,
          "flags": [
            {
              "name": "IMAGE_SCN_MEM_EXECUTE",
              "value": 536870912
            },
            {
              "name": "IMAGE_SCN_CNT_INITIALIZED_DATA",
              "value": 64
            },...
          ],
          "md5": "269b3f755bf0cea0a38db94721cf2a09",
          "name": ".text\u0000\u0000\u0000",
          "raw-address": "0xc000",
          "raw-size": "0x1800",
          "sdhash": "very long string",
          "sha1": "d11147e31bac70808a7d7066b0be1c9066bc464b",
          "sha256": "7d72b3792f4485ca98e2e8c70cbb0d8880e6763ef496ca5f0ae3c8e8d4b999b9",
          "ssdeep": "long string",
          "type": "Code",
          "virtual-address": "0x1000",
          "virtual-size": "0xc000"
        },...
      ]
    }
  }
}
      
Example response with an error in the analysis

{
  "description": "Report successfully retrieved.",
  "dpicode": 1,
  "file-identification": true,
  "id": "some id",
  "packer-analysis": true,
  "report": {
    "packer-analysis": {
      "error": "Analysis finished after an error during execution. The sample may not have been fully unpacked."
    }, ...

  },
  "report-url": "https://www.packerinspector.com/report/somepath",
  "static-pe-information": true,
  "status": 200,
  "vt-scans": false
}
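
As the second example shows, a report can be returned with status 200 and packer-analysis set to true even though the analysis ended with an error, so it is worth checking for the error key before reading other fields. The function below is a sketch of how a parsed report might be summarized; the name summarize_report and the shape of the returned dictionary are assumptions made for this example.

```python
def summarize_report(payload):
    """Summarize the packer analysis of a parsed report response."""
    analysis = payload.get("report", {}).get("packer-analysis", {})
    if "error" in analysis:
        # The analysis stopped early; other fields may be missing.
        return {"ok": False, "error": analysis["error"]}
    # 'api-calls' is keyed by layer number (as a string); each layer object
    # carries its own 'total-api-calls' counter.
    calls_per_layer = {
        int(layer): regions.get("total-api-calls", 0)
        for layer, regions in analysis.get("api-calls", {}).items()
    }
    return {
        "ok": True,
        "complexity-type": analysis.get("complexity-type"),
        "num-layers": analysis.get("num-layers"),
        "calls-per-layer": calls_per_layer,
    }
```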
      

GET /dpiapi/v1/memorydump/{id}

Returns the memory dump of a report given its ID. The id path refers to the id field obtained in the JSON response of the POST /dpiapi/v1/scan endpoint.

Please note that, as in the web interface, you can only download memory dumps of the reports that you own (that is, those whose analysis you triggered).

URL parameter  Required?  Description
api-key        Yes        Your API key.

If there is an error the response will have the following fields:

JSON field   Type    Description
status       Number  40X
dpicode      Number  0
description  String  Verbose description of what has happened with the request.
flag         String  (Optional) A warning that your request has been considered a brute-force attempt.

Example requests

cURL

        curl -G -d "api-key=someapikey" \
        https://www.packerinspector.com/dpiapi/v1/memorydump/someid \
        -o samplename.tar.gz
      
Python

        import requests
        
        params = {'api-key':'your-api-key'}
        response = requests.get('https://www.packerinspector.com/dpiapi/v1/memorydump/some-id',
                                params=params)
        download = response.content
      

GET /dpiapi/v1/graph/{id}

Returns the unpacking graph of a report. The id path refers to the id field obtained in the JSON response of the POST /dpiapi/v1/scan endpoint.

URL parameter  Required?  Description
api-key        Yes        Your API key.

If there is an error the response will have the following fields:

JSON field   Type    Description
status       Number  40X
dpicode      Number  0
description  String  Verbose description of what has happened with the request.
flag         String  (Optional) A warning that your request has been considered a brute-force attempt.

Example requests

cURL

        curl -G -d "api-key=someapikey" \
        https://www.packerinspector.com/dpiapi/v1/graph/someid \
        -o graph.png
      
Python

        import requests
        
        params = {'api-key':'your-api-key'}
        response = requests.get('https://www.packerinspector.com/dpiapi/v1/graph/some-id',
                                params=params)
        download = response.content
    

Scripts to interact with the API

Drop us a line if you want your tool to be listed here.

Python