lighthouse_coverage - an execution tracer for PANDA 2

The lighthouse_coverage plugin, as developed here, has a few areas that could be improved:

  1. sometimes addresses are printed twice, which confuses lighthouse
  2. sometimes the executable name is cut off in the output file; for example, V9_V432RevB - Copy.exe can appear as V9_V432RevB - C
  3. it needs a custom plugin for lighthouse
  4. it prints out all code run in an executable process, including DLLs, but does not mark the DLLs in any way

The solution for the first point is to use a different callback. A block can be interrupted after before_block_exec has fired and then be re-run, which caused duplicate entries in the output file. Instead, we use after_block_exec, which has a parameter that indicates whether the block executed completely.

The solutions for points 2, 3 and 4 are closely related. According to PANDA developer Andrew Fasano, "The structure we pull process names out of is a fixed-size buffer (task->comm) so long names do get truncated." I have found that the list of loaded modules obtained with get_mappings() provides full module names, including that of the executable. So after calling get_current_process(), we now also call get_mappings(). I avoided that previously because of its large performance penalty. This fixes point 2.

Once we make this call, we can also extract the base address of each module. That in turn allows us to write the results in the module+offset format that lighthouse already understands, obviating the need for the custom plugin. This fixes point 3.

Also, since get_mappings() provides the address range of each module, we can now restrict the output to a single module, i.e. print only when code in the executable itself runs, not code in its loaded DLLs. Additionally, we can now look for execution of a particular DLL within a particular process. This fixes point 4.

So there are now 3 ways to execute the lighthouse_coverage plugin:

  • Print out the execution trace for all executables, but not their loaded DLLs
  • Print out the execution trace for a particular executable, but not its loaded DLLs
  • Print out the execution trace for a particular DLL of a particular executable

The mode is selected with arguments passed to the plugin:

./panda-system-x86_64 -m 4096 -replay vc6 -os windows-32-xpsp3 -panda osi -panda lighthouse_coverage

or

./panda-system-x86_64 -m 4096 -replay vc6 -os windows-32-xpsp3 -panda osi -panda lighthouse_coverage:process="testapp.exe"

or

./panda-system-x86_64 -m 4096 -replay vc6 -os windows-32-xpsp3 -panda osi -panda lighthouse_coverage:process="testapp.exe",dll=mini.dll

The updated code looks like this:

#include "panda/plugin.h"

// OSI
#include "osi/osi_types.h"
#include "osi/osi_ext.h"

#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   // strcasecmp

// You can restrict the output of this plugin to a particular process by specifying the process parameter, e.g.
//   -panda lighthouse_coverage:process=lsass.exe
// You can restrict the output of this plugin to a particular DLL by specifying both the process and dll parameters, e.g.
//   -panda lighthouse_coverage:process=lsass.exe,dll=ntdll.dll

// function prototypes
void after_block_exec(CPUState *cpuState, TranslationBlock *translationBlock, uint8_t exitCode);
void uninit_plugin(void *self);
bool init_plugin(void *self);

FILE *outputFile = NULL;         // output file
const char *processName = NULL;  // process name to restrict output to ("" means all processes)
const char *dllName = NULL;      // DLL name to restrict output to ("" means the executable itself)

// Print "module+offset" if pc lies in the module's address range [base, base + size)
static void print_if_in_module(const OsiModule *module, uint64_t pc)
{
    if ((pc >= module->base) && (pc < (module->base + module->size)))
    {
        fprintf(outputFile, "\n%s+%#018" PRIx64, module->name, (uint64_t)(pc - module->base));
    }
}

// Case-insensitive name comparison that tolerates a module without a name
static bool name_matches(const OsiModule *module, const char *wanted)
{
    return (module->name != NULL) && (0 == strcasecmp(module->name, wanted));
}

// This function gets called right after every basic block is executed
void after_block_exec(CPUState *cpuState, TranslationBlock *translationBlock, uint8_t exitCode)
{
    if (exitCode > TB_EXIT_IDX1)    // if exitCode > TB_EXIT_IDX1, the block exited early
        return;

    if (panda_in_kernel(cpuState))  // I'm not interested in kernel modules
        return;

    // get a reference to the process this TranslationBlock belongs to
    OsiProc *process = get_current_process(cpuState);
    if (process == NULL)
    {
        printf("Whoa! get_current_process went wrong\n");
        return;
    }

    // we need this for getting the base address of the process or DLL
    GArray *mappings = get_mappings(cpuState, process);
    if (mappings == NULL)
    {
        printf("Whoa! get_mappings went wrong\n");
        free_osiproc(process);
        return;
    }

    if (mappings->len > 0)
    {
        // the first module mapped is the main executable itself
        OsiModule *module = &g_array_index(mappings, OsiModule, 0);

        // now we have 3 cases: all processes; only a particular process;
        // or a particular DLL for a particular process
        if (0 == strcmp("", processName))   // all processes, but we ignore the DLLs
        {
            print_if_in_module(module, translationBlock->pc);
        }
        else if (0 == strcmp("", dllName))  // only a particular process, but not its DLLs
        {
            if (name_matches(module, processName))
            {
                print_if_in_module(module, translationBlock->pc);
            }
        }
        else                                // only a particular DLL for a particular process
        {
            if (name_matches(module, processName))
            {
                // iterate through the list of loaded modules to find the desired DLL
                for (guint i = 1; i < mappings->len; i++)
                {
                    module = &g_array_index(mappings, OsiModule, i);
                    if (name_matches(module, dllName))
                    {
                        print_if_in_module(module, translationBlock->pc);
                        break;              // found the DLL, no need to look further
                    }
                }
            }
        }
    }

    // always free unused resources
    g_array_free(mappings, true);
    free_osiproc(process);
}

bool init_plugin(void *self)
{
    panda_require("osi");   // ensure that OSI is loaded
    if (!init_osi_api())    // called outside assert() so it still runs when NDEBUG is defined
        return false;

    // get plugin arguments before the callback can run
    panda_arg_list *args = panda_get_args("lighthouse_coverage");
    processName = panda_parse_string(args, "process", "");  // process name to restrict to, or default to ""
    dllName = panda_parse_string(args, "dll", "");          // dll name to restrict to, or default to ""

    outputFile = fopen("lighthouse.out", "w");              // open output file
    if (outputFile == NULL)
    {
        perror("lighthouse.out");
        return false;
    }

    // register the callback function above
    panda_cb pcb = { .after_block_exec = after_block_exec };
    panda_register_callback(self, PANDA_CB_AFTER_BLOCK_EXEC, pcb);
    return true;
}

void uninit_plugin(void *self)
{
    if (outputFile != NULL)
        fclose(outputFile); // close output file
}

There are still some issues. For example, under Windows, sometimes I see messages of the sort:

Could not read next entry in module list.

when calling get_mappings().

According to the very helpful PANDA developer Nathan Jackson, who wrote the OSI for Windows XP, the problem could be that

"that data structure lies in user memory and Windows likes to page things out."

but so far I haven't had any problem with getting the information about the executables I am interested in, so maybe it does not matter in most cases.

References: