package bap-std
Module Std.Project
Disassembled program.
A project contains the data that was reconstructed during disassembly, semantic analysis, and an arbitrary number of other analyses.
A project can associate arbitrary data with memory regions and program terms, and can even attach data globally to itself. It can therefore be seen as a knowledge base of deeply interconnected facts.
Besides delivering information from BAP to passes, a project can also be used as a communication medium between different passes (see Working with project).
type t = project
val bin_size_state : state Core_kernel.Bin_prot.Size.sizer
val bin_write_state : state Core_kernel.Bin_prot.Write.writer
val bin_writer_state : state Core_kernel.Bin_prot.Type_class.writer
val bin_read_state : state Core_kernel.Bin_prot.Read.reader
val __bin_read_state__ : (int -> state) Core_kernel.Bin_prot.Read.reader
val bin_reader_state : state Core_kernel.Bin_prot.Type_class.reader
val bin_state : state Core_kernel.Bin_prot.Type_class.t
IO interface to a project data structure.
include Regular.Std.Data.S with type t := t
An info value (name, `Ver v, desc) is the information attached to a particular reader or writer.
Data representation version. After any change in the data representation, the version should be increased.
Serializers that are derived from a data representation must have the same version as the data structure from which they are derived. Such serializers can only read and write data of that same version.
Other serializers can read and write data independently of its representation version. A serializer that cannot store data of the current version simply shouldn't be added to the set of serializers.
It is assumed that if a reader and a writer have the same name and version, then whatever was written by the writer is readable by the reader. Round-trip equality is not required, so it is acceptable if some information is lost.
It is also possible that a reader and a writer with the same name but different versions are compatible. In that case, it is recommended to use semantic versioning.
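To make the compatibility rules above concrete, here is a hedged sketch that models an info triple and a compatibility check. It assumes versions are semantic-versioning strings of the form "MAJOR.MINOR.PATCH", and that a reader and a writer with the same name are compatible when their versions are equal or share a major version. The local info type and the parse_semver and compatible functions are hypothetical and not part of the Regular API.

```ocaml
(* Local mirror of the (name, `Ver v, desc) triple; hypothetical, not Regular's type. *)
type info = string * [ `Ver of string ] * string option

(* Parse "MAJOR.MINOR.PATCH" into integers; None if the string has another shape. *)
let parse_semver (v : string) : (int * int * int) option =
  match String.split_on_char '.' v with
  | [ major; minor; patch ] -> (
      match (int_of_string_opt major, int_of_string_opt minor, int_of_string_opt patch) with
      | Some a, Some b, Some c -> Some (a, b, c)
      | _ -> None)
  | _ -> None

(* Same name and same version: readable by the documented assumption.
   Same name and same major version: assumed compatible under semver. *)
let compatible ((rname, `Ver rver, _) : info) ((wname, `Ver wver, _) : info) : bool =
  String.equal rname wname
  && (String.equal rver wver
     || (match (parse_semver rver, parse_semver wver) with
         | Some (rmaj, _, _), Some (wmaj, _, _) -> rmaj = wmaj
         | _ -> false))
```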
val size_in_bytes : ?ver:string -> ?fmt:string -> t -> int
size_in_bytes ?ver ?fmt datum returns the number of bytes needed to represent datum in the given format and version.
val of_bytes : ?ver:string -> ?fmt:string -> Regular.Std.bytes -> t
of_bytes ?ver ?fmt bytes deserializes a value from bytes.
val to_bytes : ?ver:string -> ?fmt:string -> t -> Regular.Std.bytes
to_bytes ?ver ?fmt datum serializes a datum to a sequence of bytes.
val blit_to_bytes :
?ver:string ->
?fmt:string ->
Regular.Std.bytes ->
t ->
int ->
unit
blit_to_bytes ?ver ?fmt buffer datum offset copies a serialized representation of datum into a buffer, starting from the offset.
val of_bigstring : ?ver:string -> ?fmt:string -> Core_kernel.bigstring -> t
of_bigstring ?ver ?fmt buf deserializes a datum from a bigstring.
val to_bigstring : ?ver:string -> ?fmt:string -> t -> Core_kernel.bigstring
to_bigstring ?ver ?fmt datum serializes a datum to a sequence of bytes represented as a bigstring.
val blit_to_bigstring :
?ver:string ->
?fmt:string ->
Core_kernel.bigstring ->
t ->
int ->
unit
blit_to_bigstring ?ver ?fmt buffer datum offset copies a serialized representation of datum into a buffer, starting from the offset.
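The functions above share a simple contract, which the following sketch assumes: size_in_bytes reports how many bytes the serialized datum occupies, to_bytes allocates a fresh buffer of that size, and blit_to_bytes writes the same bytes into an existing buffer at a given offset. The datum type and its binary layout are hypothetical, the ?ver and ?fmt selectors are ignored, and this is not how Regular derives its serializers.

```ocaml
(* Hypothetical datum: a name and a 32-bit address; not a BAP type. *)
type datum = { name : string; addr : int32 }

(* Layout: 4-byte big-endian name length, the name bytes, then a 4-byte big-endian address. *)
let size_in_bytes (d : datum) : int = 4 + String.length d.name + 4

(* Write the serialized datum into [buf] starting at [off] (buffer datum offset order). *)
let blit_to_bytes (buf : Bytes.t) (d : datum) (off : int) : unit =
  let n = String.length d.name in
  Bytes.set_int32_be buf off (Int32.of_int n);
  Bytes.blit_string d.name 0 buf (off + 4) n;
  Bytes.set_int32_be buf (off + 4 + n) d.addr

(* Allocate exactly size_in_bytes and blit at offset 0. *)
let to_bytes (d : datum) : Bytes.t =
  let buf = Bytes.create (size_in_bytes d) in
  blit_to_bytes buf d 0;
  buf

(* Inverse of to_bytes for this layout. *)
let of_bytes (buf : Bytes.t) : datum =
  let n = Int32.to_int (Bytes.get_int32_be buf 0) in
  let name = Bytes.sub_string buf 4 n in
  let addr = Bytes.get_int32_be buf (4 + n) in
  { name; addr }
```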
module Io : sig ... end
Input/Output functions for the given datum.
module Cache : sig ... end
Data cache.
val add_reader :
?desc:string ->
ver:string ->
string ->
t Regular.Std.reader ->
unit
add_reader ?desc ~ver name reader registers a new reader with the provided name, version ver, and optional description desc.
val add_writer :
?desc:string ->
ver:string ->
string ->
t Regular.Std.writer ->
unit
add_writer ?desc ~ver name writer registers a new writer with the provided name, version ver, and optional description desc.
val available_readers : unit -> info list
available_readers () lists the available readers for the data type.
val default_reader : unit -> info
default_reader () returns information about the default reader.
set_default_reader ?ver name sets a new default reader. If the version is not specified, then the latest available version is used. Raises an exception if a reader with the given name doesn't exist.
with_reader ?ver name operation temporarily sets the default reader to the reader with the specified name and version. The default reader is restored after operation is finished.
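The save-and-restore behavior of with_reader (and likewise with_writer and with_printer) can be sketched with Fun.protect from the OCaml standard library; in this sketch the previous default is restored even if the operation raises. The mutable default_reader cell and the "latest" placeholder used when no version is given are hypothetical stand-ins for Regular's internal state.

```ocaml
(* Hypothetical (name, version) of the current default reader. *)
let default_reader : (string * string) ref = ref ("bin", "1.0.0")

(* Temporarily switch the default, run [operation], and restore the previous
   default afterwards, whether [operation] returns or raises.
   "latest" is a placeholder for "latest available version". *)
let with_reader ?(ver = "latest") (name : string) (operation : unit -> 'a) : 'a =
  let saved = !default_reader in
  default_reader := (name, ver);
  Fun.protect ~finally:(fun () -> default_reader := saved) operation
```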
val available_writers : unit -> info list
available_writers () lists the available writers for the data type.
val default_writer : unit -> info
default_writer () returns information about the default writer.
set_default_writer ?ver name sets a new default writer. If the version is not specified, then the latest available version is used. Raises an exception if a writer with the given name doesn't exist.
with_writer ?ver name operation temporarily sets the default writer to the writer with the specified name and version. The default writer is restored after operation is finished.
val default_printer : unit -> info option
default_printer () optionally returns information about the default printer.
set_default_printer ?ver name sets a new default printer. If the version is not specified, then the latest available version is used. Raises an exception if a printer with the given name doesn't exist.
with_printer ?ver name operation temporarily sets the default printer to the printer with the specified name and version. The default printer is restored after operation is finished.
Low level access to serializers
val find_reader : ?ver:string -> string -> t Regular.Std.reader option
find_reader ?ver name looks up a reader with the given name. If the version is not specified, then the reader with the maximum version is returned.
val find_writer : ?ver:string -> string -> t Regular.Std.writer option
find_writer ?ver name looks up a writer with the given name. If the version is not specified, then the writer with the maximum version is returned.
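As a hypothetical model of this lookup rule, the sketch below keeps serializers in a table keyed by name and version and, when no version is given, returns the entry with the highest version. Serializers are represented by plain strings, and versions are compared numerically component by component; both choices are assumptions made for the illustration, not Regular's implementation.

```ocaml
(* Hypothetical registry: (name, version) -> serializer. *)
let registry : (string * string, string) Hashtbl.t = Hashtbl.create 16

let add_reader ~ver name reader = Hashtbl.replace registry (name, ver) reader

(* Compare dotted versions numerically, so "1.10.0" > "1.9.3".
   Non-numeric components count as 0 in this sketch. *)
let compare_versions a b =
  let parts v =
    List.map (fun s -> Option.value ~default:0 (int_of_string_opt s)) (String.split_on_char '.' v)
  in
  compare (parts a) (parts b)

(* With ~ver, exact lookup; without it, the highest registered version of [name]. *)
let find_reader ?ver name =
  match ver with
  | Some v -> Hashtbl.find_opt registry (name, v)
  | None ->
      Hashtbl.fold
        (fun (n, v) r best ->
          if not (String.equal n name) then best
          else
            match best with
            | Some (bv, _) when compare_versions bv v >= 0 -> best
            | _ -> Some (v, r))
        registry None
      |> Option.map snd
```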
val create :
?state:state ->
?disassembler:string ->
?brancher:brancher source ->
?symbolizer:symbolizer source ->
?rooter:rooter source ->
?reconstructor:reconstructor source ->
input ->
t Core_kernel.Or_error.t
create ?state ?disassembler ?brancher ?symbolizer ?rooter ?reconstructor input creates a project from the provided input source. The reconstruction is a multi-pass process driven by the following input variables, provided by the user:
- brancher decides instruction successors;
- rooter decides function starts;
- symbolizer decides function names;
- reconstructor provides the algorithm for symtab reconstruction.
The project is built incrementally and iteratively until a fixpoint is reached, that is, until information stops flowing from the input variables.
The overall algorithm can be depicted with the following diagram, where boxes denote data and ovals denote processes:
    +---------+   +---------+   +---------+
    | brancher|   |code/data|   | rooter  |
    +----+----+   +----+----+   +----+----+
         |             |             |
         |             v             |
         |        -----------        |
         +------>(  disasm   )<------+
                  -----+-----
                       |
                       v
    +----------+  +---------+  +----------+
    |symbolizer|  |   CFG   |  | reconstr |
    +-----+----+  +----+----+  +----+-----+
          |            |            |
          |            v            |
          |       -----------       |
          +----->( reconstr  )<-----+
                  -----+-----
                       |
                       v
                  +---------+
                  | symtab  |
                  +----+----+
                       |
                       v
                  -----------
                 (  lift IR  )
                  -----+-----
                       |
                       v
                  +---------+
                  | program |
                  +---------+
The input variables are represented as streams of values. They can be viewed as cells that depend on some input: when the input changes, the value is recomputed and passed to the stream. Circular dependencies are allowed, so a rooter may actually depend on the program term. In the case of circular dependencies, the algorithm above is run iteratively until a fixpoint is reached. The fixpoint criterion is that no data needs to be recomputed, where data must be recomputed when its input has changed or when that input itself needs to be recomputed.
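The iteration described above can be pictured as a generic fixpoint loop. In this simplified sketch, which is not BAP's implementation, a hypothetical rooter's output (a set of function starts) feeds back into its own input through a toy call graph, and the loop stops when a pass discovers no new starts, i.e., when nothing needs to be recomputed.

```ocaml
module IntSet = Set.Make (Int)

(* Apply [step] until the state no longer changes. Terminates when [step]
   only grows the state and the domain is finite, as in the toy below. *)
let rec fixpoint ~equal step x =
  let x' = step x in
  if equal x x' then x else fixpoint ~equal step x'

(* Hypothetical toy call graph: function start -> addresses it calls. *)
let calls = [ (0x10, [ 0x20 ]); (0x20, [ 0x30; 0x10 ]); (0x30, []) ]

(* One pass: every callee of a known function start is also a function start. *)
let step roots =
  IntSet.fold
    (fun r acc ->
      let callees = try List.assoc r calls with Not_found -> [] in
      List.fold_left (fun acc c -> IntSet.add c acc) acc callees)
    roots roots

let starts = fixpoint ~equal:IntSet.equal step (IntSet.singleton 0x10)
```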
User-provided input can depend on any information, but a good starting point is the information provided by the Info module. It contains several variables that are guaranteed to be defined during reconstruction.
For example, suppose that a create_source function requires a filename as its input to create a source t. Then the source can be created as easily as:
Stream.map Input.file ~f:create_source
As a more complex example, suppose that a source now requires both arch and file to be known. We can combine two different streams of information with the merge function:
Stream.merge Input.file Input.arch ~f:create_source
Here, create_source is a function of type string -> arch -> t.
If the source requires more than two arguments, then Stream.Variadic, which generalizes the merge function, can be used. Suppose that a source of information requires three inputs: an architecture, a filename, and a compiler name. Then we first define a list of arguments:
let args = Stream.Variadic.(args Input.arch $ Input.file $ Compiler.name)
and apply them to our create_source function:
Stream.Variadic.(apply ~f:create_source args)
The sources specified in the examples above call create_source when all arguments change. This is the expected behavior for the arch and file variables, since they do not change during the program computation. Mixing constant and non-constant (with respect to a computation) variables is not as easy, but can still be achieved using the either and parse combinators. For example, suppose that a source requires arch and cfg as its input:
let inputs = Stream.either Input.arch Input.cfg in
Stream.parse inputs ~init:nil ~f:(fun create -> function
  | First arch -> None, create_source arch
  | Second cfg -> Some (create cfg), create)
In this example, we parse a stream that contains either architectures or control flow graphs, using a state of type cfg -> t Or_error.t. Every time the architecture changes (i.e., a new project is started), we recreate our state by calling the create_source function. Since we can't prove that the architecture will be decided before the cfg, or decided at all, we need to provide an initial nil function. It can either return a bottom value, e.g., let nil _ = Or_error.error_string "expected arch", or just provide empty information.
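To see how the parse example threads its state, the following sketch replaces streams with plain lists of events. The parse function, the either type (standing in for Core's Either.t with its First and Second constructors), and create_source are hypothetical; only the shape of the ~init and ~f arguments follows the example above. A CFG is represented by an integer for brevity.

```ocaml
(* Stand-in for Core's Either.t constructors used in the text. *)
type ('a, 'b) either = First of 'a | Second of 'b

(* List-based model of a stream parser: thread a state through the events and
   emit an output whenever [f] returns Some. Not BAP's Stream implementation. *)
let parse events ~init ~f =
  let _, outputs =
    List.fold_left
      (fun (state, acc) ev ->
        let out, state = f state ev in
        (state, match out with Some o -> o :: acc | None -> acc))
      (init, []) events
  in
  List.rev outputs

(* Hypothetical source: for a given arch, turn a CFG (here an int) into a result. *)
let create_source arch cfg = Ok (Printf.sprintf "%s:%d" arch cfg)

(* Initial state used before any arch is known: a bottom value. *)
let nil _cfg = Error "expected arch"

let results =
  parse
    [ Second 3; First "x86"; Second 5; First "arm"; Second 7 ]
    ~init:nil
    ~f:(fun create -> function
      | First arch -> (None, create_source arch)
      | Second cfg -> (Some (create cfg), create))
```

A CFG that arrives before any architecture is answered by nil, and each new architecture replaces the state used for later CFGs.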
memory t returns the memory as an interval tree marked with arbitrary values.
tag_memory project region tag value tags the given region of memory in project with the given tag and value. Example: Project.tag_memory project tainted color `red
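Conceptually, tagging attaches a (tag, value) pair to an address range, and a query returns every pair whose range covers an address. The hedged sketch below models this with a plain list of hypothetical region records instead of the interval tree that memory actually returns, so lookups here are linear.

```ocaml
(* Hypothetical inclusive address range. *)
type region = { lo : int; hi : int }

(* Hypothetical tagged memory: (region, tag name, value) entries. *)
type 'v tagged = (region * string * 'v) list

let tag_memory (mem : 'v tagged) (r : region) (tag : string) (v : 'v) : 'v tagged =
  (r, tag, v) :: mem

(* All (tag, value) pairs whose region contains [addr]. *)
let tags_at (mem : 'v tagged) (addr : int) : (string * 'v) list =
  List.filter_map
    (fun (r, tag, v) -> if r.lo <= addr && addr <= r.hi then Some (tag, v) else None)
    mem
```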
substitute p region tag value is like tag_memory, but it also applies substitutions in the provided string value, as done by the OCaml standard library's Buffer.add_substitute function.
Example:
Project.substitute project region comment "$symbol starts at $symbol_addr"
The following substitutions are supported:
$section{_name,_addr,_min_addr,_max_addr} - name of the file region to which the memory belongs. For example, in ELF this name corresponds to the section name.
$symbol{_name,_addr,_min_addr,_max_addr} - name or address of the symbol to which the memory belongs.
$asm - assembler listing of the memory region.
$bil - BIL code of the tagged memory region.
$block{_name,_addr,_min_addr,_max_addr} - name or address of the basic block to which the region belongs.
$min_addr, $addr - starting address of the memory region.
$max_addr - address of the last byte of the memory region.
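Since substitute follows Buffer.add_substitute, the variable syntax can be tried directly with the standard library. In the sketch below the lookup function and the values it returns (a symbol main at 0x400000 in section .text) are made up for illustration; in BAP the values would come from the project being tagged. Unknown variables are left as written.

```ocaml
(* Hypothetical values for a few of the variables listed above. *)
let lookup = function
  | "symbol" | "symbol_name" -> "main"
  | "symbol_addr" -> "0x400000"
  | "section" | "section_name" -> ".text"
  | var -> "$" ^ var (* leave unknown variables untouched *)

(* Expand $var and ${var} references using the standard library. *)
let substitute (template : string) : string =
  let buf = Buffer.create (String.length template) in
  Buffer.add_substitute buf lookup template;
  Buffer.contents buf

let comment = substitute "$symbol starts at $symbol_addr"
```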
with_memory project updates the project memory. It is recommended to use tag_memory and substitute instead of this function, if possible.
Extensible record
A project can also be viewed as an extensible record in which arbitrary values can be stored. For example,
let p = Project.set project color `green
This sets the field color to the value `green.
set project field value sets a field to the given value. If the field was already set, the new value overrides the old one; otherwise, the field is added.
has project field checks whether a field exists. This is useful for fields of type unit, which are isomorphic to bool fields, e.g., if Project.has project mark
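One common way to implement such an extensible record in OCaml is a map from field names to values of a universal type, with each field carrying functions to inject into and project out of that type. The sketch below uses the local-exception trick for the universal type; it is an illustration under that assumption, not BAP's actual representation, and it assumes field names are unique.

```ocaml
module SMap = Map.Make (String)

(* A typed field: a name plus injection into / projection out of exn.
   Each call to [field] creates a fresh exception constructor, so a value
   stored under one field cannot be read back at another field's type. *)
type 'a field = { name : string; inj : 'a -> exn; prj : exn -> 'a option }

let field (type a) (name : string) : a field =
  let module M = struct exception E of a end in
  { name; inj = (fun x -> M.E x); prj = (function M.E x -> Some x | _ -> None) }

(* Hypothetical stand-in for a project's attached values. *)
type record = exn SMap.t

let empty : record = SMap.empty

(* Overrides any previous value of the field, otherwise adds it. *)
let set (r : record) (f : 'a field) (v : 'a) : record = SMap.add f.name (f.inj v) r

let get (r : record) (f : 'a field) : 'a option =
  match SMap.find_opt f.name r with Some u -> f.prj u | None -> None

let has (r : record) (f : 'a field) : bool = SMap.mem f.name r
```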
module Info : sig ... end
Information obtained during project reconstruction.
module Input : sig ... end
Input information.
Registering passes
To add a new pass, call one of the following register_* functions.
val register_pass :
?autorun:bool ->
?runonce:bool ->
?deps:string list ->
?name:string ->
(t -> t) ->
unit
register_pass ?autorun ?runonce ?deps ?name pass registers a pass over a project.
If autorun is true, then the host program runs this pass automatically. If runonce is true, then the pass is run only once for a given project, and subsequent attempts to run it are ignored. The runonce parameter defaults to false when autorun is false, and to true otherwise.
The deps parameter is a list of dependencies. Each dependency is the name of a pass that should be run before pass. The dependencies are run in the specified order every time pass is run.
To access command-line arguments, use Plugin.argv.
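The dependency and runonce rules above can be modeled with a small registry. In this sketch a project is reduced to the list of passes applied to it, names are mandatory rather than optional, and autorun is only recorded (nothing runs automatically). The project, pass, register_pass, and run_pass definitions are hypothetical and not BAP's implementation.

```ocaml
(* Hypothetical project: names of passes applied so far, newest first. *)
type project = { applied : string list }

type pass = {
  name : string;
  deps : string list;
  runonce : bool;
  autorun : bool;
  run : project -> project;
}

let passes : (string, pass) Hashtbl.t = Hashtbl.create 16

let register_pass ?(autorun = false) ?runonce ?(deps = []) name run =
  (* As documented: runonce defaults to false when autorun is false, true otherwise. *)
  let runonce = match runonce with Some b -> b | None -> autorun in
  Hashtbl.replace passes name { name; deps; runonce; autorun; run }

(* Run the dependencies in order, then the pass itself. A runonce pass that
   was already applied to this project is skipped. *)
let rec run_pass (p : project) (name : string) : project =
  let pass = Hashtbl.find passes name in
  let p = List.fold_left run_pass p pass.deps in
  if pass.runonce && List.mem name p.applied then p
  else
    let p = pass.run p in
    { applied = name :: p.applied }
```

In this model, running a pass twice reruns its ordinary dependencies but skips any runonce dependency that has already been applied.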
val register_pass' :
?autorun:bool ->
?runonce:bool ->
?deps:string list ->
?name:string ->
(t -> unit) ->
unit
register_pass' pass registers a pass that doesn't modify the project and is run only for its side effects (see register_pass).
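One way to picture register_pass' is as register_pass applied to a wrapper that runs the side-effecting function and returns the project unchanged. The of_effect adapter below is a hypothetical helper illustrating that reading, not a function provided by BAP.

```ocaml
(* Hypothetical adapter: turn a side-effect-only pass (t -> unit) into the
   (t -> t) shape that register_pass expects, returning its input unchanged. *)
let of_effect (pass : 'a -> unit) : 'a -> 'a =
 fun project ->
  pass project;
  project
```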
val passes : unit -> pass list
passes () returns all currently registered passes.
val find_pass : string -> pass option
find_pass name returns the pass with the given name, if one is registered.
module Pass : sig ... end
A program analysis pass.