.. -*- mode: rst-mode -*-
|
|
|
|
======
|
|
BinPAC
|
|
======
|
|
|
|
BinPAC is a high-level language for describing protocol parsers; its
compiler generates C++ code. It is currently maintained and distributed
with the Zeek Network Security Monitor. However, the generated parsers
may also be used with programs other than Zeek.
|
|
|
|
BinPAC originally lived in a repository separate from the main Zeek repository.
You can see the archived repository at https://github.com/zeek/binpac. That
repository exists only for historical reasons; all new work on BinPAC is
done in the main Zeek repo.
|
|
|
|
.. contents::
|
|
|
|
Prerequisites
|
|
=============
|
|
|
|
BinPAC relies on the following libraries and tools, which need to be
|
|
installed before you begin:
|
|
|
|
* Flex (Fast Lexical Analyzer)
|
|
Flex is already installed on most systems, so with luck you can
|
|
skip having to install it yourself.
|
|
|
|
* Bison (GNU Parser Generator)
|
|
Bison is also already installed on many systems.
|
|
|
|
* CMake 3.15.0 or greater
|
|
CMake is a cross-platform, open-source build system, typically
|
|
not installed by default. See http://www.cmake.org for more
|
|
information regarding CMake and the installation steps below for
|
|
how to use it to build this distribution. CMake generates native
|
|
Makefiles that depend on GNU Make by default.
|
|
|
|
Glossary and Convention
|
|
=======================
|
|
|
|
To make this document easier to read, it uses the following terms and
conventions.
|
|
|
|
- PAC grammar - .pac file written by user.
|
|
- PAC source - _pac.cc file generated by binpac
|
|
- PAC header - _pac.h file generated by binpac
|
|
- Analyzer - Protocol decoder generated by compiling PAC grammar
|
|
- Field - a member of a record
|
|
- Primary field - member of a record as direct result of parsing
|
|
- Derivative field - member of a record evaluated through post processing
|
|
|
|
BinPAC Language Reference
|
|
=========================
|
|
|
|
The BinPAC language consists of:
|
|
|
|
- analyzer
|
|
- type - a data-structure-like definition describing a parsing unit. Types can be built on each other to form more complex types, similar to yacc productions.
|
|
- flow - "flow" defines how data will be fed into the analyzer and the top level parsing unit.
|
|
- Keywords
|
|
- Built-in macros
|
|
|
|
Defining an analyzer
|
|
--------------------
|
|
|
|
There are two components to an analyzer definition: the top level context
|
|
and the connection definition.
|
|
|
|
|
|
Context Definition
|
|
~~~~~~~~~~~~~~~~~~
|
|
|
|
Each analyzer requires a top level context defined by the following syntax:
|
|
|
|
.. code::
|
|
|
|
analyzer <ContextName> withcontext {
|
|
... context members ...
|
|
}
|
|
|
|
Typically, the top-level context contains pointers to the connection
and the flow, as in the example below:
|
|
|
|
.. code::
|
|
|
|
analyzer HTTP withcontext {
|
|
connection : HTTP_analyzer;
|
|
flow : HTTP_flow;
|
|
};
|
|
|
|
|
|
Connection Definition
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
A "connection" defines the entry point into the analyzer. It consists of
|
|
two "flow" definitions, an "upflow" and a "downflow".
|
|
|
|
.. code::
|
|
|
|
connection <AnalyzerName>(optional parameter) {
|
|
upflow = <UpflowConstructor>;
|
|
downflow = <DownflowConstructor>;
|
|
}
|
|
|
|
Example:
|
|
|
|
.. code::
|
|
|
|
connection HTTP_analyzer {
|
|
upflow = HTTP_flow (true);
|
|
downflow = HTTP_flow (false);
|
|
};
|
|
|
|
type
|
|
----
|
|
|
|
A "type" is the basic building block of a binpac-generated parser and describes
|
|
the structure of a byte segment. Each non-primitive "type" generates a C++
|
|
class that can independently parse the structure which it describes.
|
|
|
|
Syntax:
|
|
|
|
.. code::
|
|
|
|
type <typeName>{(<optional type parameter(s)>)} = <compositor or primitive class>{
|
|
cases or members declaration.
|
|
} <optional attribute(s)>;
|
|
|
|
Example:
|
|
|
|
PAC grammar::
|
|
|
|
type myType = record {
|
|
data:uint8;
|
|
};
|
|
|
|
PAC header::
|
|
|
|
class myType{
|
|
public:
|
|
myType();
|
|
~myType();
|
|
int Parse(const_byteptr const t_begin_of_data, const_byteptr const t_end_of_data);
|
|
uint8 data() const { return data_; }
|
|
protected:
|
|
uint8 data_;
|
|
};
|
|
|
|
|
|
Primitives
|
|
~~~~~~~~~~
|
|
|
|
A primitive type can be thought of as a #define in C. Primitive types are
embedded into the types that reference them but do not generate any parsing
code of their own. The available primitive types are:
|
|
|
|
- int8
|
|
- int16
|
|
- int32
|
|
- uint8
|
|
- uint16
|
|
- uint32
|
|
- Regular expression ( ``type HTTP_URI = RE/[[:alnum:][:punct:]]+/;`` )
|
|
- bytestring
|
|
|
|
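Example, assuming ``number`` has been declared elsewhere as an alias for
``uint8[3]`` (the declaration is not shown in this document):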
|
|
|
|
.. code::
|
|
|
|
type foo = record { x: number; };
|
|
|
|
is equivalent to:
|
|
|
|
.. code::
|
|
|
|
type foo = record { x: uint8[3]; };
|
|
|
|
(Note: this behavior may change in future versions of binpac.)
|
|
|
|
record
|
|
~~~~~~
|
|
|
|
A "record" composes primitive types and other records to create a
new "type". This new "type" can in turn be used as part of a parent type
or directly for parsing.
|
|
|
|
Example:
|
|
|
|
.. code::
|
|
|
|
type SMB_body = record {
|
|
word_count : uint8;
|
|
parameter_words : uint16[word_count];
|
|
byte_count : uint16;
|
|
};
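
To make the record semantics concrete, the following is a hand-written C++ sketch of what parsing ``SMB_body`` amounts to. It is not the code binpac generates: ``SmbBody``, ``read_u16_le`` and ``parse_smb_body`` are hypothetical names, and little-endian byte order is an assumption made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hand-written counterpart of the SMB_body record.
struct SmbBody {
    uint8_t word_count = 0;
    std::vector<uint16_t> parameter_words;
    uint16_t byte_count = 0;
};

// Reads a 16-bit value, least significant byte first (an assumption of
// this sketch; the grammar above does not state a byte order).
static uint16_t read_u16_le(const uint8_t* p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Fills `body` and returns true, or returns false if the input is too short.
bool parse_smb_body(const uint8_t* begin, const uint8_t* end, SmbBody& body) {
    if (end - begin < 1) return false;
    body.word_count = *begin++;
    // The array length comes from the field parsed just before it.
    const std::ptrdiff_t need =
        2 * static_cast<std::ptrdiff_t>(body.word_count) + 2;
    if (end - begin < need) return false;
    body.parameter_words.clear();
    for (int i = 0; i < body.word_count; ++i, begin += 2)
        body.parameter_words.push_back(read_u16_le(begin));
    body.byte_count = read_u16_le(begin);
    return true;
}
```

The point of the sketch is the data dependency: ``parameter_words`` cannot be sized until ``word_count`` has been read, which is what ``uint16[word_count]`` expresses in the grammar.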
|
|
|
|
case
|
|
~~~~
|
|
|
|
The "case" compositor allows switching between different parsing methods.
|
|
|
|
.. code::
|
|
|
|
type SMB_string(unicode: bool, offset: int) = case unicode of {
|
|
true -> u: SMB_unicode_string(offset);
|
|
false -> a: SMB_ascii_string;
|
|
};
|
|
|
|
A "case" supports an optional "default" label, which applies when none of
the other labels match. If no field follows a given label, the user
can specify an arbitrary field name with the "empty" type. See
the following example.
|
|
|
|
.. code::
|
|
|
|
type HTTP_Message(expect_body: ExpectBody) = record {
|
|
headers: HTTP_Headers;
|
|
body_or_not: case expect_body of {
|
|
BODY_NOT_EXPECTED -> none: empty;
|
|
default -> body: HTTP_Body(expect_body);
|
|
};
|
|
};
|
|
|
|
Note that only one field is allowed after a given label. If multiple fields
|
|
are to be specified, they should be packed in another "record" type first.
|
|
The other usages of `case`_ are described later.
|
|
|
|
array
|
|
~~~~~
|
|
|
|
A type can be defined as a sequence of "single-type elements". By default,
an array type keeps parsing array elements in an unbounded loop.
Alternatively, an array size can be specified to control the number of
elements matched, or &until can be used to end parsing conditionally:
|
|
|
|
.. code::
|
|
|
|
# This will match exactly 10 elements
|
|
type HTTP_Headers = HTTP_Header [10];
|
|
|
|
# This will match until the condition is met
|
|
type HTTP_Headers = HTTP_Header [] &until(/*Some condition*/);
|
|
|
|
Arrays can also be used directly inside a "record". For example:
|
|
|
|
.. code::
|
|
|
|
type DNS_message = record {
|
|
header: DNS_header;
|
|
question: DNS_question(this)[header.qdcount];
|
|
answer: DNS_rr(this, DNS_ANSWER)[header.ancount];
|
|
authority: DNS_rr(this, DNS_AUTHORITY)[header.nscount];
|
|
additional: DNS_rr(this, DNS_ADDITIONAL)[header.arcount];
|
|
} &byteorder = bigendian, &exportsourcedata;
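
The ``bigendian`` byte order used in this example means that multi-byte integers such as ``header.qdcount`` are read most significant byte first. A minimal C++ sketch of that decoding follows; the helper names ``read_u16_be`` and ``read_u32_be`` are hypothetical and are not what binpac emits.

```cpp
#include <cassert>
#include <cstdint>

// Decodes a 16-bit big-endian (network byte order) value.
uint16_t read_u16_be(const uint8_t* p) {
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

// Decodes a 32-bit big-endian value.
uint32_t read_u32_be(const uint8_t* p) {
    return (static_cast<uint32_t>(p[0]) << 24) |
           (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8) |
           static_cast<uint32_t>(p[3]);
}
```

Assembling the value from individual bytes, rather than casting the pointer, keeps the sketch independent of the host's byte order and alignment rules.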
|
|
|
|
flow
|
|
----
|
|
|
|
A "flow" defines how data is fed into the analyzer. It also maintains
custom state information declared by `%member`_. A flow is configured by
specifying the type of its data unit.
|
|
|
|
Syntax:
|
|
|
|
.. code::
|
|
|
|
flow <Flow name>(<optional attribute>) {
|
|
<flowunit|datagram> = <top level data unit> withcontext (<context constructor parameter>);
|
|
};
|
|
|
|
When a "flow" is added to the top-level analyzer context, it enables the use
of &oneline and &length in "record" types. The flow buffers data when there is
not enough to evaluate the record and dispatches the data for evaluation once
enough has arrived.
|
|
|
|
flowunit
|
|
~~~~~~~~
|
|
|
|
When flowunit is used, the analyzer uses a flow buffer to handle incremental
input and to provide support for &oneline/&length. For further detail on
this, see `Buffering`_.
|
|
|
|
.. code::
|
|
|
|
flowunit = HTTP_PDU(is_orig) withcontext (analyzer, this);
|
|
|
|
datagram
|
|
~~~~~~~~
|
|
|
|
In contrast to flowunit, declaring the data unit as a datagram bypasses the
flow buffer. This results in faster parsing but provides no incremental input
or buffering support.
|
|
|
|
.. code::
|
|
|
|
datagram = HTTP_PDU(is_orig) withcontext (analyzer, this);
|
|
|
|
Byte Ordering and Alignment
|
|
---------------------------
|
|
|
|
Byte Ordering
|
|
~~~~~~~~~~~~~
|
|
|
|
Byte Alignment
|
|
~~~~~~~~~~~~~~
|
|
|
|
.. code::
|
|
|
|
type RPC_Opaque = record {
|
|
length: uint32;
|
|
data: uint8[length];
|
|
pad: padding align 4; # pad to 4-byte boundary
|
|
};
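
The padding arithmetic behind ``padding align 4`` can be sketched in C++ as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes offsets are measured from the start of the record.

```cpp
#include <cassert>
#include <cstddef>

// Number of padding bytes needed to advance `offset` to the next multiple
// of `alignment` (0 if it is already aligned). `alignment` must be non-zero.
std::size_t padding_for(std::size_t offset, std::size_t alignment) {
    return (alignment - offset % alignment) % alignment;
}

// Total encoded size of an RPC_Opaque-like record: a 4-byte length field,
// the data bytes, then padding up to a 4-byte boundary.
std::size_t rpc_opaque_size(std::size_t data_length) {
    const std::size_t unpadded = 4 + data_length;
    return unpadded + padding_for(unpadded, 4);
}
```

For example, a 5-byte payload occupies 4 + 5 = 9 bytes and needs 3 bytes of padding, for 12 bytes in total.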
|
|
|
|
Functions
|
|
---------
|
|
|
|
Users can define functions in binpac.
A function can be declared in one of three ways:
|
|
|
|
PAC with embedded body
|
|
~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
Declare a PAC-style function prototype and embed the body using %{ %}::
|
|
|
|
function print_stuff(value :const_bytestring):bool
|
|
%{
|
|
printf("Value [%s]\n", std_str(value).c_str());
return true;
|
|
%}
|
|
|
|
PAC with PAC-case body
|
|
~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
A PAC-style function with a case body. This form of declaration is useful
because the function can be extended later with "refine casefunc"::
|
|
|
|
function RPC_Service(prog: uint32, vers: uint32): EnumRPCService =
|
|
case prog of {
|
|
default -> RPC_SERVICE_UNKNOWN;
|
|
};
|
|
|
|
|
|
Inlined by %code
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
A function can be completely inlined by using %code::
|
|
|
|
%code{
|
|
EnumRPCService RPC_Service(const RPC_Call* call)
|
|
{
|
|
return call ? call->service() : RPC_SERVICE_UNKNOWN;
|
|
}
|
|
%}
|
|
|
|
|
|
Extending
|
|
---------
|
|
|
|
PAC code can be extended by using "refine". This is useful for code
reuse and for splitting functionality to allow parallel development.
|
|
|
|
Extending record
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
A record can be extended with additional attributes by using
"refine typeattr". A typical use is to add &let in order to separate
protocol parsing from protocol analysis.
|
|
|
|
.. code::
|
|
|
|
refine typeattr HTTP_RequestLine += &let {
|
|
process_request: bool =
|
|
process_func(method, uri, version);
|
|
};
|
|
|
|
Extending type case
|
|
~~~~~~~~~~~~~~~~~~~
|
|
|
|
.. code::
|
|
|
|
refine casetype RPC_Params += {
|
|
RPC_SERVICE_PORTMAP -> portmap: PortmapParams(call);
|
|
};
|
|
|
|
Extending function case
|
|
~~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
A function that is declared as a PAC case can be extended by adding
additional cases to the switch.
|
|
|
|
.. code::
|
|
|
|
refine casefunc RPC_BuildCallVal += {
|
|
RPC_SERVICE_PORTMAP ->
|
|
PortmapBuildCallVal(call, call.params.portmap);
|
|
};
|
|
|
|
Extending connection
|
|
~~~~~~~~~~~~~~~~~~~~
|
|
|
|
A connection can be extended to add functions and members. Example::
|
|
|
|
refine connection RPC_Conn += {
|
|
function ProcessPortmapReply(results: PortmapResults): bool
|
|
%{
|
|
%}
|
|
};
|
|
|
|
State Management
|
|
----------------
|
|
|
|
State is maintained by extending a parsing class with derivative fields.
State lasts until the top-level parsing unit (flowunit/datagram) is destroyed.
|
|
|
|
Keywords
|
|
--------
|
|
|
|
Source code embedding
|
|
~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
C++ code can be embedded within the .pac file using the following
directives. This code is copied into the final generated code.
|
|
|
|
- %header{...%}
|
|
|
|
Code to be inserted in binpac generated header file.
|
|
|
|
- %code{...%}
|
|
|
|
Code to be inserted at the beginning of binpac generated C++ file.
|
|
|
|
.. _%member:
|
|
|
|
- %member{...%}
|
|
|
|
Add additional member(s) to connection (?) and flow class.
|
|
|
|
- %init{...%}
|
|
|
|
Code to be inserted in flow constructor.
|
|
|
|
- %cleanup{...%}
|
|
|
|
Code to be inserted in flow destructor.
|
|
|
|
Embedded pac primitive
|
|
~~~~~~~~~~~~~~~~~~~~~~
|
|
|
|
- ${
|
|
|
|
- $set{
|
|
|
|
- $type{
|
|
|
|
- $typeof{
|
|
|
|
- $const_def{
|
|
|
|
Condition checking
|
|
~~~~~~~~~~~~~~~~~~
|
|
|
|
&until
|
|
......
|
|
|
|
"&until" is used in conjunction with array declaration. It specifies exit
|
|
condition for array parsing.
|
|
|
|
.. code::
|
|
|
|
type HTTP_Headers = HTTP_Header[] &until($input.length() == 0);
|
|
|
|
&requires
|
|
.........
|
|
|
|
Processes data dependencies before evaluating a field.
|
|
|
|
Example: typically, derivative fields are evaluated after primary fields.
Here, however, "&requires" forces length to be evaluated before msg_body.
|
|
|
|
.. code::
|
|
|
|
type RPC_Message = record {
|
|
xid: uint32;
|
|
msg_type: uint32;
|
|
msg_body: case msg_type of {
|
|
RPC_CALL -> call: RPC_Call(this);
|
|
RPC_REPLY -> reply: RPC_Reply(this);
|
|
} &requires(length);
|
|
} &let {
|
|
length = sourcedata.length(); # length of the RPC_Message
|
|
} &byteorder = bigendian, &exportsourcedata, &refcount;
|
|
|
|
&if
|
|
...
|
|
|
|
Evaluate field only if condition is met.
|
|
|
|
.. code::
|
|
|
|
type DNS_label(msg: DNS_message) = record {
|
|
length: uint8;
|
|
data: case label_type of {
|
|
0 -> label: bytestring &length = length;
|
|
3 -> ptr_lo: uint8;
|
|
};
|
|
} &let {
|
|
label_type: uint8 = length >> 6;
|
|
last: bool = (length == 0) || (label_type == 3);
|
|
ptr: DNS_name(msg)
|
|
withinput $context.flow.get_pointer(msg.sourcedata,
|
|
((length & 0x3f) << 8) | ptr_lo)
|
|
&if(label_type == 3);
|
|
clear_pointer_set: bool = $context.flow.reset_pointer_set()
|
|
&if(last);
|
|
};
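
The bit manipulation in the ``&let`` block above can be checked in isolation. The C++ sketch below mirrors the three expressions; the function names are hypothetical, and this is not the code binpac generates.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors `label_type: uint8 = length >> 6;` (the top two bits).
uint8_t label_type(uint8_t length) {
    return static_cast<uint8_t>(length >> 6);
}

// Mirrors `last: bool = (length == 0) || (label_type == 3);`
bool is_last(uint8_t length) {
    return length == 0 || label_type(length) == 3;
}

// Mirrors the pointer offset expression `((length & 0x3f) << 8) | ptr_lo`:
// the low 6 bits of `length` form the high part of a 14-bit offset.
uint16_t pointer_offset(uint8_t length, uint8_t ptr_lo) {
    return static_cast<uint16_t>(((length & 0x3f) << 8) | ptr_lo);
}
```

For instance, the byte pair ``0xC0 0x0C`` has label type 3 and refers to offset 12 in the message.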
|
|
|
|
.. _case:
|
|
|
|
case
|
|
....
|
|
|
|
There are two uses of the "case" keyword.
|
|
|
|
* As part of a record field. In this scenario, it allows alternative
  methods of parsing a field. Example::
|
|
|
|
type RPC_Reply(msg: RPC_Message) = record {
|
|
stat: uint32;
|
|
reply: case stat of {
|
|
MSG_ACCEPTED -> areply: RPC_AcceptedReply(call);
|
|
MSG_DENIED -> rreply: RPC_RejectedReply(call);
|
|
};
|
|
} &let {
|
|
call: RPC_Call = $context.connection.FindCall(msg.xid);
|
|
success: bool = (stat == MSG_ACCEPTED && areply.stat == SUCCESS);
|
|
};
|
|
|
|
|
|
* As a function definition. Example::
|
|
|
|
function RPC_Service(prog: uint32, vers: uint32): EnumRPCService =
|
|
case prog of {
|
|
default -> RPC_SERVICE_UNKNOWN;
|
|
};
|
|
|
|
|
|
Note that one can "refine" both types of cases:
|
|
|
|
.. code::
|
|
|
|
refine casefunc RPC_Service += {
|
|
100000 -> RPC_SERVICE_PORTMAP;
|
|
};
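
Conceptually, the base ``RPC_Service`` definition plus the refinement above behave like a single C++ ``switch``. The sketch below only illustrates that idea; the enum values are hypothetical, and the code binpac actually generates may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum; the real definition lives in the analyzer's sources.
enum EnumRPCService { RPC_SERVICE_UNKNOWN, RPC_SERVICE_PORTMAP };

// Conceptual result of the base definition combined with the refinement.
EnumRPCService RPC_Service(uint32_t prog, uint32_t vers) {
    (void)vers;  // unused, as in the grammar above
    switch (prog) {
    case 100000:  // added by "refine casefunc"
        return RPC_SERVICE_PORTMAP;
    default:      // from the original definition
        return RPC_SERVICE_UNKNOWN;
    }
}
```

Each further ``refine casefunc`` corresponds to one more ``case`` label, so separate .pac files can contribute services without editing the base function.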
|
|
|
|
Built-in macros
|
|
~~~~~~~~~~~~~~~
|
|
|
|
$input
|
|
......
|
|
|
|
This macro refers to the data that was passed into the ParseBuffer
function. When $input is used, binpac generates a const_bytestring
that contains the start and end pointers of the input.
|
|
|
|
PAC grammar::
|
|
|
|
&until($input.length()==0);
|
|
|
|
PAC source::
|
|
|
|
const_bytestring t_val__elem_input(t_begin_of_data, t_end_of_data);
|
|
if ( ( t_val__elem_input.length() == 0 ) )
|
|
|
|
$element
|
|
........
|
|
|
|
$element provides access to an entry of an array type. $element can be
used in the following ways.
|
|
|
|
* Current element. Checks the value of the most recently parsed entry.
  The check is executed each time an entry has been parsed. Example::
|
|
|
|
type SMB_ascii_string = uint8[] &until($element == 0);
|
|
|
|
* Current element's field. Example::
|
|
|
|
type DNS_label(msg: DNS_message) = record {
|
|
length: uint8;
|
|
data: case label_type of {
|
|
0 -> label: bytestring &length = length;
|
|
3 -> ptr_lo: uint8;
|
|
};
|
|
} &let {
|
|
label_type: uint8 = length >> 6;
|
|
last: bool = (length == 0) || (label_type == 3);
|
|
};
|
|
type DNS_name(msg: DNS_message) = record {
|
|
labels: DNS_label(msg)[] &until($element.last);
|
|
};
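
The loop implied by ``&until($element == 0)`` can be sketched in C++ as follows. This is a hand-written illustration with a hypothetical function name; it assumes the terminating element is kept in the array because the condition is checked after the element has been parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Parses bytes into `out` until one equals zero. The terminating element is
// kept, because the condition is checked after the element is parsed.
// Returns the number of bytes consumed.
std::size_t parse_until_zero(const uint8_t* begin, const uint8_t* end,
                             std::vector<uint8_t>& out) {
    const uint8_t* p = begin;
    while (p < end) {
        const uint8_t element = *p++;  // parse one array entry
        out.push_back(element);
        if (element == 0) break;       // counterpart of &until($element == 0)
    }
    return static_cast<std::size_t>(p - begin);
}
```

The ``$element.last`` form works the same way, except that the condition reads a field of the entry that was just parsed instead of comparing the entry itself.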
|
|
|
|
$context
|
|
........
|
|
|
|
This macro refers to the Analyzer context class (Context<Name> class gets
|
|
generated from analyzer <Name> withcontext {}). Using this macro, users
|
|
can gain access to the "flow" object and "analyzer" object.
|
|
|
|
Other keywords
|
|
~~~~~~~~~~~~~~
|
|
|
|
&transient
|
|
..........
|
|
|
|
Do not create a copy of the bytestring.
|
|
|
|
.. code::
|
|
|
|
type MIME_Line = record {
|
|
line: bytestring &restofdata &transient;
|
|
} &oneline;
|
|
|
|
&let
|
|
....
|
|
|
|
Adds derivative fields to a record.
|
|
|
|
.. code::
|
|
|
|
type ncp_request(length: uint32) = record {
|
|
data : uint8[length];
|
|
} &let {
|
|
function = length > 0 ? data[0] : 0;
|
|
subfunction = length > 1 ? data[1] : 0;
|
|
};
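
As a rough C++ analogy, derivative fields are values computed from the primary fields once those have been parsed. In the sketch below, ``NcpRequest`` and ``make_ncp_request`` are hypothetical names, ``int`` is assumed as the type of the derivative fields, and the caller is assumed to guarantee that ``length`` bytes are available.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct NcpRequest {
    std::vector<uint8_t> data;  // primary field
    int function = 0;           // derivative field
    int subfunction = 0;        // derivative field
};

// Copies `length` bytes into the primary field, then evaluates the
// derivative fields exactly as written in the &let block.
NcpRequest make_ncp_request(const uint8_t* begin, uint32_t length) {
    NcpRequest r;
    r.data.assign(begin, begin + length);
    r.function = length > 0 ? r.data[0] : 0;
    r.subfunction = length > 1 ? r.data[1] : 0;
    return r;
}
```

The guards on ``length`` matter: without them, a short request would read past the end of ``data``.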
|
|
|
|
let
|
|
...
|
|
|
|
Declares a global value. If the user does not specify a type,
|
|
the compiler will assume the "int" type.
|
|
|
|
PAC grammar::
|
|
|
|
let myValue:uint8=10;
|
|
|
|
PAC source::
|
|
|
|
uint8 const myValue = 10;
|
|
|
|
PAC header::
|
|
|
|
extern uint8 const myValue;
|
|
|
|
&restofdata
|
|
...........
|
|
|
|
Grab the rest of the data available in the FlowBuffer.
|
|
|
|
PAC grammar::
|
|
|
|
onebyte: uint8;
|
|
value: bytestring &restofdata &transient;
|
|
|
|
PAC source::
|
|
|
|
// Parse "onebyte"
|
|
onebyte_ = *((uint8 const *) (t_begin_of_data));
|
|
// Parse "value"
|
|
int t_value_string_length;
|
|
t_value_string_length = (t_end_of_data) - ((t_begin_of_data + 1));
|
|
int t_value__size;
|
|
t_value__size = t_value_string_length;
|
|
value_.init((t_begin_of_data + 1), t_value_string_length);
|
|
|
|
&length
|
|
.......
|
|
|
|
&length can appear in two different contexts: as a property of a field
or as a property of a record.
|
|
Examples:
|
|
&length as field property::
|
|
|
|
protocol : bytestring &length = 4;
|
|
|
|
translates into::
|
|
|
|
const_byteptr t_end_of_data = t_begin_of_data + 4;
|
|
int t_protocol_string_length;
|
|
t_protocol_string_length = 4;
|
|
int t_protocol__size;
|
|
t_protocol__size = t_protocol_string_length;
|
|
protocol_.init(t_begin_of_data, t_protocol_string_length);
|
|
|
|
|
|
&check
|
|
......
|
|
|
|
This was originally intended to implement the behavior of the
"&enforce" attribute, which supersedes it. It always has been, and always
will be, a no-op, so that anything that uses it does not suddenly and
unintentionally break.
|
|
|
|
&enforce
|
|
........
|
|
|
|
Checks a condition and raises an exception if it is not met.
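
The check-then-throw behavior can be pictured with the C++ sketch below. Everything in it is hypothetical: the ``Header`` record, the condition ``version == 1``, and the use of ``std::runtime_error`` as a stand-in for whatever exception type the generated parser raises.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical record with a 1-byte version field that must equal 1.
struct Header {
    uint8_t version;
};

// Parses the header and enforces the condition on the parsed field.
Header parse_header(const uint8_t* begin, const uint8_t* end) {
    if (begin == end) throw std::out_of_range("no data to parse");
    Header h{*begin};
    // Counterpart of an enforced condition: fail loudly if it does not hold.
    if (!(h.version == 1))
        throw std::runtime_error("enforced condition failed: version == 1");
    return h;
}

// Convenience wrapper: true if parsing succeeded without an exception.
bool parses_ok(const uint8_t* begin, const uint8_t* end) {
    try {
        parse_header(begin, end);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

Raising an exception aborts parsing of the current unit, so fields that follow a failed condition are never evaluated on invalid input.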
|
|
|
|
&chunked and $chunk
|
|
...................
|
|
|
|
When parsing a long, variable-length field, "&chunked" can be used to
improve performance. However, chunked fields are not buffered across
packets. The data for the chunk in the current packet can be accessed by
using "$chunk".
|
|
|
|
&exportsourcedata
|
|
.................
|
|
|
|
The data matched for a particular type can be retained by
using "&exportsourcedata".
|
|
|
|
.pac file
|
|
|
|
.. code::
|
|
|
|
type myType = record {
|
|
data:uint8;
|
|
} &exportsourcedata;
|
|
|
|
_pac.h
|
|
|
|
.. code::
|
|
|
|
class myType
|
|
{
|
|
public:
|
|
myType();
|
|
~myType();
|
|
int Parse(const_byteptr const t_begin_of_data, const_byteptr const t_end_of_data);
|
|
uint8 data() const { return data_; }
const_bytestring const & sourcedata() const { return sourcedata_; }
protected:
uint8 data_;
|
|
const_bytestring sourcedata_;
|
|
};
|
|
|
|
_pac.cc
|
|
|
|
.. code::
|
|
|
|
sourcedata_ = const_bytestring(t_begin_of_data, t_end_of_data);
|
|
sourcedata_.set_end(t_begin_of_data + 1);
|
|
|
|
Source data can be used within the type that matches it or in the parent type.
|
|
|
|
.. code::
|
|
|
|
type myParentType (child:myType) = record {
|
|
somedata:uint8;
|
|
} &let{
|
|
do_something:bool = print_stuff(child.sourcedata);
|
|
};
|
|
|
|
translates into
|
|
|
|
.. code::
|
|
|
|
do_something_ = print_stuff(child()->sourcedata());
|
|
|
|
&refcount
|
|
.........
|
|
|
|
|
|
withinput
|
|
.........
|
|
|
|
|
|
Parsing Methodology
|
|
===================
|
|
|
|
.. _Buffering:
|
|
|
|
Buffering
|
|
---------
|
|
|
|
binpac supports incremental input to deal with packet fragmentation. This
is done by using the FlowBuffer class and by maintaining buffering/parsing states.
|
|
|
|
FlowBuffer Class
|
|
~~~~~~~~~~~~~~~~
|
|
|
|
FlowBuffer provides two modes of buffering: line and frame. Line mode is
useful for parsing line-based protocols such as HTTP. Frame mode is best for
fixed-length messages. The buffering mode can be switched during parsing, and
the switch is transparent to the grammar writer.
|
|
|
|
At compile time, binpac calculates the number of bytes required to evaluate
each field. At run time, data is buffered in the FlowBuffer until
there is enough to evaluate the "record". To optimize the buffering
process, if the FlowBuffer has enough data to evaluate on the first NewData,
it only marks the start and end pointers instead of copying the data.
|
|
|
|
- void **NewMessage**\();
|
|
|
|
- Advances the orig_data_begin\_ pointer depending on the current mode\_:
  by 1 or 2 characters in LINE_MODE, by frame_length\_ in FRAME_MODE,
  and not at all in UNKNOWN_MODE (the default mode).
|
|
|
|
- Set buffer_n\_ to 0
|
|
|
|
- Reset message_complete\_
|
|
|
|
- void **NewLine**\();
|
|
|
|
- Reset frame_length\_ and chunked\_, set mode\_ to LINE_MODE
|
|
|
|
- void **NewFrame**\(int frame_length, bool chunked\_);
|
|
|
|
- void **GrowFrame**\(int new_frame_length);
|
|
|
|
- void **AppendToBuffer**\(const_byteptr data, int len);
|
|
|
|
- Reallocate buffer\_ to add new data then copy data
|
|
|
|
- void **ExpandBuffer**\(int length);
|
|
|
|
- Reallocate buffer\_ to new size if new size is bigger than current size.
|
|
|
|
- Set minimum size to 512 (optimization?)
|
|
|
|
- void **MarkOrCopyLine**\();
|
|
|
|
- Searches the current input for an end of line (CR, LF or CRLF, depending on
  the line break mode). If one is found, the data up to it is appended to the
  buffer if a buffer already exists; otherwise the line is only marked (by
  setting frame_length\_) to minimize copying. If no end of line is found, the
  partial data up to the end of the input is appended to the buffer, which
  is created if it does not exist yet.
|
|
|
|
- const_byteptr **begin**\()/**end**\()
|
|
|
|
- Returns buffer\_ and buffer_n\_ if a buffer exists, otherwise
  orig_data_begin\_ and orig_data_begin\_ + frame_length\_.
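
To illustrate the idea of line-mode buffering over incremental input, here is a deliberately simplified C++ sketch. ``LineBuffer`` is a hypothetical class, not binpac's FlowBuffer: it assumes LF or CRLF line endings only, and it always copies bytes rather than marking pointers into the original input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified line-mode buffer; not binpac's FlowBuffer.
class LineBuffer {
public:
    // Feeds one chunk of input and returns every line completed by it.
    // Lines end with LF; a CR directly before the LF is stripped.
    std::vector<std::string> NewData(const char* begin, const char* end) {
        std::vector<std::string> lines;
        for (const char* p = begin; p != end; ++p) {
            if (*p != '\n') {
                partial_.push_back(*p);
                continue;
            }
            if (!partial_.empty() && partial_.back() == '\r')
                partial_.pop_back();
            lines.push_back(partial_);
            partial_.clear();
        }
        return lines;
    }

    // Number of bytes held for a line that is not complete yet.
    std::size_t buffered() const { return partial_.size(); }

private:
    std::string partial_;
};
```

A line split across two chunks is returned only once its terminator arrives, which is the property a parser needs in order to handle fragmented input.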
|
|
|
|
Parsing States
|
|
~~~~~~~~~~~~~~
|
|
|
|
* buffering_state\_ - each parsing class contains a flag indicating whether
  enough data has been buffered to evaluate the next block.
|
|
|
|
* parsing_state\_ - each parsing class that consists of multiple parsing
  data units (lines/frames) has this flag, which indicates the parsing stage.
  Each time new data comes in, the parsing function is invoked and switches on
  parsing_state\_ to determine which sub-parser to use next.
|
|
|
|
Regular Expression
|
|
------------------
|
|
|
|
Evaluation Order
|
|
----------------
|
|
|
|
Running Binpac-generated Analyzer Standalone
|
|
============================================
|
|
|
|
To run binpac-generated code independently of Zeek, the regex library must be
substituted. Below is one way of doing it, using the following three header
files.
|
|
|
|
RE.h
|
|
----
|
|
|
|
.. code::
|
|
|
|
/*Dummy file to replace Zeek's file*/
|
|
#include "binpac_pcre.h"
|
|
#include "bro_dummy.h"
|
|
|
|
bro_dummy.h
|
|
-----------
|
|
|
|
.. code::
|
|
|
|
#ifndef BRO_DUMMY
|
|
#define BRO_DUMMY
|
|
#define DEBUG_MSG(x...) fprintf(stderr, x)
|
|
/* Dummy for linking; this function is supposed to be provided by Zeek */
|
|
double network_time();
|
|
#endif
|
|
|
|
binpac_pcre.h
|
|
-------------
|
|
|
|
.. code::
|
|
|
|
#ifndef bro_pcre_h
|
|
#define bro_pcre_h
|
|
#include <stdio.h>
|
|
#include <assert.h>
|
|
#include <string>
|
|
using namespace std;
|
|
// TODO: use configure to figure out the location of pcre.h
|
|
#include "pcre.h"
|
|
class RE_Matcher {
|
|
public:
|
|
RE_Matcher(const char* pat){
|
|
pattern_ = "^";
|
|
pattern_ += "(";
|
|
pattern_ += pat;
|
|
pattern_ += ")";
|
|
pcre_ = NULL;
|
|
pextra_ = NULL;
|
|
}
|
|
~RE_Matcher() {
|
|
if (pcre_) {
|
|
pcre_free(pcre_);
|
|
}
|
|
}
|
|
int Compile() {
|
|
const char *err = NULL;
|
|
int erroffset = 0;
|
|
pcre_ = pcre_compile(pattern_.c_str(),
|
|
0, // options,
|
|
&err,
|
|
&erroffset,
|
|
NULL);
|
|
if (pcre_ == NULL) {
|
|
fprintf(stderr,
|
|
"Error in RE_Matcher::Compile(): %d:%s\n",
|
|
erroffset, err);
|
|
return 0;
|
|
}
|
|
return 1;
|
|
}
|
|
|
|
int MatchPrefix (const char* s, int n){
|
|
const char *err=NULL;
|
|
assert(pcre_);
|
|
const int MAX_NUM_OFFSETS = 30;
|
|
int offsets[MAX_NUM_OFFSETS];
|
|
int ret = pcre_exec(pcre_,
|
|
pextra_, // pcre_extra
|
|
//NULL, // pcre_extra
|
|
s, n,
|
|
0, // offset
|
|
0, // options
|
|
offsets,
|
|
MAX_NUM_OFFSETS);
|
|
if (ret < 0) {
|
|
return -1;
|
|
}
|
|
assert(offsets[0] == 0);
|
|
return offsets[1];
|
|
}
|
|
protected:
|
|
pcre *pcre_;
pcre_extra *pextra_;  // used by the constructor and MatchPrefix()
|
|
string pattern_;
|
|
};
|
|
#endif
|
|
|
|
main.cc
|
|
-------
|
|
|
|
In your main source, add this dummy stub.
|
|
|
|
.. code::
|
|
|
|
/* Dummy for linking; this function is supposed to be provided by Zeek */
|
|
double network_time(){
|
|
return 0;
|
|
}
|
|
|
|
|
|
Q & A
|
|
=====
|
|
|
|
* Does &oneline only work when "flow" is used?
|
|
|
|
Yes. binpac uses the flowunit definition in "flow" to figure out which
|
|
types require buffering. For those that do, the parse function is:
|
|
|
|
.. code::
|
|
|
|
bool ParseBuffer(flow_buffer_t t_flow_buffer, ContextHTTP * t_context);
|
|
|
|
And the code of flow_buffer_t provides the functionality of buffering up to
|
|
one line. That's why &oneline is only active when "flow" is used and the
|
|
type requires buffering.
|
|
|
|
In certain cases we would want to use &oneline even if the type does
not require buffering; binpac currently does not provide such functionality.
|
|
|
|
* How would incremental input work in the case of regex?
|
|
|
|
A regex should not take incremental input. (The binpac compiler will
|
|
complain when that happens.) It should always appear below some type
|
|
that has either &length=... or &oneline.
|
|
|
|
* What is the role of the Context<Name> class (generated by analyzer <Name>
|
|
withcontext)?
|
|
|
|
* What is the difference between using ``withcontext`` and not using ``withcontext``?
|
|
|
|
withcontext should always be there. It's fine to have an empty context.
|
|
|
|
* Elaborate on $context and how it is related to "withcontext".
|
|
|
|
A "context" parameter is passed to every type. It provides a vehicle to
|
|
pass something to every type without adding a parameter to every type.
|
|
In that sense, it's optional. It exists for convenience.
|
|
|
|
* Example usage of composite type array.
|
|
|
|
Please see HTTP_Headers in http-protocol.pac in the Zeek source code.
|
|
|
|
* Clarification on "connection" keyword (binpac paper).
|
|
|
|
* Need a new way to attach additional hook code to each class besides &let.
|
|
|
|
* &transient: how is this different from declaring an anonymous field?
  Currently it doesn't seem to do much.
|
|
|
|
.. code::
|
|
|
|
type HTTP_Header = record {
|
|
name: HTTP_HEADER_NAME &transient;
|
|
: HTTP_WS;
|
|
value: bytestring &restofdata &transient;
|
|
} &oneline;
|
|
|
|
.. code::
|
|
|
|
// Parse "name"
|
|
int t_name_string_length;
|
|
t_name_string_length =
|
|
HTTP_HEADER_NAME_re_011.MatchPrefix(
|
|
t_begin_of_data,
|
|
t_end_of_data - t_begin_of_data);
|
|
if ( t_name_string_length < 0 )
|
|
{
|
|
throw ExceptionStringMismatch( "./http-protocol.pac:96",
|
|
"|([^: \\t]+:)",
|
|
string((const char *) (t_begin_of_data), (const char *) t_end_of_data).c_str()
|
|
);
|
|
}
|
|
int t_name__size;
|
|
t_name__size = t_name_string_length;
|
|
name_.init(t_begin_of_data, t_name_string_length);
|
|
|
|
* Detail on the globals ($context, $element, $input...etc)
|
|
|
|
* How does BinPAC work with dynamic protocol detection?
|
|
|
|
Well, you can use the code in DNS-binpac.cc as a reference. First,
|
|
create a pointer to the connection. (See the example in DNS-binpac.cc)
|
|
|
|
.. code::
|
|
|
|
interp = new binpac::DNS::DNS_Conn(this);
|
|
|
|
Pass the data received from "DeliverPacket" or "DeliverStream" to
|
|
"interp->NewData()". (Again, see the example in DNS-binpac.cc)
|
|
|
|
.. code::
|
|
|
|
void DNS_UDP_Analyzer_binpac::DeliverPacket(int len, const u_char* data, bool orig, int seq, const IP_Hdr* ip, int caplen)
|
|
{
|
|
Analyzer::DeliverPacket(len, data, orig, seq, ip, caplen);
|
|
interp->NewData(orig, data, data + len);
|
|
}
|
|
|
|
* Explanation of &withinput
|
|
|
|
* Difference between using flow and not using flow (binpac generates Parse
|
|
method instead of ParseBuffer)
|
|
|
|
* &check currently working?
|
|
|
|
* Difference between flowunit and datagram, datagram and &oneline, &length?
|
|
|
|
* Go over TODO list in binpac release
|
|
|
|
* How would input get handled/buffered when the length is not known (chunked)?
|
|
|
|
* More features for multi-byte characters (UTF-16, UTF-32, etc.)?
|
|
|
|
TODO List
|
|
=========
|
|
|
|
New Features
|
|
------------
|
|
|
|
* Provide a method to match simple ASCII text.
|
|
|
|
* Allow the use of fixed-length arrays in addition to vectors.
|
|
|
|
Bugs
|
|
----
|
|
|
|
Small clean-ups
|
|
~~~~~~~~~~~~~~~
|
|
|
|
* Remove anonymous field bytestring assignment.
|
|
|
|
* Redundant overflow checking/more efficient fixed length text copying.
|
|
|
|
Warning/Errors
|
|
~~~~~~~~~~~~~~
|
|
|
|
Things that the compiler should flag at code generation time:
|
|
|
|
* Give a warning when &transient is used on a non-bytestring field.
|
|
|
|
* Give a warning when &oneline or &length is used but flowunit is not.
|
|
|
|
* Give a warning when more than one "connection" is defined.
|