Merge remote-tracking branch 'origin/topic/vern/ZAM-remainder'

* origin/topic/vern/ZAM-remainder: (37 commits)
  fix race condition in btest output ordering
  whoops, forgot to canonicalize filenames in new btest
  extend btest to include a coercion overflow
  fixed a typo in a comment
  fixes for vector coercion overflows, typing, and holes
  factoring out logic to check for overflows during coercions
  test case for vector coercions, including holes
  low-level cleanups found by code review
  additional conversions of size() to empty() checks that were missed previously
  indentation nit
  flag loop that has slightly subtle logic
  use ## to start major sections
  a number of low-level tweaks from code review
  use std::find_if rather than explicit loop
  switch simple loops that don't need indices to being iterator-based
  use container empty() rather than size() where appropriate
  Baseline variants for "-a zam"
  new "-a ZAM" testing baseline alternative
  updates for usage issues: support for -uu, maybe/definitely distinctions
  enable reducer to track folding to enable constant propagation
  ...
Tim Wojtulewicz 2021-09-08 11:44:15 -07:00
commit a251aa07f7
156 changed files with 31976 additions and 771 deletions

76
CHANGES

@ -1,3 +1,79 @@
4.2.0-dev.150 | 2021-09-08 11:44:15 -0700
* fix race condition in btest output ordering (Vern Paxson, Corelight)
* whoops, forgot to canonicalize filenames in new btest (Vern Paxson, Corelight)
* extend btest to include a coercion overflow (Vern Paxson, Corelight)
* fixed a typo in a comment (Vern Paxson, Corelight)
* fixes for vector coercion overflows, typing, and holes (Vern Paxson, Corelight)
* factoring out logic to check for overflows during coercions (Vern Paxson, Corelight)
* test case for vector coercions, including holes (Vern Paxson, Corelight)
* low-level cleanups found by code review (Vern Paxson, Corelight)
* additional conversions of size() to empty() checks that were missed previously (Vern Paxson, Corelight)
* indentation nit (Vern Paxson, Corelight)
* flag loop that has slightly subtle logic (Vern Paxson, Corelight)
* use ## to start major sections (Vern Paxson, Corelight)
* a number of low-level tweaks from code review (Vern Paxson, Corelight)
* use std::find_if rather than explicit loop (Vern Paxson, Corelight)
* switch simple loops that don't need indices to being iterator-based (Vern Paxson, Corelight)
* use container empty() rather than size() where appropriate (Vern Paxson, Corelight)
* Baseline variants for "-a zam" (Vern Paxson, Corelight)
* new "-a ZAM" testing baseline alternative (Vern Paxson, Corelight)
* updates for usage issues: support for -uu, maybe/definitely distinctions (Vern Paxson, Corelight)
* enable reducer to track folding to enable constant propagation (Vern Paxson, Corelight)
* switch to ID definition regions; reworked driver functions; more info for reporting uncompilable functions (Vern Paxson, Corelight)
* README for using ZAM (Vern Paxson, Corelight)
* the main ZAM code (Vern Paxson, Corelight)
* reworking of command-line options related to script optimization (Vern Paxson, Corelight)
* definitions of ZAM operations (Vern Paxson, Corelight)
* standalone templator for ZAM operations (Vern Paxson, Corelight)
* computing of identifier definition regions (Vern Paxson, Corelight)
* for parse-only, only do script analysis if looking for usage issues (Vern Paxson, Corelight)
* tracking of optimization information associated with identifiers (Vern Paxson, Corelight)
* tracking of optimization information associated with expressions (Vern Paxson, Corelight)
* tracking of optimization information associated with statements (Vern Paxson, Corelight)
* simple AST optimization for ?: operator (Vern Paxson, Corelight)
* track implicit assignments when profiling, associate counts with assignees (Vern Paxson, Corelight)
* preparing for a new Stmt subclass for ZAM function bodies (Vern Paxson, Corelight)
* provide ZAM execution with direct access to ZVal elements (Vern Paxson, Corelight)
* factoring to support debugging of Dict iterators - no semantic changes (Vern Paxson, Corelight)
* low-level tidying/nits - no semantic changes (Vern Paxson, Corelight)
4.2.0-dev.112 | 2021-09-03 18:12:12 +0000
* Add btests for DNS WKS and BINDS (Vlad Grigorescu)


@ -1 +1 @@
4.2.0-dev.112 4.2.0-dev.150


@ -200,7 +200,7 @@ install(FILES ${PRELOAD_SCRIPT} DESTINATION ${ZEEK_SCRIPT_INSTALL_PATH}/builtin-
install(FILES ${LOAD_SCRIPT} DESTINATION ${ZEEK_SCRIPT_INSTALL_PATH}/builtin-plugins/) install(FILES ${LOAD_SCRIPT} DESTINATION ${ZEEK_SCRIPT_INSTALL_PATH}/builtin-plugins/)
######################################################################## ########################################################################
## bro target ## zeek target
find_package (Threads) find_package (Threads)
@ -234,7 +234,7 @@ endmacro(COLLECT_HEADERS _var)
cmake_policy(POP) cmake_policy(POP)
# define a command that's used to run the make_dbg_constants.py script # define a command that's used to run the make_dbg_constants.py script
# building the bro binary depends on the outputs of this script # building the zeek binary depends on the outputs of this script
add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/DebugCmdConstants.h add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/DebugCmdConstants.h
${CMAKE_CURRENT_BINARY_DIR}/DebugCmdInfoConstants.cc ${CMAKE_CURRENT_BINARY_DIR}/DebugCmdInfoConstants.cc
COMMAND ${PYTHON_EXECUTABLE} COMMAND ${PYTHON_EXECUTABLE}
@ -250,6 +250,36 @@ set(_gen_zeek_script_cpp ${CMAKE_CURRENT_BINARY_DIR}/../CPP-gen.cc)
add_custom_command(OUTPUT ${_gen_zeek_script_cpp} add_custom_command(OUTPUT ${_gen_zeek_script_cpp}
COMMAND ${CMAKE_COMMAND} -E touch ${_gen_zeek_script_cpp}) COMMAND ${CMAKE_COMMAND} -E touch ${_gen_zeek_script_cpp})
# define a command that's used to run the ZAM instruction generator;
# building the zeek binary depends on the outputs of this script
add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/ZAM-AssignFlavorsDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-Conds.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-DirectDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-EvalDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-EvalMacros.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-GenExprsDefsC1.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-GenExprsDefsC2.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-GenExprsDefsC3.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-GenExprsDefsV.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-GenFieldsDefsC1.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-GenFieldsDefsC2.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-GenFieldsDefsV.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-MethodDecls.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-MethodDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-Op1FlavorsDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-OpSideEffects.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-OpsDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-OpsNamesDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-Vec1EvalDefs.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-Vec2EvalDefs.h
COMMAND ${CMAKE_CURRENT_BINARY_DIR}/Gen-ZAM
ARGS ${CMAKE_CURRENT_SOURCE_DIR}/script_opt/ZAM/Ops.in
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/Gen-ZAM
${CMAKE_CURRENT_SOURCE_DIR}/script_opt/ZAM/Ops.in
COMMENT "[sh] Generating ZAM operations"
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
)
set_source_files_properties(nb_dns.c PROPERTIES COMPILE_FLAGS set_source_files_properties(nb_dns.c PROPERTIES COMPILE_FLAGS
-fno-strict-aliasing) -fno-strict-aliasing)
@ -303,6 +333,7 @@ set(MAIN_SRCS
Obj.cc Obj.cc
OpaqueVal.cc OpaqueVal.cc
Options.cc Options.cc
Overflow.cc
PacketFilter.cc PacketFilter.cc
Pipe.cc Pipe.cc
PolicyFile.cc PolicyFile.cc
@ -385,6 +416,8 @@ set(MAIN_SRCS
script_opt/DefSetsMgr.cc script_opt/DefSetsMgr.cc
script_opt/Expr.cc script_opt/Expr.cc
script_opt/GenRDs.cc script_opt/GenRDs.cc
script_opt/GenIDDefs.cc
script_opt/IDOptInfo.cc
script_opt/Inline.cc script_opt/Inline.cc
script_opt/ProfileFunc.cc script_opt/ProfileFunc.cc
script_opt/ReachingDefs.cc script_opt/ReachingDefs.cc
@ -394,6 +427,20 @@ set(MAIN_SRCS
script_opt/TempVar.cc script_opt/TempVar.cc
script_opt/UseDefs.cc script_opt/UseDefs.cc
script_opt/ZAM/AM-Opt.cc
script_opt/ZAM/Branches.cc
script_opt/ZAM/BuiltIn.cc
script_opt/ZAM/Driver.cc
script_opt/ZAM/Expr.cc
script_opt/ZAM/Inst-Gen.cc
script_opt/ZAM/Low-Level.cc
script_opt/ZAM/Stmt.cc
script_opt/ZAM/Support.cc
script_opt/ZAM/Vars.cc
script_opt/ZAM/ZBody.cc
script_opt/ZAM/ZInst.cc
script_opt/ZAM/ZOp.cc
nb_dns.c nb_dns.c
digest.h digest.h
) )
@ -402,6 +449,10 @@ set(THIRD_PARTY_SRCS
3rdparty/sqlite3.c 3rdparty/sqlite3.c
) )
set(GEN_ZAM_SRCS
script_opt/ZAM/Gen-ZAM.cc
)
# Highwayhash. Highwayhash is a bit special since it has architecture dependent code... # Highwayhash. Highwayhash is a bit special since it has architecture dependent code...
set(HH_SRCS set(HH_SRCS
@ -468,12 +519,14 @@ set(zeek_SRCS
${FLEX_Scanner_INPUT} ${FLEX_Scanner_INPUT}
${BISON_Parser_INPUT} ${BISON_Parser_INPUT}
${CMAKE_CURRENT_BINARY_DIR}/DebugCmdConstants.h ${CMAKE_CURRENT_BINARY_DIR}/DebugCmdConstants.h
${CMAKE_CURRENT_BINARY_DIR}/ZAM-MethodDecls.h
${THIRD_PARTY_SRCS} ${THIRD_PARTY_SRCS}
${HH_SRCS} ${HH_SRCS}
${MAIN_SRCS} ${MAIN_SRCS}
) )
collect_headers(zeek_HEADERS ${zeek_SRCS}) collect_headers(zeek_HEADERS ${zeek_SRCS})
collect_headers(GEN_ZAM_HEADERS ${GEN_ZAM_SRCS})
add_library(zeek_objs OBJECT ${zeek_SRCS}) add_library(zeek_objs OBJECT ${zeek_SRCS})
@ -489,6 +542,8 @@ set_target_properties(zeek PROPERTIES ENABLE_EXPORTS TRUE)
install(TARGETS zeek DESTINATION bin) install(TARGETS zeek DESTINATION bin)
add_executable(Gen-ZAM ${GEN_ZAM_SRCS} ${GEN_ZAM_HEADERS})
# Install wrapper script for Bro-to-Zeek renaming. # Install wrapper script for Bro-to-Zeek renaming.
include(InstallSymlink) include(InstallSymlink)
InstallSymlink("${CMAKE_INSTALL_PREFIX}/bin/zeek-wrapper" "${CMAKE_INSTALL_PREFIX}/bin/bro") InstallSymlink("${CMAKE_INSTALL_PREFIX}/bin/zeek-wrapper" "${CMAKE_INSTALL_PREFIX}/bin/bro")


@ -1527,7 +1527,7 @@ void Dictionary::MakeRobustCookie(IterCookie* cookie)
IterCookie* Dictionary::InitForIterationNonConst() //const IterCookie* Dictionary::InitForIterationNonConst() //const
{ {
num_iterators++; IncrIters();
return new IterCookie(const_cast<Dictionary*>(this)); return new IterCookie(const_cast<Dictionary*>(this));
} }
@ -1535,7 +1535,7 @@ void Dictionary::StopIterationNonConst(IterCookie* cookie) //const
{ {
ASSERT(num_iterators > 0); ASSERT(num_iterators > 0);
if ( num_iterators > 0 ) if ( num_iterators > 0 )
num_iterators--; DecrIters();
delete cookie; delete cookie;
} }
@ -1549,7 +1549,7 @@ void* Dictionary::NextEntryNonConst(detail::HashKey*& h, IterCookie*& c, bool re
if ( ! table ) if ( ! table )
{ {
if ( num_iterators > 0 ) if ( num_iterators > 0 )
num_iterators--; DecrIters();
delete c; delete c;
c = nullptr; c = nullptr;
return nullptr; //end of iteration. return nullptr; //end of iteration.
@ -1589,7 +1589,7 @@ void* Dictionary::NextEntryNonConst(detail::HashKey*& h, IterCookie*& c, bool re
if ( c->next >= capacity ) if ( c->next >= capacity )
{//end. {//end.
if ( num_iterators > 0 ) if ( num_iterators > 0 )
num_iterators--; DecrIters();
delete c; delete c;
c = nullptr; c = nullptr;
return nullptr; //end of iteration. return nullptr; //end of iteration.
@ -1641,7 +1641,7 @@ DictIterator::DictIterator(const Dictionary* d, detail::DictEntry* begin, detail
// violate the constness guarantees of const-begin()/end() and cbegin()/cend(), but we're not modifying the // violate the constness guarantees of const-begin()/end() and cbegin()/cend(), but we're not modifying the
// actual data in the collection, just a counter in the wrapper of the collection. // actual data in the collection, just a counter in the wrapper of the collection.
dict = const_cast<Dictionary*>(d); dict = const_cast<Dictionary*>(d);
dict->num_iterators++; dict->IncrIters();
} }
DictIterator::~DictIterator() DictIterator::~DictIterator()
@ -1649,7 +1649,7 @@ DictIterator::~DictIterator()
if ( dict ) if ( dict )
{ {
assert(dict->num_iterators > 0); assert(dict->num_iterators > 0);
dict->num_iterators--; dict->DecrIters();
} }
} }
@ -1673,13 +1673,13 @@ DictIterator::DictIterator(const DictIterator& that)
if ( dict ) if ( dict )
{ {
assert(dict->num_iterators > 0); assert(dict->num_iterators > 0);
dict->num_iterators--; dict->DecrIters();
} }
dict = that.dict; dict = that.dict;
curr = that.curr; curr = that.curr;
end = that.end; end = that.end;
dict->num_iterators++; dict->IncrIters();
} }
DictIterator& DictIterator::operator=(const DictIterator& that) DictIterator& DictIterator::operator=(const DictIterator& that)
@ -1690,13 +1690,13 @@ DictIterator& DictIterator::operator=(const DictIterator& that)
if ( dict ) if ( dict )
{ {
assert(dict->num_iterators > 0); assert(dict->num_iterators > 0);
dict->num_iterators--; dict->DecrIters();
} }
dict = that.dict; dict = that.dict;
curr = that.curr; curr = that.curr;
end = that.end; end = that.end;
dict->num_iterators++; dict->IncrIters();
return *this; return *this;
} }
@ -1709,7 +1709,7 @@ DictIterator::DictIterator(DictIterator&& that)
if ( dict ) if ( dict )
{ {
assert(dict->num_iterators > 0); assert(dict->num_iterators > 0);
dict->num_iterators--; dict->DecrIters();
} }
dict = that.dict; dict = that.dict;
@ -1727,7 +1727,7 @@ DictIterator& DictIterator::operator=(DictIterator&& that)
if ( dict ) if ( dict )
{ {
assert(dict->num_iterators > 0); assert(dict->num_iterators > 0);
dict->num_iterators--; dict->DecrIters();
} }
dict = that.dict; dict = that.dict;
@ -1809,7 +1809,7 @@ RobustDictIterator::RobustDictIterator(Dictionary* d) : curr(nullptr), dict(d)
inserted = new std::vector<detail::DictEntry>(); inserted = new std::vector<detail::DictEntry>();
visited = new std::vector<detail::DictEntry>(); visited = new std::vector<detail::DictEntry>();
dict->num_iterators++; dict->IncrIters();
dict->iterators->push_back(this); dict->iterators->push_back(this);
// Advance the iterator one step so that we're at the first element. // Advance the iterator one step so that we're at the first element.
@ -1833,7 +1833,7 @@ RobustDictIterator::RobustDictIterator(const RobustDictIterator& other) : curr(n
std::copy(other.visited->begin(), other.visited->end(), std::back_inserter(*visited)); std::copy(other.visited->begin(), other.visited->end(), std::back_inserter(*visited));
dict = other.dict; dict = other.dict;
dict->num_iterators++; dict->IncrIters();
dict->iterators->push_back(this); dict->iterators->push_back(this);
curr = other.curr; curr = other.curr;
@ -1870,7 +1870,7 @@ void RobustDictIterator::Complete()
if ( dict ) if ( dict )
{ {
assert(dict->num_iterators > 0); assert(dict->num_iterators > 0);
dict->num_iterators--; dict->DecrIters();
dict->iterators->erase(std::remove(dict->iterators->begin(), dict->iterators->end(), this), dict->iterators->erase(std::remove(dict->iterators->begin(), dict->iterators->end(), this),
dict->iterators->end()); dict->iterators->end());


@ -477,6 +477,9 @@ private:
RobustDictIterator MakeRobustIterator(); RobustDictIterator MakeRobustIterator();
detail::DictEntry GetNextRobustIteration(RobustDictIterator* iter); detail::DictEntry GetNextRobustIteration(RobustDictIterator* iter);
void IncrIters() { ++num_iterators; }
void DecrIters() { --num_iterators; }
//alligned on 8-bytes with 4-leading bytes. 7*8=56 bytes a dictionary. //alligned on 8-bytes with 4-leading bytes. 7*8=56 bytes a dictionary.
// when sizeup but the current mapping is in progress. the current mapping will be ignored // when sizeup but the current mapping is in progress. the current mapping will be ignored
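The IncrIters()/DecrIters() wrappers route every counter update through one place, which is what makes iterator bookkeeping easier to debug. A minimal sketch of the idea follows; `Container` and `IterGuard` are hypothetical stand-ins, not Zeek classes, and the friend declaration is an assumption about how private wrappers would be reached.

```cpp
#include <cassert>

namespace sketch {

class Container {
public:
    int NumIters() const { return num_iterators; }

private:
    friend class IterGuard;

    // Single choke points for counter updates: a breakpoint or a log
    // statement placed here observes every change to the count.
    void IncrIters() { ++num_iterators; }
    void DecrIters() { assert(num_iterators > 0); --num_iterators; }

    int num_iterators = 0;
};

// Hypothetical RAII helper: registers on construction, deregisters on
// destruction, so the count cannot be left unbalanced.
class IterGuard {
public:
    explicit IterGuard(Container* c) : cont(c) { cont->IncrIters(); }
    ~IterGuard() { cont->DecrIters(); }

    IterGuard(const IterGuard&) = delete;
    IterGuard& operator=(const IterGuard&) = delete;

private:
    Container* cont;
};

// Returns the number of live guards while two are in scope.
inline int live_iters_inside_scope(Container& c)
{
    IterGuard g1(&c);
    IterGuard g2(&c);
    return c.NumIters();
}

} // namespace sketch
```

Once both guards go out of scope the count returns to zero, which is the invariant the assert() calls in the diff are checking.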


@ -19,6 +19,7 @@
#include "zeek/module_util.h" #include "zeek/module_util.h"
#include "zeek/DebugLogger.h" #include "zeek/DebugLogger.h"
#include "zeek/Hash.h" #include "zeek/Hash.h"
#include "zeek/script_opt/ExprOptInfo.h"
#include "zeek/broker/Data.h" #include "zeek/broker/Data.h"
@ -79,107 +80,113 @@ const char* expr_name(BroExprTag t)
Expr::Expr(BroExprTag arg_tag) : tag(arg_tag), paren(false), type(nullptr) Expr::Expr(BroExprTag arg_tag) : tag(arg_tag), paren(false), type(nullptr)
{ {
SetLocationInfo(&start_location, &end_location); SetLocationInfo(&start_location, &end_location);
opt_info = new ExprOptInfo();
}
Expr::~Expr()
{
delete opt_info;
} }
const ListExpr* Expr::AsListExpr() const const ListExpr* Expr::AsListExpr() const
{ {
CHECK_TAG(tag, EXPR_LIST, "ExprVal::AsListExpr", expr_name) CHECK_TAG(tag, EXPR_LIST, "Expr::AsListExpr", expr_name)
return (const ListExpr*) this; return (const ListExpr*) this;
} }
ListExpr* Expr::AsListExpr() ListExpr* Expr::AsListExpr()
{ {
CHECK_TAG(tag, EXPR_LIST, "ExprVal::AsListExpr", expr_name) CHECK_TAG(tag, EXPR_LIST, "Expr::AsListExpr", expr_name)
return (ListExpr*) this; return (ListExpr*) this;
} }
ListExprPtr Expr::AsListExprPtr() ListExprPtr Expr::AsListExprPtr()
{ {
CHECK_TAG(tag, EXPR_LIST, "ExprVal::AsListExpr", expr_name) CHECK_TAG(tag, EXPR_LIST, "Expr::AsListExpr", expr_name)
return {NewRef{}, (ListExpr*) this}; return {NewRef{}, (ListExpr*) this};
} }
const NameExpr* Expr::AsNameExpr() const const NameExpr* Expr::AsNameExpr() const
{ {
CHECK_TAG(tag, EXPR_NAME, "ExprVal::AsNameExpr", expr_name) CHECK_TAG(tag, EXPR_NAME, "Expr::AsNameExpr", expr_name)
return (const NameExpr*) this; return (const NameExpr*) this;
} }
NameExpr* Expr::AsNameExpr() NameExpr* Expr::AsNameExpr()
{ {
CHECK_TAG(tag, EXPR_NAME, "ExprVal::AsNameExpr", expr_name) CHECK_TAG(tag, EXPR_NAME, "Expr::AsNameExpr", expr_name)
return (NameExpr*) this; return (NameExpr*) this;
} }
NameExprPtr Expr::AsNameExprPtr() NameExprPtr Expr::AsNameExprPtr()
{ {
CHECK_TAG(tag, EXPR_NAME, "ExprVal::AsNameExpr", expr_name) CHECK_TAG(tag, EXPR_NAME, "Expr::AsNameExpr", expr_name)
return {NewRef{}, (NameExpr*) this}; return {NewRef{}, (NameExpr*) this};
} }
const ConstExpr* Expr::AsConstExpr() const const ConstExpr* Expr::AsConstExpr() const
{ {
CHECK_TAG(tag, EXPR_CONST, "ExprVal::AsConstExpr", expr_name) CHECK_TAG(tag, EXPR_CONST, "Expr::AsConstExpr", expr_name)
return (const ConstExpr*) this; return (const ConstExpr*) this;
} }
ConstExprPtr Expr::AsConstExprPtr() ConstExprPtr Expr::AsConstExprPtr()
{ {
CHECK_TAG(tag, EXPR_CONST, "ExprVal::AsConstExpr", expr_name) CHECK_TAG(tag, EXPR_CONST, "Expr::AsConstExpr", expr_name)
return {NewRef{}, (ConstExpr*) this}; return {NewRef{}, (ConstExpr*) this};
} }
const CallExpr* Expr::AsCallExpr() const const CallExpr* Expr::AsCallExpr() const
{ {
CHECK_TAG(tag, EXPR_CALL, "ExprVal::AsCallExpr", expr_name) CHECK_TAG(tag, EXPR_CALL, "Expr::AsCallExpr", expr_name)
return (const CallExpr*) this; return (const CallExpr*) this;
} }
const AssignExpr* Expr::AsAssignExpr() const const AssignExpr* Expr::AsAssignExpr() const
{ {
CHECK_TAG(tag, EXPR_ASSIGN, "ExprVal::AsAssignExpr", expr_name) CHECK_TAG(tag, EXPR_ASSIGN, "Expr::AsAssignExpr", expr_name)
return (const AssignExpr*) this; return (const AssignExpr*) this;
} }
AssignExpr* Expr::AsAssignExpr() AssignExpr* Expr::AsAssignExpr()
{ {
CHECK_TAG(tag, EXPR_ASSIGN, "ExprVal::AsAssignExpr", expr_name) CHECK_TAG(tag, EXPR_ASSIGN, "Expr::AsAssignExpr", expr_name)
return (AssignExpr*) this; return (AssignExpr*) this;
} }
const IndexExpr* Expr::AsIndexExpr() const const IndexExpr* Expr::AsIndexExpr() const
{ {
CHECK_TAG(tag, EXPR_INDEX, "ExprVal::AsIndexExpr", expr_name) CHECK_TAG(tag, EXPR_INDEX, "Expr::AsIndexExpr", expr_name)
return (const IndexExpr*) this; return (const IndexExpr*) this;
} }
IndexExpr* Expr::AsIndexExpr() IndexExpr* Expr::AsIndexExpr()
{ {
CHECK_TAG(tag, EXPR_INDEX, "ExprVal::AsIndexExpr", expr_name) CHECK_TAG(tag, EXPR_INDEX, "Expr::AsIndexExpr", expr_name)
return (IndexExpr*) this; return (IndexExpr*) this;
} }
const EventExpr* Expr::AsEventExpr() const const EventExpr* Expr::AsEventExpr() const
{ {
CHECK_TAG(tag, EXPR_EVENT, "ExprVal::AsEventExpr", expr_name) CHECK_TAG(tag, EXPR_EVENT, "Expr::AsEventExpr", expr_name)
return (const EventExpr*) this; return (const EventExpr*) this;
} }
EventExprPtr Expr::AsEventExprPtr() EventExprPtr Expr::AsEventExprPtr()
{ {
CHECK_TAG(tag, EXPR_EVENT, "ExprVal::AsEventExpr", expr_name) CHECK_TAG(tag, EXPR_EVENT, "Expr::AsEventExpr", expr_name)
return {NewRef{}, (EventExpr*) this}; return {NewRef{}, (EventExpr*) this};
} }
const RefExpr* Expr::AsRefExpr() const const RefExpr* Expr::AsRefExpr() const
{ {
CHECK_TAG(tag, EXPR_REF, "ExprVal::AsRefExpr", expr_name) CHECK_TAG(tag, EXPR_REF, "Expr::AsRefExpr", expr_name)
return (const RefExpr*) this; return (const RefExpr*) this;
} }
RefExprPtr Expr::AsRefExprPtr() RefExprPtr Expr::AsRefExprPtr()
{ {
CHECK_TAG(tag, EXPR_REF, "ExprVal::AsRefExpr", expr_name) CHECK_TAG(tag, EXPR_REF, "Expr::AsRefExpr", expr_name)
return {NewRef{}, (RefExpr*) this}; return {NewRef{}, (RefExpr*) this};
} }


@ -117,6 +117,8 @@ using RefExprPtr = IntrusivePtr<RefExpr>;
class Stmt; class Stmt;
using StmtPtr = IntrusivePtr<Stmt>; using StmtPtr = IntrusivePtr<Stmt>;
class ExprOptInfo;
class Expr : public Obj { class Expr : public Obj {
public: public:
const TypePtr& GetType() const const TypePtr& GetType() const
@ -389,6 +391,12 @@ public:
return Obj::GetLocationInfo(); return Obj::GetLocationInfo();
} }
// Access script optimization information associated with
// this statement.
ExprOptInfo* GetOptInfo() const { return opt_info; }
~Expr() override;
protected: protected:
Expr() = default; Expr() = default;
explicit Expr(BroExprTag arg_tag); explicit Expr(BroExprTag arg_tag);
@ -418,6 +426,10 @@ protected:
// derived, if any. Used as an aid for generating meaningful // derived, if any. Used as an aid for generating meaningful
// and correctly-localized error messages. // and correctly-localized error messages.
ExprPtr original = nullptr; ExprPtr original = nullptr;
// Information associated with the Expr for purposes of
// script optimization.
ExprOptInfo* opt_info;
}; };
class NameExpr final : public Expr { class NameExpr final : public Expr {


@ -20,6 +20,7 @@
#include "zeek/zeekygen/ScriptInfo.h" #include "zeek/zeekygen/ScriptInfo.h"
#include "zeek/zeekygen/utils.h" #include "zeek/zeekygen/utils.h"
#include "zeek/module_util.h" #include "zeek/module_util.h"
#include "zeek/script_opt/IDOptInfo.h"
namespace zeek { namespace zeek {
@ -119,6 +120,8 @@ ID::ID(const char* arg_name, IDScope arg_scope, bool arg_is_export)
is_type = false; is_type = false;
offset = 0; offset = 0;
opt_info = new IDOptInfo(this);
infer_return_type = false; infer_return_type = false;
SetLocationInfo(&start_location, &end_location); SetLocationInfo(&start_location, &end_location);
@ -127,6 +130,7 @@ ID::ID(const char* arg_name, IDScope arg_scope, bool arg_is_export)
ID::~ID() ID::~ID()
{ {
delete [] name; delete [] name;
delete opt_info;
} }
std::string ID::ModuleName() const std::string ID::ModuleName() const
@ -285,11 +289,6 @@ const AttrPtr& ID::GetAttr(AttrTag t) const
return attrs ? attrs->Find(t) : Attr::nil; return attrs ? attrs->Find(t) : Attr::nil;
} }
void ID::AddInitExpr(ExprPtr init_expr)
{
init_exprs.emplace_back(std::move(init_expr));
}
bool ID::IsDeprecated() const bool ID::IsDeprecated() const
{ {
return GetAttr(ATTR_DEPRECATED) != nullptr; return GetAttr(ATTR_DEPRECATED) != nullptr;
@ -676,6 +675,12 @@ std::vector<Func*> ID::GetOptionHandlers() const
return v; return v;
} }
void IDOptInfo::AddInitExpr(ExprPtr init_expr)
{
init_exprs.emplace_back(std::move(init_expr));
}
} // namespace detail } // namespace detail
} // namespace zeek } // namespace zeek


@ -7,7 +7,6 @@
#include <string_view> #include <string_view>
#include <vector> #include <vector>
#include "zeek/IntrusivePtr.h"
#include "zeek/Obj.h" #include "zeek/Obj.h"
#include "zeek/Attr.h" #include "zeek/Attr.h"
#include "zeek/Notifier.h" #include "zeek/Notifier.h"
@ -44,6 +43,8 @@ enum IDScope { SCOPE_FUNCTION, SCOPE_MODULE, SCOPE_GLOBAL };
class ID; class ID;
using IDPtr = IntrusivePtr<ID>; using IDPtr = IntrusivePtr<ID>;
class IDOptInfo;
class ID final : public Obj, public notifier::detail::Modifiable { class ID final : public Obj, public notifier::detail::Modifiable {
public: public:
static inline const IDPtr nil; static inline const IDPtr nil;
@ -112,10 +113,6 @@ public:
const AttrPtr& GetAttr(AttrTag t) const; const AttrPtr& GetAttr(AttrTag t) const;
void AddInitExpr(ExprPtr init_expr);
const std::vector<ExprPtr>& GetInitExprs() const
{ return init_exprs; }
bool IsDeprecated() const; bool IsDeprecated() const;
void MakeDeprecated(ExprPtr deprecation); void MakeDeprecated(ExprPtr deprecation);
@ -144,6 +141,8 @@ public:
void AddOptionHandler(FuncPtr callback, int priority); void AddOptionHandler(FuncPtr callback, int priority);
std::vector<Func*> GetOptionHandlers() const; std::vector<Func*> GetOptionHandlers() const;
IDOptInfo* GetOptInfo() const { return opt_info; }
protected: protected:
void EvalFunc(ExprPtr ef, ExprPtr ev); void EvalFunc(ExprPtr ef, ExprPtr ev);
@ -161,15 +160,15 @@ protected:
ValPtr val; ValPtr val;
AttributesPtr attrs; AttributesPtr attrs;
// Expressions used to initialize the identifier, for use by
// the scripts-to-C++ compiler. We need to track all of them
// because it's possible that a global value gets created using
// one of the earlier instances rather than the last one.
std::vector<ExprPtr> init_exprs;
// contains list of functions that are called when an option changes // contains list of functions that are called when an option changes
std::multimap<int, FuncPtr> option_handlers; std::multimap<int, FuncPtr> option_handlers;
// Information managed by script optimization. We package this
// up into a separate object for purposes of modularity, and,
// via the associated pointer, to allow it to be modified in
// contexts where the ID is itself "const".
IDOptInfo* opt_info;
}; };
} // namespace zeek::detail } // namespace zeek::detail
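The comment on opt_info gives two reasons for the separate object: modularity, and the ability to update it through a const ID. The second point relies on const applying to the pointer member, not to what it points at. A minimal sketch under that reading; `Node` and `OptInfo` are hypothetical names, not the Zeek classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical side object holding optimizer bookkeeping.
class OptInfo {
public:
    void AddNote(std::string note) { notes.push_back(std::move(note)); }
    std::size_t NumNotes() const { return notes.size(); }

private:
    std::vector<std::string> notes;
};

class Node {
public:
    Node() : opt_info(new OptInfo()) { }
    ~Node() { delete opt_info; }

    // Copying would lead to a double delete of opt_info, so this sketch
    // disables it.
    Node(const Node&) = delete;
    Node& operator=(const Node&) = delete;

    // A const member function may return a non-const pointer: inside it
    // the member has type "OptInfo* const", so the pointee stays mutable.
    OptInfo* GetOptInfo() const { return opt_info; }

private:
    OptInfo* opt_info;
};

// Takes the node by const reference, yet still records a note.
inline std::size_t annotate(const Node& n)
{
    n.GetOptInfo()->AddNote("seen by optimizer");
    return n.GetOptInfo()->NumNotes();
}

} // namespace sketch
```

The trade-off is manual ownership: the owning class has to delete the object in its destructor, as the ID.cc and Expr.cc hunks do.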


@ -142,59 +142,84 @@ void usage(const char* prog, int code)
exit(code); exit(code);
} }
static void print_analysis_help()
{
fprintf(stderr, "--optimize options when using ZAM:\n");
fprintf(stderr, " ZAM execute scripts using ZAM and all optimizations\n");
fprintf(stderr, " help print this list\n");
fprintf(stderr, " report-uncompilable print names of functions that can't be compiled\n");
fprintf(stderr, "\n primarily for developers:\n");
fprintf(stderr, " dump-uds dump use-defs to stdout; implies xform\n");
fprintf(stderr, " dump-xform dump transformed scripts to stdout; implies xform\n");
fprintf(stderr, " dump-ZAM dump generated ZAM code; implies gen-ZAM-code\n");
fprintf(stderr, " gen-ZAM-code generate ZAM code (without turning on additional optimizations)\n");
fprintf(stderr, " inline inline function calls\n");
fprintf(stderr, " no-ZAM-opt omit low-level ZAM optimization\n");
fprintf(stderr, " optimize-all optimize all scripts, even inlined ones\n");
fprintf(stderr, " optimize-AST optimize the (transformed) AST; implies xform\n");
fprintf(stderr, " profile-ZAM generate to stdout a ZAM execution profile\n");
fprintf(stderr, " report-recursive report on recursive functions and exit\n");
fprintf(stderr, " xform transform scripts to \"reduced\" form\n");
fprintf(stderr, "\n--optimize options when generating C++:\n");
fprintf(stderr, " gen-C++ generate C++ script bodies\n");
fprintf(stderr, " gen-standalone-C++ generate \"standalone\" C++ script bodies\n");
fprintf(stderr, " help print this list\n");
fprintf(stderr, " report-C++ report available C++ script bodies and exit\n");
fprintf(stderr, " report-uncompilable print names of functions that can't be compiled\n");
fprintf(stderr, " use-C++ use available C++ script bodies\n");
fprintf(stderr, "\n experimental options for incremental compilation:\n");
fprintf(stderr, " add-C++ generate private C++ for any missing script bodies\n");
fprintf(stderr, " update-C++ generate reusable C++ for any missing script bodies\n");
}
static void set_analysis_option(const char* opt, Options& opts) static void set_analysis_option(const char* opt, Options& opts)
{ {
if ( ! opt || util::streq(opt, "all") ) auto& a_o = opts.analysis_options;
if ( ! opt || util::streq(opt, "ZAM") )
{ {
opts.analysis_options.inliner = true; a_o.inliner = a_o.optimize_AST = a_o.activate = true;
opts.analysis_options.activate = true; a_o.gen_ZAM = true;
opts.analysis_options.optimize_AST = true;
return; return;
} }
if ( util::streq(opt, "help") ) if ( util::streq(opt, "help") )
{ {
fprintf(stderr, "--optimize options:\n"); print_analysis_help();
fprintf(stderr, " all equivalent to \"inline\" and \"activate\"\n");
fprintf(stderr, " add-C++ generate private C++ for any missing script bodies\n");
fprintf(stderr, " compile-all *if* compiling, compile all scripts, even inlined ones\n");
fprintf(stderr, " dump-uds dump use-defs to stdout; implies xform\n");
fprintf(stderr, " dump-xform dump transformed scripts to stdout; implies xform\n");
fprintf(stderr, " gen-C++ generate C++ script bodies\n");
fprintf(stderr, " gen-standalone-C++ generate \"standalone\" C++ script bodies\n");
fprintf(stderr, " help print this list\n");
fprintf(stderr, " inline inline function calls\n");
fprintf(stderr, " optimize-AST optimize the (transformed) AST; implies xform\n");
fprintf(stderr, " recursive report on recursive functions and exit\n");
fprintf(stderr, " report-C++ report available C++ script bodies and exit\n");
fprintf(stderr, " update-C++ generate reusable C++ for any missing script bodies\n");
fprintf(stderr, " use-C++ use available C++ script bodies\n");
fprintf(stderr, " xform tranform scripts to \"reduced\" form\n");
exit(0); exit(0);
} }
auto& a_o = opts.analysis_options;
if ( util::streq(opt, "add-C++") ) if ( util::streq(opt, "add-C++") )
a_o.add_CPP = true; a_o.add_CPP = true;
else if ( util::streq(opt, "compile-all") )
a_o.activate = a_o.compile_all = true;
else if ( util::streq(opt, "dump-uds") ) else if ( util::streq(opt, "dump-uds") )
a_o.activate = a_o.dump_uds = true; a_o.activate = a_o.dump_uds = true;
else if ( util::streq(opt, "dump-xform") ) else if ( util::streq(opt, "dump-xform") )
a_o.activate = a_o.dump_xform = true; a_o.activate = a_o.dump_xform = true;
else if ( util::streq(opt, "dump-ZAM") )
a_o.activate = a_o.dump_ZAM = true;
else if ( util::streq(opt, "gen-C++") ) else if ( util::streq(opt, "gen-C++") )
a_o.gen_CPP = true; a_o.gen_CPP = true;
else if ( util::streq(opt, "gen-standalone-C++") ) else if ( util::streq(opt, "gen-standalone-C++") )
a_o.gen_standalone_CPP = true; a_o.gen_standalone_CPP = true;
else if ( util::streq(opt, "gen-ZAM-code") )
a_o.activate = a_o.gen_ZAM_code = true;
else if ( util::streq(opt, "inline") ) else if ( util::streq(opt, "inline") )
a_o.inliner = true; a_o.inliner = true;
else if ( util::streq(opt, "no-ZAM-opt") )
a_o.activate = a_o.no_ZAM_opt = true;
else if ( util::streq(opt, "optimize-all") )
a_o.activate = a_o.compile_all = true;
else if ( util::streq(opt, "optimize-AST") ) else if ( util::streq(opt, "optimize-AST") )
a_o.activate = a_o.optimize_AST = true; a_o.activate = a_o.optimize_AST = true;
else if ( util::streq(opt, "recursive") ) else if ( util::streq(opt, "profile-ZAM") )
a_o.inliner = a_o.report_recursive = true; a_o.activate = a_o.profile_ZAM = true;
else if ( util::streq(opt, "report-C++") ) else if ( util::streq(opt, "report-C++") )
a_o.report_CPP = true; a_o.report_CPP = true;
else if ( util::streq(opt, "report-recursive") )
a_o.inliner = a_o.report_recursive = true;
else if ( util::streq(opt, "report-uncompilable") )
a_o.report_uncompilable = true;
else if ( util::streq(opt, "update-C++") ) else if ( util::streq(opt, "update-C++") )
a_o.update_CPP = true; a_o.update_CPP = true;
else if ( util::streq(opt, "use-C++") ) else if ( util::streq(opt, "use-C++") )
@ -204,7 +229,9 @@ static void set_analysis_option(const char* opt, Options& opts)
else else
{ {
fprintf(stderr,"zeek: unrecognized --optimize option: %s\n", opt); fprintf(stderr,"zeek: unrecognized -O/--optimize option: %s\n\n",
opt);
print_analysis_help();
exit(1); exit(1);
} }
} }
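To make the new option handling easier to follow, here is a reduced sketch of the dispatch on the new side of this hunk. It covers only a handful of options, uses a stand-in `AnalysisOptions` struct with field names taken from the diff, substitutes std::strcmp for util::streq, and returns false on an unknown option instead of printing help and exiting. All of that is simplification for illustration, not the Zeek implementation.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {

// Stand-in for the analysis options structure; only a few fields shown.
struct AnalysisOptions {
    bool activate = false;
    bool inliner = false;
    bool optimize_AST = false;
    bool gen_ZAM = false;
    bool dump_ZAM = false;
};

// Returns true if the option was recognized.
inline bool set_option(const char* opt, AnalysisOptions& a_o)
{
    // A missing argument is treated the same as "ZAM".
    if ( ! opt || std::strcmp(opt, "ZAM") == 0 )
    {
        a_o.inliner = a_o.optimize_AST = a_o.activate = true;
        a_o.gen_ZAM = true;
        return true;
    }

    if ( std::strcmp(opt, "inline") == 0 )
        a_o.inliner = true;
    else if ( std::strcmp(opt, "optimize-AST") == 0 )
        a_o.activate = a_o.optimize_AST = true;
    else if ( std::strcmp(opt, "dump-ZAM") == 0 )
        a_o.activate = a_o.dump_ZAM = true;
    else
        return false;

    return true;
}

} // namespace sketch
```

Note that "inline" sets only the inliner flag, while the developer-oriented options also set activate, matching the pattern in the diff.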

39
src/Overflow.cc Normal file

@ -0,0 +1,39 @@
// See the file "COPYING" in the main distribution directory for copyright.
#include "zeek/Overflow.h"
#include "zeek/Val.h"
namespace zeek::detail {
bool would_overflow(const zeek::Type* from_type, const zeek::Type* to_type,
const Val* val)
{
if ( ! to_type || ! from_type )
return true;
if ( same_type(to_type, from_type) )
return false;
if ( to_type->InternalType() == TYPE_INTERNAL_DOUBLE )
return false;
if ( to_type->InternalType() == TYPE_INTERNAL_UNSIGNED )
{
if ( from_type->InternalType() == TYPE_INTERNAL_DOUBLE )
return double_to_count_would_overflow(val->InternalDouble());
if ( from_type->InternalType() == TYPE_INTERNAL_INT )
return int_to_count_would_overflow(val->InternalInt());
}
if ( to_type->InternalType() == TYPE_INTERNAL_INT )
{
if ( from_type->InternalType() == TYPE_INTERNAL_DOUBLE )
return double_to_int_would_overflow(val->InternalDouble());
if ( from_type->InternalType() == TYPE_INTERNAL_UNSIGNED )
return count_to_int_would_overflow(val->InternalUnsigned());
}
return false;
}
}

src/Overflow.h Normal file

@ -0,0 +1,33 @@
// See the file "COPYING" in the main distribution directory for copyright.
#pragma once
#include "zeek/Type.h"
namespace zeek::detail {
inline bool double_to_count_would_overflow(double v)
{
return v < 0.0 || v > static_cast<double>(UINT64_MAX);
}
inline bool int_to_count_would_overflow(bro_int_t v)
{
return v < 0.0;
}
inline bool double_to_int_would_overflow(double v)
{
return v < static_cast<double>(INT64_MIN) ||
v > static_cast<double>(INT64_MAX);
}
inline bool count_to_int_would_overflow(bro_uint_t v)
{
return v > INT64_MAX;
}
extern bool would_overflow(const zeek::Type* from_type,
const zeek::Type* to_type, const Val* val);
}
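
The helpers in src/Overflow.h can be exercised in isolation. The following is a minimal sketch rather than the Zeek code itself: it assumes `bro_int_t` and `bro_uint_t` are 64-bit and substitutes `int64_t`/`uint64_t` for them, and the `sketch` namespace and `run_overflow_examples()` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumption: stand-ins for Zeek's bro_int_t / bro_uint_t, taken to be 64-bit.
using int_t = int64_t;
using uint_t = uint64_t;

// A double fits in a count only if it is non-negative and not above the
// largest unsigned value.
inline bool double_to_count_would_overflow(double v)
	{
	return v < 0.0 || v > static_cast<double>(UINT64_MAX);
	}

// A signed value fits in a count only if it is non-negative.
inline bool int_to_count_would_overflow(int_t v)
	{
	return v < 0;
	}

// A double fits in an int only if it lies within the signed 64-bit range.
inline bool double_to_int_would_overflow(double v)
	{
	return v < static_cast<double>(INT64_MIN) ||
	       v > static_cast<double>(INT64_MAX);
	}

// A count fits in an int only if it does not exceed the largest signed value.
inline bool count_to_int_would_overflow(uint_t v)
	{
	return v > static_cast<uint_t>(INT64_MAX);
	}

// Hypothetical driver that checks a few boundary cases.
inline void run_overflow_examples()
	{
	assert(double_to_count_would_overflow(-1.0));
	assert(! double_to_count_would_overflow(42.0));
	assert(int_to_count_would_overflow(-1));
	assert(! int_to_count_would_overflow(0));
	assert(double_to_int_would_overflow(1e19));
	assert(! double_to_int_would_overflow(0.5));
	assert(count_to_int_would_overflow(UINT64_MAX));
	assert(! count_to_int_would_overflow(static_cast<uint_t>(INT64_MAX)));
	}

} // namespace sketch
```

One point worth keeping in mind when reading the double checks: on typical IEEE 754 platforms `static_cast<double>(UINT64_MAX)` rounds up to 2^64, because 2^64 - 1 is not representable as a double, so the `>` comparison does not flag a value of exactly 2^64.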


@ -19,6 +19,7 @@
#include "zeek/Trigger.h" #include "zeek/Trigger.h"
#include "zeek/IntrusivePtr.h" #include "zeek/IntrusivePtr.h"
#include "zeek/logging/Manager.h" #include "zeek/logging/Manager.h"
#include "zeek/script_opt/StmtOptInfo.h"
#include "zeek/logging/logging.bif.h" #include "zeek/logging/logging.bif.h"
@ -35,6 +36,7 @@ const char* stmt_name(StmtTag t)
"catch-return", "catch-return",
"check-any-length", "check-any-length",
"compiled-C++", "compiled-C++",
"ZAM", "ZAM-resumption",
"null", "null",
}; };
@ -48,11 +50,14 @@ Stmt::Stmt(StmtTag arg_tag)
last_access = 0; last_access = 0;
access_count = 0; access_count = 0;
opt_info = new StmtOptInfo();
SetLocationInfo(&start_location, &end_location); SetLocationInfo(&start_location, &end_location);
} }
Stmt::~Stmt() Stmt::~Stmt()
{ {
delete opt_info;
} }
StmtList* Stmt::AsStmtList() StmtList* Stmt::AsStmtList()


@ -48,6 +48,8 @@ class Reducer;
class Stmt; class Stmt;
using StmtPtr = IntrusivePtr<Stmt>; using StmtPtr = IntrusivePtr<Stmt>;
class StmtOptInfo;
class Stmt : public Obj { class Stmt : public Obj {
public: public:
StmtTag Tag() const { return tag; } StmtTag Tag() const { return tag; }
@ -160,6 +162,10 @@ public:
return Obj::GetLocationInfo(); return Obj::GetLocationInfo();
} }
// Access script optimization information associated with
// this statement.
StmtOptInfo* GetOptInfo() const { return opt_info; }
protected: protected:
explicit Stmt(StmtTag arg_tag); explicit Stmt(StmtTag arg_tag);
@ -182,6 +188,10 @@ protected:
// derived, if any. Used as an aid for generating meaningful // derived, if any. Used as an aid for generating meaningful
// and correctly-localized error messages. // and correctly-localized error messages.
StmtPtr original = nullptr; StmtPtr original = nullptr;
// Information associated with the Stmt for purposes of
// script optimization.
StmtOptInfo* opt_info;
}; };
} // namespace detail } // namespace detail


@ -21,6 +21,8 @@ enum StmtTag {
STMT_CATCH_RETURN, // for reduced InlineExpr's STMT_CATCH_RETURN, // for reduced InlineExpr's
STMT_CHECK_ANY_LEN, // internal reduced statement STMT_CHECK_ANY_LEN, // internal reduced statement
STMT_CPP, // compiled C++ STMT_CPP, // compiled C++
STMT_ZAM, // a ZAM function body
STMT_ZAM_RESUMPTION, // resumes ZAM execution for "when" statements
STMT_NULL STMT_NULL
#define NUM_STMTS (int(STMT_NULL) + 1) #define NUM_STMTS (int(STMT_NULL) + 1)
}; };


@ -2,6 +2,7 @@
#include "zeek/zeek-config.h" #include "zeek/zeek-config.h"
#include "zeek/Val.h" #include "zeek/Val.h"
#include "zeek/Overflow.h"
#include <sys/types.h> #include <sys/types.h>
#include <sys/param.h> #include <sys/param.h>
@ -349,33 +350,6 @@ void Val::SetID(detail::ID* id)
} }
#endif #endif
bool Val::WouldOverflow(const zeek::Type* from_type, const zeek::Type* to_type, const Val* val)
{
if ( !to_type || !from_type )
return true;
else if ( same_type(to_type, from_type) )
return false;
if ( to_type->InternalType() == TYPE_INTERNAL_DOUBLE )
return false;
else if ( to_type->InternalType() == TYPE_INTERNAL_UNSIGNED )
{
if ( from_type->InternalType() == TYPE_INTERNAL_DOUBLE )
return (val->InternalDouble() < 0.0 || val->InternalDouble() > static_cast<double>(UINT64_MAX));
else if ( from_type->InternalType() == TYPE_INTERNAL_INT )
return (val->InternalInt() < 0);
}
else if ( to_type->InternalType() == TYPE_INTERNAL_INT )
{
if ( from_type->InternalType() == TYPE_INTERNAL_DOUBLE )
return (val->InternalDouble() < static_cast<double>(INT64_MIN) ||
val->InternalDouble() > static_cast<double>(INT64_MAX));
else if ( from_type->InternalType() == TYPE_INTERNAL_UNSIGNED )
return (val->InternalUnsigned() > INT64_MAX);
}
return false;
}
TableValPtr Val::GetRecordFields() TableValPtr Val::GetRecordFields()
{ {
@ -3825,7 +3799,7 @@ ValPtr check_and_promote(ValPtr v,
switch ( it ) { switch ( it ) {
case TYPE_INTERNAL_INT: case TYPE_INTERNAL_INT:
if ( ( vit == TYPE_INTERNAL_UNSIGNED || vit == TYPE_INTERNAL_DOUBLE ) && Val::WouldOverflow(vt, t, v.get()) ) if ( ( vit == TYPE_INTERNAL_UNSIGNED || vit == TYPE_INTERNAL_DOUBLE ) && detail::would_overflow(vt, t, v.get()) )
{ {
t->Error("overflow promoting from unsigned/double to signed arithmetic value", v.get(), false, expr_location); t->Error("overflow promoting from unsigned/double to signed arithmetic value", v.get(), false, expr_location);
return nullptr; return nullptr;
@ -3841,7 +3815,7 @@ ValPtr check_and_promote(ValPtr v,
break; break;
case TYPE_INTERNAL_UNSIGNED: case TYPE_INTERNAL_UNSIGNED:
if ( ( vit == TYPE_INTERNAL_DOUBLE || vit == TYPE_INTERNAL_INT) && Val::WouldOverflow(vt, t, v.get()) ) if ( ( vit == TYPE_INTERNAL_DOUBLE || vit == TYPE_INTERNAL_INT) && detail::would_overflow(vt, t, v.get()) )
{ {
t->Error("overflow promoting from signed/double to unsigned arithmetic value", v.get(), false, expr_location); t->Error("overflow promoting from signed/double to unsigned arithmetic value", v.get(), false, expr_location);
return nullptr; return nullptr;


@ -207,8 +207,6 @@ UNDERLYING_ACCESSOR_DECL(TypeVal, zeek::Type*, AsType)
void SetID(detail::ID* id); void SetID(detail::ID* id);
#endif #endif
static bool WouldOverflow(const zeek::Type* from_type, const zeek::Type* to_type, const Val* val);
TableValPtr GetRecordFields(); TableValPtr GetRecordFields();
StringValPtr ToJSON(bool only_loggable=false, RE_Matcher* re=nullptr); StringValPtr ToJSON(bool only_loggable=false, RE_Matcher* re=nullptr);


@ -35,6 +35,10 @@ using TypeValPtr = IntrusivePtr<TypeVal>;
using ValPtr = IntrusivePtr<Val>; using ValPtr = IntrusivePtr<Val>;
using VectorValPtr = IntrusivePtr<VectorVal>; using VectorValPtr = IntrusivePtr<VectorVal>;
namespace detail {
class ZBody;
}
// Note that a ZVal by itself is ambiguous: it doesn't track its type. // Note that a ZVal by itself is ambiguous: it doesn't track its type.
// This makes them consume less memory and cheaper to copy. It does // This makes them consume less memory and cheaper to copy. It does
// however require a separate way to determine the type. Generally // however require a separate way to determine the type. Generally
@ -70,9 +74,9 @@ union ZVal {
ZVal(OpaqueVal* v) { opaque_val = v; } ZVal(OpaqueVal* v) { opaque_val = v; }
ZVal(PatternVal* v) { re_val = v; } ZVal(PatternVal* v) { re_val = v; }
ZVal(TableVal* v) { table_val = v; } ZVal(TableVal* v) { table_val = v; }
ZVal(TypeVal* v) { type_val = v; }
ZVal(RecordVal* v) { record_val = v; } ZVal(RecordVal* v) { record_val = v; }
ZVal(VectorVal* v) { vector_val = v; } ZVal(VectorVal* v) { vector_val = v; }
ZVal(TypeVal* v) { type_val = v; }
ZVal(Val* v) { any_val = v; } ZVal(Val* v) { any_val = v; }
ZVal(StringValPtr v) { string_val = v.release(); } ZVal(StringValPtr v) { string_val = v.release(); }
@ -82,9 +86,9 @@ union ZVal {
ZVal(OpaqueValPtr v) { opaque_val = v.release(); } ZVal(OpaqueValPtr v) { opaque_val = v.release(); }
ZVal(PatternValPtr v) { re_val = v.release(); } ZVal(PatternValPtr v) { re_val = v.release(); }
ZVal(TableValPtr v) { table_val = v.release(); } ZVal(TableValPtr v) { table_val = v.release(); }
ZVal(TypeValPtr v) { type_val = v.release(); }
ZVal(RecordValPtr v) { record_val = v.release(); } ZVal(RecordValPtr v) { record_val = v.release(); }
ZVal(VectorValPtr v) { vector_val = v.release(); } ZVal(VectorValPtr v) { vector_val = v.release(); }
ZVal(TypeValPtr v) { type_val = v.release(); }
// Convert to a higher-level script value. The caller needs to // Convert to a higher-level script value. The caller needs to
// ensure that they're providing the correct type. // ensure that they're providing the correct type.
@ -160,6 +164,7 @@ union ZVal {
private: private:
friend class RecordVal; friend class RecordVal;
friend class VectorVal; friend class VectorVal;
friend class zeek::detail::ZBody;
// Used for bool, int, enum. // Used for bool, int, enum.
bro_int_t int_val; bro_int_t int_val;
@ -170,8 +175,8 @@ private:
// Used for double, time, interval. // Used for double, time, interval.
double double_val; double double_val;
// The types are all variants of Val, Type, or more fundamentally // The types are all variants of Val, or more fundamentally Obj.
// Obj. They are raw pointers rather than IntrusivePtr's because // They are raw pointers rather than IntrusivePtr's because
// unions can't contain the latter. For memory management, we use // unions can't contain the latter. For memory management, we use
// Ref/Unref. // Ref/Unref.
StringVal* string_val; StringVal* string_val;


@ -98,6 +98,7 @@
#include "zeek/zeekygen/Manager.h" #include "zeek/zeekygen/Manager.h"
#include "zeek/module_util.h" #include "zeek/module_util.h"
#include "zeek/IntrusivePtr.h" #include "zeek/IntrusivePtr.h"
#include "zeek/script_opt/IDOptInfo.h"
extern const char* filename; // Absolute path of file currently being parsed. extern const char* filename; // Absolute path of file currently being parsed.
extern const char* last_filename; // Absolute path of last file parsed. extern const char* last_filename; // Absolute path of last file parsed.
@ -244,7 +245,7 @@ static void build_global(ID* id, Type* t, InitClass ic, Expr* e,
add_global(id_ptr, std::move(t_ptr), ic, e_ptr, std::move(attrs_ptr), dt); add_global(id_ptr, std::move(t_ptr), ic, e_ptr, std::move(attrs_ptr), dt);
id->AddInitExpr(e_ptr); id->GetOptInfo()->AddInitExpr(e_ptr);
if ( dt == VAR_REDEF ) if ( dt == VAR_REDEF )
zeekygen_mgr->Redef(id, ::filename, ic, std::move(e_ptr)); zeekygen_mgr->Redef(id, ::filename, ic, std::move(e_ptr));
@ -265,7 +266,7 @@ static StmtPtr build_local(ID* id, Type* t, InitClass ic, Expr* e,
auto init = add_local(std::move(id_ptr), std::move(t_ptr), ic, auto init = add_local(std::move(id_ptr), std::move(t_ptr), ic,
e_ptr, std::move(attrs_ptr), dt); e_ptr, std::move(attrs_ptr), dt);
id->AddInitExpr(std::move(e_ptr)); id->GetOptInfo()->AddInitExpr(std::move(e_ptr));
if ( do_coverage ) if ( do_coverage )
script_coverage_mgr.AddStmt(init.get()); script_coverage_mgr.AddStmt(init.get());


@ -135,7 +135,8 @@ class CPPCompile {
public: public:
CPPCompile(std::vector<FuncInfo>& _funcs, ProfileFuncs& pfs, CPPCompile(std::vector<FuncInfo>& _funcs, ProfileFuncs& pfs,
const std::string& gen_name, const std::string& addl_name, const std::string& gen_name, const std::string& addl_name,
CPPHashManager& _hm, bool _update, bool _standalone); CPPHashManager& _hm, bool _update, bool _standalone,
bool report_uncompilable);
~CPPCompile(); ~CPPCompile();
private: private:
@ -145,7 +146,7 @@ private:
// //
// Main driver, invoked by constructor. // Main driver, invoked by constructor.
void Compile(); void Compile(bool report_uncompilable);
// Generate the beginning of the compiled code: run-time functions, // Generate the beginning of the compiled code: run-time functions,
// namespace, auxiliary globals. // namespace, auxiliary globals.
@ -161,8 +162,11 @@ private:
void GenEpilog(); void GenEpilog();
// True if the given function (plus body and profile) is one // True if the given function (plus body and profile) is one
// that should be compiled. // that should be compiled. If non-nil, sets reason to the
bool IsCompilable(const FuncInfo& func); // reason why, if there's a fundamental problem. If however
// the function should be skipped for other reasons, then sets
// it to nil.
bool IsCompilable(const FuncInfo& func, const char** reason = nullptr);
// The set of functions/bodies we're compiling. // The set of functions/bodies we're compiling.
std::vector<FuncInfo>& funcs; std::vector<FuncInfo>& funcs;


@ -14,7 +14,8 @@ using namespace std;
CPPCompile::CPPCompile(vector<FuncInfo>& _funcs, ProfileFuncs& _pfs, CPPCompile::CPPCompile(vector<FuncInfo>& _funcs, ProfileFuncs& _pfs,
const string& gen_name, const string& _addl_name, const string& gen_name, const string& _addl_name,
CPPHashManager& _hm, bool _update, bool _standalone) CPPHashManager& _hm, bool _update, bool _standalone,
bool report_uncompilable)
: funcs(_funcs), pfs(_pfs), hm(_hm), : funcs(_funcs), pfs(_pfs), hm(_hm),
update(_update), standalone(_standalone) update(_update), standalone(_standalone)
{ {
@ -67,7 +68,7 @@ CPPCompile::CPPCompile(vector<FuncInfo>& _funcs, ProfileFuncs& _pfs,
fclose(addl_f); fclose(addl_f);
} }
Compile(); Compile(report_uncompilable);
} }
CPPCompile::~CPPCompile() CPPCompile::~CPPCompile()
@ -75,7 +76,7 @@ CPPCompile::~CPPCompile()
fclose(write_file); fclose(write_file);
} }
void CPPCompile::Compile() void CPPCompile::Compile(bool report_uncompilable)
{ {
// Get the working directory so we can use it in diagnostic messages // Get the working directory so we can use it in diagnostic messages
// as a way to identify this compilation. Only germane when doing // as a way to identify this compilation. Only germane when doing
@ -100,8 +101,13 @@ void CPPCompile::Compile()
// Can't be called directly. // Can't be called directly.
continue; continue;
if ( IsCompilable(func) ) const char* reason;
if ( IsCompilable(func, &reason) )
compilable_funcs.insert(BodyName(func)); compilable_funcs.insert(BodyName(func));
else if ( reason && report_uncompilable )
fprintf(stderr,
"%s cannot be compiled to C++ due to %s\n",
func.Func()->Name(), reason);
auto h = func.Profile()->HashVal(); auto h = func.Profile()->HashVal();
if ( hm.HasHash(h) ) if ( hm.HasHash(h) )
@ -341,17 +347,24 @@ void CPPCompile::GenEpilog()
Emit("} // zeek::detail"); Emit("} // zeek::detail");
} }
bool CPPCompile::IsCompilable(const FuncInfo& func) bool CPPCompile::IsCompilable(const FuncInfo& func, const char** reason)
{ {
if ( ! is_CPP_compilable(func.Profile(), reason) )
return false;
if ( reason )
// Indicate that there's no fundamental reason it can't be
// compiled.
*reason = nullptr;
if ( func.ShouldSkip() ) if ( func.ShouldSkip() )
// Caller marked this function as one to skip.
return false; return false;
if ( hm.HasHash(func.Profile()->HashVal()) ) if ( hm.HasHash(func.Profile()->HashVal()) )
// We've already compiled it. // We've already compiled it.
return false; return false;
return is_CPP_compilable(func.Profile()); return true;
} }
} // zeek::detail } // zeek::detail


@ -6,6 +6,7 @@
#include "zeek/module_util.h" #include "zeek/module_util.h"
#include "zeek/script_opt/ProfileFunc.h" #include "zeek/script_opt/ProfileFunc.h"
#include "zeek/script_opt/IDOptInfo.h"
#include "zeek/script_opt/CPP/Compile.h" #include "zeek/script_opt/CPP/Compile.h"
@ -122,7 +123,7 @@ void CPPCompile::GenGlobalInit(const ID* g, string& gl, const ValPtr& v)
// expression anyway.) // expression anyway.)
// Use the final initialization expression. // Use the final initialization expression.
auto& init_exprs = g->GetInitExprs(); auto& init_exprs = g->GetOptInfo()->GetInitExprs();
init_val = GenExpr(init_exprs.back(), GEN_VAL_PTR, false); init_val = GenExpr(init_exprs.back(), GEN_VAL_PTR, false);
} }
else else
@ -279,10 +280,7 @@ void CPPCompile::AddInit(const Obj* o, const string& init)
void CPPCompile::AddInit(const Obj* o) void CPPCompile::AddInit(const Obj* o)
{ {
if ( obj_inits.count(o) == 0 ) if ( obj_inits.count(o) == 0 )
{ obj_inits[o] = {};
vector<string> empty;
obj_inits[o] = empty;
}
} }
void CPPCompile::NoteInitDependency(const Obj* o1, const Obj* o2) void CPPCompile::NoteInitDependency(const Obj* o1, const Obj* o2)


@ -78,8 +78,8 @@ a file `CPP-hashes.dat`, for use by an advanced feature, and an
empty `CPP-gen-addl.h` file (same). empty `CPP-gen-addl.h` file (same).
2. `ninja` or `make` to recompile Zeek 2. `ninja` or `make` to recompile Zeek
3. `./src/zeek -O use-C++ target.zeek` 3. `./src/zeek -O use-C++ target.zeek`
Executes with each function/hook/ Executes with each function/hook/event
event handler pulled in by `target.zeek` replaced with its compiled version. handler pulled in by `target.zeek` replaced with its compiled version.
Instead of the last line above, you can use the following variants: Instead of the last line above, you can use the following variants:


@ -133,7 +133,7 @@ void CPPCompile::ExpandTypeVar(const TypePtr& t)
} }
auto& script_type_name = t->GetName(); auto& script_type_name = t->GetName();
if ( script_type_name.size() > 0 ) if ( ! script_type_name.empty() )
AddInit(t, "register_type__CPP(" + tn + ", \"" + AddInit(t, "register_type__CPP(" + tn + ", \"" +
script_type_name + "\");"); script_type_name + "\");");
@ -145,9 +145,8 @@ void CPPCompile::ExpandListTypeVar(const TypePtr& t, string& tn)
const auto& tl = t->AsTypeList()->GetTypes(); const auto& tl = t->AsTypeList()->GetTypes();
auto t_name = tn + "->AsTypeList()"; auto t_name = tn + "->AsTypeList()";
for ( auto i = 0u; i < tl.size(); ++i ) for ( const auto& tl_i : tl )
AddInit(t, t_name + "->Append(" + AddInit(t, t_name + "->Append(" + GenTypeName(tl_i) + ");");
GenTypeName(tl[i]) + ");");
} }
void CPPCompile::ExpandRecordTypeVar(const TypePtr& t, string& tn) void CPPCompile::ExpandRecordTypeVar(const TypePtr& t, string& tn)
@ -181,7 +180,7 @@ void CPPCompile::ExpandEnumTypeVar(const TypePtr& t, string& tn)
auto names = et->Names(); auto names = et->Names();
AddInit(t, "{ auto et = " + e_name + ";"); AddInit(t, "{ auto et = " + e_name + ";");
AddInit(t, "if ( et->Names().size() == 0 ) {"); AddInit(t, "if ( et->Names().empty() ) {");
for ( const auto& name_pair : et->Names() ) for ( const auto& name_pair : et->Names() )
AddInit(t, string("\tet->AddNameInternal(\"") + AddInit(t, string("\tet->AddNameInternal(\"") +
@ -459,10 +458,10 @@ void CPPCompile::RegisterListType(const TypePtr& t)
{ {
const auto& tl = t->AsTypeList()->GetTypes(); const auto& tl = t->AsTypeList()->GetTypes();
for ( auto i = 0u; i < tl.size(); ++i ) for ( auto& tl_i : tl )
{ {
NoteNonRecordInitDependency(t, tl[i]); NoteNonRecordInitDependency(t, tl_i);
RegisterType(tl[i]); RegisterType(tl_i);
} }
} }
@ -489,10 +488,8 @@ void CPPCompile::RegisterRecordType(const TypePtr& t)
if ( ! r ) if ( ! r )
return; return;
for ( auto i = 0; i < r->length(); ++i ) for ( const auto& r_i : *r )
{ {
const auto& r_i = (*r)[i];
NoteNonRecordInitDependency(t, r_i->type); NoteNonRecordInitDependency(t, r_i->type);
RegisterType(r_i->type); RegisterType(r_i->type);


@ -33,13 +33,21 @@ string scope_prefix(int scope)
return scope_prefix(to_string(scope)); return scope_prefix(to_string(scope));
} }
bool is_CPP_compilable(const ProfileFunc* pf) bool is_CPP_compilable(const ProfileFunc* pf, const char** reason)
{ {
if ( pf->NumWhenStmts() > 0 ) if ( pf->NumWhenStmts() > 0 )
{
if ( reason )
*reason = "use of \"when\"";
return false; return false;
}
if ( pf->TypeSwitches().size() > 0 ) if ( pf->TypeSwitches().size() > 0 )
{
if ( reason )
*reason = "use of type-based \"switch\"";
return false; return false;
}
return true; return true;
} }
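
The optional `reason` out-parameter added to `is_CPP_compilable()` can be illustrated with a small standalone sketch. `SketchProfile` and `sketch_is_compilable()` are hypothetical stand-ins (the real function takes a `ProfileFunc*`); only the two counts the check consults are modeled here.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for ProfileFunc, holding only the two counts
// that the compilability check looks at.
struct SketchProfile
	{
	int num_when_stmts = 0;
	int num_type_switches = 0;
	};

// Mirrors the shape of the new signature: returns false for an
// uncompilable function and, if the caller passed a non-null 'reason',
// points it at text explaining why. On success 'reason' is left untouched.
bool sketch_is_compilable(const SketchProfile& pf, const char** reason = nullptr)
	{
	if ( pf.num_when_stmts > 0 )
		{
		if ( reason )
			*reason = "use of \"when\"";
		return false;
		}

	if ( pf.num_type_switches > 0 )
		{
		if ( reason )
			*reason = "use of type-based \"switch\"";
		return false;
		}

	return true;
	}
```

Because the success path does not write through `reason`, a caller that inspects it afterwards needs to initialize it first or, as `CPPCompile::IsCompilable()` does in this commit, reset it to `nullptr` once the fundamental check has passed.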


@ -19,8 +19,11 @@ extern std::string scope_prefix(const std::string& scope);
// Same, but for scopes identified with numbers. // Same, but for scopes identified with numbers.
extern std::string scope_prefix(int scope); extern std::string scope_prefix(int scope);
// True if the given function is compilable to C++. // True if the given function is compilable to C++. If it isn't, and
extern bool is_CPP_compilable(const ProfileFunc* pf); // the second argument is non-nil, then on return it points to text
// explaining why not.
extern bool is_CPP_compilable(const ProfileFunc* pf,
const char** reason = nullptr);
// Helper utilities for file locking, to ensure that hash files // Helper utilities for file locking, to ensure that hash files
// don't receive conflicting writes due to concurrent compilations. // don't receive conflicting writes due to concurrent compilations.


@ -54,7 +54,7 @@ bool CPPCompile::CheckForCollisions()
// the name either (1) wasn't previously used, or (2) if it // the name either (1) wasn't previously used, or (2) if it
// was, it was likewise for an enum or a record. // was, it was likewise for an enum or a record.
const auto& tn = t->GetName(); const auto& tn = t->GetName();
if ( tn.size() == 0 || ! hm.HasGlobal(tn) ) if ( tn.empty() || ! hm.HasGlobal(tn) )
// No concern of collision since the type name // No concern of collision since the type name
// wasn't previously compiled. // wasn't previously compiled.
continue; continue;


@ -17,7 +17,7 @@ enum DefPointType {
// Used to capture the notion "the variable may have no definition // Used to capture the notion "the variable may have no definition
// at this point" (or "has no definition", depending on whether we're // at this point" (or "has no definition", depending on whether we're
// concerned with minimal or maximal RDs). // concerned with minimal or maximal RDs).
NO_DEF, NO_DEF_POINT,
// Assigned at the given statement. // Assigned at the given statement.
STMT_DEF, STMT_DEF,
@ -49,7 +49,7 @@ public:
DefinitionPoint() DefinitionPoint()
{ {
o = nullptr; o = nullptr;
t = NO_DEF; t = NO_DEF_POINT;
} }
DefinitionPoint(const Stmt* s) DefinitionPoint(const Stmt* s)


@ -1015,9 +1015,10 @@ ExprPtr TimesExpr::Reduce(Reducer* c, StmtPtr& red_stmt)
if ( (op1->IsZero() || op2->IsZero()) && if ( (op1->IsZero() || op2->IsZero()) &&
GetType()->Tag() != TYPE_DOUBLE ) GetType()->Tag() != TYPE_DOUBLE )
{ {
auto zero_val = op1->IsZero() ? if ( op1->IsZero() )
op1->Eval(nullptr) : op2->Eval(nullptr); return c->Fold(op1);
return make_intrusive<ConstExpr>(zero_val); else
return c->Fold(op2);
} }
return BinaryExpr::Reduce(c, red_stmt); return BinaryExpr::Reduce(c, red_stmt);
@ -1106,8 +1107,8 @@ static ExprPtr build_disjunction(std::vector<ConstExprPtr>& patterns)
ExprPtr e = patterns[0]; ExprPtr e = patterns[0];
for ( unsigned int i = 1; i < patterns.size(); ++i ) for ( auto& p : patterns )
e = make_intrusive<BitExpr>(EXPR_OR, e, patterns[i]); e = make_intrusive<BitExpr>(EXPR_OR, e, p);
return e; return e;
} }
@ -1427,6 +1428,19 @@ ExprPtr CondExpr::Reduce(Reducer* c, StmtPtr& red_stmt)
return op2; return op2;
} }
if ( op2->IsConst() && op3->IsConst() && GetType()->Tag() == TYPE_BOOL )
{
auto op2_t = op2->IsOne();
ASSERT(op2_t != op3->IsOne());
if ( op2_t )
// This is "var ? T : F", which can be replaced by var.
return op1;
// Instead we have "var ? F : T".
return make_intrusive<NotExpr>(op1);
}
if ( c->Optimizing() ) if ( c->Optimizing() )
return ThisPtr(); return ThisPtr();
@ -1610,12 +1624,22 @@ bool AssignExpr::HasReducedOps(Reducer* c) const
ExprPtr AssignExpr::Reduce(Reducer* c, StmtPtr& red_stmt) ExprPtr AssignExpr::Reduce(Reducer* c, StmtPtr& red_stmt)
{ {
// Yields a fully reduced assignment expression. // Yields a fully reduced assignment expression.
if ( c->Optimizing() ) if ( c->Optimizing() )
{ {
// Don't update the LHS, it's already in reduced form // Don't update the LHS, it's already in reduced form
// and it doesn't make sense to expand aliases or such. // and it doesn't make sense to expand aliases or such.
auto orig_op2 = op2;
op2 = c->UpdateExpr(op2); op2 = c->UpdateExpr(op2);
if ( op2 != orig_op2 && op2->Tag() == EXPR_CONST &&
op1->Tag() == EXPR_REF )
{
auto lhs = op1->GetOp1();
auto op2_c = cast_intrusive<ConstExpr>(op2);
if ( lhs->Tag() == EXPR_NAME )
c->FoldedTo(orig_op2, op2_c);
}
return ThisPtr(); return ThisPtr();
} }
@ -1738,7 +1762,7 @@ ExprPtr AssignExpr::Reduce(Reducer* c, StmtPtr& red_stmt)
red_stmt = op2->ReduceToSingletons(c); red_stmt = op2->ReduceToSingletons(c);
if ( op2->HasConstantOps() && op2->Tag() != EXPR_TO_ANY_COERCE ) if ( op2->HasConstantOps() && op2->Tag() != EXPR_TO_ANY_COERCE )
op2 = make_intrusive<ConstExpr>(op2->Eval(nullptr)); op2 = c->Fold(op2);
// Check once again for transformation, this time made possible // Check once again for transformation, this time made possible
// because the operands have been reduced. We don't simply // because the operands have been reduced. We don't simply
@ -2110,8 +2134,6 @@ ExprPtr ArithCoerceExpr::Reduce(Reducer* c, StmtPtr& red_stmt)
if ( ! op->IsReduced(c) ) if ( ! op->IsReduced(c) )
op = op->ReduceToSingleton(c, red_stmt); op = op->ReduceToSingleton(c, red_stmt);
auto t = type->InternalType();
if ( op->Tag() == EXPR_CONST ) if ( op->Tag() == EXPR_CONST )
{ {
const auto& t = GetType(); const auto& t = GetType();
@ -2129,9 +2151,18 @@ ExprPtr ArithCoerceExpr::Reduce(Reducer* c, StmtPtr& red_stmt)
if ( c->Optimizing() ) if ( c->Optimizing() )
return ThisPtr(); return ThisPtr();
auto bt = op->GetType()->InternalType(); const auto& ot = op->GetType();
auto bt = ot->InternalType();
auto tt = type->InternalType();
if ( t == bt ) if ( ot->Tag() == TYPE_VECTOR )
{
bt = ot->Yield()->InternalType();
tt = type->Yield()->InternalType();
}
if ( bt == tt )
// Can drop the conversion.
return op; return op;
return AssignToTemporary(c, red_stmt); return AssignToTemporary(c, red_stmt);
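
The boolean rewrite added to `CondExpr::Reduce` in the hunk above (a conditional whose two arms are the constants T and F collapses to the condition or its negation) can be modeled in a few lines. This is only a sketch over strings, not the AST code: `fold_bool_cond()` is a hypothetical name, and the printed form of the negation is an arbitrary choice.

```cpp
#include <cassert>
#include <string>

// Hypothetical mini-model of the rewrite: 'cond' is the printed form of a
// boolean expression, and the two arms are boolean constants.
std::string fold_bool_cond(const std::string& cond, bool true_arm, bool false_arm)
	{
	// The commit asserts that the arms differ, presumably because equal
	// arms are already handled by an earlier rule.
	assert(true_arm != false_arm);

	if ( true_arm )
		// "cond ? T : F" is just cond.
		return cond;

	// "cond ? F : T" is the negation of cond.
	return "! " + cond;
	}
```

For example, `fold_bool_cond("x", true, false)` yields `x`, while swapping the arms yields `! x`.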


@ -0,0 +1,17 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Auxiliary information associated with expressions to aid script
// optimization.
#pragma once
namespace zeek::detail {
class ExprOptInfo {
public:
// The AST number of the statement in which this expression
// appears.
int stmt_num = -1; // -1 = not assigned yet
};
} // namespace zeek::detail

src/script_opt/GenIDDefs.cc Normal file

@ -0,0 +1,533 @@
// See the file "COPYING" in the main distribution directory for copyright.
#include "zeek/Expr.h"
#include "zeek/Scope.h"
#include "zeek/Reporter.h"
#include "zeek/Desc.h"
#include "zeek/script_opt/GenIDDefs.h"
#include "zeek/script_opt/ScriptOpt.h"
#include "zeek/script_opt/ExprOptInfo.h"
#include "zeek/script_opt/StmtOptInfo.h"
namespace zeek::detail {
GenIDDefs::GenIDDefs(std::shared_ptr<ProfileFunc> _pf, const Func* f,
ScopePtr scope, StmtPtr body)
: pf(std::move(_pf))
{
TraverseFunction(f, scope, body);
}
void GenIDDefs::TraverseFunction(const Func* f, ScopePtr scope, StmtPtr body)
{
func_flavor = f->Flavor();
// Establish the outermost barrier and associated set of
// identifiers.
barrier_blocks.push_back(0);
modified_IDs.push_back({});
for ( const auto& g : pf->Globals() )
{
g->GetOptInfo()->Clear();
TrackID(g);
}
// Clear the locals before processing the arguments, since
// they're included among the locals.
for ( const auto& l : pf->Locals() )
l->GetOptInfo()->Clear();
const auto& args = scope->OrderedVars();
int nparam = f->GetType()->Params()->NumFields();
for ( const auto& a : args )
{
if ( --nparam < 0 )
break;
a->GetOptInfo()->Clear();
TrackID(a);
}
stmt_num = 0; // 0 = "before the first statement"
body->Traverse(this);
}
TraversalCode GenIDDefs::PreStmt(const Stmt* s)
{
curr_stmt = s;
auto si = s->GetOptInfo();
si->stmt_num = ++stmt_num;
si->block_level = confluence_blocks.size() + 1;
switch ( s->Tag() ) {
case STMT_CATCH_RETURN:
{
auto cr = s->AsCatchReturnStmt();
auto block = cr->Block();
StartConfluenceBlock(s);
block->Traverse(this);
EndConfluenceBlock();
auto retvar = cr->RetVar();
if ( retvar )
TrackID(retvar->Id());
return TC_ABORTSTMT;
}
case STMT_IF:
{
auto i = s->AsIfStmt();
auto cond = i->StmtExpr();
auto t_branch = i->TrueBranch();
auto f_branch = i->FalseBranch();
cond->Traverse(this);
StartConfluenceBlock(s);
t_branch->Traverse(this);
if ( ! t_branch->NoFlowAfter(false) )
BranchBeyond(curr_stmt, s, true);
f_branch->Traverse(this);
if ( ! f_branch->NoFlowAfter(false) )
BranchBeyond(curr_stmt, s, true);
EndConfluenceBlock(true);
return TC_ABORTSTMT;
}
case STMT_SWITCH:
{
auto sw = s->AsSwitchStmt();
auto e = sw->StmtExpr();
e->Traverse(this);
StartConfluenceBlock(sw);
for ( const auto& c : *sw->Cases() )
{
auto body = c->Body();
auto exprs = c->ExprCases();
if ( exprs )
exprs->Traverse(this);
auto type_ids = c->TypeCases();
if ( type_ids )
{
for ( const auto& id : *type_ids )
if ( id->Name() )
TrackID(id);
}
body->Traverse(this);
}
EndConfluenceBlock(sw->HasDefault());
return TC_ABORTSTMT;
}
case STMT_FOR:
{
auto f = s->AsForStmt();
auto ids = f->LoopVars();
auto e = f->LoopExpr();
auto body = f->LoopBody();
auto val_var = f->ValueVar();
e->Traverse(this);
for ( const auto& id : *ids )
TrackID(id);
if ( val_var )
TrackID(val_var);
StartConfluenceBlock(s);
body->Traverse(this);
if ( ! body->NoFlowAfter(false) )
BranchBackTo(curr_stmt, s, true);
EndConfluenceBlock();
return TC_ABORTSTMT;
}
case STMT_WHILE:
{
auto w = s->AsWhileStmt();
StartConfluenceBlock(s);
auto cond_pred_stmt = w->CondPredStmt();
if ( cond_pred_stmt )
cond_pred_stmt->Traverse(this);
// Important to traverse the condition in its version
// interpreted as a statement, so that when evaluating
// its variable usage, that's done in the context of
// *after* cond_pred_stmt executes, rather than as
// part of that execution.
auto cond_stmt = w->ConditionAsStmt();
cond_stmt->Traverse(this);
auto body = w->Body();
body->Traverse(this);
if ( ! body->NoFlowAfter(false) )
BranchBackTo(curr_stmt, s, true);
EndConfluenceBlock();
return TC_ABORTSTMT;
}
case STMT_WHEN:
{
// ### punt on these for now, need to reflect on bindings.
return TC_ABORTSTMT;
}
default:
return TC_CONTINUE;
}
}
TraversalCode GenIDDefs::PostStmt(const Stmt* s)
{
switch ( s->Tag() ) {
case STMT_INIT:
{
auto init = s->AsInitStmt();
auto& inits = init->Inits();
for ( const auto& id : inits )
{
auto id_t = id->GetType();
// Only aggregates get initialized.
if ( zeek::IsAggr(id->GetType()->Tag()) )
TrackID(id);
}
break;
}
case STMT_RETURN:
ReturnAt(s);
break;
case STMT_NEXT:
BranchBackTo(curr_stmt, FindLoop(), false);
break;
case STMT_BREAK:
{
auto target = FindBreakTarget();
if ( target )
BranchBeyond(s, target, false);
else
{
ASSERT(func_flavor == FUNC_FLAVOR_HOOK);
ReturnAt(s);
}
break;
}
case STMT_FALLTHROUGH:
// No need to do anything, the work all occurs
// with NoFlowAfter.
break;
default:
break;
}
return TC_CONTINUE;
}
TraversalCode GenIDDefs::PreExpr(const Expr* e)
{
e->GetOptInfo()->stmt_num = stmt_num;
switch ( e->Tag() ) {
case EXPR_NAME:
CheckVarUsage(e, e->AsNameExpr()->Id());
break;
case EXPR_ASSIGN:
{
auto lhs = e->GetOp1();
auto op2 = e->GetOp2();
if ( lhs->Tag() == EXPR_LIST &&
op2->GetType()->Tag() != TYPE_ANY )
{
// This combination occurs only for assignments used
// to initialize table entries. Treat it as references
// to both the lhs and the rhs, not as an assignment.
return TC_CONTINUE;
}
op2->Traverse(this);
if ( ! CheckLHS(lhs, op2) )
// Not a simple assignment (or group of assignments),
// so analyze the accesses to check for use of
// possibly undefined values.
lhs->Traverse(this);
return TC_ABORTSTMT;
}
case EXPR_COND:
// Special hack. We turn off checking for usage issues
// inside conditionals. This is because we use them heavily
// to deconstruct logical expressions for which the actual
// operand access is safe (guaranteed not to access a value
// that hasn't been defined), but the flow analysis has
// trouble determining that.
++suppress_usage;
e->GetOp1()->Traverse(this);
e->GetOp2()->Traverse(this);
e->GetOp3()->Traverse(this);
--suppress_usage;
return TC_ABORTSTMT;
case EXPR_LAMBDA:
{
auto l = static_cast<const LambdaExpr*>(e);
const auto& ids = l->OuterIDs();
for ( auto& id : ids )
CheckVarUsage(e, id);
// Don't descend into the lambda body - we'll analyze and
// optimize it separately, as its own function.
return TC_ABORTSTMT;
}
default:
break;
}
return TC_CONTINUE;
}
TraversalCode GenIDDefs::PostExpr(const Expr* e)
{
// Attend to expressions that reflect assignments after
// execution, but for which the assignment target was
// also an accessed value (so if we analyzed them
// in PreExpr then we'd have had to do manual traversals
// of their operands).
auto t = e->Tag();
if ( t == EXPR_INCR || t == EXPR_DECR ||
t == EXPR_ADD_TO || t == EXPR_REMOVE_FROM )
{
auto op = e->GetOp1();
if ( ! IsAggr(op) )
(void) CheckLHS(op);
}
return TC_CONTINUE;
}
bool GenIDDefs::CheckLHS(const ExprPtr& lhs, const ExprPtr& rhs)
{
switch ( lhs->Tag() ) {
case EXPR_REF:
return CheckLHS(lhs->GetOp1(), rhs);
case EXPR_NAME:
{
auto n = lhs->AsNameExpr();
TrackID(n->Id(), rhs);
return true;
}
case EXPR_LIST:
{ // look for [a, b, c] = any_val
auto l = lhs->AsListExpr();
for ( const auto& expr : l->Exprs() )
{
if ( expr->Tag() != EXPR_NAME )
// This will happen for table initializers,
// for example.
return false;
auto n = expr->AsNameExpr();
TrackID(n->Id());
}
return true;
}
case EXPR_FIELD:
// If we want to track record field initializations,
// we'd handle that here.
return false;
case EXPR_INDEX:
// If we wanted to track potential alterations of
// aggregates, we'd do that here.
return false;
default:
reporter->InternalError("bad tag in GenIDDefs::CheckLHS");
}
}
bool GenIDDefs::IsAggr(const Expr* e) const
{
if ( e->Tag() != EXPR_NAME )
return false;
auto n = e->AsNameExpr();
auto id = n->Id();
auto tag = id->GetType()->Tag();
return zeek::IsAggr(tag);
}
void GenIDDefs::CheckVarUsage(const Expr* e, const ID* id)
{
if ( analysis_options.usage_issues != 1 || id->IsGlobal() ||
suppress_usage > 0 )
return;
auto oi = id->GetOptInfo();
if ( ! oi->DidUndefinedWarning() && ! oi->IsDefinedBefore(curr_stmt) &&
! id->GetAttr(ATTR_IS_ASSIGNED) )
{
if ( ! oi->IsPossiblyDefinedBefore(curr_stmt) )
{
e->Warn("used without definition");
oi->SetDidUndefinedWarning();
}
else if ( ! oi->DidPossiblyUndefinedWarning() )
{
e->Warn("possibly used without definition");
oi->SetDidPossiblyUndefinedWarning();
}
}
}
void GenIDDefs::StartConfluenceBlock(const Stmt* s)
{
if ( s->Tag() == STMT_CATCH_RETURN )
barrier_blocks.push_back(confluence_blocks.size());
confluence_blocks.push_back(s);
modified_IDs.push_back({});
}
void GenIDDefs::EndConfluenceBlock(bool no_orig)
{
for ( auto id : modified_IDs.back() )
id->GetOptInfo()->ConfluenceBlockEndsAfter(curr_stmt, no_orig);
confluence_blocks.pop_back();
int bb = barrier_blocks.back();
if ( bb > 0 && confluence_blocks.size() == bb )
barrier_blocks.pop_back();
modified_IDs.pop_back();
}
void GenIDDefs::BranchBackTo(const Stmt* from, const Stmt* to, bool close_all)
{
for ( auto id : modified_IDs.back() )
id->GetOptInfo()->BranchBackTo(from, to, close_all);
}
void GenIDDefs::BranchBeyond(const Stmt* from, const Stmt* to, bool close_all)
{
for ( auto id : modified_IDs.back() )
id->GetOptInfo()->BranchBeyond(from, to, close_all);
to->GetOptInfo()->contains_branch_beyond = true;
}
const Stmt* GenIDDefs::FindLoop()
{
int i = confluence_blocks.size() - 1;
while ( i >= 0 )
{
auto t = confluence_blocks[i]->Tag();
if ( t == STMT_WHILE || t == STMT_FOR )
break;
--i;
}
ASSERT(i >= 0);
return confluence_blocks[i];
}
const Stmt* GenIDDefs::FindBreakTarget()
{
int i = confluence_blocks.size() - 1;
while ( i >= 0 )
{
auto cb = confluence_blocks[i];
auto t = cb->Tag();
if ( t == STMT_WHILE || t == STMT_FOR || t == STMT_SWITCH )
return cb;
--i;
}
return nullptr;
}
void GenIDDefs::ReturnAt(const Stmt* s)
{
for ( auto id : modified_IDs.back() )
id->GetOptInfo()->ReturnAt(s);
}
void GenIDDefs::TrackID(const ID* id, const ExprPtr& e)
{
auto oi = id->GetOptInfo();
ASSERT(! barrier_blocks.empty());
oi->DefinedAfter(curr_stmt, e,
confluence_blocks, barrier_blocks.back());
// Ensure we track this identifier across all relevant
// confluence regions.
for ( int i = barrier_blocks.back(); i < confluence_blocks.size(); ++i )
// Add one because modified_IDs includes outer non-confluence
// block.
modified_IDs[i+1].insert(id);
if ( confluence_blocks.empty() )
// This is a definition at the outermost level.
modified_IDs[0].insert(id);
}
} // zeek::detail
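
The bookkeeping in StartConfluenceBlock() and TrackID() above can be modeled in isolation. The following is only a rough sketch, not the Zeek implementation: `BlockTracker`, `start_block` and `track` are hypothetical names, identifiers are plain strings, and it assumes the outermost barrier index 0 is pushed before traversal begins (the code that does so is not shown in this diff).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the GenIDDefs bookkeeping; not Zeek code.
struct BlockTracker {
    std::vector<std::string> confluence_blocks;  // innermost at the back
    std::vector<int> barrier_blocks;             // indices into confluence_blocks
    // Slot 0 is the outermost (non-confluence) scope, so confluence
    // block i lives at slot i + 1.
    std::vector<std::unordered_set<std::string>> modified_ids;

    BlockTracker() : modified_ids(1) {
        barrier_blocks.push_back(0);  // assumption: outermost barrier
    }

    void start_block(const std::string& name, bool is_barrier) {
        if (is_barrier)
            barrier_blocks.push_back(static_cast<int>(confluence_blocks.size()));
        confluence_blocks.push_back(name);
        modified_ids.emplace_back();
    }

    // Record a definition of `id` in every block from the innermost
    // barrier inward; blocks outside the barrier are left untouched.
    void track(const std::string& id) {
        assert(!barrier_blocks.empty());
        const int n = static_cast<int>(confluence_blocks.size());
        for (int i = barrier_blocks.back(); i < n; ++i)
            modified_ids[i + 1].insert(id);
        if (confluence_blocks.empty())
            modified_ids[0].insert(id);  // definition at the outermost level
    }
};
```

In this model, a definition made inside a block nested within a catch-return barrier is recorded for the barrier block and everything inside it, but not for a loop that encloses the barrier.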

116
src/script_opt/GenIDDefs.h Normal file

@ -0,0 +1,116 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Class for generating identifier definition information by traversing
// a function body's AST.
#pragma once
#include "zeek/script_opt/IDOptInfo.h"
#include "zeek/script_opt/ProfileFunc.h"
namespace zeek::detail {
class GenIDDefs : public TraversalCallback {
public:
GenIDDefs(std::shared_ptr<ProfileFunc> _pf, const Func* f,
ScopePtr scope, StmtPtr body);
private:
// Traverses the given function body, using the first two
// arguments for context.
void TraverseFunction(const Func* f, ScopePtr scope, StmtPtr body);
TraversalCode PreStmt(const Stmt*) override;
TraversalCode PostStmt(const Stmt*) override;
TraversalCode PreExpr(const Expr*) override;
TraversalCode PostExpr(const Expr*) override;
// Analyzes the target of an assignment. Returns true if the LHS
// was an expression for which we can track it as a definition
// (e.g., assignments to variables, but not to elements of
// aggregates). "rhs" gives the expression used for simple direct
// assignments.
bool CheckLHS(const ExprPtr& lhs, const ExprPtr& rhs = nullptr);
// True if the given expression directly represents an aggregate.
bool IsAggr(const ExprPtr& e) const { return IsAggr(e.get()); }
bool IsAggr(const Expr* e) const;
// If -u is active, checks whether the given identifier, which
// appears in the given expression, is undefined at that point.
void CheckVarUsage(const Expr* e, const ID* id);
// Begin a new confluence block with the given statement.
void StartConfluenceBlock(const Stmt* s);
// Finish up the current confluence block. If no_orig_flow is true,
// then there's no control flow from the origin (the statement that
// starts the block).
void EndConfluenceBlock(bool no_orig_flow = false);
// Note branches from the given "from" statement back up to the
// beginning of, or just past, the "to" statement. If "close_all"
// is true then the nature of the branch is that it terminates
// all pending confluence blocks.
void BranchBackTo(const Stmt* from, const Stmt* to, bool close_all);
void BranchBeyond(const Stmt* from, const Stmt* to, bool close_all);
// These search back through the active confluence blocks looking
// for either the innermost loop, or the innermost block for which
// a "break" would target going beyond that block.
const Stmt* FindLoop();
const Stmt* FindBreakTarget();
// Note that the given statement executes a "return" (which could
// instead be an outer "break" for a hook).
void ReturnAt(const Stmt* s);
// Tracks that the given identifier is defined at the current
// statement in the current confluence block. 'e' is the
// expression used to define the identifier, for simple direct
// assignments.
void TrackID(const IDPtr& id, const ExprPtr& e = nullptr)
{ TrackID(id.get(), e); }
void TrackID(const ID* id, const ExprPtr& e = nullptr);
// Profile for the function. Currently, all we actually need from
// this is the list of globals and locals.
std::shared_ptr<ProfileFunc> pf;
// Whether the Func is an event/hook/function. We currently only
// need to know whether it's a hook, so we correctly interpret an
// outer "break" in that context.
FunctionFlavor func_flavor;
// The statement we are currently traversing.
const Stmt* curr_stmt = nullptr;
// Used to number Stmt objects found during AST traversal.
int stmt_num;
// A stack of confluence blocks, with the innermost at the top/back.
std::vector<const Stmt*> confluence_blocks;
// Index into confluence_blocks of "barrier" blocks that
// represent unavoidable confluence blocks (no branching
// out of them). These include the outermost block and
// any catch-return blocks. We track these because
// (1) there's no need for an IDOptInfo to track previously
// unseen confluence regions outer to those, and (2) they
// can get quite deep when inlining, so there are savings
// in not having to track outer to them.
std::vector<int> barrier_blocks;
// The following is parallel to confluence_blocks except
// the front entry tracks identifiers at the outermost
// (non-confluence) scope. Thus, to index it for a given
// confluence block i, we need to use i+1.
std::vector<std::unordered_set<const ID*>> modified_IDs;
// If non-zero, indicates we should suspend any generation
// of usage errors. A counter rather than a boolean because
// such situations might nest.
int suppress_usage = 0;
};
} // zeek::detail
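
The "maybe/definitely" distinction that CheckVarUsage() draws, together with the once-per-identifier warning flags and the nesting suppress_usage counter, can be sketched on its own. This is a simplified model under stated assumptions: `IdUsageState`, `UsageDiag` and `check_usage` are hypothetical names, and the checks for globals, the usage_issues option, and the is-assigned attribute are left out.

```cpp
#include <cassert>

// Hypothetical result type; the real code emits warnings instead.
enum class UsageDiag { None, Undefined, PossiblyUndefined };

// Hypothetical per-identifier state, loosely mirroring IDOptInfo.
struct IdUsageState {
    bool defined = false;        // definitely defined before the statement
    bool maybe_defined = false;  // defined along at least one path
    bool warned_undefined = false;
    bool warned_possibly_undefined = false;
};

// Decide which diagnostic (if any) a use of the identifier triggers.
// Each kind of warning is issued at most once per identifier.
UsageDiag check_usage(IdUsageState& st, int suppress_usage) {
    assert(suppress_usage >= 0);
    if (suppress_usage > 0)
        return UsageDiag::None;  // e.g. inside a conditional expression
    if (st.warned_undefined || st.defined)
        return UsageDiag::None;
    if (!st.maybe_defined) {
        st.warned_undefined = true;
        return UsageDiag::Undefined;
    }
    if (!st.warned_possibly_undefined) {
        st.warned_possibly_undefined = true;
        return UsageDiag::PossiblyUndefined;
    }
    return UsageDiag::None;
}
```

Using a counter rather than a flag for suppression means nested suppressing contexts can each increment and decrement it without clobbering one another.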


@ -10,6 +10,13 @@
  namespace zeek::detail {
+ RD_Decorate::RD_Decorate(std::shared_ptr<ProfileFunc> _pf, const Func* f,
+ ScopePtr scope, StmtPtr body)
+ : pf(std::move(_pf))
+ {
+ TraverseFunction(f, scope, body);
+ }
  void RD_Decorate::TraverseFunction(const Func* f, ScopePtr scope, StmtPtr body)
  {
  func_flavor = f->Flavor();
@ -350,7 +357,7 @@ void RD_Decorate::TraverseSwitch(const SwitchStmt* sw)
  bd->Clear();
  body->Traverse(this);
- if ( bd->PreRDs().size() > 0 )
+ if ( ! bd->PreRDs().empty() )
  reporter->InternalError("mispropagation of switch body defs");
  if ( body->NoFlowAfter(true) )
@ -537,7 +544,7 @@ TraversalCode RD_Decorate::PostStmt(const Stmt* s)
  break;
  case STMT_BREAK:
- if ( block_defs.size() == 0 )
+ if ( block_defs.empty() )
  {
  if ( func_flavor == FUNC_FLAVOR_HOOK )
  // Treat as a return.
@ -634,7 +641,7 @@ bool RD_Decorate::CheckLHS(const Expr* lhs, const Expr* e)
  for ( const auto& expr : l->Exprs() )
  {
  if ( expr->Tag() != EXPR_NAME )
- // This will happen for table initialiers,
+ // This will happen for table initializers,
  // for example.
  return false;


@ -50,9 +50,12 @@ private:
  class RD_Decorate : public TraversalCallback {
  public:
- RD_Decorate(std::shared_ptr<ProfileFunc> _pf) : pf(std::move(_pf))
- { }
+ RD_Decorate(std::shared_ptr<ProfileFunc> _pf, const Func* f,
+ ScopePtr scope, StmtPtr body);
+ const DefSetsMgr* GetDefSetsMgr() const { return &mgr; }
+ private:
  // Traverses the given function body, using the first two
  // arguments for context.
  void TraverseFunction(const Func* f, ScopePtr scope, StmtPtr body);
@ -62,9 +65,6 @@ public:
  TraversalCode PreExpr(const Expr*) override;
  TraversalCode PostExpr(const Expr*) override;
- const DefSetsMgr* GetDefSetsMgr() const { return &mgr; }
- private:
  // The following implement various types of "confluence", i.e.,
  // situations in which control flow merges from multiple possible
  // paths to a given point.

522
src/script_opt/IDOptInfo.cc Normal file

@ -0,0 +1,522 @@
// See the file "COPYING" in the main distribution directory for copyright.
#include "zeek/Stmt.h"
#include "zeek/Expr.h"
#include "zeek/Desc.h"
#include "zeek/script_opt/IDOptInfo.h"
#include "zeek/script_opt/StmtOptInfo.h"
namespace zeek::detail {
const char* trace_ID = nullptr;
IDDefRegion::IDDefRegion(const Stmt* s, bool maybe, int def)
{
start_stmt = s->GetOptInfo()->stmt_num;
block_level = s->GetOptInfo()->block_level;
Init(maybe, def);
}
IDDefRegion::IDDefRegion(int stmt_num, int level, bool maybe, int def)
{
start_stmt = stmt_num;
block_level = level;
Init(maybe, def);
}
IDDefRegion::IDDefRegion(const Stmt* s, const IDDefRegion& ur)
{
start_stmt = s->GetOptInfo()->stmt_num;
block_level = s->GetOptInfo()->block_level;
Init(ur.MaybeDefined(), ur.DefinedAfter());
SetDefExpr(ur.DefExprAfter());
}
void IDDefRegion::Dump() const
{
printf("\t%d->%d (%d): ", start_stmt, end_stmt, block_level);
if ( defined != NO_DEF )
printf("%d (%s)", defined, def_expr ? obj_desc(def_expr.get()).c_str() : "<no expr>");
else if ( maybe_defined )
printf("?");
else
printf("N/A");
printf("\n");
}
void IDOptInfo::Clear()
{
static bool did_init = false;
if ( ! did_init )
{
trace_ID = getenv("ZEEK_TRACE_ID");
did_init = true;
}
init_exprs.clear();
usage_regions.clear();
pending_confluences.clear();
confluence_stmts.clear();
tracing = trace_ID && util::streq(trace_ID, my_id->Name());
}
void IDOptInfo::DefinedAfter(const Stmt* s, const ExprPtr& e,
const std::vector<const Stmt*>& conf_blocks,
int conf_start)
{
if ( tracing )
printf("ID %s defined at %d: %s\n", trace_ID, s ? s->GetOptInfo()->stmt_num : NO_DEF, s ? obj_desc(s).c_str() : "<entry>");
if ( ! s )
{ // This is a definition-upon-entry
ASSERT(usage_regions.empty());
usage_regions.emplace_back(0, 0, true, 0);
if ( tracing )
DumpBlocks();
return;
}
auto s_oi = s->GetOptInfo();
auto stmt_num = s_oi->stmt_num;
if ( usage_regions.empty() )
{
// We're seeing this identifier for the first time,
// so we don't have any context or confluence
// information for it. Create its "backstory" region.
ASSERT(confluence_stmts.empty());
usage_regions.emplace_back(0, 0, false, NO_DEF);
}
// Any pending regions stop prior to this statement.
EndRegionsAfter(stmt_num - 1, s_oi->block_level);
// Fill in any missing confluence blocks.
int b = 0; // index into our own blocks
int n = confluence_stmts.size();
while ( b < n && conf_start < conf_blocks.size() )
{
auto outer_block = conf_blocks[conf_start];
// See if we can find that block.
for ( ; b < n; ++b )
if ( confluence_stmts[b] == outer_block )
break;
if ( b < n )
{ // We found it, look for the next one.
++conf_start;
++b;
}
}
// Add in the remainder.
for ( ; conf_start < conf_blocks.size(); ++conf_start )
StartConfluenceBlock(conf_blocks[conf_start]);
// Create a new region corresponding to this definition.
// This needs to come after filling out the confluence
// blocks, since they'll create their own (earlier) regions.
usage_regions.emplace_back(s, true, stmt_num);
usage_regions.back().SetDefExpr(e);
if ( tracing )
DumpBlocks();
}
void IDOptInfo::ReturnAt(const Stmt* s)
{
if ( tracing )
printf("ID %s subject to return %d: %s\n", trace_ID, s->GetOptInfo()->stmt_num, obj_desc(s).c_str());
// Look for a catch-return that this would branch to.
for ( int i = confluence_stmts.size() - 1; i >= 0; --i )
if ( confluence_stmts[i]->Tag() == STMT_CATCH_RETURN )
{
BranchBeyond(s, confluence_stmts[i], false);
if ( tracing )
DumpBlocks();
return;
}
auto s_oi = s->GetOptInfo();
EndRegionsAfter(s_oi->stmt_num - 1, s_oi->block_level);
if ( tracing )
DumpBlocks();
}
void IDOptInfo::BranchBackTo(const Stmt* from, const Stmt* to, bool close_all)
{
if ( tracing )
printf("ID %s branching back from %d->%d: %s\n", trace_ID,
from->GetOptInfo()->stmt_num,
to->GetOptInfo()->stmt_num, obj_desc(from).c_str());
// The key notion we need to update is whether the regions
// between from_reg and to_reg still have unique definitions.
// Confluence due to the branch can only take that away, it
// can't instill it. (OTOH, in principle it could update
// "maybe defined", but not in a way we care about, since we
// only draw upon that for diagnosing usage errors, and for
// those the error has already occurred on entry into the loop.)
auto from_reg = ActiveRegion();
auto f_oi = from->GetOptInfo();
auto t_oi = to->GetOptInfo();
auto t_r_ind = FindRegionBeforeIndex(t_oi->stmt_num);
auto& t_r = usage_regions[t_r_ind];
if ( from_reg && from_reg->DefinedAfter() != t_r.DefinedAfter() &&
t_r.DefinedAfter() != NO_DEF )
{
// They disagree on the definition. Move the definition
// point to be the start of the confluence region, and
// update any blocks inside the region that refer to
// a pre-"to" definition to instead reflect the confluence
// region (and remove their definition expressions).
int new_def = t_oi->stmt_num;
for ( auto i = t_r_ind; i < usage_regions.size(); ++i )
{
auto& ur = usage_regions[i];
if ( ur.DefinedAfter() < new_def )
{
ASSERT(ur.DefinedAfter() != NO_DEF);
ur.UpdateDefinedAfter(new_def);
ur.SetDefExpr(nullptr);
}
}
}
int level = close_all ? t_oi->block_level + 1 : f_oi->block_level;
EndRegionsAfter(f_oi->stmt_num, level);
if ( tracing )
DumpBlocks();
}
void IDOptInfo::BranchBeyond(const Stmt* end_s, const Stmt* block,
bool close_all)
{
if ( tracing )
printf("ID %s branching forward from %d beyond %d: %s\n",
trace_ID, end_s->GetOptInfo()->stmt_num,
block->GetOptInfo()->stmt_num, obj_desc(end_s).c_str());
ASSERT(pending_confluences.count(block) > 0);
auto ar = ActiveRegionIndex();
if ( ar != NO_DEF )
pending_confluences[block].insert(ar);
auto end_oi = end_s->GetOptInfo();
int level;
if ( close_all )
level = block->GetOptInfo()->block_level + 1;
else
level = end_oi->block_level;
EndRegionsAfter(end_oi->stmt_num, level);
if ( tracing )
DumpBlocks();
}
void IDOptInfo::StartConfluenceBlock(const Stmt* s)
{
if ( tracing )
printf("ID %s starting confluence block at %d: %s\n", trace_ID, s->GetOptInfo()->stmt_num, obj_desc(s).c_str());
auto s_oi = s->GetOptInfo();
int block_level = s_oi->block_level;
// End any confluence blocks at this or inner levels.
for ( auto cs : confluence_stmts )
{
ASSERT(cs != s);
auto cs_level = cs->GetOptInfo()->block_level;
if ( cs_level >= block_level )
{
ASSERT(cs_level == block_level);
ASSERT(cs == confluence_stmts.back());
EndRegionsAfter(s_oi->stmt_num - 1, block_level);
}
}
pending_confluences[s] = {};
confluence_stmts.push_back(s);
block_has_orig_flow.push_back(s_oi->contains_branch_beyond);
// Inherit the closest open, outer region, if necessary.
for ( int i = usage_regions.size() - 1; i >= 0; --i )
{
auto& ur = usage_regions[i];
if ( ur.EndsAfter() == NO_DEF )
{
if ( ur.BlockLevel() > block_level )
{
// This can happen for regions left over
// from a previous catch-return, which
// we haven't closed out yet because we
// don't track new identifiers beyond
// outer CRs. Close the region now.
ASSERT(s->Tag() == STMT_CATCH_RETURN);
ur.SetEndsAfter(s_oi->stmt_num - 1);
continue;
}
if ( ur.BlockLevel() < block_level )
// Didn't find one at our own level,
// so create one inherited from the
// outer one.
usage_regions.emplace_back(s, ur);
// We now have one at our level that we can use.
break;
}
}
if ( tracing )
DumpBlocks();
}
void IDOptInfo::ConfluenceBlockEndsAfter(const Stmt* s, bool no_orig_flow)
{
auto stmt_num = s->GetOptInfo()->stmt_num;
ASSERT(! confluence_stmts.empty());
auto cs = confluence_stmts.back();
auto& pc = pending_confluences[cs];
// End any active regions. Those will all have a level >= that
// of cs, since we're now returning to cs's level.
int cs_stmt_num = cs->GetOptInfo()->stmt_num;
int cs_level = cs->GetOptInfo()->block_level;
if ( tracing )
printf("ID %s ending (%d) confluence block (%d, level %d) at %d: %s\n", trace_ID, no_orig_flow, cs_stmt_num, cs_level, stmt_num, obj_desc(s).c_str());
if ( block_has_orig_flow.back() )
no_orig_flow = false;
// Compute the state of the definition at the point of confluence:
// whether it's at least could-be-defined, whether it's definitely
// defined and if so whether it has a single point of definition.
bool maybe = false;
bool defined = true;
bool did_single_def = false;
int single_def = NO_DEF;
ExprPtr single_def_expr;
bool have_multi_defs = false;
int num_regions = 0;
for ( auto i = 0; i < usage_regions.size(); ++i )
{
auto& ur = usage_regions[i];
if ( ur.BlockLevel() < cs_level )
// Region is not applicable.
continue;
if ( ur.EndsAfter() == NO_DEF )
{ // End this region.
ur.SetEndsAfter(stmt_num);
if ( ur.StartsAfter() <= cs_stmt_num && no_orig_flow &&
pc.count(i) == 0 )
// Don't include this region in our assessment.
continue;
}
else if ( ur.EndsAfter() < cs_stmt_num )
// Irrelevant, didn't extend into confluence region.
// We test here just to avoid the set lookup in
// the next test, which presumably will sometimes
// be a tad expensive.
continue;
else if ( pc.count(i) == 0 )
// This region isn't active, and we're not
// tracking it for confluence.
continue;
++num_regions;
maybe = maybe || ur.MaybeDefined();
if ( ur.DefinedAfter() == NO_DEF )
{
defined = false;
continue;
}
if ( did_single_def )
{
if ( single_def != ur.DefinedAfter() )
have_multi_defs = true;
}
else
{
single_def = ur.DefinedAfter();
single_def_expr = ur.DefExprAfter();
did_single_def = true;
}
}
if ( num_regions == 0 )
{ // Nothing survives.
ASSERT(maybe == false);
defined = false;
}
if ( ! defined )
{
single_def = NO_DEF;
have_multi_defs = false;
}
if ( have_multi_defs )
// Definition reflects confluence point, which comes
// just after 's'.
single_def = stmt_num + 1;
int level = cs->GetOptInfo()->block_level;
usage_regions.emplace_back(stmt_num, level, maybe, single_def);
if ( single_def != NO_DEF && ! have_multi_defs )
usage_regions.back().SetDefExpr(single_def_expr);
confluence_stmts.pop_back();
block_has_orig_flow.pop_back();
pending_confluences.erase(cs);
if ( tracing )
DumpBlocks();
}
bool IDOptInfo::IsPossiblyDefinedBefore(const Stmt* s)
{
return IsPossiblyDefinedBefore(s->GetOptInfo()->stmt_num);
}
bool IDOptInfo::IsDefinedBefore(const Stmt* s)
{
return IsDefinedBefore(s->GetOptInfo()->stmt_num);
}
int IDOptInfo::DefinitionBefore(const Stmt* s)
{
return DefinitionBefore(s->GetOptInfo()->stmt_num);
}
ExprPtr IDOptInfo::DefExprBefore(const Stmt* s)
{
return DefExprBefore(s->GetOptInfo()->stmt_num);
}
bool IDOptInfo::IsPossiblyDefinedBefore(int stmt_num)
{
if ( usage_regions.empty() )
return false;
return FindRegionBefore(stmt_num).MaybeDefined();
}
bool IDOptInfo::IsDefinedBefore(int stmt_num)
{
if ( usage_regions.empty() )
return false;
return FindRegionBefore(stmt_num).DefinedAfter() != NO_DEF;
}
int IDOptInfo::DefinitionBefore(int stmt_num)
{
if ( usage_regions.empty() )
return NO_DEF;
return FindRegionBefore(stmt_num).DefinedAfter();
}
ExprPtr IDOptInfo::DefExprBefore(int stmt_num)
{
if ( usage_regions.empty() )
return nullptr;
return FindRegionBefore(stmt_num).DefExprAfter();
}
void IDOptInfo::EndRegionsAfter(int stmt_num, int level)
{
for ( int i = usage_regions.size() - 1; i >= 0; --i )
{
auto& ur = usage_regions[i];
if ( ur.BlockLevel() < level )
return;
if ( ur.EndsAfter() == NO_DEF )
ur.SetEndsAfter(stmt_num);
}
}
int IDOptInfo::FindRegionBeforeIndex(int stmt_num)
{
int region_ind = NO_DEF;
for ( auto i = 0; i < usage_regions.size(); ++i )
{
auto ur = usage_regions[i];
if ( ur.StartsAfter() >= stmt_num )
break;
if ( ur.EndsAfter() == NO_DEF )
// It's active for everything beyond its start.
region_ind = i;
else if ( ur.EndsAfter() >= stmt_num - 1 )
// It's active at the beginning of the statement of
// interest.
region_ind = i;
}
ASSERT(region_ind != NO_DEF);
return region_ind;
}
int IDOptInfo::ActiveRegionIndex()
{
int i;
for ( i = usage_regions.size() - 1; i >= 0; --i )
if ( usage_regions[i].EndsAfter() == NO_DEF )
return i;
return NO_DEF;
}
void IDOptInfo::DumpBlocks() const
{
for ( auto& ur : usage_regions )
ur.Dump();
printf("<end>\n");
}
} // zeek::detail
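
The merge that ConfluenceBlockEndsAfter() performs once it has selected the surviving regions can be summarized as a small pure function. The sketch below is an approximation under stated assumptions: `RegionState`, `MergedState` and `merge_at_confluence` are hypothetical names, the caller is assumed to have already filtered out regions that do not reach the confluence point, and `kNoDef` plays the role of NO_DEF.

```cpp
#include <cassert>
#include <vector>

constexpr int kNoDef = -1;  // stands in for NO_DEF

// Hypothetical summary of one region that reaches the confluence point.
struct RegionState {
    bool maybe_defined;
    int defined_after;  // kNoDef if not definitely defined
};

struct MergedState {
    bool maybe_defined;
    int defined_after;  // kNoDef, a single shared definition, or the
                        // statement just past the confluence point
};

// Combine the regions flowing into a confluence point that comes just
// after statement `stmt_num`.
MergedState merge_at_confluence(const std::vector<RegionState>& incoming,
                                int stmt_num) {
    bool maybe = false;
    bool defined = !incoming.empty();  // nothing survives => undefined
    bool seen_def = false;
    bool multi_defs = false;
    int single_def = kNoDef;

    for (const auto& r : incoming) {
        maybe = maybe || r.maybe_defined;
        if (r.defined_after == kNoDef) {
            defined = false;
            continue;
        }
        if (!seen_def) {
            single_def = r.defined_after;
            seen_def = true;
        } else if (single_def != r.defined_after) {
            multi_defs = true;
        }
    }

    if (!defined)
        return {maybe, kNoDef};
    assert(seen_def);
    // Differing definitions collapse to the confluence point itself.
    return {true, multi_defs ? stmt_num + 1 : single_def};
}
```

When the incoming definitions differ, the merged region is still definitely defined but no longer has a single defining expression, which is why the real code attaches a definition expression only in the single-definition case.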

272
src/script_opt/IDOptInfo.h Normal file

@ -0,0 +1,272 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Auxiliary information associated with identifiers to aid script
// optimization.
#pragma once
#include <set>
#include "zeek/IntrusivePtr.h"
namespace zeek::detail {
class Expr;
class Stmt;
using ExprPtr = IntrusivePtr<Expr>;
#define NO_DEF -1
// This class tracks a single region during which an identifier has
// a consistent state of definition, meaning either it's (1) defined
// as of its value after a specific statement, (2) might-or-might-not
// be defined, or (3) definitely not defined.
class IDDefRegion {
public:
IDDefRegion(const Stmt* s, bool maybe, int def);
IDDefRegion(int stmt_num, int level, bool maybe, int def);
IDDefRegion(const Stmt* s, const IDDefRegion& ur);
void Init(bool maybe, int def)
{
if ( def != NO_DEF )
maybe_defined = true;
else
maybe_defined = maybe;
defined = def;
}
// Returns the starting point of the region, i.e., the number
// of the statement *after* which executing this region begins.
int StartsAfter() const { return start_stmt; }
// Returns or sets the ending point of the region, i.e., the
// last statement for which this region applies (including executing
// that statement). A value of NO_DEF means that the region
// continues indefinitely, i.e., we haven't yet encountered its end.
int EndsAfter() const { return end_stmt; }
void SetEndsAfter(int _end_stmt) { end_stmt = _end_stmt; }
// The confluence nesting level associated with the region. Other
// regions that overlap take precedence if they have a higher
// (= more inner) block level.
int BlockLevel() const { return block_level; }
// True if in the region the identifier could be defined.
bool MaybeDefined() const { return maybe_defined; }
// Returns (or sets) the statement after which the identifier is
// (definitely) defined, or NO_DEF if it doesn't have a definite
// point of definition.
int DefinedAfter() const { return defined; }
void UpdateDefinedAfter(int _defined) { defined = _defined; }
// Returns (or sets) the expression used to define the identifier,
// if any. Note that an identifier can be definitely defined
// (i.e., DefinedAfter() returns a statement number, not NO_DEF)
// but not have an associated expression, if the point-of-definition
// is the end of a confluence block.
const ExprPtr& DefExprAfter() const { return def_expr; }
void SetDefExpr(ExprPtr e) { def_expr = e; }
// Used for debugging.
void Dump() const;
protected:
// Number of the statement for which this region applies *after*
// its execution.
int start_stmt;
// Number of the statement that this region applies to, *after*
// its execution.
int end_stmt = NO_DEF; // means the region hasn't ended yet
// Degree of confluence nesting associated with this region.
int block_level;
// Identifier could be defined in this region.
bool maybe_defined;
// If not NO_DEF, then the statement number of either the identifier's
// definition, or its confluence point if multiple, differing
// definitions come together.
int defined;
// The expression used to define the identifier in this region.
// Nil if either it's ambiguous (due to confluence), or the
// identifier isn't guaranteed to be defined.
ExprPtr def_expr;
};
// Class tracking optimization information associated with identifiers.
class IDOptInfo {
public:
IDOptInfo(const ID* id) { my_id = id; }
// Reset all computed information about the identifier. Used
// when making a second pass over an AST after optimizing it,
// to avoid inheriting now-stale information.
void Clear();
// Used to track expressions employed when explicitly initializing
// the identifier. These are needed by compile-to-C++ script
// optimization. They're not used by ZAM optimization.
void AddInitExpr(ExprPtr init_expr);
const std::vector<ExprPtr>& GetInitExprs() const
{ return init_exprs; }
// Associated constant expression, if any. This is only set
// for identifiers that are aliases for a constant (i.e., there
// are no other assignments to them).
const ConstExpr* Const() const { return const_expr; }
// The most use of "const" in any single line in the Zeek
// codebase :-P ... though only by one!
void SetConst(const ConstExpr* _const) { const_expr = _const; }
// Whether the identifier is a temporary variable. Temporaries
// are guaranteed to have exactly one point of definition.
bool IsTemp() const { return is_temp; }
void SetTemp() { is_temp = true; }
// Called when the identifier is defined via execution of the
// given statement, with an assignment to the expression 'e'
// (only non-nil for simple direct assignments). "conf_blocks"
// gives the full set of surrounding confluence statements.
// It should be processed starting at conf_start (note that
// conf_blocks may be empty).
void DefinedAfter(const Stmt* s, const ExprPtr& e,
const std::vector<const Stmt*>& conf_blocks,
int conf_start);
// Called upon encountering a "return" statement.
void ReturnAt(const Stmt* s);
// Called when the current region ends with a backwards branch,
// possibly across multiple block levels, occurring at "from"
// and going into the block "to". If "close_all" is true then
// any pending regions at a level inner to "to" should be
// closed; if not, just those at "from"'s level.
void BranchBackTo(const Stmt* from, const Stmt* to, bool close_all);
// Called when the current region ends at statement end_s with a
// forwards branch, possibly across multiple block levels, to
// the statement that comes right after the execution of "block".
// See above re "close_all".
void BranchBeyond(const Stmt* end_s, const Stmt* block, bool close_all);
// Start tracking a confluence block that begins with the body
// of s (not s itself).
void StartConfluenceBlock(const Stmt* s);
// Finish tracking confluence; s is the last point of execution
// prior to leaving a block. If no_orig_flow is true, then
// the region for 's' itself does not continue to the end of
// the block.
void ConfluenceBlockEndsAfter(const Stmt* s, bool no_orig_flow);
// All of these regard the identifier's state just *prior* to
// executing the given statement.
bool IsPossiblyDefinedBefore(const Stmt* s);
bool IsDefinedBefore(const Stmt* s);
int DefinitionBefore(const Stmt* s);
ExprPtr DefExprBefore(const Stmt* s);
// Same, but using statement numbers.
bool IsPossiblyDefinedBefore(int stmt_num);
bool IsDefinedBefore(int stmt_num);
int DefinitionBefore(int stmt_num);
ExprPtr DefExprBefore(int stmt_num);
// The following are used to avoid multiple error messages
// for use of undefined variables.
bool DidUndefinedWarning() const
{ return did_undefined_warning; }
bool DidPossiblyUndefinedWarning() const
{ return did_possibly_undefined_warning; }
void SetDidUndefinedWarning()
{ did_undefined_warning = true; }
void SetDidPossiblyUndefinedWarning()
{ did_possibly_undefined_warning = true; }
private:
// End any active regions that are at or inner to the given level.
void EndRegionsAfter(int stmt_num, int level);
// Find the region that applies *before* executing the given
// statement. There should always be such a region.
IDDefRegion& FindRegionBefore(int stmt_num)
{ return usage_regions[FindRegionBeforeIndex(stmt_num)]; }
int FindRegionBeforeIndex(int stmt_num);
// Return the current "active" region, if any. The active region
// is the innermost region that currently has an end of NO_DEF,
// meaning we have not yet found its end.
IDDefRegion* ActiveRegion()
{
auto ind = ActiveRegionIndex();
return ind >= 0 ? &usage_regions[ind] : nullptr;
}
int ActiveRegionIndex();
// Used for debugging.
void DumpBlocks() const;
// Expressions used to initialize the identifier, for use by
// the scripts-to-C++ compiler. We need to track all of them
// because it's possible that a global value gets created using
// one of the earlier instances rather than the last one.
std::vector<ExprPtr> init_exprs;
// If non-nil, a constant that this identifier always holds
// once initially defined.
const ConstExpr* const_expr = nullptr;
// The different usage regions associated with the identifier.
// These are constructed such that they're always with non-decreasing
// starting statements.
std::vector<IDDefRegion> usage_regions;
// A type for collecting the indices of usage_regions that will
// all have confluence together at one point. Used to track
// things like "break" statements that jump out of loops or
// switch confluence regions.
using ConfluenceSet = std::set<int>;
// Maps loops/switches/catch-returns to their associated
// confluence sets.
std::map<const Stmt*, ConfluenceSet> pending_confluences;
// A stack of active confluence statements, so we can always find
// the innermost when ending a confluence block.
std::vector<const Stmt*> confluence_stmts;
// Parallel vector that tracks whether, upon creating the
// confluence block, there had already been observed internal flow
// going beyond it. If so, then we can ignore no_orig_flow when
// ending the block, because in fact there *was* original flow.
std::vector<bool> block_has_orig_flow;
// Whether the identifier is a temporary variable.
bool is_temp = false;
// Only needed for debugging purposes.
const ID* my_id;
bool tracing = false;
// Track whether we've already generated usage errors.
bool did_undefined_warning = false;
bool did_possibly_undefined_warning = false;
};
// If non-nil, then output detailed tracing information when building
// up the usage regions for any identifier with the given name.
extern const char* trace_ID;
} // namespace zeek::detail
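
// Illustrative sketch, not part of this commit: one way the "active region"
// lookup described above could work. It assumes that NO_DEF is -1 and that,
// because regions are kept in non-decreasing start order, the last region
// still open is the innermost one. Region and its fields are hypothetical
// stand-ins for IDDefRegion; the real implementation may differ.

```cpp
#include <vector>

constexpr int NO_DEF = -1; // assumed sentinel value, not taken from the source

// Hypothetical, pared-down region: end_stmt stays NO_DEF while the region is open.
struct Region
	{
	int start_stmt;
	int end_stmt;
	};

// Returns the index of the most recently started region that is still open,
// or -1 if every region has already been ended.
int ActiveRegionIndex(const std::vector<Region>& regions)
	{
	for ( int i = static_cast<int>(regions.size()) - 1; i >= 0; --i )
		if ( regions[i].end_stmt == NO_DEF )
			return i;

	return -1;
	}

// Mirrors the header's ActiveRegion(): nullptr when no region is open.
const Region* ActiveRegion(const std::vector<Region>& regions)
	{
	auto ind = ActiveRegionIndex(regions);
	return ind >= 0 ? &regions[ind] : nullptr;
	}
```

// Scanning from the back keeps the lookup simple; it relies on the ordering
// invariant stated for usage_regions.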


@ -4,6 +4,7 @@
#include <cerrno> #include <cerrno>
#include "zeek/script_opt/ProfileFunc.h" #include "zeek/script_opt/ProfileFunc.h"
#include "zeek/script_opt/IDOptInfo.h"
#include "zeek/Desc.h" #include "zeek/Desc.h"
#include "zeek/Stmt.h" #include "zeek/Stmt.h"
#include "zeek/Func.h" #include "zeek/Func.h"
@ -261,13 +262,17 @@ TraversalCode ProfileFunc::PreExpr(const Expr* e)
} }
break; break;
case EXPR_INCR:
case EXPR_DECR:
case EXPR_ADD_TO:
case EXPR_REMOVE_FROM:
case EXPR_ASSIGN: case EXPR_ASSIGN:
{ {
if ( e->GetOp1()->Tag() == EXPR_REF ) if ( e->GetOp1()->Tag() == EXPR_REF )
{ {
auto lhs = e->GetOp1()->GetOp1(); auto lhs = e->GetOp1()->GetOp1();
if ( lhs->Tag() == EXPR_NAME ) if ( lhs->Tag() == EXPR_NAME )
assignees.insert(lhs->AsNameExpr()->Id()); TrackAssignment(lhs->AsNameExpr()->Id());
} }
// else this isn't a direct assignment. // else this isn't a direct assignment.
break; break;
@ -432,6 +437,14 @@ void ProfileFunc::TrackID(const ID* id)
ordered_ids.push_back(id); ordered_ids.push_back(id);
} }
void ProfileFunc::TrackAssignment(const ID* id)
{
if ( assignees.count(id) > 0 )
++assignees[id];
else
assignees[id] = 1;
}
ProfileFuncs::ProfileFuncs(std::vector<FuncInfo>& funcs, ProfileFuncs::ProfileFuncs(std::vector<FuncInfo>& funcs,
is_compilable_pred pred, bool _full_record_hashes) is_compilable_pred pred, bool _full_record_hashes)
@ -446,7 +459,7 @@ ProfileFuncs::ProfileFuncs(std::vector<FuncInfo>& funcs,
auto pf = std::make_unique<ProfileFunc>(f.Func(), f.Body(), auto pf = std::make_unique<ProfileFunc>(f.Func(), f.Body(),
full_record_hashes); full_record_hashes);
if ( ! pred || (*pred)(pf.get()) ) if ( ! pred || (*pred)(pf.get(), nullptr) )
MergeInProfile(pf.get()); MergeInProfile(pf.get());
else else
f.SetSkip(true); f.SetSkip(true);
@ -488,7 +501,7 @@ void ProfileFuncs::MergeInProfile(ProfileFunc* pf)
if ( t->Tag() == TYPE_TYPE ) if ( t->Tag() == TYPE_TYPE )
(void) HashType(t->AsTypeType()->GetType()); (void) HashType(t->AsTypeType()->GetType());
auto& init_exprs = g->GetInitExprs(); auto& init_exprs = g->GetOptInfo()->GetInitExprs();
for ( const auto& i_e : init_exprs ) for ( const auto& i_e : init_exprs )
if ( i_e ) if ( i_e )
{ {
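
// Illustrative sketch, not part of this commit: the per-identifier assignment
// counting that TrackAssignment() introduces, reduced to a standalone class.
// ID here is a hypothetical stand-in for zeek::detail::ID, and
// AssignmentCounter and NumAssignments() are names invented for the example.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an identifier; only its address is used as a key.
struct ID
	{
	std::string name;
	};

class AssignmentCounter {
public:
	// Same effect as the count()-then-branch form: operator[] value-initializes
	// a missing entry to 0 before the increment is applied.
	void TrackAssignment(const ID* id)
		{ ++assignees[id]; }

	// Number of assignments seen so far, or 0 if the identifier was never
	// assigned. Uses find() so that a lookup does not create an entry.
	int NumAssignments(const ID* id) const
		{
		auto it = assignees.find(id);
		return it == assignees.end() ? 0 : it->second;
		}

	const std::unordered_map<const ID*, int>& Assignees() const
		{ return assignees; }

private:
	std::unordered_map<const ID*, int> assignees;
};
```

// Keeping "no entry if never assigned" (rather than a zero entry) matches the
// comment on the assignees member in the header.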


@ -103,7 +103,7 @@ public:
{ return locals; } { return locals; }
const std::unordered_set<const ID*>& Params() const const std::unordered_set<const ID*>& Params() const
{ return params; } { return params; }
const std::unordered_set<const ID*>& Assignees() const const std::unordered_map<const ID*, int>& Assignees() const
{ return assignees; } { return assignees; }
const std::unordered_set<const ID*>& Inits() const const std::unordered_set<const ID*>& Inits() const
{ return inits; } { return inits; }
@ -166,6 +166,9 @@ protected:
// Take note of the presence of an identifier. // Take note of the presence of an identifier.
void TrackID(const ID* id); void TrackID(const ID* id);
// Take note of an assignment to an identifier.
void TrackAssignment(const ID* id);
// Globals seen in the function. // Globals seen in the function.
// //
// Does *not* include globals solely seen as the function being // Does *not* include globals solely seen as the function being
@ -187,10 +190,11 @@ protected:
// function. // function.
int num_params = -1; int num_params = -1;
// Identifiers (globals, locals, parameters) that are assigned to. // Maps identifiers (globals, locals, parameters) to how often
// Does not include implicit assignments due to initializations, // they are assigned to (no entry if never). Does not include
// which are instead captured in "inits". // implicit assignments due to initializations, which are instead
std::unordered_set<const ID*> assignees; // captured in "inits".
std::unordered_map<const ID*, int> assignees;
// Same for locals seen in initializations, so we can find, // Same for locals seen in initializations, so we can find,
// for example, unused aggregates. // for example, unused aggregates.
@ -277,7 +281,7 @@ protected:
// profile is compilable. Alternatively we could derive subclasses // profile is compilable. Alternatively we could derive subclasses
// from ProfileFuncs and use a virtual method for this, but that seems // from ProfileFuncs and use a virtual method for this, but that seems
// heavier-weight for what's really a simple notion. // heavier-weight for what's really a simple notion.
typedef bool (*is_compilable_pred)(const ProfileFunc*); using is_compilable_pred = bool (*)(const ProfileFunc*, const char** reason);
// Collectively profile an entire collection of functions. // Collectively profile an entire collection of functions.
class ProfileFuncs { class ProfileFuncs {
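
// Illustrative sketch, not part of this commit: how a predicate with the new
// is_compilable_pred signature can report why a function is skipped. The
// ProfileFunc struct, its fields, the reason strings, and describe() are all
// hypothetical; only the function-pointer signature comes from the diff.

```cpp
#include <string>

// Hypothetical, pared-down profile holding just two counters.
struct ProfileFunc
	{
	int num_when_stmts = 0;
	int num_lambdas = 0;
	};

using is_compilable_pred = bool (*)(const ProfileFunc*, const char** reason);

// Sample predicate. "reason" may be null, as in the (*pred)(pf.get(), nullptr)
// call, in which case no explanation is written.
bool is_compilable(const ProfileFunc* pf, const char** reason)
	{
	const char* why = nullptr;

	if ( pf->num_when_stmts > 0 )
		why = "use of \"when\"";
	else if ( pf->num_lambdas > 0 )
		why = "use of lambda";

	if ( why && reason )
		*reason = why;

	return why == nullptr;
	}

// Caller side: a null predicate means "everything is compilable".
std::string describe(const ProfileFunc* pf, is_compilable_pred pred)
	{
	const char* reason = nullptr;

	if ( ! pred || (*pred)(pf, &reason) )
		return "compilable";

	return std::string("skipped due to ") + (reason ? reason : "unknown reason");
	}
```

// Initializing the caller's reason to nullptr keeps the message well defined
// even if a predicate returns false without filling it in.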


@ -190,7 +190,7 @@ public:
if ( ! dps || dps->length() != 1 ) if ( ! dps || dps->length() != 1 )
return false; return false;
return (*dps)[0].Tag() != NO_DEF; return (*dps)[0].Tag() != NO_DEF_POINT;
} }
// Whether the given definition item has an RD at the given // Whether the given definition item has an RD at the given


@ -7,6 +7,8 @@
#include "zeek/Stmt.h" #include "zeek/Stmt.h"
#include "zeek/Desc.h" #include "zeek/Desc.h"
#include "zeek/Reporter.h" #include "zeek/Reporter.h"
#include "zeek/script_opt/ExprOptInfo.h"
#include "zeek/script_opt/StmtOptInfo.h"
#include "zeek/script_opt/ProfileFunc.h" #include "zeek/script_opt/ProfileFunc.h"
#include "zeek/script_opt/Reduce.h" #include "zeek/script_opt/Reduce.h"
#include "zeek/script_opt/TempVar.h" #include "zeek/script_opt/TempVar.h"
@ -33,6 +35,10 @@ ExprPtr Reducer::GenTemporaryExpr(const TypePtr& t, ExprPtr rhs)
{ {
auto e = make_intrusive<NameExpr>(GenTemporary(t, rhs)); auto e = make_intrusive<NameExpr>(GenTemporary(t, rhs));
e->SetLocationInfo(rhs->GetLocationInfo()); e->SetLocationInfo(rhs->GetLocationInfo());
// No need to associate with current statement, since these
// are not generated during optimization.
return e; return e;
} }
@ -41,7 +47,13 @@ NameExprPtr Reducer::UpdateName(NameExprPtr n)
if ( NameIsReduced(n.get()) ) if ( NameIsReduced(n.get()) )
return n; return n;
return make_intrusive<NameExpr>(FindNewLocal(n)); auto ne = make_intrusive<NameExpr>(FindNewLocal(n));
// This name can be used by follow-on optimization analysis,
// so need to associate it with its statement.
BindExprToCurrStmt(ne);
return ne;
} }
bool Reducer::NameIsReduced(const NameExpr* n) const bool Reducer::NameIsReduced(const NameExpr* n) const
@ -106,6 +118,8 @@ bool Reducer::ID_IsReduced(const ID* id) const
NameExprPtr Reducer::GenInlineBlockName(const IDPtr& id) NameExprPtr Reducer::GenInlineBlockName(const IDPtr& id)
{ {
// We do this during reduction, not optimization, so no need
// to associate with curr_stmt.
return make_intrusive<NameExpr>(GenLocal(id)); return make_intrusive<NameExpr>(GenLocal(id));
} }
@ -118,6 +132,7 @@ NameExprPtr Reducer::PushInlineBlock(TypePtr type)
IDPtr ret_id = install_ID("@retvar", "<internal>", false, false); IDPtr ret_id = install_ID("@retvar", "<internal>", false, false);
ret_id->SetType(type); ret_id->SetType(type);
ret_id->GetOptInfo()->SetTemp();
// Track this as a new local *if* we're in the outermost inlining // Track this as a new local *if* we're in the outermost inlining
// block. If we're recursively deeper into inlining, then this // block. If we're recursively deeper into inlining, then this
@ -141,48 +156,22 @@ bool Reducer::SameVal(const Val* v1, const Val* v2) const
return v1 == v2; return v1 == v2;
} }
ExprPtr Reducer::NewVarUsage(IDPtr var, const DefPoints* dps, const Expr* orig) ExprPtr Reducer::NewVarUsage(IDPtr var, const Expr* orig)
{ {
if ( ! dps )
reporter->InternalError("null defpoints in NewVarUsage");
auto var_usage = make_intrusive<NameExpr>(var); auto var_usage = make_intrusive<NameExpr>(var);
SetDefPoints(var_usage.get(), dps); BindExprToCurrStmt(var_usage);
TrackExprReplacement(orig, var_usage.get());
return var_usage; return var_usage;
} }
const DefPoints* Reducer::GetDefPoints(const NameExpr* var) void Reducer::BindExprToCurrStmt(const ExprPtr& e)
{ {
auto dps = FindDefPoints(var); e->GetOptInfo()->stmt_num = curr_stmt->GetOptInfo()->stmt_num;
if ( ! dps )
{
auto id = var->Id();
auto di = mgr->GetConstID_DI(id);
auto rds = mgr->GetPreMaxRDs(GetRDLookupObj(var));
dps = rds->GetDefPoints(di);
SetDefPoints(var, dps);
} }
return dps; void Reducer::BindStmtToCurrStmt(const StmtPtr& s)
}
const DefPoints* Reducer::FindDefPoints(const NameExpr* var) const
{ {
auto dps = var_usage_to_DPs.find(var); s->GetOptInfo()->stmt_num = curr_stmt->GetOptInfo()->stmt_num;
if ( dps == var_usage_to_DPs.end() )
return nullptr;
else
return dps->second;
}
void Reducer::SetDefPoints(const NameExpr* var, const DefPoints* dps)
{
var_usage_to_DPs[var] = dps;
} }
bool Reducer::SameOp(const Expr* op1, const Expr* op2) bool Reducer::SameOp(const Expr* op1, const Expr* op2)
@ -196,7 +185,7 @@ bool Reducer::SameOp(const Expr* op1, const Expr* op2)
if ( op1->Tag() == EXPR_NAME ) if ( op1->Tag() == EXPR_NAME )
{ {
// Needs to be both the same identifier and in contexts // Needs to be both the same identifier and in contexts
// where the identifier has the same definition points. // where the identifier has the same definitions.
auto op1_n = op1->AsNameExpr(); auto op1_n = op1->AsNameExpr();
auto op2_n = op2->AsNameExpr(); auto op2_n = op2->AsNameExpr();
@ -206,10 +195,13 @@ bool Reducer::SameOp(const Expr* op1, const Expr* op2)
if ( op1_id != op2_id ) if ( op1_id != op2_id )
return false; return false;
auto op1_dps = GetDefPoints(op1_n); auto e_stmt_1 = op1->GetOptInfo()->stmt_num;
auto op2_dps = GetDefPoints(op2_n); auto e_stmt_2 = op2->GetOptInfo()->stmt_num;
return same_DPs(op1_dps, op2_dps); auto def_1 = op1_id->GetOptInfo()->DefinitionBefore(e_stmt_1);
auto def_2 = op2_id->GetOptInfo()->DefinitionBefore(e_stmt_2);
return def_1 == def_2 && def_1 != NO_DEF;
} }
else if ( op1->Tag() == EXPR_CONST ) else if ( op1->Tag() == EXPR_CONST )
@ -391,11 +383,10 @@ IDPtr Reducer::FindExprTmp(const Expr* rhs, const Expr* a,
// always makes it here. // always makes it here.
auto id = et_i->Id().get(); auto id = et_i->Id().get();
// We use 'a' in the following rather than rhs auto stmt_num = a->GetOptInfo()->stmt_num;
// because the RHS can get rewritten (for example, auto def = id->GetOptInfo()->DefinitionBefore(stmt_num);
// due to folding) after we generate RDs, and
// thus might not have any. if ( def == NO_DEF )
if ( ! mgr->HasSinglePreMinRD(a, id) )
// The temporary's value isn't guaranteed // The temporary's value isn't guaranteed
// to make it here. // to make it here.
continue; continue;
@ -470,8 +461,6 @@ void Reducer::CheckIDs(const Expr* e, std::vector<const ID*>& ids) const
bool Reducer::IsCSE(const AssignExpr* a, const NameExpr* lhs, const Expr* rhs) bool Reducer::IsCSE(const AssignExpr* a, const NameExpr* lhs, const Expr* rhs)
{ {
auto a_max_rds = mgr->GetPostMaxRDs(GetRDLookupObj(a));
auto lhs_id = lhs->Id(); auto lhs_id = lhs->Id();
auto lhs_tmp = FindTemporary(lhs_id); // nil if LHS not a temporary auto lhs_tmp = FindTemporary(lhs_id); // nil if LHS not a temporary
auto rhs_tmp = FindExprTmp(rhs, a, lhs_tmp); auto rhs_tmp = FindExprTmp(rhs, a, lhs_tmp);
@ -479,9 +468,7 @@ bool Reducer::IsCSE(const AssignExpr* a, const NameExpr* lhs, const Expr* rhs)
ExprPtr new_rhs; ExprPtr new_rhs;
if ( rhs_tmp ) if ( rhs_tmp )
{ // We already have a temporary { // We already have a temporary
auto tmp_di = mgr->GetConstID_DI(rhs_tmp.get()); new_rhs = NewVarUsage(rhs_tmp, rhs);
auto dps = a_max_rds->GetDefPoints(tmp_di);
new_rhs = NewVarUsage(rhs_tmp, dps, rhs);
rhs = new_rhs.get(); rhs = new_rhs.get();
} }
@ -507,104 +494,73 @@ bool Reducer::IsCSE(const AssignExpr* a, const NameExpr* lhs, const Expr* rhs)
// Treat the LHS as either an alias for the RHS, // Treat the LHS as either an alias for the RHS,
// or as a constant if the RHS is a constant in // or as a constant if the RHS is a constant in
// this context. // this context.
auto rhs_di = mgr->GetConstID_DI(rhs_id.get()); auto stmt_num = a->GetOptInfo()->stmt_num;
auto dps = a_max_rds->GetDefPoints(rhs_di); auto rhs_const = CheckForConst(rhs_id, stmt_num);
auto rhs_const = CheckForConst(rhs_id, dps);
if ( rhs_const ) if ( rhs_const )
lhs_tmp->SetConst(rhs_const); lhs_tmp->SetConst(rhs_const);
else else
lhs_tmp->SetAlias(rhs_id, dps); lhs_tmp->SetAlias(rhs_id);
return true; return true;
} }
// Track where we define the temporary.
auto lhs_di = mgr->GetConstID_DI(lhs_id);
auto dps = a_max_rds->GetDefPoints(lhs_di);
if ( lhs_tmp->DPs() && ! same_DPs(lhs_tmp->DPs(), dps) )
reporter->InternalError("double DPs for temporary");
lhs_tmp->SetDPs(dps);
SetDefPoints(lhs, dps);
expr_temps.emplace_back(lhs_tmp); expr_temps.emplace_back(lhs_tmp);
} }
return false; return false;
} }
const ConstExpr* Reducer::CheckForConst(const IDPtr& id, const ConstExpr* Reducer::CheckForConst(const IDPtr& id, int stmt_num) const
const DefPoints* dps) const
{ {
if ( ! dps || dps->length() == 0 ) if ( id->GetType()->Tag() == TYPE_ANY )
// This can happen for access to uninitialized values. // Don't propagate identifiers of type "any" as constants.
// This is because the identifier might be used in some
// context that's dynamically unreachable due to the type
// of its value (such as via a type-switch), but for which
// constant propagation of the constant value to that
// context can result in compile-time errors when folding
// expressions in which the identifier appears (and is
// in that context presumed to have a different type).
return nullptr; return nullptr;
if ( dps->length() != 1 ) auto oi = id->GetOptInfo();
// Multiple definitions of the variable reach to this auto c = oi->Const();
// location. In theory we could check whether they *all*
// provide the same constant, but that hardly seems likely. if ( c )
return c;
auto e = id->GetOptInfo()->DefExprBefore(stmt_num);
if ( e )
{
auto ce = constant_exprs.find(e.get());
if ( ce != constant_exprs.end() )
e = ce->second;
if ( e->Tag() == EXPR_CONST )
return e->AsConstExpr();
// Follow aliases.
if ( e->Tag() != EXPR_NAME )
return nullptr; return nullptr;
// Identifier has a unique definition. return CheckForConst(e->AsNameExpr()->IdPtr(), stmt_num);
auto dp = (*dps)[0]; }
const Expr* e = nullptr;
if ( dp.Tag() == STMT_DEF )
{
auto s = dp.StmtVal();
if ( s->Tag() == STMT_CATCH_RETURN )
{
// Change 's' to refer to the associated assignment
// statement, if any.
auto cr = s->AsCatchReturnStmt();
s = cr->AssignStmt().get();
if ( ! s )
return nullptr; return nullptr;
} }
if ( s->Tag() != STMT_EXPR ) ConstExprPtr Reducer::Fold(ExprPtr e)
// Defined in a statement other than an assignment.
return nullptr;
e = s->AsExprStmt()->StmtExpr();
}
else if ( dp.Tag() == EXPR_DEF )
e = dp.ExprVal();
else
return nullptr;
if ( e->Tag() != EXPR_ASSIGN )
// Not sure why this would happen, other than EXPR_APPEND_TO,
// but in any case not an expression we can mine for a
// constant.
return nullptr;
auto rhs = e->GetOp2();
if ( rhs->Tag() != EXPR_CONST )
return nullptr;
return rhs->AsConstExpr();
}
void Reducer::TrackExprReplacement(const Expr* orig, const Expr* e)
{ {
new_expr_to_orig[e] = orig; auto c = make_intrusive<ConstExpr>(e->Eval(nullptr));
FoldedTo(e, c);
return c;
} }
const Obj* Reducer::GetRDLookupObj(const Expr* e) const void Reducer::FoldedTo(ExprPtr e, ConstExprPtr c)
{ {
auto orig_e = new_expr_to_orig.find(e); constant_exprs[e.get()] = std::move(c);
if ( orig_e == new_expr_to_orig.end() ) folded_exprs.push_back(std::move(e));
return e;
else
return orig_e->second;
} }
ExprPtr Reducer::OptExpr(Expr* e) ExprPtr Reducer::OptExpr(Expr* e)
@ -635,13 +591,10 @@ ExprPtr Reducer::UpdateExpr(ExprPtr e)
auto tmp_var = FindTemporary(id); auto tmp_var = FindTemporary(id);
if ( ! tmp_var ) if ( ! tmp_var )
{ {
auto max_rds = mgr->GetPreMaxRDs(GetRDLookupObj(n));
IDPtr id_ptr = {NewRef{}, id}; IDPtr id_ptr = {NewRef{}, id};
auto di = mgr->GetConstID_DI(id); auto stmt_num = e->GetOptInfo()->stmt_num;
auto dps = max_rds->GetDefPoints(di); auto is_const = CheckForConst(id_ptr, stmt_num);
auto is_const = CheckForConst(id_ptr, dps);
if ( is_const ) if ( is_const )
{ {
// Remember this variable as one whose value // Remember this variable as one whose value
@ -662,36 +615,33 @@ ExprPtr Reducer::UpdateExpr(ExprPtr e)
auto alias = tmp_var->Alias(); auto alias = tmp_var->Alias();
if ( alias ) if ( alias )
{ {
// Make sure that the definition points for the // Make sure that the definitions for the alias here are
// alias here are the same as when the alias // the same as when the alias was created.
// was created.
auto alias_tmp = FindTemporary(alias.get()); auto alias_tmp = FindTemporary(alias.get());
if ( alias_tmp ) // Resolve any alias chains.
while ( alias_tmp && alias_tmp->Alias() )
{ {
while ( alias_tmp->Alias() ) alias = alias_tmp->Alias();
{ alias_tmp = FindTemporary(alias.get());
// Alias chains can occur due to
// re-reduction while optimizing.
auto a_id = alias_tmp->Id();
if ( a_id == id )
return e;
alias_tmp = FindTemporary(alias_tmp->Id().get());
} }
// Temporaries always have only one definition point, if ( alias->GetOptInfo()->IsTemp() )
{
// Temporaries always have only one definition,
// so no need to check for consistency. // so no need to check for consistency.
auto new_usage = NewVarUsage(alias, alias_tmp->DPs(), e.get()); auto new_usage = NewVarUsage(alias, e.get());
return new_usage; return new_usage;
} }
auto e_max_rds = mgr->GetPreMaxRDs(GetRDLookupObj(e.get())); auto e_stmt_1 = e->GetOptInfo()->stmt_num;
auto alias_di = mgr->GetConstID_DI(alias.get()); auto e_stmt_2 = tmp_var->RHS()->GetOptInfo()->stmt_num;
auto alias_dps = e_max_rds->GetDefPoints(alias_di);
if ( same_DPs(alias_dps, tmp_var->DPs()) ) auto def_1 = alias->GetOptInfo()->DefinitionBefore(e_stmt_1);
return NewVarUsage(alias, alias_dps, e.get()); auto def_2 = tmp_var->Id()->GetOptInfo()->DefinitionBefore(e_stmt_2);
if ( def_1 == def_2 && def_1 != NO_DEF )
return NewVarUsage(alias, e.get());
else else
return e; return e;
} }
@ -758,9 +708,17 @@ StmtPtr Reducer::MergeStmts(const NameExpr* lhs, ExprPtr rhs, Stmt* succ_stmt)
lhs_tmp->Deactivate(); lhs_tmp->Deactivate();
auto merge_e = make_intrusive<AssignExpr>(a_lhs_deref, rhs, false, auto merge_e = make_intrusive<AssignExpr>(a_lhs_deref, rhs, false,
nullptr, nullptr, false); nullptr, nullptr, false);
TrackExprReplacement(rhs.get(), merge_e.get()); auto merge_e_stmt = make_intrusive<ExprStmt>(merge_e);
return make_intrusive<ExprStmt>(merge_e); // Update the associated stmt_num's. For strict correctness, we
// want both of these bound to the earlier of the two statements
// we're merging (though in practice, either will work, since
// we're eliding the only difference between the two). Our
// caller ensures this.
BindExprToCurrStmt(merge_e);
BindStmtToCurrStmt(merge_e_stmt);
return merge_e_stmt;
} }
IDPtr Reducer::GenTemporary(const TypePtr& t, ExprPtr rhs) IDPtr Reducer::GenTemporary(const TypePtr& t, ExprPtr rhs)
@ -809,6 +767,9 @@ IDPtr Reducer::GenLocal(const IDPtr& orig)
local_id->SetType(orig->GetType()); local_id->SetType(orig->GetType());
local_id->SetAttrs(orig->GetAttrs()); local_id->SetAttrs(orig->GetAttrs());
if ( orig->GetOptInfo()->IsTemp() )
local_id->GetOptInfo()->SetTemp();
new_locals.insert(local_id.get()); new_locals.insert(local_id.get());
orig_to_new_locals[orig.get()] = local_id; orig_to_new_locals[orig.get()] = local_id;
@ -1040,27 +1001,6 @@ bool CSE_ValidityChecker::CheckAggrMod(const std::vector<const ID*>& ids,
} }
bool same_DPs(const DefPoints* dp1, const DefPoints* dp2)
{
if ( dp1 == dp2 )
return true;
if ( ! dp1 || ! dp2 )
return false;
// Given how we construct DPs, they should be element-by-element
// equivalent; we don't have to worry about reordering.
if ( dp1->length() != dp2->length() )
return false;
for ( auto i = 0; i < dp1->length(); ++i )
if ( ! (*dp1)[i].SameAs((*dp2)[i]) )
return false;
return true;
}
const Expr* non_reduced_perp; const Expr* non_reduced_perp;
bool checking_reduction; bool checking_reduction;


@ -6,13 +6,11 @@
#include "zeek/Expr.h" #include "zeek/Expr.h"
#include "zeek/Stmt.h" #include "zeek/Stmt.h"
#include "zeek/Traverse.h" #include "zeek/Traverse.h"
#include "zeek/script_opt/DefSetsMgr.h"
namespace zeek::detail { namespace zeek::detail {
class Expr; class Expr;
class TempVar; class TempVar;
class ProfileFunc;
class Reducer { class Reducer {
public: public:
@ -20,8 +18,9 @@ public:
StmtPtr Reduce(StmtPtr s); StmtPtr Reduce(StmtPtr s);
const DefSetsMgr* GetDefSetsMgr() const { return mgr; } void SetReadyToOptimize() { opt_ready = true; }
void SetDefSetsMgr(const DefSetsMgr* _mgr) { mgr = _mgr; }
void SetCurrStmt(const Stmt* stmt) { curr_stmt = stmt; }
ExprPtr GenTemporaryExpr(const TypePtr& t, ExprPtr rhs); ExprPtr GenTemporaryExpr(const TypePtr& t, ExprPtr rhs);
@ -76,7 +75,7 @@ public:
// True if the Reducer is being used in the context of a second // True if the Reducer is being used in the context of a second
// pass over for AST optimization. // pass over for AST optimization.
bool Optimizing() const bool Optimizing() const
{ return ! IsPruning() && mgr != nullptr; } { return ! IsPruning() && opt_ready; }
// A predicate that indicates whether a given reduction pass // A predicate that indicates whether a given reduction pass
// is being made to prune unused statements. // is being made to prune unused statements.
@ -126,6 +125,14 @@ public:
// already been applied. // already been applied.
bool IsCSE(const AssignExpr* a, const NameExpr* lhs, const Expr* rhs); bool IsCSE(const AssignExpr* a, const NameExpr* lhs, const Expr* rhs);
// Returns a constant representing folding of the given expression
// (which must have constant operands).
ConstExprPtr Fold(ExprPtr e);
// Notes that the given expression has been folded to the
// given constant.
void FoldedTo(ExprPtr orig, ConstExprPtr c);
// Given an lhs=rhs statement followed by succ_stmt, returns // Given an lhs=rhs statement followed by succ_stmt, returns
// a (new) merge of the two if they're of the form tmp=rhs, var=tmp; // a (new) merge of the two if they're of the form tmp=rhs, var=tmp;
// otherwise, nil. // otherwise, nil.
@ -150,23 +157,13 @@ protected:
// are in fact equivalent.) // are in fact equivalent.)
bool SameVal(const Val* v1, const Val* v2) const; bool SameVal(const Val* v1, const Val* v2) const;
// Track that the variable "var", which has the given set of // Track that the variable "var" will be a replacement for
// definition points, will be a replacement for the "orig" // the "orig" expression. Returns the replacement expression
// expression. Returns the replacement expression (which is // (which is just a NameExpr referring to "var").
// is just a NameExpr referring to "var"). ExprPtr NewVarUsage(IDPtr var, const Expr* orig);
ExprPtr NewVarUsage(IDPtr var, const DefPoints* dps, const Expr* orig);
// Returns the definition points associated with "var". If none void BindExprToCurrStmt(const ExprPtr& e);
// exist in our cache, then populates the cache. void BindStmtToCurrStmt(const StmtPtr& s);
const DefPoints* GetDefPoints(const NameExpr* var);
// Retrieve the definition points associated in our cache with the
// given variable, if any.
const DefPoints* FindDefPoints(const NameExpr* var) const;
// Adds a mapping in our cache of the given variable to the given
// set of definition points.
void SetDefPoints(const NameExpr* var, const DefPoints* dps);
// Returns true if op1 and op2 represent the same operand, given // Returns true if op1 and op2 represent the same operand, given
// the reaching definitions available at their usages (e1 and e2). // the reaching definitions available at their usages (e1 and e2).
@ -216,23 +213,10 @@ protected:
// for the current function. // for the current function.
IDPtr GenLocal(const IDPtr& orig); IDPtr GenLocal(const IDPtr& orig);
// This is the heart of constant propagation. Given an identifier // This is the heart of constant propagation. Given an identifier,
// and a set of definition points for it, if its value is constant // if its value is constant at the given location then returns
// then returns the corresponding ConstExpr with the value. // the corresponding ConstExpr with the value.
const ConstExpr* CheckForConst(const IDPtr& id, const ConstExpr* CheckForConst(const IDPtr& id, int stmt_num) const;
const DefPoints* dps) const;
// Track that we're replacing instances of "orig" with a new
// expression. This allows us to locate the RDs associated
// with "orig" in the context of the new expression, without
// requiring an additional RD propagation pass.
void TrackExprReplacement(const Expr* orig, const Expr* e);
// Returns the object we should use to look up RD's associated
// with 'e'. (This isn't necessarily 'e' itself because we
// may have decided to replace it with a different expression,
// per TrackExprReplacement().)
const Obj* GetRDLookupObj(const Expr* e) const;
// Tracks the temporary variables created during the reduction/ // Tracks the temporary variables created during the reduction/
// optimization process. // optimization process.
@ -253,6 +237,14 @@ protected:
// rename local variables when inlining. // rename local variables when inlining.
std::unordered_map<const ID*, IDPtr> orig_to_new_locals; std::unordered_map<const ID*, IDPtr> orig_to_new_locals;
// Tracks expressions we've folded, so that we can recognize them
// for constant propagation.
std::unordered_map<const Expr*, ConstExprPtr> constant_exprs;
// Holds onto those expressions so they don't become invalid
// due to memory management.
std::vector<ExprPtr> folded_exprs;
// Which statements to elide from the AST (because optimization // Which statements to elide from the AST (because optimization
// has determined they're no longer needed). // has determined they're no longer needed).
std::unordered_set<const Stmt*> omitted_stmts; std::unordered_set<const Stmt*> omitted_stmts;
@ -270,25 +262,17 @@ protected:
// exponentially. // exponentially.
int bifurcation_level = 0; int bifurcation_level = 0;
// For a given usage of a variable's value, return the definition
// points associated with its use at that point. We use this
// both as a cache (populating it every time we do a more laborious
// lookup), and proactively when creating new references to variables.
std::unordered_map<const NameExpr*, const DefPoints*> var_usage_to_DPs;
// Tracks which (non-temporary) variables had constant // Tracks which (non-temporary) variables had constant
// values used for constant propagation. // values used for constant propagation.
std::unordered_set<const ID*> constant_vars; std::unordered_set<const ID*> constant_vars;
// For a new expression we've created, map it to the expression
// it's replacing. This allows us to locate the RDs associated
// with the usage.
std::unordered_map<const Expr*, const Expr*> new_expr_to_orig;
// Statement at which the current reduction started. // Statement at which the current reduction started.
StmtPtr reduction_root = nullptr; StmtPtr reduction_root = nullptr;
const DefSetsMgr* mgr = nullptr; // Statement we're currently working on.
const Stmt* curr_stmt = nullptr;
bool opt_ready = false;
}; };
@ -364,8 +348,6 @@ protected:
}; };
extern bool same_DPs(const DefPoints* dp1, const DefPoints* dp2);
// Used for debugging, to communicate which expression wasn't // Used for debugging, to communicate which expression wasn't
// reduced when we expected them all to be. // reduced when we expected them all to be.
extern const Expr* non_reduced_perp; extern const Expr* non_reduced_perp;
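
// Illustrative sketch, not part of this commit: why constant_exprs is paired
// with folded_exprs. The map is keyed by raw pointer, so the cache also holds
// an owning pointer to each folded expression to keep those keys valid. Expr,
// ConstExpr, and FoldCache are hypothetical, and std::shared_ptr stands in
// for Zeek's intrusive pointers.

```cpp
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical expression: an addition of two integer constants.
struct Expr
	{
	int lhs;
	int rhs;
	};

struct ConstExpr
	{
	int value;
	};

using ExprPtr = std::shared_ptr<Expr>;
using ConstExprPtr = std::shared_ptr<ConstExpr>;

class FoldCache {
public:
	// Evaluates the expression and records the result for later lookups.
	ConstExprPtr Fold(ExprPtr e)
		{
		auto c = std::make_shared<ConstExpr>(ConstExpr{e->lhs + e->rhs});
		FoldedTo(std::move(e), c);
		return c;
		}

	// Notes that "e" folds to "c" and keeps "e" alive so its address
	// cannot be reused by a different expression while it is a map key.
	void FoldedTo(ExprPtr e, ConstExprPtr c)
		{
		constant_exprs[e.get()] = std::move(c);
		folded_exprs.push_back(std::move(e));
		}

	// Returns the folded constant for "e", or nullptr if it was never folded.
	const ConstExpr* Lookup(const Expr* e) const
		{
		auto it = constant_exprs.find(e);
		return it == constant_exprs.end() ? nullptr : it->second.get();
		}

private:
	std::unordered_map<const Expr*, ConstExprPtr> constant_exprs;
	std::vector<ExprPtr> folded_exprs;
};
```

// Without the owning vector, a freed expression's address could later match
// an unrelated expression and yield a stale constant.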


@ -6,15 +6,20 @@
#include "zeek/Desc.h" #include "zeek/Desc.h"
#include "zeek/EventHandler.h" #include "zeek/EventHandler.h"
#include "zeek/EventRegistry.h" #include "zeek/EventRegistry.h"
#include "zeek/script_opt/ScriptOpt.h" #include "zeek/script_opt/ScriptOpt.h"
#include "zeek/script_opt/ProfileFunc.h" #include "zeek/script_opt/ProfileFunc.h"
#include "zeek/script_opt/Inline.h" #include "zeek/script_opt/Inline.h"
#include "zeek/script_opt/Reduce.h" #include "zeek/script_opt/Reduce.h"
#include "zeek/script_opt/GenIDDefs.h"
#include "zeek/script_opt/GenRDs.h" #include "zeek/script_opt/GenRDs.h"
#include "zeek/script_opt/UseDefs.h" #include "zeek/script_opt/UseDefs.h"
#include "zeek/script_opt/CPP/Compile.h" #include "zeek/script_opt/CPP/Compile.h"
#include "zeek/script_opt/CPP/Func.h" #include "zeek/script_opt/CPP/Func.h"
#include "zeek/script_opt/ZAM/Compile.h"
namespace zeek::detail { namespace zeek::detail {
@ -31,21 +36,75 @@ static std::vector<FuncInfo> funcs;
static ZAMCompiler* ZAM = nullptr; static ZAMCompiler* ZAM = nullptr;
static bool generating_CPP = false;
static std::string hash_dir; // for storing hashes of previous compilations
void optimize_func(ScriptFunc* f, std::shared_ptr<ProfileFunc> pf, static ScriptFuncPtr global_stmts;
ScopePtr scope, StmtPtr& body,
AnalyOpt& analysis_options) void analyze_func(ScriptFuncPtr f)
{
// Even if we're doing --optimize-only, we still track all functions
// here because the inliner will need the full list.
funcs.emplace_back(f, f->GetScope(), f->CurrentBody(),
f->CurrentPriority());
}
const FuncInfo* analyze_global_stmts(Stmt* stmts)
{
// We ignore analysis_options.only_func - if it's in use, later
// logic will keep this function from being compiled, but it's handy
// now to enter it into "funcs" so we have a FuncInfo to return.
auto id = install_ID("<global-stmts>", GLOBAL_MODULE_NAME, true, false);
auto empty_args_t = make_intrusive<RecordType>(nullptr);
auto func_t = make_intrusive<FuncType>(empty_args_t, nullptr, FUNC_FLAVOR_FUNCTION);
id->SetType(func_t);
auto sc = current_scope();
std::vector<IDPtr> empty_inits;
StmtPtr stmts_p{NewRef{}, stmts};
global_stmts = make_intrusive<ScriptFunc>(id, stmts_p, empty_inits,
sc->Length(), 0);
funcs.emplace_back(global_stmts, sc, stmts_p, 0);
return &funcs.back();
}
static bool optimize_AST(ScriptFunc* f, std::shared_ptr<ProfileFunc>& pf,
std::shared_ptr<Reducer>& rc, ScopePtr scope,
StmtPtr& body)
{
pf = std::make_shared<ProfileFunc>(f, body, true);
GenIDDefs ID_defs(pf, f, scope, body);
if ( reporter->Errors() > 0 )
return false;
rc->SetReadyToOptimize();
auto new_body = rc->Reduce(body);
if ( reporter->Errors() > 0 )
return false;
if ( analysis_options.dump_xform )
printf("Optimized: %s\n", obj_desc(new_body.get()).c_str());
f->ReplaceBody(body, new_body);
body = new_body;
return true;
}
static void optimize_func(ScriptFunc* f, std::shared_ptr<ProfileFunc> pf,
ScopePtr scope, StmtPtr& body)
{ {
if ( reporter->Errors() > 0 ) if ( reporter->Errors() > 0 )
return; return;
if ( ! analysis_options.activate )
return;
if ( analysis_options.only_func &&
*analysis_options.only_func != f->Name() )
return;
if ( analysis_options.only_func ) if ( analysis_options.only_func )
printf("Original: %s\n", obj_desc(body.get()).c_str()); printf("Original: %s\n", obj_desc(body.get()).c_str());
@ -53,10 +112,12 @@ void optimize_func(ScriptFunc* f, std::shared_ptr<ProfileFunc> pf,
// We're not able to optimize this. // We're not able to optimize this.
return; return;
if ( pf->NumWhenStmts() > 0 || pf->NumLambdas() > 0 ) const char* reason;
if ( ! is_ZAM_compilable(pf.get(), &reason) )
{ {
if ( analysis_options.only_func ) if ( analysis_options.report_uncompilable )
printf("Skipping analysis due to \"when\" statement or use of lambdas\n"); printf("Skipping compilation of %s due to %s\n",
f->Name(), reason);
return; return;
} }
@ -77,59 +138,40 @@ void optimize_func(ScriptFunc* f, std::shared_ptr<ProfileFunc> pf,
if ( ! new_body->IsReduced(rc.get()) ) if ( ! new_body->IsReduced(rc.get()) )
{ {
if ( non_reduced_perp ) if ( non_reduced_perp )
printf("Reduction inconsistency for %s: %s\n", f->Name(), reporter->InternalError("Reduction inconsistency for %s: %s\n", f->Name(),
obj_desc(non_reduced_perp).c_str()); obj_desc(non_reduced_perp).c_str());
else else
printf("Reduction inconsistency for %s\n", f->Name()); reporter->InternalError("Reduction inconsistency for %s\n", f->Name());
} }
checking_reduction = false; checking_reduction = false;
if ( analysis_options.only_func || analysis_options.dump_xform ) if ( analysis_options.dump_xform )
printf("Transformed: %s\n", obj_desc(new_body.get()).c_str()); printf("Transformed: %s\n", obj_desc(new_body.get()).c_str());
f->ReplaceBody(body, new_body); f->ReplaceBody(body, new_body);
body = new_body; body = new_body;
if ( analysis_options.optimize_AST ) if ( analysis_options.usage_issues > 1 )
{ {
pf = std::make_shared<ProfileFunc>(f, body, true); // Use the old-school approach for this.
RD_Decorate reduced_rds(pf, f, scope, body);
}
RD_Decorate reduced_rds(pf); if ( analysis_options.optimize_AST &&
reduced_rds.TraverseFunction(f, scope, body); ! optimize_AST(f, pf, rc, scope, body) )
if ( reporter->Errors() > 0 )
{ {
pop_scope(); pop_scope();
return; return;
} }
rc->SetDefSetsMgr(reduced_rds.GetDefSetsMgr());
new_body = rc->Reduce(body);
if ( reporter->Errors() > 0 )
{
pop_scope();
return;
}
if ( analysis_options.only_func || analysis_options.dump_xform )
printf("Optimized: %s\n", obj_desc(new_body.get()).c_str());
f->ReplaceBody(body, new_body);
body = new_body;
}
// Profile the new body. // Profile the new body.
pf = std::make_shared<ProfileFunc>(f, body, true); pf = std::make_shared<ProfileFunc>(f, body, true);
// Compute its reaching definitions. // Compute its reaching definitions.
RD_Decorate reduced_rds(pf); GenIDDefs ID_defs(pf, f, scope, body);
reduced_rds.TraverseFunction(f, scope, body); rc->SetReadyToOptimize();
rc->SetDefSetsMgr(reduced_rds.GetDefSetsMgr());
auto ud = std::make_shared<UseDefs>(body, rc); auto ud = std::make_shared<UseDefs>(body, rc);
ud->Analyze(); ud->Analyze();
@ -145,80 +187,58 @@ void optimize_func(ScriptFunc* f, std::shared_ptr<ProfileFunc> pf,
body = new_body; body = new_body;
} }
int new_frame_size = int new_frame_size = scope->Length() + rc->NumTemps() +
scope->Length() + rc->NumTemps() + rc->NumNewLocals(); rc->NumNewLocals();
if ( new_frame_size > f->FrameSize() ) if ( new_frame_size > f->FrameSize() )
f->SetFrameSize(new_frame_size); f->SetFrameSize(new_frame_size);
if ( analysis_options.gen_ZAM_code )
{
ZAM = new ZAMCompiler(f, pf, scope, new_body, ud, rc);
new_body = ZAM->CompileBody();
if ( reporter->Errors() > 0 )
return;
if ( analysis_options.dump_ZAM )
ZAM->Dump();
f->ReplaceBody(body, new_body);
body = new_body;
}
pop_scope(); pop_scope();
} }
FuncInfo::FuncInfo(ScriptFuncPtr _func, ScopePtr _scope, StmtPtr _body,
int _priority)
: func(std::move(_func)), scope(std::move(_scope)),
body(std::move(_body)), priority(_priority)
{}
void FuncInfo::SetProfile(std::shared_ptr<ProfileFunc> _pf)
{ pf = std::move(_pf); }
void analyze_func(ScriptFuncPtr f)
{
if ( analysis_options.only_func &&
*analysis_options.only_func != f->Name() )
return;
funcs.emplace_back(f, f->GetScope(), f->CurrentBody(),
f->CurrentPriority());
}
const FuncInfo* analyze_global_stmts(Stmt* stmts)
{
// We ignore analysis_options.only_func - if it's in use, later
// logic will keep this function from being compiled, but it's handy
// now to enter it into "funcs" so we have a FuncInfo to return.
auto id = install_ID("<global-stmts>", GLOBAL_MODULE_NAME, true, false);
auto empty_args_t = make_intrusive<RecordType>(nullptr);
auto func_t = make_intrusive<FuncType>(empty_args_t, nullptr, FUNC_FLAVOR_FUNCTION);
id->SetType(func_t);
auto sc = current_scope();
std::vector<IDPtr> empty_inits;
StmtPtr stmts_p{NewRef{}, stmts};
auto sf = make_intrusive<ScriptFunc>(id, stmts_p, empty_inits, sc->Length(), 0);
funcs.emplace_back(sf, sc, stmts_p, 0);
return &funcs.back();
}
static void check_env_opt(const char* opt, bool& opt_flag) static void check_env_opt(const char* opt, bool& opt_flag)
{ {
if ( getenv(opt) ) if ( getenv(opt) )
opt_flag = true; opt_flag = true;
} }
void analyze_scripts() static void init_options()
{
static bool did_init = false;
static std::string hash_dir;
bool generating_CPP = false;
if ( ! did_init )
{ {
auto hd = getenv("ZEEK_HASH_DIR"); auto hd = getenv("ZEEK_HASH_DIR");
if ( hd ) if ( hd )
hash_dir = std::string(hd) + "/"; hash_dir = std::string(hd) + "/";
// ZAM-related options.
check_env_opt("ZEEK_DUMP_XFORM", analysis_options.dump_xform); check_env_opt("ZEEK_DUMP_XFORM", analysis_options.dump_xform);
check_env_opt("ZEEK_DUMP_UDS", analysis_options.dump_uds); check_env_opt("ZEEK_DUMP_UDS", analysis_options.dump_uds);
check_env_opt("ZEEK_INLINE", analysis_options.inliner); check_env_opt("ZEEK_INLINE", analysis_options.inliner);
check_env_opt("ZEEK_OPT", analysis_options.optimize_AST); check_env_opt("ZEEK_OPT", analysis_options.optimize_AST);
check_env_opt("ZEEK_XFORM", analysis_options.activate); check_env_opt("ZEEK_XFORM", analysis_options.activate);
check_env_opt("ZEEK_ZAM", analysis_options.gen_ZAM);
check_env_opt("ZEEK_COMPILE_ALL", analysis_options.compile_all);
check_env_opt("ZEEK_ZAM_CODE", analysis_options.gen_ZAM_code);
check_env_opt("ZEEK_NO_ZAM_OPT", analysis_options.no_ZAM_opt);
check_env_opt("ZEEK_DUMP_ZAM", analysis_options.dump_ZAM);
check_env_opt("ZEEK_PROFILE", analysis_options.profile_ZAM);
// Compile-to-C++-related options.
check_env_opt("ZEEK_ADD_CPP", analysis_options.add_CPP); check_env_opt("ZEEK_ADD_CPP", analysis_options.add_CPP);
check_env_opt("ZEEK_UPDATE_CPP", analysis_options.update_CPP); check_env_opt("ZEEK_UPDATE_CPP", analysis_options.update_CPP);
check_env_opt("ZEEK_GEN_CPP", analysis_options.gen_CPP); check_env_opt("ZEEK_GEN_CPP", analysis_options.gen_CPP);
@ -252,16 +272,10 @@ void analyze_scripts()
generating_CPP = true; generating_CPP = true;
if ( analysis_options.use_CPP && generating_CPP ) if ( analysis_options.use_CPP && generating_CPP )
{ reporter->FatalError("generating C++ incompatible with using C++");
reporter->Error("generating C++ incompatible with using C++");
exit(1);
}
if ( analysis_options.use_CPP && ! CPP_init_hook ) if ( analysis_options.use_CPP && ! CPP_init_hook )
{ reporter->FatalError("no C++ functions available to use");
reporter->Error("no C++ functions available to use");
exit(1);
}
auto usage = getenv("ZEEK_USAGE_ISSUES"); auto usage = getenv("ZEEK_USAGE_ISSUES");
@ -275,28 +289,40 @@ void analyze_scripts()
analysis_options.only_func = zo; analysis_options.only_func = zo;
} }
if ( analysis_options.only_func || if ( analysis_options.gen_ZAM )
analysis_options.optimize_AST || {
analysis_options.usage_issues > 0 ) analysis_options.gen_ZAM_code = true;
analysis_options.activate = true; analysis_options.inliner = true;
analysis_options.optimize_AST = true;
did_init = true;
} }
if ( ! analysis_options.activate && ! analysis_options.inliner && if ( analysis_options.dump_ZAM )
! generating_CPP && ! analysis_options.report_CPP && analysis_options.gen_ZAM_code = true;
! analysis_options.use_CPP )
// Avoid profiling overhead.
return;
// Now that everything's parsed and BiF's have been initialized, if ( analysis_options.only_func )
// profile the functions. {
auto pfs = std::make_unique<ProfileFuncs>(funcs, is_CPP_compilable, false); // Note, this comes after the statement above because for
// --optimize-only we don't necessarily want to go all
// the way to *generating* ZAM code, though we'll want to
// dump it *if* we generate it.
analysis_options.dump_xform = analysis_options.dump_ZAM = true;
if ( CPP_init_hook ) if ( analysis_options.gen_ZAM_code || generating_CPP )
(*CPP_init_hook)(); analysis_options.report_uncompilable = true;
}
if ( analysis_options.report_CPP ) if ( analysis_options.report_uncompilable &&
! analysis_options.gen_ZAM_code && ! generating_CPP )
reporter->FatalError("report-uncompilable requires generation of ZAM or C++");
if ( analysis_options.only_func ||
analysis_options.optimize_AST ||
analysis_options.gen_ZAM_code ||
analysis_options.usage_issues > 0 )
analysis_options.activate = true;
}
static void report_CPP()
{ {
if ( ! CPP_init_hook ) if ( ! CPP_init_hook )
{ {
@ -331,6 +357,7 @@ void analyze_scripts()
} }
printf("\nAdditional C++ script bodies available:\n"); printf("\nAdditional C++ script bodies available:\n");
int addl = 0; int addl = 0;
for ( const auto& s : compiled_scripts ) for ( const auto& s : compiled_scripts )
if ( already_reported.count(s.first) == 0 ) if ( already_reported.count(s.first) == 0 )
@ -342,11 +369,9 @@ void analyze_scripts()
if ( addl == 0 ) if ( addl == 0 )
printf("(none)\n"); printf("(none)\n");
exit(0);
} }
if ( analysis_options.use_CPP ) static void use_CPP()
{ {
for ( auto& f : funcs ) for ( auto& f : funcs )
{ {
@ -393,15 +418,27 @@ void analyze_scripts()
(*cb)(); (*cb)();
} }
if ( generating_CPP ) static void generate_CPP(std::unique_ptr<ProfileFuncs>& pfs)
{ {
const auto hash_name = hash_dir + "CPP-hashes"; const auto hash_name = hash_dir + "CPP-hashes";
auto hm = std::make_unique<CPPHashManager>(hash_name.c_str(), auto hm = std::make_unique<CPPHashManager>(hash_name.c_str(),
analysis_options.add_CPP); analysis_options.add_CPP);
if ( ! analysis_options.gen_CPP ) if ( analysis_options.gen_CPP )
{ {
if ( analysis_options.only_func )
{ // deactivate all functions except the target one
for ( auto& func : funcs )
{
auto fn = func.Func()->Name();
if ( *analysis_options.only_func != fn )
func.SetSkip(true);
}
}
}
else
{ // doing add-C++ instead, so look for previous compilations
for ( auto& func : funcs ) for ( auto& func : funcs )
{ {
auto hash = func.Profile()->HashVal(); auto hash = func.Profile()->HashVal();
@ -419,41 +456,21 @@ void analyze_scripts()
const auto addl_name = hash_dir + "CPP-gen-addl.h"; const auto addl_name = hash_dir + "CPP-gen-addl.h";
CPPCompile cpp(funcs, *pfs, gen_name, addl_name, *hm, CPPCompile cpp(funcs, *pfs, gen_name, addl_name, *hm,
analysis_options.gen_CPP || analysis_options.gen_CPP || analysis_options.update_CPP,
analysis_options.update_CPP, analysis_options.gen_standalone_CPP,
analysis_options.gen_standalone_CPP); analysis_options.report_uncompilable);
exit(0);
} }
if ( analysis_options.usage_issues > 0 && analysis_options.optimize_AST ) static void find_when_funcs(std::unique_ptr<ProfileFuncs>& pfs,
std::unordered_set<const ScriptFunc*>& when_funcs)
{ {
fprintf(stderr, "warning: \"-O optimize-AST\" option is incompatible with -u option, deactivating optimization\n");
analysis_options.optimize_AST = false;
}
// Re-profile the functions, this time without worrying about
// compatibility with compilation to C++. Note that the first
// profiling pass above may have marked some of the functions
// as to-skip, so first clear those markings. Once we have
// full compile-to-C++ and ZAM support for all Zeek language
// features, we can remove the re-profiling here.
for ( auto& f : funcs )
f.SetSkip(false);
pfs = std::make_unique<ProfileFuncs>(funcs, nullptr, true);
// Figure out which functions either directly or indirectly // Figure out which functions either directly or indirectly
// appear in "when" clauses. // appear in "when" clauses.
// Final set of functions involved in "when" clauses.
std::unordered_set<const ScriptFunc*> when_funcs;
// Which functions we still need to analyze. // Which functions we still need to analyze.
std::unordered_set<const ScriptFunc*> when_funcs_to_do; std::unordered_set<const ScriptFunc*> when_funcs_to_do;
for ( auto& f : funcs ) for ( auto& f : funcs )
{
if ( f.Profile()->WhenCalls().size() > 0 ) if ( f.Profile()->WhenCalls().size() > 0 )
{ {
when_funcs.insert(f.Func()); when_funcs.insert(f.Func());
@ -463,17 +480,6 @@ void analyze_scripts()
ASSERT(pfs->FuncProf(bf)); ASSERT(pfs->FuncProf(bf));
when_funcs_to_do.insert(bf); when_funcs_to_do.insert(bf);
} }
#ifdef NOT_YET
if ( analysis_options.report_uncompilable )
{
ODesc d;
f.ScriptFunc()->AddLocation(&d);
printf("%s cannot be compiled due to use of \"when\" statement (%s)\n",
f.ScriptFunc()->Name(), d.Description());
}
#endif // NOT_YET
}
} }
// Set of new functions to put on to-do list. Separate from // Set of new functions to put on to-do list. Separate from
@ -501,12 +507,41 @@ void analyze_scripts()
when_funcs_to_do = new_to_do; when_funcs_to_do = new_to_do;
new_to_do.clear(); new_to_do.clear();
} }
}
static void analyze_scripts_for_ZAM(std::unique_ptr<ProfileFuncs>& pfs)
{
if ( analysis_options.usage_issues > 0 &&
analysis_options.optimize_AST )
{
fprintf(stderr, "warning: \"-O optimize-AST\" option is incompatible with -u option, deactivating optimization\n");
analysis_options.optimize_AST = false;
}
// Re-profile the functions, now without worrying about compatibility
// with compilation to C++. Note that the first profiling pass earlier
// may have marked some of the functions as to-skip, so first clear
// those markings. Once we have full compile-to-C++ and ZAM support
// for all Zeek language features, we can remove the re-profiling here.
for ( auto& f : funcs )
f.SetSkip(false);
pfs = std::make_unique<ProfileFuncs>(funcs, nullptr, true);
// Set of functions involved (directly or indirectly) in "when"
// clauses.
std::unordered_set<const ScriptFunc*> when_funcs;
find_when_funcs(pfs, when_funcs);
bool report_recursive = analysis_options.report_recursive;
std::unique_ptr<Inliner> inl; std::unique_ptr<Inliner> inl;
if ( analysis_options.inliner ) if ( analysis_options.inliner )
inl = std::make_unique<Inliner>(funcs, analysis_options.report_recursive); inl = std::make_unique<Inliner>(funcs, report_recursive);
if ( ! analysis_options.activate ) if ( ! analysis_options.activate )
// Some --optimize options stop short of AST transformations,
// for development/debugging purposes.
return; return;
// The following tracks inlined functions that are also used // The following tracks inlined functions that are also used
@ -515,6 +550,9 @@ void analyze_scripts()
// since it won't be consulted in that case. // since it won't be consulted in that case.
std::unordered_set<Func*> func_used_indirectly; std::unordered_set<Func*> func_used_indirectly;
if ( global_stmts )
func_used_indirectly.insert(global_stmts.get());
if ( inl ) if ( inl )
{ {
for ( auto& f : funcs ) for ( auto& f : funcs )
@ -540,23 +578,67 @@ void analyze_scripts()
{ {
auto func = f.Func(); auto func = f.Func();
if ( ! analysis_options.compile_all && if ( analysis_options.only_func )
{
if ( *analysis_options.only_func != func->Name() )
continue;
}
else if ( ! analysis_options.compile_all &&
inl && inl->WasInlined(func) && inl && inl->WasInlined(func) &&
func_used_indirectly.count(func) == 0 ) func_used_indirectly.count(func) == 0 )
// No need to compile as it won't be // No need to compile as it won't be called directly.
// called directly.
continue;
if ( when_funcs.count(func) > 0 )
// We don't try to compile these.
continue; continue;
auto new_body = f.Body(); auto new_body = f.Body();
optimize_func(func, f.ProfilePtr(), f.Scope(), new_body, optimize_func(func, f.ProfilePtr(), f.Scope(), new_body);
analysis_options);
f.SetBody(new_body); f.SetBody(new_body);
} }
} }
void analyze_scripts()
{
static bool did_init = false;
if ( ! did_init )
{
init_options();
did_init = true;
}
if ( ! analysis_options.activate && ! analysis_options.inliner &&
! generating_CPP && ! analysis_options.report_CPP &&
! analysis_options.use_CPP )
// No work to do, avoid profiling overhead.
return;
// Now that everything's parsed and BiF's have been initialized,
// profile the functions.
auto pfs = std::make_unique<ProfileFuncs>(funcs, is_CPP_compilable,
false);
if ( CPP_init_hook )
(*CPP_init_hook)();
if ( analysis_options.report_CPP )
{
report_CPP();
exit(0);
}
if ( analysis_options.use_CPP )
use_CPP();
if ( generating_CPP )
{
generate_CPP(pfs);
exit(0);
}
// At this point we're done with C++ considerations, so we're
// instead compiling to ZAM.
analyze_scripts_for_ZAM(pfs);
}
} // namespace zeek::detail } // namespace zeek::detail
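
The flag implications that init_options() applies can be sketched in isolation. This is a minimal sketch, not the Zeek implementation: `SketchOpts` and `resolve_options` are hypothetical names, the struct keeps only the fields involved, and a `false` return stands in for the `reporter->FatalError()` call in the diff above.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, trimmed-down stand-in for AnalyOpt: only the flags
// that take part in the implication logic are kept.
struct SketchOpts {
	std::optional<std::string> only_func;
	bool activate = false;
	bool optimize_AST = false;
	bool inliner = false;
	bool gen_ZAM = false;
	bool gen_ZAM_code = false;
	bool dump_xform = false;
	bool dump_ZAM = false;
	bool report_uncompilable = false;
	int usage_issues = 0;
};

// Applies the implications in the same order as the diff. Returns
// false where the original would raise a fatal error.
bool resolve_options(SketchOpts& o, bool generating_CPP)
	{
	if ( o.gen_ZAM )
		o.gen_ZAM_code = o.inliner = o.optimize_AST = true;

	if ( o.dump_ZAM )
		o.gen_ZAM_code = true;

	if ( o.only_func )
		{
		// Comes after the dump_ZAM check on purpose: asking for a
		// single function turns on dumping without forcing ZAM
		// code generation.
		o.dump_xform = o.dump_ZAM = true;

		if ( o.gen_ZAM_code || generating_CPP )
			o.report_uncompilable = true;
		}

	if ( o.report_uncompilable && ! o.gen_ZAM_code && ! generating_CPP )
		return false;

	if ( o.only_func || o.optimize_AST || o.gen_ZAM_code ||
	     o.usage_issues > 0 )
		o.activate = true;

	// Post-condition: the umbrella flag always implies code generation.
	assert(! o.gen_ZAM || o.gen_ZAM_code);
	return true;
	}
```

The ordering matters: because the only_func block runs after the dump_ZAM check, setting only_func alone leaves gen_ZAM_code off while still enabling both dump flags.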


@ -19,12 +19,50 @@ namespace zeek::detail {
// Flags controlling what sorts of analysis to do. // Flags controlling what sorts of analysis to do.
struct AnalyOpt { struct AnalyOpt {
// If non-nil, then only analyze the given function/event/hook.
// Applies to both ZAM and C++.
std::optional<std::string> only_func;
// For a given compilation target, report functions that can't
// be compiled.
bool report_uncompilable = false;
////// Options relating to ZAM:
// Whether to analyze scripts. // Whether to analyze scripts.
bool activate = false; bool activate = false;
// If true, compile all compilable functions, even those that
// are inlined. Mainly useful for ensuring compatibility for
// some tests in the test suite.
bool compile_all = false;
// Whether to optimize the AST. // Whether to optimize the AST.
bool optimize_AST = false; bool optimize_AST = false;
// If true, do global inlining.
bool inliner = false;
// If true, report which functions are directly and indirectly
// recursive, and exit. Only germane if running the inliner.
bool report_recursive = false;
// If true, generate ZAM code for applicable function bodies,
// activating all optimizations.
bool gen_ZAM = false;
// Generate ZAM code, but do not turn on optimizations unless
// specified.
bool gen_ZAM_code = false;
// Deactivate the low-level ZAM optimizer.
bool no_ZAM_opt = false;
// Produce a profile of ZAM execution.
bool profile_ZAM = false;
// If true, dump out transformed code: the results of reducing // If true, dump out transformed code: the results of reducing
// interpreted scripts, and, if optimize is set, of then optimizing // interpreted scripts, and, if optimize is set, of then optimizing
// them. Always done if only_func is set. // them. Always done if only_func is set.
@ -33,11 +71,23 @@ struct AnalyOpt {
// If true, dump out the use-defs for each analyzed function. // If true, dump out the use-defs for each analyzed function.
bool dump_uds = false; bool dump_uds = false;
// If non-nil, then only analyze the given function/event/hook. // If true, dump out generated ZAM code.
std::optional<std::string> only_func; bool dump_ZAM = false;
// If true, do global inlining. // If non-zero, looks for variables that are used-but-possibly-not-set,
bool inliner = false; // or set-but-not-used.
//
// If > 1, also reports on uses of uninitialized record fields and
// analyzes nested records in depth. Warning: with the current
// data structures this greatly increases analysis time.
//
// Included here with other ZAM-related options since conducting
// the analysis requires activating some of the machinery used
// for ZAM.
int usage_issues = 0;
////// Options relating to C++:
// If true, generate C++; // If true, generate C++;
bool gen_CPP = false; bool gen_CPP = false;
@ -61,25 +111,8 @@ struct AnalyOpt {
// If true, use C++ bodies if available. // If true, use C++ bodies if available.
bool use_CPP = false; bool use_CPP = false;
// If true, compile all compilable functions, even those that
// are inlined. Mainly useful for ensuring compatibility for
// some tests in the test suite.
bool compile_all = false;
// If true, report on available C++ bodies. // If true, report on available C++ bodies.
bool report_CPP = false; bool report_CPP = false;
// If true, report which functions are directly and indirectly
// recursive, and exit. Only germane if running the inliner.
bool report_recursive = false;
// If non-zero, looks for variables that are used-but-possibly-not-set,
// or set-but-not-used.
//
// If > 1, also reports on uses of uninitialized record fields and
// analyzes nested records in depth. Warning: with the current
// data structures this greatly increases analysis time.
int usage_issues = 0;
}; };
extern AnalyOpt analysis_options; extern AnalyOpt analysis_options;
@ -92,7 +125,11 @@ using ScriptFuncPtr = IntrusivePtr<ScriptFunc>;
// Info we need for tracking an instance of a function. // Info we need for tracking an instance of a function.
class FuncInfo { class FuncInfo {
public: public:
FuncInfo(ScriptFuncPtr func, ScopePtr scope, StmtPtr body, int priority); FuncInfo(ScriptFuncPtr _func, ScopePtr _scope, StmtPtr _body,
int _priority)
: func(std::move(_func)), scope(std::move(_scope)),
body(std::move(_body)), priority(_priority)
{}
ScriptFunc* Func() const { return func.get(); } ScriptFunc* Func() const { return func.get(); }
const ScriptFuncPtr& FuncPtr() const { return func; } const ScriptFuncPtr& FuncPtr() const { return func; }
@ -101,11 +138,11 @@ public:
int Priority() const { return priority; } int Priority() const { return priority; }
const ProfileFunc* Profile() const { return pf.get(); } const ProfileFunc* Profile() const { return pf.get(); }
std::shared_ptr<ProfileFunc> ProfilePtr() const { return pf; } std::shared_ptr<ProfileFunc> ProfilePtr() const { return pf; }
const std::string& SaveFile() const { return save_file; }
void SetBody(StmtPtr new_body) { body = std::move(new_body); } void SetBody(StmtPtr new_body) { body = std::move(new_body); }
void SetProfile(std::shared_ptr<ProfileFunc> _pf); // void SetProfile(std::shared_ptr<ProfileFunc> _pf);
void SetSaveFile(std::string _sf) { save_file = std::move(_sf); } void SetProfile(std::shared_ptr<ProfileFunc> _pf)
{ pf = std::move(_pf); }
// The following provide a way of marking FuncInfo's as // The following provide a way of marking FuncInfo's as
// should-be-skipped for script optimization, generally because // should-be-skipped for script optimization, generally because
@ -123,10 +160,6 @@ protected:
// Whether to skip optimizing this function. // Whether to skip optimizing this function.
bool skip = false; bool skip = false;
// If we're saving this function in a file, this is the name
// of the file to use.
std::string save_file;
}; };


@ -8,6 +8,7 @@
#include "zeek/Reporter.h" #include "zeek/Reporter.h"
#include "zeek/Desc.h" #include "zeek/Desc.h"
#include "zeek/Traverse.h" #include "zeek/Traverse.h"
#include "zeek/script_opt/IDOptInfo.h"
#include "zeek/script_opt/Reduce.h" #include "zeek/script_opt/Reduce.h"
@ -34,6 +35,8 @@ StmtPtr Stmt::Reduce(Reducer* c)
return null; return null;
} }
c->SetCurrStmt(this);
return DoReduce(c); return DoReduce(c);
} }
@ -846,7 +849,9 @@ bool StmtList::ReduceStmt(int& s_i, StmtPList* f_stmts, Reducer* c)
auto& s_i_succ = Stmts()[s_i + 1]; auto& s_i_succ = Stmts()[s_i + 1];
// Don't reduce s_i_succ. If it's what we're // Don't reduce s_i_succ. If it's what we're
// looking for, it's already reduced. // looking for, it's already reduced. Plus
// that's what Reducer::MergeStmts expects (not that
// it really matters, per the comment there).
auto merge = c->MergeStmts(var, rhs, s_i_succ); auto merge = c->MergeStmts(var, rhs, s_i_succ);
if ( merge ) if ( merge )
{ {
@ -1014,10 +1019,19 @@ StmtPtr CatchReturnStmt::DoReduce(Reducer* c)
return make_intrusive<NullStmt>(); return make_intrusive<NullStmt>();
} }
auto assign = make_intrusive<AssignExpr>(ret_var->Duplicate(), auto rv_dup = ret_var->Duplicate();
ret_e->Duplicate(), auto ret_e_dup = ret_e->Duplicate();
auto assign = make_intrusive<AssignExpr>(rv_dup, ret_e_dup,
false); false);
assign_stmt = make_intrusive<ExprStmt>(assign); assign_stmt = make_intrusive<ExprStmt>(assign);
if ( ret_e_dup->Tag() == EXPR_CONST )
{
auto c = ret_e_dup->AsConstExpr();
rv_dup->AsNameExpr()->Id()->GetOptInfo()->SetConst(c);
}
return assign_stmt; return assign_stmt;
} }


@ -0,0 +1,25 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Auxiliary information associated with statements to aid script
// optimization.
#pragma once
namespace zeek::detail {
class StmtOptInfo {
public:
// We number statements by their traversal order in the AST.
int stmt_num = -1; // -1 = not assigned yet
// The confluence block nesting associated with the statement.
// We number these using 0 for the outermost block of a function
// (which, strictly speaking, isn't a confluence block).
int block_level = -1;
// True if we observe that there is a branch out of the statement
// to just beyond its extent, such as due to a "break".
bool contains_branch_beyond = false;
};
} // namespace zeek::detail


@ -14,28 +14,15 @@ TempVar::TempVar(int num, const TypePtr& t, ExprPtr _rhs) : type(t)
rhs = std::move(_rhs); rhs = std::move(_rhs);
} }
void TempVar::SetAlias(IDPtr _alias, const DefPoints* _dps) void TempVar::SetAlias(IDPtr _alias)
{ {
if ( alias ) if ( alias )
reporter->InternalError("Re-aliasing a temporary"); reporter->InternalError("Re-aliasing a temporary");
if ( ! _dps )
{
printf("trying to alias %s to %s\n", name.c_str(), _alias->Name());
reporter->InternalError("Empty dps for alias");
}
if ( alias == id ) if ( alias == id )
reporter->InternalError("Creating alias loop"); reporter->InternalError("Creating alias loop");
alias = std::move(_alias); alias = std::move(_alias);
dps = _dps;
}
void TempVar::SetDPs(const DefPoints* _dps)
{
ASSERT(_dps->length() == 1);
dps = _dps;
} }
} // zeek::detail } // zeek::detail


@ -9,6 +9,7 @@
#include "zeek/ID.h" #include "zeek/ID.h"
#include "zeek/Expr.h" #include "zeek/Expr.h"
#include "zeek/script_opt/IDOptInfo.h"
#include "zeek/script_opt/ReachingDefs.h" #include "zeek/script_opt/ReachingDefs.h"
namespace zeek::detail { namespace zeek::detail {
@ -22,21 +23,24 @@ public:
const Expr* RHS() const { return rhs.get(); } const Expr* RHS() const { return rhs.get(); }
IDPtr Id() const { return id; } IDPtr Id() const { return id; }
void SetID(IDPtr _id) { id = std::move(_id); } void SetID(IDPtr _id)
{
id = std::move(_id);
id->GetOptInfo()->SetTemp();
}
void Deactivate() { active = false; } void Deactivate() { active = false; }
bool IsActive() const { return active; } bool IsActive() const { return active; }
// Associated constant expression, if any. // Associated constant expression, if any.
const ConstExpr* Const() const { return const_expr; } const ConstExpr* Const() const { return id->GetOptInfo()->Const(); }
// The most use of "const" in any single line in the Zeek // The most use of "const" in any single line in the Zeek
// codebase :-P ... though only by one! // codebase :-P ... though only by one!
void SetConst(const ConstExpr* _const) { const_expr = _const; } void SetConst(const ConstExpr* _const)
{ id->GetOptInfo()->SetConst(_const); }
IDPtr Alias() const { return alias; } IDPtr Alias() const { return alias; }
const DefPoints* DPs() const { return dps; } void SetAlias(IDPtr id);
void SetAlias(IDPtr id, const DefPoints* dps);
void SetDPs(const DefPoints* _dps);
const RDPtr& MaxRDs() const { return max_rds; } const RDPtr& MaxRDs() const { return max_rds; }
void SetMaxRDs(RDPtr rds) { max_rds = std::move(rds); } void SetMaxRDs(RDPtr rds) { max_rds = std::move(rds); }
@ -47,9 +51,7 @@ protected:
const TypePtr& type; const TypePtr& type;
ExprPtr rhs; ExprPtr rhs;
bool active = true; bool active = true;
const ConstExpr* const_expr = nullptr;
IDPtr alias; IDPtr alias;
const DefPoints* dps = nullptr;
RDPtr max_rds; RDPtr max_rds;
}; };


@ -79,10 +79,8 @@ bool UseDefs::RemoveUnused(int iter)
bool did_omission = false; bool did_omission = false;
for ( unsigned int i = 0; i < stmts.size(); ++i ) for ( const auto& s : stmts )
{ {
const auto& s = stmts[i];
if ( s->Tag() == STMT_INIT ) if ( s->Tag() == STMT_INIT )
{ {
auto init = s->AsInitStmt(); auto init = s->AsInitStmt();
@ -94,7 +92,7 @@ bool UseDefs::RemoveUnused(int iter)
! CheckIfUnused(s, id.get(), false) ) ! CheckIfUnused(s, id.get(), false) )
used_ids.emplace_back(id); used_ids.emplace_back(id);
if ( used_ids.size() == 0 ) if ( used_ids.empty() )
{ // There aren't any ID's to keep. { // There aren't any ID's to keep.
rc->AddStmtToOmit(s); rc->AddStmtToOmit(s);
continue; continue;

1060
src/script_opt/ZAM/AM-Opt.cc Normal file

File diff suppressed because it is too large


@ -0,0 +1,171 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Methods for dealing with ZAM branches.
#include "zeek/Reporter.h"
#include "zeek/Desc.h"
#include "zeek/script_opt/ZAM/Compile.h"
namespace zeek::detail {
void ZAMCompiler::PushGoTos(GoToSets& gotos)
{
gotos.push_back({});
}
void ZAMCompiler::ResolveGoTos(GoToSets& gotos, const InstLabel l)
{
for ( auto& gi : gotos.back() )
SetGoTo(gi, l);
gotos.pop_back();
}
ZAMStmt ZAMCompiler::GenGoTo(GoToSet& v)
{
auto g = GoToStub();
v.push_back(g.stmt_num);
return g;
}
ZAMStmt ZAMCompiler::GoToStub()
{
ZInstI z(OP_GOTO_V, 0);
z.op_type = OP_V_I1;
return AddInst(z);
}
ZAMStmt ZAMCompiler::GoTo(const InstLabel l)
{
ZInstI inst(OP_GOTO_V, 0);
inst.target = l;
inst.target_slot = 1;
inst.op_type = OP_V_I1;
return AddInst(inst);
}
InstLabel ZAMCompiler::GoToTarget(const ZAMStmt s)
{
return insts1[s.stmt_num];
}
InstLabel ZAMCompiler::GoToTargetBeyond(const ZAMStmt s)
{
int n = s.stmt_num;
if ( n == int(insts1.size()) - 1 )
{
if ( ! pending_inst )
pending_inst = new ZInstI();
return pending_inst;
}
return insts1[n+1];
}
void ZAMCompiler::SetTarget(ZInstI* inst, const InstLabel l, int slot)
{
inst->target = l;
inst->target_slot = slot;
}
ZInstI* ZAMCompiler::FindLiveTarget(ZInstI* goto_target)
{
if ( goto_target == pending_inst )
return goto_target;
int idx = goto_target->inst_num;
ASSERT(idx >= 0 && idx <= insts1.size());
while ( idx < int(insts1.size()) && ! insts1[idx]->live )
++idx;
if ( idx == int(insts1.size()) )
return pending_inst;
else
return insts1[idx];
}
void ZAMCompiler::ConcretizeBranch(ZInstI* inst, ZInstI* target,
int target_slot)
{
int t; // instruction number of target
if ( target == pending_inst )
{
if ( insts2.empty() )
// We're doing this in the context of concretizing
// intermediary instructions for dumping them out.
t = insts1.size();
else
t = insts2.size();
}
else
t = target->inst_num;
switch ( target_slot ) {
case 1: inst->v1 = t; break;
case 2: inst->v2 = t; break;
case 3: inst->v3 = t; break;
case 4: inst->v4 = t; break;
default:
reporter->InternalError("bad GoTo target");
}
}
void ZAMCompiler::SetV1(ZAMStmt s, const InstLabel l)
{
auto inst = insts1[s.stmt_num];
SetTarget(inst, l, 1);
ASSERT(inst->op_type == OP_V || inst->op_type == OP_V_I1);
inst->op_type = OP_V_I1;
}
void ZAMCompiler::SetV2(ZAMStmt s, const InstLabel l)
{
auto inst = insts1[s.stmt_num];
SetTarget(inst, l, 2);
auto& ot = inst->op_type;
if ( ot == OP_VV )
ot = OP_VV_I2;
else if ( ot == OP_VC || ot == OP_VVC )
ot = OP_VVC_I2;
else
ASSERT(ot == OP_VV_I2 || ot == OP_VV_I1_I2 || ot == OP_VVC_I2);
}
void ZAMCompiler::SetV3(ZAMStmt s, const InstLabel l)
{
auto inst = insts1[s.stmt_num];
SetTarget(inst, l, 3);
auto ot = inst->op_type;
if ( ot == OP_VVV_I2_I3 || ot == OP_VVVC_I3 )
return;
ASSERT(ot == OP_VV || ot == OP_VVV || ot == OP_VVV_I3);
inst->op_type = OP_VVV_I3;
}
void ZAMCompiler::SetV4(ZAMStmt s, const InstLabel l)
{
auto inst = insts1[s.stmt_num];
SetTarget(inst, l, 4);
auto ot = inst->op_type;
ASSERT(ot == OP_VVVV || ot == OP_VVVV_I4);
if ( ot != OP_VVVV_I4 )
inst->op_type = OP_VVVV_I4;
}
} // zeek::detail
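
Reviewer's sketch of the forward scan that FindLiveTarget performs above. It is a minimal illustration, not the Zeek implementation: `Inst` and `find_live_target` are hypothetical names, and `Inst` carries only the two fields the scan consults. The placeholder instruction is passed in explicitly rather than read from a `pending_inst` member.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, pared-down stand-in for ZInstI.
struct Inst {
    int inst_num;  // position of the instruction in the vector
    bool live;     // false once the instruction has been killed
};

// Returns the first live instruction at or after "target", or "pending"
// (the past-the-end placeholder) if every remaining instruction is dead.
Inst* find_live_target(const std::vector<Inst*>& insts, Inst* target,
                       Inst* pending)
{
    if (target == pending)
        return target;

    int n = static_cast<int>(insts.size());
    int idx = target->inst_num;
    assert(idx >= 0 && idx <= n);

    // Skip over dead instructions.
    while (idx < n && !insts[idx]->live)
        ++idx;

    return idx == n ? pending : insts[idx];
}
```

A branch aimed at a dead instruction is thus redirected to the next live one, and a branch with nothing live after it lands on the placeholder, which ConcretizeBranch later turns into a concrete instruction number.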

@ -0,0 +1,447 @@
// See the file "COPYING" in the main distribution directory for copyright.
// ZAM methods associated with instructions that replace calls to
// built-in functions.
#include "zeek/Func.h"
#include "zeek/Reporter.h"
#include "zeek/script_opt/ZAM/Compile.h"
namespace zeek::detail {
bool ZAMCompiler::IsZAM_BuiltIn(const Expr* e)
{
// The expression e is either directly a call (in which case there's
// no return value), or an assignment to a call.
const CallExpr* c;
if ( e->Tag() == EXPR_CALL )
c = e->AsCallExpr();
else
c = e->GetOp2()->AsCallExpr();
auto func_expr = c->Func();
if ( func_expr->Tag() != EXPR_NAME )
// An indirect call.
return false;
auto func_val = func_expr->AsNameExpr()->Id()->GetVal();
if ( ! func_val )
// A call to a function that hasn't been defined.
return false;
auto func = func_val->AsFunc();
if ( func->GetKind() != BuiltinFunc::BUILTIN_FUNC )
return false;
auto& args = c->Args()->Exprs();
const NameExpr* n = nullptr; // name to assign to, if any
if ( e->Tag() != EXPR_CALL )
n = e->GetOp1()->AsRefExpr()->GetOp1()->AsNameExpr();
using GenBuiltIn = bool (ZAMCompiler::*)(const NameExpr* n,
const ExprPList& args);
static std::vector<std::pair<const char*, GenBuiltIn>> builtins = {
{ "Analyzer::__name", &ZAMCompiler::BuiltIn_Analyzer__name },
{ "Broker::__flush_logs",
&ZAMCompiler::BuiltIn_Broker__flush_logs },
{ "Files::__enable_reassembly",
&ZAMCompiler::BuiltIn_Files__enable_reassembly },
{ "Files::__set_reassembly_buffer",
&ZAMCompiler::BuiltIn_Files__set_reassembly_buffer },
{ "Log::__write", &ZAMCompiler::BuiltIn_Log__write },
{ "current_time", &ZAMCompiler::BuiltIn_current_time },
{ "get_port_transport_proto",
&ZAMCompiler::BuiltIn_get_port_etc },
{ "network_time", &ZAMCompiler::BuiltIn_network_time },
{ "reading_live_traffic",
&ZAMCompiler::BuiltIn_reading_live_traffic },
{ "reading_traces", &ZAMCompiler::BuiltIn_reading_traces },
{ "strstr", &ZAMCompiler::BuiltIn_strstr },
{ "sub_bytes", &ZAMCompiler::BuiltIn_sub_bytes },
{ "to_lower", &ZAMCompiler::BuiltIn_to_lower },
};
for ( auto& b : builtins )
if ( util::streq(func->Name(), b.first) )
return (this->*(b.second))(n, args);
return false;
}
bool ZAMCompiler::BuiltIn_Analyzer__name(const NameExpr* n,
const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
if ( args[0]->Tag() == EXPR_CONST )
// Doesn't seem worth developing a variant for this weird
// usage case.
return false;
int nslot = Frame1Slot(n, OP1_WRITE);
auto arg_t = args[0]->AsNameExpr();
auto z = ZInstI(OP_ANALYZER__NAME_VV, nslot, FrameSlot(arg_t));
z.SetType(args[0]->GetType());
AddInst(z);
return true;
}
bool ZAMCompiler::BuiltIn_Broker__flush_logs(const NameExpr* n,
const ExprPList& args)
{
if ( n )
AddInst(ZInstI(OP_BROKER_FLUSH_LOGS_V,
Frame1Slot(n, OP1_WRITE)));
else
AddInst(ZInstI(OP_BROKER_FLUSH_LOGS_X));
return true;
}
bool ZAMCompiler::BuiltIn_Files__enable_reassembly(const NameExpr* n,
const ExprPList& args)
{
if ( n )
// While this built-in nominally returns a value, existing
// script code ignores it, so for now we don't bother
// special-casing the possibility that the value is used.
return false;
if ( args[0]->Tag() == EXPR_CONST )
// Weird!
return false;
auto arg_f = args[0]->AsNameExpr();
AddInst(ZInstI(OP_FILES__ENABLE_REASSEMBLY_V, FrameSlot(arg_f)));
return true;
}
bool ZAMCompiler::BuiltIn_Files__set_reassembly_buffer(const NameExpr* n,
const ExprPList& args)
{
if ( n )
// See above for enable_reassembly
return false;
if ( args[0]->Tag() == EXPR_CONST )
// Weird!
return false;
auto arg_f = FrameSlot(args[0]->AsNameExpr());
ZInstI z;
if ( args[1]->Tag() == EXPR_CONST )
{
auto arg_cnt = args[1]->AsConstExpr()->Value()->AsCount();
z = ZInstI(OP_FILES__SET_REASSEMBLY_BUFFER_VC, arg_f, arg_cnt);
z.op_type = OP_VV_I2;
}
else
z = ZInstI(OP_FILES__SET_REASSEMBLY_BUFFER_VV, arg_f,
FrameSlot(args[1]->AsNameExpr()));
AddInst(z);
return true;
}
bool ZAMCompiler::BuiltIn_Log__write(const NameExpr* n, const ExprPList& args)
{
auto id = args[0];
auto columns = args[1];
if ( columns->Tag() != EXPR_NAME )
return false;
auto columns_n = columns->AsNameExpr();
auto col_slot = FrameSlot(columns_n);
bool const_id = (id->Tag() == EXPR_CONST);
ZInstAux* aux = nullptr;
if ( const_id )
{
aux = new ZInstAux(1);
aux->Add(0, id->AsConstExpr()->ValuePtr());
}
ZInstI z;
if ( n )
{
int nslot = Frame1Slot(n, OP1_WRITE);
if ( const_id )
{
z = ZInstI(OP_LOG_WRITEC_VV, nslot, col_slot);
z.aux = aux;
}
else
z = ZInstI(OP_LOG_WRITE_VVV, nslot,
FrameSlot(id->AsNameExpr()), col_slot);
}
else
{
if ( const_id )
{
z = ZInstI(OP_LOG_WRITEC_V, col_slot, id->AsConstExpr());
z.aux = aux;
}
else
z = ZInstI(OP_LOG_WRITE_VV, FrameSlot(id->AsNameExpr()),
col_slot);
}
z.SetType(columns_n->GetType());
AddInst(z);
return true;
}
bool ZAMCompiler::BuiltIn_current_time(const NameExpr* n, const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
int nslot = Frame1Slot(n, OP1_WRITE);
AddInst(ZInstI(OP_CURRENT_TIME_V, nslot));
return true;
}
bool ZAMCompiler::BuiltIn_get_port_etc(const NameExpr* n, const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
auto p = args[0];
if ( p->Tag() != EXPR_NAME )
return false;
auto pn = p->AsNameExpr();
int nslot = Frame1Slot(n, OP1_WRITE);
AddInst(ZInstI(OP_GET_PORT_TRANSPORT_PROTO_VV, nslot, FrameSlot(pn)));
return true;
}
bool ZAMCompiler::BuiltIn_network_time(const NameExpr* n, const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
int nslot = Frame1Slot(n, OP1_WRITE);
AddInst(ZInstI(OP_NETWORK_TIME_V, nslot));
return true;
}
bool ZAMCompiler::BuiltIn_reading_live_traffic(const NameExpr* n,
const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
int nslot = Frame1Slot(n, OP1_WRITE);
AddInst(ZInstI(OP_READING_LIVE_TRAFFIC_V, nslot));
return true;
}
bool ZAMCompiler::BuiltIn_reading_traces(const NameExpr* n,
const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
int nslot = Frame1Slot(n, OP1_WRITE);
AddInst(ZInstI(OP_READING_TRACES_V, nslot));
return true;
}
bool ZAMCompiler::BuiltIn_strstr(const NameExpr* n, const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
auto big = args[0];
auto little = args[1];
auto big_n = big->Tag() == EXPR_NAME ? big->AsNameExpr() : nullptr;
auto little_n = little->Tag() == EXPR_NAME ?
little->AsNameExpr() : nullptr;
ZInstI z;
if ( big_n && little_n )
z = GenInst(OP_STRSTR_VVV, n, big_n, little_n);
else if ( big_n )
z = GenInst(OP_STRSTR_VVC, n, big_n, little->AsConstExpr());
else if ( little_n )
z = GenInst(OP_STRSTR_VCV, n, little_n, big->AsConstExpr());
else
return false;
AddInst(z);
return true;
}
bool ZAMCompiler::BuiltIn_sub_bytes(const NameExpr* n, const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
auto arg_s = args[0];
auto arg_start = args[1];
auto arg_n = args[2];
int nslot = Frame1Slot(n, OP1_WRITE);
int v2 = FrameSlotIfName(arg_s);
int v3 = ConvertToCount(arg_start);
int v4 = ConvertToInt(arg_n);
auto c = arg_s->Tag() == EXPR_CONST ? arg_s->AsConstExpr() : nullptr;
ZInstI z;
switch ( ConstArgsMask(args, 3) ) {
case 0x0: // all variable
z = ZInstI(OP_SUB_BYTES_VVVV, nslot, v2, v3, v4);
z.op_type = OP_VVVV;
break;
case 0x1: // last argument a constant
z = ZInstI(OP_SUB_BYTES_VVVi, nslot, v2, v3, v4);
z.op_type = OP_VVVV_I4;
break;
case 0x2: // 2nd argument a constant; flip!
z = ZInstI(OP_SUB_BYTES_VViV, nslot, v2, v4, v3);
z.op_type = OP_VVVV_I4;
break;
case 0x3: // both 2nd and third are constants
z = ZInstI(OP_SUB_BYTES_VVii, nslot, v2, v3, v4);
z.op_type = OP_VVVV_I3_I4;
break;
case 0x4: // first argument a constant
z = ZInstI(OP_SUB_BYTES_VVVC, nslot, v3, v4, c);
z.op_type = OP_VVVC;
break;
case 0x5: // first and third constant
z = ZInstI(OP_SUB_BYTES_VViC, nslot, v3, v4, c);
z.op_type = OP_VVVC_I3;
break;
case 0x6: // first and second constant - flip!
z = ZInstI(OP_SUB_BYTES_ViVC, nslot, v4, v3, c);
z.op_type = OP_VVVC_I3;
break;
case 0x7: // whole shebang
z = ZInstI(OP_SUB_BYTES_ViiC, nslot, v3, v4, c);
z.op_type = OP_VVVC_I2_I3;
break;
default:
reporter->InternalError("bad constant mask");
}
AddInst(z);
return true;
}
bool ZAMCompiler::BuiltIn_to_lower(const NameExpr* n, const ExprPList& args)
{
if ( ! n )
{
reporter->Warning("return value from built-in function ignored");
return true;
}
int nslot = Frame1Slot(n, OP1_WRITE);
if ( args[0]->Tag() == EXPR_CONST )
{
auto arg_c = args[0]->AsConstExpr()->Value()->AsStringVal();
ValPtr arg_lc = {AdoptRef{}, ZAM_to_lower(arg_c)};
auto arg_lce = make_intrusive<ConstExpr>(arg_lc);
auto z = ZInstI(OP_ASSIGN_CONST_VC, nslot, arg_lce.get());
z.is_managed = true;
AddInst(z);
}
else
{
auto arg_s = args[0]->AsNameExpr();
AddInst(ZInstI(OP_TO_LOWER_VV, nslot, FrameSlot(arg_s)));
}
return true;
}
bro_uint_t ZAMCompiler::ConstArgsMask(const ExprPList& args, int nargs) const
{
ASSERT(args.length() == nargs);
bro_uint_t mask = 0;
for ( int i = 0; i < nargs; ++i )
{
mask <<= 1;
if ( args[i]->Tag() == EXPR_CONST )
mask |= 1;
}
return mask;
}
} // zeek::detail
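
Reviewer's sketch of the bit layout ConstArgsMask produces, which the `switch` in BuiltIn_sub_bytes relies on. This is a simplified illustration: `const_args_mask` is a hypothetical name, and each argument is reduced to a boolean standing in for `args[i]->Tag() == EXPR_CONST`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds a mask with one bit per argument, set when that argument is a
// constant. The first argument ends up in the highest-order bit used
// and the last argument in bit 0.
std::uint64_t const_args_mask(const std::vector<bool>& is_const, int nargs)
{
    assert(static_cast<int>(is_const.size()) == nargs);

    std::uint64_t mask = 0;
    for (int i = 0; i < nargs; ++i) {
        mask <<= 1;  // earlier arguments shift toward higher bits
        if (is_const[i])
            mask |= 1;
    }
    return mask;
}
```

With three arguments, a constant in the last position alone yields 0x1 and a constant in the first position alone yields 0x4, matching the case labels in BuiltIn_sub_bytes.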

@ -0,0 +1,27 @@
// See the file "COPYING" in the main distribution directory for copyright.
// ZAM compiler method declarations for built-in functions.
//
// This file is only included by Compile.h, in the context of the ZAMCompiler
// class declaration (so these are methods, not standalone functions). We maintain
// it separately so that the conceptual overhead of adding a new built-in
// is lower.
// If the given expression corresponds to a call to a ZAM built-in,
// then compiles the call and returns true. Otherwise, returns false.
bool IsZAM_BuiltIn(const Expr* e);
// Built-ins return true if able to compile the call, false if not.
bool BuiltIn_Analyzer__name(const NameExpr* n, const ExprPList& args);
bool BuiltIn_Broker__flush_logs(const NameExpr* n, const ExprPList& args);
bool BuiltIn_Files__enable_reassembly(const NameExpr* n, const ExprPList& args);
bool BuiltIn_Files__set_reassembly_buffer(const NameExpr* n, const ExprPList& args);
bool BuiltIn_Log__write(const NameExpr* n, const ExprPList& args);
bool BuiltIn_current_time(const NameExpr* n, const ExprPList& args);
bool BuiltIn_get_port_etc(const NameExpr* n, const ExprPList& args);
bool BuiltIn_network_time(const NameExpr* n, const ExprPList& args);
bool BuiltIn_reading_live_traffic(const NameExpr* n, const ExprPList& args);
bool BuiltIn_reading_traces(const NameExpr* n, const ExprPList& args);
bool BuiltIn_strstr(const NameExpr* n, const ExprPList& args);
bool BuiltIn_sub_bytes(const NameExpr* n, const ExprPList& args);
bool BuiltIn_to_lower(const NameExpr* n, const ExprPList& args);
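
Reviewer's sketch of how declarations like the ones above can be dispatched by name through a table of pointers to member functions, in the manner of IsZAM_BuiltIn. Everything here is hypothetical: `MiniCompiler`, its two handlers, and the `int` argument are placeholders, not part of Zeek.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature compiler with two "built-in" handlers.
class MiniCompiler {
public:
    // Returns true if "name" matched a handler and that handler succeeded.
    bool Dispatch(const char* name, int arg)
    {
        assert(name != nullptr);

        using Handler = bool (MiniCompiler::*)(int);
        static const std::vector<std::pair<const char*, Handler>> handlers = {
            {"to_lower", &MiniCompiler::HandleToLower},
            {"strstr", &MiniCompiler::HandleStrstr},
        };

        for (const auto& h : handlers)
            if (std::strcmp(name, h.first) == 0)
                return (this->*(h.second))(arg);

        return false;  // not a recognized built-in
    }

    std::string log;  // records which handlers ran

private:
    bool HandleToLower(int arg)
    {
        log += "to_lower:" + std::to_string(arg);
        return true;
    }

    // Declines (returns false) for arguments it cannot handle.
    bool HandleStrstr(int arg)
    {
        log += "strstr";
        return arg > 0;
    }
};
```

A handler returning false has the same effect as an unknown name: the caller falls back to its general path. That mirrors the "true if able to compile the call, false if not" contract stated for the built-ins.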

@ -0,0 +1,639 @@
// See the file "COPYING" in the main distribution directory for copyright.
// ZAM: Zeek Abstract Machine compiler.
#pragma once
#include "zeek/Event.h"
#include "zeek/script_opt/UseDefs.h"
#include "zeek/script_opt/ZAM/ZBody.h"
namespace zeek {
class EventHandler;
}
namespace zeek::detail {
class NameExpr;
class ConstExpr;
class FieldExpr;
class ListExpr;
class Stmt;
class SwitchStmt;
class CatchReturnStmt;
class ProfileFunc;
using InstLabel = ZInstI*;
// Class representing a single compiled statement. (This is different from,
// but related to, the ZAM instruction(s) generated for that compilation.)
// Designed to be fully opaque, but also effective without requiring pointer
// management.
class ZAMStmt {
protected:
friend class ZAMCompiler;
ZAMStmt() { stmt_num = -1; /* flag that it needs to be set */ }
ZAMStmt(int _stmt_num) { stmt_num = _stmt_num; }
int stmt_num;
};
// Class that holds values that only have meaning to the ZAM compiler,
// but that needs to be held (opaquely, via a pointer) by external
// objects.
class OpaqueVals {
public:
OpaqueVals(ZInstAux* _aux) { aux = _aux; }
ZInstAux* aux;
};
class ZAMCompiler {
public:
ZAMCompiler(ScriptFunc* f, std::shared_ptr<ProfileFunc> pf,
ScopePtr scope, StmtPtr body, std::shared_ptr<UseDefs> ud,
std::shared_ptr<Reducer> rd);
StmtPtr CompileBody();
const FrameReMap& FrameDenizens() const
{ return shared_frame_denizens_final; }
const std::vector<int>& ManagedSlots() const
{ return managed_slotsI; }
const std::vector<GlobalInfo>& Globals() const
{ return globalsI; }
bool NonRecursive() const { return non_recursive; }
const TableIterVec& GetTableIters() const { return table_iters; }
int NumStepIters() const { return num_step_iters; }
template <typename T>
const CaseMaps<T>& GetCases() const
{
if constexpr ( std::is_same_v<T, bro_int_t> )
return int_cases;
else if constexpr ( std::is_same_v<T, bro_uint_t> )
return uint_cases;
else if constexpr ( std::is_same_v<T, double> )
return double_cases;
else if constexpr ( std::is_same_v<T, std::string> )
return str_cases;
}
void Dump();
private:
void Init();
void InitGlobals();
void InitArgs();
void InitLocals();
void TrackMemoryManagement();
void ResolveHookBreaks();
void ComputeLoopLevels();
void AdjustBranches();
void RetargetBranches();
void RemapFrameDenizens(const std::vector<int>& inst1_to_inst2);
void CreateSharedFrameDenizens();
void ConcretizeSwitches();
// The following are used for switch statements, mapping the
// switch value (which can be any atomic type) to a branch target.
// We have vectors of them because functions can contain multiple
// switches.
// See ZBody.h for their concrete counterparts, which we've
// already #include'd.
template<typename T> using CaseMapI = std::map<T, InstLabel>;
template<typename T> using CaseMapsI = std::vector<CaseMapI<T>>;
template <typename T>
void ConcretizeSwitchTables(const CaseMapsI<T>& abstract_cases,
CaseMaps<T>& concrete_cases);
template <typename T>
void DumpCases(const T& cases, const char* type_name) const;
void DumpInsts1(const FrameReMap* remappings);
#include "zeek/ZAM-MethodDecls.h"
const ZAMStmt CompileStmt(const StmtPtr& body)
{ return CompileStmt(body.get()); }
const ZAMStmt CompileStmt(const Stmt* body);
void SetCurrStmt(const Stmt* stmt) { curr_stmt = stmt; }
const ZAMStmt CompilePrint(const PrintStmt* ps);
const ZAMStmt CompileExpr(const ExprStmt* es);
const ZAMStmt CompileIf(const IfStmt* is);
const ZAMStmt CompileSwitch(const SwitchStmt* sw);
const ZAMStmt CompileAdd(const AddStmt* as);
const ZAMStmt CompileDel(const DelStmt* ds);
const ZAMStmt CompileWhile(const WhileStmt* ws);
const ZAMStmt CompileFor(const ForStmt* f);
const ZAMStmt CompileReturn(const ReturnStmt* r);
const ZAMStmt CompileCatchReturn(const CatchReturnStmt* cr);
const ZAMStmt CompileStmts(const StmtList* sl);
const ZAMStmt CompileInit(const InitStmt* is);
const ZAMStmt CompileWhen(const WhenStmt* ws);
const ZAMStmt CompileNext()
{ return GenGoTo(nexts.back()); }
const ZAMStmt CompileBreak()
{ return GenGoTo(breaks.back()); }
const ZAMStmt CompileFallThrough()
{ return GenGoTo(fallthroughs.back()); }
const ZAMStmt CompileCatchReturn()
{ return GenGoTo(catches.back()); }
const ZAMStmt IfElse(const Expr* e, const Stmt* s1, const Stmt* s2);
const ZAMStmt While(const Stmt* cond_stmt, const Expr* cond,
const Stmt* body);
const ZAMStmt InitRecord(IDPtr id, RecordType* rt);
const ZAMStmt InitVector(IDPtr id, VectorType* vt);
const ZAMStmt InitTable(IDPtr id, TableType* tt, Attributes* attrs);
const ZAMStmt ValueSwitch(const SwitchStmt* sw, const NameExpr* v,
const ConstExpr* c);
const ZAMStmt TypeSwitch(const SwitchStmt* sw, const NameExpr* v,
const ConstExpr* c);
void PushNexts() { PushGoTos(nexts); }
void PushBreaks() { PushGoTos(breaks); }
void PushFallThroughs() { PushGoTos(fallthroughs); }
void PushCatchReturns() { PushGoTos(catches); }
void ResolveNexts(const InstLabel l)
{ ResolveGoTos(nexts, l); }
void ResolveBreaks(const InstLabel l)
{ ResolveGoTos(breaks, l); }
void ResolveFallThroughs(const InstLabel l)
{ ResolveGoTos(fallthroughs, l); }
void ResolveCatchReturns(const InstLabel l)
{ ResolveGoTos(catches, l); }
const ZAMStmt LoopOverTable(const ForStmt* f, const NameExpr* val);
const ZAMStmt LoopOverVector(const ForStmt* f, const NameExpr* val);
const ZAMStmt LoopOverString(const ForStmt* f, const Expr* e);
const ZAMStmt FinishLoop(const ZAMStmt iter_head, ZInstI iter_stmt,
const Stmt* body, int iter_slot,
bool is_table);
const ZAMStmt Loop(const Stmt* body);
const ZAMStmt CompileExpr(const ExprPtr& e)
{ return CompileExpr(e.get()); }
const ZAMStmt CompileExpr(const Expr* body);
const ZAMStmt CompileIncrExpr(const IncrExpr* e);
const ZAMStmt CompileAppendToExpr(const AppendToExpr* e);
const ZAMStmt CompileAssignExpr(const AssignExpr* e);
const ZAMStmt CompileAssignToIndex(const NameExpr* lhs,
const IndexExpr* rhs);
const ZAMStmt CompileFieldLHSAssignExpr(const FieldLHSAssignExpr* e);
const ZAMStmt CompileScheduleExpr(const ScheduleExpr* e);
const ZAMStmt CompileSchedule(const NameExpr* n, const ConstExpr* c,
int is_interval, EventHandler* h,
const ListExpr* l);
const ZAMStmt CompileEvent(EventHandler* h, const ListExpr* l);
const ZAMStmt CompileInExpr(const NameExpr* n1, const NameExpr* n2,
const NameExpr* n3)
{ return CompileInExpr(n1, n2, nullptr, n3, nullptr); }
const ZAMStmt CompileInExpr(const NameExpr* n1, const NameExpr* n2,
const ConstExpr* c)
{ return CompileInExpr(n1, n2, nullptr, nullptr, c); }
const ZAMStmt CompileInExpr(const NameExpr* n1, const ConstExpr* c,
const NameExpr* n3)
{ return CompileInExpr(n1, nullptr, c, n3, nullptr); }
// In the following, one of n2 or c2 (likewise, n3/c3) will be nil.
const ZAMStmt CompileInExpr(const NameExpr* n1, const NameExpr* n2,
const ConstExpr* c2, const NameExpr* n3,
const ConstExpr* c3);
const ZAMStmt CompileInExpr(const NameExpr* n1, const ListExpr* l,
const NameExpr* n2)
{ return CompileInExpr(n1, l, n2, nullptr); }
const ZAMStmt CompileInExpr(const NameExpr* n, const ListExpr* l,
const ConstExpr* c)
{ return CompileInExpr(n, l, nullptr, c); }
const ZAMStmt CompileInExpr(const NameExpr* n1, const ListExpr* l,
const NameExpr* n2, const ConstExpr* c);
const ZAMStmt CompileIndex(const NameExpr* n1, const NameExpr* n2,
const ListExpr* l);
const ZAMStmt CompileIndex(const NameExpr* n1, const ConstExpr* c,
const ListExpr* l);
const ZAMStmt CompileIndex(const NameExpr* n1, int n2_slot,
const TypePtr& n2_type, const ListExpr* l);
// Second argument is which instruction slot holds the branch target.
const ZAMStmt GenCond(const Expr* e, int& branch_v);
const ZAMStmt Call(const ExprStmt* e);
const ZAMStmt AssignToCall(const ExprStmt* e);
const ZAMStmt DoCall(const CallExpr* c, const NameExpr* n);
const ZAMStmt AssignVecElems(const Expr* e);
const ZAMStmt AssignTableElem(const Expr* e);
const ZAMStmt AppendToField(const NameExpr* n1, const NameExpr* n2,
const ConstExpr* c, int offset);
const ZAMStmt ConstructTable(const NameExpr* n, const Expr* e);
const ZAMStmt ConstructSet(const NameExpr* n, const Expr* e);
const ZAMStmt ConstructRecord(const NameExpr* n, const Expr* e);
const ZAMStmt ConstructVector(const NameExpr* n, const Expr* e);
const ZAMStmt ArithCoerce(const NameExpr* n, const Expr* e);
const ZAMStmt RecordCoerce(const NameExpr* n, const Expr* e);
const ZAMStmt TableCoerce(const NameExpr* n, const Expr* e);
const ZAMStmt VectorCoerce(const NameExpr* n, const Expr* e);
const ZAMStmt Is(const NameExpr* n, const Expr* e);
#include "zeek/script_opt/ZAM/Inst-Gen.h"
#include "zeek/script_opt/ZAM/BuiltIn.h"
// A bit weird, but handy for switch statements used in built-in
// operations: returns a bit mask of which of the arguments in the
// given list correspond to constants, with the high-order bit
// being the first argument (argument "0" in the list) and the
// low-order bit being the last. Second parameter is the number
// of arguments that should be present.
bro_uint_t ConstArgsMask(const ExprPList& args, int nargs) const;
int ConvertToInt(const Expr* e)
{
if ( e->Tag() == EXPR_NAME )
return FrameSlot(e->AsNameExpr()->Id());
else
return e->AsConstExpr()->Value()->AsInt();
}
int ConvertToCount(const Expr* e)
{
if ( e->Tag() == EXPR_NAME )
return FrameSlot(e->AsNameExpr()->Id());
else
return e->AsConstExpr()->Value()->AsCount();
}
using GoToSet = std::vector<ZAMStmt>;
using GoToSets = std::vector<GoToSet>;
void PushGoTos(GoToSets& gotos);
void ResolveGoTos(GoToSets& gotos, const InstLabel l);
ZAMStmt GenGoTo(GoToSet& v);
ZAMStmt GoToStub();
ZAMStmt GoTo(const InstLabel l);
InstLabel GoToTarget(const ZAMStmt s);
InstLabel GoToTargetBeyond(const ZAMStmt s);
void SetTarget(ZInstI* inst, const InstLabel l, int slot);
// Given a GoTo target, find its live equivalent (first instruction
// at that location or beyond that's live).
ZInstI* FindLiveTarget(ZInstI* goto_target);
// Given an instruction that has a slot associated with the
// given target, updates the slot to correspond with the current
// instruction number of the target.
void ConcretizeBranch(ZInstI* inst, ZInstI* target, int target_slot);
void SetV(ZAMStmt s, const InstLabel l, int v)
{
if ( v == 1 )
SetV1(s, l);
else if ( v == 2 )
SetV2(s, l);
else if ( v == 3 )
SetV3(s, l);
else
SetV4(s, l);
}
void SetV1(ZAMStmt s, const InstLabel l);
void SetV2(ZAMStmt s, const InstLabel l);
void SetV3(ZAMStmt s, const InstLabel l);
void SetV4(ZAMStmt s, const InstLabel l);
void SetGoTo(ZAMStmt s, const InstLabel targ)
{ SetV1(s, targ); }
const ZAMStmt StartingBlock();
const ZAMStmt FinishBlock(const ZAMStmt start);
bool NullStmtOK() const;
const ZAMStmt EmptyStmt();
const ZAMStmt ErrorStmt();
const ZAMStmt LastInst();
// Returns a handle to state associated with building
// up a list of values.
OpaqueVals* BuildVals(const ListExprPtr&);
// "stride" is how many slots each element of l will consume.
ZInstAux* InternalBuildVals(const ListExpr* l, int stride = 1);
// Returns how many values were added.
int InternalAddVal(ZInstAux* zi, int i, Expr* e);
const ZAMStmt AddInst(const ZInstI& inst);
// Returns the statement just before the given one.
ZAMStmt PrevStmt(const ZAMStmt s);
// Returns the last (interpreter) statement in the body.
const Stmt* LastStmt(const Stmt* s) const;
// Returns the most recently added instruction *other* than those
// added for bookkeeping.
ZInstI* TopMainInst() { return insts1[top_main_inst]; }
bool IsUnused(const IDPtr& id, const Stmt* where) const;
void LoadParam(ID* id);
const ZAMStmt LoadGlobal(ID* id);
int AddToFrame(ID*);
int FrameSlot(const IDPtr& id) { return FrameSlot(id.get()); }
int FrameSlot(const ID* id);
int FrameSlotIfName(const Expr* e)
{
auto n = e->Tag() == EXPR_NAME ? e->AsNameExpr() : nullptr;
return n ? FrameSlot(n->Id()) : 0;
}
int FrameSlot(const NameExpr* id)
{ return FrameSlot(id->AsNameExpr()->Id()); }
int Frame1Slot(const NameExpr* id, ZOp op)
{ return Frame1Slot(id->AsNameExpr()->Id(), op); }
int Frame1Slot(const ID* id, ZOp op)
{ return Frame1Slot(id, op1_flavor[op]); }
int Frame1Slot(const NameExpr* n, ZAMOp1Flavor fl)
{ return Frame1Slot(n->Id(), fl); }
int Frame1Slot(const ID* id, ZAMOp1Flavor fl);
// The slot without doing any global-related checking.
int RawSlot(const NameExpr* n) { return RawSlot(n->Id()); }
int RawSlot(const ID* id);
bool HasFrameSlot(const ID* id) const;
int NewSlot(const TypePtr& t)
{ return NewSlot(ZVal::IsManagedType(t)); }
int NewSlot(bool is_managed);
int TempForConst(const ConstExpr* c);
////////////////////////////////////////////////////////////
// The following methods relate to optimizing the low-level
// ZAM function body after it is initially generated. They're
// factored out into ZOpt.cc since they're structurally quite
// different from the methods above that relate to the initial
// compilation.
// Optimizing the low-level compiled instructions.
void OptimizeInsts();
// Tracks which instructions can be branched to via the given
// set of switches.
template<typename T>
void TallySwitchTargets(const CaseMapsI<T>& switches);
// Remove code that can't be reached. True if some removal happened.
bool RemoveDeadCode();
// Collapse chains of gotos. True if something changed.
bool CollapseGoTos();
// Prune statements that are unnecessary. True if something got
// pruned.
bool PruneUnused();
// For the current state of insts1, compute lifetimes of frame
// denizens (variable(s) using a given frame slot) in terms of
// first-instruction-to-last-instruction during which they're
// relevant, including consideration for loops.
void ComputeFrameLifetimes();
// Given final frame lifetime information, remaps frame members
// with non-overlapping lifetimes to share slots.
void ReMapFrame();
// Given final frame lifetime information, remaps slots in the
// interpreter frame. (No longer strictly necessary.)
void ReMapInterpreterFrame();
// Computes the remapping for a variable currently in the given slot,
// whose scope begins at the given instruction.
void ReMapVar(ID* id, int slot, int inst);
// Look to initialize the beginning of local lifetime based on slot
// assignment at instruction inst.
void CheckSlotAssignment(int slot, const ZInstI* inst);
// Track that a local's lifetime begins at the given statement.
void SetLifetimeStart(int slot, const ZInstI* inst);
// Look for extension of local lifetime based on slot usage
// at instruction inst.
void CheckSlotUse(int slot, const ZInstI* inst);
// Extend (or create) the end of a local's lifetime.
void ExtendLifetime(int slot, const ZInstI* inst);
// Returns the (live) instruction at the beginning/end of the loop(s)
// within which the given instruction lies; or that instruction
// itself if it's not inside a loop. The second argument specifies
// the loop depth. For example, a value of '2' means "extend to
// the beginning/end of any loop(s) of depth >= 2".
const ZInstI* BeginningOfLoop(const ZInstI* inst, int depth) const;
const ZInstI* EndOfLoop(const ZInstI* inst, int depth) const;
// True if any statement other than a frame sync assigns to the
// given slot.
bool VarIsAssigned(int slot) const;
// True if the given statement assigns to the given slot, and
// it's not a frame sync.
bool VarIsAssigned(int slot, const ZInstI* i) const;
// True if any statement other than a frame sync uses the given slot.
bool VarIsUsed(int slot) const;
// Find the first non-dead instruction after i (inclusive).
// If follow_gotos is true, then if that instruction is
// an unconditional branch, continues the process until
// a different instruction is found (and reports if there
// are infinite loops).
//
// First form returns nil if there's nothing live after i.
// Second form returns insts1.size() in that case.
ZInstI* FirstLiveInst(ZInstI* i, bool follow_gotos = false);
int FirstLiveInst(int i, bool follow_gotos = false);
// Same, but not including i.
ZInstI* NextLiveInst(ZInstI* i, bool follow_gotos = false)
{
if ( i->inst_num == int(insts1.size()) - 1 )
return nullptr;
return FirstLiveInst(insts1[i->inst_num + 1], follow_gotos);
}
int NextLiveInst(int i, bool follow_gotos = false)
{ return FirstLiveInst(i + 1, follow_gotos); }
// Mark an instruction as unnecessary and remove its influence on
// other statements. The instruction is indicated as an offset
// into insts1; any labels associated with it are transferred
// to its next live successor, if any.
void KillInst(ZInstI* i) { KillInst(i->inst_num); }
void KillInst(int i);
// The same, but kills any successor instructions until finding
// one that's labeled.
void KillInsts(ZInstI* i) { KillInsts(i->inst_num); }
void KillInsts(int i);
// The first of these is used as we compile down to ZInstI's.
// The second is the final intermediary code. They're separate
// to make it easy to remove dead code.
std::vector<ZInstI*> insts1;
std::vector<ZInstI*> insts2;
// Used as a placeholder when we have to generate a GoTo target
// beyond the end of what we've compiled so far.
ZInstI* pending_inst = nullptr;
// Indices of break/next/fallthrough/catch-return goto's, so they
// can be patched up post-facto. These are vectors-of-vectors
// so that nesting works properly.
GoToSets breaks;
GoToSets nexts;
GoToSets fallthroughs;
GoToSets catches;
// The following tracks return variables for catch-returns.
// Can be nil if the usage doesn't include using the return value
// (and/or no return value generated).
std::vector<const NameExpr*> retvars;
ScriptFunc* func;
std::shared_ptr<ProfileFunc> pf;
ScopePtr scope;
StmtPtr body;
std::shared_ptr<UseDefs> ud;
std::shared_ptr<Reducer> reducer;
// Maps identifiers to their (unique) frame location.
std::unordered_map<const ID*, int> frame_layout1;
// Inverse mapping, used for tracking frame usage (and for dumping
// statements).
FrameMap frame_denizens;
// The same, but for remapping identifiers to shared frame slots.
FrameReMap shared_frame_denizens;
// The same, but renumbered to take into account removal of
// dead statements.
FrameReMap shared_frame_denizens_final;
// Maps frame1 slots to frame2 slots. A value < 0 means the
// variable doesn't exist in frame2 - it's an error to encounter
// one of these when remapping instructions!
std::vector<int> frame1_to_frame2;
// A type for mapping an instruction to a set of locals associated
// with it.
using AssociatedLocals =
std::unordered_map<const ZInstI*, std::unordered_set<ID*>>;
// Maps (live) instructions to which frame denizens begin their
// lifetime via an initialization at that instruction, if any ...
// (it can be more than one local due to extending lifetimes to
// span loop bodies)
AssociatedLocals inst_beginnings;
// ... and which frame denizens had their last usage at the
// given instruction. (These are insts1 instructions, prior to
// removing dead instructions, compressing the frames, etc.)
AssociatedLocals inst_endings;
// A type for inverse mappings.
using AssociatedInsts = std::unordered_map<int, const ZInstI*>;
// Inverse mappings: for a given frame denizen's slot, where its
// lifetime begins and ends.
AssociatedInsts denizen_beginning;
AssociatedInsts denizen_ending;
// In the following, member variables ending in 'I' are intermediary
// values that get finalized when constructing the corresponding
// ZBody.
std::vector<GlobalInfo> globalsI;
std::unordered_map<const ID*, int> global_id_to_info; // inverse
// Intermediary switch tables (branching to ZInst's rather
// than concrete instruction offsets).
CaseMapsI<bro_int_t> int_casesI;
CaseMapsI<bro_uint_t> uint_casesI;
CaseMapsI<double> double_casesI;
// Note, we use this not only for strings but for addresses
// and prefixes.
CaseMapsI<std::string> str_casesI;
// Same, but for the concretized versions.
CaseMaps<bro_int_t> int_cases;
CaseMaps<bro_uint_t> uint_cases;
CaseMaps<double> double_cases;
CaseMaps<std::string> str_cases;
std::vector<int> managed_slotsI;
int frame_sizeI;
TableIterVec table_iters;
int num_step_iters = 0;
bool non_recursive = false;
// Most recent instruction, other than for housekeeping.
int top_main_inst;
// Used for communication between Frame1Slot and a subsequent
// AddInst. If >= 0, then upon adding the next instruction,
// it should be followed by Store-Global for the given slot.
int pending_global_store = -1;
};
// Invoked after compiling all of the function bodies.
class FuncInfo;
extern void finalize_functions(const std::vector<FuncInfo>& funcs);
} // namespace zeek::detail
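
Reviewer's sketch of why the break/next/fallthrough/catch-return bookkeeping above is a vector of vectors. It assumes a push-on-entry, resolve-on-exit discipline; the actual bodies of PushGoTos, GenGoTo and ResolveGoTos are not part of this hunk, so `Patcher` and its methods are hypothetical. A "goto" here is simply an index into `targets`, where -1 means not yet resolved.

```cpp
#include <cassert>
#include <vector>

using GoToSet = std::vector<int>;
using GoToSets = std::vector<GoToSet>;

struct Patcher {
    std::vector<int> targets;  // resolved label per emitted goto, -1 if none
    GoToSets breaks;           // one set per enclosing loop

    // Entering a loop: start a fresh set of pending breaks.
    void Push() { breaks.emplace_back(); }

    // Emit an unresolved goto and remember it in the innermost set.
    int GenBreak()
    {
        assert(!breaks.empty());
        targets.push_back(-1);
        int idx = static_cast<int>(targets.size()) - 1;
        breaks.back().push_back(idx);
        return idx;
    }

    // Leaving a loop: patch only the innermost set, then discard it.
    void Resolve(int label)
    {
        assert(!breaks.empty());
        for (int idx : breaks.back())
            targets[idx] = label;
        breaks.pop_back();
    }
};
```

Because each nesting level has its own set, resolving an inner loop leaves the outer loop's pending breaks untouched until the outer loop itself finishes.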

@ -0,0 +1,493 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Driver (and other high-level) methods for ZAM compilation.
#include "zeek/CompHash.h"
#include "zeek/RE.h"
#include "zeek/Frame.h"
#include "zeek/module_util.h"
#include "zeek/Scope.h"
#include "zeek/Reporter.h"
#include "zeek/script_opt/ScriptOpt.h"
#include "zeek/script_opt/ProfileFunc.h"
#include "zeek/script_opt/ZAM/Compile.h"
namespace zeek::detail {
ZAMCompiler::ZAMCompiler(ScriptFunc* f, std::shared_ptr<ProfileFunc> _pf,
ScopePtr _scope, StmtPtr _body,
std::shared_ptr<UseDefs> _ud,
std::shared_ptr<Reducer> _rd)
{
func = f;
pf = std::move(_pf);
scope = std::move(_scope);
body = std::move(_body);
ud = std::move(_ud);
reducer = std::move(_rd);
frame_sizeI = 0;
Init();
}
void ZAMCompiler::Init()
{
InitGlobals();
InitArgs();
InitLocals();
#if 0
// Complain about unused aggregates ... but not if we're inlining,
// as that can lead to optimizations where they wind up being unused
// but the original logic for using them was sound.
if ( ! analysis_options.inliner )
for ( auto a : pf->Inits() )
{
if ( pf->Locals().find(a) == pf->Locals().end() )
reporter->Warning("%s unused", a->Name());
}
#endif
TrackMemoryManagement();
non_recursive = non_recursive_funcs.count(func) > 0;
}
void ZAMCompiler::InitGlobals()
{
for ( auto g : pf->Globals() )
{
auto non_const_g = const_cast<ID*>(g);
GlobalInfo info;
info.id = {NewRef{}, non_const_g};
info.slot = AddToFrame(non_const_g);
global_id_to_info[non_const_g] = globalsI.size();
globalsI.push_back(info);
}
}
void ZAMCompiler::InitArgs()
{
auto uds = ud->HasUsage(body.get()) ? ud->GetUsage(body.get()) :
nullptr;
auto args = scope->OrderedVars();
int nparam = func->GetType()->Params()->NumFields();
push_existing_scope(scope);
for ( auto& a : args )
{
if ( --nparam < 0 )
break;
auto arg_id = a.get();
if ( uds && uds->HasID(arg_id) )
LoadParam(arg_id);
else
{
// printf("param %s unused\n", obj_desc(arg_id.get()));
}
}
pop_scope();
}
void ZAMCompiler::InitLocals()
{
// Assign slots for locals (which includes temporaries).
for ( auto l : pf->Locals() )
{
auto non_const_l = const_cast<ID*>(l);
// Don't add locals that were already added because they're
// parameters.
//
// Don't worry about unused variables, those will get
// removed during low-level ZAM optimization.
if ( ! HasFrameSlot(non_const_l) )
(void) AddToFrame(non_const_l);
}
}
void ZAMCompiler::TrackMemoryManagement()
{
for ( auto& slot : frame_layout1 )
{
// Look for locals with values of types for which
// we do explicit memory management on (re)assignment.
auto t = slot.first->GetType();
if ( ZVal::IsManagedType(t) )
managed_slotsI.push_back(slot.second);
}
}
StmtPtr ZAMCompiler::CompileBody()
{
curr_stmt = nullptr;
if ( func->Flavor() == FUNC_FLAVOR_HOOK )
PushBreaks();
(void) CompileStmt(body);
if ( reporter->Errors() > 0 )
return nullptr;
ResolveHookBreaks();
if ( ! nexts.empty() )
reporter->Error("\"next\" used without an enclosing \"for\"");
if ( ! fallthroughs.empty() )
reporter->Error("\"fallthrough\" used without an enclosing \"switch\"");
if ( ! catches.empty() )
reporter->InternalError("untargeted inline return");
// Make sure we have a (pseudo-)instruction at the end so we
// can use it as a branch label.
if ( ! pending_inst )
pending_inst = new ZInstI();
// Concretize instruction numbers in insts1 so we can
// easily move through the code.
for ( auto i = 0U; i < insts1.size(); ++i )
insts1[i]->inst_num = i;
ComputeLoopLevels();
if ( ! analysis_options.no_ZAM_opt )
OptimizeInsts();
AdjustBranches();
// Construct the final program with the dead code eliminated
// and branches resolved.
// Make sure we don't include the empty pending-instruction, if any.
if ( pending_inst )
pending_inst->live = false;
// Maps insts1 instructions to where they are in insts2.
// Dead instructions map to -1.
std::vector<int> inst1_to_inst2;
for ( auto& i1 : insts1 )
{
if ( i1->live )
{
inst1_to_inst2.push_back(insts2.size());
insts2.push_back(i1);
}
else
inst1_to_inst2.push_back(-1);
}
// Re-concretize instruction numbers, and concretize GoTo's.
for ( auto i = 0U; i < insts2.size(); ++i )
insts2[i]->inst_num = i;
RetargetBranches();
// If we have remapped frame denizens, update them. If not,
// create them.
if ( ! shared_frame_denizens.empty() )
RemapFrameDenizens(inst1_to_inst2);
else
CreateSharedFrameDenizens();
delete pending_inst;
ConcretizeSwitches();
// Could erase insts1 here to recover memory, but it's handy
// for debugging.
#if 0
if ( non_recursive )
func->UseStaticFrame();
#endif
auto zb = make_intrusive<ZBody>(func->Name(), this);
zb->SetInsts(insts2);
return zb;
}
void ZAMCompiler::ResolveHookBreaks()
{
if ( ! breaks.empty() )
{
ASSERT(breaks.size() == 1);
if ( func->Flavor() == FUNC_FLAVOR_HOOK )
{
// Rewrite the breaks.
for ( auto& b : breaks[0] )
{
auto& i = insts1[b.stmt_num];
delete i;
i = new ZInstI(OP_HOOK_BREAK_X);
}
}
else
reporter->Error("\"break\" used without an enclosing \"for\" or \"switch\"");
}
}
void ZAMCompiler::ComputeLoopLevels()
{
// Compute which instructions are inside loops.
for ( auto i = 0; i < int(insts1.size()); ++i )
{
auto inst = insts1[i];
auto t = inst->target;
if ( ! t || t == pending_inst )
continue;
if ( t->inst_num < i )
{
auto j = t->inst_num;
if ( ! t->loop_start )
{
// Loop is newly discovered.
t->loop_start = true;
}
else
{
// We're extending an existing loop. Find
// its current end.
auto depth = t->loop_depth;
while ( j < i &&
insts1[j]->loop_depth == depth )
++j;
ASSERT(insts1[j]->loop_depth == depth - 1);
}
// Run from j's current position to i, bumping
// the loop depth.
while ( j <= i )
{
++insts1[j]->loop_depth;
++j;
}
}
}
}
void ZAMCompiler::AdjustBranches()
{
// Move branches to dead code forward to their successor live code.
for ( auto& inst : insts1 )
{
if ( ! inst->live )
continue;
if ( auto t = inst->target )
inst->target = FindLiveTarget(t);
}
}
void ZAMCompiler::RetargetBranches()
{
for ( auto& inst : insts2 )
if ( inst->target )
ConcretizeBranch(inst, inst->target, inst->target_slot);
}
void ZAMCompiler::RemapFrameDenizens(const std::vector<int>& inst1_to_inst2)
{
for ( auto& info : shared_frame_denizens )
{
for ( auto& start : info.id_start )
{
// It can happen that the identifier's
// origination instruction was optimized
// away, if due to slot sharing it's of
// the form "slotX = slotX". In that
// case, look forward for the next viable
// instruction.
while ( start < int(insts1.size()) &&
inst1_to_inst2[start] == -1 )
++start;
ASSERT(start < insts1.size());
start = inst1_to_inst2[start];
}
shared_frame_denizens_final.push_back(info);
}
}
void ZAMCompiler::CreateSharedFrameDenizens()
{
for ( auto& fd : frame_denizens )
{
FrameSharingInfo info;
info.ids.push_back(fd);
info.id_start.push_back(0);
info.scope_end = insts2.size();
// The following doesn't matter since the value
// is only used during compiling, not during
// execution.
info.is_managed = false;
shared_frame_denizens_final.push_back(info);
}
}
void ZAMCompiler::ConcretizeSwitches()
{
// Create concretized versions of any case tables.
ConcretizeSwitchTables(int_casesI, int_cases);
ConcretizeSwitchTables(uint_casesI, uint_cases);
ConcretizeSwitchTables(double_casesI, double_cases);
ConcretizeSwitchTables(str_casesI, str_cases);
}
template <typename T>
void ZAMCompiler::ConcretizeSwitchTables(const CaseMapsI<T>& abstract_cases,
CaseMaps<T>& concrete_cases)
{
for ( auto& targs : abstract_cases )
{
CaseMap<T> cm;
for ( auto& targ : targs )
cm[targ.first] = targ.second->inst_num;
concrete_cases.emplace_back(cm);
}
}
#include "ZAM-MethodDefs.h"
void ZAMCompiler::Dump()
{
bool remapped_frame = ! analysis_options.no_ZAM_opt;
if ( remapped_frame )
printf("Original frame for %s:\n", func->Name());
for ( auto elem : frame_layout1 )
printf("frame[%d] = %s\n", elem.second, elem.first->Name());
if ( remapped_frame )
{
printf("Final frame for %s:\n", func->Name());
for ( auto i = 0U; i < shared_frame_denizens.size(); ++i )
{
printf("frame2[%d] =", i);
for ( auto& id : shared_frame_denizens[i].ids )
printf(" %s", id->Name());
printf("\n");
}
}
if ( ! insts2.empty() )
printf("Pre-removal of dead code for %s:\n", func->Name());
auto remappings = remapped_frame ? &shared_frame_denizens : nullptr;
DumpInsts1(remappings);
if ( ! insts2.empty() )
printf("Final intermediary code for %s:\n", func->Name());
remappings = remapped_frame ? &shared_frame_denizens_final : nullptr;
for ( auto i = 0U; i < insts2.size(); ++i )
{
auto& inst = insts2[i];
std::string liveness, depth;
if ( inst->live )
liveness = util::fmt("(labels %d)", inst->num_labels);
else
liveness = "(dead)";
if ( inst->loop_depth )
depth = util::fmt(" (loop %d)", inst->loop_depth);
printf("%d %s%s: ", i, liveness.c_str(), depth.c_str());
inst->Dump(&frame_denizens, remappings);
}
if ( ! insts2.empty() )
printf("Final code for %s:\n", func->Name());
for ( auto i = 0U; i < insts2.size(); ++i )
{
auto& inst = insts2[i];
printf("%d: ", i);
inst->Dump(&frame_denizens, remappings);
}
DumpCases(int_casesI, "int");
DumpCases(uint_casesI, "uint");
DumpCases(double_casesI, "double");
DumpCases(str_casesI, "str");
}
template <typename T>
void ZAMCompiler::DumpCases(const T& cases, const char* type_name) const
{
for ( auto i = 0U; i < cases.size(); ++i )
{
printf("%s switch table #%d:", type_name, i);
for ( auto& m : cases[i] )
{
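// T is the container of case maps, so the key type has to
// come from the map entry itself rather than from T.
using KeyT = std::decay_t<decltype(m.first)>;
std::string case_val;
if constexpr ( std::is_same_v<KeyT, std::string> )
case_val = m.first;
else if constexpr ( std::is_same_v<KeyT, bro_int_t> ||
std::is_same_v<KeyT, bro_uint_t> ||
std::is_same_v<KeyT, double> )
case_val = std::to_string(m.first);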
printf(" %s->%d", case_val.c_str(), m.second->inst_num);
}
printf("\n");
}
}
void ZAMCompiler::DumpInsts1(const FrameReMap* remappings)
{
for ( auto i = 0U; i < insts1.size(); ++i )
{
auto& inst = insts1[i];
if ( inst->target )
// To get meaningful branch information in the dump,
// we need to concretize the branch slots.
ConcretizeBranch(inst, inst->target, inst->target_slot);
std::string liveness, depth;
if ( inst->live )
liveness = util::fmt("(labels %d)", inst->num_labels);
else
liveness = "(dead)";
if ( inst->loop_depth )
depth = util::fmt(" (loop %d)", inst->loop_depth);
printf("%d %s%s: ", i, liveness.c_str(), depth.c_str());
inst->Dump(&frame_denizens, remappings);
}
}
} // namespace zeek::detail
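
The dead-code removal step in `CompileBody()` and the forward scan in `RemapFrameDenizens()` can be sketched in isolation as follows. This is an illustrative reduction under stated assumptions, not the compiler's code: `Inst` is a hypothetical stand-in for `ZInstI`, and `compact_live` and `remap_start` are names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ZInstI: only the fields this sketch needs.
struct Inst
	{
	bool live = true;
	int inst_num = -1;
	};

// Appends the live instructions of insts1 to insts2 and returns a map
// from each insts1 index to its insts2 index, or -1 if it was dead.
std::vector<int> compact_live(const std::vector<Inst*>& insts1,
                              std::vector<Inst*>& insts2)
	{
	std::vector<int> inst1_to_inst2;
	inst1_to_inst2.reserve(insts1.size());

	for ( auto* i1 : insts1 )
		{
		if ( i1->live )
			{
			inst1_to_inst2.push_back(static_cast<int>(insts2.size()));
			insts2.push_back(i1);
			}
		else
			inst1_to_inst2.push_back(-1);
		}

	// Renumber so each surviving instruction knows its final offset.
	for ( std::size_t i = 0; i < insts2.size(); ++i )
		insts2[i]->inst_num = static_cast<int>(i);

	return inst1_to_inst2;
	}

// If the instruction at "start" was removed, skip forward to the next
// surviving instruction and return its position in the compacted list.
int remap_start(const std::vector<int>& inst1_to_inst2, int start)
	{
	const int n = static_cast<int>(inst1_to_inst2.size());

	while ( start < n && inst1_to_inst2[start] == -1 )
		++start;

	assert(start < n);
	return inst1_to_inst2[start];
	}
```

Recording -1 for dead instructions, rather than dropping them from the map, keeps the map indexable by original instruction number, which is what the forward scan relies on.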

1219
src/script_opt/ZAM/Expr.cc Normal file

File diff suppressed because it is too large


@ -0,0 +1,991 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Gen-ZAM is a standalone program that takes as input a file specifying
// ZAM operations and from them generates a (large) set of C++ include
// files used to instantiate those operations as low-level ZAM instructions.
// (Those files are described in the EmitTarget enumeration below.)
//
// See Ops.in for documentation regarding the format of the ZAM templates.
#pragma once
#include <assert.h>
#include <memory>
#include <string>
#include <vector>
#include <map>
#include <unordered_set>
#include <unordered_map>
using std::string;
using std::vector;
// An instruction can have one of four basic classes.
enum ZAM_InstClass {
ZIC_REGULAR, // a non-complicated instruction
ZIC_COND, // a conditional branch
ZIC_VEC, // a vector operation
ZIC_FIELD, // a record field assignment
};
// For a given instruction operand, its general type.
enum ZAM_OperandType {
ZAM_OT_CONSTANT, // uses the instruction's associated constant
ZAM_OT_EVENT_HANDLER, // uses the associated event handler
ZAM_OT_INT, // directly specified integer
ZAM_OT_VAR, // frame slot associated with a variable
ZAM_OT_ASSIGN_FIELD, // record field offset to assign to
ZAM_OT_RECORD_FIELD, // record field offset to access
// The following wind up the same in the ultimate instruction,
// but they differ in the calling sequences used to generate
// the instruction.
ZAM_OT_AUX, // uses the instruction's "aux" field
ZAM_OT_LIST, // a list, managed via the "aux" field
ZAM_OT_NONE, // instruction has no direct operands
};
// For instructions corresponding to evaluating expressions, the type
// of a given operand. The generator uses these to transform the operand's
// low-level ZVal into a higher-level type expected by the associated
// evaluation code.
enum ZAM_ExprType {
ZAM_EXPR_TYPE_ADDR,
ZAM_EXPR_TYPE_ANY,
ZAM_EXPR_TYPE_DOUBLE,
ZAM_EXPR_TYPE_FUNC,
ZAM_EXPR_TYPE_INT,
ZAM_EXPR_TYPE_PATTERN,
ZAM_EXPR_TYPE_RECORD,
ZAM_EXPR_TYPE_STRING,
ZAM_EXPR_TYPE_SUBNET,
ZAM_EXPR_TYPE_TABLE,
ZAM_EXPR_TYPE_UINT,
ZAM_EXPR_TYPE_VECTOR,
ZAM_EXPR_TYPE_FILE,
ZAM_EXPR_TYPE_OPAQUE,
ZAM_EXPR_TYPE_LIST,
ZAM_EXPR_TYPE_TYPE,
// Used to specify "apart from the explicitly specified operand
// types, do this action for any other types".
ZAM_EXPR_TYPE_DEFAULT,
// Used for expressions where the evaluation code for the
// expression deals directly with the operand's ZVal, rather
// than the generator providing a higher-level version.
ZAM_EXPR_TYPE_NONE,
};
// We only use the following in the context where the vector's elements
// are individual words from the same line. We don't use it in other
// contexts where we're tracking a bunch of strings.
using Words = vector<string>;
// Used for error-reporting.
struct InputLoc {
const char* file_name;
int line_num = 0;
};
// An EmitTarget is a generated file to which code will be emitted.
// The different values are used to instruct the generator which target
// is currently of interest.
enum EmitTarget {
// Indicates that no generated file has yet been specified.
None,
// Declares/defines methods that take AST nodes and generate
// corresponding ZAM instructions.
MethodDecl,
MethodDef,
// Switch cases for expressions that are compiled directly, using
// custom methods rather than methods produced by the generator.
DirectDef,
// Switch cases for invoking various flavors of methods produced
// by the generator for generating ZAM instructions for AST
// expressions. C1/C2/C3 refer to the first/second/third operand
// being a constant. V refers to none of the operands being
// a constant.
C1Def,
C2Def,
C3Def,
VDef,
// The same, but for when the expression is being assigned to
// a record field rather than a variable. There's no "C3" option
// because of how we reduce AST ternary operations.
C1FieldDef,
C2FieldDef,
VFieldDef,
// Switch cases for compiling relational operations used in
// conditionals.
Cond,
// Switch cases that provide the C++ code for executing specific
// individual ZAM instructions.
Eval,
// #define's used to provide the templator's macro functionality.
EvalMacros,
// Switch cases that provide the C++ code for executing unary
// and binary vector operations.
Vec1Eval,
Vec2Eval,
// A set of instructions to dynamically generate maps that
// translate a generic ZAM operation (e.g., OP_LOAD_GLOBAL_VV)
// to a specific ZAM instruction, given a specific type
// (e.g., for OP_LOAD_GLOBAL_VV plus TYPE_ADDR, the map yields
// OP_LOAD_GLOBAL_VV_A).
AssignFlavor,
// A list of values, one per ZAM instruction, that indicate whether
// that instruction writes to its first operand (the most common
// case), reads the operand but doesn't write to it, both reads it
// and writes to it, or none of these apply because the first
// operand isn't a frame variable. See the ZAMOp1Flavor enum
// defined in ZOp.h.
Op1Flavor,
// A list of boolean values, one per ZAM instruction, that indicate
// whether the instruction has side effects, and thus should not
// be deleted even if its associated assignment is to a dead value
// (one not subsequently used).
OpSideEffects,
// A list of names enumerating each ZAM instruction. These
// are ZAM opcodes.
OpDef,
// A list of cases, indexed by ZAM opcode, that return a
// human-readable string naming the opcode, for use in debugging
// output. For example, for OP_NEGATE_VV_I the corresponding
// string is "negate-VV-I".
OpName,
};
// A helper class for managing the (ordered) collection of ZAM_OperandType's
// associated with an instruction in order to generate C++ calling sequences
// (both parameters for declarations, and arguments for invocations).
class ArgsManager {
public:
// Constructed by providing the various ZAM_OperandType's along
// with the instruction's class.
ArgsManager(const vector<ZAM_OperandType>& ot, ZAM_InstClass ic);
// Returns a string defining the parameters for a declaration;
// these have full C++ type information along with the parameter
// name.
string Decls() const { return full_decl; }
// Returns a string for passing the parameters in a function
// call. This is a comma-separated list of the parameter names,
// with no associated C++ types.
string Params() const { return full_params; }
// Returns the name of the given parameter, indexed starting with 0.
const string& NthParam(int n) const { return params[n]; }
private:
// Makes sure that each parameter has a unique name. For any
// parameter 'x' that occurs more than once, renames the instances
// "x1", "x2", etc.
void Differentiate();
// Maps ZAM_OperandType's to their associated C++ type and
// canonical parameter name.
static std::unordered_map<ZAM_OperandType,
std::pair<const char*, const char*>> ot_to_args;
// For a single argument/parameter, tracks its declaration name,
// C++ type, and the name to use when providing it as a parameter.
// These last two names are potentially distinct when we're
// assigning to a record field (which is tracked by the is_field
// member variable), hence the need to track both.
struct Arg {
string decl_name;
string decl_type;
string param_name;
bool is_field;
};
// All of the argument/parameters associated with the collection
// of ZAM_OperandType's.
vector<Arg> args;
// Each of the individual parameters.
vector<string> params;
// See Decls() and Params() above.
string full_decl;
string full_params;
};
// There are two mutually interacting classes: ZAMGen is the overall
// driver for the ZAM generator, while ZAM_OpTemplate represents a
// single operation template, with subclasses for specific types of
// operations.
class ZAMGen;
class ZAM_OpTemplate {
public:
// Instantiated by passing in the ZAMGen driver and the generic
// name for the operation.
ZAM_OpTemplate(ZAMGen* _g, string _base_name);
virtual ~ZAM_OpTemplate() { }
// Constructs the template's data structures by parsing its
// description (beyond the initial description of the type of
// operation).
void Build();
// Tells the object to generate the code/files necessary for
// each of its underlying instructions.
virtual void Instantiate();
// Returns the generic name for the operation.
const string& BaseName() const { return base_name; }
// Returns the canonical name for the operation. This is a
// version of the name that, for expression-based operations,
// can be concatenated with "EXPR_" to get the name of the
// corresponding AST node.
const string& CanonicalName() const { return cname; }
// Returns a string version of the ZAMOp1Flavor associated
// with this operation.
const string& GetOp1Flavor() const { return op1_flavor; }
// True if this is an operation with side effects (see OpSideEffects
// above).
bool HasSideEffects() const { return has_side_effects; }
protected:
// Append to the list of operand types associated with this operation.
void AddOpType(ZAM_OperandType ot)
{ op_types.push_back(ot); }
// Retrieve the list of operand types associated with this operation.
const vector<ZAM_OperandType>& OperandTypes() const
{ return op_types; }
// Specify the ZAMOp1Flavor associated with this operation. See
// GetOp1Flavor() above for the corresponding accessor.
void SetOp1Flavor(string fl) { op1_flavor = fl; }
// Specify/fetch the parameter (operand) from which to take the
// primary type of this operation.
void SetTypeParam(int param) { type_param = param; }
int GetTypeParam() const { return type_param; }
// Specify/fetch the parameter (operand) from which to take the
// secondary type of this operation.
void SetType2Param(int param) { type2_param = param; }
int GetType2Param() const { return type2_param; }
// Tracking of assignment values (C++ variables that hold the
// value that should be assigned to the usual frame slot).
void SetAssignVal(string _av) { av = _av; }
bool HasAssignVal() const { return ! av.empty(); }
const string& GetAssignVal() const { return av; }
// Management of C++ evaluation blocks. These are built up
// line-by-line.
void AddEval(string line) { eval += line; }
bool HasEval() const { return ! eval.empty(); }
const string& GetEval() const { return eval; }
// Management of custom methods to be used rather than generating
// a method.
void SetCustomMethod(string cm) { custom_method = SkipWS(cm); }
bool HasCustomMethod() const
{ return ! custom_method.empty(); }
const string& GetCustomMethod() const
{ return custom_method; }
// Management of code to execute at the end of a generated method.
void SetPostMethod(string cm) { post_method = SkipWS(cm); }
bool HasPostMethod() const
{ return ! post_method.empty(); }
const string& GetPostMethod() const
{ return post_method; }
// Predicates indicating whether a subclass supports a given
// property. These are whether the operation: (1) should include
// a version that assigns to a record field as well as the normal
// assigning to a frame slot, (2) is a conditional branch, (3) does
// not have a corresponding AST node, (4) is a direct assignment
// (not an assignment to an expression), (5) is a direct assignment
// to a record field.
virtual bool IncludesFieldOp() const { return false; }
virtual bool IsConditionalOp() const { return false; }
virtual bool IsInternalOp() const { return false; }
virtual bool IsAssignOp() const { return false; }
virtual bool IsFieldOp() const { return false; }
// Whether this operation does not have any C++ evaluation associated
// with it. Used for custom methods that compile into internal
// ZAM operations.
bool NoEval() const { return no_eval; }
void SetNoEval() { no_eval = true; }
// Whether this operation does not have a version where one of
// its operands is a constant.
bool NoConst() const { return no_const; }
void SetNoConst() { no_const = true; }
// Whether this operation also has a vectorized form.
bool IncludesVectorOp() const { return includes_vector_op; }
void SetIncludesVectorOp() { includes_vector_op = true; }
// Whether this operation has side effects, and thus should
// not be elided even if its result is used in a dead assignment.
void SetHasSideEffects() { has_side_effects = true; }
// An "assignment-less" operation is one that, if its result
// is used in a dead assignment, should be converted to a different
// operation that explicitly omits any assignment.
bool HasAssignmentLess() const
{ return ! assignment_less_op.empty(); }
void SetAssignmentLess(string op, string op_type)
{
assignment_less_op = op;
assignment_less_op_type = op_type;
}
const string& AssignmentLessOp() const
{ return assignment_less_op; }
const string& AssignmentLessOpType() const
{ return assignment_less_op_type; }
// Builds the instructions associated with this operation, assuming
// a single operand.
void UnaryInstantiate();
// Parses the next line in an operation template. "attr" is
// the first word on the line, which often specifies the
// attribute specified by the line. "line" is the entire line,
// for parsing when that's necessary, and for error reporting.
// "words" is "line" split into a vector of whitespace-delimited
// words.
virtual void Parse(const string& attr, const string& line,
const Words& words);
// Scans in a C++ evaluation block, which continues until encountering
// a line that does not start with whitespace, or that's empty.
string GatherEval();
// Parses a $-specifier of which operand to use to associate
// a Zeek scripting type with ZAM instructions.
int ExtractTypeParam(const string& arg);
// Generates instructions for each of the different flavors of the
// given operation. "ot" specifies the types of operands for the
// instruction, and "do_vec" whether to generate a vector version.
void InstantiateOp(const vector<ZAM_OperandType>& ot, bool do_vec);
// Generates one specific flavor ("zc") of the given operation,
// using a method named 'm', the given operand types, and the class.
void InstantiateOp(const string& m, const vector<ZAM_OperandType>& ot,
ZAM_InstClass zc);
// Generates the "assignment-less" version of the given op-code.
void GenAssignmentlessVersion(string op);
// Generates the method 'm' for an operation, where "suffix" is
// a (potentially empty) string differentiating the method from
// others for that operation, and "ot" and "zc" are the same
// as above.
void InstantiateMethod(const string& m, const string& suffix,
const vector<ZAM_OperandType>& ot,
ZAM_InstClass zc);
// Generates the main logic of an operation's method, given the
// specific operand types, an associated suffix for differentiating
// ZAM instructions, and the instruction class.
void InstantiateMethodCore(const vector<ZAM_OperandType>& ot,
string suffix, ZAM_InstClass zc);
// Generates the specific code to create a ZInst for the given
// operation, operands, parameters to "GenInst", and suffix and
// class per the above.
virtual void BuildInstruction(const vector<ZAM_OperandType>& ot,
const string& params,
const string& suffix, ZAM_InstClass zc);
// Top-level driver for generating the C++ evaluation code for
// a given flavor of operation.
virtual void InstantiateEval(const vector<ZAM_OperandType>& ot,
const string& suffix, ZAM_InstClass zc);
// Generates the C++ case statement for evaluating the given flavor
// of operation.
void InstantiateEval(EmitTarget et, const string& op_suffix,
const string& eval, ZAM_InstClass zc);
// Generates a set of assignment C++ evaluations, one for each
// possible Zeek scripting type of operand.
void InstantiateAssignOp(const vector<ZAM_OperandType>& ot,
const string& suffix);
// Generates a C++ evaluation for an assignment of the type
// corresponding to "accessor". If "is_managed" is true then
// generates the associated memory management, too.
void GenAssignOpCore(const vector<ZAM_OperandType>& ot,
const string& eval, const string& accessor,
bool is_managed);
// The same, but for when there's an explicit assignment value.
void GenAssignOpValCore(const string& eval, const string& accessor,
bool is_managed);
// Returns the name of the method associated with the particular
// list of operand types.
string MethodName(const vector<ZAM_OperandType>& ot) const;
// Returns the parameter declarations to use in declaring a method.
string MethodDeclare(const vector<ZAM_OperandType>& ot,
ZAM_InstClass zc);
// Returns a suffix that differentiates an operation name for
// a specific list of operand types.
string OpSuffix(const vector<ZAM_OperandType>& ot) const;
// Returns a copy of the given string with leading whitespace
// removed.
string SkipWS(const string& s) const;
// Set the target to use for subsequent code emission.
void EmitTo(EmitTarget et) { curr_et = et; }
// Emit the given string to the currently selected EmitTarget.
void Emit(const string& s);
// Same, but temporarily indented up.
void EmitUp(const string& s)
{
IndentUp();
Emit(s);
IndentDown();
}
// Same, but refrain from inserting a newline.
void EmitNoNL(const string& s);
// Emit a newline. Implementation doesn't actually include a
// newline since that's implicit in a call to Emit().
void NL() { Emit(""); }
// Increase/decrease the indentation level, with the last two
// being used for brace-delimited code blocks.
void IndentUp();
void IndentDown();
void BeginBlock() { IndentUp(); Emit("{"); }
void EndBlock() { Emit("}"); IndentDown(); }
// Maps an operand type to a character mnemonic used to distinguish
// it from others.
static std::unordered_map<ZAM_OperandType, char> ot_to_char;
// The associated driver object.
ZAMGen* g;
// See BaseName() and CanonicalName() above.
string base_name;
string cname;
// Tracks the beginning of this operation template's definition,
// for error reporting.
InputLoc op_loc;
// The current emission target.
EmitTarget curr_et = None;
// The operand types for operations that have a single fixed list.
// Some operations (like those evaluating expressions) instead have
// a dynamically generated range of possible operand types.
vector<ZAM_OperandType> op_types;
// See the description of Op1Flavor above.
string op1_flavor = "OP1_WRITE";
// Tracks the result of ExtractTypeParam() used for "type" and
// "type2" attributes.
int type_param = 0; // 0 = not set
int type2_param = 0;
// If non-empty, the value to assign to the target in an assignment
// operation.
string av;
// The C++ evaluation; may span multiple lines.
string eval;
// Any associated custom method.
string custom_method;
// Any associated additional code to add at the end of a
// generated method.
string post_method;
// If true, then this operation does not have C++ evaluation
// associated with it.
bool no_eval = false;
// If true, then this operation should not include a version
// supporting operands of constant type.
bool no_const = false;
// If true, then this operation includes a vectorized version.
bool includes_vector_op = false;
// If true, then this operation has side effects.
bool has_side_effects = false;
// If non-empty, then specifies the associated operation that
// is a version of this operation but without assigning the result;
// and the operand type (like "OP_V") of that associated operation.
string assignment_less_op;
string assignment_less_op_type;
};
// A subclass used for "unary-op" templates.
class ZAM_UnaryOpTemplate : public ZAM_OpTemplate {
public:
ZAM_UnaryOpTemplate(ZAMGen* _g, string _base_name)
: ZAM_OpTemplate(_g, _base_name) { }
protected:
void Instantiate() override;
};
// A subclass for unary operations that are directly instantiated using
// custom methods.
class ZAM_DirectUnaryOpTemplate : public ZAM_OpTemplate {
public:
ZAM_DirectUnaryOpTemplate(ZAMGen* _g, string _base_name, string _direct)
: ZAM_OpTemplate(_g, _base_name), direct(_direct) { }
protected:
void Instantiate() override;
private:
// The ZAMCompiler method to call to compile the operation.
string direct;
};
// A helper class for the ZAM_ExprOpTemplate class (which follows).
// This class tracks a single instance of creating an evaluation for
// an AST expression.
class EvalInstance {
public:
// Initialized using the types of the LHS (result) and the
// first and second operand. Often all three types are the
// same, but they can differ for some particular expressions,
// and for relationals. "eval" provides the C++ evaluation code.
// "is_def" is true if this instance is for the default catch-all
// where the operand types don't match any of the explicitly
// specified evaluations.
EvalInstance(ZAM_ExprType lhs_et, ZAM_ExprType op1_et,
ZAM_ExprType op2_et, string eval, bool is_def);
// Returns the accessor to use for assigning to the LHS. "is_ptr"
// indicates whether the value to which we're applying the
// accessor is a pointer, rather than a ZVal.
string LHSAccessor(bool is_ptr = false) const;
// Same but for access to the first or second operand.
string Op1Accessor(bool is_ptr = false) const
{ return Accessor(op1_et, is_ptr); }
string Op2Accessor(bool is_ptr = false) const
{ return Accessor(op2_et, is_ptr); }
// Provides an accessor for an operand of the given type.
string Accessor(ZAM_ExprType et, bool is_ptr = false) const;
// Returns the "marker" used to make unique the opcode for this
// flavor of expression-evaluation instruction.
string OpMarker() const;
const string& Eval() const { return eval; }
ZAM_ExprType LHS_ET() const { return lhs_et; }
bool IsDefault() const { return is_def; }
private:
ZAM_ExprType lhs_et;
ZAM_ExprType op1_et;
ZAM_ExprType op2_et;
string eval;
bool is_def;
};
// A subclass for AST "Expr" nodes in reduced form.
class ZAM_ExprOpTemplate : public ZAM_OpTemplate {
public:
ZAM_ExprOpTemplate(ZAMGen* _g, string _base_name);
// The number of operands the operation takes (not including its
// assignment target). A value of 0 is used for expressions that
// require special handling.
virtual int Arity() const { return 0; }
bool HasExplicitResultType() const { return explicit_res_type; }
void SetHasExplicitResultType() { explicit_res_type = true; }
void AddExprType(ZAM_ExprType et)
{ expr_types.insert(et); }
const std::unordered_set<ZAM_ExprType>& ExprTypes() const
{ return expr_types; }
void AddEvalSet(ZAM_ExprType et, string ev)
{ eval_set[et] += ev; }
void AddEvalSet(ZAM_ExprType et1, ZAM_ExprType et2, string ev)
{ eval_mixed_set[et1][et2] += ev; }
bool IncludesFieldOp() const override { return includes_field_op; }
void SetIncludesFieldOp() { includes_field_op = true; }
bool HasPreEval() const { return ! pre_eval.empty(); }
void SetPreEval(string pe) { pre_eval = SkipWS(pe); }
const string& GetPreEval() const { return pre_eval; }
protected:
// Returns a regular expression used to access the value of the
// expression suitable for assignment in a loop across the elements
// of a Zeek "vector" type. "have_target" is true if the template
// has an explicit "$$" assignment target.
virtual const char* VecEvalRE(bool have_target) const
{
return have_target ? "$$$$ = ZVal($1)" : "ZVal($&)";
}
void Parse(const string& attr, const string& line, const Words& words) override;
void Instantiate() override;
// Instantiates versions of the operation that have a constant
// as the first, second, or third operand ...
void InstantiateC1(const vector<ZAM_OperandType>& ots, int arity,
bool do_vec = false);
void InstantiateC2(const vector<ZAM_OperandType>& ots, int arity);
void InstantiateC3(const vector<ZAM_OperandType>& ots);
// ... or if all of the operands are non-constant.
void InstantiateV(const vector<ZAM_OperandType>& ots);
// Generates code that instantiates either the vectorized version
// of an operation, or the non-vector one, depending on whether
// the RHS of the reduced expression/assignment is a vector.
void DoVectorCase(const string& m, const string& args);
// Iterates over the different Zeek types specified for an expression's
// operands and generates instructions for each.
void BuildInstructionCore(const string& params, const string& suffix,
ZAM_InstClass zc);
// Generates an if-else cascade element that matches one of the
// specific Zeek types associated with the instruction.
void GenMethodTest(ZAM_ExprType et1, ZAM_ExprType et2,
const string& params, const string& suffix,
bool do_else, ZAM_InstClass zc);
void InstantiateEval(const vector<ZAM_OperandType>& ot,
const string& suffix, ZAM_InstClass zc) override;
private:
// The Zeek types that can appear as operands for the expression.
std::unordered_set<ZAM_ExprType> expr_types;
// The C++ evaluation template for a given operand type.
std::unordered_map<ZAM_ExprType, string> eval_set;
// Some expressions take two operands of different types. This
// holds their C++ evaluation template.
std::unordered_map<ZAM_ExprType,
std::unordered_map<ZAM_ExprType, string>>
eval_mixed_set;
// Whether this expression's operand is a field access (and thus
// needs both the record as an operand and an additional constant
// offset into the record to get to the field).
bool includes_field_op = false;
// If non-empty, code to generate prior to evaluating the expression.
string pre_eval;
// If true, then the evaluations will take care of ensuring
// proper result types when assigning to $$.
bool explicit_res_type = false;
};
// A version of ZAM_ExprOpTemplate for unary expressions.
class ZAM_UnaryExprOpTemplate : public ZAM_ExprOpTemplate {
public:
ZAM_UnaryExprOpTemplate(ZAMGen* _g, string _base_name)
: ZAM_ExprOpTemplate(_g, _base_name) { }
bool IncludesFieldOp() const override
{ return ExprTypes().count(ZAM_EXPR_TYPE_NONE) == 0; }
int Arity() const override { return 1; }
protected:
void Parse(const string& attr, const string& line, const Words& words) override;
void Instantiate() override;
void BuildInstruction(const vector<ZAM_OperandType>& ot,
const string& params, const string& suffix,
ZAM_InstClass zc) override;
};
// A version of ZAM_UnaryExprOpTemplate where the point of the expression
// is to capture a direct assignment operation.
class ZAM_AssignOpTemplate : public ZAM_UnaryExprOpTemplate {
public:
ZAM_AssignOpTemplate(ZAMGen* _g, string _base_name);
bool IsAssignOp() const override { return true; }
bool IncludesFieldOp() const override { return false; }
bool IsFieldOp() const override { return field_op; }
void SetFieldOp() { field_op = true; }
protected:
void Parse(const string& attr, const string& line, const Words& words) override;
void Instantiate() override;
private:
bool field_op = false;
};
// A version of ZAM_ExprOpTemplate for binary expressions.
class ZAM_BinaryExprOpTemplate : public ZAM_ExprOpTemplate {
public:
ZAM_BinaryExprOpTemplate(ZAMGen* _g, string _base_name)
: ZAM_ExprOpTemplate(_g, _base_name) { }
bool IncludesFieldOp() const override { return true; }
int Arity() const override { return 2; }
protected:
void Instantiate() override;
void BuildInstruction(const vector<ZAM_OperandType>& ot,
const string& params, const string& suffix,
ZAM_InstClass zc) override;
};
// A version of ZAM_BinaryExprOpTemplate for relationals.
class ZAM_RelationalExprOpTemplate : public ZAM_BinaryExprOpTemplate {
public:
ZAM_RelationalExprOpTemplate(ZAMGen* _g, string _base_name)
: ZAM_BinaryExprOpTemplate(_g, _base_name) { }
bool IncludesFieldOp() const override { return false; }
bool IsConditionalOp() const override { return true; }
protected:
const char* VecEvalRE(bool have_target) const override
{
if ( have_target )
return "$$$$ = ZVal(bro_int_t($1))";
else
return "ZVal(bro_int_t($&))";
}
void Instantiate() override;
void BuildInstruction(const vector<ZAM_OperandType>& ot,
const string& params, const string& suffix,
ZAM_InstClass zc) override;
};
// A version of ZAM_BinaryExprOpTemplate for binary operations generated
// by custom methods rather than directly from the AST.
class ZAM_InternalBinaryOpTemplate : public ZAM_BinaryExprOpTemplate {
public:
ZAM_InternalBinaryOpTemplate(ZAMGen* _g, string _base_name)
: ZAM_BinaryExprOpTemplate(_g, _base_name) { }
bool IsInternalOp() const override { return true; }
// The accessors used to get to the underlying Zeek script value
// of the first and second operand.
void SetOp1Accessor(string accessor) { op1_accessor = accessor; }
void SetOp2Accessor(string accessor) { op2_accessor = accessor; }
void SetOpAccessor(string accessor)
{
SetOp1Accessor(accessor);
SetOp2Accessor(accessor);
}
protected:
void Parse(const string& attr, const string& line, const Words& words) override;
void InstantiateEval(const vector<ZAM_OperandType>& ot,
const string& suffix, ZAM_InstClass zc) override;
private:
string op1_accessor;
string op2_accessor;
};
// A version of ZAM_OpTemplate for operations used internally (and not
// corresponding to AST elements).
class ZAM_InternalOpTemplate : public ZAM_OpTemplate {
public:
ZAM_InternalOpTemplate(ZAMGen* _g, string _base_name)
: ZAM_OpTemplate(_g, _base_name) { }
bool IsInternalOp() const override { return true; }
protected:
void Parse(const string& attr, const string& line, const Words& words) override;
private:
// True if the internal operation corresponds to an indirect call,
// i.e., one through a variable rather than one directly specified.
bool is_indirect_call = false;
};
// An internal operation that assigns a result to a frame element.
class ZAM_InternalAssignOpTemplate : public ZAM_InternalOpTemplate {
public:
ZAM_InternalAssignOpTemplate(ZAMGen* _g, string _base_name)
: ZAM_InternalOpTemplate(_g, _base_name) { }
bool IsAssignOp() const override { return true; }
};
// Helper classes for managing input from the template file, including
// low-level scanning.
class TemplateInput {
public:
// Program name and file name are for generating error messages.
TemplateInput(FILE* _f, const char* _prog_name, const char* _file_name)
: f(_f), prog_name(_prog_name)
{
loc.file_name = _file_name;
}
const InputLoc& CurrLoc() const { return loc; }
// Fetch the next line of input, including trailing newline.
// Returns true on success, false on EOF or error. Skips over
// comments.
bool ScanLine(string& line);
// Takes a line and splits it into white-space delimited words,
// returned in a vector. Removes trailing whitespace.
Words SplitIntoWords(const string& line) const;
// Returns the line with the given number of initial words skipped.
string SkipWords(const string& line, int n) const;
// Puts back the given line so that the next call to ScanLine will
// return it. Does not nest.
void PutBack(const string& line) { put_back = line; }
// Report an error and exit.
[[noreturn]] void Gripe(const char* msg, const string& input) const;
[[noreturn]] void Gripe(const char* msg, const InputLoc& loc) const;
private:
string put_back; // if non-empty, use this for the next ScanLine
FILE* f;
const char* prog_name;
InputLoc loc;
};
// Driver class for the ZAM instruction generator.
class ZAMGen {
public:
ZAMGen(int argc, char** argv);
// Reads in and records a macro definition, which ends upon
// encountering a blank line or a line that does not begin
// with whitespace.
void ReadMacro(const string& line);
// Emits C++ #define's to implement the recorded macros.
void GenMacros();
// Generates a ZAM op-code for the given template, suffix, and
// instruction class. Also creates auxiliary information associated
// with the instruction.
string GenOpCode(const ZAM_OpTemplate* ot, const string& suffix,
ZAM_InstClass zc = ZIC_REGULAR);
// These methods provide low-level parsing (and error-reporting)
// access to ZAM_OpTemplate objects.
const InputLoc& CurrLoc() const { return ti->CurrLoc(); }
bool ScanLine(string& line) { return ti->ScanLine(line); }
Words SplitIntoWords(const string& line) const
{ return ti->SplitIntoWords(line); }
string SkipWords(const string& line, int n) const
{ return ti->SkipWords(line, n); }
void PutBack(const string& line) { ti->PutBack(line); }
// Methods made public to ZAM_OpTemplate objects for emitting code.
void Emit(EmitTarget et, const string& s);
void IndentUp() { ++indent_level; }
void IndentDown() { --indent_level; }
void SetNoNL(bool _no_NL) { no_NL = _no_NL; }
[[noreturn]] void Gripe(const char* msg, const string& input) const
{ ti->Gripe(msg, input); }
[[noreturn]] void Gripe(const char* msg, const InputLoc& loc) const
{ ti->Gripe(msg, loc); }
private:
// Opens all of the code generation targets, and creates prologs
// for those requiring them (such as for embedding into switch
// statements).
void InitEmitTargets();
void InitSwitch(EmitTarget et, string desc);
// Closes all of the code generation targets, and creates epilogs
// for those requiring them.
void CloseEmitTargets();
void FinishSwitches();
// Parses a single template, returning true on success and false
// if we've reached the end of the input. (Errors during parsing
// result instead in exiting.)
bool ParseTemplate();
// Maps code generation targets with their corresponding FILE*.
std::unordered_map<EmitTarget, FILE*> gen_files;
// Maps code generation targets to strings used to describe any
// associated switch (for error reporting).
std::unordered_map<EmitTarget, string> switch_targets;
// The low-level TemplateInput object used to manage input.
std::unique_ptr<TemplateInput> ti;
// Tracks all of the templates created so far.
vector<std::unique_ptr<ZAM_OpTemplate>> templates;
// Tracks the macros recorded so far.
vector<vector<string>> macros;
// Current indentation level. Maintained globally rather than
// per EmitTarget, so the caller needs to ensure it is managed
// consistently.
int indent_level = 0;
// If true, refrain from appending a newline to any emitted lines.
bool no_NL = false;
};


@ -0,0 +1,167 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Helper functions for generating ZAM code.
#include "zeek/script_opt/ZAM/Compile.h"
namespace zeek::detail {
ZInstI ZAMCompiler::GenInst(ZOp op)
{
return ZInstI(op);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1)
{
return ZInstI(op, Frame1Slot(v1, op));
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, int i)
{
auto z = ZInstI(op, Frame1Slot(v1, op), i);
z.op_type = OP_VV_I2;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const ConstExpr* c, const NameExpr* v1,
int i)
{
auto z = ZInstI(op, Frame1Slot(v1, op), i, c);
z.op_type = OP_VVC_I2;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2)
{
int nv2 = FrameSlot(v2);
return ZInstI(op, Frame1Slot(v1, op), nv2);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3)
{
int nv2 = FrameSlot(v2);
int nv3 = FrameSlot(v3);
return ZInstI(op, Frame1Slot(v1, op), nv2, nv3);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3, const NameExpr* v4)
{
int nv2 = FrameSlot(v2);
int nv3 = FrameSlot(v3);
int nv4 = FrameSlot(v4);
return ZInstI(op, Frame1Slot(v1, op), nv2, nv3, nv4);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const ConstExpr* ce)
{
return ZInstI(op, ce);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const ConstExpr* ce)
{
return ZInstI(op, Frame1Slot(v1, op), ce);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const ConstExpr* ce, const NameExpr* v1)
{
return ZInstI(op, Frame1Slot(v1, op), ce);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const ConstExpr* ce,
const NameExpr* v2)
{
int nv2 = FrameSlot(v2);
return ZInstI(op, Frame1Slot(v1, op), nv2, ce);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const ConstExpr* ce)
{
int nv2 = FrameSlot(v2);
return ZInstI(op, Frame1Slot(v1, op), nv2, ce);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3, const ConstExpr* ce)
{
int nv2 = FrameSlot(v2);
int nv3 = FrameSlot(v3);
return ZInstI(op, Frame1Slot(v1, op), nv2, nv3, ce);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const ConstExpr* ce, const NameExpr* v3)
{
// Note that here we reverse the order of the arguments; saves
// us from needing to implement a redundant constructor.
int nv2 = FrameSlot(v2);
int nv3 = FrameSlot(v3);
return ZInstI(op, Frame1Slot(v1, op), nv2, nv3, ce);
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const ConstExpr* c,
int i)
{
auto z = ZInstI(op, Frame1Slot(v1, op), i, c);
z.op_type = OP_VVC_I2;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
int i)
{
int nv2 = FrameSlot(v2);
auto z = ZInstI(op, Frame1Slot(v1, op), nv2, i);
z.op_type = OP_VVV_I3;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
int i1, int i2)
{
int nv2 = FrameSlot(v2);
auto z = ZInstI(op, Frame1Slot(v1, op), nv2, i1, i2);
z.op_type = OP_VVVV_I3_I4;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v, const ConstExpr* c,
int i1, int i2)
{
auto z = ZInstI(op, Frame1Slot(v, op), i1, i2, c);
z.op_type = OP_VVVC_I2_I3;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3, int i)
{
int nv2 = FrameSlot(v2);
int nv3 = FrameSlot(v3);
auto z = ZInstI(op, Frame1Slot(v1, op), nv2, nv3, i);
z.op_type = OP_VVVV_I4;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const ConstExpr* c, int i)
{
int nv2 = FrameSlot(v2);
auto z = ZInstI(op, Frame1Slot(v1, op), nv2, i, c);
z.op_type = OP_VVVC_I3;
return z;
}
ZInstI ZAMCompiler::GenInst(ZOp op, const NameExpr* v1, const ConstExpr* c,
const NameExpr* v2, int i)
{
int nv2 = FrameSlot(v2);
auto z = ZInstI(op, Frame1Slot(v1, op), nv2, i, c);
z.op_type = OP_VVVC_I3;
return z;
}
} // zeek::detail


@ -0,0 +1,39 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Methods for generating ZAM instructions, mainly to aid in translating
// NameExpr*'s to slots. Some aren't needed, but we provide a complete
// set mirroring the ZInstI constructors for consistency.
//
// Maintained separately from Compile.h to make it conceptually simple to
// add new helpers.
ZInstI GenInst(ZOp op);
ZInstI GenInst(ZOp op, const NameExpr* v1);
ZInstI GenInst(ZOp op, const NameExpr* v1, int i);
ZInstI GenInst(ZOp op, const ConstExpr* c, const NameExpr* v1, int i);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3, const NameExpr* v4);
ZInstI GenInst(ZOp op, const ConstExpr* ce);
ZInstI GenInst(ZOp op, const NameExpr* v1, const ConstExpr* ce);
ZInstI GenInst(ZOp op, const ConstExpr* ce, const NameExpr* v1);
ZInstI GenInst(ZOp op, const NameExpr* v1, const ConstExpr* ce,
const NameExpr* v2);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const ConstExpr* ce);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3, const ConstExpr* ce);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const ConstExpr* ce, const NameExpr* v3);
ZInstI GenInst(ZOp op, const NameExpr* v1, const ConstExpr* c, int i);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2, int i);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2, int i1, int i2);
ZInstI GenInst(ZOp op, const NameExpr* v, const ConstExpr* c, int i1, int i2);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const NameExpr* v3, int i);
ZInstI GenInst(ZOp op, const NameExpr* v1, const NameExpr* v2,
const ConstExpr* c, int i);
ZInstI GenInst(ZOp op, const NameExpr* v1, const ConstExpr* c,
const NameExpr* v2, int i);


@ -0,0 +1,146 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Classes to support ZAM for-loop iterations.
#pragma once
#include "zeek/Val.h"
#include "zeek/ZeekString.h"
#include "zeek/script_opt/ZAM/ZInst.h"
namespace zeek::detail {
// Class for iterating over the elements of a table. Requires some care
// because the dictionary iterators need to be destructed when done.
class TableIterInfo {
public:
// No constructor needed, as all of our member variables are
// instead instantiated via BeginLoop(). This allows us to
// reuse TableIterInfo objects to lower the overhead associated
// with executing ZBody::DoExec for non-recursive functions.
// We do, however, want to make sure that when we go out of scope,
// if we have any pending iterators we clear them.
~TableIterInfo() { Clear(); }
// Start looping over the elements of the given table. "_aux"
// provides information about the index variables, their types,
// and the type of the value variable (if any).
void BeginLoop(const TableVal* _tv, ZInstAux* _aux)
{
tv = _tv;
aux = _aux;
auto tvd = tv->AsTable();
tbl_iter = tvd->begin();
tbl_end = tvd->end();
}
// True if we're done iterating, false if not.
bool IsDoneIterating() const
{
return *tbl_iter == *tbl_end;
}
// Indicates that the current iteration is finished.
void IterFinished()
{
++*tbl_iter;
}
// Performs the next iteration (assuming IsDoneIterating() returned
// false), assigning to the index variables.
void NextIter(ZVal* frame)
{
auto ind_lv = tv->RecreateIndex(*(*tbl_iter)->GetHashKey());
for ( int i = 0; i < ind_lv->Length(); ++i )
{
ValPtr ind_lv_p = ind_lv->Idx(i);
auto& var = frame[aux->loop_vars[i]];
auto& t = aux->loop_var_types[i];
if ( ZVal::IsManagedType(t) )
ZVal::DeleteManagedType(var);
var = ZVal(ind_lv_p, t);
}
IterFinished();
}
// For the current iteration, returns the corresponding value.
ZVal IterValue()
{
auto tev = (*tbl_iter)->GetValue<TableEntryVal*>();
return ZVal(tev->GetVal(), aux->value_var_type);
}
// Called upon finishing the iteration.
void EndIter() { Clear(); }
// Called to explicitly clear any iteration state.
void Clear()
{
tbl_iter = std::nullopt;
tbl_end = std::nullopt;
}
private:
// The table we're looping over. If we want to allow for the table
// going away before we're able to clear our iterators then we
// could change this to non-const and use Ref/Unref.
const TableVal* tv = nullptr;
// Associated auxiliary information.
ZInstAux* aux;
std::optional<DictIterator> tbl_iter;
std::optional<DictIterator> tbl_end;
};
// Class for simple step-wise iteration across an integer range.
// Suitable for iterating over vectors or strings.
class StepIterInfo {
public:
// We do some cycle-squeezing by not having a constructor to
// initialize our member variables, since we impose a discipline
// that any use of the object starts with InitLoop(). That lets
// us use quasi-static objects for non-recursive functions.
// Initializes for looping over the elements of a raw vector.
void InitLoop(const std::vector<std::optional<ZVal>>* _vv)
{
vv = _vv;
n = vv->size();
iter = 0;
}
// Initializes for looping over the elements of a raw string.
void InitLoop(const String* _s)
{
s = _s;
n = s->Len();
iter = 0;
}
// True if we're done iterating, false if not.
bool IsDoneIterating() const
{
return iter >= n;
}
// Indicates that the current iteration is finished.
void IterFinished()
{
++iter;
}
// Counter of where we are in the iteration.
bro_uint_t iter; // initialized to 0 at start of loop
bro_uint_t n; // we loop from 0 ... n-1
// The low-level value we're iterating over.
const std::vector<std::optional<ZVal>>* vv;
const String* s;
};
} // namespace zeek::detail
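
The discipline that `StepIterInfo` describes (no constructor; every use starts with `InitLoop()`, then alternates `IsDoneIterating()` and `IterFinished()`) can be sketched in isolation. This is a simplified stand-in rather than the Zeek class: `ToyStepIter` and `SumPresent` are hypothetical names, and plain `int` takes the place of `ZVal`. It also shows one way a caller might skip vector "holes", i.e., elements that hold no value.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for StepIterInfo: deliberately has no constructor,
// so an object can be reused across loops as long as each use begins
// with InitLoop().
struct ToyStepIter {
    void InitLoop(const std::vector<std::optional<int>>* _vv) {
        assert(_vv != nullptr);
        vv = _vv;
        n = vv->size();
        iter = 0;
    }

    bool IsDoneIterating() const { return iter >= n; }
    void IterFinished() { ++iter; }

    std::size_t iter; // set to 0 by InitLoop()
    std::size_t n;    // we loop from 0 ... n-1
    const std::vector<std::optional<int>>* vv;
};

// Sums the elements that are present, skipping holes.
int SumPresent(ToyStepIter& si, const std::vector<std::optional<int>>& v) {
    int sum = 0;
    for (si.InitLoop(&v); !si.IsDoneIterating(); si.IterFinished()) {
        const auto& elem = (*si.vv)[si.iter];
        if (elem)
            sum += *elem;
    }
    return sum;
}
```

Because `InitLoop()` resets all of the state, the same `ToyStepIter` object can be passed to `SumPresent` repeatedly with different vectors.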


@ -0,0 +1,172 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Methods relating to low-level ZAM instruction manipulation.
#include "zeek/Reporter.h"
#include "zeek/Desc.h"
#include "zeek/script_opt/ZAM/Compile.h"
#include "zeek/script_opt/ScriptOpt.h"
namespace zeek::detail {
const ZAMStmt ZAMCompiler::StartingBlock()
{
return ZAMStmt(insts1.size());
}
const ZAMStmt ZAMCompiler::FinishBlock(const ZAMStmt /* start */)
{
return ZAMStmt(insts1.size() - 1);
}
bool ZAMCompiler::NullStmtOK() const
{
// They're okay iff they're the entire statement body.
return insts1.empty();
}
const ZAMStmt ZAMCompiler::EmptyStmt()
{
return ZAMStmt(insts1.size() - 1);
}
const ZAMStmt ZAMCompiler::LastInst()
{
return ZAMStmt(insts1.size() - 1);
}
const ZAMStmt ZAMCompiler::ErrorStmt()
{
return ZAMStmt(0);
}
OpaqueVals* ZAMCompiler::BuildVals(const ListExprPtr& l)
{
return new OpaqueVals(InternalBuildVals(l.get()));
}
ZInstAux* ZAMCompiler::InternalBuildVals(const ListExpr* l, int stride)
{
auto exprs = l->Exprs();
int n = exprs.length();
auto aux = new ZInstAux(n * stride);
int offset = 0; // offset into aux info
for ( int i = 0; i < n; ++i )
{
auto& e = exprs[i];
int num_vals = InternalAddVal(aux, offset, e);
ASSERT(num_vals == stride);
offset += num_vals;
}
return aux;
}
int ZAMCompiler::InternalAddVal(ZInstAux* zi, int i, Expr* e)
{
if ( e->Tag() == EXPR_ASSIGN )
{ // We're building up a table constructor
auto& indices = e->GetOp1()->AsListExpr()->Exprs();
auto val = e->GetOp2();
int width = indices.length();
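for ( int j = 0; j < width; ++j )
{
// Call outside ASSERT so it still runs if ASSERT compiles to nothing.
[[maybe_unused]] int nv = InternalAddVal(zi, i + j, indices[j]);
ASSERT(nv == 1);
}
[[maybe_unused]] int val_nv = InternalAddVal(zi, i + width, val.get());
ASSERT(val_nv == 1);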
return width + 1;
}
if ( e->Tag() == EXPR_LIST )
{ // We're building up a set constructor
auto& indices = e->AsListExpr()->Exprs();
int width = indices.length();
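for ( int j = 0; j < width; ++j )
{
// Call outside ASSERT so it still runs if ASSERT compiles to nothing.
[[maybe_unused]] int nv = InternalAddVal(zi, i + j, indices[j]);
ASSERT(nv == 1);
}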
return width;
}
if ( e->Tag() == EXPR_FIELD_ASSIGN )
{
// These can appear when we're processing the expression
// list for a record constructor.
auto fa = e->AsFieldAssignExpr();
e = fa->GetOp1().get();
if ( e->GetType()->Tag() == TYPE_TYPE )
{
// Ugh - we actually need a "type" constant.
auto v = e->Eval(nullptr);
ASSERT(v);
zi->Add(i, v);
return 1;
}
// Now that we've adjusted, fall through.
}
if ( e->Tag() == EXPR_NAME )
zi->Add(i, FrameSlot(e->AsNameExpr()), e->GetType());
else
zi->Add(i, e->AsConstExpr()->ValuePtr());
return 1;
}
const ZAMStmt ZAMCompiler::AddInst(const ZInstI& inst)
{
ZInstI* i;
if ( pending_inst )
{
i = pending_inst;
pending_inst = nullptr;
}
else
i = new ZInstI();
*i = inst;
insts1.push_back(i);
top_main_inst = insts1.size() - 1;
if ( pending_global_store < 0 )
return ZAMStmt(top_main_inst);
auto global_slot = pending_global_store;
pending_global_store = -1;
auto store_inst = ZInstI(OP_STORE_GLOBAL_V, global_slot);
store_inst.op_type = OP_V_I1;
store_inst.t = globalsI[global_slot].id->GetType();
return AddInst(store_inst);
}
const Stmt* ZAMCompiler::LastStmt(const Stmt* s) const
{
if ( s->Tag() == STMT_LIST )
{
auto sl = s->AsStmtList()->Stmts();
return sl[sl.length() - 1];
}
else
return s;
}
ZAMStmt ZAMCompiler::PrevStmt(const ZAMStmt s)
{
return ZAMStmt(s.stmt_num - 1);
}
} // zeek::detail

2107
src/script_opt/ZAM/Ops.in Normal file

File diff suppressed because it is too large


@ -0,0 +1,129 @@
<h1 align="center">
ZAM Optimization: User's Guide
</h1><h4 align="center">
[_Overview_](#overview) -
[_Known Issues_](#known-issues) -
[_Optimization Options_](#script-optimization-options)
</h4>
<br>
## Overview
Zeek's _ZAM optimization_ is an experimental feature that changes the
basic execution model for Zeek scripts in an effort to gain higher
performance. Normally, Zeek parses scripts into _Abstract Syntax Trees_
that are then executed by recursively interpreting each node in a given
tree. With script optimization, Zeek compiles the trees into a low-level
form that can generally be executed more efficiently.
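
As a rough illustration of the difference between the two execution models, the following C++ evaluates the same expression both by recursive tree walking and by running a flat instruction sequence over a frame of slots. This is a toy sketch only: `Node`, `Inst`, `Interpret`, `Compile`, and `Run` are hypothetical names and do not reflect Zeek's actual AST or ZAM classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy AST: either a constant leaf or a binary add/multiply node.
struct Node {
    enum Kind { Constant, Add, Mul } kind;
    int value = 0;                  // used when kind == Constant
    std::unique_ptr<Node> lhs, rhs; // used for Add / Mul
};

std::unique_ptr<Node> Const(int v) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Constant;
    n->value = v;
    return n;
}

std::unique_ptr<Node> Bin(Node::Kind k, std::unique_ptr<Node> l,
                          std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->kind = k;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Model 1: interpret by recursing over the tree on every evaluation.
int Interpret(const Node& n) {
    switch (n.kind) {
    case Node::Constant: return n.value;
    case Node::Add: return Interpret(*n.lhs) + Interpret(*n.rhs);
    case Node::Mul: return Interpret(*n.lhs) * Interpret(*n.rhs);
    }
    return 0;
}

// Model 2: a flat instruction operating on numbered frame slots.
struct Inst {
    enum Op { LoadConst, Add, Mul } op;
    int dst;  // frame slot receiving the result
    int a, b; // LoadConst: a is the constant; Add/Mul: operand slots
};

// Flattens the tree once; returns the slot that will hold the result.
int Compile(const Node& n, std::vector<Inst>& code, int& next_slot) {
    if (n.kind == Node::Constant) {
        int slot = next_slot++;
        code.push_back({Inst::LoadConst, slot, n.value, 0});
        return slot;
    }
    int l = Compile(*n.lhs, code, next_slot);
    int r = Compile(*n.rhs, code, next_slot);
    int slot = next_slot++;
    code.push_back({n.kind == Node::Add ? Inst::Add : Inst::Mul, slot, l, r});
    return slot;
}

// Executes the flat code with a simple loop; no recursion at run time.
int Run(const std::vector<Inst>& code, int num_slots, int result_slot) {
    assert(result_slot < num_slots);
    std::vector<int> frame(num_slots);
    for (const auto& i : code) {
        switch (i.op) {
        case Inst::LoadConst: frame[i.dst] = i.a; break;
        case Inst::Add: frame[i.dst] = frame[i.a] + frame[i.b]; break;
        case Inst::Mul: frame[i.dst] = frame[i.a] * frame[i.b]; break;
        }
    }
    return frame[result_slot];
}
```

For the tree `2 + 3 * 4`, `Interpret` and `Run` both yield 14; the difference is that the compiled form pays the tree traversal once, at compile time, and afterwards executes a simple loop.
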
You specify use of this feature by including `-O ZAM` on the command
line. (Note that this option takes a few seconds to generate the ZAM code, unless you're using `-b` _bare mode_.)
How much faster will your scripts run? There's no simple answer to that.
It depends heavily on several factors:
* What proportion of the processing during execution is spent in Zeek's
_Event Engine_ rather than executing scripts. ZAM optimization doesn't
help with Event Engine execution.
* What proportion of the script's processing is spent executing built-in
functions (BiFs). ZAM optimization improves execution for some select,
_simple_ BiFs, like `network_time()`, but it doesn't help for complex BiFs.
It might well be that most of your script processing actually occurs inside
the _Logging Framework_, for example, and thus you won't see much improvement.
* Together, those two factors mean that gains are very often on the order of
only 10-15%, rather than something more dramatic.
* In addition, there are some
[types of scripts that currently can't be compiled](#scripts-that-cannot-be-compiled),
and thus will remain interpreted. If your processing bottlenecks in such
scripts, you won't see much in the way of gains.
<br>
## Known Issues
Here we list various issues with using script optimization, including both
deficiencies (problems to eventually fix) and incompatibilities (differences
in behavior from the default of script interpretation, not necessarily
fixable). For each, the corresponding list is roughly ordered from
you're-most-likely-to-care-about-it to you're-less-likely-to-care, though
of course this varies for different users.
<br>
### Deficiencies to eventually fix:
* Error messages in compiled scripts often lack important identifying
information.
* The optimizer assumes you have ensured initialization of your variables.
If your script uses a variable that hasn't been set, the compiled code may
crash or behave aberrantly. You can use the `-u` command-line flag to find such potential usage issues.
* Certain complex "when" expressions may fail to reevaluate when elements
of the expression are modified by compiled scripts.
<br>
### Incompatibilities:
* When printing scripts (such as in some error messages), the names of
variables often reflect internal temporaries rather than the original
variables.
* The deprecated feature of intermixing vectors and scalars in operations
(e.g., `v2 = v1 * 3`) is not supported.
* The `same_object()` BiF will always deem two non-container values as
different.
<br>
### Scripts that cannot be compiled:
The ZAM optimizer does not compile scripts that include "when" statements or
lambda expressions. These will take substantial work to support. It also
will not inline such scripts, nor will it inline scripts that are either
directly or indirectly recursive.
You can get a list of non-compilable scripts using
`-O ZAM -O report-uncompilable`. For recursive scripts, use
`-O report-recursive` (no `-O ZAM` required, since it doesn't apply to the
alternative optimization, `-O gen-C++`).
<br>
## Script Optimization Options
Users will generally simply use `-O ZAM` to invoke the script optimizer.
There are, however, a number of additional options, nearly all of which
only have relevance for those debugging optimization problems or performance
issues:
|Option|Meaning|
|---|---|
|`dump-uds` | Dump use-defs to _stdout_.|
|`dump-xform` | Dump transformed scripts to _stdout_.|
|`dump-ZAM` | Dump generated ZAM code to _stdout_.|
|`help` | Print this list.|
|`inline` | Inline function calls.|
|`no-ZAM-opt` | Turn off low-level ZAM optimization.|
|`optimize-all` | Optimize all scripts, even inlined ones. You need to separately specify which optimizations you want to apply, e.g., `-O inline -O xform`.|
|`optimize-AST` | Optimize the (transformed) AST; implies `xform`.|
|`profile-ZAM` | Generate to _stdout_ a ZAM execution profile. (Requires configuring with `--enable-debug`.)|
|`report-recursive` | Report on recursive functions and exit.|
|`report-uncompilable` | Report on uncompilable functions and exit.|
|`xform` | Transform scripts to "reduced" form.|
<br>
<br>

1154
src/script_opt/ZAM/Stmt.cc Normal file

File diff suppressed because it is too large


@ -0,0 +1,106 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Low-level support utilities/globals for ZAM compilation.
#include "zeek/Reporter.h"
#include "zeek/Desc.h"
#include "zeek/ZeekString.h"
#include "zeek/script_opt/ProfileFunc.h"
#include "zeek/script_opt/ZAM/Support.h"
namespace zeek::detail {
const Stmt* curr_stmt;
TypePtr log_ID_enum_type;
TypePtr any_base_type;
bool ZAM_error = false;
bool is_ZAM_compilable(const ProfileFunc* pf, const char** reason)
{
if ( pf->NumLambdas() > 0 )
{
if ( reason )
*reason = "use of lambda";
return false;
}
if ( pf->NumWhenStmts() > 0 )
{
if ( reason )
*reason = "use of \"when\"";
return false;
}
return true;
}
bool IsAny(const Type* t)
{
return t->Tag() == TYPE_ANY;
}
StringVal* ZAM_to_lower(const StringVal* sv)
{
auto bs = sv->AsString();
const u_char* s = bs->Bytes();
int n = bs->Len();
u_char* lower_s = new u_char[n + 1];
u_char* ls = lower_s;
for ( int i = 0; i < n; ++i )
{
if ( isascii(s[i]) && isupper(s[i]) )
*ls++ = tolower(s[i]);
else
*ls++ = s[i];
}
*ls++ = '\0';
return new StringVal(new String(1, lower_s, n));
}
StringVal* ZAM_sub_bytes(const StringVal* s, bro_uint_t start, bro_int_t n)
{
if ( start > 0 )
--start; // make it 0-based
auto ss = s->AsString()->GetSubstring(start, n);
return new StringVal(ss ? ss : new String(""));
}
void ZAM_run_time_error(const char* msg)
{
fprintf(stderr, "%s\n", msg);
ZAM_error = true;
}
void ZAM_run_time_error(const Location* loc, const char* msg)
{
reporter->RuntimeError(loc, "%s", msg);
ZAM_error = true;
}
void ZAM_run_time_error(const char* msg, const Obj* o)
{
fprintf(stderr, "%s: %s\n", msg, obj_desc(o).c_str());
ZAM_error = true;
}
void ZAM_run_time_error(const Location* loc, const char* msg, const Obj* o)
{
reporter->RuntimeError(loc, "%s (%s)", msg, obj_desc(o).c_str());
ZAM_error = true;
}
void ZAM_run_time_warning(const Location* loc, const char* msg)
{
ODesc d;
loc->Describe(&d);
reporter->Warning("%s: %s", d.Description(), msg);
}
} // namespace zeek::detail
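
The byte-wise rule used by `ZAM_to_lower` above (lowercase only ASCII uppercase bytes, copy every other byte unchanged, and rely on an explicit length rather than a terminating NUL) can be sketched with standard-library types. This is an illustrative stand-in, assuming `std::string` in place of Zeek's `String`/`StringVal`; `ToLowerAscii` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// Returns a copy of "in" with ASCII uppercase letters lowercased.
// Non-ASCII bytes and embedded NULs are copied through untouched.
std::string ToLowerAscii(const std::string& in) {
    std::string out;
    out.reserve(in.size());

    for (unsigned char c : in) {
        if (c < 0x80 && std::isupper(c))
            out.push_back(static_cast<char>(std::tolower(c)));
        else
            out.push_back(static_cast<char>(c));
    }

    return out;
}
```

Iterating as `unsigned char` avoids passing negative values to `std::isupper`, which would be undefined behavior for bytes above 0x7F on platforms where `char` is signed.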


@ -0,0 +1,53 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Low-level support utilities/globals for ZAM compilation.
#pragma once
#include "zeek/Expr.h"
#include "zeek/Stmt.h"
namespace zeek::detail {
using ValVec = std::vector<ValPtr>;
// The (reduced) statement currently being compiled. Used for both
// tracking "use" and "reaching" definitions, and for error messages.
extern const Stmt* curr_stmt;
// True if a function with the given profile can be compiled to ZAM.
// If not, returns the reason in *reason, if non-nil.
class ProfileFunc;
extern bool is_ZAM_compilable(const ProfileFunc* pf,
const char** reason = nullptr);
// True if a given type is one that we treat internally as an "any" type.
extern bool IsAny(const Type* t);
// Convenience functions for getting to these.
inline bool IsAny(const TypePtr& t) { return IsAny(t.get()); }
inline bool IsAny(const Expr* e) { return IsAny(e->GetType()); }
// Needed for the logging built-in. Exported so that ZAM can make sure it's
// defined when compiling.
extern TypePtr log_ID_enum_type;
// Needed for a slight performance gain when dealing with "any" types.
extern TypePtr any_base_type;
extern void ZAM_run_time_error(const char* msg);
extern void ZAM_run_time_error(const Location* loc, const char* msg);
extern void ZAM_run_time_error(const Location* loc, const char* msg,
const Obj* o);
extern void ZAM_run_time_error(const Stmt* stmt, const char* msg);
extern void ZAM_run_time_error(const char* msg, const Obj* o);
extern bool ZAM_error;
extern void ZAM_run_time_warning(const Location* loc, const char* msg);
extern StringVal* ZAM_to_lower(const StringVal* sv);
extern StringVal* ZAM_sub_bytes(const StringVal* s, bro_uint_t start, bro_int_t n);
} // namespace zeek::detail
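
Note on is_ZAM_compilable() above: it returns a bool and, when the caller supplies a non-nil pointer, also reports why compilation is not possible. A minimal sketch of that optional out-parameter pattern follows. ProfileSketch, its field, and the reason text are all made up for illustration; the real criteria live in ProfileFunc and are not shown in this header.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, much-reduced stand-in for ProfileFunc.
struct ProfileSketch {
    bool has_unsupported_construct = false;
};

// Returns true if the profiled function could be compiled. On failure,
// stores a static explanation in *reason, but only if reason is non-nil.
bool is_compilable_sketch(const ProfileSketch* pf, const char** reason = nullptr)
{
    if ( pf->has_unsupported_construct )
    {
        if ( reason )
            *reason = "unsupported construct";
        return false;
    }

    return true;
}
```

Callers that only need the yes/no answer can omit the second argument, which is presumably why the declaration defaults it to nullptr.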

160
src/script_opt/ZAM/Vars.cc Normal file

@ -0,0 +1,160 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Methods for dealing with variables (both ZAM and script-level).
#include "zeek/Reporter.h"
#include "zeek/Desc.h"
#include "zeek/script_opt/ProfileFunc.h"
#include "zeek/script_opt/Reduce.h"
#include "zeek/script_opt/ZAM/Compile.h"
namespace zeek::detail {
bool ZAMCompiler::IsUnused(const IDPtr& id, const Stmt* where) const
{
if ( ! ud->HasUsage(where) )
return true;
auto usage = ud->GetUsage(where);
// "usage" can be nil if, due to constant propagation, we've pruned
// all of the uses of the given identifier.
return ! usage || ! usage->HasID(id.get());
}
void ZAMCompiler::LoadParam(ID* id)
{
if ( id->IsType() )
reporter->InternalError("don't know how to compile local variable that's a type not a value");
bool is_any = IsAny(id->GetType());
ZOp op;
op = AssignmentFlavor(OP_LOAD_VAL_VV, id->GetType()->Tag());
int slot = AddToFrame(id);
ZInstI z(op, slot, id->Offset());
z.SetType(id->GetType());
z.op_type = OP_VV_FRAME;
(void) AddInst(z);
}
const ZAMStmt ZAMCompiler::LoadGlobal(ID* id)
{
ZOp op;
if ( id->IsType() )
// Need a special load for these, as they don't fit
// with the usual template.
op = OP_LOAD_GLOBAL_TYPE_VV;
else
op = AssignmentFlavor(OP_LOAD_GLOBAL_VV, id->GetType()->Tag());
auto slot = RawSlot(id);
ZInstI z(op, slot, global_id_to_info[id]);
z.SetType(id->GetType());
z.op_type = OP_VV_I2;
// We use the id_val for reporting used-but-not-set errors.
z.aux = new ZInstAux(0);
z.aux->id_val = id;
return AddInst(z);
}
int ZAMCompiler::AddToFrame(ID* id)
{
frame_layout1[id] = frame_sizeI;
frame_denizens.push_back(id);
return frame_sizeI++;
}
int ZAMCompiler::FrameSlot(const ID* id)
{
auto slot = RawSlot(id);
if ( id->IsGlobal() )
(void) LoadGlobal(frame_denizens[slot]);
return slot;
}
int ZAMCompiler::Frame1Slot(const ID* id, ZAMOp1Flavor fl)
{
auto slot = RawSlot(id);
switch ( fl ) {
case OP1_READ:
if ( id->IsGlobal() )
(void) LoadGlobal(frame_denizens[slot]);
break;
case OP1_WRITE:
if ( id->IsGlobal() )
pending_global_store = global_id_to_info[id];
break;
case OP1_READ_WRITE:
if ( id->IsGlobal() )
{
(void) LoadGlobal(frame_denizens[slot]);
pending_global_store = global_id_to_info[id];
}
break;
case OP1_INTERNAL:
break;
}
return slot;
}
int ZAMCompiler::RawSlot(const ID* id)
{
auto id_slot = frame_layout1.find(id);
if ( id_slot == frame_layout1.end() )
reporter->InternalError("ID %s missing from frame layout", id->Name());
return id_slot->second;
}
bool ZAMCompiler::HasFrameSlot(const ID* id) const
{
return frame_layout1.find(id) != frame_layout1.end();
}
int ZAMCompiler::NewSlot(bool is_managed)
{
char buf[8192];
snprintf(buf, sizeof buf, "#internal-%d#", frame_sizeI);
// In the following, all that matters is that for managed types
// we pick a tag that will be viewed as managed, and vice versa.
auto tag = is_managed ? TYPE_TABLE : TYPE_VOID;
auto internal_reg = new ID(buf, SCOPE_FUNCTION, false);
internal_reg->SetType(base_type(tag));
return AddToFrame(internal_reg);
}
int ZAMCompiler::TempForConst(const ConstExpr* c)
{
auto slot = NewSlot(c->GetType());
auto z = ZInstI(OP_ASSIGN_CONST_VC, slot, c);
z.CheckIfManaged(c->GetType());
(void) AddInst(z);
return slot;
}
} // zeek::detail
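
Note on the frame bookkeeping above: AddToFrame() hands out consecutive slot numbers and records each identifier both in a lookup map and in a slot-ordered vector, RawSlot() treats a missing identifier as an internal error, and NewSlot() registers a synthetic "#internal-N#" name. A minimal sketch of that scheme follows, under the assumption that plain strings can stand in for ID objects. FrameLayoutSketch and its exception-based error handling are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the frame-layout part of ZAMCompiler,
// using names instead of ID pointers.
class FrameLayoutSketch {
public:
    // Assigns the next free slot to "id" and returns it.
    int AddToFrame(const std::string& id)
    {
        int slot = static_cast<int>(denizens.size());
        layout[id] = slot;
        denizens.push_back(id);
        return slot;
    }

    bool HasFrameSlot(const std::string& id) const
    {
        return layout.find(id) != layout.end();
    }

    // Looks up an existing slot; a miss is a logic error, as in RawSlot().
    int RawSlot(const std::string& id) const
    {
        auto it = layout.find(id);
        if ( it == layout.end() )
            throw std::logic_error("ID " + id + " missing from frame layout");
        return it->second;
    }

    // Creates an internal temporary named after the slot it will occupy.
    int NewSlot()
    {
        return AddToFrame("#internal-" + std::to_string(denizens.size()) + "#");
    }

private:
    std::unordered_map<std::string, int> layout;
    std::vector<std::string> denizens; // slot number -> name
};
```

Keeping the vector alongside the map allows the reverse lookup (slot to identifier) that FrameSlot() and Frame1Slot() rely on when they call LoadGlobal(frame_denizens[slot]).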

563
src/script_opt/ZAM/ZBody.cc Normal file

@ -0,0 +1,563 @@
// See the file "COPYING" in the main distribution directory for copyright.
#include "zeek/Desc.h"
#include "zeek/RE.h"
#include "zeek/Frame.h"
#include "zeek/EventHandler.h"
#include "zeek/Trigger.h"
#include "zeek/Traverse.h"
#include "zeek/Overflow.h"
#include "zeek/Reporter.h"
#include "zeek/script_opt/ScriptOpt.h"
#include "zeek/script_opt/ZAM/Compile.h"
// Needed for managing the corresponding values.
#include "zeek/File.h"
#include "zeek/Func.h"
#include "zeek/OpaqueVal.h"
// Just needed for BiFs.
#include "zeek/analyzer/Manager.h"
#include "zeek/broker/Manager.h"
#include "zeek/file_analysis/Manager.h"
#include "zeek/logging/Manager.h"
namespace zeek::detail {
using std::vector;
static bool did_init = false;
// Count of how often each type of ZOP executed, and how much CPU it
// cumulatively took.
int ZOP_count[OP_NOP+1];
double ZOP_CPU[OP_NOP+1];
void report_ZOP_profile()
{
for ( int i = 1; i <= OP_NOP; ++i )
if ( ZOP_count[i] > 0 )
printf("%s\t%d\t%.06f\n", ZOP_name(ZOp(i)),
ZOP_count[i], ZOP_CPU[i]);
}
// Sets the given element to a copy of an existing (not newly constructed)
// ZVal, including underlying memory management. Returns false if the
// assigned value was missing (which we can only tell for managed types),
// true otherwise.
static bool copy_vec_elem(VectorVal* vv, int ind, ZVal zv, const TypePtr& t)
{
if ( vv->Size() <= ind )
vv->Resize(ind + 1);
auto& elem = (*vv->RawVec())[ind];
if ( ! ZVal::IsManagedType(t) )
{
elem = zv;
return true;
}
if ( elem )
ZVal::DeleteManagedType(*elem);
elem = zv;
auto managed_elem = elem->ManagedVal();
if ( ! managed_elem )
{
elem = std::nullopt;
return false;
}
zeek::Ref(managed_elem);
return true;
}
// Unary and binary element-by-element vector operations, yielding a new
// VectorVal with a yield type of 't'. 'z' is passed in only for localizing
// errors.
static void vec_exec(ZOp op, TypePtr t, VectorVal*& v1, const VectorVal* v2,
const ZInst& z);
static void vec_exec(ZOp op, TypePtr t, VectorVal*& v1, const VectorVal* v2,
const VectorVal* v3, const ZInst& z);
// Vector coercion.
#define VEC_COERCE(tag, lhs_type, cast, rhs_accessor, ov_check, ov_err) \
static VectorVal* vec_coerce_##tag(VectorVal* vec, const ZInst& z) \
{ \
auto& v = *vec->RawVec(); \
auto yt = make_intrusive<VectorType>(base_type(lhs_type)); \
auto res_zv = new VectorVal(yt); \
auto n = v.size(); \
res_zv->Resize(n); \
auto& res = *res_zv->RawVec(); \
for ( auto i = 0U; i < n; ++i ) \
if ( v[i] ) \
{ \
auto vi = (*v[i]).rhs_accessor; \
if ( ov_check(vi) ) \
{ \
std::string err = "overflow promoting from "; \
err += ov_err; \
err += " arithmetic value"; \
ZAM_run_time_error(z.loc, err.c_str()); \
res[i] = std::nullopt; \
} \
else \
res[i] = ZVal(cast(vi)); \
} \
else \
res[i] = std::nullopt; \
return res_zv; \
}
#define false_func(x) false
VEC_COERCE(DI, TYPE_DOUBLE, double, AsInt(), false_func, "")
VEC_COERCE(DU, TYPE_DOUBLE, double, AsCount(), false_func, "")
VEC_COERCE(ID, TYPE_INT, bro_int_t, AsDouble(), double_to_int_would_overflow, "double to signed")
VEC_COERCE(IU, TYPE_INT, bro_int_t, AsCount(), count_to_int_would_overflow, "unsigned to signed")
VEC_COERCE(UD, TYPE_COUNT, bro_uint_t, AsDouble(), double_to_count_would_overflow, "double to unsigned")
VEC_COERCE(UI, TYPE_COUNT, bro_uint_t, AsInt(), int_to_count_would_overflow, "signed to unsigned")
double curr_CPU_time()
{
struct timespec ts;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
return double(ts.tv_sec) + double(ts.tv_nsec) / 1e9;
}
ZBody::ZBody(const char* _func_name, const ZAMCompiler* zc)
: Stmt(STMT_ZAM)
{
func_name = _func_name;
frame_denizens = zc->FrameDenizens();
frame_size = frame_denizens.size();
// Concretize the names of the frame denizens.
for ( auto& f : frame_denizens )
for ( auto& id : f.ids )
f.names.push_back(id->Name());
managed_slots = zc->ManagedSlots();
globals = zc->Globals();
num_globals = globals.size();
int_cases = zc->GetCases<bro_int_t>();
uint_cases = zc->GetCases<bro_uint_t>();
double_cases = zc->GetCases<double>();
str_cases = zc->GetCases<std::string>();
if ( zc->NonRecursive() )
{
fixed_frame = new ZVal[frame_size];
for ( auto& ms : managed_slots )
fixed_frame[ms].ClearManagedVal();
}
table_iters = zc->GetTableIters();
num_step_iters = zc->NumStepIters();
// It's a little weird doing this in the constructor, but unless
// we add a general "initialize for ZAM" function, this is as good
// a place as any.
if ( ! did_init )
{
auto log_ID_type = lookup_ID("ID", "Log");
ASSERT(log_ID_type);
log_ID_enum_type = log_ID_type->GetType<EnumType>();
any_base_type = base_type(TYPE_ANY);
ZVal::SetZValNilStatusAddr(&ZAM_error);
did_init = true;
}
}
ZBody::~ZBody()
{
delete[] fixed_frame;
delete[] insts;
delete inst_count;
delete CPU_time;
}
void ZBody::SetInsts(vector<ZInst*>& _insts)
{
ninst = _insts.size();
auto insts_copy = new ZInst[ninst];
for ( auto i = 0U; i < ninst; ++i )
insts_copy[i] = *_insts[i];
insts = insts_copy;
InitProfile();
}
void ZBody::SetInsts(vector<ZInstI*>& instsI)
{
ninst = instsI.size();
auto insts_copy = new ZInst[ninst];
for ( auto i = 0U; i < ninst; ++i )
{
auto& iI = *instsI[i];
insts_copy[i] = iI;
if ( iI.stmt )
insts_copy[i].loc = iI.stmt->Original()->GetLocationInfo();
}
insts = insts_copy;
InitProfile();
}
void ZBody::InitProfile()
{
if ( analysis_options.profile_ZAM )
{
inst_count = new vector<int>;
inst_CPU = new vector<double>;
for ( auto i = 0U; i < ninst; ++i )
{
inst_count->push_back(0);
inst_CPU->push_back(0.0);
}
CPU_time = new double;
*CPU_time = 0.0;
}
}
ValPtr ZBody::Exec(Frame* f, StmtFlowType& flow)
{
#ifdef DEBUG
double t = analysis_options.profile_ZAM ? curr_CPU_time() : 0.0;
#endif
auto val = DoExec(f, 0, flow);
#ifdef DEBUG
if ( analysis_options.profile_ZAM )
*CPU_time += curr_CPU_time() - t;
#endif
return val;
}
ValPtr ZBody::DoExec(Frame* f, int start_pc, StmtFlowType& flow)
{
int pc = start_pc;
const int end_pc = ninst;
// Return value, or nil if none.
const ZVal* ret_u = nullptr;
// Type of the return value. If nil, then we don't have a value.
TypePtr ret_type;
#ifdef DEBUG
bool do_profile = analysis_options.profile_ZAM;
#endif
ZVal* frame;
std::unique_ptr<TableIterVec> local_table_iters;
std::vector<StepIterInfo> step_iters(num_step_iters);
if ( fixed_frame )
frame = fixed_frame;
else
{
frame = new ZVal[frame_size];
// Clear slots for which we do explicit memory management.
for ( auto s : managed_slots )
frame[s].ClearManagedVal();
if ( ! table_iters.empty() )
{
local_table_iters =
std::make_unique<TableIterVec>(table_iters.size());
*local_table_iters = table_iters;
tiv_ptr = &(*local_table_iters);
}
}
flow = FLOW_RETURN; // can be over-written by a Hook-Break
while ( pc < end_pc && ! ZAM_error )
{
auto& z = insts[pc];
#ifdef DEBUG
int profile_pc;
double profile_CPU;
if ( do_profile )
{
++ZOP_count[z.op];
++(*inst_count)[pc];
profile_pc = pc;
profile_CPU = curr_CPU_time();
}
#endif
switch ( z.op ) {
case OP_NOP:
break;
#include "ZAM-EvalMacros.h"
#include "ZAM-EvalDefs.h"
default:
reporter->InternalError("bad ZAM opcode");
}
#ifdef DEBUG
if ( do_profile )
{
double dt = curr_CPU_time() - profile_CPU;
inst_CPU->at(profile_pc) += dt;
ZOP_CPU[z.op] += dt;
}
#endif
++pc;
}
auto result = ret_type ? ret_u->ToVal(ret_type) : nullptr;
if ( fixed_frame )
{
// Make sure we don't have any dangling iterators.
for ( auto& ti : table_iters )
ti.Clear();
// Free slots for which we do explicit memory management,
// preparing them for reuse.
for ( auto& ms : managed_slots )
{
auto& v = frame[ms];
ZVal::DeleteManagedType(v);
v.ClearManagedVal();
}
}
else
{
// Free those slots for which we do explicit memory management.
// No need to then clear them, as we're about to throw away
// the entire frame.
for ( auto& ms : managed_slots )
{
auto& v = frame[ms];
ZVal::DeleteManagedType(v);
}
delete [] frame;
}
// Clear any error state.
ZAM_error = false;
return result;
}
void ZBody::ProfileExecution() const
{
if ( inst_count->empty() )
{
printf("%s has an empty body\n", func_name);
return;
}
if ( (*inst_count)[0] == 0 )
{
printf("%s did not execute\n", func_name);
return;
}
printf("%s CPU time: %.06f\n", func_name, *CPU_time);
for ( auto i = 0U; i < inst_count->size(); ++i )
{
printf("%s %d %d %.06f ", func_name, i,
(*inst_count)[i], (*inst_CPU)[i]);
insts[i].Dump(i, &frame_denizens);
}
}
bool ZBody::CheckAnyType(const TypePtr& any_type, const TypePtr& expected_type,
const Location* loc) const
{
if ( IsAny(expected_type) )
return true;
if ( ! same_type(any_type, expected_type, false, false) )
{
auto at = any_type->Tag();
auto et = expected_type->Tag();
if ( at == TYPE_RECORD && et == TYPE_RECORD )
{
auto at_r = any_type->AsRecordType();
auto et_r = expected_type->AsRecordType();
if ( record_promotion_compatible(et_r, at_r) )
return true;
}
char buf[8192];
snprintf(buf, sizeof buf, "run-time type clash (%s/%s)",
type_name(at), type_name(et));
reporter->RuntimeError(loc, "%s", buf);
return false;
}
return true;
}
void ZBody::Dump() const
{
printf("Frame:\n");
for ( unsigned i = 0; i < frame_denizens.size(); ++i )
{
auto& d = frame_denizens[i];
printf("frame[%d] =", i);
if ( d.names.empty() )
for ( auto& id : d.ids )
printf(" %s", id->Name());
else
for ( auto& n : d.names )
printf(" %s", n);
printf("\n");
}
printf("Final code:\n");
for ( unsigned i = 0; i < ninst; ++i )
{
auto& inst = insts[i];
printf("%d: ", i);
inst.Dump(i, &frame_denizens);
}
}
void ZBody::StmtDescribe(ODesc* d) const
{
d->AddSP("ZAM-code");
d->AddSP(func_name);
}
TraversalCode ZBody::Traverse(TraversalCallback* cb) const
{
TraversalCode tc = cb->PreStmt(this);
HANDLE_TC_STMT_PRE(tc);
tc = cb->PostStmt(this);
HANDLE_TC_STMT_POST(tc);
}
ValPtr ZAMResumption::Exec(Frame* f, StmtFlowType& flow)
{
return am->DoExec(f, xfer_pc, flow);
}
void ZAMResumption::StmtDescribe(ODesc* d) const
{
d->Add("<resumption of compiled code>");
}
TraversalCode ZAMResumption::Traverse(TraversalCallback* cb) const
{
TraversalCode tc = cb->PreStmt(this);
HANDLE_TC_STMT_PRE(tc);
tc = cb->PostStmt(this);
HANDLE_TC_STMT_POST(tc);
}
// Unary vector operation of v1 = <vec-op> v2.
static void vec_exec(ZOp op, TypePtr t, VectorVal*& v1, const VectorVal* v2,
const ZInst& z)
{
// We could speed this up further still by gen'ing up an instance
// of the loop inside each switch case (in which case we might as
// well move the whole kit-and-caboodle into the Exec method). But
// that seems like a lot of code bloat for only a very modest gain.
auto& vec2 = *v2->RawVec();
auto n = vec2.size();
auto vec1_ptr = new vector<std::optional<ZVal>>(n);
auto& vec1 = *vec1_ptr;
for ( auto i = 0U; i < n; ++i )
{
if ( vec2[i] )
switch ( op ) {
#include "ZAM-Vec1EvalDefs.h"
default:
reporter->InternalError("bad invocation of VecExec");
}
else
vec1[i] = std::nullopt;
}
auto vt = cast_intrusive<VectorType>(std::move(t));
auto old_v1 = v1;
v1 = new VectorVal(std::move(vt), vec1_ptr);
Unref(old_v1);
}
// Binary vector operation of v1 = v2 <vec-op> v3.
static void vec_exec(ZOp op, TypePtr t, VectorVal*& v1,
const VectorVal* v2, const VectorVal* v3, const ZInst& z)
{
// See comment above re further speed-up.
auto& vec2 = *v2->RawVec();
auto& vec3 = *v3->RawVec();
auto n = vec2.size();
auto vec1_ptr = new vector<std::optional<ZVal>>(n);
auto& vec1 = *vec1_ptr;
for ( auto i = 0U; i < n; ++i )
{
if ( vec2[i] && vec3[i] )
switch ( op ) {
#include "ZAM-Vec2EvalDefs.h"
default:
reporter->InternalError("bad invocation of VecExec");
}
else
vec1[i] = std::nullopt;
}
auto vt = cast_intrusive<VectorType>(std::move(t));
auto old_v1 = v1;
v1 = new VectorVal(std::move(vt), vec1_ptr);
Unref(old_v1);
}
} // zeek::detail
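
Note on the VEC_COERCE macro above: each generated function walks the source vector, keeps holes as holes, and turns an element that would overflow into a hole while flagging a run-time error. A minimal sketch of the unsigned-to-signed case follows, assuming std::optional elements and a plain bool in place of ZAM_error. All names ending in _sketch are hypothetical, and the overflow predicate is an assumed equivalent of count_to_int_would_overflow().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

using IntVecSketch = std::vector<std::optional<std::int64_t>>;
using CountVecSketch = std::vector<std::optional<std::uint64_t>>;

// Assumed equivalent of count_to_int_would_overflow().
bool count_to_int_would_overflow_sketch(std::uint64_t v)
{
    return v > static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
}

// Coerces each present element; holes and overflowing elements
// both end up as holes in the result, and overflow sets "error".
IntVecSketch coerce_count_to_int_sketch(const CountVecSketch& v, bool& error)
{
    IntVecSketch res(v.size()); // every element starts out as a hole

    for ( std::size_t i = 0; i < v.size(); ++i )
    {
        if ( ! v[i] )
            continue; // a hole stays a hole

        if ( count_to_int_would_overflow_sketch(*v[i]) )
        {
            error = true; // stands in for ZAM_run_time_error()
            continue;
        }

        res[i] = static_cast<std::int64_t>(*v[i]);
    }

    return res;
}
```

The result always has the same length as the input, so element positions are preserved even when some values cannot be represented.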

144
src/script_opt/ZAM/ZBody.h Normal file

@ -0,0 +1,144 @@
// See the file "COPYING" in the main distribution directory for copyright.
// ZBody: ZAM function body that replaces a function's original AST body.
#pragma once
#include "zeek/script_opt/ZAM/IterInfo.h"
#include "zeek/script_opt/ZAM/Support.h"
namespace zeek::detail {
// Static information about globals used in a function.
class GlobalInfo {
public:
IDPtr id;
int slot;
};
// These are the counterparts to CaseMapI and CaseMapsI in ZAM.h,
// but concretized to use instruction numbers rather than pointers
// to instructions.
template<typename T> using CaseMap = std::map<T, int>;
template<typename T> using CaseMaps = std::vector<CaseMap<T>>;
using TableIterVec = std::vector<TableIterInfo>;
class ZBody : public Stmt {
public:
ZBody(const char* _func_name, const ZAMCompiler* zc);
~ZBody() override;
// These are split out from the constructor to allow construction
// of a ZBody from either save-file full instructions (first method)
// or intermediary instructions (second method).
void SetInsts(std::vector<ZInst*>& insts);
void SetInsts(std::vector<ZInstI*>& instsI);
ValPtr Exec(Frame* f, StmtFlowType& flow) override;
// Older code exists for save files, but let's see if we can
// avoid having to support them, as they're a fairly elaborate
// production.
//
// void SaveTo(FILE* f, int interp_frame_size) const;
void Dump() const;
void ProfileExecution() const;
protected:
friend class ZAMResumption;
// Initializes profiling information, if needed.
void InitProfile();
ValPtr DoExec(Frame* f, int start_pc, StmtFlowType& flow);
// Run-time checking for "any" type being consistent with
// expected type. Returns true if the type match is okay.
bool CheckAnyType(const TypePtr& any_type, const TypePtr& expected_type,
const Location* loc) const;
StmtPtr Duplicate() override { return {NewRef{}, this}; }
void StmtDescribe(ODesc* d) const override;
TraversalCode Traverse(TraversalCallback* cb) const override;
private:
const char* func_name;
const ZInst* insts = nullptr;
unsigned int ninst;
FrameReMap frame_denizens;
int frame_size;
// A list of frame slots that correspond to managed values.
std::vector<int> managed_slots;
// This is non-nil if the function is (asserted to be) non-recursive,
// in which case we pre-allocate this.
ZVal* fixed_frame = nullptr;
// Pre-allocated table iteration values. For recursive invocations,
// these are copied into a local stack variable, but for non-recursive
// functions they can be used directly.
TableIterVec table_iters;
// Points to the TableIterVec used to manage iteration over tables.
// For non-recursive functions, we just use the static one, but
// for recursive ones this points to the local stack variable.
TableIterVec* tiv_ptr = &table_iters;
// Number of StepIterInfo's required by the function. These we
// always create using a local stack variable, since they don't
// require any overhead or cleanup.
int num_step_iters;
std::vector<GlobalInfo> globals;
int num_globals;
// The following are only maintained if we're doing profiling.
//
// These need to be pointers so we can manipulate them in a
// const method.
std::vector<int>* inst_count = nullptr; // for profiling
double* CPU_time = nullptr; // cumulative CPU time for the program
std::vector<double>* inst_CPU = nullptr; // per-instruction CPU time.
CaseMaps<bro_int_t> int_cases;
CaseMaps<bro_uint_t> uint_cases;
CaseMaps<double> double_cases;
CaseMaps<std::string> str_cases;
};
// This is a statement that resumes execution into a code block in a
// ZBody. Used for deferred execution for "when" statements.
class ZAMResumption : public Stmt {
public:
ZAMResumption(ZBody* _am, int _xfer_pc)
: Stmt(STMT_ZAM_RESUMPTION), am(_am), xfer_pc(_xfer_pc)
{ }
ValPtr Exec(Frame* f, StmtFlowType& flow) override;
StmtPtr Duplicate() override { return {NewRef{}, this}; }
void StmtDescribe(ODesc* d) const override;
protected:
TraversalCode Traverse(TraversalCallback* cb) const override;
ZBody* am;
int xfer_pc = 0;
};
// Prints the execution profile.
extern void report_ZOP_profile();
} // namespace zeek::detail
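
Note on fixed_frame above: a body that is known to be non-recursive pre-allocates one frame and reuses it across calls, clearing it afterwards, while any other body allocates a fresh frame per invocation. A minimal sketch of that choice follows, with long values standing in for ZVal and a trivial "double the argument" computation standing in for the instruction loop. BodySketch and its members are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for ZBody's frame handling.
class BodySketch {
public:
    BodySketch(std::size_t size, bool non_recursive) : frame_size(size)
    {
        assert(frame_size >= 2); // the toy computation below uses two slots
        if ( non_recursive )
            fixed_frame = std::make_unique<std::vector<long>>(frame_size, 0L);
    }

    bool UsesFixedFrame() const { return fixed_frame != nullptr; }

    long Exec(long arg)
    {
        std::vector<long> local_frame;
        std::vector<long>* frame = fixed_frame.get();

        if ( ! frame )
        {
            // Possibly recursive: each invocation gets its own frame.
            local_frame.assign(frame_size, 0L);
            frame = &local_frame;
        }

        (*frame)[0] = arg;
        (*frame)[1] = (*frame)[0] * 2;
        long result = (*frame)[1];

        // A reused frame is reset so the next call starts clean;
        // a local frame is simply discarded on return.
        if ( fixed_frame )
            std::fill(frame->begin(), frame->end(), 0L);

        return result;
    }

private:
    std::size_t frame_size;
    std::unique_ptr<std::vector<long>> fixed_frame;
};
```

The trade-off is one allocation saved per call against the need to reset the shared frame, which is only safe when no two invocations can be live at once.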

627
src/script_opt/ZAM/ZInst.cc Normal file

@ -0,0 +1,627 @@
// See the file "COPYING" in the main distribution directory for copyright.
#include "zeek/Desc.h"
#include "zeek/Reporter.h"
#include "zeek/Func.h"
#include "zeek/script_opt/ZAM/ZInst.h"
using std::string;
namespace zeek::detail {
void ZInst::Dump(int inst_num, const FrameReMap* mappings) const
{
// printf("v%d ", n);
auto id1 = VName(1, inst_num, mappings);
auto id2 = VName(2, inst_num, mappings);
auto id3 = VName(3, inst_num, mappings);
auto id4 = VName(4, inst_num, mappings);
Dump(id1, id2, id3, id4);
}
void ZInst::Dump(const string& id1, const string& id2, const string& id3,
const string& id4) const
{
printf("%s ", ZOP_name(op));
// printf("(%s) ", op_type_name(op_type));
if ( t && 0 )
printf("(%s) ", type_name(t->Tag()));
switch ( op_type ) {
case OP_X:
break;
case OP_V:
printf("%s", id1.c_str());
break;
case OP_VV:
printf("%s, %s", id1.c_str(), id2.c_str());
break;
case OP_VVV:
printf("%s, %s, %s", id1.c_str(), id2.c_str(), id3.c_str());
break;
case OP_VVVV:
printf("%s, %s, %s, %s", id1.c_str(), id2.c_str(), id3.c_str(),
id4.c_str());
break;
case OP_VVVC:
printf("%s, %s, %s, %s", id1.c_str(), id2.c_str(), id3.c_str(),
ConstDump().c_str());
break;
case OP_C:
printf("%s", ConstDump().c_str());
break;
case OP_VC:
printf("%s, %s", id1.c_str(), ConstDump().c_str());
break;
case OP_VVC:
printf("%s, %s, %s", id1.c_str(), id2.c_str(),
ConstDump().c_str());
break;
case OP_V_I1:
printf("%d", v1);
break;
case OP_VC_I1:
printf("%d %s", v1, ConstDump().c_str());
break;
case OP_VV_FRAME:
printf("%s, interpreter frame[%d]", id1.c_str(), v2);
break;
case OP_VV_I2:
printf("%s, %d", id1.c_str(), v2);
break;
case OP_VV_I1_I2:
printf("%d, %d", v1, v2);
break;
case OP_VVC_I2:
printf("%s, %d, %s", id1.c_str(), v2, ConstDump().c_str());
break;
case OP_VVV_I3:
printf("%s, %s, %d", id1.c_str(), id2.c_str(), v3);
break;
case OP_VVV_I2_I3:
printf("%s, %d, %d", id1.c_str(), v2, v3);
break;
case OP_VVVV_I4:
printf("%s, %s, %s, %d", id1.c_str(), id2.c_str(), id3.c_str(),
v4);
break;
case OP_VVVV_I3_I4:
printf("%s, %s, %d, %d", id1.c_str(), id2.c_str(), v3, v4);
break;
case OP_VVVV_I2_I3_I4:
printf("%s, %d, %d, %d", id1.c_str(), v2, v3, v4);
break;
case OP_VVVC_I3:
printf("%s, %s, %d, %s", id1.c_str(), id2.c_str(), v3,
ConstDump().c_str());
break;
case OP_VVVC_I2_I3:
printf("%s, %d, %d, %s", id1.c_str(), v2, v3,
ConstDump().c_str());
break;
case OP_VVVC_I1_I2_I3:
printf("%d, %d, %d, %s", v1, v2, v3, ConstDump().c_str());
break;
}
if ( func )
printf(" (func %s)", func->Name());
printf("\n");
}
int ZInst::NumFrameSlots() const
{
switch ( op_type ) {
case OP_X:
case OP_C:
case OP_V_I1:
case OP_VC_I1:
case OP_VV_I1_I2:
case OP_VVVC_I1_I2_I3:
return 0;
case OP_V:
case OP_VC:
case OP_VV_FRAME:
case OP_VV_I2:
case OP_VVC_I2:
case OP_VVV_I2_I3:
case OP_VVVC_I2_I3:
case OP_VVVV_I2_I3_I4:
return 1;
case OP_VV:
case OP_VVC:
case OP_VVV_I3:
case OP_VVVC_I3:
case OP_VVVV_I3_I4:
return 2;
case OP_VVV:
case OP_VVVC:
case OP_VVVV_I4:
return 3;
case OP_VVVV:
return 4;
}
}
int ZInst::NumSlots() const
{
switch ( op_type ) {
case OP_C:
case OP_X:
return 0;
case OP_V:
case OP_V_I1:
case OP_VC:
case OP_VC_I1:
return 1;
case OP_VV:
case OP_VVC:
case OP_VV_FRAME:
case OP_VV_I2:
case OP_VVC_I2:
case OP_VV_I1_I2:
return 2;
case OP_VVV:
case OP_VVV_I3:
case OP_VVV_I2_I3:
case OP_VVVC:
case OP_VVVC_I3:
case OP_VVVC_I2_I3:
case OP_VVVC_I1_I2_I3:
return 3;
case OP_VVVV:
case OP_VVVV_I4:
case OP_VVVV_I3_I4:
case OP_VVVV_I2_I3_I4:
return 4;
}
}
string ZInst::VName(int n, int inst_num, const FrameReMap* mappings) const
{
if ( n > NumFrameSlots() )
return "";
int slot = n == 1 ? v1 : (n == 2 ? v2 : (n == 3 ? v3 : v4));
if ( slot < 0 )
return "<special>";
// Find which identifier manifests at this instruction.
ASSERT(slot >= 0 && slot < mappings->size());
auto& map = (*mappings)[slot];
unsigned int i;
for ( i = 0; i < map.id_start.size(); ++i )
{
// If the slot is right at the boundary between two
// identifiers, then it matters whether this is slot 1
// (starts right here) vs. slot > 1 (ignore change right
// at the boundary and stick with older value).
if ( (n == 1 && map.id_start[i] > inst_num) ||
(n > 1 && map.id_start[i] >= inst_num) )
// Went too far.
break;
}
if ( i < map.id_start.size() )
{
ASSERT(i > 0);
}
auto id = map.names.empty() ? map.ids[i-1]->Name() : map.names[i-1];
return util::fmt("%d (%s)", slot, id);
}
ValPtr ZInst::ConstVal() const
{
switch ( op_type ) {
case OP_C:
case OP_VC:
case OP_VC_I1:
case OP_VVC:
case OP_VVC_I2:
case OP_VVVC:
case OP_VVVC_I3:
case OP_VVVC_I2_I3:
case OP_VVVC_I1_I2_I3:
return c.ToVal(t);
case OP_X:
case OP_V:
case OP_VV:
case OP_VVV:
case OP_VVVV:
case OP_V_I1:
case OP_VV_FRAME:
case OP_VV_I2:
case OP_VV_I1_I2:
case OP_VVV_I3:
case OP_VVV_I2_I3:
case OP_VVVV_I4:
case OP_VVVV_I3_I4:
case OP_VVVV_I2_I3_I4:
return nullptr;
}
}
string ZInst::ConstDump() const
{
auto v = ConstVal();
ODesc d;
d.Clear();
v->Describe(&d);
return d.Description();
}
void ZInstI::Dump(const FrameMap* frame_ids, const FrameReMap* remappings) const
{
int n = NumFrameSlots();
// printf("v%d ", n);
auto id1 = VName(1, frame_ids, remappings);
auto id2 = VName(2, frame_ids, remappings);
auto id3 = VName(3, frame_ids, remappings);
auto id4 = VName(4, frame_ids, remappings);
ZInst::Dump(id1, id2, id3, id4);
}
string ZInstI::VName(int n, const FrameMap* frame_ids,
const FrameReMap* remappings) const
{
if ( n > NumFrameSlots() )
return "";
int slot = n == 1 ? v1 : (n == 2 ? v2 : (n == 3 ? v3 : v4));
if ( slot < 0 )
return "<special>";
const ID* id;
if ( remappings && live )
{ // Find which identifier manifests at this instruction.
ASSERT(slot >= 0 && slot < remappings->size());
auto& map = (*remappings)[slot];
unsigned int i;
for ( i = 0; i < map.id_start.size(); ++i )
{
// See discussion for ZInst::VName.
if ( (n == 1 && map.id_start[i] > inst_num) ||
(n > 1 && map.id_start[i] >= inst_num) )
// Went too far.
break;
}
if ( i < map.id_start.size() )
{
ASSERT(i > 0);
}
// For ZInstI's, map.ids is always populated.
id = map.ids[i-1];
}
else
id = (*frame_ids)[slot];
return util::fmt("%d (%s)", slot, id->Name());
}
bool ZInstI::DoesNotContinue() const
{
switch ( op ) {
case OP_GOTO_V:
case OP_HOOK_BREAK_X:
case OP_RETURN_C:
case OP_RETURN_V:
case OP_RETURN_X:
return true;
default:
return false;
}
}
bool ZInstI::IsDirectAssignment() const
{
if ( op_type != OP_VV )
return false;
switch ( op ) {
case OP_ASSIGN_VV_N:
case OP_ASSIGN_VV_A:
case OP_ASSIGN_VV_O:
case OP_ASSIGN_VV_P:
case OP_ASSIGN_VV_R:
case OP_ASSIGN_VV_S:
case OP_ASSIGN_VV_F:
case OP_ASSIGN_VV_T:
case OP_ASSIGN_VV_V:
case OP_ASSIGN_VV_L:
case OP_ASSIGN_VV_f:
case OP_ASSIGN_VV_t:
case OP_ASSIGN_VV:
return true;
default:
return false;
}
}
bool ZInstI::HasSideEffects() const
{
return op_side_effects[op];
}
bool ZInstI::AssignsToSlot1() const
{
switch ( op_type ) {
case OP_X:
case OP_C:
case OP_V_I1:
case OP_VC_I1:
case OP_VV_I1_I2:
case OP_VVVC_I1_I2_I3:
return false;
// We use this ginormous set of cases rather than "default" so
// that when we add a new operand type, we have to consider
// its behavior here. (Same for many of the other switches
// used for ZInst/ZInstI.)
case OP_V:
case OP_VC:
case OP_VV_FRAME:
case OP_VV_I2:
case OP_VVC_I2:
case OP_VVV_I2_I3:
case OP_VVVC_I2_I3:
case OP_VVVV_I2_I3_I4:
case OP_VV:
case OP_VVC:
case OP_VVV_I3:
case OP_VVVV_I3_I4:
case OP_VVVC_I3:
case OP_VVV:
case OP_VVVC:
case OP_VVVV_I4:
case OP_VVVV:
auto fl = op1_flavor[op];
return fl == OP1_WRITE || fl == OP1_READ_WRITE;
}
}
bool ZInstI::UsesSlot(int slot) const
{
auto fl = op1_flavor[op];
auto v1_relevant = fl == OP1_READ || fl == OP1_READ_WRITE;
auto v1_match = v1_relevant && v1 == slot;
switch ( op_type ) {
case OP_X:
case OP_C:
case OP_V_I1:
case OP_VC_I1:
case OP_VV_I1_I2:
case OP_VVVC_I1_I2_I3:
return false;
case OP_V:
case OP_VC:
case OP_VV_FRAME:
case OP_VV_I2:
case OP_VVC_I2:
case OP_VVV_I2_I3:
case OP_VVVC_I2_I3:
case OP_VVVV_I2_I3_I4:
return v1_match;
case OP_VV:
case OP_VVC:
case OP_VVV_I3:
case OP_VVVV_I3_I4:
case OP_VVVC_I3:
return v1_match || v2 == slot;
case OP_VVV:
case OP_VVVC:
case OP_VVVV_I4:
return v1_match || v2 == slot || v3 == slot;
case OP_VVVV:
return v1_match || v2 == slot || v3 == slot || v4 == slot;
}
}
bool ZInstI::UsesSlots(int& s1, int& s2, int& s3, int& s4) const
{
s1 = s2 = s3 = s4 = -1;
auto fl = op1_flavor[op];
auto v1_relevant = fl == OP1_READ || fl == OP1_READ_WRITE;
switch ( op_type ) {
case OP_X:
case OP_C:
case OP_V_I1:
case OP_VC_I1:
case OP_VV_I1_I2:
case OP_VVVC_I1_I2_I3:
return false;
case OP_V:
case OP_VC:
case OP_VV_FRAME:
case OP_VV_I2:
case OP_VVC_I2:
case OP_VVV_I2_I3:
case OP_VVVC_I2_I3:
case OP_VVVV_I2_I3_I4:
if ( ! v1_relevant )
return false;
s1 = v1;
return true;
case OP_VV:
case OP_VVC:
case OP_VVV_I3:
case OP_VVVV_I3_I4:
case OP_VVVC_I3:
s1 = v2;
if ( v1_relevant )
s2 = v1;
return true;
case OP_VVV:
case OP_VVVC:
case OP_VVVV_I4:
s1 = v2;
s2 = v3;
if ( v1_relevant )
s3 = v1;
return true;
case OP_VVVV:
s1 = v2;
s2 = v3;
s3 = v4;
if ( v1_relevant )
s4 = v1;
return true;
}
}
void ZInstI::UpdateSlots(std::vector<int>& slot_mapping)
{
switch ( op_type ) {
case OP_X:
case OP_C:
case OP_V_I1:
case OP_VC_I1:
case OP_VV_I1_I2:
case OP_VVVC_I1_I2_I3:
return; // so we don't do any v1 remapping.
case OP_V:
case OP_VC:
case OP_VV_FRAME:
case OP_VV_I2:
case OP_VVC_I2:
case OP_VVV_I2_I3:
case OP_VVVC_I2_I3:
case OP_VVVV_I2_I3_I4:
break;
case OP_VV:
case OP_VVC:
case OP_VVV_I3:
case OP_VVVV_I3_I4:
case OP_VVVC_I3:
v2 = slot_mapping[v2];
break;
case OP_VVV:
case OP_VVVC:
case OP_VVVV_I4:
v2 = slot_mapping[v2];
v3 = slot_mapping[v3];
break;
case OP_VVVV:
v2 = slot_mapping[v2];
v3 = slot_mapping[v3];
v4 = slot_mapping[v4];
break;
}
// Note, unlike for UsesSlots() we do *not* include OP1_READ_WRITE
// here, because such instructions will already have v1 remapped
// given it's an assignment target.
if ( op1_flavor[op] == OP1_READ && v1 >= 0 )
v1 = slot_mapping[v1];
}
bool ZInstI::IsGlobalLoad() const
{
if ( op == OP_LOAD_GLOBAL_TYPE_VV )
// These don't have flavors.
return true;
static std::unordered_set<ZOp> global_ops;
if ( global_ops.empty() )
{ // Initialize the set.
for ( int t = 0; t < NUM_TYPES; ++t )
{
TypeTag tag = TypeTag(t);
ZOp global_op_flavor =
AssignmentFlavor(OP_LOAD_GLOBAL_VV, tag, false);
if ( global_op_flavor != OP_NOP )
global_ops.insert(global_op_flavor);
}
}
return global_ops.count(op) > 0;
}
void ZInstI::InitConst(const ConstExpr* ce)
{
auto v = ce->ValuePtr();
t = ce->GetType();
c = ZVal(v, t);
if ( ZAM_error )
reporter->InternalError("bad value compiling code");
}
} // zeek::detail
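
Note on ZInstI::UpdateSlots() above: operands v2 through v4 are remapped whenever they are frame slots, but v1 is remapped only when the instruction purely reads it, since an assignment target is expected to have been remapped already. A minimal sketch of that rule follows. It replaces the operand-type switch with a simple count of frame slots, which is a simplification; InstSketch and Op1FlavorSketch are hypothetical names.

```cpp
#include <cassert>
#include <vector>

enum class Op1FlavorSketch { Read, Write, ReadWrite, Internal };

// Hypothetical, reduced instruction: how many leading operands are
// frame slots, how operand 1 is used, and the operands themselves.
struct InstSketch {
    int num_frame_slots;
    Op1FlavorSketch flavor;
    int v1, v2, v3, v4;
};

void update_slots_sketch(InstSketch& z, const std::vector<int>& slot_mapping)
{
    if ( z.num_frame_slots == 0 )
        return; // no frame operands, so no v1 remapping either

    if ( z.num_frame_slots >= 2 )
        z.v2 = slot_mapping[z.v2];
    if ( z.num_frame_slots >= 3 )
        z.v3 = slot_mapping[z.v3];
    if ( z.num_frame_slots >= 4 )
        z.v4 = slot_mapping[z.v4];

    // Only a pure read of v1 is remapped here; written (or read-written)
    // targets are assumed to have been handled elsewhere.
    if ( z.flavor == Op1FlavorSketch::Read && z.v1 >= 0 )
        z.v1 = slot_mapping[z.v1];
}
```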

469
src/script_opt/ZAM/ZInst.h Normal file
View file

@ -0,0 +1,469 @@
// See the file "COPYING" in the main distribution directory for copyright.
// Operators and instructions used in ZAM execution.
#pragma once
#include "zeek/script_opt/ZAM/Support.h"
#include "zeek/script_opt/ZAM/ZOp.h"
namespace zeek::detail {
class Expr;
class ConstExpr;
class Attributes;
class Stmt;
using AttributesPtr = IntrusivePtr<Attributes>;
// Maps ZAM frame slots to associated identifiers.
using FrameMap = std::vector<ID*>;
// Maps ZAM frame slots to information for sharing the slot across
// multiple script variables.
class FrameSharingInfo {
public:
// The variables sharing the slot. ID's need to be non-const so we
// can manipulate them, for example by changing their interpreter
// frame offset.
std::vector<ID*> ids;
// A parallel vector, only used for fully compiled code, which
// gives the names of the identifiers. When in use, the above
// "ids" member variable may be empty.
std::vector<const char*> names;
// The ZAM instruction number where a given identifier starts its
// scope, parallel to "ids".
std::vector<int> id_start;
// The current end of the frame slot's scope. Gets updated as
// new IDs are added to share the slot.
int scope_end;
// Whether this is a managed slot.
bool is_managed;
};
using FrameReMap = std::vector<FrameSharingInfo>;
class ZInstAux;
// A ZAM instruction. This base class has all the information for
// execution, but omits information and methods only necessary for
// compiling.
class ZInst {
public:
ZInst(ZOp _op, ZAMOpType _op_type)
{
op = _op;
op_type = _op_type;
}
// Create a stub instruction that will be populated later.
ZInst() { }
virtual ~ZInst() { }
// Methods for printing out the instruction for debugging/maintenance.
void Dump(int inst_num, const FrameReMap* mappings) const;
void Dump(const std::string& id1, const std::string& id2,
const std::string& id3, const std::string& id4) const;
// Returns the name to use in identifying one of the slots/integer
// values (designated by "n"). "inst_num" identifies the instruction
// by its number within a larger set. "mappings" provides the
// mappings used to translate raw slots to the corresponding
// script variable(s).
std::string VName(int n, int inst_num,
const FrameReMap* mappings) const;
// Number of slots that refer to a frame element. These always
// come first, if we use additional slots.
int NumFrameSlots() const;
// Total number of slots in use. >= NumFrameSlots()
int NumSlots() const;
// Returns nil if this instruction doesn't have an associated constant.
ValPtr ConstVal() const;
// Returns a string describing the constant.
std::string ConstDump() const;
ZOp op;
ZAMOpType op_type;
// Usually indices into frame, though sometimes hold integer constants.
// When an instruction has both frame slots and integer constants,
// the former always come first, even if conceptually in the operation
// the constant is an "earlier" operand.
int v1, v2, v3, v4;
ZVal c; // constant associated with instruction, if any
// Meta-data associated with the execution.
// Type, usually for interpreting the constant.
TypePtr t = nullptr;
TypePtr t2 = nullptr; // just a few ops need two types
const Expr* e = nullptr; // only needed for "when" expressions
Func* func = nullptr; // used for calls
EventHandler* event_handler = nullptr; // used for referring to events
AttributesPtr attrs = nullptr; // used for things like constructors
// Auxiliary information. We could in principle use this to
// consolidate a bunch of the above, though at the cost of
// slightly slower access. Most instructions don't need "aux",
// which is why we bundle these separately.
ZInstAux* aux = nullptr;
// Location associated with this instruction, for error reporting.
const Location* loc = nullptr;
// Whether v1 represents a frame slot type for which we
// explicitly manage the memory.
bool is_managed = false;
};
// An intermediate ZAM instruction, one that includes information/methods
// needed for compiling. Intermediate instructions use pointers to other
// such instructions for branches, rather than concrete instruction
// numbers. This allows the ZAM optimizer to easily prune instructions.
class ZInstI : public ZInst {
public:
// These constructors can be used directly, but often instead
// they'll be generated via the use of Inst-Gen methods.
ZInstI(ZOp _op) : ZInst(_op, OP_X)
{
op = _op;
op_type = OP_X;
}
ZInstI(ZOp _op, int _v1) : ZInst(_op, OP_V)
{
v1 = _v1;
}
ZInstI(ZOp _op, int _v1, int _v2) : ZInst(_op, OP_VV)
{
v1 = _v1;
v2 = _v2;
}
ZInstI(ZOp _op, int _v1, int _v2, int _v3) : ZInst(_op, OP_VVV)
{
v1 = _v1;
v2 = _v2;
v3 = _v3;
}
ZInstI(ZOp _op, int _v1, int _v2, int _v3, int _v4)
: ZInst(_op, OP_VVVV)
{
v1 = _v1;
v2 = _v2;
v3 = _v3;
v4 = _v4;
}
ZInstI(ZOp _op, const ConstExpr* ce) : ZInst(_op, OP_C)
{
InitConst(ce);
}
ZInstI(ZOp _op, int _v1, const ConstExpr* ce) : ZInst(_op, OP_VC)
{
v1 = _v1;
InitConst(ce);
}
ZInstI(ZOp _op, int _v1, int _v2, const ConstExpr* ce)
: ZInst(_op, OP_VVC)
{
v1 = _v1;
v2 = _v2;
InitConst(ce);
}
ZInstI(ZOp _op, int _v1, int _v2, int _v3, const ConstExpr* ce)
: ZInst(_op, OP_VVVC)
{
v1 = _v1;
v2 = _v2;
v3 = _v3;
InitConst(ce);
}
// Constructor used when we're going to just copy in another ZInstI.
ZInstI() { }
// If "remappings" is non-nil, then it is used instead of frame_ids.
void Dump(const FrameMap* frame_ids, const FrameReMap* remappings) const;
// Note that this is *not* an override of the base class's VName
// but instead a method with similar functionality but somewhat
// different behavior (namely, being cognizant of frame_ids).
std::string VName(int n, const FrameMap* frame_ids,
const FrameReMap* remappings) const;
// True if this instruction definitely won't proceed to the one
// after it.
bool DoesNotContinue() const;
// True if this instruction always branches elsewhere. Different
// from DoesNotContinue() in that returns & hook breaks do not
// continue, but they are not branches.
bool IsUnconditionalBranch() const { return op == OP_GOTO_V; }
// True if this instruction is of the form "v1 = v2".
bool IsDirectAssignment() const;
// True if this instruction has side effects when executed, so
// should not be pruned even if it has a dead assignment.
bool HasSideEffects() const;
// True if the given instruction assigns to the frame location
// given by slot 1 (v1).
bool AssignsToSlot1() const;
// True if the given instruction uses the value in the given frame
// slot. (Assigning to the slot does not constitute using the value.)
bool UsesSlot(int slot) const;
// Returns the slots used (not assigned to). Any slot not used
// is set to -1. Returns true if at least one slot was used.
bool UsesSlots(int& s1, int& s2, int& s3, int& s4) const;
// Updates used (not assigned) slots per the given mapping.
void UpdateSlots(std::vector<int>& slot_mapping);
// True if the instruction corresponds to loading a global into
// the ZAM frame.
bool IsGlobalLoad() const;
// True if the instruction corresponds to some sort of load,
// either from the interpreter frame or of a global.
bool IsLoad() const
{
return op_type == OP_VV_FRAME || IsGlobalLoad();
}
// True if the instruction corresponds to storing a global.
bool IsGlobalStore() const
{
return op == OP_STORE_GLOBAL_V;
}
void CheckIfManaged(const TypePtr& t)
{ if ( ZVal::IsManagedType(t) ) is_managed = true; }
void SetType(TypePtr _t)
{
t = std::move(_t);
if ( t )
CheckIfManaged(t);
}
// Whether the instruction should be included in final code
// generation.
bool live = true;
// Whether the instruction is the beginning of a loop, meaning
// it's the target of backward control flow.
bool loop_start = false;
// How deep the instruction is within loop bodies (for all
// instructions in a loop, not just their beginnings). For
// example, a value of 2 means the instruction is inside a
// loop that itself is inside one more loop.
int loop_depth = 0;
// Branch target, prior to concretizing into PC target.
ZInstI* target = nullptr;
int target_slot = 0; // which of v1/v2/v3 should hold the target
// The final PC location of the statement. -1 indicates not
// yet assigned.
int inst_num = -1;
// Number of associated label(s) (indicating the statement is
// a branch target).
int num_labels = 0;
// Used for debugging. Transformed into the ZInst "loc" field.
const Stmt* stmt = curr_stmt;
private:
// Initialize 'c' from the given ConstExpr.
void InitConst(const ConstExpr* ce);
};
// Auxiliary information, used when the fixed ZInst layout lacks
// sufficient expressiveness to represent all of the elements that
// an instruction needs.
class ZInstAux {
public:
// If n is positive then it gives the size of parallel arrays
// tracking slots, constants, and types.
ZInstAux(int _n)
{
n = _n;
if ( n > 0 )
{
slots = ints = new int[n];
constants = new ValPtr[n];
types = new TypePtr[n];
}
}
~ZInstAux()
{
delete [] ints;
delete [] constants;
delete [] types;
}
// Returns the i'th element of the parallel arrays as a ValPtr.
ValPtr ToVal(const ZVal* frame, int i) const
{
if ( constants[i] )
return constants[i];
else
return frame[slots[i]].ToVal(types[i]);
}
// Returns the parallel arrays as a ListValPtr.
ListValPtr ToListVal(const ZVal* frame) const
{
auto lv = make_intrusive<ListVal>(TYPE_ANY);
for ( auto i = 0; i < n; ++i )
lv->Append(ToVal(frame, i));
return lv;
}
// Converts the parallel arrays to a ListValPtr suitable for
// use as indices for indexing a table or set. "offset" specifies
// which index we're looking for (there can be a bunch for
// constructors), and "width" the number of elements in a single
// index.
ListValPtr ToIndices(const ZVal* frame, int offset, int width) const
{
auto lv = make_intrusive<ListVal>(TYPE_ANY);
for ( auto i = 0; i < width; ++i )
lv->Append(ToVal(frame, offset + i));
return lv;
}
// Returns the parallel arrays converted to a vector of ValPtr's.
const ValVec& ToValVec(const ZVal* frame)
{
vv.clear();
FillValVec(vv, frame);
return vv;
}
// Populates the given vector of ValPtr's with the conversion
// of the parallel arrays.
void FillValVec(ValVec& vec, const ZVal* frame) const
{
for ( auto i = 0; i < n; ++i )
vec.push_back(ToVal(frame, i));
}
// When building up a ZInstAux, sets one element of the parallel
// arrays to a given frame slot and type.
void Add(int i, int slot, TypePtr t)
{
ints[i] = slot;
constants[i] = nullptr;
types[i] = t;
}
// Same but for constants.
void Add(int i, ValPtr c)
{
ints[i] = -1;
constants[i] = c;
types[i] = nullptr;
}
// Member variables. We could add accessors for manipulating
// these (and make the variables private), but for convenience we
// make them directly available.
// These are parallel arrays, used to build up lists of values.
// Each element is either an integer or a constant. Usually the
// integer is a frame slot (in which case "slots" points to "ints";
// if not, it's nil).
//
// We track associated types, too, enabling us to use
// ZVal::ToVal to convert frame slots or constants to ValPtr's.
int n; // size of arrays
int* slots = nullptr; // either nil or points to ints
int* ints = nullptr;
ValPtr* constants = nullptr;
TypePtr* types = nullptr;
// Used for accessing function names.
ID* id_val = nullptr;
// Whether the instruction can lead to globals changing.
// Currently only needed by the optimizer, but convenient
// to store here.
bool can_change_globals = false;
// The following is only used for OP_CONSTRUCT_KNOWN_RECORD_V,
// to map elements in slots/constants/types to record field offsets.
std::vector<int> map;
///// The following three apply to looping over the elements of tables.
// Frame slots of iteration variables, such as "[v1, v2, v3] in aggr".
std::vector<int> loop_vars;
// Their types.
std::vector<TypePtr> loop_var_types;
// Type associated with the "value" entry, for "k, value in aggr"
// iteration.
TypePtr value_var_type;
// This is only used to return values stored elsewhere in this
// object - it's not set directly.
//
// If we cared about memory penny-pinching, we could make this
// a pointer and only instantiate as needed.
ValVec vv;
};
// Returns a human-readable version of the given ZAM op-code.
extern const char* ZOP_name(ZOp op);
// Maps a generic operation to a specific one associated with the given type.
// The third argument governs what to do if the given type has no assignment
// flavor. If true, this leads to an assertion failure. If false, and
// if there's no flavor for the type, then OP_NOP is returned.
extern ZOp AssignmentFlavor(ZOp orig, TypeTag tag, bool strict=true);
// The following all use initializations produced by Gen-ZAM.
// Maps first a generic opcode, and then a type tag, to the
// corresponding type-specific opcode.
extern std::unordered_map<ZOp, std::unordered_map<TypeTag, ZOp>> assignment_flavor;
// Maps flavorful assignments to their non-assignment counterpart.
// Used for optimization when we determine that the assigned-to
// value is superfluous.
extern std::unordered_map<ZOp, ZOp> assignmentless_op;
// Maps flavorful assignments to what op-type their non-assignment
// counterpart uses.
extern std::unordered_map<ZOp, ZAMOpType> assignmentless_op_type;
} // namespace zeek::detail
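
The parallel-array scheme used by ZInstAux above (each element is either a frame slot or a constant) can be sketched in isolation. This is a simplified illustration under stated assumptions, not the Zeek class: `AuxSketch` and `AddConst` are hypothetical names, values are plain ints rather than ZVal/ValPtr, types are not tracked, and std::vector with std::optional (C++17) replaces the raw new[]/delete[] arrays.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical simplification of ZInstAux: each of the n elements is
// either a frame slot or a constant.
class AuxSketch {
public:
	explicit AuxSketch(int n) : slots(n, -1), constants(n) { }

	// Element i is taken from the given frame slot.
	void Add(int i, int slot)
		{
		assert(slot >= 0);
		slots[i] = slot;
		constants[i].reset();
		}

	// Element i is the given constant. (Named separately here because
	// both flavors take ints in this sketch.)
	void AddConst(int i, int c)
		{
		slots[i] = -1;
		constants[i] = c;
		}

	// Returns element i: the constant if one is set, otherwise the
	// current value of the associated frame slot.
	int ToVal(const std::vector<int>& frame, int i) const
		{
		return constants[i] ? *constants[i] : frame[slots[i]];
		}

	// Returns "width" consecutive elements starting at "offset",
	// analogous to building a single table/set index.
	std::vector<int> ToIndices(const std::vector<int>& frame,
	                           int offset, int width) const
		{
		std::vector<int> out;
		for ( int i = 0; i < width; ++i )
			out.push_back(ToVal(frame, offset + i));
		return out;
		}

private:
	std::vector<int> slots;                    // -1 if element is a constant
	std::vector<std::optional<int>> constants; // empty if element is a slot
};
```

Using containers here also sidesteps copy-control questions that come with owning raw arrays; whether that trade-off suits the real class is a separate design decision.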

116
src/script_opt/ZAM/ZOp.cc Normal file

@ -0,0 +1,116 @@
// See the file "COPYING" in the main distribution directory for copyright.
#include "zeek/script_opt/ZAM/Support.h"
#include "zeek/script_opt/ZAM/ZOp.h"
namespace zeek::detail {
const char* ZOP_name(ZOp op)
{
switch ( op ) {
#include "zeek/ZAM-OpsNamesDefs.h"
case OP_NOP: return "nop";
}
}
static const char* op_type_name(ZAMOpType ot)
{
switch ( ot ) {
case OP_X: return "X";
case OP_C: return "C";
case OP_V: return "V";
case OP_V_I1: return "V_I1";
case OP_VC_I1: return "VC_I1";
case OP_VC: return "VC";
case OP_VV: return "VV";
case OP_VV_I2: return "VV_I2";
case OP_VV_I1_I2: return "VV_I1_I2";
case OP_VV_FRAME: return "VV_FRAME";
case OP_VVC: return "VVC";
case OP_VVC_I2: return "VVC_I2";
case OP_VVV: return "VVV";
case OP_VVV_I3: return "VVV_I3";
case OP_VVV_I2_I3: return "VVV_I2_I3";
case OP_VVVC: return "VVVC";
case OP_VVVC_I3: return "VVVC_I3";
case OP_VVVC_I2_I3: return "VVVC_I2_I3";
case OP_VVVC_I1_I2_I3: return "VVVC_I1_I2_I3";
case OP_VVVV: return "VVVV";
case OP_VVVV_I4: return "VVVV_I4";
case OP_VVVV_I3_I4: return "VVVV_I3_I4";
case OP_VVVV_I2_I3_I4: return "VVVV_I2_I3_I4";
}
}
ZAMOp1Flavor op1_flavor[] = {
#include "zeek/ZAM-Op1FlavorsDefs.h"
OP1_INTERNAL, // OP_NOP
};
bool op_side_effects[] = {
#include "zeek/ZAM-OpSideEffects.h"
false, // OP_NOP
};
std::unordered_map<ZOp, std::unordered_map<TypeTag, ZOp>> assignment_flavor;
std::unordered_map<ZOp, ZOp> assignmentless_op;
std::unordered_map<ZOp, ZAMOpType> assignmentless_op_type;
ZOp AssignmentFlavor(ZOp orig, TypeTag tag, bool strict)
{
static bool did_init = false;
if ( ! did_init )
{
std::unordered_map<TypeTag, ZOp> empty_map;
#include "zeek/ZAM-AssignFlavorsDefs.h"
did_init = true;
}
// Map type tag to equivalent, as needed.
switch ( tag ) {
case TYPE_BOOL:
case TYPE_ENUM:
tag = TYPE_INT;
break;
case TYPE_PORT:
tag = TYPE_COUNT;
break;
case TYPE_TIME:
case TYPE_INTERVAL:
tag = TYPE_DOUBLE;
break;
default:
break;
}
if ( assignment_flavor.count(orig) == 0 )
{
if ( strict )
ASSERT(false);
else
return OP_NOP;
}
auto orig_map = assignment_flavor[orig];
if ( orig_map.count(tag) == 0 )
{
if ( strict )
ASSERT(false);
else
return OP_NOP;
}
return orig_map[tag];
}
} // zeek::detail
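
The two-level lookup in AssignmentFlavor() above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the Zeek implementation: the enums `Tag` and `Op`, the table contents, and the names `canonical_tag`, `flavor_table` and `assignment_flavor_sketch` are hypothetical. It uses find() so that each map is searched once and the inner map is not copied.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for TypeTag and ZOp.
enum Tag { TAG_BOOL, TAG_ENUM, TAG_INT, TAG_PORT, TAG_COUNT,
           TAG_TIME, TAG_INTERVAL, TAG_DOUBLE, TAG_STRING };
enum Op { OP_LOAD_INT, OP_LOAD_COUNT, OP_LOAD_DOUBLE, OP_LOAD_GENERIC, OP_NONE };

// Collapse tags that share an underlying representation, mirroring
// the switch in AssignmentFlavor().
inline Tag canonical_tag(Tag tag)
	{
	switch ( tag ) {
	case TAG_BOOL:
	case TAG_ENUM:
		return TAG_INT;
	case TAG_PORT:
		return TAG_COUNT;
	case TAG_TIME:
	case TAG_INTERVAL:
		return TAG_DOUBLE;
	default:
		return tag;
	}
	}

using FlavorMap = std::unordered_map<Op, std::unordered_map<Tag, Op>>;

// Hypothetical table; the real one is filled in by generated code.
inline const FlavorMap& flavor_table()
	{
	static const FlavorMap table = {
		{OP_LOAD_GENERIC, {{TAG_INT, OP_LOAD_INT},
		                   {TAG_COUNT, OP_LOAD_COUNT},
		                   {TAG_DOUBLE, OP_LOAD_DOUBLE}}},
	};
	return table;
	}

// Returns the type-specific flavor of "orig" for "tag". If there is
// none, asserts when "strict" is true and otherwise returns OP_NONE.
inline Op assignment_flavor_sketch(Op orig, Tag tag, bool strict = true)
	{
	const auto& table = flavor_table();
	auto outer = table.find(orig);

	if ( outer != table.end() )
		{
		auto inner = outer->second.find(canonical_tag(tag));
		if ( inner != outer->second.end() )
			return inner->second;
		}

	// No flavor available. With assertions compiled out, this falls
	// through to returning OP_NONE even in strict mode.
	assert(! strict);
	return OP_NONE;
	}
```

With this table, a bool-typed request resolves to the int flavor, while a string-typed request in non-strict mode yields OP_NONE.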

65
src/script_opt/ZAM/ZOp.h Normal file

@ -0,0 +1,65 @@
// See the file "COPYING" in the main distribution directory for copyright.
// ZAM instruction opcodes and associated information.
#pragma once
namespace zeek::detail {
// Opcodes associated with ZAM instructions.
enum ZOp {
#include "zeek/ZAM-OpsDefs.h"
OP_NOP,
};
// Possible types of instruction operands in terms of which fields they use.
// Used for low-level optimization (so important that they're correct),
// and for dumping instructions.
// V: one of the instruction's integer values, treated as a frame slot
// C: the instruction's associated constant
// I1/I2/I3/I4: the instruction's integer value, used directly (not as a slot)
// FRAME: a slot in the (interpreter) Frame object
// X: no operands
enum ZAMOpType {
OP_X, OP_C, OP_V, OP_V_I1, OP_VC_I1,
OP_VC,
OP_VV,
OP_VV_I2,
OP_VV_I1_I2,
OP_VV_FRAME,
OP_VVC,
OP_VVC_I2,
OP_VVV,
OP_VVV_I3,
OP_VVV_I2_I3,
OP_VVVC,
OP_VVVC_I3,
OP_VVVC_I2_I3,
OP_VVVC_I1_I2_I3,
OP_VVVV,
OP_VVVV_I4,
OP_VVVV_I3_I4,
OP_VVVV_I2_I3_I4,
};
// Possible "flavors" for an operator's first slot.
enum ZAMOp1Flavor {
OP1_READ, // the slot is read, not modified
OP1_WRITE, // the slot is modified, not read - the most common
OP1_READ_WRITE, // the slot is both read and then modified, e.g. "++"
OP1_INTERNAL, // we're doing some internal manipulation of the slot
};
// Maps an opcode to the flavor of its first slot.
extern ZAMOp1Flavor op1_flavor[];
// Maps an opcode to whether it has side effects.
extern bool op_side_effects[];
} // namespace zeek::detail
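
One way a per-opcode flavor table like op1_flavor[] might be consulted is sketched below. This is an illustration under stated assumptions: the opcodes, the table contents, and the helpers `assigns_slot1()` and `reads_slot1()` are hypothetical, and the sketch does not claim to reproduce how ZInstI::AssignsToSlot1() or UsesSlot() are implemented.

```cpp
#include <cassert>

// Hypothetical opcodes and flavors, parallel to ZOp and ZAMOp1Flavor.
enum SketchOp { SK_ADD, SK_INCR, SK_PRINT, SK_NOP, SK_NUM_OPS };
enum SketchFlavor { SK_READ, SK_WRITE, SK_READ_WRITE, SK_INTERNAL };

// Indexed by opcode, in the same order as the SketchOp enum.
constexpr SketchFlavor sketch_flavor[SK_NUM_OPS] = {
	SK_WRITE,      // SK_ADD:   v1 = v2 + v3
	SK_READ_WRITE, // SK_INCR:  ++v1
	SK_READ,       // SK_PRINT: print v1
	SK_INTERNAL,   // SK_NOP
};

// True if executing "op" modifies the first slot.
inline bool assigns_slot1(SketchOp op)
	{
	assert(op < SK_NUM_OPS);
	auto f = sketch_flavor[op];
	return f == SK_WRITE || f == SK_READ_WRITE;
	}

// True if executing "op" reads the prior value of the first slot.
inline bool reads_slot1(SketchOp op)
	{
	assert(op < SK_NUM_OPS);
	auto f = sketch_flavor[op];
	return f == SK_READ || f == SK_READ_WRITE;
	}
```

Keeping the table in enum order is what makes direct indexing valid, which is presumably why the real tables append their OP_NOP entry last, matching the enum.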


@ -783,6 +783,14 @@ SetupResult setup(int argc, char** argv, Options* zopts)
 }
 }
+if ( options.parse_only )
+{
+if ( analysis_options.usage_issues > 0 )
+analyze_scripts();
+exit(reporter->Errors() != 0);
+}
 auto init_stmts = stmts ? analyze_global_stmts(stmts) : nullptr;
 analyze_scripts();
@ -791,9 +799,6 @@ SetupResult setup(int argc, char** argv, Options* zopts)
 // This option is report-and-exit.
 exit(0);
-if ( options.parse_only )
-exit(reporter->Errors() != 0);
 if ( dns_type != DNS_PRIME )
 run_state::detail::init_run(options.interface, options.pcap_file, options.pcap_output_file, options.use_watchdog);


@ -0,0 +1,8 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
--- Backtrace ---
--- Backtrace ---
--- Backtrace ---


@ -0,0 +1,4 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error in <...>/to_subnet.zeek, line 10: failed converting string to IP prefix (10.0.0.0)
error in <...>/to_subnet.zeek, line 12: failed converting string to IP prefix (10.0.0.0/222)
error in <...>/to_subnet.zeek, line 14: failed converting string to IP prefix (don't work)


@ -0,0 +1,6 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
10.0.0.0/8, T
2607:f8b0::/32, T
::/0, T
::/0, T
::/0, T


@ -0,0 +1,39 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
set1, {
1
}
set2, {
[2, two]
}
setvector, {
[one, two]
}
setrecord, {
[a=97, b=B]
}
setfunction, {
foo
ZAM-code foo
}
setpattern, {
/^?(foobar)$?/
}
table1, {
[1] = t1
}
table2, {
[2, two] = t2
}
tablevector, {
[[one, two]] = tvec
}
tablerecord, {
[[a=97, b=B]] = trec
}
tablefunction, {
[foo
ZAM-code foo ] = tfunc
}
tablepattern, {
[/^?(foobar)$?/] = tpat
}


@ -0,0 +1 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.


@ -0,0 +1,11 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path test
#open XXXX-XX-XX-XX-XX-XX
#fields b i e c p sn a d t iv s sc ss se vc ve f
#types bool int enum count port subnet addr double time interval string set[count] set[string] set[string] vector[count] vector[string] func
T -42 Test::LOG 21 123 10.0.0.0/24 1.2.3.4 3.14 XXXXXXXXXX.XXXXXX 100.000000 hurz 1 AA (empty) 10,20,30 (empty) foo\x0aZAM-code foo
#close XXXX-XX-XX-XX-XX-XX


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
Broker::peer_added, 127.0.0.1


@ -0,0 +1,11 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path test
#open XXXX-XX-XX-XX-XX-XX
#fields b i e c p sn a d t iv s sc ss se vc ve f
#types bool int enum count port subnet addr double time interval string set[count] set[string] set[string] vector[count] vector[string] func
T -42 Test::LOG 21 123 10.0.0.0/24 1.2.3.4 3.14 XXXXXXXXXX.XXXXXX 100.000000 hurz 1 AA (empty) 10,20,30 (empty) foo\x0aZAM-code foo
#close XXXX-XX-XX-XX-XX-XX


@ -0,0 +1,11 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Failed to attach master store backend_failure:
error: Could not create Broker master store '../fail'
error in <...>/create-failure.zeek, line 49: invalid Broker store handle (Broker::keys(s) and broker::store::{})
error in <...>/create-failure.zeek, line 56: invalid Broker store handle (check_terminate_conditions() and broker::store::{})
error in <...>/create-failure.zeek, line 56: invalid Broker store handle (check_terminate_conditions() and broker::store::{})
error in <...>/create-failure.zeek, line 49: invalid Broker store handle (Broker::keys(s) and broker::store::{})
error in <...>/create-failure.zeek, line 49: invalid Broker store handle (Broker::keys(s) and broker::store::{})
error in <...>/create-failure.zeek, line 49: invalid Broker store handle (Broker::keys(s) and broker::store::{})
error in <...>/create-failure.zeek, line 49: invalid Broker store handle (Broker::keys(s) and broker::store::{})
received termination signal


@ -0,0 +1,21 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
T
F
F
F
m1 keys result: [status=Broker::FAILURE, result=[data=<uninitialized>]]
m2 keys result: [status=Broker::SUCCESS, result=[data=broker::data{{}}]]
c2 keys result: [status=Broker::SUCCESS, result=[data=broker::data{{}}]]
T
F
F
F
T
T
T
T
m1 keys result: [status=Broker::FAILURE, result=[data=<uninitialized>]]
c1 keys result: [status=Broker::FAILURE, result=[data=<uninitialized>]]
m2 keys result: [status=Broker::FAILURE, result=[data=<uninitialized>]]
c2 keys result: [status=Broker::FAILURE, result=[data=<uninitialized>]]
c1 timeout


@ -0,0 +1,8 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
runtime error in <...>/div-by-zero.zeek, line 9: division by zero
runtime error in <...>/div-by-zero.zeek, line 14: division by zero
runtime error in <...>/div-by-zero.zeek, line 19: division by zero
runtime error in <...>/div-by-zero.zeek, line 29: modulo by zero
runtime error in <...>/div-by-zero.zeek, line 34: modulo by zero
runtime error in <...>/div-by-zero.zeek, line 24: division by zero
runtime error in <...>/div-by-zero.zeek, line 39: modulo by zero


@ -0,0 +1,19 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
ftp field missing
[orig_h=141.142.220.118, orig_p=48649/tcp, resp_h=208.80.152.118, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.118, orig_p=49997/tcp, resp_h=208.80.152.3, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.118, orig_p=49996/tcp, resp_h=208.80.152.3, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.118, orig_p=49998/tcp, resp_h=208.80.152.3, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.118, orig_p=50000/tcp, resp_h=208.80.152.3, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.118, orig_p=49999/tcp, resp_h=208.80.152.3, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.118, orig_p=50001/tcp, resp_h=208.80.152.3, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.118, orig_p=35642/tcp, resp_h=208.80.152.2, resp_p=80/tcp]
ftp field missing
[orig_h=141.142.220.235, orig_p=6705/tcp, resp_h=173.192.163.128, resp_p=80/tcp]


@ -0,0 +1,19 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path reporter
#open XXXX-XX-XX-XX-XX-XX
#fields ts level message location
#types time enum string string
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
XXXXXXXXXX.XXXXXX Reporter::ERROR field value missing: $ftp <...>/expr-exception.zeek, line 10
#close XXXX-XX-XX-XX-XX-XX


@ -0,0 +1,3 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
runtime error in <...>/init-error.zeek, line 16: no such index
fatal error: errors occurred while initializing


@ -0,0 +1,4 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
1st event
2nd event
3rd event


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: ID 'A' is not an option


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Option::on_change needs function argument; got 'count' for ID 'A'


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Third argument of passed function has to be string in Option::on_change for ID 'A'; got 'count'


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Wrong number of arguments for passed function in Option::on_change for ID 'A'; expected 2 or 3, got 4


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Incompatible type for set of ID 'A': got 'string', need 'count'


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: ID 'A' is not an option


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Second argument of passed function has to be count in Option::on_change for ID 'A'; got 'bool'


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Wrong number of arguments for passed function in Option::on_change for ID 'A'; expected 2 or 3, got 1


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Passed function needs to return type 'count' for ID 'A'; got 'bool'


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Option::on_change needs function argument; not hook or event


@ -0,0 +1,2 @@
### BTest baseline data generated by btest-diff. Do not edit. Use "btest -U/-u" to update. Requires BTest >= 0.63.
error: Option::on_change needs function argument; not hook or event

Some files were not shown because too many files have changed in this diff