mirror of
https://github.com/zeek/zeek.git
synced 2025-10-02 22:58:20 +00:00
extensive rewrite of generation & execution of run-time initialization
parent
bc3bf4ea6c
commit
e1a760e674
26 changed files with 3459 additions and 1580 deletions
|
@ -5,18 +5,20 @@
|
|||
#include "zeek/Desc.h"
|
||||
#include "zeek/script_opt/CPP/Func.h"
|
||||
#include "zeek/script_opt/CPP/HashMgr.h"
|
||||
#include "zeek/script_opt/CPP/InitsInfo.h"
|
||||
#include "zeek/script_opt/CPP/Tracker.h"
|
||||
#include "zeek/script_opt/CPP/Util.h"
|
||||
#include "zeek/script_opt/ScriptOpt.h"
|
||||
|
||||
// We structure the compiler for generating C++ versions of Zeek script
|
||||
// bodies as a single large class. While we divide the compiler's
|
||||
// bodies mainly as a single large class. While we divide the compiler's
|
||||
// functionality into a number of groups (see below), these interact with
|
||||
// one another, and in particular with various member variables, enough
|
||||
// so that it's not clear there's benefit to further splitting the
|
||||
// functionality into multiple classes. (Some splitting has already been
|
||||
// done for more self-contained functionality, resulting in the CPPTracker
|
||||
// and CPPHashManager classes.)
|
||||
// and CPPHashManager classes, and initialization information in
|
||||
// InitsInfo.{h,cc} and RuntimeInits.{h,cc}.)
|
||||
//
|
||||
// Most aspects of translating to C++ have a straightforward nature.
|
||||
// We can turn many Zeek script statements directly into the C++ that's
|
||||
|
@ -45,26 +47,6 @@
|
|||
// all of the scripts loaded in "bare" mode, plus those for foo.zeek; and
|
||||
// without the "-b" for all of the default scripts plus those in foo.zeek.
|
||||
//
|
||||
// One of the design goals employed is to support "incremental" compilation,
|
||||
// i.e., compiling *additional* Zeek scripts at a later point after an
|
||||
// initial compilation. This comes in two forms.
|
||||
//
|
||||
// "-O update-C++" produces C++ code that extends that already compiled,
|
||||
// in a manner where subsequent compilations can leverage both the original
|
||||
// and the newly added. Such compilations *must* be done in a consistent
|
||||
// context (for example, any types extended in the original are extended in
|
||||
// the same manner - plus then perhaps further extensions - in the updated
|
||||
// code).
|
||||
//
|
||||
// "-O add-C++" instead produces C++ code that (1) will not be leveraged in
|
||||
// any subsequent compilations, and (2) can be inconsistent with other
|
||||
// "-O add-C++" code added in the future. The main use of this feature is
|
||||
// to support compiling polyglot versions of Zeek scripts used to run
|
||||
// the test suite.
|
||||
//
|
||||
// Zeek invocations specifying "-O use-C++" will activate any code compiled
|
||||
// into the zeek binary; otherwise, the code lies dormant.
|
||||
//
|
||||
// "-O report-C++" reports on which compiled functions will/won't be used
|
||||
// (including ones that are available but not relevant to the scripts loaded
|
||||
// on the command line). This can be useful when debugging to make sure
|
||||
|
@ -104,29 +86,41 @@
|
|||
//
|
||||
// Emit Low-level code generation.
|
||||
//
|
||||
// Of these, Inits is probably the most subtle. It turns out to be
|
||||
// very tricky ensuring that we create run-time variables in the
|
||||
// proper order. For example, a global might need a record type to be
|
||||
// defined; one of the record's fields is a table; that table contains
|
||||
// another record; one of that other record's fields is the original
|
||||
// record (recursion); another field has an &default expression that
|
||||
// requires the compiler to generate a helper function to construct
|
||||
// the expression dynamically; and that helper function might in turn
|
||||
// refer to other types that require initialization.
|
||||
// Of these, Inits is the most subtle and complex. There are two major
|
||||
// challenges in creating run-time values (such as Zeek types and constants).
|
||||
//
|
||||
// To deal with these dependencies, for every run-time object the compiler
|
||||
// maintains (1) all of the other run-time objects on which its initialization
|
||||
// depends, and (2) the C++ statements needed to initialize it, once those
|
||||
// other objects have been initialized. It then begins initialization with
|
||||
// objects that have no dependencies, marks those as done (essentially), finds
|
||||
// objects that now can be initialized and emits their initializations,
|
||||
// marks those as done, etc.
|
||||
// First, generating individual code for creating each of these winds up
|
||||
// incurring unacceptable compile times (for example, clang compiling all
|
||||
// of the base scripts with optimization takes many hours on a high-end
|
||||
// laptop). As a result, we employ a table-driven approach that compiles
|
||||
// much faster (about 40x, though it still takes many minutes on the same
|
||||
// high-end laptop).
|
||||
//
|
||||
// Below in declaring the CPPCompiler class, we group methods in accordance
|
||||
// with those listed above. We also locate member variables with the group
|
||||
// most relevant for their usage. However, keep in mind that many member
|
||||
// variables are used by multiple groups, which is why we haven't created
|
||||
// distinct per-group classes.
|
||||
// Second, initializations frequently rely upon *other* initializations
|
||||
// having occurred first. For example, a global might need a record type
|
||||
// to be defined; one of the record's fields is a table; that table contains
|
||||
// another record; one of that other record's fields is the original record
|
||||
// (recursion); another field has an &default expression that requires the
|
||||
// compiler to generate a helper function to construct the expression
|
||||
// dynamically; and that helper function might in turn refer to other types
|
||||
// that require initialization. What's required is a framework for ensuring
|
||||
// that everything occurs in the proper order.
|
||||
//
|
||||
// The logic for dealing with these complexities is isolated into several
|
||||
// sets of classes. InitsInfo.{h,cc} provides the classes related to tracking
|
||||
// how to generate initializations in the proper order. RuntimeInits.{h,cc}
|
||||
// provides the classes used when initializing generated code in order
|
||||
// to instantiate all of the necessary values. See those files for discussions
|
||||
// on how they address the points framed above.
|
||||
//
|
||||
// In declaring the CPPCompiler class, we group methods in accordance with
|
||||
// those listed above, locating member variables with the group most relevant
|
||||
// for their usage. However, keep in mind that many member variables are
|
||||
// used by multiple groups, which is why we haven't created distinct
|
||||
// per-group classes. In addition, we make a number of methods public
|
||||
// in order to avoid the need for numerous "friend" declarations to allow
|
||||
// associated classes (like those for initialization) access to the
|
||||
// necessary compiler methods.
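The ordering problem described in the comment above can be illustrated with a minimal sketch. It assumes a simplified model in which each run-time object lists the indices of the objects it depends on; `InitNode` and `AssignCohorts` are hypothetical names for illustration, not part of the Zeek sources, and the real logic lives in InitsInfo.{h,cc}. Unlike the real compiler, which has to cope with recursive types, this sketch simply rejects cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a run-time object that needs initializing.
// "deps" holds the indices of objects that must be initialized first.
struct InitNode
    {
    std::vector<std::size_t> deps;
    };

// Assigns each node a "cohort": 0 for nodes with no dependencies, otherwise
// one more than the largest cohort among its dependencies. Initializing
// cohort 0 first, then cohort 1, and so on respects every dependency.
std::vector<int> AssignCohorts(const std::vector<InitNode>& nodes)
    {
    std::vector<int> cohort(nodes.size(), -1); // -1 means "not yet done"
    std::size_t num_done = 0;

    for ( int c = 0; num_done < nodes.size(); ++c )
        {
        // Find every pending node whose dependencies are all done.
        std::vector<std::size_t> ready;

        for ( std::size_t i = 0; i < nodes.size(); ++i )
            {
            if ( cohort[i] >= 0 )
                continue;

            bool deps_done = true;
            for ( auto d : nodes[i].deps )
                {
                assert(d < nodes.size());
                if ( cohort[d] < 0 )
                    {
                    deps_done = false;
                    break;
                    }
                }

            if ( deps_done )
                ready.push_back(i);
            }

        // This sketch does not handle recursive dependencies.
        if ( ready.empty() )
            throw std::runtime_error("cyclic initialization dependency");

        // Mark the whole round as done only after the scan, so nodes in
        // the same cohort never depend on one another.
        for ( auto i : ready )
            cohort[i] = c;

        num_done += ready.size();
        }

    return cohort;
    }
```

Emitting initializations cohort by cohort is one way to guarantee that every object is created after the objects it refers to.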
|
||||
|
||||
namespace zeek::detail
|
||||
{
|
||||
|
@ -135,10 +129,124 @@ class CPPCompile
|
|||
{
|
||||
public:
|
||||
CPPCompile(std::vector<FuncInfo>& _funcs, ProfileFuncs& pfs, const std::string& gen_name,
|
||||
const std::string& addl_name, CPPHashManager& _hm, bool _update, bool _standalone,
|
||||
const std::string& addl_name, CPPHashManager& _hm, bool _standalone,
|
||||
bool report_uncompilable);
|
||||
~CPPCompile();
|
||||
|
||||
// Constructing a CPPCompile object does all of the compilation.
|
||||
// The public methods here are for use by helper classes.
|
||||
|
||||
// Tracks the given type (with support methods for ones that
|
||||
// are complicated), recursively including its sub-types, and
|
||||
// creating initializations for constructing C++ variables
|
||||
// representing the types.
|
||||
//
|
||||
// Returns the initialization info associated with the type.
|
||||
std::shared_ptr<CPP_InitInfo> RegisterType(const TypePtr& t);
|
||||
|
||||
// Easy access to the global offset and the initialization
|
||||
// cohort associated with a given type.
|
||||
int TypeOffset(const TypePtr& t) { return GI_Offset(RegisterType(t)); }
|
||||
int TypeCohort(const TypePtr& t) { return GI_Cohort(RegisterType(t)); }
|
||||
|
||||
// Tracks a Zeek ValPtr used as a constant value. These occur
|
||||
// in two contexts: directly as constant expressions, and indirectly
|
||||
// as elements within aggregate constants (such as in vector
|
||||
// initializers).
|
||||
//
|
||||
// Returns the associated initialization info. In addition,
|
||||
// consts_offset returns an offset into an initialization-time
|
||||
// global that tracks all constructed globals, providing
|
||||
// general access to them for aggregate constants.
|
||||
std::shared_ptr<CPP_InitInfo> RegisterConstant(const ValPtr& vp, int& consts_offset);
|
||||
|
||||
// Tracks a global to generate the necessary initialization.
|
||||
// Returns the associated initialization info.
|
||||
std::shared_ptr<CPP_InitInfo> RegisterGlobal(const ID* g);
|
||||
|
||||
// Tracks a use of the given set of attributes, including
|
||||
// initialization dependencies and the generation of any
|
||||
// associated expressions.
|
||||
//
|
||||
// Returns the initialization info associated with the set of
|
||||
// attributes.
|
||||
std::shared_ptr<CPP_InitInfo> RegisterAttributes(const AttributesPtr& attrs);
|
||||
|
||||
// Convenient access to the global offset associated with
|
||||
// a set of Attributes.
|
||||
int AttributesOffset(const AttributesPtr& attrs)
|
||||
{
|
||||
return GI_Offset(RegisterAttributes(attrs));
|
||||
}
|
||||
|
||||
// The same, for a single attribute.
|
||||
std::shared_ptr<CPP_InitInfo> RegisterAttr(const AttrPtr& attr);
|
||||
int AttrOffset(const AttrPtr& attr) { return GI_Offset(RegisterAttr(attr)); }
|
||||
|
||||
// Returns a mapping from Attr objects to their associated
|
||||
// initialization information. The Attr must have previously
|
||||
// been registered.
|
||||
auto ProcessedAttr() { return processed_attr; }
|
||||
|
||||
// True if the given expression is simple enough that we can
|
||||
// generate code to evaluate it directly, and don't need to
|
||||
// create a separate function per RegisterInitExpr() to track it.
|
||||
static bool IsSimpleInitExpr(const ExprPtr& e);
|
||||
|
||||
// Tracks expressions used in attributes (such as &default=<expr>).
|
||||
//
|
||||
// We need to generate code to evaluate these, via CallExpr's
|
||||
// that invoke functions that return the value of the expression.
|
||||
// However, we can't generate that code when first encountering
|
||||
// the attribute, because doing so will need to refer to the names
|
||||
// of types, and initially those are unavailable (because the type's
|
||||
// representatives, per pfs.RepTypes(), might not have yet been
|
||||
// tracked). So instead we track the associated CallExprInitInfo
|
||||
// objects, and after all types have been tracked, then spin
|
||||
// through them to generate the code.
|
||||
//
|
||||
// Returns the associated initialization information.
|
||||
std::shared_ptr<CPP_InitInfo> RegisterInitExpr(const ExprPtr& e);
|
||||
|
||||
// Tracks a C++ string value needed for initialization. Returns
|
||||
// an offset into the global vector that will hold these.
|
||||
int TrackString(std::string s)
|
||||
{
|
||||
if ( tracked_strings.count(s) == 0 )
|
||||
{
|
||||
tracked_strings[s] = ordered_tracked_strings.size();
|
||||
ordered_tracked_strings.emplace_back(s);
|
||||
}
|
||||
|
||||
return tracked_strings[s];
|
||||
}
|
||||
|
||||
// Tracks a profile hash value needed for initialization. Returns
|
||||
// an offset into the global vector that will hold these.
|
||||
int TrackHash(p_hash_type h)
|
||||
{
|
||||
if ( tracked_hashes.count(h) == 0 )
|
||||
{
|
||||
tracked_hashes[h] = ordered_tracked_hashes.size();
|
||||
ordered_tracked_hashes.emplace_back(h);
|
||||
}
|
||||
|
||||
return tracked_hashes[h];
|
||||
}
|
||||
|
||||
// Returns the hash associated with a given function body.
|
||||
// It's a fatal error to call this for a body that hasn't
|
||||
// been compiled.
|
||||
p_hash_type BodyHash(const Stmt* body);
|
||||
|
||||
// Returns true if at least one of the function bodies associated
|
||||
// with the function/hook/event handler of the given fname is
|
||||
// not compilable.
|
||||
bool NotFullyCompilable(const std::string& fname) const
|
||||
{
|
||||
return not_fully_compilable.count(fname) > 0;
|
||||
}
|
||||
|
||||
private:
|
||||
// Start of methods related to driving the overall compilation
|
||||
// process.
|
||||
|
@ -148,6 +256,37 @@ private:
|
|||
// Main driver, invoked by constructor.
|
||||
void Compile(bool report_uncompilable);
|
||||
|
||||
// The following methods all create objects that track the
|
||||
// initializations of a given type of value. In each, "tag"
|
||||
// is the name used to identify the initializer global
|
||||
// associated with the given type of value, and "type" is
|
||||
// its C++ representation. Often "tag" is concatenated with
|
||||
// "type" to designate a specific C++ type. For example,
|
||||
// "tag" might be "Double" and "type" might be "ValPtr";
|
||||
// the resulting global's type is "DoubleValPtr".
|
||||
|
||||
// Creates an object for tracking values associated with Zeek
|
||||
// constants. "c_type" is the C++ type used in the initializer
|
||||
// for each object; or, if empty, it specifies that we represent
|
||||
// the value using an index into a separate vector that holds
|
||||
// the constant.
|
||||
std::shared_ptr<CPP_InitsInfo> CreateConstInitInfo(const char* tag, const char* type,
|
||||
const char* c_type);
|
||||
|
||||
// Creates an object for tracking compound initializers, which
|
||||
// are those whose initialization uses indexes into other vectors.
|
||||
std::shared_ptr<CPP_InitsInfo> CreateCompoundInitInfo(const char* tag, const char* type);
|
||||
|
||||
// Creates an object for tracking initializers that have custom
|
||||
// C++ objects to hold their initialization information.
|
||||
std::shared_ptr<CPP_InitsInfo> CreateCustomInitInfo(const char* tag, const char* type);
|
||||
|
||||
// Generates the declaration associated with a set of initializations
|
||||
// and tracks the object to facilitate looping over all such
|
||||
// initializations. As a convenience, returns the object.
|
||||
std::shared_ptr<CPP_InitsInfo> RegisterInitInfo(const char* tag, const char* type,
|
||||
std::shared_ptr<CPP_InitsInfo> gi);
|
||||
|
||||
// Generate the beginning of the compiled code: run-time functions,
|
||||
// namespace, auxiliary globals.
|
||||
void GenProlog();
|
||||
|
@ -158,7 +297,7 @@ private:
|
|||
void RegisterCompiledBody(const std::string& f);
|
||||
|
||||
// After compilation, generate the final code. Most of this is
|
||||
// run-time initialization of various dynamic values.
|
||||
// in support of run-time initialization of various dynamic values.
|
||||
void GenEpilog();
|
||||
|
||||
// True if the given function (plus body and profile) is one
|
||||
|
@ -185,9 +324,13 @@ private:
|
|||
// it including some functionality we don't currently support
|
||||
// for compilation.
|
||||
//
|
||||
// Indexed by the name of the function.
|
||||
// Indexed by the C++ name of the function.
|
||||
std::unordered_set<std::string> compilable_funcs;
|
||||
|
||||
// Tracks which functions/hooks/events have at least one non-compilable
|
||||
// body. Indexed by the Zeek name of the function.
|
||||
std::unordered_set<std::string> not_fully_compilable;
|
||||
|
||||
// Maps functions (not hooks or events) to upstream compiled names.
|
||||
std::unordered_map<std::string, std::string> hashed_funcs;
|
||||
|
||||
|
@ -200,10 +343,6 @@ private:
|
|||
// compilation units.
|
||||
int addl_tag = 0;
|
||||
|
||||
// If true, then we're updating the C++ base (i.e., generating
|
||||
// code meant for use by subsequently generated code).
|
||||
bool update = false;
|
||||
|
||||
// If true, the generated code should run "standalone".
|
||||
bool standalone = false;
|
||||
|
||||
|
@ -211,7 +350,7 @@ private:
|
|||
// needed for "seatbelts", to ensure that we can produce a
|
||||
// unique hash relating to this compilation (*and* its
|
||||
// compilation time, which is why these are "seatbelts" and
|
||||
// likely not important to make distinct.
|
||||
// likely not important to make distinct).
|
||||
p_hash_type total_hash = 0;
|
||||
|
||||
// Working directory in which we're compiling. Used to quasi-locate
|
||||
|
@ -236,11 +375,6 @@ private:
|
|||
// track it as such.
|
||||
void CreateGlobal(const ID* g);
|
||||
|
||||
// For the globals used in the compilation, if new then append
|
||||
// them to the hash file to make the information available
|
||||
// to subsequent compilation runs.
|
||||
void UpdateGlobalHashes();
|
||||
|
||||
// Register the given identifier as a BiF. If is_var is true
|
||||
// then the BiF is also used in a non-call context.
|
||||
void AddBiF(const ID* b, bool is_var);
|
||||
|
@ -258,10 +392,9 @@ private:
|
|||
|
||||
// The following match various forms of identifiers to the
|
||||
// name used for their C++ equivalent.
|
||||
const char* IDName(const ID& id) { return IDName(&id); }
|
||||
const char* IDName(const IDPtr& id) { return IDName(id.get()); }
|
||||
const char* IDName(const ID* id) { return IDNameStr(id).c_str(); }
|
||||
const std::string& IDNameStr(const ID* id) const;
|
||||
const std::string& IDNameStr(const ID* id);
|
||||
|
||||
// Returns a canonicalized version of a variant of a global made
|
||||
// distinct by the given suffix.
|
||||
|
@ -280,12 +413,20 @@ private:
|
|||
// conflict with C++ keywords.
|
||||
std::string Canonicalize(const char* name) const;
|
||||
|
||||
// Returns the name of the global corresponding to an expression
|
||||
// (which must be an EXPR_NAME).
|
||||
std::string GlobalName(const ExprPtr& e) { return globals[e->AsNameExpr()->Id()->Name()]; }
|
||||
|
||||
// Maps global names (not identifiers) to the names we use for them.
|
||||
std::unordered_map<std::string, std::string> globals;
|
||||
|
||||
// Similar for locals, for the function currently being compiled.
|
||||
std::unordered_map<const ID*, std::string> locals;
|
||||
|
||||
// Retrieves the initialization information associated with the
|
||||
// given global.
|
||||
std::unordered_map<const ID*, std::shared_ptr<CPP_InitInfo>> global_gis;
|
||||
|
||||
// Maps event names to the names we use for them.
|
||||
std::unordered_map<std::string, std::string> events;
|
||||
|
||||
|
@ -307,14 +448,37 @@ private:
|
|||
// Similar, but for lambdas.
|
||||
void DeclareLambda(const LambdaExpr* l, const ProfileFunc* pf);
|
||||
|
||||
// Declares the CPPStmt subclass used for compiling the given
|
||||
// Generates code to declare the compiled version of a script
|
||||
// function. "ft" gives the function's type, "pf" its profile,
|
||||
// "fname" its C++ name, "body" its AST, "l" if non-nil its
|
||||
// corresponding lambda expression, and "flavor" whether it's
|
||||
// a hook/event/function.
|
||||
//
|
||||
// We use two basic approaches. Most functions are represented
|
||||
// by a "CPPDynStmt" object that's parameterized by a void* pointer
|
||||
// to the underlying C++ function and an index used to dynamically
|
||||
// cast the pointer to the correct type before calling it.
|
||||
// Lambdas, however (including "implicit" lambdas used to associate
|
||||
// complex expressions with &attributes), each have a unique
|
||||
// subclass derived from CPPStmt that calls the underlying C++
|
||||
// function without requiring a cast, and that holds the values
|
||||
// of the lambda's captures.
|
||||
//
|
||||
// It would be cleanest to use the latter approach for all functions,
|
||||
// but the hundreds/thousands of additional classes required for
|
||||
// doing so significantly slows down C++ compilation, so we instead
|
||||
// opt for the uglier dynamic casting approach, which only requires
|
||||
// one additional class.
|
||||
void CreateFunction(const FuncTypePtr& ft, const ProfileFunc* pf, const std::string& fname,
|
||||
const StmtPtr& body, int priority, const LambdaExpr* l,
|
||||
FunctionFlavor flavor);
|
||||
|
||||
// Used for the case of creating a custom subclass of CPPStmt.
|
||||
void DeclareSubclass(const FuncTypePtr& ft, const ProfileFunc* pf, const std::string& fname,
|
||||
const StmtPtr& body, int priority, const LambdaExpr* l,
|
||||
FunctionFlavor flavor);
|
||||
const std::string& args, const IDPList* lambda_ids);
|
||||
|
||||
// Used for the case of employing an instance of a CPPDynStmt object.
|
||||
void DeclareDynCPPStmt();
|
||||
|
||||
// Generates the declarations (and in-line definitions) associated
|
||||
// with compiling a lambda.
|
||||
|
@ -331,11 +495,40 @@ private:
|
|||
// the given type, lambda captures (if non-nil), and profile.
|
||||
std::string ParamDecl(const FuncTypePtr& ft, const IDPList* lambda_ids, const ProfileFunc* pf);
|
||||
|
||||
// Returns in p_types the types associated with the parameters for a function
|
||||
// of the given type, set of lambda captures (if any), and profile.
|
||||
void GatherParamTypes(std::vector<std::string>& p_types, const FuncTypePtr& ft,
|
||||
const IDPList* lambda_ids, const ProfileFunc* pf);
|
||||
|
||||
// Same, but instead returns the parameters' names.
|
||||
void GatherParamNames(std::vector<std::string>& p_names, const FuncTypePtr& ft,
|
||||
const IDPList* lambda_ids, const ProfileFunc* pf);
|
||||
|
||||
// Inspects the given profile to find the i'th parameter (starting
|
||||
// at 0). Returns nil if the profile indicates that that parameter
|
||||
// is not used by the function.
|
||||
const ID* FindParam(int i, const ProfileFunc* pf);
|
||||
|
||||
// Information associated with a CPPDynStmt dynamic dispatch.
|
||||
struct DispatchInfo
|
||||
{
|
||||
std::string cast; // C++ cast to use for function pointer
|
||||
std::string args; // arguments to pass to the function
|
||||
bool is_hook; // whether the function is a hook
|
||||
TypePtr yield; // what type the function returns, if any
|
||||
};
|
||||
|
||||
// An array of cast/invocation pairs used to generate the CPPDynStmt
|
||||
// Exec method.
|
||||
std::vector<DispatchInfo> func_casting_glue;
|
||||
|
||||
// Maps casting strings to indices into func_casting_glue. The index
|
||||
// is what's used to dynamically switch to the right dispatch.
|
||||
std::unordered_map<std::string, int> casting_index;
|
||||
|
||||
// Maps functions (using their C++ name) to their casting strings.
|
||||
std::unordered_map<std::string, std::string> func_index;
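The dispatch scheme described for CPPDynStmt (a void* to the underlying function plus an index that selects the right cast) can be sketched as below. This is an illustration under simplifying assumptions, not the generated code: `DynStmtSketch`, `AddOne` and `Sum` are hypothetical, the functions take and return plain ints instead of Zeek values, and converting between function pointers and void* is only conditionally supported by the C++ standard (it works on common POSIX and Windows toolchains).

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical compiled functions with different signatures.
inline int AddOne(int x) { return x + 1; }
inline int Sum(int x, int y) { return x + y; }

// A single class serves every function: it stores an untyped pointer and
// an index saying which signature to cast the pointer back to.
class DynStmtSketch
    {
public:
    DynStmtSketch(void* _func, int _type_signature)
        : func(_func), type_signature(_type_signature)
        {
        assert(func != nullptr);
        }

    // Switches on the index to recover the correct function type, then
    // calls through it. Each case corresponds to one cast/arguments pair.
    int Exec(int a, int b) const
        {
        switch ( type_signature )
            {
            case 0:
                return (*reinterpret_cast<int (*)(int)>(func))(a);
            case 1:
                return (*reinterpret_cast<int (*)(int, int)>(func))(a, b);
            default:
                throw std::logic_error("unknown dispatch index");
            }
        }

private:
    void* func;
    int type_signature;
    };
```

The trade-off matches the one stated in the comments: one switch with a case per distinct signature replaces a separate subclass per function, at the cost of an unchecked cast.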
|
||||
|
||||
// Names for lambda capture ID's. These require a separate space
|
||||
// that incorporates the lambda's name, to deal with nested lambdas
|
||||
// that refer to identifiers with the same name.
|
||||
|
@ -344,7 +537,7 @@ private:
|
|||
// The function's parameters. Tracked so we don't re-declare them.
|
||||
std::unordered_set<const ID*> params;
|
||||
|
||||
// Whether we're parsing a hook.
|
||||
// Whether we're compiling a hook.
|
||||
bool in_hook = false;
|
||||
|
||||
//
|
||||
|
@ -362,8 +555,12 @@ private:
|
|||
void CompileLambda(const LambdaExpr* l, const ProfileFunc* pf);
|
||||
|
||||
// Generates the body of the Invoke() method (which supplies the
|
||||
// "glue" between for calling the C++-generated code).
|
||||
void GenInvokeBody(const std::string& fname, const TypePtr& t, const std::string& args);
|
||||
// "glue" for calling the C++-generated code, for CPPStmt subclasses).
|
||||
void GenInvokeBody(const std::string& fname, const TypePtr& t, const std::string& args)
|
||||
{
|
||||
GenInvokeBody(fname + "(" + args + ")", t);
|
||||
}
|
||||
void GenInvokeBody(const std::string& call, const TypePtr& t);
|
||||
|
||||
// Generates the code for the body of a script function with
|
||||
// the given type, profile, C++ name, AST, lambda captures
|
||||
|
@ -405,9 +602,6 @@ private:
|
|||
// Maps function bodies to the names we use for them.
|
||||
std::unordered_map<const Stmt*, std::string> body_names;
|
||||
|
||||
// Reverse mapping.
|
||||
std::unordered_map<std::string, const Stmt*> names_to_bodies;
|
||||
|
||||
// Maps function names to hashes of bodies.
|
||||
std::unordered_map<std::string, p_hash_type> body_hashes;
|
||||
|
||||
|
@ -426,62 +620,84 @@ private:
|
|||
//
|
||||
// End of methods related to generating compiled script bodies.
|
||||
|
||||
// Start of methods related to generating code for representing
|
||||
// script constants as run-time values.
|
||||
// See Consts.cc for definitions.
|
||||
//
|
||||
// Methods related to generating code for representing script constants
|
||||
// as run-time values. There's only one nontrivial one of these,
|
||||
// RegisterConstant() (declared above, as it's public). All the other
|
||||
// work is done by secondary objects - see InitsInfo.{h,cc} for those.
|
||||
|
||||
// Returns an instantiation of a constant - either as a native
|
||||
// C++ constant, or as a C++ variable that will be bound to
|
||||
// a Zeek value at run-time initialization - that is needed
|
||||
// by the given "parent" object (which acquires an initialization
|
||||
// dependency, if a C++ variable is needed).
|
||||
std::string BuildConstant(IntrusivePtr<Obj> parent, const ValPtr& vp)
|
||||
{
|
||||
return BuildConstant(parent.get(), vp);
|
||||
}
|
||||
std::string BuildConstant(const Obj* parent, const ValPtr& vp);
|
||||
// Returns the object used to track indices (vectors of integers
|
||||
// that are used to index various other vectors, including other
|
||||
// indices). Only used by CPP_InitsInfo objects, but stored
|
||||
// in the CPPCompile object to make it available across different
|
||||
// CPP_InitsInfo objects.
|
||||
|
||||
// Called to create a constant appropriate for the given expression
|
||||
// or, more directly, the given value. The second method returns
|
||||
// "true" if a C++ variable needed to be created to construct the
|
||||
// constant at run-time initialization, false if it can be instantiated
|
||||
// directly as a C++ constant.
|
||||
void AddConstant(const ConstExpr* c);
|
||||
bool AddConstant(const ValPtr& v);
|
||||
|
||||
// Build particular types of C++ variables (with the given name)
|
||||
// to hold constants initialized at run-time.
|
||||
void AddStringConstant(const ValPtr& v, std::string& const_name);
|
||||
void AddPatternConstant(const ValPtr& v, std::string& const_name);
|
||||
void AddListConstant(const ValPtr& v, std::string& const_name);
|
||||
void AddRecordConstant(const ValPtr& v, std::string& const_name);
|
||||
void AddTableConstant(const ValPtr& v, std::string& const_name);
|
||||
void AddVectorConstant(const ValPtr& v, std::string& const_name);
|
||||
friend class CPP_InitsInfo;
|
||||
IndicesManager& IndMgr() { return indices_mgr; }
|
||||
|
||||
// Maps (non-native) constants to associated C++ globals.
|
||||
std::unordered_map<const ConstExpr*, std::string> const_exprs;
|
||||
|
||||
// Maps the values of (non-native) constants to associated C++ globals.
|
||||
std::unordered_map<const Val*, std::string> const_vals;
|
||||
// Maps the values of (non-native) constants to associated initializer
|
||||
// information.
|
||||
std::unordered_map<const Val*, std::shared_ptr<CPP_InitInfo>> const_vals;
|
||||
|
||||
// Same, but for the offset into the vector that tracks all constants
|
||||
// collectively (to support initialization of compound constants).
|
||||
std::unordered_map<const Val*, int> const_offsets;
|
||||
|
||||
// The same as the above pair, but indexed by the string representation
|
||||
// rather than the Val*. The reason for having both is to enable
|
||||
// reusing common constants even though their Val*'s differ.
|
||||
std::unordered_map<std::string, std::shared_ptr<CPP_InitInfo>> constants;
|
||||
std::unordered_map<std::string, int> constants_offsets;
|
||||
|
||||
// Used for memory management associated with const_vals's index.
|
||||
std::vector<ValPtr> cv_indices;
|
||||
|
||||
// Maps string representations of (non-native) constants to
|
||||
// associated C++ globals.
|
||||
std::unordered_map<std::string, std::string> constants;
|
||||
// For different types of constants (as indicated by TypeTag),
|
||||
// provides the associated object that manages the initializers
|
||||
// for those constants.
|
||||
std::unordered_map<TypeTag, std::shared_ptr<CPP_InitsInfo>> const_info;
|
||||
|
||||
// Maps the same representations to the Val* associated with their
|
||||
// original creation. This enables us to construct initialization
|
||||
// dependencies for later Val*'s that are able to reuse the same
|
||||
// constant.
|
||||
std::unordered_map<std::string, const Val*> constants_to_vals;
|
||||
// Tracks entries for constructing the vector of all constants
|
||||
// (regardless of type). Each entry provides a TypeTag, used
|
||||
// to identify the type-specific vector for a given constant,
|
||||
// and the offset into that vector.
|
||||
std::vector<std::pair<TypeTag, int>> consts;
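The two-level layout described for "consts" (a global vector whose entries name a type-specific vector and an offset into it) can be sketched as follows. The names `ConstTag` and `ConstPool` are hypothetical, and only two constant kinds are modeled; the actual compiler keys on Zeek's TypeTag and keeps the per-type data in CPP_InitsInfo objects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for TypeTag, limited to two kinds of constants.
enum class ConstTag { Count, String };

struct ConstPool
    {
    // Type-specific vectors holding the actual values.
    std::vector<unsigned long> counts;
    std::vector<std::string> strings;

    // One entry per constant regardless of type: which type-specific
    // vector it lives in, and its offset within that vector.
    std::vector<std::pair<ConstTag, int>> all;

    // Each Add method returns the constant's offset into "all".
    int AddCount(unsigned long v)
        {
        counts.push_back(v);
        all.emplace_back(ConstTag::Count, static_cast<int>(counts.size()) - 1);
        return static_cast<int>(all.size()) - 1;
        }

    int AddString(const std::string& s)
        {
        strings.push_back(s);
        all.emplace_back(ConstTag::String, static_cast<int>(strings.size()) - 1);
        return static_cast<int>(all.size()) - 1;
        }

    // Resolves a global offset by following the (tag, offset) pair.
    std::string Describe(int global_offset) const
        {
        assert(global_offset >= 0 &&
               static_cast<std::size_t>(global_offset) < all.size());

        const auto& entry = all[global_offset];
        if ( entry.first == ConstTag::Count )
            return std::to_string(counts[entry.second]);

        return strings[entry.second];
        }
    };
```

With this layout an aggregate constant can refer to its elements by a single global offset, whatever their types.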
|
||||
|
||||
// Function variables that we need to create dynamically for
|
||||
// initializing globals, coupled with the name of their associated
|
||||
// constant.
|
||||
std::unordered_map<FuncVal*, std::string> func_vars;
|
||||
// The following objects track initialization information for
|
||||
// different types of initializers: Zeek types, individual
|
||||
// attributes, sets of attributes, expressions that call script
|
||||
// functions (for attribute expressions), registering lambda
|
||||
// bodies, and registering Zeek globals.
|
||||
std::shared_ptr<CPP_InitsInfo> type_info;
|
||||
std::shared_ptr<CPP_InitsInfo> attr_info;
|
||||
std::shared_ptr<CPP_InitsInfo> attrs_info;
|
||||
std::shared_ptr<CPP_InitsInfo> call_exprs_info;
|
||||
std::shared_ptr<CPP_InitsInfo> lambda_reg_info;
|
||||
std::shared_ptr<CPP_InitsInfo> global_id_info;
|
||||
|
||||
// Tracks all of the above objects (as well as each entry in
|
||||
// const_info), to facilitate easy iterating over them.
|
||||
std::set<std::shared_ptr<CPP_InitsInfo>> all_global_info;
|
||||
|
||||
// Tracks the attribute expressions for which we need to generate
|
||||
// function calls to evaluate them.
|
||||
std::unordered_map<std::string, std::shared_ptr<CallExprInitInfo>> init_infos;
|
||||
|
||||
// See IndMgr() above for the role of this variable.
|
||||
IndicesManager indices_mgr;
|
||||
|
||||
// Maps strings to associated offsets.
|
||||
std::unordered_map<std::string, int> tracked_strings;
|
||||
|
||||
// Tracks strings we've registered in order (corresponding to
|
||||
// their offsets).
|
||||
std::vector<std::string> ordered_tracked_strings;
|
||||
|
||||
// The same as the previous two, but for profile hashes.
|
||||
std::vector<p_hash_type> ordered_tracked_hashes;
|
||||
std::unordered_map<p_hash_type, int> tracked_hashes;
|
||||
|
||||
//
|
||||
// End of methods related to generating code for script constants.
|
||||
|
@ -649,9 +865,9 @@ private:
|
|||
// not the outer map).
|
||||
int num_rf_mappings = 0;
|
||||
|
||||
// For each entry in "field_mapping", the record and TypeDecl
|
||||
// associated with the mapping.
|
||||
std::vector<std::pair<const RecordType*, const TypeDecl*>> field_decls;
|
||||
// For each entry in "field_mapping", the record (as a global
|
||||
// offset) and TypeDecl associated with the mapping.
|
||||
std::vector<std::pair<int, const TypeDecl*>> field_decls;
|
||||
|
||||
// For enums that are extended via redef's, maps each distinct
|
||||
// value (that the compiled scripts refer to) to locations in the
|
||||
|
@ -665,9 +881,9 @@ private:
|
|||
// not the outer map).
|
||||
int num_ev_mappings = 0;
|
||||
|
||||
// For each entry in "enum_mapping", the record and name
|
||||
// associated with the mapping.
|
||||
std::vector<std::pair<const EnumType*, std::string>> enum_names;
|
||||
// For each entry in "enum_mapping", the EnumType (as a global
|
||||
// offset) and name associated with the mapping.
|
||||
std::vector<std::pair<int, std::string>> enum_names;
|
||||
|
||||
//
|
||||
// End of methods related to generating code for AST Expr's.
|
||||
|
@ -690,24 +906,6 @@ private:
|
|||
// given script type 't', converts it as needed to the given GenType.
|
||||
std::string GenericValPtrToGT(const std::string& expr, const TypePtr& t, GenType gt);
|
||||
|
||||
// For a given type, generates the code necessary to initialize
|
||||
// it at run time. The term "expand" in the method's name refers
|
||||
// to the fact that the type has already been previously declared
|
||||
// (necessary to facilitate defining recursive types), so this method
|
||||
// generates the "meat" of the type but not its original declaration.
|
||||
void ExpandTypeVar(const TypePtr& t);
|
||||
|
||||
// Methods for expanding specific such types. "tn" is the name
|
||||
// of the C++ variable used for the particular type.
|
||||
void ExpandListTypeVar(const TypePtr& t, std::string& tn);
|
||||
void ExpandRecordTypeVar(const TypePtr& t, std::string& tn);
|
||||
void ExpandEnumTypeVar(const TypePtr& t, std::string& tn);
|
||||
void ExpandTableTypeVar(const TypePtr& t, std::string& tn);
|
||||
void ExpandFuncTypeVar(const TypePtr& t, std::string& tn);
|
||||
|
||||
// The following assumes we're populating a type_decl_list called "tl".
|
||||
std::string GenTypeDecl(const TypeDecl* td);
|
||||
|
||||
// Returns the name of a C++ variable that will hold a TypePtr
|
||||
// of the appropriate flavor. 't' does not need to be a type
|
||||
// representative.
|
||||
|
@ -721,21 +919,11 @@ private:
|
|||
const Type* TypeRep(const TypePtr& t) { return TypeRep(t.get()); }
|
||||
|
||||
// Low-level C++ representations for types, of various flavors.
|
||||
const char* TypeTagName(TypeTag tag) const;
|
||||
static const char* TypeTagName(TypeTag tag);
|
||||
const char* TypeName(const TypePtr& t);
|
||||
const char* FullTypeName(const TypePtr& t);
|
||||
const char* TypeType(const TypePtr& t);
|
||||
|
||||
// Track the given type (with support methods for ones that
|
||||
// are complicated), recursively including its sub-types, and
|
||||
// creating initializations (and dependencies) for constructing
|
||||
// C++ variables representing the types.
|
||||
void RegisterType(const TypePtr& t);
|
||||
void RegisterListType(const TypePtr& t);
|
||||
void RegisterTableType(const TypePtr& t);
|
||||
void RegisterRecordType(const TypePtr& t);
|
||||
void RegisterFuncType(const TypePtr& t);
|
||||
|
||||
// Access to a type's underlying values.
|
||||
const char* NativeAccessor(const TypePtr& t);
|
||||
|
||||
|
@ -744,11 +932,13 @@ private:
|
|||
const char* IntrusiveVal(const TypePtr& t);
|
||||
|
||||
// Maps types to indices in the global "types__CPP" array.
|
||||
CPPTracker<Type> types = {"types", &compiled_items};
|
||||
CPPTracker<Type> types = {"types", true, &compiled_items};
|
||||
|
||||
// Used to prevent analysis of mutually-referring types from
|
||||
// leading to infinite recursion.
|
||||
std::unordered_set<const Type*> processed_types;
|
||||
// leading to infinite recursion. Maps types to their global
|
||||
// initialization information (or, initially, to nullptr, if
|
||||
// they're in the process of being registered).
|
||||
std::unordered_map<const Type*, std::shared_ptr<CPP_InitInfo>> processed_types;
|
||||
|
||||
//
|
||||
// End of methods related to managing script types.
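The comment on processed_types describes a common guard against infinite recursion: an entry is first mapped to nullptr while registration is in progress, then replaced by its real initialization information. The following is a minimal sketch of that idea only. Node, Info, and Registrar are hypothetical stand-ins for Type, CPP_InitInfo, and the compiler class, and the way offsets are assigned here is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a script type that refers to sub-types.
struct Node
{
    std::vector<const Node*> subs; // may form cycles
};

// Hypothetical stand-in for per-type initialization information.
struct Info
{
    int offset;
};

class Registrar
{
public:
    // Registers "n" and, recursively, everything it refers to.
    void Register(const Node* n)
    {
        if ( processed.count(n) > 0 )
            // Either finished, or in progress further up the call stack.
            return;

        processed[n] = nullptr; // mark as "being registered"

        for ( const Node* s : n->subs )
            Register(s);

        processed[n] = std::make_shared<Info>(Info{next_offset++});
    }

    // True once "n" has its final (non-null) information.
    bool IsDone(const Node* n) const
    {
        auto it = processed.find(n);
        return it != processed.end() && it->second != nullptr;
    }

    std::size_t NumProcessed() const { return processed.size(); }

private:
    std::unordered_map<const Node*, std::shared_ptr<Info>> processed;
    int next_offset = 0;
};
```

With two nodes that refer to each other, Register() terminates because the second visit to the first node finds the nullptr placeholder and returns immediately.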
|
||||
|
@ -758,11 +948,6 @@ private:
|
|||
// See Attrs.cc for definitions.
|
||||
//
|
||||
|
||||
// Tracks a use of the given set of attributes, including
|
||||
// initialization dependencies and the generation of any
|
||||
// associated expressions.
|
||||
void RegisterAttributes(const AttributesPtr& attrs);
|
||||
|
||||
// Populates the 2nd and 3rd arguments with C++ representations
|
||||
// of the tags and (optional) values/expressions associated with
|
||||
// the set of attributes.
|
||||
|
@ -772,16 +957,17 @@ private:
|
|||
void GenAttrs(const AttributesPtr& attrs);
|
||||
std::string GenAttrExpr(const ExprPtr& e);
|
||||
|
||||
// Returns the name of the C++ variable that will hold the given
|
||||
// attributes at run-time.
|
||||
std::string AttrsName(const AttributesPtr& attrs);
|
||||
|
||||
// Returns a string representation of the name associated with
|
||||
// different attributes (e.g., "ATTR_DEFAULT").
|
||||
const char* AttrName(const AttrPtr& attr);
|
||||
// different attribute tags (e.g., "ATTR_DEFAULT").
|
||||
static const char* AttrName(AttrTag t);
|
||||
|
||||
// Similar for attributes, so we can reconstruct record types.
|
||||
CPPTracker<Attributes> attributes = {"attrs", &compiled_items};
|
||||
CPPTracker<Attributes> attributes = {"attrs", false, &compiled_items};
|
||||
|
||||
// Maps Attributes and Attr's to their global initialization
|
||||
// information.
|
||||
std::unordered_map<const Attributes*, std::shared_ptr<CPP_InitInfo>> processed_attrs;
|
||||
std::unordered_map<const Attr*, std::shared_ptr<CPP_InitInfo>> processed_attr;
|
||||
|
||||
//
|
||||
// End of methods related to managing script type attributes.
|
||||
|
@ -790,121 +976,42 @@ private:
|
|||
// See Inits.cc for definitions.
|
||||
//
|
||||
|
||||
// Generates code to construct a CallExpr that can be used to
|
||||
// evaluate the expression 'e' as an initializer (typically
|
||||
// for a record &default attribute).
|
||||
void GenInitExpr(const ExprPtr& e);
|
||||
|
||||
// True if the given expression is simple enough that we can
|
||||
// generate code to evaluate it directly, and don't need to
|
||||
// create a separate function per GenInitExpr().
|
||||
bool IsSimpleInitExpr(const ExprPtr& e) const;
|
||||
// Generates code for dynamically generating an expression
|
||||
// associated with an attribute, via a function call.
|
||||
void GenInitExpr(std::shared_ptr<CallExprInitInfo> ce_init);
|
||||
|
||||
// Returns the name of a function used to evaluate an
|
||||
// initialization expression.
|
||||
std::string InitExprName(const ExprPtr& e);
|
||||
|
||||
// Generates code to initialize the global 'g' (with C++ name "gl")
|
||||
// to the given value *if* on start-up it doesn't already have a value.
|
||||
void GenGlobalInit(const ID* g, std::string& gl, const ValPtr& v);
|
||||
|
||||
// Generates code to initialize all of the function-valued globals
|
||||
// (i.e., those pointing to lambdas).
|
||||
void GenFuncVarInits();
|
||||
|
||||
// Generates the "pre-initialization" for a given type. For
|
||||
// extensible types (records, enums, lists), these are empty
|
||||
// versions that we'll later populate.
|
||||
void GenPreInit(const Type* t);
|
||||
|
||||
// Generates a function that executes the pre-initializations.
|
||||
void GenPreInits();
|
||||
|
||||
// The following all track that for a given object, code associated
|
||||
// with initializing it. Multiple calls for the same object append
|
||||
// additional lines of code (the order of the calls is preserved).
|
||||
//
|
||||
// Versions with "lhs" and "rhs" arguments provide an initialization
|
||||
// of the form "lhs = rhs;", as a convenience.
|
||||
void AddInit(const IntrusivePtr<Obj>& o, const std::string& lhs, const std::string& rhs)
|
||||
// Convenience functions for returning the offset or initialization cohort
|
||||
// associated with an initialization.
|
||||
int GI_Offset(const std::shared_ptr<CPP_InitInfo>& gi) const { return gi ? gi->Offset() : -1; }
|
||||
int GI_Cohort(const std::shared_ptr<CPP_InitInfo>& gi) const
|
||||
{
|
||||
AddInit(o.get(), lhs + " = " + rhs + ";");
|
||||
}
|
||||
void AddInit(const Obj* o, const std::string& lhs, const std::string& rhs)
|
||||
{
|
||||
AddInit(o, lhs + " = " + rhs + ";");
|
||||
}
|
||||
void AddInit(const IntrusivePtr<Obj>& o, const std::string& init) { AddInit(o.get(), init); }
|
||||
void AddInit(const Obj* o, const std::string& init);
|
||||
|
||||
// We do consistency checking of initialization dependencies by
|
||||
// checking that depended-on objects have initializations. Sometimes
|
||||
// it's unclear whether the object will actually require
|
||||
// initialization, in which case we add an empty initialization
|
||||
// for it so that the consistency-checking is happy.
|
||||
void AddInit(const IntrusivePtr<Obj>& o) { AddInit(o.get()); }
|
||||
void AddInit(const Obj* o);
|
||||
|
||||
// This is akin to an initialization, but done separately
|
||||
// (upon "activation") so it can include initializations that
|
||||
// rely on parsing having finished (in particular, BiFs having
|
||||
// been registered). Only used when generating standalone code.
|
||||
void AddActivation(std::string a) { activations.emplace_back(a); }
|
||||
|
||||
// Records the fact that the initialization of object o1 depends
|
||||
// on that of object o2.
|
||||
void NoteInitDependency(const IntrusivePtr<Obj>& o1, const IntrusivePtr<Obj>& o2)
|
||||
{
|
||||
NoteInitDependency(o1.get(), o2.get());
|
||||
}
|
||||
void NoteInitDependency(const IntrusivePtr<Obj>& o1, const Obj* o2)
|
||||
{
|
||||
NoteInitDependency(o1.get(), o2);
|
||||
}
|
||||
void NoteInitDependency(const Obj* o1, const IntrusivePtr<Obj>& o2)
|
||||
{
|
||||
NoteInitDependency(o1, o2.get());
|
||||
}
|
||||
void NoteInitDependency(const Obj* o1, const Obj* o2);
|
||||
|
||||
// Records an initialization dependency of the given object
|
||||
// on the given type, unless the type is a record. We need
|
||||
// this notion to protect against circular dependencies in
|
||||
// the face of recursive records.
|
||||
void NoteNonRecordInitDependency(const Obj* o, const TypePtr& t)
|
||||
{
|
||||
if ( t && t->Tag() != TYPE_RECORD )
|
||||
NoteInitDependency(o, TypeRep(t));
|
||||
}
|
||||
void NoteNonRecordInitDependency(const IntrusivePtr<Obj> o, const TypePtr& t)
|
||||
{
|
||||
NoteNonRecordInitDependency(o.get(), t);
|
||||
return gi ? gi->InitCohort() : 0;
|
||||
}
|
||||
|
||||
// Analyzes the initialization dependencies to ensure that they're
|
||||
// consistent, i.e., every object that either depends on another,
|
||||
// or is itself depended on, appears in the "to_do" set.
|
||||
void CheckInitConsistency(std::unordered_set<const Obj*>& to_do);
|
||||
|
||||
// Generate initializations for the items in the "to_do" set,
|
||||
// in accordance with their dependencies. Returns 'n', the
|
||||
// number of initialization functions generated. They should
|
||||
// be called in order, from 1 to n.
|
||||
int GenDependentInits(std::unordered_set<const Obj*>& to_do);
|
||||
|
||||
// Generates a function for initializing the nc'th cohort.
|
||||
void GenInitCohort(int nc, std::unordered_set<const Obj*>& cohort);
|
||||
|
||||
// Initialize the mappings for record field offsets for field
|
||||
// accesses into regions of records that can be extensible (and
|
||||
// thus can differ at run-time from the offsets encountered during
|
||||
// compilation).
|
||||
// Generate code to initialize the mappings for record field
|
||||
// offsets for field accesses into regions of records that
|
||||
// can be extensible (and thus can differ at run-time from the
|
||||
// offsets encountered during compilation).
|
||||
void InitializeFieldMappings();
|
||||
|
||||
// Same, but for enum types. The second form does a single
|
||||
// initialization corresponding to the given index in the mapping.
|
||||
// Same, but for enum types.
|
||||
void InitializeEnumMappings();
|
||||
void InitializeEnumMappings(const EnumType* et, const std::string& e_name, int index);
|
||||
|
||||
// Generate code to initialize BiFs.
|
||||
void InitializeBiFs();
|
||||
|
||||
// Generate code to initialize strings that we track.
|
||||
void InitializeStrings();
|
||||
|
||||
// Generate code to initialize hashes that we track.
|
||||
void InitializeHashes();
|
||||
|
||||
// Generate code to initialize indirect references to constants.
|
||||
void InitializeConsts();
|
||||
|
||||
// Generate the initialization hook for this set of compiled code.
|
||||
void GenInitHook();
|
||||
|
@ -917,25 +1024,15 @@ private:
|
|||
// what we compiled.
|
||||
void GenLoad();
|
||||
|
||||
// A list of pre-initializations (those potentially required by
|
||||
// other initializations, and that themselves have no dependencies).
|
||||
std::vector<std::string> pre_inits;
|
||||
|
||||
// A list of "activations" (essentially, post-initializations).
|
||||
// See AddActivation() above.
|
||||
std::vector<std::string> activations;
|
||||
// A list of BiFs to look up during initialization. First
|
||||
// string is the name of the C++ global holding the BiF, the
|
||||
// second is its name as known to Zeek.
|
||||
std::unordered_map<std::string, std::string> BiFs;
|
||||
|
||||
// Expressions for which we need to generate initialization-time
|
||||
// code. Currently, these are only expressions appearing in
|
||||
// attributes.
|
||||
CPPTracker<Expr> init_exprs = {"gen_init_expr", &compiled_items};
|
||||
|
||||
// Maps an object requiring initialization to its initializers.
|
||||
std::unordered_map<const Obj*, std::vector<std::string>> obj_inits;
|
||||
|
||||
// Maps an object requiring initializations to its dependencies
|
||||
// on other such objects.
|
||||
std::unordered_map<const Obj*, std::unordered_set<const Obj*>> obj_deps;
|
||||
CPPTracker<Expr> init_exprs = {"gen_init_expr", false, &compiled_items};
|
||||
|
||||
//
|
||||
// End of methods related to run-time initialization.
|
||||
|
@ -944,12 +1041,20 @@ private:
|
|||
// See Emit.cc for definitions.
|
||||
//
|
||||
|
||||
// The following all need to be able to emit code.
|
||||
friend class CPP_BasicConstInitsInfo;
|
||||
friend class CPP_CompoundInitsInfo;
|
||||
friend class IndicesManager;
|
||||
|
||||
// Used to create (indented) C++ {...} code blocks. "needs_semi"
|
||||
// controls whether to terminate the block with a ';' (such as
|
||||
// for class definitions).
|
||||
void StartBlock();
|
||||
void EndBlock(bool needs_semi = false);
|
||||
|
||||
void IndentUp() { ++block_level; }
|
||||
void IndentDown() { --block_level; }
|
||||
|
||||
// Various ways of generating code. The multi-argument methods
|
||||
// assume that the first argument is a printf-style format
|
||||
// (but one that can only have %s specifiers).
|
||||
|
@ -960,11 +1065,12 @@ private:
|
|||
NL();
|
||||
}
|
||||
|
||||
void Emit(const std::string& fmt, const std::string& arg) const
|
||||
void Emit(const std::string& fmt, const std::string& arg, bool do_NL = true) const
|
||||
{
|
||||
Indent();
|
||||
fprintf(write_file, fmt.c_str(), arg.c_str());
|
||||
NL();
|
||||
if ( do_NL )
|
||||
NL();
|
||||
}
|
||||
|
||||
void Emit(const std::string& fmt, const std::string& arg1, const std::string& arg2) const
|
||||
|
@ -999,14 +1105,15 @@ private:
|
|||
NL();
|
||||
}
|
||||
|
||||
// Returns an expression for constructing a Zeek String object
|
||||
// corresponding to the given byte array.
|
||||
std::string GenString(const char* b, int len) const;
|
||||
|
||||
// For the given byte array / string, returns a version expanded
|
||||
// with escape sequences in order to represent it as a C++ string.
|
||||
std::string CPPEscape(const char* b, int len) const;
|
||||
std::string CPPEscape(const char* s) const { return CPPEscape(s, strlen(s)); }
|
||||
void Emit(const std::string& fmt, const std::string& arg1, const std::string& arg2,
|
||||
const std::string& arg3, const std::string& arg4, const std::string& arg5,
|
||||
const std::string& arg6) const
|
||||
{
|
||||
Indent();
|
||||
fprintf(write_file, fmt.c_str(), arg1.c_str(), arg2.c_str(), arg3.c_str(), arg4.c_str(),
|
||||
arg5.c_str(), arg6.c_str());
|
||||
NL();
|
||||
}
|
||||
|
||||
void NL() const { fputc('\n', write_file); }
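The Emit() overloads above share one pattern: indent, write a printf-style format whose only specifiers are %s, and optionally finish the line. The following is a simplified, self-contained sketch of that pattern. The Emitter class and its "out" member are hypothetical stand-ins for the compiler class and its write_file, and the tab-per-level indentation is an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical, simplified emitter (not the Zeek class itself).
class Emitter
{
public:
    explicit Emitter(FILE* f) : out(f) { }

    void IndentUp() { ++block_level; }
    void IndentDown()
    {
        assert(block_level > 0);
        --block_level;
    }

    // "fmt" must contain exactly one %s specifier. Passing
    // do_NL = false leaves the line open so that a later call
    // can continue it.
    void Emit(const std::string& fmt, const std::string& arg, bool do_NL = true) const
    {
        Indent();
        fprintf(out, fmt.c_str(), arg.c_str());
        if ( do_NL )
            NL();
    }

    void NL() const { fputc('\n', out); }

private:
    // Assumed indentation scheme: one tab per block level.
    void Indent() const
    {
        for ( int i = 0; i < block_level; ++i )
            fputc('\t', out);
    }

    FILE* out;
    int block_level = 0;
};
```

The do_NL flag is what allows a single generated line to be assembled from several calls, for example when a list of initializers is written one element at a time.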
|
||||
|
||||
|
|