About implementing specific custom primitives #32

breakds · 2023-09-24T02:48:42Z

Sorry about using GitHub issues to ask questions again. After switching to the cpp20 branch I was able to run the examples successfully. Thanks!

Now I would like to discuss whether (and how) the following can be achieved. My dataset is a time series, so its rows are ordered. This means that, in principle, primitives such as "shift 1", "shift -1" or "rolling mean" are valid, and they could operate in batch on each of the original variables and on intermediate variables. After briefly reading the code, mainly functions.hpp, I have a few questions on how to implement the above:

It seems there is an upper limit on the number of primitives we can have, because NodeType is a 32-bit integer. If I want to add more primitives, should I extend it to uint64_t?
It seems that each time a primitive in function.cpp is called, it is called on a batch of a variable or intermediate result. Is it possible to make it operate on the whole dataset instead (i.e., across the data point dimension, which for a time series is the time dimension)?
If the above two can be resolved, I think I can come up with a solution. Is there any other approach you would recommend?

Thank you!


foolnotion · 2023-09-24T08:33:14Z

Hi, yes, this is theoretically possible. This is what the Dynamic node type is meant for. My plan is to eventually get rid of all the hard-coded function types and rely only on functions registered at runtime.

The idea is to define a custom function and register it in the dispatch table. Here is an example:

define a custom summation_function
https://github.com/foolnotion/atomic-potentials/blob/5c871ea89c99a3a7e965bb61e48f848d3b4e159a/source/atomic.hpp#L72
register the function
https://github.com/foolnotion/atomic-potentials/blob/5c871ea89c99a3a7e965bb61e48f848d3b4e159a/source/main.cpp#L219
define a Dynamic node type with this function and add it to the primitive set
https://github.com/foolnotion/atomic-potentials/blob/5c871ea89c99a3a7e965bb61e48f848d3b4e159a/source/main.cpp#L229

The atomic potentials code relies on an older version of Operon, but you should be able to make it work with the latest cpp20 head.

If you run into issues or find bugs please let me know.

breakds · 2023-09-24T21:22:41Z

Thanks a lot for the detailed explanation and the links to the files. I now understand the code better. Let me try to implement the idea and report back. I appreciate the prompt response!

breakds · 2023-10-02T00:03:55Z

I have been slowly learning the concepts, and I have a few more questions, if you don't mind.

I do not understand the if branch here. What does symbolic == true or symbolic == false mean? How is this related to the template argument being int or Scalar?
There are a few options for TreeCreator. Is there a rule of thumb for choosing among them before digging into the implementation?
On the cpp20 branch, creating an Interpreter requires a dispatch table, a dataset and a tree. I am not sure which tree I should supply. My current, vague understanding is that trees are "symbolic formulas" generated as candidates for evaluation while the algorithm is solving. Because of this (probably wrong) understanding, I find it confusing how and why I should provide a tree when constructing the Interpreter, which happens before the algorithm starts running.

Thanks a lot! Again, sorry if some of the questions seem dumb; I haven't had time to go through all the detailed code yet.

foolnotion · 2023-10-02T09:00:35Z

I do not understand the if branch here. What does symbolic == true or symbolic == false mean? How is this related to the template argument being int or Scalar?

The symbolic boolean flag was meant to configure the algorithm so that it promotes "nice" models (formulas):

only integer coefficients (during the run and during initialization)
mutation operator configured to only support integer values
nonlinear least squares coefficient tuning disabled
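A minimal sketch of what the first two points could look like, assuming coefficients are stored as doubles but restricted to integer values when the flag is set. `IsInteger`, `InitCoefficient` and `MutateCoefficient` are hypothetical names, not Operon's code; the third point (no nonlinear least squares tuning) would simply mean skipping the optimizer step, which is not modeled here.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical sketch of integer-only ("symbolic") coefficient handling,
// not Operon's implementation. Coefficients are doubles restricted to integers.

bool IsInteger(double v) { return std::isfinite(v) && std::round(v) == v; }

// Initialization: an integer drawn uniformly from [lo, hi] in symbolic mode,
// otherwise a real value from the same range.
double InitCoefficient(std::mt19937_64& rng, int lo, int hi, bool symbolic) {
    assert(lo <= hi);
    if (symbolic) {
        std::uniform_int_distribution<int> dist(lo, hi);
        return static_cast<double>(dist(rng));
    }
    std::uniform_real_distribution<double> dist(lo, hi);
    return dist(rng);
}

// Mutation: in symbolic mode, add an integer step from [-maxStep, maxStep] so
// integer coefficients stay integers; otherwise add Gaussian noise.
double MutateCoefficient(std::mt19937_64& rng, double value, int maxStep, bool symbolic) {
    assert(maxStep > 0);
    if (symbolic) {
        assert(IsInteger(value));
        std::uniform_int_distribution<int> step(-maxStep, maxStep);
        return value + static_cast<double>(step(rng));
    }
    std::normal_distribution<double> noise(0.0, static_cast<double>(maxStep));
    return value + noise(rng);
}
```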

There are a few options for TreeCreator. Is there a rule of thumb to pick from those candidates before digging into the implementation?

In general, I've noticed that the choice of creator does not make a difference in algorithm performance. I would recommend the BalancedTreeCreator, which imho is a better version of PTC2. It may also be beneficial to limit the maximum tree size during initialization to a small value (5-15 nodes) while keeping the maximum tree size during the run larger.
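To illustrate capping tree size at initialization, here is a small C++20 sketch that grows a random tree breadth-first up to a target length sampled uniformly from [1, maxLength]. It is written in the spirit of size-targeted creators such as PTC2 but is not Operon's BalancedTreeCreator; `Symbol`, `GenNode` and `GrowTree` are hypothetical. A function symbol is placed only if every still-open child slot can be filled afterwards without exceeding the target, so the result never has more than maxLength nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical sketch of size-limited, breadth-first random tree growth.
// Not Operon's BalancedTreeCreator; it only illustrates capping the size.
struct Symbol {
    char name;
    std::size_t arity;  // 0 for terminals (variables, constants)
};

struct GenNode {
    Symbol sym;
    std::vector<std::size_t> children;  // indices into the tree vector
};

// Grows a tree (stored breadth-first) whose length never exceeds maxLength.
std::vector<GenNode> GrowTree(std::mt19937& rng, std::vector<Symbol> const& functions,
                              std::vector<Symbol> const& terminals, std::size_t maxLength) {
    assert(maxLength >= 1 && !terminals.empty());
    std::uniform_int_distribution<std::size_t> lengthDist(1, maxLength);
    std::size_t const target = lengthDist(rng);

    auto pick = [&rng](std::vector<Symbol> const& pool) {
        std::uniform_int_distribution<std::size_t> d(0, pool.size() - 1);
        return pool[d(rng)];
    };

    std::vector<GenNode> tree;
    std::size_t open = 1;  // unfilled slots; invariant: tree.size() + open <= target

    // Fills the next open slot. A function is allowed only if, after placing it,
    // each remaining open slot could still receive at least one node.
    auto place = [&]() {
        std::vector<Symbol> fitting;
        for (auto const& f : functions) {
            if (tree.size() + open + f.arity <= target) { fitting.push_back(f); }
        }
        Symbol const s = fitting.empty() ? pick(terminals) : pick(fitting);
        tree.push_back(GenNode{s, {}});
        open = open - 1 + s.arity;
        return tree.size() - 1;
    };

    place();  // root
    for (std::size_t i = 0; i < tree.size(); ++i) {
        std::size_t const arity = tree[i].sym.arity;
        for (std::size_t c = 0; c < arity; ++c) {
            std::size_t const child = place();
            tree[i].children.push_back(child);
        }
    }
    return tree;
}
```

Calling it with `maxLength` in the 5-15 range matches the suggested initialization limit; the larger limit used during the run would be enforced by the variation operators, which are outside this sketch.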

On the cpp20 branch, creating an Interpreter requires a dispatch table, a dataset and a tree. I am not sure which tree I should supply.

Yes, this was a big change from before, made in the interest of making it easier to program the entire tree evaluation / optimization infrastructure and to integrate with likelihoods.

The tree is kept in the Genotype property of the Individual https://github.com/heal-research/operon/blob/cpp20/include/operon/core/individual.hpp#L18

So normally you'd want to use an interpreter in a context where you already have an individual, so then you'd pass individual.Genotype to the interpreter.

Similar to here: https://github.com/heal-research/operon/blob/cpp20/source/operators/evaluator.cpp#L196

breakds · 2023-10-02T15:33:47Z

Thank you for the explanation! I now understand why int is used in the symbolic case, and I know more about the tree creators!

One more question about Interpreter if you don't mind.

I am actually creating the Interpreter before anything else exists. This is because (I might be wrong) creating the algorithm instance (e.g. NSGA2) seems to require constructing the following:

Interpreter ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2

If the ErrorEvaluator is going to evaluate all sorts of trees, which specific tree do I need to construct and provide to the Interpreter? At this stage the algorithm has not been constructed yet; does that mean I just create an arbitrary tree by hand?

Thanks!

foolnotion · 2023-10-09T07:17:28Z

Hi,

I am actually creating the Interpreter before anything else exists. This is because (I might be wrong) creating the algorithm instance (e.g. NSGA2) seems to require constructing the following:

Normally you shouldn't need to initialize the interpreter yourself.

The flow should be:
DispatchTable ⇨ ErrorEvaluator ⇨ Generator ⇨ NSGA2

The specific type of interpreter can be passed as a template parameter to the DispatchTable.

If the ErrorEvaluator is going to evaluate all sorts of trees, which specific tree do I need to construct and provide to the Interpreter?

The interpreter knows how to evaluate any kind of tree (or, more accurately, any type of node inside the tree) by querying the dispatch table for the appropriate function primitive. The interpreter is meant to be a lightweight, cheap object constructed on the spot whenever a tree needs to be interpreted (so you'd construct an interpreter within an evaluator context, when you already have a tree). You do not need to construct an interpreter manually before running the algorithm.
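The flow described in this reply can be sketched with hypothetical, simplified types (none of these are Operon's actual Interpreter, DispatchTable or evaluator classes): the evaluator is constructed once from the function table and the dataset, and it creates a throwaway interpreter, holding only references, each time it is given a tree to score. The tree is assumed to be stored in postfix order, and the error metric (sum of squared errors) is chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types; NOT Operon's Interpreter, DispatchTable or
// evaluator classes. They only model who owns what and when things are built.
using Column = std::vector<double>;
using Dataset = std::map<std::string, Column>;

struct TreeNode {
    std::string op;        // "var", "const", or a key in the function table
    std::string variable;  // used when op == "var"
    double value{0};       // used when op == "const"
    std::size_t arity{0};
};
using Tree = std::vector<TreeNode>;  // postfix order: children before parent

using Fn = std::function<Column(std::vector<Column> const&)>;
using FunctionTable = std::map<std::string, Fn>;

// Lightweight: holds only references, so creating one per tree is cheap.
class Interpreter {
public:
    Interpreter(FunctionTable const& table, Dataset const& data, Tree const& tree)
        : table_(table), data_(data), tree_(tree) {}

    [[nodiscard]] Column Evaluate(std::size_t rows) const {
        std::vector<Column> stack;
        for (auto const& n : tree_) {
            if (n.op == "var") { stack.push_back(data_.at(n.variable)); continue; }
            if (n.op == "const") { stack.emplace_back(rows, n.value); continue; }
            assert(stack.size() >= n.arity);
            auto const first = stack.end() - static_cast<std::ptrdiff_t>(n.arity);
            std::vector<Column> args(first, stack.end());  // children, in order
            stack.erase(first, stack.end());
            stack.push_back(table_.at(n.op)(args));  // look up the primitive
        }
        assert(stack.size() == 1);
        return stack.back();
    }

private:
    FunctionTable const& table_;
    Dataset const& data_;
    Tree const& tree_;
};

// Built once, before the algorithm runs: it needs the table and the data,
// but no tree. An interpreter is created only when a tree is scored.
class SseEvaluator {
public:
    SseEvaluator(FunctionTable const& table, Dataset const& data, std::string target)
        : table_(table), data_(data), target_(std::move(target)) {}

    [[nodiscard]] double operator()(Tree const& genotype) const {
        Column const& y = data_.at(target_);
        Column const pred = Interpreter(table_, data_, genotype).Evaluate(y.size());
        double sse = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) {
            double const d = pred[i] - y[i];
            sse += d * d;
        }
        return sse;
    }

private:
    FunctionTable const& table_;
    Dataset const& data_;
    std::string target_;
};
```

In this model, `SseEvaluator evaluate(table, data, "y");` can be created before the algorithm starts, and `evaluate(tree)` is called later, once individuals (and their genotypes) exist; no tree is needed up front.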

If you show me your code I can assist more.

github-actions · 2023-11-09T01:45:46Z

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Nov 9, 2023
