best practices / example for parameter sweep with batch runner #1915

Open
rlskoeser opened this issue Dec 18, 2023 · 16 comments
Labels
docs Release notes label
Milestone

Comments

@rlskoeser
Contributor

Are there any examples or best practices for doing parameter sweep type analysis with the batch runner?

I'd like to be able to do a large batch run with a number of varying parameters, and then run some analyses on the results - some across all variants, but also isolating specific runs where a single parameter varies.

I don't see any way to get the parameters out of the batch runner so that I can figure out which runs were initialized with which parameters. Is this not yet supported? I think a simple CSV of run ID and parameters would be sufficient for the analysis I want to do.

I found this example of running a parameter sweep in a notebook, but it wasn't quite enough to get me to what I want to do: https://github.com/projectmesa/mesa-schelling-example/blob/master/analysis.ipynb

I also see there are some other open issues related to batch running and parameters, but I'm not sure how related they are to what I'm asking for. This one seems the most similar:

@maltevogl

This is my solution. I also did not find good documentation, so I wrote a small script that runs the batch through several parameter ranges. The result is saved as JSONL. All parameters are part of the reported result. Depending on how you configure the data collector, you get additional output from the model run; see e.g. the letter pruning here.

So you could run the analysis on the reported data-collector outputs, or select a subset of the results via e.g. pandas DataFrame.query() and run the analysis only on that.

The script takes quite a while to run, so I put it on a server in a tmux session and come back a day later.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Run historicalletters model for data creation."""
from datetime import datetime
import pandas as pd
import mesa
from scicom.historicalletters.model import HistoricalLetters

for population in [400,300,200,100]:
    starttime = datetime.now()

    params = {
        "population": population,
        "similarityThreshold": 1,
        "longRangeNetworkFactor": 0.3,
        "shortRangeNetworkFactor": 0.8,
        "updateTopic": 0.1,
        "moveRange": [0.25, 0.5, 0.75, 0.9],
        "letterRange": [0.25, 0.5, 0.75, 0.9],
        "useActivation": [True, False],
        "useSocialNetwork": [True, False],
        "tempfolder": "./initialConditions/"
    }

    results = mesa.batch_run(
        HistoricalLetters,
        parameters=params,
        iterations=10,
        max_steps=200,
        number_processes=48,
        data_collection_period=-1,
        display_progress=True,
    )

    now = datetime.now()
    date_time = now.strftime("%d_%m_%Y-%H_%M_%S")

    print(f"Runtime: {(now - starttime)}")

    df = pd.DataFrame(results)

    df.to_json(f"./Run_N_{population}_{date_time}.json", orient="records", lines=True)

The output dataframe has the columns:

['RunId', 'iteration', 'Step', 'population', 'similarityThreshold',
       'longRangeNetworkFactor', 'shortRangeNetworkFactor', 'updateTopic',
       'moveRange', 'letterRange', 'useActivation', 'useSocialNetwork',
       'tempfolder', 'Ledger'] 

where the last one is the actual output of the model run.
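A minimal sketch of the two analysis steps discussed above, building a run-to-parameter table and isolating the runs where a single parameter varies. It uses a toy list of dicts in place of real `mesa.batch_run` output; the values and the numeric `letters_sent` column are invented for illustration, and only the other column names follow the output listed above.

```python
import pandas as pd

# Toy stand-in for the list of dicts returned by mesa.batch_run.
# "letters_sent" is a hypothetical numeric model output; all values are invented.
results = [
    {"RunId": 0, "iteration": 0, "Step": 200, "moveRange": 0.25, "letterRange": 0.5, "useActivation": True, "letters_sent": 10},
    {"RunId": 1, "iteration": 0, "Step": 200, "moveRange": 0.75, "letterRange": 0.5, "useActivation": True, "letters_sent": 30},
    {"RunId": 2, "iteration": 0, "Step": 200, "moveRange": 0.25, "letterRange": 0.9, "useActivation": True, "letters_sent": 50},
    {"RunId": 3, "iteration": 0, "Step": 200, "moveRange": 0.75, "letterRange": 0.5, "useActivation": False, "letters_sent": 70},
    {"RunId": 4, "iteration": 1, "Step": 200, "moveRange": 0.25, "letterRange": 0.5, "useActivation": True, "letters_sent": 20},
]
df = pd.DataFrame(results)

param_cols = ["moveRange", "letterRange", "useActivation"]

# One row per run: which parameters was each RunId initialized with?
run_params = df[["RunId", "iteration", *param_cols]].drop_duplicates(subset="RunId")
csv_text = run_params.to_csv(index=False)  # pass a path instead to write a file

# Hold every parameter except moveRange fixed, then compare across moveRange.
subset = df.query("letterRange == 0.5 and useActivation == True")
mean_by_move = subset.groupby("moveRange")["letters_sent"].mean()
```

Because the parameters are repeated on every collected row, `drop_duplicates` on `RunId` is enough to recover the run-to-parameter table, which may be all that is needed for the CSV asked about in the original question.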

@rlskoeser
Contributor Author

@maltevogl thanks for sharing your approach, super helpful!

I hadn't thought about including important parameters in the model data collection - they would be redundant if you're collecting multiple rounds, but if you only collect data at the end it would be a good solution.

Running a large number of batches is pretty slow for me too, even though it doesn't take so many rounds for individual simulations to stabilize. It would be nice if there were a way to chunk out the different parameter combinations and run multiple jobs on an HPC cluster. Using JSONL would make it pretty easy to combine the results, too.
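One way the chunking idea could look, sketched with the standard library only. The helper names `expand_parameters` and `chunk_for_task` are hypothetical (they are not part of Mesa), the parameter values are borrowed from the script above, and the use of a SLURM-style array job with four tasks is an assumption.

```python
import itertools
import os


def expand_parameters(params):
    """Expand a dict with list-valued entries into all full-factorial combinations."""
    names = list(params)
    # Wrap scalar values in a one-element list so every entry can be iterated.
    value_lists = [v if isinstance(v, list) else [v] for v in params.values()]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def chunk_for_task(combinations, task_id, n_tasks):
    """Return the combinations assigned to one array task (round-robin split)."""
    return combinations[task_id::n_tasks]


params = {
    "population": 100,
    "moveRange": [0.25, 0.5, 0.75, 0.9],
    "letterRange": [0.25, 0.5, 0.75, 0.9],
    "useActivation": [True, False],
}
combinations = expand_parameters(params)  # 1 * 4 * 4 * 2 = 32 combinations

# SLURM sets SLURM_ARRAY_TASK_ID inside array jobs; fall back to 0 locally.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))
n_tasks = 4  # assumed size of the job array
my_chunk = chunk_for_task(combinations, task_id, n_tasks)
```

Each task would then run only the combinations in `my_chunk` and write its own output file; how the chunk is executed is left out here on purpose.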

@EwoutH
Member

EwoutH commented Dec 20, 2023

Have you checked out the batch run section of the tutorial? That's a good start. The Bank Reserves Model also has an example of the batch_run() function.

I'd like to be able to do a large batch run with a number of varying parameters, and then run some analyses on the results - some across all variants

If you're going in the direction of global sensitivity analysis / robustness, check out the EMAworkbench!

Here's an example of using it with a Mesa model.

but also isolating specific runs where a single parameter varies.

Still in the draft stage, but I'm working on a built-in function for local sensitivity analysis. See #1908.

run multiple jobs on an HPC cluster.

Still experimental, but the EMAworkbench supports this (using our brand new MPIEvaluator).

@rlskoeser
Contributor Author

@EwoutH thanks for the response and the links. Yes, I started with the batch run tutorial in the mesa documentation and I have been using it successfully, but now I want to scale up a bit in terms of the number of options for several parameters. My problem is that the batch runner doesn't generate a report of which combination of parameters was used for specific runs.

I'll check out your EMAworkbench.

@shanedicks

shanedicks commented Jan 14, 2024

@rlskoeser I ran into this same issue, and what I ended up doing is reading a parameters dictionary from a json file rather than writing to it.

Basically, I write a json config file that gives an experiment name and an output directory, along with the parameters for that experiment, then I create that directory and store a copy of the config file with all the output. Here's my run.py. Hope that's helpful. I'd be happy to answer any questions if you want to reach out.

@rlskoeser
Contributor Author

Thanks for chiming in, @shanedicks - good to know others have run into this and worked around it, and to have another example to look at. (FYI: your link has a typo if you want to edit and correct)

I made some progress on this last week and figured some things out, but I didn't get the time to write up any notes here.

I figured out that the default data collection does include the configured parameters in the output (which I feel like I should have noticed, but I think it wasn't documented; certainly I wasn't looking for it there when I was first doing data analysis). It also smashes together agent data, model data, and parameters, which results in significant duplication even if you're only collecting data once for each simulation when it ends. The default batch runner holds all of this smushed-up collected data in memory until it can be returned, so it's simply not scalable for the number of parameters and runs I'm trying to do.

I've written a custom batch runner, which uses some of the internal/undocumented mesa batch run methods for constructing parameter combinations and borrows some of the multiprocessing approach, but writes out to separate model and agent data files as it runs the simulations. I'm following the mesa approach and including the parameters in the model data for convenience. I've successfully run my new script on our HPC cluster, and am now testing running it as an array job so I can easily increase the number of iterations for each configuration by spreading it across multiple tasks/runs.
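To illustrate the general idea of writing separate model and agent files as each run completes, here is a rough standard-library sketch. It is not the custom batch runner described above: `run_model` is a hypothetical stand-in for one simulation, and the parameter and column names are invented.

```python
import csv
import os
import tempfile


def run_model(run_id, params):
    """Hypothetical stand-in for one simulation; returns a model row and agent rows."""
    # Parameters are copied into the model row, mirroring the Mesa convention.
    model_row = {"RunId": run_id, **params, "total": params["n_agents"] * params["rate"]}
    agent_rows = [
        {"RunId": run_id, "AgentID": i, "value": i * params["rate"]}
        for i in range(params["n_agents"])
    ]
    return model_row, agent_rows


def append_rows(path, rows):
    """Append dict rows to a CSV file, writing the header only for a new file."""
    if not rows:
        return
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "model.csv")
agent_path = os.path.join(out_dir, "agent.csv")

experiments = [{"n_agents": 2, "rate": 0.5}, {"n_agents": 3, "rate": 1.0}]
for run_id, params in enumerate(experiments):
    model_row, agent_rows = run_model(run_id, params)
    # Flush this run's data right away so results never accumulate in memory.
    append_rows(model_path, [model_row])
    append_rows(agent_path, agent_rows)
```

Agent rows carry only `RunId`, so the parameters are stored once per run in the model file and can be joined back on `RunId` when needed. With several worker processes, giving each worker its own file or routing all writes through a single process would likely be needed to avoid interleaved output.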

I've been thinking about how the mesa batch runner could be refactored to make it more extensible - a class-based batch runner would make it easier to reuse some of the existing internal functionality (generating parameter combinations, multiprocessing, model+parameter data) and allow customizing e.g. how the data is collected and output.

@EwoutH
Member

EwoutH commented Jan 16, 2024

Would love to see a proposal or draft implementation for a more efficient, flexible and/or performant batch runner. Personally I would find a way to do other configurations than full-factorial very useful.

The batch runner was already redesigned once and a lot of discussion has been around it, so it might be nice to make sure those lessons learned are not forgotten.

@Corvince and @quaquel might also have some thoughts on it.

@shanedicks

@rlskoeser thanks for pointing out the typo. I have corrected it.

I have definitely run into the "all results stored in memory problem" so I made a Controller class to run a custom batch runner and a Manager class which is a sort of custom data collector, only it writes to a database instead of holding data in memory.

I've been up and running on our HPC cluster for about a year now.

I foolishly started with SQLite as a database because I didn't want to bother with setting up Postgres or something, but I have really struggled with the inability to have concurrent connections.

@quaquel
Member

quaquel commented Jan 17, 2024

  1. I would not try to rebuild the EMA_workbench in MESA, but there are some simple lessons that I think can be learned from it. First, don't try to solve everything within MESA. Rather, provide a set of reusable components that users can use and extend to build their own custom solutions.
  2. For batch_run, the simple solution would be to support passing a data frame or collection of dicts as experiments. It is easy to use SALib or scipy.stats.qmc to create custom experiments, turn them into a data frame, and then rely on batch_run to execute them. Why should MESA try to solve the generation of various experimental designs, given that there are several libraries out there that already do this?
  3. For data collection, the problem, in my view, is a bit more complicated due to the current architecture of MESA. In the EMA_workbench, I rely on a callback function that is called after each experiment. It is called with the experiments and the collected results for that experiment. This gives users a hook to write their own custom callback that, for example, flushes the results of each experiment to disk. This design, however, requires a clean separation of concerns: gathering data for a single run and gathering data across a set of experiments. If storing the results of a single experiment in memory is already a problem, this won't work. Personally, I would argue that you then are engaging in a form of data dredging anyway instead of designing experiments with a specific analysis in mind. Also, if a single run is too large to keep in memory, the current individual run-level data collector is inadequate anyway.
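Points 2 and 3 could be sketched roughly as follows. This is neither the EMA workbench API nor an existing Mesa feature: every function name below is hypothetical, the toy model is invented, and the plain random sample is only a placeholder for a design produced with a dedicated library.

```python
import io
import json
import random


def sample_experiments(bounds, n, seed=0):
    """Draw n parameter sets uniformly at random within the given bounds.

    A placeholder for a proper design generated with a library such as SALib.
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(n)
    ]


def run_experiments(experiments, run_one, callback):
    """Run each experiment and hand its parameters and result to the callback."""
    for experiment_id, experiment in enumerate(experiments):
        result = run_one(**experiment)
        callback(experiment_id, experiment, result)


def toy_model(move_range, letter_range):
    """Hypothetical single model run returning a dict of outcomes."""
    return {"score": move_range + letter_range}


def make_jsonl_writer(stream):
    """Build a callback that writes one JSON line per finished experiment."""
    def callback(experiment_id, experiment, result):
        stream.write(json.dumps({"id": experiment_id, **experiment, **result}) + "\n")
        stream.flush()
    return callback


bounds = {"move_range": (0.25, 0.9), "letter_range": (0.25, 0.9)}
experiments = sample_experiments(bounds, n=5)

buffer = io.StringIO()  # stand-in for an open output file
run_experiments(experiments, toy_model, make_jsonl_writer(buffer))
```

The runner only needs a collection of dicts and a callable, so the experimental design and the storage format can both be swapped without touching the loop itself.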

@EwoutH
Member

EwoutH commented Jan 17, 2024

On 1 and 2: Agreed, but Mesa should support a (clearly documented) end-to-end workflow in my opinion. It doesn't need to rebuild everything itself, but it does need to connect to the most used components in the modelling world and clearly document how to use them.

(but this is a separate topic, so I will open a new discussion for it)

@quaquel
Member

quaquel commented Jan 17, 2024

I agree that it would be good to have an example in the documentation showing an entire workflow, including some other tools that can be used as part of this (e.g., SALib).

On data storage of experiments (so separate from individual-run data collection), I would be fine with providing a well-documented hook that users could use to do their custom preferred data storage.

@rht
Contributor

rht commented Jan 17, 2024

My question would be: if I want to follow best practice regarding sensitivity analysis, should I use EMA workbench or SALib instead of batch_run? Then I think it is a matter of a documentation rewrite: the bank reserves example needs to have an alternative implementation written using EMA workbench or SALib.

@quaquel
Member

quaquel commented Jan 17, 2024

My question would be: if I want to follow best practice regarding sensitivity analysis, should I use EMA workbench or SALib instead of batch_run? Then I think it is a matter of a documentation rewrite: the bank reserves example needs to have an alternative implementation written using EMA workbench or SALib.

I have a slightly different take. Current batch_run implicitly uses a factorial design over the specified parameters. Rather than implementing other experimental designs within MESA, generate the design with other tools and pass it to batch_run.

@EwoutH
Member

EwoutH commented Jan 21, 2024

I'm going to dump an email here that I recently sent to some TU Delft professors/teachers, which is somewhat relevant to this discussion.

Hi ABM teachers,

While TAing SEN1211 now, and TB233B (Systeemmodellering 4) in the past year, I notice we always get a lot of questions about experiments, sensitivity, collecting data, and visualising and reporting it: the whole process after “My model is done, now what?”. And while on most ABM topics I feel we can answer satisfactorily, on this topic I’m missing both knowledge (something I need to work on) and ammunition/tools to give them.

I think there is consensus that in ABM we usually want to tell a story that answers the research question (and hypothesis) as well as possible, and that the dynamics and emergent behaviour are more important than the actual numbers. Therefore, it depends, it depends and it depends.

But of course, data and visualisations are needed to support your story, and there are best practices and things that clearly don’t make sense. And while everything depends, you can make broad categories of kind of research and there are commonly used tools and methods. However I haven’t found a single, consistent source for this, as teaching material.

Just giving them some random examples from last year doesn’t feel like the optimal situation.

So I think a collection of information that roughly covers these points would be very useful:

A. Which kind of research questions are (commonly) asked in ABM?
B. Which kind of experimental setups could be used to answer them?
C. What kind of sensitivity analysis is useful when?
D. Which kind of data could you collect for each of them?
E. How would you aggregate the data collected (especially with agent data)?
F. How could you visualize that data to support your narrative?

The main point would be giving them both the knowledge to make those decisions and then the tools to perform them. I think mixing theory with implementation examples could be really powerful here.

We don’t need to reinvent the wheel here, we can use existing methods, tools and libraries. The difficulty here is that sometimes there are limitations in runtime, time for analysis or complexity for students to learn. So aside from the “ideally” options, we should also provide some ways to make practical shortcuts towards “good enough” or even “better than nothing” solutions.

This problem (like many) could be separated in these steps:

  1. Consensus on the problem
  2. Consensus if it’s worth solving
  3. Consensus on the solutions
  4. Actually creating solutions

At the minimum, I hope to give you some empirical experience that this might be an area worth keeping in mind when teaching ABM. In the best case, I hope that we can find some kind of collaboration to discuss and develop this problem further, so that in future courses we have a solid set of tools and guidance to provide to students on how to approach and handle this.

Best,
Ewout

@rlskoeser
Contributor Author

In case another example is useful, here's my custom batch run code: https://github.com/Princeton-CDH/simulating-risk/blob/main/simulatingrisk/hawkdovemulti/batch_run.py

I reused a couple of the internal mesa.batchrunner methods and adapted code from one section that couldn't directly be reused.

The main change is that my batch runner outputs data as model runs complete rather than storing everything in memory and writing it all out at the end.

I also have a slurm script for this batch runner here https://github.com/Princeton-CDH/simulating-risk/blob/main/simulatingrisk/hawkdovemulti/simrisk_batch.slurm

I set it up so I can use arrays of tasks to generate data for multiple runs of the same set of parameters and then combine the resulting csvs when doing data analysis.
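Combining the per-task files at analysis time might look something like the sketch below. The in-memory CSV text stands in for files written by two array tasks, the column names are invented, and it assumes each task numbers its runs from 0, which may not match the linked batch runner.

```python
import io

import pandas as pd

# Stand-ins for the CSV files written by two array tasks; in practice these
# would be file paths, one per task.
task_outputs = {
    0: io.StringIO("RunId,risk_level,total\n0,1,10\n1,2,20\n"),
    1: io.StringIO("RunId,risk_level,total\n0,1,12\n1,2,18\n"),
}

# Tag each frame with its task so rows stay traceable after concatenation.
frames = [
    pd.read_csv(source).assign(task=task_id)
    for task_id, source in task_outputs.items()
]
combined = pd.concat(frames, ignore_index=True)

# If RunId restarts in every task, (task, RunId) is the unique key for a run.
has_duplicate_runs = combined.duplicated(subset=["task", "RunId"]).any()

# Each parameter value now has one observation per task to average over.
mean_by_param = combined.groupby("risk_level")["total"].mean()
```

Keeping the task identifier as a column avoids silently merging different runs that happen to share a `RunId`.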

I started wanting to do pairwise statistical testing on specific parameters, so I came up with named options for parameter combinations with smaller subsets - those sets of parameter combinations are small enough I can run locally, but it's nice to have an option to run it in an HPC environment.

I think that making the batch runner class-based with a few built in options or mixins, e.g. for generating parameter options and saving results, would make it easier to extend and customize.

@EwoutH EwoutH added the docs Release notes label label Sep 3, 2024
@EwoutH EwoutH modified the milestones: 3.0, v3.0 Sep 3, 2024
@EwoutH
Member

EwoutH commented Sep 3, 2024

Having a spec for parameter ranges might help and be sort-of related to this issue:

Would like some good docs for this for 3.0.

@EwoutH EwoutH modified the milestones: v3.0, v3.1 Oct 2, 2024
6 participants