the 2nd Anomalous Diffusion (AnDi) Challenge Forum


> FOV Boundary Effects

Hi, just a bit confused about simulating data for training. The FOV of size L has reflecting boundary conditions as I understand. If we choose a K value of 10e6 for the single state and alpha = 1.99 and L = 1.5 * 128 (similar to the notebook), I see tracks that are maybe "bouncing" inside the boundary box? (Maybe this is a plotting issue?)
But if I leave L out, the trajectories generated are much straighter as expected from the alpha value used.
Particularly I am wondering if there are boundary effects as this would affect the time series generated. The code is as below:

With L :

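from andi_datasets.models_phenom import models_phenom
from andi_datasets.utils_trajectories import plot_trajs
import matplotlib.pyplot as plt

N, T, L = 100, 200, 1.5*128  # number of tracks, track length, box size
traj, labels = models_phenom().single_state(N=N,
                                            T=T,
                                            L=L,
                                            Ds=10e6,
                                            alphas=1.99)
plot_trajs(traj, L, N, num_to_plot=4)
plt.show()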

as compared to with L commented out:

traj, labels = models_phenom().single_state(N=100,
                                            T=200,
                                            # L=1.5*128,
                                            Ds=10e6,
                                            alphas=1.99)
plot_trajs(traj, L, N, num_to_plot=4)
plt.show()

The idea here is that I would like to generate training data from a range of parameters for my model to train on and as I understand boundary effects should not affect the tracks (?). Are we meant to scale L as K increases to avoid this?

Posted by: selfschuk @ Feb. 22, 2024, 3:13 p.m.

Hi! Thanks for your question. The functions in models_phenom are "base functions" used to create the experiment, which in this case have reflecting boundaries. To avoid boundary effects, we consider Fields of View (FOVs) of the experiment, which are usually much smaller than the whole experiment.

In short, to generate data that does not have (or has minimal) boundary effects, we do the following:

1. Generate data with models_phenom.xxxx with e.g. L = 500
2. Get a FOV from that experiment with size FOV = 128

The FOVs are placed at a certain distance from the boundary to avoid boundary effects. If you are interested in how we do that, here is the documentation:
https://andichallenge.github.io/andi_datasets/lib_nbs/utils_trajectories.html#inside_fov_dataset
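To illustrate the two steps above without depending on the library, here is a minimal, standard-library-only sketch. It is not the andi_datasets implementation: it uses plain Brownian steps instead of the library's generators, a FOV centered in the box instead of the library's placement rule, and the helper names (reflect, simulate_reflected_walk, segments_inside_fov) are made up for this example.

```python
import random


def reflect(x, L):
    """Fold a coordinate back into [0, L], treating the walls at 0 and L as mirrors."""
    x = x % (2 * L)  # Python's % is non-negative for a positive modulus
    return 2 * L - x if x > L else x


def simulate_reflected_walk(T, L, step_std, rng):
    """2D Gaussian random walk of T points confined to the box [0, L] x [0, L]."""
    x, y = rng.uniform(0, L), rng.uniform(0, L)
    traj = [(x, y)]
    for _ in range(T - 1):
        x = reflect(x + rng.gauss(0, step_std), L)
        y = reflect(y + rng.gauss(0, step_std), L)
        traj.append((x, y))
    return traj


def segments_inside_fov(traj, fov_origin, fov_size, min_length=2):
    """Split a trajectory into the contiguous pieces that lie inside the FOV."""
    ox, oy = fov_origin
    segments, current = [], []
    for x, y in traj:
        if ox <= x <= ox + fov_size and oy <= y <= oy + fov_size:
            current.append((x, y))
        else:
            # The particle left the FOV: close the current segment, if long enough.
            if len(current) >= min_length:
                segments.append(current)
            current = []
    if len(current) >= min_length:
        segments.append(current)
    return segments


rng = random.Random(0)
L, FOV = 500, 128
margin = (L - FOV) / 2  # assumption: FOV centered, far from every wall

# Step 1: simulate the whole "experiment" in the large box.
trajs = [simulate_reflected_walk(200, L, 2.0, rng) for _ in range(100)]
# Step 2: keep only the pieces of each trajectory that fall inside the FOV.
segs = [s for t in trajs for s in segments_inside_fov(t, (margin, margin), FOV)]
print(len(segs), "segments inside the FOV")
```

Because the FOV sits well away from the walls, the segments it records are dominated by free motion, even though the underlying box is reflecting.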

However, instead of generating data with the models_phenom module, we propose using the datasets_challenge module, and in particular the function datasets_challenge.challenge_phenom_dataset. This function takes as input the properties of the models you want to generate and automatically handles the creation of FOVs. You can see how to do that in the section "Generating the ANDI 2 challenge dataset" of the tutorial challenge_two_submission.ipynb.

On the other hand, we will also tune the parameters so as to avoid "weird" things happening (e.g. K = 1e6 and alpha = 2). In the tutorial mentioned above we also talk about typical / default values for different parameters such as L and K (note that K is referred to as D in the library).
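As a rough check of why such extreme combinations look "weird", one can compare the typical distance travelled during a track with the box size. The sketch below assumes the common convention MSD(t) = 2 d K t^alpha with t in frames; the prefactor used by the library may differ, so treat the result as an order-of-magnitude estimate. The function names are made up for this example.

```python
import math


def rms_displacement(K, alpha, T, dim=2):
    """RMS displacement after T frames, assuming MSD(t) = 2 * dim * K * t**alpha."""
    return math.sqrt(2 * dim * K * T ** alpha)


def box_widths_travelled(K, alpha, T, L, dim=2):
    """RMS displacement expressed in units of the box size L."""
    return rms_displacement(K, alpha, T, dim) / L


L = 1.5 * 128
for K, alpha in [(1.0, 1.0), (10e6, 1.99)]:
    ratio = box_widths_travelled(K, alpha, 200, L)
    print(f"K={K:g}, alpha={alpha}: about {ratio:.3g} box widths in 200 frames")
```

A ratio far above 1 means the particle would cross the box many times within one track, so reflections at the walls dominate what is plotted; a ratio well below 1 means most tracks never reach a wall.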

Hopefully that was useful. Feel free to come back to this thread if you have further questions.

Posted by: gorka.munoz @ Feb. 22, 2024, 4:18 p.m.

Hi, I generated data using the datasets_challenge.challenge_phenom_dataset function with the default settings.

A few things I am still confused about. How do the default settings of challenge_phenom_dataset, and also the public validation data, agree with the constraints set out in the manuscript? For example, the tutorial sets the length to 200 and the tracks in the validation data seem to be at most 200 frames long, although the manuscript says the recording time is T=500, so should we not see tracks much longer than 200 frames? I haven't looked through the entire dataset yet but couldn't find any.

Also still a bit confused about the range of K (or D). The manuscript and models_phenom function have min max of D as 10e-12 to 10e6. Is the entire range not considered for the challenge?
Not sure if the challenge datasets are simply obtained from running the default parameters (since you said the challenge_phenom_dataset function prevents weird things like alpha=2 and D=10e6)? So why the large constraints in the manuscript? Should the datasets not include D values that high?

Thanks

Posted by: selfschuk @ May 3, 2024, 4:13 p.m.

Hi! Indeed, the tutorial and the validation dataset have trajectories of maximum length 200. This was set to ease the management and processing of data by the participants during the first stages of the challenge. Note that in the tutorial we overwrite the default value of T = 500. As a general rule, you can always work on the basis that what is in the paper is what we will use for the final stage of the challenge (the Challenge phase). Hence we recommend that you prepare your methods for those scenarios.

For K and alpha, the entire range is considered for the challenge datasets in all three phases. The default values have been set to generate typical trajectories of relevant biophysical scenarios, mostly as examples.
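For anyone building a training set that covers the whole range, one possible way to draw parameters is sketched below. Everything here is an assumption for illustration: the bounds are placeholders to be replaced with the values in the manuscript, the log-uniform choice for K is just a convenient way to cover many orders of magnitude, and none of this is claimed to match the distribution used for the challenge datasets.

```python
import math
import random


def sample_parameters(n, K_range=(1e-12, 1e6), alpha_range=(0.0, 2.0), seed=None):
    """Draw n (K, alpha) pairs: K log-uniform in K_range, alpha uniform in alpha_range."""
    rng = random.Random(seed)
    log_lo, log_hi = math.log10(K_range[0]), math.log10(K_range[1])
    params = []
    for _ in range(n):
        K = 10 ** rng.uniform(log_lo, log_hi)  # uniform in the exponent
        alpha = rng.uniform(*alpha_range)
        params.append((K, alpha))
    return params


for K, alpha in sample_parameters(5, seed=0):
    print(f"K={K:.3g}, alpha={alpha:.2f}")
```

Each sampled pair could then be passed to whichever generator you use for the corresponding trajectories.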

Posted by: gorka.munoz @ May 6, 2024, 7:03 a.m.

Thanks a lot! Finally, is the size of the bounding box fixed at 230 pixels and the FOV at 128 pixels for all datasets used in the challenge and validation? (Say a particle escapes the FOV and the source code changes its positions to NaNs, such that they are removed by "segs_inside_fov". I assume we do not increase the FOV size to include these, as this is fixed throughout the entire challenge?)

Posted by: selfschuk @ May 16, 2024, 9:04 p.m.