the 2nd Anomalous Diffusion (AnDi) Challenge Forum


> Data generation

Hello, I have generated data using the challenge_phenom_dataset function. Its description states that the default values for the various diffusion models have been set to be in the same ranges as the ones expected for the challenge.

However, I end up with trajectories that are 500 frames long, which is not consistent with the details on the website. Is there a problem with the definition of the parameters?

Posted by: eXpensia @ May 21, 2024, 4:20 p.m.

I also noticed that D is drawn from a normal distribution with mean 1 and variance 0.01. And it looks the same as the starting kit dataset.

Posted by: eXpensia @ May 21, 2024, 4:37 p.m.

Hi, thanks for your questions. Here are some clarifications:

> I end up with trajectories that are 500 frames long, which is not consistent with the details on the website. Is there a problem with the definition of the parameters?

In the paper we state that the challenge trajectories will be of maximum 500 frames (see. As we commented in another thread (FOV Boundary Effects), in the tutorials and current phases of the challenge we set T = 200 to ease the management of data. However, you can expect trajectories of up to 500 frames for the final challenge phase. As we say in the thread, if in doubt, follow what is said in the manuscript. Please, could you point us to where you saw the statement of T = 500 not being correct? Thanks!

> I also noticed that D is drawn from a normal distribution with mean 1 and variance 0.01. And it looks the same as the starting kit dataset.

Yes, this is the "base" value for D. However, when we generate data for each experiment, we change that value at will. You can see an example of this in the section "Generating the ANDI 2 challenge dataset" in this tutorial: https://github.com/AnDiChallenge/andi_datasets/blob/master/source_nbs/tutorials/challenge_two_submission.ipynb

Good luck with your submissions!

Gorka

Posted by: gorka.munoz @ May 22, 2024, 8:12 a.m.

Thanks for your answer,

> Please, could you point us to where you saw the statement of T = 500 not being correct?

I saw that on a slide from the seminar: https://docs.google.com/presentation/d/1uAd-hYHdlZpx3v-PJHL0Z6DhaJzse06gz_2wOwnb1qY/edit#slide=id.g2b9091789c9_1_201 which states that the trajectory lengths range from 10 to 200 frames.

What I understand from the paper is that, for each experiment, each particle has an alpha and a K drawn from a Gaussian with certain limits: [0,2] for alpha and [10^-12, 10^6] for K.
However, we have no indication of how the mean and standard deviation are chosen; apparently they are picked by hand rather than randomly.
I believe we should at least have the range over which these values can be chosen, and how far apart the 2 different states can be for the different models.

Posted by: eXpensia @ May 22, 2024, 8:54 a.m.

Hi! Thanks, I have corrected the slide.

Let me clarify: for each experiment, we fix the mean and std of alpha and K for each state. The K and alpha of each particle are drawn from these distributions. Both parameters have the support you mention (alpha is in (0,2) and K is in [10^-12, 10^6]). The mean of alpha and K can be any value within these limits. There is no range for the standard deviation; it can take any positive value. Because we are sampling from a bounded Gaussian, even if the std is huge, there will never be a value beyond the limits of each parameter.
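
As a rough illustration of the bounded Gaussian described above, here is a minimal sketch based on rejection sampling. The function name `sample_bounded_gaussian` and the example means and standard deviations are hypothetical, and the actual andi_datasets implementation may work differently; only the supports for alpha and K come from this thread.

```python
import random

def sample_bounded_gaussian(mean, std, low, high, rng=random):
    """Draw one value from N(mean, std^2) conditioned on low < x < high.

    Hypothetical helper: it redraws until the value falls inside the limits,
    so no sample can ever lie outside them, whatever the std.
    """
    if std <= 0:
        raise ValueError("std must be positive")
    if not (low <= mean <= high):
        raise ValueError("mean must lie within the limits")
    while True:
        x = rng.gauss(mean, std)
        if low < x < high:
            return x

rng = random.Random(0)  # fixed seed so the example is reproducible

# Illustrative values only: a std much wider than the allowed interval.
alphas = [sample_bounded_gaussian(1.0, 5.0, 0.0, 2.0, rng) for _ in range(1000)]
Ks = [sample_bounded_gaussian(1.0, 10.0, 1e-12, 1e6, rng) for _ in range(1000)]

print(min(alphas), max(alphas))  # both stay inside (0, 2)
print(min(Ks), max(Ks))          # both stay inside [1e-12, 1e6]
```

Note that with a very large std the samples become close to uniform over the allowed interval, so the nominal mean and std no longer describe the resulting distribution well.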

On the other hand, we make sure that two consecutive segments belonging to two different states have sufficiently different parameters. However, the exact size of this spacing won't be shared with the participants, and we recommend that you work on the basis that it is as small as possible.

Gorka

Posted by: gorka.munoz @ May 22, 2024, 9:38 a.m.

Thank you for the clarification !

There is still a mistake on the slide: the minimal trajectory length should be 20 to match the paper.
Then, I have a few quick questions:
For each experiment of the final dataset, should T be set to T=500 to match the paper and remain constant (as stated in Appendix A)?
Do the FOV = 128 pixels and L=1.5*FOV change over the different experiments?
Do the other model-dependent parameters change? (r, Pb, and Pu for model 3, T for the model (2 different variables with the same letter btw), for example)
If yes, over which range?

Thanks

Posted by: eXpensia @ May 22, 2024, 12:12 p.m.

Hi, thanks for spotting the typos.

> Do the FOV = 128 pixels and L=1.5*FOV change over the different experiments?

The FOV will always be 128. The L can be tuned to ensure that no boundary effects affect the trajectories.

> Do the other model-dependent parameters change? (r, Pb, and Pu for model 3, T for the model (2 different variables with the same letter btw), for example)
> If yes, over which range?

All parameters change from experiment to experiment. The values are not disclosed because they directly relate to the population amplitudes of the Ensemble task. Pb, Pu and T are probabilities, so they range from 0 to 1. r is unbounded.

Gorka

Posted by: gorka.munoz @ May 22, 2024, 12:37 p.m.

If I understand correctly, we need to generate a training dataset with more than 10 parameters, each varying within certain undisclosed ranges that we have to guess. I'm not asking for the exact values, but rather the ranges of variation. Additionally, certain models have different modes, such as fixed and changing compartment centers, which don't seem to be documented in the paper.
For fairness between teams, could we have a function that randomly generates data, selecting each parameter over the ranges that will be used in the final model evaluations?

Posted by: eXpensia @ May 22, 2024, 7:14 p.m.

Hi! Thanks for the feedback. Let me clarify some points. The goal of the challenge is to have tools that characterize motion changes in trajectories. In that sense, the underlying models are just a way to generate realistic trajectories that produce those motion changes. We are not looking for methods that overfit to our proposed models but rather ones that can generalize to experimental data, where the ground-truth parameters are completely unknown.

Please, if you have a question about a particular parameter range that you believe is important, let us know and we will be happy to clarify it.

We recommend that you create datasets as variations of the starting kit. You can use the pipeline proposed in the tutorial and then tune the parameters. While alpha and K should be changed at will to cover a wide span of values, we recommend using only slight variations of the other given parameters (e.g. M, Pu, Pb, transmittance, radius, and number of traps / compartments...). These have been carefully chosen to give realistic population distributions, i.e. to prevent motion changes from happening too fast or, for instance, to avoid cases in which a particle never reaches a trap in the trapping model. However, it is completely up to you to change them at will in order to test your model against challenging cases, such as very short segments, or K and alpha values very close to each other.
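
The recommendation above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `BASELINE` values are placeholders (not the real starting kit values), the helper `vary_experiment`, the 10% jitter, and the log-uniform choice for the mean of K are all hypothetical, and nothing here calls the andi_datasets API. The resulting dictionaries would still need to be passed to the pipeline from the tutorial.

```python
import random

# Placeholder baseline. Replace these with the values from the starting kit.
BASELINE = {"Pu": 0.1, "Pb": 0.9, "r": 1.0}

def vary_experiment(baseline, rel_jitter=0.1, probabilities=("Pu", "Pb"), rng=random):
    """Return one experiment configuration derived from `baseline`.

    Model-dependent parameters get a slight relative jitter, while the means
    of alpha and K are drawn over their whole supports.
    """
    params = {}
    for name, value in baseline.items():
        new_value = value * (1.0 + rng.uniform(-rel_jitter, rel_jitter))
        if name in probabilities:
            # Probabilities must remain in [0, 1] after the jitter.
            new_value = min(1.0, max(0.0, new_value))
        params[name] = new_value
    # Wide span for alpha (uniform) and K (log-uniform, an assumption).
    params["alpha_mean"] = rng.uniform(0.0, 2.0)
    params["K_mean"] = 10 ** rng.uniform(-12, 6)
    return params

# One independent, reproducible configuration per seed.
experiments = [vary_experiment(BASELINE, rng=random.Random(seed)) for seed in range(5)]
for exp in experiments:
    print(exp)
```

To probe the challenging cases mentioned above, the same idea can be pushed further, for example by deliberately picking the alpha and K of two states very close to each other.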

Posted by: gorka.munoz @ May 23, 2024, 6:31 a.m.

Thanks for all the clarifications !

Posted by: eXpensia @ May 24, 2024, 8:42 a.m.

Hi! We realized that the sentence in the Appendix stating that T = 500 is actually a typo. In the rest of the main text and appendix we have the correct value T = 200. We will update the manuscript. Sorry for the misunderstanding.

Posted by: gorka.munoz @ June 14, 2024, 6:52 a.m.

Thanks for the information !

Posted by: eXpensia @ June 14, 2024, 9:50 a.m.

"Hi! We realized that the sentence in the Appendix stating that T = 500 is actually a typo. In the rest of the main text and appendix we have the correct value T = 200"

Does this mean the trajectories in the final challenge will be of max length 200?

Posted by: SolomonAsghar @ June 29, 2024, 11:40 a.m.

Hi, yes, exactly.

Posted by: gorka.munoz @ June 29, 2024, 6:48 p.m.