> Reinforcement Learning

The competition rules state that the model can run for 10 minutes before it generates the first action. Is there any way that our model can run on the new data set before the 10 minute clock starts?

Posted by: james.maher @ Jan. 28, 2023, 12:16 a.m.

@james.maher, thank you for asking about this.
I just wanted to make sure I understand the question. When you refer to running on the new data set, does this mean that the algorithm can see the current scenario that is being evaluated before the episode starts (e.g. for training purposes)?
If so, the answer is no. The first time the algorithm "sees" the new scenario for the upcoming episode, it has at most 10 minutes before the first action is required. The algorithm cannot see the upcoming scenario before the 10 minute clock starts.
I'd be happy to elaborate if this is still unclear. Thanks again for your interest!

Posted by: abeckus @ Jan. 30, 2023, 4:41 p.m.

Yes, I was trying to see if there is any way to train on these scenarios offline. It sounds like the algorithm has to be general enough to work on any solution space, with at most 10 minutes to train. This seems to eliminate reinforcement learning as a possibility for these scenarios.

Posted by: james.maher @ Jan. 31, 2023, 3:31 a.m.

We agree: this task requires that the algorithm be quite general, and be able to handle any map in the solution space without time to train on that particular map. The only option to train ahead of time is to generate a training set consisting of randomly generated maps. We provide the generator code to do this (see https://airliftchallenge.com/chapters/ch5_gen/main.html), but there will be a lot of variation in the maps. We realize this will limit the types of algorithms, especially given the time frame of the competition.
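To make the "train on randomly generated maps" idea concrete, here is a hedged sketch of that training-loop structure. The map generator and policy below are stand-ins invented for illustration, NOT the competition's generator or API (see the linked docs for the real one); the point is only that each episode samples a fresh random route network, so any learned policy must generalize across maps rather than memorize one.

```python
import random

def generate_random_map(rng, n_airports=6, p_edge=0.5):
    """Stand-in generator (hypothetical, not the competition's):
    random symmetric reachability between airports."""
    routes = {i: set() for i in range(n_airports)}
    for i in range(n_airports):
        for j in range(i + 1, n_airports):
            if rng.random() < p_edge:
                routes[i].add(j)
                routes[j].add(i)
    return {k: sorted(v) for k, v in routes.items()}

def rollout(routes, policy, rng, start=0, goal=None, max_steps=20):
    """Run one episode on a given map; reward delivery at the goal
    airport and penalize each flight leg slightly."""
    goal = goal if goal is not None else len(routes) - 1
    s, total = start, 0.0
    for _ in range(max_steps):
        acts = routes[s]
        if not acts:  # isolated airport: episode dead-ends
            break
        s = policy(s, acts, rng)
        total += 1.0 if s == goal else -0.1
        if s == goal:
            break
    return total

def random_policy(s, acts, rng):
    """Placeholder policy; a learner would be updated between rollouts."""
    return rng.choice(acts)

if __name__ == "__main__":
    rng = random.Random(0)
    # Domain-randomized training loop: a brand-new map every episode.
    rewards = [rollout(generate_random_map(rng), random_policy, rng)
               for _ in range(100)]
    print(sum(rewards) / len(rewards))
```

The key design point is that the map is resampled inside the loop, which is exactly what makes the task hard: the policy never gets repeated exposure to any single network.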

We do hope to hold future iterations of this competition. One idea we've considered is to use a particular map/network, and then have episodes centered on that map. Then, when the 10 minute timer starts, the algorithm would already be trained on the map, and could just consider the cargo, schedules, and airplanes specific to that scenario.

If you have further suggestions for how this could be structured to support reinforcement learning, and would be willing to share, we'd appreciate any feedback.

Posted by: abeckus @ Jan. 31, 2023, 5:17 p.m.

I'm going to attempt to build a reinforcement learning model with the data I have available for this scenario and we'll see how well it performs.

My recommendation is to stabilize the map. In a real-world scenario, I would not expect the number or locations of airports to change. With a stable map, I could use deep Q-learning over a discrete set of states. I would also allow for longer training times. These could be made realistic, e.g., 8 hours until the first sortie needs to be generated.
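To illustrate why a fixed map helps: once the route network is stable, the state space becomes small and discrete, and value-based methods apply directly. The toy below uses *tabular* Q-learning on a hypothetical 5-airport map (everything here is invented for illustration, not the competition environment); deep Q-learning generalizes the same update when the state space grows too large for a table.

```python
import random

# Hypothetical fixed route network: airport -> reachable airports.
ROUTES = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
}
GOAL = 4  # deliver the cargo to airport 4

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning: state = current airport, action = next airport."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in acts} for s, acts in ROUTES.items()}
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # step limit per episode
            acts = ROUTES[s]
            # Epsilon-greedy action selection.
            a = rng.choice(acts) if rng.random() < eps else max(Q[s], key=Q[s].get)
            r = 1.0 if a == GOAL else -0.1  # reward delivery, penalize flight legs
            nxt_best = 0.0 if a == GOAL else max(Q[a].values())
            Q[s][a] += alpha * (r + gamma * nxt_best - Q[s][a])
            s = a
            if s == GOAL:
                break
    return Q

def greedy_path(Q, start=0):
    """Follow the learned Q-values greedily from the start airport."""
    path, s = [start], start
    while s != GOAL and len(path) < 10:
        s = max(Q[s], key=Q[s].get)
        path.append(s)
    return path

if __name__ == "__main__":
    Q = train()
    print(greedy_path(Q))  # a shortest route, e.g. [0, 1, 3, 4]
```

Because the map never changes, the Q-table learned offline remains valid at evaluation time; only the episode-specific details (cargo, schedules, airplanes) would still need to be handled within the 10-minute window, which is the structure suggested above.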

Thanks!

Posted by: james.maher @ Feb. 1, 2023, 9:21 p.m.

Great, thank you for giving it a shot.

And thanks a lot for the feedback!

Posted by: abeckus @ Feb. 3, 2023, 4:49 p.m.