First, I have to say that it was fun, but I'm glad the competition is over - it was one of the most intense competitions I've taken part in (at least for me). The final results surprised me, and I am very curious to see what approaches the other participants took, mainly because it was an atypical estimation task with no textbook solution.
Specifically, I would love to hear from the following participants:
sarg:
Your results on the public and private sets are almost identical - it's just amazing! I have participated in several competitions and have never seen such a close match. Can you please explain how you chose your validation set, or what you think produced such a match (a small model, perhaps)?
Saak:
You mentioned in one of the comments that you worked together with tak, and from what I have seen you focused on track 2 while he focused on track 1. I guess you shared most of the insights on pre-processing, augmentations, etc. in training the models, but these are still two different tasks. Maybe you can detail how you actually optimized differently for track 1 compared to track 2? I found no smarter way than setting y = 1 for y > 1 and saving checkpoints according to the AUC score (roughly as sketched below).
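For concreteness, this is a minimal sketch of what I mean, assuming NumPy and scikit-learn; only the clipping and the AUC criterion are the point, the names are placeholders:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def to_track1_labels(y_counts):
    """Collapse track-2 people counts (0, 1, 2, ...) to binary occupancy."""
    return np.minimum(y_counts, 1)  # i.e. y = 1 for y > 1

def check_new_best(val_counts, val_scores, best_auc):
    """Compute validation AUC against the binarized labels and report
    whether this epoch's model beats the best checkpoint so far."""
    auc = roc_auc_score(to_track1_labels(val_counts), val_scores)
    return auc > best_auc, auc
```

I'd call check_new_best once per epoch and save a checkpoint whenever it reports an improvement.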
And of course the leaderboard leaders, TrueFit, Harelr, lyzi, Ido_ikar - I'm really interested to know what you did there, and what you think mattered most.
Did you invest more in improving the data or in designing the architectures? Do you have any tips for future competitions?
For my part, I can say that I treated the signal as a "one-dimensional image", so I chose a CNN (ResNet-101). I divided the data into windows with overlaps of 120 samples and chose every 30th window to be part of the validation set (of course, I removed the windows that overlap them from the training set). I applied a number of simple augmentations and was a bit surprised to find that the one with the biggest impact was a cyclic shift in time. The loss function was of course L1 (ordinal classification). I analyzed the data itself quite a bit during the competition and came to the conclusion that the variance of the differences between the antennas is the most significant parameter (movement of people creates multi-path in the room), but I didn't manage to create a hand-crafted-feature-based model that outperforms my deep learning approach, so I stuck with the CNN. A sketch of the windowing, the cyclic shift, and that feature follows.
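Roughly, in NumPy (the window length is hypothetical - only the 120-sample overlap and the every-30th split are what I actually used):

```python
import numpy as np

WIN_LEN = 360  # hypothetical window length; only the 120-sample overlap is real
OVERLAP = 120  # consecutive windows share 120 samples
STRIDE = WIN_LEN - OVERLAP

def make_windows(signal):
    """Slice a 1-D recording into overlapping windows."""
    starts = range(0, len(signal) - WIN_LEN + 1, STRIDE)
    return np.stack([signal[s:s + WIN_LEN] for s in starts])

def split_indices(n_windows, every=30):
    """Send every 30th window to validation and drop its immediate
    neighbours (the ones that overlap it) from the training set.
    Assumes windows two steps apart no longer overlap."""
    val = set(range(0, n_windows, every))
    train = [i for i in range(n_windows)
             if i not in val and (i - 1) not in val and (i + 1) not in val]
    return train, sorted(val)

def cyclic_shift(window, rng):
    """The augmentation that surprised me: roll the window in time."""
    return np.roll(window, int(rng.integers(len(window))))

def antenna_diff_variance(ant_a, ant_b):
    """The hand-crafted feature I found most telling: variance of the
    per-sample difference between two antennas (a multi-path proxy)."""
    return np.var(ant_a - ant_b)
```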
p.s.
Where did Bareket go? He had quite good results during the competition...
Thanks z,
I worked on track 1 a bit, but after teaming up with tak I focused mainly on track 2. I tried to integrate both my own and tak's (#1) solution into my track 2 solution but could not find a way. I think that figuring out how to integrate the track 1 and track 2 solutions would really add to the model.
You could make the model smarter than [ y = 1 if y > 1 ] if you used smart transformations, but you are still stuck (in my case) with a model that never reaches the y = 3 answer. I tried very hard to tweak the model to also reach 3 but had too little time left. Afterwards I hoped the host would allow me, but of course that was not allowed (and should not be, but, if you saw that post, that is what it was about).
fwiw I think even just setting [ y = 2 if y > 1 ] would improve your model (see the toy remap below)
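Read literally, that remap is just this (a hypothetical post-processing step on the predicted counts, nothing more):

```python
import numpy as np

def remap_counts(y_pred):
    # Crude fix for a model that never reaches 3: push every
    # prediction above 1 up to 2 rather than clamping at 1.
    return np.where(y_pred > 1, 2, y_pred)
```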
Like you, I'm waiting for some insight from the leaderboard leaders, but also from MAFAT. They needn't reveal any secrets, but comments such as "We saw a great shakeup because the models failed to xxxx" would really be helpful. Just share some experience or guidance, and next time we can improve. This is often a problem in competitions: you walk away cold afterwards without getting the interaction or feedback that would guide you and help you improve.
A CNN did not do well for me here. My partner tak suggested a somewhat similar approach to yours at some stage. I tried a hand-crafted CNN, but it did not give me the same breakthrough I got from other approaches, so I eventually threw it away.
I also found L-R to be very important. I did a few initial throwaway experiments using simple logistic regression to gauge feature importance and was surprised how important the L-R features were, so I later tried to find smart L-R features. I'm not sure it made a big difference, but you had to use L-R somewhere. In my 1D CNN I used L-median(L), R-median(R), and (L-R)-median(L-R), e.g., to get some sort of non-linear version of L, R, and L-R, fwiw. I think you'd have to do deep distributional math to get it right; this is just a practical stab at it (see the sketch below).
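Something like this, assuming L and R are 1-D NumPy arrays holding the two antenna channels of one window:

```python
import numpy as np

def median_centered_channels(L, R):
    """Stack L, R and L-R, each with its own median subtracted,
    as three input channels for a 1-D CNN."""
    diff = L - R
    return np.stack([
        L - np.median(L),        # L - median(L)
        R - np.median(R),        # R - median(R)
        diff - np.median(diff),  # (L-R) - median(L-R)
    ])
```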
Finally, really big thanks for sharing - I appreciate it.
Your window method sounds complex and smart. I think if you got that right, you'd improve your solution a lot. You'd have to find a way to deal with the overlap without throwing away too much.
Also, these discussions are so helpful, and I appreciate you sharing them. As I mentioned, it would be great if hosts (here and elsewhere) participated in these types of discussions. Also, congrats on your performance here.