NTIRE 2025 Efficient Burst HDR and Restoration Forum


> Beware: fvcore's FlopCountAnalysis severely undercounts FLOPs

To other participants:
Please be aware that the 'FlopCountAnalysis' function used in the provided example code is not reliable.
- Activation functions and trigonometric functions are not counted.
- Element-wise operations such as addition and multiplication are not counted.
- Bias in linear and convolution layers is not counted.
- Many functions in torch.nn.functional are not counted, including 'scaled_dot_product_attention' and 'cosine_similarity'.

This example illustrates it quite well: https://gist.github.com/SimonLarsen/0f79127a02f29ad44ed2a5153cadfac4
Both models are reported to use 1.812 GFLOPs, but in reality the second one exceeds 0.5 TFLOPs.
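For reference, here is a minimal sketch of the failure mode (not the gist itself; the layer sizes and loop count are arbitrary):

```python
import torch
import torch.nn as nn
from fvcore.nn import FlopCountAnalysis

class Cheap(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

class Expensive(Cheap):
    def forward(self, x):
        x = self.conv(x)
        # Hundreds of element-wise and trig ops: real compute,
        # but fvcore only emits "unsupported operator" warnings
        # and adds nothing to the total.
        for _ in range(500):
            x = torch.sin(x) * x + x
        return x

x = torch.randn(1, 3, 256, 256)
print(FlopCountAnalysis(Cheap(), x).total())      # ~0.11 G (the conv only)
print(FlopCountAnalysis(Expensive(), x).total())  # same total reported
```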

For a more realistic example, I have trained a model for this competition that I have manually estimated at around 3.93 TFLOPs. However, fvcore estimates it at just 1.90 TFLOPs.
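As an illustration of how such manual counts can be done, a rough per-layer estimate for scaled_dot_product_attention (a generic formula, not tied to my exact model) looks like this:

```python
# Rough, illustrative FLOP estimate for scaled_dot_product_attention,
# i.e. softmax(Q K^T / sqrt(d)) V, which fvcore does not count at all.
def sdpa_flops(batch, heads, seq_len, head_dim):
    qk = 2 * batch * heads * seq_len**2 * head_dim  # Q @ K^T (mul + add)
    av = 2 * batch * heads * seq_len**2 * head_dim  # attn @ V
    sm = 5 * batch * heads * seq_len**2             # softmax, rough constant
    return qk + av + sm

# Example: one attention layer, batch 1, 8 heads, 4096 tokens, head dim 64
print(sdpa_flops(1, 8, 4096, 64) / 1e9, "GFLOPs")  # ~35 GFLOPs, uncounted
```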

To the organizers:
How will the FLOP count be evaluated for submitted entries?
Manual calculation is of course quite difficult, but automated tools are very imprecise and invite cheating (intentional or not).

Posted by: SimonLarsen @ March 1, 2025, 3:49 p.m.

I completely agree and hope the organizers can update how the computation constraint is calculated. Otherwise it's unfair, since someone could hide computation from FlopCountAnalysis with a few tricks.

Posted by: fire @ March 3, 2025, 5:55 a.m.

We thank the participant for reporting this issue. We have reviewed it and discussed it internally.
In the end, we decided to keep this method, since the limits include not only FLOPs but also the number of parameters.

If there are any other issues, please let us know.

Posted by: Sangmin-Lee @ March 4, 2025, 8:07 a.m.

Thank you for the clarification. However, I'm still slightly unsure what the implications of this are.

My best solution currently on the leaderboard uses 3.95 TFLOPs by my own calculations, and 1.89 TFLOPs according to fvcore. The number of parameters is only ~19 million because FLOPs are the limiting factor.
I could most likely improve the solution by scaling up the number of parameters, but then I would knowingly violate the 4 TFLOP limit. Would that still be considered a valid entry as long as fvcore estimates it below 4 TFLOPs?

Posted by: SimonLarsen @ March 13, 2025, 12:43 p.m.

To ensure a fair comparison between participants, the FLOPs criterion is measured with the code we uploaded to GitHub (specifically, test_demo.py).

Posted by: Sangmin-Lee @ March 14, 2025, 1:35 a.m.