NeurIPS 2022 CausalML Challenge: Causal Insights for Learning Paths in Education Forum


> Task3: scoring metric on public/private leaderboards

Hi,

Just saw your reply on https://codalab.lisn.upsaclay.fr/forums/5626/963/, and want to make sure we are on the same page:

For Task 3, are the public and private scores also different? If yes, how do you define a "partial result" for scoring on the public leaderboard?
Is it to clip and score only a sub-block of the adjacency matrix? Or is it to show only, e.g., the precision instead of the F1-score?
In other words, as you mentioned, the public score '0' (or e.g., *e-06) is "the **actual score** multiplied by a small constant". What, then, is the definition of that "actual score"?

Thanks and regards!

Posted by: overflow @ Oct. 25, 2022, 5:11 p.m.

Hi there,
It is the former ("Is it to clip and score only a sub-block of the adjacency matrix?"), i.e., we mask regions of the adjacency matrix and evaluate only those. The private leaderboard includes edges that were not evaluated on the public leaderboard.

Furthermore, you are correct that the displayed result is `score = F1_score(submission, ground_truth, mask) * tiny_constant`.

Posted by: pawni @ Oct. 25, 2022, 5:17 p.m.
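For concreteness, here is a minimal sketch of what a masked F1 score like the one described above could look like. The function name, the mask semantics (a 0/1 matrix selecting which adjacency entries are scored), and the `tiny_constant` default are assumptions for illustration, not the organizers' actual evaluation code:

```python
import numpy as np

def masked_f1(submission, ground_truth, mask, tiny_constant=1e-6):
    """F1 over only the masked (evaluated) entries of binary adjacency matrices."""
    m = mask.astype(bool)
    pred = submission[m].astype(bool)
    true = ground_truth[m].astype(bool)
    tp = np.sum(pred & true)   # predicted edge, edge exists
    fp = np.sum(pred & ~true)  # predicted edge, no edge
    fn = np.sum(~pred & true)  # missed edge
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return f1 * tiny_constant
```

With a mask covering all entries, a submission with one correct edge and one spurious edge against a single-edge ground truth gives precision 0.5 and recall 1.0, i.e., an F1 of 2/3 before the constant is applied.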

Hi,

Thanks pawni for your quick response!
Just to make sure: here `mask` is a block-diagonal matrix (so that it evaluates a subgraph), rather than a random mask, right?
If yes, is it possible to know exactly which nodes form this subgraph (so that we can use those nodes as a training set and the rest as an unknown test set)?
Or, I guess participants will at least need to know the size of the subgraph, so they can estimate how much the public score would deviate from the final one.

Big thanks!

Posted by: overflow @ Oct. 25, 2022, 5:28 p.m.

To be precise, we are actually only evaluating the existence of certain edges (so think of it as a random mask). The number of edges that are being evaluated is only marginally different (~+50%).

This is because we could only verify a subset of the edges through experimentation on the online learning platform. Nevertheless, we would like the predictions to be reasonable for all edges as we will have a domain expert look at the winning entries.

Posted by: pawni @ Oct. 27, 2022, 1:33 p.m.