Causal Structure Learning from Event Sequences and Prior Knowledge

Organized by noahlabcausal

Phase 1 starts: Aug. 15, 2023, noon UTC

Competition ends: Nov. 5, 2023, noon UTC

Competition Overview

In this competition, the goal is to solve a causal structure learning problem in AIOps (Artificial Intelligence for IT Operations). In telecommunication networks, anomalies are commonly surfaced as alarms. Network operators may face millions of alarms per day: because of the large scale and interrelated structure of the network, a single fault can trigger a flood of alarms of various types across multiple connected devices. The operators' goal is to quickly localize the failure point to enable fast repair and recovery. However, handling all of these alarms manually is exhausting and can quickly overwhelm the operators, so it must be done intelligently. Recently, there has been increasing interest in tackling this root cause analysis (RCA) problem from a causal perspective, i.e., learning a causal graph that represents the relations among alarms and then using decision-making techniques (such as causal effect estimation and counterfactual inference) to efficiently identify the root cause alarm when a fault occurs. A typical RCA solution for a telecommunication network is depicted in Figure 1.

The competition task can be described as follows: given a series of datasets, for each dataset participants are expected to use the historical alarm data, the device topology, and prior knowledge (if available) to learn a causal graph over the involved alarm types. Each learned causal graph is represented by a binary adjacency matrix, where the element in the i-th row and j-th column equals 1 (resp. 0) to indicate the existence (resp. non-existence) of a directed edge from alarm type i to alarm type j. The ground-truth causal graphs are labeled manually by experts or, for the synthetic datasets, derived from the pre-set causal assumptions. Please note that none of the true causal graphs will be made public during the competition. We also recommend that competitors design a unified learning solution (algorithm) for handling all datasets. While this is not mandatory, the generalization of the submitted solution will be an important aspect of evaluating novelty and will affect the final ranking.
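To make the required output format concrete, here is a minimal sketch (with a hypothetical N and output file name, not official starter-kit code) of constructing and saving such a binary adjacency matrix with NumPy:

```python
import numpy as np

N = 4  # number of alarm types in a hypothetical dataset

# est[i, j] = 1 encodes a directed edge from alarm type i to alarm type j.
est = np.zeros((N, N), dtype=int)
est[0, 1] = 1  # alarm type 0 causes alarm type 1
est[1, 2] = 1  # alarm type 1 causes alarm type 2

# Sanity checks before submission: square, binary, N x N.
assert est.shape == (N, N)
assert set(np.unique(est)) <= {0, 1}

np.save("est_graph.npy", est)  # hypothetical output file name
```

The matrix is intentionally sparse and binary; any post-processing (e.g., thresholding edge scores) should happen before saving.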

Figure 1: RCA solution in a telecom network

Evaluation

We evaluate the submitted causal graphs using a metric we call the g-score, which is defined based on real-world requirements and is used internally at Huawei. We want to identify as many true causal relations and as few false causal relations as possible, while being relatively tolerant of missing some true causal relations (false negatives). This is a rational setting because data limitations cannot guarantee that all causal relations can be found from data, especially in only partially observed real-world scenarios. The g-score for an estimated causal graph is defined in terms of the following quantities:

  • TP (True Positive): a directed edge in the estimated graph that also appears, with the correct direction, in the true graph.
  • FP (False Positive): a directed edge that is in the estimated graph but not in the true graph.
  • FN (False Negative): a directed edge that is in the true graph but not in the estimated graph.

Based on the above definitions, the ranking score of a submission is evaluated as follows:

g-score_k = max(TP_k − FP_k, 0) / (TP_k + FN_k),    rank-score = (1/K) · Σ_{k=1}^{K} g-score_k

where K is the number of datasets. The maximum rank-score is 1.
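As an illustration, the sketch below computes a per-dataset g-score of the form max(TP − FP, 0) / (TP + FN). This is the form implemented in the open-source gCastle library and is shown here as an assumption, not as the official scoring code:

```python
import numpy as np

def g_score(true_graph: np.ndarray, est_graph: np.ndarray) -> float:
    """g-score = max(TP - FP, 0) / (TP + FN) for binary adjacency matrices.

    Assumed form (matches gCastle's gscore metric); not the official scorer.
    """
    tp = int(np.sum((est_graph == 1) & (true_graph == 1)))
    fp = int(np.sum((est_graph == 1) & (true_graph == 0)))
    fn = int(np.sum((est_graph == 0) & (true_graph == 1)))
    return max(tp - fp, 0) / (tp + fn)

# Toy 3-alarm-type example.
true_g = np.array([[0, 1, 1],
                   [0, 0, 1],
                   [0, 0, 0]])
est_g  = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])
# TP = 2, FP = 1, FN = 1  ->  (2 - 1) / (2 + 1) = 1/3
print(round(g_score(true_g, est_g), 4))  # 0.3333
```

Note how a false positive cancels out a true positive, reflecting the stated preference for few false causal relations.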

Rules and Engagement

  1. This competition is supported by Huawei Noah’s Ark Lab, which assists with the competition’s execution and is responsible for disbursing awards to the competition winners.
  2. Participant Conditions: This competition is public, but the competition committee approves each user’s request to participate and may disallow participation at its own discretion. For example, the organizers, their students, and their close family members (parents, spouse, or children), as well as any person who has had access to the ground-truth values or to any information about the data or the competition design that gives them an unfair advantage, are excluded from participation.
  3. Users: Each participant must submit their results or algorithm for the competition on the CodaLab platform.
  4. If you are entering as a representative of a company, educational institution, or other legal entity, or on behalf of your employer, these rules are binding for you individually and/or for the entity you represent or are an employee of. If you are acting within the scope of your employment as an employee, contractor, or agent of another party, you affirm that such party has full knowledge of your actions and has consented thereto, including your potential receipt of a prize. You further affirm that your actions do not violate your employer’s or entity’s policies and procedures.
  5. Teams: Each participant must join one and only one team. Each team may have at most 5 participants. Team formation requests will not be permitted after the date specified on the competition website. The total number of submissions of all the participants in a team must be less than or equal to the maximum number allowed for a team.
  6. Team mergers are allowed and can be performed by the team leader. Team merger requests will not be permitted after the “Team mergers deadline” if such a deadline is listed on the competition website. In order to merge, the combined team must have a total submission count less than or equal to the maximum allowed for a single team. The organizers don’t provide any assistance regarding team mergers.
  7. External data: It is forbidden to use any dataset other than the software and data provided by the competition organizer to develop and test your algorithm and submissions.
  8. Upon being awarded a prize:
    • The prize winner must agree to submit and deliver a technical presentation of their solution to the competition organizer.
    • The prize winner must deliver to the competition organizer the software and data created for the purpose of the competition and used to generate the winning submission and associated documentation written in English. The delivered software and data must be capable of regenerating the winning submission and contain a description of the resources required to build and run the regenerated submission successfully. The prize winner shall sign and return all prize acceptance documents requested by the competition organizer.
  9. If a team wins a monetary prize, the competition organizer will allocate the prize money in even shares between team members unless the team members unanimously contact the competition organizer to request an alternative prize distribution within three business days of the submission deadline.
  10. Competition Integrity Guidelines: In our commitment to maintaining competition integrity and deterring cheating via fraudulent accounts, participants with notably high leaderboard scores will be contacted. We'll request code files for result replication. If the provided code fails to reproduce the leaderboard score or if code files aren't received within 24 hours of notification, the respective score will be voided and a warning issued. Accumulating two warnings will lead to disqualification from participating in this competition. 
  11. Team Formation Guidelines: During Phase 1, participants enjoy the freedom to create teams, limited to a maximum of 5 members per team. Throughout this team formation phase, each participant (account) can independently submit their results. Once Phase 1 concludes, the winner team leaders are required to provide the roster of their team members. Subsequently, in the remaining competition duration, only submissions from team leaders will be accepted, with submissions from other team members prohibited.

These rules are an initial set, and we require participants to consent to a change of rules if there is an urgent need during the competition. If a situation should arise that was not anticipated, we will implement a fair solution, ideally using consensus of participants.

Dataset

This competition includes two types of datasets: artificial datasets and real-world datasets. The real-world datasets are collected from a telecommunication network, while the artificial datasets are generated by our internal data simulators, which are designed using domain expertise. We divide the competition into two phases and provide a total of six datasets over the entire competition: four of the datasets are released in the first phase and the final two are added in the second (final) phase. The assignment of the datasets is shown in Table 1.

Table 1: Dataset assignment

Phase No.   Datasets
Phase 1     3 simulation datasets + 1 real dataset
Phase 2     1 simulation dataset + 1 real dataset

Dataset information given to the competition participants

If you download the datasets from our competition site, you will find that the K datasets are stored in separate directories named from 1 to K, and each dataset fully or partially includes the following data files:

alarm.csv: Historical alarm data

  • Format: [alarm_id, device_id, start_timestamp, end_timestamp]
  • Description: The alarm data file provides historical alarm information. Each row denotes an alarm record containing the alarm ID (i.e., the alarm type), the device where the alarm occurred, the start timestamp, and the end timestamp. For privacy, each alarm ID is encoded as an integer from 0 to N−1, where N is the number of alarm types. Each device ID is likewise encoded as an integer from 0 to M−1, where M is the number of devices.
  • Example:
    alarm_id  device_id  start_timestamp  end_timestamp
    2         28         30684            32416
    10        28         30684            30867
    13        32         30795            32668
    0         35         32215            32867
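For illustration, the records above can be loaded and inspected with pandas as sketched below. In practice one would call pd.read_csv on the dataset's alarm.csv; the column names follow the format above, and the presence of a header row in the actual file is an assumption:

```python
import pandas as pd

# The four example records from alarm.csv above.
alarms = pd.DataFrame(
    [(2, 28, 30684, 32416),
     (10, 28, 30684, 30867),
     (13, 32, 30795, 32668),
     (0, 35, 32215, 32867)],
    columns=["alarm_id", "device_id", "start_timestamp", "end_timestamp"],
)
# In practice: alarms = pd.read_csv("1/alarm.csv")

n_types = alarms["alarm_id"].nunique()            # distinct alarm types seen
per_device = alarms.groupby("device_id").size()   # alarm count per device
duration = alarms["end_timestamp"] - alarms["start_timestamp"]

print(n_types, per_device.to_dict())  # 4 {28: 2, 32: 1, 35: 1}
```

Simple summaries like these (alarm frequency per type/device, alarm durations) are a typical first step before running a causal discovery algorithm on the event sequences.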

topology.npy (Optional): The connections between devices.

  • Format: an M ×M NumPy array, with M being the number of the devices in the network.
  • Description: This NumPy file stores the binary symmetric adjacency matrix of the network topology, which is an undirected graph. The element in the i-th row and j-th column of the matrix equals 1 (resp. 0) to indicate the existence (resp. non-existence) of an undirected link between device i and device j.

causal_prior.npy (Optional): Prior knowledge indicating definite causal relation information.

  • Format: An N × N NumPy array, where N is the number of the alarm types.
  • Description: Similar to the topology, causal_prior.npy stores an adjacency matrix that partially represents the true causal alarm graph. The prior information is labeled manually by experts or, for the synthetic datasets, derived from the pre-set causal assumptions. The element in the i-th row and j-th column of the matrix equals 1 (resp. 0 / −1) to indicate the existence (resp. non-existence / uncertainty) of a directed edge from alarm type i to alarm type j.
  • Example: N = 3, see Figure 2
    Figure 2: Causal Prior
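As an illustration of one possible use of this prior, the sketch below (apply_prior is a hypothetical helper, not part of the competition code) overwrites an estimated graph wherever the prior is certain and leaves the −1 entries to the learning algorithm:

```python
import numpy as np

def apply_prior(est: np.ndarray, prior: np.ndarray) -> np.ndarray:
    """Overwrite the estimate wherever the prior is certain (1 or 0);
    entries with prior == -1 are left to the learning algorithm."""
    out = est.copy()
    out[prior == 1] = 1   # expert-confirmed edges must be present
    out[prior == 0] = 0   # expert-rejected edges must be absent
    return out

# Toy prior for N = 3 alarm types (-1 marks uncertain entries).
prior = np.array([[ 0,  1, -1],
                  [ 0,  0, -1],
                  [-1,  0,  0]])
est = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 0, 0]])
print(apply_prior(est, prior))
# [[0 1 1]
#  [0 0 1]
#  [1 0 0]]
```

More sophisticated approaches would feed the prior into the discovery algorithm itself (e.g., as hard constraints on the search space) rather than patching the output afterwards.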

rca_prior.npy (Optional): Prior knowledge including some simplified fault snapshots and the corresponding RCA results.

  • Format: [simplified_snapshot, simplified_root_cause]
  • Description: In a real-world RCA scenario, a fault snapshot contains detailed information on the network state (a series of alarms with occurrence time and location information) within the period in which a fault occurs. A simplified fault snapshot compresses this network state into a list of alarm types, ignoring occurrence time, location, and sequential information; the corresponding RCA result is simplified in the same way. Because such knowledge is reusable, simplified snapshots together with simplified RCA results (which can also be regarded as RCA rules) are a common way to store large numbers of raw RCA cases in the AIOps field.
  • Example: see Figure 3
    Figure 3: RCA Prior
  • Notes:
  1. Each RCA prior record contains a single root cause.
  2. In a snapshot, multiple independent causal alarm chains may exist, resulting in multiple root cause alarm types. However, we select the root cause alarm type that influences the largest number of other alarm types as the sole root cause in the corresponding RCA prior record. For instance, consider a snapshot with two independent causal chains: (1) A->B->C, B->D; (2) E->F->G. Since alarm type 'A' influences more alarm types than alarm type 'E', the 'simplified_root_cause' of 'simplified_snapshot = (A, B, C, D, E, F, G)' is 'A'.
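The selection rule described in note 2 can be sketched as follows (pick_root_cause is a hypothetical helper, with chains represented as directed edge lists):

```python
from collections import defaultdict

def pick_root_cause(edges):
    """Among source nodes (those with no incoming edges), return the one
    that can reach the largest number of other alarm types."""
    children = defaultdict(set)
    nodes, has_parent = set(), set()
    for u, v in edges:
        children[u].add(v)
        nodes.update((u, v))
        has_parent.add(v)
    roots = nodes - has_parent  # root candidates of the independent chains

    def reach(start):
        # Depth-first traversal counting distinct reachable alarm types.
        seen, stack = set(), [start]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return len(seen)

    return max(roots, key=reach)

# Chains from the example above: A->B->C, B->D and E->F->G.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "F"), ("F", "G")]
print(pick_root_cause(edges))  # A  (reaches 3 alarm types; E reaches only 2)
```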

It is essential to note that each dataset is causally independent of the others; hence, no information should be exchanged among datasets when executing the causal discovery tasks.

Organizers

The organizing team consists of experts with a range of different backgrounds from industry and academia.

  • Keli Zhang (Principal Engineer, Huawei Noah’s Ark Lab)
  • Ruichu Cai (Full Professor, Guangdong University of Technology)
  • Kun Kuang (Associate Professor, Zhejiang University)
  • Jiale Zheng (Senior Engineer, Huawei Noah’s Ark Lab)
  • Marcus Kalander (Senior Engineer, Huawei Noah’s Ark Lab)
  • Junjian Ye (Senior Engineer, Huawei Noah’s Ark Lab)
  • (University College London)
  • (Principal Researcher, Huawei Noah’s Ark Lab)
  • Lujia Pan (Expert, Huawei Noah’s Ark Lab)

Contact

noahlabcausal@huawei.com

Awards

Our competition provides cash prizes and electronic certificates for the winners. The total prize amount is $10,000 (USD).

  • 1st place: $3,000.
  • 2nd and 3rd place: $2,000 each.
  • 4th, 5th, and 6th place: $1,000 each.

Schedule and readiness

  • August 1, 2023: Competition opens.
  • August 8, 2023: Sample dataset and demonstration code released.
  • August 15, 2023 (updated): Phase 1 starts and the submission system opens.
  • October 7, 2023: Registration and team formation end.
  • October 8, 2023: Phase 1 ends and the submission system closes.
  • October 16, 2023: Phase 2 starts and the submission system opens.
  • November 5, 2023: Phase 2 ends.
  • November 11 – November 21, 2023: Material review stage.
  • November 22, 2023: Winning teams are announced.

Notices

  • [11/23] Winners Announced: We are delighted to announce the final winners of the NeurIPS 2023 CSL competition: Winners! Congratulations to all the winning teams, and a heartfelt thank you to every participant for your valuable contributions. In light of your exceptional achievements, we kindly request that each winning team prepare to share their solution at the NeurIPS 2023 competition workshop scheduled for December 15th. Detailed guidelines for the solution presentations will be released on November 28th.

  • [11/6] Material Review Stage: Congratulations to all participants who have successfully advanced to Phase 2 and have now reached the final material review stage. To ensure a smooth and efficient process, we kindly request the following materials: Material List and Specifications. To further aid your understanding of our final ranking score calculation, we have provided a comprehensive guide outlining the detailed scoring rules for the final phase: Final Ranking Calculation Rule. Submission deadline: please submit the requested materials to noahlabcausal@huawei.com for review no later than 23:59 (AoE) on November 11. All candidates must meet this deadline to maintain their eligibility for the final ranking; failure to do so may result in forfeiture of your qualification. We sincerely appreciate your dedication and continued participation, and we extend our best wishes for success in the final phase of the competition!
  • [10/28] Cheating Acts: We emphasize that competition integrity is our top priority.  Any confirmed cheating acts, such as attempting to improve your ranking through multiple accounts, will result in disqualification from the next stage. Please monitor your provided competition email regularly during Phase 2 as we may request code files for result replication and seek clarifications related to potential cheating. We appreciate your understanding and commitment to upholding competition integrity.
  • [10/28] Extended Phase 2 End Date: The Phase 2 end date has been extended to November 5th. This extension allows us to ensure a fair and comprehensive competition.  On November 3rd, we will release detailed rules for the final material review. The final ranking will consider both the Phase 2 leaderboard scores and the results of the final material review, which will evaluate code replication and generalization of solutions. To ensure fairness, we will use a private internal test dataset to validate the generalization of final solutions. High leaderboard scores alone won't guarantee superiority.
  • [10/17] Dataset Update & Improved RCA Prior Description: For dataset ID: 6, we have deleted the 'topology.npy' file and added the 'causal_prior.npy' file in response to the issue raised in the forum (Link). Participants who have qualified for Phase 2 can now re-download the Phase 2 dataset  from the 'Participate/Datasets' page.  Additionally, to improve participants' utilization of RCA prior knowledge, we have enhanced the description (please visit 'Learn the Details/Dataset' page) to provide a more comprehensive understanding of this prior type.
  • [10/17] Submission System 'BadZipFile' Issue Resolved: We've successfully addressed the 'BadZipFile' problem in our submission system, originating from Codalab, which may have affected result submissions. Normal system operations have now resumed. We apologize for any inconvenience caused.
  • [10/16] Phase 2 Datasets Released: In Phase 2, we will calculate ranking scores based on three distinct datasets(Link). These datasets comprise the real-world dataset used in Phase 1 (dataset ID: 4) in addition to two new datasets, one artificial and one real-world.  The submission format for Phase 2 remains consistent with that of Phase 1. It's essential to note that the ultimate winners of the CSL competition will be determined based on the ranking score on the Phase 2 leaderboard and the final material review. Further details regarding the final material review will be provided prior to the conclusion of Phase 2.
  • [10/13] Congratulations to the teams that have successfully qualified for Phase 2.  You can access the final list of qualifying teams by following this link:  Phase2 Finalists.  If you have any questions regarding the finalists or other suggestions, please don't hesitate to contact us at noahlabcausal@huawei.com.
  • [10/09] Congratulations to the participants who have qualified as candidates for Phase 2 : Candidate List(Link).  To assist the organizers in finalizing the participant list for Phase 2,  we kindly request the following materials: Material Review(Link). Candidates must submit the requested materials to noahlabcausal@huawei.com for review by 23:59 (AoE) on  October 11. Please note that candidates who fail to meet this deadline will forfeit their qualification to participate in the second phase of the competition. The final list of qualified participants for Phase 2 will be announced on October 13.
  • [09/28]  First Stage Deadline Extended: In light of a one-week delay in the initial stage's commencement, the competition organizing committee is extending the deadline for the first stage to October 8th. Qualification for Phase 2 will be determined based on participants' latest leaderboard scores as of October 8th. It's essential to note that all participants with a leaderboard g-score exceeding 0.5 on October 8th will be considered eligible for the next stage. On October 9th, we will be officially announcing the list of candidates who have advanced to the second stage of the competition. For these participants, it is mandatory that they submit their code for the executable, reproducible solution, specify their affiliated organization, and provide teaming information. These details will serve as critical factors in the final selection process for the second stage.

  • [08/24] Team Formation Guidelines: During Phase 1, participants enjoy the freedom to create teams, limited to a maximum of 5 members per team. Throughout this team formation phase, each participant (account) can independently submit their results. Once Phase 1 concludes, the winner team leaders are required to provide the roster of their team members. Subsequently, in the remaining competition duration, only submissions from team leaders will be accepted, with submissions from other team members prohibited.
  • [08/17] Competition Integrity Guidelines (Important) : In our commitment to maintaining competition integrity and deterring cheating via fraudulent accounts, participants with notably high leaderboard scores will be contacted. We'll request code files for result replication. If the provided code fails to reproduce the leaderboard score or if code files aren't received within 24 hours of notification, the respective score will be voided and a warning issued. Accumulating two warnings will lead to disqualification from participating in this competition. We appreciate your understanding!
  • [08/16] Leaderboard Update: The leaderboard has returned to normal. You can proceed with submitting your results. We appreciate your patience.
  • [08/16] Leaderboard Display Issue: Regrettably, our leaderboard is currently experiencing display problems. Our team is diligently resolving this to ensure accuracy. Please hold off on result submissions until the issue is resolved.
  • [08/15] Release of Phase 1 Datasets: We're excited to announce the availability of datasets for Phase 1. Instructions for the submission process can be found in the provided notebook file or starter kit.

  • [08/08] Phase 1 Start Date Update: Due to data privacy policies, the start date for Phase 1 has been rescheduled. To support participants in their preparations, a sample dataset is now accessible for reference.

Phase 1

Start: Aug. 15, 2023, noon

Description: Phase 1: create models and submit them, or directly submit results on validation and/or test data; feedback is provided on the validation set only.

Phase 2

Start: Oct. 16, 2023, 1:15 p.m.

Description: Final phase: the ultimate winners will be determined based on the ranking score on the Phase 2 leaderboard and the final material review.

Competition Ends

Nov. 5, 2023, noon
