Reinforcement learning shows great potential in decision-making. To promote real-world applications of reinforcement learning, the Jiangsu Association of Artificial Intelligence, together with Polixir, has launched this competition, which aims at learning the best decision-making policies from historical data. Participants are asked to find the best decision plan for the business scenario described below. We hope this competition will accelerate the development of reinforcement learning.
Commodity promotions are fundamental sales activities. Maximizing profit is the direct purpose of designing promotion plans. Meanwhile, it is equally important to maintain a healthy market with no customer discrimination. Artificial intelligence techniques are expected to be useful for achieving this goal, providing effective and fair plans that stimulate customer demand. However, because the market environment is nonstationary and customer behavior is highly uncertain, promotion activities are often met with varying customer feedback, so designing optimal promotion plans remains challenging for current AI techniques.
The competition provides a high-fidelity promotion simulation environment derived from a real business scenario. Because testing plans in a real-world environment is often expensive, participants are asked to learn from historical data on the interactions between promotion activities and customer behaviors, sampled from the simulation environment. A submitted promotion policy is required to be fair to all customers, i.e., to give the same discount rate to everyone. The policy will be tested by interacting with the customers in the simulation environment, and the evaluation score is determined from the customers’ feedback.
This competition is open worldwide. Participants from universities, research institutions, and enterprises, as well as any other individuals, are welcome to sign up.
Note: Employees of the competition organizers who have access to the background, data and simulation of the competition shall not participate in the competition.
Email: offlinerl@polixir.ai
Jiangsu Association of Artificial Intelligence (http://www.jsai.org.cn)
Polixir Technologies, Co. Ltd. (https://polixir.ai)
1. Registration: log in to the competition website and fill in the personal information registration form.
2. A team consists of at most 5 members. One participant can only represent one team.
IMPORTANT FOR TEAMS: Due to the limitations of the Codalab system, each team is asked to register its team information. The staff will set up the team according to this information and turn off the submission privileges of the other team members. Results can only be submitted by the team leader.
3. Participants need to ensure that the registered personal information is complete and true. The organizing committee has the right to cancel the qualification of teams with fake information.
4. The registration deadline is 00:00 (UTC+8), February 11, 2022.
5. Offline data is available for download from 10:00 (UTC+8), December 25, 2021. The development phase starts at the same time.
6. The development phase ends at 00:00 (UTC+8), February 11, 2022. The latest policy uploaded before 00:00 (UTC+8), February 11, 2022, will be used to rank all the teams. The organizers will rank the teams according to their results in the test environment, and announce the preliminary results and the list of teams qualifying for the final phase at 15:00 (UTC+8), February 11, 2022. The top 30% of teams will enter the final phase.
1. Teams entering the final phase can download the newly provided larger user data from the competition platform.
2. The final phase ends at 00:00 (UTC+8), February 27, 2022. The policy uploaded in the "final submission" phase (open between 00:00 (UTC+8), February 27, 2022 and 00:00 (UTC+8), February 28, 2022) will be tested in the test environment. The teams will be ranked according to the test performance, and the results will be announced at 15:00 (UTC+8), February 28, 2022.
1. All participants need to register in the management system.
2. In the competition system, participants can form teams. A team needs to appoint a team leader and should have no more than 5 members. The team name should not exceed 15 characters.
3. Each participant can only join one team. Registering multiple accounts to join multiple teams by one participant will result in disqualification of all relevant teams.
4. The competition allows all kinds of techniques to solve the problems.
5. No external data shall be used.
6. Each team has five chances to submit their policy for evaluation per day. The file size of each submission should not exceed 200MB. The running time has a timeout limit of 30 minutes.
7. The competition organizer reserves the right to update the competition schedule and rules as it deems necessary.
The winning teams need to provide technical reports that explain their solutions, together with their training code and policy models. Teams that fail to provide these materials will be disqualified from the award.
The top 5 teams in the final phase will be awarded certificates, trophies and gifts.
A company is transforming its promotion strategy from personalized to customer-equal promotion. Previously, as shown in Figure 1, different promotion discounts were delivered to different customers. The historical data was collected during this process. Now, for healthier marketing, the company needs a customer-equal incentive promotion strategy, which is depicted in Figure 2.
Figure 1. The data flow of personalized promotion. Different discounts are given to different customers. The offline data comes from the interactions between the promotion activities and customer behaviors during a two-month period.
Figure 2. The data flow of the fair promotion. The same discount is given to all customers.
The offline data was collected over a two-month period and includes the personalized discount promotions issued to each customer as well as the corresponding feedback from the customers. The data is provided as a CSV file. In the development phase, the offline data is collected from 1,000 virtual customers. In the final phase, the offline data of 10,000 virtual customers will be provided.
Note: The data belongs to the organizer. Participants should use the data only for the purpose of this competition.
The data is provided in a CSV file. The columns are explained below:
1. [index]: customer ID.
2. [day_deliver_coupon_num]: number of coupons delivered to the customer in the day. Note that any coupon expires at the end of the day. The data range is {0,1,2,3,4,5}.
3. [coupon_discount]: discount rate of the coupon (invalid when day_deliver_coupon_num is zero). The data range is {0.95,0.9,0.85,0.8,0.75,0.7,0.65,0.6}.
4. [day_order_num]: number of orders the customer made in the day. Coupons are consumed with orders by default. The data range is {0,1,2,3,4,5,6}.
5. [day_average_order_fee]: average fee the customer paid per order before discount in the day. The data range is [0,100].
6. [step]: index of the date for the customer, ranging from 0 to 59.
7. [date]: date, ranging from 2021/03/19 to 2021/05/17.
For example, for customer ID 0, some of the data is shown in the following list:
Taking the first data line as an example, the data shows that on March 19, 2021, the customer received 0 coupons (the discount rate is invalid in this case) and consumed 0 coupons on that day. The next day (the second line), the customer received 2 coupons with a discount rate of 0.65 and placed one order, meaning 1 coupon was consumed; the order amount before discount was 33.5.
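As a rough illustration of how such a file might be read, the sketch below parses rows with Python's standard csv module. It assumes that the CSV header uses the column names listed above without the brackets, and that dates are written as in the stated range. The two inline rows only mirror the two lines described above; the coupon_discount and day_average_order_fee values on the first row are placeholders, since the source does not give them. The name load_rows is hypothetical.

```python
import csv
import io

# Illustrative rows only, mirroring the two lines described for customer 0.
# Placeholder values: coupon_discount (invalid with 0 coupons) and
# day_average_order_fee (no orders) on the first row are assumptions.
SAMPLE_CSV = """index,day_deliver_coupon_num,coupon_discount,day_order_num,day_average_order_fee,step,date
0,0,0.95,0,0.0,0,2021/03/19
0,2,0.65,1,33.5,1,2021/03/20
"""

INT_COLUMNS = ("index", "day_deliver_coupon_num", "day_order_num", "step")
FLOAT_COLUMNS = ("coupon_discount", "day_average_order_fee")


def load_rows(handle):
    """Parse offline data from a file-like object into typed dictionaries."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"date": raw["date"]}
        for name in INT_COLUMNS:
            # int(float(...)) tolerates counts stored as "2" or "2.0".
            row[name] = int(float(raw[name]))
        for name in FLOAT_COLUMNS:
            row[name] = float(raw[name])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
```

With a real download, the same function could be called on an opened file instead of the in-memory sample.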
We expect participants to learn customer-equal promotion policies from the offline data. The policies are tested in the decision-making process shown in Figure 2.
From the offline data, participants may need to define the state of the customers as the input of the policy. The output promotion actions (discount rate and coupon number) will be delivered to all the customers. In the simulation environment, the virtual customers will take their actions after receiving the promotion actions.
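One possible way to organize this is sketched below. It is not the competition's submission interface, which is not described here. The names summarize_state, snap_action and constant_policy are hypothetical, the state features are only an example, and the sketch assumes that the valid actions match the value ranges listed for the offline data.

```python
# Assumption: valid actions match the value ranges listed for the offline data.
COUPON_NUMS = (0, 1, 2, 3, 4, 5)
DISCOUNTS = (0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60)


def summarize_state(rows):
    """Hypothetical state: averages over the given (customer, day) history rows."""
    n = max(len(rows), 1)  # avoid division by zero on an empty history
    return {
        "mean_orders": sum(r["day_order_num"] for r in rows) / n,
        "mean_fee": sum(r["day_average_order_fee"] for r in rows) / n,
    }


def snap_action(coupon_num, discount):
    """Project a raw policy output onto the nearest valid (coupon_num, discount) pair."""
    num = min(COUPON_NUMS, key=lambda c: abs(c - coupon_num))
    rate = min(DISCOUNTS, key=lambda d: abs(d - discount))
    return num, rate


def constant_policy(state):
    """Placeholder policy: ignores the state and returns one action for all customers."""
    return snap_action(1, 0.95)


history = [
    {"day_order_num": 0, "day_average_order_fee": 0.0},
    {"day_order_num": 1, "day_average_order_fee": 33.5},
]
state = summarize_state(history)
action = constant_policy(state)
```

Because a single action is returned per day regardless of the customer, a policy of this shape is customer-equal by construction; a learned policy would replace constant_policy while keeping the same output format.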
A policy is tested interactively in the simulation environment starting from May 18, 2021.
- In the development phase, a policy is evaluated on 1,000 virtual customers in the next 14 days (i.e. from May 18 to May 31). The results will be fed back to the participants, and the test scores will also be displayed on the leaderboard.
- The policy submitted in the final phase is evaluated on 10,000 virtual customers in the next 14 days (i.e. from May 18 to May 31), and the test results will be displayed on the leaderboard.
- The final rank of teams is determined according to their performance on the 10,000 virtual customers in the next 30 days (i.e. May 18 to June 17). The latest policy submitted in the final phase will be evaluated for the final rank.
According to the test results of the policies submitted by the participating teams in the test environment, the leaderboard is updated in real time. It should be emphasized that each team is only allowed to submit once a day.
The goal is to maximize Total_GMV under the constraint that Total_ROI >= 6.5. If the constraint is violated, i.e., Total_ROI < 6.5, the score will be 0. Otherwise, the score is Total_GMV.
Note: in the development phase and the final phase, the leaderboard shows the evaluation result of 14 days, and the final rank is determined by a 30-day evaluation after the final phase.
For one day and one customer, we give the following definitions:
coupon_order_num = min(day_deliver_coupon_num, day_order_num)
coupon_order_fee = coupon_order_num × day_average_order_fee × (1 - coupon_discount)
Per_GMV = day_order_num × day_average_order_fee - coupon_order_fee
Total_GMV = sum of Per_GMV over all the customers in all the 14/30 days
Total_Cost = sum of coupon_order_fee over all the customers in all the 14/30 days
Total_ROI = Total_GMV / max(Total_Cost, 1)
The rules for using coupons are as follows:
(1) Any coupon expires at the end of the day.
(2) A customer can use at most as many coupons as they received that day.
(3) For a customer, a coupon is automatically used with an order, as long as there are coupons available.
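The definitions and coupon rules above can be sketched in Python as follows. This is only an illustration of the stated formulas, not the organizers' evaluation code; the function name evaluate and the record layout (one dictionary per customer per day, keyed by the data column names) are assumptions.

```python
def evaluate(records, roi_threshold=6.5):
    """Compute Total_GMV, Total_Cost, Total_ROI and the score from daily records.

    Each record is one (customer, day) pair keyed by the data column names.
    """
    total_gmv = 0.0
    total_cost = 0.0
    for r in records:
        # Coupons are used automatically with orders, up to the number received.
        coupon_order_num = min(r["day_deliver_coupon_num"], r["day_order_num"])
        # Amount given away through coupons on that day.
        coupon_order_fee = (
            coupon_order_num * r["day_average_order_fee"] * (1 - r["coupon_discount"])
        )
        total_gmv += r["day_order_num"] * r["day_average_order_fee"] - coupon_order_fee
        total_cost += coupon_order_fee
    total_roi = total_gmv / max(total_cost, 1)
    # The score is zero whenever the ROI constraint is violated.
    score = total_gmv if total_roi >= roi_threshold else 0.0
    return {
        "total_gmv": total_gmv,
        "total_cost": total_cost,
        "total_roi": total_roi,
        "score": score,
    }


# The second data line described earlier: 2 coupons at rate 0.65, 1 order of 33.5.
low_roi = evaluate([
    {"day_deliver_coupon_num": 2, "coupon_discount": 0.65,
     "day_order_num": 1, "day_average_order_fee": 33.5},
])

# A made-up record with a mild discount: 1 coupon at rate 0.95, 2 orders of 50.
high_roi = evaluate([
    {"day_deliver_coupon_num": 1, "coupon_discount": 0.95,
     "day_order_num": 2, "day_average_order_fee": 50.0},
])
```

In the first case the GMV is about 21.8 against a cost of about 11.7, so Total_ROI is below 6.5 and the score is 0. In the second, made-up case the cost is about 2.5 for a GMV of about 97.5, so the constraint holds and the score equals Total_GMV.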
Start: Dec. 25, 2021, 2 a.m.
Description: Development phase: Data of 1,000 customers is available for download. Submitted policies are evaluated on the 1,000 customers in 14 days.
Start: Feb. 10, 2022, 4 p.m.
Description: Final phase: Data of 10,000 customers is available for download. Submitted policies are evaluated on the 10,000 customers in 14 days.
Start: Feb. 26, 2022, 4 p.m.
Description: Final Submission: Submitted policies will be evaluated for the final scores on the 10,000 customers in 30 days.
Feb. 27, 2022, 4 p.m.