The rules state that solutions "must not intentionally boost the diversity metric in an artificial manner, e.g., by adding random text to the beginning of a prompt."
Could you clarify what counts as "an artificial manner"? For example, are the following approaches allowed?
- Running the algorithm from 50 different seeds, so that you get 50 different test cases per behavior instead of one?
- Running the algorithm 50 times and explicitly steering each run away from the test cases already found, so that you get a diverse set of solutions?
- Running an algorithm that explicitly maximizes both the quality of the test cases and the distance between them?
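
For concreteness, the second and third options could look roughly like the sketch below. It is only an illustration under assumptions, not the competition's metric or any particular method: `jaccard_distance`, `select_diverse`, the `quality` callback, and the `weight` parameter are all hypothetical names, and token-level Jaccard distance is just a stand-in for whatever distance the diversity metric actually uses.

```python
from typing import Callable, List, Sequence


def jaccard_distance(a: str, b: str) -> float:
    """Stand-in distance: 1 - (shared tokens / all tokens), on whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def select_diverse(
    candidates: Sequence[str],
    quality: Callable[[str], float],  # hypothetical scorer for a test case
    k: int,
    weight: float = 1.0,  # hypothetical trade-off between quality and novelty
) -> List[str]:
    """Greedily pick up to k candidates, scoring each by quality plus novelty."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def score(c: str) -> float:
            # Novelty is the distance to the closest already-selected test case;
            # with nothing selected yet, every candidate counts as fully novel.
            novelty = min((jaccard_distance(c, s) for s in selected), default=1.0)
            return quality(c) + weight * novelty

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `weight=0` this reduces to picking the highest-quality candidates, while a larger `weight` steers away from test cases that are already selected, which is the behavior the question is about.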