VQualA 2025 Face Image Quality Assessment Challenge Forum


> Updated evaluation scheme for the Testing Phase

Dear Participants,

We have received concerns about label leakage due to the release of the test data. Therefore, instead of submitting predicted labels, participants should submit their network definition according to the interface given in the starting kit. test.py from the Development Phase starting kit will be run on the server side to generate the predicted scores on the test set. The server-side test script will be run using the public docker image smasc/codalab-custom:latest, which includes a Python environment with PyTorch (including torchvision), scipy, and fvcore (for GFLOPs and params estimation) installed.
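For reference, GFLOPs and parameter counts can be estimated locally with fvcore along these lines (a minimal sketch using torchvision's MobileNetV2 as a stand-in model; the 224x224 input size is an assumption, not the official evaluation setting):

import torch
from fvcore.nn import FlopCountAnalysis, parameter_count
from torchvision.models import mobilenet_v2

model = mobilenet_v2().eval()        # stand-in; substitute your submission network
dummy = torch.randn(1, 3, 224, 224)  # assumed input size

flops = FlopCountAnalysis(model, dummy)
print(f"GFLOPs: {flops.total() / 1e9:.3f}")
print(f"Params (M): {parameter_count(model)[''] / 1e6:.3f}")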

The "starting kit" for the Testing Phase has been uploaded, which provides an example submission.
Since this is the first time we try running the inference code on the server side, please feel free to contact us for any concerns/suggestions about this new evaluation scheme. Thanks!

Best,
FIQA Track Organizers

Posted by: sizhuoma @ June 5, 2025, 8:44 p.m.

Dear Organizers,

Would it be possible to provide the timm package in the running environment?

Additionally, I have a question regarding the image loading process. Is the following code used?

from PIL import Image
from torchvision import transforms

x = Image.open(x_path).convert('RGB')
transform = transforms.ToTensor()
x = transform(x).unsqueeze(0)

We need to understand the image loading and preprocessing pipeline to determine if any additional preprocessing steps are required on our side.

Best regards

Posted by: Caesar_D @ June 28, 2025, 10:20 a.m.

Thanks for the question!
(1) The image preprocessing pipeline is as follows (you can find this in test.py in the development-phase starting kit; a simplified sketch of the dict-based transforms follows after this list):

img = Image.open(img_path).convert('RGB')
arr = np.asarray(img).astype('float32') / 255.0   # scale pixel values to [0, 1]
arr = np.transpose(arr, (2, 0, 1))                # HWC -> CHW
sample = {'d_img_org': arr, 'score': np.array([score], dtype=np.float32)}
if self.transform:
    sample = self.transform(sample)

where transform = transforms.Compose([Normalize(0.5, 0.5), ToTensor()]).
(2) We will update the docker image to include timm very soon.
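In essence, those dict-based transforms do something like the following (a simplified sketch; see the development-phase starting kit for the exact code):

import torch

class Normalize:
    # Normalizes the image array in the sample dict: (x - mean) / std.
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def __call__(self, sample):
        sample['d_img_org'] = (sample['d_img_org'] - self.mean) / self.std
        return sample

class ToTensor:
    # Converts the numpy arrays in the sample dict to torch tensors.
    def __call__(self, sample):
        sample['d_img_org'] = torch.from_numpy(sample['d_img_org'])
        sample['score'] = torch.from_numpy(sample['score'])
        return sample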

Please don't hesitate to raise any questions/concerns and we will try to address them as soon as possible!

Posted by: sizhuoma @ June 28, 2025, 2:32 p.m.

Thanks for the reply! Would it be possible to simplify the preprocessing pipeline so that we can define the transform function directly in our model.py file? For example, could you remove steps such as arr = np.asarray(img).astype('float32') / 255.0, arr = np.transpose(arr, (2, 0, 1)), and Normalize(0.5, 0.5) from the current preprocessing?

Posted by: Caesar_D @ June 28, 2025, 2:49 p.m.

Makes sense. We will update this very soon.

Posted by: sizhuoma @ June 28, 2025, 2:57 p.m.

Dear Participants:
As suggested by several participants, we have made the following changes to the evaluation pipeline:
(1) We added timm to the testing docker image.
(2) We rewrote the preprocessing pipeline for the input image as follows (this change is also reflected in the updated testing-phase starting kit; see the note after the snippet):

from PIL import Image
from torchvision import transforms

x = Image.open(x_path).convert('RGB')
transform = transforms.ToTensor()
x = transform(x).unsqueeze(0)
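Note that with this change, any resizing or normalization your model needs must now happen in your own code, for example at the start of forward(). A minimal sketch (the 224x224 size, the mean/std of 0.5, and self.backbone are placeholders, not official requirements):

import torch.nn.functional as F

def forward(self, x):
    # x: (N, 3, H, W) float tensor in [0, 1], as produced by ToTensor()
    x = F.interpolate(x, size=(224, 224), mode='bilinear', align_corners=False)
    x = (x - 0.5) / 0.5   # placeholder normalization
    return self.backbone(x)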
Please don't hesitate to raise any questions/concerns regarding the evaluation process! We will try our best to address them as soon as possible.
Best,
Organizers

Posted by: sizhuoma @ June 28, 2025, 4:50 p.m.

Dear Organizers,

Could you please provide the performance of a baseline model on both the validation and test sets? This would help us better understand the performance gap between the two sets.

Posted by: Caesar_D @ June 29, 2025, 3:58 p.m.

Below are the results for a MobileNetV2 baseline (the same one released in the development-phase starting kit, but with num_crops=1 to meet the FLOPs requirement):

        Score        SROCC        PLCC
valid:  0.9215 (22)  0.8993 (22)  0.9436 (21)
test:   0.8279 (3)   0.8285 (3)   0.8272 (3)
Hope this helps. Thanks!

Best,
Organizers

Posted by: sizhuoma @ June 29, 2025, 5:21 p.m.

Is it possible to preserve the standard order of operations in the preprocessing pipeline?
If ToTensor() is already applied on your side, it means we are left to handle the rest ourselves, such as Resize() and Normalize(). However, changing the order of these operations can affect model performance.

The best solution would be to respect the standard preprocessing sequence:
transforms.Compose([
    transforms.Resize(...),
    # any other augmentation
    transforms.ToTensor(),
    transforms.Normalize(...),
])

Posted by: hamidi.alii1990 @ July 1, 2025, 3:40 p.m.

Also, we cannot apply the standard PyTorch transforms inside forward(). What would you suggest for this problem?

Posted by: hamidi.alii1990 @ July 1, 2025, 4:28 p.m.

Thanks hamidi.alii1990 for pointing this out. We agree that keeping the order of the transforms best preserves performance; in particular, the resize operation should ideally be applied to PIL Images.
We have updated the testing code to allow for a custom transform:
- If your model class has an attribute "self.custom_transform", it will be used to preprocess the PIL Images into the torch.Tensor input to the forward() method.
- If your model class does not have "custom_transform", torchvision.transforms.ToTensor() will be applied as before. This preserves compatibility with the previous testing code: if your code worked before, you can rest assured that no changes are needed.
The updated testing-phase starting kit provides an example of the custom_transform.
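For illustration, a minimal sketch of a model that declares such an attribute (the backbone, input size, and mean/std here are placeholders; see the starting kit for the actual example):

import torch.nn as nn
from torchvision import transforms

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Placeholder backbone; substitute your actual network.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, 1),
        )
        # Picked up by the server-side test script and applied to each
        # PIL Image before forward() is called.
        self.custom_transform = transforms.Compose([
            transforms.Resize((224, 224)),   # resize while still a PIL Image
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
        ])

    def forward(self, x):
        return self.backbone(x)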

Posted by: sizhuoma @ July 1, 2025, 6:40 p.m.

Dear Organizers,

We noticed that when using the custom_transform mode, the GFLOPs are approximately 10 times higher than the previous results. Could you please help check the reason for this discrepancy?

Posted by: Caesar_D @ July 2, 2025, 7:46 a.m.

Yes, I have the same problem.
The GFLOPs have increased compared to the previous results after applying the custom transformation.

Posted by: hamidi.alii1990 @ July 2, 2025, 8:14 a.m.

There is a bug where the custom transform is not correctly applied in the GFLOPs calculation. This will be fixed soon. Sorry for the confusion!

Posted by: sizhuoma @ July 2, 2025, 11:01 a.m.

The bug has been fixed. Please try submitting your code again if your GFLOPs were measured incorrectly.

Posted by: sizhuoma @ July 2, 2025, 11:33 a.m.