Before using Rankings Reloaded, make sure that you understand its concepts. The definition section gives brief explanations of the input components, and the data structure section describes how the input data file is organized. Once you understand these concepts, you are ready to prepare your own data file.
To generate your benchmarking report, your data must be provided as a CSV file. For example, in a challenge/problem with two tasks, two test cases, and two algorithms, the data might look like this:
| Task | Case | Algorithm | Metric Values |
|---|---|---|---|
| Task 1 | case1 | A1 | 0.27 |
| Task 1 | case1 | A2 | 0.20 |
| Task 1 | case2 | A1 | 0.57 |
| Task 1 | case2 | A2 | 0.95 |
| Task 2 | case1 | A1 | 0.37 |
| Task 2 | case1 | A2 | 0.89 |
| Task 2 | case2 | A1 | 0.91 |
| Task 2 | case2 | A2 | NA |
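For reference, the table above could be produced as an actual CSV file, for example with a short Python/pandas script. This is only an illustrative sketch: the column header names (task, case, algorithm, metric_value) and the output file name are assumptions, not names prescribed by Rankings Reloaded; see the sample_data.csv mentioned below for the exact format.

```python
# Minimal sketch: build the example data set above and save it as a CSV file.
# The column header names are illustrative; check sample_data.csv for the
# exact headers expected by Rankings Reloaded.
import pandas as pd

rows = [
    ("Task 1", "case1", "A1", 0.27),
    ("Task 1", "case1", "A2", 0.20),
    ("Task 1", "case2", "A1", 0.57),
    ("Task 1", "case2", "A2", 0.95),
    ("Task 2", "case1", "A1", 0.37),
    ("Task 2", "case1", "A2", 0.89),
    ("Task 2", "case2", "A1", 0.91),
    ("Task 2", "case2", "A2", None),  # missing prediction -> missing observation
]
df = pd.DataFrame(rows, columns=["task", "case", "algorithm", "metric_value"])

# Write the file; the missing value is written as "NA" (a blank field would also work).
df.to_csv("my_challenge_data.csv", index=False, na_rep="NA")
```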
The column structure of your data must be the same as in the sample above. The columns represent:
- A task identifier. The ranking analysis may be performed for different tasks, which are defined via this column. A task may, for example, refer to a subtask of the challenge or to a different metric being analyzed (see the remark below).
  Remark: If you are comparing different metrics, make sure that they are ordered in the same way! The toolkit requires a boolean specification of whether small values are better, i.e., the sort direction of the metric values. Example: the first metric is the Dice similarity coefficient (DSC) (range: [0, 1]), which should be compared to rankings based on the volumetric difference (range: [0, x], x not specified). High DSC values (close to 1) correspond to high performance, whereas high values of the volumetric difference correspond to low performance. To resolve this and make the comparison work as a multi-task challenge, you can invert the DSC values so that high values also correspond to low performance (see the sketch after this list).
- A case identifier. This column should contain all cases (images) used in the challenge/benchmarking experiment. Make sure that each case appears only once per algorithm and task.
- An algorithm identifier. This column should contain all algorithms/methods to be compared. Each algorithm should appear exactly once per case.
- The calculated metric values. For a single metric, one value should appear for each algorithm and each case. Missing metric values must be provided as missing observations (either a blank field or "NA"); otherwise, the system cannot generate the report. For example, in task "Task 2", test case "case2", algorithm "A2" did not give a prediction, so NA is inserted in the table above to denote the missing value. Rankings Reloaded will ask you how to handle such missing cases (e.g., by assigning them the worst possible metric score) during report generation.
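As a concrete illustration of the remark above, the following sketch shows one way to invert DSC values so that both metrics share the same sort direction, and to check that each (task, case, algorithm) combination appears only once. The task names, column names, and file name are hypothetical; this preprocessing happens before uploading your CSV and is not part of Rankings Reloaded itself.

```python
# Sketch: align the sort direction of two metrics before treating them as
# separate tasks. The task, column, and file names used here are hypothetical.
import pandas as pd

df = pd.DataFrame(
    {
        "task":         ["DSC", "DSC", "VolDiff", "VolDiff"],
        "case":         ["case1", "case1", "case1", "case1"],
        "algorithm":    ["A1", "A2", "A1", "A2"],
        "metric_value": [0.91, 0.84, 3.2, 5.7],
    }
)

# DSC lies in [0, 1] and higher is better; the volumetric difference is >= 0 and
# lower is better. Inverting the DSC (1 - DSC) makes high values correspond to
# low performance for both metrics, so the same sort direction applies.
is_dsc = df["task"] == "DSC"
df.loc[is_dsc, "metric_value"] = 1.0 - df.loc[is_dsc, "metric_value"]

# Basic sanity check: each (task, case, algorithm) combination appears only once.
assert not df.duplicated(subset=["task", "case", "algorithm"]).any()

df.to_csv("my_challenge_data_aligned.csv", index=False, na_rep="NA")
```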
You can also download our sample_data.csv to examine the structure of the data file. It contains results for three tasks covering different scenarios (ideal, random, worst case) and is used to generate the sample_data_report.pdf file on the "Use Cases" page.
Congratulations! Your benchmarking report is only a few clicks away! You are ready to prepare your challenge data for report generation. Before you continue, you may want to check the Citation and FAQ pages. Then prepare your data and click the button below: