The similarity matrix represents a graph with vertices and edges.

Each vertex belongs to 3 nested sets:

- Level 0 “singleton” set: just itself
- Level 1 “replicate” set: e.g., replicates of the same perturbation
- Level 2 “group replicate” set: e.g., replicates of perturbations with the same MOA

We calculate metrics hierarchically:

**Level 1-0**: similarity of elements of a Level 0 (singleton) set to elements of its Level 1 (replicate) set, except elements of its Level 0 set. *In simpler terms, this is the replicate similarity of a vertex, i.e., the similarity of a vertex to its replicates (except itself).* This is a **Level 0** (singleton) set metric.

**Level 2-1**: similarity of each element of a Level 1 (replicate) set to elements of its Level 2 set (group replicates), except elements of its Level 1 set (replicates). *In simpler terms, this is the group replicate similarity of a replicate set, i.e., the similarity of elements of a replicate set to its group replicates (except to other elements of its replicate set).* This is a **Level 1** (replicate) set metric.

We can aggregate each of these metrics to produce more metrics:

**Level 1**: average Level 1-0 similarity across all Level 0 (singleton) sets that are nested in the Level 1 set. *In simpler terms, this is the average replicate similarity of a set of replicate vertices.* This is a **Level 1** (replicate) set metric.

**Level 2**: average Level 2-1 similarity across all Level 1 (replicate) sets that are nested in the Level 2 set. *In simpler terms, this is the average group replicate similarity of a set of replicate sets.* This is a **Level 2** (group replicate) set metric.
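To make the hierarchy concrete, here is a minimal Python sketch (not the `matric` implementation; the helper name and data layout are hypothetical) that computes the Level 1-0 metric for each vertex from a similarity matrix and then aggregates it into a Level 1 metric per replicate set:

```python
import numpy as np
import pandas as pd

def level_metrics_sketch(sim, replicate_id):
    """Level 1-0: mean similarity of each vertex to its replicates (excluding itself).
    Level 1: mean of the Level 1-0 values within each replicate set."""
    replicate_id = np.asarray(replicate_id)
    n = sim.shape[0]
    level_1_0 = np.full(n, np.nan)
    for i in range(n):
        mask = replicate_id == replicate_id[i]  # the vertex's Level 1 (replicate) set
        mask[i] = False                         # drop its Level 0 (singleton) set
        if mask.any():
            level_1_0[i] = sim[i, mask].mean()
    per_vertex = pd.DataFrame({"replicate_id": replicate_id, "level_1_0": level_1_0})
    # Level 1: average the Level 1-0 metric across the singletons in each replicate set
    return per_vertex.groupby("replicate_id", as_index=False)["level_1_0"].mean()

# Four wells: two replicates each of hypothetical compounds "a" and "b"
sim = np.array([[1.0, 0.9, 0.2, 0.1],
                [0.9, 1.0, 0.3, 0.2],
                [0.2, 0.3, 1.0, 0.8],
                [0.1, 0.2, 0.8, 1.0]])
print(level_metrics_sketch(sim, ["a", "a", "b", "b"]))
```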

Consider a compound perturbation experiment done in replicates in a multi-well plate. Each compound belongs to one (or more) MOAs.

- Each **replicate well** has a Level 1-0 metric, which is the similarity of that well to its replicates.
- Each **compound** has a Level 2-1 metric, which is the average similarity of each of its replicate wells to the replicate wells of other compounds with the same MOA.

Further,

- Each **compound** has a Level 1 metric, which is the average Level 1-0 metric across all its replicate wells.
- Each **MOA** has a Level 2 metric, which is the average Level 2-1 metric across all its compounds.

The metrics implemented in `matric` are defined below.

| Metric | Description |
|---|---|
| `sim_mean_i` | mean similarity of a vertex to its replicate vertices |

Related: `sim_median_i`, which uses the median instead of the mean.

| Metric | Description |
|---|---|
| `sim_scaled_mean_non_rep_i` | scale `sim_mean_i` using `sim_mean_stat_non_rep_i` and `sim_sd_stat_non_rep_i` |

where `sim_mean_stat_non_rep_i` and `sim_sd_stat_non_rep_i` are the mean and s.d. of the similarity of a vertex to its non-replicate vertices.
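The text does not spell out the scaling formula, but the statistic names suggest a z-score style standardization against the non-replicate null; under that assumption:

$$
\mathtt{sim\_scaled\_mean\_non\_rep\_i} = \frac{\mathtt{sim\_mean\_i} - \mathtt{sim\_mean\_stat\_non\_rep\_i}}{\mathtt{sim\_sd\_stat\_non\_rep\_i}}
$$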

Related:

- `sim_scaled_median_non_rep_i`, which scales `sim_median_i` instead of `sim_mean_i`.
- `sim_scaled_mean_ref_i`, which scales `sim_mean_i` w.r.t. reference vertices (i.e., uses `sim_mean_stat_ref_i` and `sim_sd_stat_ref_i`, the mean and s.d. of the similarity of a vertex to the reference vertices, to scale).
- `sim_scaled_median_ref_i`, which is the same as `sim_scaled_mean_ref_i` except that it scales `sim_median_i` instead of `sim_mean_i`.

Consider a list of vertices comprising

- the replicates of the vertex
- the non-replicates of the vertex

| Metric | Description |
|---|---|
| `sim_ranked_relrank_mean_non_rep_i` | the mean percentile of the vertex’s replicates in this list |
| `sim_retrieval_average_precision_non_rep_i` | the average precision reported on the list, with the replicates being the positive class |
| `sim_retrieval_r_precision_non_rep_i` | similarly, the R-precision reported on the list |
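A minimal sketch of these three rank-based metrics for a single vertex, using scikit-learn's `average_precision_score`; the helper name is hypothetical, and whether percentiles are reported on a 0-1 or 0-100 scale (and how ties are broken) is an assumption here:

```python
import numpy as np
from sklearn.metrics import average_precision_score

def rank_metrics_sketch(similarities, is_replicate):
    """similarities: similarity of one vertex to each vertex in the list
    (its replicates and non-replicates); is_replicate: 1 for replicates, 0 otherwise."""
    similarities = np.asarray(similarities, dtype=float)
    is_replicate = np.asarray(is_replicate)
    order = np.argsort(-similarities)            # indices, most similar first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(order) + 1)  # rank 1 = most similar
    # mean percentile of the vertex's replicates in the ranked list
    relrank_mean = ranks[is_replicate == 1].mean() / len(similarities)
    # average precision, with the replicates as the positive class
    ap = average_precision_score(is_replicate, similarities)
    # R-precision: fraction of replicates in the top R, R = number of replicates
    r = int(is_replicate.sum())
    r_precision = is_replicate[order[:r]].mean()
    return {"relrank_mean": relrank_mean,
            "average_precision": ap,
            "r_precision": r_precision}
```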

Related:

- `sim_ranked_relrank_median_non_rep_i`, which reports the median percentile instead of the mean percentile.
- `sim_ranked_relrank_mean_ref_i`, `sim_ranked_relrank_median_ref_i`, `sim_retrieval_average_precision_ref_i`, and `sim_retrieval_r_precision_ref_i`, which use a list of vertices comprising the reference vertices instead of the non-replicate vertices.

- `sim_mean_i_mean_i` is the mean `sim_mean_i` across all replicate vertices in a replicate set.
- `sim_mean_i_median_i`, `sim_median_i_mean_i`, and `sim_median_i_median_i` are the corresponding Level 1 aggregated metrics for the other combinations of Level 1-0 raw metrics and summary statistics.
- `sim_scaled_mean_non_rep_i_mean_i`, `sim_scaled_median_non_rep_i_median_i`, `sim_scaled_mean_ref_i_mean_i`, and `sim_scaled_median_ref_i_median_i` are the corresponding Level 1 aggregated metrics for the scaled Level 1-0 metrics.
- `sim_ranked_relrank_mean_ref_i_mean_i`, `sim_ranked_relrank_mean_ref_i_median_i`, `sim_ranked_relrank_median_ref_i_mean_i`, and `sim_ranked_relrank_median_ref_i_median_i` are the corresponding Level 1 aggregated metrics for the rank-based Level 1-0 metrics.
- `sim_retrieval_average_precision_ref_i_mean_i`, `sim_retrieval_average_precision_ref_i_median_i`, `sim_retrieval_r_precision_ref_i_mean_i`, and `sim_retrieval_r_precision_ref_i_median_i` are the corresponding Level 1 aggregated metrics for the retrieval-based Level 1-0 metrics.
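As an illustration of the naming convention (an aggregation suffix such as `_mean_i` or `_median_i` appended to the Level 1-0 metric name), here is a minimal pandas sketch of the Level 1 aggregation step; the toy values and column layout are hypothetical:

```python
import pandas as pd

# per-vertex (Level 1-0) metrics, one row per replicate well (toy values)
per_vertex = pd.DataFrame({
    "replicate_id": ["a", "a", "b", "b"],
    "sim_mean_i": [0.9, 0.8, 0.7, 0.6],
})

# Level 1 aggregation: summarize each Level 1-0 metric within each replicate set
level_1 = per_vertex.groupby("replicate_id").agg(
    sim_mean_i_mean_i=("sim_mean_i", "mean"),
    sim_mean_i_median_i=("sim_mean_i", "median"),
).reset_index()
print(level_1)
```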

Note: these are Level 1 summaries of the scaling parameters; they are not themselves used for scaling:

- `sim_mean_stat_non_rep_i_mean_i`, `sim_sd_stat_non_rep_i_mean_i`, `sim_mean_stat_non_rep_i_median_i`, `sim_sd_stat_non_rep_i_median_i`
- `sim_mean_stat_ref_i_mean_i`, `sim_sd_stat_ref_i_mean_i`, `sim_mean_stat_ref_i_median_i`, `sim_sd_stat_ref_i_median_i`

| Metric | Description |
|---|---|
| `sim_mean_g` | mean similarity of vertices in a replicate set to its group replicate vertices |

Related: `sim_median_g`, which uses the median instead of the mean.
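A minimal sketch of this Level 2-1 computation (hypothetical helper; `group_id` marks the Level 2 set, e.g. the MOA, and the sketch assumes each replicate set maps to a single group):

```python
import numpy as np

def sim_mean_g_sketch(sim, replicate_id, group_id):
    """Mean similarity of the vertices in each replicate set to their group
    replicates (vertices sharing the group but outside the replicate set)."""
    replicate_id = np.asarray(replicate_id)
    group_id = np.asarray(group_id)
    out = {}
    for rep in np.unique(replicate_id):
        rows = replicate_id == rep                       # the replicate set
        cols = (group_id == group_id[rows][0]) & ~rows   # its group replicates
        if cols.any():
            out[rep] = sim[np.ix_(rows, cols)].mean()
    return out
```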

| Metric | Description |
|---|---|
| `sim_scaled_mean_non_rep_g` | scale `sim_mean_g` using `sim_mean_stat_non_rep_g` and `sim_sd_stat_non_rep_g` |

where `sim_mean_stat_non_rep_g` and `sim_sd_stat_non_rep_g` are the mean and s.d. of the similarity of vertices in a replicate set to their non-replicate (and non-group replicate) vertices.

Related:

- `sim_scaled_median_non_rep_g`, which scales `sim_median_g` instead of `sim_mean_g`.
- `sim_scaled_mean_ref_g`, which scales `sim_mean_g` w.r.t. reference vertices (i.e., uses `sim_mean_stat_ref_g` and `sim_sd_stat_ref_g`, the mean and s.d. of the similarity of vertices in a replicate set to the reference vertices, to scale).
- `sim_scaled_median_ref_g`, which is the same as `sim_scaled_mean_ref_g` except that it scales `sim_median_g` instead of `sim_mean_g`.

Consider a list of vertices comprising

- the vertices in a replicate set
- the corresponding non-replicate (and non-group replicate) vertices

We define metrics similar to the corresponding Level 1-0 metrics:

- `sim_ranked_relrank_mean_non_rep_g`
- `sim_ranked_relrank_median_non_rep_g`
- `sim_retrieval_average_precision_non_rep_g`
- `sim_retrieval_r_precision_non_rep_g`
- `sim_ranked_relrank_mean_ref_g`
- `sim_ranked_relrank_median_ref_g`
- `sim_retrieval_average_precision_ref_g`
- `sim_retrieval_r_precision_ref_g`

These are not implemented.

*This is a related discussion on metrics, from here.*

We have a weighted graph where the vertices are perturbations with multiple labels (e.g. pathways in the case of genetic perturbations), and edges are the similarity between the vertices (e.g. the cosine similarity between image-based profiles of two CRISPR knockouts).

There are three levels of ranked lists of edges, each of which can produce global metrics (based on classification metrics like average precision or other so-called class probability metrics). These global metrics can be used to compare representations.

In all 3 cases, we pose it as a binary classification problem on the edges:

- Class 1 edges: vertices have a shared label (e.g. at least one MOA in common)
- Class 0 edges: vertices do not have a shared label
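A minimal sketch of this edge labeling, with toy vertices and label sets (all names hypothetical):

```python
from itertools import combinations

# each vertex carries a set of labels, e.g. MOAs (toy data)
labels = {"v1": {"kinase inhibitor"},
          "v2": {"kinase inhibitor", "DNA binder"},
          "v3": {"DNA binder"},
          "v4": {"proteasome inhibitor"}}

edges = []
for u, v in combinations(labels, 2):
    shared = bool(labels[u] & labels[v])  # Class 1 iff at least one label in common
    edges.append((u, v, int(shared)))

print(edges)  # [('v1', 'v2', 1), ('v1', 'v3', 0), ...]
```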

The three levels of ranked lists of edges, along with the metrics they induce, are below.

(Not all the metrics are useful, and some may be very similar to others. I have highlighted the ones I think are useful.)

- Global: single list, comprising all edges
    - 0.a. We can directly compute a single *global metric* from this list.
- Label-specific: one list per label, comprising all edges that have at least one vertex with the label
    - 1.a. We can compute a *label-specific* metric from each list, with an additional constraint on Class 1 edges: both vertices should share the label being evaluated.
    - 1.b. We can then (weighted) average the label-specific metrics to get a single *global metric*.
    - 1.c. We can also directly compute a *global metric* across all the label-specific lists.
- Sample-specific: one list per sample, comprising all edges that have at least one vertex as that sample
    - 2.a. We can compute a *sample-specific* metric from each list.
    - 2.b. We can then average the *sample-specific* metrics to get a *label-specific* metric, filtered like in 1.a, although it may not be quite as straightforward; 2.d might be better.
    - 2.c. We can further (weighted) average the *label-specific* metrics to get a single *global metric*.
    - 2.d. We can also directly compute a *label-specific* metric across the sample-specific lists, filtered like in 1.a.
    - 2.e. We can also directly average the *sample-specific* metrics to get a single *global metric*.
    - 2.f. We can also directly compute a single *global metric* across all the sample-specific lists.
    - 2.g. We can also (weighted) average the *label-specific* metrics from 2.d to get a single *global metric*.

Notes:

- This discussion on metrics does not address the notion of “group replicates”.
- Level 1 metrics are macro-averaged metrics because we are taking averages of Level 1-0 metrics. Macro-averaged metrics are not currently implemented.
- The difference between 1.a and 2.d is in how we construct the label-specific list: 1.a combines the sample-specific lists and then ranks, whereas 2.d first ranks the sample-specific lists and then combines the ranked lists.
- Similarly, the difference between 0.a and 2.f is in how we construct the global list: 0.a combines the sample-specific lists and then ranks, whereas 2.f first ranks the sample-specific lists and then combines the ranked lists.
- 1.b and 2.g are similar; they both aggregate their corresponding label-specific metrics (1.a and 2.d respectively) to get a global metric.
- `sim_retrieval_average_precision_non_rep_i` is an example of 2.a.
- `sim_retrieval_average_precision_non_rep_i_mean_i` is an example of 2.b.
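To illustrate the combine-then-rank versus rank-then-combine distinction drawn in the notes above, here is a sketch with toy data; the within-list rank normalization is one plausible choice, not a prescribed one:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# two sample-specific lists for the same label: (edge scores, edge classes)
lists = [
    (np.array([0.9, 0.4, 0.2]), np.array([1, 0, 0])),
    (np.array([0.8, 0.7, 0.1]), np.array([0, 1, 0])),
]

# 1.a: combine the sample-specific lists, then rank once (micro)
scores = np.concatenate([s for s, _ in lists])
y_true = np.concatenate([y for _, y in lists])
micro_ap = average_precision_score(y_true, scores)

# 2.d: rank within each sample-specific list first, then combine the ranked lists
ranked = []
for s, y in lists:
    ranks = s.argsort().argsort() / (len(s) - 1)  # within-list rank, scaled to [0, 1]
    ranked.append((ranks, y))
rank_scores = np.concatenate([r for r, _ in ranked])
rank_y = np.concatenate([y for _, y in ranked])
rank_combined_ap = average_precision_score(rank_y, rank_scores)

print(micro_ap, rank_combined_ap)  # generally different
```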

Categorization based on https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification (I did not double-check; there could be errors)

| Index | Averaging | Metric type |
|---|---|---|
| 0.a | micro | global |
| 1.a | micro | label-specific |
| 1.b | macro | global |
| 1.c | micro | global |
| 2.b | macro | label-specific |
| 2.c | macro of macro-label-specific | global |
| 2.d | micro | label-specific |
| 2.e | macro | global |
| 2.f | micro | global |
| 2.g | macro of micro-label-specific | global |