Ranking losses in TF-Ranking

1、Ranking losses

1.1、A brief overview of the losses

When optimizing a ranking model, losses are usually constructed from three perspectives: pointwise, pairwise, and listwise. A pointwise loss treats all queries as a single pool and computes a loss for each <query,doc> pair, which amounts to a binary classification (or regression) problem. A pairwise loss works per query and builds tuples of the form <query,doc1,doc2>, modeling the partial order between documents. A listwise loss also works per query, but models all documents under the query as one list. The common losses of each type are listed below; they come from the TF-Ranking code base.

1.2、pointwise

ClickEMLoss: assumes that a click is generated by the factored model P(examination) * P(relevance); examination and relevance are latent variables determined by `exam_logits` and `rel_logits` respectively.

SigmoidCrossEntropyLoss: treats each <query,doc> pair as a binary classification problem and computes the sigmoid cross entropy between logits and labels.

MeanSquaredLoss: treats each <query,doc> pair as a regression problem and computes the squared error between logits and labels.

OrdinalLoss: ordinal_size is the number of label grades; the graded-relevance regression problem is turned into a set of (ordinal) classification problems.

MultiClassLoss: treats each <query,doc> pair as a multi-class classification problem and computes the cross entropy between logits and labels.
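For intuition about the pointwise case, here is a minimal sketch (plain TensorFlow, not the TF-Ranking API; the tensor values are made up) of a masked sigmoid cross entropy over a padded [batch_size, list_size] list:

import tensorflow as tf

labels = tf.constant([[1., 0., -1.]])   # -1 marks a padded (invalid) item
logits = tf.constant([[2.0, -1.0, 0.3]])
mask = tf.greater_equal(labels, 0.)     # same validity rule as utils.is_label_valid

per_item = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=tf.where(mask, labels, tf.zeros_like(labels)), logits=logits)
# Average only over the valid items.
loss = tf.reduce_sum(tf.where(mask, per_item, tf.zeros_like(per_item)))
loss /= tf.reduce_sum(tf.cast(mask, tf.float32))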

1.3、pairwise

PairwiseLogisticLoss:log(1 + exp(-pairwise_logits)).

PairwiseHingeLoss:Hinge(l_i > l_j) = max(0, 1 - (s_i - s_j))

PairwiseSoftZeroOneLoss: a variant of PairwiseHingeLoss; when s_i - s_j < 0 the loss is tf.sigmoid(-pairwise_logits)

PairwiseMSELoss: a pairwise loss for regression-style models; the loss is the squared error between the pairwise logit difference and the pairwise label difference, tf.math.square(pairwise_logit_diff - pairwise_label_diff)
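A minimal sketch of the pairwise idea (illustrative only, not the library code): form all score differences, keep only the pairs with l_i > l_j, and apply the numerically stable logistic loss used by PairwiseLogisticLoss:

import tensorflow as tf

labels = tf.constant([[3., 1., 2.]])
logits = tf.constant([[0.5, 1.2, 0.1]])

# Entry [b, i, j] holds x_i - x_j, for both labels and logits.
pairwise_logits = tf.expand_dims(logits, 2) - tf.expand_dims(logits, 1)
pairwise_labels = tf.cast(
    tf.greater(tf.expand_dims(labels, 2) - tf.expand_dims(labels, 1), 0), tf.float32)

# Numerically stable log(1 + exp(-x)), applied only to pairs with l_i > l_j.
pair_loss = tf.nn.relu(-pairwise_logits) + tf.math.log1p(tf.exp(-tf.abs(pairwise_logits)))
loss = tf.reduce_sum(pair_loss * pairwise_labels) / tf.reduce_sum(pairwise_labels)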

1.4、listwise

SoftmaxLoss: the cross entropy between the (normalized) relevance labels of the docs in each list and the logits: softmax_cross_entropy_with_logits_v2(labels_for_softmax, logits_for_softmax). A minimal sketch follows at the end of this list.

PolyOneSoftmaxLoss: PolyLoss provides a framework for understanding and improving common losses, inspired by their Taylor expansions; the poly-1 variant adds an extra weight on the first polynomial term:

pt = tf.reduce_sum(labels_for_softmax * tf.nn.softmax(logits_for_softmax), axis=-1)
ce = tf.compat.v1.nn.softmax_cross_entropy_with_logits_v2(labels_for_softmax, logits_for_softmax)
losses = ce + self._epsilon * (1 - pt)

https://arxiv.org/pdf/2204.12511.pdf

CircleLoss: pair-based optimization methods aim to maximize the within-class similarity S_p while minimizing the between-class similarity S_n. To re-weight the under-optimized similarity scores, the paper proposes Circle loss, named after its circular decision boundary:
L_circle = log(1 + sum_{i is p,j is n}
                   exp(gamma * (a_j * (s_j - d_n) - a_i * (s_i - d_p))))

https://arxiv.org/pdf/2002.10857.pdf

UniqueSoftmaxLoss: addresses three issues: 1) the permutation probability ignores ties between documents with the same rating; 2) it does not sufficiently favor highly relevant documents; 3) the assumption that documents are drawn independently is too loose.

      -sum_i (2^l_i - 1) * log(exp(s_i) / (sum_j exp(s_j) + exp(s_i)))
https://arxiv.org/pdf/2001.01828.pdf

MixtureEMLoss: mixes several ranking models with an EM procedure.

ListMLELoss: the core idea of ListMLE is similar to ListNet.

Similarity: both approximate the true ranking distribution with the ranking probability distribution produced by the model's scores.

Differences:

  1. Loss function: ListNet uses a cross-entropy loss, while ListMLE uses a (negative log-)likelihood loss.
  2. Probability distribution: ListNet usually uses the top-one probability, while ListMLE keeps the full permutation probability, so its ranking probability is more faithful; in practice ListNet approximates the permutation probability with a plain softmax over the scores.

ApproxNDCGLoss: replaces the rank positions inside the NDCG metric with a differentiable approximation and optimizes NDCG directly.

ApproxMRRLoss: replaces the rank positions inside the MRR metric with a differentiable approximation and optimizes MRR directly.
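As the sketch promised above for the listwise case (illustrative only, mirroring the idea of SoftmaxLoss): normalize the graded labels into a distribution over the list and take the cross entropy with the softmax of the logits:

import tensorflow as tf

labels = tf.constant([[3., 0., 1.]])
logits = tf.constant([[1.2, -0.3, 0.5]])

labels_dist = labels / tf.reduce_sum(labels, axis=1, keepdims=True)
loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels_dist, logits=logits)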

2、Loss definitions and computation

2.1、Base implementation

2.1.1、_RankingLoss: the base class for ranking losses

class _RankingLoss(object, metaclass=abc.ABCMeta):
  """Interface for ranking loss."""

  def __init__(self, name, lambda_weight=None, temperature=1.0, ragged=False):
    """Constructor.

    Args:
      name: A string used as the name for this loss.
      lambda_weight: A `_LambdaWeight` object.
      temperature: A float number to modify the logits=logits/temperature.
      ragged: A boolean indicating whether the input tensors are ragged.
    """
    self._name = name
    self._lambda_weight = lambda_weight
    self._temperature = temperature
    self._ragged = ragged

  @property
  def name(self):
    """The loss name."""
    return self._name

  def _prepare_and_validate_params(self, labels, logits, weights, mask):
    """Prepares and validate input parameters.

    Args:
      labels: A `Tensor` of the same shape as `logits` representing graded
        relevance.
      logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
        ranking score of the corresponding item.
      weights: A scalar, a `Tensor` with shape [batch_size, 1] for list-wise
        weights, or a `Tensor` with shape [batch_size, list_size] for item-wise
        weights.
      mask: A `Tensor` of the same shape as logits indicating which entries are
        valid for computing the loss.

    Returns:
      A tuple (labels, logits, weights, mask) of `tf.Tensor` objects that are
      ready to be used in the loss.
    """
    if self._ragged:
      labels, logits, weights, mask = utils.ragged_to_dense(
          labels, logits, weights)

    if mask is None:
      mask = utils.is_label_valid(labels)

    if weights is None:
      weights = 1.0

    labels = tf.convert_to_tensor(labels)
    logits = tf.convert_to_tensor(logits)
    weights = tf.convert_to_tensor(weights)
    mask = tf.convert_to_tensor(mask)

    return labels, logits, weights, mask

  def compute_unreduced_loss(self, labels, logits, mask=None):
    """Computes the unreduced loss.

    Args:
      labels: A `Tensor` or `RaggedTensor` of the same shape as `logits`
        representing graded relevance.
      logits: A `Tensor` or `RaggedTensor` with shape [batch_size, list_size].
        Each value is the ranking score of the corresponding item.
      mask: An optional `Tensor` of the same shape as logits indicating which
        entries are valid for computing the loss. Will be ignored if the loss
        was constructed with ragged=True.

    Returns:
      A tuple(losses, loss_weights) that have the same shape.
    """
    labels, logits, _, mask = self._prepare_and_validate_params(
        labels, logits, None, mask)
    return self._compute_unreduced_loss_impl(labels, logits, mask)

  @abc.abstractmethod
  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """Implementation for the unreduced loss.

    Args:
      labels: A `Tensor` of the same shape as `logits` representing graded
        relevance.
      logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
        ranking score of the corresponding item.
      mask: An optional `Tensor` of the same shape as logits indicating which
        entries are valid for computing the loss.

    Returns:
      A tuple(losses, loss_weights) that have the same shape.
    """
    raise NotImplementedError('Calling an abstract method.')

  def normalize_weights(self, labels, weights):
    """Normalizes weights.

    This is needed for `tf.estimator` given that the reduction may be
    `SUM_OVER_NONZERO_WEIGHTS`.

    This method is also needed to compute normalized weights when calling
    `compute_unreduced_loss`, which is done in the tf.keras losses.

    Args:
      labels: A `Tensor` of shape [batch_size, list_size] representing graded
        relevance.
      weights: A scalar, a `Tensor` with shape [batch_size, 1] for list-wise
        weights, or a `Tensor` with shape [batch_size, list_size] for item-wise
        weights.

    Returns:
      The normalized weights.
    """
    if self._ragged:
      labels, _, weights, _ = utils.ragged_to_dense(labels, None, weights)
    return self._normalize_weights_impl(labels, weights)

  def _normalize_weights_impl(self, labels, weights):
    """See `normalize_weights`."""
    del labels
    return 1.0 if weights is None else weights

  def get_logits(self, logits):
    """Computes logits rescaled by temperature.

    Args:
      logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
        ranking score of the corresponding item.

    Returns:
      Tensor of rescaled logits.
    """
    if not tf.is_tensor(logits):
      logits = tf.convert_to_tensor(value=logits)
    return logits / self._temperature

  def compute(self, labels, logits, weights, reduction, mask=None):
    """Computes the reduced loss for tf.estimator (not tf.keras).

    Note that this function is not compatible with keras.

    Args:
      labels: A `Tensor` of the same shape as `logits` representing graded
        relevance.
      logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
        ranking score of the corresponding item.
      weights: A scalar, a `Tensor` with shape [batch_size, 1] for list-wise
        weights, or a `Tensor` with shape [batch_size, list_size] for item-wise
        weights.
      reduction: One of `tf.losses.Reduction` except `NONE`. Describes how to
        reduce training loss over batch.
      mask: A `Tensor` of the same shape as logits indicating which entries are
        valid for computing the loss.

    Returns:
      Reduced loss for training and eval.
    """
    logits = self.get_logits(logits)
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)
    return tf.compat.v1.losses.compute_weighted_loss(
        losses, weights, reduction=reduction)

  @abc.abstractmethod
  def compute_per_list(self, labels, logits, weights, mask=None):
    """Computes the per-list loss.

    Args:
      labels: A `Tensor` of the same shape as `logits` representing graded
        relevance.
      logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
        ranking score of the corresponding item.
      weights: A scalar, a `Tensor` with shape [batch_size, 1] for list-wise
        weights, or a `Tensor` with shape [batch_size, list_size] for item-wise
        weights.
      mask: A `Tensor` of the same shape as logits indicating which entries are
        valid for computing the loss.

    Returns:
      A pair of `Tensor` objects of shape [batch_size] containing per-list
      losses and weights.
    """
    raise NotImplementedError('Calling an abstract method.')

  def eval_metric(self, labels, logits, weights, mask=None):
    """Computes the eval metric for the loss in tf.estimator (not tf.keras).

    Note that this function is not compatible with keras.

    Args:
      labels: A `Tensor` of the same shape as `logits` representing graded
        relevance.
      logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
        ranking score of the corresponding item.
      weights: A scalar, a `Tensor` with shape [batch_size, 1] for list-wise
        weights, or a `Tensor` with shape [batch_size, list_size] for item-wise
        weights.
      mask: A `Tensor` of the same shape as logits indicating which entries are
        valid for computing the metric.

    Returns:
      A metric op.
    """
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)
    return tf.compat.v1.metrics.mean(losses, weights)

2.1.2、_prepare_and_validate_params: parameter preparation and validation

        The mask is derived from the labels to mark valid entries. weights, labels, and logits are all converted to tensors, presumably so that plain Python lists (e.g. in tests) can be passed in as well.

2.1.3、compute_unreduced_loss: the unreduced loss

        compute_unreduced_loss returns the element-wise (unreduced) loss, with the same shape as the logits. It delegates to _compute_unreduced_loss_impl, which each concrete subclass overrides.

2.1.4、normalize_weights: weight normalization

       normalize_weights normalizes the weights by delegating to _normalize_weights_impl, which each concrete subclass overrides.

2.1.5、temperature: rescaling the logits

         Implemented in get_logits: logits / _temperature rescales the scores. This is the same temperature trick used when sampling from language models: a lower temperature sharpens the score distribution, a higher one flattens it.
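A tiny illustrative snippet (values made up) showing how the temperature changes the softmax over a list of scores:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1]])
for t in (0.5, 1.0, 2.0):
    print(t, tf.nn.softmax(logits / t).numpy())
# Lower temperature -> sharper (more confident) distribution; higher -> flatter.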

2.1.6、compute: the reduced loss

def compute(self, labels, logits, weights, reduction, mask=None):
    """Computes the reduced loss for tf.estimator (not tf.keras).

    Note that this function is not compatible with keras.

    Args:
      labels: A `Tensor` of the same shape as `logits` representing graded
        relevance.
      logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
        ranking score of the corresponding item.
      weights: A scalar, a `Tensor` with shape [batch_size, 1] for list-wise
        weights, or a `Tensor` with shape [batch_size, list_size] for item-wise
        weights.
      reduction: One of `tf.losses.Reduction` except `NONE`. Describes how to
        reduce training loss over batch.
      mask: A `Tensor` of the same shape as logits indicating which entries are
        valid for computing the loss.

    Returns:
      Reduced loss for training and eval.
    """
    logits = self.get_logits(logits)
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)
    return tf.compat.v1.losses.compute_weighted_loss(
        losses, weights, reduction=reduction)

compute calls _compute_unreduced_loss_impl to obtain the element-wise losses, multiplies in the normalized weights, and reduces them into a single training loss.

2.1.7、compute_per_list: per-list losses

  compute_per_list computes the loss per list and returns tensors of shape [batch_size].

2.1.8、eval_metric: mean loss for evaluation

The evaluation metric is the weighted mean loss: tf.compat.v1.metrics.mean(losses, weights).

2.2、_PointwiseLoss: the pointwise base class derived from _RankingLoss

2.2.1、_PointwiseLoss implementation

class _PointwiseLoss(_RankingLoss):
  """Interface for pointwise loss."""

  def _normalize_weights_impl(self, labels, weights):
    """See _RankingLoss."""
    if weights is None:
      weights = 1.
    return tf.compat.v1.where(
        utils.is_label_valid(labels),
        tf.ones_like(labels) * weights, tf.zeros_like(labels))

  def compute_per_list(self, labels, logits, weights, mask=None):
    """See `_RankingLoss`."""
    # Prepare input params.
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)

    # Pointwise losses and weights will be of shape [batch_size, list_size].
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)

    # Compute the weighted per-item loss.
    weighted_per_item_loss = tf.math.multiply(losses, weights)

    # Sum the inner dimensions to obtain per-list weights. For pointwise losses
    # this typically indicates the (weighted) number of items per list.
    per_list_weights = tf.reduce_sum(weights, axis=1)

    # This computes the per-list losses by summing all weighted per-item losses.
    per_list_losses = tf.reduce_sum(weighted_per_item_loss, axis=1)

    # Normalize the per-list losses so that lists with different numbers of
    # items have comparable losses. The different numbers of items is reflected
    # in the per-list weights.
    per_list_losses = tf.math.divide_no_nan(per_list_losses, per_list_weights)
    return per_list_losses, per_list_weights

2.2.2、Classes derived from _PointwiseLoss

ClickEMLoss: assumes that a click is generated by the factored model P(examination) * P(relevance); examination and relevance are latent variables determined by `exam_logits` and `rel_logits` respectively.

https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/46485.pdf

SigmoidCrossEntropyLoss:tf.compat.v1.nn.sigmoid_cross_entropy_with_logits(
        labels=labels, logits=logits)

MeanSquaredLoss: losses = tf.compat.v1.squared_difference(labels, logits)

OrdinalLoss: losses =tf.where(
        tf.expand_dims(mask, -1),
        tf.compat.v1.nn.sigmoid_cross_entropy_with_logits(
            labels=ordinals,
            logits=logits),
        0.0)

ordinal_size is the number of label grades; the ordinal loss turns the graded-relevance regression problem into a set of per-threshold binary classification problems.

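A minimal sketch (plain TensorFlow, illustrative and not the library's exact construction) of how a graded label can be expanded into ordinal binary targets, one per threshold, before applying the per-threshold sigmoid cross entropy:

import tensorflow as tf

labels = tf.constant([[3., 1., 0.]])      # graded relevance
ordinal_size = 3                          # number of thresholds/grades
thresholds = tf.range(ordinal_size, dtype=tf.float32)   # [0, 1, 2]
# ordinals[b, i, k] = 1 if label > k: label 3 -> [1, 1, 1], label 1 -> [1, 0, 0].
ordinals = tf.cast(tf.expand_dims(labels, -1) > thresholds, tf.float32)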

MultiClassLoss:losses = tf.keras.losses.CategoricalCrossentropy(
        from_logits=self._from_logits,
        label_smoothing=self._label_smoothing,
        axis=-1,reduction=tf.keras.losses.Reduction.NONE,
        name='categorical_crossentropy')(classes, logits, tf.cast(mask, dtype=tf.float32))


2.3、_PairwiseLoss: the pairwise base class derived from _RankingLoss

2.3.1、_PairwiseLoss implementation

The base implementation for pairwise losses; every concrete pairwise loss inherits from this class.

class _PairwiseLoss(_RankingLoss, metaclass=abc.ABCMeta):
  """Interface for pairwise ranking loss."""

  @abc.abstractmethod
  def _pairwise_loss(self, pairwise_logits):
    """The loss of pairwise logits with l_i > l_j."""
    raise NotImplementedError('Calling an abstract method.')

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    ranks = _compute_ranks(logits, mask)
    pairwise_labels, pairwise_logits = _pairwise_comparison(
        labels, logits, mask)
    pairwise_weights = pairwise_labels
    if self._lambda_weight is not None:
      pairwise_weights *= self._lambda_weight.pair_weights(labels, ranks)

    pairwise_weights = tf.stop_gradient(
        pairwise_weights, name='weights_stop_gradient')
    return self._pairwise_loss(pairwise_logits), pairwise_weights

  def compute_per_list(self, labels, logits, weights, mask=None):
    """See `_RankingLoss`."""
    # Prepare input params.
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)

    # Pairwise losses and weights will be of shape
    # [batch_size, list_size, list_size].
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)

    # Compute the weighted per-pair loss.
    weighted_per_pair_loss = tf.math.multiply(losses, weights)

    # Sum the inner dimensions to obtain per-list weights. For pairwise losses
    # this typically indicates the (weighted) number of pairwise preferences per
    # list.
    per_list_weights = tf.reduce_sum(weights, axis=[1, 2])

    # This computes the per-list losses by summing all weighted pairwise losses.
    per_list_losses = tf.reduce_sum(weighted_per_pair_loss, axis=[1, 2])

    # Normalize the per-list losses so that lists with different numbers of
    # pairs have comparable losses. The different numbers of pairs is reflected
    # in the per-list weights.
    per_list_losses = tf.math.divide_no_nan(per_list_losses, per_list_weights)

    return per_list_losses, per_list_weights

  def _normalize_weights_impl(self, labels, weights):
    """See _RankingLoss."""
    # The `weights` is item-wise and is applied non-symmetrically to update
    # pairwise_weights as
    #   pairwise_weights(i, j) = w_i * pairwise_weights(i, j).
    # This effectively applies to all pairs with l_i > l_j. Note that it is
    # actually symmetric when `weights` are constant per list, i.e., listwise
    # weights.
    if weights is None:
      weights = 1.
    weights = tf.compat.v1.where(
        utils.is_label_valid(labels),
        tf.ones_like(labels) * weights, tf.zeros_like(labels))
    return tf.expand_dims(weights, axis=2)

2.3.2、Deriving the mask

An entry is considered valid if its label is greater than or equal to 0:

def is_label_valid(labels):
  """Returns a boolean `Tensor` for label validity."""
  labels = tf.convert_to_tensor(value=labels)
  return tf.greater_equal(labels, 0.)

Where the mask is true the score keeps its logit; invalid entries are assigned the per-list minimum logit minus a small constant (1e-6), so they always sort last.

2.3.3、Computing rank positions: _compute_ranks

def _compute_ranks(logits, is_valid):
  """Computes ranks by sorting valid logits.

  Args:
    logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
      ranking score of the corresponding item.
    is_valid: A `Tensor` of the same shape as `logits` representing validity of
      each entry.

  Returns:
    The `ranks` Tensor.
  """
  _check_tensor_shapes([logits, is_valid])
  # Only sort entries with is_valid = True.
  scores = tf.compat.v1.where(
      is_valid, logits, -1e-6 * tf.ones_like(logits) +
      tf.reduce_min(input_tensor=logits, axis=1, keepdims=True))
  return utils.sorted_ranks(scores)

sorted_ranks returns the 1-based rank of each score after sorting:

def sorted_ranks(scores, shuffle_ties=True, seed=None):
  """Returns an int `Tensor` as the ranks (1-based) after sorting scores.

  Example: Given scores = [[1.0, 3.5, 2.1]], the returned ranks will be [[3, 1,
  2]]. It means that scores 1.0 will be ranked at position 3, 3.5 will be ranked
  at position 1, and 2.1 will be ranked at position 2.

  Args:
    scores: A `Tensor` of shape [batch_size, list_size] representing the
      per-example scores.
    shuffle_ties: See `sort_by_scores`.
    seed: See `sort_by_scores`.

  Returns:
    A 1-based int `Tensor`s as the ranks.
  """
  with tf.compat.v1.name_scope(name='sorted_ranks'):
    batch_size, list_size = tf.unstack(tf.shape(input=scores))
    # The current position in the list for each score.
    positions = tf.tile(tf.expand_dims(tf.range(list_size), 0), [batch_size, 1])
    # For score [[1.0, 3.5, 2.1]], sorted_positions are [[1, 2, 0]], meaning the
    # largest score is at position 1, the 2nd is at position 2 and 3rd is at
    # position 0.
    sorted_positions = sort_by_scores(
        scores, [positions], shuffle_ties=shuffle_ties, seed=seed)[0]
    # The indices of sorting sorted_positions will be [[2, 0, 1]] and ranks are
    # 1-based and thus are [[3, 1, 2]].
    ranks = tf.argsort(sorted_positions) + 1
    return ranks

2.3.4、Computing pairwise_labels and pairwise_logits

pairwise_labels, pairwise_logits = _pairwise_comparison(
        labels, logits, mask)

def _pairwise_comparison(labels, logits, mask, pairwise_logits_op=tf.subtract):
  r"""Returns pairwise comparison `Tensor`s.

  Given a list of n items, the labels of graded relevance l_i and the logits
  s_i, we form n^2 pairs. For each pair, we have the following:

                        /
                        | 1   if l_i > l_j for valid l_i and l_j.
  * `pairwise_labels` = |
                        | 0   otherwise
                        \
  * `pairwise_logits` = pairwise_logits_op(s_i, s_j)

  Args:
    labels: A `Tensor` with shape [batch_size, list_size].
    logits: A `Tensor` with shape [batch_size, list_size].
    mask: A `Tensor` with shape [batch_size, list_size] indicating which entries
      are valid for computing the pairwise comparisons.
    pairwise_logits_op: A pairwise function which operates on 2 tensors.

  Returns:
    A tuple of (pairwise_labels, pairwise_logits) with each having the shape
    [batch_size, list_size, list_size].
  """
  # Compute the difference for all pairs in a list. The output is a Tensor with
  # shape [batch_size, list_size, list_size] where the entry [-1, i, j] stores
  # the information for pair (i, j).
  pairwise_label_diff = _apply_pairwise_op(tf.subtract, labels)
  pairwise_logits = _apply_pairwise_op(pairwise_logits_op, logits)
  # Only keep the case when l_i > l_j.
  pairwise_labels = tf.cast(
      tf.greater(pairwise_label_diff, 0), dtype=tf.float32)
  valid_pair = _apply_pairwise_op(tf.logical_and, mask)
  pairwise_labels *= tf.cast(valid_pair, dtype=tf.float32)
  return pairwise_labels, pairwise_logits

pairwise_label_diff = _apply_pairwise_op(tf.subtract, labels) 

tf.subtract is element-wise tensor subtraction; the two operands may have different (broadcastable) shapes.

tf.expand_dims(tensor, 2) and tf.expand_dims(tensor, 1) insert a new axis at position 2 and position 1 respectively; subtracting the two broadcasts them into a [batch_size, list_size, list_size] tensor that holds every pairwise difference.

pairwise_logits = _apply_pairwise_op(pairwise_logits_op, logits)

In the base version pairwise_logits_op = tf.subtract, so the logits get the same pairwise treatment as the labels.

def _apply_pairwise_op(op, tensor):
  """Applies the op on tensor in the pairwise manner."""
  _check_tensor_shapes([tensor])
  return op(tf.expand_dims(tensor, 2), tf.expand_dims(tensor, 1))

_apply_pairwise_op checks the tensor shapes and applies the op (here tf.subtract) to the two expanded views, building all pairs.

pairwise_labels = tf.cast(tf.greater(pairwise_label_diff, 0), dtype=tf.float32): pairs with l_i > l_j become 1.0 and everything else becomes 0.0 after the cast to float.
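A tiny numeric example (values made up) of the pairwise comparison:

import tensorflow as tf

labels = tf.constant([[2., 0., 1.]])
# Broadcasted pairwise difference: entry [0, i, j] = labels[0, i] - labels[0, j].
diff = tf.expand_dims(labels, 2) - tf.expand_dims(labels, 1)
pairwise_labels = tf.cast(tf.greater(diff, 0), tf.float32)
print(pairwise_labels.numpy())
# [[[0. 1. 1.]
#   [0. 0. 0.]
#   [0. 1. 0.]]]   rows are i, columns are j; 1 exactly where l_i > l_j.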


2.3.5、Computing pairwise_weights

The code supports several ways of building pairwise_weights:

pairwise_weights = pairwise_labels
if self._lambda_weight is not None:
  pairwise_weights *= self._lambda_weight.pair_weights(labels, ranks)

2.3.6、Stopping gradients through pairwise_weights

pairwise_weights = tf.stop_gradient(pairwise_weights, name='weights_stop_gradient')


2.3.7、compute_per_list: per-list computation

def compute_per_list(self, labels, logits, weights, mask=None):
    """See `_RankingLoss`."""
    # Prepare input params.
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)

    # Pairwise losses and weights will be of shape
    # [batch_size, list_size, list_size].
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)

    # Compute the weighted per-pair loss.
    weighted_per_pair_loss = tf.math.multiply(losses, weights)

    # Sum the inner dimensions to obtain per-list weights. For pairwise losses
    # this typically indicates the (weighted) number of pairwise preferences per
    # list.
    per_list_weights = tf.reduce_sum(weights, axis=[1, 2])

    # This computes the per-list losses by summing all weighted pairwise losses.
    per_list_losses = tf.reduce_sum(weighted_per_pair_loss, axis=[1, 2])

    # Normalize the per-list losses so that lists with different numbers of
    # pairs have comparable losses. The different numbers of pairs is reflected
    # in the per-list weights.
    per_list_losses = tf.math.divide_no_nan(per_list_losses, per_list_weights)

    return per_list_losses, per_list_weights

2.3.8、Classes derived from _PairwiseLoss

PairwiseHingeLoss: Hinge(l_i > l_j) = max(0, 1 - (s_i - s_j)). So a
  correctly ordered pair has 0 loss if (s_i - s_j >= 1). Otherwise the loss
  increases linearly with s_i - s_j. When the list_size is 2, this reduces to
  the standard hinge loss.

loss=tf.nn.relu(1 - pairwise_logits)

PairwiseLogisticLoss: log(1 + exp(-pairwise_logits)).

 loss= tf.nn.relu(-pairwise_logits) + tf.math.log1p(tf.exp(-tf.abs(pairwise_logits)))

PairwiseSoftZeroOneLoss: a variant of PairwiseHingeLoss; when s_i - s_j < 0 the loss is tf.sigmoid(-pairwise_logits)

loss=tf.compat.v1.where(
        tf.greater(pairwise_logits, 0), 1. - tf.sigmoid(pairwise_logits),
        tf.sigmoid(-pairwise_logits))

PairwiseMSELoss: overrides _compute_unreduced_loss_impl; pairwise_mse_loss = tf.math.square(pairwise_logit_diff - pairwise_label_diff). This loss targets regression-style models.

def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)

    # Compute loss.
    pairwise_label_diff = _apply_pairwise_op(tf.subtract, labels)
    pairwise_logit_diff = _apply_pairwise_op(tf.subtract, logits)
    pairwise_mse_loss = tf.math.square(pairwise_logit_diff -
                                       pairwise_label_diff)
    valid_pair = _apply_pairwise_op(tf.logical_and, mask)

    # Compute weights.
    pairwise_weights = tf.ones_like(pairwise_mse_loss)
    batch_size, list_size = tf.unstack(tf.shape(input=labels))
    # Excluding the self pairs.
    pairwise_weights -= tf.eye(
        list_size, batch_shape=[batch_size], dtype=pairwise_weights.dtype)
    # Including only valid pairs
    pairwise_weights *= tf.cast(valid_pair, tf.float32)
    if self._lambda_weight is not None:
      ranks = _compute_ranks(logits, mask)
      pairwise_weights *= self._lambda_weight.pair_weights(labels, ranks)
    pairwise_weights = tf.stop_gradient(
        pairwise_weights, name='weights_stop_gradient')

    return pairwise_mse_loss, pairwise_weights

2.4、_ListwiseLoss: the listwise base class derived from _RankingLoss

2.4.1、_ListwiseLoss implementation

class _ListwiseLoss(_RankingLoss):
  """Interface for listwise loss."""

  def _normalize_weights_impl(self, labels, weights):
    """See `_RankingLoss`."""
    if weights is None:
      return 1.0
    else:
      weights = tf.convert_to_tensor(value=weights)
      labels = tf.convert_to_tensor(value=labels)
      is_valid = utils.is_label_valid(labels)
      labels = tf.where(is_valid, labels, tf.zeros_like(labels))
      return tf.compat.v1.math.divide_no_nan(
          tf.reduce_sum(input_tensor=(weights * labels), axis=1, keepdims=True),
          tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True))

  def compute_per_list(self, labels, logits, weights, mask=None):
    """See `_RankingLoss`."""
    # Prepare input params.
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)

    # Listwise losses and weights will be of shape [batch_size, 1].
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)

    # This removes the inner dimension of size 1 to make the output shape
    # [batch_size].
    per_list_losses = tf.squeeze(losses, axis=1)
    per_list_weights = tf.squeeze(weights, axis=1)
    return per_list_losses, per_list_weights

Compared with the pairwise case, listwise losses skip the pair-construction step and compute directly at the list level.

2.4.2、Listwise classes derived from _ListwiseLoss

CircleLoss:https://arxiv.org/abs/2002.10857


L_circle = log(1 + sum_{i, j} I_{y_i > y_j}
                   exp(gamma * (a_j * (s_j - d_n) - a_i * (s_i - d_p))))

class CircleLoss(_ListwiseLoss):
  """Implements circle loss.

  This is the Circle loss originally proposed by Sun et al.
  ["Circle Loss: A Unified Perspective of Pair Similarity Optimization"]. See
  https://arxiv.org/abs/2002.10857.

  For a model that outputs similarity scores `s` on data point with
  corresponding label y, the circle loss from Eq.(6) in the paper is
    L_circle = log(1 + sum_{i is p,j is n}
                   exp(gamma * (a_j * (s_j - d_n) - a_i * (s_i - d_p)))),
  defined for the binary label, p for data points with positive labels and n for
  data points with negative labels.
    a_i = relu(1 + margin - s_i)
    a_j = relu(s_j + margin)
    d_p = 1 - margin
    d_n = margin
  We can extend to non-binary labels with an indiactor function,
    L_circle = log(1 + sum_{i, j} I_{y_i > y_j}
                   exp(gamma * (a_j * (s_j - d_n) - a_i * (s_i - d_p)))),
  Note the loss takes only the similarity scores. We will clip any score value
  beyond 0 and 1 to confine the scores in [0, 1], please be aware of that.
  """

  def __init__(self,
               name,
               lambda_weight=None,
               gamma=64,
               margin=0.25,
               ragged=False):
    """Initializer.

    Args:
      name: A string used as the name for this loss.
      lambda_weight: A `_LambdaWeight` object.
      gamma: A float parameter used in circle loss.
      margin: A float parameter defining the margin in circle loss.
      ragged: A boolean indicating whether the input tensors are ragged.
    """
    super().__init__(
        name, lambda_weight=lambda_weight, temperature=1.0, ragged=ragged)
    self._margin = margin
    self._gamma = gamma

  def get_logits(self, logits):
    """See `_RankingLoss`."""
    # Add a clip to confine scores in [0, 1].
    return tf.clip_by_value(tf.convert_to_tensor(value=logits), 0., 1.)

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)

    def circle_loss_pairwise_op(score_i, score_j):
      alpha_i = tf.stop_gradient(
          tf.nn.relu(1 - score_i + self._margin), name='circle_loss_alpha_pos')
      alpha_j = tf.stop_gradient(
          tf.nn.relu(score_j + self._margin), name='circle_loss_alpha_neg')
      return alpha_i * (1 - score_i - self._margin) + alpha_j * (
          score_j - self._margin)

    pairwise_labels, pairwise_logits = _pairwise_comparison(
        labels, logits, mask, pairwise_logits_op=circle_loss_pairwise_op)
    pairwise_weights = tf.stop_gradient(
        pairwise_labels, name='weights_stop_gradient')
    # TODO: try lambda_weights for circle loss.
    # Pairwise losses and weights will be of shape
    # [batch_size, list_size, list_size].
    losses = tf.exp(self._gamma * pairwise_logits)

    # This computes the per-list losses and weights for circle loss.
    per_list_losses = tf.math.log1p(
        tf.reduce_sum(tf.math.multiply(losses, pairwise_weights), axis=[1, 2]))
    per_list_weights = tf.reduce_sum(
        pairwise_weights, axis=[1, 2]) / tf.reduce_sum(
            tf.cast(pairwise_weights > 0, tf.float32), axis=[1, 2])

    # Return per-list losses and weights with shape [batch_size, 1].
    return tf.expand_dims(per_list_losses,
                          1), tf.expand_dims(per_list_weights, 1)

circle_loss_pairwise_op computes alpha_i and alpha_j; unlike the other losses, _compute_unreduced_loss_impl already aggregates over all pairs and directly returns per-list losses and weights of shape [batch_size, 1].

SoftmaxLoss:https://dl.acm.org/doi/pdf/10.1145/3341981.3344221

The labels are normalized into a distribution over the list, and the cross entropy against the softmax of the logits is used as the loss: losses = tf.compat.v1.nn.softmax_cross_entropy_with_logits_v2(labels_for_softmax, logits_for_softmax)

The implementation:

class SoftmaxLoss(_ListwiseLoss):
  """Implements softmax loss."""

  def precompute(self, labels, logits, weights, mask=None):
    """Precomputes Tensors for softmax cross entropy inputs."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    ranks = _compute_ranks(logits, mask)
    # Reset the masked labels to 0 and reset the masked logits to a logit with
    # ~= 0 contribution in softmax.
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(mask, logits,
                                tf.math.log(_EPSILON) * tf.ones_like(logits))
    if self._lambda_weight is not None and isinstance(self._lambda_weight,
                                                      DCGLambdaWeight):
      labels = self._lambda_weight.individual_weights(labels, ranks)
    if weights is not None:
      labels *= weights
    return labels, logits

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    label_sum = tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True)
    # Padding for rows with label_sum = 0.
    nonzero_mask = tf.greater(tf.reshape(label_sum, [-1]), 0.0)
    padded_labels = tf.compat.v1.where(nonzero_mask, labels,
                                       _EPSILON * tf.ones_like(labels))
    padded_labels = tf.compat.v1.where(mask, padded_labels,
                                       tf.zeros_like(padded_labels))
    padded_label_sum = tf.reduce_sum(
        input_tensor=padded_labels, axis=1, keepdims=True)
    labels_for_softmax = tf.math.divide_no_nan(padded_labels, padded_label_sum)
    logits_for_softmax = logits
    # Padded labels have 0 weights in label_sum.
    weights_for_softmax = tf.reshape(label_sum, [-1])
    losses = tf.compat.v1.nn.softmax_cross_entropy_with_logits_v2(
        labels_for_softmax, logits_for_softmax)
    return losses, weights_for_softmax

  def compute(self, labels, logits, weights, reduction, mask=None):
    """See `_RankingLoss`."""
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)
    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights, mask)
    losses, weights = self._compute_unreduced_loss_impl(labels, logits, mask)
    return tf.compat.v1.losses.compute_weighted_loss(
        losses, weights, reduction=reduction)

  def eval_metric(self, labels, logits, weights, mask=None):
    """See `_RankingLoss`."""
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)
    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights, mask)
    losses, weights = self._compute_unreduced_loss_impl(labels, logits, mask)
    return tf.compat.v1.metrics.mean(losses, weights)

  def compute_per_list(self, labels, logits, weights, mask=None):
    """See `_RankingLoss`."""
    # Prepare input params.
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)

    # As opposed to the other listwise losses, SoftmaxLoss returns already
    # squeezed losses, which can be returned directly.
    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights, mask)
    return self._compute_unreduced_loss_impl(labels, logits, mask)

  def compute_unreduced_loss(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    labels, logits, _, mask = self._prepare_and_validate_params(
        labels, logits, None, mask)
    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights=None, mask=mask)
    return self._compute_unreduced_loss_impl(labels, logits, mask)

PolyOneSoftmaxLoss


https://zhuanlan.zhihu.com/p/534094714


PolyLoss provides a framework for understanding and improving the common cross-entropy and focal losses, inspired by their Taylor expansions; the poly-1 softmax loss simply adds an epsilon-weighted (1 - pt) term to the cross entropy:

pt = tf.reduce_sum(labels_for_softmax * tf.nn.softmax(logits_for_softmax), axis=-1)
ce = tf.compat.v1.nn.softmax_cross_entropy_with_logits_v2(labels_for_softmax, logits_for_softmax)
losses = ce + self._epsilon * (1 - pt)

Full implementation:

class PolyOneSoftmaxLoss(SoftmaxLoss):
  """Implements poly1 softmax loss."""

  def __init__(self,
               name,
               lambda_weight=None,
               epsilon=1.0,
               temperature=1.0,
               ragged=False):
    """Constructor.

    Args:
      name: A string used as the name for this loss.
      lambda_weight: A `_LambdaWeight` object.
      epsilon: A float number for contribution of the first polynomial.
      temperature: A float number to modify the logits=logits/temperature.
      ragged: A boolean indicating whether the input tensors are ragged.
    """
    super().__init__(
        name,
        lambda_weight=lambda_weight,
        temperature=temperature,
        ragged=ragged)
    self._epsilon = epsilon

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    label_sum = tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True)
    # Padding for rows with label_sum = 0.
    nonzero_mask = tf.greater(tf.reshape(label_sum, [-1]), 0.0)
    padded_labels = tf.compat.v1.where(nonzero_mask, labels,
                                       _EPSILON * tf.ones_like(labels))
    padded_labels = tf.compat.v1.where(mask, padded_labels,
                                       tf.zeros_like(padded_labels))
    padded_label_sum = tf.reduce_sum(
        input_tensor=padded_labels, axis=1, keepdims=True)
    labels_for_softmax = tf.math.divide_no_nan(padded_labels, padded_label_sum)
    logits_for_softmax = logits
    # Padded labels have 0 weights in label_sum.
    weights_for_softmax = tf.reshape(label_sum, [-1])
    pt = tf.reduce_sum(
        labels_for_softmax * tf.nn.softmax(logits_for_softmax), axis=-1)
    ce = tf.compat.v1.nn.softmax_cross_entropy_with_logits_v2(
        labels_for_softmax, logits_for_softmax)
    losses = ce + self._epsilon * (1 - pt)
    return losses, weights_for_softmax

UniqueSoftmaxLoss:https://arxiv.org/pdf/2001.01828.pdf

 Given the labels l_i and the logits s_i, the unique softmax loss is defined as
      -sum_i (2^l_i - 1) * log(exp(s_i) / (sum_j exp(s_j) + exp(s_i))),
  where j is over the documents with l_j < l_i.

class UniqueSoftmaxLoss(_ListwiseLoss):
  """Implements unique rating softmax loss."""

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(mask, logits,
                                tf.math.log(_EPSILON) * tf.ones_like(logits))
    pairwise_labels, _ = _pairwise_comparison(labels, logits, mask)
    # Used in denominator to compute unique softmax probability for each doc.
    denominator_logits = tf.expand_dims(logits, axis=1) * pairwise_labels
    denominator_logits = tf.concat(
        [denominator_logits, tf.expand_dims(logits, axis=2)], axis=2)
    denominator_mask = tf.concat(
        [pairwise_labels,
         tf.expand_dims(tf.ones_like(logits), axis=2)], axis=2)
    denominator_logits = tf.where(
        tf.greater(denominator_mask, 0.0), denominator_logits, -1e-3 +
        tf.reduce_min(denominator_logits) * tf.ones_like(denominator_logits))
    logits_max = tf.reduce_max(denominator_logits, axis=-1, keepdims=True)
    # Subtract the max so that exp(denominator_logits) is numerically valid.
    denominator_logits -= logits_max
    logits -= tf.squeeze(logits_max, axis=-1)
    # Set gains for loss weights.
    gains = tf.pow(2.0, labels) - 1
    # Compute the softmax loss for each doc.
    per_doc_softmax = -logits + tf.math.log(
        tf.reduce_sum(tf.exp(denominator_logits) * denominator_mask, axis=-1))
    losses = tf.reduce_sum(per_doc_softmax * gains, axis=1, keepdims=True)
    return losses, tf.ones_like(losses)

MixtureEMLoss: mixes the logits predicted by multiple models with an EM procedure.

class MixtureEMLoss(_ListwiseLoss):
  """Implements the Mixture EM loss with examination and relevance.

  An Expecatation-Maximization (EM) algorithm is used for estimation and this
  function.
  """

  def __init__(self, name, temperature=1.0, alpha=1.0, ragged=False):
    super().__init__(name, None, temperature, ragged)
    self._alpha = alpha

  def _compute_model_prob(self, per_list_logodds):
    """Computes the probability of models in EM.

    Args:
      per_list_logodds: A `Tensor` with shape [batch_size, 1, model_num].

    Returns:
      A `Tensor` of probability with shape [batch_size, 1, model_num].
    """
    with tf.compat.v1.name_scope(name='compute_model_prob'):
      return tf.stop_gradient(
          tf.exp(-self._alpha *
                 (per_list_logodds -
                  tf.reduce_min(per_list_logodds, axis=2, keepdims=True))))

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """Computes the loss for each element.

    Args:
      labels: A `Tensor` with shape [batch_size, list_size] representing clicks.
      logits: A `Tensor` with shape [batch_size, list_size, model_num], where
        the 3rd-dim is dimension for the models to mix.
      mask: A `Tensor` of the same shape as labels indicating which entries are
        valid for computing the loss.

    Returns:
      A tuple(losses, loss_weights).
    """
    if mask is None:
      mask = utils.is_label_valid(labels)
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    # The loss in the M step.
    # shape = [batch_size, list_size, model_num]
    losses = tf.stack([
        tf.compat.v1.nn.sigmoid_cross_entropy_with_logits(
            labels=labels, logits=model_logits)
        for model_logits in tf.unstack(logits, axis=-1)
    ],
                      axis=2)
    losses = tf.where(
        tf.expand_dims(mask, axis=-1), losses,
        tf.zeros_like(losses, dtype=tf.float32))

    # The model probability in the E step.
    losses_no_gradient = tf.stop_gradient(losses)
    # shape = [batch_size, 1, model_num]
    per_list_logodds = tf.reduce_sum(losses_no_gradient, axis=1, keepdims=True)
    model_prob = self._compute_model_prob(per_list_logodds)
    prob_norm = tf.reduce_sum(model_prob, axis=2, keepdims=True)

    label_sum = tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True)
    nonzero_mask = tf.greater(label_sum, 0.0)
    return tf.reshape(
        tf.reduce_sum(losses * model_prob / prob_norm, axis=[1, 2]),
        [-1, 1]), tf.cast(
            nonzero_mask, dtype=tf.float32)

ListMLELoss: see also the comparison with ListNet in section 1.4 above.


Given the ground-truth ordering (documents sorted by label), ListMLE directly minimizes the negative log-likelihood of that permutation under the Plackett-Luce model.

class ListMLELoss(_ListwiseLoss):
  """Implements ListMLE loss."""

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    # Reset the masked labels to 0 and reset the masked logits to a logit with
    # ~= 0 contribution.
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(mask, logits,
                                tf.math.log(_EPSILON) * tf.ones_like(logits))
    scores = tf.compat.v1.where(
        mask, labels,
        tf.reduce_min(input_tensor=labels, axis=1, keepdims=True) -
        1e-6 * tf.ones_like(labels))
    # Use a fixed ops-level seed and the randomness is controlled by the
    # graph-level seed.
    sorted_labels, sorted_logits = utils.sort_by_scores(
        scores, [labels, logits], shuffle_ties=True, seed=37)

    raw_max = tf.reduce_max(input_tensor=sorted_logits, axis=1, keepdims=True)
    sorted_logits = sorted_logits - raw_max
    sums = tf.cumsum(tf.exp(sorted_logits), axis=1, reverse=True)
    sums = tf.math.log(sums) - sorted_logits

    if self._lambda_weight is not None and isinstance(self._lambda_weight,
                                                      ListMLELambdaWeight):
      batch_size, list_size = tf.unstack(tf.shape(input=sorted_labels))
      sums *= self._lambda_weight.individual_weights(
          sorted_labels,
          tf.tile(tf.expand_dims(tf.range(list_size) + 1, 0), [batch_size, 1]))

    negative_log_likelihood = tf.reduce_sum(
        input_tensor=sums, axis=1, keepdims=True)
    return negative_log_likelihood, tf.ones_like(negative_log_likelihood)
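For intuition, a self-contained sketch (toy values, omitting the max-subtraction stabilization and tie shuffling used above) of the same Plackett-Luce negative log-likelihood:

import tensorflow as tf

labels = tf.constant([[2., 0., 1.]])
logits = tf.constant([[0.3, -0.5, 0.8]])

# The "ground truth" permutation: documents sorted by descending label.
order = tf.argsort(labels, axis=1, direction='DESCENDING')
s = tf.gather(logits, order, batch_dims=1)

# -log P(pi | s) = sum_i [log(sum_{j >= i} exp(s_j)) - s_i]
log_cumsum = tf.math.log(tf.cumsum(tf.exp(s), axis=1, reverse=True))
nll = tf.reduce_sum(log_cumsum - s, axis=1, keepdims=True)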

ApproxNDCGLoss: expresses the rank positions in NDCG with differentiable functions and optimizes the metric directly as a loss.


https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-164.pdf This paper describes how to approximate the rank position with a differentiable function; the sigmoid is only one choice, and the paper uses it as the running example.

 https://dl.acm.org/doi/pdf/10.1145/3331184.3331347

Implementation of ApproxNDCGLoss:

class ApproxNDCGLoss(_ListwiseLoss):
  """Implements ApproxNDCG loss."""

  # Use a different default temperature.
  def __init__(self, name, lambda_weight=None, temperature=0.1, ragged=False):
    """See `_ListwiseLoss`."""
    super().__init__(name, lambda_weight, temperature, ragged)

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(
        mask, logits, -1e3 * tf.ones_like(logits) +
        tf.reduce_min(input_tensor=logits, axis=-1, keepdims=True))

    label_sum = tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True)
    nonzero_mask = tf.greater(tf.reshape(label_sum, [-1]), 0.0)
    labels = tf.compat.v1.where(nonzero_mask, labels,
                                _EPSILON * tf.ones_like(labels))
    ranks = approx_ranks(logits)

    return -ndcg(labels, ranks), tf.reshape(
        tf.cast(nonzero_mask, dtype=tf.float32), [-1, 1])

Computing approximate rank positions from the logits:

def approx_ranks(logits):
  r"""Computes approximate ranks given a list of logits.

  Given a list of logits, the rank of an item in the list is one plus the total
  number of items with a larger logit. In other words,

    rank_i = 1 + \sum_{j \neq i} I_{s_j > s_i},

  where "I" is the indicator function. The indicator function can be
  approximated by a generalized sigmoid:

    I_{s_j < s_i} \approx 1/(1 + exp(-(s_j - s_i)/temperature)).

  This function approximates the rank of an item using this sigmoid
  approximation to the indicator function. This technique is at the core
  of "A general approximation framework for direct optimization of
  information retrieval measures" by Qin et al.

  Args:
    logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
      ranking score of the corresponding item.

  Returns:
    A `Tensor` of ranks with the same shape as logits.
  """
  list_size = tf.shape(input=logits)[1]
  x = tf.tile(tf.expand_dims(logits, 2), [1, 1, list_size])
  y = tf.tile(tf.expand_dims(logits, 1), [1, list_size, 1])
  pairs = tf.sigmoid(y - x)
  return tf.reduce_sum(input_tensor=pairs, axis=-1) + .5
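To see the approximation at work, a small self-contained sketch (values made up) reproduces the formula above and compares the soft ranks with the exact ranks:

import tensorflow as tf

logits = tf.constant([[1.0, 3.5, 2.1]])
# rank_i ~ 0.5 + sum_j sigmoid(s_j - s_i); the j = i term contributes 0.5.
pairs = tf.sigmoid(tf.expand_dims(logits, 1) - tf.expand_dims(logits, 2))
approx = tf.reduce_sum(pairs, axis=-1) + 0.5
print(approx.numpy())   # roughly [[2.67, 1.27, 2.05]] vs. exact ranks [3, 1, 2]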

Using these sigmoid-based soft ranks, the NDCG metric is computed:

def ndcg(labels, ranks=None, perm_mat=None):
  """Computes NDCG from labels and ranks.

  Args:
    labels: A `Tensor` with shape [batch_size, list_size], representing graded
      relevance.
    ranks: A `Tensor` of the same shape as labels, or [1, list_size], or None.
      If ranks=None, we assume the labels are sorted in their rank.
    perm_mat: A `Tensor` with shape [batch_size, list_size, list_size] or None.
      Permutation matrices with rows correpond to the ranks and columns
      correspond to the indices. An argmax over each row gives the index of the
      element at the corresponding rank.

  Returns:
    A `tensor` of NDCG, ApproxNDCG, or ExpectedNDCG of shape [batch_size, 1].
  """
  if ranks is not None and perm_mat is not None:
    raise ValueError('Cannot use both ranks and perm_mat simultaneously.')

  if ranks is None:
    list_size = tf.shape(labels)[1]
    ranks = tf.range(list_size) + 1
  discounts = 1. / tf.math.log1p(tf.cast(ranks, dtype=tf.float32))
  gains = _safe_default_gain_fn(tf.cast(labels, dtype=tf.float32))
  if perm_mat is not None:
    gains = tf.reduce_sum(
        input_tensor=perm_mat * tf.expand_dims(gains, 1), axis=-1)
  dcg = tf.reduce_sum(input_tensor=gains * discounts, axis=-1, keepdims=True)
  normalized_dcg = dcg * inverse_max_dcg(labels, gain_fn=_safe_default_gain_fn)

  return normalized_dcg

To avoid numerical problems when labels take extreme values (e.g. in distillation), a safe gain function is used:

def _safe_default_gain_fn(labels):
  """Calculates safe gain functions for NDCG.

  In applications such as distillation, the labels could have extreme values
  that might result in numerical error when using the original gain function.
  This should only be applied to NDCG related losses, but not DCG ones. It
  should be applied on both the numerator and the denominator of NDCG.

  Args:
    labels: A `Tensor` with shape [batch_size, list_size], representing graded
      relevance.
  Returns:
    A `tensor` of safe gain function values of shape [batch_size, list_size].
  """
  max_labels = tf.reduce_max(labels, axis=-1, keepdims=True)
  gains = tf.pow(2., labels - max_labels) - tf.pow(2., -max_labels)
  return gains

The ideal (maximum) DCG is obtained by sorting by the labels themselves; its inverse is used to normalize the DCG:

def inverse_max_dcg(labels,
                    gain_fn=lambda labels: tf.pow(2.0, labels) - 1.,
                    rank_discount_fn=lambda rank: 1. / tf.math.log1p(rank),
                    topn=None):
  """Computes the inverse of max DCG.

  Args:
    labels: A `Tensor` with shape [batch_size, list_size]. Each value is the
      graded relevance of the corresponding item.
    gain_fn: A gain function. By default this is set to: 2^label - 1.
    rank_discount_fn: A discount function. By default this is set to:
      1/log(1+rank).
    topn: An integer as the cutoff of examples in the sorted list.

  Returns:
    A `Tensor` with shape [batch_size, 1].
  """
  ideal_sorted_labels, = utils.sort_by_scores(labels, [labels], topn=topn)
  rank = tf.range(tf.shape(input=ideal_sorted_labels)[1]) + 1
  discounted_gain = gain_fn(ideal_sorted_labels) * rank_discount_fn(
      tf.cast(rank, dtype=tf.float32))
  discounted_gain = tf.reduce_sum(
      input_tensor=discounted_gain, axis=1, keepdims=True)
  return tf.compat.v1.where(
      tf.greater(discounted_gain, 0.), 1. / discounted_gain,
      tf.zeros_like(discounted_gain))
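For intuition, a self-contained toy example (independent of the library code above) computes the exact NDCG of one ranked list:

import tensorflow as tf

labels = tf.constant([[3., 1., 2.]])
ranks = tf.constant([[2., 1., 3.]])        # positions assigned by the model, 1-based
gains = tf.pow(2., labels) - 1.            # [7, 1, 3]
discounts = 1. / tf.math.log1p(ranks)
dcg = tf.reduce_sum(gains * discounts, axis=-1, keepdims=True)
# The ideal ordering places label 3 first, then 2, then 1.
max_dcg = tf.reduce_sum(tf.constant([[7., 3., 1.]]) /
                        tf.math.log1p(tf.constant([[1., 2., 3.]])),
                        axis=-1, keepdims=True)
ndcg_value = dcg / max_dcg                 # < 1.0 because the order is not ideal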


 

ApproxMRRLoss: approximates the MRR metric; the reciprocal rank is expressed with a differentiable function, and the resulting (negative) MRR is used as the loss.

MRR (Mean Reciprocal Rank) reflects whether the items we retrieve are placed at positions the user can easily see; it emphasizes position and ordering. With N queries and p_i the position of the user's actually-visited item in the i-th result list, MRR = (1/N) * sum_i 1/p_i; if the item does not appear in the list, p_i is infinite and 1/p_i is 0.

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2008-164.pdf

class ApproxMRRLoss(_ListwiseLoss):
  """Implements ApproxMRR loss."""

  # Use a different default temperature.
  def __init__(self, name, lambda_weight=None, temperature=0.1, ragged=False):
    """See `_ListwiseLoss`."""
    super().__init__(name, lambda_weight, temperature, ragged)

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(
        mask, logits, -1e3 * tf.ones_like(logits) +
        tf.math.reduce_min(input_tensor=logits, axis=-1, keepdims=True))

    label_sum = tf.math.reduce_sum(input_tensor=labels, axis=1, keepdims=True)

    nonzero_mask = tf.math.greater(tf.reshape(label_sum, [-1]), 0.0)
    labels = tf.compat.v1.where(nonzero_mask, labels,
                                _EPSILON * tf.ones_like(labels))

    rr = 1. / approx_ranks(logits)
    rr = tf.math.reduce_sum(input_tensor=rr * labels, axis=-1, keepdims=True)
    mrr = rr / tf.math.reduce_sum(input_tensor=labels, axis=-1, keepdims=True)
    return -mrr, tf.reshape(tf.cast(nonzero_mask, dtype=tf.float32), [-1, 1])

NeuralSortNDCGLoss: another differentiable NDCG loss (PiRank-NDCG), which approximates the permutation matrix with the NeuralSort trick.

class NeuralSortNDCGLoss(_ListwiseLoss):
  """Implements PiRank-NDCG loss.

  The PiRank-NDCG loss is a differentiable approximation of the NDCG metric
  using the NeuralSort trick, which generates a permutation matrix based on
  ranking scores. Please refer to https://arxiv.org/abs/2012.06731 for the
  PiRank method. For PiRank-NDCG in specific,
    NDCG_metric = - sum_i (2^y_i - 1) / log(1 + r_i) / maxDCG,
  where y_i and r_i are the label and the score rank of the ith document
  respectively. This metric can be also written as the sum over rank r with an
  indicator function I,
    NDCG_metric = - sum_{i,r} (2^y_i - 1) / log(1 + r) * I(r, r_i) / maxDCG,
  where the indicator function I(r, r_i) = 1 if r = r_i and 0 otherwise, which
  is the permutation matrix.

  Approximated with a differentiable permutation matrix using neural sort,
    PiRank-NDCG = - sum_{i,r} (2^y_i - 1) / log(1 + r) * P(r, i) / maxDCG,
  where P(r, i) is the approximation of the permutation matrix.
  """

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(mask, logits, tf.zeros_like(logits))

    label_sum = tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True)
    nonzero_mask = tf.greater(tf.reshape(label_sum, [-1]), 0.0)
    # shape = [batch_size, list_size].
    labels = tf.compat.v1.where(nonzero_mask, labels,
                                _EPSILON * tf.ones_like(labels))
    # shape = [batch_size, list_size, list_size].
    smooth_perm = neural_sort(logits, mask=mask)

    return -ndcg(
        labels, perm_mat=smooth_perm), tf.reshape(
            tf.cast(nonzero_mask, dtype=tf.float32), [-1, 1])


def neural_sort(logits, name=None, mask=None):
  r"""Generate the permutation matrix from logits by deterministic neuralsort.

  The sort on a list of logits can be approximated by a differentiable
  permutation matrix using Neural Sort (https://arxiv.org/abs/1903.08850).
  The approximation is achieved by constructing a list of functions on logits,
    fn_i(k) = (list_size + 1 - 2*i) * logit_k - sum_j |logit_k - logit_j|,
  whose value is maximal when k is at the ith largest logit.
  So that the permutation matrix can be expressed as
           / 1 if j = argmax_k fn_i(k)
    P_ij = |                           = one_hot(argmax(fn_i(j))).
           \ 0 otherwise
  And the differentiable approximation of the matrix is applied with softmax,
    P^_ij = softmax(fn_i(j) / temperature),
  where the parameter temperature tunes the smoothiness of the approximation.

  #### References
  [1]: Aditya Grover, Eric Wang, Aaron Zweig, Stefano Ermon.
       Stochastic Optimization of Sorting Networks via Continuous Relaxations.
       https://arxiv.org/abs/1903.08850

  Args:
    logits: A `Tensor` with shape [batch_size, list_size]. Each value is the
      ranking score of the corresponding item. (We are using logits here,
      noticing the original paper is using probability weights, i.e., the
      exponentials of the logits).
    name: A string used as the name for this loss.
    mask: A `Tensor` with the same shape as logits indicating which entries are
      valid for computing the neural_sort. Invalid entries are pushed to the
      end.

  Returns:
    A tensor of permutation matrices whose dimension is [batch_size, list_size,
    list_size].
  """
  with tf.compat.v1.name_scope(name, 'neural_sort', [logits]):
    if mask is None:
      mask = tf.ones_like(logits, dtype=tf.bool)

    # Reset logits to 0 and compute number of valid entries for each list in the
    # batch.
    logits = tf.where(mask, logits, tf.zeros_like(logits))
    num_valid_entries = tf.reduce_sum(
        tf.cast(mask, dtype=tf.int32), axis=1, keepdims=True)

    # Compute logit differences and mask out invalid entries.
    logit_diff = tf.abs(tf.expand_dims(logits, 2) - tf.expand_dims(logits, 1))
    valid_pair_mask = _apply_pairwise_op(tf.logical_and, mask)
    logit_diff = tf.where(valid_pair_mask, logit_diff,
                          tf.zeros_like(logit_diff))
    # shape = [batch_size, 1, list_size].
    logit_diff_sum = tf.reduce_sum(
        input_tensor=logit_diff, axis=1, keepdims=True)

    # Compute masked range so that masked items do not influence scaling.
    masked_range = tf.cumsum(tf.cast(mask, dtype=tf.int32), axis=1)
    scaling = tf.cast(
        num_valid_entries + 1 - 2 * masked_range, dtype=tf.float32)
    # shape = [batch_size, list_size].
    scaling = tf.expand_dims(scaling, 2)
    # shape = [batch_size, list_size, list_size].
    # Use broadcast to align the dims.
    scaled_logits = scaling * tf.expand_dims(logits, 1)

    p_logits = scaled_logits - logit_diff_sum

    # Masked entries will be forcefully kept in-place by setting their values to
    # -inf everywhere, except for masked rows where they share equal probability
    # with other masked items.
    p_logits = tf.where(valid_pair_mask, p_logits, -math.inf)
    p_logits = tf.where(
        _apply_pairwise_op(tf.logical_or, mask), p_logits,
        tf.zeros_like(p_logits))

    # By swapping the rows of masked items to the end of the permutation matrix,
    # we force masked items to be placed last.
    sorted_mask_indices = tf.argsort(
        tf.cast(mask, dtype=tf.int32),
        axis=1,
        direction='DESCENDING',
        stable=True)
    p_logits = tf.gather(p_logits, sorted_mask_indices, batch_dims=1, axis=1)

    smooth_perm = tf.nn.softmax(p_logits, -1)

    return smooth_perm
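
To make the behavior of neural_sort concrete, here is a minimal sketch on a toy list (my own example; it assumes the neural_sort function above and its helpers, e.g. _apply_pairwise_op, are available in the current scope, such as a local copy of TF-Ranking's losses_impl.py):

import tensorflow as tf

logits = tf.constant([[1.0, 3.0, 2.0]])        # one list with three items
mask = tf.constant([[True, True, True]])

smooth_perm = neural_sort(logits, mask=mask)   # shape [1, 3, 3]

# Row i is a soft one-hot vector over the item placed at rank i: row 0
# concentrates on index 1 (logit 3.0), row 1 on index 2, row 2 on index 0,
# and every row sums to 1 because of the row-wise softmax.
print(smooth_perm.numpy())
print(tf.reduce_sum(smooth_perm, axis=-1).numpy())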

CoupledRankDistilLoss: a listwise distillation loss; it samples permutations from the teacher scores (the labels) and trains the student by maximizing the top-k Plackett-Luce likelihood of those permutations under the student logits.

class CoupledRankDistilLoss(_ListwiseLoss):
  r"""Implements Coupled-RankDistil loss.

  The Coupled-RankDistil loss ([Reddi et al, 2021][reddi2021]) is the
  cross-entropy between k-Plackett's probability of logits (student) and labels
  (teacher).

  The k-Plackett's probability model is defined as:
  $$
  \mathcal{P}_k(\pi|s) = \frac{1}{(N-k)!}
  \prod_{i=1}^k \frac{\exp(s_{\pi(i)})}{\sum_{j=i}^N \exp(s_{\pi(j)})}.
  $$

  The Coupled-RankDistil loss is defined as:
  $$
  \mathcal{L}(y, s) = -\sum_{\pi} \mathcal{P}_k(\pi|y) \log\mathcal{P}_k(\pi|s) \\
  = \mathbb{E}_{\pi \sim \mathcal{P}_k(\cdot|y)} [-\log \mathcal{P}_k(\pi|s)]
  $$

    References:
    - [RankDistil: Knowledge Distillation for Ranking, Reddi et al,
       2021][reddi2021]

  [reddi2021]: https://research.google/pubs/pub50695/
  """

  def __init__(self,
               name,
               sample_size,
               topk=None,
               temperature=1.,
               ragged=False):
    """Initializer.

    Args:
      name: A string used as the name for this loss.
      sample_size: Number of permutations to sample from teacher scores.
      topk: top-k entries over which order is matched. A penalty is applied over
        non top-k items.
      temperature: A float number to modify the logits as
        `logits=logits/temperature`.
      ragged: A boolean indicating whether the input tensors are ragged.
    """
    super().__init__(name, None, temperature, ragged)
    self._sample_size = sample_size
    self._topk = topk

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    """See `_RankingLoss`."""
    if mask is None:
      mask = utils.is_label_valid(labels)
    labels = tf.where(mask, labels, tf.zeros_like(labels))
    label_sum = tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True)
    nonzero_mask = tf.greater(tf.reshape(label_sum, [-1]), 0.0)

    teacher_scores = tf.where(mask, labels,
                              tf.math.log(_EPSILON) * tf.ones_like(labels))

    student_scores = tf.where(mask, logits,
                              tf.math.log(_EPSILON) * tf.ones_like(logits))

    # Sample teacher scores.
    # [batch_size, list_size] -> [batch_size, sample_size, list_size].
    sampled_teacher_scores = tf.expand_dims(teacher_scores, 1)
    sampled_teacher_scores = tf.repeat(
        sampled_teacher_scores, [self._sample_size], axis=1)

    batch_size, list_size = tf.unstack(tf.shape(input=labels))
    sampled_teacher_scores += _sample_gumbel(
        [batch_size, self._sample_size, list_size], seed=37)
    sampled_teacher_scores = tf.math.log(
        tf.nn.softmax(sampled_teacher_scores) + _EPSILON)

    # Expand student scores.
    # [batch_size, list_size] -> [batch_size, sample_size, list_size].
    expanded_student_scores = tf.expand_dims(student_scores, 1)
    expanded_student_scores = tf.repeat(
        expanded_student_scores, [self._sample_size], axis=1)

    # Sort teacher scores and student scores to obtain top-k student scores
    # whose order is based on teacher scores.
    sorted_student_scores = utils.sort_by_scores(
        utils.reshape_first_ndims(sampled_teacher_scores, 2,
                                  [batch_size * self._sample_size]),
        [
            utils.reshape_first_ndims(expanded_student_scores, 2,
                                      [batch_size * self._sample_size])
        ],
        shuffle_ties=True,
        seed=37)[0]
    sorted_student_scores = utils.reshape_first_ndims(
        sorted_student_scores, 1, [batch_size, self._sample_size])
    topk = self._topk or list_size
    topk_student_scores = sorted_student_scores[:, :, :topk]

    # For \pi from teacher scores, compute top-k Plackett's probability as:
    # \prod_{i=1}^k exp(s_{\pi(i)}) / \sum_{j=i}^N exp(s_{\pi(j)}).

    # Compute the denominator mask for \sum_{j=i}^N exp(s_{\pi(j)}).
    # We apply logsumexp over valid entries in this mask.
    # topk_pl_denominator_mask = batch x sample_size x valid_denom_entries,
    # where valid_denom_entries = [[1 1 1 1 1 1]
    #                             [0 1 1 1 1 1]
    #                             [0 0 1 1 1 1]].
    # An alternative implementation would be to use `cumulative_logsumexp` with
    # `reverse=True` to compute the denominator term.
    ones = tf.ones((topk, list_size), dtype=tf.float32)
    ones_upper = tf.linalg.band_part(ones, 0, -1)
    topk_pl_denominator_mask = tf.tile(
        tf.expand_dims(ones_upper, axis=0),
        [batch_size * self._sample_size, 1, 1])
    # [batch_size * sample_size, topk, list_size] ->
    # [batch_size, sample_size, topk, list_size].
    topk_pl_denominator_mask = tf.cast(
        utils.reshape_first_ndims(topk_pl_denominator_mask, 1,
                                  [batch_size, self._sample_size]),
        dtype=tf.bool)
    sorted_student_scores = tf.tile(
        tf.expand_dims(sorted_student_scores, 2), [1, 1, topk, 1])

    sorted_student_scores_denom = tf.where(
        topk_pl_denominator_mask, sorted_student_scores,
        tf.math.log(_EPSILON) * tf.ones_like(sorted_student_scores))
    logprob = topk_student_scores - tf.math.reduce_logsumexp(
        sorted_student_scores_denom, axis=3)
    # Compute log-likelihood over top-k Plackett-Luce scores.
    # [batch_size, sample_size, topk] -> [batch_size, sample_size].
    logprob = tf.reduce_sum(logprob, axis=2)

    # Compute RankDistil loss as a mean over samples.
    # [batch_size, sample_size] -> [batch_size, 1].
    nll = tf.reduce_mean(-logprob, axis=1, keepdims=True)

    return nll, tf.reshape(tf.cast(nonzero_mask, dtype=tf.float32), [-1, 1])
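
To see what a single term of this loss looks like, here is a small self-contained sketch (toy numbers of my own, not the class above) of the top-k Plackett-Luce log-likelihood that the masked logsumexp in _compute_unreduced_loss_impl computes for one sampled permutation:

import tensorflow as tf

# Student scores already reordered by one permutation sampled from the teacher.
student_scores = tf.constant([2.0, 1.0, 0.5, -1.0])
topk = 2

# log P_k(pi|s) = sum_{i<k} [ s_{pi(i)} - logsumexp_{j>=i} s_{pi(j)} ]
# (the constant 1/(N-k)! factor is dropped because it does not depend on s).
logprob = 0.0
for i in range(topk):
  logprob += student_scores[i] - tf.reduce_logsumexp(student_scores[i:])

nll = -logprob  # one permutation's contribution; the loss averages this over samples
print(float(nll))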

3、实例

3.1、各种loss基类实现

import tensorflow as tf
_EPSILON = 1e-10

def is_label_valid(labels):
  labels = tf.convert_to_tensor(value=labels)
  return tf.greater_equal(labels, 0.)

def _check_tensor_shapes(tensors):
  for tensor in tensors:
    tensor = tf.convert_to_tensor(value=tensor)
    tensor.get_shape().assert_has_rank(2)
    tensor.get_shape().assert_is_compatible_with(
        tf.convert_to_tensor(value=tensors[0]).get_shape())

def _apply_pairwise_op(op, tensor):
  _check_tensor_shapes([tensor])
  return op(tf.expand_dims(tensor, 2), tf.expand_dims(tensor, 1))

def _pairwise_comparison(labels, logits, mask, pairwise_logits_op=tf.subtract):
  pairwise_label_diff = _apply_pairwise_op(tf.subtract, labels)
  pairwise_logits = _apply_pairwise_op(pairwise_logits_op, logits)
  pairwise_labels = tf.cast(
      tf.greater(pairwise_label_diff, 0), dtype=tf.float32)
  valid_pair = _apply_pairwise_op(tf.logical_and, mask)
  pairwise_labels *= tf.cast(valid_pair, dtype=tf.float32)
  return pairwise_labels, pairwise_logits
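
# Worked example of _pairwise_comparison on a toy list (values written out by
# hand, assuming labels = [[0., 0., 1.]], logits = [[1., 3., 2.]], full mask):
#   pairwise_labels[0] = [[0., 0., 0.],
#                         [0., 0., 0.],
#                         [1., 1., 0.]]    # 1 wherever label_i > label_j
#   pairwise_logits[0] = [[ 0., -2., -1.],
#                         [ 2.,  0.,  1.],
#                         [ 1., -1.,  0.]]  # entry (i, j) is logit_i - logit_j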

def sort_by_scores(scores,
                   features_list,
                   topn=None,
                   shuffle_ties=True,
                   seed=None,
                   mask=None):
  with tf.compat.v1.name_scope(name='sort_by_scores'):
    scores = tf.cast(scores, tf.float32)
    scores.get_shape().assert_has_rank(2)
    list_size = tf.shape(input=scores)[1]
    if topn is None:
      topn = list_size
    topn = tf.minimum(topn, list_size)
    _, indices = tf.math.top_k(scores, topn, sorted=True)
    return [tf.gather(f, indices, batch_dims=1, axis=1) for f in features_list]

def sorted_ranks(scores, shuffle_ties=True, seed=None):
  with tf.compat.v1.name_scope(name='sorted_ranks'):
    batch_size, list_size = tf.unstack(tf.shape(input=scores))
    positions = tf.tile(tf.expand_dims(tf.range(list_size), 0), [batch_size, 1])
    sorted_positions = sort_by_scores(
        scores, [positions], shuffle_ties=shuffle_ties, seed=seed)[0]
    ranks = tf.argsort(sorted_positions) + 1
    return ranks


def _compute_ranks(logits, is_valid):
  _check_tensor_shapes([logits, is_valid])
  scores = tf.compat.v1.where(
      is_valid, logits, -1e-6 * tf.ones_like(logits) +
      tf.reduce_min(input_tensor=logits, axis=1, keepdims=True))
  return sorted_ranks(scores)

class _RankingLoss(object):
  def __init__(self, name, lambda_weight=None, temperature=1.0, ragged=False):
    self._name = name
    self._lambda_weight = lambda_weight
    self._temperature = temperature
    self._ragged = ragged

  @property
  def name(self):
    return self._name

  def _prepare_and_validate_params(self, labels, logits, weights, mask):
    labels = tf.convert_to_tensor(labels)
    logits = tf.convert_to_tensor(logits)
    if weights is None:
      weights = 1.0
    if mask is None:
      mask = is_label_valid(labels)
    weights = tf.convert_to_tensor(weights)
    mask = tf.convert_to_tensor(mask)
    return labels, logits, weights, mask

  def compute_unreduced_loss(self, labels, logits, mask=None):
    labels, logits, _, mask = self._prepare_and_validate_params(
        labels, logits, None, mask)
    return self._compute_unreduced_loss_impl(labels, logits, mask)

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    raise NotImplementedError('Calling an abstract method.')

  def normalize_weights(self, labels, weights):
    return self._normalize_weights_impl(labels, weights)

  def _normalize_weights_impl(self, labels, weights):
    del labels
    return 1.0 if weights is None else weights

  def get_logits(self, logits):
    if not tf.is_tensor(logits):
      logits = tf.convert_to_tensor(value=logits)
    return logits / self._temperature

  def compute(self, labels, logits, weights, reduction, mask=None):
    logits = self.get_logits(logits)
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)
    return tf.compat.v1.losses.compute_weighted_loss(
        losses, weights, reduction=reduction)

  def compute_per_list(self, labels, logits, weights, mask=None):
    raise NotImplementedError('Calling an abstract method.')

  def eval_metric(self, labels, logits, weights, mask=None):
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)
    return tf.compat.v1.metrics.mean(losses, weights)

class _PointwiseLoss(_RankingLoss):
  def _normalize_weights_impl(self, labels, weights):
    if weights is None:
      weights = 1.
    return tf.compat.v1.where(
        is_label_valid(labels),
        tf.ones_like(labels) * weights, tf.zeros_like(labels))

  def compute_per_list(self, labels, logits, weights, mask=None):
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)

    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)

    weighted_per_item_loss = tf.math.multiply(losses, weights)

    per_list_weights = tf.reduce_sum(weights, axis=1)

    per_list_losses = tf.reduce_sum(weighted_per_item_loss, axis=1)

    per_list_losses = tf.math.divide_no_nan(per_list_losses, per_list_weights)
    return per_list_losses, per_list_weights

class _PairwiseLoss(_RankingLoss):
  def _pairwise_loss(self, pairwise_logits):
    raise NotImplementedError('Calling an abstract method.')

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    if mask is None:
      mask = is_label_valid(labels)
    ranks = _compute_ranks(logits, mask)
    pairwise_labels, pairwise_logits = _pairwise_comparison(
        labels, logits, mask)
    pairwise_weights = pairwise_labels
    pairwise_weights = tf.stop_gradient(
        pairwise_weights, name='weights_stop_gradient')
    return self._pairwise_loss(pairwise_logits), pairwise_weights

  def compute_per_list(self, labels, logits, weights, mask=None):
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)
    weighted_per_pair_loss = tf.math.multiply(losses, weights)
    per_list_weights = tf.reduce_sum(weights, axis=[1, 2])
    per_list_losses = tf.reduce_sum(weighted_per_pair_loss, axis=[1, 2])
    per_list_losses = tf.math.divide_no_nan(per_list_losses, per_list_weights)
    return per_list_losses, per_list_weights

  def _normalize_weights_impl(self, labels, weights):
    if weights is None:
      weights = 1.
    weights = tf.compat.v1.where(
        is_label_valid(labels),
        tf.ones_like(labels) * weights, tf.zeros_like(labels))
    return tf.expand_dims(weights, axis=2)



class _ListwiseLoss(_RankingLoss):

  def _normalize_weights_impl(self, labels, weights):
    if weights is None:
      return 1.0
    else:
      weights = tf.convert_to_tensor(value=weights)
      labels = tf.convert_to_tensor(value=labels)
      is_valid = is_label_valid(labels)
      labels = tf.where(is_valid, labels, tf.zeros_like(labels))
      return tf.compat.v1.math.divide_no_nan(
          tf.reduce_sum(input_tensor=(weights * labels), axis=1, keepdims=True),
          tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True))

  def compute_per_list(self, labels, logits, weights, mask=None):
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)
    losses, loss_weights = self._compute_unreduced_loss_impl(
        labels, logits, mask)
    weights = tf.multiply(
        self._normalize_weights_impl(labels, weights), loss_weights)
    per_list_losses = tf.squeeze(losses, axis=1)
    per_list_weights = tf.squeeze(weights, axis=1)
    return per_list_losses, per_list_weights

class SigmoidCrossEntropyLoss(_PointwiseLoss):
  def __init__(self, name, temperature=1.0, ragged=False):
    super().__init__(name, None, temperature, ragged)
  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    if mask is None:
      mask = is_label_valid(labels)
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(mask, logits, tf.zeros_like(logits))
    losses = tf.compat.v1.nn.sigmoid_cross_entropy_with_logits(
        labels=labels, logits=logits)
    return losses, tf.cast(mask, dtype=tf.float32)

class PairwiseLogisticLoss(_PairwiseLoss):
  def _pairwise_loss(self, pairwise_logits):
    return tf.nn.relu(-pairwise_logits) + tf.math.log1p(
        tf.exp(-tf.abs(pairwise_logits)))

class SoftmaxLoss(_ListwiseLoss):
  def precompute(self, labels, logits, weights, mask=None):
    if mask is None:
      mask = is_label_valid(labels)
    ranks = _compute_ranks(logits, mask)
    # Reset the masked labels to 0 and reset the masked logits to a logit with
    # ~= 0 contribution in softmax.
    labels = tf.compat.v1.where(mask, labels, tf.zeros_like(labels))
    logits = tf.compat.v1.where(mask, logits,
                                tf.math.log(_EPSILON) * tf.ones_like(logits))
    if weights is not None:
      labels *= weights
    return labels, logits

  def _compute_unreduced_loss_impl(self, labels, logits, mask=None):
    if mask is None:
      mask = is_label_valid(labels)
    label_sum = tf.reduce_sum(input_tensor=labels, axis=1, keepdims=True)
    nonzero_mask = tf.greater(tf.reshape(label_sum, [-1]), 0.0)
    padded_labels = tf.compat.v1.where(nonzero_mask, labels,
                                       _EPSILON * tf.ones_like(labels))
    padded_labels = tf.compat.v1.where(mask, padded_labels,
                                       tf.zeros_like(padded_labels))
    padded_label_sum = tf.reduce_sum(
        input_tensor=padded_labels, axis=1, keepdims=True)
    labels_for_softmax = tf.math.divide_no_nan(padded_labels, padded_label_sum)
    logits_for_softmax = logits
    weights_for_softmax = tf.reshape(label_sum, [-1])
    losses = tf.compat.v1.nn.softmax_cross_entropy_with_logits_v2(
        labels_for_softmax, logits_for_softmax)
    return losses, weights_for_softmax

  def compute(self, labels, logits, weights, reduction, mask=None):
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)
    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights, mask)
    losses, weights = self._compute_unreduced_loss_impl(labels, logits, mask)
    return tf.compat.v1.losses.compute_weighted_loss(
        losses, weights, reduction=reduction)

  def eval_metric(self, labels, logits, weights, mask=None):
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)
    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights, mask)
    losses, weights = self._compute_unreduced_loss_impl(labels, logits, mask)
    return tf.compat.v1.metrics.mean(losses, weights)

  def compute_per_list(self, labels, logits, weights, mask=None):
    labels, logits, weights, mask = self._prepare_and_validate_params(
        labels, logits, weights, mask)

    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights, mask)
    return self._compute_unreduced_loss_impl(labels, logits, mask)

  def compute_unreduced_loss(self, labels, logits, mask=None):
    labels, logits, _, mask = self._prepare_and_validate_params(
        labels, logits, None, mask)
    logits = self.get_logits(logits)
    labels, logits = self.precompute(labels, logits, weights=None, mask=mask)
    return self._compute_unreduced_loss_impl(labels, logits, mask)
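
With the base classes above, any of these losses can be wrapped as an ordinary loss function. A minimal sketch of such wiring (hypothetical usage; it assumes the classes are importable from the rank_loss module used by the tests below):

import tensorflow as tf
from rank_loss import SoftmaxLoss

loss_fn = SoftmaxLoss(name='softmax_loss', temperature=1.0)

def listwise_loss(y_true, y_pred):
  # y_true / y_pred: [batch_size, list_size] graded relevance labels and scores.
  return loss_fn.compute(
      y_true, y_pred, weights=None,
      reduction=tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS)

# e.g. model.compile(optimizer='adam', loss=listwise_loss)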

3.2、pointwise测试用例

3.2.1、代码实现

# pointwise测试
import tensorflow as tf
from rank_loss import SigmoidCrossEntropyLoss
import math

def _sigmoid_cross_entropy(labels, logits):
  def per_position_loss(logit, label):
    return max(logit, 0) - logit * label + math.log(1 + math.exp(-abs(logit)))

  return sum(
      per_position_loss(logit, label) for label, logit in zip(labels, logits))

class SigmoidCrossEntropyLossTest(tf.test.TestCase):

  def test_sigmoid_cross_entropy_loss(self):
    scores = [[0.2, 0.5, 0.3], [0.2, 0.3, 0.5], [0.2, 0.3, 0.5]]
    labels = [[0., 0., 1.], [0., 0., 2.], [0., 0., 0.]]
    weights = [[2.], [1.], [1.]]
    reduction = tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS
    loss_fn = SigmoidCrossEntropyLoss(name=None)
    loss1=loss_fn.compute(labels, scores, None, reduction).numpy()
    loss2=(_sigmoid_cross_entropy(labels[0], scores[0]) +
          _sigmoid_cross_entropy(labels[1], scores[1]) +
         _sigmoid_cross_entropy(labels[2], scores[2])) / 9.
    print("loss1:",loss1)
    print("loss2:",loss2)

    loss3=loss_fn.compute(labels, scores, weights, reduction).numpy()
    loss4=(_sigmoid_cross_entropy(labels[0], scores[0]) * 2.0 +
         _sigmoid_cross_entropy(labels[1], scores[1]) +
         _sigmoid_cross_entropy(labels[2], scores[2])) / 9
    print("weight loss3:",loss3)
    print("weight loss4:",loss4)

  def test_sigmoid_cross_entropy_loss_should_handle_mask(self):
    scores = [[1., 3., 2.]]
    labels = [[0., 1., 1.]]
    mask = [[True, False, True]]
    reduction = tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS

    loss_fn = SigmoidCrossEntropyLoss(name=None)
    result = loss_fn.compute(labels, scores, None, reduction, mask).numpy()
    loss=(math.log(1. + math.exp(-2.)) + math.log(1. + math.exp(1.))) / 2.

    print("result:",result)
    print("loss:",loss)


scel=SigmoidCrossEntropyLossTest()
scel.test_sigmoid_cross_entropy_loss()
scel.test_sigmoid_cross_entropy_loss_should_handle_mask()

3.2.2、运行结果

# plain sigmoid cross-entropy loss
loss1: 0.7310792
loss2: 0.7310792548989641

# weighted loss comparison
weight loss3: 0.9895871
weight loss4: 0.9895871546801003

# results with part of the data masked out
result: 0.72009486
loss: 0.7200948492805977
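
The mask case can be checked by hand: only the two unmasked items contribute, giving (log(1 + e^{1}) + log(1 + e^{-2})) / 2 ≈ (1.3133 + 0.1269) / 2 ≈ 0.7201, which matches the printed result.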

3.3、pairwise测试用例

3.3.1、代码实现

# pairwise测试
import tensorflow as tf
from rank_loss import PairwiseLogisticLoss
import math

class PairwiseLogisticLossTest(tf.test.TestCase):

  def test_pairwise_logistic_loss(self):
    scores = [[1., 3., 2.], [1., 2., 3.]]
    labels = [[0., 0., 1.], [0., 0., 2.]]
    reduction = tf.compat.v1.losses.Reduction.MEAN

    loss_fn = PairwiseLogisticLoss(name=None)
    result = loss_fn.compute(labels, scores, weights=None, reduction=reduction)

    logloss = lambda x: math.log(1. + math.exp(-x))
    expected = (logloss(3. - 2.) + logloss(1. - 2.) + logloss(3. - 1.) +
                logloss(3. - 2.)) / 4.

    print("result:",result)
    print("expected:",expected)

  def test_pairwise_logistic_loss_should_handle_per_list_weights(self):
    scores = [[1., 3., 2.], [1., 2., 3.]]
    labels = [[0., 0., 1.], [0., 0., 2.]]
    weights = [[1.], [2.]]
    reduction = tf.compat.v1.losses.Reduction.MEAN

    loss_fn = PairwiseLogisticLoss(name=None)
    result = loss_fn.compute(
        labels, scores, weights=weights, reduction=reduction)

    logloss = lambda x: math.log(1. + math.exp(-x))
    expected = (1. * (logloss(3. - 2.) + logloss(1. - 2.)) + 2. *
                (logloss(3. - 2.) + logloss(3. - 1.))) / 6.
    print("result:",result)
    print("expected:",expected)

pll=PairwiseLogisticLossTest()
pll.test_pairwise_logistic_loss()
pll.test_pairwise_logistic_loss_should_handle_per_list_weights()

3.3.2、运行结果

# unweighted pairwise loss
result: tf.Tensor(0.5166783, shape=(), dtype=float32)
expected: 0.5166782683994103
# pairwise loss with per-list weights
result: tf.Tensor(0.41781712, shape=(), dtype=float32)
expected: 0.4178171286931394
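
In the unweighted case there are four positive pairs, so expected = (logloss(1) + logloss(-1) + logloss(2) + logloss(1)) / 4 ≈ (0.3133 + 1.3133 + 0.1269 + 0.3133) / 4 ≈ 0.5167, matching the printed value.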

3.4、listwise测试用例

3.4.1、代码实现

# listwise测试
import tensorflow as tf
from rank_loss import SoftmaxLoss
import math

def _softmax(values):
  total = sum(math.exp(v) for v in values)
  return [math.exp(v) / (1e-20 + total) for v in values]

class SoftmaxLossTest(tf.test.TestCase):

  def test_softmax_loss(self):
    scores = [[1., 3., 2.], [1., 2., 3.], [1., 2., 3.]]
    labels = [[0., 0., 1.], [0., 0., 2.], [0., 0., 0.]]
    reduction = tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS

    loss_fn = SoftmaxLoss(name=None)
    result = loss_fn.compute(labels, scores, None, reduction).numpy()
    expect=-(math.log(_softmax(scores[0])[2]) +
      math.log(_softmax(scores[1])[2]) * 2.) / 2.
    print("result:",result)
    print("expect:",expect)

  def test_softmax_loss_should_handle_per_example_weights(self):
    scores = [[1., 3., 2.], [1., 2., 3.], [1., 2., 3.]]
    labels = [[0., 0., 1.], [1., 1., 2.], [0., 0., 0.]]
    example_weights = [[1., 1., 1.], [1., 2., 3.], [1., 0., 1.]]
    reduction = tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS
    probs = [_softmax(s) for s in scores]

    loss_fn = SoftmaxLoss(name=None)
    result = loss_fn.compute(labels, scores, example_weights, reduction).numpy()
    expect=-(math.log(probs[0][2]) * 1. + math.log(probs[1][0]) * 1. * 1. +
          math.log(probs[1][1]) * 1. * 2. + math.log(probs[1][2]) * 2. * 3.) /2.
    print("result:",result)
    print("expect:",expect)

  def test_softmax_loss_should_handle_per_list_weights(self):
    scores = [[1., 3., 2.], [1., 2., 3.], [1., 2., 3.]]
    labels = [[1., 2., 1.], [0., 0., 2.], [0., 0., 0.]]
    list_weights = [[2.], [1.], [1.]]
    reduction = tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS
    probs = [_softmax(s) for s in scores]

    loss_fn = SoftmaxLoss(name=None)
    result = loss_fn.compute(labels, scores, list_weights, reduction).numpy()
    expect = -(math.log(probs[0][0]) * 1. * 2. + math.log(probs[0][1]) * 2. * 2. +
          math.log(probs[0][2]) * 1. * 2. + math.log(probs[1][2]) * 2. * 1.) /2.
    print("result:",result)
    print("expect:",expect)

  def test_softmax_compute_per_list(self):
    scores = [[1., 3., 2.], [1., 2., 3.]]
    labels = [[0., 0., 1.], [0., 0., 2.]]
    per_item_weights = [[2., 3., 4.], [1., 1., 1.]]

    loss_fn = SoftmaxLoss(name=None)
    losses, weights = loss_fn.compute_per_list(labels, scores, per_item_weights)
    print("result:",losses)
    print("expected:",[1.407606, 0.407606])

  def test_softmax_loss_should_handle_mask(self):
    scores = [[1., 2., 3.]]
    labels = [[0., 1., 1.]]
    mask = [[True, False, True]]
    reduction = tf.compat.v1.losses.Reduction.SUM_BY_NONZERO_WEIGHTS

    loss_fn = SoftmaxLoss(name=None)
    result = loss_fn.compute(labels, scores, None, reduction, mask).numpy()
    expect=-(math.log(_softmax([1, 3])[1]))
    # self.assertAlmostEqual(result, -(math.log(_softmax([1, 3])[1])), places=5)
    print("result:",result)
    print("expect:",expect)

slt= SoftmaxLossTest()
slt.test_softmax_loss()
slt.test_softmax_loss_should_handle_mask()
slt.test_softmax_compute_per_list()
slt.test_softmax_loss_should_handle_per_list_weights()
slt.test_softmax_loss_should_handle_per_example_weights()

3.4.2、运行结果

# plain softmax loss
result: 1.111409
expect: 1.1114089466665704
# with mask
result: 0.12692805
expect: 0.12692801104297252
# per-list losses from compute_per_list
result: tf.Tensor([1.407606   0.40760595], shape=(2,), dtype=float32)
expected: [1.407606, 0.407606]
# per-list weights
result: 5.03803
expect: 5.038029822221901
# per-example (per-item) weights
result: 4.5380297
expect: 4.538029822221901
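
The mask case illustrates how masked items are removed from the softmax: with item 1 masked, the loss reduces to -log(softmax([1, 3])[1]) = log(1 + e^{-2}) ≈ 0.1269, matching the printed result.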

4、参考文献

https://github.com/frostjsy/ranking/blob/master/tensorflow_ranking/python/losses.py

几种listwise的loss实现 (CSDN blog)

PolyLoss:一种将分类损失函数加入泰勒展开式的损失函数 (Zhihu)

https://zhuanlan.zhihu.com/p/534094714

论文笔记: ICLR 2022 | PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions (Zhihu)

https://arxiv.org/pdf/2001.01828.pdf

https://dl.acm.org/doi/pdf/10.1145/3341981.3344221

推荐系统[四]:精排-详解排序算法LTR (Learning to Rank) (Zhihu)

排序学习调研 (蘑菇先生学习记; explains several of the losses in more detail)

Heywhale.com (a fairly thorough walkthrough of recommendation-ranking experiments)

pairwise、pointwise、listwise算法是什么?怎么理解?主要区别是什么? (CSDN blog; a clear side-by-side comparison of the three loss families)

Circle Loss: 一个基于对优化的统一视角 (CVPR 2020, Zhihu)

排序学习-4.ApproxNDCG与NeuralNDCG (Zhihu)
