finmlkit.label.weights module

finmlkit.label.weights.average_uniqueness(timestamps: ndarray[tuple[int, ...], dtype[int64]], event_idxs: ndarray[tuple[int, ...], dtype[int64]], touch_idxs: ndarray[tuple[int, ...], dtype[int64]]) → tuple[ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[int16]]]

Calculate the average uniqueness weights for overlapping labels. Based on Advances in Financial Machine Learning, Chapter 4, page 61.

Parameters:
  • timestamps – The timestamps, in nanoseconds, of the close price series.

  • event_idxs – The indices of the labeled events within timestamps, e.g., obtained from a CUSUM filter.

  • touch_idxs – The touch indices for the given events.

Returns:

A tuple with two arrays:
  • The uniqueness weights, in the range [0, 1], one per label.

  • The concurrency array, which indicates how many labels overlap at each timestamp.

Raises:

ValueError – If timestamps and touch indices are of different lengths.
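
The idea behind average uniqueness can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name `average_uniqueness_sketch` is hypothetical, it takes the number of bars instead of a timestamps array, and it assumes that a label is live on every bar from its event index through its touch index, inclusive.

```python
import numpy as np


def average_uniqueness_sketch(n_bars, event_idxs, touch_idxs):
    """Hypothetical helper: concurrency and average uniqueness (AFML, ch. 4)."""
    # Count how many labels are live at each bar.
    # Assumption: the touch bar itself belongs to the label.
    concurrency = np.zeros(n_bars, dtype=np.int16)
    for start, end in zip(event_idxs, touch_idxs):
        concurrency[start:end + 1] += 1

    # A label's weight is the mean of 1 / concurrency over its own lifespan.
    weights = np.empty(len(event_idxs), dtype=np.float64)
    for i, (start, end) in enumerate(zip(event_idxs, touch_idxs)):
        weights[i] = np.mean(1.0 / concurrency[start:end + 1])
    return weights, concurrency


# Three labels on a 10-bar series; the first two overlap on bars 2 and 3.
event_idxs = np.array([0, 2, 6], dtype=np.int64)
touch_idxs = np.array([3, 5, 8], dtype=np.int64)
weights, concurrency = average_uniqueness_sketch(10, event_idxs, touch_idxs)
print(concurrency)  # [1 1 2 2 1 1 1 1 1 0]
print(weights)      # [0.75 0.75 1.  ]
```

A label that never overlaps another gets a weight of 1.0, while the two overlapping labels each share half of their lifespan and drop to 0.75.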

finmlkit.label.weights.class_balance_weights(labels: ndarray[tuple[int, ...], dtype[int8]], base_w: ndarray[tuple[int, ...], dtype[float64]]) → Tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]]]

Calculate the class balance weights for the given labels using the base sample weights. Run this function last, after all other sample weights have been calculated and combined into base_w.

Parameters:
  • labels – The labels (e.g., -1, 0, 1) for the given events.

  • base_w – Base weights for the given labels (e.g., average uniqueness, vertical barrier weights, return attribution, and time decay combined). The number of elements in each class is calculated as the sum of these weights rather than as a plain count.

Returns:

A tuple containing:
  • The identified classes.

  • The corresponding class weights.

  • The number of elements per class, calculated as a sum of sample weights.

  • The final weights array per sample: class weights multiplied by base weights.
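
A minimal sketch of weighted class balancing follows. The helper name `class_balance_weights_sketch` is hypothetical, and the class-weight formula (total weight divided by the number of classes times the weighted class size, as in scikit-learn's "balanced" mode) is an assumption; the library may use a different normalization.

```python
import numpy as np


def class_balance_weights_sketch(labels, base_w):
    """Hypothetical helper: class balancing on weighted class sizes."""
    classes = np.unique(labels)
    # Weighted class size: sum of the base weights of the samples in each class.
    class_counts = np.array([base_w[labels == c].sum() for c in classes])
    # Assumed convention: total / (n_classes * class_size).
    class_weights = class_counts.sum() / (len(classes) * class_counts)
    # Look up each sample's class weight and scale its base weight.
    final_w = class_weights[np.searchsorted(classes, labels)] * base_w
    return classes, class_weights, class_counts, final_w


labels = np.array([-1, 1, 1, 1], dtype=np.int8)
base_w = np.array([1.0, 0.5, 0.5, 1.0])
classes, class_weights, class_counts, final_w = class_balance_weights_sketch(labels, base_w)
print(class_counts)   # [1. 2.]
print(class_weights)  # [1.5  0.75]
print(final_w)        # [1.5   0.375 0.375 0.75 ]
```

Under this convention every class ends up with the same total final weight (1.5 each in the example), which is the property class balancing aims for.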

finmlkit.label.weights.return_attribution(event_idxs: ndarray[tuple[int, ...], dtype[int64]], touch_idxs: ndarray[tuple[int, ...], dtype[int64]], close: ndarray[tuple[int, ...], dtype[float64]], concurrency: ndarray[tuple[int, ...], dtype[int16]], normalize: bool) → ndarray[tuple[int, ...], dtype[float64]]

Assign more weight to samples with higher return attribution. Based on Advances in Financial Machine Learning, Chapter 4, page 68.

Parameters:
  • event_idxs – Event indices where the label starts.

  • touch_idxs – Touch indices where the label ends.

  • close – Close price array.

  • concurrency – Concurrency array indicating how many labels overlap at each timestamp, as returned by the average_uniqueness function.

  • normalize – If True, normalize the returned weights to sum to the number of events.

Returns:

An array of return attribution weights, one per event.
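
The weighting scheme can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's code: the helper name `return_attribution_sketch` is hypothetical, log returns are used, and a label is credited with the returns realised on the bars after its event index up to and including its touch index.

```python
import numpy as np


def return_attribution_sketch(event_idxs, touch_idxs, close, concurrency, normalize=True):
    """Hypothetical helper: weights from concurrency-adjusted log returns."""
    # Log return realised at each bar; bar 0 has no previous close, so it stays 0.
    log_ret = np.zeros(len(close))
    log_ret[1:] = np.diff(np.log(close))

    weights = np.empty(len(event_idxs))
    for i, (start, end) in enumerate(zip(event_idxs, touch_idxs)):
        # Assumption: bars start+1 .. end belong to the label.
        span = slice(start + 1, end + 1)
        # Share each bar's return equally among the labels live at that bar.
        weights[i] = abs(np.sum(log_ret[span] / concurrency[span]))

    if normalize:
        # Rescale so that the weights sum to the number of events.
        weights *= len(weights) / weights.sum()
    return weights


close = np.array([100.0, 101.0, 102.0, 101.0, 103.0, 104.0])
event_idxs = np.array([0, 2], dtype=np.int64)
touch_idxs = np.array([3, 5], dtype=np.int64)
concurrency = np.array([1, 1, 2, 2, 1, 1], dtype=np.int16)

raw = return_attribution_sketch(event_idxs, touch_idxs, close, concurrency, normalize=False)
weights = return_attribution_sketch(event_idxs, touch_idxs, close, concurrency, normalize=True)
print(weights.sum())  # equals the number of events, up to floating-point error
```

Dividing by concurrency before summing prevents a return that is shared by several overlapping labels from being counted in full for each of them.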

finmlkit.label.weights.time_decay(avg_uniqueness: ndarray[tuple[int, ...], dtype[float64]], last_weight: float) → ndarray[tuple[int, ...], dtype[float64]]

Apply linear time decay based on the average uniqueness weights. The newest observation is assigned a weight of 1.0 and the oldest a weight of last_weight. If last_weight is negative, the oldest portion of the events (n_events * |last_weight|) is erased, i.e., assigned a weight of 0.0. Based on Advances in Financial Machine Learning, Chapter 4, page 70.

Parameters:
  • avg_uniqueness – The average uniqueness weights for the labels, as returned by the average_uniqueness function.

  • last_weight – The weight assigned to the oldest sample. If 1.0, there is no decay.

Returns:

An array of time-decayed weights in the range [0, 1], one per event.

Raises:
  • ValueError – The sum of all average uniqueness weights must be greater than 0.

  • ValueError – If last_weight is not in the range [-1, 1].
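
The decay described above can be sketched by following the piecewise-linear scheme from the book, where the decay runs over cumulative uniqueness instead of calendar time. The helper name `time_decay_sketch` is hypothetical, events are assumed to be ordered from oldest to newest, and the sketch rejects last_weight = -1 because the book's slope formula divides by zero there; how the library handles that boundary is not shown here.

```python
import numpy as np


def time_decay_sketch(avg_uniqueness, last_weight):
    """Hypothetical helper: linear decay over cumulative uniqueness."""
    if not -1.0 < last_weight <= 1.0:
        raise ValueError("this sketch assumes last_weight in (-1, 1]")
    cum = np.cumsum(avg_uniqueness)  # assumption: ordered oldest to newest
    total = cum[-1]
    if total <= 0:
        raise ValueError("sum of average uniqueness weights must be > 0")

    if last_weight >= 0:
        # The line runs from last_weight (at zero cumulative uniqueness) up to 1.0.
        slope = (1.0 - last_weight) / total
    else:
        # The line crosses zero early, so the oldest portion is erased.
        slope = 1.0 / ((last_weight + 1.0) * total)
    intercept = 1.0 - slope * total
    # Negative values belong to the erased portion and are clipped to 0.
    return np.clip(intercept + slope * cum, 0.0, None)


uniq = np.ones(4)
print(time_decay_sketch(uniq, 0.5))   # [0.625 0.75  0.875 1.   ]
print(time_decay_sketch(uniq, -0.5))  # [0.  0.  0.5 1. ]
print(time_decay_sketch(uniq, 1.0))   # [1. 1. 1. 1.]
```

With last_weight = -0.5 the oldest half of the events receives zero weight, and the remaining half ramps linearly up to 1.0 for the newest event.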