finmlkit.label.weights module
- finmlkit.label.weights.average_uniqueness(timestamps: ndarray[tuple[int, ...], dtype[int64]], event_idxs: ndarray[tuple[int, ...], dtype[int64]], touch_idxs: ndarray[tuple[int, ...], dtype[int64]]) -> tuple[ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[int16]]]
Calculate the average uniqueness weights for overlapping labels. Based on Advances in Financial Machine Learning, Chapter 4, page 61.
- Parameters:
timestamps – The timestamps, in nanoseconds, of the close price series.
event_idxs – The indices of the labeled events (a subset of the timestamps indices), e.g. obtained from the CUSUM filter.
touch_idxs – The touch indices for the given events, i.e. where each label ends.
- Returns:
A tuple with two arrays:
  - The uniqueness weights, in [0, 1], for each label.
  - The concurrency array, which indicates how many labels overlap at each timestamp.
- Raises:
ValueError – If timestamps and touch indices are of different lengths.
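The calculation described above can be illustrated with a minimal pure-Python sketch. It is not finmlkit's implementation: the name average_uniqueness_sketch and the n_bars argument are hypothetical, plain lists stand in for NumPy arrays, and it assumes each label spans its event index through its touch index inclusive, as in the book's formulation.

```python
def average_uniqueness_sketch(n_bars, event_idxs, touch_idxs):
    """Illustrative only: average uniqueness per label and concurrency per bar."""
    if len(event_idxs) != len(touch_idxs):
        raise ValueError("event_idxs and touch_idxs must have the same length")

    # Concurrency: how many labels are active at each bar.
    # Assumption: a label is active on [event_idx, touch_idx], both ends included.
    concurrency = [0] * n_bars
    for start, end in zip(event_idxs, touch_idxs):
        for t in range(start, end + 1):
            concurrency[t] += 1

    # Average uniqueness: mean of 1 / concurrency over the label's lifespan.
    weights = []
    for start, end in zip(event_idxs, touch_idxs):
        span = range(start, end + 1)
        weights.append(sum(1.0 / concurrency[t] for t in span) / len(span))
    return weights, concurrency


# Three labels on six bars; the first two overlap at bar 2.
weights, concurrency = average_uniqueness_sketch(6, [0, 2, 4], [2, 3, 5])
print(concurrency)  # [1, 1, 2, 1, 1, 1]
print(weights)      # roughly [0.833, 0.75, 1.0]
```

A label that never overlaps another gets weight 1.0; the more bars it shares with other labels, the lower its weight.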
- finmlkit.label.weights.class_balance_weights(labels: ndarray[tuple[int, ...], dtype[int8]], base_w: ndarray[tuple[int, ...], dtype[float64]]) -> Tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]]]
Calculate the class balance weights for the given labels using the base sample weights. Run this function after all other sample weights have been calculated and combined into base_w.
- Parameters:
labels – The labels (e.g., -1, 0, 1) for the given events.
base_w – Base weights for the given labels (e.g., the combination of average uniqueness, vertical barrier, return attribution, and time-decay weights). The number of elements in each class is calculated as the sum of these weights rather than as a plain count.
- Returns:
A tuple containing:
  - The identified classes.
  - The corresponding class weights.
  - The number of elements per class, calculated as a sum of sample weights.
  - The final weights array per sample: class weights multiplied by base weights.
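The sketch below illustrates one way such weights could be computed. The documentation above does not state the exact class-weight formula, so this example assumes the common "balanced" heuristic (total weight divided by the number of classes times the weighted class count); the function name class_balance_weights_sketch is hypothetical and plain lists stand in for NumPy arrays.

```python
def class_balance_weights_sketch(labels, base_w):
    """Illustrative only: class weights from weighted class counts."""
    classes = sorted(set(labels))

    # Weighted "count" of each class: the sum of base weights of its samples.
    counts = [sum(w for lab, w in zip(labels, base_w) if lab == c) for c in classes]

    # Assumption: "balanced" heuristic, total / (n_classes * class_count).
    # A class whose base weights sum to zero would divide by zero here.
    total = sum(counts)
    class_w = [total / (len(classes) * n) for n in counts]

    # Final per-sample weight: class weight times base weight.
    lookup = dict(zip(classes, class_w))
    final_w = [lookup[lab] * w for lab, w in zip(labels, base_w)]
    return classes, class_w, counts, final_w


classes, class_w, counts, final_w = class_balance_weights_sketch(
    [1, 1, 1, -1], [1.0, 0.5, 0.5, 1.0]
)
print(classes)  # [-1, 1]
print(counts)   # [1.0, 2.0]
print(class_w)  # [1.5, 0.75]
print(final_w)  # [0.75, 0.375, 0.375, 1.5]
```

Under this assumption both classes end up with the same total final weight (1.5 each in the example), which is the point of running the balancing step last.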
- finmlkit.label.weights.return_attribution(event_idxs: ndarray[tuple[int, ...], dtype[int64]], touch_idxs: ndarray[tuple[int, ...], dtype[int64]], close: ndarray[tuple[int, ...], dtype[float64]], concurrency: ndarray[tuple[int, ...], dtype[int16]], normalize: bool) -> ndarray[tuple[int, ...], dtype[float64]]
Assign more weight to samples with higher return attribution. Based on Advances in Financial Machine Learning, Chapter 4, page 68.
- Parameters:
event_idxs – Event indices where the label starts.
touch_idxs – Touch indices where the label ends.
close – Close price array.
concurrency – Concurrency array indicating how many labels overlap at each timestamp, as returned by the average_uniqueness function.
normalize – If True, normalize the returned weights to sum to the number of events.
- Returns:
An array (NDArray[np.float64]) of return attribution weights, one per event.
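As a rough illustration of return attribution, the following pure-Python sketch sums each bar's log return divided by the concurrency at that bar over the label's lifespan and takes the absolute value. It is not finmlkit's implementation: the name return_attribution_sketch is hypothetical, and the use of log returns, an inclusive [event_idx, touch_idx] span, and a zero return at the first bar are assumptions taken from the book's formulation.

```python
import math


def return_attribution_sketch(event_idxs, touch_idxs, close, concurrency, normalize):
    """Illustrative only: |sum of concurrency-scaled log returns| per label."""
    # Assumption: log return of bar t is log(close[t] / close[t - 1]); bar 0 gets 0.
    log_ret = [0.0] + [math.log(close[t] / close[t - 1]) for t in range(1, len(close))]

    weights = []
    for start, end in zip(event_idxs, touch_idxs):
        # Each bar's return is shared equally among the labels active at that bar.
        attributed = sum(log_ret[t] / concurrency[t] for t in range(start, end + 1))
        weights.append(abs(attributed))

    if normalize:
        total = sum(weights)
        if total > 0:
            # Rescale so that the weights sum to the number of events.
            weights = [w * len(weights) / total for w in weights]
    return weights


close = [100.0, 110.0, 121.0, 121.0]
concurrency = [1, 1, 2, 1]  # two labels overlap at bar 2
raw = return_attribution_sketch([0, 2], [2, 3], close, concurrency, normalize=False)
norm = return_attribution_sketch([0, 2], [2, 3], close, concurrency, normalize=True)
print(raw)
print(norm)
```

In this example the first label captures about three times the attributed return of the second, so after normalization it receives about three times the weight while the weights still sum to the number of events.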
- finmlkit.label.weights.time_decay(avg_uniqueness: ndarray[tuple[int, ...], dtype[float64]], last_weight: float) -> ndarray[tuple[int, ...], dtype[float64]]
Apply linear time decay based on the average uniqueness weights. The newest observation is assigned a weight of 1.0 and the oldest is assigned last_weight. If last_weight is negative, the oldest portion of the events (n_events * |last_weight|) is erased, i.e. assigned a weight of 0.0. Based on Advances in Financial Machine Learning, Chapter 4, page 70.
- Parameters:
avg_uniqueness – The average uniqueness weights for the labels, as returned by the average_uniqueness function.
last_weight – The weight assigned to the oldest sample. If 1.0, there is no decay.
- Returns:
An array of time-decayed weights, in [0, 1], one per event.
- Raises:
ValueError – If the sum of all average uniqueness weights is not greater than 0.
ValueError – If last_weight is not in the range [-1, 1].
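The decay described above can be sketched in pure Python following the piecewise-linear scheme from the book, in which the decay runs along the cumulative sum of average uniqueness rather than along calendar time. This is not finmlkit's implementation: the name time_decay_sketch is hypothetical, events are assumed to be ordered from oldest to newest, and the sketch rejects last_weight = -1 to avoid a division by zero, which is a simplification relative to the [-1, 1] range documented above.

```python
def time_decay_sketch(avg_uniqueness, last_weight):
    """Illustrative only: linear decay over cumulative average uniqueness."""
    if not -1.0 < last_weight <= 1.0:
        # Simplification: -1 itself is excluded here to avoid dividing by zero.
        raise ValueError("last_weight must be in (-1, 1] for this sketch")
    total = sum(avg_uniqueness)
    if total <= 0:
        raise ValueError("sum of average uniqueness weights must be greater than 0")

    # Cumulative uniqueness; assumes events are ordered oldest to newest.
    cumulative = []
    running = 0.0
    for u in avg_uniqueness:
        running += u
        cumulative.append(running)

    if last_weight >= 0:
        slope = (1.0 - last_weight) / total
    else:
        slope = 1.0 / ((last_weight + 1.0) * total)
    const = 1.0 - slope * total  # the newest observation always gets 1.0

    # Negative values (possible only when last_weight < 0) are clipped to zero.
    return [max(0.0, const + slope * c) for c in cumulative]


print(time_decay_sketch([1.0, 1.0, 1.0, 1.0], 0.5))   # [0.625, 0.75, 0.875, 1.0]
print(time_decay_sketch([1.0, 1.0, 1.0, 1.0], 1.0))   # [1.0, 1.0, 1.0, 1.0]
print(time_decay_sketch([1.0, 1.0, 1.0, 1.0], -0.5))  # [0.0, 0.0, 0.5, 1.0]
```

Because the line is anchored at zero cumulative uniqueness, the oldest event in this sketch receives slightly more than last_weight (0.625 rather than 0.5 in the first call).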