finmlkit.label.kit module

An API wrapper around the core Numba functions for better usability.

class finmlkit.label.kit.SampleWeights

Bases: object

A wrapper class for computing time decay and class balance weights. These weights should be computed on the training window of the full dataset.

static compute_final_weights(avg_uniqueness: Series, time_decay_intercept: float = 1.0, return_attribution: Series = None, vertical_touch_weights: Series = None, labels: Series = None) → DataFrame

Computes the time decay and class balance weights from the average uniqueness and, if provided, the return attribution. The return attribution is normalized so that it sums to the number of events.

Parameters:

avg_uniqueness – Average uniqueness weights for the events.
return_attribution – Unnormalized return attribution. Provide it to use return attribution as the information weights instead of average uniqueness.
vertical_touch_weights – Provide vertical touch weights if you want to apply them to the final weights.
time_decay_intercept – The intercept for the time decay function. 1.0 means no decay, 0.0 means full decay. Negative values will erase the oldest portion of the weights.
labels – Provide labels if you want to apply class balancing to the final weights.

Returns:

A pandas DataFrame containing the individual weight components and the combined weights.
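The behaviour of the intercept described above matches the piecewise-linear time decay from Advances in Financial Machine Learning. The following is a minimal sketch of that scheme together with a simple inverse-frequency class balance. The helper names `time_decay` and `class_balance` are hypothetical, and combining the parts by multiplication is an assumption, not necessarily what compute_final_weights does internally.

```python
import numpy as np
import pandas as pd


def time_decay(avg_uniqueness: pd.Series, intercept: float = 1.0) -> pd.Series:
    """Piecewise-linear decay driven by cumulative uniqueness (hypothetical helper).

    The newest event always gets weight 1. With intercept in [0, 1] the oldest
    weight approaches `intercept`; with intercept in (-1, 0) the oldest portion
    of the events is clipped to zero.
    """
    cum = avg_uniqueness.sort_index().cumsum()
    total = cum.iloc[-1]
    if intercept >= 0:
        slope = (1.0 - intercept) / total
    else:
        slope = 1.0 / ((intercept + 1.0) * total)
    const = 1.0 - slope * total
    return (const + slope * cum).clip(lower=0.0)


def class_balance(labels: pd.Series) -> pd.Series:
    """Weight each sample inversely to its class frequency (mean weight is 1)."""
    counts = labels.value_counts()
    return labels.map(len(labels) / (len(counts) * counts))


idx = pd.date_range("2024-01-01", periods=4, freq="D")
uniq = pd.Series([1.0, 1.0, 1.0, 1.0], index=idx)
labels = pd.Series([1, 1, 1, -1], index=idx)

decay = time_decay(uniq, intercept=0.5)
balance = class_balance(labels)
combined = uniq * decay * balance  # assumption: parts are combined by product
print(pd.DataFrame({"decay": decay, "balance": balance, "combined": combined}))
```

With an intercept of 1.0 the slope is zero and every decay weight equals 1, which is the "no decay" case listed in the parameters.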

static compute_info_weights(trades: TradesData, labels: DataFrame, normalize: bool = False) → DataFrame

Computes the average uniqueness and (non-normalized) return attribution for the events.

Parameters:

trades – The raw trades on which the events are evaluated.
labels – Labels dataframe containing event indices and touch indices (output of compute_labels method).
normalize – Whether to normalize the returned weights.

Returns:

A pandas DataFrame containing the average uniqueness, return attribution, and vertical touch weights.
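To illustrate what average uniqueness measures, here is a minimal NumPy sketch. It assumes events are given as inclusive start and end positions on a common bar grid; the library evaluates events on raw trades, so this only shows the idea. The function name `average_uniqueness` is hypothetical.

```python
import numpy as np


def average_uniqueness(starts: np.ndarray, ends: np.ndarray, n_bars: int) -> np.ndarray:
    """Average uniqueness of events spanning bars [start, end] inclusive."""
    # Concurrency: how many events are alive at each bar.
    concurrency = np.zeros(n_bars)
    for s, e in zip(starts, ends):
        concurrency[s : e + 1] += 1.0
    # Uniqueness at a bar is 1 / concurrency; average it over each event's span.
    return np.array(
        [np.mean(1.0 / concurrency[s : e + 1]) for s, e in zip(starts, ends)]
    )


# Two overlapping events and one isolated event on a 6-bar grid.
starts = np.array([0, 1, 4])
ends = np.array([2, 3, 5])
uniq = average_uniqueness(starts, ends, n_bars=6)
print(uniq)
```

The two overlapping events share two of their three bars, so each gets an average uniqueness of 2/3, while the isolated event gets 1.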

class finmlkit.label.kit.TBMLabel(features: DataFrame, target_ret_col: str, min_ret: float, horizontal_barriers: tuple[float, float], vertical_barrier: Timedelta, min_close_time: Timedelta = Timedelta('0 days 00:00:01'), is_meta: bool = False)

Bases: object

Implements the Triple Barrier Method (TBM) for labeling financial events, as described by Marcos Lopez de Prado. This method assigns labels to events based on whether the price touches an upper barrier (take-profit), lower barrier (stop-loss), or a vertical time barrier first. It supports both side labeling and meta-labeling modes.

The Triple Barrier Method is a technique for labeling outcomes in financial machine learning, particularly useful for creating supervised learning datasets from time-series data. It helps mitigate issues like overfitting and improves the informativeness of labels by considering profitability thresholds and time horizons.

For a set of events (e.g., trading signals or CUSUM events), the method constructs three barriers around each event’s starting price:

Upper horizontal barrier: Take-profit level, computed as the starting price scaled by (1 + target return * upper multiplier).
Lower horizontal barrier: Stop-loss level, computed as the starting price scaled by (1 - target return * lower multiplier).
Vertical barrier: A time-based barrier after a specified timedelta.

The label is determined by which barrier is touched first by the price path:

+1 if upper barrier is touched first (profitable).
-1 if lower barrier is touched first (loss).
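0 if the vertical barrier is touched first (timeout) in the classic formulation. This implementation instead assigns +1 or -1 on a vertical touch, as explained in the Important note below.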

In meta-labeling mode (is_meta=True), the method incorporates predictions from a primary model (via the ‘side’ column). Labels are assigned only if the primary model’s direction aligns with the barrier outcome, enabling meta-models to learn when to trust the primary model.

Mathematically, for an event at time \(t\) with starting price \(p_t\), target return \(r_t\) (e.g., volatility estimate), and horizontal multipliers \((m_{low}, m_{up})\):

\[ \begin{align}\begin{aligned}\text{Upper barrier} = p_t \cdot (1 + r_t \cdot m_{up})\\\text{Lower barrier} = p_t \cdot (1 - r_t \cdot m_{low})\\\text{Vertical barrier} = t + \Delta t\end{aligned}\end{align} \]
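As a worked instance of the three formulas above, the sketch below computes the barriers for a single event. The function name `barrier_levels` and the numeric inputs are illustrative assumptions, not part of the library.

```python
import pandas as pd


def barrier_levels(price: float, target_ret: float, m_low: float, m_up: float) -> tuple[float, float]:
    """Return (lower, upper) horizontal barrier prices for one event."""
    lower = price * (1.0 - target_ret * m_low)
    upper = price * (1.0 + target_ret * m_up)
    return lower, upper


# Event at price 100 with a 2% target return, stop-loss at 1x and take-profit at 2x.
lower, upper = barrier_levels(price=100.0, target_ret=0.02, m_low=1.0, m_up=2.0)

# Vertical barrier: event time plus the chosen timedelta.
t0 = pd.Timestamp("2024-01-01 09:30:00")
vertical = t0 + pd.Timedelta(minutes=30)

print(lower, upper, vertical)  # roughly 98 and 104, and 10:00 on the same day
```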

The label \(l\) for the event is:

\[\begin{split}l = \begin{cases} 1, & \text{if upper barrier touched first} \\ -1, & \text{if lower barrier touched first} \\ 0, & \text{if vertical barrier touched first} \end{cases}\end{split}\]
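The case definition above can be sketched as a first-touch scan over a price path. This is a simplified illustration of the three-valued rule, assuming the path is already truncated at the vertical barrier; the function name `first_touch_label` is hypothetical and the library's Numba implementation may differ.

```python
import numpy as np


def first_touch_label(prices: np.ndarray, lower: float, upper: float) -> tuple[int, int]:
    """Return (label, touch_index) for a path that ends at the vertical barrier."""
    for i, p in enumerate(prices):
        if p >= upper:
            return 1, i   # take-profit touched first
        if p <= lower:
            return -1, i  # stop-loss touched first
    return 0, len(prices) - 1  # neither touched: vertical barrier (timeout)


lower, upper = 98.0, 104.0
up_path = np.array([100.0, 101.0, 103.0, 104.5, 99.0])
down_path = np.array([100.0, 99.0, 97.5])
flat_path = np.array([100.0, 101.0, 100.5])

print(first_touch_label(up_path, lower, upper))
print(first_touch_label(down_path, lower, upper))
print(first_touch_label(flat_path, lower, upper))
```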

Important

In this implementation, we construct binary labels (+1 or -1) for side prediction, as recommended in Advances in Financial Machine Learning. We introduce “vertical_touch_weights” to down-weight misleading labels that result from a vertical barrier touch. Consider the following scenario: the vertical barrier is hit slightly above (below) the initial price, producing a +1 (-1) label, but the price path came very close to the lower (upper) barrier and almost hit it. If the ML model predicted -1 (+1) for this event, we do not want to punish it heavily.

In meta-labeling, the label is modulated by the primary side \(s \in \{-1, 1\}\):

\[\begin{split}l_{meta} = \begin{cases} 1, & \text{if } (s = 1 \land l = 1) \lor (s = -1 \land l = -1) \\ 0, & \text{otherwise} \end{cases}\end{split}\]
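The meta-label rule reduces to checking whether the primary side agrees with the barrier outcome. A minimal vectorized sketch, with the hypothetical helper name `meta_label`:

```python
import numpy as np


def meta_label(side: np.ndarray, label: np.ndarray) -> np.ndarray:
    """1 where the primary side matches the barrier outcome, else 0."""
    # The (side != 0) guard keeps a neutral side from matching a 0 label.
    return ((side == label) & (side != 0)).astype(int)


side = np.array([1, 1, -1, -1, 1])
label = np.array([1, -1, -1, 1, 0])
print(meta_label(side, label))
```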

Note

To disable a horizontal barrier, set its multiplier to \(+\infty\) or \(-\infty\). For the vertical barrier, use a very large timedelta (e.g., 1000 years) to effectively disable it.

Note

This implementation supports computation of sample weights via the related SampleWeights class. After labeling, use compute_weights() to calculate information-driven weights, including:

Label concurrency: Measures overlap of event durations.
Return attribution: Attributes returns to overlapping events proportionally to their uniqueness.

These can be combined with time decay and class balancing for final sample weights in model training using SampleWeights.compute_final_weights().

Parameters:

features (pd.DataFrame) – The events dataframe containing the return target column and optionally event indices (“event_idx” column) and features. If the “event_idx” column is not provided, event indices will be computed based on timestamps.
target_ret_col (str) – The name of the target return column in the features dataframe. Typically a volatility estimator output. This is used to scale the horizontal barriers. Should be in log-return space.
min_ret (float) – Minimum required return threshold. Events where the absolute target return (scaled by max horizontal multiplier) is below this threshold will be dropped.
horizontal_barriers (tuple[float, float]) – Bottom and top (stop-loss/take-profit) horizontal barrier multipliers. The target return is multiplied by these to determine barrier widths. Use -inf/+inf to disable.
vertical_barrier (pd.Timedelta) – The temporal barrier duration. Set to a large value (e.g., pd.Timedelta(days=365*1000)) to disable.
min_close_time (pd.Timedelta, optional) – Prevents premature event closure before this minimum time. Default: pd.Timedelta(seconds=1).
is_meta (bool, optional) – Enable meta-labeling mode. If True, features must contain a ‘side’ column with primary model predictions (-1, 0, 1). Default: False.

Raises:

ValueError – If input validations fail, such as missing columns, invalid types, or empty data after filtering.
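The sketch below shows one way to prepare the constructor arguments documented above. The column name "vol", the sample values, and the pandas expression used to mimic the min_ret filter are assumptions for illustration; the actual filtering happens inside the library. The constructor call itself is left commented out so the snippet runs without finmlkit installed.

```python
import pandas as pd

# Events indexed by timestamp, with a volatility estimate as the target return.
idx = pd.date_range("2024-01-01 09:30", periods=6, freq="min")
features = pd.DataFrame(
    {"vol": [0.0005, 0.0020, 0.0012, 0.0030, 0.0002, 0.0015]}, index=idx
)

horizontal_barriers = (1.0, 2.0)  # (stop-loss, take-profit) multipliers
min_ret = 0.002

# Assumed reading of the min_ret rule: scale |target return| by the larger
# multiplier and keep only events that reach the threshold.
scaled = features["vol"].abs() * max(horizontal_barriers)
kept = features[scaled >= min_ret]
print(len(kept), "of", len(features), "events would pass the filter")

tbm_kwargs = dict(
    features=features,
    target_ret_col="vol",
    min_ret=min_ret,
    horizontal_barriers=horizontal_barriers,
    vertical_barrier=pd.Timedelta(minutes=30),
    min_close_time=pd.Timedelta(seconds=1),
    is_meta=False,
)

# from finmlkit.label.kit import TBMLabel
# labeler = TBMLabel(**tbm_kwargs)
```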