finmlkit.label.kit module
An API wrapper around the core Numba functions for better usability.
- class finmlkit.label.kit.SampleWeights
Bases: object
A wrapper class for computing time decay and class balance weights. These weights should be computed on the training window of the full dataset.
- static compute_final_weights(avg_uniqueness: Series, time_decay_intercept: float = 1.0, return_attribution: Series = None, vertical_touch_weights: Series = None, labels: Series = None) -> DataFrame
Computes the time decay and class balance weights from the average uniqueness and return attribution. Return attribution is normalized so that it sums to the event count.
- Parameters:
avg_uniqueness – Average uniqueness weights for the events.
return_attribution – Unnormalized return attribution. Provide this to use return attribution as the information weights instead of average uniqueness.
vertical_touch_weights – Provide vertical touch weights if you want to apply them to the final weights.
time_decay_intercept – The intercept for the time decay function. 1.0 means no decay, 0.0 means full decay. Negative values will erase the oldest portion of the weights.
labels – Provide labels if you want to apply class balancing to the final weights.
- Returns:
A pandas DataFrame containing the individual weight components and the combined weights.
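The three ingredients described above (time decay with an intercept, normalization to the event count, and class balancing) can be illustrated with a minimal plain-Python sketch. This is not finmlkit's implementation: the function names `time_decay`, `normalize_to_count`, and `class_balance` are hypothetical, the decay is assumed to be the piecewise-linear scheme over cumulative uniqueness that is common in the financial ML literature, and class balancing is assumed to use inverse class frequency.

```python
from collections import Counter


def time_decay(avg_uniqueness, intercept=1.0):
    """Piecewise-linear decay over cumulative uniqueness (oldest event first).

    Assumed scheme, not taken from finmlkit: the newest event always gets
    weight 1.0 and the intercept controls the weight of the oldest events.
    """
    if not -1.0 < intercept <= 1.0:
        raise ValueError("intercept must lie in (-1, 1]")
    cumulative, total = [], 0.0
    for u in avg_uniqueness:
        total += u
        cumulative.append(total)
    if intercept >= 0:
        slope = (1.0 - intercept) / total
    else:
        # Negative intercept: the oldest part of the sample is driven to zero.
        slope = 1.0 / ((intercept + 1.0) * total)
    const = 1.0 - slope * total
    return [max(0.0, const + slope * c) for c in cumulative]


def normalize_to_count(weights):
    """Rescale weights so that they sum to the number of events."""
    n, s = len(weights), sum(weights)
    return [w * n / s for w in weights]


def class_balance(labels):
    """Inverse-frequency class weights: n / (n_classes * count(label))."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


uniq = [0.5, 0.5, 1.0, 1.0]
print(time_decay(uniq, 1.0))    # no decay: every weight stays at 1.0
print(time_decay(uniq, -0.5))   # the two oldest events are erased
print(normalize_to_count([0.005, 0.025, 0.01]))
print(class_balance([1, 1, 1, -1]))
```

How the parts are combined into the final weight (for example, by multiplication) is not stated in this reference, so the sketch leaves them separate.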
- static compute_info_weights(trades: TradesData, labels: DataFrame, normalize: bool = False) -> DataFrame
Computes the average uniqueness and (non-normalized) return attribution for the events.
- Parameters:
trades – The raw trades on which the events are evaluated.
labels – Labels dataframe containing event indices and touch indices (output of compute_labels method).
normalize – Whether to normalize the returned weights.
- Returns:
A pandas DataFrame containing the average uniqueness, the return attribution, and the vertical touch weights.
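To make the two information weights concrete, here is a small plain-Python sketch of how average uniqueness and unnormalized return attribution are commonly defined. It is an illustration under assumptions rather than finmlkit's code: events are given as inclusive `(start, end)` index pairs over a bar-level return series instead of raw trades, and the helper name `info_weights` is hypothetical.

```python
def info_weights(events, returns):
    """Compute average uniqueness and return attribution per event.

    events  -- list of (start, end) inclusive indices into `returns`
    returns -- per-step log returns
    """
    # Concurrency: how many events are alive at each step.
    concurrency = [0] * len(returns)
    for start, end in events:
        for t in range(start, end + 1):
            concurrency[t] += 1

    avg_uniqueness, attribution = [], []
    for start, end in events:
        span = range(start, end + 1)
        # Uniqueness at step t is 1 / concurrency; average it over the event.
        avg_uniqueness.append(sum(1.0 / concurrency[t] for t in span) / len(span))
        # Each step's return is shared equally among the concurrent events.
        attribution.append(abs(sum(returns[t] / concurrency[t] for t in span)))
    return avg_uniqueness, attribution


events = [(0, 2), (2, 3), (4, 4)]
returns = [0.01, -0.02, 0.03, 0.01, -0.01]
uniq, attr = info_weights(events, returns)
print(uniq)  # the isolated third event has uniqueness 1.0
print(attr)
```

An event that overlaps no other event gets a uniqueness of 1.0, while overlapping events share the information in the overlapping steps and are weighted down accordingly.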
- class finmlkit.label.kit.TBMLabel(features: DataFrame, target_ret_col: str, min_ret: float, horizontal_barriers: tuple[float, float], vertical_barrier: Timedelta, min_close_time: Timedelta = Timedelta('0 days 00:00:01'), is_meta: bool = False)
Bases: object
- compute_labels(trades: TradesData) -> tuple[DataFrame, DataFrame]
Compute the labels for the events using the triple barrier method.
- Parameters:
trades – The raw trades data on which the events will be evaluated.
- Returns:
A tuple containing (1) the features DataFrame with the event indices and other features, and (2) a DataFrame containing labels, event indices, touch indices, returns, and weights.
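For readers unfamiliar with the triple barrier method, the following plain-Python sketch labels a single event. It only illustrates the general technique and makes several assumptions that this reference does not confirm: `horizontal_barriers` is read as (lower multiplier, upper multiplier) applied to the target return, the vertical barrier is expressed as a number of bars rather than a `Timedelta`, a vertical touch is labeled 0, and the function name `triple_barrier_label` is hypothetical.

```python
import math


def triple_barrier_label(prices, start, target_ret, barriers=(1.0, 1.0), max_bars=10):
    """Return (label, touch_index) for one event starting at index `start`.

    label is +1 if the upper barrier is touched first, -1 for the lower
    barrier, and 0 if the vertical barrier (max_bars) is reached first.
    """
    lower_mult, upper_mult = barriers  # assumed ordering: (lower, upper)
    upper = upper_mult * target_ret
    lower = -lower_mult * target_ret
    last = min(start + max_bars, len(prices) - 1)
    for t in range(start + 1, last + 1):
        log_ret = math.log(prices[t] / prices[start])
        if log_ret >= upper:
            return 1, t
        if log_ret <= lower:
            return -1, t
    return 0, last  # neither horizontal barrier was touched in time


prices = [100.0, 101.0, 103.0, 99.0, 98.0]
print(triple_barrier_label(prices, start=0, target_ret=0.02, max_bars=3))
print(triple_barrier_label(prices, start=2, target_ret=0.02, max_bars=3))
```

The touch index returned here plays the same role as the touch indices in the labels DataFrame: it marks where the event's lifespan ends, which is what the uniqueness and return attribution weights are later computed over.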
- compute_weights(trades: TradesData, normalized: bool = False) -> DataFrame
Computes the sample average uniqueness and return attribution.
- Parameters:
trades – The same raw trades data passed to compute_labels().
normalized – Whether to normalize the weights.
- Returns:
A DataFrame containing the sample average uniqueness and return attribution.
- property event_count: int
Get the number of events in the features DataFrame.
- Returns:
The number of events.
- property event_range: str
Get the range of event timestamps.
- Returns:
A string containing the first and last event timestamps.
- property event_returns: Series
Get the log returns associated with each event.
- Returns:
A pandas Series containing the log returns.
- property features: DataFrame
Get the features corresponding to the generated labels. It might be a subset of the original features DataFrame due to the TBM evaluation window.
- Returns:
The features DataFrame.
- property first_event_timestamp: Timestamp | None
Get the timestamp of the first event.
- Returns:
The timestamp of the first event.
- property full_output: DataFrame
Get the full output DataFrame containing labels, event indices, touch indices, returns, and weights.
- Returns:
A pandas DataFrame containing the full output.
- property labels: Series
Get the labels for the events.
- Returns:
A pandas Series containing the labels.
- property last_event_timestamp: Timestamp | None
Get the timestamp of the last event.
- Returns:
The timestamp of the last event.
- property target_returns: Series
Get the target returns for the events.
- Returns:
A pandas Series containing the target returns.