finmlkit.bar.base module

This module contains the functions that build candlestick bars and other intra-bar features (e.g. directional features, footprints) from raw trades data, using the outputs of the indexer functions defined in the logic module.

class finmlkit.bar.base.BarBuilderBase(trades: TradesData)

Bases: ABC

This class provides a template for generating bars from raw trades data.

property bar_close_indices: ndarray[tuple[int, ...], dtype[int64]] | None

Return the bar close indices in the raw trades data.

Returns:

The bar close indices, relative to the raw trades data, as a NumPy array of int64.

property bar_close_timestamps: ndarray[tuple[int, ...], dtype[int64]] | None

Return the bar close timestamps in the raw trades data.

Returns:

The bar close timestamps in nanoseconds as a NumPy array of int64.

build_directional_features() -> DataFrame

Build the directional features using the generated indices and raw trades data.

Returns:

A dataframe containing the directional features:

ticks_buy, ticks_sell, volume_buy, volume_sell, dollars_buy, dollars_sell, max_spread, cum_volumes_min, cum_volumes_max, cum_dollars_min, cum_dollars_max.

build_footprints(price_tick_size=None, imbalance_factor=3.0) -> FootprintData

Build the footprint data using the generated indices and raw trades data.

Parameters:
  • price_tick_size – Optional tick size; inferred if None.

  • imbalance_factor – Multiplier for detecting imbalances. Default is 3.0.

Returns:

A FootprintData object containing the footprint data.

build_ohlcv() -> DataFrame

Build the bar features using the generated indices and raw trades data.

Returns:

A dataframe containing the OHLCV + VWAP features, with a datetime index corresponding to the bar open timestamps.

build_trade_size_features(theta: ndarray[tuple[int, ...], dtype[float64]] | None, theta_mult: float = 5.0) -> DataFrame

Build the trade size features using the generated indices and raw trades data.

Parameters:
  • theta – Optional typical trade size (e.g., 30 day rolling median trade size).

  • theta_mult – Multiplier for theta to define the block size threshold. Default is 5.0.

Returns:

A dataframe containing the trade size features:

mean_size_rel, size_95_rel, pct_block, size_gini.

finmlkit.bar.base.comp_bar_directional_features(prices: ndarray[tuple[int, ...], dtype[float64]], volumes: ndarray[tuple[int, ...], dtype[float64]], bar_close_indices: ndarray[tuple[int, ...], dtype[int64]], trade_sides: ndarray[tuple[int, ...], dtype[int8]]) -> tuple[ndarray[tuple[int, ...], dtype[int64]], ndarray[tuple[int, ...], dtype[int64]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[int64]], ndarray[tuple[int, ...], dtype[int64]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]]]

Compute directional bar features such as tick counts, volumes, dollars, spreads, and cumulative flows.

Parameters:
  • prices – Trade prices.

  • volumes – Trade volumes.

  • bar_close_indices – Indices marking the end of each bar.

  • trade_sides – Trade direction (1 for market buy, -1 for market sell).

Returns:

Tuple containing:

  • ticks_buy: Number of buy trades per bar.

  • ticks_sell: Number of sell trades per bar.

  • volume_buy: Volume of buy trades per bar.

  • volume_sell: Volume of sell trades per bar.

  • dollars_buy: Dollar value of buy trades per bar.

  • dollars_sell: Dollar value of sell trades per bar.

  • mean_spread: Mean bid/ask spread within each bar.

  • max_spread: Maximum spread within each bar.

  • cum_ticks_min: Minimum cumulative tick imbalance.

  • cum_ticks_max: Maximum cumulative tick imbalance.

  • cum_volumes_min: Minimum cumulative volume imbalance.

  • cum_volumes_max: Maximum cumulative volume imbalance.

  • cum_dollars_min: Minimum cumulative dollar imbalance.

  • cum_dollars_max: Maximum cumulative dollar imbalance.
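As an illustration of the buy/sell split and the cumulative imbalance extremes listed above, here is a minimal NumPy sketch. It is not the library's implementation: the function name directional_sketch is hypothetical, and it assumes that each entry of bar_close_indices is the inclusive index of a bar's last trade, that the first bar starts at trade 0, and that the cumulative imbalance is a running sum of signed volume that restarts in each bar.

```python
import numpy as np

def directional_sketch(prices, volumes, bar_close_indices, trade_sides):
    """Per-bar buy/sell aggregates and signed cumulative volume extremes."""
    # ASSUMPTION: bar i spans trades (close[i-1], close[i]], first bar starts at 0.
    starts = np.concatenate(([0], bar_close_indices[:-1] + 1))
    bars = []
    for start, close in zip(starts, bar_close_indices):
        p = prices[start:close + 1]
        v = volumes[start:close + 1]
        s = trade_sides[start:close + 1]
        buy, sell = s == 1, s == -1
        # ASSUMPTION: running signed volume, restarted at the start of each bar.
        cum = np.cumsum(s * v)
        bars.append({
            "ticks_buy": int(buy.sum()),
            "ticks_sell": int(sell.sum()),
            "volume_buy": float(v[buy].sum()),
            "volume_sell": float(v[sell].sum()),
            "dollars_buy": float((p * v)[buy].sum()),
            "dollars_sell": float((p * v)[sell].sum()),
            "cum_volumes_min": float(cum.min()),
            "cum_volumes_max": float(cum.max()),
        })
    return bars

prices = np.array([100.0, 101.0, 99.0, 102.0, 103.0])
volumes = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
sides = np.array([1, -1, -1, 1, 1], dtype=np.int8)
bars = directional_sketch(prices, volumes, np.array([2, 4]), sides)
```

In this toy input the first bar has one buy and two sells, so its running signed volume goes 1, -1, -2; the second bar contains only buys.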

finmlkit.bar.base.comp_bar_footprints(prices: ndarray[tuple[int, ...], dtype[float64]], amounts: ndarray[tuple[int, ...], dtype[float64]], bar_close_indices: ndarray[tuple[int, ...], dtype[int64]], trade_sides: ndarray[tuple[int, ...], dtype[int8]], price_tick_size: float, bar_lows: ndarray[tuple[int, ...], dtype[float64]], bar_highs: ndarray[tuple[int, ...], dtype[float64]], imbalance_factor: float) -> tuple[List[ndarray[tuple[int, ...], dtype[int32]]], List[ndarray[tuple[int, ...], dtype[float32]]], List[ndarray[tuple[int, ...], dtype[float32]]], List[ndarray[tuple[int, ...], dtype[int32]]], List[ndarray[tuple[int, ...], dtype[int32]]], List[ndarray[tuple[int, ...], dtype[bool]]], List[ndarray[tuple[int, ...], dtype[bool]]], ndarray[tuple[int, ...], dtype[uint16]], ndarray[tuple[int, ...], dtype[uint16]], ndarray[tuple[int, ...], dtype[int32]], ndarray[tuple[int, ...], dtype[int16]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]]]

Compute the footprint features for each bar, including buy/sell volumes and imbalances per price level. The price levels are calculated in (integer) price tick units to eliminate floating point errors.

Parameters:
  • prices – Trade prices.

  • amounts – Trade amounts.

  • bar_close_indices – Indices marking the end of each bar.

  • trade_sides – The side information of the market order (1 for market buy, -1 for market sell).

  • price_tick_size – Tick size used for price level quantization.

  • bar_lows – Lowest price per bar.

  • bar_highs – Highest price per bar.

  • imbalance_factor – Multiplier threshold for detecting imbalance.

Returns:

Tuple containing:

  • bar_open_timestamps: Timestamps for each bar.

  • price_levels: List of price level arrays per bar.

  • buy_volumes: List of buy volumes per price level.

  • sell_volumes: List of sell volumes per price level.

  • buy_ticks: List of buy ticks per price level.

  • sell_ticks: List of sell ticks per price level.

  • buy_imbalances: List of boolean arrays indicating buy imbalances.

  • sell_imbalances: List of boolean arrays indicating sell imbalances.

  • buy_imbalances_sum: Total number of buy imbalances per bar.

  • sell_imbalances_sum: Total number of sell imbalances per bar.

  • cot_price_levels: Price level with highest total volume per bar.

  • imb_max_run_signed: Longest signed imbalance run for each bar.

  • vp_skew: Volume profile skew for each bar (positive = buy pressure above VWAP).

  • vp_gini: Volume profile Gini coefficient for each bar.
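The idea of working in integer price tick units can be sketched for a single bar as follows. This is an illustrative sketch under stated assumptions, not the library's code: footprint_sketch is a hypothetical name, prices are assumed to be rounded to the nearest tick, and the level grid is taken from the lowest and highest tick seen in the bar.

```python
import numpy as np

def footprint_sketch(prices, amounts, trade_sides, price_tick_size):
    """Buy and sell volume per integer price level for one bar."""
    # Convert prices to integer tick units; rounding absorbs float noise
    # such as 100.3 / 0.1 evaluating to 1002.9999999999999.
    ticks = np.rint(prices / price_tick_size).astype(np.int64)
    # ASSUMPTION: one level per tick between the bar low and the bar high.
    levels = np.arange(ticks.min(), ticks.max() + 1)
    idx = ticks - levels[0]
    buy = np.zeros(len(levels))
    sell = np.zeros(len(levels))
    is_buy = trade_sides == 1
    is_sell = trade_sides == -1
    # np.add.at accumulates correctly when several trades hit the same level.
    np.add.at(buy, idx[is_buy], amounts[is_buy])
    np.add.at(sell, idx[is_sell], amounts[is_sell])
    return levels, buy, sell

prices = np.array([100.1, 100.2, 100.1, 100.3])
amounts = np.array([1.0, 2.0, 3.0, 4.0])
sides = np.array([1, -1, -1, 1], dtype=np.int8)
levels, buy, sell = footprint_sketch(prices, amounts, sides, 0.1)
```

Comparing integer levels avoids the equality problems that arise when float prices such as 100.1 and 100.3 are used as keys directly.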

finmlkit.bar.base.comp_bar_ohlcv(prices: ndarray[tuple[int, ...], dtype[float64]], volumes: ndarray[tuple[int, ...], dtype[float64]], bar_close_indices: ndarray[tuple[int, ...], dtype[int64]]) -> tuple[ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[int64]], ndarray[tuple[int, ...], dtype[float64]]]

Build the candlestick bars from raw trades data based on the bar close indices.

Parameters:
  • prices – Trade prices.

  • volumes – Trade volumes.

  • bar_close_indices – Indices marking the end of each bar.

Returns:

Tuple containing:

  • open: Opening price of each bar.

  • high: Highest price of each bar.

  • low: Lowest price of each bar.

  • close: Closing price of each bar.

  • volume: Total traded volume in each bar.

  • vwap: Volume-weighted average price of each bar.

  • bar_trades: Number of trades in each bar.

  • bar_median_trade_size: Median trade size in each bar.
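To make the eight outputs above concrete, the following NumPy sketch aggregates trades into bars. It is a simplified illustration rather than the library's implementation: ohlcv_sketch is a hypothetical name, and the index convention (each entry of bar_close_indices is the inclusive index of a bar's last trade, with the first bar starting at trade 0) is an assumption.

```python
import numpy as np

def ohlcv_sketch(prices, volumes, bar_close_indices):
    """Return one row per bar: open, high, low, close, volume, vwap, trades, median size."""
    # ASSUMPTION: bar i spans trades (close[i-1], close[i]], first bar starts at 0.
    starts = np.concatenate(([0], bar_close_indices[:-1] + 1))
    rows = []
    for start, close in zip(starts, bar_close_indices):
        p = prices[start:close + 1]
        v = volumes[start:close + 1]
        total = v.sum()
        vwap = (p * v).sum() / total  # volume-weighted average price
        rows.append((p[0], p.max(), p.min(), p[-1], total, vwap, len(p), np.median(v)))
    return np.array(rows)

prices = np.array([100.0, 101.0, 99.0, 102.0, 103.0])
volumes = np.array([1.0, 2.0, 1.0, 3.0, 1.0])
bars = ohlcv_sketch(prices, volumes, np.array([2, 4]))
```

With these inputs the first bar covers trades 0 to 2 and the second covers trades 3 to 4, so each row of bars holds the eight values in the order listed above.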

finmlkit.bar.base.comp_bar_trade_size_features(amounts: ndarray[tuple[int, ...], dtype[float64]], theta: ndarray[tuple[int, ...], dtype[float64]], bar_close_indices: ndarray[tuple[int, ...], dtype[int64]], theta_mult: float) -> tuple[ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]], ndarray[tuple[int, ...], dtype[float32]]]

Compute the trade size distribution features for each bar: the mean and 95th percentile of trade sizes relative to theta, pct_block, and size_gini. Together these indicate whether the bar contains large block trades.

Parameters:
  • amounts – Array of trade amounts (raw trade sizes).

  • theta – The typical trade size (e.g., 30 day rolling median trade size).

  • bar_close_indices – Indices marking the end of each bar.

  • theta_mult – Multiplier for theta to define the block size threshold (e.g. 5 times the median trade size).

Returns:

A tuple containing:

  • mean_size_rel: Mean trade size per bar relative to theta: log1p(mean_size / theta)

  • size_95_rel: 95th percentile of trade sizes per bar relative to theta: log1p(size_95 / theta)

  • pct_block: Share of bar volume in block trades, i.e. trades larger than the block size threshold derived from theta (see theta_mult): SUM( size_i [ size_i > threshold ] ) / volume

  • size_gini: Gini coefficient of trade sizes per bar.
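The four features can be sketched for a single bar as below. This is a hedged illustration, not the library's code: trade_size_sketch and gini are hypothetical helpers, theta is a scalar here rather than a per-bar array, the block threshold is assumed to be theta_mult * theta, and the Gini coefficient uses the standard sorted-rank formula (0 for identical sizes, approaching 1 when a single trade dominates).

```python
import numpy as np

def gini(x):
    """Standard Gini coefficient of non-negative values via the sorted-rank formula."""
    x = np.sort(x)
    n = len(x)
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1) @ x / (n * x.sum()))

def trade_size_sketch(sizes, theta, theta_mult=5.0):
    """Trade size distribution features for the trades of one bar."""
    # ASSUMPTION: a block trade is any trade larger than theta_mult * theta.
    block = sizes > theta_mult * theta
    return {
        "mean_size_rel": float(np.log1p(sizes.mean() / theta)),
        "size_95_rel": float(np.log1p(np.percentile(sizes, 95) / theta)),
        "pct_block": float(sizes[block].sum() / sizes.sum()),  # share of bar volume
        "size_gini": gini(sizes),
    }

sizes = np.array([1.0, 1.0, 1.0, 1.0, 16.0])
features = trade_size_sketch(sizes, theta=1.0, theta_mult=5.0)
```

Here one trade of size 16 sits above the threshold of 5, so it accounts for 16 of the 20 units traded and pct_block is 0.8.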

finmlkit.bar.base.comp_footprint_features(price_levels, buy_volumes, sell_volumes, imbalance_multiplier)

Calculate footprint statistics such as buy/sell imbalances and Commitment of Traders (COT) level.

Parameters:
  • price_levels – Array of int64 tick unit price levels in ascending order.

  • buy_volumes – Array of buy volumes at each price level.

  • sell_volumes – Array of sell volumes at each price level.

  • imbalance_multiplier – Threshold multiplier to detect imbalance.

Returns:

Tuple containing:

  • buy_imbalances: Boolean array where True indicates a buy imbalance at the level.

  • sell_imbalances: Boolean array where True indicates a sell imbalance at the level.

  • imbalance_max_run_signed: Longest signed imbalance run (number of consecutive imbalanced levels).

  • cot_price_level: Price level with the highest total volume.

  • vp_skew: Volume profile skew relative to VWAP (positive = buy pressure above VWAP).

  • vp_gini: Volume profile Gini coefficient (0 = concentrated, →1 = even distribution).
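Two of these statistics are simple enough to sketch directly. The code below is an illustration under assumptions, not the library's implementation: footprint_stats_sketch and longest_run are hypothetical names, the imbalance flags are passed in rather than derived from imbalance_multiplier, and the sign convention (positive when the longest run is a buy run, negative when it is a sell run) is assumed.

```python
import numpy as np

def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def footprint_stats_sketch(price_levels, buy_volumes, sell_volumes, buy_imb, sell_imb):
    """Return (cot_price_level, signed longest imbalance run) for one bar."""
    total = buy_volumes + sell_volumes
    # Price level with the highest total volume (first one on ties).
    cot = int(price_levels[np.argmax(total)])
    buy_run, sell_run = longest_run(buy_imb), longest_run(sell_imb)
    # ASSUMPTION: positive for a dominant buy run, negative for a dominant sell run.
    signed = buy_run if buy_run >= sell_run else -sell_run
    return cot, signed

levels = np.array([1001, 1002, 1003, 1004])
buy = np.array([1.0, 5.0, 7.0, 0.0])
sell = np.array([3.0, 2.0, 1.0, 1.0])
buy_imb = np.array([False, True, True, False])
sell_imb = np.array([True, False, False, False])
cot, run = footprint_stats_sketch(levels, buy, sell, buy_imb, sell_imb)
```

In this example level 1003 carries the most volume (8 units), and the two adjacent buy imbalances outweigh the single sell imbalance.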