finmlkit.bar.data_model module

class finmlkit.bar.data_model.FootprintData(bar_timestamps: ndarray[tuple[int, ...], dtype[int64]], price_tick: float, price_levels: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]], buy_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]], sell_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]], buy_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]], sell_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]], buy_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]], sell_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]], cot_price_levels: ndarray[tuple[int, ...], dtype[int32]] | None = None, sell_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None, buy_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None, imb_max_run_signed: ndarray[tuple[int, ...], dtype[int16]] | None = None, vp_skew: ndarray[tuple[int, ...], dtype[float64]] | None = None, vp_gini: ndarray[tuple[int, ...], dtype[float64]] | None = None, _datetime_index: Series = None)

Bases: object

Container for dynamic memory footprint calculations including trade volumes, price levels, and imbalance information.

Parameters:
  • bar_timestamps – Timestamps of each bar in nanoseconds.

  • price_tick – Price tick size.

  • price_levels – Array of price levels per bar.

  • buy_volumes – Buy volumes per price level.

  • sell_volumes – Sell volumes per price level.

  • buy_ticks – Number of buy ticks per price level.

  • sell_ticks – Number of sell ticks per price level.

  • buy_imbalances – Buy imbalance flags per price level.

  • sell_imbalances – Sell imbalance flags per price level.

  • cot_price_levels – Optional Commitment of Traders price levels.

  • sell_imbalances_sum – Optional total sell imbalance counts per bar.

  • buy_imbalances_sum – Optional total buy imbalance counts per bar.

  • imb_max_run_signed – Optional longest signed imbalance run for each bar.

  • vp_skew – Optional volume profile skew for each bar (positive = buy pressure above VWAP).

  • vp_gini – Optional volume profile Gini coefficient for each bar (0 = concentrated, →1 = even).
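The per-level fields above are ragged: each bar carries its own array, and the array lengths differ from bar to bar. The following is a minimal sketch of that layout using plain Python lists in place of NumPy arrays. It assumes that price levels are stored as integer multiples of price_tick and that the per-bar imbalance sums count the True flags; neither assumption is confirmed by this page. The helper names level_prices and per_bar_true_count are hypothetical and not part of finmlkit.

```python
# Hypothetical two-bar footprint. Field names mirror the documented
# parameters, but every value here is made up for illustration.
price_tick = 0.5
bar_timestamps = [1_700_000_000_000_000_000, 1_700_000_060_000_000_000]  # ns

# Assumption: levels are integer multiples of price_tick.
price_levels = [[200, 201, 202], [201, 202]]   # bar 0 has 3 levels, bar 1 has 2
buy_volumes = [[1.0, 4.0, 2.5], [3.0, 0.5]]
sell_volumes = [[2.0, 1.0, 0.5], [1.5, 2.0]]
buy_imbalances = [[False, True, True], [False, False]]


def level_prices(levels, tick):
    """Convert integer tick levels of one bar into prices (hypothetical helper)."""
    return [lvl * tick for lvl in levels]


def per_bar_true_count(flags_per_bar):
    """Count the True flags in each bar (assumed meaning of the *_sum fields)."""
    return [sum(flags) for flags in flags_per_bar]


prices_bar0 = level_prices(price_levels[0], price_tick)
buy_imbalances_sum = per_bar_true_count(buy_imbalances)
```

Because every bar has its own number of levels, the containers cannot be stored as a single rectangular array, which is presumably why the signature accepts either an object array or a list of arrays.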

bar_timestamps: ndarray[tuple[int, ...], dtype[int64]]
buy_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]]
buy_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None
buy_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]]
buy_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]]
cast_to_numba_list()

Convert internal arrays to NumbaList for JIT-compatible processing.

cast_to_numpy()

Convert internal lists to NumPy arrays for general-purpose processing.

cot_price_levels: ndarray[tuple[int, ...], dtype[int32]] | None = None
classmethod from_dict(data: Dict) → FootprintData

Create a FootprintData object from a dictionary of arrays.

Parameters:
  • data – Dictionary with raw footprint arrays.

Returns:

A validated FootprintData instance.

Raises:

ValueError if data length is inconsistent.

classmethod from_numba(data: Tuple, price_tick: float) → FootprintData

Create a FootprintData object from Numba-based output.

Parameters:
  • data – Output tuple from comp_bar_footprint.

  • price_tick – Tick size for price levels.

Returns:

A validated FootprintData instance.

Raises:

ValueError if data length is inconsistent.

get_df()

Convert the footprint data into a pandas DataFrame.

Returns:

A DataFrame with structured footprint information.

imb_max_run_signed: ndarray[tuple[int, ...], dtype[int16]] | None = None
is_valid() → bool

Check if all internal arrays are consistent.

Returns:

True if valid, False otherwise.
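This page does not spell out what "consistent" means. One plausible reading is that every per-bar container has one entry per bar, and that within a bar all per-level arrays have the same length. The sketch below implements only that reading; footprint_is_consistent is a hypothetical function and should not be taken as the library's is_valid implementation.

```python
def footprint_is_consistent(bar_timestamps, per_bar_fields):
    """Hypothetical length check over ragged per-bar fields.

    per_bar_fields maps a field name (e.g. "price_levels") to a sequence
    holding one per-level sequence for each bar.
    """
    n_bars = len(bar_timestamps)

    # Every per-bar container must hold exactly one entry per bar.
    if any(len(field) != n_bars for field in per_bar_fields.values()):
        return False

    # Within a single bar, all fields must describe the same number of levels.
    for i in range(n_bars):
        sizes = {len(field[i]) for field in per_bar_fields.values()}
        if len(sizes) > 1:
            return False
    return True


good = {
    "price_levels": [[200, 201], [201]],
    "buy_volumes": [[1.0, 2.0], [0.5]],
}
bad_levels = {
    "price_levels": [[200, 201], [201]],
    "buy_volumes": [[1.0], [0.5]],          # bar 0 is missing one level
}
bad_bars = {
    "price_levels": [[200, 201]],           # only one bar for two timestamps
    "buy_volumes": [[1.0, 2.0], [0.5]],
}
```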

memory_usage()

Calculate the approximate memory usage of this object in MB.

price_levels: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]]
price_tick: float
sell_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]]
sell_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None
sell_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]]
sell_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]]
vp_gini: ndarray[tuple[int, ...], dtype[float64]] | None = None
vp_skew: ndarray[tuple[int, ...], dtype[float64]] | None = None
class finmlkit.bar.data_model.TradesData(ts: ndarray[tuple[int, ...], dtype[_ScalarType_co]], px: ndarray[tuple[int, ...], dtype[_ScalarType_co]], qty: ndarray[tuple[int, ...], dtype[_ScalarType_co]], id: ndarray[tuple[int, ...], dtype[_ScalarType_co]] = None, *, is_buyer_maker: ndarray[tuple[int, ...], dtype[_ScalarType_co]] = None, side=None, dt_index: DatetimeIndex | None = None, timestamp_unit: str | None = None, preprocess: bool = False, proc_res: str | None = None, name=None)

Bases: object

Class to preprocess trades data for bar building.

This class handles standardization of column names, timestamp conversion, trade merging, and side inference for consistent processing across different data sources.

property data: DataFrame

Get the processed trades data as a DataFrame corresponding to the active view range.

Returns:

DataFrame containing trades data.

property end_date

Get the end date of the trades data.

Returns:

End date as a pandas Timestamp.

classmethod load_trades_h5(filepath: str, *, key: str | None = None, start_time: str | Timestamp | None = None, end_time: str | Timestamp | None = None, n_workers: int | None = None, enable_multiprocessing: bool = True, min_groups_for_mp: int = 2) → TradesData

Load trades from filepath with optional multiprocessing support.

Three usage modes exist:

  1. key only – load the full monthly partition /trades/<key>.

  2. start_time / end_time – assemble the minimal set of monthly groups that touch the range and slice them at read time, so rows outside the range are never loaded.

  3. Combination – constrain selection within the chosen “key”.
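The second mode relies on knowing which monthly groups a time range touches. As a rough illustration, the sketch below enumerates the "YYYY-MM" keys between two datetimes using only the standard library. The function month_keys_between is hypothetical and is not part of finmlkit; the actual selection logic may differ.

```python
from datetime import datetime


def month_keys_between(start, end):
    """Return the 'YYYY-MM' key of every calendar month touching [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")

    keys = []
    year, month = start.year, start.month
    # Walk month by month until the month containing `end` has been added.
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


keys = month_keys_between(datetime(2024, 11, 20), datetime(2025, 2, 3))
```

Under the storage layout described for save_h5, each returned key would correspond to one group of the form /trades/<key>.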

Parameters:
  • filepath – Path to the HDF5 file.

  • key – Optional specific monthly key to load (e.g., “2025-03”).

  • start_time – Optional start time for filtering.

  • end_time – Optional end time for filtering.

  • n_workers – Number of worker processes. If None, uses CPU count - 1.

  • enable_multiprocessing – Whether to use multiprocessing when loading multiple groups.

  • min_groups_for_mp – Minimum number of groups required to enable multiprocessing.

Returns:

TradesData instance with loaded data.
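The interaction between n_workers, enable_multiprocessing, and min_groups_for_mp can be pictured roughly as follows. This is a sketch of one plausible decision rule built from the parameter descriptions above, not the library's code. The function resolve_workers is hypothetical, and capping the worker count at the number of groups is an added assumption.

```python
import os


def resolve_workers(n_groups, n_workers=None,
                    enable_multiprocessing=True, min_groups_for_mp=2):
    """Pick a worker count for loading n_groups monthly groups (hypothetical)."""
    # Fall back to a single process when multiprocessing is disabled
    # or when there are too few groups to make it worthwhile.
    if not enable_multiprocessing or n_groups < min_groups_for_mp:
        return 1

    # Documented default: CPU count - 1 (never below one worker).
    if n_workers is None:
        n_workers = max(1, (os.cpu_count() or 2) - 1)

    # Assumption: more workers than groups would only sit idle.
    return max(1, min(n_workers, n_groups))
```

For example, resolve_workers(1) falls back to a single process because one group is below the default min_groups_for_mp of 2.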

property orig_timestamp_unit: str

Get the timestamp unit used for processing.

Returns:

Timestamp unit string.

save_h5(filepath: str, *, month_key: str | None = None, complib: str = 'blosc:lz4', complevel: int = 1, mode: str = 'a', chunksize: int = 1000000, overwrite_month: bool = True) → str

Persist the raw trades to an on-disk HDF5 store. The data of each calendar month lives under /trades/YYYY-MM in the file.

  • When adding new monthly data, it will be stored in a new group.

  • When adding data for an existing month, you can either append to it or overwrite it with confirmation.

Parameters:
  • filepath – Destination .h5 file. The parent directories are created automatically when missing.

  • month_key – Override the key of the form "YYYY-MM". When None the key is derived from the first timestamp of self.data.

  • complib – Compression backend used by PyTables. Default is "blosc:lz4".

  • complevel – Compression level. Default is 1.

  • mode – File mode – "a" to create or append, "w" to start fresh. Default is "a".

  • chunksize – Row chunk size used by PyTables when writing large frames. Default is 1000000.

  • overwrite_month – If True and the month data exists, prompts for confirmation to overwrite. Default is True.

Returns:

The full key used inside the store, e.g. "/trades/2025-02".

Raises:

ValueError if user declines to overwrite existing data.
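To make the key derivation concrete, the sketch below builds the store key from the first timestamp when no month_key override is given. It assumes the timestamp is in nanoseconds since the Unix epoch and is interpreted in UTC; the page does not state either. The function store_key_for is hypothetical and not part of finmlkit.

```python
from datetime import datetime, timezone


def store_key_for(first_ts_ns, month_key=None):
    """Return the '/trades/YYYY-MM' key for a trade set (hypothetical helper)."""
    if month_key is None:
        # Assumption: nanosecond epoch timestamps interpreted in UTC.
        seconds = first_ts_ns // 1_000_000_000
        dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
        month_key = f"{dt.year:04d}-{dt.month:02d}"
    return f"/trades/{month_key}"


# 1_739_232_000 s after the epoch falls on 2025-02-11 (UTC).
derived_key = store_key_for(1_739_232_000_000_000_000)
explicit_key = store_key_for(1_739_232_000_000_000_000, month_key="2025-03")
```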

set_view_range(start: Timestamp | str, end: Timestamp | str)

Set the view range for the trades data.

Parameters:
  • start – Start timestamp for the view range.

  • end – End timestamp for the view range.

Returns:

None.
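A view range over time-sorted trades can be resolved to a pair of positions with binary search, without copying any data. The sketch below shows that idea on plain integer timestamps. Treating both bounds as inclusive is an assumption, and view_slice is a hypothetical helper rather than the method's actual implementation.

```python
from bisect import bisect_left, bisect_right


def view_slice(sorted_ts, start, end):
    """Return a slice selecting timestamps in [start, end] (both inclusive).

    sorted_ts must be sorted in ascending order.
    """
    lo = bisect_left(sorted_ts, start)    # first index with ts >= start
    hi = bisect_right(sorted_ts, end)     # first index with ts > end
    return slice(lo, hi)


timestamps = [10, 20, 30, 40, 50]
window = view_slice(timestamps, 20, 40)
visible = timestamps[window]
```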

property start_date

Get the start date of the trades data.

Returns:

Start date as a pandas Timestamp.