finmlkit.bar.data_model module
- class finmlkit.bar.data_model.FootprintData(bar_timestamps: ndarray[tuple[int, ...], dtype[int64]], price_tick: float, price_levels: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]], buy_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]], sell_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]], buy_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]], sell_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]], buy_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]], sell_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]], cot_price_levels: ndarray[tuple[int, ...], dtype[int32]] | None = None, sell_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None, buy_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None, imb_max_run_signed: ndarray[tuple[int, ...], dtype[int16]] | None = None, vp_skew: ndarray[tuple[int, ...], dtype[float64]] | None = None, vp_gini: ndarray[tuple[int, ...], dtype[float64]] | None = None, _datetime_index: Series = None)
Bases: object
Container for dynamic memory footprint calculations including trade volumes, price levels, and imbalance information.
- Parameters:
bar_timestamps – Timestamps of each bar in nanoseconds.
price_tick – Price tick size.
price_levels – Array of price levels per bar.
buy_volumes – Buy volumes per price level.
sell_volumes – Sell volumes per price level.
buy_ticks – Number of buy ticks per price level.
sell_ticks – Number of sell ticks per price level.
buy_imbalances – Buy imbalance flags per price level.
sell_imbalances – Sell imbalance flags per price level.
cot_price_levels – Optional Commitment of Traders price levels.
sell_imbalances_sum – Optional total sell imbalance counts per bar.
buy_imbalances_sum – Optional total buy imbalance counts per bar.
imb_max_run_signed – Optional longest signed imbalance run for each bar.
vp_skew – Optional volume profile skew for each bar (positive = buy pressure above VWAP).
vp_gini – Optional volume profile Gini coefficient for each bar (0 = concentrated, →1 = even).
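The per-bar fields are ragged: each bar carries its own set of price levels, so the inner arrays may differ in length from bar to bar. The sketch below illustrates that layout with plain Python lists standing in for the NumPy arrays. It assumes (the source does not confirm this) that `price_levels` holds integer tick indices, so that a price is recovered as `level * price_tick`. The helper `bar_summary` is hypothetical and not part of finmlkit.

```python
# Illustrative layout only: plain lists stand in for the NumPy arrays.
price_tick = 0.5
bar_timestamps = [1_700_000_000_000_000_000, 1_700_000_060_000_000_000]  # nanoseconds

# One inner list per bar; inner lengths may differ between bars.
price_levels = [[200, 201, 202], [201, 202]]  # assumed: integer tick indices
buy_volumes = [[1.0, 2.5, 0.5], [3.0, 1.0]]
sell_volumes = [[0.5, 1.0, 2.0], [0.5, 4.0]]
buy_imbalances = [[False, True, False], [True, False]]
sell_imbalances = [[False, False, True], [False, True]]


def bar_summary(i):
    """Hypothetical helper: summarise bar i from the ragged per-bar lists."""
    return {
        # Assumed conversion from tick index to price.
        "prices": [level * price_tick for level in price_levels[i]],
        "total_buy": sum(buy_volumes[i]),
        "total_sell": sum(sell_volumes[i]),
        # Counting True flags gives a per-bar imbalance count.
        "buy_imbalances_sum": sum(buy_imbalances[i]),
        "sell_imbalances_sum": sum(sell_imbalances[i]),
    }


summary = bar_summary(0)
```

Counting the flags per bar in this way is one plausible reading of the optional `buy_imbalances_sum` and `sell_imbalances_sum` fields; the library may compute them differently.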
- bar_timestamps: ndarray[tuple[int, ...], dtype[int64]]
- buy_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]]
- buy_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None
- buy_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]]
- buy_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]]
- cot_price_levels: ndarray[tuple[int, ...], dtype[int32]] | None = None
- classmethod from_dict(data: Dict) → FootprintData
Create a FootprintData object from a dictionary of arrays.
- Parameters:
data – Dictionary with raw footprint arrays.
- Returns:
A validated FootprintData instance.
- Raises:
ValueError – If data length is inconsistent.
- classmethod from_numba(data: Tuple, price_tick: float) → FootprintData
Create a FootprintData object from Numba-based output.
- Parameters:
data – Output tuple from comp_bar_footprint.
price_tick – Tick size for price levels.
- Returns:
A validated FootprintData instance.
- Raises:
ValueError – If data length is inconsistent.
- get_df()
Convert the footprint data into a pandas DataFrame.
- Returns:
A DataFrame with structured footprint information.
- imb_max_run_signed: ndarray[tuple[int, ...], dtype[int16]] | None = None
- is_valid() → bool
Check if all internal arrays are consistent.
- Returns:
True if valid, False otherwise.
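The source does not spell out what "consistent" means here. A minimal sketch of one plausible reading follows: every per-bar field has one entry per bar, and within a bar all fields share the same number of price levels. The function `arrays_consistent` is hypothetical and is not the library's implementation.

```python
def arrays_consistent(bar_timestamps, *per_bar_fields):
    """Hypothetical check: True if every field has one entry per bar and,
    within each bar, all fields have the same number of price levels."""
    n_bars = len(bar_timestamps)
    # Outer length: one entry per bar in every field.
    if any(len(field) != n_bars for field in per_bar_fields):
        return False
    # Inner length: fields must agree on the level count of each bar.
    for i in range(n_bars):
        lengths = {len(field[i]) for field in per_bar_fields}
        if len(lengths) > 1:
            return False
    return True


ts = [0, 60]
levels = [[200, 201], [201]]
buys = [[1.0, 2.0], [3.0]]
sells_bad = [[0.5], [4.0]]  # bar 0 has one entry instead of two

ok = arrays_consistent(ts, levels, buys)
bad = arrays_consistent(ts, levels, buys, sells_bad)
```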
- price_levels: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]]
- price_tick: float
- sell_imbalances: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[bool]]]] | List[ndarray[tuple[int, ...], dtype[bool]]]
- sell_imbalances_sum: ndarray[tuple[int, ...], dtype[uint16]] | None = None
- sell_ticks: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[int32]]]] | List[ndarray[tuple[int, ...], dtype[int32]]]
- sell_volumes: ndarray[tuple[int, ...], dtype[ndarray[tuple[int, ...], dtype[float32]]]] | List[ndarray[tuple[int, ...], dtype[float32]]]
- vp_gini: ndarray[tuple[int, ...], dtype[float64]] | None = None
- vp_skew: ndarray[tuple[int, ...], dtype[float64]] | None = None
- class finmlkit.bar.data_model.TradesData(ts: ndarray[tuple[int, ...], dtype[_ScalarType_co]], px: ndarray[tuple[int, ...], dtype[_ScalarType_co]], qty: ndarray[tuple[int, ...], dtype[_ScalarType_co]], id: ndarray[tuple[int, ...], dtype[_ScalarType_co]] = None, *, is_buyer_maker: ndarray[tuple[int, ...], dtype[_ScalarType_co]] = None, side=None, dt_index: DatetimeIndex | None = None, timestamp_unit: str | None = None, preprocess: bool = False, proc_res: str | None = None, name=None)
Bases: object
Class to preprocess trades data for bar building.
This class handles standardization of column names, timestamp conversion, trade merging, and side inference for consistent processing across different data sources.
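Side inference from an `is_buyer_maker` flag can be sketched as follows. When the buyer is the maker (the resting order), the taker who initiated the trade is the seller. The +1/-1 encoding and the function name `infer_side` are assumptions for illustration; the source does not state how TradesData encodes `side`.

```python
def infer_side(is_buyer_maker):
    """Map is_buyer_maker flags to the aggressor side.

    Assumed encoding: -1 = seller-initiated, +1 = buyer-initiated.
    """
    return [-1 if flag else 1 for flag in is_buyer_maker]


sides = infer_side([True, False, False])
```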
- property data: DataFrame
Get the processed trades data as a DataFrame corresponding to the active view range.
- Returns:
DataFrame containing trades data.
- property end_date
Get the end date of the trades data.
- Returns:
End date as a pandas Timestamp.
- classmethod load_trades_h5(filepath: str, *, key: str | None = None, start_time: str | Timestamp | None = None, end_time: str | Timestamp | None = None, n_workers: int | None = None, enable_multiprocessing: bool = True, min_groups_for_mp: int = 2) → TradesData
Load trades from filepath with optional multiprocessing support.
Three usage modes exist:
key only – load the full monthly partition /trades/<key>.
start_time/end_time – assemble the minimal set of monthly groups touching the range, slice at read time for maximum speed.
Combination – constrain selection within the chosen “key”.
- Parameters:
filepath – Path to the HDF5 file.
key – Optional specific monthly key to load (e.g., “2025-03”).
start_time – Optional start time for filtering.
end_time – Optional end time for filtering.
n_workers – Number of worker processes. If None, uses CPU count - 1.
enable_multiprocessing – Whether to use multiprocessing when loading multiple groups.
min_groups_for_mp – Minimum number of groups required to enable multiprocessing.
- Returns:
TradesData instance with loaded data.
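To make "the minimal set of monthly groups touching the range" concrete, here is a small sketch that enumerates the "YYYY-MM" keys between a start and end time. The helper `month_keys_in_range` is hypothetical; how finmlkit selects groups internally is not shown in the source.

```python
from datetime import datetime


def month_keys_in_range(start, end):
    """Hypothetical helper: 'YYYY-MM' keys for every month touched by [start, end]."""
    if end < start:
        raise ValueError("end must not precede start")
    keys = []
    year, month = start.year, start.month
    # Walk month by month until the month containing `end` is included.
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


keys = month_keys_in_range(datetime(2024, 11, 20), datetime(2025, 2, 3))
```

Only these groups would need to be opened, and rows outside the requested range can then be dropped while reading.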
- property orig_timestamp_unit: str
Get the timestamp unit used for processing.
- Returns:
Timestamp unit string.
- save_h5(filepath: str, *, month_key: str | None = None, complib: str = 'blosc:lz4', complevel: int = 1, mode: str = 'a', chunksize: int = 1000000, overwrite_month: bool = True) → str
Persist the raw trades to an on-disk HDF5 store. The data of each calendar month lives under /trades/YYYY-MM in the file.
When adding new monthly data, it will be stored in a new group.
When adding data for an existing month, you can either append to it or overwrite it with confirmation.
- Parameters:
filepath – Destination .h5 file. The parent directories are created automatically when missing.
month_key – Override the key of the form "YYYY-MM". When None, the key is derived from the first timestamp of self.data.
complib – Compression backend used by PyTables. Default is blosc:lz4.
complevel – Compression level. Default is 1.
mode – File mode: "a" to create or append, "w" to start fresh. Default is "a".
chunksize – Row chunk size used by PyTables when writing large frames. Default is 1000000.
overwrite_month – If True and the month data exists, prompts for confirmation to overwrite. Default is True.
- Returns:
The full key used inside the store, e.g. "/trades/2025-02".
- Raises:
ValueError – If the user declines to overwrite existing data.
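As an illustration of how a store key of the form "/trades/YYYY-MM" could be derived from the first timestamp, consider the sketch below. It assumes nanosecond epoch timestamps interpreted in UTC; the helper `month_store_key` is hypothetical and the library's own derivation may differ.

```python
from datetime import datetime, timezone


def month_store_key(first_ts_ns):
    """Hypothetical helper: '/trades/YYYY-MM' from a nanosecond epoch timestamp.

    Assumes the timestamp is interpreted in UTC.
    """
    seconds = first_ts_ns // 1_000_000_000  # drop sub-second precision
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"/trades/{dt:%Y-%m}"


key = month_store_key(1_738_368_000_000_000_000)  # 2025-02-01 00:00:00 UTC
```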
- set_view_range(start: Timestamp | str, end: Timestamp | str)
Set the view range for the trades data.
- Parameters:
start – Start timestamp for the view range.
end – End timestamp for the view range.
- Returns:
None
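A view range over time-sorted trades can be resolved to an index slice without copying data. The sketch below uses binary search on a sorted timestamp list. Whether the real method treats the bounds as inclusive is not stated in the source, so inclusive bounds are an assumption here, and `view_slice` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right


def view_slice(sorted_ts, start, end):
    """Hypothetical helper: index range [lo, hi) of timestamps within [start, end].

    Assumes sorted_ts is ascending and both bounds are inclusive.
    """
    lo = bisect_left(sorted_ts, start)   # first index with ts >= start
    hi = bisect_right(sorted_ts, end)    # one past the last index with ts <= end
    return lo, hi


ts = [10, 20, 30, 40, 50]
lo, hi = view_slice(ts, 20, 40)
visible = ts[lo:hi]
```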
- property start_date
Get the start date of the trades data.
- Returns:
Start date as a pandas Timestamp.