finmlkit.bar.io module

class finmlkit.bar.io.AddTimeBarH5(h5_path: str, keys: list[str] = None)

Bases: object

Builds 1-second time bars from the trades data in an HDF5 file and adds them to that file.

process_all(overwrite: bool = False) → Dict[str, bool]

Process all keys to build and save 1-second time bars.

Parameters:

overwrite – Whether to overwrite existing time bar data

Returns:

Dictionary mapping keys to success status

process_key(key: str, overwrite: bool = False) → bool

Process a single key to build and save 1-second time bars.

Parameters:
  • key – The key to process (e.g., ‘/trades/2023-01’)

  • overwrite – Whether to overwrite existing time bar data for this key

Returns:

True if successful, False otherwise
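The aggregation behind a 1-second time bar can be sketched in plain Python. This is a minimal illustration, not AddTimeBarH5's implementation: it assumes trades arrive as hypothetical (timestamp_ms, price, amount) tuples already sorted by time, and the names Bar and build_1s_bars are made up for the example. Seconds without any trades produce no bar here; whether the library fills such seconds is not stated above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Bar:
    """Hypothetical 1-second OHLCV bar (not a finmlkit type)."""
    second: int            # bar open time in epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: float = 0.0    # summed trade amount
    notional: float = 0.0  # summed price * amount, used for VWAP

    @property
    def vwap(self) -> float:
        # Volume-weighted average price; fall back to close if volume is zero
        return self.notional / self.volume if self.volume else self.close


def build_1s_bars(trades: Iterable[tuple[int, float, float]]) -> list[Bar]:
    """Aggregate time-sorted (timestamp_ms, price, amount) trades into 1-second bars.

    Assumption: input is sorted by timestamp; empty seconds yield no bar.
    """
    bars: list[Bar] = []
    for ts_ms, price, amount in trades:
        second = ts_ms // 1000
        if not bars or bars[-1].second != second:
            # The first trade of a new second opens a new bar
            bars.append(Bar(second, price, price, price, price))
        bar = bars[-1]
        bar.high = max(bar.high, price)
        bar.low = min(bar.low, price)
        bar.close = price
        bar.volume += amount
        bar.notional += price * amount
    return bars
```

Keeping the running notional on the bar, rather than the VWAP itself, lets each trade update the bar in constant time; the VWAP is derived only when read.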

class finmlkit.bar.io.H5Inspector(filepath: str)

Bases: object

Class to inspect HDF5 files containing trades data.

This class provides methods to list available keys, check metadata, and retrieve basic statistics about the trades data stored in HDF5 format.

get_integrity_info(key: str) → DataFrame | None

Get data integrity information for a specific key in the HDF5 file. This retrieves discontinuity information stored during the save_h5 process.

Parameters:

key – Key to retrieve integrity information for (e.g., ‘/trades/2023-01’).

Returns:

DataFrame with discontinuity information or None if no integrity issues were found.

get_integrity_summary(verbose=True) → Dict[str, Dict]

Generate a summary of data integrity issues across all tables in the HDF5 file.

This function identifies tables with integrity issues (data_integrity_ok=False), collects statistics about the issues (missing percentage, etc.), and retrieves the detailed discontinuity information for affected tables.

Parameters:

verbose – Whether to print the results to console

Returns:

Dictionary mapping HDF5 groups to dictionaries containing:
  • ‘metadata’ – Basic metadata about the table, including integrity flags

  • ‘discontinuities’ – DataFrame with detailed discontinuity information (if available)

If no integrity issues are found, None is returned instead.

get_metadata(key: str) → Dict[str, any]

Get metadata for a specific key in the HDF5 file.

Parameters:

key – Key to retrieve metadata for (e.g., ‘/trades/2023-02’).

Returns:

Metadata dictionary.

get_statistics(key: str) → Dict[str, any]

Get basic statistics for a specific key in the HDF5 file.

Parameters:

key – Key to retrieve statistics for.

Returns:

Statistics dictionary.

inspect_gaps(max_gap: Timedelta = Timedelta('0 days 00:01:00'), processes: int = 4) → Dict[str, list[tuple[Timestamp, Timedelta]]]

Inspect gaps in trades data across all keys in the HDF5 file.

Parameters:
  • max_gap – Maximum allowable gap between consecutive timestamps.

  • processes – Number of processes to use for multiprocessing.

Returns:

Dictionary mapping HDF5 groups to lists of (timestamp, gap length) tuples, one for each gap that exceeds max_gap.
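To make the shape of the inspect_gaps result concrete, the per-key check can be sketched over a sorted list of timestamps. This is an illustration under assumptions: find_gaps is a hypothetical helper, it reports the last timestamp before each gap (the library may report a different point), it counts a gap only when it is strictly longer than max_gap, and it omits the multiprocessing across keys that the processes parameter controls.

```python
from datetime import datetime, timedelta


def find_gaps(
    timestamps: list[datetime], max_gap: timedelta = timedelta(minutes=1)
) -> list[tuple[datetime, timedelta]]:
    """Return (gap_start, gap_length) for each consecutive pair further apart than max_gap.

    Hypothetical helper; assumes timestamps are sorted ascending.
    """
    gaps: list[tuple[datetime, timedelta]] = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        delta = curr - prev
        if delta > max_gap:
            gaps.append((prev, delta))  # report the last timestamp before the gap
    return gaps
```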

list_keys() → list[str]

List all available keys in the HDF5 file.

Returns:

List of keys.

class finmlkit.bar.io.TimeBarReader(h5_path: str)

Bases: object

Reads time bars from an H5 file and allows resampling to larger timeframes.

This class enables:
  • Reading 1-second time bars stored in an H5 file

  • Filtering by date range

  • Resampling to arbitrary timeframes (e.g., 5min, 1h, 1d)

  • Proper aggregation of OHLCV data

  • Correct calculation of VWAP for resampled periods
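The aggregation rules for resampling can be sketched in plain Python. A coarser bar takes the first open, the highest high, the lowest low, the last close and the summed volume. VWAP cannot simply be averaged, but because each 1-second VWAP times its volume equals that second's traded notional, the resampled VWAP is the volume-weighted mean of the 1-second VWAPs. This is a minimal sketch under assumptions: OHLCV and resample are hypothetical names, buckets are aligned to the Unix epoch, and a zero-volume bucket falls back to its close price. TimeBarReader works on DataFrames and may differ in these details.

```python
from itertools import groupby
from typing import NamedTuple


class OHLCV(NamedTuple):
    """Hypothetical bar record; TimeBarReader itself returns a DataFrame."""
    start: int    # bar open time in epoch seconds
    open: float
    high: float
    low: float
    close: float
    volume: float
    vwap: float


def resample(bars: list[OHLCV], period_s: int) -> list[OHLCV]:
    """Aggregate time-sorted bars into epoch-aligned buckets of period_s seconds."""
    out: list[OHLCV] = []
    # groupby only merges consecutive items, so the input must be sorted by start
    for start, group in groupby(bars, key=lambda b: b.start - b.start % period_s):
        g = list(group)
        volume = sum(b.volume for b in g)
        # vwap * volume recovers each bar's traded notional (sum of price * amount)
        notional = sum(b.vwap * b.volume for b in g)
        out.append(OHLCV(
            start=start,
            open=g[0].open,
            high=max(b.high for b in g),
            low=min(b.low for b in g),
            close=g[-1].close,
            volume=volume,
            vwap=notional / volume if volume else g[-1].close,
        ))
    return out
```

For 5-minute bars, period_s would be 300; a 1-hour timeframe corresponds to 3600.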

list_keys() → List[str]

List all available klines keys in the HDF5 file.

Returns:

List of klines keys.

read(start_time: str | Timestamp | datetime | None = None, end_time: str | Timestamp | datetime | None = None, timeframe: str | None = None) → DataFrame

Read time bars from the H5 file, optionally filtering by time range and resampling.

Parameters:
  • start_time – Start time for filtering (inclusive, optional)

  • end_time – End time for filtering (inclusive, optional)

  • timeframe – Timeframe for resampling (e.g., ‘5min’, ‘1h’, ‘1d’); None returns the original 1-second bars

Returns:

DataFrame with the requested time bars

Examples

from finmlkit.bar.io import TimeBarReader

# Get all bars for a specific day (inclusive of both start and end dates)
reader = TimeBarReader('data.h5')
df_1s = reader.read('2023-01-01', '2023-01-01')  # Full day of Jan 1st

# Get all bars for two full months (Feb 1 through Mar 31 inclusive)
df_feb_mar = reader.read('2022-02-01', '2022-03-31')

# Get 5-minute bars for a date range
df_5min = reader.read('2023-01-01', '2023-01-31', timeframe='5min')

# Get hourly bars for a specific month
df_1h = reader.read('2023-01-01', '2023-01-31', timeframe='1h')

# Get daily bars for all available data
df_daily = reader.read(timeframe='1D')
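The examples above treat a date-only end_time such as '2023-01-01' as covering that entire day. One way to get those inclusive semantics is sketched below. in_range is a hypothetical helper, and the rule that a 10-character 'YYYY-MM-DD' string means "through the end of that day" is an assumption inferred from the examples, not taken from TimeBarReader's source.

```python
from datetime import datetime, timedelta


def in_range(ts: datetime, start: str, end: str) -> bool:
    """Inclusive date-range check (hypothetical helper, not finmlkit's code)."""
    lo = datetime.fromisoformat(start)
    hi = datetime.fromisoformat(end)
    if len(end) == 10:
        # Assumed rule: a date-only 'YYYY-MM-DD' end includes every timestamp on that day
        return lo <= ts < hi + timedelta(days=1)
    # Full timestamp: plain inclusive upper bound
    return lo <= ts <= hi
```

Using a half-open upper bound at the start of the next day avoids having to pick a "last representable instant" such as 23:59:59.999999.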