finmlkit.bar.io module
- class finmlkit.bar.io.AddTimeBarH5(h5_path: str, keys: list[str] = None)
Bases: object
Builds 1-second time bars and adds them to a trades HDF5 file.
- class finmlkit.bar.io.H5Inspector(filepath: str)
Bases: object
Class to inspect HDF5 files containing trades data.
This class provides methods to list available keys, check metadata, and retrieve basic statistics about the trades data stored in HDF5 format.
- get_integrity_info(key: str) → DataFrame | None
Get data integrity information for a specific key in the HDF5 file. This retrieves discontinuity information stored during the save_h5 process.
- Parameters:
key – Key to retrieve integrity information for (e.g., ‘/trades/2023-01’).
- Returns:
DataFrame with discontinuity information or None if no integrity issues were found.
- get_integrity_summary(verbose=True) → Dict[str, Dict]
Generate a summary of data integrity issues across all tables in the HDF5 file.
This function identifies tables with integrity issues (data_integrity_ok=False), collects statistics about the issues (missing percentage, etc.), and retrieves the detailed discontinuity information for affected tables.
- Parameters:
verbose – Whether to print the results to console
- Returns:
Dictionary with keys as HDF5 groups and values as dictionaries containing:
  - ‘metadata’: Basic metadata about the table, including integrity flags
  - ‘discontinuities’: DataFrame with detailed discontinuity information (if available)
Or None if no integrity issues are found.
- get_metadata(key: str) → Dict[str, any]
Get metadata for a specific key in the HDF5 file.
- Parameters:
key – Key to retrieve metadata for (e.g., ‘/trades/2023-02’).
- Returns:
Metadata dictionary.
- get_statistics(key: str) → Dict[str, any]
Get basic statistics for a specific key in the HDF5 file.
- Parameters:
key – Key to retrieve statistics for.
- Returns:
Statistics dictionary.
- inspect_gaps(max_gap: Timedelta = Timedelta('0 days 00:01:00'), processes: int = 4) → Dict[str, list[tuple[Timestamp, Timedelta]]]
Inspect gaps in trades data across all keys in the HDF5 file.
- Parameters:
max_gap – Maximum allowable gap between consecutive timestamps.
processes – Number of processes to use for multiprocessing.
- Returns:
Dictionary with keys as HDF5 groups and values as lists of (timestamp, gap duration) tuples, one per gap longer than max_gap.
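The gap check described for inspect_gaps can be illustrated with plain pandas. This is only a minimal sketch, not the library's implementation: the helper name find_gaps is hypothetical, and reporting the timestamp at which a gap starts (rather than where it ends) is an assumption.

```python
import pandas as pd


def find_gaps(timestamps, max_gap=pd.Timedelta(minutes=1)):
    """Return (gap start, gap duration) tuples for gaps longer than max_gap.

    Hypothetical helper: reporting the *start* of each gap is an assumption.
    """
    ts = pd.Series(timestamps).sort_values().reset_index(drop=True)
    deltas = ts.diff()  # first element is NaT and never exceeds max_gap
    mask = (deltas > max_gap).to_numpy()
    return [(ts[i - 1], deltas[i]) for i in ts.index[mask]]


trade_times = pd.to_datetime([
    "2023-01-01 00:00:00",
    "2023-01-01 00:00:30",
    "2023-01-01 00:05:00",  # 4 min 30 s after the previous trade
    "2023-01-01 00:05:10",
])
gaps = find_gaps(trade_times)
```

Each returned tuple has the same shape as the values in the documented return type, list[tuple[Timestamp, Timedelta]].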
- class finmlkit.bar.io.TimeBarReader(h5_path: str)
Bases: object
Reads time bars from an H5 file and allows resampling to larger timeframes.
This class enables:
  - Reading 1-second time bars stored in an H5 file
  - Filtering by date range
  - Resampling to arbitrary timeframes (e.g., 5min, 1h, 1d)
  - Proper aggregation of OHLCV data
  - Correct calculation of VWAP for resampled periods
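To make the OHLCV and VWAP aggregation concrete, here is a minimal pandas sketch of one common way to resample bars. It is not taken from the finmlkit source: the function name resample_bars and the column names (open, high, low, close, volume, vwap) are assumptions made for illustration. The key point is that VWAP is recomputed as a volume-weighted mean instead of averaging the per-bar VWAP values.

```python
import pandas as pd


def resample_bars(bars, timeframe):
    """Aggregate fine-grained bars into a larger timeframe.

    Assumed (hypothetical) columns: open, high, low, close, volume, vwap.
    """
    # Traded notional per bar, so VWAP can be re-derived after summing.
    with_notional = bars.assign(notional=bars["vwap"] * bars["volume"])
    out = with_notional.resample(timeframe).agg({
        "open": "first",
        "high": "max",
        "low": "min",
        "close": "last",
        "volume": "sum",
        "notional": "sum",
    })
    # Volume-weighted average price over the whole resampled period.
    out["vwap"] = out["notional"] / out["volume"]
    return out.drop(columns="notional")


index = pd.to_datetime([
    "2023-01-01 00:00:00", "2023-01-01 00:00:01",
    "2023-01-01 00:01:00", "2023-01-01 00:01:01",
])
bars_1s = pd.DataFrame({
    "open": [10.0, 11.0, 12.0, 13.0],
    "high": [11.0, 12.0, 13.0, 14.0],
    "low": [9.0, 10.0, 11.0, 12.0],
    "close": [11.0, 12.0, 13.0, 14.0],
    "volume": [1, 3, 2, 2],
    "vwap": [10.0, 12.0, 12.0, 14.0],
}, index=index)
bars_1min = resample_bars(bars_1s, "1min")
```

In the first minute the simple mean of the two VWAP values would be 11.0, while the volume-weighted result is 11.5, because the second bar carries three times the volume. Periods with no trades come out with zero volume and a NaN VWAP in this sketch.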
- list_keys() → List[str]
List all available klines keys in the HDF5 file.
- Returns:
List of klines keys.
- read(start_time: str | Timestamp | datetime | None = None, end_time: str | Timestamp | datetime | None = None, timeframe: str | None = None) → DataFrame
Read time bars from the H5 file, optionally filtering by time range and resampling.
- Parameters:
start_time – Start time for filtering (inclusive, optional)
end_time – End time for filtering (inclusive, optional)
timeframe – Timeframe for resampling (e.g., ‘5min’, ‘1h’, ‘1d’, None for original 1s bars)
- Returns:
DataFrame with the requested time bars
Examples
from finmlkit.bar.io import TimeBarReader

# Get all bars for a specific day (start and end dates are both inclusive)
reader = TimeBarReader('data.h5')
df_1s = reader.read('2023-01-01', '2023-01-01')  # Full day of Jan 1st

# Get all bars for two full months (Feb 1 through Mar 31 inclusive)
df_feb_mar = reader.read('2022-02-01', '2022-03-31')

# Get 5-minute bars for a date range
df_5min = reader.read('2023-01-01', '2023-01-31', timeframe='5min')

# Get hourly bars for a specific month
df_1h = reader.read('2023-01-01', '2023-01-31', timeframe='1h')

# Get daily bars for all available data
df_daily = reader.read(timeframe='1D')
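The inclusive end date in these examples (a date-only end_time covering the whole day) resembles pandas partial-string slicing on a DatetimeIndex. Whether read() relies on that mechanism internally is not stated here, so the following is only an analogy built on synthetic data, with hypothetical variable names.

```python
import pandas as pd

# Three days of synthetic hourly bars (72 rows); the data is made up.
index = pd.date_range("2023-01-01", periods=72, freq="60min")
bars = pd.DataFrame({"close": range(72)}, index=index)

# Date-only strings select the whole day at both ends of the slice.
one_day = bars.loc["2023-01-02":"2023-01-02"]

# Exact Timestamps are matched literally, so only midnight is selected.
midnight_only = bars.loc[pd.Timestamp("2023-01-02"):pd.Timestamp("2023-01-02")]
```

Here one_day holds all 24 bars of Jan 2nd, while midnight_only holds a single row. This is the difference to keep in mind when passing a Timestamp or datetime instead of a date string, although how read() treats those types should be checked against the library itself.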