Processing Raw Trade Data

This tutorial demonstrates how to process raw trade data using FinMLKit’s TradesData class.

Downloading Raw Trade Data

To begin, download raw trade data from Binance:

curl -s "https://data.binance.vision/data/futures/um/monthly/trades/BTCUSDT/BTCUSDT-trades-2025-07.zip" -o "BTCUSDT-trades-2025-07.zip"
curl -s "https://data.binance.vision/data/futures/um/monthly/trades/BTCUSDT/BTCUSDT-trades-2025-07.zip.CHECKSUM" -o "BTCUSDT-trades-2025-07.zip.CHECKSUM"
shasum -a 256 -c "BTCUSDT-trades-2025-07.zip.CHECKSUM"
unzip -o "BTCUSDT-trades-2025-07.zip"

Preprocessing the Data

Use the TradesData class to preprocess the raw data:

import pandas as pd
from finmlkit.bar.data_model import TradesData

df = pd.read_csv("BTCUSDT-trades-2025-07.csv")
trades = TradesData(
    df.time.values, df.price.values, df.qty.values,
    id=df.id.values, is_buyer_maker=df.is_buyer_maker.values,
    preprocess=True
)

Key Features of TradesData

  • Timestamp Conversion: Converts timestamps to nanoseconds.

  • Data Integrity Checks: Identifies missing trades and discontinuities.

  • Trade Merging: Merges fragmented trades with the same timestamp and price.

Inspect the processed data:

print(trades.discontinuities)  # Check for discontinuities
print(trades.data.head())      # View the processed data

Next Steps

Once the data is processed, you can save it for future use. Continue to the next tutorial: Saving and Loading Data.