finmlkit.feature.base module¶

class finmlkit.feature.base.BaseTransform(input_cols: Sequence[str] | str, output_cols: Sequence[str] | str)[source]¶

Bases: ABC

Abstract base class for data transformations in financial machine learning pipelines.

This class provides a standardized interface for implementing feature transformations, technical indicators, and data processing operations on financial time series data. It serves as the foundation for a modular transformation system that enables composable data preprocessing workflows with consistent input/output handling and validation.

The transform framework is designed around the concept of declarative data dependencies, where each transform explicitly specifies its required input columns and produced output columns. This approach enables automatic dependency resolution, pipeline validation, and efficient computation planning for complex feature engineering workflows.

Core Design Principles:

Explicit Dependencies: Each transform declares required input columns (requires) and output columns (produces), enabling automated pipeline construction and validation.
Backend Flexibility: Supports multiple computational backends (pandas "pd" for development/debugging, NumPy "nb" for production performance) with consistent interfaces.
Immutable Operations: Transforms are designed as pure functions that don’t modify input data, promoting reproducibility and thread safety in parallel processing environments.
Composability: Transforms can be chained together to create complex feature engineering pipelines, with automatic handling of intermediate column dependencies.

Transform Lifecycle:

The execution of a transform follows this standardized pattern:

Input Validation: _validate_input() ensures required columns are present and data types are appropriate
Computation: __call__() applies the core transformation logic using the specified backend
Output Formatting: Results are returned as properly named Series or tuples for integration into DataFrames

This lifecycle enables robust error handling and consistent behavior across different transform implementations.

Backend Architecture:

The dual-backend system provides flexibility for different use cases:

Pandas Backend (``”pd”``): Uses pandas operations for readable, debuggable code with excellent error messages and automatic handling of missing data, timestamps, and mixed data types.
Numba Backend (``”nb”``): Leverages Numba for high-performance vectorized operations on numeric data, suitable for production environments with large datasets.

Subclasses typically implement both backends to provide optimal performance characteristics for their specific use case while maintaining consistent results across backends.

Note

Subclasses must implement all abstract methods (__call__, _validate_input, output_name) to provide complete functionality. The base class handles input/output column management and provides the structural framework for consistent transform behavior.

Note

For transforms producing multiple outputs, ensure that the length of produces matches the number of Series returned by __call__. This enables proper column naming in downstream DataFrame construction and pipeline operations.

Parameters:

input_cols (Union[Sequence[str], str]) – Column name(s) required as input for the transformation. Can be a single string for single-column transforms or a sequence for multi-column operations.
output_cols (Union[Sequence[str], str]) – Column name(s) produced by the transformation. Must match the number of outputs returned by the __call__ method.

Raises:

AssertionError – If input_cols or output_cols are not strings or sequences of strings.
NotImplementedError – If abstract methods are not implemented in subclasses.

See also

CoreTransform: Extends this base class to implement specific transformations

__init__(input_cols: Sequence[str] | str, output_cols: Sequence[str] | str)[source]¶

_abc_impl = <_abc._abc_data object>¶

_output_name: str | list[str]¶

abstract _validate_input(x: DataFrame) → bool[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

abstract property output_name: str | list[str]¶: Get the output names of the transform. This is used to determine the output column names in the DataFrame. Used by prepare_output_nb to create the output Series. :return: Output name or list of output names

produces: list[str]¶

requires: list[str]¶

class finmlkit.feature.base.BinaryOpTransform(left: BaseTransform, right: BaseTransform, op_name: str, op_func: Callable)[source]¶

Bases: BaseTransform

Transform that applies binary operations between two transforms

__init__(left: BaseTransform, right: BaseTransform, op_name: str, op_func: Callable)[source]¶

_abc_impl = <_abc._abc_data object>¶

_validate_input(x)[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: str | list[str]¶: Get the output names of the transform. This is used to determine the output column names in the DataFrame. Used by prepare_output_nb to create the output Series. :return: Output name or list of output names

class finmlkit.feature.base.ConstantOpTransform(transform: BaseTransform, constant: float, op_name: str, op_func: Callable)[source]¶

Bases: BaseTransform

Transform that applies operations between a transform and a constant

__init__(transform: BaseTransform, constant: float, op_name: str, op_func: Callable)[source]¶

_abc_impl = <_abc._abc_data object>¶

_validate_input(x)[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: str | list[str]¶: Get the output names of the transform. This is used to determine the output column names in the DataFrame. Used by prepare_output_nb to create the output Series. :return: Output name or list of output names

class finmlkit.feature.base.CoreTransform(input_cols: Sequence[str] | str, output_cols: Sequence[str] | str)[source]¶

Bases: BaseTransform, ABC

Concrete implementation framework for data transformations with dual-backend support and temporal data handling.

This class extends BaseTransform by providing a complete implementation skeleton for data transformations that require both pandas and Numba computational backends. It serves as the primary base class for financial indicators, technical analysis functions, and time-series feature engineering operations that need to handle temporal data with high performance requirements.

CoreTransform Architecture:

The class implements the abstract BaseTransform.__call__() method and introduces a structured approach to backend-specific computation through four new abstract methods that subclasses must implement:

_pd(): Pandas-based implementation for development and mixed data types
_nb(): Numba-based implementation for production performance
_prepare_input_nb(): Data preparation for NumPy backend operations
_prepare_output_nb(): Result formatting for consistent DataFrame integration

This separation enables clean implementation of complex transforms while maintaining performance optimization opportunities through specialized NumPy operations and potential Numba compilation.

Temporal Data Support:

CoreTransform provides specialized utilities for time-series data processing, which is essential for financial machine learning applications:

DateTime Index Validation: Ensures input DataFrames have proper temporal indexing for time-based features
Timestamp Extraction: Converts pandas datetime indexes to nanosecond timestamps for efficient numerical operations
Temporal Consistency: Maintains index alignment between input and output data for proper time-series handling

These features enable transforms to work seamlessly with financial time series while preserving temporal relationships and enabling vectorized operations on timestamp data.

Backend Implementation Pattern:

The dual-backend pattern follows this structure:

def _pd(self, x: pd.DataFrame) -> pd.Series:
    # Pandas implementation - readable, handles edge cases
    return x['price'].rolling(window=self.window).mean()

def _nb(self, x: pd.DataFrame) -> pd.Series:
    # NumPy implementation - optimized for performance
    inputs = self._prepare_input_nb(x)
    result = fast_moving_average_nb(inputs['price'], self.window)
    return self._prepare_output_nb(x.index, result)

This pattern enables subclasses to provide both readable pandas code for development and optimized NumPy/Numba code for production, with automatic backend selection based on performance requirements.

Error Handling and Validation:

The class enhances the validation framework from BaseTransform with temporal-specific checks: - Validates datetime indexes for time-based operations - Ensures sufficient data history for windowed computations - Provides clear error messages for temporal data inconsistencies

Note

Subclasses implementing time-based features should call _check_datetime_index() in their _validate_input() implementation to ensure proper temporal data handling.

Note

The NumPy backend (_nb()) should leverage vectorized operations and consider Numba compilation for transforms that will be applied to large datasets or in real-time processing scenarios.

Parameters:

input_cols (Union[Sequence[str], str]) – Column name(s) required as input for the transformation. Inherited from BaseTransform.
output_cols (Union[Sequence[str], str]) – Column name(s) produced by the transformation. Inherited from BaseTransform.

Raises:

ValueError – If backend is not “pd” or “nb”, or if datetime index validation fails.
TypeError – If input is not a pandas DataFrame for temporal operations.
NotImplementedError – If required abstract methods are not implemented by subclasses.

See also

SISOTransform

__init__(input_cols: Sequence[str] | str, output_cols: Sequence[str] | str)[source]¶

_abc_impl = <_abc._abc_data object>¶

static _check_datetime_index(x: DataFrame) → bool[source]¶

Validate that input DataFrame has a datetime index suitable for time-based operations.

This static method provides a reusable validation check for transforms that require temporal data. It ensures the DataFrame index can support time-based feature calculations and windowed operations that depend on temporal ordering.

Parameters:

x – DataFrame to validate for datetime index.

Returns:

True if validation passes.

Raises:

ValueError – If DataFrame does not have a datetime index.
TypeError – If input is not a pandas DataFrame.

Note

This method should be called in the _validate_input() implementation of subclasses that perform time-based computations.

_get_timestamps(x: DataFrame) → ndarray[tuple[int, ...], dtype[int64]][source]¶

Extract nanosecond timestamps from DataFrame index for numerical operations.

Converts pandas datetime index to NumPy array of int64 nanosecond timestamps, enabling efficient vectorized operations on temporal data while preserving precision for high-frequency financial data.

Parameters:: x – DataFrame with datetime index to extract timestamps from.
Returns:: NumPy array of timestamps as int64 nanoseconds since epoch.
Raises:: ValueError – If DataFrame does not have a datetime index.

Note

Nanosecond precision is maintained to support high-frequency trading data where microsecond or nanosecond timing precision may be relevant for analysis.

abstract _nb(x: DataFrame | Series) → Series | tuple[Series][source]¶

abstract _pd(x: DataFrame | Series) → Series | tuple[Series][source]¶: Transform the input data using pandas. For fast prototyping :param x: DataFrame or Series to transform

abstract _prepare_input_nb(x: DataFrame) → dict[str, ndarray[tuple[int, ...], dtype[_ScalarType_co]]] | ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]¶

Prepare array inputs for numba functions.

Parameters:: x – DataFrame or Series to transform
Returns:: Dict of input data for DataFrame or array for Series

abstract _prepare_output_nb(idx: Index, y: ndarray[tuple[int, ...], dtype[_ScalarType_co]] | tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]) → Series | tuple[Series, ...][source]¶: Prepare the output data for numba functions. :param idx: index of the original DataFrame :param y: Output data from the transform :return: Series or tuple of Series with the same index as the input data

class finmlkit.feature.base.MIMOTransform(input_cols: Sequence[str], output_cols: Sequence[str])[source]¶

Bases: CoreTransform, ABC

Specialized transform for multiple-input, multiple-output operations in advanced financial feature engineering.

This class extends CoreTransform to provide a comprehensive interface for transforms that require multiple input columns and produce multiple output columns. This pattern represents the most general and powerful transformation capability in quantitative finance, enabling complex multi-dimensional feature engineering, cross-sectional analysis, and sophisticated indicator systems that capture relationships across multiple time series and produce coordinated output features.

MIMO Transform Applications:

The Multiple-Input, Multiple-Output pattern enables the most sophisticated analytical operations:

Portfolio Analytics: Computing multiple risk metrics (VaR, CVaR, Sharpe ratio) from price and volume series
Cross-Asset Analysis: Generating correlation matrices, beta coefficients, and cointegration vectors from multiple asset prices
Factor Models: Computing factor loadings, residuals, and explained variance from multiple input series
Advanced Technical Analysis: Multi-timeframe indicators, regime detection systems, and composite scoring models
Risk Decomposition: Breaking down portfolio risk into systematic and idiosyncratic components across multiple factors
Statistical Arbitrage: Computing spread statistics, mean reversion signals, and hedge ratios from pairs or baskets of assets

Mathematical Framework:

For MIMO transforms operating on input columns \(X_1, X_2, \ldots, X_n\), the transformation produces multiple outputs \(Y_1, Y_2, \ldots, Y_m\) through a system of related functions:

\[\begin{split}\begin{bmatrix} Y_{1,t} \\ Y_{2,t} \\ \vdots \\ Y_{m,t} \end{bmatrix} = \begin{bmatrix} f_1(X_{1,t}, \ldots, X_{n,t}, \theta_1) \\ f_2(X_{1,t}, \ldots, X_{n,t}, \theta_2) \\ \vdots \\ f_m(X_{1,t}, \ldots, X_{n,t}, \theta_m) \end{bmatrix}\end{split}\]

where \(\theta_i\) represents function-specific parameters. The functions often share computational dependencies, enabling efficient batch processing and Numba compilation.

Output Naming Strategy:

Unlike the previous transform types, MIMO transforms use output names directly as specified in the output_cols parameter. This approach prevents unwieldy concatenated names when dealing with multiple inputs and outputs, and allows for semantic naming that clearly describes the transform’s purpose.

The naming philosophy follows: descriptive and domain-specific names that clearly indicate the analytical purpose rather than mechanical combinations of input column names.

Parameters:

input_cols (Sequence[str]) – Names of input columns required for the transformation. Order may be significant for certain mathematical operations.
output_cols (Sequence[str]) – Names of output columns produced by the transformation. These names are used directly without modification or combination.

Raises:

TypeError – If input is not a pandas DataFrame during validation.
ValueError – If any required input columns are missing from the DataFrame.
ValueError – If the number of output arrays doesn’t match the expected number of outputs.
NotImplementedError – If abstract methods from CoreTransform are not implemented.

Examples

Implementing a portfolio risk decomposition transform:

class PortfolioRiskTransform(MIMOTransform):
    def __init__(self, asset_cols: list[str], weights: np.ndarray):
        output_names = ['portfolio_return', 'total_risk', 'systematic_risk', 'idiosyncratic_risk']
        super().__init__(asset_cols, output_names)
        self.weights = weights
        self.n_assets = len(asset_cols)

    def _pd(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        import numpy as np

        # Extract asset returns
        returns = x[self.requires].values

        # Portfolio calculations
        portfolio_returns = returns @ self.weights

        # Risk calculations (simplified)
        rolling_window = 30
        total_risk = pd.Series(portfolio_returns).rolling(rolling_window).std()

        # Placeholder for systematic/idiosyncratic decomposition
        systematic_risk = total_risk * 0.7  # Simplified
        idiosyncratic_risk = total_risk * 0.3

        return (
            pd.Series(portfolio_returns, index=x.index, name=self.output_name[0]),
            total_risk.rename(self.output_name[1]),
            systematic_risk.rename(self.output_name[2]),
            idiosyncratic_risk.rename(self.output_name[3])
        )

    def _nb(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        import numba as nb
        import numpy as np

        inputs = self._prepare_input_nb(x)
        returns_matrix = np.column_stack([inputs[col] for col in self.requires])

        @nb.jit(nopython=True)
        def compute_portfolio_risk(returns, weights, window=30):
            n_periods = returns.shape[0]
            portfolio_rets = returns @ weights

            total_risk = np.full(n_periods, np.nan)
            systematic_risk = np.full(n_periods, np.nan)
            idiosyncratic_risk = np.full(n_periods, np.nan)

            for i in range(window-1, n_periods):
                window_rets = portfolio_rets[i-window+1:i+1]
                risk_val = np.std(window_rets)
                total_risk[i] = risk_val
                systematic_risk[i] = risk_val * 0.7
                idiosyncratic_risk[i] = risk_val * 0.3

            return portfolio_rets, total_risk, systematic_risk, idiosyncratic_risk

        results = compute_portfolio_risk(returns_matrix, self.weights)
        return self._prepare_output_nb(x.index, results)

Implementing a multi-asset correlation and cointegration system:

class CrossAssetAnalysisTransform(MIMOTransform):
    def __init__(self, asset_cols: list[str], window: int = 60):
        output_names = ['correlation_12', 'cointegration_stat', 'hedge_ratio', 'spread']
        super().__init__(asset_cols[:2], output_names)  # Focus on first two assets
        self.window = window

    def _pd(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        asset1, asset2 = self.requires
        s1, s2 = x[asset1], x[asset2]

        # Rolling correlation
        correlation = s1.rolling(self.window).corr(s2)

        # Simple cointegration test (placeholder)
        spread = s1 - s2
        cointegration_stat = spread.rolling(self.window).apply(
            lambda x: abs(x.mean() / x.std()) if x.std() > 0 else 0
        )

        # Hedge ratio from rolling regression
        hedge_ratio = s1.rolling(self.window).cov(s2) / s2.rolling(self.window).var()

        return (
            correlation.rename(self.output_name[0]),
            cointegration_stat.rename(self.output_name[1]),
            hedge_ratio.rename(self.output_name[2]),
            spread.rename(self.output_name[3])
        )

    def _nb(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        import numba as nb
        import numpy as np

        inputs = self._prepare_input_nb(x)
        asset1_prices = inputs[self.requires[0]]
        asset2_prices = inputs[self.requires[1]]

        @nb.jit(nopython=True)
        def compute_cross_asset_metrics(p1, p2, window):
            n = len(p1)
            correlation = np.full(n, np.nan)
            cointegration = np.full(n, np.nan)
            hedge_ratio = np.full(n, np.nan)
            spread = p1 - p2

            for i in range(window-1, n):
                w1 = p1[i-window+1:i+1]
                w2 = p2[i-window+1:i+1]
                w_spread = spread[i-window+1:i+1]

                # Correlation
                correlation[i] = np.corrcoef(w1, w2)[0, 1]

                # Cointegration statistic
                spread_mean = np.mean(w_spread)
                spread_std = np.std(w_spread)
                cointegration[i] = abs(spread_mean / spread_std) if spread_std > 0 else 0

                # Hedge ratio
                cov_12 = np.cov(w1, w2)[0, 1]
                var_2 = np.var(w2)
                hedge_ratio[i] = cov_12 / var_2 if var_2 > 0 else 0

            return correlation, cointegration, hedge_ratio, spread

        results = compute_cross_asset_metrics(asset1_prices, asset2_prices, self.window)
        return self._prepare_output_nb(x.index, results)

Using MIMO transforms in complex feature pipelines:

>>> 
>>> import pandas as pd
>>> import numpy as np
>>> # Sample multi-asset data
>>> dates = pd.date_range('2023-01-01', periods=100, freq='D')
>>> np.random.seed(42)
>>> data = pd.DataFrame({
...     'stock_a': 100 + np.random.randn(100).cumsum(),
...     'stock_b': 100 + np.random.randn(100).cumsum(),
...     'stock_c': 100 + np.random.randn(100).cumsum()
... }, index=dates)
>>>
>>> # Create portfolio analysis transform
>>> weights = np.array([0.4, 0.4, 0.2])
>>> portfolio_transform = PortfolioRiskTransform(['stock_a', 'stock_b', 'stock_c'], weights)  
>>> print(f"Input columns: {portfolio_transform.requires}")  
Input columns: ['stock_a', 'stock_b', 'stock_c']
>>> print(f"Output names: {portfolio_transform.output_name}")  
Output names: ['portfolio_return', 'total_risk', 'systematic_risk', 'idiosyncratic_risk']
>>>
>>> # Apply transform
>>> portfolio_results = portfolio_transform(data, backend='pd')  
>>> print(f"Generated {len(portfolio_results)} output series")  
Generated 4 output series

See also

CoreTransform: General transform base class providing the foundational framework.
SISOTransform: Transform base class for single-input, single-output operations.
MISOTransform: Transform base class for multiple-input, single-output operations.
SIMOTransform: Transform base class for single-input, multiple-output operations.
FactorModel: Specialized MIMO transform for factor analysis and decomposition.

References

__init__(input_cols: Sequence[str], output_cols: Sequence[str])[source]¶

_abc_impl = <_abc._abc_data object>¶

_prepare_input_nb(x: DataFrame) → dict[str, ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]¶: Prepare the input data for numba functions. :param x: DataFrame to transform :return: Dict of input data for each column

_prepare_output_nb(idx: Index, y: tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]]]) → tuple[Series, ...][source]¶: Prepare the output data for numba functions. :param idx: index of the original DataFrame :param y: Output data from the transform :return: Tuple of Series with the same index as the input data

_validate_input(x: DataFrame) → bool[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: list[str]¶: Get the output names of the transform. :return: List of output names

class finmlkit.feature.base.MISOTransform(input_cols: Sequence[str], output_col: str)[source]¶

Bases: CoreTransform, ABC

Specialized transform for multiple-input, single-output operations in financial feature engineering.

This class extends CoreTransform to provide an optimized interface for transforms that require multiple input columns but produce exactly one output column. This pattern is fundamental in quantitative finance for creating composite features, ratios, statistical relationships, and cross-sectional indicators that combine information from multiple data sources or time series.

MISO Transform Applications:

The Multiple-Input, Single-Output pattern captures essential relationships in financial

Price Ratios and Spreads: Computing price ratios between assets, bid-ask spreads, or relative strength metrics
Cross-Asset Correlations: Rolling correlations, beta calculations, or cointegration measures between multiple series
Volume-Price Relationships: VWAP calculations, volume-weighted returns, or price-volume divergence indicators
Multi-Timeframe Indicators: Combining fast and slow moving averages, momentum crossovers, or trend convergence measures
Statistical Composites: Principal component features, factor loadings, or custom composite scores

Mathematical Framework:

For MISO transforms operating on input columns \(X_1, X_2, \ldots, X_n\), the transformation produces a single output \(Y\) through a function \(f\):

\[Y_t = f(X_{1,t}, X_{2,t}, \ldots, X_{n,t}, \theta)\]

where \(\theta\) represents transform-specific parameters (e.g., window sizes, weights, thresholds).

Common examples include:

Price Ratio: \(Y_t = \frac{P_{1,t}}{P_{2,t}}\)
Spread: \(Y_t = P_{1,t} - P_{2,t}\)
Correlation: \(Y_t = \text{Corr}(X_{1,t-w:t}, X_{2,t-w:t})\)
VWAP: \(Y_t = \frac{\sum_{i=t-w}^{t} P_i \cdot V_i}{\sum_{i=t-w}^{t} V_i}\)

Performance Optimization:

MISO transforms benefit significantly from Numba compilation due to their multi-column computational requirements:

Vectorized Operations: Multiple input arrays can be processed simultaneously with optimized loops
Memory Efficiency: Dictionary-based input preparation minimizes data copying and memory allocation
JIT Compilation: Complex mathematical operations across multiple series compile to efficient machine code
Parallel Processing: Independent calculations across time periods can leverage parallel execution

Input Management:

The class provides structured input handling through:

Column Validation: Ensures all required input columns are present before computation
Type Consistency: Maintains data type integrity across multiple input series
Missing Data Handling: Provides framework for consistent NaN propagation across inputs
Index Alignment: Preserves temporal relationships when combining multiple time series

Note

Unlike SISOTransform, MISO transforms use the output column name directly rather than combining input and output names. This prevents unwieldy names when multiple inputs are involved (e.g., prefer 'price_ratio' over 'high_low_close_price_ratio').

Note

When implementing Numba-compiled transforms (_nb method), ensure all input arrays have compatible dtypes to avoid compilation issues. Consider explicit type conversion in _prepare_input_nb() for numerical stability across different data sources.

Parameters:

input_cols (Sequence[str]) – Names of input columns required for the transformation. Order matters for transforms where column sequence affects computation.
output_col (str) – Name of the single output column produced by the transformation.

Raises:

TypeError – If input is not a pandas DataFrame during validation.
ValueError – If any required input columns are missing from the DataFrame.
NotImplementedError – If abstract methods from CoreTransform are not implemented.

Examples

Implementing a simple price ratio transform:

class PriceRatioTransform(MISOTransform):
    def __init__(self, numerator_col: str, denominator_col: str, output_name: str = None):
        if output_name is None:
            output_name = f'{numerator_col}_{denominator_col}_ratio'
        super().__init__([numerator_col, denominator_col], output_name)

    def _pd(self, x: pd.DataFrame) -> pd.Series:
        num_col, den_col = self.requires
        ratio = x[num_col] / x[den_col]
        return ratio.rename(self.output_name)

    def _nb(self, x: pd.DataFrame) -> pd.Series:
        import numba as nb

        inputs = self._prepare_input_nb(x)
        numerator = inputs[self.requires[0]]
        denominator = inputs[self.requires[1]]

        @nb.jit(nopython=True)
        def compute_ratio(num_arr, den_arr):
            return num_arr / den_arr

        result = compute_ratio(numerator, denominator)
        return self._prepare_output_nb(x.index, result)

Implementing a rolling correlation transform:

class RollingCorrelationTransform(MISOTransform):
    def __init__(self, col1: str, col2: str, window: int):
        super().__init__([col1, col2], f'corr_{window}d')
        self.window = window

    def _pd(self, x: pd.DataFrame) -> pd.Series:
        col1, col2 = self.requires
        corr = x[col1].rolling(self.window).corr(x[col2])
        return corr.rename(self.output_name)

    def _nb(self, x: pd.DataFrame) -> pd.Series:
        import numba as nb
        import numpy as np

        inputs = self._prepare_input_nb(x)
        arr1 = inputs[self.requires[0]].astype(np.float64)
        arr2 = inputs[self.requires[1]].astype(np.float64)

        @nb.jit(nopython=True)
        def rolling_correlation(x1, x2, window):
            n = len(x1)
            result = np.full(n, np.nan)
            for i in range(window-1, n):
                start_idx = i - window + 1
                sub1 = x1[start_idx:i+1]
                sub2 = x2[start_idx:i+1]
                result[i] = np.corrcoef(sub1, sub2)[0, 1]
            return result

        corr_values = rolling_correlation(arr1, arr2, self.window)
        return self._prepare_output_nb(x.index, corr_values)

Using MISO transforms in practice:

>>> 
>>> import pandas as pd
>>> import numpy as np
>>> # Sample data with multiple price series
>>> dates = pd.date_range('2023-01-01', periods=20, freq='D')
>>> data = pd.DataFrame({
...     'stock_a': np.random.randn(20).cumsum() + 100,
...     'stock_b': np.random.randn(20).cumsum() + 100,
...     'volume_a': np.random.randint(1000, 5000, 20),
...     'volume_b': np.random.randint(1000, 5000, 20)
... }, index=dates)
>>>
>>> # Create price ratio transform
>>> ratio_transform = PriceRatioTransform('stock_a', 'stock_b', 'a_b_ratio')  
>>> print(f"Input columns: {ratio_transform.requires}")  
Input columns: ['stock_a', 'stock_b']
>>> print(f"Output name: {ratio_transform.output_name}")  
Output name: a_b_ratio
>>>
>>> # Apply transform
>>> ratio_series = ratio_transform(data, backend='pd')  
>>> print(f"Ratio range: {ratio_series.min():.3f} - {ratio_series.max():.3f}")  
Ratio range: 0.943 - 1.089

See also

CoreTransform: General transform base class for multi-input/output operations.
SISOTransform: Transform base class for single-input, single-output operations.
MIMOTransform: Transform base class for multiple-input, multiple-output operations.
CrossSectionalTransform: Specialized MISO for cross-asset relationship analysis.

References

__init__(input_cols: Sequence[str], output_col: str)[source]¶

_abc_impl = <_abc._abc_data object>¶

_prepare_input_nb(x: DataFrame) → dict[str, ndarray[tuple[int, ...], dtype[_ScalarType_co]]][source]¶: Prepare the input data for numba functions. :param x: DataFrame to transform :return: Dict of input data for each column

_prepare_output_nb(idx: Index, y: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Series[source]¶: Prepare the output data for numba functions. :param idx: index of the original DataFrame :param y: Output data from the transform :return: Series with the same index as the input data

_validate_input(x: DataFrame) → bool[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: str¶

For MISO transforms, the output name is the same as the produces.

Returns:: Output name

class finmlkit.feature.base.MinMaxOpTransform(left: BaseTransform, right: BaseTransform, op_name: str, op_func: Callable)[source]¶

Bases: BaseTransform

Transform that applies min or max operations between two transforms

__init__(left: BaseTransform, right: BaseTransform, op_name: str, op_func: Callable)[source]¶

_abc_impl = <_abc._abc_data object>¶

_validate_input(x)[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: str | list[str]¶: Get the output names of the transform. This is used to determine the output column names in the DataFrame. Used by prepare_output_nb to create the output Series. :return: Output name or list of output names

class finmlkit.feature.base.SIMOTransform(input_col: str, output_cols: Sequence[str])[source]¶

Bases: CoreTransform, ABC

Specialized transform for single-input, multiple-output operations in financial feature engineering.

This class extends CoreTransform to provide an optimized interface for transforms that operate on exactly one input column but produce multiple related output columns. This pattern is essential in quantitative finance for decomposing complex indicators, generating feature sets from single time series, and creating comprehensive technical analysis outputs from individual price or volume streams.

SIMO Transform Applications:

The Single-Input, Multiple-Output pattern captures sophisticated analytical relationships:

Technical Indicator Decomposition: Bollinger Bands (upper, middle, lower), MACD components (line, signal, histogram)
Statistical Decomposition: Rolling statistics (mean, std, skew, kurtosis) from single price series
Time Series Analysis: Trend, seasonal, and residual components from decomposition algorithms
Risk Metrics: Multiple percentiles (VaR at different confidence levels) from returns distributions
Momentum Indicators: RSI with associated momentum, rate of change, and divergence signals
Volatility Measures: Different volatility estimators (Parkinson, Garman-Klass, Rogers-Satchell) from OHLC data

Mathematical Framework:

For SIMO transforms operating on input column \(X\), the transformation produces multiple outputs \(Y_1, Y_2, \ldots, Y_m\) through related functions \(f_1, f_2, \ldots, f_m\):

\[Y_{1,t} = f_1(X_t, \theta_1), \quad Y_{2,t} = f_2(X_t, \theta_2), \quad \ldots, \quad Y_{m,t} = f_m(X_t, \theta_m)\]

where \(\theta_i\) represents function-specific parameters. Often, these functions are mathematically related or derived from common intermediate calculations.

Common SIMO Examples:

Bollinger Bands: .. math:

\text{Middle} = \text{SMA}(X, n), \quad \text{Upper} = \text{Middle} + k \cdot \text{Std}(X, n), \quad \text{Lower} = \text{Middle} - k \cdot \text{Std}(X, n)

MACD Components: .. math:

\text{MACD} = \text{EMA}(X, 12) - \text{EMA}(X, 26), \quad \text{Signal} = \text{EMA}(\text{MACD}, 9), \quad \text{Histogram} = \text{MACD} - \text{Signal}

Rolling Statistics: .. math:

\mu_t = \text{Mean}(X_{t-w:t}), \quad \sigma_t = \text{Std}(X_{t-w:t}), \quad S_t = \text{Skew}(X_{t-w:t})

Performance Optimization:

SIMO transforms are particularly well-suited for Numba compilation because:

Shared Computation: Multiple outputs often share intermediate calculations, reducing redundant operations
Vectorized Processing: Single input array can be processed once to generate multiple output arrays
Memory Efficiency: Intermediate results can be reused across output calculations
Batch Operations: All related outputs computed in single pass through input data

Naming Convention:

Following the established pattern from SISOTransform, output columns combine the input column name with each transform-specific suffix:

\[\text{output\_names} = [\text{input\_col} + \text{"\_"} + \text{output\_col}_i \text{ for } i \text{ in produces}]\]

For example, Bollinger Bands on 'close' prices with outputs ['bb_upper', 'bb_middle', 'bb_lower'] produces ['close_bb_upper', 'close_bb_middle', 'close_bb_lower'].

Note

SIMO transforms excel when multiple related features are derived from the same input, sharing computational overhead. For unrelated outputs from the same input, consider separate SISO transforms for better modularity and debugging capabilities.

Note

When implementing Numba-compiled transforms, ensure all output arrays have consistent lengths and appropriate dtypes. The _prepare_output_nb() method validates output count automatically.

Parameters:

input_col (str) – Name of the single input column to transform.
output_cols (Sequence[str]) – Sequence of output column suffixes that will be combined with the input column name to create full output column names.

Raises:

TypeError – If input is not a pandas DataFrame during validation.
ValueError – If the specified input column is not present in the DataFrame.
ValueError – If the number of output arrays doesn’t match the expected number of outputs.
NotImplementedError – If abstract methods from CoreTransform are not implemented.

Examples

Implementing Bollinger Bands as a SIMO transform:

class BollingerBandsTransform(SIMOTransform):
    def __init__(self, window: int = 20, std_dev: float = 2.0, input_col: str = 'close'):
        super().__init__(input_col, ['bb_upper', 'bb_middle', 'bb_lower'])
        self.window = window
        self.std_dev = std_dev

    def _pd(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        price_series = x[self.requires[0]]

        # Shared calculations
        rolling_mean = price_series.rolling(self.window).mean()
        rolling_std = price_series.rolling(self.window).std()

        # Multiple outputs
        bb_upper = rolling_mean + (self.std_dev * rolling_std)
        bb_middle = rolling_mean
        bb_lower = rolling_mean - (self.std_dev * rolling_std)

        # Rename outputs according to SIMO convention
        return (
            bb_upper.rename(self.output_name[0]),
            bb_middle.rename(self.output_name[1]),
            bb_lower.rename(self.output_name[2])
        )

    def _nb(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        import numba as nb
        import numpy as np

        prices = self._prepare_input_nb(x)

        @nb.jit(nopython=True)
        def compute_bollinger_bands(prices, window, std_dev):
            n = len(prices)
            upper = np.full(n, np.nan)
            middle = np.full(n, np.nan)
            lower = np.full(n, np.nan)

            for i in range(window-1, n):
                window_data = prices[i-window+1:i+1]
                mean_val = np.mean(window_data)
                std_val = np.std(window_data)

                middle[i] = mean_val
                upper[i] = mean_val + std_dev * std_val
                lower[i] = mean_val - std_dev * std_val

            return upper, middle, lower

        results = compute_bollinger_bands(prices, self.window, self.std_dev)
        return self._prepare_output_nb(x.index, results)

Implementing rolling statistics as a SIMO transform:

class RollingStatsTransform(SIMOTransform):
    def __init__(self, window: int, input_col: str = 'returns'):
        super().__init__(input_col, ['mean', 'std', 'skew', 'kurt'])
        self.window = window

    def _pd(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        series = x[self.requires[0]]
        rolling = series.rolling(self.window)

        mean_vals = rolling.mean()
        std_vals = rolling.std()
        skew_vals = rolling.skew()
        kurt_vals = rolling.kurt()

        return (
            mean_vals.rename(self.output_name[0]),
            std_vals.rename(self.output_name[1]),
            skew_vals.rename(self.output_name[2]),
            kurt_vals.rename(self.output_name[3])
        )

    def _nb(self, x: pd.DataFrame) -> tuple[pd.Series, ...]:
        import numba as nb
        import numpy as np
        from scipy import stats

        data = self._prepare_input_nb(x)

        @nb.jit(nopython=True)
        def compute_rolling_stats(data, window):
            n = len(data)
            means = np.full(n, np.nan)
            stds = np.full(n, np.nan)
            skews = np.full(n, np.nan)
            kurts = np.full(n, np.nan)

            for i in range(window-1, n):
                window_data = data[i-window+1:i+1]
                means[i] = np.mean(window_data)
                stds[i] = np.std(window_data)
                # Note: Numba-compatible skew/kurtosis implementations needed

            return means, stds, skews, kurts

        results = compute_rolling_stats(data, self.window)
        return self._prepare_output_nb(x.index, results)

Using SIMO transforms in feature pipelines:

>>> 
>>> import pandas as pd
>>> import numpy as np
>>> # Sample price data
>>> dates = pd.date_range('2023-01-01', periods=50, freq='D')
>>> data = pd.DataFrame({
...     'close': 100 + np.random.randn(50).cumsum()
... }, index=dates)
>>>
>>> # Create Bollinger Bands transform
>>> bb_transform = BollingerBandsTransform(window=20, input_col='close')  
>>> print(f"Input column: {bb_transform.requires[0]}")  
Input column: close
>>> print(f"Output names: {bb_transform.output_name}")  
Output names: ['close_bb_upper', 'close_bb_middle', 'close_bb_lower']
>>>
>>> # Apply transform
>>> bb_results = bb_transform(data, backend='pd')  
>>> print(f"Generated {len(bb_results)} output series")  
Generated 3 output series
>>>
>>> # Integrate into DataFrame
>>> enhanced_data = data.copy()  
>>> for i, series in enumerate(bb_results):  
...     enhanced_data[bb_transform.output_name[i]] = series  

See also

CoreTransform: General transform base class for multi-input/output operations.
SISOTransform: Transform base class for single-input, single-output operations.
MISOTransform: Transform base class for multiple-input, single-output operations.
TechnicalIndicator: Specialized base class for comprehensive technical analysis indicators.

References

__init__(input_col: str, output_cols: Sequence[str])[source]¶

_abc_impl = <_abc._abc_data object>¶

_prepare_input_nb(x: DataFrame) → ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]¶: Prepare the input data for numba functions. :param x: DataFrame to transform :return: Numpy array of the input column

_prepare_output_nb(idx: Index, y: tuple[ndarray[tuple[int, ...], dtype[_ScalarType_co]], ...]) → tuple[Series, ...][source]¶: Prepare the output data for numba functions. :param idx: index of the original DataFrame :param y: Output data from the transform :return: Tuple of Series with the same index as the input data

_validate_input(x: DataFrame) → bool[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: list[str]¶: Get the output names of the transform. For SIMO transforms, the output names are derived from the input column name. :return: List of output names

class finmlkit.feature.base.SISOTransform(input_col: str, output_col: str)[source]¶

Bases: CoreTransform, ABC

Specialized transform for single-input, single-output operations on financial time series data.

This class extends CoreTransform to provide a streamlined interface for transforms that operate on exactly one input column and produce exactly one output column. It implements the most common pattern in financial feature engineering, where individual price series, volumes, or derived metrics are transformed into new features through mathematical operations, statistical calculations, or technical indicators.

SISO Transform Pattern:

The Single-Input, Single-Output pattern is fundamental in quantitative finance for creating derived features:

Price Transformations: Converting raw prices to returns, log-returns, or normalized values
Statistical Features: Computing rolling statistics like moving averages, volatility, or z-scores
Technical Indicators: Calculating RSI, MACD, Bollinger Bands, or momentum indicators
Mathematical Operations: Applying log transforms, power functions, or custom mathematical mappings

This specialization provides several advantages over the general CoreTransform:

Simplified Interface: Single string parameters instead of sequences for input/output specification
Automatic Naming: Output columns follow a standardized {input_col}_{output_col} naming convention
Type Safety: Guarantees single Series input/output for cleaner implementation
Performance Optimization: Streamlined data preparation methods optimized for single-column operations

Naming Convention:

The class implements a consistent naming scheme where output columns combine the input column name with the transform-specific suffix:

\[\text{output\_name} = \text{input\_col} + \text{"\_"} + \text{output\_col}\]

For example, transforming the 'close' price with a 'sma_20' transform produces 'close_sma_20'. This convention enables clear traceability of feature derivation and prevents naming conflicts in complex feature engineering pipelines.

Implementation Framework:

Subclasses need only implement the abstract methods from CoreTransform:

_pd(): Pandas-based computation for development and debugging
_nb(): NumPy/Numba-based computation for production performance

The class provides concrete implementations for input/output preparation and validation, significantly reducing boilerplate code for single-column transforms.

Note

The standardized naming convention assumes that transform names (output_col) are descriptive and unique within a feature set. Consider using prefixes or suffixes that clearly identify the transform type and parameters (e.g., 'sma_20', 'rsi_14', 'vol_30d').

Note

For transforms requiring multiple input columns (e.g., price and volume for VWAP), use the more general CoreTransform base class instead. SISO transforms are optimized specifically for single-column operations.

Parameters:

input_col (str) – Name of the input column to transform (e.g., ‘close’, ‘volume’, ‘high’).
output_col (str) – Suffix for the output column name. Combined with input_col to create the full output column name following the pattern {input_col}_{output_col}.

Raises:

TypeError – If input is not a pandas DataFrame during validation.
ValueError – If the specified input column is not present in the DataFrame.
NotImplementedError – If abstract methods from CoreTransform are not implemented.

Examples

Implementing a simple moving average transform:

class SimpleMovingAverageTransform(SISOTransform):
    def __init__(self, window: int, input_col: str = 'close'):
        super().__init__(input_col, f'sma_{window}')
        self.window = window

    def _pd(self, x: pd.DataFrame) -> pd.Series:
        outp = x[self.requires[0]].rolling(window=self.window).mean()
        return outp.rename(self.output_name)

    def _nb(self, x: pd.DataFrame) -> pd.Series:
        import numpy as np
        from scipy import ndimage

        data = self._prepare_input_nb(x)
        # Use uniform filter for moving average
        result = ndimage.uniform_filter1d(data.astype(float),
                                         size=self.window,
                                         mode='constant',
                                         cval=np.nan)
        return self._prepare_output_nb(x.index, result)

Using SISO transforms in a feature pipeline:

>>> 
>>> import pandas as pd
>>> import numpy as np
>>> # Sample price data
>>> dates = pd.date_range('2023-01-01', periods=10, freq='D')
>>> data = pd.DataFrame({
...     'close': [100, 102, 101, 103, 105, 104, 106, 108, 107, 109]
... }, index=dates)
>>>
>>> # Create transform
>>> sma_transform = SimpleMovingAverageTransform(window=3)  
>>> print(f"Input column: {sma_transform.requires[0]}")  
Input column: close
>>> print(f"Output name: {sma_transform.output_name}")  
Output name: close_sma_3
>>>
>>> # Apply transform
>>> sma_values = sma_transform(data, backend='pd')  
>>> print(f"First valid SMA: {sma_values.dropna().iloc[0]:.2f}")  
First valid SMA: 101.00

Chaining multiple SISO transforms:

# Create multiple transforms
sma_5 = SimpleMovingAverageTransform(5, 'close')     # close_sma_5
sma_20 = SimpleMovingAverageTransform(20, 'close')   # close_sma_20

# Apply to same data
data_with_sma = data.copy()
data_with_sma['close_sma_5'] = sma_5(data, backend='pd')
data_with_sma['close_sma_20'] = sma_20(data, backend='pd')

See also

CoreTransform: General transform base class for multi-input/output operations. MISOTransform: Transform base class for multiple-input, single-output operations.

__init__(input_col: str, output_col: str)[source]¶

_abc_impl = <_abc._abc_data object>¶

_prepare_input_nb(x: DataFrame) → ndarray[tuple[int, ...], dtype[_ScalarType_co]][source]¶: Prepare the input data for numba functions. :param x: DataFrame to transform :return: Numpy array of the input column

_prepare_output_nb(idx: Index, y: ndarray[tuple[int, ...], dtype[_ScalarType_co]]) → Series[source]¶: Prepare the output data for numba functions. :param idx: index of the original DataFrame :param y: Output data from the transform :return: Series with the same index as the input data

_validate_input(x: DataFrame) → bool[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: str¶: Get the output name of the transform. This is used to determine the output column name in the DataFrame. :return: Output name

class finmlkit.feature.base.UnaryOpTransform(transform: BaseTransform, op_name: str, op_func: Callable)[source]¶

Bases: BaseTransform

Transform that applies unary operations to a transform

__init__(transform: BaseTransform, op_name: str, op_func: Callable)[source]¶

_abc_impl = <_abc._abc_data object>¶

_validate_input(x)[source]¶

Check if the input columns are present in the input DataFrame. This method is called before applying the transform.

Parameters:: x – DataFrame to validate
Returns:: True if the input is valid

property output_name: str | list[str]¶: Get the output names of the transform. This is used to determine the output column names in the DataFrame. Used by prepare_output_nb to create the output Series. :return: Output name or list of output names