finmlkit.feature.kit module

class finmlkit.feature.kit.Compose(*transforms: SISOTransform | MISOTransform)[source]

Bases: BaseTransform

Composite transform that chains multiple single-output transforms into a sequential processing pipeline.

This class implements the Composite pattern for data transformations: it chains SISOTransform and MISOTransform instances into a sequential feature engineering pipeline and exposes the same interface as an individual transform, so a composed pipeline can be used wherever a single transform is accepted.

Pipeline Composition Framework:

The Compose class creates a linear processing pipeline where each transform’s output becomes the input to the next transform in the sequence. For a composition of transforms \(T_1, T_2, \ldots, T_n\), the overall transformation is:

\[Y = T_n(T_{n-1}(\ldots T_2(T_1(X)) \ldots))\]

where \(X\) is the input DataFrame and \(Y\) is the final output Series. This composition enables building sophisticated feature engineering workflows from simple, reusable transform components.
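The composition formula can be illustrated with plain Python functions. This is only a sketch of the idea: `compose`, `demean`, `double`, and `absolute` are hypothetical stand-ins operating on lists, not finmlkit transforms, and the real class works on DataFrames and Series.

```python
from functools import reduce
from typing import Callable

Step = Callable[[list[float]], list[float]]


def compose(*steps: Step) -> Step:
    """Return a function computing T_n(...T_2(T_1(x))...), applying steps left to right."""
    def pipeline(x: list[float]) -> list[float]:
        return reduce(lambda acc, step: step(acc), steps, x)
    return pipeline


# Hypothetical stand-ins for T_1, T_2, T_3
def demean(xs: list[float]) -> list[float]:
    mean = sum(xs) / len(xs)
    return [v - mean for v in xs]


def double(xs: list[float]) -> list[float]:
    return [2 * v for v in xs]


def absolute(xs: list[float]) -> list[float]:
    return [abs(v) for v in xs]


y = compose(demean, double, absolute)([1.0, 2.0, 3.0])
print(y)  # [2.0, 0.0, 2.0]
```

The order of arguments is the order of execution, which mirrors how the transforms passed to Compose are applied.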

Key Design Features:

  • Sequential Processing: Transforms are applied in the order specified during initialization

  • Type Safety: Only accepts SISO and MISO transforms that produce single outputs compatible with subsequent inputs

  • Automatic Naming: Generates descriptive output names by concatenating all transform identifiers

  • Input Validation: Validates the initial input and ensures compatibility throughout the pipeline

  • Backend Consistency: Maintains the same computational backend across all pipeline stages

  • Caching Optimization: Supports skipping initial transforms if their outputs already exist in the input DataFrame

Pipeline Execution Logic:

The composition handles several execution scenarios:

  1. Fresh Computation: All transforms are executed sequentially from the input DataFrame

  2. Partial Caching: If the first transform’s output already exists in the input DataFrame, it uses the cached result

  3. Intermediate Processing: Each subsequent transform receives a temporary DataFrame containing only the required input column

  4. Result Propagation: Intermediate results are passed through the pipeline until the final output is produced

Naming Convention:

Output names are constructed by concatenating the first transform’s output name with all subsequent transforms’ produces identifiers:

\[\text{output_name} = \text{first_output} + \text{"_"} + \text{produces}_2 + \text{"_"} + \ldots + \text{"_"} + \text{produces}_n\]

For example, composing a moving average transform (producing ‘ma20’), RSI transform (producing ‘rsi14’), and signal transform (producing ‘signal’) results in the output name ‘ma20_rsi14_signal’.
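The naming rule amounts to an underscore join. The helper below is a hypothetical illustration of that rule, not the library's implementation:

```python
def composed_output_name(first_output: str, later_produces: list[str]) -> str:
    """Join the first transform's output name with the later transforms' `produces` values."""
    return "_".join([first_output, *later_produces])


name = composed_output_name("ma20", ["rsi14", "signal"])
print(name)  # ma20_rsi14_signal
```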

Performance Considerations:

  • Memory Efficiency: Only one intermediate Series is maintained at a time, minimizing memory footprint

  • Backend Optimization: All transforms use the same computational backend for consistency

  • Caching Benefits: Can leverage pre-computed results to avoid redundant calculations

  • Error Propagation: Validation errors are caught early before expensive computations begin

Use Cases in Financial Engineering:

  • Technical Indicator Chains: Price → Moving Average → RSI → Trading Signal

  • Risk Metric Pipelines: Returns → Volatility → Value-at-Risk → Risk Adjusted Return

  • Factor Construction: Raw Data → Normalization → Winsorization → Z-Score → Factor Score

  • Signal Processing: Price → Log Returns → Smoothing → Momentum → Regime Classification

Important

All transforms in the composition must produce single outputs (SISO or MISO only). For transforms producing multiple outputs (SIMO, MIMO), use explicit pipeline construction with intermediate DataFrame management instead of the Compose class.

Parameters:

*transforms (SISOTransform|MISOTransform) – Variable number of transforms to compose into a pipeline. Must be single-output transforms with compatible input/output column specifications. The first transform determines the pipeline’s input requirements.

Raises:
  • TypeError – If input is not a pandas DataFrame during validation.

  • ValueError – If the required input column is not found in the DataFrame.

  • AssertionError – If backend parameter is not “pd” or “nb”.

  • AttributeError – If transforms don’t have required attributes (requires, produces, output_name).

Examples

Creating a technical analysis pipeline:

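from finmlkit.feature.kit import Compose
# Assumed location of the transform classes; adjust to where they are defined
from finmlkit.feature.transforms import SimpleMovingAverageTransform, RSITransform, SignalTransform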

# Create individual transforms
ma_transform = SimpleMovingAverageTransform('close', 'ma20')
rsi_transform = RSITransform('ma20', 'rsi14')  # RSI of moving average
signal_transform = SignalTransform('rsi14', 'signal')  # Trading signal from RSI

# Compose into pipeline
pipeline = Compose(ma_transform, rsi_transform, signal_transform)

print(f"Pipeline input: {pipeline.requires}")  # ['close']
print(f"Pipeline output: {pipeline.output_name}")  # 'ma20_rsi14_signal'

Using the composed pipeline:

>>> import numpy as np
>>> import pandas as pd
>>> # Sample price data
>>> dates = pd.date_range('2023-01-01', periods=100, freq='D')
>>> data = pd.DataFrame({
...     'close': 100 + np.random.randn(100).cumsum()
... }, index=dates)
>>>
>>> # Apply the complete pipeline
>>> pipeline = Compose(ma_transform, rsi_transform, signal_transform)  
>>> result = pipeline(data, backend='pd')  
>>> print(f"Pipeline output type: {type(result)}")  
Pipeline output type: <class 'pandas.core.series.Series'>
>>> print(f"Output name: {result.name}")  
Output name: ma20_rsi14_signal

Advanced pipeline with caching optimization:

>>> # Data with pre-computed moving average
>>> data_with_ma = data.copy()
>>> data_with_ma['ma20'] = data['close'].rolling(20).mean()  
>>>
>>> # Pipeline will skip the first transform and use cached MA
>>> result_cached = pipeline(data_with_ma, backend='nb')  
>>> # First transform is skipped, starts with RSI calculation

Multi-step risk analysis pipeline:

# Risk analysis pipeline: Returns → Volatility → VaR → Risk Score
returns_transform = ReturnsTransform('close', 'returns')
volatility_transform = VolatilityTransform('returns', 'vol30')
var_transform = VaRTransform('vol30', 'var95')
risk_score_transform = RiskScoreTransform('var95', 'risk_score')

risk_pipeline = Compose(
    returns_transform,
    volatility_transform,
    var_transform,
    risk_score_transform
)

# Single call computes entire risk analysis chain
risk_metrics = risk_pipeline(price_data, backend='nb')

Integration with Feature class:

# Compose can be wrapped in Feature for mathematical operations
from finmlkit.feature.kit import Feature

technical_pipeline = Compose(ma_transform, rsi_transform)
technical_feature = Feature(technical_pipeline)

# Mathematical operations on the composed pipeline
normalized_signal = (technical_feature - 50) / 50  # Normalize RSI
# volume_feature is assumed to be another Feature defined elsewhere
combined_signal = technical_feature * volume_feature

See also

  • BaseTransform: The base interface that Compose implements and extends.

  • SISOTransform: Single-input, single-output transforms that can be composed.

  • MISOTransform: Multiple-input, single-output transforms that can be composed.

  • Feature: High-level wrapper that can encapsulate Compose instances for mathematical operations.

  • Pipeline: Alternative approach for more complex multi-branch transformation workflows.


__init__(*transforms: SISOTransform | MISOTransform)[source]
_run_pipeline(x: DataFrame, *, backend) → Series[source]

Apply the composed transforms to the input DataFrame with caching/optimization:

  • If the final output already exists in the input DataFrame, return it immediately.

  • For each step, if its output exists in the DataFrame, reuse it rather than recomputing.

  • Prefer DataFrame-provided required column(s) when available; otherwise use the prior step’s output.

Parameters:
  • x – DataFrame to transform

  • backend – Backend is already specified in the transforms

Returns:

Transformed Series
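The three caching rules can be sketched with a dict of lists standing in for the DataFrame. Everything here (`Step`, `run_pipeline`, the column names) is hypothetical and only mirrors the rules as documented; the actual method may differ in detail.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Column = list[float]


@dataclass
class Step:
    requires: str                      # input column name
    output_name: str                   # column this step would produce
    func: Callable[[Column], Column]


def run_pipeline(frame: dict[str, Column], steps: list[Step]) -> Column:
    # Rule 1: the final output is already present, so return it immediately.
    final_name = steps[-1].output_name
    if final_name in frame:
        return frame[final_name]

    previous: Optional[Column] = None
    for step in steps:
        # Rule 2: reuse an existing step output instead of recomputing it.
        if step.output_name in frame:
            previous = frame[step.output_name]
            continue
        # Rule 3: prefer the column from the frame, else the prior step's output.
        if step.requires in frame:
            source = frame[step.requires]
        elif previous is not None:
            source = previous
        else:
            raise ValueError(f"missing required column {step.requires!r}")
        previous = step.func(source)
    return previous


calls: list[str] = []


def plus_one(xs: Column) -> Column:
    calls.append("plus_one")
    return [x + 1 for x in xs]


def times_two(xs: Column) -> Column:
    calls.append("times_two")
    return [x * 2 for x in xs]


steps = [Step("close", "close_p1", plus_one), Step("close_p1", "close_p1_x2", times_two)]

fresh = run_pipeline({"close": [1.0, 2.0]}, steps)
calls_fresh = list(calls)
calls.clear()

# A precomputed 'close_p1' column means the first step is skipped.
cached = run_pipeline({"close": [1.0, 2.0], "close_p1": [10.0, 20.0]}, steps)
calls_cached = list(calls)
```

In the second call only `times_two` runs, because the first step's output is taken from the frame.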

_validate_input(x: DataFrame) → bool[source]

Validate that the input DataFrame contains the required columns for all transforms.

Parameters:

x – DataFrame to validate

Returns:

True if the input is valid

property output_name: str

Get the output name of the composed transform. The output name is a combination of the first transform’s output and the subsequent transforms’ produces.

Returns:

Output name

class finmlkit.feature.kit.Feature(transform: BaseTransform)[source]

Bases: object

High-level wrapper for data transformations enabling intuitive mathematical operations and fluent feature engineering.

This class provides a user-friendly interface for financial feature engineering by wrapping BaseTransform instances and enabling mathematical operations, function composition, and chainable transformations using familiar Python operators and methods. It serves as the primary building block for constructing complex feature engineering pipelines through an intuitive, expression-based syntax that mirrors mathematical notation.

Core Design Philosophy:

The Feature class implements a fluent interface design pattern that enables natural mathematical expressions for feature engineering. Instead of manually composing transform objects, users can write feature engineering logic using familiar mathematical operators and method chaining:

# Traditional transform composition (verbose)
price_transform = PriceTransform('close')
ma_transform = SimpleMovingAverageTransform('close', 'sma_20')
ratio_transform = BinaryOpTransform(price_transform, ma_transform, 'div', lambda x, y: x / y)

# Feature-based composition (intuitive)
price = Feature(PriceTransform('close'))
ma20 = Feature(SimpleMovingAverageTransform('close', 'sma_20'))
price_to_ma_ratio = price / ma20

Mathematical Operations Framework:

The class overloads Python’s mathematical operators to create new Feature instances with automatically composed transformations. Supported operations include:

  • Binary Operations: Addition (+), subtraction (-), multiplication (*), division (/)

  • Unary Operations: Absolute value (abs()), negation

  • Comparison Operations: Element-wise minimum and maximum (static methods)

  • Constant Operations: Mathematical operations with scalar values (e.g., feature * 2)

  • Reverse Operations: Enable natural syntax like 3 + feature

Each mathematical operation creates a new Feature instance wrapping an appropriate transform that performs the mathematical computation during evaluation.
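The operator-overloading approach can be sketched in a few lines. `LazyFeature`, `column`, and the `add(...)`/`div(...)` naming scheme below are hypothetical and work on plain lists; they illustrate the pattern of returning a new lazily evaluated object per operation, not how Feature is actually implemented.

```python
import operator
from typing import Callable

Frame = dict[str, list[float]]


class LazyFeature:
    """Hypothetical stand-in for Feature: a name plus a deferred computation."""

    def __init__(self, name: str, compute: Callable[[Frame], list[float]]):
        self.name = name
        self._compute = compute

    def __call__(self, frame: Frame) -> list[float]:
        # Nothing is evaluated until the feature is called on data.
        return self._compute(frame)

    def _binary(self, other, op, op_name: str, reverse: bool = False) -> "LazyFeature":
        other_is_feature = isinstance(other, LazyFeature)
        other_name = other.name if other_is_feature else str(other)

        def compute(frame: Frame) -> list[float]:
            mine = self(frame)
            # Scalars are broadcast to the length of this feature's output.
            theirs = other(frame) if other_is_feature else [other] * len(mine)
            pairs = zip(theirs, mine) if reverse else zip(mine, theirs)
            return [op(a, b) for a, b in pairs]

        left, right = (other_name, self.name) if reverse else (self.name, other_name)
        # Operands are never mutated; every operation returns a new object.
        return LazyFeature(f"{op_name}({left},{right})", compute)

    def __add__(self, other):
        return self._binary(other, operator.add, "add")

    def __radd__(self, other):  # enables `3 + feature`
        return self._binary(other, operator.add, "add", reverse=True)

    def __sub__(self, other):
        return self._binary(other, operator.sub, "sub")

    def __mul__(self, other):
        return self._binary(other, operator.mul, "mul")

    def __truediv__(self, other):
        return self._binary(other, operator.truediv, "div")


def column(name: str) -> LazyFeature:
    """Feature that simply reads one column from the frame."""
    return LazyFeature(name, lambda frame: frame[name])


close = column("close")
spread = 3 + (close - column("open")) / close
values = spread({"close": [10.0, 20.0], "open": [8.0, 25.0]})
print(spread.name)  # add(3,div(sub(close,open),close))
```

Building `spread` performs no arithmetic; the expression tree is only evaluated when it is called with data.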

Function Application and Composition:

The apply() method enables applying arbitrary functions to feature outputs, supporting:

  • Custom Functions: User-defined lambda functions or named functions

  • Built-in Methods: Pandas methods like rolling operations, transformations

  • Mathematical Functions: NumPy mathematical functions (log, exp, sqrt, etc.)

  • Automatic Naming: Generates descriptive names based on function names and parameters

Performance Optimization Features:

  • Caching Support: Optional caching mechanism to avoid recomputing expensive transformations

  • Backend Selection: Supports both pandas (“pd”) and Numba (“nb”) computational backends

  • Lazy Evaluation: Transformations are only computed when __call__() is invoked

  • Transform Reuse: Wrapped transforms can be shared across multiple Feature instances

Built-in Convenience Methods:

The class provides pre-implemented methods for common financial operations:

  • Statistical Functions: Rolling mean, standard deviation, exponential moving average

  • Mathematical Transforms: Logarithms, exponentials, square roots, clipping

  • Time Series Operations: Lagging, shifting, rolling aggregations

  • Risk/Return Metrics: Log returns, volatility calculations, normalized features

Feature Naming and Metadata:

Features maintain intelligent naming schemes that:

  • Preserve Traceability: Names reflect the sequence of operations applied

  • Support Customization: Allow manual name overrides for semantic clarity

  • Enable Pipeline Integration: Generate column names suitable for DataFrame integration

  • Maintain Consistency: Ensure naming conventions across mathematical operations

Integration with Transform Hierarchy:

The Feature class seamlessly integrates with the transform ecosystem:

  • Transform Wrapping: Can wrap any BaseTransform subclass (SISO, MISO, SIMO, MIMO)

  • Operation Transforms: Automatically creates appropriate operation transforms for mathematical expressions

  • Backend Compatibility: Supports both pandas and Numba backends through wrapped transforms

  • Validation Inheritance: Inherits input validation and error handling from underlying transforms

Note

Feature instances are designed to be immutable - mathematical operations create new Feature objects rather than modifying existing ones. This design promotes functional programming patterns and prevents unintended side effects in complex feature engineering pipelines.

Tip

The caching mechanism is particularly valuable for expensive transformations that are reused across multiple features. Consider enabling caching for computationally intensive operations like rolling correlations, technical indicators, or statistical decompositions.

Parameters:

transform (BaseTransform) – The underlying transform to wrap. Can be any subclass of BaseTransform including SISO, MISO, SIMO, or MIMO transforms.

Raises:
  • AttributeError – If the wrapped transform doesn’t have the required output_name attribute.

  • TypeError – If name setter receives incompatible types during custom name assignment.

  • AssertionError – If custom names have mismatched lengths for multi-output transforms.

Examples

Basic feature creation and mathematical operations:

>>> import numpy as np
>>> import pandas as pd
>>> # Create base features
>>> dates = pd.date_range('2023-01-01', periods=20, freq='D')
>>> data = pd.DataFrame({
...     'close': 100 + np.random.randn(20).cumsum(),
...     'volume': np.random.randint(1000, 5000, 20)
... }, index=dates)
>>>
>>> # Wrap transforms as features
>>> price = Feature(SimpleMovingAverageTransform('close', 'sma_20'))  
>>> volume = Feature(VolumeTransform('volume'))  
>>>
>>> # Mathematical operations
>>> price_vol_ratio = price / volume  
>>> normalized_price = (price - price.rolling_mean(10)) / price.rolling_std(10)  

Advanced function application:

>>> # Custom function application
>>> log_returns = price.apply(lambda x: np.log(x).diff(), suffix='log_ret')  
>>>
>>> # Built-in convenience methods
>>> clipped_returns = log_returns.clip(lower=-0.1, upper=0.1)  
>>> volatility = log_returns.rolling_std(30)  
>>>
>>> # Composite feature engineering
>>> momentum = price / price.lag(20) - 1  
>>> momentum_signal = Feature.max(Feature.min(momentum, 0.2), -0.2)  

Performance optimization with caching:

>>> # Create expensive computation
>>> complex_indicator = Feature(ComplexTechnicalIndicator('close', window=100))  
>>>
>>> # Use caching for repeated calculations
>>> cache = pd.DataFrame()  
>>> result1 = complex_indicator(data, cache=cache)  
>>> result2 = complex_indicator(data, cache=cache)  # Uses cached result  

Feature pipeline construction:

# Build a comprehensive feature set
base_price = Feature(PriceTransform('close'))

# Technical indicators
sma_20 = base_price.rolling_mean(20)
sma_50 = base_price.rolling_mean(50)
rsi = Feature(RSITransform('close', 14))

# Derived features
price_momentum = base_price / base_price.lag(10) - 1
sma_ratio = sma_20 / sma_50
mean_reversion = (base_price - sma_20) / base_price.rolling_std(20)

# Composite signals
trend_signal = Feature.max(Feature.min(sma_ratio - 1, 0.1), -0.1)
momentum_signal = price_momentum.clip(lower=-0.2, upper=0.2)
combined_signal = (trend_signal + momentum_signal) / 2

See also

  • BaseTransform: The underlying transform interface that Feature wraps.

  • CoreTransform: Base class for dual-backend transforms used within Features.

  • BinaryOpTransform: Transform class created for binary mathematical operations.

  • UnaryOpTransform: Transform class created for unary mathematical operations.

  • ConstantOpTransform: Transform class created for operations with scalar constants.


__init__(transform: BaseTransform)[source]
abs()[source]

Get the absolute values of the feature.

Returns:

A new Feature with absolute values

apply(func, *args, suffix=None, **kwargs)[source]

Apply an arbitrary function to the output of this feature.

Parameters:
  • func – The function to apply to the feature output

  • args – Additional positional arguments to pass to the function

  • suffix – Optional suffix to add to the feature name (default is function name)

  • kwargs – Additional keyword arguments to pass to the function

Returns:

A new Feature with the function applied

clip(lower=None, upper=None)[source]

Clip the values of the feature between lower and upper bounds.

Parameters:
  • lower – Lower boundary (optional)

  • upper – Upper boundary (optional)

Returns:

A new Feature with clipped values

ema(span, adjust=True)[source]

Calculate the Exponential Moving Average (EMA) of the feature.

Parameters:
  • span – Span for the EMA calculation

  • adjust – Whether to adjust the EMA calculation (default is True)

Returns:

A new Feature with EMA values
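The `span` and `adjust` parameters match the names used by pandas' `ewm`. Assuming `ema` follows the same convention (an assumption, since the source does not say so), the smoothing factor is α = 2 / (span + 1), and the two modes can be sketched in plain Python as follows. The function below is illustrative and ignores missing values.

```python
def ema(values: list[float], span: float, adjust: bool = True) -> list[float]:
    """Exponential moving average using the pandas-style span convention."""
    alpha = 2.0 / (span + 1.0)
    decay = 1.0 - alpha
    out: list[float] = []
    if adjust:
        # Weighted average with weights (1 - alpha)^i over all past points.
        numerator = 0.0
        denominator = 0.0
        for x in values:
            numerator = x + decay * numerator
            denominator = 1.0 + decay * denominator
            out.append(numerator / denominator)
    else:
        # Recursive form: y_t = (1 - alpha) * y_{t-1} + alpha * x_t, with y_0 = x_0.
        previous = None
        for x in values:
            previous = x if previous is None else decay * previous + alpha * x
            out.append(previous)
    return out


adjusted = ema([1.0, 2.0, 3.0], span=3)
recursive = ema([1.0, 2.0, 3.0], span=3, adjust=False)
```

With `span=3`, α is 0.5; the recursive form gives 1.0, 1.5, 2.25, while the adjusted form gives larger early values because it renormalises the weights over the points seen so far.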

exp()[source]

Get the exponential of the feature.

Returns:

A new Feature with exp values

static from_config(cfg: dict) → Feature[source]
lag(period)[source]

Create a lagged version of the feature.

Parameters:

period – Number of periods to lag

Returns:

A new Feature with lagged values

log()[source]

Get the natural logarithm of the feature.

Returns:

A new Feature with log values

log1p()[source]

Get the natural logarithm of one plus the feature, log(1 + x).

Returns:

A new Feature with log1p values
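Assuming `log1p` has the usual meaning of log(1 + x), as in NumPy and Python's `math` module, its benefit over `log` shows up for inputs close to zero, such as small returns. A short standard-library check:

```python
import math

tiny = 1e-12

# Adding 1.0 first rounds away most of the digits of `tiny`.
naive = math.log(1.0 + tiny)
# log1p evaluates log(1 + x) without forming 1 + x explicitly.
stable = math.log1p(tiny)

naive_error = abs(naive - tiny)
stable_error = abs(stable - tiny)
```

For such small inputs the true value is very nearly `tiny` itself, and `stable_error` is many orders of magnitude smaller than `naive_error`.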

static max(a, b)[source]

Calculate the element-wise maximum between two features.

Parameters:
  • a – First feature or scalar

  • b – Second feature or scalar

Returns:

A new Feature containing the element-wise maximum

static min(a, b)[source]

Calculate the element-wise minimum between two features.

Parameters:
  • a – First feature or scalar

  • b – Second feature or scalar

Returns:

A new Feature containing the element-wise minimum

property name

Get the output name from the wrapped transform

rolling_mean(window)[source]

Calculate the rolling mean of the feature.

Parameters:

window – Rolling window size

Returns:

A new Feature with rolling mean values

rolling_std(window)[source]

Calculate the rolling standard deviation of the feature.

Parameters:

window – Rolling window size

Returns:

A new Feature with rolling std values

rolling_sum(window)[source]

Calculate the rolling sum of the feature.

Parameters:

window – Rolling window size

Returns:

A new Feature with rolling sum values

sqrt()[source]

Get the square root of the feature.

Returns:

A new Feature with square root values

square()[source]

Get the square of the feature.

Returns:

A new Feature with squared values

to_config() → dict[source]

Serialize this Feature (and underlying transform) to a JSON-serializable dict.

Note: custom arbitrary functions used via Feature.apply may not be fully reconstructable unless their op_name is recognized (abs, log, log1p, exp, square, sqrt, clip_*).
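The limitation on arbitrary functions follows from how name-based serialization works: only operations that can be looked up by name survive a round trip through JSON. The sketch below uses a hypothetical config layout (`{"input": ..., "ops": [...]}`) and a hypothetical `OPS` registry; the real config schema is not shown in this documentation.

```python
import json
import math

# Hypothetical registry of operations that can be rebuilt from their names.
OPS = {
    "abs": abs,
    "log": math.log,
    "log1p": math.log1p,
    "exp": math.exp,
    "sqrt": math.sqrt,
    "square": lambda v: v * v,
}


def to_config(column: str, op_names: list[str]) -> dict:
    """Describe a chain of element-wise operations using only JSON-friendly values."""
    unknown = [name for name in op_names if name not in OPS]
    if unknown:
        # A lambda or other ad hoc function has no name to look up later.
        raise ValueError(f"cannot serialize unrecognized operations: {unknown}")
    return {"input": column, "ops": list(op_names)}


def from_config(cfg: dict):
    """Rebuild a callable from the config by looking each operation up by name."""
    ops = [OPS[name] for name in cfg["ops"]]

    def compute(frame: dict[str, list[float]]) -> list[float]:
        values = frame[cfg["input"]]
        for op in ops:
            values = [op(v) for v in values]
        return values

    return compute


cfg = to_config("close", ["log", "square"])
restored = from_config(json.loads(json.dumps(cfg)))  # survives a JSON round trip
result = restored({"close": [1.0, math.e]})
```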

class finmlkit.feature.kit.FeatureKit(features: list[Feature], retain: list[str] = None)[source]

Bases: object

High-level orchestration framework for executing collections of Feature objects in financial machine learning pipelines.

This class is the main entry point for building feature engineering workflows. It coordinates multiple Feature instances, caches intermediate results in a shared working DataFrame, and can report per-feature timings, so that a complete feature set can be built and profiled with a single call.

Pipeline Orchestration Architecture:

FeatureKit implements a batch processing pattern for feature computation, where multiple Feature objects are executed sequentially against a shared DataFrame. This approach enables sophisticated optimization strategies:

  • Incremental Caching: Intermediate feature outputs are cached within the working DataFrame, enabling dependent features to reuse previously computed results without redundant calculations

  • Selective Retention: Original DataFrame columns can be preserved alongside computed features, maintaining data lineage and enabling hybrid analytical workflows

  • Performance Profiling: Optional timing analysis identifies computational bottlenecks and guides optimization efforts

  • Backend Consistency: All features execute with the same computational backend for consistent performance characteristics

Mathematical Processing Framework:

For a collection of features \(F_1, F_2, \ldots, F_n\) applied to input DataFrame \(D\), the processing follows this computational model:

\[\begin{split}\begin{align} D_0 &= D \cup \text{retain_columns} \\ D_1 &= D_0 \cup \{F_1(D_0)\} \\ D_2 &= D_1 \cup \{F_2(D_1)\} \\ &\vdots \\ D_n &= D_{n-1} \cup \{F_n(D_{n-1})\} \end{align}\end{split}\]

where \(\cup\) represents column-wise DataFrame concatenation and each \(F_i(D_{i-1})\) can access all previously computed features through the caching mechanism.

This iterative approach enables:

  • Memory Efficiency: Only one working DataFrame is maintained, with features added incrementally

  • Computational Reuse: Expensive intermediate calculations are preserved for downstream feature computations
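The incremental model can be sketched with a dict of lists as the working DataFrame. The `build` function and the feature definitions below are hypothetical; they only illustrate how each feature sees the columns added by earlier ones and how the output is restricted to retained and computed columns.

```python
from typing import Callable

Frame = dict[str, list[float]]
FeatureFn = Callable[[Frame], list[float]]


def build(frame: Frame, features: list[tuple[str, FeatureFn]], retain: tuple[str, ...] = ()) -> Frame:
    working = dict(frame)          # single working frame, extended in place
    computed: list[str] = []
    for name, func in features:    # executed in the order given
        working[name] = func(working)   # later features can read this column
        computed.append(name)
    # Output holds the retained input columns followed by the computed features.
    return {key: working[key] for key in [*retain, *computed]}


def diff(frame: Frame) -> list[float]:
    close = frame["close"]
    return [0.0] + [b - a for a, b in zip(close, close[1:])]


def abs_diff(frame: Frame) -> list[float]:
    # Depends on the 'diff' column computed by the previous feature.
    return [abs(v) for v in frame["diff"]]


data = {"close": [100.0, 110.0, 99.0], "volume": [1.0, 2.0, 3.0]}
result = build(data, [("diff", diff), ("abs_diff", abs_diff)], retain=("close",))
```

Swapping the two features would raise a `KeyError`, which is why execution order matters for interdependent features.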

Performance Analysis System:

When timing analysis is enabled, FeatureKit generates detailed performance metrics:

\[\text{Relative Performance} = \frac{t_i}{\max(t_1, t_2, \ldots, t_n)} \times 100\%\]

where \(t_i\) is the execution time for feature \(i\). Results are visualized using ASCII bar charts that provide immediate visual feedback on computational bottlenecks.
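A minimal sketch of the relative-performance formula and an ASCII bar rendering is shown below. The `timing_report` function, the bar width, and the line layout are hypothetical; the actual output format of FeatureKit may look different.

```python
def timing_report(timings: dict[str, float], width: int = 20) -> list[str]:
    """Render each timing as a bar scaled to the slowest feature (100%)."""
    slowest = max(timings.values())
    lines = []
    for name, seconds in timings.items():
        relative = 100.0 * seconds / slowest if slowest > 0 else 0.0
        bar = "#" * round(width * relative / 100.0)
        lines.append(f"{name:<12} {bar:<{width}} {relative:5.1f}%")
    return lines


# Illustrative timings in seconds, not measured values.
report = timing_report({"sma_20": 0.5, "rsi_14": 2.0, "macd": 1.0})
for line in report:
    print(line)
```

The slowest feature always fills the full bar, so the chart shows relative cost rather than absolute time.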

Integration with Feature Ecosystem:

FeatureKit seamlessly integrates with the complete feature engineering framework:

  • Feature Objects: Accepts any Feature instance, regardless of the underlying transform type (SISO, MISO, SIMO, MIMO)

  • Mathematical Expressions: Can execute features created through mathematical operations (addition, multiplication, etc.)

  • Composed Transforms: Supports features built using the Compose class for complex transformation chains

  • Custom Functions: Works with features created using the Feature.apply() method for arbitrary function application

Production Deployment Considerations:

The class is designed for both research and production environments:

  • Scalability: Efficient memory management enables processing of large datasets without excessive resource consumption

  • Reproducibility: Deterministic execution order ensures consistent results across runs

  • Debugging: Timing analysis and clear error messages facilitate troubleshooting in complex pipelines

  • Flexibility: Support for both pandas and Numba backends enables optimization for different deployment scenarios

Caching Strategy and Optimization:

The caching mechanism provides significant performance benefits:

  1. Intermediate Result Reuse: Features that depend on common sub-computations automatically benefit from cached results

  2. Memory Efficiency: Results are stored directly in the working DataFrame, minimizing memory overhead

  3. Cache Coherence: The cache is updated incrementally, ensuring all features see consistent intermediate state

Parameters:
  • features (list[Feature]) – Ordered list of Feature instances to execute. Order determines execution sequence and affects caching behavior for interdependent features.

  • retain (list[str], optional) – Column names from the input DataFrame to preserve in the output unchanged. If None or empty, only computed features are included in the output DataFrame.

Raises:
  • TypeError – If any feature returns an unexpected type (not Series or tuple of Series).

  • KeyError – If retained columns are not present in the input DataFrame.

  • AttributeError – If Feature objects lack required attributes or methods.

Examples

Basic feature pipeline construction:

>>> import numpy as np
>>> import pandas as pd
>>> # Prepare sample financial data
>>> dates = pd.date_range('2023-01-01', periods=100, freq='D')
>>> np.random.seed(42)
>>> prices = 100 + np.random.randn(100).cumsum()
>>> data = pd.DataFrame({'close': prices, 'volume': np.random.randint(1000, 10000, 100)}, index=dates)
>>>
>>> # Create individual features
>>> price_feature = Feature(SimpleMovingAverageTransform('close', 'sma_20'))
>>> rsi_feature = Feature(RSITransform('close', 'rsi_14'))
>>>
>>> # Build feature pipeline
>>> feature_kit = FeatureKit([price_feature, rsi_feature], retain=['close', 'volume'])
>>> result_df = feature_kit.build(data, backend='nb', timeit=False)
>>> print(f"Output shape: {result_df.shape}")  
Output shape: (100, 4)
>>> print(f"Columns: {list(result_df.columns)}")  
Columns: ['close', 'volume', 'close_sma_20', 'close_rsi_14']

Advanced pipeline with interdependent features:

>>> # Create features with dependencies
>>> base_price = Feature(PriceTransform('close'))
>>> sma_20 = base_price.rolling_mean(20)
>>> price_to_sma_ratio = base_price / sma_20  # Depends on sma_20
>>> momentum_signal = price_to_sma_ratio.clip(lower=0.8, upper=1.2)  # Depends on ratio
>>>
>>> advanced_kit = FeatureKit([
...     base_price,
...     sma_20,
...     price_to_sma_ratio,
...     momentum_signal
... ], retain=['close'])
>>>
>>> # Execute with performance profiling
>>> advanced_result = advanced_kit.build(data, backend='nb', timeit=True)  

Production-scale feature engineering:

# Large-scale feature construction
import pandas as pd
from finmlkit.feature.kit import Feature, FeatureKit
from finmlkit.feature.transforms import *

# Load large dataset
large_data = pd.read_csv('large_financial_dataset.csv', index_col='timestamp', parse_dates=True)

# Comprehensive feature set
features = []

# Price-based features
price = Feature(PriceTransform('close'))
features.extend([
    price.rolling_mean(10),
    price.rolling_mean(20),
    price.rolling_mean(50),
    price.rolling_std(20),
    price.log() - price.log().lag(1),  # Log returns
])

# Volume-based features
volume = Feature(VolumeTransform('volume'))
features.extend([
    volume.rolling_mean(20),
    (price * volume).rolling_mean(20),  # Dollar volume
])

# Technical indicators
features.extend([
    Feature(RSITransform('close', 'rsi_14')),
    Feature(MACDTransform('close', 'macd')),
    Feature(BollingerBandsTransform('close', 'bb')),
])

# Cross-asset features (if multiple assets)
if 'close_spy' in large_data.columns:
    beta = Feature(BetaTransform(['close', 'close_spy'], 'beta_spy'))
    features.append(beta)

# Create comprehensive feature kit
production_kit = FeatureKit(features, retain=['close', 'volume', 'open', 'high', 'low'])

# Execute with timing for optimization analysis
feature_matrix = production_kit.build(large_data, backend='nb', timeit=True)

# Save results for model training
feature_matrix.to_parquet('feature_matrix.parquet')

Reproducibility and config I/O:

# Save and reload feature pipeline configuration
kit = FeatureKit(features, retain=['close', 'volume'])
kit.save_config('featurekit.json')
kit2 = FeatureKit.from_config('featurekit.json')
df2 = kit2.build(large_data, backend='pd', order='defined')

Execution order and dependency graph:

# Compute in topological order to resolve dependencies automatically
df_topo = kit.build(large_data, backend='pd', order='topo')
# Visualize the graph
print(kit.build_graph().visualize())

External functions (e.g., NumPy/TA-Lib) via ExternalFunction:

from finmlkit.feature.transforms import ExternalFunction

# Single-output example using NumPy (passes numpy arrays to function)
log_close = Feature(ExternalFunction('numpy.log', input_cols='close', output_cols='log_close', pass_numpy=True))

# TA-Lib example (if talib is installed)
# rsi14 = Feature(ExternalFunction('talib.RSI', input_cols='close', output_cols='ta_rsi14', args=[14], pass_numpy=True))

kit_ext = FeatureKit([log_close], retain=['close'])
df_ext = kit_ext.build(large_data, backend='pd')

See also

  • Feature: Core wrapper class for individual transformations with mathematical operations.

  • BaseTransform: Abstract base class for all transformation implementations.

  • Compose: Pipeline composition class for chaining single-output transforms.

  • SISOTransform, MISOTransform, SIMOTransform, MIMOTransform: Concrete transform base classes.


__init__(features: list[Feature], retain: list[str] = None)[source]
build(df, *, backend='nb', timeit=False, order: str = 'defined')[source]

Execute all Features and return a DataFrame with retained and computed columns.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing raw columns required by features.

  • backend (str) – Computational backend for all features. “pd” for pandas, “nb” for numba. Default “nb”.

  • timeit (bool) – If True, prints a timing analysis for each feature after execution.

  • order (str) –

    Execution order for features:

    - “defined” (default): Run features in the order they were provided to FeatureKit.

    - “topo”: Run features in topological order based on dependencies inferred from their underlying transforms. This helps when the list order doesn’t already respect dependencies (e.g., when a feature uses the output of another).

Returns:

A DataFrame that contains retained columns and all computed feature columns.

Return type:

pd.DataFrame
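The idea behind `order='topo'` can be sketched with the standard library's `graphlib`. The dependency mapping below is hypothetical (each name maps to the names it depends on); how FeatureKit infers dependencies from transforms is not shown here.

```python
from graphlib import TopologicalSorter


def topological_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return names ordered so that every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


# Deliberately listed in an order that does not respect dependencies.
deps = {
    "clipped_ratio": {"ratio"},
    "ratio": {"close", "sma_20"},
    "sma_20": {"close"},
    "close": set(),
}
order = topological_order(deps)
print(order)  # ['close', 'sma_20', 'ratio', 'clipped_ratio']
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which is a useful failure mode for a misconfigured feature set.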

build_graph() → ComputationGraph[source]
classmethod from_config(path: str) → FeatureKit[source]
static from_dict(cfg: dict) → FeatureKit[source]
save_config(path: str)[source]
to_config() → dict[source]
topological_order() → list[str][source]