Data modules
fowt_ml.datasets
This module contains functions to load and preprocess datasets.
Functions:
- convert_mat_to_df – Reads a MATLAB file and returns a pandas DataFrame.
- get_data – Returns a DataFrame for the given data_id.
- check_data – Checks that the DataFrame has the required columns and that they are valid.
- fix_column_names – Fixes the column names by removing special characters.
convert_mat_to_df
Reads a MATLAB file and returns a pandas DataFrame.
Parameters:
- mat_file (str) – Path to a MATLAB file.
- data_id (str) – ID of the data in the MATLAB file.
Returns:
- DataFrame – DataFrame containing the data.
Source code in src/fowt_ml/datasets.py
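The sketch below shows one common way to turn a MATLAB variable into a DataFrame with scipy.io.loadmat. It is not the fowt_ml implementation. It assumes that the variable named by data_id is a plain 2-D numeric array. MATLAB structs or tables would need extra unpacking, and MAT v7.3 (HDF5) files cannot be read by loadmat. The function name mat_to_dataframe and the columns argument are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat


def mat_to_dataframe(mat_file: str, data_id: str, columns=None) -> pd.DataFrame:
    """Hypothetical helper: load the 2-D array stored under data_id as a DataFrame."""
    contents = loadmat(mat_file)  # dict: MATLAB variable name -> numpy array
    array = np.asarray(contents[data_id])  # assumes a plain numeric matrix
    return pd.DataFrame(array, columns=columns)


# Write a small demo file so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mat")
savemat(demo_path, {"exp1": np.arange(6.0).reshape(3, 2)})

df = mat_to_dataframe(demo_path, "exp1", columns=["time", "surge"])
print(df)
```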
get_data
Returns a dataframe for the given data_id.
Parameters:
- data_id (str) – ID of the data in the configuration file.
- config (dict) – Configuration dictionary, for example {"data_id": {"path_file": "data.mat"}}.
Returns:
- DataFrame – DataFrame for the given data_id.
Source code in src/fowt_ml/datasets.py
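Given the config layout documented above ({"data_id": {"path_file": "data.mat"}}), the lookup step can be sketched as follows. The names get_data_sketch and load_stub are hypothetical. The loader is injected so the example runs without a real MAT file. The actual function presumably reads the file itself, for instance via convert_mat_to_df, but that is an assumption.

```python
from typing import Callable

import pandas as pd


def get_data_sketch(
    data_id: str, config: dict, loader: Callable[[str], pd.DataFrame]
) -> pd.DataFrame:
    """Resolve data_id to a file path through the config, then load it."""
    if data_id not in config:
        raise KeyError(f"data_id {data_id!r} not in config; available: {sorted(config)}")
    return loader(config[data_id]["path_file"])


def load_stub(path: str) -> pd.DataFrame:
    # Stand-in for a real file reader: records which path was requested.
    return pd.DataFrame({"source": [path]})


config = {"data_id": {"path_file": "data.mat"}}
df = get_data_sketch("data_id", config, load_stub)
print(df["source"].iloc[0])
```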
check_data
Checks that the DataFrame has the required columns and that they are valid.
Parameters:
- df (DataFrame) – DataFrame to check.
- col_names (list) – List of required columns.
Returns:
- DataFrame – DataFrame with valid required columns.
Source code in src/fowt_ml/datasets.py
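The docstring does not define what "valid" means. The sketch below assumes it means numeric and free of missing values. It also assumes that only the required columns are returned. Both points, and the name check_data_sketch, are assumptions and may not match the package.

```python
import pandas as pd


def check_data_sketch(df: pd.DataFrame, col_names: list) -> pd.DataFrame:
    """Hypothetical check: required columns exist, are numeric and contain no NaN."""
    missing = [c for c in col_names if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    subset = df[col_names]
    # Assumed definition of "valid": numeric dtype ...
    non_numeric = [c for c in col_names if not pd.api.types.is_numeric_dtype(subset[c])]
    if non_numeric:
        raise ValueError(f"Non-numeric columns: {non_numeric}")
    # ... and no missing values.
    if subset.isna().any().any():
        raise ValueError("Required columns contain missing values")
    return subset


df = pd.DataFrame({"a": [1.0, 2.0], "b": [3, 4], "label": ["x", "y"]})
print(check_data_sketch(df, ["a", "b"]))
```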
fix_column_names
Fixes the column names to remove special characters.
Parameters:
- df (DataFrame) – DataFrame to fix.
Returns:
- DataFrame – DataFrame with fixed column names.
Source code in src/fowt_ml/datasets.py
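One way to remove special characters from column names is a regular-expression substitution passed to DataFrame.rename. The exact character set that fowt_ml strips is not documented, so the pattern below (keep letters, digits and underscores) and the name fix_column_names_sketch are assumptions.

```python
import re

import pandas as pd


def fix_column_names_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical version: replace runs of special characters with '_'."""

    def clean(name) -> str:
        name = re.sub(r"[^0-9a-zA-Z_]+", "_", str(name))  # e.g. " [" -> "_"
        return name.strip("_")  # drop leading/trailing underscores

    return df.rename(columns=clean)


df = pd.DataFrame(columns=["Surge [m]", "Pitch (deg)", "wave-height"])
fixed = fix_column_names_sketch(df)
print(list(fixed.columns))  # ['Surge_m', 'Pitch_deg', 'wave_height']
```

Note that a scheme like this can map two different names to the same cleaned name. A real implementation may need to detect such collisions.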
Config modules
fowt_ml.config
This module contains functions to read configuration files.
Classes:
- BaseConfig
- MLConfig
- Config – Base class for configuration files.
Functions:
- get_allowed_kwargs – Return valid keyword args for a function or class constructor.
- get_config_file – Get the config file path.
BaseConfig
MLConfig
Bases: BaseConfig
Methods:
- validate_tts_kwargs – Validate train_test_split kwargs.
- validate_cv_kwargs – Validate cross_validate kwargs.
- validate_models – Validate model names and their kwargs.
validate_tts_kwargs
classmethod
Validate train_test_split kwargs.
Source code in src/fowt_ml/config.py
validate_cv_kwargs
classmethod
Validate cross_validate kwargs.
Source code in src/fowt_ml/config.py
validate_models
classmethod
Validate model names and their kwargs.
Source code in src/fowt_ml/config.py
Config
Bases: BaseConfig
Base class for configuration files.
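Methods:
- from_yaml – Read configs from a config.yaml file.
- to_yaml – Write configs to a YAML config file.
from_yaml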
classmethod
Read configs from a config.yaml file.
If a key is not found in config.yaml, its default value is used.
Source code in src/fowt_ml/config.py
to_yaml
classmethod
Write configs to the YAML file given by config_file.
Source code in src/fowt_ml/config.py
get_allowed_kwargs
Return valid keyword args for a function or class constructor.
Source code in src/fowt_ml/config.py
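Valid keyword arguments for a function or class constructor can be found with inspect.signature. The sketch below does this and also shows how a validator in the spirit of MLConfig.validate_tts_kwargs could reject unknown keys. Both get_allowed_kwargs_sketch and reject_unknown_kwargs are hypothetical names, and the package may handle **kwargs parameters differently.

```python
import inspect
from typing import Any, Callable


def get_allowed_kwargs_sketch(func: Callable[..., Any]) -> set:
    """Names that can be passed by keyword (for a class: its constructor)."""
    params = inspect.signature(func).parameters.values()
    keyword_kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    # *args and **kwargs (VAR_POSITIONAL / VAR_KEYWORD) are ignored here.
    return {p.name for p in params if p.kind in keyword_kinds}


def reject_unknown_kwargs(func: Callable[..., Any], kwargs: dict) -> dict:
    """Raise if kwargs contains keys the callable does not accept."""
    unknown = set(kwargs) - get_allowed_kwargs_sketch(func)
    if unknown:
        raise ValueError(f"Unsupported kwargs for {func.__name__}: {sorted(unknown)}")
    return kwargs


def split(*arrays, test_size=None, shuffle=True, random_state=None):
    """Toy function with a train_test_split-like signature."""


class Model:
    def __init__(self, alpha=1.0, *, max_iter=100):
        self.alpha, self.max_iter = alpha, max_iter


print(sorted(get_allowed_kwargs_sketch(split)))  # ['random_state', 'shuffle', 'test_size']
print(sorted(get_allowed_kwargs_sketch(Model)))  # ['alpha', 'max_iter']
```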
get_config_file
Get the config file path.
Source code in src/fowt_ml/config.py
ML pipelines
fowt_ml.pipeline
Classes:
- Pipeline
Pipeline
Pipeline(config: str | Config)
Parameters:
- config (str | Config) – Path to the configuration file or a Config object.
- kwargs – Additional keyword arguments that override values from the configuration file.
Returns:
- None
Methods:
- get_data – Returns the dataset for the given data_id.
- train_test_split – Splits the data into training and testing sets.
- get_models – Returns the models for the given model names.
- setup – Set up the machine learning experiment.
- compare_models – Compares the models and returns the fitted models with their scores.
Source code in src/fowt_ml/pipeline.py
get_data
Returns the dataset for the given data_id.
Parameters:
- data_id (str) – ID of the data in the configuration file.
Returns:
- DataFrame – DataFrame for the given data_id, as set in the configuration file.
Source code in src/fowt_ml/pipeline.py
train_test_split
Splits the data into training and testing sets.
The data must be stored in self.data before this method is called. Any kwargs are passed on to sklearn.model_selection.train_test_split.
Source code in src/fowt_ml/pipeline.py
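Because the kwargs are forwarded to sklearn.model_selection.train_test_split, the underlying call looks roughly like the one below. The column names and the choice of shuffle=False are illustrative only. Keeping the original order is one option for time-ordered signals. This does not mean the package disables shuffling.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(100, 3)), columns=["wave_height", "wind_speed", "surge"]
)
x = data[["wave_height", "wind_speed"]]  # features
y = data[["surge"]]  # target(s)

# These kwargs are passed unchanged to sklearn's train_test_split.
tts_kwargs = {"test_size": 0.25, "shuffle": False}
x_train, x_test, y_train, y_test = train_test_split(x, y, **tts_kwargs)
print(len(x_train), len(x_test))  # 75 25
```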
get_models
Returns the models for the given model names.
Returns:
- dict – Dictionary of models.
Source code in src/fowt_ml/pipeline.py
setup
Set up the machine learning experiment:
- find the data
- split it into training and test sets
- set up the models for comparison
Parameters:
- data (DataFrame) – DataFrame containing the data.
Returns:
- Any – Experiment object or similar.
Source code in src/fowt_ml/pipeline.py
compare_models
Compares the models and returns the fitted models with their scores.
"model_fit_time" is reported in seconds.
Parameters:
- sort (str, default: 'r2') – Metric to sort the models by. Defaults to "r2".
- cross_validation (bool, default: False) – Whether to use cross-validation.
Returns:
- tuple (Any) – (dict of fitted models, pd.DataFrame of grid scores sorted by sort)
Source code in src/fowt_ml/pipeline.py
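The returned pair (fitted models, score table sorted by a metric) can be sketched with plain scikit-learn. The helper compare_models_sketch, the chosen metrics and the sort direction rule are assumptions for illustration. The real method reads its models and options from the configuration.

```python
import time

import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models_sketch(models, x_train, x_test, y_train, y_test, sort="r2"):
    """Fit each model, score it on the test split and rank the results."""
    fitted, rows = {}, []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(x_train, y_train)
        fit_time = time.perf_counter() - start  # wall-clock seconds
        y_pred = model.predict(x_test)
        rows.append(
            {
                "model": name,
                "r2": r2_score(y_test, y_pred),
                "mse": mean_squared_error(y_test, y_pred),
                "model_fit_time": fit_time,
            }
        )
        fitted[name] = model
    scores = pd.DataFrame(rows).set_index("model")
    # Higher is better for r2; lower is better for errors and fit time.
    return fitted, scores.sort_values(sort, ascending=(sort != "r2"))


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "mean_baseline": DummyRegressor(strategy="mean"),
}
fitted, scores = compare_models_sketch(models, x_train, x_test, y_train, y_test)
print(scores)
```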
Model modules
fowt_ml.base
This module defines the base class for all models in the fowt_ml package.
Classes:
- BaseModel – Base class for all models.
BaseModel
Base class for all models.
Methods:
- calculate_score – Calculate the score for the model using test data.
- cross_validate – Perform cross-validation on the model.
- use_scaled_data – Wrap the estimator to use scaled data for both X and y.
Source code in src/fowt_ml/base.py
calculate_score
calculate_score(x_train: ArrayLike, x_test: ArrayLike, y_train: ArrayLike, y_test: ArrayLike, scoring: str | Iterable) -> float | dict[str, float]
Calculate the score for the model using test data.
First, the model is fitted to the training data, and the time taken to
fit the model is recorded. Then, the model is scored using the provided
scoring method(s) on the test data.
In multi-output regression, 'uniform_average' is used by default, which computes a uniformly weighted mean over the outputs. See https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics
For an overview of the available scoring parameters, see https://scikit-learn.org/stable/modules/model_evaluation.html#string-name-scorers
Parameters:
- x_train (ArrayLike) – Training data for the features.
- x_test (ArrayLike) – Test data for the features.
- y_train (ArrayLike) – Training data for the targets.
- y_test (ArrayLike) – Test data for the targets.
- scoring (str | Iterable) – Scoring method(s) to use.
Returns:
- float | dict[str, float] – The calculated score(s).
Source code in src/fowt_ml/base.py
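The fit, time and score sequence described above can be sketched with sklearn.metrics.get_scorer. The example also checks the multi-output behaviour: the "r2" scorer equals the unweighted mean of the per-output R^2 values. The helper name calculate_score_sketch is hypothetical. Returning the fit time alongside the score is a simplification. The docstring only says the time is recorded.

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, r2_score


def calculate_score_sketch(estimator, x_train, x_test, y_train, y_test, scoring):
    """Fit on the training split, time the fit, then score on the test split."""
    start = time.perf_counter()
    estimator.fit(x_train, y_train)
    fit_time = time.perf_counter() - start  # seconds
    if isinstance(scoring, str):
        score = get_scorer(scoring)(estimator, x_test, y_test)
    else:
        score = {name: get_scorer(name)(estimator, x_test, y_test) for name in scoring}
    return score, fit_time


rng = np.random.default_rng(1)
x = rng.normal(size=(150, 3))
# Two targets, i.e. multi-output regression.
y = np.column_stack([x[:, 0] + 0.1 * rng.normal(size=150), x[:, 1] - x[:, 2]])
x_train, x_test, y_train, y_test = x[:100], x[100:], y[:100], y[100:]

model = LinearRegression()
scores, fit_time = calculate_score_sketch(
    model, x_train, x_test, y_train, y_test, scoring=["r2", "neg_mean_squared_error"]
)
# 'uniform_average' equals the plain mean of the per-output scores.
per_output = r2_score(y_test, model.predict(x_test), multioutput="raw_values")
print(scores, per_output, f"fit took {fit_time:.4f} s")
```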
cross_validate
cross_validate(x_train: ArrayLike, y_train: ArrayLike, scoring: str | Iterable, **kwargs: Any) -> dict[str, Any]
Perform cross-validation on the model.
Parameters:
- x_train (ArrayLike) – Feature data.
- y_train (ArrayLike) – Target data.
- scoring (str | Iterable) – Scoring method(s) to use.
- **kwargs (Any, default: {}) – Additional keyword arguments to pass to cross_validate.
Returns:
- dict[str, Any] – Dictionary containing the cross-validation results.
Source code in src/fowt_ml/base.py
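Assuming the method delegates to sklearn.model_selection.cross_validate (which the MLConfig.validate_cv_kwargs description suggests, but the docs do not state outright), the returned dictionary has the layout shown below. Here cv is an example of an extra keyword argument that **kwargs would forward.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate

rng = np.random.default_rng(0)
x = rng.normal(size=(120, 4))
y = x @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=120)

results = cross_validate(
    Ridge(alpha=1.0),
    x,
    y,
    scoring=["r2", "neg_mean_absolute_error"],
    cv=KFold(n_splits=5),  # an example of a forwarded keyword argument
)
# One entry per fold for each key: fit_time, score_time, test_<metric>.
print(sorted(results))
```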
use_scaled_data
Wrap the estimator to use scaled data for both X and y.
Source code in src/fowt_ml/base.py
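In scikit-learn, scaling both X and y around an estimator is commonly done by putting a StandardScaler in a pipeline for X and wrapping the result in TransformedTargetRegressor for y. Predictions then come back in the original units of y. This is one plausible realisation of use_scaled_data, not necessarily the package's own. The helper name with_scaled_data is hypothetical.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def with_scaled_data(estimator):
    """Standardize X inside a pipeline and y via TransformedTargetRegressor."""
    return TransformedTargetRegressor(
        regressor=make_pipeline(StandardScaler(), estimator),  # scales X
        transformer=StandardScaler(),  # scales y; predictions are inverse-transformed
    )


rng = np.random.default_rng(2)
x = rng.normal(loc=100.0, scale=20.0, size=(200, 2))
y = 5000.0 + 30.0 * x[:, 0] - 10.0 * x[:, 1] + rng.normal(scale=5.0, size=200)

model = with_scaled_data(Ridge(alpha=1.0)).fit(x, y)
pred = model.predict(x)  # back in the original units of y
print(round(model.score(x, y), 3))
```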
fowt_ml.linear_models
Module to handle linear models.
Classes:
- LinearModels – Class to handle linear models and metrics for comparison.
LinearModels
Bases: BaseModel
Class to handle linear models and metrics for comparison.
Source code in src/fowt_ml/base.py
fowt_ml.ensemble
Module to handle random forest models and metrics for comparison.
Classes:
- EnsembleModel – Class to handle random forest models and metrics for comparison.
EnsembleModel
Bases: BaseModel
Class to handle random forest models and metrics for comparison.
Methods:
- oob_score – Fit and estimate generalization score from out-of-bag samples.
Source code in src/fowt_ml/base.py
oob_score
Fit and estimate generalization score from out-of-bag samples.
Source code in src/fowt_ml/ensemble.py
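With scikit-learn's RandomForestRegressor, out-of-bag scoring is enabled with oob_score=True (this requires bootstrap=True). After fitting, oob_score_ holds the R^2 computed on samples that each tree did not see. The synthetic data below is illustrative, and whether EnsembleModel.oob_score uses exactly this estimator is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(300, 3))
# The third feature is deliberately irrelevant.
y = np.sin(x[:, 0]) + 0.5 * x[:, 1] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(
    n_estimators=200, bootstrap=True, oob_score=True, random_state=0
)
forest.fit(x, y)
# Generalization estimate without a separate validation split.
print(round(forest.oob_score_, 3))
```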
fowt_ml.neural_network
Module to handle Neural Network models.
Classes:
- GenericRNNModule
- NeuralNetwork – Class to handle Neural Network models and metrics for comparison.
Functions:
- create_skorch_regressor – Create a skorch NeuralNetRegressor with a specified RNN model.
- RNNRegressor – Create a skorch NeuralNetRegressor with a standard RNN model.
- LSTMRegressor – Create a skorch NeuralNetRegressor with an LSTM model.
- GRURegressor – Create a skorch NeuralNetRegressor with a GRU model.
GenericRNNModule
Bases: Module
Methods:
- forward – Forward pass of the RNN module.
Source code in src/fowt_ml/neural_network.py
forward
Forward pass of the RNN module.
Source code in src/fowt_ml/neural_network.py
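A generic recurrent module that can switch between RNN, LSTM and GRU cells might look like the PyTorch sketch below. The class name RNNSketch, its arguments and the "last time step plus linear head" design are assumptions. They are not taken from GenericRNNModule. The example requires torch.

```python
import torch
from torch import nn


class RNNSketch(nn.Module):
    """Hypothetical generic recurrent regressor: RNN/LSTM/GRU + linear head."""

    def __init__(self, rnn_type="LSTM", input_size=4, hidden_size=16, output_size=2, num_layers=1):
        super().__init__()
        rnn_cls = {"RNN": nn.RNN, "LSTM": nn.LSTM, "GRU": nn.GRU}[rnn_type]
        self.rnn = rnn_cls(input_size, hidden_size, num_layers=num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, _ = self.rnn(x)  # out: (batch, seq_len, hidden_size); hidden state ignored
        return self.head(out[:, -1, :])  # last time step -> (batch, output_size)


model = RNNSketch("GRU")
pred = model(torch.randn(8, 10, 4))
print(pred.shape)  # torch.Size([8, 2])
```

A module of this kind could then be passed as the module argument of skorch's NeuralNetRegressor. This is presumably what create_skorch_regressor does, but that is not confirmed here.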
NeuralNetwork
Bases: BaseModel
Class to handle Neural Network models and metrics for comparison.
Source code in src/fowt_ml/base.py
create_skorch_regressor
Create a skorch NeuralNetRegressor with a specified RNN model.
Source code in src/fowt_ml/neural_network.py
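RNNRegressor
Create a skorch NeuralNetRegressor with a standard RNN model.
LSTMRegressor
Create a skorch NeuralNetRegressor with an LSTM model.
GRURegressor
Create a skorch NeuralNetRegressor with a GRU model.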
fowt_ml.gaussian_process
Module for sparse Gaussian processes for multi-output regression problems.
Classes:
- MultitaskGPModelApproximate – Multitask GP model with approximate inference.
- SklearnGPRegressor – Sklearn wrapper for MultitaskGPModelApproximate.
- SparseGaussianModel – Class to handle sparse Gaussian process regression.
MultitaskGPModelApproximate
Bases: ApproximateGP
Multitask GP model with approximate inference.
This model captures the similarities/correlations between the outputs simultaneously. Each output dimension (task) is a linear combination of a set of latent functions. Based on the example at https://docs.gpytorch.ai/en/stable/examples/04_Variational_and_Approximate_GPs/SVGP_Multitask_GP_Regression.html#Types-of-Variational-Multitask-Models
Methods:
- forward – Forward pass of the model.
Source code in src/fowt_ml/gaussian_process.py
SklearnGPRegressor
Bases: RegressorMixin, BaseEstimator
Sklearn Wrapper for MultitaskGPModelApproximate.
Methods:
- fit – Fit the model to the training data.
- predict – Make predictions using the trained model.
- score – Return the R^2 score of the prediction.
Source code in src/fowt_ml/gaussian_process.py
fit
fit(x_train: ArrayLike, y_train: ArrayLike) -> SklearnGPRegressor
Fit the model to the training data.
Source code in src/fowt_ml/gaussian_process.py
predict
Make predictions using the trained model.
Source code in src/fowt_ml/gaussian_process.py
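The wrapper pattern used by SklearnGPRegressor (bases RegressorMixin and BaseEstimator, with fit returning self) is what makes a model usable by scikit-learn tools. RegressorMixin supplies score as R^2. BaseEstimator supplies get_params, so clone works. The toy least-squares regressor below demonstrates only this pattern. LstsqRegressor is hypothetical and contains no Gaussian process code.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class LstsqRegressor(RegressorMixin, BaseEstimator):
    """Toy sklearn-compatible wrapper around np.linalg.lstsq."""

    def __init__(self, fit_intercept=True):
        self.fit_intercept = fit_intercept  # stored as-is for get_params/clone

    def _design(self, x):
        return np.hstack([np.ones((x.shape[0], 1)), x]) if self.fit_intercept else x

    def fit(self, x_train, y_train):
        x_train, y_train = check_X_y(x_train, y_train, multi_output=True, y_numeric=True)
        self.coef_, *_ = np.linalg.lstsq(self._design(x_train), y_train, rcond=None)
        return self  # sklearn convention: fit returns the estimator

    def predict(self, x):
        check_is_fitted(self, "coef_")
        return self._design(check_array(x)) @ self.coef_


rng = np.random.default_rng(3)
x = rng.normal(size=(50, 3))
y = x @ rng.normal(size=(3, 2)) + 1.0 + 0.01 * rng.normal(size=(50, 2))  # two outputs

model = LstsqRegressor().fit(x, y)
print(model.score(x, y))  # R^2 inherited from RegressorMixin
```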
SparseGaussianModel
Bases: BaseModel
Class to handle sparse Gaussian process regression.
Source code in src/fowt_ml/base.py
fowt_ml.xgboost
The module for XGBoost model training and evaluation.
Classes:
- XGBoost – Class to handle XGBoost models and metrics for comparison.
XGBoost
Bases: BaseModel
Class to handle XGBoost models and metrics for comparison.