
Ensemble methods in machine learning

Ensemble methods are a powerful technique in machine learning and deep learning that combine the predictions of multiple models to create a more accurate and robust final prediction. They are particularly effective at improving model performance, reducing overfitting, and achieving a better balance between bias and variance.


BAGGING (Bootstrap Aggregating) 


The goal of bagging is to reduce variance and improve generalization. Bagging involves training multiple independent models on different subsets of the training data, typically generated through bootstrapping (sampling with replacement). Each model is trained in parallel, and their predictions are combined, usually by averaging (for regression) or majority voting (for classification).


[Image: Bagging in ML]

In detail, each model is trained on a random subset of the data sampled with replacement, meaning that individual data points can be chosen more than once. This random subset is known as a bootstrap sample. By training models on different bootstrap samples, bagging reduces the variance of the individual models.


The predictions from all the sampled models are then combined, usually through simple averaging or voting, to make the overall prediction. This way, the aggregated model incorporates the strengths of the individual ones and cancels out their errors.
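To make this concrete, here is a minimal bagging sketch in Python using scikit-learn decision trees. The toy dataset, the number of models, and the majority-vote combination are illustrative assumptions, not details from the post.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy classification data; replace with your own dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_models = 10
models = []

# Train each model on a bootstrap sample (drawn with replacement).
for _ in range(n_models):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Combine predictions by majority vote (averaging would be used for regression).
votes = np.stack([m.predict(X_test) for m in models])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Bagged accuracy:", (y_pred == y_test).mean())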


Example: Random Forest is a popular bagging method where multiple decision trees are trained, and their outputs are averaged (or majority-voted) to make the final prediction.
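A Random Forest can be trained in a few lines with scikit-learn; the dataset and hyperparameters below are placeholders for illustration only.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each trained on a bootstrap sample with random feature subsets.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))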


Benefits: Reduces overfitting by averaging out the errors of individual models. Improves model stability and robustness.


BOOSTING 


The goal of boosting is to reduce bias and improve generalization. Boosting involves training models sequentially, where each new model focuses on the mistakes made by previous models, i.e., models are trained and evaluated one after another. The predictions of all models are combined, often by weighted voting or summation, to produce the final output.
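The sequential idea can be sketched as follows: each new model is fit to the errors left by the ensemble so far, and its scaled prediction is added to the running total. This is a simplified least-squares boosting sketch on toy regression data, assuming small decision trees as the weak learners; none of these specifics come from the post itself.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data; in practice X, y come from your own problem.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50
prediction = np.zeros_like(y)
stages = []

# Each round fits a small tree to the residuals (the mistakes of the current
# ensemble), then adds its scaled prediction to the ensemble's running output.
for _ in range(n_rounds):
    residual = y - prediction
    stump = DecisionTreeRegressor(max_depth=2, random_state=0)
    stump.fit(X, residual)
    prediction += learning_rate * stump.predict(X)
    stages.append(stump)

print("Training MSE after boosting:", np.mean((y - prediction) ** 2))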


Since the models are trained sequentially, boosting generally takes more time to train than parallel methods such as bagging.



[Image: Boosting in ML]

Example: AdaBoost and Gradient Boosting Machines (GBM) are popular boosting techniques. In these methods, each subsequent model attempts to correct the errors of the previous ones, resulting in a strong final model.
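Both are available off the shelf in scikit-learn; the snippet below is only a usage sketch with placeholder data and illustrative hyperparameters.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# AdaBoost reweights training points that earlier models misclassified;
# gradient boosting fits each new tree to the gradient of the loss.
ada = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("AdaBoost accuracy:", ada.score(X_test, y_test))
print("Gradient Boosting accuracy:", gbm.score(X_test, y_test))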


Benefits: Can improve performance on complex datasets by iteratively refining predictions. Capable of converting weak learners into a strong ensemble.
