Random Forest For Complete Beginners

Photo by Alex Knight on Unsplash


The Random Forest algorithm is a supervised machine learning algorithm used for both classification and regression problems. A random forest is an ensemble of decision trees, each trained on a different random subset of the dataset; their predictions are combined (by voting or averaging) to improve predictive accuracy.

The main function of the random forest algorithm is to predict outcomes for new data based on patterns learned from the training data, and it works especially well on complex datasets.

Things we need to understand about the random forest algorithm:

  • Ensemble learning: A random forest is like a democracy where the majority wins. It contains many decision trees that work together to make a prediction, and the most frequent prediction among the trees becomes the final answer.

  • Randomness: While a single decision tree is trained on one fixed dataset, a random forest introduces randomness into the training data (and into the features each tree sees), which helps it overcome overfitting.
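To make the ensemble-learning idea concrete, here is a minimal sketch of majority voting. The tree predictions below are hypothetical values, standing in for the outputs of five individual trees:

```python
from collections import Counter

# Hypothetical predictions from five individual decision trees for one sample
tree_predictions = [1, 0, 1, 1, 0]

# Majority vote: the most common prediction becomes the forest's final answer
final_prediction = Counter(tree_predictions).most_common(1)[0][0]
print(final_prediction)  # class 1 wins with 3 votes out of 5
```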

Features of Random Forest algorithm:

  • Multiple trees prevent any single tree's mistake from swaying the final outcome.

  • By introducing randomness during training, the forest avoids memorizing noise in the data, which means it is less prone to overfitting.

  • It works well without a lot of tweaking, making it user-friendly.

Important terms to know:

  • Entropy: A measure of the randomness and unpredictability of the data.

  • Leaf-node: Node that carries the classification or the decision.

  • Root-node: Node where the branching starts.

  • Decision-node: A node that has two or more branches.

  • Information gain: The decrease in entropy after the dataset is split on a feature; the feature that yields the highest information gain is chosen for the split.
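Entropy and information gain can be computed in a few lines. This is a small illustrative sketch using made-up labels, where a parent node with a 50/50 class mix is split perfectly into two pure children:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical parent node with a 50/50 class mix
parent = [1, 1, 0, 0]
# A split that separates the classes perfectly
left, right = [1, 1], [0, 0]

# Information gain = parent entropy - weighted average of child entropies
gain = entropy(parent) \
       - (len(left) / len(parent)) * entropy(left) \
       - (len(right) / len(parent)) * entropy(right)
print(gain)  # 1.0 bit: a perfect split removes all uncertainty
```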

The following steps explain the working of the Random Forest algorithm:

Step 1: Select random samples from a given data or training set.

Step 2: The algorithm constructs a decision tree for each of these random samples.

Step 3: Each decision tree votes on the outcome (for classification) or its predictions are averaged (for regression).

Step 4: Finally, the most-voted prediction is selected as the final result.
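The four steps above can be sketched from scratch. The function names `simple_random_forest` and `forest_predict` are hypothetical, chosen here for illustration; this toy version uses only bootstrap sampling and majority voting, not the per-split feature randomness a full implementation adds:

```python
import random
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def simple_random_forest(X, y, n_trees=5):
    """Steps 1-2: train one decision tree per random bootstrap sample."""
    trees = []
    for _ in range(n_trees):
        # Step 1: draw a random sample (with replacement) from the training set
        idx = [random.randrange(len(X)) for _ in range(len(X))]
        X_sample = [X[i] for i in idx]
        y_sample = [y[i] for i in idx]
        # Step 2: construct a decision tree on this sample
        trees.append(DecisionTreeClassifier().fit(X_sample, y_sample))
    return trees

def forest_predict(trees, x):
    """Steps 3-4: each tree votes; the most-voted prediction wins."""
    votes = [tree.predict([x])[0] for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```

For example, a forest trained on points 0-4 labeled 0 and points 5-9 labeled 1 should classify a new point near 9 as class 1.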

This is the most basic code performing random forest classification:

from sklearn.ensemble import RandomForestClassifier

# Define some sample data (replace with your actual data)
X = [[1, 2], [3, 4], [5, 6], [7, 8]]  # Features
y = [1, 0, 1, 0]  # Labels (e.g., 1 for positive, 0 for negative)

# Create a Random Forest Classifier with 100 trees
model = RandomForestClassifier(n_estimators=100)

# Train the model on the data
model.fit(X, y)

# Make a prediction on new data (replace with your new data)
new_data = [[9, 10]]
prediction = model.predict(new_data)

print("Predicted label:", prediction)
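In practice you would also hold out some data to check how well the model generalizes. Here is a sketch of the same workflow on scikit-learn's built-in iris dataset, used here only as a stand-in for your own data, with a train/test split and an accuracy score:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the classic iris dataset (replace with your actual data)
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Train a forest of 100 trees; random_state makes the run reproducible
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the unseen test set (typically around 0.9 or higher on iris)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```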