Manish Barnwal

...just another human

Improve runtime of Random Forest in R

There are two ways one can write the code to train a random forest model in R. Both the ways are listed below.

A normal and frequent way of writing the command to train the random forest model is something like this.

rfModel <- randomForest(Survived~. , data = trainSample[, -c(6, 8 ...

How to install a package of a particular version in R

I recently tried installing caret package in R using

install.packages('caret', dependencies=T)

Normally this installation of package works and I continue to work with the functions associated with the package. When I tried including the package using

library(caret)

I got the following error.

Error in loadNamespace(j ...

Don’t introduce that bias in your child

A thought crossed my mind yesterday and just when it was about to get lost in the cloud of many thoughts that scatters in mind…I caught hold of it. I felt I had gathered the maximum out of this thought but I wanted to write about it so that ...

Shell commands come in handy for a data scientist

I am no expert of shell commands. I have been using them for quite some time and thought I give an attempt to list down the most common commands. I am writing these mostly from the perspective of a data-science guy. Let us get started.

I will use the file- ...

ROC and AUC - The three lettered acronyms

I don't feel bad to confess this that ROC curve, AUC, True-positive and related terms took quite some time for me to understand. If today I contemplate on the reasons why I found this topic confusing. The first would be there are not many resources that explains intuitively what ...

Vim/Vi editor shortcuts

Repetitive tasks should be done using as many shortcuts as possible. You are not doing anything new and hence not even an extra minute should be spent on doing the same. This post refers to the shortcuts that come in handy when working on the vi/vim editor.

This is ...

Hadoop Streaming

A few days ago, I had written a post on The Big Data Problem which attempted to understand why we need big data and what the fuss is all about. You may want to read it here.

Having understood why we need big data, let’s understand how we can ...

When R package is not available across the cluster

When deploying R codes across the cluster, many a times the reason for the failure of the task is unavailability of a particular package across all nodes of the cluster. We wait for someone to get the package installed across all the nodes. This may take some days. Do we ...