Blog of Sara Jakša

Splitting the StackOverflow Data

For our Data Mining project, my teammate and I were analysing the StackOverflow data on Kaggle. But she was dead set on not installing anything on her computer, so working on the data locally was not an option for her, and the Data Science workbench that she was using instead had a stingy file limit.

I know there is a reason for that limit, as they don't want free customers abusing the service. I just find it impractical, so I am not using it.

Either way, we needed a workaround, so I wrote a script that filters the data by tag and writes the matching rows to new, smaller files. This way she could at least do language-specific analysis. The code for this can be found here
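To give a rough idea of what filtering by tag looks like, here is a minimal sketch. It is not the script mentioned above, just an illustration under some assumptions: the questions and the tags live in two separate CSV files, both files share an id column, and the tags file has one row per question and tag pair. The column names `Id` and `Tag`, the function names and the UTF-8 encoding are my assumptions for the example, so adjust them to whatever the actual files use.

```python
import csv


def ids_with_tag(tags_path, tag, id_col="Id", tag_col="Tag"):
    """Collect the ids of all questions that carry the given tag."""
    with open(tags_path, newline="", encoding="utf-8") as f:
        return {row[id_col] for row in csv.DictReader(f) if row[tag_col] == tag}


def filter_by_tag(questions_path, tags_path, tag, out_path,
                  id_col="Id", tag_col="Tag"):
    """Copy only the questions with the given tag to out_path.

    Returns the number of rows written. Column names are assumptions.
    """
    wanted = ids_with_tag(tags_path, tag, id_col, tag_col)
    written = 0
    with open(questions_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # Reuse the input header so the output has the same columns.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        # Stream row by row, so the full questions file never sits in memory.
        for row in reader:
            if row[id_col] in wanted:
                writer.writerow(row)
                written += 1
    return written
```

Calling something like `filter_by_tag("Questions.csv", "Tags.csv", "python", "Questions_python.csv")` (the file names are placeholders) would then produce one smaller file per language. Only the set of matching ids is kept in memory, which matters when the questions file is the big one.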