Blog of Sara Jakša

How I Used Vim to Clean MBTI Data

Earlier, I collected the Personality Cafe posts in order to analyse them. But when I was checking the files over, I realized that there was still some JavaScript code included in them.

Thankfully, after analysing it a little, I figured out that its placement was really convenient. It was always at the end of a post, and it always started in the same way: "(function(w,d,s,i)"

I first tried to use a Python script, but then I realized that I was most likely overcomplicating things. This would be simple if I could just use search and replace. But gedit, the program that I normally use, had problems dealing with 30MB+ files.

Then I remembered that I had seen some examples of people using Vim to clean big files. I figured that it could not be that hard.

    :%s/(function(w,d,s,i)/\r(function(w,d,s,i)/g 
    :g/(function(w,d,s,i)/d

The first line finds the beginning of the JavaScript and moves it onto a new line.

    :%s/what to find/what to replace with/g 

The thing to be careful about here is that, in the replacement part, a new line in Vim is \r, and not \n like in Python.

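The second line simply deletes every whole line that contains the pattern. It matches the pattern anywhere in the line, not just at the start. In my case this made no difference, because after the first command the pattern always sat at the beginning of its line.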

    :g/what line to delete/d
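
For comparison, here is a rough Python sketch of what the two Vim commands do together. It is not the script I originally tried. The function name `strip_script` is made up for illustration, and it assumes the whole text fits into memory as one string.

```python
# The marker that every embedded JavaScript snippet started with.
MARKER = "(function(w,d,s,i)"


def strip_script(text, marker=MARKER):
    """Mimic the two Vim commands on a string (hypothetical helper)."""
    # Step 1, like :%s/marker/\rmarker/g
    # Put every occurrence of the marker at the start of its own line.
    broken = text.replace(marker, "\n" + marker)
    # Step 2, like :g/marker/d
    # Drop every line that contains the marker anywhere in it.
    kept = [line for line in broken.split("\n") if marker not in line]
    return "\n".join(kept)
```

Calling `strip_script(text)` on the contents of one file should keep the post text in front of the marker and drop everything from the marker to the end of that line.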

Vim is good for manipulating a small number of big files. But I had 16 files, and I almost lost track of which ones I had already gone through.
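
With that many files, a small loop could do the bookkeeping instead. The following is only a sketch under a few assumptions: the files are UTF-8 text, they all sit in one folder, and they match a pattern like `*.txt`. The names `clean_file` and `clean_folder` are made up for illustration. It overwrites the files in place, so it would be wise to work on copies.

```python
from pathlib import Path

# The marker that every embedded JavaScript snippet started with.
MARKER = "(function(w,d,s,i)"


def clean_file(path, marker=MARKER):
    """Cut each line at the first marker and rewrite the file in place."""
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    cleaned = []
    for line in lines:
        head, found, _rest = line.partition(marker)
        # If the marker was found, keep only the text before it and
        # restore the line ending that was cut off with the script.
        cleaned.append(head + "\n" if found else line)
    path.write_text("".join(cleaned), encoding="utf-8")


def clean_folder(folder, pattern="*.txt"):
    """Clean every matching file and return the names that were processed."""
    done = []
    for path in sorted(Path(folder).glob(pattern)):
        clean_file(path)
        done.append(path.name)
    return done
```

The returned list is the record of which files have already been handled, which is exactly the part I kept losing track of by hand.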