Blog of Sara Jakša

How Many Pieces of Knowledge from Different Places are Hidden in my Code

For a hobby programmer like me, Google is like the Bible for a fundamentalist: when they don't have an answer for something, they go and look it up in the Bible. When I have a programming problem, I go and find the answer by googling it.

So I was wondering how many websites I usually have to access in order to program something. I had a feeling that the number was quite high, but I did not know how high.

To figure that out, I decided to try to program a factor analysis in Python. I wanted to get the loadings of the variables on the different factors, and I continued until I had working code that could print the loadings in the terminal. I only counted the websites whose solutions ended up in the code. There were more, but they were dead ends, so I did not include them.

For the analysis I used the answers to the Big Five personality questionnaire that I found on http://personality-testing.info/_rawdata/.

This is the final code that I ended up with:

    from sklearn.decomposition import FactorAnalysis
    import numpy
    from rpy2.robjects.packages import importr
    from rpy2.robjects import r, numpy2ri

    data = numpy.genfromtxt('data.csv', delimiter='\t')

    for i in [6,5,4,3,2,1,0]:
        data = numpy.delete(data, i, 1)
    data = numpy.delete(data, 0, 0)

    numpy2ri.activate()
    fit = r.factanal(data, 5, rotation="varimax")
    results = fit.rx('loadings')
    print(results)

I used 7 different websites to write these 12 lines of code, which means that I needed to check one webpage for every 1.7 lines of code.
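As an aside, the seven numpy.delete calls that drop the first seven columns and the header row could also be done with a single slice, which avoids having to loop over the indices in descending order. A minimal sketch with a toy array standing in for the parsed CSV:

```python
import numpy

# toy array standing in for the parsed CSV: 3 rows, 10 columns
raw = numpy.arange(30).reshape(3, 10)

# drop the first seven columns and the first (header) row in one step
trimmed = raw[1:, 7:]
print(trimmed.shape)  # (2, 3)
```

This does the same trimming as the delete loop above, just in one expression instead of eight calls.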

Here is the same code, with each website listed above the piece of code it was used for:

    from sklearn.decomposition import FactorAnalysis
    import numpy

    #http://stackoverflow.com/questions/3518778/how-to-read-csv-into-record-array-in-numpy
    data = numpy.genfromtxt('data.csv', delimiter='\t')

    #http://stackoverflow.com/questions/24898754/delete-dimension-of-array
    #http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html
    for i in [6,5,4,3,2,1,0]:
        data = numpy.delete(data, i, 1)
    data = numpy.delete(data, 0, 0)

    #http://stackoverflow.com/questions/25036588/extract-correlation-matrix-from-rs-factanal-via-rpy
    from rpy2.robjects import r, numpy2ri
    numpy2ri.activate()

    #http://blog.yhat.com/posts/rpy2-combing-the-power-of-r-and-python.html
    from rpy2.robjects.packages import importr
    #http://www.statmethods.net/advstats/factor.html
    fit = r.factanal(data, 5, rotation="varimax")
    #http://stackoverflow.com/questions/27575848/how-to-convert-rpy2-listvector-rpy2-robjects-vectors-listvector-to-python
    results = fit.rx('loadings')
    print(results)
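Interestingly, the sklearn import at the top never gets used; I ended up calling R's factanal through rpy2 instead. For completeness, here is a sketch of how the loadings could be obtained with scikit-learn's own FactorAnalysis, on random stand-in data rather than the questionnaire file. Note that older scikit-learn versions do not rotate the factors, so the numbers will not match a varimax-rotated factanal solution; newer versions accept a rotation="varimax" argument.

```python
from sklearn.decomposition import FactorAnalysis
import numpy

# random stand-in for the questionnaire answers: 100 respondents, 10 items
rng = numpy.random.default_rng(0)
data = rng.normal(size=(100, 10))

fa = FactorAnalysis(n_components=5)
fa.fit(data)

# components_ has one row per factor; transpose to get
# one row per variable and one column per factor, like factanal's loadings
loadings = fa.components_.T
print(loadings.shape)  # (10, 5)
```

This keeps everything in Python, at the cost of the rotation options that R's factanal provides out of the box.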

I knew that I use the internet as a crutch a lot, but these results were still surprising. I don't really want to believe that I use it that much, but at least in this case the data shows it. Maybe I should rethink the way I program…