Blog of Sara Jakša

Creating CSV Word Frequency Table with Python

Now I have finally come to the step where I could create the word frequency table. I tried various ways, again, but I ended up using Python.

Here is the code:

    from collections import defaultdict

    filenames = ["ENFJ-2.csv",
                 "ESFJ-2.csv",
                 "INFJ-2.csv",
                 "ISFJ-2.csv",
                 "ENFP-2.csv",
                 "ESFP-2.csv",
                 "INFP-2.csv",
                 "ISFP-2.csv",
                 "ENTJ-2.csv",
                 "ESTJ-2.csv",
                 "INTJ-2.csv",
                 "ISTJ-2.csv",
                 "ENTP-2.csv",
                 "ESTP-2.csv",
                 "INTP-2.csv",
                 "ISTP-2.csv",
                 ]

    outfile = "table-freq.csv"

    # Maps each word to a plain dict of {type name: total frequency}.
    allwords = defaultdict(dict)

    for filename in filenames:
        with open(filename, "r") as infile:
            lines = infile.readlines()

        typename = filename[:4]  # e.g. "ENFJ" from "ENFJ-2.csv"

        # Skip the first line (presumably a header); the rest are "word<TAB>frequency".
        for line in lines[1:]:
            word, freq = line.split("\t")
            word = word.strip()
            freq = int(freq.strip())
            if typename not in allwords[word]:
                allwords[word][typename] = 0
            allwords[word][typename] += freq

    # Row 0 holds the words; rows 1 to 16 hold the counts for one type each.
    allwordslist = [[] for _ in range(17)]

    for word, typedict in allwords.items():
        allwordslist[0].append(word)
        for i, typename in [(1, "ESFJ"), (2, "INFJ"), (3, "ENFJ"), (4, "ISFJ"), (5, "ENFP"), (6, "ESFP"), (7, "INFP"), (8, "ISFP"), (9, "ENTJ"), (10, "ESTJ"), (11, "INTJ"), (12, "ISTJ"), (13, "ENTP"), (14, "ESTP"), (15, "INTP"), (16, "ISTP")]:
            if typename not in typedict:
                typedict[typename] = 0
            allwordslist[i].append(typedict[typename])

    with open(outfile, "w") as write:
        for line in allwordslist:
            write.write("\t".join([str(element) for element in line]) + "\n")

At this point I hardcoded some of the variables, because I was starting to feel that I had spent too much time on this path. And I was right, as I did not end up using it.
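For anybody who wants to avoid the hardcoding, here is a minimal sketch of how it could look. It is not the code I ran: the function names `merge_counts` and `to_rows` and the sample data are made up for illustration, it assumes the header line has already been removed, and it writes one row per word instead of one row per type.

```python
from collections import Counter, defaultdict


def merge_counts(lines_by_type):
    """Merge per-type "word<TAB>freq" lines into {word: Counter({type: freq})}."""
    table = defaultdict(Counter)
    for typename, lines in lines_by_type.items():
        for line in lines:
            word, freq = line.rstrip("\n").split("\t")
            table[word.strip()][typename] += int(freq)
    return table


def to_rows(table, typenames):
    """Yield one row per word: the word, then its count for each type."""
    for word, counts in table.items():
        # A Counter returns 0 for types in which the word never appeared.
        yield [word] + [counts[t] for t in typenames]


# Hypothetical sample data standing in for the per-type files.
sample = {
    "INTJ": ["plan\t3\n", "idea\t2\n"],
    "ENFP": ["idea\t5\n"],
}
typenames = sorted(sample)  # column order derived from the data, not hardcoded
rows = list(to_rows(merge_counts(sample), typenames))
```

Because the column order comes from the data itself, adding or removing a type would not require touching any list of names by hand.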

But if anybody is interested, here is the link to the word frequency file. You can find the order of the types hidden in the code.
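To make the layout less hidden: the file has 17 tab-separated lines, the first with all the words and the next 16 with the counts for one type each, in the order used in the code above. A small sketch for reading it back could look like this. The name `parse_table` is made up, and I am assuming the file was produced by exactly the script above.

```python
# Order of the count rows, copied from the loop in the script above.
TYPE_ORDER = ["ESFJ", "INFJ", "ENFJ", "ISFJ", "ENFP", "ESFP", "INFP", "ISFP",
              "ENTJ", "ESTJ", "INTJ", "ISTJ", "ENTP", "ESTP", "INTP", "ISTP"]


def parse_table(lines):
    """Turn the 17 lines of the frequency file into {word: {type: freq}}."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    words, count_rows = rows[0], rows[1:]
    return {
        word: {t: int(row[i]) for t, row in zip(TYPE_ORDER, count_rows)}
        for i, word in enumerate(words)
    }


# Usage, assuming table-freq.csv is in the current directory:
# with open("table-freq.csv") as infile:
#     table = parse_table(infile)
```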