Blog of Sara Jakša

How to Put Textdata to SQLite with Python

After I had the data that I wanted, I started to do some simple analysis.

Considering the size of the data, I figured that R is most likely not the right way to go, since this was the only time before this project that I had memory problems.

So the first idea that I ended up using was Orange. They have an addon Textable, which has some functions for analysing text. And they had some text analysis tutorials on the site. But I was quickly getting into serious memory problems.

Then I tired following along with some of the tutorials that I found on the internet, but usually I got into memory problems pretty quickly in the analysis. Yes, I am aware that my computer is not the best one there is.

Then I borrowed a book that had a chapter on the text analysis as well. There was an example of classifying Reddit posts, but they were saving them to the SQLite. I am aware of the SQL, since I actually had to listen to it in school.

So I decided to get all the data in the SQL file. The python code that I used for this is used below.

    import sqlite3

    filenames = ["ENFJ",
                 "ENFP",
                 "ENTJ",
                 "ENTP",
                 "ESFJ",
                 "ESFP",
                 "ESTJ",
                 "ESTP",
                 "INFJ",
                 "INFP",
                 "INTJ",
                 "INTP",
                 "ISFJ",
                 "ISFP",
                 "ISTJ",
                 "ISTP"]

    database = "mbti-posts.db"

    conn = sqlite3.connect(database)
    c = conn.cursor()

    c.execute("""CREATE TABLE personalitycafe(id INTEGER PRIMARY KEY, text TEXT, type CHAR(5))""")

    for filename in filenames:

        with open(filename, "r") as read:
            content = read.readlines()

        content = "".join(content)
        content = content.split("\n\n\n")

        filename = '"' + filename + '"'

        for element in content:
            element = element.strip()
            if not element:
                continue
            element = element.replace("\n", " ")
            element = element.replace('"', "'")
            element = '"' + element + '"'
            string = """INSERT INTO personalitycafe (text, type) VALUES (""" + element + """, """ + filename + """)"""
            c.execute(string)

        print(filename)
        conn.commit()