Blog of Sara Jakša

The Summary of Year 2018

I think that by this time, the whole Earth has entered the year 2019. Happy new year to anybody reading this right now. For the beginning of the new year, I thought I would write a short numerical summary of what I did in the past year.

For this, I can only summarize the things that I do a good job of keeping track of. That means the only things I can describe are my internet use, the number of blog posts, and the amount of material read.

Blog Posts

Let me start with the blog. I have recently changed the way I show the blog posts here. I am using nikola now, which means that if I wanted to keep the links the same as when I was using htmly, I needed to put the blog posts in year/month folders. This made the analysis a lot easier, since I only needed to count the number of files in each folder.

import os

post_folder = "/mnt/Stable/sarajaksa.eu-nikola/posts/2018"

months = os.listdir(post_folder)

number_of_posts = dict()

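# Count the posts in each month folder (the folder names are the month numbers)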
for month in months:
    posts = os.listdir(post_folder + "/" + month)
    number_of_posts[int(month)] = len(posts)

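# Write the counts to a CSV, using zero for the months without any posts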
with open("blog_posts.csv", "w") as f:
    all_posts = 0
    f.write("Month;PostNumber\n")
    for i in range(1, 13):
        if i not in number_of_posts:
            post_number = 0
        else:
            post_number = number_of_posts[i]
        f.write(str(i) + ";" + str(post_number) + "\n")
        all_posts += post_number
    print("I had written " + str(all_posts) + " blog posts in year 2018.")

In the year 2018 I wrote 58 blog posts. Here is the breakdown by month:

Month Posts Written
January 2
February 0
March 3
April 1
May 6
June 3
July 2
August 7
September 7
October 9
November 11
December 7

For the more visual types, there is also an easy way to create a graph:

library(tidyverse)

data <- read.csv("blog_posts.csv", sep=";", header=TRUE)

png("number_of_blog_posts_2018.png")
ggplot(data, aes(x=Month, y=PostNumber)) + 
    geom_line(color="blue") + 
    xlab("Months") + 
    ylab("Number of Posts Written") +
    ggtitle("Blogging in 2018")
dev.off()

graph of the number of posts written per month (the data is in the table above)

Reading Material

Ever since the later months of 2017, I have kept track of all the books, scientific articles and similar material (like theses) that I read. I simply put them in a .bib file with a timestamp that indicates when I finished the material. This means that for the year 2018, I have the information for every book and every scientific article that I finished.
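
For illustration, an entry in one of these .bib files looks roughly like this (the key and the other fields are made up; only the entry type and the timestamp matter for the script below):

@Article{example2018,
  author    = {Some Author},
  title     = {Some Title},
  journal   = {Some Journal},
  year      = {2018},
  timestamp = {2018-05-14},
}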

The script this time is a bit longer, since I needed to parse the bib files as well:

import re


def get_bib_entries_by_date(list_of_files, date_to_match):
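    # Regular expressions for one whole bib entry, its timestamp field,
    # the date inside the timestamp, and the entry type (e.g. Article, Book)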
    re_entry = r"@\w*{.+?timestamp.+?}"
    re_timestamp = r"timestamp[\s]+?=[\s]+?{.+?}"
    re_date = r"\d\d\d\d-\d\d-\d\d"
    re_type = r"@\w*{"

    lines = ""

    for bib_file in list_of_files:
        with open(bib_file) as f:
            lines2 = f.readlines()

        lines2 = " ".join(lines2)
        lines = lines + lines2

    lines = lines.replace("\n", " ")

    lines = re.findall(re_entry, lines)
    all_entries = dict()

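    # Count, per entry type, how many items have a timestamp in the month we are matching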
    for line in lines:
        timestamp = re.findall(re_timestamp, line)[0]
        date = re.findall(re_date, timestamp)
        if date:
            date = date[0]
        entry = re.findall(re_type, line)[0][1:-1]
        if date_to_match in date:
            if entry not in all_entries:
                all_entries[entry] = 0
            all_entries[entry] += 1

    return all_entries


def dictionary_to_csv(dictionary, filename):
    with open(filename, "w") as f:
        f.write("Month;Entry;Count\n")
        for month in dictionary:
            for entry in dictionary[month]:
                f.write(
                    month + ";" + entry + ";" + str(dictionary[month][entry]) + "\n"
                )


def get_entries_by_months(filenames, year, filename):
    months = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"]
    all_results = dict()
    for month in months:
        all_results[month] = get_bib_entries_by_date(filenames, year + "-" + month)
    dictionary_to_csv(all_results, filename)


get_entries_by_months(
    ["Articles.bib", "Books.bib", "Conferences.bib", "Predelano.bib", "Thesis.bib"],
    "2018",
    "2018.csv",
)

This got me a CSV file that I could then analyse and visualize in R. This is the code that I used:

library(tidyverse)

data <- read.csv("2018.csv", sep=";", header=TRUE)

data_articles <- data %>% 
    filter(Entry=="Article" | Entry=="InProceedings") %>% 
    group_by(Month) %>% 
    summarize(Count=sum(Count), Entry="Article")
data_books <- data %>% 
    filter(Entry=="Book") %>% 
    group_by(Month) %>% 
    summarize(Count=sum(Count), Entry="Book")
data_chapters <- data %>% 
    filter(Entry=="InBook" | Entry=="InCollection") %>% 
    group_by(Month) %>% 
    summarize(Count=sum(Count), Entry="Chapter")
data_other <- data %>% 
    filter(Entry=="MasterThesis" | Entry=="Report" | Entry=="TechReport" | Entry=="Thesis") %>% 
    group_by(Month) %>% 
    summarize(Count=sum(Count), Entry="Other")

png("articles_read_2018.png")
ggplot(data_articles, aes(x=Month, y=Count)) + 
    geom_line(color="blue") + 
    xlab("Months") + 
    ylab("Number of Articles Read") +
    ggtitle("Reading Articles in 2018")
dev.off()

png("books_read_2018.png")
ggplot(data_books, aes(x=Month, y=Count)) + 
    geom_line(color="blue") + 
    xlab("Months") + 
    ylab("Number of Books Read") +
    ggtitle("Reading Books in 2018")
dev.off()

data <- merge.data.frame(data_articles, data_books, all=TRUE)
data <- merge.data.frame(data, data_chapters, all=TRUE)
data <- merge.data.frame(data, data_other, all=TRUE)

data_summary <- data %>% group_by(Entry) %>% summarize(Count=sum(Count), MonthAvg=sum(Count)/12, DayAvg=sum(Count)/365)

data_summary

So now for the results. The following table shows how much material I read:

Type Units Read Month Average Day Average
Articles 530 44.2 1.45
Books 102 8.5 0.279
Chapters 39 3.25 0.107
Other 14 1.17 0.0384

In personal development circles, they suggest reading 50 books per year. It is supposed to be good for something? I am not sure. Well, I read about twice that many without it being a goal, so I think that is quite nice.

But I was more interested in how many articles I read per day. It comes out to about 1.5 articles per day. This seems like a relatively small number, but that would only be the rate if I were consistent in doing it every day, which I am not. So I guess I should be happy with the result. I mean, I read more than 500 of them in a year, which is not a small number.

For people that are more visual, I also added graphs of how many books and articles I read per month.

This is the graph for the articles:

graph of number of articles read in 2018

And another graph for books:

graph of number of books read in 2018

Time Spent on the Internet

While not for all months, I do have data on my internet use from July to October. So here I am going to show you some of this as well. Let us first start with the parsing, before we go on to the visualization:

folder = "Programming-MyUsedTime"
files = [
    "MindTheTime-2018-Julij.csv",
    "MindTheTime-2018-Avgust.csv",
    "MindTheTime-2018-September.csv",
    "MindTheTime-2018-Oktober.csv",
]

time_spend_on_websites = dict()

for filename in files:
    with open(folder + "/" + filename) as f:
        lines = f.readlines()
    for line in lines:
        line = line.strip()
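        # Each line has five ;-separated fields; the second one is the website
        # domain and the third one the time spent on it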
        _, website, time, _, _ = line.split(";")
        if not website:
            continue
        if website not in time_spend_on_websites:
            time_spend_on_websites[website] = 0
        # Convert HH:MM:SS or MM:SS into the total number of seconds
        time = time.split(":")
        if len(time) == 3:
            time = int(time[0]) * 60 * 60 + int(time[1]) * 60 + int(time[2])
        elif len(time) == 2:
            time = int(time[0]) * 60 + int(time[1])
        time_spend_on_websites[website] += time

with open("Websites-2018.csv", "w") as f:
    f.write("Website;Time\n")
    for website, time in time_spend_on_websites.items():
        if time > 0:
            f.write(website + ";" + str(time) + "\n")

So, for the first analysis, let us see how many hours I spent on the internet per month. I do apologize that the names of the months are in Slovenian, but the names are similar, so it should not impede the understanding too much.

library(tidyverse)

data_total <- read.csv("Programming-MyUsedTime/MindTheTime-2018-Total.csv", sep=";", header=FALSE)
names(data_total) <- c("Month", "Time", "Nothing")
data_total$Nothing <- NULL

data_total <- data_total %>%
    separate(col=Time, into=c("Hours", "Minutes", "Seconds"), sep=":") 

data_total$Hours <- as.numeric(data_total$Hours)
data_total$Minutes <- as.numeric(data_total$Minutes)
data_total$Seconds <- as.numeric(data_total$Seconds)

data_total$Time = data_total$Hours * 60 * 60 + data_total$Minutes * 60 + data_total$Seconds

png("website_total_2018.png")
ggplot(data_total, aes(x=Month, y=Hours)) + 
    geom_col(color="blue") + 
    scale_x_discrete(limits=c("Junij", "Julij", "August", "September", "Oktober")) +
    xlab("Months") + 
    ylab("Time Spend on Internet") +
    ggtitle("Time Spend on Internet in 2018")
dev.off()

time spent on the internet in hours per month (June: 72, July: 119, August: 140, September: 110, October: 62)

Well, now I can also check the 10 most visited domains in this time frame. I am going to do it simply because, when I saw this for the first time a couple of months ago, I started to change some of my habits. So, let us see these pages:

library(tidyverse)

data <- read.csv("Websites-2018.csv", sep=";", header=TRUE)
data <- data[with(data, order(Time, decreasing=TRUE)), ]
data10 <- head(data, n=10)

data10$Minutes <- data10$Time %/% 60
data10$Seconds <- data10$Time %% 60
data10$Hours <- data10$Minutes %/% 60
data10$Minutes <- data10$Minutes %% 60
data10$Time <- NULL

data10

Website Hours Minutes Seconds
archiveofourown.org 98 55 9
watch-series.com 45 39 55
www.youtube.com 1 5 49
watchseries.fi 0 20 21
www.fanfiction.net 0 12 4
mail.google.com 0 9 50
putlocker0.com 0 9 8
www.google.com 0 8 32
putlocker-hd.is 0 7 36
www.duolingo.com 0 7 14

This can be grouped into a couple of categories: fanfiction (archiveofourown.org, fanfiction.net), watching (illegal) videos (watch-series.com, www.youtube.com, watchseries.fi, putlocker0.com, putlocker-hd.is), email (mail.google.com), search (google.com) and language learning (duolingo.com). It sure puts things in perspective about how I spend my time on the internet.
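
Just to make that grouping concrete, here is a minimal sketch that sums the time per category, assuming the Websites-2018.csv file written by the script above; the mapping only covers the domains in the table and is just my illustration:

categories = {
    "archiveofourown.org": "fanfiction",
    "www.fanfiction.net": "fanfiction",
    "watch-series.com": "videos",
    "www.youtube.com": "videos",
    "watchseries.fi": "videos",
    "putlocker0.com": "videos",
    "putlocker-hd.is": "videos",
    "mail.google.com": "email",
    "www.google.com": "search",
    "www.duolingo.com": "language learning",
}

time_per_category = dict()

with open("Websites-2018.csv") as f:
    next(f)  # skip the Website;Time header line
    for line in f:
        website, seconds = line.strip().split(";")
        category = categories.get(website)
        if category is None:
            continue
        time_per_category[category] = time_per_category.get(category, 0) + int(seconds)

# Print the categories from the most to the least time, in hours
for category, seconds in sorted(time_per_category.items(), key=lambda item: -item[1]):
    print(category, round(seconds / (60 * 60), 1), "hours")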

Well, since then, I have blocked the sites containing fanfiction. I also decided to stop watching series illegally on the internet and to buy DVDs instead. I am expecting three more season DVDs this week. Also, since I buy European DVDs, they always have some foreign language track as well. It is a lot more interesting to watch series in German than to do the German Duolingo tree. I now use mutt for email, so it will no longer log there. Also, when I use mutt it feels faster than when I was using Gmail through the browser. Not much I can do about search yet; I need to find some solution there as well.

So I am assuming that the data for the year 2019 is going to look a lot different.

So this was my year 2018 in numbers.