Blog of Sara Jakša

How I Managed to Force Myself to Finish my Master Thesis

I spent the last two weeks at the seaside, finishing my economics master thesis. The finished version has already been sent to my mentor, and I have to say that I am still recovering from it. I am still tired, probably mostly mentally.

But how did I manage to finish something in two weeks, when I had procrastinated on it for months? I knew in January that I had enough data, and I procrastinated even with that; I did not do anything on it from January to April.

I think there were multiple reasons, most of them connected with solitude. Since it is not the tourist season yet, there were not many people there. I exchanged words with three people, and one of them was the lady selling the bread. That freed up my mental capacity, so I could concentrate on the master thesis only.

The second reason is similar in a way. Because I was there just for the master thesis, there was no context switching. I did not anticipate how much it would help not to have to do this. When I came back to the same project all the time, it was just easier to start.

The third was probably that I wanted to get it off my plate, and the grand gesture of travelling an hour and a half and putting everything else aside helped. I remember reading that grand gestures can help with motivation, and it really did.

I have to say that I still procrastinated. There were three days in these two weeks when I did nothing (one was the first day of menstruation, which is understandable). On the rest of the days, a day when I put in three hours of work counted as unproductive, while on a productive day I could work for 12 hours.

Which is probably why I was exhausted when I came back and why I am still tired.

I am still happy I did this, and I am already planning to repeat the experiment when I am finishing my cognitive science master thesis.

But I think what it also showed me is that I like working on one project at a time. I don't think this is quite possible in the workforce, especially if one is an entrepreneur. But maybe scheduling things that way would be helpful: work on one thing until it is done or I am stuck, either because I no longer know how to continue or because I am waiting for somebody else to do something.

And I could also apply this to my leisure. I could stop reading the articles that are interesting but not my main focus. I should do these in daily bursts as well (or for however long I am interested) and put the rest of the time into a more focused approach.

Which is why today and tomorrow I am seeing how many of the notes that I have can be turned into blog posts, or deleted if they are no longer relevant. I have some unprocessed notes from 2017, which is horrifying, because, yes, they are almost two years old.

I don't know if this type of work is for everybody, but I don't know if I would have been convinced if I had not run this experiment. And I do recommend that everybody try it.

The Problem with Women in Tech Initiatives

When I was at PyConSK, I did my first lightning talk. And it was a rant (which, yes, I was aware of at the time) about the women-in-programming initiatives. I don't have anything against women in programming, I am a woman in programming, but I find the whole fascination with the 50% representation goal weird. Why is this even the goal at all? What would achieving it even mean?

I will first say something: I don't understand feminism. At least, I don't understand third-wave feminism; removing legal differences between genders I can get behind, since the world is most likely a better place without them. I did get an introduction to it from a somewhat unlikely source, but I still don't understand it. Not only that, I think knowing about it has had a negative effect on my life. Let me explain.

It started like this. I was abroad in Bratislava, and along with three of my classmates from second-year cognitive science, I took the philosophy of artificial intelligence. Every two weeks we had a couple of articles and books to read, to have a discussion on them. For one of these, we had to read A Cyborg Manifesto by Haraway. I mean, I consider myself pretty smart, but when I got to the end of it, the only thing I felt was utter what-the-hell-did-I-just-read confusion. I think I read it again, because I was absolutely sure that I had gotten something wrong. But no, it was not any clearer afterwards.

Well, I knew that at least two of my three classmates were going to read all the articles, and I shared a Monday class with one of them. So after class, I came to him and asked him if he would be willing to explain the point of the article we had to read for the philosophy of AI class, since I did not understand it. He was willing, so he wanted to know which one, and I told him. His reply was that he liked the article, and he then asked me which part I did not understand. And my reply, to a guy who had said he liked the article seconds before, was something like: "Everything. The article constantly talks about oppression, as if it is just a given that it exists, without any explanation." His explanation? Well, apparently this is what third-wave feminism is.

I got a pretty interesting lecture/conversation out of it, and then I read a couple of books about feminism, trying to figure out what the hell this is. And after reading and thinking about it, I started noticing group gender ratios. The first thing I noticed was that since primary school, I had not been part of a single group that was more than half women. In primary school I had dancing (mostly, if not all, women) and handball (where the trainings were women only). But even in primary school, my best results were in mathematics, logic and physics, and there were more men than women there, both in my school and in the competitions above the school level. Maybe if languages had suited me better, things would be different, because the language preparations did have more women, but I was never good at them. I mean, my English teacher told me that she thinks I will never be fluent in English.

Considering that people go into STEM when they have high numerical but lower verbal intelligence, that should have been the first sign for me to go into STEM from the start. Surprisingly, nobody ever tried to discourage me from that. And there were a lot of things people tried to discourage me from, but from doing the supposedly male-oriented stuff? Never; not family, not society, not school. Instead they tried to discourage me from going to what was perceived as a high school for richer people (still public and free, but supposedly I would feel isolated and lonely - I guess sort of like this guy, though I could not read that to the end), from going into economics (this one was from my family, and I am starting to understand why), from learning languages (we already touched upon this, right :) ) and so on and so on.

But let me look at the things I am currently a part of. At the cognitive science lectures, we have a master seminar, where there might be slightly more women now, but for most of the study it was about even. At the Python meetups, there are mostly men, with just some women. At the UX meetups, there are more men than women, though the ratio is not as skewed as at the Python meetups. At the place I work, well, I have seen one other woman so far, but I was not introduced to her. Instead I was introduced to a lot of men, and I had status report presentations where I was the only woman there. I guess I should be feeling isolated?

The problem is, I don't. For years, I have sometimes been the only woman in the room, and I did not even notice. At the start of university, I was often also the youngest person in the room, by a large margin. In recent years, that is not so true anymore. So I guess I should be feeling scared and like a victim? That there should be more people like me with me, so I would feel safer? More able to express myself? But I entered these groups because we had similar interests (like programming), so it was not hard for me to be myself.

I can also see the effect on other people. For example, in my native tongue, each noun used for a person has both a female and a male form. I remember the last time somebody used the male form of the word programmer to refer to me. They immediately started to apologize. And I was like, I don't care. I am sure most (but not all) women would not care either. I have heard stories from people who are afraid to say that they disagree with this doctrine, because they are white, strong males, and so supposedly have no right to say it. And that the only reason I can say it is because I am a woman. But apparently I am safe, because I am willing to say it from time to time.

Which is another good point: as long as I am willing to say what I believe, there are always going to be people who agree with me, and this is a way to start finding them.

So, back to the point: I don't like being aware of the gender ratio of a group, because it is simply not important. I would rather have stayed ignorant of it. Because now I am aware of it, while still not understanding why it is a problem for some people.

I mean, somebody has to perceive it as a problem, because otherwise we would not have so many programs for teaching programming to women. In the article on stereotype threat, they do suggest that creating safe environments can help with fighting stereotype threat (assuming it is a problem). But their examples are all like the three above; gender-separated education in math, for example. And while I did not go deeper and read the original study, they made it sound as if gender imbalances make people feel less belonging, which leads them to have less interest in participating. So they are basically another group of people that I don't understand?

I had an interesting conversation this week. I was out at lunch and the talk came to how women are less direct. I replied that this is also a problem for some men, and it would be easier if there were less of it overall. I did have to admit in the end that with women this problem is more frequent. But the pause was interesting: they had never connected this problem to any males before. It was a classic stereotype.

There seem to be a lot of possible explanations for the differences that could be fixed, from stereotype threat making math less enjoyable and less interesting to women, as touched upon in the stereotype threat article, onwards. But when I was reading the meta-analysis of gender differences, the biggest difference seems to be in interests; there is something like a Cohen's d of 1 between men and women (that is, the means differ by about one pooled standard deviation). For some subsets of STEM, like science and maths, the differences still exist, but they are smaller. Likewise, there are almost no differences in intelligence (when looking at the effect sizes), and only some in masturbation and porn use and in different illnesses, like depression and ADHD. Though this analysis checked differences in the mean; when researching intelligence, it seems there are no differences in the mean, but there are in the variability. Even so, if there is no difference in the mean, then the genders overlap a lot.

And it seems that disposition towards mathematics, science and engineering (where there are differences, see above) and creative tendencies (I am not sure if there are any gender differences there) are what make people enter STEM, as shown in the article. And it seems that if people are interested in more gender-stereotyped or reverse-gender-stereotyped activities as children, this continues into the future, as shown in this article. But there are also studies showing that the more women pursue romantic endeavors, the less interested they are in math. This reminds me of a podcast I listened to recently. One of its points (among many) was that we need to pick the life we want to live, and that imitating somebody's job only works if we also take on their lifestyle. What this makes me wonder is: what lifestyle do people in STEM have? The other thing it reminds me of is the difference between Empathizing and Systematizing.

When did this piece become an excuse to dump all the facts about gender differences?

Well, the point I was trying to make is that men and women are, on average, different. That means there will never be a 50% gender ratio in most things. And pushing for that, just to make some people feel less isolated, is a goal that I do not understand, and one that might not even be needed if we put less emphasis on gender ratios in the first place.

How to Find a Job

This week, on my way back from school (I think it was yesterday, but it seems like much more time has passed), we ended up talking with a couple of schoolmates. While we were talking, I remembered a conversation I had at PyConSK, especially how Exponia (I hope I wrote their name right) and Kiwi take all the programmers, and how one person had to 'import' them from Ukraine. Since one of the classmates expressed his wish to work as a freelance programmer abroad, I shared this information.

What I found interesting was the reaction of another classmate present there. It was sort of a call for more jobs for non-programmers. I think the main point was that programmers have a lot of job opportunities, and the rest of them don't. In which she is at least partly right. I mean, my last two job interviews for programming jobs were basically: the job is yours, if you want it. And I hear from other programmers that if you have a profile on LinkedIn, you constantly get contacted by recruiters (I don't have one, so I cannot confirm this). But programmers are not the only ones like that; I know that at least mechanical engineers are in the same position.

But I have been thinking about it, and I don't think this is the main reason. Even before I started to work as a programmer, if I wanted a specific job, I knew whom to ask to find out what I needed to do to get it (assuming it can be done as a paying job). Sure, people would not throw jobs at me, but getting one would be possible. On the other hand, from a very limited sample of talking to the people I study with, they just don't know how to attempt it.

So I have been thinking about the reason. I think the main reason is, surprisingly, my lack of a socialization drive. I know, it sounds weird, but let me explain in more detail. From the middle of primary school to high school, I did not spend a lot of time with my classmates. In my whole time in high school, I was invited to fewer than 10 parties. Which was great, since it left me with more time and energy to read books.

But then, at the beginning of university, I wanted to become an entrepreneur. So I started reading books about it, and one of the topics that resurfaced a lot was the importance of networking.

Now, here is a person who doesn't get this whole social thing, and she tries to start networking. So what happens? Well, I go to lectures (outside classes) and start having conversations with the lecturers. I start attending Toastmasters. I start going to meetups. Usually alone, which, from what I heard, is not standard practice for a lot of people, especially those still in school. Which meant I needed to 'socialize' (yes, the quotes are deliberate) with the people there. Which meant I was always associating with people older than me, most of them with jobs and so on.

Let's take Toastmasters, for example. I was attending meetings there with an HR person from Dars (the company responsible for all the highways in Slovenia), a diplomat from the Foreign Ministry, a tourist guide, a programmer who was starting his own start-up, a psychiatrist in training, a project manager (I think; I am not sure about the exact position) from Krka, a person employed at a company providing SAP support, a person who ended up working for Google (I have no idea what he did beforehand), and so on. I then went abroad, and could immediately make connections with people there, ranging from an English teacher to a trainer hosting seminars.

If I had wanted to find a job at the time, it would have been a hell of a lot easier than if my circle of friends had consisted of my classmates. Which is also one (among many) of the reasons I am against women-only places in programming. The facts say what they say: right now there are more male programmers, and more employed programmers among them as well. So they will be the ones who can provide the best advice and the best opportunities.

Maybe, at least for me, these meetings have always been a way to force myself to be social, so I never wanted anything from these groups. That does not mean that some things were not thrown at me (ranging from very good to very bad). But I would recommend anybody give it a try. Worst case scenario, you meet some people you realize you never want to meet again. Best case scenario, you show some wish and some initiative to do something about it, and people will start throwing opportunities at you. :)

The Characters from Arrowverse Appearing in the Same Stories in Fanfiction

I am (most likely) going to be analyzing fanfiction data for my master thesis. Since I already had this data available, I decided to see if I could come up with some interesting analysis.

One of the things I am interested in is the relationships between people. I wanted to see which characters appear together in the stories. For this, I used the tags of the stories and analysed when they appear together.

import sqlite3
import os
import re
import bs4
import pandas
import networkx
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy
import community  # the python-louvain package; provides best_partition for community detection
import json
database_file_name = "sqldata_arrowverse.sql"
folder_with_stories = "data"

First I needed to get all the character tags from the database (that I had collected beforehand).

re_remove_middle_names = r'(".*?")'  # matches quoted nicknames, like the "Harry" in 'Harrison "Harry" Wells'
sql_database = sqlite3.connect(database_file_name)
cursor = sql_database.cursor()
cursor.execute("""DROP VIEW IF EXISTS all_tags;""")
cursor.execute("""
CREATE VIEW all_tags AS SELECT work, tag FROM tags WHERE category='Character' AND tag IN 
(SELECT tag
FROM Tags 
WHERE category="Character"
AND tag NOT IN ('Jason Todd', 'Alfred Pennyworth', 'James "Bucky" Barnes', 'Team Legends', 'Rogues',
'OC - Character', 'Sam Winchester', 'Sebastian Smythe', 'Stiles Stilinski', 'Barbara Gordon',
'Original Character', 'Dawn Allen', 'Dean Winchester', 'Clint Barton', 'Hal Jordan', 'Tony Stark', 
'Steve Rogers', 'Dick Grayson', 'Original Child Character(s)', 'Original Male Character(s)',
'Diana (Wonder Woman)', 'You', 'Bruce Wayne', 'Reader', 'Original Female Character(s)',
'Original Characters', 'Jason Todd', 'Batman', 'Selina Kyle', 'Original Metahuman Character',
'Team Flash', 'Team Flash (The Flash TV 2014)', 'Team Legends (DC''s Legends of Tomorrow)', 
'Original Metahuman Character(s)', 'Rogues (The Flash)')
GROUP BY tag 
HAVING count(tag) > 99
ORDER BY count(tag) DESC);""")
cursor.execute("""SELECT t1.tag as tag1, t2.tag as tag2, count(*) 
FROM all_tags t1 
INNER JOIN all_tags t2 ON t2.work = t1.work
AND tag1<>tag2
GROUP BY t1.tag, t2.tag
ORDER BY count(*) DESC;""")
tags_together = cursor.fetchall()
cursor.execute("SELECT id FROM work")
works_number = len(cursor.fetchall())
cursor.execute("""SELECT tag, count(*) FROM tags 
WHERE category='Character' AND tag IN (SELECT tag FROM all_tags) 
GROUP BY tag""")
tags_number_by_person = cursor.fetchall()
sql_database.close()
len(tags_together)
19110

Since some characters have multiple ways they can be referred to (it is a superhero franchise, so a lot of people have at least a superhero name), I am doing some preprocessing to deal with this.

combine_people_dict = {"The Flash - Character": "Barry Allen", 
                       "Killer Frost": "Caitlin Snow", 
                       "Harrison Wells | Eobard Thawne": "Eobard Thawne",
                       "Eobard Thawne | Harrison Wells": "Eobard Thawne",
                       "Zari Adrianna Tomaz": "Zari Tomaz",
                       "Supergirl - Character": "Kara Danvers",
                       "Kara Zor-El": "Kara Danvers",
                       "Alura In-Ze | Alura Zor-El": "Alura Zor-El",
                       "Jimmy Olsen": "James Olsen",
                       "J'onn J'onzz | Hank Henshaw": "J'onn J'onzz",
                       "Hank Henshaw | J'onn J'onzz": "J'onn J'onzz",
                       "mon-el": "Mon-El",
                       "Harry Wells": "Earth-2 Harrison Wells",
                       "Jay Garrick | Hunter Zolomon": "Zoom",
                       "Winn Schott Jr.": "Winn Schott",
                       "Captain Cold": "Leonard Snart",
                       "Jess the Secretary": "Jess"}
# merge the different names for the same character and sum up the co-occurrence counts
tags_together_dict = dict()
for person1, person2, count in tags_together:
    if person1 == 'Harrison "Harry" Wells':
        person1 = "Earth-2 Harrison Wells"
    if person2 == 'Harrison "Harry" Wells':
        person2 = "Earth-2 Harrison Wells"        
    person1 = person1.split("(")[0].strip()
    person2 = person2.split("(")[0].strip()
    string_to_remove_1 = re.findall(re_remove_middle_names, person1)
    string_to_remove_2 = re.findall(re_remove_middle_names, person2)
    if string_to_remove_1:
        string_to_remove_1 = string_to_remove_1[0]
        person1 = person1[:person1.index(string_to_remove_1) - 1] + person1[person1.index(string_to_remove_1) + len(string_to_remove_1):]
    if string_to_remove_2:
        string_to_remove_2 = string_to_remove_2[0]
        person2 = person2[:person2.index(string_to_remove_2) - 1] + person2[person2.index(string_to_remove_2) + len(string_to_remove_2):]
    if person1 in combine_people_dict:
        person1 = combine_people_dict[person1]
    if person2 in combine_people_dict:
        person2 = combine_people_dict[person2]
    if not person1 in tags_together_dict:
        tags_together_dict[person1] = dict()
    if not person2 in tags_together_dict[person1]:
        tags_together_dict[person1][person2] = 0
    tags_together_dict[person1][person2] += count
len(tags_together_dict.keys())
137
tags_person_dict = dict()
for person, count in tags_number_by_person:
    tags_person_dict[person] = count

So, now that I have done the preprocessing of people and connections, I have my first result: in how many stories each character appears. Kara seems to be the most popular.

tags_person_pandas = pandas.DataFrame.from_dict(tags_person_dict, orient="index", columns=["Count"])
tags_person_pandas.reset_index(level=0, inplace=True)
tags_person_pandas.sort_values("Count", ascending=False, inplace=True)
tags_person_pandas.head(10)
index Count
80 Kara Danvers 17055
118 Oliver Queen 15330
10 Barry Allen 14777
45 Felicity Smoak 13503
2 Alex Danvers 12858
87 Lena Luthor 9482
88 Leonard Snart 8789
140 Sara Lance 8125
19 Cisco Ramon 8112
12 Caitlin Snow 6792
all_relationships = []
for person1 in tags_together_dict:
    for person2 in tags_together_dict[person1]:
        all_relationships.append(tuple([person1, person2, {"weight": tags_together_dict[person1][person2]}]))

So now that we have this, let us try to visualize the whole network of people.

S = networkx.Graph()
S.add_nodes_from([a for a in tags_together_dict])
S.add_edges_from(all_relationships)
len(S.nodes())
137
plt.figure(1,figsize=(20,20)) 
networkx.draw(S, 
        with_labels=True, 
        pos=networkx.spring_layout(S), 
        font_weight='bold', 
        node_color="yellow", 
        width=3, 
        arrows=True, 
        node_size=2000,
        edge_color = numpy.linspace(0,1,len(S.edges()))
       )

png

Even limiting to just the characters that appear in at least 100 stories (which is between 0.1% and 0.2% of all stories), there is not a lot that can be seen from the graph. So the next step is to also limit it to the connections that appear in more than 100 stories.

lowest_weight = 100
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[2]["weight"] > lowest_weight])
plt.figure(1,figsize=(30,30)) 
networkx.draw(S, 
        with_labels=True, 
        pos=networkx.spring_layout(S), 
        font_weight='bold', 
        node_color="yellow", 
        width=3, 
        arrows=True, 
        node_size=2000,
        edge_color = numpy.linspace(0,1,len(S.edges()))
             )

png

Looking at the graph above, there seems to be a clear divide between Supergirl and the rest of the series. The other three series have some differences, but there are a lot more connections between them. Which is interesting; it is as if the audience for Supergirl is different from the audience of the other three shows.

In the next part, I want to see which characters are the most influential.

pagerank = pandas.DataFrame.from_dict(networkx.pagerank(S, weight='weight'), orient="index", columns=["PageRank"])
pagerank.reset_index(level=0, inplace=True)
centrality = pandas.DataFrame.from_dict(networkx.degree_centrality(S), orient="index", columns=["Centrality"])
centrality.reset_index(level=0, inplace=True)
betweenes = pandas.DataFrame.from_dict(networkx.betweenness_centrality(S, weight='weight'), orient="index", columns=["Between"])
betweenes.reset_index(level=0, inplace=True)
ranking = pagerank.merge(centrality, left_on='index', right_on='index')
ranking = ranking.merge(betweenes, left_on='index', right_on='index')

The first one I want to look at is PageRank. This is the one that takes the strength of the connected nodes into account when calculating the score. So somebody who is not well connected themselves, but is connected to well-connected people, can still have a high PageRank.
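
To get a feeling for this, here is a toy illustration on a made-up graph (the names are invented, this is not the Arrowverse data): "ally" and "loner" both have exactly one connection, but "ally" is tied by a heavy edge to the well-connected "hub", so it inherits a much larger share of the score.

import networkx

# toy graph: one hub with a heavily-weighted satellite, plus a light chain
toy = networkx.Graph()
toy.add_edge("ally", "hub", weight=10)
toy.add_edge("hub", "mid", weight=1)
toy.add_edge("mid", "loner", weight=1)
print(networkx.pagerank(toy, weight="weight"))
# "ally" ends up scoring well above "loner", despite both having a single connection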

In this regard, three of the four leads are at the top (the Legends of Tomorrow lead is the one missing). Kara leads here.

ranking.sort_values("PageRank", ascending=False).head(10)
index PageRank Centrality Between
25 Kara Danvers 0.057478 0.488 0.117742
7 Barry Allen 0.050267 0.592 0.097677
1 Oliver Queen 0.048903 0.512 0.116387
0 Felicity Smoak 0.040533 0.440 0.065419
34 Alex Danvers 0.040261 0.352 0.058516
10 Cisco Ramon 0.034321 0.464 0.032194
5 Sara Lance 0.032335 0.440 0.052516
9 Caitlin Snow 0.031984 0.488 0.070516
48 Lena Luthor 0.029198 0.288 0.079419
21 Leonard Snart 0.029180 0.360 0.025613

The next one is degree centrality, which simply measures how many other characters a person is connected to (the number of direct connections, normalized by the number of other nodes). The same three people are at the top (four, since Kara shares her spot with Caitlin). But people like Caitlin and Cisco rose, while people like Felicity Smoak fell. Some people, like Lena Luthor and Alex Danvers, are no longer there (both from Supergirl), while people like Mick Rory and Iris West appeared (both from Flash).

ranking.sort_values("Centrality", ascending=False).head(10)
index PageRank Centrality Between
7 Barry Allen 0.050267 0.592 0.097677
1 Oliver Queen 0.048903 0.512 0.116387
25 Kara Danvers 0.057478 0.488 0.117742
9 Caitlin Snow 0.031984 0.488 0.070516
10 Cisco Ramon 0.034321 0.464 0.032194
12 Iris West 0.028664 0.448 0.124968
0 Felicity Smoak 0.040533 0.440 0.065419
5 Sara Lance 0.032335 0.440 0.052516
21 Leonard Snart 0.029180 0.360 0.025613
23 Mick Rory 0.025047 0.360 0.030258

And the last one is betweenness. It measures how much a node serves as a bridge between different clusters: removing such a node would lengthen the shortest paths between many other people. And here, Iris West at the top does not really make sense.
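
As a toy illustration (again with made-up names): two tight clusters connected through a single node.

import networkx

# two triangles, connected only through "bridge"; every shortest path
# between the two clusters has to pass through that one node
toy = networkx.Graph()
toy.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),
                    ("d", "e"), ("e", "f"), ("d", "f"),
                    ("c", "bridge"), ("bridge", "d")])
print(networkx.betweenness_centrality(toy))
# "bridge" gets the highest score; remove it, and the two clusters fall apart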

ranking.sort_values("Between", ascending=False).head(10)
index PageRank Centrality Between
12 Iris West 0.028664 0.448 0.124968
25 Kara Danvers 0.057478 0.488 0.117742
1 Oliver Queen 0.048903 0.512 0.116387
7 Barry Allen 0.050267 0.592 0.097677
48 Lena Luthor 0.029198 0.288 0.079419
9 Caitlin Snow 0.031984 0.488 0.070516
0 Felicity Smoak 0.040533 0.440 0.065419
34 Alex Danvers 0.040261 0.352 0.058516
64 James Olsen 0.015777 0.232 0.055677
5 Sara Lance 0.032335 0.440 0.052516

Because Iris does not make much sense, I also calculated the unweighted betweenness. Here Kara makes a lot more sense, since she is the connecting node between her Earth-38 and Earth-1. This is also why Oliver and Barry make sense, since they were the ones usually teaming up with Kara.

betweenes_2 = pandas.DataFrame.from_dict(networkx.betweenness_centrality(S), orient="index", columns=["Between2"])
betweenes_2.reset_index(level=0, inplace=True)
ranking = ranking.merge(betweenes_2, left_on='index', right_on='index')
ranking.sort_values("Between2", ascending=False).head(10)
index PageRank Centrality Between Between2
25 Kara Danvers 0.057478 0.488 0.117742 0.210268
1 Oliver Queen 0.048903 0.512 0.116387 0.141382
7 Barry Allen 0.050267 0.592 0.097677 0.140181
10 Cisco Ramon 0.034321 0.464 0.032194 0.070581
34 Alex Danvers 0.040261 0.352 0.058516 0.069021
5 Sara Lance 0.032335 0.440 0.052516 0.065604
0 Felicity Smoak 0.040533 0.440 0.065419 0.062297
9 Caitlin Snow 0.031984 0.488 0.070516 0.061614
12 Iris West 0.028664 0.448 0.124968 0.058943
48 Lena Luthor 0.029198 0.288 0.079419 0.045495

Next, I wanted to see if the algorithm could find any communities in the data. Looking at the picture below, it did a much better job than I expected: most people got correctly assigned to the series they appear in.

plt.figure(1,figsize=(30,30)) 
partition = community.best_partition(S, weight="weight")
size = (len(set(partition.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition.values()):
    list_nodes = [nodes for nodes in partition.keys()
                                if partition[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])  # the color is wrapped in a list, so matplotlib does not try to value-map the RGBA tuple
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)

png

Here is the code to save the graph above to a JSON file, for visualization on the website.

all_characters_as_numbers = dict()
for i, name in enumerate(S.nodes()):
    all_characters_as_numbers[name] = i
nodes = [{'name': all_characters_as_numbers[i], 'label': i, 'fandom': str(partition[i])} for i in S.nodes()]
links = [{'source': all_characters_as_numbers[u[0]], 'target': all_characters_as_numbers[u[1]]} for u in S.edges()]
with open('graph2.json', 'w') as f:
    json.dump({'nodes': nodes, 'links': links},
              f, indent=4,)

Now I wanted to visualize the different communities separately. I put a higher limit on the connections, so that a connection is only shown if the pair appears together in more than 500 stories. This makes the graphs more understandable.

lowest_weight_subgraph = 500
nodes_arrow = {item: group for item, group in partition.items() if group == 0}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_arrow and a[0] in nodes_arrow and a[2]['weight'] > lowest_weight_subgraph])
plt.figure(1,figsize=(30,30)) 
partition_arrow = community.best_partition(S, weight="weight")
size = (len(set(partition_arrow.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_arrow.values()):
    list_nodes = [nodes for nodes in partition_arrow.keys()
                                if partition_arrow[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])  # wrap the color in a list, as above
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)

png

nodes_lot = {item: group for item, group in partition.items() if group == 1}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_lot and a[0] in nodes_lot and a[2]["weight"] > lowest_weight_subgraph])
plt.figure(1,figsize=(30,30)) 
partition_lot = community.best_partition(S, weight="weight")
size = (len(set(partition_lot.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_lot.values()):
    list_nodes = [nodes for nodes in partition_lot.keys()
                                if partition_lot[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])  # wrap the color in a list, as above
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)

png

nodes_flash = {item: group for item, group in partition.items() if group == 2}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_flash and a[0] in nodes_flash and a[2]["weight"] > lowest_weight_subgraph])
plt.figure(1,figsize=(30,30)) 
partition_flash = community.best_partition(S, weight="weight")
size = (len(set(partition_flash.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_flash.values()):
    list_nodes = [nodes for nodes in partition_flash.keys()
                                if partition_flash[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])  # wrap the color in a list, as above
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)

png

nodes_supergirl = {item: group for item, group in partition.items() if group == 3}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_supergirl and a[0] in nodes_supergirl and a[2]["weight"] > lowest_weight_subgraph])
plt.figure(1,figsize=(30,30)) 
partition_supergirl = community.best_partition(S, weight="weight")
size = (len(set(partition_supergirl.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_supergirl.values()):
    list_nodes = [nodes for nodes in partition_supergirl.keys()
                                if partition_supergirl[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])  # wrap the color in a list, as above
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)

png

To finish, I played a little with a JavaScript visualization, mostly to be able to eventually put the visualization directly into the blog, as part of the storytelling. Here is the code and the current result.

%%html
<div id="d3-example"></div>
<style>
.node {stroke: #fff; stroke-width: 1.5px;}
.link {stroke: #999; stroke-opacity: .6;}
</style>
%%javascript
// We load the d3.js library from the Web.
require.config({paths:
    {d3: "http://d3js.org/d3.v3.min"}});
require(["d3"], function(d3) {
  // The code in this block is executed when the
  // d3.js library has been loaded.

  // First, we specify the size of the canvas
  // containing the visualization (size of the
  // <div> element).
  var width = 500, height = 500;

  // We create a color scale.
  var color = d3.scale.category10();

  // We create a force-directed dynamic graph layout.
  var force = d3.layout.force()
    .charge(-100)
    .linkDistance(100)
    .size([width, height]);

  // In the <div> element, we create a <svg> graphic
  // that will contain our interactive visualization.
  var svg = d3.select("#d3-example").select("svg")
  if (svg.empty()) {
    svg = d3.select("#d3-example").append("svg")
          .attr("width", width)
          .attr("height", height);
  }

  // We load the JSON file.
  d3.json("graph2.json", function(error, graph) {
    // In this block, the file has been loaded
    // and the 'graph' object contains our graph.

    // We load the nodes and links in the
    // force-directed graph.
    force.nodes(graph.nodes)
      .links(graph.links)
      .start();

    // We create a <line> SVG element for each link
    // in the graph.
    var link = svg.selectAll(".link")
      .data(graph.links)
      .enter().append("line")
      .attr("class", "link");

    // We create a <circle> SVG element for each node
    // in the graph, and we specify a few attributes.
    var node = svg.selectAll(".node")
      .data(graph.nodes)
      .enter().append("circle")
      .attr("r", 5)  // radius
      .style("fill", function(d) {
         // The node color depends on the club.
         return color(d.fandom);
      })
      .call(force.drag);

    // We bind the positions of the SVG elements
    // to the positions of the dynamic force-directed
    // graph, at each time step.
    force.on("tick", function() {
      link.attr("x1", function(d){return d.source.x})
          .attr("y1", function(d){return d.source.y})
          .attr("x2", function(d){return d.target.x})
          .attr("y2", function(d){return d.target.y});

      node.attr("cx", function(d){return d.x})
          .attr("cy", function(d){return d.y});
    });
  });
});

My First Python Conference at PyConSK

Right now, as I am writing this, I am sitting on the bus from Bratislava to Ljubljana. This weekend, I attended my first Python conference, namely PyConSK 2019. Every single month I have been hearing about how great software developers' conferences are (along with some weird stories about pants, which I am glad I did not see at this conference :) ). I have to admit that I caved to the peer pressure and decided to check whether they really are what they are hyped up to be (peer pressure is going to play some role later as well). Considering the hype, I thought there should be something more to it. But I can understand now why people hype them, as this one at least was great.

My preparation for this was basically asking one friend of mine what these conferences look like and what his recommendations would be. According to him, there would be too much information to process on the fly, and it would be a great social opportunity. Which, alright, was helpful for my expectations, but not really for deciding what I wanted to do at the conference.

So, my plan for the conference was just to go there and observe, and then the next time I would know what to do. Spoiler alert: that is not what I ended up doing at all. I went from i-have-no-idea-why-i-am-doing-this-somebody-stop-me on the first day to being surprisingly comfortable presenting and socializing on the third day. I don't know if the same is true for all software development communities, but the Python community is the most accepting and open community that I have ever been a part of. For me this is crystal clear, but maybe it does not come up enough. This is a point that I will also be returning to later.

So, when I arrived there, I talked to two of the guys. I have to admit I don't remember either of their names, though I could check one of them, since I attended his talk later during the conference. Sometimes I don't know if I leave out names because people might be more privacy conscious, or because I don't remember most of them and am trying to be consistent. Well, anyway, I got my first tips there. First, the description that I had gotten as preparation for the conference was accurate. Second, I should check the Django Girls tutorial, which has easy examples of very simple concepts; the same ones that are too obvious for me to explain to the programming novices I sometimes tutor (not that I am a very good tutor).

As far as the morning session goes: I will remember Arvil, in case somebody needs a site without too much complication, something more than a static website. But I (so far) don't see myself using it. I am now excited to try MindsDB, both because data science is something I am interested in and because they were convincing. The space talk was nice, but I don't remember anything from it that would be helpful to keep in mind. (I did write notes during the presentations, but I am writing this summary from my head, just to see what my mind found important enough to remember.)

Then it was lunch, and I was positively surprised that lunch was included. I ended up talking to a woman there. We discussed many things, but the one that will come up later is that this was her first tech conference, if she did not count the women-in-tech conference she had attended, and how those conferences are organized to bring the gender ratio to 50:50.

After the lunch, I attended the workshop on passwordless authentication, where a person gets a link in their email to log in with. We went through an example, and even though it did not work in the end (firewall problems or something), the code itself seems simple enough. I already have a personal project in mind in which I am going to try using it. The code is here (I am going to assume that since it is on GitHub, it is alright to link to it?).
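
This is not the workshop's code (that is linked above); just a minimal sketch of the idea as I understood it, with hypothetical names: issue a random token with an expiry, email it as a link, and log the user in when the token comes back.

import secrets
import time

TOKENS = {}  # token -> (email, expiry timestamp); a real app would keep these in a database

def issue_login_link(email, ttl_seconds=900):
    # generate an unguessable one-time token and remember whom it belongs to
    token = secrets.token_urlsafe(32)
    TOKENS[token] = (email, time.time() + ttl_seconds)
    return "https://example.com/login?token=" + token  # this URL would be emailed

def verify_token(token):
    # a token can be used only once, and only before it expires
    entry = TOKENS.pop(token, None)
    if entry is None:
        return None
    email, expiry = entry
    return email if time.time() < expiry else None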

After the workshop, I went back to listening to the lectures. The GitHub bots talk gave me one or two potential ideas, but the workshop that happened later was more helpful for me. I think this is when they announced that it was possible to apply for the lightning talks, because I don't remember even the main message of the next talk; there were unwanted thoughts in my head the whole time, which I will expand on in the next paragraph. I then applied for a lightning talk, but I was still not very present for the talk after that. I remember there was some comparison with Facebook and some stock prices, but I have no idea what the main message was. The last one was a funny talk about time zones, along with a pointer to a timezone database that I will be using in the future. It seems like a fun database to play with (not to mention that, with the historical data, there are a lot of things I could visualize).

Now, what did I mean by the unwanted thoughts in the previous paragraph? There were a couple of things that were maybe a bit unusual. Back in Ljubljana, I have this friend whom I have known for years. One conversation with him eventually led to the creation of this blog, years ago. And recently we have been seeing each other at the Python meetups. I knew that I would eventually do lightning talks and main talks at the meetup, but I wanted to take a lot more time over it. After my main presentation there, he gave me the comment that the next one was going to be at a conference, with a standing ovation. And I was already primed for a rant because of the comment at lunch about the women-in-tech conferences. Not that the person I was speaking with put me in that mood; it was the whole idea of equality of outcomes, which will never happen without some version of totalitarianism (which I really don't like). And then the poor organizer had to open the floodgates with his announcement about the lightning talks. I am aware that it was not his fault; he was just doing his job. And I remembered the lightning talk that I had given at the Python meetup before. And then, for the entire 40-minute lecture, the voice of that friend of mine and his comments played on a loop in my head, with me trying to convince myself not to do it.

Well, it did not work. Like, at all. I realized that I would not be able to talk myself out of it. So I went to the place where one could apply for a lightning talk, with the intention that they would talk me out of it. Well, things did not go according to plan. They were all so encouraging and nice and helpful, and they tried to convince me to just apply. Which was the opposite of what I wanted at that moment. But there was a catch: if too many people applied, they would randomly choose among them. So even if I applied, there was still a chance that I would not have to do it. It would have been better if the choice were not random, but as long as there was a chance of not doing it, I would take it.

Things once again did not go according to plan. Only four people applied, so I did have to go on stage. To make matters worse, the first talk was a quite well-prepared speech about a framework (I think it was about parallelization; I think this one: https://github.com/ray-project/ray?). The second was the one where the speaker showed us a keyboard hack, and to convince us of it, he spoke in something like ten different languages. And then there was me, and an announcement for another Python conference (I think it was the one in Berlin?).

My presentation was based on my agent-based model for understanding the gender differences in STEM. It was about 3 minutes long, and it was a rant. I think they will eventually put the video up, so I will link to it when I notice that it is there. And afterwards, I was mortified.

Spoilers: it did end up being a good idea.

I decided not to go to the party (I was still under stress from the lightning talk). I walked back with the woman whose comment had helped inspire it all, and then wrote an angry email to the friend responsible for the voice in my head. I do regret that email a bit now, but I know that until I get a bad reaction to an angry email, I will keep sending them (not that I send many). So, if by any minuscule chance that person ever reads this blog, he knows that I was talking about him. He is dangerous, but I would count him as somebody who has had a noticeable positive effect on my life (this blog would be example enough, and that was not the only thing).

So I went to sleep early, and then there was day two. I got lost during the Django talk; I think the reason was mostly that the talk was quite detailed, and I had never before even tried Django's ORM. Otherwise, I think I would have found it a lot more interesting. Then there was an interactive talk about Google APIs, which was fun, but I don't need them for professional projects (especially since data privacy is a big thing where I work) and I don't see myself using them in personal projects; maybe in the future. The next talk was about OAuth (slides), which is great, because I will likely have to implement some sort of authentication service, and this seems like a good way. The final decision is not mine, but the talk convinced me that this would be the first approach I would try. Then I listened to a talk presenting some weird stuff in Python that (if I understood correctly) is there because of optimizations; like a list inside a tuple being changed on the very line where an error is thrown.
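
If I remember the example right, it was something like the well-known list-inside-a-tuple quirk, sketched here:

# augmented assignment on a list stored in a tuple raises a TypeError
# (tuples do not support item assignment), yet the in-place list extension
# has already happened by the time the error is raised
t = ([1, 2],)
try:
    t[0] += [3]
except TypeError as error:
    print(error)  # 'tuple' object does not support item assignment
print(t)          # ([1, 2, 3],) - the list was changed anyway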

I don't really remember lunch. I remember eating something, but that was it. I think I might have ended up outside, with a cup of tea, in the sun?

So, after lunch I attended the workshop about teaching algorithmic efficiency, where I was clearly not the target audience, but it was a nice overview. Then I decided to play hooky and go for a drink with one of my sort-of former classmates. That was fun. When I came back, I listened to the talk about micro:bits (I am still not sure how they differ from a Raspberry Pi), participated in the quiz (apparently Python was released the same year I was born) and listened to the lightning talks (this time I did not participate).

The lightning talks included a lot of announcements of future conferences. There was a guy who is carrying messages from one conference to another. There was a really interesting one about IndieAuth, which builds on the concepts of the IndieWeb, which I thought was an interesting idea when I first encountered it. I can't remember any others off the top of my head.

Next came the socializing. I talked to way too many people, some for longer, some for a shorter amount of time. There was a guy with existentialist problems; there was a woman who was about as old as I am, but had a resume longer than those of some 40-somethings I know, and more impressive than others'. There was a guy who was interested in cognitive science, and the foodie woman. There was one who really encouraged me to learn some soft skills. Then there was another guy with whom I talked about the quality of talks at conferences, a guy whom I only remembered a day later that I had talked to years ago (thankfully, he did not remember me either), and a nice guy with a guitar. And I probably missed a lot. (Yes, I remember other things about them, but I don't think I remember a single name.)

I only came back to the hostel at three in the morning.

The next day, I decided to come a bit later, since the English program did not start until 10 o'clock. I enjoyed a good cup of tea while I waited. I then ended up talking to a couple of people, and so ended up being late for the workshop.

The workshop itself was helpful for three reasons. It was about graph analysis, which I will need to know for my cognitive science master thesis, and I realized that most of the things I had figured out on my own are considered the basics (which, for me, is often not the case). Next, I knew that I would eventually have to deal with communities, and this was a nice introduction (using Game of Thrones data). And I had been having a small problem with weights in a small analysis I was doing, and the US Airport example gave me a good idea of how to tackle it in a new way. So, overall a good workshop.

Then it was already lunch, where I ate and then spent a lot of time talking to the guy from yesterday (the one really interested in cognitive science).

Then I attended the GitHub bots tutorial, which used the Octomachinery tutorial. I had to leave before the end, since there is only one direct bus from Bratislava to Ljubljana, and I wanted to catch it; when I was buying the tickets, I was not sure how sleepy I would be. Well, instead of sleeping, I have been writing this blog for the last two and a half hours. But I think I am at the end.

My overall impression (which I know I should wait with until I have processed everything, but whatever) is that these events are great, and now I want to attend more of them. I really need to finish my two master theses, so that I can get a 'real' job and my wallet will not mind if I go to as many of these as I have the time and energy for :).

Goals and Measurements

There is an interesting phenomenon when it comes to goal setting and measurement. In economics it is called Goodhart's Law, but it also exists in artificial intelligence, and I have heard anecdotally that it exists in hiring as well. Basically, it means that when some statistical regularity is used as a target, targeting it can break the regularity. Just because something worked as a predictor in the past, if we take it as a goal, it stops being a good predictor.
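
As a toy statistical illustration (made-up numbers, and only one face of the law): a proxy can predict a real quantity well in the whole population, and yet stop predicting it as soon as we start selecting on the proxy.

import numpy

rng = numpy.random.default_rng(0)
quality = rng.normal(size=100000)                   # the thing we actually care about
proxy = quality + rng.normal(size=100000)           # a noisy measurement of it
print(numpy.corrcoef(quality, proxy)[0, 1])         # around 0.7 in the whole population

selected = proxy > numpy.quantile(proxy, 0.99)      # "target" the proxy: keep the top 1%
print(numpy.corrcoef(quality[selected], proxy[selected])[0, 1])  # much weaker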

Let's take a very cliche and cartoonish example from personal health; I think it is really good for explaining the principle. I don't know a lot about health studies, but I think there is a connection between health and not being overweight. Otherwise, they would not talk about an obesity epidemic and be all panicky about children becoming more obese, because it is bad for their health. It is an assumption, but if it is wrong, then a lot of the discourse in the media is just... misleading :). So, let's say that because we want to be more healthy, weight becomes our crude measurement of health (I am from the social sciences, so I am very much OK with crude measurements). So we start dieting, and at some point people send us to the hospital, because we can't function normally anymore - which can be the effect of severe anorexia. The weight kept going down, but the health it was supposed to stand for collapsed.

I guess I could say that the very act of measurement can change the phenomenon itself. I could bring quantum physics into this, but I think I would only prove that I don't know much about it. I can see this in myself. The very act of putting every scientific article I read into my bib file makes me more likely to try to finish an article, even if it is not that interesting. And that is without me checking frequently how many I have read. I think the first time I checked was around New Year, and that was after more than a year of collecting. I have not checked since. With books it is the same, but there the effect is a bit different: I now slightly favor short books over long ones.

Well, reading a lot of books, short or long, and finishing more scientific articles, whether or not they are relevant, is not bad. But I am not checking constantly. Imagine how things change for people who keep track of many more aspects of their lives; the quantified-self movement comes to mind. How do these very subtle effects come together to shape their lives?

Not to mention that this is, in a way, done to us on a regular basis. Lanier, in the book Ten Arguments for Deleting Your Social Media Accounts Right Now, calls them BUMMER (Behaviours of Users Modified, and Made into an Empire for Rent): the social media, search engines, personalized advertising and so on (the book itself is quite good; I can recommend it). Their algorithms collect information about us and then use it as a target to predict our behaviour. Can you see the problem, in relation to the one defined above?

What does this mean for us? It means that just because something can be measured, that does not necessarily mean it is a good goal, and that we need to be more careful about how we measure our goals. But it also gives us hope. One thing that economics figured out is that when governments were using inflation as a target, Goodhart's Law kicked in and inflation became useless for prediction. But once it stopped being the target, it became a good predictor again. So in the short term, I don't think it is that bad to use measurements like this. Let's say one wants to be healthy: first one could try losing weight (if overweight), then start exercising more frequently, and so on. Each measurement would only be under this effect for a short amount of time.

I guess it would be even better to use multiple measurements, since with several of them it is hard to go too far in any one direction. If there really needs to be a single measurement, they could be combined, either by summing them (better for avoiding overfitting) or as some sort of principal-component index (because not every indicator is equally good); a minimal sketch of both options follows below.
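Here is a minimal sketch of both options in R. The three measurements and their names are hypothetical; the point is only the mechanics of combining them.

# Hypothetical measurements; in real life they would be on different
# scales, so standardize them before combining
set.seed(1)
measurements <- data.frame(
  weight_change = rnorm(50),
  exercise_min  = rnorm(50),
  sleep_hours   = rnorm(50)
)

# Option 1: simple sum of the standardized measurements
index_sum <- rowSums(scale(measurements))

# Option 2: first principal component as the composite index
pca <- prcomp(measurements, scale. = TRUE)
index_pca <- pca$x[, 1]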

The best, but probably not the easiest, way would be to use all the evidence and take a more holistic look at the situation. Not everything can be measured, and maybe our long-term improvement in life needs a bit more of a qualitative approach, not just a quantitative one.

Family Recipes

It is Lent right now, and even though I am not religious, I still decided to try to curb one of my addictions during this time. In my case it was sweets, and I have to admit that the first days were harder than I imagined. I guess there was some addiction there, like I had been joking for the last couple of years.

I think this is why I went through the recipes that I had collected. I realized that way too many of them were for sweets. Not counting the sweets, there was only one recipe that I still wanted to try. Another thing that I could get rid of. (This is always a cause for celebration :) )

I also found some of the recipes that I got from my two grandmothers. Since I might regret not having them in the future, I will write them down here. Even if they are sweets.

What I always assumed were my grandmother's cookies (but then I realized that she did not put chocolate in them):

Ingredients:

  • 200 g of flour (whole wheat flour)
  • 0.5 teaspoon of baking soda
  • 100 g of butter
  • 75 g of sugar (cane sugar)
  • 75 g of chocolate
  • 1 egg
  • 1 teaspoon of vanilla (vanilla extract)

Maybe my grandmother's cookies (?):

  • 2 dl of water
  • 70 dag of butter
  • a pinch of salt
  • 100 g of flour
  • 3 eggs

Bake for 20 minutes at 200°C

Some chocolate cookie things that my other grandmother once got:

  • 3 eggs (whole)
  • 15 dag of sugar
  • 10 dag of butter
  • 32 dag of flour
  • a little baking soda

  • walnuts, raisins, chocolate, (soncn? - maybe sunflower seeds? I can't read it)

So I am putting them here, mostly to make it easier for myself to throw the originals away.

The Fear of Boredom

Do you know those conversations where the conversation itself is an everyday thing, but when you really think about it, you realize why there are problems in your life? Well, I recently had a conversation like that with a friend of mine. And there were three points in that conversation that eventually led me to realize what is holding me back (and how useless it is):

  1. I told him that I am afraid of boredom.
  2. He observed that I know how to shrug other ideas off.
  3. I told him that this is because I have too many of them.

I mean, these are normal, everyday things that come up in conversations all the time. Well, except the boredom part; that one was a surprise even for me.

So things started to change very quickly after that. The first thing I did was delete a lot of the files in my Later folder. This is the folder where I keep all the things I am currently working on: the articles I want to read, the code I am writing, the collections of ideas I have had, and so on. A lot of it went away.

This is also an example of action coming before insight, because I did this first, before I realized the rest of it.

Then I looked back, and the insight came: I had had a couple of interesting opportunities that I did not follow up on, because I did not want to add something new to my plate. I did not know how to handle another project on top of what I already had. So I let slip opportunities that could have made my life a lot more interesting.

But on the other hand, I am an INTP. I might not have the big-vision stuff going on, but I can spot many interesting ideas in a single day. Every time I come home from the Python Meetup, I have more ideas than I could implement in a month, without having had a single conversation for that purpose. I get way too many ideas from books and lectures and so on. And that is before people offer me a chance to work on something.

And then I realized that, out of fear of not having enough opportunities (something my mind and the world have already shown me that I do have), I was sabotaging myself when it came to taking them.

That made me realize that I had better finish the things I have started (my two master theses and the working version of UExperience - though I know the latter will be much more long-term as far as support goes, but by then it will be a much smaller demand on my time), and eliminate what I don't want to do (I am looking at you, Introduction to Cognitive Science 2) or have procrastinated on for way too long (like that Arrowverse analysis project, which had everything in it from deep learning to graph analysis to lip reading).

And I could eliminate a lot of internet communication. I am sorry, but I already have hobbies I can do when I don't have the energy for anything else: drawing and watching my favorite series in other languages (currently German). The internet is still going to be there in case I ever need more input, right?

I finally got to the end of this insight (about a week and a half after the conversation - it felt longer) when I was rereading the book Focus by Leo Babauta, which talks about how to find some focus in our lives. Then the association went to the book Deep Work by Cal Newport, and then to So Good They Can't Ignore You, also by Cal Newport. These three books were sort of a social proof that I am on the right track. :)

I think this has been a lot easier for me because I have dabbled in minimalism for years, so eliminating clutter is something that has become easier over time. Maybe for somebody else the road would be a bit harder. And it will also be interesting to see if this road to minimalism leads to bigger abundance.

But I think what is even more important is that we all need some sort of mirror to see what we are doing, so that in fear of something, we don't end up doing the very thing that makes it more likely to come true. Or, as a proverb that I remember from one of the fanfiction stories puts it: "One often meets one's fate on the road one takes to avoid it."

I need to learn to post blogs when I write them as well - this one was posted more than a week after I wrote it.

Learning Statistics with Basketball

As I am reading some statistical articles to help me with my master thesis, I came across one that might be interesting to more people.

This short (4-page) article talks about how high-school students could be motivated to learn statistics by comparing Michael Jordan and LeBron James. So it is interesting to people who have an interest in basketball (there are some analyses and data in it), education or statistics.

The article is Understanding summary statistics and graphical techniques to compare Michael Jordan versus LeBron James.

Social Skills and Math Skills

I have had to tutor a lot of people in the mathematical or statistical parts of economics and in programming. And a lot of the time (though far from always), there were these moments when I could see that people were not getting it, but I was not sure why.

I remember a case in my master studies where one of my classmates from abroad had problems and wanted my help. So I said yes. Well, the problem was, as I eventually found out, that he did not even understand some of the basic mathematical principles, like using letters to stand for numbers, or linear regression. How does one explain this? I don't know, because to me it was clear from the start.

It was only when I started tutoring in programming that I was able to formulate a hypothesis. One interesting thing that I saw with the people I was tutoring was their use of theory of mind. At least some of their problems stemmed from their expectation that the computer has a theory of mind and that programming is a conversation.

I mean, it is, but it is a lot more structured. It is a conversation where everything has to be explicitly said, or was at one point explicitly agreed on, and these agreements can be checked. There is no reading of intentions going on.

Maybe this is the reason why some people are afraid of code? Because the code or the computer might take offense?

So my hypothesis was that this social understanding would impede the ability to do math and code.

I found an article titled The Empathizing-Systemizing Theory, Social Abilities, and Mathematical Achievement in Children, which used the empathizing-systemizing framework to research this; in our case, empathizing would correspond to empathy/social skills and so on. The children lower in empathizing were better at math (at this level, calculating). Systemizing was not connected at this stage.

But by the time people come to university, systemizing was connected to mathematical intelligence, as discovered in the article Systemisers are better at maths. Subject and gender differences in math disappeared when controlled for systemizing.

This is as far as ability goes. In the article Testing the Empathizing-Systemizing theory of sex differences and the Extreme Male Brain theory of autism in half a million people, one of the analyses was the difference in systemizing and empathizing between STEM and non-STEM employees. The STEM employees were lower in empathizing (beta = -1.10) and higher in systemizing (beta = +1.27).

Now, if anybody is interested in a short but very readable analysis of gender differences in STEM, I suggest the report Why don't more girls choose to pursue a science career?

So, as far as the literature goes, there is some indication that mathematical interest or skill could be connected to social skills.

So, in order to get one more piece of information, I will try to find some data and do an analysis in this direction myself.

Country Level Analysis of Agreeableness and PISA Mathematical Scores

While effects at the country level can differ from effects at the individual level, it can still be useful to check them. Especially since a lot more country-level data is already available.

First, I combined two datasets. The personality data came from the article The Geographic Distribution of the Big Five Personality Traits: Patterns and Profiles of Human Self-Description Across 56 Nations, which can be found here: https://www.toddkshackelford.com/downloads/Schmitt-JCCP-2007.pdf. The mathematical data came from the 2015 PISA results, which can be found here: http://pisadataexplorer.oecd.org. I combined the datasets so that only the countries represented in both were included (40 countries); a sketch of the merge is below.
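I did the combination beforehand and saved the result to a CSV, but the merge itself would look something like this (the input file names here are hypothetical; both tables need a Country column):

# Hypothetical input files; the real combination was done beforehand
personality <- read.csv("data/schmitt-big-five.csv")
pisa_math <- read.csv("data/pisa-2015-math.csv")

# merge() in base R keeps only the countries present in both tables
data_country <- merge(personality, pisa_math, by = "Country")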

library(tidyverse)

data_country <- read.csv("data/data-personality-math.csv")
head(data_country)

Country    Math      E      A      C      N      O
Argentina   409  49.10  42.75  48.18  55.05  50.83
Australia   494  48.98  47.51  45.87  50.82  50.07
Austria     497  50.61  45.90  46.73  49.69  49.29
Belgium     507  45.99  45.07  43.03  53.60  54.59
Brazil      377  45.89  45.86  45.38  53.14  49.16
Canada      516  48.32  49.14  49.05  50.58  48.75

Now, the social skills (which I am interested in) are most connected to agreeableness. So this is what I am going to use for this analysis.

Let us first look at the distribution of the two variables that will be used.

ggplot(data_country, aes(x=Math)) + 
       geom_histogram(bins=20) +
       xlab("PISA Avreage Score Levels") + 
       ggtitle("Distribution of Mathematical Skills in Dataset")

Histogram of Mathematical Skills

ggplot(data_country, aes(x=A)) + 
       geom_histogram(bins=20) +
       xlab("Avreage Agreeablness Level") + 
       ggtitle("Distribution of Agreeablness in Dataset")

Histogram of Agreeableness Levels

And now for the connection.

ggplot(data_country, aes(x=Math, y=A)) + 
       geom_jitter(width=0.01, alpha=0.3) +
       ylab("Avreage Agreeablness Level") + 
       xlab("PISA Mathematical Level") + 
       ggtitle("PISA Math Skills and Agreeablness")

Scatterplot of Math Skill and Agreeableness Level

PISA results are standardized to a mean of 500 and a standard deviation of 100 across all the countries. The personality data took the US as the standard, with all countries scaled so that the US has a mean of 50 and a standard deviation of 10. Since different scalings can be a potential problem in statistical analysis, I will divide the PISA scores by 10. This way, both will be standardized on a similar (though not identical) scale.

data_country$Math <- data_country$Math/10
head(data_country)

Country    Math      E      A      C      N      O
Argentina  40.9  49.10  42.75  48.18  55.05  50.83
Australia  49.4  48.98  47.51  45.87  50.82  50.07
Austria    49.7  50.61  45.90  46.73  49.69  49.29
Belgium    50.7  45.99  45.07  43.03  53.60  54.59
Brazil     37.7  45.89  45.86  45.38  53.14  49.16
Canada     51.6  48.32  49.14  49.05  50.58  48.75

Now I will build a linear regression model, where I will try to predict the mathematical scores from agreeableness.

model_country <- lm(data_country$Math ~ data_country$A)

While the coefficient is -0.24 (so in the direction I would predict), the sample was not large enough to make it statistically significant. So this might just be random fluctuation.

summary(model_country)
Call:
lm(formula = data_country$Math ~ data_country$A)

Residuals:
   Min     1Q Median     3Q    Max 
-9.595 -2.489  1.317  3.140  6.736

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     58.4208    12.7601   4.578 4.91e-05 ***
data_country$A  -0.2426     0.2708  -0.896    0.376    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.596 on 38 degrees of freedom
Multiple R-squared:  0.02069,   Adjusted R-squared:  -0.005086 
F-statistic: 0.8026 on 1 and 38 DF,  p-value: 0.3759

ggplot(data_country, aes(x=Math, y=A)) + 
       geom_jitter(width=0.01, alpha=0.3) +
       geom_smooth(method=lm, color="red") + 
       ylab("Agreeableness") + 
       xlab("Math Skills") + 
       ggtitle("Connection Between Agreeableness and Math")

Graph of Connection Between Agreeableness and Math Skills

Just out of my own interest, I want to see which of the personality dimensions would have the largest effect. I will again use linear regression.

model_country_5 <- lm(data_country$Math ~ data_country$A + data_country$C + data_country$O + data_country$E + data_country$N)
summary(model_country_5)
Call:
lm(formula = data_country$Math ~ data_country$A + data_country$C + 
    data_country$O + data_country$E + data_country$N)

Residuals:
    Min      1Q  Median      3Q     Max 
-10.053  -2.184   1.636   2.686   5.458

Coefficients:
               Estimate Std. Error t value Pr(>|t|)   
(Intercept)    151.4562    54.6188   2.773  0.00895 **
data_country$A  -0.0740     0.3350  -0.221  0.82649   
data_country$C  -0.5721     0.3204  -1.786  0.08306 . 
data_country$O  -0.3000     0.2818  -1.065  0.29452   
data_country$E  -0.4551     0.6164  -0.738  0.46542   
data_country$N  -0.7286     0.4199  -1.735  0.09177 . 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.42 on 34 degrees of freedom
Multiple R-squared:  0.1896,    Adjusted R-squared:  0.07048 
F-statistic: 1.591 on 5 and 34 DF,  p-value: 0.189

It seems that when all the personality dimensions are added in, agreeableness has the smallest effect. Instead, it seems that neuroticism and maybe openness are much more interesting in this regard.