
Spending Based on Personality can Increase Your Happiness

I recently read a pretty interesting article. I think it is by now quite widespread knowledge that more money does not make one happier. It helps up to about $60,000 a year (based on US standards), and after that it has minimal effect. So it was pretty interesting to see a scientific article with a different thesis.

The article Money Buys Happiness When Spending Fits Our Personality talks about how people with different personalities spend, how well this predicts their happiness level, and whether this could be manipulated to increase people's happiness. The quick answers: people with different personalities spend money on different things; the more a person spends in accordance with their personality, the happier they are (a stronger predictor than total income or total spending); and this can be manipulated by making people spend money on certain things.

Let's get first to the point of personal relevance: how can this help me spend money in a way that makes me happier? Well, I used the aggregate data from the article and created a small program that helps you figure this out. You input your percentile scores for each Big Five trait, and it generates a score for each spending category in the dataset. Since there are a lot of them, you can set a threshold so that only categories above a certain score are shown.

If you don't have Big Five percentile results, there are a couple of places where you can get them. I recommend the SAPA test, though it can be quite long. If you have a long text written in English or a Twitter account where you tweet in English, you can also feed it into the IBM Personality Tool, which will generate a personality profile for you. But any test you can find on the internet that gives you results from 0 to 100 will work; just search for Big Five personality tests. There are also apps that infer the same type of results from other data; I know at least one exists that calculates it from a Facebook account.






[Interactive calculator: enter your Big Five percentile scores, then only activities with a score higher than the chosen threshold are shown.]

Based on my results, spending money on books would make me happier. Which I agree with, as most of my money after necessities (rent, food, ...) goes to books, DVDs, courses and conferences, with books being one of the most important parts.

Now that I have got my programming self-plug out of the way (I still hope it was useful to somebody), I will return to the article. First, I will copy the table of different categories and their correlations with the different personality traits. Here you can see some of the connections.

Category Openness Conscientiousness Extraversion Agreeableness Neuroticism
Accountants’ fees −1.81 2.02 −1.40 −0.68 −0.62
Advertising services 1.98 0.70 2.04 −0.04 0.34
Airports and duty-free shops −0.50 0.96 0.34 −0.18 −0.02
Arts and crafts 2.51 0.20 1.05 1.71 −0.46
Bakers and confectioners 1.45 1.59 0.86 1.41 −0.80
Books 1.71 1.92 −0.82 1.53 −1.39
Cable and satellite TV 0.48 0.00 1.29 −0.17 0.14
Car rentals −0.53 1.39 −0.06 0.31 −0.96
Caravans and camping 1.65 0.60 1.51 1.00 −0.64
Catalogue and bargain stores −0.34 −0.27 0.35 0.54 −0.21
Charities −0.35 1.65 0.10 2.31 −1.39
Cinemas 2.30 0.22 1.75 0.71 −0.02
Clothes 0.83 0.44 0.96 0.89 −0.44
Coffee shops 0.89 1.24 0.45 1.79 −1.23
Computers and technology 1.36 2.05 0.28 0.19 −1.00
Confectioners and tobacconists 0.75 0.21 0.77 0.42 −0.06
Days out and tourism 2.19 0.57 2.25 1.10 −0.28
Dental care −1.25 1.79 −0.59 0.32 −0.59
Department stores −0.30 1.28 0.70 0.57 −0.62
Digital 1.55 1.05 0.77 0.02 −0.45
Discount stores −0.17 −0.42 0.32 0.28 0.19
DIY projects 2.22 1.37 1.20 0.98 −0.54
Eating out: pubs 1.35 −0.41 2.22 0.40 0.48
Eating out: restaurants 1.56 0.44 1.74 0.91 −0.39
Entertainment 2.67 −0.43 2.51 0.31 0.49
Family clothes −0.28 0.43 0.00 1.16 −0.96
Florists 1.69 1.38 1.13 1.87 −0.98
Foreign travel 2.54 0.65 2.15 0.85 −0.11
Gambling 1.55 −2.08 2.33 −1.81 1.98
Gardening 0.59 1.75 −0.73 1.94 −1.59
Gift shops 0.83 0.94 0.55 1.74 −0.94
Hair and beauty 1.91 0.31 1.49 0.85 0.22
Hardware −0.78 1.73 −0.61 0.04 −1.22
Health and fitness 0.32 2.22 1.29 1.00 −0.93
Health insurance −1.61 1.52 −1.11 −0.16 −0.50
Home furnishing 0.63 1.48 0.17 1.38 −1.22
Home insurance −2.05 2.40 −1.46 0.33 −1.48
Hotels −0.16 1.69 0.31 1.55 −1.63
Information technology 0.93 1.36 0.33 0.15 −0.80
Jewelry 1.60 0.73 1.43 0.96 −0.61
Life insurance −1.30 2.21 −1.02 1.11 −1.25
Mobile telephone 1.02 1.33 1.65 0.33 −0.13
Motor sports 1.34 0.09 2.32 −0.55 0.82
Music 2.61 0.12 2.33 0.94 0.15
Newsagents −0.22 0.76 1.06 −0.29 0.12
Pets 1.14 0.08 2.04 1.98 0.24
Photography 2.33 0.69 1.44 1.09 −0.33
Residential mortgages −2.10 1.98 −1.40 −0.48 −0.85
Shoe shops 0.40 1.19 0.43 0.58 −0.77
Sports 1.44 1.30 2.24 −0.41 0.77
Stationery −0.14 1.98 −0.78 1.51 −1.63
Subscriptions −0.43 1.42 −0.26 0.44 −0.86
Supermarkets −0.69 1.27 0.51 0.58 −0.73
Takeout food 0.84 −0.07 1.16 0.23 −0.19
Toys and hobbies 2.19 −0.90 1.94 0.78 −0.06
Traffic fines −2.25 0.91 −0.58 −2.33 1.34
Travel 2.51 0.24 2.37 1.18 −0.20
TV license −0.17 1.29 0.26 −0.33 −0.39
Unions and subscriptions −1.04 1.26 0.42 −0.58 0.25
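Since the table is here anyway, the idea behind the scoring in my little program can be sketched in a few lines. This is a simplification: the centering of percentiles around 50 and the plain weighted sum are my own choices for the sketch, and only three categories from the table above are included.

```python
# Score spending categories against a Big Five profile. The coefficients
# are a small subset of the table above, in the order
# (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism).
CATEGORY_WEIGHTS = {
    "Books": (1.71, 1.92, -0.82, 1.53, -1.39),
    "Gambling": (1.55, -2.08, 2.33, -1.81, 1.98),
    "Charities": (-0.35, 1.65, 0.10, 2.31, -1.39),
}

def fit_scores(percentiles, threshold=0.0):
    """percentiles: (O, C, E, A, N), each 0-100. Centers each trait
    around 50 and takes a weighted sum with the category coefficients;
    only categories scoring above the threshold are returned."""
    centered = [(p - 50) / 50 for p in percentiles]  # map to -1..1
    scores = {
        category: sum(c * w for c, w in zip(centered, weights))
        for category, weights in CATEGORY_WEIGHTS.items()
    }
    return {c: round(s, 2) for c, s in scores.items() if s > threshold}

# An open, conscientious, agreeable introvert scores high on Books.
print(fit_scores((90, 80, 20, 70, 30)))
```

The threshold then simply hides categories that don't fit the profile.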

One of the experiments was also quite interesting. They recruited introverts and extroverts, gave half of each group a voucher for a book and the other half a voucher for a drink in a bar, and then, as they were making the purchase, asked them about their happiness. The extroverts were satisfied with both, but there was a large difference with the introverts: they were much happier with the book.

How can this help us? Well, for one, if you have no idea what to give somebody as a gift and you don't want to ask them, it can help you choose something they might be happier with. It can help you see whether you are spending more than average on something that makes you unhappy, so you can try spending less and see if that makes you happier. You can increase spending on things that make you happier. I think there are a lot of ways this can be used, just like most self-knowledge tidbits.

I have also added the Jupyter Notebook analysis of it.

Profiling from Blogs and from Social Media

On the first of April, I presented my cognitive science master thesis topic in class. The slides can be found here (in Slovenian). A very short summary: I am researching individual differences in sharing opinions on social media.

After the presentation, I was talking to my classmates. One of them asked whether this means that I am more careful with what I post on the internet. My reply was that I am posting on my blog (the one you are reading right now), and I don't mind if anybody tries to analyse me based on it.

But on the other hand, I don't really post things on social media (I really need to delete the last remaining accounts that I have). And for anything that I can get in stores here in Ljubljana, I use cash. So in a way, I am more careful about what data I am leaving behind, just in a different way.

I also studied business informatics, and this gave me a glimpse of what people can do with data. Once the data is cleaned and in databases/tables, there is a lot of information that can be gleaned from comparing users. This is how basic recommendation systems work: you find people or groups of people that have similar evaluations of the same works, then check which other works these people rated highly and recommend them to users who have not rated them yet.
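As a toy sketch of that idea (the ratings are made up, and the crude shared-likes overlap below stands in for a proper similarity measure such as cosine or Pearson correlation):

```python
# Toy user-based recommendation: find users who liked the same works,
# then suggest what else they liked that the target user has not rated.
ratings = {
    "ana":  {"Dune": 5, "Hyperion": 4, "It": 2},
    "bor":  {"Dune": 5, "Hyperion": 5, "Blindsight": 5},
    "cene": {"It": 5, "Misery": 4},
}

def recommend(user, min_shared=1):
    """Suggest items rated >= 4 by users who share at least
    min_shared liked items with `user`."""
    liked = {item for item, r in ratings[user].items() if r >= 4}
    suggestions = set()
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        their_liked = {item for item, r in their_ratings.items() if r >= 4}
        if len(liked & their_liked) >= min_shared:
            # Recommend only what the target user has not rated yet.
            suggestions |= their_liked - ratings[user].keys()
    return suggestions

print(recommend("ana"))
```

Real systems scale this up with matrix factorization and similar tricks, but the core comparison is the same.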

Another example is the information that can be gleaned from liking behaviour. There was a good article that showed how personality, gender and other attributes can be discovered through likes. People with different individual differences like different things, and this can be used to discover things about them.

But in order for this to happen, the data has to be in some sort of structured form. A blog is not exactly structured (unless it includes a lot of semantic web features); it is unstructured text, with maybe some videos and podcasts thrown in, and a picture or more as well. This means more work is needed to get these data into such a form (search engines still do it). So not everybody will do it to everybody.

The other reason I will borrow is from Jaron Lanier's book Ten Arguments for Deleting Your Social Media Accounts Right Now, and this is the BUMMER principle. But BUMMER only makes sense if the companies can get some money out of it. On my blog I have no advertisements, and I doubt I ever will, so what would be the point of doing it?

Does this mean there is no way people will abuse it? I mean, if a person is trying to target me directly, I am sure they will go through all the writings I wrote, trying to find something about me. But I am not afraid of that. I just don't want to be just another entity in a database.

Which is why I don't mind sharing information through the IndieWeb, which blogs are part of. And there is another plus from my side: the content is under my control. Nobody can delete my blog posts but me. Even if the servers go down and a country blocks my webpage, I still have my backup. I don't have this on any website that I do not control; I have already lost some of my data because a website stopped working. But here it depends only on whether I want to continue paying for the domain and hosting, and nothing else.

How I Managed to Force Myself to Finish my Master Thesis

I spent the last two weeks at the seaside, finishing my economics master thesis. The finished version has already been sent to my mentor, and I have to say that I am still recovering from it. I am still tired, probably mostly mentally.

But how did I manage to finish something in two weeks, when I had procrastinated on it for months? I mean, I knew in January that I had enough data, and I had procrastinated with that as well. I know that I did not do anything from January to April.

I think there were multiple reasons, most of them connected with solitude. Since it is not the tourist season yet, there were not many people there. I exchanged words with three people, and one of them was the lady selling bread. This freed my mental capacities, so I could concentrate on the master thesis only.

The second reason is similar in a way. Because I was there just for the master thesis, there was no context switching. I did not anticipate how much it would help not having to do this. Since I came back to the same project all the time, it was just easier to start.

The third was probably that I wanted to get it off my plate, and the grand gesture of travelling an hour and a half and putting everything else aside helped. I remember reading that grand gestures can help with motivation, and it really did.

I have to say that I still procrastinated. There were three days in these two weeks when I did nothing (one was the first day of menstruation, which is understandable). Of the rest, a day when I put in three hours of work was deemed unproductive, while on a productive day I could work 12 hours.

Which is probably why I was exhausted when I came back and why I am still tired.

I am still happy I did this, and I am already planning how I am going to repeat the experiment when I am finishing my cognitive science master thesis.

But I think what it also showed me is that I like working on one project at a time. I don't think this is quite possible in the workforce, especially if one is an entrepreneur. But maybe scheduling things that way would be helpful: work on one thing until it is done or I am stuck, either because I no longer know how to continue forward or because I am waiting for somebody else to do something.

I could also apply this to my leisure. I could stop reading articles that are interesting but are not my main focus. I should do these in daily bursts as well (or however long I am interested) and put the rest of the time towards a more focused approach.

Which is why today and tomorrow I am seeing how many of the notes that I have can be changed into blog posts, or deleted if they are no longer relevant. I have some unprocessed notes from 2017, which is horrifying, because, yes, they are almost two years old.

I don't know if this type of work is for everybody, but I don't know if I would have been convinced if I had not taken on this experiment. And I do recommend that everybody try it.

The Problem with Women in Tech Initiatives

When I was at PyConSK, I did my first lightning talk. And it was a rant (which, yes, I was aware of at the time) about women's initiatives for programming. I don't have anything against women in programming; I am a woman in programming. But I find the whole fascination with the 50% representation goal weird. Why is this even the goal at all? What would achieving it even mean?

I will first say something: I don't understand feminism. At least, I don't understand third-wave feminism; as for legal differences between genders, I can get behind why the world is most likely a better place without them. I did get an introduction to it from a somewhat unlikely source, but I still don't understand it. Not only that, I think knowing about it has had a negative effect on my life. Let me explain.

It started like this. I was abroad in Bratislava, and, along with three of my classmates from second-year cognitive science, I took the philosophy of artificial intelligence. Every two weeks we had a couple of articles and books to read, and then a discussion on them. In one of these, we had to read A Cyborg Manifesto by Haraway. I consider myself pretty smart, but when I got to the end of it, the only thing I felt was utter what-the-hell-did-I-just-read confusion. I think I read it again, because I was absolutely sure that I had got something wrong. But no, it was not any clearer afterwards.

Well, I knew that at least two of my three classmates were going to read all the articles, and I shared a Monday class with one of them. So after class I came to him and asked if he would be willing to explain the point of the article we had to read for the philosophy of AI class, since I did not understand it. He was willing, so he wanted to know which one, and I told him. His reply was that he liked the article, and he then asked me which part I did not understand. And my reply, to a guy who had said he liked the article seconds before, was sort of like: "Everything. The article was constantly talking about oppression, like it is just a given that it exists, without any explanation." His explanation? Well, apparently this is what third-wave feminism is.

I got a pretty interesting lecture/conversation out of it, and then I read a couple of books about feminism, trying to figure out what the hell this is. And after reading and thinking about it, I started noticing group gender ratios. The first thing I noticed was that since primary school, I had not been part of a single group that was more than half women. In primary school I had dancing (mostly if not all women) and handball (where trainings were women only). But even in primary school, my best results were in mathematics, logic and physics, and there were more men than women there, both in my school and in the competitions beyond the school level. Maybe if languages had suited me better, things would be different, because preparations for language competitions did include more women, but I was never good at them. I mean, my English teacher told me that she thought I would never be fluent in English.

Considering that people go into STEM when they have high numerical but lower verbal intelligence, that should have been the first sign for me to go into STEM from the start. Surprisingly, nobody ever tried to discourage me from that. And there were a lot of things people tried to discourage me from, but from doing the supposedly men-oriented stuff? Never. Not family, not society, not school. Instead they tried to discourage me from going to what was perceived as a high school for richer people (still public and free, but I would supposedly feel isolated and lonely - I guess sort of like this guy, though I could not read it to the end), from going into economics (this one was from my family, and I am starting to understand why), from learning languages (we already touched upon this, right :) ) and so on and so on.

But let me look at the things I am currently a part of. At the cognitive science lectures, we have a master seminar, where there might be slightly more women, but not by a lot, and for most of the study it was about even. At the Python meetups, there are mostly men, with just some women. At the UX meetups, there are more men than women, though the ratio is not as skewed as at the Python meetups. At the place I worked, well, I have seen one other woman so far, but I was not introduced to her. Instead I was introduced to a lot of men, and I had status report presentations where I was the only woman there. I guess I should be feeling isolated?

The problem is, I don't. For years I have sometimes been the only woman in the room, and I did not even notice. At the start of university, I was also often the youngest person in the room, by a large margin. In recent years that is not so true anymore. So I guess I should be feeling scared, like a victim? That there should be more people like me with me, so I would feel safer? More able to express myself? But the thing is, I entered these groups because we had similar interests (like programming), so it was not hard for me to be myself.

I can also see the effect on other people. For example, in my native tongue, each noun used for a person has both a female and a male form. And I remember the last time somebody used the male form of the word programmer to refer to me: they immediately started to apologize. And I was like, I don't care. I am sure most (but not all) women would not care either. I have heard stories from people who are afraid to say that they disagree with this doctrine, because they are white, strong males, and so supposedly have no right to say it; and that the only reason I can say it is because I am a woman. But apparently I am safe, because I am willing to say it from time to time.

Which is another good point: as long as I am willing to say what I believe in, there are always going to be people who agree with me on it, and this is a way to start finding them.

So, back to the point: I don't like being aware of the gender ratio of a group, because it is simply not important. I would rather have stayed ignorant of it. Because now I am aware of it, while still not understanding why this is a problem for some people.

I mean, somebody has to perceive it as a problem, because otherwise we would not have so many programs for teaching programming to women. In the article on stereotype threat, they do suggest that creating safe environments can help with fighting stereotype threat (assuming that this is a problem). But their examples are all like the ones above: gender-separated education in math, for example. And while I did not go deeper and read the original study, they made it sound like gender imbalances make people feel less belonging, and this leads them to have less interest in participating. So, they are basically another group of people that I don't understand?

I had an interesting conversation this week. I was out at lunch and the talk came to how women are less direct. I replied that this is also a problem for some men, and it would be easier if there were less of it overall. I did have to admit in the end that with women this problem is more frequent. But the interesting thing was the pause: they had never connected this problem to any males before. It was a classic stereotype.

There seem to be a lot of possible explanations for the differences that could be fixed, from stereotype threat making math less enjoyable and less interesting to women, as touched upon in the article on stereotype threat. But when I was reading the meta-analysis of gender differences, the biggest difference seems to be in interests: for example, there is a Cohen's d difference of about 1 between men and women. For some subsets of STEM, like science and maths, differences still exist, but they are smaller. Likewise, there are almost no differences in intelligence (when looking at the effect sizes), and only some in masturbation and porn use and in different illnesses, like depression and ADHD. Though this analysis checked differences in the mean; when researching intelligence, it seems there are no differences in the mean, but there are in the variability. Even so, if there is no difference in the mean, then the genders overlap a lot.
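For reference, Cohen's d is just the difference between two group means divided by their pooled standard deviation, so d = 1 means the group means are one standard deviation apart. A minimal sketch with made-up numbers:

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance, n-1 denominator
    var_b = statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Means exactly one pooled standard deviation apart give d = 1.
print(cohens_d([2, 3, 4], [1, 2, 3]))
```

With d = 1, the distributions still overlap substantially, which is why a large difference in average interests does not mean the groups are disjoint.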

And it seems that disposition towards mathematics, science and engineering (where there are differences, see above) and creative tendencies (I am not sure if there are any gender differences there) is what makes people enter STEM, as shown in this article. And it seems that if people are interested in more gender-stereotyped or reverse-gender-stereotyped activities as children, this continues into the future, as shown in this article. But there are also studies showing that the more women pursue romantic endeavours, the less interested they are in math. This reminds me of a podcast I listened to recently. One of the points (among many) was that we need to pick the life we want to live, and imitating somebody's job only works if we also take their lifestyle with it. What this makes me wonder is: what lifestyle do people in STEM have? The other thing it reminds me of is the difference between Empathizing and Systematizing.

When did this piece become an excuse to dump all the facts about gender differences?

Well, the point I was trying to make is that men and women are, on average, different. That means there will never be a 50% gender ratio in most things. And pushing for that, just to make some people feel less isolated, is a goal that I do not understand, and it might not even be needed if we put less emphasis on gender ratios in the first place.

How to Find a Job

This week, on my way back from school (I think it was yesterday, but it seems like much more time has passed), I ended up talking with a couple of schoolmates. While we were talking, I remembered the conversations I had at PyConSK, especially how Exponia (I hope I wrote their name right) and Kiwi take all the programmers, and how one person had to 'import' them from Ukraine. Since one of the classmates I talked to expressed his wish to work as a freelance programmer abroad, I shared this information.

What I found interesting was the reaction of another classmate present there. It was sort of a call for more jobs for non-programmers. I think the main point was that programmers have a lot of job opportunities, and the rest don't. In this she is at least partly right. I mean, my last two job interviews for programming jobs were basically 'the job is yours, if you want it'. And I hear from other programmers that if you have a profile on LinkedIn, you constantly get contacted by recruiters (I am not on it, so I cannot confirm this). But programmers are not the only ones like that; I know that at least mechanical engineers are in the same position.

But I have been thinking about it, and I don't think this is the main reason. I mean, even before I started to work as a programmer, if I wanted a specific job, I knew whom to ask to find out what I needed to do to get it (assuming it can be done as a paying job). Sure, people would not throw jobs at me, but it was possible. On the other hand, from a very limited sample of talking to the people I study with, they just don't know how to attempt this.

So I have been thinking about the reason. I think the main reason is, surprisingly, my lack of a socialization drive. I know, it sounds weird, but let me explain in more detail. From the middle of primary school to high school, I did not spend a lot of time with my classmates. In my whole time in high school, I was invited to fewer than 10 parties. Which was great, since that left me with more time and energy to read books.

But then, at the beginning of university, I wanted to become an entrepreneur. So I started reading books about it, and one of the topics that resurfaced a lot was the importance of networking.

Now, here is a person who doesn't get this whole social thing, trying to start networking. So what happens? Well, I go to lectures (outside classes) and start having conversations with the lecturers. I start attending Toastmasters. I start going to meetups. Usually alone, which, from what I heard, is not standard practice for a lot of people, especially those still in school. Which meant I needed to 'socialize' (yes, the quotes are deliberate) with the people there. Which meant I always associated with people older than me, most of them having jobs and so on.

Let's take Toastmasters as an example. I was attending meetings there with an HR person from Dars (the company responsible for all the highways in Slovenia), a diplomat from the Foreign Ministry, a tourist guide, a programmer who was starting his own start-up, a psychiatrist in training, a project manager (I think; I am not sure about the exact position) from Krka, a person employed at a company providing SAP support, a person who ended up working for Google (I have no idea what he did beforehand), and so on. I then went abroad and could immediately make connections with the people there, ranging from an English teacher to a trainer hosting seminars.

If I had wanted to find a job at the time, it would have been a hell of a lot easier than if my circle of friends had consisted of my classmates. Which is also one (among many) of the reasons why I am against women-only places in programming. The facts say what they say: right now there are more male programmers, and more employed programmers among them as well. So they will be the ones who can provide the best advice and the best opportunities.

Maybe, at least for me, these meetings have always been a way to force myself to be social, so I never wanted anything from these groups. That does not mean that some things were not thrown at me (ranging from very good to very bad). But I would recommend anybody give it a try. Worst-case scenario, you meet some people that you realize you never want to meet again. Best-case scenario, you show some wish and some incentive to do something about it, and people will start throwing opportunities at you. :)

The Characters from Arrowverse Appearing in the Same Stories in Fanfiction

I am (most likely) going to be analyzing fanfiction data for my master thesis. Since I already had this data available, I decided to try and see if I could come up with some interesting analysis.

One of the things I am interested in is the relationships between people. I wanted to see which characters appear together in the stories. For this, I used the tags of the stories and tried to analyse when they appear together.

import sqlite3
import os
import re
import bs4
import pandas
import networkx
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy
import community
import json
database_file_name = "sqldata_arrowverse.sql"
folder_with_stories = "data"

First I needed to get all the character tags from the database (that I had collected beforehand).

re_remove_middle_names = r'(".*?")'
sql_database = sqlite3.connect(database_file_name)
cursor = sql_database.cursor()
cursor.execute("""DROP VIEW IF EXISTS all_tags;""")
# Keep only character tags that appear in at least 100 works, excluding
# crossover characters, reader inserts and team/group tags.
cursor.execute("""CREATE VIEW all_tags AS
SELECT work, tag FROM tags WHERE category='Character' AND tag IN
(SELECT tag
FROM tags
WHERE category='Character'
AND tag NOT IN ('Jason Todd', 'Alfred Pennyworth', 'James "Bucky" Barnes', 'Team Legends', 'Rogues',
'OC - Character', 'Sam Winchester', 'Sebastian Smythe', 'Stiles Stilinski', 'Barbara Gordon',
'Original Character', 'Dawn Allen', 'Dean Winchester', 'Clint Barton', 'Hal Jordan', 'Tony Stark',
'Steve Rogers', 'Dick Grayson', 'Original Child Character(s)', 'Original Male Character(s)',
'Diana (Wonder Woman)', 'You', 'Bruce Wayne', 'Reader', 'Original Female Character(s)',
'Original Characters', 'Batman', 'Selina Kyle', 'Original Metahuman Character',
'Team Flash', 'Team Flash (The Flash TV 2014)', 'Team Legends (DC''s Legends of Tomorrow)',
'Original Metahuman Character(s)', 'Rogues (The Flash)')
GROUP BY tag
HAVING count(tag) > 99
ORDER BY count(tag) DESC);""")
# Count, for each pair of characters, the works they are tagged in together.
cursor.execute("""SELECT t1.tag AS tag1, t2.tag AS tag2, count(*)
FROM all_tags t1
INNER JOIN all_tags t2 ON =
AND t1.tag <> t2.tag
GROUP BY t1.tag, t2.tag
ORDER BY count(*) DESC;""")
tags_together = cursor.fetchall()
cursor.execute("SELECT id FROM work")
works_number = len(cursor.fetchall())
cursor.execute("""SELECT tag, count(*) FROM tags 
WHERE category='Character' AND tag IN (SELECT tag FROM all_tags) 
GROUP BY tag""")
tags_number_by_person = cursor.fetchall()

Since some characters have multiple names they can be referred to by (it is a superhero franchise, so a lot of people have at least a superhero name), I am doing some preprocessing in order to deal with this.

combine_people_dict = {"The Flash - Character": "Barry Allen", 
                       "Killer Frost": "Caitlin Snow", 
                       "Harrison Wells | Eobard Thawne": "Eobard Thawne",
                       "Eobard Thawne | Harrison Wells": "Eobard Thawne",
                       "Zari Adrianna Tomaz": "Zari Tomaz",
                       "Supergirl - Character": "Kara Danvers",
                       "Kara Zor-El": "Kara Danvers",
                       "Alura In-Ze | Alura Zor-El": "Alura Zor-El",
                       "Jimmy Olsen": "James Olsen",
                       "J'onn J'onzz | Hank Henshaw": "J'onn J'onzz",
                       "Hank Henshaw | J'onn J'onzz": "J'onn J'onzz",
                       "mon-el": "Mon-El",
                       "Harry Wells": "Earth-2 Harrison Wells",
                       "Jay Garrick | Hunter Zolomon": "Zoom",
                       "Winn Schott Jr.": "Winn Schott",
                       "Captain Cold": "Leonard Snart",
                       "Jess the Secretary": "Jess"}
tags_together_dict = dict()
for person1, person2, count in tags_together:
    # This name contains quotes itself, so handle it before the
    # quoted-nickname removal below.
    if person1 == 'Harrison "Harry" Wells':
        person1 = "Earth-2 Harrison Wells"
    if person2 == 'Harrison "Harry" Wells':
        person2 = "Earth-2 Harrison Wells"
    # Drop the parenthesized show name from the tag.
    person1 = person1.split("(")[0].strip()
    person2 = person2.split("(")[0].strip()
    # Remove quoted nicknames from the middle of names.
    string_to_remove_1 = re.findall(re_remove_middle_names, person1)
    string_to_remove_2 = re.findall(re_remove_middle_names, person2)
    if string_to_remove_1:
        string_to_remove_1 = string_to_remove_1[0]
        person1 = person1[:person1.index(string_to_remove_1) - 1] + person1[person1.index(string_to_remove_1) + len(string_to_remove_1):]
    if string_to_remove_2:
        string_to_remove_2 = string_to_remove_2[0]
        person2 = person2[:person2.index(string_to_remove_2) - 1] + person2[person2.index(string_to_remove_2) + len(string_to_remove_2):]
    # Map aliases and spelling variants to one canonical name per character.
    if person1 in combine_people_dict:
        person1 = combine_people_dict[person1]
    if person2 in combine_people_dict:
        person2 = combine_people_dict[person2]
    # Accumulate co-appearance counts per (person1, person2) pair.
    if not person1 in tags_together_dict:
        tags_together_dict[person1] = dict()
    if not person2 in tags_together_dict[person1]:
        tags_together_dict[person1][person2] = 0
    tags_together_dict[person1][person2] += count
tags_person_dict = dict()
for person, count in tags_number_by_person:
    tags_person_dict[person] = count

So, now that I have done the preprocessing of people and connections, I have my first data: the number of stories each character appears in. Kara seems to be the most popular.

tags_person_pandas = pandas.DataFrame.from_dict(tags_person_dict, orient="index", columns=["Count"])
tags_person_pandas.reset_index(level=0, inplace=True)
tags_person_pandas.sort_values("Count", ascending=False, inplace=True)
              index  Count
80     Kara Danvers  17055
118    Oliver Queen  15330
10      Barry Allen  14777
45   Felicity Smoak  13503
2      Alex Danvers  12858
87      Lena Luthor   9482
88    Leonard Snart   8789
140      Sara Lance   8125
19      Cisco Ramon   8112
12     Caitlin Snow   6792
all_relationships = []
for person1 in tags_together_dict:
    for person2 in tags_together_dict[person1]:
        all_relationships.append(tuple([person1, person2, {"weight": tags_together_dict[person1][person2]}]))

So now that we have this, let us try to visualize the whole network of people.

S = networkx.Graph()
S.add_nodes_from([a for a in tags_together_dict])
S.add_edges_from(all_relationships)
edge_color = numpy.linspace(0, 1, len(S.edges()))
networkx.draw(S, edge_color=edge_color)


Even limiting it to just the characters that appear in at least 100 stories (which is between 0.1% and 0.2% of stories), there is not a lot that can be seen from the graph. So the next step is to also limit it to connections that exist in more than 100 stories.

lowest_weight = 100
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[2]["weight"] > lowest_weight])
edge_color = numpy.linspace(0, 1, len(S.edges()))
networkx.draw(S, edge_color=edge_color, with_labels=True)


Looking at the graph above, there seems to be a clear divide between Supergirl and the rest of the series. The other three series have some differences, but there are a lot more connections between them. Which is interesting: it is as if the audience for Supergirl is different from the audience of the other three shows.

In the next part, I want to see which characters are the most influential.

pagerank = pandas.DataFrame.from_dict(networkx.pagerank(S, weight='weight'), orient="index", columns=["PageRank"])
pagerank.reset_index(level=0, inplace=True)
centrality = pandas.DataFrame.from_dict(networkx.degree_centrality(S), orient="index", columns=["Centrality"])
centrality.reset_index(level=0, inplace=True)
betweenes = pandas.DataFrame.from_dict(networkx.betweenness_centrality(S, weight='weight'), orient="index", columns=["Between"])
betweenes.reset_index(level=0, inplace=True)
ranking = pagerank.merge(centrality, left_on='index', right_on='index')
ranking = ranking.merge(betweenes, left_on='index', right_on='index')

The first one that I want to look at is PageRank. This is the one that takes the strength of the connected nodes into account when calculating it. So somebody who is not well connected, but is connected to well-connected people, could still have a high PageRank.

In this regard, three of the four leads are at the top (the Legends of Tomorrow lead is the one missing). Kara is leading in this one.

ranking.sort_values("PageRank", ascending=False).head(10)
                index  PageRank  Centrality   Between
25       Kara Danvers  0.057478       0.488  0.117742
7         Barry Allen  0.050267       0.592  0.097677
1        Oliver Queen  0.048903       0.512  0.116387
0      Felicity Smoak  0.040533       0.440  0.065419
34       Alex Danvers  0.040261       0.352  0.058516
10        Cisco Ramon  0.034321       0.464  0.032194
5          Sara Lance  0.032335       0.440  0.052516
9        Caitlin Snow  0.031984       0.488  0.070516
48        Lena Luthor  0.029198       0.288  0.079419
21      Leonard Snart  0.029180       0.360  0.025613
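To make the intuition above concrete, here is a hedged toy example (the node names are made up, and I use a directed graph, since PageRank was originally defined over directed links): a node whose only incoming link comes from a hub ends up ranked above a node whose only incoming link comes from a peripheral node.

```python
import networkx as nx

# Toy graph: "hub" is pointed to by three nodes and passes its rank on to
# "minor"; "other" only receives rank from the peripheral node "x".
G = nx.DiGraph()
G.add_edges_from([
    ("a", "hub"), ("b", "hub"), ("c", "hub"),  # hub collects rank
    ("hub", "minor"),                          # ...and hands it on to minor
    ("x", "other"),                            # other gets rank from a nobody
])
pr = nx.pagerank(G)
# "minor" has a single incoming link, just like "other", but because that
# link comes from a well-connected node, its PageRank is higher.
```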

The next one is degree centrality, which just means that a person has relationships with many other characters. The same three people (four, since Kara shares her spot with Caitlin) are at the top. But people like Caitlin and Cisco rose, while people like Felicity Smoak fell. Some, like Lena Luthor and Alex Danvers (both from Supergirl), are no longer there, while people like Mick Rory and Iris West appeared (both from Flash).

ranking.sort_values("Centrality", ascending=False).head(10)
                index  PageRank  Centrality   Between
7         Barry Allen  0.050267       0.592  0.097677
1        Oliver Queen  0.048903       0.512  0.116387
25       Kara Danvers  0.057478       0.488  0.117742
9        Caitlin Snow  0.031984       0.488  0.070516
10        Cisco Ramon  0.034321       0.464  0.032194
12          Iris West  0.028664       0.448  0.124968
0      Felicity Smoak  0.040533       0.440  0.065419
5          Sara Lance  0.032335       0.440  0.052516
21      Leonard Snart  0.029180       0.360  0.025613
23          Mick Rory  0.025047       0.360  0.030258
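Degree centrality itself has a very simple definition: the fraction of all other nodes a character is directly connected to. A star graph makes this visible (a toy sketch, not the blog's data):

```python
import networkx as nx

# A star with one centre (node 0) and four leaves (nodes 1-4).
G = nx.star_graph(4)
dc = nx.degree_centrality(G)
# The centre touches all 4 other nodes: centrality 4/4 = 1.0.
# Each leaf touches only the centre:    centrality 1/4 = 0.25.
```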

And the last one is betweenness. A node with high betweenness is a bridge between different clusters, and removing it would lengthen the paths between other people. And here Iris West at the top does not really make sense.

ranking.sort_values("Between", ascending=False).head(10)
                index  PageRank  Centrality   Between
12          Iris West  0.028664       0.448  0.124968
25       Kara Danvers  0.057478       0.488  0.117742
1        Oliver Queen  0.048903       0.512  0.116387
7         Barry Allen  0.050267       0.592  0.097677
48        Lena Luthor  0.029198       0.288  0.079419
9        Caitlin Snow  0.031984       0.488  0.070516
0      Felicity Smoak  0.040533       0.440  0.065419
34       Alex Danvers  0.040261       0.352  0.058516
64        James Olsen  0.015777       0.232  0.055677
5          Sara Lance  0.032335       0.440  0.052516

Because Iris does not make much sense, I calculated the unweighted betweenness. Kara makes a lot more sense here, since she is the connecting node between her Earth-38 and Earth-1. This is also why Oliver and Barry make sense, since they were usually the ones crossing over together with Kara.

betweenes_2 = pandas.DataFrame.from_dict(networkx.betweenness_centrality(S), orient="index", columns=["Between2"])
betweenes_2.reset_index(level=0, inplace=True)
ranking = ranking.merge(betweenes_2, left_on='index', right_on='index')
ranking.sort_values("Between2", ascending=False).head(10)
                index  PageRank  Centrality   Between  Between2
25       Kara Danvers  0.057478       0.488  0.117742  0.210268
1        Oliver Queen  0.048903       0.512  0.116387  0.141382
7         Barry Allen  0.050267       0.592  0.097677  0.140181
10        Cisco Ramon  0.034321       0.464  0.032194  0.070581
34       Alex Danvers  0.040261       0.352  0.058516  0.069021
5          Sara Lance  0.032335       0.440  0.052516  0.065604
0      Felicity Smoak  0.040533       0.440  0.065419  0.062297
9        Caitlin Snow  0.031984       0.488  0.070516  0.061614
12          Iris West  0.028664       0.448  0.124968  0.058943
48        Lena Luthor  0.029198       0.288  0.079419  0.045495
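One likely reason the weighted betweenness looked odd is that networkx interprets the `weight` argument of `betweenness_centrality` as a *distance*: shortest paths avoid edges with large weights, so a heavily co-tagged pair counts as a long detour rather than a strong tie. Inverting the counts before the calculation is a common workaround (a sketch with made-up numbers, not the blog's data):

```python
import networkx as nx

# Triangle where A-B and B-C are "strong" ties (1000 co-tagged stories)
# and A-C is a weak direct tie (1 story).
G = nx.Graph()
G.add_weighted_edges_from([("A", "B", 1000), ("B", "C", 1000), ("A", "C", 1)])

# Treating raw counts as distances, the shortest A-C path is the weak direct
# edge, so B carries no shortest-path traffic at all.
raw = nx.betweenness_centrality(G, weight="weight")

# Inverting the counts turns strong ties into short distances, so shortest
# paths now route through B instead.
H = nx.Graph()
H.add_weighted_edges_from(
    [(u, v, 1.0 / d["weight"]) for u, v, d in G.edges(data=True)])
inv = nx.betweenness_centrality(H, weight="weight")
```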

Next, I wanted to see if the algorithm could find any communities in the data. Looking at the picture below, it did a much better job than I expected. Most people got correctly assigned to the series they appear in.

partition = community.best_partition(S, weight="weight")
size = (len(set(partition.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition.values()):
    list_nodes = [nodes for nodes in partition.keys()
                                if partition[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)


Here is the code to save the graph above in a JSON file, for visualization on the website.

all_characters_as_numbers = dict()
for i, name in enumerate(S.nodes()):
    all_characters_as_numbers[name] = i
nodes = [{'name': all_characters_as_numbers[i], 'label': i, 'fandom': str(partition[i])} for i in S.nodes()]
links = [{'source': all_characters_as_numbers[u[0]], 'target': all_characters_as_numbers[u[1]]} for u in S.edges()]
with open('graph2.json', 'w') as f:
    json.dump({'nodes': nodes, 'links': links},
              f, indent=4,)

Now I wanted to visualize the different communities separately. I put a higher limit on the connections, so an edge is only present if the two characters are tagged together in more than 500 stories. This makes the graphs more understandable.
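The four blocks below repeat the same filter-and-draw steps for each community, so the edge filtering could be factored into a small helper (a sketch; the function name and signature are my own, not from the original notebook):

```python
def community_edges(partition, group, relationships, min_weight=500):
    """Keep only the edges whose two endpoints both belong to `group`
    and whose weight is above `min_weight`."""
    members = {name for name, g in partition.items() if g == group}
    return [a for a in relationships
            if a[0] in members and a[1] in members
            and a[2]["weight"] > min_weight]
```

The returned list can then be fed straight to `S.add_edges_from(...)`, exactly as in the blocks below.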

lowest_weight_subgraph = 500
nodes_arrow = {item: group for item, group in partition.items() if group == 0}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_arrow and a[0] in nodes_arrow and a[2]['weight'] > lowest_weight_subgraph])
partition_arrow = community.best_partition(S, weight="weight")
size = (len(set(partition_arrow.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_arrow.values()):
    list_nodes = [nodes for nodes in partition_arrow.keys()
                                if partition_arrow[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)


nodes_lot = {item: group for item, group in partition.items() if group == 1}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_lot and a[0] in nodes_lot and a[2]["weight"] > lowest_weight_subgraph])
partition_lot = community.best_partition(S, weight="weight")
size = (len(set(partition_lot.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_lot.values()):
    list_nodes = [nodes for nodes in partition_lot.keys()
                                if partition_lot[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)


nodes_flash = {item: group for item, group in partition.items() if group == 2}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_flash and a[0] in nodes_flash and a[2]["weight"] > lowest_weight_subgraph])
partition_flash = community.best_partition(S, weight="weight")
size = (len(set(partition_flash.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_flash.values()):
    list_nodes = [nodes for nodes in partition_flash.keys()
                                if partition_flash[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)


nodes_supergirl = {item: group for item, group in partition.items() if group == 3}
S = networkx.Graph()
S.add_edges_from([a for a in all_relationships if a[1] in nodes_supergirl and a[0] in nodes_supergirl and a[2]["weight"] > lowest_weight_subgraph])
partition_supergirl = community.best_partition(S, weight="weight")
size = (len(set(partition_supergirl.values())))
pos = networkx.spring_layout(S)
count = 0
colors = [cm.jet(x) for x in numpy.linspace(0, 1, size)]
labels = {node[0]: node[0] for node in S.nodes(data=True)}
for com in set(partition_supergirl.values()):
    list_nodes = [nodes for nodes in partition_supergirl.keys()
                                if partition_supergirl[nodes] == com]
    networkx.draw_networkx_nodes(S, pos, list_nodes, node_size=2000, node_color=[colors[count]])
    networkx.draw_networkx_labels(S, pos, labels, font_size=20, alpha=0.8)
    count = count + 1
networkx.draw_networkx_edges(S, pos)


For the end, I played a little with visualization in JavaScript, mostly to eventually be able to put the visualization directly in the blog, as part of the storytelling. Here are the code and the current result.

<div id="d3-example"></div>
.node {stroke: #fff; stroke-width: 1.5px;}
.link {stroke: #999; stroke-opacity: .6;}
// We load the d3.js library from the Web.
require.config({paths:
    {d3: ""}});
require(["d3"], function(d3) {
  // The code in this block is executed when the
  // d3.js library has been loaded.

  // First, we specify the size of the canvas
  // containing the visualization (size of the
  // <div> element).
  var width = 500, height = 500;

  // We create a color scale.
  var color = d3.scale.category10();

  // We create a force-directed dynamic graph layout.
  var force = d3.layout.force()
    .size([width, height]);

  // In the <div> element, we create a <svg> graphic
  // that will contain our interactive visualization.
  var svg ="#d3-example").select("svg");
  if (svg.empty()) {
    svg ="#d3-example").append("svg")
          .attr("width", width)
          .attr("height", height);
  }

  // We load the JSON file.
  d3.json("graph2.json", function(error, graph) {
    // In this block, the file has been loaded
    // and the 'graph' object contains our graph.

    // We load the nodes and links in the
    // force-directed graph.
    force.nodes(graph.nodes)
      .links(graph.links)
      .start();

    // We create a <line> SVG element for each link
    // in the graph.
    var link = svg.selectAll(".link")
      .data(graph.links)
      .enter().append("line")
      .attr("class", "link");

    // We create a <circle> SVG element for each node
    // in the graph, and we specify a few attributes.
    var node = svg.selectAll(".node")
      .data(graph.nodes)
      .enter().append("circle")
      .attr("class", "node")
      .attr("r", 5)  // radius
      .style("fill", function(d) {
         // The node color depends on the fandom.
         return color(d.fandom);
      })
      .call(force.drag);

    // We bind the positions of the SVG elements
    // to the positions of the dynamic force-directed
    // graph, at each time step.
    force.on("tick", function() {
      link.attr("x1", function(d){return d.source.x})
          .attr("y1", function(d){return d.source.y})
          .attr("x2", function(d){return})
          .attr("y2", function(d){return});

      node.attr("cx", function(d){return d.x})
          .attr("cy", function(d){return d.y});
    });
  });
});

My First Python Conference at PyConSK

Right now, as I am writing this, I am sitting on the bus from Bratislava to Ljubljana. This weekend I attended my first Python conference, PyConSK 2019. I have been hearing every single month about how great software developers' conferences are (along with some weird stories about pants, which I am glad I did not witness at this conference :) ). I have to admit that I caved to the peer pressure and decided to check whether they are really what they are hyped up to be (peer pressure is going to play a role later as well). Considering the hype, I thought there should be something more, but I can now understand why people hype them, as this one at least was great.

My preparation for this was basically asking a friend of mine what these conferences are like and what his recommendations would be. According to him, there would be too much information to process all on the fly, and it would be a great social opportunity. Which, alright, was helpful for my expectations, but not really for deciding what I wanted to do at the conference.

So, my plan for the conference was just to go there and observe, and then next time I would know what to do. Spoiler alert: that is not what I ended up doing at all. I went from i-have-no-idea-why-i-am-doing-this-somebody-stop-me on the first day to being surprisingly comfortable presenting and socializing on the third day. I don't know if the same is true for all software development communities, but the Python community is the most accepting and open community that I have ever been a part of. It is something that is crystal clear to me, but maybe it does not come up enough. A point that I will also be returning to later.

So, when I arrived there, I talked to two of the guys. I have to admit, I don't remember either of their names, though I could check one, since I attended his talk later during the conference. Sometimes I don't know if I leave out the names because people might be more privacy conscious, or because I don't remember most of them and I am trying to be consistent. Well, anyway, I got my first tips there. First, the description that I got as preparation for the conference was accurate. The second was to check out the Django Girls tutorial, which has easy examples of very simple concepts. The same ones that are too obvious for me to explain to the programming novices that I sometimes tutor (not that I am a very good tutor).

As far as the morning session goes: I will remember Arvil, in case somebody needs a site without too much complication but something more than a static website. But I (so far) don't see myself using it. I am now excited to try MindsDB, both because data science is something I am interested in and because they were convincing. The space talk was nice, but I don't remember anything from it that would be helpful to retain. (I did take notes during the presentations, but I am writing this summary from my head, just to see what my mind found important enough to remember.)

Then it was lunch, and I was positively surprised that lunch was included. I ended up talking to a woman. We discussed many things, but the one that will come up later is that this was her first tech conference, if she did not count the women-in-tech conferences she had attended, and how those are organized to bring the gender ratio to 50:50.

After lunch, I attended the workshop on passwordless authentication, where a person gets a link in their email to log in with. We went through an example, and even though it did not work in the end (firewall problems or something), the code itself seems simple enough. I already have a personal project in mind where I am going to try using it. The code is here (I am going to assume that since it is on GitHub, it is alright to link to it?).

After the workshop, I went back to listening to the lectures. The GitHub bots talk gave me one or two potential ideas, but the workshop that happened later was more helpful for me. I think this is when they announced that it was possible to apply for a lightning talk, because I don't remember even the main message of the next talk. There were unwanted thoughts in my head the whole time, which I will expand on in the next paragraph. I then applied for a lightning talk, but I was still not very present for the talk after that. I remember there was some comparison with Facebook and some stock prices, but I also have no idea what the main message was. The last one was a funny talk about time zones, along with a pointer to a timezone database that I will be using in the future. It seems like a fun database to play with (not to mention that, with historical data, there are a lot of things I could visualize).

Now, what did I mean by the unwanted thoughts in the previous paragraph? There were a couple of things that were maybe a bit unusual. Back in Ljubljana, I have this friend that I have known for years. One conversation with him eventually led to the creation of this blog, years ago. And we have recently been seeing each other at the Python meetups. I knew that I would eventually do lightning talks and main talks at the meetup, but I wanted to take a lot more time. After my main presentation there, he made a comment that the next one was going to be at a conference, with a standing ovation. And I was already primed for a rant because of the comment at lunch about the women-in-tech conferences. Not that the person I was speaking with put me in that state; it was the whole idea of equality of outcomes, which will never happen without some version of totalitarianism (which I really don't like). And then the poor organizer had to open the floodgates with the announcement about the lightning talks. I am aware that it was not his fault; he was just doing his job. And I remembered the lightning talk that I had given at the Python meetup before. And then, for the entire 40-minute lecture, the voice of that friend of mine and his comments played on a loop, with me trying to convince myself not to do it.

Well, it did not work. Like, at all. And I realized that I would not be able to talk myself out of it. So I went to the place where one could apply for a lightning talk, with the intention that they would talk me out of it. Well, things did not go according to plan. They were all so encouraging and nice and helpful and tried to convince me to just apply. Which was the opposite of what I wanted at that time. But there was a catch: if there were too many applicants, they would randomly choose some. So even if I applied, there was still a chance I would not have to do it. It would have been better if the choice were not random, but as long as there was a chance of not doing it, I would take it.

Things once again did not go according to plan. There were only four people who applied, so I did have to go on stage. To make matters worse, the first one was a quite well prepared speech about a framework (I think it was about parallelization, I think this one: ). The second one was the one where he showed us a keyboard hack, and to convince us about it, he spoke in something like ten different languages. And then there was me, and an announcement for another Python conference (I think it was the one in Berlin?).

My presentation was based on my agent-based model for understanding the gender differences in STEM. It was about 3 minutes long, and it was a rant. I think they will put the video up eventually, so I will link to it when I notice that it is up. And afterwards, I was mortified.

Spoilers: it did end up being a good idea.

I decided not to go to the party (I was still under stress from the lightning talk). I walked back with the woman whose comment helped inspire all this, and then wrote an angry email to the friend who was responsible for the voice in my head. I do regret that email a bit now, but I know that until I have a bad reaction to an angry email, I will keep sending them (not that I send many). So, if by any minuscule chance that person ever reads this blog, he knows that I was talking about him. He is dangerous, but I would count him as somebody who has had a noticeable positive effect on my life (this blog would be example enough, and that was not the only thing).

So I went to sleep early, and then there was day two. I got lost during the Django talk; I think the reason was mostly that the talk was quite detailed, but I had never before even tried Django's ORM. Otherwise, I think I would have found it a lot more interesting. Then there was an interactive talk about Google APIs, which was fun, but I don't need them for professional projects (especially since data privacy is a big thing where I work) and I don't see myself using them in personal projects. Maybe in the future. The next talk was about OAuth (Slides), which is great, because I will likely have to implement some sort of authentication service, and this seems like a good way. The final decision is not mine, but the talk convinced me that this would be the first approach I would try. Then I listened to a talk presenting some weird stuff in Python that (if I understood correctly) is there for optimization. Like a list inside a tuple being changed on the same line where the error is thrown.

I don't really remember lunch. I remember eating something, but that was it. I think I might have ended up outside, with a cup of tea, in the sun?

So, after lunch I attended the workshop about teaching algorithmic efficiency, where I was clearly not the target audience, but it was a nice overview. Then I decided to play hooky and go for a drink with one of my sort-of former classmates. That was fun. When I came back, I listened to the talk about micro:bits (still not sure how they are different from a Raspberry Pi), participated in the quiz (apparently Python was released the same year I was born) and listened to the lightning talks (this time I did not participate).

The lightning talks included a lot of announcements of future conferences. There was a guy who is carrying messages from one conference to another. There was a really interesting one about IndieAuth, which builds on the concepts of the IndieWeb, which I thought was an interesting idea when I first encountered it. I can't remember any others off the top of my head.

Next up was socializing. I talked to way too many people, some for longer, some for shorter amounts of time. There was a guy with existentialist problems; there was a woman who was about as old as I am, but had a resume longer than some of the 40-something people I know and more impressive than others'. There was this one guy who was interested in cognitive science, and the foodie woman. There was one who really encouraged me to learn some soft skills. Then there was another guy with whom I talked about the quality of talks at conferences, a guy that I only remembered one day later that I had talked to years ago (thankfully, he did not remember me either), and a nice guy with a guitar. And I probably missed a lot. (Yes, I remember other things about them, but I don't think I remember a single name.)

I did not get back to the hostel until three in the morning.

The next day, I decided to come a bit later, since the English program did not start until 10 o'clock. I enjoyed a good cup of tea while I was waiting. I got talking to a pair of people, and ended up being late to the workshop.

The workshop itself was helpful for three reasons. It was about graph analysis, which I will have to know for my cognitive science master's thesis, and I realized that most of the things I had figured out on my own are considered the basics (which is a lot of times not the case for me). Next, I knew that I would eventually have to deal with communities, and this was a nice intro (on Game of Thrones data). And I had run into a small problem involving weights in a small analysis I was doing, and the US airport example gave me a good idea for tackling it in a new way. So, overall a good workshop.

Then it was already lunch, where I ate and then spent a lot of time talking to the guy from yesterday (the one really interested in cognitive science).

Then I attended the GitHub bots tutorial. They were using the Octomachinery tutorial. I had to leave before the end, since there is only one direct bus from Bratislava to Ljubljana, and I wanted to catch it, because when I was buying the tickets I was not sure how sleepy I would be. Well, instead of sleeping, I have been writing this blog for the last two and a half hours. But I think I am at the end.

My overall impression (which I know I should wait to process everything, but whatever) is that these events are great, and now I want to attend more of them. I really need to finish my two master's theses, so that I can get a 'real' job and my wallet will not mind if I go to as many of these as I have the time and energy for :).

Goals and Measurements

There is an interesting phenomenon when it comes to goal setting and measurement. In economics it is called Goodhart's Law, but it also exists in artificial intelligence, and I have heard anecdotally that it exists in hiring as well. Basically, what it means is that when some statistical regularity is used as a target, targeting it can break the regularity. Just because something worked as a predictor in the past, if we take it as a goal, it stops being a good predictor.

Let's take a very clichéd and cartoony example from personal health; I think it is really understandable for explaining the principle. I don't know a lot about health studies, but I think there is a connection between health and not being overweight. Otherwise, why would they talk about the obesity epidemic and be all panicky about children becoming more obese because it is bad for their health? I mean, it is an assumption, but if it is wrong, then a lot of the discourse in the media is just... misleading :). So, let's now say that because we want to be healthier, weight will be our crude measurement of health (I am from the social sciences, so I am very OK with crude measurements). So we start dieting, and at one point people send us to the hospital because we can't function normally anymore, which can be the effect of severe anorexia.

I guess I could say that the very act of measurement can change the phenomenon itself. I could bring quantum physics into this, but I think I would only prove that I don't know much about it. I can see this in myself. The very act of putting every scientific article that I read into my bib file makes me more likely to try to finish an article, even if it is not that interesting. And that is without me frequently checking how many of them I have read. I think the first time I checked was at New Year, more than a year after I started collecting them, and I have not checked since. The books are the same, but there the effect is a bit different: now I slightly favor short books over long ones.

Well, reading a lot of books, short or long, and finishing more scientific articles, whether they are relevant or not, is not bad. But I am not checking constantly. Imagine how things change for people who keep track of many more aspects of their lives; the Quantified Self movement comes to mind. How do these very subtle effects come together to shape their lives?

Not to mention that this is sort of done to us on a regular basis. Lanier, in his book Ten Arguments for Deleting Your Social Media Accounts Right Now, calls these systems BUMMER (Behaviours of Users Modified, and Made into an Empire for Rent): social media, search engines, personalized advertising and so on (the book itself is quite good; I can recommend it). Their algorithms collect information about us and then use it as a target to predict our behaviour. Can you see the problem, in relation to the one defined above?

What does this mean for us? It means that just because something can be measured does not necessarily mean it is a good goal, and that we need to be more careful about how we measure the goals we set for ourselves. But it also gives us hope. One thing economics figured out is that when governments used inflation as a target, Goodhart's Law kicked in and inflation became useless for prediction. But once it stopped being the target, it became a good predictor again. So in the short term, I don't think it is that bad to use measurements like this. Let's say one wants to be healthy. First one could try losing weight (if overweight), then start exercising more frequently, and so on. Each measurement would only be under this effect for a short amount of time.

I guess even better would be to use multiple measurements, since it is hard to drift too far in one direction when there are several of them. If there really needs to be a single measurement, they could be combined, either by summing them (better with regard to avoiding overfitting to any one metric) or as some sort of principal-component index (because not every indicator is equally good).
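The two combination strategies could be sketched like this (the three "health metrics" and all their numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 52 weeks of three health metrics on different
# scales, e.g. workouts per week, hours of sleep, and some fitness score.
metrics = rng.normal(loc=[3.0, 7.0, 60.0], scale=[1.0, 0.5, 5.0],
                     size=(52, 3))

# Standardize first, so that no metric dominates just because of its scale.
z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0)

# Option 1: simple sum of z-scores. Robust, easy to explain, and hard to
# game by pushing a single metric to an extreme.
summed_index = z.sum(axis=1)

# Option 2: first principal component. Weights each metric by how much it
# contributes to the shared variation, down-weighting the noisier ones.
cov = np.cov(z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = z @ eigvecs[:, -1]                 # project onto the top component

print(summed_index[:3])
print(pc1[:3])
```

The summed index treats every metric as equally important; the principal-component index lets the data decide the weights, which matches the point above that not everything is an equally good indicator.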

The best, though probably not the easiest, way would be to use all the evidence and take a more holistic look at the situation. Not everything can be measured, and maybe our long-term improvement in life needs a bit more of a qualitative approach, not just a quantitative one.