
Personalities of Different Programming Languages

One thing that I will also need to do for my master's thesis is to extract personality from texts. There are different ways of doing this. In research, the most popular one is LIWC, followed by models trained on texts labeled with MBTI types.

What I wanted to test here was IBM's model. I took the users who had more than half of their posts in one specific language, then filtered out the users who did not have many posts, and from the rest picked the 15 most popular languages.
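The selection described above could be sketched roughly like this. It is only an illustration under assumptions: the `min_posts` threshold, the function name, and the choice to measure language popularity by the number of remaining users are all hypothetical and are not taken from the repository.

```python
from collections import Counter, defaultdict


def dominant_language_users(posts, min_posts=20, top_n=15):
    """Map each selected user to their dominant language.

    posts: iterable of (user, language) pairs, one pair per post.
    min_posts and top_n are illustrative defaults, not the original values.
    """
    per_user = defaultdict(Counter)
    for user, language in posts:
        per_user[user][language] += 1

    user_language = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        if total < min_posts:
            continue  # drop users without enough posts
        language, n = counts.most_common(1)[0]
        if n * 2 > total:  # strictly more than half of the posts
            user_language[user] = language

    # Assumption: popularity is measured by the number of remaining users.
    popular = {
        lang for lang, _ in Counter(user_language.values()).most_common(top_n)
    }
    return {u: lang for u, lang in user_language.items() if lang in popular}
```

The texts of the users that remain would then be the ones sent to the personality model.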

The personality data is also available in the personality folder of the GitHub repository.

Here is first the graph of the different programming languages and their personalities. I decided to use the raw scores instead of the percentiles, so the differences are not as big.

I then also used these personality scores to see if I could cluster the languages. I got the picture below:

Some of the clusters are clear to me. Swift and Objective-C are both languages used for programming iPhone apps. On the other side, we have the C family: C, C++ and C#. PHP, CSS and SQL are, I guess, the internet, but JavaScript and Ruby are placed a bit separately. I don't know... since I grouped by personality, maybe I should be looking at how they work or something?
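The post does not say which clustering method produced the picture, so the following is only a sketch of one common option: agglomerative clustering with average linkage over Euclidean distances between the personality profiles. The function names and the profile format are assumptions made for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equally long score lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping a language name to its list of trait scores.
    Returns a list of clusters, each a list of language names.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average distance over all cross-cluster pairs.
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Calling `agglomerate(profiles, n_clusters=4)` with the raw trait scores per language would give groups that can be compared with the ones discussed above; a different linkage or distance could group the languages differently.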

Predicting Personality with Factorization Machines and Logistic Regression

This summer, I attended the Zemanta Data Science Summer School. As the second practical project, we had to predict the click rate of advertisements. My two teammates and I ended up doing some light feature selection and testing five different algorithms: LightGBM, Naive Bayes, factorization machines and two different logistic regressions. I tested the Vowpal Wabbit logistic regression and the factorization machines.

Since the data that we were using is something that I cannot publish (and probably also not any of the analysis that we did), I decided to redo my two algorithms with a different dataset. Also, I deleted the datasets on the last day of the summer school.

For at least two of my classes (Artificial Intelligence and Data Mining), I had picked the problem of predicting personality from text. Since I still had some data from that time lying around, I decided to simply try these algorithms on this data as well.

The examples of the algorithms can be found on my GitHub.
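For readers who have not met the two models, here is a minimal sketch of how a second-order factorization machine scores one example, and how logistic regression falls out as the special case with no latent factors. This is a plain-Python illustration of the standard model equation, not Vowpal Wabbit's implementation and not the code from the repository; all names are made up for the example.

```python
import math


def fm_score(x, w0, w, v):
    """Second-order factorization machine score for one example.

    x:  list of feature values
    w0: bias term
    w:  list of linear weights, one per feature
    v:  list of latent factor vectors, one per feature (each of length k)
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(v[0])
    pairwise = 0.0
    for f in range(k):
        # Uses the identity sum_{i<j} a_i a_j = ((sum a_i)^2 - sum a_i^2) / 2
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def sigmoid(z):
    """Turn a raw score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))
```

With all latent factors set to zero, `sigmoid(fm_score(...))` is exactly a logistic regression prediction; the factors add learned interactions between pairs of features.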

How to Motivate People

I have a friend who is really good at motivating people. He calls it his own distortion field.

So, a couple of months ago, I was talking to him and we came up with some of the things that help with creating this field. In case somebody else is interested, I am putting the list here:

  • Empathy
  • See the good/potential in people
  • Give people permission (to do what they want)
  • Time to think
  • Find the shortest possible path

PyConBalkan 2019

Last week, I attended the PyConBalkan conference in Belgrade. So far, I have liked all of the Python conferences that I have attended, but this one was the best so far. I cannot really say why. I just know that by the end of the third day, I was excited about everything. Maybe it was just a sort of energy? Or I got used to them, so I could enjoy them better? Because there were a lot of people that I knew? I sort of doubt that it was the second one, because I had a bit of a cold through the whole thing, so I think it was something other than me...

There were a lot of interesting talks. The talk that I remember the most was the keynote from the last day on meritocracy, which was amazing. But there were other interesting ones, on pair programming, language evolution, freelancing, article reading and others.

But as with any conference, it is what is happening in the halls and in the evenings that is the best part. And even though nothing was 'organized' until the last day, there was still a lot of fun and high-quality hanging out (at least for me).

I also gave a talk and a lightning talk at the conference.

I am grateful to the organizers for organizing such a great conference, and to all of the amazing people that I met.

Ljubljana Python Meetup September 2019

At the September Python meetup, I did a trial run of my PyConBalkan presentation. For some time, I was thinking of just ignoring it, because I was not that satisfied with the presentation, even though the feedback that I got was relatively good.

Still, everybody should learn from their mistakes. So here are some of mine:

  • I picked examples that I thought would be interesting to the audience, and then I did not know what to do with them on stage
  • I spent too much time introducing the topic instead of talking about the topic
  • Some of the buildup to my points was too long - for example the cooking example for explaining the role of preprocessing
  • Even though I had a long introduction, I did not actually explain what I was going to talk about

I also improvised a bit when answering the questions, but here I am not sure what I could have done better with preparation.

Here is also my presentation:

The Importance of Knowledge

When discussing the educational system, I sometimes hear an interesting opinion that I don't really understand. It is that the way of thinking is much more important than the facts that we learn in school, so we should replace these facts with something else.

Some at least admit that facts could maybe be important, but they all seem to downplay them.

I can see it with other people as well. When some problem is discussed, they then ask me for my opinion, even though I am clearly the person with the least knowledge of the subject.

But every year I am more convinced of the importance of knowledge. I think the first good example was Toastmasters. The reason why I was at one point an important member was that I knew a lot. I was one of the people who knew when I was breaking the rules and why.

They say the same thing about writing. One needs to know the rules in order to know when to break them. I am still far from this level, but I think it is the same principle.

And it is something that I am experiencing right now. At my job, I am constantly seeing where the holes in my knowledge are. And because of that, I lack the creativity and problem solving that I would have if I already knew some of this stuff. Which is why I really like my job: I have the feeling that I am improving fast.

In psychology, there is actually the principle of the difficulty of transference. Skills or knowledge need to be at least at a certain level in order for transference to happen. Which is why brain games usually don't really work that well, but speaking a foreign language can protect the brain from degeneration. In order to speak a language, it needs to be at quite a high level; it is not enough to just know the translations of some words.

In the same way, it is a lot easier to be creative once I have at least an adequate amount of knowledge.

And this is something that I have forgotten recently. Over the past week, I have been slowly preparing the speech for my first programming conference. Before I had a practice run at our meetup, I was trying to come up with examples that would be interesting to other people. But in doing this, I ended up picking some examples where I lacked knowledge (the topic analysis of PEPs was one such example). And these examples were then the ones that fell flat.

Which left me with not many examples, but at least I made this mistake in front of a somewhat smaller crowd that actually knew me from before.

Another point that I would like to make is that the more knowledge somebody has, the easier it is to acquire more. For example, when I hear people talk about cars or something similar, it is hard to put the new facts in the right place. This is not true for facts about personality. Even if I find out about yet another system of individual differences, I can find similarities with at least some of the ones that I already know. The same goes for language learning. When I started to learn German, I did it by watching BBC Sherlock in German. And I needed 15 minutes to understand the first word (langweilig). On the other hand, if I do it now with Japanese or German, just by watching without subtitles, I can learn many more words from context in the same time.

And yes, I can understand that not everybody is interested in everything. As mentioned before, I am still quite unknowledgeable about cars. The same could be said about many other topics. But I try to learn about the topics that are interesting to me or important for my life.

Also, one way of getting procedural knowledge, which allows us to act in the world, is through declarative knowledge, that is, facts. Procedural knowledge is what allows us to cycle, talk, dance, and act in life. I guess one could say that living minimalistically or being self-disciplined is also a type of procedural knowledge.

So knowledge and facts allow us to be more creative, learn quicker and live better. Do we really need another reason to not bash them?

Slovenian Cuisine (Studying effect of Preprocessing on Topic Modeling)

When I worked on my topic modeling of the cognitive science articles, I noticed something. By using different algorithms on the same preprocessed data, I would get relatively similar results. But I could get much more interpretable results if I simply filtered out the noise, for example by filtering out the stop words or the verbs. For some reason, when these are included, I have more problems finding meaning in the topics.
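As a rough illustration of the kind of filtering meant here, the sketch below lowercases a recipe text, keeps only alphabetic tokens, and drops stop words. The stop-word list is a deliberately tiny, hypothetical one made up for this example; it is not the list used in the actual analysis. Filtering out verbs would additionally need a part-of-speech tagger, which is not shown.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"in", "ali", "za", "v", "je", "da", "ki", "po", "iz", "so"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, keep alphabetic tokens only, and drop stop words."""
    # [^\W\d_]+ matches runs of letters, including č, š and ž,
    # so numbers and punctuation are discarded.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [token for token in tokens if token not in stop_words]


print(preprocess("200 g moke in 3 jajca, sol ali poper"))
```

The token lists produced this way would then be passed to the topic model in place of the raw text.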

When I was preparing the PyConBalkan speech (which will happen this Friday), I tried to find examples to present. One of the things that I am interested in is cooking. And I figured that everybody eats, so the topic of food would be at least familiar to everybody.

So what I did was download over 18000 recipes from one of the Slovenian recipe sites. The code that I eventually used can be found on my GitHub. I thought to include them in my presentation, but when I practiced it at the Python Meetup, I realized that 12 different food categories is too many. So instead, what I am going to do is present the results here. I also reran the analysis, while the pictures that I drew were from the first run. The results should be very similar, but I did not check that.

I will first present the 12 groups that I got without preprocessing. This means that most (though not quite all) of my biases and decisions are left out here. But I find these groups to be less representative. I have put down the 10 most representative words for each group, excluding punctuation and numbers. Still, feel free to peruse them.

Topic 1 | Topic 2 | Topic 3 | Topic 4
izbiri (choice) | sladkorja (sugar) | bio (bio) | kocki (cubes)
lastni (one own) | moke (flour) | le (only) | marinada (marinade)
ste (are) | za (for) | gusto (coffee) | pesta (pesto)
noč (night) | mleka (milk) | milfina (Milfina - brand) | paličice (sticks)
agar (agar) | v (in) | okus (taste) | pesti (fists)
vsaj (at least) | masla (butter) | natur (natural) | pekač (baking tray)
namočeni (soaked) | smetane (cream) | aktiv (Aktiv - brand) | bele (white)
jajc (eggs) | ali (or) | piranske (from Piran) | kakav (cacao)
občutku (feeling) | sladkor (sugar) | iz (from) | česnom (garlic)
podlaga (grounding) | prahu (powder) | soline (salters) | močno (strong)

Topic 5 | Topic 6 | Topic 7 | Topic 8
začimbe (spices) | bananinega (banana) | janež (anise) | vodke (vodka)
kis (vinegar) | grobe (rough) | mandarin (mandarin) | polenovke (codfish)
omaka (sauce) | gre (goes) | smarties (Smarties - brand) | blue (blue)
sojina (soya) | kruhki (canapes) | francoski (French) | topi (blunt)
solate (salad) | nimamo (not having) | luskic (little scales) | losos (salmon)
zelenjava (vegetables) | marsale (wine) | čaj (tea) | zrno (grain)
file (fillet) | soka (juice) | žlico (spoon) | dimljen (smoked)
česen (garlic) | poljuben (optional) | marcipanove (marzipan) | dan (day)
olje (oil) | solata (salad) | lan (flax) | zamenjamo (exchange)
koruza (corn) | ostali (other) | fine (fine) | trda (hard/rigid)

Topic 9 | Topic 10 | Topic 11 | Topic 12
sol (salt) | ki (which) | ravna (flat) | so (then)
poper (pepper) | ga (him) | mu (him) | je (is)
česna (garlic) | jo (her) | kupljeno (bought) | da (that)
olje (oil) | kavo (coffee) | polovičke (halves) | led (ice)
ali (or) | semen (seeds) | orehovo (walnuts) | pri (at)
čebula (onion) | domači (homemade) | žafranke (saffron) | ga (him)
po (after) | puding (pudding) | sardelinih (anchovy) | toliko (this much)
olja (oil) | sami (on our own) | sirova (cheese) | bedra (leg)
in (and) | domač (homemade) | rastlinske (plants) | kot (like)
sol (salt) | kakija (persimmon) | nescafeja (Nescafe - brand) | ker (because)

One thing to keep in mind is that if topic modeling is done without preprocessing, then some of the topics are noise. But here a lot of them seem like noise to me.

I also drew a picture of these twelve groups. See it below:

Then I did the preprocessing. Since I have now structured the model more, I might get a different result. So I will first add the picture (and you can see from it how much very small personal decisions in filtering can affect the result):

Because these are supposed to be only ingredient words, I am only going to describe each topic with 5 words:

Topic 1 | Topic 2 | Topic 3 | Topic 4
limona (lemon) | jabolko (apple) | oreščki (nuts) | soja (soya)
sok (juice) | banana (banana) | muškat (nutmeg) | sezam (sesame)
sladkor (sugar) | breskev (peach) | ingver (ginger) | buča (pumpkin)
pomaranča (orange) | ananas (pineapple) | klinček (clove) | sončnica (sunflower)
voda (water) | jogurt (yoghurt) | maslo (butter) | ohrovt (Brussels sprout)

Topic 5 | Topic 6 | Topic 7 | Topic 8
sadje (fruit) | marmelada (jam) | sladkor (sugar) | sol (salt)
vino (wine) | jagoda (strawberry) | jajce (egg) | poper (pepper)
cimet (cinnamon) | marelica (apricot) | moka (flour) | olje (oil)
hruška (pear) | sliva (plum) | mleko (milk) | čebula (onion)
sladoled (icecream) | borovnica (blueberry) | vanilija (vanilla) | česen (garlic)

Topic 9 | Topic 10 | Topic 11 | Topic 12
liker (liqueur) | med (honey) | sir (cheese) | voda (water)
oblat (layer/wafer) | mandelj (almond) | testo (dough) | moka (flour)
pivo (beer) | kokos (coconut) | mascarpone (cheese) | sol (salt)
marcipan (marzipan) | kosmiči (cereal) | kava (coffee) | maščoba (fat)
ribez (currant) | cimet (cinnamon) | piškot (cookies) | ajda (buckwheat)

For some reason, when I looked at the original groups, they seemed to make more sense than these ones. But these still sort of make sense. At least for some of them, I can imagine how they came together. So topic 8 is probably the group of a specific way of preparing vegetables and meats (we say "na čebuli", which would be directly translated as "on onions"). Topic 7 is basic baking. Topic 10 is probably breakfast. Topic 1 is juicing. Topic 5 would be Christmas, if not for the ice cream. And so on.

Still, this shows that filtering can have a huge effect on the results. On the other hand, I have no idea how to interpret the results that I got.

And I guess only the results of the 4-group solution are going to go into my presentation for PyConBalkan.

Creativity Test

I have found a pretty interesting creativity test on the internet. It uses word associations and tries to see how disconnected the words are. It uses the LSA distance from all preceding words as the measurement. And the results of this test seem to be connected to creativity.

Here you can find the test and the article describing it.
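To make the measurement a bit more concrete, here is a minimal sketch of the idea: every word is represented by a vector, and the score is the average cosine distance between each word and all the words that came before it. The exact formula the test uses is not given here, so the averaging is an assumption, and in practice the vectors would come from an LSA model rather than being typed in by hand.

```python
import math


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def association_score(words, vectors):
    """Mean distance between each word and all words that preceded it.

    words:   the associations in the order they were given (at least two)
    vectors: dict mapping each word to its vector (hypothetical LSA vectors)
    """
    distances = []
    for i in range(1, len(words)):
        for j in range(i):
            distances.append(
                cosine_distance(vectors[words[i]], vectors[words[j]])
            )
    return sum(distances) / len(distances)
```

A higher score means that the words in the chain are, on average, less related to the ones before them.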

Needs in Communication

I have finally deleted my Facebook account. I only created it because it was the main communication channel for my cognitive science studies. And then there was always another reason why I did not delete it, usually because it was the one way to communicate with one or two people. But now I have decided to screw everything and deleted it.

Instead, I prefer to simply meet people in person. It has been this way since the end of primary school, when email and MSN Messenger became popular with people in my school. I never really liked using them, and I stopped bothering when a friend of mine told me that my personality changed when using them.

In reality, I have never experienced communication through these media as more positive than face-to-face communication. At first I thought it was because of the content and quality. There are many places on the internet that have the chat room problem. But eventually I figured out that this can only be part of the explanation. The best readings on the internet were about on the same level as the average conversation.

Even if I only take the content that is not on social media but on websites owned by people (check IndieWeb if you want to know more), the feeling was better than on social media, but worse than in real life.

This can also be seen in other things. I feel better after reading a book for two hours than after reading internet articles for two hours. In the first case, I feel like I either enjoyed the story, actually learned something new, or started to think differently about something. And it happens almost every single time. On the internet, this feeling is a lot less frequent, and in the majority of cases it is missing. Is it because I am choosing the books myself, instead of the algorithms? Or is it because I can be more focused on the ideas and go deeper into them? I actually don't know.

On the other hand, I am still writing this blog, and I do still sometimes read blogs. Unlike social media, which bores me within 5 minutes (which is why I ended up deleting the last one). But now thinking about it, maybe I should start changing this as well. But for some of the things I am interested in, like minimalism, it is hard to find books here in Slovenia. And if I want to order them from abroad, then I need to find out about them from somewhere. Maybe I am way too much of a thinker instead of a doer, but even that is slowly changing in a more doing direction. And eventually, I will not need the constant support of other people's ideas, or at least not that much.

Plus, most of the blogs of people that I know and follow don't actually publish much. So even if I checked once per month, I still would not get a lot of reading material.

Personality did not give me a definitive answer either. I thought that maybe low extroversion, low agreeableness and maybe high openness could explain it. As a person low in extroversion, I don't get as many positive emotions out of social status. So there is less emotional reaction to likes and readers, at least in theory. Which makes me worried for all the people who feel this more acutely than me. And low agreeableness makes me less interested in people than average, and most of the content there was about what people were doing. And maybe openness wanted something more unusual? I don't know.

On a little sidetrack, most studies done in the first years of social media showed that openness predicted people's activity: the higher the openness, the more likely they were to be active on social media. Now I have a hypothesis that the trend is changing, and that people with more openness are the ones more likely to leave it behind.

Then I have recently started getting into non-violent communication. One of the important concepts there is to understand and then express one's own needs, without the expectation that they must be addressed. But here is the problem: I am using other people as a sort of exploration of the opportunity space, so I am not sure what my needs are in this case. Or maybe my need is to be pushed to do something more? But then, I also enjoy conversations where I don't get to express it.

This reminds me of an exercise that we needed to do at university. I remember writing that I am afraid of asking people for things, because I was afraid they would say yes without wanting to. This might be a consequence of reading too many marketing and sales books before university. They hammer on the point that people generally don't like saying no. This is why I really like being in the company of people who I can trust will tell me to go fuck myself when they have had enough of me.

But that still leaves me with the problem of what need I am addressing with face-to-face communication that is not addressed by the other forms of communication. Is it simply that we evolved to be social? Do I prefer the synchronicity of it? Do I want to feel heard? What do I want from it? I don't think it is safety, or being heard, or respect. But knowing what it is not does not make it easier to realize what it is.

Which might be a reason why I have a problem with directing social energy intentionally, since I can't conceptualize what I really want from it. I just know that whatever it is, I can't get it from the internet or SMS-ing or anything similar, and these things should just stay for information sharing.