One thing I will also need to do for my master's thesis is to infer personality from text. There are different ways of doing this. In research, the most popular one is LIWC, followed by models trained on texts labeled with MBTI types.
What I wanted to test here was IBM's model. I took the users who had more than half of their posts in one specific language, filtered out the users who did not have many posts, and from the remaining ones picked the 15 most popular languages.
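The selection steps above can be sketched roughly as follows. This is only an illustration, not the code I actually ran: the function name `select_users`, the `(user, language)` input format, and the `min_posts` threshold are assumptions made for the example, and "most popular" is taken here to mean the languages with the most remaining users.

```python
from collections import Counter, defaultdict


def select_users(posts, min_posts=50, top_n=15):
    """Return {user: dominant_language} for the users that pass the filters.

    posts     -- iterable of (user, language) pairs, one pair per post
    min_posts -- hypothetical cutoff for "enough posts"
    top_n     -- number of most popular languages to keep
    """
    # Count posts per language for every user.
    per_user = defaultdict(Counter)
    for user, language in posts:
        per_user[user][language] += 1

    dominant = {}
    for user, counts in per_user.items():
        total = sum(counts.values())
        language, n = counts.most_common(1)[0]
        # Keep users with enough posts and more than half in one language.
        if total >= min_posts and n > total / 2:
            dominant[user] = language

    # Rank languages by number of remaining users and keep the top_n.
    ranking = Counter(dominant.values()).most_common(top_n)
    popular = {language for language, _ in ranking}
    return {u: lang for u, lang in dominant.items() if lang in popular}
```

Each remaining user then belongs to exactly one language, so the personality scores can be averaged per language afterwards.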
The personality data is also available in the personality folder of the GitHub repository.
First, here is the graph of the different programming languages and their personality scores. I decided to use the raw scores instead of the percentiles, so the differences are not as big.
I then also used these personality scores to see whether I could cluster the languages. I got the picture below: