OpenMined Architecture and Application to Science Research

Toward the end of my stay in Bratislava (I cannot believe that it has been 8 months already), one of my classmates told me about Andrew Trask and his OpenMined architecture. I don't remember the exact context, but I think it had something to do with privacy, and considering the class where it came up (Cognitive Semantics and Cognitive Theories of Representation), it probably had at least a small connection to meaning.

Seriously, you should watch his videos for an explanation, as he does a hell of a lot better job than I ever could. It is an architecture built on three pieces: federated learning, which allows machine learning models to be trained on distributed data (for example with PyTorch, Keras or TensorFlow); homomorphic encryption, which allows computations to be done without the need to see or reveal the data (Python: python-paillier, R: HomomorphicEncryption); and smart contracts based on the blockchain, which would allow for accountability.
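To make the homomorphic encryption part a bit more concrete, here is a minimal toy sketch of the Paillier scheme (the one that python-paillier implements) in plain Python. Everything in it is my own illustration and not OpenMined code: the primes are tiny and offer no security at all, and the names `encrypt`, `decrypt` and `add_encrypted` are hypothetical helpers. It only shows that two encrypted numbers can be added by someone who never sees them.

```python
import math
import random

# Toy primes, assumed for illustration only: far too small for real security.
P, Q = 1789, 1861
N = P * Q                      # public modulus
N_SQ = N * N
G = N + 1                      # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), private
MU = pow(LAM, -1, N)           # modular inverse, private (needs Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N with the public key (N, G)."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:  # r must be invertible modulo N
        r = random.randrange(1, N)
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key (LAM, MU)."""
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % N_SQ


# Two participants encrypt their test scores; a third party sums them blindly.
score_a, score_b = 120, 75
enc_total = add_encrypted(encrypt(score_a), encrypt(score_b))
total = decrypt(enc_total)     # only the key holder can read the sum
print(total)                   # 195
```

For anything real you would use a maintained library with proper key sizes instead of this sketch.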

But the basic idea is that each person would be in control of their own data, and this data could then be made available to people who want to train different models, without having to send them the data itself. Sort of in the spirit of the GDPR, but with companies and people still able to get the insights that come from using big data.
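The "train without sending the data" idea can be sketched with federated averaging in a few lines of plain Python. This is only my own minimal illustration under simplifying assumptions (a one-parameter linear model y = w * x, made-up numbers, hypothetical function names); real frameworks handle this with far more care. Each participant improves the model on their own data, and only the updated weight travels back to be averaged.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the owner


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient steps on one participant's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global: float, participants: List[Dataset]) -> float:
    """Collect locally trained weights and average them by dataset size."""
    updates = [local_update(w_global, data) for data in participants]
    total = sum(len(data) for data in participants)
    return sum(w * len(data) for w, data in zip(updates, participants)) / total


# Made-up data: three people, each holding a few points close to y = 2x.
participants = [
    [(1.0, 2.0), (2.0, 4.1)],
    [(3.0, 5.9), (4.0, 8.2), (5.0, 9.8)],
    [(1.5, 3.0)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, participants)

print(round(w, 2))  # ends up close to 2
```

The coordinator only ever sees weights, not the (x, y) pairs. In practice the weights themselves can still leak information, which is where the encryption part comes in.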

This got me thinking. I don't know how research is done in other places, but this month was the first time that a study participation agreement I read specifically mentioned that I could ask for the data from my experiment. Before, the implicit norm was that the data would not be given to the participant. On the other hand, there are repositories where researchers deposit the data from their studies. For example, there are places where thousands of fMRI datasets from different studies are collected.

A somewhat more decentralized idea would be for the participants in a study to control their own data, and then make it available in much the same way that the OpenMined architecture proposes. This way, there would not be just the data from that one specific experiment, because a person who participated in one experiment may well have participated in many more. Off the top of my head, I have taken part in a handful of experiments so far.

For example, in the last experiment that I participated in, I needed to learn the grammatical rules of an artificially constructed language. But before that I had already had an MR image of my head taken, had my resting-state EEG measured, and completed a battery of tests, from memory capacity to personality. I am sure some of the other participants had as well. If the data were in the hands of the participants, they could offer it to the researchers, who could then test many other hypotheses without repeating the measurements.

On the other hand, I am aware of a couple of problems. Most participants who are not researchers themselves are most likely not interested in the raw data and would probably not care about keeping it. After all, raw EEG data is just a series of numbers that can be plotted, and most people would probably not know what to do with these numbers or lines. So the lack of motivation and interest in current society is the main disadvantage that I see. And I can attest to this: I recently posted my survey in a Facebook group dedicated to exactly that, posting and answering surveys, and I did not get a lot of answers. It is just not a priority in people's lives.

But it is still a nice thought experiment about how differently things could be organized.