Junior developer Interaction with API Documentation (API Days Speech)

I am presenting at the API Days Helsinki & North Online conference tomorrow. Here is the material that I am going to be presenting. If the video is ever released, I will also link it here.

Introduction

The idea for the speech came from a conversation that I had at work. One of my coworkers noted an interesting thing: multiple teammates had asked him about the difference between GET and POST calls to the API. His tone suggested he found the question weird - like they should have already known it.

I am proud that my team is one that predominantly employs juniors, so I figured that I want to dedicate the speech to some of the things that I noticed my team encountering.

Public documentation

The first point that I would like to make is about where the documentation lives. In most cases, as long as it is public, it should be fine. If it is under the same domain (subdomains also count), that is an extra plus.

Once, I was looking for the API of one service. They were advertising that they had an API, so I figured it had to be somewhere. I was really happy when I found the Swagger file in the site map. So I looked it over and implemented the solution based on it.

Imagine my surprise when I got back the message that this documentation was outdated and that I should use the one accessible after logging in.

Well, except that I could not find this supposed documentation. We ended up sharing credentials, in case one of us had just missed it. And since it was behind the account wall, search engines could not help with the search either. In the end we gave up.

There was also a case where a company was advertising their API. Again, no documentation. But what I could find was that they did have an API on the api subdomain - one that they were not using for the website. I am pretty sure about that, since the website was rendered on the backend.

They also had something that looked like an API key, but it did not help me. I tried some of the frequent API authentication methods - the Authorization header (raw, as a Bearer token, and as Basic auth with the key as the username or as the password), the api and key query parameters, and the x-token header.
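
For context, this is roughly the kind of brute-force probing I ended up doing. A minimal sketch in TypeScript - the base URL, the endpoint and the key are all made up for illustration, and the idea is just to try the usual placements and watch the status codes:

```typescript
// Hypothetical probing of common API key placements against an undocumented API.
// BASE_URL, the /users endpoint and API_KEY are invented for illustration.
const BASE_URL = "https://api.example.com";
const API_KEY = "the-key-from-the-app";

const basic = (user: string, pass: string) =>
  `Basic ${Buffer.from(`${user}:${pass}`).toString("base64")}`;

const attempts: Array<{ name: string; headers?: Record<string, string>; query?: string }> = [
  { name: "raw Authorization header", headers: { Authorization: API_KEY } },
  { name: "Bearer token", headers: { Authorization: `Bearer ${API_KEY}` } },
  { name: "Basic auth, key as username", headers: { Authorization: basic(API_KEY, "") } },
  { name: "Basic auth, key as password", headers: { Authorization: basic("", API_KEY) } },
  { name: "api/key query parameters", query: `?api=${API_KEY}&key=${API_KEY}` },
  { name: "x-token header", headers: { "x-token": API_KEY } },
];

async function probe(): Promise<void> {
  for (const attempt of attempts) {
    const response = await fetch(`${BASE_URL}/users${attempt.query ?? ""}`, {
      headers: attempt.headers,
    });
    // Anything other than 401/403 hints that the placement might be the right one.
    console.log(`${attempt.name}: HTTP ${response.status}`);
  }
}

probe();
```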

I would like to say that situations where people advertise an API but do not provide documentation are rare - but they really are not. At one point I started to write emails to each of them, asking for documentation - but I never heard back. Now I just assume it is a very weird kind of marketing.

If you have an API, please make sure that the documentation is publicly available.

Don't assume all domain knowledge

This one is usually a problem only with some of the enterprise services. I have seen documentation where there was an endpoint to query any table in the database, with absolutely no way to get the list of all tables or any information about which tables are available.

GraphQL

I would like to say that a GraphQL API can sometimes hide information in a way that almost ends up with the same result.

How do we get there? Well, the reason is that in some cases the GraphQL documentation does not even provide information about the objects and their connections that are accessible through the API.

I know that there is supposed to be a way to get the structure through the API itself - introspection - but I have yet to see it work anywhere. Maybe the problem is that I have never seen it documented correctly yet?
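
For completeness, this is how it is supposed to work: a GraphQL server can answer an introspection query that describes its own schema. A minimal sketch, assuming a hypothetical endpoint and token, and assuming introspection has not been disabled:

```typescript
// Minimal GraphQL introspection request. The endpoint and token are placeholders,
// and many production APIs disable introspection entirely.
const INTROSPECTION_QUERY = `
  query {
    __schema {
      types {
        name
        kind
        fields { name }
      }
    }
  }
`;

async function fetchSchema(): Promise<void> {
  const response = await fetch("https://api.example.com/graphql", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer <token>",
    },
    body: JSON.stringify({ query: INTROSPECTION_QUERY }),
  });
  const { data, errors } = await response.json();
  // If introspection is disabled, errors will describe the refusal instead.
  console.log(errors ?? data?.__schema.types.map((t: { name: string }) => t.name));
}

fetchSchema();
```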

But honestly, most GraphQL calls get created by trial and error. You start with something that works - either an example from the documentation or, if there is none, a query taken from the internet. Then you change it until you get something that more or less does what you want.

Examples here are a lot more important than in REST-style APIs.

SDK vs. HTTP calls

One thing that actually surprised me is how an SDK usually provides a more confusing experience, at least compared to raw HTTP calls.

This shows in a preference to work on the HTTP parts of the codebase, compared to the SDK parts. I was wondering why. Shouldn't SDKs be easier to deal with than raw HTTP calls?

I think the reason is that HTTP calls are similar to each other. The way to deal with URLs, query parameters, bodies and headers is always the same.

But each SDK has a different syntax. Because of this, it is basically impossible to simply copy a solution from one part of the codebase to another. Instead, each integration needs to be tackled as if it were a new problem.
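
To illustrate what I mean, here is a sketch of calls against two entirely made-up APIs - the URLs, parameters and headers are invented, but the structure of the code barely changes between them, which is exactly what makes it easy to copy:

```typescript
// Two raw HTTP calls against two hypothetical APIs: same shape, different details.
async function listInvoices(token: string): Promise<unknown> {
  const url = new URL("https://billing.example.com/v2/invoices");
  url.searchParams.set("status", "open");
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  return response.json();
}

async function listTickets(apiKey: string): Promise<unknown> {
  const url = new URL("https://support.example.org/api/tickets");
  url.searchParams.set("state", "unresolved");
  const response = await fetch(url, {
    headers: { "x-api-key": apiKey },
  });
  return response.json();
}
```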

Some would be surprised how many junior developers simply start by copying the structure and the code and then change it until it works. Not that non-juniors do not copy code as well. But in the latter case the copied code is generally not immediately recognizable as copied code.

So taking into account that they do not have code they can use as a base, the HTTP calls suddenly become the simpler solution.

Base URLs

Before going into the more HTTP-specific points, I would like to point out the problem with base URLs. In some cases, there does not seem to be a better way to figure out which URL to use as the base URL than to try all of them.

If there is a way, please just spell it out so that people can understand it. In a couple of cases, I have asked people which region they were in, or which plan they had - both frequent ways to determine the base URL. And they could not tell me, since they did not know.

If there is a place where people can check - even if it is just the URL they access the frontend of the app at - please write this down in the documentation.
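
As a sketch of what spelling it out can look like from the consumer side, assuming a hypothetical API that varies its base URL by region:

```typescript
// Hypothetical region-to-base-URL mapping; the regions and URLs are invented.
// The point is that this rule should be written down, not guessed.
const BASE_URLS: Record<string, string> = {
  eu: "https://api.eu.example.com",
  us: "https://api.us.example.com",
  au: "https://api.au.example.com",
};

function baseUrlForRegion(region: string): string {
  const baseUrl = BASE_URLS[region];
  if (!baseUrl) {
    // Without documentation, the fallback is literally trying every entry above.
    throw new Error(`Unknown region '${region}' - check which URL you log in at.`);
  }
  return baseUrl;
}
```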

HTTP Methods

I mentioned above how HTTP methods are usually not well understood by juniors when they join.

We generally only have authentication and read calls (because of the nature of what my team is doing), so the question is normally just between the POST and GET verbs, since these are the only ones that we use in our codebase.

If we go with the definitions from MDN, they read like this:

GET: The GET method requests a representation of the specified resource. Requests using GET should only retrieve data.

POST: The POST method submits an entity to the specified resource, often causing a change in state or side effects on the server.

The definitions sometimes included in API documentation are also in this spirit.

Ok, then the follow-up question is why we are using POST if we only make read calls. For one, the OAuth authentication calls are not really reads - since they do sort of change the access (and refresh) tokens on their side.

The not-really-accurate description that works in practice is that GET requests do not have a body, but POST requests do.
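
In code, that simplified rule looks like this - a sketch against a made-up API, including a read call that the vendor exposes as POST, which is exactly the kind of thing that makes the formal definitions confusing:

```typescript
// GET vs. POST from the caller's point of view; URLs and fields are invented.
async function examples(): Promise<void> {
  // GET: everything lives in the URL and the headers; there is no request body.
  const byEmail = await fetch(
    "https://api.example.com/contacts?email=jane%40example.com",
    { headers: { Authorization: "Bearer <token>" } },
  );

  // POST: the payload travels in the request body - even when the call is
  // effectively a read, as with this search endpoint.
  const search = await fetch("https://api.example.com/contacts/search", {
    method: "POST",
    headers: {
      Authorization: "Bearer <token>",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ createdAfter: "2021-01-01" }),
  });

  console.log(byEmail.status, search.status);
}
```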

I think the formal definitions are a part of the documentation that is not really needed, even for all the juniors out there. The simplified description generally helps more with the confusion, especially since the verbs are not always used in the way their definitions describe.

HTTP Codes

The next one is about HTTP status codes. Sometimes they are included in the documentation - but they are usually only included when the usage corresponds to the standard meaning of the codes. They are almost never included when the codes do not correspond to the normal usage.

This means that the juniors are extra careful in the normal cases, and not in the non-normal ones.

What do I mean by the non-normal cases? There are multiple APIs that, for non-authenticated calls, return HTTP code 200 with a login page. There are also multiple cases where non-authenticated calls return HTTP code 400 - though a substantial fraction of these will at least provide a descriptive message.

Some weirder cases for non-authenticated calls return HTTP codes 404, 502 or 503.

One of the weird examples not connected with authentication was using HTTP code 403 for rate limiting.

Of course, there are also cases where the API always returns HTTP code 200 and then describes the problem in the response body.
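
In practice this means the calling code cannot trust the status code alone. A sketch of the kind of defensive handling we end up writing - the URL and the heuristics are made up, not from any specific vendor:

```typescript
// Defensive response handling for an API with non-standard status codes.
async function fetchReport(token: string): Promise<unknown> {
  const response = await fetch("https://api.example.com/report", {
    headers: { Authorization: `Bearer ${token}` },
  });
  const text = await response.text();

  // Some APIs return 200 with a login page instead of 401.
  if (response.status === 200 && text.trimStart().startsWith("<")) {
    throw new Error("Got an HTML page instead of JSON - authentication probably failed.");
  }
  // Others use 400 (or 404/502/503) for unauthenticated calls, sometimes with a message.
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${text.slice(0, 200)}`);
  }
  // And some always return 200 and describe the problem in the body.
  const body = JSON.parse(text);
  if (body.error) {
    throw new Error(`API reported an error: ${body.error}`);
  }
  return body;
}
```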

If you are using the HTTP codes in some unusual ways, then please include this information in the documentation. If not, then use your judgment.

What is OAuth

The next point is another place with potential for confusion. Working on their first OAuth integration is usually one of the steps that juniors need to go through. And once they do, there are a couple of confusions that come connected to it.

The first one is that what juniors imagine as OAuth is an explicit OAuth grant with non-permanent tokens and a refresh flow. But what gets advertised in documentation as OAuth also includes the password and client credentials grants.

So there is always that conversation: they see OAuth in the documentation, but then they ask about the refresh flow - this is how it usually starts.

It is interesting to try and explain why an API is using the client credentials grant.

This is the description from the [OAuth site]:

The Client Credentials grant is used when applications request an access token to access their own resources, not on behalf of a user.

So I never have an answer to the question of why this kind of authentication is used.

But my favorite example is one where the documentation claimed that the tokens are permanent and that there is no refresh flow. Then one of my teammates showed with their implementation that the refresh flow worked.
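
To make the difference concrete, here is a sketch of the two token requests side by side, against a made-up token endpoint. The parameter names follow the OAuth 2.0 spec, but the URL and credentials are placeholders - the refresh flow is what juniors usually picture as OAuth, while the client credentials grant is the one that triggers the questions:

```typescript
const TOKEN_URL = "https://auth.example.com/oauth/token"; // hypothetical

// Client credentials grant: the app authenticates as itself, there is no user
// and usually no refresh token.
async function clientCredentialsToken(clientId: string, clientSecret: string): Promise<unknown> {
  const response = await fetch(TOKEN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "client_credentials",
      client_id: clientId,
      client_secret: clientSecret,
    }),
  });
  return response.json(); // typically { access_token, expires_in, ... }
}

// Refresh flow: exchange the stored refresh token for a fresh access token.
async function refreshAccessToken(
  clientId: string,
  clientSecret: string,
  refreshToken: string,
): Promise<unknown> {
  const response = await fetch(TOKEN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "refresh_token",
      refresh_token: refreshToken,
      client_id: clientId,
      client_secret: clientSecret,
    }),
  });
  return response.json(); // typically { access_token, refresh_token?, expires_in, ... }
}
```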

Authentication

The second to last point is about API authentication. There are a lot of different ways to authenticate with an API. We dealt with OAuth before, but some of the frequent ones are an API key in either a header or a query parameter, or basic authentication with a username and password / API key. There is also SOAP, which I hope will eventually fall out of use, since it is a pain in the ass.

But there are also some more unusual ones, like manipulating a personal certificate. In the case that I am thinking of right now, they had non-working sample code showing how this is supposed to work. Without a working code example, there is no way a junior developer can handle this without help.

As long as the more normal ways of authentication are used, it is usually alright not to describe them in a lot of detail. Even junior developers can usually figure out how to add an Authorization header or a query parameter.

Changing responses

The last point is about something a bit unusual. Something that I have noticed with junior developers is that they will trust the documentation more than they trust the results of the code.

Let me explain. There were quite a lot of cases where the response structure in the documentation and the response structure in the actual calls were not the same. Each time I asked them about it, the answer was usually 'the documentation says this'.

It can lead to some interesting cases when the API provides different data than the documentation describes. One example was when an API that had been returning real data suddenly started missing a property in the results. The documentation did not provide any information to help us explain this. So it was interesting to see people being confused by it.
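
The habit I try to teach instead is to check the actual payload, not just the documented one. A minimal sketch, with invented field names:

```typescript
// Validate what the API actually returned instead of trusting the documented shape.
interface DocumentedUser {
  id: string;
  name: string;
  email: string;
}

function parseUser(raw: unknown): DocumentedUser {
  const candidate = raw as Partial<DocumentedUser>;
  const missing = (["id", "name", "email"] as const).filter(
    (field) => candidate[field] === undefined || candidate[field] === null,
  );
  if (missing.length > 0) {
    // Report the real payload - the documentation will not explain this for us.
    throw new Error(`Response is missing documented fields: ${missing.join(", ")}`);
  }
  return candidate as DocumentedUser;
}
```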

Conclusion

In the end, I would like to say that most of the documentation we deal with does not cause a lot of problems for me. And at least for the smaller companies, it is usually very well written.

I am also aware that writing good documentation is hard work, and a lot of the time it is a skill that we just expect people to have. But good documentation is not a simple thing to produce. And I am always thankful for any decently written documentation.

But because it is both a hard and an important topic, I wanted to bring some potential problems that I noticed in my work to the front of other people's minds.

The Examples of Fandom in Fanfiction

I found an interesting Reddit thread titled What is something people in your fandom forget isn't actually canon but fanon?. It has a lot of examples from different fandoms that I can use in the introduction of my master's thesis. These are things that are considered true in the fandom community but were never expressed in canon, or where the canon presented conflicting information.

Some of the examples that I could use are Hawkeye hiding in the vents or the Avengers all hanging out in the Avengers tower (MCU), Snape not actually being Draco's godfather (Harry Potter), Akefia being the other Bakura's name (Yu-Gi-Oh), or Kirk in the new movies having an allergic reaction after McCoy dosed him (Star Trek).

More examples of fanon and ascended fanon can also be found on TvTropes and in the Fanon entry on Fanlore.

The Most Surprising Fandoms

There was a Reddit thread about the most surprising fandoms, which had a couple of pretty interesting examples, from politicians to historical figures to Santa Claus to pairings of different websites.

I could include in the master's thesis the one comment about newspaper articles about Italian politicians. Mostly because it is remarkable that this is even a thing, and also because newspapers are 'more authoritative' than Reddit threads. The comment is below:

A few years ago, my country elected a coalition government made up of a far-right nationalist party and a let's say European libertarian party (that's really not what they are but I'm not gonna get into Italian politics now lmao). Anyway…. There were articles in our actual newspapers about Wattpad High School AUs shipfics of the leaders of the two parties. I recall one specifically in which Angela Merkel was the headmistress. I wish I was joking….

And for the record, neither one of them is or was at the time even remotely good-looking. Like, I can understand (kinda….) Obama/Trudeau fics, but this was……. Bad lmao

Even with this variety, there is sometimes a push for everybody to have the same headcanon.

One does not Need to Open a Browser to Scrape Sites

I had an interesting conversation at work today. We create a lot of integrations, so we recently started to create metadata for them. One of the fields is whether we use an API or scraping to get the data.

But for some reason, we also had additional possible types: inner api, for scrapers that do not need a browser, and mix, for scrapers where we collect some data with the browser and some without it.

This does not look like data we would want to expose to the outside at all.

The problems with scrapers are generally similar to each other (different captchas, blocked user agents, MFA, frequent unannounced changes, ...) and different from the problems of integrations using official APIs. Considering this, I wanted to know why we are keeping track of the different kinds of scrapers in this way. What is the use case?

But when I talked to the person dealing with that part of the code, as long as we did not open a browser, it was not scraping for him. Faking the HTTP calls that the site would make, cookies and all, was the same to him as using an official, documented API. This actually surprised me. Did that person not have enough experience with debugging integrations like that yet?

So I want to declare it here - even if the integration or script is not using a browser, it can still be a scraper. The difference is whether we are getting the data in the way end users consume it or in the way programs consume it. And it does not matter if this happens through a browser or through raw HTTP calls.
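
For the record, scraping without a browser looks roughly like this: replaying the internal call the site's frontend makes, with a session cookie. Everything here (URL, cookie name, headers) is hypothetical:

```typescript
// Replaying a site's internal, undocumented HTTP call with a session cookie.
// This is still scraping: the endpoint is meant for the site's own frontend
// and can change or break without any announcement.
async function fetchDashboardData(sessionCookie: string): Promise<unknown> {
  const response = await fetch("https://app.example.com/internal/api/dashboard", {
    headers: {
      Cookie: `session=${sessionCookie}`,
      // Some sites also check these, just like they would for a browser.
      "User-Agent": "Mozilla/5.0",
      "X-Requested-With": "XMLHttpRequest",
    },
  });
  if (!response.ok) {
    throw new Error(`Internal endpoint changed or session expired: HTTP ${response.status}`);
  }
  return response.json();
}
```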

The Importance of Type Checking in Names

Today I came across an interesting bug regarding names. In our integrations at work, we have a check for both undefined and null in names, since it did happen in the past that we would simply concatenate these into a string and show them. JavaScript allows you to do this. So now we check whether the name is valid or not and adjust based on that.

So it was weird when an integration that just uses the name field from the API started to throw these errors. The users endpoint was the one in question. And when I checked, there seemed to be a couple of names in the form of 'null lastName'.

Now, I cannot simply assume that these are not valid names. I am aware that I am making a lot of assumptions about what a name looks like, and that there are people with Null as a last name.

  • https://www.wired.com/2015/11/null/
  • https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Null would also be a legal name in Slovenia, at least according to the law on personal names. Except that the bureaucrats would warn the parents that it could hurt the child. But even in that case, if the parents do not choose a new name, the old one stays - the social workers are just notified.

Considering the level of programming knowledge in the general population, I doubt it would fall under their view of hurtful names.

But thankfully, our company was also using these integrations we wrote, and the problem showed up in our own data as well. Looking over the problematic cases by hand, in all the cases in our company it was not an actual name. It seems more like a case of a missing first name that got concatenated with somewhat unexpected results.

We have already adjusted our code to deal with this, so it will not bring us additional warnings. But it was nice to see that even bigger companies still let something like that slip. And Atlassian is a bigger company than us, both by revenue and by employee count.

It is also a problem that one of my coworkers would probably point out as a valid case for why we should be doing more type checking in our code.
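
A minimal sketch of how the 'null lastName' value shows up and of one possible guard - the field names are assumptions, and as noted above, treating the literal string "null" as a missing value is itself a judgment call, since Null can be a real surname:

```typescript
// How 'null lastName' is produced: JavaScript happily stringifies null and undefined.
const firstName: string | null = null;
const lastName = "lastName";
const concatenated = `${firstName} ${lastName}`; // "null lastName"

// One possible guard, applied when the value may have been concatenated upstream.
function cleanNamePart(part: string | null | undefined): string {
  if (part === null || part === undefined) return "";
  if (part === "null" || part === "undefined") return ""; // judgment call, see above
  return part.trim();
}

function buildDisplayName(first?: string | null, last?: string | null): string {
  return [cleanNamePart(first), cleanNamePart(last)].filter(Boolean).join(" ");
}

console.log(concatenated);                        // "null lastName"
console.log(buildDisplayName(null, "lastName"));  // "lastName"
```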

The Importance of Updating the API documentation

This week, I have been dealing with an interesting bug. In one of the third-party APIs that we integrate with, one of the endpoints started to return HTTP status code 400. But we had not changed our code in the meantime, the documentation still said it should be a valid endpoint, and the other endpoints still worked.

So I wrote to their support about the problem, not expecting any answer. I mean, we just develop the integration; I am not actually a paying customer.

I did get a response really quickly. And after I provided the account ID of the problematic case, they told me the actual cause.

They were creating a new endpoint for usage info (what we wanted to get), and I would need to use this new endpoint, which she provided to me. The documentation had not been updated to match it yet.

But even though the documentation was not updated yet, she was not allowed to share the information about the endpoint without an account ID. I wonder what the reason behind this was.

I mean, their account IDs are just integers. Based on the length, I would assume they are not random, so I think it would not be hard to guess a valid one in a couple of tries. And if each guess is made from a different email... it does not seem to be done for security reasons.

The documentation was probably not updated because a lot of the time this is something left for the end, and it might not get done because something more important comes up. But still... why then control the flow of information through different channels?

I would prefer for these changes to also be reflected in the documentation by the time the old endpoint stops working. This would help resolve a lot of problems without back-and-forth with support.

Verda Knows Ard's Identity [Demon Lord Gets Reborn as a Typical Nobody Fanfiction]

Just a short scene that I wrote before episode 11 of the season came out. I was getting the feeling that everybody from the past was going to figure out Ard by the time the arc was done.


After Ireena and Ginny left Verda's office, Ard let just a bit of his feelings show and sighed before focusing again. He had not expected the last expedition to end with a declaration of war. The way he had been called a foolish young man was the unexpected part. Did the demon lord know who he was? More importantly, did he know he was from the future? Or...

"If you want to kill yourself, I will need to stop you." said Verda from the other side of the table. "But I don't want to do it, so don't do it."

When he looked at her, she was sitting upside down, watching Ard with a big grin.

"And what reason would I have to kill myself?" I asked, since I was not sure where this conversation was going. I went through the meeting at the Mevilas' castle. There was nothing to indicate this there, right?

Verda was still grinning at him. It was clear she was waiting for something. Ard was not sure what for, so he decided to wait it out. If she wanted his reaction, then she would eventually elaborate. Her grin became even bigger.

"Considering you came here to save the past, you did not reincarnate because we lost, right?"

Ard stopped. She was not saying it, right? "I don't know what you are talking..." he automatically started saying, until he remembered why he had decided to enlist in Verda's group. And it was not just to keep Ireena and Ginny safe. "Hypothetically, let's say you are right. What are you going to do with this information?"

"I wasn't going to do anything." Verda rolled. "But if you really don't want to be found do, you should stop acting like yourself." Her eyes were glittering. "Not telling, if you will help me."

Ard nodded. "Take care of me." Yes, he could guess what she wanted. For somebody like Verda, having somebody like him participate in her research would probably allow her to test some things she otherwise could not. About time travel or reincarnation or other things.

"Yeah. I got Vardy's help." Verda started to jump around the room. Ard started to get a bit worried. She is not going to intentionally revealed anything, but...

She stopped. "I was convinced that you would refuse. Things must have changed quite a lot in the future." But her eyes were glowing even more than before. "So how much has research advanced in the future?"

Ard touched his forehead, feeling the first signs of a headache. This was going to be a long conversation. Hopefully Ireena and Ginny would take a long time coming back.

Book: 天才王子赤字国家再生術 1

I finished reading the first volume of 天才王子赤字国家再生術. I had previously read the book in English, and I had also watched the anime covering its content, so I already knew what happens in it.

Even so, it is still fun now. I really do like this kind of comedy. Wein is my favorite protagonist: smart and able to do anything. In fanfiction he would probably be called a BAMF. I love him.

I had forgotten that ジワ negotiates with Wein. Zeno showing up in the next book is all thanks to Wein. It was good that Ninym did not take part in the final negotiation - thanks to that, the fourth book exists.

I have already bought the next book, so I can start reading it.