OAuth 2.0 in R

April 2019 · 7 minute read

Many APIs require some form of authentication. A very common form used with many cloud providers and commercial APIs is OAuth 2.0. In this post, I want to give you an introduction to how OAuth 2.0 works and use it to authenticate with Microsoft Azure services.

OAuth 2.0 - Overview

The OAuth 2.0 specification is not what you would commonly describe as ‘easy to understand’. Eran Hammer, the former lead author for the OAuth 2.0 project actually resigned and removed his name from the specification. He even wrote a Medium post titled ‘OAuth 2.0 and the Road to Hell’. His core concern is that OAuth 2.0 might be too complex for most developers to implement securely. However OAuth 2.0 is more or less the de facto authentication standard and used by Google, Facebook and Microsoft to secure access to their APIs, so we have to learn to work with it to a certain extend.

OAuth is not to be confused with OpenID. “OpenID is about authentication (i.e. I can identify myself with an url) whereas OAuth is about authorization (i.e. I can grant permission to access my data on some website to another website, without providing this website the authentication information for the original website)”.

So how does OAuth 2.0 work conceptually? At its core, OAuth defines four roles:

  1. resource owner: An entity who can grant access to a protected resource (often the end-user)
  2. resource server: The server hosting the protected resource
  3. client: An app that wants to access protected resources with authorization from the resource owner
  4. authorization server: The server issuing access tokens to client after resource owner is successfully authenticated and authorization is obtained.

The protocol flow looks like this (taken from ietf.org):

 +--------+                               +---------------+
 |        |--(A)- Authorization Request ->|   Resource    |
 |        |                               |     Owner     |
 |        |<-(B)-- Authorization Grant ---|               |
 |        |                               +---------------+
 |        |
 |        |                               +---------------+
 |        |--(C)-- Authorization Grant -->| Authorization |
 | Client |                               |     Server    |
 |        |<-(D)----- Access Token -------|               |
 |        |                               +---------------+
 |        |
 |        |                               +---------------+
 |        |--(E)----- Access Token ------>|    Resource   |
 |        |                               |     Server    |
 |        |<-(F)--- Protected Resource ---|               |
 +--------+                               +---------------+

Note that the authorization server and the resource server do not need to be distinct servers. In practice, the authorization request is not made directly to the resource owner, but handled by the authorization server. The client gets an authorization grant which it can use to get an access token to access protected resources.

There are four different authorization grant types:

  1. authorization code,
  2. implicit,
  3. resource owner password credentials and
  4. client credentials

It is possible to define additional custom grant types as well. Let’s look at them in more detail:

Authorization Code

The client redirects the resource owner to the authorization server via its user agent (usually a browser). The authorization server then redirects the resource owner back to the client with the authorization code.

A typical call to login.microsoftonline.com might look like this:

// Line breaks for legibility only

https://login.microsoftonline.com/<tenant-id>/OAuth2/Authorize?
client_id=a0448380-c346-4f9f-b897-c18733de9394&
response_mode=query&
response_type=code&
state=12345&
redirect_uri=http%3A%2F%2Flocalhost%3A3000&
resource=https%3a%2f%2fgraph.windows.net%2f&domain_hint=live.com

The query contains the following key-value pairs:

  • client_id: the id from the client we are making the call from
  • response_mode: specifies how the token should be send back to the client
  • response_type: with the value code.
  • scope: missing in this case, but tells us which authorization we want (space delimited list of scopes, but ignored in Azure)
  • state: optional, but highly recommended to help mitigate CSRF attacks. Can be an arbitrary string. This option can also be used to put the user back to the page from where the login was made.
  • redirect_uri: optional, this is the URI where we get the return from the authorization server. If not set, the default is used. Note that the redirect_uri ‘http://localhost:3000’ has been url-encoded to ‘http%3A%2F%2Flocalhost%3A3000’.
  • resource: The App ID URI of the traget web API (secured resource)

You can find more information about OAuth2.0 code grand on Azure in the official documentation.

Let’s take an example from Microsoft’s Azure documentation:

If the user approves the client, he is redirected back to redirect_uri with the following parameters in the query string:

http://localhost:3000?code=<authorization-token>&state=<state-string-from-original-request>

Now the client uses the authorization code in a POST request to get an access token:

POST https://login.microsoftonline.com/{tenant-id}/OAuth2/Token HTTP/1.1

Content-Type: application/x-www-form-urlencoded
Content-Length: 1012

grant_type=authorization_code&code=AAABAAAAiL9Kn2Z*****L1nVMH3Z5ESiAA&redirect_uri=http%3A%2F%2Flocalhost%3A62080%2FAccount%2FSignIn&client_id=a0448380-c346-4f9f-b897-c18733de9394&client_secret=olna84E8*****goScOg%3D

Note that the client also includes a client_secret in the POST request to prove its identiy against the server before it gets the access token.

An example response might look like this:

HTTP/1.1 200 OK

{"token_type":"Bearer","expires_in":"3599","expires_on":"1432039858","not_before":"1432035958","resource":"https://management.core.windows.net/","access_token":"eyJ0eXAiOiJKV1Q****M7Cw6JWtfY2lGc5A","refresh_token":"AAABAAAAiL9Kn2Z****55j-sjnyYgAA","scope":"user_impersonation","id_token":"eyJ0eXAiOiJKV*****-drP1J3P-HnHi9Rr46kGZnukEBH4dsg"}

Using authorization code grant is quite involved and in some cases a simpler authentication is more appropriate:

Implicit

The implicit grant is an authorization flow optimized for clients implemented in a browser or any app that cannot protect a client secret. The implicit grant flow is the same as the authorization grant flow above, but instead of getting an authorization code first that is exchanged for an access token after presenting the client secret with the authorization code, the authorization server returns an access token immediately.

Note that apparently industry best practices now recommend to use the authorization code flow without a client secret instead of implicit grant flow (see here and here).

Resource Owner Password Credentials

In this case we use the resource owner credentials directly to login to a given service. Since the app obviously collects password and user name from resource owner, this grant flow should only be used when the application is trusted to a very high degree by the resource owner. However, credentials do not need to be stored as the client gets an access token from the authentication server.

Client Credentials

When the authorization scope is limited to the resources owned by the client, the client can use its client id and client secret to request an access token. This grant flow is more or less the same as the resource owner password credentials flow, except that the client is under control of the protected resources.

Access and Refresh Tokens

Access tokens allow access to protected resources and are usually short lived to minimize harm in case they are leaked. Refresh tokens on the other hand are usually long lived and can be used to request new access tokens after the old ones have expired.

You can find more details about OAuth 2.0 on the official OAuth webpage. I also found ‘OAuth 2 Simplified’ and ‘A Guide To OAuth 2.0 Grants’ by Aaron Parecki really helpful (He also gave a talk about “The vowel R”, which as an R programmer I find pretty cool:).

So, after we covered some basic theory about how OAuth 2.0 works, let’s get our hands dirty and try to use our knowledge to access APIs in Azure.

OAuth 2.0 on Azure:

To authenticate with Azure the following call is made to the /authorize endpoint (line breaks for readability only):

https://login.microsoftonline.com/common/oauth2/authorize?
response_type=code
&client_id=04b07795-8ddb-461a-bbee-02f9e1bf7b46
&redirect_uri=http://localhost:8400
&state=mkb28apjvs1tj4m8m4hb
&resource=https://management.core.windows.net/
&prompt=select_account

The client_id in the example above is from the Azure CLI client. Also note that compared to the example we had earlier we are accessing /common/ instead of /<tenant-id>/. Pasting the above URL into your browser will return a redirect url of the following form:

http://localhost:8400/?code=<your-access-token>&session_state=<session-state>

Before we start: In Azure there are two types of apps you can register: native and web apps. Only web apps have a client_secret parameter.

Let’s try to get access tokens for the Azure resource manager in R:

library(httr)

app_name = "test"
# client_id of Azure CLI
client_id = "04b07795-8ddb-461a-bbee-02f9e1bf7b46"
client_secret = NULL
resource_uri = "https://management.core.windows.net/"

# Endpoints are of the form: 
# https://login.windows.net/<common | tenant-id>/oauth2/<authorize | token>
azure_endpoint = oauth_endpoint(authorize = "https://login.windows.net/common/oauth2/authorize",
                                access = "https://login.windows.net/common/oauth2/token")

azure_app = oauth_app(
  appname = app_name,
  key = client_id,
  secret = client_secret
)

token <- oauth2.0_token(azure_endpoint, azure_app,
  user_params = list(resource = resource_uri),
  use_oob = FALSE
)

You should now be redirected to a browser window where you can sign-in to your Azure account. There is also a library called AzureAuth as part of the cloudyR project, but I like that we can cover the entire flow with httr and no further dependencies.

Further Examples & Documentation

I found the following resources super helpful as well: