APIs

  • Application Programming Interface

  • A standardised interface for access to:

    • data and transfers,

    • programs and services,

    • general communication between apps/programs.

  • The “hidden” counterpart of a user interface: it lets computers and programs communicate with each other.

  • Access can be limited by the owner through:

    • number of requests per time unit,

    • access codes/credentials.
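
A limit on requests per time unit is often respected on the client side by spacing calls out. The following is a minimal sketch of a sliding-window throttle; the class name `Throttle`, its parameters, and the example limit are hypothetical and do not come from any particular API.

```python
import time
from collections import deque


class Throttle:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable, so the class can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of the most recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have already left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)


# Hypothetical limit: at most 60 requests per minute
throttle = Throttle(max_calls=60, period=60.0)
```

Calling `throttle.wait()` immediately before each request keeps the client within the assumed limit. The actual limit must be taken from the documentation of the API in question.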

Web APIs

  • Hypertext Transfer Protocol (HTTP) based queries and answers, typically using the GET or POST methods.

  • Each API has its own hierarchy and possibilities for querying.

  • URLs used for querying using the GET method typically consist of:

    • a server address: http://api.openweathermap.org,

    • a hierarchy with descriptive names: /data/2.5/forecast, and

    • a question mark marking the beginning of user-supplied named variables,
      written as name=value pairs joined by ampersands: ?q=London&appid=MY_API_KEY

  • Naming varies between APIs, e.g., data vs. api in the hierarchy and q vs. query among the variables.

  • Some APIs use time-limited access tokens (see, e.g., the BarentsWatch tutorial and the GitHub example):

    • The first POST to the API includes a client ID and a client “secret”.

    • A token (a temporary password) is returned and used for subsequent requests until it expires (typically after 3600 s).

  • APIs are not eternal.

    • Formats change over time.

    • Sometimes different format versions can be selected through the version part of the URL (1.0, 1.1, etc.).
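
The GET URL structure described above (server address, hierarchy, named variables) can be assembled and taken apart with the standard library. This is a minimal sketch; `build_url` is a hypothetical helper, `MY_API_KEY` is a placeholder rather than a working key, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_url(server, path, **params):
    """Join server address, hierarchy, and named query variables."""
    return f"{server.rstrip('/')}/{path.lstrip('/')}?{urlencode(params)}"


url = build_url(
    "http://api.openweathermap.org",
    "/data/2.5/forecast",
    q="London",
    appid="MY_API_KEY",  # placeholder, not a real key
)
print(url)
# http://api.openweathermap.org/data/2.5/forecast?q=London&appid=MY_API_KEY

# Going the other way: split a URL back into its parts
parts = urlsplit(url)
print(parts.netloc)           # server address
print(parts.path)             # hierarchy, including the version segment 2.5
print(parse_qs(parts.query))  # named variables as a dict of lists
```

In practice, `requests.get(base_url, params={...})` performs the same encoding of the named variables, so the query string rarely needs to be built by hand.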

JSON

  • The query can be a JSON string, e.g., {"city": "London", "year": "2000"}, which can be sent separately
    from the URL; see the example below using POST.

  • The returned content is also often JSON-formatted.
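
Converting between Python dictionaries and JSON strings can be sketched with the standard-library `json` module. The dictionary reuses the city/year example from the text; the variable names are illustrative and nothing here talks to a real API.

```python
import json

query = {"city": "London", "year": "2000"}

# Python dict -> JSON string (JSON requires double quotes around strings)
query_string = json.dumps(query)
print(query_string)  # {"city": "London", "year": "2000"}

# JSON string -> Python dict, as one would do with a returned response body
parsed = json.loads(query_string)
print(parsed["city"])  # London
```

When using `requests`, passing a dictionary through the `json=` argument of `requests.post` performs the first conversion automatically, and `response.json()` performs the second.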

Example with JSON query and JSON-stat return

  • Statistics Norway (SSB)

  • Traffic accident data

from pyjstat import pyjstat
import requests

# API for Statistics Norway, table of traffic accidents
POST_URL = 'https://data.ssb.no/api/v0/en/table/06794'
# Paste the URL into a browser to see all the options

# The payload is a JSON query: one entry per table variable ("code") listing
# the values to select, plus the requested response format (JSON-stat 2)
payload = {
    "query": [
        {"code": "Skadegrad", "selection": {"filter": "item", "values": ["01", "20", "02", "04", "05"]}},
        {"code": "Kjonn", "selection": {"filter": "item", "values": ["1", "2"]}},
        {"code": "Trafikkantgruppe", "selection": {"filter": "item", "values": ["1", "2", "3", "7", "8"]}},
        {"code": "ContentsCode", "selection": {"filter": "item", "values": ["SkaddDrept"]}},
        {"code": "Tid", "selection": {"filter": "item", "values": ["1999M01", "1999M02", "1999M03", "2024M06", "2024M07"]}},
    ],
    "response": {"format": "json-stat2"},
}

result = requests.post(POST_URL, json=payload)
print(result)  # 200 = OK
<Response [200]>
# Extract DataFrame from JSON-stat
dataset = pyjstat.Dataset.read(result.text)
df = dataset.write('dataframe')
print(df.shape)
df.head()
(250, 6)
   degree of damage      sex  group of road user                   contents    month  value
0            Killed  Females      Drivers of car  Persons killed or injured  1999M01      2
1            Killed  Females      Drivers of car  Persons killed or injured  1999M02      3
2            Killed  Females      Drivers of car  Persons killed or injured  1999M03      1
3            Killed  Females      Drivers of car  Persons killed or injured  2024M06      0
4            Killed  Females      Drivers of car  Persons killed or injured  2024M07      1
# New payload with fewer restrictions: the "all" filter with "*" selects every value
payload = {
    "query": [
        {"code": "Skadegrad", "selection": {"filter": "all", "values": ["*"]}},
        {"code": "Kjonn", "selection": {"filter": "all", "values": ["*"]}},
        {"code": "Trafikkantgruppe", "selection": {"filter": "all", "values": ["*"]}},
        {"code": "ContentsCode", "selection": {"filter": "all", "values": ["*"]}},
        {"code": "Tid", "selection": {"filter": "all", "values": ["*"]}},
    ],
    "response": {"format": "json-stat2"},
}

result = requests.post(POST_URL, json=payload)
print(result)  # 200 = OK
<Response [200]>
dataset = pyjstat.Dataset.read(result.text)
df_all = dataset.write('dataframe')
print(df_all.shape)
df_all.head()
(29760, 6)
   degree of damage      sex  group of road user                   contents    month  value
0            Killed  Females      Drivers of car  Persons killed or injured  1999M01      2
1            Killed  Females      Drivers of car  Persons killed or injured  1999M02      3
2            Killed  Females      Drivers of car  Persons killed or injured  1999M03      1
3            Killed  Females      Drivers of car  Persons killed or injured  1999M04      3
4            Killed  Females      Drivers of car  Persons killed or injured  1999M05      3
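
The two payloads above share one structure, so writing them by hand is repetitive. The following is a minimal sketch of a helper that generates the same structure. `build_payload` is a hypothetical convenience function, not part of `requests`, `pyjstat`, or the SSB API, and treating `["*"]` as the "all" filter is an assumption taken from the second payload above.

```python
def build_payload(selections, response_format="json-stat2"):
    """Build a query payload with the same structure as the payloads above.

    `selections` maps a variable code to a list of values. The list ["*"]
    is translated to the "all" filter, anything else to the "item" filter.
    """
    query = []
    for code, values in selections.items():
        filter_type = "all" if values == ["*"] else "item"
        query.append({"code": code,
                      "selection": {"filter": filter_type, "values": values}})
    return {"query": query, "response": {"format": response_format}}


# Mix explicit selections with "everything" for one variable
payload = build_payload({
    "Skadegrad": ["01", "20", "02", "04", "05"],
    "Kjonn": ["1", "2"],
    "Trafikkantgruppe": ["*"],
    "ContentsCode": ["SkaddDrept"],
    "Tid": ["2024M06", "2024M07"],
})
```

The resulting dictionary can be passed to `requests.post(POST_URL, json=payload)` in the same way as the hand-written payloads.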

Exercise

  • Visit Statistics Norway’s Ready-made datasets.

  • Select a different dataset, download it through the API, and inspect the results.