Visualising the History of openSenseMap.org

Norwin Roosen

2023-03-08

This vignette serves as an example of data wrangling and visualization with opensensmapr, dplyr and ggplot2.

# required packages:
library(opensensmapr) # data download
library(dplyr)        # data wrangling
library(ggplot2)      # plotting
library(lubridate)    # date arithmetic
library(zoo)          # rollmean()

openSenseMap.org has grown quite a bit in recent years; it would be interesting to see how we got to the current 11448 sensor stations, split up by various attributes of the boxes.

While opensensmapr provides extensive methods of filtering boxes by attributes on the server, we do the filtering within R to save time and gain flexibility. So the first step is to retrieve all the boxes:

# if you want to see results for a specific subset of boxes,
# just specify a filter such as grouptag='ifgi' here

# boxes = osem_boxes(cache = '.')
boxes = readRDS('boxes_precomputed.rds')  # read precomputed file to save resources 
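
As a rough illustration of filtering within R rather than on the server, the sketch below applies dplyr's filter() to a small made-up data frame that stands in for boxes. The rows are invented for this example; only the column names exposure and grouptag are taken from this vignette.

```r
library(dplyr)

# toy stand-in for the `boxes` data frame (all rows are made up)
toy_boxes = data.frame(
  name     = c('a', 'b', 'c', 'd'),
  exposure = c('outdoor', 'indoor', 'outdoor', 'mobile'),
  grouptag = c('ifgi', NA, 'ifgi', 'edu'),
  stringsAsFactors = FALSE
)

# keep only outdoor boxes of one grouptag;
# rows where a condition evaluates to NA are dropped by filter()
ifgi_outdoor = toy_boxes %>%
  filter(exposure == 'outdoor', grouptag == 'ifgi')

print(ifgi_outdoor)
```

The same pattern should apply to the real boxes object, assuming it carries these columns as used in the rest of this vignette.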

Plot count of boxes by time

By looking at the createdAt attribute of each box we know the exact time a box was registered. With this approach we have no information about boxes that were deleted in the meantime, but that’s okay for now.

…and exposure

exposure_counts = boxes %>%
  group_by(exposure) %>%
  mutate(count = row_number(createdAt))

exposure_colors = c(indoor = 'red', outdoor = 'lightgreen', mobile = 'blue', unknown = 'darkgrey')
ggplot(exposure_counts, aes(x = createdAt, y = count, colour = exposure)) +
  geom_line() +
  scale_colour_manual(values = exposure_colors) +
  xlab('Registration Date') + ylab('senseBox count')
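
To make the counting step more tangible, here is a minimal sketch on invented data: within each exposure group, row_number(createdAt) ranks the boxes by registration time, so the rank of a box equals the number of boxes of that group registered up to that moment. The data frame toy is hypothetical.

```r
library(dplyr)

# five made-up boxes with their registration timestamps
toy = data.frame(
  exposure  = c('outdoor', 'indoor', 'outdoor', 'outdoor', 'indoor'),
  createdAt = as.POSIXct(
    c('2020-01-03', '2020-01-01', '2020-01-01', '2020-01-02', '2020-01-05'),
    tz = 'UTC'
  )
)

# rank by createdAt within each exposure group = running box count
toy_counts = toy %>%
  group_by(exposure) %>%
  mutate(count = row_number(createdAt)) %>%
  ungroup()

print(toy_counts)
```

The largest count per group is the group size, which is why the summary further down can use max(count).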

Outdoor boxes are growing fast! We can also see the introduction of mobile sensor “stations” in 2017. While mobile boxes are still few, we can expect a quick rise in 2018 once the new senseBox MCU with GPS support is released.

Let’s have a quick summary:

exposure_counts %>%
  summarise(
    oldest = min(createdAt),
    newest = max(createdAt),
    count = max(count)
  ) %>%
  arrange(desc(count))
exposure  oldest               newest               count
outdoor   2016-08-09 19:34:42  2023-02-28 09:47:17   8417
indoor    2018-05-10 20:14:44  2023-02-27 09:53:33   2364
mobile    2020-10-24 14:39:30  2023-02-20 16:32:48    590
unknown   2022-03-01 07:04:31  2022-03-30 11:25:43     19

…and grouptag

We can try to find out where the increases in growth came from by analysing the box count by grouptag.

Caveats: Only a small subset of boxes has a grouptag, and we should assume that these groups are actually bigger. Also, we can see that grouptag naming is inconsistent (Luftdaten, luftdaten.info, …)
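
One possible way to soften the naming inconsistency is to normalise the tags before grouping. The sketch below is not part of the original analysis: the helper normalise_tag is hypothetical, and it only folds case and trims whitespace, so variants such as luftdaten and luftdaten.info would still need a manual mapping.

```r
# a few grouptag spellings as they appear in the tables of this vignette
tags = c('Luftdaten', 'luftdaten', 'Luftdaten.info', 'luftdaten.info',
         '#stropdeaer', '#STROPDEAER')

# hypothetical helper: lower-case and strip surrounding whitespace
normalise_tag = function(x) tolower(trimws(x))

tag_counts = table(normalise_tag(tags))
print(tag_counts)
```

Whether merging spellings this way is appropriate depends on the groups involved, so treat it as an assumption to check rather than a fix.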

grouptag_counts = boxes %>%
  group_by(grouptag) %>%
  # only include grouptags with 8 or more members
  filter(length(grouptag) >= 8 & !is.na(grouptag)) %>%
  mutate(count = row_number(createdAt))

# helper for sorting the grouptags by boxcount
sortLvls = function(oldFactor, ascending = TRUE) {
  lvls = table(oldFactor) %>% sort(., decreasing = !ascending) %>% names()
  factor(oldFactor, levels = lvls)
}
grouptag_counts$grouptag = sortLvls(grouptag_counts$grouptag, ascending = FALSE)
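
A small usage example may help to see what the helper does. It repeats the definition of sortLvls from above so that it runs on its own, and applies it to an invented vector; the resulting factor levels are ordered by how often each value occurs.

```r
library(dplyr)  # provides the %>% pipe

# same helper as above, repeated to keep the example self-contained
sortLvls = function(oldFactor, ascending = TRUE) {
  lvls = table(oldFactor) %>% sort(., decreasing = !ascending) %>% names()
  factor(oldFactor, levels = lvls)
}

# made-up values: 'b' occurs three times, 'a' twice, 'c' once
x = c('b', 'a', 'b', 'c', 'b', 'a')

sorted_desc = sortLvls(x, ascending = FALSE)
sorted_asc  = sortLvls(x, ascending = TRUE)

print(levels(sorted_desc))
print(levels(sorted_asc))
```

Because ggplot2 orders a discrete legend by factor levels, this puts the largest grouptags first in the plot legend.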

ggplot(grouptag_counts, aes(x = createdAt, y = count, colour = grouptag)) +
  geom_line(aes(group = grouptag)) +
  xlab('Registration Date') + ylab('senseBox count')

grouptag_counts %>%
  summarise(
    oldest = min(createdAt),
    newest = max(createdAt),
    count = max(count)
  ) %>%
  arrange(desc(count))
grouptag oldest newest count
edu 2022-03-30 11:25:43 2023-02-28 09:47:17 431
Save Dnipro 2022-03-30 11:25:43 2022-03-30 11:25:43 354
Luftdaten 2022-03-30 11:25:43 2023-01-27 15:22:54 244
CS:iDrop 2023-01-10 10:22:33 2023-02-27 09:53:33 140
HU Explorers 2022-03-30 11:25:43 2022-12-14 10:11:34 124
#stropdeaer 2022-03-30 11:25:43 2022-03-30 11:25:43 101
321heiss 2022-06-27 14:12:25 2022-08-08 10:22:21 91
GIZ Clean Air Day Project 2022-03-30 11:25:43 2022-03-30 11:25:43 76
Captographies 2021-05-21 15:24:45 2023-01-31 12:11:49 62
Futurium 2022-03-30 11:25:43 2022-03-30 11:25:43 39
Bad_Hersfeld 2022-03-30 11:25:43 2022-03-30 11:25:43 37
TKS Bonn 2022-03-30 11:25:43 2022-03-30 11:25:43 36
kerekdomb_ 2022-03-30 11:25:43 2022-03-30 11:25:43 34
Mikroprojekt Mitmachklima 2022-03-30 11:25:43 2022-08-23 13:14:11 34
Bottrop-Feinstaub 2022-03-30 11:25:43 2022-03-30 11:25:43 33
Luchtwachters Delft 2022-03-30 11:25:43 2022-03-30 11:25:43 33
Futurium 2021 2022-03-30 11:25:43 2022-03-30 11:25:43 32
Feinstaub 2022-03-30 11:25:43 2022-08-01 16:27:10 29
luftdaten.info 2022-03-30 11:25:43 2022-03-30 11:25:43 28
ifgi 2022-03-30 11:25:43 2022-03-30 11:25:43 26
SUGUCS 2022-11-30 15:25:32 2023-01-23 13:17:54 25
cleanairfrome 2022-03-30 11:25:43 2022-05-15 21:13:30 24
freshairbromley 2022-03-30 11:25:43 2023-01-31 10:18:57 24
WAUW!denberg 2022-03-30 11:25:43 2022-03-30 11:25:43 23
Riga 2022-03-30 11:25:43 2022-03-30 11:25:43 22
bad_hersfeld 2022-03-30 11:25:43 2022-06-14 09:34:02 21
KJR-M 2022-03-30 11:25:43 2022-03-30 11:25:43 21
Mikroklima 2022-03-30 11:25:43 2022-09-05 08:38:57 21
Smart City MS 2022-03-30 11:25:43 2022-03-30 11:25:43 20
SekSeeland 2022-03-30 11:25:43 2022-03-30 11:25:43 19
Luftdaten.info 2022-03-30 11:25:43 2022-03-30 11:25:43 18
1 2022-03-30 11:25:43 2022-04-25 15:07:39 17
Apeldoorn 2022-03-30 11:25:43 2022-03-30 11:25:43 17
luftdaten 2022-03-30 11:25:43 2022-03-30 11:25:43 17
BurgerMeetnet 2022-03-30 11:25:43 2022-05-10 21:22:35 16
Haus C 2022-03-30 11:25:43 2022-03-30 11:25:43 16
AGIN 2022-11-28 17:33:12 2022-11-28 17:42:18 15
APPI 2023-01-26 13:38:22 2023-01-26 13:40:59 15
BRGL 2022-11-06 19:23:43 2022-11-06 22:08:36 15
BRGW 2022-11-02 10:28:52 2022-11-02 13:32:12 15
Burgermeetnet 2022-03-30 11:25:43 2022-03-30 11:25:43 15
HTLJ 2022-11-21 22:04:17 2022-11-21 22:05:47 15
MakeLight 2022-03-30 11:25:43 2022-03-30 11:25:43 15
MSGB 2022-11-14 09:08:57 2022-11-14 10:19:24 15
MSHO 2022-12-20 09:28:40 2022-12-20 10:01:38 15
MSIN 2022-11-21 17:02:39 2022-11-21 23:06:22 15
MSKE 2023-01-05 15:40:58 2023-01-05 15:52:02 15
PMSI 2023-01-20 14:22:03 2023-01-20 14:31:52 15
Haus B 2022-03-30 11:25:43 2022-03-30 11:25:43 14
UrbanGarden 2023-02-02 19:27:40 2023-02-18 14:50:19 14
Соседи по воздуху 2022-03-30 11:25:43 2023-01-27 09:50:43 14
PIE 2022-03-30 11:25:43 2022-03-30 11:25:43 13
RB-DSJ 2022-03-30 11:25:43 2022-03-30 11:25:43 13
co2mofetten 2022-03-30 11:25:43 2023-01-17 07:38:21 12
Sofia 2022-03-30 11:25:43 2022-03-30 11:25:43 12
Haus D 2022-03-30 11:25:43 2022-03-30 11:25:43 11
home 2022-03-30 11:25:43 2022-03-30 11:25:43 11
Netlight 2022-03-30 11:25:43 2022-03-30 11:25:43 11
#STROPDEAER 2022-03-30 11:25:43 2023-02-16 15:12:50 10
AirAberdeen 2022-03-30 11:25:43 2022-03-30 11:25:43 10
Balthasar-Neumann-Schule 1 2022-03-30 11:25:43 2022-03-30 11:25:43 10
Bestäuberprojekt 2022-03-30 11:25:43 2022-03-30 11:25:43 10
Che Aria Tira? 2022-03-30 11:25:43 2022-03-30 11:25:43 10
dwih-sp 2022-03-30 11:25:43 2022-03-30 11:25:43 10
esri-de 2022-03-30 11:25:43 2022-03-30 11:25:43 10
HBG Bonn 2022-03-30 11:25:43 2022-03-30 11:25:43 10
IntegrA 2022-03-30 11:25:43 2022-03-30 11:25:43 10
makerspace-partheland 2022-03-30 11:25:43 2023-02-20 18:34:50 10
Mikroklima H 2022-05-07 17:29:00 2022-05-07 17:47:42 10
montorioveronese.it 2022-03-30 11:25:43 2022-12-29 07:45:57 10
ATSO 2022-03-30 11:25:43 2022-03-30 11:25:43 9
clevermint 2022-03-30 11:25:43 2022-03-30 11:25:43 9
Fläming 2022-08-15 19:16:48 2022-12-13 06:29:22 9
Mikroklima C-R 2022-03-30 11:25:43 2022-03-30 11:25:43 9
Ostroda 2022-03-30 11:25:43 2022-03-30 11:25:43 9
RSS 2022-03-30 11:25:43 2022-03-30 11:25:43 9
test 2022-03-30 11:25:43 2022-12-18 22:20:34 9
2 2022-03-30 11:25:43 2023-01-07 15:44:29 8
Data4City 2022-03-30 11:25:43 2022-03-30 11:25:43 8
DBDS 2022-03-30 11:25:43 2022-03-30 11:25:43 8
IKG 2022-03-30 11:25:43 2022-03-30 11:25:43 8
IVKOWeek 2022-03-30 11:25:43 2022-07-05 09:42:31 8
Koerber-Stiftung 2022-03-30 11:25:43 2022-03-30 11:25:43 8
M7 2022-03-30 11:25:43 2022-11-28 13:00:44 8
Natlab Ökologie 2022-03-30 11:25:43 2022-03-30 11:25:43 8
PGKN 2022-03-30 11:25:43 2022-03-30 11:25:43 8
Raumanmeri 2022-03-30 11:25:43 2022-03-30 11:25:43 8
stw 2022-03-30 11:25:43 2022-03-30 11:25:43 8

Plot rate of growth and inactivity per week

First we group the boxes by createdAt into bins of one week:

bins = 'week'
mvavg_bins = 6

growth = boxes %>%
  mutate(week = cut(as.Date(createdAt), breaks = bins)) %>%
  group_by(week) %>%
  summarize(count = length(week)) %>%
  mutate(event = 'registered')
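
The binning step can be tried on a few invented timestamps. In this sketch cut() with breaks = 'week' labels each date with the Monday that starts its week (the default of cut.Date), and the rows are then counted per label. The data frame toy is hypothetical.

```r
library(dplyr)

# three made-up registrations: two in the week of 2023-01-02, one a week later
toy = data.frame(
  createdAt = as.POSIXct(
    c('2023-01-02 10:00:00', '2023-01-04 12:00:00', '2023-01-10 08:00:00'),
    tz = 'UTC'
  )
)

weekly = toy %>%
  mutate(week = cut(as.Date(createdAt), breaks = 'week')) %>%
  group_by(week) %>%
  summarize(count = n())

print(weekly)
```

As far as I can tell, weeks without any registration produce no row at all rather than a zero count, because group_by() drops empty factor levels by default.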

We can do the same for updatedAt, which informs us about the last change to a box, including uploaded measurements. This method of determining inactive boxes is fairly inaccurate and should be considered an approximation, because we have no information about intermediate inactive phases. Also, deleted boxes would probably have a big impact here.

inactive = boxes %>%
  # remove boxes that were updated in the last two days,
  # b/c any box becomes inactive at some point by definition of updatedAt
  filter(updatedAt < now() - days(2)) %>%
  mutate(week = cut(as.Date(updatedAt), breaks = bins)) %>%
  group_by(week) %>%
  summarize(count = length(week)) %>%
  mutate(event = 'inactive')
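
The two-day cutoff can be illustrated with lubridate on invented data. In this sketch the hypothetical data frame toy holds boxes last updated 0, 1, 3 and 10 days ago; only the last two are older than the cutoff and therefore count as inactive.

```r
library(dplyr)
library(lubridate)

# made-up boxes, last updated 0, 1, 3 and 10 days before now
toy = data.frame(name = c('a', 'b', 'c', 'd'), stringsAsFactors = FALSE)
toy$updatedAt = now() - days(c(0, 1, 3, 10))

# keep boxes whose last update is more than two days old
stale = toy %>%
  filter(updatedAt < now() - days(2))

print(stale$name)
```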

Now we can combine both datasets for plotting:

boxes_by_date = bind_rows(growth, inactive) %>% group_by(event)

ggplot(boxes_by_date, aes(x = as.Date(week), colour = event)) +
  xlab('Time') + ylab(paste('rate per', bins)) +
  scale_x_date(date_breaks="years", date_labels="%Y") +
  scale_colour_manual(values = c(registered = 'lightgreen', inactive = 'grey')) +
  geom_point(aes(y = count), size = 0.5) +
  # moving average, make first and last value NA (to ensure identical length of vectors)
  geom_line(aes(y = rollmean(count, mvavg_bins, fill = list(NA, NULL, NA))))
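
One thing to keep in mind: assuming that ggplot2 evaluates an aesthetic expression over all rows of the layer at once, the rolling window above can run across the boundary between the registered and the inactive rows. A possible alternative, sketched here on invented data, is to compute the moving average per event with dplyr before plotting. The column name mvavg is hypothetical.

```r
library(dplyr)
library(zoo)

# made-up weekly counts for two events
toy = data.frame(
  event = rep(c('registered', 'inactive'), each = 8),
  count = c(1:8, rep(10, 8)),
  stringsAsFactors = FALSE
)

# rolling mean over 6 bins, computed separately for each event;
# fill = NA pads the ends so the result has as many values as the input
smoothed = toy %>%
  group_by(event) %>%
  mutate(mvavg = rollmean(count, 6, fill = NA)) %>%
  ungroup()

print(smoothed)
```

With a window of 6 and 8 values per event, each event gets 3 averaged values and 5 padded NAs, and no window mixes the two events. The new column could then be mapped with aes(y = mvavg) in geom_line().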

We see a sudden rise in early 2017, which lines up with the fast-growing grouptag Luftdaten. This was enabled by an integration of openSenseMap.org into the firmware of the air quality monitoring project luftdaten.info. The dips in mid 2017 and early 2018 could be explained by production or delivery issues of the senseBox hardware, but I have no data on the exact time frames to verify this.

Plot duration of boxes being active

While we are looking at createdAt and updatedAt, we can also extract the duration of activity of each box, and look at metrics by exposure and grouptag once more:

…by exposure

duration = boxes %>%
  group_by(exposure) %>%
  filter(!is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))

ggplot(duration, aes(x = exposure, y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days')
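
For reference, this is how difftime() behaves on two invented boxes: the result is a difftime vector in the requested unit, which can be converted with as.numeric() when plain numbers are needed. The data frame toy is hypothetical.

```r
# two made-up boxes: one active for ten days, one for half a day
toy = data.frame(
  createdAt = as.POSIXct(c('2023-01-01 00:00:00', '2023-01-01 00:00:00'), tz = 'UTC'),
  updatedAt = as.POSIXct(c('2023-01-11 00:00:00', '2023-01-01 12:00:00'), tz = 'UTC')
)

# time between registration and last update, expressed in days
toy$duration = difftime(toy$updatedAt, toy$createdAt, units = 'days')

duration_days = as.numeric(toy$duration)
print(duration_days)
```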

The time of activity averages only 158 days, though there are boxes with 2394 days of activity, spanning a large chunk of openSenseMap’s existence.

…by grouptag

duration = boxes %>%
  group_by(grouptag) %>%
  # only include grouptags with 8 or more members
  filter(length(grouptag) >= 8 & !is.na(grouptag) & !is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))
  
ggplot(duration, aes(x = grouptag, y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days')

duration %>%
  summarize(
    duration_avg = round(mean(duration)),
    duration_min = round(min(duration)),
    duration_max = round(max(duration)),
    oldest_box = round(max(difftime(now(), createdAt, units='days')))
  ) %>%
  arrange(desc(duration_avg))
grouptag duration_avg duration_min duration_max oldest_box
Ostroda 335 days 335 days 335 days 343 days
Mikroklima C-R 332 days 321 days 335 days 343 days
Apeldoorn 331 days 263 days 335 days 343 days
freshairbromley 304 days 28 days 335 days 343 days
Mikroklima 283 days 42 days 335 days 343 days
Mikroklima H 283 days 229 days 297 days 305 days
Smart City MS 272 days 0 days 335 days 343 days
Feinstaub 223 days 0 days 335 days 343 days
makerspace-partheland 217 days 0 days 335 days 343 days
co2mofetten 213 days 0 days 334 days 343 days
Luftdaten 211 days 0 days 335 days 343 days
luftdaten.info 200 days 0 days 335 days 343 days
Burgermeetnet 190 days 0 days 335 days 343 days
esri-de 190 days 0 days 335 days 343 days
#stropdeaer 187 days 0 days 335 days 343 days
Sofia 172 days 0 days 335 days 343 days
WAUW!denberg 168 days 0 days 335 days 343 days
KJR-M 167 days 0 days 335 days 343 days
IKG 163 days 0 days 335 days 343 days
AirAberdeen 155 days 0 days 335 days 343 days
M7 155 days 92 days 243 days 343 days
1 148 days 0 days 335 days 343 days
BurgerMeetnet 141 days 0 days 335 days 343 days
Luftdaten.info 139 days 0 days 335 days 343 days
Bottrop-Feinstaub 133 days 0 days 335 days 343 days
cleanairfrome 130 days 0 days 335 days 343 days
montorioveronese.it 130 days 0 days 335 days 343 days
stw 130 days 0 days 335 days 343 days
RB-DSJ 122 days 0 days 335 days 343 days
Mikroprojekt Mitmachklima 118 days 0 days 335 days 343 days
BRGL 113 days 109 days 114 days 122 days
Luchtwachters Delft 111 days 0 days 335 days 343 days
Fläming 110 days 23 days 180 days 205 days
BRGW 109 days 98 days 118 days 126 days
PIE 103 days 0 days 335 days 343 days
Riga 103 days 0 days 335 days 343 days
kerekdomb_ 100 days 0 days 335 days 343 days
luftdaten 100 days 0 days 335 days 343 days
home 95 days 0 days 335 days 343 days
Bad_Hersfeld 94 days 0 days 335 days 343 days
MSGB 94 days 50 days 106 days 114 days
dwih-sp 92 days 0 days 335 days 343 days
AGIN 91 days 87 days 92 days 100 days
HTLJ 91 days 58 days 99 days 107 days
Соседи по воздуху 87 days 0 days 335 days 343 days
bad_hersfeld 84 days 0 days 335 days 343 days
Captographies 82 days 0 days 648 days 656 days
Save Dnipro 75 days 0 days 335 days 343 days
PGKN 68 days 0 days 335 days 343 days
MSHO 61 days 36 days 70 days 78 days
Netlight 61 days 0 days 335 days 343 days
#STROPDEAER 55 days 0 days 335 days 343 days
Futurium 54 days 0 days 335 days 343 days
test 54 days 0 days 335 days 343 days
ifgi 52 days 0 days 335 days 343 days
MSIN 52 days 0 days 79 days 107 days
2 50 days 0 days 331 days 343 days
ATSO 48 days 0 days 279 days 343 days
MakeLight 47 days 0 days 335 days 343 days
Haus B 44 days 0 days 239 days 343 days
Futurium 2021 43 days 0 days 329 days 343 days
DBDS 42 days 0 days 335 days 343 days
IVKOWeek 42 days 0 days 335 days 343 days
PMSI 38 days 38 days 38 days 47 days
edu 37 days 0 days 335 days 343 days
GIZ Clean Air Day Project 37 days 0 days 335 days 343 days
TKS Bonn 32 days 0 days 335 days 343 days
HU Explorers 28 days 0 days 319 days 343 days
321heiss 24 days 0 days 43 days 254 days
SUGUCS 9 days 0 days 53 days 98 days
APPI 3 days 0 days 7 days 41 days
MSKE 3 days 0 days 7 days 62 days
RSS 3 days 0 days 28 days 343 days
CS:iDrop 2 days 0 days 36 days 57 days
UrbanGarden 2 days 0 days 16 days 34 days
Balthasar-Neumann-Schule 1 0 days 0 days 0 days 343 days
Bestäuberprojekt 0 days 0 days 0 days 343 days
Che Aria Tira? 0 days 0 days 0 days 343 days
clevermint 0 days 0 days 0 days 343 days
Data4City 0 days 0 days 0 days 343 days
Haus C 0 days 0 days 0 days 343 days
Haus D 0 days 0 days 0 days 343 days
HBG Bonn 0 days 0 days 0 days 343 days
IntegrA 0 days 0 days 0 days 343 days
Koerber-Stiftung 0 days 0 days 0 days 343 days
Natlab Ökologie 0 days 0 days 0 days 343 days
Raumanmeri 0 days 0 days 0 days 343 days
SekSeeland 0 days 0 days 0 days 343 days

The time of activity averages only 90 days, though there are boxes with 648 days of activity, spanning a large chunk of openSenseMap’s existence.

…by year of registration

This is less useful, as older boxes are active for a longer time by definition. If you have an idea how to compensate for that, please send a Pull Request!

# NOTE: boxes older than 2016 missing due to missing updatedAt in database
duration = boxes %>%
  mutate(year = cut(as.Date(createdAt), breaks = 'year')) %>%
  group_by(year) %>%
  filter(!is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))

ggplot(duration, aes(x = substr(as.character(year), 1, 4), y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days') + xlab('Year of Registration')
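
One idea for compensating the age effect, offered only as a sketch and not as the vignette's method, is to relate the active duration of a box to its age, so that each box gets a share between 0 and 1. The column names age and active_share are hypothetical, and a fixed reference time replaces now() to keep the example reproducible.

```r
library(dplyr)

# fixed point in time standing in for now()
reference = as.POSIXct('2023-03-01', tz = 'UTC')

# two made-up boxes: both were active for 365 days, but one is twice as old
toy = data.frame(
  createdAt = as.POSIXct(c('2021-03-01', '2022-03-01'), tz = 'UTC'),
  updatedAt = as.POSIXct(c('2022-03-01', '2023-03-01'), tz = 'UTC')
)

toy = toy %>%
  mutate(
    duration     = as.numeric(difftime(updatedAt, createdAt, units = 'days')),
    age          = as.numeric(difftime(reference, createdAt, units = 'days')),
    active_share = duration / age  # fraction of its lifetime a box was active
  )

print(toy)
```

Here the older box was active for half of its lifetime and the younger one for all of it, although both have the same absolute duration. Plotting active_share by year of registration might then be more comparable across years, under the assumption that updatedAt is a fair proxy for the end of activity.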

More Visualisations

Other visualisations come to mind, and are left as an exercise to the reader. If you implemented some, feel free to add them to this vignette via a Pull Request.