This vignette serves as an example of data wrangling and visualization with
opensensmapr, dplyr and ggplot2.
# required packages:
library(opensensmapr) # data download
library(dplyr)        # data wrangling
library(ggplot2)      # plotting
library(lubridate)    # date arithmetic
library(zoo)          # rollmean()

openSenseMap.org has grown quite a bit in recent years; it would be interesting to see how we got to the current 11448 sensor stations, split up by various attributes of the boxes.
While opensensmapr provides extensive methods of
filtering boxes by attributes on the server, we do the filtering within
R to save time and gain flexibility. So the first step is to retrieve
all the boxes:
# if you want to see results for a specific subset of boxes,
# just specify a filter such as grouptag='ifgi' here
# boxes = osem_boxes(cache = '.')
boxes = readRDS('boxes_precomputed.rds')  # read precomputed file to save resources

By looking at the createdAt attribute of each box we
know the exact time a box was registered. With this approach we have no
information about boxes that were deleted in the meantime, but that’s
okay for now.
exposure_counts = boxes %>%
  group_by(exposure) %>%
  mutate(count = row_number(createdAt))
exposure_colors = c(indoor = 'red', outdoor = 'lightgreen', mobile = 'blue', unknown = 'darkgrey')
ggplot(exposure_counts, aes(x = createdAt, y = count, colour = exposure)) +
  geom_line() +
  scale_colour_manual(values = exposure_colors) +
  xlab('Registration Date') + ylab('senseBox count')

Outdoor boxes are growing fast! We can also see the
introduction of mobile sensor “stations” in 2017. While
mobile boxes are still few, we can expect a quick rise in 2018 once the
new senseBox MCU with GPS support is released.
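
The cumulative counts in the plot come from row_number(createdAt), which ranks each box within its exposure group by registration time, so the n-th registered box of a group gets count n. The following is a minimal sketch of the same idea in base R, using a small made-up data frame (the dates and the name `toy` are illustrative, not openSenseMap data):

```r
# toy data: made-up registration dates, not real openSenseMap boxes
toy = data.frame(
  exposure  = c('outdoor', 'indoor', 'outdoor', 'outdoor', 'indoor'),
  createdAt = as.POSIXct(c('2017-03-01', '2017-01-15', '2017-01-10',
                           '2017-02-20', '2017-04-02'), tz = 'UTC')
)

# rank boxes by registration time within each exposure group;
# this mirrors dplyr's row_number(createdAt) on grouped data
toy$count = ave(as.numeric(toy$createdAt), toy$exposure,
                FUN = function(x) rank(x, ties.method = 'first'))

# show the boxes in registration order per group
toy[order(toy$exposure, toy$createdAt), ]
```

Plotting count against createdAt for each group then gives a line that steps up by one with every registration.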
Let’s have a quick summary:
exposure_counts %>%
  summarise(
    oldest = min(createdAt),
    newest = max(createdAt),
    count = max(count)
  ) %>%
  arrange(desc(count))

| exposure | oldest | newest | count | 
|---|---|---|---|
| outdoor | 2016-08-09 19:34:42 | 2023-02-28 09:47:17 | 8417 | 
| indoor | 2018-05-10 20:14:44 | 2023-02-27 09:53:33 | 2364 | 
| mobile | 2020-10-24 14:39:30 | 2023-02-20 16:32:48 | 590 | 
| unknown | 2022-03-01 07:04:31 | 2022-03-30 11:25:43 | 19 | 

We can try to find out where the increases in growth came from by analysing the box count by grouptag.
Caveats: Only a small subset of boxes has a grouptag, and we should
assume that these groups are actually bigger. Also, we can see that
grouptag naming is inconsistent (Luftdaten,
luftdaten.info, …)
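
If the inconsistent spelling gets in the way, one option is to normalise the tags before grouping. The sketch below is only an assumption about what a simple normalisation might look like (lower-casing and trimming whitespace). The helper `normalize_grouptag` is hypothetical and not part of opensensmapr, and it would not merge variants such as luftdaten.info with luftdaten:

```r
# hypothetical helper, not part of opensensmapr:
# lower-case the tag and strip leading/trailing whitespace
normalize_grouptag = function(tag) {
  tolower(trimws(tag))
}

# made-up tags showing the kind of variation seen in the data
tags = c('Luftdaten', 'luftdaten', ' Luftdaten.info', 'luftdaten.info')
normalized = normalize_grouptag(tags)

# count boxes per normalised tag
table(normalized)
```

The analysis below keeps the tags exactly as they are stored, so differently spelled variants show up as separate groups.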
grouptag_counts = boxes %>%
  group_by(grouptag) %>%
  # only include grouptags with 8 or more members
  filter(length(grouptag) >= 8 & !is.na(grouptag)) %>%
  mutate(count = row_number(createdAt))
# helper for sorting the grouptags by boxcount
sortLvls = function(oldFactor, ascending = TRUE) {
  lvls = table(oldFactor) %>% sort(., decreasing = !ascending) %>% names()
  factor(oldFactor, levels = lvls)
}
grouptag_counts$grouptag = sortLvls(grouptag_counts$grouptag, ascending = FALSE)
ggplot(grouptag_counts, aes(x = createdAt, y = count, colour = grouptag)) +
  geom_line(aes(group = grouptag)) +
  xlab('Registration Date') + ylab('senseBox count')

grouptag_counts %>%
  summarise(
    oldest = min(createdAt),
    newest = max(createdAt),
    count = max(count)
  ) %>%
  arrange(desc(count))

| grouptag | oldest | newest | count | 
|---|---|---|---|
| edu | 2022-03-30 11:25:43 | 2023-02-28 09:47:17 | 431 | 
| Save Dnipro | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 354 | 
| Luftdaten | 2022-03-30 11:25:43 | 2023-01-27 15:22:54 | 244 | 
| CS:iDrop | 2023-01-10 10:22:33 | 2023-02-27 09:53:33 | 140 | 
| HU Explorers | 2022-03-30 11:25:43 | 2022-12-14 10:11:34 | 124 | 
| #stropdeaer | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 101 | 
| 321heiss | 2022-06-27 14:12:25 | 2022-08-08 10:22:21 | 91 | 
| GIZ Clean Air Day Project | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 76 | 
| Captographies | 2021-05-21 15:24:45 | 2023-01-31 12:11:49 | 62 | 
| Futurium | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 39 | 
| Bad_Hersfeld | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 37 | 
| TKS Bonn | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 36 | 
| kerekdomb_ | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 34 | 
| Mikroprojekt Mitmachklima | 2022-03-30 11:25:43 | 2022-08-23 13:14:11 | 34 | 
| Bottrop-Feinstaub | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 33 | 
| Luchtwachters Delft | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 33 | 
| Futurium 2021 | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 32 | 
| Feinstaub | 2022-03-30 11:25:43 | 2022-08-01 16:27:10 | 29 | 
| luftdaten.info | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 28 | 
| ifgi | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 26 | 
| SUGUCS | 2022-11-30 15:25:32 | 2023-01-23 13:17:54 | 25 | 
| cleanairfrome | 2022-03-30 11:25:43 | 2022-05-15 21:13:30 | 24 | 
| freshairbromley | 2022-03-30 11:25:43 | 2023-01-31 10:18:57 | 24 | 
| WAUW!denberg | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 23 | 
| Riga | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 22 | 
| bad_hersfeld | 2022-03-30 11:25:43 | 2022-06-14 09:34:02 | 21 | 
| KJR-M | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 21 | 
| Mikroklima | 2022-03-30 11:25:43 | 2022-09-05 08:38:57 | 21 | 
| Smart City MS | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 20 | 
| SekSeeland | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 19 | 
| Luftdaten.info | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 18 | 
| 1 | 2022-03-30 11:25:43 | 2022-04-25 15:07:39 | 17 | 
| Apeldoorn | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 17 | 
| luftdaten | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 17 | 
| BurgerMeetnet | 2022-03-30 11:25:43 | 2022-05-10 21:22:35 | 16 | 
| Haus C | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 16 | 
| AGIN | 2022-11-28 17:33:12 | 2022-11-28 17:42:18 | 15 | 
| APPI | 2023-01-26 13:38:22 | 2023-01-26 13:40:59 | 15 | 
| BRGL | 2022-11-06 19:23:43 | 2022-11-06 22:08:36 | 15 | 
| BRGW | 2022-11-02 10:28:52 | 2022-11-02 13:32:12 | 15 | 
| Burgermeetnet | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 15 | 
| HTLJ | 2022-11-21 22:04:17 | 2022-11-21 22:05:47 | 15 | 
| MakeLight | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 15 | 
| MSGB | 2022-11-14 09:08:57 | 2022-11-14 10:19:24 | 15 | 
| MSHO | 2022-12-20 09:28:40 | 2022-12-20 10:01:38 | 15 | 
| MSIN | 2022-11-21 17:02:39 | 2022-11-21 23:06:22 | 15 | 
| MSKE | 2023-01-05 15:40:58 | 2023-01-05 15:52:02 | 15 | 
| PMSI | 2023-01-20 14:22:03 | 2023-01-20 14:31:52 | 15 | 
| Haus B | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 14 | 
| UrbanGarden | 2023-02-02 19:27:40 | 2023-02-18 14:50:19 | 14 | 
| Соседи по воздуху | 2022-03-30 11:25:43 | 2023-01-27 09:50:43 | 14 | 
| PIE | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 13 | 
| RB-DSJ | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 13 | 
| co2mofetten | 2022-03-30 11:25:43 | 2023-01-17 07:38:21 | 12 | 
| Sofia | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 12 | 
| Haus D | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 11 | 
| home | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 11 | 
| Netlight | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 11 | 
| #STROPDEAER | 2022-03-30 11:25:43 | 2023-02-16 15:12:50 | 10 | 
| AirAberdeen | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| Balthasar-Neumann-Schule 1 | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| Bestäuberprojekt | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| Che Aria Tira? | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| dwih-sp | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| esri-de | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| HBG Bonn | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| IntegrA | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 10 | 
| makerspace-partheland | 2022-03-30 11:25:43 | 2023-02-20 18:34:50 | 10 | 
| Mikroklima H | 2022-05-07 17:29:00 | 2022-05-07 17:47:42 | 10 | 
| montorioveronese.it | 2022-03-30 11:25:43 | 2022-12-29 07:45:57 | 10 | 
| ATSO | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 | 
| clevermint | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 | 
| Fläming | 2022-08-15 19:16:48 | 2022-12-13 06:29:22 | 9 | 
| Mikroklima C-R | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 | 
| Ostroda | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 | 
| RSS | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 9 | 
| test | 2022-03-30 11:25:43 | 2022-12-18 22:20:34 | 9 | 
| 2 | 2022-03-30 11:25:43 | 2023-01-07 15:44:29 | 8 | 
| Data4City | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 
| DBDS | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 
| IKG | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 
| IVKOWeek | 2022-03-30 11:25:43 | 2022-07-05 09:42:31 | 8 | 
| Koerber-Stiftung | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 
| M7 | 2022-03-30 11:25:43 | 2022-11-28 13:00:44 | 8 | 
| Natlab Ökologie | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 
| PGKN | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 
| Raumanmeri | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 
| stw | 2022-03-30 11:25:43 | 2022-03-30 11:25:43 | 8 | 

First we group the boxes by createdAt into bins of one
week:
bins = 'week'
mvavg_bins = 6
growth = boxes %>%
  mutate(week = cut(as.Date(createdAt), breaks = bins)) %>%
  group_by(week) %>%
  summarize(count = length(week)) %>%
  mutate(event = 'registered')

We can do the same for updatedAt, which informs us about
the last change to a box, including uploaded measurements. This method
of determining inactive boxes is fairly inaccurate and should be
considered an approximation, because we have no information about
intermediate inactive phases. Also deleted boxes would probably have a
big impact here.
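
Both the registration series and the inactivity series rely on cut() to assign each date to a weekly bin. A small sketch with made-up dates shows what these bins look like:

```r
# made-up dates; 2023-01-02 is a Monday
dates = as.Date(c('2023-01-02', '2023-01-03', '2023-01-10'))

# each date is labelled with the first day of its week
# (weeks start on Monday by default)
week = cut(dates, breaks = 'week')

# number of dates per weekly bin
counts = table(week)
counts
```

The result is a factor whose levels are the week start dates, which is why the plotting code further down converts week back with as.Date().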
inactive = boxes %>%
  # remove boxes that were updated in the last two days,
  # b/c any box becomes inactive at some point by definition of updatedAt
  filter(updatedAt < now() - days(2)) %>%
  mutate(week = cut(as.Date(updatedAt), breaks = bins)) %>%
  group_by(week) %>%
  summarize(count = length(week)) %>%
  mutate(event = 'inactive')

Now we can combine both datasets for plotting:
boxes_by_date = bind_rows(growth, inactive) %>% group_by(event)
ggplot(boxes_by_date, aes(x = as.Date(week), colour = event)) +
  xlab('Time') + ylab(paste('rate per', bins)) +
  scale_x_date(date_breaks="years", date_labels="%Y") +
  scale_colour_manual(values = c(registered = 'lightgreen', inactive = 'grey')) +
  geom_point(aes(y = count), size = 0.5) +
  # moving average, make first and last value NA (to ensure identical length of vectors)
  geom_line(aes(y = rollmean(count, mvavg_bins, fill = list(NA, NULL, NA))))

We see a sudden rise in early 2017, which lines up with the fast
growing grouptag Luftdaten. This was enabled by an
integration of openSenseMap.org into the firmware of the air quality
monitoring project luftdaten.info. The dips in mid
2017 and early 2018 could possibly be explained by production/delivery
issues of the senseBox hardware, but I have no data on the exact time
frames to verify.
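
The smoothed lines in the plot are a moving average. As a rough illustration of what such a filter does, here is a sketch in base R using stats::filter() on a made-up series with an odd window of 3 (the vignette itself uses rollmean() from zoo with a window of 6). Positions without a full window become NA, which keeps the result the same length as the input:

```r
# made-up weekly counts
count = c(1, 2, 3, 4, 5)
window = 3

# centred moving average with equal weights;
# the first and last positions have no full window and are NA
smoothed = as.numeric(stats::filter(count, rep(1 / window, window), sides = 2))
smoothed
```

If several series are stacked in one data frame, computing the average separately per series avoids mixing values across the boundary between them.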
While we are looking at createdAt and
updatedAt, we can also extract the duration of activity of
each box, and look at metrics by exposure and grouptag once more:
duration = boxes %>%
  group_by(exposure) %>%
  filter(!is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))
ggplot(duration, aes(x = exposure, y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days')

The time of activity averages at only 158 days, though there are boxes with 2394 days of activity, spanning a large chunk of openSenseMap’s existence.
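
The duration used here is simply the difference between the two timestamps, expressed in days. A minimal sketch with made-up timestamps for a single box:

```r
# made-up timestamps for a single box
createdAt = as.POSIXct('2022-03-30 11:25:43', tz = 'UTC')
updatedAt = as.POSIXct('2022-04-09 23:25:43', tz = 'UTC')

# time between registration and last update, in days
duration = difftime(updatedAt, createdAt, units = 'days')
duration

# drop the units when a plain number is needed
as.numeric(duration)
```

Fixing units = 'days' matters because difftime() otherwise picks a unit based on the size of the difference, which would make durations of different boxes hard to compare.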
duration = boxes %>%
  group_by(grouptag) %>%
  # only include grouptags with 8 or more members
  filter(length(grouptag) >= 8 & !is.na(grouptag) & !is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))
  
ggplot(duration, aes(x = grouptag, y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days')

duration %>%
  summarize(
    duration_avg = round(mean(duration)),
    duration_min = round(min(duration)),
    duration_max = round(max(duration)),
    oldest_box = round(max(difftime(now(), createdAt, units='days')))
  ) %>%
  arrange(desc(duration_avg))

| grouptag | duration_avg | duration_min | duration_max | oldest_box | 
|---|---|---|---|---|
| Ostroda | 335 days | 335 days | 335 days | 343 days | 
| Mikroklima C-R | 332 days | 321 days | 335 days | 343 days | 
| Apeldoorn | 331 days | 263 days | 335 days | 343 days | 
| freshairbromley | 304 days | 28 days | 335 days | 343 days | 
| Mikroklima | 283 days | 42 days | 335 days | 343 days | 
| Mikroklima H | 283 days | 229 days | 297 days | 305 days | 
| Smart City MS | 272 days | 0 days | 335 days | 343 days | 
| Feinstaub | 223 days | 0 days | 335 days | 343 days | 
| makerspace-partheland | 217 days | 0 days | 335 days | 343 days | 
| co2mofetten | 213 days | 0 days | 334 days | 343 days | 
| Luftdaten | 211 days | 0 days | 335 days | 343 days | 
| luftdaten.info | 200 days | 0 days | 335 days | 343 days | 
| Burgermeetnet | 190 days | 0 days | 335 days | 343 days | 
| esri-de | 190 days | 0 days | 335 days | 343 days | 
| #stropdeaer | 187 days | 0 days | 335 days | 343 days | 
| Sofia | 172 days | 0 days | 335 days | 343 days | 
| WAUW!denberg | 168 days | 0 days | 335 days | 343 days | 
| KJR-M | 167 days | 0 days | 335 days | 343 days | 
| IKG | 163 days | 0 days | 335 days | 343 days | 
| AirAberdeen | 155 days | 0 days | 335 days | 343 days | 
| M7 | 155 days | 92 days | 243 days | 343 days | 
| 1 | 148 days | 0 days | 335 days | 343 days | 
| BurgerMeetnet | 141 days | 0 days | 335 days | 343 days | 
| Luftdaten.info | 139 days | 0 days | 335 days | 343 days | 
| Bottrop-Feinstaub | 133 days | 0 days | 335 days | 343 days | 
| cleanairfrome | 130 days | 0 days | 335 days | 343 days | 
| montorioveronese.it | 130 days | 0 days | 335 days | 343 days | 
| stw | 130 days | 0 days | 335 days | 343 days | 
| RB-DSJ | 122 days | 0 days | 335 days | 343 days | 
| Mikroprojekt Mitmachklima | 118 days | 0 days | 335 days | 343 days | 
| BRGL | 113 days | 109 days | 114 days | 122 days | 
| Luchtwachters Delft | 111 days | 0 days | 335 days | 343 days | 
| Fläming | 110 days | 23 days | 180 days | 205 days | 
| BRGW | 109 days | 98 days | 118 days | 126 days | 
| PIE | 103 days | 0 days | 335 days | 343 days | 
| Riga | 103 days | 0 days | 335 days | 343 days | 
| kerekdomb_ | 100 days | 0 days | 335 days | 343 days | 
| luftdaten | 100 days | 0 days | 335 days | 343 days | 
| home | 95 days | 0 days | 335 days | 343 days | 
| Bad_Hersfeld | 94 days | 0 days | 335 days | 343 days | 
| MSGB | 94 days | 50 days | 106 days | 114 days | 
| dwih-sp | 92 days | 0 days | 335 days | 343 days | 
| AGIN | 91 days | 87 days | 92 days | 100 days | 
| HTLJ | 91 days | 58 days | 99 days | 107 days | 
| Соседи по воздуху | 87 days | 0 days | 335 days | 343 days | 
| bad_hersfeld | 84 days | 0 days | 335 days | 343 days | 
| Captographies | 82 days | 0 days | 648 days | 656 days | 
| Save Dnipro | 75 days | 0 days | 335 days | 343 days | 
| PGKN | 68 days | 0 days | 335 days | 343 days | 
| MSHO | 61 days | 36 days | 70 days | 78 days | 
| Netlight | 61 days | 0 days | 335 days | 343 days | 
| #STROPDEAER | 55 days | 0 days | 335 days | 343 days | 
| Futurium | 54 days | 0 days | 335 days | 343 days | 
| test | 54 days | 0 days | 335 days | 343 days | 
| ifgi | 52 days | 0 days | 335 days | 343 days | 
| MSIN | 52 days | 0 days | 79 days | 107 days | 
| 2 | 50 days | 0 days | 331 days | 343 days | 
| ATSO | 48 days | 0 days | 279 days | 343 days | 
| MakeLight | 47 days | 0 days | 335 days | 343 days | 
| Haus B | 44 days | 0 days | 239 days | 343 days | 
| Futurium 2021 | 43 days | 0 days | 329 days | 343 days | 
| DBDS | 42 days | 0 days | 335 days | 343 days | 
| IVKOWeek | 42 days | 0 days | 335 days | 343 days | 
| PMSI | 38 days | 38 days | 38 days | 47 days | 
| edu | 37 days | 0 days | 335 days | 343 days | 
| GIZ Clean Air Day Project | 37 days | 0 days | 335 days | 343 days | 
| TKS Bonn | 32 days | 0 days | 335 days | 343 days | 
| HU Explorers | 28 days | 0 days | 319 days | 343 days | 
| 321heiss | 24 days | 0 days | 43 days | 254 days | 
| SUGUCS | 9 days | 0 days | 53 days | 98 days | 
| APPI | 3 days | 0 days | 7 days | 41 days | 
| MSKE | 3 days | 0 days | 7 days | 62 days | 
| RSS | 3 days | 0 days | 28 days | 343 days | 
| CS:iDrop | 2 days | 0 days | 36 days | 57 days | 
| UrbanGarden | 2 days | 0 days | 16 days | 34 days | 
| Balthasar-Neumann-Schule 1 | 0 days | 0 days | 0 days | 343 days | 
| Bestäuberprojekt | 0 days | 0 days | 0 days | 343 days | 
| Che Aria Tira? | 0 days | 0 days | 0 days | 343 days | 
| clevermint | 0 days | 0 days | 0 days | 343 days | 
| Data4City | 0 days | 0 days | 0 days | 343 days | 
| Haus C | 0 days | 0 days | 0 days | 343 days | 
| Haus D | 0 days | 0 days | 0 days | 343 days | 
| HBG Bonn | 0 days | 0 days | 0 days | 343 days | 
| IntegrA | 0 days | 0 days | 0 days | 343 days | 
| Koerber-Stiftung | 0 days | 0 days | 0 days | 343 days | 
| Natlab Ökologie | 0 days | 0 days | 0 days | 343 days | 
| Raumanmeri | 0 days | 0 days | 0 days | 343 days | 
| SekSeeland | 0 days | 0 days | 0 days | 343 days | 

The time of activity averages at only 90 days, though there are boxes with 648 days of activity, spanning a large chunk of openSenseMap’s existence.
This is less useful, as older boxes are active for a longer time by definition. If you have an idea how to compensate for that, please send a Pull Request!
# NOTE: boxes older than 2016 missing due to missing updatedAt in database
duration = boxes %>%
  mutate(year = cut(as.Date(createdAt), breaks = 'year')) %>%
  group_by(year) %>%
  filter(!is.na(updatedAt)) %>%
  mutate(duration = difftime(updatedAt, createdAt, units='days'))
ggplot(duration, aes(x = substr(as.character(year), 1, 4), y = duration)) +
  geom_boxplot() +
  coord_flip() + ylab('Duration active in Days') + xlab('Year of Registration')

Other visualisations come to mind, and are left as an exercise to the reader. If you implemented some, feel free to add them to this vignette via a Pull Request.
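
As one possible starting point for the age bias mentioned above, the active duration could be expressed as a share of each box's age. This is only a sketch of an idea, using made-up values and hypothetical column names (`age`, `active_share`); whether it is a fair compensation is open to debate:

```r
# made-up boxes:
# duration = days between createdAt and updatedAt
# age      = days between createdAt and now
toy = data.frame(
  duration = c(300, 30, 90),
  age      = c(600, 60, 90)
)

# share of its lifetime during which a box was active (0 to 1)
toy$active_share = toy$duration / toy$age
toy
```

With this measure an old box that was active for 300 of 600 days and a young box that was active for 30 of 60 days are treated alike, which the raw duration does not do.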