Making World Tile Grid-Grids

Share Tweet

A colleague asked if I would blog about how I crafted the grid of world tile grids in this post and I accepted the challenge. The technique isn’t too hard as it just builds on the initial work by Jon Schwabish and a handy file made by Maarten Lambrechts.

The Premise

For this particular use-case, I sifted through our internet scan data and classified a series of device families from their telnet banners then paired that with our country-level attribution data for each IPv4 address. I’m not generally “a fan” of rolling things up at a country level, but since many (most) of these devices are residential or small/medium-business routers, country-level attribution has some merit.

But, I’m also not a fan of country-level choropleths when it comes to “cyber” nor am I wont to area-skewed cartograms since most folks still cannot interpret them. Both of those take up a ton of screen real estate, too, espeically if you have more than one of them. Yet, I wanted to show a map-like structure without resorting to Hilbert IPv4 heatmaps since they are neither very readable by a general audience and become skewed when you have to move up from a 1 pixel == 1 Class C network block.

I think the tile grid is a great compromise since it avoids the “area”and projection skewness confusion that regular global choropleths cause while still preserving geographic & positional proximity. Sure, they’ll take some getting used to by casual readers, but I felt it was the best of all the tradeoffs.

The Setup

Here’s the data:


library(here)
library(hrbrthemes)
library(tidyverse)

wtg <- read_csv("https://gist.githubusercontent.com/maartenzam/787498bbc07ae06b637447dbd430ea0a/raw/9a9dafafb44d8990f85243a9c7ca349acd3a0d07/worldtilegrid.csv")

glimpse(wtg)
 
## Observations: 192
## Variables: 11
## $ name            
  
  
    "Afghanistan", "Albania", "Algeria", "Angola",... ## $ alpha.2 
   
     "AF", "AL", "DZ", "AO", "AQ", "AG", "AR", "AM"... ## $ alpha.3 
    
      "AFG", "ALB", "DZA", "AGO", "ATA", "ATG", "ARG... ## $ country.code 
     
       "004", "008", "012", "024", "010", "028", "032... ## $ iso_3166.2 
      
        "ISO 3166-2:AF", "ISO 3166-2:AL", "ISO 3166-2:... ## $ region 
       
         "Asia", "Europe", "Africa", "Africa", "Antarct... ## $ sub.region 
        
          "Southern Asia", "Southern Europe", "Northern ... ## $ region.code 
         
           "142", "150", "002", "002", NA, "019", "019", ... ## $ sub.region.code 
          
            "034", "039", "015", "017", NA, "029", "005", ... ## $ x 
           
             22, 15, 13, 13, 15, 7, 6, 20, 24, 15, 21, 4, 2... ## $ y 
            
              8, 9, 11, 17, 23, 4, 14, 6, 19, 6, 7, 2, 9, 8,... routers <- read_csv(here::here("data", "routers.csv")) routers   ## # A tibble: 453,027 x 3 ## type country_name country_code ## 
              
               
               
                 ## 1 mikrotik Slovak Republic SK ## 2 mikrotik Czechia CZ ## 3 mikrotik Colombia CO ## 4 mikrotik Bosnia and Herzegovina BA ## 5 mikrotik Czechia CZ ## 6 mikrotik Brazil BR ## 7 mikrotik Vietnam VN ## 8 mikrotik Brazil BR ## 9 mikrotik India IN ## 10 mikrotik Brazil BR ## # ... with 453,017 more rows distinct(routers, type) %>% arrange(type) %>% print(n=11)   ## # A tibble: 11 x 1 ## type ## 
                
                  ## 1 asus ## 2 dlink ## 3 huawei ## 4 linksys ## 5 mikrotik ## 6 netgear ## 7 qnap ## 8 tplink ## 9 ubiquiti ## 10 upvel ## 11 zte 
                 
                
               
              
             
            
           
          
         
        
       
      
     
    
  

So, we have 11 different device families under assault by “VPNFilter”
and I wanted to show the global distribution of them. Knowing the
compact world tile grid would facet well, I set off to make it happen.

Let’s get some decent names for facet labels:


real_names <- read_csv(here::here("data", "real_names.csv"))

real_names
 
## # A tibble: 11 x 2
##    type     lab             
##    
  
   
   
     ## 1 asus Asus Device ## 2 dlink D-Link Devices ## 3 huawei Huawei Devices ## 4 linksys Linksys Devices ## 5 mikrotik Mikrotik Devices ## 6 netgear Netgear Devices ## 7 qnap QNAP Devices ## 8 tplink TP-Link Devices ## 9 ubiquiti Ubiquiti Devices ## 10 upvel Upvel Devices ## 11 zte ZTE Devices 
    
  

Next, we need to summarise our scan results and pair it up the world
tile grid data and our real names:


count(routers, country_code, type) %>%  # summarise the data into # of device familes per country
  left_join(wtg, by = c("country_code" = "alpha.2")) %>% # join them up on the common field
  filter(!is.na(alpha.3)) %>% # we only want countries on the grid and maxmind attributes some things to meta-regions and anonymous proxies
  left_join(real_names) -> wtg_routers

glimpse(wtg_routers)

## Observations: 629
## Variables: 14
## $ country_code    
  
  
    "AE", "AE", "AE", "AF", "AF", "AF", "AG", "AL"... ## $ type 
   
     "asus", "huawei", "mikrotik", "huawei", "mikro... ## $ n 
    
      1, 12, 70, 12, 264, 27, 1, 941, 2081, 7, 2, 1,... ## $ name 
     
       "United Arab Emirates", "United Arab Emirates"... ## $ alpha.3 
      
        "ARE", "ARE", "ARE", "AFG", "AFG", "AFG", "ATG... ## $ country.code 
       
         "784", "784", "784", "004", "004", "004", "028... ## $ iso_3166.2 
        
          "ISO 3166-2:AE", "ISO 3166-2:AE", "ISO 3166-2:... ## $ region 
         
           "Asia", "Asia", "Asia", "Asia", "Asia", "Asia"... ## $ sub.region 
          
            "Western Asia", "Western Asia", "Western Asia"... ## $ region.code 
           
             "142", "142", "142", "142", "142", "142", "019... ## $ sub.region.code 
            
              "145", "145", "145", "034", "034", "034", "029... ## $ x 
             
               20, 20, 20, 22, 22, 22, 7, 15, 15, 15, 20, 20,... ## $ y 
              
                10, 10, 10, 8, 8, 8, 4, 9, 9, 9, 6, 6, 6, 6, 1... ## $ lab 
               
                 "Asus Device", "Huawei Devices", "Mikrotik Dev... 
                
               
              
             
            
           
          
         
        
       
      
     
    
  

Then, plot it:


ggplot(wtg_routers, aes(x, y, fill=n, group=lab)) +
  geom_tile(color="#b2b2b2", size=0.125) +
  scale_y_reverse() +
  viridis::scale_fill_viridis(name="# Devices", trans="log10", na.value="white", label=scales::comma) +
  facet_wrap(~lab, ncol=3) +
  coord_equal() +
  labs(
    x=NULL, y=NULL,
    title = "World Tile Grid Per-country Concentration of\nSeriously Poorly Configured Network Devices",
    subtitle = "Device discovery based on in-scope 'VPNFilter' vendor device banner strings",
    caption = "Source: Rapid7 Project Sonar & Censys"
  ) +
  theme_ipsum_rc(grid="") +
  theme(panel.background = element_rect(fill="#969696", color="#969696")) +
  theme(axis.text=element_blank()) +
  theme(legend.direction="horizontal") +
  theme(legend.key.width = unit(2, "lines")) +
  theme(legend.position=c(0.85, 0.1))

 

Doh! We forgot to ensure we had data for every country. Let’s try that
again:


count(routers, country_code, type) %>%
  complete(country_code, type) %>%
  filter(!is.na(country_code)) %>%
  left_join(wtg, c("country_code" = "alpha.2")) %>%
  filter(!is.na(alpha.3)) %>%
  left_join(real_names) %>%
  complete(country_code, type, x=unique(wtg$x), y=unique(wtg$y)) %>%
  filter(!is.na(lab)) %>%
  ggplot(aes(x, y, fill=n, group=lab)) +
  geom_tile(color="#b2b2b2", size=0.125) +
  scale_y_reverse() +
  viridis::scale_fill_viridis(name="# Devices", trans="log10", na.value="white", label=scales::comma) +
  facet_wrap(~lab, ncol=3) +
  coord_equal() +
  labs(
    x=NULL, y=NULL,
    title = "World Tile Grid Per-country Concentration of\nSeriously Poorly Configured Network Devices",
    subtitle = "Device discovery based on in-scope 'VPNFilter' vendor device banner strings",
    caption = "Source: Rapid7 Project Sonar & Censys"
  ) +
  theme_ipsum_rc(grid="") +
  theme(panel.background = element_rect(fill="#969696", color="#969696")) +
  theme(axis.text=element_blank()) +
  theme(legend.direction="horizontal") +
  theme(legend.key.width = unit(2, "lines")) +
  theme(legend.position=c(0.85, 0.1))

 

That’s better.

We take advantage of ggplot2’s ability to facet and just ensure we have
complete (even if NA) tiles for each panel.

Once consumers start seeing these used more they’ll be able to pick up
key markers (or one of us will come up with a notation that makes key
markers more visible) and be able to get specific information from the
chart. I just wanted to show regional and global differences between
vendors (and really give MikroTik users a swift kick in the patootie for
being so bad with their kit).

FIN

You can find the RStudio project (code + data) here: (http://rud.is/dl/tile-grid-grid.zip)

Share Tweet



Related articles


0 Comments