Bubble Plots to Visualize Singapore Private Property

7 minute read

Introduction

Having seen choropleths and bubble plots, we now examine how to display bubble plots on maps. In particular, we would like to display the location of private properties in Singapore. The Markers vignette for R leaflet gives a quick introduction on how to display markers on maps.

Data

Our dataset, Private2017.csv contains summarized property transaction data. To correctly display property in such, we will need accurate geolocation data. Fortunately OneMap provides excellent geolocation data, which can accessed via their API.

We use the postcode from our data to systematically retrieve the longitude and latitude of the various private properties that were sold in 2017. As we have the individual postcodes, we are able to distinguish between blocks within a project. OneMap limits us to 250 API calls per minute.

import time
import requests
import simplejson as json
import pandas as pd

df_postcode=pd.read_csv("D:/private2017.csv",dtype=str)

for i in range(len(df_postcode)):
    if len(str(df_postcode['Postal.Code'][i]))==5:
        df_postcode['Postal.Code'][i]='0'+df_postcode['Postal.Code'][i]
df_postcode['OnemapLongitude'] = pd.Series(0, index=df_postcode.index)
df_postcode['OnemapLatitude'] = pd.Series(0, index=df_postcode.index)       

start=time.time()
for i in range(len(df_postcode)):
    req=requests.get('https://developers.onemap.sg/commonapi/search?searchVal='+df_postcode['Postal.Code'][i]+'&returnGeom=Y&getAddrDetails=Y&pageNum=1')
    jdata = json.loads(req.text)
    if jdata['found']>=1:
        df_postcode['OnemapLongitude'][i]=jdata['results'][0]['LONGITUDE']
        df_postcode['OnemapLatitude'][i]=jdata['results'][0]['LATITUDE']
    else:
        df_postcode['OnemapLongitude'][i]=float('nan')
        df_postcode['OnemapLatitude'][i]=float('nan')
df_postcode.to_csv("D:/private2017-longlat.csv")
print('Time Taken:',time.time()-start)

Similarly, we starting from a list of MRT station names MRT.csv in our data folder, we can find the position of the MRT stations in Singapore. Note that OneMap can return the location of the MRT exits or just a central location. For our application we would like to determine the distance to the nearest MRT exit. As such we will request the location of all exits

mrt="D:/MRT.csv"
df_mrt_names=pd.read_csv(mrt,header=None)

df_mrt=pd.DataFrame(columns=['MRT','long','lat'])

for i in range(len(df_mrt_names)):
    time.sleep(0.3)
    req=requests.get('https://developers.onemap.sg/commonapi/search?searchVal='+df_mrt_names[0][i]+'&returnGeom=Y&getAddrDetails=Y&pageNum=1')
    jdata = json.loads(req.text)
    if jdata['found']>=1:
        for j in range(min(jdata['found'],10)):
            if 'MRT STATION EXIT' in jdata['results'][j]['ADDRESS']:
                df_new=pd.DataFrame(np.random.randint(low=0, high=10, size=(1, 3)),columns=['MRT','long','lat'])
                df_new['MRT'][0]=jdata['results'][j]['ADDRESS']
                df_new['long'][0]=jdata['results'][j]['LONGITUDE']
                df_new['lat'][0]=jdata['results'][j]['LATITUDE']
                df_mrt=df_mrt.append(df_new)

Distance to nearest MRT

The distance between 2 points can be accurately calculated using Vincenty’s formula. It is possible to implement our own calculation using the math library in python. Another option is to install the vincenty library and call the vincenty function for our calculations (the function returns the distance in meters). In R, the geosphere package has the vincenty formulae implemented as well and returns a distance matrix. However the downside is that it is single threaded and consumes a large amount of memory. To calculate the nearest MRT alone we have around 6400 properties and 500 MRT station exits, requiring in 3.2M calculations.

from vincenty import vincenty
import pandas as pd
from multiprocessing import Pool
import time
from itertools import product

def postcode_vincenty(mrt,prop):
    dist=vincenty(mrt[1:],prop[1:])*1000
    return((prop[0],)+(mrt[0],)+(dist,))    

if __name__ == '__main__':
    ## Read Data
    mrt=pd.read_csv('MRT_lat_long.csv')
    prop=pd.read_csv('private2017-longlat.csv')
    ## Convert to List of tuples
    mrt=[(mrt['MRT'][i],mrt['lat'][i],mrt['long'][i]) for i in range(len(mrt))]
    prop=[(prop['Postal.Code'][i],prop['OnemapLatitude'][i],prop['OnemapLongitude'][i]) for i in range(len(prop))]

    p = Pool(4)  
    start=time.time()
    results=p.starmap(postcode_vincenty,product(mrt,prop))
    p.terminate()
    p.join()
    data=pd.DataFrame(results,columns=['Postcode','MRT Exit','Distance(m)'])
    data=data.sort_values("Distance(m)").groupby("Postcode", as_index=False).first()
    print('Time Taken:',time.time()-start)
    data.to_csv('vincenty.csv')

We can also calculate the nearest private properties with slight modification to the codes, and this will require almost 41M calculations.

Leaflet and Bubble Plots

Previously we used the addPolygons function in leaflet to produce choropleths. To add bubbles in specific locations, we use the addCircles function. The parameters lng and lat refer to the longitude and latitude. The radius determines the size of the bubble, which we set to be proportional to the square root of the average price. We can also display a pop up when the user’s mouse hovers over the bubble. In our case we display the project name, average price, nearest MRT and distance to MRT. A bit of html is required to separate the attributes as \n does not work here. We use the htmltools package to convert to HTML. Refer to this post on stackoverflow for more information.

library(tidyverse)
library(leaflet)
library(htmltools)

# Read In data
property<-read.csv("D:/private2017-longlat.csv")
mrt<-read.csv("C:/Users/david/Desktop/vincenty.csv")


# Join MRT data and property data
property<-merge(property,mrt,by.x="Postal.Code", by.y = "Postcode")


# Define color palette
pal <- colorFactor(c('#4AC6B7', '#1972A4', '#965F8A',
 '#FF7070', '#C61951',"gold"), domain = c("Detached House",
 "Apartment","Condominium","Executive Condominium",
 "Semi-Detached House","Terrace House"))


# Create html for new lines in label
labs <- lapply(seq(nrow(property)), function(i) {
  paste0( '<b>Project Name: </b>', property[i, "Project.Name"], '</br>',
          '<b>Average Price($): </b>',property[i, "Average_Price"],'</br>',
          '<b>Nearest MRT: </b>',property[i, "MRT.Exit"],'</br>',
          '<b>Distance to MRT (m): </b>',property[i, "Distance.m."],'</br>')
})




m <- leaflet(property) %>%
  setView(lng = 103.8198, lat = 1.3521, zoom = 12) %>%
  setMaxBounds( lng1 = 103.600250,
                lat1 = 1.202674,
                lng2 = 104.027344,
                lat2 = 1.484121 )%>%
  addTiles(options=tileOptions(opacity=0.6)) %>%  
  addCircles(lng=~OnemapLongitude, lat=~OnemapLatitude,
           radius=~sqrt(Average_Price/500),
           color=~pal(Property.Type),
           weight=2,
           label=lapply(labs, HTML),
           stroke=TRUE,opacity=0.8,fillOpacity=0.5,
           #clusterOptions=markerClusterOptions(),
           layerId = ~Project.Name) %>%
  addLegend("topright", pal = pal,
          values = ~Property.Type,
          title = "Property Type",
          opacity = 1)

m