Analyzing the DC Housing Market - August, 2017

2017-09-20

Real Estate > Data

• 7 Minute Read

Choropleth Closing Prices by ZIP Code

The Data

One of my favorite DC blogs is POPville - written by the Prince of Petworth himself, Dan Silverman. While skimming through posts earlier this week, I noticed an entry containing data on August, 2017 DC home and condo sales compiled by Kevin Wood, a local Realtor. True to form, I couldn’t wait to dig into it and see what I could learn about the city’s real estate market!

Preparing the Data

I downloaded the Excel version of the data set from the POPville post and imported the data into RStudio for my analysis. To get things going, I created a dataframe with the sales data, and also created a new column to calculate the difference between closing prices and list prices.

library("tidyverse")
library(readxl)
library(scales)

fill <- "dodgerblue4"
line <- "grey22"

sales <- read_excel("YOUR_PATH/August2017Sales.xls", col_types = c("text", "text", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric", "date"))
sales <- mutate(sales, sale_diff = close_price - list_price)
View(sales)

Then, I grouped the information in ZIP code buckets, averaging the core data points across all the sales in each ZIP. With this done, I could start to dig into the numbers for each ZIP code in the city!

g_zip <- sales %>% group_by(zip) %>% summarise(count = n(), avg_list_price = mean(list_price), avg_close_price = mean(close_price), avg_subsidy = mean(subsidy[subsidy>0]), avg_bd = mean(bd), avg_bath = mean(bath), avg_half_bath = mean(half_bath), max_close_price = max(close_price), min_close_price = min(close_price), avg_sale_diff = mean(sale_diff))

Findings

Sales Data by ZIP Code

Average Sale Price by ZIP Code

plot_zip_prices <- ggplot(data = na.omit(g_zip), aes(x = reorder(zip, -avg_close_price), y = avg_close_price, fill = avg_close_price)) + geom_col() + scale_y_continuous(name = "Average Close Price", labels = comma) + ggtitle("Plot of Average Close Prices Across DC Zip Codes") + labs(x = "ZIP Code") + scale_fill_gradient(low = "light grey", high = fill) + theme_bw() + theme(legend.position="none", axis.text.x = element_text(angle = 90, hjust = 1))
plot_zip_prices

Average Sales Difference by ZIP Code

This data set is interesting to me because it begins to highlight which housing markets within DC are “heating up”. Seeing a positive difference between closing price and list price implies increased competition for homes in these ZIPs, and is where rapid-fire bidding wars are most likely to take place.

We’ll check this data out in greater depth in a few minutes, but it’s interesting to see Brookland, Congress Heights/Anacostia, and Logan Circle as the three “hottest” ZIPs with Georgetown, Kalorama/Woodley Park/Cleveland, and West End as the three “coldest” ZIPs.

plot_zip_sales_diff <- ggplot(data = na.omit(g_zip), aes(x = reorder(zip, -avg_sale_diff), y = avg_sale_diff, fill = avg_sale_diff)) + geom_col() + scale_y_continuous(name = "Difference in Close and List Prices ($)", labels = comma) + ggtitle("Plot of Difference in List and Sale Prices") + labs(x = "ZIP Code", subtitle = "Positive values denote ZIP codes where the average closing price exceeds the list price") + scale_fill_gradient() + theme_bw() + theme(legend.position="none", axis.text.x = element_text(angle = 90, hjust = 1))
plot_zip_sales_diff

When are Sales Happening?

Sales During August, 2017

Gross Sales Volume During August, 2017

One cool trend from both of these charts is the spike in sales occuring each Monday (the three highest mid-month points) - clearly folks are investigating homes on the weekend and looking to get their weeks off to a good start.

g_date <- sales %>% group_by(close_date) %>% summarize(count = n(), gross = sum(close_price))

plot_num_sales <- ggplot(na.omit(g_date), aes(x = close_date, y = count)) + geom_line() + scale_y_continuous(name = "Number of Homes Sold", labels = comma) + ggtitle("House Sales During August 2017") + labs(x = "Date") + theme_bw() 
plot_num_sales

plot_gross_sales <- ggplot(na.omit(g_date), aes(x = close_date, y = gross)) + geom_area(alpha = 0.7, color = line, fill = fill) + scale_y_continuous(name = "Gross Sales Volume ($)", labels = comma) + ggtitle("Gross Sales Volume During August 2017") + labs(x = "Date") + theme_bw() 
plot_gross_sales

Mapping ZIP Code Sales Data

When I started this exercise, the thing I wanted to do most was visualize different types of sales data on top of a map of DC broken up by ZIP code. After wasting an embarrassing amount of time with fruitless Googling and Stack Overflow-ing, I finally found what I needed - the choropleth map. A crazy name for a relatively intuitive map, the choropleth breaks down the frequency or proportion of a measurement over blocks of space in a map.

Armed with my new favorite word, I set out in search of how to make my own choropleths in R. Eventually I stumbled on the choroplethrZip library from arilamstein and was able to get his code going on my machine.

install.packages("devtools")
library(devtools)
install_github('arilamstein/choroplethrZip@v1.3.0')
library(choroplethrZip)

With the choropleth library set up, all I needed to do was pass a dataframe with two columns, named region and value, into the zip_choropleth function to start making maps.

Choropleth Closing Prices by ZIP Code

No surprises here - sales in Northwest dominate this chart, with many of the most expensive occurring in Georgetown or Kalorama.

Choropleth Sales by ZIP Code

This map shows where the most transactions are taking place across the city: H Street/NOMA/Eckington takes the lead with 111 purchases in August, followed by the 14th Street Corridor/Adams Morgan with 94, and then Petworth/Brightwood/16th Street Heights with 87.

Choropleth Sales Difference by ZIP Code

As alluded to earlier, this map shows the difference in list vs. closing price across DC ZIPs - another good way to see which neighborhoods are receiving above-market interest. While they may dominate the price charts, ZIPs throughout Northwest fare poorly by this metric, frequently selling for much less than asking. Interest in properties along the periphery of DC proper could also mean that speculation is moving further from the urban core, as more buyers are priced out of expensive neighborhoods and bidding up prices for less-expensive areas.

Choropleth Subsidies by ZIP Code

Interestingly, the ZIP code (20032) receiving the highest average purchase subsidy, at just under $15,000 per transaction, was #2 in the previous section, where closing prices exceeded list prices by almost $7,000. I’m not totally sure the significance of that connection, but an easy inference to make is that government subsidies are bolstering the finances of home buyers in Southeast DC, artificially raising sale prices. I’ll be curious to see if this is a one-time coincidence, or turns out to be a trend moving forward.

zip_price <- select(g_zip, region = zip, value = avg_close_price)
choro_price <- zip_choropleth(zip_price, state_zoom = "district of columbia", title = "August 2017 DC Home Closings by ZIP Code", legend = "Close Price") + coord_map() 
choro_price

zip_sales <- select(g_zip, region = zip, value = count)
choro_sales <- zip_choropleth(zip_sales, state_zoom = "district of columbia", title = "August 2017 DC Sales Count by ZIP Code", legend = "Number of Sales") + coord_map() 
choro_sales

zip_subsidy <- select(g_zip, region = zip, value = avg_subsidy)
choro_subsidy <- zip_choropleth(zip_subsidy, state_zoom = "district of columbia", title = "August 2017 DC Home Purchase Subsidies by ZIP Code", legend = "Size of Subsidies") + coord_map() 
choro_subsidy

zip_sale_diff <- select(g_zip, region = zip, value = avg_sale_diff)
choro_sale_diff <- zip_choropleth(zip_sale_diff, state_zoom = "district of columbia", title = "August 2017 DC Closing Sale Differences by ZIP Code", legend = "Difference in Close and List Prices") + coord_map() 
choro_sale_diff

Sales of the Month (5 Most Expensive Closings)

Rank	Address	ZIP Code	List Price	Close Price	Bedrooms	Bathrooms	Half Bathrooms	Closing Date
1	3241 R ST NW	20007	4500000	4000000	5	3	2	8/15/2017
2	1339 29TH ST NW	20007	4299000	3975000	7	4	2	8/21/2017
3	2922 GLOVER DR NW	20016	3695000	3500000	6	5	1	8/1/2017
4	2121 S ST NW	20008	3400000	2900000	5	4	2	8/17/2017
5	4054 52ND TER NW	20016	2695000	2700000	5	4	2	8/1/2017

head(arrange(sales,desc(close_price)), n = 5)

Steals of the Month (5 Least Expensive Closings)

Rank	Address	ZIP Code	List Price	Bedrooms	Bathrooms	Closing Date
1	3872 9TH ST SE #102	20032	65000	1	1	8/30/2017
2	3872 9TH ST SE #302	20032	64900	1	1	8/28/2017
3	5 BRANDYWINE ST SE #B-33	20032	40000	2	1	8/2/2017
4	5 BRANDYWINE ST SE #B-35	20032	40000	2	1	8/8/2017
5	3072 30TH ST SE #302	20020	26900	2	1	8/29/2017

tail(arrange(na.omit(sales),desc(close_price)), n = 5)

Wrap-Up

All in all, exploring this data set was a great way for me to keep building my R scales while also exploring some of the macro trends in the DC housing market.

Since POPville publishes the home sales dataset monthly, I hope to be back next month with an update!