Data Science Portfolio

Geo Data Visualization: Hurricanes and Tropical Storms

Introduction

The Hurricanes and Tropical Storms dataset was obtained from GitHub. It is published by the National Hurricane Center (NHC) for tracking and analyzing tropical cyclones, and it records wind information at six-hour intervals together with the geographic location (latitude and longitude) of each cyclone. Here I use it to demonstrate the power of data visualization in Python. For data aggregation and visualization using pyplot, please check out my other page here.

Data Exploration

Dataset Snapshot

  • Below is a snapshot of the hurricane dataset (sourced from NOAA; the file itself was obtained from GitHub).

  • The dataset covers data through 2015.

Columns and Metadata

  • The dataset here combines the Atlantic and Pacific datasets.

  • Below are the headers:

ID uniquely identifies each cyclone, in the format XX012005:

  • It starts with two letters (AL, EP, or CP). As I understand it, AL stands for the Atlantic, while EP and CP stand for the East and Central Pacific.

  • Two digits follow the letters and give the cyclone's number within its year, starting at 01 and counting up.

  • The last four digits are the year.
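The ID layout described above can be split into its three parts with plain string slicing. This is a minimal sketch, not code from the original analysis: the names `parse_cyclone_id`, `CycloneID`, and `BASINS` are made up for illustration, and the basin labels simply restate the reading given above.

```python
from typing import NamedTuple

# Basin codes as described above; the labels are this sketch's own wording.
BASINS = {"AL": "Atlantic", "EP": "East Pacific", "CP": "Central Pacific"}


class CycloneID(NamedTuple):
    basin: str   # two-letter basin code, e.g. "AL"
    number: int  # sequence number of the cyclone within its year
    year: int    # four-digit year


def parse_cyclone_id(raw: str) -> CycloneID:
    """Split an ID such as 'AL012005' into basin, number, and year."""
    raw = raw.strip()
    if len(raw) != 8 or raw[:2] not in BASINS or not raw[2:].isdigit():
        raise ValueError(f"unexpected cyclone ID: {raw!r}")
    return CycloneID(raw[:2], int(raw[2:4]), int(raw[4:]))


cid = parse_cyclone_id("AL122005")
print(cid, BASINS[cid.basin])
```

Rejecting anything that does not match the expected shape is a deliberate choice here: a malformed ID is easier to catch at parse time than after it has been grouped or plotted.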


Name is the name of the hurricane.

Date and Time indicate when the hurricane was observed.

Status:

Lat and Long indicate the location.

The rest of the metrics are all related to how strong the winds were.

Data Manipulation

The dataset needs some manipulation before the information can be visualized easily. I focused on the following:

  1. Extracting the Year from the Date. The Date column is an integer in the format YYYYMMDD, so dividing it by 10000 and dropping the decimals does the job.

df['Year'] = df['Date'] // 10000  # integer division drops the MMDD digits
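To see the year extraction on concrete values, here is a small sketch on a toy DataFrame. The three dates are made-up sample rows, and the `Month` column is an extra added only to show an alternative: parsing the integer into a real datetime with `pd.to_datetime`.

```python
import pandas as pd

# Toy rows standing in for the real dataset; Date is an integer in YYYYMMDD form.
df = pd.DataFrame({"Date": [18510625, 20050828, 20151231]})

# Integer division drops the MMDD digits and keeps the column an integer.
df["Year"] = df["Date"] // 10000

# Alternative: parse a real datetime, which also exposes month and day.
parsed = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")
df["Month"] = parsed.dt.month

print(df)
```

Integer division is the lighter option when only the year is needed; the datetime route is worth the extra step if later charts group by month or day.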

  2. Converting Lat and Long to signed floats.

In this dataset, Lat and Long are given as strings ending in N or S for Lat and E or W for Long. I used NumPy's where function to set up the condition and a regular expression to strip the letter from each string. Finally, if the letter is S (for Lat) or W (for Long), I flip the sign. Below is example code for the latitude:

lat = dfmp['Latitude'].str.replace('[A-Z]', '', regex=True).astype(float)
dfmp['Latitude'] = np.where(dfmp['Latitude'].str.contains('N'), lat, -lat)
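The same idea extends to both columns with one small helper. This is a sketch under assumptions: the helper name `to_signed_degrees` is hypothetical, the column is assumed to be called `Longitude` to match `Latitude`, and every value is assumed to end in exactly one hemisphere letter.

```python
import pandas as pd


def to_signed_degrees(col: pd.Series, negative_letter: str) -> pd.Series:
    """Convert strings such as '28.0N' or '94.8W' to signed floats."""
    text = col.str.strip()
    # Numeric part: everything except the trailing hemisphere letter.
    value = text.str[:-1].astype(float)
    # Flip the sign for the hemisphere named by negative_letter.
    sign = text.str[-1].map(lambda h: -1.0 if h == negative_letter else 1.0)
    return value * sign


# Made-up sample coordinates in the string format described above.
dfmp = pd.DataFrame({"Latitude": ["28.0N", "12.5S"], "Longitude": ["94.8W", "140.0E"]})
dfmp["Latitude"] = to_signed_degrees(dfmp["Latitude"], "S")
dfmp["Longitude"] = to_signed_degrees(dfmp["Longitude"], "W")
print(dfmp)
```

Passing the letter that should turn negative ("S" or "W") keeps a single function usable for both columns instead of repeating the expression twice.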

  3. Data aggregation and subsetting.

The dataset goes back to 1851. For visualization, however, I want to focus on a single year of interest, and perhaps only on the data points where the maximum wind was over, say, 90 mph. To make my life easier, I wrote a Python function that handles the aggregation and subsetting and takes my criteria as arguments. I named it data_filter(). Then I used it to subset the aggregated data to the year 2005 and to records with a maximum wind of 90 or higher.

data_filter(data, year = 2005, min_wind = 90)
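The body of data_filter() is not shown on this page, so the following is only a hypothetical sketch of what the subsetting part could look like. The column names `Year` and `Maximum Wind`, the `wind_col` parameter, and the toy rows are all assumptions, and the aggregation step is left out.

```python
import pandas as pd


def data_filter(data, year=None, min_wind=None, wind_col="Maximum Wind"):
    """Return the rows matching the given year and minimum wind speed.

    Hypothetical sketch: the 'Year' column and the wind_col default are assumed names.
    """
    mask = pd.Series(True, index=data.index)
    if year is not None:
        mask &= data["Year"] == year
    if min_wind is not None:
        mask &= data[wind_col] >= min_wind
    return data[mask].reset_index(drop=True)


data = pd.DataFrame({
    "Name": ["A", "A", "B", "C"],  # placeholder storm names
    "Year": [2005, 2005, 2005, 2004],
    "Maximum Wind": [80, 95, 120, 100],
})
subset = data_filter(data, year=2005, min_wind=90)
print(subset)
```

Making both criteria optional means the same function can return a whole year, everything above a wind threshold, or the intersection of the two.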

Data Visualization using Map in Folium

Folium is a very useful Python module. You can construct a world map with folium.Map and choose different tiles to change the map's style. You can then add layers on top to mark the locations you are investigating. For the hurricane visualization, I used Folium to build the map and plotted the tracks of the 2005 hurricanes whose maximum wind was 90 or higher, with a different color for each hurricane.

Here we go!!!

BTW, I used the size of each bubble to indicate the wind speed! It may be hard to tell, since I filtered the data to 90 mph and above. Still, this is a great way to visualize both the location and the weight of any geo-level data we have!!!
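One way to make the bubble sizes easier to tell apart after filtering is to stretch the remaining wind range across the full range of radii. The helper below is a hypothetical sketch; the function name and all of its default bounds are illustrative assumptions, not values from the map above.

```python
def wind_to_radius(wind, wind_min=90.0, wind_max=160.0, r_min=4.0, r_max=16.0):
    """Linearly map a wind speed onto a marker radius in pixels.

    All default bounds are illustrative assumptions, not values from the original map.
    """
    # Clamp so out-of-range winds still give a sensible radius.
    clamped = max(wind_min, min(wind, wind_max))
    share = (clamped - wind_min) / (wind_max - wind_min)
    return r_min + share * (r_max - r_min)


for wind in (90, 125, 160):
    print(wind, wind_to_radius(wind))
```

The result can be passed as the `radius` argument of `folium.CircleMarker`, which measures radius in pixels. Starting the scale at the filter threshold rather than at zero is what keeps a 90 mph point visibly smaller than a 160 mph one.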


Stay Connected!

If you are interested in our Data Science and Business Analytics solutions, feel free to reach out to admin@abetterme.us