For our Final Tutorial, we, Annie Donahue and Shay Prinz, will be working together. We are planning on working with 911 call data from the years 2019 and 2020, and possibly looking back to earlier years if it seems necessary or if we have the time. We would like to manipulate this data to see how different events and phenomena affect either people's ability to call 911 or their willingness to do so.
For example, we would like to see what the data can tell us about pre-COVID-19 versus post-COVID-19 911 calls by comparing the data from March 2020 with April 2020, and then using March and April of 2019 and 2018 as a control to see if anything changed. We were also thinking about looking at the 911 call data during times of civil unrest when there was a prominent anti-police sentiment in the U.S. Right now, for example, it would be interesting to see whether fewer people are calling the police as protests calling for police abolition continue across the country. This data can tell us a lot about the current state of New Orleans as a whole, as we will be able to see both crime-reporting data and 911 calls related to health issues. These datasets encompass a lot of information, and it may be difficult to filter them into a more digestible form that reveals the true effects of the events we are centering our analysis around.
A specific challenge Shay came across when transforming the data was converting from string to datetime. Due to how large this dataset is, it was taking over 10 minutes to complete the conversion. This is because pandas falls back to dateutil.parser.parse for parsing the strings when they have a non-default format or when no format string is supplied. She was able to speed up the process a bit by adding the parameter "infer_datetime_format=True".
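Another common way to avoid the slow dateutil fallback is to pass an explicit format string. A minimal sketch with made-up timestamps (the real column's format should be checked before hard-coding one):

```python
import pandas as pd

# Made-up timestamps in an assumed "%m/%d/%Y %I:%M:%S %p" style;
# inspect the actual column before committing to a format.
s = pd.Series(["03/01/2019 02:05:33 PM", "03/01/2019 02:07:12 PM"])

# Fallback path: pandas has to guess each string's format.
slow = pd.to_datetime(s)

# Fast path: an explicit format lets pandas use a vectorized parser.
fast = pd.to_datetime(s, format="%m/%d/%Y %I:%M:%S %p")
```

On a multi-hundred-thousand-row column the explicit-format path is typically an order of magnitude faster than per-string guessing.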
We are planning on meeting once a week as a starting point, so as to gauge how much we can get done in one sitting and make sure we are on a good pace to complete our tutorial within the semester. If need be, we will move up to meeting multiple times a week, especially as we get closer to the end of the semester. Mostly, we will be completing our work on GitHub, and then using the weekly meeting to make a timeline for the next week, set goals and checkpoints to accomplish before the next time we meet, check in on each other's progress, and make sure our code and our analyses are compatible.
We would like to plan at least one big goal to accomplish each week, which is a hard deadline for us both, as well as a stretch goal that we would ideally finish before the next week but that could be pushed to the following week if it became clear we did not have enough time. This keeps us on a steady track while also working with both of our busy schedules: some weeks we may have more time and get a larger bulk of things done, but even in our busier weeks we would have to get the set goal done, keeping us moving steadily toward completing the project. Ideally, our weeks will work around Thursdays, since we are in the service-learning portion of the class and would like to check in during those times with updates to our service-learning partner and our class.
import pandas as pd
Calls2018_df = pd.read_csv("../Calls_for_Service_2018.csv",low_memory=False)
#Tidy data by dropping columns that won't be used (NOPD_Item, MapX, MapY, TimeClosed, Beat, Location)
del Calls2018_df['NOPD_Item']
del Calls2018_df['MapX']
del Calls2018_df['MapY']
del Calls2018_df['TimeClosed']
del Calls2018_df['Beat']
Calls2018_df[['Latitude','Longitude']] = Calls2018_df.Location.str.split(",",expand=True)
#Remove location column
del Calls2018_df['Location']
#Strip parentheses from left and right side
Calls2018_df['Latitude'] = Calls2018_df['Latitude'].replace(to_replace=r'\(', value="", regex=True)
Calls2018_df['Longitude'] = Calls2018_df['Longitude'].replace(to_replace=r'\)', value="", regex=True)
#Convert TimeCreate, TimeDispatch, and TimeArrive to datetime
Calls2018_df['TimeCreate'] = pd.to_datetime(Calls2018_df['TimeCreate'],infer_datetime_format=True)
Calls2018_df['TimeDispatch'] = pd.to_datetime(Calls2018_df['TimeDispatch'],infer_datetime_format=True)
Calls2018_df['TimeArrive'] = pd.to_datetime(Calls2018_df['TimeArrive'],infer_datetime_format=True)
#Remove all rows with years that are not 2018
Calls2018_df = Calls2018_df[(Calls2018_df['TimeCreate'].dt.year == 2018)]
#replace missing data
Calls2018_df.head()
Calls2019_df = pd.read_csv("../Calls_for_Service_2019.csv",low_memory=False)
#Tidy data by dropping columns that won't be used (NOPD_Item, MapX, MapY, TimeClosed, Beat, Location)
del Calls2019_df['NOPD_Item']
del Calls2019_df['MapX']
del Calls2019_df['MapY']
del Calls2019_df['TimeClosed']
del Calls2019_df['Beat']
Calls2019_df[['Latitude','Longitude']] = Calls2019_df.Location.str.split(",",expand=True)
#Remove location column
del Calls2019_df['Location']
#Strip parentheses from left and right side
Calls2019_df['Latitude'] = Calls2019_df['Latitude'].replace(to_replace=r'\(', value="", regex=True)
Calls2019_df['Longitude'] = Calls2019_df['Longitude'].replace(to_replace=r'\)', value="", regex=True)
#Calls2019_df.dtypes
#Convert TimeCreate, TimeDispatch, and TimeArrival to datetime
Calls2019_df['TimeCreate'] = pd.to_datetime(Calls2019_df['TimeCreate'],infer_datetime_format=True)
Calls2019_df['TimeDispatch'] = pd.to_datetime(Calls2019_df['TimeDispatch'],infer_datetime_format=True)
Calls2019_df['TimeArrival'] = pd.to_datetime(Calls2019_df['TimeArrival'],infer_datetime_format=True)
#Remove all rows with years that are not 2019
Calls2019_df = Calls2019_df[(Calls2019_df['TimeCreate'].dt.year == 2019)]
#replace missing data
Calls2019_df.head()
This dataset reflects incidents that have been reported to the New Orleans Police Department in 2020. Data is provided by Orleans Parish Communication District (OPCD), the administrative office of 9-1-1 for the City of New Orleans.
Calls2020_df = pd.read_csv("../Call_for_Service_2020.csv",low_memory=False)
#Tidy data by dropping columns that won't be used (NOPD_Item, MapX, MapY, TimeClosed, Beat, Location)
del Calls2020_df['NOPD_Item']
del Calls2020_df['MapX']
del Calls2020_df['MapY']
del Calls2020_df['TimeClosed']
del Calls2020_df['Beat']
#Clean up location column to get Latitude and Longitude columns
Calls2020_df['Location'] = Calls2020_df['Location'].replace(to_replace='POINT ', value="", regex=True)
Calls2020_df[['Longitude','Latitude']] = Calls2020_df.Location.str.split(" ",expand=True)
#Remove location column
del Calls2020_df['Location']
#Strip parentheses from left and right side
Calls2020_df['Longitude'] = Calls2020_df['Longitude'].replace(to_replace=r'\(', value="", regex=True)
Calls2020_df['Latitude'] = Calls2020_df['Latitude'].replace(to_replace=r'\)', value="", regex=True)
#Calls2020_df.dtypes
#Convert TimeCreate, TimeDispatch, and TimeArrive to datetime
Calls2020_df['TimeCreate'] = pd.to_datetime(Calls2020_df['TimeCreate'],infer_datetime_format=True)
Calls2020_df['TimeDispatch'] = pd.to_datetime(Calls2020_df['TimeDispatch'],infer_datetime_format=True)
Calls2020_df['TimeArrive'] = pd.to_datetime(Calls2020_df['TimeArrive'],infer_datetime_format=True)
#replace missing data
Calls2020_df.head()
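The per-year cells above repeat the same cleaning steps; they could be folded into one helper along these lines. This is a sketch assuming the 2018-style column layout — the 2020 file stores Location as a POINT string, and the 2019 file names its arrival column TimeArrival, so each would need a small adjustment:

```python
import pandas as pd

def load_calls(path, year):
    """Load one year of Calls for Service data and apply the shared cleaning
    steps: drop unused columns, split Location into coordinates, parse times."""
    df = pd.read_csv(path, low_memory=False)
    df = df.drop(columns=['NOPD_Item', 'MapX', 'MapY', 'TimeClosed', 'Beat'])
    # "(lat,lon)" -> separate Latitude / Longitude columns
    df[['Latitude', 'Longitude']] = df['Location'].str.split(',', expand=True)
    df = df.drop(columns=['Location'])
    df['Latitude'] = df['Latitude'].str.strip('(')
    df['Longitude'] = df['Longitude'].str.strip(')')
    # Passing an explicit format= here would speed this up on large files
    for col in ['TimeCreate', 'TimeDispatch', 'TimeArrive']:
        df[col] = pd.to_datetime(df[col])
    # Keep only rows actually created in the requested year
    return df[df['TimeCreate'].dt.year == year]
```

With this, each year's cell reduces to a single call such as `load_calls("../Calls_for_Service_2018.csv", 2018)`.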
A challenge we knew we were going to face was the issue of population swelling. Due to coronavirus, we knew there would be a significant decrease in New Orleans population due to cancellation of tourist events and restrictions on travel. To try to estimate populations during each month, we have collected data on number of passengers arriving at MSY airport during each month from March 2018 - August 2020 (as far as data would go). We will merge this data with our 911 calls data so we can calculate 911 calls per 100,000 people. This should make our data analysis more realistic and account for unequal population sizes.
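The standardization itself is simple arithmetic. With made-up figures (not the real MSY or census numbers):

```python
# Hypothetical figures for one month -- illustration only, not the real data.
resident_pop = 390_000      # assumed baseline New Orleans population
msy_arrivals = 50_000       # assumed MSY airport arrivals that month
calls = 35_000              # assumed 911 calls that month

# Estimated population = residents plus visitors arriving by air
estimated_pop = resident_pop + msy_arrivals

# Standardized rate, comparable across months with different populations
calls_per_100k = calls / estimated_pop * 100_000
print(round(calls_per_100k, 1))  # → 7954.5
```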
Air_arrivals = pd.read_csv("../arrivals_data.csv")
Air_arrivals
LA_cases = pd.read_csv("../louisiana-history.csv")
data = {'Date': LA_cases['date'],
        'Cases per day': LA_cases['positiveIncrease']}
# Convert the dictionary into DataFrame
LA_cases_by_day = pd.DataFrame(data)
LA_cases_by_day['Date'] = pd.to_datetime(LA_cases_by_day['Date'],infer_datetime_format=True)
LA_cases_by_day = LA_cases_by_day.set_index('Date')
LA_cases_by_day.head()
LA_cases_by_day.plot(figsize = (20,8))
Trends: the highest spikes in daily cases occur in April, July, and November.
Calls2018_df = Calls2018_df.copy()
Calls2018_df = Calls2018_df.rename(columns={'TimeArrive': 'TimeArrival'})
Calls2018_df['Year'] = Calls2018_df['TimeCreate'].dt.year
Calls2018_df['Date'] = Calls2018_df['TimeCreate'].dt.date
Calls2018_df['Month'] = Calls2018_df['TimeCreate'].dt.month
CallsbyMonth2018 = Calls2018_df[(Calls2018_df['Month'] >= 3) & (Calls2018_df['Month'] <= 8)]
Calls2019_df = Calls2019_df.copy()
Calls2019_df['Year'] = Calls2019_df['TimeCreate'].dt.year
Calls2019_df['Date'] = Calls2019_df['TimeCreate'].dt.date
Calls2019_df['Month'] = Calls2019_df['TimeCreate'].dt.month
CallsbyMonth2019 = Calls2019_df[(Calls2019_df['Month'] >= 3) & (Calls2019_df['Month'] <= 8)]
Calls2020_df = Calls2020_df.copy()
Calls2020_df = Calls2020_df.rename(columns={'TimeArrive': 'TimeArrival'})
Calls2020_df['Year'] = Calls2020_df['TimeCreate'].dt.year
Calls2020_df['Date'] = Calls2020_df['TimeCreate'].dt.date
Calls2020_df['Month'] = Calls2020_df['TimeCreate'].dt.month
CallsbyMonth2020 = Calls2020_df[(Calls2020_df['Month'] >= 3) & (Calls2020_df['Month'] <= 8)]
CallsbyMonth = pd.concat([CallsbyMonth2018, CallsbyMonth2019, CallsbyMonth2020])
CallsbyMonth = CallsbyMonth.reset_index()
CallsbyMonth["Month"] = CallsbyMonth["Month"].map({
3: "March",
4: "Apr",
5: "May",
6: "June",
7: "July",
8: "Aug",
})
CallsbyMonth.head()
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
result = CallsbyMonth.pivot_table(index='Month',
columns='Year', values='TypeText', aggfunc='count')
flattened = pd.DataFrame(result.to_records())
std_number_calls = flattened.merge(Air_arrivals, on='Month')
std_number_calls['Calls per 100,000 2018'] = (std_number_calls['2018']/std_number_calls['Total Pop 2018'])*100000
std_number_calls['Calls per 100,000 2019'] = (std_number_calls['2019']/std_number_calls['Total Pop 2019'])*100000
std_number_calls['Calls per 100,000 2020'] = (std_number_calls['2020']/std_number_calls['Total Pop 2020'])*100000
result = std_number_calls.pivot_table(index='Month',
values=['Calls per 100,000 2018',
'Calls per 100,000 2019',
'Calls per 100,000 2020'])
monthOrder = ['March', 'Apr', 'May', 'June', 'July', 'Aug']
ax = result.loc[monthOrder].plot(kind="bar", legend=True,
title ="Number of Calls Per 100,000 People", figsize=(20, 8), fontsize=16)
ax.legend(['2018','2019','2020'])
plt.xlabel("Month")
plt.ylabel("Number of Calls")
plt.show()
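The pivot → merge → standardize sequence above is repeated below for each call type. It could be factored into one reusable helper, sketched here under the assumption that `Air_arrivals` carries a 'Month' column plus 'Total Pop 2018/2019/2020' columns as used above:

```python
import pandas as pd

YEARS = ('2018', '2019', '2020')

def per_100k_table(calls_subset, air_arrivals):
    """Collapse a call-type subset into per-month, per-year counts and
    standardize them by the estimated monthly population."""
    # Count calls per Month x Year
    counts = calls_subset.pivot_table(index='Month', columns='Year',
                                      values='TypeText', aggfunc='count')
    flat = pd.DataFrame(counts.to_records())  # year columns become '2018', ...
    merged = flat.merge(air_arrivals, on='Month')
    for yr in YEARS:
        merged[f'Calls per 100,000 {yr}'] = (
            merged[yr] / merged[f'Total Pop {yr}'] * 100_000
        )
    return merged.set_index('Month')[[f'Calls per 100,000 {yr}' for yr in YEARS]]
```

Each section below could then reduce to `per_100k_table(subset, Air_arrivals).loc[monthOrder].plot(kind="bar", ...)`.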
This graph shows the number of calls made per 100,000 people in the months of March through August in 2018, 2019, and 2020. As we predicted, more calls were made per 100,000 people during the spikes of the pandemic. Things do even out in August, though, as restrictions are lifted.
from IPython.display import display_html
def display_side_by_side(*args):
    html_str = ''
    for df in args:
        html_str += df.to_html()
    display_html(html_str.replace('table', 'table style="display:inline"'), raw=True)
#Exclude routine/administrative call types before ranking
excluded = ['AREA CHECK', 'BUSINESS CHECK', 'COMPLAINT OTHER',
            'RETURN FOR ADDITIONAL INFO', 'DISTURBANCE (OTHER)']
TopCalls_2018 = CallsbyMonth2018.loc[~CallsbyMonth2018['TypeText'].isin(excluded)]
TopCalls_2018 = TopCalls_2018['TypeText'].value_counts()[:10].rename_axis('2018').to_frame('counts')
TopCalls_2019 = CallsbyMonth2019.loc[~CallsbyMonth2019['TypeText'].isin(excluded)]
TopCalls_2019 = TopCalls_2019['TypeText'].value_counts()[:10].rename_axis('2019').to_frame('counts')
TopCalls_2020 = CallsbyMonth2020.loc[~CallsbyMonth2020['TypeText'].isin(excluded)]
TopCalls_2020 = TopCalls_2020['TypeText'].value_counts()[:10].rename_axis('2020').to_frame('counts')
display_side_by_side(TopCalls_2018,TopCalls_2019,TopCalls_2020)
The tables above show the top types of calls made in 2018, 2019, and 2020. They do not appear to vary too greatly. However, we can see that traffic incidents are not a top call in 2020, while public gatherings and mental patient are. We can be confident that public gatherings is a result of the pandemic. We can speculate that mental patient is now a top call in 2020 due to the depressing reality of lockdown and disease; however, that is purely speculation.
!pip install folium
import folium
April2020 = CallsbyMonth2020.loc[(CallsbyMonth2020['Month'] == 4)& (CallsbyMonth2020['TypeText'] == 'PUBLIC GATHERING')]
locations = April2020[['Latitude', 'Longitude']].dropna().astype(float)  #coordinates were split as strings, so convert to numbers
locationlist = locations.values.tolist()
len(locationlist)
from folium.plugins import FastMarkerCluster
NOLA_location = [29.951065, -90.071533 ]
m = folium.Map(location=NOLA_location,
zoom_start=15,
tiles='openstreetmap')
m.add_child(FastMarkerCluster(locationlist, name='2020'))
Looking deeper into this new call type, this visualization shows the calls placed about Public Gatherings (608 calls) in April 2020. Notice the areas with a higher number of calls, such as Mid-City and Uptown, near Tulane's campus.
DomesticAssault = CallsbyMonth.loc[(CallsbyMonth['TypeText'] == 'AGGRAVATED ASSAULT DOMESTIC')|(CallsbyMonth['TypeText'] == 'SIMPLE ASSAULT DOMESTIC')
|(CallsbyMonth['TypeText'] == 'CRIMINAL DAMAGE DOMESTIC')|(CallsbyMonth['TypeText'] == 'CRIMINAL MISCHIEF DOMESTIC')
|(CallsbyMonth['TypeText'] == 'AGGRAVATED BATTERY DOMESTIC')|(CallsbyMonth['TypeText'] == 'SIMPLE BATTERY DOMESTIC') ]
result = DomesticAssault.pivot_table(index='Month',
columns='Year', values='TypeText', aggfunc='count')
flattened = pd.DataFrame(result.to_records())
std_number_calls = flattened.merge(Air_arrivals, on='Month')
std_number_calls['Calls per 100,000 2018'] = (std_number_calls['2018']/std_number_calls['Total Pop 2018'])*100000
std_number_calls['Calls per 100,000 2019'] = (std_number_calls['2019']/std_number_calls['Total Pop 2019'])*100000
std_number_calls['Calls per 100,000 2020'] = (std_number_calls['2020']/std_number_calls['Total Pop 2020'])*100000
result = std_number_calls.pivot_table(index='Month',
values=['Calls per 100,000 2018',
'Calls per 100,000 2019',
'Calls per 100,000 2020'])
monthOrder = ['March', 'Apr', 'May', 'June', 'July', 'Aug']
ax = result.loc[monthOrder].plot(kind="bar", legend=True,
title ="Domestic Assault Calls Per 100,000 People", figsize=(20, 8), fontsize=16)
ax.legend(['2018','2019','2020'])
plt.xlabel("Month")
plt.ylabel("Number of Calls")
plt.show()
A hypothesis of ours was a possible increase in domestic assault due to mandatory lockdowns at the beginning of the coronavirus spread; we reasoned that some people may have been forced to lock down in abusive households. The increase does not appear in every month, but overall the data does suggest a rise in domestic assault and violence. This suggestion is important because it can help justify putting out more resources regarding domestic violence now that we see the problem has been exacerbated during the pandemic.
ViolentCrime = CallsbyMonth[CallsbyMonth['TypeText'].str.contains("RAPE")|CallsbyMonth['TypeText'].str.contains("ASSAULT")
|CallsbyMonth['TypeText'].str.contains("BATTERY")]
result = ViolentCrime.pivot_table(index='Month',
columns='Year', values='TypeText', aggfunc='count')
flattened = pd.DataFrame(result.to_records())
std_number_calls = flattened.merge(Air_arrivals, on='Month')
std_number_calls['Calls per 100,000 2018'] = (std_number_calls['2018']/std_number_calls['Total Pop 2018'])*100000
std_number_calls['Calls per 100,000 2019'] = (std_number_calls['2019']/std_number_calls['Total Pop 2019'])*100000
std_number_calls['Calls per 100,000 2020'] = (std_number_calls['2020']/std_number_calls['Total Pop 2020'])*100000
result = std_number_calls.pivot_table(index='Month',
values=['Calls per 100,000 2018',
'Calls per 100,000 2019',
'Calls per 100,000 2020'])
monthOrder = ['March', 'Apr', 'May', 'June', 'July', 'Aug']
ax = result.loc[monthOrder].plot(kind="bar", legend=True,
title ="Violent Crime Calls Per 100,000 People", figsize=(20, 8), fontsize=16)
ax.legend(['2018','2019','2020'])
plt.xlabel("Month")
plt.ylabel("Number of Calls")
plt.show()
A hypothesis of ours was a possible decrease in violent crime due to mandatory lockdowns at the beginning of the coronavirus spread. The data suggests we were incorrect: there were more violent crime reports per 100,000 people overall in 2020. We can only speculate as to why, but the pandemic introduced many new factors, such as loss of employment, mental health issues, and lockdowns. In August, when Louisiana moved to Phase 3 and less strict coronavirus restrictions were imposed, we see things start to even out in comparison with 2018 and 2019. Our data suggests that coronavirus is correlated with more violent crime.
TrafficIncidents = CallsbyMonth.loc[(CallsbyMonth['TypeText'] == 'TRAFFIC INCIDENT')]
result = TrafficIncidents.pivot_table(index='Month',
columns='Year', values='TypeText', aggfunc='count')
flattened = pd.DataFrame(result.to_records())
std_number_calls = flattened.merge(Air_arrivals, on='Month')
std_number_calls['Calls per 100,000 2018'] = (std_number_calls['2018']/std_number_calls['Total Pop 2018'])*100000
std_number_calls['Calls per 100,000 2019'] = (std_number_calls['2019']/std_number_calls['Total Pop 2019'])*100000
std_number_calls['Calls per 100,000 2020'] = (std_number_calls['2020']/std_number_calls['Total Pop 2020'])*100000
result = std_number_calls.pivot_table(index='Month',
values=['Calls per 100,000 2018',
'Calls per 100,000 2019',
'Calls per 100,000 2020'])
monthOrder = ['March', 'Apr', 'May', 'June', 'July', 'Aug']
ax = result.loc[monthOrder].plot(kind="bar", legend=True,
title ="Traffic Incident Calls Per 100,000 People", figsize=(20, 8), fontsize=16)
ax.legend(['2018','2019','2020'])
plt.xlabel("Month")
plt.ylabel("Number of Calls")
plt.show()
We hypothesized that traffic incidents would decrease during lockdown, the most parsimonious explanation being that fewer people would be on the road. Our hypothesis was supported for all months. We were surprised that traffic incident calls did not increase as restrictions began to be lifted in August, though it is important to note that many people were still working from home at that time.
Suicide = CallsbyMonth.loc[(CallsbyMonth['TypeText'] == 'SUICIDE')]
result = Suicide.pivot_table(index='Month',
columns='Year', values='TypeText', aggfunc='count')
flattened = pd.DataFrame(result.to_records())
std_number_calls = flattened.merge(Air_arrivals, on='Month')
std_number_calls['Calls per 100,000 2018'] = (std_number_calls['2018']/std_number_calls['Total Pop 2018'])*100000
std_number_calls['Calls per 100,000 2019'] = (std_number_calls['2019']/std_number_calls['Total Pop 2019'])*100000
std_number_calls['Calls per 100,000 2020'] = (std_number_calls['2020']/std_number_calls['Total Pop 2020'])*100000
result = std_number_calls.pivot_table(index='Month',
values=['Calls per 100,000 2018',
'Calls per 100,000 2019',
'Calls per 100,000 2020'])
monthOrder = ['March', 'Apr', 'May', 'June', 'July', 'Aug']
ax = result.loc[monthOrder].plot(kind="bar", legend=True,
title ="Suicide Calls Per 100,000 People", figsize=(20, 8), fontsize=16)
ax.legend(['2018','2019','2020'])
plt.xlabel("Month")
plt.ylabel("Number of Calls")
plt.show()
Some positive findings: we do not see an increase in calls reporting suicides. In fact, aside from March 2018, trends remain fairly even across the three years.
Homelessness = CallsbyMonth.loc[(CallsbyMonth['TypeText'] == 'HOMELESS')]
result = Homelessness.pivot_table(index='Month',
columns='Year', values='TypeText', aggfunc='count')
flattened = pd.DataFrame(result.to_records())
std_number_calls = flattened.merge(Air_arrivals, on='Month')
std_number_calls['Calls per 100,000 2018'] = (std_number_calls['2018']/std_number_calls['Total Pop 2018'])*100000
std_number_calls['Calls per 100,000 2019'] = (std_number_calls['2019']/std_number_calls['Total Pop 2019'])*100000
std_number_calls['Calls per 100,000 2020'] = (std_number_calls['2020']/std_number_calls['Total Pop 2020'])*100000
result = std_number_calls.pivot_table(index='Month',
values=['Calls per 100,000 2018',
'Calls per 100,000 2019',
'Calls per 100,000 2020'])
monthOrder = ['March', 'Apr', 'May', 'June', 'July', 'Aug']
ax = result.loc[monthOrder].plot(kind="bar", legend=True,
title ="Homelessness Calls Per 100,000 People", figsize=(20, 8), fontsize=16)
ax.legend(['2018','2019','2020'])
plt.xlabel("Month")
plt.ylabel("Number of Calls")
plt.show()
Lastly, due to the sheer nature of the virus and the economic downturn, we predicted more calls reporting homelessness. Even with New Orleans's population shrinking in 2020 due to the lack of tourism and travel, we still see significantly more calls in March, April, and May, at the height of lockdowns and during the first spike of the pandemic.
Our main challenge revolved around the issue that we may be comparing unequal population sizes in 2018, 2019, and 2020. Due to this, we chose to standardize our populations by analyzing calls per 100,000 people. We did this by estimating a total population per month, adding New Orleans's total population each year to the number of arrivals at MSY airport each month.
Coronavirus is correlated with a higher number of 911 calls.
Data on domestic assault calls supports our hypothesis that lockdowns would increase domestic assault. A study also found X-ray evidence pointing to pandemic lockdowns triggering a surge in cases of domestic violence.
More violent crime was reported at the height of coronavirus and lockdowns, although it is important to keep in mind that possible confounds here include police violence, rioting, and protests. Reference that supports this.
Fewer traffic incidents during the lockdowns and overall throughout the pandemic.
No significant changes in suicide calls.
More reports of homelessness at the height of the pandemic. Reference that supports this.
And of course, a whole new 911 call category reporting public gatherings!