analyses/laser_strikes_2010_2018.Rmd at master · airsafe/analyses · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: 'Patterns of laser strikes on US aircraft: 2010 to 2018'
author: "Todd Curtis"
date: "December 2, 2019"
output:
  html_document: default
  pdf_document: default
---

```{r, echo=FALSE, message=FALSE, warning=FALSE}

# === Created 27 August 2019 ===

# Data file of laser events was located at FAA page "Laser News, Laws, & Civil
# Penalties" page at https://www.faa.gov/about/initiatives/lasers/laws/

# Original data is at https://www.faa.gov/about/initiatives/lasers/laws/laser_incidents_2010-2014.xls

# Cleand and converted to a CSV file and can be downloaded from AirSafe.com at
#       http://www.airsafe.com/analyze/faa_laser_data_2010_to_2019.csv"

# Operational data on air carriers taken from the FAA Operations Network (OPSNET)
#       site at https://aspm.faa.gov/opsnet/sys/airport.asp

# Raw data included are all 50 states, the District of Columbia, and Puerto Rico, and several US territories.


# ===PACKAGES===
options(repos = c(CRAN = "http://cran.rstudio.com"))
if("downloader" %in% rownames(installed.packages()) == FALSE)
{install.packages("downloader")}
library(downloader)

if("maps" %in% rownames(installed.packages()) == FALSE)
{install.packages("maps")}
library(maps)

if("ggplot2" %in% rownames(installed.packages()) == FALSE)
{install.packages("ggplot2")}
library(ggplot2)


if("stringr" %in% rownames(installed.packages()) == FALSE)
{install.packages("stringr")}
library(stringr)


# === END PACKAGES ===

# ===FUNCTIONS===

# === Function: expanded_date ===
expanded_date = function(df, date_var_name){
        # Function: Insert processed date value into data frame
        #       Goal is to take the date, and add the following variables:
        #       Day_num (day of the month), Month, Year, Weekday
        #       Month and Weekday will be three-character abbreviation
        # Requirements: Date must be in the format '%d-%b-%y' and must not be NA or missing
        # df = laserhits
        # date_var_name = "Date"
        # Create Day_num variable
        Day_num = as.numeric(format(df[,date_var_name],'%d'))

        # Create Month variable
        Month =  months(df[,date_var_name], abbreviate = TRUE)
        Month = factor(Month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

        # Create Year variable
        Year = as.numeric(format(df[,date_var_name],'%Y')) # Four-digit year format

        # Create Weekday variable
        Weekday = weekdays(df[,date_var_name], abbreviate = TRUE)
        Weekday = factor(Weekday, levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

        # Insert new variables into data frame

        df_split = which(colnames(df)==date_var_name)
        df = cbind(df[,1:df_split], Day_num, Month, Year, Weekday, df[,(df_split+1):ncol(df)])
} # === End expanded_date ===

# === Function: expanded_time ===
expanded_time = function(df, time_var_name){
        # If the time variable is in the 24-hour (HHMM) format in the
        #       original file, converting that file to CSV format may remove leading zeros.
        #       The goal is to put the time into the
        #       format HH:MM by doing a combination of adding leading zeros
        #       and inserting a colon to get to HH:MM format

        # In addition to converting the format, an additional factor variable
        #       will be created, with 25 levels including any missing time

        #       Requirements: Missing time values are already coded as NA

        df[,time_var_name] = as.character(df[,time_var_name])
        Raw_time=df[,time_var_name]
        # Paste string "00:0 if one character long
        one.long.ndx = which(nchar(df[,time_var_name])==1)

        df[,time_var_name][one.long.ndx] = paste0("00:0",df[,time_var_name][one.long.ndx])

        # Add two leading zeros and colon if two characters long
        # Becasue 'NA' is two characters long, need to
        #       find intersection of those not NA and those
        #       two characters long

        non.NA.values = which(!is.na(df[,time_var_name]))
        two.long.all = which(nchar(df[,time_var_name])==2)
        two.long.ndx = intersect(non.NA.values, two.long.all)

        df[,time_var_name][two.long.ndx] = paste0("00:",df[,time_var_name][two.long.ndx])

        # Add one leading zero insert colon before last two characters if three characters long
        three.long.ndx = which(nchar(df[,time_var_name])==3)

        df[,time_var_name][three.long.ndx] = paste0("0",substr(df[,time_var_name][three.long.ndx], 1,1),":",substr(df[,time_var_name][three.long.ndx], 2,3))


        # Insert colon after second character if time is four characters long (1000 to 2359, as well as missing time value of 9999)
        four.long.ndx = which(nchar(df[,time_var_name])==4)

        df[,time_var_name][four.long.ndx] = paste0(substr(df[,time_var_name][four.long.ndx], 1,2),":",substr(df[,time_var_name][four.long.ndx], 3,4))

        #==Create factor variable for time of day==
        # Will also create a factor variable of time by hour of the day
        # Convert hour vector so that time is put into 25 bins
        #       with bin 1 (coded as "1) including events from 0000 to 0059 hours, etc. until bin 24
        #       Bin 25 will be coded as "25" and will include those events with an unknown time.
        #       Then create an ordered factor variable from 1-24, plus 25

        Hour_Category=as.integer(substr(df[,time_var_name],1,2))
        Hour_Category=Hour_Category+1
        Hour_Category[which(Hour_Category==100)]=25
        Hour_Category=as.factor(Hour_Category)

        df_split = which(colnames(df) == time_var_name)
        df = cbind(df[,1:df_split], Raw_time, Hour_Category, df[,(df_split+1):ncol(df)])


} # === End expanded_time ===

# === Function: insert_variable inserts a new vector after a specific data frame column ===
insert_variable = function(df, insert_after_name, new_data){
        # Note: Need the data frame name, the insert location',
        #       column name, and name of the new data
        #       The new data may be one variable equal in length to the number of rows of
        #       the data frame df, or it is a data frame with
        #       the same number of rows as df

        df_split = which(colnames(df) == insert_after_name)
        df = cbind(df[,1:df_split], new_data, df[,(df_split+1):ncol(df)])
} # === End insert_variable function ===

# Function to create categorical variable from widely spaced numerical vector
make_factor = function(input.num.vec, num.limits, factor.levels){
        # This function takes a numeric vector, values that define value ranges
        #       to be associated with a new factor vector and returns a factor vector
        #       of the same length.

        # First step is to change numbers to specific number levels
        # Second step is to define that vector as factor and simultaneously
        #       relabel the factor levels
        num.limits= c(1999,17999,39999, 328000)
        factor.levels = c("under_FL20", "FL20_FL180", "FL180_FL400", "FL400_plus")

        num.limits = sort(num.limits) # Ensure number limits are in order

        if((length(num.limits)) != length(factor.levels)){paste("Incorrect number of factor levels given the number limits")}

        for(i in 1:length(factor.levels)){

                if(i==1){
                        input.num.vec[input.num.vec <= num.limits[1]] = -1
                }
                else if (i!=1){
                        input.num.vec[(input.num.vec > num.limits[i-1]) & (input.num.vec <= num.limits[i])]  = -i
                }
        }
        input.num.vec = -input.num.vec
        new.factor.vec = factor(input.num.vec, labels=factor.levels, ordered = TRUE)
} # End make_factor function

na_to_factor_level = function(df, cat_cols){
        # Turns any NA value in a vector into a factor level
        for(col in cat_cols){
                df[,col] = addNA(df[,col])
        }
} # End na_to_factor_level function

# === END OF FUNCTIONS ===

# === DATA CLEANING ===
# The laser event data was reviewed and cleaned prior to uploading
# The raw data file from the FAA contained numerous cases of incorrect
# data with respect to location (airport, city, and state), including misspellings and capitalization errors,
# as well as missing data. The events were manually reviewed to correct these errors when sufficient
# information was contained in the rest of the record.

# Any missing or incomplete data was coded as "UNK" prior to uploading

# In some cases, the airport location code was substituted
# for a navigational aid code when they were located at or
# near an airport. For example, several reports for the city
# of Baltimore, MD used the 'BAL' IATA code,
# which is for the VORTAC at the field, and the arport code
# 'BWI' was substituted.

# Incorrect, misspelled, or missing data for any variable
# was corrected or filled in when enough informaiton was
# available in the record.
# Example errors inculude:
#       - 'GUA' instead of 'GUM' for GUAM,
#       - 'VHP' instead of 'VPZ' for Valparasio, IN
#       - 'POM' instead of 'POC' for Pomona or Ontario  , California

# A variety of resources were used to identify key data for some records, including:
#       - World Airport Codes - https://www.world-airport-codes.com/
#       - FlightAware - https://flightaware.com
#       - AirFleets - https://www.airfleets.net/home/
#       - Nations online - http://www.nationsonline.org/oneworld/IATA_Codes/airport_code_list.htm
#       - Locations identifier search tool - https://nfdc.faa.gov/xwiki/bin/view/NFDC/Location+Identifiers+Search+Tool
#       - AirNav.com - https://www.airnav.com/airports/

# Reported Laser colors were standardized
# by making all inputs with multiple identified colors of the form Color1/Color2,
# with the colors listed alphabetically, insuring that the first letter in
# a single word color identifier was capitalized, and correcting misspellings.
# Example: Blue and Green, became Multiple (Blue, Green), and Blue or green become Multiple (Blue or Green)

# === UPLOADING ===

laserhits = NULL
raw.laserhits = NULL
raw.laserhits = read.csv("http://www.airsafe.com/analyze/faa_laser_data_2010_2018.csv")
laserhits=raw.laserhits
# === END OF UPLOADING ===

# === POST-UPLOAD DATA FRAME FORMATTING AND CLEANING ===
# STEP 1: Ensure that all of the variables are of type character
# while making sure none of these are treated as factors
laserhits = data.frame(lapply(laserhits,as.character), stringsAsFactors=FALSE)

# STEP 2: Remove all leading and trailing spaces from selected variables
# Using the function trimws removes leading and trailing space characters
laserhits = as.data.frame(apply(laserhits, 2, trimws))

# STEP 3: Change all "UNK" entries to NA
laserhits[laserhits == "UNK"] = NA

# STEP 4: Identify and remove duplicated records,
#       (Must ingnore the unique ID that was added to the database)

duplicated.ndx = which(duplicated(laserhits[,-1]))
laserhits = laserhits[-duplicated.ndx,]

# The following analyses will depend on records that have inputs for
#       all of the following variables:
#               Date
#               Time (UTC)
#               Aircraft ID
#               Aircraft type
#               Location (IATA or ICAO code)
#               City
#               State
#               Metro_area


initial_records = nrow(laserhits)

useful.vars = c("Date", "Time","Aircraft_Type", "Airport", "Altitude", "City", "State")
useful.cases.ndx = complete.cases(laserhits[,useful.vars])
laserhits = laserhits[useful.cases.ndx,]

# === END POST-UPLOAD DATA FRAME FORMATTING AND CLEANING ===

# === REFORMAT OR EXPAND VARIALBES ===

# Ensure date variable is of type date
laserhits$Date  = as.Date(laserhits$Date, format = '%d-%b-%y')

# Ensure Altitudes are of type numeric
laserhits$Altitude = as.numeric(as.character(laserhits$Altitude))


# Expand the date variable by adding variables for Year, Month, Weekday, and (numerical) Day
laserhits = expanded_date(laserhits, "Date")

# Add a variable that displays time in the format hh:mm
laserhits = expanded_time(laserhits, "Time")

# Restrict data to events prior to 2019
laserhits = laserhits[laserhits$Year < 2019,]

# Restrict data to airline aircraft only
# Note: The following aircraft types were associated with
#       the the air carrier data in the FAA data used in this analysis:
#       - 1: Large jet airliner (over 100 seats, single ailse)
#       - 2: Large jet airliner (over 100 seats, twin ailse)
#       - 3: Regional jet airliner (60 to 100 seats)
#       - 4: Prop-drive airliner (60 seats and above)

laserhits = laserhits[laserhits$Aircraft_Type %in% c("1","2","3","4"),]

# Create a new variable of altitude categories
new_variable = as.numeric(as.character(laserhits$Altitude))
# Next is to add one foot to zero altitude records
new_variable[new_variable == 0] = 1


# Insert a factor variable based on altitudes
num.limits= c(1999,17999,39999, 328000)
factor.levels = c("below_FL20", "FL20_to_FL180", "FL180_to_FL400", "above_FL400")
laserhits$Altitude_cat = make_factor(laserhits$Altitude, num.limits, factor.levels)

# identify numeric and factor variables
numeric.vars = c("Day_num", "Year", "Raw_time", "Altitude", "Log_altitude")
factor.vars = c("Event_ID", "Month", "Weekday", "Time", "Hour_Category", "Flight_ID", "Model", "Aircraft_Type", "Altitude_cat", "Airport",
                "Laser_color", "Injuries", "City", "State", "Metro_area")

numeric.vars.ndx = which(colnames(laserhits) %in% numeric.vars)
factor.vars.ndx = which(colnames(laserhits) %in% factor.vars)

# Drop unused factors for selected variabls
laserhits[,factor.vars] = droplevels(laserhits[,factor.vars])

# === END REFORMAT OR EXPAND VARIALBES ===

# === CREATE VARIABLE OF STATE ABBREVIATIONS ===
# Will match state names to official two-letter state abbreviations

# R has built in vectors of state names and abbreviations (state.name and state.abb)
#       used by the US Postal Service (USPS)

# Source is USPS Publication 28 - postal addressing standards
# Located at https://pe.usps.com/text/pub28/28apb.htm#ep19305 accessed 8 September 2019

# The built-in abbrevations in R datasets only the
#       names (state.name) and abbreviations (state.abb) for the 50 US states.
#       Additional names and abbreviations were added to match those in the laserhits data frame.

# The analysis will only deal with geographical areas that were included in the
#       FAA data about air carrier opertions.

# The first step would be to determine what states
#       were included in the FAA data.

# The FAA data uses state codes, so step zero would be to determine
#       what state codes could be used in the laserhits data frame.


# Find state values in laserhits not in state.name list
missing.states = which(!(unique(laserhits$State) %in% state.name))


# Those missing territory  names and their abbreviations
#       (District of Columbia, Puerto Rico, Guam, and Virgin Islands),
#       will be used to create a two-letter code for every state and territory in the laserhits database

extra.abb = c("DC", "PR","GU", "VI", "MP")
extra.name = c("District of Columbia", "Puerto Rico", "Guam", "Virgin Islands", "Northern Mariana Islands")

# Now append them to the R-provided list of 50 state names and abbreviations
usps.abb = as.character(c(state.abb,extra.abb))
usps.state = as.character(c(state.name,extra.name))
usps.state.abb = data.frame(cbind(usps.state,usps.abb))
colnames(usps.state.abb) = c("State","State_abb")


# Will create a variable with the two-letter USPS abbreviation

# Initialize the full state name abbreviation variable with a number of NA values equal to the
# number of input records to ensure that the final vector will be compatible with ntsb.data
laserhits$State_abb = rep(NA,nrow(laserhits))

# Now replace an NA value with an appropriate two-letter code.
#       If not an appropriate state or territory name, remains as NA
for (i in 1:nrow(laserhits)) {
        if(laserhits$State[i] %in% usps.state) {
                laserhits$State_abb[i] = usps.abb[which(usps.state == laserhits$State[i])]
        }
}

# Eliminate any records with a remaining NA value for the state code
laserhits = laserhits[!is.na(laserhits$State_abb),]

# === END CREATE VARIALBLE OF STATE ABBREVIATIONS ===

# === INPUT FAA FLIGHT STATISTICS ===

# Raw data taken from FAA at https://aspm.faa.gov/opsnet/sys/main.asp
# Resides at "http://www.airsafe.com/analyze/faa_opsnet_by_state_2019.csv"

# Converted to a CSV file and is downloadable from AirSafe.com
flight.data = NULL
flight.data.raw = NULL
raw.flight.data = read.csv("http://www.airsafe.com/analyze/faa_opsnet_by_state_2019.csv")
flight.data=raw.flight.data

# STEP 1: Ensure that all of the variables are of type character
# while making sure none of these are treated as factors
flight.data = data.frame(sapply(flight.data,as.character), stringsAsFactors=FALSE)

# STEP 2: Ensure that Air_carrier_ops, Total_ops, and Percent_air_carrier are numeric
flight.data[,3:5] = apply(flight.data[,3:5],2,as.numeric)

# Each record contains data for a single airport, including:
#       - State
#       - Airport code
#       - Air carrier operations
#       - Total operations (including air carrier)
#       - Percent of air carrier operations

# STEP 3: Exclude any airport with no Air Carrier operations
flight.data = flight.data[flight.data$Air_carrier_ops>0,]

# STEP 4: Rename column "State" to "State.abb" to be consistent with earlier usage
names(flight.data)[names(flight.data)=="State"] = "State.abb"

# Raw data has flights by airport and by state, so need
#       a data frame that summarizes data by state
state.data.raw = flight.data[,c("State.abb","Air_carrier_ops")]
# Make State.abb a factor
#state.data.raw$State.abb = as.factor(state.data.raw$State.abb)

# Build a summary of traffic by state
state.total.ops = NULL
state.total.events = NULL
unique.state.full = NULL
unique.state = unique(state.data.raw$State.abb)
for(i in 1:length(unique.state)){
        # Get air carrier operations for each state
        state.total.ops = c(state.total.ops, sum(state.data.raw$Air_carrier_ops[state.data.raw$State.abb == unique.state[i]]))
        # Get total events by state
        state.total.events = c(state.total.events, sum(laserhits$State_abb == unique.state[i]))
        unique.state.full = c(unique.state.full, as.character(usps.state.abb[which(usps.state.abb$State_abb == unique.state[i]),"State"]))
}
state.data = as.data.frame(cbind(unique.state.full, unique.state, state.total.ops,state.total.events))
colnames(state.data) = c("State", "State.abb", "State_air_carrier_ops", "State_laser_events")
state.data$State_air_carrier_ops = as.numeric(as.character(state.data$State_air_carrier_ops))
state.data$State_laser_events = as.numeric(as.character(state.data$State_laser_events))

# Add column for events per 100K
state.data$Rate_per_100K = round(100000*(state.data$State_laser_events/state.data$State_air_carrier_ops), digits = 2)


# Add column for ratio events per 100K compared to all US
rate_per_100K_US = 100000*(sum(state.data$State_laser_events)/sum(state.data$State_air_carrier_ops))
state.data$Rate_per_100K_ratio_US = round(state.data$Rate_per_100K/rate_per_100K_US, digits = 2)


# Add column to state.data data frame for percentage of total air carrier flights
# usa.tot.flights = sum(state.data$State_air_carrier_ops)
# state.data$Percent.tot.usa = round(100*state.data$State_air_carrier_ops/usa.tot.flights, digits = 3)

# Add column to state.data data frame for percentage of total laser events
# usa.tot.laser.events = sum(state.data$State_laser_events)
# state.data$Percent.events.usa = round(100*state.data$State_laser_events/usa.tot.laser.events, digits = 3)

# Add column to state.data data frame for rate ratio of percentage of total laser events over percentage of flights
# state.data$Rate_ratio = round(state.data$Percent.events.usa/state.data$Percent.tot.usa, digits = 3)

# Add column for rank of rate ratios
state.data$Rate_ratio_rank = as.integer(rank(-state.data$Rate_per_100K_ratio_US))

# === END INPUT FAA FLIGHT STATISTICS ===


# === METROPOLITAN AREA ANALYSIS

metro.areas = as.character(sort(unique(laserhits$Metro_area)))

# Each metro area is analyzed to create the following data frame variables:
#       Area
#       Events
#       Traffic
#       Percent_US_traffic
#       Event_rate
#       Event_rate_ratio_to_US

# Ensure Metro_area variable is of type character

laserhits$Metro_area = as.character(laserhits$Metro_area)
all.areas = c(metro.areas,"US_metro", "US_non_metro","US_all")
area.events = rep(NA,length(all.areas))
area.traffic = rep(NA,length(all.areas))

# traffic.airports.ndx = rep(NA,length(metro.areas))
# traffic.airports = rep(NA,length(metro.areas))
# list(traffic.airports)

for (i in 1:(length(all.areas) - 3)){
        area.rows = which(laserhits$Metro_area == all.areas[i])
        area.events[i] = length(area.rows)
        area.airports = as.character(unique(laserhits$Airport[area.rows]))
        # traffic.airports.ndx = which(flight.data$Airport %in% area.airports)
        # print(paste(all.areas[i], which(flight.data$Airport %in% area.airports))) # Included only in testing phase
        area.traffic[i] = sum(flight.data$Air_carrier_ops[which(flight.data$Airport %in% area.airports)])
}

# Total metro area figures
msa.events.total = sum(area.events[1:(length(area.events)-3)], na.rm = TRUE)
msa.traffic.total = sum(area.traffic[1:(length(area.traffic)-3)], na.rm = TRUE)
area.events[length(area.events)-2] = msa.events.total
area.traffic[length(area.events)-2] = msa.traffic.total

# The is data for the row consisting of total metro US (US_non_metro)
area.events[length(area.events)-2] = msa.events.total
area.traffic[length(area.events)-2] = msa.traffic.total

# The is data for the row consisting of entire US (US_all)
area.events[length(area.events)] = nrow(laserhits)
area.traffic[length(area.events)] = sum(flight.data$Air_carrier_ops)

# The is data for the row consisting of non-metro US (US_non_metro)
area.events[length(area.events)-1] = nrow(laserhits)-msa.events.total
area.traffic[length(area.events)-1] = sum(flight.data$Air_carrier_ops) - msa.traffic.total


# Summarize the metro areas, non metro areas, and the US as a whole
metro.summary = cbind(all.areas, area.events, area.traffic)
metro.summary = as.data.frame(metro.summary, stringsAsFactors
 = FALSE)
colnames(metro.summary) = c("Metro_area", "Events", "Traffic")
metro.summary$Events = as.numeric(as.character(metro.summary$Events))
metro.summary$Traffic = as.numeric(as.character(metro.summary$Traffic))
metro.summary$Rate_per_100K = round(100000*metro.summary$Events/metro.summary$Traffic, 2)
metro.summary$Rate_ratio_US = round(metro.summary$Rate_per_100K/metro.summary$Rate_per_100K[nrow(metro.summary)], digits = 2)
# metro.summary$Rate_ratio_rank = rank(-metro.summary$Rate_ratio)


# === END METROPOLITAN AREA ANALYSIS

# === AIRILNE FLIGHT IDENTIFIER ANALYSIS
# Goal is to get a quick survey of the top flight identifiers,
#       specifically categorizing them by the three-letter ICAO code
#       at the beginning of the flight identifier
#
# Included events are those with a known value for the following
#       Flight_ID, Model, Altitude, Airport, City, State

included.flight.ids = which(complete.cases(laserhits[,c("Flight_ID", "Model", "Altitude", "Airport", "City", "State")]))

# === END AIRILNE FLIGHT IDENTIFIER ANALYSIS

# === MISC TESTS ===
# Find all the cities with any word listed in all caps
# Note: May have multiple hits on one city, eg. LOS ANGELES
cap_cities = unlist(str_match_all(laserhits$City, "\\b[A-Z]+\\b"))


date.test  = nrow(laserhits) == sum(sapply(laserhits$Date, function(x) !all(is.na(as.Date(as.character(x),format="%d-%b-%y")))))
# ==== END OF MISC ===


# Save the data that will be further processed
write.csv(laserhits, file = "laserhits_playpen_airsafe.csv")

# === QUICK SUMMARY ===

date.range = as.numeric(as.Date(max(laserhits$Date))) - as.numeric(as.Date(min(laserhits$Date))) + 1


# Dates with no reports
# Look for gaps (time differences) from day 2 until the last day
unique.dates = sort(unique(laserhits$Date))
date.gaps = as.numeric(unique.dates[2:(length(unique.dates))]-unique.dates[1:(length(unique.dates)-1)])
date.gaps.ndx = which(date.gaps>1)


# === END OF QUICK SUMMARY ===


# === EXPLORATORY DATA ANALYSIS ===
# Distribution, histogram, and summary of the number of daily laser encounters

# Create a complete table and histogram of days with x-amount of strike events
#  by adding a vector of zero values equal to the number of days with no strikes


daily.strikes=c(rep(0,date.range-length(table(laserhits$Date))),as.data.frame(table(laserhits$Date))[,2])


Hour_num = as.numeric(as.character(laserhits$Hour_Category))


# Chi-square test to see if reports uniformly distributed througout the year


# === INPUT STATE ABBREVIATIONS ===
# Note that in ggplot2, the map function used here has the 48 states of the
# continental US plus the district of Columbia
# For the map displays used in this study, only those 48 states plus the
# District of Columbia will be used in the rest of this analysis'
# However, the FAA air traffic data used in this analysis, inlcudes
# several other areas in addition to the 50 states and the District of
# Columbia. They were previously added earlier in this program.


# extra.abb = c("DC", "GU", "MP", "PR", "VI")
# extra.name = c("District of Columbia", "Guam", "Northern Mariana Is",
#                "Puerto Rico", "Virgin Islands")

# Now append them to the R-provided list of 50 state names and abbreviations
# every.abb = c(state.abb,extra.abb)
# every.state = c(state.name,extra.name)

# For ggplot, all but the lower 48 and DC will be included.
# Full set in usps.state.abb data frame, with the following columns:
#       States: usps.state.abb$State
#       Abbreviations: usps.state.abb$State_abb

# Will take out Alaska, Hawaii, Guam, Northern Mariana Islands,
#       Puerto Rico, and Virgin Islands
used.abb = setdiff(usps.state.abb$State_abb, c("AK", "HI", "GU", "MP", "PR", "VI"))
used.state = setdiff(usps.state.abb$State, c("Alaska", "Hawaii", "Guam",
                                    "Northern Mariana Islands","Puerto Rico",
                                    "Virgin Islands"))

# Will only use a combination of those rows in state.data that correspond
#       to the use.state list

state.combo = state.data[which(state.data$State %in% used.state),]

# The data frame state.combo information that will be used to build four CONUS maps:
#       1. Events by state (lower 48 plus DC)
#       2. Air carrier traffic by state
#       3. Rate of laser events per 100K air carrier flights by state
#       4. Ratio of state event rate compared with the national rate


# The following gets the state boundary data into a data frame
# Regions happpen to be lower case state names
states <- map_data("state")

# map_data is from ggplot2, creates data frame from information from the
# 'maps' package which has a polygon associated with each state.

# ggplot2 likes their full state names (regions) in lower case, so a
# new "regions" column in state.data will have values in lower case.
state.combo$region = tolower(state.combo$State)

# Each row corresponds to a state.
# Create a new variable called 'region' which is the lower case
# version of each state name.
# Note that ggplot2 related variables use lower case of state
# and the map of the states only includes 49 areas
# corresponding to the 48 states in the continental US plus
# the District of Columbia.
#
#
# Merging data frames "states" and "state.combo" by common variable "region")"
#
# The merged data frame "chrono" has relevant laser data in each state region
# Note that because "states" has more rows than "state.combo", the
# number of rows of the merged data.frame will match the
# data frame with the highest number of rows
# In other words, nrow(<merged data frame>) = nrow(states) and
# some of the data from the data frame with the smaller
# number of rows will be duplicated.

choro = merge(states, state.combo, sort = FALSE, by = "region")

# In the 'states' data frame, each region has a unique number (variable 'order'),
# from 1 to nrow(states). The line below reorders the rows by 'order' from 1 to
# nrow(choro). Note again that nrow(choro) = nrow(states)
choro <- choro[order(choro$order), ]

# Note that there are 50 states (regions) but 63 groups.

```

###Summary###
This study reviewed nine years of reports (2010-2018) compiled by the Federal Aviation Administration (FAA) concerning unauthorized laser illuminations of larger capacity transport aircraft (airliners, whether used for passenger or cargo service, that are designed to carry 60 or more passengers) in the US.

Among the findings was that the rate of reported laser events varies widely across the US (the 50 states, the District of Columbia, Puerto Rico, Guam, and other US territories), whether one looks at the rates in individual states or if one looks at the rates within selected US Census-designated Metropolitan Statistical Areas (MSAs).

In addition to examining the reporting rates, some of the insights from a review of the data included the following:

- There were a total of `r format(nrow(laserhits), digits=5, big.mark = ",")` FAA laser incident reports from the years 2010-2018 involving air carrier aircraft that also included information about the location of the encounter.

- Over that nine-year span, on any given day in the United States, there was a `r format((length(unique(laserhits$Date))/date.range)*100, digits=3, big.mark = ",")`% chance that at least one large capacity (60 seats or greater) airliner had a potentially dangerous encounter with a laser beam. In other words, in the `r format(date.range, big.mark=",")`-day period from 1 Janury 2010 to 31 December 2018, there were only `r date.range-length(table(laserhits$Date))` days without at least one reported laser incident involving a larger air carrier type aircraft.

- There were an average of `r format(nrow(laserhits)/date.range, digits=3, big.mark = ",")` laser encounters per day, with as many as `r max(sort(table(laserhits$Date)))` strikes in a single day.

- Laser events were more likely on late Friday night and early Saturday morning, during the months of July through December, and from 0000-0600 hours UTC.

- The average rate of reported laser incidents in the United States was `r metro.summary$Rate_per_100K[nrow(metro.summary)]`  per 100,000 air carrier flights. There were `r sum(state.combo$Rate_per_100K >= metro.summary$Rate_per_100K[nrow(metro.summary)])` states and `r sum(metro.summary$Rate_per_100K >= metro.summary$Rate_per_100K[nrow(metro.summary)])-1` MSAs with a higher encounter rate.

- Southwest, the airline with the highest number of reported laser encounters, had an average of over one reported encounter per day.

- Among the 50 states, Puerto Rico, and the District of Columbia, the reported rate of reported encounters, measured by number of encounters per 100K flights, ranged from a low of 2.52 for Alaska to a high of 992.65 for Delaware.

- Among the 44 MSAs that were analyzed, the rate ranged form a , went from a low of 1.99 for Anchorage, AK to a high of 81.48 for the Riverside-San Bernardino-Ontario, CA Metropolitan Statistical Area, which is east of Los Angeles and includes the California counties of Riverside and San Bernardino counties.

###Background###
In the US, the Federal Aviation Administration has long recognized that unauthorized laser illuminations of aircraft may have numerous hazardous effects on aircrew, including distraction, glare, afterimage, flash blindness, and, in extreme circumstances, persistent or permanent visual impairment ([FAA Advisory Circular 70-2A](http://www.faa.gov/documentLibrary/media/Advisory_Circular/AC_70-2A_.pdf)).

As part of their effort to deal with the hazards posed by lasers, the FAA has encouraged air crew members, air traffic controllers, and the general public to submit reports of aircraft being illuminated by lasers. The FAA has collected this kind of data since at least 2004, and in 2011 published a study ([Report DOT/FAA/AM-11/7](https://www.faa.gov/data_research/research/med_humanfacs/oamtechreports/2010s/media/201107.pdf)) that analyzed 2,492 laser events that occurred in the US from 2004-2008.

Since 2008, the FAA has received substantially more reports. [The FAA's Laser News, Laws, & Civil Penalties page] (https://www.faa.gov/about/initiatives/lasers/laws/) provides a link to Excel files of data covering 2010-2018.


###Focus of this study###
While the events in the FAA databases included reports from all sectors of aviation, including military operations, helicopters, and lighter-than-air aircraft, this study focused on events involving civilian operations involving large transport aircraft that were designed to carry 60 or more passengers. This in part due to the greater potential risk to both life and property when this kind of aircraft is struck by a laser.

###Aircraft models and air traffic used in this study###
Because this study focused on larger airliner type aircraft, the FAA traffic data used to compute laser encounter rates helped define what models were included. The airport traffic data was taken from the ([FAA's Operations Network (OPSNET)](https://aspm.faa.gov/opsnet/sys/Airport.asp)) site and covered the years 2010-2018.

The study focused on air carrier laser encounters, and only used air carrier traffic information. FAA defines air carrier aircraft as aircraft with a seating capacity of more than 60 seats or a maximum payload capacity of more than 18,000 pounds, carrying passengers or cargo for hire or compensation.

The events in this study, all of which were involved in civil aircraft flight operations,  involved the following aircraft models:

* **Airbus:** A220, A300, A310, A318, A319, A320, A321, A330, A340, A350, A380

* **ATR:** ATR 72

* **Boeing:**
707,  727, 737, 747, 747, 757, 767, 777, 787

* **Boeing (McDonnell-Douglas):** 717, DC-8, DC-9, DC-10, MD-10, MD-11, MD-80, MD-90

* **Bombardier (Canadair):** C100, CRJ-700, CRJ-900

* **Bombardier (de Havilland):** Dash 8

* **Embraer:** ERJ 170, ERJ 175, ERJ 190


###Who is this report for?###
This report may be useful for the following kinds of groups:

* Anyone who wants to educate flight crews, airport authorities, law enforcement and other relevant groups as to the extent of the problem (This could be done just cutting and pasting the graphics and basic stats in the report).

* Organizations that deal with policies related to the commercial use of lasers, for example pushing for specific limits on what can be sold to the public.

* Organizations currently involved in understanding or reducing aviation safety risks associated with inappropriate use of lasers.

* Organizations, especially law enforcement agencies that, that seek to understand when and where the risk of laser strikes are highest.

###Methods###
After downloading the data and removing records that did not contain sufficient information on the location of the laser encounter, the data was processed in order to summarize the likelihood of a laser encounter by geographical area (specifially, states, territorries, and selected metropolitan areas), time of day, day of the week, and month of the year. Heat maps were used to help illustrate these relationships.

####Laser encounter data preparation####
The raw laser encounter data was included in an Excel file with each sheet containing information for one calendar year. The various sheets for 2010 to 2018 (the last available complete year at the time of this study) were combined to form one CSV file. There were several variables included for each record, including the following that were used in this study:

- Date
- Time (UTC)
- Aircraft ID
- Aircraft type
- Location (IATA or ICAO code)
- City
- State


The raw data file from the FAA contained numerous cases of incorrect data with respect to location (airport, city, and state), including misspellings and capitalization errors,as well as missing data. The events were manually reviewed to correct these errors when sufficient information was contained in the rest of the record.

Also, for consistency, locations were identified using the three-character IATA codes when they were available for an airport, navigation aid, or other location. Where IATA codes were not available, four-character ICAO codes were used.

Because part of this study focused on air carrier related laser events in selected metropolitan areas, part of the data preparation included adding three variables:

- **Event_ID:** A uniuqe identifier

- **Aircraft_Type:** Category variable for type of aircraft

- **Metro_area:** Identifier for a Metropolitan Statistical Areas (MSAs) as identified by the US Census publication Annual Estimates of the Resident Population: April 1, 2010 to July 1, 2018.

The MSAs used met one or more of the following criteria:

- The MSA was in the top 40 in population in 2018
- The MSA contained an airport that was a hub or major base of operations for one of the top 10 airlines by air carrier traffic

####Laser encounter preliminary data cleaning###
In addition to adding the above variables to each record, the raw data file from the FAA contained numerous cases of missing or incorrect data. The events were manually reviewed to correct these errors when sufficient information was contained in the rest of the record. The following types of changes were made:

- Incorrect, misspelled, or missing data for any variable was corrected or filled in when enough informaiton was available in the record.

- Removing duplicate records (also done a second time within the R program)

- Missing, incorrect, or incomplete data that could not be found or corrected were coded as "UNK".

- Airports, navigational aids,  and other locations identified using the three-character IATA codes or four-character ICAO codes whenever possible,

- In some cases, the airport location code was substituted for a navigational aid code when they were located at or near an airport. For example, sevearl reports for the city of Baltimore, MD used the 'BAL' IATA code, which is for the VORTAC at the field, and the arport code 'BWI' was substituted.

- Reported Laser colors were standardized by making all inputs with multiple identified colors of the form Color1/Color2, with the colors listed alphabetically, insuring that the first letter in a single word color identifier was capitalized, and correcting misspellings. Example: Blue and Green, became Multiple (Blue, Green), and Blue or green become Multiple (Blue or Green)

A variety of resources were used to identify key data for some records, including:

- [World Airport Codes](https://www.world-airport-codes.com/)
- [FlightAware](https://flightaware.com)
- [AirFleets](https://www.airfleets.net/home/)
- [Nations online](http://www.nationsonline.org/oneworld/IATA_Codes/airport_code_list.htm)
- [Locations identifier search tool](https://nfdc.faa.gov/xwiki/bin/view/NFDC/Location+Identifiers+Search+Tool)
- [AirNav.com](https://www.airnav.com/airports/)
- [US Census Bureau](https://www.census.gov/data/datasets/time-series/demo/popest/2010s-total-metro-and-micro-statistical-areas.html)


####Preprocessed data and data dictionary###
This preprocessed data is made available at  [http://www.airsafe.com/analyze/faa_laser_data_2010_2018.csv](http://www.airsafe.com/analyze/faa_laser_data_2010_2018.csv)


The data dictionary that describes the variables in each record is available at [http://www.airsafe.com/analyze/faa_laser_data_dictionary.pdf](http://www.airsafe.com/analyze/faa_laser_data_dictionary.pdf)


```{r, echo=FALSE, message=FALSE}

# First, ensure we have the packages we need

if (!require("downloader")){
      install.packages("downloader")
}
library(downloader)

```


###Data transformation###
Additional data transformations and changes would occur after uploading:

- Initializing all data as type character prior to addiontal changes
- Removing leading and trailing spaces
- Changing all unknown data to type 'NA'
- Removing duplicated records
- Using the date variable to create new variables for day of the month, day of the week, month, and year
- Using the time variable to create to new varibles for time in the HH:MM format, and a categorical variable with 25 values representing both the UTC hour and an uknown time value

Once the revisions were complete, a total of `r length(duplicated.ndx)` duplicated records were removed.

Further processing changed the UTC times to an integer from one to 24 to coincide with the hour of occurrence. Additional variables were added for the day of the week and the month corresponding to the date.

Dates in the FAA data were in form 5-Jan-06, and were converted to the date format of yyyy-mm-dd. The converted dates were used to create two additional variables based on the date, the day of the week and, the month of the year, to ensure proper ordering, the two new variables were made into factors and ordered as they would be in a calendar.

```{r, echo=FALSE}

# First estimate of useful records
initial_records = nrow(laserhits)
useful.vars = c("Date", "Time","Aircraft_Type", "Altitude", "City", "State")
useful.cases.ndx = complete.cases(laserhits[,useful.vars])
laserhits = laserhits[useful.cases.ndx,]

# === END POST-UPLOAD DATA FRAME FORMATTING AND CLEANING ===

```

There are initially `r initial_records` total records, but only  `r sum(useful.cases.ndx)`  records have data in the most important variables: `r useful.vars`.


```{r, echo=FALSE}

# === REFORMAT OR EXPAND VARIALBES ===


# Now create the new variable based on the log of the transformed Altitude variable
laserhits$Altitude_log = round(log(new_variable), digits=4)

# Drop unused factors for selected variabls
# laserhits[,factor.vars] = droplevels(laserhits[,factor.vars])
# paste("Resulting data frame has",nrow(laserhits), "rows and", ncol(laserhits), "columns:")
# colnames(laserhits)
# === END REFORMAT OR EXPAND VARIALBES ===

# === CREATE VARIABLE OF STATE ABBREVIATIONS ===

# Now append them to the R-provided list of 50 state names and abbreviations
usps.abb = as.character(c(state.abb,extra.abb))
usps.state = as.character(c(state.name,extra.name))
usps.state.abb = data.frame(cbind(usps.state,usps.abb))
colnames(usps.state.abb) = c("State","State_abb")


# Will create a variable with the two-letter USPS abbreviation

# Initialize the full state name abbreviation variable with a number of NA values equal to the
# number of input records to ensure that the final vector will be compatible with ntsb.data
laserhits$State_abb = rep(NA,nrow(laserhits))

# Now replace an NA value with an appropriate two-letter code.
#       If not an appropriate state or territory name, remains as NA
for (i in 1:nrow(laserhits)) {
        if(laserhits$State[i] %in% usps.state) {
                laserhits$State_abb[i] = usps.abb[which(usps.state == laserhits$State[i])]
        }
}

# Check if there are any records with a remaining NA value for the state code

# Eliminate any records with a remaining NA value for the state code
laserhits = laserhits[!is.na(laserhits$State_abb),]


# === END INPUT FAA FLIGHT STATISTICS ===
```

###Quick summary of the data###

- From 2010 to 2018, there were `r format(nrow(laserhits), digits=5, big.mark = ",")` encounters where a laser beam affected one or more aircraft at or near at least `r format(length(table(laserhits$Airport)), digits=4, big.mark = ",")`  unique airports or other locations.

- During this nine-year period, there was an average of `r format(nrow(laserhits)/date.range, digits=3)` laser encounters per day, with as many as `r max(table(laserhits$Date))` strikes in a single day. The median number of daily laser encounters was `r median(table(laserhits$Date)[[1]])`.

- There were only `r date.range-length(table(laserhits$Date))` days from the nine-year period 2010-2018 with no reported laser strikes on aircraft in the United States. In other words, on any given day in the United States, there was a `r format((length(unique(laserhits$Date))/date.range)*100, digits=3)`% chance that at least one aircraft will have a potentially dangerous encounter with a laser beam.

Below are several summary graphics illustrating the distribution of laser encounters by:

- Frequency
- Year
- Month, and
- Day of the week

```{r, echo=FALSE}

# Create a complete table and histogram of days with x-amount of strike events
#  by adding a vector of zero values equal to the number of days with no strikes
daily.strikes=c(rep(0,date.range-length(table(laserhits$Date))),as.data.frame(table(laserhits$Date))[,2])

# Note that adding 0.001 to the vector of values helps to align the axes with
# the hourly range of encounter
hist(daily.strikes+0.5, main="Figure 1: Distribution of number of laser encounters in a day",
     xlab="Number of strikes in a day", breaks = seq(-1,36,by=1), xlim=c(-1,40),
     include.lowest=TRUE, col="dodgerblue")

summary(daily.strikes)

```

The following histograms illustrate the distribution of encounters by year, month,  day of the week, and time of day (UTC) respectively.

```{r, echo=FALSE}
plot(as.factor(laserhits$Year), main = "Figure 2: Distribution of number of laser encounters by Year (2010-2018)",xlab = "Year", col="dodgerblue")

plot(laserhits$Month, main="Figure 3: Distribution of number of laser encounters by month  2010-2018",
     ylab = "Number of encounters", xlab="Month of the year (2010-2018)", col="dodgerblue")

plot(laserhits$Weekday, main="Figure 4: Distribution of laser encounters by day of the week (2010-2018)",
     ylab = "Number of encounters", xlab="Day of the week", col="dodgerblue")


plot(laserhits$Hour, main = "Figure 5: Distribution of laser encounters by hour (2010-2018) ",xlab = "Hour", col="dodgerblue")

```

###Testing for uniform distribution of laser encounters###
A chi-square test was used to test the null hypothesis that the laser strikes were uniformly distributed by the day of the week or the month of the year. The null hypothesis was rejected in both cases because the p-value was much less than 0.05:

- Day of the week p-value: `r format(chisq.test(table(laserhits$Weekday))[[3]], digits=3)`
- Month of the year p-value:  `r format(chisq.test(table(laserhits$Month))[[3]], digits=3)`

The distribution by both day of the week and month of the year can be displayed by a table and tested as well for uniformity.

###Table 1: Distribution of laser encounters by month and day of the week ###
The following table describes the distribution of laser encounters by day of the week and month of the year.

```{r, echo=FALSE}
# Table of laser encounters by month of the year and day of the week
print(table(laserhits$Month,laserhits$Weekday),row.names=FALSE)
```
As was the case when conducting a chi-square test on the distribution of laser encounters by day of the week or month of the year, when considering the two together, the null hypothesis of a uniform distribution of strike encounters by a combination of day of the week and month of the year would also be rejected (p-value of `r format(chisq.test(table(laserhits$Month,laserhits$Weekday))[[3]], digits=3)`.

####Using heat maps to illustrate relationships####

It is also possible to visually depict this non-uniform distribution using heat maps. The heat map would reflect the data in the previous table, with 84 cells representing a combination of the month and day of the week. The colors correspond to a level of intensity with white being on the low end of the scale and dark blue on the upper end.

#####Heat map 1: Showing months with the most hits#####
There are three ways to display this heat map, all of which use the data in Table 1. In the first option, the darkest cell corresponds to the month of the year and day of the week with the most laser encounters.

```{r, echo=FALSE}
# Heat maps will use a color pallette that goes from white for lowest to dark blue for the highest value

palette = colorRampPalette(c('#ffffff','#0000ff'))(64)

heatmap(table(laserhits$Month,laserhits$Weekday),Rowv=NA, Colv=NA,revC=TRUE,
        scale="none", col = palette, margins=c(5,1),
        main="Figure 6: Laser encounters by month and day")
```

The above map shows that July through November tends to have more laser encounters, as does Friday and Saturday.

#####Heat map 2: Showing days of the week with the most hits#####
By scaling the heat map by the row values (month of the year), the darker cells in each row correspond to the days of the week with the most laser encounters. This means that a column that is consistently darker would correspond to the days of the week that is more likely to have laser encounters.

```{r, echo=FALSE}
heatmap(table(laserhits$Month,laserhits$Weekday),Rowv=NA, Colv=NA,revC=TRUE,
        scale="row", col = palette, margins=c(5,1),
        main="Figure 7: Laser encounters scaled by month")
```

This second heat map show that for most months of the year, Friday and Saturday have a consistently higher number of laser encounter reports than other days of the week.

#####Heat map 3: Showing months of the year with the most hits#####
By scaling the heat map by the column values (day of the week), the darker cells in each column correspond to the months with the most laser encounters.

```{r, echo=FALSE}