In most situations, you will not be dealing with a single FPOD file, but with a large number of files generated by many different pods deployed at different sites. This vignette demonstrates a few useful tips and tricks for dealing with multiple files.
Reading multiple files
First, load the package and get the path to some example data.
library(fpod)
fn <- fp_example("gullars_period1.FP3") # <- example FP3 file
Let’s imagine a scenario where we have 5 different FP3 files that we want to read. Since the fpod package only bundles one example data set, for the purposes of this vignette we’ll “simulate” the multiple-files situation by reading that same file several times and running our combination logic on the duplicated data set.
fpod_files <- rep(fn, 5) # simulate 5 FPOD files
basename(fpod_files)
#> [1] "gullars_period1.FP3" "gullars_period1.FP3" "gullars_period1.FP3"
#> [4] "gullars_period1.FP3" "gullars_period1.FP3"
As we can see, this gives us a character vector with five filenames. Normally, we wouldn’t specify the files to read manually like this, but rather generate the vector automatically by calling list.files() on the directory where we’ve stored the FP3 files, e.g.:
# fpod_files <- list.files("/users/andre/projects/fpod_troms/data", pattern = "\\.FP3$", full.names = TRUE, recursive = TRUE)
This would give us a vector of all FP3 files stored in the specified directory, including those tucked away in subfolders, and it would automatically pick up any files added since the last time the code was run.
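As a self-contained illustration of list.files(), the sketch below builds a small throw-away folder tree under tempdir() with hypothetical file names and then collects the FP3 files from it. Escaping the dot in the pattern (`"\\.FP3$"`) restricts matches to the actual file extension. The `ignore.case = TRUE` argument is an optional extra, assuming some of your files might have a lower-case `.fp3` extension.

```r
# Build a small throw-away directory tree with empty dummy files
# (hypothetical names, for illustration only).
root <- file.path(tempdir(), "fpod_demo")
dir.create(file.path(root, "site_a"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "site_b"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(
  file.path(root, "site_a", "pod1_period1.FP3"),
  file.path(root, "site_b", "pod2_period1.fp3"),  # lower-case extension
  file.path(root, "site_b", "notes.txt")          # not an FP3 file
))

# "\\.FP3$" matches names ending in the literal extension ".FP3";
# ignore.case = TRUE also accepts ".fp3"; recursive = TRUE searches subfolders.
fpod_files <- list.files(root, pattern = "\\.FP3$", full.names = TRUE,
                         recursive = TRUE, ignore.case = TRUE)
basename(fpod_files)
```

Here `notes.txt` is left out, while both FP3 files are found even though they sit in different subfolders.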
Going back to the simulated file list used in this vignette, we can now use, for example, lapply to read all of the files into R in one go.
dat <- lapply(fpod_files, fp_read)
str(dat, 2)
#> List of 5
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
Alternatively, we can pre-allocate a list of the right length and fill it element by element with a conventional for-loop. It is not recommended to start with an empty vector and append to it iteratively using c(), although that might seem like the intuitive way of doing it. Each call to c() (i.e. each iteration of the loop) makes R allocate a new, longer object and copy all existing elements into it, which is highly inefficient. For small data sets this probably won’t be noticeable, but the cost grows roughly quadratically with the number of elements.
dat <- vector(mode = "list", length = length(fpod_files))
# seq_along() is safe even if fpod_files is empty (1:length() would give 1:0)
for (i in seq_along(fpod_files)) {
  dat[[i]] <- fp_read(fpod_files[[i]])
}
str(dat, 2)
#> List of 5
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#> $ :List of 4
#> ..$ header:List of 15
#> ..$ env :Classes 'data.table' and 'data.frame': 14400 obs. of 7 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ wav :Classes 'data.table' and 'data.frame': 15995 obs. of 3 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> ..$ clicks:Classes 'data.table' and 'data.frame': 82637 obs. of 14 variables:
#> .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0>
#> .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#> .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
Personally, I prefer the lapply approach because I think it is cleaner and less verbose. It is not necessarily faster than a pre-allocated for-loop, though; here, almost all of the time is spent inside fp_read() anyway. Both approaches demonstrated here are perfectly valid, as the identical str output above suggests.
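To make the comparison above concrete without needing fpod or any data files, the sketch below replaces fp_read() with a hypothetical stand-in, `read_stub()`, and builds the same list in three ways: lapply(), a pre-allocated for-loop, and the not-recommended c() pattern. The file names are made up. `identical()` confirms that all three produce exactly the same result, which is a stricter check than comparing str() output by eye.

```r
# Hypothetical stand-in for fp_read() (NOT part of fpod): returns a small
# list with a header and a click table so the sketch runs without data files.
read_stub <- function(path) {
  list(header = list(file = basename(path)),
       clicks = data.frame(click_no = 1:3, species = c("NBHF", "NBHF", "Sonar")))
}

files <- sprintf("pod_%02d.FP3", 1:200)   # made-up file names

# 1. lapply(): the result list is allocated once, internally.
via_lapply <- lapply(files, read_stub)

# 2. Pre-allocated list, filled element by element in a for-loop.
via_loop <- vector(mode = "list", length = length(files))
for (i in seq_along(files)) {
  via_loop[[i]] <- read_stub(files[i])
}

# 3. Growing with c() (not recommended): every iteration allocates a new,
#    longer list and copies all previous elements into it.
via_c <- list()
for (f in files) {
  via_c <- c(via_c, list(read_stub(f)))
}

identical(via_lapply, via_loop)
identical(via_lapply, via_c)
```

All three give the same list; the difference lies only in how much copying happens along the way, which matters more as the number of files grows.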
Combining data from multiple files
The next step depends on what we want to do with the data. First, let’s say we simply want to combine all NBHF clicks into one, potentially enormous, data.table.
This is pretty simple:
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following object is masked from 'package:base':
#>
#> %notin%
nbhf <- lapply(dat, function(x) {
  x$clicks[species == "NBHF"]
}) |> rbindlist()
Let’s have a look:
nbhf[, 1:7] # show only first 7 cols for brevity
#> pod time minute microsec click_no train_id species
#> <int> <POSc> <int> <int> <int> <int> <char>
#> 1: 7660 2024-12-07 16:06:24 606 24099690 510 3 NBHF
#> 2: 7660 2024-12-07 16:06:24 606 24192330 511 3 NBHF
#> 3: 7660 2024-12-07 16:06:24 606 24276885 512 3 NBHF
#> 4: 7660 2024-12-07 16:06:24 606 24355150 513 3 NBHF
#> 5: 7660 2024-12-07 16:06:24 606 24425345 514 3 NBHF
#> ---
#> 257946: 7660 2024-12-16 21:58:10 13918 10793125 82624 3 NBHF
#> 257947: 7660 2024-12-16 21:58:10 13918 10796900 82625 3 NBHF
#> 257948: 7660 2024-12-16 21:58:10 13918 10800740 82626 3 NBHF
#> 257949: 7660 2024-12-16 21:58:10 13918 10804635 82627 3 NBHF
#> 257950: 7660 2024-12-16 21:58:10 13918 10808595 82628 3 NBHF
In some cases, you may run into trouble with empty FP3 files (files with no registered clicks), e.g. if the POD was activated/deactivated by the tilt trigger, had bad batteries, or restarted for some other reason. Such files should probably be deleted, but if they aren’t, we can add a check in the body of the lapply call to handle them gracefully.
nbhf <- lapply(dat, function(x) {
  # Skip files without any clicks (missing or zero-row click table);
  # rbindlist() silently drops NULL elements
  if (is.null(x$clicks) || nrow(x$clicks) == 0L) return(NULL)
  x$clicks[species == "NBHF"]
}) |> rbindlist()
In many cases, however, you might want to summarize detection-positive minutes (DPMs) for all KERNO-F categories (NBHF, OtherCet and Sonar), and buzz-positive minutes (BPMs) for NBHF clicks.
Here’s one way we could do that, again using lapply:
dpm <- lapply(dat, function(x) {
  nbhf <- x$clicks[species == "NBHF"]
  dolphins <- x$clicks[species == "OtherCet"]
  sonar <- x$clicks[species == "Sonar"]
  nbhf$buzz <- fp_find_buzzes(nbhf)
  nbhf_dpm <- fp_summarize(nbhf)
  dol_dpm <- fp_summarize(dolphins)
  sonar_dpm <- fp_summarize(sonar)
  # checks to handle cases of no detections for each category
  if (all(is.na(nbhf_dpm$pod))) nbhf_dpm[, pod := x$header$pod_id]
  if (all(is.na(dol_dpm$pod))) dol_dpm[, pod := x$header$pod_id]
  if (all(is.na(sonar_dpm$pod))) sonar_dpm[, pod := x$header$pod_id]
  # join the three summaries by pod and minute; clashing column names get suffixes
  dpm <- merge(nbhf_dpm, dol_dpm, by = c("pod", "time"), suffixes = c("", "_dol"))
  dpm <- merge(dpm, sonar_dpm, by = c("pod", "time"), suffixes = c("", "_sonar"))
  # BPMs are only kept for NBHF, so drop the other bpm columns
  dpm[, -c("bpm_dol", "bpm_sonar")]
}) |> rbindlist()
dpm
#> pod time dpm bpm dpm_dol dpm_sonar
#> <int> <POSc> <int> <int> <int> <int>
#> 1: 7660 2024-12-07 06:01:00 0 0 0 0
#> 2: 7660 2024-12-07 06:02:00 0 0 0 0
#> 3: 7660 2024-12-07 06:03:00 0 0 0 0
#> 4: 7660 2024-12-07 06:04:00 0 0 0 0
#> 5: 7660 2024-12-07 06:05:00 0 0 0 0
#> ---
#> 71996: 7660 2024-12-17 05:56:00 0 0 0 0
#> 71997: 7660 2024-12-17 05:57:00 0 0 0 0
#> 71998: 7660 2024-12-17 05:58:00 0 0 0 0
#> 71999: 7660 2024-12-17 05:59:00 0 0 0 0
#> 72000: 7660 2024-12-17 06:00:00 0 0 0 0
Now we have our FPOD data in a format that is suited for plotting and analysis!
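For readers who want to see the filter-and-stack pattern used earlier without data.table, here is a base-R sketch on hypothetical stand-in data: `dat_stub` loosely mimics a list of fp_read() results (including one entry without a click table), and `file_names` is made up. Empty entries are skipped by returning NULL, which rbind() ignores when stacking data frames (rbindlist() likewise skips NULL items), and each row is tagged with the file it came from. With data.table, the `idcol` argument of rbindlist() offers a similar way to record each row's source element.

```r
# Hypothetical per-file results shaped loosely like fp_read() output
# (base R data.frames with made-up values).
dat_stub <- list(
  list(clicks = data.frame(pod = 1L, click_no = 1:4,
                           species = c("NBHF", "Sonar", "NBHF", "OtherCet"))),
  list(clicks = NULL),  # a file that produced no click table
  list(clicks = data.frame(pod = 2L, click_no = 1:3,
                           species = c("NBHF", "NBHF", "NBHF")))
)
file_names <- c("pod1.FP3", "empty.FP3", "pod2.FP3")  # hypothetical names

nbhf_list <- Map(function(x, f) {
  # Skip files with a missing or zero-row click table.
  if (is.null(x$clicks) || nrow(x$clicks) == 0L) return(NULL)
  # Base R needs x$clicks$species; data.table's [ looks up `species` in the table.
  cl <- x$clicks[x$clicks$species == "NBHF", , drop = FALSE]
  cl$file <- rep(f, nrow(cl))  # remember which file each row came from
  cl
}, dat_stub, file_names)

# rbind() drops the NULL entries when stacking data.frames.
nbhf_all <- do.call(rbind, nbhf_list)
rownames(nbhf_all) <- NULL
nbhf_all
```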
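A DPM only records whether a minute contains at least one detection, regardless of how many clicks fall inside it. The base-R sketch below shows that idea on made-up click times. `dpm_sketch()` is a hypothetical helper, not how fp_summarize() is implemented, and it labels each minute by its start time, which may differ from the convention in the fpod output above (where the first row is 06:01:00).

```r
# Hypothetical helper (not part of fpod): flags each one-minute bin after
# `start` as detection-positive (1L) if at least one click falls inside it.
dpm_sketch <- function(click_times, start, n_minutes) {
  # seconds since start -> zero-based minute index of each click
  secs <- as.numeric(difftime(click_times, start, units = "secs"))
  minute_idx <- floor(secs / 60)
  minute_idx <- unique(minute_idx[minute_idx >= 0 & minute_idx < n_minutes])
  dpm <- integer(n_minutes)   # every minute starts as 0 (no detection)
  dpm[minute_idx + 1] <- 1L   # one or many clicks in a minute -> 1
  data.frame(time = start + 60 * (seq_len(n_minutes) - 1),  # bin start times
             dpm = dpm)
}

start <- as.POSIXct("2024-12-07 06:00:00", tz = "UTC")
click_times <- start + c(5, 20, 65, 250)  # made-up click times, seconds after start
res <- dpm_sketch(click_times, start, n_minutes = 5)
res
sum(res$dpm)  # total number of detection-positive minutes
```

The two clicks in the first minute still count as a single DPM, which is why summing the column gives 3 here rather than 4.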