Advanced usage • fpod

In most situations, you will not be dealing with a single FPOD file, but with a large number of files generated by many different pods deployed at different sites. This vignette demonstrates a few useful tips and tricks for dealing with multiple files.

Reading multiple files

First, load the package and get the path to some example data.

library(fpod)
fn <- fp_example("gullars_period1.FP3") # <- example FP3 file

Let’s imagine a scenario where we have 5 different FP3 files that we want to read. Since the fpod package only bundles one example data set, for the purposes of this vignette, we’ll “simulate” the multiple files situation by re-reading that same file multiple times, and running our combination logic on the duplicated data set.

fpod_files <- rep(fn, 5) # simulate 5 FPOD files
basename(fpod_files)
#> [1] "gullars_period1.FP3" "gullars_period1.FP3" "gullars_period1.FP3"
#> [4] "gullars_period1.FP3" "gullars_period1.FP3"

As we can see, this gives us a character vector with five filenames. Normally, we wouldn’t want to manually specify our list of files to read in a variable like this, but rather generate the list automatically by calling list.files() on the directory where we’ve stored the FP3 files, e.g.:

#fpod_files <- list.files("/users/andre/projects/fpod_troms/data", pattern = "\\.FP3$", full.names = TRUE, recursive = TRUE)

This would give us a list of all FP3 files that are stored in the directory specified, including those that might be tucked away in subfolders, and it would automatically detect any files that may have been added since the last time the code was run.
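To see how the recursive search behaves, here is a minimal sketch that builds a throwaway folder structure in a temporary directory and searches it. The folder and file names (site_a, site_b, pod1.FP3 and so on) are made up for illustration, and the files are empty placeholders rather than real FPOD data.

```r
# Hypothetical directory layout, created in a temporary folder for illustration
data_dir <- file.path(tempdir(), "fpod_demo")
dir.create(file.path(data_dir, "site_a"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(data_dir, "site_b"), recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files; real FP3 files would come from the pods
invisible(file.create(file.path(data_dir, "site_a", "pod1.FP3")))
invisible(file.create(file.path(data_dir, "site_b", "pod2.FP3")))
invisible(file.create(file.path(data_dir, "site_b", "notes.txt")))

# Search all subfolders and keep only files ending in ".FP3"
found <- list.files(data_dir, pattern = "\\.FP3$", full.names = TRUE, recursive = TRUE)
basename(found)
#> [1] "pod1.FP3" "pod2.FP3"
```

The text file is ignored because it does not match the pattern, and both FP3 files are found even though they sit in different subfolders.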

But anyway, going back to the list we’re using for this vignette - we can now use, for example, lapply to easily read our list of files into R.

dat <- lapply(fpod_files, fp_read)
str(dat, 2)
#> List of 5
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...

Alternatively, we can pre-allocate a list of the right length and populate it element by element in a conventional for-loop. Creating an empty list and appending to it iteratively with c() may seem like an intuitive way of doing this, but it is not recommended: R then copies the accumulated data every time c() is called (i.e. in each iteration of the loop), which is highly inefficient. For small data sets, it probably won’t be noticeable, but the computational cost increases with data size.

dat <- vector(mode = "list", length = length(fpod_files))
for (i in seq_along(fpod_files)) { # seq_along() is safe even if the list of files is empty
    dat[[i]] <- fp_read(fpod_files[[i]])
}
str(dat, 2)
#> List of 5
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...
#>  $ :List of 4
#>   ..$ header:List of 15
#>   ..$ env   :Classes 'data.table' and 'data.frame':  14400 obs. of  7 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ wav   :Classes 'data.table' and 'data.frame':  15995 obs. of  3 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   ..$ clicks:Classes 'data.table' and 'data.frame':  82637 obs. of  14 variables:
#>   .. ..- attr(*, ".internal.selfref")=<pointer: 0x562f8b9acef0> 
#>   .. ..- attr(*, "start")= POSIXct[1:1], format: "2024-12-07 06:00:00"
#>   .. ..- attr(*, "on")= int [1:14400] 1 2 3 4 5 6 7 8 9 10 ...

Personally, I prefer the lapply approach because I think it is cleaner, less verbose and possibly more computationally efficient. But both approaches demonstrated here are perfectly valid, as is evident from the identical output from str above.
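The three ways of building the list can be compared side by side on toy data. In this sketch, read_one() is a hypothetical stand-in for fp_read() that returns a small list instead of reading a file, so the example runs without any FPOD data.

```r
# Hypothetical stand-in for fp_read(): returns a small list, reads no file
read_one <- function(i) list(id = i, values = seq_len(3) * i)

n <- 5

# Option 1: lapply builds the result list in one go
res_lapply <- lapply(seq_len(n), read_one)

# Option 2: pre-allocated list, filled element by element in a for-loop
res_loop <- vector(mode = "list", length = n)
for (i in seq_len(n)) {
    res_loop[[i]] <- read_one(i)
}

# Option 3 (discouraged): grow the list with c() in every iteration
res_grow <- list()
for (i in seq_len(n)) {
    res_grow <- c(res_grow, list(read_one(i)))
}

# All three give the same result; they differ only in how the list is built
identical(res_lapply, res_loop) && identical(res_loop, res_grow)
#> [1] TRUE
```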

Combining data from multiple files

The next step depends on what we want to do with the data. First, let’s say we simply want to combine all NBHF clicks into one potentially enormous data.table.

This is pretty simple:

library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following object is masked from 'package:base':
#> 
#>     %notin%
nbhf <- lapply(dat, function(x) {
    x$clicks[species == "NBHF"]
}) |> rbindlist()

Let’s have a look:

nbhf[, 1:7] # show only first 7 cols for brevity
#>           pod                time minute microsec click_no train_id species
#>         <int>              <POSc>  <int>    <int>    <int>    <int>  <char>
#>      1:  7660 2024-12-07 16:06:24    606 24099690      510        3    NBHF
#>      2:  7660 2024-12-07 16:06:24    606 24192330      511        3    NBHF
#>      3:  7660 2024-12-07 16:06:24    606 24276885      512        3    NBHF
#>      4:  7660 2024-12-07 16:06:24    606 24355150      513        3    NBHF
#>      5:  7660 2024-12-07 16:06:24    606 24425345      514        3    NBHF
#>     ---                                                                    
#> 257946:  7660 2024-12-16 21:58:10  13918 10793125    82624        3    NBHF
#> 257947:  7660 2024-12-16 21:58:10  13918 10796900    82625        3    NBHF
#> 257948:  7660 2024-12-16 21:58:10  13918 10800740    82626        3    NBHF
#> 257949:  7660 2024-12-16 21:58:10  13918 10804635    82627        3    NBHF
#> 257950:  7660 2024-12-16 21:58:10  13918 10808595    82628        3    NBHF
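If you also want to keep track of which file each row came from, one option is to name the list elements and pass idcol to rbindlist(). The sketch below uses two small made-up tables in place of real clicks tables; the file names and the reduced set of columns are assumptions for illustration only.

```r
library(data.table)

# Toy stand-ins for the clicks tables of two files (made-up subset of columns)
clicks_list <- list(
    "site_a.FP3" = data.table(pod = 1L, species = c("NBHF", "Sonar", "NBHF")),
    "site_b.FP3" = data.table(pod = 2L, species = c("OtherCet", "NBHF"))
)

# Filter each table, then stack them; the list names end up in the "file" column
nbhf_toy <- lapply(clicks_list, function(cl) cl[species == "NBHF"]) |>
    rbindlist(idcol = "file")
nbhf_toy
```

With real data, the names could be set with something like names(dat) <- basename(fpod_files) before combining.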

In some cases, you may run into trouble with empty FP3 files (files with no clicks registered), e.g. if the POD has been activated/deactivated by the tilt trigger, has bad batteries, or has restarted for some other reason. Those files should probably be deleted, but if they aren’t, we can add a check in the body of the lapply call to handle them gracefully.

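nbhf <- lapply(dat, function(x) {
    # assumption: an empty file yields a NULL or zero-row `clicks` element
    if (is.null(x$clicks) || nrow(x$clicks) == 0) {
        return(NULL) # rbindlist() skips NULL list elements
    }
    x$clicks[species == "NBHF"]
}) |> rbindlist()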
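As a small illustration of why empty results are harmless at the combination step, the following sketch shows that rbindlist() accepts both zero-row tables and NULL elements. The tables are made up and only mimic a subset of the clicks columns.

```r
library(data.table)

full  <- data.table(pod = 1L, species = c("NBHF", "NBHF"))
empty <- full[0L] # zero rows, same columns as `full`

# Zero-row tables and NULL elements contribute no rows to the result
combined <- rbindlist(list(full, empty, NULL, full))
nrow(combined)
#> [1] 4
```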

In many cases, however, you might want to summarize detection-positive minutes (DPMs) for all KERNO-F categories (NBHF, OtherCet and Sonar), and buzz-positive minutes (BPMs) for NBHF clicks. Here’s one way we could do that, again using lapply:

dpm <- lapply(dat, function(x) {
    nbhf <- x$clicks[species == "NBHF"]
    dolphins <- x$clicks[species == "OtherCet"]
    sonar <- x$clicks[species == "Sonar"]
    nbhf$buzz <- fp_find_buzzes(nbhf)
    nbhf_dpm <- fp_summarize(nbhf)
    dol_dpm <- fp_summarize(dolphins)
    sonar_dpm <- fp_summarize(sonar)

    # checks to handle cases of no detections for each category
    if (all(is.na(nbhf_dpm$pod))) nbhf_dpm[, pod := x$header$pod_id]
    if (all(is.na(dol_dpm$pod))) dol_dpm[, pod := x$header$pod_id]
    if (all(is.na(sonar_dpm$pod))) sonar_dpm[, pod := x$header$pod_id]
    
    dpm <- merge(nbhf_dpm, dol_dpm, by = c("pod", "time"), suffixes = c("", "_dol"))
    dpm <- merge(dpm, sonar_dpm, by = c("pod", "time"), suffixes = c("", "_sonar"))
    dpm[, -c("bpm_dol", "bpm_sonar")]
}) |> rbindlist()
dpm
#>          pod                time   dpm   bpm dpm_dol dpm_sonar
#>        <int>              <POSc> <int> <int>   <int>     <int>
#>     1:  7660 2024-12-07 06:01:00     0     0       0         0
#>     2:  7660 2024-12-07 06:02:00     0     0       0         0
#>     3:  7660 2024-12-07 06:03:00     0     0       0         0
#>     4:  7660 2024-12-07 06:04:00     0     0       0         0
#>     5:  7660 2024-12-07 06:05:00     0     0       0         0
#>    ---                                                        
#> 71996:  7660 2024-12-17 05:56:00     0     0       0         0
#> 71997:  7660 2024-12-17 05:57:00     0     0       0         0
#> 71998:  7660 2024-12-17 05:58:00     0     0       0         0
#> 71999:  7660 2024-12-17 05:59:00     0     0       0         0
#> 72000:  7660 2024-12-17 06:00:00     0     0       0         0

Now we have our FPOD data in a format that is suited for plotting/analyses!
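As one possible next step, a per-minute table like this can be aggregated to daily totals with data.table. The sketch below uses a small made-up table with the same pod, time, dpm and bpm columns as above; the values, and the choice of UTC as time zone, are assumptions for illustration.

```r
library(data.table)

# Toy per-minute table mimicking the columns of `dpm` (values are made up)
toy <- data.table(
    pod  = 7660L,
    time = as.POSIXct("2024-12-07 23:58:00", tz = "UTC") + 60 * 0:3,
    dpm  = c(1L, 0L, 1L, 1L),
    bpm  = c(0L, 0L, 1L, 0L)
)

# Sum detection- and buzz-positive minutes per pod and calendar day
daily <- toy[, .(dpm = sum(dpm), bpm = sum(bpm)),
             by = .(pod, date = as.Date(time, tz = "UTC"))]
daily
```

The four minutes straddle midnight, so the result has two rows: one positive minute on 2024-12-07, and two positive minutes (one of them with a buzz) on 2024-12-08.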