Access file creation and structure

Starting a new access file

Data collected onboard are stored in access files, located in the “OnBoard/access” subfolder. Ideally, a maximum of 20 hauls should be stored in a .accdb file. IMPORTANT: in case the data collection is done in parallel over multiple laptops, the “OnBoard” folder must be renamed with unique identifiers (e.g.: OnBoard_2024_PLE). Instructions to set up a folder on a new laptop are reported in the repository main page

To start a new file, copy the template version, which in 2024 it is called “bio_data_v2024_SOLEMON_template.accdb”, paste it in the same “OnBoard/access” (or OnBoard_2024_PLE/access) subfolder and rename it as “bio_data_v2024_SOLEMON_YEAR_N.accdb”, where YEAR is the reporting year and N is a progressive number. Example: “bio_data_v2024_SOLEMON_2024_1.accdb” . It does not matter if parts of the same haul are in different files, this is handled in post-processing. It does not matters if access with the same name are located in different laptops, this is handled in post processing.

Starting a new haul

Each access file contains a template sheet, called “cala_template”. To start a new haul, copy the template sheet, paste it and rename it as “cala_x”, where x is the haul number. Example: cala_1; cala_7bis. It does not matter if parts of the same haul are in different files, this is handled in post-processing.

Haul sheet Structure

The structure of the sheets, in 2024, is according to Figure1. In detail:

  • gear: gear code. Accepts A and D.
  • species_name: solemon code
  • length_mm: length of the individual, in millimeters
  • weight_g: weight of the individuals, in grams. This also host the cumulative weight for some species, refer to “deal with cumulative data” section for further details.
  • Sex: required only for target species, accepts F, M, I. Leave empty if sex data not required
  • Mat: required only for target species, just specify the stage (1, 2, etc..). Exceptions for crustaceans: refer to section “cr”
  • id_specimen: fill for specimens for which detailed samples (otolit, genetic etc.) were taken. It accepts numbers and letters. When filled a serial “fishID” number would be generated.
  • length_field2: disc length for elasmobranchs; free spot for other species
  • length_field3: disc width for elasmobranchs; free spot for other species
  • total_number: use this only when the total number is the real total number (no subsample)
  • subsample_number: placeholder for subsamples. The compilation depends on the case. read handbook
  • kg_field1: weight in kilograms, placeholder for cumulative number and subsamples. The compilation depends on the case
  • kg_field2: weight in kilograms, placeholder for cumulative number and subsamples. The compilation depends on the case
  • kg_field2: weight in kilograms, placeholder for cumulative number and subsamples. The compilation depends on the case
  • haul_weight: weight of the haul in kilograms. The need for compilation depends on the case
  • type_subsample: accepts values “C1_haul”, “C2_species”,“C3_benthos”, “C4_multiple”, “other”. The compilation depends on the case.
  • Notes: accept notes of any kind.
Figure 2: Example of the structure of the access file
Figure 2: Example of the structure of the access file

.accdb auto compilation general details

Auto compilation applies to some column of the access file. This mean that in the processing of the table, to empty cells is assigned the first available data that is found in the previous rows. When collecting data, for these column you need to specify the value just for the first observation, then you should fill the value again only when this change. These columns are:

  • gear
  • species_name
  • sex (only for crustaceans)

Collecting data

Standard procedure

Standard procedure applies when all the individuals for a given species are collected and reported

Target species

When reporting data for target species you should use only the first 7 columns. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species). The column id_specimen serves to store any kind of individual id (e.g.: otholits code, genetic samples).

Figure 3: example of compilation for target species species
Figure 3: example of compilation for target species species

Elasmobranchs

When reporting data for elasmobranchs you should use only the first 9 columns. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species). The column id_specimen serves to store any kind of individual id (e.g.: otholits code, genetic samples). Refer to the Figure 4 for reporting the 3 length measures

Figure 4: example of compilation for elasmobranchs species
Figure 4: example of compilation for elasmobranchs species

Other commercial species

When reporting data for other commercial species you should use only the first 4 columns. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species).

For this species category are recorded individual length and cumulative weight. The cumulative weight should be reported in the last record, as in Figure 5.

Figure 5: example of compilation for other commercial species
Figure 5: example of compilation for other commercial species

Shellfishes (when all the individuals are sampled)

When reporting data for shellfishes (MUREBRA, HEXATRU, OSTREDU) you should use the columns as reported in Figure 6. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species).

NB: in case of subsamples, refer to the dedicated section

Figure 6: example of compilation for shellfishes species
Figure 6: example of compilation for shellfishes species

Subsamples

Subsamples are taken when the amount of individuals in a given species is too high to be processed. Considering the onboard practice used in the solemon survey there are three cases of subsamples happening. Figure 7 reports the steps that are done according to the three cases. Each case is treated differently depending on the type of species, and the data treatment is explained in the dedicated sections.

In terms of onboard procedure, the cases general refers to:

  • Case 1 (C1): ALL the individuals from a haul are collected (sorted sample), then the sorted sample is subsampled and processed
  • Case 2 (C2): an unsorted subsample is taken from the haul, then it is sorted and ALL the individuals from the sorted subsample are processed
  • Case 3 (C3): a partially sorted subsample (e.g.: the subsample is taken after commercial species are sorted out, same procedure used for the discard subsample) is taken from the haul, then it is sorted and ALL the individuals from the sorted subsample are processed
  • Case 4 (C4): it applies for those shellfish coming in huge quantity (MUREBRA, HEXATRU). The subsampling happens in multiple steps (refer to dedicated section)
Figure 7: type of subsamples
Figure 7: type of subsamples

Target species

A typical case for subsample of target species is AEQUOPE, which usually occurs in large quantities and LFD is needed. In this case only a few individuals are measured to obtain the length structure, then it is needed to estimate total number (and sometimes total weight). Individuals that are processed for length and weight can be treated as any other target species. Regarding the other information needed to raise the values, you need to create a new record for the species (and gear) to store subsample data. There are two expected cases:

Case 1 (C1_species) - Target species

ALL individuals collected from the haul (sorted sample), then a subsample is taken (sorted subsample) for individual processing. One record is dedicated to the subsample details. The total weight of the sorted sample (in kilograms) is reported in the in the kg_field1 field. The weight of the sorted subsample is reported in the kg_field2. type_subsample is “C1_species”. The individuals subsampled are processed according to the standard procedure for target species, creating a new record for each specimen.

Figure 8: subsamples, case 1 for target species
Figure 8: subsamples, case 1 for target species



Case 2 (C2_haul) - Target species

A subsample is taken from the haul as soon as it is taken (unsorted subsample, before removing anything from the haul), then ALL the individuals in the subsample are collected (sorted subsample) for individual processing. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. type_subsample is “C2_haul”. The individuals subsampled are processed according to the standard procedure for target species, creating a new record for each specimen.

Figure 9: subsamples, case 2 for target species
Figure 9: subsamples, case 2 for target species



Case 3 (C3_benthos) - Target species

A subsample is taken from the haul after partial sorting (partially sorted subsample). This procedure is tipically applied to benthos subsample, which is taken after commercial species and litter are removed. Therefore, subsamples taken with this procedure need to be raised using the standard procedure to raise discard data. Then, ALL the individuals in the subsample are collected (sorted subsample) for individual processing. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. type_subsample is “C3_benthos”. The individuals subsampled are processed according to the standard procedure for target species, creating a new record for each specimen.

Figure 10: subsamples, case 3 for target species
Figure 10: subsamples, case 3 for target species



Non-target species (MUREBRA, HEXATRU, etc..)

A typical case for subsample of non-target species is MUREBRA. When it occurrs in large aggregations, total number (and sometimes total weight) are estimated from subsamples. To store information needed to raise the values, there are two expected cases:


Case 1 (C1_species) - Non target

ALL individuals collected from the haul (sorted sample), then a subsample is taken (sorted subsample) for estimating total number. One record is dedicated to the subsample details. The total weight of the sorted sample (in kilograms) is reported in the in the kg_field1 field. The weight of the sorted subsample is reported in the kg_field2. The number of individual in the sorted subsample is reported in the subsample_number. type_subsample is “C1_species”.

Figure 11: subsamples, case 1 for non target species
Figure 11: subsamples, case 1 for non target species



Case 2 (C2_haul) - Non target

A subsample is taken from the haul as soon as it is taken (unsorted subsample, before removing anything from the haul), then ALL the individuals in the subsample are collected (sorted subsample) for estimating total number. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. The number of individual in the sorted subsample is reported in the subsample_number. type_subsample is “C2_haul”.

Figure 12: subsamples, case 2 for non target species
Figure 12: subsamples, case 2 for non target species



Case 3 (C3_benthos) - Non target

A subsample is taken from the haul after partial sorting (partially sorted subsample). This procedure is tipically applied to benthos subsample, which is taken after commercial species and litter are removed. Therefore, subsamples taken with this procedure need to be raised using the standard procedure to raise discard data. Then, ALL the individuals in the subsample are collected (sorted subsample) for individual processing. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. The number of individual in the sorted subsample is reported in the subsample_number. type_subsample is “C3_benthos”.

Figure 13: subsamples, case 3 for non target
Figure 13: subsamples, case 3 for non target



Case 4 (C4_multiple) - Non target

it has been only used for MUREBRA and HEXATRU in cases of exceptional catches. ALL individuals of these two species are collected from the haul (partially sorted sample), then a subsample is taken from the partially sorted sample (partially sorted subsample). The partially sorted subsample is sorted, aka it is divede into species (sorted subsample). The sorted subsample of each species is subsampled againg (sorted sub subsample) for estimating total number. One record is dedicated to the subsample details. The weight of the partially sorted sample is reported in kg_field1. The weigth of the sorted subsample for each species is reported in kg_field2. The weight of the sorted sub-subsample is reported in kg_field3. The number of individual in the sorted sub-subsample is reported in the subsample_number. type_subsample is “multi”.

Figure 14: subsamples, case 4 for non target
Figure 14: subsamples, case 4 for non target



Processing data

Data are processed in R. It is good practice to process the data relatively frequently in order to create backup files and to produce some checks that can be used to adjust data.

process haul data

Data stored in the .accdb file are retrieved and handled by R scripts, located in the “R” folder. The required script is workflow_access_v2024.R.

process single hauls

To process single hauls, run the lines up to 15 and then please set the following parameters:

  • haul: number of the haul corresponding to the name given in the access file, but without the extension ’‘. Example: ’cala22’ in access is ‘22’ here, ‘cala45bis’ is ‘45bis’
  • db: suffix given to the acces file. E.g.: if access file name is ‘Maschera inserimento SOLEMON_test.accdb’, type just ‘test’ here.
  • updateID: this control if update (‘Y’) or not (‘N’) the serial number ‘fishID’. Recommended to type ‘Y’ only when batch processing all the hauls.
  • area_sepia: old command
  • year: year of the survey
  • area: stratum of the haul
# set parameters
haul=22 
db='test'
updateID='N'
area_sepia='D'
year=2022
area='ITA17'


When parameters are set, the data processing is pre-defined: you just have to run it without changing the parameters. The first step is the function1, which scope is to format the access table according to output standards. There is no need to see the output of this function. Just run it.

# function1 extract data from access db and format them
hauldata=function1(haul=haul, 
                   db=db,
                   year=year)# extract and format data


When the function1 is done, you can proceed with the function2, which scope is to perform some checks. Checks done are plots that would be saved under the path ‘output/checks’

# function 2: perform checks
function2(xdat=hauldata, 
          haul=haul)


function3 scope is to format data according to trust format. Excel sheets are saved under the path ‘output/trust’.

# function 3: format data to trust format
trustdat=function3(xdat=hauldata[[1]], 
                  haul=haul, 
                  year = year, 
                  weight_not_target = hauldata[[2]],  
                  subsamples_target=hauldata[[3]],
                  catch_sample_disattivati = catch_sample_disattivati) # function 2
                  
                  


function4 creates a pdf report and save it under the path ‘output/pdf’.

# function4: save PDF
function4(trustdat = trustdat, 
          year=year,
          area = area,
          haul=haul)

loop to process more hauls

To process more than one haul, you should care to properly fill in the ‘haul_order’ excel sheet (see input data section). After having loaded the haul summary, just run the loop represented below.

haul_summary=read_excel("data/haul_order.xlsx")
haul_summary=haul_summary[1:5,]

for(xhaul in 1:nrow(haul_summary)){
  
  
  # loop parameters
  haul=haul_summary[xhaul,]$haul
  db=haul_summary[xhaul,]$DB
  area=haul_summary[xhaul,]$country
  
  cat('processing haul no.', haul, '(', xhaul,'/', nrow(haul_summary),')' )
  
  # function1 extract data from access db and format them
  hauldata=function1(haul=haul, 
                     db=db,
                     year=year)# extract and format data
  
  # function 2: perform checks
  function2(xdat=hauldata, 
            haul=haul)
  
  # function 3: format data to trust format
  trustdat=function3(xdat=hauldata[[1]], 
                     haul=haul, 
                     year = year, 
                     weight_not_target = hauldata[[2]],  
                     subsamples_target=hauldata[[3]],
                     catch_sample_disattivati = catch_sample_disattivati) # function 2
  
  # function4: save PDF
  function4(trustdat = trustdat, 
            year=year,
            area=area,
            haul=haul)


}

Data files associated to the folder

target_species

Contains the species that are target for the survey (individual length and weight) and the molluscs for which only total weight and total number are needed.

  • Species: species solemon code
  • target: indicates which type of target is the species. 1 is for species that requires individual length and weight; 2 is for molluscs that requires total weight and total number



List of target species. Target = 1 are proper target; Target = 2 are shellfishes on which to collect number and weight
Species Species_code target
Aequipecten opercularis AEQUOPE 1
Callinectes sapidus CALLSAP 1
Flexopecten glaber proteus (= Chlamys proteus) CHLAPRO 1
Chlamys varia CHLAVAR 1
Melicertus kerathurus MELIKER 1
Merluccius merluccius MERLMER 1
Mullus barbatus MULLBAR 1
Nephrops norvegicus NEPRNOR 1
Parapenaeus longirostris PAPELON 1
Pecten jacobaeus PECTJAC 1
Platichthys flesus PLATFLE 1
Psetta maxima PSETMAX 1
Raja asterias RAJAAST 1
Raja clavata RAJACLA 1
Raja miraletus RAJAMIR 1
Scophthalmus rhombus SCOHRHO 1
Scyliorhinus canicula SCYOCAN 1
Scyliorhinus stellaris SCYOSTE 1
Sepia officinalis SEPIOFF 1
Solea aegyptiaca SOLEAEG 1
Solea solea SOLEVUL 1
Squilla mantis SQUIMAN 1
Torpedo marmorata TORPMAR 1
Crassostrea gigas CRASGIG 2
Galeodea echinophora GALEECH 2
Hexaplex trunculus HEXATRU 2
Bolinus brandaris MUREBRA 2
Mytilus galloprovincialis MYTGALL 2
Natica stercusmuscarum NATISTE 2
Ostrea edulis OSTREDU 2
Rapana venosa RAPAVEN 2

catch_sample_disattivati

This file controls formatting of trust templates. It indicates which samples (Station, Gear and SpecCode) should be indicated as “InUse” = FALSE in the catch sample files used as input data in trust.

Station Gear SpecCode
11 D AEQUOPE



fishID

Store the updated serial number used to identify specimens for which detailed samples (otolit, genetic etc.) were taken. This number should refere to the last ID assigned to a specimen. The use of this file is controlled by the updateID parameter in the workflow_access_v0 file: if updateID is set equal to Y, the fishID file is used to assign IDs (when requested) and it is then updated. The columns refers to:

  • type: not useful (to be removed)
  • code: this is the alphanumeric part of the code to be assigned to specimens
  • fishID: serial number referring to the last individual contained in the past records
  • haul: not useful (to be removed)
  • species: indicates which species belong to the code category. If need to add species, use : as separator and do not insert spaces.



Example of the fishID file structure
type code fishID haul species
ELAS Elas 202 NA RAJAAST:RAJACLA:RAJAMIR:TORPMAR
solea SS 7061 NA SOLEVUL:SOLEAEG
RHO SR 447 NA SCOHRHO



haul_order

Store the information associated with hauls. This file is used (1) when the data workflow is applied in loop; (2) by the minilog script. The columns refers to:

  • day: day of the haul (yyyy-mm-dd)
  • haul: number of the haul as from solemon protocol
  • id: progressive number of the haul
  • note: this space serve to write any kind of note, it is ignored by the code
  • inizio: time of setting the net (hh:mm:ss)
  • fine: time of hauling the net (hh:mm:ss)
  • verifica_shell: this space serve to write additional notes, it is ignored by the code
  • DB: indicates the name of the access database where the haul was recorder. Do not include ‘.accdb’ extension.
  • country: indicates in which stratum the haul was performed. Available strata are ‘HRV’, ‘ITA17’, ‘SVN’



Example of the haul_order file structure
day haul id note inizio fine valid DB country peso_rapido_A peso_rapido_D peso_subcampione_a peso_subcampione_D
2022-11-24 31 1 NA NA NA 1 2022_1 ITA17 NA NA NA NA
2022-11-24 66 2 NA NA NA 1 2022_1 ITA17 NA NA NA NA
2022-11-24 29 3 NA NA NA 1 2022_1 ITA17 NA NA NA NA



lw_pars

Store the length-weigth parameters of target species. This file is used to reconstruct length (or weight) when it is not available in the recorded data. Example: shrimps where missi a part of the tail but have the head intact are suitable only for length measurement; fishes that were spoiled by the gear may be ok for weight but not measurables for length. The columns refers to:

  • species_name: indicates which species belong to the code category (solemon code)
  • sex: indicates the sex in case when sexual dimorphism in the growth is relevant. If dimorphism is not relevant, put combined parameter and write NA in this field
  • a: parameter “a” for the length-weight relationship
  • b: parameter “b” for the length-weight relationship
  • source: source from where a and b were retrieved



Example of the lw_pars file structure
species_name sex a b source
SOLEVUL NA 0.00460 3.11 benchmark_assessment
MULLBAR NA 0.00871 3.09 fishbase
RAJACLA NA 0.00269 3.23 fishbase



maturity_stages

Store the code of the maturity scales for target species. This file is used to format input files for trust.

  • SPECIES: indicates which species belong to the code category (solemon code)
  • SEX: indicates the sex
  • SCALE: include the alphanumeric part of the maturity scale code



Example of the maturity_stages file structure
SPECIES SEX SCALE
MELIKER F MEDPF
MERLMER F MEDFI
MERLMER M MEDFI



solemon_TB

This is just the TB file updated to 2021, as stored in trust. It serves to perform some checks and it should not be modified. Preview not shown.



species_list

This file is downloaded from the trust database and not modified. It contains the species list. Last update xxx. If need to modify, please do it in thrust and then download the excel again!.

  • Species: species scientific name
  • Medits: species solemon code
  • Sp_Subcat: commercial category of the species
  • Lan_Class: not relevant
  • Sp_Subcat: not relevant



Example of the species_list file structure
Species Medits Sp_Subcat Len_Class Notes
Aaptos aaptos AAPTAAP E NA NA
Abra prismatica ABRAPRI E NA NA
Abra sp ABRASPP E NA NA