Access file creation and structure

Starting a new access file

Data collected onboard are stored in access files, located in the “OnBoard/access” subfolder. Ideally, a maximum of 20 hauls should be stored in a .accdb file. IMPORTANT: in case the data collection is done in parallel over multiple laptops, the “OnBoard” folder must be renamed with unique identifiers (e.g.: OnBoard_2024_PLE). Instructions to set up a folder on a new laptop are reported in the repository main page

To start a new file, copy the template version, which in 2024 it is called “bio_data_v2024_SOLEMON_template.accdb”, paste it in the same “OnBoard/access” (or OnBoard_2024_PLE/access) subfolder and rename it as “bio_data_v2024_SOLEMON_YEAR_N.accdb”, where YEAR is the reporting year and N is a progressive number. Example: “bio_data_v2024_SOLEMON_2024_1.accdb” . It does not matter if parts of the same haul are in different files, this is handled in post-processing. It does not matters if access with the same name are located in different laptops, this is handled in post processing.

Starting a new haul

Each access file contains a template sheet, called “cala_template”. To start a new haul, copy the template sheet, paste it and rename it as “cala_x”, where x is the haul number. Example: cala_1; cala_7bis. It does not matter if parts of the same haul are in different files, this is handled in post-processing.

Haul sheet Structure

The structure of the sheets, in 2024, is according to Figure1. In detail:

gear: gear code. Accepts A and D.
species_name: solemon code
length_mm: length of the individual, in millimeters
weight_g: weight of the individuals, in grams. This also host the cumulative weight for some species, refer to “deal with cumulative data” section for further details.
Sex: required only for target species, accepts F, M, I. Leave empty if sex data not required
Mat: required only for target species, just specify the stage (1, 2, etc..). Exceptions for crustaceans: refer to section “cr”
id_specimen: fill for specimens for which detailed samples (otolit, genetic etc.) were taken. It accepts numbers and letters. When filled a serial “fishID” number would be generated.
length_field2: disc length for elasmobranchs; free spot for other species
length_field3: disc width for elasmobranchs; free spot for other species
total_number: use this only when the total number is the real total number (no subsample)
subsample_number: placeholder for subsamples. The compilation depends on the case. read handbook
kg_field1: weight in kilograms, placeholder for cumulative number and subsamples. The compilation depends on the case
kg_field2: weight in kilograms, placeholder for cumulative number and subsamples. The compilation depends on the case
kg_field2: weight in kilograms, placeholder for cumulative number and subsamples. The compilation depends on the case
haul_weight: weight of the haul in kilograms. The need for compilation depends on the case
type_subsample: accepts values “C1_haul”, “C2_species”,“C3_benthos”, “C4_multiple”, “other”. The compilation depends on the case.
Notes: accept notes of any kind.

Figure 2: Example of the structure of the access file

.accdb auto compilation general details

Auto compilation applies to some column of the access file. This mean that in the processing of the table, to empty cells is assigned the first available data that is found in the previous rows. When collecting data, for these column you need to specify the value just for the first observation, then you should fill the value again only when this change. These columns are:

gear
species_name
sex (only for crustaceans)

Collecting data

Standard procedure

Standard procedure applies when all the individuals for a given species are collected and reported

Target species

When reporting data for target species you should use only the first 7 columns. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species). The column id_specimen serves to store any kind of individual id (e.g.: otholits code, genetic samples).

Figure 3: example of compilation for target species species

Elasmobranchs

When reporting data for elasmobranchs you should use only the first 9 columns. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species). The column id_specimen serves to store any kind of individual id (e.g.: otholits code, genetic samples). Refer to the Figure 4 for reporting the 3 length measures

Figure 4: example of compilation for elasmobranchs species

Other commercial species

When reporting data for other commercial species you should use only the first 4 columns. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species).

For this species category are recorded individual length and cumulative weight. The cumulative weight should be reported in the last record, as in Figure 5.

Figure 5: example of compilation for other commercial species

Shellfishes (when all the individuals are sampled)

When reporting data for shellfishes (MUREBRA, HEXATRU, OSTREDU) you should use the columns as reported in Figure 6. Of these, the columns gear and species_name autocompiles according to the first record inserted. This mean that you should write in these columns only the first time you report an observation (i.e.: first record of a gear, first record of a species).

NB: in case of subsamples, refer to the dedicated section

Figure 6: example of compilation for shellfishes species

Subsamples

Subsamples are taken when the amount of individuals in a given species is too high to be processed. Considering the onboard practice used in the solemon survey there are three cases of subsamples happening. Figure 7 reports the steps that are done according to the three cases. Each case is treated differently depending on the type of species, and the data treatment is explained in the dedicated sections.

In terms of onboard procedure, the cases general refers to:

Case 1 (C1): ALL the individuals from a haul are collected (sorted sample), then the sorted sample is subsampled and processed
Case 2 (C2): an unsorted subsample is taken from the haul, then it is sorted and ALL the individuals from the sorted subsample are processed
Case 3 (C3): a partially sorted subsample (e.g.: the subsample is taken after commercial species are sorted out, same procedure used for the discard subsample) is taken from the haul, then it is sorted and ALL the individuals from the sorted subsample are processed
Case 4 (C4): it applies for those shellfish coming in huge quantity (MUREBRA, HEXATRU). The subsampling happens in multiple steps (refer to dedicated section)

Figure 7: type of subsamples

Target species

A typical case for subsample of target species is AEQUOPE, which usually occurs in large quantities and LFD is needed. In this case only a few individuals are measured to obtain the length structure, then it is needed to estimate total number (and sometimes total weight). Individuals that are processed for length and weight can be treated as any other target species. Regarding the other information needed to raise the values, you need to create a new record for the species (and gear) to store subsample data. There are two expected cases:

Case 1 (C1_species) - Target species

ALL individuals collected from the haul (sorted sample), then a subsample is taken (sorted subsample) for individual processing. One record is dedicated to the subsample details. The total weight of the sorted sample (in kilograms) is reported in the in the kg_field1 field. The weight of the sorted subsample is reported in the kg_field2. type_subsample is “C1_species”. The individuals subsampled are processed according to the standard procedure for target species, creating a new record for each specimen.

Figure 8: subsamples, case 1 for target species

Case 2 (C2_haul) - Target species

A subsample is taken from the haul as soon as it is taken (unsorted subsample, before removing anything from the haul), then ALL the individuals in the subsample are collected (sorted subsample) for individual processing. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. type_subsample is “C2_haul”. The individuals subsampled are processed according to the standard procedure for target species, creating a new record for each specimen.

Figure 9: subsamples, case 2 for target species

Case 3 (C3_benthos) - Target species

A subsample is taken from the haul after partial sorting (partially sorted subsample). This procedure is tipically applied to benthos subsample, which is taken after commercial species and litter are removed. Therefore, subsamples taken with this procedure need to be raised using the standard procedure to raise discard data. Then, ALL the individuals in the subsample are collected (sorted subsample) for individual processing. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. type_subsample is “C3_benthos”. The individuals subsampled are processed according to the standard procedure for target species, creating a new record for each specimen.

Figure 10: subsamples, case 3 for target species

Non-target species (MUREBRA, HEXATRU, etc..)

A typical case for subsample of non-target species is MUREBRA. When it occurrs in large aggregations, total number (and sometimes total weight) are estimated from subsamples. To store information needed to raise the values, there are two expected cases:

Case 1 (C1_species) - Non target

ALL individuals collected from the haul (sorted sample), then a subsample is taken (sorted subsample) for estimating total number. One record is dedicated to the subsample details. The total weight of the sorted sample (in kilograms) is reported in the in the kg_field1 field. The weight of the sorted subsample is reported in the kg_field2. The number of individual in the sorted subsample is reported in the subsample_number. type_subsample is “C1_species”.

Figure 11: subsamples, case 1 for non target species

Case 2 (C2_haul) - Non target

A subsample is taken from the haul as soon as it is taken (unsorted subsample, before removing anything from the haul), then ALL the individuals in the subsample are collected (sorted subsample) for estimating total number. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. The number of individual in the sorted subsample is reported in the subsample_number. type_subsample is “C2_haul”.

Figure 12: subsamples, case 2 for non target species

Case 3 (C3_benthos) - Non target

A subsample is taken from the haul after partial sorting (partially sorted subsample). This procedure is tipically applied to benthos subsample, which is taken after commercial species and litter are removed. Therefore, subsamples taken with this procedure need to be raised using the standard procedure to raise discard data. Then, ALL the individuals in the subsample are collected (sorted subsample) for individual processing. One record is dedicated to the subsample details. The weight of the haul is reported in haul_weight. The weigth of the unsorted subsample is reported in kg_field1. The weight of the sorted subsample is reported in kg_field2. The number of individual in the sorted subsample is reported in the subsample_number. type_subsample is “C3_benthos”.

Figure 13: subsamples, case 3 for non target

Case 4 (C4_multiple) - Non target

it has been only used for MUREBRA and HEXATRU in cases of exceptional catches. ALL individuals of these two species are collected from the haul (partially sorted sample), then a subsample is taken from the partially sorted sample (partially sorted subsample). The partially sorted subsample is sorted, aka it is divede into species (sorted subsample). The sorted subsample of each species is subsampled againg (sorted sub subsample) for estimating total number. One record is dedicated to the subsample details. The weight of the partially sorted sample is reported in kg_field1. The weigth of the sorted subsample for each species is reported in kg_field2. The weight of the sorted sub-subsample is reported in kg_field3. The number of individual in the sorted sub-subsample is reported in the subsample_number. type_subsample is “multi”.

Figure 14: subsamples, case 4 for non target

Processing data

Data are processed in R. It is good practice to process the data relatively frequently in order to create backup files and to produce some checks that can be used to adjust data.

process haul data

Data stored in the .accdb file are retrieved and handled by R scripts, located in the “R” folder. The required script is workflow_access_v2024.R.

process single hauls

To process single hauls, run the lines up to 15 and then please set the following parameters:

haul: number of the haul corresponding to the name given in the access file, but without the extension ’‘. Example: ’cala22’ in access is ‘22’ here, ‘cala45bis’ is ‘45bis’
db: suffix given to the acces file. E.g.: if access file name is ‘Maschera inserimento SOLEMON_test.accdb’, type just ‘test’ here.
updateID: this control if update (‘Y’) or not (‘N’) the serial number ‘fishID’. Recommended to type ‘Y’ only when batch processing all the hauls.
area_sepia: old command
year: year of the survey
area: stratum of the haul

# set parameters
haul=22 
db='test'
updateID='N'
area_sepia='D'
year=2022
area='ITA17'

When parameters are set, the data processing is pre-defined: you just have to run it without changing the parameters. The first step is the function1, which scope is to format the access table according to output standards. There is no need to see the output of this function. Just run it.

# function1 extract data from access db and format them
hauldata=function1(haul=haul, 
                   db=db,
                   year=year)# extract and format data

When the function1 is done, you can proceed with the function2, which scope is to perform some checks. Checks done are plots that would be saved under the path ‘output/checks’

# function 2: perform checks
function2(xdat=hauldata, 
          haul=haul)

function3 scope is to format data according to trust format. Excel sheets are saved under the path ‘output/trust’.

# function 3: format data to trust format
trustdat=function3(xdat=hauldata[[1]], 
                  haul=haul, 
                  year = year, 
                  weight_not_target = hauldata[[2]],  
                  subsamples_target=hauldata[[3]],
                  catch_sample_disattivati = catch_sample_disattivati) # function 2

function4 creates a pdf report and save it under the path ‘output/pdf’.

# function4: save PDF
function4(trustdat = trustdat, 
          year=year,
          area = area,
          haul=haul)

loop to process more hauls

To process more than one haul, you should care to properly fill in the ‘haul_order’ excel sheet (see input data section). After having loaded the haul summary, just run the loop represented below.

haul_summary=read_excel("data/haul_order.xlsx")
haul_summary=haul_summary[1:5,]

for(xhaul in 1:nrow(haul_summary)){
  
  
  # loop parameters
  haul=haul_summary[xhaul,]$haul
  db=haul_summary[xhaul,]$DB
  area=haul_summary[xhaul,]$country
  
  cat('processing haul no.', haul, '(', xhaul,'/', nrow(haul_summary),')' )
  
  # function1 extract data from access db and format them
  hauldata=function1(haul=haul, 
                     db=db,
                     year=year)# extract and format data
  
  # function 2: perform checks
  function2(xdat=hauldata, 
            haul=haul)
  
  # function 3: format data to trust format
  trustdat=function3(xdat=hauldata[[1]], 
                     haul=haul, 
                     year = year, 
                     weight_not_target = hauldata[[2]],  
                     subsamples_target=hauldata[[3]],
                     catch_sample_disattivati = catch_sample_disattivati) # function 2
  
  # function4: save PDF
  function4(trustdat = trustdat, 
            year=year,
            area=area,
            haul=haul)


}

Data files associated to the folder

target_species

Contains the species that are target for the survey (individual length and weight) and the molluscs for which only total weight and total number are needed.

Species: species solemon code
target: indicates which type of target is the species. 1 is for species that requires individual length and weight; 2 is for molluscs that requires total weight and total number

List of target species. Target = 1 are proper target; Target = 2 are shellfishes on which to collect number and weight
Species	Species_code	target
Aequipecten opercularis	AEQUOPE	1
Callinectes sapidus	CALLSAP	1
Flexopecten glaber proteus (= Chlamys proteus)	CHLAPRO	1
Chlamys varia	CHLAVAR	1
Melicertus kerathurus	MELIKER	1
Merluccius merluccius	MERLMER	1
Mullus barbatus	MULLBAR	1
Nephrops norvegicus	NEPRNOR	1
Parapenaeus longirostris	PAPELON	1
Pecten jacobaeus	PECTJAC	1
Platichthys flesus	PLATFLE	1
Psetta maxima	PSETMAX	1
Raja asterias	RAJAAST	1
Raja clavata	RAJACLA	1
Raja miraletus	RAJAMIR	1
Scophthalmus rhombus	SCOHRHO	1
Scyliorhinus canicula	SCYOCAN	1
Scyliorhinus stellaris	SCYOSTE	1
Sepia officinalis	SEPIOFF	1
Solea aegyptiaca	SOLEAEG	1
Solea solea	SOLEVUL	1
Squilla mantis	SQUIMAN	1
Torpedo marmorata	TORPMAR	1
Crassostrea gigas	CRASGIG	2
Galeodea echinophora	GALEECH	2
Hexaplex trunculus	HEXATRU	2
Bolinus brandaris	MUREBRA	2
Mytilus galloprovincialis	MYTGALL	2
Natica stercusmuscarum	NATISTE	2
Ostrea edulis	OSTREDU	2
Rapana venosa	RAPAVEN	2

catch_sample_disattivati

This file controls formatting of trust templates. It indicates which samples (Station, Gear and SpecCode) should be indicated as “InUse” = FALSE in the catch sample files used as input data in trust.

Station	Gear	SpecCode
11	D	AEQUOPE

fishID

Store the updated serial number used to identify specimens for which detailed samples (otolit, genetic etc.) were taken. This number should refere to the last ID assigned to a specimen. The use of this file is controlled by the updateID parameter in the workflow_access_v0 file: if updateID is set equal to Y, the fishID file is used to assign IDs (when requested) and it is then updated. The columns refers to:

type: not useful (to be removed)
code: this is the alphanumeric part of the code to be assigned to specimens
fishID: serial number referring to the last individual contained in the past records
haul: not useful (to be removed)
species: indicates which species belong to the code category. If need to add species, use : as separator and do not insert spaces.

Example of the fishID file structure
type	code	fishID	haul	species
ELAS	Elas	202	NA	RAJAAST:RAJACLA:RAJAMIR:TORPMAR
solea	SS	7061	NA	SOLEVUL:SOLEAEG
RHO	SR	447	NA	SCOHRHO

haul_order

Store the information associated with hauls. This file is used (1) when the data workflow is applied in loop; (2) by the minilog script. The columns refers to:

day: day of the haul (yyyy-mm-dd)
haul: number of the haul as from solemon protocol
id: progressive number of the haul
note: this space serve to write any kind of note, it is ignored by the code
inizio: time of setting the net (hh:mm:ss)
fine: time of hauling the net (hh:mm:ss)
verifica_shell: this space serve to write additional notes, it is ignored by the code
DB: indicates the name of the access database where the haul was recorder. Do not include ‘.accdb’ extension.
country: indicates in which stratum the haul was performed. Available strata are ‘HRV’, ‘ITA17’, ‘SVN’

Example of the haul_order file structure
day	haul	id	note	inizio	fine	valid	DB	country	peso_rapido_A	peso_rapido_D	peso_subcampione_a	peso_subcampione_D
2022-11-24	31	1	NA	NA	NA	1	2022_1	ITA17	NA	NA	NA	NA
2022-11-24	66	2	NA	NA	NA	1	2022_1	ITA17	NA	NA	NA	NA
2022-11-24	29	3	NA	NA	NA	1	2022_1	ITA17	NA	NA	NA	NA

lw_pars

Store the length-weigth parameters of target species. This file is used to reconstruct length (or weight) when it is not available in the recorded data. Example: shrimps where missi a part of the tail but have the head intact are suitable only for length measurement; fishes that were spoiled by the gear may be ok for weight but not measurables for length. The columns refers to:

species_name: indicates which species belong to the code category (solemon code)
sex: indicates the sex in case when sexual dimorphism in the growth is relevant. If dimorphism is not relevant, put combined parameter and write NA in this field
a: parameter “a” for the length-weight relationship
b: parameter “b” for the length-weight relationship
source: source from where a and b were retrieved

Example of the lw_pars file structure
species_name	sex	a	b	source
SOLEVUL	NA	0.00460	3.11	benchmark_assessment
MULLBAR	NA	0.00871	3.09	fishbase
RAJACLA	NA	0.00269	3.23	fishbase

maturity_stages

Store the code of the maturity scales for target species. This file is used to format input files for trust.

SPECIES: indicates which species belong to the code category (solemon code)
SEX: indicates the sex
SCALE: include the alphanumeric part of the maturity scale code

Example of the maturity_stages file structure
SPECIES	SEX	SCALE
MELIKER	F	MEDPF
MERLMER	F	MEDFI
MERLMER	M	MEDFI

solemon_TB

This is just the TB file updated to 2021, as stored in trust. It serves to perform some checks and it should not be modified. Preview not shown.

species_list

This file is downloaded from the trust database and not modified. It contains the species list. Last update xxx. If need to modify, please do it in thrust and then download the excel again!.

Species: species scientific name
Medits: species solemon code
Sp_Subcat: commercial category of the species
Lan_Class: not relevant
Sp_Subcat: not relevant

Example of the species_list file structure
Species	Medits	Sp_Subcat	Len_Class	Notes
Aaptos aaptos	AAPTAAP	E	NA	NA
Abra prismatica	ABRAPRI	E	NA	NA
Abra sp	ABRASPP	E	NA	NA

SoleMon data handling handbook version 2024

2024-10-24

Access file creation and structure

Starting a new access file

Starting a new haul

Haul sheet Structure

.accdb auto compilation general details

Collecting data

Standard procedure

Target species

Elasmobranchs

Other commercial species

Shellfishes (when all the individuals are sampled)

Subsamples

Target species

Case 1 (C1_species) - Target species

Case 2 (C2_haul) - Target species

Case 3 (C3_benthos) - Target species

Non-target species (MUREBRA, HEXATRU, etc..)

Case 1 (C1_species) - Non target

Case 2 (C2_haul) - Non target

Case 3 (C3_benthos) - Non target

Case 4 (C4_multiple) - Non target

Processing data

process haul data

process single hauls

loop to process more hauls

Data files associated to the folder

target_species

catch_sample_disattivati

fishID

haul_order

lw_pars

maturity_stages

solemon_TB

species_list