retrieve: best practices
I just wanted to note some best practices for speeding up retrievals, to check my understanding. Assume I have several files that I want to retrieve, and I don't want to download everything recursively. As far as I understand, I should:
retrieve a list of files
option 1
- use slk_helpers gen_file_query <file1> <file2> ... <fileN> to create a JSON query
- use slk_helpers search_limited to create a search based on the query
- run slk retrieve with the search id
This should avoid tapes being ejected and stored again if those files are on the same tape? We could implement this if slk_retrieve receives a list of files as input.
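A rough sketch of how I picture option 1 as a script. It only uses the three commands named above. The following are my assumptions and not confirmed behaviour: gen_file_query prints the JSON query on stdout, search_limited accepts that query as its argument and prints the search id as the last integer of its output, and slk retrieve takes the search id followed by a destination directory. The function names and the paths in the usage comment are hypothetical.

```shell
# Option 1 sketch: one query -> one search -> one retrieval.

# Print the last run of digits found in the given text.
# Assumption: the search id is the last integer printed by search_limited.
extract_search_id() {
    printf '%s\n' "$1" | grep -oE '[0-9]+' | tail -n 1
}

# retrieve_as_one_search <destination> <file1> <file2> ... <fileN>
retrieve_as_one_search() {
    local dest="$1"; shift
    local query search_output search_id

    # Assumption: the JSON query is printed on stdout.
    query=$(slk_helpers gen_file_query "$@") || return 1

    # Assumption: the query is passed as a single argument.
    search_output=$(slk_helpers search_limited "$query") || return 1

    search_id=$(extract_search_id "$search_output")
    if [ -z "$search_id" ]; then
        echo "no search id found in: $search_output" >&2
        return 1
    fi

    # Assumption: argument order is <search id> <destination>.
    slk retrieve "$search_id" "$dest"
}

# Usage (hypothetical paths):
# retrieve_as_one_search /work/target /arch/proj/file1.nc /arch/proj/file2.nc
```

If the search id is printed in a different form, only extract_search_id would need to change.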
option 2
- use slk_helpers gfbt <file1> <file2> ... <fileN> --gen-search-query to create a JSON query for each group of files
- run slk_helpers search_limited on those queries to obtain search ids
- run slk_retrieve in parallel for each search id
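A rough sketch of the parallel part of option 2, using background jobs instead of separate job scripts. I am not assuming anything about the exact output layout of gfbt here: the sketch expects the JSON queries to have been extracted already, one per line, on stdin. Further assumptions on my side: search_limited accepts the query as its argument and prints the search id as the last integer of its output, and slk retrieve takes the search id followed by a destination directory. All function names are hypothetical.

```shell
# Option 2 sketch: one search and one retrieval per tape group, in parallel.

# retrieve_one_group <destination> <json query>
# Assumption: the search id is the last integer printed by search_limited.
retrieve_one_group() {
    local dest="$1" query="$2" search_id
    search_id=$(slk_helpers search_limited "$query" | grep -oE '[0-9]+' | tail -n 1)
    if [ -z "$search_id" ]; then
        echo "no search id for query: $query" >&2
        return 1
    fi
    # Assumption: argument order is <search id> <destination>.
    slk retrieve "$search_id" "$dest"
}

# retrieve_groups_in_parallel <destination> [worker]
# Reads one query per line from stdin and starts one background job per
# query. The worker defaults to retrieve_one_group; it is a parameter so
# that the control flow can be tried without access to the archive.
retrieve_groups_in_parallel() {
    local dest="$1" worker="${2:-retrieve_one_group}"
    local query pid failed=0 pids=""

    while IFS= read -r query; do
        [ -n "$query" ] || continue      # skip empty lines
        "$worker" "$dest" "$query" &     # one job per tape group
        pids="$pids $!"
    done

    # Wait for every job and remember whether any of them failed.
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Usage (hypothetical file with one query per line):
# retrieve_groups_in_parallel /work/target < queries.txt
```

The function returns non-zero if any of the group retrievals failed, so a calling job script can react to a partial failure.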
Is option 2 recommended? I imagine creating a job script for each group of files that are on the same tape to retrieve them (or using threads). Or is there no advantage over option 1? I assume that none of the files are cached...