Ingest different DRS formats
This PR is a significant rewrite of how we handle DRS file paths, so that we can ingest paths in several known DRS formats.
This involved:
- creating a new crate to handle just loading the different DRS types
- changing ingestion to load the different types and then convert them to our metadata format
- fixing some funky issues with `data-dir` not behaving how it should. I also made it optional, so omitting it means that all datasets are ingested
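A minimal sketch of the approach described above, not the code in this PR. All names here (`DrsPath`, `Metadata`, `Dataset`, `datasets_to_ingest`) are hypothetical, and the rule for matching `data-dir` against a dataset root is an assumption. It shows each known DRS format being parsed into its own type and converted into one common metadata format, and an absent `data-dir` selecting every dataset.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical common metadata format that every DRS flavour is converted into.
#[derive(Debug, PartialEq)]
struct Metadata {
    project: String,
    variable: String,
}

/// Hypothetical per-format path types, one variant per known DRS format.
enum DrsPath {
    Cmip5 { variable: String },
    Cordex { variable: String },
}

impl From<DrsPath> for Metadata {
    /// Convert a format-specific path into the common metadata format.
    fn from(path: DrsPath) -> Self {
        match path {
            DrsPath::Cmip5 { variable } => Metadata { project: "cmip5".into(), variable },
            DrsPath::Cordex { variable } => Metadata { project: "cordex".into(), variable },
        }
    }
}

/// Hypothetical dataset entry from the config: a name and a root directory.
struct Dataset {
    name: String,
    root_dir: PathBuf,
}

/// With no `data-dir`, every dataset is selected. Otherwise only datasets
/// whose root directory contains `data_dir` are selected (an assumed rule).
fn datasets_to_ingest<'a>(datasets: &'a [Dataset], data_dir: Option<&Path>) -> Vec<&'a Dataset> {
    datasets
        .iter()
        .filter(|d| match data_dir {
            None => true,
            Some(dir) => dir.starts_with(&d.root_dir),
        })
        .collect()
}
```

Modelling `data-dir` as an `Option` keeps the "ingest everything" case explicit instead of relying on a sentinel value such as an empty path.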
Activity
assigned to @k204237
requested review from @k204206
- Resolved by Martin Bergemann
Many of my questions come from the fact that I am having trouble understanding the Rust code. I have started consulting the Rust book, but it will take me months, if not years, of practice to even roughly catch up with the level of this code. So in general I think it would be a good idea to keep an eye on the docstrings so that they at least explain what is going on, even where that seems obvious to you.
Also, would it be possible to add an example TOML file to the README, with some comments? That would be great.
I tried to run this with the following config:
/work/ch1187/freva-regiklim/freva/drs_config.toml
freva-ingest --data-dir /work/bb1203/freva/model/regional --config-dir /work/ch1187/freva-regiklim/freva
2022-06-02T17:08:10.183794Z WARN ingest_dataset{dataset="nukleus" batch_size=1000}: freva::drs::ingest: /work/bb1203/freva/model/regional/nukleus/output/GER-3km/GERICS/ECMWF-ERAINT/evaluation/r1i1p1/GERICS-REMO2015/v1/1hr/clivi/v20201205/clivi_GER-3km_ECMWF-ERAINT_evaluation_r1i1p1_GERICS-REMO2015_v1_1hr_200909010030-200909302330.nc not a valid drs file, skipping: InvalidCordexPath(InvalidCordexPathError { reason: "Parsing Error: Error { input: \"nukleus/output/GER-3km/GERICS/ECMWF-ERAINT/evaluation/r1i1p1/GERICS-REMO2015/v1/1hr/clivi/v20201205/clivi_GER-3km_ECMWF-ERAINT_evaluation_r1i1p1_GERICS-REMO2015_v1_1hr_200909010030-200909302330.nc\", code: Verify }" })
I don't really understand why this isn't working. Could you check what is wrong?
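One plausible reading of the `Verify` error, judging only from the later commit f92835d9 ("Remove the cordex activity and product constraint"), is that the parser required the leading path component to be a fixed activity name, while the path above starts with `nukleus`. The sketch below illustrates that kind of constraint. The function `activity_is_valid` and the expected value `"cordex"` are hypothetical and not taken from the actual parser.

```rust
/// Hypothetical check of the leading path component (the "activity").
/// `required` models a strict constraint; `None` accepts any non-empty value.
fn activity_is_valid(relative_path: &str, required: Option<&str>) -> bool {
    // The first `/`-separated component of the path relative to the dataset root.
    let activity = relative_path.split('/').next().unwrap_or("");
    match required {
        Some(expected) => activity == expected,
        None => !activity.is_empty(),
    }
}
```

Under this assumption, a path beginning with `nukleus/output/` fails the strict check but passes once the constraint is dropped.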
added 8 commits
- fd835210 - Make `ingest_opts` a ref again
- 727084ee - Add config example to readme
- d2e5949c - Add warn(missing_docs) to drs and fix issues
- d06121f2 - Forgot to make warn(missing_docs) apply to the module
- b4d03d5e - Add docs to the rest of drs
- 607cb64a - Add some actual documentation to `drs`
- ba1687e3 - Rearrange code for handling data-dir
- f92835d9 - Remove the cordex activity and product constraint