Update Meeting_29_01_2025 authored by Stephan Kindermann
# Agenda
- short catch up since last meeting(s)
- Data ingest / crawler discussion - existing approaches:
* ESGF (CEDA ingest tool, and ESGF publisher)
* [stac-generator](https://github.com/cedadev/stac-generator)
* crawl --\> extract --\> elastic_search ingest
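
The crawl, extract, ingest chain listed above can be sketched with the standard library alone. This is only an illustration, not how stac-generator or the CEDA tools work: the facet names, the `datasets` index name, and the assumption that directory levels map onto facets are all hypothetical. The last step only builds the newline-delimited JSON body that the Elasticsearch bulk API expects; it does not send anything.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical facet names, assumed to map onto the directory levels below the root.
FACETS = ("project", "model", "experiment", "variable")


def crawl(root, suffix=".nc"):
    """Yield every file below root that ends with suffix, in sorted order."""
    yield from sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


def extract(path, root):
    """Derive a metadata record from the file path relative to root."""
    rel = path.relative_to(root)
    record = dict(zip(FACETS, rel.parts[:-1]))  # directory levels -> facets
    record["file"] = rel.as_posix()
    record["size"] = path.stat().st_size
    return record


def to_bulk(records, index="datasets"):
    """Serialize records as an NDJSON body for the Elasticsearch bulk API."""
    lines = []
    for rec in records:
        # A stable id derived from the path keeps re-ingests idempotent.
        doc_id = hashlib.sha1(rec["file"].encode()).hexdigest()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n"
```

A caller would chain the steps as `to_bulk(extract(p, root) for p in crawl(root))` and send the result to the cluster's `_bulk` endpoint with whatever client is in use.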
......@@ -23,13 +22,31 @@
# Discussion
## discussion figures:
- DKRZ metadata crawler / indexer approach: build a new tool or reuse existing approaches?
* CEDA indexer: the developer has left; a very generic tool with a relatively good code base, but more generic than what we need; kerchunk generation / aggregation is done separately; not clear whether building on it has major advantages
* esgf-pub: crawling / gridmapfile generation is done separately anyway; not a good code base; quite CMIP/ESGF specific; unclear future development roadmap (current funding problems, freeze situation)
* EERIE approach: the Zarr / kerchunk approach is central, yet we probably cannot borrow much from the other approaches (ESGF etc.)
* Freva indexer: the disadvantage is that we would need to build our higher aggregation levels on top of the Freva base level, so these developments would depend heavily on the Freva/Solr base layer, which is old and DKRZ specific. The major advantage would be that we could rely on a shared, stable, production DKRZ indexing solution
- separate discussion about the catalog-of-catalogs approach and a specific DKRZ catalog solution
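
One way to picture the catalog-of-catalogs idea is a STAC root catalog whose only job is to link to child catalogs. The sketch below builds such a catalog as a plain dictionary; the `build_meta_catalog` helper, the `metacat` id, and the `example.org` child URLs are hypothetical and stand in for whatever catalogs would actually be linked.

```python
import json

STAC_VERSION = "1.0.0"


def build_meta_catalog(catalog_id, description, children):
    """Return a STAC Catalog dict whose links point at child catalogs.

    children maps a human-readable title to the URL of the child catalog.
    """
    links = [{"rel": "root", "href": "./catalog.json", "type": "application/json"}]
    for title, href in children.items():
        links.append({"rel": "child", "href": href, "title": title})
    return {
        "type": "Catalog",
        "stac_version": STAC_VERSION,
        "id": catalog_id,
        "description": description,
        "links": links,
    }


# Hypothetical child endpoints, for illustration only.
children = {
    "catalog A": "https://example.org/stac/a/catalog.json",
    "catalog B": "https://example.org/stac/b/catalog.json",
}
meta = build_meta_catalog("metacat", "Catalog of catalogs (sketch)", children)
print(json.dumps(meta, indent=2))
```

Because the result is ordinary JSON, it can be written to a static file and opened with any STAC browser, which keeps the top layer independent of how each child catalog is produced.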
#### action items:
* continue discussion with freva people
* @carsten: look once again into the CEDA crawling approach, to see whether there is a major advantage in reusing / building upon their code base
* carsten/fabi: discuss the kerchunk approach (VirtualiZarr etc.) and follow new Zarr v3 related approaches
* continue discussion as part of our regular Thursday meetings
## figures for guiding discussions:
### status vs. ideal
![stac-ing1](uploads/0956b11e3638e119bd7f621611484d44/stac-ing1.jpg)
### metacat perspective
Catalogs to play with / link to in test env:
- DestinE, e.g. https://hda.data.destination-earth.eu/ui/catalog
- CEDA CMIP6 STAC, [via stacbrowser](https://radiantearth.github.io/stac-browser/#/external/api.stac.ceda.ac.uk/?.language=de)
- [EERIE cloud](https://eerie.cloud.dkrz.de/datasets/)
......@@ -52,5 +69,3 @@ with Diagram("Web Service", show=False) as diag:
ELB("STAC MetaCat") >> [cat1,cat2,cat3,cat4,cat5]
diag
```
\ No newline at end of file
# Action Items
\ No newline at end of file