Agenda

short catch-up since the last meeting(s)
Data ingest / crawler discussion - existing approaches:
- ESGF (CEDA ingest tool, and ESGF publisher)
  - stac-generator
    - crawl --> extract --> Elasticsearch ingest
  - esg-publisher
    - (crawled) --> extract --> publish (on Kafka) --> remote STAC ingest
  - (both: which tool generates the kerchunk / sidecar files?)
- Warmworld
  - static / ad hoc generation tool --> static catalog generation? / test STAC ingest?
- Expect (cross catalog and non-ESGF catalogs)
  - static cross-catalog overlay?
- EERIE
  - see cloudify
- nextGEMS etc.: intake
  - (intake --> STAC?)
- (Freva: tbd. in February meeting)
  - crawler --> direct Solr ingest --> (STAC generation?)
status of test servers, prototyping setup, etc.
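The crawl --> extract --> ingest pattern shared by several of the tools above (stac-generator, esg-publisher, the Freva crawler) can be sketched with the standard library alone. This is a minimal sketch, not code from any of those tools: the CMIP6-like file naming convention, the facet names, and the NDJSON output (standing in for an Elasticsearch, Kafka, Solr or STAC API ingest) are assumptions, and metadata is taken from file names only.

```python
import json
import os
import re
from datetime import datetime, timezone

# Assumed (hypothetical) naming convention, loosely modelled on CMIP6 names:
# <variable>_<table>_<source>_<experiment>_<member>_<grid>_<YYYYMM-YYYYMM>.nc
FILENAME_RE = re.compile(
    r"(?P<variable>[^_]+)_(?P<table>[^_]+)_(?P<source>[^_]+)_"
    r"(?P<experiment>[^_]+)_(?P<member>[^_]+)_(?P<grid>[^_]+)_"
    r"(?P<start>\d{6})-(?P<end>\d{6})\.nc$"
)


def crawl(root):
    """Crawl: walk the directory tree and yield netCDF file paths."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if name.endswith(".nc"):
                yield os.path.join(dirpath, name)


def extract(path):
    """Extract: derive metadata from the file name only (no file I/O)."""
    match = FILENAME_RE.match(os.path.basename(path))
    if match is None:
        return None  # skip files that do not follow the assumed convention
    meta = match.groupdict()
    meta["path"] = path
    return meta


def to_stac_item(meta):
    """Convert the extracted metadata into a minimal STAC Item dict."""
    start = datetime.strptime(meta["start"], "%Y%m").replace(tzinfo=timezone.utc)
    end = datetime.strptime(meta["end"], "%Y%m").replace(tzinfo=timezone.utc)
    facets = {k: v for k, v in meta.items() if k not in ("path", "start", "end")}
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": os.path.basename(meta["path"])[: -len(".nc")],
        "geometry": None,  # no spatial footprint extracted in this sketch
        "properties": {
            "datetime": None,  # a start/end range is given instead
            "start_datetime": start.isoformat(),
            "end_datetime": end.isoformat(),  # month granularity only
            **facets,
        },
        "links": [],
        "assets": {"data": {"href": meta["path"], "type": "application/x-netcdf"}},
    }


def ingest(root, out):
    """Ingest: write one JSON item per line (NDJSON) to a text stream."""
    count = 0
    for path in crawl(root):
        meta = extract(path)
        if meta is not None:
            out.write(json.dumps(to_stac_item(meta)) + "\n")
            count += 1
    return count
```

Keeping the three steps as separate functions mirrors the differences between the tools: only the last step (Elasticsearch, Kafka, Solr, STAC API) would need to be swapped per target.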

Discussion

DKRZ metadata crawler / indexer approach: build a new one vs. reuse existing approaches:
- CEDA indexer: developer left; very generic tool with a relatively good code base, but more generic than what we need; kerchunk generation / aggregation is done separately. Not clear whether building on it would bring major advantages.
- esgf-pub: crawling / gridmapfile generation is done separately anyway; not a good code base; quite CMIP/ESGF specific; unclear future development roadmap (current funding problems, freeze situation).
- EERIE approach: the Zarr / kerchunk approach is central, yet we probably cannot borrow much from the other approaches (ESGF etc.).
- Freva indexer: the disadvantage is that we would need to build our higher aggregation levels on top of the Freva base level, so these developments would depend heavily on the Freva/Solr base layer, which is old and DKRZ specific. The major advantage would be that we could rely on a shared, stable, production DKRZ indexing solution.
separate discussion about the catalog-of-catalogs approach and a specific DKRZ catalog solution
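Building "higher aggregation levels" on top of a file-level base index (as discussed for the Freva indexer) essentially means grouping file records into dataset records. A minimal sketch, assuming hypothetical file-level records with `project`, `model`, `experiment`, `variable`, `time_start`, `time_end` and `path` fields; these field names are illustrative and not the actual Freva/Solr schema.

```python
from collections import defaultdict

# Hypothetical facets that identify a dataset; all other fields vary per file.
DATASET_KEYS = ("project", "model", "experiment", "variable")


def aggregate(file_records, keys=DATASET_KEYS):
    """Group file-level records into one record per dataset."""
    groups = defaultdict(list)
    for rec in file_records:
        groups[tuple(rec[k] for k in keys)].append(rec)

    datasets = []
    for key, recs in sorted(groups.items()):
        datasets.append({
            **dict(zip(keys, key)),
            # ISO-like "YYYY-MM" strings compare correctly as plain strings
            "time_start": min(r["time_start"] for r in recs),
            "time_end": max(r["time_end"] for r in recs),
            "files": sorted(r["path"] for r in recs),
        })
    return datasets
```

The choice of `keys` is where the dependency on the base layer shows up: every facet used for grouping has to exist, with stable semantics, in the underlying file-level index.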

action items:

continue the discussion with the Freva people
@carsten: look once more into the CEDA crawling approach to see whether there is a major advantage in reusing / building upon their code base
carsten/fabi: discuss the kerchunk approach (VirtualiZarr etc.) and follow new Zarr v3 related approaches
continue the discussion as part of our regular Thursday meetings
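For the kerchunk discussion it may help to recall what a kerchunk sidecar file contains: a JSON reference set (version 1) whose `refs` map each Zarr key either to inline content or to `[url]` / `[url, offset, length]`, i.e. a byte range in the original (e.g. netCDF/HDF5) file. The sketch below resolves such entries for local files only; kerchunk itself relies on fsspec's reference filesystem for this, and the `templates` / `gen` features of the spec are deliberately not handled here.

```python
import base64
import json


def load_references(path):
    """Load a kerchunk-style JSON reference (sidecar) file."""
    with open(path) as f:
        refs = json.load(f)
    if refs.get("version") != 1:
        raise ValueError("only version-1 references are handled in this sketch")
    return refs


def read_reference(refs, key):
    """Return the bytes stored under one Zarr key (local paths only)."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        # inline content, e.g. .zgroup / .zarray JSON or small encoded chunks
        if entry.startswith("base64:"):
            return base64.b64decode(entry[len("base64:"):])
        return entry.encode()
    url = entry[0]
    with open(url, "rb") as f:
        if len(entry) == 1:
            return f.read()  # the whole target file is the chunk
        _url, offset, length = entry
        f.seek(offset)
        return f.read(length)
```

Because only byte offsets are stored, the original files stay untouched; this is the property that makes sidecar files attractive as a catalog asset next to (or instead of) converted Zarr copies.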

figures for guiding discussions:

status vs. ideal

metacat perspective

Catalogs to play with / link to in test env:

DestinE, e.g. https://hda.data.destination-earth.eu/ui/catalog
CEDA CMIP6 STAC, via STAC Browser
EERIE cloud

from diagrams import Diagram, Cluster
from diagrams.aws.database import RDS
from diagrams.aws.network import ELB

# STAC MetaCat pointing to the individual STAC catalogs.
# The AWS icons (RDS, ELB) only serve as generic node shapes here.
with Diagram("Web Service", show=False) as diag:
    with Cluster("STAC Catalogs"):
        cat1 = RDS("ESGF EAST STAC Catalog")
        cat2 = RDS("DKRZ proj Catalog")
        cat3 = RDS("EERIE Catalog")
        cat4 = RDS("WW Catalog")
        cat5 = RDS("DKRZ NN Catalog, e.g. DestinE")
    # one edge from the MetaCat to every catalog in the cluster
    ELB("STAC MetaCat") >> [cat1, cat2, cat3, cat4, cat5]
diag  # in Jupyter, this displays the rendered diagram inline
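In STAC terms, the MetaCat in the diagram could start as a static root Catalog whose only content is `child` links to the individual catalogs. A minimal sketch: the catalog titles are taken from the diagram, while all `example.org` hrefs and the `stac-metacat` id are hypothetical placeholders, since the real endpoints are not fixed yet.

```python
import json

# Hypothetical child catalog URLs (placeholders, not real endpoints).
CHILD_CATALOGS = {
    "ESGF EAST STAC Catalog": "https://example.org/esgf-east/catalog.json",
    "DKRZ proj Catalog": "https://example.org/dkrz-proj/catalog.json",
    "EERIE Catalog": "https://example.org/eerie/catalog.json",
    "WW Catalog": "https://example.org/warmworld/catalog.json",
    "DKRZ NN Catalog, e.g. DestinE": "https://example.org/dkrz-nn/catalog.json",
}


def build_metacat(self_href, children):
    """Build a root STAC Catalog that only links to child catalogs."""
    links = [
        {"rel": "self", "href": self_href, "type": "application/json"},
        {"rel": "root", "href": self_href, "type": "application/json"},
    ]
    for title, href in children.items():
        links.append({"rel": "child", "href": href,
                      "type": "application/json", "title": title})
    return {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "stac-metacat",
        "description": "Catalog of STAC catalogs (prototype)",
        "links": links,
    }


metacat = build_metacat("https://example.org/metacat/catalog.json", CHILD_CATALOGS)
print(json.dumps(metacat, indent=2))
```

A static catalog like this only needs a web server and can be browsed with STAC Browser; searching across the child catalogs would additionally need a STAC API or a harvesting step.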

Meeting_29_01_2025
