Discovery and execution lifecycle

The lifecycle starts with an operator registering a source and ends with an execution record containing manifest, backend status, and provenance.

Discovery

register source
      |
      v
mark for discovery
      |
      v
scheduler_tick
      |
      v
discover_batch
      |
      v
TAP queries -> field map -> metadata rows
                         |
                         v
                discovery signature
                         |
                         v
                workflow_run_pending

Control             Source
Query templates     project_config.discovery.queries
Enrichment queries  project_config.discovery.enrichments
Field mapping       project_config.discovery.prepare_metadata.field_map
Signature fields    project_config.discovery.prepare_metadata.signature
Batch limits        automation.discovery and BEAMPIPE_SHAPING_*

Discovery signatures determine whether prepared metadata changed enough to trigger future work. Exclude volatile archive fields such as access URLs, file sizes, and timestamps when they should not cause reruns.
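
The effect of leaving volatile fields out of the signature can be illustrated with a minimal sketch. It is not Beampipe's implementation: the function name `discovery_signature`, the use of SHA-256 over canonical JSON, and the row field names (`obs_id`, `band`, `access_url`, `file_size`) are all assumptions made for illustration only.

```python
import hashlib
import json


def discovery_signature(row: dict, signature_fields: list[str]) -> str:
    """Hash only the chosen signature fields of a prepared metadata row.

    Hypothetical helper: the real signature algorithm is not described
    in this document. Fields not listed in signature_fields are ignored,
    so changes to them never alter the result.
    """
    selected = {name: row.get(name) for name in signature_fields}
    # sort_keys gives a stable serialization regardless of field order.
    payload = json.dumps(selected, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative rows; the field names are assumptions, not a real schema.
first_seen = {
    "obs_id": "X1",
    "band": "L",
    "access_url": "https://archive.example/a",
    "file_size": 10,
}
# Same observation, but the archive moved the file and its size changed.
re_listed = dict(first_seen, access_url="https://archive.example/b", file_size=99)

stable_fields = ["obs_id", "band"]
unchanged = discovery_signature(first_seen, stable_fields) == discovery_signature(
    re_listed, stable_fields
)
```

Here `unchanged` is `True` because only excluded fields differ, so under this sketch the re-listed row would not count as a metadata change. Adding `access_url` to `stable_fields` would make every relocation in the archive look like new work.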

Execution

pending source(s)
      |
      v
create execution
      |
      v
build manifest -> DALiuGE Graphs -> stage / translate / submit
      |                |                              |
      v                v                              v
ledger row       prepared graph                  backend run
      |                                               |
      v                                               v
poll tick <---------- REST DIM or Slurm status -------+
      |
      v
completed / failed / cancelled

Control            Source
Grouping           manifest.group_by
Manifest shape     manifest.source_template, dataset_template, path
Graph mutation     graph_patches YAML, documented as DALiuGE Graphs
Backend selection  deployment_profile_name or project default
Admission          automation.execution and execution shaping variables
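
As a rough illustration of the grouping control, the sketch below partitions pending sources into one candidate execution per distinct key. It assumes that `manifest.group_by` is a list of metadata field names, which this document does not confirm; the function name `group_pending_sources` and the source fields (`name`, `project`, `band`) are hypothetical.

```python
from collections import defaultdict


def group_pending_sources(sources: list[dict], group_by: list[str]) -> dict[tuple, list[dict]]:
    """Partition pending sources by the values of the group_by fields.

    Hypothetical helper: assumes group_by is a list of metadata field
    names. Each resulting group would feed one execution's manifest.
    """
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for source in sources:
        # A missing field becomes None, so such sources group together.
        key = tuple(source.get(field) for field in group_by)
        groups[key].append(source)
    return dict(groups)


# Illustrative pending sources; the field names are assumptions.
pending = [
    {"name": "src-a", "project": "p1", "band": "L"},
    {"name": "src-b", "project": "p1", "band": "L"},
    {"name": "src-c", "project": "p1", "band": "S"},
]
groups = group_pending_sources(pending, ["project", "band"])
```

With these inputs, `groups` holds two entries: `("p1", "L")` with two sources and `("p1", "S")` with one. An empty `group_by` list yields a single group keyed by `()`, which in this sketch corresponds to running all pending sources in one execution.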

Operator model

Beampipe separates authoritative state, backend detail, and audit narrative:

Layer             Storage                                Use when
Execution ledger  batch_execution_record                 FSM truth, list/filter runs, cancel
Run record        workflow_manifest.beampipe_run_record  Backend integration detail, poll counters, raw excerpts
Provenance        provenance_events                      Operator timeline: discovery changes, execution transitions, alerts
Metrics           beampipe_* on :9090                    Dashboards and alert thresholds

Recommended debug order for a stuck run: readiness, metrics, provenance events, then beampipe_run_record in the execution response.
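
The debug order above can be captured as a small triage helper that runs checks in sequence and stops at the first failure. This is only a sketch: the check callables are placeholders the operator would supply (for example, a readiness probe or a metrics query), and neither `DEBUG_ORDER` nor `first_failing_layer` is part of Beampipe.

```python
from typing import Callable, Optional

# Order taken from the recommendation in the text.
DEBUG_ORDER = ("readiness", "metrics", "provenance_events", "beampipe_run_record")


def first_failing_layer(checks: dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the supplied checks in the recommended order.

    Returns the name of the first layer whose check returns False, or
    None if every supplied check passes. Layers without a check are
    skipped. The checks themselves are hypothetical placeholders.
    """
    for layer in DEBUG_ORDER:
        check = checks.get(layer)
        if check is None:
            continue
        if not check():
            return layer
    return None


# Placeholder checks standing in for real probes.
suspect = first_failing_layer(
    {
        "readiness": lambda: True,
        "metrics": lambda: True,
        "provenance_events": lambda: False,
        "beampipe_run_record": lambda: False,
    }
)
```

In this example `suspect` is `"provenance_events"`: the cheaper, broader checks pass first, and the run record is only inspected once the earlier layers have been ruled out.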

Next: choose backend behavior in Deployment profiles.