Ingestion failures
This runbook documents failures we have encountered with the ingestion process.
Null pointer exceptions
We have occasionally seen failures where Datahub rejects a small number of models.
For example:
2024-10-23T08:23:53.5205940Z 'failures': [{'error': 'Unable to emit metadata to DataHub GMS: java.lang.NullPointerException',
2024-10-23T08:23:53.5206850Z 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
2024-10-23T08:23:53.5207654Z 'message': 'java.lang.NullPointerException',
2024-10-23T08:23:53.5208145Z 'status': 500,
2024-10-23T08:23:53.5209032Z 'urn': 'urn:li:dataset:(urn:li:dataPlatform:athena,athena_cadet.awsdatacatalog.avature_stg.stg_people_fields_pivoted_wide,PROD)',
2024-10-23T08:23:53.5210539Z 'workunit_id': 'urn:li:dataset:(urn:li:dataPlatform:athena,athena_cadet.awsdatacatalog.avature_stg.stg_people_fields_pivoted_wide,PROD)-upstreamLineage'}},
2024-10-23T08:23:53.5211753Z {'error': 'Unable to emit metadata to DataHub GMS: java.lang.NullPointerException',
2024-10-23T08:23:53.5212614Z 'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
2024-10-23T08:23:53.5213333Z 'message': 'java.lang.NullPointerException',
2024-10-23T08:23:53.5213814Z 'status': 500,
2024-10-23T08:23:53.5214679Z 'urn': 'urn:li:dataset:(urn:li:dataPlatform:athena,athena_cadet.awsdatacatalog.avature_stg.stg_job_fields_pivoted_wide,PROD)',
2024-10-23T08:23:53.5216091Z 'workunit_id': 'urn:li:dataset:(urn:li:dataPlatform:athena,athena_cadet.awsdatacatalog.avature_stg.stg_job_fields_pivoted_wide,PROD)-upstreamLineage'}}],
The cause is as yet unclear, but this does not prevent the rest of the ingestion from completing.
Server version mismatch
When the datahub server version changes, the acryl-datahub
python library should also be updated to a compatable version.
If these get out of sync you will get errors like Unable to emit metadata to DataHub GMS, likely because the server version is too old relative to the client
.
You will need to upgrade/downgrade the python library for the ingestion to work again.
An error occurred (NoSuchKey) when calling the GetObject operation
The CaDeT ingestion expects to find dbt catalog/manifest files in an S3 bucket: s3://mojap-derived-tables/prod/run_artefacts/latest/target
.
Normally this is set by the “dbt docs” job in create-a-derived-table.
If this fails to run properly, the catalog/manifest files may be removed due to the bucket’s retention policy, and the ingestion will fail until dbt docs has successfully run again.
Recursion errors
If the Datahub ingestion process exceeds python’s recursion limit, it may interfere with ingestion; for example:
2024-07-03T11:05:37.3320639Z [2024-07-03 11:05:37,331] INFO {datahub.ingestion.source.dbt.dbt_common:1161} - Failed to parse compiled code for snapshot.mojap_derived_tables.derived_delius_dim__snapshot_sentences_at_disp: maximum recursion depth exceeded
We have seen this happen when calling sqlglot, which formats compiled SQL code we ingest from DBT.
Whether the recursion limit is reached depends on:
- the SQL used in Create a Derived Table (only very long expressions will trigger the issue)
- the
compiled_code
being output in the DBT manifest (it’s possible this depends on the exact DBT command that generated the manifest) - the runtime behaviour of Datahub (exactly when the recursion limit is reached depends on the runtime call stack)
To mitigate this, we switched off include_compiled_code
and include_column_lineage
for the DBT ingestion source, however recent versions of datahub should handle this gracefully, and fixes have since gone into sqlglot, so we don’t
expect this to happen again if either of these settings is switched back on.