
Restore search indices

Our Datahub deployment stores metadata in two places: PostgreSQL and OpenSearch.

If the two become out of sync, we may see inconsistencies in Datahub and Find MoJ data, for example:

  • containers may appear empty, even though they should contain datasets
  • search results may be missing some entities
  • filtering on a tag doesn’t bring back everything with that tag

If this happens, follow these instructions to restore OpenSearch indices.

Before starting the datahub-datahub-restore-indices-job-template cron job, amend the configuration to use the following arguments:

  - -a
  - batchSize=800
  - -a
  - urnBasedPagination=true

This lowers batchSize from its default value to avoid running out of memory on dev, and enables urnBasedPagination to improve performance (see RestoreIndices argument reference).
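Each option is passed as a pair of list items: `-a` followed by `key=value`. As a minimal sketch, the argument list above can be generated from a dictionary of options, which makes it harder to drop an `-a` when adding further options. The helper name `restore_indices_args` is hypothetical and not part of DataHub; it only mirrors the argument layout shown on this page.

```python
def restore_indices_args(options: dict) -> list:
    """Build a RestoreIndices argument list: one "-a", "key=value" pair per option.

    Hypothetical helper for illustration only; it reproduces the
    argument layout used in the job configuration on this page.
    """
    args = []
    for key, value in options.items():
        # Write booleans in lower case to match "urnBasedPagination=true".
        if isinstance(value, bool):
            value = str(value).lower()
        args.extend(["-a", f"{key}={value}"])
    return args


if __name__ == "__main__":
    options = {"batchSize": 800, "urnBasedPagination": True}
    for arg in restore_indices_args(options):
        # Print in the same YAML list form used in the job configuration.
        print(f"  - {arg}")
```

Running the script prints the four list items shown above, in the same order, because Python dictionaries preserve insertion order.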

You can optionally include a urn or urnLike argument to restrict the reindex to specific entities, for example:

  - -a
  - batchSize=800
  - -a
  - urnBasedPagination=true
  - -a
  - urnLike=urn:li:dataset:(urn:li:dataPlatform:dbt,cadet%
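The urnLike value above ends in `%` and leaves the parenthesis open, which suggests it is applied as a SQL `LIKE` pattern (`%` matches any run of characters, `_` matches exactly one) rather than as a complete URN. Treat this as an assumption and check it against the RestoreIndices argument reference. Under that assumption, the sketch below previews which URNs a pattern would select before starting a long reindex. The functions `like_to_regex` and `like_matches` and the sample URNs are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regular expression.

    Assumes standard LIKE semantics: '%' matches any sequence of characters
    (including none) and '_' matches exactly one character. Escape
    sequences are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Everything else, including '(' and ':', is matched literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def like_matches(pattern: str, urn: str) -> bool:
    """Return True if the whole URN matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(urn) is not None


if __name__ == "__main__":
    pattern = "urn:li:dataset:(urn:li:dataPlatform:dbt,cadet%"
    # Hypothetical URNs for illustration only.
    sample_urns = [
        "urn:li:dataset:(urn:li:dataPlatform:dbt,cadet.example_db.example_table,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:glue,cadet.example_table,PROD)",
    ]
    for urn in sample_urns:
        print(like_matches(pattern, urn), urn)
```

With this pattern only the first sample URN matches, because the second uses a different data platform. If the assumption holds, note that a literal `_` in a urnLike value also acts as a single-character wildcard, so a pattern can match slightly more than intended.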

Troubleshooting

This page was last reviewed on 7 January 2025. It needs to be reviewed again on 7 July 2025 by the page owner #data-catalogue.