
How search works

Datahub uses Elasticsearch as its search engine. Search ranking, filtering, and queries can be customised with a configuration file, but for now we are using the default configuration.

Elasticsearch indexes fields according to how Datahub’s metadata model is configured: every field marked with a @Searchable annotation is included in the Elasticsearch mapping.

To find out which fields are searchable, see the .pdl files in metadata-models, or view the Searchable tag in the Datahub demo instance.
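As a rough way to list searchable fields from a local checkout of metadata-models, the Python sketch below scans .pdl files for @Searchable annotations. It assumes two annotation properties: fieldName, which renames the field in the Elasticsearch mapping, and hasValuesFieldName, which adds a has* field such as hasOwners. This is our reading of the models, so check it against the source. The regex is a heuristic: it skips path-keyed annotations such as {"/*": {...}} used on collections, so treat its output as a starting point rather than a complete list.

```python
import re
from pathlib import Path

# Heuristic pattern: a flat @Searchable = {...} block (no nested braces),
# optionally followed by other flat annotations, then the decorated field
# declaration "fieldName: type". Path-keyed annotations are not matched.
SEARCHABLE_RE = re.compile(
    r"@Searchable\s*=\s*\{(?P<body>[^{}]*)\}"  # flat annotation body
    r"(?:\s*@\w+\s*=\s*\{[^{}]*\})*"           # skip further flat annotations
    r"\s*(?P<field>\w+)\s*:"                   # name of the decorated field
)

# Matches a string-valued property such as "fieldName": "owners".
STRING_PROP_RE = r'"{key}"\s*:\s*"(?P<value>[^"]+)"'


def _prop(body, key):
    """Return the string value of an annotation property, or None if absent."""
    match = re.search(STRING_PROP_RE.format(key=key), body)
    return match.group("value") if match else None


def searchable_fields(pdl_text):
    """Return (index field name, has* field name or None) pairs from PDL text."""
    results = []
    for match in SEARCHABLE_RE.finditer(pdl_text):
        body = match.group("body")
        # Assumed: "fieldName" overrides the PDL field name in the index.
        index_name = _prop(body, "fieldName") or match.group("field")
        results.append((index_name, _prop(body, "hasValuesFieldName")))
    return results


def scan_models(root):
    """Yield (file name, field pair) for every .pdl file under root."""
    for path in Path(root).rglob("*.pdl"):
        for field in searchable_fields(path.read_text(encoding="utf-8")):
            yield path.name, field
```

For example, an annotation carrying "fieldName": "owners" and "hasValuesFieldName": "hasOwners" on a field called owner yields ("owners", "hasOwners"), which matches the owners and hasOwners entries in the table on this page.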

The Elasticsearch field names differ from the GraphQL ones, and search filters must use the Elasticsearch names. General conventions:

  • don’t include the aspect (e.g. use name rather than properties.name)
  • collections are plural (e.g. tags, owners)
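These conventions can be captured in a small lookup when building filters. This is a sketch: the GraphQL paths listed, the GRAPHQL_TO_INDEX_FIELD table and the facet_filter helper are illustrative names of our own, covering only a few fields from the table on this page. The returned dict assumes the shape of the GraphQL FacetFilterInput (field, values, negated).

```python
# Illustrative, non-exhaustive mapping from GraphQL paths to index field names.
GRAPHQL_TO_INDEX_FIELD = {
    "properties.name": "name",                 # aspect ("properties") dropped
    "properties.description": "description",
    "properties.customProperties": "customProperties",
    "deprecation.deprecated": "deprecated",
    "status.removed": "removed",
    "tags.tags": "tags",                       # collection: plural name
    "glossaryTerms.terms": "glossaryTerms",
    "ownership.owners": "owners",
}


def facet_filter(graphql_path, values, negated=False):
    """Build a FacetFilterInput-shaped dict keyed by the index field name."""
    try:
        field = GRAPHQL_TO_INDEX_FIELD[graphql_path]
    except KeyError:
        raise ValueError(f"no known index field for {graphql_path!r}") from None
    return {"field": field, "values": list(values), "negated": negated}
```

Failing loudly on an unknown path is deliberate: passing a GraphQL-style name such as properties.name straight through as a filter field is the mistake the conventions above warn against.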

How the GraphQL API translates queries into Elasticsearch queries

The GraphQL API has several search queries; we use searchAcrossEntities and aggregateAcrossEntities. They accept two kinds of query string: a simple one, and a structured one for advanced searches. We use the simple one.
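A minimal sketch of the two queries with a simple query string follows. It assumes the SearchAcrossEntitiesInput and AggregateAcrossEntitiesInput input types and the orFilters, facets and searchResults fields of the DataHub GraphQL schema as we understand it, so check them against your version's schema. The helper names are hypothetical, and the endpoint URL (typically ending in /api/graphql) and access token are for you to supply.

```python
import json
import urllib.request

SEARCH_QUERY = """
query search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    start
    count
    total
    searchResults { entity { urn type } }
  }
}
"""

AGGREGATE_QUERY = """
query aggregate($input: AggregateAcrossEntitiesInput!) {
  aggregateAcrossEntities(input: $input) {
    facets { field aggregations { value count } }
  }
}
"""


def search_payload(text, filters=(), types=("DATASET",), start=0, count=10):
    """Payload for searchAcrossEntities with a simple query string.

    `filters` are FacetFilterInput dicts, ANDed inside a single orFilters entry.
    """
    variables = {"input": {"types": list(types), "query": text,
                           "start": start, "count": count}}
    if filters:
        variables["input"]["orFilters"] = [{"and": list(filters)}]
    return {"query": SEARCH_QUERY, "variables": variables}


def aggregate_payload(text, facets, types=("DATASET",)):
    """Payload for aggregateAcrossEntities: value counts for each facet field."""
    return {"query": AGGREGATE_QUERY,
            "variables": {"input": {"types": list(types), "query": text,
                                    "facets": list(facets)}}}


def post_graphql(url, token, payload):
    """POST a payload to a GraphQL endpoint using a bearer access token."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because orFilters is a list of AND groups, adding a second group would widen the search to entities matching either group.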

When you call one of these GraphQL queries, the GMS constructs a corresponding Elasticsearch query:
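The sketch below is not the query the GMS actually emits. As a simplified model of the filter part only, it turns an orFilters list into an Elasticsearch bool query: each AND group becomes a must clause (with must_not for negated criteria), and the groups are joined with should. The real query also contains the scored full-text part built from the query string, and may target keyword sub-fields or use other clause types.

```python
def to_es_filter(or_filters):
    """Approximate the filter clause for an orFilters list (simplified model).

    Each AND group -> bool/must of terms queries (negated ones in must_not);
    groups are combined with bool/should, so at least one group must match.
    """
    should = []
    for group in or_filters:
        must, must_not = [], []
        for criterion in group["and"]:
            clause = {"terms": {criterion["field"]: criterion["values"]}}
            (must_not if criterion.get("negated") else must).append(clause)
        should.append({"bool": {"must": must, "must_not": must_not}})
    return {"bool": {"should": should, "minimum_should_match": 1}}
```

This shape also shows why filters must use the Elasticsearch field names: the field string is used directly as the key of the terms clause.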

Useful fields

Filter               Format                    Related fields
urn                  URN                       -
customProperties     prop2=pikachu or prop2    -
browsePathsV2        -                         -
deprecated           boolean                   -
removed              boolean                   -
typeNames            -                         -
name                 -                         qualifiedName
description          -                         hasDescription
lastOperationTime    datetime                  -
createdAt            datetime                  -
lastModifiedAt       datetime                  -
platform             -                         platformInstance
tags                 URN                       hasTags
glossaryTerms        URN                       hasGlossaryTerms
siblings             URN                       -
owners               URN                       hasOwners
roles                URN                       hasRoles
container            URN                       hasContainer
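Filters for the field types in the table might be built as follows. This is a sketch with hypothetical helper names. The condition values (EQUAL, GREATER_THAN) are taken from the GraphQL FilterOperator enum as we understand it. We assume has* fields take "true"/"false" and datetime fields take epoch milliseconds; the tag URN is a placeholder.

```python
import time


def equals(field, *values, negated=False):
    """Criterion for URN, boolean and string fields (e.g. tags, removed)."""
    return {"field": field, "values": list(values),
            "condition": "EQUAL", "negated": negated}


def has_values(field):
    """Criterion on a has* field such as hasOwners (assumed "true"/"false")."""
    return equals(field, "true")


def modified_since(days, field="lastModifiedAt", now_ms=None):
    """Criterion for a datetime field, assuming epoch-millisecond values."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    cutoff = now_ms - days * 24 * 60 * 60 * 1000
    return {"field": field, "values": [str(cutoff)], "condition": "GREATER_THAN"}


# Entities tagged with a placeholder tag, with an owner, not soft-deleted,
# and modified in the last 30 days.
example_filters = [
    equals("tags", "urn:li:tag:PII"),
    has_values("hasOwners"),
    equals("removed", "true", negated=True),
    modified_since(30),
]

# customProperties values use the key=value form shown in the table.
custom_property = equals("customProperties", "prop2=pikachu")
```

A list like example_filters would form one AND group in an orFilters entry.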
This page was last reviewed on 28 March 2024. It needs to be reviewed again on 28 June 2024 by the page owner #data-catalogue.