How search works
DataHub uses Elasticsearch as its search engine. Search ranking, filtering, and queries can be customised through a configuration file, but for the time being we are using the default configuration.
Elasticsearch indexes fields according to how DataHub’s metadata model is configured: every field marked with a “Searchable” annotation is included in the Elasticsearch mapping.
To find out which fields are searchable, see the .pdl files in metadata-models or view the Searchable tag in the datahub demo instance.
The Elasticsearch field names diverge from the GraphQL ones, which changes what you need to pass in for search filters. General conventions:
- don’t include the aspect (e.g. it’s name rather than properties.name)
- collections are plural
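As a rough illustration of these conventions, the sketch below builds filter entries keyed by Elasticsearch field names. The `field`/`values` shape reflects our understanding of the GraphQL filter input; the helper name `facet_filter` and the example values are hypothetical.

```python
def facet_filter(field, values):
    """Build one filter entry keyed by the Elasticsearch field name.

    The {"field": ..., "values": [...]} shape is an assumption about the
    GraphQL filter input; check it against the schema you are running.
    """
    return {"field": field, "values": list(values)}


# The aspect prefix is dropped: "name", not "properties.name".
name_filter = facet_filter("name", ["orders"])

# Collections are plural, e.g. "tags" or "owners" (example URN is made up).
tags_filter = facet_filter("tags", ["urn:li:tag:pii"])

print(name_filter)
print(tags_filter)
```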
How the GraphQL API translates queries into Elasticsearch queries
The GraphQL API has several queries for search. We are using searchAcrossEntities and aggregateAcrossEntities. There are two kinds of query string we can pass in: a simple one, and a structured one for advanced searches. We’re using the simple one.
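A minimal sketch of what a simple searchAcrossEntities request body might look like is shown here. The selected result fields, the paging defaults, and the helper name `build_search_payload` are assumptions for illustration; the block only builds the JSON body and does not send it anywhere.

```python
import json

# GraphQL document for a simple search. The selection set is an assumption;
# request whichever fields your client actually needs.
SEARCH_QUERY = """
query Search($input: SearchAcrossEntitiesInput!) {
  searchAcrossEntities(input: $input) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
"""


def build_search_payload(query, start=0, count=10, entity_types=None):
    """Return the JSON body for a simple (unstructured) search."""
    search_input = {
        # Assumption: an empty list means "do not restrict by entity type".
        "types": entity_types or [],
        "query": query,  # the simple query string
        "start": start,
        "count": count,
    }
    return {"query": SEARCH_QUERY, "variables": {"input": search_input}}


payload = build_search_payload("customer orders")
print(json.dumps(payload["variables"], indent=2))
```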
When you call one of these GraphQL queries, the GMS constructs a corresponding Elasticsearch query:
- For each field in the SearchConfig, SearchQueryBuilder generates Elasticsearch should clauses
- SearchQueryBuilder applies score functions to adjust the ranking of results
- SearchRequestHandler applies any filters. By default, soft-deleted results are filtered out.
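The three steps above can be sketched as a simplified Elasticsearch query body. This is not the query the GMS actually emits: the field boosts, the score function, and the helper name `build_es_query` are invented for illustration, and only the overall shape (should clauses, a function_score wrapper, a soft-delete filter) follows the description.

```python
def build_es_query(text, field_boosts, include_removed=False):
    """Sketch the shape of the generated query; all numbers are made up."""
    # Step 1: one should clause per configured field.
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    bool_query = {"bool": {"should": should, "minimum_should_match": 1}}

    # Step 3: filters. By default, soft-deleted results are excluded.
    if not include_removed:
        bool_query["bool"]["must_not"] = [{"term": {"removed": True}}]

    # Step 2: score functions adjust the ranking of matching documents.
    # Hypothetical function: down-weight deprecated entities.
    return {
        "query": {
            "function_score": {
                "query": bool_query,
                "functions": [
                    {"filter": {"term": {"deprecated": True}}, "weight": 0.5}
                ],
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }


es_query = build_es_query("orders", {"name": 10.0, "description": 1.0})
inner = es_query["query"]["function_score"]["query"]["bool"]
print(len(inner["should"]), "should clauses")
print("soft-delete filter:", inner["must_not"])
```

Passing `include_removed=True` drops the soft-delete clause, which mirrors the idea that the default filter can be overridden.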
Useful fields
Filter | Format | Related fields |
---|---|---|
urn | URN | |
customProperties | prop2=pikachu or prop2 | |
browsePathsV2 | | |
deprecated | boolean | |
removed | boolean | |
typeNames | | |
name | | qualifiedName |
description | | hasDescription |
lastOperationTime | datetime | |
createdAt | datetime | |
lastModifiedAt | datetime | |
platform | | platformInstance |
tags | URN | hasTags |
glossaryTerms | URN | hasGlossaryTerms |
siblings | URN | |
owners | URN | hasOwners |
roles | URN | hasRoles |
container | URN | hasContainer |
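To make the formats in the table concrete, here is a small sketch that formats filter values for a few of these fields. It assumes that filter values are passed as strings, that a customProperties value is either `key=value` or a bare key, and that filters are combined under an `and` group; the helper names and example values are hypothetical.

```python
def custom_property_value(key, value=None):
    """Format a customProperties value: "key=value", or just "key"."""
    return key if value is None else f"{key}={value}"


def boolean_value(flag):
    """Assumption: boolean filters are sent as the strings "true"/"false"."""
    return "true" if flag else "false"


filters = [
    {"field": "customProperties",
     "values": [custom_property_value("prop2", "pikachu")]},
    {"field": "deprecated", "values": [boolean_value(False)]},
    # URN-formatted fields take full URNs (this one is made up).
    {"field": "tags", "values": ["urn:li:tag:pii"]},
]

# Assumption: entries inside one "and" group must all match.
or_filters = [{"and": filters}]

for entry in filters:
    print(entry["field"], "->", entry["values"])
```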