ADR-002 We will host DataHub internally
Status
✅ Accepted
Context
A data catalogue for the MOJ is a critical pillar of the data strategy. Following a review of open source catalogues, DataHub was selected as the preferred tool.
Since Acryl Data offer a SaaS version of DataHub, this ADR is concerned with the choice of how we host our catalogue.
Analysis
There are two options:
- We use Acryl Data’s managed SaaS offering
- We host DataHub ourselves using one of our two strategic hosting platforms:
Both approaches have advantages and disadvantages:
SaaS (third-party managed) hosting
Advantages
- We do not have to provision and manage infrastructre
- Availability and uptime are part of a paid-for SLA
- Version updates and patches are managed for us
- We do not need a deep understanding of the internals of the application
- Team members do not have to spend time configuring and deploying the application
Disadvantages
- We have less control over what infrastructure is provisioned
- Hosting costs will be higher
- Procurement for a recurring, licensed product would cause significant delays in getting an application to users
- A supplier relationship and contract will need to be managed and monitored
- Private sector entities can be acquired by another entity, or IPOs can initiate unexpected changes to a service or its pricing
In-house hosting
Advantages
- We have much greater control over the provisioning of resources
- Our strategic hosting platforms are security baselined
- Our hosting costs will be lower, since they will include any volume discounts and not include third-party margins
- Connecting the catalogue to our other internal systems will be more straightforward (i.e. our SSO service and catalogue metadata ingestion)
- We are easily able to provision dev/test and staging instances
- Acryl provide helm charts for self-hosting, so we are not required to create the deployments from scratch
Disadvantages
- We are responsible for SLAs, uptime etc.
- We are responsible for updates and patches1
- Our team requires a deep knowledge of the setup and configuation of the application2
1 Arguably, an advantage, in that we also gain control over when upgrades are rolled out, and they can be tested in a preprod environment beforehand.
2 Arguably, also an advantage.
Decision
The internal hosting advantages of cost, security and connectivity outweigh any advantaged of third-party SaaS hosting. The decision is to host DataHub internally, using Cloud Platform.
Consequences
The Data Catalogue team will need to:
- Understand the DataHub deployment stack
- Create and maintain Cloud Platform environments
- Integrate Entra ID into the application
- Deploy the application
- Secure the application
- Monitor performance
- Check for and apply patches and other updates
- Test new features in a realistic non-production environments