Migration to AWS
Prozorro.Sale is a digital auction system that helps generate state revenue through open and fair bidding.
Prozorro.Sale was created in cooperation with the Ministry of Economic Development and Trade of Ukraine, Transparency International Ukraine, the Deposit Guarantee Fund, the National Bank of Ukraine, and Ukrainian electronic platforms. Use of the Prozorro.Sale system is mandated by law for the sale of state and municipal assets in Ukraine. Its target groups are state and local authorities and the State Property Fund of Ukraine, as well as the businesses and citizens participating in online auctions.
Prozorro.Sale needed to guarantee uninterrupted, 24/7 access to the service. None of the following failure scenarios should prevent users from working:
- physical loss of the data center,
- loss of Internet connectivity,
- loss of administrator access and its further consequences,
- logical application errors or data corruption caused by external influences.
A disaster recovery infrastructure project based on the AWS cloud was initiated.
The project consisted of four stages:
- Preparation of site architecture and migration of unsynchronized services to AWS.
- Copying of infrastructure services.
- Real-time data synchronization and testing.
- Final refinements, general system testing, documentation compilation.
The war in Ukraine caught the project at stage 3 – real-time data synchronization and testing. In this situation, the disaster recovery site took over the role of the main production platform.
Efforts were redirected to designing new infrastructure for the production environment and modifying the IaC platform to ensure reliability and availability, as well as to data selection and migration.
Steps in designing the new infrastructure:
- Identifying resource types to optimize cost, availability, and reliability.
- Defining affinity and anti-affinity parameters for Kubernetes StatefulSets with PersistentVolumeClaims (PVCs) to support multiple Availability Zones based on the gp2 storage class.
- Migrating the third-party VPN using AWS export/import tools.
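The affinity step above might look like the following StatefulSet excerpt (a hypothetical sketch: the application name, image, and volume size are illustrative, not the actual Prozorro.Sale manifests). Pod anti-affinity on the zone topology key spreads replicas across Availability Zones, while the volume claim template provisions a gp2-backed PVC next to each pod:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb            # illustrative name
spec:
  serviceName: mongodb
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule two replicas in the same Availability Zone.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: mongodb
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: mongodb
          image: mongo:4.4
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp2   # EBS-backed storage class
        resources:
          requests:
            storage: 100Gi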
IaC platform modification steps:
- Terraform code modification.
- GitLab CI/CD modification for IaC.
Based on data classification methods, it was determined which data should be migrated:
- ELK/Open Distro logs (more than 10 TB).
- MongoDB data.
- Data from the Swift object store, migrated to S3.
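The Swift-to-S3 transfer can be sketched as a generic copy loop. This is an illustrative sketch, not the actual migration tooling: the three callables are placeholders that, in practice, would wrap python-swiftclient reads and boto3 S3 writes.

```python
def migrate_container(list_objects, get_object, put_object, bucket):
    """Copy every object from a Swift container into an S3 bucket.

    list_objects() -> iterable of object names in the source container
    get_object(name) -> object body as bytes
    put_object(bucket, name, body) -> uploads one object to S3
    Returns the number of objects copied.
    """
    copied = 0
    for name in list_objects():
        put_object(bucket, name, get_object(name))
        copied += 1
    return copied
```

Injecting the client callables keeps the copy logic testable without network access; real runs would also need retries and checksum verification.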
We created an architecture that minimizes risk and covers the remaining tasks.
We decided to keep one GitLab instance in the backup data center. GitLab CI remained the deployment tool: the existing pipelines were extended so that whenever the working version of the code was updated on the main site, the backup site was updated in sync. The expected gap between versions was only a few minutes, preserving and securing the entire business process.
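A minimal sketch of such a pipeline extension (hypothetical job names, contexts, and a kubectl-based deployment are assumed; the actual pipelines are not shown in this article):

```yaml
stages:
  - deploy

# Shared deployment steps, reused for both sites via a YAML anchor.
.deploy_template: &deploy
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl --context "$KUBE_CONTEXT" apply -f k8s/

deploy_primary:
  <<: *deploy
  variables:
    KUBE_CONTEXT: primary-dc      # illustrative context name

deploy_backup:
  <<: *deploy
  variables:
    KUBE_CONTEXT: aws-eks-backup  # illustrative context name
```

Running both jobs in the same stage keeps the two sites within one pipeline run of each other, which is what bounds the version gap to minutes.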
We also limited the use of proprietary services to EKS alone: a balance between independence and optimal support costs.
We completely changed the file service process. The new logic caches the data and writes it in parallel to two storages, primary and backup. In other words, the data is considered received right after caching, but the transaction is closed inside the service only after write confirmation arrives from both storages.
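A minimal Python sketch of this dual-write logic, assuming each storage exposes a simple `put(key, data)` method (all names here are illustrative, not the actual service code):

```python
from concurrent.futures import ThreadPoolExecutor


class DualWriteFileService:
    """Cache first, then write to both storages in parallel; the
    transaction commits only after both storages confirm the write."""

    def __init__(self, primary, backup):
        self.primary = primary   # any object with a .put(key, data) method
        self.backup = backup
        self.cache = {}

    def store(self, key, data):
        # 1. Cache: from the caller's perspective the data is "received".
        self.cache[key] = data
        # 2. Write to both storages in parallel.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(storage.put, key, data)
                       for storage in (self.primary, self.backup)]
            # 3. Commit only once *both* writes are confirmed; .result()
            #    re-raises any storage error, leaving the transaction open.
            for future in futures:
                future.result()
        del self.cache[key]      # transaction closed
        return True
```

The cache entry survives a failed write, so the service can retry the commit later instead of losing the data.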
A deliberate decision was made to "lose" the logs along with the main resource: synchronizing them was too time-consuming compared with the expected benefit (in practice, only one working day of logs was not recreated, since backups were daily).
Since the goal was to switch ARI clients as smoothly as possible (by rewriting DNS records), the keys had to be synchronized between sites. These consisted of ARI keys and VPN profiles. For the ARI keys the solution was similar in code – an extension of SI. For the VPN accesses there was no ready-made solution.
The Triangu team fully prepared the infrastructure for the migration to the AWS cloud. In the course of the migration, the IaC was reworked and a new platform for the production environment was built.
Prozorro.Sale migrated successfully and has been running stably on AWS cloud services for over three months.