Saved ~$30,000 on monthly cloud costs
We built and scaled an automated data pipeline and hybrid cloud setup to reduce processing delays and cut cloud spend for high-volume genetic data workflows.
The Client
The client is revolutionizing the food industry with a cutting-edge food innovation engine that combines data science and machine learning with biology and genetics.
Industry
Food Sciences & Crop Genetics.
Engagement Details
- Service Type: Product Development Teams
- Model: Offshore
Technology Stack
- Python
- AWS (Cloud)
- GCP (Cloud)
- Docker
- Terraform
Business Need
Develop and automate high-volume data pipelines to fuel the AI engine for genetic data processing.
Challenges
- Data sources scattered across files (multiple formats), databases (SQL & NoSQL), websites, FTP servers, and external APIs.
- High running costs for data pipelines in the cloud.
- Slow processing of diverse genetic data in machine learning pipelines.
Services
- Built a team of Data Architects, Data Engineers & Data Scientists with domain expertise in food sciences and genetics.
- Used our in-house accelerator (Centipede) to automate data aggregation and cleansing from multiple sources.
- Reviewed the existing architecture and created a step-by-step plan to migrate and improve the design, reducing cloud costs and improving data processing speed.
- Added a dedicated team to manage MLOps and cloud infrastructure, with 24×7 monitoring for production and beta-stage environments.
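To illustrate the kind of multi-source aggregation and cleansing described above, here is a minimal Python sketch. It is not Centipede's implementation, which is not described here. The function names, the `sample_id` and `crop` fields, and the sample data are all hypothetical, chosen only to show the general idea of reading records from different formats, normalizing them, and merging them into one deduplicated set.

```python
import csv
import io
import json


def read_csv_records(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def cleanse(record):
    """Normalize keys and string values; return None if there is no sample ID."""
    cleaned = {}
    for key, value in record.items():
        if key is None:
            # csv.DictReader stores surplus columns under a None key; skip them.
            continue
        key = key.strip().lower()
        if isinstance(value, str):
            value = value.strip()
        cleaned[key] = value
    # "sample_id" is a hypothetical required field for this sketch.
    if not cleaned.get("sample_id"):
        return None
    return cleaned


def aggregate(sources):
    """Merge records from all sources, keeping the first record per sample ID."""
    merged = {}
    for records in sources:
        for record in records:
            cleaned = cleanse(record)
            if cleaned is None:
                continue
            merged.setdefault(cleaned["sample_id"], cleaned)
    return list(merged.values())


# Hypothetical sample inputs with inconsistent headers, stray whitespace,
# a duplicate sample (S2), and a record with a missing ID.
csv_text = "Sample_ID, Crop\nS1, chickpea \nS2,pea\n"
json_text = (
    '[{"sample_id": "S2", "crop": "pea"},'
    ' {"sample_id": "", "crop": "oat"},'
    ' {"sample_id": "S3", "crop": "mung bean"}]'
)

records = aggregate([read_csv_records(csv_text), read_json_records(json_text)])
for row in records:
    print(row["sample_id"], row["crop"])
```

In this sketch the first source to supply a given sample ID wins. A real pipeline would likely need an explicit conflict-resolution rule, plus readers for databases, FTP servers, and APIs.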
Results
- Reduced manual data cleansing and shortened data aggregation timelines.
- Improved the execution time of the machine learning pipeline and reduced costs using a hybrid cloud architecture.
- Saved ~$30,000 on monthly cloud costs.
- Reduced infrastructure management effort across clouds using Infrastructure as Code (IaC) with Terraform.
- Provided 24×7 MLOps and monitoring coverage with minimal turnaround time.