Infrastructure
Srasta runs entirely inside your infrastructure — on-prem, private cloud, or hybrid. This guide covers deployment models, hardware sizing, and GPU selection by tier.
Srasta does not require public SaaS hosting. All components operate inside your controlled infrastructure.
On-premises: GPU servers in your data centre. Full control, zero external dependencies. Recommended for regulated industries and air-gapped requirements.
Private cloud: AWS, Azure, or GCP, deployed inside your VPC. No data leaves your cloud account. Supports GPU instance types across all major providers.
Virtualised: VMware, OpenStack, or Proxmox on your existing data centre infrastructure. Srasta deploys via Docker Compose or Kubernetes.
Hybrid: on-prem inference combined with cloud integrations. Run your models on owned hardware while connecting to cloud-based storage or services.
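As an illustrative sketch of the Docker Compose route mentioned above (service names, image tags, and volume paths here are placeholders, not the actual Srasta distribution), a minimal two-service layout with GPU access might look like:

```yaml
# Hypothetical service layout; images and ports are placeholders.
services:
  inference:
    image: example/srasta-inference:latest   # placeholder image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia                 # requires the NVIDIA container toolkit
              count: 1
              capabilities: [gpu]
  vector-store:
    image: qdrant/qdrant:latest              # any vector store; Qdrant shown as an example
    volumes:
      - vector-data:/qdrant/storage          # persist collections across restarts
volumes:
  vector-data:
```

Keeping the vector store as its own service with a named volume mirrors the node-isolation guidance later in this guide.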
Our production reference configuration: the platform you see demonstrated runs on this hardware.
The DGX Spark is NVIDIA's purpose-built enterprise AI platform. It is the recommended starting point for organisations deploying Srasta on-prem at scale.
Sizing scales with subscription tier and workload concurrency.
Lower VRAM requirements. Suitable for knowledge retrieval, document Q&A, and policy lookup. Runs on entry-level GPU hardware.
Recommended for agentic workflows, code intelligence, and multi-step reasoning. Requires an A100 / H100-class GPU. Our reference deployment runs a 30B-parameter model in FP8.
Mixture-of-Experts architectures require higher VRAM and benefit from multi-GPU routing. Srasta supports model pooling and hybrid routing strategies.
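A rough rule of thumb ties these tiers to VRAM: weight memory is approximately parameter count times bytes per parameter, plus headroom for KV cache, activations, and runtime buffers. The formula and the 30% overhead factor below are generic estimates, not Srasta-specific figures:

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     overhead: float = 1.3) -> float:
    """Rough VRAM estimate: weights * precision, with ~30% headroom
    for KV cache, activations, and runtime buffers (generic heuristic)."""
    weights_gb = params_billion * bytes_per_param  # 1e9 params * N bytes ≈ N GB
    return round(weights_gb * overhead, 1)

# A 30B model in FP8 (1 byte/param): ~30 GB of weights, ~39 GB with headroom,
# which fits a single 80 GB A100 / H100 comfortably.
print(estimate_vram_gb(30, 1.0))   # 39.0
# The same model in FP16 (2 bytes/param) roughly doubles the footprint.
print(estimate_vram_gb(30, 2.0))   # 78.0
```

Mixture-of-Experts models break this simple rule: all expert weights must be resident even though only a few are active per token, which is why they push into multi-GPU territory.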
For production deployments:
Run knowledge ingestion on separate resources so bulk indexing does not impact active inference latency.
Isolate the vector store to its own node for reliable search performance at scale.
Monitor latency, token usage, error rates, and cost per team before scaling.
Distribute across availability zones for resilience in AWS, Azure, and GCP environments.
Snapshot vector collections, knowledge ingestion pipelines, and governance configuration on a scheduled basis.
Optional but recommended for Enterprise Plus deployments with unpredictable concurrency.
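The "cost per team" metric above can be derived from per-request token logs. A minimal sketch, assuming a flat list of request records and a single per-token price (both are illustrative simplifications, not Srasta's billing model):

```python
from collections import defaultdict

def cost_per_team(requests, price_per_1k_tokens: float) -> dict:
    """Aggregate prompt + completion tokens by team, then convert to cost.
    Illustrative only; real pricing may vary by model and tier."""
    totals = defaultdict(int)
    for r in requests:
        totals[r["team"]] += r["prompt_tokens"] + r["completion_tokens"]
    return {team: round(tokens / 1000 * price_per_1k_tokens, 2)
            for team, tokens in totals.items()}

requests = [
    {"team": "legal", "prompt_tokens": 1200, "completion_tokens": 300},
    {"team": "eng",   "prompt_tokens": 5000, "completion_tokens": 2500},
    {"team": "legal", "prompt_tokens": 800,  "completion_tokens": 200},
]
print(cost_per_team(requests, price_per_1k_tokens=0.05))
```

Tracking this alongside latency and error rates gives an objective trigger for when to scale: growing per-team spend with flat latency suggests healthy utilisation, while rising latency signals a capacity limit.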
Answer three questions to get an indicative configuration. For accurate sizing, schedule a deployment session with our team.
We scope every deployment during the AI Readiness Assessment — including model selection, GPU sizing, cloud vs on-prem trade-offs, and a cost estimate.
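The tier guidance above can be condensed into a simple lookup. The thresholds and GPU classes in this sketch are illustrative, drawn from the tier descriptions in this guide rather than an official sizing matrix:

```python
def suggest_gpu(workload: str, concurrent_users: int) -> str:
    """Map workload type and concurrency to an indicative GPU class.
    Thresholds are illustrative; real sizing is scoped per deployment."""
    if workload == "retrieval":          # document Q&A, policy lookup
        base = "entry-level GPU (e.g. L4 / A10 class)"
    elif workload == "agentic":          # code intelligence, multi-step reasoning
        base = "A100 / H100 class"
    elif workload == "moe":              # mixture-of-experts models
        base = "multi-GPU A100 / H100 pool"
    else:
        raise ValueError(f"unknown workload: {workload}")
    if concurrent_users > 50:            # arbitrary illustrative threshold
        base += ", scaled out across multiple nodes"
    return base

print(suggest_gpu("agentic", 20))  # A100 / H100 class
```

A heuristic like this gives an indicative answer only; concurrency patterns, context lengths, and ingestion volume all shift the result, which is why deployments are scoped individually.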