CoreWeave, Inc. (CoreWeave) powers the creation and delivery of the intelligence that drives innovation.
The company is the AI Hyperscaler driving the AI revolution. The company’s CoreWeave Cloud Platform consists of its proprietary software and cloud services that deliver the software and software intelligence needed to manage complex AI infrastructure at scale. The company’s platform supports the development and use of ground-breaking models and the delivery of the next generation of AI applications that are changing the way people live and work across the globe. The company’s platform is trusted by some of the world’s leading AI labs and AI enterprises, including Cohere, IBM, Meta, Microsoft, Mistral, and NVIDIA.
The company purpose-builds its CoreWeave Cloud Platform to be the infrastructure and application platform for AI. The company’s platform manages the complexity of engineering, assembling, running, and monitoring state-of-the-art infrastructure at a massive scale to deliver high performance and efficiency to AI workloads. Through the company’s proprietary software capabilities, the company enables its customers to achieve substantially higher total system performance and more favorable uptime relative to other AI offerings within existing infrastructure cloud environments, and to unlock speed at scale. By delivering more compute cycles to AI workloads and thereby reducing the time required to train models, the company’s capabilities can significantly accelerate the time to solution for customers in the ongoing hyper-competitive race to build the next bleeding-edge AI models. For example, in June 2023, the company’s NVIDIA H100 Tensor Core GPU training cluster completed the MLPerf benchmark test (which benchmarks how quickly a system can train a model from scratch) in eleven minutes, a record and 29 times faster than the next best competitor at the time of the benchmark test.
These efficiencies also extend from training to inference use cases, as the company’s CoreWeave Cloud Platform significantly improves run-time efficiency for inference workloads and enables higher overall AI application uptime. These performance gains help to ensure lower performance-adjusted costs and a superior end-user experience. The supercomputers the company builds to power its platform are optimized to support many types of AI workloads, and they are augmented by the company’s suite of cloud services to deliver meaningful time and cost savings to customers through the company’s orchestration, automation, and monitoring capabilities.
Customers utilize the company’s platform through a set of cloud services comprising Infrastructure Services, Managed Software Services, and Application Software Services, all augmented by the company’s Mission Control and Observability software. The company’s comprehensive and integrated cloud services work together as a suite to deliver compute, networking, and storage. These services enable the provisioning of infrastructure, the orchestration of workloads, and the proactive monitoring of the company’s customers’ training and inference environments to increase performance and minimize interruptions.
The company’s CoreWeave Cloud Platform is hosted in the company’s distributed network of active purpose-built data centers that are interconnected using low latency connections to major metropolitan areas, and incorporate state-of-the-art data center networking equipment, enhanced access to power, and where appropriate, the latest liquid cooling technologies.
As of December 31, 2024, the company had 32 data centers running more than 250,000 GPUs in total and supported by more than 360 MW of active power. The company’s total contracted power extends to approximately 1.3 GW as of December 31, 2024, which the company expects to roll out over the coming years.
The company benefits from robust collaborations with leading chipmakers, OEMs, and software providers to supply the company with infrastructure components and other products. The company has a proven track record of rapidly expanding its power capacity to support the growth of the company’s data center footprint along with its collection of managed cloud services.
The company’s customers include some of the world’s leading AI labs and AI enterprises — the builders and integrators of AI — who depend on the company’s platform for their core products and most promising innovations. The company delivers significant benefits to its customers in terms of overall performance, time to market, and reduced cost of ownership, which results in the company’s customers making large, long-term initial commitments and expanding those commitments with the company over time. The company also sells access to its platform on an on-demand basis through a pay-as-you-go model.
Solution
The company’s CoreWeave Cloud Platform is an integrated solution that is purpose-built for running AI workloads such as model training and inference at maximum performance and efficiency. It includes Infrastructure Services, Managed Software Services, and Application Software Services, all of which are augmented by the company’s Mission Control and Observability software. The company’s proprietary software enables the provisioning of infrastructure, the orchestration of workloads, and the monitoring of the company’s customers’ training and inference environments to enable high availability and minimize downtime. Built on a microservices-based architecture, the components of the company’s platform are fully fungible and composable. Customers can configure their use of the company’s CoreWeave Cloud Platform to best fit their needs. For instance, they can choose to bring their own storage or managed software services or run the company’s respective solutions, and choose the type and scale of deployment that best suits their workloads. This flexibility allows the company’s customers to customize their use of the company’s platform without compromising performance or efficiency. The company has designed security as a fundamental component across its platform and technology stack. The company leverages advanced security capabilities such as XDR and DLP, adheres to industry leading security standards such as SOC2 and ISO 27001, and employs the company’s in-house information security teams to ensure that the company’s customers operate in a secure environment.
The following is a summary of the company’s Layered Architecture Stack.
Infrastructure Services provide the company’s customers with access to advanced GPU and CPU compute, highly performant networking (supported by DPUs), and storage.
Managed Software Services include CKS (a purpose-built-for-AI managed Kubernetes environment with a focus on efficiency, performance and ease of use), the company’s flexible Virtual Private Cloud offering, and the company’s Bare Metal service that runs Kubernetes directly on high-performance servers for maximum performance and efficiency.
Application Software Services build on top of the company’s infrastructure and managed software services, integrating additional tools to further accelerate and improve training and inference for the company’s customers. This includes SUNK, which allows customers to run Slurm-based workloads on top of Kubernetes and colocate jobs—including training and inference workloads—on a single cluster; CoreWeave Tensorizer, which significantly increases the efficiency of model checkpointing and enables high-speed model loading; and the company’s inference optimization services.
The company’s purpose-built technology stack is augmented by its lifecycle management and monitoring software, Mission Control and Observability, and by its advanced cluster validation, proactive health checking, and observability capabilities. The company’s AI cloud runs in a distributed network of 32 active purpose-built data centers, which are specifically engineered to support high-intensity AI workloads with features including enhanced power, liquid cooling, and networking components, reinforcing the robustness of the company’s entire technology stack. The company’s Third-Party Tooling and Solutions further enhance the platform’s flexibility by providing a composable architecture that allows customers to customize their solution by integrating additional third-party tools.
Growth Strategies
The company’s principal growth strategies are to extend the company’s product leadership and innovation; continue capturing additional workloads from existing customers; extend into broader enterprise customers across new industries and verticals; expand internationally; increase the company’s vertical integration; and maximize the economic life of the company’s infrastructure.
Platform and Product Offerings
The company’s CoreWeave Cloud Platform is an integrated solution that enables companies to run AI workloads with high performance and efficiency. It includes the company’s Infrastructure Services, Managed Software Services, and Application Software Services, all bolstered by the company’s Mission Control and Observability software. The company’s proprietary software underpins every component of the platform, allowing for highly secure provisioning of infrastructure, effective orchestration of workloads, and real-time monitoring of training and inference environments. Security is a fundamental component of the company’s platform. The company ensures its customers operate in a secure environment by implementing a zero trust model for data access and leveraging advanced security technologies, including XDR and DLP deployed across the company’s endpoints. Additionally, the company uses single-sign-on and multi-factor authentication to ensure the company’s CoreWeave Cloud Platform remains resilient against identity-based cyber threats.
Infrastructure Services: Delivering bleeding-edge compute, networking, and storage infrastructure
The company’s platform is powered by its foundational Infrastructure Services, which utilize the company’s proprietary software and a combination of high-performance GPUs, CPUs, DPUs, storage, and networking equipment, all calibrated to deliver the performance at scale required to power AI workloads.
Compute. The company’s compute is delivered through combinations of GPU and CPU nodes interconnected with state-of-the-art, high-performance networking technology such as NVIDIA InfiniBand and optimized through DPUs in a high throughput network topology that provides extensive scalability for AI workloads. GPU nodes are the primary engines for AI compute and are supported by the latest generation of CPUs, memory, PCI-Express data interconnects, NVMe SSDs, and DPUs. These components help to extract maximum performance out of GPUs and also offload non-core tasks. The company continuously monitors the health and performance of GPU and CPU nodes to ensure improved resilience and rapid recovery.
GPU Compute. The company provides its customers with access to a vast portfolio of high-performance GPUs that are purpose-built for AI. This includes the NVIDIA H200, which the company was among the first to bring to market at production scale, and the NVIDIA H100. The company’s H100 architecture enabled the company to break the MLPerf record in 2023, delivering training speeds 29 times faster than competitors at greater scale.
CPU Compute. The company offers versatile CPU instances to support AI workloads. The company’s infrastructure utilizes some of the industry’s highest performing and latest CPUs instead of prior generation technology that can degrade GPU performance and utility. The company’s CPUs complement the company’s GPUs by performing tasks, including data pre-processing, control plane functions, and workload orchestration, which frees GPUs to focus on compute intensive tasks.
DPUs. DPUs optimize compute for AI workloads by offloading networking, security, and storage management tasks from GPUs and CPUs. They are a critical enabling component for increasing overall efficiency and performance.
Nimbus. Nimbus is the company’s control and data-plane software running on the company’s DPUs inside Bare Metal instances, performing the typical role of a hypervisor in enabling security, flexibility and performance. Nimbus-enabled DPUs remove the need for a virtualization layer and give customers the flexibility to run directly on the company’s servers without a hypervisor, enabling greater compute performance. Nimbus also provides security through the isolation of customer models and data encryption, while enabling them to set up Virtual Private Cloud environments.
Networking. The company’s networking architecture is highly specialized and uniquely designed to meet the complex needs of AI use cases. It includes the company’s high-performance InfiniBand based cluster networking, its data center network fabric which connects the company’s GPU and CPU nodes to the company’s control plane via DPUs for efficient offloading of certain processing tasks, the company’s VPC networking framework, as well as the company’s Direct Connect offering which provides enterprise-grade networking and supports multi-cloud deployments. The ultra-fast connection and superior throughput enabled by the company’s networking architecture ensures faster training and inference times for the company’s customers.
Cluster Networking is the result of the company’s relationship with NVIDIA to design a networking architecture that is purpose-built for AI clusters. The NVIDIA InfiniBand network that the company deploys is one of the largest in existence, with up to 3,200Gbps of non-blocking GPU interconnect, and provides industry-leading effective networking throughput to accelerate time to train and serve models. The company’s Blackwell deployments will also be supported by external NVLink Switches, which provide a low-latency, scalable, and energy-efficient interconnect that allows GPUs to communicate with other GPUs and CPUs within the same and different systems. These technologies enable the company to offer its customers access to tens of thousands of GPUs connected in a single cluster, and the ability to create massive megaclusters. The company’s megacluster resilience is supported by Mission Control, which prevents and rapidly remediates any deficiencies that arise.
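To make the role of this interconnect concrete, the following is a generic sketch of a multi-node collective operation using PyTorch’s open-source NCCL backend, the kind of GPU-to-GPU communication that high-bandwidth fabrics such as InfiniBand and NVLink accelerate. The launch command and buffer size are illustrative, and the snippet shows standard open-source tooling rather than any CoreWeave-specific API.

    # Generic multi-node all-reduce sketch using PyTorch's NCCL backend.
    # Launch one process per GPU on each node, for example:
    #   torchrun --nnodes=4 --nproc_per_node=8 allreduce_demo.py
    # RANK, WORLD_SIZE, and MASTER_ADDR are supplied by the launcher.
    import os
    import torch
    import torch.distributed as dist

    def main():
        # NCCL routes GPU-to-GPU traffic over NVLink within a node and over
        # the cluster fabric (for example, InfiniBand) between nodes.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))
        torch.cuda.set_device(local_rank)

        # Each rank contributes a gradient-sized buffer; all_reduce sums it
        # across every GPU in the job, the core collective in distributed training.
        grad = torch.ones(64 * 1024 * 1024, device="cuda")  # ~256 MB of fp32
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        torch.cuda.synchronize()

        if dist.get_rank() == 0:
            print(f"all_reduce complete across {dist.get_world_size()} GPUs")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()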
VPC Networking creates isolated virtual networks to manage CoreWeave Cloud Platform resources and allows customers to securely connect compute, storage, and networking resources to their development and deployment platforms using the latest security best practices, including encryption, isolation, and access control.
CoreWeave Direct Connect plugs into the company’s carrier grade networking backbone and enables embedded scale that supports the company’s data centers. It is built to support multi-cloud needs, operates across the United States and Europe, and provides private, highly performant connections to transfer data with speed and security. It allows the company’s customers to easily connect their CoreWeave clusters with resources available at other cloud providers or on-premises. Direct Connect boasts port speeds of up to 400Gbps, flexible options to connect through either dedicated ports or existing carriers, and budget-friendly costs with no data transfer or egress fees.
Storage. The company’s systems incorporate enterprise-grade, software-defined scale-out storage capabilities. The company’s highly performant, secure, and reliable storage capabilities are designed for the most complex and demanding AI workloads. They load data at rapid speeds, enabling large distributed AI workloads to be scaled up in seconds. These storage capabilities also allow customers to benefit from auto-provisioning of GPUs and store large volumes of model checkpoints and intermediate results so that teams can stay on track after interruptions to their training or inference jobs. The company’s purpose-built storage architecture enables the company’s customers to achieve significantly faster load times. The company’s storage services, which include object storage and distributed file storage, leverage industry-best security practices, including encryption at rest and in transit, role-based identity access management, and authentication.
Object Storage. Given traditional object storage solutions are not designed for accelerated workloads, customers typically need to leverage a caching layer on top of object storage to run their GPU clusters at top performance. The company’s object storage solution is built from the ground-up specifically for AI. It eliminates the need for an intermediate cache based on a file storage system. It leverages the company’s proprietary Local Object Transport Accelerator, which caches data locally onto GPU nodes to deliver the performance required to go straight from object storage to GPUs.
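For illustration, the following is a minimal sketch of streaming a training artifact from an S3-compatible object store onto a GPU node using the open-source boto3 client. The endpoint URL, credentials, bucket, key, and local path are hypothetical placeholders, and the snippet shows generic S3-compatible access rather than the company’s proprietary Local Object Transport Accelerator.

    # Minimal sketch: streaming a checkpoint shard from an S3-compatible object
    # store onto a GPU node's local NVMe. Endpoint, credentials, bucket, and key
    # are hypothetical placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://object.example-cloud.com",  # hypothetical endpoint
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # get_object returns a streaming body, so large objects can be read in
    # chunks rather than materialized in memory all at once.
    response = s3.get_object(
        Bucket="training-artifacts", Key="checkpoints/step_1000/shard_0.bin"
    )
    with open("/mnt/local-nvme/shard_0.bin", "wb") as f:
        for chunk in response["Body"].iter_chunks(chunk_size=8 * 1024 * 1024):
            f.write(chunk)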
Distributed File Storage. The company offers distributed file storage solutions that centralize asset storage and support parallel computation setups. For customers who require these features and prefer to utilize a distributed file storage system in addition to object storage, the company’s Distributed File Storage system provides the flexibility to do so. The company’s platform also enables customers who run on-premise to integrate their own distributed file storage system into the company’s platform, providing a truly flexible technology stack that is designed to support the company’s customers’ storage needs.
Dedicated Storage Clusters. The company’s microservices based architecture allows the company to support its customers’ choice of storage back-ends. The company works with an ecosystem of storage cloud partners to provide flexibility and choice in order to get access to their preferred storage solutions that are well integrated with the company’s CoreWeave Cloud Platform.
Managed Software Services: Fully managed deployment for the most efficient workloads
The company’s Managed Software Services include CoreWeave Kubernetes Service, the company’s AI workload orchestration and auto-scaling solution, the company’s VPC product, and the company’s Bare Metal service.
CoreWeave Kubernetes Service. CKS is the company’s AI-optimized, managed Kubernetes service that minimizes the burden of managing large GPU clusters. CKS is purpose-built for AI workloads and delivers fast performance, security, and flexibility in a fully managed Kubernetes solution. CKS has built-in guardrails and automated processes specialized for AI workloads that reduce the need for teams to spend countless hours managing complex Kubernetes clusters. CKS clusters leverage Bare Metal nodes without a hypervisor to maximize node performance, and the company’s DPU-based architecture for complete isolation and acceleration with private-cluster VPCs. As such, CKS ensures that each node operates at peak performance within a secure, isolated environment. There are extensive customization options available to manage data, control access and policy, and handle authentication and other security controls, giving customers ultimate flexibility and authority over their specific data management practices.
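As an illustration of what scheduling a workload on a managed Kubernetes environment looks like, the following is a generic sketch using the standard open-source kubernetes Python client to submit a GPU pod. The container image, namespace, and GPU count are hypothetical, and the snippet reflects ordinary Kubernetes usage rather than a documented CoreWeave-specific API.

    # Generic sketch: submitting a GPU pod with the standard kubernetes client.
    # Image, namespace, and resource counts are hypothetical placeholders.
    from kubernetes import client, config

    config.load_kube_config()  # reads the cluster's kubeconfig

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="train-job-0", namespace="default"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="trainer",
                    image="registry.example.com/trainer:latest",  # placeholder image
                    command=["python", "train.py"],
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "8"}  # request a full 8-GPU node
                    ),
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)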
Virtual Private Cloud. The company’s VPC solution allows customers to utilize an isolated, private section of the company’s CoreWeave Cloud Platform where they can run their resources in a controlled network environment. It delivers a flexible experience backed by enhanced security, with both direct control and a high degree of customization. It enables hyper-specific networking policies on a workload-by-workload basis, including VPN termination, routing management, and access control. The company’s robust node isolation capabilities create both data and VPC segregation that deliver maximum security for the company’s customer workloads.
Bare Metal. The company’s Bare Metal service is fast, reliable, and performant. The vast majority of AI workloads do not need virtualization. Instead, they need direct access to resources to run training, inference, and experimentation with maximum performance and low latency. The company eliminates the need for a hypervisor layer and enables its customers to run Kubernetes, or an orchestration platform of their choice, directly on Bare Metal instances. This allows the company to combine the flexibility of cloud-like provisioning with the power and performance of dedicated hardware, unlocking higher performance, increasing reliability, freeing up compute resources, and allowing for in-depth insights on cluster health and performance through granular metrics. For customers who prefer a more managed experience, the company’s Bare Metal service also enables customers to spin up Bare Metal nodes in CKS and offload certain basic functions, such as storage and network drivers.
Application Software Services: Additional tools that accelerate and improve training and inferencing
The company’s Application Software Services allow companies to seamlessly integrate additional tools that accelerate and improve training and inference. Due to the company’s microservices-based architecture, the company’s platform is fully composable, with customers able to run their own orchestration solution or third-party applications seamlessly as part of the company’s stack. As with all the company’s other services, the company follows security best practices at every step.
Slurm on Kubernetes Integration (SUNK). Slurm is an open-source workload manager popular among AI model developers and other users of high-performance compute. It is an industry leader for the orchestration of massive parallel scheduling jobs such as training LLMs. Kubernetes, on the other hand, is designed for containerized workloads in cloud-native environments, making it particularly well suited for model serving. Customers previously had to choose between Slurm and Kubernetes on a per-cluster basis for their resource management needs, even though the two applications serve different use cases and excel in different environments. Often, this led to maintaining two distinct platforms and pools of compute. The company has eliminated the need to choose between Slurm and Kubernetes by creating SUNK, a proprietary offering released in early 2024 that integrates Slurm with CKS and allows Slurm jobs to run inside Kubernetes. This allows developers to leverage the resource management of both Slurm and Kubernetes and results in a seamless end-user experience. Different AI workloads can be co-located on the same cluster, including training, inference, and experimentation, unlocking greater workload fungibility. SUNK enhances the efficiency of compute by sharing resources between Slurm and Kubernetes and streamlining deployment of the company’s system. Introducing Slurm into the company’s stack has solved a major infrastructure pain point for the company’s customers while reducing their total cost of ownership.
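For illustration, the following is a generic sketch of the Slurm-side user experience: a standard multi-node batch job submitted with sbatch from Python. The job name, node counts, and training script are hypothetical, and the snippet shows the ordinary open-source Slurm interface rather than SUNK’s internal implementation.

    # Generic sketch: submitting a multi-node Slurm training job from Python.
    # Job name, node counts, GPU counts, and script path are hypothetical.
    import subprocess
    import textwrap

    batch_script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=llm-pretrain
        #SBATCH --nodes=16
        #SBATCH --ntasks-per-node=8
        #SBATCH --gpus-per-node=8
        #SBATCH --time=48:00:00
        srun python train.py --config configs/pretrain.yaml
    """)

    # sbatch reads the job script from stdin and prints the assigned job id.
    result = subprocess.run(
        ["sbatch"], input=batch_script, text=True, capture_output=True, check=True
    )
    print(result.stdout.strip())  # e.g. "Submitted batch job 12345"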
CoreWeave Tensorizer. CoreWeave Tensorizer is the company’s training and inference optimization solution that rapidly loads models from storage directly into GPU memory from a variety of different endpoints. It serializes models into a single binary file and incorporates a caching layer to quickly flow models to the node closest to the client, thereby dramatically cutting down the resource expenditure caused by long loading times. When tested on a larger model using a higher-performing GPU, the impact of Tensorizer on model load time becomes even more pronounced. In terms of relative performance, the company’s Tensorizer had average model load times that were 1.7x and 1.4x faster than those of SafeTensors and HuggingFace, respectively. The company’s Tensorizer solution also increases model training efficiency by enabling fast checkpoints and reducing restart times. Checkpointing ensures models are frequently saved and archived during the training process so that failures do not require full and time-consuming restarts.
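For illustration, the following is a minimal sketch of serializing and reloading a model, assuming the interface of the open-source tensorizer package that CoreWeave publishes (TensorSerializer and TensorDeserializer); exact signatures should be confirmed against the package’s documentation. The model and file path are illustrative.

    # Sketch of checkpoint serialization and fast loading, assuming the
    # open-source tensorizer package's TensorSerializer / TensorDeserializer
    # interface. Model and path are illustrative placeholders.
    import torch
    from tensorizer import TensorSerializer, TensorDeserializer

    model = torch.nn.Transformer(d_model=512, nhead=8)

    # Serialize the model's tensors into a single flat file (a local path here;
    # an object-storage URI could serve the same role).
    serializer = TensorSerializer("checkpoint.tensors")
    serializer.write_module(model)
    serializer.close()

    # Later: stream the tensors straight into GPU memory and hydrate a fresh module.
    fresh_model = torch.nn.Transformer(d_model=512, nhead=8)
    deserializer = TensorDeserializer("checkpoint.tensors", device="cuda")
    deserializer.load_into_module(fresh_model)
    deserializer.close()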
Inference Optimization & Services. The company’s platform is optimized for inference, delivering unparalleled flexibility, efficiency and access to the compute required to serve these workloads effectively. Through the company’s planned inference optimization services, customers will be able to right-size their workloads with access to a varied GPU fleet customized to their specific performance and cost requirements, and can provision what they need when they need it to match the complexity of their workloads.
Mission Control & Observability: Delivering full visibility into and control over infrastructure and workloads
AI compute infrastructure brings together multiple cutting-edge technologies including the latest and most powerful compute resources, high bandwidth memory, high-speed chip-to-chip interconnects, and node-to-node connectivity. While these technologies deliver unprecedented performance, they come with the additional overhead and complexity of dealing with node and system failures. The company’s Mission Control & Observability solution helps customers to manage their infrastructure and operations with a suite of services designed to ensure optimal setup and functionality of infrastructure components throughout their entire lifecycle, thereby preventing and rapidly remediating issues and enhancing performance. Mission Control consists of the company’s FLCC and NLCC, and is complemented by the company’s Observability solution.
Fleet LifeCycle Controller (FLCC). The company’s Fleet LifeCycle Controller automates node provisioning, testing, and monitoring to validate nodes and systems for Day 1 operations and beyond.
Node LifeCycle Controller (NLCC). The company’s Node LifeCycle Controller proactively assesses ongoing node health and performance to ensure problematic nodes are replaced with healthy ones before they cause failures. Both FLCC and NLCC are systems that are running on the company’s back-end.
Observability. The company’s Observability solution provides access to a rich collection of node and system-level metrics to complement the capabilities of Mission Control that help prevent or quickly identify system faults and restore system-level performance. Faults can occur for various reasons, such as bad user configuration (memory allocation issues), misbehaving software updates, server component malfunctions, or issues in the high speed node-to-node network. Customers can collect and receive alerts on metrics across their fleet, using dashboards that visualize either the entire cluster or individual jobs, and identify root-cause issues in a matter of minutes, as well as early warning signs that enable them to replace or repair nodes that are close to failure. The company’s Observability solution gives customers deep insight into their infrastructure down to the temperature of individual GPUs.
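For illustration, the following is a hypothetical sketch of the kind of node-level metric a customer might pull and alert on, here per-GPU temperature queried from a Prometheus-compatible endpoint. The endpoint URL and alert threshold are placeholders, and DCGM_FI_DEV_GPU_TEMP is the metric name exposed by NVIDIA’s open-source DCGM exporter, used as a stand-in rather than a documented CoreWeave API.

    # Illustrative sketch: querying a Prometheus-compatible metrics endpoint for
    # per-GPU temperatures and flagging hot devices. The endpoint URL and
    # threshold are hypothetical placeholders.
    import requests

    PROMETHEUS_URL = "https://metrics.example-cloud.com/api/v1/query"  # placeholder
    TEMP_THRESHOLD_C = 85

    resp = requests.get(
        PROMETHEUS_URL, params={"query": "DCGM_FI_DEV_GPU_TEMP"}, timeout=10
    )
    resp.raise_for_status()

    # Each series carries label metadata and a [timestamp, value] pair.
    for series in resp.json()["data"]["result"]:
        labels = series["metric"]
        temperature = float(series["value"][1])
        if temperature > TEMP_THRESHOLD_C:
            print(f"GPU {labels.get('gpu')} on node {labels.get('Hostname')} "
                  f"is at {temperature:.0f} C; consider draining the node")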
Deep Technical Partnerships
The company’s partnership strategy is to augment its capabilities by developing deep technical partnerships and relationships with companies that share the company’s vision for empowering customers and accelerating the AI ecosystem. These partnerships are central to how the company builds its products and services: they enable the company to unlock capabilities further up the stack and serve as layers of abstraction that extend the company’s reach to a broader customer base who ultimately run on the company’s infrastructure. Through the company’s partnerships, the company has been able to drive enhancements in AI Ops/inference (e.g., Run:ai), improve infrastructure capabilities (e.g., Datadog), and engineer new purpose-built data architectures (e.g., Vast Data and Pure Storage). The company’s partnership strategy further extends to its relationships and collaboration with suppliers like NVIDIA, whose GPUs the company brings to market in highly efficient and performant architectures for end customers, and to data center providers such as Switch and Chirisa Technology Parks, with which the company works to integrate the most cutting-edge design and cooling standards customized for the performance requirements of AI workloads.
Customer Experience
All of the company’s solutions are complemented by its highly skilled, AI-specialized Customer Experience teams, which provide deep insights and expertise in AI compute, networking, and storage. These teams’ deep AI expertise is fundamental to the company gaining the trust of some of the world’s leading AI labs and AI enterprises, and the company has purposefully designed its Customer Experience organization in a way that seamlessly extends the company’s expert capabilities to its customers. The company’s Customer Experience team consists of:
Fleet Ops and Cloud Ops: Teams that support the company’s FLCC and NLCC solutions and function as operations centers supporting overall system resilience across the company’s customers’ infrastructure. The company’s Fleet Ops team monitors how fleets are operating, while the company’s Cloud Ops team monitors customer-specific deployments.
Support Specialists: Dedicated Support Operations Engineers and Customer Success Engineers who are tasked with setting-up deployments and working with customers to scale workloads as quickly and efficiently as possible. These teams help the company’s customers achieve their goals with the company’s product through proactive support and resources and focus on understanding and assisting customers post-sale to foster long-term relationships. The company’s team of engineering experts is available 24/7 to assist customers and provide the support needed to ensure optimal cluster performance. Support specialists also include the company’s Documentation team, which creates and maintains the technical documentation to support the company’s customer success, sales, and marketing teams.
Solutions Architects: Experts who work with the company’s engineering teams to ensure customer infrastructure runs at peak performance. The company’s solutions architects focus on helping customers understand and derive value from the company’s platform, and cover both pre-sale and post-sale processes, including proof-of-concept, implementation, switching to new architectures, and ongoing management/deployment. They ensure customers derive maximum value from their infrastructure and assist with any tuning work required to unlock the leading performance of the company’s platform.
Across the company’s entire platform, the company is committed to delivering robust security solutions that maintain system integrity, availability, and data security. The company continuously tests its platform for security vulnerabilities using automated testing methods and has teams dedicated to penetration testing, enterprise vulnerability management, and application security. The company also has a Security Operations team that monitors security events in real time 24 hours a day to provide continuous protection.
Data Center Footprint
The company’s platform is powered by some of the largest and most sophisticated data centers in the world, consisting of complex installations of the most cutting-edge GPU accelerators. These large clusters are designed and built with pioneering network architectures and a ‘no compromise’ philosophy that centers around maximizing performance for AI workloads. The state-of-the-art hardware deployed in the company’s data centers requires the most forward thinking physical infrastructure designs to deliver highly performant networking, power, and cooling characteristics. Each component of the company’s data center technology stack is purposefully selected to contribute to, and compound, this performance advantage. The result is a geographically distributed and highly performant data center footprint underpinned by cutting-edge components, high-density design configurations, and robust security standards.
Cutting-edge data center technologies maximize rack density. The company’s data centers are purposefully designed to maximize rack density and drive power per rack. They utilize advanced liquid cooling systems that enable the efficient use of space by removing the bulky chillers and airflow management equipment required in air cooling systems. The improved heat capacity of liquid cooling systems also allows the company to stand up racks closer together, supporting higher power density while preventing performance degradation of the company’s systems due to overheating. Liquid cooling systems, while highly efficient at supporting dense, power-intensive workloads, require a fundamental redesign of the data center in order to accommodate necessary piping, pumps, and heat exchangers. This often entails having a separate subfloor to contain routing and plumbing infrastructure while keeping the main floor more organized and compact to drive density.
These design requirements render traditional cloud service providers, who are tied to significant existing air-cooled data center footprints and buildout plans, incapable of maximizing their data center capacity without time-intensive and costly retrofitting. Liquid cooling, and the greater power density and efficiency it unlocks, is not a nice-to-have feature but rather an essential component of any GPU-based infrastructure that is purpose-built for AI.
Broad Footprint. The company has architected a distributed and interconnected portfolio of data centers that delivers a high density of compute close to major population centers, thereby minimizing latency for end users. The company’s portfolio covers cities spanning the United States and is rapidly expanding into new markets in Europe, delivering high-compute capacity to regions that are capacity constrained.
Massive Scale. Delivering highly performant and flexible infrastructure means building high capacity data centers that can quickly scale from tens to hundreds of megawatts in response to burst workloads, i.e., sudden surges in demand. The company leverages dark fiber connectivity to deliver rapid and efficient inter-data-center connectivity and enable customer workloads to burst across regions. By connecting multiple active purpose-built data centers with high speed interconnects, the company delivers the flexible compute required for large scale supercomputers optimized for AI workloads, and can scale this compute to match the company’s customers’ needs.
Embedded Security. The company reinforces its infrastructure with industry leading security standards and certifications, including SOC 2 and ISO 27001, to ensure that customers are met with the most robust data security practices. The company’s security measures also extend to its physical security, where the company employs rigorous standards around background checks, access control, security awareness training, and a zero-trust framework. This multi-layered approach ensures a secure, resilient operating environment at every level of the company’s data centers.
Altogether, these design considerations have enabled the company to deliver some of the most performant, efficient, and secure infrastructure deployments in the industry, at unprecedented scale. The company’s innovation also extends beyond the physical hardware and infrastructure powering the company’s data centers to the supporting software and infrastructure services. This allows customers to run their AI jobs without the burden of managing the underlying infrastructure themselves.
As of December 31, 2024, the company had more than 360 MW of active power and approximately 1.3 GW of total contracted power. This includes 32 cutting-edge data center facilities in the United States across 15 states, as well as facilities in three countries. Importantly, the company’s power capacity is available in large individual deployments that are capable of serving large infrastructure clusters, which is critical for the company’s larger customers.
In May 2024, the company opened its European headquarters in London to help facilitate the company’s expansion to Europe. The company’s initial expansion to Europe includes five sites in the United Kingdom, Spain, and Sweden. In June 2024, the company announced an investment to expand in Spain and Sweden and launch in Norway, which the company expects will more than double the company’s MW of capacity by the end of 2025.
Customers
The company offers a solution used by organizations of all sizes that require sophisticated AI compute, from the largest of enterprises to small, well-funded start-ups. The company’s customers today are segmented into two key verticals: AI Natives and AI Enterprises. Companies within these verticals include the builders of AI, the integrators of AI, and those that have business models whose success largely hinges on deploying AI capabilities in their core products and technology stack. The company’s differentiation lies in the specialized AI infrastructure that the company delivers and makes accessible to its customers on Day 1, which enables them to accelerate their time to market and achieve lower total costs of ownership. The company shifts the entire burden of managing infrastructure from the company’s customers to its platform, allowing them to spend more time building models, serving models, or focusing on critical strategic priorities such as growing their customer base or developing their product roadmaps.
Customer Verticals
The company’s customers include:
AI Natives: Customers whose core competency and singular focus is AI. These companies serve as important layers of abstraction for the market and enable end customers across industries to unlock the power of AI by accessing the power of foundational models. This vertical includes:
AI Labs: Research organizations building their own foundational models through proprietary datasets, and creating products to deliver those models to the market. While these labs have tended to focus more on building general-purpose models, industry-specific AI labs, such as the AI cluster at the Chan Zuckerberg Initiative Foundation supporting research in the life sciences, are expected to become a growing part of this segment. These AI labs deliver their general purpose models or custom versions of those models via APIs to businesses of all shapes and sizes in order to power a growing range of applications. The company’s AI Lab customers include, among others, Cohere and Mistral AI.
AI Ops / ML Ops: Organizations providing software or platform solutions ‘up the stack’ from cloud infrastructure to make AI adoption easier for businesses with inference or training needs. These companies provide a range of solutions, including access to pre-trained model hubs and datasets, APIs to streamline model deployment, and dedicated tooling that helps optimize training and fine-tuning techniques. Companies within this space will be critical enablers of AI and its broader adoption as they will support customers who either lack the willingness, expertise, or financial means to build proprietary models through their own research teams and dedicated infrastructure. AI Ops / ML Ops customers include, among others, Replicate.
AI Enterprises: Large companies whose business and product are not AI, but are being driven by AI. These include Fortune 500 companies and other large enterprises with business models that are increasingly AI-enabled through their own development efforts and/or that leverage AI directly to drive internal efficiency gains. The company has seen initial traction in adopting its platform from big tech, financial institutions, and life sciences companies to date with customers such as Meta, IBM, Microsoft, and Jane Street.
In addition to the AI Natives and AI Enterprises who the company sells to directly, there is a large segment of the market that uses the company’s platform, solutions and services indirectly. This includes the tens of thousands of enterprises that have not yet built in-house AI models or systems, but who use solutions developed by AI Natives. As AI continues to find product market fit, the company expects these enterprises to grow their indirect consumption of the company’s platform through AI labs. Furthermore, as these enterprises increase their consumption and the company’s product roadmap evolves and verticalizes up-the-stack, the company expects that it will serve more of these enterprise customers directly through the company’s dedicated solutions and partnerships.
For the year ended December 31, 2024, the company’s largest customer was Microsoft, which accounted for 62% of the company’s revenue.
In February 2023, the company entered into a Master Services Agreement (the ‘Microsoft Master Services Agreement’) with Microsoft, pursuant to which the company provides Microsoft with access to its infrastructure and platform services through fulfillment of reserved capacity orders submitted to the company by Microsoft and as may be amended upon the company’s and Microsoft’s mutual agreement.
The company’s focus on committed contracts and revenue from some of the world’s leading providers of AI is deliberate in that these close relationships have allowed the company to scale rapidly and provide the company with unique insights into delivering highly performant and efficient AI compute at scale. In turn, the company enables these customers to incorporate better AI into their products to drive faster time to market and broader adoption, thereby reinforcing their continued adoption of the company’s platform.
Competition
The company’s primary competitors are larger, global enterprises that offer general purpose cloud computing as part of a broader, diversified product portfolio. Key companies in this category are Amazon (AWS), Google (Google Cloud Platform), IBM, Microsoft (Azure), and Oracle.
In addition to these large companies, the company also competes with smaller cloud service providers, including Crusoe and Lambda.
Research and Development
The company’s research and development costs were $56 million for the year ended December 31, 2024.
History
The company was founded in 2017. It was formed in 2017 as a Delaware limited liability company under the name The Atlantic Crypto Corporation LLC and converted to a Delaware corporation in 2018 under the name Atlantic Crypto Corporation.