Efficient Networking Management for Enterprises in AWS with Shared VPCs

Introduction
How do enterprises efficiently manage networking across hundreds of cloud accounts while maintaining security, reducing costs, and minimizing operational overhead? As organizations such as Swisscom adopt Amazon Web Services (AWS), they face the complex challenge of implementing scalable, automated networking solutions that break away from traditional high-touch, manual approaches.
When Swisscom, Switzerland’s leading telecom provider, began their AWS cloud journey, they sought to revolutionize networking through a fully automated, secure, centrally governed implementation. Using AWS Shared VPCs and strategic automation allowed Swisscom to create a networking model that not only reduces IPv4 waste but also enables cost-effective scaling across hundreds of accounts while dramatically improving operational efficiency.
This post describes Swisscom’s requirements and their innovative journey to implement a large-scale, automated networking solution on AWS.
Swisscom’s Requirements
When adopting cloud networking, you quickly realize that one size doesn’t fit all. For Swisscom, defining clear requirements was crucial to creating a successful networking strategy. The following key considerations shaped their approach:
1. Multi-tenant but centrally governed:
The network constructs should automatically extend to provisioned accounts, allowing reuse of critical centrally deployed and governed resources.
2. Highly automated:
The solution should be fully automated, not needing manual steps or the intervention of engineering or operational teams.
3. Self-service:
The application teams can self-provision and deprovision networking components as needed for their workloads.
4. Secure:
The networking components align to segmentation and zoning architecture agreed by security teams.
5. Reducing IPv4 waste:
IPv4 addresses are finite resources, thus the amount of Swisscom routable IPs used in AWS must be kept as small as possible.
6. Cost-effective:
This should scale to hundreds of accounts while being as cost-efficient as possible.
To achieve these requirements, especially the cost efficiency and central governance, Swisscom decided to use Shared VPCs to minimize the number of VPCs and associated resources (such as NAT Gateways, AWS Transit Gateway Attachments, and VPC Endpoints). With Shared VPCs, a centrally deployed VPC shares its subnets with other accounts using AWS Resource Access Manager (AWS RAM). The VPC stays centrally managed, but application teams can deploy resources into it from their own accounts. Furthermore, using Shared VPCs allows Swisscom to control the usage of internally routable IPv4 addresses, limiting the waste of these finite resources.
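To make the sharing mechanism concrete, the following minimal sketch builds the request that a platform account could send to AWS RAM to share subnets with application accounts. The helper names, account IDs, Region, and subnet IDs are hypothetical placeholders; only the parameter names (name, resourceArns, principals, allowExternalPrincipals) follow the AWS RAM CreateResourceShare API. It illustrates the idea and is not Swisscom’s implementation.

```python
from typing import Any


def subnet_arn(region: str, owner_account_id: str, subnet_id: str) -> str:
    """Build the ARN of a subnet owned by the central platform account."""
    return f"arn:aws:ec2:{region}:{owner_account_id}:subnet/{subnet_id}"


def build_resource_share_request(
    share_name: str,
    region: str,
    owner_account_id: str,
    subnet_ids: list[str],
    consumer_account_ids: list[str],
) -> dict[str, Any]:
    """Return keyword arguments for the AWS RAM CreateResourceShare call."""
    return {
        "name": share_name,
        "resourceArns": [
            subnet_arn(region, owner_account_id, subnet_id)
            for subnet_id in subnet_ids
        ],
        # Principals can be account IDs; here, the application accounts.
        "principals": list(consumer_account_ids),
        # Keep the share inside the AWS Organization.
        "allowExternalPrincipals": False,
    }


# Hypothetical values, for illustration only.
example_request = build_resource_share_request(
    share_name="dev-private-routable",
    region="eu-central-1",
    owner_account_id="111111111111",
    subnet_ids=["subnet-0aaa", "subnet-0bbb"],
    consumer_account_ids=["222222222222"],
)
```

With credentials for the platform account, a dictionary like this could be passed to boto3 as boto3.client("ram").create_resource_share(**example_request). Keeping the request builder free of network calls makes it easy to unit test.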
VPC Architecture
Swisscom hosts the Shared VPC centrally, in a “Platform VPC account” managed by the Cloud Platform Engineering team.
The Shared VPC architecture is made up of three types of subnets:
Public, externally routable subnets:
For traffic to and from the internet, for example through public Application Load Balancers. Outbound internet connectivity is achieved through an Internet Gateway (IGW) and NAT Gateway to limit usage of public IPv4 addresses.
Private, internally routable subnets:
Using Swisscom assigned internal routable IPv4 address space. This ensures that it can be reached over Swisscom’s dedicated Direct Connect connection from an on-premises datacenter.
Private, non-routable subnets:
Using Swisscom assigned IPv4 address space that is locally significant to the VPC. This is used to host resources that don’t need ingress connectivity from on-premises or the internet, such as Kubernetes Pods and backend databases that consume a significant number of IPv4 addresses. Resources in this space can access services beyond the VPC through private or public NAT gateways.
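The effect of this split on routable address consumption can be estimated with a few lines of arithmetic. The sketch below uses Python’s ipaddress module and two hypothetical CIDR ranges (a small routable block and a larger non-routable block); the ranges are placeholders chosen for illustration, not Swisscom’s actual address plan. The only AWS-specific fact used is that five addresses are reserved in every subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr: str) -> int:
    """Number of addresses in a subnet that workloads can actually use."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def routable_fraction(routable: list[str], non_routable: list[str]) -> float:
    """Share of all usable addresses that consume internally routable space."""
    routable_total = sum(usable_addresses(cidr) for cidr in routable)
    non_routable_total = sum(usable_addresses(cidr) for cidr in non_routable)
    return routable_total / (routable_total + non_routable_total)


# Hypothetical address plan for one Shared VPC.
ROUTABLE_SUBNETS = ["10.20.0.0/25"]       # reachable from on-premises
NON_ROUTABLE_SUBNETS = ["100.64.0.0/18"]  # locally significant to the VPC

fraction = routable_fraction(ROUTABLE_SUBNETS, NON_ROUTABLE_SUBNETS)
print(f"{fraction:.2%} of usable addresses consume routable space")
```

In a plan like this, address-hungry workloads such as Pods draw from the non-routable block, so only a small fraction of the VPC’s capacity counts against the organization’s routable IPv4 space.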

Automated Networking with Shared VPCs at Swisscom
A standardized Shared VPC in the Swisscom Organization looks like the following:
1. Platform VPC Account:
The VPCs are deployed by the platform engineering team in a dedicated “Platform VPC” account per environment, such as development, staging, and production. Each VPC consists of multiple subnets:
a) Public routable subnets:
For internet connectivity. Public routable subnets are shared among multiple accounts.
b) Private routable subnet:
For intra-VPC and on-premises connectivity. Private routable subnets are shared among multiple accounts.
c) Multiple private non-routable subnets:
Dynamically created based on application team requirements. Private non-routable subnets are shared to only one account, and dedicated to specific applications.
d) One non-routable endpoint subnet:
Hosts data plane service endpoints for services with low-latency requirements. Endpoint subnets are never shared with any account.
e) Transit Gateway attachment subnet:
For attaching the VPC to the transit gateway that resides in the centralized Platform VPC account. Transit gateway attachment subnets are never shared.
2. Team 1 Account:
Application teams can request sharing the different subnets to their account. Private routable subnets are always shared. Furthermore, Team 1 requested the public subnet and a dedicated, non-routable subnet.
3. Team 2 Account:
The Team 2 workload doesn’t need public exposure, as they operate an internal web application only. Therefore, they didn’t request a public subnet.
4. Team 3 Account:
Like Team 2, Team 3 doesn’t need public exposure. They host only an internal-facing Amazon Redshift database, and its endpoint must be reachable from on-premises.
The use of Shared VPCs drove the requirement for an automated solution, which allowed internal application teams to request access to the Shared VPCs being managed centrally. When making this request, the automation creates and shares subnets to their account for consumption while implementing the Swisscom public cloud zoning model.
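The sharing rules described above can be expressed as a small policy table. The following sketch is one possible way to model them; the subnet type names, the Sharing enum, and the subnets_for_account function are hypothetical and only mirror the rules stated in this post (private routable subnets are always shared, public and dedicated non-routable subnets are shared on request, endpoint and Transit Gateway attachment subnets are never shared).

```python
from enum import Enum


class Sharing(Enum):
    ALWAYS = "always"          # shared with every onboarded account
    ON_REQUEST = "on_request"  # shared with multiple accounts when requested
    DEDICATED = "dedicated"    # created for and shared with exactly one account
    NEVER = "never"            # stays in the Platform VPC account


# Hypothetical names for the subnet types described in the architecture.
SUBNET_POLICY = {
    "private_routable": Sharing.ALWAYS,
    "public_routable": Sharing.ON_REQUEST,
    "private_non_routable": Sharing.DEDICATED,
    "endpoint": Sharing.NEVER,
    "tgw_attachment": Sharing.NEVER,
}


def subnets_for_account(requested: set[str]) -> set[str]:
    """Return the subnet types an application account ends up with."""
    granted = {
        name for name, policy in SUBNET_POLICY.items()
        if policy is Sharing.ALWAYS
    }
    for name in requested:
        policy = SUBNET_POLICY.get(name)
        if policy is None:
            raise ValueError(f"unknown subnet type: {name}")
        if policy is Sharing.NEVER:
            raise PermissionError(f"{name} subnets are never shared")
        granted.add(name)
    return granted


# Team 1 asks for a public subnet and a dedicated non-routable subnet.
team1 = subnets_for_account({"public_routable", "private_non_routable"})
# Team 2 runs an internal application and requests nothing extra.
team2 = subnets_for_account(set())
```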
Automation Walkthrough
Automation plays a significant role in enabling application teams to consume Shared VPCs. It simplifies operations by handling requests and provisioning, managing subnet capacity across multiple teams, and scaling beyond a single VPC.
Swisscom built an automated solution to handle these considerations. The high-level flow is shown in the following steps:
1. Establish shared VPCs:
2. Automated capacity management:
In a multi-tenant Shared VPC setup, you must track the usage of key VPC metrics to measure their readiness for new applications. A capacity tracking application is used to measure and score the VPC health. This application uses Amazon CloudWatch events to periodically trigger AWS Lambda functions, which store capacity data in Amazon DynamoDB.
3. Provisioning requests with Service Catalog:
The application team uses a Service Catalog product to request VPC subnets, using a Lambda custom resource to send requests to AWS Step Functions in the “Platform VPC” account for handling the request.
4. VPC scheduler:
Step Functions validates the VPC that should be used by checking the environment (in this case Dev) and the VPC health score. The net result is the selection of a VPC with appropriate capacity.
5. Subnet sharing:
The subnets are shared with the application account. The account admins can’t change the VPC setup, but they consume it by deploying workloads into the shared subnets from their own account.
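Step 3 of this flow can be sketched as a translation from a CloudFormation custom resource event into the arguments for a Step Functions execution. In the sketch below, the event fields RequestType, RequestId, and ResourceProperties follow the CloudFormation custom resource request format, and stateMachineArn, name, and input are the parameters of the Step Functions StartExecution API. The property names (Environment, Size), the payload layout, and the ARN are assumptions made for illustration.

```python
import json

VALID_REQUEST_TYPES = {"Create", "Update", "Delete"}


def build_start_execution_args(
    event: dict, state_machine_arn: str, account_id: str
) -> dict:
    """Translate a custom resource event into StartExecution arguments."""
    request_type = event["RequestType"]
    if request_type not in VALID_REQUEST_TYPES:
        raise ValueError(f"unsupported request type: {request_type}")

    properties = event.get("ResourceProperties", {})
    payload = {
        "action": request_type,
        "account_id": account_id,
        # Hypothetical product parameters chosen by the application team.
        "environment": properties.get("Environment"),
        "size": properties.get("Size"),
    }
    return {
        "stateMachineArn": state_machine_arn,
        # Execution names are limited to 80 characters.
        "name": f"{account_id}-{event['RequestId']}"[:80],
        "input": json.dumps(payload, sort_keys=True),
    }


# Hypothetical event and ARN, for illustration only.
example_event = {
    "RequestType": "Create",
    "RequestId": "11111111-2222-3333-4444-555555555555",
    "ResourceProperties": {"Environment": "dev", "Size": "M"},
}
example_args = build_start_execution_args(
    example_event,
    "arn:aws:states:eu-central-1:111111111111:stateMachine:vpc-scheduler",
    "222222222222",
)
```

A Lambda handler in the application account could pass these arguments to boto3.client("stepfunctions").start_execution(**example_args) after assuming a role that is allowed to start the state machine in the “Platform VPC” account.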
In the next section, we walk through some of the key features of this solution.
Automated Capacity Management
Shared VPCs involve multiple tenants sharing the same address space, making networking management and capacity planning crucial. Swisscom identified this requirement and built an automated capacity management solution that regularly monitors the provisioned Shared VPCs and their readiness to host new applications using a scoring system. Furthermore, the scoring system ensures that existing applications have room to grow within the VPCs, and if not, it alerts the Platform team through CloudWatch alarms.
Hundreds of accounts are expected to request a VPC, so it is inevitable that capacity within a single VPC will eventually be fully consumed. Swisscom needed a way to scale the number of VPCs without manually assigning a VPC to each account. To address this, VPCs are “pooled” together into logical groups. These groups help identify the application environment (such as dev, pre-production, production) and the functional group (such as general workload, Telco, Streaming, and Analytics) using tags.
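The pooling idea can be illustrated with a short grouping function. The sketch below assumes each VPC record carries an environment tag and a function tag; the record layout, tag keys, and VPC IDs are hypothetical.

```python
from collections import defaultdict


def pool_key(tags: dict[str, str]) -> tuple[str, str]:
    """A pool is identified by its environment and functional group."""
    return (tags["environment"], tags["function"])


def group_into_pools(vpcs: list[dict]) -> dict[tuple[str, str], list[str]]:
    """Group VPC IDs by the pool their tags place them in."""
    pools: dict[tuple[str, str], list[str]] = defaultdict(list)
    for vpc in vpcs:
        pools[pool_key(vpc["tags"])].append(vpc["vpc_id"])
    return dict(pools)


# Hypothetical inventory, for illustration only.
EXAMPLE_VPCS = [
    {"vpc_id": "vpc-001", "tags": {"environment": "dev", "function": "general"}},
    {"vpc_id": "vpc-002", "tags": {"environment": "dev", "function": "general"}},
    {"vpc_id": "vpc-003", "tags": {"environment": "production", "function": "analytics"}},
]
example_pools = group_into_pools(EXAMPLE_VPCS)
```

Because requests are matched against a pool rather than a specific VPC, adding capacity only means deploying another VPC with the same tags.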
When the system creates a new VPC, it adds it to an inventory of active VPCs within DynamoDB using the following process:
1. The platform engineering team:
Deploys a new VPC with tags for the pool (a logical grouping of VPCs) and function (alignment to a Swisscom division or use case).
2. For new VPCs:
A CloudWatch event triggers a Lambda function to collect VPC details. For existing VPCs, this is regularly validated using a scheduled CloudWatch event.
3. The Lambda function:
Collects key metrics, such as subnet usage and tags, and calculates a health score.
4. Details are stored:
In a DynamoDB table to be referenced when VPC requests are received.
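The post does not describe how the health score is calculated, so the following sketch uses an assumed formula: the score is the percentage of free addresses in the fullest subnet, because that subnet is the first to run out. The SubnetUsage type, the item layout, and the tag keys are hypothetical. In a real collector, the free-address figure could come from the AvailableIpAddressCount field that the EC2 DescribeSubnets API returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetUsage:
    subnet_id: str
    total_ips: int
    available_ips: int


def health_score(subnets: list[SubnetUsage]) -> int:
    """Score from 0 to 100, driven by the fullest subnet (assumed formula)."""
    if not subnets:
        return 0
    return min(100 * s.available_ips // s.total_ips for s in subnets)


def inventory_item(vpc_id: str, tags: dict[str, str],
                   subnets: list[SubnetUsage]) -> dict:
    """Build the record a collector could store in the inventory table."""
    return {
        "vpc_id": vpc_id,
        "pool": tags["pool"],
        "function": tags["function"],
        "health_score": health_score(subnets),
        "available_ips": sum(s.available_ips for s in subnets),
    }


# Hypothetical measurements, for illustration only.
example_item = inventory_item(
    "vpc-001",
    {"pool": "dev", "function": "general"},
    [SubnetUsage("subnet-a", 200, 150), SubnetUsage("subnet-b", 100, 20)],
)
```

A scheduled Lambda function could write such an item with the DynamoDB PutItem operation and raise an alarm when the score drops below a chosen threshold.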

VPC Provisioning Requests
The platform engineering team centrally creates and shares well-architected Service Catalog products with application accounts, allowing application owners to self-provision VPCs. This approach streamlines networking management, ensuring that application account users don’t need permissions to manage networking in the account. By using constraints, the Service Catalog product retains the necessary permissions to provision required resources. The product retrieves some parameters from pre-provisioned AWS Systems Manager parameters, which define data aligned to account characteristics (for example, the environment being Dev or the line of business). The system enforces the use of Systems Manager parameters to assign the right VPC to the account, ensuring proper zoning and security for its specific use case.
The Service Catalog product enables users to select t-shirt sizes and other parameters, ensuring the assignment of the requested capacity.
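As an illustration of how a t-shirt size could translate into capacity, the sketch below maps each size to a subnet prefix length and finds the first free range of that size inside a non-routable block. The size-to-prefix table, the function names, and the CIDR ranges are assumptions; the post does not state which sizes or prefixes Swisscom uses.

```python
import ipaddress

# AWS reserves five addresses in every subnet.
AWS_RESERVED_PER_SUBNET = 5

# Hypothetical mapping of t-shirt sizes to subnet prefix lengths.
TSHIRT_PREFIX = {"S": 27, "M": 26, "L": 25, "XL": 24}


def prefix_for(size: str) -> int:
    """Validate the requested size and return its prefix length."""
    if size not in TSHIRT_PREFIX:
        raise ValueError(f"unknown t-shirt size: {size}")
    return TSHIRT_PREFIX[size]


def usable_ips(size: str) -> int:
    """Addresses a workload can use in a subnet of the given size."""
    return 2 ** (32 - prefix_for(size)) - AWS_RESERVED_PER_SUBNET


def first_free_subnet(block: str, allocated: list[str], size: str) -> str:
    """Return the first CIDR of the requested size that overlaps nothing."""
    taken = [ipaddress.ip_network(cidr) for cidr in allocated]
    candidates = ipaddress.ip_network(block).subnets(new_prefix=prefix_for(size))
    for candidate in candidates:
        if not any(candidate.overlaps(existing) for existing in taken):
            return str(candidate)
    raise LookupError(f"no free /{prefix_for(size)} left in {block}")


# Hypothetical non-routable block with one subnet already allocated.
next_subnet = first_free_subnet("100.64.0.0/22", ["100.64.0.0/26"], "M")
```

Fixed sizes keep the request form simple for application teams and make the remaining capacity of a VPC easy to reason about.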

VPC Scheduler
The VPC scheduler is responsible for receiving and processing the requests from the AWS Service Catalog products, providing a reliable cross-account automation for assigning the Shared VPC networking resources to the requestor. This was an area of trial and error for Swisscom. Their first design used Lambda custom resources to assume a role in the platform account and trigger a Service Catalog product there, which provisioned and shared the VPC subnets with the initiating spoke account.
Although initially this setup was functional, it increased operational load on the Swisscom Platform Team due to the following limitations:
1. Having multiple Service Catalog products:
Needlessly increased the complexity.
2. Service Catalog doesn’t natively support:
Retry mechanisms or error management.
As a result, the team re-architected the central scheduler into Step Functions. This was more customizable and effective in handling retry and errors, and integrated directly with the AWS CloudFormation service whenever possible.
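The retry and error handling that Step Functions adds can be sketched as an Amazon States Language definition, built here as a Python dictionary. The state names, the Lambda function ARNs, and the retry values are hypothetical, and every step is modeled as a Lambda task for brevity, even though the post notes that Swisscom integrated directly with CloudFormation where possible. The Retry and Catch fields and the States.TaskFailed and States.ALL error names are part of the Amazon States Language.

```python
import json

STEPS = ["ValidateInput", "SelectVpc", "CreateAndShareSubnets", "ReturnOutputs"]


def build_scheduler_definition(task_arns: dict[str, str]) -> dict:
    """Build a linear state machine with shared retry and catch rules."""
    retry = [{
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 5,   # wait before the first retry
        "MaxAttempts": 3,
        "BackoffRate": 2.0,     # double the wait on each further attempt
    }]
    catch = [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",  # keep the error next to the original input
        "Next": "ReportFailure",
    }]
    states: dict[str, dict] = {}
    for index, name in enumerate(STEPS):
        state = {
            "Type": "Task",
            "Resource": task_arns[name],
            "Retry": retry,
            "Catch": catch,
        }
        if index + 1 < len(STEPS):
            state["Next"] = STEPS[index + 1]
        else:
            state["End"] = True
        states[name] = state
    states["ReportFailure"] = {
        "Type": "Fail",
        "Error": "VpcRequestFailed",
        "Cause": "The subnet request could not be completed.",
    }
    return {
        "Comment": "Hypothetical VPC scheduler",
        "StartAt": STEPS[0],
        "States": states,
    }


# Hypothetical Lambda ARNs, for illustration only.
EXAMPLE_ARNS = {
    name: f"arn:aws:lambda:eu-central-1:111111111111:function:{name}"
    for name in STEPS
}
definition_json = json.dumps(build_scheduler_definition(EXAMPLE_ARNS), indent=2)
```

Declaring retries in the state machine keeps the task code free of retry loops, and the catch rule routes any error that is still unhandled after the retries to a single failure state that can report back to the requesting account.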

The Step Functions workflow performs the following steps:
1. Input validation:
Verifies the incoming request event (Create, Update, or Delete) from the spoke Service Catalog product.
2. VPC selection:
Queries the DynamoDB table to find a suitable VPC based on requested t-shirt size, function, capacity, and tags.
3. Subnet creation:
Creates new subnets and shares them with AWS RAM using CloudFormation.
4. Systems Manager parameters:
Retrieves the outputs of the CloudFormation template, such as Subnet-IDs, and returns them to the requesting account.
5. Error handling:
Implements retry logic for transient failures and provides clear error messages for permanent failures.
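The VPC selection step can be sketched as a filter over the inventory records followed by a ranking. The record layout, the minimum score of 20, and the tie-breaking rule (highest health score first, then most available addresses) are assumptions for illustration; the post only states that the selection considers t-shirt size, function, capacity, and tags.

```python
def select_vpc(
    inventory: list[dict],
    pool: str,
    function: str,
    required_ips: int,
    min_score: int = 20,
) -> str:
    """Pick the healthiest VPC in the pool that can fit the request."""
    candidates = [
        vpc for vpc in inventory
        if vpc["pool"] == pool
        and vpc["function"] == function
        and vpc["health_score"] >= min_score
        and vpc["available_ips"] >= required_ips
    ]
    if not candidates:
        # A permanent failure: retrying will not create capacity.
        raise LookupError(
            f"no VPC in pool {pool}/{function} can fit {required_ips} addresses"
        )
    best = max(candidates, key=lambda v: (v["health_score"], v["available_ips"]))
    return best["vpc_id"]


# Hypothetical inventory records, for illustration only.
EXAMPLE_INVENTORY = [
    {"vpc_id": "vpc-001", "pool": "dev", "function": "general",
     "health_score": 15, "available_ips": 900},
    {"vpc_id": "vpc-002", "pool": "dev", "function": "general",
     "health_score": 60, "available_ips": 400},
    {"vpc_id": "vpc-003", "pool": "production", "function": "general",
     "health_score": 90, "available_ips": 2000},
]
selected = select_vpc(EXAMPLE_INVENTORY, "dev", "general", required_ips=59)
```

Raising a distinct error when no VPC qualifies lets the workflow treat it as a permanent failure and alert the platform team to add a VPC to the pool, rather than retrying.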
Subnet Sharing
Upon completion of the VPC Scheduler, the application team receives a success message for the provisioned Service Catalog product. They can update and delete their Shared VPC if necessary. These actions would retrigger the cross-account automation. The application team can now provision services into the newly assigned VPC, inheriting the zoning, connectivity, and IP addressing, but they can’t change it.
Lessons Learned
One of the most significant advantages of the shared VPC approach is the simplicity it brings to consumers. Using a shared VPC relieves consumers of the platform from the burden of dealing with complex network configurations and time-consuming operations. Instead, they can focus their efforts on core business functions, allowing for increased productivity and efficiency. This simplicity not only enhances overall user experience but also reduces the learning curve for new team members. Although consumers benefit from simplicity, providers must navigate the complexities of networking management, building and managing the shared infrastructure, and the associated automation. This ensures its availability, security, and seamless integration with various consumer teams.
At Swisscom, they learned that their internal platform consumers appreciate this solution but also expect it to work flawlessly at any given time. After all, a self-developed solution must not offer a lower service level than the battle-proven services that AWS offers. However, after investing in and building up the domain expertise required for this shared VPC concept, they quickly reaped the fruit of their labor. Today, Swisscom’s organization on AWS hosts more than 800 accounts. For this footprint, they calculated that an account-dedicated VPC design would be approximately 30 times more expensive than the implemented shared VPC design. Thanks to the scalable architecture, cost savings are likely to be even higher with a growing number of accounts within the organization.
Conclusion
Swisscom’s cloud journey started on the premise of building standardization, optimizing costs, and reducing complexity or cognitive load through automation. Choosing Shared VPCs required careful consideration of the application onboarding journey, shifting the heavy lifting from users to platform providers, who leverage automation to ease the operational burden.
Although the shared VPC approach introduces additional complexity for platform providers, the centralization of resources and networking management ultimately results in a reduction in overall costs. Using shared infrastructure allowed platform providers to optimize resource usage, monitor performance on a broader scale, and minimize redundant network components. Although the initial setup and ongoing maintenance may require additional investments, the long-term cost benefits and simplified experience for consumer teams make it worthwhile.
Cloudastra helps organizations like Swisscom by offering relevant services that streamline cloud management, enhance automation, and optimize resource utilization, ensuring a seamless transition to cloud infrastructure while maintaining cost-effectiveness and security.