CloudWorld Stage E
Wednesday, February 17, 2021
Talk to executives in the IT divisions of large enterprises about security, and the conversation will invariably hover around questions such as:
Is DevSecOps all you need to do for security in your IT division, or is there more?
What does bringing a security culture into an engineering context actually mean?
What handshake is needed between the IT function and the security/risk function in large enterprises?
How does this change the roles and responsibilities of a developer?
This talk attempts to answer questions such as these using real-world examples of transformations seen in Fortune 100 companies.
BUSINESS PROBLEM & CHALLENGE
Network automation was not well practiced or well understood inside our network engineering team, but it was sorely needed. We needed to reduce the effort and mistakes involved in daily management tasks by minimizing direct human interaction with network devices. High on our priority list was improving network security by recognizing and fixing security vulnerabilities, and increasing network performance.
HOW WE OVERCAME THE CHALLENGE
We started by simplifying daily workflows, baselining our configurations and removing snowflakes. While this can be very labour-intensive at the outset when you’re working at a global scale in a highly critical customer environment, the long-term benefits far outweighed the labour.
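The baselining step above can be sketched in a few lines. This is a hypothetical illustration, not the team's actual tooling: the baseline lines, device names, and configs are invented, and a real workflow would pull running configs from the devices themselves.

```python
import difflib

# Hypothetical golden baseline every device should match (invented lines).
BASELINE = [
    "ntp server 10.0.0.1",
    "logging host 10.0.0.2",
    "snmp-server community readonly ro",
]

def find_snowflakes(device_configs):
    """Return {device: unified-diff lines} for devices that drift from the baseline."""
    drift = {}
    for device, config in device_configs.items():
        diff = list(difflib.unified_diff(
            BASELINE, config, fromfile="baseline", tofile=device, lineterm=""))
        if diff:  # an empty diff means the device is compliant
            drift[device] = diff
    return drift

configs = {
    "edge-router-1": list(BASELINE),  # compliant
    "edge-router-2": BASELINE[:-1] + ["snmp-server community s3cret ro"],  # snowflake
}
print(list(find_snowflakes(configs)))  # only the drifted device appears
```

Once deviations are visible as diffs, "removing snowflakes" becomes a reviewable, repeatable task rather than tribal knowledge.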
Next, we created an inventory file listing every network device by type, model, location and IP address. This let us retrieve information about devices and, using network programming and automation, deploy to all devices or to a subset (e.g. only those in a specific area), depending on what was needed. The benefit: we avoided manual configuration and logging into hundreds of different devices to change each one by hand.
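The inventory-driven approach can be sketched as below. The field names, device entries, and the deploy placeholder are assumptions for illustration; in practice the push would go through a tool such as Ansible or Netmiko rather than a stub function.

```python
# Illustrative inventory (invented devices; a real one might be YAML or CSV).
INVENTORY = [
    {"name": "sw-lon-01", "type": "switch", "model": "C9300",   "location": "london", "ip": "10.1.0.1"},
    {"name": "rt-lon-01", "type": "router", "model": "ASR1001", "location": "london", "ip": "10.1.0.2"},
    {"name": "sw-nyc-01", "type": "switch", "model": "C9300",   "location": "nyc",    "ip": "10.2.0.1"},
]

def select_devices(inventory, **filters):
    """Return devices matching all given attribute filters, e.g. location='london'."""
    return [d for d in inventory
            if all(d.get(k) == v for k, v in filters.items())]

def deploy(devices, config_snippet):
    """Stand-in for the real push (Ansible, Netmiko, etc.) -- records targets only."""
    return [f"pushed to {d['name']} ({d['ip']})" for d in devices]

# Target only one area instead of logging into every device by hand:
targets = select_devices(INVENTORY, location="london")
print(deploy(targets, "ntp server 10.0.0.1"))
```

The same `select_devices` call with no filters targets the whole estate, which is what makes a single workflow scale from one site to a global deployment.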
Overcoming these two big challenges set us up for success and enabled us to deploy at a global scale. We lived by the mantra:
“If it’s not repeatable, it’s not automatable. And if it’s not automatable, it’s not scalable.”
LEARNINGS AND MEASURABLE OUTCOMES
So what did we learn? For starters, it can be hard to automate a use case or test in the same way you would perform it manually. Testing that requires physical intervention, such as losing service provider links or simulating hardware failure, is especially tricky to automate. We also learned that code reviews are extremely important: shared code ownership means the entire team can make changes anywhere, at any time.
And what were the measurable outcomes?
Faster deployment times - we were able to efficiently push changes to over 300 network devices and audit the configuration of our global network, cutting execution time from days down to hours.
Removed the fear of large and complex network changes - the accuracy and efficiency with which we deployed at scale gave the business and leadership more confidence in subsequent large-scale network changes and deployments.
Faster feedback on network changes - version control and peer review of network configuration changes, treating infrastructure as code (IaC), meant reviews happened quickly.
Easier adherence to challenging PSIRT/CSIRT timeframes for remediating security vulnerabilities.
In short: speed of deployment; speed of feedback on network changes; speed of adherence to PSIRT/CSIRT timeframes; and confidence and buy-in from senior leadership for subsequent deployments!
Apache Kafka is a complex system with many parameters to configure for different use cases. As a Consulting Engineer at Confluent, I see many clients who need to benchmark their production systems and understand their capacity. Benchmark results can be used to optimize throughput and make better use of resources. Benchmarking and measuring aren’t just about finding a bottleneck; they’re about better understanding the load you’re placing on the system. This talk will focus on methods and tools for Apache Kafka performance analysis and benchmarking. It will be helpful to anyone trying to operate a large Apache Kafka cluster and hit their throughput and latency goals.
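A tool such as `kafka-producer-perf-test.sh` (which ships with Apache Kafka) produces raw per-run numbers; the analysis side can be sketched as below. This is a hedged illustration, not the talk's methodology: the latency figures are made up, and only the summary metrics (throughput plus latency percentiles) reflect what one would typically track against goals.

```python
# Summarize a benchmark run: given per-record send latencies (ms), the total
# bytes produced, and the wall-clock duration, compute the headline metrics.
def summarize(latencies_ms, total_bytes, duration_s):
    s = sorted(latencies_ms)
    # Nearest-rank percentile (simple sketch; real tools may interpolate).
    pct = lambda p: s[min(len(s) - 1, int(p / 100 * len(s)))]
    return {
        "records_per_s": len(s) / duration_s,
        "mb_per_s": total_bytes / duration_s / 1e6,
        "p50_ms": pct(50),
        "p99_ms": pct(99),
    }

# Invented example: 1,000 sends of 1 KiB each over 2 seconds,
# with a small tail of slow sends that only the p99 will reveal.
stats = summarize([1.0] * 990 + [25.0] * 10, total_bytes=1000 * 1024, duration_s=2.0)
print(stats)
```

The point of separating p50 from p99 is exactly the talk's theme: average throughput can look healthy while the tail latency tells you where the system is actually straining.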
Most organizations engaged in transformation today are moving from left to right in digitally-driven maturity models. The objectives are well known: increase agility, boost productivity, and provide seamless digital experiences for consumers.
Architects play a pivotal role as the curators of this transformation. In this session, Asanka will share his experience of how architects can contribute, and introduce a framework for refactoring enterprises.
The purpose of this effort is to collect, organize, and analyze data on remote learning at the school, teacher, and student levels, across multiple platforms and activities, to monitor, track, and report on remote/blended learning across all schools in New York City.
- Ingest data from all learning tools and DOE enterprise systems
- Support all data formats
- Access to system-wide data across all platforms, resources, and school types
- Ability to connect any additional data sources to a central database
- Supplemental learning resources by grade level
- Access to individual-level data, in order to create aggregates and cuts as needed (including by school, by demographics, etc.)
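The last requirement above, individual-level data rolled up into arbitrary cuts, can be sketched as follows. The record fields and values are invented for illustration; the real system would ingest from the learning tools and enterprise systems listed above rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical individual-level records (fields and numbers are invented).
records = [
    {"school": "PS 100", "grade": "3", "platform": "zoom", "minutes": 40},
    {"school": "PS 100", "grade": "3", "platform": "lms",  "minutes": 25},
    {"school": "PS 200", "grade": "5", "platform": "zoom", "minutes": 50},
]

def aggregate(records, by):
    """Sum learning minutes by any cut (school, grade, platform, ...)."""
    totals = defaultdict(int)
    for r in records:
        totals[r[by]] += r["minutes"]
    return dict(totals)

print(aggregate(records, by="school"))    # roll-up per school
print(aggregate(records, by="platform"))  # same data, different cut
```

Keeping the cut (`by`) as a parameter is what makes "aggregates and cuts as needed" cheap: one dataset serves school, grade, platform, and demographic views alike.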
Tracing can be very powerful: it can connect the customer experience to backend services several hops away. How much it helps comes down to what information is in your traces. There isn't one standard set of tags to add for EVERY application; it depends on how traces are used and what matters to your organization. During this session we will discuss the need for tracing, dive into why (and what) you would want to trace via distributed tracing, and delve into the OpenTelemetry specs and architecture to see how we can tailor (or tag) our traces. You’ll leave with an understanding of semantic context in OpenTelemetry and how it might help you understand your applications.
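The "standard tags plus what matters to you" idea can be illustrated with a toy span. This is NOT the OpenTelemetry SDK, just a minimal stand-in: the attribute names `http.method` and `http.route` follow OpenTelemetry's semantic conventions, while the `app.*` keys, function, and values are invented to show organization-specific tagging.

```python
# Toy span object (a stand-in for a real tracing library's span).
class Span:
    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value

def handle_checkout(order_id, customer_tier):
    span = Span("POST /checkout")
    # Semantic-convention attributes: every backend understands these.
    span.set_attribute("http.method", "POST")
    span.set_attribute("http.route", "/checkout")
    # Organization-specific attributes: the "what matters to YOU" part,
    # which lets you slice traces by business dimensions later.
    span.set_attribute("app.order_id", order_id)
    span.set_attribute("app.customer_tier", customer_tier)
    return span

span = handle_checkout("ord-42", "gold")
print(span.attributes["app.customer_tier"])  # gold
```

In a real OpenTelemetry setup the same `set_attribute` pattern applies, but spans come from a configured tracer and are exported to your tracing backend instead of held in memory.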