Building Background Effects for Clips
Last September, Slack released Clips, allowing users to capture video, audio, and screen recordings in messages to help distributed teams connect and share their work. We’ve continued iterating on Clips since its release, adding thumbnail selection, background blur, and most recently, background image replacement. This blog post provides a deep dive into our implementation of […]
06/07/22
Scaling Slack’s Mobile Codebases: Modernization
In the first two posts about the Duplo initiative, we described why we decided to revamp our mobile codebases, the initial phase to clean up tech debt, and our efforts to modularize our iOS and Android codebases (post 1, post 2). In this final post, we will discuss the last theme of the Duplo initiative, […]
05/04/22
Continuous Load Testing
Building load test infrastructure is tricky and poses many questions. How can we identify performance regressions in newly deployed builds, given the overhead of spinning up test clients? To gather the most representative results, should we load test at our peak hours or when there’s a lull? How do we incentivize engineers to invest time […]
04/29/22
Slack’s Incident on 2-22-22
By Laura Nolan, with contributions from Glen D. Sanford, Jamie Scheinblum, and Chris Sullivan.
Assessing conditions
Slack experienced a major incident on February 22 this year, during which time many users were unable to connect to Slack, including the author, which certainly made my role as Incident Commander more challenging! This incident was a […]
04/26/22
Handling Flaky Tests at Scale: Auto Detection & Suppression
At Slack, the goal of the Mobile Developer Experience Team (DevXp) is to empower developers to ship code with confidence while enjoying a pleasant and productive engineering experience. We use metrics and surveys to measure productivity and developer experience, such as developer sentiment, CI stability, time to merge (TTM), and test failure rate. The DevXp […]
04/05/22
Stabilize, Modularize, Modernize: Scaling Slack’s Mobile Codebases
In the first post about the Duplo initiative, we discussed the reasons for launching a project to revamp Slack’s mobile codebases, and what we accomplished in Duplo’s initial Stabilization phase. This post will explore modularization, and then there will be a third post to describe how we modernized our codebase and the overall results of […]
03/28/22
Applying Product Thinking to Slack’s Internal Compute Platform
According to a recent Thoughtworks radar, “the industry is increasingly gaining experience with platform engineering product teams that create and support internal platforms.” They caveated this with a piece of advice: “When creating a platform, it’s critical to have clearly defined customers and products that will benefit from it rather than building in a vacuum.” […]
03/09/22
Balancing Safety and Velocity in CI/CD at Slack
In 2021, we changed developer testing workflows for Webapp, Slack’s main monorepo, from predominantly testing before merging to a multi-tiered testing workflow after merging. This changed our previous definition of safety and developer workflows between testing and deploys. In this project, we aimed to ensure frequent, reliable, and high-quality releases to our customers for a […]
02/18/22
Building Self-driving Kafka clusters using open source components
In this article, I will talk about how Slack uses Kafka, and how a small-but-mighty team built and operationalized a self-driving Kafka cluster over the last four years to run at scale. Kafka is used at Slack as a pub-sub system, playing an essential role in the all-important Job Queue, our asynchronous job execution framework […]
02/11/22
Stabilize, Modularize, Modernize: Scaling Slack’s Mobile Codebases
When do you need to overhaul a large code base to address tech debt? What is the best way to address widespread inconsistencies and outdated patterns? How can you make significant architectural improvements to a complex application while still continuing to ship features? These were questions we grappled with at the beginning of 2020, when […]
01/12/22
A Simple Kubernetes Admission Webhook
While adding a recent feature to our Kubernetes compute platform, we had the need to mutate newly-created pods based on annotations set by users. The mutation needed to follow simple business rules, and didn’t need to keep track of any state. Surely there must be a canonical solution to this simple problem? Well, sort of. […]
12/14/21
Going from Coder to Slack Engineer
Over 70% of the files uploaded on Slack are images, and over 75% of those images are screenshots. What this tells us is that though images are ephemeral, screenshots are often used as a quick way to provide extra detail and context, and typically gain a high level of engagement over a short time period. […]
12/02/21
The Case of the Recursive Resolvers
On September 30th 2021, Slack had an outage that impacted less than 1% of our online user base, and lasted for 24 hours. This outage was the result of our attempt to enable DNSSEC (an extension intended to secure the DNS protocol, required for FedRAMP Moderate) but which ultimately led to a series of […]
11/29/21
Developing in the Open
We use plenty of open source tools at Slack and we’ve benefited immensely from the wider Android, Kotlin, and Gradle communities. We also try to be good citizens by giving back. This includes things like sponsoring the Kotlin Lang Slack, contributions to projects we use like Anvil and Insetter, sharing projects of our own like […]
11/10/21
How Two Interns Are Helping Secure Millions of Lines of Code
At Slack, proactively securing our systems is a top priority. One way we achieve this is by automating the detection of security issues with static code analysis tools, which inspect programs without executing them. They’re often used with security-based rules to automate identification of vulnerabilities and insecure programming practices, which frees up more […]
11/04/21
Building the Next Evolution of Cloud Networks at Slack – A Retrospective
About a year ago, I wrote a blog post called Building the Next Evolution of Cloud Networks at Slack. In it, we discussed how Slack’s AWS infrastructure has evolved over the years and the pain points that drove us to spin up a brand-new network architecture redesign project called Whitecastle. If you have not had […]
10/20/21
Infrastructure Observability for Changing the Spend Curve
Slack is an integral part of where work happens for teams across the world, and our work in the Core Development Engineering department supports engineers throughout Slack that develop, build, test, and release high-quality services to Slack’s customers. In this article, we share how teams at Slack evolved our internal tooling and made infrastructure bets. […]