October 9, 2023

9 min read

Unleashing Impact at Slack’s Data Engineering Internship

Camryn McDonald, Software Engineering Intern

Donavon Clay, Software Engineering Intern

Esha Thota, Software Engineering Intern

Naman Modani, TPM Intern

Tayven Taylor, Software Engineering Intern

Introduction

Ever wondered what it’s like to intern as a software engineer at Slack? Picture yourself on the famous Ohana floor, the 61st floor of the Salesforce Tower in San Francisco; spending time there was one of many privileges we enjoyed as interns. Our time with Slack’s Data Engineering team put us on the tech frontier, gave us invaluable engineering experience, and let us forge relationships across teams. Most importantly, we got to contribute to Slack’s mission through two impactful projects: the Reliable Data Discovery Tool, which helps users discover relevant data assets, and the Job Performance Tracking and Alerting system, which gives engineers insight into end-to-end job performance metrics. Stick with us to hear what it was like to be a Data Engineering intern on these projects, and what made us successful along the way.

Onboarding

Onboarding for all Data Engineering interns was seamless and well orchestrated, and it accommodated our varying start times. It was split into two distinct portions, the first focused on Salesforce and Slack integration. We took part in a series of informative meetings, engaging breakout sessions, and hands-on tutorial prep sessions. This comprehensive approach left us feeling well prepared and confident as we began our internship journey.

The second portion was a week dedicated solely to onboarding onto the Data Engineering pillar. During this time, we received detailed setup documentation, so we had the tools and resources we needed. More importantly, this week emphasized team-building and fostered a welcoming, helpful environment. We had the opportunity to meet and get to know everyone on the team, which let us forge connections and feel like an integral part of the Data Engineering team at Slack from day one.

Double the projects, double the impact!

As Data Engineering interns, we helped advance Slack’s mission through two major projects spanning four core teams within Data Engineering. The Reliable Data Discovery Tool gave users a better data search experience, while the Job Performance Tracking and Alerting system enabled data engineers to target ETL job performance optimizations.

Reliable Data Discovery Tool

Camryn McDonald, Esha Thota

Problem statement: Our internal analytics tool allows Slack employees to search 1,700 dashboards, 1.8 million queries, and 11,000 tables to retrieve data. That’s a lot of data! As the number of these searchable items grows, search and retrieval become increasingly slow and unreliable. How can users know what makes a good dashboard or table? How can we make it easy for users to find the information relevant to their use cases?

Users are missing an easily navigable, reliable search experience that makes it simple to locate the data they need and to gauge its reliability and usefulness. By building our backend on OpenSearch, an open-source search and analytics suite forked from Elasticsearch, and creating a frontend that makes our data easy to access and parse, we can better help users find the data they need.

Backend: On the backend, we used OpenSearch, an improvement over our existing implementation. We indexed and searched metadata from Data Engineering’s internal data assets, returned ranked search results for the main object types (dashboards, queries, and tables), and displayed relevant metadata alongside the results to indicate reliability. All of this work helped address scalability as the volume of data keeps growing.
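To make "ranked search results" concrete, here is a minimal sketch of the kind of query body a search backend could send to OpenSearch. It assumes a hypothetical index whose documents carry `title`, `description`, `owner`, `object_type`, and `monthly_active_users` fields; those names, the boost values, and the popularity weighting are illustrative choices, not the team’s actual implementation. `multi_match`, `function_score`, and `field_value_factor` are standard parts of the OpenSearch query DSL.

```python
import json


def build_search_body(text, object_type=None, size=20):
    """Return an OpenSearch query body that ranks data assets for `text`.

    Field names (title, description, owner, object_type,
    monthly_active_users) are hypothetical, not Slack's actual mapping.
    """
    bool_query = {
        "must": [
            {
                "multi_match": {
                    "query": text,
                    # "^3" boosts title matches over description/owner matches.
                    "fields": ["title^3", "description", "owner"],
                }
            }
        ]
    }
    if object_type is not None:
        # Filter clauses restrict results without affecting relevance scores.
        bool_query["filter"] = [{"term": {"object_type": object_type}}]
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"bool": bool_query},
                # Multiply the text score by log10(2 + monthly_active_users);
                # assets with no usage data are treated as having 0 users.
                "field_value_factor": {
                    "field": "monthly_active_users",
                    "modifier": "log2p",
                    "missing": 0,
                },
                "boost_mode": "multiply",
            }
        },
    }


body = build_search_body("weekly active users", object_type="dashboard")
print(json.dumps(body, indent=2))
```

With the opensearch-py client, a body like this would be passed to `client.search(index=..., body=body)`. The logarithm dampens very large usage counts, so popularity nudges the ranking rather than dominating text relevance.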

Frontend: Throughout our intern project, we got to take a deep dive into design. Using TypeScript, React, and Ant Design, we built a flexible, easily navigable search UI that lets users find, sort, and filter their data. It returns relevant information about data objects (such as title, owner, monthly active users, and warnings) and stores the state of the results in the URL so findings are easy to share. We learned what truly makes a good user experience, and how to design for it!
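Keeping search state in the URL means a link can be pasted into a Slack message and reopened with the same query, sort order, and filters. The UI itself was built in TypeScript; the sketch below shows the same round-trip idea in Python using only the standard library. The parameter names (`q`, `sort`, `type`), the `SearchState` fields, and the example host are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, urlencode, urlsplit


@dataclass(frozen=True)
class SearchState:
    """Hypothetical search state: free text, sort key, object-type filters."""
    query: str = ""
    sort: str = "relevance"
    types: tuple = ()  # e.g. ("dashboard", "table")


def to_url(base, state):
    """Serialize search state into the query string of a shareable URL."""
    params = {"q": state.query, "sort": state.sort}
    if state.types:
        params["type"] = ",".join(state.types)
    return f"{base}?{urlencode(params)}"


def from_url(url):
    """Rebuild search state from a URL, falling back to defaults."""
    params = parse_qs(urlsplit(url).query)  # blank values are dropped
    types = params.get("type", [""])[0]
    return SearchState(
        query=params.get("q", [""])[0],
        sort=params.get("sort", ["relevance"])[0],
        types=tuple(t for t in types.split(",") if t),
    )
```

Because `from_url(to_url(base, state))` reproduces the original state, anyone opening a shared link lands on exactly the same view.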

Integration and Problem Solving: Having worked on different aspects of the project, we gained experience integrating frontend features seamlessly with the backend using FastAPI. We learned to work together on tricky problems, handle blockers, and ask the right questions to get the answers we needed.

Job Performance Tracking and Alerting

Donavon Clay, Tayven Taylor

Problem statement: Across many of Slack’s engineering teams, thousands of data pipelines run, process, and generate petabytes of data daily (yeah, that’s a lot). Sometimes, those processing jobs fail because of performance issues, and teams have to take the time to investigate the failure and take corrective action. While this process works, it’s inefficient and takes our engineers’ time away from other projects. What teams are missing is a comprehensive view into historical job performance that gives them the tools to manage their pipelines proactively, instead of reactively.

Our project takes data from two main sources, the Spark History Server and Apache Airflow, and builds an all-encompassing table of metrics that powers internal dashboards visualizing teams’ historical job performance. By showing how their jobs perform over time, we give teams the opportunity to optimize their jobs and resource usage while reducing failures.

Spark metrics: This neat UI, the Spark History Server, lets us dissect our data processing jobs and look at performance metrics (think memory usage, runtime, etc.). After working through some documentation, we learned how to set up an ingestion job that queried the server’s REST API, parsed the performance data, and stored it for later use. There was definitely a learning curve, but we gained valuable knowledge of tools used in data engineering.
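The ingestion step can be sketched against Spark’s monitoring REST API, which the History Server exposes under `/api/v1`. This minimal version lists completed applications, pulls each one’s executor list from `/applications/<app-id>/allexecutors`, and flattens a few metrics into one row per application. The host name and the output column names are hypothetical, and a production job would add retries, error handling, and a write to durable storage.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical host; 18080 is the History Server's default port.
HISTORY_SERVER = "http://spark-history.example.internal:18080"


def applications_url(min_date):
    """List completed applications since min_date (e.g. "2023-07-01")."""
    query = urlencode({"status": "completed", "minDate": min_date})
    return f"{HISTORY_SERVER}/api/v1/applications?{query}"


def executors_url(app_id):
    """All executors (active and dead) for one application."""
    return f"{HISTORY_SERVER}/api/v1/applications/{app_id}/allexecutors"


def fetch_json(url):
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize(app, executors):
    """Flatten one application and its executor list into a metrics row."""
    latest = max(app["attempts"], key=lambda a: a["startTimeEpoch"])
    workers = [e for e in executors if e["id"] != "driver"]
    return {
        "app_id": app["id"],
        "app_name": app["name"],
        "duration_ms": latest["duration"],
        "executors": len(workers),
        "failed_tasks": sum(e["failedTasks"] for e in workers),
        "gc_time_ms": sum(e["totalGCTime"] for e in workers),
        # memoryUsed is storage memory in use, not peak JVM heap.
        "max_storage_memory_bytes": max(
            (e["memoryUsed"] for e in workers), default=0
        ),
    }


def ingest(min_date):
    """Fetch and summarize every completed application since min_date."""
    return [
        summarize(app, fetch_json(executors_url(app["id"])))
        for app in fetch_json(applications_url(min_date))
    ]
```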

Airflow metrics: Airflow is the tool we use to schedule batch jobs. At a high level, it functions like an ordered to-do list (do this task, then once it’s finished, do that one). The data we wanted from Airflow included things like team ownership and performance expectations. We spent some quality time with the documentation to get the hang of the technology, then set up ingestion pipelines to pull in all the data and fields we needed, running sanity checks to make sure everything worked properly.
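For Airflow, one option is the stable REST API in Airflow 2, whose `GET /api/v1/dags` response lists each DAG’s `dag_id`, `owners`, and `tags`. The sketch below assumes a hypothetical tagging convention (`team:<name>` and `expected_runtime_min:<minutes>`) for recording ownership and performance expectations, then runs simple sanity checks like the ones described above. The endpoint is paginated with `limit` and `offset`, so a real pipeline would loop over pages.

```python
def tag_value(tags, key):
    """Return the value of the first "key:value" tag, or None."""
    prefix = f"{key}:"
    for tag in tags:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


def as_int(value):
    """Parse an integer tag value, returning None if absent or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_dags(dags_response):
    """Turn one page of a GET /api/v1/dags response into metadata rows."""
    rows = []
    for dag in dags_response["dags"]:
        tags = [t["name"] for t in dag.get("tags") or []]
        rows.append({
            "dag_id": dag["dag_id"],
            "owners": dag.get("owners") or [],
            # Hypothetical tag conventions, not an Airflow feature.
            "team": tag_value(tags, "team"),
            "expected_runtime_min": as_int(tag_value(tags, "expected_runtime_min")),
        })
    return rows


def sanity_check(rows):
    """Return human-readable problems: duplicate DAGs, missing owners."""
    problems, seen = [], set()
    for row in rows:
        if row["dag_id"] in seen:
            problems.append(f"duplicate dag_id: {row['dag_id']}")
        seen.add(row["dag_id"])
        if row["team"] is None:
            problems.append(f"no team tag: {row['dag_id']}")
    return problems
```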

Enriched table and dashboards: After building both the Spark metrics and Airflow metrics datasets, the final step was to merge them into a table that captures not only job performance but also team ownership and configuration. Using this enriched table, we created dashboards where teams can see their most computationally expensive tasks and historical job trends, and we set up alerts that signal when a job might fail. Nothing like this had existed before, so many teams were excited to see it. We were happy to make an impact while learning technologies we hadn’t had a chance to use in school.
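Merging the two datasets and deriving alerts could look roughly like this. The sketch assumes Spark rows with `app_name` and `duration_ms`, Airflow rows with `dag_id`, `team`, and `expected_runtime_min`, and a hypothetical naming convention in which Spark application names start with `<dag_id>.`; the 80% alert threshold is also an illustrative choice. In practice this logic would more likely live in SQL or Spark than in plain Python.

```python
def enrich(spark_rows, dag_rows):
    """Left-join job metrics to DAG metadata on dag_id."""
    by_dag = {d["dag_id"]: d for d in dag_rows}
    out = []
    for row in spark_rows:
        # Hypothetical convention: Spark app names look like "<dag_id>.<task_id>".
        dag_id = row["app_name"].split(".", 1)[0]
        meta = by_dag.get(dag_id, {})
        out.append({
            **row,
            "dag_id": dag_id,
            "team": meta.get("team"),
            "expected_runtime_min": meta.get("expected_runtime_min"),
        })
    return out


def most_expensive(rows, n=5):
    """The n longest-running jobs, a simple proxy for compute cost."""
    return sorted(rows, key=lambda r: r["duration_ms"], reverse=True)[:n]


def at_risk(rows, ratio=0.8):
    """Flag runs whose runtime reached `ratio` of the expected runtime."""
    alerts = []
    for r in rows:
        limit = r["expected_runtime_min"]
        if limit is None:  # no expectation recorded, nothing to compare
            continue
        runtime_min = r["duration_ms"] / 60_000
        if runtime_min >= ratio * limit:
            alerts.append({
                "team": r["team"],
                "app_name": r["app_name"],
                "runtime_min": round(runtime_min, 1),
                "expected_runtime_min": limit,
            })
    return alerts
```

Alerting on a fraction of the expected runtime, rather than on outright failure, is what lets a team act before a job actually times out.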

The TPM perspective

Naman Modani

As a Technical Product Management (TPM) intern on the Data Engineering team, I had the unique opportunity to work cross-functionally on multiple high-impact projects. Beyond managing the delivery of the Reliable Data Discovery Tool and Job Performance Tracking and Alerting initiatives, I also helped drive two concurrent flagship programs: Data Retention, which aligns with Slack’s privacy goals, and Legacy Users Deprecation, which replaces old, less efficient tables with faster, modular ones.

My role involved aligning the product and engineering vision, facilitating clear communication between stakeholders, and proactively surfacing risks, bottlenecks, and dependencies. With the help of my mentor and manager, I learned so much about what it takes to successfully manage projects in an agile environment. While no day was the same, my key takeaways include:

Identifying and aligning on scope early is crucial. This enabled us to break down epics into actionable stories while mitigating risks upfront.
Effective stakeholder management is mission-critical. I strived to keep all parties continuously looped in through meetings, documentation, and weekly updates. This open flow of information enabled seamless collaboration, and helped drive accountability through active listening and engagement with Product, Engineering, and leadership.
Flexibility and adaptation are vital. Requirements evolve rapidly, so I had to remain nimble and adjust timelines accordingly. Agile principles kept me grounded when plans changed.
Technical acumen goes a long way. Actively developing foundational data/systems knowledge helped me better contextualize projects and converse knowledgeably with cross-functional partners. This also helped me automate manual processes for several programs I worked with.
Soft skills make all the difference. Patience and empathy were instrumental in working through complex technical challenges and building strong intra-team connections. The human element was invaluable.

I loved diving deep into Slack’s data systems and helping drive meaningful initiatives. Cross-functional work gave me valuable insights into delivering impactful projects amidst rapid change. I’m thrilled to apply these learnings in my future roles.

Conclusion

As we reflect on our internship experience—especially on what made our intern journey successful—a few key themes emerge:

The opportunity to collaborate on real-world projects enabled impactful hands-on learning. Tackling complex challenges let us gain practical engineering skills beyond textbook knowledge. Working cross-functionally with our talented mentors and managers helped us achieve more than we could have individually.
The unwavering support of our mentors empowered our growth. Whenever we felt stuck, they patiently guided us through technical roadblocks. Their insights and encouragement pushed us to keep learning and improving. By working closely with our mentors, we grew tremendously both technically and personally.
The welcoming and fun Slack culture made each day enjoyable. Bonding during activities like office onsites (minigolf!) and lunch outings fostered camaraderie. This helped us build meaningful connections across teams and significantly improved team collaboration.
Navigating challenges developed our resilience and adaptability. From relocating to new cities to tackling complex data systems, we learned to embrace change and push past our comfort zones. These experiences helped us mature professionally and personally and laid a solid foundation for our future careers.

Overall, these factors helped make our internship a success at Slack. We’re deeply grateful for the lessons, connections, and memories from our time here. As we each embark on the next chapters in our careers, we’ll carry the inspirational Slack ethos with us.

#airflow #data-engineering #internships #search #spark
