A podcast from the team at Salesforce Engineering, exploring code, technology, tools, tips, and the life of the developer.
Code[ish] Salesforce Engineering
A podcast from the team at Salesforce Engineering, exploring code, technology, tools, tips, and the life of the developer.
118. Why Writing Matters for Engineers
In this episode, Ian, Laura, and Wesley talk about the importance of communication skills, specifically writing, for people in technical roles. Ian calls writing the single most important meta skill you can have. And the good news is that you can get better at it, with deliberate practice!
Ian and Wesley both come from engineering backgrounds but have moved into more writing-intensive roles as their careers have progressed. Laura is an instructional designer with experience across many industries. They all agree that writing plays several different important roles for people, whether it's to educate, persuade, or even mark a decision.
So if writing is such a critical part of what you're doing from an engineering perspective, how can you get better at it? Laura offers a handful of practices, including providing context, supplying the appropriate level of detail for the audience, using stories or analogies, incorporating repetition, and finding a good editor (even if it's yourself coming back to a piece with fresh eyes).
The guests close the episode by sharing some of their favorite resources for improving communication skills, which are listed below.
Links from this episode
“Programming as Theory Building" by Peter Naur
Example of an RFC process
Illusion of Explanatory Depth
The Sense of Style by Steven Pinker
Write and Organize for Deeper Learning by Patti Shank
Tech Writing course from Google
117. Open Source with Jim Jagielski
This episode is hosted by Alyssa Arvin, Senior Program Manager for Open Source at Salesforce, with guest Jim Jagielski, the newest member of Salesforce’s Open Source Program Office (OSPO). They talk about Jim’s early explorations into open source software during his time as an actual rocket scientist at NASA and his role in the formation of the Apache Software Foundation. Next, they discuss getting started in open source, specifically, how to find the right open source community for you to start contributing to. They suggest looking for a code of conduct that the project members take seriously to make sure you’re joining a community that is welcoming and takes diversity and inclusion seriously.
Who’s part of an open source community? Well, that would be more than just the contributors--it’s also the project’s end users, even companies who consume it. Those companies have a responsibility to support the projects they use, to contribute back and provide feedback to keep making it better. As an individual contributor (IC), contributing to open source can be part of your growth plan! Leveraging open source contributions to grow your skill helps you become a better employee. Jim encourages companies to adopt as frictionless a process as possible for employees contributing to open source.
Salesforce sees open source as a strategic advantage for the company. It’s a way of driving culture, of ensuring that teams collaborate and communicate and, in the process of doing that, drive innovation to benefit not only the individuals who contribute but the company as well.
How important is open source to your corporate culture? That will drive how you go about building an Open Source Program Office (OSPO). It really is, at the end of the day, a cultural shift.
Finally, Jim shares concrete tips for getting started with your first open source project. He suggests “lurking” in the community and checking their bug tracker for issues marked as “good for newbies.” Most projects have a handful of people who are signed up to be mentors and can help you out. And, look for something like a contributing.md file that makes it clear how you can get involved and what the future will hold for you as you get more involved.
Alyssa closes with the comment that she’s excited to work with and learn from Jim, and we are too! Expect to hear more from him on future podcast episodes.
Links from this episode
Open Source at Salesforce
Apache Software Foundation
Open Source Initiative
People Powered by Jono Bacon
The Apache Way
116. Success From Anywhere
This episode of Codeish includes Greg Nokes, distinguished technical architect with Salesforce Heroku, and Lisa Marshall, Senior Vice President of TMP Innovation & Learning at Salesforce. Lisa manages a team within technology and product that focuses on overall employee success in attracting technical talent and creating a great onboarding experience.
The impact of remote work
Salesforce is looking at various work configurations across remote and in-office options in different ways. She shares, "In the past 12 months, we've been thinking about what the future will look like. What do our employees want? What do our leaders want for different worker types?"
In addition to the fully remote and in-office workers are flex workers who "come into office maybe one, two, three days a week to work with their Scrum teams, or maybe even one day, every other week. You come to an office to work together when it makes sense for you and your team for collaboration and other ways."
She notes there's a lot to learn from workers like Greg, who has been working remotely for 12 years.
Greg notes, "It took me years to figure out how to work successfully from home and how to have home not encroach on work and work not encroach on home." After unsuccessfully working from the couch, he needed to get an office with a door. Greg stresses that remote work in the pandemic is not the same as remote work at other times. "One of my joys was going to a coffee shop, having a really good cup of coffee and sitting there without headphones on, just listening to people talk while I would write and just that background noise. And I really miss that. So I want to make sure that everybody who's been forced to go remote knows that the present is not a great example of remote work. It's a lot different, and it's a lot harder."
Lisa and her team have been talking with other companies who are fully remote and stress that the experience of working fully remote during the pandemic "... isn't normal. We know we all want to see each other. We want to get back together at times where it makes sense." Part of this is focusing on "the things that we can do right now that we want to keep doing in the future when things start to open up."
Greg Nokes asserts that a remote-first work approach differs in organizations where remote work is an afterthought. He gives the example of a group of San Francisco employees sending a lunch invitation over a messaging platform, "...and then everyone in San Francisco signed off and when they signed back on, I'm like, ‘What happened?’ They'll go, ‘Well, so-and-so said, where do we want to get lunch? And then we all talked about it in the coffee shop; we're all sitting in, and then we went to lunch together.’ And we're like, ‘that's not remote.’"
Lisa Marshall shares the need for intentional inclusivity. "We all know how horrible it feels when you're in a meeting. And when you're a remote person, and others are in the room, and it's very hard sometimes to get a word in edgewise, it's difficult to hear all the common things."
Her team is working on organizational guidelines, including team agreements on how people want to work together. One senior leadership team has decided their weekly team meetings will be 100% remote because they found they communicate better when they're all online versus some co-located.
How will offices look in the future?
Lisa believes the majority of the office will be flex in the future. "So we're looking at how do we want to configure our spaces to support the kinds of work people want to do in the office? What kind of different technologies can we use? What kind of seating arrangements around couches or different pods or other considerations for building in those spaces to be truly about collaboration versus only individual work?"
Lisa's team is also focused on trying apps and tools to see what works and start rolling the tech out to other locations.
Greg Nokes shares, "The las
115. Demystifying the User Experience with Performance Monitoring
In this episode of Codeish, Greg Nokes, distinguished technical architect with Salesforce Heroku, talks with Innocent Bindura, a senior developer at Raygun about performance monitoring.
Raygun provides tools and utilities for developers to improve software quality through crash reporting and browser and application performance monitoring.
According to Innocent, the absence of crash reports does not mean that software is performing well. Software can work - but not be optimal. Thus, Innocent takes a holistic view:
“I look at the size of my audience, and if it's something sizable, that gets a lot of traffic, for example, a shopping cart that gets a lot of traffic on a Black Friday. I would want to be in a comfort zone when I know that during the peak periods my application is still performing, so I tend to look at the end-user, how their experience looks like during very high peak periods. And from there I start working my way back to the technology that is supporting that application.”
Raygun really shines in monitoring the time spent in different functions and helping to improve the performance of highly hit endpoints. This includes performance telemetry of browser pages, the current application running, and server-side performance application monitoring. Raygun has lightweight SDKs or lightweight providers that can be injected into code. These provide a catch-all to deal with unhandled exceptions. They also encourage best practices for developers.
Over time, Raygun can provide a complete picture of how the user session performed “from the point they visited your page, logged in, visited a couple of pages, and then left your application. The crash reports and the traces relating to that particular user are also tied up with that session on the Raygun side.” Innocent highlights a sampling strategy that reduces the noise of APM data.
Raygun also provides a birds-eye application view that provides aggregated stats on application performance: “For the run product, you will have each page aggregated over time, regardless of how many users you've had in a period of time. You want to look at the individual sessions. That information is aggregated and you're able to see, for example, your median, your P90, and P99.”
Innocent focuses on the P99 figure because “whoever is in there has had a terrible time, and that forms the basis of my investigations. I want to know why there are so many sessions in that P99, and that P99 is probably a six or seven-second load time. I want to move that to a sub-three-second.” Innocent provides a definition of P99 for new customers undergoing the journey of performance optimization.
Next, Innocent asserts that decisions should be based on numbers and empirical evidence. He has found that the use of actionable data has enabled him to redesign applications and focus on the mission-critical command needed in real time.
Innocent concludes: “I think the life of a developer is an interesting one. We fit in everywhere situations permit, and we definitely take different routes to develop our careers. But ultimately what we should all be concerned about is the quality of the products that I produce. This definitely reflects on my capability as a software developer. What sets me apart from the next developer is not the number of cool techniques I can do with code, it's delivering a product that actually works and what better way of knowing what works when you actually measure things. Everybody should live by the philosophy of assuming nothing, measure everything. Everything and everything
114. Beyond Root Cause Analysis in Complex Systems
In this episode of Codeish, Marcus Blankenship, a Senior Engineering Manager at Salesforce, is joined by Robert Blumen, a Lead DevOps Engineer at Salesforce.
During their discussion, they take a deep dive into the theories that underpin human error and complex system failures and offer fresh perspectives on improving complex systems.
Root cause analysis is the method of analyzing a failure after it occurs in an attempt to identify the cause. This method looks at the fundamental reasons that a failure occurs, particularly digging into issues such as processes, systems, designs, and chains of events. Complex system failures usually begin when a single component of the system fails, requiring nearby "nodes" (or other components in the system network) to take up the workload or obligation of the failed component.
Complex system breakdowns are not limited to IT. They also exist in medicine, industrial accidents, shipping, and aeronautics. As Robert asserts: "In the case of IT, [systems breakdowns] mean people can't check their email, or can’t obtain services from a business. In other fields of medicine, maybe the patient dies, a ship capsizes, a plane crashes."
The 5 WHYs
The 5 WHYs root cause analysis is about truly getting to the bottom of a problem by asking “why” five levels deep. Using this method often uncovers an unexpected internal or process-related problem.
Accident investigation can represent both simple and complex systems. Robert explains, "Simple systems are like five dominoes that have a knock-on effort. By comparison, complex systems have a large number of heterogeneous pieces. And the interaction between the pieces is also quite complex. If you have N pieces, you could have N squared connections between them and an IT system."
He further explains, "You can lose a server, but if you're properly configured to have retries, your next level upstream should be able to find a different service. That's a pretty complex interaction that you've set up to avoid an outage."
In the case of a complex system, generally, there is not a single root cause for the failure. Instead, it's a combination of emergent properties that manifest themselves as the result of various system components working together, not as a property of any individual component.
An example of this is the worst airline disaster in history. Two 747 planes were flying to Gran Canaria airport. However, the airport was closed due to an exploded bomb, and the planes were rerouted to Tenerife. The runway in Tenerife was unaccustomed to handling 747s. Inadequate radars and fog compounded a combination of human errors such as misheard commands. Two planes tried to take off at the same time and collided with each other in the air.
Robert talks about Dr. Cook, who wrote about the dual role of operators.
"The dual role is the need to preserve the operation of the system and the health of the business. Everything an operator does is with those two objectives in mind." They must take calculated risks to preserve outputs, but this is rarely recognized or complemented.
Another component of complex systems is that they are in a perpetual state of partially broken. You don't necessarily discover this until an outage occurs. Only through the post-mortem process do you realize there was a failure. Humans are imperfect beings and are naturally prone to making errors. And when we are given responsibilities, there is always the chance for error.
What's a more useful way of thinking about the causes of failures in a complex system?
Robert gives the example of a tree structure or AC graph showing one node at the edge, representing the outage or incident.
If you step back one layer, you might not ask what is the cause, but rather what were contributing causes? In this manner, you might find multiple contributing factors that interconnect as more nodes grow. With this understanding, you can then look at the system and say, "Well, where a
113. Principles of Pragmatic Engineering
Karan Gupta, Senior Vice President of Engineering, Shift Technologies joins host Marcus Blankenship, Senior Manager Software Engineering, Heroku in this week's episode.
Karan shared his career trajectory, which includes founding aliceapp.ai, a fast, privacy-first recording and transcription service for investigative journalism, and acting as an advisor for various companies, including Alphy, a platform for women's career advancement.
A concept important to Karan is pragmatic engineering. Pragmatic engineering is about having "an oversized impact on the business by applying the right technology at the right time". It's about the technology, the process of creating that technology, and its impact on the underlying business. For example, building an electric car is cool, but producing a version in which people feel safe? That's engineering that changes the world forever.
According to Karan, these are the key things that matter in development:
Function (capabilities provided)
Form (how it looks and feels)
Fabrication (how it is built on the inside)
He recalls the value of the snake game on 404 pages. And the value of intentionality, saying "once you add a feature, it's probably going to be there forever. It's probably going to need maintenance and love and care forever. So do we really want to put it in?"
He talks about design and the balance between form versus function, such as designing something aesthetically pleasing versus easy to use. Then, there's fabrication: "How well can we make it? Can we deliver it quickly? And can others maintain it?" Sometimes using off-the-shelf software and well-proven frameworks are the most effective, and "Perfect is the enemy of good enough."
Karan stresses the importance of being a learning organization. "Be open to picking up what's out there to help make more informed choices, especially if the choice is to stick with the tried and tested." Good engineers are always open to learning about what new things are coming out and open to different opinions, frameworks, and ways of thinking.
Links from this episode
Get a volume level balancer. Talk about more than ruby