How Performance Feedback can Reduce Testing in Agile Development

Here at New Relic we’re an Agile shop. For the most part we follow scrum, although on occasion we’ve been known to break from the scrum "bible". New Relic is also a Rails shop: every part of our infrastructure is either Ruby or Ruby on Rails. We use RPM to optimize and performance-tune our own application, so we are "eating our own dog food". The end result is that RPM enables us to spend more time building high quality features. We think some of what we’re learning about performance management and Agile development along the way is quite interesting and may be useful to others.

How our Development Lifecycle is Changing

The two tools that form the pillars of our development process are Pivotal Tracker and RPM. As you might have guessed, with Tracker we manage the development team’s activities and with RPM we manage the health of our production application (which happens to be RPM). Over time, however, a surprising change has occurred in our development life-cycle.

A funny thing happens when you have constant visibility into your application’s health. The team starts to orient itself around that as a primary driver. Every morning, most of us in development, unsolicited, sign into RPM to "see how things are doing". We often find something of interest. A new error, a slow page request, a strange fluctuation in DB response times, a growing heap. Right away these get entered into Tracker and we start working on them. At New Relic, you don’t need permission from anyone to make the site better.

Once we’ve massaged the site into a healthy state, it’s back to implementing our Agile stories. We all dump stories into Tracker and product management sorts them by priority. Then we start banging out the stories. And here’s where it starts to get interesting. Because we have great visibility into the application’s health with RPM, the need for finding problems in a pre-production environment is reduced. What, you ask? What about the mantra that all Ruby code needs complete test coverage?

Stop Testing - What Did You Say?

Well, "complete" coverage is a fallacy anyway. With test coverage, it’s a probability game. At New Relic, we try to analyze the business value of everything we do - from feature building to test writing. For us, simple tests that at least execute the code are mandatory. Tests that probe for edge cases in important code are mandatory. However, stress tests, performance tests, and complex integration tests are not normally done.
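To make the two mandatory kinds of test concrete, here is a minimal sketch using Minitest. The helper `average_response_ms` is hypothetical and exists only for illustration; it is not part of RPM or our codebase. The first test merely executes the code, and the second probes one edge case.

```ruby
require "minitest/autorun"

# Hypothetical helper (an assumption for this example):
# mean of a list of response times in milliseconds.
def average_response_ms(samples)
  return 0.0 if samples.empty? # avoid dividing by zero
  samples.sum.to_f / samples.size
end

class AverageResponseTest < Minitest::Test
  # Smoke test: only proves the code runs and returns a number.
  def test_executes
    assert_kind_of Float, average_response_ms([120, 80])
  end

  # Edge case in important code: an empty sample list.
  def test_empty_samples
    assert_equal 0.0, average_response_ms([])
  end
end
```

Nothing here measures speed or behaviour under load; that is the part we leave to production monitoring.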

Here’s what we’ve discovered - having constant deep visibility into the health of the production application is an acceptable substitute for some testing. It even finds things that you’d never find in a contrived test environment. We’ve seen time and again that many problems only happen in production. Of course, finding a problem in production can be expensive if you look at the cost to the business. However, here’s where Agile comes to the rescue. Our team can quickly fix production problems, normally in under 5 minutes. So for us, the cost of letting a problem slip into production is relatively low.

What this means for our business is that because RPM gives us great visibility into the health of our production application, we can spend less time on integration tests, performance tests, stress tests, and edge case testing - while still having a high quality site. We can spend more time building features that deliver value to our customers. Agile is great because it allows teams to react quickly. Normally we think of this only in the context of building new features, but at New Relic it has also allowed us to change our fundamental assumptions about how much time we invest in testing.

Others are Seeing this Trend

I can’t claim all the credit for this change in development process, or more accurately, I can’t claim to have first noticed this. Ward Cunningham of AboutUs.org introduced this notion when he spoke about New Relic at RailsConf 2008. He was talking about performance and stress testing and what he noticed was that his "team stopped trying to performance test new functionality and instead just pushed it to production and watched what happened. If RPM showed a significant problem, then it’s cap deploy:rollback." My first thought was "now, that’s just reckless". Now I realize it’s just plain good business sense. Performance testing is expensive and it never really simulates what’s going to happen in production anyway. And how often is code so bad that you need to take serious action? Once a month? Fine, then roll back and make fixes once a month.

Then, a week later, Ward dropped another bomb. He told me they don’t even have a staging server. If the code passes unit tests, it’s straight to production. And if RPM says there’s a problem, you guessed it, cap deploy:rollback. At New Relic, we’re not quite there yet - we still have a staging environment. But I think Ward is onto something. If failure rates are low, if visibility is immediate, and if the repair cost is low, then why not push the envelope? It’s all about value to the business and while bugs are an expense, new features drive the revenue.
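The "push it, watch it, roll back if needed" loop can be sketched as a simple decision rule. This is only an illustration under stated assumptions: the method name `rollback_needed?`, the tolerance, and the sample error rates are all hypothetical, and the rates would be read off a monitoring dashboard by a person rather than fetched through any particular API.

```ruby
# Hypothetical rule (an assumption, not Ward's or our actual process):
# roll back when the post-deploy error rate exceeds the pre-deploy
# baseline by more than `tolerance`. All values are fractions of requests.
def rollback_needed?(baseline_error_rate, current_error_rate, tolerance: 0.02)
  current_error_rate - baseline_error_rate > tolerance
end

# Made-up numbers: 1% of requests failed before the deploy, 8% after.
if rollback_needed?(0.01, 0.08)
  puts "Significant problem: run `cap deploy:rollback`"
else
  puts "Healthy: leave the release in place"
end
```

The rule only makes sense when all three conditions from the text hold: failures are rare, the numbers are visible immediately, and rolling back is cheap.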

To illustrate the kind of visibility we get with RPM, below are some examples of problems we’ve noticed…

Some strange MySQL error we noticed one day. It’s on our backlog to investigate. You need fine-grained detail to dig out SQL errors.

We saw that the combination of a long time window and a lot of database detail caused some of our queries to take way too long. This poor guy had a 150 second query. We ended up restricting what length of time some views could display. RPM let us see the specific combination of queries that needed to be changed and restricted.

One afternoon I saw our DB activity spike. Without a tool like RPM, we might have wasted a bunch of time digging into our code or trying to see if we were under a hack attack. Turns out a background job was consuming too much of the database.
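As an illustration of the fix for the 150 second query, restricting the time window a view may display can be as small as clamping the requested range. This is a minimal sketch: the constant `MAX_WINDOW_SECONDS`, the one-day cap, and the method `clamp_window` are assumptions made for the example, not the values or code we actually shipped.

```ruby
# Hypothetical cap (an assumption): detailed views may span at most one day.
MAX_WINDOW_SECONDS = 24 * 60 * 60

# Clamp a requested time window, in seconds, to the range 0..max.
def clamp_window(requested_seconds, max: MAX_WINDOW_SECONDS)
  [[requested_seconds, 0].max, max].min
end

clamp_window(7 * 24 * 60 * 60) # => 86400 (a week is cut down to one day)
clamp_window(3600)             # => 3600  (an hour passes through unchanged)
```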

Scalable Teams, Part 1: Communication

A scalable team is one that can remain productive regardless of the variations in workload and team composition. Easy, fluid communication between team members, the team and the customer, and the team and their code is a key pillar to creating a scalable team. Wolfram Arnold from RubyFocus discusses the importance of communication with Edward Hieatt and Davis Frank of Pivotal Labs.


Ward Cunningham, AboutUs.org

Ward Cunningham of AboutUs.org shares his thoughts on Agile development and the need to complement the use of specific tools and processes with mastery of development techniques.

RailsLab is brought to you by these expert contributors.

New Relic