Monthly Archives: April 2015

You Don’t Need a DevOps Team: You Need a Tools Team

Lately the job boards have been filled with ads that look something like this:

Seeking Senior DevOps Engineer

* Must be able to debug all databases created since 1980

* Be a core contributor to at least 10 open source projects

* Have experience with Go, Java, Python, Ruby, and C#

* Understand the kernel and be able to debug panics at 3AM

* Be willing to participate in the on-call rotation

* Insert some other absurd skill here!

Am I the only one who thinks this is a joke? Hiring one person isn’t going to make a DevOps implementation successful. DevOps is a shift in how work flows to the various parts of your organization, and in who takes responsibility when certain things happen. It is the next evolution of the cross-functional team.

Why is it so difficult to have success with an idea that seems so simple? At Rally we’ve found that it stems from two issues:

  • Development teams are asked to own their services in production, but lack the necessary access to resolve problems.
  • Operations teams are very interrupt-driven, so asking them to build tools and systems for other people is never their top priority.

It’s taken us some time to find the right solution to these two problems, and while our solution may not be right for everyone, it’s worth considering in most engineering organizations.

Iteration 1: Embed a Core Ops Member on a Dev Team

A few years ago, when we started moving towards a Service Oriented Architecture (SOA), the decision was made that development teams should “own” the code they put in production and therefore should be “on-call” in case an outage occurred. Great idea, right? A systems administrator isn’t going to know how to fix a problem that keeps happening every night at 2 AM, but the developer who wrote the code will. So let the developers get the alerts, and provide them with the mechanisms to debug issues as they occur in production.

However, due to organizational constraints, we did not give the developers production access. So we effectively tied their hands behind their backs while asking them to tread water in the open ocean. There was no way this would ever work. We needed to provide tools that would help developers understand what was happening with their applications. So we increased our metrics-gathering and logging capabilities and thought all would be well with the world.
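
The post doesn’t show what that instrumentation looked like, but the idea is simple: have the application itself emit timing and log data that developers can read without production access. Here is a minimal sketch in Python, assuming a StatsD-style metrics collector listening on UDP (the host name, service name, and metric names are hypothetical):

    import logging
    import socket
    import time

    # Hypothetical collector address; any StatsD-compatible daemon would accept this format.
    STATSD_HOST, STATSD_PORT = "metrics.example.internal", 8125

    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    log = logging.getLogger("orders-service")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def record_timing(metric, elapsed_ms):
        # StatsD timing format: "<name>:<value>|ms", sent fire-and-forget over UDP.
        sock.sendto(f"{metric}:{elapsed_ms:.1f}|ms".encode(), (STATSD_HOST, STATSD_PORT))

    def handle_request(order_id):
        start = time.monotonic()
        try:
            # ... the application's real work would go here ...
            log.info("processed order %s", order_id)
        finally:
            record_timing("orders.handle_request", (time.monotonic() - start) * 1000)

With something like this in place, a developer paged at 2 AM can look at dashboards and logs instead of asking operations to shell into the box for them.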

At the same time, the team building this first service needed to spin up new hosts and get them configured the same way as our other production hosts: another task they could not complete on their own. Furthermore, our operations team was busy handling support cases for our main application (ALM), which was more important than spinning up new hosts for some new service (that wasn’t even production-ready yet).

So our engineering leadership decided that a core operations member would go work with this service team and provide the system administration skills needed to get their application into production. He would help bring up new hosts, get them configured (which was done by hand at the time), set up deployment pipelines, and help debug issues that occurred in production while the application was being dark-launched. This approach worked well. The team found they had a lot more throughput and could experiment without having to bother several other teams to get work done.

But ultimately this iteration wasn’t tenable. We couldn’t scale having an operations team member embedded on every service team we created. Plus, those teams were focused on delivering features, not on solving the real problem at hand: the lack of automation and tooling for production.

Iteration 2: Tooling Team, Take One

So we formed a new team to work on speeding up the delivery lifecycle for the engineering organization. The thought was that giving this team a very specific goal would make it less likely to be pulled away by other tasks.

We couldn’t have been more wrong. In fact, this team found that 80% of its work was interrupt-driven, so we could never accomplish the task we had set out to do.

You’re probably wondering: why was this team interrupted so often if it had such a specific task to perform? Part of it was the team’s make-up. One member was the sole person responsible for maintaining our build infrastructure (the machines where our CI jobs were executed). This meant most of his time was spent debugging issues in that system instead of helping the team. In hindsight we should have found someone else to own that infrastructure (which we’ve now done), but at the time it was hard to justify pulling someone off another team to maintain the machines when we already had someone doing it.

Another reason we were interrupted was that we pushed out the first iteration of our configuration management solution way too early, and the teams that chose to use it were constantly finding problems. We’d have to drop everything and rush to fix their pipelines so they could get their builds out to production. This sometimes took days or weeks, depending on the situation. We also spent a ton of time trying to automate the configuration of systems that didn’t need to be automated in the first place.

Iteration 3: DevOps Team, aka SysAdmin for Hire

During the iteration 2 experiment, we also hired several system administrators in our remote offices to help facilitate production tasks for the service teams. They worked outside of the core operations team and were often divided among multiple teams (unlike our iteration 1 experiment). This was really unfortunate for a team whose work needed to be performed in production while its “DevOps” person was working with another team; since tooling support from the iteration 2 team was still incomplete, such teams often felt blocked.

This had an unintended consequence: automation support was built for their service, but in a silo very specific to their stack. Their “DevOps” person would help them operationalize that tooling, so when they needed to perform a specific task they’d just run a Jenkins job or execute some CLI command. This was great for them, but it made the scripts hard for other teams to reuse.
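
The original scripts aren’t shown in the post, but the reuse problem is easy to picture: a siloed job bakes one team’s hosts and paths into the code, while a generic tool takes them as parameters. A small illustrative sketch in Python (every host name, path, and command here is hypothetical):

    import argparse
    import subprocess

    # Siloed version: one team's stack is hard-coded, so nobody else can reuse it.
    def deploy_orders_service():
        subprocess.run(["scp", "orders.tar.gz", "orders-prod-01:/opt/orders/"], check=True)
        subprocess.run(["ssh", "orders-prod-01", "/opt/orders/restart.sh"], check=True)

    # Generic version: the same steps, with the team-specific details passed in,
    # so any service team can call it from a Jenkins job or the command line.
    def deploy(artifact, host, install_dir):
        subprocess.run(["scp", artifact, f"{host}:{install_dir}/"], check=True)
        subprocess.run(["ssh", host, f"{install_dir}/restart.sh"], check=True)

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Deploy an artifact to a host")
        parser.add_argument("artifact")
        parser.add_argument("host")
        parser.add_argument("--install-dir", default="/opt/app")
        args = parser.parse_args()
        deploy(args.artifact, args.host, args.install_dir)

The first function is what a per-team silo tends to look like; the second is the kind of generic tool a dedicated tooling team has the time to build and maintain.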

The “DevOps” team members were unable to spend their time building generic tools for their teams (or for other teams, for that matter) because they had become the point of contact for production outages and other interrupts. Fighting fires became their full-time job.

Iteration 4: Merge the Tooling Team with Ops

Eventually we realized that the tooling team from iteration 2 was building many of the things our core operations team had wanted to build for years. So we did what every other engineering department would do: we merged them and created a new team called Infrastructure Engineering. This team’s goal was to build reproducible infrastructure using tools like Chef and Docker, facilitating the goal of delivering services to production faster. It would ensure that every task was automated and had a sensible UI for developer interaction (whether CLI or web).
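
The post doesn’t describe the tools themselves, but the shape of the goal is worth sketching: wrap each repeatable task in a small CLI so that “give me a reproducible environment for this service” is one command instead of a manual checklist. Here is a minimal sketch in Python that shells out to Docker (the subcommands and defaults are hypothetical, not Rally’s actual tooling):

    import argparse
    import subprocess

    def build_image(service, tag):
        # Assumes the service repo carries its own Dockerfile, so the image is
        # the same no matter which laptop or CI agent builds it.
        subprocess.run(["docker", "build", "-t", f"{service}:{tag}", "."], check=True)

    def run_service(service, tag, port):
        # Starts the service in a detached container with a predictable name.
        subprocess.run(
            ["docker", "run", "-d", "--name", service,
             "-p", f"{port}:{port}", f"{service}:{tag}"],
            check=True,
        )

    if __name__ == "__main__":
        parser = argparse.ArgumentParser(description="Build and run a service container")
        sub = parser.add_subparsers(dest="command", required=True)

        build = sub.add_parser("build")
        build.add_argument("service")
        build.add_argument("--tag", default="latest")

        run = sub.add_parser("run")
        run.add_argument("service")
        run.add_argument("--tag", default="latest")
        run.add_argument("--port", type=int, default=8080)

        args = parser.parse_args()
        if args.command == "build":
            build_image(args.service, args.tag)
        else:
            run_service(args.service, args.tag, args.port)

The same approach works with Chef: the point is that the configuration lives in version control and the developer-facing surface is a single, well-documented command.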

But here, all we did was take two teams that were already experiencing high interrupt rates and physically relocate them next to each other in the office. We did nothing to address the interrupt-driven lifestyle that had become commonplace on both teams. The utopia they expected from the merger quickly faded. After several developers left the team for various reasons, it was time to reevaluate the problems.

The Problems

  • Our core operations team is, by design, an interrupt-driven team. They fight the fires that other teams cannot fight on their own.
  • Developers were being alerted or paged when their applications failed in production but didn’t have appropriate access to fix the problems, which caused more interrupts for the core ops team.
  • Compliance requirements were going to eat up even more of the core operations team’s time.
  • More and more services were coming online, and we did not have the right automation in place to make this easier (spinning up new VMs, for instance).

Iteration 5: The Hard Decision

During our Q1 PSI planning at the beginning of this year, our engineering leadership made a very hard decision. We would divide the Infrastructure Engineering team into three parts: a tooling/platform team, a compliance team, and an interrupt/fire-fighting team. Siloing the interrupt work would allow the other two teams to actually complete the work that was needed by the end of the quarter.

Building a tooling team free of interrupts, and with a focused product (our first iteration of a Platform as a Service, or PaaS), meant we were able to accurately predict and prioritize work in our quarterly PSI planning. We broke down our customers’ needs and set goals for what we wanted to deliver in each iteration. This brought a renewed sense of passion to the team, and we’ve been crushing it ever since.

Now: Why do I feel this is the proper implementation of DevOps, vs. what we were doing before?

Our Developers Are More Efficient

The goal of every engineering organization should be to make its developers more efficient. We did this years ago with the advent of TDD and automated testing. While you may still manually test your applications, the coverage required of your QA personnel is drastically reduced by the automation I hope you have in place. This makes your entire team more efficient, allowing them to increase their throughput.

Now ask yourself: Are there other areas of automation you wish you had in place that could make your development teams more efficient? This is why you need a tools team. There’s not enough time in the day for your developers to both create the tools they need and crank out features. So you have to decide: which would you rather have?

Your Ops Team Has Other Problems to Solve

Your operations team is a fountain of knowledge that’s been shaped and molded over years of midnight pages and one-too-many weekend alerts. They possess crucial information about the state of your infrastructure, and ideas to make it better. Tap into that knowledge and allow your tooling team to build operations tools that help them automate their day-to-day workflows so they can focus on building you a better system.

Your Ops and Development Teams Should Already Be Communicating

These teams should be focused on real problems: How can I effectively scale my application? Do we have enough bandwidth for a given service? What happens when this service increases the database load? While tooling can solve some of the problems your development teams face, it’s often not enough. Your development teams should be working closely with operations to solve the application and system problems that are occurring in your environments. This is the value you want them delivering.

Now What?

There are probably people on your development and operations teams who are passionate about building these tools. Talk with them and find out what they would do to help your organization, because I bet they go home at night and think about these problems. You should be harnessing that energy. Here’s how to get started: find a few things you can do quickly that will provide immediate value to your teams, and let them work on those. Then watch how the effect cascades and how those tools speed up other areas of your development cycle.

Over fifty fake call centres in Delhi-NCR duping job seekers, says Delhi Police

NEW DELHI: As Delhi and the NCR have turned into a major educational hub, the region has also become a den of fraudsters duping job seekers of their hard-earned money on the pretext of getting them placed in multinational companies.

Investigations in this connection by the Delhi Police and the UPSTF reveal that at least 50 such call centres and job portals are running in the region. Similar frauds are being committed by some Nigerian nationals through phishing emails.

On March 25, the UPSTF arrested two persons for allegedly running a fake call centre using job seekers’ data taken from well-known job portals, an arrest that unearthed the network of scamsters.

This case is just the tip of the iceberg: the police estimate that over 50 such fake call centres are operating in Delhi, Noida and Gurgaon.

Victims of such frauds are spread across India and are initially asked to pay Rs 10,000 to Rs 25,000 for a job in IT giants and MNCs.

Recently, an FIR was registered by the Delhi Police against unidentified fraudsters after a complaint by Tata that several job seekers had been duped on the pretext of being placed in the company.

The youths were duped to the tune of Rs 8,000 to Rs 10,000 in the name of application and processing fees, police said.

This case is being investigated by the Economic Offences Wing (EOW) of the Delhi Police, the wing formed to probe financial crimes, which had earlier unearthed a similar fraud in which gullible job seekers across the country were duped in the name of Maruti Suzuki India.

“Many of these gangs download resumes from job websites and then target people in faraway cities so that once duped they refrain from travelling to Delhi and registering complaints. We have also seen a case where people were duped in the name of getting them jobs in Delhi Metro Rail Corporation,” said a senior official in EOW.

Explaining the modus operandi, UPSTF Additional Superintendent of Police Triveni Singh said, “Conmen first buy job seekers’ data from well-known job portals for Rs 25,000-Rs 40,000. Then they make a fake placement website whose name sounds similar to existing famous portals. They call job seekers, telling them their resume has been selected for a job in leading IT, banking and international companies, and then demand money.”

The data of most job portals has been compromised, as they do not follow stringent processes to keep it secure, a senior police officer said.

The police say it is difficult to bust these gangs, as their bank accounts, payment gateways and domain names are taken out under fake identities, and many times the servers are based abroad.

“Payment gateways do not do physical verification of their clients and allow payments against their commission. Similarly, domain registrars and server providers do not follow stringent rules and register customers with incomplete details,” Singh said.

According to the police, in order to sound professional, the gangs hire English-speaking call centre executives and rent office space in a plush commercial building.

“Job seekers are contacted in proper English through internet calling so that the number becomes difficult to track. In the name of registration and interviews, they extract Rs 20,000-50,000 from a candidate. All mobile SIMs, mobile phones and bank accounts are activated on fake documents,” Singh said.

According to cyber crime experts, most people running the scam are below 30 years of age and target fresh graduates who are desperate for jobs.

“Educational institutes have mushroomed in Delhi and NCR. These scamsters target unemployed fresh graduates. Most of these conmen have worked with job portals and call centres, so they know how to exploit the loopholes.

“The initial payment they charge each candidate is low, so a complainant hesitates to approach the police and even cops do not take these cases seriously. But the criminals keep minting money, and after running the racket from one place for a few months they change their location and other details,” said cyber crime expert Kislay Chaudhary.