Points vs Hours

Should we use points or hours?

This decision really comes down to choosing between relative and absolute estimation. It’s usually easier to estimate things in relative terms. Suppose I show you two large stones. Which is easier: judging by their appearance which one would be heavier to lift (relative), or guessing how many kilograms each one weighs (absolute)?

Just because it’s easier to estimate things in relative terms, does that mean it’s better? When it comes to estimating work, points are superior. Going back to the stone example, not only is it harder to accurately guess the weight, the weight also doesn’t tell us what we really need to know: how much effort it will take to lift each stone.

However, there are more considerations with this decision than just how you estimate work. Even if you use points, hours should still play a major role in your business. If you’re asking this question, maybe you’re trying to figure out if points are right for your organization but it’s not clear if they are a complete replacement for hours, or how they may co-exist. You’re in the right place! 

As you read on, keep in mind that everything we discuss in this article is based on our experience in the Infrastructure and DevOps world.

The journey many have taken

I think the best approach to explain how hours and points differ in estimating work is to bring these concepts down to earth and illustrate their use in a story that you can probably relate to.

Imagine I have a team of 5 engineers, each fully dedicated at a nominal 40 hours per week, and we are given a project. I break the project down into two lists, each one representing the same work required to get the project done.

For the first list, the work is represented as tasks, and we estimate each task by how much time (hours) we think it will take to complete. The second list is represented as stories, which are estimated in points using a Fibonacci-style scale of 1, 2, 3, 5, 8, and 13. Each story is given a point value that represents its complexity and level of effort (e.g., a 1-point story is very simple and requires little effort to deliver).

Let’s start with the hours scenario.

The hours on the tasks add up to 1000 hours of work, and the tasks can all be started in parallel. To be agile, we decide to run our project in two-week sprints and to stick with hours as our estimation system. I look at the team’s average hours of engineering work from previous projects and find that each engineer logs about 35 hours every week. Now that I know the actual bandwidth we can plan for, I declare that the project can be completed in 3 sprints (~6 weeks) because 1000 hours / (35 hours per week * 5 engineers) ≈ 5.7 weeks.
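The arithmetic behind that forecast can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are ours, not from any tool, and the inputs are the numbers from the story.

```python
import math

def forecast_hours_project(total_hours, engineers, logged_hours_per_week, sprint_weeks=2):
    """Return (weeks, sprints) needed, assuming every hour estimate holds."""
    weekly_capacity = engineers * logged_hours_per_week  # 5 * 35 = 175 hours
    weeks = total_hours / weekly_capacity
    sprints = math.ceil(weeks / sprint_weeks)  # a partial sprint still costs a sprint
    return weeks, sprints

weeks, sprints = forecast_hours_project(1000, engineers=5, logged_hours_per_week=35)
print(f"{weeks:.1f} weeks, {sprints} sprints")  # 5.7 weeks, 3 sprints
```

Note the assumption baked into the function: the forecast is only as good as the 1000-hour estimate it starts from.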

We get started and after the first sprint, having anticipated 35% of the project to be completed, I notice progress of only 20%. If this trend continues, the project is actually going to take 10 weeks.

This also produces another problem: it looks like the team collectively put in 200 hours of work instead of 350 over the previous two weeks. I know this isn’t true, so I have each engineer record the actual hours they worked on their tasks. Around 350 hours are indeed accounted for, but many tasks took much more time than anticipated. I also have the team re-estimate the remaining tasks based on what they’ve learned in sprint 1. They produce a new remaining estimate of 1400 hours, so I conclude the project is going to take 8 more weeks (1400 hours / 175 hours per week). With sprint 1 (two weeks) already in the books, that adds up to 10 weeks, or 4 weeks longer than what was originally estimated, so not too horrible.

We feel good about where we are now because we have applied lessons learned and used real data to help us forecast better. We also schedule more weekly meetings to keep this alignment going throughout the project. Then after sprint 2, I notice we missed the mark again, partly because tasks took longer than expected, but also because of blockers and dependencies the team had not anticipated. On top of this, the stakeholders added more scope.

As the sprints roll by, other events pile up: engineers are out sick, other fires need attention, and new blockers appear. The volatility continues and our estimates drift far from where we began. When the project finally finishes, we are way off our original estimates. It’s also not clear that we could have had the foresight needed to prevent this.

In retrospect, we invested a lot of overhead in trying to accurately predict when things would be completed, with little success. It was better than having no idea how much time the project was going to take, and we could see where the engineers’ time was being spent, but for planning purposes it left a lot to be desired.

And now the points scenario.

For the same project, we take the list of stories, and the estimates add up to 500 points of work. Once again, we start under the assumption that all the work can be started in parallel. In this scenario, we also run two-week sprints, and the team’s velocity is established at 100 points per sprint. We therefore declare that the project can be completed in 5 sprints (10 weeks).
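Here is the same forecast as a small Python sketch. The names are illustrative, and the 500-point backlog and 100-point velocity come straight from the story. The only input is what the team has actually delivered per sprint, not how many hours anyone worked.

```python
import math

def forecast_points_project(backlog_points, velocity_per_sprint, sprint_weeks=2):
    """Return (sprints, weeks) needed to burn down the backlog at a steady velocity."""
    sprints = math.ceil(backlog_points / velocity_per_sprint)
    return sprints, sprints * sprint_weeks

sprints, weeks = forecast_points_project(500, velocity_per_sprint=100)
print(f"{sprints} sprints, {weeks} weeks")  # 5 sprints, 10 weeks
```

When scope changes, the same function can be called again with the new remaining backlog, which is about all the re-planning a points forecast needs.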

We start the project and after our first sprint, we complete 90 points instead of 100, with two 5-point stories left incomplete to be finished in the next sprint. The team has similar feedback to the hours scenario: some of the stories took longer than anticipated, but they are confident we will catch up in the next sprint as the incomplete ones are almost done. There’s no need to adjust the points on any story, nor is there any need to schedule more meetings to course correct our project schedule.

The same challenges come up, such as the blockers and dependencies which we didn’t account for before. The team has sick days, other fires come up, and so on. However, our velocity is still holding at ~100 points per sprint. When scope gets added on, that work is ingested and estimated, then added to the backlog to be worked on based on priority. 

Since scope is added on after the project starts, the project will end up taking longer than what was first estimated in both scenarios, but the team using points will be better at estimating when it will finish, and will also complete the project sooner. 

Why would the team using points finish the project sooner?

The points team is able to get more work done than the team using hours because it is not burdened with the per-task overhead. If you think the difference in overhead is trivial, consider this. Using hours led the team to schedule more meetings every week. The team needed that time for planning and estimating hours on new work, updating actuals on each completed or in-progress task, and occasionally re-estimating remaining work. For argument’s sake, let’s say it’s 2 hours per engineer per week. This team of 5 engineers would be spending 20 hours per sprint on overhead that adds questionable planning value. Even if there wasn’t a meeting for updating the actuals, it’s usually difficult to rely on engineers to do this on their own time unless it’s closely tied to billing. Planning using points takes the average team about 1 hour per engineer per week, because the team is not trying to work out the hours needed in granular detail, is not going back and adjusting for what actually happened, and in general is not re-estimating work. The team working with hours is paying twice the overhead.
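To make the overhead comparison concrete, here is a minimal Python sketch. It assumes the meeting time is per engineer per week, which is the reading that produces the 20 hours per sprint above. The function name is ours.

```python
def overhead_hours_per_sprint(meeting_hours_per_engineer_week, engineers, sprint_weeks=2):
    """Total team hours spent on estimation overhead in one sprint."""
    return meeting_hours_per_engineer_week * engineers * sprint_weeks

hours_team = overhead_hours_per_sprint(2, engineers=5)   # estimating in hours
points_team = overhead_hours_per_sprint(1, engineers=5)  # estimating in points
print(hours_team, points_team, hours_team / points_team)  # 20 10 2.0
```

Over the 10 weeks (5 sprints) of the example project, that difference adds up to 50 engineer hours.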

In the real world.

This story initially said the team would be fully dedicated to the project. Having 5 engineers whose sole focus is on a single project sounds like a fairy tale scenario to many reading this. We hinted about the team “dealing with other fires,” but the reality is a large amount of priority shifting and unplanned work is common for DevOps teams. This is all the more reason why overhead is a big deal. Being able to quickly get work represented and estimated is key to a sustainable process.  

Seeing is believing.

For those of you who haven’t yet worked with points and were left unsatisfied with the story above, understand that it’s one of those things that only really makes sense once you start doing it. Don’t get too hung up on the numbers used (e.g., 1000 hours vs 500 points) as those were just arbitrary examples. There’s a lot of material from official Agile sources you can read on the topic. Just know that until you really experience it, it probably isn’t going to fully click.

How do you know if points are better to estimate work with over hours?

Let’s cut to the chase. Any work estimation system you use is only as valuable as the degree to which it helps you predict what your team can deliver in a given period of time (i.e., a sprint). Better estimates create better predictions. 

Let’s take a closer look at what happened after the first sprint in each scenario from the example story. The team using hours was expected to be 35% done with the project but only made 20% progress, whereas the team using points was aiming to be 20% done and nearly hit the mark at 18%. This is because there’s a difference between saying a team has a velocity of 100 points and saying it has 350 hours of bandwidth. Bandwidth is almost irrelevant in this conversation, because hours worked do not determine how much gets delivered. In other words, while hours worked are important to know for other reasons, they don’t tell you what can get delivered in a sprint.
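One way to compare the two sprints is to ask how much of the plan each team actually hit. The sketch below uses only the numbers from the story. The "hit rate" label is our own shorthand, not a standard metric.

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100 * part / whole

# Planned vs actual progress after sprint 1, as a share of the whole project.
hours_planned, hours_actual = percent(350, 1000), percent(200, 1000)
points_planned, points_actual = percent(100, 500), percent(90, 500)

# Share of the sprint plan that was delivered.
hours_hit_rate = percent(200, 350)
points_hit_rate = percent(90, 100)

print(f"hours:  planned {hours_planned:.0f}%, actual {hours_actual:.0f}%, hit rate {hours_hit_rate:.0f}%")
print(f"points: planned {points_planned:.0f}%, actual {points_actual:.0f}%, hit rate {points_hit_rate:.0f}%")
```

The hours plan lands at roughly 57% of what was forecast for the sprint, while the points plan lands at 90%.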

You might say, “that’s why we estimate tasks in hours,” but then your planning is only as good as your ability to accurately guess how much time your tasks are going to take. If you have first-hand experience trying to do so, you’ll know it’s really hard! It requires a lot of precision and, frankly, luck. You can get better at it with experience, but it’s ultimately a game of diminishing returns. 

Velocity is all about what actually gets delivered. Once you’ve established your velocity as a team, it is something you can take to the bank when planning. Only the points of stories that get moved to done are counted in velocity, period, end of discussion. This is why you don’t split points on unfinished stories between sprints. It’s all or nothing. Either you got the story done or you didn’t. The beauty, though, is there’s no need to reconcile after the work has been done. If we’re a bit low in one sprint, we can reasonably expect to make up for it in a future sprint. We don’t need to waste energy validating whether we got the engineers’ full bandwidth by having them update the points on stories that proved to be underestimated. Underestimated stories tend to be balanced out by overestimated ones. This is something Kanban also accomplishes quite nicely by measuring a team’s throughput.
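The all-or-nothing rule is easy to express in code. The Python sketch below is a rough illustration, not how Jira or any other tool computes velocity. The `Story` class, the story names, the breakdown of sprint 1 into individual stories, and the 110-point second sprint are all hypothetical. Only the 90 points done and the two unfinished 5-point stories come from the example.

```python
from dataclasses import dataclass

@dataclass
class Story:
    name: str
    points: int
    done: bool

def sprint_velocity(stories):
    """Sum the points of stories that reached done; unfinished stories count as zero."""
    return sum(story.points for story in stories if story.done)

def average_velocity(history):
    """Average delivered points per sprint across past sprints."""
    return sum(history) / len(history)

# Sprint 1: 90 points done, two 5-point stories carried over (hypothetical breakdown).
sprint_1 = (
    [Story(f"story-{i}", 13, True) for i in range(6)]
    + [Story("story-6", 8, True), Story("story-7", 3, True), Story("story-8", 1, True)]
    + [Story("carry-over-1", 5, False), Story("carry-over-2", 5, False)]
)
v1 = sprint_velocity(sprint_1)
print(v1)  # 90

# If the carried-over stories finish in sprint 2 on top of a normal sprint,
# the dip averages out without anyone re-estimating anything.
print(average_velocity([v1, 110]))  # 100.0
```

Nothing in this calculation needs hours, actuals, or revised estimates, which is where the savings in overhead come from.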

What we advocate for

Use both, separately.

In the simplest terms, we prefer to use points to estimate work and logged hours to measure actuals. This often manifests as a team using points to estimate their stories in a tool like Jira, and then logging their time against projects and categories in a tool like Harvest or SAP. We like both because they serve different purposes.

These are the major questions that points and hours provide answers to:

Points:

  • When can we deliver?
  • How many engineers do we need on a project?
  • How many engineers do we need to hire?
 

Hours:

  • How much is a project costing?
  • Where is the team’s time going?
  • How much bandwidth does the team have?
 

Doesn’t Agile prescribe using both together?

Some Agile practices do. A common pattern is: the team estimates stories using points, and then the individuals assigned to each story’s sub-tasks estimate those in hours. We would never call this a mandatory practice. That said, if it helps the team or an individual estimate better and they want to use it, go for it. We advocate for paths that require less overhead.

When to use points

When can we deliver?

Points are better than hours at estimating when something can be delivered because, at the end of the day, the velocity number is about one thing: results. If you didn’t deliver, it doesn’t count.

On that note, keeping commitments is still hard. This is more true when you begin talking in the magnitude of months or quarters. There are just too many variables which you have zero control over. Scope changes, more dependencies emerge, engineering debates chew up time (e.g., what to name things), there may be gripes about the project budget, stakeholders change, vendors can miss commitments, engineers take time off, people resign, new engineers join the team and require ramp up, maybe some of them end up not working out, your department undergoes a re-organization, the company announces they are being acquired, and so on!

Using points and velocity goes a long way in accounting for volatility but there’s no silver bullet solution to this problem.  

How much will a project cost?

This ties into the question above and below and is a must-have not only in project planning, but in CAPEX and OPEX planning. 

Let’s go back to the example story. In the hours world, we would have initially budgeted for 5 engineers working full-time for 6 weeks, and if we’re smart we would have requested a buffer on top of that to cover unforeseen circumstances. Even so, this still would have resulted in the need to go back and ask for a lot more time and money.

In the points example, we also would have required additional funding because of the scope increase, but the overall delta would have been much less because of the benefits of relative estimation and knowing the team’s velocity.

Regardless, this isn’t an exact science, and I don’t care what anyone says. Budgeting is hard. Since points are more accurate than hours for estimating work, they are a better way to forecast the labor needed on a project.

On that note, if you are using a “points to hours” conversion ratio, I strongly advise against it (more on this later).

How many engineers do we need to hire?

If the team of 5 is running at 100 points per sprint, it’s good enough math to say that if you added another engineer, the velocity would increase to ~120 points. This is nice because you can look at your backlog, measure the amount of work you need to complete in your sprints to keep up with demand, and derive your desired head count. 
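As a rough sketch of that math in Python: the function names are ours, the 150-point demand figure is a made-up input, and the model assumes velocity scales linearly with head count, which ignores ramp-up time for new hires.

```python
import math

def velocity_per_engineer(team_velocity, engineers):
    """Average points per sprint contributed per engineer (a team-level average)."""
    return team_velocity / engineers

def projected_velocity(team_velocity, engineers, new_engineers):
    """Velocity after adding engineers, assuming linear scaling."""
    return team_velocity + velocity_per_engineer(team_velocity, engineers) * new_engineers

def engineers_needed(demand_points_per_sprint, team_velocity, engineers):
    """Head count required to keep up with demand, rounded up to whole people."""
    return math.ceil(demand_points_per_sprint / velocity_per_engineer(team_velocity, engineers))

print(projected_velocity(100, engineers=5, new_engineers=1))  # 120.0
print(engineers_needed(150, team_velocity=100, engineers=5))  # 8
```

The per-engineer figure here is only a planning average for the team. It is not a measure of any individual.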

Side note: If a team is saying they need more people, that means they need more people. I’m all about using data to help justify decisions, but be mindful about how much energy it takes a manager that’s likely overworked to justify the need for more head count. 

When to use hours

How much is a project costing?

No one uses points to calculate cost or when invoicing. Teams that are using points to estimate their work still need to log their time. Whether you use a system like SAP or Harvest, setting it up to be able to report on this is necessary. 

If you set up your time tracking system correctly, then you can easily answer this next question.

Where is the team’s time going?

Nothing is going to be more accurate than analyzing the time logged by the team. Choosing a tool that offers granular tracking options is highly preferred. Know that we’ve had success using points for this purpose, but it doesn’t compare to the precision of the logged hours. 
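As an illustration of what logged hours make possible, here is a minimal Python sketch that summarizes time entries by category and prices them. The categories, the entries, and the blended hourly rate of 100 are all hypothetical. A real report would come from an export of your time tracking tool, whose format we are not assuming here.

```python
from collections import defaultdict

def hours_by_category(entries):
    """Sum logged hours per category from (category, hours) pairs."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    return dict(totals)

def share_by_category(totals):
    """Each category's share of all logged hours, as a percentage."""
    grand_total = sum(totals.values())
    return {category: 100 * hours / grand_total for category, hours in totals.items()}

def labor_cost(totals, hourly_rate):
    """Cost of the logged time at a single blended hourly rate."""
    return sum(totals.values()) * hourly_rate

# Hypothetical week for a team of 5 logging 175 hours in total.
entries = [("project", 70), ("project", 50), ("unplanned work", 35), ("meetings", 20)]
totals = hours_by_category(entries)
shares = share_by_category(totals)

for category, hours in totals.items():
    print(f"{category}: {hours:.0f} h ({shares[category]:.0f}%)")
print(f"cost: {labor_cost(totals, hourly_rate=100):.0f}")
```

Points cannot give you this breakdown with the same precision, because work like meetings and firefighting often never becomes a story.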

One consideration is that these are also likely different systems of record that are managed by different teams/departments. If your time tracking tool isn’t able to give you this data, or if access is a concern, then points can be an acceptable backup plan.

How much bandwidth does the team have?

This question should really be about individuals rather than the team. Hours are better than points at answering this, but having a conversation is ultimately the best method. Someone might be overloaded but it doesn’t show up in the time they log. Maybe they are working the normal number of hours, but are dealing with so much stress and complexity that they cannot take on any more responsibility. However, if someone on the team is logging over 40 hours per week, that’s a clear indicator they are over-subscribed.
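The one signal hours do give you cheaply can be sketched like this in Python. The names and logged hours are hypothetical. Per the caveat above, this only catches the obvious cases and is a prompt for a conversation, not a replacement for one.

```python
def over_subscribed(weekly_hours, limit=40):
    """Return (name, hours) pairs for anyone logging more than the limit, highest first."""
    flagged = [(name, hours) for name, hours in weekly_hours.items() if hours > limit]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

logged = {
    "engineer_a": 36,
    "engineer_b": 47,
    "engineer_c": 40,
    "engineer_d": 52,
    "engineer_e": 34,
}
print(over_subscribed(logged))  # [('engineer_d', 52), ('engineer_b', 47)]
```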

Points and velocity cannot be relied on to tell you how hot or cold a team is running, because these numbers are not about the individual and they say nothing about the lower or upper limits of the team’s capacity. For example, if a team’s velocity increases, it doesn’t automatically mean the team is working harder and longer hours to make that velocity number go up. It could be the result of a process improvement, or engineers on the team may have improved their skills and become more productive. Conversely, if a team’s velocity decreases, it doesn’t automatically mean there is less work on their plates. Velocity could decrease because too much context switching is causing work to get started but not finished.

What points should never be used for

Never use points to assess how productive an individual is.

Before I discuss why it would be wrong to use points for this, know that determining productivity should be the result of context-rich analysis and dialogue with everyone involved. You should also take the relevant situational factors into consideration as much as possible.

I never advocate for the use of points to performance manage someone. It’s unethical because it’s bad math and context is often lacking. Points are about the team, not the individual. Stories are estimated by the team and those estimations reflect their collective capability. Velocity is a measurement of the total output of the team.

You might challenge that as trends develop for a team (i.e. velocity), similar trends also develop for engineers. This is true, but you cannot use this as a comparison of productivity. If engineer A has an average velocity of 13 points and engineer B has an average velocity of 10 points, this does not mean engineer A is 3 points more productive than B. Engineer B might be contributing to engineer A’s stories getting completed. They could be working on different projects that are operating differently. They often do not control all of the variables involved in a story getting completed. One engineer might be context switching more than another. There are more possible causes but you get the idea. 

The other factor to consider is that the quality of work getting done is really important when evaluating productivity, but is also subjective and difficult to measure. Someone might be churning out pull requests and completing stories at a rapid rate but is leaving a mess in their wake. Others may produce really meaningful innovations but the work was associated with a single story. How do you convey the significance of this contribution through this data alone?  

Are there any circumstances where I would say it’s reasonable to look at points for this? Outliers exist and can be informative, I’m not denying that. If an engineer gets no stories completed sprint after sprint and the rest of the team is getting their stories completed, then you might have a problem. Just know this is a dangerous road to go down if it’s your only piece of evidence.

Never use points to compare teams to each other.

Points also happen to be team specific. 

Let’s say for the sake of argument you gave me two teams that are, on paper, the same: they are the same size, have identical workloads and time zone constraints, have the same volume and cadence of meetings, use the same tools, support the same teams, fight exactly the same number of fires, have the same methods of intaking work, and their engineers have identical experience and skills. It is obviously not possible to find two teams that match on all of these categories, but even then, I wouldn’t compare their velocities. Why not, you ask?

Points are a relative estimation system. The jumping-off point for every team is to figure out what example of work they will anchor their estimations to, and then measure everything else against that. The number of factors involved in how a team goes about this is not worth trying to control for. Just assume that each team is unique in this regard.

If you insist on comparing teams to each other, I would ask: to what end? If you want to see which teams are performing well, I would suggest focusing on questions like: Are they delivering? Are their customers happy? Do other teams have a positive experience working with them? These are much better questions to get answers to.

Never convert points to hours.

There’s simply no need. It’s infinitely easier and more accurate to just have people log their time. 

Some of you might be already doing this and are thinking that you are getting something meaningful out of it. I would ask the question: why did you start doing this?

Here are some of the reasons I’ve run into:

  • You wanted a model to help you figure out how many points a story should be.
  • You wanted a model to help you figure out how to size projects.
  • You needed a way to demonstrate how busy your team is in something more grounded in reality.
  • You had to produce this for financial/budget reasons.
  • A project was estimated in hours and you needed a way to convert this.
  • You were told to do it.

Let’s talk about why the math breaks down. 

Points are a team-level metric and estimate how complex something is in the context of what the team can do. You might be thinking that one can expect a positive correlation between points and hours. Time is one component of complexity, so this is true. The issue is that relying on it is counter-productive to solving your problems. Once you create a ratio of points to hours, every point estimate becomes an absolute estimate, which is the same as using hours. In other words, you’ve given up the power of relative estimation.

It gets even messier. Consider these scenarios:

  • The amount of time it takes to complete a story or task is not always directly correlated with the level of complexity involved. A task may not take a lot of time to execute but may require a lot of preparation and consideration. Other tasks may take quite a long time to complete but can run in the background.
  • In practice, stories can get started and stopped, and restarted again. This cycle ends up taking longer than if the story were started and completed in the same working session. Oh, if only life worked this way.
  • A story may involve many engineers collaborating in different ways, some of them indirect. It could need a meeting with people not on your team at all. Do you count their time? Why not? Isn’t it all the company’s money at the end of the day?
  • You might split a story between sprints where now the original and the new story have the same amount of points. Some stories end up taking forever (you know it happens more often than you want to admit). They carry on sprint after sprint because other priorities keep getting in the way. What then?

I could keep going and probably write a book on this. The point is that points and hours are not related closely enough to map onto each other nicely.

Summary

What I hope you take away from reading this is that a decision to use points does not replace the need to track hours, but there’s a time and a place to use each. They are different tools for different problems and when applied incorrectly, there can be a lot of bad outcomes as a result. However, when utilized as they were designed to be, they can each really help you get the information you need to steer your organization with confidence.

Ready to take the next step?

If you enjoyed this article and are interested in learning more about how Blue Pisces can help your company, click on the link below to set up a free discovery call.
