## Zodiacs and Earth’s Precession

You may have seen recent posts claiming that NASA published some new results on the zodiac signs. Some of the bigger internet publishers, like Distractify, have popularized the story. The story says there is a 13th sign we haven’t seen before and all the zodiacs need to have their dates adjusted.

For clarity, NASA has no opinions about astrology. They never have and they won’t any time soon. And there is no 13th sign. Snopes, bless their hearts for their wonderful works, called this out clearly.

However, your sun sign still might not be safe… depending on your level of trust in the basic components of your zodiac’s definition. Traditionally your zodiac, or sun sign, is defined to be the constellation that the sun is in on the morning of your birth. This is reliable from year to year because we travel around the sun once a year, so the sun appears to cycle through each section of the sky in a big arc.

However, the apparent location of the sun is strongly controlled by the orientation of earth’s axis relative to the rest of the sky. The direction of earth’s axis slowly traces out a full circle roughly every 26,000 years; this is called Precession. Many ancient cultures knew about this, and it is well studied and validated. Precession causes the zodiac signs to slide backwards through the seasons at a rate of about one sign every 2,150 years. Even the Greeks noticed that the zodiac signs were slowly cycling, or regressing, through the seasons.

The Greeks wanted to stop this from happening, so Ptolemy, a famous astronomer who worked in Alexandria, fixed the astrological signs to the equinoxes and seasonal changes. From then on, the western notions of zodiac signs have been tied more to local seasonal elements than to the stars located many light years away. These are the definitions we use today! This means that there is no way for NASA (or anyone else!) to scientifically reinvent them. The timing of the zodiac signs is kind of like a math definition. Humans defined them, so there is no scientific way that NASA could ever prove them wrong. There is nothing to prove; the timing of each zodiac sign is just a definition applied by Ptolemy.

If you are someone who happens to believe deeply in astrology, then it’s deeply unlikely that anything in this article will affect the way you view your zodiac sign. However, I think it’s interesting to note that if you believe in Astrology, then you implicitly believe that the time of year you were born in affects your personality. This belief in zodiac signs could mean you would also appreciate research on seasonal affective disorder. Or you might really want to check out this fabulous article by the Atlantic which discusses recent research showing people born in the summer are more susceptible to mood swings (and many other personality quirks that are seasonally significant!).

This is one situation where science, math, and astrology aren’t always well separated. Science can tell us something about the world around us. And, historically, humans built belief systems around that knowledge (Astrology). Then we had to update the astrological belief system when the science no longer matched. Because initially, sun sign was a good indicator, until we learned about Precession. So astrology must be adjusted to match the seasons, not the stars. Which, some argue, is what the sun signs were supposed to do anyways. And, while we can understand Precession scientifically, we can’t prove anything about the tenets of astrology. Perhaps it’s only a matter of time until we find more scientific reasons to support certain aspects of Astrology? Certainly much of the internet is ready to believe that NASA should be able to weigh in.


## Knitting and Math

I have two main communities: my math department (I’m a grad student) and my knitting group. When I talk to some people in the knitting community at large, they have the impression that more complex knitting is very “mathy,” because “you have to count stitches” in various patterns.

I don’t really see the counting as mathematical. The person who designed the original pattern used spatial reasoning, and that’s math. When I knit complicated cable or lace patterns, I visualize the pattern part of it, and that’s math. Even something like adjusting gauges (the number of stitches per inch) to make garments of an intended size – that is sometimes math.

And a particular type of yarn that I’m enamored with – so much so, that I’m writing this – is mathematical.

## Getting Stripes

For the non-knitting folks: if you’ve ever seen a hand knitted garment with multiple colors, there are a few ways for a knitter to achieve this. The first (more traditional) way is to have several balls of yarn and to knit each stitch with one of these, changing as one goes. That’s not a bad way to do it, if one only has to change colors a few times. But there’s a catch. Every time one length of yarn is broken to let another be tied in, there are now two ends to weave in. If you don’t knit or crochet, ask someone who does how they feel about weaving in ends. You’ll soon learn it’s not the enjoyable part of fibre art.

Another project, with many ends, waiting to be woven in.

The second way to change colors is to buy a ball (or skein or hank) of yarn, which already has multiple colors in it.

Usually, when a ball of yarn has multiple colors, those colors occur at regular intervals. If the yarn was dyed as a 2 yard skein, then every two yards, the same color will appear. So, roughly the same number of stitches will be required to get back to a color.

This makes for a really pretty fabric, if you’re knitting something that has about the same width (or circumference) throughout the project. Socks, mittens, rectangular scarves, and even hats will all look roughly as you would expect them to.

These socks – these socks look great in a yarn that works this way.

However, a lot of the patterns knitters like to make change width as they go. Specifically, triangle shawls and semicircular shawls have been extremely popular in recent years. Here’s how they’re constructed.

We cast on just a couple of stitches (usually 4 or 6) to start and then knit back and forth (as indicated by the very “high-tech,” penned arrows above). Each row, we add a few stitches. By the time we get to the outside edge, there are hundreds of stitches in each row.

Now, let’s say our skein had fairly long bands, so we actually get full color stripes all the way to the end of the shawl. In knitting, length of yarn translates to a number of stitches, and stitches translate to area: a block of 100 stitches can be 2 rows of 50 stitches, or it can be 10 rows of 10 stitches, or it can be a single row with 100 stitches. It all depends on the row width. So, if 3 yards of yarn gives 100 stitches, the area of knitted fabric won’t change – but the shape it takes might. Knitting friends, this spatial reasoning was brought to you by mathematical thinking :). You’ve all done math, every time you’ve estimated the yarn needed for a pair of socks or a scarf. No counting needed.

Anyway, in the shawls pictured above, we can clearly see the row sizes are changing and it doesn’t take much mathematical intuition to realize that a stripe that’s 1” deep near the start (the part with short rows) has a lot less area, and therefore a lot fewer stitches, than a 1” stripe near the end (with the long rows). So it takes much less yarn to make that 1” stripe near the center or beginning, than it does to make a stripe of the same width toward the outer edge.

How much difference, exactly? Caterpillar Green Yarns figured that out, and we just get to reverse engineer it.

## Finally, the mathy yarn

The shawl below – before Caterpillar Green’s innovation – could only have been constructed by

• purchasing skeins of all of the unique colors you see below,
• knitting a few rows in grey, cutting the yarn, tying on purple yarn,
• knitting a few rows in purple, cutting the yarn, tying on grey yarn,
• knitting a few rows in grey, cutting the yarn, tying on red yarn,
• etc,

and then weaving in all of those ends. Again, weaving in that many ends just sucks.

But Caterpillar Green found a way to dye the yarn – all in one skein – with the appropriate lengths of yarn.

Our goal is to understand what lengths and why it works.

We’ll use the first style of triangular shawl as our model. Now, we need a standard unit for the length of our yarn. Let’s use the length it takes to make that first “stripe.” If the whole picture shows the shawl, the darkened triangles, together, make up the first stripe. It’s actually going to be more convenient to think about half of that stripe.

(Yes, the first stripe looks like two little triangles put together. Further on, they’ll actually look more like stripes.)

Also, every knitter will make a slightly different size triangle with the same length of yarn, and those will change if they change needle sizes. But most knitters also stay consistent within a project. The same knitter will also be knitting the later rows, so the amount of yarn needed for an equal area will remain the same.

Let’s break the rest of the shawl up. This is somewhat simplified. I only have 4 color stripes, but we’ll get the idea. The sides are also mirrored, so I can show off the stripes and the triangles that are key.

The second stripe is made up of 3 of those triangles, or 1+2. The third stripe has 5 triangles, or 3+2. In fact, if we keep adding two triangles for each new section, we have enough area to fill all the stripes in the pattern. Also, we’re using all the odd numbers.

Doubling that (to account for both sides) doesn’t change the proportions. The second full stripe (both sides together) contains 3 times the area of the first (the two dark triangles put together). So the second color section in our yarn has to have 3 times as much yarn as the first. The third color section has to have 5 times as much as the first. The fourth has to have 7 times as much as the first, etc.
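To make the proportions concrete, here is a minimal Python sketch of the stripe lengths. The function name `stripe_lengths` and the 2-yard unit length are illustrative assumptions of mine, not figures from Caterpillar Green Yarns.

```python
def stripe_lengths(num_stripes, unit_length=1.0):
    """Return the yarn length needed for each stripe.

    Stripe k (counting from 1) needs (2k - 1) times the yarn of the
    first stripe, following the triangle-counting argument above.
    """
    return [(2 * k - 1) * unit_length for k in range(1, num_stripes + 1)]


# Hypothetical example: suppose the first stripe uses 2 yards of yarn.
for number, length in enumerate(stripe_lengths(4, unit_length=2.0), start=1):
    print(f"stripe {number}: {length} yards")
```

With a 2-yard first stripe, the four stripes would need 2, 6, 10, and 14 yards. The real unit length depends on the knitter and the needles, as noted earlier.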

This is just one shape, though. Let’s see what else can be knit with this yarn.

The key is to notice that the total amount of yarn – using that first color section as our unit – is as follows:

After 1 color: $1 = 1^2$,

after 2 colors: $1 + 3 = 4 = 2^2$,

after 3 colors: $4 + 5 = 9 = 3^2$,

after 4 colors: $9 + 7 = 16 = 4^2$,

after $n$ colors: $n^2$.

It’s all squares. Or, at least, it’s perfect squares times the yarn that was in that original color section.
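For anyone who wants to see why the running total is always a square, here is the standard identity for the sum of the first $n$ odd numbers. The index $k$ is just my label for the color section.

$$\sum_{k=1}^{n} (2k - 1) \;=\; 2\sum_{k=1}^{n} k \;-\; n \;=\; n(n+1) - n \;=\; n^2$$

So if color section $k$ holds $2k - 1$ units of yarn, the first $n$ sections together hold exactly $n^2$ units.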

It turns out, there are many shapes, for which this same pattern will make nice color stripes. A few of them are drawn below.

Another triangular shape

Consider a circle. A circle of radius $r$ has area $\pi r^2$. The first ring (actually a small circle) uses the standard amount of yarn in the first color section of the yarn. Say it makes a circle of radius $r$, with area $\pi r^2$. The next section has three times as much yarn, so it adds an area of $3\pi r^2$. The total area is now $4\pi r^2 = \pi (2r)^2$, or a circle of radius $2r$, which is exactly what we wanted. It means we have a ring of width $r$ around the first circle. Adding the next color section, which contains 5 times as much yarn as the first, adds $5\pi r^2$ in area, for a total of $9\pi r^2 = \pi (3r)^2$, or a circle of radius $3r$. The pattern continues. Even though we can’t break the subsequent rings up, visually, into smaller circles (like we did with the triangles), the areas still work as we would hope.
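The ring areas can also be checked numerically. This is a small sketch with a function name of my choosing (`ring_area`), using a unit radius for the first circle; it simply subtracts the inner circle from the outer one.

```python
import math


def ring_area(k, r=1.0):
    """Area of the k-th ring: circle of radius k*r minus circle of radius (k-1)*r."""
    return math.pi * (k * r) ** 2 - math.pi * ((k - 1) * r) ** 2


# Area of the first small circle, used as the unit of comparison.
unit = math.pi * 1.0 ** 2

# Ratio of each ring's area to the first circle's area.
ratios = [ring_area(k) / unit for k in range(1, 6)]
print([round(ratio, 6) for ratio in ratios])
```

Up to floating-point rounding, the ratios come out as the odd numbers 1, 3, 5, 7, 9, matching the yarn lengths in the color sections.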

Similar calculations work for a semicircular shawl or even a quarter circle shawl. They also work for squares, different shaped triangles, and wedges of triangles. The yarn, which worked so well for a triangular shawl, also lends itself well to a variety of shapes. Some example shawls, from the Caterpillar Green Yarns website, are shown below.

Here is a photo of the shawl I designed. It looks a little less triangular, because of the way the increases curve.

There are so many things that can be done with this yarn. I wrote a pattern for it.  Of course, the makers of the yarn at Caterpillar Green Yarns will happily give suggestions. Then, there’s the community at Ravelry, who have found wonderful things to do with a yarn that stripes in a different way than we’ve all been used to.

There aren’t crochet projects yet, but who knows, maybe you’re the industrious crocheter to come up with an equally lovely project in your craft.

If you are a knitter, be sure to check out Ravelry, even if you’re not searching for this yarn. It’s where I get inspiration for almost all of my knitting projects, and when I’ve finished with them, it has been a wonderful place to display pictures. I only post about 70% of what I complete, but that percentage is getting better, especially as it gets easier to upload photos.

I hope you liked learning about this yarn. I’ve already made two lovely shawls and have a third set aside as a fall project. And of course, I shall only work on it after putting in a full day of work on my research and grading (wink).

Author’s bio: Shannon Paper-Negaard has a masters and is working toward a PhD, both in mathematics. She is studying dynamical systems at the University of Minnesota. She loves to knit and spin yarn, and she’ll talk your ear off about knitting, math, travel, chocolate, or her bicycle. Find her on Twitter at @ShannonPaper or on Ravelry as paperdolls.


## Steppin’ Up Challenge, Part 3

Welcome to the third and final part of the Steppin’ Up Challenge analysis! If you missed parts 1 and 2, I encourage you to check them out here and here. So, here we are, analyzing the results of the 20 competitors in this four-week challenge where we tracked active minutes and steps taken on a daily basis.

Last time we looked at the scatterplot of active minutes versus steps taken across all competitors. We looked for competitors who strayed far from the linear regression line, thereby having a higher or lower step/min count.

We looked closely at competitor P, who I renamed Powerhouse. We found some interesting conclusions. Our linear regression still provided fairly reasonable results. However, all of the conclusions I made that were based on my knowledge of her as a runner were totally debunked, because she didn’t run at all during this 4 week period. Oops! Perhaps I’ll have better luck analyzing my own data.

And I also promised that I would analyze my own data, because anything I could share about someone else’s data, I should also share about my own. So, what about competitor “N”?  N was dubbed Namaste because I know that competitor N does a lot of yoga and walking and not much else. Let’s take a look at my, I mean her, time series.

Here there is a very clear relationship between steps and minutes. Which is actually a little surprising. If Namaste is doing a lot of yoga, I would expect her activity minutes to increase, but for her steps to stay flat. However, these time series are very much in phase with one another. The peaks in Namaste’s steps correlate well to her activity times. Let’s take a look at her scatterplot.

Namaste has an average of only 70.4 steps per active minute. This is lower than walking speed, which is reasonable given her choice of exercise (yoga and occasional biking). I would bet that the days which are furthest from the linear regression on the lower right are the days that Namaste did yoga. The data points above the line are the days where she mostly walked everywhere. In fact, I would bet that the minutes and steps are well correlated because yoga days also require extra walking (to get to the studio). However, I only know that because Namaste is my data. I wouldn’t be able to intuit this if it weren’t my data.

On her, I mean my, fastest day, June 24th, she did 128 steps per active minute. Which is a little above average walking speed. Clearly, walking is fast enough for her. …I mean me. I don’t do anything faster than a walk. It’s a turtle’s life for me.

The only other competitor I really want to share with you is competitor M, who happens to be one of the winners of the whole competition. I’m going to show you his time series, and I want you to guess what happened:

Do you have a guess as to what caused the low numbers of steps and minutes for 4 days in late June/early July? To me, it was clear: competitor M got sick. I talked with him about it and yes, he overextended himself in week 1 and got sick during week 2. However, his exercise level was high enough that he made up for it and was one of two winners of the competition. So, sometimes, the data tells a clear story. But sometimes it doesn’t…

I’m not sure what the conclusion is here. Certainly, there are lots of interesting things I can learn about your life if I have access to daily data related to your lifestyle. Data sleuthing can intuit a fair number of things about your life and fitness level. However, with nothing but steps and activity time, there might not be enough information to make any real conclusions. Maybe if I had GPS data I could really get somewhere! But, with this information I can’t even tell you how fast Powerhouse can run. And I can’t confidently tell you what days Namaste did yoga. But I can make some basic observations about your life, as shown here.

Did I cross the line between comforting and creepy? Only you can be the judge. But you should know that FitBit or Jawbone or whatever your pedometer of choice is… they have this data. Your cellphone carrier might have it too. You may also be giving it to Niantic if you play Pokemon GO. And, of course, PRISM probably has it. The New York Times recently did a piece on the Chinese app, WeChat, and how it is changing the way business is done. …and the government gets all the data. In the US we tend to partition our data across companies so different companies each only have part of your data (like what I did here). And this usually keeps one entity from having all the data. But what would happen if such a monster app existed in the US?

## Math in the Media: August 2016

The internet has lots of great and terrible uses of math and mathematical visualizations! This is our opportunity to applaud the winners and be confused by the blunders. Here are a few of my favorites from August 2016:

## 1. The Gold Star goes to…

National Geographic for their article on severe lightning. I’m highlighting this article because it is a great use of statistics. First, they give a statistic about how often this particular area is affected by lightning.

…Goodman notes that Norway is not particularly prone to severe lightning. Satellite data from NASA’s Global Hydrology Research Center show that in an average year, southern Norway sees fewer than one lightning strike per square kilometer.

This is not, in and of itself, a useful number because I have no clue how many lightning strikes normally affect anything.  It’s like saying “6.5 people in the US died last year due to spiders” without giving context. It is useful to know that 33 people died in the US due to lightning. So if you aren’t afraid of lightning, then you probably shouldn’t be afraid of spiders. I’m so passionate about this problem that I wrote an article about it.  But, back to lightning in Norway!  National Geographic then tells us:

In contrast, the world’s biggest lightning hot spot, Venezuela’s Lake Maracaibo, gets struck more than 232 times per square kilometer in an average year – and endures nighttime thunderstorms 297 days of the year.

This is a wonderful use of comparative statistics. The fact is already interesting by itself AND it provides context to the statement above. Beautiful work, National Geographic!

## 2. Terrible use of “mathiness”

The graphic below makes me grumpy. This is a cute trick but there is no deep math in this image. There is no mysterious tangency between an ever-growing Fibonacci spiral and a circle. There is no deep connection between the circle and a Fibonacci spiral. This could have been done with any spiral that has a constant rate of radius growth and angular increase. The constant rotation that makes the edge of the spiral intersect the circle is just a relationship between a spiral and a circle. And a pretty trivial one at that. It’s very pretty, but as a graphic, it is not providing any new insight into the Fibonacci spiral. And the “radius” following the edge of the circle within the next Fibonacci rectangle is not particularly meaningful. So it fails my requirements of a good visualization. It’s like mathematical pandering. And it makes me sad.

And, in the words of a mathematician friend, “As far as I can tell, the point is that spirals look spirally.”

For comparison, here is a much better example of cool math visualization.

When I look at this, I learn something! When white balls move in straight lines, it’s possible to make it appear like a circle of circles is rolling around! That’s cool! Straight lines = visually spinning.

## 3. Math Cartoon of the Month

And finally, a lovely cartoon about the odds of being exactly who you are:

Did you have a favorite experience with math on the internet in August? Share it in the comments below! Until next time, have a mathy September!


## Steppin’ Up Challenge, Part 2

Welcome to part two of the Steppin Up Challenge analysis! If you missed part 1, I encourage you to check it out here.

Last time we looked at the scatterplot of everyone’s 4 week totals. But today I want to look more closely at a few individuals. The reason for doing this is to see exactly how much I can intuit about their lives from the Steppin’ Up data. In data science, there is a line between comforting (oh thanks Google Calendar for putting my flight information from Gmail onto my calendar!) and creepy (How on earth did Hulu know, based on my viewing preferences alone, that I am an unmarried 30-35 year old male?). I’m going to look around at a few interesting things and see what we can figure out. But first I have to focus my attention, because I don’t have time to crawl through each person’s data manually.

There are a couple of individuals who stand out for exceptionally high or exceptionally low steps per minute. These individuals can be found on the scatterplot by looking for the people who are the furthest from the linear regression line. To do this, imagine a line perpendicular to the linear regression line. We pick the competitors who are associated with the longest purple perpendicular line. In this case, P, G, and L are all similarly above the linear regression line. These three competitors all took more steps per minute of exercise than the average. Competitors N, A, and F are all similarly below the linear regression line. Thus these competitors have more active minutes logged than the average competitor, while having lower step counts. I’m going to choose competitors P and N to take a deeper dive. I’m choosing ‘P’ because I know she is a runner. Let’s call her ‘Powerhouse’. And I’m choosing ‘N’ because that one’s me. And it seems only fair that I analyze my data in public as much as I’m analyzing anyone else’s. I’ll call myself ‘Namaste’.
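The "longest perpendicular line" idea can be sketched in a few lines of Python. Everything numeric here is made up for illustration: the slope, the intercept, and the competitors X, Y, and Z are hypothetical, not the challenge data.

```python
import math


def signed_perpendicular_distance(x, y, m, b):
    """Signed distance from the point (x, y) to the line y = m*x + b.

    Positive means the point lies above the line, negative below.
    """
    return (y - (m * x + b)) / math.sqrt(m * m + 1)


# Hypothetical slope, intercept, and 4-week totals (NOT the challenge data).
m, b = 50.0, 250_000.0
competitors = {"X": (2000, 400_000), "Y": (3000, 380_000), "Z": (2500, 375_000)}

distances = {
    name: signed_perpendicular_distance(minutes, steps, m, b)
    for name, (minutes, steps) in competitors.items()
}

# List the competitors farthest from the line first.
for name, d in sorted(distances.items(), key=lambda item: abs(item[1]), reverse=True):
    print(name, round(d, 1))
```

One caveat: steps and minutes are in different units, so the perpendicular distance changes if an axis is rescaled. Ranking by the vertical gap (actual steps minus predicted steps) is a common alternative that avoids this.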

Across all days, steps and minutes combined, Powerhouse completed 246 steps per active minute. In contrast, Namaste only completed 145 steps per minute. If we subtract the 9124 we computed in the last article as the standard deduction, then Powerhouse has 84 steps/min and Namaste has 19 steps/min. From the last article we know that the group average was 49 steps/min. Thus, Powerhouse has almost twice the group average and Namaste has less than 1/2 the group average. Clearly some variations are possible!

Let’s take a look at Powerhouse’s time series of minutes and steps to try to understand more about her month.

It looks like Powerhouse had at least three days where she took a long run. The first day of the competition as well as two other days where her steps spike above 20,000 steps per day. However, it’s also possible there were other days where Powerhouse went for a run. For example, we can see that there are a few days in early July where Powerhouse had more than 15,000 steps for 4 days in a row. This is probably not just casual walking. There are probably some short runs in there, just based on the steps. However, there is a very interesting relationship between the minutes and the steps graphs.

Why are there some days where the number of minutes seems to peak (or valley) at the same time as her steps? Well, let’s take July 5th. Here is a low point in steps and minutes. Chances are good that Powerhouse took a rest day and only did the obligatory walking required by her normal day. However, what do you say about her peak days: June 27th (25458 steps, 214 min) and July 6th (22963 steps, 152 min)? Probably these are days where she did a lot of activity. We can figure out her steps/active minutes on those days by subtracting off the average steps that can’t be associated with exercise. But we don’t want to use the group average (9124) like we did above, we want to find Powerhouse’s personal average non-activity based steps. Personalization! It’s what makes things both comforting and/or creepy.

Powerhouse has a non-activity average of 8865 steps per day. Thus, we can determine that June 27th has 16593 exercise based steps over 214 min and July 6th has 14098 exercise based steps over 152 min. On these days she is hitting 77.5 steps/min and 92.75 steps/min. These are certainly underestimates of her steps/min when she is actually running. They are from data points which land on the upper right of the scatterplot, almost exactly on the linear regression line. Last time we calculated a few basic steps/min ratios for common activities (see below). From this quick analysis we can conclude that she is probably including some low intensity activity (like walking) on her highest activity days.
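Here is the same subtraction as a short Python sketch, using only the numbers quoted above. The function name `exercise_steps_per_minute` is mine.

```python
def exercise_steps_per_minute(total_steps, active_minutes, baseline_steps):
    """Steps per active minute after removing the non-activity baseline."""
    return (total_steps - baseline_steps) / active_minutes


BASELINE = 8865  # Powerhouse's non-activity average, in steps per day

# (total steps, active minutes) on her two peak days
peak_days = {"June 27": (25458, 214), "July 6": (22963, 152)}

for day, (steps, minutes) in peak_days.items():
    rate = exercise_steps_per_minute(steps, minutes, BASELINE)
    print(f"{day}: {rate:.2f} steps/min")
```

The result is sensitive to the baseline: every extra 1,000 steps assigned to "non-activity" lowers the June 27th rate by about 4.7 steps/min (1000 / 214).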

But, perhaps it’s possible to cherry pick a particular day where Powerhouse only went for a run. And perhaps we can figure out how fast she runs? Or at least get a lower bound on her speed. Let’s take the days which are furthest from her average. One of the days which is furthest from the linear regression line is on June 22nd. On this day, Powerhouse did 22 min of activity and went 16362 steps which translates to approximately 7497 exercise steps. When we look at this day on her time series, there is nothing particularly notable about this day. Clearly the relationship between steps and active minutes is not a simple one. So, the linear extrapolation we are making is probably not the most reliable method of determining running speed. If we translate Powerhouse’s numbers, she did 340 steps per exercise minute on June 22nd. Which means, based on this data, she ran 6 minute miles for almost 4 miles. My next step was to validate. How fast did Powerhouse normally run during this month? How close will I get to her actual run speed?
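The pace estimate can be reproduced with a few lines of Python. The 2000 steps per mile figure is the walking estimate from Part 1, and applying it to a run is an assumption. The function name `estimated_pace` is mine.

```python
STEPS_PER_MILE = 2000  # walking estimate from Part 1; an assumption for running


def estimated_pace(exercise_steps, active_minutes, steps_per_mile=STEPS_PER_MILE):
    """Return (miles covered, minutes per mile) implied by the step count."""
    miles = exercise_steps / steps_per_mile
    minutes_per_mile = active_minutes / miles
    return miles, minutes_per_mile


# June 22nd: 16362 total steps minus the 8865-step personal baseline, in 22 active minutes.
miles, pace = estimated_pace(16362 - 8865, 22)
print(f"{miles:.2f} miles at {pace:.2f} min/mile")
```

This works out to roughly 3.75 miles at just under 6 minutes per mile. The estimate is only as good as its two inputs, the baseline and the steps-per-mile figure, and neither was measured for that day.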

I spoke with Powerhouse and she told me something very unexpected! She told me that she didn’t run at all. Nada. Zip. Zilch. She also told me that she tore her MCL (a ligament on the inner side of her knee) a while back and she hasn’t been able to run. All activity in this challenge was walking, biking, and workout videos. Her job involves a lot of one on one conversations which are generally done while walking. The good news is that our computed step rate of 77.5 – 92.75 steps/min was definitely walking. And this implies that her highest stepping days mainly consisted of many hours of walking (because that’s what most days consist of). So, some of the conclusions were reasonable. However, my attempts to cherry pick a running speed were totally thwarted.

What a fabulous and terrible dilemma! At this point, I could definitely go back and edit this post to disguise and hide my previous attempt to determine her run speed. If this was a peer-reviewed article, I, almost certainly, would decline to share my false conclusions. But this isn’t peer reviewed! And I think there is something really valuable about seeing that even a professional can be led astray when she (or he) starts trying to get more information out of a data set than is appropriate. If I continue to slice and dice the data into smaller partitions, I’m likely to find all kinds of weird things that may or may not have anything to do with reality. Data Skeptic recently published a podcast with Chris Stucchio about multiple comparisons and p-hacking. So, I want to leave my erroneous analysis in the article, because this is a great problem in data mining. How far do you mine before you are just making stuff up?

In summary, I thought it could be really cool to figure out Powerhouse’s run speed. But, in fact, she didn’t run at all. My basic assumption about my predictive methods was wrong. Perhaps I found a day where she did a work-out video (lots of steps over a short period of time). Or perhaps there was some combination of exercise that just didn’t hit the 10 minute mark to make her FitBit count it as active minutes. Or perhaps there was some human error in reporting (which is also totally possible). Who knows! My original basis for inquiry (that she was running) was wrong. So many of my later conclusions were also wrong. Perhaps there is a moral here about understanding the context of the problem before diving into the math?

This brings up a tangential but interesting point: Data Science can only be creepy if it’s accurate. We’ve probably all received advertisements on social media that are totally off base. In this situation we are amused or annoyed but we are definitely not creeped out. It’s like I expect a computer algorithm to not be able to understand me. So when it can’t predict something about me, I’m not surprised. I expect that. But there are always human assumptions behind those algorithms. There’s someone back there fundamentally assuming that Powerhouse is running, even when she isn’t. They might assume that I’m single because I’m watching the Bachelorette, even though I’m not. What I’m trying to allude to is the idea that there can be metaphorical fingerprints on data science results, depending on how the model was built. And in this case, it brought me to very wrong conclusions about Powerhouse.

Fascinating!

Tune in next time when I divulge the details of my month and conclude the series of posts on this challenge.

## Steppin’ Up Challenge, Part 1

Last month, I participated in a team building “Steppin’ Up Challenge” at work. We all had pedometers and logged our daily step count for 4 weeks. We also included our number of active minutes. There were 20 individual competitors who signed up for the event. For the next few posts I’m going to dig around the data and share some of the most interesting mathematical findings in this competition.

Altogether, the 19 people in our competition walked 7.1 million steps in 4 weeks. That’s roughly 13,383 steps per person per day. Wow! Now, we’ve probably all heard the recommendation that each person should take 10,000 steps per day. Why is that? The Centers for Disease Control and Prevention actually recommends getting 150 minutes of activity a week. For walkers, this amounts to about 10,000 steps per day, speed depending. So, on average, our competitors walked more than the suggested number of steps. Good job to us! But the next question is: did we achieve the minutes of exercise goal as well?

As it turns out, we were completing an average of 607 active minutes per week, or about 10 hours per week. Definitely above the CDC’s recommended values. So, how is it possible that we exceeded the goal of 10,000 steps per day by only about 30%, yet logged roughly four times the goal of 150 active minutes per week?
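A quick Python sketch of the comparison, using the averages quoted above. It reports both the ratio to the goal and the percentage above the goal, since the two are easy to mix up. The function names are mine.

```python
STEPS_PER_PERSON_PER_DAY = 13_383  # group average from the challenge
ACTIVE_MINUTES_PER_WEEK = 607      # group weekly average

STEP_GOAL = 10_000   # steps per day
MINUTE_GOAL = 150    # active minutes per week


def ratio_to_goal(actual, goal):
    """How many times the goal was reached (1.0 means exactly on target)."""
    return actual / goal


def percent_over(actual, goal):
    """Percentage by which the actual value exceeds the goal."""
    return 100 * (actual - goal) / goal


for label, actual, goal in [
    ("steps", STEPS_PER_PERSON_PER_DAY, STEP_GOAL),
    ("minutes", ACTIVE_MINUTES_PER_WEEK, MINUTE_GOAL),
]:
    print(f"{label}: {ratio_to_goal(actual, goal):.2f}x the goal, "
          f"{percent_over(actual, goal):.0f}% over")
```

The steps come out at about 1.34 times the goal, while the active minutes come out at about 4 times the goal. That gap is the puzzle the rest of the post tries to explain.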

This is probably because we were doing things that were active, but not step generating. Consider bicycling. A friend gave me some rough math from his bike commute; he found that he got about 450 “steps” on his pedometer for each mile he biked. This is less than a quarter of what he would get if he actually walked the mile. But, then we also need to consider that he is taking less time to travel that mile. So a pedestrian might spend 20 minutes to walk 2000 steps (=1 mile) while a bicyclist could spend 6 minutes biking the same distance to get 450 steps. If we normalize these, the pedestrian gets 100 steps per minute while the bicyclist gets 75 steps per minute. What about runners? If a runner can run an eight minute mile, they are running 2000 steps in 8 min. Thus a runner does approximately 250 steps per minute. (edit: normal runners do not do 250 steps/min! Because a running stride is longer than a walking stride, it takes fewer steps to complete a mile. Perhaps a reasonable value would be 1500 steps/mile for an 8-10 minute mile. Thus an 8-10 minute mile could reasonably be between 170-180 steps/min.)
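The normalization is just steps per mile divided by minutes per mile. A minimal sketch, using only the rough figures quoted above (the function name is mine):

```python
def steps_per_minute(steps_per_mile, minutes_per_mile):
    """Normalize a step count over one mile to a per-minute rate."""
    return steps_per_mile / minutes_per_mile


# (steps per mile, minutes per mile) from the rough estimates in the text
scenarios = {
    "walker": (2000, 20),
    "cyclist": (450, 6),
    "runner, first estimate": (2000, 8),
    "runner, revised, 8 min mile": (1500, 8),
    "runner, revised, 10 min mile": (1500, 10),
}

for name, (steps, minutes) in scenarios.items():
    print(f"{name}: {steps_per_minute(steps, minutes):.1f} steps/min")
```

With 1500 steps per mile, the same arithmetic gives 187.5 steps/min for an 8 minute mile and 150 steps/min for a 10 minute mile. That brackets the 170-180 range mentioned in the edit.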

So, what was the average steps per minute of our competitors? Are we more like runners or like walkers?

The average steps per active minute for the competitors over the 4 weeks of our challenge was 165.9 steps per minute of exercise. Wait a second, this is a value that is WAY above the steps/min we just computed for pedestrians and bicyclists! Why could that be? One option is that we are all runners. But I know I’m not a runner and did no running during the competition. I also know of many others in the same boat. We didn’t get high steps because we are avid runners. So why is that value so high?

Well, this is probably because there are a certain number of steps you take in a day that are definitely not associated with activity. All the steps you earn from walking back and forth from the refrigerator to get another diet orange Fanta don’t count towards your active minute total. But how do we decide what counts as activity and what does not? For this challenge, we decided to use the definition of active minutes that FitBit uses:

“Active Minutes” are awarded after 10 minutes of continuous moderate-to-intense activity. This includes walking at a brisk pace:

• E.g. You walk briskly to work for 11 minutes = 11 active minutes
• E.g. You walk briskly to get lunch in 5 minutes = 0 active minutes
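
As a rough sketch, that rule can be written as a tiny function. This is only my reading of the definition quoted above, not FitBit’s actual algorithm; in particular, counting a bout of exactly 10 minutes is an assumption on my part.

```python
MIN_BOUT_MINUTES = 10  # assumed threshold: bouts shorter than this earn nothing


def active_minutes(bout_minutes):
    """Active minutes earned by one continuous bout of brisk activity."""
    return bout_minutes if bout_minutes >= MIN_BOUT_MINUTES else 0


def daily_active_minutes(bouts):
    """Total active minutes for a list of separate bouts in one day."""
    return sum(active_minutes(b) for b in bouts)


print(active_minutes(11))  # the brisk walk to work
print(active_minutes(5))   # the quick walk to get lunch
print(daily_active_minutes([11, 5, 30]))
```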

So, we should be able to partition the steps into activity-based steps and non-activity-based steps. That is, there must be some average number of steps that everyone takes every day that have nothing to do with the active exercise one does. This is a great question for a basic linear regression. With a linear regression, we are finding the best fit line of the data, with active minutes on the x-axis and total steps on the y-axis. We’ll get our regression results as an “m” and “b” as part of the y=mx + b equation for a line. The m tells us what our average steps/active minute was. And the b is the y-intercept of the line; in this situation it represents the number of steps that we all took regardless of activity. For our data, b = 255487 steps. Take a look at the graph below for a scatterplot of the 19 competitors as well as the best fit line.
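
If you want to try this kind of fit yourself, here is a minimal least-squares sketch in plain Python. The five data points are invented for illustration (they are not the competitors’ data), and `fit_line` is just a name I picked; any stats package would do the same job.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Made-up four-week totals for five hypothetical people
active_minutes = [1200, 1800, 2400, 3000, 3600]
total_steps = [310_000, 350_000, 370_000, 405_000, 430_000]

m, b = fit_line(active_minutes, total_steps)
print(f"m = {m:.1f} steps per active minute")
print(f"b = {b:.0f} steps regardless of activity")
```

The slope m plays the role of steps per active minute, and the intercept b is the step total a person would be predicted to have with zero active minutes.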

So if b = 255487 for the four weeks, this means we are, on average, taking 9124 steps per day that are not exercise related. For reference, the average American supposedly walks between 300 and 3000 steps per day. Clearly there is a selection bias in our competitors! The people who signed up for the Steppin’ challenge have active lives even when they are not exercising. (Well, except maybe contestant R, who appears to be a bit of an outlier.)
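
The conversion from the four-week intercept to a daily baseline is a single division, sketched here with the value of b from above:

```python
INTERCEPT_STEPS = 255_487  # b from the regression: steps over four weeks
DAYS = 4 * 7

baseline_per_day = INTERCEPT_STEPS / DAYS
print(f"{baseline_per_day:.1f} non-exercise steps per day")
```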

Now let’s get back to the steps per minute computation. We want to see if our competitors are more like bikers (75 steps per minute) or runners (250 steps per minute). We originally computed 165.9 steps per active minute, but this included the 9124 steps we took every day that had nothing to do with exercise. If we subtract that average from everyone and recompute the steps per active minute we get: 48.7 steps per active minute.
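
Here is a rough version of that subtraction using only the group averages quoted in this post. It is a sketch, not the per-person calculation behind the 48.7 figure, so it lands close to that number rather than exactly on it.

```python
AVG_STEPS_PER_DAY = 13_383      # group average from earlier in the post
BASELINE_STEPS_PER_DAY = 9_124  # non-exercise steps from the regression
ACTIVE_MINUTES_PER_WEEK = 607   # group average


def exercise_steps_per_active_minute(steps_per_day, baseline, minutes_per_week):
    """Steps per active minute after removing the everyday baseline."""
    exercise_steps_per_day = steps_per_day - baseline
    active_minutes_per_day = minutes_per_week / 7
    return exercise_steps_per_day / active_minutes_per_day


rate = exercise_steps_per_active_minute(
    AVG_STEPS_PER_DAY, BASELINE_STEPS_PER_DAY, ACTIVE_MINUTES_PER_WEEK
)
print(f"{rate:.1f} steps per active minute")
```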

That feels really low! (Also note, the m from the regression gives us the slope between active minutes and steps. In this case, m = 49 steps per active minute. This is pretty close to our gross approximation of the 48.7 steps per minute.) As mentioned earlier, it’s probable that we were all doing exercise that doesn’t generate a lot of steps. But, we are even lower than biking? How?

Well, pedometers are very bad at measuring activity that doesn’t involve walking motions. And I know from anecdotal conversations that contestant M lifts a lot of weights. Weight lifting is definitely physical activity even though you get very few steps per minute. And personally, I do a lot of CorePower yoga. Yoga is definitely low step count. But if my sweat is any indication, it’s a good workout! Based on my experience, I get about 200 steps per hour of yoga. That’s only 3.3 steps per minute. If a bunch of contestants were all doing a few hours of activity that involved only 200 steps/hour, then our global average would definitely drop below bicycling levels.
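
To see how quickly low-step exercise drags the average down, here is a small sketch of a time-weighted average. The 50/50 split between brisk walking and yoga is a made-up week purely for illustration; the rates are the rough ones used in this post.

```python
def blended_steps_per_minute(activities):
    """Time-weighted steps per minute for (rate, minutes) pairs."""
    total_steps = sum(rate * minutes for rate, minutes in activities)
    total_minutes = sum(minutes for _, minutes in activities)
    return total_steps / total_minutes


WALKING_RATE = 100     # steps per minute, from the pedestrian estimate
YOGA_RATE = 200 / 60   # about 3.3 steps per minute
CYCLING_RATE = 75      # steps per minute, for comparison

# Hypothetical week: 300 active minutes of walking, 300 of yoga
week = [(WALKING_RATE, 300), (YOGA_RATE, 300)]
blended = blended_steps_per_minute(week)

print(f"{blended:.1f} steps per active minute")
print("Below cycling level:", blended < CYCLING_RATE)
```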

Based on this initial analysis, I’m concluding that this group does a lot of walking in their day-to-day lives. In contrast, the exercise completed by this group (on the whole) involves a lot of low-step activities. However, there are some exceptions to this rule… But that’s a topic for next time!

## Pinpoint the moment when…

Graphs and data are always there for us, especially if we want to look back in time and find the exact moment that something changed. Today I’m presenting a bit of a poem of images about time series.

• Here’s the exact moment that Robert De Niro “gave up” on his career.

• In episode #576 of NPR’s podcast Planet Money, they highlight the moment when women stopped coding.

• Even the Onion gets in on the action. They claim that FB has a new feature which tells you exactly when things are over.

Time series can be intuitive and very easy to understand. They are a great tool for looking at how something has changed over time. In fact, the Six Sigma methodology has built a whole theory around controlling processes with time series graphs.

Time series are great for helping pinpoint the moment when…