For more fine-grained analysis of our FIT file
data, it’d
be great to be able to investigate it interactively. Other languages such
as Ruby, Raku and Python have a built-in REPL.1 Yet Perl
doesn’t.2 But help is at hand! PDL (the Perl Data
Language) is designed to be used
interactively and thus has a REPL we can use to manipulate and investigate
our activity data.3
Getting set up
Before we can use PDL, we’ll have to install it:
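Assuming the cpanminus client is available, installing from CPAN looks like this (a sketch; any CPAN client will do):

```shell
# install PDL and its dependencies from CPAN (this can take a while)
cpanm PDL
```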
After it has finished installing (this can take a while), you’ll be able to
start the perlDL shell with the pdl command:
perlDL shell v1.357
PDL comes with ABSOLUTELY NO WARRANTY. For details, see the file
'COPYING' in the PDL distribution. This is free software and you
are welcome to redistribute it under certain conditions, see
the same file for details.
ReadLines, NiceSlice, MultiLines enabled
Reading PDL/default.perldlrc...
Found docs database /home/vagrant/perl5/perlbrew/perls/perl-5.38.3/lib/site_perl/5.38.3/x86_64-linux/PDL/pdldoc.db
Type 'help' for online help
Type 'demo' for online demos
Loaded PDL v2.100 (supports bad values)
Note: AutoLoader not enabled ('use PDL::AutoLoader' recommended)
pdl>
To exit the pdl shell, enter Ctrl-D at the prompt and you’ll be returned
to your terminal.
Cleaning up to continue
To manipulate the data in the pdl shell, we want to be able to call
individual routines from the geo-fit-plot-data.pl script. This way we can
use the arrays that some of the routines return to initialise PDL data
objects.
It’s easier to manipulate the data if we get ourselves a bit more organised
first.4 In other words, we need to extract the
routines into a module, which will make calling the code we created
earlier
from within pdl much easier.
Before we create a module, we need to do some refactoring. One thing that’s
been bothering me is the way the plot_activity_data() subroutine also
parses and manipulates date/time data. This routine should be focused on
plotting data, not on massaging its requirements into the correct form.
Munging the date/time data is something that should happen in its own
routine. This way we encapsulate the concepts and abstract away the
details. Another way of saying this is that the plotting routine shouldn’t
“know” how to manipulate date/time information to do its job.
To this end, I’ve moved the time extraction code into a routine called
get_time_data():
sub get_time_data {
    my @activity_data = @_;

    # get the epoch time for the first point in the time data
    my @timestamps = map { $_->{'timestamp'} } @activity_data;
    my $first_epoch_time = $date_parser->parse_datetime($timestamps[0])->epoch;

    # convert timestamp data to elapsed minutes from start of activity
    my @times = map {
        my $dt = $date_parser->parse_datetime($_);
        my $epoch_time = $dt->epoch;
        my $elapsed_time = ($epoch_time - $first_epoch_time)/60;
        $elapsed_time;
    } @timestamps;

    return @times;
}
The main change here in comparison to the previous version of the
code
is that we pass the activity data as an argument to get_time_data(),
returning the @times array to the calling code.
The code creating the date string used in the plot title now also resides in
its own function:
sub get_date {
    my @activity_data = @_;

    # determine date from timestamp data
    my @timestamps = map { $_->{'timestamp'} } @activity_data;
    my $dt = $date_parser->parse_datetime($timestamps[0]);
    my $date = $dt->strftime("%Y-%m-%d");

    return $date;
}
Again, we pass the @activity_data array to the function, which then returns the $date string that we use in the plot title.
Both of these routines use the $date_parser object, which I’ve extracted
into a constant in the main script scope:
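It's the DateTime::Format::Strptime object used by both routines:

```perl
use DateTime::Format::Strptime;

# parser for the FIT file's UTC timestamps, shared by
# get_time_data() and get_date()
my $date_parser = DateTime::Format::Strptime->new(
    pattern   => "%Y-%m-%dT%H:%M:%SZ",
    time_zone => 'UTC',
);
```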
It’s time to make our module. I’m not going to create the full Perl module
infrastructure here, as it’s not
necessary for our current goal. I want to import a module called
Geo::FIT::Utils and then access the functions that it
provides.5 Thus, in an appropriate project
directory, we need to create a file called lib/Geo/FIT/Utils.pm as well as
its associated path:
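Creating the path and file can be done with the usual tools (e.g. `mkdir -p lib/Geo/FIT` followed by creating `Utils.pm` inside it). A sketch of the scaffolding, reconstructed to match the line numbers discussed next:

```perl
package Geo::FIT::Utils;           # line 1: the module's name

use Exporter 5.57 'import';        # line 3: gives us an import() method

# lines 6-12: the functions this module offers to calling code
our @EXPORT_OK = qw(
    extract_activity_data
    show_activity_statistics
    plot_activity_data
    get_time_data
    num_parts
);

1;                                 # line 14: a module must return a true value
```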
We now have the scaffolding of a module that (at least theoretically)
exports the functionality we need.
Line 1 specifies the name of the module. Note that the module's name must
match its path on the filesystem, which is why we created the file
Geo/FIT/Utils.pm.
We import the Exporter module (line
3) so that we can specify the functions to export. This is the @EXPORT_OK
array’s purpose (lines 6-12).
Finally, we end the file on line 14 with the code 1;. This line is
necessary so that importing the package (which in this case is also a
module) returns a true value. The value 1 is synonymous with Boolean true
in Perl, which is why it's best practice to end module files with 1;.
Copying all the code except the main() routine from geo-fit-plot-data.pl
into Utils.pm, we end up with this:
package Geo::FIT::Utils;

use strict;
use warnings;

use Exporter 5.57 'import';

use Geo::FIT;
use Scalar::Util qw(reftype);
use List::Util qw(max sum);
use Chart::Gnuplot;
use DateTime::Format::Strptime;

my $date_parser = DateTime::Format::Strptime->new(
    pattern   => "%Y-%m-%dT%H:%M:%SZ",
    time_zone => 'UTC',
);

sub extract_activity_data {
    my $fit = Geo::FIT->new();
    $fit->file("2025-05-08-07-58-33.fit");
    $fit->open or die $fit->error;

    my $record_callback = sub {
        my ($self, $descriptor, $values) = @_;
        my @all_field_names = $self->fields_list($descriptor);

        my %event_data;
        for my $field_name (@all_field_names) {
            my $field_value = $self->field_value($field_name, $descriptor, $values);
            if ($field_value =~ /[a-zA-Z]/) {
                $event_data{$field_name} = $field_value;
            }
        }

        return \%event_data;
    };

    $fit->data_message_callback_by_name('record', $record_callback)
        or die $fit->error;

    my @header_things = $fit->fetch_header;

    my $event_data;
    my @activity_data;
    do {
        $event_data = $fit->fetch;
        my $reftype = reftype $event_data;
        if (defined $reftype && $reftype eq 'HASH'
                && defined %$event_data{'timestamp'}) {
            push @activity_data, $event_data;
        }
    } while ($event_data);

    $fit->close;

    return @activity_data;
}

# extract and return the numerical parts of an array of FIT data values
sub num_parts {
    my $field_name = shift;
    my @activity_data = @_;

    return map { (split ' ', $_->{$field_name})[0] } @activity_data;
}

# return the average of an array of numbers
sub avg {
    my @array = @_;

    return (sum @array) / (scalar @array);
}

sub show_activity_statistics {
    my @activity_data = @_;

    print "Found ", scalar @activity_data, " entries in FIT file\n";
    my $available_fields = join ", ", sort keys %{$activity_data[0]};
    print "Available fields: $available_fields\n";

    my $total_distance_m = (split ' ', ${$activity_data[-1]}{'distance'})[0];
    my $total_distance = $total_distance_m/1000;
    print "Total distance: $total_distance km\n";

    my @speeds = num_parts('speed', @activity_data);
    my $maximum_speed = max @speeds;
    my $maximum_speed_km = $maximum_speed*3.6;
    print "Maximum speed: $maximum_speed m/s = $maximum_speed_km km/h\n";

    my $average_speed = avg(@speeds);
    my $average_speed_km = sprintf("%0.2f", $average_speed*3.6);
    $average_speed = sprintf("%0.2f", $average_speed);
    print "Average speed: $average_speed m/s = $average_speed_km km/h\n";

    my @powers = num_parts('power', @activity_data);
    my $maximum_power = max @powers;
    print "Maximum power: $maximum_power W\n";

    my $average_power = avg(@powers);
    $average_power = sprintf("%0.2f", $average_power);
    print "Average power: $average_power W\n";

    my @heart_rates = num_parts('heart_rate', @activity_data);
    my $maximum_heart_rate = max @heart_rates;
    print "Maximum heart rate: $maximum_heart_rate bpm\n";

    my $average_heart_rate = avg(@heart_rates);
    $average_heart_rate = sprintf("%0.2f", $average_heart_rate);
    print "Average heart rate: $average_heart_rate bpm\n";
}

sub plot_activity_data {
    my @activity_data = @_;

    # extract data to plot from full activity data
    my @times = get_time_data(@activity_data);
    my @heart_rates = num_parts('heart_rate', @activity_data);
    my @powers = num_parts('power', @activity_data);

    # plot data
    my $date = get_date(@activity_data);
    my $chart = Chart::Gnuplot->new(
        output   => "watopia-figure-8-heart-rate-and-power.png",
        title    => "Figure 8 in Watopia on $date: heart rate and power over time",
        xlabel   => "Elapsed time (min)",
        ylabel   => "Heart rate (bpm)",
        terminal => "png size 1024, 768",
        xtics => {
            incr => 5,
        },
        ytics => {
            mirror => "off",
        },
        y2label => 'Power (W)',
        y2range => [0, 1100],
        y2tics => {
            incr => 100,
        },
    );

    my $heart_rate_ds = Chart::Gnuplot::DataSet->new(
        xdata => \@times,
        ydata => \@heart_rates,
        style => "lines",
    );

    my $power_ds = Chart::Gnuplot::DataSet->new(
        xdata => \@times,
        ydata => \@powers,
        style => "lines",
        axes  => "x1y2",
    );

    $chart->plot2d($power_ds, $heart_rate_ds);
}

sub get_time_data {
    my @activity_data = @_;

    # get the epoch time for the first point in the time data
    my @timestamps = map { $_->{'timestamp'} } @activity_data;
    my $first_epoch_time = $date_parser->parse_datetime($timestamps[0])->epoch;

    # convert timestamp data to elapsed minutes from start of activity
    my @times = map {
        my $dt = $date_parser->parse_datetime($_);
        my $epoch_time = $dt->epoch;
        my $elapsed_time = ($epoch_time - $first_epoch_time)/60;
        $elapsed_time;
    } @timestamps;

    return @times;
}

sub get_date {
    my @activity_data = @_;

    # determine date from timestamp data
    my @timestamps = map { $_->{'timestamp'} } @activity_data;
    my $dt = $date_parser->parse_datetime($timestamps[0]);
    my $date = $dt->strftime("%Y-%m-%d");

    return $date;
}

our @EXPORT_OK = qw(
    extract_activity_data
    show_activity_statistics
    plot_activity_data
    get_time_data
    num_parts
);

1;
… which is what we had before, but put into a nice package for easier
use.
One upside to having put all this code into a module is that the
geo-fit-plot-data.pl script is now much simpler:
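With the heavy lifting moved into Geo::FIT::Utils, a minimal sketch of the remaining script looks like this:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use lib 'lib';
use Geo::FIT::Utils qw(extract_activity_data show_activity_statistics plot_activity_data);

sub main {
    my @activity_data = extract_activity_data();
    show_activity_statistics(@activity_data);
    plot_activity_data(@activity_data);
}

main();
```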
We’re now ready to investigate our power and heart rate data interactively!
Start pdl and enter use lib 'lib' at the pdl> prompt so that it can
find our new module:6
$ pdl
perlDL shell v1.357
PDL comes with ABSOLUTELY NO WARRANTY. For details, see the file
'COPYING' in the PDL distribution. This is free software and you
are welcome to redistribute it under certain conditions, see
the same file for details.
ReadLines, NiceSlice, MultiLines enabled
Reading PDL/default.perldlrc...
Found docs database /home/vagrant/perl5/perlbrew/perls/perl-5.38.3/lib/site_perl/5.38.3/x86_64-linux/PDL/pdldoc.db
Type 'help' for online help
Type 'demo' for online demos
Loaded PDL v2.100 (supports bad values)
Note: AutoLoader not enabled ('use PDL::AutoLoader' recommended)
pdl> use lib 'lib'
Now import the functions we wish to use:
pdl> use Geo::FIT::Utils qw(extract_activity_data get_time_data num_parts)
Since we need the activity data from the FIT file to pass to the other
routines, we grab it and put it into a variable:
pdl> @activity_data = extract_activity_data
We also need to load the time data:
pdl> @times = get_time_data(@activity_data)
which we can then read into a PDL array:
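A sketch of that step, using the pdl() constructor on a reference to the @times array:

```
pdl> $time = pdl \@times
```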
With the time data in a PDL array, we can manipulate it more easily. For
instance, we can display elements of the array with the PDL print
statement in combination with the slice() method. The following code
shows the last five elements of the $time array:
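With NiceSlice enabled, negative indices count from the end of the array, so the last five elements can be shown like this (a sketch; the values printed depend on the ride data):

```
pdl> print $time(-5:-1)
```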
Eyeballing the graph above, we can see that the second sprint occurred
between approximately 47 and 48 minutes of elapsed time. We know that the
arrays of time and power data all have the same length. Thus, if we find
out the indices of the $time array between these times, we can use them to
select the corresponding power data. To get array indices for known data
values, we use the PDL which
command:
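For instance, assuming the power data has also been read into an ndarray, the query might look like this (variable names chosen to match the text):

```
pdl> @powers = num_parts('power', @activity_data)
pdl> $power = pdl \@powers
pdl> $indices = which(($time >= 47) & ($time <= 48))
```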
Using the max() method on this output gives us the maximum power:
pdl> print $power($indices)->max
932
In other words, the maximum power for the second sprint was 932 W. Not as
good as the first sprint (which achieved 1023 W), but I was getting
tired by this stage.
The same procedure allows us to find the maximum power for the first sprint
with PDL. Again, eyeballing the graph above, we can see that the peak for
the first sprint occurred between 24 and 26 minutes. Constructing the query
in PDL, we have
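A sketch of those queries, assuming the heart rate data is read into $heart_rate the same way as $power; the first query should recover the 1023 W first-sprint peak mentioned above, while the exact broadened time window for the heart rate query (47 to 49 minutes here) is an assumption:

```
pdl> $indices = which(($time >= 24) & ($time <= 26))
pdl> print $power($indices)->max
pdl> @heart_rates = num_parts('heart_rate', @activity_data)
pdl> $heart_rate = pdl \@heart_rates
pdl> $hr_indices = which(($time >= 47) & ($time <= 49))
pdl> print $heart_rate($hr_indices)->max
```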
i.e. 165 bpm, which matches the value that we found
earlier.
Note that I broadened the range of times to search over heart rate data here
because its peak occurred a bit after the power peak for the second sprint.
Looking forward
Where to from here? Well, we could extend this code to handle processing
multiple FIT files. This would allow us to find trends over many activities
and longer periods. Perhaps there are other data sources that one could
combine with longer trends. For instance, if one has access to weight data
over time, then it’d be possible to work out things like power-to-weight
ratios. Maybe looking at power and heart rate trends over a longer time can
identify things such as overtraining. I’m not a sport scientist, so I don’t
know how to go about that, yet it’s a possibility. Since we’ve got
fine-grained, per-ride data, if we can combine this with longer-term
analysis, there are probably many more interesting tidbits hiding in there
that we can look at and think about.
Open question
One thing I haven’t been able to work out is where the calorie information
is. As far as I can tell, Zwift calculates how many calories were burned
during a given ride. Also, if one uploads the FIT file to a service such as
Strava, then it too shows calories burned and the value is the same. This
would imply that Strava is only displaying a value stored in the FIT file.
So where is the calorie value in the FIT data? I’ve not been able to find
it in the data messages that Geo::FIT reads, so I’ve no idea what’s going
on there.
Conclusion
What have we learned? We’ve found out how to read, analyse and plot data
from Garmin FIT files all by using Perl modules. Also, we’ve learned how to
investigate the data interactively by using the PDL shell. Cool!
One main takeaway that might not be obvious is that you don’t really need
online services such as Strava. You should now have the tools to process,
analyse and visualise data from your own FIT files. With Geo::FIT,
Chart::Gnuplot and a bit of programming, you can glue together the
components to provide much of the same (and in some cases, more)
functionality yourself.
I wish you lots of fun when playing around with FIT data!
Support
If you liked this post and want to see more, please
buy me a coffee!