A New Faster Algorithm for Gregorian Date Conversion


…how "Century-February-Padding" beats traditional year slicing.

3 November 2025

This is the first of a series of articles in which I outline some computer algorithms that I have developed for faster date conversion in the Gregorian calendar.

Most computer systems store dates as a single integer counting days or seconds since an epoch such as 1970-01-01 (UNIX time). Converting that integer to a Year, Month and Day is a fundamental calculation made in almost all programming languages and standard libraries.

In 2021, Cassio Neri and Lorenz Schneider developed the Neri‑Schneider algorithm, which is, to my knowledge, the fastest publicly documented algorithm for performing the above calculation. It has since been implemented in several prominent libraries, as well as in the Linux Kernel. This seems to have achieved most of the remaining possible speed gains over previous algorithms, but it is fun to see what is left to improve.

In this first article, I outline a technique that benchmarks around 2–10% faster than Neri‑Schneider across the computers that I currently have access to. I will show how it works step by step, although you can jump straight to the algorithm if you prefer. Benchmark results are provided, but you can also grab the benchmark code on GitHub to test it yourself. Improvements to the inverse function are presented, and finally I note some prior work from others that is similar to this algorithm.

The 2nd article will introduce a potentially new fast technique for completely eliminating integer overflow in date conversions; and finally the 3rd article will outline a special-case 64-bit only algorithm that performs around 20% faster than Neri‑Schneider on the M4 MacBook Pro processor.

Comparison Algorithm (Year Only)

The speedup from this new algorithm comes exclusively from the part that calculates the Year, so I will first focus on that calculation in isolation.

The technique for year determination in many date libraries, including C++ Boost and the Neri‑Schneider algorithm, is roughly as follows:

Given:   days = Days since epoch, where "0000-01-01" is zero   —   Then:

  1. cent = days * 4 / 146097
  2. dcen = days - cent * 146097 / 4
  3. ycen = dcen * 4 / 1461
  4. year = cent * 100 + ycen

Approximate number of CPU cycles: 19

Note:

  • 146097 is the number of days per 400-year cycle.
  • 1461 is the number of days per 4-year cycle.
  • All division signs on this page are integer division (rounded down).
  • Expensive operations (highlighted in red in the original listing) are multiplications and divisions, which usually take around 3–4 times longer to compute than basic operations such as addition/subtraction or multiplication/division by 2^N.
  • This is a simplification of what most date libraries actually do to perform date conversion, as they usually have to also compute the Day and Month at the same time. Usually that is achieved by using year 0001 as the epoch, and shifting by 306 days to model an interim year beginning on 1 March. The overall approach is the same though, including multiplying the number of centuries by 100. This is outlined further below.
  • The Neri‑Schneider algorithm uses modulus instead of multiplication on line 3, but this is not the source of the main speedup from that algorithm, which mainly speeds up the calculation of the day-of-year and month/day.

If this formula is unfamiliar to you, then this article by Howard Hinnant might help shed some light on it.
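As a minimal sketch, the four steps of the comparison algorithm could be written in C++ as below. The function name `year_reference` and the choice of `std::uint32_t` are illustrative assumptions, not taken from Boost or Neri‑Schneider. The input counts days from 0000-01-01, as in the listing, and is restricted so that `days * 4` stays within 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Year-only conversion by century slicing (the comparison algorithm).
// days: day count where 0000-01-01 is zero (proleptic Gregorian).
std::uint32_t year_reference(std::uint32_t days) {
    assert(days < (1u << 30));  // keep days * 4 inside 32 bits
    const std::uint32_t cent = days * 4 / 146097;         // whole centuries elapsed
    const std::uint32_t dcen = days - cent * 146097 / 4;  // day offset within that century
    const std::uint32_t ycen = dcen * 4 / 1461;           // year within that century
    return cent * 100 + ycen;
}
```

For example, `year_reference(730485)` should give 2000, since 730485 is exactly five 400-year cycles of 146097 days.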

Faster Algorithm (Year Only)

Given:   days = Days since epoch, where "0000-01-01" is zero   —   Then:

  1. cent = days * 4 / 146097
  2. days += cent - cent / 4
  3. year = days * 4 / 1461

Approximate number of CPU cycles: 13.

This is both simpler and faster than the comparison algorithm, containing one fewer line, and two fewer "expensive" operations.

Note: It turns out this idea is not novel; however the way it will be adapted to the full Day/Month/Year may be.
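The three-step year calculation can be sketched in C++ as follows. The name `year_padded` and the 32-bit unsigned type are assumptions made for illustration. This sketch counts days from 0000-01-01, the same convention as the comparison algorithm, for which the three steps yield the calendar year directly; the input is bounded so the padded value can still be multiplied by 4 without overflow.

```cpp
#include <cassert>
#include <cstdint>

// Year-only conversion using century padding.
// days: day count where 0000-01-01 is zero (proleptic Gregorian).
std::uint32_t year_padded(std::uint32_t days) {
    assert(days < (1u << 30) - 30000);  // leave room for the padding before * 4
    const std::uint32_t cent = days * 4 / 146097;  // whole centuries elapsed
    days += cent - cent / 4;                       // one fake leap day per skipped Feb 29
    return days * 4 / 1461;                        // Julian-style year split
}
```

As a worked case, day 730485 (2000-01-01) gives cent = 20, a padded count of 730485 + 20 - 5 = 730500, and 730500 * 4 / 1461 = 2000.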

How it Works

The first line is the same as before. The calculation slices the timeline into 100-year "centuries":

  cent = days * 4 / 146097

The above is equivalent to: floor(days / 36524.25), where 36524.25 is the average number of days per 100 years.

The next line is where things are different. Normally, Gregorian date calculations attempt to continue slicing down into progressively narrower periods of time, accounting for the 400/100-year cycle, then the 4-year cycle, and then recombine the results. Instead, this new approach pads the entire timeline with an extra fake February 29 for every 100th year which is not divisible by 400. In doing so, the new interim date then completely aligns with the Julian calendar leap year rule:

  days += cent - cent / 4

This line adds the equivalent number of days that would have elapsed in a Julian calendar to come to the same date as the Gregorian calendar.

That is, in the Julian calendar, leap years occur every 4 years without exception, which is very easy to deal with mathematically. The Gregorian calendar omits a leap year every 100 years, except every 400 years, when the leap year is kept. By adding 1 every 100 years, and subtracting 1 every 400 years, we can efficiently apply that mapping.

The final line is almost the same as line 3 of the comparison algorithm:

  year = days * 4 / 1461

The above is equivalent to: floor(days / 365.25), where 365.25 is the average number of days per year in the Julian calendar.

That's it. The year variable now holds the correct Gregorian year for the given "rata die" (day-count).

Application to full Day/Month/Year

While it's interesting to calculate the Year quickly, the important question is whether Day, Month and Year can all be calculated quickly together, as that is what most date libraries do for performance reasons.

I will show step by step how existing algorithms can be transformed into a new faster form.
The below pseudocode example starts by modifying the Boost C++ algorithm with this new technique.

Old Way (e.g. Boost)

  1. days += EPOCH_SHIFT + 306
  2. cent = (days * 4 + 3) / 146097
  3. dcen = days - (146097 * cent) / 4
  4. ycen = (dcen * 4 + 3) / 1461
  5. yday = dcen - ycen * 1461 / 4
  6. year = 100 * cent + ycen + (month < 3)

New Way (Version 1) C++ Example

  1. days += EPOCH_SHIFT + 306
  2. cent = (days * 4 + 3) / 146097
  3. juld = days + cent - cent / 4
  4. year = (juld * 4 + 3) / 1461
  5. yday = juld - year * 1461 / 4
  6. year += (month < 3)

Note: EPOCH_SHIFT is a constant that changes the epoch from whatever the system natively uses (such as 1970-01-01) to 0001-01-01. In this specific example the number would be 719162, but if a different epoch is used, a different constant is required.

The lines that have substantially changed are shown in red/green diff colours.

This change seems to speed up the Boost algorithm by around 7–8%, but there are more improvements that can be made.
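A runnable sketch of Version 1 might look like the following. The listing does not show how month and day are derived from yday, so this sketch assumes the widely used 153-day formula for that step; the `Date` struct and the name `to_date_v1` are also illustrative assumptions. It takes days since 1970-01-01 and is only intended for dates from 0000-03-01 onwards.

```cpp
#include <cstdint>

struct Date { std::int32_t year; std::uint32_t month; std::uint32_t day; };

// unix_days: days since 1970-01-01 (valid from 0000-03-01 onwards in this sketch).
Date to_date_v1(std::int32_t unix_days) {
    constexpr std::uint32_t EPOCH_SHIFT = 719162;  // 1970-01-01 -> 0001-01-01
    const std::uint32_t days = static_cast<std::uint32_t>(unix_days) + EPOCH_SHIFT + 306;
    const std::uint32_t cent = (days * 4 + 3) / 146097;
    const std::uint32_t juld = days + cent - cent / 4;   // pad the skipped leap days
    std::uint32_t year = (juld * 4 + 3) / 1461;
    const std::uint32_t yday = juld - year * 1461 / 4;   // 0 = 1 March
    // Assumed month/day step (not part of the listing): the common 153-day formula.
    const std::uint32_t mp = (5 * yday + 2) / 153;       // 0 = March ... 11 = February
    const std::uint32_t day = yday - (153 * mp + 2) / 5 + 1;
    const std::uint32_t month = mp < 10 ? mp + 3 : mp - 9;
    year += (month < 3);                                 // January/February belong to the next year
    return {static_cast<std::int32_t>(year), month, day};
}
```

Calling `to_date_v1(0)` should return 1970-01-01, and `to_date_v1(-25509)` followed by `to_date_v1(-25508)` should step from 1900-02-28 straight to 1900-03-01, which exercises the padded century boundary.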

There is repetition in the algorithm shown in the previous section, being the term (foo * 4 + 3).
Due to the way the new formula works, we can take advantage of this, and avoid multiplying and dividing by 4 more times than needed.

New Way (Version 2) C++ Example

  1. days += EPOCH_SHIFT + 306
  2. qday = days * 4 + 3
  3. cent = qday / 146097
  4. qjul = qday + cent * 4 - (cent & ~3)
  5. year = qjul / 1461
  6. yday = (qjul - 1461 * year) / 4
  7. year += (month < 3)

Again, lines with substantial changes are highlighted in green.

This change seems to provide another ~6% speed-up.

Note:

  • The term cent & ~3 on line 4 is a bitwise operation that sets the lowest 2 bits of cent to zero in one CPU cycle.
    It is equal to:  floor(cent / 4) * 4.
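The year and day-of-year part of Version 2 can be sketched as below, together with a small check of the mask identity from the note. The names `YearDay`, `split_v2` and `mask_matches_division` are hypothetical, and the input is assumed to have already been shifted so that 0000-03-01 is day zero.

```cpp
#include <cstdint>

struct YearDay { std::uint32_t year; std::uint32_t yday; };

// days: day count where 0000-03-01 is zero (after the EPOCH_SHIFT + 306 step).
// The returned year is the interim year that starts on 1 March.
YearDay split_v2(std::uint32_t days) {
    const std::uint32_t qday = days * 4 + 3;                    // quarter-day units
    const std::uint32_t cent = qday / 146097;
    const std::uint32_t qjul = qday + cent * 4 - (cent & ~3u);  // padded, still in quarter-days
    const std::uint32_t year = qjul / 1461;
    const std::uint32_t yday = (qjul - 1461 * year) / 4;        // 0 = 1 March
    return {year, yday};
}

// Clearing the two lowest bits rounds down to a multiple of 4.
bool mask_matches_division(std::uint32_t cent) {
    return (cent & ~3u) == cent / 4 * 4;
}
```

For the UNIX epoch, the shifted input is 719468, and `split_v2(719468)` should give interim year 1969 with yday 306, that is, 1 January of 1970.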

Full Algorithm

Finally, we will complete the algorithm using the fast day/month calculation as used by Neri‑Schneider.
The calculation of yday is also modified to use their technique of modulus instead of multiplication and subtraction. I do not know why modulus performs better in this context, but I will reach out to the authors of that algorithm and update this post once it is clear.
Line 4 is also adjusted to subtract before adding. For reasons I don't know, changing the order of these terms seems to have a non-trivial impact on the speed, despite being identical mathematically and having the same ultimate overflow characteristics. This change slows down the performance on my MacBook M4 Pro, however it speeds up most other platforms.

New Way (with Neri‑Schneider EAF) C++ Example

  1. days += EPOCH_SHIFT + 306
  2. qday = days * 4 + 3
  3. cent = qday / 146097
  4. qjul = qday - (cent & ~3) + cent * 4
  5. year = qjul / 1461
  6. yday = (qjul % 1461) / 4
  7. N = yday * 2141 + 197913
  8. M = N / 65536
  9. D = N % 65536 / 2141
  10. bump = (yday >= 306)
  11. day = D + 1
  12. month = bump ? M - 12 : M
  13. year += bump

Note:

  • The variables N, D, M are using a scaled-up version of the more common formula as noted in the comments. Scaling it up and storing the numerator allows the CPU pipeline to parallelise the next two lines, as they are no longer dependent on each other. This is explained well by Neri‑Schneider.
  • The number 65536 is equal to 2^16, so division and modulus by this number compile to single-cycle bit operations.
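Put together, the thirteen steps might be written in C++ roughly as follows. This is a sketch rather than the benchmarked implementation: the `Date` struct, the function name `to_date` and the use of 32-bit unsigned arithmetic are assumptions, and the function is only intended for dates from 0000-03-01 up to the point where `days * 4 + 3` would overflow.

```cpp
#include <cstdint>

struct Date { std::int32_t year; std::uint32_t month; std::uint32_t day; };

// unix_days: days since 1970-01-01.
Date to_date(std::int32_t unix_days) {
    constexpr std::uint32_t EPOCH_SHIFT = 719162;  // 1970-01-01 -> 0001-01-01
    const std::uint32_t days = static_cast<std::uint32_t>(unix_days) + EPOCH_SHIFT + 306;
    const std::uint32_t qday = days * 4 + 3;
    const std::uint32_t cent = qday / 146097;
    const std::uint32_t qjul = qday - (cent & ~3u) + cent * 4;  // century padding in quarter-days
    const std::uint32_t year = qjul / 1461;
    const std::uint32_t yday = (qjul % 1461) / 4;               // 0 = 1 March
    const std::uint32_t N = yday * 2141 + 197913;               // scaled month/day numerator
    const std::uint32_t M = N / 65536;                          // month, 3..14
    const std::uint32_t D = N % 65536 / 2141;                   // zero-based day of month
    const bool bump = yday >= 306;                              // January or February
    return {static_cast<std::int32_t>(year + bump), bump ? M - 12 : M, D + 1};
}
```

As a usage example, `to_date(19782)` should return 2024-02-29 and `to_date(11017)` should return 2000-03-01.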

Improvements to the Inverse Function

The inverse of the problem is where Year, Month and Day are given, and the return value is the day-count ("rata die").

Once again the fastest published variant seems to be the Neri-Schneider version. Interestingly, their inverse algorithm is basically the direct inverse of the main algorithm presented in this article.

With some small tweaks, we can make the function simultaneously around 4% faster, and fully cover the 32-bit or 64-bit output range, rather than only ~25% of the range.

The changes to the general approach for the 32-bit case are shown below, with notable changes highlighted in red/green diff colours:

Given:   year, month, day   —   Then to calculate  rata die:

Neri-Schneider

  1. const S = 82
  2. const YEAR_SHIFT = 400 * S
  3. const RATA_SHIFT = 719468 + 146097 * S
  4. bump = month <= 2
  5. year += YEAR_SHIFT - bump
  6. cent = year / 100
  7. M = bump ? month + 12 : month
  8. y_days = year * 1461 / 4 - cent + cent / 4
  9. m_days = (979 * M - 2919) / 32
  10. d_days = day - 1
  11. rata_die = y_days + m_days + d_days - RATA_SHIFT

New (Faster & Overflow Safe)

  1. const S = 14700
  2. const YEAR_SHIFT = 400 * S
  3. const RATA_SHIFT = 719468 + 146097 * S + 1
  4. bump = month <= 2
  5. year += YEAR_SHIFT - bump
  6. cent = year / 100
  7. phase = bump ? 8829 : -2919
  8. y_days = year * 365 + year / 4 - cent + cent / 4
  9. m_days = (979 * month + phase) / 32
  10. rata_die = y_days + m_days + day - RATA_SHIFT

Note:

  • The avoidance of overflow is achieved on line 8 by changing  “year * 1461 / 4”  to  “year * 365 + year / 4”.
  • The YEAR_SHIFT comprises 14700 * 400 years, as that is the smallest multiple of 400 years which exceeds 2^31 days.
  • Other changes are just micro-optimisations to save a few cycles.

Given that achieving full bit-range coverage comes at the cost of only 1 CPU cycle, this seems a good tradeoff for most use cases.
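A possible C++ rendering of the faster inverse is sketched below. It assumes the year offset is 400 * S and the day offset is 719468 + 146097 * S + 1, relies on unsigned 32-bit wraparound for all intermediate values, and converts back to a signed result at the end (a modular conversion, guaranteed since C++20 and conventional on earlier compilers). The function name `to_rata_die` is hypothetical.

```cpp
#include <cstdint>

// Returns days since 1970-01-01 for a proleptic Gregorian date.
std::int32_t to_rata_die(std::int32_t year, std::uint32_t month, std::uint32_t day) {
    constexpr std::uint32_t S = 14700;
    constexpr std::uint32_t YEAR_SHIFT = 400 * S;                  // offset in years
    constexpr std::uint32_t RATA_SHIFT = 719468 + 146097 * S + 1;  // offset in days
    const std::uint32_t bump = month <= 2;                         // January or February
    const std::uint32_t y = static_cast<std::uint32_t>(year) + YEAR_SHIFT - bump;
    const std::uint32_t cent = y / 100;
    // 8829 = 979 * 12 - 2919, so the month + 12 adjustment is folded into the constant.
    const std::uint32_t phase = bump ? 8829u : static_cast<std::uint32_t>(-2919);
    const std::uint32_t y_days = y * 365 + y / 4 - cent + cent / 4;
    const std::uint32_t m_days = (979 * month + phase) / 32;
    return static_cast<std::int32_t>(y_days + m_days + day - RATA_SHIFT);
}
```

For instance, `to_rata_die(1970, 1, 1)` should return 0 and `to_rata_die(2000, 3, 1)` should return 11017.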

Full benchmark results are not shown here as this is not the main focus of this article.

Side-Note on Month / Day Determination

When learning about fast date algorithms, one of the most surprising aspects is that the Month and Day can be determined from a linear equation. At first, the Gregorian calendar seems very messy: February is bizarre, and the months seem to fluctuate between 30 and 31 days in length almost at random.

Various authors have shown how the year is much more regular when February is treated as the last month of the year, as it was originally designed in the ancient Roman Calendar. This is sometimes demonstrated through a table of numbers or line-chart, however I have found that displaying the months like the following helps see the regularity at a glance:

Days     31   30   31   30   31   31   30   31    30   31   31   28/29
Numeral  I    II   III  IV   V    VI   VII  VIII  IX   X    XI   XII
Month    MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT   NOV  DEC  JAN  FEB

While I haven't seen any indication that the Romans actually ever laid out their calendar this way, one can imagine that the native use of Roman numerals might cause a Roman to think of it laid out something like this.

It can be seen that the spacing of the short months is regular. The occasional double-long-month pair breaks the alternating pattern, but these themselves are also equidistant.

To a person who natively thinks in Roman numerals, remembering that the short months are II, VII and XII, along with IV and IX, would be much easier than the way we modern folks have to memorise it.

Finally, January and February being a standalone pair help demonstrate why it was probably intuitive for the Romans to begin their political/military year in January, despite March being the start of the cultural year.
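To illustrate how a single linear expression can reproduce the month lengths in a March-based year, here is a small sketch using the commonly seen formula (153 * m + 2) / 5. This is an assumed stand-in chosen for readability; it is not the scaled 2141 / 65536 form used in the full algorithm above. The function names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Days elapsed before month index mp, where 0 = March ... 11 = February.
std::uint32_t days_before_month(std::uint32_t mp) {
    return (153 * mp + 2) / 5;
}

// Lengths of March..January, recovered as differences of the linear formula.
std::array<std::uint32_t, 11> month_lengths() {
    std::array<std::uint32_t, 11> len{};
    for (std::uint32_t mp = 0; mp < 11; ++mp)
        len[mp] = days_before_month(mp + 1) - days_before_month(mp);
    return len;
}
```

The differences come out as 31, 30, 31, 30, 31, 31, 30, 31, 30, 31, 31, matching March through January; February simply takes whatever remains of the year.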

Prior Similar Algorithms

While researching for this article, I found that the general approach taken here is not entirely new. Padding the number-line with an extra fake February every 100 years (except every 400 years) seems to have been used possibly first in the Hatcher (1985) algorithm. Rather than show their entire algorithm, I will present the line-item which is most similar, adjusted to fit the pattern used in this article (their paper uses float arithmetic). The similar line is the adjustment applied to get the Julian date equivalent from the Gregorian date:

Hatcher Way

  1. juld = days + (3 * cent + 3) / 4

Faster Way

  1. juld = days + cent - cent / 4

The approach used by Hatcher is slightly slower, but not as slow as it seems at first as the multiplication by 3 can be implemented in two CPU cycles as cent + cent + cent rather than a usual multiplication which might take three CPU cycles.
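The two adjustment terms compute the same value for every non-negative century count, which can be checked with a brute-force sketch like the one below. The function names are illustrative; writing cent = 4q + r with r in 0..3 shows that both sides reduce to 3q + r.

```cpp
#include <cstdint>

// Hatcher-style padding term, in integer form.
std::uint32_t pad_hatcher(std::uint32_t cent) { return (3 * cent + 3) / 4; }

// Padding term used in this article.
std::uint32_t pad_faster(std::uint32_t cent) { return cent - cent / 4; }

// Compares the two forms for every century count from 0 to max_cent.
bool pads_agree(std::uint32_t max_cent) {
    for (std::uint32_t c = 0; c <= max_cent; ++c)
        if (pad_hatcher(c) != pad_faster(c)) return false;
    return true;
}
```

For example, both forms give 15 for cent = 19 and for cent = 20, reflecting that the year 2000 adds no extra padding.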

This approach may have been overlooked due to the paper using float arithmetic throughout, as well as the slow technique used in that paper to calculate the day and month: operations such as MOD 12 and MOD 30.6 may not compile very efficiently.

An algorithm from Peter Baum (1998, 2017, 2020) uses this faster technique directly.
See section 3.4.2 with the calculation: B = A - INT(A / 4)

The Baum algorithm is shown in the Neri‑Schneider benchmarks as slower than Boost, but that is perhaps due to other parts of the algorithm not being as aggressively optimised. Apart from adopting the fast day/month calculation from Neri‑Schneider, using cent & ~3 and avoiding extra divisions and multiplications by 4 seems to be the necessary step to make it faster overall.

Overflow Behaviour

Many date conversion algorithms overflow close to "MAX_INT" / 4, due to the early multiplication by 4. As shown earlier, the inverse function can easily be adapted to avoid overflow altogether with almost no performance impact, but the same is not true in the forward direction.

This new algorithm pads the number line with three additional days every 400 years, so it will overflow around 0.002% earlier than Boost and Neri-Schneider.

The difference is negligible, so for any new libraries, or applications that don't need to support the same range, changing the date library could potentially be justified. For published libraries that are being used in the wild, where 3rd party developers are already using and expecting the range to stay stable, hands may be tied.

Fast techniques to eliminate overflow completely will be outlined in the 2nd article of this series.

Applying "EAF" (ala Neri‑Schneider)

So far, we have only adopted the fast day/month calculation as used by Neri‑Schneider. We have not yet adopted the fast year/day-of-year calculation they use, and for a good reason.

Adopting that logic would look like the below change:

This Algorithm

  1. year = qjul / 1461
  2. yday = qjul % 1461 / 4

Neri‑Schneider

  1. P_2 = 2939745 * qjul
  2. year = P_2 / 4294967296
  3. yday = (P_2 % 4294967296) / 2939745 / 4

At first this transformation may look very unusual; in fact, it looks like it should be slower.

The reason it is quite fast is that 4294967296 is equal to 2^32, and therefore the division and modulus by that number on lines 2 and 3 just take the high and low 32 bits of the prior product, an operation that is very fast; on 32-bit computers it is even a "free" step, as the prior multiplication simply places the high and low parts into adjacent 32-bit registers.

The magic number 2939745 is chosen by Neri‑Schneider as:  2939745 / 2^32 ≈ 1 / 1461, and therefore these steps manually implement the division required to calculate year.

In fact, the compiler would ordinarily transform the code year = qjul / 1461 into something very much like that. The whole reason to write it out manually this way is that processors can parallelise part of the work, as line 3 does not depend at all on line 2, and the division can start right away. This division is also compiled as a multiplication and bit-shift, so the subsequent division by 4 gets merged with that bit-shift, resulting in fewer steps overall.

Adopting this change does speed up the entire algorithm even more. However, while the calculation of year is accurate across the full 32-bit range, the calculation of yday is only accurate until qjul reaches the value 28,825,529. This makes the overall algorithm only work for a range of 19,730 years. This is not a problem in the Neri‑Schneider algorithm, as they apply this calculation within a selected 100-year "century", whereas in the new formula it is applied directly against the timeline.

Some special-case performance sensitive applications might benefit from adopting this if it is known with certainty that no dates will ever be needed outside this range.
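The two variants can be compared side by side with a sketch like the following. The names `split_plain` and `split_eaf` are hypothetical, and the 64-bit product is an assumption about how one might express P_2 portably. The comparison is only meant to hold for qjul well inside the limited range described above.

```cpp
#include <cstdint>

struct YearDay { std::uint32_t year; std::uint32_t yday; };

// Plain form: one division and one modulus by 1461.
YearDay split_plain(std::uint32_t qjul) {
    return {qjul / 1461, qjul % 1461 / 4};
}

// Multiply-high form: 2939745 / 2^32 approximates 1 / 1461.
YearDay split_eaf(std::uint32_t qjul) {
    const std::uint64_t p2 = std::uint64_t{2939745} * qjul;
    const std::uint32_t year = static_cast<std::uint32_t>(p2 >> 32);  // P_2 / 2^32
    const std::uint32_t low = static_cast<std::uint32_t>(p2);         // P_2 % 2^32
    return {year, low / 2939745 / 4};
}
```

For the UNIX epoch value qjul = 2877935, both functions should return year 1969 and yday 306.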

Benchmarks

The benchmark code is a direct fork of the benchmarks provided by Neri-Schneider (GitHub link). The "speedup" is calculated by first subtracting the "scan" performance, which removes the overhead of calling the function from the benchmarks, as is also done by Neri‑Schneider.

At the time of writing this article I have only had time to test the algorithm on the following systems; however, I plan to add more soon. If you identify a system where it is slower, feel free to let me know and I will update this page.

Lower numbers are faster.

Platform                                   Scan    Neri‑Schneider   New Way   Approx. Speedup

Lenovo IdeaPad Slim 5 (Windows 11 24H2)    16827   38562            36675     8.7%
  Snapdragon X126100 @ 2.956 GHz
  Compiler: MSVC 19.44 (ARM64)

Dell Inspiron 13-5378 (Windows 10 22H2)    ?*      140416           127349    > 9.3%
  Intel Core i3-7100 @ 2.4 GHz
  Compiler: MSVC 19.44 (x64)

MacBook Pro 2016 (MacOS 12.7.6)            10375   70340            65541     8.0%
  Intel Core i7 @ 2.7 GHz
  Compiler: Apple clang 14.0.0

MacBook Pro 2024 (MacOS 15.6.1)            3651    21750            21203     3.0%
  Apple M4 Pro
  Compiler: Apple clang 17.0.0

MacBook Pro 2020 (MacOS 14.3.1)            5616    60874            59477     2.5%
  Intel Core i5 @ 2 GHz
  Compiler: Apple clang 15.0.0

These numbers were obtained by running the benchmarks three times and taking the middle result.

*I forgot to record the "scan" result for Dell Inspiron 13-5378, and will update this page once re-tested. Any non-zero result for "scan" will make the speedup even greater.

Next Steps

As noted previously, in the next articles I will outline:

  1. A potentially new fast technique for completely eliminating integer overflow in date conversions
  2. A special-case 64-bit only algorithm that performs around 20% faster than Neri‑Schneider on the M4 MacBook Pro processor.
  3. Some other completely different techniques that I tried out while developing this version, some that provide speedups in certain cases, others that are just interesting.

If you are particularly interested, you can have a preview of some of those ideas by viewing implementations in the benchmark codebase, although they are currently light on comments. See the test-cases: eras and eras_bitapprox as well as fast64bit.

If you found this interesting, you should:

  1. Follow me on X to get notified when I publish more date and algorithm related articles.
  2. Check out my article on a new Martian timekeeping system.
