By Aaron J. Becker
 October 17, 2025
    
 Many comments on my post on the most evenly-split names pointed out that ordering names by their “gender neutrality” isn’t intuitive or natural, and that the list of names that emerges from prioritizing even gender balance isn’t what most people would expect. So this is a quick follow-up that will visualize the baby names that have been common for both sexes in the Social Security Administration’s baby name data.
 This post was published directly from the Python notebook used to conduct the data analysis. Feel free to skip over the code blocks if that’s not your thing.
   Please reach out if you want to try running this code yourself. I’m happy to share the notebook and helper libraries, but I haven’t yet put them anywhere that makes sharing easy.
  # this code block imports and configures the tools that we'll be using throughout the notebook
from pathlib import Path
import sys, os
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import transforms
# matplotlib theming
from aquarel import load_theme
theme = load_theme("gruvbox_light")
theme.apply()
# customize font in visualizations
mpl.rcParams['font.family'] = 'monospace'
mpl.rcParams['font.size'] = 14
# always show all rows in polars dataframes
pl.Config.set_tbl_rows(-1)
# show all list items when displaying list columns in polars
pl.Config.set_fmt_table_cell_list_len(-1)
# configure python import path for helper modules
sys.path.append(Path(os.getcwd()).parent.parent.parent.as_posix())
# import the helper module
from fetch_ssa_data.fetch_ssa_data import get_cleaned_national_df
  the plotting function
 This plotting function was modified from my previous post on the most evenly-split names. The most significant change is that I scaled the height of each bar according to the min and max total births present in the chart to make it easier to determine which names are the most popular overall.
  def plot_gender_neutral_names(df: pl.DataFrame, 
                              start_year: int = 2000,
                              end_year: int = 2024,
                              total_births_threshold: int = 5000,
                              title=None,
                              sort_label = '% girls (desc.)',
                              min_barheight: float = 1.0,
                              max_barheight: float = 5.0) -> plt.Figure:
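    """
    Create a horizontal stacked bar chart showing gender distribution for names.
    Bar heights scale linearly with total births, bounded by
    min_barheight and max_barheight.
    This code was generated with partial assistance from Claude.
    
    Parameters:
    -----------
    df : pl.DataFrame
        DataFrame with columns: name, pct_female, pct_male, female_births, male_births, total_births
        Data must already be sorted and filtered.
    start_year, end_year : int
        Year range shown in the default title and the footnote
    total_births_threshold : int
        Per-gender birth threshold quoted in the subtitle and footnote (display only)
    title : str
        Chart title (defaults to a title built from start_year and end_year)
    sort_label : str
        Description of the sort order, shown in the subtitle
    min_barheight : float
        Bar height for the least popular name in the chart (default 1.0)
    max_barheight : float
        Bar height for the most popular name in the chart (default 5.0)
    
    Returns:
    --------
    fig : matplotlib.figure.Figure
    """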
    male_color = '#2b7fff'    
    female_color = '#fb2c36'
    # Extract data
    names = df['name'].to_list()
    pct_female = df['pct_female'].to_list()
    pct_male = df['pct_male'].to_list()
    female_births = df['female_births'].to_list()
    male_births = df['male_births'].to_list()
    total_births = df['total_births'].to_list()
    
    n_plots = len(names)
    # base figure size on # of series to plot
    figsize = (10, 4 + n_plots * 0.5)
    
    # Calculate bar heights proportional to total births    
    total_births = np.array(total_births)
    # Normalize to range [min_barheight, max_barheight]
    births_min, births_max = total_births.min(), total_births.max()
    if births_max > births_min:
        normalized_heights = min_barheight + (total_births - births_min) / (births_max - births_min) * (max_barheight - min_barheight)
    else:
        normalized_heights = np.full(len(total_births), min_barheight)
    
    # Calculate y positions (cumulative sum with spacing)
    spacing = 0.2  # Space between bars
    bar_heights = normalized_heights
    y_positions = []
    current_y = 0
    for height in bar_heights:
        y_positions.append(current_y + height / 2)
        current_y += height + spacing
    
    if title is None:
        title = f"common unisex baby names in the U.S.,\n{start_year}-{end_year}"
    # Create figure
    fig, ax = plt.subplots(figsize=figsize)
    
    # Create stacked horizontal bars
    # Female portion (left side, from 0 to pct_female)
    ax.barh(y_positions, pct_female, height=bar_heights, color=female_color, label='Female', alpha=1)
    
    # Male portion (right side, from pct_female to 1.0)
    ax.barh(y_positions, pct_male, left=pct_female, height=bar_heights, color=male_color, label='Male', alpha=1)
    
    # Add vertical grid lines at every 10% interval
    grid_color = "#a6a09b"
    for x_pos in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:        
        ax.axvline(x=x_pos, color=grid_color, linewidth=1, alpha=1, zorder=0)
    # heavier line at 50%
    ax.axvline(x=0.5, color=grid_color, linewidth=2, alpha=1, zorder=0)
    
    # Add text labels for birth counts
    girls_labeled = False
    boys_labeled = False
    for i, (name, y_pos, pf, pm, fb, mb) in enumerate(zip(names, y_positions, pct_female, pct_male, female_births, male_births)):
        # only show "girls" or "boys" once
        # Female births label (left-aligned, near left edge of bar)
        if i == 0 or not girls_labeled:
            ax.text(0.02, y_pos, f'{fb:,} girls', ha='left', va='center', 
                    color='white', fontsize=12, fontweight='bold')
            girls_labeled = True
        else:
            ax.text(0.02, y_pos, f'{fb:,}', ha='left', va='center', 
                    color='white', fontsize=12, fontweight='bold')
        
        # Male births label (right-aligned, near right edge of bar)
        if i == 0 or not boys_labeled:
            ax.text(0.98, y_pos, f'{mb:,} boys', ha='right', va='center', 
                    color='white', fontsize=12, fontweight='bold')
            boys_labeled = True
        else:
            ax.text(0.98, y_pos, f'{mb:,}', ha='right', va='center', 
                    color='white', fontsize=12, fontweight='bold')
        
    # Set y-axis labels (names)
    ax.set_ylim(current_y, -spacing)
    ax.set_yticks(y_positions)
    ax.set_yticklabels(names, fontsize=18, fontweight='bold')
    
    # Set x-axis limits and labels
    ax.set_xlim(0, 1)
    ax.set_xticks([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
    ax.set_xticklabels(['0%', '10%', '20%', '30%', '40%', '50%', '60%', '70%', '80%', '90%', '100%'])
    ax.set_xlabel('Gender Balance', fontsize=16)
    
    # note that bottommost row is added as title, since it's easier to add height and work upward
    subsubtitle = 'bar height is proportional to total births'
    text = ax.set_title(subsubtitle, fontsize=14, style='italic', loc='left', ha='left')
    ex = text.get_window_extent()
    x, y = text.get_position()
    # use the public accessor instead of the private _transform attribute
    t = transforms.offset_copy(text.get_transform(), y=ex.height + 5, units='dots')
    # add the subtitle one row above the bottommost line
    subtitle = f"min. {total_births_threshold:,} births in both genders, sorted by {sort_label}"
    text = ax.text(x, y, subtitle, fontsize=16, ha='left', transform=t)
    # add the main title one row above the subtitle
    ex = text.get_window_extent()
    x, y = text.get_position()
    t = transforms.offset_copy(text.get_transform(), y=ex.height, units='dots')
    ax.text(x, y, title, fontsize=22, fontweight='bold', transform=t, va='bottom')
    
    # Remove spines (except bottom)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    
    # Add note at bottom
    note_text = f"Data source: Social Security Administration\nIncludes names that were given to >= {total_births_threshold:,} boys AND girls between {start_year} and {end_year}.\nBar height represents relative popularity (bounded linear scale)."
    fig.text(0.01, -0.01, note_text, ha='left', fontsize=11)
    # add nameplay.org link
    fig.text(0.98, -0.01, "NamePlay.org", ha='right', fontsize=22, fontweight='bold')
    
    plt.tight_layout()
    plt.show()
    return fig
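 The bar-height scaling used inside the plotting function can be checked in isolation. The following is a minimal sketch in plain Python (no NumPy or matplotlib), assuming the same min-max mapping and cumulative stacking as `plot_gender_neutral_names`. The helper names `scale_heights` and `stack_centers` and the sample totals are made up for illustration.

```python
def scale_heights(totals, min_h=1.0, max_h=5.0):
    """Map totals linearly onto [min_h, max_h] (min-max normalization)."""
    lo, hi = min(totals), max(totals)
    if hi == lo:
        # all totals equal: fall back to the minimum height,
        # mirroring the else-branch in the plotting function
        return [min_h for _ in totals]
    return [min_h + (t - lo) / (hi - lo) * (max_h - min_h) for t in totals]


def stack_centers(heights, spacing=0.2):
    """Return the y-coordinate of each bar's center when bars are stacked with gaps."""
    centers, current_y = [], 0.0
    for h in heights:
        centers.append(current_y + h / 2)
        current_y += h + spacing
    return centers


# hypothetical totals, not taken from the SSA data
totals = [50_000, 100_000, 250_000]
heights = scale_heights(totals)
centers = stack_centers(heights)
print(heights)  # [1.0, 2.0, 5.0]
print(centers)
```

 Because the smallest name is pinned to `min_h` rather than to zero, the heights show rank and rough magnitude rather than exact ratios. That matches the post's point that the numeric labels, not the bar heights, are what to read for precise comparisons.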
  data processing code
 After loading the SSA baby name data, we’ll drop rows from before our chosen start year, and then keep only the names that reach at least min_births_either_gender births among boys and, separately, among girls.
 We’ll use this function for two different views of the data:
 - Names with at least 25,000 births in both genders since 1940
- Names with at least 10,000 births in both genders since 2000
 def get_common_unisex_names(min_births_either_gender: int = 25_000, start_year: int = 1940) -> pl.DataFrame:
    name_data = get_cleaned_national_df()
    # start year filtering
    name_data = name_data.filter(pl.col('year') >= start_year)
    # filter to names with at least X births in both genders
    gn_names = name_data.group_by('name').agg(
        pl.when(pl.col('gender') == 'M').then(pl.col('births')).otherwise(0).sum().alias('male_births'),
        pl.when(pl.col('gender') == 'F').then(pl.col('births')).otherwise(0).sum().alias('female_births'),
        pl.col('births').sum().alias('total_births')
    ).filter(pl.col('male_births') >= min_births_either_gender, pl.col('female_births') >= min_births_either_gender)
    # calculate percentages for plotting
    gn_names = gn_names.with_columns(
        (pl.col('male_births') / pl.col('total_births')).alias('pct_male'),
        (pl.col('female_births') / pl.col('total_births')).alias('pct_female')
    ).sort('pct_female', descending=True)
    return gn_names
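 To make the filtering rule concrete, here is a rough plain-Python sketch of the same logic on a handful of made-up rows. It assumes the cleaned data has `name`, `gender`, `year`, and `births` fields, as the polars code above implies. The function name `common_unisex_names` and the sample rows are hypothetical, and this is not the helper library's code.

```python
from collections import defaultdict


def common_unisex_names(rows, min_births=25_000, start_year=1940):
    """rows: iterable of (name, gender, year, births) tuples."""
    counts = defaultdict(lambda: {"M": 0, "F": 0})
    for name, gender, year, births in rows:
        # keep only rows from start_year onward with a recognized gender code
        if year >= start_year and gender in ("M", "F"):
            counts[name][gender] += births

    result = []
    for name, c in counts.items():
        # a name qualifies only if BOTH genders clear the threshold
        if c["M"] >= min_births and c["F"] >= min_births:
            total = c["M"] + c["F"]
            result.append({
                "name": name,
                "male_births": c["M"],
                "female_births": c["F"],
                "total_births": total,
                "pct_male": c["M"] / total,
                "pct_female": c["F"] / total,
            })
    # same ordering as the chart: highest share of girls first
    result.sort(key=lambda r: r["pct_female"], reverse=True)
    return result


# made-up rows for illustration only
sample_rows = [
    ("Riley", "F", 2010, 30), ("Riley", "M", 2010, 20),
    ("Ryan", "M", 2010, 90), ("Ryan", "F", 2010, 10),
    ("Emma", "F", 2010, 100),      # no boys: excluded
    ("Riley", "F", 1990, 1000),    # before start_year: ignored
]
unisex = common_unisex_names(sample_rows, min_births=10, start_year=2000)
print([r["name"] for r in unisex])  # ['Riley', 'Ryan']
```

 Note that a heavily one-sided name such as the toy "Ryan" still qualifies, because the rule tests absolute counts per gender rather than the balance between them.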
  common unisex names since 1940
 I’ll be the first to admit that the 25,000 births in both genders threshold is arbitrary. I did fiddle with it a bit to find a threshold that was both a round number and a reasonable cut-off for the number of names that would be included.
 Here’s the code to fetch data and plot the first view:
  min_births_either_gender = 25_000
start_year = 1940
gn_names = get_common_unisex_names(min_births_either_gender=min_births_either_gender, start_year=start_year)
print(f'there are {len(gn_names)} names with at least {min_births_either_gender} births in both genders since {start_year}')
# plot the data
fig = plot_gender_neutral_names(gn_names, total_births_threshold=min_births_either_gender, start_year=start_year)
   there are 40 names with at least 25000 births in both genders since 1940
   
 I like that this list includes names like “Ryan” that breached the 25,000 births threshold while remaining overwhelmingly masculine. I’m curious to hear what people think about scaling the bar heights to total popularity. It’s not precise enough to enable direct numerical comparisons between bars, but that’s what the numerical labels on the bars are for. I do feel like it makes it easier to see, at a glance, which names are more popular overall.
 common unisex names since 2000
 Another unscientific admission: I chose 10,000 births as the threshold for the more recent view in order to have a similar number of names in each chart. If we reduced the threshold in proportion to the reduction in the number of years of data, we’d end up with many more names, because unisex names have become more popular over time:
  proportional_threshold = (2024-2000) / (2024 - 1940) * 25_000
print(f'A duration-proportional reduction in the threshold would be: {proportional_threshold:,.0f}')
n_names = len(get_common_unisex_names(min_births_either_gender=proportional_threshold, start_year=2000))
print(f'There would be {n_names} names with at least {proportional_threshold:,.0f} births in both genders since 2000')
   A duration-proportional reduction in the threshold would be: 7,143
There would be 57 names with at least 7,143 births in both genders since 2000
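 The arithmetic behind that figure is a simple ratio of window lengths. As a small sketch (the function name `scaled_threshold` is hypothetical), the snippet below reproduces the 7,143 figure. It also shows the result under the alternative assumption that both endpoint years are counted, so that 2000 through 2024 spans 25 data years rather than 24.

```python
def scaled_threshold(base_threshold, base_start, new_start, end_year, inclusive=False):
    """Scale a births threshold by the ratio of the two time windows."""
    extra = 1 if inclusive else 0  # count the end year as a full data year?
    new_span = end_year - new_start + extra
    base_span = end_year - base_start + extra
    return base_threshold * new_span / base_span


# same calculation as above: 24 / 84 * 25,000
print(f"{scaled_threshold(25_000, 1940, 2000, 2024):,.0f}")  # 7,143
# counting both endpoints: 25 / 85 * 25,000
print(f"{scaled_threshold(25_000, 1940, 2000, 2024, inclusive=True):,.0f}")  # 7,353
```

 Both values sit well below the 10,000 threshold used for the chart.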
  Here’s the code to process the data and create the more recent chart:
  min_births_either_gender = 10_000
start_year = 2000
gn_names = get_common_unisex_names(min_births_either_gender=min_births_either_gender, start_year=start_year)
print(f'there are {len(gn_names)} names with at least {min_births_either_gender} births in both genders since {start_year}')
# plot the data
fig = plot_gender_neutral_names(gn_names, total_births_threshold=min_births_either_gender, start_year=start_year)
   there are 39 names with at least 10000 births in both genders since 2000
   
 conclusions
 I’m not going to go in-depth with the analysis for this post, mostly because I’m pressed for time. The main observation I draw from the charts above is that many more names make the cut with a low percentage of girls’ births than with a low percentage of boys’ births. In other words, it seems to be much more common to give girls popular boys’ names than the reverse.
 The best way to reach me is via email or on reddit. I’m always happy to chat about the data and the charts, or about names in general.