Visualizing the most common unisex names in the US


By Aaron J. Becker

October 17, 2025

Many comments on my post on the most evenly-split names pointed out that ordering names by their “gender neutrality” isn’t intuitive or natural, and that the list of names that emerges from prioritizing even gender balance isn’t what most people would expect. So this is a quick follow-up that will visualize the baby names that have been common for both sexes in the Social Security Administration’s baby name data.

This post was published directly from the Python notebook used to conduct the data analysis. Feel free to skip over the code blocks if that’s not your thing.


Please reach out if you want to try running this code yourself. I’m happy to share the notebook and helper libraries, but I haven’t yet put them in a place where I can share them easily.

# this code block imports and configures the tools that we'll be using throughout the notebook
from pathlib import Path
import sys, os
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib import transforms
# matplotlib theming
from aquarel import load_theme
theme = load_theme("gruvbox_light")
theme.apply()
# customize font in visualizations
mpl.rcParams['font.family'] = 'monospace'
mpl.rcParams['font.size'] = 14
# always show all rows in polars dataframes
pl.Config.set_tbl_rows(-1)
# show all list items when displaying list columns in polars
pl.Config.set_fmt_table_cell_list_len(-1)
# configure python import path for helper modules
sys.path.append(Path(os.getcwd()).parent.parent.parent.as_posix())
# import the helper module
from fetch_ssa_data.fetch_ssa_data import get_cleaned_national_df

the plotting function

This plotting function was modified from my previous post on the most evenly-split names. The most significant change is that I scaled the height of each bar according to the min and max total births present in the chart to make it easier to determine which names are the most popular overall.

def plot_gender_neutral_names(df: pl.DataFrame, start_year: int = 2000, end_year: int = 2024,
                              total_births_threshold: int = 5000, title=None,
                              sort_label='% girls (desc.)', min_barheight: float = 1.0,
                              max_barheight: float = 5.0) -> plt.Figure:
    """
    Create a horizontal stacked bar chart showing gender distribution for names.
    Bar heights are scaled linearly between min_barheight and max_barheight
    according to total births.
    This code was generated with partial assistance from Claude.
    Parameters:
    -----------
    df : pl.DataFrame
        DataFrame with columns: name, pct_female, pct_male, female_births, male_births, total_births
        Data must already be sorted and filtered.
    start_year, end_year : int
        Year range shown in the title and footnote
    total_births_threshold : int
        Per-gender birth threshold shown in the subtitle and footnote
    title : str
        Chart title (a default is generated when None)
    sort_label : str
        Description of the sort order, shown in the subtitle
    min_barheight : float
        Minimum bar height for readability (default 1.0)
    max_barheight : float
        Maximum bar height (default 5.0)
    Returns:
    --------
    fig : matplotlib.figure.Figure
    """
    male_color = '#2b7fff'
    female_color = '#fb2c36'
    # Extract data
    names = df['name'].to_list()
    pct_female = df['pct_female'].to_list()
    pct_male = df['pct_male'].to_list()
    female_births = df['female_births'].to_list()
    male_births = df['male_births'].to_list()
    total_births = df['total_births'].to_list()
    n_plots = len(names)
    # base figure size on # of series to plot
    figsize = (10, 4 + n_plots * 0.5)
    # Calculate bar heights proportional to total births
    total_births = np.array(total_births)
    # Normalize to range [min_barheight, max_barheight]
    births_min, births_max = total_births.min(), total_births.max()
    if births_max > births_min:
        normalized_heights = min_barheight + (total_births - births_min) / (births_max - births_min) * (max_barheight - min_barheight)
    else:
        normalized_heights = np.full(len(total_births), min_barheight)
    # Calculate y positions (cumulative sum with spacing)
    spacing = 0.2  # Space between bars
    bar_heights = normalized_heights
    y_positions = []
    current_y = 0
    for height in bar_heights:
        y_positions.append(current_y + height / 2)
        current_y += height + spacing
    if title is None:
        title = f"common unisex baby names in the U.S.,\n{start_year}-{end_year}"
    # Create figure
    fig, ax = plt.subplots(figsize=figsize)
    # Create stacked horizontal bars
    # Female portion (left side, from 0 to pct_female)
    ax.barh(y_positions, pct_female, height=bar_heights, color=female_color, label='Female', alpha=1)
    # Male portion (right side, from pct_female to 1.0)
    ax.barh(y_positions, pct_male, left=pct_female, height=bar_heights, color=male_color, label='Male', alpha=1)
    # Add vertical grid lines at every 10% interval
    grid_color = "#a6a09b"
    for x_pos in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]:
        ax.axvline(x=x_pos, color=grid_color, linewidth=1, alpha=1, zorder=0)
    # heavier line at 50%
    ax.axvline(x=0.5, color=grid_color, linewidth=2, alpha=1, zorder=0)
    # Add text labels for birth counts
    girls_labeled = False
    boys_labeled = False
    for i, (name, y_pos, pf, pm, fb, mb) in enumerate(zip(names, y_positions, pct_female, pct_male, female_births, male_births)):
        # only show "girls" or "boys" once
        # Female births label (left-aligned, near left edge of bar)
        if i == 0 or not girls_labeled:
            ax.text(0.02, y_pos, f'{fb:,} girls', ha='left', va='center', color='white', fontsize=12, fontweight='bold')
            girls_labeled = True
        else:
            ax.text(0.02, y_pos, f'{fb:,}', ha='left', va='center', color='white', fontsize=12, fontweight='bold')
        # Male births label (right-aligned, near right edge of bar)
        if i == 0 or not boys_labeled:
            ax.text(0.98, y_pos, f'{mb:,} boys', ha='right', va='center', color='white', fontsize=12, fontweight='bold')
            boys_labeled = True
        else:
            ax.text(0.98, y_pos, f'{mb:,}', ha='right', va='center', color='white', fontsize=12, fontweight='bold')
    # Set y-axis labels (names); the inverted limits put the first row at the top
    ax.set_ylim(current_y, -spacing)
    ax.set_yticks(y_positions)
    ax.set_yticklabels(names, fontsize=18, fontweight='bold')
    # Set x-axis limits and labels
    ax.set_xlim(0, 1)
    ax.set_xticks([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
    ax.set_xticklabels(['0%', '10%', '20%', '30%', '40%', '50%', '60%', '70%', '80%', '90%', '100%'])
    ax.set_xlabel('Gender Balance', fontsize=16)
    # note that bottommost row is added as title, since it's easier to add height and work upward
    subsubtitle = 'bar height is proportional to total births'
    text = ax.set_title(subsubtitle, fontsize=14, style='italic', loc='left', ha='left')
    ex = text.get_window_extent()
    x, y = text.get_position()
    t = transforms.offset_copy(text.get_transform(), y=ex.height + 5, units='dots')
    # set subtitle
    subtitle = f"min. {total_births_threshold:,} births in both genders, sorted by {sort_label}"
    text = ax.text(x, y, subtitle, fontsize=16, ha='left', transform=t)
    ex = text.get_window_extent()
    x, y = text.get_position()
    t = transforms.offset_copy(text.get_transform(), y=ex.height, units='dots')
    # set main title above the subtitle
    ax.text(x, y, title, fontsize=22, fontweight='bold', transform=t, va='bottom')
    # Remove spines (except bottom)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    # Add note at bottom
    note_text = f"Data source: Social Security Administration\nIncludes names that were given to >= {total_births_threshold:,} boys AND girls between {start_year} and {end_year}.\nBar height represents relative popularity (bounded linear scale)."
    fig.text(0.01, -0.01, note_text, ha='left', fontsize=11)
    # add nameplay.org link
    fig.text(0.98, -0.01, "NamePlay.org", ha='right', fontsize=22, fontweight='bold')
    plt.tight_layout()
    plt.show()
    return fig
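To make the bar-height scaling easier to follow outside the plotting code, here is a minimal standalone sketch of the same idea: min-max scaling of total births onto a bounded height range, followed by stacking the bars with a fixed gap. It uses plain Python lists instead of NumPy, and the helper names `scale_bar_heights` and `bar_centers`, as well as the birth counts, are hypothetical and only for illustration.

```python
def scale_bar_heights(totals, min_h=1.0, max_h=5.0):
    """Map totals linearly onto [min_h, max_h] (min-max scaling)."""
    lo, hi = min(totals), max(totals)
    if hi == lo:
        # every bar has the same total: fall back to the minimum height
        return [min_h for _ in totals]
    return [min_h + (t - lo) / (hi - lo) * (max_h - min_h) for t in totals]


def bar_centers(heights, spacing=0.2):
    """Stack bars one after another and return the y coordinate of each bar's center."""
    centers, current_y = [], 0.0
    for h in heights:
        centers.append(current_y + h / 2)
        current_y += h + spacing
    return centers


# hypothetical total-birth counts, not taken from the SSA data
totals = [50_000, 100_000, 250_000]
heights = scale_bar_heights(totals)
centers = bar_centers(heights)
print(heights)
print(centers)
```

The least popular name in a chart always gets the minimum height and the most popular always gets the maximum, so heights are comparable within one chart but not across charts.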

data processing code

After loading the SSA baby name data, we’ll filter out rows before our defined start year, and then filter to names with at least min_births_either_gender births in both genders.

We’ll use this function for two different views of the data:

  • Names with at least 25,000 births in both genders since 1940
  • Names with at least 10,000 births in both genders since 2000

def get_common_unisex_names(min_births_either_gender: int = 25_000, start_year: int = 1940) -> pl.DataFrame:
    name_data = get_cleaned_national_df()
    # start year filtering
    name_data = name_data.filter(pl.col('year') >= start_year)
    # filter to names with at least X births in both genders
    gn_names = name_data.group_by('name').agg(
        pl.when(pl.col('gender') == 'M').then(pl.col('births')).otherwise(0).sum().alias('male_births'),
        pl.when(pl.col('gender') == 'F').then(pl.col('births')).otherwise(0).sum().alias('female_births'),
        pl.col('births').sum().alias('total_births')
    ).filter(pl.col('male_births') >= min_births_either_gender,
             pl.col('female_births') >= min_births_either_gender)
    # calculate percentages for plotting
    gn_names = gn_names.with_columns(
        (pl.col('male_births') / pl.col('total_births')).alias('pct_male'),
        (pl.col('female_births') / pl.col('total_births')).alias('pct_female')
    ).sort('pct_female', descending=True)
    return gn_names
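For readers who don't use polars, the aggregation above can be sketched in plain Python: sum births per name and gender, keep only names where both genders clear the threshold, then compute the shares and sort by the share of girls. This is a stand-in for the logic rather than the notebook's actual code; the function name `common_unisex_names`, the placeholder names, and the counts are all made up for illustration.

```python
from collections import defaultdict


def common_unisex_names(rows, min_births_either_gender):
    """rows: iterable of (name, gender, births) tuples, one per name/gender/year."""
    totals = defaultdict(lambda: {"M": 0, "F": 0})
    for name, gender, births in rows:
        totals[name][gender] += births

    result = []
    for name, counts in totals.items():
        male, female = counts["M"], counts["F"]
        # a name qualifies only if BOTH genders clear the threshold
        if male >= min_births_either_gender and female >= min_births_either_gender:
            total = male + female
            result.append({
                "name": name,
                "male_births": male,
                "female_births": female,
                "total_births": total,
                "pct_male": male / total,
                "pct_female": female / total,
            })
    # same ordering as the charts: highest share of girls first
    return sorted(result, key=lambda r: r["pct_female"], reverse=True)


# made-up rows with placeholder names, not SSA data
rows = [
    ("Alpha", "F", 900), ("Alpha", "M", 300), ("Alpha", "F", 600),
    ("Bravo", "M", 1500), ("Bravo", "F", 500),
    ("Charlie", "M", 4000), ("Charlie", "F", 90),
]
names = common_unisex_names(rows, min_births_either_gender=250)
for row in names:
    print(row["name"], row["female_births"], row["male_births"], round(row["pct_female"], 3))
```

In this toy input, "Charlie" is dropped because only one gender clears the threshold, which is the same reason a heavily one-sided name can be very popular overall and still miss the charts.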

common unisex names since 1940

I’ll be the first to admit that the 25,000 births in both genders threshold is arbitrary. I did fiddle with it a bit to find a threshold that was both a round number and a reasonable cut-off for the number of names that would be included.

Here’s the code to fetch data and plot the first view:

min_births_either_gender = 25_000
start_year = 1940
gn_names = get_common_unisex_names(min_births_either_gender=min_births_either_gender, start_year=start_year)
print(f'there are {len(gn_names)} names with at least {min_births_either_gender} births in both genders since {start_year}')
# plot the data
fig = plot_gender_neutral_names(gn_names, total_births_threshold=min_births_either_gender, start_year=start_year)
there are 40 names with at least 25000 births in both genders since 1940

I like that this list includes names like “Ryan” that breached the 25,000 births threshold while remaining overwhelmingly masculine. I’m curious to hear what people think about scaling the bar heights to total popularity. It’s not precise enough to enable direct numerical comparisons between bars, but that’s what the numerical labels on the bars are for. I do feel like it makes it easier to see, at a glance, which names are more popular overall.

common unisex names since 2000

Another unscientific admission: I chose 10,000 births as the threshold for the more recent view in order to have a similar number of names in each chart. If we reduced the threshold in proportion to the reduction in the number of years of data, we’d end up with many more names, because unisex names have become more popular over time:

proportional_threshold = (2024 - 2000) / (2024 - 1940) * 25_000
print(f'A duration-proportional reduction in the threshold would be: {proportional_threshold:,.0f}')
n_names = len(get_common_unisex_names(min_births_either_gender=proportional_threshold, start_year=2000))
print(f'There would be {n_names} names with at least {proportional_threshold:,.0f} births in both genders since 2000')
A duration-proportional reduction in the threshold would be: 7,143
There would be 57 names with at least 7,143 births in both genders since 2000

Here’s the code to process the data and create the more recent chart:

min_births_either_gender = 10_000
start_year = 2000
gn_names = get_common_unisex_names(min_births_either_gender=min_births_either_gender, start_year=start_year)
print(f'there are {len(gn_names)} names with at least {min_births_either_gender} births in both genders since {start_year}')
# plot the data
fig = plot_gender_neutral_names(gn_names, total_births_threshold=min_births_either_gender, start_year=start_year)
there are 39 names with at least 10000 births in both genders since 2000

conclusions

I’m not going to go in-depth with the analysis for this post, mostly because I’m pressed for time. The main observation I draw from the charts above is that far more of the names that make the cut are given mostly to boys than mostly to girls. In other words, it seems to be much more common to give girls popular boys’ names than to give boys popular girls’ names.

The best way to reach me is via email at [email protected] or on reddit. I’m always happy to chat about the data and the charts, or about names in general.
