The surprising complexity of .properties files

28 Jul, 2025

I was recently working on better support for .properties files in Jar.Tools. Probably every Java developer has worked with this format and considers it a simple one. So did I, until I started implementing it. This post is a list of the quirks and interesting cases I ran into along the way.


There are three separators (and one of them is whitespace)

Most people think .properties means key=value. In reality:

  • key=value
  • key:value
  • key␠value (one or more spaces or tabs)

All three are valid. That means the following are different lines with the same meaning:

    server.port=8080
    server.port:8080
    server.port 8080
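To make the rule concrete, here is a minimal Python sketch of splitting one logical line at the first unescaped separator. It is not the Jar.Tools code; the function name `split_entry` is made up for illustration, and it returns the raw key and value without decoding any escapes.

```python
WHITESPACE = " \t\f"

def split_entry(line):
    """Split a logical line into (raw_key, raw_value) at the first
    unescaped '=', ':' or whitespace. Escapes are left untouched."""
    s = line.lstrip(WHITESPACE)
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch in "=:" or ch in WHITESPACE:
            break
        i += 1
    key = s[:i]
    # After the key: optional whitespace, one optional '=' or ':',
    # then optional whitespace again. The rest is the value.
    j = i
    while j < len(s) and s[j] in WHITESPACE:
        j += 1
    if j < len(s) and s[j] in "=:":
        j += 1
        while j < len(s) and s[j] in WHITESPACE:
            j += 1
    return key, s[j:]
```

With this sketch, all three spellings of `server.port` above produce the same key and value, and a key containing an escaped separator such as `a\=b` stays in one piece.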

What I validate

  • Missing separator: if a non‑comment, non‑blank line has no =, :, or whitespace separator, that’s an error.

  • Empty key: a line that is just = or : (or that has only whitespace before the value) is reported as an empty-key error.

    =value   # ⟵ error: empty key
    :value   # ⟵ error: empty key

What is allowed

  • Explicit empty values are fine with any separator:

    empty.key=
    empty.key:
    empty.key␠

    All three parse as empty.key with an empty string value.
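The two checks and the "empty value is fine" rule can be sketched in a few lines of Python. This is an illustration under my own naming (`find_separator`, `check_line`, and the error codes are hypothetical), not the validator's actual code.

```python
WHITESPACE = " \t\f"

def find_separator(s):
    """Index of the first unescaped '=', ':' or whitespace, or -1."""
    i = 0
    while i < len(s):
        if s[i] == "\\":
            i += 2  # an escaped character can never be a separator
        elif s[i] in "=:" or s[i] in WHITESPACE:
            return i
        else:
            i += 1
    return -1

def check_line(line):
    """Return an error code for one logical line, or None if it is fine."""
    s = line.lstrip(WHITESPACE)
    if not s or s[0] in "#!":
        return None  # blank lines and comments are not entries
    pos = find_separator(s)
    if pos == -1:
        return "missing-separator"
    if pos == 0:
        return "empty-key"  # the line starts with '=' or ':'
    return None  # a key followed by a separator; the value may be empty
```

Note that `empty.key=` and `empty.key␠` both pass, because a separator is present even though nothing follows it.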

Continuations: odd vs even backslashes, and trailing whitespace

A line ending with a continuation backslash \ joins with the next line. This is where bugs hide:

  • Odd number of trailing backslashes → continuation.
  • Even number → the last backslash is escaped, so no continuation.

    # Continues (odd backslashes at EOL)
    sql.query=SELECT * FROM users \

    # Does NOT continue (even backslashes at EOL)
    literal.backslash=path ends with \\
    # value ends with a single '\'
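The odd/even rule boils down to counting the backslashes at the very end of the line. A minimal Python sketch (the helper name `continues` is mine):

```python
def continues(line):
    """True if the physical line ends with an odd number of backslashes."""
    body = line.rstrip("\r\n")
    trailing = len(body) - len(body.rstrip("\\"))
    return trailing % 2 == 1
```

Two trailing backslashes are one escaped backslash, three are an escaped backslash plus a continuation marker, and so on.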

Trailing whitespace matters

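In practice, I still treat a backslash followed by trailing spaces as a continuation marker, even though java.util.Properties itself reads a backslash followed by a space as an escaped space. If the file ends right after that whitespace (no next line), it’s a broken continuation error.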

    broken.continuation=this ends with a backslash \␠␠␠
    # EOF here → error: “Line ends with continuation backslash but file ended.”

Multiline values done right

    sql.query=SELECT id, name, email \
        FROM users \
        WHERE active = true \
        ORDER BY name

When parsed, this becomes a single value:

SELECT id, name, email FROM users WHERE active = true ORDER BY name
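Putting the continuation rules together, a sketch of joining physical lines into logical ones might look like the following. It is written in Python under a few assumptions of mine: the function name `logical_lines` is hypothetical, blank and comment lines are skipped, and trailing whitespace after a continuation backslash is tolerated as described above (which is more lenient than java.util.Properties).

```python
WHITESPACE = " \t\f"

def logical_lines(text):
    """Return a list of (start_line, end_line, joined_text) tuples."""
    physical = text.split("\n")
    if physical and physical[-1] == "":
        physical.pop()  # a final newline does not start another line
    result = []
    buf, start = None, 0
    for number, raw in enumerate(physical, start=1):
        raw = raw.rstrip("\r")
        if buf is None:
            head = raw.lstrip(WHITESPACE)
            if not head or head[0] in "#!":
                continue  # blank and comment lines never continue
            buf, start, line = "", number, raw
        else:
            # Leading whitespace on a continuation line is dropped.
            line = raw.lstrip(WHITESPACE)
        # Assumption: whitespace after the backslash is ignored here.
        trimmed = line.rstrip(WHITESPACE)
        backslashes = len(trimmed) - len(trimmed.rstrip("\\"))
        if backslashes % 2 == 1:
            buf += trimmed[:-1]  # drop the continuation backslash
            continue
        buf += line
        result.append((start, number, buf))
        buf = None
    if buf is not None:
        raise ValueError("Line ends with continuation backslash but file ended.")
    return result
```

Returning the start and end line numbers alongside the joined text keeps diagnostics pointing at the right place in the original file.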

Duplicates are subtle (case‑sensitive keys)

I treat keys as case‑sensitive and flag all occurrences when the same key appears multiple times:

    duplicate.key=first
    duplicate.key=second
    duplicate.key=third

All three lines receive a warning that includes the line number of every occurrence (e.g., “Duplicate key ‘duplicate.key’ found at: line 2, line 5, line 8”). By contrast:

    myKey=one
    MyKey=two
    myKey=three

Only the two myKey entries get flagged; MyKey is distinct.

Why warn and not error? Real configs sometimes rely on “last one wins,” but it’s almost never intentional. A warning keeps you honest without breaking builds.
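Duplicate detection is mostly bookkeeping: group line numbers by exact key, then report every group with more than one member. A small Python sketch, where the function name `duplicate_warnings` and the input shape are assumptions of mine and the message wording follows the example above:

```python
from collections import defaultdict

def duplicate_warnings(entries):
    """entries: iterable of (line_number, key) pairs.
    Returns {key: warning message} for keys that occur more than once.
    Keys are compared exactly, so 'myKey' and 'MyKey' are different."""
    seen = defaultdict(list)
    for line_number, key in entries:
        seen[key].append(line_number)
    warnings = {}
    for key, lines in seen.items():
        if len(lines) > 1:
            where = ", ".join(f"line {n}" for n in lines)
            warnings[key] = f"Duplicate key '{key}' found at: {where}"
    return warnings
```

Because the message lists every occurrence, the same text can be attached to each of the offending lines.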

Unicode: \uXXXX escapes, surrogate pairs, and “garbage‑in” behavior

Properties files support \uXXXX escapes. That opens a whole Unicode can of worms: invalid lengths, non‑hex digits, surrogate pairs for emoji, and “unknown” escapes.

Invalid escape sequences

Things like \u123 or \u12G4 show up in the wild. I parse them gracefully—no exceptions—and keep values as close as possible to what the user typed. The validator focuses on not crashing; it doesn’t over‑correct malformed text.

Surrogate pairs for emoji

Escaped emoji like \uD83D\uDE80 (🚀) decode correctly. In UTF‑8 mode I emit a warning (“Unicode escape sequence detected”) because direct Unicode is usually clearer. In ISO‑8859‑1 mode, escapes are often necessary, so I emit no warning.

Standard escapes “just work”

The usual suspects decode as expected:

  • \t, \n, \r, \f, \\
  • escaped separators and specials: \ , \:, \=, \#, \!

Unknown single‑letter escapes like \q or \z are treated literally (the backslash disappears, the letter stays). Again: avoid surprising the user.
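The decoding rules from this section fit in one loop. The Python sketch below is an illustration, not the validator's source: `unescape` is a name I made up, and keeping malformed \u sequences exactly as typed follows the "garbage-in" behavior described above (java.util.Properties, by contrast, rejects a malformed \uXXXX).

```python
import re

SIMPLE = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}
HEX4 = re.compile(r"[0-9a-fA-F]{4}")

def unescape(raw):
    """Decode .properties escapes in a raw key or value."""
    out = []  # always holds single characters
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != "\\" or i + 1 == len(raw):
            out.append(ch)
            i += 1
            continue
        nxt = raw[i + 1]
        if nxt == "u":
            digits = raw[i + 2:i + 6]
            if HEX4.fullmatch(digits):
                cp = int(digits, 16)
                # Merge a low surrogate with a preceding high surrogate.
                if 0xDC00 <= cp <= 0xDFFF and out and 0xD800 <= ord(out[-1]) <= 0xDBFF:
                    hi = ord(out.pop())
                    cp = 0x10000 + ((hi - 0xD800) << 10) + (cp - 0xDC00)
                out.append(chr(cp))
                i += 6
            else:
                # Malformed (\u123, \u12G4): keep the text as typed.
                out.append("\\")
                out.append("u")
                i += 2
            continue
        # Known escapes map to control characters; anything else
        # (separators, '\\', unknown letters) just loses its backslash.
        out.append(SIMPLE.get(nxt, nxt))
        i += 2
    return "".join(out)
```

The surrogate merge matters in languages where strings are sequences of code points: without it, \uD83D\uDE80 would decode to two unpaired surrogates instead of one rocket.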

Encoding modes: UTF‑8 vs ISO‑8859‑1

Historically, Java treated .properties as Latin‑1 (ISO‑8859‑1), with \uXXXX for anything beyond that range. Many modern tools use UTF‑8. To make intent explicit, I let the validator run in either mode.

ISO‑8859‑1 mode

  • Error on characters outside Latin‑1.

    unicode.chinese=你好世界   # error (outside ISO-8859-1)
    unicode.emoji=🎉🚀        # error
    valid.iso=café            # fine (é is Latin‑1)
  • \uXXXX for Latin‑1 letters like \u00e9 (é) is allowed and not warned.

UTF‑8 mode

  • Direct Unicode is preferred and not warned.
  • \uXXXX escapes are warned as unnecessary (but still decoded). That includes escapes for ASCII: \u0041 → “A” with a warning.

Pick the mode that matches your runtime, and you’ll get the right balance of errors vs. guidance.
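As a rough illustration of the two modes, here is a Python sketch that checks one raw value. The function name `check_encoding`, the mode strings, and the message texts (apart from “Unicode escape sequence detected”) are my own choices. The escape regex is deliberately simple and would also match after an escaped backslash, which a real implementation should rule out.

```python
import re

UNICODE_ESCAPE = re.compile(r"\\u[0-9a-fA-F]{4}")

def check_encoding(raw_value, mode):
    """Return a list of (severity, message) findings for one raw value."""
    findings = []
    if mode == "iso-8859-1":
        for ch in raw_value:
            if ord(ch) > 0xFF:  # Latin-1 covers U+0000 to U+00FF
                findings.append(("error", f"character {ch!r} is outside ISO-8859-1"))
    elif mode == "utf-8":
        for match in UNICODE_ESCAPE.finditer(raw_value):
            findings.append(("warning", f"Unicode escape sequence detected: {match.group()}"))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return findings
```

The same value can therefore be clean in one mode and flagged in the other, which is the point of making the mode explicit.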

Comments and structure: preserve intent, don’t rewrite history

Lines starting with # or ! are comments. During validation, I:

  • Attach leading comments to the next property as leadingComments.
  • Keep raw text for each entry exactly as read.
  • Do not escape or normalize anything during validation.
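A sketch of the comment-attachment step, in Python. The `Entry` class and `attach_comments` function are hypothetical names; only `leadingComments` comes from the list above (spelled `leading_comments` here to follow Python conventions). The sketch assumes Python 3.9+, ignores continuation lines for brevity, and assumes blank lines do not detach pending comments.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    line: int                     # 1-based line number of the entry
    raw: str                      # text exactly as read, never rewritten
    leading_comments: list[str] = field(default_factory=list)

def attach_comments(lines):
    """Attach each run of comment lines to the next property entry."""
    entries, pending = [], []
    for number, raw in enumerate(lines, start=1):
        stripped = raw.lstrip(" \t\f")
        if not stripped:
            continue  # assumption: blank lines do not detach comments
        if stripped[0] in "#!":
            pending.append(raw)
            continue
        entries.append(Entry(number, raw, pending))
        pending = []
    return entries
```

Nothing in this step modifies `raw`, which is what keeps validation read-only.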

During formatting, I:

  • Preserve comments as‑is.

  • Add a consistent key = value spacing.

  • Escape =, :, and spaces inside values so the output remains parsable:

    # original
    key=value with = and : chars
    # formatted
    key = value with \= and \: chars

This “no touching during validation” rule prevents a whole class of “the linter changed my config” surprises.

Lines that look empty… but aren’t

A sneaky category:

  • A line that’s only = or : → empty key error.

  • A line that’s key␠␠␠ → a valid key with an explicit empty value (whitespace is the separator).

  • Whitespace around separators with empty values is fine (for example, key␠=␠ with nothing after it).

A practical checklist (aka mini‑linter rules)

  • Flag lines with no =, :, or whitespace separator (error).

  • Flag empty keys (error) but allow explicit empty values.

  • Handle continuation logic: odd vs even trailing backslashes; treat trailing whitespace after a continuation backslash as continuation; error if EOF cuts it off.

  • Treat keys as case‑sensitive; warn on duplicates and list all occurrences.

  • Decode standard escapes; treat unknown escapes literally without crashing.

  • Support UTF‑8 and ISO‑8859‑1 modes:

    • UTF‑8: warn on \uXXXX as unnecessary.
    • ISO‑8859‑1: error on out‑of‑range chars; allow \uXXXX freely.
  • Keep validation read‑only; do formatting in a separate step.

  • Preserve comments and attach them to following entries for context.

  • Represent multiline values as a single logical value; track start/end lines for tooling.

Closing thoughts

I was planning to be done with .properties file validation in a few days, tops. After a week of debugging, I realized that even though the format looks simple, real‑world files mix legacy encoding rules, permissive separators, escape sequences, and multiline values. I will not touch this format again :)
