"Safe" YAML Monster

Some time ago I read this tweet. My first reaction was "wow" and I decided to investigate and reproduce it.

It is worth mentioning that @joernchen used unsafe methods to build a YAML monster for 3 languages in a single magic.yaml. Here I am going to show how we can use "safe" parsing and join 3, then 4, then 5, and finally 6 languages in a single magic.yaml.

My research started with googling something like "yaml parser differentials/quirks/confusions". Unfortunately, the only results were mentions of "The Norway problem" and very strange formats for number representation :). Here we can see these features. However, there was nothing about "parser differentials" that could be related to our context. So the only way forward was to start my own experiments.

I don't like the unsafe way, so I focused on the different ways the final YAML document is built and the ways keys can be represented.

The most common approach is to use the same key multiple times in one document. While that is not a valid document, many parsers accept it, but the result can differ between them. Unfortunately, the Go parser rejects duplicate keys and fails, so there is nothing here for our purposes.
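To make the duplicate-key idea concrete, here is a toy Python model of the three policies a parser can follow. The function and the policy names ("last-wins", "first-wins", "reject") are my own labels for illustration and are not taken from any real parser.

```python
def resolve_duplicates(pairs, policy):
    """Build a dict from (key, value) pairs under a duplicate-key policy."""
    result = {}
    for key, value in pairs:
        if key in result:
            if policy == "reject":
                # Strict behavior: a repeated key is an error.
                raise ValueError(f"duplicate key: {key!r}")
            if policy == "first-wins":
                # Keep the value that was seen first.
                continue
        # New key, or "last-wins": store (or overwrite) the value.
        result[key] = value
    return result


pairs = [("lang", "first"), ("lang", "second")]
print(resolve_duplicates(pairs, "last-wins"))   # {'lang': 'second'}
print(resolve_duplicates(pairs, "first-wins"))  # {'lang': 'first'}
```

Two lenient parsers with different policies would already disagree on this input, while a strict one refuses the whole document.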

The next feature I found in the YAML specification was the "merge key" (<<: ...), and it gave some fruitful results:

    lang: python/go
    <<: [{lang: ruby}]

Here, Ruby implements this feature incorrectly and overrides the lang property. Nice!
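The difference can be sketched in a few lines of Python. This is a toy model working on an already-parsed mapping, not the code of any real parser. The name apply_merge and the merged_wins flag are hypothetical. With merged_wins=False it follows the merge-key rule that pairs are inserted unless the key already exists. With merged_wins=True it imitates the overriding result described for Ruby above.

```python
def apply_merge(mapping, merged_wins=False):
    """Resolve a '<<' merge key in an already-parsed mapping (toy model)."""
    # Start with the explicit keys of the mapping itself.
    result = {k: v for k, v in mapping.items() if k != "<<"}
    sources = mapping.get("<<", [])
    if isinstance(sources, dict):
        # '<<' may hold a single mapping or a list of mappings.
        sources = [sources]
    for source in sources:
        for key, value in source.items():
            # Spec-like: only fill in missing keys.
            # Override-like: merged values replace explicit ones.
            if merged_wins or key not in result:
                result[key] = value
    return result


doc = {"lang": "python/go", "<<": [{"lang": "ruby"}]}
print(apply_merge(doc))                    # {'lang': 'python/go'}
print(apply_merge(doc, merged_wins=True))  # {'lang': 'ruby'}
```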

The next step took a little more time. Since I don't like the unsafe way, I had set YAML tags aside for a while. Later, while revisiting what could be used, I found !!binary. Playing with it, I came up with the following document:

    lang: python
    !!binary bGFuZw==: go
    <<: [{lang: ruby}]

In this case:

Python represents the !!binary key as the bytes object b'lang', so the keys 'lang' and b'lang' are different.

Go passes its duplicate-key validation because it compares lang with bGFuZw== (the keys before decoding).
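Both observations are easy to check with the Python standard library alone. This is only a sketch of the two comparisons, not what either parser runs internally.

```python
import base64

# 'bGFuZw==' is the Base64 encoding of the bytes b'lang'.
decoded = base64.b64decode("bGFuZw==")
print(decoded)  # b'lang'

# In Python a str key and a bytes key never collide, so both entries survive.
doc = {"lang": "python", decoded: "go"}
print(len(doc))  # 2

# A duplicate check on the raw, undecoded scalar text also sees two
# different strings, which matches the Go behavior described above.
raw_keys = ["lang", "bGFuZw=="]
print(len(set(raw_keys)) == len(raw_keys))  # True
```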

Here we are. We now have a valid YAML document that is safely parsed differently by Python, Go, and Ruby.

At this point, I was satisfied and forgot about this monster for some time.

Later I found this writeup. It's great research. Usually, in a security context, "parser differentials" are related to URL and HTTP request parsing. JSON with duplicate keys can also be mentioned. This time, however, we have a YAML monster in play.

So I decided to contact @joernchen and present my findings about the "safe YAML monster" as a follow-up to his research.

As usual, if we have something broken, it can be milked in different ways. My safe solution was different from @joernchen's.

    {\xEF \xBF \xBB}!!binary bGFuZx==: ruby
    lang: python
    !!binary bGFuZw==: go

{\xEF \xBF \xBB} stands for the UTF-8 BOM, not a literal string (the BOM is U+FEFF, which is encoded in UTF-8 as the bytes EF BB BF). I don't have an explanation for why and how the BOM affects Ruby in such a way that its value is not overwritten by "go". Without the BOM, Ruby would report "go".
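For anyone reproducing this, the BOM has to be written as raw bytes at the very start of the file. Below is a minimal Python sketch. The helper name with_utf8_bom is made up for illustration, while codecs.BOM_UTF8 is the standard library constant for the UTF-8 BOM.

```python
import codecs


def with_utf8_bom(text):
    """Encode text as UTF-8 and prepend the byte order mark (EF BB BF)."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


body = "!!binary bGFuZx==: ruby\nlang: python\n!!binary bGFuZw==: go\n"
payload = with_utf8_bom(body)

# The first three bytes are the BOM.
print(payload[:3].hex())  # efbbbf

# Decoded as plain UTF-8, the BOM shows up as the character U+FEFF.
print(payload.decode("utf-8")[0] == "\ufeff")  # True

# The 'utf-8-sig' codec strips it again.
print(payload.decode("utf-8-sig") == body)  # True
```

Writing payload to a file opened in binary mode gives a document in the shape shown above.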

It got interesting. If we have 3 languages in the pack and multiple safe solutions, why not try to find another one?

As a result, Java (Jackson) came into play:

    {\xEF \xBF \xBB}!!binary bGFuZx==: ruby
    <<: [{lang: python}]
    !!binary lang: java
    !!binary bGFuZw==: go

It's a combined variation with a BOM for Ruby and merge-key to make Go happy.

Jackson ignores the !!binary tag and uses lang as a key...
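What "ignoring the tag" means for the key !!binary lang can be shown with a small Python model. The function key_for and its honor_binary_tag flag are hypothetical names for this sketch. It only contrasts decoding the scalar as Base64 with keeping it as text.

```python
import base64


def key_for(raw, honor_binary_tag):
    """Return the mapping key a loader would produce for '!!binary <raw>'."""
    if honor_binary_tag:
        # Treat the scalar as Base64 and decode it to bytes.
        return base64.b64decode(raw)
    # Ignore the tag and keep the scalar as plain text.
    return raw


print(key_for("lang", honor_binary_tag=True))   # b'\x95\xa9\xe0'
print(key_for("lang", honor_binary_tag=False))  # lang
```

The decoded form is the same b'\x95\xa9\xe0' key that appears in the Python output in the P.S., while a loader that ignores the tag ends up with a plain lang key.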

To recap what we have so far:

Rust behaves like Java here.

Node.js behaves the same as Go.

After some time I decided to see whether we could split Node.js and Go in this magic YAML. It took much more time than the previous monsters. However, some juicy things were found:

    alias-lang: &lang !!binary bGFuZz==
    !!binary bGFuZy==: node
    ? *lang
    : go
    <<: [{lang: python, <<: [{lang: ruby}]}]
    !!binary lang: rust

Here we used YAML anchors and aliases, plus one small option in Ruby's safe_load: aliases: true. Also, Java (Jackson) does not support aliases, so it is not included. One interesting thing: if we replace the binary alias-lang with the !!str version, alias-lang: &lang lang, this no longer works and Go wins over Node.js. What happens here is that Node.js, after parsing, sees the aliased key as "*lang": "go".
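A side note on the Base64 strings used as keys in these documents: bGFuZw==, bGFuZx==, bGFuZy== and bGFuZz== are four different strings, but a lenient decoder maps all of them to the same bytes, because the last character only contributes two of its six bits. The Python check below is a sketch of that property. How strictly each parser decodes is not something I verified beyond the outputs in the P.S.

```python
import base64

variants = ["bGFuZw==", "bGFuZx==", "bGFuZy==", "bGFuZz=="]

# As raw text, all four scalars are different strings.
print(len(set(variants)))  # 4

# After lenient Base64 decoding, all four collapse to the same bytes.
decoded = {base64.b64decode(v) for v in variants}
print(decoded)  # {b'lang'}

# Only the first variant is the canonical encoding of b'lang'.
print(base64.b64encode(b"lang").decode("ascii"))  # bGFuZw==
```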

The next step was to try the BOM variation and investigate how it can be used here. As a result we have:

    {\xEF \xBF \xBB}!!binary bGFuZx==: ruby
    !!binary lang: rust
    !!binary bGFuZy==: node
    alias-lang: &lang !!binary bGFuZz==
    ? *lang
    : go
    <<: [{lang: python}]

Nothing interesting, it is almost the same, except that Ruby can be used without the aliases: true option :). Wow, we are back to the default options again. In this case Ruby looks only at the first line, so aliases are out of its scope.

Let's take another step and see what we can do with Java (Snake Engine). As the name suggests, its behavior is similar to the Python version. Indeed, splitting them was hard :) but we were lucky. In splitting them we lost Rust, so this is not the 6th monster but another 5-parser version.

    !!binary bGFuZx==: ruby
    !!binary lang: rust
    !!binary bGFuZy==: node
    alias-lang: &lang !!binary bGFuZz==
    ? *lang
    : go
    alias-lang2: !!str &lang2 lang
    #
    # The only way to split python and java at the moment
    <<: [
      {
        ? *lang2 : java,
        lang: python
      },
    ]

Here Rust fails with a "duplicated key in mapping" error in the last block. In this case the Snake implementation does not override the existing key, but Python does.

That's it (at least I thought so).

P.S.

Versions that were used:

    docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp python bash -c "pip install pyyaml; python python.py magic-5-bom-java.yaml"
    ...
    Digest: sha256:fce9bc7648ef917a5ab67176cf1c7eb41b110452e259736144bc22f32f3aa622
    Status: Downloaded newer image for python:latest
    ...
    Successfully installed pyyaml-6.0.1
    ...
    {'lang': 'python', b'lang': 'go', b'\x95\xa9\xe0': 'rust', 'alias-lang': b'lang', 'alias-lang2': 'lang'}
    python

    docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp ruby ruby ruby.rb magic-5-bom-java.yaml
    ...
    Digest: sha256:8584c968202ea356984262c4422461ee3a6022c0c4d8fb517b7b9c6395556670
    Status: Downloaded newer image for ruby:latest
    {"lang"=>"ruby"}
    ruby

    docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp golang go run go.go magic-5-bom-java.yaml
    ...
    Digest: sha256:a66eda637829ce891e9cf61ff1ee0edf544e1f6c5b0e666c7310dce231a66f28
    Status: Downloaded newer image for golang:latest
    go: downloading gopkg.in/yaml.v3 v3.0.1
    map[alias-lang:lang alias-lang2:lang lang:go ���:rust]
    go

    docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp node node.js magic-5-bom-java.yaml
    ...
    Digest: sha256:b98ec1c96103fbe1a9e449b3854bbc0a0ed1c5936882ae0939d4c3a771265b4b
    Status: Downloaded newer image for node:latest
    {"lang":"node","���":"rust","alias-lang":{"type":"Buffer","data":[108,97,110,103]},"*lang":"go","alias-lang2":"lang","<<":[{"lang":"python"}]}
    node

    ###### Here version without Java
    docker run --rm -v "$PWD":/usr/src/myapp -w /usr/src/myapp rust cargo run magic-5-bom.yaml
    ...
    Digest: sha256:2c454db58842de39b18057df0617d24eb4f94f77d99ea8dfc0788387d0c9dc81
    Status: Downloaded newer image for rust:latest
    ...
    Compiling yaml-rust2 v0.8.1
    ...
    Hash({String("\u{feff}!!binary bGFuZx=="): String("ruby"), String("lang"): String("rust"), String("bGFuZy=="): String("node"), String("alias-lang"): String("bGFuZz=="), String("bGFuZz=="): String("go"), String("<<"): Array([Hash({String("lang"): String("python")})])})
    .....
    String("rust")

For Java I won't show tests with docker:

    jdk-21
    org.yaml:snakeyaml:2.2

P.P.S. I thought the research was complete: a brief writeup was composed and I had started to prepare a git repository with the sources. However, something went wrong. I hadn't closed the https://yaml.org/type/merge.html tab in my browser, and some time later I looked at it again. And, as usual, I saw something that invited the 6th monster to the party.

    !!binary bGFuZx==: ruby
    !!binary lang: rust
    !!binary bGFuZy==: node
    alias-lang: &lang !!binary bGFuZz==
    ? *lang
    : go
    alias-lang2: !!str &lang2 lang
    #
    # The only way to split python and java at the moment
    <<: [
      {
        ? *lang2 : java,
        #lang: python
      },
    ]
    !!merge qwerty: {lang: "python"}

!!merge. Previously I had tried multiple merge keys (<<), but Node and Rust didn't like that and rejected them as duplicate keys. With the !!merge tag in our pocket, we are able to create magic-6.yaml!
