Reverse-engineering Firestore calls to export your data


You can usually reverse-engineer a website's API with your browser's developer tools. The general pattern is:

1. Open Developer Tools > Network.
2. Refresh the page and click around until the requests you'd like to reproduce programmatically show up.
3. Copy the request's URL and cookie.
4. Run it from your terminal or a script: curl -H "Cookie: XXX" "https://api.example.com/v1/YYY/ZZZ"
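The same replay works from a script. Here's a minimal TypeScript sketch for Bun or Node 18+ (both ship a global fetch). The URL and cookie are the placeholders from the curl command, and the fetch function is a parameter only so the sketch can be exercised offline. Run it outside the browser: pages aren't allowed to set a Cookie header themselves.

```typescript
// The subset of fetch's signature and Response that this sketch relies on
type HttpGet = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Replay a request captured in DevTools, re-sending the copied Cookie header
async function replayRequest(url: string, cookie: string, fetchImpl: HttpGet = fetch): Promise<string> {
  const res = await fetchImpl(url, { method: "GET", headers: { Cookie: cookie } });
  if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
  return res.text();
}

// Usage (placeholders): const body = await replayRequest("https://api.example.com/v1/YYY/ZZZ", "XXX");
```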

This is sometimes helpful but often limited: APIs expose only a slice of your data, possibly after server-side filtering or transformation.

But if the app uses Firestore or Supabase, this gets a lot more powerful. Instead of going through a fixed set of API endpoints, the client talks directly to the database and usually has access to all the data associated with your user.

The underlying authorization mechanism is often a Firestore security rule that lets you read any document whose userId field matches your authenticated user ID:

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /{collection}/{document} {
      // Allows you to read any document with a `userId` field
      // that matches that of your auth token
      allow read: if request.auth != null && request.auth.uid == resource.data.userId;
    }
  }
}
```
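In plain code, the rule's read condition amounts to the check below. This is a TypeScript sketch for intuition only: AuthContext and DocData are simplified stand-ins for request.auth and resource.data, not a model of how Firestore actually evaluates rules.

```typescript
// Simplified stand-ins for `request.auth` and `resource.data` in the rule
type AuthContext = { uid: string } | null;
type DocData = Record<string, unknown>;

// Mirrors: allow read: if request.auth != null && request.auth.uid == resource.data.userId;
function allowRead(auth: AuthContext, data: DocData): boolean {
  return auth !== null && auth.uid === data.userId;
}
```

The check is per document: a signed-in user can read documents carrying their own userId, while other users and unauthenticated clients are refused.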

So there's a simple algorithm for exporting all your data from any Firestore-backed app that relies on rules like this. The instructions below walk through it, and you can find my Bun script here.

Step 0: Check whether the site uses Firebase

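This snippet looks for signs of Firebase or Supabase: Supabase auth cookies, Firebase globals on window, and Firebase or Firestore IndexedDB databases. Paste it into your browser console while signed in to the target website.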

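```js
(async () => {
  const e = { Supabase: [], Firebase: [] };
  // Supabase auth cookies typically start with "sb-"
  for (const n of document.cookie.split(";").map((c) => c.trim().split("=")[0])) {
    if (/^(sb-|supabase)/i.test(n)) e.Supabase.push(`Cookie: ${n}`);
  }
  if (window.firebase || window.__FIREBASE_DEFAULTS__) e.Firebase.push("Global object: window.firebase");
  // Firebase Auth and Firestore persist state in IndexedDB
  try {
    if (indexedDB.databases) {
      for (const d of await indexedDB.databases()) {
        if (/firebase|firestore/i.test(d.name)) e.Firebase.push(`IndexedDB: ${d.name}`);
      }
    }
  } catch (_) {
    // indexedDB.databases() is unsupported or blocked in some browsers
  }
  console.log(e);
})();
```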

I highly encourage you to ask your LLM to review any JS before pasting it into your browser console.

Step 1: Download your Firebase token

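This snippet reads your signed-in user record, including the Firebase ID and refresh tokens, from the Firebase SDK's IndexedDB storage and copies it to your clipboard. Run it the same way as before.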

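```js
(async () => {
  // The Firebase Auth SDK persists the signed-in user in this IndexedDB store
  const rows = await new Promise((resolve) => {
    const open = indexedDB.open("firebaseLocalStorageDb");
    open.onsuccess = () => {
      const req = open.result.transaction("firebaseLocalStorage").objectStore("firebaseLocalStorage").getAll();
      req.onsuccess = () => resolve(req.result);
    };
  });
  const user = rows.find((x) => x.fbase_key.startsWith("firebase:authUser:"));
  if (!user) return console.error("No signed-in Firebase user found");
  copy(user.value); // copy() is a DevTools console utility
  console.log("Firestore auth token copied to clipboard ✅");
})();
```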

If you're using my repo, save the copied JSON to creds.json in the repo's root directory.
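Before moving on, you can sanity-check the copied JSON. A TypeScript sketch, assuming the value is the Firebase Auth SDK's serialized user object with its tokens under stsTokenManager, where accessToken is the Firebase ID token (a JWT). It decodes the token's payload without verifying the signature, which is fine for inspecting your own token. Firebase ID tokens expire after an hour, so a stale creds.json is the first thing to suspect if requests start failing.

```typescript
// Assumed shape of the copied value (only the fields used here)
interface StoredUser {
  uid: string;
  stsTokenManager: {
    accessToken: string; // Firebase ID token (a JWT)
    refreshToken: string;
    expirationTime: number;
  };
}

// Decode a JWT's payload segment. Does NOT verify the signature.
function decodeJwtPayload(jwt: string): Record<string, unknown> {
  const segment = jwt.split(".")[1];
  if (!segment) throw new Error("Not a JWT");
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return JSON.parse(atob(padded)) as Record<string, unknown>;
}

// Summarize creds.json: whose token it is and whether it has expired
function inspectCreds(json: string, nowMs: number = Date.now()): { uid: string; tokenUserId: unknown; expired: boolean } {
  const user = JSON.parse(json) as StoredUser;
  const claims = decodeJwtPayload(user.stsTokenManager.accessToken);
  const expMs = Number(claims.exp) * 1000; // JWT `exp` is in seconds
  return { uid: user.uid, tokenUserId: claims.user_id, expired: nowMs >= expMs };
}
```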

Step 2: Figure out a collection to query

Woo! You have access to your data! Unfortunately, Firestore doesn't let clients at your access level list the available collections.

So you have to figure out which collections the app accesses. That's pretty easy:

Open the Network tab in Developer Tools, refresh or click around, and look for requests to https://firestore.googleapis.com/.../Listen/channel. They'll likely show up as just channel?... in the request list.

Look through each request's Payload: it contains form fields named reqX___data__ (req0___data__, req1___data__, and so on), and these correspond to database queries.
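Decoding payloads by eye gets tedious, so here's a TypeScript sketch that does it. It assumes you've copied the raw, URL-encoded form body (Chrome's Payload tab can show it via "view source") and that each reqX___data__ field holds one JSON object.

```typescript
// Decode every reqN___data__ field of a Listen/channel form body into JSON
function parseChannelPayload(formBody: string): unknown[] {
  const queries: unknown[] = [];
  new URLSearchParams(formBody).forEach((value, key) => {
    if (/^req\d+___data__$/.test(key)) queries.push(JSON.parse(value));
  });
  return queries;
}
```

Other fields in the body (such as count and ofs) are ignored.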

An example reqX___data__ payload looks like this:

```json
{
  "database": "projects/MyAppName-12345/databases/(default)",
  "addTarget": {
    "query": {
      "structuredQuery": {
        "from": [{ "collectionId": "MyCollectionName" }],
        "where": {
          "compositeFilter": {
            "op": "AND",
            "filters": [
              {
                "fieldFilter": {
                  "field": { "fieldPath": "MyUserIdField" },
                  "op": "EQUAL",
                  "value": { "stringValue": "UUID-My-User-Id" }
                }
              }
            ]
          }
        }
      }
    }
  }
}
```

Some fields are omitted, but the key parts are:

MyCollectionName - the name of the collection being requested

MyUserIdField - the field in which documents store a user ID. Security rules will likely check this field to authorize you.

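You need both of these to download any data, since every query you make must filter on MyUserIdField or it'll be rejected. Firestore rules aren't filters: a query is only allowed if the rules can confirm that every document it could return is readable by you.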
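Pulling those two values out of a decoded request can be sketched in TypeScript as below. It assumes the shape of the example payload (a single from clause and EQUAL field filters, either bare or inside a compositeFilter) and that myUserId is the uid from Step 1; real apps may nest filters differently, so treat it as a starting point.

```typescript
interface FieldFilter {
  field: { fieldPath: string };
  op: string;
  value: Record<string, unknown>;
}
interface Filter {
  fieldFilter?: FieldFilter;
  compositeFilter?: { op: string; filters: Filter[] };
}
interface ListenRequest {
  addTarget?: { query?: { structuredQuery?: { from?: { collectionId: string }[]; where?: Filter } } };
}

// Collect EQUAL field filters, flattening nested composite filters
function equalityFilters(filter: Filter | undefined): FieldFilter[] {
  if (!filter) return [];
  if (filter.fieldFilter) return filter.fieldFilter.op === "EQUAL" ? [filter.fieldFilter] : [];
  return (filter.compositeFilter?.filters ?? []).flatMap((f) => equalityFilters(f));
}

// Find the queried collection and the field compared against your user ID
function describeQuery(req: ListenRequest, myUserId: string): { collection?: string; userIdField?: string } {
  const q = req.addTarget?.query?.structuredQuery;
  const match = equalityFilters(q?.where).find((f) => f.value.stringValue === myUserId);
  return { collection: q?.from?.[0]?.collectionId, userIdField: match?.field.fieldPath };
}
```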

I'd recommend copying the reqX___data__ payload, opening the firebase.ts script in your LLM-powered editor, and asking it to fill in the appropriate values. LLMs are good at this.

Step 3: Figure out more collections to query

Okay, you've found one collection to query. How do you find more? You have two options:

Repeat the above process: Click around, look through more network requests to /channel, and copy them into the script under the collections field.

Look through the source: collection names very likely live as string literals in the website's JS source. If you find one collection name and look at the nearby lines of code, you'll likely find others.

I copied ~100 lines of source code around the two places where one collection name appeared, and asked an LLM to infer the additional collection names.
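The "look nearby" step can be partly automated. This TypeScript sketch assumes you've saved the app's JS bundle to a string: it finds each occurrence of a known collection name, takes a window of surrounding characters, and lists the identifier-like string literals in it as candidates. The window size and the literal regex are arbitrary choices, and the output still needs a human (or an LLM) to weed out strings that aren't collections.

```typescript
// List identifier-like string literals found near each occurrence of `known`
function candidateCollections(source: string, known: string, windowChars = 2000): string[] {
  if (!known) return [];
  const found = new Set<string>();
  let idx = source.indexOf(known);
  while (idx !== -1) {
    const chunk = source.slice(Math.max(0, idx - windowChars), idx + known.length + windowChars);
    // "name", 'name' or `name` literals of 2-64 word characters or hyphens
    const literal = /["'`]([A-Za-z_][\w-]{1,63})["'`]/g;
    let m: RegExpExecArray | null;
    while ((m = literal.exec(chunk)) !== null) {
      if (m[1] !== known) found.add(m[1]);
    }
    idx = source.indexOf(known, idx + known.length);
  }
  return Array.from(found).sort();
}
```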

Step 4: Run the script (or DIY)

If you're running my Bun script, it takes a list of collections, the userIdField to filter on for authorization, and creds.json for authentication. It dumps one JSON file per collection that you can analyse later.
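When you analyse a dump, note that if it keeps Firestore's wire format, each field is a typed Value object such as {"stringValue": "hi"} or {"integerValue": "42"} rather than plain JSON. Here's a TypeScript sketch that unwraps the common Value types into ordinary JSON. It returns int64 values as JS numbers (losing precision above 2^53) and leaves timestamps, bytes, and references as strings.

```typescript
// Subset of Firestore's typed `Value` union
type FirestoreValue =
  | { nullValue: null }
  | { booleanValue: boolean }
  | { integerValue: string }
  | { doubleValue: number | string }
  | { timestampValue: string }
  | { stringValue: string }
  | { bytesValue: string }
  | { referenceValue: string }
  | { geoPointValue: { latitude: number; longitude: number } }
  | { arrayValue: { values?: FirestoreValue[] } }
  | { mapValue: { fields?: Record<string, FirestoreValue> } };

function decodeValue(v: FirestoreValue): unknown {
  if ("nullValue" in v) return null;
  if ("booleanValue" in v) return v.booleanValue;
  if ("integerValue" in v) return Number(v.integerValue); // int64 arrives as a string
  if ("doubleValue" in v) return Number(v.doubleValue);
  if ("timestampValue" in v) return v.timestampValue; // RFC 3339 string
  if ("stringValue" in v) return v.stringValue;
  if ("bytesValue" in v) return v.bytesValue; // left base64-encoded
  if ("referenceValue" in v) return v.referenceValue;
  if ("geoPointValue" in v) return v.geoPointValue;
  if ("arrayValue" in v) return (v.arrayValue.values ?? []).map(decodeValue); // empty arrays omit `values`
  return decodeFields(v.mapValue.fields ?? {}); // empty maps omit `fields`
}

// Convert a document's `fields` object into plain JSON
function decodeFields(fields: Record<string, FirestoreValue>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(fields)) out[key] = decodeValue(value);
  return out;
}
```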

DIYing it isn't that bad either. It's pretty easy, and you should probably look at my script for inspiration; an LLM can one-shot it quite readily, especially if you put my script in its context.
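For a DIY version, one approach (not necessarily what my Bun script does) is the Firestore REST API's runQuery endpoint, authenticated with the ID token from Step 1 as a Bearer token. A TypeScript sketch, assuming projectId is the part after projects/ in the database string from Step 2, that the app uses the (default) database, and that a single equality filter on the user-ID field satisfies the rules. It makes one request with no retries.

```typescript
// The subset of fetch's signature and Response that this sketch relies on
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown>; text(): Promise<string> }>;

interface FirestoreDoc {
  name: string; // full resource path of the document
  fields?: Record<string, unknown>; // typed Firestore values
}
interface RunQueryRow {
  document?: FirestoreDoc;
  readTime?: string;
}

// runQuery body: filter on the user-ID field so the rules can allow the query
function buildUserQuery(collectionId: string, userIdField: string, uid: string) {
  return {
    structuredQuery: {
      from: [{ collectionId }],
      where: { fieldFilter: { field: { fieldPath: userIdField }, op: "EQUAL", value: { stringValue: uid } } },
    },
  };
}

// Fetch documents in `collectionId` whose `userIdField` equals `uid`
async function exportCollection(
  projectId: string,
  collectionId: string,
  userIdField: string,
  uid: string,
  idToken: string,
  fetchImpl: HttpPost = fetch,
): Promise<FirestoreDoc[]> {
  const url = `https://firestore.googleapis.com/v1/projects/${projectId}/databases/(default)/documents:runQuery`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { Authorization: `Bearer ${idToken}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildUserQuery(collectionId, userIdField, uid)),
  });
  if (!res.ok) throw new Error(`runQuery failed (HTTP ${res.status}): ${await res.text()}`);
  const rows = (await res.json()) as RunQueryRow[];
  // Rows without a `document` only carry metadata such as readTime
  const docs: FirestoreDoc[] = [];
  for (const row of rows) if (row.document) docs.push(row.document);
  return docs;
}

// In Bun, each result could then be saved with:
// await Bun.write(`${collectionId}.json`, JSON.stringify(docs, null, 2));
```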

You’re all set! Enjoy your data!

Appendix: Firestore Indexes

One snag you might run into, especially if you build your own script and run arbitrary queries, is unindexed queries.

Most databases let you run any query; poorly indexed queries just run slower. Firestore will straight up refuse to execute a query that lacks a suitable index, failing with a FAILED_PRECONDITION error instead.

Fortunately, Firestore builds single-field indexes automatically, so simple queries like filtering on your user-ID field work out of the box. Still, queries that combine filters or sort orders can need a composite index, and you can't create one without access to the project's GCP console.

In that case, you might want to spend more time clicking around the app to find queries that are already indexed.
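If your script does run into this, it helps to tell a missing index apart from a permissions failure. A TypeScript sketch, assuming the REST API's usual error body ({"error": {"code", "message", "status"}}), possibly wrapped in an array because runQuery streams its results. It flags FAILED_PRECONDITION errors whose message mentions an index and returns that message, which usually describes the index the query wanted.

```typescript
interface GoogleApiError {
  error?: { code?: number; message?: string; status?: string };
}

// Return the error message if `body` reports a missing index, else null
function missingIndexMessage(body: unknown): string | null {
  const entries: GoogleApiError[] = Array.isArray(body) ? body : [body as GoogleApiError];
  for (const entry of entries) {
    const err = entry?.error;
    if (err?.status === "FAILED_PRECONDITION" && /index/i.test(err.message ?? "")) {
      return err.message ?? "";
    }
  }
  return null;
}
```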

Appendix: Supabase

It's probably just as easy, if not easier, to replicate this for Supabase with RLS (row-level security).

In particular, it seems Supabase might expose the database schema even to non-privileged clients, which would let you skip Steps 2 and 3.
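If Supabase does expose the schema, it would be through PostgREST, which can serve an OpenAPI description at the root of the REST API (/rest/v1/ on Supabase), fetched with the same apikey and Authorization headers the app sends. Whether that endpoint answers for a non-privileged key depends on the project, so treat this TypeScript sketch as untested: it assumes you've already fetched that document, and lists table-like paths while skipping the root and /rpc/ functions.

```typescript
// Minimal view of a PostgREST OpenAPI document
interface OpenApiDoc {
  paths?: Record<string, unknown>;
}

// List exposed tables/views: every path except the root and /rpc/ functions
function listTables(spec: OpenApiDoc): string[] {
  return Object.keys(spec.paths ?? {})
    .filter((p) => p !== "/" && !p.startsWith("/rpc/"))
    .map((p) => p.slice(1))
    .sort();
}
```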

Let me know if you end up reproducing this for Supabase.
