Show HN: Generate coherent, synthetic data at scale

datagen is a tool for generating coherent, synthetic data from models expressed in a simple, declarative DSL.

Salient features:

  • A declarative DSL for defining data models with Go-like syntax
  • High performance through transpilation to native Go code
  • Multiple output formats (CSV, JSON, XML, stdout)
  • Database integration with direct loading to MySQL
  • Model relationships via cross-references using self.datagen
  • Tag-based filtering for selective data generation
  • Built-in functions for common data items
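To give a feel for the file formats listed above, here is a small Go sketch that serializes one record shaped like the user model from the quickstart (id() int, name() string) using Go's standard encoding/csv, encoding/json and encoding/xml packages. It is an illustration under our own assumptions (the id,name header row, the <user> element name, the sample name "Ada"), not datagen's actual output layout, which this README does not specify.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"encoding/xml"
	"fmt"
	"strconv"
)

// User mirrors the quickstart model's fields: id() int and name() string.
// The struct tags are illustrative assumptions, not datagen's real output schema.
type User struct {
	XMLName xml.Name `json:"-" xml:"user"`
	ID      int      `json:"id" xml:"id"`
	Name    string   `json:"name" xml:"name"`
}

// encodeAll renders one record as CSV (header row plus data row), JSON and XML.
func encodeAll(u User) (csvOut, jsonOut, xmlOut string, err error) {
	var buf bytes.Buffer
	// WriteAll writes every row and then flushes the csv.Writer.
	if err = csv.NewWriter(&buf).WriteAll([][]string{
		{"id", "name"},
		{strconv.Itoa(u.ID), u.Name},
	}); err != nil {
		return "", "", "", err
	}

	j, err := json.Marshal(u)
	if err != nil {
		return "", "", "", err
	}
	x, err := xml.Marshal(u)
	if err != nil {
		return "", "", "", err
	}
	return buf.String(), string(j), string(x), nil
}

func main() {
	c, j, x, err := encodeAll(User{ID: 1, Name: "Ada"})
	if err != nil {
		panic(err)
	}
	fmt.Print("CSV:\n" + c)
	fmt.Println("JSON:", j)
	fmt.Println("XML: ", x)
}
```

The same in-memory record feeds all three encoders, which is the general idea behind offering several output formats from one model; how datagen itself structures multi-record JSON or XML files is not covered here.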

There are several ways to install datagen.

Option 1: Install with go install

Check your $PATH and choose a directory where you would like to place the datagenc compiler.

echo $PATH
/Users/username/go/bin:/opt/homebrew/bin:/opt/homebrew/sbin

Say you want to place the binary in /opt/homebrew/bin:

export GOBIN=/opt/homebrew/bin
go install github.com/ds-horizon/datagen/cmd/datagenc@latest
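If you would rather check programmatically than eyeball the echo $PATH output, a tiny standalone Go program can confirm that the directory you chose for GOBIN is actually on your PATH. This is a hypothetical helper, not part of datagen; the function name onPath is our own. It uses filepath.SplitList, which splits on the OS list separator (":" on macOS/Unix, ";" on Windows).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// onPath reports whether dir is one of the entries in pathList, a string in
// the same format as the PATH environment variable. Both sides are passed
// through filepath.Clean, so "/opt/homebrew/bin/" matches "/opt/homebrew/bin".
func onPath(dir, pathList string) bool {
	want := filepath.Clean(dir)
	for _, entry := range filepath.SplitList(pathList) {
		if entry != "" && filepath.Clean(entry) == want {
			return true
		}
	}
	return false
}

func main() {
	// Default to the example GOBIN directory; pass another one as the first argument.
	dir := "/opt/homebrew/bin"
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	if onPath(dir, os.Getenv("PATH")) {
		fmt.Printf("%s is on PATH; a datagenc binary installed there will be found\n", dir)
	} else {
		fmt.Printf("%s is NOT on PATH; add it before relying on datagenc\n", dir)
	}
}
```

For example, `go run . /opt/homebrew/bin` run from the helper's directory prints which of the two cases applies to your current shell.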

Option 2: Install from Source

git clone https://github.com/ds-horizon/datagen
cd datagen
go build ./cmd/datagenc

For permanent access on macOS/Unix, either move the datagenc binary into a directory that is already on your PATH, or add the current directory to your PATH:

echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.bashrc   # for bash
echo "export PATH=\"\$PATH:$(pwd)\"" >> ~/.zshrc    # for zsh

For permanent access on Windows, add the directory to your PowerShell profile:

echo '$env:PATH += ";C:\path\to\datagen"' >> $PROFILE

Then source the updated rc file (or profile), or open a new terminal window, for the changes to take effect.

To try datagen out:

# Create a simple model file
cat > user.dg << 'EOF'
model user {
  metadata {
    count: 100
  }
  fields {
    id() int
    name() string
  }
  gens {
    func id() {
      return iter + 1
    }
    func name() {
      return Name()
    }
  }
}
EOF

# Generate data
datagenc gen user.dg -f csv -o ./output

This generates a user.csv file in the output directory containing 100 user records.
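To make the model's semantics concrete, the Go program below hand-writes what the user model describes: count: 100 records, with id computed as iter + 1 and name taken from a name generator, written as CSV into ./output. It is a sketch under stated assumptions, not the code datagenc transpiles to: iter is assumed to be the zero-based index of the record being generated (which is what makes ids run from 1 to 100), the built-in Name() is replaced by a hypothetical placeholderNames list, and the id,name header row is our guess at the CSV layout.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// placeholderNames stands in for datagen's built-in Name() (hypothetical data).
var placeholderNames = []string{"Ada", "Grace", "Linus", "Ken", "Rob"}

type user struct {
	id   int
	name string
}

// generateUsers emulates the model: count records, where iter is ASSUMED to be
// the zero-based record index, so id() = iter + 1 yields 1..count.
func generateUsers(count int) []user {
	users := make([]user, 0, count)
	for iter := 0; iter < count; iter++ {
		users = append(users, user{
			id:   iter + 1,
			name: placeholderNames[iter%len(placeholderNames)],
		})
	}
	return users
}

// writeCSV writes the records to dir/user.csv with an (assumed) id,name header.
func writeCSV(dir string, users []user) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	f, err := os.Create(filepath.Join(dir, "user.csv"))
	if err != nil {
		return err
	}
	rows := [][]string{{"id", "name"}}
	for _, u := range users {
		rows = append(rows, []string{strconv.Itoa(u.id), u.name})
	}
	if err := csv.NewWriter(f).WriteAll(rows); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	users := generateUsers(100) // mirrors metadata { count: 100 }
	if err := writeCSV("./output", users); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d records to ./output/user.csv\n", len(users))
}
```

Keeping generation (generateUsers) separate from serialization (writeCSV) mirrors how the model separates fields and gens from the -f output format flag, so the same records could be handed to a JSON or XML writer instead.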

For contribution guidelines, refer to CONTRIBUTING.md.

MIT License, see LICENSE.
