Faker: Generate Realistic Test Data in Python with One Line of Code – CodeCut



Motivation

Let’s say you need data of certain types (bool, float, text, integer) with specific characteristics (name, address, color, email, phone number, location) to test a Python library or your own implementation. Finding real data that fits takes time. You wonder: is there a quick way to create your own data?

What if there is a package that enables you to create fake data in one line of code such as this:

    fake.profile()

Output:

    {
        'address': '076 Steven Trace\nJillville, ND 12393',
        'birthdate': datetime.date(1981, 11, 19),
        'blood_group': 'O-',
        'company': 'Johnson-Rodriguez',
        'current_location': (Decimal('61.969848'), Decimal('121.407164')),
        'job': 'Patent examiner',
        'mail': '[email protected]',
        'name': 'Katie Romero',
        'residence': '271 Smith Wells\nMichaelport, MN 40933',
        'sex': 'F',
        'ssn': '281-84-3963',
        'username': 'eparker',
        'website': ['https://www.gonzalez.com/', 'https://rogers-scott.com/']
    }

This can be done with Faker, a Python package that generates fake data for you. You can control the data type, the characteristics of the data, and its locale (country and language). Let’s discover how we can use Faker to create fake data.

💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!

Basics of Faker

Start with installing the package:

pip install Faker

Import Faker:

    from faker import Faker

    fake = Faker()

Some basic methods of Faker:

    print(fake.color_name())
    print(fake.name())
    print(fake.address())
    print(fake.job())
    print(fake.date_of_birth(minimum_age=30))
    print(fake.city())

Output:

    Tan
    Kristin Buck
    715 Peter Views
    Abigailport, ME 57602
    Systems analyst
    1946-03-07
    Evanmouth

Let’s say you are the author of a fiction book and want to create a character, but coming up with a realistic name and background is difficult and time-consuming. You can write:

    name = fake.name()
    color = fake.color_name()
    city = fake.city()
    job = fake.job()

    print(f'Her name is {name}. She lives in {city}. Her favorite color is {color}. She works as a {job}')

Output:

    Her name is Debra Armstrong. She lives in Beanview. Her favorite color is GreenYellow. She works as a Lawyer

With Faker, you can generate a persuasive example instantly!

Location-Specific Data Generation

Luckily, we can also specify the locale of the data we want to fake. Maybe the character you want to create is from Italy, and you also want to create some of her friends. If you are from the US, it is difficult to come up with names that fit that country. That is easily taken care of by passing a locale to the Faker class:

    fake = Faker('it_IT')

    for _ in range(10):
        print(fake.name())

Output:

    Angelica Donarelli-Marangoni
    Rosaria Castiglione
    Federica Iacovelli
    Puccio Armellini
    Dina Donini-Alboni
    Dott. Carolina Marrone
    Olga Nosiglia
    Graziella Russo
    Paulina Galiazzo
    Dott. Riccardo Padovano

Or create information from multiple locations:

    fake = Faker(['ja_JP', 'zh_CN', 'es_ES', 'en_US', 'fr_FR'])

    for _ in range(10):
        print(fake.city())

Output:

    齐齐哈尔市
    Blakefort
    North Joeborough
    玉兰市
    Saint Suzanne-les-Bains
    Melilla
    調布市
    富津市
    Maillot-sur-Mer
    East Jamesshire

If you are from one of these countries, I hope the place names look familiar. If you are curious about the other locales you can specify, check out the Faker documentation.

Create Text

Create Random Text

We can create random text with:

    fake = Faker('en_US')
    print(fake.text())

Output:

    Gas threat perhaps minute energy thus. Relate group science car discussion budget art. Let visit reach senior. Story once list almost. Enough major everyone.

Try with the Vietnamese language:

    fake = Faker('vi_VN')
    print(fake.text())

Output:

    Như không cho số vậy tại đến. Hơn các thay. Khi từ cũng không rất là. Gần được cho có nơi như vẫn cho. Nơi đi về giống. Mà cũng từ nhưng lớn. Từng của nếu khi như nhưng.

None of this random text makes sense, but it is a good way to quickly create text for testing.

Create Text from Selected Words

We can also create sentences from our own list of words:

    fake = Faker()

    my_information = ['dog', 'swimming', '21', 'slow', 'girl', 'coffee', 'flower', 'pink']

    print(fake.sentence(ext_word_list=my_information))
    print(fake.sentence(ext_word_list=my_information))

Output:

    Coffee pink coffee.
    Dog pink 21 pink.

Create Profile Data

We can quickly create a profile with:

    fake = Faker()
    fake.profile()

Output:

    {
        'job': 'Nurse, adult',
        'company': 'Johnson, Moore and Glover',
        'ssn': '762-56-8929',
        'residence': '742 Shane Groves\nLake Jasminefort, GU 12583',
        'current_location': (Decimal('-77.3842165'), Decimal('7.407430')),
        'blood_group': 'B-',
        'website': ['https://brooks.com/'],
        'username': 'brownamanda',
        'name': 'Carolyn Navarro',
        'sex': 'F',
        'address': '505 Lewis Grove Apt. 588\nHowardville, ID 68181',
        'mail': '[email protected]',
        'birthdate': datetime.date(1946, 6, 13)
    }

As we can see, most of the relevant information about a person is created with ease, including mail, ssn, username, and website.

What is even more useful is that we can create a dataframe of 100 users from different countries:

    import pandas as pd

    fake = Faker(['it_IT', 'ja_JP', 'zh_CN', 'de_DE', 'en_US'])
    profiles = [fake.profile() for _ in range(100)]

    pd.DataFrame(profiles).head()

Output:

|   | job | company | ssn | residence | current_location | blood_group | website | username | name | sex | address | mail | birthdate |
|---|-----|---------|-----|-----------|------------------|-------------|---------|----------|------|-----|---------|------|-----------|
| 0 | Physiological scientist | Sobrero-Mazzanti Group | CLGTNO59H42A473Z | Incrocio Cabrini, 14 Appartamento 59\n74100, L… | (-88.2637715, 149.968584) | AB+ | [http://federici-endrizzi.it/, http://www.paru…] | giuliagreco | Dott. Liliana Serraglio | F | Vicolo Milo, 0\n64020, Ripattoni (TE) | [email protected] | 1998-10-10 |
| 1 | 花火師 | 阿部運輸株式会社 | 701-41-9799 | 和歌山県印旛郡本埜村鳥越20丁目23番18号 | (79.245074, 109.117174) | O+ | [https://suzuki.com/, http://ishikawa.jp/] | lyamamoto | 斉藤 明美 | F | 東京都江戸川区神明内40丁目12番20号 | [email protected] | 1916-12-09 |
| 2 | 小説家 | 小林食品株式会社 | 103-28-5057 | 島根県富津市細野7丁目16番1号 | (-84.3304275, 38.093874) | A+ | [https://tanaka.jp/, http://www.fujita.net/, h…] | minoru62 | 渡辺 英樹 | M | 青森県川崎市川崎区長畑22丁目27番12号 | [email protected] | 2008-02-17 |
| 3 | ゲームクリエイター | 佐藤水産有限会社 | 123-85-7967 | 宮城県調布市隼町3丁目22番12号 アーバン台東327 | (-49.3689775, -134.762867) | AB- | [http://www.sato.org/, http://kato.net/, http:…] | ayamamoto | 鈴木 洋介 | M | 栃木県川崎市中原区虎ノ門30丁目27番20号 | [email protected] | 1917-01-25 |
| 4 | 薬剤師 | 合同会社高橋建設 | 891-98-2169 | 山梨県山武郡横芝光町轟4丁目22番10号 コート天神島159 | (-62.1493985, -105.171377) | B+ | [http://yamashita.jp/, http://www.shimizu.com/] | yosukekimura | 田中 真綾 | F | 山口県府中市下吉羽6丁目20番2号 | [email protected] | 2001-08-09 |

Create Random Python Datatypes

If we only care about the type of the data, not its content, we can easily generate random values of built-in Python types such as:

Boolean:

    print(fake.pybool())

Output:

    False

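A list of about 5 elements of mixed types (with variable_nb_elements=True the actual length varies around nb_elements, which is why the output below has 4 elements):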

    print(fake.pylist(nb_elements=5, variable_nb_elements=True))

Output:

    ['[email protected]', 8515, 6618, 'UexWQJkGrJFGBAVfHgUt']

A decimal with 5 left digits and 6 right digits (after the .):

    print(fake.pydecimal(left_digits=5, right_digits=6, positive=False, min_value=None, max_value=None))

Output:

    -26114.564612

You can find out more about the other Python datatypes you can create in the Faker documentation.

Conclusion

I hope you find Faker a helpful tool for creating data efficiently. Even if you do not need it for what you are working on right now, it is good to know that a tool exists for generating data tailored to specific needs such as testing.

Feel free to check out more information about Faker here.
