Table of Contents
- Motivation
- Basics of Faker
- Location-Specific Data Generation
- Create Text
- Create Profile Data
- Create Random Python Datatypes
- Conclusion
Motivation
Let’s say you want to create data of certain types (bool, float, text, integers) with specific characteristics (names, addresses, colors, emails, phone numbers, locations) to test a Python library or a specific implementation. Finding that kind of data takes time. You wonder: is there a quick way to create your own data?
What if there is a package that enables you to create fake data in one line of code such as this:

```python
fake.profile()
```

```text
{
 'address': '076 Steven Trace\nJillville, ND 12393',
 'birthdate': datetime.date(1981, 11, 19),
 'blood_group': 'O-',
 'company': 'Johnson-Rodriguez',
 'current_location': (Decimal('61.969848'), Decimal('121.407164')),
 'job': 'Patent examiner',
 'mail': '[email protected]',
 'name': 'Katie Romero',
 'residence': '271 Smith Wells\nMichaelport, MN 40933',
 'sex': 'F',
 'ssn': '281-84-3963',
 'username': 'eparker',
 'website': ['https://www.gonzalez.com/', 'https://rogers-scott.com/']
}
```

This can be done with Faker, a Python package that generates fake data for you. You can control the data type, specific characteristics of the data, and its origin or language. Let’s discover how we can use Faker to create fake data.
💻 Get the Code: The complete source code and Jupyter notebook for this tutorial are available on GitHub. Clone it to follow along!
Basics of Faker
Start with installing the package:

```bash
pip install Faker
```

Import Faker:

```python
from faker import Faker

fake = Faker()
```

Some basic methods of Faker:

```python
print(fake.color_name())
print(fake.name())
print(fake.address())
print(fake.job())
print(fake.date_of_birth(minimum_age=30))
print(fake.city())
```

```text
Tan
Kristin Buck
715 Peter Views
Abigailport, ME 57602
Systems analyst
1946-03-07
Evanmouth
```

Let’s say you are the author of a fiction book who wants to create a character but finds it difficult and time-consuming to come up with a realistic name and background. You can write:

```python
name = fake.name()
color = fake.color_name()
city = fake.city()
job = fake.job()

print(f'Her name is {name}. She lives in {city}. Her favorite color is {color}. She works as a {job}')
```

```text
Her name is Debra Armstrong. She lives in Beanview. Her favorite color is GreenYellow. She works as a Lawyer
```

With Faker, you can generate a convincing example instantly!
Location-Specific Data Generation
Luckily, we can also specify the location of the data we want to fake. Maybe the character you want to create is from Italy, and you also want to create her friends. If you are from the US, it is difficult to come up with information that fits that location. That is easily taken care of by passing a locale to the Faker class:

```python
fake = Faker('it_IT')

for _ in range(10):
    print(fake.name())
```

```text
Angelica Donarelli-Marangoni
Rosaria Castiglione
Federica Iacovelli
Puccio Armellini
Dina Donini-Alboni
Dott. Carolina Marrone
Olga Nosiglia
Graziella Russo
Paulina Galiazzo
Dott. Riccardo Padovano
```

Or create information from multiple locations:

```python
fake = Faker(['ja_JP', 'zh_CN', 'es_ES', 'en_US', 'fr_FR'])

for _ in range(10):
    print(fake.city())
```

```text
齐齐哈尔市
Blakefort
North Joeborough
玉兰市
Saint Suzanne-les-Bains
Melilla
調布市
富津市
Maillot-sur-Mer
East Jamesshire
```

If you are from one of these countries, I hope you recognize the style of the place names. If you are curious about the other locales you can specify, check out the Faker documentation.
Create Text
Create Random Text
We can create random text with:

```python
fake = Faker('en_US')
print(fake.text())
```

```text
Gas threat perhaps minute energy thus. Relate group science car discussion budget art. Let visit reach senior. Story once list almost. Enough major everyone.
```

Try with the Vietnamese language:

```python
fake = Faker('vi_VN')
print(fake.text())
```

```text
Như không cho số vậy tại đến. Hơn các thay. Khi từ cũng không rất là. Gần được cho có nơi như vẫn cho. Nơi đi về giống. Mà cũng từ nhưng lớn. Từng của nếu khi như nhưng.
```

None of this random text makes sense, but it is a good way to quickly create text for testing.
Create Text from Selected Words
We can also create text from a list of words:

```python
fake = Faker()

my_information = ['dog', 'swimming', '21', 'slow', 'girl', 'coffee', 'flower', 'pink']

print(fake.sentence(ext_word_list=my_information))
print(fake.sentence(ext_word_list=my_information))
```

```text
Coffee pink coffee.
Dog pink 21 pink.
```

Create Profile Data
We can quickly create a profile with:

```python
fake = Faker()
fake.profile()
```

```text
{'job': 'Nurse, adult',
 'company': 'Johnson, Moore and Glover',
 'ssn': '762-56-8929',
 'residence': '742 Shane Groves\nLake Jasminefort, GU 12583',
 'current_location': (Decimal('-77.3842165'), Decimal('7.407430')),
 'blood_group': 'B-',
 'website': ['https://brooks.com/'],
 'username': 'brownamanda',
 'name': 'Carolyn Navarro',
 'sex': 'F',
 'address': '505 Lewis Grove Apt. 588\nHowardville, ID 68181',
 'mail': '[email protected]',
 'birthdate': datetime.date(1946, 6, 13)}
```

As we can see, most of the relevant information about a person is created with ease, including mail, ssn, username, and website.
What is even more useful is that we can create a dataframe of 100 users from different countries:

```python
import pandas as pd

fake = Faker(['it_IT', 'ja_JP', 'zh_CN', 'de_DE', 'en_US'])

profiles = [fake.profile() for i in range(100)]

pd.DataFrame(profiles).head()
```

|  | job | company | ssn | residence | current_location | blood_group | website | username | name | sex | address | mail | birthdate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Physiological scientist | Sobrero-Mazzanti Group | CLGTNO59H42A473Z | Incrocio Cabrini, 14 Appartamento 59\n74100, L… | (-88.2637715, 149.968584) | AB+ | [http://federici-endrizzi.it/, http://www.paru…] | giuliagreco | Dott. Liliana Serraglio | F | Vicolo Milo, 0\n64020, Ripattoni (TE) | [email protected] | 1998-10-10 |
| 1 | 花火師 | 阿部運輸株式会社 | 701-41-9799 | 和歌山県印旛郡本埜村鳥越20丁目23番18号 | (79.245074, 109.117174) | O+ | [https://suzuki.com/, http://ishikawa.jp/] | lyamamoto | 斉藤 明美 | F | 東京都江戸川区神明内40丁目12番20号 | [email protected] | 1916-12-09 |
| 2 | 小説家 | 小林食品株式会社 | 103-28-5057 | 島根県富津市細野7丁目16番1号 | (-84.3304275, 38.093874) | A+ | [https://tanaka.jp/, http://www.fujita.net/, h…] | minoru62 | 渡辺 英樹 | M | 青森県川崎市川崎区長畑22丁目27番12号 | [email protected] | 2008-02-17 |
| 3 | ゲームクリエイター | 佐藤水産有限会社 | 123-85-7967 | 宮城県調布市隼町3丁目22番12号 アーバン台東327 | (-49.3689775, -134.762867) | AB- | [http://www.sato.org/, http://kato.net/, http:…] | ayamamoto | 鈴木 洋介 | M | 栃木県川崎市中原区虎ノ門30丁目27番20号 | [email protected] | 1917-01-25 |
| 4 | 薬剤師 | 合同会社高橋建設 | 891-98-2169 | 山梨県山武郡横芝光町轟4丁目22番10号 コート天神島159 | (-62.1493985, -105.171377) | B+ | [http://yamashita.jp/, http://www.shimizu.com/] | yosukekimura | 田中 真綾 | F | 山口県府中市下吉羽6丁目20番2号 | [email protected] | 2001-08-09 |
Create Random Python Datatypes
If you only care about the type of the data and not about its content, you can easily generate random Python datatypes such as:
Boolean:

```python
print(fake.pybool())
```

```text
False
```

A list of about 5 elements with mixed data types (the length varies because `variable_nb_elements=True`, which is why the output below has 4 items):

```python
print(fake.pylist(nb_elements=5, variable_nb_elements=True))
```

```text
['[email protected]', 8515, 6618, 'UexWQJkGrJFGBAVfHgUt']
```

A decimal with 5 digits to the left of the decimal point and 6 digits to the right:

```python
print(fake.pydecimal(left_digits=5, right_digits=6, positive=False, min_value=None, max_value=None))
```

```text
-26114.564612
```

You can find the other Python datatypes you can create in the Faker documentation.
Conclusion
I hope you find Faker a helpful tool for creating data efficiently. Even if you do not need it for what you are working on right now, it is helpful to know that a tool exists for generating data tailored to specific needs such as testing.
For more information about Faker, check out its documentation.