- Introduction — What’s Tablib?
- Working with Datasets
- Importing Data
- Exporting Data
- Dynamic Columns
- Formatters
- Wrapping Up
For a few years I’ve been working with tools like Pandas and PySpark in Python for data ingestion, data processing, and data exporting. These tools are great for complex data transformations and large data sizes (Pandas, as long as the data fits in memory). However, I’ve often used these tools when the following conditions apply:
- The data size is relatively small. Think well under 100,000 rows of data.
- Performance is not an issue at all. Think of a one-off job, or a job that repeats at midnight every night where I don’t care whether it takes 20 seconds or 5 minutes.
- There are no complex transformations needed. Think of simply importing 20 JSON files with the same format, stacking them on top of each other, and then exporting the result as a CSV file.