Overview
What is Sprout?
Sprout is a Python package that helps you create and manage a standardised, organised structure for storing and describing research data. Using modern data engineering and management practices, Sprout helps ensure that research data are well-designed, discoverable, well-documented, and ultimately more (re)usable for later analyses.
Specifically, Sprout does this by building on the Frictionless Data Package standard, storing the data in the Apache Parquet format, and using a specific directory structure to organise the data into a “data package” with one or more “data resources”.
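To make the idea of a “data package” with “data resources” more concrete, here is a minimal sketch that uses only the Python standard library, not Sprout itself. It builds a small Data Package descriptor (`datapackage.json`) with one entry per resource, each pointing at a Parquet file. The helper names `build_descriptor` and `write_package`, the package and resource names, and the `resources/<name>/data.parquet` layout are assumptions made for illustration; they are not necessarily the names or layout Sprout uses.

```python
import json
import tempfile
from pathlib import Path


def build_descriptor(package_name: str, resource_names: list[str]) -> dict:
    """Return a minimal Data Package descriptor with one entry per resource."""
    return {
        "name": package_name,
        "resources": [
            {
                "name": name,
                # Hypothetical layout: one folder per resource holding a Parquet file.
                "path": f"resources/{name}/data.parquet",
                "format": "parquet",
            }
            for name in resource_names
        ],
    }


def write_package(root: Path, descriptor: dict) -> Path:
    """Create the resource folders and write datapackage.json under root."""
    for resource in descriptor["resources"]:
        (root / resource["path"]).parent.mkdir(parents=True, exist_ok=True)
    descriptor_path = root / "datapackage.json"
    descriptor_path.write_text(json.dumps(descriptor, indent=2))
    return descriptor_path


# Illustrative names only; no real data files are written.
descriptor = build_descriptor("example-study", ["participants", "measurements"])
with tempfile.TemporaryDirectory() as tmp:
    descriptor_path = write_package(Path(tmp), descriptor)
    print(descriptor_path.read_text())
```

The descriptor is what links the data to its metadata: each resource entry names a file and can carry further properties describing it, such as a schema.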
Why use it?
There are many tools for processing, cleaning, and transforming data, and for storing them in databases on servers and building data warehouses. However, few tools are specifically designed to create structured and organised data that are directly linked to their metadata, and even fewer assist in creating and structuring that metadata. Without metadata, data are nearly unusable.
Sprout integrates a few key standards and practices to help you build a well-structured dataset together with its metadata. That makes it easier to find the right data for an analysis, to understand what is actually in the data, and to use the data later on.
In particular, because Sprout is a Python package, you can use Python as you normally would to help build up your data. You don’t need to learn another tool or use a new interface. It’s just code.
We’ve found few tools that do what Sprout does. The closest ones we know of have quite different purposes and use-cases:
- Open Data Editor: A web- and click-based tool for creating and editing data packages.
- gen3: A web-based platform for building data commons.
Learning more
This website contains documentation on Sprout’s design, how to use it, and details about the interface.
- How-to guide: The guide section provides a step-by-step introduction to using Sprout, including installation, setup, and creating and managing data packages.
- Reference: The reference section contains detailed information about Sprout’s Python API, specifically the documentation of individual classes, methods, and functions.
- Design: The design section describes the architecture and interface of Sprout, the requirements, use-cases, user personas, and the C4 model.
Contributors
These are the people who have contributed to Sprout by submitting changes through pull requests 🎉