Project Scope

Project Description
The Cloud Data Extractor is a Python library that simplifies and standardizes downloading structured data from Google Cloud Firestore and transforming it into formats commonly used in data analysis, machine learning, and application development. The library is designed to be lightweight and easy to integrate. It will handle Google service account credentials, wrap the Firestore API, and perform the repetitive transformations needed to convert documents into pandas DataFrames or other usable structures.
Problems and Benefits of Solving Them
  • Problem: Firestore requires setup that can be complicated for users.
    Benefit: The library will make connecting GCP Firestore to an app easier and more user-friendly.
  • Problem: There is no built-in way to convert Firestore documents to pandas DataFrames.
    Benefit: The library will provide methods that allow users to convert fetched data into pandas DataFrames and other useful data structures.
  • Problem: There is no easy mechanism to reuse Firestore data across different applications.
    Benefit: Using the library will spare users from duplicating Firestore connection code across projects.
  • Problem: Users need an easy way to interact with the library.
    Benefit: An interface class will give users a simple entry point to the library.
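The document-to-DataFrame conversion mentioned above could look roughly like the following sketch. It uses only the standard library and turns (document ID, document data) pairs into flat records. The function names `flatten` and `documents_to_records` and the `doc_id` column name are hypothetical and not part of any agreed design.

```python
from typing import Any, Dict, Iterable, List, Tuple


def flatten(data: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested maps into dotted names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested maps.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Scalars, lists and empty maps are kept as-is.
            flat[name] = value
    return flat


def documents_to_records(
    docs: Iterable[Tuple[str, Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Turn (document ID, document data) pairs into one flat dict per document.

    Hypothetical helper: a document field that is itself called "doc_id"
    would overwrite the document ID column in this simple version.
    """
    records = []
    for doc_id, data in docs:
        record: Dict[str, Any] = {"doc_id": doc_id}
        record.update(flatten(data))
        records.append(record)
    return records


# Stand-in data; no Firestore connection is needed to try the sketch.
sample = [
    ("u1", {"name": "Ada", "address": {"city": "London"}}),
    ("u2", {"name": "Linus"}),
]
records = documents_to_records(sample)
print(records)
```

With the real client, the pairs could be produced from `client.collection(name).stream()`, using each snapshot's `id` and `to_dict()`, and the resulting list can be passed to `pandas.DataFrame(records)`. Fields missing from a document then appear as missing values in the DataFrame.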
Indicative Schedule
Project Start Date: 29.11.2025
Indicative Project End Date: 01.02.2026
Solution Scope
During implementation the developer will:
  • Design the overall library structure using a modern Python package layout.
  • Implement a credential-loading mechanism that securely reads Google service account keys and validates them.
  • Implement high-level data extraction functions.
  • Ensure compatibility with multiple Firestore projects, allowing users to specify project IDs on initialization.
  • Add error-handling and custom exception classes for clearer debugging and user-friendly messages.
  • Implement example scripts that show the library in action.
  • Prepare comprehensive documentation for the project.
  • Write tests for the library's methods.
  • Ensure the library follows clean code standards, including docstrings and modular design.
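As an illustration of the credential-loading and error-handling items above, here is a minimal sketch that validates a service account key file before any connection is attempted. The names `ExtractorError`, `CredentialsError` and `load_service_account_info` are hypothetical. The checked fields (`type`, `project_id`, `private_key`, `client_email`) are fields that service account key files contain. The sketch checks only the file's structure, not whether Google accepts the key.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class ExtractorError(Exception):
    """Hypothetical base class for all errors raised by the library."""


class CredentialsError(ExtractorError):
    """Raised when a service account key file is missing or malformed."""


REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_info(path: str) -> Dict[str, Any]:
    """Read a service account key file and check its basic structure."""
    key_path = Path(path)
    if not key_path.is_file():
        raise CredentialsError(f"Key file not found: {key_path}")
    try:
        info = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise CredentialsError(f"Key file is not valid JSON: {key_path}") from exc
    if not isinstance(info, dict):
        raise CredentialsError("Key file must contain a JSON object")
    missing = [field for field in REQUIRED_KEY_FIELDS if not info.get(field)]
    if missing:
        raise CredentialsError(
            "Key file is missing required fields: " + ", ".join(missing)
        )
    if info["type"] != "service_account":
        raise CredentialsError(
            f"Expected type 'service_account', got {info['type']!r}"
        )
    return info


# Demonstration with throwaway files; the key material is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "key.json"
    good.write_text(
        json.dumps(
            {
                "type": "service_account",
                "project_id": "demo-project",
                "private_key": "placeholder-not-a-real-key",
                "client_email": "demo@demo-project.iam.gserviceaccount.com",
            }
        ),
        encoding="utf-8",
    )
    info = load_service_account_info(str(good))

    bad = Path(tmp) / "bad.json"
    bad.write_text(json.dumps({"type": "service_account"}), encoding="utf-8")
    try:
        load_service_account_info(str(bad))
        error_message = None
    except CredentialsError as exc:
        error_message = str(exc)

print(info["project_id"])
print(error_message)
```

A validated dictionary could then be handed to `google.oauth2.service_account.Credentials.from_service_account_info(info)`, and the resulting credentials passed to `google.cloud.firestore.Client(project=..., credentials=...)` together with the project ID chosen at initialization.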
Project Assumptions
  1. The user possesses valid Google Cloud Platform (GCP) credentials, specifically a service account JSON with Firestore read permissions.
  2. A Firestore database exists in the target GCP project and contains at least one collection.
  3. The user knows their GCP project ID and Firestore collection names.
  4. Internet access is available when using the library to connect to Firestore.
  5. The library will be run in a Python 3.9+ environment.
  6. Firestore data structures are reasonably consistent: documents within a collection share mostly the same fields and field types.
  7. Firestore collections are expected to contain a manageable amount of data that can fit into memory when converted to a DataFrame.
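The schema-consistency assumption can be checked cheaply before building a DataFrame. The sketch below reports, for each field, the share of records that contain it. The helper name `field_coverage` is hypothetical, and the records are assumed to be plain dictionaries such as those returned by a snapshot's `to_dict()`.

```python
from collections import Counter
from typing import Any, Dict, List


def field_coverage(records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return, for each field, the fraction of records in which it appears."""
    total = len(records)
    if total == 0:
        return {}
    # Each record contributes at most one count per field name.
    counts = Counter(field for record in records for field in record)
    return {field: count / total for field, count in counts.items()}


# Stand-in records; "age" is missing from one of the four documents.
sample = [
    {"name": "Ada", "age": 36},
    {"name": "Linus"},
    {"name": "Grace", "age": 85},
    {"name": "Alan", "age": 41},
]
coverage = field_coverage(sample)
sparse_fields = sorted(f for f, share in coverage.items() if share < 1.0)
print(coverage)
print(sparse_fields)
```

Fields with coverage below 1.0 will show up as missing values after conversion, so a report like this gives users an early warning when a collection is less uniform than assumed.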
Key Risks and Ratings
  • Invalid or misconfigured GCP credentials provided by users: High
  • Insufficient Firestore service permissions: Medium
  • Network issues while using the library: Medium
  • Datasets too large to fit into memory: Medium
  • Firestore schema inconsistency: Medium