Overview

Advanced Google Sheets tools extend the existing Google Sheets and Google Spreadsheet tools with additional functionality for comprehensive data access, most notably the ability to retrieve a sheet's complete contents, including cell values, formatting, and metadata.

Key Features

GOOGLE_SHEETS_GET_SHEET

  • Complete sheet data retrieval from Google Sheets
  • Access to all cell values and formatting information
  • Support for multiple sheets within a workbook
  • Efficient data retrieval for large datasets

Authentication

Google Sheets integration requires Google API credentials and appropriate permissions. Required setup:
  1. Google Cloud Console project with Sheets API enabled
  2. Service account credentials or OAuth 2.0 client
  3. Appropriate sharing permissions on target sheets
Note: Treat API credentials as sensitive information and never commit them to public repositories.
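
Once the setup above is complete, a minimal verification step can confirm that the credentials can actually read a target sheet before you build a full workflow. This is only a sketch; the step id and sheet_id wiring are illustrative, while the credentials config follows the same pattern used in all the examples below:

- id: verify_sheet_access
  name: verify_sheet_access
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "{{input.sheet_id}}"  # Any sheet the credentials should be able to read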

Example: Complete Sheet Data Retrieval

- id: get_full_sheet_data
  name: get_full_sheet_data
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms"
    - name: sheet_name
      value: "Sheet1"  # Optional: defaults to first sheet
    - name: include_formatting
      value: true
    - name: include_metadata
      value: true

- id: analyze_sheet_structure
  name: analyze_sheet_structure
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        
        # Analyze retrieved sheet data
        sheet_data = json.loads('''{{steps.get_full_sheet_data.result.sheet_data}}''')
        
        print("Google Sheet Analysis")
        print("=" * 25)
        
        # Basic information
        print(f"Sheet Name: {sheet_data.get('sheet_name', 'Unknown')}")
        print(f"Sheet ID: {sheet_data.get('sheet_id', 'Unknown')}")
        
        # Data dimensions
        rows = sheet_data.get('rows', [])
        if rows:
            max_cols = max(len(row) for row in rows)
            print(f"Dimensions: {len(rows)} rows × {max_cols} columns")
            
            # Data preview
            print("\nData Preview (first 5 rows):")
            for i, row in enumerate(rows[:5], 1):
                preview = row[:5]  # Show at most the first 5 columns
                suffix = " ..." if len(row) > 5 else ""
                print(f"Row {i}: {preview}{suffix}")
        else:
            print("No data found in sheet")
        
        # Metadata
        if 'metadata' in sheet_data:
            metadata = sheet_data['metadata']
            print(f"\nMetadata:")
            print(f"  Last Modified: {metadata.get('modified_time', 'Unknown')}")
            print(f"  Owner: {metadata.get('owner', 'Unknown')}")

- id: convert_to_structured_format
  name: convert_to_structured_format
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        
        sheet_data = json.loads('''{{steps.get_full_sheet_data.result.sheet_data}}''')
        rows = sheet_data.get('rows', [])
        
        if not rows:
            print("No data to convert")
        else:
            # Assume first row contains headers
            headers = rows[0]
            data_rows = rows[1:]  # Empty when the sheet has only a header row
            
            # Convert to array of objects
            structured_data = []
            for row in data_rows:
                row_obj = {}
                for i, header in enumerate(headers):
                    value = row[i] if i < len(row) else ""
                    row_obj[header] = value
                structured_data.append(row_obj)
            
            # Output structured data
            result = {
                "headers": headers,
                "data": structured_data,
                "record_count": len(structured_data)
            }
            
            print(json.dumps(result, indent=2))

- id: export_structured_data
  name: export_structured_data
  tool: OUTPUT_FILE
  input:
    - name: content
      value: "{{steps.convert_to_structured_format.result.stdout}}"
    - name: filename
      value: "google_sheets_export_{{date | format('YYYY-MM-DD')}}.json"
    - name: fileType
      value: "json"

Example: Multi-Sheet Workbook Processing

- id: get_workbook_info
  name: get_workbook_info
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "{{input.workbook_id}}"
    - name: get_sheet_list
      value: true  # Get list of all sheets in workbook

- id: process_all_sheets
  name: process_all_sheets
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        
        # Get list of sheets from workbook
        workbook_info = json.loads('''{{steps.get_workbook_info.result.workbook_info}}''')
        sheets = workbook_info.get('sheets', [])
        
        print("Multi-Sheet Processing Plan")
        print("=" * 30)
        print(f"Workbook ID: {workbook_info.get('workbook_id', 'Unknown')}")
        print(f"Total Sheets: {len(sheets)}")
        print("\nSheets to process:")
        
        processing_plan = []
        for i, sheet in enumerate(sheets):
            sheet_name = sheet.get('name', f'Sheet{i+1}')
            sheet_info = {
                "sheet_name": sheet_name,
                "step_id": f"get_sheet_{i+1}",
                "processing_order": i+1
            }
            processing_plan.append(sheet_info)
            print(f"  {i+1}. {sheet_name}")
        
        print(f"\nProcessing plan created for {len(processing_plan)} sheets")
        print(json.dumps(processing_plan, indent=2))

- id: get_sales_sheet
  name: get_sales_sheet
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "{{input.workbook_id}}"
    - name: sheet_name
      value: "Sales Data"
    - name: include_formatting
      value: true

- id: get_inventory_sheet
  name: get_inventory_sheet
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "{{input.workbook_id}}"
    - name: sheet_name
      value: "Inventory"
    - name: include_formatting
      value: true

- id: consolidate_workbook_data
  name: consolidate_workbook_data
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        from datetime import datetime
        
        # Consolidate data from multiple sheets
        sales_data = json.loads('''{{steps.get_sales_sheet.result.sheet_data}}''')
        inventory_data = json.loads('''{{steps.get_inventory_sheet.result.sheet_data}}''')
        
        consolidated_report = {
            "report_date": datetime.now().isoformat(),
            "workbook_id": "{{input.workbook_id}}",
            "sheets_processed": [],
            "summary_statistics": {},
            "data_quality_checks": []
        }
        
        # Process sales data
        if sales_data.get('rows'):
            sales_rows = len(sales_data['rows']) - 1  # Exclude header
            consolidated_report["sheets_processed"].append({
                "sheet_name": "Sales Data",
                "row_count": sales_rows,
                "status": "processed"
            })
        
        # Process inventory data
        if inventory_data.get('rows'):
            inventory_rows = len(inventory_data['rows']) - 1  # Exclude header
            consolidated_report["sheets_processed"].append({
                "sheet_name": "Inventory",
                "row_count": inventory_rows,
                "status": "processed"
            })
        
        # Calculate summary statistics
        total_rows = sum(sheet["row_count"] for sheet in consolidated_report["sheets_processed"])
        consolidated_report["summary_statistics"] = {
            "total_data_rows": total_rows,
            "sheets_processed": len(consolidated_report["sheets_processed"]),
            "processing_time": "{{date | format('HH:mm:ss')}}"
        }
        
        print("Workbook Consolidation Complete")
        print("=" * 35)
        print(json.dumps(consolidated_report, indent=2))

- id: save_consolidated_report
  name: save_consolidated_report
  tool: OUTPUT_FILE
  input:
    - name: content
      value: "{{steps.consolidate_workbook_data.result.stdout}}"
    - name: filename
      value: "workbook_consolidation_report.json"
    - name: fileType
      value: "json"

Example: Data Quality Analysis

- id: retrieve_data_for_analysis
  name: retrieve_data_for_analysis
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "{{input.analysis_sheet_id}}"
    - name: sheet_name
      value: "{{input.sheet_name}}"
    - name: include_formatting
      value: true
    - name: include_metadata
      value: true

- id: perform_data_quality_analysis
  name: perform_data_quality_analysis
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json
        import re
        from collections import Counter
        
        # Analyze data quality
        sheet_data = json.loads('''{{steps.retrieve_data_for_analysis.result.sheet_data}}''')
        rows = sheet_data.get('rows', [])
        
        if not rows:
            print("No data found for analysis")
        else:
            headers = rows[0]
            data_rows = rows[1:]
            
            print("Data Quality Analysis Report")
            print("=" * 35)
            print(f"Dataset: {sheet_data.get('sheet_name', 'Unknown')}")
            print(f"Total Records: {len(data_rows)}")
            print(f"Total Columns: {len(headers)}")
            print()
            
            # Column-by-column analysis
            quality_report = {
                "overall_stats": {
                    "total_records": len(data_rows),
                    "total_columns": len(headers),
                    "completeness_score": 0
                },
                "column_analysis": [],
                "data_issues": []
            }
            
            total_cells = 0
            empty_cells = 0
            
            for col_idx, header in enumerate(headers):
                col_values = [row[col_idx] if col_idx < len(row) else "" for row in data_rows]
                
                # Calculate column statistics
                non_empty_values = [v for v in col_values if str(v).strip()]
                empty_count = len(col_values) - len(non_empty_values)
                completeness = (len(non_empty_values) / len(col_values)) * 100 if col_values else 0
                
                # Data type detection
                numeric_count = sum(1 for v in non_empty_values if str(v).replace('.', '').replace('-', '').isdigit())
                date_count = sum(1 for v in non_empty_values if re.match(r'\d{1,2}[/-]\d{1,2}[/-]\d{2,4}', str(v)))
                email_count = sum(1 for v in non_empty_values if '@' in str(v) and '.' in str(v))
                
                likely_type = "text"
                if numeric_count / len(non_empty_values) > 0.8 if non_empty_values else False:
                    likely_type = "numeric"
                elif date_count / len(non_empty_values) > 0.5 if non_empty_values else False:
                    likely_type = "date"
                elif email_count / len(non_empty_values) > 0.5 if non_empty_values else False:
                    likely_type = "email"
                
                col_analysis = {
                    "column_name": header,
                    "completeness_percent": round(completeness, 2),
                    "empty_cells": empty_count,
                    "likely_data_type": likely_type,
                    "unique_values": len(set(non_empty_values)),
                    "sample_values": non_empty_values[:3]
                }
                
                quality_report["column_analysis"].append(col_analysis)
                
                total_cells += len(col_values)
                empty_cells += empty_count
                
                # Identify potential issues
                if completeness < 90:
                    quality_report["data_issues"].append(f"Column '{header}' has low completeness ({completeness:.1f}%)")
                
                if len(set(non_empty_values)) == 1 and non_empty_values:
                    quality_report["data_issues"].append(f"Column '{header}' has only one unique value")
            
            # Overall completeness score
            overall_completeness = ((total_cells - empty_cells) / total_cells) * 100 if total_cells > 0 else 0
            quality_report["overall_stats"]["completeness_score"] = round(overall_completeness, 2)
            
            print(f"Overall Data Completeness: {overall_completeness:.1f}%")
            print()
            
            # Display column analysis
            print("Column Analysis:")
            for col in quality_report["column_analysis"]:
                print(f"  📊 {col['column_name']}")
                print(f"     Completeness: {col['completeness_percent']}%")
                print(f"     Type: {col['likely_data_type']}")
                print(f"     Unique Values: {col['unique_values']}")
                print()
            
            # Display issues
            if quality_report["data_issues"]:
                print("🚨 Data Quality Issues:")
                for issue in quality_report["data_issues"]:
                    print(f"  - {issue}")
            else:
                print("✅ No major data quality issues detected")
            
            print("\n" + json.dumps(quality_report, indent=2))

- id: create_quality_recommendations
  name: create_quality_recommendations
  tool: OPENAI_INVOKE
  config:
    - name: version
      value: gpt-4
  input:
    - name: prompt
      value: |
        Based on this data quality analysis report, please provide specific recommendations for improving data quality:
        
        {{steps.perform_data_quality_analysis.result.stdout}}
        
        Please provide:
        1. Priority issues to address first
        2. Specific steps to improve data completeness
        3. Data validation rules to implement
        4. Best practices for ongoing data maintenance
        5. Suggested data cleaning workflows

- id: save_quality_analysis_report
  name: save_quality_analysis_report
  tool: OUTPUT_FILE
  input:
    - name: content
      value: |
        Google Sheets Data Quality Analysis Report
        ==========================================
        
        Generated: {{date | format('YYYY-MM-DD HH:mm:ss')}}
        
        ## Analysis Results
        {{steps.perform_data_quality_analysis.result.stdout}}
        
        ## Recommendations
        {{steps.create_quality_recommendations.result.content}}
        
        ## Next Steps
        1. Address high-priority data quality issues
        2. Implement recommended data validation rules
        3. Set up regular data quality monitoring
        4. Train users on data entry best practices
        5. Consider automated data cleaning workflows
    - name: filename
      value: "data_quality_analysis_{{date | format('YYYY-MM-DD')}}.txt"
    - name: fileType
      value: "txt"

Sheet Retrieval Options

Basic Retrieval

  • sheet_id: Google Sheets document ID
  • sheet_name: Specific sheet tab name (optional)
  • range: Specific cell range (optional; see the sketch after these lists)

Advanced Options

  • include_formatting: Include cell formatting information
  • include_metadata: Include sheet metadata (owner, modified date, etc.)
  • get_sheet_list: Retrieve list of all sheets in workbook
  • preserve_empty_cells: Include empty cells in output
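
As an illustrative sketch of how these options combine (the input wiring follows the same pattern as the examples above, and the sheet_id and range values are placeholders), a range-limited retrieval that preserves empty cells might look like this:

- id: get_sheet_range
  name: get_sheet_range
  tool: GOOGLE_SHEETS_GET_SHEET
  config:
    - name: credentials
      value: "{{secrets.GOOGLE_SHEETS_CREDENTIALS}}"
  input:
    - name: sheet_id
      value: "{{input.sheet_id}}"
    - name: sheet_name
      value: "Sheet1"
    - name: range
      value: "A1:D100"  # Restrict retrieval to a specific cell range
    - name: preserve_empty_cells
      value: true  # Keep empty cells so row lengths stay consistent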

Data Format

Standard Response

{
  "sheet_id": "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms",
  "sheet_name": "Sheet1",
  "rows": [
    ["Header 1", "Header 2", "Header 3"],
    ["Value 1", "Value 2", "Value 3"],
    ["Value 4", "Value 5", "Value 6"]
  ],
  "metadata": {
    "modified_time": "2024-01-15T10:30:00Z",
    "owner": "user@example.com"
  }
}

With Formatting

{
  "sheet_data": {
    "rows": [...],
    "formatting": {
      "cell_formats": {
        "A1": {"bold": true, "background_color": "#ffff00"},
        "B1": {"italic": true, "text_color": "#ff0000"}
      }
    }
  }
}
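
The examples above request formatting with include_formatting but never consume it. As a sketch that assumes the response shape shown here, a follow-up step could inspect the cell formats like this:

- id: inspect_cell_formatting
  name: inspect_cell_formatting
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json

        # Assumes the "With Formatting" response shape shown above
        sheet_data = json.loads('''{{steps.get_full_sheet_data.result.sheet_data}}''')
        cell_formats = sheet_data.get('formatting', {}).get('cell_formats', {})

        # Collect cells flagged as bold, e.g. to locate header rows
        bold_cells = [ref for ref, fmt in cell_formats.items() if fmt.get('bold')]
        print(f"Formatted cells: {len(cell_formats)}")
        print(f"Bold cells: {bold_cells}")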

Use Cases

  • Data Analysis: Retrieve complete datasets for analysis
  • Reporting: Generate reports from Google Sheets data
  • Data Migration: Export data from Google Sheets to other systems
  • Quality Assurance: Analyze data quality and completeness
  • Backup and Archive: Create backups of important spreadsheets
  • Integration: Connect Google Sheets data to other workflow tools
  • Dashboard Creation: Pull data for dashboard and visualization tools
  • Compliance Reporting: Extract data for regulatory reporting

Best Practices

Performance Optimization

  • Retrieve only necessary data ranges
  • Use appropriate inclusion options (formatting, metadata)
  • Consider pagination for very large sheets
  • Cache frequently accessed data

Data Handling

  • Always validate retrieved data structure (see the sketch after this list)
  • Handle empty cells and missing data gracefully
  • Implement error handling for access permission issues
  • Consider data privacy and security requirements
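
A minimal defensive-parsing sketch along these lines, assuming the standard response shape shown earlier:

- id: validate_sheet_response
  name: validate_sheet_response
  tool: PYTHON_SANDBOX_RUN
  input:
    - name: code
      value: |
        import json

        raw = '''{{steps.get_full_sheet_data.result.sheet_data}}'''

        try:
            sheet_data = json.loads(raw)
        except json.JSONDecodeError as exc:
            # A failed retrieval (e.g. a permissions error) may not return valid JSON
            print(f"Could not parse sheet data: {exc}")
            sheet_data = {}

        rows = sheet_data.get('rows') or []
        if not rows:
            print("No rows retrieved; check the sheet_id and sharing permissions")
        else:
            # Pad ragged rows so downstream code can index columns safely
            width = max(len(row) for row in rows)
            rows = [row + [""] * (width - len(row)) for row in rows]
            print(f"Validated {len(rows)} rows × {width} columns")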

Integration Strategy

  • Use structured data formats for downstream processing
  • Implement data quality checks after retrieval
  • Log data retrieval operations for audit trails
  • Plan for rate limiting and quota management