Skip to content

Conversation

@devin-ai-integration
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot commented Dec 17, 2025

Summary

This PR merges multiple fix branches into a single branch for coordinated release:

From PR #21 (SUPPORT-14412): Instagram V2 parity fixes including token selection logic based on fb_page_id presence, filtering Facebook Page IDs from Instagram insights queries, graceful error handling for recoverable errors (business conversion, 30-day limit, object not found), improved pagination timestamp handling, and proactive 30-day date range validation.

From PR #22 (SUPPORT-14107): Facebook Ads action breakdown query fixes including DSL parameter extraction (level, action_breakdowns, date_preset, time_increment, breakdowns, fields list) and ensuring account_id is always included for backwards compatibility.

From PR #18: Fix for Pydantic validation error when UI sends empty list [] for accounts field instead of empty dict {}.

A merge conflict in page_loader.py was resolved by including both sets of changes to _build_params: DSL parameter extraction runs first, followed by the 30-day validation call.

Updates since last revision

Fixed CI workflow ordering (critical fix):

  • Root cause: Platform tests were failing with "Failed to get hash for image" because they tried to use the branch image tag before it was pushed to ECR
  • Added push-staging job that pushes all 3 component images to ECR with push_latest: false AFTER unit tests pass
  • Platform tests (tests-kbc-*) now depend on push-staging instead of build
  • Final push jobs only run on semantic version tags (is_deploy_ready == 'true') with push_latest: true
  • Added scripts/validate_test_outputs.sh to validate output tables exist after platform tests

Previous updates:

  • Added CI/CD platform tests for all three components (Pages, Ads, Instagram)
  • Added comprehensive unit tests (21 tests) for component_id guards
  • Added component_id guards for Instagram-specific logic
  • Pass component_id from Component to FacebookClient and PageLoader constructors

Review & Testing Checklist for Human

  • CRITICAL: Verify CI passes - The workflow ordering fix should resolve "Failed to get hash for image" errors. All 3 platform test jobs should now run successfully.
  • CRITICAL: Verify KBC_STORAGE_TOKEN secret is configured in GitHub repository settings for project 5011
  • CRITICAL: Verify test configs have valid OAuth tokens - Meta tokens expire, check configs in project 5011 still work
  • Review workflow DAG: Verify the job ordering is correct: buildtestspush-stagingtests-kbc-* → (on release) push-*deploy
  • Test Facebook Ads extraction: Run a job and verify act_* accounts are NOT filtered out and no 30-day validation errors occur
  • Test SUPPORT-14107 customer config: Re-run job in project 15170 with sync query using DSL syntax

Recommended test plan:

  1. Wait for CI to complete on this branch - verify all three platform test jobs pass (this is the main verification for the workflow fix)
  2. If CI passes, deploy using image tag from this branch's CI build
  3. Re-run Facebook Ads job (152270357) - should complete without "Skipping account(s)" warnings
  4. Re-run SUPPORT-14412 customer job (Instagram V2) - should complete with graceful error handling

Notes

The workflow now has two push phases:

  • push-staging: Runs after unit tests, pushes branch tag with push_latest: false (enables platform tests)
  • push-* (Final): Runs only on release tags after platform tests pass, with push_latest: true

The platform test configs are in project 5011 with hardcoded config IDs in the workflow. If these configs are deleted or modified, CI will fail. The test configs require valid Meta OAuth tokens which may expire.

The component_id is propagated through three layers: ComponentFacebookClientPageLoader. All three PageLoader instantiation sites in client.py have been updated to pass component_id.

Related Jira tickets: SUPPORT-14412, SUPPORT-14107

Link to Devin run: https://app.devin.ai/sessions/4ceb3ae74a734baaa45aa06f601c3550
Requested by: [email protected] (@ZdenekSrotyr)

Previous Devin session: https://app.devin.ai/sessions/d820bdbb1686427f81b3777a37f3fc2d
Previous Devin session: https://app.devin.ai/sessions/936c7440de3e444ab6b4152c9ec60443
Previous requester: Matyáš Jirát (@matyas-jirat-keboola)

devin-ai-integration bot and others added 23 commits November 26, 2025 14:02
…thout fb_page_id

Instagram Business Account IDs (17 digits starting with '17841') no longer
require fb_page_id to be set. The user access token is sufficient for
Instagram insights queries, removing the dependency on page token lookup.

This fixes the 400 Bad Request errors that occurred when Instagram accounts
were missing the fb_page_id field in their configuration.

Related: SUPPORT-14412, SUPPORT-14419
Co-Authored-By: Matyáš Jirát <[email protected]>
Simplified the token selection logic:
- If account has fb_page_id: look up page token using the Facebook Page ID
- If account has no fb_page_id: use user token directly

This approach is more explicit and doesn't rely on undocumented ID patterns.
Removed the _is_instagram_business_account_id() heuristic method.

Co-Authored-By: Matyáš Jirát <[email protected]>
…ts queries

For Instagram-specific insights queries (using metrics like follower_count, reach),
only process accounts that have fb_page_id set (Instagram Business Account entries).
This prevents calling Instagram metrics on Facebook Page IDs which causes 400 errors.

Co-Authored-By: Matyáš Jirát <[email protected]>
…or handling

1. Fix igaccounts sync action to match V1 behavior:
   - Filter to only return accounts that have instagram_business_account
   - Transform response to use Instagram Business Account ID as primary ID
   - Include fb_page_id pointing to the linked Facebook Page

2. Add graceful handling for 'Media Posted Before Business Account Conversion' error:
   - Detect error code 100 with subcode 2108006
   - Return empty data instead of failing (like V1 does)
   - Log warning message for visibility

Co-Authored-By: Matyáš Jirát <[email protected]>
…tch V1 behavior

- Make _is_business_conversion_error more robust like V1
- Check error phrase in JSON error fields (error_user_title, error_user_msg, message)
- Check raw response text as fallback
- Check exception message string as final fallback (like V1)
- Add better logging to show actual Facebook error response for debugging

Co-Authored-By: Matyáš Jirát <[email protected]>
Facebook sometimes returns pagination URLs with since pointing to 'now'
and no until parameter, which causes 'since param is not valid' errors.
This fix detects when since is very recent (within last hour) and there's
no until, and skips the pagination request since there's no more historical
data to fetch.

Co-Authored-By: Matyáš Jirát <[email protected]>
… and since is recent

The previous fix only handled the case where there was no until parameter.
But when we remove a future until, we also need to check if since is too
recent (within the last hour), because making a request with since=now
and no until causes 'since param is not valid' errors.

Co-Authored-By: Matyáš Jirát <[email protected]>
Handle recoverable errors gracefully instead of failing:
- 30-day limit exceeded: return empty data with warning
- Object does not exist / missing permissions: return empty data with warning
- Business account conversion: already handled

This allows old configs with 'since(30 days ago)' or stale accounts
to continue working with warnings instead of errors.

Co-Authored-By: Matyáš Jirát <[email protected]>
… guidance

- Move pagination timestamp warnings from warning to debug level
- Simplify recoverable error messages (remove full JSON response)
- Add actionable guidance for each error type:
  - 30-day limit: suggest changing to 29 days ago
  - Object not found: suggest removing account or re-running Add Account
- Add warning with guidance when FB Page accounts are filtered for IG queries

Co-Authored-By: Matyáš Jirát <[email protected]>
…eries

- Add validation in _build_params() that throws UserException when date range exceeds 30 days
- Only validates for Instagram insights queries (those with IG-specific metrics)
- Provides clear error message with actual date range and guidance to reduce to 29 days
- Keeps existing recoverable error handling as safety net for edge cases

Co-Authored-By: Matyáš Jirát <[email protected]>
Reduces log noise by moving internal token decision messages to debug level:
- 'Require page token' / 'Don't need page token'
- 'Batch request requires page token, falling back to individual requests'

Co-Authored-By: Matyáš Jirát <[email protected]>
The refactoring introduced a bug where _attempt_batch_processing returned
iter([]) instead of None when batch processing failed. Since iter([]) is
not None, the caller would skip individual processing entirely.

Changed both 'return iter([])' to 'return None' so that when batch
processing fails (empty response or HTTP error), the code falls back
to individual account processing as intended.

Co-Authored-By: Matyáš Jirát <[email protected]>
…to regular function

The previous implementation used 'yield' which made it a generator function.
In Python, generator functions always return a generator object (never None),
so the 'return None' statements didn't work as intended - the caller's
'if batch_results is not None' check was always true, causing individual
processing to be skipped entirely.

Changed to return a list instead of yielding, so None is properly returned
when batch processing should fall back to individual account processing.

Co-Authored-By: Matyáš Jirát <[email protected]>
…orking state

Reverts the changes from commits 209c203, 7fadd92, and 92d4dc7 by restoring
client.py and page_loader.py to their state at commit 88257f9.

The refactoring introduced a subtle Python bug where a generator function's
return None statements don't actually return None - they return a generator
object. This caused individual account processing to be skipped entirely.

Co-Authored-By: Matyáš Jirát <[email protected]>
…n queries

- Extend _build_params to parse DSL syntax: level, action_breakdowns,
  date_preset, time_increment, breakdowns, and fields list from curly braces
- Extend action breakdown detection to check both parameters and fields string
  (for DSL-style queries like insights.action_breakdowns(action_type))
- Extend _copy_common_fields to include metric fields (ad_name, impressions,
  clicks, spend, reach, etc.) for action breakdown queries

Co-Authored-By: Matyáš Jirát <[email protected]>
Remove extra metric fields (adset_name, frequency, cpc, cpm, cpp, ctr) from
_copy_common_fields to maintain backwards compatibility with existing V2 tables.
Only copy fields that already existed in V2 async tables per SUPPORT-14107:
ad_name, impressions, clicks, spend, reach.

Co-Authored-By: Matyáš Jirát <[email protected]>
Add account_id to the fields list when parsing DSL syntax to maintain
backwards compatibility with existing tables. Previously, when no explicit
fields list was set, the Facebook API returned account_id by default.
Now that we parse the DSL fields list, we need to explicitly include it.

Co-Authored-By: Matyáš Jirát <[email protected]>
- Extract _extract_page_content helper to DRY duplicated logic
- Add proper type hints (QueryRow, Account) for better IDE support
- Import IG_INSIGHTS_METRICS from page_loader instead of duplicating
- Standardize logging to use module-level logger
- Remove unused _handle_batch_request method
- Remove dead loop in _get_fb_graph_node

Co-Authored-By: Matyáš Jirát <[email protected]>
Restores _handle_batch_request method and loop in _get_fb_graph_node
that were removed in the previous refactoring commit.

Co-Authored-By: Matyáš Jirát <[email protected]>
Combines fixes from both PRs:
- PR #21 (SUPPORT-14412): Instagram accounts token handling, error handling, 30-day validation
- PR #22 (SUPPORT-14107): DSL parameter extraction for action breakdown queries

Resolves merge conflict in page_loader.py by including both:
1. DSL parameter extraction (level, action_breakdowns, date_preset, time_increment, breakdowns, fields list)
2. Instagram insights 30-day date range validation

Co-Authored-By: Matyáš Jirát <[email protected]>
@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

devin-ai-integration bot and others added 3 commits December 17, 2025 12:52
The Instagram insights filtering was incorrectly skipping Facebook Ads accounts
(those with act_* prefix) because they use the same metrics (reach, impressions)
as Instagram. This fix adds a guard to skip the filtering when dealing with
Facebook Ads accounts.

Also improved the warning message to be more neutral (removed misleading
'Facebook Pages' text since act_* accounts are Ad accounts, not Pages).

Co-Authored-By: Matyáš Jirát <[email protected]>
For Facebook Pages, account.id IS the page ID, but fb_page_id may be None.
The previous logic only looked up page tokens using fb_page_id, which caused
Facebook Pages to fall back to user tokens that don't work for /feed endpoint.

This fix uses 'fb_page_id or account.id' as the lookup key, which:
- For Instagram Business Accounts: uses fb_page_id (linked Facebook Page)
- For Facebook Pages: uses account.id (which IS the page ID)

Also added debug logging when no page token is found to help diagnose issues.

Co-Authored-By: Matyáš Jirát <[email protected]>
Two issues fixed:

1. _get_pages_token() now paginates /me/accounts results. Previously it only
   fetched the first page, so users with many Facebook Pages would not get
   page tokens for pages beyond the first 25 results.

2. Reduced log spam by:
   - Logging a single warning for all accounts without page tokens (instead
     of per-account debug logs)
   - Avoiding retry with user token when the 'page token' that failed was
     already the user token (no point retrying with the same token)

This should fix the 'massive spam of problems' in logs for Facebook Pages v2
jobs where page tokens weren't being found due to pagination issues.

Co-Authored-By: Matyáš Jirát <[email protected]>
MAX_INSTAGRAM_INSIGHTS_DAYS = 30

# Instagram-specific metrics that indicate an IG insights query
IG_INSIGHTS_METRICS = ["follower_count", "reach", "impressions", "profile_views"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure that those metrics are exclusive for instagram insights and not other custom queries? they sounds quite general

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right - reach and impressions are NOT exclusive to Instagram. We actually encountered this exact issue during testing!

What happened:

  • Job 152270357 (Facebook Ads) was incorrectly having its act_* accounts filtered out because the query used reach and impressions metrics
  • The _is_instagram_insights_query() function was returning true for Facebook Ads queries

Current mitigation:

  • We added an act_* prefix guard in _process_single_sync_query to skip Instagram filtering when any account has an act_ prefix (Facebook Ads accounts)

The underlying issue remains:

  • follower_count and profile_views ARE Instagram-specific
  • reach and impressions are used by Instagram, Facebook Ads, AND Facebook Pages

Possible solutions:

  1. Remove reach and impressions from IG_INSIGHTS_METRICS and only use truly IG-specific metrics (follower_count, profile_views)
  2. Change the filtering logic to only apply when the config accounts include IG-style entries (at least one has fb_page_id)
  3. Keep the current act_* guard as a heuristic

Option 2 might be the cleanest - instead of detecting IG queries by metrics, detect them by the presence of IG accounts in the config.

I'll defer to you and @matyas-jirat-keboola on the preferred approach. Happy to implement whichever solution you prefer.

return self.client.get_accounts("me/adaccounts", "account_id,id,business_name,name,currency")

@sync_action("igaccounts")
def run_ig_accounts_action(self) -> list[dict[str, Any]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to change the format of the returned account, we need to also handle backward compatibility during the component run, which is saved with the old format of accounts. This might be the cleaner solution. here is example of such job: https://connection.us-east4.gcp.keboola.com/admin/projects/4214/queue/34751211

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! You're right that we need to handle backwards compatibility for existing configurations that have accounts saved in the old format.

Current state:

  • The igaccounts sync action now returns accounts in a new format: id = Instagram Business Account ID, fb_page_id = linked Facebook Page ID
  • Existing configs may have accounts saved in the old format (where id was the Facebook Page ID)

What needs to be done:

  • The runtime should accept both old and new format accounts and normalize them internally before running queries
  • This could be done in the Configuration model or in _process_single_sync_query where accounts are processed

I'll defer to you and @matyas-jirat-keboola on whether this should be addressed in this PR or as a follow-up. If you want me to add the backwards compatibility handling, I can do that.

- Add _is_action_breakdown_query() helper that handles both string and dict parameters
- Add _has_action_reaction_breakdown() helper for action_reaction specific detection
- Use urllib.parse.parse_qs for proper query string parsing with fallback
- Support DSL-style fields string (.action_breakdowns() syntax)
- Add missing fields to _copy_common_fields(extended=True): ad_name, adset_name, impressions, clicks, spend, reach

This fixes the issue where action breakdown queries were not properly detected
when parameters were in different formats, causing empty columns in output.

Co-Authored-By: [email protected] <[email protected]>
devin-ai-integration bot and others added 8 commits December 19, 2025 12:11
…elds

Instead of maintaining an explicit list of fields to copy, follow the
Clojure V1 approach: when extended=True, copy ALL scalar fields from
original_row to action rows.

This is simpler, more flexible, and automatically includes any new
fields that Facebook API returns without code changes.

Co-Authored-By: [email protected] <[email protected]>
- Remove _has_action_reaction_breakdown() helper entirely
- Remove DSL detection from _is_action_breakdown_query()
- Simplify _populate_action_row() to copy all fields from action data
- action_reaction is now included automatically if present in data

Co-Authored-By: [email protected] <[email protected]>
Customer config uses DSL-style fields (e.g., insights.action_breakdowns(action_type))
which requires DSL detection to properly route output to action rows.
Without this, output mapping fails with missing columns error.

Co-Authored-By: [email protected] <[email protected]>
Add parsing for DSL patterns in sync insights queries:
- .level(X) -> level=X
- .action_breakdowns(X) -> action_breakdowns=X
- .date_preset(X) -> date_preset=X
- .time_increment(X) -> time_increment=X
- {field1,field2,...} -> fields=field1,field2,...

Without this, sync queries using DSL syntax don't send the
required parameters to Facebook API, resulting in missing columns.

Co-Authored-By: [email protected] <[email protected]>
Replace individual .X() handlers with generic regex extractor that
automatically forwards any .X(Y) pattern as X=Y parameter.

Only special handling for:
- since/until: date conversion via get_past_date()
- metric: comma-separated list normalization

Any new DSL pattern (e.g. .filtering(), .breakdowns()) will be
automatically forwarded without code changes.

query.parameters still overrides DSL-derived values (existing behavior).

Co-Authored-By: [email protected] <[email protected]>
Facebook API doesn't return account_id when explicit fields are specified
in DSL. Following Clojure approach: derive account_id from the query
context (page_id) instead of relying on API response.

For Ads accounts (page_id starts with 'act_'), strip the prefix to get
the numeric account_id that output mapping expects.

Co-Authored-By: [email protected] <[email protected]>
- Use _is_action_breakdown_query() helper method instead of inline detection
- Copy all scalar fields for action breakdown queries (Clojure approach)
- Generic DSL regex extractor in page_loader.py
- Derive account_id from page_id for Ads accounts

Co-Authored-By: [email protected] <[email protected]>
…pty-list' into devin/1765969731-combined-support-14412-14107
devin-ai-integration bot and others added 5 commits December 19, 2025 15:36
- Pass component_id from Component to FacebookClient and PageLoader
- Add _is_instagram_component() method to scope IG-specific logic
- Guard account filtering to only run for Instagram component
- Guard 30-day validation to only run for Instagram component
- Guard recoverable error handling to only run for Instagram component

This prevents regressions in Facebook Ads and Facebook Pages components
that may use similar metrics (reach, impressions) but don't have the
same constraints as Instagram insights queries.

Co-Authored-By: [email protected] <[email protected]>
- Add unit tests for PageLoader._is_instagram_component()
- Add unit tests for FacebookClient._is_instagram_component()
- Add tests for 30-day validation (skipped for Ads/Pages, triggers for IG)
- Add tests for DSL parsing (action_breakdowns, breakdowns, time_increment, etc.)
- Add tests for account_id auto-inclusion in fields
- Add test data directories with sample configs for each component type

Co-Authored-By: [email protected] <[email protected]>
- Add KBC_HOST and separate test config IDs for Pages, Ads, Instagram
- Create separate tests-kbc jobs for each component
- Update push job dependencies to wait for respective tests
- Uses keboola/action-run-configs-parallel with tag override

Co-Authored-By: [email protected] <[email protected]>
- Add validate_test_outputs.sh script that checks:
  - Expected tables exist in output buckets
  - Required columns are present (id, name for accounts)
  - Table freshness (warns if data is stale)
- Add validation step after each tests-kbc-* job
- Validation runs via Storage API to verify test outputs

Co-Authored-By: [email protected] <[email protected]>
…m tests

- Add push-staging job that pushes all 3 component images to ECR (push_latest: false)
- Platform tests (tests-kbc-*) now depend on push-staging instead of build
- Final push jobs only run on release (is_deploy_ready == true) with push_latest: true
- This fixes 'Failed to get hash for image' error in platform tests

Co-Authored-By: [email protected] <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants