-
Notifications
You must be signed in to change notification settings - Fork 0
fix: combine SUPPORT-14412 and SUPPORT-14107 fixes #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: combine SUPPORT-14412 and SUPPORT-14107 fixes #23
Conversation
…thout fb_page_id Instagram Business Account IDs (17 digits starting with '17841') no longer require fb_page_id to be set. The user access token is sufficient for Instagram insights queries, removing the dependency on page token lookup. This fixes the 400 Bad Request errors that occurred when Instagram accounts were missing the fb_page_id field in their configuration. Related: SUPPORT-14412, SUPPORT-14419 Co-Authored-By: Matyáš Jirát <[email protected]>
Simplified the token selection logic: - If account has fb_page_id: look up page token using the Facebook Page ID - If account has no fb_page_id: use user token directly This approach is more explicit and doesn't rely on undocumented ID patterns. Removed the _is_instagram_business_account_id() heuristic method. Co-Authored-By: Matyáš Jirát <[email protected]>
…ts queries For Instagram-specific insights queries (using metrics like follower_count, reach), only process accounts that have fb_page_id set (Instagram Business Account entries). This prevents calling Instagram metrics on Facebook Page IDs which causes 400 errors. Co-Authored-By: Matyáš Jirát <[email protected]>
…or handling 1. Fix igaccounts sync action to match V1 behavior: - Filter to only return accounts that have instagram_business_account - Transform response to use Instagram Business Account ID as primary ID - Include fb_page_id pointing to the linked Facebook Page 2. Add graceful handling for 'Media Posted Before Business Account Conversion' error: - Detect error code 100 with subcode 2108006 - Return empty data instead of failing (like V1 does) - Log warning message for visibility Co-Authored-By: Matyáš Jirát <[email protected]>
Co-Authored-By: Matyáš Jirát <[email protected]>
…tch V1 behavior - Make _is_business_conversion_error more robust like V1 - Check error phrase in JSON error fields (error_user_title, error_user_msg, message) - Check raw response text as fallback - Check exception message string as final fallback (like V1) - Add better logging to show actual Facebook error response for debugging Co-Authored-By: Matyáš Jirát <[email protected]>
Facebook sometimes returns pagination URLs with since pointing to 'now' and no until parameter, which causes 'since param is not valid' errors. This fix detects when since is very recent (within last hour) and there's no until, and skips the pagination request since there's no more historical data to fetch. Co-Authored-By: Matyáš Jirát <[email protected]>
… and since is recent The previous fix only handled the case where there was no until parameter. But when we remove a future until, we also need to check if since is too recent (within the last hour), because making a request with since=now and no until causes 'since param is not valid' errors. Co-Authored-By: Matyáš Jirát <[email protected]>
Handle recoverable errors gracefully instead of failing: - 30-day limit exceeded: return empty data with warning - Object does not exist / missing permissions: return empty data with warning - Business account conversion: already handled This allows old configs with 'since(30 days ago)' or stale accounts to continue working with warnings instead of errors. Co-Authored-By: Matyáš Jirát <[email protected]>
… guidance - Move pagination timestamp warnings from warning to debug level - Simplify recoverable error messages (remove full JSON response) - Add actionable guidance for each error type: - 30-day limit: suggest changing to 29 days ago - Object not found: suggest removing account or re-running Add Account - Add warning with guidance when FB Page accounts are filtered for IG queries Co-Authored-By: Matyáš Jirát <[email protected]>
…eries - Add validation in _build_params() that throws UserException when date range exceeds 30 days - Only validates for Instagram insights queries (those with IG-specific metrics) - Provides clear error message with actual date range and guidance to reduce to 29 days - Keeps existing recoverable error handling as safety net for edge cases Co-Authored-By: Matyáš Jirát <[email protected]>
Reduces log noise by moving internal token decision messages to debug level: - 'Require page token' / 'Don't need page token' - 'Batch request requires page token, falling back to individual requests' Co-Authored-By: Matyáš Jirát <[email protected]>
The refactoring introduced a bug where _attempt_batch_processing returned iter([]) instead of None when batch processing failed. Since iter([]) is not None, the caller would skip individual processing entirely. Changed both 'return iter([])' to 'return None' so that when batch processing fails (empty response or HTTP error), the code falls back to individual account processing as intended. Co-Authored-By: Matyáš Jirát <[email protected]>
…to regular function The previous implementation used 'yield' which made it a generator function. In Python, generator functions always return a generator object (never None), so the 'return None' statements didn't work as intended - the caller's 'if batch_results is not None' check was always true, causing individual processing to be skipped entirely. Changed to return a list instead of yielding, so None is properly returned when batch processing should fall back to individual account processing. Co-Authored-By: Matyáš Jirát <[email protected]>
…orking state Reverts the changes from commits 209c203, 7fadd92, and 92d4dc7 by restoring client.py and page_loader.py to their state at commit 88257f9. The refactoring introduced a subtle Python bug where a generator function's return None statements don't actually return None - they return a generator object. This caused individual account processing to be skipped entirely. Co-Authored-By: Matyáš Jirát <[email protected]>
…n queries - Extend _build_params to parse DSL syntax: level, action_breakdowns, date_preset, time_increment, breakdowns, and fields list from curly braces - Extend action breakdown detection to check both parameters and fields string (for DSL-style queries like insights.action_breakdowns(action_type)) - Extend _copy_common_fields to include metric fields (ad_name, impressions, clicks, spend, reach, etc.) for action breakdown queries Co-Authored-By: Matyáš Jirát <[email protected]>
Remove extra metric fields (adset_name, frequency, cpc, cpm, cpp, ctr) from _copy_common_fields to maintain backwards compatibility with existing V2 tables. Only copy fields that already existed in V2 async tables per SUPPORT-14107: ad_name, impressions, clicks, spend, reach. Co-Authored-By: Matyáš Jirát <[email protected]>
Add account_id to the fields list when parsing DSL syntax to maintain backwards compatibility with existing tables. Previously, when no explicit fields list was set, the Facebook API returned account_id by default. Now that we parse the DSL fields list, we need to explicitly include it. Co-Authored-By: Matyáš Jirát <[email protected]>
- Extract _extract_page_content helper to DRY duplicated logic - Add proper type hints (QueryRow, Account) for better IDE support - Import IG_INSIGHTS_METRICS from page_loader instead of duplicating - Standardize logging to use module-level logger - Remove unused _handle_batch_request method - Remove dead loop in _get_fb_graph_node Co-Authored-By: Matyáš Jirát <[email protected]>
Restores _handle_batch_request method and loop in _get_fb_graph_node that were removed in the previous refactoring commit. Co-Authored-By: Matyáš Jirát <[email protected]>
Combines fixes from both PRs: - PR #21 (SUPPORT-14412): Instagram accounts token handling, error handling, 30-day validation - PR #22 (SUPPORT-14107): DSL parameter extraction for action breakdown queries Resolves merge conflict in page_loader.py by including both: 1. DSL parameter extraction (level, action_breakdowns, date_preset, time_increment, breakdowns, fields list) 2. Instagram insights 30-day date range validation Co-Authored-By: Matyáš Jirát <[email protected]>
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
The Instagram insights filtering was incorrectly skipping Facebook Ads accounts (those with act_* prefix) because they use the same metrics (reach, impressions) as Instagram. This fix adds a guard to skip the filtering when dealing with Facebook Ads accounts. Also improved the warning message to be more neutral (removed misleading 'Facebook Pages' text since act_* accounts are Ad accounts, not Pages). Co-Authored-By: Matyáš Jirát <[email protected]>
For Facebook Pages, account.id IS the page ID, but fb_page_id may be None. The previous logic only looked up page tokens using fb_page_id, which caused Facebook Pages to fall back to user tokens that don't work for /feed endpoint. This fix uses 'fb_page_id or account.id' as the lookup key, which: - For Instagram Business Accounts: uses fb_page_id (linked Facebook Page) - For Facebook Pages: uses account.id (which IS the page ID) Also added debug logging when no page token is found to help diagnose issues. Co-Authored-By: Matyáš Jirát <[email protected]>
Two issues fixed:
1. _get_pages_token() now paginates /me/accounts results. Previously it only
fetched the first page, so users with many Facebook Pages would not get
page tokens for pages beyond the first 25 results.
2. Reduced log spam by:
- Logging a single warning for all accounts without page tokens (instead
of per-account debug logs)
- Avoiding retry with user token when the 'page token' that failed was
already the user token (no point retrying with the same token)
This should fix the 'massive spam of problems' in logs for Facebook Pages v2
jobs where page tokens weren't being found due to pagination issues.
Co-Authored-By: Matyáš Jirát <[email protected]>
| MAX_INSTAGRAM_INSIGHTS_DAYS = 30 | ||
|
|
||
| # Instagram-specific metrics that indicate an IG insights query | ||
| IG_INSIGHTS_METRICS = ["follower_count", "reach", "impressions", "profile_views"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we sure that those metrics are exclusive for instagram insights and not other custom queries? they sounds quite general
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right - reach and impressions are NOT exclusive to Instagram. We actually encountered this exact issue during testing!
What happened:
- Job 152270357 (Facebook Ads) was incorrectly having its
act_*accounts filtered out because the query usedreachandimpressionsmetrics - The
_is_instagram_insights_query()function was returningtruefor Facebook Ads queries
Current mitigation:
- We added an
act_*prefix guard in_process_single_sync_queryto skip Instagram filtering when any account has anact_prefix (Facebook Ads accounts)
The underlying issue remains:
follower_countandprofile_viewsARE Instagram-specificreachandimpressionsare used by Instagram, Facebook Ads, AND Facebook Pages
Possible solutions:
- Remove
reachandimpressionsfromIG_INSIGHTS_METRICSand only use truly IG-specific metrics (follower_count,profile_views) - Change the filtering logic to only apply when the config accounts include IG-style entries (at least one has
fb_page_id) - Keep the current
act_*guard as a heuristic
Option 2 might be the cleanest - instead of detecting IG queries by metrics, detect them by the presence of IG accounts in the config.
I'll defer to you and @matyas-jirat-keboola on the preferred approach. Happy to implement whichever solution you prefer.
| return self.client.get_accounts("me/adaccounts", "account_id,id,business_name,name,currency") | ||
|
|
||
| @sync_action("igaccounts") | ||
| def run_ig_accounts_action(self) -> list[dict[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to change the format of the returned account, we need to also handle backward compatibility during the component run, which is saved with the old format of accounts. This might be the cleaner solution. here is example of such job: https://connection.us-east4.gcp.keboola.com/admin/projects/4214/queue/34751211
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! You're right that we need to handle backwards compatibility for existing configurations that have accounts saved in the old format.
Current state:
- The
igaccountssync action now returns accounts in a new format:id= Instagram Business Account ID,fb_page_id= linked Facebook Page ID - Existing configs may have accounts saved in the old format (where
idwas the Facebook Page ID)
What needs to be done:
- The runtime should accept both old and new format accounts and normalize them internally before running queries
- This could be done in the
Configurationmodel or in_process_single_sync_querywhere accounts are processed
I'll defer to you and @matyas-jirat-keboola on whether this should be addressed in this PR or as a follow-up. If you want me to add the backwards compatibility handling, I can do that.
- Add _is_action_breakdown_query() helper that handles both string and dict parameters - Add _has_action_reaction_breakdown() helper for action_reaction specific detection - Use urllib.parse.parse_qs for proper query string parsing with fallback - Support DSL-style fields string (.action_breakdowns() syntax) - Add missing fields to _copy_common_fields(extended=True): ad_name, adset_name, impressions, clicks, spend, reach This fixes the issue where action breakdown queries were not properly detected when parameters were in different formats, causing empty columns in output. Co-Authored-By: [email protected] <[email protected]>
…elds Instead of maintaining an explicit list of fields to copy, follow the Clojure V1 approach: when extended=True, copy ALL scalar fields from original_row to action rows. This is simpler, more flexible, and automatically includes any new fields that Facebook API returns without code changes. Co-Authored-By: [email protected] <[email protected]>
- Remove _has_action_reaction_breakdown() helper entirely - Remove DSL detection from _is_action_breakdown_query() - Simplify _populate_action_row() to copy all fields from action data - action_reaction is now included automatically if present in data Co-Authored-By: [email protected] <[email protected]>
Customer config uses DSL-style fields (e.g., insights.action_breakdowns(action_type)) which requires DSL detection to properly route output to action rows. Without this, output mapping fails with missing columns error. Co-Authored-By: [email protected] <[email protected]>
Add parsing for DSL patterns in sync insights queries:
- .level(X) -> level=X
- .action_breakdowns(X) -> action_breakdowns=X
- .date_preset(X) -> date_preset=X
- .time_increment(X) -> time_increment=X
- {field1,field2,...} -> fields=field1,field2,...
Without this, sync queries using DSL syntax don't send the
required parameters to Facebook API, resulting in missing columns.
Co-Authored-By: [email protected] <[email protected]>
Replace individual .X() handlers with generic regex extractor that automatically forwards any .X(Y) pattern as X=Y parameter. Only special handling for: - since/until: date conversion via get_past_date() - metric: comma-separated list normalization Any new DSL pattern (e.g. .filtering(), .breakdowns()) will be automatically forwarded without code changes. query.parameters still overrides DSL-derived values (existing behavior). Co-Authored-By: [email protected] <[email protected]>
Facebook API doesn't return account_id when explicit fields are specified in DSL. Following Clojure approach: derive account_id from the query context (page_id) instead of relying on API response. For Ads accounts (page_id starts with 'act_'), strip the prefix to get the numeric account_id that output mapping expects. Co-Authored-By: [email protected] <[email protected]>
- Use _is_action_breakdown_query() helper method instead of inline detection - Copy all scalar fields for action breakdown queries (Clojure approach) - Generic DSL regex extractor in page_loader.py - Derive account_id from page_id for Ads accounts Co-Authored-By: [email protected] <[email protected]>
…pty-list' into devin/1765969731-combined-support-14412-14107
- Pass component_id from Component to FacebookClient and PageLoader - Add _is_instagram_component() method to scope IG-specific logic - Guard account filtering to only run for Instagram component - Guard 30-day validation to only run for Instagram component - Guard recoverable error handling to only run for Instagram component This prevents regressions in Facebook Ads and Facebook Pages components that may use similar metrics (reach, impressions) but don't have the same constraints as Instagram insights queries. Co-Authored-By: [email protected] <[email protected]>
- Add unit tests for PageLoader._is_instagram_component() - Add unit tests for FacebookClient._is_instagram_component() - Add tests for 30-day validation (skipped for Ads/Pages, triggers for IG) - Add tests for DSL parsing (action_breakdowns, breakdowns, time_increment, etc.) - Add tests for account_id auto-inclusion in fields - Add test data directories with sample configs for each component type Co-Authored-By: [email protected] <[email protected]>
- Add KBC_HOST and separate test config IDs for Pages, Ads, Instagram - Create separate tests-kbc jobs for each component - Update push job dependencies to wait for respective tests - Uses keboola/action-run-configs-parallel with tag override Co-Authored-By: [email protected] <[email protected]>
- Add validate_test_outputs.sh script that checks: - Expected tables exist in output buckets - Required columns are present (id, name for accounts) - Table freshness (warns if data is stale) - Add validation step after each tests-kbc-* job - Validation runs via Storage API to verify test outputs Co-Authored-By: [email protected] <[email protected]>
…m tests - Add push-staging job that pushes all 3 component images to ECR (push_latest: false) - Platform tests (tests-kbc-*) now depend on push-staging instead of build - Final push jobs only run on release (is_deploy_ready == true) with push_latest: true - This fixes 'Failed to get hash for image' error in platform tests Co-Authored-By: [email protected] <[email protected]>
Summary
This PR merges multiple fix branches into a single branch for coordinated release:
From PR #21 (SUPPORT-14412): Instagram V2 parity fixes including token selection logic based on
fb_page_idpresence, filtering Facebook Page IDs from Instagram insights queries, graceful error handling for recoverable errors (business conversion, 30-day limit, object not found), improved pagination timestamp handling, and proactive 30-day date range validation.From PR #22 (SUPPORT-14107): Facebook Ads action breakdown query fixes including DSL parameter extraction (
level,action_breakdowns,date_preset,time_increment,breakdowns, fields list) and ensuringaccount_idis always included for backwards compatibility.From PR #18: Fix for Pydantic validation error when UI sends empty list
[]foraccountsfield instead of empty dict{}.A merge conflict in
page_loader.pywas resolved by including both sets of changes to_build_params: DSL parameter extraction runs first, followed by the 30-day validation call.Updates since last revision
Fixed CI workflow ordering (critical fix):
push-stagingjob that pushes all 3 component images to ECR withpush_latest: falseAFTER unit tests passtests-kbc-*) now depend onpush-staginginstead ofbuildis_deploy_ready == 'true') withpush_latest: truescripts/validate_test_outputs.shto validate output tables exist after platform testsPrevious updates:
component_idfromComponenttoFacebookClientandPageLoaderconstructorsReview & Testing Checklist for Human
KBC_STORAGE_TOKENsecret is configured in GitHub repository settings for project 5011build→tests→push-staging→tests-kbc-*→ (on release)push-*→deployact_*accounts are NOT filtered out and no 30-day validation errors occurRecommended test plan:
Notes
The workflow now has two push phases:
push-staging: Runs after unit tests, pushes branch tag withpush_latest: false(enables platform tests)push-*(Final): Runs only on release tags after platform tests pass, withpush_latest: trueThe platform test configs are in project 5011 with hardcoded config IDs in the workflow. If these configs are deleted or modified, CI will fail. The test configs require valid Meta OAuth tokens which may expire.
The
component_idis propagated through three layers:Component→FacebookClient→PageLoader. All threePageLoaderinstantiation sites inclient.pyhave been updated to passcomponent_id.Related Jira tickets: SUPPORT-14412, SUPPORT-14107
Link to Devin run: https://app.devin.ai/sessions/4ceb3ae74a734baaa45aa06f601c3550
Requested by: [email protected] (@ZdenekSrotyr)
Previous Devin session: https://app.devin.ai/sessions/d820bdbb1686427f81b3777a37f3fc2d
Previous Devin session: https://app.devin.ai/sessions/936c7440de3e444ab6b4152c9ec60443
Previous requester: Matyáš Jirát (@matyas-jirat-keboola)