Move hive_cli_params to hook parameters #28101

Merged 1 commit on Dec 6, 2022
Move hive_cli_params to hook parameters
Those parameters belong to the Hook, not to the connection
definition, so we should actually be able to specify them there.

You can also now specify ``hive_cli_params`` in the HiveOperator
and it will pass the parameters to the HiveCliHook created under
the hood.
potiuk committed Dec 4, 2022
commit 46be2a53c2e55e3b9350276ee934d8d632ceb99f
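As an illustration of the change described in the commit message, the sketch below passes ``hive_cli_params`` directly to the operator instead of storing it in the connection. The task id and query are made up for the example, and the parameter value is the one quoted in the old hook docstring. The Airflow import sits inside the function only so that the snippet can be loaded where Airflow is not installed.

```python
# Parameter string taken from the old docstring example; any value works here.
HIVE_CLI_PARAMS = "-hiveconf mapred.job.tracker=some.jobtracker:444"


def build_hive_task(dag=None):
    """Create a HiveOperator that forwards extra CLI parameters to its hook."""
    # Imported lazily so this module loads even where Airflow is not installed.
    from airflow.providers.apache.hive.operators.hive import HiveOperator

    return HiveOperator(
        task_id="example_hive_query",  # hypothetical task id
        hql="SELECT 1",  # hypothetical query
        hive_cli_conn_id="hive_cli_default",
        hive_cli_params=HIVE_CLI_PARAMS,  # formerly read from the connection extra
        dag=dag,
    )
```

Calling `build_hive_task(dag)` inside a DAG file would register the task; the function name itself is an assumption of this sketch, not part of the provider.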
9 changes: 9 additions & 0 deletions airflow/providers/apache/hive/CHANGELOG.rst
@@ -24,6 +24,15 @@
Changelog
---------

+ 5.0.0
+ .....
+
+ Breaking changes
+ ~~~~~~~~~~~~~~~~
+
+ The ``hive_cli_params`` from connection were moved to the Hook. If you have extra parameters defined in your
+ connections as ``hive_cli_params`` extra, you should move them to the DAG where your HiveOperator is used.
+
4.1.1
.....
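One way to approach the migration described in the breaking-change note is to pull the old value out of the connection's extra JSON and carry it over to the DAG. The helper below is a hypothetical sketch (its name and return shape are assumptions); it only rewrites a JSON string and does not read or write the Airflow metadata database.

```python
from __future__ import annotations

import json


def split_connection_extra(extra_json: str) -> tuple[str, str]:
    """Return (hive_cli_params, cleaned_extra_json) for a connection extra field."""
    extra = json.loads(extra_json) if extra_json else {}
    # The value that used to live in the connection now belongs in the DAG.
    params = extra.pop("hive_cli_params", "")
    return params, json.dumps(extra, sort_keys=True)


old_extra = (
    '{"use_beeline": true, '
    '"hive_cli_params": "-hiveconf mapred.job.tracker=some.jobtracker:444"}'
)
params, new_extra = split_connection_extra(old_extra)
```

Here `params` is the string to pass as ``hive_cli_params`` to the HiveOperator, and `new_extra` is what would remain in the connection.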

11 changes: 6 additions & 5 deletions airflow/providers/apache/hive/hooks/hive.py
@@ -63,10 +63,8 @@ class HiveCliHook(BaseHook):
traditional CLI. To enable ``beeline``, set the use_beeline param in the
extra field of your connection as in ``{ "use_beeline": true }``

- Note that you can also set default hive CLI parameters using the
- ``hive_cli_params`` to be used in your connection as in
- ``{"hive_cli_params": "-hiveconf mapred.job.tracker=some.jobtracker:444"}``
- Parameters passed here can be overridden by run_cli's hive_conf param
+ Note that you can also set default hive CLI parameters by passing ``hive_cli_params``
+ space separated list of parameters to add to the hive command.

The extra connection parameter ``auth`` gets passed as in the ``jdbc``
connection string as is.
@@ -78,6 +76,8 @@ class HiveCliHook(BaseHook):
Possible settings include: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
:param mapred_job_name: This name will appear in the jobtracker.
This can make monitoring easier.
+ :param hive_cli_params: Space separated list of hive command parameters to add to the
+ hive command.
"""

conn_name_attr = "hive_cli_conn_id"
@@ -92,10 +92,11 @@ def __init__(
mapred_queue: str | None = None,
mapred_queue_priority: str | None = None,
mapred_job_name: str | None = None,
+ hive_cli_params: str = "",
) -> None:
super().__init__()
conn = self.get_connection(hive_cli_conn_id)
- self.hive_cli_params: str = conn.extra_dejson.get("hive_cli_params", "")
+ self.hive_cli_params: str = hive_cli_params
self.use_beeline: bool = conn.extra_dejson.get("use_beeline", False)
self.auth = conn.extra_dejson.get("auth", "noSasl")
self.conn = conn
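The docstring describes ``hive_cli_params`` as a space separated list. The sketch below shows what that implies once the string is turned into command-line tokens. The function `append_cli_params` and the base command are hypothetical; the hook's actual command assembly is not part of this diff, so treat the whitespace split as an assumption.

```python
from __future__ import annotations


def append_cli_params(base_cmd: list[str], hive_cli_params: str = "") -> list[str]:
    """Append whitespace-separated parameters to a command, one token each."""
    # str.split() with no argument splits on any run of whitespace and
    # returns [] for an empty string, so the default "" adds nothing.
    return base_cmd + hive_cli_params.split()


cmd = append_cli_params(
    ["hive"], "-hiveconf mapred.job.tracker=some.jobtracker:444"
)
```

Under this assumption, a single parameter value that itself contains spaces cannot be expressed, because a plain split has no quoting rules.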
3 changes: 3 additions & 0 deletions airflow/providers/apache/hive/operators/hive.py
@@ -87,6 +87,7 @@ def __init__(
mapred_queue: str | None = None,
mapred_queue_priority: str | None = None,
mapred_job_name: str | None = None,
+ hive_cli_params: str = "",
**kwargs: Any,
) -> None:
super().__init__(**kwargs)
@@ -102,6 +103,7 @@ def __init__(
self.mapred_queue = mapred_queue
self.mapred_queue_priority = mapred_queue_priority
self.mapred_job_name = mapred_job_name
+ self.hive_cli_params = hive_cli_params

job_name_template = conf.get_mandatory_value(
"hive",
@@ -124,6 +126,7 @@ def get_hook(self) -> HiveCliHook:
mapred_queue=self.mapred_queue,
mapred_queue_priority=self.mapred_queue_priority,
mapred_job_name=self.mapred_job_name,
+ hive_cli_params=self.hive_cli_params,
)

def prepare_template(self) -> None:
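The operator changes follow a simple forwarding pattern: the constructor stores the value, and ``get_hook`` hands it to the hook when the hook is created. The stripped-down model below illustrates that flow with stand-in classes (`SketchHook` and `SketchOperator` are hypothetical names, not Airflow classes), which may be handy when reasoning about or unit-testing the behaviour without an Airflow installation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchHook:
    """Stand-in for the hook: it just records what it was given."""

    hive_cli_conn_id: str
    hive_cli_params: str = ""


class SketchOperator:
    """Stand-in for the operator: stores the value, forwards it on demand."""

    def __init__(
        self,
        hive_cli_conn_id: str = "hive_cli_default",
        hive_cli_params: str = "",
    ) -> None:
        self.hive_cli_conn_id = hive_cli_conn_id
        self.hive_cli_params = hive_cli_params

    def get_hook(self) -> SketchHook:
        # The hook is built lazily from the values captured in __init__.
        return SketchHook(
            hive_cli_conn_id=self.hive_cli_conn_id,
            hive_cli_params=self.hive_cli_params,
        )


hook = SketchOperator(hive_cli_params="-hiveconf a=b").get_hook()
```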
1 change: 1 addition & 0 deletions airflow/providers/apache/hive/provider.yaml
@@ -22,6 +22,7 @@ description: |
`Apache Hive <https://hive.apache.org/>`__

versions:
+ - 5.0.0
- 4.1.1
- 4.1.0
- 4.0.1
Original file line number Diff line number Diff line change
@@ -68,8 +68,6 @@ Extra (optional)
Specify the extra parameters (as json dictionary) that can be used in Hive CLI connection.
The following parameters are all optional:

- * ``hive_cli_params``
- Specify an object CLI params for use with Beeline CLI and Hive CLI.
* ``use_beeline``
Specify as ``True`` if using the Beeline CLI. Default is ``False``.
* ``auth``