Use python Of trino Package execution insert overwrite, however overwrite But there is no effective problem
according to trino The official website of insert overwrite Of Opening mode , Turn on hive Of insert overwrite conversation , Make the current session insert into Statement support insert overwrite, That is, it supports the function of inserting data and updating according to partition coverage
But in use Python Code execution is never effective , It took a lot of time debug after , Finally, I found the reason. It was because I opened insert overwrite session The statement of does not execute fetchall() As a result, it did not take effect
The following is a code example of the execution
import trino
def insert_overwrite_query(trino, ...):
conn = trino.dbapi.connect()
cursor: Cursor = conn.cursor()
overwrite_sql = "SET SESSION hive.insert_existing_partitions_behavior = 'OVERWRITE'"
cursor.execute(overwrite_sql)
query = """insert into hive.schema.table ..."""
cursor.execute(query)
res = cursor.fetchall()
...
overwrite_sql
After execution, you need to execute cursor.fetchall()
Will take effect , I really don't understand this before sql Why fetchall()?fetchall() The function of is not executed only when data needs to be returned ?
After a search, I learned that fetchall() The real operating mechanism of
The other is python trino Medium fetchall() The role of the , After observing the source code , To determine the trino Query mechanism of
according to cursor.fetchall() Found that its data actually comes from a generator object TrinoResult
, stay fetchall() In fact, it was executed list(TrinoResult(...))
The query results are obtained ( In fact, you do list(cursor.execute(query))
Will also get fetchall() Same result )
class TrinoResult(object):
def __init__(self, query, rows=None, experimental_python_types: bool = False):
self._query = query
self._rows = rows or []
self._rownumber = 0
self._experimental_python_types = experimental_python_types
...
def __iter__(self):
# Initial fetch from the first POST request
for row in self._rows:
self._rownumber += 1
yield row
self._rows = None
# Subsequent fetches from GET requests until next_uri is empty.
while not self._query.finished:
rows = self._query.fetch()
for row in rows:
self._rownumber += 1
logger.debug("row %s", row)
if not self._experimental_python_types:
yield row
else:
yield self._map_to_python_types(row, self._query.columns)
All in all , stay session sql It needs to be executed once later fetchall() Will enable session Of sql take effect , But I always feel a little out of intuition