Предисловие: мне известно о подобных вопросах по этой теме, и я безуспешно пытался их решить.
Я создаю скрипт Python, чтобы получить файл JSON с ftp-адреса, преобразовать его в кадр данных Pandas, а затем отправить его в MySQL для заполнения таблицы. Однако я столкнулся с проблемой с Pandas.to_sql. Ошибка:
UnicodeEncodeError: 'ascii' codec can't encode character '\xd1' in position 87: ordinal not in range(128)
Вот мой код:
import pandas as pd
import json
from sqlalchemy import create_engine
cnx = create_engine('mysql://username:[email protected]/database?charset=utf8',encoding='utf-8')
sl1 = pd.read_json('ftp://username:[email protected]/directory/sailings_cel.txt')
sl1 = pd.json_normalize(sl1["Dataset"])
sl1.to_sql(name='celsailingtemp',con=cnx,if_exists='append',index=True)
Потенциально полезная информация:
- Я пробовал несколько перестановок, чтобы убедиться, что для кодировки установлено значение utf-8 в инструкции create_engine, в соответствии с аналогичными вопросами в StackOverflow.
- Файл JSON предоставляется на FTP-сервере с расширением .txt, но я думаю, что это должно быть в порядке, поскольку он отлично обрабатывается read_json и json_normalize..
- Согласно https://www.utf8-chartable.de/unicode-utf8-table.pl?unicodeinhtml=hex, похоже, что xd1 может быть символом Ñ. Разумеется, нарушителя можно найти в колонке маршрута.
- Говоря о столбцах, структура данных JSON показана ниже. Установка уровня json_normalize в Dataset переводит вложенный столбец маршрута в длинную строку, что меня устраивает. Я установил для этого столбца тип JSON на стороне mysql.
- Структура таблицы MySQL выглядит следующим образом:
- Я, вероятно, мог бы просто вырезать столбец маршрута, но проблема с основной кодировкой не была бы решена, и я хочу сделать это правильно.
Любое руководство по этому вопросу будет высоко оценено!
РЕДАКТИРОВАТЬ: Трассировка:
Traceback (most recent call last):
File "c:/pythondir/debug.py", line 14, in <module>
sl1.to_sql(name='celsailingtemp',con=cnx,if_exists='append',index=True)
File "C:\pythondir\.venv\lib\site-packages\pandas\core\generic.py", line 2602, in to_sql
sql.to_sql(
File "C:\pythondir\.venv\lib\site-packages\pandas\io\sql.py", line 589, in to_sql
pandas_sql.to_sql(
File "C:\pythondir\.venv\lib\site-packages\pandas\io\sql.py", line 1398, in to_sql
table.insert(chunksize, method=method)
File "C:\pythondir\.venv\lib\site-packages\pandas\io\sql.py", line 830, in insert
exec_insert(conn, keys, chunk_iter)
File "C:\pythondir\.venv\lib\site-packages\pandas\io\sql.py", line 747, in _execute_insert
conn.execute(self.table.insert(), data)
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1014, in execute
return meth(self, multiparams, params)
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\sql\elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1127, in _execute_clauseelement
ret = self._execute_context(
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1317, in _execute_context
self._handle_dbapi_exception(
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1515, in _handle_dbapi_exception
util.raise_(exc_info[1], with_traceback=exc_info[2])
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
raise exception
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\engine\base.py", line 1257, in _execute_context
self.dialect.do_executemany(
File "C:\pythondir\.venv\lib\site-packages\sqlalchemy\dialects\mysql\mysqldb.py", line 148, in do_executemany
rowcount = cursor.executemany(statement, parameters)
File "C:\pythondir\.venv\lib\site-packages\MySQLdb\cursors.py", line 230, in executemany
return self._do_execute_many(
File "C:\pythondir\.venv\lib\site-packages\MySQLdb\cursors.py", line 255, in _do_execute_many
v = values % escape(next(args), conn)
File "C:\pythondir\.venv\lib\site-packages\MySQLdb\cursors.py", line 110, in _escape_args
ret = tuple(literal(ensure_bytes(arg)) for arg in args)
File "C:\pythondir\.venv\lib\site-packages\MySQLdb\cursors.py", line 110, in <genexpr>
ret = tuple(literal(ensure_bytes(arg)) for arg in args)
File "C:\pythondir\.venv\lib\site-packages\MySQLdb\connections.py", line 286, in literal
s = self._tuple_literal(o)
File "C:\pythondir\.venv\lib\site-packages\MySQLdb\connections.py", line 269, in _tuple_literal
return b"(%s)" % (b",".join(map(self.literal, t)))
File "C:\pythondir\.venv\lib\site-packages\MySQLdb\connections.py", line 288, in literal
s = self.escape(o, self.encoders)
UnicodeEncodeError: 'ascii' codec can't encode character '\xd1' in position 87: ordinal not in range(128)
Фрагмент JSON, включающий оскорбительный символ:
{
"Table": "Cruise Sailing - Celebrity Cruises (CEL)",
"Number_of_CEL_sailings": 1032,
"Dataset": [
{
"SailingID": "117624",
"SailingPlanID": "70561",
"SailingPlanCode": "07G019",
"MarketName": "[United States, Canada, United Kingdom, Ireland, Australia, Spain, German, Portugal, Switzerland, New Zealand, United Arab Emirates, Mexico]",
"MarketNameList": [
"United States",
"Canada",
"United Kingdom",
"Ireland",
"Australia",
"Spain",
"German",
"Portugal",
"Switzerland",
"New Zealand",
"United Arab Emirates",
"Mexico"
],
"ShipID": "1373927",
"ShipName": "Celebrity Flora",
"DepartureDate": "2020-08-23",
"Duration": "7",
"PackageID": "FL07G019",
"SailingName": "7 NIGHT GALAPAGOS INNER LOOP",
"DestinationID": "615",
"DeparturePortCode": "BAL",
"DeparturePortID": "1203",
"ReturnPortCode": "BAL",
"ReturnPortID": "1203",
"PackageTypeID": "0",
"TourOnly": "no",
"Segment": "no",
"Itinerary": [
{
"DayNumber": 1,
"PortID": "1203",
"PortCode": "BAL",
"Description": "BALTRA (GALAPAGOS)",
"Language": "en_US",
"ArrivalTime": "--",
"DepartureTime": "2:30 PM"
},
{
"DayNumber": 1,
"PortID": "3781",
"PortCode": "DAP",
"Description": "DAPHNE ISLAND",
"Language": "en_US",
"ArrivalTime": "5:00 PM",
"DepartureTime": "6:00 PM"
},
{
"DayNumber": 2,
"PortID": "1212",
"PortCode": "RAB",
"Description": "RABIDA (GALAPAGOS)",
"Language": "en_US",
"ArrivalTime": "3:00 PM",
"DepartureTime": "7:00 PM"
},
{
"DayNumber": 2,
"PortID": "1211",
"PortCode": "PGA",
"Description": "EGAS PORT",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "12:00 PM"
},
{
"DayNumber": 2,
"PortID": "1212",
"PortCode": "RAB",
"Description": "RABIDA (GALAPAGOS)",
"Language": "en_US",
"ArrivalTime": "3:00 PM",
"DepartureTime": "7:00 PM"
},
{
"DayNumber": 3,
"PortID": "1656",
"PortCode": "EBI",
"Description": "ELIZABETH BAY",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "12:00 PM"
},
{
"DayNumber": 3,
"PortID": "1209",
"PortCode": "CET",
"Description": "TAGUS COVE (ISABELLA)",
"Language": "en_US",
"ArrivalTime": "3:00 PM",
"DepartureTime": "7:00 PM"
},
{
"DayNumber": 4,
"PortID": "1211",
"PortCode": "SLV",
"Description": "SULLIVAN BAY (SANTIAGO)",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "12:00 PM"
},
{
"DayNumber": 4,
"PortID": "1207",
"PortCode": "BAR",
"Description": "BARTOLOME ISLAND",
"Language": "en_US",
"ArrivalTime": "1:00 PM",
"DepartureTime": "7:00 PM"
},
{
"DayNumber": 5,
"PortID": "1208",
"PortCode": "LSB",
"Description": "LAS BACHAS - (SANTA CRUZ)",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "12:00 PM"
},
{
"DayNumber": 5,
"PortID": "1204",
"PortCode": "NSY",
"Description": "NORTH SEYMOUR ISLAND",
"Language": "en_US",
"ArrivalTime": "1:00 PM",
"DepartureTime": "9:30 PM"
},
{
"DayNumber": 6,
"PortID": "6167",
"PortCode": "SCC",
"Description": "PUERTO BAQUERIZO (CRISTOBAL)",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "12:00 PM"
},
{
"DayNumber": 6,
"PortID": "6168",
"PortCode": "PSC",
"Description": "PUNTA PITT (SAN CRISTOBAL)",
"Language": "en_US",
"ArrivalTime": "3:30 PM",
"DepartureTime": "6:00 PM"
},
{
"DayNumber": 7,
"PortID": "1536",
"PortCode": "PTA",
"Description": "PUERTO AYORA (SANTA CRUZ)",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "7:00 PM"
},
{
"DayNumber": 8,
"PortID": "1203",
"PortCode": "BAL",
"Description": "BALTRA (GALAPAGOS)",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "--"
}
]
},
{
"SailingID": "117625",
"SailingPlanID": "70560",
"SailingPlanCode": "07G020",
"MarketName": "[United States, Canada, United Kingdom, Ireland, Australia, Spain, German, Portugal, Switzerland, New Zealand, United Arab Emirates, Mexico]",
"MarketNameList": [
"United States",
"Canada",
"United Kingdom",
"Ireland",
"Australia",
"Spain",
"German",
"Portugal",
"Switzerland",
"New Zealand",
"United Arab Emirates",
"Mexico"
],
"ShipID": "1373927",
"ShipName": "Celebrity Flora",
"DepartureDate": "2020-08-30",
"Duration": "7",
"PackageID": "FL07G020",
"SailingName": "7 NIGHT GALAPAGOS OUTER LOOP",
"DestinationID": "615",
"DeparturePortCode": "BAL",
"DeparturePortID": "1203",
"ReturnPortCode": "BAL",
"ReturnPortID": "1203",
"PackageTypeID": "0",
"TourOnly": "no",
"Segment": "no",
"Itinerary": [
{
"DayNumber": 1,
"PortID": "1203",
"PortCode": "BAL",
"Description": "BALTRA (GALAPAGOS)",
"Language": "en_US",
"ArrivalTime": "--",
"DepartureTime": "2:30 PM"
},
{
"DayNumber": 1,
"PortID": "3781",
"PortCode": "DAP",
"Description": "DAPHNE ISLAND",
"Language": "en_US",
"ArrivalTime": "5:00 PM",
"DepartureTime": "6:00 PM"
},
{
"DayNumber": 2,
"PortID": "3020",
"PortCode": "GRB",
"Description": "GARDNER BAY (ESPAÑOLA)",
"Language": "en_US",
"ArrivalTime": "7:00 AM",
"DepartureTime": "2:00 PM"
}
]
}
]
}
DDL для celsailingtemp:
CREATE TABLE `celsailingtemp` (
`index` int(11) NOT NULL,
`sailingplanid` int(11) DEFAULT NULL,
`sailingid` int(11) DEFAULT NULL,
`sailingplancode` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`marketname` json DEFAULT NULL,
`marketnamelist` json DEFAULT NULL,
`shipid` int(11) DEFAULT NULL,
`shipname` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`departuredate` date DEFAULT NULL,
`duration` int(11) DEFAULT NULL,
`packageid` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`sailingname` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`destinationid` int(11) DEFAULT NULL,
`departureportcode` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`departureportid` int(11) DEFAULT NULL,
`returnportcode` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`returnportid` int(11) DEFAULT NULL,
`packagetypeid` int(11) DEFAULT NULL,
`touronly` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`segment` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`itinerary` json DEFAULT NULL,
PRIMARY KEY (`index`),
UNIQUE KEY `index_UNIQUE` (`index`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8