How to Use JSON in MySQL Wrong
Bill Karwin, Square Inc.
Santa Clara, California | April 23rd – 25th, 2018
Me
• Database Developer at Square Inc.
• MySQL Quality Contributor
• Oracle Ace Director
• Author of "SQL Antipatterns: Avoiding the Pitfalls of Database Programming"
Outline
• Why JSON?
• How do we load JSON data?
• What about LOAD JSON INFILE?
• What about performance?
• What’s that about “generated” columns?
• What about searching multi-valued attributes?
• What about storage size?
• What about client interfaces?
• How to Use JSON in MySQL Right
Why JSON?
Interest in JSON Is Growing
[Chart: Stack Overflow: Percent of MySQL Questions Tagged with JSON, by month from Aug 2008 to Feb 2018; y-axis from 0.00% to 0.80%. Annotations on the timeline: Node.js, ECMA-404, MySQL 5.7 JSON type.]
https://data.stackexchange.com/stackoverflow/query/834289/mysql-and-json-tags-by-month
Why JSON?
• Portable data interchange format
• Easy for humans to read
• Easy for code to use
• It’s not XML
• Flexible schema in an SQL database
• Semi-structured data
• Like a document database
• No more ALTER TABLE?
How do we load JSON data?
Test Data
• Data dump for dba.StackExchange.com
• 987MB of XML
• "All user content contributed to the Stack Exchange network is cc-by-sa 3.0
licensed, intended to be shared and remixed."
• LOAD XML INFILE to import data
• Then copy into equivalent JSON tables
• Let’s see what trouble we find!
Test Data: https://archive.org/details/stackexchange
My code: https://github.com/billkarwin/bk-tools/tree/master/stackexchange
Table: Posts in Traditional Columns
CREATE TABLE Posts (
Id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
PostTypeId TINYINT UNSIGNED NOT NULL,
AcceptedAnswerId INT UNSIGNED NULL COMMENT 'if PostTypeId=1',
ParentId INT UNSIGNED NULL COMMENT 'if PostTypeId=2',
CreationDate DATETIME NOT NULL,
Score SMALLINT NOT NULL DEFAULT 0,
ViewCount INT UNSIGNED NOT NULL DEFAULT 0,
Body TEXT NOT NULL,
OwnerUserId INT NULL,
LastEditorUserId INT NULL,
LastEditDate DATETIME NULL,
LastActivityDate DATETIME NULL,
Title TINYTEXT NOT NULL,
Tags TINYTEXT NOT NULL,
AnswerCount SMALLINT UNSIGNED NOT NULL DEFAULT 0,
CommentCount SMALLINT UNSIGNED NOT NULL DEFAULT 0,
FavoriteCount SMALLINT UNSIGNED NOT NULL DEFAULT 0,
ClosedDate DATETIME NULL
);
Import from XML Source Data
LOAD XML LOCAL INFILE 'Posts.xml' INTO TABLE Posts
(
Id, PostTypeId, AcceptedAnswerId,
ParentId, @CreationDate, Score,
ViewCount, Body, OwnerUserId,
LastEditorUserId, @LastEditDate, @LastActivityDate,
Title, Tags, AnswerCount,
CommentCount, FavoriteCount, @ClosedDate
)
SET CreationDate = STR_TO_DATE(@CreationDate, @DATETIME_ISO8601),
LastEditDate = STR_TO_DATE(@LastEditDate, @DATETIME_ISO8601),
LastActivityDate = STR_TO_DATE(@LastActivityDate, @DATETIME_ISO8601),
ClosedDate = STR_TO_DATE(@ClosedDate, @DATETIME_ISO8601);
150,657 rows
Table: PostsJson to Store Copy of Data in JSON
CREATE TABLE PostsJson (
Id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
Data JSON NOT NULL
);
I’ll copy all the attributes after the primary key into a JSON column
A JSON Object Should Look Something Like This
{
"PostTypeId": 1,
"Title": "What are the main differences between InnoDB and MyISAM?",
"CreationDate": "2011-01-03 20:46:03.000000",
"Score": 180,
... more ...
}
What about LOAD JSON INFILE?
LOAD JSON INFILE?
There is no LOAD JSON INFILE statement (yet).
Please vote for these bug reports / feature requests!
• LOAD DATA will not load a file into a JSON column unless converted
• https://bugs.mysql.com/bug.php?id=79066
• Need a LOAD JSON statement
• https://bugs.mysql.com/bug.php?id=79209
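Until such a statement exists, one possible workaround is to read each line of a file into a user variable and convert it explicitly, as the title of the first bug report hints. The sketch below is an assumption and was not shown in the talk: the file name posts.json is hypothetical, and it assumes one JSON document per line with no literal tab characters inside a document.

```sql
-- Hypothetical input file: one JSON document per line.
-- ESCAPED BY '' keeps LOAD DATA from interpreting the backslashes
-- that JSON uses for its own escape sequences.
LOAD DATA LOCAL INFILE 'posts.json' INTO TABLE PostsJson
  FIELDS TERMINATED BY '\t' ESCAPED BY ''
  LINES TERMINATED BY '\n'
  (@doc)                                    -- whole line goes into a variable
  SET Data = CONVERT(@doc USING utf8mb4);   -- explicit character set conversion
```

Because the file carries no Id, the AUTO_INCREMENT primary key assigns one per row, so the Id values would not necessarily match those in Posts.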
How to Convert Columns into JSON Fields?
INSERT INTO PostsJson (Id, Data)
SELECT Id,
...some magic...
FROM Posts;
Format JSON Using String Concatenation? Can You Spot the Mistakes?
INSERT INTO PostsJson (Id, Data)
SELECT Id, CONCAT('{',
'"PostTypeId": "', PostTypeId, '", ',
'"AcceptedAnswerId": "', AcceptedAnswerId, '", ',
'"ParentId": "', ParentId, '" ',
'"CreationDate": "', CreationDate, '", ',
'"Score": "', Score, '", ',
'"ViewCount": "', ViewCount, ', ',
'"Body": "', Body, '", ',
'"OwnerUserId" ', OwnerUserId, '", ',
'"LastEditorUserId": "', LastEditorUserId, '", ',
'"LastEditDate": "', LastEditDate, '", ',
'"LastActivityDate": "', LastActivityDate,
'"Title": "', Title, '", ',
'"Tags": "', Tags, '", ',
'"AnswerCount": "', AnswerCount, '", '
'"CommentCount": "', CommentCount, '", ',
'"FavoriteCount": "', FavoriteCount, '", ',
'"ClosedDate": "', ClosedDate '", '
'}')
FROM Posts;
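The mistakes:
• missing comma in JSON (after ParentId)
• missing colon in JSON (after "OwnerUserId")
• missing termination (no closing quote and comma after LastActivityDate)
• missing comma in CONCAT (after AnswerCount and after ClosedDate)
• missing double-quote in JSON (after ViewCount)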
It's Easy to Write Invalid JSON
• use [ ] around an array
• use "key": "value", not "key", "value"
• use double-quotes, not single-quotes
Fix Mistakes:
Use JSON_OBJECT() or JSON_ARRAY() to Produce Valid JSON More Easily
INSERT INTO PostsJson (Id, Data)
SELECT Id, JSON_OBJECT(
'PostTypeId', PostTypeId,
'AcceptedAnswerId', AcceptedAnswerId,
'ParentId', ParentId,
'CreationDate', CreationDate,
'Score', Score,
'ViewCount', ViewCount,
'Body', Body,
'OwnerUserId', OwnerUserId,
'LastEditorUserId', LastEditorUserId,
'LastEditDate', LastEditDate,
'LastActivityDate', LastActivityDate,
'Title', Title,
'Tags', Tags,
'AnswerCount', AnswerCount,
'CommentCount', CommentCount,
'FavoriteCount', FavoriteCount,
'ClosedDate', ClosedDate
)
FROM Posts;
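A few standalone statements can make the contrast concrete. This is a minimal sketch using literal values instead of the Posts table; it also illustrates two further hazards of string concatenation that the slide does not call out, namely NULL arguments and unescaped quotes.

```sql
-- JSON_VALID() returns 1 for well-formed JSON and 0 otherwise.
SELECT JSON_VALID('{"Score": 180}') AS WellFormed,    -- 1
       JSON_VALID('{"Score" 180}')  AS MissingColon;  -- 0

-- CONCAT() returns NULL if any argument is NULL, so a single NULL
-- column (such as ParentId on a question) nulls the whole document.
SELECT CONCAT('{"ParentId": "', NULL, '"}') AS Doc;   -- NULL

-- JSON_OBJECT() maps SQL NULL to JSON null and escapes embedded
-- double-quotes in string values.
SELECT JSON_OBJECT('ParentId', NULL,
                   'Title', 'a "quoted" title') AS Doc;
```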
JSON Extraction Function
CREATE TABLE PostsJson (
Id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
Data JSON NOT NULL
);
SELECT Id,
JSON_EXTRACT(Data, '$.Title'),
JSON_EXTRACT(Data, '$.ParentId'),
JSON_EXTRACT(Data, '$.Body')
FROM PostsJson
WHERE Id = 12828;
JSON Extraction Operator
SELECT Id,
Data->'$.Title',
Data->'$.ParentId',
Data->'$.Body'
FROM PostsJson
WHERE Id = 12828;
One Down, Five to Go…
✓ Posts
• Badges
• Comments
• PostHistory
• Users
• Votes
What about performance?
Indexes for Optimization
• Avoid a table-scan — use an index to find matching rows
EXPLAIN SELECT * FROM PostHistory WHERE UserId = 2703;
id: 1
select_type: SIMPLE
table: PostHistory
partitions: NULL
type: ref
possible_keys: UserId
key: UserId
key_len: 5
ref: const
rows: 138
filtered: 100.00
Extra: Using index
index on UserId
small number, close to the actual number of matching rows
No Support for Indexes
• Like any expression, a search on a JSON function can't use an index
EXPLAIN SELECT * FROM PostHistoryJson WHERE Data->'$.UserId' = 2703;
id: 1
select_type: SIMPLE
table: PostHistoryJson
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 459294
filtered: 100.00
Extra: Using where
table-scan reads ALL rows in the table
large number
How Does That Perform?
With Index on traditional table
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000124 |
| checking permissions | 0.000012 |
| Opening tables       | 0.000057 |
| init                 | 0.000010 |
| System lock          | 0.000014 |
| optimizing           | 0.000015 |
| statistics           | 0.000094 |
| preparing            | 0.000020 |
| executing            | 0.000006 |
| Sending data         | 0.000090 |
| end                  | 0.000009 |
| query end            | 0.000012 |
| closing tables       | 0.000015 |
| freeing items        | 0.000027 |
| cleaning up          | 0.000014 |
+----------------------+----------+
With Table-Scan on JSON table
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000076 |
| checking permissions | 0.000008 |
| Opening tables       | 0.000042 |
| init                 | 0.000007 |
| System lock          | 0.000009 |
| optimizing           | 0.000013 |
| statistics           | 0.000019 |
| preparing            | 0.000015 |
| executing            | 0.000004 |
| Sending data         | 0.694767 |
| end                  | 0.000012 |
| query end            | 0.000009 |
| closing tables       | 0.000011 |
| freeing items        | 0.000017 |
| cleaning up          | 0.000020 |
+----------------------+----------+
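Stage timings in this shape can be collected with session profiling. The following is a sketch of one way to do it, assuming the PostHistoryJson table from the earlier slide; the talk does not say which commands produced the tables above. SHOW PROFILE is deprecated in favor of the Performance Schema, but it is still available in MySQL 5.7 and 8.0.

```sql
SET profiling = 1;   -- enable profiling for this session only

-- The statement to measure (the table-scan query from the earlier slide).
SELECT * FROM PostHistoryJson WHERE Data->'$.UserId' = 2703;

SHOW PROFILE;        -- Status / Duration rows for the most recent statement

SET profiling = 0;   -- turn profiling back off
```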
All Right—Can We Make an Index on JSON?
• No, JSON columns don't support indexes directly
ALTER TABLE PostsJson ADD INDEX (Data);
ERROR 3152 (42000): JSON column 'Data' supports indexing only via
generated columns on a specified JSON path.
What’s that about “generated” columns?
Generated Columns
• Define a column as an expression using other columns in the same row
ALTER TABLE Posts ADD COLUMN CreationMonth TINYINT UNSIGNED
AS (MONTH(CreationDate));
• You can then query it, like a VIEW at the column level
SELECT * FROM Posts WHERE CreationMonth = 4;
• It's still a table-scan so far
EXPLAIN SELECT * FROM Posts WHERE CreationMonth = 4;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | Posts | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 145232 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
Generated Columns
• Index this virtual column to optimize
ALTER TABLE Posts ADD KEY (CreationMonth);
EXPLAIN SELECT * FROM Posts WHERE CreationMonth = 4;
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows  | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+
|  1 | SIMPLE      | Posts | NULL       | ref  | CreationMonth | CreationMonth | 2       | const | 11658 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+
• The index is also used if you use the expression
EXPLAIN SELECT * FROM Posts WHERE MONTH(CreationDate) = 4;
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key           | key_len | ref   | rows  | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+
|  1 | SIMPLE      | Posts | NULL       | ref  | CreationMonth | CreationMonth | 2       | const | 11658 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+
Generated Columns Using JSON
• You can use any scalar expression—including JSON functions
ALTER TABLE PostsJson ADD COLUMN CreationDate DATETIME
AS (Data->'$.CreationDate');
• Add index to optimize
ALTER TABLE PostsJson ADD KEY (CreationDate);
EXPLAIN SELECT * FROM PostsJson WHERE CreationDate = '2018-04-20';
+----+-------------+-----------+------------+------+---------------+--------------+---------+-------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key          | key_len | ref   | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+--------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | PostsJson | NULL       | ref  | CreationDate  | CreationDate | 6       | const |    1 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+--------------+---------+-------+------+----------+-------+
• But with JSON, using the expression doesn't cue the use of the index
EXPLAIN SELECT * FROM PostsJson WHERE Data->'$.CreationDate' = '2018-04-20';
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | PostsJson | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 119795 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
Declare a Foreign Key on a Generated Column
ALTER TABLE PostsJson
ADD COLUMN PostTypeId TINYINT UNSIGNED
AS (Data->'$.PostTypeId'),
ADD FOREIGN KEY (PostTypeId)
REFERENCES PostTypes(Id);
ERROR 1215 (HY000): Cannot add foreign key constraint
ALTER TABLE PostsJson
ADD COLUMN PostTypeId TINYINT UNSIGNED
AS (Data->'$.PostTypeId') STORED,
ADD FOREIGN KEY (PostTypeId)
REFERENCES PostTypes(Id);
foreign keys use STORED generated columns, not VIRTUAL
But the Next Foreign Key Doesn't Work?
ALTER TABLE PostsJson
ADD COLUMN AcceptedAnswerId INT UNSIGNED
AS (Data->'$.AcceptedAnswerId') STORED,
ADD FOREIGN KEY (AcceptedAnswerId)
REFERENCES Posts(Id);
ERROR 3156 (22018): Invalid JSON value for CAST to INTEGER
from column json_extract at row 1
Naturally, Some Posts Don’t Have an Accepted Answer
SELECT JSON_PRETTY(Data) FROM PostsJson LIMIT 1;
{
"Tags": "<mysql><innodb><myisam>",
"Score": 180,
"Title": "What are the main differences between InnoDB and MyISAM?",
"ParentId": null,
"ViewCount": 172059,
"ClosedDate": null,
"PostTypeId": 1,
"AnswerCount": 10,
"OwnerUserId": 8,
"CommentCount": 1,
"CreationDate": "2011-01-03 20:46:03.000000",
"LastEditDate": null,
"FavoriteCount": 105,
"AcceptedAnswerId": null,
"LastActivityDate": "2017-03-09 13:33:48.000000",
"LastEditorUserId": null
}
Is That a SQL NULL? No…
SELECT IFNULL(Data->'$.AcceptedAnswerId', 'missing')
AS AcceptedAnswerId
FROM PostsJson WHERE Id = 12828;
+------------------+
| AcceptedAnswerId |
+------------------+
| null |
+------------------+
a real SQL NULL would have defaulted to the second argument; it would also be spelled in caps
Is That a String 'null'? No…
SELECT Data->'$.AcceptedAnswerId' = 'null'
AS AcceptedAnswerId
FROM PostsJson WHERE Id = 12828;
+------------------+
| AcceptedAnswerId |
+------------------+
| 0 |
+------------------+
how can 'null' != 'null'?
It's Actually a Very Small JSON Document: 'null'
CREATE TABLE WhatIsIt AS
SELECT Data->'$.AcceptedAnswerId'
AS AcceptedAnswerId
FROM PostsJson WHERE Id = 12828;
CREATE TABLE `WhatIsIt` (
`AcceptedAnswerId` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
the type is revealed
https://bugs.mysql.com/bug.php?id=85755
Get a Scalar Value with JSON_UNQUOTE() or the ->> Operator
SELECT Data->>'$.AcceptedAnswerId' = 'null'
AS AcceptedAnswerId
FROM PostsJson WHERE Id = 12828;
+------------------+
| AcceptedAnswerId |
+------------------+
| 1 |
+------------------+
now it’s string 'null' = 'null'
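The three cases (JSON null, missing attribute, quoted vs. unquoted string) can be compared side by side. This is a small sketch on a scratch table; the table name NullDemo and its single row are hypothetical and not part of the talk's data set.

```sql
-- Hypothetical scratch table with one document.
CREATE TEMPORARY TABLE NullDemo (Data JSON NOT NULL);
INSERT INTO NullDemo (Data)
VALUES ('{"Title": "InnoDB vs MyISAM", "AcceptedAnswerId": null}');

SELECT
  JSON_TYPE(Data->'$.AcceptedAnswerId') AS TypeOfJsonNull,   -- 'NULL' (a JSON null literal)
  Data->'$.AcceptedAnswerId' IS NULL    AS JsonNullIsSqlNull, -- 0
  Data->'$.NoSuchKey' IS NULL           AS MissingIsSqlNull,  -- 1
  Data->'$.Title'                       AS Quoted,            -- keeps the JSON double-quotes
  Data->>'$.Title'                      AS Unquoted,          -- plain string
  JSON_UNQUOTE(JSON_EXTRACT(Data, '$.Title')) AS SameAsUnquoted
FROM NullDemo;
```

The ->> operator is shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)), so the last two columns return the same value.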
But This Still Doesn't Work
ALTER TABLE PostsJson
ADD COLUMN AcceptedAnswerId INT UNSIGNED
AS (Data->>'$.AcceptedAnswerId') STORED,
ADD FOREIGN KEY (AcceptedAnswerId)
REFERENCES Posts(Id);
ERROR 1366 (HY000): Incorrect integer value: 'null' for
column 'AcceptedAnswerId' at row 1
strict mode is on by default, so implicit type conversions are errors
Disable Strict Mode? Bad Idea…
ALTER TABLE PostsJson
ADD COLUMN AcceptedAnswerId INT UNSIGNED
AS (Data->>'$.AcceptedAnswerId') STORED,
ADD FOREIGN KEY (AcceptedAnswerId)
REFERENCES Posts(Id);
ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint
fails (`stackexchange`.`#sql-6182_7d`, CONSTRAINT `postsjson_ibfk_2` FOREIGN KEY
(`AcceptedAnswerId`) REFERENCES `posts` (`id`))
integer value of string 'null' = 0, but there is no Posts.Id = 0
Instead, Convert the String 'null' to SQL NULL
ALTER TABLE PostsJson
ADD COLUMN AcceptedAnswerId INT UNSIGNED
AS (NULLIF(Data->>'$.AcceptedAnswerId', 'null')) STORED,
ADD FOREIGN KEY (AcceptedAnswerId)
REFERENCES Posts(Id);
Query OK, 150657 rows affected (4.04 sec)
Records: 150657 Duplicates: 0 Warnings: 0
Alternative: Remove Each Attribute That Is 'null'
UPDATE PostsJson
SET Data = JSON_REMOVE(Data, '$.AcceptedAnswerId')
WHERE Data->>'$.AcceptedAnswerId' = 'null';
Query OK, 120030 rows affected (7.64 sec)
Rows matched: 120030 Changed: 120030 Warnings: 0
ALTER TABLE PostsJson
ADD COLUMN AcceptedAnswerId INT UNSIGNED
AS (Data->'$.AcceptedAnswerId') STORED,
ADD FOREIGN KEY (AcceptedAnswerId)
REFERENCES Posts(Id);
Query OK, 150657 rows affected (4.71 sec)
Records: 150657 Duplicates: 0 Warnings: 0
simple extract operator returns SQL NULL for missing JSON attribute
How to Index JSON Attributes, Really
• ALTER TABLE to add generated columns with expressions to extract the JSON attributes
• A foreign key requires generated columns to be STORED, not VIRTUAL
• Adding a VIRTUAL generated column is an online DDL change
• Adding a STORED generated column must perform a table-copy
• Nullable attributes must be either:
- Removed from the JSON document, so JSON_EXTRACT() returns an SQL NULL
- Extracted, unquoted, and then converted to SQL NULL in the generated column's AS expression
• Finally, declare KEY or FOREIGN KEY on the generated columns
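Put together for an attribute that needs an index but no foreign key, the steps above might look like the following. This is a sketch, assuming the PostsJson table from the earlier slides; OwnerUserId is chosen here only as an illustration of a nullable attribute.

```sql
-- VIRTUAL is enough for a plain index, so no table-copy is needed.
-- NULLIF() turns the unquoted string 'null' into a real SQL NULL.
ALTER TABLE PostsJson
  ADD COLUMN OwnerUserId INT UNSIGNED
    AS (NULLIF(Data->>'$.OwnerUserId', 'null')) VIRTUAL,
  ADD KEY (OwnerUserId);

-- Search by the generated column name so the optimizer can use the index.
EXPLAIN SELECT Id FROM PostsJson WHERE OwnerUserId = 2703;
```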
What about searching multi-valued attributes?
Some Attributes Are Multi-Valued
SELECT Data FROM PostsJson LIMIT 1;
{
"Tags": "<mysql><innodb><myisam>",
...
SELECT SUBSTRING_INDEX(
SUBSTRING_INDEX(Data->>'$.Tags', '<', 2), '>', -2) AS Tag1
FROM PostsJson LIMIT 1;
+---------+
| Tag1 |
+---------+
| <mysql> |
+---------+
Convert a List into a JSON Array
UPDATE PostsJson
SET Data = JSON_SET(Data, '$.Tags', JSON_ARRAY(
SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 2), '>', -2),
SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 3), '>', -2),
SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 4), '>', -2),
SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 5), '>', -2),
SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 6), '>', -2)));
SELECT Data->'$.Tags' AS Tags FROM PostsJson LIMIT 1;
+-------------------------------------------------------------+
| Tags |
+-------------------------------------------------------------+
| ["<mysql>", "<innodb>", "<myisam>", "<myisam>", "<myisam>"] |
+-------------------------------------------------------------+
Search the Array with JSON_SEARCH()
SELECT Id FROM PostsJson
WHERE JSON_SEARCH(Data->'$.Tags', 'one', '<innodb>') IS NOT NULL
LIMIT 1;
+-------+
| Id |
+-------+
| 19298 |
+-------+
Can We Index That? Yes—But Only for One Specific Tag
ALTER TABLE PostsJson
ADD COLUMN TagInnodb BOOLEAN AS
(JSON_SEARCH(Data->'$.Tags', 'one', '<innodb>') IS NOT NULL),
ADD KEY (TagInnoDB);
EXPLAIN SELECT * FROM PostsJson WHERE TagInnoDB = 1;
+----+-------------+-----------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key       | key_len | ref   | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | PostsJson | NULL       | ref  | TagInnodb     | TagInnodb | 2       | const |    1 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
Can We Index Every Possible Tag Value? Probably Not…
ALTER TABLE PostsJson
ADD COLUMN TagMysql BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<mysql>') IS NOT NULL),
ADD KEY (TagMysql);
ALTER TABLE PostsJson
ADD COLUMN TagMongodb BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<mongodb>') IS NOT NULL),
ADD KEY (TagMongodb);
ALTER TABLE PostsJson
ADD COLUMN TagOracle BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<oracle>') IS NOT NULL),
ADD KEY (TagOracle);
ALTER TABLE PostsJson
ADD COLUMN TagSqlite BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<sqlite>') IS NOT NULL),
ADD KEY (TagSqlite);
...
ERROR 1069 (42000): Too many keys specified; max 64 keys allowed
How Can We Index Any Tag?
• Many-to-many relationship between Posts and Tags needs its own table:
CREATE TABLE PostsTags (
PostId INT UNSIGNED,
TagId INT UNSIGNED,
PRIMARY KEY (PostId, TagId),
FOREIGN KEY (PostId) REFERENCES PostsJson(Id),
FOREIGN KEY (TagId) REFERENCES Tags(Id)
);
• Fill this table with one row per pairing
• Use one index to search for any tag!
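With the pairing table filled, a search for any tag becomes an ordinary join. The query below is a sketch; it assumes the Tags table has columns Id and TagName, which the slides do not show.

```sql
-- Find posts tagged 'innodb'. The foreign key on TagId gives InnoDB
-- an index on that column, so the same index serves every tag value.
SELECT p.Id, p.Data->>'$.Title' AS Title
FROM Tags AS t
JOIN PostsTags AS pt ON pt.TagId = t.Id
JOIN PostsJson AS p  ON p.Id = pt.PostId
WHERE t.TagName = 'innodb';   -- TagName is an assumed column name
```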
What about storage size?
JSON Data Takes 120% — 317% Space (average 194%)
[Bar chart: Data Length, y-axis from 0 to 800,000,000, for Badges, Comments, PostHistory, Posts, Users, Votes; series: SQL Tables vs. JSON Tables]
JSON Data + Indexes Takes 110% — 202% Space (average 154%)
[Bar chart: Data Length + Index Length, y-axis from 0 to 800,000,000, for Badges, Comments, PostHistory, Posts, Users, Votes; series: SQL Tables vs. JSON Tables]
Your Mileage May Vary
• The increased size of JSON depends on several factors:
• Number of indexes
• Generated columns as STORED vs. VIRTUAL
• Data types of attributes
• Length of attribute values
• Length of attribute names
Length of INT Values Matters
CREATE TABLE IntLengthTest1 (id SERIAL PRIMARY KEY, d JSON);
INSERT INTO IntLengthTest1 SET d = JSON_OBJECT('a', 1234567890);
CREATE TABLE IntLengthTest2 LIKE IntLengthTest1;
INSERT INTO IntLengthTest2 SET d = JSON_OBJECT('a', '1234567890');
Double the rows until both tables have 1048576 rows
INSERT INTO IntLengthTest1 (d)
SELECT d FROM IntLengthTest1; /* repeat 20 times */
INSERT INTO IntLengthTest2 (d)
SELECT d FROM IntLengthTest2; /* repeat 20 times */
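One way to compare the resulting sizes is to read the table statistics from information_schema. This is a sketch and an assumption about method; the talk does not state how its size figures were gathered. TABLE_ROWS is only an estimate for InnoDB tables.

```sql
-- Refresh the statistics first so the reported lengths are current.
ANALYZE TABLE IntLengthTest1, IntLengthTest2;

SELECT TABLE_NAME, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('IntLengthTest1', 'IntLengthTest2');
```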
Length of INT Values Matters
[Bar chart: table size, y-axis from 0 to 70,000,000, comparing "a": 1234567890 with "a": "1234567890"]
Length of Attribute Names Matters
CREATE TABLE AttrLengthTest1 (id SERIAL PRIMARY KEY, d JSON);
INSERT INTO AttrLengthTest1 SET d = JSON_OBJECT('a', 123);
CREATE TABLE AttrLengthTest2 LIKE AttrLengthTest1;
INSERT INTO AttrLengthTest2 SET d =
JSON_OBJECT('supercalifragilisticexpialidocious', 123);
Double the rows until both tables have 1048576 rows
INSERT INTO AttrLengthTest1 (d)
SELECT d FROM AttrLengthTest1; /* repeat 20 times */
INSERT INTO AttrLengthTest2 (d)
SELECT d FROM AttrLengthTest2; /* repeat 20 times */
Length of Attribute Names Matters
[Bar chart: table size, y-axis from 0 to 90,000,000, comparing "a": 123 with "supercalifragilisticexpialidocious": 123]
What about client interfaces?
Java
• JSON data is returned as java.lang.String
• Use a library to parse a JSON string into a Java object, or format an
object into JSON
• JSON.simple: https://github.com/fangyidong/json-simple
• FasterXML Jackson: https://github.com/FasterXML/jackson
• Google GSON: https://github.com/google/gson
• Oracle JSONP: https://jsonp.java.net/
• Good article with performance comparisons:
- https://blog.takipi.com/the-ultimate-json-library-json-simple-vs-gson-vs-jackson-vs-json/
Go
• JSON data is returned as a string
• Use the standard encoding/json package
• json.Unmarshal() to parse a JSON string into a Go slice, map, or struct
• json.Marshal() to convert a slice, map, or struct to JSON
PHP
• JSON data is returned as a string
• Use built-in functions to convert between JSON strings and PHP structures
• json_decode()
• json_encode()
How to Use JSON in MySQL Right
Stability vs. Maneuverability
https://commons.wikimedia.org/wiki/Category:Cessna_landings#/media/File:Mainland_Air_Cessna_152_ZK-FCQ_Dunedin,_NZ.jpg
https://commons.wikimedia.org/wiki/Lockheed_Martin_F-22_Raptor#/media/File:Raptor_F-22_27th.jpg
Use JSON Like a Document Store
• Search by the PRIMARY KEY where possible
SELECT * FROM PostsJson WHERE Id = 19298;
• Good to use indexed generated columns in WHERE or ORDER BY
SELECT * FROM PostsJson WHERE OwnerUserId = 2703
ORDER BY CreationDate;
• Extracting fields is fine when only displaying them in the SELECT-list
SELECT Data->>'$.Title' AS Title FROM PostsJson WHERE Id = 19298;
Use JSON Like a Document Store
• X DevAPI to use MySQL like a document store
• No need to install another NoSQL product
• https://dev.mysql.com/doc/refman/5.7/en/document-store.html
• Go to the presentation "MySQL 8.0: a Document Store with all the benefits of a transactional RDBMS"
Use SQL and Normalization
• Use JSON as flexible schema only when you need it
• User-defined fields
• Alternative to EAV
• Log-type data
• Storing nested arrays or objects is denormalized design
• Use dependent tables for multi-valued attributes if you want to search or sort by them
• Use traditional columns instead of JSON fields for constraints
Capacity Planning
• Allocate 2x – 3x storage and buffer pool for JSON data
• Good reason to use JSON only for a subset of your data
Application Design
• Prefer to encode & decode JSON in your app, not in SQL
• Move that computation out to edge servers to scale out the load
• SQL should treat JSON as a "black box," i.e. irreducible strings
• Test performance cost of JSON encoding & decoding functions
See Upcoming Books
MySQL and JSON: A Practical Programming Guide
(2018-06-08) by David Stokes
https://www.mhprofessional.com/mysql-and-json-a-practical-programming-guide
Introducing the MySQL 8 Document Store
(2018-07-31) by Charles Bell
https://www.apress.com/us/book/9781484227244
Rate My Session
License and Copyright
Copyright 2018 Bill Karwin
http://www.slideshare.net/billkarwin
Released under a Creative Commons 3.0 License:
http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share (to copy, distribute, and transmit this work) under the following conditions:
Attribution. You must attribute this work to Bill Karwin.
Noncommercial. You may not use this work for commercial purposes.
No Derivative Works. You may not alter, transform, or build upon this work.

More Related Content

PDF
Models for hierarchical data
PPTX
Accreditation of health care organization
KEY
Trees In The Database - Advanced data structures
PDF
Extensible Data Modeling
PDF
Trees and Hierarchies in SQL
PDF
PDF
Sql query patterns, optimized
PPTX
Health promotion and education in school By Sourabh Kosey
Models for hierarchical data
Accreditation of health care organization
Trees In The Database - Advanced data structures
Extensible Data Modeling
Trees and Hierarchies in SQL
Sql query patterns, optimized
Health promotion and education in school By Sourabh Kosey

What's hot (20)

PDF
The MySQL Query Optimizer Explained Through Optimizer Trace
PDF
Sql Antipatterns Strike Back
PPTX
PPTX
Indexing with MongoDB
PPTX
Thrift vs Protocol Buffers vs Avro - Biased Comparison
PDF
Introduction to MongoDB
PDF
Distributed computing with spark
PDF
[Pgday.Seoul 2020] SQL Tuning
PDF
Recursive Query Throwdown
PDF
InnoDB MVCC Architecture (by 권건우)
PDF
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
PDF
Database Anti Patterns
PDF
Streaming Operational Data with MariaDB MaxScale
PDF
MySQL Advanced Administrator 2021 - 네오클로바
PDF
Inside MongoDB: the Internals of an Open-Source Database
ODP
Deep Dive Into Elasticsearch
PDF
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
PDF
Advanced MySQL Query Tuning
PDF
InnoDB Internal
PPTX
Why TypeScript?
The MySQL Query Optimizer Explained Through Optimizer Trace
Sql Antipatterns Strike Back
Indexing with MongoDB
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Introduction to MongoDB
Distributed computing with spark
[Pgday.Seoul 2020] SQL Tuning
Recursive Query Throwdown
InnoDB MVCC Architecture (by 권건우)
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Database Anti Patterns
Streaming Operational Data with MariaDB MaxScale
MySQL Advanced Administrator 2021 - 네오클로바
Inside MongoDB: the Internals of an Open-Source Database
Deep Dive Into Elasticsearch
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
Advanced MySQL Query Tuning
InnoDB Internal
Why TypeScript?
Ad

Similar to How to Use JSON in MySQL Wrong (20)

PDF
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
PPTX
BGOUG15: JSON support in MySQL 5.7
PDF
CREATE INDEX … USING VODKA. VODKA CONNECTING INDEXES, Олег Бартунов, Александ...
PDF
Introduction to MySQL Query Tuning for Dev[Op]s
PPTX
The rise of json in rdbms land jab17
PPTX
Cassandra 2.2 & 3.0
PPTX
N1QL: What's new in Couchbase 5.0
KEY
Benefits of using MongoDB: Reduce Complexity & Adapt to Changes
PDF
Sql server 2016: System Databases, data types, DML, json, and built-in functions
PPTX
Php forum2015 tomas_final
PDF
Uncovering SQL Server query problems with execution plans - Tony Davis
PDF
Using JSON with MariaDB and MySQL
PDF
MySQL 8.0 Preview: What Is Coming?
PPTX
PostgreSQL 9.4 JSON Types and Operators
PPTX
Alasql JavaScript SQL Database Library: User Manual
PDF
NoSQL для PostgreSQL: Jsquery — язык запросов
PDF
Conquering JSONB in PostgreSQL
PPTX
PDF
Beyond php - it's not (just) about the code
5_MariaDB_What's New in MariaDB Server 10.2 and Big Data Analytics with Maria...
BGOUG15: JSON support in MySQL 5.7
CREATE INDEX … USING VODKA. VODKA CONNECTING INDEXES, Олег Бартунов, Александ...
Introduction to MySQL Query Tuning for Dev[Op]s
The rise of json in rdbms land jab17
Cassandra 2.2 & 3.0
N1QL: What's new in Couchbase 5.0
Benefits of using MongoDB: Reduce Complexity & Adapt to Changes
Sql server 2016: System Databases, data types, DML, json, and built-in functions
Php forum2015 tomas_final
How to Use JSON in MySQL Wrong

  • 1. How to Use JSON in MySQL Wrong Bill Karwin, Square Inc. Santa Clara, California | April 23rd – 25th, 2018
  • 2. 2 Me • Database Developer at Square Inc. • MySQL Quality Contributor • Oracle Ace Director • Author of "SQL Antipatterns: Avoiding the Pitfalls of Database Programming"
  • 3. 3 Outline • Why JSON? • How do we load JSON data? • What about LOAD JSON INFILE? • What about performance? • What’s that about “generated” columns? • What about searching multi-valued attributes? • What about storage size? • What about client interfaces? • How to Use JSON in MySQL Right
  • 5. Interest in JSON Is Growing [Chart: Stack Overflow: Percent of MySQL Questions Tagged with JSON, by month from 1-Aug-2008 to 1-Feb-2018; y-axis from 0.00% to 0.80%; annotations: Node.js, ECMA-404, MySQL 5.7 JSON type] https://data.stackexchange.com/stackoverflow/query/834289/mysql-and-json-tags-by-month
  • 6. 6 Why JSON? • Portable data interchange format • Easy for humans to read • Easy for code to use • It’s not XML • Flexible schema in an SQL database • Semi-structured data • Like a document database • No more ALTER TABLE?
  • 7. 7 How do we load JSON data?
  • 8. 8 Test Data • Data dump for dba.StackExchange.com • 987MB of XML • "All user content contributed to the Stack Exchange network is cc-by-sa 3.0 licensed, intended to be shared and remixed." • LOAD XML INFILE to import data • Then copy into equivalent JSON tables • Let’s see what trouble we find! Test Data: https://archive.org/details/stackexchange My code: https://github.com/billkarwin/bk-tools/tree/master/stackexchange
  • 10. Table: Posts in Traditional Columns CREATE TABLE Posts ( Id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, PostTypeId TINYINT UNSIGNED NOT NULL, AcceptedAnswerId INT UNSIGNED NULL COMMENT 'if PostTypeId=1', ParentId INT UNSIGNED NULL COMMENT 'if PostTypeId=2', CreationDate DATETIME NOT NULL, Score SMALLINT NOT NULL DEFAULT 0, ViewCount INT UNSIGNED NOT NULL DEFAULT 0, Body TEXT NOT NULL, OwnerUserId INT NULL, LastEditorUserId INT NULL, LastEditDate DATETIME NULL, LastActivityDate DATETIME NULL, Title TINYTEXT NOT NULL, Tags TINYTEXT NOT NULL, AnswerCount SMALLINT UNSIGNED NOT NULL DEFAULT 0, CommentCount SMALLINT UNSIGNED NOT NULL DEFAULT 0, FavoriteCount SMALLINT UNSIGNED NOT NULL DEFAULT 0, ClosedDate DATETIME NULL );
  • 11. Import from XML Source Data 11 LOAD XML LOCAL INFILE 'Posts.xml' INTO TABLE Posts ( Id, PostTypeId, AcceptedAnswerId, ParentId, @CreationDate, Score, ViewCount, Body, OwnerUserId, LastEditorUserId, @LastEditDate, @LastActivityDate, Title, Tags, AnswerCount, CommentCount, FavoriteCount, @ClosedDate ) SET CreationDate = STR_TO_DATE(@CreationDate, @DATETIME_ISO8601), LastEditDate = STR_TO_DATE(@LastEditDate, @DATETIME_ISO8601), LastActivityDate = STR_TO_DATE(@LastActivityDate, @DATETIME_ISO8601), ClosedDate = STR_TO_DATE(@ClosedDate, @DATETIME_ISO8601); 150,657 rows
  • 12. Table: PostsJson to Store Copy of Data in JSON CREATE TABLE PostsJson ( Id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, Data JSON NOT NULL ); I’ll copy all the attributes after the primary key into a JSON column
  • 13. A JSON Object Should Look Something Like This 13 { "PostTypeId": 1, "Title": "What are the main differences between InnoDB and MyISAM?", "CreationDate": "2011-01-03 20:46:03.000000", "Score": 180, ... more ... }
  • 14. 14 What about LOAD JSON INFILE?
  • 15. 15 LOAD JSON INFILE? There is no LOAD JSON INFILE statement (yet). Please vote for these bug report / feature requests! • LOAD DATA will not load a file into a JSON column unless converted • https://bugs.mysql.com/bug.php?id=79066 • Need a LOAD JSON statement • https://bugs.mysql.com/bug.php?id=79209
  • 16. How to Convert Columns into JSON Fields? INSERT INTO PostsJson (Id, Data) SELECT Id, ...some magic... FROM Posts;
  • 17. Format JSON Using String Concatenation? Can You Spot the Mistakes? INSERT INTO PostsJson (Id, Data) SELECT Id, CONCAT('{', '"PostTypeId": "', PostTypeId, '", ', '"AcceptedAnswerId": "', AcceptedAnswerId, '", ', '"ParentId": "', ParentId, '" ', '"CreationDate": "', CreationDate, '", ', '"Score": "', Score, '", ', '"ViewCount": "', ViewCount, ', ', '"Body": "', Body, '", ', '"OwnerUserId" ', OwnerUserId, '", ', '"LastEditorUserId": "', LastEditorUserId, '", ', '"LastEditDate": "', LastEditDate, '", ', '"LastActivityDate": "', LastActivityDate, '"Title": "', Title, '", ', '"Tags": "', Tags, '", ', '"AnswerCount": "', AnswerCount, '", ' '"CommentCount": "', CommentCount, '", ', '"FavoriteCount": "', FavoriteCount, '", ', '"ClosedDate": "', ClosedDate '", ' '}') FROM Posts; Mistakes flagged on the slide: missing comma in JSON; missing colon in JSON; missing termination; missing comma in CONCAT; missing double-quote in JSON.
  • 18. It's Easy to Write Invalid JSON. Fix mistakes: • use [ ] around an array • use "key": "value", not "key", "value" • use double-quotes, not single-quotes
  • 19. Use JSON_OBJECT() or JSON_ARRAY() to Produce Valid JSON More Easily INSERT INTO PostsJson (Id, Data) SELECT Id, JSON_OBJECT( 'PostTypeId', PostTypeId, 'AcceptedAnswerId', AcceptedAnswerId, 'ParentId', ParentId, 'CreationDate', CreationDate, 'Score', Score, 'ViewCount', ViewCount, 'Body', Body, 'OwnerUserId', OwnerUserId, 'LastEditorUserId', LastEditorUserId, 'LastEditDate', LastEditDate, 'LastActivityDate', LastActivityDate, 'Title', Title, 'Tags', Tags, 'AnswerCount', AnswerCount, 'CommentCount', CommentCount, 'FavoriteCount', FavoriteCount, 'ClosedDate', ClosedDate ) FROM Posts;
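A quick way to see the difference between hand-built and function-built JSON is to test a few literals. The sketch below assumes MySQL 5.7 or later and uses made-up sample values rather than rows from the Posts table; it only illustrates the point of slides 17 to 19.

```sql
-- JSON_VALID() returns 1 for well-formed JSON text and 0 otherwise.
SELECT JSON_VALID('{"Score": 180}')   AS double_quoted,  -- 1
       JSON_VALID('{"Score" 180}')    AS missing_colon,  -- 0
       JSON_VALID('{''Score'': 180}') AS single_quoted;  -- 0

-- JSON_OBJECT() escapes embedded double-quotes for us and
-- turns an SQL NULL argument into a JSON null.
SELECT JSON_OBJECT('Title', 'Why is "InnoDB" the default?',
                   'ParentId', NULL,
                   'Score', 180) AS Data;
```

With CONCAT(), the embedded quotes in the sample title would have produced invalid JSON, and a NULL argument would have turned the whole concatenated string into NULL.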
  • 20. JSON Extraction Function 20 CREATE TABLE PostsJson ( Id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, Data JSON NOT NULL ); SELECT Id, JSON_EXTRACT(Data, '$.Title'), JSON_EXTRACT(Data, '$.ParentId'), JSON_EXTRACT(Data, '$.Body') FROM PostsJson WHERE Id = 12828
  • 21. JSON Extraction Operator 21 SELECT Id, Data->'$.Title', Data->'$.ParentId', Data->'$.Body' FROM PostsJson WHERE Id = 12828
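One detail that is easy to miss: what comes back from JSON_EXTRACT() or -> is itself a JSON value, so a string is returned with its double-quotes. A small sketch on a literal document (the title text is invented for illustration; the -> operator needs a column on its left side, so the function form is used here):

```sql
SELECT JSON_EXTRACT('{"Title": "InnoDB vs. MyISAM"}', '$.Title')
         AS extracted,   -- JSON string, shown with its double-quotes
       JSON_UNQUOTE(JSON_EXTRACT('{"Title": "InnoDB vs. MyISAM"}', '$.Title'))
         AS unquoted;    -- plain text, no double-quotes
```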
  • 22. One Down, Five to Go… ✓ Posts • Badges • Comments • PostHistory • Users • Votes
  • 25. Indexes for Optimization • Avoid a table-scan: use an index to find matching rows EXPLAIN SELECT * FROM PostHistory WHERE UserId = 2703; id: 1 select_type: SIMPLE table: PostHistory partitions: NULL type: ref possible_keys: UserId key: UserId key_len: 5 ref: const rows: 138 filtered: 100.00 Extra: Using index. Slide callouts: index on UserId; small number, close to the actual number of matching rows.
  • 26. No Support for Indexes • Like any expression, a search on a JSON function can't use an index EXPLAIN SELECT * FROM PostHistoryJson WHERE Data->'$.UserId' = 2703; id: 1 select_type: SIMPLE table: PostHistoryJson partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 459294 filtered: 100.00 Extra: Using where. Slide callouts: table-scan reads ALL rows in the table; large number.
  • 27. How Does That Perform?
    +----------------------+----------------------------+----------------------------+
    | Status               | Duration, with index on    | Duration, with table-scan  |
    |                      | traditional table          | on JSON table              |
    +----------------------+----------------------------+----------------------------+
    | starting             | 0.000124                   | 0.000076                   |
    | checking permissions | 0.000012                   | 0.000008                   |
    | Opening tables       | 0.000057                   | 0.000042                   |
    | init                 | 0.000010                   | 0.000007                   |
    | System lock          | 0.000014                   | 0.000009                   |
    | optimizing           | 0.000015                   | 0.000013                   |
    | statistics           | 0.000094                   | 0.000019                   |
    | preparing            | 0.000020                   | 0.000015                   |
    | executing            | 0.000006                   | 0.000004                   |
    | Sending data         | 0.000090                   | 0.694767                   |
    | end                  | 0.000009                   | 0.000012                   |
    | query end            | 0.000012                   | 0.000009                   |
    | closing tables       | 0.000015                   | 0.000011                   |
    | freeing items        | 0.000027                   | 0.000017                   |
    | cleaning up          | 0.000014                   | 0.000020                   |
    +----------------------+----------------------------+----------------------------+
  • 28. 28 All Right—Can We Make an Index on JSON? • No, JSON columns don't support indexes directly ALTER TABLE PostsJson ADD INDEX (Data); ERROR 3152 (42000): JSON column 'Data' supports indexing only via generated columns on a specified JSON path.
  • 29. 29 What’s that about “generated” columns?
  • 30. 30 Generated Columns • Define a column as an expression using other columns in the same row ALTER TABLE Posts ADD COLUMN CreationMonth TINYINT UNSIGNED AS (MONTH(CreationDate)); • You can then query it, like a VIEW at the column level SELECT * FROM Posts WHERE CreationMonth = 4; • It's still a table-scan so far EXPLAIN SELECT * FROM Posts WHERE CreationMonth = 4; +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ | 1 | SIMPLE | Posts | NULL | ALL | NULL | NULL | NULL | NULL | 145232 | 10.00 | Using where | +----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+
  • 31. 31 Generated Columns • Index this virtual column to optimize ALTER TABLE Posts ADD KEY (CreationMonth); EXPLAIN SELECT * FROM Posts WHERE CreationMonth = 4; +----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+ | 1 | SIMPLE | Posts | NULL | ref | CreationMonth | CreationMonth | 2 | const | 11658 | 100.00 | NULL | +----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+ • The index is also used if you use the expression EXPLAIN SELECT * FROM Posts WHERE MONTH(CreationDate) = 4; +----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+ | 1 | SIMPLE | Posts | NULL | ref | CreationMonth | CreationMonth | 2 | const | 11658 | 100.00 | NULL | +----+-------------+-------+------------+------+---------------+---------------+---------+-------+-------+----------+-------+
  • 32. 32 Generated Columns Using JSON • You can use any scalar expression—including JSON functions ALTER TABLE PostsJson ADD COLUMN CreationDate DATETIME AS (Data->'$.CreationDate'); • Add index to optimize ALTER TABLE PostsJson ADD KEY (CreationDate); EXPLAIN SELECT * FROM PostsJson WHERE CreationDate = '2018-04-20'; +----+-------------+-----------+------------+------+---------------+--------------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+--------------+---------+-------+------+----------+-------+ | 1 | SIMPLE | PostsJson | NULL | ref | CreationDate | CreationDate | 6 | const | 1 | 100.00 | NULL | +----+-------------+-----------+------------+------+---------------+--------------+---------+-------+------+----------+-------+ • But with JSON, using the expression doesn't cue the use of the index EXPLAIN SELECT * FROM PostsJson WHERE Data->'$.CreationDate' = '2018-04-20'; +----+-------------+-----------+------------+------+---------------+------+---------+-------+--------+----------+--------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+------+---------+-------+--------+----------+--------------+ | 1 | SIMPLE | PostsJson | NULL | ALL | NULL | NULL | NULL | NULL | 119795 | 100.00 | Using where | +----+-------------+-----------+------------+------+---------------+------+---------+-------+--------+----------+--------------+
  • 33. Declare a Foreign Key on a Generated Column 33 ALTER TABLE PostsJson ADD COLUMN PostTypeId TINYINT UNSIGNED AS (Data->'$.PostTypeId'), ADD FOREIGN KEY (PostTypeId) REFERENCES PostTypes(Id); ERROR 1215 (HY000): Cannot add foreign key constraint ALTER TABLE PostsJson ADD COLUMN PostTypeId TINYINT UNSIGNED AS (Data->'$.PostTypeId') STORED, ADD FOREIGN KEY (PostTypeId) REFERENCES PostTypes(Id); foreign keys use STORED generated columns, not VIRTUAL
  • 34. But the Next Foreign Key Doesn't Work? 34 ALTER TABLE PostsJson ADD COLUMN AcceptedAnswerId INT UNSIGNED AS (Data->'$.AcceptedAnswerId') STORED, ADD FOREIGN KEY (AcceptedAnswerId) REFERENCES Posts(Id); ERROR 3156 (22018): Invalid JSON value for CAST to INTEGER from column json_extract at row 1
  • 35. Naturally, Some Posts Don’t Have an Accepted Answer 35 SELECT JSON_PRETTY(Data) FROM PostsJson LIMIT 1; { "Tags": "<mysql><innodb><myisam>", "Score": 180, "Title": "What are the main differences between InnoDB and MyISAM?", "ParentId": null, "ViewCount": 172059, "ClosedDate": null, "PostTypeId": 1, "AnswerCount": 10, "OwnerUserId": 8, "CommentCount": 1, "CreationDate": "2011-01-03 20:46:03.000000", "LastEditDate": null, "FavoriteCount": 105, "AcceptedAnswerId": null, "LastActivityDate": "2017-03-09 13:33:48.000000", "LastEditorUserId": null }
  • 36. Is That a SQL NULL? No… 36 SELECT IFNULL(Data->'$.AcceptedAnswerId', 'missing') AS AcceptedAnswerId FROM PostsJson WHERE Id = 12828; +------------------+ | AcceptedAnswerId | +------------------+ | null | +------------------+ a real SQL NULL would have defaulted to the second argument; it would also be spelled in caps
  • 37. Is That a String 'null'? No… 37 SELECT Data->'$.AcceptedAnswerId' = 'null' AS AcceptedAnswerId FROM PostsJson WHERE Id = 12828; +------------------+ | AcceptedAnswerId | +------------------+ | 0 | +------------------+ how can 'null' != 'null'?
  • 38. It's Actually a Very Small JSON Document: 'null' 38 CREATE TABLE WhatIsIt AS SELECT Data->'$.AcceptedAnswerId' AS AcceptedAnswerId FROM PostsJson WHERE Id = 12828; CREATE TABLE `WhatIsIt` ( `AcceptedAnswerId` json DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8 the type is revealed https://bugs.mysql.com/bug.php?id=85755
  • 39. Get a Scalar Value with JSON_UNQUOTE() or the ->> Operator SELECT Data->>'$.AcceptedAnswerId' = 'null' AS AcceptedAnswerId FROM PostsJson WHERE Id = 12828; +------------------+ | AcceptedAnswerId | +------------------+ | 1 | +------------------+ now it’s string 'null' = 'null'
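The three kinds of "null" from slides 36 to 39 can be put side by side on a literal document. This is only a sketch; the one-key document is invented so that no table is needed.

```sql
SELECT JSON_TYPE(JSON_EXTRACT('{"a": null}', '$.a'))
         AS json_type_of_a,   -- 'NULL': the name of a JSON type
       JSON_EXTRACT('{"a": null}', '$.a') IS NULL
         AS a_is_sql_null,    -- 0: a JSON null is not an SQL NULL
       JSON_EXTRACT('{"a": null}', '$.b') IS NULL
         AS b_is_sql_null,    -- 1: a missing path gives an SQL NULL
       JSON_UNQUOTE(JSON_EXTRACT('{"a": null}', '$.a'))
         AS a_unquoted;       -- the four-character string 'null'
```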
  • 40. But This Still Doesn't Work 40 ALTER TABLE PostsJson ADD COLUMN AcceptedAnswerId INT UNSIGNED AS (Data->>'$.AcceptedAnswerId') STORED, ADD FOREIGN KEY (AcceptedAnswerId) REFERENCES Posts(Id); ERROR 1366 (HY000): Incorrect integer value: 'null' for column 'AcceptedAnswerId' at row 1 strict mode is on by default, so implicit type conversions are errors
  • 41. Disable Strict Mode? Bad Idea… 41 ALTER TABLE PostsJson ADD COLUMN AcceptedAnswerId INT UNSIGNED AS (Data->>'$.AcceptedAnswerId') STORED, ADD FOREIGN KEY (AcceptedAnswerId) REFERENCES Posts(Id); ERROR 1452 (23000): Cannot add or update a child row: a foreign key constraint fails (`stackexchange`.`#sql-6182_7d`, CONSTRAINT `postsjson_ibfk_2` FOREIGN KEY (`AcceptedAnswerId`) REFERENCES `posts` (`id`)) integer value of string 'null' = 0 but there is no Posts.Id = 0
  • 42. Instead, Convert the String 'null' to SQL NULL 42 ALTER TABLE PostsJson ADD COLUMN AcceptedAnswerId INT UNSIGNED AS (NULLIF(Data->>'$.AcceptedAnswerId', 'null')) STORED, ADD FOREIGN KEY (AcceptedAnswerId) REFERENCES Posts(Id); Query OK, 150657 rows affected (4.04 sec) Records: 150657 Duplicates: 0 Warnings: 0
  • 43. Alternative: Remove Each Attribute That Is 'null' 43 UPDATE PostsJson SET Data = JSON_REMOVE(Data, '$.AcceptedAnswerId') WHERE Data->>'$.AcceptedAnswerId' = 'null'; Query OK, 120030 rows affected (7.64 sec) Rows matched: 120030 Changed: 120030 Warnings: 0 ALTER TABLE PostsJson ADD COLUMN AcceptedAnswerId INT UNSIGNED AS (Data->'$.AcceptedAnswerId') STORED, ADD FOREIGN KEY (AcceptedAnswerId) REFERENCES Posts(Id); Query OK, 150657 rows affected (4.71 sec) Records: 150657 Duplicates: 0 Warnings: 0 simple extract operator returns SQL NULL for missing JSON attribute
  • 44. How to Index JSON Attributes, Really • Use ALTER TABLE to add generated columns with expressions that extract the JSON attributes • A foreign key requires generated columns to be STORED, not VIRTUAL • Adding a VIRTUAL generated column is an online DDL change • Adding a STORED generated column must perform a table-copy • Nullable attributes must be either: • Removed from the JSON document, so JSON_EXTRACT() returns an SQL NULL • Extracted, unquoted, and then converted to SQL NULL in the generated column's AS expression • Finally, declare KEY or FOREIGN KEY on the generated columns
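Putting the recipe together for one more attribute might look like the following. It is a sketch that assumes the PostsJson table from the earlier slides and picks ParentId, which slide 35 shows as a JSON null for questions; no foreign key is declared, so the column can stay VIRTUAL.

```sql
-- Step 1: generated column; NULLIF() turns the string 'null' into SQL NULL.
ALTER TABLE PostsJson
  ADD COLUMN ParentId INT UNSIGNED
    AS (NULLIF(Data->>'$.ParentId', 'null')) VIRTUAL;

-- Step 2: index the generated column.
ALTER TABLE PostsJson
  ADD KEY (ParentId);

-- Step 3: search by the generated column, not by the JSON expression.
EXPLAIN SELECT Id FROM PostsJson WHERE ParentId = 12828;
```

One caveat with the NULLIF() approach: a document that really contains the string "null" as a value would also be mapped to SQL NULL.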
  • 45. 45 What about searching multi-valued attributes?
  • 46. Some Attributes Are Multi-Valued SELECT Data FROM PostsJson LIMIT 1; { "Tags": "<mysql><innodb><myisam>", ... SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(Data->>'$.Tags', '<', 2), '>', -2) AS Tag1 FROM PostsJson LIMIT 1; +---------+ | Tag1 | +---------+ | <mysql> | +---------+
  • 47. Convert a List into a JSON Array UPDATE PostsJson SET Data = JSON_SET(Data, '$.Tags', JSON_ARRAY( SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 2), '>', -2), SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 3), '>', -2), SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 4), '>', -2), SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 5), '>', -2), SUBSTRING_INDEX(SUBSTRING_INDEX(Data->>'$.Tags', '<', 6), '>', -2))); SELECT Data->'$.Tags' AS Tags FROM PostsJson LIMIT 1; +-------------------------------------------------------------+ | Tags | +-------------------------------------------------------------+ | ["<mysql>", "<innodb>", "<myisam>", "<myisam>", "<myisam>"] | +-------------------------------------------------------------+
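The fixed five-slot conversion repeats the last tag when a post has fewer than five tags, as the output above shows. One possible alternative, offered only as a sketch, rewrites the delimiters and casts the result to JSON. It assumes tag names never contain a double-quote or backslash, and it is exactly the kind of string concatenation the earlier slides warn about, so treat it with suspicion.

```sql
-- Illustration on a literal: '><' between tags becomes '>","<',
-- and the whole string is wrapped in '["' ... '"]'.
-- This should yield a three-element array with no repeated tags.
SELECT CAST(
         CONCAT('["', REPLACE('<mysql><innodb><myisam>', '><', '>","<'), '"]')
       AS JSON) AS Tags;

-- The same idea as an UPDATE, run instead of the five-slot version.
-- It skips rows where Tags is empty, JSON null, or already an array.
UPDATE PostsJson
SET Data = JSON_SET(Data, '$.Tags',
      CAST(CONCAT('["', REPLACE(Data->>'$.Tags', '><', '>","<'), '"]') AS JSON))
WHERE JSON_TYPE(Data->'$.Tags') = 'STRING'
  AND Data->>'$.Tags' <> '';
```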
  • 48. Search the Array with JSON_SEARCH() SELECT Id FROM PostsJson WHERE JSON_SEARCH(Data->'$.Tags', 'one', '<innodb>') IS NOT NULL LIMIT 1; +-------+ | Id | +-------+ | 19298 | +-------+
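JSON_CONTAINS() is another way to ask whether an array holds a given element. The sketch below uses literal arrays; note that the candidate must itself be JSON, so the tag is wrapped in double-quotes inside the SQL string. Like JSON_SEARCH(), this is still an expression in the WHERE clause, so by itself it does nothing to avoid the table-scan.

```sql
SELECT JSON_CONTAINS('["<mysql>", "<innodb>"]', '"<innodb>"') AS has_innodb,  -- 1
       JSON_CONTAINS('["<mysql>", "<innodb>"]', '"<oracle>"') AS has_oracle;  -- 0

-- Against the table, the optional third argument names the path:
SELECT Id FROM PostsJson
WHERE JSON_CONTAINS(Data, '"<innodb>"', '$.Tags')
LIMIT 1;
```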
  • 49. Can We Index That? Yes—But Only for One Specific Tag 49 ALTER TABLE PostsJson ADD COLUMN TagInnodb BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<innodb>') IS NOT NULL), ADD KEY (TagInnoDB); EXPLAIN SELECT * FROM PostsJson WHERE TagInnoDB = 1; +----+-------------+-----------+------------+------+---------------+-----------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+------+---------------+-----------+---------+-------+------+----------+-------+ | 1 | SIMPLE | PostsJson | NULL | ref | TagInnodb | TagInnodb | 2 | const | 1 | 100.00 | NULL | +----+-------------+-----------+------------+------+---------------+-----------+---------+-------+------+----------+-------+
  • 50. Can We Index Every Possible Tag Value? Probably Not… 50 ALTER TABLE PostsJson ADD COLUMN TagMysql BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<mysql>') IS NOT NULL), ADD KEY (TagMysql); ALTER TABLE PostsJson ADD COLUMN TagMongodb BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<mongodb>') IS NOT NULL), ADD KEY (TagMongodb); ALTER TABLE PostsJson ADD COLUMN TagOracle BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<oracle>') IS NOT NULL), ADD KEY (TagOracle); ALTER TABLE PostsJson ADD COLUMN TagSqlite BOOLEAN AS (JSON_SEARCH(Data->'$.Tags', 'one', '<sqlite>') IS NOT NULL), ADD KEY (TagSqlite); ... ERROR 1069 (42000): Too many keys specified; max 64 keys allowed
  • 51. How Can We Index Any Tag? • Many-to-many relationship between Posts and Tags needs its own table: CREATE TABLE PostsTags ( PostId INT UNSIGNED, TagId INT UNSIGNED, PRIMARY KEY (PostId, TagId), FOREIGN KEY (PostId) REFERENCES PostsJson, FOREIGN KEY (TagId) REFERENCES Tags ); • Fill this table with one row per pairing • Use one index to search for any tag!
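A fuller version of this design might look like the sketch below. The Tags table layout (Id, TagName) and the extra KEY are assumptions made for illustration; they are not taken from the slides. The referenced columns are spelled out because MySQL, as far as I know, does not accept a REFERENCES clause without a column list.

```sql
-- Hypothetical lookup table; column names are assumptions.
CREATE TABLE Tags (
  Id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  TagName VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE PostsTags (
  PostId INT UNSIGNED NOT NULL,
  TagId  INT UNSIGNED NOT NULL,
  PRIMARY KEY (PostId, TagId),
  KEY (TagId, PostId),                 -- the one index that serves any tag
  FOREIGN KEY (PostId) REFERENCES PostsJson (Id),
  FOREIGN KEY (TagId)  REFERENCES Tags (Id)
);

-- Find posts for any tag through the index on PostsTags.
SELECT p.Id, p.Data->>'$.Title' AS Title
FROM Tags AS t
JOIN PostsTags AS pt ON pt.TagId = t.Id
JOIN PostsJson AS p  ON p.Id = pt.PostId
WHERE t.TagName = 'innodb';
```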
  • 53. JSON Data Takes 120% to 317% Space (average 194%) [Bar chart: Data Length for Badges, Comments, PostHistory, Posts, Users, Votes; SQL Tables vs. JSON Tables; axis from 0 to 800,000,000]
  • 54. JSON Data + Indexes Takes 110% to 202% Space (average 154%) [Bar chart: Data Length + Index Length for Badges, Comments, PostHistory, Posts, Users, Votes; SQL Tables vs. JSON Tables; axis from 0 to 800,000,000]
  • 55. 55 Your Mileage May Vary • The increased size of JSON depends on several factors: • Number of indexes • Generated columns as STORED vs. VIRTUAL • Data types of attributes • Length of attribute values • Length of attribute names
  • 56. Length of INT Values Matters 56 CREATE TABLE IntLengthTest1 (id SERIAL PRIMARY KEY, d JSON); INSERT INTO IntLengthTest1 SET d = JSON_OBJECT('a', 1234567890); CREATE TABLE IntLengthTest2 LIKE IntLengthTest1; INSERT INTO IntLengthTest2 SET d = JSON_OBJECT('a', '1234567890'); Double the rows until both tables have 1048576 rows INSERT INTO IntLengthTest1 (d) SELECT d FROM IntLengthTest1; /* repeat 20 times */ INSERT INTO IntLengthTest2 (d) SELECT d FROM IntLengthTest2; /* repeat 20 times */
  • 57. Length of INT Values Matters [Bar chart comparing "a": 1234567890 with "a": "1234567890"; axis from 0 to 70,000,000]
  • 58. Length of Attribute Names Matters CREATE TABLE AttrLengthTest1 (id SERIAL PRIMARY KEY, d JSON); INSERT INTO AttrLengthTest1 SET d = JSON_OBJECT('a', 123); CREATE TABLE AttrLengthTest2 LIKE AttrLengthTest1; INSERT INTO AttrLengthTest2 SET d = JSON_OBJECT('supercalifragilisticexpialidocious', 123); Double the rows until both tables have 1048576 rows INSERT INTO AttrLengthTest1 (d) SELECT d FROM AttrLengthTest1; /* repeat 20 times */ INSERT INTO AttrLengthTest2 (d) SELECT d FROM AttrLengthTest2; /* repeat 20 times */
  • 59. Length of Attribute Names Matters [Bar chart comparing "a": 123 with "supercalifragilisticexpialidocious": 123; axis from 0 to 90,000,000]
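To repeat this kind of size comparison on your own tables, one option is to read the table statistics from information_schema. The sketch assumes the four test tables above live in the current schema. For InnoDB these figures are estimates and may lag behind recent inserts, so this is a rough check rather than an exact measurement, and it is not necessarily how the charts above were produced.

```sql
SELECT TABLE_NAME,
       TABLE_ROWS,                                  -- approximate for InnoDB
       DATA_LENGTH,
       INDEX_LENGTH,
       DATA_LENGTH + INDEX_LENGTH AS total_length
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('IntLengthTest1', 'IntLengthTest2',
                     'AttrLengthTest1', 'AttrLengthTest2')
ORDER BY TABLE_NAME;
```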
  • 60. 60 What about client interfaces?
  • 61. 61 Java • JSON data is returned as java.lang.String • Use a library to parse a JSON string into a Java object, or format an object into JSON • JSON.simple: https://github.com/fangyidong/json-simple • FasterXML Jackson: https://github.com/FasterXML/jackson • Google GSON: https://github.com/google/gson • Oracle JSONP: https://jsonp.java.net/ • Good article with performance comparisons: - https://blog.takipi.com/the-ultimate-json-library-json-simple-vs-gson-vs-jackson-vs-json/
  • 62. Go • JSON data is returned as a string • Use the standard encoding/json package • json.Unmarshal() to parse a JSON string into a Go array or map • json.Marshal() to convert an array or an object to a JSON string
  • 63. PHP • JSON data is returned as a string • Use builtin functions to convert between JSON strings and PHP structures • json_decode() • json_encode()
  • 64. 64 How to Use JSON in MySQL Right
  • 65. 65 Stability vs. Maneuverability https://commons.wikimedia.org/wiki/Category:Cessna_landings#/media/File:Mainland_Air_Cessna_152_ZK-FCQ_Dunedin,_NZ.jpg https://commons.wikimedia.org/wiki/Lockheed_Martin_F-22_Raptor#/media/File:Raptor_F-22_27th.jpg
  • 66. Use JSON Like a Document Store • Search by the PRIMARY KEY where possible SELECT * FROM PostsJson WHERE Id = 19298; • Good to use indexed generated columns in WHERE or ORDER BY SELECT * FROM PostsJson WHERE OwnerUserId = 2703 ORDER BY CreationDate; • Extracting fields is fine when only displaying them in the SELECT-list SELECT Data->>'$.Title' AS Title FROM PostsJson WHERE Id = 19298;
  • 67. 67 Use JSON Like a Document Store • X DevAPI to use MySQL like a document store • No need to install another NoSQL product • https://dev.mysql.com/doc/refman/5.7/en/document-store.html • Go to the presentation "MySQL 8.0: a Document Store with all the benefits of a transactional RDBMS"
  • 68. 68 Use SQL and Normalization • Use JSON as flexible schema only when you need it • User-defined fields • Alternative to EAV • Log-type data • Storing nested arrays or objects is denormalized design • Use dependent tables for multi-valued attributes if you want to search or sort by them • Use traditional columns instead of JSON fields for constraints
  • 69. 69 Capacity Planning • Allocate 2x – 3x storage and buffer pool for JSON data • Good reason to use JSON only for a subset of your data
  • 70. 70 Application Design • Prefer to encode & decode JSON in your app, not in SQL • Move that computation out to edge servers to scale out the load • SQL should treat JSON as a "black box," i.e. irreducible strings • Test performance cost of JSON encoding & decoding functions
  • 71. 71 See Upcoming Books MySQL and JSON: A Practical Programming Guide (2018-06-08) by David Stokes https://www.mhprofessional.com/mysql-and-json-a-practical-programming-guide Introducing the MySQL 8 Document Store (2018-07-31) by Charles Bell https://www.apress.com/us/book/9781484227244
  • 73. 73 License and Copyright Copyright 2018 Bill Karwin http://www.slideshare.net/billkarwin Released under a Creative Commons 3.0 License: http://creativecommons.org/licenses/by-nc-nd/3.0/ You are free to share—to copy, distribute, and transmit this work, under the following conditions: Attribution. You must attribute this work to Bill Karwin. Noncommercial. You may not use this work for commercial purposes. No Derivative Works. You may not alter, transform, or build upon this work.