Introduction
JSON has become one of the most widely used data formats. It is used for API data exchange, log files, configuration files, and many other applications. This tip gives a quick overview of JSON and shows how to analyze JSON data with a command-line tool called “jq”.
What is JSON?
JSON is a lightweight text-based open standard designed for human-readable data interchange.
JSON vs XML
JSON
{"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}
XML
<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName>
</employee>
<employee>
<firstName>Anna</firstName> <lastName>Smith</lastName>
</employee>
<employee>
<firstName>Peter</firstName> <lastName>Jones</lastName>
</employee>
</employees>
- JSON doesn’t use end tags
- JSON is shorter
- JSON is quicker to read and write
- JSON can use arrays
JSON syntax
JSON syntax is derived from JavaScript object notation syntax.
- Objects are in {}
- Data in objects is represented in key/value pairs (dictionary).
- Arrays are in []
- Data in objects and arrays is separated by ,
- Objects and arrays can be nested (e.g. an array of objects, an array of arrays, etc.).
Supported Data types:
- String
- Number
- Object
- Array
- Boolean (true, false)
- null
Example
{
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 27,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
},
"accounts": ["facebook","twitter","instagram"],
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
},
{
"type": "mobile",
"number": "123 456-7890"
}
],
"children": [],
"spouse": null
}
In the above example you can find:
- String key/value pairs, e.g.:
"firstName": "John"
- Number key/value pairs, e.g.:
"age": 27
- Boolean key/value pairs, e.g.:
"isAlive": true
- null values, e.g.:
"spouse": null
- A nested object, e.g.:
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021-3100"
}
- An array of strings, e.g.:
"accounts": ["facebook","twitter","instagram"]
- An array of objects, e.g.:
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
},
{
"type": "mobile",
"number": "123 456-7890"
}
]
jq
jq is a command line JSON parser. It can be used to format and filter JSON data.
Install
macOS
brew install jq
Debian and Ubuntu
sudo apt-get install jq
Fedora
sudo dnf install jq
Windows
choco install jq
For more details you can refer to https://stedolan.github.io/jq/download/
The simplest jq program is the expression “.”, which takes the input and produces it unchanged as output. It can be used to pretty-print JSON. For example, take the file below, which contains an IAM policy:
$ cat sample1.txt
{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1507018975000", "Effect": "Allow", "Action": [ "ssm:PutParameter", "ssm:GetParameter" ], "Resource": [ "*" ] } ] }
This is very hard to read. However, if we pipe it through “jq”, the output is indented and colored.
$ cat sample1.txt | jq '.'
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1507018975000",
"Effect": "Allow",
"Action": [
"ssm:PutParameter",
"ssm:GetParameter"
],
"Resource": [
"*"
]
}
]
}
NOTE: If the output is too long and you want to scroll through it with less while keeping the colors, use the -C option with jq and the -R option with less.
$ cat sample1.txt | jq '.' -C | less -R
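The reverse direction can be handy too, for example when writing one JSON document per line. Below is a minimal sketch, assuming jq's -c (compact output) option and a small made-up policy rather than sample1.txt:

```shell
# Made-up, pretty-printed policy used only for illustration.
pretty='{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow" }
  ]
}'

# -c prints each JSON value on a single line with no extra whitespace.
compact=$(printf '%s' "$pretty" | jq -c '.')
echo "$compact"
```

The result is the same document collapsed onto one line, which is the shape sample1.txt has before formatting.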
Object Identifier .foo, .foo.bar
As stated, JSON objects consist of key/value pairs (dictionaries). To get the value of a specific key in a JSON object, you can use .foo if the key is foo.
Example: To get the “Version” in the file “sample1.txt” shown above you can use the below command:
$ cat sample1.txt | jq '.Version'
"2012-10-17"
If an object is nested inside another object you can use .foo.bar where “foo” is the key in the outer object and “bar” is the key in the inner object.
Example: In the below AWS CloudTrail event (sample2.txt):
{
"eventVersion": "1.0",
"userIdentity": {
"type": "IAMUser",
"principalId": "EX_PRINCIPAL_ID",
"arn": "arn:aws:iam::123456789012:user/Alice",
"accessKeyId": "EXAMPLE_KEY_ID",
"accountId": "123456789012",
"userName": "Alice"
},
"eventTime": "2014-03-06T21:22:54Z",
"eventSource": "ec2.amazonaws.com",
"eventName": "StartInstances",
"awsRegion": "us-east-2",
"sourceIPAddress": "205.251.233.176",
"userAgent": "ec2-api-tools 1.6.12.2",
"requestParameters": {"instancesSet": {"items": [{"instanceId": "i-ebeaf9e2"}]}},
"responseElements": {"instancesSet": {"items": [{
"instanceId": "i-ebeaf9e2",
"currentState": {
"code": 0,
"name": "pending"
},
"previousState": {
"code": 80,
"name": "stopped"
}
}]}}
}
If we want to find the ARN of the user that invoked the event we can use “.userIdentity.arn” as shown below.
$ cat sample2.txt | jq '.userIdentity.arn'
"arn:aws:iam::123456789012:user/Alice"
Array Index .[0]
For JSON arrays, you can select a specific item using .[n] where n is the zero-based index of the item (0 is the first item).
Example: In the below AWS IAM policy (sample3.txt):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "FirstStatement",
"Effect": "Allow",
"Action": ["iam:ChangePassword"],
"Resource": "*"
},
{
"Sid": "SecondStatement",
"Effect": "Allow",
"Action": ["s3:ListAllMyBuckets"],
"Resource": "*"
},
{
"Sid": "ThirdStatement",
"Effect": "Allow",
"Action": [
"s3:List*",
"s3:Get*"
],
"Resource": [
"arn:aws:s3:::confidential-data",
"arn:aws:s3:::confidential-data/*"
],
"Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
}
]
}
To get all statements:
$ cat sample3.txt | jq '.Statement'
To get the first statement:
$ cat sample3.txt | jq '.Statement[0]'
{
"Sid": "FirstStatement",
"Effect": "Allow",
"Action": [
"iam:ChangePassword"
],
"Resource": "*"
}
To get the second statement:
$ cat sample3.txt | jq '.Statement[1]'
{
"Sid": "SecondStatement",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets"
],
"Resource": "*"
}
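Indexes can also count from the end of the array. The sketch below uses a trimmed-down stand-in for sample3.txt (only the Sid fields are kept, which is an assumption made for brevity) to show a negative index and what happens when the index is out of range:

```shell
# Trimmed-down stand-in for sample3.txt (only the Sid fields are kept).
policy='{"Statement":[{"Sid":"FirstStatement"},{"Sid":"SecondStatement"},{"Sid":"ThirdStatement"}]}'

# A negative index counts from the end; -r prints the string without quotes.
last=$(printf '%s' "$policy" | jq -r '.Statement[-1].Sid')

# An index past the end of the array yields null rather than an error.
missing=$(printf '%s' "$policy" | jq '.Statement[10]')

echo "$last"     # ThirdStatement
echo "$missing"  # null
```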
Array Slice .[0:2]
For JSON arrays, using .[n:m] returns a slice of the array: the first returned item is the one at index “n” and the last returned item is the one at index “m-1”.
Example: To get the first 2 statements of the above policy (sample3.txt):
$ cat sample3.txt | jq '.Statement[0:2]'
[
{
"Sid": "FirstStatement",
"Effect": "Allow",
"Action": [
"iam:ChangePassword"
],
"Resource": "*"
},
{
"Sid": "SecondStatement",
"Effect": "Allow",
"Action": [
"s3:ListAllMyBuckets"
],
"Resource": "*"
}
]
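Either end of the slice can be left out, and negative numbers count from the end. Here is a small sketch on a made-up array of action names (not taken from the sample files):

```shell
# Made-up array used only for illustration.
actions='["s3:List*", "s3:Get*", "s3:Put*", "s3:Delete*"]'

# Omitting m: everything from index 1 to the end.
from_second=$(printf '%s' "$actions" | jq -c '.[1:]')

# Omitting n: everything before index 2.
first_two=$(printf '%s' "$actions" | jq -c '.[:2]')

# Negative n: the last two items.
last_two=$(printf '%s' "$actions" | jq -c '.[-2:]')

echo "$from_second"  # ["s3:Get*","s3:Put*","s3:Delete*"]
echo "$first_two"    # ["s3:List*","s3:Get*"]
echo "$last_two"     # ["s3:Put*","s3:Delete*"]
```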
Array iterator .[]
For JSON arrays, .[] iterates through all the items of the array. This means that you can apply further filters to every item in the array.
Example: to get the Sids of all statements in the policy in sample3.txt.
$ cat sample3.txt | jq '.Statement[].Sid'
"FirstStatement"
"SecondStatement"
"ThirdStatement"
Example: to get all Actions mentioned in all statements.
$ cat sample3.txt | jq '.Statement[].Action[]'
"iam:ChangePassword"
"s3:ListAllMyBuckets"
"s3:List*"
"s3:Get*"
Comma ,
If two filters are separated by a comma, then the same input will be fed into both and the two filters’ output value streams will be concatenated in order: first, all of the outputs produced by the left expression, and then all of the outputs produced by the right.
Example: In sample3.txt, the below command will output the Sids of all statements followed by the Version.
$ cat sample3.txt | jq '.Statement[].Sid , .Version'
"FirstStatement"
"SecondStatement"
"ThirdStatement"
"2012-10-17"
Pipe |
The | operator combines two filters by feeding the output(s) of the one on the left into the input of the one on the right.
Example: In sample3.txt, the below command will show the resources of all statements. This doesn’t seem useful yet, as the result would be the same without the pipe, but it will become clear why this is useful in the upcoming examples.
$ cat sample3.txt | jq '.Statement[] | .Resource'
"*"
"*"
[
"arn:aws:s3:::confidential-data",
"arn:aws:s3:::confidential-data/*"
]
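One place where the pipe already makes a visible difference is in combination with the comma. The sketch below assumes a two-statement policy invented for this illustration (the "Deny" effect is not in sample3.txt) and compares the two forms:

```shell
# Made-up policy with two statements.
policy='{"Statement":[{"Sid":"FirstStatement","Effect":"Allow"},{"Sid":"SecondStatement","Effect":"Deny"}]}'

# With a pipe, both fields are produced for each statement in turn.
grouped=$(printf '%s' "$policy" | jq -r '.Statement[] | .Sid, .Effect')

# Without it, all Sids come first, followed by all Effects.
separate=$(printf '%s' "$policy" | jq -r '.Statement[].Sid, .Statement[].Effect')

echo "$grouped"
echo "---"
echo "$separate"
```

The first command prints FirstStatement, Allow, SecondStatement, Deny. The second prints FirstStatement, SecondStatement, Allow, Deny. This is because the comma binds more tightly than the pipe.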
Array construction [ ]
To make the output an array you can place the filters between [ ].
Example:
$ cat sample3.txt | jq '[.Statement[].Sid] '
[
"FirstStatement",
"SecondStatement",
"ThirdStatement"
]
Object construction { }
To make the output an object you can place the filters between { } with the appropriate keys.
Example:
$ cat sample3.txt | jq '.Statement[] | {"s":.Sid,"a":.Action} '
{
"s": "FirstStatement",
"a": [
"iam:ChangePassword"
]
}
{
"s": "SecondStatement",
"a": [
"s3:ListAllMyBuckets"
]
}
{
"s": "ThirdStatement",
"a": [
"s3:List*",
"s3:Get*"
]
}
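When the output keys should simply match the input keys, jq also accepts a shorter form, and a key can be computed by wrapping an expression in parentheses. This is a minimal sketch on a made-up two-statement policy:

```shell
# Made-up policy with two statements.
policy='{"Statement":[{"Sid":"FirstStatement","Effect":"Allow"},{"Sid":"SecondStatement","Effect":"Deny"}]}'

# {Sid, Effect} is shorthand for {"Sid": .Sid, "Effect": .Effect}.
short=$(printf '%s' "$policy" | jq -c '.Statement[] | {Sid, Effect}')

# A parenthesized expression is evaluated and used as the key.
keyed=$(printf '%s' "$policy" | jq -c '.Statement[] | {(.Sid): .Effect}')

echo "$short"
echo "$keyed"
```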
Filter objects select(boolean_expression)
To filter multiple objects so that only the ones meeting a certain condition are included, use select(boolean_expression). The boolean expression can be built with the below operators/functions.
- ==
- !=
- > >=
- < <=
- and
- or
- not
- has
- test
- contains
- startswith
- endswith
Examples:
Below are some examples of querying AWS CloudTrail logs. Each log file has the below structure, and each event looks like the example in sample2.txt mentioned above. The below examples assume that multiple log files have been unzipped and placed in the current directory.
{
"Records": [
{ cloudtrail event },
{ cloudtrail event },
...
{ cloudtrail event }
]
}
- List all “DescribeInstances” events (using ==):
$ cat * | jq '.Records[] | select(.eventName=="DescribeInstances")'
- List all RDS events (using ==):
$ cat * | jq '.Records[] | select(.eventSource=="rds.amazonaws.com")'
- List all Describe events (using startswith):
$ cat * | jq '.Records[] | select(.eventName | startswith("Describe")) | .eventName' -r
DescribeStackResource
DescribeStackResource
DescribeStackResource
DescribeStackResource
DescribeStackResource
DescribeLoadBalancerAttributes
...
- List all RDS Describe events (using startswith and ==):
$ cat * | jq '.Records[] | select(.eventName | startswith("Describe")) | select(.eventSource=="rds.amazonaws.com") | .eventName' -r
DescribeDBSecurityGroups
DescribeDBSnapshots
DescribeDBInstances
DescribeDBInstances
DescribeDBClusters
- List all events whose name contains the word “certificate” or “Certificate” (using test):
$ cat * | jq '.Records[] | select(.eventName | test("[Cc]ertificate")) | .eventName' -r
ListCertificates
ListTagsForCertificate
DescribeCertificate
DescribeCertificate
ListTagsForCertificate
ListTagsForCertificate
ListTagsForCertificate
DescribeCertificate
...
- List all events that have errors (using has):
$ cat * | jq '.Records[] | select(has("errorCode")) | {"error":.errorCode,"event":.eventName,"time":.eventTime}'
{
"error": "NoSuchCORSConfiguration",
"event": "GetBucketCors",
"time": "2018-02-28T05:23:18Z"
}
{
"error": "NoSuchCORSConfiguration",
"event": "GetBucketCors",
"time": "2018-02-28T05:23:18Z"
}
...
- List all events that contain “Describe” or “List” (using contains and or):
$ cat * | jq '.Records[] | select((.eventName | contains("Describe")) or (.eventName | contains("List")) ) | {"event":.eventName,"time":.eventTime}'
{
"event": "DescribeStackResource",
"time": "2018-02-28T00:00:10Z"
}
{
"event": "DescribeStackResource",
"time": "2018-02-28T00:02:07Z"
}
- List all events that do NOT contain “Describe” or “List” (using contains and not):
$ cat * | jq '.Records[] | select(.eventName | contains("Describe") | not) | select(.eventName | contains("List") | not) | {"event":.eventName,"time":.eventTime}'
{
"event": "AssumeRole",
"time": "2018-02-28T00:00:36Z"
}
{
"event": "AssumeRole",
"time": "2018-02-28T00:10:42Z"
}
...
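The queries above depend on real CloudTrail logs being present. To try a few of the operators without any log files, the sketch below uses a small made-up Records document (the four events are invented for this illustration) and also shows that two conditions can be joined with “and” inside a single select:

```shell
# Made-up log with four events; not real CloudTrail data.
records='{"Records":[
  {"eventName":"DescribeDBInstances","eventSource":"rds.amazonaws.com"},
  {"eventName":"DescribeInstances","eventSource":"ec2.amazonaws.com"},
  {"eventName":"CreateDBSnapshot","eventSource":"rds.amazonaws.com"},
  {"eventName":"GetBucketCors","eventSource":"s3.amazonaws.com","errorCode":"NoSuchCORSConfiguration"}
]}'

# startswith and == combined with "and" in one select.
rds_describe=$(printf '%s' "$records" | jq -r '.Records[]
  | select((.eventName | startswith("Describe")) and (.eventSource == "rds.amazonaws.com"))
  | .eventName')

# != keeps everything that is not an RDS event.
not_rds=$(printf '%s' "$records" | jq -r '.Records[]
  | select(.eventSource != "rds.amazonaws.com")
  | .eventName')

# has keeps only the events that carry an errorCode key.
errors=$(printf '%s' "$records" | jq -r '.Records[] | select(has("errorCode")) | .errorCode')

echo "$rds_describe"  # DescribeDBInstances
echo "$not_rds"       # DescribeInstances, then GetBucketCors
echo "$errors"        # NoSuchCORSConfiguration
```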
Conditional value if A then B else C end
This can be used to output a value based on a boolean expression.
Example:
$ cat * | jq '.Records[] | {"describe?": (if(.eventName=="DescribeInstances") then "yes" else "no" end) , "event": .eventName }'
{
"describe?": "no",
"event": "DescribeStackResource"
}
{
"describe?": "no",
"event": "DescribeStackResource"
}
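More than two outcomes can be handled by adding elif branches. The following is a sketch on three made-up events. The labels "read", "destructive" and "other" are chosen only for this illustration:

```shell
# Made-up log with three events.
records='{"Records":[{"eventName":"DescribeInstances"},{"eventName":"DeleteBucket"},{"eventName":"AssumeRole"}]}'

# Classify each event name by its prefix.
kinds=$(printf '%s' "$records" | jq -r '.Records[] |
  if (.eventName | startswith("Describe")) then "read"
  elif (.eventName | startswith("Delete")) then "destructive"
  else "other" end')

echo "$kinds"
```

This prints read, destructive and other, one per line.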
Count items length
You can use length to get the length of an array.
Example: This can be used to count matched CloudTrail events as shown below. To do this, we use the -s option to join the events from all log files into one array.
$ cat * | jq '.Records[]' | jq -s '[ .[] | select(.eventName=="DescribeInstances") ] | length'
108
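The same count can probably be obtained with a single jq invocation, since -s collects the top-level documents of all files into one array. Here is a sketch, assuming two made-up log documents in place of the real files:

```shell
# Two made-up log documents standing in for two log files.
log1='{"Records":[{"eventName":"DescribeInstances"},{"eventName":"AssumeRole"},{"eventName":"DescribeInstances"}]}'
log2='{"Records":[{"eventName":"DescribeInstances"}]}'

# -s wraps both documents in one array; .[].Records[] then walks every event.
count=$(printf '%s\n%s\n' "$log1" "$log2" \
  | jq -s '[ .[].Records[] | select(.eventName=="DescribeInstances") ] | length')

echo "$count"  # 3
```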
Comma-separated output @csv
This can be used to represent the output as comma-separated values instead of JSON. The input of @csv needs to be an array.
Example:
$ cat * | jq '.Records[] | select(.eventName=="DescribeInstances") | [.eventTime , .eventName ] | @csv' -r
"2018-02-28T00:18:04Z","DescribeInstances"
"2018-02-28T00:17:37Z","DescribeInstances"
"2018-02-28T00:17:41Z","DescribeInstances"
...
Tab-separated output @tsv
This can be used to represent the output as tab-separated values instead of JSON. The input of @tsv needs to be an array.
Example:
$ cat * | jq '.Records[] | select(.eventName=="DescribeInstances") | [.eventTime , .eventName ] | @tsv' -r
2018-02-28T00:18:04Z DescribeInstances
2018-02-28T00:17:37Z DescribeInstances
2018-02-28T00:17:41Z DescribeInstances
...
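If the output is going to a spreadsheet, a header row is often wanted as well. One way to sketch this is to emit a literal array before the data rows. The column names "time" and "event" and the two events below are made up for this illustration:

```shell
# Made-up log with two events.
records='{"Records":[
  {"eventTime":"2018-02-28T00:18:04Z","eventName":"DescribeInstances"},
  {"eventTime":"2018-02-28T00:17:37Z","eventName":"DescribeInstances"}
]}'

# The comma produces the header array first, then one array per event;
# each array is then formatted by @csv.
csv=$(printf '%s' "$records" \
  | jq -r '["time","event"], (.Records[] | [.eventTime, .eventName]) | @csv')

echo "$csv"
```

Replacing @csv with @tsv in the same command gives the tab-separated equivalent.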
jq Manual
You can find the full jq manual on https://stedolan.github.io/jq/manual/.