This repository was archived by the owner on Feb 22, 2024. It is now read-only.

Array slices #22

Merged
merged 8 commits into from
Nov 4, 2020
18 changes: 17 additions & 1 deletion DEVELOPING.md
@@ -47,9 +47,25 @@ test to `true`, for example:
</pre>

When one or more tests are focussed in this way, the test suite will fail with the message
-"testcase(s) still focussed" even if all the tests pass.
+"testcase(s) still focussed" if and only if all the focussed tests pass.
This prevents pull requests being merged in which tests are accidentally left focussed.

To skip one or more tests, edit [cts.json](tests/cts.json) and set the `skip` property of the relevant
test to `true`, for example:
<pre>
}, {
"name": "wildcarded child",
<b>"skip": true,</b>
"selector": "$.*",
"document": {"a" : "A", "b" : "B"},
"result": ["A", "B"]
}, {
</pre>

When one or more tests are skipped in this way, the test suite will fail with the message
"testcase(s) still skipped" if and only if all the tests pass and none are focussed.
This prevents pull requests being merged in which tests are accidentally left skipped.
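Taken together, the focus and skip rules amount to a small end-of-run check. A minimal Rust sketch, assuming the harness can report whether every test it ran passed and whether any test is focussed or skipped; `suite_verdict` and the "test failures" message are hypothetical, not names from this repository:

```rust
/// Hypothetical sketch of the end-of-run check described in DEVELOPING.md.
/// Assumption: `all_run_tests_passed` covers only the tests that actually ran
/// (just the focussed ones, when any test is focussed).
fn suite_verdict(
    all_run_tests_passed: bool,
    any_focussed: bool,
    any_skipped: bool,
) -> Result<(), &'static str> {
    if !all_run_tests_passed {
        // Genuine failures are reported as such; the focus/skip guards
        // only fire on an otherwise green run.
        return Err("test failures");
    }
    if any_focussed {
        // Blocks merging a PR that accidentally leaves tests focussed.
        return Err("testcase(s) still focussed");
    }
    if any_skipped {
        // Reached only when nothing is focussed, matching the text above.
        return Err("testcase(s) still skipped");
    }
    Ok(())
}
```

The ordering of the checks is what encodes "if and only if all the tests pass and none are focussed" for the skip message.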

To see details of which tests run, use:
```
cargo test -- --show-output
```
68 changes: 68 additions & 0 deletions src/ast.rs
@@ -5,6 +5,8 @@
*/

use serde_json::Value;
use std::cmp::Ordering;
use std::iter;

/// A path is a tree of selector nodes.
///
@@ -57,9 +59,17 @@ pub enum Selector {
#[derive(Debug)]
pub enum UnionElement {
    Name(String),
    Slice(Slice),
    Index(i64),
}

#[derive(Debug)]
pub struct Slice {
    pub start: Option<isize>,
    pub end: Option<isize>,
    pub step: Option<isize>,
}

type Iter<'a> = Box<dyn Iterator<Item = &'a Value> + 'a>;

impl Path {
@@ -89,11 +99,69 @@ impl UnionElement {
    pub fn get<'a>(&self, v: &'a Value) -> Iter<'a> {
        match self {
            UnionElement::Name(name) => Box::new(v.get(name).into_iter()),
            UnionElement::Slice(slice) => {
                if let Value::Array(arr) = v {
                    let step = slice.step.unwrap_or(1);

                    let len = arr.len();

                    let start = slice
                        .start
                        .map(|s| if s < 0 { s + (len as isize) } else { s })
                        .unwrap_or(if step > 0 { 0 } else { (len as isize) - 1 });

                    let end = slice
                        .end
                        .map(|e| if e < 0 { e + (len as isize) } else { e })
                        .unwrap_or(if step > 0 { len as isize } else { -1 });

                    Box::new(array_slice(arr, start, end, step))
                } else {
                    Box::new(iter::empty())
                }
            }
            UnionElement::Index(num) => Box::new(v.get(abs_index(*num, v)).into_iter()),
        }
    }
}

fn array_slice(arr: &[Value], start: isize, end: isize, step: isize) -> Iter<'_> {
    let len = arr.len();
    let mut sl = vec![];
    match step.cmp(&0) {
        Ordering::Greater => {
            let strt = if start < 0 { 0 } else { start as usize }; // avoid CPU attack
            let e = if end > (len as isize) {
                len
            } else {
                end as usize
            }; // avoid CPU attack
            for i in (strt..e).step_by(step as usize) {
                if i < len {
                    sl.push(&arr[i]);

Member:

The spec says:

A negative index j selects an element of an array of length len if and only if 0 <= j + len < len, in which case it selects the same element as the non-negative index j + len.

Then it describes the iteration as:

for (i = start; i < end; i = i + step) {
  ...
}

What should happen if you call this on an input slice of length 5?

    let s = array_slice(j.as_array().unwrap(), -5, 0, 1);

My interpretation is that it would yield

i = -5;
i < 0 //  true
push(arr[i + len]) // arr[-5+5] === arr[0]
i = i+1 //  -4
i < 0 //  true
push(arr[i + len]) // arr[-4+5] === arr[1]
...

Contributor Author:

This is a good example of a reference implementation and compliance test suite turning up a hole in the spec.

I think the spec should say (only more formally/precisely) that negative start is syntactic sugar for start + len and similarly for negative end. But then start and end need to be desugared before plugging them into the relevant for loop.

So, for an array of length 5, [-5, 0, 1] corresponds to the elements indexed by the values of i in the for loop:

for (i = 0; i < 0; i = i + 1) {
  ...
}

of which there are none, so the result is empty.
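The proposed order (desugar first, then loop) can be checked mechanically. A minimal sketch with hypothetical helper names, handling positive steps only for brevity: it rewrites a negative bound b as b + len and only then runs the spec's `for (i = start; i < end; i = i + step)` loop, keeping indices inside the array. Because it walks the bounds directly, it is for illustration only and does not replace the bound clamping in `array_slice`.

```rust
/// Rewrite a possibly negative bound: b < 0 stands for b + len.
fn desugar(b: isize, len: isize) -> isize {
    if b < 0 {
        b + len
    } else {
        b
    }
}

/// Hypothetical sketch: desugar start and end first, then run the spec's loop
/// `for (i = start; i < end; i = i + step)`, keeping in-range indices.
fn indices_desugared(len: isize, start: isize, end: isize, step: isize) -> Vec<isize> {
    assert!(step > 0, "this sketch only handles positive steps");
    let end = desugar(end, len);
    let mut i = desugar(start, len);
    let mut out = Vec::new();
    while i < end {
        if 0 <= i && i < len {
            out.push(i);
        }
        i += step;
    }
    out
}
```

With len = 5 and [-5, 0, 1], start desugars to 0 and end stays 0, so the loop body never runs and the result is empty.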

mkmik (Member), Oct 12, 2020:

Desugaring before iterating makes most sense to me as well.

I was tinkering with the consequences of doing it the other way around, and I don't think it would provide any advantage. An interesting consequence is that it allows the resulting slice to be longer than the input array (and to contain repeated elements), which I think would be quite confusing.
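For comparison, a sketch of the other reading (the one traced in the first comment): loop over the raw, possibly negative i and map a negative i to i + len only when selecting. `indices_raw` is a hypothetical name, and again only positive steps are handled. It reproduces the trace for [-5, 0, 1] and the consequence described here: with start = -5 and end = 5 on a 5-element array, the result is twice as long as the input.

```rust
/// Hypothetical sketch of the "map inside the loop" reading: iterate over raw i
/// and translate a negative i to i + len only when selecting an element.
fn indices_raw(len: isize, start: isize, end: isize, step: isize) -> Vec<isize> {
    assert!(step > 0, "this sketch only handles positive steps");
    let mut i = start;
    let mut out = Vec::new();
    while i < end {
        let j = if i < 0 { i + len } else { i };
        if 0 <= j && j < len {
            out.push(j);
        }
        i += step;
    }
    out
}
```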

Contributor Author:

I think the implementation is correct and we just need to fix the spec. ;-)

Member:

The privileges of a spec writer! :-)

mkmik (Member), Oct 12, 2020:

I wonder if something like this would have less special casing: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=dd7180d6586e5a511b4e56b0058e5562

I don't think it's fully correct w.r.t. overflows etc., but it has a few advantages:

  1. No temporary array; it just uses Rust's own slice iterator and the ability to reverse an iterator.
  2. No risk of a "CPU attack", since we're not iterating over user-provided bounds anyway; in the worst case, a bug causes a panic from an out-of-bounds slice access.
  3. Less code duplication; the step > 0 and step < 0 branches in the PR share quite a lot of code with subtle differences.
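The linked playground code is not reproduced here. What follows is an independent sketch in the same spirit, under stated assumptions: bounds are desugared and clamped to the array first, a sub-slice is taken, and the step is applied with `step_by`, reversing the iterator for negative steps. `slice_select` is a hypothetical name; it works on a plain `&[T]` instead of `serde_json::Value` to stay self-contained, and relies on `Ord::clamp` and `isize::unsigned_abs` (Rust 1.51 or later).

```rust
/// Hypothetical sketch: select the elements of `arr` named by [start:end:step]
/// by clamping the bounds, taking a sub-slice and stepping over its iterator.
fn slice_select<T>(
    arr: &[T],
    start: Option<isize>,
    end: Option<isize>,
    step: Option<isize>,
) -> Vec<&T> {
    let len = arr.len() as isize;
    let step = step.unwrap_or(1);
    // Desugar: a negative bound b stands for b + len (it may still be negative).
    let norm = |b: isize| if b < 0 { b + len } else { b };

    if step > 0 {
        // Clamp into 0..=len so no out-of-range value reaches the usize cast.
        let lo = start.map(norm).unwrap_or(0).clamp(0, len) as usize;
        let hi = end.map(norm).unwrap_or(len).clamp(0, len) as usize;
        if lo >= hi {
            return Vec::new();
        }
        arr[lo..hi].iter().step_by(step as usize).collect()
    } else if step < 0 {
        // Walk from `first` down to, but excluding, `stop`; -1 means "before index 0".
        let first = start.map(norm).unwrap_or(len - 1).clamp(-1, len - 1);
        let stop = end.map(norm).unwrap_or(-1).clamp(-1, len - 1);
        if stop >= first {
            return Vec::new();
        }
        // The selected elements live in arr[stop + 1 ..= first], visited in reverse.
        let lo = (stop + 1) as usize;
        let hi = (first + 1) as usize;
        arr[lo..hi].iter().rev().step_by(step.unsigned_abs()).collect()
    } else {
        // A zero step selects nothing, as in the cts.json "zero step" case.
        Vec::new()
    }
}
```

For example, `[-1:-6:-2]` on ten elements clamps to the sub-slice `arr[5..10]`, which reversed and stepped by 2 yields 9, 7, 5. The only loop is over the clamped sub-slice, so huge user-supplied bounds or steps cost nothing extra.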

                }
            }
        }

        Ordering::Less => {
            let strt = if start > (len as isize) {
                len as isize
            } else {
                start
            }; // avoid CPU attack
            let e = if end < -1 { -1 } else { end }; // avoid CPU attack
            for i in (-strt..-e).step_by(-step as usize) {
                if 0 <= -i && -i < (len as isize) {
                    sl.push(&arr[-i as usize]);
                }
            }
        }

        Ordering::Equal => (),
    }
    Box::new(sl.into_iter())
}

fn abs_index(index: i64, node: &Value) -> usize {
    if index >= 0 {
        index as usize
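One case worth checking in the positive-step branch of `array_slice`: a normalized `end` that is still negative (for example `$[:-20]` on a 10-element array, which gives `end = -10`) fails the `end > len` test and reaches `end as usize`, which wraps around to a value near `usize::MAX` instead of saturating. The `i < len` guard keeps the indexing safe, but the loop would select every element and then keep counting toward that huge bound. A minimal sketch of a fix, assuming the same `start`/`end` values that `get` computes, clamps both bounds before the cast; `positive_step_indices` is a hypothetical helper that returns indices rather than values.

```rust
/// Hypothetical fix sketch for the positive-step branch: clamp both bounds
/// into 0..=len *before* converting to usize, so nothing can wrap around.
fn positive_step_indices(len: usize, start: isize, end: isize, step: usize) -> Vec<usize> {
    assert!(step > 0, "step_by panics on a zero step");
    let clamp = |b: isize| b.clamp(0, len as isize) as usize;
    let (lo, hi) = (clamp(start), clamp(end));
    // An empty range (lo >= hi) simply yields nothing; at most len iterations.
    (lo..hi).step_by(step).collect()
}
```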
13 changes: 9 additions & 4 deletions src/grammar.pest
@@ -3,7 +3,7 @@ selector = _{ SOI ~ rootSelector ~ matchers ~ EOI }
matchers = ${ matcher* }
rootSelector = { "$" }

-matcher = { dotChild | union }
+matcher = !{ dotChild | union }

dotChild = _{ wildcardedDotChild | namedDotChild }
wildcardedDotChild = { ".*" }
@@ -18,9 +18,14 @@ char = {
}

union = { "[" ~ unionElement ~ ("," ~ unionElement)* ~ "]" }
-unionElement = _{ unionChild | unionArrayIndex } // TODO: add unionArraySlice
-unionChild = { doubleQuotedString | singleQuotedString }
-unionArrayIndex = { "-" ? ~ ( "0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT* ) }
+unionElement = _{ unionChild | unionArraySlice | unionArrayIndex }
+unionChild = ${ doubleQuotedString | singleQuotedString }
+unionArrayIndex = @{ integer }
+integer = _{ "-" ? ~ ( "0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT* ) }
+unionArraySlice = { sliceStart ? ~ ":" ~ sliceEnd ? ~ ( ":" ~ sliceStep ? ) ? }
+sliceStart = @{ integer }
+sliceEnd = @{ integer }
+sliceStep = @{ integer }

doubleQuotedString = _{ "\"" ~ doubleInner ~ "\"" }
doubleInner = @{ doubleChar* }
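The new `unionArraySlice` rule accepts an optional start, a colon, an optional end, and optionally a second colon followed by an optional step, where each number follows the `integer` rule (an optional minus sign, then `0` or a digit string without a leading zero). The hand-written sketch below mirrors that shape without pest, so it approximates the grammar rather than reproducing the parser from this PR. `parse_slice` and `parse_int` are hypothetical names, and trimming whitespace around each field is an assumption standing in for the whitespace tolerance implied by the "union with whitespace" test.

```rust
/// Parse one number per the grammar's `integer` rule:
/// "-"? ~ ("0" | ASCII_NONZERO_DIGIT ~ ASCII_DIGIT*).
fn parse_int(s: &str) -> Option<isize> {
    let digits = s.strip_prefix('-').unwrap_or(s);
    let well_formed = digits == "0"
        || (!digits.is_empty()
            && !digits.starts_with('0')
            && digits.bytes().all(|b| b.is_ascii_digit()));
    if !well_formed {
        return None;
    }
    // Values that do not fit in isize are rejected instead of panicking.
    s.parse().ok()
}

/// Hypothetical approximation of
/// sliceStart? ~ ":" ~ sliceEnd? ~ (":" ~ sliceStep?)?
/// returning (start, end, step), or None for an invalid selector.
fn parse_slice(s: &str) -> Option<(Option<isize>, Option<isize>, Option<isize>)> {
    let parts: Vec<&str> = s.split(':').map(str::trim).collect();
    // Exactly one colon (two fields) or two colons (three fields).
    if parts.len() < 2 || parts.len() > 3 {
        return None;
    }
    // An empty field means "omitted"; a non-empty one must be a valid integer.
    let field = |p: &str| -> Option<Option<isize>> {
        if p.is_empty() {
            Some(None)
        } else {
            parse_int(p).map(Some)
        }
    };
    let start = field(parts[0])?;
    let end = field(parts[1])?;
    let step = if parts.len() == 3 { field(parts[2])? } else { None };
    Some((start, end, step))
}
```

Unlike the `.parse().unwrap()` calls in the parser, this sketch returns `None` for an integer that overflows `isize`.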
26 changes: 26 additions & 0 deletions src/parser.rs
@@ -50,6 +50,7 @@ fn parse_union_indices(matcher_rule: pest::iterators::Pair<Rule>) -> Vec<UnionEl
        .into_inner()
        .map(|r| match r.as_rule() {
            Rule::unionChild => parse_union_child(r),
            Rule::unionArraySlice => parse_union_array_slice(r),
            Rule::unionArrayIndex => parse_union_array_index(r),
            _ => panic!("invalid parse tree {:?}", r),
        })
@@ -71,6 +72,31 @@ fn parse_union_array_index(matcher_rule: pest::iterators::Pair<Rule>) -> UnionEl
    UnionElement::Index(i)
}

fn parse_union_array_slice(matcher_rule: pest::iterators::Pair<Rule>) -> UnionElement {
    let mut start: Option<isize> = None;
    let mut end: Option<isize> = None;
    let mut step: Option<isize> = None;
    for r in matcher_rule.into_inner() {
        match r.as_rule() {
            Rule::sliceStart => {
                start = Some(r.as_str().parse().unwrap());
            }

            Rule::sliceEnd => {
                end = Some(r.as_str().parse().unwrap());
            }

            Rule::sliceStep => {
                step = Some(r.as_str().parse().unwrap());
            }

            _ => panic!("invalid parse tree {:?}", r),
        }
    }

    UnionElement::Slice(Slice { start, end, step })
}

fn unescape(contents: &str) -> String {
    let s = format!(r#""{}""#, contents);
    serde_json::from_str(&s).unwrap()
194 changes: 193 additions & 1 deletion tests/cts.json
@@ -481,6 +481,20 @@
"name": "union child, single quotes, incomplete escape",
"selector": "$['\\']",
"invalid_selector": true
}, {
"name": "union",
"selector": "$[0,2]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [0, 2]
}, {
"name": "union with whitespace",
"selector": "$[ 0 , 1 ]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [0, 1]
}, {
"name": "empty union",
"selector": "$[]",
"invalid_selector": true
}, {
"name": "union array access",
"selector": "$[0]",
@@ -524,5 +538,183 @@
"name": "union array access, leading -0",
"selector": "$[-01]",
"invalid_selector": true
-}
+}, {
"name": "union array slice",
"selector": "$[1:3]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [1, 2]
}, {
"name": "union array slice with step",
"selector": "$[1:6:2]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [1, 3, 5]
}, {
"name": "union array slice with everything omitted, short form",
"selector": "$[:]",
"document": [0, 1, 2, 3],
"result": [0, 1, 2, 3]
}, {
"name": "union array slice with everything omitted, long form",
"selector": "$[::]",
"document": [0, 1, 2, 3],
"result": [0, 1, 2, 3]
}, {
"name": "union array slice with start omitted",
"selector": "$[:2]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [0, 1]
}, {
"name": "union array slice with start and end omitted",
"selector": "$[::2]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [0, 2, 4, 6, 8]
}, {
"name": "union array slice, last index",
"selector": "$[-1]",
"document": [0, 1, 2, 3],
"result": [3]
}, {
"name": "union array slice, overflowed index",
"selector": "$[4]",
"document": [0, 1, 2, 3],
"result": []
}, {
"name": "union array slice, underflowed index",
"selector": "$[-5]",
"document": [0, 1, 2, 3],
"result": []
}, {
"name": "union array slice, negative step with default start and end",
"selector": "$[::-1]",
"document": [0, 1, 2, 3],
"result": [3, 2, 1, 0]
}, {
"name": "union array slice, negative step with default start",
"selector": "$[:0:-1]",
"document": [0, 1, 2, 3],
"result": [3, 2, 1]
}, {
"name": "union array slice, negative step with default end",
"selector": "$[2::-1]",
"document": [0, 1, 2, 3],
"result": [2, 1, 0]
}, {
"name": "union array slice, larger negative step",
"selector": "$[::-2]",
"document": [0, 1, 2, 3],
"result": [3, 1]
}, {
"name": "union array slice, negative range with default step",
"selector": "$[-1:-3]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": []
}, {
"name": "union array slice, negative range with negative step",
"selector": "$[-1:-3:-1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [9, 8]
}, {
"name": "union array slice, negative range with larger negative step",
"selector": "$[-1:-6:-2]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [9, 7, 5]
}, {
"name": "union array slice, larger negative range with larger negative step",
"selector": "$[-1:-7:-2]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [9, 7, 5]
}, {
"name": "union array slice, negative from, positive to",
"selector": "$[-5:7]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [5, 6]
}, {
"name": "union array slice, negative from",
"selector": "$[-2:]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [8, 9]
}, {
"name": "union array slice, positive from, negative to",
"selector": "$[1:-1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [1, 2, 3, 4, 5, 6, 7, 8]
}, {
"name": "union array slice, negative from, positive to, negative step",
"selector": "$[-1:1:-1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [9, 8, 7, 6, 5, 4, 3, 2]
}, {
"name": "union array slice, positive from, negative to, negative step",
"selector": "$[7:-5:-1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [7, 6]
}, {
"name": "union array slice, too many colons",
"selector": "$[1:2:3:4]",
"invalid_selector": true
}, {
"name": "union array slice, non-integer array index",
"selector": "$[1:2:a]",
"invalid_selector": true
}, {
"name": "union array slice, zero step",
"selector": "$[1:2:0]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": []
}, {
"name": "union array slice, empty range",
"selector": "$[2:2]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": []
}, {
"name": "union array slice, default indices with empty array",
"selector": "$[:]",
"document": [],
"result": []
}, {
"name": "union array slice, negative step with empty array",
"selector": "$[::-1]",
"document": [],
"result": []
}, {
"name": "union array slice, maximal range with positive step",
"selector": "$[0:10]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
}, {
"name": "union array slice, maximal range with negative step",
"selector": "$[9:0:-1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [9, 8, 7, 6, 5, 4, 3, 2, 1]
}, {
"name": "union array slice, excessively large to value",
"selector": "$[2:113667776004]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [2, 3, 4, 5, 6, 7, 8, 9]
}, {
"name": "union array slice, excessively small from value",
"selector": "$[-113667776004:1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [0]
}, {
"name": "union array slice, excessively large from value with negative step",
"selector": "$[113667776004:0:-1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [9, 8, 7, 6, 5, 4, 3, 2, 1]
}, {
"name": "union array slice, excessively small to value with negative step",
"selector": "$[3:-113667776004:-1]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [3, 2, 1, 0]
}, {
"name": "union array slice, excessively large step",
"selector": "$[1:10:113667776004]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [1]
}, {
"name": "union array slice, excessively small step",
"selector": "$[-1:-10:-113667776004]",
"document": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
"result": [9]
}
]}
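The index cases in this file (`$[-1]`, `$[4]` and `$[-5]` on a four-element array) follow the spec rule quoted in the review: a negative index j selects an element of an array of length len if and only if 0 <= j + len < len. A minimal sketch of that rule; `select_index` is a hypothetical name (the diff's own helper, `abs_index`, is only partly shown), and it takes `i64` to match `UnionElement::Index`.

```rust
use std::convert::TryFrom;

/// Hypothetical sketch of the spec's index rule: a negative j selects the
/// element at j + len when 0 <= j + len < len; anything else selects nothing.
fn select_index<T>(arr: &[T], j: i64) -> Option<&T> {
    let len = arr.len() as i64;
    let k = if j < 0 { j + len } else { j };
    // A negative k fails the conversion; `get` returns None when k >= len.
    usize::try_from(k).ok().and_then(|k| arr.get(k))
}
```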