logo

Rust crates reviews

Cryptographically verifiable, distributed dependency reviews

reviewer: BurntSushi

https://lib.rs/BurntSushi

$ cargo crev repo fetch url https://github.com/BurntSushi/crev-proofs
$ cargo crev id trust VylyTuk8CMGqIxgHixWaqfiUn3xZyzOA1wFrQ0sR1As

repo: https://github.com/BurntSushi/crev-proofs

crate version
rating
date
reviewer
thoroughness, understanding
positive
2019-08-22
BurntSushi
high, high

I wrote this crate, so this review is a reflection as a result of writing the
code and then reviewing it again for this review.

While the aho-corasick crate is not often used directly, it is a key
optimization technique used in the regex crate for quickly finding
potential matches by searching literals.

I gave this crate a rating of positive instead of the highest strong
because it was somewhat recently rewritten. So it hasn't been thoroughly
vetted yet.

At a higher level, one concern point of this crate is that it has a lot of
unsafe usage. While a small number of those unsafe uses are for the
Aho-Corasick algorithm itself---mostly for explicitly eliding bounds checks
for performance reasons---the vast majority of all unsafe uses are for the
implementation of the Teddy algorithm, which makes heavy use of SIMD through
explicit CPU specific vendor intrinsics. The Teddy algorithm is used as a
fast prefilter to quickly find potential matches when searching for a smaller
number of patterns. The speedups can be an order of magnitude, so the extra
code complexity is worth it.

As with the memchr crate, both the Aho-Corasick algorithm and the Teddy
algorithm are thoroughly tested. Both are tested independently of one another
and when they are used together. Like the memchr crate, the Teddy
algorithm is tested on a wide variety of haystack configurations to test
different haystack lengths and match positions, all of which can exercise
different aspects of the Teddy algorithm. If one counted the total number of
tests for the entire crate (including variations on each), it would easily
be in the tens of thousands.

strong
2019-08-28
BurntSushi
medium, high

The 1.4.0 release removes the hand-rolled unreachable hint in favor of
unreachable_unchecked, which was a new API introduced in Rust 1.27.
Otherwise, nothing substantial was changed, other than allowing some
deprecated APIs to support older versions of Rust.

strong
2019-08-22
BurntSushi
high, high

I wrote this crate, so this review is a reflection as a result of writing
the code and then reviewing it again for this review.

The entire purpose of the memchr crate is to do this:

haystack.iter().position(|&b| b == needle)

... but really fast. As a result, this crate uses SIMD via CPU specific
vendor intrinsics. Consequently, there is a lot of unsafe code in this
crate. There is really no way to avoid this, other than perhaps using a
higher level platform independent SIMD API. But no such thing of sufficient
quality exists for stable Rust at the time of writing.

The testing strategy is the most important bit here. In particular, every
public API item is tested using a permutation of tests that exercise all
possible alignments found in a haystack. (Because the implementations used
aligned loads/stores, which are only correct if the address arithmetic is
correct.) Additionally, there are quickcheck tests that act as a sort of
fuzzer guaranteeing that the implementation is correct for a range of inputs.

I gave the highest rating possible because of the extensive use this crate
has seen, in addition to its level of testing. In particular, memchr
underlies a significant chunk of all text search in the Rust ecosystem.

positive
2019-08-23
BurntSushi
high, high

I wrote this crate, so this review is a reflection as a result of writing the
code and then reviewing it again for this review.

I gave a rating of positive because the implementation in the regex crate has
remain mostly unchanged for a long period of time, so it has matured and been
battle tested. Moreover, the crate uses very little unsafe directly. The
only uses are in the DFA regex engine for explicitly eliding bounds checks
in very hot loops.

A better rating is likely deserved, but there are still a few outstanding
bugs that can produce incorrect matches. In many cases, these bugs are a
result of optimizations using literal searches, which can be subtle and
difficult to get right.

strong
2019-08-22
BurntSushi
high, high

I wrote this crate, so this review is a reflection as a result of writing
the code and then reviewing it again for this review.

The regex-syntax crate is mostly an internal implementation detail of the
regex crate. It is exposed as a stand-alone crate for the occasional use
case where it is convenient to analyze the syntax of a regex. The primary
thing that the regex-syntax crate provides is a parser for the concrete
syntax supported by the regex crate.

The regex-syntax crate is very large, but there is no unsafe used anywhere.
Moreover, it has no dependencies. That means that no matter what concrete
syntax is given, the worst that can happen is a panic. Memory safety should
be preserved (modulo bugs in the compiler or the standard library). On top of
that, the parser does not use explicit recursion and enforces a nest depth
by default, so it should generally be safe with respect to arbitrary input.

positive
2019-08-22
BurntSushi
medium, medium

This crate was written by Amanieu d'Antras in response to my efforts to make
thread safe caching in the regex crate faster. The way to think about it is
"dynamic" thread local storage.

The crate does use a fair amount of unsafe, and the reason for this is to
minimize latency as much as is possible. Each use of unsafe isn't
particularly tricky---it's mostly managing atomic access to a hash table.
With that said, it would be nice if each use of unsafe had a comment
explaining/justifying it. There are some comments scattered about making
reasonable arguments, but more would be good.

This crate did have an issue a while back where if a program used the same
regex (for example) many times from many short lived threads, then the
cached data could grow without bound. Amanieu patched that issue a while
back by reusing thread ids. A thread id is tracked using normal TLS, which
is dropped when the thread is destroyed.

I gave this crate a rating of positive because I cannot identify any issues
after reviewing the code, although, I am not completely certain that there
are no issues because of the use of unsafe code.

© bestia.dev 2023, MIT License, Version: 2023.608.1636

Open source repository for this web app: https://github.com/bestia-dev/cargo_crev_web/