Preliminary results

Work done

A set of two tools have been developed:

BNF2AST2JSON

As the name suggests, this tool take the BNF representation of the IEEE1800-2023 standard, converts it to an AST and then outputs this AST in JSON format, so it may be stored on a filesystem.

The tools works as follows:

  1. The BNF structure has been codified in PEG syntax and stores in bnf.pest. This file describes the structure of the BNF format as used in the IEEE1800-2023. This standard does not fully adhere to the BNF spec, so it is a custom description.

  2. The parser, generated by pest takes the PEG description and uses it to parse the BNF file into a PEG dependent data structure.

  3. This data structure is then walked to obtain the AST that describes the standard.

  4. Finally, the AST is represented in JSON, so another tool can use it.

Command line:

cargo run --input ../BNF.txt > /tmp/ast.json

At this point, we have a number of types that can be parsed and converted to a Directed Acyclic Graph.

Grapher

The job of the grapher is to take the ast.json and an entry point into the graph and then create a DAG that represents all the possible paths from this entry point.

If taken from a valid entry point (for intance any element under “decription”), all path should lead to valid syntax.

This valid syntax may then be processed by a parser to test its functionality.

Preliminary Results

The following DAG describes the paths fanning out from library_description.

The command line to produce this graph is as follows:

cargo run -- --input ../ast.json --entry library_description > /tmp/library_description.graphml

Opening the file with gephy yields the following results:

About the diagram: Each node represents either a literal, indicated with L(), an regular expression (R()), a rule (no special marking) or DUMMY. DUMMY represents a repetition, but has not been implemented yet.

This is the rule to create a library_description:

library_description ::= library_declaration|include_statement|config_declaration|";"

This means a library_description is replaced by either a library_declaration, an include_statement or a config_declaration or a ;

Let’s follow the config_declaration path:

The config_declaration is defined as follows:

config_declaration ::= "config" config_identifier ";" { local_parameter_declaration ";" } design_statement { config_rule_statement } "endconfig" [ ":" config_identifier ]

A config literal, followed by a config_identifier, then a literal ; then a repetition, a design_statement, another repetition, endconfig literal and finally an optional : literal, followed by a config_identifier

Minus the repetitions; this rule consist of the following rules:

config_identifier ::= identifier
design_statement ::= "design" { [ library_identifier "." ] cell_identifier } ";"
cell_identifier ::= identifier
identifier ::= simple_identifier | escaped_identifier
simple_identifier ::= "/[a-zA-Z_]/[a-zA-Z0-9_$]+/"
escaped_identifier ::= "/\\[!-~]+[ \t\n\f]/"

In the second image, it is shown that the config_declaration node is indeed followed by a config literal. The follows a config_identifier node that is made of an identifier node, which in turn can be either a simple_identifier or an escaped_identifier.

Not that at the identifier node, the path splits into two, to rejoin at the ; node. This indicates two different paths an IEEE1800-2023 parser should be able to handle. From this, two separate tests case categories can be generated. One that represents simple_identifiers and one that represents escaped_identifiers. Keep in mind, merely testing the parsing of identifier elements is not enough, because context matters, as shown in the project’s description.

Now, one can just follow the path up to its end. Note that the optional path is tacked on to the end.

Problems

  • In it’s current state, the grapher does not handle recursion well. There are rules that ultimately loop around to itself and this causes a crash.
  • How to deal with [optional] and {repetition}? Should optional elements always be included? How many repetitions? 0? 100?

Next steps

The next step would be to take the node graph, and tack all the literals to one another, under path configuration. For instance, a path generator could be configure to always include a certain node, to ignore optional elements or not, and have an arbitrary number of repetitions.

Conclusion

Though still in its infancy, this toolkit should ultimately be able to generate a sizable set of test cases that would reveal any bugs, or other issues in parsers that handle the IEEE1800-2023 standard.