Guides/Parsing
Parsing is the process of analyzing a character stream according to a formal lexical, syntactic and/or semantic grammar, producing an output structure or an evaluation.
Lexical Analysis
Produces a stream of tokens from a stream of input characters. Either stream can be represented as a list. Lexing can be done using a sequential machine, regular expressions, or ad hoc splitting. AKA lexing, scanning, tokenizing.
Sequential Machine, AKA finite state machine or finite automaton. Uses a state transition table.
- dyad ;: Sequential Machine
  J implementation, with an example of a J lexer for Alphabet and Words
- Essays/Word Formation on Lines
  sequential machine for J words, with space and line tokens and extensive examples
- Scripts/JavascriptCruncher
  strips unnecessary content (comments, etc.) from files to reduce file size
- Guides/JWebServer/HttpParser
  HTTP header lexer using dyad ;: and elements of ad hoc parsing
- Addons/graphics/graphviz (http://olegykj.sourceforge.net/scrshots/graphviz.html)
  visualizing sequential machines using transition diagrams
- JForum:chat/2007-April/000464
  JSON-style backslash evaluator
- JForum:chat/2007-April/000466
  JSON tokenizer, with details of producing the sequential machine transition table
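To make the idea of a state transition table concrete, here is a minimal Python sketch of a table-driven lexer. It loosely follows the model behind dyad ;: (a table indexed by current state and character class, whose cells hold a new state and an action), but it is not J's ;: and does not use its argument format. The character classes, states, action codes and the names `TABLE`, `char_class` and `lex` are hypothetical choices for this illustration.

```python
# Character classes (hypothetical classification for this sketch).
SPACE, ALNUM, PUNCT = 0, 1, 2

# Actions applied to the pending token when a transition fires.
NONE, START, EMIT_START, EMIT_END = 0, 1, 2, 3

# TABLE[state][char_class] -> (next_state, action)
# States: 0 = between tokens, 1 = inside a word, 2 = just read a punctuation char.
TABLE = [
    [(0, NONE), (1, START), (2, START)],                # state 0
    [(0, EMIT_END), (1, NONE), (2, EMIT_START)],        # state 1
    [(0, EMIT_END), (1, EMIT_START), (2, EMIT_START)],  # state 2
]

def char_class(c: str) -> int:
    if c.isspace():
        return SPACE
    if c.isalnum():
        return ALNUM
    return PUNCT

def lex(text: str) -> list[str]:
    tokens: list[str] = []
    state, start = 0, -1  # start = index where the pending token begins, -1 if none
    for i, c in enumerate(text):
        state, action = TABLE[state][char_class(c)]
        if action in (EMIT_START, EMIT_END):
            tokens.append(text[start:i])  # close the pending token
        if action in (START, EMIT_START):
            start = i                     # open a new token at this character
        elif action == EMIT_END:
            start = -1                    # nothing pending (we are in whitespace)
    if start != -1:                       # flush a token still open at end of input
        tokens.append(text[start:])
    return tokens
```

For example, `lex("ab+c d")` gives `["ab", "+", "c", "d"]`. All of the lexer's behavior lives in the table, so changing the token rules means editing data rather than control flow, which is the main appeal of the sequential machine approach.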
Regular Expressions may use a sequential machine internally, but offer a familiar, standard syntax.
- Regular Expressions Lab
  guide to the regex library
- Essays/Regex Lexer
  a lexer based on standard regular expressions and simple token declarations
- Scripts/Regular Expressions Substitution
  regular expressions extended for Perl/awk/sed-like substitution
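A lexer built from token declarations can combine one regular expression per token type into a single alternation and read off which alternative matched. The following Python sketch uses the standard `re` module. It is not the J Regex Lexer essay's code, and the token names and patterns in `TOKEN_SPEC` are hypothetical examples.

```python
import re

# Hypothetical token declarations: (name, pattern), tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/()=]"),
    ("SKIP",   r"\s+"),
]
# One named group per declaration; lastgroup reports which one matched.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def regex_lex(text: str) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)  # anchored match starting at pos
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":    # drop whitespace tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

The order of `TOKEN_SPEC` matters: when two patterns could match at the same position, the earlier alternative wins.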
Ad Hoc lexing looks for simple substrings on which to split the input, possibly iteratively.
- JForum:programming/2007-January/004756
  example of ad hoc splitting for a list of first/initial/last names
- Scripts/Scheme
  has a Lisp S-expression string tokenizer
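Ad hoc splitting applies one simple split after another. The forum example's exact input format is not reproduced here. As a hypothetical illustration, this Python sketch assumes semicolon-separated entries of the form `First I. Last` and splits first on `;` and then on whitespace. The function name `split_names` is made up for this sketch.

```python
def split_names(text: str) -> list[tuple[str, str, str]]:
    """Split 'First I. Last; First Last; ...' into (first, initial, last) triples.
    The input format is a hypothetical assumption for this sketch."""
    result = []
    for entry in text.split(";"):   # first split: one entry per person
        parts = entry.split()       # second split: words within the entry
        if not parts:
            continue                # skip empty entries, e.g. from a trailing ';'
        first = parts[0]
        last = parts[-1] if len(parts) > 1 else ""
        # Anything between the first and last word is treated as initial(s).
        initial = " ".join(p.rstrip(".") for p in parts[1:-1])
        result.append((first, initial, last))
    return result
```

For example, `split_names("Kenneth E. Iverson; Roger Hui;")` yields `[("Kenneth", "E", "Iverson"), ("Roger", "", "Hui")]`.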
Syntactic Analysis
Takes a stream of tokens and either builds a structure from it or evaluates it directly. The structure is typically a tree of grammar elements (a parse tree or abstract syntax tree). AKA parsing.
Bottom-up, AKA Shift-reduce. E.g., LR parsers.
- Parsing and Execution from J Dictionary, Roger Hui, Kenneth Iverson
- Parsing and Execution from J for C Programmers, Henry Rich
- trace script (https://github.com/jsoftware/general_misc/blob/master/trace.ijs)
  provides a model of the J parser whose internal workings can be examined and experimented with
- JForum:chat/2007-April/000462
  JSON shift-reduce parser
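J's own parser is a shift-reduce parser. It moves words from the right end of a sentence onto a stack and reduces whenever the first few stack elements match a pattern (see Parsing and Execution in the J Dictionary). The Python sketch below keeps only that skeleton. It handles non-negative integers, dyadic `+`, `-` and `*`, and parentheses, and deliberately leaves out monads, adverbs, conjunctions, assignment and J's number syntax. The edge marker `§` and all names are hypothetical, and the two reduction rules are simplified versions of J's dyad and parenthesis rules.

```python
import re

EDGE = "§"  # hypothetical marker for the left end of the sentence
VERBS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}

def is_noun(x) -> bool:
    return isinstance(x, int)

def parse_rtl(sentence: str) -> int:
    # Words: non-negative integers, the three verbs, parentheses (other chars ignored).
    words = [int(w) if w.isdigit() else w for w in re.findall(r"\d+|[-+*()]", sentence)]
    queue = [EDGE] + words  # words are shifted from the right end
    stack: list = []        # stack[0] is the top (leftmost word shifted so far)
    while True:
        s = stack[:4]
        # Dyad rule: (EDGE | '(' | verb) noun verb noun -> keep first, evaluate the rest.
        if (len(s) == 4 and (s[0] in (EDGE, "(") or s[0] in VERBS)
                and is_noun(s[1]) and s[2] in VERBS and is_noun(s[3])):
            stack[1:4] = [VERBS[s[2]](s[1], s[3])]
        # Parenthesis rule: '(' noun ')' -> noun.
        elif len(s) >= 3 and s[0] == "(" and is_noun(s[1]) and s[2] == ")":
            stack[0:3] = [s[1]]
        elif queue:
            stack.insert(0, queue.pop())  # shift the next word from the right
        else:
            break
    if len(stack) != 2 or stack[0] != EDGE or not is_noun(stack[1]):
        raise SyntaxError(f"could not reduce sentence: {stack!r}")
    return stack[1]
```

Because reduction only happens at the top of the stack as words arrive from the right, there is no operator precedence. As in J, `parse_rtl("2*3-4")` evaluates `3-4` first and returns `-2`, while `parse_rtl("(2*3)-4")` returns `2`.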
Top-down, AKA Recursive descent. E.g. LL parsers.
- Essays/Recursive Descent Parser
  framework for easily building hand-coded LL parsers using Regex Lexer
- Scripts/Scheme
  has a tacit recursive-descent parser
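In a recursive-descent parser, each grammar rule becomes one function, and rules that mention other rules call those functions recursively. This minimal Python sketch evaluates integer arithmetic with conventional precedence. The grammar in the docstring is a hypothetical example that is unrelated to the J essay's framework, and tokenizing with a single regular expression is a shortcut taken for brevity.

```python
import re

def evaluate(text: str) -> int:
    """Recursive-descent evaluator for the hypothetical grammar
         expr   := term (('+' | '-') term)*
         term   := factor ('*' factor)*
         factor := NUMBER | '(' expr ')'
    """
    tokens = re.findall(r"\d+|[-+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok=None):
        # Consume one token, optionally checking that it equals tok.
        nonlocal pos
        if pos >= len(tokens) or (tok is not None and tokens[pos] != tok):
            raise SyntaxError(f"expected {tok or 'a token'} at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def expr():
        value = term()
        while peek() in ("+", "-"):  # left-associative loop
            op = expect()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():
        value = factor()
        while peek() == "*":
            expect("*")
            value *= factor()
        return value

    def factor():
        if peek() == "(":
            expect("(")
            value = expr()  # recursion back to the top rule
            expect(")")
            return value
        tok = expect()
        if not tok.isdigit():
            raise SyntaxError(f"unexpected {tok!r}")
        return int(tok)

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected trailing {tokens[pos]!r}")
    return result
```

Precedence comes from the nesting of the rules: `term` binds `*` more tightly than `expr` binds `+` and `-`. So `evaluate("1+2*3")` is `7` and `evaluate("2*3-4")` is `2`.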
Ad Hoc parsing alternates between splitting and recombining substrings, over several levels that are typically non-recursive.
- csv script (JSvnBase:packages/files/csv.ijs)
  reads a CSV file into a boxed array
- pp script
  J pretty-print script formatter
- User:Chris Burke/Export Script utility (JSvnBase:packages/export)
  converts a script into various formats
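Reading CSV shows the split-and-recombine pattern well. First split the text into lines, then naively split each line on commas, then glue back together any pieces that were cut inside a quoted field. The Python sketch below illustrates that idea only. It is not the algorithm of the J csv script, and it assumes no newlines inside quoted fields and RFC 4180-style doubled quotes (`""`). For real work, Python's standard `csv` module already handles this.

```python
def unquote(field: str) -> str:
    # Strip surrounding quotes and undouble embedded quotes.
    if len(field) >= 2 and field[0] == field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field

def parse_csv(text: str) -> list[list[str]]:
    rows = []
    for line in text.splitlines():  # level 1: split into records
        if not line:
            continue
        pieces = line.split(",")     # level 2: naive split on every comma
        fields, current = [], None
        for piece in pieces:
            # Recombine: while a quoted field is open, glue pieces back together.
            current = piece if current is None else current + "," + piece
            if current.count('"') % 2 == 0:  # balanced quotes -> field complete
                fields.append(current)
                current = None
        if current is not None:
            raise ValueError(f"unterminated quote in line {line!r}")
        rows.append([unquote(f) for f in fields])
    return rows
```

Each level is a flat loop rather than a recursive call, which matches the non-recursive character of ad hoc parsing.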
Handling Structures
Since much parsing is based on abstract syntax trees (ASTs), an introduction to efficient tree handling in J would be useful here. Meanwhile, you might look at
- the lab Huffman Coding
- Roger's Essays/Huffman Coding
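Huffman coding is a compact exercise in the two tree operations a parser also needs: building a tree bottom-up from smaller nodes, then walking it. The Python sketch below is not taken from the lab or essay. It uses nested tuples as internal nodes, and the names are hypothetical. A counter breaks ties between equal weights so the heap never has to compare two subtrees.

```python
import heapq
from itertools import count

def huffman_codes(freqs: dict[str, int]) -> dict[str, str]:
    if not freqs:
        return {}
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate alphabet: give the only symbol the code "0"
        return {heap[0][2]: "0"}
    # Build: repeatedly merge the two lightest nodes into a (left, right) pair.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    # Walk: '0' for a left branch, '1' for a right branch; leaves are symbols.
    codes: dict[str, str] = {}
    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes
```

Frequent symbols end up near the root and so get short codes. With frequencies `{"a": 5, "b": 2, "c": 1, "d": 1}`, `a` gets a 1-bit code and `c` and `d` get 3-bit codes.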
See Also
J-related information
- Guides/Strings: string and text manipulation resources
- JForum:programming/2007-November/008869: some initial links
- User:Dan Bron/Temp/ParseLexExecute: implementing J in J
- Guides/Language FAQ/J BNF: Is there a BNF description of J?
- JForum:chat/2007-November/000678: J syntax easy to parse? I don't think so
- using JHP for general templating
General information
- WikiPedia:Parsing, Wikipedia