Write a scanner that identifies all the tokens of a Dream program (see Dream Lexical Components). It must take the name of one or more Dream source files from the command line and report any lexical errors. Lexical errors fall into one of three categories: unrecognized character, unterminated string, illegal string.
An unrecognized character is any character of the input file that is not part of one of the lexical components defined in the Dream Lexical Components handout.
An unterminated string is a string whose closing quote is not found by the end of the current input line.
An illegal string is a string that contains one or more illegal escape sequences.
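To make the difference between the two string errors concrete, here is a minimal sketch in plain Java of how a candidate string literal might be classified. It is not the required ANTLR-based approach, and the class name `StringCheck` and the set of legal escapes (`\n`, `\t`, `\"`, `\\`) are hypothetical assumptions; the actual legal escapes are whatever the Dream Lexical Components handout defines.

```java
import java.util.Set;

// Hypothetical helper: classifies a candidate string literal that begins at a
// double quote. LEGAL_ESCAPES is an ASSUMPTION; use the escapes defined in the
// Dream Lexical Components handout.
public class StringCheck {

    enum Kind { VALID, UNTERMINATED, ILLEGAL }

    // Assumed legal escape characters (the character that follows a backslash).
    private static final Set<Character> LEGAL_ESCAPES = Set.of('n', 't', '"', '\\');

    // `line` is the rest of the current input line, starting at the opening quote.
    static Kind classify(String line) {
        boolean illegal = false;
        int i = 1; // skip the opening quote
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '"') {
                // Closing quote found on the same line.
                return illegal ? Kind.ILLEGAL : Kind.VALID;
            }
            if (c == '\\') {
                // A backslash at end of line, or one followed by a character
                // outside the assumed escape set, makes the string illegal.
                if (i + 1 >= line.length() || !LEGAL_ESCAPES.contains(line.charAt(i + 1))) {
                    illegal = true;
                }
                i += 2; // skip the backslash and the escaped character
            } else {
                i++;
            }
        }
        // End of line reached without a closing quote.
        return Kind.UNTERMINATED;
    }

    public static void main(String[] args) {
        System.out.println(classify("\"hello\\n\"")); // VALID
        System.out.println(classify("\"bad \\q\""));  // ILLEGAL
        System.out.println(classify("\"no end"));     // UNTERMINATED
    }
}
```

In this sketch an unterminated string is reported as unterminated even if it also contains a bad escape; how your scanner prioritizes the two errors should follow the handout.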
You may use any language or technology you wish that is available (or can be installed, with instructor approval) on csunix. Your scanner must build and run on csunix.
Your program will be invoked from the command line as follows:
build/install/dream/bin/dream -ds <filename>
where <filename> is the name of a Dream source file to be scanned for tokens, and -ds is an optional debugging output switch:
When the -ds command line option is present, produce a list of tokens, one per line, on standard output, formatted as in the sample run in the Dream Lexical Components handout.
When the -ds option is not present, your scanner should not display tokens, but it should still display lexical errors.
You must modify the provided JUnit test LexerTest.java to test your Lexer. A reasonably comprehensive test is expected: your test should check for each keyword and each type of token.
If you use a technology other than Java, use an appropriate unit test framework.
Here’s a quick roadmap to help you get started on Phase 1:
Set up the software. If you wish to use a language other than Java, ANTLR can generate scanners and parsers in other languages (for example, Swift, Go, and Python). Alternatively, you can use a different compiler generation tool. Check with the instructor to get clearance for your desired technology, then port the provided Java project to your chosen implementation language. If you choose a different technology, be aware that you will still be tested on your knowledge of the aspects of the Java technology presented in class.
Use the “Your Submission Repository” link at the top of these instructions to create a submission repository for your Phase 1 submission. Clone the repository to your computer. It has a copy of the lexer example in a folder named dream, preconfigured to work with VSCode. Open the dream folder in VSCode (don’t open the root repository folder, or you won’t get the right VSCode settings).
DreamLexer. Then compile and run the project with math.txt as the argument and verify that it still works after those changes.
Modify the provided Main class to implement the -ds option, following the Design Requirements specified above. Implementation tip: in Java, String contents must be compared by invoking a method of the String class (such as equals), not with the == operator.
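As an illustration of the tip, the sketch below shows one possible way to separate the optional -ds switch from the file names using `equals`. The class name `Args` and its fields are hypothetical and are not part of the provided Main class; adapt the idea to the code you were given. The last lines of `main` show why `==` is unreliable: it compares object references, not string contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of command-line handling: an optional -ds switch plus
// one or more source file names. Names here are illustrative only.
public class Args {

    final boolean showTokens;   // true when -ds was given
    final List<String> files;   // every argument that is not -ds

    Args(boolean showTokens, List<String> files) {
        this.showTokens = showTokens;
        this.files = files;
    }

    static Args parse(String[] argv) {
        boolean showTokens = false;
        List<String> files = new ArrayList<>();
        for (String arg : argv) {
            // equals() compares contents; == would compare object identity.
            if (arg.equals("-ds")) {
                showTokens = true;
            } else {
                files.add(arg);
            }
        }
        return new Args(showTokens, files);
    }

    public static void main(String[] argv) {
        Args a = parse(new String[] {"-ds", "math.txt"});
        System.out.println(a.showTokens + " " + a.files); // true [math.txt]

        // A String built at run time is a distinct object from the literal.
        String built = new String("-ds");
        System.out.println((built == "-ds") + " " + built.equals("-ds")); // false true
    }
}
```

Whatever the structure, lexical errors should still be reported when `showTokens` is false; only the token listing depends on the switch.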
Review the ANTLR Lexer Rules specification.
Create an extension for Visual Studio Code that implements syntax coloring for Dream source code files.
Implementing a syntax coloring extension for VSCode is not hard to do, and is a fun way of applying compiler technology. These documents should help you get started:
Create a folder inside the root of your submission repository named bonus that holds your extension. Also, make a short screencast of your extension in action, and include a link to it in your report.
Create a README.md in the dream folder with a brief report in Markdown (don’t edit the root-level README.md, which gets replaced with the results of submission checks when you submit). Your report should indicate the number of hours you spent on this phase, and list any known bugs. Also, include an academic integrity statement indicating what help you received, if any.
Push your code to the submission repository. Check the top-level README.md about a minute after you push to see the results of the automated checks. The test script runs your lexer using the sample input at the end of the Dream Lexical Components sheet. You should see output that matches the sample output on the Dream Lexical Components sheet. Note that your code will be tested with more comprehensive tests.