Write a scanner that identifies all the tokens of a Dream program (see Dream Lexical Components). It must take the name of one or more Dream source files from the command line and report any lexical errors. Lexical errors fall into one of three categories: unrecognized character, unterminated string, illegal string.
An unrecognized character is any character of the input file that is not part of one of the lexical components listed below.
An unterminated string is a string whose closing quote is not found by the end of the current input line.
An illegal string is a string that contains one or more illegal escape sequences.
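To make the difference between the two string errors concrete, here is a minimal hand-written sketch in Java. It is not the ANTLR approach used in the starter project, where these cases would normally be expressed as lexer rules. The class name StringLiteralCheck, the classify method, and especially the set of legal escape characters are assumptions made for illustration; take the real escape sequences from the Dream Lexical Components sheet.

```java
import java.util.Set;

public class StringLiteralCheck {
    // ASSUMPTION: this set of legal escape characters is hypothetical.
    // Replace it with the escapes defined in the Dream Lexical Components sheet.
    private static final Set<Character> LEGAL_ESCAPES = Set.of('n', 't', '"', '\\');

    public enum Result { OK, UNTERMINATED_STRING, ILLEGAL_STRING }

    /**
     * Classifies the string literal that starts at line.charAt(start),
     * which is expected to be the opening double quote.
     * The line must not include its trailing newline.
     */
    public static Result classify(String line, int start) {
        boolean illegalEscape = false;
        int i = start + 1;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c == '"') {
                // Closing quote found on the same line.
                return illegalEscape ? Result.ILLEGAL_STRING : Result.OK;
            }
            if (c == '\\') {
                if (i + 1 >= line.length()) {
                    break; // backslash is the last character: no closing quote
                }
                if (!LEGAL_ESCAPES.contains(line.charAt(i + 1))) {
                    illegalEscape = true;
                }
                i += 2; // skip the backslash and the escaped character
            } else {
                i++;
            }
        }
        // Reached the end of the line without seeing a closing quote.
        return Result.UNTERMINATED_STRING;
    }

    public static void main(String[] args) {
        System.out.println(classify("\"hello\"", 0));          // OK
        System.out.println(classify("\"oops", 0));             // UNTERMINATED_STRING
        System.out.println(classify("\"bad \\q escape\"", 0)); // ILLEGAL_STRING
    }
}
```

In this sketch a string that is both unterminated and contains a bad escape is reported as unterminated, because the scan never reaches a closing quote. Whether that matches the expected behavior is something to confirm against the sample output.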
You may use any language or technology you wish that is available (or can be installed, with instructor approval) on Ubuntu 24.04.
Your program will be invoked from the command line as follows:
zapp [-ds] <filename>
where
zapp invokes the zapp.cmd script to run your lexer
<filename> is the name of a Dream source file to be scanned for tokens
You must modify the provided JUnit test LexerTest.java to test your Lexer. A reasonably comprehensive test is expected: your test should check for each keyword and each type of token.
If you use a language other than Java, use an appropriate unit test framework for your language.
Here’s a quick roadmap to help you get started on Phase 1:
Set up the software. If you wish to use a language other than Java, ANTLR can generate scanners and parsers in other languages (Swift, Go, Python). Or, you can use a different compiler generation tool. Check with the instructor to get clearance for your desired technology, then port the provided Java project to your desired implementation language. If you choose to use a different technology, be aware that you will still be tested on your knowledge of the aspects of the Java technology presented in class.
Use the “Your Submission Repository” link at the top of these instructions to create a submission repository for your Phase 1 submission. Clone the repository to your computer. It has a copy of the lexer example in a folder named dream, preconfigured to work with VSCode. Open the dream folder in VSCode (don’t open the root repository folder, or you won’t get the right VSCode settings, and IDE features for building and compiling won’t work properly).
Rename Arithmetic.g4 to Dream.g4. Change the first line to read grammar Dream; (ANTLR requires the grammar name to match the file name). Edit the Main class to instantiate DreamLexer, and edit the LexerTest class to use DreamLexer. Then use the following commands in a command prompt to compile and run the project; it should run without errors:
cd dream
copy src\test\resources\cps450\lexertest.txt math.txt
gradlew build install
zapp math.txt
Modify the provided Main class to implement the -ds option, following the Design Requirements specified above. Implementation tip: Comparing Strings in Java must be done by invoking a method in the String class, not by using the == operator.
Review the ANTLR Lexer Rules specification.
Edit Dream.g4 to define all of the Dream tokens. I suggest doing a few at a time, saving, and generating the scanner / testing. Find sample test input files in the class files in tests/phase1.
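For the -ds step in the roadmap above, the following minimal sketch shows one way to detect the option and pick out the filename using String.equals rather than ==. The class name ArgsSketch and its helper methods are hypothetical and are not part of the provided starter project. What the option should actually do is defined by the Design Requirements, so that part is deliberately left out.

```java
public class ArgsSketch {
    /** Returns true if any command-line argument is exactly "-ds". */
    public static boolean hasDsFlag(String[] args) {
        for (String arg : args) {
            // Compare contents with equals; == would compare object identity.
            if ("-ds".equals(arg)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the first argument that is not "-ds", or null if there is none. */
    public static String firstFilename(String[] args) {
        for (String arg : args) {
            if (!"-ds".equals(arg)) {
                return arg;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] sample = {"-ds", "math.txt"};
        System.out.println(hasDsFlag(sample));     // true
        System.out.println(firstFilename(sample)); // math.txt

        // Why == is unreliable: two distinct String objects with equal contents.
        String flag = new String("-ds");
        System.out.println(flag == "-ds");      // false
        System.out.println(flag.equals("-ds")); // true
    }
}
```

Writing the literal first, as in "-ds".equals(arg), also avoids a NullPointerException if an argument were ever null.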
Your submission will be tested in two environments:
If you use the provided starter project, as long as you do not change the folder structure, you should not need to do anything to get your Gradle project to build and run in my test environment.
If you use a different technology, you must provide a setup.sh script in the root of your submission repository with commands to install your language environment into the GitHub runner environment (if it isn’t already present in the GitHub runner environment). You must also create an env.sh script in the root of your submission repository to define commands to build your project, run unit tests, and execute your project (find the one I use to build Gradle projects in the class tests folder).
Note that the zapp.cmd script provided in the starter project is not used in the testing environments; it is provided as a convenience for Windows users.
Create an extension for Visual Studio Code that implements syntax coloring for Dream source code files.
Implementing a syntax coloring extension for VSCode is not hard to do, and is a fun way of applying compiler technology. These documents should help you get started:
Create a folder inside the root of your submission repository named bonus that holds your extension. Also, make a short screencast of your extension in action, and include a link to it in your report.
Create a README.md in the dream folder with a brief report in Markdown (don’t edit the root-level README.md, which gets replaced with the results of submission checks when you submit). Your report should indicate the number of hours you spent on this phase, and list any known bugs. Also, include an academic integrity statement indicating what help you received, if any.
Push your code to the submission repository. Check the top-level README.md about a minute after you push to see the results of the automated checks. The test script runs your lexer using the sample input at the end of the Dream Lexical Components sheet. You should see output that matches the sample output on the Dream Lexical Components sheet. Note that your code will be tested with more comprehensive tests.