Sunday, July 5, 2009

The Latest HOT Term

So, every few years it seems we have to come up with a new HOT or Cool buzzword in the tech community. It seems that DSL is now that term. For those who have been hiding under a rock for the last year, DSL = Domain Specific Language. Think of this as a custom programming language designed to handle a very specific task. Wikipedia has a decent definition. DSLs aren't new; they've been around since the age of 1s and 0s.

We in the programming world seem to invent new DSLs every time we touch the keyboard. Existing items aren't good enough or may be too wordy. DSLs give me cramps. I think in many cases they are overused, and for the wrong reasons. That isn't to say that I don't think there is a need for them when they are done for the right reasons. However, it seems that everybody's answer to a problem now is to create a DSL.

The problem with DSLs is that each one is yet another language one has to learn to maintain software. My head is already crammed full of various utility languages and I'm sure it will be crammed with some more over time, but many of these languages don't deliver the convenience that they try to provide. Ask yourself before coming up with another language:

  1. Is there a currently supported standard that works?
  2. Can I extend this existing standard DSL?
  3. If not, why not?
  4. Will creating yet another DSL really solve the problem and make it easier to maintain going forward?
I work with XML, which itself is a DSL for creating other markup languages, and its simplicity of implementation and power have also been its bane. XML is overused in many situations because it is too convenient to use. With this said, there are a handful of programming DSLs within the XML world. Some are better than others, and several are starting to evolve into general purpose languages in their own right. Some examples:
  1. XSLT - a functional language used for transformations. Comes in two flavors: XSLT 1.0 and XSLT 2.0. It is itself written in XML.
  2. XPath 1.0 and XPath 2.0 - a DSL for querying information from an XML tree, usually a DOM.
  3. XML Schema - a DSL for describing XML validation. Written in XML.
  4. RelaxNG - another DSL for describing XML validation and content. Comes in both an XML-based syntax and a human-readable Compact Syntax.
  5. XPointer - an XML DSL for specifying the location of information within a document.
  6. XInclude - an XML DSL for specifying how to include or import information into existing XML documents.
  7. Web Services - a whole family of DSLs describing how to create web service implementations.
  8. Schematron - yet another DSL for validating XML files.
XML has made creating a DSL very simple and straightforward. However, it is often used for the wrong reasons.
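
To make the XPath entry in the list above a little more concrete, here is a minimal sketch of querying an XML tree from Java. It uses the JDK's built-in javax.xml.xpath API, which implements XPath 1.0 only. The sample document and the class and method names are made up for illustration and are not taken from any project mentioned here.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical sample document, made up for this illustration.
    static final String XML =
        "<addressBook>"
        + "<card name=\"Ann\" email=\"ann@example.org\"/>"
        + "<card name=\"Bob\" email=\"bob@example.org\"/>"
        + "</addressBook>";

    /** Evaluates an XPath 1.0 expression against the sample and returns its string value. */
    static String query(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            // Rethrow unchecked so callers of this sketch stay simple.
            throw new IllegalArgumentException("Bad XPath expression: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Select one attribute by filtering on another.
        System.out.println(query("/addressBook/card[@name='Bob']/@email"));
        // Count the card elements.
        System.out.println(query("count(/addressBook/card)"));
    }
}
```

The whole query lives in one short string, which is the appeal of a DSL done for the right reasons: the equivalent hand-written DOM traversal would be several times longer.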

Which brings me to XText. I've been experimenting with it for a couple of days, and like XML it provides great power at little upfront cost. However, I also think that, like XML, it can end up being applied way too often just because it is easy to create and deploy. XText itself has its own DSL for describing languages; I wish it had just leveraged an existing established standard language like the ISO EBNF.

So how many more DSLs will we create? More than we really need.

Friday, July 3, 2009

Eclipse: PsychoPath XPath 2.0 Processor Update

It's been a little while since I gave an update on where the PsychoPath XPath 2.0 processor stands from a W3C Test Suite compliance point of view. Mukul Gandhi (note: nudge him to get his blog on Planet Eclipse since he is an Eclipse committer now) and I have been working our way steadily through the entire test suite. Here are the current results:

Tests: 8137
Failures: 988
Errors: 42

Code Coverage according to EclEmma:

The test suite now covers about 76% of the PsychoPath code. I suspect that we will be at 80% to 85% code coverage by the time we complete the test suite. When we started we had about 30 or 40 tests that covered about 20% of the code. Unless you run a coverage analysis with a tool like EclEmma, you really do not know how much of your code is actually being executed by your tests. Trying to get to 100% coverage is not worth doing, but getting above 75% is, and should be worked toward.

XPath 2.0 Castable, Durations, and Date/Times:

This has been the bane of my existence for the last several weeks. Reading the W3C specifications on when something can be cast and when something shouldn't be cast can give one a headache pretty quickly. Luckily the XPath 2.0 and XQuery 1.0 Functions Specification has this nice handy table:


The SeqCastableTest test suite covers about 648 test cases for the core specification. PsychoPath now passes all of these tests. Without the above table, trying to figure out the wording of the specification can be difficult at best, and just plain maddening at worst.

The Natives Will Byte You!

One thing that was constantly biting me was the imprecision of the Java primitive types. PsychoPath in its original implementation used the Integer, Double, and Float classes exclusively. The problem was that in many cases, especially during IntegerLiteral and DecimalLiteral parsing or creation, these were either too small to hold the numbers the tests wanted, or not precise enough. A good explanation of the reason can be found in "Double Precision Numbers". Switching to BigInteger and BigDecimal where needed solved many of these issues.
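
A minimal sketch of the two problems described above, using only the JDK. The class and method names are hypothetical and this is not PsychoPath's actual code; it just shows why a double cannot represent some decimal literals exactly and why a wide integer literal needs BigInteger.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class NumericPrecisionSketch {
    /** Adds 0.1 and 0.2 in binary floating point; the result is not exactly 0.3. */
    static double doubleSum() {
        return 0.1 + 0.2;
    }

    /** Same sum with BigDecimal built from strings, which keeps the exact decimal digits. */
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    /** Parses an integer literal that may be too wide for int or long. */
    static BigInteger parseIntegerLiteral(String literal) {
        return new BigInteger(literal);
    }

    public static void main(String[] args) {
        System.out.println(doubleSum());   // prints 0.30000000000000004
        System.out.println(decimalSum());  // prints 0.3
        // Integer.parseInt or Long.parseLong would throw NumberFormatException here.
        System.out.println(parseIntegerLiteral("999999999999999999999999"));
    }
}
```

Note that the BigDecimal values are built from strings. Building them from a double, as in new BigDecimal(0.1), would carry the binary rounding error along with it.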

Dates are yet another pain to deal with, especially calculations. Again you start to run into precision issues, where some values are stored with too little precision. I did some experimentation, and both com.ibm.icu.util.GregorianCalendar and the built-in Java GregorianCalendar have precision issues, some in the same places, others not. I had thought about trying to leverage Joda Time, but that was previously turned down by the Eclipse IP team due to pedigree concerns.

I dread debugging these issues. Dates and Time calculations give me a headache.

What's left:

Lots. There are several core functions that are not working or are missing, some logic expression errors to correct, and a few more lexer/parser issues to address. Overall though, PsychoPath passes about 88% of the W3C Test Suite, so it is much more compliant than when it started out. Credit still has to be given to Andrea Bittau, the original author, for a well designed implementation, and some very clean and easy to maintain code. Compared to some of the code I've worked on at Eclipse, this is just a pleasure to work with.

All current changes are in the WTP Source Editing CVS Repository.

XText for RelaxNG Update

Peter Friese spent some time with me this afternoon to help work around some of the issues I had with the EBNF grammar and Xtext. The results are very encouraging.

The above is the editor and outline view generated from the XText grammar. We even have some very basic content assistance that comes with it.


The hope is to polish this up over the next week or so, and contribute the source to the RelaxNG Editor and Validator enhancement in WTP. A code contribution has already been attached to the bug that provides grammar-aware content assistance and validation for XML files backed by a RelaxNG grammar. Hopefully we can get this project quickly into the WTP Incubator so that RelaxNG support can be added by Helios in some form.

Thanks again for the help Peter.

Thursday, July 2, 2009

UntypedAtomic and YearMonthDuration Casting Question

While working on the PsychoPath XPath 2.0 processor in the Eclipse WTP project, I ran across the following situation. According to the W3C XPath 2.0 test suite test case CastableAs030, the following test should fail:


xs:untypedAtomic("-P1Y1M1DT1H1M1.123S") castable as xs:yearMonthDuration


The result should be false; however, we are returning true. As far as I understand the specification, the xs:untypedAtomic value should be treated as a string value, and when casting to xs:yearMonthDuration only the form of "-P1Y1M1" is processed, ignoring the rest. It also appears that this is a valid duration describing:
a negative 1 year, 1 month, 1 day, 1 hour, 1 minute, and 1.123 seconds

At least that is my interpretation of section 17.1.4 of Casting from primitive types to primitive types. Am I missing something or is this a known bug in the W3C Test Suite?
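
One other possible reading, offered as an assumption rather than a confirmed answer: when the source is a string or xs:untypedAtomic, the whole value may have to match the lexical form of the target type, and the lexical form of xs:yearMonthDuration allows only year and month components (PnYnM). Under that reading the test suite's expected result of false would follow. The sketch below checks only that lexical form; the class and method names are hypothetical and this is not PsychoPath code.

```java
import java.util.regex.Pattern;

class YearMonthDurationSketch {
    // Lexical form PnYnM with an optional leading minus sign. At least one of
    // the year or month components must be present; no day or time part is allowed.
    private static final Pattern LEXICAL =
        Pattern.compile("-?P([0-9]+Y([0-9]+M)?|[0-9]+M)");

    /** Hypothetical helper: true when the whole string is a yearMonthDuration literal. */
    static boolean isCastable(String untypedAtomic) {
        // trim() approximates the whitespace collapsing applied before validation.
        return LEXICAL.matcher(untypedAtomic.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCastable("-P1Y1M1DT1H1M1.123S")); // false
        System.out.println(isCastable("-P1Y1M"));              // true
    }
}
```

If this reading is right, the dropping of the day and time components would apply when the source is already an xs:duration value, not when it is a string.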

XText for RelaxNG

So, I decided to try to create an XText grammar for the RelaxNG Compact Syntax. Mixed results so far, mainly due to my lack of knowledge of the XText grammar and its mapping to EBNF.

Here is what I have so far:



grammar org.oasisopen.relaxng with org.eclipse.xtext.common.Terminals

generate relaxng "http://www.oasis-open.org/relaxng"

terminal Letter:
ID;

terminal CHAR:
'#x9' | '#xA' | '#xD' | ('#x20'..'#xD7FF') | ('#xE000'..'#xFFFD') | ('#x10000'..'#x10FFFF');

terminal NewLine:
'#xA' | '#xD' | ('#xA' '#xD');

terminal NameStartChar:
":" |
('A'..'Z') |
"_" |
('a'..'z') |
('#xC0'..'#xD6') |
('#xD8'..'#xF6') |
('#xF8'..'#x2FF') |
('#x370'..'#x37D') |
('#x37F'..'#x1FFF') |
('#x200C'..'#x200D') |
('#x2070'..'#x218F') |
('#x2C00'..'#x2FEF') |
('#x3001'..'#xD7FF') |
('#xF900'..'#xFDCF') |
('[#xFDF0'..'-#xFFFD') |
('#x10000'..'#xEFFFF');

terminal NameChar:
NameStartChar | "-" | "." | ('0'..'9') | '#xB7' | ('#x0300'..'#x036F') | ('#x203F'..'#x2040');

Model :
(elements += TopLevel);

TopLevel:
Decl* (Pattern | GrammarContent*);

Decl:
'namespace' (IdentifierOrKeyWord '=' NamespaceURILiteral) |
'default' 'namespace' (IdentifierOrKeyWord) '=' NamespaceURILiteral |
'datatypes' IdentifierOrKeyWord '=' Literal;

Pattern:
'element' NameClass '{' Pattern '}' |
'attribute' NameClass '{' Pattern '}' |
Pattern (',' Pattern)+ |
Pattern ('&' Pattern)+ |
Pattern ('|' Pattern)+ |
Pattern '?' |
Pattern '*' |
Pattern '+' |
'list' '{' Pattern '}' |
'mixed' '{' Pattern '}' |
Identifier |
'parent' Identifier |
'empty' |
'text' |
(DataTypeName) DataTypeValue |
DataTypeName ('{' Param* '}') (ExceptPattern) |
'notAllowed' |
'external' AnyURILiteral (Inherit) |
'grammar' '{' GrammarContent* '}' |
'(' Pattern ')';

Param:
IdentifierOrKeyWord '=' Literal;

ExceptPattern:
'-' Pattern;

GrammarContent:
Start | Define |
'div' '{' GrammarContent* '}' |
'include' AnyURILiteral (Inherit) ('{' IncludeContent* '}');

IncludeContent:
Define | Start |
'div' '{' GrammarContent* '}';

Start:
'start' AssignedMethod Pattern;

Define:
Identifier AssignedMethod Pattern;

AssignMethod:
'=' |
'|=' |
'&=';

NameClass:
Name |
NsName (ExceptClassName) |
AnyName (ExceptClassName) |
NameClass '|' NameClass |
'(' NameClass ')';

Name:
IdentiferOrKeyWord | CName;

ExceptNameClass:
'-' NameClass;

DataTypeName:
CName |
'string' |
'token';

DataTypeValue:
Literal;

AnyURILiteral:
Literal;

NamespaceURILiteral:
Literal | 'inherit';

Inherit:
'inherit' '=' IdentifierOrKeyWord;

IdentifierOrKeyWord:
Identifier | KeyWord;


Identifier:
(NCName .. KeyWord) |
QuotedIdentifier;


CName:
NCName ':' NCName;

NsName:
NCName ':*';

AnyName:
'*';

Literal:
LiteralSegment ('~' LiteralSegment)+;

LiteralSegment:
'"' (CHAR .. ('"' | NewLine))* '"' |
"'" (CHAR .. ("'" | Newline))* "'" |
'"""' (('"') ('"') (CHAR .. '"'))* '"""';

KeyWord:
"attribute"
| "default"
| "datatypes"
| "div"
| "element"
| "empty"
| "external"
| "grammar"
| "include"
| "inherit"
| "list"
| "mixed"
| "namespace"
| "notAllowed"
| "parent"
| "start"
| "string"
| "text"
| "token";

NCName:
NCNameStartChar NCNameChar*; /* An XML Name, minus the ":" */

NCNameChar:
NameChar - ':';

NCNameStartChar:
Letter | '_';

Name:
NameStartChar (NameChar)*;

Names:
Name ('#x20' Name)*;

Nmtoken:
(NameChar)+;

Nmtokens:
Nmtoken ('#x20' Nmtoken)*;

QuotedIdentifier:
'\' NCName;


Some of the errors I can figure out; others are just baffling, particularly the ones where it can't find the Rule even though it's been edited. Help from the XText gurus out there would be appreciated.

Wednesday, July 1, 2009

Eclipse Galileo: Setting up PDT 2.1 and XDebug support

Typically when working on PHP files I don't have much of a need to set up a debugger. However, I've recently had to go through some really nasty PHP 4 code. It was one big massive mess of nested if-then-else statements, no functions, and a crap load of global variables. PHP is a good utility language, but some of the people writing the code need to read Uncle Bob's Clean Code book.

My development machine is an Ubuntu box and I have the Eclipse Galileo Java EE package installed, with PDT 2.1 added as well. There is a decent blog entry at Hodge titled Debugging PHP Applications with Xdebug and Eclipse PDT. However, you can simplify some of the steps.
  1. Use apt-get or Synaptic to install apache2, php5, php5-cli, and the php5-xdebug module.
  2. Go to the terminal and restart apache2: sudo /etc/init.d/apache2 restart
  3. Follow the rest of the steps outlined in Hodge's article for making sure Xdebug is turned on and enabled.
There is no need to download and compile xdebug yourself, as it's already in the Ubuntu apt repository. If you follow the rest of his steps for setting up PDT for XDebug, you should be ready to debug.

Tuesday, June 30, 2009

Eclipse: RelaxNG Tools

Most people who work with XML have at least heard of DTDs and XML Schemas. Both are widely used to provide a validation model for XML documents. XML Schema has also been pressed into service as a general purpose modeling language and used within data binding frameworks for code generation. However, XML Schema and DTDs do not necessarily work, nor are they the best fit, for all document designs. XML Schema can be very heavyweight, overly wordy, and in general a pain to work with at times. DTDs do not allow data typing, and personally I hate the DTD syntax.

So, if you need the power that XML Schema can provide but want a language that is easier for humans to read, what are your alternatives? RelaxNG.

Bug 281529 has been opened to help bring a RelaxNG set of tools and frameworks over to the Web Tools Platform Incubator. RelaxNG provides both an XML syntax and a more human friendly Compact Syntax:

Sample Compact Syntax:

element addressBook {
  element card {
    attribute name { text },
    attribute email { text }
  }*
}


In some ways I think RelaxNG might be a good place for XText and its DSL editor generation abilities. The RelaxNG language is a domain specific language for XML validation. Its compact syntax would seem to fit well into the range that XText can support.

RelaxNG is very popular in the document-centric industries. The Open Document set of grammars is written entirely in RelaxNG. A good set of open source tooling and integration within Eclipse is missing for RelaxNG. OxygenXML does contain RelaxNG support, but it isn't an open source solution that others can use and contribute to in order to build their own implementations. It would be ideal if we could get all interested players together to work off a common framework that meets the 80% that everybody needs, and then extend it where necessary for the additional 20%.