[Sep. 14th, 2005|06:38 pm]
So, there's a really huge and significant difference between </b> and <\b> in HTML. If the HTML is part of XML that gets serialized and unserialized, it's even more significant.
It is also a difference that is nearly impossible to see when looking at code.
Because your brain knows what kind of slash belongs in a close tag, so it doesn't bother to look closely enough to check what kind it is -- why would it be a backslash? That would be stupid.
If anybody can figure out how to make a program that can look at a bunch of text/code, figure out what context chunks of it are probably in, and point out the things that probably don't belong (i.e., "if this is HTML, that bit looks really unlikely"), that would be really, really cool, and I bet you could make a kajillion dollars off it.
(It might also be AI.)
EDIT: I forgot to mention that this HTML was occurring inside a bunch of Java code -- which makes it much harder to pick out.
Yeah. I'm with you. I'd notice the slash quickly, but I will miss a non-matching close tag, or forget to close something like a
. I've not used Tidy XML before, but my text editor has a decent code validator.
There has to be an emacs mode for this, too. M-x be-smarter-than-me or something. No, but seriously, emacs has some awesome text-highlighting stuff that makes things like this show up well.
Ah, but my awesome emacs text highlighting was already dedicated to coloring the java code that was generating the XML in question...
All the HTML was in beige, because it was part of a String.
If it's supposed to be well formed, parse it and see if you're missing closing tags... <\b> isn't a valid tag anyway...
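For what it's worth, here is a rough sketch of that "parse it and see" idea, using the XML parser that ships with the JDK. The class name WellFormedCheck, the method firstError, and the wrapper element <root> are made up for illustration; this assumes the fragment is meant to be well-formed XML, which plain HTML often is not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    /**
     * Returns null if the fragment parses as well-formed XML,
     * otherwise a description of the first problem the parser reports.
     */
    public static String firstError(String fragment) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Wrap the fragment in a hypothetical root element so loose text is allowed.
            String wrapped = "<root>" + fragment + "</root>";
            builder.parse(new InputSource(new StringReader(wrapped)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O trouble; not expected for an in-memory string.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("some <b>bold</b> text"));   // well-formed
        System.out.println(firstError("some <b>bold<\\b> text"));  // backslash close tag
    }
}
```

Running the check on the data by itself, before it goes anywhere near the rest of the program, at least separates "my data is bad" from "my code is bad".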
I, on the other hand, helped debug roughly the following code today:
String value = getValue();
text = text.replaceAll(key, value);
I want something that will detect if I'm alternating between different languages' regular expression idioms in the same functions.
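One way code like that can go wrong, as far as I can tell: String.replaceAll treats its first argument as a regular expression and gives $ and \ special meaning in the second. If the intent was a plain literal substitution, a sketch of the fix might look like this. The class name LiteralReplace, the method replaceLiteral, and the sample strings are hypothetical; Pattern.quote and Matcher.quoteReplacement are the standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplace {
    /**
     * Replaces every literal occurrence of key in text with value,
     * with no regex interpretation of either argument.
     */
    public static String replaceLiteral(String text, String key, String value) {
        // Pattern.quote makes the key match literally; Matcher.quoteReplacement
        // stops '$' and '\' in the value from being read as group references.
        return text.replaceAll(Pattern.quote(key), Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String text = "Total: ${price}";
        System.out.println(replaceLiteral(text, "${price}", "$5.00"));
        // text.replace(key, value) is the simpler literal alternative.
        System.out.println(text.replace("${price}", "$5.00"));
    }
}
```

If a real regex is wanted for the key, then only the value needs quoteReplacement.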
The parser is what was crashing on the bad tag. EIT!
That parser seems to not follow one of the old Unix ideas: "succeed quietly,
fail noisily". It should have been verbose and specific about what it gagged on.
Maybe I should hack up my old parser to serve as a quick-and-dirty scanner
for obvious types of errors. It sure was bitchy enough.
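A quick-and-dirty scanner of that sort could be as small as one regular expression run over the raw source text. This is only a sketch under the assumption that the one thing worth flagging is a tag that opens with a backslash, like <\b>; the class name BackslashTagScanner and the report format are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashTagScanner {
    // Matches things like <\b> or <\div >: an angle bracket, a backslash, a tag name.
    private static final Pattern SUSPECT =
        Pattern.compile("<\\\\[A-Za-z][A-Za-z0-9]*\\s*>");

    /** Returns one human-readable finding per suspicious tag in the source text. */
    public static List<String> scan(String source) {
        List<String> findings = new ArrayList<>();
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = SUSPECT.matcher(lines[i]);
            while (m.find()) {
                // Report 1-based line and column so the spot is easy to find in an editor.
                findings.add("line " + (i + 1) + ", column " + (m.start() + 1)
                    + ": " + m.group());
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        String source = "String s = \"<b>hi<\\b>\";\nString t = \"<i>ok</i>\";";
        for (String finding : scan(source)) {
            System.out.println(finding);
        }
    }
}
```

It knows nothing about context, so it will happily flag a backslash tag that is there on purpose, but it is cheap enough to run over a whole source tree.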
It was, actually. It even gave a meaningful error message.
The problem was, it was only meaningful in hindsight, because the parser is part of the Java code, too, and the whole thing broke when I added some code. So naturally I assumed the new code I had written was at fault, rather than the new data. The possibility "actually the code is working, but the parser is getting malformed XML" was very low in my brain until I realized that's what it was...
Ah, but you neglect the two fundamental realizations of computer science:
Only in a sufficiently high-level programming language...
Formally, a function is anything which takes input and produces output. javac is a function which takes Java code and produces bytecode, or takes something that is not Java code and produces error messages. In this instance, your code is data, but it's data which describes another function. Your function takes text and does HTML stuff. <\b> is data, but it is also a function written in the language of BeemersProgram. This function takes no input and produces an exception.