semantic blindness - The Mad Schemes of Dr. Tectonic

semantic blindness [Sep. 14th, 2005|06:38 pm]
So, there's a really huge and significant difference between </b> and <\b> in HTML. If the HTML is part of XML that gets serialized and unserialized, it's even more significant.

It is also a difference that is nearly impossible to see when looking at code.

Because your brain knows what kind of slash belongs in a close tag, so it doesn't bother to look closely enough to check what kind it is -- why would it be a backslash? That would be stupid.

Yes, exactly.

If anybody can figure out how to make a program that can look at a bunch of text/code, figure out what context chunks of it are probably in, and point out the things that probably don't belong (i.e., "if this is HTML, that bit looks really unlikely"), that would be really, really cool, and I bet you could make a kajillion dollars off it.

(It might also be AI.)

EDIT: I forgot to mention that this HTML was occurring inside a bunch of Java code -- which makes it much harder to pick out.
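For this one specific mistake, something far humbler than the kajillion-dollar context-guesser is possible. What follows is only a minimal sketch, and it assumes the scanner is run over the Java source text, where the backslash is still visible. (Inside a Java string literal, `\b` is the backspace escape, so the compiler accepts it without complaint.) The class `BackslashTagScanner` and its method are hypothetical names, not an existing tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags "<\name>" sequences, which look like
// close tags typed with the wrong kind of slash.
class BackslashTagScanner {
    // '<', a literal backslash, a tag-like name, optional whitespace, '>'
    private static final Pattern SUSPECT =
            Pattern.compile("<\\\\([A-Za-z][A-Za-z0-9]*)\\s*>");

    static List<String> findSuspects(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = SUSPECT.matcher(text);
        while (m.find()) {
            hits.add("offset " + m.start() + ": " + m.group()
                    + " (did you mean </" + m.group(1) + ">?)");
        }
        return hits;
    }

    public static void main(String[] args) {
        // Stands in for text read from a source file; the doubled
        // backslash here yields one literal backslash in the string.
        String sourceText = "<p>This is <b>bold<\\b> text.</p>";
        for (String hit : findSuspects(sourceText)) {
            System.out.println(hit);
        }
    }
}
```

It only knows about one kind of wrongness, which is exactly why it is not the program wished for above.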

(Deleted comment)
(Deleted comment)
From: k8cre8
2005-09-14 08:20 pm (UTC)
Yeah. I'm with you. I'd notice the slash quickly, but I'll miss a non-matching close tag, or forget to close something like a
. I've not used Tidy XML before, but my text editor has a decent code validator.
From: orbitalmechanic
2005-09-14 07:22 pm (UTC)
There has to be an emacs mode for this, too. M-x be-smarter-than-me or something. No, but seriously, emacs has some awesome text-highlighting stuff that makes things like this show up well.
From: dr_tectonic
2005-09-14 10:40 pm (UTC)
Ah, but my awesome emacs text highlighting was already dedicated to coloring the Java code that was generating the XML in question...

All the HTML was in beige, because it was part of a String.
From: flwyd
2005-09-14 10:37 pm (UTC)
If it's supposed to be well formed, parse it and see if you're missing closing tags... <\b> isn't a valid tag anyway...

I, on the other hand, helped debug roughly the following code today:

String value = getValue();
value.replaceAll("([$\\\\])", "\\\\$1");
text = text.replaceAll(key, value);

I want something that will detect if I'm alternating between different languages' regular expression idioms in the same functions.
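The comment does not say what the bug was. Assuming it was the discarded return value on the second line (Java strings are immutable, so `replaceAll` returns a new string instead of changing `value` in place), a corrected version might look like the sketch below. The class and method names are hypothetical, and `key` is still treated as a regular expression, as in the original.

```java
import java.util.regex.Matcher;

class ReplacementEscaping {
    // Minimal fix: keep the escaped string instead of throwing it away.
    static String substitute(String text, String key, String value) {
        // Escape '$' and '\' so they are taken literally in the replacement.
        String escaped = value.replaceAll("([$\\\\])", "\\\\$1");
        return text.replaceAll(key, escaped);
    }

    // Same effect, using the standard library's own escaping helper.
    static String substituteQuoted(String text, String key, String value) {
        return text.replaceAll(key, Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(substitute("Total: PRICE", "PRICE", "$5"));       // prints "Total: $5"
        System.out.println(substituteQuoted("Total: PRICE", "PRICE", "$5")); // prints "Total: $5"
    }
}
```

Without the escaping, a replacement value like `$5` is read as a reference to capture group 5, and `replaceAll` throws because no such group exists.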
From: dr_tectonic
2005-09-14 10:45 pm (UTC)
The parser is what was crashing on the bad tag. EIT!
From: madbodger
2005-09-15 12:45 pm (UTC)
That parser seems to not follow one of the old Unix ideas: "succeed quietly, fail noisily". It should have been verbose and specific about what it gagged on. Maybe I should hack up my old parser to serve as a quick-and-dirty scanner for obvious types of errors. It sure was bitchy enough.
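As a rough illustration of "fail noisily", here is a sketch of a well-formedness check that reports where the parser gave up. It uses the JDK's built-in XML parser, which is an assumption: the thread never says which parser was involved. `WellFormedCheck` is a hypothetical name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class WellFormedCheck {
    // Returns null if the XML is well formed, otherwise a message
    // that names the line and column where parsing failed.
    static String check(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            // Configuration or I/O trouble rather than malformed input.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // The doubled backslash yields one literal backslash in the string.
        System.out.println(check("<p>This is <b>bold<\\b> text.</p>"));
    }
}
```

The JDK parser may also print its own error line to standard error before the exception reaches the `catch` block.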
From: dr_tectonic
2005-09-15 01:03 pm (UTC)
It was, actually. It even gave a meaningful error message.

The problem was, it was only meaningful in hindsight, because the parser is part of the Java code, too, and the whole thing broke when I added some code. So naturally I assumed the new code I had written was at fault, rather than the new data. The possibility "actually the code is working, but the parser is getting malformed XML" was very low in my brain until I realized that's what it was...
From: flwyd
2005-09-15 04:19 pm (UTC)
Ah, but you neglect the two fundamental realizations of computer science:
  • Code is Data
  • Data is Code
From: dr_tectonic
2005-09-15 04:27 pm (UTC)
Only in a sufficiently high-level programming language...
From: flwyd
2005-09-15 04:43 pm (UTC)
Formally, a function is anything which takes input and produces output. javac is a function which takes Java code and produces bytecode, or takes something that isn't Java code and produces error messages. In this instance, your code is data, but it's data which describes another function. Your function takes text and does HTML stuff. <\b> is data, but it is also a function written in the language of BeemersProgram. This function takes no input and produces an exception.