Digit separators

Dylan Language

Chris
I was looking at some Dylan code the other day that had some very large
numeric constants, e.g., 10000000, and I was reminded that I have always
wanted a way to add commas to make large numbers more readable (in
whatever language I'm using).

Ada, for example, allows underscores to be used in numeric constants. 
For example, the above number could be written 10_000_000. What do 
people think of adding a similar capability to Dylan?

I'm not entirely fond of underscore in particular, though other
characters may be problematic in various ways.
Alexander
I've always wondered why (outside localization) apparently no language or
software uses quote for that purpose, i.e. 999'999.99.

It seems easier on the eye than underscore and is already used by some
countries as a thousands separator. Compared to comma (which is also
used as list and clause separator) it is less overloaded and easier to tell
apart from the decimal dot.

Like comma, quote seems to suffer from the fact that some countries already
use it as a decimal separator, but I think comma is more commonly used as such
(it certainly has that role in quite a few European countries, Germany for
example). 

Backtick might also be an option, but it looks somewhat less appealing, I
think: 999`999.99

'as
                                            
Hugh
Yeah, I seem to recall Japanese sometimes uses quote to separate groups of
digits, in which case they tend to group 4 digits at a time, not 3.  But
then other times they group 3, with comma or dot!


I think "option" is the key word here.  This sort of thing sounds like it
should be an extension to the format library, or a new-but-similar library. 
Having formatting (and other things, like handling of date/time) be
parameterised on some "culture" value would be nice, and if you wanted to
invent your own "culture of one" for specific formatting, you could :-)
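
A rough sketch of what formatting parameterised on a "culture" value might
look like, written in Python rather than Dylan purely for illustration. The
names Culture and format_number are hypothetical, not part of any existing
format library, and the sample values are arbitrary settings, not claims
about real locales.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Culture:
    """Hypothetical bundle of formatting conventions."""
    group_separator: str
    decimal_separator: str
    group_size: int


def format_number(value, culture, decimals=2):
    """Format value with the grouping and separators given by culture."""
    text = f"{value:.{decimals}f}"
    integer_part, _, fraction_part = text.partition(".")
    sign = "-" if integer_part.startswith("-") else ""
    digits = integer_part.lstrip("-")
    # Peel groups off the right-hand end of the integer digits.
    groups = []
    while digits:
        groups.append(digits[-culture.group_size:])
        digits = digits[:-culture.group_size]
    grouped = culture.group_separator.join(reversed(groups))
    if fraction_part:
        return sign + grouped + culture.decimal_separator + fraction_part
    return sign + grouped


# A "culture of one": quote as separator, groups of three.
quote_style = Culture(group_separator="'", decimal_separator=".", group_size=3)
print(format_number(999999.99, quote_style))
```

Changing group_size to 4 gives the four-digit grouping mentioned above
without touching the formatting code itself.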

Hugh
                                            
Alexander
Both the Japanese and the Chinese language work with powers of 10000, not with
powers of 1000 (i.e. there are non-composite words for 10, 100, 1000 and also
for 10000, and then again 10^8, etc. -- but not for 10^6, 10^12 etc.). So from
their point of view thousands grouping must be pretty suboptimal. Still, I'm
fairly sure the Japanese use the British/American conventions when writing
down numbers in Arabic numerals.

'as
                                            
Bruce
*Not* 10^12?  Really?  Why not?
                                            
Alexander
Oops -- stupid mistake. I meant to write 10^9. 10^12 is 兆 in Japanese --
what it is in English sort of depends.

'as
                                            
Peter
Interestingly, according to my Chinese dictionary, 兆 (pronounced
zhao4 in Mandarin) was used for 10^6 in pre-modern times.  It is still
used for 10^6 in certain cases... megahertz (MHz), for example, is
zhao4 he4.

-Peter-
                                            
Alexander
housel@acm.org (Peter S. Housel) writes:


Hopefully no one who gets worked up about Megabytes and Mebibytes is reading
this.

'as
                                            
Chris
Except that I'm suggesting it as a change to the literal syntax, which 
isn't determined by a library. Or are you suggesting something else?

Trying to support localized numeric, date, or time literals is a 
challenge when the syntax varies so much by region and culture and 
source files are static, plain text. AppleScript actually displays and 
edits literals using localized formatting, but it stores programs with a 
locale-neutral byte-code.

Ada uses underscore and, IIRC, they can be placed anywhere in the 
literal, allowing people to break the digits up every two or three or 
four or whatever. It's just a form of comment, really. I think something 
lightweight like that would suffice and be easy to implement, until the 
day comes when software is no longer bound by plain text files.
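
To illustrate how lightweight this could be, here is a minimal sketch in
Python (not Dylan) of a literal reader that treats underscores as ignorable.
The grammar is an assumption made for the example (decimal digits, single
underscores only between digits, optional fractional part); it is not
Dylan's actual lexer, and parse_literal is a hypothetical name.

```python
import re

# Assumed grammar: digit runs joined by single underscores, with an
# optional fractional part written the same way.
_LITERAL = re.compile(r"[0-9]+(?:_[0-9]+)*(?:\.[0-9]+(?:_[0-9]+)*)?")


def parse_literal(text):
    """Return the numeric value of text, ignoring underscore separators."""
    if not _LITERAL.fullmatch(text):
        raise ValueError(f"malformed numeric literal: {text!r}")
    # The underscores carry no meaning, so they are simply dropped.
    digits = text.replace("_", "")
    return float(digits) if "." in digits else int(digits)


print(parse_literal("10_000_000"))
```

Because the underscores are discarded before conversion, the reader does not
care whether digits are grouped in twos, threes or fours.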
                                            
Hugh
Ah, not as such, I just wasn't paying attention ;-)

In that case I'd probably go with '_', as it's unlikely to be otherwise used
in connection with numbers and is nicely culture-neutral.



I like that flexibility idea.  In that case I'd suggest a compiler warning
if your underscores are at irregular intervals either side of the decimal
point, as that probably indicates a typo of some sort :-)
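
One possible reading of "irregular intervals", sketched in Python rather than
Dylan: on each side of the decimal point, every group must have the same
length except the outermost one, which may be shorter. That rule, the choice
to check the two sides independently, and the function names are all
assumptions made for this example.

```python
def _regular(groups):
    """Check groups where groups[0] is the one allowed to be short."""
    if len(groups) < 2:
        return True  # no separators, nothing to check
    size = len(groups[1])
    full_ok = all(len(g) == size for g in groups[1:])
    return full_ok and 1 <= len(groups[0]) <= size


def has_regular_grouping(literal):
    """Return True if the underscores in literal fall at regular intervals."""
    integer_part, _, fraction_part = literal.partition(".")
    int_groups = integer_part.split("_")
    # Fraction groups are reversed so that, as in the integer part,
    # the group allowed to be short comes first.
    frac_groups = fraction_part.split("_")[::-1] if fraction_part else []
    return _regular(int_groups) and _regular(frac_groups)


for literal in ("10_000_000", "1_0000_0000", "10_0000_000"):
    if not has_regular_grouping(literal):
        print(f"warning: irregular digit grouping in {literal}")
```

Only the last of the three literals triggers the warning, since it mixes
groups of four and three.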

Hugh