Felger Carbon Guest
|
Posted: Thu Aug 21, 2003 3:26 am Post subject: More Japanese Vector Supercomputer |
|
|
Thanx to Robert Myers for the URL of the Dongarra presentation on the
Japanese Earth Simulator. This is a synopsis of the Dongarra 1.7meg
PDF (36 slides):
The first thing is to ignore slide 4, which is the general spec of the
NEC SX-7 (proposed) computer. Totally unrelated to the Earth
Simulator (ES), which is an SX-6.
The fundamental unit of the ES is a silicon chip that runs at 500MHz
and contains one scalar processor and 8 vector processors. This chip
is called the Arithmetic Processor (AP). 8 vector processors at
500MHz, performing multiply-adds, is 8GFlops/sec per chip. A node has
8 chips, hence a vector length of 64, 64GFlops/sec per node. There
are 640 nodes, hence 40Tflops/sec for the entire computer.
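The peak figures above are straight multiplication. A minimal Python sketch of the arithmetic, assuming each vector processor completes one multiply-add (counted as 2 flops) per clock; the variable names are made up for the example and do not come from the Dongarra slides:

```python
# Peak-rate arithmetic for the Earth Simulator, using the figures quoted above.
# Assumption: one multiply-add (2 flops) per vector processor per clock.
CLOCK_HZ = 500e6            # 500MHz
VECTOR_UNITS_PER_CHIP = 8
FLOPS_PER_MADD = 2          # a multiply-add counts as two floating-point ops
CHIPS_PER_NODE = 8
NODES = 640

chip_flops = CLOCK_HZ * VECTOR_UNITS_PER_CHIP * FLOPS_PER_MADD
node_flops = chip_flops * CHIPS_PER_NODE
system_flops = node_flops * NODES

print(f"per chip: {chip_flops / 1e9:.0f} GFlops")
print(f"per node: {node_flops / 1e9:.0f} GFlops")
print(f"whole ES: {system_flops / 1e12:.2f} TFlops")
```

The total comes out slightly above the rounded 40Tflops quoted above, because 640 * 64 is 40,960.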
A short review:
1G (giga) = 1,000 megs
1T (tera) = 1,000 gigs
1P (peta) = 1,000 teras
Gordon Bell in "What's Next in High Performance Computing"
(Communications of the ACM, Feb 2002): "In 2001, the world’s Top500
computers consist of about 100,000 processors, each operating at about
1Gflops. Together they deliver slightly over 100Tflops."
The Accelerated Strategic Computing Initiative (ASCI) is aiming at
1PFlop - a petaflop - in 2010. Bell says "As the performance of
single, multiprocessor chips approaches 100Gflops, a petaflops machine
will only need 10,000 units." Remember, the ES contains 5120 8GFlop
multiprocessor chips (APs) and runs at 0.04 PFlops.
ES memory bandwidth: 32GBytes/sec per chip (8*500MHz*64-bit word),
256GBytes/sec per node, and ~164TBytes/sec for all 640 nodes. This is
for vector operation; when the scalar units are in operation the
memory bandwidth is lower.
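The bandwidth numbers can be checked the same way. A rough Python sketch, assuming each of the 8 vector processors moves one 64-bit word per 500MHz clock, which is the reading that agrees with the 256GBytes/sec per-node figure; the names are illustrative only:

```python
# Memory-bandwidth arithmetic for vector operation, from the figures above.
# Assumption: each vector processor moves one 64-bit word per clock.
CLOCK_HZ = 500e6
VECTOR_UNITS_PER_CHIP = 8
BYTES_PER_WORD = 8          # 64-bit word
CHIPS_PER_NODE = 8
NODES = 640
CHIP_PEAK_FLOPS = 8e9       # per-chip peak quoted earlier in the post

chip_bw = VECTOR_UNITS_PER_CHIP * CLOCK_HZ * BYTES_PER_WORD
node_bw = chip_bw * CHIPS_PER_NODE
system_bw = node_bw * NODES
bytes_per_flop = chip_bw / CHIP_PEAK_FLOPS

print(f"per chip: {chip_bw / 1e9:.0f} GBytes/sec")
print(f"per node: {node_bw / 1e9:.0f} GBytes/sec")
print(f"whole ES: {system_bw / 1e12:.2f} TBytes/sec")
print(f"balance:  {bytes_per_flop:.1f} bytes per peak flop")
```

The last line is the number that matters for memory-bound codes: how many bytes the memory system can supply for each peak floating-point operation.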
ES disk storage = 600Tbytes (600,000 GBytes)
ES tape storage = 1-15 PetaBytes (PB)
KFLOPs/inhabitant: Japan 450, US 358, Germany 245, Scandinavia 207,
UK 203. This apparently includes only supercomputers in the Top500.
Remember, a 3.2GHz P4 is capable of 6.4GFlops peak, or 6.4 *million*
KFlops!
One ES scalar unit (which is a part of the multiprocessor chip) has a
peak performance of 1GFlops/sec. The comparison on Slide 16 is
somewhat misleading; it compares the P4 (one chip) with an ES node (8
chips). For small vectors, the ES runs at only 1GFlop/sec per node,
about equal to the single-chip 2.53GHz P4. The scalar unit on each ES
multiprocessor chip has data and instruction caches of 64K each.
These caches are not available to the 8 vector units, which have their
own vector registers. Naturally, a vector unit does not require an
instruction cache! ;-)
More background info: In 1976, Cray shipped the Cray-1, which had a
peak performance of 133MFlops. This is 300,000 times slower than the
ES. In 1988, Cray shipped the Y-MP, which in its full 8-processor
configuration could do 2.67GFlops - a mere 15,000 times slower than
the ES. Today's 3.2GHz P4, with 6.4GFlop peak performance, is 48
times faster than a Cray-1! And you can buy one for about a week's
pay for an EE. ;-) |
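For anyone who wants to redo the speed comparisons in the post above, here is a small Python sketch. It uses the rounded 40 TFlops figure for the ES and the peak numbers exactly as quoted; the dictionary and variable names are invented for the example:

```python
# Speed ratios from the peak figures quoted in the post (all peak, not sustained).
ES_PEAK = 40e12             # rounded ES figure used in the post

PEAK_FLOPS = {
    "Cray-1": 133e6,
    "Cray Y-MP (8 processors)": 2.67e9,
    "3.2GHz P4": 6.4e9,
}

ratios = {name: ES_PEAK / peak for name, peak in PEAK_FLOPS.items()}
p4_vs_cray1 = PEAK_FLOPS["3.2GHz P4"] / PEAK_FLOPS["Cray-1"]

for name, ratio in ratios.items():
    print(f"ES peak is about {ratio:,.0f} times the {name} peak")
print(f"P4 peak is about {p4_vs_cray1:.0f} times the Cray-1 peak")
```

These are ratios of peak rates only; they say nothing about what any of the machines sustains on a real code.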
|
Robert Myers Guest
|
Posted: Thu Aug 21, 2003 9:40 am Post subject: Re: More Japanese Vector Supercomputer |
|
|
On Wed, 20 Aug 2003 22:26:42 GMT, "Felger Carbon" <fmrfne@jps.net>
wrote:
| Quote: | Thanx to Robert Myers for the URL of the Dongarra presentation on the
Japanese Earth Simulator. This is a synopsis of the Dongarra 1.7meg
PDF (36 slides):
The first thing is to ignore slide 4, which is the general spec of the
NEC SX-7 (proposed) computer. Totally unrelated to the Earth
Simulator (ES), which is an SX-6.
The fundamental unit of the ES is a silicon chip that runs at 500MHz
and contains one scalar processor and 8 vector processors. This chip
is called the Arithmetic Processor (AP). 8 vector processors at
500MHz, performing multiply-adds, is 8GFlops/sec per chip.
|
So, to get back to your original question, why would Japan/NEC endure
the stunning development costs of a completely new vector processor
that only produces 8GFlops/sec/chip when AMD has an evolutionary
design that can deliver 3.8GFlops/sec/chip?
The answer I think, relative to your original post is in two parts:
1. The market is bigger than you think. This is more of a comp.arch
question than a csiphc question, but problems that can be run
efficiently on many parallel processors can in general also be
efficiently vectorized. The general case is that there is a
formulation in which, for some part of the computation, the entire
problem breaks up into N independent parallel processes. Index those
processes j=1...N and that becomes the variable you vectorize on, not
any variable that naturally would index what you would normally
associate with a vector.
In order for this to work, the data have to be laid out in memory in
some predictable way, which occurs often enough naturally or can be
forced with a scatter/gather, and the processor usually needs to be
able to access data in a vector fashion on a non-unit stride in
memory. The ability to access memory on a non-unit stride is what
separates what I would call a genuine vector unit from SIMD, which
requires either a unit stride or time-consuming pack/unpack
operations.
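The idea of vectorizing on the process index j, with a non-unit stride, can be sketched in a few lines. This is plain Python and only illustrates the loop structure (Python does not vectorize anything itself); the recurrence, the memory layout, and every name here are assumptions chosen for the example:

```python
# N independent recurrences x[t+1] = a * x[t] + b[t], one per process j.
# Each recurrence is serial in t, so t cannot be the vector index.
# The N processes are independent of each other, so j can be.

N = 4    # number of independent processes (plays the role of vector length)
T = 3    # time steps per process
a = 0.5

# One flat array, process-major: element (j, t) lives at index j*T + t.
# Walking over j at fixed t therefore touches memory with stride T, not 1.
b = [float(j * T + t) for j in range(N) for t in range(T)]

def step_all(x, t):
    """Advance every process by one step: the 'vector operation' over j."""
    b_t = b[t::T]                      # strided access: b[(j, t)] for all j
    return [a * xj + bj for xj, bj in zip(x, b_t)]

def run_vector_order():
    x = [0.0] * N
    for t in range(T):                 # serial outer loop over time
        x = step_all(x, t)             # inner loop over j has no dependencies
    return x

def run_scalar_order():
    """Reference: finish each process in turn, as a scalar code would."""
    out = []
    for j in range(N):
        xj = 0.0
        for t in range(T):
            xj = a * xj + b[j * T + t]
        out.append(xj)
    return out

print(run_vector_order() == run_scalar_order())
```

Both orderings do the same arithmetic per process, so the results match. The point is that the second ordering has a dependency in its inner loop and the first does not, provided the hardware can fetch `b[t::T]` efficiently.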
But, you might still say, who cares? So what if Japan controls the
market for supercomputers? What's ultimately at stake here is the
highest-stakes human enterprise yet to be undertaken, which is
molecular biology. We're a *long* way from being able to do molecular
biology with confidence on computers, but the day will come, and if
the US wants to be a player, it has to be a player in the
supercomputer business. Designing atom bombs and space shuttles, by
comparison, is kid's stuff, and the "earth simulator" part of it all
is just an excuse for Japan to provide public financing for a major
move in the direction of technology leadership.
2. Once you've endured the stunning up-front costs, then *you* can get
into the evolutionary design business. The NEC processor is
delivering its 8GFlops/sec/chip lumbering along at 500MHz. If you
only got it up to Madison speeds, 1.5GHz, it would be delivering
24GFlops/sec/chip.
The NEC Vector processor has the potential to more or less blow the US
off the map permanently as far as the supercomputer business is
concerned, and that's why the U.S., which hitherto had been happy
enough to let PC users finance processor development, suddenly got
back into the business of buying specialized high-performance
processors, including a new vector processor from Cray.
_________
The processor is only one piece of the puzzle and maybe not even the
most important piece. The high-speed Black Widow interconnect fabric
being developed for Red Storm is the real magic in that machine, not
the Opteron. Once the fabric is developed, any processor with a
low-latency interconnect to the processor core can become a player.
Since the earth simulator is so geographically spread out, it must
involve some significant interconnect engineering, to put it mildly,
and that has to have gotten the attention of Washington, which had
previously let the Cray T3E team just evaporate, as well.
Once Washington was roused from its permanent pre-retirement afternoon
nap, it also probably realized that supercomputers weren't the only
business the US was about to be blown out of the water in, and that we
were becoming an also-ran in the high-speed interconnect business as
well, all the hooha about HyperTransport and InfiniBand
notwithstanding. That is to say, Washington couldn't rely on PC users
to pay for state-of-the-art high-speed interconnect development,
either.
All told, PC users have been paying for a lot of R&D.
RM |
|
Felger Carbon Guest
|
Posted: Thu Aug 21, 2003 2:39 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
"Robert Myers" <rmyers@rustuck.com> wrote in message
news:q3g8kvklo9f3oh9ff3p2v0ec9kclp8e2v2@4ax.com...
| Quote: |
snip
Since the earth simulator is so geographically spread out, it must
involve some significant interconnect engineering
|
Uh, Robert, you went through those Dongarra slides too quickly. The
geographically spread out system illustrated on slide 21 is for a
proposed Japanese national grid system, not the Earth Simulator
Computer (ESC).
Slide 2 shows the ESC fully contained in a single dedicated building,
and the final comment on that slide lists *centralized* as one of the
outstanding merits of the ESC.
BTW: Slide 3 shows the very rapid development of the NEC vector
computer line:
1995  SX-4  2GFlops  148 LSI chips
1998  SX-5  8GFlops   32 LSI chips
2002  SX-6  8GFlops    1 chip
(The ESC is a 640-node SX-6)
I don't think the U.S. will overtake Japan's vector processor
developments easily. For one thing, Japan is willing to spend $400
million on one vector processor. The U.S. is not. ;-( |
|
chrisv Guest
|
Posted: Thu Aug 21, 2003 6:17 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
On Thu, 21 Aug 2003 09:39:59 GMT, "Felger Carbon" <fmrfne@jps.net>
wrote:
| Quote: | "Robert Myers" <rmyers@rustuck.com> wrote in message
news:q3g8kvklo9f3oh9ff3p2v0ec9kclp8e2v2@4ax.com...
snip
Since the earth simulator is so geographically spread out, it must
involve some significant interconnect engineering
Uh, Robert, you went through those Dongarra slides too quickly. The
geographically spread out system illustrated on slide 21 is for a
proposed Japanese national grid system, not the Earth Simulator
Computer (ESC).
Slide 2 shows the ESC fully contained in a single dedicated building,
and the final comment on that slide lists *centralized* as one of the
outstanding merits of the ESC.
BTW: Slide 3 shows the very rapid development of the NEC vector
computer line:
1995  SX-4  2GFlops  148 LSI chips
1998  SX-5  8GFlops   32 LSI chips
2002  SX-6  8GFlops    1 chip
(The ESC is a 640-node SX-6)
I don't think the U.S. will overtake Japan's vector processor
developments easily. For one thing, Japan is willing to spend $400
million on one vector processor. The U.S. is not. ;-(
|
Aren't supercomputers obsolete now anyways, with clustering and all?
I'm sure there's a few applications where it's better to have one big
machine, but at what cost? |
|
Guest
|
Posted: Thu Aug 21, 2003 7:05 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
chrisv <chrisv@nospam.invalid> wrote:
| Quote: | Aren't supercomputers obsolete now anyways, with clustering and all?
I'm sure there's a few applications where it's better to have one big
machine, but at what cost?
|
It's often far easier to code for the shared-memory supercomputers than
for the distributed-memory clusters, and in many cases it is necessary to
optimize the human side of things too, and not just the computers.
Also, the memory bandwidth of the typical supercomputer is generations
ahead of most clusters, and most of my problems are memory-bound, not
cpu-bound.
--
Bjørn-Ove Heimsund |
|
Robert Myers Guest
|
Posted: Thu Aug 21, 2003 8:27 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
On Thu, 21 Aug 2003 09:39:59 GMT, "Felger Carbon" <fmrfne@jps.net>
wrote:
| Quote: | "Robert Myers" <rmyers@rustuck.com> wrote in message
news:q3g8kvklo9f3oh9ff3p2v0ec9kclp8e2v2@4ax.com...
snip
Since the earth simulator is so geographically spread out, it must
involve some significant interconnect engineering
Uh, Robert, you went through those Dongarra slides too quickly. The
geographically spread out system illustrated on slide 21 is for a
proposed Japanese national grid system, not the Earth Simulator
Computer (ESC).
|
Well, actually, I didn't even know about the proposed Japanese
national grid system (that is to say, I looked into the briefing only
far enough to assure myself that it contained the information you were
requesting), otherwise I might have been careful not to leave the
impression that I was referring to it. The one dedicated building the
Earth Simulator occupies is, what, the size of a couple of
basketball courts? At 500MHz, one CPU clock=2 ns=60 cm. Takes a lot
of dribbling to get from one end of the court to the other.
Wiring the backplane of a Cray involves carefully measured wire
lengths. A basketball court sized computer must involve either an
awful lot of measured wire or some very careful measurements and a lot
of tuning.
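The 2 ns = 60 cm figure is the vacuum speed of light times the clock period. A small Python sketch of that calculation follows; the 28 m court length and the 0.66 velocity factor for cable are assumptions picked for illustration, not measurements of the ES building or its wiring:

```python
# How far a signal can travel in one clock period.
C_VACUUM_M_PER_S = 299_792_458.0

def distance_per_clock_cm(clock_hz, velocity_factor=1.0):
    """Distance covered in one clock period, in centimetres."""
    period_s = 1.0 / clock_hz
    return C_VACUUM_M_PER_S * velocity_factor * period_s * 100.0

def clocks_to_cross(length_m, clock_hz, velocity_factor=1.0):
    """Clock periods needed for a signal to cover length_m, one way."""
    return length_m * 100.0 / distance_per_clock_cm(clock_hz, velocity_factor)

COURT_LENGTH_M = 28.0       # assumed length of one basketball court
CABLE_FACTOR = 0.66         # assumed signal speed in cable, as a fraction of c

print(f"{distance_per_clock_cm(500e6):.1f} cm per clock at 500MHz")
print(f"{clocks_to_cross(COURT_LENGTH_M, 500e6):.0f} clocks per court at c")
print(f"{clocks_to_cross(COURT_LENGTH_M, 500e6, CABLE_FACTOR):.0f} clocks per court in cable")
```

Even at the vacuum speed of light, one court length is tens of clock periods one way, before any switching or protocol delay is counted.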
RM |
|
Felger Carbon Guest
|
Posted: Fri Aug 22, 2003 1:22 am Post subject: Re: More Japanese Vector Supercomputer |
|
|
"Robert Myers" <rmyers@rustuck.com> wrote in message
news:pko9kvkacilddrldpi0v4t3236ik7e16pl@4ax.com...
| Quote: |
The one dedicated building the
Earth Simulator occupies is, what, the size of a couple of
basketball courts? At 500MHz, one CPU clock=2 ns=60 cm. Takes a lot
of dribbling to get from one end of the court to the other.
Wiring the backplane of a Cray involves carefully measured wire
lengths. A basketball court sized computer must involve either an
awful lot of measured wire or some very careful measurements and
a lot
of tuning.
|
Today, all supercomputers, even the Earth Simulator (ES) vector
machine, involve forms of clustering. The fundamental unit of a
P4-based cluster is one P4. Of an Opteron-based cluster, 4 or 8
Opterons. The fundamental unit of the ES is the node, which has a
vector length of 64. Two nodes can fit into one cabinet in the ES
building. So the measured wire problem is contained in half a
cabinet, not in the entire ES building.
The ES is a cluster of nodes. Intercommunication among the cluster
fundamental units is a problem of all clusters, not just the ES.
Again, all supercomputers these days are clusters. |
|
Felger Carbon Guest
|
Posted: Fri Aug 22, 2003 1:22 am Post subject: Re: More Japanese Vector Supercomputer |
|
|
"chrisv" <chrisv@nospam.invalid> wrote in message
news:ehh9kvkis9i7bfoakatctfdmctntmj8i2q@4ax.com...
| Quote: |
Aren't supercomputers obsolete now anyways, with clustering and all?
I'm sure there's a few applications where it's better to have one
big
machine, but at what cost?
|
You are correct, there are some applications that really need a
low-latency vector machine. At what cost? What are you willing to
pay? The Japanese government was willing to pay $400 million to get
the Earth Simulator vector machine. This machine was actually
delivered over a year ago, and has been busily crunching numbers ever
since.
The U.S. government has other priorities. It is willing to _study_
vector machines, but is not willing to actually buy a full-scale
vector machine in the ES class. |
|
Robert Myers Guest
|
Posted: Fri Aug 22, 2003 4:45 am Post subject: Re: More Japanese Vector Supercomputer |
|
|
On Thu, 21 Aug 2003 20:22:32 GMT, "Felger Carbon" <fmrfne@jps.net>
wrote:
<snip>
| Quote: | Today, all supercomputers, even the Earth Simulator (ES) vector
machine, involve forms of clustering. The fundamental unit of a
P4-based cluster is one P4. Of an Opteron-based cluster, 4 or 8
Opterons. The fundamental unit of the ES is the node, which has a
vector length of 64. Two nodes can fit into one cabinet in the ES
building. So the measured wire problem is contained in half a
cabinet, not in the entire ES building.
|
I don't know how you get the settling time for the entire cluster down
to a reasonable number without considering physical path delays over
the entire building. It was the first question on my mind when I
heard about the earth simulator. Letting the entire thing run
asynchronously without considering physical location and path delays
between clusters would give decent performance only for problems that
didn't require much global communication. Unless you want the time
step to be set by the speed of sound, fluid mechanical calculations
require global communication at every time step.
RM |
|
Robert Myers Guest
|
Posted: Fri Aug 22, 2003 6:04 am Post subject: Re: More Japanese Vector Supercomputer |
|
|
On Thu, 21 Aug 2003 20:22:32 GMT, "Felger Carbon" <fmrfne@jps.net>
wrote:
<snip>
| Quote: |
The U.S. government has other priorities. It is willing to _study_
vector machines, but is not willing to actually buy a full-scale
vector machine in the ES class.
|
That actually may not be the dumbest move the U.S. has ever made,
since vector processors may be yesterday's technology. I'd certainly
want to look hard at things like Cell and processor in memory before I
dumped even a measly 400 mill on a super duper new vector processor.
Since superfast processors have national security implications, it
would be reasonable to wonder if we know everything about what the
U.S. is funding in the advanced processor department. My guess would
be that we don't.
RM |
|
jack Guest
|
Posted: Fri Aug 22, 2003 1:13 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
Felger Carbon <fmrfne@jps.net> wrote:
: "chrisv" <chrisv@nospam.invalid> wrote in message
: news:ehh9kvkis9i7bfoakatctfdmctntmj8i2q@4ax.com...
::
:: Aren't supercomputers obsolete now anyways, with clustering and all?
:: I'm sure there's a few applications where it's better to have one big
:: machine, but at what cost?
:
: You are correct, there are some applications that really need a
: low-latency vector machine. At what cost? What are you willing to
: pay? The Japanese government was willing to pay $400 million to get
: the Earth Simulator vector machine. This machine was actually
: delivered over a year ago, and has been busily crunching numbers ever
: since.
:
: The U.S. government has other priorities. It is willing to _study_
: vector machines, but is not willing to actually buy a full-scale
: vector machine in the ES class.
Well, you have to admit. When our government is busy spending
$1,000,000,000 (that's billion) a week getting our sons and daughters
killed, it's a foregone conclusion that there will be NO money for
anything else. Agreed? :-(
J.
--
--------
The end to "Personal Computing" as we know it is just around the corner.
TCPA will take away ALL rights from you, the consumer. Learn more
about it here: http://www.againsttcpa.com/what-is-tcpa.html and
here: http://www.againsttcpa.com/tcpa-faq-en.html |
|
Piotr Sawuk Guest
|
Posted: Sat Aug 23, 2003 12:26 am Post subject: Re: More Japanese Vector Supercomputer |
|
|
In article <2llakv0fk3qp2b2de2of4okbvnogcmq1rq@4ax.com>,
Robert Myers <rmyers@rustuck.com> writes:
| Quote: | On Thu, 21 Aug 2003 20:22:32 GMT, "Felger Carbon" <fmrfne@jps.net
wrote:
snip
Today, all supercomputers, even the Earth Simulator (ES) vector
machine, involve forms of clustering. The fundamental unit of a
P4-based cluster is one P4. Of an Opteron-based cluster, 4 or 8
Opterons. The fundamental unit of the ES is the node, which has a
vector length of 64. Two nodes can fit into one cabinet in the ES
building. So the measured wire problem is contained in half a
cabinet, not in the entire ES building.
I don't know how you get the settling time for the entire cluster down
to a reasonable number without considering physical path delays over
the entire building. It was the first question on my mind when I
heard about the earth simulator. Letting the entire thing run
asynchronously without considering physical location and path delays
between clusters would give decent performance only for problems that
didn't require much global communication. Unless you want the time
step to be set by the speed of sound, fluid mechanical calculations
require global communication at every time step.
|
as far as I have learned in a MP-course the whole point of
vector-computers is to have a single command being applied
to many consecutive array-positions. not global communication,
but communication in the sense of "the third next array-pos
needs to send data" is required. then it also isn't too
difficult to predict the next steps of the execution-path.
I guess that's why they got abandoned: there aren't many
applications which would need vectors with hundreds of
dimensions (since 3 or 4 dimensions can already be handled
by a simple 64-bit-processor with some smart use of the
registers). for example if I would try to simulate the
earth (or similarly complex system) then I could imagine
that a lot of variables need a similar treatment whenever
something changes (like each atom needs to be moved in the
same direction when the object gets moved). Somehow I
suspect that the uses for vector-computers are foreign
for us simply because there aren't many such computers
around. At least I could think of some nice games I
would wish to play on a vector-computer...
did anybody actually look up what this japanese
vector-supercomputer has been used for?
--
Better send the eMails to netscape.net, as to
evade useless burthening of my provider's /dev/null...
before complaining because of my rudeness, read
http://www.unet.univie.ac.at/~a9702387/en/adl/liar-faq.txt
and killfile me...
P |
|
Robert Myers Guest
|
Posted: Sat Aug 23, 2003 10:06 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
On 22 Aug 2003 19:26:12 GMT, piotr5@unet.univie.ac.at (Piotr Sawuk)
wrote:
| Quote: | In article <2llakv0fk3qp2b2de2of4okbvnogcmq1rq@4ax.com>,
Robert Myers <rmyers@rustuck.com> writes:
snip
I don't know how you get the settling time for the entire cluster down
to a reasonable number without considering physical path delays over
the entire building. It was the first question on my mind when I
heard about the earth simulator. Letting the entire thing run
asynchronously without considering physical location and path delays
between clusters would give decent performance only for problems that
didn't require much global communication. Unless you want the time
step to be set by the speed of sound, fluid mechanical calculations
require global communication at every time step.
as far as I have learned in a MP-course the whole point of
vector-computers is to have a single command being applied
to many consecutive array-positions. not global communication,
but communication in the sense of "the third next array-pos
needs to send data" is required. then it also isn't too
difficult to predict the next steps of the execution-path.
I guess that's why they got abandoned: there aren't many
applications which would need vectors with hundreds of
dimensions (since 3 or 4 dimensions can already be handled
by a simple 64-bit-processor with some smart use of the
registers). for example if I would try to simulate the
earth (or similarly complex system) then I could imagine
that a lot of variables need a similar treatment whenever
something changes (like each atom needs to be moved in the
same direction when the object gets moved). Somehow I
suspect that the uses for vector-computers are foreign
for us simply because there aren't many such computers
around. At least I could think of some nice games I
would wish to play on a vector-computer...
|
You're conversing with a veteran Cray programmer who is still trying
to get used to the idea of cache and who cut his teeth on the notion
of chime or chain slot time. Want a look at what a real computer
looks like? Check out
http://ed-thelen.org/comp-hist/CRAY-1-HardRefMan/CRAY-1-HRM.html
The issues of vector processing and global communication sometimes get
tangled up because of data locality problems, but in general the two
issues are related only weakly.
It's hard to know from your description whether you are referring to
data dependency or a branch embedded in an inner loop as obstacles to
vectorization, but there are ways around both of those obstacles for
many cases of great interest. There is a general strategy for
vectorizing most multi-dimensional simulations of physical phenomena
that I tried to describe in an earlier post, and so the number of
problems to which vector methods are adaptable is quite large.
Vector processors and serious funding for the high-speed interconnect
fabrics were the victims of post-cold-war US DoD and DoE self-delusion
about something called "COTS", or "commercial off-the-shelf," not for
any reason having to do with the class of problems that vector
processing could be applied to.
The Cray 1 was capable of 133 megaflops, cost, as I recall, about $13
million, and required extensive plumbing. A P4 system capable of
delivering performance in the gigaflop range fits into an ATX tower,
can be had for about $1000, and requires no plumbing. Anybody who
could tell a bit from a byte knew this was coming by the time the
Berlin Wall came down, and the US DoD and the DoE decided that letting
PC users fund computer R&D was not such a bad deal.
The NSF had a supercomputer effort going through the nineties, and it
got a lot of press, but the NSF doesn't have the funding clout of the
DoD and the DoE, and things got so bad that Cray could not survive
independently as a manufacturer of computers.
The Earth Simulator changed all that. Cray is back in business and
building vector processors again.
That does not mean that the COTS problem has gone away. This very
month IBM had to tell the US government, once again, that it was not
interested in funding R&D for computers that could not be
commercialized, and that, if the US wanted cutting edge supercomputers
for special purposes, it would have to come up with the money.
| Quote: | did anybody actually look up what this japanese
vector-supercomputer has been used for?
|
The earth simulator is billed as being used for earth sciences.
Atmospheric modeling, aka weather prediction, has long been one of the
most demanding applications for high-performance computing. Weather
prediction is useless if the model doesn't run faster than real time,
and that's a real challenge.
RM |
|
Piotr Sawuk Guest
|
Posted: Mon Aug 25, 2003 9:04 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
In article <lj4fkvck59ol4fjl115q44h1ueuqlnkqia@4ax.com>,
Robert Myers <rmyers@rustuck.com> writes:
| Quote: | On 22 Aug 2003 19:26:12 GMT, piotr5@unet.univie.ac.at (Piotr Sawuk)
wrote:
In article <2llakv0fk3qp2b2de2of4okbvnogcmq1rq@4ax.com>,
Robert Myers <rmyers@rustuck.com> writes:
snip
I don't know how you get the settling time for the entire cluster down
to a reasonable number without considering physical path delays over
the entire building. It was the first question on my mind when I
heard about the earth simulator. Letting the entire thing run
asynchronously without considering physical location and path delays
between clusters would give decent performance only for problems that
didn't require much global communication. Unless you want the time
step to be set by the speed of sound, fluid mechanical calculations
require global communication at every time step.
as far as I have learned in a MP-course the whole point of
vector-computers is to have a single command being applied
to many consecutive array-positions. not global communication,
but communication in the sense of "the third next array-pos
needs to send data" is required. then it also isn't too
difficult to predict the next steps of the execution-path.
I guess that's why they got abandoned: there aren't many
applications which would need vectors with hundreds of
dimensions (since 3 or 4 dimensions can already be handled
by a simple 64-bit-processor with some smart use of the
registers). for example if I would try to simulate the
earth (or similarly complex system) then I could imagine
that a lot of variables need a similar treatment whenever
something changes (like each atom needs to be moved in the
same direction when the object gets moved). Somehow I
suspect that the uses for vector-computers are foreign
for us simply because there aren't many such computers
around. At least I could think of some nice games I
would wish to play on a vector-computer...
You're conversing with a veteran Cray programmer who is still trying
to get used to the idea of cache and who cut his teeth on the notion
of chime or chain slot time. Want a look at what a real computer
|
that I didn't understand. how do you think a cache would
cause damage to the acceptance of vector-computers? and
what do the other 2 notions mean? I'm just a beginning
programmer (assembler) and from this POV I am merely
interested in chip-design and am quite ignorant on this
topic, while you seem to know a lot in this area...
yes, sorry, my error. I should have said that no MP-programming
is required for vector-supercomputers since their Instruction-set
does already contain commands which can easily get parallelized
when the data-locality is handled smartly enough. i.e. vector
commands could get spread onto multiple processors without the
programmer even noticing a difference. if you are doing conscious
MP-programming on such a computer then of course global communication
is an issue, but otherwise (when all the MP-stuff is handled by
the processor internally) the problem is well known from cpu-design
where multiple execution-units work in parallel on some pre-fetched
commands with branch-prediction and stuff. I'm just saying that
theoretically the whole multi-processor stuff could be hidden
in a supercomputer with vector-computer's instruction-set simply
because data is represented as vectors and not memory-positions...
| Quote: |
It's hard to know from your description whether you are referring to
data dependency or a branch embedded in an inner loop as obstacles to
vectorization, but there are ways around both of those obstacles for
many cases of great interest. There is a general strategy for
vectorizing most multi-dimensional simulations of physical phenomena
|
Of course, it's just that I was referring to the similarity
between a vector-computer's capabilities and general MP-strategies.
for example when I have 4 bytes and need to double each of
them, I load them into a single 32-bit register and
shift that, with some bit-masking afterwards for the
overflow. that's what we all do nowadays, we use the 32-bit
wide execution-unit as if it were 4 8-bit-processors...
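The byte-doubling trick described above can be written out as follows. This is a minimal Python sketch, with Python integers standing in for a 32-bit register; the helper names are invented for the example, and each byte wraps modulo 256 when it overflows:

```python
# Doubling four packed bytes with one shift and one mask, treating a 32-bit
# word as four 8-bit lanes.
MASK32 = 0xFFFFFFFF
LANE_LOW_BITS_CLEARED = 0xFEFEFEFE   # clears bit 0 of every byte lane

def pack4(b0, b1, b2, b3):
    """Pack four bytes into one 32-bit word, b0 in the lowest lane."""
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)

def unpack4(word):
    """Split a 32-bit word back into its four bytes, lowest lane first."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]

def double_bytes(word):
    """Double every byte lane at once (each lane wraps modulo 256)."""
    shifted = (word << 1) & MASK32   # top lane's carry falls off the register
    # The top bit of each lower lane has spilled into bit 0 of the lane
    # above it; the mask throws those spilled bits away.
    return shifted & LANE_LOW_BITS_CLEARED

print(unpack4(double_bytes(pack4(1, 2, 100, 200))))
```

The masking step is what keeps the lanes independent: without it, an overflow in one byte would corrupt the low bit of its neighbour.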
| Quote: | that I tried to describe in an earlier post, and so the number of
problems to which vector methods are adaptable is quite large.
|
basically I was just repeating your argument that global
communication (actually the need for synchronization of
the current execution-process to match procedural execution
instead of asynchronous use of the processor-power currently
available) does destroy decent performance, but not just in
some super-computer, but in some well-designed vector-computer
as well. C just isn't the language in which vector-computers
should be programmed...
| Quote: |
Vector processors and serious funding for the high-speed interconnect
fabrics were the victims of post-cold-war US DoD and DoE self-delusion
about something called "COTS", or "commercial off-the-shelf," not for
any reason having to do with the class of problems that vector
processing could be applied to.
The Cray 1 was capable of 133 megaflops, cost, as I recall, about $13
million, and required extensive plumbing. A P4 system capable of
delivering performance in the gigaflop range fits into an ATX tower,
can be had for about $1000, and requires no plumbing. Anybody who
could tell a bit from a byte knew this was coming by the time the
Berlin Wall came down, and the US DoD and the DoE decided that letting
PC users fund computer R&D was not such a bad deal.
The NSF had a supercomputer effort going through the nineties, and it
got a lot of press, but the NSF doesn't have the funding clout of the
DoD and the DoE, and things got so bad that Cray could not survive
independently as a manufacturer of computers.
The Earth Simulator changed all that. Cray is back in business and
building vector processors again.
That does not mean that the COTS problem has gone away. This very
month IBM had to tell the US government, once again, that it was not
interested in funding R&D for computers that could not be
commercialized, and that, if the US wanted cutting edge supercomputers
for special purposes, it would have to come up with the money.
|
I understand this, you are certainly more experienced than me,
I just think that users paying for R&D of vector-computers could
have been possible too. in the mid-eighties it was quite clear
for many people that MP is the future and I'm still wondering
why no one did come up with vector-computers as a basis for that.
I always envisioned a computer where I plug in a processor, and
then another processor into that and so on, until I have a real
super computer, but somehow my dream didn't come true. not technical
obstacles did block this path, lack of research in vector-computers
did. Just my humble opinion...
| Quote: |
did anybody actually look up what this japanese
vector-supercomputer has been used for?
The earth simulator is billed as being used for earth sciences.
Atmospheric modeling, aka weather prediction, has long been one of the
most demanding applications for high-performance computing. Weather
prediction is useless if the model doesn't run faster than real time,
and that's a real challenge.
|
I guess Japan is merely interested in earthquakes and maybe
in hurricanes, they don't seem to have much agriculture... :-)
but seriously, earth-sciences is much bigger than mere
Atmospheric modeling, there are enough computers already
working on weather-prediction, prediction of ocean-behaviour,
consequences from global warming and volcanic activities
are much less researched areas of earth-sciences. therefore
I ask again: are you sure that weather is a major application
of this particular Vector Supercomputer (as opposed to
supercomputers in general)?
--
Better send the eMails to netscape.net, as to
evade useless burthening of my provider's /dev/null...
before complaining because of my rudeness, read
http://www.unet.univie.ac.at/~a9702387/en/adl/liar-faq.txt
and killfile me...
P
|
Robert Myers Guest
|
Posted: Mon Aug 25, 2003 11:59 pm Post subject: Re: More Japanese Vector Supercomputer |
|
|
On 25 Aug 2003 16:04:20 GMT, piotr5@unet.univie.ac.at (Piotr Sawuk)
wrote:
| Quote: | In article <lj4fkvck59ol4fjl115q44h1ueuqlnkqia@4ax.com>,
Robert Myers <rmyers@rustuck.com> writes:
|
<snip>
| Quote: | You're conversing with a veteran Cray programmer who is still trying
to get used to the idea of cache and who cut his teeth on the notion
of chime or chain slot time. Want a look at what a real computer
that I didn't understand. how do you think a cache would
cause damage to the acceptance of vector-computers?
|
It wouldn't necessarily, but the whole mentality of Cray-1 programming
was that the entire machine operated synchronously. If you set things
up correctly, the machine could chain a vector load from memory,
multiply, add, and store to memory, with one new result popping out
each clock cycle, with memory being physically addressed as 64-bit
words and not as bytes, and certainly not as cache lines, because
there was no cache.
For someone who got used to a machine like that, cache seems like a
very odd notion. For certain kinds of problems, cache can actually
slow things down. If your data are stored on non-unit stride in
memory, half of every 128 bit cache line load is useless if you're
doing 64-bit floating point, and you may not be able to keep data
around in cache long enough to offset the extra latency of loading
into cache and then into a register.
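A rough way to see the non-unit-stride penalty is to count how many of the bytes brought in by cache line fills are actually used. The sketch below is only an illustration: the name `useful_fraction`, the zero base address, and the assumption that every touched line is loaded in full exactly once are mine, not a description of any real machine.

```python
def useful_fraction(line_bytes, elem_bytes, stride_elems, n_elems=1024):
    """Fraction of loaded cache-line bytes that the loop actually uses.

    Assumes (hypothetically) an array starting at address 0, elements
    aligned to their own size, and each touched line fetched once.
    """
    touched_lines = set()
    for i in range(n_elems):
        addr = i * stride_elems * elem_bytes
        touched_lines.add(addr // line_bytes)
    bytes_used = n_elems * elem_bytes
    bytes_loaded = len(touched_lines) * line_bytes
    return bytes_used / bytes_loaded


# 128-bit (16-byte) lines, 64-bit floats
print(useful_fraction(16, 8, 1))  # unit stride
print(useful_fraction(16, 8, 2))  # every other element
```

With a 16-byte line and 8-byte floats, any stride of two or more elements gives 0.5, which is the "half of every line is useless" case described above. Wider lines make it worse: a 64-byte line at a stride of eight elements uses only one eighth of what it loads.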
I *still* don't always know for sure what Stream benchmarks mean on
microprocessors because they don't always tell you if they've done
something funny with the cache, like skipping over it. Stream tests
the ability of a microprocessor to do the kinds of streaming
calculation (fetch, multiply, add, store) that the Cray-1 would have
excelled at and that show up very frequently in engineering and
scientific work.
Stream-type calculations and vector machines naturally go together,
and cache is generally just an obstacle for Stream-type calculations.
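For readers who have not seen it, the fetch/multiply/add/store pattern is the shape of the Stream "triad" kernel, a[i] = b[i] + q*c[i]. A minimal sketch in Python follows. It only shows the access pattern and the flops-per-byte arithmetic, it is not a benchmark, and the helper names are made up for illustration.

```python
def triad(b, c, q):
    """Stream-style triad: one multiply and one add per element."""
    return [bi + q * ci for bi, ci in zip(b, c)]


def flops_per_byte(word_bytes=8):
    """Arithmetic intensity of the triad, per element.

    Assumption: two loads (b, c) and one store (a) per element, and no
    extra traffic from the cache (for example write-allocate fills).
    """
    flops = 2                      # one multiply, one add
    bytes_moved = 3 * word_bytes   # load b, load c, store a
    return flops / bytes_moved


print(triad([1.0, 2.0], [3.0, 4.0], 2.0))
print(flops_per_byte())
```

Two flops for every 24 bytes moved means the loop is limited by memory bandwidth long before it is limited by arithmetic, and no element is ever touched twice, so there is nothing for a cache to reuse.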
| Quote: | and
what do the other 2 notions mean?
|
From the Cray Hardware Manual cited in my previous post:
"V register reservations
The term "reservation" describes the register condition when a
register is in use and therefore not available for use as a result or
as an operand register for another operation. During execution of a
vector instruction, reservations are placed on the operand V registers
and on the result V register. These reservations are placed on the
registers themselves, not on individual elements of the V register."
"A reservation for a result register is lifted during "chain slot"
time. Chain slot time is the clock period that occurs at functional
unit time plus two clock periods. During this clock period, the result
is available for use as an operand in another vector operation. Chain
slot time has no effect on the reservation placed on operand V
registers. A V register may serve only one vector operation as the
source of one or both operands."
That means that, in doing a floating multiply-add, you could use the
result of a vector operation almost immediately by doing the multiply
and initiating the add during the chain slot time. If you missed the
chain slot time, you had to wait for the entire vector multiply to
complete (typically taking as many cycles as the length of the vector)
before you could initiate the add operation. Modern vector units have
chaining built right in, but it was a novelty on the Cray-1 and had to
be coded by hand at just the right time.
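To put rough numbers on what missing the chain slot costs, here is a deliberately crude cycle-count model of a vector multiply followed by a dependent vector add. The "+2" comes from the manual text quoted above (functional unit time plus two clock periods). The default unit times of 7 and 6 clock periods and the function names are assumptions for illustration, and the model ignores instruction issue and memory effects entirely.

```python
def chained_cycles(n, mul_time=7, add_time=6):
    """Add starts at the multiply's chain slot, so the two pipelines overlap."""
    mul_startup = mul_time + 2   # chain slot: functional unit time + 2
    add_startup = add_time + 2
    return mul_startup + add_startup + n


def unchained_cycles(n, mul_time=7, add_time=6):
    """Chain slot missed: the add waits for the whole multiply to drain."""
    mul_total = mul_time + 2 + n
    add_total = add_time + 2 + n
    return mul_total + add_total


n = 64  # one full V register
print(chained_cycles(n), unchained_cycles(n))
```

In this model the penalty for missing the slot is exactly one vector length, which for full 64-element vectors comes close to doubling the time for the multiply-add pair.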
"chime" is a jargon-shortened synonym for chain slot time.
| Quote: | I'm just a beginning
programmer (assembler) and from this POV I am merely
interested in chip-design and am quite ignorant on this
topic, while you seem to know a lot in this area...
|
I have to bring it up frequently to justify my relative ignorance about
modern microprocessors with cache. :-).
| Quote: | looks like? Check out
http://ed-thelen.org/comp-hist/CRAY-1-HardRefMan/CRAY-1-HRM.html
The issues of vector processing and global communication sometimes get
tangled up because of data locality problems, but in general the two
issues are related only weakly.
yes, sorry, my error. I should have said that no MP-programming
is required for vector-supercomputers since their Instruction-set
does already contain commands which can easily get parallelized
when the data-locality is handled smartly enough. i.e. vector
commands could get spread onto multiple processors without the
programmer even noticing a difference.
|
Well, yes and no. There are similarities and important differences
between machines with vector processing units and multi-processor
parallel machines. To quote
http://www.nersc.gov/aboutnersc/pubs/revolution.html
"...look at the industry's history from 1993 to 1996. Cray Research,
the historic leader in supercomputing technology, was unable to
survive financially as an independent company and was acquired by
Silicon Graphics. Two ambitious new companies that introduced new
technologies in the late 1980s and early 1990s -- Thinking Machines
and Kendall Square Research -- were commercial failures. And Intel
discontinued production of its Paragon supercomputer only a few years
after it was introduced."
"During the same time frame, scientists who had finished the laborious
task of writing scientific codes to run on vector parallel
supercomputers learned that those codes would have to be rewritten if
they were to run on the next-generation, highly parallel
architecture."
That's me! :-(. It also says
"Scientists who are not yet involved in high-performance computing are
understandably hesitant about committing their time and energy to such
an apparently unstable enterprise."
That could be you. ;-).
I can't find a good pedagogical explanation of the similarities and
differences between vector and multiprocessor computation (which are
many, and even a cursory discussion would be lengthy), but
http://parallel.hpc.unsw.edu.au/HPCAsia/papers/33.pdf
presents a recent head-to-head comparison. That paper will also give
you an idea of some of the issues involved and give you an idea of why
vector units haven't been pursued aggressively of late.
| Quote: | if you are doing conscious
MP-programming on such a computer then of course global communication
is an issue, but otherwise (when all the MP-stuff is handled by
the processor internally) the problem is well known from cpu-design
where multiple execution-units work in parallel on some pre-fetched
commands with branch-prediction and stuff. I'm just saying that
theoretically the whole multi-processor stuff could be hidden
in a supercomputer with vector-computer's instruction-set simply
because data is represented as vectors and not memory-positions...
|
If only life were so simple.
Changing the representation is only a matter of changing the
programming language. In fact, whole libraries full of all the vector
type operations you might want to do are available to hide from you
the ugly details of how the machine is actually doing its dirty work.
If we had good enough languages and/or compilers, then computational
scientists wouldn't have to go through the agonizing code re-writes
that some of us have been through. The amazing thing is that, with
all of the different tricks of modern microprocessors: cache, vector,
SMP, ILP, SIMD, pipelining, super-scalar, (with the notable exceptions
of OoO and SMT), the obstacles to fast computation always come down to
the same thing:
You don't know far enough ahead of time the exact path the code will
take as it threads its way through code with branches (control
indeterminacy) and exactly what data you will need (data
indeterminacy) to keep operational units busy and the pipelines from
getting stalled.
There are probably much more elegant statements around, but getting
around control and data indeterminacy is a big part of the agenda of
computer architecture and code optimization. OoO is such a big win
because it allows you to decide in what order to execute instructions
at the very last moment, without having resolved control and data
dependency issues ahead of time. SMT (hyperthreading) to some extent
accomplishes the same thing.
The Cray-1 in vector mode was designed for and excelled at problems
where data were accessed in one very predictable way (constant stride
in memory) and where control indeterminacy could be finessed often
enough to make it unimportant. As it happens, the Cray-1 also
excelled in scalar mode where practically nothing was known ahead of
time because its main memory was essentially one big cache.
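One common way to finesse a branch inside an inner loop is to compute the result for every element and then select with a mask, so the loop body is the same for all elements. The sketch below illustrates that general idea only. The function names and the example operation are hypothetical, and this is not a claim about how any particular Cray compiler handled it.

```python
def square_nonneg_branchy(xs):
    """Scalar style: a data-dependent branch on every element."""
    out = []
    for x in xs:
        if x >= 0:
            out.append(x * x)
        else:
            out.append(0.0)
    return out


def square_nonneg_masked(xs):
    """Vector style: three uniform passes, no branch in the arithmetic."""
    mask = [x >= 0 for x in xs]        # vector compare
    squares = [x * x for x in xs]      # computed for every element
    return [s if m else 0.0 for s, m in zip(squares, mask)]  # masked merge


data = [1.5, -2.0, 0.0, 3.0]
print(square_nonneg_branchy(data))
print(square_nonneg_masked(data))
```

The masked version does some wasted arithmetic on elements that get thrown away, which is the price paid for making the control flow known ahead of time.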
In most problems, there is much more exploitable parallelism than
programmers can manage to make obvious enough to currently available
compilers that the compiler can actually exploit the parallelism. We
need either smarter programmers or smarter compilers, or both...
Or we need some tools that will allow programmers to help the compiler
find exploitable parallelism without being so smart. That's a problem
I'm working on.
| Quote: |
It's hard to know from your description whether you are referring to
data dependency or a branch embedded in an inner loop as obstacles to
vectorization, but there are ways around both of those obstacles for
many cases of great interest. There is a general strategy for
vectorizing most multi-dimensional simulations of physical phenomena
Of course, it's just that I was referring to the similarity
between a vector-computer's capabilities and general MP-strategies.
for example when I have 4 bytes and need to double each of
them, then loading them into a single 32-bit register and
shifting that, with some bit-masking afterwards for the
overflow. that's what we all do nowadays, we use the 32-bit
wide execution-unit as if it were 4 8-bit-processors...
|
Your instincts are right in that if you know a lot about the problems
of any kind of parallelism, you have a big head start on understanding
the problems of any other kind of parallelism, because they are all
pretty much the same.
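The four-bytes-in-one-register trick from the quoted text can be written out in a few lines. This is a minimal sketch, assuming the four values are packed as unsigned bytes in a 32-bit word and that overflow should wrap modulo 256 within each byte. The function names are mine.

```python
def double_bytes_swar(word: int) -> int:
    """Double each of the four bytes packed in a 32-bit word (modulo 256)."""
    # One shift doubles all four lanes at once, but each lane's top bit
    # leaks into the bottom bit of the lane above it (and out of bit 31).
    # Masking with 0xFEFEFEFE clears those leaked bits and also truncates
    # the result to 32 bits.
    return (word << 1) & 0xFEFEFEFE


def double_bytes_reference(word: int) -> int:
    """Same result, computed one byte at a time for comparison."""
    lanes = word.to_bytes(4, "big")
    return int.from_bytes(bytes((b * 2) & 0xFF for b in lanes), "big")


w = 0x01028003
print(hex(double_bytes_swar(w)), hex(double_bytes_reference(w)))
```

The mask is the whole game here: without it the carry out of one byte corrupts its neighbour, which is a small-scale version of the data dependence problems that show up in every other kind of parallelism.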
<snip>
| Quote: | C just isn't the language in which vector-computers
should be programmed...
|
If you *really* understand the shortcomings of C as a language for
vector processing, you will rapidly come to the conclusion that it
usually isn't a very good language for modern microprocessors, because
it makes it very hard for a compiler to resolve control and data-flow
uncertainties.
| Quote: |
Vector processors and serious funding for the high-speed interconnect
fabrics were the victims of post-cold-war US DoD and DoE self-delusion
about something called "COTS", or "commercial off-the-shelf," not for
any reason having to do with the class of problems that vector
processing could be applied to.
snip
The Earth Simulator changed all that. Cray is back in business and
building vector processors again.
That does not mean that the COTS problem has gone away. This very
month IBM had to tell the US government, once again, that it was not
interested in funding R&D for computers that could not be
commercialized, and that, if the US wanted cutting edge supercomputers
for special purposes, it would have to come up with the money.
I understand this, you are certainly more experienced than me,
I just think that users paying for R&D of vector-computers could
have been possible too. in the mid-eighties it was quite clear
for many people that MP is the future and I'm still wondering
why no one did come up with vector-computers as a basis for that.
|
I think the mind-bending cost of the earth simulator should give you a
clue to that.
| Quote: | I always envisioned a computer where I plug in a processor, and
then another processor into that and so on, until I have a real
super computer, but somehow my dream didn't come true. not technical
obstacles did block this path, lack of research in vector-computers
did. Just my humble opinion...
|
Do you know about www.beowulf.org? If not, you should pay a visit.
You, too, can dream of owning a supercomputer, or at least your own
computer with all the nasty problems of a supercomputer. ;-).
| Quote: |
did anybody actually look up what this japanese
vector-supercomputer has been used for?
The earth simulator is billed as being used for earth sciences.
Atmospheric modeling, aka weather prediction, has long been one of the
most demanding applications for high-performance computing. Weather
prediction is useless if the model doesn't run faster than real time,
and that's a real challenge.
I guess Japan is merely interested in earthquakes and maybe
in hurricanes, they don't seem to have much agriculture... :-)
|
Up until rather recently, Japan would not allow imported rice, and I
think Japan experiences typhoons rather than hurricanes, but if you
consider weather prediction to be a stand-in for fluid mechanics and
earthquake science to be a stand-in for solid mechanics, you have just
accounted for a huge chunk of all the computational work that is done
for scientific or technical purposes. Both are areas that are
generally well-suited to vector processors.
| Quote: | but seriously, earth-sciences is much bigger than mere
Atmospheric modeling, there are enough computers already
working on weather-prediction, prediction of ocean-behaviour,
consequences from global warming and volcanic activities
are much less researched areas of earth-sciences. therefore
I ask again: are you sure that weather is a major application
of this particular Vector Supercomputer (as opposed to
supercomputers in general)?
|
The more serious questions (and the only ones to which I think a
crisp answer is possible) are: will Japan use the Earth Simulator for
routine weather prediction, and if so, how much of the computer's time
does that require? I suspect but do not know that the answer to the
first question is yes, and the answer to the second question is that
weather prediction has to use significantly less than 100% of a
computer's available time to be useful. Asking what is to be done
with the rest would be like asking what is the Hubble Space Telescope
used for.
RM