[Aces-support] Problems on ao nodes
aces-admin at techsquare.com
Tue Aug 28 14:03:26 EDT 2007
hello jmc-
yes, these two (2) nodes are alright now.
fwiw, a54-1727-033 was troubled earlier today.
[greg]
> Date: Tue, 28 Aug 2007 10:52:03 -0400
> From: Jean-Michel Campin <jmc at ocean.mit.edu>
> Reply-To: ACES-support at mitgcm.org
>
> Hi,
>
> I have 2 MPI jobs which did not run last night, between 1 am and 3:30 am;
> both were on the same compute nodes:
> a54-1727-033
> a54-1727-035
>
> and generated this error message:
> poll: protocol failure in circuit setup
> p0_30263: p4_error: Child process exited while making connection to remote process on a54-1727-035: 0
> p0_30263: (2.010777) net_send: could not write to fd=4, errno = 32
>
> Are these 2 nodes OK now for MPI jobs?
> Thanks,
> Jean-Michel