[Aces-support] cannot start new job on ao and geo
Richard Lu
lurr at MIT.EDU
Wed Aug 29 11:44:59 EDT 2007
Thanks for the effort.
Rongrong Lu
--------------------------------------------
Earth Resources Laboratory, MIT
77 Massachusetts Ave., Bldg.54-1815, Cambridge, MA 02139
Tel: 617-253-7835 (o) 617-230-6729 (m)
Email: lurr at mit.edu
Web: http://web.mit.edu/lurr
--------------------------------------------
aces-admin at techsquare.com wrote:
> hello lurr-
>
> this is not strange at all, if the
> "cluster environment" doesn't change.
>
> compute nodes are chosen in a fixed order.
> if you submit 2 jobs, quit them, and then
> submit 2 more jobs, you will get the
> same compute nodes as long as nothing
> else has changed in the "cluster environment" -
> e.g., other user jobs, machines down for repairs, etc.
>
> i am working to find the root cause of this
> breakage, so please bear with me...
>
> [greg]
>
>> Date: Tue, 28 Aug 2007 16:28:46 -0400
>> From: Richard Lu <lurr at mit.edu>
>> Reply-To: ACES-support at mitgcm.org
>>
>> It is strange: if I close all the jobs and then submit two job
>> requests, the first one starts normally, but the second one
>> has a problem. For example, I requested two jobs a moment ago; the
>> first job in the queue, 68906.geo, started, while the other job,
>> 68907.geo, had a problem. I hope this gives you more of a clue about
>> what's going on. Thanks.
>>
>> [lurr at geo:~]
>> $ nt1
>> qsub: waiting for job 68907.geo to start
>> qsub: job 68907.geo ready
>>
>>
>> qsub: job 68907.geo completed
>>
>>
>>
>>
>> aces-admin at techsquare.com wrote:
>>> hmm, and again, please ?
>>>
>>> [greg]
>>>
>>>> Date: Tue, 28 Aug 2007 10:30:33 -0400
>>>> From: Richard Lu <lurr at mit.edu>
>>>> Reply-To: ACES-support at mitgcm.org
>>>>
>>>> No, it still has a problem:
>>>>
>>>> [lurr at geo:~/scratch/s40/deimos]
>>>> $ nt1
>>>> qsub: waiting for job 68882.geo to start
>>>> qsub: job 68882.geo ready
>>>>
>>>>
>>>> qsub: job 68882.geo completed
>>>>
>>>>
>>>>
>>>> aces-admin at techsquare.com wrote:
>>>>> hello lurr-
>>>>>
>>>>> is this still happening for you ?
>>>>> i've just checked both geo and ao
>>>>> and they seem to be fine...
>>>>>
>>>>> actually, i just tweaked geo a bit.
>>>>> does that help for you ?
>>>>>
>>>>> [greg]
>>>>>
>>>>> ps. i killed your MATLAB on the head-node.
>>>>> please do not run computationally intensive
>>>>> code on the head nodes, etc.
>>>>>
>>>>>> Date: Tue, 28 Aug 2007 10:05:23 -0400
>>>>>> From: Richard Lu <lurr at mit.edu>
>>>>>> Reply-To: ACES-support at mitgcm.org
>>>>>>
>>>>>> Hi, there,
>>>>>>
>>>>>> I cannot start any new jobs on either ao or geo. When I submit a job
>>>>>> request, it says the job is ready, and then the job immediately gets
>>>>>> killed. Is anything wrong? Thanks.
>>>>>>
>>>>>> [lurr at ao:~]
>>>>>> $ qsub -I -q long -l nodes=1
>>>>>> qsub: waiting for job 86713.ao to start
>>>>>> qsub: job 86713.ao ready
>>>>>>
>>>>>>
>>>>>> qsub: job 86713.ao completed
>>>>>>
>>>>>>
>>>>>> [lurr at geo:~]
>>>>>> $ qsub -I -q long -l nodes=1
>>>>>> qsub: waiting for job 68879.geo to start
>>>>>> qsub: job 68879.geo ready
>>>>>>
>>>>>>
>>>>>> qsub: job 68879.geo completed
>>>>>>
>>>>>>
>>>>>> Rongrong Lu
>>>>>>
>>>>>> --------------------------------------------
>>>>>> Earth Resources Laboratory, MIT
>>>>>> 77 Massachusetts Ave., Bldg.54-1815, Cambridge, MA 02139
>>>>>> Tel: 617-253-7835 (o) 617-230-6729 (m)
>>>>>> Email: lurr at mit.edu
>>>>>> Web: http://web.mit.edu/lurr
>>>>>> --------------------------------------------
>>>>>> _______________________________________________
>>>>>> Aces-support mailing list
>>>>>> Aces-support at acesgrid.org
>>>>>> http://acesgrid.org/mailman/listinfo/aces-support
>>>>>>
>>>>
>>
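The fixed-order node selection greg describes in the thread can be illustrated with a small simulation. This is only a sketch under stated assumptions: the node names, the `pick_node` and `submit_jobs` helpers, and the idea of a single ordered node list are hypothetical, and the thread does not say how the scheduler on ao or geo is actually implemented. It shows why two jobs submitted to an otherwise unchanged cluster land on the same nodes every time, and how a change such as a machine going down shifts the assignment.

```python
from typing import List, Set

# Hypothetical node names; the real cluster's node list is not given in the thread.
ORDERED_NODES: List[str] = ["node01", "node02", "node03", "node04"]


def pick_node(ordered_nodes: List[str], busy: Set[str], down: Set[str]) -> str:
    """Return the first node, in fixed order, that is neither busy nor down."""
    for node in ordered_nodes:
        if node not in busy and node not in down:
            return node
    raise RuntimeError("no free compute node available")


def submit_jobs(ordered_nodes: List[str], down: Set[str], n_jobs: int) -> List[str]:
    """Simulate submitting n_jobs single-node jobs to an otherwise idle cluster."""
    busy: Set[str] = set()
    assigned: List[str] = []
    for _ in range(n_jobs):
        node = pick_node(ordered_nodes, busy, down)
        busy.add(node)  # the node stays occupied while its job runs
        assigned.append(node)
    return assigned


# Submit two jobs, quit them, submit two more: the assignment repeats.
first_round = submit_jobs(ORDERED_NODES, down=set(), n_jobs=2)
second_round = submit_jobs(ORDERED_NODES, down=set(), n_jobs=2)

# Once something changes (here, node02 is taken down), the second job moves on.
after_repair = submit_jobs(ORDERED_NODES, down={"node02"}, n_jobs=2)

print(first_round, second_round, after_repair)
```

Under this model, if the second node in the order is misbehaving, the second job would fail on every attempt until the environment changes. That would be consistent with the symptom reported above, though the thread does not confirm the root cause.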