Rocky Python Prime Array
Revision as of 05:03, 21 April 2023
Job Array
Job arrays allow you to run the same code many times, each time with a different task ID. The task ID can then be used to determine which subset of your data to process. This strategy breaks one large job into multiple smaller jobs, each of which finishes more quickly, and which can run concurrently.
In the example of discovering prime numbers, let's say we want to find all the primes in the first 1,000,000 numbers. We could write code that checks every number from 1 to 1,000,000 in a single job. With a job array, we can instead create 100 jobs that each search 10,000 numbers.
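The arithmetic behind the split can be sketched in a few lines of Python. This is only an illustration: tasks_needed and array_spec are hypothetical helpers, not part of Slurm. The one Slurm fact it relies on is that --array ranges include both endpoints, so N tasks numbered from 0 are written as 0-(N-1).

```python
def tasks_needed(total_numbers, chunksize):
    # Integer ceiling division: a final partial chunk still gets its own task
    return (total_numbers + chunksize - 1) // chunksize


def array_spec(num_tasks):
    # Slurm --array ranges include both endpoints, so N tasks numbered
    # from 0 are written as 0-(N-1)
    return f"0-{num_tasks - 1}"


n = tasks_needed(1_000_000, 10_000)
print(n)              # 100
print(array_spec(n))  # 0-99
```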
Batch File
There are three differences when turning this into a job array.
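First, we've added an #SBATCH --array parameter that defines the range of task IDs to produce, and therefore how many tasks run. In our example, the range is 0 to 100. Because Slurm array ranges include both endpoints, this creates 101 tasks; --array=0-99 would give exactly 100.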
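Second, for the log file pattern, we're using %A and %a instead of %j. These patterns are specific to job arrays: %A is replaced with the array's master job ID and %a with the task ID, so each task writes to its own log file. The full list of filename patterns is described in the sbatch documentation.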
Lastly, we pass the environment variable $SLURM_ARRAY_TASK_ID as a parameter to our code. We will need to read in this parameter and use it to determine what data to process. We know from our array definition that it will be a number from 0 to 100.
python-prime-array.run
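```bash
#!/bin/bash
#SBATCH --job-name=PYTHON_PRIME_ARRAY
#SBATCH --output=logs/python_prime_array_%A-%a.out
#SBATCH --array=0-100

# Slurm does not create the logs/ directory; it must exist before submitting
module load Python

# Each task receives its own index (0-100) as the script's first argument
python prime_array.py $SLURM_ARRAY_TASK_ID
```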
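Slurm also sets SLURM_ARRAY_TASK_ID in each array task's environment, so a Python script could read it directly instead of (or as well as) taking it as a command-line argument. A minimal sketch of a helper that accepts either; the name get_task_id and the fallback to 0 are assumptions for illustration, not part of the original script.

```python
def get_task_id(argv, environ, default=0):
    # Prefer an explicit command-line argument, which is how the batch file passes it
    if len(argv) > 1:
        return int(argv[1])
    # Otherwise fall back to the variable Slurm sets in each array task's environment
    if "SLURM_ARRAY_TASK_ID" in environ:
        return int(environ["SLURM_ARRAY_TASK_ID"])
    # Neither is present, e.g. when running the script by hand for testing
    return default


# In a real script this would be called as get_task_id(sys.argv, os.environ);
# explicit values are passed here so the example runs anywhere
print(get_task_id(["prime_array.py", "7"], {}))                        # 7
print(get_task_id(["prime_array.py"], {"SLURM_ARRAY_TASK_ID": "42"}))  # 42
```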
Python Code
In the Python code, we need to determine the MIN and MAX values to search. As long as we know our CHUNKSIZE, we can calculate those values from the task ID passed in as a parameter: MIN = task ID × CHUNKSIZE and MAX = MIN + CHUNKSIZE. This way, each execution of the code processes a different chunk of numbers.
prime_array.py
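```python
import sys

# How many numbers to check for prime from each job
CHUNKSIZE = 10000

# Task ID passed in from $SLURM_ARRAY_TASK_ID (defaults to 0 when run by hand)
ARRAYID = 0
if len(sys.argv) > 1:
    ARRAYID = int(sys.argv[1])

# This task checks MIN up to, but not including, MAX;
# the next task starts at MAX, so no number is checked twice
MIN = ARRAYID * CHUNKSIZE
MAX = MIN + CHUNKSIZE


def is_prime(num):
    if num <= 1:
        return False
    # Trial division: any divisor between 2 and num-1 means num is composite
    for i in range(2, num):
        if (num % i) == 0:
            return False
    return True


for i in range(MIN, MAX):
    if is_prime(i):
        print(i)
```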
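Before submitting a large array, it can be worth checking on a small scale that the chunks together reproduce a single sequential search. This sketch reuses the same trial-division is_prime test and simulates the array tasks in one process. The helpers primes_in and chunked_primes are hypothetical, and the sizes (1,000 numbers in chunks of 100) are chosen only to keep it fast.

```python
def is_prime(num):
    # Same trial-division test as prime_array.py
    if num <= 1:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True


def primes_in(lo, hi):
    # Primes in the half-open range [lo, hi)
    return [n for n in range(lo, hi) if is_prime(n)]


def chunked_primes(total, chunksize):
    # Concatenate what each array task would print, in task-ID order
    num_tasks = (total + chunksize - 1) // chunksize
    out = []
    for task_id in range(num_tasks):
        lo = task_id * chunksize
        out.extend(primes_in(lo, min(lo + chunksize, total)))
    return out


# 10 simulated tasks of 100 numbers must match one pass over 0..999
assert chunked_primes(1000, 100) == primes_in(0, 1000)
print(len(primes_in(0, 1000)))  # 168
```

If the chunk boundaries overlapped or left gaps, the assertion would catch duplicated or missing primes.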
Running Job
```
[test_user@rocky7 prime]$ pwd
/home/test_user/projects/python/prime-array/
```
```
[test_user@rocky7 prime]$ ls
logs  prime_array.py  python-prime-array.run
```
```
[test_user@rocky7 prime]$ sbatch python-prime-array.run
Submitted batch job 3877
```
Here we can see the array job queued with all of its tasks (indices 0 through 100). They are pending (PD) until resources become available.
```
[test_user@rocky7 prime]$ squeue
JOBID         PARTITION    NAME      USER      ST  TIME  NODES  NODELIST(REASON)
5667_[0-100]  compute_all  PYTHON_P  test_use  PD  0:00  1      (Resources)
```
Once the job is no longer listed in the queue, the tasks have finished and their output is in the logs directory:
```
[test_user@rocky7 prime]$ ls logs/
```
```
[test_user@rocky7 prime]$ cat
```
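Because each task writes its primes to its own file under logs/, the results have to be merged afterwards. A minimal sketch, assuming the python_prime_array_%A-%a.out naming from the batch file and one number per line; collect_primes is a hypothetical helper, and non-numeric lines (such as module messages) are skipped.

```python
import glob
import os


def collect_primes(log_dir="logs"):
    # Match every task's output file produced by the batch file's log pattern
    pattern = os.path.join(log_dir, "python_prime_array_*-*.out")
    primes = []
    for path in glob.glob(pattern):
        with open(path) as f:
            for line in f:
                line = line.strip()
                # Keep only lines that are plain non-negative integers
                if line.isdigit():
                    primes.append(int(line))
    # Tasks finish (and glob lists files) in no particular order, so sort the result
    return sorted(primes)
```

Run from the project directory, print(collect_primes("logs")) would print the combined, sorted list of primes from all tasks.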