If memory becomes a limiting factor, reduce the number of tasks per node. Each task then gets not only a larger share of the node's memory but also more memory bandwidth, so the code should run a bit faster.
If your application uses OpenMP threads, you can try using fewer tasks per node and setting --cpus-per-task in your batch script, so that the tasks are distributed evenly across the node. In some cases you might gain a speed-up, but this is strongly application dependent: please have a look at the section Running Jobs of the User Portal for more information.
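
As a rough illustration of the advice above, a hybrid MPI/OpenMP batch script might look like the sketch below. The node count, the task and thread counts (chosen for a hypothetical 24-core node) and the executable name ./my_app are placeholders, not values taken from the User Portal, so adapt them to your system and application.

```shell
#!/bin/bash -l
# Hypothetical layout: 2 nodes, 4 MPI tasks per node, 6 cores per task.
# 4 tasks x 6 cores fills an assumed 24-core node; adjust to your hardware.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=6

# Run one OpenMP thread per core reserved for each task.
# Falls back to 1 thread if the variable is not set by Slurm.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

# Pass --cpus-per-task explicitly, since some Slurm versions do not
# propagate it from the batch allocation to srun.
srun --cpus-per-task="${OMP_NUM_THREADS}" ./my_app
```

Lowering --ntasks-per-node while keeping --cpus-per-task at 1 addresses the memory case: fewer tasks share the same node memory. Raising --cpus-per-task instead keeps the cores busy with threads, which may or may not help depending on the application.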
Please remember that your budget is charged per node on Cray systems: check the Computing Budget documentation.