.. _chap:Efficiency:

==========
Efficiency
==========

This section presents results and discussion of efficiency evaluations
with :term:`FiPy`. Programming in :term:`Python` allows greater
efficiency when designing and implementing new code, but it has some
intrinsic inefficiencies during execution as compared with the C or
FORTRAN programming languages. These inefficiencies can be minimized by
translating frequently used sections of code into C.

:term:`FiPy` has been tested against an in-house phase field code,
written at NIST, to model grain growth and subsequent impingement. This
problem can be executed by running::

    $ examples/phase/impingement/mesh20x20.py \
    > --numberOfElements=10000 --numberOfSteps=1000

from the base :term:`FiPy` directory. The in-house code was written by
Ryo Kobayashi and is used to generate the results presented in
:cite:`WarrenPolycrystal`.

The raw CPU execution times for 10 time steps are presented in the
following table. The run times are in seconds and the memory usage is
in kibibytes (KiB). The Kobayashi code is given the heading of FORTRAN,
while :term:`FiPy` is run with and without inlining. The memory usage
is for :term:`FiPy` simulations with the :option:`--inline` flag. The
:option:`--no-cache` flag is on in all cases for the following table.

========== ============ ======================= ============= ============= =============
Elements   FiPy (s)     FiPy                    FORTRAN (s)   FiPy          FORTRAN
                        :option:`--inline` (s)                memory (KiB)  memory (KiB)
========== ============ ======================= ============= ============= =============
100        0.77         0.30                    0.0009        39316         772
400        0.87         0.37                    0.0031        39664         828
1600       1.4          0.65                    0.017         40656         1044
6400       3.7          2.0                     0.19          46124         1880
25600      19           10                      1.3           60840         5188
102400     79           43                      4.6           145820        18436
========== ============ ======================= ============= ============= =============

The plain :term:`Python` version of :term:`FiPy`, which uses ``Numeric``
for all array operations, is around 17 times slower than the FORTRAN
code. Using the :option:`--inline` flag, this penalty is reduced to
about 9 times slower. It is hoped that in future releases of
:term:`FiPy` the process of C inlining for ``Variable`` objects will be
automated. This may result in efficiency gains greater than we see for
this particular problem, since all of the ``Variable`` objects would
then be inlined. Recent analysis has shown that a ``Variable`` with
multiple operations could be up to 6 times faster at calculating its
value when inlined.

As presented in the above table, memory usage was also recorded for
each :term:`FiPy` simulation. From the table, once base memory usage is
subtracted, each cell requires approximately 1.4 KiB of memory.
Measuring the maximum memory spike is difficult when memory is
allocated dynamically, so these figures should be used only as a very
rough guide. The FORTRAN memory usage is exact, since memory is not
allocated dynamically.

Efficiency comparison between :option:`--no-cache` and :option:`--cache` flags
==============================================================================

The table below shows results of efficiency tests using the caching
flags. Examples with more variables involved in complex expressions
show the largest improvement in memory usage. The :option:`--no-cache`
option mainly prevents intermediate variables created by binary
operations from caching their values. This results in large memory
gains while not affecting run times substantially.
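The sketch below is a rough illustration of where such intermediate
variables come from; the mesh size and variable names are arbitrary,
and the ``.value`` accessor spelling may differ between :term:`FiPy`
versions::

    from fipy import CellVariable, Grid2D

    mesh = Grid2D(nx=100, ny=100)   # arbitrary illustrative mesh

    a = CellVariable(mesh=mesh, value=1.)
    b = CellVariable(mesh=mesh, value=2.)
    c = CellVariable(mesh=mesh, value=3.)

    # Each binary operation produces a lazily evaluated intermediate
    # variable: one for ``a * b`` and another for ``(a * b) + c``.
    # With ``--cache``, each intermediate keeps a full-sized array of
    # its last result; with ``--no-cache`` (the default), the values
    # are recomputed on demand instead of being stored.
    expr = a * b + c

    print(expr.value)   # evaluation happens here

With several such expressions over a large mesh, the cached
intermediate arrays account for much of the difference in memory per
cell seen in the table below.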
The table below is with :option:`--inline` switched on and with 102400
elements in each case. The :option:`--no-cache` flag is the default
option.

.. currentmodule:: examples

======================================================== ==================== ================= ====================== ==================
Example                                                  time per step        time per step     memory per cell        memory per cell
                                                         ``--no-cache`` (s)   ``--cache`` (s)   ``--no-cache`` (KiB)   ``--cache`` (KiB)
======================================================== ==================== ================= ====================== ==================
:mod:`examples.phase.impingement.mesh20x20`              4.3                  4.1               1.4                    2.3
:mod:`examples.phase.anisotropy`                         3.5                  3.2               1.1                    1.9
:mod:`examples.cahnHilliard.mesh2D`                      3.0                  2.5               1.1                    1.4
:mod:`examples.levelSet.electroChem.simpleTrenchSystem`  62                   62                2.0                    2.8
======================================================== ==================== ================= ====================== ==================

Efficiency discussion of Pysparse and Trilinos
==============================================

Trilinos provides multigrid capabilities that are beneficial for some
problems, but it has significant overhead compared to Pysparse. The
matrix-building step takes significantly longer in Trilinos, and the
solvers also have higher memory and performance overheads than the
equivalent Pysparse solvers. However, the multigrid preconditioning
capabilities of Trilinos can, in some cases, provide enough of a
speedup in the solution step to make up for the overhead costs. This
depends greatly on the specifics of the problem, but it is most likely
when the problem is large and when Pysparse cannot solve the problem
with an iterative solver and must use an LU solver, while Trilinos can
still succeed with an iterative method.
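A rough way to compare the two suites on a given problem is to time the
same solve under each solver package. The sketch below assumes a recent
:term:`FiPy` in which the suite is selected with the ``FIPY_SOLVERS``
environment variable (older versions used command-line flags instead);
the mesh size and boundary conditions are arbitrary::

    import os
    import time

    from fipy import CellVariable, DiffusionTerm, Grid2D

    # An arbitrary, moderately large steady-state diffusion problem;
    # large systems are where Trilinos multigrid preconditioning is
    # most likely to pay for its overhead.
    mesh = Grid2D(nx=320, ny=320)
    phi = CellVariable(mesh=mesh, value=0.)
    phi.constrain(1., where=mesh.facesLeft)
    phi.constrain(0., where=mesh.facesRight)

    start = time.time()
    DiffusionTerm(coeff=1.).solve(var=phi)
    print(os.environ.get("FIPY_SOLVERS", "default suite"),
          "solve time:", time.time() - start, "s")

Running the script once with ``FIPY_SOLVERS=pysparse`` and once with
``FIPY_SOLVERS=trilinos`` (both suites must be installed) gives a
direct measurement of whether the multigrid benefits outweigh the
overhead for that particular problem.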