Introduction

The majority of mobile apps need optimization, and in some cases optimization is vital for an app to succeed. 3D games usually need it, and so do photo and video manipulation apps. Depending on the required parameters and functions, we can select the best approach to image processing. The more optimization techniques we have at hand, the better: we can pick the best algorithms and compare them. So what are the options and techniques for image processing on iOS, and what level of performance does each of them deliver?

Outlining the research task

I decided to use an Oil Painting Filter for this research, as it demands a high performance level from the device. The filter has two parameters: Radius and Intensity Levels.

Reasons to choose this filter:

  1. There is no ready-made Oil Painting Filter written in the OpenGL Shading Language or the Core Image Kernel Language
  2. The filter requires a device with a high performance level
  3. The filter is not implemented in the Core Image framework

Selecting development technology

Let’s review image processing, specifically the Oil Painting Filter, implemented in different programming languages. On iOS we have multiple ways to develop it:

  1. Objective-C (CPU)
  2. C (CPU)
  3. Swift (CPU)
  4. Core Image (GPU)
  5. OpenGL C (GPU)
  6. Metal (GPU)
  7. etc.

As we can see from this list, some options use the GPU and some use the CPU. Since our task is image processing, GPU methods are more efficient. However, this means the developer needs solid experience with OpenGL or Vulkan, which is not always available. Developing for the GPU is also often much more time consuming than developing for the CPU. So we need to decide which matters more: development speed or app efficiency. I have selected these languages for the test:

  1. C Language (CPU)
  2. Swift 3.0 (CPU)
  3. OpenGL (GPU - OpenGL Shading Language)
  4. Core Image (GPU - Core Image Kernel Language)

Our test app will have five methods; let’s review each of them:

C Language (CPU) - in this example we implement the Oil Painting Filter in pure C. The code runs on the CPU. If your app is not time sensitive and image processing is not done in real time, this approach should work well: it is easy and quick to implement.

Swift 3.0 (CPU) - in this test I decided to compare Swift 3.0 against pure C performance. It’s obvious that the pure C Oil Painting Filter should work faster, but how much faster? Apple updates the Swift syntax fairly often, which is why using it in commercial projects is risky at the moment: you frequently need to update code after syntax changes, so it may be better to wait for a more stable Swift version.

OpenGL C (GPU) - the most efficient method available to an iOS developer. The downside is that the developer must know OpenGL and the OpenGL Shading Language well. This is a low-level approach that runs on the GPU.

OpenGL Swift (GPU) - the Oil Painting Filter driven from Swift, done to compare its speed with OpenGL C (GPU).

Core Image (GPU) - a high-level processing technology built by Apple on top of OpenGL. The advantage is that Core Image ships with a broad set of default filters that are easy to use; see the Core Image Filter Reference.

Testing

To measure code performance I wrote a dedicated class, DFPerformanceMeter, which is available on GitHub. The code example written with this class demonstrates the performance of the Oil Painting Filter and is written in Swift 3.0.

C Language (CPU) - since Swift cannot mix with C code directly, the C implementation is exposed through an Objective-C bridge. In this case the bridging header is “iOS Language Performance Example-Bridging-Header.h”.
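
The bridge itself amounts to a C header visible to Swift. A sketch of what the declaration might look like (this assumes `BYTE` is `unsigned char`; the actual header in the repository may differ):

```c
/* OilPaintingC.h -- C interface exposed to Swift.
   The signature mirrors the filterOilC implementation below. */
typedef unsigned char BYTE;

void filterOilC(const BYTE* pbyDataIn_i,
                const int nRadius_i,
                const int fIntensityLevels_i,
                const int nWidth_i,
                const int nHeight_i,
                BYTE* pbyDataOut_o);

/* The bridging header "iOS Language Performance Example-Bridging-Header.h"
   then simply imports it:

   #import "OilPaintingC.h"
*/
```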

Oil Painting Filter on C

            
                #define INTENSITY_SIZE 32
                #define INTENSITY_SIZE_1 (INTENSITY_SIZE - 1)

                void filterOilC(const BYTE* pbyDataIn_i,
                                const int nRadius_i,
                                const int fIntensityLevels_i,
                                const int nWidth_i,
                                const int nHeight_i,
                                BYTE* pbyDataOut_o )
                {
                    int nIntensityCount[INTENSITY_SIZE];
                    int nSumR[INTENSITY_SIZE];
                    int nSumG[INTENSITY_SIZE];
                    int nSumB[INTENSITY_SIZE];

                    memset( pbyDataOut_o, 255, nWidth_i * nHeight_i * 4 );

                    // Row stride in bytes (4 bytes per RGBA pixel), aligned to a 4-byte boundary
                    int nBytesInARow = ceil( nWidth_i * 4 / 4.0 ) * 4.0;

                    int h1 = nHeight_i - nRadius_i;
                    int w1 = nWidth_i - nRadius_i;
                    int offsetY = nRadius_i * nBytesInARow;
                    for( int nY = nRadius_i; nY < h1; nY++)
                    {
                        for( int nX = nRadius_i; nX < w1; nX++)
                        {
                            memset( nIntensityCount, 0, sizeof(nIntensityCount) );
                            memset( nSumR, 0, sizeof(nSumR) );
                            memset( nSumG, 0, sizeof(nSumG) );
                            memset( nSumB, 0, sizeof(nSumB) );

                            for( int nY_O = -nRadius_i; nY_O <= nRadius_i; nY_O++ )
                            {
                                for( int nX_O = -nRadius_i; nX_O <= nRadius_i; nX_O++ )
                                {
                                    int offset = ((nX+nX_O)<<2)  + ( nY + nY_O ) * nBytesInARow;
                                    int nR = pbyDataIn_i[offset];
                                    int nG = pbyDataIn_i[offset + 1];
                                    int nB = pbyDataIn_i[offset + 2];
                                    int nCurIntensity = (int)( ( ( nR + nG + nB ) / 3.0 ) * fIntensityLevels_i / 255.0 );
                                    if( nCurIntensity > INTENSITY_SIZE_1 ) {
                                        nCurIntensity = INTENSITY_SIZE_1;
                                    }
                                    int i = nCurIntensity;
                                    nIntensityCount[i]++;
                                    nSumR[i] += nR;
                                    nSumG[i] += nG;
                                    nSumB[i] += nB;
                                }
                            }

                            int nCurMax = 0;
                            int nMaxIndex = 0;
                            for( int nI = 0; nI < INTENSITY_SIZE; nI++ )
                            {
                                if( nIntensityCount[nI] > nCurMax )
                                {
                                    nCurMax = nIntensityCount[nI];
                                    nMaxIndex = nI;
                                }
                            }

                            int offset = (nX << 2) + offsetY;
                            pbyDataOut_o[offset ] = nSumR[nMaxIndex] / nCurMax;
                            pbyDataOut_o[offset + 1] = nSumG[nMaxIndex] / nCurMax;
                            pbyDataOut_o[offset + 2] = nSumB[nMaxIndex] / nCurMax;
                        }

                        offsetY += nBytesInARow;
                    }
                }
            
        

Swift 3.0 (CPU) - we can predict that the Swift example will work slower than C, but it needs to be tested. Since Apple positions Swift as the replacement for Objective-C, we need to know exactly how much slower Swift is compared to pure C.

Oil Painting Filter on Swift

            
                class func filterOil(uiImage : UIImage, IntensityLevel : Int, radius : Int) -> UIImage {

                let INTENSITY_SIZE = 32
                let INTENSITY_SIZE_1 = (INTENSITY_SIZE - 1)

                let nIntensityCount = UnsafeMutablePointer<Int>.allocate(capacity: INTENSITY_SIZE)
                let nSumR = UnsafeMutablePointer<Int>.allocate(capacity: INTENSITY_SIZE)
                let nSumG = UnsafeMutablePointer<Int>.allocate(capacity: INTENSITY_SIZE)
                let nSumB = UnsafeMutablePointer<Int>.allocate(capacity: INTENSITY_SIZE)

                var cgImage: CGImage = uiImage.cgImage!

                let width = cgImage.width
                let height = cgImage.height
                let bitsPerComponent = cgImage.bitsPerComponent
                let numberOfComponents = (Int)(cgImage.bytesPerRow / width)
                var bytesPerRow = cgImage.bytesPerRow
                let colorSpace = cgImage.colorSpace

                let data: CFData = cgImage.dataProvider!.data!
                let dataLength: size_t = CFDataGetLength(data)
                let bytes = UnsafeMutablePointer<UInt8>.allocate(capacity: dataLength)
                let outPutBytes = UnsafeMutablePointer<UInt8>.allocate(capacity: dataLength)
                memset(outPutBytes, 255, dataLength)
                CFDataGetBytes(data, CFRangeMake(0, dataLength), bytes)

                let sizeOf256Int = MemoryLayout<Int>.size * INTENSITY_SIZE
                let height_radius = height - radius
                let width_radius = width - radius

                bytesPerRow = Int(ceil( Double(bytesPerRow) / 4.0 ) * 4.0);

                var offsetY = radius * bytesPerRow

                for nY in radius..<height_radius {
                    for nX in radius..<width_radius {
                        memset(nIntensityCount, 0, sizeOf256Int);
                        memset(nSumR, 0, sizeOf256Int);
                        memset(nSumG, 0, sizeOf256Int);
                        memset(nSumB, 0, sizeOf256Int);

                        for nY_O in -radius...radius {
                            for nX_O in -radius...radius {
                                let offset = (nX + nX_O) * numberOfComponents + (nY + nY_O) * bytesPerRow;
                                let nR = Int(bytes[offset]);
                                let nG = Int(bytes[offset + 1]);
                                let nB = Int(bytes[offset + 2]);
                                var nCurIntensity = Int((Double(nR + nG + nB) / 3.0) * Double(IntensityLevel) / 255.0);
                                if( nCurIntensity > INTENSITY_SIZE_1 ){
                                    nCurIntensity = INTENSITY_SIZE_1
                                }
                                let i = nCurIntensity;
                                nIntensityCount[i] += 1;

                                nSumR[i] += nR;
                                nSumG[i] += nG;
                                nSumB[i] += nB;
                            }
                        }

                        var nCurMax : Int = Int(0);
                        var nMaxIndex : Int = Int(0);
                        for nI in 0..<INTENSITY_SIZE {
                            if( nIntensityCount[nI] > nCurMax )
                            {
                                nCurMax = nIntensityCount[nI];
                                nMaxIndex = nI;
                            }
                        }

                        let offset = nX * numberOfComponents + offsetY

                        outPutBytes[offset ] = UInt8(nSumR[nMaxIndex] / nCurMax);
                        outPutBytes[offset + 1] = UInt8(nSumG[nMaxIndex] / nCurMax);
                        outPutBytes[offset + 2] = UInt8(nSumB[nMaxIndex] / nCurMax);
                    }
                    offsetY += bytesPerRow
                }

                free(bytes)

                let targetContext: CGContext = CGContext(data: outPutBytes, width: width, height: height, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace!, bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue).rawValue)!
                cgImage = targetContext.makeImage()!
                let image: UIImage = UIImage(cgImage: cgImage)

                free(outPutBytes)
                free(nIntensityCount)
                free(nSumR)
                free(nSumG)
                free(nSumB)

                return image
            }
            
        

OpenGL C and OpenGL Swift (GPU) - as Apple states, OpenGL is at the core of all its image processing libraries, which makes sense: OpenGL is a low-level graphics programming API. For comparison, the implementation below is driven from two languages, C and Swift, while the test harness itself is written in Swift. Both variants share the same Shader Program.

Oil Painting Filter on OpenGL Shading Language

            
                //FragmentShader.fsh
                precision highp int;
                precision highp float;
                uniform highp sampler2D tex;
                uniform highp float fIntensityLevels;
                uniform highp int radius;
                uniform highp vec2 src_size;
                varying highp vec2 fragTexCoord;

                #define INTENSITY_SIZE 32
                #define INTENSITY_SIZE_1 31

                void main() {
                    highp int nIntensityCount[INTENSITY_SIZE];
                    highp vec3 nSum[INTENSITY_SIZE];

                    for (int i = 0; i < INTENSITY_SIZE; i ++) {
                        nIntensityCount[i] = 0;
                        nSum[i] = vec3(0.0,0.0,0.0);
                    }
                    int nY_O;
                    int nX_O;
                    highp vec3 c;

                    vec2 gg = vec2(1.0, 1.0) / src_size;

                    float rr = float(radius) / src_size.x;
                    if(fragTexCoord.x - rr < 0.0 || fragTexCoord.x + rr > 1.0 || fragTexCoord.y - rr < 0.0 || fragTexCoord.y + rr > 1.0)
                    {
                        gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
                        return;
                    }

                    int nCurIntensity = 0;
                    for(  nY_O = -radius; nY_O <= radius; nY_O++ )
                    {
                        for(  nX_O = -radius; nX_O <= radius; nX_O++ )
                            {
                                c = texture2D(tex, fragTexCoord + vec2(nX_O, nY_O) * gg).rgb;
                                nCurIntensity = int( (( c.r + c.g + c.b ) / 3.0)  * fIntensityLevels);
                                if( nCurIntensity > INTENSITY_SIZE_1 ) {
                                nCurIntensity = INTENSITY_SIZE_1;
                            }
                            nIntensityCount[nCurIntensity]++;
                            nSum[nCurIntensity] += c;
                        }
                    }
                    int nCurMax = 0;
                    int nMaxIndex = 0;
                    for( int nI = 0; nI < INTENSITY_SIZE; nI++ )
                    {
                        if( nIntensityCount[nI] > nCurMax )
                        {
                            nCurMax = nIntensityCount[nI];
                            nMaxIndex = nI;
                        }
                    }
                    gl_FragColor = vec4(nSum[nMaxIndex] / float(nCurMax), 1.0);
                }

                //VertexShader.vsh
                precision highp float;
                precision highp int;

                attribute highp vec4 a_position;
                attribute highp vec2 texCoord;

                varying highp vec2 fragTexCoord;

                void main() {
                    fragTexCoord = texCoord;
                    gl_Position = a_position * vec4(1.0, -1.0, 1.0, 1.0);
                }
            
        

Core Image (GPU) - (real-time processing) as Apple states, "You don’t need to know the details of OpenGL, OpenGL ES, or Metal to leverage the power of the GPU." The Core Image filter list has no Oil Painting Filter, but Core Image provides a tool for implementing custom image processing filters: the Core Image Kernel Language, essentially a reduced version of the OpenGL Shading Language.

Oil Painting Filter on Core Image Kernel Language

            
                //This Shader is located in OilPaintingCoreImage.swift
                static var INTENSITY_SIZE = "32"
                static var INTENSITY_SIZE_1 = "31"

                "kernel vec4 lumaVariableBlur(sampler image, float radius1, float fIntensityLevels) " +
                            "{ " +
                            "   int nIntensityCount["+INTENSITY_SIZE+"];" +
                            "   vec3 nSum["+INTENSITY_SIZE+"];" +
                            "   int i;" +
                            "   for (i = 0; i < "+INTENSITY_SIZE+"; i ++) {" +
                            "       nIntensityCount[i] = 0;" +
                            "       nSum[i] = vec3(0.0,0.0,0.0);" +
                            "   }" +
                            "vec2 size = samplerSize(image);" +
                            "vec2 scale = vec2(1.0, 1.0) / size;" +
                            "vec2 d = destCoord(); " +
                            "int radius = int(radius1);" +
                            "if(int(d.x) - radius < 0 || int(d.x) + radius >= int(size.x) || int(d.y) - radius < 0 || int(d.y) + radius >= int(size.y)){" +
                            "return vec4(0.0, 0.0, 0.0, 0.0);" +
                            "}" +
                            "int nY_O;" +
                            "int nX_O;" +
                            "vec3 c;" +
                            "int nCurIntensity = 0;" +
                            "for( nY_O = -radius; nY_O <= radius; nY_O++ ){" +
                            "for(  nX_O = -radius; nX_O <= radius; nX_O++ ){" +
                            "c = sample(image, samplerCoord(image) + (vec2(nX_O,nY_O) * scale)).rgb;" +
                            "nCurIntensity = int( (( c.r + c.g + c.b ) / 3.0 ) * fIntensityLevels);" +
                            "if( nCurIntensity > "+INTENSITY_SIZE_1+" ) {" +
                            "nCurIntensity = "+INTENSITY_SIZE_1+";" +
                            "}" +
                            "nIntensityCount[nCurIntensity]++;" +
                            "nSum[nCurIntensity] += c;" +
                            "}" +
                            "}" +
                            "int nCurMax = 0;" +
                            "int nMaxIndex = 0;" +
                            "for( int nI = 0; nI < "+INTENSITY_SIZE+"; nI++ ){" +
                            "if( nIntensityCount[nI] > nCurMax ){" +
                            "nCurMax = nIntensityCount[nI];" +
                            "nMaxIndex = nI;" +
                            "}" +
                            "}" +
                            "return vec4(nSum[nMaxIndex] / float(nCurMax), 1.0);" +
                        "} "
            
        

Test Results

The example test is available at GitHub. Test setup:

  1. Device - iPad Air 2
  2. OS - iOS 10.0.2
  3. Build configuration - release

For this particular test the outcome is fairly predictable: OpenGL works faster than Core Image, and Core Image works faster than C and Swift.

  iPad Air 2 - radius = 20, Intensity = 20

  OpenGL Swift    4.85 sec
  OpenGL C        4.9  sec
  Core Image      8.4  sec
  Swift 3.0      27.8  sec
  C              37.1  sec

It was interesting to discover that C's performance was lower than Swift's.

Used for test #2:

  1. Device - iPhone 5s
  2. OS - iOS 10.2
  3. Build configuration - release

The iPhone 5s is not capable of running the OpenGL and Core Image tests correctly with a radius parameter above 15; the final rendering never finished:

[Figure: unfinished oil painting render on the iPhone 5s]

That’s why the iPhone 5s test was created with a maximum of 15 for radius and 10 for Intensity.

  iPhone 5s - radius = 15, Intensity = 15

  Swift 3.0      28.79 sec
  Core Image     36.97 sec
  OpenGL Swift   38.74 sec
  OpenGL C       38.8  sec
  C              42.21 sec

The iPhone 5s produced rather strange results: OpenGL Swift and OpenGL C work slower than plain Swift. From this we can assume that working with arrays in the fragment shader program is challenging for the iPhone 5s, or that there are issues in iOS 10. In general, this test has shown that it’s better to use Core Image for developing image filters; if fast processing speed is required, OpenGL should be used.

Interesting Facts

  • The Apple docs for the Core Image Kernel Language state: "Core Image does not support the OpenGL Shading Language source code preprocessor. In addition, the following are not implemented:

    1. Data types: mat2, mat3, mat4, struct, arrays"

    Yet this test has shown that arrays are supported anyway.

  • INTENSITY_SIZE for the OpenGL shader could not be more than 204. Interestingly, Core Image has the same 204 limit, which suggests that the Core Image shader program is converted to an OpenGL shader program under the hood. I assume this is related to the amount of memory available to the Fragment Shader Program. To get stable results, I chose a value of 32 for INTENSITY_SIZE in all tests.

  • eaglLayer.drawableProperties[@“kEAGLDrawablePropertyRetainedBacking”] must be set to YES in the OpenGL C test so that the OpenGL buffer can be displayed, although in older versions of iOS this parameter used to be NO. The OpenGL Swift test does not require setting this parameter.

  • Swift started working faster than C in iOS 10.