Gaining hands-on experience with Spark: an exercise in pair programming

Authors: Jolien Vanaelst & Yoni Geerinck

Working in the exciting field of data analytics and IT means a process of lifelong learning. There is no such thing as ‘graduation’! Whether it comes down to learning a new algorithm or a new programming language, or finding your way around a new platform, we are constantly keeping our skill set up to date. There are, of course, several paths you can take to acquire a new skill: you could read a book on the subject, follow an online course, attend a workshop, and so on. At some point, however, you have to stop taking courses and dive into real-life problems.

This is exactly the path I followed when I wanted to improve my knowledge of Apache Spark. After taking an online course on the subject, I wanted to dive deeper into the code and gain some hands-on experience. Of course, you could work through some advanced online exercises or immediately start an independent project. At Ordina, however, we wanted to experiment with an alternative approach called pair programming. As the name suggests, pair programming comes down to programming in pairs: two people working on the same code at the same computer. While one programmer actually writes the code, the other mostly takes on a supervising role, reviewing the code as it is written and thinking conceptually about the next steps to take.

Although this approach is usually applied by two experienced programmers, we wanted to explore its benefits for someone who is quite new to the technology. We also expected it to add value for the colleague in the role of ‘trainer’: guiding a new colleague through an unfamiliar domain is not only a good test of one’s own understanding of the subject, but also shows how well one can communicate that knowledge to peers. Napoleon Games gave us the opportunity to test the pair programming approach in an ongoing project. Under Yoni’s guidance, I was able to strengthen my programming skills in Apache Spark as well as broaden my understanding of the Apache ecosystem. Below, we describe how both the ‘trainer’ and the ‘trainee’ experienced the pair programming project, along with its pros and cons.

The exercise was quite challenging at first for both parties involved. As a trainee, you may feel as if you have been thrown in at the deep end. With only a basic knowledge of the technology, both writing code and reasoning about the next steps go quite slowly. Having to think while someone is literally waiting for you to act can feel somewhat overwhelming. The trainer shares this challenge, since he or she has to accept slower code development at the start. The trainer needs to take the time not just to code, but also to explain to the trainee what he or she is doing and why. Patience is required, as the trainee often does not understand everything the first time it is explained.

We learned that explaining things in small steps is the best way forward. This also gave the trainer new insights: he realized that when he struggled to explain a specific part of the code or logic, it was often because he did not understand that part well enough himself. In addition, we found that the best way to test the trainee’s comprehension is to have him or her code simple, well-defined blocks. This makes it impossible for the trainee to hide difficulties and brings comprehension problems to light. Again, this requires patience, as the trainee’s first code snippets often take a long time to write. However, this way you quickly learn what the trainee is struggling with and which topics need to be explained again.

As the exercise progressed, the trainee became more and more fluent in programming. We noticed that pair programming softens the steep learning curve because it offers immediate feedback. The trainer helps out with less obvious errors and offers suggestions on more efficient ways of coding and on best practices in general. He can also start asking questions such as “Why did you program it this way?” or “Are there alternatives?”, and so on. By asking these questions, the trainer not only gains a better idea of how the trainee reasons, but also starts to learn from that reasoning. At this point, we really started to experience the benefits of pair programming. The trainee began to form a personal vision of how to develop the code and how to make it future-proof. The trainee also became a trainer: as we discussed how to proceed, we started to learn from each other. Because we were coding as a pair, we really had to explain our ideas to each other, which let us catch “logic errors” before we started coding.

Thus, although learning via pair programming can slow you down initially, in our experience the method soon results in a considerable time saving. As a trainee, your skills progress much more rapidly than if you had been practicing on your own. We also noticed a positive effect on project progress, as the number of both logic and coding errors dropped drastically.

In sum, our pair programming endeavor started off with a clear distinction between “trainer” and “trainee”. Over time, however, these roles blurred, and we ended up as pair programmers who learned from each other. The code we wrote was less buggy and more future-proof than the code we wrote when working alone. In the end, both trainer and trainee were trained, our workflow improved, and we learned to work as a pair. It was a challenging but awesome experience!