Spark lab master

#Spark lab master code

# If incorrect it will report back '1 test failed' for each failed test
# Make sure to rerun any cell you change before trying the test again
assertEquals(makePlural('rat'), 'rats', 'incorrect result: makePlural does not add an s')

# ** (1c) Apply `makePlural` to the base RDD **
# Now pass each item in the base RDD into a `map()` transformation that applies the `makePlural()` function to each element. Then call the `collect()` action to see the transformed RDD.
# TEST Apply makePlural to the base RDD (1c)

# ** (1d) Pass a `lambda` function to `map` **
# Let's create the same RDD using a `lambda` function.
# TEST Pass a lambda function to map (1d)
'incorrect values for pluralLambdaRDD (1d)')

# Now use `map()` and a `lambda` function to return the number of characters in each word. We'll `collect` this result directly into a variable.
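
# The three `map` steps described above (a named function, a `lambda`, and word lengths) can be tried without a cluster. What follows is only a minimal local sketch: it uses Python's built-in `map` on a plain list as a stand-in for the RDD, and the word list and variable names are assumptions, not the lab's data. On a real RDD the corresponding calls would be `rdd.map(...)` followed by `rdd.collect()`.

```python
# Hypothetical word list standing in for the contents of the base RDD.
words = ['cat', 'elephant', 'rat', 'rat', 'cat']


def makePlural(word):
    """Return `word` with an 's' appended (no real pluralization rules)."""
    return word + 's'


# Analog of (1c): apply a named function to every element.
# With Spark this would look like rdd.map(makePlural).collect().
plural_named = list(map(makePlural, words))

# Analog of (1d): the same transformation written as a lambda.
plural_lambda = list(map(lambda w: w + 's', words))

# Analog of the word-length step: map each word to its character count.
word_lengths = list(map(lambda w: len(w), words))

print(plural_named)
print(plural_lambda)
print(word_lengths)
```

# Both pluralized lists should be identical, which is the point of (1d): a `lambda` is just an inline way to pass the same one-line function.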

# If your implementation is correct, it will print `1 test passed`.

# This is the general form that exercises will take, except that no example solution will be provided. Exercises will include an explanation of what is expected, followed by code cells where one cell will have one or more `` sections. The cell that needs to be modified will have `# TODO: Replace with appropriate code` on its first line. Once the `` sections are updated and the code is run, the test cell can then be run to verify the correctness of your solution. The last code cell before the next markdown section will contain the tests.

# Note: `makePlural` is a simple function that only adds an 's'.

# Load in the testing code and check to see if your answer is correct
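
# As a rough illustration of the exercise-and-test pattern described above, the sketch below pairs one possible `makePlural` with a small checker. The `assertEquals` helper here is a hypothetical stand-in written for this example; it is not the lab's actual testing code, and only its printed messages are modeled on the ones quoted in the text.

```python
def makePlural(word):
    """Return `word` with an 's' appended (no real pluralization rules)."""
    return word + 's'


def assertEquals(actual, expected, message):
    """Hypothetical stand-in for the lab's test helper.

    Prints '1 test passed' when the values match, otherwise
    '1 test failed' followed by the supplied message.
    Returns True on success and False on failure.
    """
    if actual == expected:
        print('1 test passed')
        return True
    print('1 test failed. ' + message)
    return False


# The same check that appears in the lab text.
passed = assertEquals(makePlural('rat'), 'rats',
                      'incorrect result: makePlural does not add an s')
```

# If the function is changed, the check has to be run again to see the new result, which mirrors the advice to rerun any modified cell before retesting.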

# **Word Count Lab: Building a word count application**
# This lab will build on the techniques covered in the Spark tutorial to develop a simple word count application. The volume of unstructured text in existence is growing dramatically, and Spark is an excellent tool for analyzing this type of data. In this lab, we will write code that calculates the most common words in the () retrieved from (). This could also be scaled to find the most common words on the Internet.
# *Part 1:* Creating a base RDD and pair RDDs
# *Part 3:* Finding unique words and a mean value
# Note that, for reference, you can look up the details of the relevant methods in ()

# ** Part 1: Creating a base RDD and pair RDDs **
# In this part of the lab, we will explore creating a base RDD with `parallelize` and using pair RDDs to count words.

# We'll start by generating a base RDD by using a Python list and the `sc.parallelize` method. Then we'll print out the type of the base RDD.

# Let's use a `map()` transformation to add the letter 's' to each string in the base RDD we just created. We'll define a Python function that returns the word with an 's' at the end of the word. If you have trouble, the next cell has the solution. After you have defined `makePlural` you can run the third cell which contains a test.
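
# To make the idea of counting words with pairs more concrete, here is a minimal plain-Python sketch. It is not Spark code: the list of `(word, 1)` tuples stands in for a pair RDD, the loop stands in for a per-key sum (in PySpark, `reduceByKey` is one method that does this), and the word list is an assumption chosen for illustration.

```python
from collections import defaultdict

# Hypothetical word list; with Spark this would be sc.parallelize(words).
words = ['cat', 'elephant', 'rat', 'rat', 'cat']

# Pair each word with a count of 1, as rdd.map(lambda w: (w, 1)) would.
word_pairs = [(w, 1) for w in words]

# Sum the counts for each distinct word (a local analog of a per-key reduce).
counts = defaultdict(int)
for word, n in word_pairs:
    counts[word] += n

# Order by descending count, breaking ties alphabetically.
most_common = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(most_common)
```

# The same shape (pair, sum per key, sort) is what lets the approach scale from a five-word list to a large text.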