Math 300 - Homework Assignment #5 (back to Math 300 main page)

Suggetsed solutions could be found by these links: myrank.py, run_rank.py, myalgs.py, run_boole.py.

The assignment is due Thursday, July 27, 4:00pm.

Assignment is worth 110 points.

In this assignment you are supposed to implement exactly the same algorithms as you implemented in the previous assignment. There are some minor changes which are highlighted with blue.

Part 1 (45 points).

In this part you are required to create a Python function named page_rank(matr, alpha, e) and store it in a file named "myrank.py", and a Python script named "run_rank.py".

The "myrank.py" file should define the page_rank(matr, alpha, e) function as follows:

1. The function has 3 input parameters:
• matr - 2-dimensional array representing a marix descibing network of some web pages.
• alpha - damping factor.
• e - precision.
2. The function returns an array variable representing a column-vector - PageRank vector.
3. The function shoud implement PageRank algortithm used by Google search engine to rank web pages resulting from a particual search query. The input of the algorithm is an adjacency matrix (input parameter matr) of a directed graph describing some particual network of web pages (see also this link). The algorithm transforms given matrix to a so called Google Matrix using given damping factor (input parameter alpha). Then the algorithm finds the eigenvector corrseponding to the dominant eigenvalue (the ones with the largest magnutude) of the Google Matrix using power iteration method (recall the document you typest in the previous assinment). The computed eigenvector called PageRank vector and it represents the probability distribution of the likelihood that a person randomly clicking on links will arrive at any particular page. This vector is the output of the algorithm. There are dozens of online materials explaining this algorithm, e.g. this one or the original article.
4. The Google Matrix ($G$) should be computed from the adjacency matrix ($A$) as follows: $First, compute matrix B=(bi,j) from matrix A=(ai,j) by dividing every element of A by the sum of all elements of the column to which given elemnt belongs, i.e. bi,j=ai,j ∑ k = 1 N ak,j$ $Then G = α B + ( 1 - α ) 1 N C , where N is the size of the square matrices A,B and C ; C is simply a matrix of all ones, i.e. C = 1 1 ⋯ 1 ⋮ ⋮ ⋱ ⋮ 1 1 ⋯ 1$ Note that you will probably need "for" loop (to iterate over all columns of the matrix, but not rows!) together with numpy.sum function to compute $B$. More on numpy.sum function usage could be found here.
The $G$ matrix could be computed just with one line as soon as you got the $B$ matrix.
5. The power iteration method is extremely simple:
1. You start with the vector ${x}_{0}$ representing uniform distribution, i.e. all cordinates are $\frac{1}{N}$
2. Your first iteration should compute vector ${x}_{1}$ by multiplying ${x}_{0}$ by your Google Matrix $G$, i.e. ${x}_{1}=G{x}_{0}$
3. Every next iteration ($k$) will also multiply vector obtained from preceeding iteration ($k-1$) by your Google Matrix $G$, i.e. ${x}_{k}=G{x}_{k-1}$ .
4. The algorithm should stop when ${\parallel {x}_{k}-{x}_{k-1}\parallel }_{2}=\sqrt{{\left({x}_{k}^{1}-{x}_{k-1}^{1}\right)}^{2}+\cdots +{\left({x}_{k}^{n}-{x}_{k-1}^{n}\right)}^{2}}\le e$, where $e$ denotes precision specified by the input parameter e.
5. Then your PageRank vector is simply ${x}_{k}$.
Note that you probably will need to use "while" loop to implement this algorithm, and the variables for ${x}_{k}$ and ${x}_{k-1}$ (which are column-vectors) should be defined before starting the loop (you can define ${x}_{k-1}$ with all zeroes and ${x}_{k}$ as descibed in (a)). Also note that the expression used in (d) is called 2-norm or Euclidean norm of a vector (in your case vector which norm is computed is obtained by substracting vectos ${x}_{k}$ and ${x}_{k-1}$) and numpy.linalg.norm function may be useful here. More details on usage could be found here.
6. Note that your function should be able to process matrices of any size, i.e. you should not hardcode any particual size of matrix N inside your function. Instead it should be derived (and probably stored into some variable) inside your function by applying proper built-in function to your matrix matr.
7. Your function should check whether input matrix matr is really a square matrix. If not, an error with a proper message should be raised using raise statement.
8. On every iteration your function should print current value of ${x}_{k}$ and ${\parallel {x}_{k}-{x}_{k-1}\parallel }_{2}$ to the console. The floating point numbers should be displayed with exactly 7 digits after decimal points.

The "run_rank.py" script should do the following:

1. Load a square matrix from this file to an array (2-dimensional) variable M. The matrix descibes primitive network of 8 web pages using CSV format.
2. Compute PageRank vector for the network descibed by M by calling page_rank(matr, alpha, e) function (defined in "myrank.py" ) with damping factor 0.85 and precision 10-6. The output should be stored into an array variable named v and immediately printed to the console.
3. Plot computed PageRank vector v using matplotlib.pyplot.bar function. More details on usage could be found here. The axes should be labeled as "Pages" (x-axis) and "Rank" (y-axis).

The diagram produced by your script should look like this.

Part 2 (45 points)

In this part you are required to create a Python function named boole. The function should be stored in a separate file with named "myalgs.py".

This function should implement numerical integration algorithm using one of the Newtonâ€“Cotes formulas called Boole's rule:

1. The function has 3 input parameters:
• fun - function to be integrated.
• a - lower limit of integration.
• b - upper limit of integration.
• n - number of partitions.
2. The function returns a floating point number - value of the defintie integral.
3. Here is how Boole's rule work. Consider the integral $\int_a^b f(x)\,dx$ where $$f$$ (fun input parameter) is twice continuously differentiable. Let $$a=x_0\lt x_1\lt\dots\lt x_n=b$$ be an equally-spaced partition of $$[a,b]$$ (a and b input parameters), where we assume that $$n$$ (n input parameter) is evenly divided by 4. When $$h=(b-a)/n$$ is the uniform width of the subintervals, then accroing to Boole's rule your integral can be estimated as $\int_a^b f(x)\,dx \sim \frac{2h}{45}\sum_{i=1}^{n/4} \left( 7f(x_{4i-4})+32f(x_{4i-3})+12f(x_{4i-2})+32f(x_{4i-1})+7f(x_{4i}) \right).$
4. Your function should plot two things: original function on the given interval (using grid with n*10 points or more) as a blue solid line without any point markers; and values of your function at endpoints of all subintervals (i.e. $$f(x_0), f(x_1), \dots, f(x_n)$$) as a dashed red line using "x" points marker. For this purpose you should probably store both endpoints of your subintervals and values of your function at this points while doing computations, and then use these two sets to create the plot. The axes should be labeled properly, e.g. "x" and "f(x)".
5. Your function should check whether values provided for a and b satifsfy a < b. If not, an error with a proper message should be raised using raise statement.
6. Your function should check whether provided number of subintervals n is divisible by 4. If not, an error with a proper message should be raised using raise statement.
7. Your function should not print any output messages to the console.

Part 3 (20 points)

In this part you are required to create a Python script named "run_boole.py".

The script shoud test the boole function created in Part 2 and do the following:

1. Define 3 anonymous functions lambda expressions:
1. f1 = $$|\sin(x) | + |\cos(x)| + x$$ on [-π , π]
2. f2 = $$\frac{1}{x}$$ on [0.1, 5.1]
3. f3 = $$x^3 - 2 x^2 - 5x + 1$$ on [-3, 4]
2. For each of the above functions do the following:
1. Compute the value of the integral on the given interval using scipy.integrate.quad function. More details on usage could be found here.You will use this value as a reference value to compute error.
2. Compute the value of the integral on the given intervals using your boole function for five different partitions: n=4^1=4, n=4^2=16, n=4^3=64, n=4^4=256 and n=4^5=1024. For every given value of n print the value of n and the estimated value of the integral to the console. The floating point numbers should be displayed with exactly 10 digits after decimal points. All the computed values should be saved for the next step. After computing integral for each of the given partitions, pause the script execution for 1 seconds. For this purpose you may use pause(1) function. More details on usage could be found here.
3. Plot the following:
• x-axis: powers of 4 that used to compute number of subintervals for every step in (b), i.e. n=1, n=2, n=3, n=4 and n=5;
• y-axis: computation error, i.e. absolute value of the difference between the value computed with scipy.integrate.quad function (step (a)) and the value computed with your boole function (step (b)) using corresponding partitioning.
The axes should be properly labeled, e.g. "Number of subintervals" and "Error". The title of your plot should be the name of the function and corresponding interval, e.g. "f3 on [-3, 4]"
4. The script execution should be paused and wait for user to press Enter key. For this purpose you may use raw_input() function.