Data Science

 Classification: Basic Concepts

Decision Tree Induction

Bayes Classification Methods

Rule-Based Classification   

Assignment 4

1- Consider the data in the following table: 

  

TID  Home Owner  Marital Status  Annual Income    Defaulted Borrower
1    Yes         Single          [120 – < 150K]   No
2    No          Married         [90 – < 120K]    No
3    No          Single          [60 – < 90K]     No
4    Yes         Married         [120 – < 150K]   No
5    No          Divorced        [90 – < 120K]    Yes
6    No          Married         [60 – < 90K]     No
7    Yes         Divorced        [120 – < 150K]   No
8    No          Single          [90 – < 120K]    Yes
9    No          Married         [60 – < 90K]     No
10   No          Single          [90 – < 120K]    Yes

Let Defaulted Borrower be the class label attribute. 

a) Given the data tuple X = (Home Owner = No, Marital Status = Married, Income = $120K), what would a naive Bayesian classification of Defaulted Borrower be for this tuple?
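One way to check a hand calculation for part (a) is to count the class priors and per-attribute conditional frequencies directly from the table. The sketch below is illustrative only and rests on two assumptions that the assignment does not state: the income of $120K is placed in the [120 – < 150K] bin, and no Laplace smoothing is applied. The names ROWS, QUERY, and naive_bayes_scores are hypothetical.

```python
from collections import Counter

# Rows transcribed from the table above:
# (home_owner, marital_status, income_bin, defaulted_borrower)
ROWS = [
    ("Yes", "Single", "120-150K", "No"),
    ("No", "Married", "90-120K", "No"),
    ("No", "Single", "60-90K", "No"),
    ("Yes", "Married", "120-150K", "No"),
    ("No", "Divorced", "90-120K", "Yes"),
    ("No", "Married", "60-90K", "No"),
    ("Yes", "Divorced", "120-150K", "No"),
    ("No", "Single", "90-120K", "Yes"),
    ("No", "Married", "60-90K", "No"),
    ("No", "Single", "90-120K", "Yes"),
]


def naive_bayes_scores(rows, query):
    """Return P(class) * product of P(attribute value | class) for each class."""
    class_counts = Counter(row[-1] for row in rows)
    total = len(rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for i, value in enumerate(query):
            # Relative frequency of this attribute value within the class
            # (assumption: no smoothing, so a zero count gives a zero score).
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= matches / n_label
        scores[label] = score
    return scores


# Assumption: Income = $120K is treated as belonging to the [120 - <150K] bin.
QUERY = ("No", "Married", "120-150K")
SCORES = naive_bayes_scores(ROWS, QUERY)
PREDICTION = max(SCORES, key=SCORES.get)
print(SCORES, PREDICTION)
```

Under these assumptions no defaulting borrower in the table is married, so the score for class Yes is zero and the tuple is classified as No. A different binning of $120K or the use of smoothing would change the numbers.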

2- Consider the training example in the following table for a binary classification problem.

  

Customer ID  Gender  Car Type  Shirt Size  Class
1            M       Family    S           C0
2            M       Sports    M           C0
3            M       Sports    M           C0
4            M       Sports    L           C0
5            M       Sports    XL          C0
6            M       Sports    XL          C0
7            F       Sports    S           C0
8            F       Sports    S           C0
9            F       Sports    M           C0
10           F       Luxury    L           C0
11           M       Family    L           C1
12           M       Family    XL          C1
13           M       Family    M           C1
14           M       Luxury    XL          C1
15           F       Luxury    S           C1
16           F       Luxury    S           C1
17           F       Luxury    M           C1
18           F       Luxury    M           C1
19           F       Luxury    M           C1
20           F       Luxury    L           C1

a) Find the gain for Gender, Car Type, and Shirt Size.

b) Which attribute will be selected as the splitting attribute? 
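A hand calculation for parts (a) and (b) can be cross-checked with a short script. The sketch below assumes that "gain" means entropy-based information gain with a multiway split on each attribute, as in ID3-style decision tree induction; the assignment does not say which measure is intended, and a Gini-based gain would give different numbers. The names ROWS, COLUMNS, entropy, and information_gain are hypothetical.

```python
from collections import Counter
from math import log2

COLUMNS = ("Gender", "Car Type", "Shirt Size")

# Rows transcribed from the table above: (gender, car_type, shirt_size, class)
ROWS = [
    ("M", "Family", "S", "C0"), ("M", "Sports", "M", "C0"),
    ("M", "Sports", "M", "C0"), ("M", "Sports", "L", "C0"),
    ("M", "Sports", "XL", "C0"), ("M", "Sports", "XL", "C0"),
    ("F", "Sports", "S", "C0"), ("F", "Sports", "S", "C0"),
    ("F", "Sports", "M", "C0"), ("F", "Luxury", "L", "C0"),
    ("M", "Family", "L", "C1"), ("M", "Family", "XL", "C1"),
    ("M", "Family", "M", "C1"), ("M", "Luxury", "XL", "C1"),
    ("F", "Luxury", "S", "C1"), ("F", "Luxury", "S", "C1"),
    ("F", "Luxury", "M", "C1"), ("F", "Luxury", "M", "C1"),
    ("F", "Luxury", "M", "C1"), ("F", "Luxury", "L", "C1"),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column_index):
    """Parent entropy minus the size-weighted entropy of each partition."""
    parent = entropy([r[-1] for r in rows])
    partitions = {}
    for r in rows:
        # Group class labels by the value of the chosen attribute.
        partitions.setdefault(r[column_index], []).append(r[-1])
    weighted = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return parent - weighted


GAINS = {name: information_gain(ROWS, i) for i, name in enumerate(COLUMNS)}
BEST = max(GAINS, key=GAINS.get)
for name, gain in GAINS.items():
    print(f"{name}: {gain:.4f}")
print("Splitting attribute:", BEST)
```

Under this assumption the gains come out to roughly 0.029 for Gender, 0.620 for Car Type, and 0.012 for Shirt Size, so Car Type would be chosen as the splitting attribute.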

PLEC Week 4

 

Discuss in 500 words your opinion on whether Edward Snowden is a hero or a criminal. You might consider the First Amendment and/or the public’s right to know as well as national security concerns. 

Use at least three sources. Use the Research Databases available from the Danforth Library, not Google. Include at least 3 quotes from your sources, enclosing the copied words in quotation marks and citing them in-line by reference to your reference list. Example: “words you copied” (citation). Each quote should be one full sentence, not altered or paraphrased. Cite your sources using APA format. Use the quotes in your paragraphs. Do not double-space.

Write in essay format not in bulleted, numbered or other list format.

Discussion 6

 What are the implications of “Basic Attention Token” (BAT) and Blockstack for global marketing? 

APA format- Minimum 500 words.

 

Read:

Watch:

  • YouTube: Basic Attention Token
  • YouTube: Blockstack – A New Internet for Decentralized Apps
  • YouTube: Data is the New Gold, who are the New Thieves?  

BI -Assignment week 6

Complete the following assignment in one MS word document:

Chapter 10 –discussion question #1-2 & exercise 1 & 7

Chapter 11- discussion question #1-4 & exercise 4

When submitting work, be sure to include an APA cover page and include at least two APA formatted references (and APA in-text citations) to support the work this week.

discussion

 

Give an example of an organization with an ineffective or cumbersome structure. Explain the problems with the current structure and how these problems could be solved.

The assignment is to answer the question provided above in essay form. This is to be in narrative form and should be as thorough as possible. Bullet points should not be used. The paper should be at least 1.5 – 2 pages in length, in Times New Roman 12-pt font, double-spaced, with 1-inch margins, and should use at least one outside scholarly or professional source related to project management. The textbook should also be used. Do not insert excess line spacing. APA formatting and citation should be used.

Developing Disaster Recovery Backup Procedures and Recovery Instructions

Briefly review at least three of the first page results regarding RTO.

Make a backup of any Lab Assessment Worksheets you may have completed from this lab manual. If this is the only lab you’ve worked on, then make a mock Lab Assessment Worksheet using the worksheet from this lab and back that one up instead.

Write the backup procedures and recovery procedures you used.

Describe your personal procedures in terms of your RTO as explained in Web sites visited earlier in this lab.

Describe ways you can lower the RTO.

CIS 275

USE IMDB    -- ensures correct database is active

GO

PRINT '|---' + REPLICATE('+----',15) + '|'

PRINT 'Read the questions below and insert your queries where prompted.  When you are finished,
you should be able to run the file as a script to execute all answers sequentially (without errors!)' + CHAR(10)

PRINT 'Queries should be well-formatted.  SQL is not case-sensitive, but it is good form to
capitalize keywords and table names; you should also put each projected column on its own line
and use indentation for neatness.  Example:

   SELECT Name,
          CustomerID
   FROM   CUSTOMER
   WHERE  CustomerID < 106;

All SQL statements should end in a semicolon.  Whatever format you choose for your queries, make
sure that it is readable and consistent.' + CHAR(10)

PRINT 'Be sure to remove the double-dash comment indicator when you insert your code!';

PRINT '|---' + REPLICATE('+----',15) + '|' + CHAR(10) + CHAR(10)

GO

GO

PRINT 'CIS2275, Lab Week 6, Question 1  [3pts possible]:
Write the query to display the name and year of birth for all people born after 1980, who have
directed at least one show (i.e. those who appear at least once in the title_directors table).
Limit results to those who have died (who have a value in the deathYear column).
----------------------------------------------------------------------------------------------
Columns to display:    name_basics.primaryName, name_basics.birthYear
Sort in descending order by birth year.' + CHAR(10)

-- [Insert your code here]

GO

PRINT 'CIS2275, Lab Week 6, Question 2  [3pts possible]:
Show every genre of television show which has had at least one title with 500 episodes.
i.e. limit results to the titleType ''tvEpisode'' in the title_basics table, and to titles
containing a row in the title_episode table with episodeNumber 500.
----------------------------------------------------------------------------------------------
Columns to display:    title_genre.genre
Display genre name only, and eliminate duplicate values.' + CHAR(10)

GO

-- [Insert your code here]

GO

PRINT 'CIS2275, Lab Week 6, Question 3  [3pts possible]:
Write a common table expression to identify the WORST shows: join title_basics against title_ratings
and limit your results to those with an averageRating value equal to 1.  Project the title,
type, and startYear from title_basics; and label your CTE as BADSHOWS.
In the main query, show a breakdown of BADSHOWS grouped by type, along with the total number of
rows for each (i.e. GROUP BY titleType)
----------------------------------------------------------------------------------------------
Columns to display:    titleType, COUNT(*)
Sort results in descending order by COUNT(*).' + CHAR(10)

GO

-- [Insert your code here]

GO

PRINT 'CIS2275, Lab Week 6, Question 4  [3pts possible]:
Identify the least popular professions.  Show each profession value from the name_profession table,
along with the total number of matching rows (GROUP BY profession).  Use the HAVING clause to limit
your results to professions with fewer than 1,000 rows.
----------------------------------------------------------------------------------------------
Columns to display:    name_profession.profession, COUNT(*)' + CHAR(10)

-- [Insert your code here]

GO

GO

PRINT 'CIS2275, Lab Week 6, Question 5  [3pts possible]:
Use the query from #4 above to display the names of all people belonging to these professions.
Use the previous query as a subquery in the FROM clause here to limit the results.
----------------------------------------------------------------------------------------------
Columns to display:    name_basics.primaryName, name_profession.profession
Sort results in ascending order by primaryName.' + CHAR(10)

-- [Insert your code here]

GO

GO

PRINT 'CIS2275, Lab Week 6, Question 6  [3pts possible]:
Show the name of every writer, along with the total number of titles they''ve written (i.e. rows in the
title_writers table).  Limit results to those who have written between 5,000 and 10,000 titles (inclusive).
----------------------------------------------------------------------------------------------
Columns to display:    name_basics.primaryName, COUNT(*)
Sort results in descending order by primaryName.' + CHAR(10)

-- [Insert your code here]

GO

GO

PRINT 'CIS2275, Lab Week 6, Question 7  [3pts possible]:
Show the actor and character names for everyone who has performed the same role in more than one
show with the title ''Battlestar Galactica''.  i.e. identify the combination of (primaryName, characters)
which occurs in the title_principals table more than once for matching titles.
----------------------------------------------------------------------------------------------
Columns to display:    name_basics.primaryName, title_principals.characters, COUNT(*)
Sort results in ascending order by primaryName.' + CHAR(10)

-- [Insert your code here]

GO

GO

PRINT 'CIS2275, Lab Week 6, Question 8  [3pts possible]:
Identify the names of people who have directed more than five highest-rated shows (i.e. title_ratings.averageRating = 10).
For each of these people, display their names and the total number of shows they have written.
----------------------------------------------------------------------------------------------
Columns to display:    name_basics.primaryName, COUNT(*)
Sort results in ascending order by primaryName.' + CHAR(10)

-- [Insert your code here]

GO

GO

PRINT 'CIS2275, Lab Week 6, Question 9  [3pts possible]:
Display the title and running time for all TV specials ( titleType = ''tvSpecial'' ) from 1982; if the run time is
NULL, substitute zero.
----------------------------------------------------------------------------------------------
Columns to display:    title_basics.primaryTitle, title_basics.runtimeMinutes
Sort in descending numerical order by the resulting calculated run time value.' + CHAR(10)

-- [Insert your code here]

GO

GO

PRINT 'CIS2275, Lab Week 6, Question 10  [3pts possible]:
Identify every movie from 1913 (startYear = 1913, titleType = ''movie''); limit your results to those with a non-NULL value
in the runtimeMinutes column.  For each movie, display the primaryTitle and the averageRating value from the title_ratings table.
Use DENSE_RANK() to display the rank based on averageRating (label this RATINGRANK), and also the rank based on runtimeMinutes
(label this LENGTHRANK).  Both of these should be based on an ascending sort order.
----------------------------------------------------------------------------------------------
Columns to display:    title_basics.primaryTitle, title_ratings.averageRating,
                       RATINGRANK, LENGTHRANK
Sort results in ascending order by primaryTitle.' + CHAR(10)

-- [Insert your code here]

GO

GO

-------------------------------------------------------------------------

-- This is an anonymous program block. DO NOT CHANGE OR DELETE.

-------------------------------------------------------------------------

BEGIN

    PRINT '|---' + REPLICATE('+----',15) + '|';

    PRINT ' End of CIS275 Lab Week 6' + REPLICATE(' ',50) + CONVERT(CHAR(12),GETDATE(),101);

    PRINT '|---' + REPLICATE('+----',15) + '|';

END;

executive summaries

 1 page discussion

  1. Search the internet for sample executive summaries.
  2. From your research, describe 3 items that need to be in an executive summary and your reasons why.
  3. Include in your report the ways that you would ensure that your executive summary would get buy-in from the key stakeholders of this project.