Assignment

For this assignment, you are to construct a visual story of yourself, through information.

You have a lot of leeway with this assignment, but I want you to come up with an interesting and informative way to tell someone something about yourself, or to somehow track a particular characteristic or event. What is important, what is not? How does it relate to other things? If you can describe yourself in a paragraph, how would you describe yourself in a visual? Can you pick out a little piece of your experience of life and use graphical structure to tell this story, and even learn from it?

Think of this as building something like a resumé (meaning: “summary”) or curriculum vitae (“story of life”), but constructed through graphical structures rather than words, and topically about something broader than your skills and job history. This visual should investigate and show something about yourself that is interesting, unique, and perhaps quirky.

You can, and likely will, choose to concentrate on a particular area or subset of yourself rather than trying to incorporate everything, but you should try to make a coherent point, statement, or story with this visual. Incorporate as much or as little as you want.

This visual can be based on real data, if you have it, or it can be based on pseudo-data — approximations and an organizational sense, even if there is not true “data” to back it up. The point is rather to come up with a way to visually arrange, organize, and communicate.

You can choose to organize your graphic through some mechanism of time, space, flow, connection, overlap, impact, interest, happiness, …

SyncSession Assignment with PowerPoint presentation

 

 

SyncSession Assignment 4 (LOA 2)

In order to maximize the time we have in the first syncSession please complete the following activities prior to the meeting time:

Reading Assignment:

  • Trochim/Donnelly: Ch 7-2, 7-3, 8-1 to 8-6, and 9-1

Assignment:

Using the study you have selected for your project assignments, prepare a short PowerPoint presentation (8-10 slides) that describes and summarizes the published study. Ensure that the following items are covered in the presentation:

  • Problem addressed by the study
  • Research question(s) guiding the study
  • Conceptual model of the study relationships
  • Research approach (Exploratory vs Hypothesis-Based)
  • Research design (Qualitative vs Quantitative vs Mixed Methods)
  • Research methods utilized
  • Extension opportunity

Assignment

Write an APA-formatted paper of at least 3-4 pages on a business problem that requires data mining (I grade on quality, not quantity, so show me you understand what we have learned so far).

Explain why the problem requires data mining, the general approach you plan to take, what kind of data you plan to use, and finally how you plan to get the data.

You should describe (1) your problem and approach (SPECIFIC, e.g., CRISP-DM, SEMMA, KDD), (2) your dataset (SPECIFIC, e.g., historical artifacts, transactions), (3) your data analysis tools and techniques (SPECIFIC algorithm, SPECIFIC technique, SPECIFIC method, e.g., clustering, sampling), and (4) how your data mining will attempt to solve your problem.

You MUST include the 4 targets listed above with SPECIFICS, e.g., pre-processing of data, algorithm, method, and/or technique we have discussed. I am looking for how well you understand the specific concepts and how you apply them.

Your paper should include an abstract and a conclusion and a reference page with at least 3-5 references.

All written reports should be submitted in MS Word.  

Week 7 Interactivity Research Assignment

Background: Applying the material covered in Chapter 6 and now in Chapter 7, specifically focusing on interactivity, this assignment analyzes the gallery of 49 chart types from Chapter 6. Provide the following for 5 chart types of your choice.

1.  Select 5 chart type options from the gallery of 49 presented in Chapter 6

2.  For each, select 3 of the 5 most often used data adjustment features and describe in detail how you would apply each one to that chart type. Example for one: Chart Type Selected – Word Cloud. The 3 Data Adjustments selected: Contributing – force input from the viewer/user to select one word from a drop-down list before moving forward with the display. The results would display the visualization with the stats for the word the viewer/user selected. Present this information in a table; full sentences are not required.

3.  Immediately following this table, provide your perspective on any problems, issues, or constraints in selecting 3 data adjustment features for each chart type selected. You do not have to use the same data adjustment features for each chart type. An example of an issue: after selecting a Stream Graph and a Framing data adjustment feature, none of the examples I developed made sense. I also had to drop navigating, my first choice of data adjustment feature, because I could not think of an example to fit the data and chart type. Do NOT use any suggestion for interactivity provided in the text. Do not copy my examples. You must not copy and paste any information from the gallery pages of the text. You must apply what you have learned from the previous chapters and not copy and paste from other sources. When you do use other sources to help gather knowledge, such as the text, the book companion site, other online materials, or the library, include each as a source on the reference page following APA formatting.

4.  For each chart type selected, provide examples for each of the 3 Presentation adjustments and explain why those examples fit the data and chart type. Again, use a table format instead of attempting sentences.

5.  Immediately following this table, provide your perspective on any problems, issues, or constraints in developing the examples of the 3 Presentation adjustments for each chart type selected. An example: after selecting a Waffle Chart and a Focusing presentation adjustment feature, I had to develop 4 examples before the final choice made sense. Do NOT use any suggestion for interactivity provided in the text. Do not copy any example I provided. You must not copy and paste any information from the gallery pages of the text. You must apply what you have learned from the previous chapters and not copy and paste from other sources. When you do use other sources to help gather knowledge, such as the text, the book companion site, other online materials, or the library, include each as a source on the reference page following APA formatting.

6.  In a conclusion, provide your reflection on the chapter contents, the material and discussions in the discussion forum, and your efforts to complete the above requirements, including how these activities and this knowledge will assist you in future data visualization projects. These future projects could be initiatives at your organization, a personal effort, or an upcoming class or degree requirement.

Your research paper should be at least 3 pages (800 words), double-spaced, with normal margins, have at least 4 APA references, and be typed in an easy-to-read, consistent font family (size no larger than 12) in MS Word (other word processors are fine to use, but save the file in MS Word format). Your cover page should contain the following: Title, Student’s name, University’s name, Course name, Course number, Professor’s name, and Date.

R or Weka Lab

  

Laboratory I:

         

To download additional .arff data sets go to:

http://www.hakank.org/weka/

or search the Internet for .arff files required

· What’s the difference between a “training set” and a “test set”?

· Why might a pruned decision tree that doesn’t fit the data so well be better than an un-pruned one?

· What’s the first thing that 1R does when making a rule based on a numeric attribute?

· How does 1R avoid overfitting when making a rule based on an enumerated and/or numeric attribute?

· What is the difference between Attribute, Instance and Training set? 

· What is the difference between ID3 and C4.5?
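
To make the terms in the questions above concrete, here is a minimal Python sketch of 1R restricted to nominal attributes, evaluated on a held-out test set. It is an illustration only, not Weka's OneR implementation: it skips numeric attributes and missing values entirely. The rows are the 14-instance weather data with nominal attributes as I recall it from the set distributed with Weka; check them against your own copy of the file. The 10/4 split and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# Each row is (outlook, temperature, humidity, windy, play); play is the class.
# Assumed to match the nominal weather data shipped with Weka; verify locally.
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def one_r(rows):
    """Return (attribute index, value -> class rule, training errors) of the best rule."""
    best = None
    for a in range(len(rows[0]) - 1):
        counts = defaultdict(Counter)  # attribute value -> class frequencies
        for row in rows:
            counts[row[a]][row[-1]] += 1
        # Each attribute value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:  # ties keep the earlier attribute
            best = (a, rule, errors)
    return best


def accuracy(attribute, rule, default, rows):
    """Fraction of rows whose class the rule predicts correctly."""
    hits = sum(rule.get(row[attribute], default) == row[-1] for row in rows)
    return hits / len(rows)


# Training set: instances used to build the rule.
# Test set: instances held back and used only to measure accuracy.
training_set, test_set = WEATHER[:10], WEATHER[10:]
attribute, rule, errors = one_r(training_set)
default = Counter(row[-1] for row in training_set).most_common(1)[0][0]

print("chosen attribute:", ATTRIBUTES[attribute], "rule:", rule)
print("training accuracy:", accuracy(attribute, rule, default, training_set))
print("test accuracy:", accuracy(attribute, rule, default, test_set))
```

In this vocabulary an attribute is one column (for example outlook), an instance is one row, and the training set is the collection of instances the rule is built from.
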
  1. Use the following learning schemes to analyze the iris data (in iris.arff):

  

OneR

– weka.classifiers.OneR

 

Decision table

– weka.classifiers.DecisionTable -R

 

C4.5

– weka.classifiers.j48.J48

· Do the decisions made by the classifiers make sense to you? Why?

· What can you say about the accuracy of these classifiers when classifying iris instances that have not been used for training?

· How did each one of the methods perform?

  2. Use the following learning schemes to analyze the bolts data (bolts.arff without the TIME attribute):

  

Decision Tree

– weka.classifiers.j48.J48

 

Decision table

– weka.classifiers.DecisionTable -R

 

Linear regression

– weka.classifiers.LinearRegression

 

M5′ 

– weka.classifiers.M5′

· The dataset describes the time needed by a machine to produce and count 20 bolts. (More details can be found in the file containing the dataset.) 

· Analyze the data. What adjustments have the greatest effect on the time to count 20 bolts? 

· According to each classifier, how would you adjust the machine to get the shortest time to count 20 bolts?

  3. Produce a model for both the Weather and Weather.nominal data sets. Which method(s) did you use? What did the tree(s) look like?

Laboratory II:

 

To download additional .arff data sets go to:

Weka data folder for BreastTumor.arff

http://www.hakank.org/weka/

zoo.arff, wine.arff, bodyfat.arff, sleep.arff, pollution.arff

  4. Use the following learning schemes to analyze the zoo data (in zoo.arff):

  

OneR

– weka.classifiers.OneR

 

Decision table

– weka.classifiers.DecisionTable -R

 

C4.5

– weka.classifiers.j48.J48

 

K-means

– weka.clusterers.SimpleKMeans

Try using reduced error pruning for the C4.5. Did it change the produced model? Why? 

For K-means, for the first run, set k=10. Adjust as needed. What was the final number of k? Why?
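
The roles of k and the random seed can be seen in a short sketch of the basic k-means loop. This is a plain Python illustration under simplifying assumptions (initial centroids sampled from the data, squared Euclidean distance, made-up one-dimensional points), not Weka's SimpleKMeans, which has its own initialization and normalization options.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iterations=100):
    """Return (centroids, within-cluster sum of squared errors)."""
    rng = random.Random(seed)          # the seed fixes the initial centroids
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, sse


# Made-up points forming two obvious groups.
POINTS = [(1.0,), (2.0,), (3.0,), (11.0,), (12.0,), (13.0,)]

for k in (1, 2, 3, 4):
    centroids, sse = kmeans(POINTS, k, seed=1)
    print(k, sorted(centroids), round(sse, 3))
```

The squared error can only shrink or stay level as k approaches the number of points, so a low error alone does not justify a large k. One common heuristic is to look for the value of k after which the error stops dropping sharply, and to rerun with several seeds to see how stable the clusters are.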

  5. Use the following learning schemes to analyze the breast tumor data.

  

Linear regression

– weka.classifiers.LinearRegression

 

M5′ 

– weka.classifiers.M5′

 

Regression Tree

– weka.classifiers.M5′

 

K-means clustering

– weka.clusterers.SimpleKMeans

A) How many leaves did the Model tree produce? Regression Tree? What happens if you change the pruning factor? 

How many clusters did you choose for the K-means method? Was that a good choice? Did you try a different value for k?

B) Now perform the same analysis on the bodyfat.arff data set.

  6. Use a k-means clustering technique to analyze the iris data set. What did you set the k value to be? Try several different values. What was the random seed value? Experiment with different random seed values. How did changing these values influence the produced models?
  7. Produce a hierarchical clustering (COBWEB) model for iris data. How many clusters did it produce? Why? Does it make sense? What did you expect?

Change the acuity and cutoff parameters in order to produce a model similar to the one obtained in the book. Use the classes-to-clusters evaluation. What does that tell you?

Laboratory III:

 

To download additional .arff data sets go to:

http://www.hakank.org/weka/

zoo.arff, wine.arff, soybean.arff, zoo2_x.arff, 

sunburn.arff, disease.arff

8. Use the following learning schemes to compare the training set and 10-fold stratified cross-validation scores of the disease data (in disease.arff): 

  

Decision table

– weka.classifiers.DecisionTable -R

 

C4.5

– weka.classifiers.j48.J48

 

Id3

– weka.classifiers.Id3

A) What does the training set evaluation score tell you? 

B) What does the cross-validation score evaluate? 

C) Which one of these models would you say is the best? Why?
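
The two scores being compared can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not Weka's evaluation code: the data are made up, the classifier is a simple 1-nearest-neighbour rule chosen because it memorizes its training data, and the fold assignment is one straightforward way to keep class proportions roughly equal in every fold.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split instance indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for index in indices:  # deal each class round-robin across folds
            folds[position % k].append(index)
            position += 1
    return folds


def predict_1nn(train_x, train_y, x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def accuracy(train_x, train_y, test_x, test_y):
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def cross_validate(xs, ys, k, seed=0):
    """Mean accuracy over k folds, each fold held out once for testing."""
    scores = []
    for fold in stratified_folds(ys, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        scores.append(accuracy([xs[i] for i in train], [ys[i] for i in train],
                               [xs[i] for i in fold], [ys[i] for i in fold]))
    return sum(scores) / k


# Made-up one-attribute data with overlapping classes "a" and "b".
XS = [1.0, 1.4, 2.1, 2.8, 3.3, 3.9, 4.6, 5.2, 5.9, 6.5,
      4.1, 4.9, 5.5, 6.1, 6.8, 7.3, 7.9, 8.4, 9.0, 9.7]
YS = ["a"] * 10 + ["b"] * 10

print("training set score:", accuracy(XS, YS, XS, YS))
print("5-fold stratified CV score:", cross_validate(XS, YS, k=5))
```

Because every test instance in the first call is also a training instance, the training set score measures how well the model reproduces what it has already seen. In the cross-validation call each instance is predicted by a model that never saw it.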

9. Use the following learning schemes to analyze the wine data (in wine.arff). 

  

C4.5

– weka.classifiers.j48.J48

 

Decision List

– weka.classifiers.PART

A) What is the most important descriptor (attribute) in wine.arff?

B) How well were these two schemes able to learn the patterns in the dataset? How would you quantify your answer?

C) Compare the training set and 10-fold cross-validation scores of the two schemes.

D) Would you trust these two models? Did they really learn what is important for proper classification of wine?

E) Which one would you trust more, even if just very slightly?

10. Perform the same analysis of sunburn.arff as in 9. Instead of 10-fold cross-validation, use 5-fold.

A)-E) Same as in 9.

F) Why could we not use 10-fold evaluation in this example?

11. Choose one of the following three files: soybean.arff, zoo.arff, or zoo2_x.arff, and use any two schemes of your choice to build and compare the models.

Discussion

 

  • Describe the difference between statistical significance and practical significance.
  • What assumptions are necessary to perform a large-sample test for the difference between two population means?
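
As a starting point for the discussion, the usual large-sample test statistic for two independent samples is z = (mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2). The Python sketch below computes it from summary statistics. All numbers are made up for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided large-sample z test for the difference between two means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up summary statistics: two very large samples whose means differ by 0.1.
z, p = two_sample_z(100.1, 5.0, 100_000, 100.0, 5.0, 100_000)
effect_in_sd_units = (100.1 - 100.0) / 5.0

print("z =", round(z, 3), "p =", p)
print("difference in standard deviation units:", round(effect_in_sd_units, 3))
```

With samples this large the p-value is tiny, yet the two means differ by only about 0.02 standard deviations. Whether a difference of that size matters is a question about the application, not about the test.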

Security Architecture 9.1

 If an attacker can retrieve the API and libraries, then use these to write an agent, and then get the attacker’s agent installed, how should Digital Diskus protect itself from such an attack? Should the business analytics system provide a method of authentication of valid agents in order to protect against a malicious one? Is the agent a worthy attack surface?
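
One possible shape for agent authentication is a challenge-response check with a per-agent secret, sketched below in Python. This is a hypothetical design for discussion, not a description of how Digital Diskus or any business analytics product works. The agent identifiers, the key registry, and the function names are all assumptions.

```python
import hashlib
import hmac
import secrets


def issue_challenge() -> bytes:
    """Server side: a fresh random challenge, so old responses cannot be replayed."""
    return secrets.token_bytes(32)


def agent_response(agent_key: bytes, challenge: bytes) -> bytes:
    """Agent side: prove possession of the key without sending the key itself."""
    return hmac.new(agent_key, challenge, hashlib.sha256).digest()


def server_verify(registered_keys: dict, agent_id: str,
                  challenge: bytes, response: bytes) -> bool:
    """Server side: accept only agents whose key was registered beforehand."""
    key = registered_keys.get(agent_id)
    if key is None:
        return False
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)  # constant-time comparison


# Hypothetical registry populated when a legitimate agent is provisioned.
registered_keys = {"agent-01": secrets.token_bytes(32)}

challenge = issue_challenge()
genuine = agent_response(registered_keys["agent-01"], challenge)
rogue = agent_response(secrets.token_bytes(32), challenge)  # attacker lacks the key

print(server_verify(registered_keys, "agent-01", challenge, genuine))
print(server_verify(registered_keys, "agent-01", challenge, rogue))
```

A scheme like this only stops an attacker who has the API and libraries but not a registered key. If the attacker controls the host where a legitimate agent runs, the key can be stolen along with everything else, which is worth weighing when deciding how much the agent matters as an attack surface.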