Computing at Stanford and Introduction to SAS
HRP223, Topic 0
Sept 26th, 2011

Objectives
- Administrivia
- Software tools at Stanford
- Security at Stanford
- Software tools not endorsed by Stanford
- Data
- SAS

Administrivia

General
The course website has critical details:
If you can, please print the slides just before the start of class.

Goals

This course will provide practical solutions to problems that arise before doing analyses, as well as help with the final push toward getting the results. I will talk about issues like finding unruly data, massaging data into a useful format, building data sets of valid data and choosing statistics.

Getting Help
Mike Hurley is the TA for the course. His office hours will be announced weekly. I will be available for online Q&A by email or, preferably, on the class newsgroup. I will answer questions every morning around dawn. If you post to the newsgroup and do not hear back quickly, please email me.
Things labeled Assignment, but not Homework, can be done with the help of classmates. You are strongly encouraged to discuss your problems up until you start writing your answers to the homework problems.

Preliminaries
I assume you know how to use Windows or Mac OS. For this class you need access to a machine with one of these operating systems:
- Windows XP Pro
- Windows Vista Business or Ultimate
- Windows 7 Professional, Business or Ultimate

XP Home Edition, Vista Home Edition and Windows 7 Home Premium will not work with the software in this class. I use XP Pro, Windows 7 Pro, and XP Pro running in Parallels on the Mac.

Getting a Computer

If you want to get a new computer, you can get one at a very good price through Stanford. You can get ideas on what is an acceptable computer here:
You want XP Pro or the Business or Ultimate version of Vista or Windows 7.

Stanford Software

Free Stanford Tools
You can get free software from Stanford by going here:
You must use antivirus software. You will fail the course if you send me a document that contains a virus or other malicious code. There is no forgiveness for this offense, and this is not open to debate.

Get the Sophos Scanner

Virus and Worm Issues (3)
Virus scan before you email me anything!

Right-click the file you want to scan and then pick Scan with Sophos Anti-Virus. Sophos keeps itself updated constantly.

Sophos Anti-Virus (for both Windows and Mac OS)

- Watches for suspicious activity and stops it until you authorize the software
- If your quarantine has a file, get help
- You can submit suspicious files

Stanford Desktop Tools
This lets you install and update BigFix, Security Self-Help, OpenAFS and other tools.
- BigFix automatically checks for important software updates.
- Security Self-Help checks for security weaknesses on your machine and lets you fix them.
- OpenAFS gives you access to your UNIX account as if it were just another Windows hard drive.

Your UNIX Account
You have a website made for you already:
UNIX stuff
You can use Stanford Desktop Tools to mount your UNIX drive just like another hard drive. I get stuff onto the web quickly with OpenAFS. If you do not want AFS, you can also use SecureFX, which you can get from ESS, or just go to

Do NOT put confidential or HIPAA-sensitive material out there.

My UNIX Space

After AFS Is Installed

SecureFX

… is the easy way to move files to your UNIX space.

Security Passwords
The Leland system places restrictions on passwords. You should make your passwords on other machines just as hard to crack. You can use Stanford's Security Self-Help tool, which comes with Stanford Desktop Tools, to check your passwords. If you do not know how to set or change your password, look here:

Security

General Security
The biggest weaknesses in computer security are the legitimate users of the system:
- Walking away from a terminal
- Using passwords that are easy to crack
- Taking data off of restricted machines
Viruses and Trojan horses will kill you if you let them!

Email
Email provides all the confidentiality of a postcard.

If you are sending HIPAA-sensitive information, you can secure your email:

Unsolicited Email
Spam, Spam, Spam, wonderful Spam, yes wonderful Spam
You may get unsolicited commercial solicitations, advertisements, chain letters, or pornography through your Stanford email account. NEVER respond to these messages, and never use the REMOVE link provided in the email. NEVER put your email address on a web page. In your Stanford email account you can choose the Preferences tab and then Filters on the left to automatically sack repeat offenders.

Back up your work!
Each year, on average, one student in five loses all their work. Plan on your computer being destroyed at the worst possible time this year: coffee, a computer worm or virus, a small child with a refrigerator magnet, physical hard drive failure, theft, a bicycle crash, etc. Every day, back up your work to more than one location.

Where to Back Up
PLEASE use removable media if you have no network access: floppy disk, CD, DVD, flash media.
NEVER back up or share confidential data (HIPAA-sensitive protected health information) on mobile media without talking to security experts first. Ask your tech support person for recommendations.

Encrypted USB Drives
USB drives (also called thumb drives) are a very convenient way to keep backups and move your data around. However, they are very easy to lose! NEVER store unencrypted, restricted data on a USB drive.
- OK: encrypt at the file level (Excel, WinZip).
- Better: encrypt the whole drive (PGP Disk, TrueCrypt).
- BEST: use a hardware-encrypted USB drive.

There are many manufacturers; however, most are Windows only. IronKey supports both Windows and Mac and is highly recommended: 1 Gig for $50 up to 32 Gig for $250 on Amazon.

Other Software

Data Management and Analysis: Tools of the Trade
Containers to hold data:
- Microsoft Excel
- REDCap
Analysis tools:
- SAS with Enterprise Guide
- R with Rcmdr

Excel

- is not a good place for HIPAA-sensitive (PHI) material
- makes it easy to enter bad data
- can be a huge headache to import

REDCap
- is a good place for HIPAA-sensitive (PHI) material
- makes it hard to enter bad data
- is mostly painless to import for analysis

SAS 9.3

SAS is an old programming language where you type commands and run a bunch of them at once.

Enterprise Guide 4.305
EG is a newish programming environment where you can either type commands or point and click.

R 2.13.1
R is a modern programming language with user-hostile help files.

RStudio
RStudio is an integrated development environment (IDE) for R.

R Commander
Rcmdr is a friendly, but incomplete, graphical user interface (GUI) for R.

Getting SAS
If you have a machine with XP, Vista or Windows 7 (Pro, Business or Ultimate) and more than 30 Gig of free hard drive space, you can get SAS for $65 per year. Place the order here:
There is a digital download that is HUGE (11+ Gig, not Meg). If you have a wired connection on campus, consider it; otherwise, ask me for the DVDs. The instructions for installing it can be found here:

SAS for Free on Campus
If you don't mind working in a public place, SAS is available in the Lane Library and the M202 lounge.

Other Tools I Regularly Use

File manipulation:
- UltraEdit
- UltraCompare
Information management:
- FileLocator Pro
- Google Sites

UltraEdit
If you work with text files, get UltraEdit and buy the perpetual license.

UltraCompare
A tool to track changes in code or other text files.

FileLocator Pro
If you can't find files on your machine, consider FileLocator Pro.

Google Sites
If you need to keep track of tons of random facts (like code snippets), consider using Google Sites.

Data

What Is Data?
- Stuff that will make you famous or make you cry
- What you want to pull from the electronic medical record
- The information you will need to store if it is not in the medical record

Structured vs. Unstructured
Unstructured data:
- Text like dictations, operation notes, data entry comments
- Difficult to process
Structured data:
- Affords the ability to build ontologies
- Dates
- Pick lists (multiple choice)
- Relatively easy to process

Structuring Biomedical Data
- RxNorm for drug ingredients and brand names
- ICD-9 for billing diagnostic and procedure codes: fairly coarse but nicely hierarchical
- ICD-O for detailed cancer pathology
- CPT for procedures: no hierarchical structure, difficult to search
- SNOMED CT for general-purpose clinical terms: hierarchical, detailed and vast, but with some gaps

What Is Structured Data?
All pieces of information that you collect and calculate as part of a study are data. Every person's response to a questionnaire is called a data point.

There are two fundamentally different types of data: numeric and character. Numeric data is always a number: information that you might want to do math on. Character data is alphanumeric. It includes the obvious things like names and addresses, but it also includes numbers that you should not do math on, such as ZIP codes. Some systems, like R, make finer distinctions; R, for example, lets you force categorical data to be stored as factors.
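Here is a minimal SAS sketch of the numeric/character distinction. The data set name work.people, its variables and its values are all made up for illustration. The ZIP code is read as character because you never do math on it. Storing it as a number would also drop the leading zero. Age is numeric because you might want to average it. PROC CONTENTS then lists each variable with its type.

```sas
/* Hypothetical example data: the names, ZIP codes and ages are made up. */
data work.people;
  length name $ 20 zip $ 5;   /* $ marks character variables */
  input name $ zip $ age;     /* age has no $, so it is numeric */
  datalines;
Alice 02139 34
Bob 94305 51
;
run;

/* Lists each variable with its type (Num or Char). */
proc contents data=work.people;
run;
```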

What Is Data Coding?
A question such as "What is your current age in years?" is going to generate numeric data. A question such as "At what age did you first contract a sexually transmitted disease?" is also going to generate numeric data, but you need to allow for the possibility that somebody has never contracted a sexually transmitted disease. You also always need to allow for people who never knew or do not remember the information, or who may be dishonest in their answers.

What Is Data Coding? (2)
When a question generates numeric data and your subject's response is not a real number, you can code a bogus value: "not applicable" can be coded as age 1000000 and "do not know" as 2000000. The better way to deal with this problem is to use a null value, which SAS calls a missing value. In addition to the ordinary missing value, SAS lets you code 27 special missing values (._ and .A through .Z). Null values make your job easier when you do math on the values, because summary statistics such as the mean skip them instead of treating them as real ages.
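The following small SAS example shows why a bogus code hurts. The data set work.ages and its values are hypothetical. The same three ages are stored twice: once with 1000000 as a "not applicable" code, and once with an ordinary missing value (.). PROC MEANS leaves the missing value out of its calculations. The bogus code, in contrast, is averaged in as if it were a real age.

```sas
/* Hypothetical ages: age_coded uses a bogus code, age_missing uses a SAS missing value. */
data work.ages;
  input age_coded age_missing;
  datalines;
25 25
40 40
1000000 .
;
run;

/* N and mean for each version of the variable. */
proc means data=work.ages n mean;
  var age_coded age_missing;
run;
```

For age_missing, N drops to 2 and the mean is 32.5. For age_coded, N stays 3 and the mean is 333355, which is nonsense for an age.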

Missing Data
SAS represents missing character data as a blank value, written as a pair of quotes with nothing between them, and displays missing numbers as a single period (.). You can also use .A, .B, etc. to code missing numbers, but you can't enter them directly.
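Here is a sketch of special missing values in SAS, using made-up data. The data set names work.responses and work.flags are hypothetical, as is the choice of N for "not applicable" and D for "don't know". The MISSING statement tells SAS to read the letters N and D in raw data as the special missing values .N and .D. Inside program code, you can write .N and .D directly. The MISSING function is true for every kind of missing value, so the code tests for the special ones first.

```sas
/* Hypothetical coding: N = not applicable, D = don't know. */
data work.responses;
  missing N D;              /* raw letters N and D are read as .N and .D */
  input id age;
  datalines;
1 37
2 N
3 D
4 .
;
run;

data work.flags;
  set work.responses;
  length status $ 14;
  if age = .N then status = 'Not applicable';
  else if age = .D then status = "Don't know";
  else if missing(age) then status = 'Missing';   /* the ordinary . */
  else status = 'Answered';
run;

proc print data=work.flags;
run;
```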

What Is Data Coding? (3)
Questions that generate alphanumeric data are always more complex than those that generate numeric data. "Where were you born?" can be coded as a string of letters from a fill-in-the-blank question, or as letters or numbers from a multiple-choice question. Do not use null in fill-in-the-blanks.

Typical Tasks

- Importing data
- Cleaning
- Making a subset
- Numeric and graphical summaries
- Analyses with graphics
- Summary reports
- Or doing simple math

SAS

Basics
While most people use SAS for processing complex collections of data, it can also be used for simple math. The techniques that you use for simple math are the same ones you use to make complex changes to data sets of any size.
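As a sketch of that idea, the assignment statement below is the same kind of formula as 1 + 1, but SAS runs it once for every row of a data set. The data set names (work.heights, work.bmi), the variables and the values are all hypothetical.

```sas
/* Hypothetical data: height in meters, weight in kilograms. */
data work.heights;
  input height weight;
  datalines;
1.80 81
1.60 64
;
run;

/* The assignment runs once per row, whether there are 2 rows or 2 million. */
data work.bmi;
  set work.heights;
  bmi = weight / height**2;
run;

proc print data=work.bmi;
run;
```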

I hope this stuff will make your lives easier in statistics classes.

Using EG for Math

Make a temporary data set to hold the answer. The data set is shown in the flowchart, and its contents are displayed in the programming window pane. You can see it stored in the temporary WORK library by browsing the Server List.
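The same step can be typed as code instead of built by pointing and clicking. This is a minimal sketch that assumes a hypothetical data set name, work.answer. Putting the data set in the WORK library makes it temporary, so SAS deletes it when the session ends. PROC PRINT then displays its contents.

```sas
/* WORK is the temporary library: its data sets vanish when SAS closes. */
data work.answer;
  theAnswer = 1 + 1;
run;

/* Show the one-row, one-column data set. */
proc print data=work.answer;
run;
```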

The Log tab gives you feedback on what SAS did.

No Need for a Data Set
For a simple calculation you do not need to make a data set to hold a single number. You can have the number show up in the log window instead.
1. Give SAS a formula: 1 + 1
2. Tell it what to call the result: theAnswer
   Use short, meaningful names that do not include spaces, punctuation characters, or leading numbers.
3. Print the result: putlog theAnswer=;
4. Tell it you are done giving it instructions: run;

Basic Math

You put the instructions together by typing a program into the code window, like this:

    data _null_;          /* _null_: don't bother to store the results in a data set */
      theAnswer = 1 + 1;
      putlog theAnswer=;  /* writes theAnswer=2 to the log */
    run;

Run it.
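The same pattern handles other arithmetic. This small sketch uses made-up variable names. In SAS, ** is the exponentiation operator, and SQRT, MEAN and ROUND are built-in functions of the kind listed under functions by category in the SAS help. The log should show squared=9 root=4 avg=5 rounded=3.14.

```sas
data _null_;
  squared = 3**2;                 /* ** raises to a power */
  root    = sqrt(16);             /* square root */
  avg     = mean(2, 4, 9);        /* mean of the arguments */
  rounded = round(3.14159, 0.01); /* round to the nearest 0.01 */
  putlog squared= root= avg= rounded=;
run;
```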

[Figure: the Log tab, showing the count of how many lines have been submitted and the answer]

Don't Panic
The help that ships with SAS is good. It is its own program, hidden in the documentation subfolder inside the SAS folder off the Windows Start button. Search for "functions and call routines by category" and click the Favorites tab.

Final Administrivia
Please save a table for the people who are officially enrolled (or are taking the class for deferred credit). Bring a laptop with SAS if possible.
Grades (pass/fail only):
- Pass 4 of 4 homework assignments for 3 units.
- Pass 3 of 4 homework assignments for 2 units.

