The Ethics of Using Learning Analytics Data and Tools
Ethical issues associated with learning analytics include both reasons to use LA data and tools and reasons to use them with caution. Ethical arguments in favor of using LA data and tools rest largely on the principle of acting in the best interests of students and instructors as well as the larger Colorado and national communities. Simply put, failing to follow up on opportunities to enhance student learning and success works against the interests of our students; failing to provide information that would support stronger teaching practices works against the interests of our instructors; and failing to graduate the largest possible number of capable, well-educated students works against the interests of the communities CSU serves.
Arguments favoring a cautious approach to using LA data and tools rest largely on concerns about the maturity of the tools, the reductiveness of the information they provide, and potential abuses of student privacy and faculty academic freedom. While recognizing the insights afforded by the use of LA tools, a number of scholars have called attention to the potential misuse of the information LA tools produce. Sharon Slade and Paul Prinsloo (2013), for example, observed that predictions about the likelihood of successful course completion could lead instructors and advisors to discourage students from taking courses or pursuing programs of study in which they are likely (but by no means guaranteed) to fail. Their caution is particularly important for students—often first-generation college students and/or members of historically underrepresented groups—who may enter higher education with less academic preparation than peers from families with higher socio-economic status or with college-educated members. Slade and Prinsloo also expressed concern that inappropriate conclusions might be drawn about the teaching effectiveness of faculty members, a concern that echoes arguments made by a number of scholars about the reductive nature of student evaluations of teaching (see, for example, the 2017 meta-analysis by Uttl, White, and Gonzalez). Other scholars have argued that LA tools are too immature to be used without a great deal of caution, citing privacy concerns (Jones & Salo, 2018; Pardo & Siemens, 2014), reservations about the potential commercialization of student data (Flavin, 2016; Rubel & Jones, 2016), and concerns about the reductivism inherent in any analysis of “big data” (Stephens, 2017).
The importance of these concerns for scholars involved with learning analytics is addressed in the editors’ introduction to a recent issue of the Journal of Learning Analytics:
Questions related to privacy and ethics in connection to learning analytics have been an ongoing concern since the early days of learning analytics. Examples of some of the major questions are related to the ownership and protection of personal data, data sharing and access, ethical use of data, and ethical implications of the use of learning analytics in education. It is well recognized that these issues lie at the very heart of the field and that great care must be taken in order to assure trust building with stakeholders that are involved in and affected by the use of learning analytics. (Gašević, Dawson, & Jovanović, 2016)
With these concerns in mind, numerous proposals have been made regarding ethical principles and practices related to both the analyses that LA tools produce and access to the data on which they are based. In 2013, George Siemens suggested that we look not only at data ownership and retention but also at the issue of learner control over how their data should be used. One year later, Abelardo Pardo and Siemens (2014) proposed an ethical framework for learning analytics that focused on four aspects of privacy that had emerged in response to the growing collection of digital user data over the past two decades: “transparency, student control over the data, security, and accountability and assessment” (p. 448). More recently, Andrew Cormack (2016) has argued that we should draw on ethical frameworks used in medical research to separate “the processes of analysis (pattern-finding) and intervention (pattern-matching)” so that we can protect learners and teachers from “inadvertent harm during data analysis” (p. 91). Hendrik Drachsler and Wolfgang Greller (2016) proposed DELICATE, an eight-point checklist based on recent legal principles and the growing literature on ethical use of LA data that supports a “trusted implementation of learning analytics” (p. 89). And in a promising approach to preserving privacy while ensuring benefits to learners and teachers, Mehmet Emre Gursoy, Ali Inan, Mehmet Ercan Nergiz, and Yucel Saygin (2017) have developed and tested a framework for the development and enforcement of “privacy-preserving learning analytics (PPLA)” (p. 69).
Building on these efforts, a small but growing number of higher-education institutions (e.g., Charles Sturt University, 2015; Colorado State University, 2018; University of Michigan, 2018), professional organizations such as the Society for Learning Analytics Research (Gašević, 2018) and the Reinvention Collaborative (Jensen & Roof, 2017), and non-governmental organizations such as Jisc (Sclater, 2014; Sclater & Bailey, 2015) have developed frameworks to inform the ethical use of LA data and tools. Other institutions and organizations are currently adapting existing or developing new frameworks.
Areas that raise ethical issues include predictive analytics, assessment of teaching effectiveness, the use of multi-modal data, and working with publishers and other vendors.
Using Predictive Analytics
For instructors, predictive analytics (particularly “zero-day analytics,” which are provided prior to the start of a course) have the potential to set up biases about student abilities and potential. Our recommendation is to pursue two paths. First, prior to the start of the course, we would be wise to report predictive analytics to course instructors only for groups of students (see Recommendations 12 and 13), and then only after instructors have completed some initial training on how to use such information in educationally effective ways. For example, instructors might be informed about the percentage of first-generation students, majors/non-majors, and racially-minoritized students in their courses. They might also be provided with the range of GPAs and other descriptive performance indicators. We suggest that these data be accompanied by links to teaching strategies shown to be effective with the relevant student groups. At some point in the semester, predictions based in part on these and other demographic and academic factors might be applied to individual students. It could be useful to see which students are performing above or below predictions, for instance, so that instructors can identify students in need and get a sense of how the class as a whole is performing. This approach is in use at UC Davis, where Carolyn Tomas, Vice Provost & Dean for Undergraduate Education, and Marco Molinaro, Assistant Vice Provost for Educational Effectiveness, have been developing a model for providing and using student data that requires faculty members who wish to obtain such data to complete relevant trainings.
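As a concrete illustration of group-level reporting, the sketch below aggregates hypothetical predicted-success scores by student group and suppresses any group too small to report safely. The field names, groupings, and suppression threshold are illustrative assumptions for demonstration only, not a description of CSU’s actual systems.

```python
# Illustrative sketch: report predictive analytics to instructors only as
# group-level summaries, suppressing any group too small to protect
# individual privacy. All field names and the suppression threshold are
# assumptions for demonstration, not an actual CSU implementation.

MIN_GROUP_SIZE = 5  # assumed small-cell suppression threshold


def summarize_by_group(records, group_key):
    """Aggregate predicted-success scores by group, suppressing small cells."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_key], []).append(r["predicted_success"])
    summary = {}
    for name, scores in groups.items():
        if len(scores) < MIN_GROUP_SIZE:
            summary[name] = "suppressed (n < %d)" % MIN_GROUP_SIZE
        else:
            summary[name] = {
                "n": len(scores),
                "mean": round(sum(scores) / len(scores), 2),
                "min": min(scores),
                "max": max(scores),
            }
    return summary


# Example: instructors see only aggregates, never individual predictions.
students = [
    {"first_gen": True, "predicted_success": 0.62},
    {"first_gen": True, "predicted_success": 0.71},
    {"first_gen": True, "predicted_success": 0.58},
    {"first_gen": True, "predicted_success": 0.66},
    {"first_gen": True, "predicted_success": 0.74},
    {"first_gen": False, "predicted_success": 0.80},
    {"first_gen": False, "predicted_success": 0.77},
]
report = summarize_by_group(students, "first_gen")
```

The key design choice here is that individual predictions never leave the aggregation step: an instructor receives counts, means, and ranges, while any cell below the threshold is withheld entirely.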
Second, recognizing that bias is often the product of lack of knowledge about the limitations of the information provided to us, we recommend that professional development efforts regarding the use of learning analytics include significant attention to the limitations of predictive analytics (see Recommendation 13), the potential for implicit bias, and appropriate interpretations and use of the data. With an understanding of those limitations, instructors will be better prepared to interpret and use the information provided by these tools appropriately, both prior to and during a course.
In addition, we need to identify appropriate interventions—or at least gain a better understanding of previously identified productive behaviors—that might be recommended to students who exhibit particular patterns of behavior (or inactivity).
For advisors, it will be useful to consider how predictive analytics might channel students into paths other than those they wish to pursue. For example, by discouraging students from pursuing programs of study in which they are likely (but by no means guaranteed) to fail, predictive analytics tools could restrict opportunities for students based on background and experience rather than ability and potential.
With these considerations in mind, bias might be reduced through the careful development of nudges and alerts that can be sent automatically rather than solely at the discretion of an instructor or advisor (see Recommendation 4). Work in this area might eventually allow us to use nudges and alerts in ways that are more equitable than current practices.
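To show what an automated, uniform trigger might look like, the sketch below applies the same nudge criterion to every student rather than leaving the decision to individual discretion. The threshold and record fields are illustrative assumptions, not a recommendation of specific values.

```python
# Illustrative sketch of an automated, uniform nudge rule: the same
# criterion is applied to every student, rather than leaving the decision
# to send an alert to an instructor's or advisor's discretion.
# The threshold and record fields are assumptions for demonstration.

GAP_THRESHOLD = 0.15  # assumed gap between predicted and actual performance


def students_to_nudge(records):
    """Return IDs of students whose actual score lags their prediction
    by more than GAP_THRESHOLD -- applied identically to everyone."""
    return [
        r["id"]
        for r in records
        if r["predicted_score"] - r["actual_score"] > GAP_THRESHOLD
    ]


roster = [
    {"id": "s1", "predicted_score": 0.80, "actual_score": 0.55},
    {"id": "s2", "predicted_score": 0.70, "actual_score": 0.68},
    {"id": "s3", "predicted_score": 0.60, "actual_score": 0.40},
]
flagged = students_to_nudge(roster)  # s1 and s3 exceed the gap threshold
```

Because the rule is codified once and applied to the whole roster, it cannot single out students based on an individual observer’s impressions, though the rule itself must still be audited for fairness across student groups.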
Assessing Teaching Effectiveness
Administrators—in particular, department chairs/heads and college deans—might reasonably assume that LA data and tools can provide insights into the performance of individual instructors and advisors. Similarly, they might be tempted to use these tools to gain insights into instructors as a course is being offered, with the intention of providing formative feedback while it can benefit student learning and success. We recommend that information from these tools be used, if it is used at all, only in conjunction with other information about teaching effectiveness. It should not serve as the sole or primary basis for assessing teaching effectiveness (see Recommendation 9).
Using Multi-Modal Data
Multi-modal data, such as location/time data revealed through connections to Wi-Fi routers or harvesting of social media behaviors and posts, can help us gain insights into student learning behaviors outside the classroom. This kind of data can help us learn, for example, which students are attending tutoring or study group sessions or visiting the library, and how much time they spend in these activities.
While we can collect this kind of data, we should avoid doing so. And in cases where collection is a normal part of the process of gaining access to a resource (such as logging into a Wi-Fi router), we should not include this data in our analyses. We believe strongly that student privacy should be respected and that the collection of this data, while potentially useful in developing and applying models of behaviors that lead to student learning and success, is both intrusive and unnecessary.
This does not mean, however, that we should not rely on data that is provided by students as they use university services, such as tutoring, sponsored study groups, and advising. Students are aware that their attendance is recorded at these and similar activities.
Working with LA Tools Vendors
Vendors, including publishers and other learning companies such as EAB, will have access to significant amounts of data about our students. This raises numerous ethical and regulatory challenges for the university, and these relationships should be a constant area of focus in our learning analytics efforts. We recommend that all faculty and staff who adopt LA tools be aware of the requirement that vendors complete the Colorado State University Digital Tool FERPA, Data Ownership, and Data Privacy Agreement (see Recommendation 9). Because many faculty may be unaware of the extensive—and substantive—implications of vendors’ uses of students’ data (and possibly personally identifiable information (PII)), we recommend that guidelines be developed to aid faculty and staff in making informed choices regarding the use of e-texts, adaptive courseware, and other vended digital learning platforms (see Recommendations 10 and 11). These guidelines should address issues including vendors’ potential use of students’ data to inform the development of new products, their potential sale of students’ data to third parties, and vendors’ potential use of faculty members’ intellectual property (in particular, course materials posted on vendor sites).
Some learning analytics researchers also refer to audio and voice recordings, as well as biometric data, as “multi-modal.” These kinds of data, when used to assess learning, are often collected in lab settings.