I've been on the mailing list of ProDaBi for a while now, seeing announcements for their colloquium. Although I've attended before, this is the first session that I felt applied directly to the research and teaching that I do. The colloquium contains two talks and discussion, one hour per speaker.
The first speaker was Graham Dove of NYU, and his talk was called "Learning data science through civic engagement with open data". Given the work George and I are doing on the development of a Knowledge Engineering course, I felt that this research could give us some insights for the development of a course project.
Graham and his team interviewed people involved in open data communities to find out how and why people in New York City become engaged with open data, and what the drivers are for initial and continued engagement.
Some motivations included people aiming to use open data as a resource for education (as we would do in our course), for community activism, or for entrepreneurship.
For the barriers to the open data community, they found some that I would expect: the skills required to use the data effectively, not being aware of training and events, and inaccessible language or jargon. Others were unexpected: negative experiences, such as bad experiences with math, appear to be a barrier to open data engagement. Another interesting aspect I had never thought of was that the required skills are not just data science or analytics skills. Users also require knowledge of civics and government to interpret data correctly.
Then, to facilitate open data engagement, we can:
One interesting take-away from this talk was that open data literacy should not just include mathematics, statistics, and programming, but local and civic knowledge too.
Furthermore, in their study, Graham and his team also found that intermediaries play a large role: communities, events, trainings, collaborations, and so on are important for both initial and continued engagement.
Finally, Graham proposed some pathways to success. Two I found especially interesting are 1) to provide entry points to newcomers who may be overwhelmed and 2) to train-the-trainers. Train-the-trainers means to build up the knowledge of community members by teaching them something, then have them teach the same concept to someone else.
The discussion after the talk was interesting too. There were discussions on whether open data engagement increases civic engagement, how to address data education in K-12 CS, and on equity and access: does access to open data communities impact the next step in a career? Anecdotally, it seems that students have stayed engaged. A study on long-term impact is on the agenda.
Finally, there was the question of collaborations and scale-up. How can other communities start engaging with open data to increase data literacy? This is an open question. Graham himself is not involved in setting up these communities, but only researches them. Perhaps other people have some ideas?
This talk was really inspiring to me. I have been interested in being involved in teaching my local community for a while. I do feel that setting up something like this would be very rewarding, but it will also take a lot of effort, and research and teaching already take up most of my time...
The second talk was by Rob Gould of UCLA, who talked about their Introduction to Data Science course for high schools. His main question was: why should students take a data science course, and why should we teach one?
Rob started off by discussing our current data literacy crisis (ignorance-based decision making, increased inequity, weakened privacy). Schools have the opportunity to address this and teach data literacy. The promise of such a curriculum for students is that it leads to: quality decision making, improved control over career, insight into daily lives, increased autonomy. A course would provide students with tools they need to pose data-driven questions. And it has a hidden agenda: engagement with Data Science will hopefully increase interest in STEM for underrepresented populations.
The Introduction to Data Science course has three key components:
In the USA, the pathway to college requires four mathematics courses in a specific order. The state of California has relaxed this and now allows any four mathematics courses as preparation. This means that for some programs, it is possible to replace a math course with this Intro to Data Science. Unfortunately, the perception is that data science is less rigorous than math, which undermines data literacy. There is much to be done here on emphasizing that data science is worthy and necessary in and of itself.
In the discussion after the talk, there was the question of how to encourage students to do reproducible research. Rob's answer was that it is impossible for one course to teach all aspects of data science. However, the students may get a feel for this when they work in R. They work in groups, and thus learn what code is easy to read and what is not.
Another interesting concern was about all the different aspects of CS that we aim to include in K-12 education: computational thinking, coding, data science. How can we proceed with these demands but still keep a coherent curriculum? A lot of CS becomes integrated in Math, as it is the main course that is open to this type of material. However, at some point, this becomes impossible. Furthermore, we do not ask math teachers to teach physics or chemistry, so why do we ask them to teach computer science? The question of which part of the curriculum we can drop for other parts is an ongoing (and cyclical) discussion. It seems that decisions on this change every decade, and changes may be rolled back.
From this talk, I'll be considering the lesson plans that Rob discussed for the coming days. Although our students have much broader knowledge than the high-school students being taught here, the lesson on the data cycle especially gives me some ideas for our Knowledge Engineering course. After all, the purpose of the course is to learn more about the capturing and organization of data.
Until next time!